Engineering Psychology and Cognitive Ergonomics: 20th International Conference, EPCE 2023 Held as Part of the 25th HCI International Conference, HCII 2023 Copenhagen, Denmark, July 23–28, 2023 Proceedings, Part II 3031353889, 9783031353888

This two-volume set LNCS 14017 - 14018 constitutes the thoroughly refereed proceedings of the 20th International Conference on Engineering Psychology and Cognitive Ergonomics, EPCE 2023, held as part of the 25th HCI International Conference, HCII 2023, in Copenhagen, Denmark, during July 23–28, 2023.


English · 618 pages · 2023


Table of contents:
Foreword
HCI International 2023 Thematic Areas and Affiliated Conferences
List of Conference Proceedings Volumes Appearing Before the Conference
Preface
20th International Conference on Engineering Psychology and Cognitive Ergonomics (EPCE 2023)
HCI International 2024 Conference
Contents – Part II
Contents – Part I
Human Factors in Aviation
Attitude Adjustment: Enhanced ATTI Mode for Remote Pilots
1 Introduction
1.1 Background
1.2 Research Problem
2 Literature Review
3 Research Outline
3.1 Aim
3.2 Research Questions
3.3 Hypothesis
4 Method
4.1 Participant Selection
4.2 Apparatus
4.3 Procedure
4.4 Experimental Design and Analysis
5 Results
5.1 Qualitative Results
6 Discussion
6.1 Contribution to UAS Human Factors
6.2 Study Limitations
6.3 Future Research
7 Conclusion
References
Spatial Learning in a Virtual Reality Boeing 737NG Flight Deck
1 Introduction
2 Background
3 Objectives
4 Participants
5 Method
5.1 Task
5.2 Materials
5.3 Virtual Reality System
6 Results
7 Discussion
7.1 Limitations
8 Conclusion
References
A Comparative Evaluation of Human Factors Intervention Approaches in Aviation Safety Management
1 Introduction
1.1 Different Approaches to Human Factors Intervention
1.2 Workforce Preferences and Recommended Interventions
2 Method
2.1 Participants
2.2 Research Design
2.3 Statistical Analysis
3 Results and Discussion
3.1 Evaluative Differences Amongst Intervention Approaches
3.2 Task, Human, and Organization Interventions More Feasible and Cost-Efficient
3.3 Assessed Effectiveness Subject to Mitigation Myopia
4 Conclusion
References
An In-Depth Examination of Mental Incapacitation and Startle Reflex: A Flight Simulator Study
1 Introduction
2 Method
2.1 Participants
2.2 Phase 1: Questionnaires and Laboratory Tests
2.3 Phase 2: Flight Simulation Tasks
3 Preliminary Results
3.1 Auto-evaluation of Surprise and Startle During the Startling Flight Scenario
3.2 Startle Reflex During the Thunder Sound and the Flash of Lightning
3.3 Personality Evaluation
3.4 Workload During the Three Tasks and the Startling Flight Scenario
4 Discussion
5 Conclusion
References
High-Fidelity Task Analysis Identifying the Needs of Future Cockpits Featuring Single Pilot Operation with Large Transport Airplanes
1 Introduction
2 Research Context
3 Basics
4 Previous Work
5 Related Work
6 Method
7 Results
8 Discussion
9 Conclusion
References
DART – Providing Distress Assistance with Real-Time Aircraft Telemetry
1 Introduction
2 Potential for Distress Assistance and Better Outcomes
3 Analysis of Requirements and Prerequisites
4 Technical and Economic Requirements
5 Spectrum, Data Bandwidth and Costing Considerations
6 Establishment of a DART Programme
7 Conclusion
References
A Semi-quantitative Method for Congestion Alleviation at Air Route Intersections Based on System Dynamics
1 Introduction
2 Method
2.1 Participants
2.2 Indicator Selection
2.3 Data Collection and Processing
2.4 Model Analysis
3 Results and Discussion
4 Conclusion
References
Research on Evaluation of Network Integration of Multiple Hub Airports Within a Region
1 Introduction
1.1 Multi-airport Operation Mode
1.2 Multi-airport Connections and Network
1.3 Multi-airport and Regional Interaction
2 Method
2.1 Identification of Multi-hub Groups
2.2 Construction of Index Framework
2.3 Evaluation Method
3 Results
4 Discussion
5 Conclusion
References
Usability Evaluation of an Emergency Alerting System to Improve Discrete Communication During Emergencies
1 Introduction
1.1 Research Background
1.2 Examples of Existing Emergency Buttons in Mass Transportation
2 Methodology
2.1 Material
2.2 Research Design
3 Results
3.1 Basic Demographics
3.2 Differences in the User Acceptance Between Passengers and Cabin Crew
3.3 Differences in the User Acceptance Between Passengers and Cabin Crew
4 Discussion
4.1 Prospect Studies
5 Conclusion
References
The Application of a Human-Centred Design Integrated Touchscreen Display with Automated Electronic Checklist in the Flight Deck
1 Introduction
1.1 Background of Research
1.2 Research Aim
2 Methods
2.1 Overall Experiment Design
2.2 Simulated Display Design
2.3 Experiment Procedures
3 Result
3.1 Sample Characteristics
3.2 Objective Measurements
3.3 Subjective Workload, Situational Awareness, and System Usability
3.4 Thematic Analysis of Interviews and Answers for Open-Ended Questions
4 Discussion
4.1 Shorter Task Completion Time in Manual ECL Design
4.2 Interpretation of Eye Tracking Results
4.3 Differences in Subjective Workload and Usability
4.4 Review of SA, Workload, and Nature of NNC Operations
4.5 S/CD with Adaptive Activation of Switch
4.6 Flexible/Adaptive Automation
4.7 Human Machine Teaming
4.8 Limitations of This Study
5 Conclusions
References
Analysis of Airline Pilots’ Risk Perception While Flying During COVID-19 Through Flight Data Monitoring and Air Safety Reports
1 Introduction
2 Relative Work
3 Methodology
3.1 Data Source
3.2 Research Procedure
3.3 Statistical Tools Used
4 Results
4.1 Sample Characteristics
4.2 Testing the Increase in Severity Index for LOC-I and RE Events and for Events Related to Manual Flying as the Pandemic Started
4.3 Testing Associations Between Severity Index Scores and the Number of ASRs Submitted Across the Three Pandemic Stages
4.4 Testing Associations Between the Number of ASRs Submitted, Total FDM Events, FDM Event Categories, and Autopilot Use Across the Three Pandemic Stages
5 Discussion
5.1 Limitations
6 Conclusion
References
Effects of Radio Frequency Cross-Coupling in Multiple Remote Tower Operation on Pilots
1 Introduction and Background
1.1 Frequency Cross-Coupling and adapted Phraseology
1.2 Mental Workload and Situation Awareness
1.3 Aim of this Paper
2 Method
2.1 Sample
2.2 Experimental Task
2.3 Self-reported Measures
2.4 Procedure
2.5 Data Analysis
3 Results
3.1 Attitude
3.2 Mental Workload
3.3 Situation Awareness
3.4 Pilot Comments
4 Discussion
5 Conclusion
References
Risk Analysis of Human Factors in Aircraft Tail Strike During Landing: A Study Based on QAR Data
1 Introduction
2 Method
2.1 Causation Analysis
2.2 QAR Data Collection and Processing
2.3 Statistical Analyzing
3 Result
3.1 Trend Classification and Analysis of Large Pitch Angle in Landing
3.2 Causation Analysis on the Increased Pitch Angle After Grounding
3.3 Causation Analysis on Bounced Landing
4 Conclusion
References
EBT-CBTA in Aviation Training: The Turkish Airlines Case Study
1 Introduction
2 Developing EBT - CBTA Database to Support Global Aviation Systems Design
3 Pilot Competencies and TEM Model
4 Turkish Airlines
4.1 Airline Profile – Organizational Culture
4.2 Training Model
4.3 Technology Concept in Flight Crew Training
5 Conclusion
References
Towards eMCO/SiPO: A Human Factors Efficacy, Usability, and Safety Assessment for Direct Voice Input (DVI) Implementation in the Flight Deck
1 Human Factors perspective on Direct Voice Input (DVI) in Flight Deck in the last 40 years
2 Systematic Literature Review on DVI
3 Results
3.1 Operational Viability
3.2 Operational Reliability
3.3 Functional Utility
4 Discussion and Conclusions
5 Efficacy, Usability, and Safety Assessment of DVI Under the eMCO/SiPO Prism
References
Pilot’s Perspective on Single Pilot Operation: Challenges or Showstoppers
1 Next Generation Pilot Operation
1.1 Challenges or Showstoppers for SPO?
1.2 Aims of the Study: Identifying Characteristics of SPO-Related Problems
2 Methods and Instruments
2.1 The 3C-SPO Concept
2.2 Sample
3 Results
3.1 Evaluation of Directly Safety Related Problems
3.2 Structure of the Directly Safety Relevant Problems
3.3 Indirectly Safety Related Problems
3.4 More Challenges
3.5 Perceived Problems with the Defined 3C-SPO Concept
3.6 Willingness to Pilot SPO
4 Discussion
References
Human Factors in Operations Management
Using a Semi-autonomous Drone Swarm to Support Wildfire Management – A Concept of Operations Development Study
1 Introduction
2 Background
2.1 Wildfire Management with Drone Swarms
2.2 Earlier Concepts of Operations Work for the Command and Control of Multiple Drones
3 Study Case Approach
4 Results
4.1 Current Operational Activity
4.2 Challenges of Drone Usage in a Wildfire Situation
4.3 Visions of Drone Usage in Wildfire Situations
4.4 The First Version of the Developed Drone Swarm-Based ConOps
5 Discussion, Conclusions, and Further Research
References
Human Factors and Human Performance in UAS Operations. The Case of UAS Pilots in UAM Operations
1 Introduction
1.1 Evolution of UAS Operations
1.2 Scope
2 The Role of Human Factors in Aviation Safety
2.1 Human Performance
2.2 Decision-Making
2.3 Accident Causation
3 Human Factors in Modern UAS Operations
3.1 Overview
3.2 UAS HF Effects on Collision Cases
3.3 UAS Pilot Training
4 Case Study
4.1 People Transportation (UAM)
5 Recommendations
6 Conclusion
References
A Bridge Inspection Task Analysis
1 Introduction
2 Method
2.1 Participants
2.2 Recruitment
2.3 Procedure
3 Results
3.1 Bridge Classes
3.2 Bridge Inspectors
3.3 Inspector Classes
3.4 Inspection Task
4 Discussion
5 Conclusion
References
A Study on Human Relations in a Command and Control System Based on Social Network Analysis
1 Introduction
2 The Social Network Analysis Review
3 The Social Network Model for C2 Application
3.1 The C2 Social Networks
3.2 Characteristics of C2 Social Networks
3.3 The Social Network Communication Media
3.4 Coupling Communication Media Efficiency and Task Ambiguity
4 Case Study: A Submarine Operation and Navigation Scenario
4.1 Scenario and Tasks
4.2 Results
5 Discussion
6 Conclusion
References
Research on Ergonomic Evaluation of Remote Air Traffic Control Tower of Hangzhou Jiande Airport
1 Introduction
2 Method
2.1 Participants
2.2 Subjective Evaluation Method
2.3 Objective Detection Method
2.4 Selection of Evaluation Methods
3 Results and Discussion
3.1 Basic Information of Remote ATC Tower of Jiande Airport
3.2 Ergonomic Evaluation by Questionnaire Method and Results
3.3 Ergonomic Evaluation by Physiological Detection Method and Results
4 Conclusions and Suggestions
4.1 Evaluation Conclusions
4.2 Research Suggestions on Ergonomic Evaluation of Remote ATC Tower
References
A Framework for Supporting Adaptive Human-AI Teaming in Air Traffic Control
1 Introduction
2 Making Sense of the “New” ATM Ecosystem
3 Method
4 A Framework for Supporting Adaptive Human-AI Teaming
4.1 Steering or Goal Setting
4.2 Sensemaking and Mental Models
4.3 Common Operating Picture or Shared Mental Models
4.4 Coordination and Transfer of Control
4.5 Managing Changes
4.6 Planning-Doing-Checking Cycle
5 Discussion and Conclusion
References
Assessing Performance of Agents in C2 Operations
1 Introduction
2 Theory
2.1 NASA-Task Load Index
2.2 Crew Awareness Rating Scale
2.3 Situation Awareness Global Assessment Technique
3 Method
3.1 Participants
3.2 Simulation Runs
3.3 Situation Awareness Requirement Analysis
3.4 Materials and Measures
3.5 Procedure
4 Results
4.1 Final Version of the Assessment Tool
4.2 Example of Scores
4.3 Interpreting the Scores
5 Conclusions
References
Research and Application of Intelligent Technology for Preventing Human Error in Nuclear Power Plant
1 Background of Intelligent Human Error Prevention in Nuclear Power Plant
2 Intelligent Human Error Prevention Model Based on AR + AI Technology
2.1 Development Status of Intelligent Human Error Prevention
2.2 Human Error Mechanism and Behavior Pattern
3 Intelligent Prevention Model of Human Error in NPPs
4 Development and Application of Intelligent Human Error Prevention Device
4.1 Development
4.2 Test and Verification
4.3 Results
4.4 Future Work
5 Conclusion
References
An Emergency Centre Dispatcher Task Analysis
1 Introduction
2 Method
3 Results and Discussion
3.1 Organizational Context
3.2 Call Taker
4 Dispatchers
4.1 Police Dispatchers
4.2 Fire Dispatchers
5 Dispatching Challenges
5.1 Database Search
5.2 Disturbing Images
5.3 Multi-channel Audio Monitoring
5.4 Fire Unit Redistribution
5.5 Traffic Conditions
5.6 Hazardous Material
5.7 Fire Code and Alarm Selection
6 Meeting Challenges and Conclusions
References
Research Trends and Applications of Human-Machine Interface in Air Traffic Control: A Scientometric Analysis
1 Introduction
2 Data Sources and Methods
2.1 Data Sources
2.2 Methods
3 Results
3.1 Distribution of Publications
3.2 Performance of Countries/Regions
3.3 Performance of Institutes and Authors
3.4 Performance of Journals
3.5 Analysis of Hot Spots, Trends and Applications
3.6 Theoretical Basis
4 Conclusion
References
Construction of General Aviation Air Traffic Management Auxiliary Decision System Based on Track Evaluation
1 Introduction
2 Methodology
2.1 Status of Flight Path Evaluation
2.2 Data Source and Processing
2.3 DTW Algorithm (Dynamic Time Warping)
3 Results and Analysis
3.1 Take-Off and Landing Routes
3.2 Cloud-Crossing Routes
3.3 Algorithm Flexibility Verification
4 Discussion
4.1 Evaluation Data Differentiation
4.2 System Construction
5 Conclusion
References
How to Determine the Time-Intervals of Two-Stage Warning Systems in Different Traffic Densities? An Investigation on the Takeover Process in Automated Driving
1 Introduction
1.1 Takeover in Automated Driving
1.2 The Two-Stage Warning Systems and Time-Intervals
1.3 Traffic Density
1.4 Interaction Effect Traffic Density and Time-Intervals
1.5 Aims
2 Methods
2.1 Participants
2.2 Experiment Design and Measurement
2.3 Apparatus
2.4 Driving Scenarios
2.5 Two-Stage Warning Systems
2.6 Non-driving Related Task
2.7 Procedure
2.8 Data Analysis
3 Results
3.1 Descriptive Statistics
3.2 Motor Readiness
3.3 Takeover Performance
3.4 Situation Awareness
3.5 Subjective Ratings
4 Discussion
5 Conclusion
References
Human-Centered Design of Autonomous Systems
The City Scale Effect and the Baidu Index Prediction Model of Public Perceptions of the Risks Associated with Autonomous Driving Technology
1 Introduction
2 Empirical Study
2.1 Data Sources
2.2 Research Methodology
3 City Scale Effect and Baidu Index Prediction Model
3.1 Variation in Attention to Autonomous Driving Technology Across Cities Based on Keywords
3.2 Level of Urban Development as a Predictor of Public Attention to Autonomous Driving Technology
3.3 Baidu Index Prediction Model for Autonomous Driving Technology Based on the Urban Statistical Variables
4 Conclusions and Discussion
References
Human-Computer Interaction Analysis of Quadrotor Motion Characteristics Based on Simulink
1 Introduction
2 Method
2.1 Flight Principle and Motion State of Multi-rotor UAV
2.2 Kinematic Model of Multi-rotor UAV
2.3 Multi-rotor UAV Simulation Environment
3 Results and Discussion
3.1 Vertical Motion Characteristics
3.2 Horizontal Flight Motion Characteristics
3.3 Hover at Fixed Point
3.4 Integrated Sports
4 Conclusion
References
Fluidics-Informed Fabrication: A Novel Co-design for Additive Manufacturing Framework
1 Introduction
1.1 Design for Additive Manufacturing
1.2 Towards Novel Design for Additive Manufacturing Tools
2 User Workflow for Injection 3D Printing
2.1 3D Printer Hardware Interface
2.2 Experimental User Design Testing
2.3 Visualization of Multimaterial Injection Printing
3 Generative Design Methodology
3.1 Requirements of the 3D Fluidic Network
3.2 Procedural Modeling Methodology
3.3 Injection 3D Printing Design Interface
4 Conclusions and Future Directions
References
Advanced Audio-Visual Multimodal Warnings for Drivers: Effect of Specificity and Lead Time on Effectiveness
1 Introduction
2 Method
2.1 Participants
2.2 Apparatus and Stimuli
2.3 Experiment Design
2.4 Procedure
3 Result
3.1 Behavioral Indicators
3.2 Subjective Evaluation
4 Discussion
5 Conclusion
References
Which Position is Better? A Survey on User Preference for Ambient Light Positions
1 Introduction
1.1 Research Background
1.2 Focusing on User Needs: Related to Perceptions and Emotions
1.3 Focusing on User Personalization: Aesthetic Preference
1.4 The Purpose of the Present Study
2 Method
2.1 Participants
2.2 Research Design
2.3 Materials
2.4 Procedure
2.5 Statistical Analyses
3 Results
3.1 Descriptive Results of In-vehicle Ambient Light Usage
3.2 Aesthetic Evaluation Results of In-Vehicle Ambient Light Positions
3.3 Favourite Ranking Results of In-Vehicle Ambient Light Positions
3.4 Integration of Aesthetic Evaluation and Favourite Ranking into Preferences
4 Discussion
5 Conclusion
References
Using Automatic Speech Recognition to Evaluate Team Processes in Aviation - First Experiences and Open Questions
1 Motivation
2 Theoretical Background
2.1 Collaboration and Teamwork in Aviation
2.2 Role of Communication for Team Collaboration
2.3 Features of the Communication Process Used for Team Process Analysis
2.4 State-of-the-Art of Automatic Speech Recognition (IAIS)
3 Approach to Analyze Team Communication
4 Use-Case Cockpit Crew Communication
4.1 Collaboration Scenario
4.2 Descriptive Analysis of Communication Structure
4.3 Examples of Communication Situations
4.4 Current Results of ASR of Cockpit Crew Communication
5 Summary and Outlook
References
Parsing Pointer Movements in a Target Unaware Environment
1 Structure of a Mouse Movement
1.1 Fitts’ Law
1.2 Inversion of Fitts’ Law
1.3 First Algorithm
1.4 Reduction of the Inverted Fitts’ Law Algorithm
2 Experiment and Analysis
2.1 Experiment
2.2 Analysis
2.3 Testing and Fitness
3 Evolutionary Algorithm
3.1 Fitness Function
4 Results
5 Discussion
5.1 Future Ideas
6 Conclusion
References
A Framework of Pedestrian-Vehicle Interaction Scenarios for eHMI Design and Evaluation
1 Introduction
2 Methods and Results
2.1 Pedestrian-Vehicle Interaction Scenarios from Literature
2.2 Pedestrian-Vehicle Interaction Scenarios from a Focus Group Interview
3 A Framework of Pedestrian-Vehicle Interaction Scenarios
4 Discussion
5 Conclusions
References
A User Requirement Driven Development Approach for Smart Product-Service System of Elderly Service Robot
1 Introduction
2 Literature Review
2.1 Rough Set Theory
2.2 Smart Product-Service System (SPSS) Development
3 Method
3.1 Obtain Knowledge of User Kansei Needs
3.2 Obtaining Service Design Elements
3.3 RST to Capture Key Service Elements
3.4 LR to Build Uncertain Relationship
3.5 Providers Integrating to Build Service Content
4 A Case Study
4.1 User Needs Analysis
4.2 Spanning Design Attributes for Elderly Service Robot SPSS
4.3 RST to Identify the Key Design Features
4.4 LR to Build Mapping Model to Obtain the Key Features of SPSS
4.5 Construction of Product/service for SPSS
5 Conclusions
References
An Analysis of Typical Scenarios and Design Suggestions for the Application of V2X Technology
1 Introduction
2 Potential Scenarios of V2X Technology
2.1 Beyond-Visual-Range Information Presentation
2.2 Public Services
2.3 Inter-vehicles Coordination
3 Design Suggestions for V2X
4 Future Research Directions
5 Conclusion
References
Who Should We Choose to Sacrifice, Self or Pedestrian? Evaluating Moral Decision-Making in Virtual Reality
1 Introduction
2 Methods
2.1 Participants
2.2 Study Design
2.3 Virtual Reality Road Traffic Scenarios
2.4 Procedure
2.5 Data Analysis
3 Results
4 Discussion
4.1 Individual’s Moral Decision-Making in Tasks Involving Self-sacrifice Choice
4.2 The Effect of Pedestrian Type on Individual’s Moral Decision-Making
4.3 The Effect of Time Pressure on Individual’s Moral Decision-Making
4.4 Application to Autonomous Vehicles and Study Limitations
5 Conclusions
References
A Literature Review of Current Practices to Evaluate the Usability of External Human Machine Interface
1 Introduction
1.1 Autonomous Vehicles
1.2 The Study of EHMI
1.3 The Present Study
2 Method
3 Results
4 Discussion
5 Conclusion
References
Visualizing the Improvement of the Autonomous System: Comparing Three Methods
1 Introduction
2 Method
2.1 Participants
2.2 Research Design
2.3 Material
2.4 Procedure
3 Results
3.1 The Effect of Feedback
4 Discussion
5 Conclusion
References
Author Index


LNAI 14018

Don Harris · Wen-Chin Li (Eds.)

Engineering Psychology and Cognitive Ergonomics 20th International Conference, EPCE 2023 Held as Part of the 25th HCI International Conference, HCII 2023 Copenhagen, Denmark, July 23–28, 2023 Proceedings, Part II

Lecture Notes in Computer Science

Lecture Notes in Artificial Intelligence Founding Editor Jörg Siekmann

Series Editors Randy Goebel, University of Alberta, Edmonton, Canada Wolfgang Wahlster, DFKI, Berlin, Germany Zhi-Hua Zhou, Nanjing University, Nanjing, China

14018

The series Lecture Notes in Artificial Intelligence (LNAI) was established in 1988 as a topical subseries of LNCS devoted to artificial intelligence. The series publishes state-of-the-art research results at a high level. As with the LNCS mother series, the mission of the series is to serve the international R & D community by providing an invaluable service, mainly focused on the publication of conference and workshop proceedings and postproceedings.

Don Harris · Wen-Chin Li Editors

Engineering Psychology and Cognitive Ergonomics 20th International Conference, EPCE 2023 Held as Part of the 25th HCI International Conference, HCII 2023 Copenhagen, Denmark, July 23–28, 2023 Proceedings, Part II

Editors Don Harris Coventry University Coventry, UK

Wen-Chin Li Cranfield University Cranfield, UK

ISSN 0302-9743   ISSN 1611-3349 (electronic)
Lecture Notes in Artificial Intelligence
ISBN 978-3-031-35388-8   ISBN 978-3-031-35389-5 (eBook)
https://doi.org/10.1007/978-3-031-35389-5
LNCS Sublibrary: SL7 – Artificial Intelligence

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Foreword

Human-computer interaction (HCI) is acquiring an ever-increasing scientific and industrial importance, as well as having more impact on people’s everyday lives, as an ever-growing number of human activities are progressively moving from the physical to the digital world. This process, which has been ongoing for some time now, was further accelerated during the acute period of the COVID-19 pandemic. The HCI International (HCII) conference series, held annually, aims to respond to the compelling need to advance the exchange of knowledge and research and development efforts on the human aspects of design and use of computing systems.

The 25th International Conference on Human-Computer Interaction, HCI International 2023 (HCII 2023), was held in the emerging post-pandemic era as a ‘hybrid’ event at the AC Bella Sky Hotel and Bella Center, Copenhagen, Denmark, during July 23–28, 2023. It incorporated the 21 thematic areas and affiliated conferences listed below.

A total of 7472 individuals from academia, research institutes, industry, and government agencies from 85 countries submitted contributions, and 1578 papers and 396 posters were included in the volumes of the proceedings that were published just before the start of the conference; these are listed below. The contributions thoroughly cover the entire field of human-computer interaction, addressing major advances in knowledge and effective use of computers in a variety of application areas. These papers provide academics, researchers, engineers, scientists, practitioners and students with state-of-the-art information on the most recent advances in HCI.

The HCI International (HCII) conference also offers the option of presenting ‘Late Breaking Work’, and this applies both for papers and posters, with corresponding volumes of proceedings that will be published after the conference. Full papers will be included in the ‘HCII 2023 - Late Breaking Work - Papers’ volumes of the proceedings to be published in the Springer LNCS series, while ‘Poster Extended Abstracts’ will be included as short research papers in the ‘HCII 2023 - Late Breaking Work - Posters’ volumes to be published in the Springer CCIS series.

I would like to thank the Program Board Chairs and the members of the Program Boards of all thematic areas and affiliated conferences for their contribution towards the high scientific quality and overall success of the HCI International 2023 conference. Their manifold support in terms of paper reviewing (single-blind review process, with a minimum of two reviews per submission), session organization and their willingness to act as goodwill ambassadors for the conference is most highly appreciated.

This conference would not have been possible without the continuous and unwavering support and advice of Gavriel Salvendy, founder, General Chair Emeritus, and Scientific Advisor. For his outstanding efforts, I would like to express my sincere appreciation to Abbas Moallem, Communications Chair and Editor of HCI International News.

July 2023

Constantine Stephanidis

HCI International 2023 Thematic Areas and Affiliated Conferences

Thematic Areas
• HCI: Human-Computer Interaction
• HIMI: Human Interface and the Management of Information

Affiliated Conferences
• EPCE: 20th International Conference on Engineering Psychology and Cognitive Ergonomics
• AC: 17th International Conference on Augmented Cognition
• UAHCI: 17th International Conference on Universal Access in Human-Computer Interaction
• CCD: 15th International Conference on Cross-Cultural Design
• SCSM: 15th International Conference on Social Computing and Social Media
• VAMR: 15th International Conference on Virtual, Augmented and Mixed Reality
• DHM: 14th International Conference on Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management
• DUXU: 12th International Conference on Design, User Experience and Usability
• C&C: 11th International Conference on Culture and Computing
• DAPI: 11th International Conference on Distributed, Ambient and Pervasive Interactions
• HCIBGO: 10th International Conference on HCI in Business, Government and Organizations
• LCT: 10th International Conference on Learning and Collaboration Technologies
• ITAP: 9th International Conference on Human Aspects of IT for the Aged Population
• AIS: 5th International Conference on Adaptive Instructional Systems
• HCI-CPT: 5th International Conference on HCI for Cybersecurity, Privacy and Trust
• HCI-Games: 5th International Conference on HCI in Games
• MobiTAS: 5th International Conference on HCI in Mobility, Transport and Automotive Systems
• AI-HCI: 4th International Conference on Artificial Intelligence in HCI
• MOBILE: 4th International Conference on Design, Operation and Evaluation of Mobile Communications

List of Conference Proceedings Volumes Appearing Before the Conference

1. LNCS 14011, Human-Computer Interaction: Part I, edited by Masaaki Kurosu and Ayako Hashizume
2. LNCS 14012, Human-Computer Interaction: Part II, edited by Masaaki Kurosu and Ayako Hashizume
3. LNCS 14013, Human-Computer Interaction: Part III, edited by Masaaki Kurosu and Ayako Hashizume
4. LNCS 14014, Human-Computer Interaction: Part IV, edited by Masaaki Kurosu and Ayako Hashizume
5. LNCS 14015, Human Interface and the Management of Information: Part I, edited by Hirohiko Mori and Yumi Asahi
6. LNCS 14016, Human Interface and the Management of Information: Part II, edited by Hirohiko Mori and Yumi Asahi
7. LNAI 14017, Engineering Psychology and Cognitive Ergonomics: Part I, edited by Don Harris and Wen-Chin Li
8. LNAI 14018, Engineering Psychology and Cognitive Ergonomics: Part II, edited by Don Harris and Wen-Chin Li
9. LNAI 14019, Augmented Cognition, edited by Dylan D. Schmorrow and Cali M. Fidopiastis
10. LNCS 14020, Universal Access in Human-Computer Interaction: Part I, edited by Margherita Antona and Constantine Stephanidis
11. LNCS 14021, Universal Access in Human-Computer Interaction: Part II, edited by Margherita Antona and Constantine Stephanidis
12. LNCS 14022, Cross-Cultural Design: Part I, edited by Pei-Luen Patrick Rau
13. LNCS 14023, Cross-Cultural Design: Part II, edited by Pei-Luen Patrick Rau
14. LNCS 14024, Cross-Cultural Design: Part III, edited by Pei-Luen Patrick Rau
15. LNCS 14025, Social Computing and Social Media: Part I, edited by Adela Coman and Simona Vasilache
16. LNCS 14026, Social Computing and Social Media: Part II, edited by Adela Coman and Simona Vasilache
17. LNCS 14027, Virtual, Augmented and Mixed Reality, edited by Jessie Y. C. Chen and Gino Fragomeni
18. LNCS 14028, Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management: Part I, edited by Vincent G. Duffy
19. LNCS 14029, Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management: Part II, edited by Vincent G. Duffy
20. LNCS 14030, Design, User Experience, and Usability: Part I, edited by Aaron Marcus, Elizabeth Rosenzweig and Marcelo Soares
21. LNCS 14031, Design, User Experience, and Usability: Part II, edited by Aaron Marcus, Elizabeth Rosenzweig and Marcelo Soares


22. LNCS 14032, Design, User Experience, and Usability: Part III, edited by Aaron Marcus, Elizabeth Rosenzweig and Marcelo Soares
23. LNCS 14033, Design, User Experience, and Usability: Part IV, edited by Aaron Marcus, Elizabeth Rosenzweig and Marcelo Soares
24. LNCS 14034, Design, User Experience, and Usability: Part V, edited by Aaron Marcus, Elizabeth Rosenzweig and Marcelo Soares
25. LNCS 14035, Culture and Computing, edited by Matthias Rauterberg
26. LNCS 14036, Distributed, Ambient and Pervasive Interactions: Part I, edited by Norbert Streitz and Shin’ichi Konomi
27. LNCS 14037, Distributed, Ambient and Pervasive Interactions: Part II, edited by Norbert Streitz and Shin’ichi Konomi
28. LNCS 14038, HCI in Business, Government and Organizations: Part I, edited by Fiona Fui-Hoon Nah and Keng Siau
29. LNCS 14039, HCI in Business, Government and Organizations: Part II, edited by Fiona Fui-Hoon Nah and Keng Siau
30. LNCS 14040, Learning and Collaboration Technologies: Part I, edited by Panayiotis Zaphiris and Andri Ioannou
31. LNCS 14041, Learning and Collaboration Technologies: Part II, edited by Panayiotis Zaphiris and Andri Ioannou
32. LNCS 14042, Human Aspects of IT for the Aged Population: Part I, edited by Qin Gao and Jia Zhou
33. LNCS 14043, Human Aspects of IT for the Aged Population: Part II, edited by Qin Gao and Jia Zhou
34. LNCS 14044, Adaptive Instructional Systems, edited by Robert A. Sottilare and Jessica Schwarz
35. LNCS 14045, HCI for Cybersecurity, Privacy and Trust, edited by Abbas Moallem
36. LNCS 14046, HCI in Games: Part I, edited by Xiaowen Fang
37. LNCS 14047, HCI in Games: Part II, edited by Xiaowen Fang
38. LNCS 14048, HCI in Mobility, Transport and Automotive Systems: Part I, edited by Heidi Krömker
39. LNCS 14049, HCI in Mobility, Transport and Automotive Systems: Part II, edited by Heidi Krömker
40. LNAI 14050, Artificial Intelligence in HCI: Part I, edited by Helmut Degen and Stavroula Ntoa
41. LNAI 14051, Artificial Intelligence in HCI: Part II, edited by Helmut Degen and Stavroula Ntoa
42. LNCS 14052, Design, Operation and Evaluation of Mobile Communications, edited by Gavriel Salvendy and June Wei
43. CCIS 1832, HCI International 2023 Posters - Part I, edited by Constantine Stephanidis, Margherita Antona, Stavroula Ntoa and Gavriel Salvendy
44. CCIS 1833, HCI International 2023 Posters - Part II, edited by Constantine Stephanidis, Margherita Antona, Stavroula Ntoa and Gavriel Salvendy
45. CCIS 1834, HCI International 2023 Posters - Part III, edited by Constantine Stephanidis, Margherita Antona, Stavroula Ntoa and Gavriel Salvendy
46. CCIS 1835, HCI International 2023 Posters - Part IV, edited by Constantine Stephanidis, Margherita Antona, Stavroula Ntoa and Gavriel Salvendy


47. CCIS 1836, HCI International 2023 Posters - Part V, edited by Constantine Stephanidis, Margherita Antona, Stavroula Ntoa and Gavriel Salvendy

https://2023.hci.international/proceedings

Preface

The 20th International Conference on Engineering Psychology and Cognitive Ergonomics (EPCE 2023) is an affiliated conference of the HCI International Conference. The first EPCE conference was held in Stratford-upon-Avon, UK in 1996, and since 2001 EPCE has been an integral part of the HCI International conference series. Over the last 25 years, over 1,000 papers have been presented at this conference, which attracts a world-wide audience of scientists and human factors practitioners.

The engineering psychology submissions describe advances in applied cognitive psychology that underpin the theory, measurement and methodologies behind the development of human-machine systems. Cognitive ergonomics describes advances in the design and development of user interfaces. Originally, these disciplines were driven by the requirements of high-risk, high-performance industries where safety was paramount; however, the importance of good human factors is now widely understood, not only for increasing safety but also for enhancing performance, productivity and revenues.

Two volumes of the HCII 2023 proceedings are dedicated to this year’s edition of the EPCE conference. The first volume centers around an array of interconnected themes related to human performance, stress, fatigue, mental workload, and error management. Drawing on the latest research and real-world case studies, works included for publication explore perspectives of stress and fatigue and study mental workload across different tasks and contexts. A considerable number of articles delve into the high-pressure environments characteristic of the aviation and technology industries, where human error can have severe consequences, aiming to evaluate and enhance performance in such demanding contexts, as well as to understand the competencies and psychological characteristics of professionals in these fields. Furthermore, this volume discusses the topic of resilience to cope with the demands of challenging contexts, exploring facets of resilience engineering in synergy with threat and error management, system safety competency assessment, as well as vigilance and psychological health and safety in these contexts.

The second volume offers a comprehensive exploration of the role of human factors in aviation, operations management, and the design of autonomous systems. The prominence of human factors in aviation is addressed in a number of papers discussing research and case studies that investigate the gamut of aviation systems – including the flight deck, training, and communication – and explore pilots’ perceptions, perspectives, and psychological aspects. These works also deliberate on issues of safety, efficacy, and usability. Additionally, from the perspective of operations management, a considerable number of papers discuss research and provide valuable insights into the critical role of human factors in enhancing the safety and efficiency of various operational contexts. Finally, a significant proportion of this volume is devoted to understanding the complex interplay between humans and autonomous systems, exploring design processes, human-centered design practices, evaluation perspectives and ethical dilemmas.


Papers of these volumes are included for publication after a minimum of two single-blind reviews from the members of the EPCE Program Board or, in some cases, from members of the Program Boards of other affiliated conferences. We would like to thank all of them for their invaluable contribution, support and efforts.

July 2023

Don Harris Wen-Chin Li

20th International Conference on Engineering Psychology and Cognitive Ergonomics (EPCE 2023)

Program Board Chairs: Don Harris, Coventry University, UK and Wen-Chin Li, Cranfield University, UK

Program Board:
• Gavin Andrews, HeartMath UK, UK
• James Blundell, Coventry University, UK
• Mickael Causse, ISAE-SUPAERO, France
• Wesley Chan, Cranfield University, UK
• Maik Friedrich, German Aerospace Center (DLR), Germany
• Nektarios Karanikas, Queensland University of Technology, Australia
• Hannu Karvonen, VTT Technical Research Centre of Finland Ltd., Finland
• Gulsum Kubra Kaya, Cranfield University, UK
• Kylie Key, FAA Flight Deck Human Factors Research Laboratory, USA
• John Lin, National Taiwan Normal University, Taiwan
• Ting-Ting Lu, Civil Aviation University of China, P.R. China
• Chien-Tsung Lu, Purdue University, USA
• Pete McCarthy, Cathay Pacific Airways, UK
• Brett Molesworth, UNSW Sydney, Australia
• Jose Luis Munoz Gamarra, Aslogic, Spain
• Anastasios Plioutsias, Coventry University, UK
• Tatiana Polishchuk, Linköping University, Sweden
• Dujuan Sevillian, National Transportation Safety Board (NTSB), USA
• Anthony Smoker, Lund University, Sweden
• Lei Wang, Civil Aviation University of China, P.R. China
• Jingyu Zhang, Chinese Academy of Sciences, P.R. China
• Xiangling Zhuang, Shaanxi Normal University, P.R. China
• Dimitrios Ziakkas, Purdue University, USA


The full list with the Program Board Chairs and the members of the Program Boards of all thematic areas and affiliated conferences of HCII2023 is available online at:

http://www.hci.international/board-members-2023.php

HCI International 2024 Conference

The 26th International Conference on Human-Computer Interaction, HCI International 2024, will be held jointly with the affiliated conferences at the Washington Hilton Hotel, Washington, DC, USA, June 29 – July 4, 2024. It will cover a broad spectrum of themes related to Human-Computer Interaction, including theoretical issues, methods, tools, processes, and case studies in HCI design, as well as novel interaction techniques, interfaces, and applications. The proceedings will be published by Springer. More information will be made available on the conference website: http://2024.hci.international/.

General Chair
Prof. Constantine Stephanidis
University of Crete and ICS-FORTH
Heraklion, Crete, Greece
Email: [email protected]

https://2024.hci.international/

Contents – Part II

Human Factors in Aviation

Attitude Adjustment: Enhanced ATTI Mode for Remote Pilots (Andrew Black, Steve Scott, and John Huddlestone), p. 3
Spatial Learning in a Virtual Reality Boeing 737NG Flight Deck (Leighton Carr, Guy Wallis, Nathan Perry, and Stephan Riek), p. 18
A Comparative Evaluation of Human Factors Intervention Approaches in Aviation Safety Management (Wesley Tsz-Kin Chan and Wen-Chin Li), p. 36
An In-Depth Examination of Mental Incapacitation and Startle Reflex: A Flight Simulator Study (Jonathan Deniel, Maud Dupuy, Alexandre Duchevet, Nadine Matton, Jean-Paul Imbert, and Mickaël Causse), p. 46
High-Fidelity Task Analysis Identifying the Needs of Future Cockpits Featuring Single Pilot Operation with Large Transport Airplanes (Lars Ebrecht), p. 60
DART – Providing Distress Assistance with Real-Time Aircraft Telemetry (Hannes S. Griebel and Daniel C. Smith), p. 77
A Semi-quantitative Method for Congestion Alleviation at Air Route Intersections Based on System Dynamics (Jiuxia Guo, Xuanhe Ren, Siying Xu, Xin Guo, and Yingjie Jia), p. 91
Research on Evaluation of Network Integration of Multiple Hub Airports Within a Region (Linlin Li, Mengyuan Lu, Tingting Lu, and Yiyang Zhang), p. 104
Usability Evaluation of an Emergency Alerting System to Improve Discrete Communication During Emergencies (Elizabeth Manikath, Wen-Chin Li, Pawel Piotrowski, and Jingyi Zhang), p. 120
The Application of a Human-Centred Design Integrated Touchscreen Display with Automated Electronic Checklist in the Flight Deck (Takashi Nagasawa and Wen-Chin Li), p. 135
Analysis of Airline Pilots’ Risk Perception While Flying During COVID-19 Through Flight Data Monitoring and Air Safety Reports (Arthur Nichanian and Wen-Chin Li), p. 150
Effects of Radio Frequency Cross-Coupling in Multiple Remote Tower Operation on Pilots (Lukas Tews, Jörn Jakobi, Anneke Hamann, and Helge Lenz), p. 163
Risk Analysis of Human Factors in Aircraft Tail Strike During Landing: A Study Based on QAR Data (Nan Zhang, Lei Wang, Jianing An, and Xinge Qi), p. 178
EBT-CBTA in Aviation Training: The Turkish Airlines Case Study (Dimitrios Ziakkas, Ibrahim Sarikaya, and Hans C. Natakusuma), p. 188
Towards eMCO/SiPO: A Human Factors Efficacy, Usability, and Safety Assessment for Direct Voice Input (DVI) Implementation in the Flight Deck (Dimitrios Ziakkas, Don Harris, and Konstantinos Pechlivanis), p. 200
Pilot’s Perspective on Single Pilot Operation: Challenges or Showstoppers (Frank Zinn, Jasmin della Guardia, and Frank Albers), p. 216

Human Factors in Operations Management

Using a Semi-autonomous Drone Swarm to Support Wildfire Management – A Concept of Operations Development Study (Hannu Karvonen, Eija Honkavaara, Juha Röning, Vadim Kramar, and Jukka Sassi), p. 235
Human Factors and Human Performance in UAS Operations. The Case of UAS Pilots in UAM Operations (Michail Karyotakis and Graham Braithwaite), p. 254
A Bridge Inspection Task Analysis (Jean-François Lapointe and Irina Kondratova), p. 280
A Study on Human Relations in a Command and Control System Based on Social Network Analysis (Zhen Liao, Shuang Liu, and Zhizhong Li), p. 291
Research on Ergonomic Evaluation of Remote Air Traffic Control Tower of Hangzhou Jiande Airport (Chengxue Liu, Yan Lu, and Tanghong Mou), p. 306
A Framework for Supporting Adaptive Human-AI Teaming in Air Traffic Control (Stathis Malakis, Marc Baumgartner, Nora Berzina, Tom Laursen, Anthony Smoker, Andrea Poti, Gabriele Fabris, Sergio Velotto, Marcello Scala, and Tom Kontogiannis), p. 320
Assessing Performance of Agents in C2 Operations (Alexander Melbi, Björn Johansson, Kristofer Bengtsson, and Per-Anders Oskarsson), p. 331
Research and Application of Intelligent Technology for Preventing Human Error in Nuclear Power Plant (Yang Shen, Xiang Ye, and Di Zhai), p. 345
An Emergency Centre Dispatcher Task Analysis (Norman G. Vinson, Jean-François Lapointe, and Noémie Lemaire), p. 360
Research Trends and Applications of Human-Machine Interface in Air Traffic Control: A Scientometric Analysis (Ziang Wang), p. 379
Construction of General Aviation Air Traffic Management Auxiliary Decision System Based on Track Evaluation (Jiang Yuan and Chaoxiang Long), p. 391
How to Determine the Time-Intervals of Two-Stage Warning Systems in Different Traffic Densities? An Investigation on the Takeover Process in Automated Driving (Wei Zhang, Shu Ma, Zhen Yang, Changxu Wu, Hongting Li, and Jinlei Shi), p. 406

Human-Centered Design of Autonomous Systems

The City Scale Effect and the Baidu Index Prediction Model of Public Perceptions of the Risks Associated with Autonomous Driving Technology (Jingxi Chen, Riheng She, Shuwen Yang, and Jinfei Ma), p. 425
Human-Computer Interaction Analysis of Quadrotor Motion Characteristics Based on Simulink (Zengxian Geng, Junyu Chen, Xin Guang, and Peiming Wang), p. 442
Fluidics-Informed Fabrication: A Novel Co-design for Additive Manufacturing Framework (Gabriel Lipkowitz, Eric S. G. Shaqfeh, and Joseph M. DeSimone), p. 454
Advanced Audio-Visual Multimodal Warnings for Drivers: Effect of Specificity and Lead Time on Effectiveness (Shan Liu, Bohan Wu, Shu Ma, and Zhen Yang), p. 467
Which Position is Better? A Survey on User Preference for Ambient Light Positions (Xinze Liu, Haihang Zhang, Xinyu Pan, Haidong Liu, and Yan Ge), p. 485
Using Automatic Speech Recognition to Evaluate Team Processes in Aviation - First Experiences and Open Questions (Anne Papenfuss and Christoph Andreas Schmidt), p. 501
Parsing Pointer Movements in a Target Unaware Environment (Jonah Scudere-Weiss, Abigail Wilson, Danielle Allessio, Will Lee, and John Magee), p. 514
A Framework of Pedestrian-Vehicle Interaction Scenarios for eHMI Design and Evaluation (Yuanming Song, Xiangling Zhuang, and Jingyu Zhang), p. 523
A User Requirement Driven Development Approach for Smart Product-Service System of Elderly Service Robot (Tianxiong Wang, Wei Yue, Liu Yang, Xian Gao, Tong Yu, and Qiang Yu), p. 533
An Analysis of Typical Scenarios and Design Suggestions for the Application of V2X Technology (Xia Wang, Pengchun Tang, Youyu Sheng, Rong Zhang, Muchen Liu, Yi Chu, Xiaopeng Zhu, and Jingyu Zhang), p. 552
Who Should We Choose to Sacrifice, Self or Pedestrian? Evaluating Moral Decision-Making in Virtual Reality (Huarong Wang, Dongqian Li, Zhenhang Wang, Jian Song, Zhan Gao, and David C. Schwebel), p. 560
A Literature Review of Current Practices to Evaluate the Usability of External Human Machine Interface (Yahua Zheng, Kangrui Wu, Ruisi Shi, Xiaopeng Zhu, and Jingyu Zhang), p. 573
Visualizing the Improvement of the Autonomous System: Comparing Three Methods (Yukun Zhu, Zhizi Liu, Youyu Sheng, Yi Ying, and Jingyu Zhang), p. 587

Author Index, p. 597

Contents – Part I

Stress, Fatigue, and Mental Workload

Suitability of Physiological, Self-report and Behavioral Measures for Assessing Mental Workload in Pilots (Hilke Boumann, Anneke Hamann, Marcus Biella, Nils Carstengerdes, and Stefan Sammito), p. 3
Evaluating the Impact of Passive Fatigue on Pilots Using Performance and Subjective States Measures (Stefano Conte, Donald Harris, and James Blundell), p. 21
Cognitive Effort in Interaction with Software Systems for Self-regulation - An Eye-Tracking Study (Gilbert Drzyzga, Thorleif Harder, and Monique Janneck), p. 37
Comparison of Two Methods for Altering the Appearance of Interviewers: Analysis of Multiple Biosignals (Ziting Gong and Hideaki Kanai), p. 53
Don’t Think Twice, It’s All Right? – An Examination of Commonly Used EEG Indices and Their Sensitivity to Mental Workload (Anneke Hamann and Nils Carstengerdes), p. 65
Generalizability of Mental Workload Prediction Using VACP Scales in Different Fields (Yanrong Huang, Nanxi Zhang, and Zhizhong Li), p. 79
Research and Application of Fatigue Management of Apron Control in Beijing Daxing International Airport (Aiping Jia, He Sun, Yi Liang, and Yanxi Qiu), p. 95
Pilot Study on Gaze-Based Mental Fatigue Detection During Interactive Image Exploitation (Christian Lengenfelder, Jutta Hild, Michael Voit, and Elisabeth Peinsipp-Byma), p. 109
Using Virtual Reality to Evaluate the Effect of the Degree of Presence on Human Working Memory Performance (Majdi Lusta, Cheryl Seals, Susan Teubner-Rhodes, Sathish Chandra Akula, and Alexicia Richardson), p. 120
The Impact of Blue Light and Dark UI on Eye Fatigue and Cognitive Workload (Bilal Mahmood, Fatih Baha Omeroglu, Elahe Abbasi, and Yueqing Li), p. 131
The Evaluations of the Impact of the Pilot’s Visual Behaviours on the Landing Performance by Using Eye Tracking Technology (Yifan Wang, Lichao Yang, Wojciech Tomasz Korek, Yifan Zhao, and Wen-Chin Li), p. 143
How Information Access, Information Volume of Head-Up Display and Work Experience Affect Pilots’ Mental Workload During Flight: An EEG Study (Jinchun Wu, Chenhao Li, and Chengqi Xue), p. 154
A Mental Workload Control Method Based on Human Performance or Safety Risk (Nanxi Zhang, Chunye Bao, Xin Wang, Qiming Han, Ye Deng, Yijing Zhang, and Zhizhong Li), p. 168

Human Performance and Error Management

Expertise Analysis in Chest X-Rays Diagnosis Based on Eye Tracking Stimulated Retrospections: Effective Diagnosis Strategies in Medical Checkup Condition (Hirotaka Aoki, Koji Morishita, Marie Takahashi, Rea Machida, Atsushi Kudoh, Mitsuhiro Kishino, and Tsuyoshi Shirai), p. 181
An Evaluation Framework on Pilot’s Competency-Based Flying Style (Shan Gao, Yuanyuan Xian, and Lei Wang), p. 190
Proposing Gaze-Based Interaction and Automated Screening Results for Visual Aerial Image Analysis (Jutta Hild, Lars Sommer, Gerrit Holzbach, Michael Voit, and Elisabeth Peinsipp-Byma), p. 200
The Impact Exercise Has on Cognitive Function (Kevin Lee, Fatih Baha Omeroglu, Chukebuka Nwosu, and Yueqing Li), p. 215
Study on Temperament Characteristics of Air Traffic Controllers Based on BP Neural Network (Tingting Lu, Xinyue Liu, Ning Li, Wen-Chin Li, and Zhaoning Zhang), p. 227
The Evaluation Model of Pilot Visual Search with Onboard Context-Sensitive Information System (Wei Tan, Wenqing Wang, and Yuan Sun), p. 238
The Similarity Recognition of Pilots’ Operational Action Sequence Based on Blocked Dynamic Time Warping during a Flight Mission (Huihui Wang, Yanyu Lu, and Shan Fu), p. 253
Analysis on the Competence Characteristics of Controllers in the Background of Air Traffic Control System with Man-Machine Integration (Yonggang Wang and Wenting Ma), p. 264
A Measurement Framework and Method on Airline Transport Pilot’s Psychological Competency (Lei Wang, Jiahua Peng, Ying Zou, Mengxi Zhang, and Danfeng Li), p. 276
A Study on Real-Time Control Capability Assessment of Approach Controllers (Lili Wang, Qiu-Li Gu, and Ke Ren Wang), p. 286
A Method for Evaluating Flight Cadets’ Operational Performance Based on Simulated Flight Data (Feiyin Wang, Wei Tan, Jintong Yuan, Wenqing Wang, Wenchao Wang, and Hang Li), p. 301
Applying Multi-source Data to Evaluate Pilots’ Flight Safety Style Based on Safety-II Theory (Zixin Wei, Ying Zou, and Lei Wang), p. 320
Integrated Visual Cognition Performance Evaluation Model of Intelligent Control System Interface (Xiaoli Wu and Yiyao Zhou), p. 331
How the Color Level of HUD Affects Users’ Search Performance: An Ergonomic Study (Jinchun Wu, Chenhao Li, and Chengqi Xue), p. 341
An Erroneous Behavior Taxonomy for Operation and Maintenance of Network Systems (Zijian Yin, Lei Long, Jiahao Yu, Yijing Zhang, and Zhizhong Li), p. 354
Effects of the Icon Brightness, App Folder Opacity, and Complexity of Mobile Wallpaper on the Search of Thumbnail Icons (Huihui Zhang, Lingxuan Li, Miao He, Yanfang Liu, and Liang Zhang), p. 371
How the Position Distribution of HUD Information Influences the Driver’s Recognition Performance in Different Scenes (Ying Zhou, Liu Tang, Junfeng Huang, Yuying Xiang, and Yan Ge), p. 383

Resilience and Performance in Demanding Contexts

A 7-Day Space Habitat Simulated Task: Using a Projection-Based Natural Environment to Improve Psychological Health in Short-Term Isolation Confinement (Xinyu He and Ao Jiang), p. 399
Emerging Challenges – How Pilot Students Remained Resilient During the Pandemic? (Chien-Tsung Lu, Xinyu Lu, Ming Cheng, Haoruo Fu, and Zhenglei Ji), p. 415
A Study on Civil Aviation Pilots Vigilance Change on Ultra-Long-Range Routes (Min Luo, Chunyang Zhang, Xingyu Liu, and Lin Zhang), p. 431
Short Time Algorithms for Screening Examinations of the Collective and Personal Stress Resilience (Sergey Lytaev), p. 442
An Exploratory Study into Resilience Engineering and the Applicability to the Private Jet Environment (Heather McCann and Anastasios Plioutsias), p. 459
Resilience Engineering’s Synergy with Threat and Error Management – An Operationalised Model (Andrew Mizzi and Pete McCarthy), p. 474
A Preparedness Drill Scenario Development and System Safety Competency Assessment Based on the STAMP Model (Apostolos Zeleskidis, Stavroula Charalampidou, Ioannis M. Dokas, and Basil Papadopoulos), p. 484
Study of Psychological Stress Among Air Traffic Controllers (Zhaoning Zhang, Zhuochen Shi, Ning Li, Yiyang Zhang, and Xiangrong Xu), p. 501

Author Index, p. 521

Human Factors in Aviation

Attitude Adjustment: Enhanced ATTI Mode for Remote Pilots

Andrew Black(B), Steve Scott, and John Huddlestone

Faculty of Engineering, Environment and Computing, Coventry University, Coventry CV1 5FB, UK
[email protected]

Abstract. Unmanned aircraft systems (UAS) are a rapidly emerging sector of aviation, however loss of control in-flight (LOC-I) is the largest reported category by the Air Accident Investigations Branch (2021). When UAS unexpectedly revert from global positioning system (GPS) to attitude mode (ATTI), automatic safety features may degrade. This creates a pre-condition for LOC-I. Under visual line of sight (VLOS), remote pilots (RPs) must maintain a constant aircraft watch, but current user interfaces may not offer appropriate alerting for visual tasks. A repeated measures experiment compared RP reaction times against two different ATTI alerts: verbal and passive. Participants with General VLOS Certificates were recruited to fly a maneuver sequence in June 2022 (n = 5). Quantitative data was supported by a qualitative questionnaire. Four research questions (RQs) were asked. RQ1: Is there a significant difference in RP reaction times to unexpected mode changes between a passive system and a verbal ATTI warning system? RQ2: Is reaction time to reversion significantly affected by maneuver? RQ3: Do RPs have a preference of warning system? RQ4: What suggestions do RPs have for warning system design? Results indicate a significant improvement using a verbal system (p = 0.048), with a large effect (ηp² = 0.66). Participants unanimously agreed that verbal alerts enhanced awareness of unexpected reversion. A combined haptic/verbal system was suggested by participants. The theoretical concept “Alert System Assessment Tool” has been introduced, alongside other RP human factors research areas. This study expands limited research within VLOS alerts and is believed to be primary research into verbal ATTI warnings.

Keywords: UAS · VLOS · Alerting · Remote Pilot · ATTI Mode

1 Introduction

1.1 Background

“Unmanned aircraft system” (UAS) describes the complete system required to operate an unmanned aircraft. This includes remote pilots (RPs), the control system and any other elements required to enable flight, such as launch and recovery devices (CAP722 2020). They may be operated under visual line of sight (VLOS) or beyond visual line of sight (BVLOS), but significant operational differences exist between the two.



VLOS systems are controlled via a handheld transmitter, with Global Positioning System (GPS) allowing three-dimensional stabilized navigation as a primary source (Doumit 2020). Other redundancy modes exist in concert with GPS to help maintain position and these vary between manufacturer; however, attitude mode (ATTI) requires full RP control to maintain position (letusdrone.com, 2019). In ATTI, the GPS signal is not used, which means advanced and automated functions are inhibited; however, ATTI is considered a normal mode of flight (DJI 2017). This rationale covers certain conditions – such as no GPS signal – where it may be essential to conduct an entire flight in ATTI. This condition is an expected mode of flight, as ATTI has been selected by the RP. In contrast, an unexpected mode change (e.g. loss of GPS signal) may not be obvious and can cause confusion, startle and surprise (CAP722 2020; CAP737 2016).

Human factors affecting UAS are yet to be explored in full (Hobbs 2018; McCarthy and Teo 2017; Özyörük 2020), creating a likely contributing factor to the higher accident rate compared with manned aviation (Doroftei et al. 2021; Dunn et al. 2020; Zhang et al. 2018). However, across both manned and unmanned disciplines, Loss of Control In-Flight (LOC-I) has been cited as the most commonly reported event (AAIB 2022). CAP722 (2020) suggests that one human factors challenge faced by RPs is mode change ambiguity leading to confusion. An example of this confusion is evidenced by an AAIB (2020) report involving a 12.8 kg UAS. Shortly after take-off on a fully automated VLOS flight, unexpected ATTI reversion was encountered. The RP was unaware of the reversion and no stick inputs were made for 75 s. The UAS drifted and collided with a house.

1.2 Research Problem

The AAIB’s (2020) report raised a question: why were no control inputs made? The report states that the two RPs noticed a caution (GPS-Compass Error) and the automatic Return-to-Home (RTH) function did not work when requested. VLOS was lost after approximately ten seconds as the aircraft drifted in the wind. The report describes RPs in a state of surprise (AAIB 2020). The aircraft reverted from GPS to ATTI, as intended by the manufacturer, but this was unexpected by the RPs. Standard operator calls of “ATTI mode” were not made despite suitable training and assessment (AAIB 2020). It is therefore suggested the crew experienced a reduction in situation awareness (Endsley and Jones 2011). It is theorized a different alert system might have improved the situation.

Alert systems and their presentation must be task appropriate (Noyes et al. 2016; Stokes and Wickens 1988). When operating a UAS under VLOS, it is often necessary to look away from the vehicle to the controller to confirm aircraft state. This is known as “heads-down” and, in the case of VLOS, is likely to be both distracting and reduce situation awareness (CAP722 2020; FAA 2020; CAP737 2016). That argument is supported by the AAIB (2022), who comment that human factors are not considered in UAS controller design. Furthermore, certain aural alerts may only be heard if interface volume is turned up (DJI 2017). Even then, performance shaping factors (PSFs) may preclude effective interpretation; ambient noise, for example (Harris 2011). Based on the information presented, a problem has been identified: the method of alert presentation to RPs during VLOS operations is not conducive to flight safety.



2 Literature Review

Human factors are evidently represented within the field of unmanned aviation. The necessity for human factors research clarity was identified by Fitts (1947). As the discipline has evolved, there is a growing realization that “unmanned” does not mean humans are removed from the system (Özyörük 2020; Zhang et al. 2018; Reason and Hobbs 2017; Patel et al. 2010). However, researchers have focused on tasks and vehicles rather than RP interaction with the system as a whole (Lim et al. 2018; Endsley and Kiris 1995; Hobbs and Lyall 2016; McCarthy and Teo 2017). A lack of engagement by manufacturers in the investigative process, alongside a lack of appropriate user manual sets, was cited by the AAIB (2020, 2022). Visual acuity limits for VLOS RPs were discussed by Li et al. (2019, 2020). The unique human factors challenges faced by RPs were identified by several authors, however little remedial advice was offered (CAP722 2020; Hocraffer and Nam 2016; Pratt et al. 2009; Landman et al. 2017; CAP737 2016; Martin et al. 2016).

There appeared to be a void of articles on UAS alerting systems, with only one relating to ATTI mode (McCarthy and Teo 2017). Whilst UAS design taxonomies were presented, alerting systems were not explicitly defined (Hobbs and Lyall 2016; ICAO 328 2011; Demir et al. 2015). The only conclusions available are drawn from manned aviation (Berson et al. 1981; Stokes and Wickens 1988; CAA 2013): warning systems must be trustworthy, reliable, unambiguous and support decision making (Noyes et al. 2016); user documentation must also be considered as part of the design process. Authors argue that such consideration may reduce startle and surprise, thereby helping to minimize confusion and ambiguity (Patterson 1990; Stanton and Edworthy 2018; Landman et al. 2017; Martin et al. 2016; CAP737 2016).

Based on all the information presented, it is possible to link Fitts’s (1947) vision of human factors research to the future. It is suggested that human factors research should be conducted in an area specific to RPs. This would support Özyörük’s (2020) observations on greater depth and focus of research. The lack of support for VLOS alerting systems implies that value might be added to the field by conducting a suitable investigation. As the field of alerting systems is such a broad area, evidenced by literature, a specific aspect must be targeted (Fitts 1947). Therefore, because warnings form the most significant level of criticality (Berson et al. 1981), an investigation therein is recommended.

3 Research Outline

3.1 Aim

This study aimed to investigate if warning systems for VLOS RPs could be improved for unexpected ATTI mode reversion events. This aim intends to address human factors challenges highlighted in CAP722 (2020) regarding ambiguous mode changes and RP confusion through a human-in-the-loop experiment that evaluates a novel Verbal ATTI Warning System (VAWS) versus the extant, passive system commonly found in small VLOS UAS. Three maneuver type reversions would be assessed: dynamic (aircraft in motion), static (aircraft in hover) and automated (Return-to-Home).



3.2 Research Questions

The research questions (RQ) may be described as:

• RQ1: Is there a significant difference in RP reaction times to unexpected mode changes between a passive system and VAWS?
• RQ2: Is reaction time to reversion significantly affected by maneuver?
• RQ3: Do RPs have a preference of warning system?
• RQ4: What suggestions do RPs have for warning system design?

3.3 Hypothesis

It is possible to state the following hypotheses for investigation, which relate only to RQs 1 and 2:

• Ho1: A VAWS will not significantly affect RP reaction time versus the passive system
• Ho2: Maneuver type flown will not significantly affect RP reaction time
• Ho3: There will be no significant differences in RP reaction time due to the interaction of the VAWS and maneuver type flown versus the passive system
• Ha1: RP reaction time will be improved by VAWS versus the passive system
• Ha2: Maneuver type flown will have a significant effect on RP reaction times
• Ha3: A significant difference exists in RP reaction time due to the interaction of VAWS and maneuver type flown versus the passive system

4 Method

4.1 Participant Selection

The UAS selected for the experiment required a flying site clear of built-up environments to satisfy current regulations (CAP722 2020). This afforded an opportunity to allow any RP experience level to participate. However, the baseline of experience and assumed knowledge across participants may not be even. Such a comparison was viewed as essential for what appears to be primary work in the area. An A2 Certificate of Competence (A2CofC) furnishes RPs with a baseline theory level, but practical flying assumptions could still not be satisfied (CAP722 2020). The General VLOS Certificate (GVC) builds on an A2CofC by the inclusion of flight training and testing. Under this qualification, a Civil Aviation Authority approved baseline of both theory and practical skill could be established. Although this discriminates the sample population, it reduces between-participant variation and increases experimental reliability (Coolican 2019). It was therefore decided that a GVC should constitute the minimum qualification level for participants in this experiment. Alumni of SimsUAV, a GVC training provider, were contacted to assist. Five participants were selected who met the requirements of holding a GVC and were able to attend during the data collection window.



4.2 Apparatus

According to the AAIB (2020), the manufacturer DJI accrued 56% of their reported UAS events. As such, DJI was selected as a suitable manufacturer by which to base research. The DJI Phantom 3 Advanced (P3A) is a small UAS with a maximum take-off weight of 1.28 kg. The maximum altitude is limited by user interface (UI) software to 120 m (393 ft) above the take-off position. Primary navigation is via GPS, with an ATTI sub-mode available. This sub-mode can be selected by RPs prior to or during flight. In the event of a GPS failure, reversion to ATTI mode occurs regardless of switch position. The transmitter unit is a traditional twin-stick, two-axis design. Key to this experiment is a Flight Mode switch that commands different levels of navigation function. This is located at the top left of the unit when held in the flying position (DJI 2017).

A UI is required via a USB-A connection from the transmitter to a third-party smart device which must be capable of running the DJI Go app. Due to the age of the vehicle, this limits smart devices to those using an Android operating system. The “P3A App for Android Devices” can only be downloaded from the DJI support web pages (DJI, n.d.). The P3A can be flown without the UI connected, but it is not recommended. The transmitter was operated in Mode 2, where the left stick controls altitude and yaw.

UAS Adaptation. Minimizing participant expectation bias required a unique solution with an adapted transmitter. Global Drone Training (Norwich) were contacted for assistance. A bespoke solution added a wired, 1.6 m remote to a secondary switch, encased in a 3D-printed guard. This deactivated the switch housed in the transmitter. Participants were told that the remote was a device to help with academic data collection. The VAWS constituted the author stating “ATTI MODE, ATTI MODE…” from operation of the switch, until a positive control statement had been annunciated.

4.3 Procedure

Questionnaire. A counterbalanced Likert-type questionnaire was used to gather demographics and qualitative data. A five-point scale was used to represent the following sentiment: 1 – Strongly disagree, to 5 – Strongly agree. Sections relating to flight contained the same six questions to allow direct pre/post comparison. Numerical data was aggregated and assessed using the median value (Robson and McCartan 2016). The pre-flight survey was used to gather RP opinion based on their existing awareness of un-commanded ATTI reversion. The post-flight survey related to opinion of VAWS.

Pre-Flight. Participants were asked to complete the pre-flight survey with an Apple iPad and then received a scripted flight sequence briefing. They were informed one of three profiles had been allocated: “Biggin”, “Lambourne” or “Ockham”. This was used as a tool to reduce any unexpected participant interaction. In fact, only one flight sequence was used, to support experimental validity (Coolican 2019). During this brief, familiarity with the P3A was established and any questions were answered.



Participants were then moved forwards to the flying area. The P3A was in a ready-to-fly state. Site markers were laid in the shape of a diamond and numbered clockwise from 1 to 4. Each point was approximately 25 m away from the centre, the centre point marker being 30 m forward of the take-off location.

Flight Procedure. Flying accuracy was not being assessed as part of the experiment, a fact deliberately omitted from participant briefings in support of experimental awareness mitigations. RPs were asked to take off and conduct flight control checks, followed by a familiarisation flight in the centre of the diamond. Six tasks followed which encompassed static, dynamic and RTH aspects. Each participant received the same task order. The experimental sequence concluded with both RTH events. A soak time of 7 s was used under RTH to enable both vehicle movement and participant situation assessment prior to any IV change. On completion of the second RTH event, participants were told the experiment had concluded and they could either land under RTH or manually.

Post-Flight. The post-flight survey and de-brief followed. Participants were thanked and the real reason for their visit was explained. A short outline of the theory was presented, along with an explanation of how they had helped.

4.4 Experimental Design and Analysis

A mixed-methods, 2×3 repeated measures design was employed. Based on the hypotheses, reaction time in seconds was selected as the dependent variable (DV), with warning system type as an independent variable (IV). This allows RQ1 and RQ2 to be answered. RQ2 also required the warning system to be assessed during different flight maneuvers; therefore static (hover), dynamic (in motion) and automatic (RTH) were selected. Reaction time was defined as the time taken from switch selection to an RP positive statement of control, using the phrase “I have control”.

IV Description. No approved or comprehensive list of alerts for the P3A exists, despite extensive research. However, if a real ATTI reversion is encountered, the UI will provide limited assistance. One verbal iteration of “ATTI MODE” is annunciated and the on-screen status bar changes color. “Ready to Go (GPS)” in green is replaced by “No Positioning (Atti)” in amber. A status bar UAS icon also changes from P-GPS to ATTI. Such an event occurred during a practice flight for the experiment on 21st June 2022, as witnessed by an observer (recorded in field notes). Volume of the verbal warning is dependent on that of the interface device setting. No RP input is required to either acknowledge or cancel the warning, contrary to advice from Berson et al. (1981).

A transmitter switch provides the option of deliberately selecting ATTI mode. When the flight mode switch is changed from GPS to ATTI, no verbal warning is issued. Without suitable DJI documentation to refer to, an assumption must be made: as this selection during normal usage forms conscious RP behavior, no verbal alert is issued. However, a textual alert is displayed indicating flight mode switch position. It is therefore argued the UI provides only passive information that requires active RP monitoring. The switch was modified to enable remote operation, mitigating participant awareness bias.



The VAWS provides a constant verbal annunciation of “ATTI MODE, ATTI MODE…” until RP intervention occurs. It is issued from the point of switch selection and assumes the RP does not know about the change of state. This is an “active” alert that seeks to address the design suggestions posed by Noyes et al. (2016) and Stokes and Wickens (1988). Furthermore, it attempts to address monitoring issues presented by the CAA (2013).

Normality. In five out of six conditions, the data meet parametric assumptions, but a Shapiro-Wilk test of Passive Dynamic reaction time shows a non-normal distribution. However, Analysis of Variance (ANOVA) is robust to variations of normality (Blanca et al. 2017; Schmider et al. 2010). Therefore, since only one data set fails to meet parametric assumptions, and Mauchly’s Test of Sphericity was not significant, a parametric ANOVA was carried out.
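For illustration only, the normality screening and the 2×3 repeated measures ANOVA could be reproduced along the following lines. The sketch below uses Python with pandas, SciPy and statsmodels on the reaction times reported in Table 1; the long-format layout, column names and choice of libraries are assumptions for illustration, not the analysis software actually used in the study.

# Minimal analysis sketch (assumed tooling: pandas, SciPy, statsmodels) using the
# reaction times reported in Table 1; not the software actually used in the study.
import pandas as pd
from scipy.stats import shapiro
from statsmodels.stats.anova import AnovaRM

rt = {  # seconds per participant P1..P5 for each (warning, maneuver) condition
    ("Passive", "Static"):  [3.06, 30.00, 9.94, 10.00, 13.17],
    ("Passive", "Dynamic"): [6.64, 6.46, 5.07, 7.19, 24.72],
    ("Passive", "RTH"):     [5.03, 4.71, 20.80, 7.98, 48.49],
    ("Verbal",  "Static"):  [1.53, 4.07, 4.33, 4.51, 6.72],
    ("Verbal",  "Dynamic"): [1.80, 3.36, 3.21, 3.32, 5.36],
    ("Verbal",  "RTH"):     [1.57, 1.53, 3.01, 4.13, 4.75],
}
df = pd.DataFrame([
    {"participant": p + 1, "warning": w, "maneuver": m, "rt": v}
    for (w, m), values in rt.items()
    for p, v in enumerate(values)
])

# Normality screening: Shapiro-Wilk test per condition (n = 5 in each cell).
for (w, m), cell in df.groupby(["warning", "maneuver"]):
    stat, p = shapiro(cell["rt"])
    print(f"{w}/{m}: W = {stat:.3f}, p = {p:.3f}")

# Two-way (2 x 3) repeated measures ANOVA: warning type x maneuver, within subjects.
result = AnovaRM(df, depvar="rt", subject="participant",
                 within=["warning", "maneuver"]).fit()
print(result.anova_table)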

5 Results

Participant reaction times (rt) were measured from switch selection to a positive statement of control (“I have control”). RTH saw the widest individual spread (rt range = 43.74 s; Passive = 48.49, Verbal = 4.75), with static experiencing the smallest (rt range = 1.53 s; Passive = 3.06, Verbal = 1.53). Table 1 depicts the recorded reaction times in seconds, with the mean and standard deviation.

Table 1. Participant reaction time (rt) in seconds

Participant   Passive Static   Passive Dynamic   Passive RTH   Verbal Static   Verbal Dynamic   Verbal RTH
1             3.06             6.64              5.03          1.53            1.80             1.57
2             30.00            6.46              4.71          4.07            3.36             1.53
3             9.94             5.07              20.80         4.33            3.21             3.01
4             10.00            7.19              7.98          4.51            3.32             4.13
5             13.17            24.72             48.49         6.72            5.36             4.75
Mean          13.23            10.02             17.40         4.23            3.41             3.00
SD            10.07            8.26              18.58         1.84            1.27             1.46

Figure 1 presents the results in graphical format, demonstrating Robson and McCartan’s (2016) assertion on the practical significance of effect size, in this case via error bars. A two-way repeated measures ANOVA showed that reaction time differed significantly between warning types, with a large effect (F(1, 4) = 7.9, p = 0.048, ηp² = 0.66). Post hoc tests using the Bonferroni correction revealed that reaction time improved by an average of 10 s when using the verbal system (p = 0.048). No significant result was discovered for maneuver flown, or for the interaction between warning type and maneuver. A medium, tending to large, effect was present in both conditions (Maneuver ηp² = 0.104, Interaction ηp² = 0.132).

Fig. 1. Graphical representation of quantitative data showing standard error bars (mean reaction time in seconds by maneuver: Passive Warning 13.23 / 10.02 / 17.40; Verbal Warning 4.23 / 3.41 / 3.00 for Static / Dynamic / RTH)

The results indicate that Ho1 can be rejected, as the VAWS significantly affects RP reaction time. Furthermore, Ha1 can be accepted because a mean 10 s improvement to reaction time has been demonstrated. Ho2 and Ho3 cannot be rejected as a significant result was not discovered (p > 0.05).

5.1 Qualitative Results

Table 2 describes the counterbalanced questions and responses for both pre- and post-flight (demographics omitted), using the median score. The result for Q2 appears incongruent by comparison with other data, indicating disagreement that the VAWS is obvious and easy to interpret. Investigation yielded no obvious reason for this and it is therefore considered an outlier. Table 3 highlights selected quotes from the narrative questions. Qualitative data revealed that RPs agree the status quo is not obvious or easy to interpret (RQ3). They found the verbal system helped them to maintain VLOS, required less interpretation and aided decision making versus the passive warning (RQ3). Participants also suggested that a combination of haptic and verbal alerts would be useful (RQ4).
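As a side note, the reported partial eta squared values follow directly from each F ratio and its degrees of freedom; a minimal sketch of that conversion is shown below (the helper function name is illustrative only, not from the paper).

# Partial eta squared recovered from an F ratio and its degrees of freedom
# (illustrative helper, not taken from the paper): F*df1 / (F*df1 + df2).
def partial_eta_squared(f: float, df_num: int, df_den: int) -> float:
    return (f * df_num) / (f * df_num + df_den)

print(round(partial_eta_squared(7.9, 1, 4), 3))  # ~0.664 for the warning-type effect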



Table 2. Pre- and Post-Flight Questionnaire Responses (Demographics Omitted; median scores)

1. It is easy to recognise when GPS to ATTI reversion occurs: Pre-Flight 2, Post-Flight 3
2. The warning system for GPS to ATTI reversion is obvious and easy to interpret: Pre-Flight 3, Post-Flight 2
3. I need to look at the control screen to work out what has happened: Pre-Flight 4, Post-Flight 2
4. GPS to ATTI reversion warnings could be improved: Pre-Flight 5, Post-Flight 5
5. I don’t need to look away from my UAS to confirm a mode change: Pre-Flight 2, Post-Flight 4
6. I would find it difficult to work out if my UAS had changed to ATTI mode: Pre-Flight 4, Post-Flight 2
7. Did the aural alerting system enhance your awareness of ATTI reversion? Pre-Flight N/A, Post-Flight 5

Notes. Q7 post-flight only

6 Discussion

The aim of this study was to investigate if UAS VLOS warning systems could be improved. The specific pathway was that of an unexpected GPS to ATTI mode change. RP reaction times were assessed using two systems (verbal vs non-verbal) and three flight phases (static, dynamic, RTH). Four research questions were framed; two quantitative and two qualitative.

One significant result was discovered (RQ1): the VAWS improved RP reaction time significantly (p = 0.048). It was also found that a large effect size was associated therewith (ηp² = 0.664). Furthermore, although Ho2 and Ho3 were not rejected, their associated effect sizes were medium to large (Maneuver ηp² = 0.105, Interaction ηp² = 0.132). The effect is best represented by the change in standard error, as shown in Fig. 1. Similar variations were observed in results for maneuver and warning/maneuver interaction data. It is posited that effect size explains the change in reaction times, despite a non-significant result. Such an effect size also suggests that whatever happened is worthy of further investigation. Coolican (2019) alludes to the importance of this strategy as a means of avoiding Type II error: just because something is not statistically significant does not necessarily mean there is nothing to investigate.


Table 3. Narrative Selected Quotes

Q8. With regard to Q7’s answer, how or why?
1) It was clear, unambiguous and immediate
2) Made it obvious without needing to look at the drone or screen
3) Takes the guesswork away

Q9. This project used a verbal warning, however other warning systems are available. An example is a haptic system, which uses technology to stimulate the senses of touch and motion. Other methods of presenting information also exist, such as Head Up Displays (HUDs). What methods of communicating warnings and cautions to VLOS RPs do you think would work well? If possible, please explain why.
1) Sometimes the aural could be missed in noisy environments such as building sites and airports
2) I think aural is best combined with haptic so you can concentrate on flying the drone
3) I would see a combination of audio and haptic as the ideal solution

Q10. Do you have any suggestions or feedback relating to this project?
1) Useful for instructors to simulate mode reversions while training
2) Would like to have done it in a strong wind! Really enjoyable and interesting
3) A far clearer, definitive mode update is required, and especially so for inexperienced pilots

Separate arguments proposed about RP human factors (Hobbs 2018; Lim et al. 2018) and alert design (Noyes et al. 2016; Berson et al. 1981) appear to be supported by these results; specifically, that by aligning warnings with manned aviation, RPs become more engaged with their aircraft. Furthermore, the evidence supports the AAIB’s (2022) assertion that lessons learned within manned aviation are not being translated to the unmanned environment; in this case, an inadequate alerting system. Stokes and Wickens (1988) highlight the importance of reducing visual distractions where object fixation is desirable. Additionally, they support a notion that safety critical conditions require minimal task distraction and an unambiguous, easy to interpret system. Under VLOS conditions where an unexpected mode change occurs, the data suggests VAWS meets these requirements.

Interaction between warning type and maneuver yielded no significant result, but another sizable effect was present (ηp² = 0.132). Larger than that of maneuver alone, an implication can be drawn that an interaction effect may exist. Convincing arguments can be formed from analyzing Passive vs Verbal RTH. A fully automated maneuver puts VLOS RPs in a situation where the primary reference for normality is observation of the vehicle and UI alone; the RP is not manipulating the controls. It is argued this exacerbates RP human factors described by Hobbs (2016, 2018), McCarthy and Teo (2017) and Lim et al. (2018). Furthermore, the problems associated with reliance on automation are documented in CAP737 (2016). It is argued the VLOS monitoring task becomes difficult because an out-of-the-loop (OOTL) condition is presented. This situation may be compounded by an inadequate alert system. The disconnect between human and machine requires greater processing time should an ambiguous situation occur. This argument is supported by study data and Endsley and Kiris (1995).

Whilst it is not possible to draw a definitive conclusion (p > 0.05), the data trend suggests effect size represents reality. Ambiguity appears to have been removed during VAWS RTH. This is supported by qualitative data where all participants agreed that VAWS enhanced awareness of an unexpected mode change. The lack of significant interaction may also support the notion of warning trust and reliability (Berson et al. 1981). This warning system does not rely on interactions to have a large practical effect. Furthermore, the ANOVA results might indicate a novel tool exists: the Alert System Assessment Tool (ASAT). This will be presented as part of the conclusion.

All maneuvers in this study were designed to be flown as distraction from an IV change. As a consequence, ATTI flying, or its accuracy, was not under investigation; merely the reaction time to an IV change. This fact was deliberately omitted from participant briefings to mitigate various biases. No interaction result was found for warning type and maneuver, but a medium (to large) effect size was (ηp² = 0.105). There are a number of reasons to explain the lack of significance. This study had a small number of participants (n = 5), which may have prevented significant data from being discovered. The study’s main hypothesis was Ho1, examining the warning system in general rather than how it relates to maneuvers. As a result, consideration was given to the flight profile only in terms of achieving an IV change. Mimicking a real task may have yielded different results and should be considered as a variation should this study be repeated. Flight time durations also had to be considered. A test flight revealed that under ambient temperature conditions (29 °C), the battery planning-life should be 15 min. The combination of maneuver considerations may have created a single confounding factor: task reality.

Results from McCarthy and Teo (2017) imply that certain maneuver types are less likely to generate significant results than others. Whilst their study investigated ATTI mode from a training and testing perspective, the findings provide useful guidance. All dynamic maneuvers flown during this sequence were straight-line, route based, including a climb or descent. McCarthy and Teo (2017) imply that this is an ineffective means of assessing ATTI flying. It could therefore be argued a non-significant maneuver result was to be expected, as the experiment was not necessarily looking to see if one existed. A medium to large effect size (ηp² = 0.105) yields some interesting considerations. This suggests that despite a non-significant result, there might be something of practical benefit. Perhaps a larger group of participants would afford more robust data, or perhaps a different experimental design. There is a further possibility worthy of consideration. It is posited the lack of significance may indicate maneuver is independent of the alerting system, whilst the effect size indicates if a practical application exists. Further investigation is required to ascertain if this is the case.



6.1 Contribution to UAS Human Factors

It is believed this investigation is the first of its kind relating to verbal alerting systems and VLOS operations. Therefore, this work takes a small step to fill the knowledge gap identified. It brings together common human factors issues applicable to both manned and unmanned aviation. Furthermore, it begins to challenge and offer suggestions for some of the unique human factors faced by RPs (Hobbs and Lyall 2016; Hobbs 2018; McCarthy and Teo 2017; Özyörük 2020). As such, this investigation might provide a baseline for any subsequent research.

The data appears to support theories in literature about warning presentation. For example, when object fixation is desired, verbal warnings should be used. Furthermore, reaction times to verbal warnings may be faster than those where interpretation is required (Stokes and Wickens 1988). Noyes et al.’s (2016) observations on warning system design also appear to be supported by this research. RP suggestions of a haptic and verbal alert combination would appear to support the conclusions of Hocraffer and Nam (2016), who posited that research into multiple types of RP feedback would be beneficial.

6.2 Study Limitations

Using RPs with specific qualifications limited the number of available participants, which can be viewed as both a strength and a weakness. As a strength, it provides an academically comparative baseline. The weakness lies in discriminating against participants that would otherwise have been able to take part. It also likely contributed to the non-significant results. It is suggested that future studies would, in fact, be able to use RPs of any experience level. This is because flying accuracy itself was not being investigated. If maneuver type was to become a valid part of the investigation (i.e. not used as distraction), McCarthy and Teo (2017) provide guidance on choice.

Whilst UASs will revert to ATTI on loss of GPS signal, not every UAS has an option to select ATTI independently. Replication of this study therefore requires consideration not only of transmitter adaptation, but also of making sure the vehicle itself is capable. This limits the number of aircraft available.

Battery life is subject to ambient temperatures, and extremes will affect working capacity. These extremes can be ascertained from manufacturer manuals; −10 °C to +40 °C in this case (DJI 2017). During an apparatus practice flight in an ambient 29 °C, it was discovered that a fully charged battery was only capable of 13 min of flight before a battery alert occurred (30% remaining). An original plan of two participants per battery was amended as a result. Two additional batteries were sourced prior to data capture, allowing each participant their own battery. As the flight sequence lasted c.7 min, in the event of a battery failure, a contingency existed in use of another battery. This contingency plan was used on one occasion. If it is not possible to allocate each participant a battery, flying sites would benefit from access to suitable charging facilities.



6.3 Future Research

ASAT has been identified as a possible method to assess any alerting system. It should be noted that significant theoretical development of this research is required, along with its practical application. Different systems should be assessed with alternative metrics to reaction time. For example, a deviation alert system using meters or feet, in support of McCarthy and Teo (2017). Alternatively, eye tracking data might be used to differentiate between alert formats. In doing so the aim would be to either prove or reject ASAT as a concept.

This study was not concerned with specific automation or OOTL conditions. However, data for RTH suggests that OOTL behaviors exist. Reasons for this were unclear from the study, therefore another research area is identified. The extent to which VLOS RPs encounter and recover from OOTL conditions is unknown. Human factors research into VLOS automation usage and OOTL is therefore recommended. Endsley and Kiris (1995) are likely to provide useful direction for this suggestion.

7 Conclusion

Unexpected or ambiguous mode changes are a source of RP confusion (CAP737 2016). Such a condition exists when an unexpected change from GPS to ATTI mode occurs. This exploratory investigation was an empirical study using a DJI Phantom 3 Advanced. It studied the reaction time of RPs to an unexpected ATTI mode change using two different systems. A questionnaire was employed to gain greater insight into the quantitative data. The quantitative results showed that RP reaction time was improved with a verbal versus non-verbal alternative. A two-way repeated measures ANOVA indicated reaction time differed significantly between warning types, with a large effect (F(1, 4) = 7.9, p = 0.048, ηp² = 0.66). Qualitative data suggested a RP preference for the VAWS and gathered operator thoughts on UAS alerting systems. VLOS RPs suggest a combined haptic and verbal alerting system for further investigation. The concept of ASAT was introduced, alongside suggesting OOTL as an area of future research.

References

Air Accident Investigation Branch (AAIB): AAIB investigation to DJI M600 Pro (UAS, registration n/a) 131219, 25 June 2020. https://www.gov.uk/aaib-reports/aaib-investigation-to-dji-m600-pro-uas-registration-n-a-131219
Air Accident Investigation Branch (AAIB): Annual safety review 2021, 14 June 2022. https://www.gov.uk/aaib-reports/annual-safety-review-2021
Berson, B.L., et al.: Aircraft alerting systems standardization study. Volume II: Aircraft alerting systems design guidelines. FAA (1981). https://ntrl.ntis.gov/NTRL/dashboard/searchResults/titleDetail/ADA106732.xhtml
Blanca, M.J., Alarcón, R., Arnau, J., Bono, R., Bendayan, R.: Non-normal data: is ANOVA still a valid option? Psicothema 29(4), 552–557 (2017). https://doi.org/10.7334/psicothema2016.383
Civil Aviation Authority (UK): CAA Paper 2013/02: Monitoring matters - guidance on the development of pilot monitoring skills (2013). https://publicapps.caa.co.uk/modalapplication.aspx?appid=11&mode=detail&id=5447
Civil Aviation Authority (UK): CAP 737: Flightcrew human factors handbook (2016). https://publicapps.caa.co.uk/modalapplication.aspx?appid=11&mode=detail&id=6480
Civil Aviation Authority (UK): CAP 722: Unmanned aircraft system operations in UK airspace - guidance (2020). https://publicapps.caa.co.uk/modalapplication.aspx?appid=11&mode=detail&id=415
Coolican, H.: Research Methods and Statistics in Psychology, 7th edn. Psychology Press (2019)
Demir, K.A., Cicibas, H., Arica, N.: Unmanned aerial vehicle domain: areas of research. Defence Sci. J. 65(4), 319 (2015). https://doi.org/10.14429/dsj.65.8631
DJI: Phantom 3 Advanced user manual (v1.8) (2017). https://www.dji.com/uk/downloads/products/phantom-3-adv
DJI: Download center: DJI GO for Android devices (V 3.1.72). DJI Official (n.d.). https://www.dji.com/uk/downloads/djiapp/dji-go-3
Doroftei, D., De Cubber, G., De Smet, H.: Reducing drone incidents by incorporating human factors in the drone and drone pilot accreditation process. In: Zallio, M. (ed.) AISC, vol. 1210, pp. 71–77. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-51758-8_10
Doumit, J.A.: From Drones to Geospatial Data. KubGU, Krasnodar (2018). Courtesy of assistant professor Steve Scott
Dunn, M.J., Molesworth, B.R., Koo, T., Lodewijks, G.: Effects of auditory and visual feedback on remote pilot manual flying performance. Ergonomics 63(11), 1380–1393 (2020). https://doi.org/10.1080/00140139.2020.1792561
Endsley, M.R., Jones, D.G.: Designing for Situation Awareness: An Approach to User-Centered Design, 2nd edn. CRC Press (2011)
Endsley, M.R., Kiris, E.O.: The out-of-the-loop performance problem and level of control in automation. Hum. Factors 37(2), 381–394 (1995). https://doi.org/10.1518/001872095779064555
Federal Aviation Administration (FAA): Managing distractions (2020). https://www.faa.gov/news/safety_briefing/2020/media/SE_Topic_20-01_Distractions.pdf
Harris, D.: Human Performance on the Flight Deck. CRC Press (2011)
Hobbs, A.: Remotely piloted aircraft. In: Landry, S.J. (ed.) Handbook of Human Factors in Air Transportation Systems, pp. 379–395. CRC Press (2018)
Hobbs, A., Lyall, B.: Human factors guidelines for unmanned aircraft systems. Ergon. Des. Q. Hum. Factors Appl. 24(3), 23–28 (2016). https://doi.org/10.1177/1064804616640632
Hocraffer, A., Nam, C.S.: A meta-analysis of human-system interfaces in unmanned aerial vehicle (UAV) swarm management. Appl. Ergon. 58, 66–80 (2016). https://doi.org/10.1016/j.apergo.2016.05.011
In-depth guide for DJI’s P-mode, S-mode, and ATTI mode. Let Us Drone, 15 January 2019. https://www.letusdrone.com/in-depth-guide-for-djis-p-mode-s-mode-and-atti-mode/. Accessed 14 Aug 2022
International Civil Aviation Organisation (ICAO): Unmanned aircraft systems (UAS) (AN/190) (2011). https://www.icao.int/Meetings/UAS/Pages/UAS_Documents.aspx
Landman, A., Groen, E.L., van Paassen, M.M., Bronkhorst, A.W., Mulder, M.: Dealing with unexpected events on the flight deck: a conceptual model of startle and surprise. Hum. Factors 59(8), 1161–1172 (2017). https://doi.org/10.1177/0018720817723428
Li, K.W., Jia, H., Peng, L., Gan, L.: Line-of-sight in operating a small unmanned aerial vehicle: how far can a quadcopter fly in line-of-sight? Appl. Ergon. 81, 102898 (2019). https://doi.org/10.1016/j.apergo.2019.102898
Li, K.W., Sun, C., Li, N.: Distance and visual angle of line-of-sight of a small drone. Appl. Sci. 10(16), 5501 (2020). https://doi.org/10.3390/app10165501
Lim, Y., et al.: Avionics human-machine interfaces and interactions for manned and unmanned aircraft. Prog. Aerosp. Sci. 102, 1–46 (2018). https://doi.org/10.1016/j.paerosci.2018.05.002
Martin, W.L., Murray, P.S., Bates, P.R., Lee, P.S.: A flight simulator study of the impairment effects of startle on pilots during unexpected critical events. Aviat. Psychol. Appl. Hum. Factors 6(1), 24–32 (2016). https://doi.org/10.1027/2192-0923/a000092
McCarthy, P., Teo, G.K.: Assessing human-computer interaction of operating remotely piloted aircraft systems (RPAS) in attitude (ATTI) mode. In: Harris, D. (ed.) Engineering Psychology and Cognitive Ergonomics: Cognition and Design. LNCS (LNAI), vol. 10276, pp. 251–265. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-58475-1_19
Noyes, J.M., Starr, A.F., Kazem, M.L.: Warning system design in civil aircraft. In: Harris, D. (ed.) Human Factors for Civil Flight Deck Design, pp. 141–155. Routledge (2016)
Özyörük, H.E.: Systematic analysis and classification of the literature regarding the impact of human factors on unmanned aerial vehicles (UAV). J. Aviat. 4(2), 71–81 (2020). https://doi.org/10.30518/jav.77483
Patel, V.L., et al.: Recovery at the edge of error: debunking the myth of the infallible expert. J. Biomed. Inform. 44(3), 413–424 (2010). https://doi.org/10.1016/j.jbi.2010.09.005
Patterson, R.D.: Auditory warning sounds in the work environment. In: Broadbent, D.E., Reason, J., Baddeley, A. (eds.) Human Factors in Hazardous Situations: Proceedings of a Royal Society Discussion Meeting held on 28 and 29 June 1989, pp. 37–44. Oxford University Press (1990). https://doi.org/10.1093/acprof:oso/9780198521914.003.0004
Pratt, K.S., Murphy, R., Stover, S., Griffin, C.: CONOPS and autonomy recommendations for VTOL small unmanned aerial system based on Hurricane Katrina operations. J. Field Robot. 26(8), 636–650 (2009). https://doi.org/10.1002/rob.20304
Reason, J., Hobbs, A.: Managing Maintenance Error: A Practical Guide. CRC Press (2017). https://doi.org/10.1201/9781315249926
Robson, C., McCartan, K.: Real World Research. Wiley Global Education (2016)
Schmider, E., Ziegler, M., Danay, E., Beyer, L., Bühner, M.: Is it really robust? Methodology 6(4), 147–151 (2010). https://doi.org/10.1027/1614-2241/a000016
Stanton, N.A., Edworthy, J.: Human Factors in Auditory Warnings. Routledge (2018)
Stokes, A.F., Wickens, C.D.: Aviation displays. In: Wiener, E.L., Nagel, D.C. (eds.) Human Factors in Aviation, pp. 387–431. Academic Press (1988)
Zhang, X., Jia, G., Chen, Z.: The literature review of human factors research on unmanned aerial vehicle – what Chinese researcher need to do next? In: Rau, P.-L. (ed.) Cross-Cultural Design. Methods, Tools, and Users. LNCS, vol. 10911, pp. 375–384. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-92141-9_29

Spatial Learning in a Virtual Reality Boeing 737NG Flight Deck

Leighton Carr1(B), Guy Wallis2, Nathan Perry1, and Stephan Riek3

1 The Boeing Company, Brisbane, QLD 4000, Australia
[email protected]
2 The University of Queensland, Brisbane, QLD 4072, Australia
3 University of the Sunshine Coast, Sippy Downs, QLD 4556, Australia

Abstract. The use of Virtual Reality (VR) in training rests on the assumption that perceptual and cognitive knowledge transfers to the real world. We conducted a repeated measures experiment to evaluate the transfer of spatial knowledge from three different environments (VR, an industry standard flat panel training device, and a physical flight deck) to a real flight deck. Participants had no previous flight deck experience. We used a time-limited endogenous search task to develop the spatial knowledge and provide an objective performance measure. We found a strong learning transfer effect for all groups, and no significant between group effects. These preliminary findings illustrate the possible utility of VR in aviation flight training, but further work is needed. Keywords: Aviation Flight Training · Virtual Reality · Spatial Perception

1 Introduction

Commercial aviation training relies on Full Flight Simulators (FFSs) and Flight Training Devices (FTDs) to expose students to high-fidelity facsimiles of a real airplane during training. An FFS is a full-sized replica of a specific aircraft flight deck which can represent the airplane in ground or flight operations and includes an out of flight deck visual system and a force cuing motion system. An FTD is a part task trainer which only represents the airplane partially and is not required to have a force motion or visual system [1]. These devices must be approved by the appropriate regulator to be used, and must therefore meet the associated standard [2]. In the case of FTDs, the regulatory approval will only be given for a subset of flight training tasks that the device was designed to be suitable for.

FFSs are notoriously expensive to purchase and maintain. FTDs, while still expensive, are more affordable, but their use is limited because they are usually made from flat touch screens (Fig. 3). Recent advances in Virtual Reality (VR) technology have created an opportunity to immerse student pilots in a photorealistic virtual environment using commercial computing hardware costing a fraction of FTD systems. While promising, regulatory approval of VR based training devices is limited [3], in large part because little is known about the overall effectiveness of VR in flight training.



2 Background

The goal of flight training is to establish knowledge and behaviours that transfer to flight operations of a real aircraft. This training includes a wide range of cognitive, motor, and social skills [4]. In the commercial aviation sector, due to the limitations in using real aircraft for training, pilots are evaluated on their performance in FFSs. FFSs are deemed suitable for this role due to their well-established record of supporting effective transfer of learning to real aircraft [5, 6]. Training in lower-level devices such as FTDs must transfer to an FFS, because FTD training typically occurs first. Much of the aviation training transfer literature is focused on these quasi-transfer of training studies [7].

Since the advent of personal computers (PCs) and PC-based flight simulators, there has been research on their effectiveness in flight training. There are many quasi-transfer studies comparing FTDs to personal computer-based aviation training devices (PCATDs), generally finding that FTDs did not perform better than PCATDs [8–10]. These studies however focus on pilot skills that are initially learned in smaller aircraft, such as manual handling of flight manoeuvres. Literature on PC-based simulators for commercial flight training is sparse, probably due to the stricter regulatory controls and difficulty accessing regulated devices for comparison. VR flight simulator research is similarly sparse [11–13]. For commercial aviation training, it is also focused on smaller aircraft, although some papers exist on the fundamentals of flight deck interaction and system development for commercial aircraft [14–16]. Research in VR commercial flight decks is challenging due to the lack of commercially available performant virtual environments. Low performance VR systems can induce significant simulator sickness, which is compounded by the complexity of flight decks and the need for high resolution to perceive the fine text on flight-deck controls [17].

VR is widely researched in more general training contexts [18, 19], and a significant amount of VR research is devoted to evaluating aspects of the technology such as display resolution [20]. Kaplan et al. correctly note that the rate of technological innovation in VR exceeds the capability of researchers to thoroughly evaluate the technology [21, 22]. One caveat to this is that there should exist technology progress break points where the technological requirements for a specific task and environment are met, after which technology progress is subject to diminishing returns of training effectiveness relative to other efforts.

Aviation simulation and training research have not historically focused on decomposing the elements of simulators for independent evaluation. In VR research, which integrates many technologies advancing at different rates, this decomposition is more common. Slater et al. refer to Presence, the perceptual illusion that the virtual environment is real, which they break down into two components: Place Illusion (PI), which is the subjective experience created from technology that supports the natural sensorimotor contingencies of the user; and Plausibility Illusion (Psi), which is the independent illusion of the credibility of the events in the environment that respond or refer to the user [23]. These subjective qualia are created by objective characteristics of the system, which they call Immersion (for PI) and Coherence (for Psi). Skarbez et al. expand this to include a factor termed the Social Presence Illusion (SPI) [24]. Alexander et al. examined the technology developed for video games in the context of training. In contrast to the studies described above, their focus was on Simulator Fidelity, which they defined as “the extent to which the virtual environment emulates the real world” [25]. The authors go on to decompose this into Physical, Functional, and Psychological Fidelity. Skarbez et al. note that this concept of Fidelity is orthogonal to Immersion, such that a low-fidelity environment can be highly Immersive [24], and propose Fidelity as part of Coherence.

Studies have shown that visual realism contributes to Presence [26, 27]. Ragan et al. note that VR training systems should use high levels of visual realism for scanning and search tasks in visually complex environments [28]. In the context of VR flight training, PI should be created by the immersive quality of the system, and Psi should be created by meeting the expected affordances and responses of the aircraft flight deck. Interaction affordances are especially challenging with current technology due to the range of fine motor actions required to interact with flight deck switches, knobs, buttons, and dials.

Presence is still the subject of much research and there is currently no agreed-on objective method of measuring Presence. Questionnaires have long been regarded as insufficient [29], but are still widely used. Wallis et al. have proposed task-based measurements [30, 31] which have the advantage that they can be directly compared to corresponding real-world performance. Notably, many presence questionnaires do not include any questions regarding environment interaction, task affordances, or realistic responses [32]. It is possible to feel present in a new environment without any meaningful interaction, suggesting that objective measures for the complete illusion may prove elusive.

The visual aspect of the Presence illusion is governed by the changes in visual perception produced by various motor actions. O’Regan and Noë explain that “vision is a mode of exploration of the world that is mediated by knowledge of what we call sensorimotor contingencies” [28]. Visual stability may be a foundational element of the development of sensorimotor contingencies in a new environment based on actions as simple as linear or rotational movement [33]. Visual stability may also affect Presence in VR due to generated efferent copy mismatch, or due to conflicts with visuomotor relationships learned in the real world [34]. Sensorimotor contingency theory is supported widely by other action-based perception research [35–37]. Successful VR learning that includes a spatial memory element may be based on the development of novel sensorimotor contingencies through interaction with the Virtual Environment (VE), but that learning may not transfer to the real world if the VE is not sufficiently Immersive and Coherent.

Virtual Reality environments can be designed to support spatial learning by ensuring high levels of Immersion, strong visual stability, and allowing users to interact with the environment. Ragan et al. show evidence that higher levels of Immersion improve performance in a procedures memorisation task [38]. The real-world cues that influence spatial memory organisation are well known: egocentric experience, intrinsic structure (object layout), and extrinsic structure (environment structure). Spatial memory at small scales is structured according to egocentric reference systems based on views that the observer has experienced [39]. Some evidence exists that learned sensorimotor representation of objects can be rebuilt from long-term memories of their spatial location, but the relationship between long-term and sensorimotor spatial memory needs further research [35]. Spatial memory is also positively affected by searching, especially when objects exist within a known context [40].



3 Objectives

The aim of this research is to examine whether VR can be used in place of an FTD for the purposes of initial spatial learning of a Boeing 737NG flight deck. Commercial aviation students use a variety of methods for this familiarisation, such as flight deck layout posters, desktop PC software, and FTDs. These happen first in the training footprint to reduce the time spent in the FFS on flight deck familiarisation. Familiarisation is not specifically evaluated in most type rating courses, but students are expected to have memorised many procedures and be able to perform them efficiently, which requires knowing the locations of the related flight deck elements. We sought to design the experiment to maximise spatial learning based on sensorimotor contingency theory, egocentric experience, and endogenous search. We also sought to minimise the effects of non-spatial cognitive learning.

One key aim of this research is to understand whether the advantages of VR outweigh the disadvantages within the constrained environment of flight deck familiarisation. VR has a significant number of advantages over flat panel FTDs. FTDs are less accurate in the spatial distribution of flight deck elements, and the geometry of the flight deck elements presented on the 2D screens has reduced depth cues such as binocular disparity and motion parallax. In VR the spatial distribution is more accurate (but affected by Head Mounted Display (HMD) depth compression), and the geometry is presented with most of the relevant depth cues. VR systems are significantly cheaper than FTDs because they are based on consumer grade hardware. VR is also more portable, allowing students to train at home or in normal classrooms, and allows multiple aircraft to be simulated at little to no additional cost.

VR also has several challenges which are not present in FTDs. Social Presence is degraded in virtual environments compared with the real world, and many social cues are missing. VR sickness can be an issue [41], although there is much research on this topic and clear guidelines available for developing systems that reduce or prevent this. Difficulty with full body motion capture also limits the accurate presentation of the users’ limbs, although some research suggests that hand-only tracking and display is sufficient for reaching tasks [42]. Most notably, VR systems do not present complete and distortion-free distal stimuli to the user. Distortion, chromatic aberration, frame rates, and a fixed focal plane are limitations present in all modern VR systems.

Valid spatial learning should allow the transfer of knowledge of the flight deck layout from the training device to a realistic physical device such as an FFS. We aim to improve our understanding of how participants with no previous experience of a flight deck acquire this initial spatial knowledge in a variety of environments. These include degraded environments such as an FTD and VR, as well as a standard physical flight simulator, which should represent the best possible environment. We also seek to understand how well spatial learning transfers from one environment to another during this initial learning phase.

4 Participants

We recruited 45 participants (28 male, 17 female) from Boeing Australia and The University of Queensland. Participants learned about the study through a recruitment email or flier, and all volunteered to participate. Participants were screened to ensure they had no prior experience with flight decks and were divided into three groups of 15. Participant recruitment and division was limited based on access to the simulation devices and challenges recruiting participants during the COVID-19 pandemic. This task was reviewed and approved by the University of Queensland Human Research Ethics Review Committee (reference 2020002581) (Table 1).

Table 1. Participant demographics

            Control   FTD    VR
Male        6         12     10
Female      9         3      5
Mean Age    31.3      39.2   32.7

Participants completed an informed consent form and an initial demographics and experience survey once they had agreed to participate in the experiment. They also completed a survey after the initial training task and another survey at the end of the experiment. Participants were compensated with a gift card for a local retailer to the value of AU$20. Seven of the 15 VR participants required corrective glasses that could not be worn in the headset. Two participants in that group reported being unable to clearly view the panels in VR without their glasses, but had no difficulty completing the task. Participants who did not wear glasses during the first phase were asked to continue without them during the second phase. Seven of the 15 VR participants had never experienced VR before (Fig. 1).

Fig. 1. Participant group breakdown.


5 Method

5.1 Task

The experiment was broken into two phases. The first phase provided initial training and was conducted in one of three environments. The second phase involved repeating the same task, but now in a physical Boeing 737NG flight simulator. This phase provided a measure of learning transfer. The first experimental group used the physical flight simulator in both phases, acting as a measure of optimal learning transfer. The second group used a qualified flat-panel FTD during the training phase, and the final group performed the task in VR during the training phase. There was a 10-min break between phases.

Participants began the task by touching a prompt on a tablet computer (or virtual tablet for the VR group), which triggered the display of a 2D image of one of the flight deck panels. They were asked to find and touch the indicated switch as quickly as they could after seeing the image. The image was displayed for 20 s, then replaced with a circular button prompting the participant to touch it to begin the next step. An audio prompt was also sounded at the end of the 20 s to ensure that participants who were still searching the flight deck were aware the time had expired. Participants were instructed that if they had not yet found the correct switch, they should stop searching at that point and press the button to continue. They were also instructed that, in cases where they found the correct switch, they would not be able to proceed to the next card until after the 20 s had elapsed. No direct feedback was given to participants that they had touched the correct switch, and they were instructed that only the first correct touch would be counted.

Participants performed this task once per panel image, for a set of 20 unique panels, tested in pseudo-random order. Order randomness was generated by using a Fisher-Yates / Knuth shuffle [43] (seeded automatically using the current time) of the card array at the start of each experiment run, as sketched below. The set of 20 panels was the same for each participant and condition. Panel images displayed on the tablet computer were captured in the environment being evaluated and processed in normal image editing software. The areas included were the Navigation Panel, Throttle Body, Forward Panel, and Overhead Panel. The Mode Control Panel and Flight Management Computer were excluded because these were physical elements in the FTD. Successful touch times were recorded using a combination of software and video for the physical devices, and using software only for the VR condition.

Fixing the presentation time and the number of panels tested ensured each participant had approximately the same amount of exposure irrespective of training environment. This total exposure time was subject to the time taken by participants to enter the simulator, and a standard script was used to ensure consistency of the procedure to enter the simulator safely. Exposure time also varied slightly based on the time taken to press the tablet button at the start of each trial. Participants were briefed and reminded to move on to the next trial immediately once the time expired (Fig. 2).
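The following is a minimal sketch of the card-ordering step described above: a Fisher-Yates / Knuth shuffle of the card array, seeded automatically from the current time. The function name and panel identifiers are illustrative assumptions; the study's software was implemented in Unreal Engine rather than Python.

```python
import time
import random

def shuffle_cards(cards):
    """Fisher-Yates / Knuth shuffle, seeded from the current time,
    applied once to the card array at the start of an experiment run."""
    rng = random.Random(time.time())           # automatic time-based seed
    shuffled = list(cards)                     # copy so the master list is untouched
    for i in range(len(shuffled) - 1, 0, -1):  # walk backwards through the array
        j = rng.randint(0, i)                  # pick an index in [0, i]
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled

# Example: 20 unique panel cards presented in pseudo-random order
panel_cards = [f"panel_{n:02d}" for n in range(1, 21)]   # hypothetical identifiers
trial_order = shuffle_cards(panel_cards)
```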


Fig. 2. Task example: tablet start button is pressed by the participant (top-left), search image is displayed on the tablet (top-right), participant touches the correct dial (bottom-right), participant waits the remaining time (bottom-left).

5.2 Materials

Four different simulation devices were used for the experiments. The flat panel group was run at the Boeing Brisbane Flight Training Centre, using a Boeing 737NG flat panel FTD for phase 1 and a Boeing 737NG physical simulator for phase 2. The control group and the VR group were run at the Boeing Research & Technology research laboratory at the University of Queensland. The VR group used a custom Boeing 737NG VR environment for phase 1 and a Boeing 737NG physical simulator for phase 2. The control group used a 737NG physical simulator for both phase 1 and phase 2. Main displays such as the Primary Flight Display and Navigation Display were kept blank in all environments, and participants were not required to look at them. None of the simulators used had a motion system. Both fixed-base simulators were darker than their corresponding training environments, so additional 12 W white (3200 K) LED array lights were added to improve visibility and match the lighting.

For all physical devices, task commencement and image display were through a Microsoft Surface tablet PC, which also recorded the timing information for participant button presses. Video was recorded with a Nikon D7000 at full high-definition resolution (1920 × 1080) at 24 frames per second. Time was synchronised between tablet and camera by displaying a millisecond-accurate clock on the tablet and recording it with the camera at the start of each experiment.


Fig. 3. Environments: flat panel FTD (top-left), VR (top-right), physical flight simulators (bottom-left, bottom-right).

5.3 Virtual Reality System

The virtual reality system used a high-end Windows gaming PC with an AMD Ryzen 9 5900X central processing unit [44], 32 GB of RAM, and an Nvidia RTX 3090 graphics card [45]. The headset was a Varjo XR-3 [46] head mounted display using SteamVR tracking with second generation base stations [47]. The XR-3 HMD combines two displays per eye. The primary display covers the periphery of the viewable area and has 2880 × 2720 pixels over a claimed 115° horizontal field of view, giving approximately 25 pixels per degree angular resolution. Overlapping and overriding this is a focus area display in the lower centre of the user's field of view, which is 1920 × 1920 pixels over a claimed 27° horizontal field of view, giving approximately 70 pixels per degree angular resolution. Flight deck elements viewed through the primary display are clear, but not all text is readable from the standard pilot seating position. Fine text across the entire flight deck can typically be read through the focus area display by a user with normal visual acuity.

A Valve Index controller [48] was used for right-hand tracking and interaction. Interaction was limited to overlapping the end of the controller with a flight deck element; no controller button presses were required. Due to the absence of cutaneous contact feedback in the VR environment, a sensory substitution approach was taken [49, 50]: flight deck elements were highlighted in blue for all interactions, regardless of whether they were correct.


The tablet responded to touches by displaying the image card, and flight deck elements responded by highlighting blue. While the Varjo XR-3 has built-in optical hand tracking, it was not employed in this study due to a lack of data on its accuracy and latency. Participants were given time to familiarise themselves with the VR equipment in a staging area (Fig. 4) before viewing the flight deck. This was designed to help them adjust the headset for both comfort and visual clarity, and to give them an opportunity to practise the overlap interaction. The amount of time in the staging area was not limited, because this time did not contribute to spatial learning of the flight deck.

Fig. 4. VR staging area.

The virtual environment was custom-built in Unreal Engine 5.0.3 (UE5) [51] and included all experiment elements required for VR participants, including task interaction, a virtual tablet for displaying images, and data recording. To maintain a constant 90 Hz frame rate, rendering was slightly downgraded to 1720 × 1476 pixels for the peripheral area and 1824 × 1824 pixels for the focus area. An earlier version of this experiment was developed in Unreal Engine 4, but was not used due to an issue in the configuration of the virtual cameras associated with the HMD. In this earlier version the virtual cameras were located at the front of the HMD, forward of the eye point, where the pass-through cameras are located for the XR-3's mixed reality mode. This created significant unwanted lateral movement during head rotations due to the increased radius from the centre of rotation (Fig. 5).


Fig. 5. Example VR flight deck interaction.

6 Results

We initially calculated the mean touch time for each participant and performed two-sided paired sample t-tests to understand the within-group effects. We found a strong learning effect (p < 0.0005, Cohen's d > 1.05) for all three groups, indicating knowledge acquired in the learning phase was successfully transferred to the test phase. The mean touch time improvement ranged from 1.63 s for the FTD group, to 1.73 s for the VR group and 2.47 s for the control group. Only 2 participants (1 in the FTD group, 1 in the VR group) were able to find all the flight deck elements in the phase 2 trial (Table 2 and Fig. 6).

Table 2. Within group t-test results

  Group   Phase 1     Phase 2     p-val      Cohen's-d   T
  1       Physical    Physical    0.000011   1.528091    6.626817
  2       FTD         Physical    0.000042   1.07913     5.854269
  3       VR          Physical    0.000026   1.191513    6.128937
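For readers who want to reproduce this style of within-group analysis, the sketch below computes a two-sided paired-samples t-test and a paired Cohen's d from per-participant mean touch times using SciPy. The arrays are invented placeholder values, not the study data.

```python
import numpy as np
from scipy import stats

# Placeholder per-participant mean touch times in seconds (15 participants per group)
phase1 = np.array([9.1, 8.7, 10.2, 9.8, 8.9, 9.5, 10.0, 9.3, 8.6, 9.9, 9.2, 8.8, 9.7, 9.4, 10.1])
phase2 = np.array([7.0, 6.8, 7.9, 7.5, 6.9, 7.2, 7.8, 7.1, 6.5, 7.6, 7.0, 6.7, 7.4, 7.2, 7.7])

# Two-sided paired-samples t-test on the same participants across phases
t_stat, p_value = stats.ttest_rel(phase1, phase2)

# Cohen's d for paired samples: mean of the differences over their standard deviation
diff = phase1 - phase2
cohens_d = diff.mean() / diff.std(ddof=1)

print(f"t = {t_stat:.3f}, p = {p_value:.6f}, d = {cohens_d:.3f}")
```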


Fig. 6. Mean touch times for all environments and phases.

We ran a mixed design ANOVA with phase as the within-group factor and environment as the between-group factor. The analysis revealed no overall between-group effect (p = 0.14, F = 2.09, DF = 2/42, MSE = 8.21e6). When performance in the transfer phase was compared between groups, there was a marginal effect (p = 0.054, Cohen's d = 0.733) between the control (group 1) and flat panel (group 2) groups, and no significant difference between the flat panel (group 2) and VR (group 3) groups (p = 0.062, Cohen's d = 0.710), or between the control (group 1) and VR (group 3) groups (p = 0.904, Cohen's d = 0.044), indicating that flat panel and VR were similarly effective as training conditions. Despite the marginal between-group effects, we noted that the VR group had a larger effect size for training transfer than the FTD group (Fig. 7).

Fig. 7. Mixed ANOVA results.
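A rough sketch of this mixed-design analysis is shown below using the pingouin package (our choice of library; the authors do not name a tool for this analysis). Phase is the within-subjects factor and environment the between-subjects factor, followed by between-group pairwise tests on the transfer phase. All values and identifiers are placeholders.

```python
import pandas as pd
import pingouin as pg

# Long-format table of per-participant mean touch times (placeholder values only)
df = pd.DataFrame({
    "subject":     [f"s{i}" for i in range(9) for _ in range(2)],
    "phase":       ["phase1", "phase2"] * 9,
    "environment": ["Physical"] * 6 + ["FTD"] * 6 + ["VR"] * 6,
    "touch_time":  [9.5, 7.0, 9.1, 6.6, 9.8, 7.4,    # control group (Physical)
                    9.4, 7.9, 9.6, 7.8, 9.2, 7.5,    # FTD group
                    9.3, 7.2, 9.7, 7.6, 9.0, 7.1],   # VR group
})

# Mixed ANOVA: phase (within-subjects), environment (between-subjects)
aov = pg.mixed_anova(data=df, dv="touch_time", within="phase",
                     subject="subject", between="environment")
print(aov[["Source", "F", "p-unc"]])

# Between-group pairwise comparisons on transfer (phase 2) performance only
phase2 = df[df["phase"] == "phase2"]
posthoc = pg.pairwise_tests(data=phase2, dv="touch_time",
                            between="environment", effsize="cohen")
print(posthoc[["A", "B", "p-unc", "cohen"]])
```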

For the VR group, we administered the Igroup Presence Questionnaire (IPQ) [52] after their initial training phase in VR, using a 5-point scale.


This questionnaire is widely used and has been recommended by researchers who have compared it with other Presence questionnaires [27]. The IPQ provides three subscales for Spatial Presence (SP), Involvement (INV), and Experienced Realism (REAL), with a fourth non-subscale item for General Presence (GP). Cronbach's alpha estimates of reliability were used to determine the internal consistency of the results. We found poor internal consistency (α = 0.51) despite overall strong Presence scores (Fig. 8). We calculated the Pearson correlation coefficient between mean phase 1 to phase 2 touch time improvement and overall mean Presence and found a moderate correlation (r = 0.54).

Fig. 8. Igroup Presence Questionnaire results.
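The reliability and correlation figures reported above can be computed as in the sketch below, which uses pingouin's Cronbach's alpha and SciPy's Pearson correlation on an invented item matrix; the numbers are placeholders, not the collected IPQ responses, and the column names are illustrative.

```python
import pandas as pd
import pingouin as pg
from scipy import stats

# Placeholder IPQ item responses (participants x items), 5-point scale
ipq_items = pd.DataFrame({
    "GP":    [4, 5, 4, 3, 4, 5, 4, 4],
    "SP1":   [4, 4, 5, 3, 4, 4, 5, 4],
    "INV1":  [3, 4, 3, 2, 4, 3, 4, 3],
    "REAL1": [2, 3, 2, 2, 3, 2, 3, 2],
})

# Internal consistency of the item set
alpha, ci = pg.cronbach_alpha(data=ipq_items)
print(f"Cronbach's alpha = {alpha:.2f}")

# Correlation between touch time improvement and overall mean Presence (placeholders)
mean_presence = ipq_items.mean(axis=1)
touch_time_improvement = pd.Series([2.1, 1.5, 2.4, 0.9, 1.8, 2.2, 2.6, 1.7])

r, p = stats.pearsonr(touch_time_improvement, mean_presence)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
```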

We also evaluated the effectiveness of VR- and FTD-based training by comparing both groups' phase 2 performance (post-training, in the physical flight simulator) with the control group's performance in phase 1. In this case the control group in phase 1 represents a group of participants with no prior exposure to a flight deck, and therefore no training. The mean touch time for the control group in phase 1 was 9.46 s, while the mean touch times for the FTD and VR groups in phase 2 were 8.13 s and 7.06 s respectively. We ran a one-way ANOVA with Bonferroni correction for these three groups, noting a significant difference (F = 8.852, p < 0.001, η² = 0.296). The post-hoc pairwise tests revealed the strongest effect was between the control group phase 1 and the VR group phase 2. We also noted a moderate effect between the VR group phase 2 and the FTD group phase 2, although this was not statistically significant (Table 3).

Table 3. Comparison of trained performance with untrained performance in a physical flight simulator

  Group A           Group B        T       p       Cohen's-d
  Control phase 1   FTD phase 2    2.258   0.096   0.824
  Control phase 1   VR phase 2     4.178   0.001   1.526
  FTD phase 2       VR phase 2     1.944   0.186   0.710
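As an illustration of this between-group comparison, the following sketch runs a one-way ANOVA with eta-squared and Bonferroni-corrected pairwise tests using pingouin (an assumed toolchain). Group labels mirror Table 3; the values are placeholders, not the study data.

```python
import pandas as pd
import pingouin as pg

# Placeholder mean touch times (seconds) for three independent groups
df = pd.DataFrame({
    "group": (["Control phase 1"] * 5) + (["FTD phase 2"] * 5) + (["VR phase 2"] * 5),
    "touch_time": [9.8, 9.2, 9.6, 9.1, 9.5,     # untrained baseline
                   8.4, 7.9, 8.3, 8.0, 8.1,     # after FTD training
                   7.2, 6.9, 7.1, 7.0, 7.3],    # after VR training
})

# One-way ANOVA with eta-squared effect size
aov = pg.anova(data=df, dv="touch_time", between="group", effsize="n2")
print(aov[["Source", "F", "p-unc", "n2"]])

# Post-hoc pairwise t-tests with Bonferroni correction and Cohen's d
posthoc = pg.pairwise_tests(data=df, dv="touch_time", between="group",
                            padjust="bonf", effsize="cohen")
print(posthoc[["A", "B", "T", "p-corr", "cohen"]])
```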


7 Discussion

The strong learning effect between the two phases indicated that, for all three groups, spatial learning occurred in the first phase and was committed to long-term memory. Significant within-group differences indicated that VR exposure created spatial memories that transferred to the real world. We expected the control group performance to be highest because the environments were the same in both phases and because it was the only environment with no visuo-spatial degradation in the initial phase. We expected the FTD and VR group performance to be lower due to differences between the phase 1 and phase 2 environments, and due to degraded visuo-spatial cues in these environments. The marginal between-group differences may indicate that the lower spatial and geometric accuracy of flat panel FTDs and the variety of visual degradations present in VR (such as limited field of view and refresh rate) are less important than we predicted. We predicted that degraded environments might require some perceptual adaptation, resulting in a less complete or accurate learned spatial map over the short duration of this experiment. The performance of the FTD and VR groups may suggest that the visual system can quickly adapt to visual degradation and build an internal mental model which can be easily updated on entering the physical flight deck.

The similar performance of the FTD and VR groups was surprising given the current limitations of VR technology, such as limited field of view, refresh rate, and angular resolution. This may be offset by the lack of geometric detail in the FTD, which relies on flat images of flight deck elements. These images contain sufficient detail for users to understand the type of element, including size, colour, and shape. Textual information is the same across the environments, but can more easily be read in the physical and FTD trainers. The similar performance of the FTD and physical flight deck groups is consistent with industry standards and expectations of training transfer from lower-level training devices to Full Flight Simulators.

Due to the time limit for each search task and the participants' lack of familiarity with the flight deck, we found that participants who attempted to search for the text displayed on the card ran out of time. Most participants quickly determined that they needed to start searching for other visual attributes such as shape and colour, only using the text once they had narrowed down the search, or to eliminate incorrect but visually similar candidates. The differing search strategies were not measured or controlled for, but may account for some of the individual performance differences, especially during the first training phase. This difference was observed across all groups, so likely did not affect the overall results. We also observed in the search patterns and timing data that very few participants immediately located any specific flight deck elements during the second phase. This may indicate that most individual elements are not committed to memory in the first phase. We suggest instead that a more general understanding of the flight deck layout was learned during the first phase, which allowed participants to start their searches in the second phase in closer proximity to the correct element. Some participants reported that they contextualised aspects of the flight deck based on function (engine controls vs electrical systems), while others reported it was based on form (large knobs vs small dials).


A significant improvement for future research in this area would be to include eye tracking, although this presents a challenge for large uncontrolled environments like physical flight simulators. Given the results of this study and the availability of commercial VR headsets with eye tracking, VR-based studies could provide new insights into pilot behaviour.

One significant limitation of VR is the lower level of technological maturity in creating Social Presence through human motion capture and reproduction. This includes gross motor movements as well as more subtle cues such as fine hand gestures, eye movements, and facial expressions. The communication between pilots in the flight deck is a critical part of both aviation training and commercial aircraft operation. While VR may be on par with an FTD for flight deck representation, FTDs do not inhibit social interaction and the development of necessary communication skills, or the ability for instructors to assess students. Future VR systems may incorporate the required technologies, which could open the potential for geographically distributed training.

The IPQ results highlighted several issues and challenges associated with self-reported measures from Presence questionnaires. The item "the virtual world seemed more realistic than the real world" was scored poorly and with significant variation (mean = 2.47, s.d. = 0.96). By contrast, within the same subscale, the item "how real did the virtual world seem to you?" scored well and with less variation (mean = 4.27, s.d. = 0.58). Several participants scored low on the former item and high on the latter. This would seem logical given a virtual world cannot be more realistic than the real world, but it affected the REAL subscale results. Similarly, within the INV subscale, several participants gave contradictory (post-correction) scores for "I was not aware of my real environment" and "I still paid attention to the real environment". One experimental issue highlighted by the IPQ results was the lack of sound isolation for the VR experiment room. Several participants noted that sounds from adjacent rooms affected their responses to the questions about being aware of the real world. With that said, the overall Presence results are still strong compared with Schwind et al. [27].

7.1 Limitations

The lack of significant difference between these conditions conflicts with some industry opinions that relearning is required when transferring from the FTD to an FFS. One potential explanation is the difference in lighting between the environments, where FFSs are typically poorly lit, with greater reliance on instrument backlighting instead of ambient illumination. We introduced additional lights to the physical simulators to match the FTD and VR environments more closely, but did not control this factor sufficiently to draw any further conclusions.

We noted several limitations with the use of VR in this experiment. Participants with no previous VR experience required different amounts of time to adjust the HMD and become accustomed to the experience. The VR staging area created for this experiment offered participants a limited range of experiences, and participants were not given sufficient time to acclimate to VR. Based on verbal feedback, this staging area did not adequately prepare some participants for the virtual flight deck. More specifically, some participants did not use the normal range of head and body motion to gain different perspectives until partway through the experiment.


One benefit of the limited overall time in VR for this experiment is that fatigue and VR sickness were not factors detracting from the study. These should be considered for future VR flight deck studies, as students normally spend several hours at a time in flight training devices.

Screening participants for airplane flight deck experience was insufficient in two cases. The FTD group included two participants who had helicopter maintenance experience, which was initially thought to have little overlap with commercial fixed-wing flight decks. In conducting the experiment, we discovered that the functional layout similarities between the two were significant, and these participants had strong task performance from the start. Interestingly, they did show a normal performance increase between the two phases. Due to restrictions with accessing the FTD we were not able to re-run these trials with different participants, but any skew in the results is in favour of the FTD group. Given that all trainees in commercial aviation have previous experience with other aircraft types, more research is needed to understand how their experiences with different flight deck layouts affect their training.

8 Conclusion

We expected the control group performance to be the highest, and the flat panel group and VR group to be lower due to environment differences and technology limitations respectively. The improvement in within-group performance shows measurable spatial knowledge is acquired within the first seven minutes of exposure to the flight deck for this task. VR group performance shows that, within the limitations of this experiment, Virtual Reality could be explored as a viable alternative to flat panel devices for initial spatial learning in flight deck familiarisation. The much lower cost of VR systems has the potential to provide major benefits in significantly reducing overall training costs and improving access to simulation devices.

References

1. EASA: Certification Specifications for Aeroplane Flight Simulation Training Devices. Accessed 24 Jan 2023
2. Federal Aviation Administration: Qualification and approval of flight simulators and flight training devices. Title 14, Chapter I, Subchapter D, Part 61, Subpart A (2022)
3. EASA: EASA approves the first Virtual Reality (VR) based Flight Simulation Training Device. https://www.easa.europa.eu/en/newsroom-and-events/press-releases/easa-approves-first-virtual-reality-vr-based-flight-simulation. Accessed 2 Jan 2023
4. IATA: Competency-based training and assessment (CBTA) expansion within the aviation system. https://www.iata.org/contentassets/c0f61fc821dc4f62bb6441d7abedb076/cbta-expansion-within-the-aviation-system.pdf. Accessed 5 Jan 2023
5. Bürki-Cohen, J., Soja, N.N., Longridge, T.: Simulator platform motion: the need revisited. Int. J. Aviat. Psychol. 8(3), 293–317 (1998). https://doi.org/10.1207/s15327108ijap0803_8
6. Bürki-Cohen, J., Boothe, E., Soja, N., Disario, R., Longridge, T.: Simulator fidelity: the effect of platform motion, 30 June 2000
7. Meyer, G.F., Wong, L.T., Timson, E., Perfect, P., White, M.D.: Objective fidelity evaluation in multisensory virtual environments: auditory cue fidelity in flight simulation. PLoS ONE 7(9), e44381 (2012). https://doi.org/10.1371/journal.pone.0044381


8. Reweti, S., Gilbey, A., Jeffrey, L.: Efficacy of low-cost PC-based aviation training devices. J. Inf. Technol. Educ. 16(1), 127–142 (2017). https://doi.org/10.28945/3682
9. McDermott, J.T.: A comparison of the effectiveness of a personal computer-based aircraft training device and a flight training device at improving pilot instrument proficiency: a case study in leading regulatory change in aviation education (2005)
10. Reweti, S.: PC-based aviation training devices for pilot training in visual flight rules procedures: development, validation and effectiveness. Doctoral thesis, Massey University, Palmerston North, New Zealand (2014). http://hdl.handle.net/10179/5454
11. Cross, J.I., Boag-Hodgson, C., Ryley, T., Mavin, T., Potter, L.E.: Using extended reality in flight simulators: a literature review. IEEE Trans. Visual. Comput. Graph., p. 1 (2022). https://doi.org/10.1109/TVCG.2022.3173921
12. Torrence, B., Dressel, J.: Critical review of extended reality applications in aviation. In: Chen, J.Y.C., Fragomeni, G. (eds.) Virtual, Augmented and Mixed Reality: Applications in Education, Aviation and Industry, pp. 270–288. Springer International Publishing, Cham (2022). https://doi.org/10.1007/978-3-031-06015-1_19
13. Renganayagalu, S.K., Mallam, S.C., Nazir, S.: Effectiveness of VR head mounted displays in professional training: a systematic review. Technol. Knowl. Learn. 26(4), 999–1041 (2021). https://doi.org/10.1007/s10758-020-09489-9
14. Oberhauser, M., Dreyer, D., Convard, T., Mamessier, S.: Rapid integration and evaluation of functional HMI components in a virtual reality aircraft cockpit, vol. 485, pp. 17–24 (2016)
15. Oberhauser, M., Dreyer, D.: A virtual reality flight simulator for human factors engineering. Cogn. Technol. Work 19(2–3), 263–277 (2017). https://doi.org/10.1007/s10111-017-0421-7
16. Oberhauser, M., Dreyer, D., Braunstingl, R., Koglbauer, I.: What's real about virtual reality flight simulation? Aviat. Psychol. Appl. Human Factors 8(1), 22–34 (2018). https://doi.org/10.1027/2192-0923/a000134
17. Auer, S., Gerken, J., Reiterer, H., Jetter, H.-C.: Comparison between virtual reality and physical flight simulators for cockpit familiarization, pp. 378–392 (2021). https://doi.org/10.1145/3473856.3473860
18. Luo, H., Li, G., Feng, Q., Yang, Y., Zuo, M.: Virtual reality in K-12 and higher education: a systematic review of the literature from 2000 to 2019. J. Comput. Assist. Learn. 37(3), 887–901 (2021). https://doi.org/10.1111/jcal.12538
19. Jensen, L., Konradsen, F.: A review of the use of virtual reality head-mounted displays in education and training. Educ. Inf. Technol. 23(4), 1515–1529 (2017). https://doi.org/10.1007/s10639-017-9676-0
20. Buttussi, F., Chittaro, L.: Effects of different types of virtual reality display on presence and learning in a safety training scenario. IEEE Trans. Visual Comput. Graph. 24(2), 1063–1076 (2018). https://doi.org/10.1109/TVCG.2017.2653117
21. Kaplan, A.D., Cruit, J., Endsley, M., Beers, S.M., Sawyer, B.D., Hancock, P.A.: The effects of virtual reality, augmented reality, and mixed reality as training enhancement methods: a meta-analysis. Hum. Factors 63(4), 706–726 (2021). https://doi.org/10.1177/0018720820904229
22. Hancock, P.A., Hoffman, R.R.: Keeping up with intelligent technology. IEEE Intell. Syst. 30(1), 62–65 (2015). https://doi.org/10.1109/MIS.2015.13
23. Slater, M.: Place illusion and plausibility can lead to realistic behaviour in immersive virtual environments. Philos. Trans. R. Soc. B 364(1535), 3549–3557 (2009). https://doi.org/10.1098/rstb.2009.0138
24. Skarbez, R., Brooks, J.F., Whitton, M.: A survey of presence and related concepts. ACM Comput. Surv. 50(6), 1–39 (2018). https://doi.org/10.1145/3134301


25. Alexander, A., Brunyé, T., Sidman, J., Weil, S.: From gaming to training: a review of studies on fidelity, immersion, presence, and buy-in and their effects on transfer in PC-based simulations and games, 01 Jan 2005
26. Slater, M., Khanna, P., Mortensen, J., Insu, Y.: Visual realism enhances realistic response in an immersive virtual environment. IEEE Comput. Graph. Appl. 29(3), 76–84 (2009). https://doi.org/10.1109/MCG.2009.55
27. Schwind, V., Knierim, P., Haas, N., Henze, N.: Using presence questionnaires in virtual reality (2019)
28. Ragan, E.D., Bowman, D.A., Kopper, R., Stinson, C., Scerbo, S., McMahan, R.P.: Effects of field of view and visual complexity on virtual reality training effectiveness for a visual scanning task. IEEE Trans. Visual Comput. Graph. 21(7), 794–807 (2015). https://doi.org/10.1109/TVCG.2015.2403312
29. Slater, M.: How colorful was your day? Why questionnaires cannot assess presence in virtual environments. Presence: Teleoper. Virtual Environ. 13(4), 484–493 (2004)
30. Wallis, G., Tichon, J., Mildred, T.: Speed perception as an objective measure of presence in virtual environments (2007)
31. Wallis, G., Tichon, J.: Predicting the efficacy of simulator-based training using a perceptual judgment task versus questionnaire-based measures of presence. Presence Teleoper. Virtual Environ. 22(1), 67–85 (2013). https://doi.org/10.1162/PRES_a_00135
32. Usoh, M., Catena, E., Arman, S., Slater, M.: Using presence questionnaires in reality. Presence Teleoper. Virtual Environ. 9(5), 497–503 (2000). https://doi.org/10.1162/105474600566989
33. Bridgeman, B., Van der Heijden, A.H.C., Velichkovsky, B.M.: A theory of visual stability across saccadic eye movements. Behav. Brain Sci. 17(2), 247–258 (1994). https://doi.org/10.1017/S0140525X00034361
34. Noë, A.: Action in Perception. MIT Press, Cambridge (2004)
35. Kelly, J., McNamara, T.: Spatial memories of virtual environments: how egocentric experience, intrinsic structure, and extrinsic structure interact. Psychon. Bull. Rev. 15(2), 322–327 (2008). https://doi.org/10.3758/PBR.15.2.322
36. McNamara, T.P., Rump, B., Werner, S.: Egocentric and geocentric frames of reference in memory of large-scale space. Psychon. Bull. Rev. 10(3), 589–595 (2003). https://doi.org/10.3758/BF03196519
37. Wallis, G.M., Backus, B.T.: When action conditions perception: evidence of cross-modal cue recruitment. J. Vis. 16(14), 6 (2016). https://doi.org/10.1167/16.14.6
38. Ragan, E.D.: The effects of higher levels of immersion on procedure memorization performance and implications for educational virtual environments. Presence 19(6), 527–543 (2010). https://doi.org/10.1162/pres_a_00016
39. Shelton, A.L., McNamara, T.P.: Systems of spatial reference in human memory. Cogn. Psychol. 43(4), 274–310 (2001). https://doi.org/10.1006/cogp.2001.0758
40. Draschkow, D., Wolfe, J.M., Võ, M.L.H.: Seek and you shall remember: scene semantics interact with visual search to build better memories. J. Vis. 14(8), 10 (2014). https://doi.org/10.1167/14.8.10
41. Saredakis, D., Szpak, A., Birckhead, B., Keage, H.A.D., Rizzo, A., Loetscher, T.: Factors associated with virtual reality sickness in head-mounted displays: a systematic review and meta-analysis. Front. Hum. Neurosci. 14, 96 (2020). https://doi.org/10.3389/fnhum.2020.00096
42. Tran, T., Shin, H., Stuerzlinger, W., Han, J.: Effects of virtual arm representations on interaction in virtual environments, vol. 131944, pp. 1–9 (2017)
43. Knuth, D.E.: The Art of Computer Programming, 3rd edn. Addison-Wesley (1997)
44. AMD Ryzen 9 5900X CPU. https://www.amd.com/en/products/cpu/amd-ryzen-9-5900x. Accessed 7 Jan 2023
45. Nvidia RTX 3090 Video Card. https://www.nvidia.com/en-au/geforce/graphics-cards/30-series/rtx-3090-3090ti/. Accessed 7 Jan 2023


46. "Varjo XR-3. https://varjo.com/products/xr-3/. Accessed 7 Jan 2023 47. V. Software: Valve Index Base Stations. https://www.valvesoftware.com/en/index/base-sta tions. Accessed 7 Jan 2023 48. Valve Index Controllers. https://www.valvesoftware.com/en/index/controllers. Accessed 8 Jan 2023 49. Moraru, D., Boiangiu, C.-A.: Seeing without eyes: Visual sensory substitution. J. Inf. Syst. Oper. Manage. 9(2), L1 (2015) 50. Venini, D.W.: Visual sensory substitution: initial testing of a custom built visual to tactile device. In: Venini, D.W., (ed.) (2018) 51. Epic Games Unreal Engine 5. https://www.unrealengine.com/en-US/unreal-engine-5. Accessed 7 Jan 2023 52. Igroup Presence Questionnaire overview. https://www.igroup.org/pq/ipq/index.php. Accessed 8 Jan 2023

A Comparative Evaluation of Human Factors Intervention Approaches in Aviation Safety Management

Wesley Tsz-Kin Chan and Wen-Chin Li

Safety and Accident Investigation Centre, Cranfield University, Cranfield, UK
[email protected]

Abstract. In the Human Factors Intervention Matrix (HFIX) framework, human factors interventions are categorized into five different approaches, and each of these approaches can be evaluated against five evaluation criteria of feasibility, acceptability, cost, effectiveness, and sustainability. Although the outcome of evaluations can assist safety management practitioners in the selection of more viable safety recommendations, there exists a research gap on how the five different approaches differ from each other. In this study, overall comparisons of the five approaches in HFIX were carried out using the five evaluation criteria. Each intervention approach was also compared independently with other approaches in a pairwise manner to highlight comparative strengths. It was discovered that amongst the five evaluation criteria, only feasibility, cost, and effectiveness differed across the five approaches. Task- and human-based interventions were more feasible, whereas task- and organization-based interventions were rated more highly on cost, reflective of better cost-efficiency. Subjective differences in evaluation were also identified in effectiveness, showcasing that cognitive biases can exist within evaluative frameworks. The findings will benefit safety practitioners and managers in the selection and application of human factors intervention strategies, especially in resource-constrained situations in the real world.

Keywords: Human Factors Intervention · Evaluation · Safety Management

1 Introduction

Although human error is a contributing factor in some 60–80% of aviation accidents, it is widely considered inappropriate to simply attribute causes of aircraft accidents to aircrew error [1]. The Human Factors Analysis and Classification System (HFACS) is a popular framework for the categorization of human factors errors in the wider organizational perspective [2]. HFACS addresses human factors over four levels, with unsafe acts and active failures of frontline operators at Level 1, active and latent preconditions at Level 2, latent failures created by unsafe supervision at Level 3, and latent failures originating from organizational-level influences at Level 4. Mishaps and accidents attributable to human errors at Level 1 are considered to occur when the higher levels fail to provide suitable conditions to promote and enable safe practices.


As active failures can be generated from latent failures across the organizational levels, the corresponding human factors interventions and safety recommendations to rectify these failures will be equally spread out. Consequently, safety managers are required to identify and prioritize human factors interventions which can be implemented through a variety of approaches at different parts of the system [3].

1.1 Different Approaches to Human Factors Intervention

To assist in the development of efficient intervention strategies for the reduction of human error, Wiegmann & Shappell (2007) proposed the Human Factors Intervention Matrix (HFIX) framework. Unsafe acts of frontline operators, including decision errors, skill-based errors, perceptual errors, and violations, are pitted against five different intervention approaches (see Table 1). The organizational/administrative approach covers areas such as reviewing human resource management and reviewing rules and regulations; the task/mission approach includes amendments to procedures and manuals; the technology/engineering approach includes the design, repair, and inspection of parts and equipment; the operational/physical environment approach focuses on modifications to the operational or ambient environment (e.g. weather, heat, lighting, etc.); and the human/crew approach involves the review, development, and implementation of personnel training programs [5].

Table 1. Examples of HFIX categories of intervention approaches [5]

  Organisational/Administrative:
  - Human resource management evaluations
  - Reviewing and issuing rules, regulations, and policies
  - Improving information management and communication
  - Conduct research and study on safety culture improvements

  Task/Mission:
  - Amending, reviewing, and modifying procedures and manuals

  Technology/Engineering:
  - Design, repair, or inspect parts and equipment

  Operational/Physical Environment:
  - Modifications to the operational or ambient environment (e.g. weather, altitude, terrain, heat, vibration, lighting)

  Human/Crew:
  - Reviewing, developing, and implementing training programs

By setting active failures against the different intervention approaches, the HFIX is beneficial for safety practitioners by ensuring that a wide array of prospective interventions extending through the multiple HFACS levels are considered.


Realistically, however, resource constraints (e.g. financial, time, etc.) within each organization can limit the viability of prospective interventions. This necessitates the evaluation of prospective intervention approaches on criteria such as feasibility, acceptability, cost, effectiveness, and sustainability (see Table 2) [6]. These evaluations are beneficial in assisting safety managers to select interventions based on likelihood of success, and to minimize the risk of workforce disengagement by implementing intervention approaches that are perceived negatively.

Table 2. HFIX categories of intervention approaches pitted against five evaluation criteria

                        Intervention approaches
  Evaluation criteria   Organisational/   Task/     Technology/   Operational/Physical   Human/
                        Administrative    Mission   Engineering   Environment            Crew
  Feasibility
  Acceptability
  Cost
  Effectiveness
  Sustainability
1.2 Workforce Preferences and Recommended Interventions Previous studies have discovered that aviation professionals had overwhelmingly greater preferences towards organizational (62.4%) and human (26.7%) approaches to human factors interventions [3]. These preferences were similar to safety recommendations proposed by previous accident and incident investigation reports. The U.S. FAA Safer Skies initiative, for example, categorized 614 safety recommendations and found organizational (36.6%) and human (32.6%) approaches to be the first and second-most common [6]. Similarly, in a categorization of 182 safety recommendations drawn from commercial airline incident narratives, organizational (41.1%) and human (35.2%) interventions were the most common [7]. Although the coordination between aviation professionals’ selective preferences and the recommendations proposed in accident and incident investigations places great emphasis on organizational and human interventions, there is little information on what actually makes these intervention approaches more viable. Chen, Lin & Li [8] used the Analytical Hierarchy Process (AHP) to assess the five intervention approaches using the five evaluation criteria in HFIX, and compared the evaluation results from commercial airline pilots, airline managers, and aviation accident investigation authority officers. Whilst they concluded that each intervention approach possessed unique characteristics in relation to the various evaluation criteria, their study looked at each intervention independently (i.e. on a column-by-column basis in Table 2.). It therefore lacks the ability to provide comparisons of evaluative ratings across the five approaches. To illustrate, whilst Chen and colleagues found that the organizational approach had a higher score on acceptability than it did on feasibility, cost, effectiveness, or sustainability, their research does not enable the comparison of these criteria with the other intervention approaches – what made organizational interventions more acceptable than the other approaches?


In addition, the similarity between the preferences of aviation professionals and the recommendations suggested by previous investigations may be instigated by "mitigation myopia". Rather than objectively evaluating interventions for their likelihood of success, a possibility is that investigators simply made recommendations for interventions based on their own preferences [4]. Unless the criterion or criteria which made these intervention approaches more desirable can be determined, the "mitigation myopia" effect cannot be ruled out.

The present study had two research objectives. The first objective was to determine whether the five intervention approaches differed significantly in their scores on the respective evaluation criteria of feasibility, acceptability, cost, effectiveness, and sustainability. The second objective adds to the first by determining whether certain intervention approaches were comparatively strong or weak on any of the five evaluation criteria. Building on the previous findings on workforce preferences and recommendations by accident investigation reports, this paper attempts to answer the following questions: do certain intervention approaches have better or worse feasibility, acceptability, cost, effectiveness, and sustainability? And if so, what were the comparative strengths and weaknesses of each intervention approach? It was envisioned that the findings would help safety managers in selecting and creating different types of human factors interventions.

2 Method

2.1 Participants

Aviation professionals involved in safety management within an airline were recruited through their participation in a wider program on air safety investigations. A total of N = 14 responses were included in the present analysis. Specific roles include safety managers (n = 7), engineers (n = 3), ground safety officer (n = 1), safety executive (n = 1), airline manager (n = 1), and airline executive (n = 1). Participation was voluntary, with no identifying information collected, and participants had the option to discontinue the survey at any time. The Cranfield University Research Ethics System provided ethics approval (CURES/12950/2021).

2.2 Research Design

Safety recommendations proposed in the official accident report of the TransAsia Airways GE235 crash into the Keelung River, Taiwan, on 4th Feb 2015 [9] were utilized as the basis for evaluation. These recommendations were categorized by two subject matter experts into the five HFIX intervention approaches. They were then imported onto a web-based survey hosted on Microsoft Forms for data collection. The participants were asked to rate the safety recommendations on the HFIX evaluation criteria of feasibility, acceptability, cost, effectiveness, and sustainability on a five-point Likert scale from poor (1) to excellent (5).


2.3 Statistical Analysis

Data were collected in December 2021 and analyzed using SPSS (version 28). One-way repeated measures ANOVAs were conducted to examine the effect that the type of intervention approach had on evaluation scores across the five criteria. The intervention approaches (organization, task, technology, operational environment, and human) were entered as the test attributes, and the evaluation rating scores (feasibility, acceptability, cost, effectiveness, and sustainability) were the dependent variables. Mauchly's sphericity tests were used to evaluate the assumption of sphericity for ANOVA testing, and where this had been violated, the Greenhouse–Geisser correction was applied. For evaluation criteria where the ANOVA results were significant (p < 0.05), suggesting that the intervention approaches differed in feasibility, acceptability, cost, effectiveness, or sustainability ratings, post-hoc analyses were performed on pairwise groups to find out whether any of the intervention approaches (organization, task, technology, operational environment, and human) were particularly strong or weak on the evaluation criteria when compared with the other available approaches.
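The analysis described above was run in SPSS; as a rough open-source analogue, the sketch below uses pingouin to run the Mauchly sphericity test, a one-way repeated measures ANOVA with Greenhouse-Geisser correction, and Bonferroni-corrected pairwise post-hoc tests. The ratings and respondent identifiers are invented placeholders, not the survey data.

```python
import pandas as pd
import pingouin as pg

approaches = ["Organisation", "Task", "Technology", "Environment", "Human"]

# Placeholder feasibility ratings: 6 respondents x 5 intervention approaches (long format)
df = pd.DataFrame({
    "respondent": [f"r{i}" for i in range(6) for _ in approaches],
    "approach":   approaches * 6,
    "feasibility": [3, 4, 2, 3, 4,
                    3, 3, 3, 3, 4,
                    4, 4, 2, 3, 5,
                    2, 4, 3, 3, 4,
                    3, 3, 2, 2, 4,
                    4, 4, 3, 3, 3],
})

# Mauchly's test of sphericity
spher = pg.sphericity(data=df, dv="feasibility", within="approach", subject="respondent")
print(spher)

# One-way repeated measures ANOVA; correction applied when sphericity is violated
aov = pg.rm_anova(data=df, dv="feasibility", within="approach",
                  subject="respondent", correction=True)
print(aov)

# Post-hoc pairwise comparisons for criteria with a significant ANOVA
posthoc = pg.pairwise_tests(data=df, dv="feasibility", within="approach",
                            subject="respondent", padjust="bonf")
print(posthoc[["A", "B", "p-corr"]])
```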

3 Results and Discussion

3.1 Evaluative Differences Amongst Intervention Approaches

The mean rating scores for each intervention approach and the ANOVA results are presented in Table 3. The ANOVA results showed that amongst the five evaluation criteria, only feasibility, cost, and effectiveness scores differed significantly across the intervention approaches. For scoring on feasibility (F(4, 48) = 6.01, p < 0.001), human approaches were considered to be the most feasible, followed respectively by task, organization, operational environment, and lastly technological interventions. For the cost criterion (F(4, 40) = 6.82, p < 0.001), task-based interventions had the highest score (representing a high level of perceived cost-efficiency), followed by organization, human, operational environment, and lastly technology interventions. The respondent scores also differed significantly on the effectiveness criterion (F(2.09, 22.98) = 4.16, p < 0.05). In this case, task-based interventions had the highest mean score, followed by human, technology, organization, and operational environment interventions.

These findings will benefit safety managers in their selection of human factors interventions. When the organization or the safety management system is limited by resource constraints, and hence a choice has to be made as to which of the five approaches should be implemented, the present results encourage managers to place greater emphasis on comparing the factors of feasibility, cost, and effectiveness. These are the evaluative metrics which were found to be most significant in differentiating between the five intervention approaches.

The finding of non-significant differences in the acceptability and sustainability criteria was also interesting. Firstly, it suggests that the use of simple point-scores as the evaluative basis for different human factors intervention approaches may be insufficient. Point-scores may only portray insignificant deviations from the mean, denoting that apparent comparative differences between the organization, task, technology, operational environment, and human approaches may not be statistically significant.


Table 3. Evaluation scores and ANOVA results for safety interventions related to the five HFIX intervention approaches

  Evaluation      Organizational/      Task/                Technology/          Operational/Physical  Human/               ANOVA results
  criteria        Administrative       Mission              Engineering          Environment           Crew
  Feasibility     M = 3.15, SD = 1.07  M = 3.57, SD = 1.09  M = 2.64, SD = 1.08  M = 3.07, SD = 0.92   M = 3.79, SD = 1.05  F(4, 48) = 6.01, p < 0.001
  Acceptability   M = 3.00, SD = 0.60  M = 3.85, SD = 0.80  M = 3.42, SD = 0.79  M = 3.25, SD = 0.75   M = 3.50, SD = 1.00  n.s.
  Cost            M = 3.64, SD = 0.67  M = 3.92, SD = 0.79  M = 2.42, SD = 0.90  M = 2.58, SD = 1.08   M = 3.17, SD = 0.94  F(4, 40) = 6.82, p < 0.001
  Effectiveness   M = 3.17, SD = 1.27  M = 4.08, SD = 0.51  M = 3.33, SD = 0.78  M = 3.00, SD = 1.13   M = 4.08, SD = 0.49  F(2.09, 22.98) = 4.16, p < 0.05
  Sustainability  M = 3.10, SD = 1.20  M = 3.92, SD = 0.79  M = 3.69, SD = 0.75  M = 3.36, SD = 1.03   M = 3.92, SD = 0.67  n.s.

Secondly, the findings also suggest that the application of the HFIX framework can possibly be simplified. For example, safety managers can attempt to ascertain early on whether the criteria of acceptability and sustainability are applicable for the target audience within their organizations. If they are confirmed to be inapplicable, then a possibility is that these evaluative criteria can be discounted for expediency.

3.2 Task, Human, and Organization Interventions More Feasible and Cost-Efficient

For the evaluation criteria which significantly differed across the five intervention approaches, pairwise comparisons were carried out between intervention approach categories to identify comparative differences. For the feasibility criterion, the task-technology (MD = 1.15, p < 0.05) and human-technology (MD = 1.39, p < 0.05) pairs had statistically significant differences. Task-based and human-based intervention approaches were rated as more feasible by the respondents than technology-based interventions (see Table 3, Fig. 1). Interventions related to task and human changes, such as modifications to standard operating procedures and enhanced training programs for personnel, were more desirable than changes to the technological hardware. This finding complements the argument, from previous literature, that the benefits of functional changes will be ringfenced unless other supportive resources are provided [10]. To illustrate, technology-based interventions may have limited impact unless they are supported by corresponding procedural adaptations and operator training programs. On the other hand, for the cost criterion (high scores representing better cost-efficiency), significant pairwise differences were found between the organization-technology (MD = 1.09, p < 0.05), task-technology (MD = 1.27, p < 0.01), and task-operational environment (MD = 1.27, p < 0.05) pairs. Human factors interventions involving organization and task changes were rated higher than modifications to the technology and operational environment (see Table 3, Fig. 2).



Fig. 1. Scores for human-, task-, and technology-based intervention approaches on the evaluation criterion of feasibility. Significant pairwise differences were found between these approaches.


Fig. 2. Scores for task-, organization-, operational environment-, and technology-based intervention approaches on the evaluation criterion of cost. Significant pairwise differences were found between these four approaches.


Although it is understandable that technology and operational environment changes are likely to be costlier than changes to organizational processes and operating procedures, the present findings are nonetheless relevant for safety management when resources are limited. When cost is a limitation, perceived cost-efficiency is important. In this case task-based intervention approaches will be preferable, as they score highly on both the feasibility and cost criteria. Conversely, if cost were less of a concern, then the prioritization of feasibility through the implementation of human-based interventions may be more viable.

3.3 Assessed Effectiveness Subject to Mitigation Myopia

Similarly, as the effectiveness rating was found to differ significantly across the five intervention approaches, pairwise comparisons were conducted to determine which specific pairs amongst the five approaches were comparatively different. However, none of the pairwise comparisons amongst the five approaches returned significant results. This highlights the trap of "mitigation myopia", where recommendations for human factors interventions are made based on accident investigators' own subjective preferences, rather than on objective evaluations of the likelihood of success [4]. In the case of the present results, differences in effectiveness cannot be objectively substantiated. We can tell that effectiveness varied across the five approaches, but the results do not provide objective information on how the respective approaches compared with each other on the effectiveness rating.

HFIX has conventionally been touted as a tool to minimize the effect of cognitive biases by providing safety managers and investigators with a taxonomic framework of viable intervention options and objective evaluation criteria [6]. However, the current finding suggests that although the framework does successfully encourage the consideration of a wider array of viable recommendations, the application of the framework for evaluative purposes remains subject to cognitive biases. Therefore, there is an urgent need for further research on the practical application of the HFIX, especially in the determination of which specific approaches are most relevant to the intended purpose, and how the evaluation of these approaches may be affected by interpersonal differences. Safety practitioners should also be mindful that, despite the use of tools like HFIX which are touted as objective, subjective differences in evaluation can still affect the analysis and recommendation of human factors interventions.

4 Conclusion

The Human Factors Intervention Matrix (HFIX) is a framework which assists safety practitioners in coming up with human factors interventions and safety recommendations. It ensures that a wide array of prospective intervention approaches across five categories are considered, and also provides five evaluation criteria on which the various approaches can be rated.


However, whilst the evaluated scores of the respective intervention approaches have been assessed in previous research, there is little information on how the intervention approaches compare with each other on these ratings, and which of the evaluation criteria are more relevant in the detection of differences between the various intervention approaches. In this study, airline safety managers' ratings of five intervention approaches (organization, task, technology, operational environment, and human) based on five evaluation criteria (feasibility, acceptability, cost, effectiveness, and sustainability) were assessed for differences. Overall, it was discovered that only the evaluation criteria of feasibility, cost, and effectiveness differed significantly across the intervention approaches. Safety recommendations and interventions related to task and human changes, such as revisions to standard operating procedures and associated training programs for personnel, scored higher on feasibility than technological modifications. Similarly, organization and task changes, such as changes to supervisory oversight and operational processes and regulations, were assessed as more cost-efficient than changes to the technology and operational environment. These findings are beneficial for safety practitioners in selecting interventions in the real world. For example, when cost is a limiting factor, then according to the present results task-based interventions are preferable, as they are concurrently more cost-efficient and more feasible than the other approaches.

Although the HFIX encompasses a range of viable intervention approaches and provides the criteria on which these approaches can be evaluated, the finding of overall variance in effectiveness with no associated significant pairwise differences between intervention approaches suggests that the rating scores based on the HFIX evaluation criteria may nevertheless be subject to cognitive biases. Whilst this presents an opportunity for further research, in the meantime safety practitioners should be aware of these issues to ensure appropriate and objective application of the framework.

References

1. Shappell, S., Detwiler, C., Holcomb, K., Hackworth, C., Boquet, A., Wiegmann, D.A.: Human error and commercial aviation accidents: an analysis using the Human Factors Analysis and Classification System. Human Factors: J. Human Factors Ergon. Soc. 49(2), 227–242 (2007). https://doi.org/10.1518/001872007X312469
2. Wiegmann, D.A., Shappell, S.A.: A Human Error Approach to Aviation Accident Analysis. Routledge, New York (2003)
3. Chan, W.T.-K., Li, W.-C.: Cultural effects on the selection of aviation safety management strategies. In: Harris, D., Li, W.-C. (eds.) Engineering Psychology and Cognitive Ergonomics, pp. 245–252. Springer International Publishing, Cham (2022). https://doi.org/10.1007/978-3-031-06086-1_18
4. Shappell, S., Wiegmann, D.: Developing a methodology for assessing safety programs targeting human error in aviation. Proc. Human Factors Ergon. Soc. Ann. Meet. 51(2), 90–92 (2007). https://doi.org/10.1177/154193120705100208
5. Shappell, S., Wiegmann, D.: Closing the loop on the system safety process: the Human Factors Intervention Matrix (HFIX). In: ISASI 2009 Proceedings, pp. 62–67 (2009)
6. Shappell, S., Wiegmann, D.: A methodology for assessing safety programs targeting human error in aviation. Int. J. Aviat. Psychol. 19(3), 252–269 (2009). https://doi.org/10.1080/10508410902983904
7. Chen, J.-C., Chi, C.-F., Li, W.-C.: The analysis of safety recommendation and human error prevention strategies in flight operations. In: Harris, D. (ed.) Engineering Psychology and Cognitive Ergonomics. Applications and Services, pp. 75–84. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39354-9_9


8. Chen, J.-C., Lin, S.-C., Li, W.-C.: Utilization of Human Factors Intervention Matrix (HFIX) to develop aviation safety management strategy. In: Proceedings of the 18th World Multi-Conference Syst. Cybern. Informatics, pp. 50–55 (2014)
9. Aviation Safety Council: Aviation Occurrence Report: TransAsia Airways Flight GE235 (Report Number: ASC-AOR-16-06-001). Taipei (2016)
10. Edwards, J.R.D., Davey, J., Armstrong, K.: Returning to the roots of culture: a review and re-conceptualisation of safety culture. Saf. Sci. 55, 70–80 (2013). https://doi.org/10.1016/j.ssci.2013.01.004

An In-Depth Examination of Mental Incapacitation and Startle Reflex: A Flight Simulator Study

Jonathan Deniel1(B), Maud Dupuy2, Alexandre Duchevet2, Nadine Matton2, Jean-Paul Imbert2, and Mickaël Causse1

1 ISAE-SUPAERO, Université de Toulouse, Toulouse, France
[email protected]
2 ENAC, Université de Toulouse, Toulouse, France

Abstract. In aviation, mental incapacitation refers to a temporary degradation in a pilot’s mental abilities, often due to stressful situations. This can result in poor decision-making, as well as a decrease in comprehension, perception, and judgment, leading to the inability to perform the necessary duty as a pilot. Many aviation accidents and incidents are caused by mental incapacitation, particularly when the pilots are suddenly surprised or startled. To better understand this phenomenon, an experiment is being conducted to identify physiological, behavioral, and personality indicators that may explain individual susceptibility to mental incapacitation caused by sudden stressful events. Volunteer pilot students participate in a two-part process: (1) a personality and cognitive evaluation phase, where mental stress is manipulated, and (2) a flight simulator phase in which a sudden, surprising event (an intense thunder sound with a flash of lightning) takes place. Physiological, behavioral, and subjective evaluations are collected to determine the potential correlations and precursors of mental incapacitation.

Keywords: Piloting · Stress · Surprise · Startle Reflex · Mental Incapacitation

1 Introduction In aviation, incapacitation is a situation in which a pilot on duty is no longer able to perform their job to the required level. Incapacitation has been categorized into two operational classifications: “obvious” and “subtle” [1]. Obvious incapacitations are immediately noticeable by the other crew members, while subtle incapacitations are typically partial and temporary in nature, and they can be insidious because the affected pilot may look normal and be unaware of their own state, yet will operate at a very reduced level of performance and safety. Medical incapacitation (e.g., fainting, heart attack, loss of consciousness) can be easily classified as obvious, while mental incapacitation, the core of this work, is more subtle, less understood, and can be defined as a state of temporary cognitive impairments resulting from different phenomena such as cognitive overload [2], acute stress [3, 4], unexpected events like an intense thunder sound [5], or life-threatening situations [6].


Startle and surprise, generally caused by unexpected events, are frequently reported to be the cause of detrimental effects on cognitive functioning. Although the terms startle and surprise are sometimes used interchangeably, they do not cover the same type of phenomenon. Indeed, startle refers to the stereotyped reflex reaction, implying eye blink, neck retraction and shoulder elevation [7], triggered by the occurrence of a high-intensity stimulus (e.g., a loud sound, a sudden image, etc.), whereas surprise refers to an event that was not foreseen by the individual, although it is not necessarily associated with a high intensity [8]. By extrapolation, it is even possible to be surprised by the unexpected disappearance of a stimulus and to be startled by a loud sound even though its occurrence was fully expected [9]. Further, it has been shown that the startle reflex can be increased if a dangerous event is expected [6, 10]; this phenomenon is referred to as fear-potentiated startle. A particular case of surprise is automation surprise, where the operator is puzzled by an unexpected automation behavior or feedback [11, 12]. Mental incapacitation due to startle and surprise effects has been acknowledged as a contributing factor in several incidents [13] and accidents [14]. Consequently, the International Civil Aviation Organization (ICAO, 2013), as well as the European Aviation Safety Agency (EASA, 2015) and the Federal Aviation Administration (FAA, 2015), called for the development of pilot training courses dedicated to the management of this situation. It is crucial to develop methods and tools to predict and detect such detrimental mental states and to identify potential underlying factors (e.g., personality traits, anxiety, mental workload management, etc.) that could promote individual vulnerability to mental incapacitation. To this end, we developed a two-phase experimental protocol consisting of (1) a cognitive and personality screening phase followed by (2) a flight simulator phase implementing a startling scenario during which physiological, behavioral, and subjective psychological measurements were acquired. We hypothesized that inter-individual differences in vulnerability to mental incapacitation might be detectable via different subjective, behavioral, and physiological cues, for instance by linking anxiety level, emotional regulation abilities or mental workload management skills measured during the first phase to the management of the startling scenario during the flight simulator phase. We also hypothesized that participants demonstrating a more intense startle reflex after the startling stimulus might demonstrate more degraded flight performance. Finally, we investigated whether ocular (i.e., lower blink rate, shorter blink duration) and cardiac (i.e., increased heart rate) activity could predict in some way the performance during both phases. The purpose of this article is to present the protocol and preliminary results of our ongoing experiment. In addition to presenting the data collection process, we also aim to demonstrate the validity of the flight scenarios to elicit surprise and startle in our pilots.

2 Method In order to identify behavioral and physiological cues of mental incapacitation in the context of startle and surprise during aircraft piloting, we set up a two-phase experiment. The first one included different cognitive tests as well as personality questionnaires aiming at profiling the participants, especially on the dimensions related to stress and anxiety. The second one consisted of a flight session on an A320 simulator in which the participants had to perform two flight scenarios: a baseline scenario and a startling scenario.


2.1 Participants The participants in this experiment were student pilots from the French national civil aviation school (ENAC). They were all trained to fly and had at least 20 h of experience on an A320 simulator (10th training session of the Multi-Crew Coordination Course (MCC) program). The experimental protocol was approved by the research ethics committee of the University of Toulouse (n°2022–524). The flight scenario was designed in collaboration with the instructors in charge of pilot training at ENAC. Currently, data from six participants (5 males; mean age: 21 years, SD: 1.06) have been acquired. Their average overall flight experience is 301.66 h (SD: 88.41 h), with 37 h (SD: 5.89 h) on the A320 simulator. More participants are expected to be involved in the upcoming months.
2.2 Phase 1: Questionnaires and Laboratory Tests During the first phase of the study, the participant was first asked to fill in and sign a consent form providing the main information concerning the study and indicating that the main purpose was the study of mental load. They were told that various behavioral and physiological measures would be taken and that recorded data would remain strictly confidential and would never be used to affect their training or future career. They were also told that the protocol had been designed in agreement with the flight instructors responsible for training at ENAC. The participants were then asked to complete five questionnaires. The first one was about the use of different substances (e.g., coffee, tea, drugs) as well as physical activity and sleep during the last 24 h. This information was used to check for the presence of factors that may influence the results. The second one, the multidimensional emotion questionnaire (MEQ) [15], assesses emotional reactivity and regulation. The third questionnaire was a short version of the Big Five Personality Inventory (BFI-10) [16], which evaluates the participants’ personality positioning on the dimensions of openness, conscientiousness, extraversion, agreeableness and neuroticism. The fourth one was the State-Trait Anxiety Inventory (STAI) [17]. Finally, the fifth and last questionnaire was the Perceived Stress Scale (PSS-14), which estimates how participants perceive stress [18]. Once the questionnaires were completed, the participant was fitted with a Bluetooth Faros 360 electrocardiographic belt and was invited to sit in front of the experimental computer for eye-tracker calibration. A pupillometric baseline was then performed; it consisted of the participant staring at a white cross displayed on the screen for 2 min. This 2-min phase was also used to perform a cardiac baseline. A second baseline, with the same purpose and following the same procedure, was also performed at the end of the first experimental phase. After these preparatory steps, the instructions for the first experimental phase were given. This first experimental phase involved completing three cognitive tasks: the Toulouse N-back task (TNT) [19, 20], the priority management task (TGP) (e.g., [21]), and the multi-attribute task battery (MATB), developed in its open-source version [22, 23]. The presentation order of the tasks was balanced among all participants, with six possible orders determined and participants randomly assigned to them. The three cognitive tasks were displayed and recorded using the Tobii Spectrum eye-tracking system.
Eye-tracking, ECG, and performance data were synchronized using the Lab Streaming Layer (LSL) and its accompanying LabRecorder software [24]. The response devices were adapted to match the cognitive task being performed: a response box (Cedrus box)


for the TNT, two joysticks and a keyboard for the TGP, and a joystick, a keyboard, and a headset for the OpenMATB. Each task was accompanied by specific training, including detailed instructions on the subtasks, displays, and input devices. During the training, participants were allowed to ask for clarifications, and the experimenters monitored the process to ensure a clear understanding of the instructions. Toulouse N-back task (TNT). The Toulouse N-back task consisted of an n-back task in which the material to be memorized was the result of arithmetic operations: the addition or subtraction of two numbers. Two levels of difficulty were presented in alternating order: 0-back and 2-back. For the 0-back difficulty level, the participant had to perform the mental calculation and then compare the result to the value “50”. If the result was 50, they had to press the green button located on their right-hand side; if the result did not match, they had to press the red button located on their left-hand side. The participant had 3 s to answer before the next operation appeared on the screen. For the 2-back difficulty level, the operations and presentation times were similar. However, the participant had to compare the result of the operation currently presented with the result of the operation presented on the penultimate trial. The participant therefore always had to keep track of the results of the two previous operations. The responses were recorded along with the associated response times. Each difficulty level block consisted of 12 consecutive operations separated by the presentation of 6 neutral operations (i.e., 00 + 00). These blocks of neutral operations were identical to the n-back operations in terms of visual stimulation but did not require any response from the participant. A total of six 0-back blocks and six 2-back blocks were presented. The total duration of the task was 9 min. Priority Management Task (TGP). The priority management task (TGP) is used by ENAC in the pilot selection procedure. It consists of four parallel subtasks, each of a different nature. The first subtask, called ‘tracking’, consisted of the participant operating the right-hand joystick to maintain a cursor in a circle moving within a defined space on the screen. The second subtask, called ‘supervision’, consisted of the participant monitoring 4 gauges (2 horizontal and 2 vertical), detecting any movement of the cursor outside of a target area, and correcting it with the second joystick when necessary. The third subtask, called ‘recognition’, consisted of participants identifying and reporting 3 target letters among a set of 9 letters randomly presented and regularly renewed. Responses were provided via the keyboard. The fourth and final subtask, called ‘calculation’, consisted of performing mental calculation operations and providing the answer as quickly as possible using the keyboard. The TGP interface also had performance bars indicating the participant’s global and detailed performance on the task. The difficulty of the TGP was manipulated through the frequency of events to be handled. After a 6-min training session, the participant was invited to engage in the first of three blocks, of intermediate difficulty (i.e., 1 event every 20 s), followed by a second block of low difficulty (i.e., 1 event every 30 s) and finally a block of high difficulty (i.e., 1 event every 10 s). The whole task took about 18 min. Open Multi-attribute Task Battery (MATB).
The Open Multi-Attribute Task Battery (MATB) [22] simulates complex tasks a pilot performs during flight. It requires the participant to perform four subtasks of different nature in parallel. The first subtask, called ‘system monitoring’, requires the participant to monitor four vertical gauges and two lights, to detect any abnormal behavior, and to correct it by pressing the appropriate


key on the keyboard. In the event of a non-response or an erroneous response, red feedback appeared on the gauge concerned by the error. For this subtask, the difficulty was manipulated through the frequency of the events to be corrected. The second MATB subtask, ‘tracking’, requires the participant to use a joystick to maintain a cursor in a target. Constant movements of the cursor must be corrected by the participant to keep the cursor in the target. For this subtask, the level of difficulty was manipulated through the size of the target. At the easiest level, the target area occupied almost the entire space in which the cursor could be moved, whereas at the most difficult level, the target covered only the extreme center of the space. The third subtask, called ‘communications’, involved the auditory modality and consisted of listening to audio messages related to radio channels. The graphical interface displays four different radios (COM 1, COM 2, NAV 1 and NAV 2) and a frequency for each of these radios. Above this interface was the code identifying the vehicle being piloted by the participant. The task was to detect the radio messages mentioning the vehicle’s code and to adjust the frequency of the radio mentioned in the message. For example, if the participant’s device was the BX215, the message to be detected could take the following form: “BX215 NAV 1 135.4”. The participant then had to use the keyboard to select the NAV 1 radio, then change the frequency to 135.4 using the left/right arrows, and finally validate the instruction using the enter key. Green or red feedback was provided for correct or erroneous answers. Finally, the last subtask, called ‘resource management’, simulated the monitoring and management a pilot may have to perform to control the depleting levels of the different fuel tanks of an aircraft. This task presented the participants with six tanks, two of which were main tanks. These two main tanks visually presented their filling level as well as markers symbolizing the target level range outside of which the fuel level must not be allowed to fall. While performing this task, both main tanks’ levels gradually depleted if no action was taken. This level depletion was used to vary difficulty. To deal with this issue, the system had eight pumps connecting the different auxiliary tanks to the main ones, which could be activated via the pump number on the keypad. The pump status and flow rate were displayed on the interface. In some instances, the pumps could experience temporary malfunctions, rendering them inoperable and causing them to be displayed in red. Although the MATB has been used very frequently in the literature, there are inconsistencies in how it is employed, e.g. some researchers did not always use all subtasks in parallel [25], and the difficulty levels were not always handled and documented in the same way [26–28]. In order to better manage this issue, the OpenMATB development team created a scenario generator that allows fine-grained adjustment of test duration and difficulty level. In our study, we generated a 22-min scenario in which the difficulty increased by 4.5% per minute, from 0% in the first minute to 100% in the last minute.
As the objective of the task is to manage all four subtasks in parallel as much as possible, the OpenMATB development team designed a real-time performance feedback taking the form of a vertical bar that filled in green between 0 and 100 according to the ability of the participant to correctly handle all the subtasks at the same time. To avoid strategies of prioritizing one subtask over the others, it was decided that the level of performance displayed by this bar would be based on the performance of the least well executed subtask at the time. Thus, for example, it was possible that the value of the performance bar would first reflect the performance of the tracking task and then, a


few seconds later, the performance of the communication management task if the latter had been neglected by the participant. To limit the possibility of abandonment in the case of very high difficulty and very poor performance, the instantaneous performance could not visually fall below 20%. In this case, the bar turned red and remained at this 20% level. As soon as the participant’s performance improved, the performance bar turned green again. At the end of each of these three tasks, participants were asked to evaluate the difficulty of their experience using the NASA TLX scales, providing their subjective ratings of the most challenging period. Once these three tasks were completed, the electrocardiographic equipment was removed from the participant and an appointment was made for the second experimental phase.
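As an illustration of the feedback logic and the difficulty ramp described above, the following minimal Python sketch computes a displayed bar value and a scenario difficulty. It is written for this description only and is not taken from the actual OpenMATB code; the subtask names and the exact update rule are assumptions.

def global_performance(subtask_scores, floor=20.0):
    # Displayed bar value: the worst of the four subtask scores (0-100),
    # never shown below the 20% floor (below which the bar turns red).
    return max(floor, min(subtask_scores.values()))

def difficulty(elapsed_min, ramp_per_min=4.5):
    # Scenario difficulty in percent, increasing by 4.5% per minute over the 22-min run.
    return min(100.0, ramp_per_min * elapsed_min)

# Example: tracking is the weakest subtask, so it drives the feedback bar.
scores = {"sysmon": 85.0, "tracking": 15.0, "comms": 70.0, "resman": 90.0}
print(global_performance(scores))  # 20.0 -> clamped to the red floor
print(difficulty(10))              # 45.0% difficulty ten minutes into the scenario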

2.3 Phase 2: Flight Simulation Tasks The second experimental phase took place on an A320 flight simulator (ACHIL laboratory), a few days after the first phase (maximum = 30 days). First, the participant was reminded of the general objectives of the experiment (except for the elements relating to startle and surprise). The participant filled in a questionnaire on the consumption of psychoactive substances (e.g., coffee, tea, alcohol, medications, drugs) over the last 24 h and a Geneva emotion wheel (GEW) [29] in order to establish a subjective emotional baseline. They were then fitted with the electrocardiogram belt, equipped with a Tobii Glasses 2 eye tracker, and electromyographic electrodes (Biopac MP150) were placed on their left sternocleidomastoid muscle. In addition, flight data and control actions were recorded and synchronized via LSL. A camera was also positioned in the cockpit to film the participant’s face in order to visually identify startle reactions and emotional facial expressions (ground truth). Finally, they were briefed about the main characteristics of the simulator. Participants underwent a 30-min training session to familiarize themselves with the flight controls and with working with their Pilot Monitoring (PM), who was played by a fellow pilot. The participants occupied the right seat and held the Pilot Flying (PF) role throughout all scenarios, while the simulator manager was responsible for handling ATC communications. Once the training was completed and the eye-tracker calibrations were carried out, the experimenters gave the flight scenario briefing. Two flight scenarios were performed, each consisting of a landing at Orly airport. One of the two scenarios was defined as ’baseline’ (BL) and did not involve any unusual event, while the other was defined as ’startle’ (SRTL) and involved a series of complications, including a startling event. The presentation order of the two scenarios varied between participants to control for order effects. Both scenarios started with a holding pattern on approach at 3000 ft in instrument meteorological conditions (IMC) with a ceiling of 300 ft and no wind. In the BL scenario, the participant had to perform a standard approach and landing in collaboration with the PM. In the SRTL scenario, the participant was informed during the briefing that a thunderstorm cell was present north of the airport and that the weather radar was out of order. After leaving the holding pattern, a ‘low fuel’ alert was displayed in the cockpit. As time went on, this alert was repeated, suggesting a potential fuel leak and therefore limiting the participant’s ability to make a go-around decision. The participant was thus expected to request an emergency landing, which was automatically approved by the ATC. On the


interception of the localizer, approximately 9 min after the start of the flight scenario, the PM became incapacitated and reported his status by means of a written board, which he made sure was seen. The participant then had to prepare for a solo landing, as the PM would not perform any task or respond to requests. Approximately 45 s after this event, a loud thunder sound (>90 dB) and a flash of lightning were played to simulate the aircraft being struck by lightning. This event, known to trigger a startle response [30–33], caused a major electrical failure, resulting in the loss of several displays as well as the autopilot and auto-throttle systems. Figure 1 shows the screens that were lost during the electrical failure. From this point onwards, the participant had no option but to carry out a high-workload manual landing.

Fig. 1. Illustration of the A320 cockpit interfaces during the Emergency Electrical configuration.

After completing the flight scenarios, participants were asked to complete the NASA TLX questionnaire, a second Geneva emotion wheel (GEW), and a failure recovery scale [34]. A debriefing was then conducted by all members of the experimental team (i.e., experimenters, PM and simulation manager). The participants were invited to informally describe their feelings about the flight phase and their reading and understanding of the situation. They were asked to describe the course of events in the SRTL scenario and their impressions in terms of control of the situation. More specific questions were then asked in order to verify whether certain events such as the low fuel alert, the incapacitation of the PM, or the master warning had been properly noticed. If necessary, they were also asked to explain what strategy they had put in place. During the debriefing, the participant was also asked to indicate what the worst moment had been and whether at any point in the scenario they had felt that they could not land the plane or had disengaged from the situation because they felt it was not credible. Following this debriefing, 5-point Likert scales were provided to the participant to assess their level of surprise and the subjective amplitude of their startle for the different events in the scenario (i.e., low fuel, PM incapacitation, lightning, thunder sound, electrical


failure). A last questionnaire, the Impact of Event Scale - Revised (IES-R) [35], had to be completed at home 7 days after the experiment. This latter questionnaire was designed to assess the possible longer-term impact of the stressful events experienced during the SRTL scenario. Finally, a more detailed explanation of the aims and objectives of the study was given to the participant, who was told that the SRTL scenario was designed to be stressful and difficult and that there was no single best way to deal with the situation. The participant was encouraged to contact the experimenters and the instructors again if they had any questions or concerns about the experience.
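For concreteness, a marker stream such as the following could be used to time-stamp the scripted scenario events on the same LSL clock as the eye-tracking, ECG, EMG and flight-data streams recorded with LabRecorder. This is an illustrative sketch only, assuming the pylsl Python bindings; the stream name, event labels and source id are hypothetical and not taken from the study's software.

from pylsl import StreamInfo, StreamOutlet, local_clock

# Irregular-rate string stream carrying one scenario event label per sample.
info = StreamInfo(name="ScenarioMarkers", type="Markers", channel_count=1,
                  nominal_srate=0, channel_format="string", source_id="achil_a320_sim")
outlet = StreamOutlet(info)

def mark(event: str) -> None:
    """Push a scenario event with an LSL timestamp shared by all recorded streams."""
    outlet.push_sample([event], local_clock())

# Called by the simulation manager's script at the scripted moments, e.g.:
# mark("low_fuel_alert"); mark("pm_incapacitation"); mark("thunder_and_lightning"); mark("electrical_failure")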

3 Preliminary Results Data collection and analysis are ongoing; thus, the preliminary results presented here primarily focus on outlining the participants’ reactions to the thunder sound and the flash of lightning and their personality and demographic characteristics, as well as on assessing the suitability of the experimental material in meeting the original objectives.
3.1 Auto-evaluation of Surprise and Startle During the Startling Flight Scenario Subjective evaluations of surprise and startle have been collected for the main events of the SRTL scenario. Figure 2 shows the current distribution of these ratings. First, the sound of thunder was widely perceived as both surprising and startling. In contrast, the flash of lightning did not elicit a similar response, as most participants reported not noticing it. The other three events in the scenario, including the low fuel warning, the PM incapacitation, and the major electrical failure, were rated as surprising but not startling. These subjective evaluations serve to validate both the surprise and startle nature of the SRTL scenario and the distinction between surprise and startle previously documented in the literature.

Fig. 2. Distributions of surprise and startle auto-evaluations for each event of the SRTL flight scenario.


3.2 Startle Reflex During the Thunder Sound and the Flash of Lightning Two researchers analyzed the video recordings of the pilots (ground truth) during the occurrence of the thunder noise. Based on the description of facial expressions of startle and surprise in [7], associated with electromyogram (EMG) signal analysis, the level of startle reaction was assessed. Based on the video recordings, out of the 6 participants, 3 displayed a startle reflex (1 strong and 2 moderate startles), 1 simply blinked, and 2 could not be classified due to recording issues. Figure 3 depicts the EMG signal of the sternocleidomastoid muscle activity of the participant classified as showing a strong startle reaction. These behavioral and EMG data will be correlated with the participants’ personality parameters and performance obtained from the three tests of the first phase.

Fig. 3. Illustration of the muscle activity of the left sternocleidomastoid in one participant just after the startling stimulus (thunder noise). This participant demonstrated a visible startle reflex.
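To illustrate how the sternocleidomastoid EMG could be reduced to a single startle-magnitude score, here is a minimal Python sketch. The filter settings, window lengths and the z-score metric are assumptions chosen for illustration, not the values or method used in the ongoing analysis; the sketch assumes a sampling rate of at least about 1 kHz and at least one second of signal before the stimulus.

import numpy as np
from scipy.signal import butter, filtfilt

def startle_magnitude(emg, fs, stim_idx, pre_s=1.0, post_s=0.4):
    # Band-pass to the usual surface-EMG band, rectify, then smooth into an envelope.
    b, a = butter(4, [20, 450], btype="bandpass", fs=fs)
    rectified = np.abs(filtfilt(b, a, emg))
    b_env, a_env = butter(4, 10, btype="lowpass", fs=fs)
    envelope = filtfilt(b_env, a_env, rectified)
    # Compare the post-stimulus peak to the pre-stimulus baseline (z-score).
    baseline = envelope[stim_idx - int(pre_s * fs):stim_idx]
    post = envelope[stim_idx:stim_idx + int(post_s * fs)]
    return (post.max() - baseline.mean()) / baseline.std()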

3.3 Personality Evaluation The initial results from the personality, stress, and anxiety surveys suggest that the recruited participants exhibit moderate levels of anxiety (as shown in Table 1). They tend to be slightly extroverted, have slightly higher levels of agreeableness and openness, and are lower in neuroticism than the general population. Notably, they display higher levels of conscientiousness than average individuals (as shown in Fig. 4). These findings are largely consistent with the personality traits commonly observed in airline pilots, including higher agreeableness and conscientiousness, and lower levels of neuroticism [36].


Table 1. Stress and anxiety scores in PSS-14, STAI-State and STAI-Trait.

        PSS-14                        STAI-State                    STAI-Trait
        Raw score    Percent score    Raw score    Percent score    Raw score    Percent score
Mean    19.66        35.11            31.33        18.88            33.5         22.5
SD      5.50         9.82             5.27         8.79             7.39         12.32
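The relation between the raw and percent scores is not stated in the text; one reading that is consistent with every value in Table 1, assuming the standard score ranges (PSS-14: 0–56, STAI: 20–80), is a linear rescaling of the raw score onto the instrument's range:

\[ \mathrm{PSS\text{-}14}_{\%} = 100 \cdot \frac{\mathrm{raw}}{56}, \qquad \mathrm{STAI}_{\%} = 100 \cdot \frac{\mathrm{raw}-20}{80-20} \]

For example, the mean STAI-Trait raw score of 33.5 would give 100 × (33.5 − 20)/60 = 22.5, matching the table; the standard deviations scale by the same factors, without the offset.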

Fig. 4. Distribution of BFI scores according to personality dimensions (horizontal red line represents the population average score [16]).

3.4 Workload During the Three Tasks and the Startling Flight Scenario Regarding the ratings of workload for each task at its most challenging level, the preliminary results depicted in Fig. 5 suggest that the mental demand is similar for the TGP, the MATB, and the startling flight scenario, and possibly slightly lower for the TNT. A similar trend seems to emerge for the temporal demand and, pending further confirmation, for the effort dimension. The performance dimension appears to have a wide range of ratings for all tasks, except possibly for the TNT. The physical demand appears to be more diverse across tasks, and additional data may reveal a clearer pattern if one exists. The three cognitive tasks (TGP, MATB, and TNT) appear to cause a similar level of frustration, which varies considerably among participants. The startling flight scenario is particularly noteworthy for causing both higher levels of frustration and a smaller dispersion of participants’ ratings. The similarities between the ratings for the TGP and MATB and the rating of the flight simulation indicate that simplified tasks such as the TGP and MATB mimic the workload of piloting activities relatively well. These findings align with historical data from the literature [36].


Fig. 5. Distributions of the six NASA TLX scale scores for each task.

4 Discussion Our preliminary results confirmed that the experimental protocol suited the objectives of the study. Firstly, the subjective evaluations showed that a markedly higher number of participants reported feeling startled by the thunder sound compared to the other stimuli. Also, the reactions to the thunder sound and flash of lightning differed among the participants. Although a majority of them exhibited a startle reflex, its intensity varied across participants. Some had a strong reaction while others only blinked or had minimal but still typical startle facial expressions. This variation in susceptibility to the startle reflex will be thoroughly examined by comparing the results from phase 1 to those of phase 2. For instance, it is possible that those with higher levels of anxiety had a stronger startle reflex and a more degraded piloting performance. Electromyographic analyses are still ongoing, but they suggest a good sensitivity of the sternocleidomastoid to the startle reflex. As a second result, it should be noted that the cognitive tasks of phase 1 appear to be consistent with the flying task regarding the level of subjective workload, at least for the two multitasking environments (TGP and OpenMATB). This supports the notion that laboratory tests can provide valuable insights into behavior, particularly from the perspective of mental workload management. Finally, the personality results of our pilots indicate that they have lower levels of stress, higher levels of conscientiousness, and a lower neuroticism level in comparison to the general population. These findings support the specificity of the psychological profile of pilots, who are selected based on personality traits that are known to be advantageous for managing stressful situations (as referenced in a previous study). Our next step is to investigate the impact of these personality traits on the management of the startle reflex, which is a crucial aspect of this research.


5 Conclusion Our study confirmed that a flight scenario with an unexpected and intense thunder sound is a powerful source of surprise and startle reflex, and can be used effectively to elicit this behavior in research and other applications. Our preliminary findings suggest that susceptibility to the startle reflex varies among individuals. This ongoing work aims to gain a deeper understanding of which personality traits contribute to these differences. Potential applications of this study relate to pilot selection and training as well as the in-flight detection of startle reflex and incapacitation. Acknowledgements. The authors would like to thank Yves Rouillard for his contribution to the development and implementation of the scenarios in the flight simulator. This work was funded by the French National Research Agency (ANR) in the framework of the ANR ASTRID-Maturation “EYE-INTERACTION” project (ANR-19-ASMA-0009).

References
1. ICAO Doc 8984 “Manual of civil aviation medicine”, 3rd edn (2012)
2. Durantin, G., Gagnon, J.F., Tremblay, S., Dehais, F.: Using near infrared spectroscopy and heart rate variability to detect mental overload. Behav. Brain Res. 259, 16–23 (2014)
3. Dismukes, R., Goldsmith, T.E., Kochan, J.A.: Effects of acute stress on aircrew performance: literature review and analysis of operational aspects (2015)
4. Vine, S.J., Uiga, L., Lavric, A., Moore, L.J., Tsaneva-Atanasova, K., Wilson, M.R.: Individual reactions to stress predict performance during a critical aviation incident. Anxiety Stress Coping 28(4), 467–477 (2015)
5. Kinney, L., O’Hare, D.: Responding to an unexpected in-flight event: physiological arousal, information processing, and performance. Hum. Factors 62(5), 737–750 (2020)
6. Martin, W.L., Murray, P.S., Bates, P.R., Lee, P.S.Y.: Fear-potentiated startle: a review from an aviation perspective. Int. J. Aviat. Psychol. 25(2), 97–107 (2016)
7. Ekman, P., Friesen, W.V., Simons, R.C.: Is the startle reaction an emotion? J. Pers. Soc. Psychol. 49(5), 1416 (1985)
8. Rivera, J., Talone, A.B., Boesser, C.T., Jentsch, F., Yeh, M.: Startle and surprise on the flight deck: similarities, differences, and prevalence. In: Proceedings of the Human Factors and Ergonomics Society 58th Annual Meeting, pp. 1047–1051. Human Factors and Ergonomics Society, Santa Monica, CA (2014)
9. Foss, J.A., Ison, J.R., Torre, J.P., Jr., Wansack, S.: The acoustic startle response and disruption of aiming: II. Modulation by forewarning and preliminary stimuli. Hum. Factors 31(3), 319–333 (1989)
10. Lang, P.J., Bradley, M.M., Cuthbert, B.N.: Emotion, attention, and the startle reflex. Psychol. Rev. 97, 377–395 (1990)
11. Dehais, F., Peysakhovich, V., Scannella, S., Fongue, J., Gateau, T.: Automation surprise in aviation: real-time solutions. In: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pp. 2525–2534 (2015)
12. Sarter, N.B., Woods, D.D., Billings, C.E.: Automation surprises, vol. 2. Wiley, New York (1997)
13. FBU711, BEA: https://bea.aero/fileadmin/user_upload/BEA2020-0065.en.pdf (2020)
14. AF-447, BEA: https://bea.aero/docspa/2009/f-cp090601.en/pdf/f-cp090601.en.pdf (2012)


15. Klonsky, E.D., Victor, S.E., Hibbert, A.S., Hajcak, G.: The multidimensional emotion questionnaire (MEQ): rationale and initial psychometric properties. J. Psychopathol. Behav. Assess. 41(3), 409–424 (2019). https://doi.org/10.1007/s10862-019-09741-2
16. Courtois, R., et al.: Validation française du big five inventory à 10 items (BFI-10). Encéphale 46(6), 455–462 (2020)
17. Spielberger, C.D.: State-trait anxiety inventory for adults (1983)
18. Cohen, S., Kamarck, T., Mermelstein, R.: A global measure of perceived stress. J. Health Soc. Behav. 24(4), 385–396 (1983). https://doi.org/10.2307/2136404
19. Mandrick, K., Peysakhovich, V., Rémy, F., Lepron, E., Causse, M.: Neural and psychophysiological correlates of human performance under stress and high mental workload. Biol. Psychol. 121, 62–73 (2016)
20. Causse, M., Peysakhovich, V., Mandrick, K.: Eliciting sustained mental effort using the Toulouse n-back task: prefrontal cortex and pupillary responses. In: Advances in Neuroergonomics and Cognitive Engineering: Proceedings of the AHFE 2016 International Conference on Neuroergonomics and Cognitive Engineering, July 27–31 (2016)
21. Matton, N., Paubel, P., Cegarra, J., Raufaste, E.: Differences in multitask resource reallocation after change in task values. Hum. Factors 58(8), 1128–1142 (2016)
22. Cegarra, J., et al.: OpenMATB: a multi-attribute task battery promoting task customization, software extensibility and experiment replicability. Behav. Res. Methods 52, 1980–1990 (2020)
23. Comstock Jr., J.R., Arnegard, R.J.: The multi-attribute task battery for human operator workload and strategic behavior research (NASA-TM-104174). National Aeronautics and Space Administration, Washington (1992)
24. Kothe, C.: Lab streaming layer (LSL) (2013)
25. Gutzwiller, R.S., Wickens, C.D., Clegg, B.A.: Workload overload modeling: an experiment with MATB II to inform a computational model of task management. In: Proceedings of the Human Factors and Ergonomics Society 58th Annual Meeting, pp. 849–853 (2014)
26. Zhang, J., Cao, X., Wang, X., Pang, L., Liang, J., Zhang, L.: Physiological responses to elevated carbon dioxide concentration and mental workload during performing MATB tasks. Build. Environ. 195, 107752 (2021). https://doi.org/10.1016/j.buildenv.2021.107752
27. Yufeng, K., et al.: An EEG-based mental workload estimator trained on working memory task can work well under simulated multi-attribute task. Front. Hum. Neurosci. 8, 703–713 (2014). https://www.frontiersin.org/article/10.3389/fnhum.2014.00703
28. Bowers, M.A.: The effects of workload transitions in a multitasking environment. Doctoral dissertation, University of Dayton (2013)
29. Sacharin, V., Schlegel, K., Scherer, K.R.: Geneva emotion wheel rating study (2012). https://archive-ouverte.unige.ch/unige:97849. Accessed 9 Feb 2023
30. Woodhead, M.M.: The effects of bursts of loud noise on a continuous visual task. Br. J. Ind. Med. 15(2), 120–125 (1958)
31. Sternbach, R.A.: Correlates of differences in time to recover from startle. Psychosom. Med. 22(2), 143–148 (1960)
32. Vlasak, M.: Effect of startle stimuli on performance. Aerospace Med. 40(2), 124–128 (1969)
33. Thackray, R.I., Touchstone, R.M.: Recovery of motor performance following startle. Percept. Mot. Skills 30(1), 279–292 (1970)
34. Hindson, W., Schroeder, J., Eshow, M.: A pilot rating scale for evaluating failure transients in electronic flight control systems. In: 17th Atmospheric Flight Mechanics Conference, p. 2827 (1990)


35. Weiss, D.S.: The impact of event scale: revised. In: Cross-Cultural Assessment of Psychological Trauma and PTSD, pp. 219–238 (2007)
36. Mesarosova, K., Siegling, A.B., Plouffe, R.A., Saklofske, D.H., Smith, M.M., Tremblay, P.F.: Personality measurement and profile in a European sample of civil airline pilots. Eur. J. Psychol. Assess. 36(6), 791–800 (2018)

High-Fidelity Task Analysis Identifying the Needs of Future Cockpits Featuring Single Pilot Operation with Large Transport Airplanes

Lars Ebrecht(B)

German Aerospace Center (DLR), Lilienthalplatz 7, 38108 Braunschweig, Germany
[email protected]

Abstract. This paper presents a high-fidelity task analysis. It examines the actions of the Standard Operating Procedures which are jointly conducted by the pilot flying and the pilot monitoring when operating a large airplane, e.g. an Airbus A320. The analysis of the activities considers the aviate, navigate, communicate and manage relationship, sequential and parallel execution, and the related interfaces and information. The bottom-up approach also investigates how today’s multi pilot operation concepts, like the task share between the two pilots, cross-checks or the four-eyes principle, are implemented. Based on that, a possible reassignment of actions was made towards enabling single pilot operation of a large airplane supported by enhanced assistance. The paper describes the research context, the current regulations for multi pilot and single pilot operation, related work, the method and an insight into the major results. The depicted results and the needs for future developments enabling single pilot operation with large airplanes are discussed.

Keywords: Aviation · Human Machine Interface · Man Machine Interaction · Automation and Assistance · NICo

1 Introduction Due to the evolution of avionics and the development of new technologies, commercial aviation has never been safer or performed better in conducting its daily all-weather operations. Present and new technologies like Autoland or Remotely Piloted Aerial Systems (RPAS) raise the question of whether large airplanes could be operated by just one pilot in the cockpit managing all operations, weather conditions, traffic situations and environmental circumstances, as well as the contingencies that can occur. At present, large airplanes are managed by two pilots – multi pilot operation (MPO). In the beginning, large airplane crews consisted of up to five members – a radio operator, a navigator, a flight engineer, the first officer and the captain. Despite the reduction of crew members and an increased level of automation, the number of tasks required to operate a commercial airplane with two pilots today more or less equals the number of tasks of former large airplanes with five-member crews. Advanced avionics just provide sophisticated automation and assistance functions that enable the two pilots to


aviate, communicate, navigate, manage the aircraft systems and conduct the mission management. In contrast to large airplanes [1], normal category airplanes, i.e. airplanes with a maximum seating configuration of 19 passengers and a maximum take-off weight of 8,618 kg (19,000 lb) [2], are mostly certified to be operated by a single pilot (Single Pilot Operation - SPO). For instance, the Cessna Citation business jets CJ1-CJ4 are certified for single and multi pilot operation. Since 1998, the Cessna CJ1 jet, with a maximum cruise speed of 720 km/h, can be flown by one pilot [3]. “The CJ1 is extremely easy to fly and can be single-pilot operated. The Citation line was designed for forward-thinking businessmen that would fly their own private jets to and from business meetings, resulting in several automated systems and a simple avionics system. For those that don’t plan to fly their own jet, its ability to be flown by a single pilot offers greater flexibility in flight operations and reduced direct operating costs.” [3, 4]. Even though the essential operational tasks concerning aviate, navigate and communicate might be “easy” to fulfill, the mission management, the solving of contingencies and demanding environments must be handled seriously and reliably too. Hence, what are the differences between the operation of a normal complex high-performance airplane with one pilot and a large airplane with two pilots? What is necessary to cope with these different demands in order to operate large airplanes with a single pilot in the future as safely and reliably as today – without being stressed, overloaded or insecure? This paper presents a high-fidelity task analysis (HFTA). The goal of this analysis is to take a concrete view of the activities which are performed when a large airplane is operated by two pilots. On the one hand, the analysis outlines the implementation of the MPO concepts. On the other hand, it provides a concrete view of the task distribution and work share between the two pilots. The results build the baseline for future enhancements, extended automation and further needs of the next generation of cockpits. This paper is structured as follows: it introduces the research context and the regulations concerning single and multi pilot operations, i.e. pilot licensing, aircraft certification and air operation regulations. Previous work as well as related work concerning other task analyses regarding single pilot operation with large aircraft is then outlined. The method and approach of the conducted high-fidelity task analysis are presented, and its results are depicted and discussed.

2 Research Context In the institutionally funded research project “Next Generation Intelligent Cockpit” (NICo), DLR investigates the feasibility of and needs for single pilot operation of large airplanes. The goal is the development of a main concept and functions for future highly automated cockpits. This goal is accompanied by the development and investigation of a Virtual and a Remote Co-pilot (VCP/RCP) in order to evaluate new ways to support pilots. The VCP comprises enhanced intelligent and robust automation and assistance. The RCP represents a qualified and educated ground-based remote pilot. At the beginning of the project, multi pilot crews and pilots who were also experienced with commercial single pilot operations were interviewed concerning the challenges of operating large and high-performance airplanes. Different aspects and situations were


identified and transferred into an event catalog. Problematic situations, i.e. abnormal and emergency situations, were named as just one of the drivers of pilots’ concerns related to single pilot operations with large aircraft. Surprisingly, many concerns referred to normal operation, like handling weather, traffic, specific airport issues or passengers. As a result, one focus of the investigations is on normal operations – so-called standard operating procedures during daily changing conditions and demanding situations like weather, traffic and airport related issues [6]. Contingencies and the collaboration in MPO were investigated in a first simulator study in NICo in 2021/2022.

3 Basics The application of SPO and MPO mainly depends on three regulatory aspects: (1) the rules for aircraft certification, (2) pilot licensing, and (3) air operations. As described before, aircraft are designed, depending on the aircraft category, to be operated by one or multiple pilots – mostly two pilots. Quite a lot of normal category aircraft are certified for SPO and MPO. Large category aircraft must be operated by two pilots applying the multi crew cooperation concept (MCC) [7] and crew resource management (CRM) [8]. The MCC specifies a task division between the pilot flying (PF) and the pilot monitoring (PM) as well as quality and safety concepts like cross-checks (CC) [9]. The certification standards for normal and large aircraft affirm that normal aircraft can be operated by a single pilot and large aircraft by two or more pilots. Accordingly, the certification covers the required equipment, accessibility and ergonomics of the cockpit with respect to SPO and MPO. In the international regulatory context, normal category airplanes are also named small airplanes and large category airplanes are also called transport category airplanes. Besides, the pilot’s license specifies whether the pilot is eligible to operate an aircraft as a single pilot or as part of a multi pilot crew. The commercial pilot license (CPL) permits a pilot to conduct commercial air operations as a single pilot with normal category aircraft certified for SPO [7]. Pilots holding an airline transport pilot license (ATPL) are allowed to conduct commercial operations with large category aircraft under MPO. The ATPL includes the rights of the CPL, i.e. to fly normal category aircraft certified for SPO alone as the single responsible pilot. Besides the roles of PF and PM during multi crew cooperation, the more experienced captain is the pilot responsible for the flight, i.e. the pilot in command (PIC). The roles of PF and PM can be assigned to either the captain or the first officer and can be switched during flight. The rules for air operations (Air Ops) extend the rules for aircraft certification and pilot licensing. The Air Ops comprise rules for the execution of commercial air transport with normal and large category aircraft [10]. They include, for instance, the requirements for the pilots and the minimum equipment depending on the operations and circumstances of the intended flight. Examples of these influencing factors are whether a flight will be conducted during night or under instrument meteorological conditions, with autoland operation, or in demanding surroundings like flying in mountainous regions, take-off and landing at airports with demanding procedures or infrastructure, and more. Depending on that, the Air Ops specify that pilots must have practiced these conditions in the recent past. Additionally, in the case of conducting commercial operations with SPO-certified normal


category aircraft, it may also be required that an operation be conducted by two pilots as MPO. Due to the demanding all-weather operation of commercial air transport, in most cases normal category aircraft are also accompanied by a second pilot. A prerequisite of all operational activities is that a pilot holds a type rating for the aircraft to be operated. This ensures that a pilot has the knowledge and skills to manage a certain type of aircraft in normal and degraded situations as well as being familiar with handling all the systems of the aircraft. The challenging question is: what can be enhanced and what is needed in future commercial all-weather air transport operation to enable SPO of large airplanes as safely and reliably as MPO today? Is a remote co-pilot on demand a fill-in, or what can be handled by an intelligent assistance system?

4 Previous Work At the beginning of the project, the state of the art concerning present commercial air transport was analyzed. Five domains were considered, i.e. regulations, operation, avionics, technologies and human factors. This was followed by interviews of professional pilots and multi pilot crews in order to collect the demands concerning present daily commercial all-weather operation with large airplanes as well as the challenges of future SPO with large airplanes (SPO+) [6]. 19 pilots with an ATPL or CPL were asked about their experiences with commercial aviation. 11 were airline pilots, 5 came from military transport and fighter operations and 3 from the executive charter business. 8 of them were instructors or examiners. 5 of the pilots had experience with SPO. The more experienced captains stated that they could more or less imagine operating large airplanes as a single pilot. In contrast, the first officers who were asked disagreed with SPO of large airplanes. Overall, the pilots who agreed in principle to conduct SPO with large airplanes and those who disagreed were balanced. The named demands and challenges were workload and work share, decision making, contingencies, emergencies, fatigue, and pilot incapacitation, amongst others. The interviews resulted in the identification of 74 relevant aspects, events and situations. These refer to the following five areas:

1. normal operations and situations
2. demanding conditions and events
3. abnormal and problematic events and situations
4. emergency situations
5. special aspects concerning future SPO with large airplanes

Surprisingly, the interviewees gave more extensive explanations concerning normal and demanding situations than problematic and emergency situations. One reason for that might be that contingencies and emergencies represent more or less narrow and specific situations, unlike normal operations and demanding conditions, which can vary a lot and for which the resulting actions span a larger field of possibilities. Hence, normal operation and demanding situations due to weather, traffic, terrain, communication, airport or passenger related issues represent compulsory aspects that have to be treated in future SPO with large airplanes as safely and reliably as today. Accordingly, present MPO


concepts and procedures have to be investigated for their implementation into enhanced intelligent assistance and automation systems, as well as to provide a proper basis to avoid overload, stress, overconfidence or irksome situations with too low demands. The main question to be addressed is: what are the basic activities and everyday tasks in the cockpit of large airplanes?

5 Related Work Before presenting the results of published task analyses and related work, a common basis of task assignment is introduced first. In principle, the conducted tasks and activities are related to the following four domains: aviate, navigate, communicate and manage aircraft systems (ANCM) [11]. These domains are quite often used to group or differentiate the tasks and activities performed when operating an aircraft. On top of these domains lie the tasks that correspond to the mission management. Accordingly, all the activities and tasks undertaken by the pilots in the cockpit quite often refer to ANCM+M. In addition, in the case of large airplanes, the multi pilot crew has to respect the multi crew cooperation concept [7] and crew resource management [8]. These MPO concepts specify who is doing what and when; furthermore, they add additional safety and reliability concepts like the execution of cross-checks, joint briefings, the four-eyes principle, and the collaborative application of checklists and decision making. The tasks related to the execution of flights with large airplanes have been analyzed for different reasons. Friedrich et al., for instance, generated and published a taxonomy of the tasks involved in operating an Airbus A320. They investigated and discussed the possible transition from conventionally to remotely piloted airplanes and its possible impacts on the function allocation and accessibility of information [12]. The hierarchical task analysis of this work applied the so-called “Social Organization and Cooperation Analysis Contextual Activity Template” (SOCA-CAT) [13]. The hierarchical approach comprises three layers: (1) functional purposes, abstract and generalized functions, (2) physical functions, and (3) physical form. In a first step, concerning the top layer, two functional purposes were named: first, the safe and, second, the efficient execution of a commercial flight. In a following step, four abstract functions were derived which are necessary to achieve the two purposes. Each abstract function can be part of the implementation of different purposes. For instance, for a safe flight operation, representing a functional purpose, the flight parameters should respect the flight envelope of the aircraft. Furthermore, it has to be ensured that the airplane will be separated from any traffic, terrain and meteorological risk and that the airplane’s fuel consumption and load plus the aircraft systems are properly managed. The fuel management as well as adherence to the flight envelope also implement the functional purpose of conducting a flight most efficiently. In a following step, four generalized functions were identified and assigned to the abstract functions, i.e. aviate, navigate, communicate and manage aircraft systems. The four generalized functions of the first layer were broken down into 24 physical functions, like managing speed, heading, altitude, flaps and gear as subparts of the generalized function aviate, or, for instance, the position of the own ship, other traffic, terrain, meteorological effects, nav aids, airports, runways and taxiways as part of the generalized function navigate. The underlying physical forms, representing the third layer,


contain the following eight items: aircraft controls and systems, aircraft performance, weather, terrain, traffic, airports and their infrastructure, the Air Traffic Control Service (ATC) and the airspace structure. The physical forms are the basic elements used by the physical functions. The physical functions of this hierarchical approach are afterwards used to determine in which flight phase each is applied and by whom, i.e. the PF, the PM, or the aircraft’s automation, e.g. the autopilot. Here, the approach and landing operation of a large airplane with two pilots (MPO) was considered. Different levels of automation were also taken into account, i.e. flying in manual, selected or managed mode. Additionally, the information used was identified and assigned to the former physical functions. Sources of information were measured data from the aircraft and the environment. In a next step, the results of the task analysis for the approach and landing with a large airplane with two pilots were taken as a basis for SPO. The former task distribution of the two onboard pilots was rearranged between a single pilot and a remote pilot. Lacabanne et al. published the results of another hierarchical task analysis [14]. This task analysis was motivated by the development of a new Flight Management System (FMS). The FMS supports pilots concerning flight control according to the planned route, adaptation of the route and application of flight procedures, e.g. departure, arrival, approach and landing. The flight management system represents the base for the navigation, communication with ATC and fuel management. The focus of the analysis lay on flying an airplane, i.e. ANCM. In order to identify the essential functions for the operation of the airplane, seven pilots were interviewed. One specialty was that the study regarded different types of airplanes – from normal single-engine airplanes, through complex high-performance multi-engine turbine airplanes, to large airplanes and even fighter jets. The task analysis comprised three levels of functions and activities, one layer with related interfaces and a last layer with linked information. The first functional level contained the four domains – ANCM, the second one nine major functions and the third one ten subfunctions. The first functional domain, navigation, focused on the positioning along the flight route from start until destination. Aviate addressed manual and automated flying with respect to the envelope. The communication actions comprised the communication among the pilots as well as with ATC. Manage systems mainly contained the system monitoring regarding system and engine parameters. Furthermore, the analysis considered a rough annotation of parallel and sequential task execution. The ANCM domains were regarded as parallel, i.e. their actions were interleaved, whereas the actions within each domain follow a sequential order. In comparison to the aforementioned hierarchical high-level task analyses, Wolter and Gore conducted a detailed cognitive task analysis in order to investigate current MPO as a baseline for the evaluation of a concept for future SPO supported by a remote pilot [15]. Their concept foresees the support of the single pilot by a remote pilot or so-called ground operator in nominal operation and in case of contingencies. The flight phases regarded were the approach and landing. A task decomposition spreadsheet was used to list all the actions by the PF, the PM, the automation, ATC and dispatch.
The task analysis covered a nominal and an abnormal scenario, i.e. the approach and landing in Denver and a diverting due to weather conditions. These two scenarios were taken as baseline for the MPO case and for SPO with two different variants of the support by a ground operator. Based on that a workload analysis and comparison of the cases was done.

Stanton et al. applied a SOCA-CAT for MPO [16]. The specialty of this analysis is that the flight as a whole was regarded: 16 flight phases and 28 functions were considered, and four potential models of SPO were compared to current MPO.

6 Method

The conducted task analyses vary considerably concerning the applied methodology and procedures. Some are hierarchical and abstract, some detailed, some narrow – focusing on certain flight phases – whilst others cover the complete flight or all flight phases. While task analyses for MPO have been well examined, further efforts are needed in order to investigate future SPO of large airplanes. Concerning the scope of the project, the level of detail of former work and its results was not sufficient. Especially in order to identify essential and concrete issues for the development of future robust automation and enhanced intelligent assistance, a detailed review of the basic activities and actions during a normal flight was undertaken. This includes all inputs and outputs from the airplane systems to the two pilots – from start-up till shutdown. The major difference of the conducted high-fidelity task analysis (HFTA) to the task analyses described before is that it looks at the basic actions in a bottom-up approach. Additionally, the interfaces as well as the information used were considered. The primary goal of the HFTA is to achieve a concrete view of the work performed by the two pilots. Secondly, the analysis should provide a detailed image, from the operator’s perspective, of how the multi crew cooperation concepts, the crew resource management and the related methods are implemented to ensure an operation that is as safe and reliable as possible. Based on how the main actors in the cockpit jointly interact with the airplane’s systems, the investigation of a new task allocation and assignment towards future SPO+ by introducing enhanced assistance is started.
In the following, the approach of the HFTA is depicted. The HFTA focuses on the Standard Operating Procedures (SOP) of an Airbus A320. The SOP cover all the actions to be done, starting with the cold cockpit of a parked airplane, followed by several other procedures and phases, and ending with reaching the final parking position. In particular, the A320 Quick Reference Handbook (QRH) comprises the procedures listed in Table 1 [17]. Each procedure describes the actions of the PF and PM. Table 2 provides a scheme of the tabular, checklist-like description of the taxi operation. The HFTA contains two iterative main parts: firstly, a common and, secondly, an applied perspective and analysis of the MPO tasks and actions. In the end, the approach comprises two main and two sub-iterations, i.e.:
1. the common analysis of the SOP referring to
   i. the QRH
   ii. the FCOM
2. the applied consideration of the SOPs in relation to
   i. a planned real flight
   ii. a simulation of the planned flight as MPO and SPO+

Table 1. Standard Operating Procedures (SOP) defined in the QRH [17].

 #   Procedure                           #   Procedure
 1   Safety Exterior Inspection         12   Cruise
 2   Preliminary Cockpit Preparation    13   Descent Preparation
 3   Cockpit Preparation                14   Descent
 4   Before Pushback or Start           15   ILS Approach
 5   Engine Start                       16   Non-Precision Approach (Managed)
 6   After Start                        17   Non-Precision Approach (Selected)
 7   Taxi                               18   Landing
 8   Before Take-Off                    19   Go around
 9   Take-Off                           20   After Landing
10   After Take-Off                     21   Parking
11   Climb                              22   Securing the Aircraft

Table 2. Tabular description for taxi operation in the QRH [17].

PF                                      PM
NOSE LIGHT ............... TAXI         TAXI CLEARANCE ........... OBTAIN
PARKING BRAKE ............ OFF          ELAPSED TIME ............. AS RQRD
THRUST LEVERS ............ AS RQRD
BRAKES ................... CHECK        BRAKES PRESS ............. CHECK 0
FLT CTL .................. CHECK        FLT CTL .................. CHECK
AUTO BRAKE ............... MAX          ATC CLEARANCE ............ CONFIRM
TO DATA .................. CHECK        FMGS F-PLAN/SPD .......... CHECK
FCU ALT/HDG .............. SET          FD ....................... CHECK ON
FLT INST & FMA ........... CHECK        * ATC clearance obtained:
* Taxi clearance obtained:              FLT INST & FMA ........... CHECK
...

The first part concerns the detailed task analysis based on the QRH and FCOM. The briefly specified actions of the QRH were analyzed and extended by additional information extracted from the detailed descriptions of the A320 Flight Crew Operations Manual (FCOM) [18]. The first, common part represents an initial general step. The second part addresses the application of the SOP during a specific flight, e.g. from Innsbruck to Hamburg. The applied part considers a real planned flight by instantiating the common part using the data of the planned flight, which additionally is going to be investigated by simulating the flight.
In the following, the content and approach of the common first part of the HFTA is described. The extracted and extended actions from the QRH and FCOM were listed in an MS Excel table. The columns of the QRH procedure description (see Table 2) were expanded for this purpose. The resulting main columns are as follows:
1. ANCM context of each action
2. pilot flying actions
3. interface, i.e. the panel and instrument, used by the PF
4. related information
5. pilot monitoring actions
6. interface, i.e. the panel and instrument, used by the PM
7. related information
8. sequential and parallel execution order
9. new action assignment to a single pilot
10. new action assignment to future assistance
Columns 2–4 and 5–7 contain information derived from the QRH and FCOM. The ANCM context of each action is derived from the action itself. The new assignment of the explored actions was based on an educated guess. Related aspects were whether the SP has to do something, e.g. due to the allocated responsibility, and whether an assistance could execute an action. This also includes the question whether an assistance would have access to the required information or to the interface of the aircraft’s subsystem. As a trivial example, the walkaround and visual inspection of the airplane so far hardly seems executable by an assistance. Assistance in this case means a computation or automation system, e.g. a virtual co-pilot, or a remote ground operator.
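To make the structure of this spreadsheet more concrete, the following sketch shows one possible machine-readable representation of a single HFTA row together with a toy version of the re-assignment heuristic just described. The field names, the enumeration and the rule set are illustrative assumptions, not the actual format used in the analysis.

```python
from dataclasses import dataclass
from enum import Enum


class ANCM(Enum):
    AVIATE = "Aviate"
    NAVIGATE = "Navigate"
    COMMUNICATE = "Communicate"
    MANAGE = "Manage"


@dataclass
class HftaAction:
    """One row of the HFTA spreadsheet (columns 1-10, simplified)."""
    ancm: ANCM                # column 1: ANCM context of the action
    pf_action: str            # column 2: pilot flying action
    pf_interface: str         # column 3: panel/instrument used by the PF
    pf_information: str       # column 4: related information
    pm_action: str            # column 5: pilot monitoring action
    pm_interface: str         # column 6: panel/instrument used by the PM
    pm_information: str       # column 7: related information
    execution_order: str      # column 8: sequential or parallel
    sp_assignment: str = ""   # column 9: new assignment to the single pilot
    as_assignment: str = ""   # column 10: new assignment to future assistance


def assign(requires_responsibility: bool, assistance_has_access: bool) -> str:
    """Toy version of the 'educated guess' re-assignment rule described above."""
    if requires_responsibility and assistance_has_access:
        return "SP/AS"   # jointly executed, e.g. obtaining the ATC clearance
    if requires_responsibility:
        return "SP"      # stays with the single pilot, e.g. the take-off briefing
    if assistance_has_access:
        return "AS"      # can be delegated, e.g. routine checks of switch positions
    return "SP"          # no interface or information access: remains a pilot task
```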

7 Results

The 22 regarded SOPs result in 463 PF actions in total, with 324 unique actions, i.e. without repetitions and without the same actions with different variables. PM actions count 444 in total, with 288 unique actions. The most repeated and varied actions are announcements and orders on the PF’s side and confirmations on the PM’s side. The first two operational procedures, i.e. the safety exterior inspection and the preliminary cockpit preparation, contain 84 actions, which are undertaken solely by the PM. Both procedures mainly check aircraft systems, e.g. by the visual inspection from outside of the landing gear, doors, engines, APU area or battery, and by checking the electric and hydraulic system status, and more. In parallel, the PF starts with the cockpit preparation procedure. This procedure prepares the avionic systems and sets up the flight management and guidance system (FMGS) by inserting the flight plan. Roughly in the middle of this procedure – “when both pilots are seated”, i.e. when the walkaround and the preliminary cockpit preparation are finished – the detailed collaboration of PF and PM starts.

Table 3. Beginning of the multi crew cooperation of PF and PM in the cockpit preparation procedure [17].

PF                                      PM
…
* When both pilots are seated:
Glareshield (PF side):                  Glareshield (PM side):
- BARO REF ............. SET            - BARO REF ............. SET
- FD ................... CHECK ON       - FD ................... CHECK ON
Lateral Console (PF side):              Lateral Console (PM side):
- OXY MASK ............. TEST           - OXY MASK ............. TEST
- PFD-ND brightness .... ADJUST         - PFD-ND brightness .... ADJUST
PF Instrument Panel:                    PM Instrument Panel:
...                                     ...
- PFD-ND ............... CHECK          - PFD-ND ............... CHECK
Pedestal:                               Pedestal:
- ACP1 ................. CHECK          - WEATHER RADAR ........ SET
FMGS data confirmation
- THRUST LEVERS ........ IDLE           - AIRFIELD DATA ........ CONFIRM
- ENG MASTER ........... CHECK OFF      - ATC CLEARANCE ........ OBTAIN
- ENG MODE SEL ......... CHK NORM       - IRS ALIGN ............ CHECK
- PARKING BRAKE PRESS .. CHECK          ...
…
- ATC .................. SET            - F-PLN A and B ........ CHECK
- FUEL QTY ............. CHECK          - ATC CODE ............. SET
- TAKEOFF BRIEFING ..... PERFORM        - FUEL QTY ............. CHECK

The cockpit preparation procedure shows parallel and duplicated actions of PF and PM, e.g. setting the barometric reference, enabling the flight director (FD) and checking the oxygen masks (see Table 3). Due to the fact that the captain (CPT) and the first officer (FO) can take the role of PF and PM, some instruments, controls and input units are duplicated in the cockpit, i.e. present on both the left and the right side of the cockpit. Other controls, like the levers for thrust, flaps, speed brakes, landing gear, and more, have only one instance in the cockpit, but all are reachable from both seats, with restrictions for the landing gear lever. The cockpit preparation procedure depicted in Table 3 also shows parallel, independent actions. This is the case for the confirmation of the airfield data, obtaining the start-up clearance from ATC, and checking the IRS alignment and the flight plan (F-PLN) by the PM. Thereafter follows a re-synchronization in order to set and crosscheck the ATC code and the fuel quantity. Regarding the ANCM assignment and context of the actions in Table 4, the mix of actions concerning the four domains as well as the work share between the PF and PM can be seen. Further, the actions of the four domains are sometimes independent and sometimes related and dependent, causing a sequential order of actions of different domains.

Table 4. Parallel and independent actions of ANCM tasks on the example of the cockpit preparation procedure.

PF                                ANCM           PM                                ANCM
...                                              ...
FMGS data confirmation
- THRUST LEVERS CHECK IDLE        Aviate         - AIRFIELD DATA CONFIRM           Manage
- ENG MASTER CHECK OFF            Aviate         - ATC CLEARANCE OBTAIN            Communicate
- ENG MODE SEL CHECK NORM         Aviate         - IRS ALIGN CHECK                 Navigate
- PARKING BRAKE PRESS CHECK       Aviate         - GROSS WEIGHT INSERTION CHECK    Manage
- TO DATA CALCULATE/CHECK                        - ACP2 CHECK                      Communicate
- ATC SET                         Communicate    - F-PLN A and B CHECK             Manage
- FUEL QTY CHECK                  Manage         - ATC CODE SET                    Communicate
- TAKEOFF BRIEFING PERFORM        -              - FUEL QTY CHECK                  Manage

Apart from that, in the take-off procedure, actions of the same domain, i.e. aviate, are jointly executed by the PF and the PM. As shown in Table 5, the PF commands “Gear up”, “Flaps 1” and “Flaps 0”. The PM executes these commands and confirms their execution. In case of a configuration change when setting the flaps, the PM also checks the current speed and the related constraints for the new flaps setting. This is one part of the cross- and double-checks concerning the execution of essential actions and changes, implementing a four-eyes principle as a safety and reliability function. When re-assigning the PF’s and PM’s actions to a single pilot (SP) and an assistance system or support by a remote pilot (AS), the essential question is whether the assistance
Table 5. Collaboration of PF and PM concerning aviate actions on the example of the take-off procedure.

PF                                ANCM      PM                                ANCM
...                                         ...
* WHEN V/S is positive:                     - ANNOUNCE “POS CLIMB”            Aviate
- ORDER “GEAR UP”                 Aviate    - LANDING GEAR UP                 Aviate
                                            - GRND SPLRS DISARM               Aviate
- A/P AS REQUIRED                 Aviate    - ANNOUNCE “L/G UP”               Aviate
- ANNOUNCE FMA                    Aviate    - ONE PACK ON                     Manage
* At thrust reduction ALT:        Aviate
- THRUST LEVERS CL                Aviate
- ANNOUNCE FMA                    Aviate
* At acceleration ALT:            Aviate
- ANNOUNCE FMA                    Aviate
* At F speed:                     Aviate    - FLAPS 1 SELECT                  Aviate
- ORDER “FLAPS 1”                 Aviate    - CONFIRM/ANNOUNCE “FLAPS 1”      Aviate
* At S speed:                     Aviate    - FLAPS 0 SELECT                  Aviate
- ORDER “FLAPS 0”                 Aviate    - CONFIRM/ANNOUNCE “FLAPS 1”      Aviate
                                            - 2ND PACK - ON                   Manage

has access to the system interfaces and the right information in order to be applied. Furthermore, fundamental aspects, like the responsibility or the decision making, remain fixed to the single pilot’s side. Table 6 demonstrates a possible re-assignment on the example of the cockpit preparation procedure. Many actions previously performed by the PF and the PM might be shifted to an assistance system – provided that there is proper access to the system, using the same or additional interfaces, and that it has access to the controls. Checking does not only mean evaluating the position of a switch or lever; it also includes establishing the right position if the evaluation result is not as it should be. Other actions have to be executed by the SP and AS together. For instance, obtaining the ATC clearance might likely be triggered by the SP; however, the result of requesting the start-up clearance must be forwarded from the AS to the SP. Another example is the calculation and check of the take-off data: the first is done by the FMGS, but the check and validation should be accomplished by the SP. Apart from that, in the take-off procedure today, aviate actions are jointly executed by the PF and PM (see Table 7). This calls for a proper implementation of the collaboration between the SP and AS. Proper in this case means that the assistance can
Table 6. Re-assignment of PF and PM actions to a single pilot (SP) supported by new enhanced assistance system (AS) on the example of the cockpit preparation procedure.

PF                                SP/AS       PM                                SP/AS
...                                           ...
FMGS data confirmation            SP/AS
- THRUST LEVERS CHECK IDLE        AS          - AIRFIELD DATA CONFIRM           AS
- ENG MASTER CHECK OFF            AS          - ATC CLEARANCE OBTAIN            SP/AS
- ENG MODE SEL CHECK NORM         AS          - IRS ALIGN CHECK                 AS
- PARKING BRAKE PRESS CHECK       AS          - GROSS WEIGHT INSERTION CHECK    AS
- TO DATA CALCULATE/CHECK         SP/AS       - ACP2 CHECK                      AS
- ATC SET                         SP/AS       - F-PLN A and B CHECK             AS
- FUEL QTY CHECK                  AS          - ATC CODE SET                    SP/AS
- TAKEOFF BRIEFING PERFORM        SP          - FUEL QTY CHECK                  AS

provide alerts in case of deviations from given constraints, but with the possibility for the single pilot to overrule them and to force the execution of an order for given reasons, as can be the case for the pilot in command in MPO. Another aspect is that situational awareness is cross-checked when the flight mode annunciator (FMA) state changes and the SP makes a callout: an assistance system must check this callout semantically according to the system state and situation. Hence, FMA announcements must be properly implemented and performed by the SP and AS. Additionally, ATC code settings and changes should be announced to the SP when assigned to the assistance system. This requires further changes concerning the integration and interconnection of future aircraft systems.

8 Discussion

None of the actions undertaken to operate a large airplane can be omitted. As figured out before, they have to be evaluated concerning their re-assignment, i.e. to the single pilot or to an onboard or remote assistance. The task distribution of the PF and PM, or rather their multi crew cooperation, results in independent and sequential actions according to the task in a given procedure. The multi crew cooperation comprises a task share, cross-checks and the four-eyes principle. One essential point for the assignment of an action is the feasibility of accessing the system interfaces, i.e. controls and panels, and of accessing the needed information,
Table 7. Re-assignment of PF and PM actions to a single pilot supported by new enhanced assistance system on the example of the take-off procedure.

PF                                SP/AS     PM                                SP/AS
* WHEN V/S is positive:           -         - ANNOUNCE “POSITIVE CLIMB”       AS
- ORDER “GEAR UP”                 SP/AS     - LANDING GEAR UP                 AS
                                            - GRND SPLRS DISARM               AS
...                                         ...
- A/P AS REQUIRED                 SP        - ANNOUNCE “L/G UP”               AS
- ANNOUNCE FMA                    SP/AS     - ONE PACK ON                     AS
* At thrust reduction ALT:        -
- THRUST LEVERS CL                SP/AS
- ANNOUNCE FMA                    -
* At acceleration ALT:            -
- ANNOUNCE FMA                    -
* At F speed:                               - FLAPS 1 SELECT                  AS
- ORDER “FLAPS 1”                 SP/AS     - CONFIRM/ANNOUNCE “FLAPS 1”      AS
* At S speed:                     -         - FLAPS 0 SELECT                  AS
- ORDER “FLAPS 0”                 SP/AS     - CONFIRM/ANNOUNCE “FLAPS 1”      AS
                                            - 2ND PACK - ON                   AS

e.g. checking the airplane condition by walking around. As one outcome of the conducted HFTA, potential developments and adaptations of the human machine interface in the cockpit seem to also entail changes of the airplane system interfaces, enabling an onboard assistance to access the controls and the required information. A remote assistance would also need additional interfaces. However, it is more likely that an additional enhanced onboard assistance system will support a single pilot in a large airplane in the future. Two aspects are crucial concerning the additional interfaces. The first is the fact that the actions assigned to a new assistance effectively implement a multi crew cooperation. This means that the needed interfaces must fit the airplane systems as well as the remaining single pilot. Otherwise, actions of the multi crew cooperation will be separated and, as a result, the new action assignment will not represent the former multi crew cooperation. The second point is that the additional access given to the assistance system, which is equal or similar to the access of the PF and PM (see Table 6), represents a crucial safety case. As previously stated, the PF gives orders to control the landing gear, flaps, speed brakes, and so on. The controls should be accessible, or at least readable, by the assistance in order to implement the cross-check, while the single pilot as responsible pilot in command must always have the ability to overrule any assistance system. Otherwise, if the access is not provided, a lot of actions will be added to those a single pilot has to perform. This might cause additional work and prolong processes in cases where these actions could be done in parallel, e.g. in the cockpit preparation procedure (see Table 6).
Beside the accessibility aspect, the detailed view on the actions and on how safety and reliability features like cross-checks are implemented showed that a re-assignment solely focusing on the ANCM context of an action does not enable the necessary developments and adaptations of future cockpits (see Table 5 and Table 7). Another aspect concerns the communication. A transferred allocation of the communication to an assistance system must be synchronized with the single pilot. This concerns, on the one hand, the initiation of a communication and, on the other hand, properly providing the result and content of the communication, e.g. the communication with the ground staff during push back, the start-up clearance or taxi information. At present, systems like the Aircraft Communications Addressing and Reporting System (ACARS) and Controller Pilot Data Link Communications (CPDLC) are going in the right direction. Nevertheless, such communication systems do not only have to be properly implemented and integrated – they also have to be provided by the ATC service, resulting in environmental efforts and network constraints concerning their availability. A further crucial aspect is the cross-check of the situational awareness. The announcements of the flight mode represent an essential part of checking the mode awareness of the pilot flying by the pilot monitoring. This has to be covered by an assistance as well. Overall, in situations where the single pilot acts as PF, all the other assisted actions must be properly implemented in order to support the single pilot and not cause additional effort for the SP – especially when flying manually using the flight director or when operating on the ground. In addition, the autopilot system has to be more robust in order to support the SP as much as possible, so that the SP will not be forced to take over manual flying activities, including in case of contingencies.

9 Conclusion

This contribution introduced the basics of current commercial aviation by differentiating normal from large category airplanes, pilot licensing, and multi pilot from single pilot operation. The institutionally funded project NICo and preceding work were briefly described, and related work concerning published task analyses was reviewed. Furthermore, the approach of the conducted high-fidelity task analysis was presented and its results discussed. The analysis showed how the multi crew cooperation and other involved concepts like task share, cross-checks and the four-eyes principle are implemented in current MPO. The re-assignment of activities and actions performed by the pilot flying and pilot monitoring to a single pilot onboard and a supporting assistance system showed that the implemented concepts will have to be handled with care when implemented in future systems. The focus on the SOPs, from starting up until shutting down the airplane, was purposely chosen in order to investigate needs and challenges in normal operation as a seeding point and basis for further investigations. Aspects like crew incapacitation and the management of contingencies and emergency situations are the subject of other work packages within NICo. Hence, the requirements for single pilot operation with large airplanes, e.g. enhanced assistance and automation functions, shared control and shared situational awareness, were initially regarded in this work when shifting and re-implementing concrete actions from the second pilot to a single pilot or an assistance system. Hereby, the accessibility of interfaces and used information plays a key role, especially when the support comes from a remote pilot on the ground. The HFTA permitted detailed insights. In a next step, the results of the HFTA will be applied to a complete planned flight with an A320. The instantiation of the SOPs for this flight is going to be examined using the flight planning data and a simulation. Further investigation concerning the feasibility of the transition from MPO to SPO with large airplanes will be made herewith. Apart from the actions of the SOPs, the consideration and simulation of the planned flight will regard additional tasks, e.g. the mission management, observing the mission progress and environmental conditions, as well as the decision-making process when occurrences cause changes of the route or destination.

References

1. EASA: Certification Specifications and Acceptable Means of Compliance for Large Aeroplanes (CS-25) (2023). https://www.easa.europa.eu/certification-specifications/cs-25-large-aeroplanes
2. EASA: Certification Specifications for Normal-Category Aeroplanes (CS-23) (2020). https://www.easa.europa.eu/certification-specifications/cs-23-normal-utility-aerobatic-and-commuter-aeroplanes
3. CESSNA CITATION: C525 Certification Basis (2021). https://www.cessna-citation.com/en/cessna-525/training/general/certification-basis
4. Premier JET Aviation: Cessna Citation CJ1 Specifications (2018)
5. JETADVISORS: Citation CJ1 (2023). https://jetav.com/cessna-citation-cj1-specifications/
6. Ebrecht, L., Niermann, C.A., Buch, J.-P., Niedermeier, D.: Requirements and challenges of future cockpits of commercial aircraft – next generation intelligent cockpit. German Aviation and Space Congress (DLRK) 2021, 31 Aug–2 Sep 2021, Bremen (2021)
7. EASA: Part-FCL (2020). https://www.easa.europa.eu/en/easy-access-rules-flight-crew-licencing-part-fcl
8. Cooper, G.E., White, M.D., Lauber, J.K. (NASA): Resource Management on the Flight Deck. Proceedings of a NASA/Industry Workshop, San Francisco, 26–28 June 1979
9. Flight Standards Service: SAFO15011: Roles and Responsibilities for Pilot Flying (PF) and Pilot Monitoring (PM). Washington DC, 17 November 2015
10. EASA: Easy Access Rules for Air Operations (2022). https://www.easa.europa.eu/document-library/easy-access-rules/easy-access-rules-air-operations
11. Billings, C.E.: Aviation Automation – The Search for a Human-Centered Approach. Lawrence Erlbaum Associates, Mahwah (1997)
12. Friedrich, M., Papenfuß, A., Hasselberg, A.: Transition from conventionally to remotely piloted aircraft – investigation of possible impacts on function allocation and information accessibility using cognitive work analysis methods. In: Stanton, N.A. (ed.) Advances in Human Aspects of Transportation. AISC, vol. 597, pp. 96–107. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-60441-1_10
13. Baber, C., Stanton, N.A., Atkinson, J., McMaster, R., Houghton, R.J.: Using social network analysis and agent-based modelling to explore information flow using common operational pictures for maritime search and rescue operations. Ergonomics 56(6), 889–905 (2013). https://doi.org/10.1080/00140139.2013.788216
14. Lacabanne, M., Amadieu, F., Tricot, A., Spanghero-Gaillard, N.: Analysis of the flight task around different types of aircraft. In: International Conference on Human-Computer Interaction in Aerospace (HCI-Aero), Sep 2012, Brussels, Belgium (2012)
15. Wolter, C.A., Gore, B.F.: A Validated Task Analysis of the Single Pilot Operations Concept. Moffett Field, California (2015)
16. Stanton, N.A., Harris, D., Starr, A.: The future flight deck: modelling dual, single and distributed crewing options. Appl. Ergon. 53, 331–342 (2016). https://doi.org/10.1016/j.apergo.2015.06.019
17. Airbus: Quick Reference Handbook A319/320/321, CCM Airlines
18. Airbus: Flight Crew Operations Manual A318/A319/A320/A321, CCM Airlines

DART – Providing Distress Assistance with Real-Time Aircraft Telemetry

Hannes S. Griebel1(B) and Daniel C. Smith2

1 CGI IT UK Ltd., London, UK
[email protected]
2 University of Hawaii West Oahu, Kapolei, USA
[email protected]

Abstract. ICAO has long recognized the need to transmit data off an aircraft while still in flight, especially in remote areas, in order to locate an aircraft in distress more quickly and to allow investigators more timely access to critical evidence. Meanwhile, the available bandwidth has grown enough for aircraft to transmit the ICAO Annex 6 and 14 CFR Part 121 mandated minimum parameters at all times and at comparatively little cost. It therefore stands to reason to increasingly shift from ground-downloaded flight data to real-time performance monitoring similar to space operations, motorsports, and remote asset management. This in turn opens up new opportunities for ground-based support engineers to assist flight crews in an emergency scenario. To leverage the full potential of this capability, effective crew resource management and a thorough understanding of the related human factors become key disciplines. We refer to this method as DART: Distress Assistance with Real-Time Telemetry.

Keywords: Flight data · real-time · distress · assistance · aviation safety · Human Factors/System Integration · Remote operations · Safety Risk and Human Reliability · Situation awareness · Systems Engineering

1 Introduction

Since the first publicly available satellite broadband connection for passengers in 2004 (Connexion by Boeing), the satellite data link bandwidth available for secure data communications has increased dramatically. This enables airlines to significantly improve their operational performance by sending operational data to a ground-based data centre. Considering this development, it is justifiable to look at ways in which information derived from this data could also be used to open up new avenues of in-flight assistance that can either help prevent abnormal situations from occurring in the first place or at least help crews rectify them promptly. This approach has the potential to improve the chances of air crews to prevent situations from deteriorating to the point where a serious incident or accident becomes unavoidable. To evaluate the potential and viability of this approach, we looked at historical accident and incident scenarios, as well as known operational issues, to understand which avenues of intervention a support programme using real-time telemetry could offer to facilitate a more desirable outcome.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. Harris and W.-C. Li (Eds.): HCII 2023, LNAI 14018, pp. 77–90, 2023. https://doi.org/10.1007/978-3-031-35389-5_6

2 Potential for Distress Assistance and Better Outcomes

Distress Assistance with Real-Time Aircraft Telemetry (or DART, in short) is the concept for a programme to systematically enable an aircraft operator to use real-time flight data that is already available for the purpose of rendering assistance in abnormal situations. The primary way in which this works is by breaking the chain of causation that would lead to an undesirable outcome, so that a better result may be achieved. Even where an abnormal situation cannot be completely rectified, a better outcome may mean a noteworthy reduction in the resulting damage.
A common and relatively benign case may be a decision on whether or not to divert, and if so, where best to divert to. A notable such incident took place in 2018 over the Pacific Ocean, where a wide-body airliner diverted due to a recurring error message (reference withheld and case anonymized on request by the operator). In this case, dispatch and aircrew correctly followed the procedure and were right to err on the side of caution. The total cost of the incident was estimated by the operator to amount to just over US$ 150,000. A read-out of the QAR and a diagnosis of the aircraft’s systems, however, revealed that the error message was a false alarm. With access to certain data – and the ability to ask for selective transmission, including replay – ground personnel might have been able to verify the false alarm. Similarly, an incident at Frankfurt International Airport caused the grounding of a narrow-body airliner for over two hours, until a local maintenance crew was able to read out the QAR to confirm that a fault indication was, in fact, itself incorrect (anonymized on request of the flight crew and operator). The carrier had no maintenance personnel stationed at Frankfurt International Airport at the time of the incident, and no cost estimates were released in this case. Nevertheless, the reader may estimate the cost based on the typical costs of a delayed departure for narrow-body airliners.1

1 The trade group Airlines for America estimated $74.24 per minute in 2020 (Airlines for America, 2020).

There are other fairly common events where access to data can resolve issues and help plan maintenance work in borderline cases. These include flap extension at possibly excessive speed, the degree of turbulence, vertical acceleration on landing (and any roll component), minor engine near-exceedances, and even cabin temperature. Early indications can help schedulers and dispatchers plan their resource allocation while the flight is still in progress.
A rarer but more dramatic situation would be the correct identification and isolation of faults when the aircraft starts reacting unexpectedly to flight control inputs, or when its flight characteristics change unexpectedly in flight. A notable example is the well-publicised incident of QANTAS flight 32 (ATSB Transport Safety Report, Aviation Occurrence Investigation – AO-2010-089, 2010), where fragments of a ruptured IP turbine disc on an Airbus A380 caused a significant change in the aircraft’s performance and flying characteristics while severing several electrical harnesses inside the wing structure. This caused an abnormally high number of ECAM (Electronic Centralized Aircraft Monitor) messages that the crew was only able to deal with because five flight crew members were on duty that day, including a check captain and a supervising check captain. The additional crew could help with damage assessment while the captain and first officer focused on
controlling an aircraft whose operating characteristics had significantly changed from any normal configuration of this aircraft type. With only two or three flight crew on board, the flight may have ended less favourably. An assisting party (a concept discussed below), however far away, could similarly lighten the workload on the crew operating the aircraft. Given the expected further increase in available satellite data communication bandwidth and the related reduction in associated cost, it is further conceivable to transmit in real-time the audio from the cockpit voice recorder microphones, or even to create a full-duplex video conferencing link. This would allow a much closer recreation of the favourable circumstances of QF 32, and remove the need for the flight crew to manage an additional communication channel. The technology would also provide a way to reduce pilot workload in more benign cases occurring in single pilot operations scenarios.
In the most extreme circumstances, pilots have lost control in flight of an aircraft otherwise in good working order, both mechanically and electrically. In many such cases, no adverse circumstances affected the aircraft other than the incorrect situational awareness of the pilot flying, often triggered by a minor malfunction that would normally have been rectified easily. In such circumstances, an assisting party – for example, a standby crew composed of off-duty pilots, dispatchers, and engineers – responding to an unusual attitude alert could advise the crew flying the aircraft of the additional information available to them. In the notable case of China Airlines 006 (NTSB/AAR 86/03, 1986), the asymmetric thrust from a rolled-back no. 4 engine of the Boeing 747SP operating the flight caused an aircraft attitude upset that went unnoticed and led to a rapid and uncontrolled descent during which loads exceeding 5g severely injured two passengers and caused significant mechanical damage to the airframe. Only after breaking through the cloud cover at around 11,000 feet was the captain able to orient himself and recover the flight. Although the captain noticed the increasing bank and pitch angles on the attitude indicator, he wrongly concluded that the indicators had failed. In this scenario, the no. 4 engine’s deterioration preceding its flame-out, the lack of rudder input in response to an increasing turn and pitch rate immediately after flame-out, the discrepancy between the autopilot’s roll inputs and the aircraft’s roll and turn rates, the subsequent exceedance of bank and pitch angles and the large variations in g-forces would all have been detectable by an automated flight data monitoring system, allowing an assisting party to provide additional input to the flight crew that would have helped them rectify the situation at several stages before its nearly catastrophic deterioration.
While China Airlines 006 (with 20th-century technology) eventually landed safely, Air France 447 did not (Dossier BEA f-cp090601, 2012). In this well-publicised case, it was an unreliable airspeed indication that caused the crew of an airliner otherwise in good working order to make flight control inputs that led to the demise of the flight. While Air France 447 is similar to China Airlines 006 in that the lack of visual cues compounded the problem, it is notably different in so far as some telemetry was available through ACARS, except that no one picked it up until after the fact.
Even then, the limited amount of data transmitted during the final minutes of the flight offered few clues as to what transpired. While the cockpit voice recordings indicate the crew was aware that they needed assistance, none was immediately available. An assisting party alert to their situation would likely have been able to help the pilots achieve a significantly better outcome, and it is entirely plausible that the flight would even have made schedule. This
highlights the necessity for a fully trained crew – flight, maintenance and dispatch – to remain available and alert to current events. It also highlights the value of an assured and secured data link, and of a mission-critical big data analysis system that can reliably alert the crew to potential departures from expected parameters, querying the aircraft for additional data sets and displaying them to the assistance crew in an actionable manner.
Lastly, DART offers the possibility to identify and discourage dangerous flying habits. In the recent case of the 2017 Learjet accident at Teterboro Airport (NTSB/AAR 19/02, 2019), the probable cause was determined to be the Pilot in Command’s (PIC) “attempt to salvage an unstabilized visual approach, which resulted in an aerodynamic stall at low altitude.” Contributing to the accident was “[…] the PIC’s decision to allow an unapproved [second in command] to act as [pilot flying], the PIC’s inadequate and incomplete pre-flight planning, and the flight crew’s lack of an approach briefing. Also contributing to the accident were [the operator’s] lack of safety programs that would have enabled the company to identify and correct patterns of poor performance and procedural noncompliance.” (NTSB/AAR 19/02, 2019). A DART programme not only offers assistance when in distress, but can also flag recurring departures from standard procedures, discouraging unsafe practices and allowing training needs to be identified. This is an example of FOQA-like benefits for smaller operators.
Certainly, in the case of significant master caution alarms or warnings, it is appropriate to switch from low-volume updates to streaming the whole, live FDR frames along with selected data beyond that. Several widely used software systems take FDR data and can reconstruct the appearance of the cockpit instruments in flight ops centres for pilots who may be on standby duty – and for dispatchers and maintenance controllers. Modern software solutions for space applications, such as the ones developed by CGI UK Ltd. for satellite operations, further include artificial intelligence (AI) enhanced event monitoring, facilitating quicker and more effective intervention by the operator on duty (see also Sect. 6 “Establishment of a DART programme”).
As these examples illustrate, DART offers four avenues for breaking the chain of causation to improve the outcome of abnormal situations:
1. Provision of actionable information not otherwise accessible to aircrews or dispatchers/maintenance.
2. Workload reduction when a breakdown in automation or a change in flight characteristics as a result of a malfunction increases the workload of the pilot flying the aircraft. With real-time data, ground staff can ask fewer but better questions via voice calls.
3. Unbiased appraisal of the situation and related crew advice when the mental picture of the crew flying the aircraft begins to deviate from reality, or when a routine alert is followed by a non-standard response.
4. Long-term monitoring of flight data to identify faults early, discourage reckless behaviour and identify training needs.
Following ICAO’s Global Aeronautical Distress and Safety System (GADSS) initiative in the aftermath of Flight MH370’s disappearance and related regulation coming into force this decade, some airlines may in the future be required to install ejectable flight data recorders. But we believe the DART concept can have operational benefits that pay for themselves and even make ejectable flight data recorders unnecessary. (With real-time aircraft data transmission, you will know where your airplane went and have critical flight data available before the site of the end of the flight can even be accessed.) We emphasize the larger point that DART can be an economical complement to ejectable flight data recorders because of its operational benefits. It even offers the potential to render ejectable recorders entirely unnecessary, reducing the overall complexity, weight and maintenance requirements of the airframe considerably. Recognizing the economic and safety potential of real-time flight data transmissions, EASA (European Union Aviation Safety Agency) has commissioned a Quick Recovery of Flight Recorder Data study, due to be published later in 2023 (Tender EASA.2020.HVP.06, 2021).

3 Analysis of Requirements and Prerequisites

Providing assistance based on real-time telemetry is not a new concept. Both in motorsports and in space flight operations, real-time telemetry is often the only means by which assistance can be provided. A Formula 1 car is so small it can only carry the driver and no one else. Similarly, spaceships and space stations are often too small to carry anyone in addition to the mission-critical astronauts. Unmanned spacecraft, such as satellites and interplanetary probes, have no one on board to begin with, and must be operated entirely remotely. Perhaps the best way to illustrate this is the notable incident of Apollo 13, which may be viewed as the original DART case. Two days into the mission, on the outbound trajectory, one of the oxygen tanks in the service module failed catastrophically (see Fig. 1). With extensive assistance from engineers on the ground, the crew managed to reconfigure the damaged spacecraft and returned safely to Earth after several manoeuvres allowed them to swing by the Moon for an early return trajectory (National Aeronautics and Space Administration, 1970).
To understand what is necessary to use real-time aircraft telemetry to improve the chances of a successful outcome (be it minimizing the cost of the outcome or maximizing survivability), we can therefore turn to experience gained in space mission operations and Formula 1 racing, and compare the key lessons learned to reports of selected past aviation incidents and accidents where the provision of additional information and resources, or the lack thereof, had a significant impact on the outcome of the situation (see above). Taking this experience into account, we can learn the following key lessons that we will explore further:
1. Distress assistance is not a root cause analysis,
2. Good training and well-established operating procedures are key success factors,
3. Integration with crew resource management is a key success factor,
4. Efficient and effective data processing and display is a key success factor.

Lesson 1: Distress assistance is not a root cause analysis. While a party assisting a flight crew in an abnormal situation may well identify the root cause of an issue, whether or not they conclusively do so is less important than gaining the consequential knowledge required to resolve the situation satisfactorily. The

Fig. 1. Photograph of the damaged service module of Apollo 13 (National Aeronautics and Space Administration, 1970)

classic example of this prioritization is the recovery of an upset spacecraft attitude: with the main parabolic dish no longer pointing towards Earth, communication can be established by way of omnidirectional antennas aboard the spacecraft. While the data rate through these means of communication is low, sufficient telemetry and telecommanding can be communicated to restore accurate pointing towards Earth and to avoid any attitude that may overheat the spacecraft by exposing the wrong panels to sunlight for too long. The root cause analysis can follow once the vehicle attitude is recovered. Similarly, the pilots aboard QANTAS Flight 32 had no knowledge of the burst stub oil pipe that caused the chain of events leading to the turbine disc failure, much less the manufacturing flaw causing it to fail in the first place (ATSB Transport Safety Report, Aviation Occurrence Investigation – AO-2010-089, 2010). Nor would that knowledge have been of much consequence to them. The consequential knowledge they needed to obtain was which of the ECAM error messages had to be taken seriously, which ones to leave for later, how the aircraft could be safely flown and which the best available runway was at the time of the incident. With a flow of data, flight ops and maintenance personnel on the ground could perform most of the functions that the additional crew members provided aboard

QANTAS Flight 32, assess the data and talk with the crew, or be on a party line with air traffic control, even in the more likely event of just two pilots aboard the aircraft.
Lesson 2: Good training and well-established operating procedures. This lesson should hardly come as a surprise to anyone. A DART programme is no different from any other operations programme or set of procedures in that it works best when its various elements are rehearsed regularly. To that end, space operations crews frequently train with spacecraft simulators (digital representations of the spacecraft in question) to practice emergency recovery procedures, fault isolation skills and crew cooperation. Similarly, dispatchers, maintenance crews and DART standby crews can rehearse typical scenarios for quicker reaction times, and to establish a particular kind of operational culture that is accustomed to working in such an environment.
Lesson 3: Integration with crew resource management. What QANTAS Flight 32 also demonstrated is that the five crew members in the cockpit were able to distribute the workload quickly, efficiently, and effectively between each other. DART is no different in this respect. On the contrary: the fact that the assisting party is not aboard the incident aircraft, but instead located in a facility many thousands of miles away, requires even greater discipline in Crew Resource Management (CRM). Current satellite voice communication services offer a telephony service, and future services may offer sufficient bandwidth for full-duplex video conferencing, for example using entertainment/passenger connectivity bandwidth. But they will, inevitably, be connected through an electronic device that will suffer from the same limitations as any other such means of communication, including microphone issues, bandwidth issues, general understandability issues and a certain risk of misunderstandings. Yet this is nothing new in spacecraft operations, as the case of Apollo 13 demonstrated (see above). Even in the operations of interplanetary probes, a contributing party may be located in another control centre, at one of the Earth receiving stations on another continent, or simply in an adjacent building.
Lesson 4: Efficient and effective data processing and display. Experience with space flight control centre development, but also with battle space management for maritime and aerial defence, shows how critically important ergonomic design, efficient and effective data processing and ergonomic display are. Even the best-trained crew can only be as good as the consequential knowledge they can efficiently and effectively derive from the actionable information displayed to them, and as the reliability and integrity of the underlying data sources. For example, the fatal accident of Alaska Air Flight 261 was attributed to a worn-out acme nut that formed part of the pitch trimming actuator and failed in flight (NTSB/AAR 02/01, 2002). The nut threads failed because insufficient lubrication caused excessive thread wear. This excessive wear put additional strain on the actuation motors, which in turn would have shown an excessive electrical current draw on the power bus on every actuation of the electrical trimming system. Without prior knowledge of the accident sequence, this may be difficult to identify amongst the many thousands of parameters available.
However, an AI-enhanced telemetry and big data analysis system like the one currently developed by CGI for communication satellite operations (CGI UK Ltd., 2021) would have likely flagged the correlation of trimming system actuation and above-average current draw to engineers, who would then have had cause to inspect the system to identify the root cause of the

additional force required to operate the system. A similar and more recent example of how long-term data analysis might alert engineers ahead of a potentially dangerous deterioration are the accidents caused in part by the malfunctioning MCAS (Maneuvering Characteristics Augmentation System) on the Boeing 737 MAX. The first of the two accident aircraft had been operated on previous flights with unreliable angle of attack indications due to an incorrectly calibrated angle of attack sensor, resulting in erroneous stall warnings and stick shaker action, and prompting the crews to disable the affected systems before the MCAS stall protection function provided aggressive and catastrophic nose-down inputs to recover from an incorrectly detected, deteriorating stall situation. Maintenance crews inspected the system, but this did not prompt engineering to perform further analysis. The aircraft crashed the following day as Lion Air Flight 610, when the MCAS system again tried to recover from an incorrectly diagnosed stall, providing aggressive nose-down inputs too close to the ground for the startled crew to recover. As in the case of AS261, long-term data monitoring would have provided cause for a more detailed investigation of the malfunction. While a DART assisting party might not have been able to intervene quickly enough during either of the 737 MAX accident flights, real-time data streaming could have made recovery of the flight data recorder much less critical. The Lion Air Flight 610 crash occurred on October 29, 2018. While the flight data recorder was found on November 1, the cockpit voice recorder was found only more than two months later, on January 14, 2019 (Reuters, 2023). A DART programme would have meant more eyes on the data in near real-time, causing some people to speak up and therefore providing additional and better cause for investigating the initial malfunction, and additional and better opportunities to help avert the second crash.
During the accident flight of AS261, the increasing friction initially caused the pitch trim actuation motors to get stuck, leading to a spike in the current draw on the electrical bus. This information would in turn have alerted the DART standby crew, allowing them to advise the flight crew not to operate the trimming system, and to that end to fly to a convenient airport for a straight-in, high-speed landing that requires no or minimal configuration changes impacting the aircraft’s pitch trim. Whether or not a DART standby crew might have come to the correct conclusion in any of these cases remains, of course, speculation. But at the time of occurrence, the critical and actionable information was not available to anyone until after the accident, and so DART would have opened up credible opportunities to save the flights. For such an opportunity to exist, however, it is important that the collected data is reliable, secure, available in an instant and processed quickly and efficiently. Assisting parties of standby crews and maintenance engineers must then be able to identify the malfunction quickly, for which ergonomic and well-laid-out telemetry displays are of equally critical importance as a well-trained crew on duty, including DART practice drills.
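As a purely illustrative sketch of the kind of long-term trend monitoring discussed above, the following compares the electrical current drawn during trim actuation on each flight against a baseline built from earlier flights and flags outliers for engineering review. The parameter names, units and threshold are assumptions made for illustration and do not describe any certified monitoring product.

```python
import statistics


def flag_trim_current_anomaly(samples, threshold_sigma=3.0):
    """Flag flights whose mean current draw during pitch-trim actuation
    sits far above the baseline built from earlier flights.

    `samples` is a chronological list of (flight_id, mean_current_amps)
    tuples; names and units are illustrative assumptions.
    """
    flagged = []
    history = []
    for flight_id, current in samples:
        if len(history) >= 10:                        # need a minimal baseline first
            mean = statistics.mean(history)
            stdev = statistics.stdev(history)
            if stdev > 0 and (current - mean) / stdev > threshold_sigma:
                flagged.append((flight_id, current))  # candidate for engineering review
        history.append(current)
    return flagged


# Example: a slow upward drift followed by one clear outlier.
if __name__ == "__main__":
    data = [(f"FLT{i:03d}", 4.0 + 0.05 * i) for i in range(20)] + [("FLT020", 9.5)]
    print(flag_trim_current_anomaly(data))   # -> [('FLT020', 9.5)]
```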

4 Technical and Economic Requirements

Any economical implementation of DART should rely, where possible, on existing solutions that are part of today’s best practice, so that the addition of equipment and software can be avoided or kept to a minimum.

To understand what technical capability is already available today, we remind ourselves of the typically available sources of data.
First, there are the crash-survivable flight recorder data which are mandated by regulations and manufacturers. A real-time transmission system could listen to the DFDR echoing what it is recording and send it. Those data are very carefully characterized by sampling rate and latency and formatted by a data acquisition unit.
Second, the DFDR data plus other airline-specified parameters are recorded in a quick access recorder (QAR) system that typically transmits data on the ground and may record on a removable medium. These QAR data are typically used by Flight Operational Quality Assurance (FOQA) programs to improve operational safety and efficiency.2
Third, a part of the QAR system will format engine reports that may be transmitted in real-time over ACARS, if urgent or required by contract, and, if not sent in real-time, are marked for transmission on the ground.
Fourth, ECAM/EICAS (Engine Indicating and Crew Alerting System) alert and warning messages, typically transmitted by ACARS and presented to the flight crew, can already trigger appropriate actions or precautions today.
Fifth, operators are increasingly installing Aircraft Information Devices (AIDs) that may be interfaced with the same concentrated data sources as the DFDR and QAR systems and with data communication systems. The AID typically processes data to and from Electronic Flight Bag (EFB) computers. The AID + EFB system is more easily configurable – from both regulatory and technical perspectives – than a QAR system. For example, the AID with real-time communication links may be commanded from the ground – or through a manual flight deck trigger – to send certain data in real-time over IP links or ACARS. These IP links may be over any available medium such as Inmarsat’s SwiftBroadband or Iridium’s Certus for assured cockpit communications, or over communication links shared with the In-flight Entertainment (IFE) system. Of course, those data are typically encrypted and/or sent over a VPN. Routinely, EFBs may send air data and engine parameters for flight path optimization, including adjustments for weather, by ground-based systems.3
The goal of routinely transmitting aircraft data (for example, engine and systems performance) to the ground in real-time, and of supplying crews with information as the flight progresses (for example, updates to weather, traffic and more), is to gain benefits that exceed the costs of data transmission. A growing number of operators are learning how to leverage this technology for long-term economic, operational, and environmental benefits. Therefore, actionable information about flights in progress is becoming increasingly available.

2 “FOQA is a voluntary safety programme that is designed to make commercial aviation safer by allowing commercial airlines and pilots to share de-identified aggregate information with the FAA so that the FAA can monitor national trends in aircraft operations and target its resources to address operational risk issues (e.g., flight operations, air traffic control (ATC), airports).” (FAA, 2004).
3 The authors gratefully acknowledge their experience with the Collins AID and Inmarsat SwiftBroadband Safety at Hawaiian Airlines.
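To illustrate how small such routine reports can be, the sketch below packs a basic position/condition report into a fixed-length binary record. The chosen fields, scaling and 18-byte layout are assumptions made purely for illustration; they do not correspond to an existing ACARS, AID or QAR message format.

```python
import struct
import time

# Illustrative 18-byte record: UTC seconds, latitude and longitude in 1e-4 degrees,
# pressure altitude in feet, ground speed in knots, and a 16-bit status word.
REPORT_FORMAT = ">IiiHHH"
REPORT_SIZE = struct.calcsize(REPORT_FORMAT)   # 18 bytes


def pack_report(lat_deg, lon_deg, alt_ft, gs_kt, status=0):
    """Encode one flight-following report as a compact binary blob."""
    return struct.pack(
        REPORT_FORMAT,
        int(time.time()),
        int(round(lat_deg * 1e4)),
        int(round(lon_deg * 1e4)),
        int(alt_ft),
        int(gs_kt),
        status,
    )


# A record of this size sent once per second stays in the tens of bytes per second;
# sent once per minute it amounts to only about 1 KB per hour.
report = pack_report(55.6761, 12.5683, 37000, 460)
print(REPORT_SIZE, len(report))   # 18 18
```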

5 Spectrum, Data Bandwidth and Costing Considerations

We note that the required data volume is small by comparison with common IP applications but may nevertheless generate significant costs when using exclusively safety-approved radio spectrum. The data stored in a standard, 1024-word DFDR amounts to a fraction of common internet traffic and can always be streamed inside a 9.6 kbps data link, about the same bandwidth as a 1980s telephone modem. It could easily be streamed using safety-approved services in the L-band spectrum, such as, for example, Inmarsat’s SwiftBroadband Safety or Iridium’s Certus. Depending on the configuration, each user has up to 50 to 200 times this bandwidth available (Inmarsat, 2021), (Iridium, 2021). The advantages of these services are the relatively small antenna footprint they require, resiliency against all kinds of weather, physical separation from other, non-safety related users and global coverage. Their downside, however, is their comparatively high price per megabyte of data. Ka- and Ku-band satellite communication services, along with ground-based infrastructures such as the European Aviation Network or SmartSky Networks, offer much lower data transmission costs and several thousand times the required bandwidth, but they share channels with entertainment users, are susceptible to moisture attenuation in the atmosphere and rarely offer truly global coverage. Such systems could, however, play an important role in a multi-channel, redundant system that provides optimised data routing depending on priority and system state. This is not a new concept, but in fact what the Internet was conceived for by the U.S. Defense Department’s ARPA (Waldrop, 2015).
With an IP connection open, the cost of sending a few parameters every minute can be very low – even 10 bytes every second is only 36 KB/h. A system on the ground can sound alarms if one of these position/condition/flight-following reports is not received on time. It can also be interfaced with a system receiving ECAM messages over ACARS and a database of appropriate automated responses. ECAM/EICAS warnings, including some not presented to flight crews in flight, can be triggers for later artificial intelligence/machine learning exercises. Think of them as ideas or faults to be investigated later. If so configured, the contents of a buffer of recent data could be expedited to the ground as would be needed in an emergency. In the case of an emergency, it could be very important to have some data from before the start of the event.
Approximate data transmission costs for L-band can be estimated from publicly advertised sources such as Satellite Phone Store (Satellite Phone Store, 2021). Pricing is highly dependent on the monthly volume, but the ROM cost is on the order of $1/MB for L-band services. For in-flight entertainment (IFE) Ka-band connections, industry sources say they strive for about $0.01/MB. With careful DART configuration, admittedly to be refined by testing and managed by least-cost or best-available routing (depending on circumstances and priorities), the added communication cost would be low even at L-band but worth every dollar for an abnormal flight. Based on these numbers, we estimate the DART-related data bandwidth costs to amount to between USD 100 and USD 200 per aircraft per month, depending on current capabilities and available configurations, but we expect the real-world data cost to fall below USD 50 per aircraft per month within the next five to ten years.
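The order of magnitude of these figures can be checked with a few lines of arithmetic. The utilisation and event assumptions below are illustrative only, and the calculation covers just the two traffic types named above; fixed subscription fees and additional engine or system reports, which the USD 100 to USD 200 estimate also has to cover, are not included.

```python
# Rough monthly data volume and cost per aircraft, using the ROM prices quoted
# in the text (about $1/MB at L-band, about $0.01/MB for Ka-band IFE capacity).
# Block hours and streaming hours are illustrative assumptions, not operator data.

BLOCK_HOURS_PER_MONTH = 300          # assumed utilisation of a busy airliner
ROUTINE_RATE_BYTES_PER_S = 10        # the low-volume flight-following stream from the text
STREAM_RATE_BPS = 9_600              # full DFDR-style streaming during abnormal events
STREAM_HOURS_PER_MONTH = 2           # assumed rare, short periods of full streaming

routine_mb = ROUTINE_RATE_BYTES_PER_S * 3600 * BLOCK_HOURS_PER_MONTH / 1e6
stream_mb = STREAM_RATE_BPS / 8 * 3600 * STREAM_HOURS_PER_MONTH / 1e6

l_band_cost = (routine_mb + stream_mb) * 1.00    # $/MB, L-band ROM price
ka_band_cost = (routine_mb + stream_mb) * 0.01   # $/MB, IFE-class capacity

print(f"routine: {routine_mb:.1f} MB, streaming: {stream_mb:.1f} MB")
print(f"L-band: ${l_band_cost:.2f}/month, Ka-band: ${ka_band_cost:.2f}/month")
```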


A key technology to allow widespread use of real-time aircraft telemetry is therefore an onboard data processing system that can provide the most cost-efficient and assured data routing, depending on the circumstances. Currently available real-time data transmission systems come either integrated into the AID (e.g., the authors have experience with the Collins Aerospace AID), as part of the Electronic Flight Bag (EFB) (e.g., the authors have used the WxOps EFB app in trials), integrated into the flight recorder (e.g., Honeywell's Black Box in the Sky), integrated into the satellite terminal, or even standalone (e.g., South Africa-based SatAuth or Canadian FLYHT Aerospace Solutions). Lastly, both Airbus and Boeing have introduced real-time data analysis programmes to which an operator may subscribe. While we do not wish to endorse any of the products above over solutions we have not mentioned, we chose these as examples with which we have gained prior experience, to illustrate the very real possibility of implementing a DART programme with available technology. To be economically viable, future systems supporting a DART programme should also dynamically query the aircraft's systems for relevant data depending on its current status, and dynamically route that data through assured and secured VPN channels across the best available network. Depending on the current state of the aircraft (normal operations, abnormal operations or distress), "best available" can mean either the lowest-cost or the most reliable connection. In a distress situation, the system may even route data through all available channels simultaneously, prioritise safety-critical traffic and block unnecessary traffic (such as in-flight entertainment, software updates, etc.). Finally, the system may prioritise certain types of data in accordance with current ICAO guidance (ICAO DOC 10054, 2019) and AEEC guidance (AEEC 681, 2021) for the timely recovery of flight data.
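The least-cost/best-available routing behaviour described above can be sketched as a simple priority policy. The sketch below is ours: the three aircraft states and the example channel prices are taken from the discussion, while the structure, thresholds and channel objects are illustrative assumptions and do not describe any of the products named above:

```python
from dataclasses import dataclass

@dataclass
class Channel:
    name: str
    cost_per_mb: float     # USD per MB
    safety_approved: bool  # approved, weather-resilient spectrum (e.g. L-band)
    available: bool        # current link state

def select_channels(aircraft_state: str, channels: list) -> list:
    """Pick transmission channels depending on aircraft state.

    normal   -> cheapest available channel (least-cost routing)
    abnormal -> cheapest safety-approved channel, falling back to any link
    distress -> every available channel simultaneously (redundant transmission)
    """
    usable = [c for c in channels if c.available]
    if aircraft_state == "distress":
        return usable
    if aircraft_state == "abnormal":
        safe = [c for c in usable if c.safety_approved]
        usable = safe or usable
    return sorted(usable, key=lambda c: c.cost_per_mb)[:1]

links = [
    Channel("L-band satcom", 1.00, True, True),
    Channel("Ka-band IFE", 0.01, False, True),
]
print([c.name for c in select_channels("normal", links)])    # ['Ka-band IFE']
print([c.name for c in select_channels("distress", links)])  # both channels
```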

6 Establishment of a DART Programme

How a DART programme with an "assisting party" always available can best be set up so that it becomes economically advantageous depends on the circumstances of the operator. A commercial airline with a large fleet and its own maintenance section may wish to establish its own in-house DART programme. Smaller commercial carriers with fewer resources may either rely on data analysis programmes offered by major aircraft manufacturers or subcontract a third-party subscription service, of which a number have become available in recent years. As we have seen, most of the basic elements of a DART programme already exist. A possible system architecture for real-time flight data transmission with an integrated DART centre can be set up without any new technology, conceivably as a dedicated system by a large aircraft operator, or as a centralised service centre by an airframe OEM, an MRO, or even a data link service provider. A large carrier based in Asia (name withheld on request by the operator) established a real-time telemetry programme as early as 2017, including a database and data analysis software developed in-house, with some aircraft transmitting data through modified AIDs. Similarly, a large European carrier ran trials of a comparable nature, transmitting flight data through an in-house developed EFB app that could be activated at the captain's discretion (name withheld on request by the operator).


In the case of the European carrier, it is noteworthy that an agreement with the pilots' union had to be reached before the system could go live. We note that flight data display and verification tools are part of routine recorder system maintenance. While major airframe manufacturers already have real-time and non-real-time telemetry analysis programmes in place, many smaller carriers, operators of older fleets, or operators of a small number of corporate business jets have opted to go with third-party aftermarket suppliers which, aside from offering the required hardware, also offer service-level agreements for flight data storage, analysis and distress alerting functions. The step from any of these solutions towards a full DART programme is comparatively small. We believe that well-staffed parties engaged in the analysis of flight data for economic benefits, including preventive maintenance, are well positioned to instantly drop routine analysis and jump into DART mode together with the airline's standby pilots and maintenance control. By arranging work schedules, and by working from home, these experts could become quickly available. A 2020 neuroscience study (Chen et al., 2020) found that seasoned pilots showed better functional dynamics and cognitive flexibility, making them ideally suited to form assisting parties in a DART scenario. For the necessary proficiency with the telemetry systems, as well as the required crew resource management, we recommend a dedicated simulation and training programme similar in nature to existing training programmes. Providing the necessary systems for training, simulation, crew resource management and flight data displays with the aim of quick decision-making, the author's company, CGI (and CGI IT UK Ltd. in particular), has created, built, and operated many highly successful and state-of-the-art solutions for space operations centres and defence-related applications of a similar nature. The author has operated interplanetary spacecraft using software, simulators and training facilities provided by CGI for this very purpose. While the telemetry received from aircraft and spacecraft differs somewhat, the basic principles are nevertheless the same: secure and assured data communication, processing, storage, and dissemination systems that enable operators to obtain consequential knowledge in a quick, efficient, and effective manner, thereby enabling a timely reaction to events as they unfold. DART is no different in this regard. To remain commercially neutral, we again refrain from mentioning specific product names and reference projects. We believe the demonstrable capability evidenced by the author's experience, and the routine application of these services across several sectors, especially space operations, provide sufficient evidence for the wider point that a DART programme can be established relying exclusively on proven and well-tested technology. The only new aspect is the combination of these elements with the intent of not only improving the economic performance of an aircraft operator but also opening new avenues of intervention when consequential knowledge about an aircraft's status or performance may not otherwise be accessible in time to improve the outcome of a particular set of circumstances. Well before DART programmes became feasible, several experts cited the value of CRM beyond flight and cabin crews to dispatch and maintenance (Helmreich & Foushee, 2010). Ground support personnel would join line-oriented flight training (LOFT) exercises.
We note that DART standby crews/assisting parties, remaining available in appropriate lounges in addition to operations centres, should ideally be composed of experienced pilots and engineers with a significant number of hours or years of experience to their credit.


We further note that pilots and engineers who form part of a DART team will also, through their recurring DART training, come to understand their aircraft, and the constraints of a DART crew providing assistance, increasingly well as the programme matures. The assisting parties would have dedicated computer resources that, with only a few keystrokes (if any), display the key parameters coming from the DART stream.

7 Conclusion

The analysis shows that data which can benefit an operator economically can also be used both to help flight crews avoid abnormal or distress situations altogether and to assist them if an abnormal or distress situation cannot be prevented. Enabling technologies and processes already exist. However, the analysis also shows that such assistance can only be rendered effectively if it is integrated with crew resource management and associated training, and if the DART programme includes systematic big data analysis based on secure and assured data sources, ground support operations training, and integration with existing FOQA and safety management systems. In essence, many of the same steps are required to reap the economic and operational benefits of real-time aircraft data. While the establishment of a DART programme requires expenses on top of and beyond the provision of data used to improve economic efficiency, preventing a single event can make it all worthwhile. We therefore conclude that a DART programme is superior to an ejectable DFDR programme. It is therefore not hard to imagine a future in which having a DART programme, much like FOQA today, is part of best industry practice and not having one may be seen as reckless. We are very excited about EASA's project regarding flight data recovery (Tender EASA.2020.HVP.06, 2021), and look forward to the results of this study.

References

AEEC 681: Timely Recovery of Flight Data (TRFD). ARINC Industry Activities (2021)
Airlines for America: US passenger carrier delay costs (2020). https://www.airlines.org/dataset/u-s-passenger-carrier-delay-costs/
ATSB Transport Safety Report, Aviation Occurrence Investigation AO-2010-089: In-flight uncontained engine failure overhead Batam Island, Indonesia, 4 November 2010, VH-OQA, Airbus A380-842. Australian Transport Safety Bureau, Canberra (2010)
CGI UK Ltd.: Addressing the Challenges of Future Satcom Systems Using AI. CGI, London, UK (2021)
Chen, X., et al.: Increased functional dynamics in civil aviation pilots: evidence from a neuroimaging study. PLoS ONE 15 (2020). https://doi.org/10.1371/journal.pone.0234790
Dossier BEA f-cp090601: Rapport final, Accident survenu le 1er juin 2009 à l'Airbus A330-203 immatriculé F-GZCP exploité par Air France vol AF 447 Rio de Janeiro - Paris. Bureau d'Enquêtes et d'Analyses pour la sécurité de l'aviation civile, Paris (2012)
FAA: Advisory Circular 120-82: Flight Operational Quality Assurance (2004)
Helmreich, R.L., Foushee, H.C.: Why CRM? Empirical and theoretical bases for human factors training. In: Kanki, B.G., Helmreich, R.L., Anca, J. (eds.) Crew Resource Management. Academic Press - Elsevier, San Diego (2010)
ICAO DOC 10054: Manual on Location of Aircraft in Distress and Flight Recorder Data Recovery. International Civil Aviation Organisation, Montreal (2019)


Inmarsat: Inmarsat SwiftBroadband Safety, 27 August 2021. www.inmarsat.com
Iridium: Iridium Broadband, 27 August 2021. www.iridium.com
National Aeronautics and Space Administration: Final report of the Apollo 13 review board. NASA, Washington D.C. (1970)
NTSB/AAR 02/01: Aircraft Accident Report: Loss of Control and Impact with Pacific Ocean, Alaska Airlines Flight 261, McDonnell Douglas MD-83, N963AS, About 2.7 Miles North of Anacapa Island, California, January 31, 2000. National Transportation Safety Board, Washington D.C. (2002)
NTSB/AAR 19/02: Aircraft Accident Report: Departure From Controlled Flight, Trans-Pacific Air Charter, LLC, Learjet 35A, N452DA, Teterboro, New Jersey, May 15, 2017. National Transportation Safety Board, Washington D.C. (2019)
NTSB/AAR 86/03: Aircraft Accident Report: China Airlines Boeing 747-SP, N4522V, 300 Nautical Miles Northwest of San Francisco, California. National Transportation Safety Board, Washington D.C. (1986)
Reuters: Timeline: Boeing's 737 MAX crisis, 16 February 2023. https://www.reuters.com/article/boeing-737max-timeline-idUSL1N2I417A
Satellite Phone Store: Inmarsat BGAN Data Plans, 6 July 2021. https://satellitephonestore.com/bgan-service
Tender EASA.2020.HVP.06: Quick Recovery of Flight Recorder Data. EASA (2021). https://www.easa.europa.eu/research-projects/quick-recovery-flight-recorder-data
Waldrop, M.: DARPA and the Internet Revolution (2015). From Paving the Way to the Modern Internet. https://www.darpa.mil/attachments/(2O15)%20Global%20Nav%20-%20About%20Us%20-%20History%20-%20Resources%20-%2050th%20-%20Internet%20(Approved).pdf. Accessed 21 Feb 2023

A Semi-quantitative Method for Congestion Alleviation at Air Route Intersections Based on System Dynamics

Jiuxia Guo1, Xuanhe Ren1(B), Siying Xu2, Xin Guo2, and Yingjie Jia1

1 Air Traffic Management College, Civil Aviation Flight Univ. of China, Guanghan, China
[email protected]
2 Operation Supervisory Center, Civil Aviation Administration of China, Beijing, China

Abstract. With the development of air traffic, congestion at air route intersections has become an increasingly serious problem. To analyze the reasons for the congestion and find a systematic approach to alleviate it, this paper proposes a semi-quantitative method to analyze the congestion-related indicators using system dynamics (SD). To begin with, we construct the indicator system from four aspects: person, equipment, environment, and management. Then, the indicator weights are calculated by AHP and the entropy method, respectively, and combined to obtain the comprehensive weights. After that, a congestion management model is established using SD. Finally, we propose a feasible congestion alleviation solution by adjusting the values of selected indicators. The results show that the person and environment indicators have a greater impact on congestion. We can adjust the comprehensive quality of controllers, the impact of passengers, the control operational environment, and the flight operational environment to achieve efficient congestion alleviation. This paper provides new insight into the root causes of air route congestion problems from the perspective of system dynamics.

Keywords: air route congestion · AHP · the entropy method · System Dynamics

1 Introduction

Air traffic flow congestion mostly occurs at the intersections of busy air routes. To alleviate congestion, traditional air traffic flow management (ATFM) is the most common method, which includes advanced flow management and pre-flight flow management, supplemented by real-time flow management. These methods are used to control speeds and flight volumes. However, with the rapid growth of flight volume and the increasingly complex operating environment, traditional methods can no longer meet the requirements of air traffic; we should therefore identify the root causes of congestion and propose relevant control measures. Currently, research on alleviating air route congestion is primarily focused on four aspects. Firstly, on optimizing ATFM, Veronica focused on designing efficient 4D trajectories for the planning phase of ATFM and proposed a multi-objective approach for trajectory-based operations [1].


David constructed a combinatorial optimization model for the ATFM problem which better represented penalties and took into consideration the dynamic structure of the segment configuration through intermediate waypoints [2]. To minimize deviations in airport departure and arrival schedules, Zhang developed a distributed ATFM strategy [3]. Ref. [4] introduced a novel polynomial approximation-based chance-constrained optimization method to address uncertainty in ATFM. Ref. [5] constructed a multi-objective air traffic network flow optimization model to alleviate airspace congestion and simultaneously reduce flight delays in ATFM. Secondly, for air traffic flow prediction, Gui established two prediction models applying support vector regression and long short-term memory to facilitate timely surveillance and optimization of air traffic flow [6]. Lin proposed an end-to-end model based on deep learning for short-term air traffic flow prediction considering temporal-spatial correlation [7]. Thirdly, for the identification of air traffic congestion and conflict, Jiang established a method to identify air traffic congestion based on complex network analysis and an independent component analysis fault detection algorithm, avoiding the subjective setting of thresholds in traditional congestion identification methods [8]. Ref. [9] analyzed a key conflict aircraft identification method, based on complex network theory and a node deletion method, to fully understand the air flight situation and provide a decision-making basis for controllers. Zhang adopted methods of network science to analyze flight conflicts in the Chinese air route network, showing that the frequency of flight conflicts is heterogeneous and follows an exponential distribution [10]. Finally, for air route intersections, Pang established an artificial potential field model to optimize different types of airspace structures, which considered the attractive forces produced by the optimal routes and the repulsive forces produced by the obstacles [11]. Wang studied an air route intersection classification index combining intersection complexity and collision risk, which can help decision-makers better understand traffic operations and make informed decisions to balance workload and efficiency [12]. However, most scholars mainly study air traffic congestion from the viewpoint of post-analysis, and systematic, in-depth analyses of the internal factors of congestion are rarely available. Our study departs from them in that we analyze the four internal factors of "person, equipment, environment, and management" from the perspective of aviation operation safety management. Due to the complexity of intersection congestion-related factors, the system dynamics (SD) approach is found appropriate for the present investigation, since with SD it is possible to develop a model to forecast the impact of each indicator on intersection congestion and make a reasonable assessment and optimization. In recent years, several studies in the field of civil aviation have used SD methods. Tascón studied the degree of use of runways at an airport as well as forecasting runway needs in the medium term based on the SD approach [13]. In this regard, we constructed an intersection congestion indicator system and analyzed the feedback relationships among congestion indicators. Based on these two elements, we propose an SD-based intersection congestion management model, which can be referenced when considering intersection congestion alleviation. The remainder of this paper is organized as follows: Sect. 2 describes the major methods of the research, including the participants, the selection of indicators, the data collection and processing with the entropy method and AHP, and the construction procedure of the SD model. In Sect. 3, this paper proposes a congestion alleviation approach based on the simulation results and the optimization strategy, which can provide a reference for ATC units.


In the last section, we summarize the modeling process and simulation results, and present future research plans.

2 Method

To make the simulation results of the system dynamics model realistic and convincing, questionnaires were distributed to civil aviation operation supervisors and air traffic controllers of the civil aviation bureau to achieve a high-quality questionnaire collection. Moreover, the objective entropy method and the subjective AHP were combined for data processing to ensure the accuracy of the data.

2.1 Participants

Seventeen responses are included in this analysis. Four respondents are civil aviation operation supervisors (4 male) from the Operations Supervision Center of the Civil Aviation Administration of China; thirteen are air traffic controllers (12 male/1 female). The geographical distribution of participants is shown in Fig. 1. Their ages are between 25 and 40 years and their working experience is between 3 and 15 years.

Fig. 1. Geographical distribution of participants

2.2 Indicator Selection

From the perspective of safety management theory, we select 20 indicators from four aspects: person, equipment, environment, and management [14, 15]. Among them, person factors (A1) include comprehensive quality (B1), control skills (B2), work procedures (B3), workload (B4), pilots (B5), and passengers (B6); equipment factors (A2) include hardware (B7) and software (B8); environment factors (A3) include the flight operating environment (B9), the airport operation environment (B10) and the controlled operating environment (B11); management factors (A4) include resource management (B12), controller training (B13), regulations and management procedures (B14), control agreements (B15), and monitoring and checking (B16). In summary, the congestion index system is divided into 4 primary indexes and 16 secondary indexes, as shown in Fig. 2.

(1) Person indicators
The congestion of traffic flow is not only influenced by external factors but can also be limited by the air traffic controller. The comprehensive quality of controllers determines work efficiency, and controllers with a high level of skill are often able to cope easily with emergencies, whereas controllers are easily mistaken when working under stressful workloads [16]. In addition, the work procedure is easy to overlook, yet its regularity determines the correctness of the operation. Next, the pilot's flight skills and experience have a great influence on the actual operation. Lastly, congestion commonly arises from passengers' low legal awareness, unfamiliarity with registration procedures, and serious illnesses.


Fig. 2. Congestion assessment index system

(2) Equipment indicators
Air traffic control (ATC) equipment can be subdivided into software and hardware. It is mainly used to support the visualization and guidance of flight routes. Among them, the hardware related to flight operations mainly includes radar equipment and airborne equipment, while the software mainly includes flight procedures, control procedures, and the ATC automation system.


(3) Environment indicators
Environment indicators include three components: the aircraft operating environment, the airport operating environment, and the working environment of air traffic controllers. In particular, the aircraft operating environment includes the airspace structure, the volume of flights over the navigation station, the weather conditions along the air route, and surrounding military activities, which have a direct impact on the safety and delay rates of flight operations. Also, as an important part of airport operation security capacity, runway operation determines the airport's ability to accept aircraft take-offs and landings. Besides, the working environment of air traffic controllers mainly refers to the lighting and the distribution of ATC equipment.

(4) Management indicators
Management indicators include resource management, training of air traffic controllers, regulations and management procedures, monitoring and checking, and control agreements. First of all, resource management means team resource management, which assesses the psychological quality and skill level of controllers through scientific work assignments and achieves efficient operation by giving full effect to collaborative ability. Second, the training of controllers is divided into pre-induction training, qualification training, equipment training, proficiency training, and refresher training. Third, regulations and management procedures deal mainly with airport management, navigation management, and air traffic management [17]. Fourth, the control agreement specifies the scope of the controller's responsibilities and the rules for coordinating handovers. Lastly, monitoring and checking specifically refer to the operational quality monitoring of the ATC system [18]. Its organization and implementation are divided into three stages: the preparation phase of supervision and inspection work, the implementation phase, and the rectification phase.

2.3 Data Collection and Processing

In this paper, the expert system, the entropy method, and AHP were used to analyze the sample data. The sample data were obtained by the expert scoring method, and the entropy method and AHP were then used to process the data, respectively, where the indicator weight based on the entropy method is $W_j$ and the indicator weight based on AHP is $w_j$. The comprehensive indicator weights were calculated by Eq. (1):

$$Q_j = \alpha W_j + (1 - \alpha) w_j \qquad (1)$$

where $\alpha = 0.5$. The values of the comprehensive indicator weights $Q_j$ are shown in Table 1.
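Before turning to Table 1, a minimal sketch of the weight combination in Eq. (1) is given below, with an entropy-weight step included for completeness; the value of alpha comes from the text, while the small score matrix and the AHP weight vector are placeholders, not the study's data:

```python
import numpy as np

def entropy_weights(X):
    """Objective weights from the entropy method for an (n experts x p indicators) score matrix."""
    P = X / X.sum(axis=0)                              # column-normalised proportions
    P = np.where(P == 0, 1e-12, P)                     # avoid log(0)
    e = -(P * np.log(P)).sum(axis=0) / np.log(len(X))  # entropy of each indicator
    d = 1 - e                                          # degree of divergence
    return d / d.sum()

def comprehensive_weights(W_entropy, w_ahp, alpha=0.5):
    """Eq. (1): Q_j = alpha * W_j + (1 - alpha) * w_j."""
    return alpha * np.asarray(W_entropy) + (1 - alpha) * np.asarray(w_ahp)

# Placeholder expert scores (rows: experts, columns: indicators) and AHP weights.
scores = np.array([[7, 5, 6], [8, 4, 6], [6, 5, 7]], dtype=float)
w_ahp = np.array([0.5, 0.2, 0.3])
Q = comprehensive_weights(entropy_weights(scores), w_ahp)
print(Q, Q.sum())   # comprehensive weights; they sum to 1
```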

Table 1. The comprehensive weighting of indicators

Indicators | Weighting | Indicators | Weighting
Person | 0.36823 | Hardware | 0.044435
Equipment | 0.10457 | Software | 0.060135
Environment | 0.189435 | Flight operating environment | 0.08433
Management | 0.337825 | Airport operation environment | 0.03121
Comprehensive quality | 0.078355 | Controlled operating environment | 0.073895
Control skills | 0.061865 | Resource management | 0.036855
Work Procedures | 0.062465 | Controller training | 0.061275
Workload | 0.06636 | Regulations and Management Procedures | 0.07219
Pilots | 0.048615 | Control agreements | 0.0672
Passengers | 0.05057 | Monitor and Check | 0.100305

2.4 Model Analysis

System dynamics (SD) is a modeling and simulation method used to better understand and analyze dynamic behavior in complex systems [19]. This paper simulated and analyzed the impact of each indicator in the system using the software Vensim.

(1) The modeling steps of the SD approach are shown in Fig. 3. [Figure 3 is a flowchart covering: establishing modeling objectives; system analysis (defining the system boundary, feedback relationship analysis, system dynamics analysis, editing function relations); model run; model testing and inspection; model debugging and correction; and model application.]

Fig. 3. Modeling steps


(2) The causal loop diagram can visualize the feedback relationships and logical relationships between the indicators. Figure 4 shows the causal loop diagram of the congestion management model.


Fig. 4. The causal loop diagram of the congestion management model

With the causal loop diagram drawn by the software Vensim, the root cause tree of the indicator can be obtained. It helps to analyze the feedback process for a specific indicator. For example, Fig. 5 shows the cause tree for the indicator “Person”. (3) Stock flow diagrams are used for the quantitative analysis of the indicators. During the research, the indicators are quantified by inputting the comprehensive indicator weights Qj and initial values associated with the causal loop diagram. Then, the input values of constant indicators are dynamically adjusted to analyze the sensitivity. Finally, the feasible optimized values are selected as the simulation solution for congestion alleviation. Figure 6 shows the stock flow diagram of the congestion management model, and all the variable descriptions in the model are listed in detail in Table 2.


Fig. 5. Cause tree for Person


Fig. 6. The stock flow diagram of the congestion management model
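A stock-flow structure of the kind shown in Fig. 6 can be simulated numerically with a simple Euler scheme: each level variable changes according to an increase rate and a reduction rate driven by the weighted indicators. The sketch below is ours and purely illustrative; the rate equations are assumptions rather than the Vensim model, although the indicator names, weights (Table 1) and initial values (Table 2) are taken from the paper:

```python
import numpy as np

def simulate_level(initial, increase_rate, reduction_rate, months=60, dt=1.0):
    """Euler integration of one SD level: d(level)/dt = increase - reduction."""
    level = np.empty(months + 1)
    level[0] = initial
    for t in range(months):
        level[t + 1] = level[t] + dt * (increase_rate(level[t], t) - reduction_rate(level[t], t))
    return level

# Illustrative 'Person' level: congestion pressure grows with workload and is
# reduced in proportion to comprehensive quality (weights from Table 1).
workload, quality = 6.31, 4.62            # initial values, cf. Table 2
person = simulate_level(
    initial=5.12,
    increase_rate=lambda lvl, t: 0.06636 * workload,
    reduction_rate=lambda lvl, t: 0.078355 * quality,
)
print(person[:5])   # trajectory of the level variable over the first months
```

With these placeholder rates the 'Person' level keeps rising under the initial values, which is consistent in spirit with the red curves discussed below; raising the quality input reverses the trend, as in the optimized runs.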

3 Results and Discussion

The simulation results for the four level variables are shown in Fig. 7. Note that when the vertical coordinate of a curve is below zero, the impact of that factor is considered to be within the acceptable level of congestion.


Table 2. System boundary and parameter setting

Indicators | Type | Initial value | Optimized value
Person | Level | 5.12 | 5.12
Equipment | Level | 5.12 | 5.12
Environment | Level | 6.10 | 6.10
Management | Level | 5.45 | 5.45
Comprehensive quality | Constant | 4.62 | 8
Control skills | Auxiliary | 7 | 7
Work Procedures | Auxiliary | 5.46 | 5.46
Workload | Auxiliary | 6.31 | 6.31
Pilots | Auxiliary | 4.25 | 4.25
Passengers | Constant | 3.09 | 1.09
Hardware | Auxiliary | 4.77 | 4.77
Software | Constant | 5.46 | 7.46
Flight operating environment | Constant | 7.83 | 5.83
Airport operation environment | Auxiliary | 5.62 | 5.62
Controlled operating environment | Constant | 4.85 | 7.85
Resource management | Constant | 5.83 | 5.83
Controller training | Constant | 5 | 5
Regulations and Management Procedures | Constant | 4.92 | 4.92
Control agreements | Constant | 6.92 | 5.92
Monitor and Check | Constant | 4.58 | 4.58

The red curves reflect the simulation results obtained with the initial input values. From the output, it can be found that:
• Person factors and environment factors have the greatest influence on congestion at air route intersections, and the degree of influence keeps increasing with time.
• The impact of the management factor on congestion declines to an acceptable level in the 25th month, but the curve shows a significant upward trend after the 44th month.
• The equipment factor has the lowest degree of influence on congestion. Its curve declines to an acceptable level in the 30th month and continues its downward trend.
To reduce the impact of the indicators on congestion, the sensitivity of each constant indicator to system feedback is analyzed first. Then, the model is simulated repeatedly with adjusted input values until the ideal results are obtained.


Fig. 7. Initial and optimized simulation results (Color figure online)

The blue curves reflect the simulation results obtained with the optimized input values. It can be found that the output curves of each research objective decrease significantly with time:
• The influence of both the person factor and the environment factor on congestion is effectively controlled. The acceptable levels of the person factor and the environment factor are brought forward to the 3rd and the 47th month, respectively.
• The acceptable level of the management factor is brought forward to the 10th month, with no obvious trend to rebound.
• The acceptable level of the equipment factor is brought forward to the 19th month.
In conclusion, we numerically optimized the simulation model and discussed it with air traffic controllers during the research, which leads to the following suggestions. Among the person factors, the focus is on improving the comprehensive quality of controllers and reducing the impact of passengers. The comprehensive quality of controllers can be analyzed and improved at three levels: the administration, the training institutions, and the controllers themselves.
• For administrations, the focus is on optimizing training, supervision, and humanistic care. Effective regular training and examinations should be increased and unnecessary training should be reduced to alleviate the workload; training and examinations should also be adapted to the controller's region to avoid unnecessary training. The administration should seek feedback from controllers on a regular schedule, and inefficient control operation methods should be adjusted to help controllers establish good control awareness. Similarly, the administration should regularly conduct anonymous surveys and talks on the controllers' psychological, emotional, and physical conditions, which can reduce controllers' psychological pressure and minimize illness on duty; at the same time, this can give controllers a sense of identity with their work and their unit.


• For the training institutions, it is important to train students not only in academic education but also in all-round quality education. In the future, efforts should be focused on developing controllers' professional terminology. In the current Chinese training model, controllers devote more time to general English exams; in contrast to the training model for pilots, bringing professional English tests forward into the student phase would leave them better prepared to deal with problems when they enter the workplace.
• For air traffic controllers, frequent mistakes can be corrected in time through continuous self-learning and consulting others in daily work. In addition, when their health does not allow them to continue working, they should declare themselves unfit for duty. Meanwhile, unreasonable management or work procedures should be pointed out in time to prevent negative effects.
The impact of passengers on congestion should be addressed from two aspects: airport operation and flight operation.
• For airport operation, a city terminal should be opened so that passengers can check in and drop off baggage in advance. In addition, more advanced security equipment needs to be installed to ensure the security process is simple, efficient, and accurate.
• Passengers' illness or childbirth during flight operations can change flight plans and easily create conflicts with other flights. Therefore, it is necessary to set up a special medical rescue cabin on the aircraft with commonly used medical equipment and medicines, to provide enough time to transfer the patient for treatment.
For the equipment factor, the impact of software is mainly considered. The present control software has redundant interfaces and functions, which should be simplified and developed for front-line operation. In the future, with the development of collaboration and the application of new ATC technologies, the requirements for software will become more user-oriented and efficient.
For the environment factors, we mainly focus on the control operation environment and the flight operation environment.
• The control operating environment is mainly the controller's working environment, which can directly affect work efficiency: high room temperature and dim lighting make controllers tired, and the distribution of control equipment also influences work efficiency.
• The flight operation environment mainly refers to the external environment of the aircraft in operation. Low-speed aircraft can influence the take-off and landing of other aircraft, and there should be special management for this type of aircraft. Likewise, other airspace users can achieve effective airspace allocation and operation regulation by applying in advance. Furthermore, aircraft in the en-route phase are vulnerable to severe weather; the impact of meteorological factors can be reduced by enhancing the accuracy of severe weather detection with ground, space, and airborne equipment, achieving advance prediction and handling.
For management, the focus is on control agreements. At present, the operation standards and separations at many small and medium airports are unreasonable, which can lead to large flight separations, waste of airspace resources, and complex coordination processes.


For unreasonable control agreements or inefficient operating processes, applications should be submitted to senior units for the design and verification of new procedures in accordance with regulations. After receiving approval, the relevant airports should operate according to the new processes and agreements.

4 Conclusion

(1) We have established the congestion indicator system for air route intersections, and the comprehensive indicator weights have been calculated by combining the entropy method and AHP.
(2) This paper constructed the system dynamics model and simulated it through an in-depth analysis of the indicator system. The simulation results then identify a feasible solution to alleviate congestion. The results show that congestion can be effectively alleviated by optimizing six indicators: the comprehensive quality of controllers, passenger-related airport and flight operation management, ATC software, the working environment of controllers, the flight operation environment, and control agreements.
(3) Among our future research plans, we will optimize the congestion management model and increase the amount of data collected.

Acknowledgments. The authors would like to give special thanks to all air traffic controllers and civil aviation operations supervisors for their contributions to this research. Their support has been invaluable in facilitating the authors' research. This work is supported by Sichuan Provincial Science and Technology Department Project 2023YFSY0025.

References

1. Dal Sasso, V., Fomeni, F.D., Lulli, G., Zografos, K.G.: Planning efficient 4D trajectories in Air Traffic Flow Management. Eur. J. Oper. Res. 276(2), 676–687 (2019). https://doi.org/10.1016/j.ejor.2019.01.039
2. García-Heredia, D., Alonso-Ayuso, A., Molina, E.: A combinatorial model to optimize air traffic flow management problems. Comput. Oper. Res. 112, 104768 (2019). https://doi.org/10.1016/j.cor.2019.104768
3. Zhang, Y., Su, R., Li, Q., Cassandras, C.G., Xie, L.: Distributed flight routing and scheduling for air traffic flow management. IEEE Trans. Intell. Transp. Syst. 18(10), 2681–2692 (2017). https://doi.org/10.1109/TITS.2017.2657550
4. Chen, J., Chen, L., Sun, D.: Air traffic flow management under uncertainty using chance-constrained optimization. Transp. Res. Part B Methodol. 102, 124–141 (2017). https://doi.org/10.1016/j.trb.2017.05.014
5. Cai, K.-Q., Zhang, J., Xiao, M.-M., Tang, K., Du, W.-B.: Simultaneous optimization of airspace congestion and flight delay in air traffic network flow management. IEEE Trans. Intell. Transp. Syst. 18(11), 3072–3082 (2017). https://doi.org/10.1109/TITS.2017.2673247
6. Gui, G., Zhou, Z., Wang, J., Liu, F., Sun, J.: Machine learning aided air traffic flow analysis based on aviation big data. IEEE Trans. Veh. Technol. 69(5), 4817–4826 (2020). https://doi.org/10.1109/TVT.2020.2981959


7. Lin, Y., Zhang, J., Liu, H.: Deep learning based short-term air traffic flow prediction considering temporal–spatial correlation. Aerosp. Sci. Technol. 93, 105113 (2019). https://doi.org/10.1016/j.ast.2019.04.021
8. Jiang, X., Wen, X., Wu, M., Song, M., Tu, C.: A complex network analysis approach for identifying air traffic congestion based on independent component analysis. Phys. A Stat. Mech. Appl. 523, 364–381 (2019). https://doi.org/10.1016/j.physa.2019.01.129
9. Wang, Z., Wen, X., Wu, M.: Identification of key nodes in aircraft state network based on complex network theory. IEEE Access 7, 60957–60967 (2019). https://doi.org/10.1109/ACCESS.2019.2915508
10. Zhang, M., Liang, B., Wang, S., Perc, M., Du, W., Cao, X.: Analysis of flight conflicts in the Chinese air route network. Chaos Solitons Fractals 112, 97–102 (2018). https://doi.org/10.1016/j.chaos.2018.04.041
11. Pang, B., Dai, W., Hu, X., Dai, F., Low, K.H.: Multiple air route crossing waypoints optimization via artificial potential field method. Chin. J. Aeronaut. 34(4), 279–292 (2020). https://doi.org/10.1016/j.cja.2020.10.008
12. Wang, L., Wang, W., Wei, F., Hu, Y.: Research on the classification of air route intersections in the airspace of China. Transp. Res. Rec. 2673(2), 243–251 (2019). https://doi.org/10.1177/0361198118825452
13. Tascón, D.C., Olariaga, O.D.: Air traffic forecast and its impact on runway capacity. A system dynamics approach. J. Air Transp. Manag. 90, 101946 (2020). https://doi.org/10.1016/j.jairtraman.2020.101946
14. Guo, J., Pan, C., Yang, C.: Research on the identification method of air traffic control hazards based on grey correlation analysis. J. Saf. Environ. 15(6), 157–161 (2015). https://doi.org/10.13637/j.issn.1009-6094.2015.06.032
15. Guo, J.: Research on text classification method of hazard sources in ATC system based on natural language processing. J. Saf. Environ. 22(2), 819–825 (2022). https://doi.org/10.13637/j.issn.1009-6094.2021.1687
16. Liu, P., Gan, X., Wei, X., Sun, J.: Control load-based cascade-resistant failure strategies for air traffic networks. Firepower Command Control 47(9), 143–152 (2022)
17. Regulations of the general administration of civil aviation of China. Official Gazette of the State Council of the People's Republic of China, no. 6, pp. 43–46 (2008)
18. Sun, Y., Wang, Y., Yang, J., Wu, L.: Analysis and design of operational quality monitoring system for civil aviation air traffic control system. Comput. Appl. Softw. 32(4), 104–108+185 (2015)
19. Jia, S., Yang, K., Zhao, J., Yan, G.: The traffic congestion charging fee management model based on the system dynamics approach. Math. Probl. Eng. 2017, 1–13 (2017). https://doi.org/10.1155/2017/3024898

Research on Evaluation of Network Integration of Multiple Hub Airports Within a Region

Linlin Li1, Mengyuan Lu1(B), Tingting Lu2, and Yiyang Zhang2

1 Civil Aviation Science and Technology Center, Beijing 100028, China
[email protected]
2 Civil Aviation University of China, Tianjin 300300, China

Abstract. The phenomenon of having multiple commercial airports within a single city or region, known as regional multi-hub airports or multi-hub groups, is prevalent globally. Despite this, research on the integration achieved by these airport groups remains in its early stages. This study aims to shed light on this issue by identifying 15 existing regional multi-hub groups using quantifiable criteria and categorizing their modes of operation. Additionally, a novel evaluation framework is proposed, considering both the integral and differentiated aspects of the airport network. This framework employs the principal component analysis method to evaluate network integration, generate a comprehensive score, and rank the 15 groups accordingly. The analysis reveals that a key aspect of optimizing the airport group network is reducing duplicate routes. Based on the findings, relevant policy recommendations are provided, which have practical implications for future research on the integrated and sustainable development of regional multi-hub airports.

Keywords: Regional multi-hub airports · Network Integration Evaluation · Hub Airport Network

1 Introduction

With the advancements in economic globalization and transportation infrastructure, the scope of economic activity has expanded beyond regional borders. As a result, the study of airports has shifted from focusing on individual cities to considering regional scales. In recent decades, the international air transport market has experienced tremendous growth, leading airlines and airports to prioritize their international presence. The most notable examples are the air hubs in Northeast Asia and the Middle East, which have leveraged their advantageous locations to establish a strong presence in the global air transportation network. The development of air hubs has resulted in the formation of regional multi-hub airports globally. These airports are often located close to each other, have large capacities, and overlap in their catchment areas, leading to intense competition for passenger and cargo demand. On the one hand, this competition can lead to inefficiencies and waste.


On the other hand, hub airports are critical components of a national airport network system, and excessive convergence can hinder their ability to fulfill their functions and increase their global influence. As such, understanding the integration among regional multi-hub airports is of crucial importance. A well-planned regional air network is critical for airlines and airports as it can facilitate functional complementarity and coordination among regional multi-hub airports, enhance competitiveness, and improve the efficiency and profitability of airlines. Coordinated planning of the regional air network is therefore of great significance in terms of resource allocation, regional economic development, and increasing the competitiveness and influence of hub airports. This study evaluates the integration and integration level of regional multi-hub airports from the perspective of the network, with the aim of achieving mutual benefit and sustainability of the airport groups in the region.

1.1 Multi-airport Operation Mode

In the 1990s, the study of regional multi-airport systems began [1]. Subsequently, Bonnefoy built upon the systematic evaluation of the US multi-airport system and established a development model based on life cycle theory [2]. This model was then used to analyze the evolution of the global multi-airport system. With the rise of regional airport planning and development, relevant research on China's airport clusters has become increasingly abundant [3–6]. In terms of quantitative research, commonly used methods are based on empirical distance thresholds or jurisdiction [7]. More recent studies have shifted towards spatial analysis using time-distance models, focusing on characteristics such as network structure and traffic concentration [8]. The evolution stages of airport groups and the empirical analysis of global hubs have also garnered attention [9].

1.2 Multi-airport Connections and Network

Scholarly research has indicated that the formation of airline networks is influenced by the clustering between cities [10], and the implementation of a multi-airport system enhances the capacity of the air transportation system [11]. Further studies of aviation linkages in multi-airport systems in various regions of the world, such as London, New York, and Los Angeles, revealed that cooperation between airports should be prioritized over competition [12]. The study concluded that the allocation of routes in multi-airport systems is impacted by differences in governance systems, competition among routes, and the service capabilities of the airline network [13].

1.3 Multi-airport and Regional Interaction

Academic research on the interplay between multi-airports and regions has been conducted, exploring a range of topics related to the development of the multi-airport system. Limitations such as a shortage of airspace resources, a lack of core airport ground support capabilities, and idle resources at non-hub airports are seen as important drivers for integration between airports [6]. Studies on airport competition and cooperation encompass areas like market positioning, business scale, airline company alignment, airport catchment, route planning, flight schedules, operational strategy, and management integration [7, 14]. Additionally, research has delved into the relationship between multi-airports and regional development [15]. There has been a growing interest in the quantitative evaluation of the integration between airport groups and city groups [16]. Current research on multi-airports in regions has highlighted competition among airports, but research on integration primarily focuses on the need for differentiation among airports from a qualitative standpoint. There is currently a lack of literature that analyzes the degree of network integration in multi-airport groups or identifies the factors that contribute to low degrees of integration using evaluation indices.

2 Method

2.1 Identification of Multi-hub Groups

The spatial distance between airports has long been a topic of discussion in the literature, with many works positing that regional airport groups should fall within a range of 50 to 300 km. Philippe and Richard adopt a similar definition, suggesting that each airport within a group should serve more than 1% of the multi-airport group's total annual passenger traffic [2]. On the other hand, Sidiropoulos et al. define a regional airport group as all airports within a 150 km radius from the centroid of multiple airports [17]. As airport density increases and competition intensifies, researchers are shifting their focus from spatial distance to time distance. O'Connor et al. propose that airports in a group should not only be within 100 km of each other, but also offer at least 12 flights per day and be accessible by ground transportation in under 90 min [18]. Sun et al. use traffic data from OpenStreetMap to determine the time distance between two airports in the same regional multi-airport system, which is within 1.5 h [8]. This article defines multiple hub airports as two or more hub airports within a specific area, each serving more than 20 million annual passengers and located within a 300 km spatial distance from one another. The global multiple hub airport groups have been identified based on the criteria of a regional boundary of 300 km and large hub airports with a passenger count of over 20 million in the year 2019, as depicted in Table 1.
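Before turning to Table 1, the selection rule just described (hubs with more than 20 million annual passengers within 300 km of each other) can be sketched as below; the great-circle distance function and the three example records are our own, with passenger figures taken from Table 2 and only approximate coordinates:

```python
from math import radians, sin, cos, asin, sqrt

def km(a, b):
    """Great-circle (haversine) distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def hub_pairs(airports, max_km=300, min_pax_millions=20):
    """Pairs of large hubs (>= 20 M annual passengers) within 300 km of each other."""
    hubs = [a for a in airports if a["pax_millions"] >= min_pax_millions]
    return [(x["code"], y["code"])
            for i, x in enumerate(hubs) for y in hubs[i + 1:]
            if km(x["latlon"], y["latlon"]) <= max_km]

airports = [  # 2019 passengers from Table 2; coordinates are approximate
    {"code": "PVG", "pax_millions": 76.15, "latlon": (31.14, 121.81)},
    {"code": "SHA", "pax_millions": 45.64, "latlon": (31.20, 121.34)},
    {"code": "HGH", "pax_millions": 40.67, "latlon": (30.23, 120.43)},
]
print(hub_pairs(airports))  # all three pairs qualify, forming one hub group
```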


Table 1. Global multi hub airports.

Hub group | Code | Airport | City | Country
Shanghai hub group | PVG | Shanghai Pudong International Airport | Shanghai | China
Shanghai hub group | SHA | Shanghai Hongqiao International Airport | Shanghai | China
Shanghai hub group | HGH | Hangzhou Xiaoshan International Airport | Hangzhou | China
Shanghai hub group | NKG | Nanjing Lukou International Airport | Nanjing | China
Pearl River Delta hub group | CAN | Guangzhou Baiyun International Airport | Guangzhou | China
Pearl River Delta hub group | HKG | Hongkong International Airport | Hongkong | China
Pearl River Delta hub group | SZX | Shenzhen Baoan International Airport | Shenzhen | China
Chengdu-Chongqing hub group | CTU | Sichuan Shuangliu International Airport | Chengdu | China
Chengdu-Chongqing hub group | CKG | Chongqing Jiangbei International Airport | Chongqing | China
Paris hub group | CDG | Paris Charles de Gaulle Airport | Paris | France
Paris hub group | ORY | Paris Orly Airport | Paris | France
Frankfurt hub group | FRA | Frankfurt International Airport | Frankfurt | Germany
Frankfurt hub group | MUC | Munich International Airport | Munich | Germany
Tokyo hub group | HND | Tokyo Haneda International Airport | Tokyo | Japan
Tokyo hub group | NRT | Tokyo Narita International Airport | Tokyo | Japan
Moscow hub group | SVO | Sheremetyevo International Airport | Moscow | Russia
Moscow hub group | DME | Domodedovo International Airport | Moscow | Russia
Seoul hub group | ICN | Incheon International Airport | Seoul | South Korea
Seoul hub group | GMP | Gimpo International Airport | Seoul | South Korea
Barcelona hub group | BCN | Barcelona Airport | Barcelona | Spain
Barcelona hub group | PMI | Palma de Mallorca Airport | Palma | Spain
Bangkok hub group | BKK | Suvarnabhumi Airport | Bangkok | Thailand
Bangkok hub group | DMK | Bangkok International Airport | Bangkok | Thailand
Istanbul hub group | IST | Ataturk International Airport | Istanbul | Turkey
Istanbul hub group | SAW | Sabiha Gokcen International Airport | Istanbul | Turkey
London hub group | LHR | Heathrow Airport | London | The U.K.
London hub group | LGW | Gatwick Airport | London | The U.K.
London hub group | MAN | Manchester International Airport | Manchester | The U.K.
Los Angeles hub group | LAX | Los Angeles International Airport | Los Angeles | The U.S.
Los Angeles hub group | SAN | San Diego International Airport | San Diego | The U.S.
New York hub group | JFK | Kennedy International Airport | New York | The U.S.
New York hub group | LGA | LaGuardia Airport | New York | The U.S.
New York hub group | EWR | Newark International Airport | Newark | The U.S.
New York hub group | BOS | Logan International Airport | Boston | The U.S.
New York hub group | PHL | Philadelphia International Airport | Philadelphia | The U.S.
New York hub group | BWI | Baltimore International Airport | Baltimore | The U.S.
Miami Hub | MIA | Miami International Airport | Miami | The U.S.
Miami Hub | FLL | Fort Lauderdale Airport | Fort Lauderdale | The U.S.
Miami Hub | MCO | Orlando International Airport | Orlando | The U.S.


In 2019, the average passenger number for the 15 hub group airports was 49.88 million, making them a crucial player in the national air transportation industry. These airports boast a large operational scale domestically and globally. Furthermore, the hub group airports tend to have a relatively high share of international flights, with an average of 46%. On average, these airports connect to 298 cities both domestically and internationally, and possess a hub-and-spoke route network, which allows for easy transfer of passengers for international air travel (Table 2).

Table 2. Basic data of 15 global multi-airports groups.

Hub group | Code | Airport | 2019 Passengers (in 10 thousand) | Proportion of international passengers | 2019 Movements
Shanghai hub group | PVG | Shanghai Pudong International Airport | 7615 | 42% | 511846
Shanghai hub group | SHA | Shanghai Hongqiao International Airport | 4564 | 3% | 272928
Shanghai hub group | HGH | Hangzhou Xiaoshan International Airport | 4067 | 9% | 290919
Shanghai hub group | NKG | Nanjing Lukou International Airport | 3058 | 9% | 234869
Pearl River Delta hub group | CAN | Guangzhou Baiyun International Airport | 7339 | 24% | 491249
Pearl River Delta hub group | HKG | Hongkong International Airport | 7142 | 100% | 430294
Pearl River Delta hub group | SZX | Shenzhen Baoan International Airport | 5293 | 10% | 370180
Chengdu-Chongqing hub group | CTU | Sichuan Shuangliu International Airport | 5586 | 12% | 366887
Chengdu-Chongqing hub group | CKG | Chongqing Jiangbei International Airport | 4479 | 7% | 318398
Paris hub group | CDG | Paris Charles de Gaulle Airport | 7615 | 92% | 482676
Paris hub group | ORY | Paris Orly Airport | 3185 | 69% | 221405
Frankfurt hub group | FRA | Frankfurt International Airport | 7056 | 89% | 513912
Frankfurt hub group | MUC | Munich International Airport | 4794 | 80% | 417138
Tokyo hub group | HND | Tokyo Haneda International Airport | 8551 | 22% | 458368
Tokyo hub group | NRT | Tokyo Narita International Airport | 4429 | 83% | 265217
Moscow hub group | SVO | Sheremetyevo International Airport | 4993 | 53% | 386370
Moscow hub group | DME | Domodedovo International Airport | 2825 | 42% | 207114
Seoul hub group | ICN | Incheon International Airport | 7120 | 99% | 406598
Seoul hub group | GMP | Gimpo International Airport | 2545 | 17% | 151939
Barcelona hub group | BCN | Barcelona Airport | 5266 | 73% | 344558
Barcelona hub group | PMI | Palma de Mallorca Airport | 2972 | 75% | 217218
Bangkok hub group | BKK | Suvarnabhumi Airport | 6542 | 81% | 380909
Bangkok hub group | DMK | Bangkok International Airport | 4131 | 43% | 284954
Istanbul hub group | IST | Ataturk International Airport | 5203 | 76% | 329900
Istanbul hub group | SAW | Sabiha Gokcen International Airport | 3557 | 40% | 235717
London hub group | LHR | Heathrow Airport | 8089 | 94% | 478002
London hub group | LGW | Gatwick Airport | 4658 | 93% | 284987
London hub group | MAN | Manchester International Airport | 2944 | 91% | 203041
Los Angeles hub group | LAX | Los Angeles International Airport | 8807 | 28% | 691257
Los Angeles hub group | SAN | San Diego International Airport | 2522 | 4% | 231354
New York hub group | JFK | Kennedy International Airport | 6255 | 55% | 456060
New York hub group | LGA | LaGuardia Airport | 3108 | 7% | 373356
New York hub group | EWR | Newark International Airport | 4634 | 31% | 446320
New York hub group | BOS | Logan International Airport | 4259 | 20% | 427176
New York hub group | PHL | Philadelphia International Airport | 3302 | 12% | 390321
New York hub group | BWI | Baltimore International Airport | 2699 | 5% | 262597
Miami Hub | MIA | Miami International Airport | 4592 | 49% | 416773
Miami Hub | FLL | Fort Lauderdale Airport | 3675 | 24% | 331447
Miami Hub | MCO | Orlando International Airport | 5061 | 14% | 357689

Note: a movement is a landing or take-off of an aircraft at an airport.

2.2 Construction of Index Framework

The concept of a “multi-airport group with a higher degree of integration” refers to a synergistic process in which the various sub-systems within the larger system are optimized through collaboration and integration. The evaluation index framework is constructed by incorporating indicators from both the integration and differentiation aspects. The primary objective of the integrated evaluation is to enhance the overall performance of the hub group’s network, while the differential evaluation focuses on the extent of differentiation in the operation of airports within the hub group. The second-level indicators that contribute to this evaluation are the hub group’s network accessibility, the quality of route services, the availability of connecting opportunities, and the diversification of routes. These are presented in Table 3.

2.3 Evaluation Method

In this paper, the authors use principal component analysis to assess the level of integration among multiple hub airports within a region. The method involves transforming multiple indicators into a smaller number of key and uncorrelated indicators, referred to as principal components, which capture over 85% of the original indicators’ information. The use of principal component analysis offers several benefits, including a reduction in the number of indicators through dimensionality reduction and a more objective representation of the relationships between indicators through the analysis of the base data, which is reflected in the weights assigned to the integrated indicators. Additionally, the transformed indicators not only encapsulate the information of the original indicators but also are independent of each other, reducing duplication and redundancy in the information presented.

Table 3. Evaluation index of the integration degree of multi-airport network.

For the purpose of describing the method, assume that there are n objects being evaluated and that a given evaluation factor for these objects contains p evaluation indicators. Let x_{ij} be the value of the jth indicator for the ith evaluated object; the matrix of the original data is then recorded as X = (x_{ij})_{n \times p}. The main computational steps of the principal component analysis method are as follows:


Standardization of raw indicator data:

x_{ij}^{*} = \frac{x_{ij} - \bar{x}_{j}}{s_{j}}    (1)

where \bar{x}_{j} = \frac{1}{n}\sum_{i=1}^{n} x_{ij} and s_{j} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_{j}\right)^{2}}.

Compute the correlation coefficient matrix of the indicator data:

R = \left(r_{jk}\right)_{p \times p}, \quad j = 1, 2, \ldots, p; \; k = 1, 2, \ldots, p    (2)

where r_{jk} is the correlation coefficient between index j and index k:

r_{jk} = \frac{1}{n-1}\sum_{i=1}^{n} x_{ij}^{*} x_{ik}^{*}    (3)

Calculate the eigenvalues, eigenvectors, contribution rate and cumulative contribution rate of the correlation matrix R, and select m principal components according to the cumulative contribution rate. Let l_{k} = \left(l_{k1}, l_{k2}, \ldots, l_{kp}\right)^{T} be the eigenvector corresponding to the eigenvalue \lambda_{k}; the contribution rate of \lambda_{k} is then

a_{k} = \frac{\lambda_{k}}{\sum_{i=1}^{p} \lambda_{i}}    (4)

The cumulative contribution rate of the first m eigenvalues is \sum_{k=1}^{m} \lambda_{k} \big/ \sum_{j=1}^{p} \lambda_{j}. The smallest m for which this cumulative contribution rate exceeds 85% gives the number of principal components, and the linear combination formula for the corresponding principal components is

Y_{ik} = \sum_{j=1}^{p} l_{kj} x_{ij}^{*}, \quad k = 1, 2, \ldots, m; \; i = 1, 2, \ldots, n    (5)

Calculate the average of the evaluation elements:

F_{i} = \sum_{k=1}^{m} a_{k} Y_{ik}    (6)

Calculation of principal component scores. By applying the previous steps to calculate the evaluation value for each evaluation element, an evaluation element matrix is formed. Using this matrix as the new original data matrix, the steps are repeated to obtain the scores for each principal component for each evaluated object. The objects are ranked according to the scores of their principal components, with those with higher scores being considered better than those with lower scores.
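To make the computational steps above concrete, the following Python sketch implements formulas (1) through (6) with NumPy. It is an illustration only: the indicator matrix X and the 85% threshold are placeholders, not the authors' actual data or code.

```python
import numpy as np

def pca_scores(X, threshold=0.85):
    """Principal component scores for an n x p indicator matrix X, following
    the steps above: standardize, correlate, eigendecompose, select components
    by cumulative contribution rate, and combine with the contribution weights."""
    n, p = X.shape
    # (1) Standardize each indicator (mean 0, population standard deviation 1)
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # (2)-(3) Correlation coefficient matrix of the standardized data
    R = (X_std.T @ X_std) / (n - 1)
    # Eigenvalues and eigenvectors of the symmetric matrix R, sorted descending
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # (4) Contribution rates; m = number of components whose cumulative rate exceeds the threshold
    contrib = eigvals / eigvals.sum()
    m = int(np.searchsorted(np.cumsum(contrib), threshold) + 1)
    # (5) Principal component scores Y_ik = sum_j l_kj * x*_ij
    Y = X_std @ eigvecs[:, :m]
    # (6) Weighted combination of the m component scores
    F = Y @ contrib[:m]
    return Y, contrib[:m], F
```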


3 Results

The traffic data used in this analysis was sourced from the ACI Traffic 2019 Report, while the scheduling information was obtained from OAG for 2019. The calculations were performed using the Python programming language and SPSS. Specifically, the connection opportunities were calculated with Python from the schedule data of December 12, 2019; the calculation method is shown in Table 3.

In evaluating the performance of the different hub groups, several key factors were considered, including the number of destinations, daily flight frequencies, transfer connection opportunities, and the number of international flights. These factors were treated as positive indicators: the higher their values, the better the overall performance of the hub group, and the larger the value of an index, the greater its contribution to the overall integration degree. For the balance indicator, a value close to 1 indicates a well-balanced mix of long-distance and surrounding international flights; the results of the calculations showed that all values were less than 1, so no further processing was necessary. In contrast, the route repeatability was considered a reverse index, with smaller values being preferable. To account for this, the opposite of the route repeatability was taken so that it could be treated as a positive indicator. Given that the number of airports in each hub group varies, the indices were normalized by dividing the values of the number of destinations, daily flights, transfer connection opportunities, international flights, and route repeatability by the number of airports in each hub group.

Before conducting the principal component analysis, it was necessary to standardize the index values to account for differences in their dimensions. In this study, this was achieved through Z-score standardization in SPSS, as indicated in formula 1, resulting in dimensionless data with a mean value of 0 and a variance of 1. The original evaluation index data and the standardized data are presented in Table 4.

The results of the principal component analysis, conducted using SPSS on the data, are presented in Table 5. The two principal components obtained account for 78.4% of the original information contained in the six indicators. To simplify the data, the original six variables are replaced with two new variables, with the variance of the first principal component being 3.18 and the variance of the second principal component being 1.53. As shown in Table 6, the largest coefficients in the first principal component belong to P3, P5, and P6, which represent the connecting-flight opportunities (the wave hit rate of transit flights), the balance between long-distance and surrounding international flights, and the degree of route repetition. Hence, the first principal component represents the transfer capacity and degree of differentiation of the hub group. On the other hand, the largest coefficients in the second principal component belong to P1, P2, and P4, which represent the number of destinations, daily flight density, and international flight volume, respectively. This means that the second principal component represents the international operation and integrity of the hub group.
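The indicator preprocessing described at the beginning of this section (reverse-coding the route repeatability, scaling by the number of airports, and Z-scoring) can be sketched as follows. The file and column names are hypothetical placeholders, not the study's actual data set.

```python
import pandas as pd

# Hypothetical indicator table, one row per hub group
df = pd.read_csv("hub_group_indicators.csv")

# Reverse indicator: route repeatability is better when smaller,
# so its sign is flipped before standardization.
df["duplicate_routes"] = -df["duplicate_routes"]

# Scale extensive indicators by the number of airports in each hub group.
for col in ["destinations", "daily_flights", "connections",
            "international_flights", "duplicate_routes"]:
    df[col] = df[col] / df["airport_count"]

# Z-score standardization (mean 0, variance 1), as in formula (1).
indicators = df.drop(columns=["hub_group", "airport_count"])
z = (indicators - indicators.mean()) / indicators.std(ddof=0)
```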


Table 4. Standardized evaluation index data. Hub group

Number of destinations

Daily flight frequencies

Opportunities for connecting flights

International flight

Balance of long-distance and surrounding international flights

Duplicate routes

Los Angeles hub group

−1.17

1.07

0.27

−1.03

11%

−0.15

Shanghai hub group

−0.4

−0.19

0

−0.2

−0.22

11%

New York hub group

0.06

0.39

−0.12

−1.44

−24%

−0.76

Tokyo hub group

1.49

−0.91

0.4

1.23

−29%

−0.02

−38%

−1.99

Miami Hub

1.92

Chengdu-Chongqing hub group

−0.75

Pearl River Delta hub group

0.95 −0.8

Seoul hub group Bangkok hub group

0.56

−0.49

−0.79

1.36

−0.84

−0.3

69%

−1.24

−0.69

−0.33

−65%

0.08

−0.8

0.27

−29%

2.16

−1.11

−0.37

0.22

−78%

−0.24

0.83

2.18

1.31 −0.47

−0.36

0.41

−0.73

0.31

−51%

Istanbul hub group

0.77

−0.6

−0.4

0.37

−56%

0.12

Barcelona hub group

0.14

−1.28

−0.35

1.32

−20%

−1.18 −0.04

Paris hub group

2.17

2.99

−0.92

337%

Moscow hub group

−1.68

−0.14

0.94

−0.98

−7%

0.8

Frankfurt hub group

−0.76

0.31

68%

−0.68

−0.33

−0.18

London hub group

0.04

Table 5. Explanation of total variance.

Component | Initial eigenvalue: Overall | % of variance | Cumulative % | Extract square and load: Overall | % of variance | Cumulative %
1 | 3.18 | 52.92 | 52.92 | 3.18 | 52.92 | 52.92
2 | 1.53 | 25.43 | 78.35 | 1.53 | 25.43 | 78.35
3 | 0.63 | 10.52 | 88.88 | | |
4 | 0.36 | 5.97 | 94.85 | | |
5 | 0.23 | 3.85 | 98.70 | | |
6 | 0.08 | 1.30 | 100.00 | | |

Through the aforementioned procedures, the values for Factor 1 and Factor 2 are calculated. Then, the data in the component matrix in Table 6 are divided by the square root of the respective eigenvalues of the principal components in Table 5, yielding the coefficients of each index in the two principal components. This allows the scores of Principal Component 1 and Principal Component 2 to be calculated for each hub group, as shown in formulas 7 and 8.

Table 6. Component Matrix.

Index | Component 1 | Component 2
P1: Number of destinations | −0.71 | 0.58
P2: Daily flight frequencies | 0.88 | 0.11
P3: Opportunities for connecting flights | 0.70 | 0.58
P4: International flight | −0.78 | 0.24
P5: Balance of long-distance and regional international flights | 0.76 | 0.56
P6: Duplicate routes | 0.48 | −0.69

The eigenvalues corresponding to each principal component, divided by the total eigenvalues of the extracted principal components, are used as weights to calculate the comprehensive score of the principal components. This calculation is shown in formula 9 and the results are presented in Table 7.

Component 1 = −0.40 × P1 + 0.49 × P2 + 0.39 × P3 − 0.44 × P4 + 0.43 × P5 + 0.27 × P6    (7)

Component 2 = 0.47 × P1 + 0.09 × P2 + 0.47 × P3 + 0.19 × P4 + 0.45 × P5 − 0.56 × P6    (8)

Y = 52.92/78.35 × Component 1 + 25.43/78.35 × Component 2    (9)
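As a quick check of formula 9, substituting the Los Angeles hub group's principal component scores from Table 7 below reproduces its overall score:

Y = 52.92/78.35 × 4.06 + 25.43/78.35 × 2.92 ≈ 0.68 × 4.06 + 0.32 × 2.92 ≈ 2.74 + 0.95 ≈ 3.69,

which matches the overall score reported for that hub group.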

4 Discussion

The ranking of the comprehensive scores of network integration is presented in Table 7. The Los Angeles hub group has the highest comprehensive score and ranks first in terms of differentiation in transit capabilities, international operations, and integrity. The Shanghai hub group has the second highest comprehensive score. The New York and Tokyo hub groups follow in third and fourth place, respectively. The Miami hub group ranks fifth. The remaining five hub groups, listed in order, are Istanbul, Barcelona, London, Moscow, and Frankfurt. The top five hub groups with the highest scores in transit capacity and differentiation are Los Angeles, Shanghai, New York, Tokyo, and Miami, while the top five with the highest scores in international operations and integration are Los Angeles, Frankfurt, Paris, London, and the Chengdu-Chongqing hub group.

The Los Angeles hub group has the highest daily flights, connection opportunities, and balance between long-haul and regional international routes. The annual passenger volume at Los Angeles Airport is 3.6 times that of San Diego Airport, which focuses mainly on domestic routes. Los Angeles showcases a "comprehensive development / single focus" model in its operation, with international passengers accounting for 30% and domestic passengers accounting for 70%. In terms of route structure, the Los Angeles hub group has a balanced distribution of 37,675 long-haul international flights and 39,718 regional international flights, as well as a dense network of domestic routes. This results in the highest degree of coordination in its route network.

Table 7. Evaluation score of the network of each hub group.

Hub group | Transit capability and degree of differentiation (Principal Component 1): Score | Ranking | International operations and integration (Principal Component 2): Score | Ranking | Overall: Score | Ranking
Los Angeles hub group | 4.06 | 1 | 2.92 | 1 | 3.69 | 1
Shanghai hub group | 1.56 | 3 | −0.39 | 11 | 0.92 | 2
New York hub group | 1.59 | 2 | −1.02 | 12 | 0.74 | 3
Tokyo hub group | 1.42 | 4 | −1.11 | 14 | 0.60 | 4
Miami hub group | 0.83 | 5 | −0.18 | 10 | 0.50 | 5
Chengdu-Chongqing hub group | 0.44 | 6 | 0.05 | 5 | 0.32 | 6
Pearl River Delta hub group | 0.18 | 8 | −0.17 | 9 | 0.07 | 7
Seoul hub group | 0.38 | 7 | −2.03 | 15 | −0.40 | 8
Bangkok hub group | −0.07 | 9 | −1.10 | 13 | −0.41 | 9
Paris hub group | −1.56 | 13 | 0.93 | 3 | −0.75 | 10
Istanbul hub group | −1.13 | 10 | −0.11 | 7 | −0.80 | 11
Barcelona hub group | −1.41 | 11 | −0.16 | 8 | −1.00 | 12
London hub group | −1.81 | 14 | 0.61 | 4 | −1.02 | 13
Moscow hub group | −1.52 | 12 | −0.07 | 6 | −1.05 | 14
Frankfurt hub group | −2.97 | 15 | 1.85 | 2 | −1.41 | 15

On the other hand, the New York hub group has higher connection opportunities than most of the 15 hub groups, thanks to its multiple airports and convenient ground transportation between them.

The main issue for the bottom five hub groups in the overall ranking, such as Frankfurt, London, Moscow and Barcelona, is their high degree of route duplication, followed by the imbalance of long-haul and regional international routes and a lack of international flights. Thus, improvements in these three areas are necessary for these groups to improve their networks. It can be noted that the high degree of route repetition is the primary cause of the low comprehensive scores among these hub groups.

The hub groups can be classified into three categories based on the ranking discrepancy between Principal Component 1 and Principal Component 2. The Shanghai, New York, Tokyo, Miami, and Pearl River Delta hub groups possess a higher ranking in Principal Component 1 than in Principal Component 2, which suggests that their transit capacity and level of differentiation surpass their international operations and integration. Hence, it is imperative for these hub groups to enhance their overall connectivity, raise the frequency of daily flights, and elevate the level of international operations in order to sustain their transit capacity and degree of differentiation. On the other hand, the hub groups of Chengdu-Chongqing, Seoul, Bangkok, Paris, Istanbul, Barcelona, London, Moscow, and Frankfurt rank higher in Principal Component 2 than in Principal Component 1, indicating that their international operations and integration are more pronounced than their transit capacity and differentiation. These groups must prioritize the improvement of flight wave construction within their hub group, balance their long-distance and regional international routes, and enhance network differentiation to boost their overall competitiveness. Finally, the Los Angeles hub group has a relatively balanced development of transit capacity, differentiation, and international operations and integration.

5 Conclusion

In this study, the concept of a multi-hub group is introduced and 15 such groups are identified worldwide based on quantifiable criteria. The concept of "multi-hub airport network integration" is defined from the viewpoint of network integration and differentiation, and a novel evaluation index framework for the network integration level of these hub groups is developed. The integration level of the 15 hub groups is calculated using principal component analysis, resulting in a ranking with the Los Angeles hub group at the top. The integration and differentiation problems of each hub group are analyzed based on their scores and rankings, and the strengths of high-integration hub groups are examined to provide guidance for others. Finally, improvement suggestions and recommendations are put forward for each multi-hub airport group.

Acknowledgements. The authors are grateful to the HCII committee and reviewers who provided valuable comments on this paper.


Usability Evaluation of an Emergency Alerting System to Improve Discrete Communication During Emergencies

Elizabeth Manikath1,2(B), Wen-Chin Li1, Pawel Piotrowski2, and Jingyi Zhang1

1 Safety and Accident Investigation Centre, Cranfield University, Cranfield, UK
[email protected]
2 Lufthansa Technik AG, Hamburg, Germany

Abstract. In public, flight attendants' primary responsibility is often downplayed and considered to be serving meals and beverages. However, ensuring safety and security is the key role of cabin crew. In case of emergencies, flight attendants are the first point of contact for passengers. The only possibility for alerting them is to use the existing Passenger Call Button. Nevertheless, flight attendants cannot distinguish whether it is a service call or an emergency. Research question. This paper tries to close the existing gap in the emergency communication between passengers and cabin crew. Methodology. A prototype of a virtual emergency button, which shall be integrated into the IFE screens, has been presented to passengers and cabin crew (N = 56) to assess the user acceptance using the Van der Laan scale. Results. There was a significant difference in the sub-scale of satisfying for the passenger rating. Discussion. The significant difference could result from passengers being the main target group for the use of the emergency button. Furthermore, since not every aircraft is equipped with IFE screens, concepts for a physical emergency button are presented in this paper. Conclusion. To enhance passenger safety, an emergency alerting device is necessary to maintain the high safety standards of aviation. Detailed usability analyses of physical emergency buttons are necessary in future studies.

Keywords: in-flight emergencies · emergency alerting · passenger safety · emergency call procedure · emergency call button · aircraft cabin emergency

1 Introduction

1.1 Research Background

Flight attendants’ main duty is to ensure safety and security of all passengers in the aircraft cabin [6]. However, the safety role of cabin crew is often trivialized and only reduced to keeping the cabin “tidy” and evacuating the passengers [5]. Especially during potential emergencies, flight attendants are the first contact


person for passengers. For instance, passengers on board Air Ontario flight 1363 approached the lead cabin attendant to inform her about ice build-up on the aircraft wings [17]. However, the flight attendant failed to pass this information on to the flight crew, which resulted in a fatal crash. A good example of the importance of the safety and security duty of cabin attendants is Pan Am Flight 73 in September 1986. The cabin crew was praised internationally for performing outstandingly in trying to save the lives of the passengers, especially US American citizens, who were the focus of the hijackers. Moreover, after the final shooting some of the remaining cabin crew re-entered the aircraft to search for survivors [2]. One of the survivors of this accident, Mike Thexton, stated the following in his book "What happened to the hippy man", stressing the importance of the cabin crew: "I may be biased but I feel that day proved that the flight attendants on board were some of the best in the industry". Another example of the importance of cabin crew is flight BA762, on which passengers tried to inform the cabin attendants of the detached engine cowl by pressing the call button and shouting [1]. The lead flight attendant contacted the flight deck via interphone about the "highly unusual" behavior of the passengers.

In almost all common mass transportation vehicles, so-called Emergency Buttons are installed as an additional safety feature to alert the vehicle operator or the service personnel. According to the norm BS EN 16334, a Passenger Safety Device needs to be installed in at least every compartment of trains and in each vestibule, and within 12 m reach [4]. Due to the Delhi gang rape case in 2012, it has been mandatory by law in India since 2015 to install panic buttons in public service vehicles [22]. However, in an aircraft cabin there is, as of now, no Emergency Button. It is only possible to use the Passenger Call Button to contact the cabin crew. A hypothesis would be that historically the risks of air travel have been euphemized (e.g., safety announcements: "in the unlikely event of a loss in cabin pressure [...]") and safety equipment is stowed invisibly in closed cabinets [17]. The first flight attendants were called "Sky Girls" to make the business traveller feel at home, with a clear focus on customer service instead of security [17]. At that time the typical air traveller was affluent, since air travel was not affordable for most and accessible only to a small social stratum [18]. The focus of these passengers was on comfort, as shown in Fig. 1. Since the number of passengers in the aircraft cabin to be taken care of was lower, it seems that additional means such as alerting buttons were deemed unnecessary.

After World War II commercial aviation grew rapidly and commercial air travel developed into mass transportation. The passengers and their requirements on the cabin crew changed, but the equipment remained unchanged. The number of people on board to be taken care of increased by a factor of about seven within 40 years: a maximum of 119 (de Havilland Comet 4C) in the 1960s vs. a maximum of 853 on the Airbus A380 in the 2000s. It seems that a maximum capacity of passengers per aircraft has been reached in absolute numbers today; however, pressure on cost efficiency leads to an increase in the number of passengers a single flight attendant is responsible for. The dominant role of the cabin crew is "accommodation over safety and reassurance over authority" to avoid


Fig. 1. Historic picture (a) shows flight attendants taking care of passengers and picture (b) a flight attendant serving a passenger [8].

mass panics on board [17]. Additionally, in case of emergencies, aircrews face the dilemma of how to dose the information given to passengers to avoid panic and self-evacuations. An openly visible panic button would destroy the carefully staged performance of the cabin crew of "denying death" [17] and would draw attention towards the possible risks of air travel. However, such provisions might be necessary and could support cabin crew in their task performance.

1.2 Examples of Existing Emergency Buttons in Mass Transportation

Emergency buttons appear in various forms and sizes. In the following, the design of emergency alerting devices in vehicles such as cars and trains will be presented. Other applications (e.g. elevators, lifts, plants) were intentionally left out.

Passenger Call Button. The Passenger Call Button in modern aircraft can be found in the overhead panel close to the reading light and air outlet (refer to Fig. 2). On some aircraft, a respective button can also be found in the armrest. In case the seat is equipped with a removable controller (Passenger Control Unit), an additional call button is also usually included. Moreover, there is also one call button in lavatories. In older aircraft, the call button used to be integrated into the armrest. In Fig. 2(b), the call button as well as the earphone jack and the light button are integrated into the armrest. Since both buttons are located close to each other, this could lead to confusion: the similar designs could easily be mistaken for one another in a dark cabin.


Fig. 2. Detailed picture (a) shows the overhead panel with reading lights, air outlets and Passenger Call Button. Picture (b) shows an example for a misleading design [7].

There are reports about "countless pointless trips" of flight attendants to passenger seats only because the call button was accidentally pushed [19]. When the button is pushed, a chime alerts the flight attendants. It is possible for the cabin crew to reset the calls and to prevent them through the respective settings of the whole passenger call system. In modern aircraft types such as the continuously enhanced Boeing 737, the new "Sky Interior" introduces a novel design by locating the Passenger Call Button away from the light switch and making it well distinguishable [19].

Fig. 3. Detailed icon of Passenger Call Button

There are controversies about when to use the call button [21]. Since on some buttons a person with a tray table is depicted, as shown in Fig. 3, it implies that a service call could be made. However, there are also flight attendants insisting on only using the button in case of an emergency. eCall Button in Cars. eCall is an automated system, which in case of serious accidents calls the 112 emergency service using telephone and data link [10]. It


can also be activated manually, e.g. in case of a medical emergency, by pressing the button, which is typically located in the upper front console between the driver and the passenger seat (refer to Fig. 4). For all cars manufactured in the European Union and the U.K. after March 31st, 2018, the eCall system must be installed. In case of an accident the system automatically calls the nearest emergency response center, transmitting the exact vehicle location, time of the accident, direction of travel and vehicle identification number [10]. The telephone link allows the passengers to communicate directly with the emergency center [10]. The aim of the system is to reduce fatalities in road accidents by increasing the speed of emergency services [14]. It is estimated that rescue time will be shortened by 60% in urban areas and by 50% in rural areas in the EU [14]. According to the standard BS EN 16072-2022 (E), in case the eCall system is activated it shall provide "a clear visual and/or audible information" on the state of the connection. In case of malfunction, the driver shall also be notified [3]. The actual design and positioning of the button itself is not standardized and is part of the product design. Accidental triggering shall be prevented [3]. The vehicle occupants need to be made aware that, in case the eCall system is either automatically or manually activated, a voice connection with the emergency center will be established [3].

Fig. 4. The picture (a) shows the eCall button in the upper front console of an Audi A5. The detailed picture (b) shows an icon of eCall button.
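As an illustration of the information described above, a minimal sketch of the data an eCall message carries might look as follows; the field names are illustrative placeholders rather than the standardized message format.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ECallMessage:
    """Data transmitted to the emergency response centre when the eCall
    system triggers, either automatically after a crash or manually."""
    vehicle_identification_number: str
    location: tuple               # (latitude, longitude) of the vehicle
    time_of_accident: datetime
    direction_of_travel: float    # heading in degrees
    manual_activation: bool       # True if an occupant pressed the button
```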

Passenger Emergency Device in Trains. The requirements for passenger alarm system are described in BS EN 16334-2:2020. According to the standard, a passenger alarm system needs to be installed in every urban rail train, at least in every compartment and in each vestibule [4]. The system combines three functions: Firstly, passengers can alert the driver or the Operations Control Centre in case of emergencies. Secondly, the train can be stopped according to the “operational rules” and lastly the train can move or stop at a safe location.


As soon as the Passenger Alarm Device is pressed, the driver will receive an acoustic and visual warning signal [4]. Once the device is activated, the passenger will receive either a visual and/or an acoustic signal as feedback. The location of the alerting device shall grant easy access. It needs to be assured that passengers do not need to walk more than 12 m, walk through doors or take stairs to reach the device [4]. The equipment needs to be identifiable from at least 60 m distance. To avoid unintentional operation it needs to be clearly distinguishable regarding size, shape and color [4].

Virtual Emergency Button. Manikath and Li [15] propose to include a virtual emergency button as a means to improve communication between passengers and cabin crew in case of need. The suggested solution shall be included in the IFE home screen [15]. The article proposes a two-step functionality (reporting and confirmation) to reduce unintentional use, as shown in Fig. 5.

Fig. 5. The picture (a) shows the Virtual Emergency Button embedded into home screen. The reporting screen is shown in (b) and the confirmation screen in (c), [15].
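The two-step functionality can be thought of as a small state machine: a first press opens the reporting screen, and only an explicit confirmation raises the alert. The following is a conceptual Python sketch, not the authors' implementation; the crew-notification callback is a placeholder.

```python
from enum import Enum, auto

class ButtonState(Enum):
    IDLE = auto()
    REPORTING = auto()      # reporting screen shown, awaiting confirmation
    ALERT_SENT = auto()

class VirtualEmergencyButton:
    """Two-step flow (report, then confirm) to reduce unintentional alerts."""
    def __init__(self, notify_crew):
        self.state = ButtonState.IDLE
        self.notify_crew = notify_crew          # callback into the cabin alerting system

    def press(self):
        if self.state is ButtonState.IDLE:
            self.state = ButtonState.REPORTING  # show the reporting screen

    def confirm(self):
        if self.state is ButtonState.REPORTING:
            self.notify_crew()
            self.state = ButtonState.ALERT_SENT # show the confirmation screen

    def cancel(self):
        if self.state is ButtonState.REPORTING:
            self.state = ButtonState.IDLE       # accidental press dismissed
```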

Manikath and Li [15] used the Van der Laan scale to assess the user acceptance of this virtual emergency button. In the piloting study the user acceptance was low. Assumed reasons for the low scores were design deficiencies as well as the fact that only one use case (medical emergencies) was presented to the participants as a possible application area [15]. The aim of this paper is to derive differences in the user acceptance of this proposed design between the two focus groups, passengers and cabin crew. The following hypotheses were tested in order to assess the differences in the user acceptance:

H1: There is a significant difference in the user acceptance of the emergency button between passengers and cabin crew.
H2: There is a significant difference in the perceived usefulness of the emergency button between passengers and cabin crew.
H3: There is a significant difference in the user acceptance of the emergency button between European and Asian passengers.
H4: There are significant differences in the user acceptance of the emergency button among participants from different ethnicities.


H5: There are significant differences in the satisfying of the emergency button among participants from different ethnicities.
H6: There are significant differences in the usefulness of the emergency button among participants from different ethnicities.

2 Methodology

The current unsatisfactory situation in aviation was the central motivation to initiate a study regarding emergency call procedures, in order to unify and simplify the procedures for passengers and to improve the situation awareness and response time of the crew. An experimental survey study was chosen as the starting point. The goal is to identify the expectations of the potential users. This approach shall ensure that the pursued improvement complies with what the user intuitively expects in an emergency and under stress.

2.1 Material

Two separate online surveys were distributed to assess passengers' and cabin crews' user acceptance of a prototype version of a virtual emergency button. The first section of both questionnaires consisted of basic demographic questions. Afterwards, the users rated their experience using the prototype according to the Van der Laan scale [13]. The Van der Laan scale measures user acceptance on two sub-scales: usefulness and satisfaction [13]. Participants rated their perception after using the prototype on a five-point Likert scale. Nine statements were presented to the participants (see Table 1), which needed to be evaluated. Statements 1, 3, 5, 7, and 9 count for the usefulness dimension, whereas statements 2, 4, 6, and 8 belong to the satisfying dimension. To calculate the individual scores for usefulness and satisfaction, the statements need to be coded from +2 to −2 from left to right. Items 3, 6, and 8 need to be coded −2 to +2 from left to right, because these statements are mirrored [13]. Adding up the two sub-scale scores results in the overall score per participant. The higher the individual score, the greater the usefulness and satisfaction rating of the assessed system.
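A minimal sketch of this scoring procedure, assuming the item coding described above, could look as follows; the function and variable names are illustrative, not the authors' analysis code.

```python
def van_der_laan_scores(responses):
    """Compute usefulness and satisfying scores from the nine five-point items,
    coded +2 to -2 with items 3, 6 and 8 mirrored. `responses` maps the item
    number to the ticked box (1 = leftmost anchor ... 5 = rightmost anchor)."""
    mirrored = {3, 6, 8}
    usefulness_items = {1, 3, 5, 7, 9}
    usefulness, satisfying = 0, 0
    for item, box in responses.items():
        score = box - 3 if item in mirrored else 3 - box   # yields +2 .. -2
        if item in usefulness_items:
            usefulness += score
        else:
            satisfying += score
    return usefulness, satisfying

# Example: a participant ticking the leftmost box on every item
print(van_der_laan_scores({i: 1 for i in range(1, 10)}))   # (6, 0)
```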

2.2 Research Design

Qualtrics (www.qualtrics.com) was used to generate and distribute the online questionnaire. The link to the survey ensured anonymous participation. A signed and read consent form was the prerequisite to start the survey. The questionnaire started with basic demographic questions. The link to the prototype of the virtual emergency button and user instructions on the usage were shared with the participants within the survey. After familiarizing with the instructions users were asked to evaluate the prototype using the Van der Laan scale. Usually, it takes less than 15 min to complete the online questionnaire.


Table 1. The Van der Laan scale [13]

My judgments of the (...) system are ... (please tick a box on every line)

Item | Left anchor | Right anchor
1 | useful | useless
2 | pleasant | unpleasant
3 | bad | good
4 | nice | annoying
5 | effective | superfluous
6 | irritating | likeable
7 | assisting | worthless
8 | undesirable | desirable
9 | raising alertness | sleep-inducing

3 Results

3.1 Basic Demographics

The 55 participants are separated into two focus groups of passengers (87%) and cabin crew (13%). More than half of the participants have European ethnicity, and almost a quarter have Asian ethnicity; together, these two ethnicities account for more than 78% of the participants. Less than a quarter is represented by participants of other ethnicities, including African, North American, Oceanic, and South American. The detailed demographic information is shown in Table 2.

Participants' rating scores on the three sub-dimensions of the Van der Laan scale (usefulness, satisfying, and user acceptance of the call button) were collected and calculated. An independent samples t-test analysis was applied to examine the difference in the call button acceptance between passengers and cabin crew. The effect sizes of samples were quantified by Cohen's d. Furthermore, the differences in the call button acceptance among participants from six different ethnicities were compared by one-way ANOVA. Post-hoc pairwise comparisons were accomplished with Bonferroni correction, and the effect sizes of samples were quantified by partial eta square (ηp²).
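For reference, the independent-samples t-test and Cohen's d used in the following subsections can be computed along these lines. This is a sketch with SciPy; the two score arrays are placeholders for the survey data, which is not reproduced here.

```python
import numpy as np
from scipy import stats

def compare_groups(a, b):
    """Welch's independent-samples t-test plus Cohen's d (pooled SD)."""
    res = stats.ttest_ind(a, b, equal_var=False)
    pooled_sd = np.sqrt(((len(a) - 1) * np.var(a, ddof=1) +
                         (len(b) - 1) * np.var(b, ddof=1)) /
                        (len(a) + len(b) - 2))
    d = (np.mean(a) - np.mean(b)) / pooled_sd
    return res.statistic, res.pvalue, d
```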

3.2 Differences in the User Acceptance Between Passengers and Cabin Crew

The independent samples t-test results showed that there is no significant difference in the user acceptance of the emergency button between passengers and cabin crew, p = 0.231. Therefore, "H1: There is a significant difference in the user acceptance of the emergency button between passengers and cabin crew." was rejected. For the sub-dimension of satisfying, the rating of passengers is significantly higher than that of cabin crew, t(53) = 2.24, p = 0.015, Cohen's d = 0.91.

Table 2. Basic demographic information of 55 participants

Category | | Count | Percentage
Position | Passenger | 48 | 87%
 | Cabin Crew | 7 | 13%
Gender | Male | 25 | 45%
 | Female | 27 | 49%
 | Other | 3 | 5%
Age (years old) | | 18 | 33%
 | | 20 | 36%
 | | 7 | 13%
 | | 10 | 18%
Ethnicity | European | 31 | 56%
 | Asian | 12 | 22%
 | African | 4 | 7%
 | North American | 4 | 7%
 | Oceanic | 2 | 4%
 | South American | 2 | 4%

Therefore, “H2: There is a significant difference in the satisfying of the emergency button between passengers and cabin crew.” was supported. There is no significant difference in the sub-dimension of usefulness between passengers and cabin crews, p = 0.366. Therefore, “H3: There is a significant difference in the usefulness of the emergency button between passengers and cabin crew.” was rejected. The means and standard deviations, as well as the independent t-test results of the Van der Laan scale scores rated by passengers and cabin crews, are shown in Table 3.

Table 3. The means and standard deviation of the Van der Laan scale scores and independent t-test results between passengers and cabin crews.

 | Group | N | Mean | SD | t | df | p | Cohen's d
Usefulness | Passenger | 48 | 3.48 | 4.55 | 0.97 | 6.62 | 0.366 | 0.58
 | Cabin Crew | 7 | 0.57 | 7.74 | | | |
Satisfying | Passenger | 48 | 2.98 | 4.14 | 2.24 | 53.0 | 0.029 | 0.91
 | Cabin Crew | 7 | −1.00 | 5.94 | | | |
User Acceptance | Passenger | 48 | 6.46 | 8.19 | 1.32 | 6.66 | 0.231 | 0.77
 | Cabin Crew | 7 | −0.43 | 13.48 | | | |

3.3 Differences in the User Acceptance Among Participants from Different Ethnicities

According to the one-way ANOVA results, there is no significant difference in usefulness (p = 0.820), satisfying (p = 0.785), and user acceptance (p = 0.376) scores among participants of different ethnicities: European, Asian, African, North American, South American, and Oceanic. Therefore, “H4: There are significant differences in the user acceptance of the emergency button among participants from different ethnicities.”, “H5: There are significant differences in the satisfying of the emergency button among participants from different ethnicities.”, and “H6: There are significant differences in the usefulness of the emergency button among participants from different ethnicities.” were not supported. The means and standard deviations, as well as the one-way ANOVA results of the Van der Laan scale scores rated by participants from different ethnicities, are shown in Table 4.

Table 4. The means and standard deviation of the Van der Laan scale scores and one-way ANOVA results among different ethnicities.

 | Ethnicity | N | Mean | SD | F | df1 | df2 | p | ηp²
Usefulness | European | 31 | 2.52 | 5.64 | 0.44 | 5 | 49 | 0.820 | 0.043
 | Asian | 12 | 4.17 | 4.53 | | | | |
 | African | 4 | 3.75 | 4.79 | | | | |
 | North American | 4 | 5.50 | 2.08 | | | | |
 | Oceanic | 2 | 1.50 | 4.95 | | | | |
 | South American | 2 | 1.50 | 6.36 | | | | |
Satisfying | European | 31 | 1.68 | 5.44 | 0.49 | 5 | 49 | 0.785 | 0.047
 | Asian | 12 | 3.50 | 3.23 | | | | |
 | African | 4 | 3.75 | 3.40 | | | | |
 | North American | 4 | 3.00 | 1.63 | | | | |
 | Oceanic | 2 | 2.50 | 0.71 | | | | |
 | South American | 2 | 5.00 | 4.24 | | | | |
User Acceptance | European | 31 | 4.19 | 10.74 | 0.38 | 5 | 49 | 0.863 | 0.037
 | Asian | 12 | 7.67 | 7.48 | | | | |
 | African | 4 | 7.50 | 6.61 | | | | |
 | North American | 4 | 8.50 | 2.08 | | | | |
 | Oceanic | 2 | 4.00 | 5.66 | | | | |
 | South American | 2 | 6.50 | 10.61 | | | | |
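The one-way ANOVA and effect size reported in Table 4 can be reproduced along the following lines. This is a sketch with SciPy; the group arrays are placeholders for the per-ethnicity score lists.

```python
import numpy as np
from scipy import stats

def one_way_anova(groups):
    """One-way ANOVA across groups plus (partial) eta squared."""
    res = stats.f_oneway(*groups)
    grand_mean = np.mean(np.concatenate(groups))
    ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(((np.asarray(g) - np.mean(g)) ** 2).sum() for g in groups)
    eta_p2 = ss_between / (ss_between + ss_within)
    return res.statistic, res.pvalue, eta_p2
```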

4 Discussion

In this study, the user acceptance, with the sub-dimensions usefulness and satisfaction according to the Van der Laan scale, was assessed for the two focus groups passengers and cabin crew. The first finding is that there is no significant difference in the overall user acceptance between those two focus groups. This is an indication of a positive assessment of the emergency button. Additionally, there is also no significant difference in the sub-scale of usefulness. This sub-dimension measures the practicality of the product [13]. This indicates that for both focus groups the concept of an emergency button is appealing and the need for such a new device is substantiated. In a previous study conducted by Manikath and Li [16], some of the participating cabin crew proposed to either add an additional functionality to the existing call button (e.g. triggering it multiple times) or to implement a "medical emergency call/assistance" in order to increase situation awareness during a medical emergency. Hence, this underlines the identified research gap of a missing emergency alerting device.

However, looking at the overall scores of the two focus groups separately, the overall user acceptance for the target group cabin crew is low (M = −0.43, SD = 13.48). In comparison, passengers rated the overall user acceptance positively (M = 6.46, SD = 8.19). This could result from the fact that cabin crew are not the primary users of the emergency button itself. This finding is in line with Eason [9], who defines three different user types. The primary user is defined as the one who is directly interacting with the new technology. The secondary user uses the device irregularly, may benefit from its use, or has an influence on the usage of the primary user; this could be, e.g., a technician, developer or caregiver in a health care environment [23]. The tertiary user would be someone affected by the new product and having a direct influence on the purchasing or implementation decision [23]. According to this definition, passengers belong to the group of primary users and cabin crew to the group of secondary users.

Regarding the pleasantness of the new technology, there is a significant difference in the sub-dimension of the satisfying score between the two focus groups. Passengers rated the satisfaction significantly higher than cabin crew did. This could again result from passengers being the primary users of the emergency button. The prototype is designed to be integrated into the passengers' IFE screen and primarily fulfils passengers' requirements. The IFE screens are located at every seat and are accessible within the reach of one arm length. Therefore, the main user would be the passenger and not the cabin crew. However, the overall functionality of the emergency call should increase the flight attendants' situation awareness and improve discrete communication. Especially in cases such as terrorism, suspicious activities, and rude passengers, discrete communication is vital to ensure a positive outcome. However, the user interface and alerting function for cabin crews were not presented during this study, which could be another reason for the lower satisfaction scores.

The main goal of user-centered design is to transmit information presented on the user interface in a way that is "generally understandable" [20]. Previous research found that different cultures and languages could pose different needs on


how the information is presented [20], and it should not lead to any misunderstandings. Looking at the cross-cultural evaluation of the prototype design, the results indicate that there are no such biases. Regarding the use of colors, red and green were found to be internationally understood [20]. The colors used are in accordance with the international standard ISO 22324, e.g. red for danger/abort an action, and green for safe/continue an action, as used on the reporting screen [11]. However, previous research revealed that the use of symbols is dependent on the cultural background and could easily be misinterpreted [20]. The exclamation mark used as a symbol in the prototype is a general warning symbol according to ISO 7010 [12]. There are two potential limitations on the design of the virtual emergency button. Firstly, it is not an inclusive design considering disabilities such as vision or auditory impairment. Secondly, not every aircraft has built-in IFE screens. Therefore, solutions for a possible physical emergency button, incorporating the findings of this study, need to be evaluated as an alternative to the virtual button.

4.1 Prospect Studies

This study provides an initial assessment of the user acceptance of the virtual emergency button. Currently there is only one type of general Passenger Call Button, which is used simultaneously for service requests as well as for emergency alerting. This design has its roots in the beginning of commercial aviation; however, it is sub-optimal for the safety requirements of the highly optimized and cost-efficient aviation of the 21st century. Based on the findings of this study, the following requirements should be fulfilled by a potential physical Emergency Call Button. An additional (at least one) Emergency Call Button (ECB) integrated in each seat row or at each seat is desired. The ECB shall be easily reachable and shall indicate clearly that it is to be used only during emergencies. To fulfill access requirements, one Emergency Call Button shall be within the range of each passenger at any time, especially when the passenger is seated and the "fasten your seat-belt" signs are turned on. This limits the potential location to a radius of about one arm length around the seat. Small people and children shall be considered in the definition of the maximum distance, too. Within the range of the seated passenger and on the majority of seat rows, the following cabin components are potentially available for the implementation of an Emergency Call Button:

– the Passenger Service Unit (PSU) channel above the passengers' head,
– the upper part of the seat in front of the passengers',
– the armrest of the passengers' seat,
– the passengers' seat- and/or IFE-remote control,
– the sidewall and bulkhead panel, as well as
– the privacy room divider.

To avoid cross-cultural biases the pictogram on the Emergency Call Button as well as the color and shape shall be clearly understandable by every passenger


independent from the international, cultural, or religious background. Additionally, using Braille font on the button would make the design more accessible for vision-impaired passengers [24]. For both alerting devices, detailed usability testing is necessary using standardized questionnaires such as SUS and QUIS. Additional interviews with subject matter experts are necessary to evaluate the optimal screen content for the virtual solution. Furthermore, experimental testing of the physical button with participants (experienced and layperson target groups) is necessary. For this purpose, an aircraft environment seems ideally suited to assess response time and usability. Therefore, potential Emergency Call Button concepts have to be selected, designed and manufactured. These prototypes shall be installed into an aircraft cabin mock-up where the user testing will take place.

5 Conclusion

With the rise of low-cost carriers, air travel became more affordable and evolved into a means of mass transport. However, changing passenger needs and the highly optimized, efficiency-driven airline operations are not considered sufficiently in the operational procedures of flight attendants. The risks of flight are euphemized (e.g. in the safety instructions "in the unlikely event of a loss in cabin pressure [...]") and the job of flight attendants is trivialized [5,17]. The public perception of flight attendants' responsibility is mainly serving meals and beverages, as well as keeping the cabin "clean and tidy" [5]. However, the main task of flight attendants is to ensure the safety of all passengers on board. This paper emphasizes the fact that there is a gap which needs to be closed in future in order to enhance passenger safety and support cabin crew in their task performance.

Aircraft are considered the means of transport with the highest safety and security standards. However, there is a substantial lack of possibilities for passengers to discretely alert the crew in case of an emergency on board an aircraft. Since the terrorist attacks of 9/11, aircraft can be considered a vulnerable means of transport. It is necessary to improve the passengers' possibilities of emergency alerting and close this gap in order to improve the safety and security concept. It is surprising that in terms of passenger alerting, aircraft do not offer any facilities compared to other transportation vehicles. For instance, in buses or trains emergency alerting devices have been implemented as standard for decades, although the safety and security requirements are lower compared to aviation.

In that context, this paper presents a tool for alerting cabin crew to potential emergencies. The overall positive scores on the sub-scale of usefulness confirm the need for such an alerting device. The Van der Laan scale provides an initial estimate of the user acceptance with the two sub-scales usefulness and satisfaction. Regarding the detailed design of the prototype, additional usability evaluations utilizing commonly used questionnaires such as SUS and QUIS will provide more information on potential design flaws. Embedding a virtual emergency button into the existing IFE screens would be a cost- and weight-efficient way to improve


passenger safety on-board commercial and business/general aviation. The subsequent alerting for the flight attendants e.g. in the galley workspace, purser panel or as an indicator light in the respective seat row was not presented to the participants of the study. It is necessary to develop the technical concept of the system architecture. Moreover, since not every aircraft is equipped with an IFE screen the integration of a physical emergency button needs detailed assessment. In subsequent studies, it is necessary to assess whether such an alerting device would enhance cabin crews’ situation awareness.

References

1. Air Accident Investigation Branch: Air Accident Report 1/2015. Report on the accident to Airbus A319-131, G-EUOE, London Heathrow Airport, May 2013. https://www.gov.uk/aaib-reports/aircraft-accident-report-12015-airbus-a319-131-g-euoe-24-may-2013. Accessed 10 Jan 2023
2. BBC: Homepage (2022). https://www.bbc.com/news/world-asia-35800683. Accessed 29 Dec 2022
3. British Standards Institution: Intelligent transport systems. ESafety. Pan-European eCall operating requirements. BS EN 16072:2022 (E) (2022)
4. British Standards Institution: Railway applications. Passenger alarm system. System requirements for urban rail. BS EN 16334-2:2022 (2022)
5. Chute, R., Wiener, E.: Cockpit-cabin communication: I. A tale of two cultures. Int. J. Aviat. Psychol. 5, 257–276 (1995)
6. Damos, D.L., Boyett, K.S., Gibbs, P.: Safety versus passenger service: the flight attendants' dilemma. Int. J. Aviat. Psychol. 23(2), 91–112 (2013)
7. Darnell, M.J.: Bad Human Factors Design (2023). https://baddesigns.com. Accessed 06 Feb 2023
8. Deutsche Lufthansa AG: Flight Attendant History (2023). https://medialounge.lufthansagroup.com/de. Accessed 05 Feb 2023
9. Eason, K.: Information Technology and Organizational Change. Taylor & Francis, London (1987)
10. EU Homepage: eCall (2022). https://europa.eu/youreurope/citizens/travel/security-and-emergencies/emergency-assistance-vehicles-ecall/index_en.htm. Accessed 29 Dec 2022
11. International Standards Organization: ISO 22324 (2023). https://en.wikipedia.org/wiki/ISO_22324. Accessed 19 Feb 2023
12. International Standards Organization: ISO 7010 (2023). https://en.wikipedia.org/wiki/ISO_22324. Accessed 19 Feb 2023
13. Van der Laan, J.D., Heino, A., De Waard, D.: A simple procedure for the assessment of acceptance of advanced transport telematics. Transp. Res. Part C Emerg. Technol. 5, 1–10 (1997)
14. Li, Y., Majeed, N., Liu, J.Q.: In-vehicle system design for the European Union emergency call. In: IEEE International Conference on Electro/Information Technology (EIT), pp. 0908–0912 (2018)
15. Manikath, E., Li, W.C.: Developing an innovative health monitoring device to improve communication between passengers and cabin attendants during in-flight emergencies. Transp. Res. Procedia 66, 125–135 (2022)
16. Manikath, E., Li, W.C.: Usability evaluation of a web interface for passengers' health monitoring system on detecting critical medical conditions. In: Harris, D., Li, W.C. (eds.) HCII 2022. LNAI, vol. 13307, pp. 74–84. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-06086-1_6
17. Murphy, A.: The flight attendant dilemma: an analysis of communication and sense-making during in-flight emergencies. J. Appl. Commun. Res. 29(1), 30–53 (2001)
18. Petrauskaite, G.: Flight Attendant History, May 2021. https://www.aerotime.aero/articles/28032-stewardess-flight-attendant-history. Accessed 14 Dec 2022
19. Reuters: New Sky Interior (2023). https://www.reuters.com/article/us-airshow-button-idUSTRE75K1XR20110621. Accessed 12 Dec 2023
20. Romberg, M., Röse, K., Zühlke, D.: Global demands of non-European markets for the design of user-interfaces. IFAC Proc. Vol. 137–141 (1998). https://doi.org/10.1515/9783110452440-006
21. The Sun: Flight Attendant Button (2023). https://www.thesun.co.uk/travel/17966529/passenger-call-button-flight-attendant/. Accessed 15 Jan 2023
22. Times of India: Public vehicles to have tracking device, panic button, October 2022. https://timesofindia.indiatimes.com/city/chandigarh/public-vehicles-to-have-tracking-device-panic-button/articleshow/95000298.cms. Accessed 10 Jan 2023
23. Triberti, S., Riva, G.: Engaging users to design positive technologies for patient engagement: the perfect interaction model. In: Patient Engagement: A Consumer Centered Model to Innovate Healthcare, pp. 56–65 (2015)
24. University of Washington: Making air travel accessible (2023). https://www.washington.edu/doit/making-air-travel-accessible. Accessed 20 Feb 2023

The Application of a Human-Centred Design Integrated Touchscreen Display with Automated Electronic Checklist in the Flight Deck

Takashi Nagasawa(B) and Wen-Chin Li

Safety and Accident Investigation Centre, Cranfield University, Cranfield, UK [email protected]

Abstract. The touchscreen is not yet determinedly implemented in commercial aircraft cockpit displays. While automation has been implemented considerably in many cockpit systems, the operation of Non-Normal Checklists (NNCs) on the electronic checklist (ECL) system is not yet fully automated. This study examined how to apply the Human Centred Design (HCD) approach to developing touchscreen inceptors on synoptic displays and automating NNC operations considering Human-Machine Interaction (HMI). A total of 57 participants, including 21 professional pilots, joined the onsite and remote experiments to conduct scenarios with automated ECL display and Synoptic/Control Display (S/CD). The designed displays were evaluated through multiple questionnaires including NASA-TLX, SART and SUS. Onsite participants’ pupil diameters and fixations were measured through eye-tracking device. Also, a thematic analysis was utilised to find common themes regarding touchscreen and automation in the cockpit from the feedback from participants after conducting experiments. The results confirmed certain relationships between subjective and objective measurements. Based on the ideas from thematic analysis, innovative designs of the Adaptive Activation of Switches concept on S/CD and the Flexible/Adaptive Automation concept on ECL were proposed. Also from the thematic analysis, Human-Machine Teaming (HMT), Explainability and Adaptability were identified as crucial elements of the HCD approach for designing future commercial cockpits. Keywords: Adaptive Automation · Eye-Tracking · Human Centred Design · Synoptic Display · Touchscreen

1 Introduction

1.1 Background of Research

Displays are the primary means by which pilots achieve Situational Awareness (SA) [1]. Still, touchscreens, one of the most intuitive input devices and one that has gained significant popularity in everyday life (e.g., smartphones and portable electronic devices), have not been widely implemented in commercial aircraft cockpits except in limited application cases (e.g., [2, 3]). The Proximity Compatibility Principle (PCP) [4] is


one of the theories supporting the prominence of touchscreen manipulation of systems over traditional manipulation by physical switches located far from the display, such as overhead panels. Although the touchscreen seems a good interface from the PCP point of view, given some disadvantages, such as the difficulty of quickly grasping the "big picture", its usability should be meticulously considered from operational and human factors perspectives.

For the transportation industry, one of the main drivers for automation is increasing operational capacity and flexibility [5]. In the commercial aviation industry, automation is one of the critical elements that enables flight operation tasks to be accomplished by two pilots, where five pilots were required sixty years ago [1]. Non-Normal Checklists (NNCs) provide pilots with step-by-step instructions to deal with non-normal situations in the flight deck; in modern commercial aircraft, the NNC is stored in electronic form and hosted in a system called the Electronic Checklist (ECL). The closed-loop function, which allows the aircraft to check whether certain steps in an NNC are completed [6], automates some portion of NNC operations. Still, a considerable portion of NNC operations requires manual handling by pilots. Conducting an NNC tends to be a high-workload situation for pilots because they have to deal not only with the system failure itself but also, simultaneously, with other tasks for the remainder of the flight (coordination with ATC, the airline operations centre, the cabin, etc.). Therefore, manual handling of NNCs imposes an additional workload on pilots when they are already busy.

Consideration of Human-Machine Interaction (HMI) is crucial because increasingly complex interaction with increasingly computerised systems [7] is needed in today's commercial aircraft cockpit. Implementing further automation and touchscreens would inevitably increase the reliance on computerised systems (software), so considerations of HMI will become even more crucial in the future. As one consideration of HMI, workload measurement is a critical aspect of the evaluation of a new cockpit design, and mental workload assessment has been a key aspect of flight deck certification since 1993 [1]. While subjective workload measures are easy to obtain, they are by definition subjective, and it is a fact of life that people's subjective reports do not always coincide with their performance [4]. Measures of visual scanning, or eye tracking, are useful in understanding the qualitative nature of workload changes [4]. Many previous studies used eye-tracking to measure participants' workload on new cockpit design concepts (e.g., [8, 9]).

1.2 Research Aim

Further automation in the flight deck will be almost inevitable, considering the present and future operational circumstances of transport aircraft. Touchscreens have the potential to enable pilots to operate systems more efficiently in combination with automation. The Human-Centred Design (HCD) approach considers two characteristics of humans: (1) humans have limitations; (2) humans are unique problem-solvers in unanticipated situations [7]. Before touchscreens and automation are implemented in commercial aircraft cockpits on a broader scale, the HCD approach should be applied from the design stage in order to achieve good HMI and to evaluate the effects of new technologies on human performance and, especially in the aviation domain, the safety of flight operations. From these perspectives, the high-level research question of this study can be defined as: How can the touchscreen


and more automation be implemented in NNC operations in commercial aircraft cockpits so that aircraft systems can assist pilots' tasks in all possible non-normal situations? Two fictional near-future commercial cockpit display designs were developed to address this question. Through experiments with these designs, workload, situational awareness, system usability and other aspects relevant to cockpit display design were evaluated. Eye-tracking was also used to measure the participants' workload objectively and quantitatively.

2 Methods

2.1 Overall Experiment Design

The onsite experiment was conducted in one of the rooms of Martell House on the Cranfield campus of Cranfield University. Remote experiments were conducted in parallel. As a follow-up to the experiment, interviews were conducted with five selected participants of the remote experiments. For the onsite experiments, 35 university students and staff in total joined the experiment (male: 25, female: 10). For the remote experiments, 22 participants joined the experiment (all male). The research proposal was submitted to the Cranfield University Research Ethics System for ethical approval (CURES/15971/2022) and was approved before starting the experiment.

2.2 Simulated Display Design

Four displays were simulated for the two display designs in the experiment. The Boeing 787 (B787) was used as a baseline for creating the display designs, representing the latest cockpit display design of commercial jet aircraft [6].

Synoptic/Control Display (S/CD): This study adopted virtual switches on touchscreens as system manipulation inceptors. An electrical system synoptic for an imaginary commercial aircraft was designed, and virtual switches for manipulating the electrical systems were integrated into the synoptic display (Fig. 1).

Electronic Checklist (ECL) and automatic ECL (aECL): The closed-loop function is one of the features which automatically ticks off steps by sensing the switch/selector positions manipulated by the pilots [10]. This conventional ECL feature was implemented in one of the two designs developed in this experiment (the ECL design). A more advanced version, the automatic ECL function, was adopted in the other design (the aECL design, Fig. 1). The aECL autonomously conducts the steps in the displayed NNC without the pilots' manipulation of switches/selectors. The process required for NNC operation is described in Fig. 2.

Engine Display/Crew Alerting and Monitoring System (ED/CAMS) and Primary Flight Display (PFD): ED/CAMS was configured to accommodate vital information on primary sub-systems. Alert messages appear on CAMS when system malfunctions are detected, together with a beeper sound (Fig. 3). The PFD displays primary parameters regarding the flight control of the aircraft. In this experiment, the B787's PFD design [6] was simulated (Fig. 3).


Fig. 1. The format of information presentation on the aECL and S/CD. The checklist is displayed on the left side and the system autonomously manipulates the appropriate switches on the S/CD. In the ECL design, participants manually read the checklist and manipulate the switches through the S/CD.

Fig. 2. NNC operation flows in ECL and aECL. The sequences of steps continue from left to right. In ECL design the “read and do” step must be done manually. In aECL all the steps are done automatically.

2.3 Experiment Procedures

Simulation software and computer screens: Display designs and interactions were programmed using Adobe Experience Design (XD) software [11] so that the NNC operations were precisely simulated, as if participants were interacting with actual aircraft cockpit displays. For the onsite experiments, two computer screens were used to project the simulated displays, one of which was the 14-in. touchscreen display of a notebook PC (Fig. 4).

Eye tracking: For the onsite experiments, Pupil Core from Pupil Labs [12] was used to record participants' eye movements. The device accommodates two cameras in a plastic frame which can be worn on the participant's face like regular glasses. The world camera


Fig. 3. The format of information presentation on ED/CAMS and PFD. When system malfunction occurs, relevant alert message appears on CAMS.

Fig. 4. Screen arrangement in onsite experiments. The right two displays, ECL and S/CD are projected on the touchscreen and participants manipulated the system by touching appropriate switches on S/CD.

records the participant's forward vision, and the eye camera captures the movements of the participant's right eye. Four Areas of Interest (AOIs) were defined, one on each simulated display (PFD, ED/CAMS, ECL and S/CD), for post-experiment analysis.

Experiment scenarios: Two scenarios with the ECL design and two with the aECL design were prepared for the onsite experiments. Each participant conducted one scenario with the ECL design and one with the aECL design. For the onsite experiments, the order of scenarios was distributed almost equally to counterbalance the learning effect of the system on the


evaluation. For the remote experiments, one scenario with the ECL design and one with the aECL design were prepared. In all scenarios, the execution of the NNCs "AC BUS XX" and "APU GEN" was required. Figure 5 shows the flow of the scenario. A briefing document, a video and tutorial scenarios were provided to participants to familiarise them with the system before conducting the experiments. After conducting each scenario, participants were asked to fill in the questionnaires to evaluate the scenario and the design they had manipulated.
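As an illustration of the counterbalancing mentioned above, the sketch below shows one simple way the scenario order could be alternated across participants. It is not the authors' procedure or code; the function name and the alternating assignment rule are illustrative assumptions only.

```python
# Illustrative sketch (not the study's code): alternate which display design each
# onsite participant experiences first, so learning effects are spread across orders.
from itertools import cycle

def assign_scenario_orders(participant_ids):
    """Return a dict mapping participant id -> (first design, second design)."""
    orders = cycle([("ECL", "aECL"), ("aECL", "ECL")])
    return dict(zip(participant_ids, orders))

print(assign_scenario_orders([f"P{i:02d}" for i in range(1, 7)]))
```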

Fig. 5. Experiment scenario flow chart. Two NNC tasks were conducted. Between them, an ATC communication task was also imposed.

Interview procedure: Five participants in the remote experiments joined the interviews. Interviews were conducted individually and remotely through Microsoft Teams. In the semi-structured interview, three generic questions were asked of participants: 1) Do you agree that replacing the physical switches in the cockpit with virtual, touchscreen switches is a good idea? 2) What systems/functions should and should not be automated further? 3) Regarding the non-normal procedure, are you happy to allow the system to conduct the procedure automatically? What is needed to build trust between you and the system? Based on these questions, discussions regarding touchscreens and automation in the cockpit were developed. After the interview, interview minutes were created by the researcher and verified by the participants.

Dependent variables: Through the experiments, the following dependent variables were measured or collected from the participants. 1) Eye-tracking data (onsite only): fixation counts and durations, and pupil diameters. 2) Task completion time for the ECL and aECL scenarios (onsite only). 3) Subjective workload (NASA-TLX score). 4) Subjective SA (SART score). 5) Subjective usability of the systems (SUS score). 6) Comments on the questionnaires or open-ended interview questions regarding the system designs experienced in the experiment.
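To make the eye-tracking dependent variables concrete, the following is a minimal sketch of how exported fixations could be assigned to the four AOIs (PFD, ED/CAMS, ECL, S/CD) and aggregated into per-AOI fixation counts and mean durations. It is not the authors' analysis pipeline; the AOI bounding boxes, field layout and function names are hypothetical placeholders.

```python
# Illustrative sketch (not the authors' analysis code): classify fixations into AOIs
# and aggregate fixation count and mean duration per AOI for one participant/design.
from collections import defaultdict

# Hypothetical AOI bounding boxes in normalised screen coordinates (x0, y0, x1, y1).
AOIS = {
    "PFD":     (0.00, 0.0, 0.25, 1.0),
    "ED/CAMS": (0.25, 0.0, 0.50, 1.0),
    "ECL":     (0.50, 0.0, 0.75, 1.0),
    "S/CD":    (0.75, 0.0, 1.00, 1.0),
}

def aoi_of(x, y):
    """Return the AOI containing a fixation point, or None if it falls outside all AOIs."""
    for name, (x0, y0, x1, y1) in AOIS.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return name
    return None

def summarise(fixations):
    """fixations: iterable of (x, y, duration_ms) tuples; returns {AOI: (count, mean duration)}."""
    counts, durations = defaultdict(int), defaultdict(list)
    for x, y, dur in fixations:
        aoi = aoi_of(x, y)
        if aoi is not None:
            counts[aoi] += 1
            durations[aoi].append(dur)
    return {a: (counts[a], sum(d) / len(d)) for a, d in durations.items()}

# Example usage with made-up fixations:
print(summarise([(0.1, 0.5, 180), (0.8, 0.4, 210), (0.82, 0.6, 190)]))
```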

3 Results

3.1 Sample Characteristics

The median age band of onsite participants was 20–29, while the median age band of remote participants was 50–59. Regarding flight experience, among the 35 onsite participants, only one possessed an ATPL licence and a type rating for jet aircraft (A320 family). The mean flight time of onsite participants was 203 h (SD = 1183), while 31 participants had no flight experience. Among the 22 remote participants, 18 and three pilots


possessed an ATPL and a CPL, respectively. The mean flight time of remote participants was 10,870 h (SD = 4339).

3.2 Objective Measurements

From the eye-tracking device, among the 35 participants in the onsite experiments, fixation count/duration data were successfully obtained from 27 participants. Among them, pupil diameter data were successfully obtained from 22 participants. Table 1 shows the means and SDs of these variables for each AOI and display design (ECL or aECL).

Table 1. Summary of results of objective measurement on eye-tracking parameters

| Variable | Display design | N | PFD M (SD) | CAMS M (SD) | ECL M (SD) | S/CD M (SD) |
| Fixation count | ECL | 27 | 39.2 (27.3) | 22.9 (20.5) | 79.2 (29.9) | 66.6 (30.6) |
| Fixation count | aECL | 27 | 46.1 (27.6) | 34.4 (22.2) | 74.1 (38.6) | 53.5 (27.3) |
| Fixation duration (msec) | ECL | 27 | 194 (51.8) | 168 (53.2) | 175 (26.6) | 192 (35.7) |
| Fixation duration (msec) | aECL | 27 | 197 (40.5) | 184 (40.9) | 187 (34.8) | 206 (44.5) |
| Pupil diameter (mm) | ECL | 22 | 3.35 (0.74) | 3.36 (0.68) | 3.25 (0.70) | 3.36 (0.74) |
| Pupil diameter (mm) | aECL | 22 | 3.04 (0.64) | 3.00 (0.69) | 2.93 (0.72) | 2.88 (0.70) |

Paired t-tests showed a significant difference in fixation count on CAMS (t(26) = 1.94, p = 0.032 (one-sided), d = 0.374) and S/CD (t(26) = 1.73, p = 0.048 (one-sided), d = 0.333) by display design. Paired t-tests also showed a significant difference in fixation duration on ECL (t(26) = 2.36, p = 0.013 (one-sided), d = 0.454) and S/CD (t(26) = 2.14, p = 0.021 (one-sided), d = 0.413) by display design. Paired t-tests showed a significant difference in average pupil diameter on all AOIs by display design: for PFD, t(21) = 2.36, p = 0.028, d = 0.502; for CAMS, t(21) = 2.52, p = 0.020, d = 0.538; for ECL, t(21) = 2.48, p = 0.022, d = 0.528; for S/CD, t(21) = 3.38, p = 0.003, d = 0.721. From the eye-tracking video captures of the 27 onsite participants, task completion times were measured. The mean task completion time in the ECL design was 63.9 s (SD = 6.39), and in the aECL design it was 72.9 s (SD = 0.89). A paired t-test showed a significant difference in task completion time by display design (t(26) = 7.32, p < 0.001, d = 1.408).

3.3 Subjective Workload, Situational Awareness, and System Usability

Paired t-tests showed a significant difference in NASA-TLX values by display design in the onsite experiments (t(34) = 4.76, p < 0.001, d = 0.805). Independent-samples t-tests showed a significant difference in NASA-TLX values by experiment location in the aECL design (t(54) = 4.00, p < 0.001, d = 1.103). No significant difference was found in the mean value of SART by display design or experiment location. Paired t-tests showed a significant difference in SUS values by display design in the onsite experiments (t(34) = 2.90, p = 0.007, d = 0.490). Independent-samples t-tests showed a significant difference in SUS values by display design in the remote experiments (t(41) = 3.23, p = 0.002, d = 0.985) and in SUS values by experiment location in the aECL design (t(54) = 5.13, p < 0.001, d = 1.416). Table 2 shows the means and SDs of the NASA-TLX, SART and SUS values by display design and experiment location.
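As an aside on how such comparisons are computed, the snippet below is a minimal sketch of a paired t-test with a one-sided option and Cohen's d derived from the paired differences, of the kind reported above. It is not the authors' code; variable names are placeholders, and the one-sided option assumes a recent SciPy version.

```python
# Illustrative sketch: paired comparison of two display designs on the same participants.
import numpy as np
from scipy import stats

def paired_comparison(ecl_scores, aecl_scores, alternative="two-sided"):
    ecl, aecl = np.asarray(ecl_scores, float), np.asarray(aecl_scores, float)
    t, p = stats.ttest_rel(ecl, aecl, alternative=alternative)
    diff = ecl - aecl
    d = diff.mean() / diff.std(ddof=1)  # Cohen's d for paired samples
    return t, p, d

# Example with made-up NASA-TLX scores for the same participants in both designs:
# t, p, d = paired_comparison(tlx_ecl, tlx_aecl)
```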

Table 2. Summary of results of subjective measurements on NASA-TLX, SART and SUS questionnaires

| Variable | Display design | Onsite N | Onsite M (SD) | Remote N | Remote M (SD) |
| NASA-TLX | ECL | 35 | 22.7 (5.57) | 22 | 21.7 (5.41) |
| NASA-TLX | aECL | 35 | 17.3 (5.37) | 22 | 24.3 (7.66) |
| SART | ECL | 35 | 20.7 (6.62) | 22 | 22.0 (5.48) |
| SART | aECL | 35 | 20.6 (6.30) | 22 | 19.7 (7.66) |
| SUS | ECL | 35 | 71.0 (13.16) | 22 | 71.6 (12.71) |
| SUS | aECL | 35 | 77.9 (13.19) | 22 | 55.4 (19.67) |

3.4 Thematic Analysis of Interviews and Answers to Open-Ended Questions

All the answers to open-ended questions and comments in interviews (215 in total) were analysed through the six-step thematic analysis procedure [13]. As a result, five themes and 26 codes were identified from the participants' answers/comments (Fig. 6). In thematic analysis, themes can be determined by either an inductive or a deductive approach [13]. In this research, a deductive or "top-down" approach was first utilised, and themes were determined based on the research aim: what are current pilots' opinions on the two new technologies (touchscreen and automation)? Opinions were divided into favourable views and concerns over each topic. Another theme was found through the inductive or "data-driven" approach: many participants actively suggested their own ideas about new designs for future commercial aircraft cockpit displays and their operation, which formed the theme "New Design/Operation Proposal". In Fig. 6, each coloured box represents one theme. Within each box, each row represents a code, the code's abbreviation, and the number of times the code appeared in the answers and comments.
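As a small illustration of how the counts in Fig. 6 can be produced once comments have been coded, the sketch below tallies codes per theme. The coded comments, and the pairing of codes with themes, are hypothetical examples rather than the study's data.

```python
# Illustrative sketch: counting how often each code appears under each theme.
from collections import Counter, defaultdict

coded_comments = [  # (theme, code) annotations; hypothetical examples
    ("Concern over Touchscreen", "Concern over Mis-Operations"),
    ("Concern of Automation", "Desire for Pilot Authority"),
    ("New Design/Operation Proposal", "Adaptive Activation of Switches"),
    ("Concern of Automation", "Desire for Pilot Authority"),
]

counts_per_theme = defaultdict(Counter)
for theme, code in coded_comments:
    counts_per_theme[theme][code] += 1

for theme, counter in counts_per_theme.items():
    print(theme, dict(counter))
```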


Fig. 6. Thematic analysis results. Coloured boxes represent the five themes determined by the analysis. Each row within a theme shows a code, its abbreviation, and the number of times the code appeared across all comments from participants (Color figure online)

4 Discussion

4.1 Shorter Task Completion Time in the Manual ECL Design

Although the onsite experiment participants were less experienced in commercial aircraft cockpit operations, the mean task completion time was significantly shorter in the ECL design than in the aECL design. This result could imply that the benefit of the PCP is valid in the display design of commercial aircraft. During NNC operations with the ECL in present commercial aircraft cockpits, pilots often have to move their fixation points radically between the overhead panels and the forward displays. In the experimental design, the NNC and the relevant synoptic and switches were located side by side, so minimal fixation movements were required to conduct the NNC operation. Better proximity compatibility was achieved in the ECL and S/CD designs, which could have resulted in quicker manual completion than the automated process.

4.2 Interpretation of Eye-Tracking Results

Previous research (e.g., [14, 15]) found positive correlations between mental workload and pupil diameter. The same result was found in this study: the mean pupil diameter was bigger in the ECL design than in the aECL design, and the subjective workload measured by NASA-TLX was higher in the ECL design than in the aECL design. Previous studies also found that task demand correlates positively with fixation duration and negatively with fixation count [14]. In this study, on the S/CD display, a lower fixation


count and a longer fixation duration were found in the aECL design than in the ECL design. Superficially, these findings contradict each other: for the aECL design, the smaller pupil diameter and lower NASA-TLX scores suggest a lower workload, while the lower fixation count and longer fixation duration suggest greater task demands on the S/CD display. These results could be interpreted as reflecting different aspects of mental workload. Wickens' human information processing (HIP) model [16] divides human information processing into four stages: 1) sensory processing, 2) perception, 3) decision and response selection, and 4) response execution. In the ECL design, participants had to conduct 3) and 4) manually. In contrast, in the aECL design, these were conducted automatically by the aECL system. Therefore, in the aECL design, participants were freed from conducting the two latter HIP stages, which could have caused the lower NASA-TLX scores and smaller pupil diameters in the aECL design. Looking at the fixations on the S/CD, other aspects of the eye-tracking results emerged. It has been suggested that passive monitoring of an automated system can cause a higher mental workload than if the automated task were performed manually [17]. Although participants did not have to physically manipulate switches on the S/CD in the aECL design, to acquire SA they had to keep monitoring what was going on on the S/CD. This may have required more effort for monitoring the S/CD and consequently affected participants' fixation counts and durations.

4.3 Differences in Subjective Workload and Usability

For subjective workload and usability, inconsistent results were found between the onsite and remote experiments. Onsite participants felt a higher workload and lower usability in the ECL design, while remote participants, most of whom were professional pilots, felt these in the aECL design. Mental models can be described as our understanding of system components, how the system works, and how to use it [4]. A mental model is one way information is organised in long-term memory and can be strengthened by training [4]. It is likely that many remote participants had already developed their own mental models of the ECL, indicators, switches and other systems in current commercial aircraft cockpits. Therefore, the fact that remote participants felt a higher workload and lower usability in the aECL design might indicate that the aECL design was not "compatible" [4] with their mental models.

4.4 Review of SA, Workload, and the Nature of NNC Operations

It has been argued that automation can sometimes reduce not only the workload but also the operator's SA [4]. While Degraded SA was quite a popular topic in the thematic analysis, subjective SA did not show a significant difference by display design. In contrast, subjective workload was significantly increased by the aECL design in the remote experiments. In the thematic analysis, Increased Workload due to Automation and Frustration over Automatic Proceeding were popular topics among professional pilots. This phenomenon can be explained as follows. With "good" (from the pilots' perception) automation design in the cockpit, pilots are comfortable being out of the loop of task conduction; thus, the workload decreases, and SA sometimes unintentionally deteriorates at the same time. On the other hand, with "bad" (or "awkward") automation


design, pilots feel that their SA is deteriorating and try harder to understand what is going on in order to keep themselves situationally aware. As a result, the workload increases. In the aECL design of this study, the actual execution of NNC steps was automated and proceeded at a system-defined pace. Participants were forced to be in the outer loop of the process and to monitor the automated process passively. Pilots with mental models of the current (manual) ECL might try to keep themselves in the inner loop of NNC operations, which could have resulted in a higher workload. NNCs, by nature, cannot cover all non-normal situations a crew might encounter [10]. Therefore, pilots may have to amend some or all steps of an NNC to fit their situation. Under the theme Concern of Automation, many comments related to this nature of NNCs were raised (e.g., Desire for Pilot Authority and Multi Task Processing (of pilots' duties)). These results could indicate that the autonomous proceeding of NNC steps in the aECL design was incompatible with this nature of current NNC operations, which could have caused the lower system usability scores for the aECL design among pilots.

4.5 S/CD with Adaptive Activation of Switches

In this study, the non-normal situation imposed was quite simple, and participants could conduct the NNC solely through the Electrical page on the S/CD. Even in this relatively simple scenario, 15 comments expressed Concern over Mis-Operations in the thematic analysis. This result confirmed that mitigations for mis-operations of the touchscreen must be seriously considered when developing the S/CD as a primary inceptor for commercial aircraft system manipulation. Pilot participants proposed several concrete mitigations for mis-operations of touchscreens. One pilot suggested: "It may be one of the future design concepts: the switches are displayed only when they need to be manipulated. This would eliminate the error of manipulating wrong switches." Physical switches must be installed and occupy a certain amount of panel space so that they can be manipulated when needed. In contrast, virtual switches do not necessarily have to be displayed or activated permanently. In other words, virtual switches can be designed with adaptivity in their display or activation. If the wrong switch is not displayed, or is disabled, the risk of mis-operating switches is eliminated. Figure 7 shows an imaginary example of the S/CD with Adaptive Activation of Switches based on this idea. In this design, only the switch relevant to the current NNC step (the L2 GEN switch) is activated, and all the other switches are deactivated. An additional action (pushing "Activate all switches") is needed to activate the deactivated switches. With Adaptive Activation of Switches on the S/CD, the risk of mis-operating the wrong switches can be eliminated. Pilots are expected to be able to understand the status of each sub-system quickly and to take the appropriate action promptly.

4.6 Flexible/Adaptive Automation

Considering the discussion about the shortfalls of the original aECL design in the previous sections, what would be an alternative design for an improved aECL? In the ECL design, the "read and do" processes were still conducted manually by the participants, and only the processes for monitoring results were automated. On the other hand, in the aECL design, all


Fig. 7. Conceptual design of Adaptive Activation of Switch on S/CD. Only the switch relevant to current NNC step can be manipulated without pushing “Activate all switches”

the processes, including the "read and do" processes, were done autonomously. This difference could have forced participants into passive monitoring in the aECL design. Assuming that the level of automation in the original aECL was too high, and based on several comments from participants, Fig. 8 shows an imaginary design with a "modest" level of automation in the ECL system. For each step in the NNC, an "AUTOMATE" button first appears, and pilots can push this button to let the system conduct the step autonomously. Alternatively, pilots can perform the step manually by manipulating the switches on the S/CD. If no action is taken for a certain time, a "PAUSE" button appears next to the "AUTOMATE" button and a ten-second countdown appears on the "AUTOMATE" button. Pilots can still push the "PAUSE" button to defer the automatic proceeding for ten seconds. The design in Fig. 8 respects one principle of human-centred automation: making the automation flexible and adaptive [4]. Flexible automation simply means that different levels of automation are possible; in this design, pilots can choose whether to let the system perform steps or to do them manually. Adaptive automation goes further than flexible automation and changes the level, or the number of systems operating under automatic control, with regard to aspects of the situation [1]; in this design, the system autonomously increases the level of automation if no input is given for a certain amount of time. Through an aECL with Flexible/Adaptive Automation, pilots can retain authority over NNC operations and conduct NNCs at their own pace. The execution of actions can be automated at the pilots' discretion, which can reduce workload. If no input is given for a certain amount of time, the system performs the step autonomously, which can be helpful in cases of overload due to other tasks or in the rare case of pilot incapacitation.
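To make the two concepts above concrete, the following is a minimal sketch of the step-level logic they describe: Adaptive Activation of Switches (only the switch relevant to the current NNC step is touch-enabled) and Flexible/Adaptive Automation (AUTOMATE/PAUSE buttons with automatic proceeding after an idle delay). It is an illustrative sketch under assumed names, timings and NNC content, not the prototype's implementation.

```python
# Illustrative sketch of the step-level logic of Figs. 7 and 8 (assumed names/timings).
from dataclasses import dataclass

@dataclass
class NNCStep:
    label: str            # e.g. "L2 GEN switch ..... OFF" (hypothetical step text)
    relevant_switch: str  # the only switch enabled on the S/CD for this step

AUTO_DELAY_S = 10.0  # idle delay before the system performs the step itself

def enabled_switches(step, all_switches, activate_all=False):
    """Adaptive Activation of Switches: only the relevant switch is touch-enabled."""
    return set(all_switches) if activate_all else {step.relevant_switch}

def resolve_step(step, pilot_action, idle_time_s):
    """Flexible/Adaptive Automation for one NNC step.

    pilot_action: "AUTOMATE", "PAUSE", "MANUAL" or None (no input yet).
    """
    if pilot_action == "AUTOMATE":
        return "system performs step now"
    if pilot_action == "MANUAL":
        return "pilot performs step via S/CD"
    if pilot_action == "PAUSE":
        return "automatic proceeding deferred for 10 s"
    # No input: after the idle delay the level of automation is raised adaptively.
    return "system performs step" if idle_time_s >= AUTO_DELAY_S else "waiting for pilot input"

step = NNCStep("L2 GEN switch ..... OFF", "L2 GEN")
print(enabled_switches(step, ["L1 GEN", "L2 GEN", "APU GEN"]))
print(resolve_step(step, None, idle_time_s=12.0))
```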


Fig. 8. Conceptual design of the ECL with Flexible/Adaptive Automation. Pilots decide whether the step should be automated; once they push the "AUTOMATE" button, the system handles the step automatically. If no action is taken for a certain time, the step is performed automatically.

4.7 Human-Machine Teaming

In the thematic analysis, five comments mentioned the importance of Human-Machine Teaming (HMT), and explainability (or observability) and adaptability are two general requirements for HMT [18]. These results could imply that participants give significant consideration to HMT in cockpit design. Even in the current cockpit, the pilot forms a team with the other pilot, with aircraft systems including automation, and with other stakeholders (ATC officers, cabin crew, dispatchers, etc.). The result suggests that the same level of teaming is regarded as one of the crucial elements in conducting NNC operations with more advanced automation systems. Automation is not an "all or nothing" matter [1]. In the coordination between the two pilots in current commercial aircraft cockpits, they continuously and almost unconsciously monitor each other's condition and conduct tasks toward the shared goal of their mission. When needed, they help each other and redistribute tasks to fit the situation. To achieve the same level of adaptive coordination, automation systems need to continuously monitor the situation around them, including the pilots' workload and performance, and to change the level of automation adaptively so that they can provide just the right amount of support to pilots. It may be a natural result that adaptive automation/switch activation emerged as one of the outcomes of this study, because it is one of the critical elements of HMT in flight operations.

4.8 Limitations of This Study

Due to the availability of participants for the onsite experiments, the eye-tracking data were taken from participants with little prior flight experience. In future studies, eye-tracking


experiments with professional pilots would validate the findings of this study and be helpful for the comprehensive evaluation of future cockpit display designs. Also, in this study, only two NNCs were conducted by participants in each experiment scenario. Further study should be conducted with more NNCs to determine the best range and level of automation to be implemented in the ECL system, considering its effect on pilots' SA and workload. Finally, neither Adaptive Activation of Switches on the S/CD nor Flexible/Adaptive Automation on the ECL display was examined for production feasibility from engineering and certification perspectives. Further study will be needed when materialising these concepts in real aircraft, including consideration of those perspectives.

5 Conclusions

This study examined the HCD approach to determine how touchscreens and automation would affect human-machine interactions in commercial aircraft cockpits. Several theories related to human cognition (the PCP, SA, and mental models in long-term memory) were used to interpret the study's results. The topics discussed in this study can serve as a case study of how touchscreens and automation in cockpit displays affect humans' cognitive workload, SA, and system usability. Adaptive Activation of Switches can be a solution for touchscreen inceptors which takes the benefits of the PCP while preventing the risks of mis-operations. Flexible/Adaptive Automation in the ECL system can balance pilots' SA and workload in NNC operations. Although these concepts have not been tested against engineering feasibility and certification requirements, they can be used as seeds of ideas for the cockpit display designs of future commercial aircraft. The effectiveness of eye-tracking for workload measurement was also examined. It was found that pupil diameter and fixations can indicate different aspects of participants' visual attention and of the workload imposed by the designed displays. The findings confirmed the benefits of eye-tracking measurements in the HCD approach and the importance of comprehensive comparisons between eye-tracking results and other, subjective measurements. Although it must be admitted that the S/CD and ECL displays are just a small portion of the entire cockpit system that interfaces with pilots, the methods, findings and concepts used in and produced by this study can be referred to as a case study of an HCD approach for commercial aircraft cockpit systems. Further study of cockpit design should be continued to materialise an entire cockpit ecosystem, assuring a well-balanced combination of safety and efficiency of flight operations in the more complex operational environments of the future.

References

1. Harris, D.: Human Performance on the Flight Deck, 1st edn. CRC Press (2011)
2. Watkins, C.B., Nilson, C., Taylor, S., et al.: Development of touchscreen displays for the Gulfstream G500 and G600 Symmetry™ flight deck (2018)
3. AIRBUS: Press release: Airbus delivers first A350s with touchscreens (2019)
4. Wickens, C.D., Lee, J., Liu, Y., Becker, S.G.: An Introduction to Human Factors Engineering, 2nd edn. Prentice Hall (2003)


5. CIEHF: Human Factors in Highly Automated Systems (2022)
6. Boeing: 787 Flight Crew Operations Manual, Revision 26 (2018)
7. Boy, G.A.: Introduction to Human-Machine Interaction: A Human-Centered Design Approach (2011)
8. Li, W.C., Zhang, J., le Minh, T., et al.: Visual scan patterns reflect to human-computer interactions on processing different types of messages in the flight deck. Int. J. Ind. Ergon. 72, 54–60 (2019). https://doi.org/10.1016/j.ergon.2019.04.003
9. Li, W.C., Horn, A., Sun, Z., et al.: Augmented visualization cues on primary flight display facilitating pilot's monitoring performance. Int. J. Hum. Comput. Stud. 135, 102377 (2020). https://doi.org/10.1016/j.ijhcs.2019.102377
10. Myers, P.L.: Commercial aircraft electronic checklists: benefits and challenges (literature review). Int. J. Aviat. Aeronaut. Aerosp. 3 (2016). https://doi.org/10.15394/ijaaa.2016.1112
11. Daniel, S.: Jump Start Adobe XD. SitePoint (2017)
12. Pupil Labs: Pupil Core: Getting Started (2022). https://docs.pupil-labs.com/core/. Accessed 7 Aug 2022
13. Braun, V., Clarke, V.: Using thematic analysis in psychology. Qual. Res. Psychol. 3, 77–101 (2006). https://doi.org/10.1191/1478088706qp063oa
14. Glaholt, M.G.: Eye tracking in the cockpit: a review of the relationships between eye movements and the aviator's cognitive state. Defence Research and Development, Canada (2014)
15. Recarte, M.Á., Pérez, E., Conchillo, Á., et al.: Mental workload and visual impairment: differences between pupil, blink, and subjective rating. Span. J. Psychol. 11, 374–385 (2008)
16. Wickens, C.D., Helton, W.S., Hollands, J.G., Banbury, S.: Engineering Psychology and Human Performance, 5th edn. Taylor & Francis Group (2021)
17. Endsley, M.R.: From here to autonomy: lessons learned from human-automation research. Hum. Factors 59, 5–27 (2017). https://doi.org/10.1177/0018720816681350
18. Mcdermott, P., Dominguez, C., Kasdaglis, N., et al.: Human-machine teaming systems engineering guide (2018)

Analysis of Airline Pilots' Risk Perception While Flying During COVID-19 Through Flight Data Monitoring and Air Safety Reports

Arthur Nichanian and Wen-Chin Li

Cranfield University, Cranfield, Bedford, UK
{a.nichanian,wenchin.li}@cranfield.ac.uk

Abstract. The onset of the COVID-19 pandemic created a new operating environment (involving less traffic and more ATC shortcuts, and therefore more direct approaches) due to the reduced number of flights and new health regulations. This disrupted the aviation sector and created new threats, while other threats disappeared or became less severe. This research focuses on the variations in risk perception between pilots and retrospective safety analyses by comparing Flight Data Monitoring (FDM) data and Air Safety Report (ASR) submissions. The data consist of 3702 FDM events, of which 398 featured an ASR, over 24 months between 2019 and 2021. The results show that pilots notice and report external risks (such as mid-air collisions) more readily than internal ones (e.g., those involving piloting skills). Moreover, although the severity of FDM events significantly increased during the pandemic, this did not lead to more ASRs being submitted. Therefore, pilots' risk perception may have been negatively affected by the new operating environment. These findings can help airlines tailor pilot training and identify new ways to improve safety management programmes.

Keywords: flight data monitoring · risk perception · air safety reports

1 Introduction

The COVID-19 pandemic has had a profound impact on aviation. Passenger demand dropped on average by 75%, and airlines had to cut flights and furlough personnel, including pilots. The pandemic also disrupted airlines' safety monitoring programmes, as most flights were no longer operated regularly, so airlines had to rely on the expertise accumulated over previous years. For pilots still flying, the operating environment suddenly changed: fewer opportunities to fly, less busy airspace, and new health regulations. In this new operating environment, some threats disappeared or decreased in severity, and new ones emerged. It is possible to analyse pilots' risk perception by comparing the submission of Air Safety Reports (ASRs), which are subjective, with the objective Flight Data Monitoring (FDM) data collected by an airline. Analysing pilots' perception of these new risks versus the older, known ones, and comparing them with the risk management of an airline's flight safety department, can be useful to identify any discrepancy between


the real-time, dynamic risk assessment process from pilots, and the retrospective analysis done by the airline’s safety department. It may help airlines tailor training sessions for pilots and identify new opportunities to improve their safety monitoring systems.

2 Related Work

Flight Data Monitoring (FDM), or Flight Operations Quality Assurance (FOQA), is the process of recording and analysing data from the aircraft's Quick Access Recorder (QAR). The event precursors are classified according to EASA regulations into different categories: Runway Excursion (RE), Mid-Air Collision (MAC), Controlled Flight Into Terrain (CFIT), and Loss of Control In-flight (LOC-I) [1]. FDM is a powerful tool for airlines and regulators to analyse and quantify safety performance and is typically integrated into their Safety Management System (SMS). Through the quantification of operational risks, they can adapt their procedures and/or establish training objectives for crews. FDM analyses are based on capturing deviations from pre-set thresholds, such as unstable approaches, long landings, altitude deviations, etc. FDM analyses are typically performed within safety matrices, which combine event severity and frequency. The exceedance events are classified into the EASA FDM categories (CFIT, LOC-I, MAC and RE). This classification enables a standardised event analysis, in which events can then be assessed through their severity [1]. It has been determined, based on the EASA FDM categories, that on average CFIT, LOC-I, and RE events present the highest severity [2]. In general, recurring events with a low severity are preferable to infrequent events with a much higher severity level. Pilots' inputs on the controls are recorded, but it is difficult to perform a naturalistic decision-making analysis from the inputs alone [3]. For this reason, pilots are usually encouraged to submit Air Safety Reports (ASRs) if they encounter any incident. The ASR is a form filled out by the pilots, containing information regarding the flight conditions (aircraft type, weather, flight phase) and a written description of the encountered event. The submission of an ASR can be mandatory for some events, such as a Traffic Collision Avoidance System (TCAS) resolution advisory; these requirements are laid down by ICAO [4]. Airlines can also make additional events mandatory to report, such as go-arounds. Just culture principles must be strictly followed when handling ASRs in order to maintain an adequate reporting culture within the organisation. A just culture ensures that a person reporting an honest error or an unsafe condition does not get blamed, so that the submitted feedback can contribute to improving safety. It thus contributes to an atmosphere of trust among people and a willingness to submit reports [5]. Moreover, the submitted reports and/or data need to be anonymised so that the persons involved cannot be blamed. However, deliberate unsafe actions (for instance, reckless non-compliance) must still be sanctioned. To implement and maintain an effective just culture programme within an organisation, the programme must be documented and thus defined accurately and precisely. The reporting system must be linked to the just culture programme and closely monitored to identify any possible deviation. Mishandling of ASRs can lead to a loss of motivation to report, therefore preventing an organisation from learning from the encountered incidents [6]. Typically, ASRs are combined with FDM events to gain a broader understanding of a


situation beyond the raw data, combining the data with the pilot's subjective assessment of the incident [7]. In modern aircraft, autopilot usage is a routine operation: most of the flight is flown by automated systems, which have the advantage of flying more precisely than a human while reducing crew workload in most cases [8]. Interactions with automated systems in the cockpit have become so embedded in pilots' duties that the autopilot (AP) is now referred to as the "third crew member" in the cockpit [9]. However, concerns have arisen since the nineties regarding automation complacency and misuse [10]. Manual flying skills and automation dependency have been widely studied in recent years, with research showing that increased and regular use of the autopilot can degrade manual flying skills among pilots due to lack of practice [11]. Research has also demonstrated that pilots' decisions to use or not to use automation depend on various factors, including the assessed risk [10]. During a flight, pilots constantly have to make decisions, i.e., assess the current situation, project possible outcomes, and decide on the appropriate course of action [12]. This implies weighing the risks involved in the various courses of action. Assessing risks is based on the balance between potential losses and perceived gains [13] and therefore involves appraising the external situation and weighing it against one's capabilities [14]. Risk perception varies from person to person and depends on various individual factors such as training and experience [15, 16], as well as on external factors such as the environment (place, time of day, environmental conditions). Pilots' ability to analyse risks is also based on observable information, i.e., on the available cues and their clarity or ambiguity. Ambiguous cues can lead to pilots misperceiving potential risks, thereby impairing their risk assessment [17]. However, studies related to the risk assessments of airline pilots tend to demonstrate that pilots are often overconfident when operating in a dynamic environment. As their mental workload increases, they tend to underestimate environmental risks, which in turn increases the likelihood of higher risk-taking behaviour [18]. Perceiving and evaluating risks also plays a role in pilots' willingness to submit a report if they deem that they have encountered an incident. There appears to be a "self-protection" effect in pilots' report submission: it has been demonstrated that events related to the environment (such as procedures, weather, and hardware) are more likely to be reported than individual errors. Therefore, it is likely that an airline's distribution of reports is biased towards environmental factors, whereas individual errors are under-represented. This has an impact on the safety monitoring abilities of the airline [19]. Moreover, pilots are more likely to report events which they perceive to be of high risk rather than events deemed to be of lower risk [19]. It should be noted that the majority of studies analysing risk perception among pilots focus either on questionnaires or on simulator experiments involving the assessment of a particular scenario. Few studies focus on actual flight operations instead of a closed, controlled experiment with less variability.


3 Methodology

3.1 Data Source

The dataset consists of 4761 FDM occurrences out of 123 140 flights operated by a major European commercial airline between June 2019 and May 2021 (24 months) on the Airbus A320 aircraft family (A319 to A321) and the Boeing 777 family (B772 and B773). The raw FDM data were first processed by the operator through their FDM data analysis programme before being sent to the authors. The dataset, in the form of a .csv file, consists of a short event description, a categorisation, and associated pieces of information, such as the month and year of occurrence, aircraft type, flight phase, place of occurrence and assessed event severity. A severity index (SI) score is attributed to each event. It is based on an equation for each event type, which places the deviation that occurred on a continuous scale ranging from 0 to 300 depending on the level of exceedance. It aims at providing a standardised risk index for all events: the higher the number, the higher the risk encountered. For the analysis, the events were classified within the EASA standard FDM framework [1]: Controlled Flight Into Terrain (CFIT), Loss of Control In-Flight (LOC-I), Mid-Air Collision (MAC), and Runway Excursion (RE). For confidentiality reasons, the data cannot be disclosed; this study therefore cannot be replicated by an external practitioner.

Fig. 1. Summary of the main EASA FDM categories [1]

In addition to the FDM data, information regarding the submission of an associated Air Safety Report (ASR) is provided. The content of the ASR is not disclosed; however, it indicates if the pilots submitted an ASR following the exceedance event.
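For illustration, the sketch below shows a minimal way such an FDM occurrence record could be represented for analysis, with the pandemic stage derived from the date of occurrence. The stage boundaries follow the description in Sect. 3.2; the field names and the class itself are assumptions, not the airline's actual schema.

```python
# Illustrative sketch (not the airline's schema): one FDM exceedance event record.
from dataclasses import dataclass
from datetime import date

@dataclass
class FDMEvent:
    occurred: date
    category: str          # "CFIT", "LOC-I", "MAC" or "RE" (EASA FDM categories)
    severity_index: float  # continuous 0-300 scale
    flight_phase: str
    aircraft_type: str
    autopilot_engaged: bool
    asr_submitted: bool

def pandemic_stage(d: date) -> int:
    """Stage 1: 06.2019-01.2020, stage 2: 02.2020-09.2020, stage 3: 10.2020-05.2021."""
    if d < date(2020, 2, 1):
        return 1
    if d < date(2020, 10, 1):
        return 2
    return 3

event = FDMEvent(date(2020, 4, 12), "RE", 74.0, "Landing", "A320", False, False)
print(pandemic_stage(event.occurred))  # -> 2
```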


3.2 Research Procedure

As per the airline's policy, pilots are required to submit an ASR in case of a go-around, which also creates an FDM event. Therefore, to avoid any data bias, FDM events classified as go-arounds were removed from the dataset, leaving 3702 FDM events, of which 398 feature an associated ASR. The dataset was divided into three stages following the pandemic outbreak. The first stage (stage 1 – 06.2019 to 01.2020) comprises 2261 FDM events, the second stage (stage 2 – 02.2020 to 09.2020) comprises 1037 FDM events, and the third and last stage (stage 3 – 10.2020 to 05.2021) comprises 404 FDM events. The number of flights dropped significantly from 2020 (82 819 flights in stage 1, compared with 26 128 flights in stage 2 and 14 193 flights in stage 3). Several statistical analyses were performed, including analysing the severity index scores and ASR submissions per event category, flight phase, and fleet. The focus was set on specific elements relevant to risk perception and aircraft automation (the RE, CFIT, and LOC-I categories), which can be related to aircraft automation use or manual flying skills [1]. The RStudio interface 1.3.1093 with R version 4.0.3 was used [20].

3.3 Statistical Tools Used

Occurrence frequencies (percentage of submitted ASRs), means, medians and standard deviations of different variables (such as severity index scores) were calculated as part of the exploratory data analysis. In addition, correlations and Pearson statistics were used. P-values lower than 0.05 were considered statistically significant [3]. Following the exploratory data analysis, the Levene and Shapiro-Wilk tests were used, which suggested heterogeneity and non-normal distributions among the severity index scores. For the analysis of the severity index scores at each stage, Kruskal-Wallis and pairwise Dunn tests were conducted; the corresponding p-values were Holm-corrected. Chi-square tests were used to compare the frequency of submitted ASRs against the FDM severity index scores, pandemic stages, and FDM event categories. The dependent variables here represent the frequency of submitted ASRs per event category, pandemic stage, and severity index.
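The analysis was carried out in R, but for readers who want a concrete picture of the main tests, the sketch below shows equivalent calls in Python/SciPy: a Kruskal-Wallis comparison of severity index scores across the three stages and a chi-square test of independence for ASR submission frequencies. The pairwise Dunn post-hoc test with Holm correction has no direct SciPy equivalent and is omitted here; the counts in the comment are taken from Table 1 purely for illustration.

```python
# Illustrative re-implementation sketch (the study used R, not this code).
from scipy import stats

def compare_severity_across_stages(si_stage1, si_stage2, si_stage3):
    """Kruskal-Wallis H test on severity index scores for the three pandemic stages."""
    return stats.kruskal(si_stage1, si_stage2, si_stage3)

def asr_association(contingency):
    """Chi-square test on a table of (no-ASR, ASR) counts per group (one row per group)."""
    chi2, p, dof, expected = stats.chi2_contingency(contingency)
    return chi2, p, dof, expected

# e.g. asr_association([[485, 86], [530, 38], [524, 33], [482, 83]])  # stage 1 SI quartiles
```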

4 Results

4.1 Sample Characteristics

A total of 3702 FDM events were processed, including 398 events with an associated ASR, over 24 months divided into three stages (Table 1). The following elements were analysed: the significance of the change in severity index across the three pandemic stages for each category (CFIT, LOC-I, MAC, RE), the association between the FDM severity index scores and the ASR submissions, and the association between the FDM event category and ASR submission.


Fig. 2. Significance tests for the severity index scores for each FDM category and autopilot use across the three pandemic stages

4.2 Testing the Increase in Severity Index for LOC-I and RE Events and for Events Related to Manual Flying as the Pandemic Started

There is a significant increase in the severity index for LOC-I events at stage 2 compared to stage 1 (Kruskal-Wallis χ2(2) = 28.57, p < 0.001). Equally, there is a significant increase in the severity index at stage 2 and stage 3 compared to stage 1 for RE events (χ2(2) = 12.89, p < 0.05). However, no significant increase in severity index over the three pandemic stages was found for the CFIT and MAC categories. Moreover, a significant increase in event severity was found for events involving manual flying (autopilot off) across the three pandemic stages, χ2(2) = 25.89, p < 0.001. In addition, a significant increase in severity index between stage 1 and stage 2 was also found for events in which the autopilot was engaged, χ2(2) = 11.79, p < 0.05 (Fig. 2).


4.3 Testing Associations Between Severity Index Scores and the Number of ASRs Submitted Across the Three Pandemic Stages

Chi-square tests were used to determine significant associations between the SI scores and ASR submissions. The results show that there is a significant association between the SI intervals and the ASRs submitted at stage 1, X2(3) = 43.924, p < 0.001. There is also a significant association between the SI intervals and the ASRs submitted at stage 2, X2(3) = 32.257, p < 0.001. Moreover, there is a significant association between the SI intervals and the ASRs submitted at stage 3, X2(3) = 9.483, p < 0.01. Finally, there is a significant association between the SI intervals and the total ASRs submitted, X2(3) = 45.781, p < 0.001 (Table 1).

Table 1. Number of ASRs submitted for each severity index interval, divided into quartiles, and associated standardised Pearson residuals

| Stage | SI interval | No ASR submitted (standardised residuals) | ASR submitted (standardised residuals) | Percentage of ASRs submitted |
| Stage 1 | 0–22 | 485 (−3.99) | 86 (3.99) | 15.06% |
| Stage 1 | 22–32 | 530 (3.51) | 38 (−3.51) | 6.69% |
| Stage 1 | 32–50 | 524 (4.14) | 33 (−4.14) | 5.92% |
| Stage 1 | 50–262 | 482 (−3.63) | 83 (3.63) | 14.69% |
| Stage 1 | Total | 2021 | 240 | 10.61% |
| Stage 2 | 0–25 | 264 (−1.00) | 35 (1.00) | 11.71% |
| Stage 2 | 25–39 | 223 (4.30) | 6 (−4.30) | 2.62% |
| Stage 2 | 39–65 | 235 (1.44) | 20 (−1.44) | 7.84% |
| Stage 2 | 65–298 | 209 (−4.54) | 45 (4.54) | 17.72% |
| Stage 2 | Total | 931 | 106 | 10.22% |
| Stage 3 | 0–24 | 93 (0.81) | 11 (−0.81) | 10.58% |
| Stage 3 | 24–39 | 94 (2.37) | 6 (−2.37) | 6.00% |
| Stage 3 | 39–61 | 84 (−0.78) | 15 (0.78) | 15.15% |
| Stage 3 | 61–190 | 81 (−2.40) | 20 (2.40) | 19.80% |
| Stage 3 | Total | 352 | 52 | 12.87% |
| Total | 0–24 | 848 (−3.42) | 134 (3.42) | 13.65% |
| Total | 24–34 | 829 (6.12) | 45 (−6.12) | 5.15% |
| Total | 34–52 | 831 (0.78) | 93 (−0.78) | 10.06% |
| Total | 52–298 | 796 (−3.30) | 126 (3.30) | 13.67% |
| Total | Total | 3304 | 398 | 10.75% |
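As a side note on where the standardised residuals in Tables 1 to 4 come from, the sketch below computes adjusted Pearson residuals from a contingency table of (no-ASR, ASR) counts. The paper does not spell out its exact residual computation, so treat the formula choice as an assumption; it does, however, reproduce the values of the "Total" block of Table 1.

```python
# Illustrative sketch: adjusted (standardised) Pearson residuals for a contingency table.
import numpy as np

def adjusted_residuals(observed):
    obs = np.asarray(observed, dtype=float)
    n = obs.sum()
    row = obs.sum(axis=1, keepdims=True)
    col = obs.sum(axis=0, keepdims=True)
    expected = row @ col / n
    return (obs - expected) / np.sqrt(expected * (1 - row / n) * (1 - col / n))

# Example: the "Total" block of Table 1 (SI quartiles x ASR submission)
table = [[848, 134], [829, 45], [831, 93], [796, 126]]
print(np.round(adjusted_residuals(table), 2))  # first row is approximately [-3.42, 3.42]
```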

4.4 Testing Associations Between the Number of ASRs Submitted, Total FDM Events, FDM Event Categories, and Autopilot Use Across the Three Pandemic Stages

No significant association was found between the frequency of submitted ASRs relative to the number of FDM events across the three pandemic stages, X2(2) = 0.410, p = 0.81 (Table 2).

Table 2. Percentage of ASRs submitted relative to the number of FDM events across the three pandemic stages and associated Pearson residuals

| Pandemic stage | Percentage of FDM events with no associated ASR | Percentage of FDM events with associated ASR |
| Stage 1 | 89.39 (0.07) | 10.61 (−0.19) |
| Stage 2 | 89.78 (0.11) | 10.22 (−0.30) |
| Stage 3 | 87.13 (−0.17) | 12.87 (0.49) |
| Total | 89.25 | 10.75 |

There is a significant association between the FDM event categories and the submitted ASRs. X2 (3) = 482.0, p < 0.001. Table 3 displays the number of ASRs submitted for each event category and the associated Pearson residuals.


Table 3. Cross-tabulation of the submitted ASRs for each FDM event category and associated standardised Pearson residuals over the three pandemic stages

| Category | No ASR submitted (standardised residuals) | ASR submitted (standardised residuals) | Percentage of ASRs submitted |
| CFIT | 252 (−4.23) | 55 (4.23) | 17.92% |
| LOC-I | 746 (−6.54) | 149 (6.54) | 16.65% |
| MAC | 80 (−18.70) | 93 (18.70) | 53.76% |
| RE | 2226 (16.38) | 101 (−16.38) | 4.34% |
| Total | 3304 | 398 | 10.75% |

There is a significant association between the number of ASRs submitted and autopilot use, X2(1) = 13.969, p < 0.001. Table 4 displays the number of ASRs submitted relative to the autopilot being engaged or disengaged.

Table 4. Number of FDM events depending on autopilot usage, and associated standardised Pearson residuals

| AP use | No ASR submitted (standardised residuals) | ASR submitted (standardised residuals) |
| AP not in use | 2301 (3.79) | 240 (−3.79) |
| AP in use | 1003 (−3.79) | 158 (3.79) |

5 Discussion

The results demonstrate that although the severity index of LOC-I and RE events significantly increased with the pandemic (Fig. 2), the frequency of ASR submissions did not change (Table 2). Several possibilities could explain this result. It is possible that pilots did not realise that an FDM event had occurred during their flight, nor did they perceive any deviation occurring, hence they did not submit an ASR. Another possibility is that they perceived the operating environment during pandemic times as less risky due to the reduced number of aircraft flying. This might have biased their perception of risks, as new risks emerged from the reduced flight numbers, such as ATC suggesting shortcuts and thus reducing the time spent in the descent and approach phases. It is very likely that pilots still flying during the early pandemic stages had the sensation of operating in a more "relaxed" environment. However, they were affected by proficiency decay [21, 22], and the new operating environment required adaptability from the pilots both in terms of interactions with ATC and in terms of health regulations, whose importance might have been underestimated. Moreover, MAC events are the most reported (53.76%, Table 3). These events mostly consist of Traffic Collision


Avoidance System (TCAS) resolution advisories and altitude deviations, which are more noticeable to crews and considered to form external threats [23], although the average severity index of MAC events is comparatively lower than those of LOC-I, CFIT, and RE events (Fig. 1). This is consistent with previous studies, which identified that pilots are more prone to identifying external risks than internal ones [24]. However, as it is mandatory for pilots to report TCAS RA events as per ICAO regulations [4], this may bias these results, although MAC events do not consist only of TCAS resolution advisories. Consistently, the percentage of reports associated with the other event categories (CFIT, LOC-I, and RE) remains comparatively low (Table 3). Only 4.34% of RE events are associated with an ASR, although this category features the highest average severity index (Fig. 2). As per the EASA classification, LOC-I and RE events are typically related to pilot inputs, as they consist of deviations from stable approach criteria, dual pilot inputs, or long landings, for instance [1]. Therefore, they can be correlated to pilots' proficiency and skills [11]. These events mostly consist of aircraft-internal threats and are therefore less likely to be reported or even noticed. These results are consistent with those of Sieberichs & Kluge (2018), who identified that internal threats, mistakes, and errors are less likely to be reported than external ones. The results in Table 4 indicate a significant association between autopilot use and ASR submissions. More precisely, crews appear to be more willing to submit an ASR for an event where the autopilot was engaged than for an event where the aeroplane had been flown manually. Although the severity index scores of FDM events where the autopilot was engaged significantly increased during the pandemic, this did not lead to an increase in ASR submissions (Fig. 1 and Table 2) [21]. The results also demonstrate significant associations between the ASR submissions and the FDM exceedance severity index scores (Table 1). Events with a low severity index or with a high severity index are more prone to be reported, regardless of the pandemic stage, although the percentages overall remain low. At first glance, this seems to contradict the results of Sieberichs & Kluge (2018), who found that the higher the perceived risk, the higher the likelihood of the event being reported. The higher percentage of lower-severity events being reported can be explained by the higher percentage of MAC events reported, as these feature a relatively low overall severity index score (Fig. 1 and Table 3). This underlines the discrepancy between the risk of a MAC event as perceived by pilots and the risk as categorised by the airline's safety department. Nevertheless, the percentage of reported events with high severity (3rd quartile) does not exceed 20% (Table 1). This can have an impact on an airline's safety management system, since the raw FDM data is available but the number of ASRs is skewed towards external threats [19]. It thus underlines the need to encourage pilots to submit reports more frequently if they consider their performance to have been either suboptimal or better than expected, regardless of the perceived threat severity. This would prove useful for an airline, as it would be able to better combine FDM data and ASRs and therefore obtain more learning opportunities from its operations.
Moreover, reporting events where performance was good (i.e., consistent with a safety-II objective) can present useful learning opportunities which would have been missed otherwise and would help develop safety resilience [25].


5.1 Limitations Several limitations exist in this work. The dataset only encompasses 24 months of data. Moreover, the content of the submitted ASRs is not provided. The thoroughness of the description of an encountered event depends on the pilots’ willingness to provide details, hence ASRs usually vary in length. The usefulness of an ASR does not only depend on it being submitted by pilots but also on its quality, i.e., on the amount of information it contains. Moreover, it is not possible to determine any specific information regarding pilots’ demographics. It is possible that different populations of pilots exist, i.e., that some pilots are recurrent ASR writers whereas others virtually never submit one. Regarding the use of automation, the dataset only provided information on whether the autopilot was being used or not and did not contain any information regarding the automation level used. Studies have determined that the various levels of automation, from manual flight to fully automated, also play a role in pilots’ skill maintenance [11]. Finally, the severity index scores might have been influenced by weather; for example, a depression or small storm producing high winds could result in an increased number of deviations and an increase in their assessed severity.

6 Conclusion The COVID-19 pandemic affected not only pilots’ proficiency but also pilots’ risk perception. The reduced flight numbers as the pandemic started meant that pilots were operating in a new, less busy environment. This environment might have been perceived as less risky due to the reduced amount of traffic and the increased simplicity of operations (ATC shortcuts, no ATC slots, no holdings). However, it also created new threats, in particular a redefinition of the time available during descent and approach, which was reduced. Pilots tend to notice external, environmental threats more easily than internal ones, as shown by a higher percentage of reports related to mid-air collision precursors than to runway excursion precursors both before and during the pandemic. Equally, external threats seem to be considered riskier, which appears to justify the submission of an ASR. This contradicts the airline’s risk analysis, which considers runway excursions to be the riskiest events, albeit the least reported by the crews. This highlights the gap between pilots’ perception of risk and actual risk. These findings can be useful to airlines when designing training sessions, as they demonstrate the need to emphasise internal risks over the currently well-identified external ones. Finally, establishing and maintaining an effective just culture and reporting culture is paramount to ensuring an adequate level of reporting. However, as writing a report also represents additional work for pilots, it might prove useful to conduct further research on providing adequate incentives for pilots to write ASRs when needed.

References
1. EASA: Developing standardised FDM-based indicators (2016). Accessed 04 Aug 2021. https://www.easa.europa.eu/sites/default/files/dfu/EAFDM__standardised_FDM-based_indicators_v2_Ed2017.pdf


2. Borjalilu, N., Rabiei, P., Enjoo, A.: A fuzzy TOPSIS based model for safety risk assessment of operational flight data. Int. J. Aerosp. Mech. Eng. 12, 1073–1080 (2018). https://doi.org/10.5281/zenodo.2022699
3. Stogsdill, M.: When outcomes are not enough: an examination of abductive and deductive logical approaches to risk analysis in aviation. Risk Anal. (2021). https://doi.org/10.1111/RISA.13681
4. ICAO: Doc 9859 Safety Management Manual (2018). Accessed 27 Aug 2021. https://store.icao.int/en/safety-management-manual-doc-9859
5. Reason, J.: Managing the Risks of Organizational Accidents. Routledge (2016). https://doi.org/10.4324/9781315543543
6. Fearson, J.: Operational Safety. Routledge (2017). https://doi.org/10.4324/9781315566450-21
7. Walker, G.: Redefining the incidents to learn from: safety science insights acquired on the journey from black boxes to flight data monitoring. Saf. Sci. 99, 14–22 (2017). https://doi.org/10.1016/j.ssci.2017.05.010
8. Harris, D.: Human Performance on the Flight Deck. CRC Press (2016). https://doi.org/10.1201/9781315252988
9. Cahill, J., et al.: Adaptive automation and the third pilot: managing teamwork and workload in an airline cockpit. In: Longo, L., Leva, M.C. (eds.) H-WORKLOAD 2017. CCIS, vol. 726, pp. 161–173. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-61061-0_10
10. Parasuraman, R., Riley, V.: Humans and automation: use, misuse, disuse, abuse. Hum. Factors J. Hum. Factors Ergon. Soc. 39(2), 230–253 (1997). https://doi.org/10.1518/001872097778543886
11. Casner, S.M., Geven, R.W., Recker, M.P., Schooler, J.W.: The retention of manual flying skills in the automated cockpit. Hum. Factors 56(8), 1506–1516 (2014). https://doi.org/10.1177/0018720814535628
12. Klein, G.A.: A recognition-primed decision (RPD) model of rapid decision making. In: Decision Making in Action: Models and Methods, pp. 138–147. Ablex Publishing, Westport (1993)
13. Gullone, E., Moore, S.: Adolescent risk-taking and the five-factor model of personality. J. Adolesc. 23(4), 393–407 (2000). https://doi.org/10.1006/JADO.2000.0327
14. Hunter, D.R.: Risk Perception and Risk Tolerance in Aircraft Pilots (2002). Accessed 06 Jan 2022. https://apps.dtic.mil/sti/citations/ADA407997
15. Brown, V.J.: Risk perception: it's personal. Environ. Health Perspect. 122(10) (2014). https://doi.org/10.1289/ehp.122-A276
16. Wachinger, G., Renn, O., Begg, C., Kuhlicke, C.: The risk perception paradox - implications for governance and communication of natural hazards. Risk Anal. 33(6), 1049–1065 (2013). https://doi.org/10.1111/j.1539-6924.2012.01942.x
17. Orasanu, J., Fischer, U., Davison, J.: Risk perception and risk management in aviation. In: Teaming Up: Components of Safety Under High Risk, pp. 93–116 (2017). https://doi.org/10.4324/9781315241708-7
18. Wang, L., Zhang, J., Sun, H., Ren, Y.: Risk cognition variables and flight exceedance behaviors of airline transport pilots. In: Harris, D. (ed.) EPCE 2018. LNCS (LNAI), vol. 10906, pp. 725–737. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91122-9_57
19. Sieberichs, S., Kluge, A.: Influencing factors on error reporting in aviation - a scenario-based approach. Adv. Intell. Syst. Comput. 597, 3–14 (2018). https://doi.org/10.1007/978-3-319-60441-1_1
20. R Core Team: R: A Language and Environment for Statistical Computing, Vienna, Austria (2020). https://www.R-project.org/


21. Li, W.-C., Nichanian, A., Lin, J., Braithwaite, G.: Investigating the impacts of COVID-19 on aviation safety based on occurrences captured through Flight Data Monitoring. Ergonomics 1–39 (2022). https://doi.org/10.1080/00140139.2022.2155317
22. Mizzi, A., Lohmann, G., Carim Junior, G.: The role of self-study in addressing competency decline among airline pilots during the COVID-19 pandemic. Hum. Factors J. Hum. Factors Ergon. Soc. (2022). https://doi.org/10.1177/00187208221113614
23. Coso, A.E., Fleming, E.S., Pritchett, A.R.: Characterizing pilots' interactions with the aircraft collision avoidance system. In: International Symposium on Aviation Psychology, pp. 493–498 (2011). https://corescholar.libraries.wright.edu/isap_2011/32. Accessed 14 Dec 2022
24. Drinkwater, J.L., Molesworth, B.R.C.: Pilot see, pilot do: examining the predictors of pilots' risk management behaviour. Saf. Sci. 48(10), 1445–1451 (2010). https://doi.org/10.1016/J.SSCI.2010.07.001
25. Hollnagel, E.: Safety-II in Practice: Developing the Resilience Potentials, 1st edn. Routledge (2017)

Effects of Radio Frequency Cross-Coupling in Multiple Remote Tower Operation on Pilots

Lukas Tews(B), Jörn Jakobi, Anneke Hamann, and Helge Lenz

Deutsches Zentrum für Luft- und Raumfahrt e.V. (DLR), Institut für Flugführung, Lilienthalplatz 7, 38108 Braunschweig, Germany
[email protected]

Abstract. Multiple Remote Tower Operation (MRTO) is a further development of Remote Tower Operation (RTO) that changes the way air traffic service (ATS) is provided at airports. Using MRTO, a single Air Traffic Controller can provide air traffic services to two or even more small airports with light traffic simultaneously, increasing efficiency, service utilization and cost-effectiveness. The MRTO concept has been thoroughly proven to be feasible from the controllers’ point of view, but some issues remain to be resolved from the pilots’ perspective. In order to apply MRTO safely, controllers stated that it is necessary to “cross-couple” the radio frequencies of the served airports and to slightly adapt the standard radiotelephony phraseology by stating the relevant airport’s name in each radio call. Although these changes may appear minor, their implications and effects on pilots have not been investigated scientifically, which motivated this study. In a human-in-the-loop real-time simulation experiment, 25 private and commercial pilots flew a Cessna C172 light aircraft at Braunschweig-Wolfsburg airport in a within-subject experimental design: one flight in an MRTO setting with cross-coupled radio frequencies, and the other in a traditional RTO setting. The data analysis showed that the pilots’ overall mental workload was below an optimal medium during flights in both the RTO and MRTO cases. Workload was slightly, but statistically significantly, higher in MRTO than in RTO, closer to but still below an optimal medium value. The measured situation awareness followed the opposite pattern, with slightly yet significantly lower ratings in the MRTO environment than in RTO. Attitudes towards MRTO were predominantly positive both before and after the experiment. There were no mistakes or confusions in either flight performance or radio communication that could be attributed to the MRTO frequency cross-coupling. The observed effects on mental workload and situation awareness are therefore thought to be caused by the higher number of radio calls that each pilot experienced with frequency cross-coupling, which is inherent to the MRTO concept. In summary, the effects of frequency cross-coupling in an MRTO environment compared to an RTO environment are statistically significant but slight, and did not impact the pilots’ mental workload and situation awareness to an extent which would affect their performance. In conclusion, frequency cross-coupling did not interfere with safe and efficient flight operations, and MRTO using frequency cross-coupling is therefore considered an appropriate and beneficial concept for small controlled or uncontrolled airports and airfields.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. Harris and W.-C. Li (Eds.): HCII 2023, LNAI 14018, pp. 163–177, 2023. https://doi.org/10.1007/978-3-031-35389-5_12

Keywords: Mental workload · Situation awareness · Multiple remote tower · Frequency cross-coupling · Effects on pilots

1 Introduction and Background Air traffic control (ATC) at aerodromes is usually provided by Tower Air Traffic Controllers (ATCO) from an ATC Tower with a view of the surface manoeuvring areas and the local airspace. Tower ATCOs are responsible for the safe and smooth operation of all aerodrome traffic on the manoeuvring area of an aerodrome and all aircraft flying in the vicinity of an aerodrome [1, 2]. At smaller airports, traffic demand may be high in the morning or evening hours, in good visibility conditions or on weekends and holidays, but low at other times, and the valuable “Tower ATCO” resource might be under-utilized. Remote Tower Operation (RTO) is a new concept that can compensate for such shortcomings in ATCO productivity and cost-effectiveness. With RTO, optical sensor systems at the airport produce a video stream which may be relayed and displayed at any required location. This enables aerodrome air traffic services (ATS) to be provided from a Remote Tower Center (RTC) that controls several airports from a centralised location [2, 3]. An RTC has the advantage of having a pool of ATCOs who can be employed much more efficiently, which improves productivity and cost-effectiveness. Furthermore, there is no longer a local physical ATC Tower building to be constructed and maintained, which saves additional costs [4]. The basic RTO concept, by which each airport is controlled from one Controller Working Position (CWP), has been in operation since 2015 [4]. Multiple Remote Tower Operation (MRTO) is a further development of RTO which is not yet operational but in the deployment phase [5]; nevertheless, it has the potential to change the way aerodrome ATS is provided to smaller airports with light traffic. Visual surveillance and communications at several airports (up to three in the present concept) can be combined at one CWP, which is then called a Multiple Remote Tower Module (MRTM). When traffic levels and workload permit, aerodrome ATS may be provided from an MRTM by a single ATCO. By applying MRTO, an RTC can be operated even more efficiently, because it offers the possibility of a more flexible allocation of ATCOs and aerodromes that matches the traffic situation, e.g. by combining ATS to several smaller aerodromes in one MRTM during times of low traffic [5, 6]. From the ATCOs’ point of view, the feasibility of MRTO has been thoroughly proven and there are hardly any hurdles standing in the way of a speedy implementation, but from the pilots’ perspective there are still some unanswered questions [5, 7]. Generally, it is assumed that pilots are not affected by whether the “tower” facility is a physical ATC Tower at the aerodrome, an RTO or an MRTO facility, because they receive the same ATS service level in each case (or maybe even a higher ATS level from an RTO/MRTO facility, such as longer airport opening times). However, MRTO validation trials with ATCOs produced recommendations to “cross-couple” the radio frequencies of the served airports and to slightly adapt the standard radiotelephony (R/T) phraseology by stating the airport name in each R/T transmission [5, 7]. Although these small adjustments may appear only minor, their implications and effects on pilots have not been investigated scientifically. But what are frequency cross-coupling and adapted phraseology about?


1.1 Frequency Cross-Coupling and Adapted Phraseology Providing ATS to several aerodromes from the same “tower facility” at the same time creates a new situation for R/T communications between ATCOs and pilots. The European Union Aviation Safety Agency (EASA) [8] describes how communication between ATCOs and pilots can be handled in an MRTO environment. Two obvious possibilities exist for the aeronautical mobile service (air-ground communications) in an MRTO environment: either the respective aerodrome frequencies are handled separately or they are cross-coupled (as is commonly done when combining sectors at an area control centre). If the frequencies are handled separately, the ATCO is able to hear all transmissions from pilots at all aerodromes, but the pilots can only hear ATCO transmissions related to their ‘own’ aerodrome. This might have the advantage for pilots of avoiding the need to pay attention to transmissions that are not relevant. However, since pilots would not hear the ATCO’s communications with other airports, they might call when the ATCO is busy, creating risks of the ATCO not hearing or overlooking the call and of both parties being unaware of the fact. Such communication failures are seen as having a very critical safety aspect [8]. Another shortcoming of separate frequency handling is that the ATCO has to select the appropriate frequency before transmitting, which could lead to the possibility of the ATCO transmitting on the wrong aerodrome frequency due to human error. To avoid such risks, the concept of cross-coupled frequencies has been proposed and positively validated from the ATCO perspective [7, 9]. If the aerodrome frequencies are cross-coupled, pilots as well as the ATCO hear all transmissions related to all aerodromes under the responsibility of the ATCO. The benefits of this method are that, since pilots are at all times aware of ATCO R/T occupancy, orderly communication is ensured, the risk of simultaneous calls to the ATCO is significantly reduced, and the risk of ATCO frequency selection error is eliminated [4] (see Fig. 1). For these reasons, frequency cross-coupling was the method preferred by ATCOs for radio communications with airports in a “multiple” context.

Fig. 1. Communication flows between ATCOs and pilot with cross-coupled frequencies in a MRTO environment


However, cross-coupling the frequencies also means that pilots then have to listen to radio traffic from other airports, which is usually not of relevance to them. A high level of irrelevant radio traffic should not pose a problem to commercial pilots, who are used to heavy radio traffic and clearances regarding other runways when flying at large airports, but there are questions regarding pilots who usually fly from small aerodromes and may not be used to those amounts of radio traffic, like private pilots: Can such pilots also cope with increased levels and “foreign” radio traffic, or would they become too distracted to maintain situation awareness (SA) and would their performance be impeded by increased mental workload (MWL)? To mitigate such potential negative effects of cross-coupled radio frequencies, ATCOs recommended to slightly adapt the standard R/T phraseology in a MRTO environment [7, 10], and the recommendation is already part of EASA guidelines [8]: “… the ATS provider may also consider, as part of the local safety assessment, the inclusion of aerodrome names/ATS unit call sign for all transmissions (i.e. not only for the first contact) between pilots and ATCOs/AFISOs in multiple mode of operation. If this procedure is to be implemented, it should be published in the AIP for the particular aerodrome, together with any other specific communication methods as deemed necessary”. Such a recommendation to include the ground radio station as a prefix to each radio call for both ATCO and pilots as part of the phraseology procedures should make it easier both for the ATCO to be aware of which airport’s traffic s/he is talking to and for pilots to filter relevant and non-relevant information. 1.2 Mental Workload and Situation Awareness When implementing changes and new operational concepts, it is important to assess how they impact human operators, in this case the pilots. Two central concepts that are important for pilot performance are MWL and SA. MWL describes the cognitive demand imposed on the pilot by the tasks at hand. More precisely, it describes the part of a person’s cognitive resources that is required to perform a task [11]. The more difficult the task, the more resources that must be spent to maintain task performance, and the higher the MWL. MWL also describes a continuum between under- and overloaded: a “medium” MWL is associated with the best task performance, while both extremes of under- and overload can reduce performance and should thus be avoided [12]. SA can briefly be described as a pilot’s mental model of the current flight situation, including aircraft systems and environmental factors [13]. According to Endsley, SA involves three levels: the perception of relevant elements, comprehension of their meaning for the current situation, and projection into the future [14]. SA is therefore closely associated with perception and attention as well as memory functions and knowledge [12]. Errors in any of the three levels of SA can result in poor decision-making and reduced task performance.


1.3 Aim of this Paper Unlike with ATCOs, there have been no experiments with pilots to determine the effects of the cross-coupling of radio frequencies in a multiple remote tower environment. This study is therefore the first empirical evaluation with pilots flying in an MRTO environment to determine the effects of cross-coupled radio frequencies. The aim of the current paper was to assess pilots’ MWL and SA and their attitude towards MRTO. The following hypotheses were formulated: the main hypothesis states that there are no differences in pilots’ performance or their perception of aerodrome ATS between flights in RTO and MRTO environments. The first sub-hypothesis relates to MWL; it assumes that the MWL of the test subjects does not differ significantly between flights served by RTO and MRTO facilities. The second sub-hypothesis relates to situation awareness; it assumes that situation awareness is not significantly poorer with an MRTO than with an RTO. A third sub-hypothesis relates to pilots’ attitude towards the MRTO concept; it assumes that their attitude will improve or at least not deteriorate.

2 Method 2.1 Sample 25 pilots (23 male, 2 female) participated in the experiment. All participants were native German speakers and held a pilot’s license. 5 pilots held a sport pilot licence (SPL), 4 pilots held a light aircraft pilot licence (LAPL), 11 held a private pilot licence (PPL), 2 pilots held a commercial pilot licence (CPL) and 3 pilots held an airline transport pilot licence (ATPL). The flight experience of the pilots was between 60 and 15,500 flight hours (M = 1,786; SD = 4,031). This diversity in pilots’ licences and experience can be considered representative for smaller aerodromes. The participants received monetary compensation for travel expenses. The study was conducted in accordance with the Declaration of Helsinki which governs the ethical treatment of humans in research. 2.2 Experimental Task The human-in-the-loop real-time simulation experiment was conducted using a Cessna C172 Model at the iSIM cockpit simulator at the DLR Institute of Flight Guidance, German Aerospace Center (DLR), Braunschweig (see Fig. 2).


Fig. 2. DLR Cockpit-Simulator iSIM configured with Cessna C172 flight model

The experimental task was to approach Braunschweig-Wolfsburg (EDVE) airport, perform a landing on RWY 26 and, after reaching the final parking position and taking a short break, request taxi for departure from RWY 08 and perform a take-off and climb. The flight scenario commenced at waypoint November 1 for the inbound leg and ended when reaching waypoint Echo 2 on the outbound leg, covering all flight phases of an airport turnaround (see Fig. 3). Each flight lasted approximately 25 min. The pilot participants performed identical flights in two different experimental conditions, RTO vs. MRTO. In the RTO condition the airport was controlled by an ATCO who only controlled EDVE, whereas in MRTO it was controlled by an ATCO who operated three airports with cross-coupled frequencies: EDVE, Mannheim airport (EDFM) and Erfurt-Weimar airport (EDDE). As a result, the traffic scenario at EDVE and the related communications between the pilot and the ATCO did not differ between the experimental conditions, but in the MRTO condition the R/T load was increased by the cross-coupled frequencies of EDFM and EDDE. The ATCO role was performed by an instructed simulation ATCO. The communication was scripted and approved by two qualified tower ATCOs from EDVE with RTO experience.


Fig. 3. EDVE Visual Operation Chart with in- and outbound flight scenario for the participant pilot

As is typical at small airports, the traffic in the scenario was operating under visual flight rules (VFR) and radio communication was in the German language. Radio calls from other traffic in the scenario were pre-recorded and could be replayed by the simulation ATCO when needed, enabling a dynamic flow of the scenario but always with the same pre-scripted sequence of radio calls. The ATCO started every radio transmission with the name of the airport whose traffic they were talking to, and the pilot participants were instructed to include the name of the airport in each radio transmission. Table 1 illustrates what communication across three airports looked like. To increase realism and add traffic complexity, the traffic scenario, both in RTO and MRTO, contained an additional aircraft flying at EDVE (D-ENAI in Table 1) during the inbound leg and another one during the outbound leg, to which the participant pilots had to pay attention. In the MRTO experimental condition there were three additional aircraft each at EDFM and EDDE, but a maximum of five at a time across all three airports. To mitigate learning effects and to avoid unwanted effects of systematic variance, the order of the experimental conditions was randomized between participants. Furthermore, the callsigns of the aircraft were varied between the experimental conditions.

Table 1. Excerpt of Radio Traffic in MRTO environment test condition.
Braunschweig (EDVE)

ATCO: D-EDLR, Braunschweig Turm, Wind 040, 5 Knoten, QNH 1015, Melden Sie Abflugbereit (D-EDLR, Braunschweig tower, wind 040, 5 knots, QNH 1015, report when ready for departure)
Pilot D-EDLR: QNH 1015, WILCO, D-EDLR (QNH 1015, Wilco, D-EDLR)

Mannheim (EDFM)
ATCO: D-EOXR, Mannheim Turm, Rollen Sie über Foxtrot und Alpha zu Abstellposition 4 (D-EOXR, Mannheim tower, taxi via Foxtrot and Alpha to parking position 4)
Pilot D-EOXR: Mannheim Turm, Rolle über Foxtrot und Alpha zu Abstellposition 4, D-EOXR (Mannheim tower, taxiing via Foxtrot and Alpha to parking position 4, D-EOXR)
ATCO: D-EVIL, Mannheim Turm, Rollen Sie zum Abflugpunkt Piste 09, halten Sie dort (D-EVIL, Mannheim tower, line up on runway 09 and wait)
Pilot D-EVIL: Mannheim Turm, Rolle zum Abflugpunkt Piste 09, halte dort, D-EVIL (Mannheim tower, lining up on runway 09 and waiting, D-EVIL)

Braunschweig (EDVE)
Pilot D-ENAI: Braunschweig Turm, D-ENAI, Queranflug Rechts Piste 08 (Braunschweig tower, D-ENAI, right base leg runway 08)
ATCO: D-ENAI, Braunschweig Turm, Anflug fortsetzen, melden Sie Endanflug Piste 08 (D-ENAI, Braunschweig tower, continue approach, report final runway 08)
Pilot D-ENAI: WILCO, D-ENAI

Erfurt (EDDE)
Pilot D-FAST: Erfurt Turm, D-FAST Endanflug, Piste 10, erbitte Tiefanflug (Erfurt tower, D-FAST, final runway 10, request low approach)
ATCO: D-FAST, Erfurt Turm, Nach dem Tiefanflug fliegen Sie in die Platzrunde Piste 10, Wind 070, 4 Knoten, Piste 10, Frei zum Tiefanflug (Erfurt tower, D-FAST after low approach join traffic circuit runway 10, wind 070, 4 knots, runway 10 cleared for low approach)
Pilot D-FAST: Erfurt Turm, D-FAST, Nach dem Tiefanflug fliege ich in die Platzrunde Piste 10, Piste 10 frei zu Tiefanflug (Erfurt tower, D-FAST, joining traffic circuit runway 10 after low approach, runway 10 cleared for low approach)

Remark: D-EDLR is the call sign of the test subject pilot flying at Braunschweig airport, marked in bold letters


2.3 Self-reported Measures Before and after the experiment, the participants were asked about their attitude towards MRTO, choosing between “positive”, “neutral” and “negative”. During the flights, MWL was assessed every two minutes using the Instantaneous Self-Assessment (ISA), a five-point Likert scale from one (“under-utilized”) to five (“excessive workload”) [15, 16]. In addition, after each flight, MWL was assessed with the Bedford Workload Scale from one (“very easy, workload insignificant”) to ten (“impossible, task abandoned”) [17]. SA was also assessed after each flight using the China Lake Situational Awareness Scale from one (“SA far too low”) to ten (“SA excellent”) [18]. At the end of the experiment, the pilots were asked about any uncertainties in the radiotelephony. Finally, they were able to comment on the experiment and the MRTO concept. 2.4 Procedure The duration of the experiment was approximately 150 min. Before starting the experiment, the participants were briefed on the MRTO concept, then they filled in a questionnaire on their demographic data and experience, and gave a rating of their attitude towards MRTO. They received instructions on the experiment, the airport EDVE and the flight simulator, and were allowed to perform training runs for as long as required to feel confident with the flight simulator and the EDVE environment. They then proceeded to the main experiment. Each participant conducted two flights during which they gave ratings on the ISA scale. After each flight, the participants filled in the Bedford and China Lake scales. After completion of both flights, the second attitude assessment was done, and the pilots were invited to make further comments on the experiment. Finally, they were debriefed, thanked and compensated. 2.5 Data Analysis All statistical analyses were conducted using the SPSS 26 software (IBM Corp., Armonk, NY, USA). Differences between the RTO and MRTO experimental conditions were analysed with either paired two-sided t-tests or the nonparametric equivalent, the Wilcoxon signed-rank test, if the assumptions of the t-test (i.e. a normal distribution, as checked by the Shapiro-Wilk test [19]) were not met. For the analysis of attitude, a numerical value was assigned to each of the choices: “negative” was assigned the value −1, “neutral” 0, and “positive” 1. Since the assigned numerical values are scaled ordinally, the Wilcoxon signed-rank test was used. For the analysis of MWL, the ISA values were averaged for each flight. Sporadic missing values in a row were replaced by the mean of the respective variable. For the ISA analysis, no data could be recorded for one pilot, so that pilot was omitted from the ISA analysis. The differences of the mean values between the RTO and MRTO conditions were checked for normal distribution by the Shapiro-Wilk test, and because no significant deviation from the normal distribution (p = .063) was found, a paired two-sided t-test was carried out. The values assessed with the Bedford Workload Scale were not normally distributed according to the Shapiro-Wilk test (pRTO < .001, pMRTO = .003), so a Wilcoxon signed-rank test was computed.


For the analysis of SA, the values assessed with the China-Lake Scale were not normally distributed according to the Shapiro-Wilk test (pRTO < .001, pMRTO = .017), so a Wilcoxon signed-rank test was computed. Pilots’ comments on radio telephony and the MRTO concept were evaluated qualitatively.
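The study itself used SPSS 26; purely as an illustration of the same decision logic (Shapiro-Wilk check on the paired differences, then either a paired two-sided t-test or a Wilcoxon signed-rank test), the following is a minimal sketch in Python with scipy. The function name, the alpha threshold handling and the example ISA ratings are hypothetical, not data from the experiment.

```python
# Sketch only: normality check on paired differences decides between
# a paired t-test and a Wilcoxon signed-rank test (analysis in the paper used SPSS 26).
import numpy as np
from scipy import stats

def compare_paired(rto, mrto, alpha=0.05):
    """Paired comparison of RTO vs. MRTO ratings."""
    diffs = np.asarray(mrto, float) - np.asarray(rto, float)
    sw = stats.shapiro(diffs)                      # Shapiro-Wilk on the differences
    if sw.pvalue > alpha:                          # no significant deviation from normality
        res, name = stats.ttest_rel(mrto, rto), "paired t-test"
    else:
        res, name = stats.wilcoxon(mrto, rto), "Wilcoxon signed-rank test"
    return {"shapiro_p": sw.pvalue, "test": name,
            "statistic": res.statistic, "p": res.pvalue}

# Hypothetical mean ISA ratings per pilot, one value per flight condition
isa_rto  = [1.2, 1.4, 1.3, 1.6, 1.1, 1.5, 1.4, 1.2, 1.7, 1.3]
isa_mrto = [1.6, 1.9, 1.5, 2.0, 1.4, 1.8, 1.9, 1.6, 2.1, 1.7]
print(compare_paired(isa_rto, isa_mrto))
```

The same helper could be applied to the Bedford, China Lake and attitude data, with the attitude choices coded as −1, 0 and 1 as described above.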

3 Results 3.1 Attitude The Wilcoxon test for attitude showed no significant difference between the pre- and post-test data (z = −.38, p = .705), so no change in general attitude between pre- and post-experiment could be registered. Attitude was predominantly positive both before and after (as shown in Fig. 4). At the participant level, three participants changed from positive to neutral, three from neutral to positive, and one from negative to neutral.

Fig. 4. Distribution of pilots’ attitudes towards MRTO pre- and post-experiment (y-axis: frequency; x-axis: attitude pre-/post-experiment, rated negative, neutral or positive)

3.2 Mental Workload The t-test on the ISA data showed a significantly higher pilot MWL in the MRTO condition than in the RTO condition (t(23) = 5.28, p < .001). In the RTO condition, the mean MWL was considered too low (“under-utilized”, M = 1.39, SD = 0.48). In the MRTO condition, the mean MWL was considered low (M = 1.76, SD = 0.66), see Fig. 5.


Fig. 5. Distribution of the pilots’ MWL determined with ISA (scale 1.00–5.00) for RTO and MRTO; error bars indicate standard deviation

The Wilcoxon test on the Bedford Workload Scale showed significantly higher pilot MWL in the MRTO condition than in RTO (z = −2.10, p = .036). The majority of the pilots rated their MWL as insignificant in RTO and as low in MRTO. No pilot reported unacceptably high (“very objectionable”, ≥ 6) MWL in either condition, as shown in Fig. 6.

Fig. 6. Distribution of the pilots’ MWL determined with the Bedford Workload Scale (x-axis: Bedford rating 1–10; y-axis: frequency; RTO vs. MRTO)


3.3 Situation Awareness The Wilcoxon test for the China-Lake Scale data showed significantly lower pilot SA in the MRTO condition compared to RTO (z = −2.33, p = .020). The majority of the pilots rated their SA as “very good” in RTO and as “good” in MRTO. No pilot reported unacceptably low SA in either condition.

[Fragment of a flattened table of concrete defect assessment criteria: cracks in reinforced and mass concrete (width thresholds of 3 mm and 6 mm), delamination* (presence), spallation (presence, >100 mm), and corrosion of armature and reduction of exposed bars section due to corrosion (none / ≤30 % / >30 %; surface rust or exposed bars, rusted exposed bars, exposed bars very corroded).]

*Note: delamination cannot be detected with image sensors in the visible spectrum.

Table 2. General assessment criteria of an element behavior (source: [9])

Behavior: Diminution of an element aptitude to play its role

CEC rating | Principal element | Secondary element
4 | < 10 % | < 10 %
3 | 10 to 20 % | 10 to 30 %
2 | 20 to 30 % | 30 to 50 %
1 | > 30 % | > 50 %
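As a rough illustration only, the thresholds of Table 2 can be encoded as a small lookup, for instance when post-processing defect measurements collected during an inspection. The function name and the handling of boundary values are assumptions for this sketch, not part of the inspection manual [9].

```python
# Sketch of the CEC rating bands from Table 2 (4 = best behavior, 1 = worst).
# Treatment of exact boundary values is an assumption for illustration only.
def cec_rating(diminution_pct: float, principal: bool = True) -> int:
    """Return the CEC rating for an element, given the estimated diminution (%)
    of its aptitude to play its role."""
    if principal:
        bands = [(10, 4), (20, 3), (30, 2)]   # principal element thresholds
    else:
        bands = [(10, 4), (30, 3), (50, 2)]   # secondary element thresholds
    for upper, rating in bands:
        if diminution_pct < upper:
            return rating
    return 1

assert cec_rating(5) == 4
assert cec_rating(25, principal=True) == 2
assert cec_rating(25, principal=False) == 3
```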

special acoustic signature that indicates the presence of internal delamination that sometimes can’t be detected visually. To access the bridge infrastructure, different means are used such as the use of a nacelle (a.k.a. under bridge inspection unit), a boat or a simple walk, depending on the accessibility of the structure. Bridge management (including inspection) is done either by the government or by the private industry under specific jurisdictions. Each jurisdiction has its own way of conducting inspections (e.g. [4,9,15]). Also, several classes of possible defects exist, and the detection method and quantification of each defect severity depends on its class.

4 Discussion

Given that the task analysis described in this paper aims to maximize the impact of the development of drone-based technology to assist inspectors in performing their tasks, let us take a moment to describe this technology. The drone technology envisioned here is for a remote guidance collaboration scenario involving a remote helper (the inspector) guiding in real time a local worker (the drone’s pilot) performing tasks on physical objects (the bridge) [5,7]. Since inspectors work in teams, the drone’s pilot may well be an inspector too. The scenario envisioned is illustrated in Fig. 3 below and is described in detail in [6].

Fig. 3. Illustration of the concept (source: [6])

Given that the drone is equipped with a regular visible light camera, the detection of some defects, such as delaminations, is not possible, since they cannot be detected in the visible light spectrum. That de facto prevents the use of that type of sensor for general inspections, but it fits perfectly well in a scenario of visual inspections only, such as those related to annual inspections. Discussion with the inspectors also revealed that drones could be well suited to inspect confined spaces. Confined Space. A confined space is defined as a space that is totally or partially closed, which has the three following characteristics: – It is not designed to be occupied by people nor intended to be, but may occasionally be occupied for the performance of a job;


– It is not possible to enter or leave it other than through a restricted route; – It may present risks to health, safety or physical integrity for anyone who enters it, in particular due to the nature of the work performed, its location, design, atmosphere or insufficient ventilation (natural or mechanical), as well as materials or substances that may be found inside. Currently, the inspections of confined spaces are conducted by private companies with the help of a rescue team. An example of a confined space is illustrated in Fig. 4.

Fig. 4. Example of a confined space

5 Conclusion

This paper presented the results of a bridge inspection task analysis for bridges made of concrete, as performed by the MTQ on Qu´ebec’s territory. The results indicate that drones equipped with visible image sensors for bridge or overpass inspections are suitable for annual visual inspections since these inspections are completely visual and do not require any contact with the bridge. Drone inspections could also be well suited to inspect confined spaces. The results of the task analysis also indicate the kind of defects that inspectors are looking for and how they rank them in terms of severity. All this information is necessary in order to


develop drone-based technology that could help bridge inspectors perform better in their job, by using such a system in a collaborative remote guidance scenario such as the one described in this paper. It could also be used at the user interface design stage of such systems. Field tests of the drone-based system will help determine the next steps in terms of refinement of this tool to better support bridge inspectors in performing their tasks. Initial field tests indicate that a reduction of task time of about an order of magnitude (i.e. a 10x reduction) seems possible, given the speed at which a drone can scan the entire structure of a bridge as opposed to the current visual inspections [8]. If such a scenario becomes reality, the benefits would be enormous in terms of safety, cost savings, improved quality of inspections and reduced transportation downtime for these important transportation infrastructures. Acknowledgements. This task analysis would not have been possible without the support and collaboration of the participants of the Module Inspection des Structures, Direction générale de l’Outaouais, du Ministère des Transports du Québec. We kindly thank them for the time they donated to this project.

References
1. Calvi, G.M., et al.: Once upon a time in Italy: the tale of the Morandi bridge. Struct. Eng. Int. 29, 198–217 (2019). https://doi.org/10.1080/10168664.2018.1558033
2. Canadian Institutes of Health Research, Natural Sciences and Engineering Research Council of Canada, and Social Sciences and Humanities Research Council: Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans. Secretariat on Responsible Conduct of Research, Government of Canada (2018). https://ethics.gc.ca/eng/documents/tcps2-2018-en-interactive-final.pdf
3. Diaper, D., Stanton, N.: The Handbook of Task Analysis for Human-Computer Interaction. Lawrence Erlbaum Associates (2004)
4. Government of Alberta: Bridge inspection and maintenance system - BIM level 1 inspection manual, version 4.0 (2020). https://www.alberta.ca/bridgemanagement.aspx
5. Huang, W., Alem, L., Tecchia, F.: HandsIn3D: supporting remote guidance with immersive virtual environments. In: Kotzé, P., Marsden, G., Lindgaard, G., Wesson, J., Winckler, M. (eds.) INTERACT 2013. LNCS, vol. 8117, pp. 70–77. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40483-2_5
6. Lapointe, J.F., Allili, M.S., Belliveau, L., Hebbache, L., Amirkhani, D., Sekkati, H.: AI-AR for bridge inspection by drone. In: Chen, J.Y.C., Fragomeni, G. (eds.) Virtual, Augmented and Mixed Reality: Applications in Education, Aviation and Industry, pp. 302–313. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-06015-1_21
7. Lapointe, J.-F., Molyneaux, H., Allili, M.S.: A literature review of AR-based remote guidance tasks with user studies. In: Chen, J.Y.C., Fragomeni, G. (eds.) HCII 2020. LNCS, vol. 12191, pp. 111–120. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-49698-2_8
8. Lapointe, J.F., Sekkati, H., Allili, M.S., Hebbache, L., Amirkhani, D., Hammouche, N.: AI-AR for remote visual bridge inspection by drone. In: Proceedings of the 11th International Conference on Structural Health Monitoring of Intelligent Infrastructure (SHMII-11), to be published (2022)


9. Ministère des Transports, de la Mobilité durable et de l'Électrification des transports: Manuel d'inspection des structures. Gouvernement du Québec (2017). https://boutique.publicationsduquebec.gouv.qc.ca/boutique/fr/Catalogue/Transports/978-2-551-25998-4/p/978-2-551-25998-4
10. Ministère des Transports, de la Mobilité durable et de l'Électrification des transports: Manuel d'inventaire des structures. Gouvernement du Québec (2017). https://boutique.publicationsduquebec.gouv.qc.ca/boutique/fr/Catalogue/Transports/978-2-551-25999-1/p/978-2-551-25999-1
11. Ministère des Transports, de la Mobilité durable et de l'Électrification des transports: Matériel d'inspection des structures. Internal document, Gouvernement du Québec (2020)
12. Ministère des Transports et de la Mobilité durable du Québec: Signalisation routière - Tiré à part - Travaux. Gouvernement du Québec (2021). https://boutique.publicationsduquebec.gouv.qc.ca/boutique/fr/Catalogue/Transports/3841/p/3841
13. Scapin, D.L., Bastien, J.M.C.: Analyse des tâches et aide ergonomique à la conception : l'approche MAD*. In: Kolski, C. (ed.) Analyse et conception de l'IHM, pp. 85–116. Hermès (2001)
14. Scapin, D.L., Pierret-Golbreich, C.: MAD: Une méthode de description de tâche. In: Colloque sur l'ingénierie des interfaces homme-machine, pp. 131–148, May 1989
15. US Department of Transportation: National Bridge Inspection Standards (2023). https://www.fhwa.dot.gov/bridge/nbis.cfm

A Study on Human Relations in a Command and Control System Based on Social Network Analysis

Zhen Liao1,3(B), Shuang Liu2, and Zhizhong Li1

1 Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
[email protected]
2 School of Aeronautic Science and Engineering, Beihang University, Beijing 100191, China
3 China Institute of Marine Technology and Economy, Beijing 100081, China

Abstract. Teamwork is the basic feature of command and control tasks, which involve a large number of information communication and exchange actions. The smoothness of human relations directly affects the overall effectiveness of the system. Social network analysis is a common method for studying human relations; it combines complex network theory with sociology, using mathematical and statistical features to quantify and visualize social group behavior and information dissemination mechanisms. In view of the characteristics of human relations in a command and control system, which are constrained by both task (ambiguity) and equipment (human-computer interface, such as communication media) factors, a social network model is constructed in this study. Three main social network characteristics are proposed, along with a measurement method for the coupling effect of communication media and task ambiguity. The calculation is demonstrated with two simulated submarine command and control tasks. Keywords: Social Network Analysis · Human Relations · Command and Control · Complex Networks · Submarines

1 Introduction The US Department of Defense defines a command and control (C2) system as: the facilities, equipment, communications, procedures, and personnel essential for a commander to plan, direct, and control operations of assigned and attached forces pursuant to the missions assigned [1]. C2 is the recursive process of sharing the intent of decision-makers across organizations, turning intent into action, monitoring success, and adjusting goals to meet changing needs [2, 3]. The core is scientific decision-making, while the basic feature is teamwork. Some researchers summarize C2 as team decision-making [4]. Alberts et al. [5] described collaboration as a process that takes place between two or more entities. Collaboration always implies working together toward a common purpose. This distinguishes it from simply sharing data, information, knowledge, or awareness. It is also a process that takes place in the cognitive domain. The collaboration process is represented as a dotted box between two entities. Collaboration requires the ability to
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
D. Harris and W.-C. Li (Eds.): HCII 2023, LNAI 14018, pp. 291–305, 2023. https://doi.org/10.1007/978-3-031-35389-5_20


share information. Thus, human relations in a C2 system refer to the collection of contacts (requests, orders, confirmations, and exchanges) between the C2 personnel for the purpose of information transmission, to ensure the success of team cooperation during the task process. The state of human relations is critical for the overall effectiveness of the system [6, 7]. In particular, the social group behavior and interpersonal network of a C2 team are worthy of further study [8]. Generally speaking, social groups are self-organized, emergent and complex. A globally consistent pattern will emerge from the local interaction of the elements that make up the system. The larger the group size, the more obvious these patterns [9]. Many researchers refer to these as complex sociotechnical systems [10, 11]. In this case, traditional linear analysis methods and reductionism struggle to reveal these patterns. More and more researchers are using complex networks or complexity science approaches to analyze complex sociotechnical systems [12–14]. The application of emerging technologies, such as artificial intelligence and big data, is changing human relations and has led to unexpected results. The width and depth of the C2 personnel’s access to information and task coordination become remarkably different. Different communication media will have different effects on human relations for different task types in C2. For example, although face-to-face communication is good at timely feedback, it is limited by distance and time constraints; video communication enables more people to communicate face to face, but the timeliness and accuracy of feedback are weakened to some extent due to internet constraints. Email communication is less timely, but it can convey more accurate data which is available for re-checking at any time. Low-richness media are recommended for precise tasks with low ambiguity, while high-richness media are recommended for tasks involving more personal thinking, creativity, and so on [15]. This study proposes a method to analyze the influence of communication media on the human relations in a C2 system in the context of different task types and puts forward suggestions for the early stage of command and control system development.

2 The Social Network Analysis Review Social network analysis (SNA) is a systematic framework to retrieve meaningful information from a given social network, which consists of actors (nodes or vertexes) and their relations (links or edges). Nodes usually represent the actors, who can be individuals, groups, teams, communities, organizations, political parties and even nations and countries. Relationships between nodes can be directional or non-directional, and can be a combination of several different relationship types. In addition, since all kinds of information and/or knowledge processing entities (including people, groups, organizations, computers, etc.), can be regarded as actors, SNA has been widely adopted in various disciplines, such as sociology, psychology, anthropology, education, and so on [16]. SNA generally uses two groups of data sets to represent social networks: a conventional data set, sometimes called a node list, in which each node is an observation unit, and edges between nodes, representing the relationships between these observation units. There are two common representation methods, namely, the chart method and the matrix method. A chart is composed of a group of nodes as well as their connection lines. It can visually display the characteristics of social networks. However, it is


not convenient for statistical calculation. In the matrix method, the actors are listed in columns and rows in the same order. The value for each pair of nodes represents their corresponding relationship. By replacing columns or rows in the matrix with a set of activities, locations and tools, other matrices reflecting the operation of the whole social network can be further constructed [17]. The characteristics of social networks include the network density (the density of the overall network and nodal network), centrality, central potential and reputation, number of edges and distance (the shortest path, the longest path, and the average distance or path length), factions or subgroups, clustering, structural equivalence [18, 19], etc. The SNA method has been widely used in non-military fields. For example, Park [17] verified the applicability of the SNA method for identifying the characteristics of crew communications, and it was found that SNA metrics could be meaningful for explaining the communication characteristics of main control room operating crews in a nuclear power plant. Lo et al. [20] used SNA centrality metrics, such as degree, closeness, and betweenness, to assess network cognition in railway traffic and passenger traffic control. Mohammadfam et al. [21] showed that the SNA method provided a quantitative and logical approach for examining the coordination status among response teams, and that it also provided a major opportunity for managers and planners to gain a clear understanding of the present status. Sparrowe et al. [22] found that social networks defined according to positive and negative relationships were related to not only individual but also group performance. Min et al. [23] adopted the SNA method to analyze team communication for maritime collaborative task performance, and they found that, compared with the low task performance group, the high task performance group exhibited a denser and more decentralised communication structure. At the same time, the SNA method is also increasingly used in the military field. Roberts, Stanton et al. [24, 25] comprehensively applied complex network theory in the event analysis of systemic teamwork (EAST) method to model and analyze the task, social and information networks of a submarine C2 system. They proposed that SNA metrics can help to quantify such processes and afford empirical investigation. Houghton et al. [13] combined cognitive work analysis (CWA) with social network analysis (SNA) and, taking battlefield assessment by a military planning group as an example, showed how actor-to-actor and actor-to-function mappings can be analyzed in terms of centrality, which can then be used as metrics of system structure under different operating conditions. These studies mainly used the SNA method to analyze human relations in non-military and military fields. The characteristics of the network node state and the network global state were calculated by collecting communication data in different task states. However, there are few studies considering the influence of the connection carrier between nodes, namely the “communication media”, and further consideration of the influence of communication media under task conditions of different ambiguity is lacking.
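As a small illustration of the node-list, edge-list and matrix representations described above, the sketch below builds an adjacency matrix from a directed, weighted edge list. The actor names and relations are hypothetical and not taken from any study cited here.

```python
# Toy example: matrix representation of a social network built from a node
# list and a directed, weighted edge list (weights = communication frequency).
import numpy as np

nodes = ["commander", "sonar", "helm", "weapons"]   # hypothetical actors
edges = [("commander", "sonar", 3),                 # (sender, receiver, frequency)
         ("sonar", "commander", 2),
         ("commander", "helm", 1),
         ("commander", "weapons", 2),
         ("weapons", "commander", 1)]

index = {name: i for i, name in enumerate(nodes)}
A = np.zeros((len(nodes), len(nodes)))
for src, dst, freq in edges:
    A[index[src], index[dst]] = freq                # rows: sender, columns: receiver

print(A)
```

Replacing the columns with activities, locations or tools in the same way yields the additional matrices mentioned above.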


3 The Social Network Model for C2 Application 3.1 The C2 Social Networks For a C2 system, a social network is a network structure composed of individuals in combat posts or team members, with relationships shaped by certain forms of organization, team structure, and cultural rules. As shown in Fig. 1, the C2 personnel are presented as nodes and communication relationships as edges. Such a social network belongs to the typical power/authority type [26], with a strict division of labor between superiors, subordinates and equals. At the same time, there are complex communication relationships between superiors, subordinates and equals in the implementation of tasks. Considering the importance of military tasks, social networks of C2 systems are very stable and depend greatly on task characteristics. On the other hand, C2 tasks rely on equipment. Therefore, the networks are also closely related to the equipment form, especially the communication media, including face-to-face communication, text communication, audio, video, instant messaging, etc. [27].

Fig. 1. Illustration of the social networks of a C2 system (nodes: C2 personnel; edges: communication relationships via communication media)

Communication generally involves coding, decoding, feedback, and media [21]. As shown in Fig. 2, the communication process in a C2 system starts with the senders, who encode the command intention in their mind to form information, which is then sent to the receivers through the information transmission channels, i.e. the communication media. After receiving the information, the receivers first decode the information, that is, process it into an understandable form. Finally, the receivers encode the intention formed after communication into information, which is then transmitted back to the senders through the communication media, such as face-to-face contact, telephone, and e-mail. The senders receive and decode the information, and the communication process is then ended. The whole communication process may be affected by various factors, such as the training level of the personnel, team mental status, and working status of team members, resulting in information distortion in any link of the communication process.


Fig. 2. The communication model of a C2 system

3.2 Characteristics of C2 Social Networks This study focused on the impact of communication media. By referring to the models and methods for social network analysis of C2 [27, 28], and a model-based analysis of the complexity of collaborative C2 systems [29], three media-related social network characteristics were selected for modeling and predicting the difference in human relations between scenes of different task ambiguity. A collection of metrics is adopted as defined in Table 1 [30].

Table 1. Definitions of metrics (Metric: Definition)

- Number of nodes: Number of entities (people) in a C2 social network
- Number of edges: Number of pairs of connected entities
- Average path length (Cdis): Average geodesic distance between the nodes of the C2 social network, weighted by the communication difficulty caused by the media
- Network diameter (Cdia): Length of the longest geodesic path to be found in the C2 social network, weighted by the communication difficulty caused by the media
- Network density (Cden): Number of relations observed, together with the connection state caused by the media, represented as a fraction of the total relations possible
- Communication efficiency (δij): Efficiency of communication on an edge between two nodes as influenced by the communication media


Three characteristics of C2 social networks are proposed as follows. Average Path Length. It is supposed that there are N post nodes in the social network of a C2 system. The distance between post nodes i and j is defined as the smallest number of edges connecting the two nodes, which can be expressed as dij. The communication efficiency between post nodes i and j is affected by the prism effect of the communication media and is notated as δij. Specifically, δij = 1 means an idealized plane mirror state, that is, the communication media have no impact on the cooperation efficiency. The worst condition corresponds to δij approaching 0, that is, the two nodes can hardly communicate or there is no communication relationship. Thus, according to the different communication media in a C2 system, the values of δij differ within the range [0, 1]. The higher the communication efficiency δij, the lower the communication effort, and the smaller the average path length. fij is the communication frequency between nodes i and j, indicating the number of communications between the two nodes; the higher the communication frequency, the greater the number of communications between the two nodes.

$$C_{dis} = \frac{2}{N(N-1)} \sum_{i,j=1,\; i \neq j}^{N} \frac{f_{ij}}{\delta_{ij}}\, d_{ij} \qquad (1)$$

When the differences in communication frequency and the influence of communication media are discarded, the calculation of the average path length can be simplified as:

$$C_{dis} = \frac{2}{N(N-1)} \sum_{i,j=1,\; i \neq j}^{N} d_{ij} \qquad (2)$$

Network Diameter. The network diameter (Cdia) reflects the longest communication distance in the social network of a C2 system:

$$C_{dia} = \max_{1 \le i,j \le N} \frac{f_{ij}}{\delta_{ij}}\, d_{ij} \qquad (3)$$

Similarly, it can be simplified as follows when communication frequency and communication media are not considered:

$$C_{dia} = \max_{1 \leq i,j \leq N} d_{ij} \tag{4}$$

Network Density. Network density (Cden) describes the closeness of communication relationships, which can be revealed by the average number of connection edges and the connection state of the communication network between team members. The social network density of a C2 system can be expressed as the ratio of the actual number of connecting edges to the maximum possible number of edges in a social network of N nodes:

$$C_{den} = \frac{2 \sum_{i \neq j} f_{ij}\, \delta_{ij}\, L_{ij}}{N(N-1)} \tag{5}$$


where Lij indicates the connection state, i.e., whether there is a connection edge between the two nodes: if there is a connection edge, Lij = 1; otherwise, Lij = 0. As a special case, when fij = 1 and δij = δji = 1, the above formula can be simplified as:

$$C_{den} = \frac{2 \sum_{i \neq j} L_{ij}}{N(N-1)} \tag{6}$$
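As a minimal computational sketch of the three characteristics, assuming the network is given as symmetric matrices of geodesic distances, communication frequencies, media efficiencies and adjacency indicators (the function and variable names below are illustrative, not from the original paper):

```python
import numpy as np

def c2_network_characteristics(d, f, delta, L):
    """Sketch of Cdis (Eq. 1), Cdia (Eq. 3) and Cden (Eq. 5) for a C2 social network.

    d     : NxN symmetric matrix of geodesic distances d_ij between post nodes
    f     : NxN matrix of communication frequencies f_ij
    delta : NxN matrix of media efficiencies delta_ij in (0, 1]
    L     : NxN 0/1 matrix, L_ij = 1 if a communication edge exists
    """
    n = d.shape[0]
    iu = np.triu_indices(n, k=1)                 # unordered node pairs i < j

    # Eq. 1: average path length weighted by frequency and media difficulty
    c_dis = 2.0 / (n * (n - 1)) * np.sum(f[iu] * d[iu] / delta[iu])

    # Eq. 3: network diameter, the largest weighted geodesic
    c_dia = np.max(f[iu] * d[iu] / delta[iu])

    # Eq. 5: media-weighted network density
    c_den = 2.0 / (n * (n - 1)) * np.sum(f[iu] * delta[iu] * L[iu])

    return c_dis, c_dia, c_den
```

The summation over the upper triangle assumes an undirected (symmetric) network, consistent with the 2/(N(N-1)) normalization in Eqs. (1) and (5).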

3.3 The Social Network Communication Media

According to the media richness theory proposed by Daft et al. [31], the connection relationship between social network nodes can be regarded as a prism, which produces "spectra". The degree and effect of the connection edge between two nodes change with the communication media. If the communication media are ignored, the relationship between social network nodes is like a flat mirror, meaning that the communication effect is not affected by the communication media. According to the media richness theory, when the information requirements on both sides of the communication are matched by the media's capacity to convey rich information, work performance will be enhanced. Media richness can be decomposed into four dimensions [31]:

Immediate Feedback. The ability to get a real-time response and feedback to the communication content.

Number of Cues and Channels Utilized. Rich media provide multiple cues via body language and tone of voice, numbers, text, or image information, etc.

Language Variety. The scope of the meaning of information transmitted through the channels.

Personalization. The integrity of information that can be transmitted by the communication media when personal feelings and emotions are involved in the communication process.

For a communication method, say i, its media richness can be expressed as the sum of the ranking scores on the above four dimensions, as illustrated in Fig. 3:

$$MR(i) = \sum_{j=1}^{4} mr_j(i) = mr_1(i) + mr_2(i) + mr_3(i) + mr_4(i) \tag{7}$$


Fig. 3. Media richness calculation for several communication methods
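As an illustration of Eq. (7), each medium's richness is the sum of its rank scores on the four dimensions. The scores below are placeholders for illustration only; the actual values are those shown in Fig. 3.

```python
# Illustrative rank scores on the four richness dimensions
# (immediate feedback, cues/channels, language variety, personalization);
# the actual scores used in Fig. 3 may differ.
rank_scores = {
    "text":              [1, 1, 2, 1],
    "email":             [2, 2, 3, 2],
    "instant_messaging": [3, 3, 3, 3],
    "telephone":         [4, 4, 4, 4],
    "audio_video":       [5, 5, 5, 5],
    "face_to_face":      [6, 6, 6, 6],
}

# Eq. (7): media richness MR(i) as the sum of the four dimension scores
media_richness = {medium: sum(scores) for medium, scores in rank_scores.items()}
print(media_richness)
```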

3.4 Coupling Communication Media Efficiency and Task Ambiguity

For well-defined tasks, lower-richness media should be selected, while for tasks involving more personal thinking, creation, or unanticipated situations, richer media should be considered [13]. Information exchanged in the process of communication contains not only language and symbols, but also thoughts, intentions, emotions, and other content to be shared by the sender and the receiver. Different communication media are required for different information richness needs. For tasks requiring different information richness, using matching or mismatching communication media will result in different social network communication efficiency.

Hollingshead et al. [15] presented a framework in the form of a five-by-six matrix. One axis of the matrix was defined in terms of five task types ordered by ambiguity. The other axis consisted of six media forms (text, email, instant messaging, telephone, audio/video, face to face) that vary in information richness. This matrix classified the matching between the information requirements of tasks and the information richness of communication methods. The matching between task and medium lay near the diagonal of the matrix. Combinations of tasks and media in the upper right of the matrix tended to be inefficient, because the communication method might be too rich for the task, so that unnecessary communication information interferes with the required information. On the other hand, combinations of tasks and media in the lower left of the matrix were also inefficient or ineffective, because the communication mode was too poor to transmit enough information. Based on the above analysis, the matching relationship between task ambiguity and communication media efficiency can be expressed as the matrix relationship shown in Table 2. The complete set of C2 system tasks can be categorized into three types, namely execution, intellective and discussion tasks, which are defined below.


Table 2. Matching relationship between task ambiguity and communication media efficiency (cell values are the communication media efficiency of each medium)

| Task ambiguity | Score range | Text | Email | Instant messaging | Telephone | Audio/video | Face to face |
|---|---|---|---|---|---|---|---|
| Very low | (0, 1] | 0.8 | 1.0 | 0.8 | 0.6 | 0.4 | 0.2ε |
| Low | (1, 2] | 0.6 | 0.8 | 1.0 | 0.8 | 0.6 | 0.4ε |
| Middle | (2, 3] | 0.4 | 0.6 | 0.8 | 1.0 | 0.8 | 0.6ε |
| High | (3, 4] | 0.4 | 0.6 | 0.6 | 0.8 | 1.0 | 0.8ε |
| Very high | (4, 5] | 0.1 | 0.2 | 0.4 | 0.6 | 0.8 | 1.0ε |

Notes: The larger the numerical value of a cell, the higher the communication efficiency; the larger the numerical value of task ambiguity, the higher the task ambiguity. Since face-to-face communication is easily affected by distance, ε is used to represent the influence of distance, determined by the actual situation; ε = 1 represents a very close distance, i.e., genuinely face to face.
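A small lookup sketch of the media efficiency δ from Table 2, given a task ambiguity score and a medium; the function name and the handling of the distance factor ε are illustrative assumptions:

```python
import math

# Rows follow Table 2: ambiguity bands (0,1], (1,2], (2,3], (3,4], (4,5];
# columns: text, email, instant messaging, telephone, audio/video, face to face.
EFFICIENCY_TABLE = [
    [0.8, 1.0, 0.8, 0.6, 0.4, 0.2],
    [0.6, 0.8, 1.0, 0.8, 0.6, 0.4],
    [0.4, 0.6, 0.8, 1.0, 0.8, 0.6],
    [0.4, 0.6, 0.6, 0.8, 1.0, 0.8],
    [0.1, 0.2, 0.4, 0.6, 0.8, 1.0],
]
MEDIA = ["text", "email", "instant_messaging", "telephone", "audio_video", "face_to_face"]

def media_efficiency(task_ambiguity, medium, epsilon=1.0):
    """Return delta_ij for a task ambiguity score in (0, 5] and a given medium.

    epsilon models the distance effect on face-to-face communication
    (epsilon = 1 means the two posts are effectively co-located).
    """
    band = max(0, min(math.ceil(task_ambiguity) - 1, 4))   # (0,1] -> row 0, ..., (4,5] -> row 4
    value = EFFICIENCY_TABLE[band][MEDIA.index(medium)]
    return value * epsilon if medium == "face_to_face" else value

print(media_efficiency(4.0, "face_to_face", epsilon=0.8))  # 0.8 * 0.8 = 0.64
```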

Execution Task. This type of task does not require thinking, but requires direct actions such as drawing, labeling, and manipulation according to a clear goal and process. It is a task type with low ambiguity.

Intellective Task. This type of task refers to problem-solving where the problem is well defined and has a deterministic answer. The team members must spend significant intellective effort to find the correct answer to the problem. It is a task type with relatively moderate ambiguity.

Discussion Task. For this type of task, the problem does not have a single unique correct answer, and there is no established or recognized standard process to find a correct answer. Usually, team members need to discuss with each other to eliminate cognitive conflicts and reach a consensus according to their experience and knowledge. It is a task type with high ambiguity [31].

Based on Table 2, the mathematical relationship between the efficiency of communication media and task ambiguity could be modeled using linear regression or quadratic polynomial regression. With y as the communication efficiency affected by the prism effect of the communication media, and x as the task ambiguity, the fitted equations are shown in Fig. 4.


Fig. 4. Schematic diagram of the relationship between the task ambiguity and the communication efficiency under different communication media
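A sketch of the curve fitting behind Fig. 4, using one column of Table 2 as data points; treating the band mid-points as the ambiguity values is an assumption made only for illustration:

```python
import numpy as np

# Mid-points of the five ambiguity bands in Table 2 and the corresponding
# efficiency values of, for example, the telephone column.
ambiguity = np.array([0.5, 1.5, 2.5, 3.5, 4.5])
telephone_efficiency = np.array([0.6, 0.8, 1.0, 0.8, 0.6])

linear = np.polyfit(ambiguity, telephone_efficiency, deg=1)      # y = a*x + b
quadratic = np.polyfit(ambiguity, telephone_efficiency, deg=2)   # y = a*x^2 + b*x + c

print("linear coefficients:", linear)
print("quadratic coefficients:", quadratic)
```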

The above three task units represent three typical task types with high, medium and low ambiguity. In combination with the analysis of the uncertainty (ambiguity) of tasks, the comparison is shown in Table 3. For typical C2 tasks, the proportions of the above three types can be calculated. Supposing that, within the total time of a task Ti, execution tasks account for x%, intellective tasks account for y%, and discussion tasks account for z%, the task ambiguity can be calculated as the weighted sum of the ambiguity scores of each type:

$$Task\_Amb = x\% \cdot E + y\% \cdot I + z\% \cdot D \tag{8}$$

where x% + y% + z% = 1. The ambiguity score of each task type should be rated appropriately according to the characteristics of the actual task. As a set of typical scores, E = 1, I = 3, and D = 5 can be used in general.
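For example, with these typical scores, a task whose time is 10% execution, 30% intellective and 60% discussion work has an ambiguity of

$$Task\_Amb = 0.10 \times 1 + 0.30 \times 3 + 0.60 \times 5 = 4.0,$$

which falls in the high-ambiguity band of Table 2.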


Table 3. The comparison of characteristics of three typical task units

| Task types | Task characteristics (operation ambiguity) | Task characteristics (role ambiguity) | Ambiguity | Examples |
|---|---|---|---|---|
| Execution (E) | With a clear and definite goal, method and plan | No communication and discussion are needed | 1 | Helmsman control |
| Intellective (I) | Intelligence demanded in selection, calculation and judging, and significant error probability | Communication and discussion are rarely needed | 3 | Information analysis |
| Discussion (D) | A certain ambiguity in decision-making accuracy, calculation method and process | Communication and discussion are needed to eliminate decision-making conflicts and reach a consensus | 5 | Comprehensive decision making |

4 Case Study: A Submarine Operation and Navigation Scenario

4.1 Scenario and Tasks

As a case study, a typical scenario of submarine operation and navigation was analyzed with two simulated team coordination tasks, i.e., "target threat judgment" and "early warning of underwater deep fall", as shown in Fig. 5. The target threat task included electronic detection and monitoring, target identification, situation judgment, comprehensive decision-making, and execution. The underwater deep fall task included detection and calculation, threat assessment, situation judgment, comprehensive decision-making, and route execution. The involved posts included the sonar operator (SOP), target motion analysis (TMA), commanding officer (CO), ship control (SHC), etc. The following three different conditions were assumed:

Condition 1. The C2 team members were separated in two different cabins. Face-to-face communication was used inside and outside the cabins, and members had to move between the two cabins to communicate face to face.

Condition 2. The C2 team members were in the same cabin. Face-to-face communication was adopted inside a cabin and instant messaging communication was adopted between cabins.

Condition 3. The C2 team members were in the same cabin. Face-to-face communication was adopted inside a cabin and telephone communication was adopted between cabins.


Fig. 5. Schematic diagram of the C2 task of simulated submarine

Domain experts were invited to rate the task type proportions and ambiguity. The averaged results are presented in Table 4. According to the expert ratings, the ambiguity of both tasks was high.

Table 4. Task type proportion and ambiguity evaluated by domain experts

| Tasks | Execution | Intellective | Discussion | Task ambiguity | Overall ambiguity |
|---|---|---|---|---|---|
| Judging of target threat | 10% | 30% | 60% | 4.0 | High |
| Warning of underwater deep fall | 15% | 30% | 55% | 3.8 | High |

4.2 Results

According to the definitions and models described in Sect. 2, the network characteristics of the two tasks under the three assumed conditions were calculated, as shown in Table 5.

Table 5. Summary of the calculation of social network characteristics under typical simulation task scenarios

| Task | Network characteristics | Condition 1 (Face to face) | Condition 2 (Instant messaging) | Condition 3 (Telephone) |
|---|---|---|---|---|
| Judging of target threat | Network diameter | 29.46 | 47.46 | 26.78 |
| | Average path length | 10.83 | 16.30 | 10.01 |
| | Network density | 23.59 | 16.80 | 25.38 |
| Warning of underwater deep fall | Network diameter | 11.45 | 14.46 | 10.82 |
| | Average path length | 5.50 | 6.34 | 4.74 |
| | Network density | 11.06 | 9.97 | 12.72 |

It can be seen from Table 5 that, for the target threat judgment task, adopting the instant messaging communication mode would lead to the maximum network diameter, the maximum average path length, and the minimum network density; that is, a less efficient communication medium is unfavorable for team coordination. In the early warning task of underwater deep fall, the influences of the three communication media were similar to those in the target threat judgment task, although the differences between the three conditions were relatively small.

5 Discussion

From the case study, the following suggestions can be proposed:

• It is necessary to re-design the cabin layout to reduce the spatial distance between nodes and to adjust communication frequency and direction, in order to reduce the social network average path length and diameter as well as to improve the density, under the constraints of the task and organizational relationships.
• The higher the ambiguity, the greater the difference in the influence of communication media on the complexity of social networks. Therefore, when facing high task ambiguity, it is necessary to provide a communication mode with high media richness.
• When the communication media efficiency of a certain link in the social network of the C2 system increases, the average path length of the social network decreases; that is, the average communication complexity of the entire social network decreases, which is favorable for improving the overall coordination efficiency of the C2 system network. Therefore, when designing a C2 system network, optimizing and enhancing the communication media between two key post nodes is beneficial to the performance of the whole system.
• There is always a link in the social network of the C2 system with the weakest communication media efficiency, which determines the maximum network diameter. Even if the communication efficiency of other links is improved, large risks may remain with the unimproved link with the greatest communication difficulty. Only by optimizing the communication media of the weakest link in the social network can the network diameter of the overall social network be reduced.
• In comparison with written communication such as email and instant messaging, oral communication (such as face-to-face and telephone communication) yields a higher social network density, which means more efficient interactions.

6 Conclusion

Communication media have a significant influence on the information exchange and human relations within a C2 team. After coupling communication media efficiency and task ambiguity, the effects of task ambiguity on the social network characteristics of a C2 system can be revealed. So far, only the relationship between task characteristics and communication media has been considered. In the future, the relationship between the statistical characteristics of social networks and the comprehensive performance of a C2 system should be further studied to provide more useful guidance for the design of C2 system equipment.

References

1. USA DoD: Department of Defense Dictionary of Military and Associated Terms (2016)
2. Dai, H.: Construction of the command and control discipline (2016)
3. Eisenberg, D.A., Alderson, D.L., Kitsak, M., et al.: Network foundation for command and control (C2) systems: literature review. IEEE Access 6, 68782–68794 (2018)
4. Breton, R., Roussea, R.: Modelling approach for team decision making (2006)
5. Alberts, D.S., Garstka, J., et al.: Understanding information age warfare (2001)
6. Liao, Z., Wang, X., Liu, S.: Application of human factors engineering in command and control information system. In: The Fifth China Command and Control Conference, Beijing, China (2017)
7. Stanton, N.A., Roberts, A.P.J., et al.: The quest for the ring: a case study of a new submarine control room configuration. Ergonomics 65(3), 384–406 (2022)
8. Hu, X.F., Li, Z.Q., He, X.Y., et al.: Complex networks: the new method of war complex system modeling and simulation. J. Inst. Equip. Command Technol. 20(02), 1–7 (2009)
9. Boccaletti, S., Latora, V., Moreno, Y., et al.: Complex networks: structure and dynamics. Phys. Rep. 424(4), 175–308 (2006)
10. Stanton, N.A., Harvey, C.: Beyond human error taxonomies in assessment of risk in sociotechnical systems: a new paradigm with the EAST "broken-links" approach. Ergonomics 60(2), 221–233 (2017)
11. Klein, G., Ross, K.G., Moon, B.M., Klein, D.E., Hoffman, R.R., Hollnagel, E.: Macrocognition. IEEE Intell. Syst. 18(3), 81–85 (2003)
12. Roberts, A., Webster, L.V., Salmon, P.M., et al.: State of science: models and methods for understanding and enhancing teams and teamwork in complex sociotechnical systems. Ergonomics 65(2), 161–187 (2022)
13. Houghton, R.J., Baber, C., Stanton, N.A., et al.: Combining network analysis with cognitive work analysis: insights into social organisational and cooperation analysis. Ergonomics 58(3), 434–449 (2015)
14. Waterson, P.E., Older Gray, M.T., Clegg, C.W.: A sociotechnical method for designing work systems. Hum. Factors 44(3), 376–391 (2002)
15. Hollingshead, A.B., McGrath, J.E., O'Connor, K.M.: Group task performance and communication technology: a longitudinal study of computer-mediated versus face-to-face work groups. Small Group Res. 24(3), 307–333 (1993)
16. Park, J.: The use of a social network analysis technique to investigate the characteristics of crew communications in nuclear power plants—a feasibility study. Reliab. Eng. Syst. Saf. 96(10), 1275–1291 (2011)
17. Wasserman, S., Foster, C.: Social network analysis: methods and applications (2012)
18. Carrington, P.J., Scott, J., et al.: Models and Methods in Social Network Analysis. Cambridge University Press, Cambridge (2005)
19. Burt, R.S., Kilduff, M., Tasselli, S.: Social network analysis: foundations and frontiers on advantage. Ann. Rev. Psychol. 64, 527–547 (2013)
20. Lo, J.C., Meijer, S.A.: Assessing network cognition in the Dutch railway system: insights into network situation awareness and workload using social network analysis. Cogn. Technol. Work 22(1), 57–73 (2020)
21. Mohammadfam, I., Bastani, S., Esaghi, M., et al.: Evaluation of coordination of emergency response team through the social network analysis. Case study: oil and gas refinery. Saf. Health Work 6(1), 30–34 (2015)
22. Sparrowe, R.T., Liden, R.C., Wayne, S.J.: Social networks and the performance of individuals and groups. Acad. Manag. J. 2, 44 (2001)
23. Min, Y.C., Wanyan, X.R., Liu, S.: Quantitative analysis of team communication for maritime collaborative task performance improvement. Int. J. Ind. Ergon. 11, 92 (2022)
24. Roberts, A.P.J., Stanton, N.A.: Macrocognition in submarine command and control: a comparison of three simulated operational scenarios. J. Appl. Res. Mem. Cogn. 7(1), 92–105 (2018)
25. Stanton, N.A., Roberts, A.P.J.: Examining social, information, and task networks in submarine command and control. IEEE Trans. Hum.-Mach. Syst. 48(3), 252–265 (2018)
26. Ji, X.F.: Research on the mechanism of team communication on team knowledge sharing. Zhejiang University (2008)
27. Wang, Z.F.: Practical Communication Course. Beijing University of Technology Press, Beijing (2017)
28. Guo, S.Z., Lu, Z.M.: Basic Theory of Complex Networks, 1st edn. Science Press (2012)
29. Liao, Z., Liu, S., Li, Z., et al.: A model-based analysis of the complexity of collaborative command-and-control system. SPIE (2022)
30. Stanton, N.A., Roberts, A.P.J., et al.: Returning to periscope depth in a circular control room configuration. Cogn. Technol. Work 23, 783–804 (2021)
31. Daft, R.L., Lengel, R.H.: Organizational information requirements, media richness and structural design. Manage. Sci. 32(5), 554–571 (1986)

Research on Ergonomic Evaluation of Remote Air Traffic Control Tower of Hangzhou Jiande Airport

Chengxue Liu1(B), Yan Lu1, and Tanghong Mou2

1 Zhejiang General Aviation Industry Development Co., Ltd., Hangzhou, Jiande 311612, Zhejiang, China
[email protected]
2 The Second Research Institute of CAAC, Chengdu 610041, Sichuan, China

Abstract. Remote air traffic control (ATC) tower technology has been applied in six locations in China and has achieved good application effects in recent years. When utilizing the technology, ergonomic evaluation is important to ensure the safe and efficient operation of the remote ATC tower, and it is also a new research topic in the field of air traffic management. This paper takes the remote ATC tower of Jiande Airport in Hangzhou, China as an example to discuss how to carry out such an evaluation. Eleven experienced air traffic controllers (ATCOs) were interviewed by questionnaire about their opinions on system availability, operation procedures, and the working environment of the remote ATC tower. For four on-duty controllers, the time-domain index of heart rate variability (RMSSD) was measured during the aircraft takeoff and landing phases while they commanded aircraft in busy and non-busy scenarios. The results showed that the ATCOs' alertness and emotional tension increased during the takeoff and landing phases in both the busy and non-busy scenarios. When commanding aircraft taking off and landing in a non-busy scenario, the workload of the ATCO increases suddenly and significantly, whereas in the busy scenario the workload increases only slightly. The evaluation results show that the functions of the remote ATC tower of Jiande Airport conform to ergonomic principles and can meet the needs of the ATCOs. It is suggested to improve the ATCOs' understanding of the remote ATC tower, add an abnormal alarm prompt function to the remote ATC tower system, and explore an operation mode in which one remote ATC tower serves multiple airports simultaneously.

Keywords: Air traffic management · Remote ATC tower · Ergonomics

1 Introduction

In recent years, remote air traffic control (ATC) tower technology has gradually been accepted by more air traffic controllers (ATCOs) and applied in six locations, including large international transport airports, small and medium-sized civil aviation airports and

general aviation airports in China, and has achieved good application effects. The remote ATC tower is based on a unified physical remote control center, which centralizes the tower control tasks undertaken by one or multiple physical airport towers into a remote ATC tower center and provides ATC services by utilizing a remote tower system (RTS). Since there is no need to build new physical towers and other ATC-related airport projects, it has the advantages of low cost and convenience. The RTS is especially suitable for airports that are unattended, have limited opening hours or only small traffic volumes, and it can also be used as emergency backup equipment for large hub airports. One of the most important tasks when utilizing RTS technology is to evaluate the software and hardware system and the operating environment of the RTS from the perspective of ergonomics, to determine whether the construction, operating environment and functions of the RTS can meet the work needs of ATCOs.

Leitner [1] studied the situation of a physical remote tower control center serving multiple airports, developed a flow control tool for the simultaneous operation of multiple airports, interviewed ATCOs and put forward 56 problems to improve the human-computer interface. Wen-Chin Li [2] studied the safety and screen scanning of an air traffic controller who is responsible for the control of two airports at the same time through a remote ATC tower, and found that a controller can be responsible for controlling two airports simultaneously, and that the installation of PTZ cameras and a visual enhancement system can assist the controller's screen scanning. Peter Kearney studied the workload of an air traffic controller using a remote ATC tower to control and command two airports at the same time [3]. Moehlenbrink, in a simulated environment of a remote tower control center commanding two or more airports at the same time, identified the key factors affecting the visual attention and command performance of the tower controller by using eye-movement recording and a questionnaire survey of the controllers, which can improve the design of air traffic controllers' working places [4]. The German Aerospace Center studied the best frame rate for the remote ATC tower display device at Braunschweig Airport in Germany, and tested the work performance of the air traffic controller under four remote ATC tower frame rates of 2, 5, 10, and 15 fps. The results show that no less than 2 fps can ensure that the air traffic controller obtains the information needed for work while also reducing the construction cost [5]. Zhang Jianping [6] summarized the combination of technical systems and the design of operating environments used in Europe to improve the situational awareness and perception of remote ATC tower controllers by applying ergonomics principles.

In terms of practical application, China has implemented three different types of application scenarios in six airports, as shown in Table 1 below. China's Guangzhou Baiyun International Airport, Guiyang Longdongbao International Airport, and Harbin Taiping International Airport use remote ATC towers to provide apron control services for aircraft.
Table 1. Airports using remote control tower technology in China

| Number | Airports | Types of remote ATC tower function |
|---|---|---|
| 1 | Guangzhou Baiyun International Airport | Large transport airports using a remote ATC tower to implement apron control |
| 2 | Guiyang Longdongbao International Airport | |
| 3 | Harbin Taiping International Airport | |
| 4 | Xinjiang Nalati Airport | A control center using a remote tower system to command airports in remote areas with small flight volumes |
| 5 | Yunnan Luguhu Airport | |
| 6 | Zhejiang Jiande General Aviation Airport | A control center using a remote tower system to serve general airports and heliports with small flight volumes |

The remote apron tower system integrates many traditional information monitoring systems, such as multi-point positioning, surface surveillance radar, weather self-observation, flight information management, and AI-enhanced panoramic video, allowing ATCOs to provide ATC services with better situational awareness than a traditional tower. It supports all-weather, full-process and global monitoring of aircraft ground operations: when personnel, vehicles or aircraft intrude into an unauthorized area, the system automatically triggers an alarm. At the same time, more intuitive and multi-angle monitoring information can increase the maximum number of aircraft commanded by the ATCOs, increase the flight flow, and improve the safety level of the airport.

Nalati Airport in Xinjiang province and Luguhu Airport in Yunnan province are representatives of small and medium-sized airports implementing remote control towers. When implementing remote control tower technology and carrying out ergonomic verification and evaluation at such small airports, it is necessary to focus on whether the remote ATC tower system can provide the ATCO with the same field of vision and the same visual effect as the traditional airport tower, under the condition that the ATC seat setting and tower configuration are basically the same. In addition, Luguhu Airport in Yunnan province is the first high plateau airport in China to adopt remote control tower technology. The high altitude and rapidly changing weather bring great challenges to airport operation, and the remote control tower also poses new challenges to the use and management of the airport flight procedures and the minimum operating standards.

Hangzhou Jiande Remote ATC Center is the sole remote tower specialized for general aviation in China. It is also the first remote control tower put into operation, with the longest operating time and the largest number of aircraft commanded in China. There are significant differences in remote control tower usage between transport airports and general aviation airports, because the flight activity of general airports is quite different from that of transport airports: general aviation aircraft have strong maneuverability and their trajectories change rapidly; the rapid rotation of helicopter rotors requires a high screen refresh rate, and it consumes the attention of ATCOs to track small aircraft or helicopters; in contrast, the flight path of large aircraft is stable, so it is very easy for the ATCO to identify. In the process of carrying out ergonomic verification on Hangzhou Jiande Remote ATC Center, the following problems were emphasized. Firstly, the operational stability of the remote control tower system equipment and the related risks were checked. Secondly, engineers continuously sorted out, recorded and analyzed the validation-related data and information, and improved the performance of the RTS. The third


is to standardize the operation specifications of the remote control tower, formulating operation regulations and rules. In addition to these basic requirements, the system can also provide the following enhancement services. The first is scene enhancement technology, which provides the ATCOs with clear vision under low visibility or at night through image enhancement processing. The second is intelligent identification and tracking of aircraft, which helps prevent the controller from making mistakes, forgetting or missing targets through aircraft tags and dynamic monitoring. The third is to display all the information required by the ATCOs on one screen through integration with ADS-B surveillance, meteorological information and other signals, so as to enhance the situational awareness of the controllers. Based on the practical experience of recent years, this paper introduces the ergonomic assessment of the Jiande Remote ATC Center.

2 Method

The purpose of the ergonomic evaluation of a remote control tower is to determine whether the working environment, facilities and equipment, working procedures, etc. can meet the work needs of ATCOs. The main assessment methods commonly used in this field include subjective assessment (e.g., the KSS sleepiness scale and the PSQI sleep quality index) and objective detection (EEG, ECG, etc.).

2.1 Participants

There are 11 licensed ATCOs working in Hangzhou Jiande Remote ATC Center, and they all participated in this research. They were interviewed on three aspects of the system, i.e., availability, operation procedures and working environment. The ages of the participants were between 27 and 35 years (M = 29.8; SD = 3.8). Their working experience was between 3 and 11 years (M = 4.8; SD = 2.1).

2.2 Subjective Evaluation Method

The subjective evaluation methods commonly used in the aviation field include the following three types. The first is the Cooper-Harper scale, which is mainly used to evaluate the difficulty of flying an aircraft. It has been used by the Federal Aviation Administration (FAA) for many years to evaluate the man-machine interface and pilot workload of various aircraft. The second is the Subjective Workload Assessment Technique (SWAT), which assesses workload on three dimensions: time load, effort load, and psychological stress load. Each dimension has three grades: mild, moderate, and severe, giving a total of 3 × 3 × 3 = 27 combinations. During the test, the subjects first rank the importance of the three dimensions, then select the corresponding load level according to their own feelings, and the result is converted into a workload value between 0 and 100; the higher the score, the greater the workload. The third is the task load index (NASA-TLX), a commonly used subjective assessment scale in the aviation field, which assesses six dimensions of workload: effort, performance,


frustration, time demand, mental demand, and physical demand. During the test, the subjects score themselves on each dimension. The score of each item is between 0 and 20 points (there is also a scale variant with 10 points as the maximum score). The total score is then obtained by weighted summation, where the weight of each dimension is determined by the subjects' pairwise comparison of the dimensions. Theoretical research and practical application show that the above subjective workload measurement tools have good reliability and validity, and the measurement is relatively simple. However, individual differences are large, everyone's understanding and feeling of workload are different, and the measurement results are not fully objective and accurate; therefore, this approach has considerable limitations.

2.3 Objective Detection Method

The objective detection method judges the mental workload level by measuring changes in individual physiological indicators (mainly of the heart, eyes and brain) during work. This method generally requires measuring equipment such as an eye tracker and a sphygmomanometer. The subjective evaluation method and the objective detection method each have their own advantages and disadvantages. The subjective evaluation method is simple and easy to implement, but it depends heavily on the participants' subjective feelings. The objective detection method yields reliable data, but it requires facilities and equipment, and its universality is limited.

2.4 Selection of Evaluation Methods

Considering that the evaluation conclusions of the three subjective evaluation methods mentioned above are relatively simplistic, and fully respecting the practical experience and opinions of experienced controllers, in order to maximize the reliability of the evaluation results, this paper selects a combination of the questionnaire interview method and physiological measurement of the controllers for the ergonomic evaluation.
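For reference, the NASA-TLX weighted-sum scoring described in Sect. 2.2 can be sketched as follows; the dimension names, example ratings and pairwise-comparison counts are illustrative assumptions, not data from this study:

```python
def nasa_tlx_score(ratings, pairwise_wins):
    """Weighted NASA-TLX workload score.

    ratings       : dict of raw ratings per dimension (0-20 scale)
    pairwise_wins : dict counting how often each dimension was chosen in the
                    15 pairwise comparisons between the six dimensions
    """
    total_weight = sum(pairwise_wins.values())          # = 15 for six dimensions
    return sum(ratings[d] * pairwise_wins[d] for d in ratings) / total_weight

ratings = {"mental": 14, "physical": 4, "temporal": 12,
           "performance": 8, "effort": 13, "frustration": 6}
wins = {"mental": 5, "physical": 0, "temporal": 4,
        "performance": 2, "effort": 3, "frustration": 1}
print(nasa_tlx_score(ratings, wins))     # overall workload on the 0-20 rating scale
```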

3 Results and Discussion

3.1 Basic Information of the Remote ATC Tower of Jiande Airport

Hangzhou Jiande Remote ATC Center was built and put into operation in 2019 (see Fig. 1). It is still the sole remote ATC center specialized for general aviation airports in China. The remote ATC center was originally responsible only for the flight control service of Jiande General Aviation Airport, which is 3.8 km away from it. In 2022, it also began to provide flight control service for the Shengsi heliport in Zhoushan City, Zhejiang Province, which is more than 400 km away from the remote ATC center. Up to now it has operated stably for more than three years, commanding more than 20,000 aircraft for Hangzhou Jiande Airport and 200 flights at the Shengsi heliport.


Fig. 1. Jiande remote ATC Center

The software and hardware of the RTS were developed by the Second Research Institute of CAAC. The RTS contains high-definition digital cameras, weather sensors, audio and video access and other related equipment to collect airport data in real time, and it projects the real-time situation of the airport under control, such as the scene and weather, onto the surrounding LCD screens of the remote ATC tower center. Thus, it can provide comprehensive ATC services including approach guidance, takeoff and landing, and surface surveillance. The optical system terminal is the core equipment of the remote tower (see Fig. 2), with which the ATCO can keep track of the airport's operating environment, aircraft positions and other information. With the assistance of the ADS-B surveillance system, the meteorological information system, and the electronic flight strip system, it can achieve functions similar to a traditional tower and can control and command the aircraft.

Fig. 2. Optical system of RTS in Jiande Remote ATC Center


3.2 Ergonomic Evaluation by Questionnaire Method and Results

Using the questionnaire in the Remote Tower Operation Evaluation Specification [7], shown in Table 2, the 11 licensed controllers of the remote control tower were interviewed on three aspects of the system, i.e., availability, operation procedures and working environment. The average age of the controllers was 29.8 years, and the average length of their control work experience was 4.8 years.

Table 2. Questionnaire for Human Performance Evaluation of Remote Control Tower

Company: / Date: / Name of interviewer: / Name of the controller interviewed:

1 Basic information
1.1 How long have you worked in this position?
1.2 Do you have any intention to work in this position for a long time?
1.3 When did you first get in touch with work related to the remote control tower?
1.4 What do you think are the most important functional modules in the process of using the remote control tower to provide services?
1.5 What do you think is the biggest challenge to your work during the operation of the remote control tower?
1.6 What do you think of the improvement effect of the remote control tower on the unit's human resource optimization, cost control, etc.?

2 Remote Control Tower System Availability
2.1 Effectiveness of the system: Are you satisfied with the ease of use of the system in general? Do you think the system is very simple to use? With the help of the system provided, can you complete the task and action plan quickly and effectively? How does it help you improve your work efficiency? How comfortable is the system? Is it easy to learn how to use the system?
2.2 Information quality: Can the system prompt error messages and tell you clearly how to solve the problem? In terms of system use, how is the fault tolerance of the system (e.g., can it be easily and quickly recovered when mistakes are made during use)? How convenient are the information and functions provided by the system (such as help, screen information and other documents) to use? Are the organization and presentation of information on the system screen clear?
2.3 Quality of interface: Do you think the system interface is comfortable? How much do you like these system interfaces? Are the functions of the system comprehensive, and do they meet all the functions and performance you expect?
2.4 Opinions and suggestions: (Suggestions on system usability, accuracy, fault tolerance, comfort, etc.)

3 Operation Procedure
3.1 Completeness: How reasonable are the overall organizational structure and assignment of responsibilities? Are there any deficiencies? Are there corresponding procedures and manuals to follow for all air traffic controllers' work? Are there any deficiencies? Do you think the current emergency treatment procedures are complete, and are there any deficiencies for the possible emergency situations of remote control tower operation? Do you think the current information notification mechanism is comprehensive? Do you think the current daily safety management system is comprehensive?
3.2 Operability: Do you think the current working procedures are clear and easy to master? According to the existing working procedures, can you complete the task and action plan quickly? Do you have any suggestions for better working procedures? Do you think the current emergency treatment procedures are clear and easy to grasp?
3.3 Effectiveness: According to the existing working procedures, can you effectively complete the task and action plan? Do you have any suggestions for better working procedures? Can relevant emergency situations be effectively solved through the emergency treatment procedures?
3.4 Opinions and suggestions: (Opinions and suggestions on the completeness, operability and effectiveness of the operation procedure)

4 Working Environment
4.1 Environment setting: How do you feel about the overall comfort of the working environment of the control center? What do you think of the space size of the control center? How comfortable do you think the ventilation, temperature, noise, lighting and other aspects of the control center are? Do you think the working space of the ATC seat is reasonable, and how about the layout design of the ATC seat? How comfortable do you think the layout design of the optical display screen is (for controllers)?
4.2 Adaptability: In the face of changes in working procedures and the working environment, can you quickly adapt to such changes? There is a certain signal transmission delay in remote control; how much does it affect your task and action plan? The remote control tower uses an optical display screen instead of on-site visual observation; how much impact does this have on your mission and action plan? How long can you work in the remote control tower before you feel tired or uncomfortable?
4.3 Opinions and suggestions: (Work environment, human adaptability and relevant suggestions)


The results of the questionnaire interviews with the ATCOs showed that the remote control tower system of the Jiande Remote ATC Center has high efficiency, simple operation, comfortable use, clear information display and a reasonable interface layout, but 27% of the controllers believed that the system lacked some information prompt functions. In terms of operation procedures, the daily work procedures and emergency response procedures are complete and easy to master, allow tasks and action plans to be completed effectively, and can effectively resolve relevant emergency situations, but 18% of the controllers believed that the work manual still needs to be improved. In terms of the working environment, the operations room is generally comfortable, the space is large enough, the seat layout is reasonable, the layout design of the optical display screen is comfortable, and the impact of video delay is acceptable. However, 27% of the controllers believed that there is a slight visual discrepancy between the optical display screen and direct visual observation, that the delay has a slight impact, and that continuous duty for 3 to 4 h causes visual fatigue.

3.3 Ergonomic Evaluation by Physiological Detection Method and Results

The skin resistance and conductance of the human body change with the function of the skin's sweat glands. These measurable changes are called electrodermal activity (EDA). The electrodermal response is closely related to individual emotion, arousal and attention, and can reflect the influence of stimulus events on individuals very quickly and sensitively. RMSSD (the root mean square of successive differences of inter-beat intervals, IBI), a commonly used index for time-domain analysis, reflects rapid changes of heart rate; the lower its value, the higher the workload. Based on the group training data monitoring platform developed by Beijing ZhongKe XinYan Technology Co., Ltd, the authors collected the controllers' electrodermal and heart rate variability data and evaluated the ATCOs' workload quantitatively. The experimental results are as follows (see Figs. 3, 4, 5, 6, 7 and 8).
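A minimal sketch of the RMSSD computation from a series of inter-beat intervals, as used for the workload analysis in this section (the example IBI values are illustrative):

```python
import numpy as np

def rmssd(ibi_ms):
    """Root mean square of successive differences of inter-beat intervals (ms).

    In this study's interpretation, lower RMSSD values indicate a higher
    controller workload.
    """
    ibi = np.asarray(ibi_ms, dtype=float)
    diffs = np.diff(ibi)                   # successive IBI differences
    return np.sqrt(np.mean(diffs ** 2))

print(rmssd([812, 790, 805, 776, 798, 820]))   # example IBI series in milliseconds
```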

Fig. 3. RMSSD of ATCO in busy scenario


Fig. 4. RMSSD of ATCO in non-busy scenario

Fig. 5. RMSSD of ATCO during aircraft landing under busy scenario

Fig. 6. RMSSD of ATCO during aircraft takeoff under busy scenario

Through data processing, index extraction, and comparative analysis of the time-domain index of heart rate variability (RMSSD) during the takeoff and landing phases commanded by the ATCO under busy and non-busy scenarios, combined with visual analysis of the frequency-domain spectrum, the results show that the RMSSD of the ATCO exhibits a downward trend during the takeoff and landing phases in both busy and non-busy scenarios. This indicates that, whether in busy or non-busy scenes, the sympathetic activity of the ATCO is significantly enhanced while commanding aircraft takeoff and landing, reflecting an increase in the ATCO's physiological arousal level, alertness and emotional tension at this stage (see Fig. 9).


Fig. 7. RMSSD of ATCO during aircraft takeoff under non-busy scenario

Fig. 8. RMSSD of ATCO during aircraft landing under non-busy scenario

Fig. 9. Comparison of RMSSD of ATCOs in non-busy and busy scenarios

It can be seen from Fig. 7 that the average RMSSD of the ATCO in the non-busy scenario is significantly higher than that in the busy scenario, which is consistent with the interpretation that the smaller the RMSSD value, the higher the controller's workload. Compared with the busy scene, the minimum RMSSD and the average RMSSD of the ATCO during the takeoff and landing phases in the non-busy scene are reduced much more, indicating that the ATCO's sympathetic activity is more intense when commanding takeoff and landing in the non-busy scene, that alertness and emotional tension are strongly increased, and thus that the workload increases suddenly and significantly at this time. During the takeoff and landing phases in busy scenes, the minimum and average RMSSD of the ATCOs are only slightly reduced, indicating that the increase in sympathetic nerve activity is relatively small at this time, and that alertness, emotional tension and workload increase only slightly.

4 Conclusions and Suggestions

4.1 Evaluation Conclusions

The ATCO questionnaire survey shows that the functions of the remote control tower of Jiande Airport can meet the needs of the ATCOs; the facilities and equipment and the working environment layout conform to ergonomic principles, and an ATCO working in the remote tower environment can obtain situational awareness and perception ability similar to those in a traditional tower.

The physiological tests of the ATCOs showed that when an ATCO uses the remote control tower to command the takeoff and landing of aircraft, the sympathetic nerve activity of the ATCO is significantly enhanced, reflecting an increase in physiological arousal, alertness and emotional tension at this stage. When the ATCO commands aircraft taking off and landing in a non-busy situation, the workload increases suddenly and significantly; in the takeoff and landing phases in the busy scene, the workload increases only slightly.

Based on the feedback of the ATCOs, the remote control tower also has some shortcomings. It is necessary to enhance the adaptability of the ATCOs to the remote control tower and to improve the support capability of the remote control tower. Firstly, since some ATCOs are not willing to accept the remote control tower, measures should be taken to improve the controllers' understanding of the remote control tower, so as to make ATCOs more willing to accept it. It is advised to strengthen the training of ATCOs, enhance the adaptability of controllers to the use of the remote control tower's optical systems, and improve the level of human-computer interaction. Secondly, because the remote control tower system lacks an abnormal alarm prompt function, the controller cannot be reminded to pay attention or take corresponding measures; and because the system does not automatically detect and track aircraft, it cannot automatically prompt the controller with the aircraft position, so the controller spends a lot of effort judging aircraft positions. This aspect should be improved. Thirdly, at present the two ATC seats in the control center are independent of each other, which does not reflect the advantage of the remote control tower in saving manpower. Therefore, the configuration of the ATC seats should be further optimized, and the feasibility of using one seat to command two airports should be explored.


4.2 Research Suggestions on Ergonomic Evaluation of Remote ATC Towers

Based on the practical experience with China's remote control towers and the growing application needs for remote tower technology in more and more countries around the world, it is suggested to carry out the following research in the ergonomics field. The first is to study the impact of the air traffic control center layout, lighting, facility and equipment colors, room height and other factors on the physical and mental health of ATCOs, and to provide the physical conditions for cultivating a positive attitude in controllers by optimizing the working environment layout of the remote control tower center. Secondly, the optical system terminal is the core equipment of the remote control tower; how to improve the simplicity, learnability and comfort of this equipment, make the human-computer interface more friendly, and reduce the visual fatigue of the ATCO is worth studying. Thirdly, under the operation mode of one remote control tower commanding the air traffic of multiple airports, the key question is how to clearly distinguish the airport under the command of the ATCO by means of information labels, equipment color differences and other methods, to avoid the controller confusing the airports. Fourthly, the differences in adaptability and acceptance of the remote control tower should be studied between controllers who are new to the remote control tower and those who moved to it after working in a traditional physical tower for a long time, and ways should be explored to improve the ATCOs' adaptability and acceptance of the remote control tower. Fifthly, based on the difference between the relatively stable flight of transport aircraft and the strong maneuverability of general aviation aircraft, the differences between transport airports and general aviation airports in the allocation of remote tower equipment and operating standards should be studied.

Acknowledgement. The authors would like to express special thanks to all the air traffic controllers who engaged in the questionnaire and physiological detection. Their opinions, experience, and advice are invaluable to this research.

References

1. Leitner, R., Oehme, A.: Planning remote multi-airport control—design and evaluation of a controller-friendly assistance system. In: Fürstenau, N. (ed.) Virtual and Remote Control Tower. Research Topics in Aerospace. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-93650-1_20
2. Li, W.-C., Kearney, P., Braithwaite, G., Lin, J.J.H.: How much is too much on monitoring tasks? Visual scan patterns of single air traffic controller performing multiple remote tower operations. Int. J. Indust. Ergonom. 67, 135–144 (2018)
3. Kearney, P., Li, W.C., Zhang, J., et al.: Human performance assessment of a single air traffic controller conducting multiple remote tower operations. Hum. Fact. Ergon. Manuf. Serv. Indust. 30(2), 114–123 (2020)
4. Moehlenbrink, C., Papenfuss, A.: ATC-monitoring when one controller operates two airports: research for remote tower centres. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 55, pp. 76–80. Sage Publications, Los Angeles, CA (2011)
5. Jakobi, J., Hagl, M.: Effects of lower frame rates in a remote tower environment. In: The Tenth International Conference on Advances in Multimedia (MMEDIA 2018), IARIA, pp. 16–24 (2018)
6. Jianping, Z., Xiaoqiang, T.: Overview of research and application of remote control tower operation. Sci. Technol. Eng. 20(24), 9742–9750 (2020)
7. Civil Aviation Administration of China: Remote control tower operation evaluation specification (draft for comment) (2019)

A Framework for Supporting Adaptive Human-AI Teaming in Air Traffic Control

Stathis Malakis1(B), Marc Baumgartner1, Nora Berzina1, Tom Laursen1, Anthony Smoker1, Andrea Poti1, Gabriele Fabris1, Sergio Velotto1, Marcello Scala1, and Tom Kontogiannis2

1 IFATCA Joint Cognitive Human Machine System (JCHMS) Group, International Federation of Air Traffic Controllers Associations, Montreal, Canada
[email protected]
2 Cognitive Ergonomics and Industrial Safety Laboratory, Department of Production Engineering and Management, Technical University of Crete, Chania, Hellas, Greece

Abstract. In recent years, the growth of cognitively complex systems has motivated researchers to study how to improve these systems' support of human work. At the same time, there is momentum for introducing Artificial Intelligence (AI) into safety-critical domains. The Air Traffic Control (ATC) system is a prime example of a cognitively complex safety-critical system where AI applications are expected to support air traffic controllers in performing their tasks. Nevertheless, the design of AI systems that effectively support humans poses significant challenges. Central to these challenges is the choice of the model of how air traffic controllers perform their tasks, because AI algorithms are notoriously sensitive to this choice. The design of AI systems should be informed by knowledge of how people think and act in the context of their work environment. In this line of reasoning, the present study sets out to propose a framework of the cognitive functions of air traffic controllers that can be used to effectively support adaptive Human-AI teaming. Our aim was to emphasize the "staying in control" element of ATC. The proposed framework is expected to have meaningful implications for the design and effective operationalization of Human-AI teaming projects in ATC operations rooms.

Keywords: Air Traffic Control · Cognitive Systems Engineering · Artificial Intelligence

1 Introduction

In recent years, the growth of cognitively complex systems has motivated researchers and practitioners from diverse backgrounds to study how to improve these systems' support of human work. At the same time, there is momentum for introducing Artificial Intelligence (AI) into safety-critical domains, building upon:

a. Advances in the capacity to collect and store massive amounts of data.


b. Significant increases in computing power, and
c. Development of increasingly powerful algorithms and architectures.

The Air Traffic Control (ATC) system is a prime example of a cognitively complex safety-critical system where AI applications are being introduced. Soon, AI systems are expected to support, even more than today, air traffic controllers, flow controllers, operational supervisors, and flight information officers in performing their safety-critical tasks within the ATC ecosystem. Furthermore, the widespread introduction of AI is expected to create a new ATC environment, which will be tightly coupled, more complex to manage, and with pressing needs for:

a. Minimization of delays.
b. Accommodating a diverse array of autonomous aircraft.
c. Operating smoothly in adverse weather conditions.
d. Smoothing out 4D aircraft trajectories, and
e. Minimizing environmental impact.

The European Union [1, 2] envisions a Digital European Sky and an irreversible shift to low and ultimately no-emission mobility. With this goal in mind, ATC and aviation will evolve into an integrated digital ecosystem characterized by distributed data services. This is planned to be accomplished mostly by leveraging digital technologies to transform the aviation sector. The aim is to deliver a fully scalable Air Traffic Management (ATM) system for manned and unmanned aviation that is even safer than today's and based on higher air-ground integration. ATM provides an essential service for aviation. While the essence of ATM is, and will always remain, to ensure the safe and orderly execution of all flights, it needs to do so in the most environmentally friendly and cost-efficient way [1]. In this context, the International Federation of Air Traffic Controllers' Associations (IFATCA) has created the concept of the Joint Cognitive Human Machine System (JCHMS), or a human-centric approach, and wishes to influence ICAO and standardization bodies such as EUROCAE and EASA in Europe, as well as other global organizations such as RTCA and SAE, in the way these new technologies are designed.

Nevertheless, the design of AI systems that effectively support humans poses significant challenges [3]. In the ATC domain, as early as 1951, Fitts [4] proposed the use of automation to replace human functions with technology. In this approach, the capabilities (and limitations) of people and machines are compared on a number of salient dimensions, and function allocation is made to ensure that the respective capabilities are used in an optimal way (the compensatory principle). However, it failed to acknowledge the essential condition that it is necessary at all times to remain in control of what happens. Humans are aware of what they are doing and can imagine what the outcomes may be; machines and technology can do neither. Digitalization relies on highly effective but poorly understood algorithms. By replacing human functions with technology that is not fully comprehensible, control is gradually and irretrievably lost. It is acknowledged that although AI has many potential benefits, it suffers from a number of challenges for successful performance in complex, safety-critical real-world environments, including brittleness, perceptual limitations, hidden biases, and the lack of a model of causation important for understanding and predicting future events. These

322

S. Malakis et al.

limitations mean that AI will remain inadequate for operating on its own in many complex and novel situations for the foreseeable future, and that AI will need to be carefully managed by humans to achieve their desired utility [5]. Central to these challenges is the choice of the model of how air traffic controllers perform their tasks. AI algorithms are notoriously sensitive to the choice of the models of how the human operators perform their tasks. It is widely accepted that effective sensemaking and decision support systems cannot be designed by an engineer’s intuition alone. The design of AI systems should be informed by knowledge of how people think and act in the context of their work environment. Undoubtedly, AI design teams must be knowledgeable about computer software and the state of the art in algorithms, but if the AI systems fail to support cognitive functions of air traffic controllers, they will be eventually rejected in the workplace. In this line of reasoning, the present study has set out to propose a framework of cognitive functions of air traffic controllers that can be used to support effectively adaptive Human - AI teaming. Our aim was to develop a pragmatic framework that will provide a basis for building adaptive human - AI teaming architectures by respecting the human centric nature of the ATC system and emphasizing the “staying in control” element of the ATC. With this goal in mind, we employed Cognitive Systems Engineering (CSE) [6, 7] methods to develop a framework of cognitive functions for air traffic controllers. We have chosen CSE as our research paradigm because CSE has grown out of a need to design systems within which people can interact effectively. Moreover, CSE is an approach to the design of technology, training, and processes intended to manage cognitive complexity in sociotechnical systems [8]. The proposed framework is expected to have meaningful implications in the design and effective operationalization of Human - AI teaming projects at the ATC Operations rooms.

2 Making Sense of the "New" ATM Ecosystem

A pilot in a cockpit is one of the earliest and most persistent symbols of what we can achieve with a positive, collaborative and adaptive working relationship between humans and machines. Taking a step further and considering the airplane, i.e., the pilots plus flight control and automation, as the unit of analysis, the flight is a Joint Cognitive System (JCS) and the ATM is its environment. If we take one more step and consider the pilots and the ATM as one system, the traffic flow JCS, then the environment is the airlines, the airports, and the other aviation stakeholders [9]. If we consider the ATM as comprising three layers, then the current architecture is characterized by:

• Airspace layer: Limited capacity, poor scalability, fixed routes, fixed national airspace structures.
• Air traffic service layer: Limited automation, low level of information sharing.
• Physical layer: Fragmented ATM infrastructure.

Digitalization, and especially AI, which lies at its core, is expected to enable [1]:

• Dynamic and cross-FIR airspace configuration and management, free routes, and high resilience at the Airspace layer.
• Automation support and virtualization, providing scalable capacity at the Air traffic service layer.
• Integrated and rationalized ATM infrastructure at the Physical layer.

It is evident that in the era of digitalization and Big Data, the ATM ecosystem faces important and potentially disruptive challenges with the introduction of AI both in the air and on the ground. The use of AI is spreading rapidly in every industry, and aviation and ATM are no exception. However, innovative technologies not only provide capacity enhancement opportunities and other performance improvements but also raise new regulatory, safety, cognitive and operational challenges and tradeoffs [3]. Therefore, there is an urgent need to examine the introduction of AI, and more importantly human-AI teaming, cautiously.
The significant and continued growth in air traffic in the years prior to the COVID-19 pandemic prompted considerable exploration of the use of AI in ATC. It is expected that AI will provide the additional capacity to meet the challenges of increasing air traffic complexity due to sustained growth and new airspace users, and will support more efficient and environmentally friendly operations while maintaining and increasing current safety levels. Modern ATM systems comprise many airspace sectors with varying air traffic flows that interact in complex ways and evolve dynamically. ATC is a work domain that relies on the cognitive functions of Air Traffic Controllers and their collaboration with flight crews, airport operators, network managers and the other aviation stakeholders to control the airspace, manage safety and adapt to the changing demands of new technological initiatives [10]. From a purely AI view, the ATM system is a real-time safety-critical decision-making process in highly dynamic and stochastic environments, where human air traffic controllers monitor and direct many aircraft flying through their designated airspace sectors [11]. AI is expected to increase the resilience and the flexibility of the system (i.e., increase support during emergencies in flight or on the ground, or unusual situations such as severe weather, failures, etc.).
Furthermore, ongoing projects such as Extended Minimum Crew Operations (eMCOs) and Single Pilot Operations (SiPOs) rely heavily on AI and the application of powerful Machine Learning (ML) methods. eMCOs are defined as operations where the flight time is extended by means of in-flight rest with the minimum flight crew. This is achieved by allowing operations with one pilot at the controls during the cruise flight phase, while offering an equivalent overall level of safety through compensation means (e.g., ground assistance, advanced cockpit design with workload alleviation means, pilot incapacitation detection). SiPOs are defined as end-to-end single-pilot operations. Air operations regulation already foresees conditions and limitations under which these types of operations are allowed. In the future, it is expected that these conditions and limitations will need to evolve to extend single-pilot operations to large airplanes, provided that compensation means (e.g., ground assistance, advanced cockpit design with workload alleviation means, capability to cope with pilot incapacitation) are in place in order to provide an overall level of safety equivalent to today's two-pilot operations. EASA is working with interested industry stakeholders to explore the feasibility of such operational concepts while maintaining current safety levels. It is evident that both projects rely heavily on AI and ML and will need ATM support and
therefore introduce new operational requirements. All this will impose a brand-new array of challenges on ATM systems in the coming years.

3 Method

We followed CSE research methods [6–8]. We used a range of methods over several phases of fieldwork and documentation analysis and, finally, divergent thinking, comparative reasoning, and integrative thinking to compile a framework of cognitive functions of Air Traffic Controllers that can support adaptive Human-AI teaming. We applied documentation analysis to the most recent reports, white papers, position papers and technical documents from ATM and aviation organizations [12–17] about digitalization and AI/ML in Europe. The next step was to perform a literature review. There is an extensive body of CSE research literature that can inform the development and application of automated systems, which is beyond the scope of this paper. Hence, we decided to concentrate on some influential research publications and reports in the areas of automation, AI/ML, CSE and ATM [18–29]. In the final phase, we performed an extensive literature review focused on models of human performance, team performance, decision-making, anomaly response and cognitive functions [30–59].

4 A Framework for Supporting Adaptive Human-AI Teaming

The framework we propose examines six cognitive functions as follows:

• Steering or Goal Setting.
• Sensemaking and Mental Models.
• Common Operating Picture or Shared Mental Models.
• Coordination and Transfer of Control.
• Managing Changes.
• Operating or Planning-Doing-Checking cycle.
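Purely as an illustrative aside (not part of the framework as published), the six functions could be encoded as a small enumeration against which hypothetical AI support services are registered in an adaptive teaming architecture; all identifiers below are invented for the example.

```python
from dataclasses import dataclass
from enum import Enum, auto

class CognitiveFunction(Enum):
    """The six cognitive functions examined by the proposed framework."""
    STEERING = auto()                  # Steering or Goal Setting
    SENSEMAKING = auto()               # Sensemaking and Mental Models
    COMMON_OPERATING_PICTURE = auto()  # COP or Shared Mental Models
    COORDINATION = auto()              # Coordination and Transfer of Control
    MANAGING_CHANGES = auto()          # Managing Changes
    PDC_CYCLE = auto()                 # Operating or Planning-Doing-Checking cycle

@dataclass
class AISupportService:
    """A hypothetical AI service tagged with the cognitive functions it supports."""
    name: str
    supports: set

# Example: an environmentally aware conflict-resolution advisor (hypothetical)
# would primarily support Steering, as discussed in Sect. 4.1 below.
eco_advisor = AISupportService(
    name="eco_conflict_resolution_advisor",
    supports={CognitiveFunction.STEERING},
)
```

Each of the six functions is described in Sects. 4.1–4.6.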

4.1 Steering or Goal Setting

An orientation of performance towards specific goals that can be modified later during events. It involves an assessment of the context of work in terms of threats, constraints, and resources. Sometimes explorative goals may be adopted by air traffic controllers to see how the system reacts. Steering broadly refers to the choices and goals that people set in order to achieve their mission. In a complex environment, people may have to tackle multiple goals that compete for limited resources. To be successful in steering, people need to demonstrate willingness to make the necessary tradeoffs and show an ability to spot and exploit leverage points [35]. Fundamental to steering is the presence of a large 'set of action repertoires' in the form of work practices honed through 'enlightened' experimentation [60].
AI could very well assist air traffic controllers in making more environmentally friendly decisions in situations where either solution method has no consequence on the overall
traffic picture. If a conflict needs to be solved by either a level change or headings, and it does not make a difference which one it is, AI would have the capability to advise which solution would be better for the environment based on the wind and atmospheric conditions. Making aviation more climate friendly is a goal set by the EU to which individual air traffic controllers have little chance of contributing unless more assistance is provided in their decision making. Obviously, safety would come first, but in the way we see AI's potential in ATC, as an assistant, air traffic controllers do not normally weigh the environmental merits of a heading against those of a level change if there is no other traffic to consider.

4.2 Sensemaking and Mental Models

A process of interpretation of the situation to make sense of the problem and understand the factors that contributed to its occurrence. Sensemaking relies on the mental models of the air traffic controllers, which are developed and refined over time. When information is incomplete or delayed, mental models are useful for filling gaps in understanding and testing hypotheses about causes and plausible effects. In dynamic situations, poor decisions are made when models remain outdated. Sensemaking relies on having an adequate 'mental model' of potential hazards, causes, available resources and risk control strategies. Mental models help controllers and flight crews challenge their understanding and remain vigilant to the possibility of failure. In ATC and aviation, a common problem is 'failure to reframe' a mental model or mindset as new evidence becomes available. The initial situation assessment may seem appropriate, given the available information, but practitioners may fail to revise their mindsets. Many patterns of breakdown relate to problems in managing complexity and flawed mental models [61].

4.3 Common Operating Picture or Shared Mental Models

A COP refers to a common perception of threats, available resources and forces, opportunities for action and assessments of work. The COP can create common orientation and help teams coordinate and converge their efforts towards the overall mission. The COP is an important frame for making sense of a problem in a collective manner. The COP should incorporate information that enables situational information to be produced, visualized and presented in such a way that all information is available to all actors involved in the situation in real time. Flight decks and ATC operations rooms should support a COP for practitioners. However, radar displays, decision support tools, flight data displays and other inanimate objects are simply repositories from which practitioners gather information.
AI could assist air traffic controllers in establishing a clearer picture during weather avoidance. The weather radars available on controller working positions are extremely unreliable and currently not updated in real time. Displaying a full weather radar on the screen would cause too much clutter. If AI works in the background and synchronizes the most up-to-date weather radar and predicted traffic flows, it could assist the air traffic controllers by giving information on the expected avoidance direction. This has the potential to allow the air traffic controller to manage the disrupted flows more efficiently (because
the requests to avoid weather cells would not be unexpected) and reduce capacity restrictions. Within a single center, this assistant could warn air traffic controllers about certain headings that would most likely result in later weather avoidance in another sector and let the air traffic controller proactively offer a re-routing. It would also be helpful if AI could collect PIREP information and NOTAMs on turbulence and provide air traffic controllers with information on which cruising levels to avoid. This would reduce air traffic controllers' workload, as aircraft would no longer be climbed to levels that they then need to descend out of shortly after. At the very least, air traffic controllers could use the information given by AI to provide more accurate information to pilots who ask about turbulence and then let them make their own decision about their cruising level.

4.4 Coordination and Transfer of Control

A team process that has been included to examine the coordination of multiple loops or units, the handover of work between shifts and the interaction of people and automation. Coordination and transfer of control is a complex process that involves team synchronization, handover of work between shifts and interaction with colleagues and automation. The very nature of teamwork and task allocation can generate many dependencies that require orchestrated action in order to converge toward the same goal.

4.5 Managing Changes

A process for addressing changes in the system or the environment that may have an impact on performance. Adaptation usually takes the form of flexibility in changing behaviors between alternative modes of operation, for example tight versus loose plans and feedforward versus feedback control modes. Adaptive teams monitor changes in the system or the environment and try to match their capabilities to them, or even reserve their capabilities for anticipated events. The implication is that practitioners should retain some residual capacity for managing a number of secondary activities that have to do with correcting side effects and coping with task interruptions, including an assessment of one's own capabilities and "margins for maneuver" [25].
For instance, sector load prediction could be made more reliable, as AI would have the capability to collect and cross-reference all the necessary data: predicted traffic count, complexity, weather, turbulence, etc. In essence, an AI assistant could provide everything that a supervisor needs to know in order to decide whether to open or close a sector. After collecting this data, the AI assistant can provide advice on sector opening/closing, but the supervisor would make the final decision.

4.6 Planning-Doing-Checking Cycle

A basic performance cycle that involves devising a plan to implement a certain goal, executing the plan and monitoring its effects. Implementation refers to manual and tracking activities necessary to achieve the action targets set in the plans. Tracking activities should respect the constraints of space and time in the work domain. Implementation
relies on feedback control, where target-outcome gaps are corrected in time. The basic Planning-Doing-Checking (PDC) cycle underlies all types of performance where people devise a plan of action to use their resources within certain constraints, execute the plan in a timely fashion and check over or evaluate their work progress.
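As a minimal, domain-agnostic illustration of this feedback-control character (not an ATC implementation), the PDC cycle can be sketched as follows; the plan, do, and check callables are placeholders.

```python
def pdc_cycle(target, plan, do, check, max_cycles=10, tolerance=0.05):
    """Minimal Planning-Doing-Checking sketch: devise a plan towards the
    target, execute it, and evaluate the target-outcome gap, repeating
    while the gap remains too large (feedback control)."""
    gap = None
    for _ in range(max_cycles):
        action = plan(target)      # Planning: choose an action given the target
        do(action)                 # Doing: execute the action in the work domain
        gap = target - check()     # Checking: measure the outcome and compute the gap
        if abs(gap) <= tolerance:  # Gap corrected in time, so the cycle can stop
            break
    return gap
```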

5 Discussion and Conclusion

Changes in the ATM domain are of a permanent nature, and the challenges of research, development, and transition to introduce these changes are daily life for Air Navigation Service Providers and their staff, be they Air Traffic Controllers, technicians, engineers, managers, or decision makers. Automation is nothing new in the ATM system. The so-called new technologies driving digitalization, including AI and ML, are finding their way into the ATM working environment. Whereas a lot of expectation is linked to a so-called technology hype, the introduction of new technology will have to follow the path of introducing new technological components into a running ATM system. Linked to the regulatory and certification challenges, a lot of the modern technology will have to be interwoven into the existing architecture. This will create new challenges and surprises, and will not escape the rough journey of increasingly automated systems in ATM.
One of the driving arguments for the introduction of new technology is that costs of production are reduced because there are fewer Air Traffic Controller related costs, be it training, or the reliability and inefficiency of the practitioner. Designs that seek to optimize managerial values can have the effect, intentional or otherwise, of privileging the managerial objectives and, in doing so, constraining the humanistic design. The consequences are that the practitioners' degrees of freedom are reduced; buffers and margins are impacted in ways that limit the ability of the system to maintain and sustain adaptability when confronted with uncertainty and surprise events, thereby making the system less effective [62]. Additionally, increasing the distance between the Air Traffic Controllers and the system reduces the practitioners' ability to intervene in case of unexpected events. When work changes, as in the case of the introduction of modern technology in the operations room, there are consequences for the practitioners' ability to create strategies that can exploit system characteristics of agility and flexibility, in other words adaptive capacity. Boy [63] refers to this as a form of smart integration: designing for innovative complex systems that exploit the ability to understand increasing complexity. This means embracing complexity. A design that embraces complexity adopts the opposite of the reductionist view, which reduces or eliminates the effects of complexity by eliminating or reducing the role of the human; instead, it designs for complexity by matching emerging system behaviors with creative, emergent, real-time human responses.
The findings of this study are pending further validation and generalization due to the exploratory character of the research. Any associations and inferences drawn from this study are expected to remain relatively stable when studies of introducing new technology to OPS rooms are carried out in live settings. It is also hoped that this framework for supporting adaptive Human-AI teaming can provide a viable solution for the efficient introduction of innovative technology in OPS rooms.


References 1. Ju, S.: Strategic Research and Innovation Agenda - Digital European Sky (2020) 2. Ju, S.: European ATM Master Plan- Digitalising Europe’s Aviation Infrastructure (2020) 3. Malakis, S., et al.: Challenges from the Introduction of Artificial Intelligence in the European Air Traffic Management System. IFAC-PapersOnLine 55(29), 1–6 (2022) 4. Fitts, P.M. (ed.): Human Engineering for an Effective Air Navigation and Traffic-Control System. Ohio State University Research Foundation. Columbus (1951) 5. National Academies of Sciences, Engineering, and Medicine.: Human-AI Teaming: State-ofthe-Art and Research Needs. The National Academies Press. Washington, DC (2022) 6. Hollnagel, E., Woods, D.D.: Joint Cognitive Systems Foundations of Cognitive Systems Engineering. Taylor and Francis, London (2005) 7. Woods, D.D., Hollnagel, E.: Joint Cognitive Systems: Patterns in Cognitive Systems Engineering. CRC Press, Boca Raton (2006) 8. Militello, L.G., Dominguez, C.O., Lintern, G., Klein, G.: The role of cognitive systems engineering in the systems engineering design process. Syst. Eng. 13, 261–273 (2010) 9. Hollnagel, E.: Flight decks and free flight: Where are the system boundaries? Appl. Ergon. 38, 409–416 (2007) 10. Kontogiannis, T., Malakis, S.: Cognitive Engineering and Safety Organization in Air Traffic Management. CRC Press; Taylor & Francis, Boca Raton (2017) 11. Brittain, M., Yang, X., Wei, P.: A Deep multi-agent reinforcement learning approach to autonomous separation assurance. arXiv 2020, arXiv:2003.08353 (2020) 12. EASA. Artificial Intelligence Roadmap 1.0. European Aviation Safety Agency (2020) 13. EASA. First Usable Guidance for Level 1 Machine Learning Applications. European Aviation Safety Agency (2021) 14. Eurocontrol. A Patterns in How People Think and Work Importance of Patterns Discovery for Understanding Complex Adaptive Systems. Eurocontrol, Brussels (2021) 15. Eurocontrol. Digitalisation and Human Performance. Hindsight 33. Winter 2021–2022. Eurocontrol, Brussels (2021) 16. CANSO. Artificial Intelligence. CANSO Whitepapers Emerging Technologies for Future Skies. Civil Air Navigation Services Organization (2021) 17. CANSO. Virtualisation. CANSO Whitepapers Emerging Technologies for Future Skies. Civil Air Navigation Services Organization (2022) 18. Bainbridge, L.: Ironies of automation. Automatica 19, 775–780 (1983) 19. Parasuraman, R., Riley, V.: Humans and automation: use, misuse, disuse, abuse. Hum. Fact. 39 (1997) 20. Dekker, S.W., Woods, D.D.: To intervene or not to intervene: the dilemma of management by exception. Cogn. Technol. Work 1, 86–96 (1999) 21. Moray, N., Inagaki, T.: Laboratory studies of trust between humans and machines in automated systems. Trans. Inst. MC 21(4/5), 203–211 (1999) 22. Parasuraman, R., Sheridan, T.B., Wickens, C.D.: A Model for types and levels of human interaction with automation. IEEE Trans. Syst. Man Cybernet. Part A Syst. Hum. 30(3), May (2000) 23. Russell, S.J., Norvig, P., Davis, E.: Artificial Intelligence: A Modern Approach, 3rd edn. Prentice Hall, Upper Saddle River (2010) 24. Woods, D.D., Sarter, N.B.: Learning from automation surprises and ‘going sour’ accidents. In: Sarter, N.B., Amalberti, R. (eds.) Cognitive Engineering in the Aviation Domain, pp. 327–354. Lawrence Erlbaum Associates, Mahwah (2000) 25. Woods, D.D., Branlat, M.: Holnagell’s test: Being in control of highly interdependent multilayered networked systems. Cogn. Technol. Work 12, 95–101 (2010)

26. Woods, D.D., Dekker, D., Cook, R., Johannesen, L., Sarter, N.: Behind Human Error, 2nd edn. Ashgate Publishing, Farnham (2010) 27. Norman, D.: The Design of Everyday Things. MIT Press (2013) 28. Domingos, P.: A few useful things to know about machine learning. Commun. ACM 55(10), 78–87 (2012) 29. Alpaydin, E.: Introduction to Machine Learning, 3rd edn. MIT Press, Massachusetts Institute of Technology, Cambridge (2014) 30. Cohen, M.S., Freeman, J.T., Wolf, S.P.: Meta-recognition in time stressed decision making: recognizing critiquing and correcting. Hum. Factors 38, 206–219 (1996) 31. Colvin, K., Funk, K., Braune, R.: Task prioritization factors: two part-task simulator studies. Int. J. Aviat. Psychol. 15, 321–338 (2005) 32. D’Arcy, J.F., Della Rocco, P.: Air Traffic Control Specialist Decision Making and Strategic Planning – A Field Study (DOT/FAA/CT-TN01.05). DOT/FAA William J, Hughes Technical Center, Atlantic City International Airport, NJ (2001) 33. Gronlund, S.D., Dougherty, M.R., Durso, F.T., Canning, J.M., Mills, S.H.: Planning in air traffic control: impact of problem type. Int. J. Aviat. Psychol. 15, 269–293 (2005) 34. Klein, G.A.: Recognition-primed decisions. In: Rouse, W.B. (ed.) Advances in Man-Machine Systems Research, vol. 5, pp. 47–92. JAI Press, Greenwich (1989) 35. Klein, G.A.: Sources of Power: How People Make Decisions. MIT Press, Cambridge (1998) 36. Kontogiannis, T.: Stress and operator decision making in coping with emergencies. Int. J. Hum. Comput. Stud. 45, 75–104 (1996) 37. Kontogiannis, T.: Training effective human performance in the managing of stressful emergencies. Cogn. Technol. Work 1, 7–24 (1999) 38. Rantanen, E.M., Nunes, A.: Hierarchical conflict detection in air traffic control. Int. J Aviat. Psychol. 15, 339–362 (2005) 39. Reynolds, T.G., Histon, J.M., Davison, H.J., Hansman, R.J.: Structure, intent and conformance monitoring in ATC. In: Proceedings of the Air Traffic Management (ATM) Workshop on ATM System Architectures and CNS Technologies, Capri, Italy, 22–26 September (2002) 40. Seamster, T.L., Redding, R.E., Cannon, J.R., Purcell, J.A.: Cognitive task analysis of expertise in air traffic control. Int. J. Aviat. Psychol. 3, 257–283 (1993) 41. Woods, D.D.: Cognitive demands and activities in dynamic fault management: abduction and disturbance management. In: Standon, N. (ed.) Human Factors of Alarm Design. Taylor & Francis, London (1994) 42. Bowers, C.A., Jentsch, F., Salas, E., Brawn, C.: Analyzing communication sequences for team training needs assessment. Hum. Factors 40, 672–679 (1998) 43. Cannon-Bowers, J.A., Tannebaum, S.I., Salas, E., Volpe, C.E.: Defining competencies and establishing team training requirements. In: Guzzo, R.A., Salas, E. (eds.) Team Effectiveness and Decision Making in Organizations, pp. 333–381. Jossey-Bass, San Francisco (1995) 44. Entin, E.B., Entin, E.E.: Assessing team situation awareness in simulated military missions. In: Proceedings of the Human Factors and Ergonomics Society 44th Annual Meeting. Human Factors and Ergonomics Society Press, San Diego, CA, pp. 73–77 (2000) 45. Salas, E., Sims, D.E., Burke, C.S.: Is there a ‘“big five”’ in teamwork? Small Group Research 36, 555–599 (2005) 46. Salas, E., Cooke, N.J., Rosen, M.A.: On teams, teamwork, and team performance: discoveries and developments. Hum. Factors 50, 540–547 (2008) 47. Salas, E., Fiore, S.M.: Team Cognition: Understanding the Factors that Drive Process and Performance. American Psychological Association, Washington, DC (2004) 48. 
Orasanu, J.M.: Decision Making in the Cockpit. In: Wiener, E.L., Kanki, R.G., Helmreich, R.L. (eds.) Cockpit Resource Management, pp. 137–172. Academic Press, San Diego (1993) 49. Patterson, E.S., Watts-Perotti, J., Woods, D.D.: Voice loops as coordination aids in space shuttle mission control. Comput. Support. Coop. Work 8, 353–371 (1999)

50. Kontogiannis, T.: Adapting plans in progress in distributed supervisory work: aspects of complexity, coupling, and control. Cogn. Technol. Work 12, 103–118 (2010) 51. Malakis, S., Kontogiannis, T.: A sensemaking perspective on framing the mental picture of air traffic controllers. Appl. Ergon. 44, 327–339 (2013) 52. Malakis, S., Kontogiannis, T.: Exploring team sensemaking in air traffic control (ATC): Insights from a field study in low visibility operations. Cogn. Technol. Work 16, 211–227 (2014) 53. Malakis, S., Kontogiannis, T., Kirwan, B.: Managing emergencies and abnormal situations in air traffic control (Part I): Taskwork strategies. Appl. Ergon. 41, 620–627 (2010) 54. Malakis, S., Kontogiannis, T., Kirwan, B.: Managing emergencies and abnormal situations in air traffic control (Part II): Teamwork strategies. Appl. Ergon. 41, 628–635 (2010) 55. Klein, G., Snowden, D., Pin, C.L.: Anticipatory thinking. In: Mosier, K., Fischer, U. (eds.) Informed Knowledge: Expert Performance in Complex Situations. Psychology Press, London (2010) 56. Langan-Fox, J., Canty, J.M., Sankey, M.J.: Human–automation teams and adaptable control for future air traffic management. Int. J. Ind. Ergon. 39(5), 894–903 (2009) 57. Luokkala, P., Nikander, J., Korpi, J., Virrantaus, K., Torkki, P.: Developing a concept of a context-aware common operational picture. Saf. Sci. 93, 277–295 (2017) 58. Steen-Tveit, K., Munkvold, B.E.: From common operational picture to common situational understanding: an analysis based on practitioner perspectives. Saf. Sci. 142 (2021). https:// doi.org/10.1016/j.ssci.2021.105381 59. Stein, E.S., Della Rocco, P.S., Sollenberger, R.L. Dynamic re-sectorization in air traffic control: a human factors perspective. FAA, U.S. Department of Transportation. DOT/FAA/TCTN06/19 report (2006) 60. Thomke, S.: Enlightened experimentation. The new imperative for innovation. Harvard Bus. Rev. 79(2), 66–75 (2001) 61. Grote, G.: Management of Uncertainty: Theory and Application in the Design of Systems and Organizations. Springer, Berlin (2009) 62. Laursen, T., Smoker, A.J., Baumgartner, M., Malakis, S., Berzina, N.: Reducing the gap between designers and users, why are aviation practitioners here again? In: International conference on Cognitive Aircraft Systems – ICCAS, Toulouse France, June 2022 (2022) 63. Boy, G.: Human-Systems Integration. CRC Press (2020)

Assessing Performance of Agents in C2 Operations

Alexander Melbi(B), Björn Johansson, Kristofer Bengtsson, and Per-Anders Oskarsson

Swedish Defence Research Agency, Linköping, Sweden
[email protected]

Abstract. This paper describes an assessment tool for agents' performance in C2 operations. Aspects that are considered to affect performance, such as mental workload and situation awareness, have been applied to create this tool. In order to measure performance from multiple perspectives, generic performance measures from First Person Shooter games and other simulated environments have also been applied. As a result, the assessment tool consists of the Crew Awareness Rating Scale (CARS), the Situation Awareness Global Assessment Technique (SAGAT) and the NASA-Task Load Index (NASA-TLX). The tool also consists of generic performance measures for goal achievement, number of fratricide incidents, and number of soldiers killed in action. Three feedback sessions with platoon commanders were conducted in order to refine the assessment tool. Each feedback session was preceded by a simulation run conducted in a simulated combat environment. The feedback sessions resulted in removing and adding questions in the questionnaires, as well as including generic performance measures considered relevant. The assessment tool presented in this paper is considered effective, easy to use, cost effective, flexible, and objective.

Keywords: Command and control · C2 · Performance · Performance assessment · Military · NASA-TLX · CARS · SAGAT

1 Introduction

Agents operating in C2 environments perform their tasks in complex, rapidly changing, uncertain, time constrained, and high-risk contexts. In order to perform successfully, these agents need to perceive, interpret, and exchange large amounts of ambiguous information [1]. Loss of or poor Situation Awareness (SA) has typically been associated with costly or, in the worst case, considerable consequences [2]. Therefore, developing a tool to assess the performance of agents in C2 environments is important. Commanders must have a precise understanding of the current situation, including the mission goal, terrain, time constraints, knowledge of own troop location, condition, tactics, morale and casualties, as well as the enemy troops' numbers, locations, goals, tactics, and equipment [3]. Several approaches could be applied to assess the performance of a platoon commander. In this study, the measures were divided into two parts: (1) SA and Mental Workload
(MWL) and (2) generic performance measures from First Person Shooter (FPS) games and other simulation environments. FPS games were used to develop generic performance measures for the platoon commander since video/computer games are popular instructional tools for creating virtual simulations for training [5].
In order to create an assessment tool, different criteria can be used to guide the researcher. Meister [4] presents eight different criteria that the researcher can consider: effectiveness, ease of use, cost, flexibility, range, validity, reliability, and objectivity. Effectiveness concerns the extent to which the method accomplishes its purpose. Ease of use concerns how easy the method is to apply. Cost concerns several different aspects, such as monetary costs, data requirements, equipment needs, personnel, and time needed to apply the method. Flexibility concerns whether the method can be used in different contexts, with different system types, and at several system levels. Range concerns the number of phenomena, behaviors, and events that the method can analyze or measure. Validity concerns whether the method measures what it is intended to measure. Reliability concerns whether the method provides similar results if it is applied to the same phenomena. Finally, objectivity concerns whether the method is independent of the researcher's subjective biases, feelings, and interpretations.
The following sections describe the theoretical background, including the SA and MWL questionnaires; the experimental methodology applied in this paper, including the Situation Awareness Requirement Analysis and the three simulation runs conducted; and the results from the study, including the final version of the assessment tool, an example of scores, and interpretations of the scores. Finally, the conclusions drawn from the study are presented.

2 Theory

Situation Awareness (SA) refers to the level of awareness that agents have about a given situation [6]. Endsley defines SA as "… the perception of elements in the environment within specific spatial and temporal indications, an understanding of their significance and the projection of their status in the near future" [7, p. 36]. SA can be divided into two perspectives, individual and distributed. The individual perspective considers SA from the perspective of the individual agent, and the distributed perspective considers SA as distributed over several agents and artifacts that together form a system [6]. Within the individual perspective, the three-step model of SA is typically used to describe SA and will be used for the simulation runs. Endsley's [7] three-step model consists of (1) perception of elements in the environment, (2) understanding of the current situation, and (3) projection of future states. SA has been shown to be highly predictive of performance [8]. However, measuring SA is a difficult task and has proven to be a challenge for the Human Factors research area. In order to evaluate SA in a C2 environment, a multi-dimensional approach is recommended [3, 9].
Mental workload (MWL) represents the balance between a person's cognitive resources and the demands of the task at hand when a certain level of performance is desired to be maintained [10]. In human factors and ergonomics, MWL is one of the most widely used concepts and it has been shown to be a topic of increasing importance [11]. Meeting the requirements of a task can lead to unbalanced MWL that can degrade
performance [12]. Depending on the task, fatigue, stress, and even accidents may be caused. Therefore, MWL is an important aspect to include in assessments of agents in C2 operations.

2.1 NASA-Task Load Index

The NASA-Task Load Index (NASA-TLX) measures a participant's overall workload when performing one or more tasks. The original domain of application for NASA-TLX was aviation, but the scale is now used to assess workload in several domains, such as automobile, medicine, and combat contexts [16]. The NASA-TLX questionnaire contains six different dimensions: mental load, physical load, time pressure, performance, effort, and frustration. The assumption is that by combining the measures from each dimension, the overall workload experience of the participant will be represented. The participants answer the questions on a scale ranging from 0 (low) to 100 (high). For performance, the direction of the scale is reversed, from 0 (good) to 100 (poor). The participants can fill out the questionnaire either during or after a task. To account for differences between participants, a weighting procedure is conducted, preferably before administering the questionnaire. The weighting procedure is conducted by presenting 15 paired comparisons of the six dimensions (e.g. temporal demand vs. frustration). For each pairwise comparison, the participant selects the dimension with the largest importance for subjectively experienced workload. When all 15 pairwise comparisons have been completed, a score is given to each dimension ranging from zero to five. The weighted workload index is calculated by multiplying each dimension's score with its weight [17].
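To make the weighting procedure concrete, the sketch below (purely illustrative, not part of the study's tooling) derives the dimension weights from the 15 pairwise comparisons and computes the weighted workload index. It uses the standard six dimensions, whereas this study later removed the physical demand scale (see Sect. 3.4), and the division by 15 follows the conventional NASA-TLX procedure so that the index stays on a 0-100 scale.

```python
from collections import Counter
from itertools import combinations

DIMENSIONS = ["mental", "physical", "temporal", "performance", "effort", "frustration"]

def tlx_weights(pairwise_choices):
    """Derive dimension weights (0-5) from the 15 pairwise comparisons.
    `pairwise_choices` maps each dimension pair to the dimension the
    participant judged more important for experienced workload."""
    wins = Counter(pairwise_choices[pair] for pair in combinations(DIMENSIONS, 2))
    return {dim: wins.get(dim, 0) for dim in DIMENSIONS}

def weighted_tlx(ratings, weights):
    """Weighted workload index: each 0-100 rating multiplied by its weight,
    summed over the dimensions, and divided by the 15 comparisons."""
    return sum(ratings[d] * weights[d] for d in DIMENSIONS) / 15.0

# Hypothetical example (performance is rated 0 = good, 100 = poor):
choices = {pair: pair[0] for pair in combinations(DIMENSIONS, 2)}  # dummy comparison outcomes
weights = tlx_weights(choices)
ratings = {"mental": 55, "physical": 20, "temporal": 60,
           "performance": 30, "effort": 50, "frustration": 25}
print(weighted_tlx(ratings, weights))  # overall weighted workload index
```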


2.2 Crew Awareness Rating Scale

The Crew Awareness Rating Scale (CARS) is a subjective measurement of a participant's SA developed for military missions. CARS has previously been used for measuring the SA of military commanders using a digitized C2 system in a simulated combat scenario [15]. The CARS questionnaire consists of eight questions. Three questions assess the participant's ease of identification, understanding, and projection of SA elements, which refer to Endsley's three-step model. Four questions assess the participant's mental effort to identify, understand, and project future SA elements. One question assesses how much the participant uses their identification, understanding, and projection of SA elements to make decisions. For each question, the participants rate their SA on a scale from 1 (worst case) to 4 (best case). The advantage of CARS is its low requirements for training, both of the researcher and the participant. It only requires pen and paper and does not need subject matter experts for assessing SA. Further, the time it takes for the participant to complete the questionnaire is considered low [9].

2.3 Situation Awareness Global Assessment Technique

The Situation Awareness Global Assessment Technique (SAGAT) is a method for measuring participant SA. It has mainly been used in high fidelity or medium fidelity simulations [13]. SA is measured by SAGAT questions, which the participant answers during predetermined freezes in the simulation. At each freeze, the participant is presented with a random selection of questions that are to be answered within a fixed time. There are two advantages of using SAGAT: it provides a direct measure of the participant's SA by not relying on data that is subjective or obtained after the entire simulation has been completed, and it is one of the most common and validated methods for measuring SA [9]. The questions included in the SAGAT depend on the scenario in which the study is performed. To create the questions for the SAGAT questionnaire, a Situation Awareness Requirement Analysis (SARA) can be performed. The purpose of SARA is to determine what tasks an actor performs in a given environment, while strengthening the validity of the SA assessment method applied. SARA is carried out by showing a subject matter expert a graphic representation of goals sorted in a hierarchical structure. The subject matter expert then evaluates how well the goals correspond to reality [14].
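As a minimal illustration of the SAGAT administration just described (not the implementation used in this study), a single freeze could be scored as follows; question_bank, ground_truth, and answer_fn are hypothetical placeholders.

```python
import random

def sagat_freeze_score(question_bank, ground_truth, answer_fn, n_questions=10, seed=None):
    """Administer one SAGAT freeze: draw a random subset of questions,
    collect the participant's answers, and score them against the ground
    truth captured from the simulation at the moment of the freeze."""
    rng = random.Random(seed)
    queried = rng.sample(list(question_bank), k=min(n_questions, len(question_bank)))
    correct = sum(1 for q in queried if answer_fn(q) == ground_truth[q])
    return correct, correct / len(queried)
```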

3 Method

The assessment tool applies a multiple-measure approach in order to assess the performance of a platoon commander in a simulated environment. NASA-TLX measures the participants' self-experienced workload, CARS indirectly measures SA, SAGAT directly measures SA, and the generic performance measures assess tasks and goals relevant to the platoon commander. The generic performance measures provide an overarching result of the actual outcome of the mission. This is done by assessing whether the goal of the mission was achieved, how much fratricide occurred, and how many soldiers were killed in action. An iterative approach containing three simulation runs and three feedback sessions was conducted to refine the assessment tool.

3.1 Participants

One participant, the platoon commander, was studied during each simulation run. The first participant had 4 years of previous experience serving as a platoon commander in the Swedish Armed Forces. The second participant had served as a platoon commander for 6 years in the Swedish Armed Forces. The first participant took part in simulation runs 1 and 3, as well as in the SARA. The second participant took part in simulation run 2. One experiment leader played the role of game master and company commander. Additional personnel played the roles of gunner and driver for the platoon commander's IFV, as well as two drivers operating one IFV each. The platoon commander operated one of the IFVs.

3.2 Simulation Runs

The three simulation runs were conducted in the 3D virtual training environment Virtual Battle Space 3 (VBS3) with a first-person-shooter perspective and multiplayer capabilities. VBS3 is commonly used for tactical training experimentation and mission rehearsal for land, sea, and air warfare. In VBS3, the player can operate different vehicles such as
an Infantry Fighting Vehicle (IFV) (see Fig. 1). The simulation setup was composed of one station for the participant, one for the experiment leader, and four for the additional personnel (see Fig. 2).

3.3 Situation Awareness Requirement Analysis

A Situation Awareness Requirement Analysis (SARA) was conducted before the simulation runs and feedback sessions. The purpose of performing a SARA was to gain an understanding of the tasks and goals of a platoon commander in a combat scenario and gather information for the SAGAT questions. A graphical representation of an infantry platoon commander's goals in Military Operations in Urbanized Terrain (MOUT) from [14] was presented to the participant. The participant was given the task of stating whether the goals were relevant for a platoon commander in a combat scenario. The relevant goals considered were: avoid casualties, avoid detection by the enemy, maintain troop readiness, defend against attack, avoid fratricide, negate enemy threat, prioritize enemy threats, make enemy position untenable, obscure/avoid enemy line of fire, change enemy behavior, engage enemy, reach point X at time Y, determine route, determine formation/order of movement, attack to objective, plan/modify plan to accomplish mission, and reorganize.

Fig. 1. The platoon commander operating an IFV during a simulation run.


Fig. 2. Physical simulation environment.

3.4 Materials and Measures

Paper copies of the NASA-TLX, CARS, and SAGAT questionnaires, as well as maps, were distributed to the platoon commander. The maps showed an overview of the scenario. A pen and one red and one blue whiteboard marker were provided to the platoon commander in order to fill out the questionnaires and create markings on the maps. Three video cameras were used to record sound and video material from the simulation runs and feedback sessions. The NASA-TLX, CARS, and SAGAT questionnaires, as well as the generic performance measures, were used to assess the performance of the platoon commander. The generic performance measures used in the simulation runs were: achieving the goal of the mission, fratricide, number of soldiers killed in action, number of enemies killed, number of enemies wounded, and kill participation (the percentage of a soldier's kills in relation to the total number of kills in the company).


Five of Meister's [4] criteria were used to support the development of the assessment tool in this study: effectiveness, ease of use, cost, flexibility, and objectivity. The assessment tool should be effective, meaning that the measurements assess performance related to the platoon commander's tasks and goals. The assessment tool should also be easy to use, meaning it should not require training from the participants when administered and should not require a subject matter expert to assess the results. The assessment tool should also be cost effective, meaning it should have a low application time to avoid personnel-related costs. It should also be flexible, meaning that the assessment tool could be applied, through smaller modifications, in different contexts with other C2 agents. Finally, the assessment tool should be objective, meaning that the results should not rely on the researcher's subjective bias, emotions, or interpretations to evaluate.
Some modifications were made before the simulation runs in order to adapt the questionnaires to the study. The scales for the NASA-TLX were changed to 1–20 and the physical demand scale was removed. As the NASA-TLX was answered on a 1–20 scale, the scores had to be multiplied by 5 in order to convert them to the original 1–100 scale. The CARS was also answered by the participants on a 1–20 scale. In order to convert the CARS scores to the original 1–4 scale, each score was divided by 5. The participants were not presented with a randomized selection of questions, and they did not have a fixed time limit to answer the questions in the SAGAT. The SAGAT questions were divided into two parts. The first part was answered by the participant making markings on a map of friendly and enemy positions, as well as which of the enemies had the weakest and strongest positions. These questions were scored as correct if the markings were within 100 m of the actual position of own or enemy units. The correctness of the map markings was checked by reviewing the video material recorded in the After Action Review (AAR) in VBS3. The second part consisted of the participant answering questions such as "Can the mission be completed during the fixed time limit?" and "Does the enemy have observation of your platoon?". These questions were scored by comparing the participant's answers with what actually happened during the simulation run, through the AAR.
After each feedback session, the video recordings were analyzed. The analysis consisted of listening to and writing down the feedback given by the platoon commanders. The feedback was then used to modify the questionnaires, which were then distributed to the platoon commander during the next simulation run. The generic performance measures were also modified according to the feedback session in simulation run 3.
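A small sketch of the score handling described above is shown below; it is illustrative only, and the coordinate representation of the map markings is an assumption (in the study, markings were checked against positions recovered from the VBS3 After Action Review).

```python
import math

def convert_tlx(score_1_to_20):
    """Convert a 1-20 NASA-TLX response back to the original scale by multiplying by 5."""
    return score_1_to_20 * 5

def convert_cars(score_1_to_20):
    """Convert a 1-20 CARS response back to the original 1-4 range by dividing by 5."""
    return score_1_to_20 / 5

def marking_correct(marked_xy, actual_xy, tolerance_m=100.0):
    """Score a SAGAT map marking as correct if it lies within 100 m of the
    unit's actual position (positions given as (x, y) map coordinates in metres)."""
    return math.dist(marked_xy, actual_xy) <= tolerance_m

# Hypothetical usage:
print(convert_tlx(8), convert_cars(14), marking_correct((120, 340), (180, 390)))
```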


3.5 Procedure

The simulation runs began with the company commander briefing the platoon commander on the attack that would be performed in VBS3. The company commander explained that the goal of the scenario was to seize a village occupied by enemy troops. The company commander drew seven battle lines on a map and indicated where the platoon should be positioned at each battle line. This map was copied and handed to the platoon commander during the simulation runs. Finally, the company commander also described where s/he thought the enemies were positioned. During the simulation runs, freezes occurred when the platoon commander reached battle lines 1, 2, and 4. During the freezes, the experimenter handed out the NASA-TLX, CARS, and SAGAT questionnaires for the platoon commander to fill out. Some of the questions in the SAGAT questionnaire required the platoon commander to make markings on the map. The generic performance measures were collected through in-game statistics in VBS3. All simulation runs ended with a feedback session. During the feedback sessions, the platoon commander provided verbal feedback on the questionnaires. The feedback comprised comments on the formulation of the questions, the relevance of the questions in relation to a platoon commander's tasks in the given combat scenario, as well as reflections on the generic performance measures. After each completed simulation run, the questionnaires were modified in accordance with the platoon commander's feedback. For a more detailed description of the procedure, see Melbi [18].

4 Results

First, the modifications made to the assessment tool after the feedback sessions are described, followed by the final version of the assessment tool. This is followed by examples of scores from the assessment tool and how the scores can be interpreted.

4.1 Final Version of the Assessment Tool

The modifications of the assessment tool made after the feedback sessions with the participants comprised both reformulation of questions and layout changes, such as adding boxes to indicate similarities between questions. The final version of the assessment tool can be seen in Fig. 3, Fig. 4, and Fig. 5. As the participants did not perform only one single task before each freeze of the simulation, they found the formulations of the NASA-TLX questions confusing (e.g. "How mentally demanding was the task?"). Due to this feedback, the questions were rephrased to end with "… during the last 10 min?". One of the participants was also unsure whether the frustration question in NASA-TLX referred to technical problems in VBS3 and keyboard shortcuts. A short text was added to clarify that this was not the focus of the frustration question.


The participants considered the questions in the CARS questionnaire to be vague; therefore, specific examples were added for each question, such as "e.g. your own and enemy position". Five SAGAT questions were considered irrelevant by the participants and were therefore removed from the SAGAT instrument. Three of the generic performance measures were removed due to their limited relevance for the platoon commander: the number of wounded enemies, kill participation, and the number of enemies killed. The participants considered that the most important generic performance measurement was completing the goal of the mission, followed by how much fratricide occurred. The number of soldiers killed in action was also considered to be relevant.

Fig. 3. Final version NASA-TLX questionnaire.


Fig. 4. Final version CARS questionnaire.


Fig. 5. Final version SAGAT questionnaire.


4.2 Example of Scores

Examples of NASA-TLX, CARS, and SAGAT scores can be seen in Table 1, and the scores for the generic performance measures can be seen in Table 2. The example scores are from simulation run 3, which lasted for 1 h and 40 min. Only the results from simulation run 3 are presented, since the questions in the questionnaires changed after each simulation run.

Table 1. Results from questionnaires in simulation run 3. NASA-TLX (scores range between 0–100), CARS (scores range between 0–4), and SAGAT (scores range between 0–15; the percentage depends on the number of questions correctly answered).

Questionnaire   Battle line 1   Battle line 2   Battle line 4
NASA-TLX        32              41              48
CARS            3.4             2.7             3.5
SAGAT           9 (60%)         6 (40%)         5 (33%)

Table 2. Results from generic performance measures in simulation run 3.

Goal of the mission completed   Number of fratricide   Number of soldiers killed in action
Yes                             0                      23

4.3 Interpreting the Scores

In order to assess the performance of the participant, the scores from the questionnaires and generic performance measures (see Table 1 and Table 2) can be compared. As CARS measures self-experienced SA and SAGAT directly measures which SA aspects the participant is aware of, the two measures can be compared for similarities or discrepancies. For example, it can be investigated whether a high/low score from CARS correlates with a high/low score from SAGAT. The NASA-TLX scores provide a measure of self-experienced workload, which can also be compared with the SA measurements for similarities or discrepancies.
The results from simulation run 3 show that the number of correct SAGAT answers decreases from battle line 1 to battle line 4, indicating decreasing situational awareness during the simulation run. These scores can be compared to the NASA-TLX scores, which increased from battle line 1 to battle line 4. This is logical, since decreasing SA is generally related to increasing mental workload. Interestingly, the CARS scores showed a different pattern, decreasing between battle lines 1 and 2 and then increasing between battle lines 2 and 4. The NASA-TLX scores indicate that the subjective experience of increased workload is related to situational awareness as measured by the SAGAT in this study. The CARS
scores, however, show a different pattern from the SAGAT and NASA-TLX scores. This could serve as a basis for asking the participant why they experienced high levels of SA at battle line 4, when their workload was at its highest point and the SAGAT score was at its lowest point. The generic performance scores provide a general assessment of a platoon commander's performance in a simulated combat scenario. Analyses of the relation between the generic performance scores and the scores from the questionnaires can provide further understanding of the platoon commander's performance.
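One possible way to automate the kind of cross-comparison described in this subsection is sketched below, using the Table 1 scores; the function and data layout are illustrative and not part of the published tool.

```python
def flag_sa_discrepancies(cars, sagat, tlx, battle_lines):
    """Flag transitions between freezes where self-rated SA (CARS, 0-4) moves in
    the opposite direction to objectively measured SA (SAGAT, proportion correct),
    as a prompt for follow-up questions to the participant."""
    flags = []
    for prev, curr in zip(battle_lines, battle_lines[1:]):
        cars_delta = cars[curr] - cars[prev]
        sagat_delta = sagat[curr] - sagat[prev]
        if cars_delta * sagat_delta < 0:  # one measure rose while the other fell
            flags.append((prev, curr, cars_delta, sagat_delta, tlx[curr] - tlx[prev]))
    return flags

# Scores from Table 1 (battle lines 1, 2, and 4):
cars = {1: 3.4, 2: 2.7, 4: 3.5}
sagat = {1: 0.60, 2: 0.40, 4: 0.33}
tlx = {1: 32, 2: 41, 4: 48}
print(flag_sa_discrepancies(cars, sagat, tlx, [1, 2, 4]))  # flags the 2 -> 4 transition
```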

5 Conclusions

The purpose of this paper was to assess the performance of a platoon commander in a simulated combat scenario. As a result, an assessment tool has been developed, focusing on mental workload, situation awareness, and generic performance measures. The assessment tool consists of modified versions of the NASA-Task Load Index (NASA-TLX), the Crew Awareness Rating Scale (CARS), and the Situation Awareness Global Assessment Technique (SAGAT). The tool also consists of generic performance measures of goal achievement, fratricide, and number of soldiers killed in action. The assessment tool is considered effective, easy to use, cost effective, flexible, and objective.

References 1. Riley, J.M., Endsley, M.R., Bolstad, C.A., Cuevas, H.M.: Collaborative planning and situation awareness in Army command and control. Ergonomics 49, 1139–1153 (2006). https://doi.org/ 10.1080/00140130600612614 2. Salmon, P., Stanton, N., Walker, G., Green, D.: Situation awareness in military command and control (C4I) systems: the development of a tool to measure SA in C4I systems and battlefield environments. Stage 1: SA Methods Review. In: Human Performance Situation Awareness and Automation, pp. 44–49. Psychology Press, East Sussex (2004) 3. Endsley, M., Garland, D., Wampler, R., Matthews, M.: Modeling and measuring situation awareness in the infantry operational environment. 126p (2000) 4. Meister, D.: Behavioral Analysis and Measurement Methods. Wiley, New Jersey (1985) 5. Grimshaw, M., Lindley, C.A., Nacke, L.: Sound and immersion in the first-person shooter: Mixed measurement of the player’s sonic experience. In: Proceedings of the Audio Mostly Conference - A Conference on Interaction with Sound. pp. 9–15 (2008) 6. Stanton, N.A., Salmon, P.M., Rafferty, L.A., Walker, G.H., Baber, C., Jenkins, D.P.: Human Factors Methods: A Practical Guide for Engineering and Design, 2nd edn., CRC Press, London (2013) 7. Endsley, M.R.: Toward a theory of situation awareness in dynamic systems. Hum Factors. 37, 32–64 (1995). https://doi.org/10.1518/001872095779049543 8. Endsley, M.R.: A systematic review and meta-analysis of direct objective measures of situation awareness: a comparison of SAGAT and SPAM. Hum Factors. 63, 124–150 (2021). https:// doi.org/10.1177/0018720819875376 9. Salmon, P., Stanton, N., Walker, G., Green, D.: Situation awareness measurement: a review of applicability for C4i environments. Appl. Ergon. 37, 225–238 (2006). https://doi.org/10. 1016/j.apergo.2005.02.001


10. Virtanen, K., Mansikka, H., Kontio, H., Harris, D.: Weight watchers: NASA-TLX weights revisited. Theor, Issues Ergon. Sci. 23, 725–748 (2022). https://doi.org/10.1080/1463922X. 2021.2000667 11. Young, M.S., Brookhuis, K.A., Wickens, C.D., Hancock, P.A.: State of science: mental workload in ergonomics. Ergonomics 58, 1–17 (2015). https://doi.org/10.1080/00140139.2014. 956151 12. Mansikka, H., Virtanen, K., Harris, D., Jalava, M.: Measurement of team performance in air combat – have we been underperforming? Theor. Issues Ergon. Sci. 22, 338–359 (2021). https://doi.org/10.1080/1463922X.2020.1779382 13. Endsley, M.: Measurement of situation awareness in dynamic systems. Hum Factors. 37, 65 (1995). https://doi.org/10.1518/001872095779049499 14. Matthews, M.D., Strater, L.D., Endsley, M.R.: Situation awareness requirements for infantry platoon leaders. Mil. Psychol. 16, 149–161 (2004). https://doi.org/10.1207/s15327876mp1 603_1 15. McGuinness, B., Ebbage, L.: Assessing human factors in command and control: workload and situational awareness metrics. 12p (2002) 16. Hart, S.G.: NASA-task load index (NASA-TLX); 20 years later. In: Proceedings of the Human Factors and Ergonomics Society, pp. 904–908 (2006) 17. Hart, S.G., Staveland, L.E.: Development of NASA-TLX (Task Load Index): results of empirical and theoretical research. Adv. Psychol. 52, 139–183 (1988) 18. Matthews, M.D., Strater, L.D., Endsley, M.R.: Situation awareness requirements for infantry platoon leaders. Military Psychol. 16, 149–161 (2004). https://doi.org/10.1207/s15327876 mp1603_1 19. Melbi, A.: Measuring a platoon commander’s performance in a complex, dynamic and information rich environment. Linköping University (2021)

Research and Application of Intelligent Technology for Preventing Human Error in Nuclear Power Plant

Yang Shen(B), Xiang Ye, and Di Zhai

Research Institute of Nuclear Power Operation, Wuhan, Hubei, China
[email protected]

Abstract. The prevention of human error is a key issue in nuclear power plant operation, since a plant is a complex human-machine system. With the rapid development of intelligent technology, human error defense has gradually developed from traditional human-defense to intelligent-defense, with the aim of further improving work efficiency and human reliability. This study analyzes and summarizes the human error problems that occur during procedure execution in nuclear power plants, such as selecting the wrong operation object, failing to confirm a status response, skipping or missing steps in the procedure, and lack of monitoring of key steps. Taking rule-based tasks (panel operation) in a nuclear power plant as the research object, and combining human error prevention rules with the concept of intelligent prevention, this study uses AR + AI technology, including image and voice recognition, to develop an intelligent human error prevention device. This device realizes intelligent supervision of, and reminders throughout, the operators' entire behavior process, reducing cognitive load and preventing human error. Compared with the traditional concept of human error prevention based on human-defense, this study emphasizes the use of AI technology to assist in improving the safety and risk awareness of the operators; intelligent-defense can replace traditional human-defense at human failure points more effectively, to achieve the goal of truly reducing human error and improving work efficiency.

Keywords: Human Error Prevention · Human-machine Interaction · Intelligence · Augmented Reality · Safety

1 Background of Intelligent Human Error Prevention in Nuclear Power Plant

Safety is the lifeline of a nuclear power plant (NPP), and preventing human error is closely related to every task. Based on the analysis of human error modes in a large number of human-factor events in NPPs, and on human nature, operational errors can usually be divided into errors of omission (EOO) and errors of commission (EOC). The former refers to the operators forgetting, or not performing, part of a task due to memory error during the execution of the task. The latter refers to the failure to perform the task


in the correct way, due to misallocation of attention resources during the execution of the task.

Aiming at these modes of human error, a series of human error prevention tools and behavior specifications have been developed specifically to prevent EOO and EOC. The key point is to require on-site personnel to use the human error prevention tools correctly and effectively. According to the work order statistics of the China Nuclear Power Production Management System, there are more than 20,000 on-site operations at nuclear power sites each year, including more than 500 high-frequency, high-risk operations. To avoid human error, each operation involves the use of at least three human error prevention tools. Although workers in NPPs have been required to use human error prevention tools in a standardized manner, human-related incidents are still unavoidable. Root-cause analysis found that these incidents are caused by insufficient use of human error prevention tools. This shows that the traditional method of preventing human error has encountered a bottleneck, which is a problem the nuclear power industry urgently needs to solve. With the digital and intelligent transformation of China's NPPs, the demand for intelligent assistance to prevent human error is growing strongly. Considering the development status of nuclear power, and through communication with nuclear power owners and different professionals as well as the analysis of human-related incidents, the following main pain points were identified.

Ineffective Use of Human Error Prevention Tools or Failure to Use Them Correctly. There are massive rule-based tasks in the nuclear power industry, and they are solidified in the form of procedures. In the process of executing tasks, workers are required to use human error prevention tools to avoid human errors. To ensure that the executor uses the prevention tools correctly and effectively, the task parts that require the use of human error prevention tools are usually explained in the pre-job meeting, and the names of the prevention tools needed in key steps are marked on the procedures. However, due to the lack of supervision, human-related incidents caused by unregulated use of those prevention tools still occur. It is therefore urgent to reduce the psychological and physical load of workers through intelligent supervision, to improve the effectiveness of human error prevention tools, and to reduce costs and human errors.

High Monitoring Cost of Human Error Prevention in NPPs, and an Urgent Need for Intelligent, Unmanned Supervision of the Use of Human Error Prevention Tools. In the supervision process of human error prevention in NPPs, both an executor and a guardian must be assigned to a job, and high requirements are placed on the qualifications of the guardian, who must observe and supervise the executor's operation in real-time throughout the whole process. Since many tasks on an NPP site require supervision, major human-related incidents usually occur when the supervision of some tasks is inadequate. Intelligent monitoring to prevent human errors can replace the existing guardian, greatly improve the effectiveness of monitoring, and reduce the labor cost of NPPs.
Therefore, the goal of this study is to use AR + AI technology to improve the monitoring of on-site workers, and to solve problems observed on NPP sites such as failure to confirm the operation object or status response according to the requirements of


the procedures, lack of supervision or invalid monitoring of key steps of the procedures, and loss of key information in the three-way communication process.

2 Intelligent Human Error Prevention Model Based on AR + AI Technology

2.1 Development Status of Intelligent Human Error Prevention

Human error (HE) is generally understood as human actions and results that deviate from the stated goals and produce or cause potential adverse effects, thus posing a risk to any industry that relies on humans to complete tasks, especially safety-critical industries such as aerospace, healthcare, and nuclear power. Preventing human error is a key issue in the operation of an NPP, since it is a complex man-machine system. With the rapid development of intelligent technology, human error prevention has gradually evolved from traditional prevention by humans to intelligent prevention, with the goal of further improving human efficiency and reliability.

Recently, driven by the rapid development of deep learning, artificial intelligence (AI) technology has been applied to complex industries such as NPPs. Ahn, Bae, and Lee developed a framework based on deep neural networks and CPNs to identify operator errors and to reduce human errors in NPP operations [1]. Park, Jo, and Na proposed an anomaly detection model based on long short-term memory variational autoencoders (LSTM-VAE) to detect system and component anomalies, thereby reducing human error in NPP diagnostic tasks [2]. An accident diagnosis method based on a Bayesian classifier was proposed for NPPs; this method combines knowledge-driven and data-driven approaches and has strong robustness and interpretability [3]. Based on typical human failure modes, Shen analyzed the intelligent technical means adopted in different tasks to prevent human error from the two perspectives of cognition and behavior [4]. Yang and Hu used intelligent image recognition technology to automatically identify scanned documents, find abnormal images in massive scanned data, and assist document managers in processing them quickly [5]. Bae, Ahn, and Lee predicted the trend of 55 equipment parameters and detected the presence of human error by using artificial neural networks [6]. Nishiura, Nambu, Maruyama, and Wada analyzed the electroencephalogram (EEG) signals of human operators and predicted human errors using shallow CNN-based anomaly detection [7].

Augmented reality (AR) is a technology for enhancing or extending the human visual system: it integrates computer-generated graphics, text, annotations, and other information into the real-world scene seen by the user, thereby providing virtual information that does not exist in the real world and helping people better perceive and understand their environment. AR technology can therefore help prevent human error in complex systems. For example, AR devices allow operators to send instructions to the system directly through gestures, eye movements, etc. from a fixed position, greatly reducing the operating burden [8]. Li, Chen, and Zhang proposed an MR-based equipment maintenance assistance system for NPPs using the Microsoft HoloLens device [9]. Momin, Panchal, Liu, and Perera established a Personal


Augmented Reality Reference System (PARRS) to minimize human error by digitizing procedures and monitoring the environment in real time [10]. Yan and Yue proposed a simulator operation and maintenance support system architecture based on augmented reality technology, which realizes digital management of the whole operation and maintenance process [11]. Using VR and AR technology, Popov et al. built a virtual reality environment for operator training, achieving the purpose of training while ensuring safety [12]. In addition, head-mounted equipment can be used to display visual and dynamic operation guidance on the corresponding buttons or interfaces within the line of sight, thereby helping the operator reduce the error rate and workload during operation. For example, Park, Kim, Choi, and Lee proposed an intelligent task assistance method using deep learning-based object detection and instance segmentation with wearable AR technology, achieving better complex task performance with less cognitive load [13]. With the help of digital twins and wearable AR devices in the assembly process, adaptive assembly process perception based on the fusion of real and virtual scenes can enhance the synergy between physical systems and network systems [14].

In summary, the main countermeasure against human error in NPPs so far still relies on the individual. However, this anti-error strategy faces the problems of insufficient human resources due to the aging population, talent dilution, high labor costs, and the low efficiency of multiple inspections. If AI technology is used to replace the anti-error strategy that relies on humans, it can not only substitute for manpower in double inspections and reduce manpower expenditure, but also essentially eliminate the errors caused by subjective influences such as human emotion and experience, since AI technology strictly relies on the logical inference of computing components. Therefore, human error prevention tools that combine AI algorithms and AR technology are technically feasible and have broad application prospects.

2.2 Human Error Mechanism and Behavior Pattern

Human error refers to the result of human behavior that deviates from the stated goal or exceeds the acceptable limit, resulting in adverse effects. If the human brain is regarded as a processing system, the input is stimulus and the output is behavior. The process of humans acquiring and processing information is shown in Fig. 1. Human errors may occur at any stage of the process, such as information perception, information stimulation, information processing, thinking, and behavior output. As shown in Fig. 1, since an individual's attention resources are limited, attention allocation is critical during information processing. If the task is relatively new, difficult, or special, and has an important impact on safety, more attention is allocated to perception and thinking. If the task depends more on operational skills, more attention is allocated to the action than to the previous two stages (perception and thinking). In total, the attention used for perception and thinking is always greater than the attention used for action, which means that mental activities require more attention than physical activities; unfamiliar tasks require more attention than familiar tasks; and more attention is devoted to complex, important, and valued tasks than to simple, less important tasks.

Fig. 1. The process of humans acquiring and processing information.

Therefore, human nature (internal) and situational characteristics (external) mean that most human errors in NPPs are unintentional and are caused by careless actions. The AR + AI intelligent monitoring technology for rule-based tasks mainly enhances individual attention and performance in perception, thinking, and action, and addresses the following kinds of unintentional human error in human-computer interaction.

Perception Error. For example, errors in selecting the operation/display object, or in reading information.

Thinking Error. For example, skipped steps in the procedures, or failure to confirm the display status of an instrument or light.

Action Error. For example, a switch operation that is not performed correctly.

3 Intelligent Prevention Model of Human Error in NPPs

Traditional prevention of human error mainly depends on guardians monitoring the use of human error prevention tools to defend against human error in the work process. The monitoring flow of human error prevention is shown in Fig. 2.

Fig. 2. The monitoring flow of human error prevention.

Combining the role requirements of the guardians with the characteristics of human-computer interaction, AR and AI technologies can be used to replace the guardian's eyes (visual positioning), mouth (voice), and brain (recognition and judgment). Using AR + AI technology, they can take over the guardian's role, realize intelligent monitoring, and address the perception errors, thinking errors, and action errors that people are prone to in the work process.

Usually, an on-site task is executed based on operating procedures, and the executor holds the operating procedures to complete the operation steps. The intelligent monitoring robot collects information through AR glasses and makes judgments by comparing the collected information with the information required by the procedure in real-time, such as device location information (whether the perception is wrong), device status information (whether the perception is wrong), operation intention (whether the thinking is wrong), and the operation execution and status response after execution (whether the execution is wrong), to determine whether the executor's operation is correct. If it is correct, the robot gives a clear instruction, "execute operation", to the operator and marks the operation step. If it is not correct, the robot gives a clear warning, "do not operate", thus effectively addressing human errors such as object selection errors, sequence selection errors, operation timing errors, and status monitoring errors. The intelligent monitoring model is shown in Fig. 3.


Fig. 3. Intelligent monitoring model.

Based on the intelligent monitoring model, it is clear that the augmented reality scene formed by AR technology must serve as the user interaction interface during intelligent monitoring. On the one hand, the front end must collect image and voice information from the executor in real-time, and analyze capture skills and rules based on the precise positioning and movement changes of the user obtained with AR technology, so as to determine the scope and capture time of the operation target. On the other hand, the back end must establish an AI model to identify the information collected by the front end. The best control method can be obtained through parameter optimization of different models (target detection, image classification, semantic segmentation, and speech recognition) and through tuning the application of data augmentation techniques. Additionally, the AI model can compare the observed information with the information required by the procedures in


real-time and make judgments, then quickly issue operating instructions to the executor, to achieve the purpose of supervision.
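To make this comparison step concrete, the following is a minimal sketch, not the authors' implementation, of the kind of rule check the back end could perform for a single procedure step: the recognized device, the spoken command, and the observed panel status are compared with the step definition, and the result is either an "execute operation" confirmation or a warning. All field names and example values are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class ProcedureStep:
    step_id: int
    device_id: str         # device the step applies to
    action: str            # e.g. "rotate switch to ON"
    expected_status: str   # panel status expected after the action

@dataclass
class Observation:
    device_id: str         # device recognized from the AR glasses' video
    spoken_command: str    # command recognized from the operator's voice
    panel_status: str      # status recognized on the panel after operation

def check_step(step: ProcedureStep, obs: Observation) -> str:
    """Compare what the operator is doing / has done with the procedure step."""
    if obs.device_id != step.device_id:
        return "do not operate: wrong device selected (perception error)"
    if step.action.lower() not in obs.spoken_command.lower():
        return "do not operate: spoken command does not match the step (thinking error)"
    if obs.panel_status != step.expected_status:
        return "warning: status after operation differs from expectation (action error)"
    return "execute operation: step confirmed"

# Illustrative use with invented values
step = ProcedureStep(12, "VLV-101", "rotate switch to ON", "indicator green")
obs = Observation("VLV-101", "rotate switch to ON", "indicator green")
print(check_step(step, obs))   # -> "execute operation: step confirmed"
```

In practice the pre-operation checks (device and intention) and the post-operation status check would run at different moments of the step; the sketch collapses them into one function purely for readability.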

4 Development and Application of the Intelligent Human Error Prevention Device

Taking a specific NPP task scenario as an example, the intelligent device was designed and developed. The goal was to complete the design, development, and application of an intelligent monitoring device based on the AR + AI intelligent human error prevention model, and ultimately to improve human reliability in high-risk operations, prevent human errors, and improve work efficiency. Combining the intelligent human error prevention model with the scene requirements, the system architecture of the device is shown in Fig. 4.

Fig. 4. System Architecture of intelligent human error prevention.

For this system, the model-related AI technologies are mainly speech recognition and computer vision. Speech recognition verifies the correctness of operation instructions, avoiding wrong instructions and the operation of wrong parts; computer vision focuses on analyzing the operation results. For example, the status interfaces of most operation results (such as indicator lights) can be identified with multi-category target detection; knobs with many possible states can be re-identified with image classification after target detection; and if the status interface is a dashboard-type display, numbers or pointers must also be detected and identified, and the identification results converted into numerical values.
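The paragraph above describes three kinds of recognizers; the sketch below shows, under stated assumptions, how their outputs could be combined into a single status-verification result. The detector, classifiers, and gauge reader are placeholders standing in for trained models, and the component names and values are invented for illustration.

```python
def detect_components(frame):
    # placeholder detector: returns (component_type, crop) pairs found in the frame
    return [("indicator_light", frame), ("knob", frame), ("gauge", frame)]

def classify_light(crop):
    return "green"            # placeholder for the multi-category detection result

def classify_knob(crop):
    return "position_2"       # placeholder for post-detection image classification

def read_gauge(crop):
    return 42.0               # placeholder: digits/pointer converted to a value

def verify_panel_status(frame, expected):
    """Compare recognized component states against the procedure's expectations."""
    readers = {"indicator_light": classify_light, "knob": classify_knob, "gauge": read_gauge}
    observed = {ctype: readers[ctype](crop) for ctype, crop in detect_components(frame)}
    return {ctype: (observed.get(ctype) == value, observed.get(ctype))
            for ctype, value in expected.items()}

# Illustrative use: the frame argument would be a video frame from the AR glasses
print(verify_panel_status(frame=None,
                          expected={"indicator_light": "green", "gauge": 42.0}))
```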


4.1 Development

Through comparative analysis, determination of the technical route, and consideration of the on-site application scenarios, the development of the intelligent human error prevention device was completed. The front end adopts an MVC architecture comprising a view layer, a controller layer, and a data layer. The view layer is responsible for obtaining references to the various components and responding to instructions from the server. The controller layer is responsible for the business logic of device positioning, gesture pointing, the UI, the human error prevention rules, the processing of server instructions, and so on. The data layer is responsible for configuration file management, system data, step configuration files, and server instruction processing and distribution.

The back end also adopts the MVC three-layer architecture design pattern, comprising the presentation layer, business logic layer, and data access layer. The system mainly includes four core modules, namely process scheduling, speech recognition, image recognition, and real-time streaming media processing. The four modules communicate with each other through API interfaces or message queues. The back-end functional modules are shown in Fig. 5.

Fig. 5. The back-end architecture of the intelligent human error prevention device.
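The paper states only that the four back-end modules exchange data over API interfaces or message queues. The sketch below is a minimal illustration, not the authors' implementation: an in-process queue stands in for a real broker, and the module functions, message fields, and recognized values are all assumptions made for the example.

```python
import queue

bus = queue.Queue()  # stand-in for a real message broker between the back-end modules

def speech_module(audio_chunk):
    # placeholder recognizer: would transcribe the operator's spoken command
    bus.put({"source": "speech", "text": "rotate VLV-101 switch to ON"})

def image_module(frame):
    # placeholder recognizer: would report the detected device and its panel status
    bus.put({"source": "image", "device": "VLV-101", "status": "indicator green"})

def process_scheduler(expected_device="VLV-101"):
    # consumes recognition results and decides what to send back to the AR glasses
    while not bus.empty():
        msg = bus.get()
        if msg["source"] == "image" and msg["device"] != expected_device:
            return "do not operate"
    return "execute operation"

speech_module(audio_chunk=None)
image_module(frame=None)
print(process_scheduler())  # -> "execute operation"
```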

4.2 Test and Verification

The intelligent human error prevention device has been tested in an operating NPP, mainly for the intelligent monitoring of panel-related operations, such as operation monitoring in periodic tests. Considering that the relevant operations involve nuclear safety, the field test used a 1:1 mock-up for operation verification. In this experiment, 16 operators with different skill levels were recruited from an NPP. They were divided into two groups, with six experienced operators in Group A and 10


less experienced employees in Group B. The task was an operating procedure task with a total of about 290 steps. All participants completed the task both in the traditional way and in the innovative way (with the intelligent human error prevention device). The whole experiment lasted one week. The traditional way refers to operating while directly holding the procedure, with the completion time recorded. The innovative way refers to completing the task while equipped with the intelligent human error prevention device. The specific experimental process in the innovative way was as follows.

First, the operator wore the AR glasses and approached the device to be operated (see Fig. 6a). The AR glasses automatically scanned the scene to complete device positioning (see Fig. 6b). Once the device was scanned, the operator was notified that spatial positioning had succeeded, and the operation interface popped up.

Fig. 6. Experiment site and display interface

Second, the operator checked the three-dimensional UI panel and carried out the operation according to the guidance of the electronic procedures. The three-dimensional UI panel displayed the progress of the current operation process. Each operation step includes viewing the operation instruction, reading the text aloud, simulated operation, actual operation, and confirming the status after operation.

Reading the Text Aloud. Following the text prompts on the panel (see Fig. 7), the operator read the operation command aloud, for instance "rotate XX switch". The intelligent device automatically recognized the voice and transmitted it to the back end for recognition. The back end fed the recognition result back to the front-end AR glasses, and the glasses gave feedback to the operator in the form of voice and an on-screen display.

Fig. 7. The text prompts on the panel

Simulated Operation. After the text was recognized correctly, the operator used gestures to point to the intended operation, and the glasses judged the operator's finger position in real-time using gesture recognition technology. If the finger pointed to the wrong area, a real-time warning appeared.

Actual Operation. When the operator started the actual operation, the server back end judged whether the operation area was consistent with the procedure's requirements, and sent out the corresponding voice and screen prompts (see Fig. 8) to indicate whether it was correct.

Fig. 8. Screen prompts

Confirming the Status after the Operation. After the operation was completed, the video data was transmitted to the server over the LAN, and the device status was verified by the AI before feedback was given. When the three-dimensional panel showed that the expected status was correct, the operation procedure entered the next step.
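The five sub-steps of each operation step form a gated sequence: the device only lets the operator advance when the current check passes, otherwise it warns and waits. A minimal sketch of that gating logic follows; every check function is an illustrative stub standing in for the voice, gesture, and image recognition results described above.

```python
SUB_STEPS = ["view_instruction", "read_aloud", "simulated_operation",
             "actual_operation", "confirm_status"]

def run_step(checks):
    """Advance through the sub-steps only when each check passes; otherwise stop
    and report which gate failed (the AR glasses would then warn the operator)."""
    for name in SUB_STEPS:
        if not checks[name]():
            return f"blocked at {name}: warn operator and wait"
    return "step complete: proceed to next procedure step"

# Illustrative stubs standing in for voice, gesture and image recognition outcomes
checks = {
    "view_instruction": lambda: True,
    "read_aloud": lambda: True,           # spoken command matched the step text
    "simulated_operation": lambda: True,  # finger pointed at the correct area
    "actual_operation": lambda: True,     # operated area matched the procedure
    "confirm_status": lambda: True,       # panel status matched the expected state
}
print(run_step(checks))
```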

4.3 Results

For the operators in Group A, the completion times in the two operation ways were essentially the same. As shown in Table 1, the overall completion time was about 40 min (Mtraditional = 40.33, Mintelligent = 40.67), and the maximum time deviation does


not exceed 5.3%. Since the operators in this group were relatively old (average age 49 years), some of them were still not familiar with the use of the related equipment after multiple training sessions, resulting in longer operating times.

Table 1. Results of Group A.

Operator | Completion time in traditional way (min) | Completion time with intelligent monitoring device (min) | Time difference (%)
A | 42 | 40 | −4.7
B | 39 | 41 | 5.1
C | 43 | 44 | 2.3
D | 38 | 40 | 5.3
E | 41 | 39 | −4.9
F | 39 | 40 | 2.5

For the operators in Group B, the completion times differed considerably between the two operation methods. The average completion time with the traditional method was 56.1 min, compared with 49.3 min with the intelligent device. This indicates that, when adopting the intelligent human error prevention device, the average completion time of the less skilled operators (average age 26 years) was reduced by about 6.8 min. As shown in Table 2, the maximum time difference reached 15.3% and the average time difference was 12% between the two operation modes. In addition, it took less time to identify the knobs and indicator lights for the first time with the intelligent device, which indicates that, under the supervision of the intelligent device, the operators' ability to quickly locate the operating object was enhanced. Moreover, an operation task performed with this set of devices must follow the prescribed steps, and steps cannot be skipped, thereby removing that source of human error.

In conclusion, the application of intelligent human error prevention devices in NPPs can bring the following significant benefits.

Perception Error Defense. For less experienced workers, the device provides visualized operation progress, key prompt information, and voice and visual prompts, which can effectively reduce cognitive load, reduce perception time, and improve work efficiency.

Thinking Error and Action Error Defense. Through the intelligent supervision of the device and the mandatory execution of procedure steps, more accurate operator guidance and identification of operating intentions are realized, ensuring the correctness of the operation.


Table 2. Results of Group B.

Operator | Completion time in traditional way (min) | Completion time with intelligent monitoring device (min) | Time difference (%)
G | 50 | 44 | −12.0
H | 53 | 46 | −13.2
I | 55 | 48 | −12.7
J | 58 | 51 | −12.1
K | 62 | 55 | −11.3
L | 57 | 50 | −12.3
M | 54 | 49 | −9.3
N | 53 | 47 | −11.3
O | 60 | 53 | −11.7
P | 59 | 50 | −15.3

4.4 Future Work

Based on the results of the current verification experiment, the following further benefits can be expected once the system matures.

Reducing On-site Monitoring Costs. As the device matures through iterative training, it can become a senior monitoring robot that gradually replaces on-site human monitoring. The number of on-site monitoring workers can then be reduced while ensuring comprehensive and effective supervision. This kind of device can be popularized and applied in the field of operation and maintenance, greatly reducing labor costs and providing technical support for operating NPPs with few or no on-site staff.

Ensuring the Effectiveness of Human Error Prevention Tools. The device can supervise the use of the human error prevention tools required by the procedures throughout the whole work process, avoiding the low-level human errors caused by incorrect or missing use of those tools, which in turn lead to human error incidents.

5 Conclusion

Through the research and application of intelligent technology for supervising human behavior, the current domestic problems of low effectiveness of human behavior management and heavy worker workload have been addressed. Real-time monitoring and management of human behavior was realized, thereby improving the human performance level of NPPs. In the future, the approach can be extended to various scenarios such as on-site valve operation and panel operation in NPPs.


Due to the complexity and particularity of nuclear power, the reliability requirements for on-site workers are very high. With the rapid development of intelligent technology, the concept of preventing human errors has gradually transitioned from human-defense management of human behavior to smart-defense management. Combined with the behavior-related data obtained by intelligent perception, a deep learning model can be established that uses computers to simulate human thinking processes and intelligent behaviors (recognition, learning, reasoning, planning, etc.) for automatic decision-making, operation, and control. Assisting or replacing NPP workers so that tasks are performed more efficiently, safely, and reliably has become a general trend.

References

1. Ahn, J., Bae, J., Lee, S.J.: A human error detection system in nuclear power plant operations. In: 11th Nuclear Plant Instrumentation, Control and Human-Machine Interface Technologies, NPIC and HMIT, pp. 115–121. American Nuclear Society (2019)
2. Park, J.H., Jo, H.S., Na, M.G.: System and component anomaly detection using LSTM-VAE. In: 5th International Conference on System Reliability and Safety (ICSRS), pp. 131–137. IEEE (2021)
3. Qi, B., Liang, J., Zhang, L., Tong, J.: Research on accident diagnosis method for nuclear power plant based on Bayesian classifier. Atom. Energy Sci. Technol. 56(3), 512–519 (2022)
4. Shen, Y.: Study on intelligent prevention of human error in nuclear power plant. China Nucl. Power 15(1), 81–84+96 (2022)
5. Yang, Q., Hu, X.: Application practice of nuclear power documents based on image recognition technology. Power Syst. Big Data 22(11), 58–63 (2019)
6. Bae, J., Ahn, J., Lee, S.J.: Comparison of multilayer perceptron and long short-term memory for plant parameter trend prediction. Nucl. Technol. 206(7), 951–961 (2020)
7. Nishiura, D., Nambu, I., Maruyama, Y., Wada, Y.: Improvement of human error prediction accuracy in single-trial analysis of electroencephalogram. In: 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pp. 6179–6182. IEEE (2021)
8. Chen, C.H., Lee, I.J., Lin, L.Y.: Augmented reality-based self-facial modeling to promote the emotional expression and social skills of adolescents with autism spectrum disorders. Res. Dev. Disab. 36, 396–403 (2015)
9. Li, Z., Chen, J., Zhang, L.: Application of mixed reality technology in maintenance of nuclear power stations. Comput. Simul. 35(5), 340–345 (2018)
10. Momin, G., Panchal, R.D., Liu, D., Perera, S.: Case study: enhancing human reliability with artificial intelligence and augmented reality tools for nuclear maintenance. In: ASME 2018 Power Conference collocated with the ASME 2018 12th International Conference on Energy Sustainability and the ASME 2018 Nuclear Forum, 24–28 June, Lake Buena Vista, Florida, USA (2018)
11. Yan, M., Yue, Z.: Applied research on augmented reality technology in operation and maintenance of nuclear power simulator. Process Autom. Instrum. 40(10), 21–24 (2019)
12. Popov, O.O., et al.: Immersive technology for training and professional development of nuclear power plants personnel. In: 4th International Workshop on Augmented Reality in Education, May 11, Kryvyi Rih, Ukraine (2021)
13. Park, K.B., Kim, M., Choi, S.H., Lee, J.Y.: Deep learning-based smart task assistance in wearable augmented reality. Robot. Comput. Integr. Manuf. 63, 101887 (2020)


14. Liu, X., et al.: Human-centric collaborative assembly system for large-scale space deployable mechanism driven by digital twins and wearable AR devices. J. Manuf. Syst. 65, 720–742 (2022)

An Emergency Centre Dispatcher Task Analysis

Norman G. Vinson1(B), Jean-François Lapointe1, and Noémie Lemaire2

1 Digital Technologies Research Centre, National Research Council Canada, Ottawa, Canada
{norman.vinson,jean-francois.lapointe}@nrc-cnrc.gc.ca
2 Thales Digital Solutions, Montréal, Canada
[email protected]
https://nrc-cnrc.canada.ca, https://www.thalesgroup.com

Abstract. In the context of a research project regarding the Next Generation 911 (NG911) emergency call network, we studied the operations of an emergency call centre through a task analysis. While much research on emergency response focuses on dealing with sizeable incidents, such as natural disasters, our study specifically looked at the daily operations of the call centre. To understand the tasks of the dispatchers, we analyzed their training materials and conducted interviews. Dispatchers are responsible for receiving information from the call takers and then entering additional details to assist the first responders. In this article, we also discuss some of the challenges faced by the dispatchers as well as some potential solutions. This analysis complements our previous work detailing the emergency call centre's call takers' tasks [19].

Keywords: Emergency call center · 911 dispatch · 911 dispatchers · Task analysis · Next Generation 911 · NG911

1 Introduction

Firefighters, police officers, and paramedics are the most common first responders to emergencies. In many 911 emergency call centres (ECCs), the calls are first taken by call takers, who also enter information into an emergency management software system, as documented in [19]. The dispatcher accesses this call information and selects and sends first responders to the scene. While the fundamental dispatching tasks are very similar across ECCs, the particular task assignments, workflows, and software systems can differ. This paper presents a task analysis of the fire and police dispatchers in Québec City's (Québec, Canada) 911 ECC.1

1 In Canada and the United States, ECCs are typically referred to as Public Safety Answering Points or PSAPs [2,8].

This project was supported by the Canadian Safety and Security Program, a federal program led by Defence Research and Development Canada's Centre for Security Science, in partnership with Public Safety Canada. Project partners include Thales Digital Solutions and the Service de police de la Ville de Québec (SPVQ).


Previous research on emergency response has often concentrated on emergency response management of large-scale incidents, such as hurricanes [4,5,11,12], whereas this article primarily focuses on more routine, everyday emergencies, like building fires. The differences in tasks, processes, and responsibilities between handling everyday emergencies and disasters make it challenging to apply findings from disaster response research to the more common, everyday emergency response. Therefore, the examination of the regular operations of an emergency call centre in this study offers insight into an area that has not been thoroughly explored in the literature.

In our project, we analysed call centre staff workflows to explore how information technology could assist staff in processing the multimedia information expected to be transmitted through the new Next Generation 911 (NG911) system. NG911 is a migration of the US and Canada's emergency calling system from the landline (switched voice data) telephony network to an internet (IP) network.2 The new network will allow callers and the public to transmit multimedia (audio, text, images, and perhaps video) to the ECC, and, as a result, 911 staff will have to deal with this multimedia data. The objective of our project was to develop a software prototype with various artificial intelligence (AI)-enabled functions to support 911 staff's processing of this multimedia information. It was to determine how best to support call centre staff that we conducted a task analysis of the call centre's operations. We previously reported on our task analysis of the call takers' tasks in [19]. The dispatcher task analysis presented here complements that analysis.

While we focused on developing new technology for anticipated needs, task analysis is a necessary first step in developing technology to increase performance and accuracy in time-sensitive operational settings like an ECC, where a delay of a few seconds can mean the difference between life and death [17]. Once the tasks are well described and understood, we can determine where technology can increase the staff members' efficiency. Consequently, this task analysis should be useful to anyone interested in improving call centre performance, irrespective of the issue of multimedia data.

2 Method

The method is the same as that described in Vinson et al. [19]. In this section, we provide a summary of our method; for additional information, please consult Vinson et al. Our data sources were documentation from the ECC and semi-structured interviews of ECC staff. Of 69 eligible ECC staff members, five volunteered to be

2 While the 911 system in both the US and Canada is being transitioned to an IP network, national and local agencies in each country maintain final approval and control over the roll-out timelines [3,8].


interviewed. All the interviews were conducted using the Microsoft Teams videoconference application, without recording. A team of two or three researchers interviewed each participant, generally with one or two of the researchers asking questions and the other taking notes. Because of the COVID-19 pandemic, all the data collection, recruitment, and interviewing activities were conducted remotely and online. This study, which involved human subjects, was approved both by the Service de police de la Ville de Québec (SPVQ) and the Research Ethics Board of the National Research Council Canada, which conforms to the Tri-Council Policy Statement [1].

Our objective was to map out the tasks performed by the dispatchers, including the conditions under which, and the order in which, they performed them. Essentially, our main objective was to construct Figs. 3, 4, 5, 6, 7, and 8. A secondary objective was to determine the dispatchers' challenges. For data representation, we used a hierarchical task analysis [6] method inspired by MAD [15] and its enhanced version, MAD* [14]. The constructors ALT, LOOP, PAR and SEQ explain the links between activities and subactivities shown in Figs. 3, 4, 5, 6, 7, and 8. Their meaning is as follows:

– ALT = alternative tasks: different ways to execute a particular task.
– LOOP = cyclical tasks: tasks that must be repeated several times.
– PAR = parallel tasks: several tasks executed simultaneously or in any order.
– SEQ = sequential tasks: tasks that must be executed in order (left to right).
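As a rough illustration only (not the representation used by MAD/MAD* or by the authors), the four constructors can be captured in a small tree data structure; the task names below are taken from the police dispatcher model described later in Sect. 4.1.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    name: str
    constructor: str = "SEQ"            # one of ALT, LOOP, PAR, SEQ
    subtasks: List["Task"] = field(default_factory=list)

# A fragment of the police dispatcher model expressed with the constructors
dispatch_units = Task("Dispatch units", "SEQ", [
    Task("Receive call records"),
    Task("Select a call record"),
    Task("Note the call record's information"),
    Task("Review and adjust the incident code"),
    Task("Send a response unit", "ALT", [
        Task("Urgent incident: send the closest unit"),
        Task("Non-urgent incident: send a unit from the incident area"),
    ]),
])

police_dispatcher = Task("Police dispatcher (dispatch role)", "PAR", [
    Task("Dispatch queued incidents", "LOOP", [dispatch_units]),
    Task("Monitor the call record queue"),
    Task("Listen to the police radio and update call records"),
    Task("Answer police officers' questions"),
])
print(police_dispatcher.name, police_dispatcher.constructor)
```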

3 Results and Discussion

3.1 Organizational Context

The dispatchers work in the ECC’s organizational context. The ECC’s organizational structure and workflow (Fig. 1) provide the context in which we describe the dispatchers’ task models in more details later in the paper. When people need emergency assistance, they call the ECC by dialing 911. ECC call takers answer the calls, collect information from the callers, and record pertinent information in the call record (Fig. 2). Dispatchers access call records and send first responders to the scene. In addition to 911 calls, the call centre also receives calls and texts from police and firefighters who need additional information about an incident, as well as calls about alarms. These calls are placed through administrative lines to reach dispatchers rather than call takers. Along with the staff directly involved in handling incidents, there are also management and training roles such as supervisors, coordinators, quality assurance attendants, and coaches. Staff in these roles do not participate in incident processing. Staff often play multiple roles depending on their level of experience.


Fig. 1. Emergency Call centre Broad Context. The figure shows that the various actors of the 911 centre all have direct relationships through the call record. From Vinson, N.G., Lapointe, JF., Lemaire, N. (2022). An Emergency Centre Call Taker Task Analysis. In: Harris, D., Li, WC. (eds) Engineering Psychology and Cognitive Ergonomics. HCII 2022. Lecture Notes in Computer Science, vol 13307. Springer, Cham. https://doi.org/10.1007/978-3-031-06086-1_17. Copyright His Majesty the King in Right of Canada, as represented by the Minister of Innovation, Science, and Economic Development (2022), exclusively licensed to Springer, Cham. Reprinted with permission.

364

N. G. Vinson et al.

At the organizational level, the ECC communicates with several other units and organizations [19]:

– Police units.
– The Surveillance and Operation Support Centre (CVSO) for police operations requiring coordination.
– Firefighters.
– Ambulance dispatch (Health Communication Centre des Capitales).
– External services (like utility companies).

ECC call processing is subject to certain response time and information collection standards that are discussed in [19].

3.2 Call Taker

As described in Vinson et al. [19], a call taker's main responsibility is to interactively process a 911 call. The call taker collects information directly from the caller and adds this information to the call record (Fig. 2). Once sufficient information is collected, the call taker submits the call record to an incident queue, from which the dispatcher selects incidents to which to dispatch first responders. In some types of emergencies, the call taker will submit the call record to the queue before having collected all the relevant information, so that the dispatcher can send first responders to the scene as quickly as possible. The call taker then continues to interact with the caller to add more information to the call record. This information may be required by regulations [9], to support dispatch, or to support the first responders' response. For example, a description of a fire may lead a dispatcher to send in more firefighters, or a description of a suspect may help police apprehend this individual.

The call taker transfers health emergency calls to the regional paramedic and ambulance emergency call centre, which collects the caller's information and dispatches medical personnel. If there is also a need for a police or firefighter response, the SPVQ ECC call taker will create a call record with the necessary information for the SPVQ ECC dispatcher to send police and/or firefighters to the scene. This is a common occurrence in the case of serious traffic accidents.

In addition to collecting information from the caller and entering it in the call record, call takers must also insert an incident code into the call record. The incident code indicates a certain type of incident (e.g. burglary, vehicle fire) as well as a degree of urgency (as is also described in [7,16]). Not all calls are emergencies (see also [7]). In some cases, the caller is referred to a non-emergency line or police station, or a call record, known as an administrative record, is created without submitting it for dispatch. The call taker's tasks are described in more detail in Vinson et al. [19].


Fig. 2. Call record as displayed on the call taker's computer screen. English translations are provided in black text on a teal background. They are not part of the call taker's display. From Vinson, N.G., Lapointe, JF., Lemaire, N. (2022). An Emergency Centre Call Taker Task Analysis. In: Harris, D., Li, WC. (eds) Engineering Psychology and Cognitive Ergonomics. HCII 2022. Lecture Notes in Computer Science, vol 13307. Springer, Cham. https://doi.org/10.1007/978-3-031-06086-1_17. Copyright His Majesty the King in Right of Canada, as represented by the Minister of Innovation, Science, and Economic Development (2022), exclusively licensed to Springer, Cham. Reprinted with permission.

4 Dispatchers

There are three types of dispatchers, each type requiring specialized training:

– Police dispatcher.
– Fire dispatcher.
– Public works and alarms dispatcher.

Since the primary focus of this article is on police and fire dispatch, we will not cover the role of the public works and alarms dispatcher here. ECC staff begin their careers as call takers. After one year of experience they can also train to become police dispatchers, and then, after another year, fire dispatchers. Staff members can fill any role for which they are qualified. For example, a staff member can be a call taker one day and a fire dispatcher the next. Note that there are no ambulance or paramedic dispatchers. In Québec City's system, the call taker transfers calls for medical emergencies to the medical emergency call centre, which dispatches medical personnel (Fig. 1).

4.1 Police Dispatchers

A police dispatcher uses the ECC’s software system to select a call record and assign a police unit to it, and answers the questions from police officers over radio or by phone. In a shift, there are police dispatchers assigned to dispatch units and others in support of dispatch activities (monitoring police radio, answering police questions, conducting research in the CRPQ database, etc.) The police dispatcher’s task model is shown in Figs. 3 to 5, on the following pages. A police dispatcher mainly: – Dispatches units. • Receives call records. • Selects a call record. • Notes the call record’s information. • Reviews the incident code entered by the call taker and adjusts it if needed. • Reviews the dispatch software’s suggested response unit and changes it if needed. • For urgent incidents: sends the closest unit (typically as identified by the dispatch software). • For non-urgent incidents: sends a unit from the incident area when available. – Monitors the call record queue. – Listens to the police radio and adds information to the call record. – Assists police officers by answering their questions.


Fig. 3. Police Dispatcher’s Task Model, Sect. 1. Dispatcher in a dispatch role. ALT = alternative tasks: different ways to execute a particular task. LOOP = cyclical tasks: tasks that must be repeated several times. PAR = parallel tasks: several tasks executed simultaneously or in any order. SEQ = sequential tasks: which must be executed in order (from left to right).

First responders will contact dispatchers over administrative telephone lines or radio to ask for additional information regarding an incident. In other cases, first responders will update the dispatcher on an incident and request additional support. Police officers can also call in an incident (for example an abandoned vehicle) for which the dispatcher will create an administrative call record, which does not go to the dispatch queue but provides a record of the incident.


Police dispatchers often have to multitask, for instance by listening to many audio sources simultaneously (over the air and over the phone).

Fig. 4. Police Dispatcher’s Task Model, Sect. 2. Police dispatcher in a support role. ALT = alternative tasks: different ways to execute a particular task. LOOP = cyclical tasks: tasks that must be repeated several times. PAR = parallel tasks: several tasks executed simultaneously or in any order. SEQ = sequential tasks: which must be executed in order (from left to right).

4.2 Fire Dispatchers

A fire dispatcher uses the ECC's software system to retrieve a call record and assign a firefighting unit to it, and answers the questions from firefighters over the air or by phone. A fire dispatcher:

– Dispatches units.
  • Receives call records.
  • Selects a call record.
  • Reads the information (as quickly as possible).
  • Reviews the incident code and number of alarms and adjusts them if needed.


Fig. 5. Police Dispatcher’s Task Model, Sect. 3. Dispatcher in a dispatch mode, continued from Fig. 3. ALT = alternative tasks: different ways to execute a particular task. LOOP = cyclical tasks: tasks that must be repeated several times. PAR = parallel tasks: several tasks executed simultaneously or in any order. SEQ = sequential tasks: which must be executed in order (from left to right).


Fig. 6. Fire dispatcher’s Task Model, Sect. 1. The highest level of the fire dispatcher’s task model. The other sections are displayed in Figs. 7 and 8. ALT = alternative tasks: different ways to execute a particular task. LOOP = cyclical tasks: tasks that must be repeated several times. PAR = parallel tasks: several tasks executed simultaneously or in any order. SEQ = sequential tasks: which must be executed in order (from left to right).

  • Reviews the dispatch software's suggested response unit(s) and makes changes if needed.
  • For high priority incidents, dispatches the nearest unit(s), typically as suggested by the dispatching software:
    ∗ Rings the alarm bells in all fire stations.
    ∗ Announces the incident and response unit(s) on the fire station's public address speakers.
  • For low priority incidents: dispatches a unit from the incident area when one becomes available.
– Monitors the call records in queue.
– Supports the firefighters by providing additional information as they make their way to the incident.
– Monitors firefighter communications during an incident response.
– Enters information into the call record regarding the incident and response.

The fire dispatcher’s task model is shown in Figs. 6, above, and 7 and 8, on the following pages.

5 Dispatching Challenges

Our task analysis revealed some of the challenges faced by dispatchers. These challenges provide opportunities to develop software that will better support dispatchers. The challenges we discuss below are:

– Database search.
– Disturbing images.
– Multi-channel audio monitoring.
– Fire unit redistribution.
– Traffic conditions.
– Hazardous materials.

5.1 Database Search

While call takers obtain relevant information directly from callers [19], dispatchers also seek information relevant to emergency response. One source of such information is official records. Dispatchers may, time permitting, conduct searches of the following databases to add information to the call record:

– The Centre de renseignements policiers du Québec (CRPQ; Québec Police Information Centre).
– The ECC's own call records (from previous calls).
– The Société de l'assurance automobile du Québec (SAAQ; Québec Automobile Insurance Corporation).
– The Centre d'information de la police canadienne (Canadian Police Information Centre; CIPC/CPIC).
– Dispatchers also report having conducted Internet searches.

The CRPQ is a database managed by the Sûreté du Québec, the provincial police force [10]. The CRPQ provides access to provincial information about:

– An individual's encounters with law enforcement (e.g. full criminal record).
– Firearms.
– Addresses.
– Vehicles, their plates, and ownership.

The ECC has a database of call records from previous calls. Call takers and dispatchers can examine these records to add relevant information to the current call record. The SAAQ's database contains information about vehicles, their plates, and their owners, as well as driver's licenses, which also show the holder's address. CPIC is a Canadian national database of police information. It provides similar information to the CRPQ, but from provinces other than Québec, and often in summary form [13]. Dispatchers sometimes search the Internet to provide first responders with additional information, for example about whether a business may have hazardous materials on site.

The difficulty dispatchers face is having to launch multiple searches across these multiple databases and sources of data. This multiple search process wastes time, which is a critical resource in an emergency situation. Instead of this multiple search process, dispatchers would prefer to have a federated search capability, where the information from the call record automatically launches searches and retrieves all associated information. However, one may wonder whether this would lead to information overload.
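A federated search of the kind dispatchers describe could, in principle, fan the call record's entries out to all sources at once and collect the results in a single step. The sketch below is a minimal illustration under that assumption; the three source functions are stand-ins, since real access to the CRPQ, SAAQ, CPIC, or the ECC's own records would go through secured interfaces not described here.

```python
from concurrent.futures import ThreadPoolExecutor

def search_crpq(record):         # stand-in for a CRPQ query
    return {"source": "CRPQ", "hits": []}

def search_saaq(record):         # stand-in for a SAAQ plate/licence query
    return {"source": "SAAQ", "hits": []}

def search_ecc_history(record):  # stand-in for a search of previous call records
    return {"source": "ECC history", "hits": []}

SOURCES = [search_crpq, search_saaq, search_ecc_history]

def federated_search(call_record: dict) -> list:
    """Fan the call record's entries out to every source in parallel and
    collect the results, so the dispatcher launches a single search."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        return list(pool.map(lambda s: s(call_record), SOURCES))

results = federated_search({"plate": "ABC 123", "address": "1 Example St"})
print([r["source"] for r in results])
```

A filtering and ranking step over the merged results would still be needed to mitigate the information overload concern raised above.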


Fig. 7. Fire dispatcher’s Task Model, Sect. 2. Processing a call record. The other sections are displayed in Figs. 6 and 8. ALT = alternative tasks: different ways to execute a particular task. LOOP = cyclical tasks: tasks that must be repeated several times. PAR = parallel tasks: several tasks executed simultaneously or in any order. SEQ = sequential tasks: which must be executed in order (left to right).


Fig. 8. Fire dispatcher’s Task Model, Sect. 3. Monitoring communications, providing support. The other sections are displayed in Figs. 7 and 8. ALT = alternative tasks: different ways to execute a particular task. LOOP = cyclical tasks: tasks that must be repeated several times. PAR = parallel tasks: several tasks executed simultaneously or in any order. SEQ = sequential tasks: which must be executed in order (left to right).


5.2 Disturbing Images

911 call processing can be psychologically taxing [7,19,20]. As discussed by Vinson et al. [19], ECC staff can be exposed to disturbing images, audio, or video. Moreover, the upcoming NG911 system is intended to allow callers from the public to transmit images and possibly video [2,18], making exposure to troubling material even more likely.

5.3 Multi-channel Audio Monitoring

While engaging in their main tasks, dispatchers also monitor police and fire radio communications. This poses a divided attention problem, likely reducing the dispatcher's overall efficiency [21].

5.4 Fire Unit Redistribution

A two-alarm fire requires a number of firefighting units to respond. The firefighting units not responding to the fire must then be redistributed throughout the city to maintain coverage of all areas. This would be a time-consuming task to perform fully manually. Fortunately, the particular software used at the SPVQ can perform this task automatically. However, dispatchers report that the results are not always acceptable and the system has some bugs. In most cases, dispatchers can correct or work around any deficiencies. However, it would be better if the bugs were addressed and the system optimized to deliver better results.

5.5 Traffic Conditions

For police and fire dispatch, real-time traffic information is not displayed on the software's geographical map. In an emergency, response time is obviously critical, and since traffic can impact response time, the system should take traffic information into account when suggesting response units. It is also important for dispatchers to see the traffic information so they can correctly approve or modify the system's recommendation. Interestingly, the fire unit redistribution software system, which is separate from the dispatch system, does take traffic conditions into account when recommending a redistribution. It should be possible for dispatch system developers to add traffic conditions as a layer to the map by subscribing to a traffic condition information service.

5.6 Hazardous Material

Dispatchers have access to a database of the locations of hazardous materials in the city. However, the database is incomplete. Dispatchers will sometimes conduct an Internet search on a company or building to attempt to determine whether hazardous material may be present. If they do find information that is missing from the database, they add it to the database so that it is available in the future. Nonetheless, a more complete hazardous materials inventory would help dispatchers save time and provide first responders with important, relevant information.

5.7 Fire Code and Alarm Selection

There are 26 different incident codes that indicate an incident requiring a firefighter response. These codes indicate the nature of the incident (for example, vehicle on fire) and its urgency. Another factor determining the fire response is the number of alarms, with a greater number of alarms requiring more responding units. The call taker initially enters the code and number of alarms, and both the dispatcher and call taker can edit them as additional information comes in. The code and alarm information is important because it is used by the dispatch software system to suggest which units should respond. Dispatchers report that this function is slow, but they rarely override it or modify the selection. Given this system, it becomes very important to enter the correct incident code and number of alarms. However, the determination of the code and alarms initially depends entirely on the information provided by the caller. Once first responders arrive on site, they can update the dispatcher, who may then change the code or number of alarms. Nonetheless, such a workflow delays an appropriate response. While dispatchers did not see this as a problem, it is not optimal.
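To picture the data flow described above, the toy lookup below maps an incident code and a number of alarms to a suggested number of responding units. The codes, unit counts, and the units-per-alarm rule are invented for illustration only; the real ECC system uses its own 26 fire codes and response rules.

```python
# Illustrative only: a toy version of "incident code + alarms -> suggested response".
UNITS_PER_ALARM = 4   # assumed number of additional units per extra alarm

BASE_RESPONSE = {     # invented example codes, not the ECC's real ones
    "vehicle_fire": 1,
    "building_fire": 2,
}

def suggest_units(incident_code: str, alarms: int) -> int:
    base = BASE_RESPONSE.get(incident_code, 1)
    return base + UNITS_PER_ALARM * (alarms - 1) if alarms >= 1 else base

print(suggest_units("building_fire", alarms=2))  # more alarms -> more units
```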

6 Meeting Challenges and Conclusions

A few descriptions of call takers' and dispatchers' tasks can be found in the literature, even though these publications often focus on another topic. In general, our findings regarding the dispatchers' tasks are consistent with the tasks reported in those articles, though more detail is provided here. However, organizational structures differ markedly across ECCs. Documenting the tasks is a crucial step toward ultimately improving dispatchers' performance through software that better supports their work. We also identified a number of difficulties faced by dispatchers that can hinder performance. ECC staff performance is critical, since a delay of a few seconds in responding to an emergency can mean the difference between life and death [17].

While this article focuses on the dispatcher's task model and our previous article focused on the call taker's task model [19], these task analyses were conducted to guide the development of a software prototype to support ECC staff's processing of the multimedia information expected to come through the new NG911 network. The prototype we developed includes a number of functions that either support dispatchers in meeting their challenges or can be leveraged to that effect. In this context, we developed functionality that initially masks images and automatically provides a text description of them. This description helps ECC staff decide whether or not to unmask the image, thus reducing the odds that dispatchers will be exposed to images they find disturbing. While ECC staff currently rarely see incident images, we expect images to become more prevalent once callers gain the ability to transmit them through the NG911 network, which will increase the utility of image filtering and description.


Moreover, viewing images (or video) can help dispatchers determine the type of fire and estimate its severity and urgency. Currently, the call taker and dispatcher must perform these tasks based solely on the caller's description before first responders arrive at the scene. Supporting a more accurate characterization of a fire early on can lead to a faster and more appropriate response. Callers will be able to send images through the new NG911 network, providing the dispatcher with additional information. Automatically searching social networks (Twitter, for example) for information relevant to the incident might also help dispatchers optimize the response.

Dispatchers reported that their greatest challenge lies in searching for information across databases and the Internet. Searching requires them to retype information from the call record into several search fields. Accordingly, the dispatchers' highest priority feature is a federated search engine that would use call record entries to search and retrieve information from several sources. However, such a feature would retrieve a substantial volume of information, with some repetition. In our prototype, we included natural language processing that extracts information relevant to the incident from speech and text. This functionality can be applied to filter and sort the information retrieved from federated search, thus greatly reducing the need for dispatchers to review it. However, our natural language processing feature cannot currently process information from the dispatchers' databases due to privacy and security restrictions.

Another application of federated search with natural language processing would be the automatic identification of hazardous material locations, though we did not implement this functionality. A federated search function can be adapted to crawl the web and Facebook to search for businesses and other locations that may house hazardous material. The information retrieved by such a function could then populate the ECC's hazardous materials database. This search could be greatly facilitated by a list of the types of hazardous materials associated with each type of business; for example, the list would show that a welding company will have acetylene tanks. It is not clear that such a list exists, however.

Our natural language processing functions took speech as input and generated a transcription from which keywords were extracted. Limits to speech recognition accuracy do not allow the generation of a full, accurate transcription. However, we were able to extract keywords and display them to the call taker, making call record data entry more efficient. The same functionality could be applied to the radio chatter that dispatchers listen to. Due to privacy issues, we did not have access to such audio recordings, so we were not able to demonstrate the use of such a function in our prototype.

While dispatchers face several challenges that undoubtedly increase incident response time, it is possible to develop software to ease those challenges, hopefully reducing response times and, consequently, injuries and property damage.
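As a purely illustrative sketch of the filtering idea described above, retrieved items can be ranked by their keyword overlap with the call record. The field names, the toy keyword extractor and the result format below are hypothetical placeholders, not the prototype's actual design.

```python
import re

# Hypothetical call-record text and federated-search results (placeholders only).
call_record = "Two-alarm fire reported at 125 Main St, possible propane tanks on site"
results = [
    {"source": "hazmat_db", "text": "125 Main St: welding shop, acetylene and propane storage"},
    {"source": "cad_history", "text": "Noise complaint at 300 Elm St last week"},
]

STOPWORDS = {"at", "on", "the", "a", "of", "and", "st", "last", "week"}

def keywords(text):
    """Very rough keyword extraction: lowercase tokens minus stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}

def rank(items, record_text):
    """Order retrieved items by keyword overlap with the call record."""
    record_kw = keywords(record_text)
    scored = [(len(record_kw & keywords(r["text"])), r) for r in items]
    return [r for score, r in sorted(scored, key=lambda x: -x[0]) if score > 0]

for item in rank(results, call_record):
    print(item["source"], "-", item["text"])
```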


References

1. Canadian Institutes of Health Research, Natural Sciences and Engineering Research Council of Canada, and Social Sciences and Humanities Research Council: Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans. Secretariat on Responsible Conduct of Research, Government of Canada (2018). https://ethics.gc.ca/eng/documents/tcps2-2018-en-interactive-final.pdf
2. Canadian Radio-Television and Telecommunications Commission: Next-generation 9-1-1 – Modernizing 9-1-1 networks to meet the public safety needs of Canadians, Telecom Regulatory Policy CRTC 2017-182, File numbers: 1011-NOC20160116 and 8665-C12-201507008. Canadian Radio-Television and Telecommunications Commission, Government of Canada (2017). https://crtc.gc.ca/eng/archive/2017/2017-182.pdf
3. Canadian Radio-Television and Telecommunications Commission: Next-generation 9-1-1. Canadian Radio-Television and Telecommunications Commission, Government of Canada (2021). https://crtc.gc.ca/eng/phone/911/gen.htm
4. Chatfield, A.T., Scholl, H.J., Brajawidagda, U.: Sandy tweets: citizens' coproduction of time-critical information during an unfolding catastrophe. In: Proceedings of the Annual Hawaii International Conference on System Sciences, pp. 1947–1957. IEEE (2014)
5. Dearstyne, B.: The FDNY on 9/11: information and decision making in crisis. Gov. Inf. Q. 24(1), 29–46 (2007)
6. Diaper, D., Stanton, N.: The Handbook of Task Analysis for Human-Computer Interaction. Lawrence Erlbaum Associates, London (2004)
7. Forslund, K., Kihlgren, A., Kihlgren, M.: Operators' experiences of emergency calls. J. Telemed. Telecare 10(5), 290–297 (2004)
8. Gallagher, J.C.: Next generation 911 technologies: select issues for Congress (R45253 - Version: 4). Congressional Research Service, United States Congress (2018). https://crsreports.congress.gov/product/details?prodcode=R45253
9. Government of Quebec: Regulation respecting standards, specifications and quality criteria applicable to 9-1-1 emergency centres and to certain secondary emergency call centres (2021). http://www.legisquebec.gouv.qc.ca/fr/ShowDoc/cr/S-2.3,%20r.%202%20/
10. Government of Quebec: Sûreté du Québec (undated). https://www.sq.gouv.qc.ca/en/the-surete-du-quebec/
11. Kim, J.K., Sharman, R., Rao, H.R., Upadhyaya, S.: Framework for analyzing critical incident management systems (CIMS). In: Proceedings of the Annual Hawaii International Conference on System Sciences, vol. 4, pp. 1–8 (2006)
12. Kim, J.K., Sharman, R., Rao, H.R., Upadhyaya, S.: Efficiency of critical incident management systems: instrument development and validation. Decis. Support Syst. 44(1), 235–250 (2007)
13. Royal Canadian Mounted Police (RCMP): Canadian Police Information Centre (CPIC). Royal Canadian Mounted Police (RCMP), Government of Canada (undated). https://web.archive.org/web/20090124145105/http://www.rcmp-grc.gc.ca/fs-fd/pdfs/cpic-cipc-eng.pdf
14. Scapin, D.L., Bastien, J.M.C.: Analyse des tâches et aide ergonomique à la conception : l'approche MAD*. In: Kolski, C. (ed.) Analyse et conception de l'IHM, pp. 85–116. Hermès (2001)
15. Scapin, D.L., Pierret-Golbreich, C.: MAD : une méthode de description de tâche. In: Colloque sur l'ingénierie des interfaces homme-machine, pp. 131–148 (May 1989)


16. Shively, R.J.: Emergency (911) dispatcher decision making: ecological display development. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 39, pp. 506–510 (October 1995). http://journals.sagepub.com/doi/10.1177/154193129503900914
17. Terrell, I.S., McNeese, M.D., Jefferson, T.: Exploring cognitive work within a 911 dispatch center: using complementary knowledge elicitation techniques. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 48, pp. 605–609 (September 2004). http://journals.sagepub.com/doi/10.1177/154193120404800370
18. U.S. Government: Next generation 911. https://www.911.gov/issue nextgeneration911.html
19. Vinson, N.G., Lapointe, J.F., Lemaire, N.: An emergency centre call taker task analysis. In: Harris, D., Li, W.C. (eds.) Engineering Psychology and Cognitive Ergonomics, pp. 225–241. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-06086-1_17
20. Weibel, L., Gabrion, I., Aussedat, M., Kreutz, G.: Work-related stress in an emergency medical dispatch center. Ann. Emerg. Med. 41(4), 500–506 (2003). https://www.sciencedirect.com/science/article/pii/S0196064403000052
21. Wickens, C.: Engineering Psychology and Human Performance. Routledge, Merril (1984)

Research Trends and Applications of Human-Machine Interface in Air Traffic Control: A Scientometric Analysis

Ziang Wang(B)

Beijing Institute of Technology, Beijing 100081, China
[email protected]

Abstract. To systematically understand the trends and applications of the human-machine interface in air traffic control, this study used data from the Web of Science Core Collection as a sample, adopted a scientometric bibliometric method, and used VOSviewer and CiteSpace to draw intuitive knowledge graphs and complete the visualization analysis. The results reveal the research hotspots, trends and application status, and also show that the overall number of publications in the search area is increasing; the USA, France, Germany, China and the Netherlands stand out in terms of national output; universities and colleges are the main research institutions, but the cooperation between research institutions and authors is not close enough; the research hotspots mainly focus on air traffic control, human-computer interaction, human-machine interface, automation, systems, performance, situation awareness, mental workload, etc.; and future research will focus on systems, automation, human factors, mental workload, air traffic management, performance, human-machine interface, situation awareness, etc. This study provides an in-depth analysis and interpretation of the trends and applications of the human-machine interface in air traffic control in an objective and quantitative manner, while also presenting a clear knowledge structure of the research topic to provide theoretical and practical guidance for subsequent scholars conducting research in this field. Keywords: Human-Machine Interface · Air Traffic Control · Scientometric Analysis

1 Introduction

The human-machine interface is the medium of interaction and information exchange between human and machine; it plays an important role in human-machine systems and has been deeply applied in many fields. Air traffic control refers to the unified management and control of aircraft flight activities by the state in its airspace and flight information region. With the rapid development of contemporary civil aviation technology, operational aspects are increasingly focused on human factors. New traffic demands, better navigational data, new forms of automated assistance, and technological innovations require air traffic control to evolve to meet new requirements in new ways (Hopkin 1988).


The human-machine interface is gaining attention and is widely used, but there are areas for improvement. Human-machine interface designs in air traffic control must be flexible enough to benefit from technological advances yet continue to satisfy human needs (Hopkin 1989). This paper analyzes the trends and applications of the human-machine interface in air traffic control, with the aim of summarizing the problems, optimizing the human-machine interface in air traffic control, and providing new ideas for a new generation of interface design to promote the safety, efficiency, and environmental protection of air traffic management. The research in this paper contributes to a comprehensive understanding of the trends and applications of the human-machine interface in air traffic control (hereafter abbreviated as HMI-ATC). However, it is worth noting that, on the one hand, there is a certain lack of relevant review papers in the field of HMI-ATC, and the existing literature is scattered and dated, so it is necessary to further update and analyze the relevant research. On the other hand, the quantitative analysis of literature by bibliometric methods can help to discover potential patterns and information in large amounts of literature data. Therefore, the present study explores the relevant literature in the field of HMI-ATC with bibliometric methods, with the aim of grasping the status of research, analyzing the research hotspots and applications, revealing future development trends, and promoting subsequent breakthroughs and innovations in theory and methods.

2 Data Sources and Methods

2.1 Data Sources

Web of Science (WOS) is a well-known international core database, so this paper selects it as the data source and selects its five major citation indexes as the search sources to retrieve literature data on HMI-ATC. To ensure the accuracy and representativeness of the data sources, each record included not only the text and references but also other information such as title, author, institution, abstract, keywords, publication vehicle, and time. Afterwards, the retrieved literature was exported, and interfering articles, such as those deviating from the research topic, missing field information, or duplicated, were excluded; a total of 301 papers were obtained. The collected data were then analyzed and studied using the method of scientific knowledge mapping.

2.2 Methods

The scientometric bibliometric method and the information visualization method are the main research methods used in this paper. The scientific knowledge map generated here is a technique for displaying the knowledge of research topics through the visualization of scientific and technical texts, supported by the theories and methods of scientometrics and bibliometrics. Bibliometrics is a comprehensive knowledge system that focuses on quantification and uses mathematical and statistical methods to quantitatively mine and analyze the potential patterns and information in literature data.


By measuring the time of publication, country/region, author, institution, and journal, we can quickly understand the evolution and development trends of the frontier issues in the discipline or knowledge field and obtain the research hotspots and the latest application status, which helps scholars gain a macro grasp of the relevant research and facilitates subsequent in-depth research. For data mining and visualization, we selected VOSviewer and CiteSpace as analysis tools and applied knowledge maps and literature statistics to the visualization analysis. VOSviewer is a scientific knowledge map analysis tool with a powerful graphical user interface and map visualization, developed by Van Eck and Waltman at the Centre for Science and Technology Studies at Leiden University (Van Eck and Waltman 2010). CiteSpace is a Java-based scientometric visualization system developed by Professor Chaomei Chen at Drexel University, which is mainly applied to identifying and displaying new trends and developments in the scientific literature and detecting the frontiers of research development. It has become one of the most reliable and advanced software tools in the field of scientometric data visualization (Chen 2006).
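As a rough illustration of the kind of co-occurrence counting that such keyword maps rest on (not the actual VOSviewer or CiteSpace implementation), unordered keyword pairs can be tallied across records; the example records below are invented for demonstration.

```python
from collections import Counter
from itertools import combinations

# Invented example records; in practice these would be the author keywords
# parsed from the Web of Science export of the retrieved papers.
records = [
    ["air traffic control", "human-machine interface", "automation"],
    ["air traffic control", "situation awareness", "mental workload"],
    ["human-machine interface", "automation", "interface design"],
]

pair_counts = Counter()
for kw_list in records:
    # Count each unordered keyword pair once per record.
    for a, b in combinations(sorted(set(kw_list)), 2):
        pair_counts[(a, b)] += 1

# Keep pairs at or above the co-occurrence threshold (the study used 2).
threshold = 2
edges = {pair: n for pair, n in pair_counts.items() if n >= threshold}
print(edges)
```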

Fig. 1. Distribution of annual publications on HMI-ATC.

3 Results

3.1 Distribution of Publications

One of the important ways to measure the research trends and applications of the human-machine interface in air traffic control is to analyze the research dynamics of the field based on how the output of academic literature changes over time. The annual output distribution of HMI-ATC papers from 1989 to 2022 is shown in Fig. 1. As the WOS publication data show, the annual output of papers before 2001 was relatively stable at fewer than 10 papers, while the output after 2001 showed a fluctuating growth trend.


The first article within the search scope was published in 1989, and from 1989 to 2021 the annual number of articles increased from 2 to 15, reaching a peak of 23 publications in 2019. The overall fluctuating upward trend of the literature output reflects the continuous development of HMI-ATC research and suggests that HMI-ATC research will remain a focus of academic research in the future.

3.2 Performance of Countries/Regions

In terms of country/region output, the number of publications in the data set and the number of citations describe the high-producing countries/regions in the research field and their impact. The results show that a total of 41 countries/regions contribute to this research area worldwide; the USA ranks first with 84 publications, followed by France (34), Germany (27), China (25), and the Netherlands (24). The publications from these countries account for 56.9% of the total. From the perspective of the overall cooperation network, cooperation among European countries or regions is relatively intensive and has a large academic impact, but worldwide cooperation is not close enough and is mostly scattered.

3.3 Performance of Institutes and Authors

The analysis of the retrieved literature shows that 66 institutions worldwide conducted HMI-ATC-related research from 1989 to 2022. Delft University of Technology ranks first with 19 articles, followed in order by Mitre Corp, Nanyang Technological University, Linkoping University and University of Toulouse, with 9, 9, 8 and 7 articles respectively. These institutions are from different countries, and universities account for most of them, which shows that higher education institutions are highly productive in HMI-ATC research and have contributed greatly to its development.

Clarifying the collaborative relationships among researchers is a prerequisite for gaining insight into the degree of development and refinement of an academic field. The authors of the papers are the smallest units of output in the research field and the direct contributors to HMI-ATC research. A preliminary analysis of authors' names, processed for co-citation analysis, allows the identification of the authors who are most active in the field. The data analysis reveals that there are not many highly productive authors involved in HMI-ATC research internationally; the top 5 authors in terms of literature output are Borst, Clark; Mulder, Max; van Paassen, M. M.; Ohneiser, Oliver; and Chen, Chun-Hsien. The analysis shows that the main limitations of the current research community are that the number of high-yield authors and research institutions is insufficient and that the cooperation between authors and institutions is not close enough. Follow-up studies should integrate the research advantages and resources of each author and institution through solidarity and collaboration to promote the development of HMI-ATC research.
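For readers who wish to reproduce this kind of tally, a sketch is given below. It assumes a tab-delimited Web of Science export carrying the standard PY (publication year) and C1 (author address) field tags; the file name is a placeholder and the country extraction is only a crude heuristic, not the procedure used in this study.

```python
import pandas as pd

# Placeholder file name for a tab-delimited Web of Science export.
df = pd.read_csv("wos_hmi_atc_export.txt", sep="\t", dtype=str)

# Publications per year (cf. Fig. 1).
per_year = df["PY"].value_counts().sort_index()

# Very crude country tally: take the last token of the first address in C1.
countries = (
    df["C1"].dropna()
    .str.split(";").str[0]    # first author address
    .str.split(",").str[-1]   # last element is usually the country
    .str.strip()
)
per_country = countries.value_counts()

print(per_year.tail(), per_country.head(10), sep="\n\n")
```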


Table 1. Top 10 high-yield journals.

3.4 Performance of Journals

Analysis of the distribution of journal sources on HMI-ATC can provide researchers with scientific reference information about the HMI-ATC research field and its directions. The research literature on HMI-ATC within the search scope is distributed among 195 journals. The top 10 most productive journals from 1989 to 2022 are listed with their respective impact factors and five-year impact factors (see Table 1); their publication volume accounts for approximately 19% of the total number of articles published. The first in line is Human Factors, with 10 articles, an impact factor of 3.598 and a five-year impact factor of 4.212. The second is IEEE Transactions on Human-Machine Systems, with 6 articles, an impact factor of 4.124 and a five-year impact factor of 4.54. The third is International Journal of Aviation Psychology, with 6 articles, an impact factor of 0.8 and a five-year impact factor of 1.103. The fourth is Frontiers in Human Neuroscience, with 5 articles, an impact factor of 3.473 and a five-year impact factor of 4.111. The fifth is Aerospace, with 4 articles, an impact factor of 2.66 and a five-year impact factor of 2.579. The sixth is Aircraft Engineering and Aerospace Technology, with 4 articles, an impact factor of 1.478 and a five-year impact factor of 1.293. The seventh is Cognition, Technology & Work, with 4 articles, an impact factor of 2.818 and a five-year impact factor of 2.32. The eighth is International Journal of Human-Computer Studies, with 4 articles, an impact factor of 4.866 and a five-year impact factor of 4.435. The ninth is Sensors, with 4 articles, an impact factor of 3.847 and a five-year impact factor of 4.05. The tenth is Transportation Research Part C: Emerging Technologies, with 4 articles, an impact factor of 9.022 and a five-year impact factor of 10.323.


The analysis reveals that the journals publishing HMI-ATC research cover many fields, such as technology, radiation, human factors and electricity, showing the characteristics of multidisciplinary development. In general, HMI-ATC research literature is mainly published in high-level journals, but the number of publications is inadequate. There is still much room for the research to develop to a higher level, which should attract more attention from the academic community.

3.5 Analysis of Hot Spots, Trends and Applications

The keywords of the literature are highly refined by the authors, reflect the research direction and theme of each article, and can be used to explore the research hotspots in the field of HMI-ATC. The keyword co-occurrence frequency was set to 2 in VOSviewer, and a keyword co-occurrence clustering network formed by 171 keywords was obtained after filtering and merging synonyms (see Fig. 2). Keywords with the same color in the figure belong to the same cluster. A total of 3 main clusters were formed. From the analysis results, the hot research topics of HMI-ATC can be divided into three major categories: #1 air traffic control and human-computer interaction, #2 autonomous systems and interface design, and #3 human factors and control.

Fig. 2. Keywords co-occurrence clustering network.

Cluster #1—air traffic control and human-computer interaction—contains 59 cluster members, mainly including the keywords air traffic control, human-computer interaction, visualization, augmented reality, simulation and mixed reality. This cluster reflects that HMI-ATC research focuses on applications of human-machine interaction in air traffic control and emphasizes machine integration.


Air traffic control is the unified management and control by the state of aircraft flight activities in its airspace and flight information region, and is a service provided by ground controllers who ensure the safe, orderly, and expeditious flow of air and airport traffic. They control and supervise aircraft departing and landing, as well as during flight, by instructing pilots to fly at assigned altitudes and on defined routes (Bergner and Hassa 2012). Since the 1960s, the rapid development of information systems has led to extensive human-computer interaction (HCI) research aimed at designing human-computer interfaces with ergonomic properties such as friendliness, usability and transparency (Hoc 2000). At the same time, in the context of intelligent air traffic control, the need for human-computer interaction is further enhanced. This cluster also contains visualization and augmented reality. There is growing interest in eye tracking techniques used to support traditional visualization techniques such as charts, diagrams, maps, or dynamic interactive graphics. More sophisticated data analysis is required to derive knowledge and meaning from the data, so the human visual system is supported by appropriate visualization tools that enable human operators to solve complex tasks (Burch, Chuang et al. 2017).

Cluster #2—autonomous systems and interface design—contains 56 cluster members, mainly including the keywords automation, systems, interface design, man-machine systems and adaptive systems. This cluster reflects that HMI-ATC research focuses not only on the basics of interface design, but also on autonomous systems. Automation technologies have been widely used to improve production and operational efficiency, simplify complex processes, and reduce human errors (Yiu, Ng et al. 2021). With the increasing sophistication of automation in recent years, the interaction between humans and automated systems has shifted from humans using automated tools to humans and automated systems "collaborating" with each other (Lee 2005). Trust is becoming increasingly important given the growing need for collaborative human-machine teams; thus, it is necessary to expand the previous meta-analytic foundation in the field of human-robot interaction to cover automation interaction as a whole (Schaefer, Chen et al. 2016). The development of automated systems and key technologies can improve the safety and efficiency of air traffic control, reduce the workload of air traffic controllers, and adapt to the rapid development of the future aviation industry.

Cluster #3—human factors and control—contains 56 cluster members, mainly including the keywords air traffic controller, mental workload, performance, brain and eye tracking. This cluster reflects that HMI-ATC research focuses on the role of human factors in air traffic control. Humans are the most flexible, adaptable, and valuable part of the air traffic control (ATC) system, and fully utilizing the human element can greatly improve the quality and efficiency of aviation management activities. However, the human factor is also the part of the system most vulnerable to adverse effects. In the air traffic control system, human errors can lead to dangerous aircraft approaches and, more seriously, to air crashes (Yun-zhong and Ding 2004). More than 80% of accidents and incidents were caused by poor human performance; human factors have become the biggest factor affecting safety (Li, Li et al. 2010).
With the increasing sophistication of systems and the increasing influence of controllers on ATC safety, it is important to enhance the understanding of the importance of human factors in ATC as a prerequisite for the healthy development of aviation.


By statistically analyzing the average occurrence time of the keywords and superimposing them on the original clustering map (see Fig. 3), we can further understand the frontier themes and development trends of HMI-ATC. Among the research hotspots summarized by the three clusters, cluster #3—human factors and control—is closest to the present and contains the current cutting-edge topics of HMI-ATC. Cluster #2—autonomous systems and interface design—is also an important research direction of HMI-ATC. Cluster #1—air traffic control and human-computer interaction—appeared on average before 2015 and contains the hot topics of early research in this area. In the whole clustering network, the keywords that appeared on average later than 2015 mainly include adaptive systems, mental workload, performance, brain, etc.

Fig. 3. Keywords co-occurrence clustering superposition diagram.

The keyword burst graph can show the evolutionary dynamics of research hotspots in a discipline and predict research frontier trends. To further explore and corroborate the frontier trends of HMI-ATC, a map of keyword burst terms from CiteSpace was used. The top 10 keywords with the strongest bursts in different periods from 1989 to 2022 are listed in Fig. 4. The "Strength" column in the figure shows the burst strength; keywords with larger values tend to have milestone significance. In addition, the red part indicates the time span of the rise and fall of the research hotspots. After sorting the top 10 burst keywords by time, we can see that some keywords burst in the past 5 years: air traffic controller, performance, human-machine interface, and situation awareness. After a comprehensive analysis, we can predict that future research on HMI-ATC will focus on systems, automation, human factors, mental workload, air traffic management, performance, human-machine interface, situation awareness, etc.


Human-machine integration will be more prominent in the research trend. While automation technology continues to advance, HMI-ATC research will also focus on the human factor, improving the safety and efficiency of air traffic control, guaranteeing the safety of aviation operations, and ensuring the high-quality development of air traffic control.

Fig. 4. Keywords Burst Term.

3.6 Theoretical Basis

According to the collation and statistics, a total of 6,940 valid references were cited in the 301 documents within the search scope, and some references were cited together, forming co-citation relationships. After analyzing the co-citation relationships in the literature data, we can obtain the highly cited references, which constitute the fundamental knowledge of the HMI-ATC domain. The entire set of references forms a co-citation network, which shows the evolution of HMI-ATC at the level of fundamental knowledge. The top 10 highly cited classic articles are listed in Table 2, and the top 5 are described in detail here. The first in line is "A model for types and levels of human interaction with automation", authored by Parasuraman, R. et al. in 2000. The article pointed out that technological developments in computer hardware and software could now introduce automation into almost all aspects of human-computer systems; appropriate selection was important because automation did not merely supplant but changed human activity and could impose new coordination demands on the human operator. The article also proposed that automation could be applied to four broad classes of functions: information acquisition, information analysis, decision and action selection, and action implementation (Parasuraman, Sheridan et al. 2000). The second highly cited article is "Ecological interface design: theoretical foundations", authored by Vicente, K.J. et al. in 1992. The article proposed a theoretical framework for designing interfaces for complex human-machine systems, called ecological interface design (EID), based on the skills, rules, knowledge taxonomy of cognitive control (Vicente and Rasmussen 1992). The third in line is "Ecological interface design of a tactical airborne separation assistance tool", authored by Van Dam, S.B.J. et al. in 2008.


The article reported that in a free-flight airspace environment, pilots were freer to choose user-preferred trajectories, and presented a novel interface, the state vector envelope, intended to provide pilots with both low-level information, allowing direct action, and high-level information, allowing conflict understanding and situation awareness (Van Dam, Mulder et al. 2008). The fourth in line is "Adaptive automation triggered by EEG-based mental workload index: a passive brain-computer interface application in realistic air traffic control environment", authored by Arico, P. et al. in 2016. The article pointed out that adaptive automation (AA) was a promising approach for keeping the task workload demand at appropriate levels to avoid both under- and overload conditions, thereby improving the overall performance and safety of the human-machine system. The researchers proposed a pBCI system able to trigger AA solutions, integrated in a realistic Air Traffic Management (ATM) research simulator developed and hosted at MAC (Arico, Borghini et al. 2016). The fifth in line is "Ironies of automation", authored by Bainbridge, L. in 1983. This paper discussed the ways in which automation of industrial processes may expand rather than eliminate problems with the human operator, and commented on the possibility of continuing to use the human operator for on-line decision-making within human-computer collaboration (Bainbridge 1983). The research perspectives and contents of these classic articles show diversified characteristics; they have important academic reference value in the field of HMI-ATC research and are worthy of close reading and in-depth analysis by subsequent researchers.

Table 2. Top 10 high-cited literature.


4 Conclusion

In this study, scientometric and information visualization methods were used to generate scientific knowledge maps in order to understand the development trends and obtain the research hotspots and latest application status of HMI-ATC. After collating and analyzing the above results, we can draw the following conclusions:

(1) In terms of literature output, HMI-ATC research is generally on a rising trend and is gaining more attention from the academic community. A total of 41 countries/regions worldwide contribute to this research area; the USA, France, Germany, China and the Netherlands are the core countries in international HMI-ATC research. Institutions such as Delft University of Technology, Mitre Corp, Nanyang Technological University, Linkoping University and University of Toulouse are the main research institutions, among which universities account for the majority, showing that higher education institutions are the main force of HMI-ATC research. Borst, Clark; Mulder, Max; van Paassen, M. M.; Ohneiser, Oliver; and Chen, Chun-Hsien are the main research authors. However, there is a lack of close cooperation between authors and institutions in the field of HMI-ATC, and a close cooperative research system has not yet been formed. In addition, academic communication in the field still needs to be strengthened. In the future, the research advantages of each author and institution should be integrated through solidarity and collaboration.

(2) In terms of research hotspots, trends and application progress, HMI-ATC research contents are comprehensive and diversified. The involvement of multiple elements broadens the depth and breadth of HMI-ATC research, and the hot research topics of HMI-ATC can be divided into three major categories: #1 air traffic control and human-computer interaction, #2 autonomous systems and interface design, and #3 human factors and control. Research hotspots focus on air traffic control, human-computer interaction, human-machine interface, automation, systems, performance, situation awareness, mental workload, etc. During the application and development process, the research trends of HMI-ATC have changed in recent years from studying only the human-machine interface in air traffic control to integrating systems, automation, and human factors. Future research will focus on systems, automation, human factors, mental workload, air traffic management, performance, human-machine interface, and situation awareness, etc.

(3) In terms of theoretical basis, a substantial body of classic literature has been produced in the field of HMI-ATC, which constitutes the main knowledge base and links most of the research contents, playing a fundamental role in the formation of the knowledge system and disciplinary branches of HMI-ATC research.

References

Arico, P., et al.: Adaptive automation triggered by EEG-based mental workload index: a passive brain-computer interface application in realistic air traffic control environment. Front. Hum. Neurosci. 10 (2016)
Bainbridge, L.: Ironies of automation. Automatica 19(6), 775–779 (1983)


Bergner, J., Hassa, O.: Air traffic control. In: Stein, M., Sandl, P. (eds.) Information Ergonomics, pp. 197–225. Springer, Berlin, Heidelberg (2012). https://doi.org/10.1007/978-3-642-25841-1_7
Burch, M., Chuang, L.L., Duchowski, A., Weiskopf, D., Groner, R.: Eye tracking and visualization: introduction to the special thematic issue of the journal of eye movement research. J. Eye Mov. Res. 10(5) (2017)
Chen, C.: CiteSpace II: detecting and visualizing emerging trends and transient patterns in scientific literature. J. Am. Soc. Inform. Sci. Technol. 57(3), 359–377 (2006)
Hoc, J.-M.: From human–machine interaction to human–machine cooperation. Ergonomics 43(7), 833–843 (2000)
Hopkin, V.D.: Air traffic control. In: Human Factors in Aviation, pp. 639–663. Elsevier, Amsterdam (1988)
Hopkin, V.D.: Man-machine interface problems in designing air traffic control systems. Proc. IEEE 77(11), 1634–1642 (1989)
Lee, P.U.: Understanding human-human collaboration to guide human-computer interaction design in air traffic control. In: 2005 IEEE International Conference on Systems, Man and Cybernetics. IEEE (2005)
Li, H., Li, W., Wu, X.: Study on ATC human reliability based on EDT. Chin. Ergon. 16(2), 34–39 (2010)
Parasuraman, R., Sheridan, T.B., Wickens, C.D.: A model for types and levels of human interaction with automation. IEEE Trans. Syst. Man Cybern. Part A: Syst. Hum. 30(3), 286–297 (2000)
Schaefer, K.E., Chen, J.Y.C., Szalma, J.L., Hancock, P.A.: A meta-analysis of factors influencing the development of trust in automation: implications for understanding autonomy in future systems. Hum. Factors 58(3), 377–400 (2016)
Van Dam, S.B., Mulder, M., Van Paassen, M.: Ecological interface design of a tactical airborne separation assistance tool. IEEE Trans. Syst. Man Cybern. Part A: Syst. Hum. 38(6), 1221–1233 (2008)
Van Eck, N., Waltman, L.: Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics 84(2), 523–538 (2010)
Vicente, K.J., Rasmussen, J.: Ecological interface design: theoretical foundations. IEEE Trans. Syst. Man Cybern. 22(4), 589–606 (1992)
Yiu, C.Y., et al.: A digital twin-based platform towards intelligent automation with virtual counterparts of flight and air traffic control operations. Appl. Sci. 11(22), 10923 (2021)
Yun-zhong, H., Ding, G.: Fuzzy synthetic evaluation of human reliability in air traffic control. China Saf. Sci. J. 14(11), 57–60 (2004)

Construction of General Aviation Air Traffic Management Auxiliary Decision System Based on Track Evaluation

Jiang Yuan(B) and Chaoxiang Long

Civil Aviation Flight University of China, Guanghan 618307, China
[email protected]

Abstract. With the epoch-making development of China's civil aviation and the gradual promotion of low-altitude airspace reform, China's general aviation has entered a period of rapid development: the number of general aviation aircraft and general aviation flight hours have grown at a fast rate. However, general aviation still has problems such as weak overall safety and an imperfect safety system, and has not achieved the expected goals. In particular, the rules and regulations concerning general aviation air traffic management still need to be improved, and in many respects transport aviation practice has to be used as a reference for implementation. It is therefore of great significance to explore operation and management methods applicable to general aviation. This paper takes a flight training airport as an example and uses a DTW algorithm model for flight quality analysis. Aircraft flight data are obtained, trainees' training flights on take-off and landing routes and cloud-crossing routes are selected, and multiple flight paths are fed into the algorithm model as samples; the actual flight paths are compared with the expected flight paths to obtain and differentiate evaluation results. Based on the analysis of general aviation flight paths, a score-based evaluation of general aviation flight quality was carried out, and the concept of a general aviation air traffic management auxiliary decision system based on flight path evaluation was constructed. The research covers auxiliary decision suggestions for air traffic control services, airspace management and flow management, with the aim of reducing the occurrence of human errors in air traffic management and providing a theoretical reference for the development and improvement of general aviation air traffic management. Keywords: track analysis · general aviation · auxiliary decision

1 Introduction

With China's civil aviation industry entering a new stage and the gradual advance of low-altitude airspace reform, China's general aviation has entered a period of rapid development, and the number of general aviation aircraft and general aviation flight hours have grown at a fast pace. Compared with the more mature general aviation industry systems in foreign countries, China's general aviation development still has a long way to go.


By 2019, there were about 446,000 general aviation aircraft in the world, and the market was mainly concentrated in the United States, Canada, France, Brazil, Germany, the United Kingdom and Australia, whose total stock was about 350,000; China's stock accounted for only 0.61%. According to the "China General Aviation Report" issued by Yaxiang Aviation Co., Ltd. in 2021, the number of general aviation enterprises has shown a growing trend since 2014, although the pandemic has gradually slowed this growth; by June 2021, the number of actually operating general aviation enterprises in China had reached 454. Since the 14th Five-Year Plan, China's general aviation has maintained a momentum of rapid development. Hunan, Jiangxi, Anhui and other provinces have been deepening the reform of low-altitude airspace management, and all parts of the country are enthusiastic about developing the general aviation industry. The "14th Five-Year Plan" for general aviation and policies and measures to promote the development of general aviation have been introduced one after another, the construction of general aviation airports and other infrastructure has made steady progress, the general aviation operating environment has been improved constantly, and the main development indicators of general aviation have maintained smooth growth. The general aviation industry is gradually becoming a new economic growth point for the country.

Despite this rapid development, China's general aviation is still in its primary stage. General aviation still has problems such as weak overall safety capability and an imperfect safety system, and has not achieved the expected development goals. In particular, the rules, regulations and operational mechanisms of general aviation air traffic management still need to be improved, and many scenarios still have to be implemented with reference to transport aviation.

At present, there is some research on flight quality analysis and management based on flight path analysis, with the focus on improving safety, reliability and efficiency. Seah, C.E. [1] et al. use a stochastic linear hybrid system to process and analyze the random deviation between the actual flight path and the planned flight path, and carry out compliance evaluation combined with the statistical characteristics of the residual; however, the model is complex and prone to overfitting, which affects the effectiveness of the evaluation results. Zheng, Q.M. [2] et al. adopted the Monte Carlo method, taking the probability of flight path deviation as the evaluation standard of aircraft flight compliance, and created a random motion trajectory prediction model; however, this model is highly dependent on the accuracy of the flight kinematics model and has limitations. Liu Guoqian [3] used a deep temporal clustering method, combined with the idea of extended k-means clustering, to conduct clustering analysis of flight paths in order to improve aircraft performance, but the performance gradually declines under noise. Zheng Fujun [4] et al. applied fuzzy technology to flight path management on the basis of trust levels, but had high requirements on the quality of the points and flight paths, which affected the confidence level. Wang Bin [5] et al.
used historical aircraft operation data to establish quantified indicators and evaluate the approach management system; however, the selected parameters were limited in scope, which restricted the evaluation results. For general aviation air traffic management, Yu Xianze [6] analyzed the key links of air traffic management and put forward relevant requirements.


Zhang Yue [7] compared the general aviation industry at home and abroad and used the basic flow rate entry tree model to put forward measures for the reform of general aviation airspace management, but explored the question only at the macro level without analyzing specific operating conditions. Generally speaking, there is little research on the construction of a general aviation air traffic management system based on flight path analysis. Using the DTW algorithm model, this paper classifies and processes the real flight data of trainees at a training airport and then analyzes flight quality. After analyzing and comparing the results, an auxiliary decision system for general aviation air traffic management based on flight track evaluation is constructed. Based on the analysis of the flight paths of general aviation operations, this paper evaluates their flight quality, puts forward reference suggestions for all parts of air traffic management, plays an auxiliary decision-making role, and thus helps to reduce the occurrence of human errors in air traffic management.

2 Methodology

2.1 Status of Flight Path Evaluation

Current flight path evaluation technologies mainly fall into two categories: recognition of abnormal aircraft trajectories and automatic assessment by flight consistency monitoring systems. The basic idea of abnormal trajectory recognition is to cluster flight paths [8], extract characteristic trajectories to construct multidimensional feature expressions, and mark trajectories that do not belong to any cluster as abnormal. Flight consistency monitoring compares the command from the control center, or the planned flight path, with the actual flight path data; if there is a large deviation between them, the monitoring system automatically evaluates the deviation of the flight path from the plan. However, general aviation is characterized by varied flight tasks, varied targets and high operational risk, so the principles of practicability, standardization and applicability should be taken into account when designing analysis methods, and the two technologies above therefore cannot be applied directly.

The commonly used analysis approach in general aviation is based on the difference between the planned flight path and the actual flight path. By defining how this difference is calculated, it decides whether the indicators of the actual flight path are reasonable and establishes a flight-path-based flight quality analysis model from each indicator. The common method is to convert physical factors such as speed, altitude and time into distance, and to determine the coincidence between the actual flight path and the planned flight path according to the sequence of flight path points and the flight path distance range. This method is usually built on a single distance threshold applied to successive pairs of track points, so it is difficult to analyze the measurement results accurately, which greatly limits the stability of the evaluation, restricts the evaluation results, and reduces their persuasiveness. The reference value of the evaluation results obtained by the traditional method is further reduced when maneuvering flight produces more unstable tracks.
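To make the contrast with DTW concrete, the sketch below shows the kind of single-threshold, point-by-point check that the traditional method amounts to; the tracks and the 500 m threshold are invented for illustration and are not taken from the paper.

```python
import math

def within_threshold(actual, planned, threshold_m=500.0):
    """Traditional point-wise check: compare track points pairwise in order.
    Only works when both tracks have the same number of points."""
    if len(actual) != len(planned):
        raise ValueError("point-wise comparison needs equal-length tracks")
    deviations = [math.dist(a, p) for a, p in zip(actual, planned)]
    return all(d <= threshold_m for d in deviations), max(deviations)

# Invented projected coordinates (metres) for a short planned and actual segment.
planned = [(0.0, 0.0), (1000.0, 0.0), (2000.0, 0.0)]
actual = [(10.0, 40.0), (1005.0, 180.0), (1990.0, 620.0)]
print(within_threshold(actual, planned))  # (False, ~620 m at the last point)
```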


Against this background, a flight path evaluation method based on the time-series DTW algorithm, combined with dynamic programming and vector quantization, can effectively address the problem of comparing flight paths of unequal length. This method places no specific requirements on the actual flight path relative to the planned flight path: as long as the order in which the flight data points are matched remains unchanged, the optimal distance between flight path points can be obtained and used to calculate and judge the degree of coincidence between the two flight paths, which enhances the robustness and effectiveness of the analysis results and has practical significance for general aviation air traffic management.

2.2 Data Source and Processing

Statistics on China's general aviation operations in 2020 show that training flights and non-operational flights together account for more than 70% of operations [9], with training flights being the main type. Therefore, this paper selects a large flight training institution in China as the research object, and the flight data come from the training aircraft of this institution. The data are derived from the Garmin G1000 integrated avionics system, which features a large, high-resolution display and integrated communication, navigation and other avionics equipment; it is widely installed on various general aviation aircraft and provides time, latitude and longitude, altitude, speed, heading, attitude, outside temperature and other information. Such data are often used in risk monitoring, risk assessment and diagnostic research.

Conventional flight quality analysis is used by flight departments to evaluate pilot operation, focusing on the longitudinal and lateral maneuverability, stability and control efficiency of the aircraft, as well as stall characteristics and high-speed characteristics, or on models such as the altitude change rate used to analyze the causes of exceedance events and find the sources of danger. The considerations in this paper instead provide support for air traffic management, which includes air traffic control, airspace management and flow management. These three parts influence each other, but air traffic control attaches more significance to real-time contact and interaction with aircraft. In air traffic control, the focus is on determining whether the aircraft flies according to the expected flight path, especially its projected position in latitude and longitude. At the same time, considering that the flight altitude of the aircraft is referenced to the barometric altimeter and varies considerably in actual operation, only the latitude and longitude coordinates in the data are selected for analysis and comparison.

In this paper, all the tracks of 12 training flights at two airports were selected, with a track point update interval of 1 s. The actual flight tracks were transformed into planar coordinates from their latitude and longitude coordinates using the UTM [10] projection method, and the specific flight paths were presented visually. The results obtained after data visualization are shown in Fig. 1. After analyzing and categorizing the data, the main training flight paths are divided into three categories: take-off and landing routes, cloud-crossing routes, and DME arcs.
As the data samples for DME arcs are too few, and therefore subject to large errors and contingencies, they are of little reference significance. Therefore, take-off and landing routes and cloud-crossing routes are selected for data screening and sorting, as shown in Fig. 2.
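The projection step described above can be sketched with pyproj as follows; the sample coordinates and the choice of UTM zone 48N (EPSG:32648) are illustrative assumptions, not values taken from the paper's dataset, and the zone must be chosen to match the airport's actual longitude.

```python
from pyproj import Transformer

# WGS84 lat/lon -> UTM zone 48N (EPSG:32648), covering roughly 102-108 deg E.
transformer = Transformer.from_crs("EPSG:4326", "EPSG:32648", always_xy=True)

# Invented sample track points: (longitude, latitude), one per second.
track_lonlat = [(104.25, 30.95), (104.26, 30.96), (104.27, 30.97)]

track_utm = [transformer.transform(lon, lat) for lon, lat in track_lonlat]
for x, y in track_utm:
    print(f"easting={x:.1f} m, northing={y:.1f} m")
```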


Fig. 1. Flight path projection diagram

Fig. 2. Data sets of takeoff and landing routes and cloud-crossing routes after classification.

2.3 DTW Algorithm (Dynamic Time Warping)

DTW is a method for evaluating the similarity of data sequences. A time warping function satisfying certain conditions describes the temporal correspondence between a standard sequence and a test sequence, and the algorithm solves for the warping function that yields the minimum cumulative distance when the two templates are matched. Suppose there are two sequences $P = \{p_1, p_2, p_3, \cdots, p_m\}$ and $S = \{s_1, s_2, s_3, \cdots, s_n\}$ of lengths $m$ and $n$ respectively (in general $m \neq n$). When comparing the sequences, it is necessary to ensure that they have a high degree of similarity and coincidence, but their lengths, shapes or peaks are not in one-to-one correspondence. Therefore, before evaluating the similarity of the two sequences, they need to be warped along the time axis so that they can be aligned over the same length. The following distance matrix is therefore generated:

$$ d(P, S) = \left\{ \begin{matrix} \operatorname{dist}(p_1, s_1) & \cdots & \operatorname{dist}(p_1, s_n) \\ \vdots & \ddots & \vdots \\ \operatorname{dist}(p_m, s_1) & \cdots & \operatorname{dist}(p_m, s_n) \end{matrix} \right\} $$

where the entry at position $(i, j)$ of the matrix represents the distance between $p_i$ and $s_j$:

$$ \operatorname{dist}(i, j) = \left(p_i - s_j\right)^2 $$

The purpose of the DTW algorithm is to find a minimum-cost path from $(1, 1)$ to $(m, n)$ in this matrix. The path is denoted $W$, and its $k$-th element is defined as $w_k = (i, j)_k$, so each element can be regarded as a mapping between sequence $P$ and sequence $S$. The warping path is

$$ W = w_1, w_2, w_3, \cdots, w_K, \qquad \max(m, n) \le K < m + n - 1 $$

The DTW algorithm imposes the following three constraints:

(1) Boundedness: the DTW path must begin at the upper left corner $(1, 1)$ of the distance matrix and end at the lower right corner $(m, n)$, and the order of the elements of the time series must not change.

(2) Continuity: the path must be continuous; if $w_k = (a, b)$ and $w_{k-1} = (a', b')$, then $a - a' \le 1$ and $b - b' \le 1$. This constraint means that the algorithm cannot match by skipping over a point. In the matching process, many-to-one and one-to-many matches can only advance one step at a time, that is, a point can only be aligned with points adjacent to it. This guarantees that every element of sequence $P$ and sequence $S$ appears in the path $W$.

(3) Monotonicity: if $w_k = (a, b)$ and $w_{k-1} = (a', b')$, then $0 \le a - a'$ and $0 \le b - b'$ must be satisfied, that is, the indices along the path $W$ must increase monotonically with time.

Combining these constraints, each element of the path $W$ has only three possible successors. If the current grid point is $(i, j)$, the next point on the path can only be $(i + 1, j + 1)$, $(i + 1, j)$ or $(i, j + 1)$. In a matrix there are usually several paths satisfying the constraints, and the minimum cumulative value over these paths is the DTW distance. The degree of coincidence of the two sequences can therefore be judged from the DTW value:

$$ DTW(P, S) = \min\left\{ \frac{\sum_{k=1}^{K} w_k}{K} \right\} $$

Compared with the traditional method, the DTW algorithm has the advantage that, when the lengths of the planned route and the actual route are inconsistent, it can still adaptively match the planned route against the actual route. By adjusting the correspondence of the track points on the two flight paths, the optimal alignment is obtained and the DTW distance between the flight paths is calculated, which is convenient for flight quality analysis of the flight paths.
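As a minimal dynamic-programming sketch of the DTW distance defined above (an illustration only, not the authors' implementation), the function below uses squared Euclidean point costs on projected 2-D coordinates and returns both the cumulative cost and the warping-path length K; their ratio gives a path-normalized value in the spirit of the formula above. The sample tracks are invented.

```python
import numpy as np

def dtw(P, S):
    """DTW between two 2-D tracks: returns (cumulative cost, warping path length K)."""
    P, S = np.asarray(P, dtype=float), np.asarray(S, dtype=float)
    m, n = len(P), len(S)
    D = np.full((m + 1, n + 1), np.inf)      # cumulative cost matrix
    K = np.zeros((m + 1, n + 1), dtype=int)  # length of the best path to each cell
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = float(np.sum((P[i - 1] - S[j - 1]) ** 2))   # dist(i, j)
            candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
            bi, bj = min(candidates, key=lambda c: D[c])       # cheapest predecessor
            D[i, j] = cost + D[bi, bj]
            K[i, j] = K[bi, bj] + 1
    return D[m, n], K[m, n]

# Invented short tracks in projected metres (actual vs. planned).
planned = [(0, 0), (100, 0), (200, 0), (300, 0)]
actual = [(5, 10), (95, 20), (210, 15), (290, 5), (305, 0)]
total, k = dtw(actual, planned)
print("cumulative DTW cost:", total, " path length K:", k, " normalized:", total / k)
```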


3 Results and Analysis

3.1 Take-Off and Landing Routes

On the take-off and landing routes, several aircraft train simultaneously at the training airport, so flight conflicts arise easily; in addition, under the influence of flow control and other factors, the common way to ensure safety and keep the traffic flow stable is to extend the upwind leg so as to maintain spacing between aircraft. To increase the capacity of the take-off and landing routes during training, the upwind leg is extended in at most three cases. To make the flight-path analysis more reasonable, this paper establishes the expected flight path for four cases according to the different upwind-extension conditions, assigns the other flight paths to groups according to their similarity to the expected flight paths, and then substitutes them into the DTW model. The results are as follows:

Table 1. Distance measurement results of take-off and landing routes in Airport 1

Actual track number   DTW distance     Actual track number   DTW distance
1.1                   50691.029 m      3.2                   53386.555 m
1.2                   69289.519 m      4.1                   44839.866 m
1.3                   39250.835 m      4.2                   27571.985 m
1.4                   52696.088 m      4.3                   93479.731 m
2.1                   42014.343 m      4.4                   50763.407 m
2.2                   14939.871 m      4.5                   10765.420 m
3.1                   16273.261 m

The maximum DTW distance is 93479.731 m and the minimum is 10765.420 m. On the whole, most DTW distances fall between 40,000 m and 60,000 m. The shapes of the flight paths with small DTW distances are similar to the expected flight path, whereas the flight paths with large DTW distances differ markedly from it. Most of the data therefore conform to the rule that the smaller the DTW distance, the closer the track is to the expected flight path. However, there are tracks whose shape is similar to the expected path but whose DTW distance is large. Combining the original flight data and comparing the aircraft state at similar coordinate points shows that the speed and altitude of these aircraft differ significantly on the upwind and crosswind legs from those of other similar tracks, which may be caused by the need to adjust the interval to other aircraft or by human factors of the student pilots. According to the China Weather network [11], the weather that day was light rain with a northerly wind below force three, i.e. poor conditions; the more likely causes are therefore the flight-path extension imposed by control and the irregular flight path caused by the showers, which made the actual flight path deviate from the expected flight path and affected the final results. After classifying the take-off and landing routes of Airport 2 with the same method, three groups of take-off and landing routes are obtained. The results obtained by substituting them into the model are as follows (Table 2):

Table 2. Distance measurement results of take-off and landing routes in Airport 2

Actual track number   DTW distance     Actual track number   DTW distance
1.1                   18043.845 m      2.2                   16186.933 m
1.2                   27412.272 m      2.3                   55798.148 m
1.3                   55361.284 m      2.4                   20123.276 m
1.4                   36395.534 m      3.1                   27120.621 m
1.5                   41822.711 m      3.2                   29087.883 m
1.6                   55227.836 m      3.3                   21270.126 m
2.1                   38282.774 m      3.4                   21821.223 m

The maximum DTW distance is 55798.148 m and the minimum is 16186.933 m. In general, this is consistent with the rule above. The trajectory data of this airport are mainly distributed between 16,000 m and 40,000 m; compared with the assessment results of the take-off and landing routes of the first airport, the overall distribution is more concentrated and the DTW distances are smaller. Calculating the track lengths from the original data and comparing the projected tracks (Fig. 3) shows that the tracks at this airport are shorter, more consistent with the expected flight path, and denser, so the resulting DTW distances are smaller.

Fig. 3. Comparison of the two airports' tracks (left: Airport 1; right: Airport 2)


3.2 Cloud-Crossing Routes

Compared with the take-off and landing routes, the cloud-crossing route has a longer flight time and a longer flight distance, and the differences introduced by the trainees during turning training are larger and more error-prone. The conclusions obtained for the take-off and landing routes therefore cannot be applied directly, and the data need to be re-analysed. The DTW distances obtained by substituting the other flight-path data and the expected flight path into the model are shown in Table 3.

Table 3. Distance measurement results of the cloud-crossing route

Actual track number   DTW distance      Actual track number   DTW distance
5.1                   66357.646 m       5.5                   786969.669 m
5.2                   173809.982 m      5.6                   397587.258 m
5.3                   359600.620 m      5.7                   236064.546 m
5.4                   66163.597 m       5.8                   411594.574 m

The maximum DTW distance is 786969.669 m and the minimum is 66163.597 m. The DTW distances of the different flight paths vary greatly, because the cloud-crossing route can be flown at several altitudes and is interleaved with other flight-training exercises, so the degree of deviation between flight paths is large. Differences in the flight proficiency of the student pilots and in the weather conditions of the day are also reasons for the large differences in DTW distance. Nevertheless, in terms of overall shape, the smaller the DTW distance, the more consistent the flight path is with the expected one, as anticipated. However, the DTW distances between some similar flight paths and the expected flight path still differ considerably. For example, 5.5 and 5.8 are similar in terms of trajectory. Consulting the original data shows that 5.8 takes off from the airport and then executes the cloud-crossing route, whereas 5.5 is a segment of a continuous cloud-crossing route, with slower flight speed and longer flight time; over the same period the DTW distance of 5.8 from the standard flight path is smaller, which leads to its smaller result.

3.3 Algorithm Flexibility Verification

In daily training, aircraft often hold at designated points or in designated areas because of weather, trainee errors, control spacing, and other reasons. With a traditional flight-path analysis algorithm, such holding makes the calculated results differ greatly from the standard flight path and strongly affects the flight-path analysis. The DTW algorithm can effectively overcome this problem. A flight path containing two holding circles is compared with the same path with the circles removed, as shown in Fig. 4. The DTW distance from each of these two flight paths to the expected flight path is 411594.574 m. When the two flight paths are substituted into the model and compared with each other, the resulting DTW distance between them is 0 m: after the aircraft completes a circle, repeated coordinate points appear on the flight path, and the DTW algorithm selects the minimum cumulative distance when matching, so the circling portion of this flight path contributes 0 m. The calculation shows that differences in flight path caused by holding do not affect the analysis and evaluation of the flight path.

Fig. 4. Flight path comparison diagram

4 Discussion

By analysing the flight path of a sortie, the quality of that flight can be evaluated. Through the overall analysis of a large amount of historical data, the evaluation results can be used to assess a pilot and even an airline. Providing the evaluation results to the air traffic control unit helps it make more comprehensive decisions and adjust its work accordingly, with the aim of improving operational efficiency, ensuring safety, and promoting smooth operation.

4.1 Evaluation Data Differentiation

Because the method used in this paper compares the deviation between the actual track and the expected track, the raw values vary with track length and are too large for convenient, intuitive comparison. For example, the calculated values for the take-off and landing routes range from about 10,000 m to 90,000 m. Converting them to a percentage scale maps the evaluation of all flights to a range of 0 to 100, which makes it easier for air traffic control units to understand the operating level of an aircraft intuitively. The percentage method of flight-track evaluation is as follows:
1. Select the recent flight-path data of the evaluation object for analysis, and eliminate flight paths with serious abnormal operation.
2. Carry out the flight-path evaluation and obtain the evaluation data set.


3. Round the maximum DTW distance in the data set up and the minimum down to obtain the revised maximum value Xmax and the revised minimum value Xmin.
4. Substitute the evaluation results into the following formula (a minimal computational sketch of this scoring is given after Table 4):

P_i = 100 − (X_i − X_min) / (X_max − X_min) × 100

where P_i is the evaluation score of the i-th track and X_i is the DTW distance of the i-th track.

Taking Table 1 above as an example, the percentage-based results of the take-off and landing route analysis are as follows:

Table 4. Percentage system results of take-off and landing routes

Track number   DTW distance    Evaluation result   Track number   DTW distance    Evaluation result
1.1            50691.029 m     54.79               3.2            53386.555 m     51.79
1.2            69289.519 m     34.12               4.1            44839.866 m     61.29
1.3            39250.835 m     67.50               4.2            27571.985 m     80.48
1.4            52696.088 m     52.56               4.3            93479.731 m     7.24
2.1            42014.343 m     64.43               4.4            50763.407 m     54.71
2.2            14939.871 m     94.51               4.5            10765.420 m     99.15
3.1            16273.261 m     93.03
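To make the scoring steps above concrete, here is a minimal sketch (not the authors' code). The rounding granularity of 10,000 m is an assumption inferred from the values in Table 4; function and variable names are illustrative.

```python
import math

def percentage_scores(dtw_distances, rounding=10000.0):
    """Map DTW distances (metres) to 0-100 scores; larger = closer to the expected path.

    The data-set maximum is rounded up and the minimum rounded down
    to the nearest `rounding` metres (an assumed granularity).
    """
    x_max = math.ceil(max(dtw_distances) / rounding) * rounding
    x_min = math.floor(min(dtw_distances) / rounding) * rounding
    return [100 - (x - x_min) / (x_max - x_min) * 100 for x in dtw_distances]

# a few of the Airport 1 take-off/landing distances from Table 1
distances = [50691.029, 69289.519, 39250.835, 14939.871, 93479.731, 10765.420]
for d, p in zip(distances, percentage_scores(distances)):
    print(f"{d:>10.3f} m -> {p:5.2f}")
```

With the assumed granularity this approximately reproduces the scores in Table 4 (e.g., 93479.731 m maps to about 7.24 and 10765.420 m to about 99.15).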

It can be seen from Table 4 that the higher the score, the closer the track is to the expected flight path; conversely, the larger the deviation from the expected flight path, the lower the score. This score can intuitively show the ability of the evaluation object to fly close to the expected flight path, thus helping the air traffic control unit make decisions.

4.2 System Construction

By obtaining the historical data of the service objects, flight analysis based on the flight track can be carried out, and the results can be fed back to the relevant air traffic control units to assist plan approval and the formulation of operation schemes. At the same time, real-time data of the service objects can be obtained and evaluated through ADS-B and other means during operation, and the results can be fed back to the relevant air traffic control units to assist real-time decision-making and control services. The system flow is shown in Fig. 5.

Fig. 5. Schematic diagram of the general aviation air traffic management auxiliary decision system based on flight path

(1) Data acquisition and processing. Historical flight data can be exported from the aircraft's avionics system and uploaded to the server by the airline, or acquired and uploaded to the server through the ADS-B system. In the early stage, manual sorting can be adopted, and

in the later stage, the system can automatically sort and classify flight tracks through machine learning, and then evaluate and output the results. The overall result can be averaged after item-by-item analysis, and the weight of different flight types can be set according to calculation. The assignment and weight setting of the overall result should be constantly improved and optimized during operation. Real-time flight data can be acquired through the ADS-B system, which can obtain the expected track corresponding to the current flight through the voice recognition of the controller, and then carry out comparative analysis and output the results to assist the controller in carrying out work. (2) Application of evaluation results. Air traffic management is chiefly divided into three parts, namely air traffic service, flow management and airspace management. The evaluation results are sent to the corresponding departments to assist them in making decisions. In terms of air traffic services, it mainly carries out air traffic control work, while information and alerting services are less reflected in real-time operation. The evaluation data can be connected to the control monitoring service system. For the objects with high scores, the controller can consider reducing the attention appropriately and allocating the energy to the objects with low scores. Real-time evaluation data access is more conducive to the controller’s real-time work adjustment. If the


historical evaluation data of the service object are low, the controller needs to pay attention to the whole process to guarantee safety. Appropriate reminders and signs can help controllers allocate their energy rationally, thus reducing controller fatigue and improving the control effect. Flow management mainly involves flight plan approval and flow control for the general aviation airline. The data will be connected to the data system of the general aviation flight service station, and certain restrictions will be imposed on flight plan approval for objects with low scores. Moreover, when the operational situation is more complex and requires restrictions and delays, flow control will be applied to these objects first to ensure the operation of other aviation units. Objects with high scores will be encouraged or even rewarded in plan approval and operational restriction, so as to reduce delays. The principle of high-score priority also urges all aviation units to strengthen team building, manage quality well, and improve flight quality, thus promoting the improvement of the overall flight quality of the industry. Airspace management likewise adopts the high-score-priority principle in the approval of temporary routes and temporary airspace, giving priority of use and approval to units with higher scores and thereby pushing the general aviation departments to strengthen their foundations and emphasise quality. By providing evaluation scores to air traffic control units for decision-making reference, the system can help front-line controllers allocate their energy better and improve the efficiency and effect of control [12]; it can also provide support for the examination and approval departments, so that decisions are made on a reasonable and scientific basis. The high-score-priority principle also pushes the general aviation departments to manage operations well and improve flight quality.

Human factors are considered one of the decisive factors in the management and improvement of flight safety, and reducing the errors and risks caused by humans in aviation operation is an important research direction. The system proposed in this paper contributes in two respects: approval and monitoring, and operational execution. In the examination and approval process, units with higher evaluations are favoured, so that the flight tasks they execute are less likely to involve mistakes at the pilot-operation level than those of units with lower evaluations. In terms of control implementation, the control scenario of general aviation is complex, especially when general aviation and transport aviation operate at the same time, which increases the difficulty and intensity of the controllers' work and more easily causes control fatigue and even errors, forgetting, and omissions. The auxiliary system proposed in this paper can, to some extent, help controllers allocate their energy more reasonably and reduce energy consumption, so that they work in a better state, make fewer human errors, and keep aviation operation safe.

The system set up in this paper is still relatively simple, and there is still considerable room for improvement. Some technical problems remain to be overcome, such as automatic track classification based on machine learning and the classification and evaluation of real-time track data. At the same time, the system also has further application space.
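Purely as an illustration of the evaluation flow described in this subsection and in Fig. 5, a batch evaluation step might be organised as sketched below. The data layout, the route-type-keyed expected paths, and the reuse of the dtw_distance and percentage_scores sketches from Sects. 2.3 and 4.1 are all assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class TrackEvaluation:
    track_id: str
    route_type: str      # e.g. "take-off/landing" or "cloud-crossing"
    dtw_distance_m: float
    score: float         # 0-100 percentage score

def evaluate_tracks(tracks, expected_paths, dtw_distance, percentage_scores):
    """Evaluate a batch of historical tracks against their expected paths.

    `tracks` is assumed to be a list of (track_id, route_type, points);
    `expected_paths` maps a route type to its expected track points;
    `dtw_distance` and `percentage_scores` are the functions sketched earlier.
    The results would then be pushed to the ATC-side systems (control
    monitoring, flow management, airspace approval) as in Fig. 5.
    """
    distances = [dtw_distance(points, expected_paths[route_type])
                 for _, route_type, points in tracks]
    scores = percentage_scores(distances)
    return [TrackEvaluation(tid, rtype, d, s)
            for (tid, rtype, _), d, s in zip(tracks, distances, scores)]
```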
In January 2023, the CAAC issued a notice soliciting opinions on two civil aviation industry standards, including the Requirements for Air Traffic Service of Civil Unmanned Aircraft (Draft for Comments), which puts forward the requirement of regularly uploading the flight data of unmanned aviation. Applying this system to flight-track evaluation, and further optimising the operation of unmanned aircraft on that basis, will provide ideas and references for the air traffic management of unmanned aircraft.

5 Conclusion

Taking a flight training airport as an example, this paper uses a DTW algorithm model for flight quality analysis. Real flight data were obtained, the trainees' exercises on the take-off and landing routes and the cloud-crossing routes were selected, and the algorithm model was applied after the tracks were classified according to their characteristics. After the evaluation results were obtained, they were compared and analysed with respect to the influence of weather, control, and other factors, and clear conclusions were drawn. A flexibility verification was also carried out to confirm the effectiveness of the algorithm. The experimental results show that the DTW algorithm has good anti-interference ability and can evaluate the coincidence between the actual and expected flight paths well. On the basis of the DTW algorithm model, an air traffic management auxiliary decision system for general aviation based on flight-path evaluation is constructed. Based on the flight-path analysis of general aviation operation, the evaluation results are converted to percentage scores to make them more intuitive, and the score-based evaluation of general aviation flight quality is completed. Air traffic management suggestions are put forward based on the evaluation scores of general aviation flights. By providing evaluation scores to air traffic control units for decision-making reference, the system can aid front-line controllers in allocating their energy better and improve the efficiency and effect of control; it can also provide support for the examination and approval departments so that decisions are made on a reasonable and scientific basis. The principle of high-score priority also pushes the general aviation departments to manage operations well and improve flight quality. The auxiliary decision system can provide a theoretical reference for the development and improvement of general aviation air traffic management, and its application can also help reduce human errors in air traffic control work, which is of practical significance.

References

1. Seah, et al.: Algorithm for conformance monitoring in air traffic control. J. Guid. Control Dyn. 33(2) (2010). https://doi.org/10.2514/1.44839
2. Zheng, et al.: Probabilistic approach to trajectory conformance monitoring. J. Guid. Control Dyn. 35(6) (2012). https://doi.org/10.2514/1.53951
3. Liu, G., et al.: Deep flight track clustering based on spatial–temporal distance and denoising auto-encoding. Expert Syst. Appl. 198, 116733 (2022). https://doi.org/10.1016/j.eswa.2022.116733
4. Zheng, F., He, Z.: Flight path management based on fuzzy technology in practical systems. Tactical Missile Control Technol. 15(1), 60–64 (2007)
5. Wang, B., et al.: Study on operational efficiency improvement of approach management system based on track. J. Harbin Univ. Commer. (Nat. Sci. Ed.) 37(3), 295–299+306 (2021). https://doi.org/10.19492/j.cnki.1672-0946.2021.03.006
6. Xianze, Y.: Air traffic management of civil UAV. Civil Aviat. Manag. 03, 56–58 (2018)
7. Zhang, Y.: Research on airspace management reform under the development of general aviation industry. Nanchang Hangkong University (2020). https://doi.org/10.27233/d.cnki.gnchc.2020.000266
8. Yang, J., Zhao, C.: A review of k-means clustering algorithms. Comput. Eng. Appl. 55(23), 7–14+63 (2019)
9. China Aviation News. https://www.cannews.com.cn/2022/0421/342268.shtml
10. Application of UTM projection in low latitude regions abroad [EB/OL]. https://www.xzbu.com/8/view-5660913.htm
11. China Weather Net [EB/OL]. http://www.weather.com.cn/
12. Li, X.: Establishment of a situation awareness analysis model for flight crew alerting. In: Harris, D., Li, W.-C. (eds.) Engineering Psychology and Cognitive Ergonomics. HCII 2022. LNCS, vol. 13307. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-06086-1_26

How to Determine the Time-Intervals of Two-Stage Warning Systems in Different Traffic Densities? An Investigation on the Takeover Process in Automated Driving

Wei Zhang1, Shu Ma2(B), Zhen Yang2, Changxu Wu1, Hongting Li3, and Jinlei Shi4

1 Tsinghua University, Beijing, China

[email protected]

2 Zhejiang Sci-Tech University, Hangzhou, China

[email protected]

3 Zhejiang University of Technology, Hangzhou, China 4 Zhejiang University, Hangzhou, China

Abstract. In level 3 automated driving, warning systems play a critical role in reminding drivers who are engaging in non-driving-related tasks to take over the vehicle when the automated driving system meets difficulties that exceed its capacity. Two-stage warning systems, which provide a time-interval between the first and the second warning, compensate for the deficiencies of single-stage warning systems in terms of inadequate motor preparation and situation awareness. However, the effectiveness of different lengths of time-interval in different traffic densities remains unclear. The present study conducted an experiment to investigate the effects of different time-intervals (3 s, 5 s, 7 s, and 9 s) and traffic density levels (low: 0 vehicles/km; high: 20 vehicles/km) on drivers' motor readiness, situation awareness, takeover performance, and subjective ratings. Results suggested that 5 s and 7 s (compared with 3 s and 9 s) were two favored time-intervals in general, but the drivers' response patterns and the beneficial aspects of the time-intervals varied for driving environments with different traffic density levels. The findings of the present study have implications for the design and application of two-stage warning systems. Keywords: Time-intervals of Two-stage Warning Systems · Traffic Density · Situation Awareness

1 Introduction

1.1 Takeover in Automated Driving

In level 3 automated driving, drivers are allowed to perform various non-driving related tasks (NDRTs) but are required to take over the vehicle when the automated driving system comes across situations that it cannot handle (e.g., a missing lane, stationary vehicles, or obstacles in the current lane) [1, 2]. In this circumstance, the takeover warning systems, which are responsible for reminding distracted drivers to shift attention from the NDRTs back to the road, are critical for takeover safety.


1.2 The Two-Stage Warning Systems and Time-Intervals

In the past few decades, researchers have investigated and validated the effects of different takeover warning systems, such as auditory, visual, tactile, and multimodal warning systems, on takeover performance [3]. What these warning systems have in common is that they generally issue a warning only once, at a certain lead time; that is, they are single-stage warning systems. However, it was found that drivers usually finish the takeover process rapidly without establishing good motor readiness or gaining sufficient situation awareness [4, 5], which may lead them to miss potential hazards and can even result in unexpected collisions. In addition, even if drivers always manage to avoid the hazards and take over the vehicle successfully, such rushed response patterns bring driving stress [4] and are harmful to drivers' physical health in the long term. To make up for these deficiencies, our previous work proposed the two-stage warning system and found beneficial effects on enhancing situation awareness, alleviating stress, improving takeover performance, and gaining drivers' acceptance [4, 5].

The two-stage warning systems issue two warnings with different purposes [6, 7]. The first warning, "attention please", aims to inform the drivers that the automated vehicle has encountered a hazard; to respond to it, drivers are required to pay attention to the road without taking immediate action. The second warning, "take over please", asks drivers to take actual maneuvers. By requiring such different responses, the two-stage warning systems can form a safer takeover habit for drivers in the long run. The primary benefit of the two-stage warning systems lies in the time-interval between the first and the second warning, during which drivers can make preparations such as shifting their attention from the NDRTs to the driving scenario, gaining situation awareness, achieving motor readiness, and making decisions [4]. With the advent of the Internet of Vehicles (IoV), early warning can be realised [8], which means that the time-intervals of two-stage warning systems can be prolonged as well.

Nevertheless, how long a time-interval do drivers need to achieve motor readiness and gain good situation awareness, so as to promote safe takeover behaviors? Should the time-interval be set as long as possible? Previous researchers suggested that it is important for a warning to retain human attention so that information can be encoded [9] and to prevent people from being distracted by other stimuli before the message is encoded satisfactorily [10]. In practice, long time-intervals are more favorable than short ones for gaining situation awareness (which comprises information encoding) and achieving motor readiness. However, drivers' attention may be attracted by other issues, or they may lose patience, if the time-interval is too long, which may make a long time-interval as detrimental as a short one. Therefore, to optimize the effectiveness of the two-stage warning systems, proper time-intervals need to be found that allow drivers to make sufficient preparation and adopt safe takeover behaviors.
1.3 Traffic Density

As the takeover process is a typical human-system-environment loop, the effectiveness of takeover warning systems is influenced by other factors, including driver-related factors (e.g., driving experience, trust, age, fatigue, and alcohol), NDRTs (e.g., watching videos, listening to music, texting, playing games), and the driving environment (e.g., traffic density, available escape paths, road types, and weather conditions) [3]. Compared with the NDRTs and human-related factors, the environmental factors during the takeover process, which vary in traffic density, road type, weather conditions, and other aspects [11, 12], are more uncontrollable and unpredictable. Previous research suggested that traffic density plays a major role in takeover performance compared to the drivers' state (Gold et al., 2016). In addition, owing to technical bottlenecks, automated driving will first be realised on the highway, where variation in traffic density is the commonest. Therefore, the traffic density of the takeover environment has been widely explored. In general, previous studies suggested that traffic density impairs drivers' takeover performance [13–17]. For example, Körber et al. (2016) explored the effect of traffic density, age, and NDRTs on takeover performance, observing that traffic density extended takeover time, increased the number of collisions, and decreased the minimum TTC [15]. Investigating the effect of traffic density when drivers engaged in different NDRTs, Gold et al. (2016) found longer takeover times and worse takeover quality in the presence of traffic [14].

1.4 Interaction Effect of Traffic Density and Time-Intervals

However, most studies focused on driver factors or NDRTs under varying traffic density. For single-stage warning systems, only a few researchers have explored the effectiveness of warning systems by manipulating different lengths of lead time in various traffic density conditions. For example, Du et al. (2020) investigated the effect of the lead time of takeover warnings, traffic density, and the cognitive workload of NDRTs on takeover performance [13]. In their study, increased traffic density led to a shorter minimum TTC, while no interaction between lead time and traffic density was observed on takeover performance. Doubek et al. (2020) explored the effect of lead time (5 s, 7 s, and 20 s) and traffic density level (low vs. medium) on takeover performance [18]. Results indicated that in medium traffic density many participants did not decelerate before making a lane change, which resulted in a dangerous emergency brake by the left-lane vehicle; no interaction effect of lead time and traffic density was observed. However, it should be noted that the lead time of single-stage warning systems differs from the time-interval of two-stage warning systems. When a single-stage warning is issued, drivers need to take over directly within the provided time budget, whereas during the time-interval of a two-stage warning system the drivers' primary task is to achieve motor readiness and gain situation awareness for the takeover without immediate action. As high traffic density is coupled with fewer escape paths and restricts drivers' opportunity to change lanes [13], it was assumed that longer time-intervals would help drivers gain sufficient situation awareness (including perceiving the traffic environment, understanding the information, and analysing and predicting the surrounding vehicles' movements) and thereby achieve good performance. However, through the literature review, we found that the effect of the time-intervals of two-stage warning systems and traffic density on the takeover process remains unclear.
In addition, the drivers' opinions should also be emphasized, as user experience is important for practical application. However, most researchers focused on takeover performance, and few studies have explored drivers' subjective opinions on the parameters of takeover warning systems (e.g., how appropriate and useful the system is).


1.5 Aims In summary, the present study aims to investigate the effect of time-intervals on drivers’ motor preparation, situation awareness, takeover performance, and subjective ratings in different traffic density levels. A simulated driving experiment was conducted to test our hypotheses.

2 Methods

2.1 Participants

A total of 44 participants (29 male, 15 female) aged 19 to 26 years (Mean = 23.07, SE = 0.293) were recruited from Zhejiang Sci-Tech University. All participants were required to hold a valid driving license and to have normal or corrected-to-normal vision. As regards driving experience, the participants' average number of years of driving was 2.763 years (SE = 0.280), their average driving mileage in the past year was 507.661 km (SE = 250.662), and their average total driving mileage was 4011.366 km (SE = 2510.550).

2.2 Experiment Design and Measurement

This study adopted a 2 × 4 mixed design, with traffic density (high vs. low) as the between-subject factor and time-interval of the two-stage warning system (3 s, 5 s, 7 s, and 9 s) as the within-subject factor. Traffic density was defined as the number of surrounding vehicles per kilometer [14]. In the experiment, the low and high traffic densities were 0 vehicles per kilometer and 20 vehicles per kilometer. The participants were randomly distributed into two groups corresponding to low and high traffic density. There were no significant differences between the two groups with regard to age (t (42) = 0.850, p = 0.401), total driving years (t (42) = −0.392, p = 0.697), average driving mileage in the past year (t (42) = −1.142, p = 0.265), or total driving mileage (t (42) = −1.408, p = 0.174).

Motor readiness, takeover performance, situation awareness, and subjective ratings of the time-intervals during the takeover process were collected in the experiment. Motor readiness was measured by the hands-on-steering-wheel time, defined from the moment the warning was issued until the participant placed at least one hand on the steering wheel (Lu et al., 2019). Takeover performance was measured by two indicators: (1) takeover time, defined as the minimum time between the issuance of the warning and the moment when the steering angle exceeded 2° or the brake percentage exceeded 10% (Gold et al., 2013); (2) the maximum resulting acceleration, defined as in Eq. (1):

maximum resulting acceleration = max √(acceleration²_longitudinal + acceleration²_lateral)   (1)

This was an indicator of takeover quality and was collected from the onset of the second warning to the moment when the driver had overtaken the hazard. The higher the maximum resulting acceleration, the worse and less safe the takeover quality [19]. All takeover performance indicators were collected by the STISIMDRIVE M100K software at a frequency of 120 Hz.

Situation awareness was measured by the Situation Awareness Rating Technique (SART), which includes three dimensions (demand on attentional resources (D), supply of attentional resources (S), and understanding of the situation (U)) with 10 questions [20, 21]. Each participant completed the SART on a seven-point Likert scale after each takeover trial. The final SA score was calculated as the sum of U and S minus D.

Subjective ratings included appropriateness and usefulness ratings of the time-intervals. A 15-point rating scale was used to evaluate these two indicators in the following steps: participants first selected one of five categories (appropriateness: far too long, too long, just right, too short, and far too short; usefulness: very useless, useless, moderate, useful, and very useful) and then narrowed down their answer using three subcategories (−, 0, +) [22]. The appropriateness and usefulness ratings were then transformed into scores ranging from 1 to 15.

2.3 Apparatus

The experiment was conducted on a fixed-base driving simulator (see Fig. 1), which consisted of driving simulation software (STISIMDRIVE M100K, Systems Technology Inc., Hawthorne, CA, United States), a ThinkCentre workstation [Precision M6600t, Intel Core (TM) i5-6500 CPU 3.20 GHz], an adjustable seat, and a control unit (Logitech MOMO, Newark, CA, United States) including a steering wheel, an accelerator, and a brake pedal. The driving scenarios were presented on a 60-in screen with 1920 × 1080-pixel resolution, and the warning messages and driving sounds were played by a loudspeaker placed below the screen. A Logitech C270 webcam with 1280 × 720-pixel resolution was mounted at the right rear of the driving seat to record driving behaviors.

Fig. 1. Driving simulator of this study
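As a concrete illustration of the indicators defined in Sect. 2.2, the following is a minimal sketch (not the authors' analysis code) of how takeover time, the maximum resulting acceleration, and hands-on time could be extracted from a simulator log sampled at 120 Hz. The thresholds reproduce the definitions above, while the data layout and variable names are assumptions.

```python
import numpy as np

FS = 120.0  # sampling rate of the simulator log (Hz)

def takeover_metrics(t, steer_deg, brake_pct, acc_long, acc_lat,
                     t_warning, t_hazard_passed, hands_on_time=None):
    """Compute the Sect. 2.2 indicators for one trial.

    t, steer_deg, brake_pct, acc_long, acc_lat: arrays sampled at FS.
    t_warning: time the takeover ('take over please') warning was issued.
    t_hazard_passed: time the driver had overtaken the hazard.
    hands_on_time: timestamp of the first hand on the wheel (e.g. coded
        from video); in the original design hands-on time is measured
        relative to the warning.
    """
    t = np.asarray(t, dtype=float)
    steer_deg = np.asarray(steer_deg, dtype=float)
    brake_pct = np.asarray(brake_pct, dtype=float)
    acc_long = np.asarray(acc_long, dtype=float)
    acc_lat = np.asarray(acc_lat, dtype=float)

    # takeover time: first moment after the warning where the steering
    # angle exceeds 2 degrees or the brake input exceeds 10 %
    after = t >= t_warning
    acted = after & ((np.abs(steer_deg) > 2.0) | (brake_pct > 10.0))
    takeover_time = t[acted][0] - t_warning if acted.any() else np.nan

    # maximum resulting acceleration between the warning and the moment
    # the hazard is passed (Eq. 1)
    window = (t >= t_warning) & (t <= t_hazard_passed)
    resulting = np.sqrt(acc_long[window] ** 2 + acc_lat[window] ** 2)
    max_resulting_acc = resulting.max() if window.any() else np.nan

    hands_on = (hands_on_time - t_warning) if hands_on_time is not None else np.nan
    return hands_on, takeover_time, max_resulting_acc
```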


2.4 Driving Scenarios

The driving scenarios were built with the STISIM M100K software. The road was a six-lane highway with three lanes in each direction, and the automated driving system drove at a speed of 100 km/h in the middle lane. The experiment included four static takeover events: (1) a broken-down truck, (2) obstacles, (3) a traffic accident, and (4) a work zone. All takeover events appeared in the lane of the automated vehicle and required the drivers to take over and change lanes. These takeover events appeared simultaneously with the issue of the first warning of the two-stage warning system, so the drivers could not see the hazard in advance. Traffic density during the takeover process varied between experimental conditions. In the low traffic density condition, there were no vehicles in the adjacent lane of the automated vehicle during the takeover (i.e., the traffic density was 0 vehicles per kilometer). In the high traffic density condition, other vehicles travelled at 100 km/h in the adjacent lane (i.e., the traffic density was 20 vehicles per kilometer and the distance between vehicles was 50 m).

2.5 Two-Stage Warning Systems

The two-stage warning system consisted of two short auditory messages, chosen for the omni-directional benefit of the auditory modality [23] and its effectiveness for takeover performance [24]. When approaching the takeover event, the first warning "attention please" was issued at a lead time of 10 s, 12 s, 14 s, or 16 s. The second warning "take over please" was then issued at a lead time of 7 s (the lead time commonly used in the literature) [3]. Consequently, the time-intervals between the two warnings were 3 s, 5 s, 7 s, and 9 s, respectively. These four time-intervals were balanced by a Latin square and randomly assigned to the four takeover events. All speech was rendered as a digital female voice with a speed of 150 words per minute, a loudness of 70 dB, and a duration of 1100 ms.

2.6 Non-driving Related Task

A Tetris game installed on a smartphone was chosen as the non-driving related task during automated driving. The falling speed of the pieces was set to 1.6 squares/s. To control for motivation effects, participants were told in advance that they would be scored on a composite of their takeover performance (weight: 60%) and Tetris game scores (weight: 40%), and that the top three participants would receive rewards of 50 yuan, 75 yuan, and 100 yuan, respectively.

2.7 Procedure

First, the experimenter welcomed the participants. All participants filled out the consent form and a basic questionnaire covering age, driving experience, and physical state. They then familiarised themselves with the driving simulator and practised the takeover process as well as the Tetris game during automated driving (around 15 min), after which they were randomly assigned to the low or high traffic density group. To reduce expectation effects, the automated driving phase (i.e., the duration of the non-driving related task) lasted 3 or 4 min, counterbalanced with the time-intervals. Each participant completed four trials, corresponding to the four time-intervals. Participants were asked to pay attention to the road without taking any action when they heard the first warning, although they could make motor preparations; they were also not forbidden to continue playing the Tetris game after the first warning was issued. When the second warning was issued, participants could take over the vehicle by steering the wheel or braking. After successfully taking over, they drove for 1 min, and the trial then ended automatically. After each trial, participants rated the appropriateness and usefulness of the time-interval they had experienced and completed the SART regarding their situation awareness during the takeover. All participants were asked to report any typical simulator sickness symptoms (e.g., dizziness, headache, nausea) during the experiment; no simulator sickness was reported. The whole experiment lasted around 45 min and each participant received 25 yuan in compensation; the top three participants by composite score additionally received awards of 50, 75, and 100 yuan, respectively.

2.8 Data Analysis

The statistical analysis was conducted with IBM SPSS 26.0 (IBM, Inc.). For hands-on steering-wheel time, takeover time, and maximum resulting acceleration, the normality of the data was first checked with the one-sample Kolmogorov-Smirnov test; a linear mixed model (LMM) accounting for fixed and random effects was then fitted [25–27], taking time-interval, traffic density, and their interaction as fixed effects, and individual differences, the location and sequence of the takeover event, the automated driving time, and the sequence of time-intervals as random effects. Paired t-tests were employed for post hoc tests when a main effect was significant, and simple-effect analyses when the interaction effect was significant. For the ordinal data, including situation awareness, appropriateness ratings, and usefulness ratings, boxplots of the raw data were first plotted and outliers were removed; a Friedman test was then conducted to test the main effect [28]. If the main effect was significant, Wilcoxon signed-rank tests were used for post hoc analysis to check which pairwise comparisons were significant [29]. Because the Friedman test cannot test interactions, the effect of time-interval was analysed separately under low and high traffic density. The significance level of all analyses was set to 0.05.
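The original analysis was run in SPSS. Purely as an illustration of the pipeline described in Sect. 2.8, a rough Python equivalent could look like the sketch below, with statsmodels' mixedlm as a stand-in for the LMM and the Friedman and Wilcoxon tests from SciPy. The long-format data frame, its column names, and the simplified random-effects structure are assumptions.

```python
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# hypothetical long-format trial data, one row per participant x time-interval
df = pd.read_csv("takeover_trials.csv")

# linear mixed model: fixed effects of time-interval, traffic density and their
# interaction; random intercept per participant (a simplification of the full
# random-effects structure described in Sect. 2.8)
lmm = smf.mixedlm("takeover_time ~ C(interval) * C(density)",
                  data=df, groups=df["participant"]).fit()
print(lmm.summary())

# Friedman test on the ordinal SART scores within one traffic-density group
# (assumes the 'interval' column holds the integers 3, 5, 7, 9)
low = df[df["density"] == "low"].pivot(index="participant",
                                       columns="interval",
                                       values="sart")
chi2, p = stats.friedmanchisquare(low[3], low[5], low[7], low[9])
print(chi2, p)

# Wilcoxon signed-rank post hoc test for one pair of intervals
w, p_pair = stats.wilcoxon(low[3], low[5])
print(w, p_pair)
```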

3 Results 3.1 Descriptive Statistics The mean and standard deviation (SE) of time-intervals in hands-on steering-wheel time, takeover time, and maximum resulting acceleration in low and high traffic density conditions were shown in Table 1. The median, interquartile range (IQR), and mean rank of time-intervals in situation awareness, appropriateness, and usefulness in low and high traffic density conditions were shown in Table 2.


Table 1. Mean and standard deviation (SE) of time-intervals in hands-on steering-wheel time, takeover time, and maximum resulting acceleration in low and high traffic density conditions.

Traffic density   Time-interval   Hands-on steering-wheel time (s), Mean (SE)   Takeover time (s), Mean (SE)   Maximum resulting acceleration (m/s²), Mean (SE)
Low               3 s             3.477 (0.193)                                 1.572 (0.102)                  1.516 (0.182)
Low               5 s             3.964 (0.310)                                 1.280 (0.088)                  1.465 (0.194)
Low               7 s             4.741 (0.491)                                 1.283 (0.108)                  1.347 (0.118)
Low               9 s             6.333 (0.731)                                 1.618 (0.137)                  1.446 (0.151)
High              3 s             3.246 (0.205)                                 1.882 (0.176)                  3.439 (0.353)
High              5 s             2.909 (0.196)                                 1.373 (0.108)                  3.790 (0.516)
High              7 s             3.450 (0.406)                                 1.564 (0.135)                  3.580 (0.332)
High              9 s             4.114 (0.583)                                 1.600 (0.115)                  3.571 (0.427)

Table 2. Median, interquartile range (IQR), and mean rank of time-intervals in situation awareness, appropriateness, and usefulness in low and high traffic density conditions.

                                   Situation awareness                 Appropriateness                    Usefulness
Traffic density   Time-interval    Median   IQR            Mean rank   Median   IQR          Mean rank   Median   IQR            Mean rank
Low               3 s              26.0     17.75–28.0     1.74        10.0     7.0–12.0     3.58        10.0     8.75–12.0      2.07
Low               5 s              28.0     27.25–31.0     2.87        8.0      6.5–8.0      2.83        12.0     10.0–13.0      3.02
Low               7 s              26.0     22.0–29.5      2.74        7.0      6.0–8.0      2.42        12.0     10.5–12.0      2.86
Low               9 s              28.5     22.75–31.25    2.66        4.5      2.0–6.0      1.18        10.0     8.75–12.25     2.05
High              3 s              23.0     15.5–26.75     2.18        10.0     9.0–11.0     3.59        11.0     10.0–12.0      2.25
High              5 s              21.0     18.5–24.0      2.18        8.0      7.75–9.0     2.50        11.0     9.75–12.5      2.28
High              7 s              24.0     20.0–26.75     2.87        8.0      7.0–9.0      2.16        12.0     11.0–12.0      2.91
High              9 s              22.0     19.0–25.0      2.76        7.0      6.0–8.0      1.75        12.0     10.0–13.01     2.56

Note: IQR: interquartile range. For appropriateness, 1 = "far too long", 8 = "just right", 15 = "far too short"; for usefulness, 1 = "completely useless", 8 = "moderate", 15 = "very useful".

3.2 Motor Readiness

The LMM analysis suggested that the main effect of time-intervals on hands-on steering-wheel time was significant (F (3, 126) = 11.196, p < 0.001). As Fig. 2 shows, the time drivers took to put their hands on the steering wheel increased with the time-interval; every pairwise comparison among the four time-intervals was significant except that between 3 s and 5 s. The main effect of traffic density was also significant (F (1, 42) = 5.646, p < 0.05): drivers tended to spend more time putting their hands on the steering wheel in low traffic density than in high traffic density (ps < 0.05) (see Fig. 3). In addition, the interaction effect of time-intervals and traffic density was marginally significant (F (3, 126) = 2.544, p = 0.064). As Fig. 4 shows, the simple-effect analysis indicated that when traffic density was low, drivers' time to put their hands on the steering wheel increased with the time-interval (every pairwise comparison was significant except that between 3 s and 5 s). When traffic density was high, the hands-on steering-wheel times in the 5 s and 7 s time-intervals were shorter than in 9 s (ps < 0.01 for 5 s vs. 9 s; ps = 0.072 for 7 s vs. 9 s). No significant differences were observed among 3 s, 5 s, and 7 s.

Fig. 2. The effect of different time-intervals on hands-on steering-wheel time (*p < 0.05; ***p < 0.001)

Fig. 3. The effect of different traffic density levels on hands-on steering-wheel time (*p < 0.05)


Fig. 4. The effect of different time-intervals on hands-on steering-wheel time in low and high traffic density (+ p < 0.1; *p < 0.05; **p < .01; ***p < 0.001)

3.3 Takeover Performance Takeover Time. The LMM analysis indicated that the main effect of time-intervals on takeover time was significant (F (3, 126) = 6.734, p < 0.001). As Fig. 5 shows, the post hoc test suggested that drivers spent less time taking over the vehicle in the time-interval of 5 s than that in 3 s (ps < 0.001) and 9 s (ps < 0.001). Similarly, their takeover time in the time-interval of 7 s was significantly shorter than that in 3 s (ps < 0.01) and 9 s (ps = 0.073). Nevertheless, there were no significant differences between time-intervals of 5 s and 7 s. The main effect of traffic density (F (1, 42) = 1.862, p = 0.180) and interaction effect (F (3, 126) = 1.130, p = 0.345) were not significant on takeover time.

Fig. 5. The effect of different time-intervals on takeover time (+ p < 0.1; **p < .01; ***p < 0.001)


The Maximum Resulting Acceleration. Results suggested that the main effect of traffic density on the maximum resulting acceleration was significant (F (1, 42) = 36.063, p < 0.001). As Fig. 6 shows, drivers had a larger maximum resulting acceleration in the high traffic density condition than in the low traffic density condition (ps < 0.001). The main effect of time-intervals (F (3, 126) = 0.196, p = 0.899) and the interaction effect of time-intervals and traffic density on the maximum resulting acceleration (F (3, 126) = 0.602, p = 0.617) were not significant.

Fig. 6. The effect of traffic density on maximum resulting acceleration (+ p < 0.1; **p < .01; ***p < 0.001)

3.4 Situation Awareness

For the low traffic density, the Friedman test suggested a significant effect of time-intervals on situation awareness (χ2 (3) = 10.123, p < 0.05) (Fig. 7). The Wilcoxon tests suggested that situation awareness in the 3 s time-interval (median = 10.0, IQR = 8.75–12.0) was significantly lower than in 5 s (median = 12.0, IQR = 10.0–13.0) (Z = −2.966, p < 0.01, r = −0.632), 7 s (median = 12.0, IQR = 10.5–12.0) (Z = −2.040, p < 0.05, r = −0.435), and 9 s (median = 10.0, IQR = 8.75–12.25) (Z = −2.676, p < 0.01, r = −0.571). The differences among the 5 s, 7 s, and 9 s time-intervals were not significant. For the high traffic density, the Friedman test suggested that the effect of time-intervals on situation awareness was not significant (χ2 (3) = 4.840, p = 0.184).


Fig. 7. The boxplot of raw data of situation awareness in terms of time-intervals in low and high traffic density

3.5 Subjective Ratings

Appropriateness Ratings of Time-Intervals. For the low traffic density, the Friedman test suggested that the main effect of time-intervals on the appropriateness ratings was significant (χ2 (3) = 38.179, p < 0.001) (Fig. 8). The post hoc tests suggested that the appropriateness rating of 3 s (median = 10.0, IQR = 7.0–12.0) was significantly larger than that of 5 s (median = 8.0, IQR = 6.5–8.0) (Z = 3.175, p < 0.001, r = 0.677), 7 s (median = 7.0, IQR = 6.0–8.0) (Z = 3.305, p < 0.001, r = 0.705), and 9 s (median = 4.5, IQR = 2.0–6.0) (Z = 4.027, p < 0.001, r = 0.859). Moreover, the appropriateness rating of 9 s was significantly smaller than that of 5 s (Z = −4.030, p < 0.001, r = −0.859) and 7 s (Z = −3.401, p < 0.001, r = −0.725). However, the difference between 5 s and 7 s was not significant, with the ratings of these two time-intervals being closest to the "just right" rating (i.e., 8). For the high traffic density, the Friedman test also suggested a significant main effect of time-intervals on the appropriateness ratings (χ2 (3) = 20.593, p < 0.001). The post hoc tests suggested that the appropriateness rating of 3 s (median = 10.0, IQR = 9.0–11.0) was significantly larger than that of 5 s (median = 8.0, IQR = 7.75–9.0) (Z = 3.177, p < 0.001, r = 0.677), 7 s (median = 8.0, IQR = 7.0–9.0) (Z = 3.357, p < 0.001, r = 0.716), and 9 s (median = 7.0, IQR = 6.0–8.0) (Z = 3.439, p < 0.001, r = 0.733). In addition, the appropriateness rating of 9 s was slightly smaller than that of 5 s (Z = −1.898, p = 0.058, r = −0.405). No significant differences between 5 s and 7 s, or between 7 s and 9 s, were observed.


Fig. 8. The boxplot of raw data of appropriateness in terms of time-intervals in low and high traffic density (the dotted lines from bottom to top: 1: far too long, "−"; 8: just right, "0" (the most appropriate rating); 15: far too short, "+").

Usefulness Ratings of Time-Intervals. For the low traffic density, the Friedman test suggested that the main effect of time-intervals on the usefulness ratings was significant (χ2 (3) = 12.300, p < 0.01) (Fig. 9). The post hoc tests suggested that the usefulness rating of 3 s (median = 10.0, IQR = 8.75–12.0) was significantly smaller than that of 5 s (median = 12.0, IQR = 10.0–13.0) (Z = −2.415, p < 0.05, r = −0.515) and 7 s (median = 12.0, IQR = 10.5–12.0) (Z = −2.547, p < 0.05, r = −0.543). Moreover, the usefulness rating of 9 s (median = 10.0, IQR = 8.75–12.25) was also significantly smaller than that of 5 s (Z = −2.929, p < 0.01, r = −0.625) and 7 s (Z = −2.256, p < 0.05, r = −0.481). For the high traffic density, the Friedman test indicated that the main effect of time-intervals on the usefulness ratings was not significant (χ2 (3) = 3.865, p = 0.276).


Fig. 9. The boxplot of raw data of usefulness in terms of time-intervals in low and high traffic density (1: not very useful, “−”; 15: very useful, “ +”).


4 Discussion

To explore how the two-stage warning systems can benefit the takeover process when traffic conditions change, the present study investigated the effect of the time-interval of the two-stage warning systems in different traffic densities on drivers' takeover performance, situation awareness, and subjective ratings. We found that 5 s and 7 s were favored time-intervals for both low and high traffic density in general, but the drivers' response patterns and the beneficial aspects of the time-intervals varied across traffic densities.

When traffic density was low, 5 s and 7 s were the favored time-intervals for swift takeover responses, sufficient situation awareness, and good subjective ratings, at the cost of relatively slower preparatory actions. Specifically, the hands-on-steering-wheel results suggested that drivers' time to make hand preparations increased with the time-interval when traffic density was low; in other words, drivers were reluctant to prepare as quickly as in the 3 s condition when the time-interval was prolonged to 5 s or more. This is understandable because the takeover scenarios in the present experiment were not critical, and the prolonged time-intervals made the situation appear even less urgent; thus, drivers believed fast preparation was unnecessary. However, drivers took over the vehicle more quickly under the 5 s and 7 s time-intervals than under 3 s and 9 s, which suggests that 3 s might be insufficient for drivers to prepare, extending their reaction after the takeover warning was issued. Moreover, drivers reached higher situation awareness when the time-interval was 5 s, 7 s, or 9 s than when it was 3 s, which implies that drivers needed at least 5 s to gain sufficient situation awareness. In addition, the benefit of the 5 s and 7 s time-intervals was also supported by the subjective ratings. Combining the above indicators, 5 s and 7 s were the two favored time-intervals in general.

When traffic density was high, 5 s and 7 s were also advisable time-intervals for fast takeover responses without any cost in preparation speed, but the time-interval could not compensate for the reduced situation awareness. Specifically, drivers' responses in putting their hands on the steering wheel became slower only when the time-interval increased to 9 s, which in turn suggests that they responded similarly fast when the time-interval was 3 s, 5 s, or 7 s. Moreover, drivers also took over more swiftly when the time-interval was 5 s or 7 s in high traffic density than when it was 3 s or 9 s. The subjective ratings again suggested 5 s and 7 s as more appropriate time-intervals than the others. Therefore, the objective and subjective results again supported 5 s and 7 s as favored time-intervals when traffic density was high. However, drivers' overall situation awareness was lower in high traffic density than in low traffic density, and no differences in situation awareness among the four time-intervals were observed in high traffic density. It seems that prolonging the time-interval could not compensate for the reduced situation awareness caused by the increased traffic density. On the one hand, this might be because of the unbalanced distribution of limited human attention resources [30].
As the primary takeover reaction of drivers was to move the vehicle to another lane, drivers would allocate most of their time to the nearest moving vehicles, analysing their speed and location and looking for an opportunity to change lane during the preparation time. This process would last until the takeover warning was issued, at which time they could take actual action. On the other hand, previous studies suggested stress as a mediating factor in situation awareness [31], narrowing drivers' field of attention to only a few central aspects. The stress induced by high traffic density may therefore have concentrated drivers' attention on one or two particular vehicles and interfered with their evaluation of the overall situation under the different time-intervals. As a consequence, drivers were less likely to grasp the overall situation even though the time-interval was prolonged.

Regardless of traffic density, it should be noted that a U-shaped relationship between time-interval and takeover time was observed. Whether traffic density was low or high, drivers took over more swiftly in the 5 s and 7 s time-intervals than in 3 s and 9 s. This result further indicates that the time for drivers to prepare should be neither too short nor too long, and that a medium time-interval is the most suitable; the subjective appropriateness ratings also supported this finding. These results replicate our previous study [5], which investigated the effect of the neuroticism personality trait and time-intervals on the takeover process and found a U-shaped relationship between time-interval and takeover time regardless of personality type. Thus, it can be concluded that the effect of time-interval on takeover time is robust to both the traffic situation and drivers' neuroticism.

Regardless of time-interval, we observed that drivers were quicker to put their hands on the steering wheel in high traffic density, which is reasonable because the increased number of vehicles made drivers perceive the situation as more hazardous and facilitated faster responses. In addition, they showed worse takeover quality, in terms of a larger maximum resulting acceleration, in the high traffic density condition. This might be because drivers in high traffic density needed to avoid collisions with the takeover scenario as well as with the surrounding vehicles, so they had to brake as well as steer, whereas steering alone might be enough to avoid the hazard in the low traffic condition. This finding is consistent with previous research [14], which also found larger accelerations caused by the presence of traffic compared with a no-traffic situation.

Although the present experiment was elaborately designed, there are several limitations. First, all participants were college students, who generally lacked driving experience; as driving experience may influence the acquisition of situation awareness [32], a more diverse pool of participants is needed to generalise the present findings. Second, the present study adopted the easy-to-use subjective tool SART to measure drivers' situation awareness. Limitations such as memory loss, drivers' inability to rate their own situation awareness, the impact of performance, and possible confounding with workload may affect such subjective evaluation [33]; future studies are expected to adopt other measurements (e.g., eye movement, SAGAT) to validate the present findings. Third, the present study was conducted in a driving simulator, where drivers may behave differently from a real driving environment, so the findings should be examined in more realistic settings. Lastly, the complexity of the driving scenarios was manipulated through traffic density, as in many previous studies [14–16].
However, other factors like the speed of the nearby vehicles, the road types, and the weather also influenced the complexity of driving scenarios [3], which should be considered in future studies.


5 Conclusion

The present study explored the effects of the time-interval of a two-stage warning system and of traffic density on drivers' motor readiness, situation awareness, takeover performance, and subjective ratings. The results suggested that 5 s and 7 s (compared with 3 s and 9 s) were the two favored time-intervals in general, but the drivers' response patterns and the benefits of the time-intervals varied with traffic density. When traffic density was low, 5 s and 7 s were the favored time-intervals for swift takeover responses, sufficient situation awareness, and good subjective ratings, at the cost of relatively slower preparatory reactions. When traffic density was high, 5 s and 7 s were also advisable time-intervals for fast takeover responses, without any cost in preparation speed. However, prolonging the time-interval could not compensate for the reduced situation awareness in high traffic density. The present findings provide reference implications for the design and application of two-stage warning systems in different driving environments. Future studies should investigate how to improve drivers' situation awareness when using two-stage warning systems in complex traffic environments.

References 1. Zhang, B., et al.: Determinants of take-over time from automated driving: a meta-analysis of 129 studies. Transport. Res. F: Traffic Psychol. Behav. 64, 285–307 (2019) 2. SAE International. Taxonomy and definitions for terms related to driving automation systems for on-road motor vehicles. (J3016_202104), 30 April 2021. https://www.sae.org/standards/ content/j3016_202104 3. McDonald, A.D., et al.: Toward computational simulations of behavior during automated driving takeovers: a review of the empirical and modeling literatures. Hum. Factors 61(4), 642–688 (2019) 4. Ma, S., et al.: Take over gradually in conditional automated driving: the effect of two-stage warning systems on situation awareness, driving stress, takeover performance, and acceptance. Int. J. Hum.-Comput. Interact. 37(4), 352–362 (2021) 5. Zhang, W., et al.: Optimal time intervals in two-stage takeover warning systems with insight into the drivers’ neuroticism personality. Front. Psychol. 12, 601536 (2021) 6. Werneke, J., Kleen, A., Vollrath, M.: Perfect timing: urgency, not driving situations, influence the best timing to activate warnings. Hum. Factors 56(2), 249–259 (2014) 7. Winkler, S., Werneke, J., Vollrath, M.: Timing of early warning stages in a multi stage collision warning system: Drivers’ evaluation depending on situational influences. Transport. Res. F: Traffic Psychol. Behav. 36, 57–68 (2016) 8. Wang, R., et al.: Implementation of driving safety early warning system based on trajectory prediction on the internet of vehicles environment. Secur. Commun. Netw. (2022) 9. Rousseau, G.K., Lamson, N., Rogers, W.A.: Designing warnings to compensate for agerelated changes in perceptual and cognitive abilities. Psychol. Mark. 15(7), 643–662 (1998) 10. Wogalter, M.S., Leonard, S.D.: Attention capture and maintenance. Warn. Risk Commun., 123–148 (1999) 11. Li, S., et al.: Investigation of older driver’s takeover performance in highly automated vehicles in adverse weather conditions. IET Intel. Transp. Syst. 12(9), 1157–1165 (2018) 12. Louw, T., et al.: Coming back into the loop: Drivers’ perceptual-motor performance in critical events after automated driving. Accid. Anal. Prev. 108, 9–18 (2017)


13. Du, N., et al.: Evaluating effects of cognitive load, takeover request lead time, and traffic density on drivers’ takeover performance in conditionally automated driving. In: 12th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, pp. 66–73, ACM, New York (2020) 14. Gold, C., et al.: Taking over control from highly automated vehicles in complex traffic situations: The role of traffic density. Hum. Factors 58(4), 642–652 (2016) 15. Körber, M., et al.: The influence of age on the take-over of vehicle control in highly automated driving. Transport. Res. F: Traffic Psychol. Behav. 39, 19–32 (2016) 16. Radlmayr, J., et al.: How traffic situations and non-driving related tasks affect the take-over quality in highly automated driving. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, pp. 2063–2067. Sage Publications Sage CA, Los Angeles (2014) 17. So, J.J., et al.: Investigating the impacts of road traffic conditions and driver’s characteristics on automated vehicle takeover time and quality using a driving simulator. J. Adv. Transp. 2021, 1–13 (2021) 18. Doubek, F., et al.: Takeover quality: assessing the effects of time budget and traffic density with the help of a trajectory-planning method. J. Adv. Transp. 2020, 1–12 (2020) 19. Hergeth, S., Lorenz, L., Krems, J.F.: Prior familiarization with takeover requests affects drivers’ takeover performance and automation trust. Hum. Factors 59(3), 457–470 (2017) 20. Nguyen, T., et al.: A review of situation awareness assessment approaches in aviation environments. IEEE Syst. J. 13(3), 3590–3603 (2019) 21. Selcon, S.J., Taylor, R.M., Koritsas, E.: Workload or situational awareness?: TLX vs. SART for aerospace systems design evaluation. In: Proceedings of the Human Factors Society Annual Meeting, pp. 62–66. SAGE Publications Sage CA, Los Angeles (1991) 22. Heller, O., Theorie und Praxis des Verfahrens der Kategorienunterteilung (KU): Forschungsbericht. Würzburg: Psychologisches Institut, Lehrstuhl für Allgemeine Psychologie, 1–15 (1981) 23. Bazilinskyy, P., de Winter, J.: Auditory interfaces in automated driving: an international survey. PeerJ Comput. Sci. 1, e13 (2015) 24. Petermeijer, S., Doubek, F., De Winter, J.: Driver response times to auditory, visual, and tactile take-over requests: a simulator study with 101 participants. In: 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 1505–1510. IEEE, Banff (2017) 25. Baayen, R.H., Davidson, D.J., Bates, D.M.: Mixed-effects modeling with crossed random effects for subjects and items. J. Mem. Lang. 59(4), 390–412 (2008) 26. Breslow, N.E., Clayton, D.G.: Approximate inference in generalized linear mixed models. J. Am. Stat. Assoc. 88(421), 9–25 (1993) 27. Wan, Y., Sarter, N.: Attention limitations in the detection and identification of alarms in close temporal proximity. Hum. Factors, 00187208211063991 (2022) 28. Friedman, M.: The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc. 32(200), 675–701 (1937) 29. Wilcoxon, F.: Individual Comparisons by Ranking Methods. Springer, New York (1992) 30. Kahneman, D.: Attention and Effort. PRENTICE-HALL INC, Englewood Cliffs (1973) 31. Endsley, M.R.: Toward a theory of situation awareness in dynamic systems. In: Salas, E. (ed.) Situational awareness, Routledge, London, pp. 9–42 (2017) 32. 
Wright, T.J., et al.: Experienced drivers are quicker to achieve situation awareness than inexperienced drivers in situations of transfer of control within a Level 3 autonomous environment. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, pp. 270–273. Sage Publications Sage CA, Los Angeles (2016) 33. Endsley, M.R., et al.: A comparative analysis of SAGAT and SART for evaluations of situation awareness. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, pp. 82–86. SAGE Publications Sage CA, Los Angeles (1998)

Human-Centered Design of Autonomous Systems

The City Scale Effect and the Baidu Index Prediction Model of Public Perceptions of the Risks Associated with Autonomous Driving Technology Jingxi Chen1 , Riheng She1 , Shuwen Yang2 , and Jinfei Ma2(B) 1 School of Languages and Communication Studies, Beijing Jiaotong University,

Beijing 100091, People's Republic of China 2 College of Psychology, Liaoning Normal University, Dalian 116029, People's Republic of

China [email protected] Abstract. The uneven development of autonomous driving in China has resulted in a lack of clarity regarding public perceptions of this emerging technology. The aim of the present study was to compare the awareness of autonomous driving technology in various Chinese cities based on keyword searches of three categories related to perceived risk, trust, and perceived value in the Baidu index. The search terms were "autonomous driving accident," "autonomous driving safety questioned," and "autonomous driving cost performance," representing the categories of perceived risk, trust, and perceived value, respectively. The cities were then classified along four dimensions according to their level of development and geographical location, and the data were analyzed in a 3 (keyword category: autonomous driving accident, safety questioned, and cost performance) × 2 (city category) two-factor mixed design. SPSS Statistics 22.0 served to analyze the descriptive statistics, and ANOVA, correlation analysis, and regression analysis served to interpret the results. The results of the study show that there seems to be a trend in public attention to autonomous driving in China in that the level of a city's economic development and consumption, the size of its science and technology sector, and the extent of its internet penetration and transportation infrastructure were positively associated with the level of public concern as well as public perceptions of the value, risk, and trust associated with autonomous driving technology. This study complements previous research on public concern about autonomous driving technology by making use of a larger sample than most technology acceptance studies. Keywords: Autonomous driving · Baidu index · Public concern · Perceived value · Perceived risk · Trust

1 Introduction

In the information technology era, intelligent vehicles represent an inevitable trend in the development of the automotive industry that will initiate, in turn, a further round of technological and industrial change. The Medium and Long-term Development Plan for


the Automotive Industry jointly issued in 2017 by the Chinese Ministry of Industry and Information Technology, National Development and Reform Commission, and Ministry of Science and Technology indicated that automotive development in the country will involve a move toward new energy vehicles, autonomous driving, and related vehicles. Autonomous driving technology has, then, become central to efforts to improve China’s automotive industry. As the technology matures, survey data indicate, the attitude of the Chinese public toward autonomous vehicles is growing increasingly positive, more so than the attitudes of the citizens of developed countries [1], with more than 94% of Chinese respondents expressing willingness to pay a premium for an intelligent driving car [2]. Thus, the market for autonomous vehicles in China is potentially enormous. The research on autonomous vehicles has tended to focus on consumers’ perceptions, attitudes, and behaviors [3]. However, scholars have recently begun to explore the advent of the technology from fresh perspectives, such as perceived value (based on the theory of planned behavior or technology acceptance model) [4–7], perceived risk and trust [5, 8], and variables related to the acceptance of technology [9, 10]. Some have proposed a new integrated model [7, 11]. However, external factors also influence consumers’ perceptions of autonomous driving [12–14], and the level of urban development can affect not only consumers’ psychological perceptions of such factors as access [15] and consumption [16] but also their behavior [17, 18]. In the context of the national policy of encouraging the production and sale of autonomous vehicles, the effect of the level of urban development in China on consumers’ search behavior relating to self-driving vehicles, particularly their focus when searching for information on the topic, is a matter of considerable significance. In recent years, researchers in China and elsewhere have provided valuable insights into perceptions of autonomous vehicles. The customer perceived value theory proposed by Peter Drucker laid the foundation for research on the acceptance of autonomous driving, which is expected to improve road safety, help the elderly and disabled to drive, increase fuel efficiency, reduce air pollution, and provide drivers with a more comfortable and convenient driving experience [19–22]. Consumers naturally buy products with the greatest perceived value [24] and least perceived risk [25]. Researchers have identified five main categories of perceived risk to consumers from autonomous driving, specifically, privacy [26], performance [27, 28], safety [21], financial [29], and psychosocial risks [30]. Perceived value and perceived risk, then, significantly influence consumers’ willingness to accept and purchase products [31], and trust also plays a mediating role in the human-automation relationship [32]. In general, perceived value positively influences trust, perceived risk negatively influences trust, and trust has a significant negative influence on perceived risk. Trust is also a key determinant of users’ reliance on automated systems [33–35], adoption of automation [34, 36], and willingness to use autonomous vehicles specifically [7]. The previous findings, which indicate a focus by domestic and international scholars on the popularity and acceptance of autonomous driving, have significantly enhanced the understanding of public attitudes toward autonomous driving. 
However, to date, the development of the technology in Chinese cities has been uneven and has involved little effort to respond to


public concerns. In other words, a macro-level inquiry into the factors that influence the perceived value, risk, and trust associated with autonomous driving has been lacking. Previous studies of these issues have tended to rely on questionnaires and interviews to collect information, an approach that has several drawbacks. To begin with, the sample size and representativeness are limited, and the results can serve only to measure the attitudes of consumer groups with purchasing needs and abilities and, in this case, driving experience, leaving the attitudes of other groups unrepresented. Second, given that these tools are used in a closed environment and rely on structured answers, it is doubtful that consumers’ responses fully reflect their true thoughts. Accordingly, a different approach is used here based on consumers’ online activity. The 50th Statistical Report on the Development of the Internet in China issued by the China Internet Network Information Centre reports the number of Chinese Internet users in June 2022 as 1.051 billion, with the internet penetration rate reaching 74.4%. Baidu, as the largest search engine on the Internet in China, holds a significant share of the search engine market [37], with more than 6 billion search requests per day. This enormous number of internet users searching for information provides a database that researchers can access. The Baidu Index, generated based on big data analysis, covers almost all individual searches, including for information related to autonomous driving, thereby providing a measure of the real and exact search behavior of a significant portion of the Chinese public. The present paper, in addition to introducing the use of the Baidu Index to the study of autonomous driving by looking at regional differences in concerns about the technology, draws on earlier research using statistical yearbook data to model behavior [38, 39]; thus, urban development indicators from the China Urban Statistical Yearbook serve here to predict search behavior specific to the topic at hand. In sum, though the body of research on public attitudes toward autonomous driving technology is growing, many questions remain unanswered, in part because studies of the technology have been limited in terms of both the size of the samples and the focus on the micro-level of users as well as a lack of regional granularity. The aim of the present study was to transcend these limitations by analyzing the behavior of research subjects in 68 mainland Chinese cities and using the 2021 autonomous driving search index from the Baidu Index as the base dataset as well as select city development indicators from the China Statistical Yearbook. This approach served to construct and fit a model of the influence of each city’s developmental level and location on its citizens’ attention to autonomous driving.

2 Empirical Study

2.1 Data Sources

In this study, 68 cities were selected from 663 cities in China based on their representativeness and level of economic development. This sample includes 19 first-tier and new first-tier cities and 49 second- to fourth-tier cities, 17 coastal and 51 non-coastal cities, 31 provincial capitals and 37 non-capital cities, and 28 northern and 40 southern cities. The China Urban Statistical Yearbook, sponsored by the Department of Urban Social and Economic Survey of the National Bureau of Statistics of the People's Republic of China, is an informative annual publication that provides a comprehensive picture of the


economic and social development of Chinese cities. The analysis presented here made use of 12 of the 295 statistical variables that the China Urban Statistical Yearbook tracks, as Table 1 shows.

Table 1. Classification of selected urban statistical variables (reported with mean ± standard error, skewness, skewness standard error, kurtosis, and kurtosis standard error)

Control variable: average annual population (10,000 persons)
Urban economic development: gross regional product (GRP) per capita
Urban transport development: number of public buses (trams) in operation at year-end; total annual public bus (tram) passenger traffic; number of vehicles operating for hire at year-end; road passenger traffic
Urban science and technology development: number of patent applications; inventions
Urban consumption: total retail sales of social consumer goods (RMB million)
Urban internet development: revenue from telecommunications business (RMB million); year-end mobile phone subscribers (millions); number of internet broadband connections (millions)


To maximize the universality and timeliness of the data selection, a search matrix was created by obtaining the whole-month averages of Baidu's search indices from January 1 to December 31, 2021. After the selection of the 68 cities for positioning, the 10 most-searched keywords were selected and classified into three categories: "autonomous driving accident" (loss of control or accident involving autopilot), "autonomous driving safety questioned" (reliability and safety of autopilot, including fixed-speed cruising, smart driving, driverless driving, and adaptive cruising), and "autonomous driving cost performance" (cost performance of Tesla and Azera). The semantic analysis was combined with perceived value, perceived risk, and trust theory. By correcting the data, it was possible to use the overall searches for 2021 to identify the relevant concerns across the cities.

2.2 Research Methodology

Study Design. In this study, four dimensions of the cities were distinguished based on their development and geography, specifically, their size, coastal or inland location, location in the north or south, and administrative level. For each dimension, an ANOVA was conducted using a two-factor mixed design of 3 (keyword category) × 2 (city category).

Keyword Processing. By categorizing and summarizing the keywords included in the Baidu index, 10 keywords related to autonomous driving were finally selected and divided into three main categories. The annual search indices of the keyword search terms "autonomous driving accident," "autonomous driving safety questioned," and "autonomous driving cost performance" (PC + mobile) served to assess the development of and changes in the perceived risk, trust, and value of autonomous driving technology, respectively.

Independent Variables. The within-group variable was the keyword category, divided into autonomous driving accidents, questions about the safety of autonomous driving, and the cost performance of autonomous driving. The city categories served as the between-group variables, with four dimensions distinguished and two levels under each dimension (i.e., for size, first-tier or non-first-tier; for coastal location, coastal or non-coastal; for north-south location, southern or northern; for administrative level, provincial capital or not).

Dependent Variable. The Baidu index served as the dependent variable. Using the keywords as the statistical object, the weighted sum of the search frequency of each keyword in Baidu web searches was analyzed and calculated according to the various search sources. Based on the correlation analysis, the average annual population served as the control variable and the remaining 12 urban statistical yearbook variables as the predictor variables. The Baidu index of the three keywords served as the outcome variable for the regression analysis.

Methodology for Data Statistics. Because of the variation in the average annual population, number of broadband internet connections, economic development, geographic location, and policies across the cities, this study used the Baidu index as a measure of public concern and took "autonomous driving" as its scope. The sample of 68 major


Chinese cities served as the research target, and the keywords were “autonomous driving accident,” “autonomous driving safety questioned,” and “autonomous driving cost performance.” Statistical analysis methods were used for the descriptive statistical analysis, interaction analysis, correlation analysis, and regression analysis with Excel and SPSS Statistics 22.0.
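The analyses above were run in SPSS; purely as an illustration of the same design, the following minimal Python sketch shows how a 3 (keyword category) × 2 (city size) mixed ANOVA could be set up with the pingouin package. The file name and column names are hypothetical assumptions for illustration, and the population covariate used in the paper is omitted here for brevity.

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-format table: one row per city x keyword category, with columns
# city, tier ('tier1' / 'non_tier1'), keyword ('accident' / 'safety_questioned' /
# 'cost_performance'), and baidu_index (2021 annual mean) -- illustrative names only.
df = pd.read_csv("baidu_index_2021_long.csv")

# 3 (keyword category, within-city) x 2 (city size, between-city) mixed ANOVA
aov = pg.mixed_anova(data=df, dv="baidu_index",
                     within="keyword", subject="city", between="tier")
print(aov[["Source", "F", "p-unc", "np2"]])

# The same call can be repeated with 'tier' swapped for the coastal, north-south,
# or administrative-level city classification.
```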

3 City Scale Effect and Baidu Index Prediction Model

3.1 Variation in Attention to Autonomous Driving Technology Across Cities Based on Keywords

Table 2 presents the ANOVA results, showing significant main effects for city size (F(1,65) = 80.454, p = 0.000, η2 = 0.553), average annual population (F(1,65) = 4.518, p = 0.037, η2 = 0.065), and keyword category (F(2,65) = 16.178, p = 0.000, η2 = 0.199) and a significant interaction among city size, population, and keyword category (F(2,65) = 17.450, p = 0.001, η2 = 0.212). The main effects of coastal location (F(1,65) = 6.836, p = 0.011, η2 = 0.095), average annual population (F(1,65) = 24.809, p = 0.000, η2 = 0.276), and keyword category (F(2,65) = 4.823, p = 0.010, η2 = 0.069) were significant, while the interaction among coastal location, population, and keyword category (F(2,65) = 0.776, p = 0.462, η2 = 0.012) was insignificant. The main effect of the north-south location of the cities was insignificant (F(1,65) = 2.761, p = 0.101, η2 = 0.041), the main effects of average annual population (F(1,65) = 21.564, p = 0.000, η2 = 0.249) and keyword category (F(2,65) = 4.741, p = 0.010, η2 = 0.068) were significant, and the interaction among north-south location, population, and keyword category was insignificant (F(2,65) = 0.526, p = 0.592, η2 = 0.008). The main effects of city administrative level (F(1,65) = 7.730, p = 0.007, η2 = 0.106), average annual population (F(1,65) = 16.929, p = 0.000, η2 = 0.207), and keyword category (F(2,65) = 8.431, p = 0.000, η2 = 0.115) were significant, as was the interaction among city administrative level, population, and keyword category (F(2,65) = 9.024, p = 0.000, η2 = 0.122). In other words, city size, location, and status as a provincial capital all exerted significant effects on consumers' attention to autonomous driving technology: users in tier-1, coastal, and provincial capital cities paid significantly more attention to autonomous driving than those in cities that were not first-tier, coastal, or provincial capitals.

A two-by-two comparison of the three keyword categories at the two levels of city development and a simple-effects analysis using the LSD post-hoc test showed that the differences among the keyword categories were significant in the tier-1 cities (p = 0.000), with searches for the "autonomous driving accident" category being significantly more frequent than those for "autonomous driving safety questioned" or "autonomous driving cost performance" (M = 313.20, p = 0.000; M = 182.71, p = 0.000), while searches for "autonomous driving safety questioned" were significantly less frequent than those for "autonomous driving cost performance" (M = 130.50, p = 0.02). There were significantly more searches for "autonomous driving accident" than for the other two categories at the non-tier-1-city level (M = 89.70, p = 0.000; M = 96.71, p = 0.000),


Table 2. ANOVA results for city categories and the three types of automated driving keywords

Variables (city category levels) | City category F (p) | Keyword category F (p) | City category × keyword category F (p)
Size (tier-1 vs. non-tier-1 cities) | 80.454 (0.000) | 16.178 (0.000) | 17.450 (0.001)
Coastal location (coastal vs. non-coastal cities) | 6.836 (0.011) | 4.823 (0.010) | 0.776 (0.462)
North-south location (southern vs. northern cities) | 2.761 (0.101) | 4.741 (0.010) | 0.526 (0.592)
Administrative level (provincial capital vs. non-provincial cities) | 7.730 (0.007) | 8.431 (0.000) | 9.024 (0.000)

and there were more searches for “autonomous driving safety questioned” than for “autonomous driving cost performance,” though the difference was insignificant (M = 7.01, p = 0.34). These results indicate that users in the tier-1 cities differentiated among the three types of autonomous driving keywords and expressed a need for clear, detailed, and specific information when searching. Users in the non-tier-1 cities, on the other hand, did not clearly distinguish in their queries between safety and cost performance. In general, then, the level of the residents’ interest in and understanding of autonomous driving correlated with the size of the cities in which they resided (Fig. 1).

Fig. 1. Effect of city size variation on attention to autonomous driving keywords
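As a rough sketch of the kind of two-by-two (LSD-style, uncorrected) comparisons reported above, paired t-tests between the keyword categories can be run separately within each city group; the file and column names below continue the hypothetical long-format table assumed earlier.

```python
from itertools import combinations

import pandas as pd
from scipy import stats

df = pd.read_csv("baidu_index_2021_long.csv")  # hypothetical file, see the sketch above

for tier, group in df.groupby("tier"):
    # One column per keyword category, one row per city
    wide = group.pivot(index="city", columns="keyword", values="baidu_index")
    for a, b in combinations(wide.columns, 2):
        t, p = stats.ttest_rel(wide[a], wide[b])
        diff = (wide[a] - wide[b]).mean()
        print(f"{tier}: {a} vs {b}: mean difference = {diff:.2f}, p = {p:.3f}")
```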


The residents of the urban coastal cities performed significantly more searches for “autonomous driving accident” than for “autonomous driving safety questioned” or “autonomous driving cost performance” (M = 180.15, p = 0.000; M = 123.35, p = 0.001) and fewer searches for “autonomous driving safety questioned” than for “autonomous driving cost performance,” but this difference was not significant (M = 56.80, p = 0.153). The residents of the non-coastal cities searched for “autonomous driving accident” significantly more often than for the other two keywords (M = 180.15, p = 0.000; M = 123.35, p = 0.000) and less often for “autonomous driving safety questioned” than for “autonomous driving cost performance,” but this difference was not significant (M = 22.95, p = 0.088). These results suggest that no significant difference existed between users in the coastal and non-coastal cities in terms of their knowledge or searches for “safety questioned” and “cost performance” relating to autonomous driving technology (Fig. 2).

Fig. 2. Effect of coastal or inland location of cities on attention to autonomous driving keywords

The residents of the southern urban cities searched for “autonomous driving accident” significantly more often than for “autonomous driving safety questioned” or “autonomous driving cost performance” (M = 358.91, p = 0.000; M = 281.56, p = 0.000) and for “autonomous driving safety questioned” significantly less often than for “autonomous driving cost performance” (M = 77.34, p = 0.049). The residents of the northern urban cities searched for “autonomous driving accident” significantly more often than for the other two keywords (M = 312.18, p = 0.000; M = 47.63, p = 0.000) and less often for “autonomous driving safety questioned” than for “autonomous driving cost performance,” but the difference was not significant (M = 39.13, p = 0.505). These results indicate that the users in the southern cities clearly distinguished among the three types of autonomous driving keywords while users in northern cities did not (Fig. 3).


Fig. 3. Effect of north or south location of cities on attention to autonomous driving keywords

The residents of the provincial capital cities searched for “autonomous driving accident” significantly more often than for “autonomous driving safety questioned” or “autonomous driving cost performance” (M = 234.03, p = 0.001; M = 169.32, p = 0.001) and for “autonomous driving safety questioned” significantly less often than for “autonomous driving cost performance” (M = 64.71, p = 0.016). The residents of the non-capital cities searched for “autonomous driving accident” significantly more often than for the other two keywords (M = 83.55, p = 0.001; M = 80.04, p = 0.001) and less often for “autonomous driving safety questioned” than for “autonomous driving cost performance,” but the difference was not significant (M = 3.51, p = 0.766). These results indicate that the users in the provincial capitals differentiated among the three types of autonomous driving keywords, while there was no significant difference in the level of concern about the “autonomous driving safety questioned” and “autonomous driving cost performance” between the users in the capital and non-capital cities. As the graph makes clear, the residents of the provincial capitals were more concerned about autonomous driving than those in the non-capital cities (Fig. 4).

Fig. 4. Effect of city administrative level on attention to autonomous driving keywords


3.2 Level of Urban Development as a Predictor of Public Attention to Autonomous Driving Technology

In this study, 12 statistical variables from the statistical yearbook for Chinese cities were selected for correlation and linear regression analysis of the three types of autonomous driving keywords. Table 3 shows the results of the correlation analysis.

Table 3. Results of the analysis of the correlation between the city statistical yearbook variables and the three types of autonomous driving keywords (N = 68)

Variable | Autonomous driving accident | Autonomous driving safety questioned | Autonomous driving cost performance
Average annual population (10,000) | 0.493** | 0.468** | 0.418**
Gross regional product (GRP) per capita (million yuan) | 0.684** | 0.730** | 0.733**
Number of public transport (tram) vehicles in operation at year-end (thousands) | 0.798** | 0.818** | 0.792**
Total annual public bus (tram) passenger traffic (millions) | 0.858** | 0.821** | 0.767**
Number of vehicles operating for hire at year-end (thousands) | 0.724** | 0.637** | 0.662**
Road passenger traffic (millions) | 0.253* | 0.246* | 0.197
Number of patent applications (millions) | 0.825** | 0.880** | 0.869**
Inventions | 0.790** | 0.743** | 0.771**
Total retail sales of social consumer goods (RMB million) | 0.903** | 0.898** | 0.889**
Revenue from telecommunications business (RMB million) | 0.557** | 0.519** | 0.550**
Year-end mobile phone subscribers (millions) | 0.857** | 0.888** | 0.829**
Number of internet broadband connections (millions) | 0.766** | 0.751** | 0.731**

Note: * p < 0.05, ** p < 0.01, *** p < 0.001
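To illustrate how correlations like those in Table 3 could be computed, the sketch below pairs each yearbook indicator with each of the three annual keyword indices using Pearson correlations; the per-city file and the column names are illustrative assumptions, not the study's actual data files.

```python
import pandas as pd
from scipy import stats

# Hypothetical per-city table (68 rows): yearbook indicators plus the three annual
# Baidu indices; every column name here is an illustrative assumption.
cities = pd.read_csv("cities_yearbook_baidu_2021.csv")

yearbook_vars = ["population", "grp_per_capita", "bus_fleet", "bus_passengers",
                 "hire_vehicles", "road_passengers", "patent_applications",
                 "inventions", "retail_sales", "telecom_revenue",
                 "mobile_subscribers", "broadband_connections"]

for var in yearbook_vars:
    for keyword in ["accident", "safety_questioned", "cost_performance"]:
        r, p = stats.pearsonr(cities[var], cities[keyword])
        print(f"{var} vs {keyword}: r = {r:.3f}, p = {p:.4f}")
```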

The results indicate that the level of development of the cities had a positive predictive effect on their residents’ online search behavior relating to autonomous driving technology. Specifically, the cities’ levels of economic, technological, and internet development


and of consumption and transportation correlated significantly and positively with their residents' attention to autonomous driving technology.

3.3 Baidu Index Prediction Model for Autonomous Driving Technology Based on the Urban Statistical Variables

To test further the relationship between the city statistical variables and the search keywords based on the correlation analysis, the average annual population served as the control variable, the keyword Baidu search index as the outcome variable, and the remaining 12 city statistical variables as the predictor variables for the linear regression analysis. Table 4 shows the regression results, which indicate that comprehensive development correlated with the intensity of the residents' web search behavior regarding autonomous driving technology.

Table 4. Linear regression analysis of the study variables (N = 68)

Outcome variable | Predictor variables | R | R2 | F (df) | β | t
Autonomous driving accident | Total retail sales of social consumer goods | 0.936 | 0.867 | 88.343*** (5) | 0.543 | 4.79***
 | Total annual public bus (tram) passenger traffic | | | | 0.349 | 4.20***
 | Gross regional product per capita | | | | 0.198 | 2.84***
Autonomous driving safety questioned | Total retail sales of social consumer goods | 0.929 | 0.857 | 135.383*** (3) | 0.488 | 4.98***
 | Number of patent applications | | | | 0.479 | 4.02***
Autonomous driving cost performance | Total retail sales of social consumer goods | 0.931 | 0.858 | 102.266*** (4) | 0.657 | 5.51***
 | Number of patent applications | | | | 0.416 | 4.11***

Note: Using the average annual population as a control variable

After controlling for the average annual population, the results of the multiple linear regression analysis showed a significant multiple linear relationship between the urban statistical variables of total retail sales of consumer goods and total annual public transport passenger volume and the search index for the keyword "autonomous driving accident," with the independent variables explaining 86.7% of the variation in the dependent variable (corrected R2 = 0.867). There was also a significant multiple linear relationship


among the two urban variables of total retail sales of consumer goods, the number of patent applications, and the search index for the keyword “autonomous driving safety questioned,” with the independent variable explaining 85.8% of the variation in the overall dependent variable (corrected R2 = 0.858). The per capita gross regional product served to indicate the level of urban economic development, and the total retail sales of consumer goods served to indicate the level of urban consumption. The results show that both GDP per capita and the total retail sales of consumer goods correlated positively with public concern about autonomous driving technology, indicating that a city’s level of economic development as well as urban consumption significantly and positively influenced public perceptions of value, risk, and trust with respect to autonomous driving technology. Since technological innovation can promote urban economic development [40] and urban economic development and technological innovation are mutually reinforcing and complementary, the number of patent applications served to reflect the technological development of the cities sampled for this study. The findings indicate that patent grants as well as inventions correlated positively with public attention to autonomous driving technology. Thus, the extent of the cities’ technological development significantly and positively influenced their residents’ perceptions of value, risk, and trust with respect to autonomous driving technology. Cities characterized by a high level of technological development attract large numbers of professional niche talents and host expert seminars, summits, and forums that are often effective in promoting the diffusion of information, including information relating to autonomous driving technology among their residents. The introduction of products associated with the technology, discussions of safety performance, and responses to negative press by authorities such as product managers, the heads of car companies, and other experts can reduce the perception that autonomous driving carries significant risks, thereby increasing acceptance of it. In these respects, a city’s technological development correlates with its residents’ likelihood of perceiving autonomous driving as valuable and as trustworthy rather than risky. In addition, the related analysis demonstrated that telecommunications business revenue and the numbers of mobile phone subscribers at year-end, broadband internet connections, actual electric public transport vehicles in operation at year-end, public bus passengers in operation throughout the year, and actual rental vehicles in operation at year-end correlated positively with public concern about autonomous driving technology. In other words, the level of internet development and transport in the cities significantly and positively influenced the perceptions of value, risk, and trust in autonomous driving technology among their residents.
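As a hedged sketch of the regressions summarized in Table 4 (average annual population entered as a control, with the predictors reported for each keyword index), the following statsmodels snippet fits one model per outcome. The file and column names continue the hypothetical per-city table assumed above, and the coefficients it prints are unstandardized, unlike the standardized β values in Table 4.

```python
import pandas as pd
import statsmodels.formula.api as smf

cities = pd.read_csv("cities_yearbook_baidu_2021.csv")  # hypothetical file, see above

# One model per keyword index; predictors follow Table 4, population is the control.
models = {
    "accident": "accident ~ population + retail_sales + bus_passengers + grp_per_capita",
    "safety_questioned": "safety_questioned ~ population + retail_sales + patent_applications",
    "cost_performance": "cost_performance ~ population + retail_sales + patent_applications",
}

for name, formula in models.items():
    fit = smf.ols(formula, data=cities).fit()
    print(name, f"adjusted R^2 = {fit.rsquared_adj:.3f}")
    print(fit.params.round(3))
```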

4 Conclusions and Discussion The comprehensive analysis of the search index data of 68 cities in China presented here supports a number of conclusions regarding the city scale effect on public perceptions of the risks associated with autonomous driving technology. To begin with, the level of a city’s comprehensive development correlated with the level of public concern about autonomous driving technology. Further, irrespective of a city’s developmental or administrative level, the residents of the cities sampled for this study were especially


concerned about the safety of the technology and perceived it as risky. Also, the prediction model for the level of concern about autonomous driving technology constructed from the city scale effect predicts the top 10 cities where autonomous driving technology is expected to develop rapidly. In terms of city scale effects, the analysis of the data in this study indicates that the overall level of a city’s development, particularly economic development correlated with the detail and specificity of its residents’ searches for information about autonomous driving technology. Thus, the residents of the cities with low levels of overall development had less information and a more general and vague understanding of autonomous driving than the residents of the more developed cities. This finding is in line with that of a cross-national survey on the acceptance of driverless cars that the GDP per capita of the respondents’ countries predicted the average overall level of acceptance across countries [13]. However, the effect of the cities’ geographical location on their residents’ awareness of autonomous driving technology was insignificant in this study. Specifically, the difference between the coastal and non-coastal cities was significant, but the effect of economic and other factors could not be excluded. In terms of public perceptions of the risks associated with autonomous driving technology, the most-searched keywords in every city in this study were “autonomous driving accident,” irrespective of a city’s size, coastal or north-south location, or administrative level. This finding indicates that safety is the public’s primary concern when it comes to autonomous driving, and consumers naturally prefer to minimize perceived risk when deciding on their purchase behavior (as opposed to maximizing perceived benefits) [41]. Previous research has shown that perceived risk can negatively affect consumers’ attitudes and intentions to adopt innovative products and services [27, 29, 42, 43]. Accordingly, it is to be expected that, if consumers perceive that adopting and using autonomous vehicles is risky, they are more likely to develop negative attitudes about the technology and less likely to adopt it. The second-most-searched keyword by the residents of the cities sampled for this study (after “autonomous driving accident”) was “autonomous driving cost performance,” suggesting that the perceived value of autonomous driving is also an important factor influencing public concern about the technology. The existing research discusses the perceived value of autonomous vehicles in terms of improving road safety, reducing traffic congestion and air pollution, improving fuel efficiency, increasing transportation accessibility for elderly or disabled drivers, and enhancing the comfort and convenience of driving [19–22]. In the linear regression analysis for this study, the coefficient on the “autonomous driving cost performance” variable was greater for the total retail sales of consumer goods than for the “autonomous driving safety questioned” variable. In addition to further confirming that the level of urban economic development influences public concern about autonomous driving technology, the results presented here suggest that, naturally, price is an important factor in the public perceptions of the value of autonomous driving technology, but it does not significantly affect public trust in it. 
This finding is at variance with the findings of Peng Jing [9], where factors such as the level of culture and transport infrastructure, the market penetration of autonomous vehicles, technological development, government policies, and pricing in various countries influence attitudes associated with trust in autonomous driving technology. However, the


picture was the opposite in the non-tier-1 cities. A deeper understanding of the reasons for these differences between less- and more-developed cities is thus a topic for future research. Lastly, the comparison of the cities with respect to their residents’ searches for autonomous driving online presented here served to identify the characteristics of cities with good prospects for the development of autonomous driving. These characteristics include a high level of (especially economic) development, a mature automotive industry chain, a sufficient pool of professionals, and the active support of government policies. These indicators of comprehensive city development predict that autonomous driving technology is likely to flourish in Shanghai, Shenzhen, Hangzhou, Guangzhou, Beijing, Chengdu, Nanjing, Wuhan, Changsha, and Xi’an. According to the August 2022 monthly report on the new energy vehicle industry released by the China Automobile Dealers Association’s Automotive Market Research Branch, the 10 cities with the highest new energy passenger vehicle market sales were Shanghai, Shenzhen, Hangzhou, Guangzhou, Beijing, Chengdu, Zhengzhou, Suzhou, Wuhan, and Tianjin, thus including 7 of the cities predicted based on the indicators described in this study. In other words, the predictions of this study are largely consistent with the observed reality. Regarding the limitations of this study, in the first place, the analysis presented here relied on the Baidu index. While this index can serve to measure the volume of public internet searches for information on-demand and, thus, the desire on the part of the public for relevant information, since it is based on a certain number of searches, it cannot reflect public search behavior in its entirety. Moreover, the keyword classification and aggregation used here were necessarily selective, and the index can only reflect search behavior early on for specific events. Further exploration of public attitudes toward autonomous driving technology could, accordingly, combine questionnaires and interviews to yield more detailed insights into these issues.

References 1. Ang, J., Shen, M.: Investigating consumer market acceptance of automated vehicles in China. 中国自动驾驶汽车消费市场接受度调查. J. Chang’an Univ. (Social Science Edition) 6, 34– 42 (2017) 2. You, F., Chu, X.: Research on investigation of consumer market acceptance for intelligent vehicles in China: case study in Guangzhou. 中国智能驾驶接受度调查——以广州为例. J. Guangxi Univ. (Natural Science Edition) 2, 534–545 (2019). https://doi.org/10.13624/j.cnki. issn.1001-7445.2019.0534 3. Gkartzonikas, C., Gkritza, K.: What have we learned? A review of stated preference and choice studies on autonomous vehicles. Transp. Res. Part C: Emerg. Technol. 98, 323–337 (2019). https://doi.org/10.1016/j.trc.2018.12.003 4. Peng, L., Yang, R., Xu, Z.: Public acceptance of fully automated driving: effects of social trust and risk/benefit perceptions. Risk Anal.: Official Publ. Soc. Risk Anal. 39, 326–341 (2019). https://doi.org/10.1111/risa.13143 5. Buckley, L., Kaye, S., Pradhan, A.K.: Psychosocial factors associated with intended use of automated vehicles: a simulated driving study. Accid. Anal. Prev. 115, 202–208 (2018). https:// doi.org/10.1016/j.aap.2018.03.021


6. Madigan, R., Louw, T., Wilbrink, M., Merat, N.: What influences the decision to use automated public transport? Using UTAUT to understand public acceptance of automated road transport systems. Trans. Res. Part F: Psychol. Behav. 50, 55–64 (2017). https://doi.org/10.1016/j.trf. 2017.07.007 7. Choi, J.K., Ji, Y.G.: Investigating the importance of trust on adopting an autonomous vehicle. Int. J. Hum-Comput. Interact. 31, (2015). https://doi.org/10.1080/10447318.2015.1070549. 150709133142005 8. Peng, L., Yang, R., Xu, Z.: How safe is safe enough for self-driving vehicles? Risk Anal.: Official Publ. Soc. Risk Anal. 39, 315–325 (2019). https://doi.org/10.1111/risa.13116 9. Jing, P., Huang, F., Xu, G., Wang, W.: Analysis of autonomous driving payment willingness and influencing factors. 自动驾驶支付意愿及影响因素分析. J. Chang’an Univ. (Natural Science Edition) 1, 90–102 (2021). https://doi.org/10.19721/j.cnki.1671-8879.2021.01.010 10. Zhao, M., Wang, S.: Investigating the intention for electric vehicle sharing. 电动汽车共享 的使用意向研究. J. Dalian Univ. Technol. (Social Sciences) 6, 34–42 (2018) 11. Zhang, T., Tao, D., Qu, X., Zhang, W.: The roles of initial trust and perceived risk in public’s acceptance of automated vehicles. Trans. Res. Part C: Emerg. Technol. 98, 207–220 (2019). https://doi.org/10.1016/j.trc.2018.11.018 12. Moody, J., Bailey, N., Jinhua, Z.: Public perceptions of autonomous vehicle safety: an international comparison. Saf. Sci. 121, 634–650 (2020). https://doi.org/10.1016/j.ssci.2019. 07.022 13. Kyriakidis, M., Happee, R., de Winter, J.C.F.: Public opinion on automated driving: results of an international questionnaire among 5000 respondents. Transp. Res. Part F: Psychol. Behav. 32, 127–140 (2015). https://doi.org/10.1016/j.trf.2015.04.014 14. Sierzchula, W., Bakker, S., Maat, K., van Wee, B.: The influence of financial incentives and other socio-economic factors on electric vehicle adoption. Energy Policy 68, 183–194 (2014). https://doi.org/10.1016/j.enpol.2014.01.043 15. Nie, W., Cai, P.: Making cities more youth-friendly: the impact of social quality on youth’s sense of achievement. 让城市对青年发展更友好: 社会质量对青年获得感的影响研究. China Youth Study 3, 53–60+119 (2021). https://doi.org/10.19633/j.cnki.11-2579/d.2021. 0023 16. Cai, S.: Rumination on the relationship between partial first wealth and common wealth. 部 分先富与共同富裕的关系刍议. Red Flag Manuscript 2, 7–8 (1999) 17. Cai, F.: Citizenship of rural migrant workers and the development of new consumers. 农民 工市民化与新消费者的成长. J. Grad. School Chin. Acad. Soc. Sci. 3, 5–11 (2011) 18. Ouyang, L.: Transformation of the farmer-turned-workers’ consumption behavior and modes. 进城农民工消费行为与消费方式探析——湘潭市进城农民工消费的调查与分析. Econ. Manage. 4, 38–40 (2006) 19. Sparrow, R., Howard, M.: When human beings are like drunk robots: driverless vehicles, ethics, and the future of transport. Trans. Res. Part C: Emerg. Technol. 80, 206–215 (2017). https://doi.org/10.1016/j.trc.2017.04.014 20. Meyer, J., Becker, H., Bösch, P.M., Axhausen, K.M.: Autonomous vehicles: the next jump in accessibilities? Res. Transp. Econ. 62, 80–91 (2017). https://doi.org/10.1016/j.retrec.2017. 03.005 21. Bansal, P., Kockelman, K.M., Singh, A.: Assessing public opinions of and interest in new vehicle technologies: an Austin perspective. Trans. Res. Part C: Emerg. Technol. 67, 1–14 (2016). https://doi.org/10.1016/j.trc.2016.01.019 22. Fagnant, D.J., Kockelman, K.: Preparing a nation for autonomous vehicles: opportunities, barriers and policy recommendations. Trans. Res. Part A: Policy Pract. 77, 167–181 (2015). 
https://doi.org/10.1016/j.tra.2015.04.003 23. Krueger, R., Rashidi, T.H., Rose, J.M.: Preferences for shared autonomous vehicles. Trans. Res. Part C: Emerg. Technol. 69, 343–355 (2016). https://doi.org/10.1016/j.trc.2016.06.015


24. Zeithaml, V.A.: Consumer perceptions of price, quality, and value: a means-end model and synthesis of evidence. J. Mark. 52, 2–22 (1988). https://doi.org/10.1177/002224298805 200302 25. Bettman, J.R.: Perceived risk and its components: a model and empirical test. J. Mark. Res. 10, 184–190 (1973) 26. Dinev, T., Hart, P.: An extended privacy calculus model for e-commerce transactions. Inf. Syst. Res. 17, 61–80 (2006). https://doi.org/10.1287/isre.1060.0080 27. Featherman, M.S., Pavlou, P.A.: Predicting e-services adoption: a perceived risk facets perspective. Int. J. Hum. Comput. Stud. 59, 451–474 (2003). https://doi.org/10.1016/S1071-581 9(03)00111-3 28. Grewal, D., Gotlieb, J., Marmorstein, H.: The Moderating effects of message framing and source credibility on the price-perceived risk relationship. J. Consum. Res. 21, 145–153 (1994). https://doi.org/10.1086/209388 29. Chen, R., He, F.: Examination of brand knowledge, perceived risk and consumers’ intention to adopt an online retailer. Total Qual. Manage. Bus. Excellence 14 (2010). https://doi.org/ 10.1080/1478336032000053825 30. Kim, H.K., Lee, M., Jung, M.: Perceived risk and risk-reduction strategies for high-technology services. In: Ha, Y.-U., Yi, Y. (eds.) AP-Asia Pacific Advances in Consumer Research, vol. 6, pp. 171–179. Association for Consumer Research, Duluth (2005) 31. Wang, S., Wang, J., Li, J., Liang, L.: Policy implications for promoting the adoption of electric vehicles: do consumer’s knowledge, perceived risk and financial incentive policy matter? Trans. Res. Part A: Policy Pract. 117, 58–69 (2018). https://doi.org/10.1016/j.tra. 2018.08.014 32. Sheridan, T.B.: Considerations in modeling the human supervisory controller. IFAC Proc. Volumes (1975). https://doi.org/10.1016/S1474-6670(17)67555-4 33. Bailey, N.R., Scerbo, W.M.: Automation-induced complacency for monitoring highly reliable systems: the role of task complexity, system experience, and operator trust. Theor. Issues Ergon. Sci. 8, 321–348 (2007). https://doi.org/10.1080/14639220500535301 34. Gefen, D., Karahanna, E., Straub, D.W.: Trust and TAM in online shopping: an integrated model. MIS Q. 27, 51–90 (2003). https://doi.org/10.2307/30036519 35. Muir, B.M., Moray, N.: Trust in automation. part II. experimental studies of trust and human intervention in a process control simulation. Ergonomics 39, 429–460 (1996). https://doi.org/ 10.1080/00140139608964474 36. Lee, J.D., Neville, M.: Trust, self-confidence, and operators’ adaptation to automation. Int. J. Hum. Comput. Stud. 40, 153–184 (1994). https://doi.org/10.1006/ijhc.1994.1007 37. Zhao, C., Yang, Y., Wu, S., Zhen, Q: Search trends and prediction of human brucellosis using Baidu index data from 2011 to 2018 in China. Sci. Rep. 1 (2020). https://doi.org/10.1038/s41 598-020-62517-7 38. Deng, J.: Study on the analysis of rural residents’ living consumption in China: a comprehensive analysis based on spatio-temporal data. 中国农村居民生活消费分析研究——基于时 空数据的综合分析. Times Finan. 26, 24–26 (2020) 39. Song, J.: Analysis of data on urban residents’ per capita income and expenditure from 2013 to 2016 based on the national Bureau of statistics, China statistical yearbook 2017. 关于2013年到2016年城镇居民人均收支数据的分析——基于国家统计 局 《2017年中国统计年鉴》 . China New Telecommun. 9, 221–223 (2018) 40. Liu, X., Zhang, P., Shi, X.: Industrial agglomeration, technological innovation and highquality economic development: empirical research based on China’s five major urban agglomerations. 产业集聚、技术创新与经济高质量发展——基于我国五大城市群的实 证研究. Reform 4, 68–87 (2022) 41. 
Mitchell, V.-W.: Consumer perceived risk: conceptualisations and models. Eur. J. Mark. 33, 163–195 (1999). https://doi.org/10.1108/03090569910249229


42. Qian, L., Yin, J.: Linking Chinese cultural values and the adoption of electric vehicles: the mediating role of ethical evaluation. Transp. Res. Part D: Transp. Environ. 56, 175–188 (2017). https://doi.org/10.1016/j.trd.2017.07.029 43. Burgess, M., King, N., Harris, M., Lewis, E.: Electric vehicle drivers’ reported interactions with the public: driving stereotype change? Transp. Res. Part F: Psychol. Behav. 17, 33–44 (2013). https://doi.org/10.1016/j.trf.2012.09.003

Human-Computer Interaction Analysis of Quadrotor Motion Characteristics Based on Simulink Zengxian Geng1,2(B) , Junyu Chen1 , Xin Guang1 , and Peiming Wang1 1 School of Air Traffic Management, Civil Aviation University of China, Tianjin 300300, China

[email protected] 2 Key Laboratory of Civil Aviation Flight Wide Area Surveillance and Safety Control

Technology, Tianjin 300300, China

Abstract. The development of unmanned aerial vehicles (UAVs) has recently become a hot topic in the new round of global scientific, technological, and industrial revolution. Meanwhile, in order to satisfy higher operational requirements, it is vital to investigate the human-machine interaction environment of UAVs and encourage the advancement of applied research. The Simulink platform was paired with the Vicon motion capture system, which includes infrared high-speed capture cameras, to provide a human-computer interface operating environment for quadrotor UAV motion characteristics. Based on the analysis of the motion characteristics of the UAV, the Vicon motion capture system is used to obtain specific data during actual operation, and the simulation environment is constructed in Simulink. The findings demonstrate the fundamental features of picture display, mode selection and switching, flight speed adjustment, and data selection for display. The simulation environment is used to conduct vertical, translational, hovering, and integrated UAV flight tests. Finally, the human-computer interaction environment for UAV motion characterization designed in this paper achieves a functional visualization design that brings motion characterization and motion state display to UAVs. Keywords: UAV · Human-Computer Interaction · Simulation · Motion Characteristics

1 Introduction

With the fast development of different software and hardware technologies such as communication, flight control, sensing and navigation, the application scenarios and application domains of UAVs have become increasingly richer and expanded in recent years. Experts and academics from all around the world have undertaken extensive study and analysis on the dynamic features and operation simulation of the UAV operation process. The United States Air Force Research Laboratory's Leo Gugerty, Ellen Hall, and William Tirre proposed a method to obtain information from the operation of the capture software and described how CTA is applied to the design of the synthetic mission


environment simulating the operation of the capture software's ground control station [1]. Shixin Mao, Wee Kiat Tan, and K. H. Low of Nanyang Technological University in Singapore used the Vicon motion capture system to track and record flight data and control the flight formation of quadrotor aircraft [2]. Seong-yeop Mun et al. designed an integrated UAV simulation environment that interconnected a flight simulator, Simulink, and a target avionics simulation model, developed a TCAS collision warning module, and tested it with the IUS and a flight encounter model [3]. Khalid bin Hasnan et al. obtained telemetry and visualization data from unmanned vehicles and converted the information into useful data with the help of a joystick, an RC controller, and the Simulink toolbox of MATLAB [4]. Sezer Çoban and Tuğrul Oktay studied the design of small unmanned aerial vehicles (UAVs) and the real-time application of a flight control system and a lateral state-space model [5]. Based on the specific aerodynamic parameters and simulation data flow of the UAV, Yunchao et al. established a modular simulation model of a small UAV in the Matlab/Simulink environment [6]. Wu Tong built a hardware-in-the-loop simulation experimental platform for four-rotor aircraft using Vicon motion capture and Matlab/Simulink [7]. Cao Jun built a formation control system based on the Vicon system in an indoor environment for the formation control of four-rotor UAVs [8]. Zhang Mingjia and Feng Xiu designed and built a simulation experiment platform based on FlightGear using a six-degree-of-freedom four-rotor UAV model [9]. Based on the basic structure, flight principle, dynamic modeling, and common PID and sliding-mode controllers of the four-rotor UAV, Yuan Yuan developed the overall hardware and software architecture of a hardware-in-the-loop simulation experimental platform [10]. The four-rotor UAV is chosen as the research object in this work. A visualization-based multi-rotor UAV motion simulation environment is constructed on the basis of describing the motion characteristics and simulation environment of the UAV, and flight experiments covering vertical, translational, hovering, and integrated operation are carried out using the simulation environment, providing a basis for research on UAV motion characteristics.

2 Method

2.1 Flight Principle and Motion State of Multi-rotor UAV

A multi-rotor UAV alters its flight attitude in the air by adjusting the rotation speed of the rotors to produce thrust on the UAV body and then using the resulting torque to drive the body to a new attitude. Figure 1 depicts the general structural model of a four-rotor UAV. The four-rotor UAV's motors 1 and 3 rotate counterclockwise, while motors 2 and 4 rotate clockwise. The torques created by the two opposing groups of rotors offset each other, allowing the UAV to fly in a balanced condition. The overall lift of the UAV may be increased by increasing the output power of the four motors at the same time. When the overall pull created by the rotors is increased sufficiently to overcome the UAV's own gravity, the UAV begins to ascend vertically; conversely, when the output power of the four motors is reduced, the UAV begins to descend vertically. The UAV hovers when the lift provided by its four rotors equals its own gravity.


Fig. 1. Quadcopter UAV model

When the output power of the motors is changed, the lift produced by the individual rotors differs, and the resulting torque can drive the body into pitching, rolling, yawing, and other motion states. Figure 2 depicts the motion states of a multi-rotor UAV:

Fig. 2. Motion state of multi-rotor UAV

According to the above analysis, the multi-rotor UAV can change the motion speed of the four motors under different motion conditions, thus realizing different motion states of the UAV. The relationship between the motor speed and flight of the multi-rotor UAV is shown in Table 1:

Table 1. Relation between four-rotor speed and flight action

Action                             Motor1   Motor2   Motor3   Motor4
Hovering (steady state)            M1       M2       M3       M4
Rising (vertical movement)         M1+      M2+      M3+      M4+
Descending (vertical movement)     M1−      M2−      M3−      M4−
Left (roll motion)                 M1+      M2−      M3−      M4+
Right (roll motion)                M1−      M2+      M3+      M4−
Forward (pitch motion)             M1−      M2−      M3+      M4+
Back (pitch motion)                M1+      M2+      M3−      M4−
Clockwise (right yaw)              M1+      M2−      M3+      M4−
Counterclockwise (left yaw)        M1−      M2+      M3−      M4+
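To make the mapping in Table 1 concrete, the short Python sketch below (an illustration consistent with the table, not code from the paper) converts throttle, roll, pitch, and yaw commands into the per-motor speed changes listed above:

# Illustrative motor-mixing sketch consistent with Table 1 (not from the paper):
# positive roll = left, positive pitch = forward, positive yaw = clockwise.
def mix(throttle, roll, pitch, yaw):
    """Return speed commands (M1, M2, M3, M4) for the rotor layout of Fig. 1,
    where motors 1 and 3 spin counterclockwise and motors 2 and 4 clockwise."""
    m1 = throttle + roll - pitch + yaw
    m2 = throttle - roll - pitch - yaw
    m3 = throttle - roll + pitch + yaw
    m4 = throttle + roll + pitch - yaw
    return m1, m2, m3, m4

print(mix(throttle=1.0, roll=0.0, pitch=0.2, yaw=0.0))  # forward: M1-, M2-, M3+, M4+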


2.2 Kinematic Model of Multi-rotor UAV

During flight, the quadrotor UAV is susceptible to external influences owing to the complexity of the actual flight motion, its own body structure, and its aerodynamic characteristics, including forces and moments, wind deviation, structural stability of the body, and other uncertain factors. Therefore, to reasonably analyze the lift, gravity, friction, and air resistance acting on the UAV in flight, the attitude, velocity, and other motion information of the UAV must be analyzed and calculated accurately over time. Combining the angular equations of motion with Newton's second law in the ground coordinate system, and differentiating the displacements in the three directions, the linear equations of motion of the UAV are obtained as

\[
\begin{cases}
\ddot{x} = \dfrac{\sin\theta\cos\phi\cos\varphi + \sin\phi\sin\varphi}{m}\displaystyle\sum_{i=1}^{4}F_i - \dfrac{K_{Dx}\dot{x}}{m}\\[6pt]
\ddot{y} = \dfrac{\sin\theta\cos\phi\cos\varphi - \sin\phi\sin\varphi}{m}\displaystyle\sum_{i=1}^{4}F_i - \dfrac{K_{Dy}\dot{y}}{m}\\[6pt]
\ddot{z} = \dfrac{\cos\phi\cos\theta}{m}\displaystyle\sum_{i=1}^{4}F_i - \dfrac{K_{Dz}\dot{z}}{m} - g
\end{cases}
\tag{1}
\]

where \(F_i\) is the lift produced by a single rotor, \(K_D\) is the air drag coefficient, and \(\theta\), \(\phi\), and \(\varphi\) denote the pitch, roll, and yaw angles of the UAV, respectively.

To analyze the lifting, rolling, pitching, and yawing motions of the UAV, the motor speeds must be changed. The quadrotor dynamics model is therefore decomposed into four input channels U1–U4 that control lifting, rolling, pitching, and yawing, respectively. U1 represents the total lift of the four rotors, that is, the control quantity of the vertical motion, and U2, U3, and U4 represent the differential forces on the body when rolling, pitching, and yawing. With these four control inputs defined, the angular equations of motion generated by the rotation of the rotor UAV body are

\[
\begin{cases}
\dot{\omega}_x = \left[U_2 L + (I_Y - I_Z)\,\omega_Y\omega_Z\right]/I_X\\
\dot{\omega}_y = \left[U_3 L + (I_Z - I_X)\,\omega_X\omega_Z\right]/I_Y\\
\dot{\omega}_z = \left[U_4 L + (I_X - I_Y)\,\omega_X\omega_Y\right]/I_Z
\end{cases}
\tag{2}
\]

where \(L\) is the moment arm length of the UAV, \(\dot{\omega}\) is the angular acceleration of the UAV, and \(I_i\) is the moment of inertia about the corresponding axis. Integrating the above formulas, the 6-DOF motion model is obtained as

\[
\begin{cases}
\ddot{x} = \dfrac{\sin\theta\cos\phi\cos\varphi + \sin\phi\sin\varphi}{m}\,U_1\\[6pt]
\ddot{y} = \dfrac{\sin\theta\cos\phi\cos\varphi - \sin\phi\sin\varphi}{m}\,U_1\\[6pt]
\ddot{z} = \dfrac{\cos\phi\cos\theta}{m}\,U_1 - g\\[6pt]
\ddot{\phi} = \left[U_2 L + (I_Y - I_Z)\,\dot{\theta}\dot{\varphi}\right]/I_X\\
\ddot{\theta} = \left[U_3 L + (I_Z - I_X)\,\dot{\phi}\dot{\varphi}\right]/I_Y\\
\ddot{\varphi} = \left[U_4 L + (I_X - I_Y)\,\dot{\theta}\dot{\phi}\right]/I_Z
\end{cases}
\tag{3}
\]
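As an illustration of how the 6-DOF model in Eq. (3) can be propagated in time inside a simulation loop, the following minimal Python sketch (the paper's environment is built with MATLAB/Simulink; this is not the authors' code, and the mass, arm length, and inertia values are placeholder assumptions) applies a simple forward-Euler integration:

# Minimal sketch of Eq. (3) with forward-Euler integration (illustrative only;
# m, L and the inertia values below are placeholder assumptions, not from the paper).
import numpy as np

m, g, L = 0.3, 9.81, 0.1              # mass [kg], gravity [m/s^2], moment arm [m]
Ix, Iy, Iz = 1e-3, 1e-3, 2e-3         # moments of inertia [kg*m^2]

def step(state, U, dt):
    """Advance [x, y, z, vx, vy, vz, phi, theta, psi, dphi, dtheta, dpsi] one step."""
    x, y, z, vx, vy, vz, phi, th, psi, dphi, dth, dpsi = state
    U1, U2, U3, U4 = U
    ax = (np.sin(th)*np.cos(phi)*np.cos(psi) + np.sin(phi)*np.sin(psi)) * U1 / m
    ay = (np.sin(th)*np.cos(phi)*np.cos(psi) - np.sin(phi)*np.sin(psi)) * U1 / m
    az = np.cos(phi)*np.cos(th) * U1 / m - g
    ddphi = (U2*L + (Iy - Iz)*dth*dpsi) / Ix
    ddth  = (U3*L + (Iz - Ix)*dphi*dpsi) / Iy
    ddpsi = (U4*L + (Ix - Iy)*dth*dphi) / Iz
    return np.array([x+vx*dt, y+vy*dt, z+vz*dt,
                     vx+ax*dt, vy+ay*dt, vz+az*dt,
                     phi+dphi*dt, th+dth*dt, psi+dpsi*dt,
                     dphi+ddphi*dt, dth+ddth*dt, dpsi+ddpsi*dt])

# Example: thrust 5% above the weight produces a slow vertical climb from rest.
state = np.zeros(12)
for _ in range(1000):                  # 2 s of simulated time at dt = 2 ms
    state = step(state, U=(1.05*m*g, 0.0, 0.0, 0.0), dt=0.002)
print(round(state[2], 3))              # climbed altitude z [m]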


2.3 Multi-rotor UAV Simulation Environment

The multi-rotor UAV simulation environment is mainly constructed in the laboratory using the Vicon system. The hardware facilities of the Vicon system are mainly composed of the Vicon PC host, MX components, calibration kit, MX infrared high-speed capture camera, and experimental objects with fluorescent marker balls fixed. The built-in Vicon Tracker human-computer interface in the system can monitor and display the flight status of the experimental object in real time (Fig. 3).

Fig. 3. UAV simulation environment based on Vicon system

The operator drives the UAV, fitted with fluorescent marker balls, through the program and changes its attitude. The optical motion capture system captures the moving point of the flight in real time, and the flight data are transmitted to the host computer through the MX components for recording. After the data are processed, they are converted into a visual image in real time. The specific UAV human-computer interaction process is shown in Fig. 4 below. The real-time flight data acquired from the Vicon system can subsequently be used for online or offline motion data processing and analysis, which is highly relevant for the study of UAV motion characteristics, remote control interaction, and visualization problems.

Fig. 4. UAV human-computer interaction

In Fig. 4, number 1 is an MX infrared high-speed capture camera, number 2 is the experimental flight drone, number 3 is an MX component, number 4 is the data calibration kit, and number 5 is a client configured with the Vicon system. The fluorescent marker balls are used to mark the experimental object; an object with fluorescent marker balls fixed to it is captured by the infrared high-speed cameras. The number of fluorescent marker balls is determined by the size of the research object and the data acquisition requirements; generally, an experimental object needs at least three fluorescent marker balls. The "Xiao" Spark UAV used in this paper is marked with three fluorescent marker balls. The MX infrared high-speed capture cameras, which are distributed at specific locations in the operating scene, capture the motion tracks and data of the marked experimental objects and can acquire the tracks and data of multiple experimental objects at the same time. The MX components mainly include MX Link, MX Net, and MX Control; their function is to connect one or more groups of MX infrared cameras to the Vicon PC host through a gigabit network. The Vicon PC host is the core component: users employ it for system configuration and upper-level application development, including installing and configuring the software and the Vicon SDK. The calibration kit includes a T-shaped calibration rod (embedded with multiple marker acquisition points) and the fluorescent marker balls mentioned above; together they accurately calibrate the motion capture system and provide stable output in the background [7] (Fig. 5). The Vicon Tracker interface allows the user to observe visually whether each MX infrared camera in the system is working properly and whether the experimental subjects are being captured. The entire system can also be calibrated in Vicon Tracker to output more accurate motion data, and the captured objects can be defined and named in the interface using the fluorescent marker balls.

After obtaining the motion data of the UAV operating environment, Matlab is used to output and display the dynamic state of the UAV. In this paper, the motion of the UAV is simulated in the form of a moving point, and only the three displacement values (Tx, Ty, Tz) along the X, Y, and Z axes are used to complete the simulation of the UAV flight. Based on the optimized data collected and processed in the experiment, the dynamic and static display of the 3D trajectory and the dynamic display of the moving-point data are realized in Matlab, completing the design of the initial simulation environment.
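The following short Python sketch shows the kind of processing involved (illustrative only; the paper's display is implemented in MATLAB, and the file name, column layout, and 100 Hz sample rate are assumptions): loading the captured (Tx, Ty, Tz) displacement samples, differentiating them to obtain a speed-time curve, and plotting the 3D trajectory of the moving point.

# Illustrative sketch (not the authors' MATLAB code): plot the captured moving-point
# trajectory and a speed-time curve from (Tx, Ty, Tz) samples. The CSV file name,
# its column layout, and the sampling interval are assumptions for the example.
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt("vicon_capture.csv", delimiter=",")   # columns: Tx, Ty, Tz [mm]
dt = 0.01                                               # assumed sampling interval [s]
t = np.arange(len(data)) * dt

# Speed from finite differences of the displacement samples.
speed = np.linalg.norm(np.diff(data, axis=0), axis=1) / dt

fig = plt.figure(figsize=(9, 4))
ax3d = fig.add_subplot(1, 2, 1, projection="3d")
ax3d.plot(data[:, 0], data[:, 1], data[:, 2])           # static 3D trajectory
ax3d.set_xlabel("Tx"); ax3d.set_ylabel("Ty"); ax3d.set_zlabel("Tz")

ax2d = fig.add_subplot(1, 2, 2)
ax2d.plot(t[1:], speed)                                  # velocity-time curve
ax2d.set_xlabel("time [s]"); ax2d.set_ylabel("speed [mm/s]")
plt.tight_layout()
plt.show()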


Fig. 5. Operation interface of Vicon Tracker

3 Results and Discussion

This paper uses App Designer in Matlab to design the human-computer interaction interface of the UAV. Compared with the UI design of a traditional GUI application, App Designer offers more optional controls, more refined control design, a more convenient interface design process, and a more direct and attractive display. It is a platform built on modern web technology that can flexibly keep up with user needs and provide a more modern and friendly experience. The main functions of the human-computer interaction interface are the following:

• Image display. Image display is essential in interface visualization and is divided into dynamic and static displays. The dynamic images include the 3D dynamic image, top-view dynamic image, and side-view dynamic image of the UAV in operation; the static image is the speed-time curve of the UAV. The coordinates of specified points are added to the 3D image for display, and button controls are added to the above image display functions for indication.

• Mode selection and switching. After implementing the initial image display, it must also be possible to switch among four different types of motion in the graphics box: vertical, level flight, hover, and integrated motion. A drop-down box is used to achieve this function. Each time the drop-down box is toggled, the dynamic 3D view, top view, and side view switch to the corresponding motion images, and the velocity-time image changes when the control button is clicked.


• Speed regulation. Speed adjustment is added to the dynamic image display. Using the speed adjustment button in the App Designer component library, the display speed of the image is divided into three levels, "Fast", "Medium", and "Slow"; adjusting the image speed makes the displayed process more intuitive and clear.

• Display of other selected data. To make the interface truly data-driven, in addition to the image display and simulation of vertical, level, hovering, and integrated motion, the interface must be able to select Excel files, read the data, and output them as dynamic 3D, top-view, side-view, and corresponding velocity-time images. With this function, the data collected by the Vicon system can be directly output as images for analysis and observation.

The components required for the interface design include four coordinate areas (UIAxes), twelve buttons, two knobs, a drop-down box, a text box, and several labels. The layout of the interface is divided into three areas: the left panel, UIAxes4, and labels 1 and 2; the left panel contains three coordinate areas showing the 3D view, top view, and side view. After determining the page layout and the components to use, private functions defining the display code for the different images are written, and the command buttons are then given their functions through callback functions. Following the above functional analysis, a human-computer interaction interface that achieves the expected functions is designed according to the model and data requirements of the Simulink-based study of UAV motion characteristics proposed in this paper. After correcting some details, the final UI is obtained: a multi-functional, data-driven HCI interface that can display the various motion trajectories and velocity images of the UAV (Fig. 6).

Fig. 6. Human-machine interface

After collecting and processing the experimental data of the UAV in the simulated environment, the motion characteristics of the UAV in vertical motion, level flight


motion, hovering motion, and integrated flight states were studied using the designed visual human-computer interaction interface.

3.1 Vertical Motion Characteristics

The data path of the vertical experiment is input into the .m image-display program, and the 3D, top, and side views of the vertical movement process are obtained, as shown in Fig. 7:

Fig. 7. Analysis of vertical motion characteristics: (a) motion simulation, (b) velocity-time, (c) side view, (d) top view

From the velocity-time image of the vertical motion, it can be seen that the speed changes significantly when the UAV reaches the position where it changes its motion state. After climbing to the specified altitude, the speed begins to drop sharply; after a short irregular flight, the UAV starts to accelerate downward, with the speed first increasing and then decreasing, and it finally lands along a slope of 30–40 cm close to the ground. This is basically consistent with the trajectory of the experimental UAV. When simulating the vertical movement, the rotation speeds of the four motors of the rotorcraft UAV are identical, so in the ideal case only the Z-axis position coordinate changes. However, the 3D graph of the collected data from the actual experiment shows a certain displacement of the X-axis and Y-axis position coordinates during vertical climbing or descending, which may be caused by unskilled operation of the UAV and transmission signal delay, resulting in a relatively large displacement of the UAV during the climbing phase of the experiment.

3.2 Horizontal Flight Motion Characteristics

For the level flight motion, a part of the collected data is intercepted for a simple simulation; the data path is input into the .m image-display program, and the 3D view (a), speed-time curve (b), top view (c), and side view (d) during level flight are obtained, as shown in Fig. 8 below. Observing the velocity-time graph (b), the acceleration gradually decreases during the acceleration phase until it approaches zero. Some points deviate severely from the curve; the velocities at these points are not realistic and can be ignored. Ideally, the Z-axis position offset of

Fig. 8. Simulation of level flight motion: (a) motion simulation, (b) velocity-time, (c) side view, (d) top view

the rotorcraft during level flight should be zero; however, the simulation plot shows that the Z-axis coordinates are unstable, and the trend of the 400 data points is first falling and then rising. The overall offset is small, and the error is within a controllable range.

3.3 Hover at Fixed Point

For the fixed-point hover experiment, the intercepted data can be divided into three stages: climbing, hovering, and landing. The data path is input into the .m image-display program, and the 3D view (a), speed-time curve (b), top view (c), and side view (d) of the hover experiment are obtained, as shown in Fig. 9 below.

Fig. 9. Hovering motion simulation diagram: (a) motion simulation, (b) velocity-time, (c) side view, (d) top view

During the vertical rise, the speed-time curve is close to a straight line, corresponding to uniformly accelerated upward motion. After rising to the specified height, the speed drops sharply in a nearly uniform deceleration. After entering the hover state, the speed decreases to close to zero and remains there for a period of time; the UAV finally enters the descent phase, in which the speed first increases and then decreases. In the ideal fixed-point hovering flight, the X-, Y-, and Z-axis coordinates of the UAV do not change with time, and the speed remains stable at zero. However, the data simulation diagram (a) shows that when the UAV hovers at a fixed point, its X- and Y-axis position coordinates shift slightly in their respective positive directions (to the right). In the side view (c), the Z-axis coordinate basically tends to be


stable during the hover, but the altitude fluctuates up and down, which may be related to signal transmission or to the changing trend of the pitch angle. The attitude angle needs to be studied further, and the relevant parameters of the simulation experiment adjusted.

3.4 Integrated Motion

The integrated simulation includes four stages: vertical climb, straight level flight, fixed-point hover, and yaw descent. The data path is input into the .m image-display program, and the simulation images of the comprehensive experiment are obtained, as shown in Fig. 10 below.

Fig. 10. Comprehensive motion simulation chart: (a) motion simulation, (b) velocity-time, (c) side view, (d) top view

In the vertical climbing stage, the UAV quickly rises to the specified height; the displacements along the X-axis and Y-axis are relatively small while the Z-axis coordinate increases rapidly, consistent with the preceding vertical motion analysis. After entering the level flight stage, the UAV begins to fly level in the negative Y-axis direction immediately after the vertical climb. Based on the previous analysis, the continuous height loss throughout the level flight is presumed to occur because height loss always follows the vertical movement, and the track over the whole vertical climb and level flight stage is very close to a straight line. In the subsequent hovering phase, the position coordinates of the UAV on the X, Y, and Z axes are relatively stable, showing only small fluctuations. In the final yaw descent phase, the top view (d) and side view (c) show a small deviation in the X- and Y-axis directions as the yaw direction changes, accompanied by a slight increase in the Z-axis coordinate. After completing the yaw, the UAV starts to move in the positive X-axis direction and lowers its altitude; the track in the top view is close to a straight line, and the speed first increases and then decreases. The integrated simulation shows that the combined motions are relatively complete and effective, and the results differ little from those of the separate analyses, which demonstrates the basic reliability of the follow-up integrated trajectory simulation experiments.


4 Conclusion

This paper takes a quadrotor UAV as the research object. On the basis of establishing its kinematic model and analyzing its motion characteristics, the Vicon motion capture system is combined with MATLAB Simulink: Vicon is used to obtain data from the actual operation process, and the simulation environment is built in Simulink. The resulting Simulink-based human-computer interaction operating environment for quadrotor motion characteristics provides a functional visual design for UAV motion characteristic analysis and motion state display, and the designed human-computer interaction interface is used to study UAV motion characteristics such as vertical, level flight, and fixed-point hovering motion.

Acknowledgements. The authors acknowledge funding support from the National Key Research and Development Program project on manned/unmanned aerial vehicle integrated operation safety risk monitoring technology (2022YFB4300904). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.

References
1. Gugerty, L., Hall, E., Tirre, W.: Designing a simulation environment for uninhabited aerial vehicle (UAV) operations based on cognitive task analysis. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 42, no. 23 (1998)
2. Mun, S.Y., Kim, J.Y., Lee, D.W., Baek, G.M., Kim, J.S., Na, J.: Study on the integrated UAV simulation environment for the evaluation of the midair collision alarm system. J. Adv. Navig. Technol. 19(4) (2015)
3. Yun, C., Li, X., Zheng, Z.: Design of a hardware-in-the-loop UAV simulation system based on Matlab/Simulink. Comput. Meas. Control 20(12), 3306–3308, 3324 (2012)
4. Wu, T.: Control design and hardware-in-the-loop simulation experiment platform development for quadrotor. Shanghai Jiaotong University (2015)
5. Zhang, M.J., Feng, X.: Experimental platform design of quadrotor UAV simulation based on Matlab/FlightGear. Wireless Connected Technol. 17(06), 45–46, 52 (2020)
6. Yuan, Y.: Research on model-based semi-physical simulation platform for quadrotor UAV. Comput. Measur. Control 29(01), 174–178 (2021)

Fluidics-Informed Fabrication: A Novel Co-design for Additive Manufacturing Framework Gabriel Lipkowitz, Eric S.G. Shaqfeh, and Joseph M. DeSimone(B) Stanford University, Stanford, CA 94305, USA [email protected]

Abstract. Effective design for additive manufacturing (DfAM) tools for ensuring digital models are 3D printable are crucial for realizing the benefits of this burgeoning fabrication technology. However, existing tools do not allow engineers to co-design the part itself for the manufacturing process, instead imposing cumbersome support structures on the model after the design stage. Here, we present a physics-informed generative design framework allowing designers to readily take into account complex printing process fluid dynamics concurrent with 3D model development, facilitating printability and multimaterial printing.

Keywords: 3D printing · Design for Additive Manufacturing · Computational design

1 Introduction

1.1 Design for Additive Manufacturing

3D printing allows designers to rapidly fabricate highly complex computer-aided design (CAD) models unmakeable by traditional manufacturing methods, and reduces the barriers to entry for users interested in digital fabrication. This includes the manufacture of interactive devices of interest to the human-computer interaction community, ranging from wearables Everitt et al. (2021) to augmented reality devices Bai et al. (2021). Soft materials have been of particular interest, including elastomers Schmitz et al. (2017) and less traditional, but still printable, flexible materials Hudson (2014); Peng et al. (2015). Combined with lattice architectures achievable only with 3D printing, this enables even more complex interactions, including, e.g., haptic gloves Moheimani et al. (2021). After printing, such multimaterial structures can be functionalized with embedded capacitive touch Schmitz et al. (2015) or optoelectronic Willis et al. (2012) sensors. However, a major learning barrier remains in training human designers, often accustomed to traditional manufacturing techniques, to design effectively for this new technology. The challenge is summarized in the left column of Fig. 1. In brief, poor designs leading to failed prints result in large material waste,



Fig. 1. Fabrication-informed co-design for 3D printing manufacturability. Existing DfAM approaches do not account for the fluid dynamics of the printing process itself, instead imposing cumbersome support structures that fail to prevent print failures. Our proposed generative co-design method integrates design with fluid dynamics modeling to minimize forces during printing and thus maximize printability.


lost human productivity, and reduced factory outputs Thompson et al. (2016). From the perspective of human-computer interaction, while the digital nature of 3D printing makes it user friendly compared with traditional manufacturing methods, which require significant manual training and accumulated expertise, these pre-design and post-processing steps make 3D printing a still relatively user-unfriendly process. Design for Additive Manufacturing (DfAM) is broadly concerned with developing computational tools usable by human designers so that they can more easily and quickly model for these new digital manufacturing techniques. The most popular and widespread design feature users employ to ensure 3D digital models are 3D printable is supports. Supports attach the to-be-printed part to the build platform of the printer. These are designed after the design stage but before printing to offset forces that otherwise preclude printing, and are necessary for most, if not all, 3D printing jobs Gibson et al. (2021); Yao et al. (2017). Still, supports are not user-friendly and are cumbersome from a designer's perspective in several ways. They necessitate substantial manual time to detach in post-processing and harm the surface smoothness of the printed object after removal. They also waste large quantities of potentially expensive resin material, which is economically costly to the user. While automated support structure generators are available, moreover, these may not align with a designer's wishes, requiring substantial re-design effort to revise. Finally, supports can fail to prevent print failures in the first place, causing either part defects or entire print failure.

1.2 Towards Novel Design for Additive Manufacturing Tools

To provide a more user-friendly computational tool for designers seeking to use 3D printing, here we describe our ongoing work on our novel 3D printing method that promises to obviate, or reduce the need for, such supports. Injection 3D printing is a fast, high-resolution vat-based 3D printing method that infuses resin through fluidic channels within the 3D model itself. Salient differences between traditional 3D printing and injection 3D printing are summarized in Fig. 1. Injection has been shown to effectively offset suction forces, hence promising to reduce the need for supports, during printing Lipkowitz et al. (2022). In addition, such channels can rapidly and spatioselectively infuse multiple materials into the printer vat, or the object itself, through software-designed channels specifically designed to craft multimaterial objects. No commercially-available resin printers can currently enable the user to engage in such multimaterial printing. Injection 3D printing requires very slight changes to existing, widely available, inexpensive 3D printers: the only required hardware add-on is any commercially available syringe pump. Injection 3D printing depends upon the precise design of a fluidic system within or around an object to satisfactorily offset suction forces and guide multimaterial distributions. To achieve this, here we also outline the current state of our algorithmic workflow behind our accompanying physics-informed design tool that allows users to implement multimaterial injection printing on their otherwise single material 3D printer, with minimal hardware changes. Our tool takes


as input an arbitrary user-defined input mesh and outputs a corresponding fluidic network, generatively designed taking into account surrogate fluid dynamics modeling of the 3D printing process. The digital fluidic network innervates the user’s mesh in preparation for injection 3D printing.

2 User Workflow for Injection 3D Printing

As an add-on extension to widely-available commercial UV resin 3D printers, injection 3D printing does not require users to learn an entirely new fabrication workflow; rather, they can modify existing and well-established resin 3D printing workflows with our new add-on. All that is required in hardware to implement injection 3D printing is a commercially available and inexpensive syringe pump to be connected to the build platform of the printer. Any programmable syringe pump can be used to inject resin; we specifically used a Harvard Apparatus PHD Ultra syringe pump to controllably administer resin flow during and/or after printing. This minimal hardware addition, however, allows injection 3D printing to overcome suction forces during printing that otherwise can cause print failures and require support structures, as shown in the bottom panel of Fig. 1. This experiment was performed for the case of an example mechanical bracket accessory used in, e.g., an automotive design, which has a complicated,

Fig. 2. Injection 3D printing add-on extension to existing UV resin printer in schematic form (a-b) and real-world implementation (c). A syringe pump feeds one or multiple resins to the growing part.


irregular geometry. In traditional 3D printing, such unsupported edges need significant scaffolding to overcome otherwise large suction forces, and the user must cleave this scaffolding off afterwards. In injection 3D printing, mechanical infusion of resin is used to offset such forces.

2.1 3D Printer Hardware Interface

For the printer itself, either low-cost home-built printers or commercially available printers can be modified. For custom printers, traditional printer hardware elements, namely the build platform and light projector, can be coordinated with the add-on syringe pump by an inexpensive Arduino MEGA 2560 microcontroller mounted with a RAMPS 1.4 shield running open-source Marlin firmware. Flexible tubing is used to connect the syringe pump add-on to the build platform of the printer. Our custom user interface allows the designer to adjust resin flow during or after printing as desired, using our generative design software tool outlined below. Commercially available printers include any of the commercialized CLIP 3D printers by, e.g., Carbon 3D. Our prototype iCLIP printer is shown in Fig. 2.

2.2 Experimental User Design Testing

We validate experimentally that our generatively-designed microfluidic networks allow the designer to effectively fabricate objects by injection 3D printing.


Fig. 3. Experimental validation. A microfluidic network for a designer’s CAD model (a) is confirmed to allow flow of multiple resins by digital imaging of the printer vat (b). Forces during printing are significantly lessened with injection than without injection, i.e. in the traditional process (c).


We use a mechanical turbine as an example geometry, as shown in Fig. 3. During printing, as viewed in real time from beneath the printer apparatus, the fluidic networks are confirmed to allow designers to precisely control fluid flows within the printer. To confirm that such networks also offset failure-inducing forces, we also install a force sensor on our printer. At every layer during printing, which corresponds to a lift of the platform, a force peak of variable amplitude is recorded by this sensor. High forces are likely to cause print failures, as indicated in gray in the figure; this occurs when there is no injection during printing. By contrast, low forces allow print success, as indicated in red in the figure. As assessed by these online force measurements, injection through generatively-designed networks allows designers to significantly reduce forces during printing, compared with control experiments without injection.

2.3 Visualization of Multimaterial Injection Printing

We also experimentally validate that such variable-density generatively-designed fluidic networks, beyond reducing forces during fabrication to aid designers in making their models 3D printable, also allow designers to adjust the fraction of the print region filled by different materials. Such multimaterial injection 3D printing demands such careful control over fluid distributions during printing. To that end, integrated into our generative design software is a module to predict, as explained in more detail below, using surrogate fluid dynamics modeling, outflows of injected material during printing. This allows the user to adjust the fraction of an object filled by different materials, including with materials otherwise too viscous to print with because of the aforementioned suction forces. Figure 3b illustrates boundaries between differently colored materials brought into the printer, either through injection or suction.

3 Generative Design Methodology

Given that the added hardware accessories for injection 3D printing are relatively minor, the main requirement for a user to implement our fabrication approach is software to assist in the design of a fluidic network and hence control injection during printing. To that end, our goal is, for an arbitrary 3D model provided as input by a CAD modeler and 3D printer user, to construct a fluidic network that offsets suction, in order to effectively replace, or minimize, the support structures required. Below we describe the software modules required, using as an example a mechanical hinge structure, which possesses multiple arms and rapidly changing cross-sectional areas, making the design of a corresponding fluidic network non-intuitive. We computationally design a suitable 3D-printable fluidic network that innervates the part to sufficiently distribute one, or multiple, materials during printing. Specifically, a trajectory optimization formulation fits the design problem well. For an input CAD to be 3D printed via a set of two-dimensional layers, the back-end of our tool calculates suction forces, which we flag as


failure-prone CAD regions to the user (Fig. 4). While the fluid dynamics modeling behind our tool is outside the scope of this work, in brief we achieve this by modeling the printing process fluid dynamics using lubrication theory Cameron (1971), solved using Poisson's equation in two dimensions with irregular boundaries Arias et al. (2018). Relevant to the interface of our tool, however, is that in this manner the designer need not account for complex fluid dynamics, a cause of significant print failures. While traditional support structure generators also automate such printability analysis, so that the 3D printer user does not have to guess intuitively when print failures will occur, they subsequently impose cumbersome extra solid material to accommodate such forces; we instead design a fluidic network to offset suction directly while fabricating the user's part. This leads to a natural formal representation of the to-be-printed 3D microfluidic network as a graph with nodes, where nodes in adjacent layers are connected by edges. The problem is therefore twofold: to design an optimal graph network comprising injection nodes and edges that fully offsets suction forces for every layer while printing the 3D object, along with an appropriate layer-dependent set of pump injection instructions.
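As a hedged illustration of the kind of per-slice printability analysis described above (this is not the authors' solver; the constant-gap simplification of the lubrication model, the parameter values, and the boundary handling are assumptions made for the example), a constant-gap lubrication approximation reduces to a 2-D Poisson problem for the pressure in each slice, which can be relaxed numerically on a binary mask of the cross section:

# Illustrative sketch only (not the authors' solver). For a constant gap h, the
# lubrication model reduces to laplacian(p) = 12*mu*v_lift/h**3 in each slice;
# we relax it with Jacobi iterations, holding p = 0 outside the part contour.
# mu, h, v_lift and the grid spacing dx are assumed example values.
import numpy as np

def slice_pressure(mask, dx=1e-4, mu=1.0, h=1e-4, v_lift=1e-3, iters=5000):
    """mask: 2-D boolean array, True inside the printed cross section."""
    rhs = 12.0 * mu * v_lift / h**3           # constant source term
    p = np.zeros(mask.shape)
    for _ in range(iters):                    # Jacobi relaxation of the Poisson problem
        p_new = 0.25 * (np.roll(p, 1, 0) + np.roll(p, -1, 0) +
                        np.roll(p, 1, 1) + np.roll(p, -1, 1) - rhs * dx**2)
        p = np.where(mask, p_new, 0.0)        # ambient pressure outside the part
    return p

def suction_force(p, dx=1e-4):
    """Integral of the (negative) pressure over the slice area."""
    return p.sum() * dx**2

mask = np.zeros((40, 40), dtype=bool)         # toy 2 mm x 2 mm square cross section
mask[10:30, 10:30] = True
print(suction_force(slice_pressure(mask)))    # negative value: net suction on the slice

The integrated pressure over each slice then serves as the per-layer suction force that the network design step described next tries to offset.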

3.1 Requirements of the 3D Fluidic Network

Our tool respects the following geometric requirements in the inverse design of a fluidic system for injection 3D printing. First, the design space for the innervating


Fig. 4. A designer’s input CAD is analyzed for printability slice-by-slice, simulating fluid pressure profiles to flag failure-prone slices, here regions of dark grey indicating highly negative pressure (a-b). A range of real-world machine fabrication parameters, e.g. printing speed for a range of arbitrary mechanical designs from a CAD library, can be modeled by our tool (c).


fluidic network is specified by the frontiers of a given input CAD geometry, akin to lattice-generation DfAM tools. Second, the generated channel arrangement at all layers is fully connected; this results from the need for all points to be reachable, during printing, from the injection site on the platform supplied by the pump. These requirements are met via our tool, implemented in parametric design software, which, similar to vascularization algorithms Wang et al. (2021) or tree-building L-system grammars Zhang et al. (2020), procedurally generates microfluidic geometry from the designer's defining input mesh. Our method differs fundamentally from these branching structure-generating approaches, however, in optimizing for manufacturability.

3.2 Procedural Modeling Methodology

Our fabrication-aware computational design methodology is divided into three modules, summarized in order below. For clarity, while we distinguish here the three modules that form the back-end of our generative design tool, we note that in real-world operation, the three are tightly integrated, and run in parallel, to produce a viable fluidic design for injection 3D printing. Network Design Module. The initial design of our flow network is outlined in Algorithm 1, which recursively constructs a fully-connected network graph from an input CAD geometry. Our inverse design approach originates a fluidic network with a single input injection point connected to the build platform and, thus, the syringe pump. Then, injection channels are positioned within the user-provided model at optimal positions as determined by surrogate fluid dynamics modeling, in order to offset suction during printing to minimize need for supports. Specifically, high pressure injection sites are incrementally added to alleviate these suction forces (Fig. 5). The 3D point cloud specifying these microfluidic nodes is connected recursively into network branches, whereby shortest paths to a single input source are solved using a modified Dijkstra’s algorithm implementation. Once swept to produce 3D multipipes, Boolean differencing produces negative channels in the original CAD geometry.

Algorithm 1. DESIGN FLOW NETWORK
1: function build_microfluidic_graph(object P)
2:   for slice 1, 2, . . . in P do
3:     Compute force F = ∫∫_{x,y} p dx dy
4:     while F < F_critical do
5:       Add injection point d at argmin_{x,y} p to microfluidic graph μ
6:       Recompute force F
7:     end while
8:   end for
9:   return microfluidic graph μ
10: end function
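The following minimal Python sketch (illustrative only, not the authors' implementation) shows the node-placement loop of Algorithm 1; pressure_field is a hypothetical stand-in for the surrogate fluid-dynamics model, and the Dijkstra-style connection of the resulting nodes into branches is omitted:

# Illustrative sketch of Algorithm 1 (not the authors' code). `pressure_field`
# is a hypothetical surrogate model returning the per-pixel pressure of a slice
# given the current injection nodes, e.g. the relaxation sketch shown earlier.
import numpy as np

def build_microfluidic_graph(slices, pressure_field, f_critical,
                             dx=1e-4, max_nodes=50):
    """Return, per slice, the injection-node positions needed so that the
    integrated force stays above the critical threshold."""
    graph = []
    for mask in slices:
        nodes = []
        for _ in range(max_nodes):
            p = pressure_field(mask, nodes)
            force = p.sum() * dx**2                      # F = integral of p over the slice
            if force >= f_critical:
                break
            # place the next injection node where suction is most severe
            i, j = np.unravel_index(np.argmin(np.where(mask, p, np.inf)), p.shape)
            nodes.append((i, j))
        graph.append(nodes)
    return graph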



Fig. 5. After the input CAD geometry is analyzed slice-by-slice to predict detrimental suction forces during printing, forces are nullified by incrementally adding injection nodes to administer fluid.

Genetic Optimization Module. To further enhance the microfluidic network design, a genetic optimization loop Moon et al. (2021) minimizes maximum fluid reflow distances during printing (Fig. 6). Algorithm 2 summarizes this module of our generative design tool. Specifically, we perform a genetic optimization routine, applied to the initial suction-offsetting network, to minimize the negative pressure during printing in all 2D cross sections of the 3D part. An optimal solution is one that minimizes negative pressure during printing, but multiple potential solutions are identified by the positions of one or multiple injection nodes. We mutate potential solutions by shuffling the injection nodes in a given cross section of the object, which can be delineated into a number of Voronoi regions organizing points according to their nearest fluid source, either an injection node containing all closest pixels or the part contour supplying fluid via

Algorithm 2. OPTIMIZE MICROFLUIDIC NETWORK
1: function optimize_network(microfluidic graph D)
2:   Compute current force F_current = ∫∫_{x,y} p dx dy with D
3:   for generations 1, 2, . . . , G do
4:     Mutate and crossover nodes d_1, d_2, . . . in D
5:     if F_mutated < F_current then
6:       Set best node positions D_opt = D_mutated
7:     end if
8:   end for
9:   return D_opt
10: end function


suction. Our design tool then recomputes for the user pressure fields to optimally alleviate negative pressure regions within the part.
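A toy sketch of such a mutate-and-select loop is shown below (illustrative only, not the authors' implementation); the fitness callable stands in for the surrogate pressure evaluation, and the crossover step of Algorithm 2 is omitted for brevity:

# Toy genetic-style optimization of injection-node positions for one slice,
# following the spirit of Algorithm 2 (illustrative only, not the authors' code).
import random

def optimize_network(nodes, candidate_sites, fitness, generations=20, seed=0):
    """nodes: initial, non-empty list of injection-node positions for a slice.
    candidate_sites: admissible positions inside the slice.
    fitness: callable returning a score to maximize (e.g. the integrated slice
    pressure, so that suction is minimized). Returns the best layout found."""
    rng = random.Random(seed)
    best, best_score = list(nodes), fitness(nodes)
    for _ in range(generations):
        mutated = list(best)
        k = rng.randrange(len(mutated))              # pick one node to move
        mutated[k] = rng.choice(candidate_sites)     # shuffle it to a new site
        score = fitness(mutated)
        if score > best_score:                       # keep the better layout
            best, best_score = mutated, score
    return best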


Fig. 6. Node positions in the initial generatively-designed network (a) are metaheuristically optimized to minimize the fluid reflow distance objective function until convergence (b), yielding a revised B-spline network better minimizing forces during printing (c).

Flow Rate Optimization Module. So that the designer need not guess intuitively what injection rates must be administered through the fluidic network during printing, we finally integrate a module into our design strategy that computes these rates for all slices. Specifically, the optimal flow rates to administer through the network via injection during printing are computed using the circuit analogy for pressure-driven microfluidics Oh et al. (2012). Importantly, this circuit analogy captures the evolving nature of the network. Branches may be open (with outlets), or they may become closed if their termini solidify within the object and thus no longer contribute to fluid flow. In our evolving graph, this corresponds to the case when a parent node in a given layer is assigned no child nodes in the next layer, so that the node is a leaf node and the corresponding branch terminates. In this case, the input injection flow is redirected to the remaining active branches. Algorithm 3 summarizes our approach to integrating such flow analysis into our design tool. In short, the input injection rate is incremented until the fluid suction forces at a particular layer of the part are offset (Fig. 7).


Algorithm 3. OPTIMIZE FLOW RATES
1: function optimize_flow_rates(object O, microfluidic network N)
2:   for slice s_1, s_2, . . . in O do
3:     Build circuit representation for N supplying slice s
4:     while Force > 0 do
5:       Increment pump flow rate q_in
6:       Recompute output flows q_out given q_in
7:       Recompute Force for s given q_out
8:     end while
9:   end for
10:   return pump program P
11: end function
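As a hedged illustration of the circuit analogy referenced above (not the authors' code; the resin viscosity and channel dimensions are assumed example values), each open channel branch can be treated as a hydraulic resistor, R = 8μL/(πr⁴), so that parallel open branches split the pump flow in proportion to their conductances, and pruned (solidified) branches are simply omitted:

# Illustrative sketch of the electric-circuit analogy for pressure-driven flow,
# not the authors' code: each open branch behaves like a resistor with hydraulic
# resistance R = 8*mu*L/(pi*r**4) (Hagen-Poiseuille), and parallel branches split
# the pump flow in inverse proportion to R. mu and the dimensions are assumptions.
import math

def hydraulic_resistance(length, radius, mu=0.35):
    """Hagen-Poiseuille resistance of a circular channel; mu in Pa*s (assumed resin)."""
    return 8.0 * mu * length / (math.pi * radius**4)

def split_flow(q_in, open_branches):
    """Distribute the pump flow q_in over parallel open branches.
    open_branches: list of (length, radius); closed branches are simply omitted,
    which mirrors the leaf-node pruning described in the text."""
    conductances = [1.0 / hydraulic_resistance(L, r) for L, r in open_branches]
    g_total = sum(conductances)
    return [q_in * g / g_total for g in conductances]

# Example: three open branches of equal length but different radii [m], q_in in m^3/s.
print(split_flow(q_in=20e-9, open_branches=[(0.05, 5e-4), (0.05, 5e-4), (0.08, 3e-4)]))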


Fig. 7. Flow rate optimization. Given a designer’s input CAD geometry with desired fabrication parameters (a), and generatively-designed microfluidic network (b), an optimal pump profile is computed using the circuit analogy for pressure-driven microfluidics (c) to supply resin to all regions of the printed object (d-e).

3.3 Injection 3D Printing Design Interface

To aid a user new to injection 3D printing, we develop a UI for our generative design tool. Our procedural modeling tool, implemented back-end in Python, is presented as an add-on plug-in to the 3D modeling program Rhinoceros 3D, and specifically its parametric design plug-in Grasshopper (GH). In short, our GH tool enables a Rhino3D CAD designer to design for injection 3D printing. The software imports both the input bounding CAD geometry and the generatively designed network, encoded as a connectivity matrix of nodes and edges. If desired, user modifications may be made via the custom interface. In particular, the user can adjust fluidic network geometry and hyperparameters such as channel radius and curvature prior to printing, using the GH post-processing script.


The final network is then translated into spline geometry and then 3D channel geometry to be imported into the Rhino3D design environment (Fig. 8).

Fig. 8. Our UI for Rhinoceros 3D, and specifically its Grasshopper parametric design interface, enables the designer to switch between the input 3D model for printing and the generatively-designed fluidic network to fabricate in parallel, if needed tuning channel orientations and other hyperparameters (interface panels: 3D modeling view; channel geometric, smoothness, and terminus parameters).

4 Conclusions and Future Directions

Here, we presented ongoing work on our novel 3D printing method, Injection 3D printing, and specifically its accompanying 3D generative design tool that allows users to perform this novel fabrication approach. Our method, with minimal changes to the hardware of existing and available resin printers, produces printable fluidic networks that reduce forces, and hence promises to alleviate the need for user unfriendly supports, during printing. It also opens up the potential for users to engage in fast multimaterial resin printing. Future applications-oriented work will investigate the ability of this extensible add-on to commercial resin 3D printers to fabricate highly interactive input devices, specifically with multimaterial latticed elastomers.


References
Arias, V., Bochkov, D., Gibou, F.: Poisson equations in irregular domains with Robin boundary conditions - solver with second-order accurate gradients. J. Comput. Phys. 365, 1–6 (2018)
Bai, H., Li, S., Shepherd, R.F.: Elastomeric haptic devices for virtual and augmented reality. Adv. Funct. Mater. 31(39), 2009364 (2021)
Cameron, A.: Basic Lubrication Theory. Longman, 195 p. (1971)
Everitt, A., Eady, A.K., Girouard, A.: Enabling multi-material 3D printing for designing and rapid prototyping of deformable and interactive wearables. In: 20th International Conference on Mobile and Ubiquitous Multimedia, pp. 1–11 (2021)
Gibson, I., Rosen, D., Stucker, B., Khorasani, M.: Design for additive manufacturing. In: Additive Manufacturing Technologies, pp. 555–607. Springer, Singapore. https://doi.org/10.1007/978-981-13-8281-9
Hudson, S.E.: Printing teddy bears: a technique for 3D printing of soft interactive objects. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 459–468 (2014)
Lipkowitz, G.: Injection continuous liquid interface production of 3D objects. Sci. Adv. 8 (2022)
Moheimani, R., Agarwal, M., Dalir, H.: 3D-printed flexible structures with embedded deformation/displacement sensing for the creative industries. In: AIAA Scitech 2021 Forum, 0534 (2021)
Moon, H., McGregor, D.J., Miljkovic, N., King, W.P.: Ultra-power-dense heat exchanger development through genetic algorithm design and additive manufacturing. Joule 5(11), 3045–3056 (2021)
Oh, K.W., Lee, K., Ahn, B., Furlani, E.P.: Design of pressure-driven microfluidic networks using electric circuit analogy. Lab Chip 12(3), 515–545 (2012)
Peng, H., Mankoff, J., Hudson, S.E., McCann, J.: A layered fabric 3D printer for soft interactive objects. In: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pp. 1789–1798 (2015)
Schmitz, M., et al.: Capricate: a fabrication pipeline to design and 3D print capacitive touch sensors for interactive objects. In: Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology, pp. 253–258 (2015)
Schmitz, M., Steimle, J., Huber, J., Dezfuli, N., Mühlhäuser, M.: Flexibles: deformation-aware 3D-printed tangibles for capacitive touchscreens. In: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pp. 1001–1014 (2017)
Thompson, M.K., et al.: Design for additive manufacturing: trends, opportunities, considerations, and constraints. CIRP Ann. 65(2), 737–760 (2016)
Wang, P., Sun, Y., Shi, X., Shen, X., Ning, H., Liu, H.: 3D printing of tissue engineering scaffolds: a focus on vascular regeneration. Bio-design Manuf. 4(2), 344–378 (2021)
Willis, K., Brockmeyer, E., Hudson, S., Poupyrev, I.: Printed optics: 3D printing of embedded optical elements for interactive devices. In: Proceedings of the 25th Annual ACM Symposium on User Interface Software and Technology, pp. 589–598 (2012)
Yao, X., Moon, S.K., Bi, G.: A hybrid machine learning approach for additive manufacturing design feature recommendation. Rapid Prototyp. J. (2017)
Zhang, Y., Wang, Z., Zhang, Y., Gomes, S., Bernard, A.: Bio-inspired generative design for support structure generation and optimization in Additive Manufacturing (AM). CIRP Ann. 69(1), 117–120 (2020)

Advanced Audio-Visual Multimodal Warnings for Drivers: Effect of Specificity and Lead Time on Effectiveness Shan Liu, Bohan Wu, Shu Ma, and Zhen Yang(B) Department of Psychology, Zhejiang Sci-Tech University, Hangzhou, China [email protected]

Abstract. Multimodal warning systems have been proved to be more effective than traditional unimodal driving warning systems, but few previous studies have systematically discussed the factors affecting the effectiveness of multimodal warnings. This study was designed to investigate the influence of the specificity of warning content and warning lead time on the effectiveness of audio-visual multimodal warning system based on indicators of drivers’ behavior performance and subjective evaluation. We conducted the experiment in a driving simulator with 37 participants, which included 8 common dangerous driving scenarios, with visual warnings provided via head-up display and accompanied by audio warnings provided via two speakers in front of the participants. Results showed that by using the combination of specific visual and general auditory warning content, drivers can obtain a fast braking response, remarkable TTC at the braking onset, and a low likelihood of collision, which can improve driving safety. Although drivers obtain a fast braking response with a late warning (lead time of 2.5 s) in audio-visual warning system, they feel excessive workload and insufficient understandability of warning content compared with those with an early warning (lead time of 4.5 s). Keywords: Multimodal warnings · Specificity of warning content · Warning lead time · Effectiveness of multimodal warnings

1 Introduction

Approximately 1.35 million people die each year as a result of road traffic crashes, which cost most countries 3% of their gross domestic product (World Health Organization [WHO], 2020). Human error is the main cause of most accidents [i.e., operational errors caused by driver behavior (or its absence)] [1]. Therefore, research efforts should be devoted to designing effective driving warning systems to avoid danger and improve driving performance. An efficient method to deliver warning messages to a driver is to implement multimodal driving warning systems, which present information using cues in more than one sensory modality. Many studies have demonstrated the effectiveness of multimodal warning over unimodal warning, as follows. (i) Multimodal warning systems can balance the mental workload of drivers while driving, which can help avoid the neglect


of warning information caused by excessive workload on a single sensory channel [2–4]. For example, Biondi et al. (2017) compared the effects of auditory, vibrotactile, and audio-tactile multimodal warnings in two scenarios (i.e., performing a concurrent cell phone conversation and driving in high-density traffic) and measured drivers' brake response times and subjective workload [2]. The results showed that the multimodal warning produced higher urgency ratings but no increase in frustration scores, and it obtained significantly shorter brake response times. (ii) In comparison with unimodal warning, multimodal warning can improve drivers' response time to warning information [5, 6]. Oskarsson et al. (2012) compared multimodal displays (i.e., uni-, bi-, and trimodal cueing) in a simulated combat vehicle [6]. The results showed that trimodal and bimodal displays composed of a visual head-up display (HUD) and 3-D audio enhanced perception and performance in a military threat scenario more than either display alone.

Although multimodal driving warning systems have been shown to perform better than unimodal warning systems, the parallelism of different sensory channels is not a mechanical superposition [7]. How to achieve good collocation and cooperation between channels to obtain the best warning effect should be the focus of research [8, 9]. Otherwise, the design of multimodal warning systems may not achieve the desired effect and may even impair driving performance. Some studies have shown that the effect of audio-tactile warning is significantly better than that of spatial auditory warning alone [10]. By contrast, Politis, Brewster, & Pollick (2015) found that the performance of audio-tactile warning in response recognition is worse than that of the combined visual-auditory warning, or even visual warning alone, and that no significant difference exists between audio-tactile warning and auditory warning alone in braking response [11]. Moreover, Spence (2010) found that multimodal warning may not be better than unimodal warning when perceptual load is low [12]. The influence of modality on the driving warning effect may be insignificant [13], and the shifting effect among modalities may slow down drivers' responses [14]. Therefore, increasing the number of modalities may not be sufficient to improve the warning effect, and the factors that affect the multimodal warning effect should be explored.

The effectiveness of warning information is affected by the form, content, and lead time of the warning [15]. The effectiveness of multimodal warning is likewise affected by warning content and lead time. Warning content can be either generic or specific: generic content refers to warning messages without semantic information (e.g., a visual icon composed of an exclamation point [16] or a repeated-pulse tone [17]), whereas specific content refers to warning messages containing certain semantic information (e.g., a visual icon indicating the type and location of the hazard [18] or a verbal message describing the moving direction of cross traffic [19]). Regarding the advantages of specific warning content, existing studies have not reached conclusive explanations. Some studies have indicated that direction-specific warning signals can trigger a faster driver response [20], whereas other studies hold the opposite conclusion, finding that specific signals are not beneficial for early warning and instead lead to more collisions during late warning [21].

Previous studies on the specificity of warning content have mostly concerned unimodal warning, with few addressing multimodal warning. In addition, researchers have found an interaction effect between the modalities and the specificity of the warning content presented [22]. They investigated the effects of modality (auditory vs. visual) and

Advanced Audio-Visual Multimodal Warnings

469

specificity (low vs. high) on warning effectiveness, and the results reveal that visual specific warning content is superior to the other three warning conditions in perceptual speed and driving behavior performance and that auditory specific warning content is deficient in driving performance, especially in braking reaction time. Politis et al. (2015) compared the differences between abstract and language-based warnings across all combinations of audio, visual, and tactile modalities [11]. The authors found that abstract warnings achieve faster recognition time in noncritical situations, whereas in critical situations, language-based and abstract warnings behave equally well. Although the study of Politis et al. thoroughly discussed the influence of the specificity of warning content (with or without semantic association to the signified event) on driving performance under various multimodal warnings (all combinations of visual, audio, and tactile), the interaction effects between modality and the specificity of warning content had not been fully explored. Therefore, whether differences exist in the specificity of warning content suitable for each modality in multimodal warning systems and how to match the specificity of warning contents and modalities to obtain the optimal warning effect are the focus of the current study. Warning lead time is a crucial parameter for driving warning systems, which has been studied excellently in unimodal warning systems but rarely in multimodal systems. Winkler, Werneke, & Vollrath (2016) evaluated the optimal warning lead time of visual warning in a two-stage warning system, ranging from 6 s to 30 s [23]. Wan, Wu, & Zhang (2016) investigated the driver performance with auditory warning lead time ranging from 0 s to 60 s [24]. The study found that early warning is better than late warning in reducing collisions and that the optimal lead time interval is 4.5–8 s. Similar results were found in Scott & Gray (2008) whose warning lead time is 3 s for late warning and 5 s for early warning [25]. Few previous studies have been conducted on multimodal warning systems. In view of cooperative perception, the optimal warning time frame needs to be greater than 1500 ms to effectively transmit the warning information to drivers [26]. Naujoks & Neukum (2014) further studied the effect of two-stage warning lead time while responding to audio-visual warning systems, which provided visual signals via HUD and accompanied by an unobtrusive acoustic signal [27]. They varied the lead time from 1 s to 5 s before the last warning stage into five steps for evaluation. The results showed that about 2 s before the hazards is efficient (the original value was converted to time-to-collision value, TTC), but drivers prefer the lead time between about 3 s or 5 s before the hazards. The present study focused on the effectiveness of audio-visual warning system in driving. Human cognitive information acquisition mainly relies on visual and auditory channels, of which the utilization rate of the visual channel reaches 70%, the auditory channel reaches 20%, and the remaining 10% comes from touch, smell, and taste [7]. Therefore, multimodal warnings combining auditory and visual have a wider application value and a lower technical cost. Furthermore, the warning message of the visual channel will be presented by HUD, and the auditory warning message will be presented in the form of speech. These warning methods have been proved to be effective in visual or auditory warning systems [28, 29]. 
Few previous studies have systematically examined the effects of warning lead time and the specificity of warning content on driving performance in the case of multimodal warnings. Therefore, this study investigates how the combination of warning content specificity and the warning lead time of an audio-visual multimodal warning system affect driver behavior and performance. A driving simulation experiment was conducted in the laboratory, including eight common dangerous driving scenarios to improve ecological validity. Drivers’ subjective evaluations of the audio-visual multimodal warning system and of the hazards were also recorded and discussed.

2 Method

2.1 Participants

A total of 37 participants (24 males and 13 females, mean age = 24.0 years) were recruited for the experiment. All participants held a valid Chinese driver’s license and had driving experience. All had normal or corrected-to-normal vision and were compensated with 25 yuan for taking part.

2.2 Apparatus and Stimuli

A driving simulator (STISIMDRIVE M100K, Systems Technology Inc., Hawthorne, CA, USA) was used in the experiment, equipped with a control system (Logitech G29 steering wheel and pedals, Newark, CA, USA), two computers, a 60-inch LED screen with 1920 × 1080 pixel resolution, and the facilities of the HUD system (see Fig. 1). Driving scenarios created in STISIMDRIVE were presented on the 60-inch LED screen, and the driver’s behavior data were transmitted over a serial crossover cable to a second computer in the lab. A VB.NET program received and processed the incoming data; the processed results were used to trigger the warning information (visual and audio) and for the subsequent analysis of driving behavior. The visual warnings were presented via the HUD system, whose construction was based on Min and Jung (2014) [30]. When the continuously computed TTC value decreased to the warning lead time set on the second computer, the visual warning picture was projected onto the windshield area of the screen so that participants could detect it. The visual warning content adopted the set of warning prototypes used by Schwarz and Fastenmeier (2017) [22], comprising generic and specific visual warnings: the specific visual warning was a picture with information about the type and location of the potential collision opponent, whereas the generic one carried no additional information. The speech warnings were presented via two speakers in front of the participants, using a synthetic female voice spoken at 200 Chinese characters per minute and a sound level of 70 dB. Background engine noise from the driving simulator was presented through another speaker in front of the participants at an average level of 50 dB. The speech warning content likewise included two types, generic and specific, namely “Danger! Attention, please!” and “Danger! Right front pedestrian crossing your lane!”, respectively.
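The warning-triggering logic described above (implemented by the authors in VB.NET) can be illustrated with a minimal Python sketch. The data structure, function names, and the constant-speed/constant-acceleration TTC formula are our own illustrative assumptions, not the authors' code; fire_visual and fire_speech stand in for whatever routines drive the HUD projector and the speakers.

import math
from dataclasses import dataclass

@dataclass
class State:
    gap: float        # longitudinal distance to the hazard (m)
    rel_speed: float  # closing speed, > 0 when the gap is shrinking (m/s)
    rel_accel: float  # closing acceleration (m/s^2)

def time_to_collision(s: State) -> float:
    """TTC assuming current speeds and accelerations stay unchanged."""
    if abs(s.rel_accel) < 1e-6:
        return s.gap / s.rel_speed if s.rel_speed > 0 else math.inf
    # Solve 0.5*a*t^2 + v*t - gap = 0 for the smallest positive root.
    disc = s.rel_speed ** 2 + 2 * s.rel_accel * s.gap
    if disc < 0:
        return math.inf
    roots = [(-s.rel_speed + r) / s.rel_accel for r in (math.sqrt(disc), -math.sqrt(disc))]
    positive = [t for t in roots if t > 0]
    return min(positive) if positive else math.inf

def maybe_fire_warning(s: State, lead_time: float, fire_visual, fire_speech) -> bool:
    """Trigger both modalities once TTC drops to the configured lead time (2.5 s or 4.5 s)."""
    if time_to_collision(s) <= lead_time:
        fire_visual()   # e.g., project the generic or specific icon via the HUD
        fire_speech()   # e.g., play the generic or specific speech message
        return True
    return False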


Driving scenarios and latent driving threats were adopted from our previous study [28], which consisted of eight scenarios with three hazard types (i.e., vehicle, pedestrian, or bicyclist) on two road types (i.e., straight road or crossroads; see Fig. 2). The scenarios were shown to the participants in random order during the experiment. Between the two driving scenarios with latent threats, we randomly set up a number of spurious stimuli, with similar scenarios but no threats presented, to prevent drivers from anticipating scenarios and threats.

Fig. 1. The STISIM driving simulator.

Fig. 2. Eight scenarios with three hazard types.


2.3 Experiment Design

A 2 × 4 within-subjects design was adopted, with warning lead time and the specificity of warning content as independent variables. The warning lead time was defined as the time to collision between the subject vehicle and the hazard (vehicle, pedestrian, or bicyclist), assuming the longitudinal speeds and accelerations of both remained unchanged, at which the warnings (visual and speech) were triggered. Two warning lead time levels were used: an early lead time of 4.5 s and a late lead time of 2.5 s. The early lead time (4.5 s) is a reasonable choice with respect to safety benefits and willingness to use [24]; the late lead time (2.5 s) is a widely accepted shortest time that still allows drivers to react to a warning before the collision [31]. The specificity of warning content had four levels, formed by combining the two modalities (visual and audio) with either no additional information (generic) or specific information about the type and moving direction of the danger (specific) on each modality (Fig. 3). For example, Vspecific Ageneric denotes a warning presented on both modalities, with specific information about the type and moving direction of the danger on the visual modality and no additional information on the audio modality. The eight resulting conditions were counterbalanced with a Latin square design to eliminate order effects. Similarly, to prevent drivers from anticipating scenarios and threats whenever a warning message appeared, 20 normal messages (e.g., traffic information, weather forecasts, and news) were presented at random during the experiment, with the same picture size and color as the visual warnings and the same speech rate and loudness as the audio warnings [29].

The dependent variables comprised behavioral and subjective evaluation indicators. The behavioral indicators were brake reaction time, TTC at the braking onset, and collision; a sketch of how the two timing indicators can be derived from logged data follows this section. Brake reaction time was defined as the time between the onset of the warnings and the driver first pressing the brake pedal. TTC at the braking onset was defined as the time to collision between the subject vehicle and the hazard, again assuming unchanged longitudinal speeds and accelerations, at the moment the driver first pressed the brake pedal. Collision indicates whether the subject vehicle collided with the hazard. The subjective evaluation indicators covered four dimensions, namely the Validity, Workload, Understandability, and Criticality of the warnings, each rated on a 15-point scale. The questions for the four dimensions were: “How effective do you feel about the warnings presented to you?”; “How do you feel about your reaction workload when the warning was presented to you?”; “How comprehensible do you feel about the warnings presented to you?”; and “How critical do you feel about the warnings and the hazards?”.
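As a minimal sketch of how the two timing-based behavioral indicators could be derived from logged trial data, the Python below follows the definitions above; the field names, thresholds, and helper functions are illustrative assumptions rather than the authors' processing pipeline (the 0.2 s lower bound and the lead-time upper bound mirror the validity criteria reported later in the Results).

from typing import Optional, Sequence

def brake_reaction_time(warning_onset: float,
                        brake_presses: Sequence[float],
                        lead_time: float) -> Optional[float]:
    """Time from warning onset to the first brake-pedal press.
    Returns None (treated as missing) when no press occurred, or when the
    press was implausibly early (< 0.2 s) or later than the set lead time."""
    after = [t for t in brake_presses if t >= warning_onset]
    if not after:
        return None
    rt = after[0] - warning_onset
    return rt if 0.2 <= rt <= lead_time else None

def ttc_at_braking_onset(gap_at_brake: float,
                         closing_speed_at_brake: float) -> Optional[float]:
    """TTC at the moment the brake is first pressed, assuming the
    longitudinal speeds of vehicle and hazard stay unchanged."""
    if closing_speed_at_brake <= 0:
        return None
    return gap_at_brake / closing_speed_at_brake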

Fig. 3. Diagrams of warning content specificity (visual content, generic icon vs. specific icon, crossed with audio content, “Danger! Attention, please!” vs. “Danger! Right front pedestrian crossing your lane!”).

2.4 Procedure

After arriving, the participants signed an informed consent form and provided demographic, driver’s license, and driving experience information. Next, the warning message examples used in the experiment were shown to the participants, who were asked to explain them in their own words to ensure that everyone accurately understood the meaning of the warnings. Subsequently, they completed a training block identical to the experiment block except for the order of the driving scenarios and the corresponding specificity of warning content. Participants operated the driving simulator with a speed limit of 44 mph, driving in the right lane. During the training block, they familiarized themselves with the operation of the simulator and the meaning of the warnings. After training, participants completed the experiment block, in which the eight driving scenarios and the corresponding warning content conditions appeared in different orders. Spurious scenarios without threats and 20 normal messages were also presented at random during the block. A four-lane arterial road with two lanes in each direction was used, with intersections and vehicles running in each direction. A 44-mph speed limit message was played to the participants whenever they exceeded the speed limit. After each threat trial, the driving simulator was paused, and the participants completed a questionnaire covering the four subjective evaluation dimensions (the Validity, Workload, Understandability, and Criticality of the warnings), each on a 15-point rating scale.

3 Results

Given that some data were missing (brake reaction time and TTC at the braking onset could not be obtained when a participant noticed a warning but did not brake, although such trials still represented meaningful driver responses), that the dependent variables included both continuous data (brake reaction time) and dichotomous data (collision or not), and that participants and scenarios contributed random effects on driving behavior, we used generalized linear mixed models (GLMMs) to analyze the behavioral indicators. For the subjective evaluation indicators, where no data were missing and the data met the requirements of repeated-measures ANOVA, we used two-way factorial 2 × 4 repeated-measures ANOVAs with warning lead time (2.5 s and 4.5 s) and specificity of warning content (Vgeneric Ageneric, Vspecific Aspecific, Vgeneric Aspecific, and Vspecific Ageneric) as within-subject factors.

3.1 Behavioral Indicators

We analyzed the behavioral indicators with GLMMs, using warning lead time and the specificity of warning content as independent variables. The GLMMs adopted a linear model for brake reaction time and TTC at the braking onset and a binary logistic regression within the GLMM framework for collision (a binary categorical variable: yes or no). For all three dependent variables, the warning lead time, the specificity of warning content, and their interaction were set as fixed effects; the participants and driving scenarios as random effects; and the repeated covariance structures of the fixed and random effects as diagonal and autoregressive moving average (1, 1), respectively.

Brake Reaction Time. On some occasions, participants reacted to the warnings only by releasing the gas pedal or turning the wheel without braking. In other cases, participants braked too early (before the warning onset or within 0.2 s) or too late (later than the set TTC of 2.5 s or 4.5 s). All of these cases render the brake reaction time invalid [16], so they were treated as missing values in the GLMM. Brake reaction time showed a significant main effect of warning lead time (p < 0.001) and a marginally significant effect of the specificity of warning content (p = 0.087; Table 1). Compared with the early lead time condition (4.5 s), participants braked significantly faster (p < 0.01) under the late lead time condition (2.5 s). Compared with the Vspecific Ageneric content, participants braked significantly more slowly (ps < 0.05) under the other three warning types (Table 2). The interaction between warning lead time and the specificity of warning content was not significant.
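Mixed-model analyses of this kind can be approximated with off-the-shelf routines. The sketch below uses Python and statsmodels as one possible tool, with assumed column names; it fits a linear mixed model for brake reaction time with participant and scenario random terms, and notes where the binary collision outcome would need a logistic mixed model instead. It does not reproduce the exact diagonal / ARMA(1, 1) covariance structures the authors report, so it is an approximation of the analysis, not a re-implementation of it.

import pandas as pd
import statsmodels.formula.api as smf

# Assumed long-format file: one row per trial with columns
#   participant, scenario, lead_time ('2.5s'/'4.5s'),
#   specificity (4 levels), brake_rt (NaN if invalid), collision (0/1)
df = pd.read_csv("trials.csv")
dfx = df.dropna(subset=["brake_rt"])

# Linear mixed model: fixed effects for lead time, specificity and their
# interaction; random intercept per participant plus a variance component
# for driving scenario.
lmm = smf.mixedlm(
    "brake_rt ~ C(lead_time) * C(specificity)",
    data=dfx,
    groups=dfx["participant"],
    vc_formula={"scenario": "0 + C(scenario)"},
)
print(lmm.fit().summary())

# Collision is dichotomous, so it calls for a logistic mixed model with the
# same fixed and random effects (e.g., statsmodels' BinomialBayesMixedGLM
# or an equivalent glmer-style routine).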


TTC at the Braking Onset. TTC at the braking onset showed a significant main effect of warning lead time (p < 0.001) but no significant main effect of the specificity of warning content, and the interaction was not significant (Table 1). Compared with the early lead time condition (4.5 s), participants showed a significantly shorter TTC at the braking onset (p < 0.001) under the late lead time condition (2.5 s). Compared with the Vspecific Ageneric content, TTC at the braking onset was significantly shorter (p < 0.05) under the Vspecific Aspecific content (Table 2).

Collision. The GLMM for collision showed significant main effects of the specificity of warning content (p < 0.001) and warning lead time (p < 0.001) and a significant interaction (p < 0.001; Table 1). Participants were more likely to collide under the Vgeneric Aspecific content than under the Vspecific Ageneric content when the warning lead time was 2.5 s (p < 0.001; Table 2). Overall, the number of collisions was low: 24 collisions were recorded across 296 safety-critical events, a collision rate of approximately 8.1%. Participants could effectively avoid collisions when warnings were presented, and collision avoidance was best under the Vspecific Ageneric content.

Table 1. Effects of warning lead time and specificity of warning content on brake reaction time, TTC at braking onset and collision.

Parameter                                               F        df1   df2   p
Brake reaction time
  Warning lead time                                     54.13    1     269   .000***
  Specificity of warning content                        2.22     3     269   .087
  Warning lead time * Specificity of warning content    1.58     3     269   .194
TTC at braking onset
  Warning lead time                                     3581.20  1     268   .000***
  Specificity of warning content                        1.49     3     268   .217
  Warning lead time * Specificity of warning content    0.57     3     268   .636
Collision
  Warning lead time                                     51.44    1     288   .000***
  Specificity of warning content                        63.69    3     288   .000***
  Warning lead time * Specificity of warning content    66.82    3     288   .000***


Table 2. Comparison of warning lead time and specificity of warning content on brake reaction time, TTC at braking onset and collision.

Parameter                                       Mean   SE     B       t        p
Brake reaction time
  Warning lead time
    2.5 s                                       1.07   0.05   −0.12   −2.59    .010**
    4.5 s (reference)                           1.28   –      –       –        –
  Specificity of warning content
    Vgeneric Ageneric                           1.20   0.07   0.14    2.07     .039*
    Vspecific Aspecific                         1.19   0.05   0.14    2.54     .012*
    Vgeneric Aspecific                          1.19   0.06   0.13    2.12     .035*
    Vspecific Ageneric (reference)              1.12   –      –       –        –
TTC at braking onset
  Warning lead time
    2.5 s                                       1.39   0.06   −1.88   −33.89   .000***
    4.5 s (reference)                           3.23   –      –       –        –
  Specificity of warning content
    Vgeneric Ageneric                           2.28   0.07   −0.10   –        –
    Vspecific Aspecific                         2.29   0.06   −0.13   −2.22    .027*
    Vgeneric Aspecific                          2.36   0.06   −0.06   –        –
    Vspecific Ageneric (reference)              2.40   –      –       –        –
Collision
  Warning lead time
    2.5 s                                       0.10   0.81   −0.32   −0.39    .696
    4.5 s (reference)                           0.06   –      –       –        –
  Specificity of warning content
    Vgeneric Ageneric                           0.08   0.96   0.43    0.45     .650
    Vspecific Aspecific                         0.09   0.81   −0.32   −0.39    .696
    Vgeneric Aspecific                          0.09   0.61   10.14   16.59    .000***
    Vspecific Ageneric (reference)              0.05   –      –       –        –
  Warning lead time * Specificity of warning content
    2.5 s * Vgeneric Ageneric                   0.11   1.22   −0.43   −0.36    .722
    2.5 s * Vspecific Aspecific                 0.09   1.15   0.64    0.55     .581
    2.5 s * Vgeneric Aspecific                  0.13   0.98   10.14   10.40    .000***
    2.5 s * Vspecific Ageneric (reference)      0.09   –      –       –        –

3.2 Subjective Evaluation

We used two-way factorial 2 × 4 repeated-measures ANOVAs, with warning lead time and specificity of warning content as independent variables and the subjective evaluations of the Validity, Workload, Understandability, and Criticality of the warnings as dependent variables (Table 3).
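As one way to realize this analysis, the sketch below runs a 2 × 4 repeated-measures ANOVA per rating dimension with statsmodels' AnovaRM; the long-format column names are illustrative assumptions, and AnovaRM expects exactly one observation per participant and cell (here, ratings averaged over scenarios within each condition).

import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Assumed long format: one row per participant x lead_time x specificity cell,
# with each rating averaged over the scenarios belonging to that cell.
ratings = pd.read_csv("subjective_ratings.csv")

for dv in ["validity", "workload", "understandability", "criticality"]:
    res = AnovaRM(data=ratings, depvar=dv, subject="participant",
                  within=["lead_time", "specificity"]).fit()
    print(dv, res.anova_table, sep="\n")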


Table 3. Effects of warning lead time and specificity of warning content on validity, workload, understandability, and criticality.

Parameter                                               F       df1   df2   p         η²
Validity
  Warning lead time                                     27.50   1     36    .000***   .433
  Specificity of warning content                        1.61    3     34    .206      .124
  Warning lead time * Specificity of warning content    3.10    3     34    .039*     .215
Workload
  Warning lead time                                     29.20   1     36    .000***   .448
  Specificity of warning content                        3.31    3     34    .032*     .226
  Warning lead time * Specificity of warning content    3.21    3     34    .035*     .221
Understandability
  Warning lead time                                     7.12    1     36    .011*     .165
  Specificity of warning content                        7.15    3     34    .001**    .387
  Warning lead time * Specificity of warning content    7.15    3     34    .001**    .387
Criticality
  Warning lead time                                     26.53   1     36    .000***   .424
  Specificity of warning content                        0.28    3     34    .837      .024
  Warning lead time * Specificity of warning content    0.41    3     34    .746      .035

Validity. The main effect of warning lead time was significant (p < 0.001): the Validity score under the early warning condition (4.5 s) was higher than under the late warning condition (2.5 s) (p < 0.001). The interaction between warning lead time and specificity of warning content was also significant (p < 0.05). Under the early warning condition (4.5 s), the Vgeneric Ageneric content scored significantly lower on Validity than the other three contents (ps < 0.05; Fig. 4).

Workload. Warning lead time (p < 0.001) and the specificity of warning content (p < 0.05) both had significant main effects on the Workload ratings. The Workload score under the early warning condition (4.5 s) was lower than under the late warning condition (2.5 s; p < 0.001), and the Vgeneric Ageneric content received higher Workload scores than the Vspecific Ageneric (p < 0.01) and Vspecific Aspecific (p < 0.05) contents. Moreover, the interaction between warning lead time and specificity of warning content was significant (p < 0.05). Under the early warning condition (4.5 s), the Vgeneric Ageneric content scored significantly higher on Workload than the other three (ps < 0.05). Under the late warning condition (2.5 s), the Vspecific Ageneric content had a lower Workload score than the Vgeneric Aspecific content (p < 0.05; Fig. 4).


Fig. 4. Effects of warning lead time and specificity of warning content on validity, workload, understandability, and criticality (four panels showing mean ratings on the 1–15 scale for the four content combinations under the 2.5 s and 4.5 s lead times).

Understandability. The Understandability ratings showed significant main effects of warning lead time (p < 0.05) and the specificity of warning content (p < 0.01), and the interaction between warning lead time and specificity of warning content was also significant (p < 0.01). The Understandability score under the early warning condition (4.5 s) was higher than under the late warning condition (2.5 s; p < 0.05). The Vgeneric Ageneric content showed the lowest Understandability scores (ps < 0.01), and the Vspecific Ageneric content scored higher than Vgeneric Ageneric (p < 0.01) but lower than Vgeneric Aspecific (p < 0.01) and Vspecific Aspecific (p < 0.01). Under the early warning condition (4.5 s), the order of Understandability scores across the content conditions, from smallest to largest, was: Vgeneric Ageneric < Vspecific Ageneric < Vgeneric Aspecific = Vspecific Aspecific. Under the late warning condition (2.5 s), the order, from smallest to largest, was: Vgeneric Ageneric = Vspecific Ageneric < Vspecific Aspecific (Fig. 4).

Criticality. Only warning lead time affected Criticality significantly (p < 0.001); neither the main effect of the specificity of warning content nor the interaction was significant. The Criticality score under the early warning condition (4.5 s) was lower than under the late warning condition (2.5 s; p < 0.001) (Fig. 4).

4 Discussion

This study investigated the effects of the specificity of warning content and warning lead time on driver behavior in an audio-visual multimodal warning system. These factors have been studied in unimodal warning systems, but their influence on multimodal driving warning systems has rarely been examined. Using behavioral and subjective evaluation indicators, this study systematically examined the optimal specificity of audio-visual multimodal warning content and the optimal warning lead time in a driving simulation environment. The results showed that among the four specificity combinations of audio-visual warning content, Vspecific Ageneric (specific content for vision, generic content for hearing) was the optimal type with respect to drivers’ behavioral performance (brake reaction time, TTC at the braking onset, and likelihood of collision) and subjective evaluation. In addition, compared with the 4.5 s lead time, drivers’ performance under the 2.5 s lead time was slightly better but broadly similar (faster braking response, no difference in collision probability). The subjective evaluations, however, showed that drivers preferred the 4.5 s lead time to the 2.5 s lead time.

The Vspecific Ageneric content was significantly better than the other three content types on the behavioral indicators, with faster braking responses, a larger TTC at the braking onset, and a lower likelihood of collision. Thus, with Vspecific Ageneric audio-visual warnings, drivers can respond to hazards more quickly and effectively, improving driving safety. Furthermore, regarding the effect of warning content on subjective ratings, Vspecific Ageneric showed an advantage in lower Workload and higher Understandability scores than Vgeneric Ageneric (the lowest level of specificity). These results are consistent with previous studies, which provided evidence for the benefits of warning signals with specific content (the type, location, or motion direction of hazards) on driving behavior [32–34]. This has been explained by a model of warning information processing [35]: warning information with high specificity improves drivers’ understanding of the cause of the warning and speeds the shift of their attention toward the hazard. Conversely, generic warning content may hinder drivers’ understanding of the cause of the warning and slow their responses [36].

However, one inconsistency was observed between our results and this warning information processing model [35]: Vgeneric Aspecific and Vspecific Aspecific obtained higher Understandability scores yet poorer behavioral performance than Vspecific Ageneric. In the Vspecific Ageneric condition, drivers obtained the fastest braking response, a higher TTC at the braking onset, and a lower likelihood of collision. The warning information processing model and previous studies on content specificity were mostly concerned with unimodal rather than audio-visual multimodal warnings, and the effectiveness of multimodal warnings may depend on the modalities involved; hence, the warning content appropriate for the visual and auditory modalities may differ. Previous studies have found that when processing warning messages, information transmission through hearing is significantly less efficient than through vision (for the same information, speech takes 5 s, an icon 1.8 s, and text 3.6 s) [37]. In addition, empirical studies of driving warnings have shown that the visual channel is better suited to presenting specific information (such as the type or location of hazards), whereas the warning effect is poor when the auditory channel presents specific information [37]. Therefore, in an audio-visual multimodal warning system, warning effectiveness does not necessarily improve as Understandability scores increase; the best driving behavior together with good subjective evaluations can be obtained with the Vspecific Ageneric form, that is, specific content for vision and generic content for hearing.

Regarding the effect of warning lead time on driving performance and subjective ratings, compared with the early warning (4.5 s), driving performance under the late warning (2.5 s) was slightly better, with faster braking responses but a similar likelihood of collision, and collisions were rare throughout the experiment (about 8%). However, drivers subjectively preferred the early warning (4.5 s), reporting higher validity, lower workload, lower criticality, and better understanding of the warning content. Some inconsistencies were observed between these results and previous research on unimodal warnings. First, the optimal lead time for driving behavior was shorter for audio-visual multimodal warnings (2.5 s) than for unimodal warnings (3–6 s) [25]. Second, the early warning (4.5 s) did not show a significant advantage over the late warning (2.5 s) in driving performance in our study [24]. These inconsistencies may reflect the fact that multimodal warnings improve drivers’ response speed: a 2.5 s lead time may be too short for a unimodal warning but long enough for an audio-visual multimodal warning, allowing drivers to respond to the danger in time. Previous work on audio-visual multimodal warnings has likewise found that providing warnings at least 1 s before the last possible warning moment significantly improves driving performance and subjective evaluation, whereas drivers subjectively prefer warnings 2 s or 3 s before the last possible moment [27]. Given that the last possible warning moment there was around 1.4–1.8 s, the effect of warning lead time after conversion is consistent with ours.
As for the interaction between warning lead time and the specificity of warning content, our study showed that under the late warning (2.5 s), the workload of responding to the Vspecific Ageneric content was the lowest, which resulted in the lowest likelihood of collision. Notably, however, under the early warning (4.5 s) the workload of responding to Vspecific Ageneric was significantly lower than that for Vgeneric Ageneric, whereas under the late warning (2.5 s) no such difference was found and the workload for Vspecific Ageneric remained relatively high. Given that a warning with only generic content makes the risk event harder to understand and thereby increases workload [33, 36], the likely cause is that drivers could not fully process the Vspecific Ageneric content under the late warning (2.5 s) and reacted to it much as they did to Vgeneric Ageneric. This explanation is also supported by the Understandability scores for the late (2.5 s) and early (4.5 s) warnings (only under the late warning was there no difference between Vspecific Ageneric and Vgeneric Ageneric). Therefore, Vspecific Ageneric content can effectively improve driving safety, but the improvement is more effective with the early warning (4.5 s), because a warning with specific content risks not being fully understood under the late warning (2.5 s).

In general, for the design of audio-visual multimodal driving warning systems, we suggest the following. First, regarding warning content, high specificity is more conducive to drivers’ responses to danger, but it is not a case of “the higher, the better”: specific warning content is more suitable for visual presentation, whereas generic content is more suitable for auditory presentation. Second, drivers achieve a faster braking response with a late warning (2.5 s); however, if conditions permit, an early warning (4.5 s) yields better subjective evaluations with equally good behavioral performance and thus carries a lower risk in actual driving.

Although this study was carefully prepared, it has some limitations. Although we set up eight dangerous driving scenarios and inserted normal messages (e.g., advertising and weather forecasts) in a form similar to the audio-visual warning messages to improve the ecological validity of the driving simulation, future studies should examine actual driving situations. In addition, this study evaluated the effects of lead time and content specificity only for audio-visual warning systems whose auditory warnings took the form of speech; the usability of other auditory warning forms, such as auditory icons and earcons, in audio-visual warning systems must be investigated further. Moreover, two relatively fixed lead times were selected on the basis of previous studies, yet the time required to process different warning contents (visual and auditory) may differ: compared with the early warning (lead time of 4.5 s), drivers felt they understood the warning content less well with the late warning (lead time of 2.5 s). Measuring the specific duration needed to fully understand each type of warning would provide further insight for the design of in-vehicle warning systems.

5 Conclusion

With the development of advanced driver assistance systems and car2x technology, advanced driver warning systems will be deployed in real driving in the near future. Multimodal warning systems have been shown to be more effective than traditional unimodal warning systems, but few previous studies have systematically examined the factors affecting the effectiveness of multimodal warnings. This study focused on the effects of the specificity of warning content combinations and the warning lead time of an audio-visual multimodal warning system on drivers’ behavioral performance and subjective evaluation. The results revealed that with the combination of specific visual and generic auditory warning content, drivers obtain a fast braking response, a large TTC at the braking onset, and a low likelihood of collision, improving driving safety. Although drivers achieve a faster braking response with the late warning (lead time of 2.5 s) in an audio-visual warning system, they report higher workload and poorer understanding of the warning content than with the early warning (lead time of 4.5 s).

References 1. Hollnagel, E.: Barriers and Accident Prevention: Barriers and Accident Prevention. Routledge, Milton Park (2016) 2. Biondi, F., Strayer, D.L., Rossi, R., Gastaldi, M., Mulatti, C.: Advanced driver assistance systems: using multimodal redundant warnings to enhance road safety. Appl. Ergon. 58, 238–244 (2017) 3. Liu, Y.C.: Comparative study of the effects of auditory, visual and multimodality displays on drivers’ performance in advanced traveller information systems. Ergonomics 44(4), 425–442 (2001) 4. Spence, C., Ho, C.: Tactile and multisensory spatial warning signals for drivers. IEEE Trans. Haptics 1(2), 121–129 (2008) 5. Haas, E.C., Erp, J.V.: Multimodal warnings to enhance risk communication and safety. Saf. Sci. 61, 29–35 (2014) 6. Oskarsson, P.A., Eriksson, L., Carlander, O.: Enhanced perception and performance by multimodal threat cueing in simulated combat vehicle. Hum. Factors 54(1), 122 (2012) 7. Xi, W., Zhou, L., Chen, H., Ma, J., Chen, Y.: Research on the brain mechanism of visualaudio interface channel modes affecting user cognition. In: Ayaz, H., Mazur, L. (eds.) AHFE 2018. AISC, vol. 775, pp. 196–204. Springer, Cham (2019). https://doi.org/10.1007/978-3319-94866-9_20 8. Hands, G.L., Larson, E., Stepp, C.E.: Effects of augmentative visual training on audio-motor mapping. Hum. Mov. Sci. 35, 145–155 (2014) 9. Cowgill Jr, J.L., Gilkey, R.H., Simpson, B.D.: The VERITAS facility: a virtual environment platform for human performance research. IFAC Proc. Volumes 46(15), 357–362 (2013) 10. Haas, E., Stachowiak, C.: Multimodal displays to enhance human robot interaction on-themove. In: Proceedings of the 2007 Workshop on Performance Metrics for Intelligent Systems (PerMIS 2007), pp. 135–140. Association for Computing Machinery, New York (2007) 11. Politis, I., Brewster, S., Pollick, F.: To beep or not to beep?: comparing abstract versus language-based multimodal driver displays. In: Proceedings of the ACM Conference on Human Factors in Computing Systems (2015) 12. Spence, C.: Crossmodal spatial attention. Ann. N. Y. Acad. Sci. 1191, 182–200 (2010) 13. Lee, J.D., Hoffman, J.D., Hayes, E.: Collision warning design to mitigate driver distraction. In: Proceedings of the Conference on Human Factors in Computing Systems (2004) 14. Sarter, N.B.: Multimodal information presentation: design guidance and research challenges. Int. J. Ind. Ergon. 36(5), 439–445 (2006) 15. Xiang, W., Yan, X., Weng, J., Li, X.: Effect of auditory in-vehicle warning information on drivers’ brake response time to red-light running vehicles during collision avoidance. Transp. Res. Part F Traffic Psychol. Behav. 40, 56–67 (2016) 16. Winkler, S., Kazazi, J., Vollrath, M.: How to warn drivers in various safety-critical situations – Different strategies, different reactions. Accid. Anal. Prevent. 117, 410–426 (2018) 17. Politis, I., Brewster, S.A., Pollick, F.: Evaluating multimodal driver displays under varying situational urgency. In: ACM (2014)


18. Zarife, R.: Integrative Warning Concept for Multiple Driver Assistance Systems. Doctoral dissertation, Universität Würzburg (2014) 19. Zhang, Y., Yan, X., Zhuo, Y.: Discrimination of effects between directional and nondirectional information of auditory warning on driving behavior. Discrete Dyn. Nat. Soc.,1–7 (2015) 20. Spence, C., Ho, C.: Multisensory warning signals for event perception and safe driving. Theoret. Issues Ergon. Sci. 9, 523–554 (2008). https://doi.org/10.1080/14639220701816765 21. Yan, X., Xue, Q., Ma, L., Xu, Y.: Driving-simulator-based test on the effectiveness of auditory red-light running vehicle warning system based on time-to-collision sensor. Sensors 14(2), 3631–3651 (2014) 22. Schwarz, F., Fastenmeier, W.: Augmented reality warnings in vehicles: EFFECTS of modality and specificity on effectiveness - ScienceDirect. Accid. Anal. Prevent. 101, 55–66 (2017) 23. Winkler, S., Werneke, J., Vollrath, M.: Timing of early warning stages in a multi stage collision warning system: drivers’ evaluation depending on situational influences. Transp. Res. Part F Psychol. Behav. 36, 57–68 (2016) 24. Wan, J., Wu, C., Zhang, Y.: Effects of lead time of verbal collision warning messages on driving behavior in connected vehicle settings. J. Saf. Res. 58, 89–98 (2016) 25. Scott, J.J., Gray, R.: A comparison of tactile, visual, and auditory warnings for rear-end collision prevention in simulated driving. Hum. Factors J. Hum. Factors Ergon. Soc. 50(2), 264–275 (2008) 26. Neukum, A., Lübbeke, T., Krüger, H. P., Mayser, C., Steinle, J.: ACC-Stop&Go: Fahrerverhalten an funktionalen Systemgrenzen. In: 5. Workshop Fahrerassistenzsysteme - FAS 2008, pp. 141–150 (2008) 27. Naujoks, F., Neukum, A.: Timing of in-vehicle advisory warnings based on cooperative perception. In: Human Factors and Ergonomics Society Europe Chapter Annual Meeting (2014) 28. Yang, Z., et al.: Effect of warning graphics location on driving performance: an eye movement study. Int. J. Hum.–Comput. Interact. 36(12), 1150–1160 (2020) 29. Zhang, Y., Wu, C., Qiao, C., Hou, Y.: The effects of warning characteristics on driver behavior in connected vehicles systems with missed warnings. Accid. Anal. Prevent. 124, 138–145 (2019) 30. Min, W.P., Jung, S.K.: A projector-based full windshield HUD simulator to evaluate the visualization methods. In: The Sixth IEEE International Conference on Ubiquitous and Future Networks (ICUFN 2014) (2014) 31. Yan, X., Zhang, Y., Lu, M.: The influence of in-vehicle speech warning timing on drivers’ collision avoidance performance at signalized intersections. Transp. Res. Part C 51, 231–242 (2015) 32. Ho, C., Spence, C.: Assessing the effectiveness of various auditory cues in capturing a driver’s visual attention. J. Exp. Psychol. Appl. 11(3), 157 (2005) 33. Ho, C., Tan, H.Z., Spence, C.: The differential effect of vibrotactile and auditory cues on visual spatial attention. Ergonomics 49(7), 724–738 (2006) 34. Roswarski, T.E., Proctor, R.W.: Auditory stimulus-response compatibility: Is there a contribution of stimulus-hand correspondence? Psychol. Res. 63(2), 148–158 (2000) 35. Wogalter, M.S.: Communication-Human Information Processing (C-HIP) Model. Handbook of Warnings, pp. 51–61(2006) 36. Lee, J.D., Mcgehee, D.V., Brown, T.L., Reyes, M.L.: Collision warning timing, driver distraction, and driver response to imminent rear-end collisions in a high-fidelity driving simulator. Hum. Factors J. Hum. Factors Ergon. Soc. 44(2), 314–334 (2002) 37. 
Cao, Y., Castronovo, S., Mahr, A., Müller, C.: On timing and modality choice with local danger warnings for drivers. In: International Conference on Automotive User Interfaces & Interactive Vehicular Applications (2009)


38. Cao, Y., Mahr, A., Castronovo, S., Theune, M., Müller, C.: Local danger warnings for drivers: the effect of modality and level of assistance on driver reaction. In: Proceedings of the 2010 International Conference on Intelligent User Interfaces, 7–10 February, 2010, Hong Kong, China (2010)

Which Position is Better? A Survey on User Preference for Ambient Light Positions

Xinze Liu1,2, Haihang Zhang3, Xinyu Pan1, Haidong Liu3, and Yan Ge1,2(B)

1 CAS Key Laboratory of Behavioural Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China
[email protected]
2 Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
3 Chongqing Changan Automobile Co., Ltd., Chongqing, China

Abstract. With the development of intelligent automotive products, in-vehicle ambient lighting has received increasing attention, as shown by the growing number of models on the market with in-vehicle ambient lighting installed and the growing range of ambient lighting installation positions. Currently, there is no industry guideline for the appropriate position of ambient lights. Therefore, this study investigated user preference for 15 ambient light positions and explored whether driving experience, demographic variables, and ambient light usage experience influence user preference for ambient light positions. This study used two indicators to measure users’ preference for ambient light positions, aesthetic evaluation and favourite ranking. The results of 479 valid questionnaires showed that users’ preference for 15 ambient light positions can be divided into three categories. The ambient light positions with high preference were those surrounding the centre screen, armrest and centre screen bottom. Users had a low preference for the upper front windshield, steering wheel, bottom of left and right windows, A-pillar and rear view mirror. It was also found that users’ aesthetic evaluation of each position was influenced by gender and experience with ambient lighting. Male drivers, users with ambient lights currently installed in their vehicles and users with more frequent use of ambient lights showed a higher aesthetic evaluation for each position, indicating a more open attitude toward ambient lighting. The results of this study can provide a reference for the layout of in-vehicle ambient lighting, help relevant practitioners determine in-vehicle ambient light positions that better meet user needs, and suggest that ambient light layouts should be designed to better match the characteristics of consumer groups. Keywords: In-vehicle Ambient Light Position · Ambient Light layout · User Experience

1 Introduction

1.1 Research Background

In-vehicle ambient lighting is a kind of interior lighting used to bring out the ambience in cars (Wang 2019), often made of flexible, bright, diffused light-conducting LED fibres (Sun 2017). Over the past decade, there has been a significant increase in the popularity of in-vehicle ambient light. As a global free-access online questionnaire showed, nearly 79% of users wanted an in-vehicle lighting system installed in their new cars (Weirich et al. 2022). In line with this, the market has seen a significant increase in the number of models equipped with in-vehicle ambient lighting in recent years, covering most price ranges. However, these models feature a wide range of ambient light positions, some of which are not welcomed by users and can even create safety hazards. Since there are no detailed industry specifications to guide the design of in-vehicle ambient light positions, practitioners and UX researchers need to explore which ambient light positions are preferred from a user perspective. Therefore, in this study, a questionnaire was administered to investigate users’ preferences for 15 in-vehicle ambient light positions, and two indicators were chosen to reflect user preference: aesthetic evaluation and favourite ranking.

1.2 Focusing on User Needs: Related to Perceptions and Emotions

In-vehicle ambient lighting plays not only a decorative role but also a functional role in the product experience. Many studies have found an effect of lighting on perceptions (Houser and Tiller 2003; Veitch et al. 2008), emotions (Greule 2007), and mood (Küller et al. 2006; McCloughan et al. 1999). In terms of perception, in-vehicle ambient lighting benefits space perception and perceived functionality, even at low illumination (Caberletti et al. 2010). Moreover, the overall perception of the car interior is improved through the use of ambient lighting while driving. Different ambient light positions also affect users’ subjective perception: ambient lighting in the doors increases the perception of luxury and improves the perception of the interior lighting environment (Weirich et al. 2022). In terms of emotions and mood, previous research has shown that ambient lighting, although peripheral in the driver’s field of view, has several significant positive influences on interior attractiveness, perceived safety, and perceived interior quality (Caberletti et al. 2010). Different in-vehicle ambient light positions also have different effects on users’ psychological feelings: ambient lighting in the footboard area is perceived as cheaper, less comfortable, and less pleasant, worsening the perception of the whole car interior (Caberletti et al. 2010), whereas another study found that the more peripheral the ambient lighting, the higher the driver’s comfort level (Schellinger et al. 2006). Considering the effect of in-vehicle ambient light position on driver perception and emotion, it is necessary to explore differences in customers’ aesthetic evaluations and favourite rankings for ambient lighting in different positions, to provide meaningful information about customer preference for the design of human-centric ambient lighting products. Aesthetic evaluation and favourite ranking obtained through subjective reporting not only directly reflect the user’s aesthetic preferences but also indirectly reflect the user’s psychological state, specifically perception and emotion, which plays an important role in reflecting user needs (e.g., satisfaction, comfort, and performance) and in designing human-centred lighting (HCL) (Wolska et al. 2020).


1.3 Focusing on User Personalization: Aesthetic Preference

When designing products, it is necessary to pay attention not only to design aesthetics but also to the characteristics of the target user group. The design of in-vehicle ambient lighting is gradually paying attention to the needs of different user groups, and some personalized designs have emerged. For example, to meet users’ expectations of intelligent in-vehicle ambient lighting, functions such as interacting with the rhythm of the music and switching colours according to cold or warm air conditioning have gradually been added. To meet users’ pursuit of personalization, Rolls-Royce’s starlight roof can be customized with ambient light layouts in different constellation styles. However, few studies have explored whether users with different characteristics differ in their preferences for ambient light positions. Weirich et al. (2022) used a questionnaire to examine user preferences for nine positions. In manual driving, users showed similar position-preference tendencies across gender and origin, with the highest preference for lighting in the door and floorboard areas, followed by the centre and screen area, and the lowest preference for the steering wheel; in autonomous driving, users’ preference increased for all positions except the door and floorboard (Weirich et al. 2022). There are no research findings on whether factors such as experience with in-vehicle ambient lighting and driving experience affect user preference for ambient light positions. Currently, car models with ambient lighting cover most of the areas where ambient lights can be installed. An analysis of the in-vehicle ambient light layouts of five common car models on the market (Audi A8; Mercedes-Benz S-Class and E-Class; BMW 5 Series and 7 Series) found that strip-like ambient lights were mainly installed in the door control light area, the door interior panel armrest area, the centre console side trim area, the instrument panel table area, the rear seat top side trim area, the sunroof area, the roof area and the front and rear footrest areas, while dot-like ambient lights were mainly placed in the armrest area of the front and rear door trim panels, the storage area and the front and rear footrest areas (Wan and Huang 2018). However, users differ in their position preferences, which means that covering more positions with ambient lighting is not necessarily better. Because user interests, aesthetic preferences, and related factors have not been investigated in depth, there is an urgent need to fill this research gap on user personalization for in-vehicle ambient lighting.

1.4 The Purpose of the Present Study

The purpose of this study is to explore users’ preferences for different ambient light positions and to investigate whether users’ personal characteristics influence these preferences. Fifteen common or emerging ambient light positions were selected, and users’ demographic information, driving information, and in-vehicle ambient light usage information were collected. Two indicators were chosen to reflect user preference: aesthetic evaluation (whether the ambient light in that position is aesthetically pleasing) and favourite ranking (the preferred order of the different ambient light positions).


2 Method

2.1 Participants

In this study, a total of 642 participants were recruited from the online survey platform Sojump (www.wjx.com), all of whom held driver’s licences. After excluding those who failed the attention-check questions or did not answer the questionnaire as required, 479 participants were included in the analysis. A total of 323 (67.43%) participants were male and 156 (32.57%) were female. In all, 66.39% of participants were 18–30 years old and 29.44% were 31–40 years old. The participants’ average driving experience was 5.21 years (SD = 4.65), and the average driving mileage was 7.60 million km (SD = 10.28). Detailed demographic information is shown in Table 1.

Table 1. Demographic characteristics of participants, N = 479.

                                        Number of samples   Percentage (%)
Gender
  Male                                  323                 67.43
  Female                                156                 32.57
Age range
  18–25                                 137                 28.60
  26–30                                 181                 37.79
  31–40                                 141                 29.44
  41–50                                 19                  3.97
  Over 51                               1                   0.21
Level of education
  More than a bachelor’s degree         82                  17.12
  Bachelor’s degree                     378                 78.92
  High school education                 17                  3.55
  Less than a high school education     2                   0.42
Employment
  Professional drivers                  135                 28.18
  Non-professional drivers              344                 71.82


2.2 Research Design

This study used an online questionnaire to explore differences in user preference for 15 in-vehicle ambient light positions. The 15 positions of interest were drawn from a summary of the literature and of positions currently found in vehicles (Weirich et al. 2022; Wan and Huang 2018) and also included some emerging positions not yet available on the market. In total, the 15 ambient light positions were: the upper front windshield, bottom front windshield, centre screen top, centre screen bottom, centre screen surround, steering wheel, bottom of the left and right windows, rear view mirror, armrest, cup holder, air conditioner vents, audio, A-pillar, floorboard, and roof. Two indicators were used to measure participants’ preference for the 15 positions: aesthetic evaluation and favourite ranking. The aesthetic evaluation reflects the aesthetic experience value of each position, while the favourite ranking reflects the user’s priority for ambient lights in these positions.

2.3 Materials

This study used an online questionnaire to investigate differences in participants’ preferences for the 15 in-vehicle ambient light positions. The questionnaire consisted of four main parts: 1) an introduction to in-vehicle ambient lighting; 2) aesthetic evaluation items; 3) favourite ranking items; and 4) collection of basic information.

Aesthetic Evaluation Items for In-vehicle Ambient Light Positions. A 5-point scale was used to investigate differences in the participants’ aesthetic evaluations of the 15 in-vehicle ambient light positions. Participants rated the aesthetics of the ambient light in each position from 1 to 5, where 1 means “very unaesthetic”, 2 “less aesthetic”, 3 “average”, 4 “more aesthetic”, and 5 “very aesthetic”; a higher score indicates a higher aesthetic evaluation. The positions were presented in a pseudorandom order from the first to the fifteenth, each with a schematic diagram and a text description of the light position so that participants could clearly identify and rate it.

Favourite Ranking Items for In-vehicle Ambient Lighting. A self-administered ranking questionnaire was used to explore differences in participants’ favourite rankings of the 15 positions. A lower score means a higher favourite ranking of the position. Participants were asked: “In which position do you prefer to have interior ambient lights? Please rank from ‘like to have ambient light in this location’ to ‘dislike having ambient light in this location’; the higher the ranking, the more you would like to have ambient light in that position.” The schematic diagram of each ambient light position was again presented with the corresponding text description.


Basic Information Collection. In this section, we collected 1) demographic information such as gender, age range, and education level; 2) driving information such as professional driver status, driving experience, and total driving mileage; and 3) in-vehicle ambient light usage information, such as whether the current vehicle is equipped with ambient lighting and the frequency of use of in-vehicle ambient lighting, from 1 (never use) to 5 (frequently use).

Schematic Diagrams of the 15 In-vehicle Ambient Light Positions. In the questionnaire, a schematic diagram of each ambient light position was presented to help participants understand the specific position in the car more clearly. The images, presented on a black background, simulated the effect of ambient lights at night in dim conditions. They showed the vehicle structure and used an ice-blue colour to designate the ambient light positions and luminescence effects (Fig. 1). Since the positions studied also included emerging positions not available on the current market, the schematic diagrams helped participants to imagine each position and make their evaluations.

Fig. 1. Examples of schematic diagrams corresponding to the ambient light positions (from left to right in the picture: floorboard, centre screen surround, bottom front windshield).

2.4 Procedure

First, the participants were shown a brief introduction to the concept of in-vehicle ambient light and its functions to help them form an overall understanding. They were presented with pictures of ambient lights in different positions in four kinds of cars and were informed of the importance of ambient light positions and the purpose of the research. Second, participants rated the aesthetics of each position from 1 to 5. Next, they ranked their favourite locations for ambient lights from 1 to 15. Finally, participants filled in their demographic information, driving information and in-vehicle ambient light usage information.



2.5 Statistical Analyses

A descriptive analysis of the aesthetic evaluations and favourite rankings of the in-vehicle ambient light positions was conducted using SPSS 26.0, followed by a repeated-measures ANOVA with ambient light position as the independent variable and participants' aesthetic ratings and favourite rankings as the dependent variables. In addition, to examine the effects of personalized characteristics such as demographic information, driving information, and in-vehicle ambient light usage information on aesthetic evaluation and favourite ranking, repeated-measures ANOVAs were conducted with ambient light position as a within-subjects variable and the personalized characteristics as between-subjects variables.
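To make the analysis concrete, the following is a minimal sketch, in Python rather than SPSS, of the within-subjects part of this design: a repeated-measures ANOVA with ambient light position as the within-subjects factor. The column names and toy data are illustrative assumptions, not the study's data.

```python
# Minimal sketch of a repeated-measures ANOVA with ambient light position as the
# within-subjects factor. Column names ("participant", "position", "aesthetics")
# and the data values are illustrative.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Long-format data: one row per participant x ambient light position.
data = pd.DataFrame({
    "participant": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "position": ["armrest", "roof", "A-pillar"] * 3,
    "aesthetics": [5, 4, 3, 4, 4, 2, 5, 5, 3],
})

# Repeated-measures ANOVA: aesthetics ~ position, with participant as the subject factor.
result = AnovaRM(data, depvar="aesthetics", subject="participant",
                 within=["position"]).fit()
print(result.anova_table)

# Note: AnovaRM covers only within-subjects factors; the mixed design reported in the
# paper (position x between-subjects characteristics such as gender) would need a
# mixed-model approach (e.g. statsmodels' mixedlm) or dedicated ANOVA software.
```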

3 Results

3.1 Descriptive Results of In-vehicle Ambient Light Usage

In general, among the participants included in this study, more than 80% reported that they had ambient lights inside their current vehicle. In addition, the frequency of using in-vehicle ambient lights among the research sample was also relatively evenly distributed, with 34.6% of the participants using ambient lights less frequently (never and occasionally) and 38.0% using them more frequently (often and frequently). This shows that the sample is similar to the actual users of in-vehicle ambient light and is suitable for further analysis to reveal participants' preferences for the positions of in-vehicle ambient light (Table 2).

Table 2. Descriptive results of ambient light usage, N = 479.

                                                 Number of participants   Percentage (%)
Do you have ambient lights inside your car?
  Yes                                                    390                  81.42
  No                                                      89                  18.58
Ambient light usage frequency
  Never                                                   69                  14.41
  Occasionally (used less than 5 times a year)            97                  20.25
  Generally (used less than 4 times a month)             131                  27.35
  Often (used 1 to 2 times a week)                       116                  24.22
  Frequently (used 3 times or more a week)                66                  13.78



3.2 Aesthetic Evaluation Results of In-Vehicle Ambient Light Positions

The results showed that the main effect of ambient light position on the aesthetic evaluation was significant, F = 26.004, p < 0.001, η2 = 0.052. Further multiple comparisons found that the aesthetic evaluations of the different positions could be divided into 3 levels (as shown in Fig. 2): high aesthetic (armrest, roof, centre screen surround, floorboard), medium aesthetic (cup holder, audio, centre screen bottom, centre screen top, bottom of left and right windows, air conditioner vents), and low aesthetic (A-pillar, upper front windshield, steering wheel, bottom front windshield, rear view mirror). The specific descriptive results are shown in Table 3.

In addition, the results of the repeated-measures ANOVA showed that gender, whether the current vehicle is equipped with ambient lights, and ambient light usage frequency had significant effects on the users' aesthetic evaluation. Gender had a significant main effect on aesthetic evaluation (F = 7.874, p = 0.005, η2 = 0.016), and its interaction with ambient light position was not significant; male participants (M = 3.724) gave significantly higher aesthetic evaluations than females (M = 3.565, as shown in Fig. 3A). Moreover, whether the current vehicle is equipped with ambient lights and the frequency of ambient light use also significantly affected the aesthetic evaluation of the ambient light positions. For whether the current vehicle was equipped with ambient lights, there was a significant main effect on aesthetic evaluation (F = 6.426, p = 0.012, η2 = 0.013), and the interaction with ambient light position was also significant (F = 2.247, p = 0.012, η2 = 0.005). Post hoc tests showed that, compared with users without in-vehicle ambient lighting (M = 3.531), users who already have vehicle ambient lighting (M = 3.704) gave higher aesthetic evaluations for some ambient light positions (as shown in Fig. 3B). For frequency of ambient light use, the main effect was significant (F = 12.069, p = 0.001, η2 = 0.034), and the interaction with ambient light position was not significant; users who used in-vehicle ambient light more frequently (M = 3.827) rated the aesthetics significantly higher than those who used it less frequently (M = 3.611, as shown in Fig. 3C).

Table 3. Descriptive statistics and ANOVA results of aesthetic evaluation and favourite ranking of in-vehicle ambient light positions.

In-vehicle ambient light positions        Aesthetic evaluation      Favourite ranking
                                          Mean       SD             Mean       SD
Windshield
  Upper front windshield                  3.495      .053           7.413      5.113
  Bottom front windshield                 3.359      .052           7.666      4.867
Centre screen area
  Centre screen top                       3.693      .041           9.520      3.900
  Centre screen bottom                    3.720      .041           9.781      3.656
  Centre screen surround                  3.873      .042           10.301     3.627
Operating areas
  Steering wheel                          3.434      .056           7.509      4.473
  Bottom of left and right windows        3.685      .050           6.739      3.352
  Rear view mirror                        3.296      .058           5.704      3.774
Other interactive locations
  Armrest                                 4.033      .043           9.714      3.636
  Cup holder                              3.777      .049           8.340      3.988
  Air conditioner vents                   3.635      .047           8.232      3.700
  Audio                                   3.762      .047           8.388      3.941
  A-pillar                                3.516      .053           5.933      3.533
  Floorboard                              3.860      .048           7.021      4.733
  Roof                                    3.944      .049           7.739      4.834
F                                         26.004***                 50.783***
Partial η2                                .052                      .096
Note: *** p < 0.001

Fig. 2. Aesthetic evaluation of in-vehicle ambient light positions. High aesthetic: armrest to floorboard; medium aesthetic: cup holder to air conditioner vents; low aesthetic: A-pillar to rear view mirror (note: *p < 0.05).



Fig. 3. Differences in aesthetic evaluation on gender (A), whether the current vehicle is equipped with ambient lights (B), and ambient light usage frequency (C) (note: *p < 0.05, **p < 0.01).

3.3 Favourite Ranking Results of In-Vehicle Ambient Light Positions

The specific descriptive results are shown in Table 3. The main effect of ambient light position on the favourite ranking was significant, F = 50.783, p < 0.001, η2 = 0.096. Based on the results of multiple comparisons, the favourite ranking of the different positions could be divided into 3 levels: high favourite ranking (centre screen surround, centre screen bottom, armrest, centre screen top), medium favourite ranking (audio, cup holder, air conditioner vents, roof, bottom front windshield, steering wheel, upper front windshield, floorboard, bottom of left and right windows), and low favourite ranking (A-pillar, rear view mirror), as shown in Fig. 4.



We also analysed whether personalized characteristics affected users' favourite rankings of the ambient light positions. The repeated-measures ANOVA did not show significant effects of gender, driving information, or ambient light usage frequency on favourite ranking (p > 0.05).

Fig. 4. Favourite ranking of in-vehicle ambient light positions. High favourite ranking: centre screen surround to centre screen top; medium favourite ranking: audio to bottom of left and right windows; low favourite ranking: A-pillar to rear view mirror (note: *p < 0.05).

3.4 Integration of Aesthetic Evaluation and Favourite Ranking into Preferences

After analysing the two indicators, aesthetic evaluation and favourite ranking, separately, they were further integrated into a preference measure. Three methods were used to calculate participants' preference scores for each in-vehicle ambient light position: 1) summing the average aesthetic score and the favourite score: the favourite ranking of each position was reverse scored, the mean scores were calculated and divided by 3, and the result was added to the mean aesthetic score to obtain a preference score for each position, which was then ranked from highest to lowest; 2) summing the aesthetic ranking (1–15) and the favourite ranking (1–15): the positions were ranked from 1 to 15 on each indicator according to their average scores, with smaller numbers meaning a higher ranking, and the two ranks were added to form the order of preference from lowest to highest; 3) summing the multiple-comparison level rankings for aesthetics (1–3) and favourites (1–3): according to the results of the previous multiple comparisons, each position was assigned a level from 1 to 3 on each indicator, with smaller numbers meaning a higher level, and the two levels were summed to form the order of preference from lowest to highest.

Comparing the results of the three calculation methods shows that, in all three, the three in-vehicle ambient light positions with the highest preference were the same: the centre screen surround, armrest, and centre screen bottom. Likewise, the six ambient light positions with the lowest preference were the same in the three calculation methods: the upper front windshield, steering wheel, bottom of left and right windows, bottom front windshield, A-pillar, and rear view mirror. Thus, the preferences for the different positions could be divided into 3 levels (as shown in Fig. 5): high preference (centre screen surround, armrest, centre screen bottom), medium preference (centre screen top, audio, cup holder, roof, air conditioner vents, floorboard), and low preference (upper front windshield, steering wheel, bottom of left and right windows, bottom front windshield, A-pillar, rear view mirror). When the 15 ambient light positions are grouped into areas, this shows that users prefer ambient lighting in the centre screen area, followed by other interactive locations, and finally the windshield and operating areas.

Fig. 5. Preference scores of the different in-vehicle ambient light positions (first calculation method).
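For readers who want to reproduce the integration step, the sketch below illustrates the first two calculation methods described above on made-up data; the position names, ratings and rankings are placeholders, not the study's results.

```python
# Minimal sketch (illustrative data) of combining aesthetic ratings and favourite
# rankings into a preference score, following the first two methods described above.
import pandas as pd

# Per-position mean aesthetic ratings (1-5) and per-participant favourite rankings
# (1 = most liked, 15 = least liked); values here are made up for illustration.
aesthetic_mean = pd.Series({"centre screen surround": 3.87, "armrest": 4.03, "A-pillar": 3.52})
rankings = pd.DataFrame({
    "centre screen surround": [1, 2, 1],
    "armrest": [2, 1, 3],
    "A-pillar": [14, 15, 12],
})

# Method 1: reverse-score the rankings (16 - rank), average per position, divide by 3,
# and add the mean aesthetic rating; higher values indicate stronger preference.
favourite_reversed_mean = (16 - rankings).mean()
preference_method1 = aesthetic_mean + favourite_reversed_mean / 3
print(preference_method1.sort_values(ascending=False))

# Method 2: rank the positions separately on both indicators (1 = best) and sum the
# two ranks; smaller sums indicate stronger preference.
rank_aesthetic = aesthetic_mean.rank(ascending=False)
rank_favourite = rankings.mean().rank(ascending=True)   # lower mean rank = more liked
preference_method2 = rank_aesthetic + rank_favourite
print(preference_method2.sort_values())
```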

4 Discussion

In this study, we investigated participants' aesthetic evaluations and favourite rankings for 15 interior ambient light positions through an online questionnaire and obtained two main results. On the one hand, after combining aesthetic evaluation and favourite ranking into preferences, we found that participants' preferences for in-vehicle ambient light positions were consistent: high preference (centre screen surround, armrest, centre screen bottom), medium preference (centre screen top, audio, cup holder, roof, air conditioner vents, floorboard), and low preference (upper front windshield, steering wheel, bottom of left and right windows, bottom front windshield, A-pillar, rear view mirror). This showed that participants preferred ambient lighting in the centre screen area, followed by other interaction locations, and finally the windshield and operating areas. On the other hand, aesthetic evaluations differed significantly by gender, by whether the current vehicle was equipped with ambient lights, and by how often ambient lighting was used.



The preference results for the in-vehicle ambient light positions are partly consistent with and partly different from previous findings. In terms of the preference ranking of specific positions, this study partially agreed with the results of Weirich et al. (2022), in which preference ratings decreased in order from foot, top, and A-pillar to steering wheel. This reflects a general dislike of ambient lighting on the steering wheel, possibly because it interferes with driver attention and thus driving safety. Matthews et al. (2004) argued that peripheral displays allow a person to be aware of and receive information while focusing on their primary task. The steering wheel, however, is in the centre of the driver's field of vision, so ambient light there has the potential to interfere with the main driving task, especially at night, thereby increasing driving hazards and resulting in a low preference among users.

Two other positions were ranked differently in preference than in previous results. Compared with Weirich et al. (2022), both the centre screen surround and the centre screen bottom were ranked higher in this study. The reason may be that the two positions had different extents in the two studies: the two locations in the Weirich et al. (2022) study were much more extensive, covering essentially the entire console and even extending to the steering wheel. Earlier results from Schellinger et al. (2006) showed that the more peripheral the light source is to the driver's eyes, the higher the driver's comfort. The two positions in this study, which lay at the periphery of the driver's vision compared with the steering wheel, were therefore less distracting and more comfortable, and participants rated them higher. For the same reason, the armrest ambient lighting was also highly preferred.

Similarly, the driver must not only pay attention to the road ahead but also constantly check the left and right sides to notice vehicles and pedestrians, so the upper front windshield, bottom of left and right windows, bottom front windshield, A-pillar, and rear view mirror are closely related to safety. Installing ambient lighting in these locations would interfere with the driver's vision and thus increase the risk of traffic accidents, so participants generally assigned them the lowest level of preference. The remaining locations (audio, cup holder, roof, air conditioner vents) are smaller in extent and not in the centre of the field of vision that affects driving safety, so participants gave them a medium rating. Thus, when designing in-vehicle ambient light products, it is important to consider their safety.

In addition, aesthetic evaluations differed significantly by gender, by whether the current vehicle was equipped with ambient lights, and by ambient lighting usage frequency. The higher aesthetic scores from males compared with females may indicate that males hold more open attitudes towards the various positions of ambient lights, which may be due to gender differences in colour preferences, safety attitudes, and risk seeking. Previous studies have shown that the average highest preference level for women is in the red–pink spectrum, whereas for men it is in the blue–green spectrum (Hurlbert and Ling 2007). The ice blue pictures used in this study were more in line with male colour preferences, which may have led men to rate them as more aesthetically pleasing than women did. Moreover, considering that in-vehicle ambient lighting may affect driving safety and thus participants' evaluation of it, this result may also reflect that men are more willing to take risks than women (Morgenroth et al. 2018); men may therefore be more tolerant of the driving risks associated with in-vehicle ambient lighting and thus rate the lighting higher than women. This gives us the insight that gender differences should be taken into account when designing in-vehicle ambient light products. For example, designing the colour, position, state (dynamic or static), and function of the in-vehicle ambient light according to gender differences can meet the individual needs of users.

Aesthetic evaluations also differed significantly with usage experience of in-vehicle ambient lighting. Participants who had ambient lights rated them higher than those who did not, and the more frequently these participants used them, the higher their overall rating, reflecting a more open attitude towards in-vehicle ambient lights. In addition, participants who had used in-vehicle ambient lights were consistent in their aesthetic evaluations, reflecting a high degree of consistency in their aesthetic attitudes. This may be because usage enhances people's evaluation and acceptance of in-vehicle ambient light products. Research has demonstrated that the more frequently a product is used, the higher the post-purchase evaluation of the product (Etkin and Sela 2016). Therefore, the users in this study with more usage experience also gave higher ratings. This suggests that, although users may not quickly accept new ambient light products or newly developed ambient light positions in cars, car sellers can take advantage of the exposure effect by letting users experience and become familiar with in-vehicle ambient light. Once users have experienced the product, they may be more receptive to new in-vehicle ambient lighting.

This study has some limitations. The online questionnaire explored differences in participants' preferences for in-vehicle ambient light in 15 positions and obtained a preference ranking for each location. However, these data can only present simple results and cannot provide more detailed information. More research is therefore needed, including qualitative studies, multifactor experimental designs, and cross-sectional designs with more variables, to explore the factors that influence people's evaluation of in-vehicle ambient light, such as perceptions and emotions, in order to provide more detailed information and to design in-vehicle ambient lighting that better meets demand. Moreover, in addition to focusing on users' subjective preferences, it is also important to pay attention to the safety of ambient light and to identify the potential interference of ambient light with driving.

5 Conclusion

This study explored users' preferences for 15 in-vehicle ambient light positions to provide guidance for ambient light design. The results showed that users prefer ambient lights in the centre screen area, followed by other interactive locations, and are least willing to have ambient lights in the windshield and operating areas that interfere with the driving view. Aesthetic evaluations differed significantly by gender, with men likely to rate in-vehicle ambient lighting higher than women. There were also significant differences related to interior ambient light usage: those who had ambient lights installed rated them higher than those who had not, and those who used them more frequently rated them higher, reflecting the influence of usage experience on product acceptance. The results showed a consensus on ambient light position preference among users with different characteristics, suggesting that designers should focus on the centre control screen area when laying out ambient light positions and avoid placing ambient lights in the operating areas.

Acknowledgement. The authors would like to thank Chongqing Changan Automobile Co., Ltd. for project support.

References

Caberletti, L., Elfmann, K., Kummel, M., Schierz, C.: Influence of ambient lighting in a vehicle interior on the driver's perceptions. Light. Res. Technol. 42(3), 297–311 (2010). https://doi.org/10.1177/1477153510370554
Etkin, J., Sela, A.: How experience variety shapes postpurchase product evaluation. J. Mark. Res. 53(1), 77–90 (2016). https://doi.org/10.1509/jmr.14.0175
Greule, R.: Emotionale Wirkung von farbiger LED-Beleuchtung im Innenraum. Hochschule für Angewandte Wissenschaften (HAW) Hamburg, Hamburg (2007)
Measuring the subjective response to interior lighting: paired comparisons and semantic differential scaling. Light. Res. Technol. 35, 183–198 (2003)
Hurlbert, A.C., Ling, Y.: Biological components of sex differences in color preference. Curr. Biol. 17, R623–R625 (2007)
Küller, R., Ballal, S., Laike, T., Mikellides, B., Tonello, G.: The impact of light and colour on psychological mood: a cross-cultural study of indoor work environments. Ergonomics 49, 1496–1507 (2006)
Kim, T., Kim, Y., Jeon, H., Choi, C.-S., Suk, H.-J.: Emotional response to in-car dynamic lighting. Int. J. Automot. Technol. 22(4), 1035–1043 (2021). https://doi.org/10.1007/s12239-021-0093-4
Matviienko, A., et al.: Towards new ambient light systems: a close look at existing encodings of ambient light systems. Interact. Des. Archit. 26, 10–24 (2015)
McCloughan, C.L.B., Aspinall, P.A., Webb, R.S.: The impact of lighting on mood. Int. J. Light. Res. Technol. 31(3), 81–88 (1999)
Morgenroth, T., Fine, C., Ryan, M.K., Genat, A.E.: Sex, drugs, and reckless driving: are measures biased toward identifying risk-taking in men? Soc. Psychol. Person. Sci. 9(6), 744–753 (2018)
Schellinger, S., Meyrueis, P.P., Pearsall, T.P., Franzke, D., Klinger, K., Lemmer, U.: Advantages of ambient interior lighting for drivers contrast vision. In: Proceedings of SPIE – The International Society for Optical Engineering, 6198, 61980J (2006)
Stylidis, K., Woxlin, A., Siljefalk, L., Heimersson, E., Söderberg, R.: Understanding light. A study on the perceived quality of car exterior lighting and interior illumination. Procedia CIRP 93, 1340–1345 (2020). https://doi.org/10.1016/j.procir.2020.04.080
Matthews, T., Dey, A.K., Mankoff, J., Carter, S., Rattenbury, T.: A toolkit for managing user attention in peripheral displays. In: Proceedings of the 17th Annual ACM Symposium on User Interface Software and Technology, pp. 247–256 (2004)
Veitch, J.A., Newsham, G.R., Boyce, P.R., Jones, C.C.: Lighting appraisal, well-being and performance in open-plan offices: a linked mechanisms approach. Light. Res. Technol. 40(2), 133–151 (2008)
Weirich, C., Lin, Y., Khanh, T.Q.: Evidence for human-centric in-vehicle lighting: Part 1. Appl. Sci. 12(2), 552 (2022)
Weirich, C., Lin, Y., Khanh, T.Q.: Modern in-vehicle lighting – market studies meet science. ATZ Worldwide 124(4), 52–56 (2022)
Wolska, A., Sawicki, D., Tafil-Klawe, M.: Visual and Non-Visual Effects of Light: Working Environment and Well-Being. CRC Press (2020)
Wang, J., Huang, H.M.: Research on the application of LED car interior ambient lighting technology. Autom. Appl. Technol. 21, 4 (2018). (Chinese)
Sun, L.: Current state and development trends of LED ambient lighting inside vehicle. China Light Light. 6, 3 (2017). (Chinese)
Wang, S.: Research on the interior ambient lamp design and application. Automob. Appl. Technol. 19, 2 (2019). (Chinese)

Using Automatic Speech Recognition to Evaluate Team Processes in Aviation – First Experiences and Open Questions

Anne Papenfuss1(B) and Christoph Andreas Schmidt2

1 Deutsches Zentrum für Luft- und Raumfahrt, Institut für Flugführung, Lilienthalplatz 7, 38108 Braunschweig, Germany
[email protected]
2 Fraunhofer IAIS, Schloss Birlinghoven 1, 53757 Sankt Augustin, Germany
[email protected]

Abstract. This paper reports on the application of automatic speech recognition (ASR) for the analysis of team communication. Shortening the time needed to derive literal transcripts of team communication is seen as an enabler to conduct team process analysis more often in applied research. First, the theoretical background, the connection between teamwork and communication and the state-of-the-art of ASR are described. Second, one use case of collaboration is described in detail to understand the specifics of the observed communication. Applying the nearly untrained ASR to these data yielded a word error rate of 36%. These results and the required performance are discussed. The outlook discusses the use of a new generation of ASR models and their expected performance benefits.

Keywords: automatic speech recognition · communication analysis · teamwork

1 Motivation

Teamwork is omnipresent in the aviation domain. Often, decisions and actions taken by human operators are the result of teamwork processes. The quality of these teamwork processes influences overall system performance and consequently should be evaluated. Despite this, human factors methods to assess human performance mainly focus on indicators of individual performance. There is a need for established and practicable methods to assess team performance and team processes.

Therefore, the German Aerospace Center launched the internal project ITC ("InterTeam-Collaboration", 2018–2021), in which the institutes of flight guidance, aerospace medicine and flight systems worked together to develop competence in evaluating teamwork processes in multi-team systems. One goal was to set up and test a simulation infrastructure with which close-to-reality teamwork processes can be analyzed. Furthermore, a compilation of approaches and methods to evaluate inter-team collaboration was developed in the project. The project focused on maturing the methodology and tools needed to evaluate team processes and, specifically, processes of collaboration.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. D. Harris and W.-C. Li (Eds.): HCII 2023, LNAI 14018, pp. 501–513, 2023. https://doi.org/10.1007/978-3-031-35389-5_34



In this paper we present work that was conducted to make use of automatic speech recognition (ASR) technologies to analyze teamwork processes. The focus of this paper is to explain how and where ASR can be integrated into the analysis of teamwork processes. Intermediate results of that integration are given, as it is still work in progress.

At the moment, manual content analysis of communication data is time consuming, as literal transcripts of the spoken communication are required as a basis for team process analysis. As a rule of thumb, in many cases a time effort of at least three times the duration of the raw audio data is required. This laborious task makes the analysis of communication data unattractive to incorporate into applied research projects. To help overcome this hurdle and make teamwork process analysis more attractive in terms of value for time effort, the usefulness of ASR for creating transcripts of team communication as a basis for empirical research was explored.

ASR was applied to cockpit communication data, which is one use case of team collaboration investigated in the ITC project. It was used to derive transcripts of the cockpit crew communication. The initial tests reported here were conducted as a benchmark with the aim of understanding, first, what the characteristics of communication data derived from a cockpit crew are; second, how a current ASR system trained on other domains performs on these data; and third, which challenges need to be tackled to use ASR in human factors research and for system design.

This paper first gives a theoretical background that explains the role of communication for teamwork, highlights the different perspectives of approaches that use communication data for teamwork analysis and provides a state-of-the-art view on ASR technology. Subsequently, the approach for team communication analysis within the project ITC is summarized. For the cockpit-communication use case, the coordination within a cockpit crew, a more detailed description of the teamwork situation, the communication observed and the application of ASR are provided. Finally, the current state is summarized and an outlook for the approach is given.

2 Theoretical Background

2.1 Collaboration and Teamwork in Aviation

Today and in the near future, the efficiency of air traffic management (ATM) depends on human decisions, which are made collaboratively and cross-functionally. Collaboration is needed because actors and organizations are dependent on each other. Within the scope of the European research framework SESAR (Single European Sky – Air Traffic Management Research), operational concepts and technical solutions are developed and brought into operation which build upon "collaborative decision making" (CDM). Accordingly, in the near future teams in ATM need to work together not only following clear and fixed procedures, but more and more in a flexible and collaborative manner. For example, future emission-reducing concepts of ATM, like Continuous Descent Operations, require a changed and closer collaboration of cross-functional teams.

At the moment, decision-making processes in multi-team systems are not sufficiently defined. One concept that explicitly tackles this perspective is Airport Collaborative Decision Making (A-CDM), which defines a joint, coordinated process to keep all stakeholders of a flight's ground processes informed [1]. Nevertheless, in current research there is a lack of established methods to evaluate those decision-making processes in early concept development phases.

2.2 Role of Communication for Team Collaboration

One central collaborative task of multi-team systems (MTS) in ATM is the coordination of distributed, time- and safety-critical processes [2]. The information and knowledge needed to cope with this task are distributed between the different roles, decision makers or teams of the overall multi-team system. Coordination is conducted via the exchange of time information, such as the time stamps of events, the duration of processes or the sequence of several processes. Consequently, an MTS requires a mutual understanding in the sense of "who is doing what with which information and transfers it in which form to whom". On the cognitive level, so-called transactive memory systems (TMS) are regarded as the prerequisite that enables teams to collaborate [3]. A TMS develops only via communication, whereby communication is understood as more than just sharing information: the communication system is a convergence of information, message and understanding. Within the project ITC, this communication system was a central element of analysis. To adequately analyze collaboration in MTS, a broad understanding of communication was applied, as summarized by [4]: "If collaboration is core, communication is key!".

Communication is a meta-process of teamwork [5, 6] that facilitates coordination and collaboration within teams. Keyton, Ford, and Smith [7] remarked that "collaboration can only happen through communication". In a previous project, experts from the air traffic management domain identified, in interviews [8] and as participants of a workshop [9], that deficits in communication cause problems in their shared work processes. From a measurement perspective, communication can be described as the observable behavior of teams that is based on latent, cognitive states within the teams, e.g. TMS, and on the cognition of the individual team members. For collaborative work situations, the analysis of communication is key in order to objectively evaluate the processes of teamwork [10]. Therefore, an analysis of team communication is conducted to explain performance differences in team collaboration.

2.3 Features of the Communication Process Used for Team Process Analysis

Communication is complex social behavior. As such, data representing that behavior has multiple features that can be used for analysis. As Table 1 summarizes, several aspects of the communication process can be analyzed, for instance the medium chosen for communication and its impact on team performance. Studies have quantified, based on the analysis of communication data, the impact face-to-face communication has on team performance compared to text-based interaction via chat [11, 12]. The authors concluded that the higher cognitive costs of producing messages via text compared to natural speech caused less communication. In time-critical tasks, these production costs can explain performance losses.


Table 1. Aspects of the communication process and analysis methods

Medium
  Explanation: spoken language in F2F condition, chat-based conversation, gestures
  Examples of analysis methods: technical boundaries, e.g. bandwidth, SNR, production costs, e.g. comparison of team performance in chat vs. F2F conditions
  Example of study: higher production costs of text-mediated teamwork [11, 13]

Symbol/signifier
  Explanation: formal design: choice of words, choice of style via grammar
  Examples of analysis methods: conformance with phraseology, style of speech, e.g. noun clauses; number of questions versus answers, number of directions, e.g. question versus request versus proposal
  Example of study: communication profiles, for instance frequencies of requests within a cockpit crew [14]

Context
  Explanation: environment/situation in which communication takes place; trigger for communication, structure of communication, conventions and rules for communication
  Examples of analysis methods: analysis of the communication system, relationship of team members, type of team task (problem solving versus coordinated actions); also predecessors of communication events, e.g. information after question = reply
  Example of study: analysis of which categories of events follow after a "disturbance" of a group (negative emotions) [15]

Meaning/content
  Explanation: status of systems, e.g. technical failure of an aircraft; status of team
  Examples of analysis methods: content analysis of text
  Example of study: impact of workload on cockpit crew communication [16]

Effect/impact
  Explanation: communication leads to an action, change in behavior, change in attitude, e.g. calling out "Mayday" causes emergency procedures
  Examples of analysis methods: analysis of actions, reactions, e.g. behavioral anchors
  Example of study: observation, evaluation of presence of specific behaviors, e.g. TARGET [17]
Depending on the focus of analysis, different steps are required to prepare the raw communication data for further analysis. For instance, analyzing the context of communication by quantifying who communicates how often within a team requires detecting communication events and linking them to specific team members. If the technical setup allows it, participants of a study can be equipped with individual wireless microphones. Either during data recording or in post-processing, start and end times of communication events can then be calculated automatically based on the signal strengths detected by the individual microphones; this is state of the art in smartphone microphones and video communication technology. Another way is to automatically segment data recorded with a single microphone using AI-based speaker diarization, as described in Sect. 4.4. In summary, preparing data for this kind of analysis could generally be done fully automatically. In contrast, analyzing the formal style of communication, i.e. the choice of signifiers, e.g. to evaluate conformance with a given phraseology, requires a transcript of the spoken content. Furthermore, the transcript might require correct grammar, e.g. to identify questions as one way in which information is exchanged.

2.4 State-of-the-Art of Automatic Speech Recognition (IAIS)

Automatic speech recognition (ASR) is the transcription of the spoken words in an audio or video file. The research field of ASR has made considerable progress in the last decade due to the use of deep neural networks [18]. However, the performance of current state-of-the-art systems still varies depending on the conditions of the provided audio file. While the transcript for clean audio of read speech from a domain or topic on which the system has been trained can be almost flawless, audio data with strong background noise, spontaneous, colloquial or non-native speech covering a topic which was not seen during training may still have a high error rate (as an example of such challenging conditions see the "CHiME challenge" [19]). The conditions of the cockpit communication use case described below share most of the above challenges. Approaches that adapt an ASR system trained on other conditions to the field of aviation are called transfer learning [20] and use a relatively small amount of in-domain data (i.e. transcribed cockpit recordings) to fine-tune a general ASR system to a specific domain. To improve the models specifically for noisy environments, we apply multi-condition training by artificially mixing the training data with a variety of noises [21].

The Fraunhofer IAIS Audio Mining system [22] used in the course of this project analyzes audio and video files and provides not only the transcript of the spoken words but also a segmentation of the file into speaker turns. The system can be adapted to individual use cases by retraining the machine-learning based models, such as the acoustic model, lexicon and language model, on data specific to the use case. Since the project could only provide a limited amount of data for training, transfer learning techniques were applied to fine-tune the baseline model, which was trained on about 1300 h of annotated broadcast data, to the domain of aviation.
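As an illustration of the multi-condition training idea mentioned above, the following sketch mixes a clean speech signal with background noise at a chosen signal-to-noise ratio. It is a generic example with synthetic signals, not part of the Fraunhofer IAIS pipeline.

```python
# Minimal sketch of multi-condition training data augmentation: add background noise
# to clean training audio at a randomly chosen signal-to-noise ratio (SNR).
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add `noise` to `speech` so that the result has the requested SNR in dB."""
    noise = np.resize(noise, speech.shape)               # loop/trim noise to match length
    speech_power = np.mean(speech ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    # Scale noise so that 10*log10(speech_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)                      # stand-in for 1 s of 16 kHz speech
cockpit_noise = rng.standard_normal(8000)                # stand-in for a recorded noise clip
augmented = mix_at_snr(speech, cockpit_noise, snr_db=rng.uniform(0, 20))
```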

3 Approach to Analyze Team Communication

The ITC project focused on communication, both as a facilitator of collaboration and as a source for analyzing collaboration. Communication is a meta-process of the so-called teamwork processes and facilitates teamwork. Consequently, features of the communication process can be used as indicators of team processes. For instance, studies have linked so-called pro-active communication behavior with good team performance [23].



The approach was to analyze communication with regard to "Who says what to whom at which time and in which form?" Accordingly, all empirical studies conducted within the project derived a set of features from the raw audio data.

First, for the analysis of team collaboration, shorter communication events (or segments) are of interest. The idea is that breaks or intermissions within the conversation stream indicate "closed" or meaningful units, or acts of communication, that represent some sense or meaning and can be analyzed and interpreted on their own [24]. For each segment, start and end times of communication were logged. Second, the sender, i.e. the team member or entity generating an act, is of interest, as well as the receiver, i.e. the person or group to whom this act is directed. In some studies, the features communication event and sender were derived automatically by assigning an individual microphone to each team member and by automatically detecting silence and non-silence in the audio data. Third, the content of these communicative acts is required. It is used to conduct a content analysis, whose goal is to assign a pre-defined category to each communication segment. The aim of this labeling process is to exclude subjective interpretation as much as possible, in order to ensure the objectivity of the analysis. Therefore, the labeling is conducted by trained personnel according to a pre-defined coding scheme and often by two persons. As a metric of this objectivity, inter-rater reliability (IRR) can be calculated [25]. Based on the IRR results, specific categories are either excluded or re-organized in order to maximize IRR. Several coding schemes for the analysis of team and group processes exist (e.g. [10, 15, 26–28]); nevertheless, there is no standardized coding, as for each teamwork situation different communicative behaviors will be observable, shaped by the task and the environment in which the team operates [6]. Within the ITC project, coding schemes were adapted to capture the specific perspectives on team collaboration that were of interest for each study.
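The silence-based derivation of communication events mentioned above can be illustrated with a simple energy-threshold procedure over one team member's microphone signal. The frame size, threshold and merging gap below are illustrative assumptions, not the values used in the project.

```python
# Minimal sketch of energy-based silence detection on an individual microphone signal:
# frames above an RMS threshold are treated as speech, and stretches separated by short
# pauses are merged into one communication event (segment).
import numpy as np

def detect_segments(signal: np.ndarray, sr: int, frame_s: float = 0.05,
                    threshold: float = 0.01, min_gap_s: float = 1.0):
    """Return (start, end) times in seconds of non-silent stretches of `signal`."""
    frame = int(frame_s * sr)
    n_frames = len(signal) // frame
    rms = np.array([np.sqrt(np.mean(signal[i * frame:(i + 1) * frame] ** 2))
                    for i in range(n_frames)])
    active = rms > threshold
    segments, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            segments.append((start * frame_s, i * frame_s))
            start = None
    if start is not None:
        segments.append((start * frame_s, n_frames * frame_s))
    # Merge segments separated by pauses shorter than min_gap_s into one event.
    merged = []
    for seg in segments:
        if merged and seg[0] - merged[-1][1] < min_gap_s:
            merged[-1] = (merged[-1][0], seg[1])
        else:
            merged.append(seg)
    return merged

# Synthetic usage example: 1 s silence, 1 s "speech", 1 s silence.
rng = np.random.default_rng(0)
sr = 16000
sig = np.concatenate([np.zeros(sr), 0.1 * rng.standard_normal(sr), np.zeros(sr)])
print(detect_segments(sig, sr))   # roughly [(1.0, 2.0)]
```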

4 Use-Case Cockpit Crew Communication

4.1 Collaboration Scenario

Communication data for the cockpit crew use case was gathered in a study aimed at capturing the state-of-the-art collaboration between pilots and air traffic controllers. From a teamwork perspective, it was of interest to analyze team decision making in situations under time pressure. A scenario was designed that incorporated both an air traffic controller working position and a cockpit simulator. To realize a coupled simulation infrastructure, both simulators were connected to a data exchange service that allowed position and state data of aircraft to be broadcast, realizing the coupled, interdependent scenario.

The scenario incorporated the flight of the cockpit crew from Frankfurt airport to Braunschweig airport. It was intended that the status of the flight incrementally changes from a normal flight to a priority flight to a mayday flight, to obtain examples of if and how inter-team collaboration adapts to these changed environmental conditions. During the cockpit crew's flight, the left engine has a fuel leak which, at first, is indicated by an imbalance warning. At this stage, it was intended that the crew realizes a challenging situation which provides a couple of options on how to proceed with the flight (called the priority phase). Later, this fuel leak causes a "low on fuel" warning which requires the crew to declare a mayday situation (mayday phase). This mayday status then urges the air traffic controller to give this flight priority over all other traffic and to manage the surrounding air traffic accordingly.

In the study, ten simulation runs with ten pilots (1 female, age 35 ± 3 years, flight experience 5300 ± 1200 h) and four different air traffic controllers (4 male, age 40 ± 9 years, experience 16 ± 11 years) were conducted. In the remaining simulation runs, staff from DLR with operational expertise in air traffic control took over the controller position. Radio communication between ATC and (simulation) pilots was recorded, as well as the communication of the cockpit crew, eye tracking data of the first officer (FO), and questionnaires regarding workload, situation awareness, and stress and arousal. Individual and joint debriefings were conducted after the simulation runs.

All teams (MTS) finished the scenario in a safe manner and there were no visible differences in team performance. It must be mentioned that both standard and emergency procedures are clearly defined and are trained regularly by pilots and controllers. Furthermore, there was no clearly observable joint phase of collaboration between pilots and controllers in the priority phase. In the debriefings, both pilots and controllers stated that they do not feel like being in a team with each other, because they follow different goals. Theoretically, the concept of "Crew Resource Management" (CRM) [29] foresees that, from the viewpoint of the pilots, controllers are also part of the crew. But the pilots stated that this is not how work is done. Nevertheless, they trust in each other's competency and trust that decisions made and information given by the others are correct. Even if they do not understand a decision of the other, they would rather trust the other than ask for explanations. A non-ideal collaboration leads to delays but not necessarily to safety-critical situations. Both pilots and controllers knew situations from their daily experience where more collaboration would make sense, e.g. from the point of view of pilots when approaching an airport with high traffic numbers.

Based on these insights, it was concluded that analyzing the communication within the cockpit crew could be a more fruitful source of data to find indicators in team communication that explain different team performance. The following analyses therefore focus on the scenario from the point of view of the cockpit crew. Both cockpit crew members, the First Officer (FO, speaker 2) and the Captain (CO, speaker 1), need to communicate in order to conduct the callouts required by the procedures. Furthermore, the crew needs to decide how to manage the technical failure(s) and to understand their impact on the flight.

4.2 Descriptive Analysis of Communication Structure

For six runs, a literal transcript of the crew's communication was produced manually by listening to the video recordings. Data was collected in a table with each row representing one communication event, which will be referred to as a 'segment' in the following paragraphs. Time stamps were taken which indicate the start and end of the identified segments, as well as the speaker, followed by a transcript of the communication content. This realistic communication data was analyzed in an exploratory manner.
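A minimal sketch of how the descriptive statistics reported in the next paragraphs can be computed from such a segment table is given below; the column names and example values are assumed for illustration.

```python
# Minimal sketch of descriptive statistics on a segment table: speech share, segment
# length, speaker shares and intermissions between consecutive segments.
import pandas as pd

segments = pd.DataFrame({
    "start_s": [0.0, 4.5, 9.0, 12.0],
    "end_s":   [4.0, 8.0, 11.5, 15.0],
    "speaker": ["CO", "FO", "CO", "FO"],
})

run_duration = segments["end_s"].max() - segments["start_s"].min()
segments["duration_s"] = segments["end_s"] - segments["start_s"]
segments["gap_s"] = segments["start_s"] - segments["end_s"].shift()   # intermission to previous segment

print("share of run spent talking:", segments["duration_s"].sum() / run_duration)
print("mean segment length (s):", segments["duration_s"].mean())
print("share of segments per speaker:")
print(segments["speaker"].value_counts(normalize=True))
print("share of gaps of 1 s or less:", (segments["gap_s"].dropna() <= 1.0).mean())
```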



The runs lasted between 47 and 58 min, with an average of 52 min (sd = 4). This duration was caused by the scenario but, from experience of other studies, is quite a normal duration when investigating decision making in the cockpit. During this time, the crew was actively involved in verbal communication for an average of 72% of the time, ranging from 48% to 91% (sd = 17). These numbers show that verbal communication is a frequent behavior of cockpit crews; analyzing this behavior is likely to give insights into the important cognitive processes of the crew and thereby explain performance differences. The average length of one communication segment is five seconds (sd = 1), ranging from an average minimum of less than one second to an average maximum of 70 s (sd = 17, n = 6). Within one simulation run, communication on average consists of 424 segments (sd = 61), nearly equally distributed between the two crew members; the minimum share of communication segments was 47% (with a 53% maximum share, respectively). Most of the time, communication segments follow shortly one after the other: on average, 67% of all communication segments have an intermission of one second or less between each other, and a cumulated 80% of all segments have an intermission of less than four seconds. These descriptive statistics show that communication in the cockpit is characterized by frequent, short segments between two communication partners, which together build a nearly constant stream of segments.

4.3 Examples of Communication Situations

With regard to communication content, within the cockpit crew communication there are situations that are highly structured, like following a procedure that requires call-outs and readbacks from the team members. Table 2 shows communication segments referring to the take-off phase. From segment 7 to 15, the segments represent a procedure, which defines the speaker, the content and the order of segments. Segments are short and consist of only one or two words. Furthermore, segments follow quickly one after the other. This communication situation is highly structured and can be compared to a normative flow of segments, defined by the procedures. The example furthermore illustrates the instantaneous change between languages, as in segment 1, as well as the use of very specific vocabulary, as in segment 3.

The communication situation in Table 3 can be characterized as more unstructured. The situation relates to a phase of problem solving and decision making. As explained above, communication is conceptualized as the means to facilitate the processes of problem solving and decision making. To structure this decision-making process and to prevent the crew from making hasty decisions, FORDEC was developed [30]. Pilots are trained to follow this structured process, which consists of collecting facts (F), developing options (O) and examining the risks associated with the options (R). Based on this analysis phase, a decision is selected (D), executed (E) and it is checked whether the decision had the expected effect or outcome (C). The communication content in Table 3 can be related to a phase where the crew collects facts about the situation. After having recognized a warning given by the aircraft systems, the crew tries to understand the technical reasons behind this warning and its impact on the actual mission. This communication situation is less structured and consists more of natural German language with intermixed technical terms.



Table 2. Example of structured communication for take-off checklist

Nr   Speaker   Content
1    1         Prima, ja vielen Dank. Rolling take off okay?
2    2         Gerne
3    1         TCAS geht nicht, können wir gerne wieder ausstellen
4    2         Achso okay
5    1         Dann haben wir die rote Warnung nicht. Vielen Dank
6    1         Ja dann, rolling Take-Off
7    1         When TOGA SAS Runway, Autothrust. Blue
8    2         Thrust set
9    2         100 knots
10   1         Checked
11   2         V1
12   1         Rotate
13   1         Positive Climb
14   1         Gear Up
15   2         Gear Up
16   2         Also jetzt habe ich gewechselt

Table 3. Unstructured communication about fuel status

Nr   Speaker   Content
1    1         Ja, dann gucken wir mal auf die Fuel Page. Ehm, was haben wir jetzt noch für eine Endurance
2    2         Wir haben, also im Moment laufen noch beide. So, jetzt ist die Frage, wie lange laufen sie noch?
3    1         Ja
4    2         Weil wenn, na die ist jetzt im Wing würde ich fast sagen
5    1         Bitte?
6    2         Die ist jetzt, also es läuft ja irgendwo aus dem Flügel heraus, ne?
7    1         Ja offensichtlich. Oder aus dem Engine Pylon, das kann auch sein
8    2         Ja, das weiß man natürlich nicht. Das heißt im Moment sind wir bei 2 Komma, ja der kämpft ein bisschen, also ich sag mal 2 einhalb?
9    1         Ja, ich habe jetzt nur nen bisschen Angst…
10   2         Und wenn wir die Speed jetzt nochmal rausnehmen?
11   1         Je nachdem…Speed rausnehmen ist ne gute Idee, 2,80 oder was?



4.4 Current Results of ASR of Cockpit Crew Communication

The ASR system described in Sect. 2.4 was adapted to the aviation use case with the manual transcripts of the six simulation runs, representing roughly six hours of data. Additionally, the phonetic representation of the NATO alphabet was provided, as well as the specific phonetics of numbers. Transcripts of all the audio data were derived. Tests on a subsample of the automatic transcripts, containing examples of both structured and unstructured communication situations, showed that the baseline system with this minimal adaptation achieved a word error rate of 36 percent. As the transcripts should serve as a basis for further coding and ideally for inferential statistical analyses, a lower error rate is desirable. The following paragraphs summarize the characteristics of the audio data at hand, representing the cockpit domain.

The ASR system was mainly trained on audio data from newscasts. Compared to this data, which is mostly recorded in a studio, the quality of the cockpit sound data is rather poor. Within a cockpit there is a rather high level of background noise; microphones from headsets were used, and the audio data was broadcast over a voice-over-IP system. This is partly due to the simulation setup, but it can be assumed that the real cockpit also has background noise. Furthermore, the segment detection of the ASR could not detect all speaker changes, as intermissions were rather short and communication segments overlapped, especially in the unstructured communication phases. This is mostly not the case in newscast data, where segments of the anchor or other speakers are usually longer. These characteristics could be addressed by improving the audio recording setup, for instance by transcribing the individual audio streams from each headphone rather than the combined audio stream that was recorded for the video documentation of the simulator.

Furthermore, communication situations differ in their characteristics, ranging from natural, unstructured discussions to standardized procedures. These different communication situations were also identified by [30]. The intense discussion and exchange of thoughts in the unstructured situations also leads to incomplete sentences, sometimes with incomplete grammar. Furthermore, languages – in our case German and English – are switched frequently within one segment, which is referred to as code-switching.
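For reference, the word error rate used as the benchmark metric here can be computed as the word-level edit distance between the manual reference transcript and the ASR hypothesis, divided by the number of reference words. A minimal implementation:

```python
# Minimal word error rate (WER) implementation: word-level Levenshtein distance
# between reference and hypothesis, normalized by the reference length.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for the edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("gear up", "gear up please"))   # 0.5: one insertion, two reference words
```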

5 Summary and Outlook

In summary, applying the ASR system trained on broadcast data to cockpit communication with minimal adaptation yielded a word error rate of 36%. The characteristics of the cockpit communication described in Sect. 4.4 require more adaptation of the ASR models in order to derive accurate transcripts of the cockpit crew communication. Nevertheless, the descriptive analysis of the manual transcripts, as well as the long history of (manual) communication analysis of cockpit communication (e.g. [31, 32]), demonstrated that communication between the crew members is a rich source for investigating team collaboration. Automated transcription of communication content could enable a broad set of analyses and research questions related to teamwork. Lower word error rates could enable the envisioned time savings in the future. More experience is needed to understand which accuracy level is required to actually save time.



At the moment, the most promising near-future application of ASR for teamwork analysis is the detection of relevant keywords in cockpit crew communication as data preparation for a manual, more thorough analysis of the video/audio data. ASR could thus identify candidate phases of team communication which are then verified by a researcher. In order to gain more trust in the output of an ASR system, it would be interesting to investigate whether the concept of inter-rater reliability could be applied to it. Overall, the first results highlight that in order to apply current ASR technology to cockpit crew communication, time and effort are needed to adapt and train the ASR. In the domain of air traffic control communication, it was shown that high levels of correct transcription can be achieved if the technology is closely adapted to the domain [32]. In order to make use of ASR for a broader range of human factors studies in aviation, it is of interest to keep this effort as low as possible; otherwise, the expected benefit of time saving is questionable.

As an outlook, since the end of the project several algorithmic improvements have been achieved, which we have integrated or are planning to integrate into the Audio Mining solution. In the field of ASR research, transformer architectures have recently been introduced which use large amounts of unannotated raw audio data to automatically learn the acoustic characteristics of speech [21]. One of the latest achievements was the training of a multilingual transformer-based ASR system on 680k hours of speech, which also facilitates cross-lingual learning, such that the system achieves lower error rates on languages with little training data, especially in noisy environments [22]. Such models promise improved transcription quality as well as better handling of code-switching scenarios. An improved speaker diarization system based on x-vectors [34] has already been integrated into the system. We will examine the impact of these improvements on the domain of aviation in future work.
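A keyword-based pre-selection of candidate phases, as suggested above, can be sketched in a few lines; the keyword list and transcript segments below are purely illustrative.

```python
# Minimal sketch of keyword spotting on ASR transcript segments: flag segments that
# contain domain-relevant keywords so a researcher reviews only candidate phases.
KEYWORDS = {"mayday", "fuel", "leak", "priority"}

# (start time in seconds, transcribed text) pairs; values are made up for illustration.
segments = [
    (1803.2, "ja dann gucken wir mal auf die fuel page"),
    (1911.7, "gear up"),
    (2450.0, "mayday mayday mayday"),
]

candidates = [(t, text) for t, text in segments
              if any(k in text.lower().split() for k in KEYWORDS)]
for t, text in candidates:
    print(f"{t:8.1f}s  {text}")
```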

References

1. Eurocontrol: Airport collaborative decision-making: improving the efficiency and resilience of airport operations by optimising the use of resources and improving the predictability of air traffic. https://www.eurocontrol.int/concept/airport-collaborative-decision-making. Accessed 14 Mar 2023
2. Schulze Kissing, D., Bruder, C.: Der Einsatz Synthetischer Aufgabenumgebungen zur Untersuchung kollaborativer Prozesse in Leitzentralen am Beispiel der "generic Control Center Task Environment" (ConCenT). Kognitive Systeme (2016)
3. Bruder, C., et al.: Do you know what I know? – Investigating and enhancing transactive memory systems in multi-team systems. In: 1st Human System Integration Conference, INCOSE, Biarritz, France (2019)
4. Keyton, J., Beck, S.J., Asbury, M.B.: Macrocognition: a communication perspective. Theor. Issues Ergon. Sci. 11(4), 272–286 (2010)
5. Dickinson, T.L., McIntyre, R.M.: A conceptual framework for teamwork measurement. In: Brannick, M.T., Salas, E., Prince, C. (eds.) Team Performance Assessment and Measurement: Theory, Methods, and Application, pp. 19–44. Lawrence Erlbaum, Mahwah (1997)
6. Papenfuss, A.: Phenotypes of teamwork – an explorative study of tower controller teams. In: HFES International Annual Meeting, San Diego (2013)
7. Keyton, J., Ford, D.J., Smith, F.L.: Communication, collaboration, and identification as facilitators and constraints of multiteam systems. In: Zaccaro, S.J., Marks, M.A., DeChurch, L. (eds.) Multiteam Systems: An Organization Form for Dynamic and Complex Environments, pp. 173–191. Routledge, New York (2012)
8. Papenfuss, A., Carstengerdes, N., Günther, Y.: Konzept zur Kooperation in Flughafen-Leitständen. In: 57. Fachausschusssitzung Anthropotechnik, Rostock, Germany (2015)
9. Bruder, C.: DLR COCO-Symposium about collaborative operations in control rooms (meeting report). Aviat. Psychol. Appl. Hum. Fact. 8(1), 58–61 (2018)
10. Papenfuss, A.: Identifizierung leistungsrelevanter Merkmale der Teamkommunikation in der Domäne Flugsicherung. Fakultät V – Verkehrs- und Maschinensysteme, TU Berlin, Berlin (2019)
11. Straus, S.G., McGrath, J.E.: Does the medium matter? The interaction of task type and technology on group performance and member reactions. J. Appl. Psychol. 79(1), 87–97 (1994)
12. Graetz, K.A., et al.: Information sharing in face-to-face, teleconferencing, and electronic chat groups. Small Group Res. 29(6), 714–743 (1998)
13. Purdy, J.M., Nye, P., Balakrishnan, P.V.: The impact of communication media on negotiation outcomes. Int. J. Confl. Manag. 11(2), 162–187 (2000)
14. Fischer, U., Orasanu, J.M.: Say it again, Sam! Effective communication strategies to mitigate pilot error. In: Proceedings of the 10th International Symposium on Aviation Psychology, Columbus, OH (1999)
15. Bales, R.F.: The equilibrium problems in small groups. In: Argyle, M. (ed.) Social Encounters: Contributions to Social Interaction. Aldine Transaction, New Brunswick (1973)
16. Silberstein, D., Dietrich, R.: Cockpit communication under high cognitive workload. In: Dietrich, R. (ed.) Communication in High Risk Environments, pp. 9–56. Buske, Hamburg (2003)
17. Fowlkes, J.E., et al.: Improving the measurement of team performance: the TARGETs methodology. Military Psychol. 6(1), 47–61 (1994)
18. Huang, X., Baker, J., Reddy, R.: A historical perspective of speech recognition. Commun. ACM 57(1), 94–103 (2014)
19. Du, J., et al.: The USTC-iFlytek systems for CHiME-5 challenge. In: 5th International Workshop on Speech Processing in Everyday Environments (CHiME 2018), Hyderabad, India (2018)
20. Wang, D., Zheng, T.F.: Transfer learning for speech and language processing. In: 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), Hong Kong, China. IEEE (2015)
21. Gref, M., Schmidt, C.A., Köhler, J.: Improving robust speech recognition for German oral history interviews using multi-condition training. In: Speech Communication; 13. ITG-Fachtagung Sprachkommunikation (2018)
22. Schmidt, C., Stadtschnitzer, M., Köhler, J.: The Fraunhofer IAIS audio mining system: current state and future directions. In: Speech Communication; 12. ITG Symposium. VDE (2016)
23. Entin, E.E., Serfaty, D.: Adaptive team coordination. Hum. Fact. J. Hum. Fact. Ergon. Soc. 41(2), 312–325 (1999)
24. Swerts, M.: Prosodic features at discourse boundaries of different strength. J. Acoust. Soc. Am. 101(1), 514–521 (1997)
25. Krippendorff, K.: Reliability in content analysis: some common misconceptions and recommendations. Hum. Commun. Res. 30(3), 411–433 (2004)
26. Schermuly, C.C., et al.: Das Instrument zur Kodierung von Diskussionen (IKD). Zeitschrift für Arbeits- und Organisationspsychologie 54(4), 149–170 (2010)
27. Futoran, G.C., Kelly, J.R., McGrath, J.E.: TEMPO: a time-based system for analysis of group interaction process. Basic Appl. Soc. Psychol. 10(3), 211–232 (1989)
28. Badke-Schaub, P., et al.: Mental models in design teams: a valid approach to performance in design collaboration? CoDesign 3(1), 5–20 (2007)
29. Kanki, B.G., Helmreich, R.L., Anca, J.: Crew Resource Management, 2nd edn. Academic Press, San Diego (2010)
30. Hofinger, G., et al.: FOR-DEC & Co: Hilfen für strukturiertes Entscheiden im Team. In: Heimann, R., Strohschneider, S., Schaub, H. (eds.) Entscheiden in kritischen Situationen: Neue Perspektiven und Erkenntnisse, pp. 119–136. Verlag für Polizeiwissenschaft (2014)
31. Orasanu, J.M., Fischer, U.: Team cognition in the cockpit: linguistic control of shared problem solving. In: Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, Bloomington, Indiana. Lawrence Erlbaum Associates (1992)
32. Sexton, J.B., Helmreich, R.L.: Analyzing cockpit communication: the links between language, performance, error, and workload. In: Proceedings of the Tenth International Symposium on Aviation Psychology, Columbus, OH (1999)
33. Ohneiser, O., et al.: Understanding tower controller communication support in air traffic control displays. In: 12th SESAR Innovation Days, Budapest, Hungary (2022)
34. Landini, F., et al.: Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks. Comput. Speech Lang. 71 (2022)

Parsing Pointer Movements in a Target Unaware Environment

Jonah Scudere-Weiss1(B), Abigail Wilson1, Danielle Allessio2, Will Lee2, and John Magee1

1 Department of Computer Science, Clark University, Worcester, MA 01610, USA
{Jscudereweiss,abwilson,jmagee}@clarku.edu
2 College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA 01003, USA

Abstract. Analysis of the movements of the mouse pointer could lead to valuable insights into a user's mental status in digital environments. Previous research has yielded data showing a significant link between user mental status and pointer movements [1]. However, there is currently no standardized system to detect and parse out individual targeted movements of a mouse pointer by a user in an active environment. Active analysis of mouse movements could be useful in situations where the emotional state of the user is being measured. Data was collected through the Mathspring Project, including the results of problems solved, the facial expressions and self-reported emotions of students, and the movements of the mouse pointer, which are the focus of this work [3]. Although a connection has been shown in previous research [1], the ability to track this in a live system is held back by the manual process required to split the motions of the pointer. The focus of this project is the development of a generalizable system that parses these movements automatically without requiring much processing power or a large amount of training data for each new environment.

Keywords: Trust in Automation/Automation and Autonomous Systems · User Experience Improvement · Human Factor Measures and Methods

1 Structure of a Mouse Movement

1.1 Fitts' Law

The movement of a mouse has several complexities but can generally be split into two parts: an initial, longer motion in a roughly straight line, and a series of corrective movements leading into the actual target. This motion can be described using Fitts' Law, which states that the time it takes to reach a target is a function of the ratio between the width of the target and the distance to the target, and is defined as follows:

MT = a + b · ID = a + b · log2(2A/W)    (1)

where a and b are empirically derived constants, ID is the index of difficulty, equivalent to log2(2A/W), A is the amplitude of the movement, and W is the width of the target [2].
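As a minimal illustration of Eq. (1), the following sketch computes the predicted movement time for a given target; the values of a and b are placeholders that would normally be fitted to observed data.

```python
import math

def fitts_movement_time(amplitude: float, width: float,
                        a: float = 0.1, b: float = 0.15) -> float:
    """Predicted movement time MT = a + b * log2(2A/W) (Fitts' Law).

    amplitude: distance to the target (A)
    width: target width along the movement axis (W)
    a, b: empirically derived constants (placeholder values here)
    """
    index_of_difficulty = math.log2(2 * amplitude / width)
    return a + b * index_of_difficulty

# Example: a 400 px reach to a 40 px wide target
print(fitts_movement_time(400, 40))
```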


1.2 Inversion of Fitts' Law

This analysis is performed in a target unaware environment: the system has no knowledge of what is on the screen, only of what the pointer itself is doing. Fitts' Law is only applicable when the size of and distance to a target are known. In target unaware terms, detecting a mouse movement involves analyzing a vector of three major factors: movement time, distance, and angle (Fig. 1).

Fig. 1. An example of an optimal mouse movement

By starting with an assumption of Fitts' Law and working backwards, a framework for parsing mouse movements was developed around three major factors: the time delta (TΔ), the distance delta (DΔ), and the angle (θ). The TΔ between mouse samples is an important indicator of discrete movement events in a target unaware environment: a large TΔ indicates a long period of non-motion, whereas a small TΔ shows active movement. Unlike time, DΔ and θ between the current point in the motion and the points directly before and after it must be analyzed together to detect new motions. A low θ indicates that the value is near to π in radians, i.e., the pointer is continuing in a nearly straight line. DΔ is measured as high if two requirements are met: first, DΔ is longer than a baseline distance value; second, DΔ must be larger than a given percentage of the sum of all distances accumulated so far within the currently active movement.
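A minimal sketch of how these three factors could be derived from consecutive pointer samples is shown below; the sample format and the exact per-sample definitions of DΔ and θ are our reading of the text, not the authors' implementation.

```python
import math
from typing import Tuple

Sample = Tuple[float, float, float]  # (x, y, timestamp in ms) -- assumed format

def point_features(prev: Sample, curr: Sample, nxt: Sample):
    """Return (time_delta, distance_delta, theta) for the middle sample.

    theta is the interior angle at `curr` between the incoming and outgoing
    segments: values near pi mean a nearly straight continuation (the paper's
    "low theta" case), while small values mean a sharp direction change.
    """
    t_delta = curr[2] - prev[2]
    d_delta = math.hypot(nxt[0] - curr[0], nxt[1] - curr[1])
    d_in = math.hypot(curr[0] - prev[0], curr[1] - prev[1])
    if d_in == 0 or d_delta == 0:
        return t_delta, d_delta, math.pi  # degenerate step: treat as straight
    dot = ((curr[0] - prev[0]) * (nxt[0] - curr[0]) +
           (curr[1] - prev[1]) * (nxt[1] - curr[1]))
    cos_a = max(-1.0, min(1.0, dot / (d_in * d_delta)))
    theta = math.pi - math.acos(cos_a)  # pi = straight continuation, 0 = reversal
    return t_delta, d_delta, theta
```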


1.3 First Algorithm

The following process is the initial mechanism developed to parse motions. The initial motion occurs when DΔ is high and θ is low, so vectors found to contain this pattern qualify as initial motion. Once high θ and low DΔ are detected in the vectors, the motion has entered the corrective phase. Finally, after this corrective movement completes, the motion shifts back to high DΔ and low θ. Upon the detection of this sequence, a new mouse movement is established (Fig. 2).

Fig. 2. A Digraph of the initial system used to detect mouse movements

This system could be used to collect mouse data in a live environment while actively splitting it into distinct cursor movements.

1.4 Reduction of the Inverted Fitts' Law Algorithm

Redundancies. One initial change made very early on was the elimination of the initial_1 state. Although it represented a step at the beginning of the process, the immediate detection of "high angle" meant that removing the initial_1 state produced no change in the data analysis.

Non-angular Motion. One of the major obstacles this first version of the algorithm faced was the issue of angles. While the stated method of analyzing mouse data by inverting Fitts' Law works in theory, the reality is that a mouse movement will not always show large changes in angle. Often a mouse movement will instead have clearly defined start and end points where the user has no difficulty moving from point A to point B in a nearly straight line. This means the angle data can prevent the system from detecting what may otherwise be a mouse movement. The hope was that the time difference criterion would catch more of these straight-line motions, but that proved ineffective.

The Reduced Algorithm. The following process is the reduced mechanism developed to parse motions. The initial motion occurs when DΔ is high, so vectors found to contain this pattern qualify as initial motion. Once low DΔ is detected in the vectors, the motion has entered the corrective phase. Finally, after this corrective movement completes, the motion shifts back to high DΔ. Upon the detection of this sequence, a new mouse movement is established. At any point in this process a new motion can be established via a high TΔ (Fig. 3).


Fig. 3. The State graph for the reduced detection algorithm
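Read as a small state machine over the per-sample features, the reduced detector might be implemented roughly as follows. This is an illustrative sketch, not the authors' code; the threshold parameters are assumptions and correspond to the quantities that the evolutionary algorithm of Sect. 3 is later used to tune (cf. Table 1).

```python
def segment_movements(features, high_t, high_d_base, high_d_frac):
    """Split a stream of (t_delta, d_delta) features into movements.

    features: iterable of (t_delta, d_delta) per pointer sample
    high_t: time gap (ms) that always starts a new movement
    high_d_base: baseline distance for a "high" distance delta
    high_d_frac: fraction of the distance accumulated so far that a
                 "high" distance delta must also exceed
    Returns the sample indices where a new movement is judged to begin.
    """
    boundaries = []
    state = "initial"      # "initial" (ballistic) or "corrective"
    accumulated = 0.0      # distance accumulated within the current movement
    for i, (t_delta, d_delta) in enumerate(features):
        high_d = (d_delta > high_d_base and
                  d_delta > high_d_frac * max(accumulated, 1e-9))
        if t_delta > high_t:                     # long pause: new movement
            boundaries.append(i)
            state, accumulated = "initial", 0.0
        elif state == "initial" and not high_d:  # slowing down: corrective phase
            state = "corrective"
        elif state == "corrective" and high_d:   # speeding up again: new movement
            boundaries.append(i)
            state, accumulated = "initial", 0.0
        accumulated += d_delta
    return boundaries
```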

2 Experiment and Analysis

2.1 Experiment

Although we had a large amount of mouse-movement data from previous tests of the Mathspring system, we lacked the context needed to manually mark the endpoints in that old data, because we did not know what was on the screen at the time of collection. Since this context was needed, the old data was unsuitable for testing the system, and new data with the appropriate context was required. An experiment was designed using the Mathspring system to allow analysis of the mouse-splitting algorithm. Nine college-age students answered math questions in the Mathspring system for a period ranging from forty-five to sixty minutes. During this time a researcher watched over the data being collected, ensuring that the Mathspring system was working as intended and stepping in only if the program itself produced errors. A screen recording was also taken on the device for later analysis. All participants in the study used a mouse to control the pointer; no data was collected from laptop trackpads.

2.2 Analysis

Upon completion of the data collection, researchers went through the recorded screen data and manually marked the locations of the endings of mouse movements using the established framework of mouse movement appearance. This timing data was then synced with the corresponding mouse data to generate the dataset of correct outputs.

2.3 Testing and Fitness

The above system was then coded, and a simple genetic machine learning algorithm was constructed to find the correct variables for the system. Each mouse movement was grouped and the groups were then shuffled into a random order, which maintained the inline nature of the system while allowing randomization of the training, test, and reporting datasets.

3 Evolutionary Algorithm

The implemented evolutionary algorithm worked as follows:
1. Generate a random initial generation of size 150.
2. Evaluate the generation.
3. Run a weighted crossover on the top 30% of the generation to generate 37 children.
4. Apply a small amount of random mutation to those children.
5. Evaluate the children and merge them with the initial 150.
6. Eliminate all candidates whose fitness score falls below the top 150, returning the population to size 150.
7. Repeat steps 3–6 n times, where n is the total number of generations.
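The loop above can be written compactly as shown below. The gene encoding (one value per threshold in Table 1), the weighted-blend crossover, and the Gaussian mutation scale are our assumptions for illustration; the fitness callable stands in for the SBED-based score of Sect. 3.1.

```python
import random

def evolve(fitness, n_genes, generations, pop_size=150, n_children=37,
           top_frac=0.3, mutation=0.05):
    """Minimal (mu + lambda)-style loop mirroring the steps above.

    fitness: callable mapping a gene vector (floats in [0, 1]) to a score
             to maximise; here it would wrap the SBED-based evaluation.
    """
    population = [[random.random() for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[:max(2, int(top_frac * pop_size))]
        children = []
        for _ in range(n_children):
            a, b = random.sample(parents, 2)
            w = random.random()                              # weighted crossover
            child = [w * x + (1 - w) * y for x, y in zip(a, b)]
            child = [min(1.0, max(0.0, g + random.gauss(0, mutation)))
                     for g in child]                          # random mutation
            children.append(child)
        merged = sorted(scored + children, key=fitness, reverse=True)
        population = merged[:pop_size]                        # keep the best 150
    return max(population, key=fitness)
```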

3.1 Fitness Function

Testing the system required a method to score the fitness of each candidate. Originally, edit distance (Levenshtein distance) was thought to be a good candidate for this purpose. However, due to its O(n²) nature and the immense size of the dataset, Levenshtein distance was determined to be too computationally intensive to use for fitness. As such, a new system was developed with our particular goals in mind. An important property of the mouse data is that it consists mostly of the negative case (the mouse is not starting a new movement), with positive movements representing n% of the dataset. With this consideration, and the fact that the data was classified as a binary of either is, or is not, a new movement, a simplified edit distance formula was determined. This process has been named Simplified Binary Edit Distance (SBED).

Simplified Binary Edit Distance. The process of SBED is as follows. First, take the system output (Y_S) and the true output (Y_T) and construct a list for each containing the index locations of the positive values (Y_S → L_S, Y_T → L_T). Then, for each value in L_S, take the minimum of the differences between that value and all values in L_T, generating the list D_S. Repeat this same process from L_T to L_S and generate D_T. Finally, take the sum of all values in D_S + D_T, generating a final score. This score can be minimized as an error measure, or divided by n(n + 1)/2, where n is the length of Y_S, to generate a score to maximize.

SBED_score = Σ_{i=0}^{k} min |L_{S,i} − L_T| + Σ_{j=0}^{m} min |L_{T,j} − L_S|    (2)

where k is the number of values in L_S and m is the number of values in L_T. Notably, the basic setup of SBED has O(k²) time complexity, where k is the number of positive cases. However, this can be made O(k) (linear) by using ordered lists, iterating through both lists simultaneously, and stepping forward in either one when the minimum for that value is found. SBED relies on three key assumptions. First, there are only two cases; if there are more than two cases, this equation fails. Second, only distances between the positive cases are relevant; if positive cases are missing or in the wrong locations, this will appear very clearly in the distance. Finally, it is important that the data is heavily weighted toward either the positive or the negative case. If the expected output has a more even distribution of cases, the SBED score becomes less valuable; this is especially important when using the exponential implementation of SBED for time purposes. The SBED score increases exponentially when outputs contain different counts of mouse movements. It increases similarly if those values are weighted to one side of the dataset, as that causes the same distances to be measured multiple times. Because of this, SBED encourages the system to find similar numbers of mouse movements in similar locations. Upon completion of training, the results of the system were tested against an actual Levenshtein distance to check that the results were accurate.
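A direct implementation of SBED as described might look like the sketch below. The O(k·m) double loop mirrors the definition, the linear-time variant is noted in a comment, and the "percentage" form follows one reading of the n(n + 1)/2 normalisation that is consistent with Table 2 (higher is better); none of this is the authors' exact code.

```python
def sbed(y_system, y_true):
    """Simplified Binary Edit Distance between two binary sequences.

    y_system, y_true: sequences of 0/1 flags, 1 marking "a new movement
    starts at this sample". Lower is better; 0 means the positive indices
    coincide exactly.
    """
    ls = [i for i, v in enumerate(y_system) if v]  # positive indices, system
    lt = [i for i, v in enumerate(y_true) if v]    # positive indices, truth
    if not ls or not lt:
        return 0.0 if ls == lt else float("inf")
    d_s = sum(min(abs(i - j) for j in lt) for i in ls)
    d_t = sum(min(abs(j - i) for i in ls) for j in lt)
    return d_s + d_t
    # A linear-time variant walks both (already sorted) index lists with two
    # pointers, advancing whichever lags, instead of the inner min() scans.

def sbed_percentage(y_system, y_true):
    """Normalised score to maximise: 1 - SBED / (n(n + 1)/2), n = len(y_system)."""
    n = len(y_system)
    return 1.0 - sbed(y_system, y_true) / (n * (n + 1) / 2)
```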

4 Results

We were able to achieve a system with a degree of success (Tables 1, 2 and 3).

Table 1. Final values of the model after N generations

Titles           High time delta (TΔ)   Low distance delta (DΔ)   High distance (percentage)   High radian delta (RΔ)
Scaled Reduced   0.812061               0.740962                  0.428721                      NA
Scaled Full      0.830061               0.61366                   0.132242                      0.677810

Table 2. The final scores of the system

Reduced/Full   SBED score   SBED percentage
Reduced        153204       0.9929088619811214
Full           148825       0.9997178647667658

Table 3. More standard methods of understanding the results of a binary classifier

Reduced/Full   TP (True Positive)   FP (False Positive)   TN (True Negative)   FN (False Negative)
Reduced        2                    58                    18626                274
Full           2                    54                    18640                274

Counter to what was expected, the full prediction system, which includes the change in angle, seems to perform better than the reduced one. While a difference of roughly 0.007 may seem small, since the scale for the upper limit of the score is exponential, small differences can matter a great deal as the score approaches perfection. One possible reason for the increased performance is that using angles to detect movements reduces the total number of movements detected, which can be seen in the lower false positive rate of the full system (Figs. 4 and 5).


Fig. 4. Progression of the best and worst child in each generation

Fig. 5. The best generational outputs for movements over time


5 Discussion

Many factors could cause errors, but a small number of additional false positives is acceptable here, as it is distinctly possible that the ends of some mouse movements were missed in the manual analysis process. The generalizability of this system could also be limited by the granularity of the data. The Mathspring mouse data is sampled at a rate of 20 ms between detections; if a system collects data more slowly or more quickly than this, the specific values needed by the framework may change. One factor that may affect the overall accuracy of our system, and specifically its ability to predict exactly where movements end, is that the screen recorder only works at 30 fps. This means the granularity of the human analysis data and the granularity of the mouse data could differ, making precise alignment difficult.

5.1 Future Ideas

Combined Classifier. One idea for improving this classification is to use machine learning to find an optimal linear classifier over distance and time to obtain an initial classification instead of the random starting point. Then, as in the current system, this mechanism would be run over the combined dataset to generate the outputs. This would allow us to classify the initial mouse movements using time, and then run our classification system over the data in between. This could allow faster identification of points, to an acceptable degree, followed by finding the correct classification for the remaining points. This option could lead to a more accurate algorithm, but has the downside of requiring a new form of analysis.

Full ML. Another option is a solution that is fully machine learning based. The goal is still a lightweight and replicable system, so its complexity would need to be capped; however, an effective implementation could eliminate the need for an engineered, generalized solution, or point us toward a complete solution.

Improved Evolutionary Algorithm. The current version of the evolutionary algorithm does not have any mechanism separating elite mutation from combined children. In fact, it does not have explicit parents at all, instead generating a general gene pool from the top 30% of the previous generation. This particular implementation could be causing the evolutionary algorithm to fall into a local maximum, which a better implementation could escape.


6 Conclusion

The nature of human fine motor movement can provide insight into a user's mental state [1]. As such, the movement of a mouse by a user can contribute meaningful data; however, a mouse movement is a complicated motion to analyze, and it can be difficult to know where one movement stops and another begins. The goal of this research is to build a system which can take raw data from mouse movements and parse it into a set of pointer movements with defined endpoints, without any context about the displayed page's information. This novel method of splitting the mouse data will be used to improve the emotion detection system in conjunction with the other data collected using the Mathspring system.

Acknowledgements. NSF support for this project is acknowledged and greatly appreciated (IIS-1551590, IIS-1551589). The authors gratefully thank the MathSpring team at the University of Massachusetts, Amherst for assistance with data collection.

References
1. Yamauchi, T., Leontyev, A., Razavi, M.: Assessing emotion by mouse-cursor tracking: theoretical and empirical rationales. In: 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 89–95 (2019). https://doi.org/10.1109/ACII.2019.8925537
2. Chen, Y., Hoffmann, E.R., Goonetilleke, R.S.: Structure of hand/mouse movements. IEEE Trans. Hum.-Mach. Syst. 45(6), 790–798 (2015). https://doi.org/10.1109/THMS.2015.2430872
3. Yu, H., et al.: Measuring and integrating facial expressions and head pose as indicators of engagement and affect in tutoring systems. In: Sottilare, R.A., Schwarz, J. (eds.) HCII 2021. LNCS, vol. 12793, pp. 219–233. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-77873-6_16

A Framework of Pedestrian-Vehicle Interaction Scenarios for eHMI Design and Evaluation Yuanming Song1 , Xiangling Zhuang1(B) , and Jingyu Zhang2,3 1 Shaanxi Key Laboratory of Behavior and Cognitive Neuroscience, School of Psychology,

Shaanxi Normal University, Xi’an, China [email protected] 2 CAS Key Laboratory of Behavioral Science, Institute of Psychology, Beijing, China 3 Department of Psychology, University of the Chinese Academy of Sciences, Beijing, China

Abstract. The emergence of autonomous vehicles has brought new challenges for pedestrian-vehicle interaction (PVI). To facilitate such interactions, many external human-machine interface (eHMI) concepts have been proposed. However, the development and evaluation of these eHMIs should be based on representative PVI scenarios in daily life, which have not been systematically constructed. The goals of this study are to: 1) identify typical PVI scenarios and analyze their constituent elements; 2) provide a practical method to generate PVI scenarios to support the development and evaluation of eHMIs. With a literature review and a focus group interview, we concluded four typical sets of PVI scenarios: street-crossing scenarios, starting scenarios, parking scenarios, and following scenarios. After analyzing the key elements of these scenarios, we constructed, from the perspective of the human-machine-environment system, a three-dimension scenario analysis framework consisting of pedestrian, vehicle, and environmental variables. We then used our framework to generate several common or challenging PVI scenarios where eHMIs can play a role and need to be evaluated. Our work contributes to the development and evaluation of eHMIs in both academia and industry.

Keywords: Autonomous Vehicles · External Human-machine Interfaces · Pedestrian-vehicle Interaction

1 Introduction As autonomous vehicles (AVs) gradually join everyday traffic, they also bring with them some new challenges that differ from traditional interactions between vehicles and other road users (ORUs). Pedestrians are typical vulnerable road users (VRUs) among ORUs. In the processes of traditional pedestrian-vehicle interaction (PVI), pedestrians can use driver cues (e.g., eye contact) to decide whether they are noticed and whether they should pass [1]. However, for high-level autonomous vehicles, driving tasks are often operated by the system, which makes driver cues no longer reliable [2]. Besides, the driving behavior of the system and the human drivers may be different. If pedestrians still interact with AVs based on their experience, they may form wrong expectations that lead to traffic accidents [3].


To facilitate interactions between pedestrians and AVs, or between ORUs and AVs in general, many external human-machine interfaces (eHMIs) have been explored. These interfaces can be regarded as essentially signaling devices that present specific content (e.g., motions, intentions, suggestions, etc.) in a specific form (e.g., signs, animations, sounds, etc.) to achieve vehicle-to-pedestrian (V2P) communication or V2X communication in general. For example, Clamann, Aubert, and Cummings [4] designed an eHMI that directly presents the speed of the vehicle in the form of text. Velasco, Farah, van Arem, and Hagenzieker [5] designed an eHMI in the form of icons that suggests pedestrians can cross. The eHMI of Deb, Strawderman, and Carruth [6] conveys the same message in the form of voice. Dey et al. [7] made a general classification and summary of various eHMIs, and Jiang et al. [2] analyzed and sorted out the eHMIs for pedestrian-vehicle interaction based on the pedestrian road crossing decision-making model. With the help of these eHMIs, pedestrians and ORUs are expected to have a clearer understanding of AVs’ status, form more reliable situation awareness, and make safer and more efficient decisions. However, these benefits depend on whether the eHMIs have been designed to match pedestrians’ and vehicles’ needs, which vary in different interaction scenarios. For example, for a pedestrian preparing to cross the road, it is important to understand the vehicle’s motion and yield intention, while for a pedestrian blocking the way of the vehicle, it is important to notice the vehicle and give way. Therefore, as the first step towards user-centered eHMIs, we need many sets of typical interaction scenarios at both the development and evaluation phases. From the perspective of development, the key information of ORUs, AVs, and the environment are all included in a scenario. Through the analysis and organization of the key information, we can set proper goals, contents, and forms for eHMIs. From the perspective of evaluation, a scenario is also a test environment, which includes several challenges that need to be overcome by eHMIs. The developed eHMIs can be tested and compared in one or several scenarios. However, current studies usually selectively chose one or two typical scenarios (usually including a vehicle yield to a pedestrian waiting to cross) for evaluating their own designs. A comprehensive list of ORU-AV interaction scenarios that can support more tailored and adaptive designs is still unavailable. Our study hopes to fill in this gap. Since an exhaustive list of all potential scenarios can be excessively detailed, we constrained the scope to pedestrian-AV interaction or pedestrian-vehicle interaction scenarios in general. The goal of our study is to summarize and generalize typical PVI scenarios and build a framework, through which we can generate scenarios to assist the development and evaluation of eHMIs.


2 Methods and Results 2.1 Pedestrian-Vehicle Interaction Scenarios from Literature We first reviewed literature for potential interaction scenarios. Since there were no studies or summary documents that directly list PVI scenarios, we referred to the literature analyzing pedestrian-involved accidents instead. The reason why we selected the accident-related literature is that the safety issue is still one of the most important issues for AVs. In these scenarios, eHMIs may play a very critical role. Pedestrians crossing the street can be a set of risky traffic situations. Chen, Dai, Wang, and He [8] analyzed 49 pedestrian-vehicle serious traffic accidents collected by the Shanghai United Road Traffic Safety Scientific Research Center (SHUFO). They found that about 76% of accidents occurred on straight roads and about 16% of accidents occurred at intersections. The main causes of accidents on straight roads were pedestrians jaywalking and vehicles not yielding, while at intersections were pedestrians running red lights and vehicles not yielding. The research of Sewalkar and Seitz [9] further distinguished the motion state of pedestrians on straight roads and found that 88% of pedestrian fatal accidents were pedestrians crossing in front of coming vehicles, and 12% were pedestrians and vehicles moving in the same direction. Zhang, Liu, Li, and Gao [10] analyzed 181 pedestrian fatal accidents recorded in the National Automobile Accident In-depth Investigation System (NAIS). They found that 81.2% of pedestrian fatal accidents occurred on straight roads without street-crossing facilities (e.g., traffic lights, crosswalks), 7.8% of accidents occurred at intersections, half of which were T-intersections and half were cross-intersections, and 5.5% of accidents occurred on sidewalks. For the traffic lane, 16% of accidents occurred on a single lane, 42.0% occurred on the left lane (near the middle of the road), 18.8% occurred on the middle lane (or lanes), and 16.0% occurred on the right lane (away from the middle of the road). Among the 146 accidents caused by pedestrians jaywalking, 49.8% of them crossed the green verge, and 24.3% of them crossed the yellow line in the middle of the road. It is particularly worth noting that in their sample none of the accidents occurred in the process of a right turn. Tan, Che, Xiao, Li, Zhang, Xu [11] also used NAIS and analyzed 441 accidents in which the Abbreviated Injury Scale (AIS) for pedestrians reached level 3. They found that 68.9% of accidents occurred on straight roads where vehicles were moving forward, and pedestrians crossed the road from both sides. About 7.5% of accidents occurred at T-intersections and cross-intersections where vehicles were moving forward, and pedestrians crossed the road from the right side. So far, the findings of multiple studies are highly consistent. We can conclude from the perspective of pedestrian-involved traffic accidents that street-crossing (especially on straight roads) scenarios are one typical and risky set of PVI scenarios.


2.2 Pedestrian-Vehicle Interaction Scenarios from a Focus Group Interview

We also held an online focus group interview on the theme of PVI scenarios, as a supplement capturing the diversity and uniqueness of PVI scenarios. Five participants attended the interview: two experts in the field of traffic ergonomics, one eHMI product manager from a car company, one novice driver (driving experience of less than 1 year), and one expert driver (driving experience of more than 20 years). They were divided into a practitioner group and a driver group, providing knowledge from different perspectives. The researcher served as the host, with an assistant taking notes. Group members were guided to discuss three main topics: the recall of PVI scenarios, their feelings in these scenarios, and the potential role of eHMIs. First, participants were asked to recall the situations in which they had encountered vehicles as a pedestrian, and then the situations in which they had encountered pedestrians as a driver or a passenger. Participants were encouraged to recall as meticulously as possible, and the researcher also kept asking for details. They were also asked to imagine how things would be different if the vehicle became an autonomous vehicle. Then, based on the scenarios they had just described, participants continued to talk about how they felt during the interaction and how things would change for an autonomous vehicle. We paid particular attention to the safety and emotional aspects of the experience. Finally, the researcher introduced the concept of eHMI to the participants as succinctly as possible so as not to cause unnecessary bias. Participants were free to talk about how these devices could improve pedestrian-vehicle interaction, how they could help pedestrians or vehicles, what they should do, what they should look like, etc.

The scenarios mentioned by the participants included crosswalks, apartment complexes, bus stops, parking spaces in shopping malls, wet markets, hospitals, schools, etc. Strictly speaking, these were not PVI scenarios, but sites where pedestrians and vehicles met. Participants were encouraged to recall as many details as possible; they also talked about how they felt during their own pedestrian-vehicle interaction experiences and how eHMIs could play a beneficial role. A participant said, for example, "You may encounter pedestrians near the bus stop. Maybe one just gets off the bus and suddenly jumps out from the front or back of the bus. The cars behind would have no time to react. That's what we often call the 'multiple threat'. Such situations are very dangerous and stressful for both pedestrians and drivers. An eHMI may help if it can remind pedestrians not to do this because there is a car coming nearby or detect and tell the driver about the unseen pedestrian in advance. Both ways can help". For another example, a participant said, "Wet markets can be very chaotic. Hawkers set up their stalls directly on the road. There are many people buying things. It is difficult for vehicles to drive on such a road. First, it is too noisy for pedestrians to notice the cars behind. Second, there are so many people gathering together and they just don't let the cars pass. Driving on such a road is annoying. Wish I had a better way than honking to get pedestrians out of the way".


Many participants mentioned parking spaces in shopping malls and apartment complexes where they were about to drive away or park, but pedestrians blocked the way. Some participants also mentioned that on the interior roads of apartment complexes, they sometimes had to drive slowly behind pedestrians because they shared the road. We analyzed the conversations of the participants and found that there were three key variables of a PVI scenario. The first was the location where a pedestrian-vehicle interaction happened. We summarized four types of locations from a variety of specific places. They were 1) straight roads, 2) intersections, 3) interior roads, and 4) parking places, ordered by the general vehicle speed from fast to slow. The second was the status of the vehicle. We summarized five types of vehicle status, which were 1) holding still, 2) accelerating forward, 3) cruising, 4) slowing down, and 5) moving backward. They were descriptions of vehicles’ physical motion. The third was the status of the pedestrian. We summarized three types of pedestrian status, which were 1) standing still, 2) walking laterally, and 3) walking longitudinally. They were descriptions of pedestrians’ physical motion relative to the forward direction of the vehicle. In a more general way, besides the street-crossing scenarios mentioned above, we concluded another three sets of PVI scenarios from our focus group interview. They were starting, parking, and following scenarios. Starting scenarios took place at parking places where vehicles were holding still and about to leave but pedestrians blocked the way. Parking scenarios took place also at parking places where vehicles were holding still and about to park but pedestrians blocked the way. Following scenarios took place on interior roads where vehicles and pedestrians shared the same lane and vehicles often had to follow pedestrians slowly.

3 A Framework of Pedestrian-Vehicle Interaction Scenarios In addition to the key variables about PVI scenarios that had been summarized above, we added more variables that may affect pedestrian-vehicle interaction and constructed a more comprehensive framework. Because our ultimate goal was to use this framework to generate PVI scenarios and support the development and evaluation of eHMIs, we mainly considered adding variables that could affect eHMIs’ functional goal, the content of communication, and the form of communication to this framework. • Functional goal: The purpose of using an eHMI in a PVI scenario and the effect that is expected to be achieved. For example, in street-crossing scenarios, the functional goal of an eHMI is to remind pedestrians to cross the road or not, while in starting scenarios, the functional goal is to ask pedestrians to give way. The comparison between the observed effect and the expected effect can measure the functional effectiveness of an eHMI. • The content of communication: Specific information (e.g., physical motion, intention, suggestion) that is conveyed through eHMIs. • The form of communication: The way or medium by which the information is conveyed (e.g., signs, animations, sounds).


Many variables may affect eHMIs’ functional goal, the content of communication, and the form of communication. To be more concise, we tried to include those related to the dynamic pedestrian road crossing decision-making model [2]. We also referred to the general framework of a human-machine-environment system in engineering psychology to guide our analysis. The whole framework was shown in Table 1. Table 1. A three-dimension framework of pedestrian-vehicle interaction scenarios with the possible settings of variables. Dimension

Variable

Setting

Pedestrian

Intent

Crossing/Strolling/Unspecified

Position (Relative to vehicle)

Front/Right front/Right/Right rear/Rear/Left rear/Left/Left front (Or 0–360°, the forward direction of the vehicle is 0°, clockwise)

Direction (Relative to vehicle)

Go forward/Go right/Go backward/Go left (Or 0–360°, the forward direction of the vehicle is 0°, clockwise)

Movement

Standing/Walking/Running

Population

Single/Multiple

Age

Child/Adult/Senior

Attention

Focused/Visual occupied/Auditory occupied/Audiovisual occupied

Task

Start/Go straight/Turn/Park

Distance (Relative to pedestrian)

Near/Far (Or accurate to meters)

Motion

Holding still/Accelerating forward/Cruising/Slowing down/Moving backward

Location

Straight road/Intersection/Interior road/Parking place

Facilities

None/Crosswalk/Traffic light/Both

Illumination

Daylight/Streetlights/None

Disturbance

None/Visual/Auditory/Audiovisual

Vehicle

Environment
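To make the framework operational, Table 1 can be encoded as a simple data structure from which scenarios are sampled. The sketch below is only an illustration of such a generator and is not part of the authors' method; the variable names mirror the table.

```python
import random

FRAMEWORK = {
    "pedestrian": {
        "intent": ["crossing", "strolling", "unspecified"],
        "position": ["front", "right front", "right", "right rear",
                     "rear", "left rear", "left", "left front"],
        "direction": ["go forward", "go right", "go backward", "go left"],
        "movement": ["standing", "walking", "running"],
        "population": ["single", "multiple"],
        "age": ["child", "adult", "senior"],
        "attention": ["focused", "visual occupied", "auditory occupied",
                      "audiovisual occupied"],
    },
    "vehicle": {
        "task": ["start", "go straight", "turn", "park"],
        "distance": ["near", "far"],
        "motion": ["holding still", "accelerating forward", "cruising",
                   "slowing down", "moving backward"],
    },
    "environment": {
        "location": ["straight road", "intersection", "interior road", "parking place"],
        "facilities": ["none", "crosswalk", "traffic light", "both"],
        "illumination": ["daylight", "streetlights", "none"],
        "disturbance": ["none", "visual", "auditory", "audiovisual"],
    },
}

def sample_scenario(overrides=None):
    """Draw one PVI scenario, optionally pinning some settings and
    randomising the rest."""
    scenario = {dim: {var: random.choice(vals) for var, vals in variables.items()}
                for dim, variables in FRAMEWORK.items()}
    for (dim, var), value in (overrides or {}).items():
        scenario[dim][var] = value
    return scenario

# Example: a standard street-crossing scenario with the rest randomised
street_crossing = sample_scenario({("pedestrian", "intent"): "crossing",
                                   ("vehicle", "task"): "go straight",
                                   ("environment", "location"): "straight road"})
```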


According to our framework, we constructed and made schematic diagrams of the above four typical scenarios (street-crossing, starting, parking, and following). Since street-crossing scenarios were probably the most important, we also stacked some variables for it, making three variants of different settings. Street-crossing scenarios including variants are shown in Fig. 1. Starting, parking, and following scenarios are shown in Fig. 2.

Fig. 1. Standard street-crossing scenario and its variants as an example. The attached table is a subset of the PVI scenario framework because it only shows the changed settings. In these scenarios, the same settings are that the intent of pedestrians is to cross, and they are all located in front of the vehicle to the right, standing. The task of the vehicle is to go straight, coming from a distance. The place where the interaction takes place is on a straight road. The easy variant adds crossing facilities, namely crosswalks and traffic lights. They make it safer for pedestrians to cross the road. The hard variant changes the illumination to streetlights and adds other vehicles parked on both sides of the road at night, causing visual occlusion. The customized variant means that researchers and designers can generate their own scenarios through our framework. For example, the number of pedestrians becomes multiple, and the age becomes senior. These settings require eHMI designers to consider communication directionality and politeness issues.


Fig. 2. Standard starting, parking, and following scenarios. The attached table only shows the changed settings. In these scenarios, the same settings are that there is only one adult pedestrian, the vehicle is near the pedestrian, there are no crossing facilities or special disturbances, and the interaction happens during the day. For the starting and parking scenarios, the intent of the pedestrians is not clear, but they block the way of the vehicle. For the following scenario, the pedestrian is just walking around with their back to the vehicle, without noticing that there is an incoming vehicle.

4 Discussion

We summarized pedestrian-vehicle interaction (PVI) scenarios and constructed a three-dimension (pedestrian, vehicle, and environment) framework for generating scenarios, aiming to support the development and evaluation of eHMIs. By reviewing the literature, we identified one typical and risky set of PVI scenarios, namely street-crossing scenarios. However, the shortcoming of summarizing PVI scenarios from traffic accidents is that low-risk scenarios may be ignored. In low-risk scenarios, pedestrian-vehicle interaction may be relatively safer, and the emphasis of eHMIs' function may shift from ensuring safety to improving traffic efficiency and experience. Therefore, as a supplement, our focus group interview provided more PVI scenarios; through it, we summarized another three sets of PVI scenarios: starting, parking, and following scenarios. We generated some examples of PVI scenarios with different settings through our framework. Our work was more of a top-down approach. This method may miss some bottom-up, life-based, and specific scenarios, but it can generalize and summarize several types of scenarios with similar characteristics. Specific sites (e.g., apartment complexes, schools, etc.) did not appear in our framework because we found that some specific sites can be


deconstructed into a combination of variables’ settings. Take the school site, for example, the surrounding roads are essentially a combination of straight roads and intersections, children may appear among pedestrians, and there may be parked vehicles which could cause visual occlusion. The contribution of our study is that a comprehensive series of PVI scenarios can be generated according to the development and evaluation needs of eHMI designers and researchers. They can also start from the built scenarios to explore what eHMI is needed.

5 Conclusions Street-crossing (especially on straight roads) scenarios are one typical and risky set of pedestrian-vehicle interaction scenarios. Starting, parking, and following scenarios are another three typical, low-risk, and communication-needed sets of pedestrian-vehicle interaction scenarios. Through a three-dimension framework of pedestrian, vehicle, and environment, a series of pedestrian-vehicle interaction scenarios can be generated to support the development and evaluation of eHMIs. Acknowledgements. This work is supported by the National Natural Science Foundation of China (31970998) and Youth Innovation Team of Shaanxi Universities (2020084).

References
1. Sucha, M., Dostal, D., Risser, R.: Pedestrian-driver communication and decision strategies at marked crossings. Accid. Anal. Prev. 102, 41–50 (2017). https://doi.org/10.1016/j.aap.2017.02.018
2. Jiang, Q., Zhuang, X., Ma, G.: Evaluation of external HMI in autonomous vehicles based on pedestrian road crossing decision-making model. Adv. Psychol. Sci. 29(11), 1979–1992 (2021)
3. Hagenzieker, M.P., et al.: Interactions between cyclists and automated vehicles: results of a photo experiment. J. Transp. Saf. Secur. 12(1), 94–115 (2020). https://doi.org/10.1080/19439962.2019.1591556
4. Clamann, M.P., Aubert, M.C., Cummings, M.L.: Evaluation of vehicle-to-pedestrian communication displays for autonomous vehicles. In: Proceedings of the Transportation Research Board 96th Annual Meeting, Washington, DC, USA (2017)
5. Velasco, J.P.N., Farah, H., van Arem, B., Hagenzieker, M.P.: Studying pedestrians' crossing behavior when interacting with automated vehicles using virtual reality. Transp. Res. F Traffic Psychol. Behav. 66, 1–14 (2019). https://doi.org/10.1016/j.trf.2019.08.015
6. Deb, S., Strawderman, L.J., Carruth, D.W.: Investigating pedestrian suggestions for external features on fully autonomous vehicles: a virtual reality experiment. Transp. Res. F Traffic Psychol. Behav. 59, 135–149 (2018). https://doi.org/10.1016/j.trf.2018.08.016
7. Dey, D., et al.: Taming the eHMI jungle: a classification taxonomy to guide, compare, and assess the design principles of automated vehicles' external human-machine interfaces. Transp. Res. Interdisc. Perspect. 7, 100174 (2020). https://doi.org/10.1016/j.trip.2020.100174
8. Chen, J., Dai, C., Wang, H., He, Y.: Analysis of pedestrian protection strategies based on the characteristics of pedestrian accidents in China. Beijing Automot. Eng. 191(2), 1–4+13 (2014)


9. Sewalkar, P., Seitz, J.: Vehicle-to-pedestrian communication for vulnerable road users: survey, design considerations, and challenges. Sensors 19(2), 358 (2019). https://doi.org/10.3390/s19020358
10. Zhang, S., Liu, L., Li, P., Gao, W.: Characteristics and mechanism of fatal traffic accidents involving pedestrians: based on 181 in-depth investigated cases. J. Transp. Inf. Saf. 36(06), 16–23 (2018)
11. Tan, Z., Che, Y., Xiao, L., Li, P., Zhang, Q., Xu, J.: Trace analysis for the typical precrash scenario between car vehicle and pedestrian caused by the automatic driving. J. Saf. Environ. 21(04), 1573–1582 (2021)

A User Requirement Driven Development Approach for Smart Product-Service System of Elderly Service Robot Tianxiong Wang1(B) , Wei Yue1 , Liu Yang2 , Xian Gao2 , Tong Yu2 , and Qiang Yu1 1 School of Art, Anhui University No. 111, Jiulong Road, Economic and Technological

Development District, Hefei 230601, China [email protected] 2 School of Machinery and Electrical Engineering, Anhui Jianzhu University, No. 292, Ziyun Road, Shushan, Hefei 230601, China

Abstract. With the advancement of computational intelligence and information technology, the smart product-service system (SPSS) has become an important research field, which focuses on the management and resource integration of products and services. For an intelligent service robot SPSS for the elderly, how to help enterprises provide services that satisfy the elderly has always been a difficult problem. Therefore, this paper aims to explore the mapping relationship between the design elements of this SPSS and the perception of demand experience. Firstly, this research explores the Kansei needs of elderly users through correlation analysis and analyzes the key demands of users through the Kano model. Secondly, the stakeholders and their relationships are analyzed for this SPSS, service elements are extracted through the user journey map, and then rough set theory (RST) is applied to identify the key service elements that have an important impact on user demands. Finally, logistic regression is adopted to establish the mapping model between user needs and service elements, and the optimal service system content that meets user demands is obtained. In addition, the service blueprint is used to organize the overall service process. Taking the development and design of the intelligent elderly robot SPSS as an example, an effective system design combining product hardware and a service interface is proposed. The results show that this SPSS makes elderly users more satisfied and significantly improves new service performance, thereby increasing users' emotional satisfaction.

Keywords: RST · Smart product-service system · Elderly service robot

1 Introduction

By 2050, the proportion of the population aged 65 and above will exceed 20%, and China will continue to be the country with the largest aging population [1]. The physical aging of the elderly is usually related to the gradual deterioration of physiological and cognitive functions, such as the impairment of eyesight and hearing, the loss of hand


grip and dexterity, and the decline of motor skills, reaction, memory and learning ability. In view of the declining physiological functions of the elderly, the problem of elderly care is urgent. At the same time, on one hand, the fast pace of social life has left the children of the elderly occupied with making a living, and it is difficult for them to spend much time with the elderly. On the other hand, due to the imperfect system of nursing homes in China, there is a lack of nursing staff with professional care experience to serve the majority of the elderly. Therefore, it is very necessary to address the actual needs of the elderly and solve the problem of providing for them. Along with the continuous development of information technology and the in-depth digital transformation of the manufacturing industry [2], enterprises have begun to develop robots with the ability to "perceive, learn, understand or reason to form new knowledge and respond to new situations" [3]. At the same time, the focus of robot application and development has shifted from the industrial field to serving human life, and service robots obtain more environmental information through interaction with smart spaces, including device scheduling based on interaction with smart spaces, and smart homes that understand user behavior based on scene knowledge. The intelligent service robot for the elderly has become an important research direction in the field of artificial intelligence, and it is also a powerful solution for elderly home service scenarios. Alves-Oliveira et al. [4] used experiments to show that service robots can impact the quality of life of the elderly. Developing a smart service robot that meets the physical and psychological needs of the elderly can satisfy their demand for a high-quality life and minimize the pressure on their children and society. In fact, ubiquitous sensors and networked communication make it possible to fully capture and accurately perceive users' needs and preferences, which greatly contributes to the development of product customization, personalization and servitization. Product-service system (PSS) design provides users with value creation that combines tangible products with intangible service systems; it is a strategic design that provides services for users centered on tangible items so as to give users a complete experience [5–7]. At the same time, in the context of the continuous emergence of new industrial technologies and business models, PSS has developed from Internet-based PSS (conventional PSS) and IoT-based PSS to the current smart PSS [8]. The smart product-service system (SPSS) is fundamentally composed of intelligent technology, interconnected products and the generated digital services [9]. Against the background of rapid development in intelligence and service, the connection and integration of robots and services can be further deepened, with an emphasis on satisfying users' Kansei needs by providing more services that combine material and dematerialized elements. Especially with the advent of the era of Kansei experience, elderly users pay more attention to pleasant experience and personalized expression. Therefore, designing and developing a corresponding smart service robot SPSS for the elderly undoubtedly provides an effective solution for improving service satisfaction.
This SPSS design could solve the health management problems of the elderly, meet the living needs of elderly users and achieve the goal of improving their quality of life, thereby contributing to the formation of a systematic and comprehensive elderly health management system. Recently, KE has been adapted to the service industry to capture the relationship between service design elements and customer emotional perception [10]. However,


most literature studies focus on measuring customer perceptions of service quality and satisfaction [11], and these studies were primarily developed as tools for diagnosing service performance and understanding customer behavior. However, few studies have investigated how to build an SPSS that can best meet customers' emotional needs by considering the ambiguity and complexity of emotions, and they thus ignore the uncertainty of user needs in SPSS development. To this end, rough set theory has become an effective solution. Rough set theory (RST) is a mathematical method proposed by the Polish scholar Pawlak in 1982 to deal with fuzzy, imprecise and uncertain data problems [12]. Its biggest advantage is that it can perform attribute reduction without relying on any additional knowledge, and it can effectively extract the key core attribute factors among decision attributes. Therefore, this research combines RST techniques from artificial intelligence to identify key user needs. On the premise that the decision quality is unchanged, RST removes redundant design features and improves the efficiency of the algorithm. Furthermore, the mapping relationship between customer sensibility and the key service features of the SPSS is established based on logistic regression. The main purpose of this research is to combine RST and logistic regression to complete the design of the SPSS, so as to help operators and designers create an elderly intelligent service robot SPSS that meets the needs of users. Thus, the satisfaction of the elderly's experience could be improved, thereby generating new value-added services.
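As an illustration of the kind of mapping model referred to here, the sketch below fits a logistic regression from binary service-element indicators to a binary Kansei-satisfaction label; the element names and data are hypothetical placeholders, not the study's variables.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical decision table: rows are user evaluations, columns indicate
# whether a given service element was present (1) or absent (0).
service_elements = ["health monitoring", "fall alarm", "video chat", "medication reminder"]
X = np.array([[1, 1, 0, 1],
              [0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 0, 1, 0],
              [1, 1, 1, 1],
              [0, 1, 1, 0]])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = user reports the Kansei need is satisfied

model = LogisticRegression().fit(X, y)
for name, coef in zip(service_elements, model.coef_[0]):
    # sign and size of each coefficient hint at an element's contribution
    print(f"{name}: {coef:+.2f}")
```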

2 Literature Review

2.1 Rough Set Theory

With the popularization and development of science and technology, how to effectively acquire useful knowledge and analyze data is still an urgent task. In fact, much data is found to be uncertain. Several existing methods, such as evidence theory, fuzzy theory, and probability statistics, are used to deal with problems involving uncertainty. When using these methods, it is necessary to provide prior information, such as fuzzy membership functions and probability distributions, which are often not readily available; such information is not necessary when analyzing data using RST. As a mathematical tool, RST [12] was first described by the Polish mathematician Z. Pawlak to deal with problems involving fuzzy and uncertain knowledge; it can reduce attributes and derive decision or classification rules for a problem. The axiomatic system of RST has the explicit objectivity that it does not require any external information or additional subjective adjustments for data analysis [13]. The method only extracts known information or knowledge from the given data, provides approximate descriptions of imprecise and uncertain concepts, and deals with uncertain phenomena and problems according to the results of observation and measurement. Hence, RST is a suitable tool for dealing with imprecise information in product development. Pawlak [14] outlined the basic concepts of RST, including the indiscernibility relation, approximations, rough membership, dependency of attributes, and reduction of attributes, and proposed decision tables and decision algorithms for the decision support process. Thus, RST has received wide attention from scholars for dealing with uncertain problems. Zhu et al. [15]


proposed a systematic method for evaluating design concepts in a subjective environment, integrating individual judgments and preferences through RST. To deal with the ambiguity in decision-making, an analytic hierarchy process based on rough numbers was proposed. Xu [16] proposes a packaging design method for modern cultural and creative products based on RST, so as to assist designers in designing products that meet the Kansei needs of consumers.

2.2 Smart Product-Service System (SPSS) Development

The concept of the product-service system (PSS), first proposed by Goedkoop [17], is considered a hybrid solution that integrates tangible products and intangible services to meet customer needs. With the rapid development of the Internet and the advancement of computational intelligence and network technology, a new product-service system with intense communication between diverse sensors, cyberspace and physical devices has emerged [18]. In this context, PSS is also showing a significant trend of improvement in terms of adaptability, reactiveness, autonomy, versatility, and the ability to cooperate with various devices [19, 20]. Kuhlenkotter et al. [21] described smart PSS as a "highly complex, dynamic and interconnected digital ecosystem in which multiple stakeholders pursue their interests in a network of value co-creation". In fact, SPSS integrates smart products and electronic services [22], and it has broken the boundaries not only between physical and virtual components [23], but also between smart products and internet services [20]. SPSS is a technology-enabled innovation that will create new dynamics in the relationship between service providers and consumers [24]. Valencia et al. [22] pointed out that the core elements of SPSS consist of structured products, intelligent technologies, and connected services. Due to the deep coupling between digital technology and information technology, SPSS is able to realize context-aware recognition through communication technology, network facilities and cloud computing [25]. At the same time, SPSS can collect data to create value through digital technology [9]. Value co-creation among stakeholders is the core of smart PSS business innovation [26]. In fact, value flows from providers to receivers and is co-created by customers and other stakeholders [27], resulting in a network platform PSS [28] that integrates smart connected products and smart service systems. Stakeholders and customers can then participate in this platform, interact, and jointly create value [27]. The current SPSS is given a market proposition that expands the functionality of the product with additional services, so as to create value for customers and further build positive customer relationships for enterprises. SPSS can integrate services and products to jointly meet the individual needs of consumers [24]. SPSS design methods [29] can be classified into six types, namely customer perspective, modeling techniques, visualization methods, modularity, TRIZ, and system dynamics. In fact, users are the center and fundamental driving force of product and service development, so many scholars have developed SPSS frameworks from the user's perspective. Chang et al. [30] proposed a conceptual framework for user-centered SPSS and provided its specific development method in a case study on medication management for the elderly. Guo et al.
[31] developed an SPSS for rehabilitation assistive devices (RADs) and evaluated its usability, demonstrating the usability and effectiveness of the proposed SPSS development method. Although existing research


emphasizes that the goal of SPSS is to meet diverse and dynamic user needs, there is still a lack of understanding of SPSS from the user's perspective and of quantitative development methods that fully consider user perception and experience. Therefore, in this study, in order to develop an SPSS that meets the needs of the elderly, the key objective SPSS design elements are obtained with the rough set method, and a quantitative, user-need-driven SPSS development method is constructed to improve the satisfaction of elderly users.

3 Method

In this section, a user Kansei-demand-driven development method for an elderly service robot SPSS is proposed based on KE. First, correlation analysis is used to identify the demand items for which users have high satisfaction, so that Kansei items with attractive quality can be uncovered and the attribute items that meet users' demands can be screened out. Next, a user journey map of the service content is used to analyze the design elements of the SPSS, screen out the service design attribute items that meet users' demands, and build a decision table. To obtain the weights and priorities of the service design items accurately, the attributes are further reduced with RST, which handles ambiguous and inconsistent knowledge, so as to extract the core design features of the SPSS. To further explore the relationship between the elderly users' Kansei needs and the service factors, logistic regression is used to obtain the mapping between service design features and perceptual requirements and to screen out the service system function points that guide the development of the SPSS. Finally, the SPSS plan is elaborated through provider identification and an integration network, the system is visualized through a service blueprint, and the design of the product and service plan for the elderly intelligent robot is completed.

3.1 Obtain Knowledge of User Kansei Needs

User Kansei needs are complex, ambiguous, dynamic, and changeable even for users with the same goals, so understanding them accurately is essential to correctly position product/service development. Once the areas of service have been identified, we must identify and measure the Kansei that people use to express their psychological feelings and emotions. There are different ways to measure Kansei; the most common is through words, because words are the external description of a person's inner sensibility [6]. Barrett et al. [32] share this view and suggest that, in order to capture a person's emotions, people should be allowed to describe their experiences in their own words. Therefore, to fully understand users' emotional experience needs for the SPSS, this study distinguishes positive and negative emotions using bipolar pairs of Kansei words: each Kansei attribute is defined by a pair of Kansei words, which can be collected from all available sources, including journals, literature, manuals, experts, and experienced users [6]. Researchers must then eliminate duplicate or similar words by manual methods [33] or statistical methods [10] to find representative high-level words for the service field. Manual methods typically


include the KJ simplification method, designer interviews, and focus groups, in which researchers manually group and summarize the Kansei vocabulary. Statistical methods for eliminating duplicate or similar Kansei words include factor analysis, principal component analysis, cluster analysis, and related methods, which usually require data collected from customer groups by questionnaire. Hence, this study adopts the focus group and KJ methods to analyze user needs and eliminate repeated words, so as to obtain the Kansei needs of the elderly for the service robot SPSS. Compared with traditional user demand acquisition, the Kansei demand acquisition in this study is not limited to the elderly themselves but also includes their children and relatives, giving a more comprehensive understanding of needs. Correlation analysis is then used to identify the Kansei demand with the highest correlation coefficient between users' Kansei needs and user satisfaction, which is treated as the key user demand.

3.2 Obtaining Service Design Elements

In the analysis of service design elements, the process of mining design attributes includes two steps [34]: (1) design attributes are collected from technical documents, competing services, pertinent literature, manuals, experts, experienced users, concept studies, analyses of the usage of existing services, and related service groups; (2) user journey maps are used to present the user's task state along three dimensions (physical, emotional, and cognitive). In particular, the tasks are divided into stages, user performance in each stage (actions, goals, emotions, obstacles, etc.) is observed and recorded, and the complete process from first learning about the service to forming a contractual relationship is laid out visually. Furthermore, the functional responsibilities of each task role in the system can be clarified. Therefore, this study applies user journey maps to obtain the service elements of the elderly robot SPSS and then selects the design elements of the SPSS based on focus groups.

3.3 RST to Capture Key Service Elements

In the service domain, experimental stimuli are usually represented as service scenarios formed from different combinations of the possible values of the design attributes [33]. However, asking subjects to evaluate a large number of service scenarios on different perceptual attributes may lead to unreliable results. Hence, it is necessary to identify the core design elements based on RST [35]. In RST, the research object is abstracted into an information system (knowledge representation system), which is represented by a four-tuple and is usually defined as S = (U, A, V, f) [35, 36], where U is a finite set of objects (the universe), A is the set of knowledge attributes, V is the set of attribute values, and f : U × A → V is the knowledge function. If A = C ∪ D and C ∩ D = Ø, then I = (U, C, D, V, f) is called a decision table, where C is the conditional attribute set and D is the decision attribute set. The key techniques of rough sets include the lower approximation R_(X), the upper approximation R‾(X), the boundary region BN(X), indiscernibility, the positive region pos_C(D), the dependency degree γ(C, D), the attribute significance Sig(C), the core, and the reduct [13, 35, 37, 38].
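As a rough illustration of these constructs (not the authors' implementation; the toy two-attribute decision table below is invented for exposition), the indiscernibility classes, lower/upper approximations, positive region, and dependency degree can be computed as follows:

```python
# Minimal rough-set sketch: indiscernibility classes, approximations, and
# the dependency degree gamma(C, D). The decision-table rows are illustrative.
from itertools import groupby

rows = [
    {"E1": 1, "E2": 2, "D": 1},
    {"E1": 1, "E2": 2, "D": 1},
    {"E1": 2, "E2": 1, "D": 2},
    {"E1": 2, "E2": 2, "D": 1},
    {"E1": 2, "E2": 2, "D": 2},   # inconsistent with the previous row
]
U = list(range(len(rows)))
C = ["E1", "E2"]

def ind_classes(objs, attrs):
    """Partition object indices into indiscernibility classes w.r.t. attrs."""
    key = lambda i: tuple(rows[i][a] for a in attrs)
    return [set(g) for _, g in groupby(sorted(objs, key=key), key=key)]

def lower_approx(X, attrs):
    return set().union(*[c for c in ind_classes(U, attrs) if c <= X] or [set()])

def upper_approx(X, attrs):
    return set().union(*[c for c in ind_classes(U, attrs) if c & X] or [set()])

def dependency(cond_attrs):
    """gamma(C, D): share of objects in the positive region of D w.r.t. C."""
    pos = set()
    for d_cls in ind_classes(U, ["D"]):
        pos |= lower_approx(d_cls, cond_attrs)
    return len(pos) / len(U)

X = {i for i in U if rows[i]["D"] == 1}
print(lower_approx(X, C))   # objects certainly in D=1: {0, 1}
print(upper_approx(X, C))   # objects possibly in D=1: {0, 1, 3, 4}
print(dependency(C))        # gamma(C, D) = 3/5 = 0.6
```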


3.4 LR to Build Uncertain Relationship

There are many methods for relating design attributes to Kansei attributes. To simultaneously account for the nonlinearity, uncertainty, and boundedness of perceptual data, logistic regression analysis is an effective way to establish the relationship between design attributes and perceptual attributes [6]. Logistic regression is a widely used statistical model whose basic form uses a logistic function to model a binary dependent variable. Multinomial logistic regression (MLR) generalizes logistic regression to multi-class problems, whereas ordinal logistic regression models the relationship between an ordinal dependent variable and the independent variables; in its standard proportional-odds form, logit P(Y ≤ j | x) = θ_j − β^T x, where θ_j are threshold parameters and β are the attribute coefficients. Therefore, this study uses the SPSS design elements as independent variables and the user experience evaluation results as the dependent variable to establish the relationship model based on ordinal logistic regression [39].

3.5 Providers Integrating to Build Service Content

To achieve a complete service chain for diverse user needs, multiple suppliers must be integrated to provide the services that meet the identified user needs, so as to construct a system involving all relevant aspects; all such aspects can be included as parts of the service [30]. Therefore, the identification and integration of suppliers are very important (Fig. 1).

Fig. 1. The provider integration network [30]

From a user-centered design perspective, the identification of the involved suppliers should take user needs into account. When organizing the related providers, three flows are important: information flow, value flow, and material flow. Material flow refers to the production, operation, and transportation of materials; the suppliers involved in product development, manufacturing, and delivery are key to the formation of the service system. Information flow refers to the collection, visualization, and transmission of information, such as product information, courier status, and medical information; this information is collected, integrated, and presented visually to support service implementation. In the value flow, the main values are the users' ability to accomplish their goals, a pleasant user experience, and the functional and aesthetic realization of the product/service. The integration and participation of the various providers is thus an important factor in the development of the proposed SPSS, and the planning of service processes and content is explored so as to create new potential


value through the functional, information, and value interactions among stakeholders. To build a concrete product or service, UX (user experience) theory should be referenced in the product architecture, where product design can be considered layer by layer from abstract ideas to concrete specifications (as shown in Fig. 2). It includes five levels: strategy, scope, structure, framework, and presentation, which correspond respectively to the product goal, the functional specification (content), the interaction design (information architecture), the interface design (information and navigation), and the sensory design of the product. Following this bottom-up logic, the layer structure guides the entire user experience design process and transforms the design goals from abstract concepts into concrete design outcomes.

Fig. 2. Five elements of user experience [40]

4 A Case Study

To illustrate the KE-based SPSS development method, this study selects the elderly service robot SPSS as a case study to verify the effectiveness of the proposed method.


4.1 User Needs Analysis

To obtain as complete a set of Kansei words as possible, Kansei words were initially collected from two sources: (1) previous literature on robotic products and elderly services, such as books, reviews, and magazines; and (2) in-depth focus-group interviews with elderly people. A total of 48 elderly people were selected as representative customers, of whom 55% were male and 45% were female. The 48 participants were asked to provide their Kansei needs for the service robot SPSS until no new Kansei words emerged. From the interview results, a set of 86 Kansei adjectives was preliminarily selected. To obtain more manageable and relevant Kansei words, the KJ method was used to compile the Kansei words and eliminate duplicate or similar ones. In the first screening stage, KJ simplification was used to remove words that were repeated or similar in meaning. In the second screening stage, a focus group of 10 designers and 10 teachers determined the final Kansei vocabulary through discussion. The results are shown in Table 1.

Table 1. 14 Kansei words

K1: Safe      K2: Comfortable   K3: Modern         K4: Relaxed        K5: Convenient
K6: Timely    K7: Precise       K8: Reliable       K9: Gracious       K10: High quality
K11: Swift    K12: Novel        K13: Intelligent   K14: Considerate

Correlation analysis is used to extract the user needs that are highly correlated with satisfaction and to describe the relationship between satisfaction and Kansei needs. Pearson's correlation coefficient (Pearson's r) represents the strength of association between each Kansei demand and satisfaction, and the demand with the highest coefficient is selected as the key Kansei demand affecting satisfaction. Specifically, this study first surveys the importance of each Kansei demand using the SD method and then analyzes the ratings with correlation analysis. The demand with the highest correlation coefficient is the high-quality demand (0.961). The results are shown in Table 2.

Table 2. The correlation analysis statistical results

Users need    Cozy      Convenient   Warm    High quality
Correlation   0.934**   0.779**      0.631   0.961**
P value       0.000     0.008        0.050   0.000
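As a rough illustration of this step (not the authors' code; the rating vectors below are made up), Pearson correlations between each Kansei-need rating and overall satisfaction could be computed as follows:

```python
# Hedged sketch of the correlation step: Pearson's r between each Kansei-need
# rating and overall satisfaction. All scores below are illustrative only.
from scipy.stats import pearsonr

satisfaction = [4, 5, 3, 4, 2, 5, 4, 3, 5, 4]          # hypothetical SD-scale scores
kansei_ratings = {
    "Cozy":         [4, 5, 3, 4, 2, 5, 4, 3, 4, 4],
    "High quality": [4, 5, 3, 4, 2, 5, 4, 3, 5, 4],
}

for need, scores in kansei_ratings.items():
    r, p = pearsonr(scores, satisfaction)
    print(f"{need}: r = {r:.3f}, p = {p:.3f}")

# The need with the largest r (here "High quality") would be taken as the key
# Kansei demand driving satisfaction, as in Table 2.
```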


4.2 Spanning Design Attributes for the Elderly Service Robot SPSS

To explore the design elements contained in the intelligent service robot SPSS, the service design attributes that can affect the emotional responses of the elderly must be clarified. In the KE research process, the collection of product or service design attributes is the output of this step. In this study, the user journey map, a user behavior analysis method, is used to obtain the service design elements and the key user touchpoints at each stage of the elderly robot SPSS; the key service elements are then obtained through the KJ simplification method and expert interviews, so as to guide the construction of the subsequent SPSS.

Stakeholder Analysis. Stakeholder management theory was clearly articulated in 1984. The stakeholders of this SPSS can be divided into 8 categories: elderly users, pharmacies, medical institutions, equipment manufacturers, R&D service providers, doctors, shopping agencies, family of the elderly, and other service personnel, as shown in Fig. 3. The service robot product and the elderly users are the core stakeholders, and the other stakeholders and linkages in this SPSS facilitate the interaction between them. Since the service robot product interacts with almost all relevant stakeholders and generates a large amount of data, all data and information generated by these stakeholders are stored, analyzed, and managed on the platform of the proposed service system.

Fig. 3. Stakeholder analysis


Behavior Analysis for the Elderly. User behavior analysis refers to observing the interaction scenarios between elderly people and the service robot in daily life and discovering deeper user needs. Based on the user journey map, this study traced the complete process from users' first learning about the service to the formation of a contractual relationship, so as to extract the service elements of this SPSS. In this way, problems and design opportunity points can be identified; the user journey map for elderly users is shown in Fig. 4.

Fig. 4. User journey map

Through observation of users' behaviors, the specific user-need problems above can be categorized into eight types from the perspective of user experience: (1) cloud-based data records, including doctors' recommendations, prescription information, and patients' self-monitoring and medication behavior data; (2) pharmacies that automate single-dose dispensing, packaging, and pickup; (3) ordering ingredients and medicines remotely through the robot service interface; (4) recommending dishes according to the physical state of the elderly; (5) monitoring the real sleep status of the elderly; (6) providing diversified entertainment for the elderly; (7) regulation of smart home appliances by the robot; and (8) providing a more convenient remote medical treatment method.

Determination of Service Elements. The specific service design elements and contents of the SPSS are extracted and merged based on the KJ method, so that similar and repeated attribute items are combined. Finally, 19 design attributes are obtained, as shown in Table 3.

Table 3. Design elements with their descriptions of this SPSS

E1    Optimize the appearance design of the smart robot to meet the fashion trend
E2    Monitor regular medication behaviors
E3    Single-dose medicine dispensing and packaging
E4    Robot automatically cleans on time
E5    Recommend dishes according to the physical condition of the elderly
E6    Order foods through online price comparison
E7    Remind and record health data (blood pressure, blood sugar, and blood fat) regularly
E8    Monitor posture and falling status in real time
E9    Communicate with the family by video at any time
E10   Provide a variety of games in the app
E11   Robots become real players in the game
E12   Assist sleep through music and sound
E13   The robot monitors the vital signs of the elderly during sleep and links with the hospital
E14   The robot receives instructions and controls smart devices through the Internet of Things
E15   The robot transmits disease data to the hospital to realize telemedicine
E16   Support automatic registration
E17   Digital medical records
E18   Digital doctor advice
E19   Emotionalized voice communication and companionship

4.3 RST to Identify the Key Design Features

Determination of Service Stimuli. It is impractical to consider all 19 design attributes in the design of the elderly service robot SPSS. Eight elderly people over 60 years old and four professional designers with more than two years of experience in intelligent product design were selected as experts because of their detailed understanding of intelligent robot products and the behavioral needs of the elderly. Face-to-face interviews were then conducted with these 12 experts, who were asked to select the design attributes they believed to affect the need for high quality. The majority principle was used to derive the refined design attributes [6]: if at least six of the 12 experts selected a design attribute, it was retained, yielding a set of 14 refined design attributes. For service design, experimental stimuli are usually derived from different combinations of the possible values of the design attributes [6] and serve as representative experimental samples [10]; a suitable number of experimental stimuli in KE is about 10–20 [41]. Based on these 14 design attributes and discussions with experts in the field of industrial design to determine the attribute levels, a total of 32 service scenarios were generated, namely S1, S2, …, S32.


Constructing the Decision Table. RST cannot directly handle continuous attribute variables. Therefore, in this study, the L method proposed by Lenarcik [42] is used to convert the Kansei scores obtained with the SD method into discrete data. Specifically, emotional image evaluation results in the interval [1, 2.5] are coded as 1, results in the interval [2.5, 3.5] as 2, and results in the interval [3.5, 5] as 3 (a minimal discretization sketch is given after Table 4). The discretized Kansei evaluation value is used as the decision attribute D, which is combined with the condition attributes C to construct the decision table (DT) for this elderly service robot SPSS. The resulting decision table for the design of the ESR-SPSS is shown in Table 4.

Table 4. The decision table

U    E1  E2  E3  E4  E5  E6  E7  E8  E9  E10  E11  E12  E13  E14   D
1     1   1   2   2   2   2   2   1   1   1    3    2    2    1    2
2     2   2   1   1   1   1   3   1   2   2    4    2    1    2    1
3     1   1   2   2   2   3   1   2   3   1    1    1    1    2    1
4     2   1   1   2   2   2   3   2   1   2    2    1    2    2    1
5     2   1   2   1   1   1   2   1   1   1    1    1    1    1    3
6     2   2   1   1   1   1   3   2   2   2    4    2    1    1    2
7     1   1   2   2   2   3   1   2   3   1    1    1    1    1    2
8     2   1   2   2   2   2   2   1   1   1    3    2    2    2    1
9     1   2   1   1   1   1   3   1   2   2    1    2    1    1    3
10    1   1   2   2   2   1   1   2   1   1    1    1    1    1    2
11    1   1   1   1   1   2   2   1   1   1    3    2    1    1    3
12    1   1   1   1   1   1   3   1   2   2    4    2    1    1    2
13    1   1   1   1   1   3   1   2   3   1    1    1    1    2    2
14    1   1   1   2   1   2   3   2   1   2    2    1    1    1    2
15    1   1   2   1   1   1   2   1   1   1    1    1    1    1    3
16    2   2   1   1   1   1   3   2   2   2    1    2    1    1    2
17    1   1   1   1   2   3   1   2   3   1    1    1    1    1    2
18    1   1   2   2   2   2   2   1   1   1    3    2    2    2    1
19    1   1   1   1   1   1   3   1   2   2    1    2    1    1    2
20    1   1   2   1   1   1   2   1   1   1    1    1    1    2    3
21    1   2   1   1   2   3   3   1   2   2    4    2    1    2    1
22    1   1   2   2   2   3   1   2   3   1    1    1    1    1    1
23    2   1   1   2   2   2   3   2   1   2    2    1    2    1    1
24    2   1   2   1   1   1   2   1   1   1    1    1    2    2    2
25    1   1   2   2   1   1   1   2   3   1    1    1    1    1    3
27    2   2   2   2   2   2   3   2   2   1    4    2    2    2    1
28    2   2   2   2   2   1   3   2   2   1    4    2    2    2    2
29    2   1   2   1   2   1   2   1   1   1    1    1    2    1    1
30    2   1   2   1   1   1   2   1   1   1    1    1    2    1    2
31    2   1   2   1   1   3   2   1   1   1    1    1    2    1    1
32    2   1   2   1   1   1   2   1   1   1    1    1    1    1    2
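A minimal sketch of the discretization rule quoted before Table 4 (illustrative only; NumPy is assumed, and the sample scores are invented):

```python
# Hedged sketch of the SD-score discretization used to build the decision table:
# scores in [1, 2.5] -> 1, (2.5, 3.5] -> 2, (3.5, 5] -> 3.
import numpy as np

def discretize(scores):
    # np.digitize with right-inclusive bins reproduces the three coding intervals.
    return np.digitize(scores, bins=[2.5, 3.5], right=True) + 1

sd_scores = np.array([1.8, 2.5, 3.1, 3.5, 4.2, 4.9])   # hypothetical SD ratings
print(discretize(sd_scores))                            # [1 1 2 2 3 3]
```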

Determining Key Service Design Elements. According to RST, the development and design attributes of the ESR-SPSS are reduced. The reduction shows that E1, E2, E4, E5, E6, E11, E12, E13, and E14 are necessary attributes, whereas E3, E7, E8, E9, and E10 are unnecessary in C with respect to D, i.e., redundant attributes that should be deleted. The core elements of the final reduct are therefore {E1, E2, E4, E5, E6, E11, E12, E13, E14}. Hence, the core service design elements that affect users' demand for "high quality" mainly include regular monitoring of medication behavior, single-dose drug dispensing, online price comparison for pre-ordered food, regular reminders to record health data, monitoring of body fall status, receiving instructions and controlling smart devices through the IoT, telemedicine, automatic registration and digital medical records, and voice communication. In contrast, the design attributes concerning dish combination and cooking, dynamic video communication, diversified game entertainment, and sleep monitoring have little influence on the "high quality" decision attribute. The attribute importance values are 0.102 for E1, 0.122 for E2, 0.082 for E4, 0.122 for E5, 0.163 for E6, 0.082 for E11, 0.082 for E12, 0.102 for E13, and 0.143 for E14 (Table 5).

Table 5. The robot service design criteria weight values

Service attribute   E1     E2     E4     E5     E6     E11    E12    E13    E14
W                   0.102  0.122  0.082  0.122  0.163  0.082  0.082  0.102  0.143
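Weights of the kind reported in Table 5 can be obtained from rough-set attribute significance. The self-contained sketch below (invented three-attribute data; whether the authors used exactly this normalization is not stated in the text) ranks condition attributes by Sig(a) = γ(C, D) − γ(C − {a}, D) and normalizes the values into weights:

```python
# Hedged sketch: rank condition attributes by rough-set significance and
# normalize the significances to weights. The decision table is illustrative.
from itertools import groupby

rows = [
    {"E1": 1, "E2": 2, "E4": 1, "D": 1},
    {"E1": 1, "E2": 1, "E4": 2, "D": 2},
    {"E1": 2, "E2": 2, "E4": 1, "D": 1},
    {"E1": 2, "E2": 1, "E4": 2, "D": 3},
    {"E1": 2, "E2": 1, "E4": 1, "D": 3},
    {"E1": 1, "E2": 2, "E4": 2, "D": 1},
]
U = range(len(rows))

def classes(attrs):
    key = lambda i: tuple(rows[i][a] for a in attrs)
    return [set(g) for _, g in groupby(sorted(U, key=key), key=key)]

def gamma(cond):
    """Dependency of the decision D on the condition attributes cond."""
    pos = set()
    for d_cls in classes(["D"]):
        pos |= set().union(*[c for c in classes(cond) if c <= d_cls] or [set()])
    return len(pos) / len(rows)

C = ["E1", "E2", "E4"]
sig = {a: gamma(C) - gamma([b for b in C if b != a]) for a in C}
total = sum(sig.values()) or 1.0
weights = {a: round(s / total, 3) for a, s in sig.items()}
print(sig, weights)   # here E2 carries the most information; E4 is dispensable
```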


Table 6. Regression coefficients of the 9 design attributes

Threshold
  D = 1      7.150
  D = 2      11.868

Position
  E1 = 1     6.124       E1 = 2     0.000
  E2 = 1     −6.851      E2 = 2     0.000
  E4 = 1     −0.375      E4 = 2     0.000
  E5 = 1     4.363       E5 = 2     0.000
  E6 = 1     4.707       E6 = 3     0.000
  E11 = 1    −0.634      E11 = 2    9.723      E11 = 3    21.697     E11 = 4    0.000
  E12 = 1    7.001       E12 = 2    0.000
  E13 = 1    −1.086      E13 = 2    0.000
  E14 = 1    2.944       E14 = 2    0.000
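As a rough illustration of how threshold and level coefficients of the kind shown in Table 6 can be estimated, the sketch below fits a proportional-odds model with the statsmodels OrderedModel class (assumed to be available in recent statsmodels releases); the data are synthetic and the attribute names are reused only for readability, so this is not the authors' analysis:

```python
# Hedged sketch of an ordinal logistic (proportional-odds) fit linking design
# attribute levels to a 3-level Kansei rating. All data below are synthetic.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 120
X = pd.DataFrame({
    "E1": rng.integers(1, 3, n),    # two-level attribute (illustrative)
    "E6": rng.integers(1, 3, n),
    "E11": rng.integers(1, 5, n),   # four-level attribute, as in Table 6
})
# Synthetic ordinal response with 3 levels, loosely driven by the attributes.
latent = 0.8 * X["E11"] - 0.5 * X["E1"] + rng.normal(0, 1, n)
D = pd.cut(latent, bins=3, labels=["low", "mid", "high"])   # ordered categorical

# Dummy-code the attribute levels so each level gets its own coefficient,
# analogous to the "E1 = 1", "E11 = 3" rows of Table 6.
X_dummies = pd.get_dummies(X.astype("category"), drop_first=True, dtype=float)

model = OrderedModel(D, X_dummies, distr="logit")
result = model.fit(method="bfgs", disp=False)
print(result.summary())   # thresholds plus one coefficient per attribute level
```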

4.4 LR to Build Mapping Model to Obtain the Key Features of SPSS

Ordinal logistic regression is applied to establish the uncertain relationship between the design attributes and the Kansei attributes. In this way, the regression coefficients relating the nine design attributes and their realization levels to the Kansei need are obtained. Clearly, the nine design features contribute differently to a given perceptual property Y_j; the most influential levels among the 9 design elements, i.e., the levels with the largest coefficient for each attribute in Table 6, are E1 = 1, E2 = 2, E4 = 2, E5 = 1, E6 = 1, E11 = 3, E12 = 1, E13 = 2, and E14 = 1. These design-element levels correspond to the service content to be included in the SPSS development process. However, the selected level of E13 is 2, which means the SPSS would not provide digital medical records and doctor's orders, and this may be inconsistent with the design requirements of the SPSS. Because the design goal is to make the developed SPSS more intelligent and thus provide more valuable services for elderly users, the level of E13 was adjusted to 1 after consulting design experts and elderly users, making the proposed SPSS more intelligent and convenient. Accordingly, the service content corresponding to the ordinal logistic regression results is considered in the SPSS development process, and the results are transformed into the design elements of the ESR-SPSS.

4.5 Construction of Product/Service for SPSS

According to the identified user needs and the service plan formed by integrating the providers, a physical product needs to be designed as part of the solution. The form design of the elderly service robot adopts a rounded overall form, an integrated body-and-head


design, and a human-like design so as to increase affinity. Accordingly, the appearance of the elderly service robot is a friendly shape with rounded, smooth lines. The human-like form increases elderly users' trust in and reliance on the service robot and reduces their resistance to it. Rhinoceros 3D modeling software is used to build the final model of the service robot, and KeyShot 10 is used to render the 3D model. The overall appearance of the robot is shown in Fig. 5.

Fig. 5. The product form design of elderly service robot

5 Conclusions

The rise of digital technology has created a broad market for smart products and makes it attractive for manufacturing enterprises to shift their traditional business toward SPSS. As an important input that guides the service innovation design process, user need preference has a major influence on the realization of enterprises' service innovation goals. This study takes the ESR-SPSS as an example and constructs a user-Kansei-driven SPSS development method. First, the KJ simplification method is combined with correlation analysis to obtain the users' key Kansei need. Next, the service design elements of the SPSS are extracted based on stakeholder analysis and the user journey map. The decision table (DT) is then constructed by combining the


original service design elements with the decision attribute; after RST-based reduction and weight calculation, the key service design elements are extracted. Finally, Kansei knowledge is obtained to derive the service design elements that meet users' needs.

Acknowledgements. Our work is supported by the Humanities and Social Sciences Project of the Anhui Provincial Education Department under Grant No. SK2021A0058 and the Anhui University talent introduction research start-up funding project (No. S020318019).

References
1. United Nations, Department of Economic and Social Affairs (UN DESA), Population Division: World Population Ageing (2017)
2. Lee, C.-H., Chen, C.-H., Lee, Y.-C.: Customer requirement-driven design method and computer-aided design system for supporting service innovation conceptualization handling. Adv. Eng. Inf. 45 (2020)
3. Goertzel, B.: Artificial general intelligence: concept, state of the art, and future prospects. J. Artif. Gen. Intell. 5 (2014)
4. Alves-Oliveira, P., Petisca, S., Correia, F., Maia, N., Paiva, A.: Social robots for older adults: framework of activities for aging in place with robots. In: International Conference on Social Robotics, pp. 11–20. Springer (2015)
5. Stickdorn, M., Schneider, J., Andrews, K., Lawrence, A.: This Is Service Design Thinking: Basics, Tools, Cases. Wiley, Hoboken (2011)
6. Yan, H.-B., Li, M.: An uncertain Kansei Engineering methodology for behavioral service design. IISE Trans. 53, 497–522 (2020)
7. Kuo, T.C., Wang, M.L.: The optimisation of maintenance service levels to support the product service system. Int. J. Prod. Res. 50, 6691–6708 (2012)
8. Cong, J.-C., Chen, C.-H., Zheng, P., Li, X., Wang, Z.: A holistic relook at engineering design methodologies for smart product-service systems development. J. Cleaner Prod. 272, 122737 (2020)
9. Zheng, P., Lin, T.-J., Chen, C.-H., Xu, X.: A systematic design approach for service innovation of smart product-service systems. J. Cleaner Prod. 201, 657–667 (2016)
10. Yeh, C.-T., Chen, M.-C.: Applying Kansei Engineering and data mining to design door-to-door delivery service. Comput. Ind. Eng. 120, 401–417 (2020)
11. Ma, M.-Y., Chen, C.-W., Chang, Y.-M.: Using Kano model to differentiate between future vehicle-driving services. Int. J. Ind. Ergon. 69, 142–152 (2019)
12. Pawlak, Z.: Rough sets. Int. J. Comput. Inf. Sci. 11, 341–356 (1982)
13. Li, Y., Zhu, L.: Extracting knowledge for product form design by using multiobjective optimisation and rough sets. J. Adv. Mech. Des. Syst. Manuf. 14, 1–16 (2020)
14. Pawlak, Z.: Rough set approach to knowledge-based decision support. Eur. J. Oper. Res. 99, 48–57 (1997)
15. Zhu, G.-N., Hu, J., Qi, J., Gu, C.-C., Peng, Y.-H.: An integrated AHP and VIKOR for design concept evaluation based on rough number. Adv. Eng. Inf. 29, 408–418 (2015)
16. Xu, X.: Packaging design method of modern cultural and creative products based on rough set theory. Math. Probl. Eng. 2022, 1–11 (2022)
17. Goedkoop, M.J., van Halen, C.J.G., te Riele, H.R.M., Rommens, P.J.M.: Product Service Systems, Ecological and Economic Basics. Ministry of Environment, The Hague, Netherlands (1999)


18. Colombo, A.W., Karnouskos, S., Bangemann, T.: Towards the next generation of industrial cyber-physical systems. In: Industrial Cloud-Based Cyber-Physical Systems, pp. 1–22. Springer (2014)
19. Rijsdijk, S.A., Hultink, E.J.: How today's consumers perceive tomorrow's smart products. J. Prod. Innov. Manag. 26, 24–42 (2009)
20. Abramovici, M., Göbel, J.C., Neges, M.: Smart engineering as enabler for the 4th industrial revolution. In: Fathi, M. (ed.) Integrated Systems: Innovations and Applications, pp. 163–170. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-15898-3_10
21. Kuhlenkötter, B., et al.: New perspectives for generating smart PSS solutions – life cycle, methodologies and transformation. Procedia CIRP 64, 217–222 (2014)
22. Valencia, A., Mugge, R., Schoormans, J., Schifferstein, H.: The design of smart product-service systems (PSSs): an exploration of design characteristics. Int. J. Des. 9 (2015)
23. Lerch, C., Gotsch, M.: Digitalized product-service systems in manufacturing firms: a case study analysis. Res.-Technol. Manag. 58, 45–52 (2015)
24. Valencia, A., Mugge, R., Schoormans, J.P.L., Schifferstein, R.: The design of smart product-service systems (PSSs): an exploration of design characteristics. Int. J. Design 9, 13–18 (2015)
25. Chen, Z., Ming, X., Wang, R., Bao, Y.: Selection of design alternatives for smart product service system: a rough-fuzzy data envelopment analysis approach. J. Cleaner Prod. 273, 122931 (2020)
26. Pahk, Y., Self, J., Baek, J.: A value based approach to co-designing symbiotic product-service system. In: Proceedings of the ServDes.2016 Conference, pp. 304–316. Linköping University Electronic Press (2016)
27. Liu, Z., Ming, X., Song, W.: A framework integrating interval-valued hesitant fuzzy DEMATEL method to capture and evaluate co-creative value propositions for smart PSS. J. Cleaner Prod. 215, 611–625 (2019)
28. Nishino, N., Wang, S., Tsuji, N., Kageyama, K., Ueda, K.: Categorization and mechanism of platform-type product-service systems in manufacturing. CIRP Ann. 61, 391–394 (2012)
29. Chou, J.-R.: A TRIZ-based product-service design approach for developing innovative products. Comput. Ind. Eng. 161 (2021)
30. Chang, D., Gu, Z., Li, F., Jiang, R.: A user-centric smart product-service system development approach: a case study on medication management for the elderly. Adv. Eng. Inf. 42 (2019)
31. Jia, G., et al.: A synthetical development approach for rehabilitation assistive smart product-service systems: a case study. Adv. Eng. Inf. 48 (2021)
32. Chen, M.-C., Hsu, C.-L., Chang, K.-C., Chou, M.-C.: Applying Kansei engineering to design logistics services – a case of home delivery service. Int. J. Ind. Ergon. 48, 46–59 (2015)
33. Schütte, S.T.W., Eklund, J., Axelsson, J.R.C., Nagamachi, M.: Concepts, methods and tools in Kansei engineering. Theor. Issues Ergon. Sci. 5, 214–231 (2004)
34. Wang, T., Zhou, M.: Integrating rough set theory with customer satisfaction to construct a novel approach for mining product design rules. J. Intell. Fuzzy Syst. 41, 331–353 (2021)
35. Shieh, M.-D., Yeh, Y.-E., Huang, C.-L.: Eliciting design knowledge from affective responses using rough sets and Kansei engineering system. J. Ambient Intell. Hum. Comput. 7, 107–120 (2015)
36. Wang, C.-H.: Combining rough set theory with fuzzy cognitive pairwise rating to construct a novel framework for developing multi-functional tablets. J. Eng. Des. 29, 430–448 (2018)
37. Shi, F., Sun, S., Xu, J.: Employing rough sets and association rule mining in KANSEI knowledge extraction. Inf. Sci. 196, 118–128 (2012)
38. Aktar Demirtas, E., Anagun, A.S., Koksal, G.: Determination of optimal product styles by ordinal logistic regression versus conjoint analysis for kitchen faucets. Int. J. Ind. Ergon. 39, 866–875 (2009)
39. Garrett, J.J.: The Elements of User Experience: User-Centered Design for the Web and Beyond. Pearson Education (2010)


40. Wang, T.: A novel approach of integrating natural language processing techniques with fuzzy TOPSIS for product evaluation. Symmetry 14 (2022)
41. Miao, D.-Q.: A new method of discretization of continuous attributes in rough sets. Acta Autom. Sin. 27, 296–302 (2022)

An Analysis of Typical Scenarios and Design Suggestions for the Application of V2X Technology Xia Wang1,2 , Pengchun Tang3 , Youyu Sheng1,2 , Rong Zhang1,2 , Muchen Liu1,2 , Yi Chu1,2 , Xiaopeng Zhu3 , and Jingyu Zhang1,2(B) 1 CAS Key Laboratory of Behavioral Science, Institute of Psychology, Beijing 100000, China

[email protected]

2 Department of Psychology, University of the Chinese Academy of Sciences, Beijing 100000,

China 3 Chongqing Chang’an Automobile Co., Ltd., Chongqing 400000, China

Abstract. Vehicle to everything (V2X) is an emerging technology for intelligent connected vehicles. It enables connections among vehicles, pedestrians, road infrastructure, and more. However, the potential application scenarios have not been examined systematically from a user-centric perspective. In this study, we summarized the common application scenarios of V2X technology based on a literature review. Our review found three major types of potential scenarios: beyond-visual-range (BVR) information presentation, public service, and inter-vehicle coordination. Whereas the everyday usage of BVR information presentation is quite clear, more detailed user-oriented research is needed to describe the concrete circumstances of the remaining two categories. This study is one step further toward a thorough understanding of the scenarios of V2X technology and can inform the future design and study of V2X. Keywords: Vehicle to everything · autonomous vehicles · application scenario

1 Introduction

Vehicle to everything (V2X) is a wireless technology that connects vehicles to other vehicles, pedestrians, and road infrastructure [1] and is essential in the age of autonomous vehicles. Specifically, it includes Vehicle-to-Vehicle (V2V), Vehicle-to-Road (V2R), Vehicle-to-Pedestrian (V2P), Vehicle-to-Infrastructure (V2I), and Vehicle-to-Network (V2N) modes [2]. This "people-vehicle-road-cloud" collaborative interaction system equips vehicles with intelligent features that increase road traffic safety and efficiency and lessen environmental pollution [3]. Previous V2X research has mostly focused on the technology development perspective. Technically, there are two different V2X communication regimes: dedicated short-range communications (DSRC) and cellular-V2X (C-V2X) [4]. DSRC technology consists of an onboard unit (OBU) and a roadside unit (RSU) and provides two-way transmission of information [5]. It is widely used in various fields, including fleet management


and ETC toll collection. Although it works well for short-range communication, supporting applications in high-speed mobile environments is challenging. In contrast, C-V2X technology derives from cellular network technology and, as a result, has the benefit of long communication distance [6]. However, it is also important to design systems from a user-centric perspective. V2X is a system that ultimately serves the user, so it should be designed to meet the needs and preferences of potential users [7, 8]. Many human-computer interaction systems in different areas have been improved from this perspective, and a great deal of practical design advice has been obtained [9]. The first step in user-centric design is the analysis of the user's application scenarios, which quickly helps the designer understand user demands [10]. Thus far, previous studies have not systematically analyzed V2X application scenarios, yet such an analysis is significant and necessary to guide V2X technology development. Therefore, this study reviews the potential V2X application scenarios reported in previous studies to fill this research gap. We first summarize these scenarios into several broad categories, then describe the most typical specific scenario in each category with a graphic representation, and finally offer design suggestions from a user-centric perspective.

2 Potential Scenarios of V2X Technology

2.1 Beyond-Visual-Range Information Presentation

The first type of V2X application scenario is beyond-visual-range (BVR) information presentation, which can greatly enhance road safety and traffic efficiency. V2X technology uses sensors to connect vehicles to everything, effectively giving users a global perspective [11] and overcoming the limits of human long-range perception. Specifically, the vehicle can obtain the position, velocity, and motion intention of objects beyond its line of sight through V2X communication. In this application scenario, V2X technology can address human limitations in two main ways: warning and real-time traffic management. V2X warning systems address the problem that drivers may not see or recognize an impending collision in time because of human visibility limits, adverse weather, or road conditions. Xiang et al. [12] proposed a warning system that used roadside multi-sensor fusion perception and V2X communication to provide vehicles with real-time blind-spot early warning, which reduced the likelihood of traffic accidents. For example, when the main vehicle reaches an intersection (e.g., a T-junction, roundabout, or high-speed ramp), there is a risk of collision with vehicles in the side blind area [13]. V2X real-time traffic management is also important in the BVR information presentation scenario, giving vehicles real-time road information to maximize efficiency. The rational planning of traffic routes depends heavily on the traffic situation outside the driver's line of sight. Although BeiDou and GPS can also provide road information and help with route planning, V2X technology can collect more detailed data, plan routes more precisely, and is more affordable [14]. Based on V2I traffic surveillance, some research hypothesized that traffic light control could dynamically adjust priorities according to traffic flows and volumes, and potential


designs for traffic light controller systems with V2I communication were put forth [15, 16] (Fig. 1).

Fig. 1. The sight-blocked intersection scenario

The Sight-Blocked Intersection Scenario. In this scenario, the vehicle is about to enter an intersection, and a visual blind area on the right prevents the driver from seeing a vehicle approaching from that side. The RSU can capture information about the intersection and send or display a reminder to the vehicle via V2I communication, such as "There is a vehicle coming out on the right side of the intersection ahead." In this process, the design of the in-vehicle interaction interface determines the efficiency of information communication: how to highlight key information, reduce interference from irrelevant information, and lower cognitive load are important factors to consider.

2.2 Public Services

The second type of V2X application scenario is public service. It integrates voluminous public information to help vehicle users drive safely, for example through emergency avoidance and the sharing of driver information. This function helps vehicle users obtain information about the driving environment and other drivers and can help them avoid driving risks and accidents. In the previous scenario of beyond-visual-range information presentation, vehicles mostly accept information from the network passively, in an unplanned and unconscious manner. In contrast, in public services vehicles are active communicators and participants in traffic planning. In the case of emergency vehicles (EVs), they act more like a speaker, broadcasting the


urgent need for road access on the network. This scenario can be subdivided into three specific scenarios: remote driving, emergency vehicle preemption, and driver information display. Remote driving allows vehicles to be controlled remotely in emergencies, for instance for disabled passengers, vehicles in dangerous environments, or drivers in medical emergencies [17]. Emergency vehicle preemption gives emergency vehicles (ambulances, accident-response vehicles, etc.) priority to pass [18]. Driver information display transmits the real-time status of nearby drivers (e.g., whether they are tired or drunk) and provides their driving characteristics (e.g., whether they are irritable drivers) [8]. If a driver knows that a nearby driver is drunk, he or she can avoid that vehicle in advance and send this danger information to other vehicles or to the traffic police center (Fig. 2).

Fig. 2. Remote control of vehicles in emergency situations (disabled passengers)

Remote Control of Vehicles in Emergency Situations (Disabled Passengers). In this scene, the driver suddenly feels ill and cannot effectively control the vehicle. Through communication between the vehicle and the cloud, the driver can request the central control platform to drive on his or her behalf; the platform can remotely control the vehicle and take the driver to the nearest hospital. In the interaction interface design for this function, convenience of operation and privacy and security are the key factors to consider.

2.3 Inter-Vehicle Coordination

The third type of V2X application scenario is inter-vehicle coordination. It refers to obtaining the intentions and information of other vehicle users in formation, overtaking, and lane-change scenarios, thereby facilitating collaborative behavior between vehicle users and improving traffic efficiency. Initially, autonomous vehicles and V2X communication were considered separate technologies; however, their combination enables two key cooperative features: sensing and maneuvering [19]. Thus, some researchers use the term cooperative automated driving (CAD) to describe the corresponding technology


[20]. While cooperative sensing allows the exchange of sensor data (e.g., raw sensor data) and object information, cooperative maneuvering enables inter-vehicle coordination of trajectories (e.g., lane changes, platooning, CACC, and cooperative intersection control) [21]. In this third scenario category, V2X technology therefore allows autonomous vehicles to work together, which can significantly increase vehicles' environmental perception, traffic safety, driver comfort, and friendliness. Although Cigno and Segata [22] regard platooning on highways as the only application of cooperative driving (CD), we contend that platooning, cooperative lane changing, and cooperative overtaking make up most CD cases, and these three aspects are the ones covered most in this scenario. Platooning means vehicles traveling at the same speed and spacing on a motorway [23]; when traveling in formation, a vehicle in front that notices an obstacle ahead sends a message to the other vehicles in the convoy so they can avoid it, and V2X helps the vehicles stay in formation and avoid hazards. Cooperative overtaking is exemplified by cooperative truck overtaking on the highway [24]. Cooperative lane changing involves cooperative interactions at intersections when merging and turning left (Fig. 3).

Fig. 3. Coordinate vehicles in advance to avoid conflicts when meeting on narrow roads

Coordinate Vehicles in Advance to Avoid Conflicts when Meeting on Narrow Roads. In this scene, there is a narrow stretch of road. Before two vehicles traveling in opposite directions reach the narrow section, the central control platform judges and assigns the passing priority and informs the vehicles concerned through V2X communication. In the interaction design for this scenario, it is important to consider whether the allocation principles of the central control platform are fair and whether the way information is transmitted helps promote prosocial behavior.

3 Design Suggestions for V2X

In this section, we offer design suggestions for each of the three categories of V2X technology applications from a user-centric perspective, based on the typical scenarios described above.


For beyond-visual-range information presentation, a possible problem is that excessive information leads to cognitive overload. To reduce users' cognitive load, information can be classified and its presentation designed according to the characteristics of each category. Information can be divided into danger warnings, such as collision warnings, and general reminders, such as congestion alerts. Danger warnings should be prioritized by the severity of the danger, and general reminders sorted by the sequence of events (a minimal ordering sketch is given at the end of this section). Meanwhile, to keep the interface simple, the amount of information displayed should be reduced as far as possible; we encourage updating the information presented on the interface in real time as events approach in time and distance. For public service, the focus must be on system design, especially the fairness of resource allocation and privacy protection. Specifically, the system's algorithms should be based on existing laws and regulations and should skillfully integrate social accessibility norms, and a controllable algorithm should ensure users' absolute decision-making rights over personal privacy. For inter-vehicle coordination, it is necessary to strengthen the content management of the interface and enhance the certainty of the collaboration. The interface should clearly show the status and intentions of the partner vehicle, and it can explicitly show the collaborating parties' task, its degree of completion, and the concrete steps to fulfill it. Interrupt management also deserves attention: vehicle collaboration may be suspended because of unexpected events, such as a pedestrian who suddenly appears, and the partner vehicle may not know why the cooperation failed. To avoid unnecessary misunderstanding, the collaboration interface should support user input so that the user can provide an explanation after cooperation is suspended.
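As a small, hypothetical sketch of the first suggestion (the message fields and severity scale are invented here and are not part of any V2X standard), danger warnings can be ordered by severity and general reminders by event sequence as follows:

```python
# Hedged sketch of the suggested display ordering: danger warnings first,
# ranked by severity, then general reminders in event order.
from dataclasses import dataclass

@dataclass
class Message:
    kind: str          # "danger" or "reminder"
    severity: int      # higher = more severe (meaningful for danger messages)
    event_time: float  # seconds since scenario start

def display_order(messages):
    dangers = sorted((m for m in messages if m.kind == "danger"),
                     key=lambda m: -m.severity)
    reminders = sorted((m for m in messages if m.kind == "reminder"),
                       key=lambda m: m.event_time)
    return dangers + reminders

queue = [
    Message("reminder", 0, 12.0),   # congestion ahead
    Message("danger", 3, 10.5),     # collision risk at blind intersection
    Message("danger", 1, 11.0),     # hard-braking vehicle ahead
]
print(display_order(queue))
```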

4 Future Research Directions

This study summarizes three common V2X technology scenarios: beyond-visual-range information presentation, public service, and inter-vehicle coordination. First, we found that V2X technology is widely used in beyond-visual-range information presentation scenarios; one possible explanation is that V2X technology addresses traffic safety issues in these scenarios [25], and life safety is the issue researchers have paid most attention to in V2X communication. Second, V2X technology is also important in public service and inter-vehicle cooperation scenarios. Some of their specific scenarios are less common, but V2X technology can ensure traffic safety in these special cases [26]; such user-oriented scenarios deserve more attention in future studies. In addition, more research with actual or potential users is needed to understand what users think; expert interviews and large-scale user questionnaires can be carried out in the future, and the classification of V2X application scenarios can be optimized accordingly. An ongoing review of possible new V2X application scenarios is therefore needed. Finally, broader system design should be considered in addition to functional and interface design; for example, resource allocation relates to algorithmic fairness [27], and disclosure of personal information relates to privacy.


5 Conclusion

In conclusion, this study reviewed previous studies of V2X technology and summarized three main types of V2X application scenarios: beyond-visual-range information presentation, public service, and inter-vehicle coordination. Based on these, we provided design suggestions and potential directions for V2X. This study is one step further toward a thorough understanding of V2X applications and can inform the future design and study of V2X.

References 1. Abdel Hakeem, S.A., Hady, A.A., Kim, H.: Current and future developments to improve 5GNewRadio performance in vehicle-to-everything communications. Telecommun. Syst. 75(3), 331–353 (2020). https://doi.org/10.1007/s11235-020-00704-7 2. Garcia, M.H.C., et al.: A tutorial on 5G NR V2X communications. IEEE Commun. Surv. Tutor. 23(3), 1972–2026 (2021) 3. Wang, Y., Wang, J., Ge, Y., Yu, B., Li, C., Li, L.: MEC support for C-V2X system architecture. In: 2019 IEEE 19th International Conference on Communication Technology (ICCT) (2019) 4. Zeadally, S., Javed, M.A., Hamida, E.B.: Vehicular communications for ITS: standardization and challenges. IEEE Commun. Stand. Mag. 4(1), 11–17 (2020) 5. Abboud, K., Omar, H.A., Zhuang, W.: Interworking of DSRC and cellular network technologies for V2X communications: a survey. IEEE Trans. Veh. Technol. 65(12), 9457–9470 (2016) 6. Gonzalez-Martín, M., Sepulcre, M., Molina-Masegosa, R., Gozalvez, J.: Analytical models of the performance of C-V2X mode 4 vehicular communications. IEEE Trans. Veh. Technol. 68(2), 1155–1166 (2018) 7. Tang, P., Sun, X., Cao, S.: Investigating user activities and the corresponding requirements for information and functions in autonomous vehicles of the future. Int. J. Ind. Ergon. 80, 103044 (2020). https://doi.org/10.1016/j.ergon.2020.103044 8. Zhou, X., Li, S., Ma, L., Zhang, W.: Driver’s attitudes and preferences toward connected vehicle information system. Int. J. Ind. Ergon. 91, 103348 (2022) 9. Ritter, F.E., Baxter, G.D., Churchill, E.F.: User-centered systems design: a brief history. In: Ritter, F.E., Baxter, G.D., Churchill, E.F., Ritter, F.E. (eds.) Foundations for Designing UserCentered Systems: What System Designers Need to Know About People, pp. 33–54. Springer, London (2014). https://doi.org/10.1007/978-1-4471-5134-0_2 10. Sears, A., Jacko, J.A. (eds.): Human-Computer Interaction: Design Issues, Solutions, and Applications. CRC Press, Boca Raton (2009) 11. Bazzi, A., Campolo, C., Masini, B.M., Molinaro, A.: How to deal with data hungry V2X applications? In: Proceedings of the Twenty-First International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing (2020) 12. Xiang, C., et al.: Multi-sensor fusion algorithm in cooperative vehicle-infrastructure system for blind spot warning. Int. J. Distrib. Sens. Netw. 18(5), 15501329221100412 (2022) 13. Xin, C., Dan, L., Shuo, H.: Research on deceleration early warning model based on V2X. In: 2020 12th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA) (2020) 14. Aissaoui, R., Menouar, H., Dhraief, A., Filali, F., Belghith, A., Abu-Dayya, A.: Advanced realtime traffic monitoring system based on V2X communications. In: 2014 IEEE International Conference on Communications (ICC) (2014)


15. Krajzewicz, D., et al.: COLOMBO: investigating the potential of V2X for traffic management purposes assuming low penetration rates (2013) 16. Schwarz, F., Fastenmeier, W.: Augmented reality warnings in vehicles: effects of modality and specificity on effectiveness. Accid. Anal. Prev. 101, 55–66 (2017) 17. Vitale, F., Roncoli, C.: Distributed formation control for managing CAV overtaking and intersection maneuvers. IFAC-PapersOnline 55(13), 198–203 (2022) 18. Shaaban, K., Khan, M.A., Hamila, R., Ghanim, M.: A strategy for emergency vehicle preemption and route selection. Arab. J. Sci. Eng. 44(10), 8905–8913 (2019). https://doi.org/10. 1007/s13369-019-03913-8 19. Hobert, L., Festag, A., Llatser, I., Altomare, L., Visintainer, F., Kovacs, A.: Enhancements of V2X communication in support of cooperative autonomous driving. IEEE Commun. Mag. 53(12), 64–70 (2015) 20. Keshavamurthy, P., Pateromichelakis, E., Dahlhaus, D., Zhou, C.: Resource scheduling for V2V communications in co-operative automated driving. In: 2020 IEEE Wireless Communications and Networking Conference (WCNC) (2020) 21. Boban, M., Kousaridas, A., Manolakis, K., Eichinger, J., Xu, W.: Use cases, requirements, and design considerations for 5G V2X. arXiv preprint arXiv:1712.01754 (2017) 22. Cigno, R.L., Segata, M.: Cooperative driving: a comprehensive perspective, the role of communications, and its potential development. Comput. Commun. 193, 82–93 (2022) 23. Balador, A., Bazzi, A., Hernandez-Jayo, U., de la Iglesia, I., Ahmadvand, H.: A survey on vehicular communication for cooperative truck platooning application. Veh. Commun. 35, 100460 (2022) 24. Fank, J., Knies, C., Diermeyer, F.: After you! Design and evaluation of a human machine interface for cooperative truck overtaking maneuvers on freeways. In: 13th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, pp. 90–98 (2021) 25. Anaya, J.J., Talavera, E., Giménez, D., Gómez, N., Jiménez, F., Naranjo, J.E.: Vulnerable road users detection using V2X communications. In: 2015 IEEE 18th International Conference on Intelligent Transportation Systems (2015) 26. Miao, L., Virtusio, J.J., Hua, K.-L.: PC5-based cellular-V2X evolution and deployment. Sensors 21(3), 843 (2021) 27. Bonald, T., Roberts, J.: Multi-resource fairness: objectives, algorithms and performance. In: Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pp. 31–42 (2015)

Who Should We Choose to Sacrifice, Self or Pedestrian? Evaluating Moral Decision-Making in Virtual Reality Huarong Wang1 , Dongqian Li1,2 , Zhenhang Wang3 , Jian Song1 , Zhan Gao3(B) , and David C. Schwebel4 1 Traffic Psychology, Institute of Special Environmental Medicine, Nantong University, No. 9

Seyuan Road, Nantong 226019, Jiangsu, China 2 Jiangsu Shipping College, Nantong, China 3 School of Information Science and Technology, Nantong University, Nantong, China [email protected] 4 Department of Psychology, University of Alabama at Birmingham, Birmingham, AL, USA

Abstract. Ethical issues surrounding autonomous vehicles (AVs) have emerged as a classic topic in academic circles. When an AV faces a situation with inevitable casualties, how can the machine make the most ethical decision about who to sacrifice? Should the AV protect its passengers, or save pedestrians on the roadway? To provide a reference for designing the moral algorithms that govern AVs, the present study considers the roles of time pressure, pedestrian demographics, and driver demographics in human drivers' moral decision-making. Sixty college students (50% men, mean age = 20.13 years, SD = 1.76) participated in a series of moral decisions involving self- versus other-sacrifice within a virtual reality driving scenario. A 3 (pedestrian age group: child vs middle-aged vs elderly) × 2 (pedestrian gender: male vs female) × 2 (time pressure: present vs absent) × 2 (participant gender: male vs female) mixed factorial design was implemented. The results showed that: (1) individuals chose to sacrifice themselves more often than pedestrians, with self-sacrifice more common when faced with child pedestrians and female pedestrians; (2) female participants chose to sacrifice themselves more often than males; and (3) when facing time pressure, individuals chose to sacrifice themselves more than pedestrians, but there was no significant difference between self- and other-sacrifice without time pressure. We concluded that moral decisions when faced with inevitable sacrifice in a simulated traffic situation followed super altruism patterns. Female participants were more deontological in moral judgment, and greater priority was given to saving female and child pedestrians. Further, individuals' decision-making under time pressure was more deontic. The results have implications for the development of AV machine learning paradigms.

Keywords: Moral dilemma · Virtual reality · Time pressure · Pedestrian

H. Wang and D. Li—Contributed equally to this work.


1 Introduction

The rapid development of autonomous vehicles (AVs) has aroused great attention to the ethical challenges they present. In particular, the "dilemma" of how machine-learning devices engage in moral decision-making has emerged as a classic topic in academic circles. When an AV is faced with a situation of inevitable casualties, what decision should it make? Should it protect the passengers in the AV as a priority, or should it obey a utilitarian doctrine, which claims that the moral rightness or wrongness of an action depends solely on the quality of its consequences, and that casualties should therefore be minimized? Should the identity or demographics of the passengers or the potential victims on the roadway matter? Gathering information about actual human decisions provides a valuable reference to design moral algorithms governing AV actions.

Previous studies suggest individuals' moral decisions in situations of inevitable casualty are affected by the type of pedestrian(s) who may be killed in traffic situations. Individuals tend to make active choices to minimize loss of life when facing different numbers of people in traffic (e.g., sacrificing fewer to save a larger group) [1–3], and tend to spare females over males [1, 4]. Further, participants seem to accept sacrificing elderly people over younger ones; sacrificing children is accepted least often [5–7].

Previous research also suggests the presence of time pressure impacts moral decision-making. People tend to make more deontological moral decisions – decisions that evaluate the rightness of an action not merely in terms of its consequences but also by embracing moral norms such as duties – when under time pressure compared to when they have ample time to make a decision [6, 8–10]. This leads to time-pressured decisions to sacrifice AV passengers rather than pedestrians. For example, Frank and colleagues (2019) used a moral machine online survey to collect 807 people's decisions under two decision-making modes, deliberate and without time pressure versus intuitive and with time pressure. Pedestrians were sacrificed considerably less often (21.5%) under time pressure than without it (36.5%) [6].

However, this deontological moral decision-making tendency does not always emerge in more ecologically valid simulated traffic situations. For example, unlike previous results using survey or hypothetical scenario methodologies [6, 8, 9], Sütfeld and colleagues employed immersive virtual reality to assess ethical behavior in simulated road traffic scenarios [4]. They found that time pressure decreased people's consistency in their patterns of decisions, suggesting moral judgments and behaviors are context-dependent and may vary based on multiple contextual factors, including whether there is time pressure to act quickly versus ample time to be thoughtful in considering the decision.

Another factor that impacts moral decision-making is individual differences, such as gender and culture. Previous research suggests women are more deontological than men in moral judgment, while men show a stronger preference for utilitarian judgments than women [11–15]. Thus, we might expect women to sacrifice themselves more often than men out of perceived duty or obligation. Cultural effects have also been observed in previous research. Awad and colleagues, for example, conducted a large-scale online experiment to gather 40 million decisions from millions of people in ten languages across 233 countries and territories.
Results showed substantial cross-cultural variation in decision-making, with a greater preference to spare younger rather than older individuals in Central and South America compared to Eastern Asia [1].


One factor that is lacking in most previous research, but relevant to the development of AV paradigms, is the moral dilemma involved in sacrificing oneself. Most existing research presents dilemmas concerning other-sacrifice, but human behavior may differ when one's own life is involved. The available research finds that individuals tend to protect themselves more often in decision-making [6, 16–18].

The current study presented individuals with a virtual reality (VR) scenario in which they faced a dilemma: sacrifice themselves or a pedestrian in the roadway. Specifically, drivers were traveling on a mountainous highway when they encountered a fork in the road just as their brakes failed. Choosing one direction led to a destroyed bridge, where their car would plummet off a cliff. Choosing the other direction led to a pedestrian on the road, who would be struck and killed. We manipulated the type of pedestrian (child vs middle-aged vs elderly; male vs female) as well as the time pressure to make the decision (5-s vs 30-s window). All data were collected in China, an Eastern Asian country.

We posited four primary hypotheses: (1) compared with a middle-aged pedestrian, when faced with an elderly person or a child on the virtual road, individuals will choose to sacrifice themselves more often; (2) compared with a male pedestrian, when faced with a female pedestrian on the virtual road, individuals will choose to sacrifice themselves more often; (3) individuals will choose to sacrifice themselves more often under time pressure compared with an absence of time pressure; and (4) compared with male individuals, female individuals will choose to sacrifice themselves more often.

2 Methods

2.1 Participants
Based on previous research [19], the target sample size was determined prior to the study using G-Power 3.1.9.2. Power analysis indicated that a total of 12 participants in each group (number of measurements = 6; number of groups = 4) was needed to detect a large effect size (η2p = 0.14, f = 0.40) when α = 0.05 and power = 0.95, using a repeated measures analysis of variance (ANOVA). In total, we recruited 64 participants. Four were excluded from further analysis due to errors while collecting physiological data, leaving valid data for analysis from 60 participants (50% male; mean age = 20.13 years, SD = 1.76, range = 20–24). Among them, 42 participants (70%) reported they were licensed drivers. Approval for the research was obtained from the Nantong University Academic Ethics Committee. Written consent was secured from all participants prior to participation.

2.2 Study Design
We implemented a 3 (pedestrian age: child vs middle-aged person vs elderly person) × 2 (pedestrian gender: male vs female) × 2 (time pressure: with vs without) × 2 (participant gender: male vs female) mixed factorial design. Pedestrian age and gender served as within-subject predictors, and time pressure and participant gender as between-subject predictors. The dependent variable was participants' decision in the VR traffic situations: the percentage of trials on which they chose to sacrifice themselves.
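For readers who want to reproduce the sample-size reasoning outside G-Power, the sketch below computes power for a within-subjects effect from a noncentral F distribution. It is a minimal illustration under stated assumptions, not the authors' procedure: the correlation among repeated measures (rho) and the nonsphericity correction (epsilon) are assumed defaults that the paper does not report, so the resulting value will not necessarily match the G-Power 3.1.9.2 output.

```python
# Minimal sketch of a repeated-measures ANOVA power calculation.
# rho (correlation among repeated measures) and epsilon (nonsphericity
# correction) are assumptions, not values reported in the study.
from scipy import stats

def rm_anova_power(f=0.40, n_total=48, n_groups=4, n_meas=6,
                   rho=0.5, epsilon=1.0, alpha=0.05):
    """Approximate power of a within-subjects effect via a noncentral F test."""
    # Noncentrality grows with the number of repeated measures and shrinks
    # as the repeated measures become less correlated.
    lam = (f ** 2) * n_total * n_meas * epsilon / (1 + (n_meas - 1) * rho)
    df1 = (n_meas - 1) * epsilon
    df2 = (n_total - n_groups) * (n_meas - 1) * epsilon
    f_crit = stats.f.ppf(1 - alpha, df1, df2)
    return 1 - stats.ncf.cdf(f_crit, df1, df2, lam)

# Power for 12 participants in each of the 4 groups (48 in total)
print(round(rm_anova_power(), 3))
```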


2.3 Virtual Reality Road Traffic Scenarios
Decision-making data were collected within a virtual reality traffic scenario. We developed a mountainous highway traffic environment. Mountainous highways are common in China, and they suffer from high crash and fatality rates due to the complex terrain [20–23]. The environment included several traffic scenarios, each of which began with a typical driving section and then introduced an emergency. During the typical driving section, the participants were asked to drive at a speed of about 100 km/h for 3 km. They encountered no traffic and traveled along a typical mountain highway, with some curves and inclines/declines.

Next, participants faced an emergency situation with a fork in the road ahead. One side of the fork led to a collapsed bridge, where the driver would descend off a high cliff and presumably die. The other side of the fork presented a crashed vehicle with a pedestrian standing in the middle of the road. As the participant approached this fork in the road, they were unable to decelerate because the vehicle's brakes had failed; when they depressed the brake pedal, the vehicle did not slow down. Given their speed and the size of the cliff, either option would clearly lead to a fatality. Thus, participants were forced to steer into one fatal choice or the other. Across scenarios, we manipulated the pedestrian's age (child vs middle-aged person vs elderly person) and gender (male vs female) (see Fig. 1).

To optimize participant alertness, two visual and auditory warnings were presented simultaneously at the beginning of the emergency segment: "Bridge collapse ahead on the right side of the fork, danger!" and "Attention! There is a pedestrian on the left side of the fork ahead!". Further, to reduce the potential impact of decisions based on sidedness, the collapsed road and the pedestrian appeared on the left versus right side of the road in a random manner. To conform to ethical guidelines in experimental studies, the VR simulation did not display the actual crash, but went black shortly before the collision or cliff drop-off and then presented a written message on the screen, either "You've chosen to sacrifice yourself" or "You've chosen to sacrifice the pedestrian".

Across traffic scenarios, we manipulated the time at which the vehicle with brake failure arrived at the fork to influence the time pressure on drivers to make a decision. Based on previous research [6], we selected 30 s as the latency with no time pressure and 5 s as the latency with time pressure.

The virtual traffic scene was presented to participants using HTC VIVE positioning and head-mounted display (HMD) equipment (Taoyuan City, Taiwan). We used Unity 2019.3.1f (Unity Technologies, San Francisco, USA) to develop the VR software and a car-driving simulator based on the Logitech G27 platform to evaluate participants' driving behaviors and decisions. Figure 2 illustrates the setup and visuals. Immediately following their participation, participants completed a short survey to report whether they experienced simulator sickness via the Simulator Sickness Questionnaire (SSQ) [24], a 16-item scale describing the extent to which participants feel nausea, oculomotor discomfort, and disorientation after experiencing a VR environment. Each item is scored on a 4-point scale, with 0 signifying no such feeling and 3 signifying intense feelings. Higher scores indicate more severe simulator sickness.
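For concreteness, the two latency conditions (5 s and 30 s) can be translated into approximate approach distances, assuming the nominal speed of about 100 km/h described above (the paper reports times rather than distances, so these figures are only a back-of-the-envelope illustration):

```latex
v \approx 100~\text{km/h} \approx 27.8~\text{m/s}, \qquad d = v\,t
\;\Rightarrow\; d_{5\,\mathrm{s}} \approx 139~\text{m}, \quad d_{30\,\mathrm{s}} \approx 833~\text{m}.
```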


Fig. 1. Schematic representations of the virtual reality traffic scenes: (1) a boy on the virtual road; (2) a girl on the virtual road; (3) a middle-aged man on the virtual road; (4) a middle-aged woman on the virtual road; (5) an elderly man on the virtual road; (6) an elderly woman on the virtual road; (7) the collapsed bridge; and (8) black screen with text reading, “you’ve chosen to sacrifice yourself”.


Fig. 2. Experimental setup and visuals.

2.4 Procedure
Each participant was tested individually in a quiet laboratory room. Participants were informed about the purpose of the experiment and completed informed consent procedures. After that, research assistants helped participants put on the VR helmet and familiarized them with the Logitech steering wheel. After calibrating the VR equipment, the research assistant guided participants to begin driving. During the initial "practice" phase, participants were asked to drive 3 km on a mountainous highway at a speed of about 100 km/h. If the participant handled that stretch successfully twice without crashing (e.g., hitting the guardrail), the practice section was complete, participants took a 2-min break, and then the formal experiment began. If participants failed three practice sessions, they did not continue to the main experiment. No participants were excluded for this reason.

During the formal experimental stage, participants were randomly assigned to complete the driving task under time pressure (5 s) or without time pressure (30 s), with assignment stratified by participant gender. The six types of pedestrians were presented on the left and right sides of the fork randomly, so each participant drove 12 trials and made 12 decisions (3 pedestrian ages × 2 pedestrian genders × 2 sides). A 1-min break was provided between trials. On average, the full experiment lasted approximately 30 min, and participants received small gifts as compensation for their time.

2.5 Data Analysis
First, we examined descriptive data concerning simulator sickness and evaluated any influence of driving experience on the moral decision results. Next, we calculated the


average proportion of participants who chose to sacrifice themselves versus the pedestrians in each of the traffic situations. These data were examined descriptively and using inferential t-tests and ANOVA models. SPSS 22.0 was used to conduct all analyses.
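For readers who prefer an open-source route, the descriptive and inferential steps described above can be approximated outside SPSS. The sketch below is a minimal illustration, not the authors' code: the input file and column names (subject, ped_age, ped_gender, self_sacrifice) are hypothetical placeholders, and it covers only the paired comparison and the two within-subject factors; the full four-way mixed ANOVA reported in the Results would require a more general mixed-model routine.

```python
# Minimal sketch of the analyses described above; file and column names are
# hypothetical placeholders, one row per participant x condition.
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

df = pd.read_csv("decisions.csv")

# Paired t-test: proportion of self-sacrifice vs. pedestrian-sacrifice choices
self_p = df.groupby("subject")["self_sacrifice"].mean()
t, p = stats.ttest_rel(self_p, 1 - self_p)
print(f"t = {t:.2f}, p = {p:.3f}")

# Repeated measures ANOVA over the two within-subject factors only
rm = AnovaRM(df, depvar="self_sacrifice", subject="subject",
             within=["ped_age", "ped_gender"], aggregate_func="mean").fit()
print(rm)
```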

3 Results

Data preprocessing found that there was no significant difference in moral decision-making among individuals with different driving experience. Therefore, the influence of driving experience was not considered further. Our analysis of simulator sickness showed that the average score was 0.34 (SD = 0.02; score range = 0–3) and all participants' average scores were under 1, indicating that the VR traffic situation did not cause significant motion sickness to any participants.

Table 1 presents descriptive data on the decision for self-sacrifice versus other-sacrifice under various scenarios. Paired t-test analyses showed that when there was a child on the VR road, the proportion of participants who chose to sacrifice themselves was significantly higher than the proportion who sacrificed the child (p < 0.001). Similarly, when a female pedestrian was on the road, participants were more likely to sacrifice themselves than sacrifice the pedestrian (p < 0.05). In all other situations, there was no significant difference between sacrificing oneself and sacrificing others. Participants chose to sacrifice themselves more often than sacrifice the pedestrian when under time pressure, p < 0.001, but when there was no time pressure, selection of self-sacrifice versus other-sacrifice was similar, p > 0.05. Finally, there was no significant difference between self-sacrifice and other-sacrifice among male participants, p > 0.05, but female participants chose to sacrifice themselves more often than sacrifice the pedestrian, p < 0.001.

Table 2 shows the percentage of self-sacrifice decisions across participant gender and time pressure situations. A 2 (time pressure: with vs. without) × 2 (participant gender: male vs. female) × 2 (pedestrian gender: female vs. male) × 3 (pedestrian age group: child vs. middle-aged person vs. elderly person) repeated measures ANOVA showed a significant main effect of pedestrian age, F(2, 112) = 20.27, p < 0.001, η2p = 0.27. Participants chose to sacrifice themselves more often when they encountered a child versus a middle-aged (p < 0.001) or elderly (p < 0.001) pedestrian. The main effect of participant gender was also significant, F(1, 56) = 11.4, p < 0.01, η2p = 0.17, and showed that female participants chose to self-sacrifice more often than male participants. No significant main effects emerged for pedestrian gender or time pressure, nor did any significant interaction effects emerge. The interaction effect of age group by time pressure was marginally significant, F(2, 112) = 2.45, p = 0.091. Simple effect tests showed that under time pressure, participants chose to sacrifice themselves less often when they faced an elderly pedestrian versus a child or a middle-aged one, and they chose to sacrifice themselves more often when they faced a child versus a middle-aged person. However, in the absence of time pressure, participants chose to sacrifice themselves less often when they faced an elderly person than when they faced a child or a middle-aged person, but there was no difference between when they faced a child versus a middle-aged person (see Fig. 3).


Table 1. Percentage of self-sacrifice versus pedestrian sacrifice at different variable levels (M ± SE)

Variables            Level                    Self-sacrifice   Sacrifice pedestrian   t
Traffic situation    Child pedestrian         76.67 ± 4.82     23.33 ± 4.82           5.53***
                     Middle-aged pedestrian   55.00 ± 5.24     45.00 ± 5.24           0.95
                     Elderly pedestrian       41.67 ± 5.66     58.33 ± 5.66           −1.47
                     Male pedestrian          55.28 ± 4.40     44.72 ± 4.40           1.20
                     Female pedestrian        60.28 ± 4.27     39.72 ± 4.27           2.41*
Time pressure        Time pressure            62.22 ± 6.03     37.78 ± 6.03           3.59***
                     Without time pressure    53.33 ± 5.55     46.67 ± 5.55           0.95
Participant gender   Male participants        45.00 ± 5.98     55.00 ± 5.98           −0.84
                     Female participants      70.56 ± 4.62     29.44 ± 4.62           4.45***

Table 2. Percentage of self-sacrifice under various variable levels (M ± SE)

                                  Male participant                          Female participant
Traffic situation                 Time pressure    Without time pressure    Time pressure    Without time pressure
Male child pedestrian             76.67 ± 10.76    53.33 ± 12.41            93.33 ± 4.45     76.67 ± 10.76
Male middle-aged pedestrian       36.67 ± 12.41    40.00 ± 13.07            63.33 ± 10.31    63.33 ± 11.41
Male elderly pedestrian           20.00 ± 10.69    23.33 ± 10.76            73.33 ± 10.76    43.33 ± 11.82
Female child pedestrian           80.00 ± 10.69    56.67 ± 12.79            93.33 ± 17.59    83.33 ± 7.97
Female middle-aged pedestrian     43.33 ± 12.79    56.67 ± 12.79            66.67 ± 10.54    70.00 ± 10.69
Female elderly pedestrian         23.33 ± 10.76    30.00 ± 11.75            76.67 ± 9.59     43.33 ± 11.82


Fig. 3. Individuals’ percentage of self-sacrifice decisions by pedestrian age group and time pressure.

4 Discussion

The present study considered decision-making in moral dilemmas presented through VR, offering data that may be useful to develop AV machine-learning paradigms for high-stakes traffic situations. The results showed that individuals tend to choose to sacrifice themselves more often than sacrifice pedestrians when presented with only those two choices, although the tendency to self-sacrifice varied based on the participant's gender, the gender and age of the pedestrian, and whether there was time pressure to make the decision.

4.1 Individuals' Moral Decision-Making in Tasks Involving a Self-sacrifice Choice
Except for the traffic situations involving an elderly pedestrian, the proportion of individuals who chose to sacrifice themselves was higher than the proportion who chose to sacrifice the pedestrian on the VR road. The difference between the two choices (self-sacrifice versus other-sacrifice) was statistically significant when there was a child or a female pedestrian on the simulated roadway, indicating that individuals' moral decision-making displayed super altruistic tendencies [25, 26]. These results differ somewhat from previous studies using different methodologies [6, 17, 18], which reported that individuals' moral decisions reflected self-protective tendencies.

One explanation for the difference in our results compared to previous research is our use of VR traffic situations to carry out the experiments. Compared with simulation studies using photographs of traffic scenes [6, 17] or questionnaire surveys [18], VR technology offers a high-fidelity interactive situation, creating higher ecological validity and potentially leading individuals to respond in a more realistic manner to the moral prospect of sacrificing others. This hypothesis is supported by results from Bergmann and colleagues, which also showed individuals were willing to sacrifice themselves more than others when presented with the decision on a VR roadway [2]. A second explanation for our


finding is the fact that the data were collected in China, a society influenced by traditional Confucian culture [27], in which individuals tend to give priority to others. With the exception of the Zhu et al. questionnaire survey study [18], other conflicting data were collected in Western countries.

The finding that male and female individuals responded to the moral dilemmas differently, with females choosing to sacrifice themselves more than males, is consistent with previous research [11–14] and confirms that females are more deontological than males in moral judgment. Women may experience stronger affective reactions to the notion of causing harm to others, while men engage in more cognitive processing aimed at maximizing overall outcomes [14]. This possibility is supported by the fact that when male participants faced a child on the VR road in our study, they chose to sacrifice themselves more often than sacrifice the child, a pattern similar to the female participants. However, when men faced a middle-aged or elderly person, they chose less often to sacrifice themselves, perhaps reasoning that the value of their lives was equal or superior to the others given their younger age.

4.2 The Effect of Pedestrian Type on Individuals' Moral Decision-Making
Consistent with previous findings [7, 28, 29], individuals in our study chose to sacrifice themselves more often when they faced a child on the virtual road compared with a middle-aged or elderly pedestrian. This supports the conclusion that people tend to value life less as an individual ages [30]. We also found that individuals chose to sacrifice themselves more often when confronted with female pedestrians, but self-sacrifice was about equal to other-sacrifice when participants confronted male pedestrians. This result may reflect the tendency of individuals to give priority to women in moral decision-making [4, 31].

4.3 The Effect of Time Pressure on Individuals' Moral Decision-Making
Under time pressure, we found that individuals chose to sacrifice themselves more than the pedestrian, while there was no significant difference between self- and other-sacrifice without time pressure. This result replicates previous studies [6, 8, 10] and indicates individuals tend to display more deontological reasoning under time pressure, choosing to sacrifice themselves rather than the pedestrian. Combined with our broader finding that individuals chose to sacrifice themselves more often than the pedestrian across most VR traffic situations, these results suggest that, in a highly self-involved moral dilemma where the choice of "sacrificing oneself" exists, individuals tend toward a deontic decision whether there is time pressure or not. Time pressure intensifies the deontic tendency of individual moral decision-making rather than changing it.

Specifically, under time pressure, the proportion of individuals who chose to sacrifice themselves when facing a child was significantly higher than the proportion facing a middle-aged or elderly person (and there was no difference in the proportion who chose to sacrifice themselves when facing a middle-aged versus an elderly person). Without time pressure, individuals chose to sacrifice themselves when facing a child more than when facing an elderly person, but at a rate statistically similar to self-sacrifice when facing


a middle-aged person. Thus, individuals are more inclined toward deontology under time pressure, but when individuals have time to consider their decisions, they tend somewhat more toward utilitarian responses based on the life value model. Therefore, with sufficient time, middle-aged people are given higher priority than elderly ones, a decision that reflects people's general ethical preference to respect the value of life [28, 30].

4.4 Application to Autonomous Vehicles and Study Limitations
As engineers develop machine learning protocols for AVs, they must consider paradigms for machines to follow when confronted with difficult moral decisions. Our results offer additional data concerning human beliefs about the right moral decisions and suggest self-sacrifice is accepted by many, especially when it allows younger people to survive. There is some tendency to preserve female lives over male lives as well, although females tend to self-sacrifice more than males. Time pressure decreases utilitarian decisions that value preservation of life and sacrifice of those who are older.

Our study suffered from two notable limitations. First, we studied individuals' moral decision-making in a virtual environment. Although VR offers the advantage of presenting vivid, realistic-feeling traffic scenes and allowing potentially fatal moral decisions without jeopardizing participant safety, it also may create a "game-like" scenario among participants that affects results. It is unclear if decisions in a virtual world would translate directly to decisions in the real world. Second, the participants in our study were relatively young (mean age = 20.13 years), and results may not generalize. Older drivers may sacrifice themselves more often, for example, and future research should recruit a wider age range of drivers.

5 Conclusions

The moral decisions of individuals in the dilemma presented in the VR traffic situation followed super altruism; that is, individuals were more inclined to sacrifice themselves to protect others. Greater priority for survival was given to female and child pedestrians in decision-making; female individuals were more deontological in their moral judgments; and individual moral decision-making under time pressure was more deontic. This study contributes to the field by expanding our understanding of factors that influence moral decisions in traffic and offering data to help guide the development of machine learning protocols for AVs.

Funding. This work was primarily supported by the Major Project of Philosophy and Social Science Research in Colleges and Universities of Jiangsu Province in China [2020SJZDA118] and the Jiangsu Social Science Fund Project in China [21JYB013].

References
1. Awad, E., Dsouza, S., Kim, R., Schulz, J., Rahwan, I.: The moral machine experiment. Nature 563(7729), 59–64 (2018)


2. Bergmann, L.T., et al.: Autonomous vehicles require socio-political acceptance-an empirical and philosophical perspective on the problem of moral decision making. Front. Behav. Neurosci. 12(31), 1–12 (2018) 3. Greene, J.D.: Moral Tribes: Motion, Reason and the Gap between us and them. Penguin Books, London (2014) 4. Sütfeld, L.R., Gast, R., König, P., Pipa, G.: Using virtual reality to assess ethical decisions in road traffic scenarios: applicability of value-of-life-based models and influences of time pressure. Front. Behav. Neurosci. 11(122), 1–13 (2017) 5. Faulhaber, A.K., et al.: Human decisions in moral dilemmas are largely described by utilitarianism: virtual car driving study provides guidelines for autonomous driving vehicles. Sci. Eng. Ethics 25(6293), 399–418 (2018) 6. Frank, D.A., Chrysochou, P., Mitkidis, P., Ariely, D.: Human decision-making biases in the moral dilemmas of autonomous vehicles. Sci. Rep. 9(1), 1–19 (2019) 7. Kawai, N., Kubo, K., Kubo-Kawai, N.: “Granny dumping”: acceptability of sacrificing the elderly in a simulated moral dilemma. Jpn. Psychol. Res. 56(3), 254–262 (2014) 8. Greene, J.D., Nystrom, L.E., Engell, A.D., Darley, J.M., Cohen, J.D.: The neural bases of cognitive conflict and control in moral judgment. Neuron 44(2), 389–400 (2004) 9. Suter, R.S., Hertwig, R.: Time and moral judgment. Cognition 119(3), 454–458 (2011) 10. Swann, W.B., Gómez, Á., Buhrmester, M.D., López-Rodríguez, L., Jiménez, J., Vázquez, A.: Contemplating the ultimate sacrifice: identity fusion channels pro-group affect, cognition, and moral decision making. J. Pers. Soc. Psychol. 106(5), 713–727 (2014) 11. Armstrong, J., Friesdorf, R., Conway, P.: Clarifying gender differences in moral dilemma judgments: the complementary roles of harm aversion and action aversion. Soc. Psychol. Personality Sci. 10(3), 353–363 (2019) 12. Bracht, J., Zylbersztejn, A.: Moral judgments, gender, and antisocial preferences: an experimental study. Theor. Decis. 85(3–4), 389–406 (2018). https://doi.org/10.1007/s11238-0189668-6 13. Capraro, V., Sippel, J.: Gender differences in moral judgment and the evaluation of genderspecified moral agents. Cogn. Process. 18(4), 399–405 (2017). https://doi.org/10.1007/s10 339-017-0822-9 14. Friesdorf, R., Conway, P., Gawronski, B.: Gender differences in responses to moral dilemmas: a process dissociation analysis. Pers. Soc. Psychol. Bull. 41(5), 696–713 (2015) 15. Fumagalli, M., et al.: Gender-related differences in moral judgments. Cogn. Process 11(3), 219–226 (2010) 16. Mamak, K., Glanc, J.: Problems with the prospective connected autonomous vehicles regulation: Finding a fair balance versus the instinct for self-preservation. Technol. Soc. 71, 102127 (2022) 17. Mayer, M.M., Bell, R., Buchner, A.: Self-protective and self-sacrificing preferences of pedestrians and passengers in moral dilemmas involving autonomous vehicles. PLoS ONE 16(12), e0261673 (2021) 18. Zhu, A., Yang, S., Chen, Y., Xing, C.A.: Moral decision-making study of autonomous vehicles: Expertise predicts a preference for algorithms in dilemmas. Personality Individ. Differ. 186, 111356 (2022) 19. Zhan, Y., Xiao, X., Tan, Q., Zhong, Y.: Influence of self-relevance on moral decision-making under reputational loss risk: An ERP study. Chin. Sci. Bull. 65(19), 1996–2009 (2020) 20. Chen, F., Chen, S., Ma, X.: Crash frequency modeling using real-time environmental and traffic data and unbalanced panel data models. Int. J. Environ. Res. Public Health 13(6), 609 (2016) 21. 
Wen, H., Xue, G.: Injury severity analysis of familiar drivers and unfamiliar drivers in single-vehicle crashes on mountainous highways. Accid. Anal. Prev. 144, 105667 (2020)


22. Xue, G., Wen, H.Y.: Crash-prone section identification for mountainous highways considering multi-risk factors coupling effect. J. Adv. Transp. 2019, 9873832 (2019) 23. Xue, G., Wen, H.: How accurately do the drivers perceive the hazardous degrees of different mountainous highway traffic risk factors? Cogn. Technol. Work 23(1), 177–187 (2020). https://doi.org/10.1007/s10111-020-00623-2 24. Kennedy, R., Lane, N., Berbaum, K., Lilienthal, M.: Simulator sickness questionnaire: An enhanced method of quantifying simulator sickness. Int. J. Aviat. Psychol. 3, 203–220 (1993) 25. Crockett, M.J., et al.: Dissociable effects of serotonin and dopamine on the valuation of harm in moral decision making. Curr. Biol. 25(14), 1852–1859 (2015) 26. Volz, L.J., Welborn, B.L., Gobel, M.S., Gazzaniga, M.S., Grafton, S.T.: Harm to self outweighs benefit to others in moral decision making. Proc. Natl. Acad. Sci. 114(30), 7963–7969 (2017) 27. Tsai, D.F.C.: The bioethical principles and Confucius’ moral philosophy. J. Med. Ethics 31(3), 159–163 (2005) 28. Cropper, M.L., Aydede, S.K., Portney, P.R.: Preferences for life savings programmes: how the public discounts time and age. J. Risk Uncertain. 8, 243–265 (1994) 29. Mandel, D.R., Vartanian, O.: Taboo or tragic: effect of trade off type on moral choice, conflict, and confidence. Mind Soc. 7(2), 215–226 (2008) 30. Johansson-Stenman, O., Martinsson, P.: Are some lives more valuable? An ethical preferences approach. J. Health Econ. 27(3), 739–752 (2008) 31. Fazio, R.H., Olson, M.A.: Implicit measures in social cognition research: their meaning and use. Annu. Rev. Psychol. 54(1), 297–327 (2003)

A Literature Review of Current Practices to Evaluate the Usability of External Human Machine Interface

Yahua Zheng1,2, Kangrui Wu1,2, Ruisi Shi1,2, Xiaopeng Zhu3, and Jingyu Zhang1,2(B)

1 CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China
[email protected]
2 Department of Psychology, University of the Chinese Academy of Sciences, Beijing, China
3 Chongqing Changan Automobile Co., Ltd., Chongqing, China

Abstract. In traditional traffic environments, human drivers can communicate with other drivers, cyclists, and pedestrians through gestures, facial expressions, etc., to convey their intentions. However, most current autonomous vehicles cannot communicate effectively with other road users while operating in autonomous driving mode. The critical problem is that other road users cannot easily understand the intention of the autonomous vehicle. Thus, autonomous vehicles need to communicate with the outside world in addition to being able to detect other road users and make relevant maneuvers to avoid potential conflicts. The external human-machine interface (eHMI) communicates with other road users outside the vehicle and is believed to effectively resolve conflicts between fully autonomous vehicles and other road users. As pedestrians need to understand the intentions of the vehicles quickly and accurately, usability is of great importance for an eHMI. The current study sought to thoroughly analyze the existing literature on evaluation methods and establish a theory-driven evaluation system. We developed our evaluative framework based on the situational awareness model of pedestrians. In making a safe crossing decision, pedestrians need to form correct situation awareness, in which they must form an accurate perception, a proper understanding, and a timely prediction of vehicle behaviors [1–3]. Based on this model, a well-designed eHMI must facilitate the perception, understanding, and prediction processes. In this way, safety, perceptibility, intelligibility, and adaptability are required, and the four major usability domains have been theoretically established.

Keywords: eHMI · Autonomous Vehicle · Pedestrian · Evaluation

1 Introduction

1.1 Autonomous Vehicles
In the traditional traffic environment, human drivers need both driving ability and specific communication ability to facilitate the necessary interaction with other vehicle drivers, cyclists, or pedestrians to maintain traffic safety. For example, drivers express


their intentions to other road users through eye contact, gestures, etc. This form of communication, however, does not apply to autonomous vehicles. Although a current autonomous vehicle can detect the road users around it and present their specific location information to the autonomous driver through the human-machine interface in the vehicle, it lacks the ability to express itself. The lack of effective interaction will create uncertainty in the traffic environment, increase the possibility of traffic accidents, and reduce people's acceptance of autonomous vehicles on public roads [4]. Evidence from a large amount of literature shows that the lack of trust and of effective interaction modules will reduce perceived safety, and pedestrians may be more cautious and feel unsafe when interacting with autonomous vehicles [5–7]. Due to inefficient interaction with other vulnerable road users, specifically pedestrians, the public is skeptical about the safety of autonomous vehicles, which negatively affects trust formation [8, 9].

The emergence of autonomous vehicles can establish a safer and more efficient traffic system. Still, the potential safety issues associated with the lack of communication between vehicles and pedestrians could significantly impact its development. At present, pedestrians may struggle to accurately identify the automation levels and the diverse strategies of different autonomous vehicles, which can affect the accuracy of their judgments of vehicle behavior. When other road users predict the behavior of autonomous vehicles based on their experience with manually driven cars or their unsubstantiated expectations of autonomous vehicles, they may generate false expectations, leading to traffic accidents [10]. Therefore, establishing effective communication between autonomous vehicles and other road users and ensuring safe interaction is critical in developing autonomous vehicles.

From a vehicle design perspective, autonomous vehicles must add additional interactive information to compensate for the absence of interactive details provided by the driver in traditional situations. In this way, they can help pedestrians form precise situational awareness of whether or not to cross the road at the sight of an autonomous vehicle. From Deb's perspective [11], additional interactive information between autonomous vehicles and pedestrians will be a critical improvement for road safety regarding autonomous vehicles. Therefore, with effective communication between road users and autonomous vehicles, safer road travel will be experienced, including reduced chances of injuries due to accidents among pedestrians, especially in busy areas like central business districts.

1.2 The Study of eHMI
Inspired by internal human-machine interfaces (HMIs), many studies have attempted to design an external human-machine interface (eHMI) installed on the exterior of the vehicle, aiming to provide information about the vehicle's status or intentions to other road users outside the vehicle, thus removing public uncertainty about the behavior of self-driving vehicles. According to Ackermans [12], external human-machine interaction aims to help pedestrians understand the vehicle's intentions quickly and accurately and build correct situational awareness so they can make safe crossing decisions while feeling good about the experience. eHMI adoption would reduce or eliminate public uncertainty about autonomous vehicles' behavior in ambiguous situations, such as when a pedestrian must decide whether to go or stop as a vehicle approaches slowly, since the vehicle's slowing behavior may not mean it is yielding to the pedestrian. Compared with the traditional informal


communication mode influenced by human factors in the driving environment, an eHMI can support effective communication between autonomous vehicles and pedestrians. Additionally, Eisma's study [13] shows that adopting an eHMI, as distinct from the current sensing systems (camera-, radar-, and lidar-based systems), can effectively enhance the safety of the vehicles and occupants. As more semi-autonomous vehicles enter public roads in the future, the mix of manual and autonomous vehicles will become more complex, increasing the public demand for understanding public transportation. Currently, the usability of external pedestrian-vehicle interactions remains critical for autonomous vehicle developers. Although eHMI representations have been extensively studied, there is still no consensus on the effectiveness of different eHMI design solutions, and there is still no clear research evidence on the effectiveness of exterior interaction interfaces in complex traffic environments. This literature review therefore systematically analyzes existing research to establish a standardized eHMI evaluation system. Researchers believe that pedestrians' situational awareness is influenced by vehicle motion, environmental cues, and individual factors, including specific cognitive processing [2, 3].

1.3 The Present Study
The main target of an eHMI is to help pedestrians quickly and accurately understand the vehicle's intent so that they can make safe crossing decisions and feel good in the process. Therefore, research on pedestrian-vehicle interaction should consider the different information needs that eHMIs must serve at each step of the pedestrian decision-making process, so that pedestrians can form correct situation awareness of the traffic scene in the most common scenario, such as crossing the road. Endsley defines situational awareness as the perception of the elements in the surrounding environment within a limited span of space and time, the understanding of their respective meanings, and the projection of their status in the near future [14]. This process defines how people establish their internal representation of the dynamic external environment [1]. Situational awareness is supported by top-down processing, which entails perceiving the environment around us by critically drawing on what we already know to interpret new information. The core of situation awareness is the perception, understanding, and prediction of vehicle behaviors [2].

The primary stage of situational awareness is the perception of environmental elements. Information collected at the perception stage is the basis of understanding and prediction, and 76.3% of errors in situational awareness are caused by perceptual mistakes [15]. Therefore, to enhance the perception of information, the interface can be improved based on these influencing factors so that the information presented by eHMIs is more easily recognized. However, information overload should be avoided, as it may lead to perceptual failure in pedestrians.

In the understanding stage, information from the environment, together with schemata and mental models stored in the brain, provides individuals with many details that promote their integration and interpretation of information [16]. Further, at the understanding stage, pedestrians come to know and understand the information through situation awareness. Developers can take advantage of the understanding stage by installing eHMIs. The advantage of a textual interface is that language is a communication tool people use daily,


and the messages it conveys have precise meanings. Therefore, the clarity of information transmission in such interfaces is most important to human actions or decisions in the event of an encounter with autonomous vehicles [17]. Among non-text interfaces, information projected onto the crosswalk is understood best, whereas icons or lights can be difficult to interpret when their meaning has not been learned [18].

The prediction stage is the last stage of situational awareness. In this phase, pedestrians recognize vehicle intentions and predict their future actions based on vehicle motion and the information provided by eHMIs. Pedestrians further assess the risk and safety of road crossing based on such predictions and then make crossing decisions based on their evaluation of perceived safety. However, when pedestrians infer the future actions of vehicles by referring to road information and vehicle trajectory change information, they need to extend their decision time to confirm whether the vehicle's actions are consistent with the inference. Information presented through eHMIs can convey the future actions of the vehicle and reduce this reaction time [19].

In combination with the above, pedestrians have different requirements for the information presented by eHMIs at different cognitive processing stages. Well-designed eHMIs can compensate for the lack of driver clues by providing additional hints to ensure that pedestrians can establish comprehensive and correct situational awareness, increase their sense of security, and reduce unnecessary waiting [18]. Therefore, establishing a standardized eHMI evaluation system according to the pedestrian decision-making process is of great value and significance.

2 Method

To establish a standardized eHMI evaluation system, we searched the relevant literature with the keywords 'Autonomous', 'Automated', 'Self-driving', 'Vehicle', 'Car', 'Pedestrian', 'Cyclist', 'Vulnerable Road User', 'Other Road User', 'Interface', 'Interaction', 'Communication', 'external human-machine interface', and 'implicit', and identified 416 papers through manual screening. These papers were published primarily in the Proceedings of the International Conference on Automotive User Interfaces and Interactive Vehicular Applications, Information, IEEE Intelligent Vehicles Symposium, International Conference on Intelligent Transportation Systems, Applied Ergonomics, Ergonomics, Human Factors, Transportation Research Part F: Traffic Psychology and Behaviour, Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, etc. In the initial stage of the literature review, we removed duplicates and papers that were not relevant to the topic. We selected 96 empirical studies related to eHMI and read them in detail, focusing on subject type, sample size, interaction target, experimental setting, assessment items, method type, and measurement methods. Finally, we extracted key measurement indicators related to eHMI from 63 of these papers and classified the indicators (Fig. 1).


Fig. 1. Flow diagram for the scoping review process.

3 Results

This evaluation system includes five first-level categories: safety, perceptibility, intelligibility, adaptability, and affectivity, which comprise twenty-five second-level indicators. We marked the source of each class of second-level indicators to form the eHMI evaluation indicator system table.

Among the five first-level indicator categories, the first category, safety, is defined as 'the possibility that the car interaction design enables both parties to keep lives safe when interacting in traffic scenes' [20]. It contains five second-level indicators: stop-start time, safety margin, perceived safety, perceived risk, and cross behavior.

The second category is perceptibility, defined as 'the level of perceptual effects obtained by interaction design in different environments, cognitive levels, and motion states.' It includes five second-level indicators: detection speed, maximum perceived distance, concentration effort, anti-environment distraction, and speed perception accuracy.


The third category is intelligibility, defined as 'the understandable level of interaction design,' including six indicators: comprehensibility, vehicle intention understanding, arrival time prediction, intuitiveness, learnability, and cognitive load.

The fourth category is adaptability, defined as 'the level of adjustment and adaptation that interaction design can produce according to changes in the external environment and its tasks.' It includes four second-level indicators: universal users, robustness for multiple scenarios, weather adaptation, and equipment failure tolerance.

The fifth category is affectivity, defined as 'the extent to which interaction design meets the emotional needs of users and other road users.' It includes five second-level indicators: pleasure, fun, sense of high-tech and innovation, amiability, and trustworthiness.

Table 1. eHMI evaluation indicator system table (Category / Indicator: Definition; Method; Reference)

Safety
- Stop-start time: the time from when the vehicle comes to a complete stop to when the pedestrian begins to cross the road (Objective) [21]
- Safety margin: the time of the vehicle reaching the pedestrian crossing point at its current speed (Objective) [22–26]
- Perceived safety: the extent to which the pedestrian feels safe to cross (Subjective) [4, 27–31]
- Perceived risk: the extent to which the pedestrian feels it is dangerous to cross the road (Subjective) [32–36]
- Cross behavior: the extent to which the interaction design allows pedestrians to cross the street safely (Objective) [4, 31, 37–39]

Perceptibility
- Detection speed: the time that pedestrians need to detect the eHMI (Objective) [13]
- Maximum perceived distance: the maximum distance at which pedestrians can see the eHMI (Objective) [40, 41]
- Concentration effort: the amount of subjective effort of the pedestrians to see the eHMI clearly (Subjective) [40]
- Anti-environment distraction: the difficulty in distinguishing the eHMI from the background (Subjective) [42]
- Speed perception accuracy: the difference between the perceived speed and the actual speed (Objective) [17, 43–45]

Intelligibility
- Comprehensibility: the extent to which pedestrians easily understand the interaction design (Subjective/Objective) [21, 31, 46, 47]
- Vehicle intention understanding: to what extent the vehicle's actual intention can be understood (Objective) [13, 37, 46, 48, 49]
- Arrival time prediction: to what extent the pedestrians correctly predict the vehicle arrival time (Subjective/Objective) [43, 50]
- Intuitiveness: the extent to which the interaction design fits the mental models of pedestrians and other road users (Subjective) [47, 51]
- Learnability: to what extent individuals can directly understand the meaning of the design without training (Subjective) [4, 47, 52]
- Cognitive load: the amount of cognitive effort of the pedestrians to understand the interaction design (Subjective) [13, 53]

Adaptability
- Universal users: the degree to which the eHMI can accommodate the needs of different types of users, including the elderly, children, the disabled, and foreigners (High Voltage Test Set) [54]
- Robust for multiple scenarios: the degree to which the eHMI can be effective across multiple scenarios of different tasks and road conditions (High Voltage Test Set) [55]
- Weather adaptation: the degree to which the eHMI can be effective in bad weather (High Voltage Test Set) [56, 57]
- Equipment failure tolerance: when the hardware of the eHMI (e.g., the LED) has problems, the degree to which the interaction design can still be effective (High Voltage Test Set) [32, 58]

Affectivity
- Pleasure: the degree to which the interaction with the eHMI makes the pedestrians feel pleased (Subjective) [34, 59, 60]
- Fun: the degree to which the interaction with the eHMI brings much fun (Subjective) [21]
- Sense of high-tech and innovation: the degree to which the interaction with the eHMI makes the pedestrians consider the vehicle to be full of high-tech and innovative elements (Subjective) [34, 35]
- Amiability: the extent to which the users consider the vehicle to be amiable and like to pursue closer and continuous interaction (Subjective) [5]
- Trustworthiness: the extent to which the pedestrians trust the vehicles (Subjective) [29, 32, 33, 42, 61, 62]
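For teams that want to apply the framework in evaluation tooling, the indicator system can be encoded as a simple data structure. The sketch below is an illustrative Python encoding of a small subset of Table 1; the dictionary layout and the checklist helper are our own illustration rather than anything prescribed by the reviewed literature, and a full version would enumerate all twenty-five indicators.

```python
# Illustrative encoding of part of the eHMI evaluation indicator system (Table 1).
# Category and indicator names follow the table; the structure is hypothetical.
EHMI_INDICATORS = {
    "Safety": {
        "Stop-start time": "objective",
        "Perceived safety": "subjective",
    },
    "Perceptibility": {
        "Detection speed": "objective",
        "Concentration effort": "subjective",
    },
    "Intelligibility": {
        "Comprehensibility": "subjective/objective",
        "Cognitive load": "subjective",
    },
}

def checklist(categories=EHMI_INDICATORS):
    """Flatten the indicator system into (category, indicator, method) rows."""
    return [(cat, name, method)
            for cat, indicators in categories.items()
            for name, method in indicators.items()]

for row in checklist():
    print(row)
```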

4 Discussion

The study results showed that well-designed eHMIs can provide additional clues to make up for the absence of driver clues in autonomous systems, an absence that often leads to accidents among road users because communication between autonomous vehicles and other road users, such as pedestrians, breaks down. Further, eHMIs help pedestrians quickly


and accurately understand the intention of vehicles, make safe crossing decisions, and generate good experiences [13]. eHMI design should also be based on the information requirements of the three different cognitive processing stages (perception, understanding, and prediction) and make the corresponding design choices to help pedestrians form correct situational awareness. For example, in the perception stage, the eHMI information should be highly identifiable, not only by improving the prominence of the information but also by considering its location and form of presentation, so as to avoid poorly visible placement and information overload. In the understanding stage, the eHMI information should be clear and intuitive to facilitate pedestrians' understanding; it is necessary to use familiar icons or texts to improve the accuracy of pedestrians' understanding. In the prediction stage, the eHMI needs to present information about vehicle intention and movement status and communicate information about the vehicle's future actions to help pedestrians predict risks, reduce their reaction time, and assist them in making street-crossing decisions.

5 Conclusion

An excellent eHMI design scheme should appropriately combine interactive content, display technology, and display features. Typical interactive designs include text, icons, and lighting changes. Viewing distance and vehicle motion affect a textual interface's readability, and text visibility can be reduced if the vehicle is too far away or moving at high speed. Text-based interfaces also have the disadvantage of cross-cultural comprehension difficulties for those who lack a grasp of the local language. Compared with textual interfaces, graphical eHMIs have a longer visual range and lower cross-cultural learning barriers, but, with standardization, their comprehensibility is generally better than that of text. Graphical interfaces transmitting dynamic cues can convey vehicle motion changes, which can further improve information recognition [21]. Pedestrians also easily recognize light-based interfaces when they provide dynamic cues [6].

Display technologies mounted outside the vehicle generally fall into two categories. One is external LED displays, with common mounting locations including the front bumper, the rear, or the windshield in front of the passenger side. The other is to project interactive content onto the ground in front of or to the side of the vehicle. This display technology can present visible information at a fixed location and remains highly recognizable even when the vehicle is at high speed [17]. However, ground projection leads to more visual attention distraction and can lack clarity in brightly lit environments. Therefore, it is necessary to consider infrastructure compatibility, the impact of external lighting, and pedestrians' visual allocation. In terms of display characteristics, the color, brightness, and saturation settings of the displayed content also affect the prominence and visibility of the information [42]. Additionally, unstandardized light colors can affect pedestrians' integration and interpretation process.

Autonomous vehicles also express their intentions through motion (acceleration, deceleration, swerving, etc.) and movement characteristics, and such motion serves as an implicit clue. Some researchers suggest that the interface should be combined with implicit information when designing eHMIs. For example, a display screen that shows the real-time vehicle speed can help pedestrians predict vehicle intentions more quickly and accurately [63]. This interface maintains the consistency and synchronization of implicit and explicit information,


improves the recognition of implicit knowledge, and promotes the information’s accuracy and comprehensibility. In the future design of eHMIs, the dynamic presentations of vehicle state information through the interface can also be considered.


Visualizing the Improvement of the Autonomous System: Comparing Three Methods

Yukun Zhu1,2, Zhizi Liu3, Youyu Sheng1,2, Yi Ying1,2, and Jingyu Zhang1,2(B)

1 CAS Key Laboratory of Behavioral Science, Institute of Psychology, Beijing 100000, China
[email protected]
2 Department of Psychology, University of the Chinese Academy of Sciences, Beijing 100000, China
3 Chongqing Changan Automobile Co., Ltd., Chongqing 400000, China

Abstract. As the ability to evolve is an important characteristic of an intelligent product, system updates have become vital to its success. A good perception of system updates may also improve users' attitudes toward the product and encourage continued use. However, how to enhance such perception remains largely under-explored. This study investigated how different visualizations of steering performance improvement influence people's perception of autonomous vehicles. We measured 120 people's responses to three different visualization methods. The results showed that line charts were most effective in increasing the perceived improvement of updates, which in turn promoted users' satisfaction, purchase, and recommendation intentions. We further discuss how these findings could inform the design of future visualizations of system updates.

Keywords: Perception of Updates · Autonomous Vehicles · Single Feedback · Trend Feedback

1 Introduction

Updates are an important means of improving products that are already in use [1]. They have become prevalent in daily life, from computer software, mobile applications, and digital devices such as phones and cameras, to more intelligent products such as robots and autonomous vehicles [2–4]. As intelligent products are expected to evolve, using updates to enhance their abilities has become ever more important. While all updates are issued to make a better product (e.g., fewer bugs, faster speed, new functions), users are not always satisfied with them. The introduction, content, timing, and feedback of an update can all influence users' perception of it [5–7]. Whereas a good perception can help spread positive word of mouth and attract more potential users [8], a bad perception may keep users from using the product in the future [9]. Previous studies on the perception of updates have mostly focused on factors such as update content [5], update frequency [7], and previous update experience [6].


In addition, some researchers have used updates as control variables when investigating other phenomena [10]. However, how different forms of update feedback influence users' perceptions has rarely been examined. Indeed, most current updates provide no feedback, or only very simple feedback. In most cases, users are merely told that a package, often with a cryptic serial-number-like name, has been successfully installed. Sometimes they are given more detail, such as which functions were optimized, which bugs were fixed, and/or which functions were added. At best, the only advantage of this type of feedback is that it is non-invasive; in practice, users can hardly perceive such updates, let alone form positive perceptions of them. One possible way to improve users' perception is to quantify the improvement. For example, the 3DSMAX benchmark provides a quantified performance evaluation of different graphics cards, and users can clearly understand the performance improvement of any update from these figures (Fig. 1). Providing such figures can therefore be an important way to enhance the perception of an update. We thus hypothesized that providing figures would enhance the user's perception of updates more than providing no feedback (H1).

Fig. 1. An example of the 3DSMAX benchmark

Past studies on the visualization of changing figures also suggest several principles for improving their effect. First, in information synthesis or comparison tasks, graphical information is more effective than numerical figures [11]. Second, in human-computer interaction, users may notice the presence of some information but not pay enough attention to analyze all of it [12]; compared with bare figures, graphs show both the previous and the current results, making the comparison easier to analyze. Third, when line charts are used to describe data, it is easier for people to attend to the trend of the changes than with other types of graphs [13]. Therefore, we hypothesized that providing a line chart would enhance the user's perception of updates more than no feedback (H2), and more than figures alone (H3).

In this study, we therefore compared the effects of three different feedback conditions for autonomous vehicles, one of the most important and most frequently updated products of the artificial intelligence era. The first condition, no feedback, presents no figures or graphs and reflects how most current products behave. The second, single feedback, shows only the figures for the current update. The third, trend feedback, shows the change in performance across every previous update as a line chart.
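The paper does not provide implementation details for these displays. The sketch below (Python with matplotlib) shows roughly what single feedback versus trend feedback could look like when generated from a history of per-update performance scores; the scores, function names, and labels are illustrative assumptions, not taken from the study.

```python
import matplotlib.pyplot as plt

# Hypothetical per-update performance scores (not from the study);
# index 0 is the initial release, later entries are successive updates.
scores = [50, 61, 78, 84, 90]

def show_single_feedback(scores):
    """Single feedback: report only the score of the current update."""
    print(f"Update installed. Current performance score: {scores[-1]}")

def show_trend_feedback(scores):
    """Trend feedback: render the whole score history as a line chart."""
    versions = list(range(1, len(scores) + 1))
    plt.figure(figsize=(4, 3))
    plt.plot(versions, scores, marker="o")
    plt.xticks(versions)
    plt.xlabel("Update number")
    plt.ylabel("Steering performance score")
    plt.title("System performance after each update")
    plt.tight_layout()
    plt.show()

show_single_feedback(scores)
show_trend_feedback(scores)
```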

2 Method

2.1 Participants

The participants were recruited via a website. Each participant had more than one year of driving experience. A total of 120 participants were recruited (60 males, 60 females), with an average age of 24.45 years (SD = 3.14). All were right-handed, had no history of neurological disease, had normal or corrected-to-normal vision, and had no color blindness or color weakness. They were paid 10 yuan (around 1.5 US dollars) for their participation. All participants signed the informed consent statement.

2.2 Research Design

The study used three between-subject conditions: a No feedback condition, a Single feedback condition, and a Trend feedback condition.

2.3 Material

The Videos Showing Different Levels of Steering Performance. We recorded five videos of the same driving route from an in-vehicle viewpoint. The route contained city roads with medium traffic and three turns (see Fig. 2). The five videos differed in the vehicle's steering ability; the detailed changes in performance are described in Table 1.

Types of Feedback. In the No feedback condition, participants only watched the five videos in consecutive order, from the worst to the best performance. In the Single feedback condition, participants saw the score of the vehicle's performance after each video. They were told that the overall score was based on four indicators: the number of obstacle hits, road departures, solid-line crossings, and lane departures (see Fig. 3). The four indicator values were also shown.


Table 1. The driving performance across five videos

Video no | Overall scores | Hitting obstacle | Road departure | Cross the solid line | Lane departure
1        | 1664           | 3                | 2              | 2                    | 5
2        | 2016           | 0                | 1              | 2                    | 5
3        | 2592           | 0                | 0              | 1                    | 3
4        | 2784           | 0                | 0              | 1                    | 3
5        | 2976           | 0                | 0              | 0                    | 1

Fig. 2. Top view of the experiment road and the in-vehicle viewpoint

In the Trend feedback condition, the participants saw a line graph after each video showing the performance of the current stage and all previous stages. The first score was set to 50, and the remaining scores were scaled according to their relative ratios (see Fig. 4; a small worked example of this scaling is given at the end of this subsection).

Fig. 3. Single Feedback

Fig. 4. Trend Feedback

Measurement.

Perception of the Updates. Three 7-point Likert-type items were used to measure the perception of the updates. A sample item was, "I can perceive the changes in the system after each update." (1: completely disagree – 7: completely agree). The Cronbach alpha coefficient was 0.76.

Recommendation Intention. We used a net-promoter item to measure participants' intention to recommend the product. Participants responded to "How likely is it that you would recommend this autonomous vehicle to a friend or colleague?" on an 11-point scale from 0 (not at all likely) to 10 (extremely likely). We only used the raw score of this scale [14, 15].

User Satisfaction Scale. User satisfaction was measured with three adjective rating items [16], such as "1: rarely meets my needs – 7: greatly meets my needs" (α = 0.98).

Purchase Intention Scale. We measured individuals' intention to purchase autonomous vehicles [17, 18]. The 7-point scale included three items such as "I will buy this autonomous vehicle if it is available and at a reasonable price." (1: completely disagree – 7: completely agree). The Cronbach alpha coefficient was 0.81.

Demographic Variables. We also collected the participants' driving experience, autonomous driving experience, and driving frequency.
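For concreteness, the rescaling described for the Trend feedback condition can be illustrated with the overall scores from Table 1. The proportional rule in the sketch below (anchor the first score at 50 and preserve the ratios) is our reading of "changed based on their relative ratios", not a formula given in the paper.

```python
# Overall scores from Table 1 (videos 1-5).
raw_scores = [1664, 2016, 2592, 2784, 2976]

# Assumed rescaling: anchor the first video at 50 and keep the ratios
# between videos unchanged, i.e. displayed_i = 50 * raw_i / raw_1.
displayed = [round(50 * s / raw_scores[0], 1) for s in raw_scores]

print(displayed)  # [50.0, 60.6, 77.9, 83.7, 89.4]
```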

2.4 Procedure

The participants were first asked to sign the informed consent statement. They then started the experiment program and read the instructions, which told them that they would see a series of videos showing the updating process of a vehicle, that they should watch the videos carefully, and that they would then evaluate the updating process. The formal experiment began once they fully understood the requirements, and they were randomly allocated to one of the three between-subject conditions. After watching the videos, with or without the corresponding feedback, and answering the questions, the participants were thanked, paid, and debriefed. The experiment lasted about 10 minutes.
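The paper only states that participants were randomly allocated to the three between-subject conditions. A minimal sketch of one balanced way to do this is shown below; the equal cell size of 40 per condition is an assumption and is not stated in the paper.

```python
import random

conditions = ["no_feedback", "single_feedback", "trend_feedback"]
n_participants = 120

# Assumed balanced design: 40 participants per condition,
# shuffled so that the assignment order is random.
assignment = conditions * (n_participants // len(conditions))
random.seed(2023)  # fixed seed only so this sketch is reproducible
random.shuffle(assignment)

# Map participant IDs (1..120) to their assigned condition.
participant_conditions = dict(enumerate(assignment, start=1))
print(participant_conditions[1])               # condition of participant 1
print(assignment.count("trend_feedback"))      # cell size, here 40
```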

3 Results

3.1 The Effect of Feedback

Based on the results of the correlation analysis, a multivariate analysis of variance (MANOVA) was conducted on the effect of feedback. The results were as follows.

For the perception of the updates, the main effect of feedback was significant (F = 7.35, ηp² = .124, p < .001). Pairwise comparisons showed that the score in the Trend feedback condition was significantly higher than in the No feedback condition (p < .001) and the Single feedback condition (p = .004).

For the net-promoter score, the main effect of feedback was significant (F = 15.97, ηp² = .178, p < .001). Pairwise comparisons showed that the score in the Trend feedback condition was significantly higher than in the No feedback condition (p < .001) and the Single feedback condition (p = .003).

For user satisfaction, the main effect of feedback was significant (F = 12.39, ηp² = .136, p < .001). Pairwise comparisons showed that the score in the Trend feedback condition was significantly higher than in the No feedback condition (p < .001) and the Single feedback condition (p = .003).

For purchase intention, the main effect of feedback was significant (F = 6.73, ηp² = .087, p = .001). Pairwise comparisons showed that the score in the Trend feedback condition was significantly higher than in the No feedback condition (p < .001) and the Single feedback condition (p = .027, see Fig. 5).

Fig. 5. The mean and variance of perception of the updates, net-promoter, purchase intention, and user satisfaction in three conditions
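The paper does not report its analysis code. As a rough illustration only, the sketch below shows how a MANOVA followed by a pairwise comparison could be run in Python with statsmodels and scipy; the data frame, column names, and simulated values are assumptions made for the example and are not the study's data.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(0)

# Simulated stand-in data: 40 participants per condition, four outcomes.
conditions = np.repeat(["none", "single", "trend"], 40)
shift = {"none": 0.0, "single": 0.5, "trend": 1.0}
df = pd.DataFrame({
    "condition": conditions,
    "perception": [rng.normal(4 + shift[c], 1) for c in conditions],
    "nps": [rng.normal(6 + shift[c], 2) for c in conditions],
    "satisfaction": [rng.normal(4 + shift[c], 1) for c in conditions],
    "purchase": [rng.normal(4 + shift[c], 1) for c in conditions],
})

# Multivariate test of the overall effect of feedback condition.
manova = MANOVA.from_formula(
    "perception + nps + satisfaction + purchase ~ condition", data=df)
print(manova.mv_test())

# Follow-up pairwise comparison on one outcome (trend vs. single feedback).
trend = df.loc[df.condition == "trend", "perception"]
single = df.loc[df.condition == "single", "perception"]
print(stats.ttest_ind(trend, single))
```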

4 Discussion

The purpose of the present study was to investigate how to enhance the perception of updates. To do so, we showed participants three different kinds of feedback in scenarios describing updates to an autonomous vehicle's steering control. Several findings are worth discussing.

First, we found that the single feedback condition significantly enhanced the perception of the updates compared with the no feedback condition, and that the trend feedback condition significantly enhanced it compared with both the no feedback and single feedback conditions, which is consistent with H1, H2, and H3. The reason for this difference might be that trend feedback changes the reference point users adopt for comparison [1, 19, 20] and helps them make the comparison more easily [12]. Regarding practical implications, designers could consider using trend feedback more widely in future updates. Second, we also found that an enhanced perception of updates increased users' net-promoter scores, satisfaction, and purchase intention. This is consistent with previous studies showing that a good update can influence the user's beliefs and attitudes [1, 21–23]; our results add new evidence to this finding.

The current study has several limitations. First, the updating process in this study always showed an upward trend, whereas in real life some updating processes may fluctuate; future research could examine updates with different performance trends. Second, our research used line graphs as the trend feedback. Although previous studies have shown that line graphs are good at conveying trends, other graph types may have advantages in more complex cases, so future work could compare different graphical representations of trends.

5 Conclusion

Trend feedback can enhance the user's perception of updates more than single feedback or no feedback, and enhancing the perception of updates can further increase users' recommendation, satisfaction, and purchase intentions. As such, the current study provides a new dimension and experimental evidence for the design of future updates.

Acknowledgement. This study was supported by the National Natural Science Foundation of China (Grant No. T2192932).

References

1. Fleischmann, M., Amirpur, M., Grupp, T., Benlian, A., Hess, T.: The role of software updates in information systems continuance—an experimental study from a user perspective. Decis. Support Syst. 83, 83–96 (2016)
2. Mallozzi, P., Pelliccione, P., Knauss, A., Berger, C., Mohammadiha, N.: Autonomous vehicles: state of the art, future trends, and challenges. Autom. Syst. Softw. Eng. 347–367 (2019)
3. Sääksjärvi, M., Hellén, K., Tuunanen, T.: Design features impacting mobile phone upgrading frequency. J. Inf. Technol. Theory Appl. 15(1), 3 (2014)
4. Vaniea, K., Rashidi, Y.: Tales of software updates: the process of updating software. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (2016)
5. Ghose, A., Han, S.P.: Estimating demand for mobile applications in the new economy. Manage. Sci. 60(6), 1470–1488 (2014)
6. Vaniea, K.E., Rader, E., Wash, R.: Betrayed by updates: how negative experiences affect future security. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2014)
7. Gong, X., Razzaq, A., Wang, W.: More haste, less speed: how update frequency of mobile apps influences consumer interest. J. Theor. Appl. Electron. Commer. Res. 16(7), 2922–2942 (2021)
8. Doğan, K., Ji, Y., Mookerjee, V.S., Radhakrishnan, S.: Managing the versions of a software product under variable and endogenous demand. Inf. Syst. Res. 22(1), 5–21 (2011)
9. Fagan, M., Khan, M.M.H., Buck, R.: A study of users' experiences and beliefs about software update messages. Comput. Hum. Behav. 51, 504–519 (2015)
10. Claussen, J., Kretschmer, T., Mayrhofer, P.: The effects of rewarding user engagement: the case of Facebook apps. Inf. Syst. Res. 24(1), 186–200 (2013)
11. Speier, C.: The influence of information presentation formats on complex task decision-making performance. Int. J. Hum. Comput. Stud. 64(11), 1115–1131 (2006)
12. Wogalter, M.S., Leonard, S.D.: Attention capture and maintenance. In: Warnings and Risk Communication, pp. 123–148. Taylor & Francis, London (1999)
13. Shah, P., Mayer, R.E., Hegarty, M.: Graphs as aids to knowledge construction: signaling techniques for guiding the process of graph comprehension. J. Educ. Psychol. 91(4), 690–702 (1999)
14. Grisaffe, D.: Questions about the ultimate question: conceptual considerations in evaluating Reichheld's net promoter score (NPS). J. Consum. Satisf. Dissatisf. Complain. Behav. 20 (2007)
15. Reichheld, F.F.: The one number you need to grow. Harv. Bus. Rev. 81(12), 46–55 (2003)
16. Pyrialakou, V.D., Gkartzonikas, C., Gatlin, J.D., Gkritza, K.: Perceptions of safety on a shared road: driving, cycling, or walking near an autonomous vehicle. J. Safety Res. 72, 249–258 (2020)
17. Montoro, L., Useche, S.A., Alonso, F., Lijarcio, I., Martí-Belda, A.: Perceived safety and attributed value as predictors of the intention to use autonomous vehicles: a national study with Spanish drivers. Saf. Sci. 120, 865–876 (2019)
18. Moons, I., De Pelsmacker, P.: An extended decomposed theory of planned behaviour to predict the usage intention of the electric car: a multi-group comparison. Sustainability 7(5), 6212–6245 (2015)
19. Helson, H.: Adaptation-Level Theory: An Experimental and Systematic Approach to Behavior. Harper and Row, New York (1964)
20. Oliver, R.L.: A cognitive model of the antecedents and consequences of satisfaction decisions. J. Mark. Res. 17(4), 460–469 (1980)
21. Bhattacherjee, A., Barfar, A.: Information technology continuance research: current state and future directions. Asia Pacific J. Inf. Syst. 21(2), 1–18 (2011)
22. Bhattacherjee, A., Premkumar, G.: Understanding changes in belief and attitude toward information technology usage: a theoretical model and longitudinal test. MIS Q. 28, 229–254 (2004)
23. Kim, S.S., Malhotra, N.K.: A longitudinal model of continued IS use: an integrative view of four mechanisms underlying postadoption phenomena. Manage. Sci. 51(5), 741–755 (2005)

Author Index

A Abbasi, Elahe I-131 Akula, Sathish Chandra I-120 Albers, Frank II-216 Allessio, Danielle II-514 An, Jianing II-178 Aoki, Hirotaka I-181 B Bao, Chunye I-168 Baumgartner, Marc II-320 Bengtsson, Kristofer II-331 Berzina, Nora II-320 Biella, Marcus I-3 Black, Andrew II-3 Blundell, James I-21 Boumann, Hilke I-3 Braithwaite, Graham II-254 C Carr, Leighton II-18 Carstengerdes, Nils I-3, I-65 Causse, Mickaël II-46 Chan, Wesley Tsz-Kin II-36 Charalampidou, Stavroula I-484 Chen, Jingxi II-425 Chen, Junyu II-442 Cheng, Ming I-415 Chu, Yi II-552 Conte, Stefano I-21 D della Guardia, Jasmin II-216 Deng, Ye I-168 Deniel, Jonathan II-46 DeSimone, Joseph M. II-454 Dokas, Ioannis M. I-484 Drzyzga, Gilbert I-37 Duchevet, Alexandre II-46 Dupuy, Maud II-46

E Ebrecht, Lars II-60

F Fabris, Gabriele II-320 Fu, Haoruo I-415 Fu, Shan I-253

G Gao, Shan I-190 Gao, Xian II-533 Gao, Zhan II-560 Ge, Yan I-383, II-485 Geng, Zengxian II-442 Gong, Ziting I-53 Griebel, Hannes S. II-77 Gu, Qiu-Li I-286 Guang, Xin II-442 Guo, Jiuxia II-91 Guo, Xin II-91

H Hamann, Anneke I-3, I-65, II-163 Han, Qiming I-168 Harder, Thorleif I-37 Harris, Don II-200 Harris, Donald I-21 He, Miao I-371 He, Xinyu I-399 Hild, Jutta I-109, I-200 Holzbach, Gerrit I-200 Honkavaara, Eija II-235 Huang, Junfeng I-383 Huang, Yanrong I-79 Huddlestone, John II-3


I Imbert, Jean-Paul II-46

J Jakobi, Jörn II-163 Janneck, Monique I-37 Ji, Zhenglei I-415 Jia, Aiping I-95 Jia, Yingjie II-91 Jiang, Ao I-399 Johansson, Björn II-331 K Kanai, Hideaki I-53 Karvonen, Hannu II-235 Karyotakis, Michail II-254 Kishino, Mitsuhiro I-181 Kondratova, Irina II-280 Kontogiannis, Tom II-320 Korek, Wojciech Tomasz I-143 Kramar, Vadim II-235 Kudoh, Atsushi I-181 L Lapointe, Jean-François II-280, II-360 Laursen, Tom II-320 Lee, Kevin I-215 Lee, Will II-514 Lemaire, Noémie II-360 Lengenfelder, Christian I-109 Lenz, Helge II-163 Li, Chenhao I-154, I-341 Li, Danfeng I-276 Li, Dongqian II-560 Li, Hang I-301 Li, Hongting II-406 Li, Lingxuan I-371 Li, Linlin II-104 Li, Ning I-227, I-501 Li, Wen-Chin I-143, I-227, II-36, II-120, II-135, II-150 Li, Yueqing I-131, I-215 Li, Zhizhong I-79, I-168, I-354, II-291 Liang, Yi I-95 Liao, Zhen II-291 Lipkowitz, Gabriel II-454 Liu, Chengxue II-306 Liu, Haidong II-485

Liu, Muchen II-552 Liu, Shan II-467 Liu, Shuang II-291 Liu, Xingyu I-431 Liu, Xinyue I-227 Liu, Xinze II-485 Liu, Yanfang I-371 Liu, Zhizi II-587 Long, Chaoxiang II-391 Long, Lei I-354 Lu, Chien-Tsung I-415 Lu, Mengyuan II-104 Lu, Tingting I-227, II-104 Lu, Xinyu I-415 Lu, Yan II-306 Lu, Yanyu I-253 Luo, Min I-431 Lusta, Majdi I-120 Lytaev, Sergey I-442

M Ma, Jinfei II-425 Ma, Shu II-406, II-467 Ma, Wenting I-264 Machida, Rea I-181 Magee, John II-514 Mahmood, Bilal I-131 Malakis, Stathis II-320 Manikath, Elizabeth II-120 Matton, Nadine II-46 McCann, Heather I-459 McCarthy, Pete I-474 Melbi, Alexander II-331 Mizzi, Andrew I-474 Morishita, Koji I-181 Mou, Tanghong II-306

N Nagasawa, Takashi II-135 Natakusuma, Hans C. II-188 Nichanian, Arthur II-150 Nwosu, Chukebuka I-215

O Omeroglu, Fatih Baha I-215, I-131 Oskarsson, Per-Anders II-331


P Pan, Xinyu II-485 Papadopoulos, Basil I-484 Papenfuss, Anne II-501 Pechlivanis, Konstantinos II-200 Peinsipp-Byma, Elisabeth I-109, I-200 Peng, Jiahua I-276 Perry, Nathan II-18 Piotrowski, Pawel II-120 Plioutsias, Anastasios I-459 Poti, Andrea II-320 Q Qi, Xinge II-178 Qiu, Yanxi I-95 R Ren, Xuanhe II-91 Richardson, Alexicia I-120 Riek, Stephan II-18 Röning, Juha II-235 S Sammito, Stefan I-3 Sarikaya, Ibrahim II-188 Sassi, Jukka II-235 Scala, Marcello II-320 Schmidt, Christoph Andreas II-501 Schwebel, David C. II-560 Scott, Steve II-3 Scudere-Weiss, Jonah II-514 Seals, Cheryl I-120 Shaqfeh, Eric S. G. II-454 She, Riheng II-425 Shen, Yang II-345 Sheng, Youyu II-552, II-587 Shi, Jinlei II-406 Shi, Ruisi II-573 Shi, Zhuochen I-501 Shirai, Tsuyoshi I-181 Smith, Daniel C. II-77 Smoker, Anthony II-320 Sommer, Lars I-200 Song, Jian II-560 Song, Yuanming II-523 Sun, He I-95 Sun, Yuan I-238


T Takahashi, Marie I-181 Tan, Wei I-238, I-301 Tang, Liu I-383 Tang, Pengchun II-552 Teubner-Rhodes, Susan I-120 Tews, Lukas II-163

V Velotto, Sergio II-320 Vinson, Norman G. II-360 Voit, Michael I-109, I-200

W Wallis, Guy II-18 Wang, Feiyin I-301 Wang, Huarong II-560 Wang, Huihui I-253 Wang, Ke Ren I-286 Wang, Lei I-190, I-276, I-320, II-178 Wang, Lili I-286 Wang, Peiming II-442 Wang, Tianxiong II-533 Wang, Wenchao I-301 Wang, Wenqing I-238, I-301 Wang, Xia II-552 Wang, Xin I-168 Wang, Yifan I-143 Wang, Yonggang I-264 Wang, Zhenhang II-560 Wang, Ziang II-379 Wei, Zixin I-320 Wilson, Abigail II-514 Wu, Bohan II-467 Wu, Changxu II-406 Wu, Jinchun I-154, I-341 Wu, Kangrui II-573 Wu, Xiaoli I-331

X Xian, Yuanyuan I-190 Xiang, Yuying I-383 Xu, Siying II-91 Xu, Xiangrong I-501 Xue, Chengqi I-154, I-341


Y Yang, Lichao I-143 Yang, Liu II-533 Yang, Shuwen II-425 Yang, Zhen II-406, II-467 Ye, Xiang II-345 Yin, Zijian I-354 Ying, Yi II-587 Yu, Jiahao I-354 Yu, Qiang II-533 Yu, Tong II-533 Yuan, Jiang II-391 Yuan, Jintong I-301 Yue, Wei II-533 Z Zeleskidis, Apostolos I-484 Zhai, Di II-345 Zhang, Chunyang I-431 Zhang, Haihang II-485 Zhang, Huihui I-371 Zhang, Jingyi II-120


Zhang, Mengxi I-276 Zhang, Jingyu II-523, II-552, II-573, II-587 Zhang, Liang I-371 Zhang, Lin I-431 Zhang, Nan II-178 Zhang, Nanxi I-79, I-168 Zhang, Rong II-552 Zhang, Wei II-406 Zhang, Yijing I-168, I-354 Zhang, Yiyang I-501, II-104 Zhang, Zhaoning I-227, I-501 Zhao, Yifan I-143 Zheng, Yahua II-573 Zhou, Ying I-383 Zhou, Yiyao I-331 Zhu, Xiaopeng II-552, II-573 Zhu, Yukun II-587 Zhuang, Xiangling II-523 Ziakkas, Dimitrios II-188, II-200 Zinn, Frank II-216 Zou, Ying I-276, I-320