Intelligent Systems and Applications: Proceedings of the 2019 Intelligent Systems Conference (IntelliSys) Volume 2 [1st ed. 2020] 978-3-030-29512-7, 978-3-030-29513-4

The book presents a remarkable collection of chapters covering a wide range of topics in the areas of intelligent systems and their applications.


English Pages XV, 1312 [1327] Year 2020


Table of contents :
Front Matter ....Pages i-xv
A New Approach of Service Platform for Water Optimization in Lettuce Crops Using Wireless Sensor Network (Edgar Maya-Olalla, Hernán Domínguez-Limaico, Carlos Vásquez-Ayala, Edgar Jaramillo-Vinueza, Marcelo Zambrano V, Alexandra Jácome-Ortega et al.)....Pages 1-13
A Novel AI Based Optimization of Node Selection and Information Fusion in Cooperative Wireless Networks (Yuan Gao, Hong Ao, Weigui Zhou, Su Hu, Haosen Yu, Yang Guo et al.)....Pages 14-23
Using Automated State Space Planning for Effective Management of Visual Information and Learner’s Attention in Virtual Reality (Opeoluwa Ladeinde, Mohammad Abdur Razzaque, The Anh Han)....Pages 24-40
Real-Time Lane Detection and Extreme Learning Machine Based Tracking Control for Intelligent Self-driving Vehicle (Sabir Hossain, Oualid Doukhi, Inseung Lee, Deok-jin Lee)....Pages 41-50
Pedestrian Recognition and Obstacle Avoidance for Autonomous Vehicles Using Raspberry Pi (Charlie Day, Liam McEachen, Asiya Khan, Sanjay Sharma, Giovanni Masala)....Pages 51-69
Improving Urban Air Quality Through Long-Term Optimisation of Vehicle Fleets (Darren M. Chitty, Rakhi Parmar, Peter R. Lewis)....Pages 70-89
Analysing Data Set of the Bike–Sharing System Demand with R Scripts: Mexico City Case (Luis A. Moncayo–Martínez)....Pages 90-105
A Path Towards Understanding Factors Affecting Crash Severity in Autonomous Vehicles Using Current Naturalistic Driving Data (Franco van Wyk, Anahita Khojandi, Neda Masoud)....Pages 106-120
Automobile Automation and Lifecycle: How Digitalisation and Security Issues Affect the Car as a Product and Service? (Antti Hakkala, Olli I. Heimo)....Pages 121-137
Less-than-Truckload Shipper Collaboration in the Physical Internet (Minghui Lai, Xiaoqiang Cai)....Pages 138-151
Smartphone-Based Intelligent Driver Assistant: Context Model and Dangerous State Recognition Scheme (Igor Lashkov, Alexey Kashevnik)....Pages 152-165
Voice Recognition Based System to Adapt Automatically the Readability Parameters of a User Interface (Hélène Soubaras)....Pages 166-178
A Machine-Synesthetic Approach to DDoS Network Attack Detection (Yuri Monakhov, Oleg Nikitin, Anna Kuznetsova, Alexey Kharlamov, Alexandr Amochkin)....Pages 179-191
Novel Synchronous Brain Computer Interface Based on 2-D EEG Local Binary Patterning (Daniela De Venuto, Giovanni Mezzina)....Pages 192-210
LSTM-Based Facial Performance Capture Using Embedding Between Expressions (Hsien-Yu Meng, Jiangtao Wen)....Pages 211-226
Automatic Curation System Using Multimodal Analysis Approach (MAA) (Wei Yuan, Yong Zhang, Xiaojun Hu, Mei Song)....Pages 227-240
Mixed Reality for Industry? An Empirical User Experience Study (Olli I. Heimo, Leo Sakari, Tero Säntti, Teijo Lehtonen)....Pages 241-253
Person Detection in Thermal Videos Using YOLO (Marina Ivasic-Kos, Mate Kristo, Miran Pobar)....Pages 254-267
Implementation of a Hybridized Machine Learning Framework for Flood Risk Management (Oluwole Charles Akinyokun, Udoinyang Godwin Inyang, Emem Etok Akpan)....Pages 268-291
Word Similarity Computing Based on HowNet and Synonymy Thesaurus (Hongmei Nie, Jiaqing Zhou, Hui Wang, Minshuo Li)....Pages 292-305
A Comparison of fastText Implementations Using Arabic Text Classification (Nuha Alghamdi, Fatmah Assiri)....Pages 306-311
The Social Net of Sentiment: Improving the Base Sentiment Analysis of High Impact Events with Lexical Category Exploration (Maxwell Fowler, Aleshia Hayes, Kanika Binzani)....Pages 312-320
Reflective Writing Analysis Approach Based on Semantic Concepts: An Evaluation of WordNet Affect Efficiency (Huda Alrashidi, Mike Joy)....Pages 321-333
Semantic-Based Feedback Recommendation for Automatic Essay Evaluation (Tsegaye Misikir Tashu, Tomáš Horváth)....Pages 334-346
Figurative Language Grounding in Humanoid Robots (Maja Gwóźdź)....Pages 347-362
Masdar: A Novel Sequence-to-Sequence Deep Learning Model for Arabic Stemming (Mohammed M. Fouad, Ahmed Mahany, Iyad Katib)....Pages 363-373
Nomen Meum Earl: Yet Another Route to Intelligent Machine Behavior (Chris Lanz)....Pages 374-394
LIT: Rule Based Italian Lemmatizer (Simone Molendini, Antonio Guerrieri, Andrea Filieri)....Pages 395-404
Intelligent Sense-Enabled Lexical Search on Text Documents (Anu Thomas, S. Sangeetha)....Pages 405-415
Predicting Sentiment in Yorùbá Written Texts: A Comparison of Machine Learning Models (Abimbola Rhoda Iyanda, Omolayo Abegunde)....Pages 416-431
An Introductory Survey on Attention Mechanisms in NLP Problems (Dichao Hu)....Pages 432-448
Advanced Similarity Measures Using Word Embeddings and Siamese Networks in CBR (Kareem Amin, George Lancaster, Stelios Kapetanakis, Klaus-Dieter Althoff, Andreas Dengel, Miltos Petridis)....Pages 449-462
Intelligent Flower Detection System Using Machine Learning (Amna Safar, Maytham Safar)....Pages 463-472
Development of an Interactive Virtual Reality for Medical Skills Training Supervised by Artificial Neural Network (Shabnam Sadeghi Esfahlani, Viktor Izsof, Sabrina Minter, Ali Kordzadeh, Hassan Shirvani, Karim Sadeghi Esfahlani)....Pages 473-482
Toward Robust Image Classification (Basemah Alshemali, Alta Graham, Jugal Kalita)....Pages 483-489
Determining the Number of Hidden Layers in Neural Network by Using Principal Component Analysis (Muh. Ibnu Choldun R., Judhi Santoso, Kridanto Surendro)....Pages 490-500
Symmetry Constrained Machine Learning (Doron L. Bergman)....Pages 501-512
An Integrated SEM-Neural Network for Predicting and Understanding the Determining Factor for Institutional Repositories Adoption (Shahla Asadi, Rusli Abdullah, Yusmadi Yah Jusoh)....Pages 513-532
Load Balancing Using Neural Networks Approach for Assisted Content Delivery in Heterogeneous Network (Raid Sakat, Raed Saadoon, Maysam Abbod)....Pages 533-547
Combining Diffusion Processes for Semi-supervised Learning on Graph Structured Data (Abdullah Al-Gafri, Muhammed Moinuddin, Ubaid M. Al-Saggaf)....Pages 548-556
Smart Energy Usage and Visualization Based on Micro-moments (Abdullah Alsalemi, Faycal Bensaali, Abbes Amira, Noora Fetais, Christos Sardianos, Iraklis Varlamis)....Pages 557-566
The Functional Design Method for Public Buildings Together with Gamification of Information Models Enables Smart Planning by Crowdsourcing and Simulation and Learning of Rescue Environments (Jukka Selin, Markku Rossi)....Pages 567-587
Deep Learning Based Pedestrian Detection at Distance in Smart Cities (Ranjith K Dinakaran, Philip Easom, Ahmed Bouridane, Li Zhang, Richard Jiang, Fozia Mehboob et al.)....Pages 588-593
Infrastructural Models of Intermediary Service Providers in Digital Economy (Anton Ivaschenko, Stanislav Korchivoy, Michail Spodobaev)....Pages 594-605
“Smart City” Governance Technologies Development in the Era of the 4th Industrial Revolution (Mikhail Kuznetsov, Maria Nikishova, Anna Belova)....Pages 606-618
D2C-DM: Distributed-to-Centralized Data Management for Smart Cities Based on Two Ongoing Case Studies (Amir Sinaeepourfard, John Krogstie, Sobah Abbas Petersen)....Pages 619-632
Information and Thermic Control System for Water Contaminant Charge Removal of the City of Villavicencio, Colombia (Obeth Romero)....Pages 633-641
Improving Traffic Safety Through Video Analysis in Jakarta, Indonesia (João Caldeira, Alex Fout, Aniket Kesari, Raesetje Sefala, Joseph Walsh, Katy Dupre et al.)....Pages 642-649
Dynamic Scaling of EEG Fluctuations of Patients with Learning Disorders Based on Artificial Intelligence (Oswaldo Morales Matamoros, Jesús Jaime Moreno Escobar, Ixchel Lina Reyes, Teresa Ivonne Contreras Troya, Ricardo Tejeida Padilla)....Pages 650-670
Blood Glucose Level Prediction Using Optimized Neural Network for Virtual Patients (Muhammad Asad, Younas Khan, Usman Qamar, Saba Bashir)....Pages 671-683
A Review of Continuous Blood Glucose Monitoring and Prediction of Blood Glucose Level for Diabetes Type 1 Patient in Different Prediction Horizons (PH) Using Artificial Neural Network (ANN) (Muhammad Asad, Usman Qamar)....Pages 684-695
ED Revisits Forecasting: Utilizing Latent Models (Ofir Ben-Assuli, Joshua R. Vest)....Pages 696-702
Color Signal Processing Methods for Webcam-Based Heart Rate Evaluation (Mikhail Kopeliovich, Mikhail Petrushan)....Pages 703-723
Genetic Algorithm Based Selection of Appropriate Biomarkers for Improved Breast Cancer Prediction (Arnab Kumar Mishra, Pinki Roy, Sivaji Bandyopadhyay)....Pages 724-732
Electronic Prosthetics for the Rehabilitation of the Carpal Tunnel Syndrome (Santiago Núñez, Patricio Encalada, Santiago Manzano, Juan P. Pallo, Dennis Chicaiza, Carlos Gordón)....Pages 733-749
Artificial Intelligent (AI) Clinical Edge for Voice Disorder Detection (Jaya Shankar Vuppalapati, Santosh Kedaru, Sharat Kedari, Anitha Ilapakurti, Chandrasekar Vuppalapati)....Pages 750-766
Virtual Therapy System in a Multisensory Environment for Patients with Alzheimer’s (Patricio Encalada, Johana Medina, Santiago Manzano, Juan P. Pallo, Dennis Chicaiza, Carlos Gordón et al.)....Pages 767-781
Using Local Binary Patterns and Convolutional Neural Networks for Melanoma Detection (Saeed Iqbal, Adnan N. Qureshi, Mukti Akter)....Pages 782-789
A Hybrid Model to Predict Glucose Oscillation for Patients with Type 1 Diabetes and Suggest Customized Recommendations (João Paulo Aragão Pereira, Anarosa Alves Franco Brandão, Joyce da Silva Bevilacqua, Maria Lúcia Cardillo Correa Giannella)....Pages 790-801
Modeling a Vulnerability Index for Leprosy Using Spatial Analysis and Artificial Intelligence Techniques in a Hyperendemic Municipality in the Amazon (Rafael Eich da Silva, Valney Mara Gomes Conde, Marcos José da Silva Baia, Cláudio Guedes Salgado, Guilherme Augusto Barros Conde)....Pages 802-823
Ensemble Approach for Left Ventricle Segmentation (Chen Avni, Maya Herman)....Pages 824-834
Escaping Diagnosability and Entering Uncertainty in Temporal Diagnosis of Discrete-Event Systems (Nicola Bertoglio, Gianfranco Lamperti, Marina Zanella, Xiangfu Zhao)....Pages 835-852
A Robotic Hand for Arabic Sign Language Teaching and Translation (Maha Alrabiah, Hissah AlMuneef, Sadeem AlMarri, Ebtisam AlShammari, Faten Alsunaid)....Pages 853-869
Can Machine Learning Be Used to Discriminate Between Burns and Pressure Ulcer? (Aliyu Abubakar, Hassan Ugail, Ali Maina Bukar)....Pages 870-880
Genetic Algorithm Based Optimal Feature Selection Extracted by Time-Frequency Analysis for Enhanced Sleep Disorder Diagnosis Using EEG Signal (Md. Rashedul Islam, Md. Abdur Rahim, Md. Rajibul Islam, Jungpil Shin)....Pages 881-894
On Applying Ambient Intelligence to Assist People with Profound Intellectual and Multiple Disabilities (Michal Kosiedowski, Arkadiusz Radziuk, Piotr Szymaniak, Wojciech Kapsa, Tomasz Rajtar, Maciej Stroinski et al.)....Pages 895-914
Style Transfer for Dermatological Data Augmentation (Tamás Nyíri, Attila Kiss)....Pages 915-923
Analysis of the Stability, Control and Implementation of Real Parameters of the Robot Walking (Arbnor Pajaziti, Xhevahir Bajrami, Ahmet Shala, Ramë Likaj, Lum Rexha, Astrit Zekaj et al.)....Pages 924-939
Autonomous Robot Navigation with Signaling Based on Objects Detection Techniques and Deep Learning Networks (Carlos Gordón, Patricio Encalada, Henry Lema, Diego León, Cristhian Castro, Dennis Chicaiza)....Pages 940-953
Intelligent Autonomous Navigation of Robot KUKA YouBot (Carlos Gordón, Patricio Encalada, Henry Lema, Diego León, Cristhian Castro, Dennis Chicaiza)....Pages 954-967
Intrusion Detection in Robotic Swarms (Ian Sargeant, Allan Tomlinson)....Pages 968-980
Human Digital Twin: Enabling Human-Multi Smart Machines Collaboration (Wael Hafez)....Pages 981-993
‘If You Agree with Me, Do I Trust You?’: An Examination of Human-Agent Trust from a Psychological Perspective (Hsiao-Ying Huang, Michael Twidale, Masooda Bashir)....Pages 994-1013
Using Local Objects to Improve Estimation of Mobile Object Coordinates and Smoothing Trajectory of Movement by Autoregression with Multiple Roots (Nikita Andriyanov, Konstantin Vasiliev)....Pages 1014-1025
An Empirical Review of Calibration Techniques for the Pepper Humanoid Robot’s RGB and Depth Camera (Avinash Kumar Singh, Neha Baranwal, Kai-Florian Richter)....Pages 1026-1038
Convolutional Neural Network Applied to the Gesticulation Control of an Interactive Social Robot with Humanoid Aspect (Edisson Arias, Patricio Encalada, Franklin Tigre, Cesar Granizo, Carlos Gordon, Marcelo V. Garcia)....Pages 1039-1053
Automation of Synthesized Optimal Control Problem Solution for Mobile Robot by Genetic Programming (Askhat Diveev, Elena Sofronova)....Pages 1054-1072
Towards Partner-Aware Humanoid Robot Control Under Physical Interactions (Yeshasvi Tirupachuri, Gabriele Nava, Claudia Latella, Diego Ferigo, Lorenzo Rapetti, Luca Tagliapietra et al.)....Pages 1073-1092
Momentum-Based Topology Estimation of Articulated Objects (Yeshasvi Tirupachuri, Silvio Traversaro, Francesco Nori, Daniele Pucci)....Pages 1093-1105
Telexistence and Teleoperation for Walking Humanoid Robots (Mohamed Elobaid, Yue Hu, Giulio Romualdi, Stefano Dafarra, Jan Babic, Daniele Pucci)....Pages 1106-1121
Terrain Classification Using W-K Filter and 3D Navigation with Static Collision Avoidance (J. P. Matos-Carvalho, Dário Pedro, Luís Miguel Campos, José Manuel Fonseca, André Mora)....Pages 1122-1137
Robot Navigation with PolyMap, a Polygon-Based Map Format (Johann Dichtl, Xuan S. Le, Guillaume Lozenguez, Luc Fabresse, Noury Bouraqadi)....Pages 1138-1152
Identification of Motor Parameters on Coupled Joints (Nuno Guedelha, Silvio Traversaro, Daniele Pucci)....Pages 1153-1172
Improving Human-Machine Interaction for a Powered Wheelchair Driver by Using Variable-Switches and Sensors that Reduce Wheelchair-Veer (David Sanders, Martin Langner, Nils Bausch, Ya Huang, Sergey Khaustov, Sarinova Simandjunta)....Pages 1173-1191
De-Noising Signals Using Wavelet Transform in Internet of Underwater Things (Asiya Khan, Richard Pemberton, Abdul Momen, Daniel Bristow)....Pages 1192-1198
Using Experts’ Perceptual Skill for Dermatological Image Segmentation (Qiao Li, Wanju Hou)....Pages 1199-1208
Applications of Gaussian Process Latent Variable Models in Finance (Rajbir S. Nirwan, Nils Bertschinger)....Pages 1209-1221
Channel-Wise Reconstruction-Based Anomaly Detection Framework for Multi-channel Sensor Data (Mingu Kwak, Seoung Bum Kim)....Pages 1222-1233
Designing an Artefact for Sharing and Reusing Teaching Practices in Higher Education Institutions: An Exploratory Study (Nouf Almujally, Mike Joy)....Pages 1234-1242
Intelligent Method for 3D Image Display with Semitransparent Object Representations (Kohei Arai)....Pages 1243-1250
Estimation of Average Information Content: Comparison of Impact of Contexts (Michael Richter, Yuki Kyogoku, Max Kölbl)....Pages 1251-1257
Fuzzy Controller for Sun Tracking (Using Image Processing) (Ali Hamouda, Mutaz Ababneh, Mohamed Al Zahrani, Abdelkader Chabchoub)....Pages 1258-1266
GMC: Grid Based Motion Clustering in Dynamic Environment (Handuo Zhang, Karunasekera Hasith, Hui Zhou, Han Wang)....Pages 1267-1280
Rising from Systemic to Industrial Artificial Intelligence Applications (AIA) for Predictive Decision Making (PDM) - Four Examples (Bernhard Heiden, Bianca Tonino-Heiden, Tanja Obermüller, Christian Loipold, Wolfgang Wissounig)....Pages 1281-1288
A Machine Learning Approach for Classification of Tremor - A Neurological Movement Disorder (Rajesh Ranjan, Marimuthu Palaniswami, Braj Bhushan)....Pages 1289-1307
Back Matter ....Pages 1309-1312


Advances in Intelligent Systems and Computing 1038

Yaxin Bi Rahul Bhatia Supriya Kapoor Editors

Intelligent Systems and Applications Proceedings of the 2019 Intelligent Systems Conference (IntelliSys) Volume 2

Advances in Intelligent Systems and Computing Volume 1038

Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing, Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University, Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro, Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong

The series “Advances in Intelligent Systems and Computing” contains publications on theory, applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all disciplines such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the areas of modern intelligent systems and computing such as: computational intelligence, soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms, social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and society, cognitive science and systems, Perception and Vision, DNA and immune based systems, self-organizing and adaptive systems, e-Learning and teaching, human-centered and human-centric computing, recommender systems, intelligent control, robotics and mechatronics including human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent data analysis, knowledge management, intelligent agents, intelligent decision making and support, intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia. The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, both of a foundational and applicable character. An important characteristic feature of the series is the short publication time and world-wide distribution. This permits a rapid and broad dissemination of research results. ** Indexing: The books of this series are submitted to ISI Proceedings, EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink ** More information about this series at http://www.springer.com/series/11156

Yaxin Bi · Rahul Bhatia · Supriya Kapoor

Editors

Intelligent Systems and Applications Proceedings of the 2019 Intelligent Systems Conference (IntelliSys) Volume 2


Editors Yaxin Bi School of Computing, Computer Science Research Institute Ulster University Newtownabbey, UK

Rahul Bhatia The Science and Information (SAI) Organization Bradford, West Yorkshire, UK

Supriya Kapoor The Science and Information (SAI) Organization Bradford, West Yorkshire, UK

ISSN 2194-5357 ISSN 2194-5365 (electronic) Advances in Intelligent Systems and Computing ISBN 978-3-030-29512-7 ISBN 978-3-030-29513-4 (eBook) https://doi.org/10.1007/978-3-030-29513-4 © Springer Nature Switzerland AG 2020 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Editor’s Preface

The Intelligent Systems Conference (IntelliSys) 2019 was held on September 5 and 6, 2019, in London, UK. The Intelligent Systems Conference is a prestigious annual conference on intelligent systems, artificial intelligence and their applications to the real world, and it builds on the success of the IntelliSys conferences held in London over the past five years. This conference not only presented state-of-the-art methods and valuable experience from researchers in the related research areas, but also provided the audience with a vision of further development in these fields. The research that comes out of the series of IntelliSys conferences will provide insights into complex intelligent systems and pave the way for future development. The Program Committee of IntelliSys 2019 represented 25 countries, and the authors submitted 546 papers from 45 countries. This certainly attests to the widespread international importance of the theme of the conference. Each paper was reviewed on the basis of originality, novelty and rigor. After the reviews, 223 papers were accepted for presentation, out of which 189 papers are finally being published in the proceedings. The event was a two-day program comprising 24 paper presentation sessions and poster presentations. The themes of the contributions and scientific sessions ranged from theories to applications, reflecting a wide spectrum of artificial intelligence. We are very gratified to have an exciting lineup of featured speakers who are among the leaders in changing the landscape of artificial intelligence and its application areas. Plenary speakers include: Grega Milcinski (CEO, Sinergise), Detlef D Nauck (Chief Research Scientist for Data Science at BT Technology), Giulio Sandini (Director of Research - Italian Institute of Technology) and Iain Brown (Head of Data Science, SAS UK&I). The conference would truly not function without the contributions and support received from authors, participants, keynote speakers, program committee members, session chairs, organizing committee members, steering committee members and others in their various roles. Their valuable support, suggestions, dedicated commitment and hard work have made IntelliSys 2019 successful.


It has been a great honor to serve as the general chair for the IntelliSys 2019 and to work with the conference team. We believe this event will certainly help further disseminate new ideas and inspire more international collaborations. Kind Regards, Yaxin Bi Conference Chair

Contents

A New Approach of Service Platform for Water Optimization in Lettuce Crops Using Wireless Sensor Network . . . . . . . . . . . . . . . . .    1
Edgar Maya-Olalla, Hernán Domínguez-Limaico, Carlos Vásquez-Ayala, Edgar Jaramillo-Vinueza, Marcelo Zambrano V, Alexandra Jácome-Ortega, Paul D. Rosero-Montalvo, and D. H. Peluffo-Ordóñez

A Novel AI Based Optimization of Node Selection and Information Fusion in Cooperative Wireless Networks . . . . . . . . . . . . . . . . . . . .   14
Yuan Gao, Hong Ao, Weigui Zhou, Su Hu, Haosen Yu, Yang Guo, and Jiang Cao

Using Automated State Space Planning for Effective Management of Visual Information and Learner’s Attention in Virtual Reality . . . . .   24
Opeoluwa Ladeinde, Mohammad Abdur Razzaque, and The Anh Han

Real-Time Lane Detection and Extreme Learning Machine Based Tracking Control for Intelligent Self-driving Vehicle . . . . . . . . . . . . .   41
Sabir Hossain, Oualid Doukhi, Inseung Lee, and Deok-jin Lee

Pedestrian Recognition and Obstacle Avoidance for Autonomous Vehicles Using Raspberry Pi . . . . . . . . . . . . . . . . . . . . . . . . . . . .   51
Charlie Day, Liam McEachen, Asiya Khan, Sanjay Sharma, and Giovanni Masala

Improving Urban Air Quality Through Long-Term Optimisation of Vehicle Fleets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   70
Darren M. Chitty, Rakhi Parmar, and Peter R. Lewis

Analysing Data Set of the Bike–Sharing System Demand with R Scripts: Mexico City Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   90
Luis A. Moncayo–Martínez

A Path Towards Understanding Factors Affecting Crash Severity in Autonomous Vehicles Using Current Naturalistic Driving Data . . . . . 106 Franco van Wyk, Anahita Khojandi, and Neda Masoud Automobile Automation and Lifecycle: How Digitalisation and Security Issues Affect the Car as a Product and Service? . . . . . . . . 121 Antti Hakkala and Olli I. Heimo Less-than-Truckload Shipper Collaboration in the Physical Internet . . . 138 Minghui Lai and Xiaoqiang Cai Smartphone-Based Intelligent Driver Assistant: Context Model and Dangerous State Recognition Scheme . . . . . . . . . . . . . . . . . . . . . . . 152 Igor Lashkov and Alexey Kashevnik Voice Recognition Based System to Adapt Automatically the Readability Parameters of a User Interface . . . . . . . . . . . . . . . . . . . 166 Hélène Soubaras A Machine-Synesthetic Approach to DDoS Network Attack Detection . . . 179 Yuri Monakhov, Oleg Nikitin, Anna Kuznetsova, Alexey Kharlamov, and Alexandr Amochkin Novel Synchronous Brain Computer Interface Based on 2-D EEG Local Binary Patterning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192 Daniela De Venuto and Giovanni Mezzina LSTM-Based Facial Performance Capture Using Embedding Between Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211 Hsien-Yu Meng and Jiangtao Wen Automatic Curation System Using Multimodal Analysis Approach (MAA) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227 Wei Yuan, Yong Zhang, Xiaojun Hu, and Mei Song Mixed Reality for Industry? An Empirical User Experience Study . . . . 241 Olli I. Heimo, Leo Sakari, Tero Säntti, and Teijo Lehtonen Person Detection in Thermal Videos Using YOLO . . . . . . . . . . . . . . . . . 254 Marina Ivasic-Kos, Mate Kristo, and Miran Pobar Implementation of a Hybridized Machine Learning Framework for Flood Risk Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268 Oluwole Charles Akinyokun, Udoinyang Godwin Inyang, and Emem Etok Akpan Word Similarity Computing Based on HowNet and Synonymy Thesaurus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292 Hongmei Nie, Jiaqing Zhou, Hui Wang, and Minshuo Li


A Comparison of fastText Implementations Using Arabic Text Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306 Nuha Alghamdi and Fatmah Assiri The Social Net of Sentiment: Improving the Base Sentiment Analysis of High Impact Events with Lexical Category Exploration . . . 312 Maxwell Fowler, Aleshia Hayes, and Kanika Binzani Reflective Writing Analysis Approach Based on Semantic Concepts: An Evaluation of WordNet Affect Efficiency . . . . . . . . . . . . . . . . . . . . . 321 Huda Alrashidi and Mike Joy Semantic-Based Feedback Recommendation for Automatic Essay Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334 Tsegaye Misikir Tashu and Tomáš Horváth Figurative Language Grounding in Humanoid Robots . . . . . . . . . . . . . . 347 Maja Gwóźdź Masdar: A Novel Sequence-to-Sequence Deep Learning Model for Arabic Stemming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363 Mohammed M. Fouad, Ahmed Mahany, and Iyad Katib Nomen Meum Earl: Yet Another Route to Intelligent Machine Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374 Chris Lanz LIT: Rule Based Italian Lemmatizer . . . . . . . . . . . . . . . . . . . . . . . . . . . 395 Simone Molendini, Antonio Guerrieri, and Andrea Filieri Intelligent Sense-Enabled Lexical Search on Text Documents . . . . . . . . 405 Anu Thomas and S. Sangeetha Predicting Sentiment in Yorùbá Written Texts: A Comparison of Machine Learning Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 416 Abimbola Rhoda Iyanda and Omolayo Abegunde An Introductory Survey on Attention Mechanisms in NLP Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432 Dichao Hu Advanced Similarity Measures Using Word Embeddings and Siamese Networks in CBR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449 Kareem Amin, George Lancaster, Stelios Kapetanakis, Klaus-Dieter Althoff, Andreas Dengel, and Miltos Petridis Intelligent Flower Detection System Using Machine Learning . . . . . . . . 463 Amna Safar and Maytham Safar


Development of an Interactive Virtual Reality for Medical Skills Training Supervised by Artificial Neural Network . . . . . . . . . . . . . . . . . 473 Shabnam Sadeghi Esfahlani, Viktor Izsof, Sabrina Minter, Ali Kordzadeh, Hassan Shirvani, and Karim Sadeghi Esfahlani Toward Robust Image Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . 483 Basemah Alshemali, Alta Graham, and Jugal Kalita Determining the Number of Hidden Layers in Neural Network by Using Principal Component Analysis . . . . . . . . . . . . . . . . . . . . . . . . 490 Muh. Ibnu Choldun R., Judhi Santoso, and Kridanto Surendro Symmetry Constrained Machine Learning . . . . . . . . . . . . . . . . . . . . . . . 501 Doron L. Bergman An Integrated SEM-Neural Network for Predicting and Understanding the Determining Factor for Institutional Repositories Adoption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513 Shahla Asadi, Rusli Abdullah, and Yusmadi Yah Jusoh Load Balancing Using Neural Networks Approach for Assisted Content Delivery in Heterogeneous Network . . . . . . . . . . . . . . . . . . . . . 533 Raid Sakat, Raed Saadoon, and Maysam Abbod Combining Diffusion Processes for Semi-supervised Learning on Graph Structured Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548 Abdullah Al-Gafri, Muhammed Moinuddin, and Ubaid M. Al-Saggaf Smart Energy Usage and Visualization Based on Micro-moments . . . . . 557 Abdullah Alsalemi, Faycal Bensaali, Abbes Amira, Noora Fetais, Christos Sardianos, and Iraklis Varlamis The Functional Design Method for Public Buildings Together with Gamification of Information Models Enables Smart Planning by Crowdsourcing and Simulation and Learning of Rescue Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 567 Jukka Selin and Markku Rossi Deep Learning Based Pedestrian Detection at Distance in Smart Cities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 588 Ranjith K Dinakaran, Philip Easom, Ahmed Bouridane, Li Zhang, Richard Jiang, Fozia Mehboob, and Abdul Rauf Infrastructural Models of Intermediary Service Providers in Digital Economy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 594 Anton Ivaschenko, Stanislav Korchivoy, and Michail Spodobaev “Smart City” Governance Technologies Development in the Era of the 4th Industrial Revolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 606 Mikhail Kuznetsov, Maria Nikishova, and Anna Belova


D2C-DM: Distributed-to-Centralized Data Management for Smart Cities Based on Two Ongoing Case Studies . . . . . . . . . . . . . . . . . . . . . . 619 Amir Sinaeepourfard, John Krogstie, and Sobah Abbas Petersen Information and Thermic Control System for Water Contaminant Charge Removal of the City of Villavicencio, Colombia . . . . . . . . . . . . . 633 Obeth Romero Improving Traffic Safety Through Video Analysis in Jakarta, Indonesia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 642 João Caldeira, Alex Fout, Aniket Kesari, Raesetje Sefala, Joseph Walsh, Katy Dupre, Muhammad Rizal Khaefi, Setiaji, George Hodge, Zakiya Aryana Pramestri, and Muhammad Adib Imtiyazi Dynamic Scaling of EEG Fluctuations of Patients with Learning Disorders Based on Artificial Intelligence . . . . . . . . . . . . 650 Oswaldo Morales Matamoros, Jesús Jaime Moreno Escobar, Ixchel Lina Reyes, Teresa Ivonne Contreras Troya, and Ricardo Tejeida Padilla Blood Glucose Level Prediction Using Optimized Neural Network for Virtual Patients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 671 Muhammad Asad, Younas Khan, Usman Qamar, and Saba Bashir A Review of Continuous Blood Glucose Monitoring and Prediction of Blood Glucose Level for Diabetes Type 1 Patient in Different Prediction Horizons (PH) Using Artificial Neural Network (ANN) . . . . . 684 Muhammad Asad and Usman Qamar ED Revisits Forecasting: Utilizing Latent Models . . . . . . . . . . . . . . . . . . 696 Ofir Ben-Assuli and Joshua R. Vest Color Signal Processing Methods for Webcam-Based Heart Rate Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 703 Mikhail Kopeliovich and Mikhail Petrushan Genetic Algorithm Based Selection of Appropriate Biomarkers for Improved Breast Cancer Prediction . . . . . . . . . . . . . . . . . . . . . . . . . 724 Arnab Kumar Mishra, Pinki Roy, and Sivaji Bandyopadhyay Electronic Prosthetics for the Rehabilitation of the Carpal Tunnel Syndrome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 733 Santiago Núñez, Patricio Encalada, Santiago Manzano, Juan P. Pallo, Dennis Chicaiza, and Carlos Gordón Artificial Intelligent (AI) Clinical Edge for Voice Disorder Detection . . . 750 Jaya Shankar Vuppalapati, Santosh Kedaru, Sharat Kedari, Anitha Ilapakurti, and Chandrasekar Vuppalapati


Virtual Therapy System in a Multisensory Environment for Patients with Alzheimer’s . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 767 Patricio Encalada, Johana Medina, Santiago Manzano, Juan P. Pallo, Dennis Chicaiza, Carlos Gordón, Carlos Núñez, and Diego F. Andaluz Using Local Binary Patterns and Convolutional Neural Networks for Melanoma Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 782 Saeed Iqbal, Adnan N. Qureshi, and Mukti Akter A Hybrid Model to Predict Glucose Oscillation for Patients with Type 1 Diabetes and Suggest Customized Recommendations . . . . . 790 João Paulo Aragão Pereira, Anarosa Alves Franco Brandão, Joyce da Silva Bevilacqua, and Maria Lúcia Cardillo Correa Giannella Modeling a Vulnerability Index for Leprosy Using Spatial Analysis and Artificial Intelligence Techniques in a Hyperendemic Municipality in the Amazon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 802 Rafael Eich da Silva, Valney Mara Gomes Conde, Marcos José da Silva Baia, Cláudio Guedes Salgado, and Guilherme Augusto Barros Conde Ensemble Approach for Left Ventricle Segmentation . . . . . . . . . . . . . . . 824 Chen Avni and Maya Herman Escaping Diagnosability and Entering Uncertainty in Temporal Diagnosis of Discrete-Event Systems . . . . . . . . . . . . . . . . . 835 Nicola Bertoglio, Gianfranco Lamperti, Marina Zanella, and Xiangfu Zhao A Robotic Hand for Arabic Sign Language Teaching and Translation . . . 853 Maha Alrabiah, Hissah AlMuneef, Sadeem AlMarri, Ebtisam AlShammari, and Faten Alsunaid Can Machine Learning Be Used to Discriminate Between Burns and Pressure Ulcer? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 870 Aliyu Abubakar, Hassan Ugail, and Ali Maina Bukar Genetic Algorithm Based Optimal Feature Selection Extracted by Time-Frequency Analysis for Enhanced Sleep Disorder Diagnosis Using EEG Signal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 881 Md. Rashedul Islam, Md. Abdur Rahim, Md. Rajibul Islam, and Jungpil Shin On Applying Ambient Intelligence to Assist People with Profound Intellectual and Multiple Disabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . 895 Michal Kosiedowski, Arkadiusz Radziuk, Piotr Szymaniak, Wojciech Kapsa, Tomasz Rajtar, Maciej Stroinski, Carmen Campomanes-Alvarez, B. Rosario Campomanes-Alvarez, Mitja Lustrek, Matej Cigale, Erik Dovgan, and Gasper Slapnicar


Style Transfer for Dermatological Data Augmentation . . . . . . . . . . . . . . 915 Tamás Nyíri and Attila Kiss Analysis of the Stability, Control and Implementation of Real Parameters of the Robot Walking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 924 Arbnor Pajaziti, Xhevahir Bajrami, Ahmet Shala, Ramë Likaj, Lum Rexha, Astrit Zekaj, and Dibran Hoxha Autonomous Robot Navigation with Signaling Based on Objects Detection Techniques and Deep Learning Networks . . . . . . . . . . . . . . . 940 Carlos Gordón, Patricio Encalada, Henry Lema, Diego León, Cristhian Castro, and Dennis Chicaiza Intelligent Autonomous Navigation of Robot KUKA YouBot . . . . . . . . . 954 Carlos Gordón, Patricio Encalada, Henry Lema, Diego León, Cristhian Castro, and Dennis Chicaiza Intrusion Detection in Robotic Swarms . . . . . . . . . . . . . . . . . . . . . . . . . 968 Ian Sargeant and Allan Tomlinson Human Digital Twin: Enabling Human-Multi Smart Machines Collaboration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 981 Wael Hafez ‘If You Agree with Me, Do I Trust You?’: An Examination of Human-Agent Trust from a Psychological Perspective . . . . . . . . . . . . 994 Hsiao-Ying Huang, Michael Twidale, and Masooda Bashir Using Local Objects to Improve Estimation of Mobile Object Coordinates and Smoothing Trajectory of Movement by Autoregression with Multiple Roots . . . . . . . . . . . . . . . . . . . . . . . . . 1014 Nikita Andriyanov and Konstantin Vasiliev An Empirical Review of Calibration Techniques for the Pepper Humanoid Robot’s RGB and Depth Camera . . . . . . . . . . . . . . . . . . . . . 1026 Avinash Kumar Singh, Neha Baranwal, and Kai-Florian Richter Convolutional Neural Network Applied to the Gesticulation Control of an Interactive Social Robot with Humanoid Aspect . . . . . . . . . . . . . . 1039 Edisson Arias, Patricio Encalada, Franklin Tigre, Cesar Granizo, Carlos Gordon, and Marcelo V. Garcia Automation of Synthesized Optimal Control Problem Solution for Mobile Robot by Genetic Programming . . . . . . . . . . . . . . . . . . . . . . 1054 Askhat Diveev and Elena Sofronova Towards Partner-Aware Humanoid Robot Control Under Physical Interactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1073 Yeshasvi Tirupachuri, Gabriele Nava, Claudia Latella, Diego Ferigo, Lorenzo Rapetti, Luca Tagliapietra, Francesco Nori, and Daniele Pucci


Momentum-Based Topology Estimation of Articulated Objects . . . . . . . 1093 Yeshasvi Tirupachuri, Silvio Traversaro, Francesco Nori, and Daniele Pucci Telexistence and Teleoperation for Walking Humanoid Robots . . . . . . . 1106 Mohamed Elobaid, Yue Hu, Giulio Romualdi, Stefano Dafarra, Jan Babic, and Daniele Pucci Terrain Classification Using W-K Filter and 3D Navigation with Static Collision Avoidance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1122 J. P. Matos-Carvalho, Dário Pedro, Luís Miguel Campos, José Manuel Fonseca, and André Mora Robot Navigation with PolyMap, a Polygon-Based Map Format . . . . . . 1138 Johann Dichtl, Xuan S. Le, Guillaume Lozenguez, Luc Fabresse, and Noury Bouraqadi Identification of Motor Parameters on Coupled Joints . . . . . . . . . . . . . . 1153 Nuno Guedelha, Silvio Traversaro, and Daniele Pucci Improving Human-Machine Interaction for a Powered Wheelchair Driver by Using Variable-Switches and Sensors that Reduce Wheelchair-Veer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1173 David Sanders, Martin Langner, Nils Bausch, Ya Huang, Sergey Khaustov, and Sarinova Simandjunta De-Noising Signals Using Wavelet Transform in Internet of Underwater Things . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1192 Asiya Khan, Richard Pemberton, Abdul Momen, and Daniel Bristow Using Experts’ Perceptual Skill for Dermatological Image Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1199 Qiao Li and Wanju Hou Applications of Gaussian Process Latent Variable Models in Finance . . . 1209 Rajbir S. Nirwan and Nils Bertschinger Channel-Wise Reconstruction-Based Anomaly Detection Framework for Multi-channel Sensor Data . . . . . . . . . . . . . . . . . . . . . . 1222 Mingu Kwak and Seoung Bum Kim Designing an Artefact for Sharing and Reusing Teaching Practices in Higher Education Institutions: An Exploratory Study . . . . . . . . . . . . 1234 Nouf Almujally and Mike Joy Intelligent Method for 3D Image Display with Semitransparent Object Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1243 Kohei Arai


Estimation of Average Information Content: Comparison of Impact of Contexts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1251 Michael Richter, Yuki Kyogoku, and Max Kölbl Fuzzy Controller for Sun Tracking (Using Image Processing) . . . . . . . . 1258 Ali Hamouda, Mutaz Ababneh, Mohamed Al Zahrani, and Abdelkader Chabchoub GMC: Grid Based Motion Clustering in Dynamic Environment . . . . . . 1267 Handuo Zhang, Karunasekera Hasith, Hui Zhou, and Han Wang Rising from Systemic to Industrial Artificial Intelligence Applications (AIA) for Predictive Decision Making (PDM) - Four Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1281 Bernhard Heiden, Bianca Tonino-Heiden, Tanja Obermüller, Christian Loipold, and Wolfgang Wissounig A Machine Learning Approach for Classification of Tremor - A Neurological Movement Disorder . . . . . . . . . . . . . . . . . . 1289 Rajesh Ranjan, Marimuthu Palaniswami, and Braj Bhushan Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1309

A New Approach of Service Platform for Water Optimization in Lettuce Crops Using Wireless Sensor Network

Edgar Maya-Olalla(1), Hernán Domínguez-Limaico(1), Carlos Vásquez-Ayala(1), Edgar Jaramillo-Vinueza(1), Marcelo Zambrano V(1), Alexandra Jácome-Ortega(1), Paul D. Rosero-Montalvo(1,2), and D. H. Peluffo-Ordóñez(3)

(1) Universidad Técnica del Norte, Ibarra, Ecuador
    {eamaya,pdrosero}@utn.edu.ec
(2) Instituto Tecnológico Superior 17 de Julio, Urcuquí, Ecuador
(3) YachayTech, Urcuquí, Ecuador

Abstract. A wireless sensor network is implemented and connected to the cloud through IPv6. The entire system is applied to precision irrigation of lettuce crops in Ecuador. The main objective is to optimize the use of irrigation water for productive purposes, supplying the crops with the amount of water they need to survive and produce. To do so, the system acquires data through sensors and stores it in web services. By improving the irrigation process, crops can be planted throughout the year, including summer; the system shows remarkable results in terms of efficient water savings and lettuce production.

Keywords: WSN · Cloud Computing · Precision agriculture · Irrigation

1 Introduction

In Ecuador, irrigation water is an increasingly limited resource. Currently, in the Northern Zone, small and medium-scale farmers continue to use water in a traditional way (furrow irrigation). As a result, this inefficient process wastes water resources [1]. It is therefore necessary to use proper technological tools to improve the irrigation process and optimize water usage in the crops. Nowadays, many systems try to acquire data, analyze them and obtain new knowledge about when and where irrigation is needed [2], thus allowing the efficient use of irrigation water based on the technical and environmental conditions of agricultural activities [3]. In addition, this contributes to government strategies for the transformation of the productive matrix. For this reason, the agricultural development of Ecuador's Northern Zone is a very important part of the governmental agenda [4,5].

For data acquisition in crops, a wireless sensor network (WSN) allows the collection of environmental data, which is later sent to and stored on a Cloud Computing platform. With these values, the WSN can make decisions for the drip irrigation system [5]. The WSN nodes need to communicate with each other; for this purpose, the WSN uses 6LoWPAN technology [6], which carries the IPv6 protocol over networks based on the IEEE 802.15.4 standard, achieving direct communication between network nodes and other IP devices and thus facilitating the monitoring of the field variables. The RPL routing protocol is used to allow communication between the nodes based on a mesh topology [7]. The incorporation of IPv6 in the WSN is chosen in anticipation of new technological advances in next-generation networks, allowing connection to the cloud and global scalability thanks to the 128-bit space for source and destination addresses, which yields 3.4 × 10^38 possible addresses [8], leading to the concept of the Internet of Things.

Once the data of the nodes is acquired, it is sent to a public PaaS server [8] for storage in a database and for the creation, development, implementation and hosting in the cloud of a website that monitors the environmental variables in order to control the irrigation system, all of this based on open-source code [9]. The Platform as a Service (PaaS) model aims to provide developers with APIs to develop, test, deploy and manage their applications remotely through the cloud [10]. With the new trends in PaaS tools, services can be integrated in an intelligent and effective way for companies by means of different standards for development tools. In this way, management is simplified through an SOA framework by improving the life cycle of services and applications, resulting in quality guarantees and code audits, which streamlines the internal operational transfers of each company [11]. At present, there are several PaaS platforms such as Azure, AWS and Criteria, among others. One of the most used is OpenShift because it offers short application development cycles and quality software. Among its main features are the use of firewalls, intrusion detection systems, port monitoring, verification of RPM packages (Red Hat RPM package manager), encrypted communication and multi-language support, to name the most important [12]. Related works such as [8–10] present different ways to implement a WSN and connect it to the IoT; however, open issues remain, such as WSN-to-PaaS communication and the deployment of PaaS services in greenhouses, among others.

The present system acquires environmental data in lettuce crops through a WSN and its sensors. To improve the communication between WSN nodes, each electronic system uses the 6LoWPAN protocol. One WSN node acts as the gateway between the 6LoWPAN network and the Internet, where the system stores its data. A PaaS service is then implemented in the cloud to determine when it is necessary to activate the drip irrigation system. The rest of this paper is structured as follows: Sect. 2 presents the electronic design and the WSN node communication. Section 3 shows the PaaS methodology and its implementation. The results are presented in Sect. 4. Finally, Sect. 5 presents the most remarkable conclusions.

2 Electronic Design

This section explains: (Sect. 2.1) sensor data acquisition, (Sect. 2.2) the WSN architecture, and (Sect. 2.3) the wireless sensor network logical topology.

2.1 Sensor Data Acquisition

The design of the wireless sensor network based on 6LoWPAN is applied at the Hacienda Cananvalle in the Northern Zone of Ecuador, a working area located in the city of Ibarra. For the implementation, two plots are considered, one of 180 m² and another of 200 m², which are used for planting vegetables [13,14]. The WSN works in two stages. On one side, it acquires data through sensors commonly used in precision agriculture: (i) a temperature sensor, which determines the optimal environmental condition for irrigating, since irrigation should not take place during frost or at excessive temperatures because the crops can be affected; these values are measured in degrees Celsius. (ii) A relative humidity sensor, which measures the humidity or quantity of water vapor in the environment, with values from 0 to 100%. (iii) A soil moisture sensor, which determines when the soil and crops need moisture; this is the main parameter for activating or stopping the irrigation system to achieve water optimization. (iv) A luminosity sensor, which helps determine the irrigation condition with respect to the sun, since under high luminosity the crops can be burned by the reflection of the water. (v) A rain sensor, whose application is very simple: when it detects the presence of rain, the irrigation system is deactivated so as not to waste irrigation water. On the other side, the WSN needs to activate the irrigation system. For this, it uses a relay to switch from 5 V to 12 V or 110 V in order to drive the solenoid valves. Two types of embedded systems are used: the CM5000 [10], a TelosB clone [11] that incorporates temperature, relative humidity and luminosity sensors, and an Arduino Uno [15] that reads the external soil moisture and precipitation sensors. The WSN has an Arduino module and a relay board that serve to activate or deactivate the system [16]; it can be operated manually, automatically or based on the state of the monitored sensors. Figure 1 shows the data acquisition system and the activation of the solenoid valves.
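To make the roles of the five sensors concrete, the sketch below shows one way such readings could be combined into a single irrigation decision that then drives the relay board. It is a minimal illustration in C, not the authors' implementation: the numeric thresholds (frost limit, maximum temperature, luminosity and soil-moisture cut-offs) are hypothetical placeholders chosen only to mirror the rules described above.

```c
#include <stdbool.h>

/* One set of readings from the WSN node (units as described in Sect. 2.1). */
typedef struct {
    float temp_c;         /* air temperature in degrees Celsius        */
    float rel_humidity;   /* relative humidity, 0-100 %                */
    float soil_moisture;  /* soil moisture, 0-100 %                    */
    float luminosity_lux; /* ambient light level                       */
    bool  raining;        /* rain sensor: true while rain is detected  */
} reading_t;

/* Decide whether the solenoid valves should be opened.
   All numeric thresholds below are illustrative assumptions. */
bool should_irrigate(const reading_t *r)
{
    if (r->raining)                              /* rule (v): never irrigate in the rain        */
        return false;
    if (r->temp_c < 2.0f || r->temp_c > 30.0f)   /* rule (i): avoid frost and excessive heat    */
        return false;
    if (r->luminosity_lux > 80000.0f)            /* rule (iv): avoid very strong sunlight       */
        return false;
    return r->soil_moisture < 30.0f;             /* rule (iii): soil moisture is the main trigger */
}
```

In the actual system, a decision of this kind is what switches the relay that powers the 12 V/110 V solenoid valves.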

2.2 Wireless Sensor Network Architecture

The wireless sensor network architecture is based on a mesh topology [13], as 6LoWPAN provides IPv6 routing protocols such as RPL (IPv6 Routing Protocol for Low-Power and Lossy Networks). Each client node sends the sensor status information to the server node, and nodes can re-transmit messages to their neighbors and use the most efficient links to reach the target server [17]. This topology allows the automatic repair of links in the event that a client node fails, interference occurs or new nodes are added to the network. It uses the multiple paths available to reach the destination, making the network scalable, robust and reliable [18]. The project architecture is presented in Fig. 2, and the IPv6 addressing of the wireless sensor network is presented in Table 1.


Fig. 1. Sensor data acquisition scheme: (a) sensor block diagram; (b) actuator block diagram.

Table 1. IPv6 addressing of the wireless sensor network

Device          MAC address               Link-local address          IPv6 address
Client 1        00:12:74:00:13:cc:1f:ed   fe80::212:7400:13cc:1fed    aaaa::212:7400:13cc:1fed
Client 2        00:12:74:00:13:cb:0a:92   fe80::212:7400:13cb:a92     aaaa::212:7400:13cb:a92
Client 3        00:12:74:00:13:cc:01:70   fe80::212:7400:13cc:170     aaaa::212:7400:13cc:170
Server          00:12:74:00:13:cb:f8:8c   fe80::212:7400:13cb:f88c    aaaa::ff:fe00:1
Router-border   -                         -                           aaaa::b5:5aff:fe0b:114
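The link-local addresses in Table 1 follow the usual 6LoWPAN convention of deriving the 64-bit interface identifier from the node's EUI-64 with the universal/local bit inverted (00:12:74:... becomes ...::212:7400:...). The short C sketch below reproduces that derivation for Client 1; it is an illustration of the addressing scheme, not code from the deployed nodes.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Build a link-local IPv6 address (fe80::/64) from an 8-byte EUI-64. */
static void eui64_to_linklocal(const uint8_t eui[8], uint8_t addr[16])
{
    memset(addr, 0, 16);
    addr[0] = 0xfe;
    addr[1] = 0x80;
    memcpy(&addr[8], eui, 8);
    addr[8] ^= 0x02;               /* invert the universal/local bit */
}

int main(void)
{
    /* EUI-64 of Client 1 from Table 1. */
    const uint8_t client1[8] = {0x00, 0x12, 0x74, 0x00, 0x13, 0xcc, 0x1f, 0xed};
    uint8_t ll[16];

    eui64_to_linklocal(client1, ll);
    /* Prints the uncompressed form fe80:0000:0000:0000:0212:7400:13cc:1fed,
       i.e. fe80::212:7400:13cc:1fed as listed in Table 1. */
    for (int i = 0; i < 16; i += 2)
        printf("%02x%02x%s", ll[i], ll[i + 1], i < 14 ? ":" : "\n");
    return 0;
}
```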

2.3 Wireless Sensor Network Logical Topology

For the logical topology of the WSN, IPv6 addressing is assigned to the nodes for their communication using Stateless Address Autoconfiguration (SAA) [14]: address = prefix (64 bits) + suffix (64 bits). To determine the mounting height of the nodes above the ground and to guarantee coverage between the WSN communication antennas, Eq. (1) is applied to compute the first Fresnel zone, as indicated in Fig. 3, with the objective of avoiding signal losses and thus optimizing the transmit power of the devices, since this zone must be free of obstacles. Here d is the distance between the server and the client measured in km (in this case 50 m), and f is the operating frequency of the link measured in GHz (2.4 GHz is used), whereby the optimum radius r_m is approximately 1.24 m.

r_m = 17.32 · √( d_km / (4 · f_GHz) )    (1)

Fig. 2. Logical topology of the wireless sensor network.

Fig. 3. First Fresnel zone between antennas.

where:
r_m = optimum radius
d_km = 0.05 km (50 m)
f_GHz = 2.4 GHz

r_m = 17.32 · √( 0.05 / (4 · 2.4) ) = 1.24 m
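As a quick numerical check of Eq. (1), the small C program below evaluates the first Fresnel zone radius for the 50 m, 2.4 GHz link used here; it is only a worked example of the formula, not part of the deployed software.

```c
#include <math.h>
#include <stdio.h>

/* First Fresnel zone radius in meters, Eq. (1): d in km, f in GHz. */
static double fresnel_radius_m(double d_km, double f_ghz)
{
    return 17.32 * sqrt(d_km / (4.0 * f_ghz));
}

int main(void)
{
    /* A 50 m link at 2.4 GHz gives approximately 1.25 m, in line with the
       ~1.24 m optimum radius reported in the text above. */
    printf("r = %.2f m\n", fresnel_radius_m(0.05, 2.4));
    return 0;
}
```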

3 PaaS Methodology

Once the nodes have sent their data to the server, the server is responsible for moving them to a Platform as a Service (PaaS), in this case OpenShift, which is open-source software. It provides tools for storage in MySQL and for the creation, development and hosting of web interfaces in the PHP programming language, which allow the environmental conditions of the agricultural plot to be monitored, as shown in Fig. 4, for subsequent decision making in the control of the irrigation system [18].

Fig. 4. PaaS.
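One plausible way for the gateway/server node to hand readings over to the PHP/MySQL application hosted on OpenShift is a plain HTTP POST. The libcurl-based sketch below illustrates that idea; the endpoint URL and field names are hypothetical assumptions, since the paper does not publish its web API.

```c
#include <curl/curl.h>
#include <stdio.h>

int main(void)
{
    CURL *curl;
    CURLcode res;

    curl_global_init(CURL_GLOBAL_DEFAULT);
    curl = curl_easy_init();
    if (!curl)
        return 1;

    /* Hypothetical endpoint of the PHP application hosted in the OpenShift cloud. */
    curl_easy_setopt(curl, CURLOPT_URL,
                     "https://irrigation-example.openshiftapps.example/insert.php");
    /* One reading encoded as form fields (field names are assumptions). */
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS,
                     "node=client1&temp=21.5&humidity=64&soil=28&lux=12000&rain=0");

    res = curl_easy_perform(curl);
    if (res != CURLE_OK)
        fprintf(stderr, "upload failed: %s\n", curl_easy_strerror(res));

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return res == CURLE_OK ? 0 : 1;
}
```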

3.1 PaaS Implementation

This section describes the software of the sensor nodes and the web application developed in this research. The WSN nodes run a real-time operating system built on Contiki OS, which is open source and can be adapted to microcontrollers with limited memory capacity while providing IPv6 support [15]. The CM5000 embedded devices support the 6LoWPAN standard, allowing connectivity and wireless management of the sensor network.
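For reference, a Contiki application of this kind typically looks like the sketch below: a protothread that periodically samples the on-board SHT11 temperature/humidity sensor of the CM5000/TelosB and forwards the values to the server node over UDP/IPv6. This is a generic Contiki-style example assuming the sht11 and simple-udp APIs of the platform port are available; exact header paths and constants vary between Contiki releases, and this is not the authors' actual firmware.

```c
#include "contiki.h"
#include "dev/sht11/sht11-sensor.h"   /* header path may differ between Contiki versions */
#include "simple-udp.h"
#include "net/ip/uip.h"
#include <stdio.h>
#include <string.h>

#define UDP_PORT       5678
#define SEND_INTERVAL  (60 * CLOCK_SECOND)   /* one reading per minute (illustrative) */

static struct simple_udp_connection conn;
static uip_ipaddr_t server_addr;

PROCESS(sense_send_process, "Periodic sensing and reporting");
AUTOSTART_PROCESSES(&sense_send_process);

PROCESS_THREAD(sense_send_process, ev, data)
{
  static struct etimer periodic;
  static char msg[32];

  PROCESS_BEGIN();

  SENSORS_ACTIVATE(sht11_sensor);
  simple_udp_register(&conn, UDP_PORT, NULL, UDP_PORT, NULL);
  /* Global address of the server node as listed in Table 1. */
  uip_ip6addr(&server_addr, 0xaaaa, 0, 0, 0, 0, 0x00ff, 0xfe00, 0x0001);

  etimer_set(&periodic, SEND_INTERVAL);
  while(1) {
    PROCESS_WAIT_EVENT_UNTIL(etimer_expired(&periodic));
    etimer_reset(&periodic);

    /* Raw SHT11 readings; conversion to degC / %RH is left to the sink. */
    int t = sht11_sensor.value(SHT11_SENSOR_TEMP);
    int h = sht11_sensor.value(SHT11_SENSOR_HUMIDITY);
    snprintf(msg, sizeof(msg), "t=%d h=%d", t, h);
    simple_udp_sendto(&conn, msg, strlen(msg) + 1, &server_addr);
  }

  PROCESS_END();
}
```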


Fig. 5. Topology between the wireless sensor network and the cloud computing PAAS

3.2 PaaS Web Server

The web application is divided into two parts. The first is in charge of managing control requests from external or local users, who can activate or deactivate the irrigation manually, by alarm programming or based on the current state of the sensors; it also collects real-time sensor data from the 6LoWPAN wireless network, and all of this is stored in a database in the OpenShift cloud, as shown in Fig. 5. The second is a website that presents a graphical interface through which clients with a PC, laptop, tablet or mobile device can control and monitor the irrigation system.

4 Results

The 6LoWPAN standard allows integration between wireless sensor networks and TCP/IP networks, collecting information on the state of the agriculture-focused sensors, which is then presented in real time to the user over TCP/IP networks [18]. By means of the CM5000 embedded devices, the sensors can be monitored in real time, and a web page with the current state of the measurements for subsequent decision making is displayed, as shown in Fig. 6.

4.1 Web System Monitoring

The operation of the 6LoWPAN standard is verified by means of a USB-Dongle UD1000 sniffer; this embedded system uses the TinyOS operating system. For the analysis, the Z-Monitor and Wireshark tools are used, which allow the 6LoWPAN standard and the IEEE 802.15.4 standard to be monitored in order to verify their performance. This process is shown in Fig. 7.

Fig. 6. Graphic interface of the control system.


Fig. 7. Monitoring sniffer

The objective of the proposal is to optimize the consumption of irrigation water; the consumption is shown to be lower than that of traditional flood irrigation systems, with savings between 70% and 85%, which is important when water is the limiting factor for agriculture. In order to evaluate the system, preliminary tests were carried out. These are based on field research, which consists in solving practical problems while working in a natural environment and relies on data collection; in this case, observation of the different types of irrigation was used, and the results were subsequently compared. Four irrigation methods were applied to vegetable crops; they were observed over a time interval, data were collected every 15 min, and a record of each scenario was obtained for comparison.

4.2 Application of Irrigation by Hose

The first irrigation test was carried out with a hose, which in 5 min applied approximately 4 l of water to the plants located in one furrow, visually providing the crop with adequate moisture. However, this first furrow, where water was supplied with the hose, was very exposed to sunlight, and at 10:00 am its temperature reached 30 °C. Because the temperature was not adequate, the crop did not form its head, taking on an abnormal appearance, as seen in Fig. 8.


Fig. 8. Plant deformation.

4.3 Denial of Water

In the second test, water was withheld from the crop for 7 days to observe its behavior. The plants stopped developing foliage and fell into so-called water stress; as a consequence, the roots dried out and were lost, and the leaves lost their natural color, among other effects, as can be observed in Fig. 9.

Fig. 9. Consequences of the lack of water.

4.4 Water Application with Drip Irrigation System

With manual activation of the drip irrigation system, the same visual assessment was performed; 3 l were consumed in 15 min, one liter less than with the previous hose-based method. With this method of application the crop production improved, but not as much as expected: its growth was very slow and the produce was smaller (Fig. 10).

Fig. 10. Use of drip irrigation system.

Fig. 11. Plant growing with the proposed project.

4.5 Water Application with the Proposed Project

The method applied here is the proposed one, which allows the control and monitoring of climatic variables in different crops using the wireless sensor network. As observed in Fig. 11, with an adequate and non-wasteful supply of water, the plant develops normal foliage, its head forms normally, and its color is a bright green.

5 Conclusions and Future Works

Farmers in the northern zone of the country are provided with tools that control drip irrigation systems and monitor climatic conditions, using new-generation technologies such as a wireless sensor network based on IPv6/6LoWPAN and cloud computing. The proposal makes it possible to replenish water in the soil only where it is actually needed, optimizing the irrigation process. Monitoring can be done remotely from any device, such as a personal computer, tablet or smartphone, since the application is based on Web technology and adaptive templates stored in a public cloud; thanks to the graphical interfaces, management and administration are fast and easy for the farmer, who can observe a virtual image of the terrain and the position of the installed irrigation devices. One limitation for the small agricultural sector is the scarcity of economic resources to access this type of system, because the required technology is expensive and an initial investment is needed for installation. For this reason, it is proposed to seek local or international funding, such as Senescyt-Ecuador through its Bank of Ideas, to access seed capital and thus give farmers the opportunity to invest in innovation and entrepreneurship projects. As future work, data acquisition by the sensors can be improved; a real-time mechanism could be implemented to select an adequate time to wake up the system. Finally, the system should provide an alert when the connection to the cloud is lost.

References

1. Sartillo Salazar, E., Hernández Hernández, J.C., Caporal, R.M., Martinez Hernández, H.P., Ordoñez Flores, R.: Maximum expectation algorithm and neuronal network base radial applied to the estimate of an environmental variable, evapotranspiration in a greenhouse. In: 2014 International Conference on Electronics, Communications and Computers (CONIELECOMP), Cholula, pp. 225–230 (2014). https://doi.org/10.1109/CONIELECOMP.2014.6808595
2. Bennis, N., et al.: Greenhouse climate modelling and robust control. Comput. Electron. Agric. 61(2), 96–107 (2008). https://doi.org/10.1016/j.compag.2007.09.014
3. Food and Agriculture Organization. http://www.fao.org/docrep/009/x0490s/x0490s00.htm
4. Ponce, J.: Ministerio de Agricultura, Ganadería, Acuacultura y Pesca. Plan Nacional de Riego y Drenaje 2012–2026, p. 5


5. Ministerio de Coordinación de la Producción, Empleo y Competitividad. Agenda de Transformación Productiva 2010–2013, p. 137
6. Singh, K., Kumar, P., Singh, B.K.: An associative relational impact of water quality on crop yield: a comprehensive index analysis using LISS-III sensor. IEEE Sens. J. 13(12), 4912–4917 (2013). https://doi.org/10.1109/JSEN.2013.2276760
7. Lee, J., Kang, H., Bang, H., Kang, S.: Dynamic crop field analysis using mobile sensor node. In: 2012 International Conference on ICT Convergence (ICTC), Jeju Island, pp. 7–11 (2012). https://doi.org/10.1109/ICTC.2012.6386766
8. Vijayabaskar, P.S., Sreemathi, R., Keertanaa, E.: Crop prediction using predictive analytics. In: 2017 International Conference on Computation of Power, Energy Information and Communication (ICCPEIC), Melmaruvathur, pp. 370–373 (2017). https://doi.org/10.1109/ICCPEIC.2017.8290395
9. Ponce-Guevara, K.L.: GreenFarm-DM: a tool for analyzing vegetable crops data from a greenhouse using data mining techniques (first trial). In: IEEE Second Ecuador Technical Chapters Meeting (ETCM), Salinas 2017, pp. 1–6 (2017). https://doi.org/10.1109/ETCM.2017.8247519
10. Sahu, S., Chawla, M., Khare, N.: An efficient analysis of crop yield prediction using Hadoop framework based on random forest approach. In: 2017 International Conference on Computing, Communication and Automation (ICCCA), Greater Noida, pp. 53–57 (2017). https://doi.org/10.1109/CCAA.2017.8229770
11. Rosero-Montalvo, P.D., et al.: Data visualization using interactive dimensionality reduction and improved color-based interaction model. In: Biomedical Applications Based on Natural and Artificial Computing. IWINAC. LNCS, vol. 10338. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59773-7_30
12. Velasquez, L.C., Argueta, J., Mazariegos, K.: Implementation of a low cost aerial vehicle for crop analysis in emerging countries. In: IEEE Global Humanitarian Technology Conference (GHTC), Seattle, WA, pp. 21–27 (2016). https://doi.org/10.1109/GHTC.2016.7857255
13. Bhanu, B.B., Rao, K.R., Ramesh, J.V.N., Hussain, M.A.: Agriculture field monitoring and analysis using wireless sensor networks for improving crop production. In: 2014 Eleventh International Conference on Wireless and Optical Communications Networks (WOCN), Vijayawada, pp. 1–7 (2014). https://doi.org/10.1109/WOCN.2014.6923043
14. Ma, X., Luo, W.: The analysis of 6LowPAN technology. In: Pacific-Asia Workshop, vol. 1, pp. 963–966, 19–20 December 2008 (2008)
15. Zhang, Y., Li, Z.: IPv6 conformance testing: theory and practice. In: Test Conference Proceedings ITC 2004, pp. 719–727, 26–28 October 2004 (2004)
16. Accettura, N., Grieco, L., Boggia, G., Camarda, P.: Performance analysis of the RPL routing protocol. In: 2011 IEEE International Conference on Mechatronics (ICM), pp. 767–772, 13–15 April 2011 (2011)
17. Núñez, D.: Estudio para la migración de IPv4 a IPv6 para la empresa proveedora de internet Milltec S.A. Quito, Ecuador. EPN, p. 22 (2009)
18. Aslam, M., Rea, S., Pesch, D.: Service provisioning for the WSN cloud, pp. 962–969 (2012)

A Novel AI Based Optimization of Node Selection and Information Fusion in Cooperative Wireless Networks

Yuan Gao1,2,3,4, Hong Ao2, Weigui Zhou2, Su Hu3, Haosen Yu5, Yang Guo1, and Jiang Cao1

1 Academy of Military Science of PLA, Beijing 100090, China
2 Xichang Satellite Launch Center, Xichang 615000, China
3 University of Electronic Science and Technology of China, Sichuan 611731, China, [email protected]
4 State Key Laboratory on Microwave and Digital Communications, National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China, [email protected]
5 International College, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Abstract. The increasing number of mobile terminals has brought a significant traffic load to base stations, which can lead to data failures. Considering the increasing capabilities of mobile terminals, cooperation between terminals can help increase spectrum efficiency and reduce latency. In this work, we propose a novel node selection and transmission fusion method using artificial intelligence. First, we describe the status of mobile terminals through a thermal pattern; we then propose a deep learning method to indicate the status of every node and make an optimized selection of the target node; finally, we perform multi-stage transmission for wireless information fusion to enhance spectrum efficiency. Simulation results show that the proposed method helps select the proper node for transmission among all candidates, and the average throughput is increased by up to 32% in the system level simulation.

Keywords: Artificial intelligence · Node selection · Information fusion

1 Introduction

The increasing demand for transmission speed and Quality of Service/Experience (QoS/QoE) in future wireless communication systems will require more system resources such as bandwidth, power, spatial antennas, etc. [1]. The capabilities of mobile terminals are developing much faster than they were 5 years ago, driven by upgraded mobile chipsets such as the Qualcomm Snapdragon and Huawei Kirin [2]. Most mobile terminals are equipped with multiple transmission types, for example cellular,



Bluetooth, Wi-Fi, etc., but users can select only one of them to accomplish a transmission, which wastes spectrum resources and slows down transmission performance. To bond them together, cooperative transmission techniques have been proposed through base station cooperation over the backhaul [3, 4]. However, the performance of mobile terminals still depends on the transmission distance, so not enough gain can be obtained in this way. Software defined networking (SDN) is one possible solution in 5G standardization [5]: all network elements are virtualized on the same hardware platform, and all transmission and signal processing functions can be upgraded over the air [6], which is the infrastructure for terminal cooperation. In the deployment of 5G systems, small cells are considered a key technique for improving system performance [7]. Considering the benefits of and need for cooperation, many researchers are discussing cooperation techniques in 5G and future wireless networks [8]. In [9], massive MIMO is discussed to break the throughput bottleneck, but the multiple antenna system requires far more complex signal processing hardware. In [10], the authors present a caching method to reduce the complexity of multicast systems, and in [2] backhaul transmission options are compared in terms of energy consumption, throughput, complexity, etc. In [11] and [12], the authors summarize the future of wireless communication systems using artificial intelligence, from which benefits can be squeezed in the wireless transmission process; in [13], the authors discuss AI in IoT systems, where big data provides the foundation for learning. In [14], AI in optical networks is discussed, for example for resource allocation and power control. In [15], the authors discuss AI in 5G, but this work is only an open discussion rather than a solution. In [16–18], deep learning is used in satellite communication and IoT, from which we can infer that AI is suitable for decision making. In [19], the authors present experimental results for 5G using AI; even with non-optimized AI, the performance is significantly increased. In [20], AI helps the deployment of indoor Wi-Fi, with multiple influences considered, and the results compare well to human-designed systems; in [21] and [22], AI based network management and optimization are presented for two different scenarios, and the results also look good. None of these recent studies addresses terminal cooperation, so in this paper, based on the requirements of terminal cooperation, we discuss AI based node selection and information fusion in wireless networks. In Sect. 2, we present the system model with a sketch figure. In Sect. 3, we discuss the AI based node selection and information fusion. In Sect. 4, we give the simulation and analysis. Finally, conclusions are drawn in Sect. 5.

2 System Model

In wireless communication systems, terminals are equipped with many types of wireless access capabilities. In Fig. 1, we present the system structure with reference to 5G wireless systems. Base stations remain important in wireless networks: equipped with massive MIMO and wireless backhaul, they can carry more high speed data traffic, such as HD video and virtual reality. Base stations are connected together using the X2


interface, or to the core network via the S1 interface. The internet content servers provide data content to customers through different links; however, when the same data is requested by multiple users, the core network and the wireless link come under high pressure, which is wasteful. If one terminal has obtained the data from the base station, that data can be cached at the base station side, so that when another terminal requires the same data, the base station no longer needs to fetch it again but can send it directly from the cache, which saves time and energy. Terminal cooperation means that all mobile terminals can cache data and send it to other terminals, so that the base stations no longer need to cache duplicate information. As shown in Fig. 1, UE1 and UE2 are served by the cellular network and can share information using inter-group cooperation. If UE4 and UE5 want to obtain information from each other, they need to perform intra-group cooperation, as mentioned above.


Fig. 1. System structure of terminal cooperation using wireless fusion [23].

Cooperation between users is a simple and effective method to enhance system performance and reduce latency. The problem when adopting cooperation is also clear: the terminal must decide which node to select for cooperation, and that node must then decide how to cooperate.


3 AI Based Optimization

In a finite AI based programming problem, we assume S = {p, B} is the set of states, where p is the power of a node and B is the bandwidth. A(s) denotes the set of actions, and the states s ∈ S are finite. The transition probabilities P^a_{ss'} = Pr{s_{t+1} = s' | s_t = s, a_t = a} represent the dynamics, and R^a_{ss'} = E{r_{t+1} | a_t = a, s_t = s, s_{t+1} = s'} denotes the expected reward associated with the transition, ∀ s ∈ S, a ∈ A(s).

3.1 Node Selection

For any given node T_i, i ∈ I, information will be required from the base station or from another terminal node. If the selected node is busy, it cannot provide enough power and bandwidth to finish the transmission, so we use power and bandwidth to describe the status of the node. Let p^i_{tx} be the available transmit power of candidate node i and B_i its available bandwidth; according to the Shannon formula, the capacity that the terminal can obtain from node i is

C_i = B_i \log_2\left(1 + \frac{p^i_{tx}}{n_0 B_i}\right)    (1)
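To make Eq. (1) concrete, the sketch below evaluates the Shannon capacity of a few candidate nodes and picks the best one; the transmit powers, bandwidths and noise density are invented for illustration and are not taken from the paper's simulation.

```python
import math

N0 = 1e-9  # illustrative noise power spectral density (W/Hz)

def capacity(p_tx: float, bandwidth: float) -> float:
    """Shannon capacity of a candidate node, Eq. (1): C = B * log2(1 + p / (n0 * B))."""
    return bandwidth * math.log2(1 + p_tx / (N0 * bandwidth))

# Hypothetical candidates: (node id, available transmit power [W], available bandwidth [Hz]).
candidates = [("n1", 0.2, 5e6), ("n2", 0.5, 2e6), ("n3", 0.1, 10e6)]
best = max(candidates, key=lambda c: capacity(c[1], c[2]))
print(best[0], round(capacity(best[1], best[2]) / 1e6, 1), "Mbit/s")
```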

Thus, the capacity is the decision variable that the intelligent system uses. At any observation point, the optimal node to select is one that will not be occupied until the data transmission is accomplished; if another user gains access to the target node, the transmission will be affected. The proposed optimal policy is computed from the optimal value functions, denoted V* or Q*, which also satisfy the Bellman optimality equations:

V^*(s) = \max_a Q^*(s, a) = \max_a \sum_{s'} P^a_{ss'} \left[ R^a_{ss'} + \gamma V^*(s') \right], or
Q^*(s, a) = \sum_{s'} P^a_{ss'} \left[ R^a_{ss'} + \gamma \max_{a'} Q^*(s', a') \right],    (2)

∀ s ∈ S, a ∈ A(s), and s' ∈ S^+. After obtaining the optimal policy, and in order to reduce the time complexity without losing the convergence guarantee of the iteration, the next step is to perform value iteration. The optimization is obtained by a backup operation, which combines policy improvement and truncated policy evaluation steps:

V_{k+1}(s) = \max_a \sum_{s'} P^a_{ss'} \left[ R^a_{ss'} + \gamma V_k(s') \right]    (3)


In every observation period the status of each node is given; the uncertainty lies in the upcoming users and data traffic, so the AI based node selection has to choose the proper node among all candidates, and we use the following method to perform the selection. Value iteration converges and reaches its bound as the number of iterations grows; thus, an optimal point is reached once the improvement in the value function becomes small.
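A minimal sketch of the value iteration of Eqs. (2)–(3) on a toy two-state model is shown below; the transition probabilities and rewards are invented for illustration only and do not come from the paper.

```python
# Toy MDP: P[s][a] is a list of (next_state, probability, reward) triples.
P = {
    0: {0: [(0, 0.9, 1.0), (1, 0.1, 0.0)], 1: [(1, 1.0, 2.0)]},
    1: {0: [(0, 1.0, 0.5)],                1: [(1, 1.0, 1.0)]},
}
gamma, eps = 0.9, 1e-6
V = {s: 0.0 for s in P}

while True:  # Eq. (3): V_{k+1}(s) = max_a sum_s' P [R + gamma * V_k(s')]
    delta = 0.0
    for s in P:
        backups = [sum(p * (r + gamma * V[s2]) for s2, p, r in P[s][a]) for a in P[s]]
        new_v = max(backups)
        delta = max(delta, abs(new_v - V[s]))
        V[s] = new_v
    if delta < eps:  # stop once the improvement of the value function is small
        break

# Greedy policy extracted from the converged values (the selected "node"/action per state).
policy = {s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2]) for s2, p, r in P[s][a]))
          for s in P}
print(V, policy)
```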

3.2 Wireless Information Fusion

After selecting the proper node for transmission, we turn to the second problem, information fusion, through which the combined information constitutes the complete data. Based on the Shannon formula, the frequency-domain channel model is given as follows:

y_A = H_A s + n_A,  y_B = H_B s + n_B    (4)

y_A (y_B) ∈ C^{K×1} is the received signal from the wireless channel of node A (node B), s ∈ C^{2K×1} represents the symbol vector transmitted through the kth element, and s_k denotes the information at the user side modulated on the kth subcarrier. H_{A(B)} = diag(h_{A(B),k}) ∈ C^{K×2K} is the channel response matrix in the complex domain, where the diagonal element h_{A,k} (h_{B,k}) is the response of the channel from the target users mapped on subcarrier k to node A (B). n_A (n_B) ∈ C^{K×1} is the noise vector, a zero-mean circularly symmetric complex Gaussian random process n_A (n_B) ~ N_C(0, σ² I_K), where σ² is the noise variance. The compression and its inverse process are similar to the scalar case: q ∈ C^{K×1} is the noise vector introduced by the compression, with zero mean, and U ∈ C^{K×K} is its covariance matrix:

\hat{y}_B = y_B + q,   q ~ N_C(0, U)    (5)

To make the optimization simple and solvable, the subcarriers are assumed orthogonal, so that the achievable rates must satisfy:


h \le I(s; y_A, \hat{y}_B) = \sum_{k=1}^{K} I(s_k; y_{A,k}, \hat{y}_{B,k})    (6)

y_{A,k} is the received signal from base station A on subcarrier k, and \hat{y}_{B,k} denotes the reconstructed signal decompressed by base station A. We can then obtain the overhead of the information fusion link:

R_{BH} = I(\hat{y}_B; y_B) = \sum_{k=1}^{K} I(\hat{y}_{B,k}; y_{B,k}) \le C_{BH}    (7)

Define R = {r | r ∈ R_+^K, 1^T r \le C_{BH}} as the set of all feasible compression rate vectors, whose kth element r_k is the rate of compression on the kth subcarrier:

r_k = I(\hat{y}_{B,k}; y_{B,k})    (8)

The rate vector can be considered as the fusion capacity allocation vector. In this part, we formulate the information fusion problem by arranging the cooperative capacity allocation together with target pairing and wireless resource mapping. A set of binary variables x_{i,j,k} ∈ {0, 1} is introduced as control symbols: when x_{i,j,k} = 1, the ith user located in base station A is paired with the jth user in base station B, and this pair is assigned to subcarrier k; otherwise x_{i,j,k} = 0. The optimization problem is established with the aim of maximizing the fusion capacity under limited link capacity constraints:

\max_{x, r, U} \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{K} x_{i,j,k} R_{i,j,k}
s.t.  \sum_{j=1}^{N} \sum_{k=1}^{K} x_{i,j,k} = 1, ∀ i
      \sum_{i=1}^{N} \sum_{k=1}^{K} x_{i,j,k} = 1, ∀ j
      \sum_{i=1}^{N} \sum_{j=1}^{N} x_{i,j,k} \le 1, ∀ k
      r^T 1 \le C_{BH}    (9)

R_{i,j,k} is the rate of pair (i, j) mapped on subcarrier k using the compress-and-forward scheme. For convenience, the objective function is denoted as follows:

f(x, r, U) = \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{K} x_{i,j,k} R_{i,j,k}    (10)
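To make the structure of problem (9) concrete, the brute-force sketch below enumerates user pairings and subcarrier assignments for a tiny instance (N = 2 users per base station, K = 3 subcarriers) with made-up rates R_{i,j,k}; the backhaul constraint r^T 1 ≤ C_BH is omitted for brevity, and a real solver would of course replace the exhaustive search.

```python
from itertools import permutations

N, K = 2, 3
# Hypothetical per-pair, per-subcarrier rates R[i][j][k] (bit/s/Hz).
R = [[[1.2, 0.8, 0.5], [0.6, 1.0, 0.9]],
     [[0.7, 0.4, 1.1], [1.3, 0.6, 0.8]]]

best_value, best_assignment = -1.0, None
# Each user i in cell A is paired with exactly one user j in cell B (a permutation),
# and each pair gets one distinct subcarrier, mirroring the constraints of (9).
for pairing in permutations(range(N)):             # j = pairing[i]
    for subcarriers in permutations(range(K), N):  # subcarrier k for pair i
        value = sum(R[i][pairing[i]][subcarriers[i]] for i in range(N))
        if value > best_value:
            best_value, best_assignment = value, (pairing, subcarriers)

print(best_value, best_assignment)
```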


4 Simulation and Analysis

Based on the discussion above, we perform simulations to verify the proposed method. First, we discuss the simulation parameters and assumptions.

4.1 Assumptions and Parameters

To ensure the accuracy of our method, we perform a system level simulation, which creates a multi-user environment and tests the stability and overall performance of the system. In this simulation we create a 3-cell, 9-sector environment; the cell radius is set to 50 m to simulate a UDN scenario, the transmission power is limited to 1 W, and the carrier frequency is set to 5.1 GHz, a typical 5G scenario. The number of users is set to 50, so the target user has more candidates from which to choose the most suitable node for transmission. Note that all information from the base stations and users is treated as global parameters; both historical data and state-of-the-art information are considered globally (Table 1).

Table 1. Simulation parameters.
Cell layout: 3 Cell/9 Sector with wrap around
Cell radius: 50 m
Bandwidth: 100 MHz
BS Tx power: 1 W
Max Re-TX time: 4
Carrier frequency: 5.1 GHz
HARQ scheme: IR
Channel model: SCME-Dense urban
Pathloss: L = 128.1 + 37.6 log10(R)
Shadowing std: 4 dB
Noise power: −107 dBm
Service type: Full buffer
Simulation TTIs: 2000
User number: 50
UE movement: 0.5 m/s
CQI measurement: Ideal
AMC level: QPSK (R = {1/8, 1/7, 1/6, 1/5, 1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5}); 16QAM (R = {1/2, 3/5, 2/3, 3/4, 4/5})
Simulation length: 2000 s
Wi-Fi info: 802.11n (3GPP 2011v07)
Bluetooth info: Intel BT 4.1 V2
NFC info: ISO 18092
Average: μ_k = 0
STD: σ_k = 1

4.2 Results and Analysis

In Fig. 2, we evaluate the performance of our proposed method against the simulation baseline. Four sub-figures are shown. The x-axis is the time in seconds; the blue dotted line is the result of our AI based optimization, and the red line is the simulation result obtained by exhaustive search, i.e. the optimal result for the given initial seed and simulation environment. If our proposed method approaches the simulation, the AI based optimization works well. The first sub-figure shows the pairing success rate, defined as the probability that a user finds the right node to pair with among all candidates. As time elapses, the rate gradually increases and approaches 1, and the result is a close match to the simulation, which means that our AI based node selection chooses the right node to pair with throughout the process. The second sub-figure shows the average bit error rate of the proposed method. It also closely matches the simulation result; as time increases, our method performs better than the baseline because the information fusion provides an additional gain to the system. The third sub-figure shows the throughput in Gbps, where our method again performs on a par with the simulation, and the last sub-figure shows the average throughput over all users, demonstrating that the proposed method works well in large scale systems.

Fig. 2. Performance of proposed method and simulation.


5 Conclusion

In this paper, we propose and discuss AI based node selection and information fusion in 5G wireless systems. The node selection method chooses the right node to pair with, taking bandwidth and power into account, and the information fusion method helps increase overall system performance through multi-stage information fusion. The simulation results indicate that the proposed method works well and that its complexity is quite small.

Acknowledgment. This work is funded by the National Natural Science Foundation of China under grant 61701503. The authors would also like to thank all the reviewers, whose suggestions helped improve this work considerably.

References 1. David, K., Berndt, H.: 6G vision and requirements: is there any need for beyond 5G? IEEE Veh. Technol. Mag. 13(3), 72–80 (2018) 2. Xie, W., Mao, N.T., Rundberget, K.: Cost comparisons of Backhaul transport technologies for 5G fixed wireless access. In: 2018 IEEE 5G World Forum (5GWF), Silicon Valley, CA, pp. 159–163 (2018) 3. Ranaweera, C., et al.: Optical X-haul options for 5G fixed wireless access: which one to choose? In: IEEE INFOCOM 2018 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Honolulu, HI, pp. 1–2 (2018) 4. Jaber, M., Lopez-Martinez, F.J., Imran, M.A., Sutton, A., Tukmanov, A., Tafazolli, R.: Wireless backhaul: performance modeling and impact on user association for 5G. IEEE Trans. Wirel. Commun. 17(5), 3095–3110 (2018) 5. Ramantas, K., Antonopoulos, A., Kartsakli, E., Mekikis, P., Vardakas, J., Verikoukis, C.: A C-RAN based 5G platform with a fully virtualized, SDN controlled optical/wireless fronthaul. In: 2018 20th International Conference on Transparent Optical Networks (ICTON), Bucharest, pp. 1–4 (2018) 6. Zhang, X., Zhu, Q.: Scalable virtualization and offloading-based software-defined architecture for heterogeneous statistical QoS provisioning over 5G multimedia mobile wireless networks. IEEE J. Sel. Areas Commun. 36(12), 2787–2804 (2018) 7. Valenzuela-Valdés, J.F., Palomares, Á., González-Macías, J.C., Valenzuela-Valdés, A., Padilla, P., Luna-Valero, F.: On the ultra-dense small cell deployment for 5G networks. In: 2018 IEEE 5G World Forum (5GWF), Silicon Valley, CA, pp. 369–372 (2018) 8. Alouini, M.S.: Paving the way towards 5G wireless communication networks. In: 2017 2nd International Conference on Telecommunication and Networks (TEL-NET), Noida, p. 1 (2017) 9. Duangsuwan, S., Jamjareegulgarn, P.: Detection of data symbol in a massive MIMO systems for 5G wireless communication. In: 2017 International Electrical Engineering Congress (iEECON), Pattaya, pp. 1–4 (2017) 10. Poularakis, K., Iosifidis, G., Sourlas, V., Tassiulas, L.: Exploiting caching and multicast for 5G wireless networks. IEEE Trans. Wirel. Commun. 15(4), 2995–3007 (2016) 11. Kibria, M.G., Nguyen, K., Villardi, G.P., Zhao, O., Ishizu, K., Kojima, F.: Big data analytics, machine learning, and artificial intelligence in next-generation wireless networks. IEEE Access 6, 32328–32338 (2018)


12. Li, R., et al.: Intelligent 5G: when cellular networks meet artificial intelligence. IEEE Wirel. Commun. 24(5), 175–183 (2017) 13. Osuwa, A.A., Ekhoragbon, E.B., Fat, L.T.: Application of artificial intelligence in Internet of Things. In: 2017 9th International Conference on Computational Intelligence and Communication Networks (CICN), Girne, pp. 169–173 (2017) 14. Mata, J., et al.: Application of artificial intelligence techniques in optical networks. In: 2018 IEEE Photonics Society Summer Topical Meeting Series (SUM), Waikoloa Village, HI, pp. 35–36 (2018) 15. Cayamcela, M.E.M., Lim, W.: Artificial intelligence in 5G technology: a survey. In: 2018 International Conference on Information and Communication Technology Convergence (ICTC), Jeju, pp. 860–865 (2018) 16. Ferreira, P.V.R., et al.: Multi-objective reinforcement learning-based deep neural networks for cognitive space communications. In: 2017 Cognitive Communications for Aerospace Applications Workshop (CCAA), Cleveland, OH, pp. 1–8 (2017) 17. Ferreira, P.V.R., et al.: Multiobjective reinforcement learning for cognitive satellite communications using deep neural network ensembles. IEEE J. Sel. Areas Commun. 36 (5), 1030–1041 (2018) 18. Jiang, W., Strufe, M., Schotten, H.D.: A SON decision-making framework for intelligent management in 5G mobile networks. In: 2017 3rd IEEE International Conference on Computer and Communications (ICCC), Chengdu, pp. 1158–1162 (2017) 19. Jiang, W., Strufe, M., Schotten, H.D.: Experimental results for artificial intelligence-based self-organized 5G networks. In: 2017 IEEE 28th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC), Montreal, QC, pp. 1–6 (2017) 20. Atawia, R., Gacanin, H.: Self-deployment of future indoor Wi-Fi networks: an artificial intelligence approach. In: GLOBECOM 2017, 2017 IEEE Global Communications Conference, Singapore, pp. 1–6 (2017) 21. Rafique, D., Velasco, L.: Machine learning for network automation: overview, architecture, and applications [Invited Tutorial]. IEEE/OSA J. Opt. Commun. Netw. 10(10), D126–D143 (2018) 22. Hamidouche, K., Kasgari, A.T.Z., Saad, W., Bennis, M., Debbah, M.: Collaborative artificial intelligence (AI) for user-cell association in ultra-dense cellular systems. In: 2018 IEEE International Conference on Communications Workshops (ICC Workshops), Kansas City, MO, pp. 1–6 (2018) 23. Huang, D., et al.: Deep learning based cooperative resource allocation in 5G wireless networks. Mob. Netw. Appl. (2018)

Using Automated State Space Planning for Effective Management of Visual Information and Learner’s Attention in Virtual Reality

Opeoluwa Ladeinde, Mohammad Abdur Razzaque, and The Anh Han

Teesside University, Middlesbrough, UK
{O.Ladeinde,M.Razzaque,T.Han}@tees.ac.uk

Abstract. Educational immersive virtual reality is often tasked with minimising distractions for learners and maintaining or signalling their focus to the right areas. Managing the location, density and relevancy of visual information in the virtual environment pertains to this. Essentially, this problem can be defined as the need to manage the cognitive load arising from visual information. To aid in the automated handling of this problem, this study investigates the use of automated state-space planning to model the current “state” of the virtual environment and to determine, from a given pool of steps or “actions”, a sequence that prioritises minimising cognitive load from visual information by planning the location and density of objects. This study also investigates modelling the state of what a learner has been informed of and has applied. This enables the planner to determine when to have the learner relate concepts to existing knowledge for deeper understanding, that is, to plan their generative learning. These states are planned in conjunction with the virtual environment states. The planning is also responsive to identified changes in the learner’s deviated attention or performance with the task. Together, this has the potential to minimise the cognitive load from being taught intrinsic information, while minimising extraneous information from the virtual environment. What has been produced so far does not yield many results beyond the planning method helping the virtual reality application manage where information appears, but it at least establishes a framework for future testing and for improvements to the methods used. This paper provides, in more detail, the background for this topic in immersive virtual reality, its significance, the methods used, and an evaluation of the method and how further investigations will be continued.

Keywords: Immersive virtual reality · Signalling · Cognitive load theory · Generative learning · State space planning · Planning Domain Definition Language



1 Introduction

Virtual reality applications continue to be applied in new ways to employ various kinds of teaching methods - in training applications, lectures and serious games¹. As such, there is a large quantity of research on the use of virtual reality applications for education, ranging from comparisons with other educational media² to studying how to take advantage of benefits specific to virtual reality interfaces³. Past studies have explored how student motivation and assessed knowledge can be influenced by a range of factors, including differing contexts, the subject taught, the method of using the media, and how the media is used in relation to the learning aims. Their implications, as well as a range of other factors, draw attention to the way the media is used rather than to the medium itself, rather than directly linking any potential improvements in student motivation and assessed knowledge to the use of the media to meet the learning aims. It is important to distinguish the media used (virtual reality) from the teaching methods it can be used for. A commonly cited debate was between Richard Clark [7] and Robert Kozma [16] on media influencing learning. A point could be drawn from the former that learning can be executed regardless of media, as the same cognitive processes can be drawn from the learner [7]. As such, a number of research studies focus on identifying what within these media can cause learning, and how to optimise learning in these media. This study is in virtual reality applications for a form of education. A specific teaching method we discuss is the use of generative learning techniques to build from the learner’s existing knowledge to have them reach their learning aims [9]. The use of generative learning techniques has been performed before in immersive virtual reality in Parong’s study [22], where the learning outcomes were improved by the learners being asked to summarise what they had learnt in writing as they went through the application. Our approach to using generative learning techniques differs, as it has the learners process the knowledge by applying it in the virtual environment. Our paper also discusses the management of the learner’s cognitive load [28]. A specific challenge this study investigates is the automated management of the presentation of visual information (which can present intrinsic or extraneous cognitive load) in immersive virtual reality. Figure 1 is an illustration made to provide an example of how learners can process concepts. This study also investigates how learning aims can be automatically ordered and taught to the learners as a new method to optimise their learning in this media. To perform this, this study applies the often investigated but rarely applied [18] methods of AI based planning to identify, from a set amount of information on the learner, how to teach what they wish to learn from a list of

¹ Serious games: games created with the aim of educating or informing the player.
² Such as lecture PowerPoint presentations [22].
³ Such as optimising immersion, visualisation or fidelity of interaction in virtual reality applications [5].


Fig. 1. An illustration of processing intrinsic concepts “A”, “B”, “C” and “D” to meet the aim “Z”. Generative learning techniques would have the tasks processed and summarised individually to become germane (illustrated as concepts “X” and “Y”), which would be easier to process than all the concepts together. Having to identify which bits of information are relevant to the learner’s learning aims (irrelevant/extraneous information illustrated by “M” and “L”) increases cognitive load.

limited actions a given virtual reality application can perform. This would be done through the use of AI based planning⁴ techniques (which will simply be referred to as “planning” henceforth in this paper). Planning can and has been used to make the educational events in (non-immersive) virtual reality applications adapt to the level of understanding and comfort of the learner [1,25]. Those studies attempted to use planning to change what happened in the virtual world according to what was expected of the user’s engagement. This study wishes to apply planning to learning in immersive interactive media⁵. This would involve using planning to help the learner meet learning aims, by using generative learning techniques to create the germane information (which may be referred to as “knowledge” in this paper) needed to easily relate or apply deeper learning aims (illustrated in Fig. 2). This would also involve identifying the learner’s current knowledge of a subject, what of that knowledge has been brought to memory, and how to make use of it. This would be done to achieve the learning aims as effectively as possible for the learner from the list of available actions the educational immersive interactive media can perform, personalised to what is recorded of the learner’s knowledge and preferences. Ultimately, the study proposed in this paper wishes to investigate how one could use planning for managing the cognitive load from the visualisation of objects, text or information in immersive interactive media - which it currently

⁴ Planning is a known field which - when phrased broadly - consists of the use of algorithms to identify a series of needed steps to reach an identified goal.
⁵ Interactive media is often tasked with minimising the distractions from its visualisation if it needs the user to focus on a specific area.


Fig. 2. An illustrated diagram on actions informing the learner or having them apply information, to have them able to perform other actions. Also illustrating intrinsic and extrinsic as the relevancy of objects to the action.

has not been applied to - as well as using planning to identify what actions can be given to the user to personalise their experience in this media and help them reach the learning aims. This benefits contexts such as this study’s case study in radiotherapy treatment. Generally, insufficient health literacy and hard-to-process instructions often make patients unable to follow the instructions given by their physician [2,4,6,8,14,21,26]. A requirement for radiotherapy treatment for prostate cancer is for patients to have the correct level of bladder filling between their scan and the treatment itself. This is to minimise damage to the surrounding organs. Patients often need to have their treatment deferred due to an inability to correctly interpret and follow the given instructions - often choosing not to drink water at all before the treatment [13,20]. For this case, virtual reality applications could be a medium to help patients visualise the effects of their actions on their bladder and the resulting treatment, to inform them of their timing in drinking water, how it changes the position of the organs in the digestive system, and how this would affect the radiotherapy treatment, which would in turn inform them of why treatment might need to be deferred and how to avoid this. Ultimately, using virtual reality for this may have the potential to remove the extrinsic cognitive load of having to understand the written terms in the informative pamphlet, making the instructions easier to process and understand. There would be less of a need for patients to have the higher health literacy required to process the instructions (which is still an issue), making them more likely to follow their given instructions. This case study enables this study to make use of how immersive virtual reality can help visualise information, but also to use planning around the


different levels of health literacy, and to present information relative to the understanding of the users. We intend to use planning to construct the events and ways of interacting with the virtual reality application around the preferences and knowledge of the user, which in this case concern information about the effects of bladder filling on radiotherapy treatment, and related knowledge. Using planning would avoid overly simplifying or complicating the information given relative to the patient’s health literacy. It is important to state that this study does not test in the actual case study itself; it is merely modelled around it. Ultimately, the first goal of this study is to apply state of the art uses of planning in virtual reality applications so that - given a limited pool of learning aims - the application goes through actions to meet those aims in a way preferred by and most effective for the learner. The second goal of this study is to adapt the planner to maintain or control the benefits of using virtual reality applications to help a learner visualise information.

2 Background and Significance

To reiterate, this paper looks at how we can minimise the cognitive load from the visual information in the virtual world presented by the actions, in order to reach individual learning aims. This is in relation to the media being virtual reality, which requires consideration of the location of objects, text, images and models in 3D space. It also requires consideration of where the user would distribute their attention in this virtual scene. A large portion of this study is based in planning. There are two key areas that need to be discussed to give a scope for the methods this study wishes to improve upon - the information the planning domains, problems and solutions would need to contain or handle. First we discuss cognitive load theory (as a more well known example) and how one could apply visualisation techniques in virtual reality while managing the learner’s cognitive load. This then gives a scope for the following discussion of uses of planning and of how to plan these visualisation techniques.

2.1 Cognitive Load Theory

Cognitive load theory is a relatively well known model of how learners process information [10,28]. This study uses patterns that can be derived from it, as well as from similar models (such as the capacity model) [10], for learning in virtual reality. Cognitive load can be phrased as how burdensome a task becomes due to the related, surrounding or unfamiliar information that the learner needs to process to accomplish it. There are three forms of cognitive load. Each of them, and their relation to processed information, is:

– Intrinsic - how demanding tasks that have an identifiable relation to the learner’s goal are.


– Extraneous - how demanding restrictions with no direct relation to the learner’s goal are, or impositions from the task itself that restrict how the learner wishes to learn - barriers to learning. For instance, it may be easier for you (the reader) to look at Fig. 3 to understand the last sentence (where having to paint a picture from text would be an unneeded step), or text may be easier for you to understand than Fig. 3.
– Germane - relatable experiences, patterned information or actions that reduce the cognitive load, such as using a known method to perform a task. It is substantially easier to process than identifying or processing newly given information or a newer concept.

This list is also illustrated in Fig. 3.

Fig. 3. An illustration of what cognitive load is commonly split into, in relation to actions and aims in a virtual environment.

Ultimately, learning can be enhanced by:

– identifying what one could focus on or do to perform a task successfully [27],
– minimising distractions, and
– having the task use schema, routines or other knowledge familiar to the learner.

This brings importance to managing tasks so that they are relative to the learner’s current understanding. It also requires using information the learner can immediately utilise, apply or study. Managing cognitive load also extends to the presentation of the information given, and to ensuring the relevance of what the user sees - which, phrased differently, is how information is visualised in immersive interactive media. Virtual reality needs to manage the display of information in 3D space, which provides a large area for distraction or for moving relevant information out of view. Identifying what can make it easier for users to mentally sort through information in 3D⁶, or how to render information to make it easier to identify key features in 3D⁷, are techniques that could help with visualising information.

⁶ Like in a study by Kyritsis [17].
⁷ Like studies on making it easier to identify features in medical data and biology [15,19,23,24], which relate to our case study.


This study primarily looks at the aims of actions in the virtual environment, and at the location and ordering of those actions, to benefit the learner’s focus and learning respectively. This is as opposed to studying the actual techniques involved in the actions themselves (how they visualise) to benefit each task. Similar to Kyritsis’ study [17], which identified that categorising objects of a similar theme together made it easier for users to identify the needed objects, we have actions use locations in a way that groups related objects together at the request of a given action.

2.2 A Syntactical Language for Planning

Planning can be used to identify a sequence of actions needed to reach an aim, given a number of assumptions about the state of the world at the start and about how each action changes the world. There are many applications for planning: in robotics [30], in medical treatment plans [3], in software applications, and in further domains it could be applied to [18]. The specific domain this paper covers is software applications that use an overarching AI agent to determine what happens in the application. In them, identified plans, or generated solutions, are constructed for the agent to use, calling the actions accordingly, or calling for another plan if the agent believes the plan has failed or wishes to identify a new goal during the plan. Figure 4 was drawn to help illustrate this.

Fig. 4. An illustration of how the key components of the virtual environment work: the “Planner”, which generates the solution; the “Lesson Handler”, which stores information about the user and the state of the world, updates it on changes or failed plans, and calls and uses the planner’s solutions; and finally the “Virtual Environment”, which is what the user sees and interacts with.


These particular plans are not intended for the users to see directly; they are only intended to see the result of the plan. However, these plans can still have their intended actions or their reasoning made transparent or “explainable” to the user [12]. There are two perspectives this paper identifies: a user perspective and a designer perspective. The aims and direction of the planner’s generated plan, as well as why it came to the conclusion that this approach would be optimal, could be explained to the user. For one, this could potentially improve the trust and motivation of the user to follow the plan, similar to how motivation and matching identified aims have been argued to improve learning outcomes [29]. Another benefit would be to identify and correct what the user believes the planner has yet to identify about them, or has identified incorrectly about them. The user can inform the planner of any changes to their preferences, or to how they wish to learn. Learning is often stated to require information to flow two ways - to learn about the learner and teach according to that - which this method pertains to. Ultimately, we will use the explainability of our plans to let the user re-plan not just on failure, but also on request or on an information update. This is in line with one of the aims of the project being to personalise the learning to the learner. It can also theoretically improve the trust from the learner and therefore the learning outcomes, which is a passive benefit that will not be studied in this project, in contrast to the former. As for explainability for domain builders, this would be to make it easier to trace knowledge or other parameters back to an action, or to see why a given action cannot be performed. This would ultimately be for convenience in identifying what to build in the domain, rather than any direct benefit to reaching the learning aims. This study will also be writing domains and solutions under the Planning Domain Definition Language (PDDL), which standardises planning domains and problems as a modelling language [11,18]. It is one of the oldest conventions for planning, yet has been argued not to have many researched applied uses, even with tools available and a lot of research on the modelling language itself [18]. Having our study written under a known convention could provide example uses of PDDL, to influence other project or research uses of PDDL, or vice versa. This study is working to apply PDDL planning domains, problems and solutions to plan principles of cognitive load theory and generative learning into the learning in an immersive interactive education application. The significance of this comes from the use of planning for learning in immersive virtual reality, and from using teaching techniques for using the media to automatically personalise the learning for deeper learning aims. This study wishes to investigate whether the theorised benefits of having the planner identify when the learner has been “informed” of something, and when the learner has “applied” something, allow it to match the learning of an optimal static learning sequence, while retaining the benefits of personalised learning. The aims would be structured around having information be immediately relatable to known concepts of the user, based on ideas from making information


intrinsic, or to apply known germane information from cognitive load theory (with the latter also relating to generative learning theory). The aims would also be structured around having the user apply the information, based on generative learning techniques. Also, the planner will place objects in a way that avoids them being extraneous distractions to the task, and that groups them with related actions.

3 Method

In order for it to be intuitive to use a planner to decide the objects that can be created, the information that could be presented or the actions that can be performed, these need to be designed in a way that grants flexibility in creation but also interfaces them to the planner on request. Objects could be considered anything that the user can interact with, that takes a position in the virtual world and can either be summoned or destroyed. For a Unity based project, they could simply be GameObject prefabs (the base structure of an Object that can be copied), listed and referred to by name. Areas could be considered collections of points in the virtual world that can be identified as a given area by how the points relate to its centre, as illustrated in Fig. 5. They would be identified by name.
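A language-agnostic sketch of the object and area bookkeeping described above is given below, written in Python for brevity even though the implementation described here is Unity/C# based; the class, field names and density measure are illustrative assumptions rather than the project's actual data structures.

```python
from dataclasses import dataclass, field

@dataclass
class Area:
    """A named collection of points around a centre; each point is either empty
    or holds one named object, and density = occupied points / total points."""
    name: str
    centre: tuple
    points: list                                  # offsets from the centre
    objects: dict = field(default_factory=dict)   # point index -> object name

    def density(self) -> float:
        return len(self.objects) / len(self.points)

    def place(self, obj_name: str) -> bool:
        for i in range(len(self.points)):
            if i not in self.objects:             # first free point
                self.objects[i] = obj_name
                return True
        return False                              # area is full

table = Area("table", (0.0, 1.0, 0.0), [(-0.2, 0, 0), (0, 0, 0), (0.2, 0, 0)])
table.place("bladder_model")
print(round(table.density(), 2))  # 0.33
```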

Fig. 5. An illustration of an area. Its centre, as well as the empty points at which objects could appear, are visible. This illustration contains 3 cube-shaped objects.

Knowledge could be considered any learning aim or skill that would aid or be required to use or relate the relevance of an interface in the virtual world, or that is needed to be able to relate a future learning aim. This can be information simply told to the learner (“informed”), or the learner summarising what they have learnt by relating it to its use in the virtual world (“applied”). Actions could be considered any step that causes a change to the world - to the user’s knowledge or to the objects in the world. Preferences could be considered any restrictions the user has in interacting with the virtual world (for example: absolutely must be sitting at all times), or any way the user would prefer to interact with the virtual world (for example: physically touch and pick up virtual objects with the remote).


Fig. 6. A generated domain. The values “unknown” are not used and will not be in future versions.

The approach taken in this study was to create a class that facilitates turning existing actions with learning aims, and their associated information, into planning domains. This approach also identifies problems and obtains a solution.
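The sketch below gives a rough idea of such a generator: it emits a small PDDL domain and problem from a list of knowledge items, using informed/applied predicates and area-density functions of the kind described in the next section. The predicate and action names, and the generator itself, are illustrative assumptions and not the study's actual class.

```python
def make_domain(knowledge):
    """Emit a minimal PDDL domain string with informed_/applied_ predicates per knowledge item."""
    preds = "\n    ".join(
        f"(informed_{k} ?u - user) (applied_{k} ?u - user)" for k in knowledge)
    return f"""(define (domain vr_lesson)
  (:requirements :strips :typing :fluents)
  (:types user area)
  (:functions (area_density ?a - area) (max_area_density ?a - area))
  (:predicates
    (person_focus_inarea ?a - area ?u - user)
    {preds})
  (:action inform_bladder_filling
    :parameters (?u - user)
    :effect (and (informed_bladder_filling ?u)))
  (:action apply_bladder_filling
    :parameters (?u - user)
    :precondition (and (informed_bladder_filling ?u))
    :effect (and (applied_bladder_filling ?u))))"""

def make_problem(goal_knowledge):
    """Emit a matching PDDL problem: current scene state in :init, learning aims in :goal."""
    goals = " ".join(f"(applied_{k} u1)" for k in goal_knowledge)
    return f"""(define (problem lesson1) (:domain vr_lesson)
  (:objects u1 - user area1 area2 - area)
  (:init (person_focus_inarea area1 u1))
  (:goal (and {goals})))"""

print(make_domain(["bladder_filling"]))
print(make_problem(["bladder_filling"]))
```

A planner given these two strings would return the two-step plan inform then apply, mirroring the informed/applied distinction used for generative learning in this study.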


Domains are written specifically for this application (although the structure and use of each term could be reused for any PDDL domain or problem). Although the virtual scene could have created and used an internal planning algorithm, using and writing to a standardised model for planning enables it to be compared to existing models and makes its implementation easier to replicate. With more uses, more techniques as well as problems to improve on are identified, which, being under the same convention, helps other uses improve from this. This also lets studies such as this one attempt to identify drawbacks and benefits from unique ways of using the conventional language. One such approach was for the generated domains to be dynamic in terms of which actions and knowledge are declared in them: the domain changes during the run-time of the application. Even with (relatively, at the time) substantially faster planning algorithms like Fast-Forward, more actions and conditions slow down how quickly a planner can identify a solution. While the dynamic domain may help with this problem, the specific reason this study takes this approach is actually to shape the solutions the planner finds to be more personalised to the user by using their preferences⁸. We therefore omit from the domain any actions that do not match the preferences, and any knowledge that is not used⁹ by any action.

The domains are written as follows; the planner can only plan for factors in the scene that are declared to it.

Areas - pertain to an in-world area, with associated predicates and functions.
User(s) - pertain to the user, with associated predicates and functions.
Actions - correspond to possible actions and are written as PDDL actions.
Objects - built into the domain as predicates. Each has its own density value that is automatically set by the change of density in the actions that use it.
Knowledge - built into the domain as predicates.

In the PDDL domain, the sections are as follows:

– “:types”:
  • “user”: although in the model there is only one, this grants a planner the ability to identify the focus of multiple users, as well as their knowledge, and to plan around that.
  • “area”: an area.
– “:functions”:
  • “max area density ?the area - area”: having this parameter means the planner can identify where objects can be placed, when objects need to be moved, or where to shift user attention. This can be the case as long as these rules are attached to the actions.

9

Personalised learning learns from the learner as they learn, which occurs both before and during learning. Preferences on the other hand are strictly identified before learning begins, and changed and re-planned when their preferences have been identified to be different from what was expected, such as when the user states their preferences have changed. Knowledge that is not involved in any action precondition or effect.

Using Automated State Space Planning for Effective Management

35

Fig. 7. An illustration of the user having difficulty doing the required task, where the planner identifies that the user does not understand concept B, so removes that state and re-plans, where the first action is to get the user to understand concept B

• “area density ?the area - area”: A implicit value to refer to how dense a given area is. Areas having an individual density lets them – “:predicates”: • “person focus inarea ?the area - area ?the user - user”: Having this parameter means the planner can identify where the user is expected to be looking at, and how therefore to know to use an area the user is already looking at, or to either call actions to shift the user’s focus to a different area with less density. • For every object type “object ’name’ ?the area - area”: These are automatically laid out in the domain. The main use for this is to be able to create and destroy objects in the planner. In virtual reality, it is possible for objects to be created or destroyed instantly, at any time during the process, as well as multiple of a given object to be created, which is unique to virtual reality. Having them as predicates also enables actions to check for types in a given area, without specifying every object type as a predicate. • For every knowledge type “informed ‘name’ ?the user - user” and “applied ‘name’ ?the user - user”: These would be the main states for the user in the planner, and there can be multiple. Having them as “informed” and “applied” enables the planner to involve generative learning in the plans, as long as these rules are attached to the actions. Problems use these same definitions, but naturally identify the current state of the scene (in ‘:init’), and the knowledge that the user is aiming to obtain (in ‘:goal’).


These domains and problems are automatically created by the AI agents in the virtual world. An example of a generated domain is shown in Fig. 6. There may be instances where the user is having difficulty performing a task, or performs it with ease. In those instances, the planner would re-plan so that the information pertains to them, as illustrated in Fig. 7. As outlined, the domain contains a parameter for where the learner is expected to focus. A key time the planner would re-plan is when the user has been identified as not focusing on the expected area, as illustrated in Fig. 8.

Fig. 8. An illustration of the user having their attention identified to be fixating on a different area, and the planner creating a new plan where the first action is to bring the user’s attention to the area2.
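The generation code itself is not listed in the paper; the following is only a minimal Python sketch of how an agent in the scene might emit a PDDL problem of the kind described above. The domain name, predicate names and object names are illustrative placeholders, not the study's actual identifiers.

```python
def build_problem(user, areas, known, goal_knowledge):
    """Assemble a PDDL problem string from the current scene state.

    `known` lists knowledge predicates already true for the user (:init);
    `goal_knowledge` lists the learning aims the planner must reach (:goal).
    All names below are illustrative placeholders.
    """
    objects = [f"{a} - area" for a in areas] + [f"{user} - user"]
    init = [f"(informed {k} {user})" for k in known]
    goal = [f"(applied {k} {user})" for k in goal_knowledge]
    return "\n".join([
        "(define (problem training-session)",
        "  (:domain virtual-training)",
        f"  (:objects {' '.join(objects)})",
        f"  (:init {' '.join(init)})",
        f"  (:goal (and {' '.join(goal)}))",
        ")",
    ])

# Example: the user already understands concept A and must come to apply concept B.
print(build_problem("learner1", ["area1", "area2"], ["concept-a"], ["concept-b"]))
```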

3.1 Approach to Obtaining Results on Matching the Learning Aims

To reiterate, the study is not on influencing or improving learning using virtual reality, but on how well the stereoscopic view, control and immersive interface of virtual reality can apply generative learning techniques and personalised learning, and manage cognitive load, with the assistance of planning [7]. As such, in order to quantify this we would identify the following in a comparison between two versions of the virtual reality application: one that makes use of planning and one that is preset:

– Personalising of the learning to the user, compared to a version without planning, similar to other studies on the use of planning in a virtual environment [25]. This would have to be identified via a survey.
– Managing the attention of the user. This method should match that of both versions.


What we would identify from our scene is whether planning can successfully perform generative learning relative to the understanding of the learner. Planning has been applied to non-immersive virtual reality before, but the teaching methods of generative learning and the management of cognitive load have not been performed in conjunction with stereoscopic visualisation of information. There are a few challenges in answering the implicit question of whether planning “works” in virtual reality, or more precisely whether it can be used to improve learning in that medium. One challenge in identifying where the user is focusing is the lack of eye (retina) tracking. The virtual reality device used can identify which direction the headset is facing, which provides a vague idea of where the user is looking. However, this may differ from where the user is actually looking (which could be found through retina tracking), as illustrated by Fig. 9. To alleviate this, areas would be spread out so the area of focus could be identified without ambiguity.

Fig. 9. An illustration of the headset facing area 1, yet the user’s retinas are looking closer to area 2

4 Conclusion

Only actions were produced for testing out the plan. There was also little need to involve areas in the usage, but a plan solution could be generated for a given problem and run in the virtual world, using the expected locations. This is similar to the static scripted version of the scene, except that any problem could be specified and a plan would be produced and run. For locations, if the user focused on a different area for too long, that was successfully identified. The virtual scene is shown in Fig. 10. As such, the study might have benefited greatly had another case study been chosen, as there would have been a substantially larger set of actions that could be applied to knowledge and learning aims. Assessing the effects of personalising learning would be long term - over multiple uses - which is both challenging


Fig. 10. An illustration of the virtual scene running

to do, and difficult to compare when most studies concern a single use, whereas the effects of learning, user motivation and diversity in users' knowledge need to be studied over long-term use. The case study still gave a lot of room for user preferences to be implemented. Regardless, PDDL was successfully used to apply responsive and personalised learning techniques in an immersive virtual environment. It also at least established a framework for future testing and for improvements to the methods used, by enabling any learning aim to be specified along with the current knowledge, with the application performing the actions to reach those learning aims as optimally for the user as possible. Acknowledgments. All authors and contributors are a part of Teesside University. The research is funded by Teesside University.


References 1. Alvarez, N., Sanchez-Ruiz, A., Cavazza, M., Shigematsu, M., Prendinger, H.: Narrative balance management in an intelligent biosafety training application for improving user performance. Int. J. Artif. Intell. Educ. 25(1), 35–59 (2015) 2. Boyle, J., Speroff, T., Worley, K., Cao, A., Goggins, K., Dittus, R., Kripalani, S.: Low health literacy is associated with increased transitional care needs in hospitalized patients. J. Hosp. Med. 12, 918–924 (2017) 3. Bradbrook, K., Winstanley, G., Glasspool, D., Fox, J., Griffiths, R.: AI planning technology as a component of computerised clinical practice guidelines. In: Artificial Intelligence in Medicine, pp. 171–180 (2005) 4. Breuer, D., Lanoux, C.: Health literacy the solid facts (2013) 5. Buttussi, F., Chittaro, L.: Effects of different types of virtual reality display on presence and learning in a safety training scenario. IEEE Trans. Vis. Comput. Graph. 24(2), 1063–1076 (2017) 6. Cartwright, L., Dumenci, L., Cassel, B., Thomson, M., Matsuyama, R.: Health literacy is an independent predictor of cancer patients’ hospitalizations. Health Lit. Res. Pract. 1, e153–e162 (2017) 7. Clark, R.E.: Media will never influence learning. Educ. Technol. Res. Dev. 42(2), 21–29 (1994) 8. Dumenci, L., Matsuyama, R., Riddle, D., Cartwright, L., Perera, R., Chung, H., Siminoff, L.: Measurement of cancer health literacy and identification of patients with limited cancer health literacy. J. Health Commun. 19, 205–224 (2014) 9. Fiorella, L., Mayer, R.E.: Eight ways to promote generative learning. Educ. Psychol. Rev. 28(4), 717–741 (2016) 10. Fisch, S.M.: Bridging theory and practice: applying cognitive and educational theory to the design of educational media. In: Cognitive Development in Digital Contexts, pp. 217–234. Elsevier (2018) 11. Fox, M., Long, D.: PDDL2. 1: an extension to PDDL for expressing temporal planning domains. J. Artif. Intell. Res. 20, 61–124 (2003) 12. Fox, M., Long, D., Magazzeni, D.: Explainable planning. CoRR, abs/1709.10256 (2017) 13. Hynds, S., McGarry, C.K., Mitchell, D.M., Early, S., Shum, L., Stewart, D.P., Harney, J.A., Cardwell, C.R., O’Sullivan, J.M.: Assessing the daily consistency of bladder filling using an ultrasonic bladderscan device in men receiving radical conformal radiotherapy for prostate cancer. Br. Inst. Radiol. 84(1005), 813–818 (2011) 14. Jessup, R.L., Osborne, R.H., Beauchamp, A., Bourne, A., Buchbinder, R.: Health literacy of recently hospitalised patients: a cross-sectional survey using the health literacy questionnaire (HLQ). BMC Health Serv. Res. 17, 52 (2017) 15. Kitaura, Y., Hasegawa, K., Sakano, Y., Lopez-Gulliver, R., Li, L., Ando, H., Tanaka, S.: Effects of depth cues on the recognition of the spatial position of a 3D object in transparent stereoscopic visualization. In: International Conference on Innovation in Medicine and Healthcare, vol. 71, pp. 277–282 (2017) 16. Kozma, R.B.: Will media influence learning? Reframing the debate. Educ. Technol. Res. Dev. 42(2), 7–19 (1994) 17. Kyritsis, M., Gulliver, S., Feredoes, E.: Environmental factors and features that influence visual search in a 3D WIMP interface. Int. J. Hum. Comput. Stud. 92– 93, 30–43 (2016)


18. Long, D., Dolejsi, J., Fox, M.: Building support for PDDL as a modelling tool. In: KEPS 2018, p. 78 (2018) 19. Moraes, T.F.D., Amorim, P.H., Silva, J.V., Pedrini, H.: Isosurface rendering of medical images improved by automatic texture mapping. Comput. Methods Biomech. Biomed. Eng.: Imaging Vis. 1–8 (2016) ´ McNair, H., Norman, A., Miles, E., Hooper, S., Davies, M., Lincoln, 20. O’Doherty, U., N., Balyckyi, J., Childs, P., Dearnaley, D., Huddart, R.: Variability of bladder filling in patients receiving radical radiotherapy to the prostate. Radiother. Oncol. 79(3), 335–340 (2006) 21. Paasche-Orlow, M., Parker, R., Gazmararian, J., Nielsen-Bohlman, L., Rudd, R.: The prevalence of limited health literacy. J. Gen. Int. Med. 20, 175–184 (2005) 22. Parong, J., Mayer, R.: Learning science in immersive virtual reality. J. Educ. Psychol. 110, 785 (2018) 23. Prasolova-Forland, E., Hjelle, H., Tunstad, H., Lindseth, F.: Simulation and visualization of the positioning system of the brain in virtual reality. J. Comput. 12, 258 (2017) 24. Preim, B., Baer, A., Cunningham, D., Isenberg, T., Ropinski, T.: A survey of perceptually motivated 3D visualization of medical image data. Comput. Graph. Forum 35(3), 501–525 (2016) 25. Prendinger, H., Alvarez, N., Sanchez-Ruiz, A., Cavazza, M., Oliveira, J., Prada, R., Fujimoto, S., Shigematsu, M.: Intelligent biohazard training based on real-time task recognition. ACM Trans. Interact. Intell. Syst. 6(3), 21:1–21:32 (2016) 26. Safeer, R., Keenan, J.: Health literacy: the gap between physicians and patients. Am. Fam Physician 72, 463–468 (2005) 27. Sweller, J.: The worked example effect and human cognition. Learn. Instr. (2006) 28. Sweller, J.: Cognitive load theory, chapter 2. In: Psychology of Learning and Motivation, vol. 55, pp. 37 – 76. Academic Press (2011) 29. Vermunt, J.D., Donche, V.: A learning patterns perspective on student learning in higher education: state of the art and moving forward. Educ. Psychol. Rev. 29(2), 269–299 (2017) 30. Zhang, Y., Sreedharan, S., Kulkarni, A., Chakraborti, T., Zhuo, H.H., Kambhampati, S.: Plan explicability and predictability for robot task planning. In: 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 1313– 1320, May 2017

Real-Time Lane Detection and Extreme Learning Machine Based Tracking Control for Intelligent Self-driving Vehicle

Sabir Hossain, Oualid Doukhi, Inseung Lee, and Deok-jin Lee(&)

Department of Mechanical Engineering, Kunsan National University, Gunsan, Republic of Korea
{sabir,doukhioualid,vxz98,deokjlee}@kunsan.ac.kr

Abstract. The rising self-driving technological innovations are viewed as brimming with challenges and opportunities because of their vast research territory. One of the challenges for an autonomous vehicle is detecting straight and curved lane lines to enhance its autonomous driving assistance. We use a unique curve-lane detection algorithm in the vehicle based on the Kalman filter, together with a parabola equation model, to calculate the parameters of the curved lane. For robust stability and performance, we use an on-line sequential extreme learning machine method. We present our proposed results through a simulation study.

Keywords: Self-driving vehicle · OS-ELM · Kalman filter · Lane detection · Image transformation

1 Introduction

Reducing the risk of vehicle collisions and increasing road safety have been concerns for automobile engineers for a long time. Major traffic accidents usually occur due to inappropriate speed on turns of the road or abrupt lane changes [1]. Innovations in advanced driver assistance systems, such as lane keeping aids and adaptive cruise control, can prevent traffic accidents in hostile situations [2] and can be a safer and more convenient solution for an autonomous lane-keeping vehicle [3]. Recently, research on autonomous vehicles and advanced driver assistance systems has also peaked, having received a huge amount of attention from the autonomous vehicle research community [4]. Providing an effective lane keeping and cruise control system is our key purpose in this paper.

This research was supported by the Unmanned Vehicles Advanced Core Technology Research and Development Program through the National Research Foundation of Korea (NRF), Unmanned Vehicle Advanced Research Center (UVARC) funded by the Ministry of Science, ICT & Future Planning, the Republic of Korea (No. 2016M1B3A1A01937245). It is also supported by the Development Program through the National Research Foundation of Korea (NRF) (No. 2016R1D1A1B03935238). Also, this research was funded and conducted under 『the Competency Development Program for Industry Specialists』 of the Korean Ministry of Trade, Industry and Energy (MOTIE), operated by Korea Institute for Advancement of Technology (KIAT). (No. N0002428, HRD program for 00000).


A robust lane detection technique should detect straight and curved lines both in the near and far distance so that the vehicle can make early decisions for stable control of speed and steering. Features of the road are captured by a camera on the vehicular robot as a vision-based input. Every single frame is processed in real time in the following way while the vehicle is in operation mode. From the basic RGB image, the input is transformed into YUV format. After that, the output image is converted to a binary image, its quality is improved, and it is resized. A top-view image transformation technique [5] is then applied to the image, which gives a top view of the road. A road lane shape can be split into two types of line: a straight line and a curved line [6]. The near view of the road lane always behaves like a straight line, where the Hough transformation method [7, 8] is utilised to predict the straight line in the image. On the other hand, the far view section of the road can be either a straight line or a curved line [9]. If it is a straight line, it will be detected using the previous method. If the far view section of the road is a curve, we use a Kalman filter based parabolic model for curve lane detection. Though the most common approach to curve lane detection is the least-squares estimation method [10], in our presented technique we apply the Kalman filter based curve line detection algorithm. The control speed and steering angle of the vehicle are obtained from the estimated parameters. For robust control, we use an ELM approach to get the desired output. The proposed method is assessed using the Gazebo software as an environment for the vehicle and MATLAB for the proposed algorithms.

2 Method and Process

In this section, the three main parts of the algorithm are discussed thoroughly: (A) the processing of the image before feeding it to the straight-line or curve-line model; (B) straight-line detection using the Hough transformation and curve-line detection using the Kalman filter; (C) the ELM approach to control the steering angle.

2.1 Image Pre-processing

The image is pre-processed through several image processing filters before the lane is detected in the image frame. All the filters are described briefly below. (1) RGB to YUV conversion: The main reason to convert the image to the YUV channel is to reduce extra unnecessary features and colour features from the image which are not important for lane detection. RGB is the most common colour space for display, an additive colour model of red, green and blue that reproduces a broad array of colours. The YUV colour space encodes a colour image taking into account the properties of human vision, which enables reduced bandwidth for chroma components without perceptual distortion [11]. In Fig. 1 we can see that converting the RGB image to YUV exposes this other side of the image features.


Fig. 1. Real input RGB image (Left), Converted YUV format (Right)

(2) Binary Image Generation: To convert the image to a binary version, Otsu's threshold [12] is used, which is a statistical analysis that determines the optimal threshold for the image.

Fig. 2. Converted YUV image (Left), Generated Binary Image (Right)

The output result after Otsu's automatic thresholding is depicted in Fig. 2. The technique iterates through all possible threshold values and calculates a measure of spread for the pixel levels on each side of the threshold, so that each pixel falls into one of two classes: background or foreground. The threshold value is obtained when the sum of the background and foreground spreads reaches its minimum [12]. (3) Top View Transformation: To change the view from the orthographic angle to the top view, the top view transformation algorithm [6] is used. But before transforming, to enhance the result, we apply some filters as below. (a) Region Of Interest (ROI): The bottom area of a road image is known as the region of interest (ROI), which is a very essential feature of the road image. Separating the ROI increases the efficiency of the lane detection method and eliminates the effect of the upper section of the road image. (b) Resize & Improve: A resize and quality filter to improve the visualisation of the image. In the next step, we transform the image into the top view.
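The paper implements this pipeline in MATLAB; purely as an illustration, the same steps can be sketched with OpenCV in Python. The ROI fraction and warp points below are placeholder values, not the ones used in the study.

```python
import cv2
import numpy as np

def preprocess(frame_bgr):
    """YUV conversion, Otsu binarisation, ROI crop and top-view warp (illustrative values)."""
    yuv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YUV)                 # BGR -> YUV
    luma = yuv[:, :, 0]
    _, binary = cv2.threshold(luma, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)   # Otsu's threshold
    h, w = binary.shape
    roi = binary[h // 2:, :]                                         # bottom half = region of interest
    rh, rw = roi.shape
    # Placeholder trapezoid-to-rectangle mapping for the top-view transformation.
    src = np.float32([[rw * 0.35, 0], [rw * 0.65, 0], [rw, rh], [0, rh]])
    dst = np.float32([[0, 0], [rw, 0], [rw, rh], [0, rh]])
    warp = cv2.getPerspectiveTransform(src, dst)
    top_view = cv2.warpPerspective(roi, warp, (rw, rh))
    return top_view
```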


Using this transformation, the image becomes almost the same as the real road layout, and the lines are parallel or close to parallel, as shown in Fig. 3.

Fig. 3. Top-view transformed image

2.2 Lane Detection

The lane detection section consists of two parts, as below. (1) Straight Line Detection: The straight-line detection algorithm uses a standard Hough transformation [7, 8] in the near view section. Using the Hough transformation, it is also possible to eliminate incorrect lanes and thereby minimise the computational effort and time.

Fig. 4. Hough transformation for the longest two lines
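As a hedged illustration of this step (not the authors' MATLAB code), the probabilistic Hough transform in OpenCV can return candidate segments, from which the two longest lines are kept; the edge and voting thresholds below are arbitrary placeholders.

```python
import cv2
import numpy as np

def two_longest_lines(binary_near_view):
    """Return the two longest Hough line segments found in the near-view image."""
    edges = cv2.Canny(binary_near_view, 50, 150)
    segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=40,
                               minLineLength=30, maxLineGap=10)
    if segments is None:
        return []

    def length(seg):
        x1, y1, x2, y2 = seg[0]
        return np.hypot(x2 - x1, y2 - y1)

    return sorted(segments, key=length, reverse=True)[:2]
```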

In the algorithm, the longest two lanes are chosen from the Hough transform result; these lines should form a parallel, or almost parallel, pair, as illustrated in Fig. 4. (2) Curve Lane Detection: A curved line section begins at the end of these two longest straight lines. The curved lane is estimated by the parabolic model as below. (a) Parabolic model: Image data (white points in the far section) involve uncertainties and noise generated during the capturing and processing steps. Therefore, a Kalman filter can be adopted as a robust estimator to form an observer against these irregularities [13]. Initially, we took the equation of the curved line, which is a non-linear equation. In this case, the best-fit equations


are the parabola equation [14]. In total, the curve line detection algorithm is based on the Kalman filter and the parabola equation. We can now apply the proposed algorithms within the image processing pipeline to the curve line detection process, as in Fig. 5.

Fig. 5. The real experiment results in the curve line detection based on Kalman filter
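The filter equations are not reproduced in the paper; the sketch below shows one common way to pose the problem, assuming a parabola y = a·x² + b·x + c whose coefficients form the Kalman state and each detected lane pixel provides one linear measurement. All noise settings are placeholders.

```python
import numpy as np

def kalman_parabola(points, q=1e-6, r=4.0):
    """Estimate parabola coefficients [a, b, c] from noisy (x, y) lane points."""
    state = np.zeros(3)            # initial guess for [a, b, c]
    P = np.eye(3) * 1e3            # initial covariance
    Q = np.eye(3) * q              # process noise (coefficients assumed nearly constant)
    for x, y in points:
        H = np.array([[x * x, x, 1.0]])          # measurement model: y = a x^2 + b x + c
        P = P + Q                                # trivial predict step (static coefficients)
        S = H @ P @ H.T + r                      # innovation covariance
        K = (P @ H.T) / S                        # Kalman gain, shape (3, 1)
        state = state + (K * (y - float(H @ state))).ravel()
        P = (np.eye(3) - K @ H) @ P
    return state

# Example with synthetic noisy points from y = 0.002 x^2 - 0.3 x + 120
xs = np.arange(0, 200, 5, dtype=float)
ys = 0.002 * xs**2 - 0.3 * xs + 120 + np.random.normal(0, 2, xs.size)
print(kalman_parabola(zip(xs, ys)))
```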

2.3 Robust Control Using OS-ELM

The extreme learning machine was designed for generalised single hidden layer feedforward neural networks (SLFNs) [15, 16], where the hidden nodes need not act like classical neurons. Unlike other neural networks trained with backpropagation (Fig. 6) [17], during the learning procedure the input weights and the parameters of the hidden layer require no calibration, while the activation functions of the neurons are nonlinear piecewise continuous functions.

Fig. 6. Single hidden layer feed forward neural network [17]


The weights between the hidden layer and the output layer are calculated analytically [15]. The output of the ELM with N hidden nodes can be modeled as

$$Y_N(x) = \sum_{i=1}^{N} \beta_i \, F(x; c_i, a_i), \qquad x \in \mathbb{R}^n,\; c_i \in \mathbb{R}^n \qquad (1)$$

In Eq. (1), βi is the output weight connecting the i-th hidden node to the output node and F(x; ci, ai) is the activation function of the i-th node; in the case of additive hidden nodes, a sigmoid or threshold activation function is used. The parameters ci and ai are the input weight and the bias, which are randomly produced at the initialization step and then fixed afterwards. OS-ELM comes from the notion of batch learning in ELM: training samples can be presented in a sequential manner, either one by one or block by block with fixed or variable length, and each observation is discarded from training once it has been learned [17]. Figure 7 represents the control block diagram for the velocity and steering angle of the robot (Pioneer).

Fig. 7. Detailed schematic diagram of control algorithm
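As a hedged sketch of the batch ELM training that OS-ELM builds on (not the authors' controller), the hidden layer is drawn at random and the output weights β are solved with a pseudo-inverse. The layer sizes and the sigmoid choice below are assumptions.

```python
import numpy as np

class BatchELM:
    """Minimal batch extreme learning machine with sigmoid additive hidden nodes."""
    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.c = rng.normal(size=(n_hidden, n_inputs))   # random input weights, fixed after init
        self.a = rng.normal(size=n_hidden)                # random biases, fixed after init
        self.beta = None                                  # output weights, solved analytically

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.c.T + self.a)))  # sigmoid activations

    def fit(self, X, Y):
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ Y                 # beta = H^+ Y, cf. Eq. (1)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

# Toy usage: learn a scalar mapping from 2-D inputs.
X = np.random.rand(200, 2)
Y = np.sin(X[:, :1]) + X[:, 1:]
model = BatchELM(n_inputs=2, n_hidden=40).fit(X, Y)
print(model.predict(X[:3]))
```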

3 Environment Setup for Simulation

The whole environment is designed in the Gazebo simulator, and MATLAB is used for the image pre-processing, the lane detection algorithm and the OS-ELM control feed.

Fig. 8. Gazebo real-time physic engine simulation environment


For lane tracking, we used an athletic track environment for our car robot (Pioneer), shown in Fig. 8. Using the image from the real-time simulation, the algorithm in MATLAB detects curved lines and estimates the robot's angular and linear velocities based on the lane detection results and the ELM output.

4 Result

To estimate the effectiveness of the presented algorithms, we experimented with noisy measurement data of the parabola equation in MATLAB. Figure 9 illustrates the 3D view and map of the athletic field. The trajectory plotted from the odometry data of the robot, which confirms that curve lane following works correctly, is shown in Fig. 10.

Fig. 9. 3D view of the environment & Robot (Left), Map of the Track (Right)

Fig. 10. Plotted trajectory path of the robot

The Kalman filter can estimate parameters of the parabola equation from noisy data. In Fig. 11, a comparison between the measured value, estimated value and the real value is shown.


Fig. 11. Comparison with measurement value, KF estimation value, the real value

In this section, we demonstrate the effectiveness of the proposed method by comparing it with a PID algorithm. The simulation results in Figs. 12, 13 and 14 show the output response of the proposed controller. The simulation results reveal that the proposed algorithm performs better: the OS-ELM output displays better results than PID, since the response time and trajectory tracking become more accurate.

Fig. 12. Tracking response in X-axis using OS-ELM (Left) & PID (Right)

Fig. 13. Tracking response in Y-axis using OS-ELM (Left) & PID (Right)


Fig. 14. Angular velocity tracking using OS-ELM (Left) & PID (Right)

5 Conclusion

In this paper, we presented a unique way of controlling vehicle velocity and angular motion from lane features. The algorithm is divided into three parts: (1) First, RGB to YUV conversion; Otsu's threshold method is used to convert the image to binary, and the lane detection models are applied to the top view of the image after the top-view transformation. (2) A Hough transform detects the straight lane in the near section, and curve lane detection is done using the Kalman filter with the parabolic model for the far view section. (3) The estimated result is fed into the ELM to get a robust control output for the robot motion. The experimental outcomes demonstrate that the curved lane can be reliably detected by the proposed technique even in extremely noisy conditions. One advantage of the proposed algorithm is its robustness against noise. Through this lane detection method, it is possible to anticipate road turns as well as to estimate an appropriate and precise speed for the autonomous vehicle. Acknowledgment. My special thanks and gratitude to my supervisor Professor Deok-jin Lee for his guidance and support through this paper. I would also like to pay my deep sense of gratitude to all CAIAS (Center for Artificial Intelligence and Autonomous System) lab members for their support and the CAIAS lab for providing all the facilities that were required. I would also like to mention that the core idea for this paper comes from the paper at reference no. 17, written by Oualid Doukhi, Abdur R. Fayjie and Deok-jin Lee, published in the Proceedings of the SAI Intelligent Systems Conference, pages 914-924, Springer, Cham, 6 September 2018.

References 1. Jeong-Gyu, K.: Changes of speed and safety by automated speed enforcement systems. IATSS Res. 26(2), 38–44 (2002) 2. Sehestedt, S.A., Kodagoda, S., Alempijevic, A., Dissanayake, G.: Efficient lane detection and tracking in urban environments. In: European Conference on Mobile Robots (2007)


3. Qiu, C.: An edge detection method of lane lines based on mathematical morphology and MATLAB. In: Cross Strait Quad-Regional Radio Science and Wireless Technology Conference (CSQRWC), vol. 2, pp. 1266–1269 (2011) 4. Assidiq, A.A.M., Khalifa, O.O., Islam, M.R., Khan, S.: Real time lane detection for autonomous vehicles. In: 2008 International Conference on Computer and Communication Engineering, ICCCE 2008, pp. 82–88 (2008) 5. Kano, H., Asari, K., Ishii, Y., Hongo, H.: Precise top view image generation without global metric information. IEICE Trans. Inf. Syst. 91(7), 1893–1898 (2008) 6. Dorj, B., Lee, D.J.: A precise lane detection algorithm based on top view image transformation and least-square approaches. J. Sens. 2016, Article ID 4058093, 13p. (2016). https://doi.org/ 10.1155/2016/4058093 7. Ganokratanaa, T., Ketcham, M., Sathienpong, S.: Real-time lane detection for driving system using image processing based on edge detection and Hough transform. In: The Third International Conference on Digital Information and Communication Technology and its Applications (DICTAP2013), pp. 104–109 (2013) 8. Tseng, C.-C., Cheng, H.-Y., Jeng, B.-S.: A lane detection algorithm using geometry information and modified Hough transform. In: 18th IPPR Conference on Computer Vision, Graphics and Image Processing, Taipei, Taiwan (2005) 9. Jung, C.R., Kelber, C.R.: A lane departure warning system based on a linear-parabolic lane model. In: 2004 Intelligent Vehicles Symposium. IEEE, pp. 891–895 (2004) 10. Markovsky, I., Willems, J.C., Van Huffel, S., De Moor, B.: Exact and Approximate Modeling of Linear Systems: A Behavioral Approach, vol. 11. SIAM (2006) 11. Podpora, M., Korbas, G.P., Kawala-Janik, A.: YUV vs RGB-choosing a color space for human-machine interaction. In: FedCSIS Position Papers, pp. 29–34 (2014) 12. Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man. Cybern. 9(1), 62–66 (1979) 13. Lim, K.H., Seng, K.P., Ang, L.-M., Chin, S.W.: Lane detection and Kalman-based linearparabolic lane tracking. In: 2009 International Conference on Intelligent Human-Machine Systems and Cybernetics. IHMSC, vol. 2, pp. 351–354 (2009) 14. Jung, C.R., Kelber, C.R.: An improved linear-parabolic model for lane following and curve detection. In: 2005 18th Brazilian Symposium on Computer Graphics and Image Processing, SIBGRAPI 2005, pp. 131–138 (2005) 15. Liang, N.-Y., Huang, G.-B., Saratchandran, P., Sundararajan, N.: A fast and accurate online sequential learning algorithm for feedforward networks. IEEE Trans. Neural Netw. 17(6), 1411–1423 (2006) 16. Matias, T., Souza, F., Araújo, R., Antunes, C.H.: Learning of a single-hidden layer feedforward neural network using an optimized extreme learning machine. Neurocomputing 129, 428–436 (2014) 17. Doukhi, O., Fayjie, A.R., Lee, D.J.: Supervisory control of a multirotor drone using on-line sequential extreme learning machine. In: Proceedings of SAI Intelligent Systems Conference 2018, pp. 914–924 (2018)

Pedestrian Recognition and Obstacle Avoidance for Autonomous Vehicles Using Raspberry Pi

Charlie Day1, Liam McEachen1, Asiya Khan1(&), Sanjay Sharma1, and Giovanni Masala2

1 School of Engineering, University of Plymouth, Plymouth PL4 8AA, UK
[email protected]
2 School of Computing, Electronics and Mathematics, University of Plymouth, Plymouth PL4 8AA, UK

Abstract. The aim of this paper is twofold: firstly, to use ultrasonic sensors to detect obstacles and secondly to present a comparison of machine learning and deep learning algorithms for pedestrian recognition in an autonomous vehicle. A mobility scooter was modified to be fully autonomous using a Raspberry Pi 3 as the controller. Pedestrians were initially simulated by cardboard boxes and later replaced by a real pedestrian. The mobility scooter was disassembled and connected to the Raspberry Pi 3 with ultrasonic sensors and a camera. Two computer vision algorithms, histogram of oriented gradients (HOG) descriptors and Haar classifiers, were trained and tested for pedestrian recognition and compared to deep learning using the single shot detection method. The ultrasonic sensors were tested for time delay for obstacle avoidance and were found to be reliable at ranges between 100 cm and 500 cm at small angles from the acoustic axis, and at delay periods over two seconds. The HOG descriptor was found to be a superior algorithm for detecting pedestrians compared to the Haar classifier, with an accuracy of around 83%, whereas deep learning outperformed both with an accuracy of around 88%. The work presented here will enable further tests on the autonomous vehicle to collect meaningful data for management of the vehicular cloud.

Keywords: Pedestrian recognition · Obstacle avoidance · Ultrasonic sensors · Haar classifier · HOG descriptor · Deep learning

1 Introduction

According to a recent report from the Department of Transport, from October 2015 to September 2016 there were around 183,000 casualties resulting from traffic accidents, of which 1,800 were fatal and over 25,000 were life changing [1]. The vision for the autonomous car revolution is to reduce this figure by at least 76%. Cars are undergoing a revolution just like mobile phones did twelve years ago. They are increasingly becoming intelligent agents that have the capability to learn from their environment and be driven in an autonomous manner. Therefore, to improve road safety and traffic congestion, fully or partially autonomous vehicles offer a very promising solution.


Most modern vehicles are equipped with advanced driver assistance systems (ADAS) that assist driving in a number of ways, such as lane keeping support, automatic parking, etc. More recently, traffic sign recognition systems are becoming an integral part of ADAS. Most new vehicles are capable of some form of autonomy, e.g. automatic parking, lane recognition, etc. The Raspberry Pi 3 is a small low-cost computer and offers an opportunity for research towards the roadmap of autonomy. Therefore, the objective of this paper is to use the Raspberry Pi version 3 as a microcontroller for an autonomous vehicle connected to ultrasonic sensors and a camera, and to discuss the suitability of using the Raspberry Pi as a controller. The mobility scooter was acquired from Betterlife Healthcare [19] and adapted such that its controller communicated with the Pi, which was connected to ultrasonic sensors and a camera. Ultrasonic sensors are cheap, consume little power, measure the distance to an obstacle accurately and transmit the measured data to the system. The ultrasonic sensors are connected to the Raspberry Pi in such a way that obstacles present in front of, behind and beside the vehicle are detected. Obstacles in the blind zone are also detected by the ultrasonic sensors. This emulates obstacle detection on the road. The presence of pedestrians is detected by computer vision using a Haar classifier and HOG descriptors, and further by deep learning. Therefore, the contributions of the paper are two-fold:

• To convert a mobility scooter to be fully autonomous, with ultrasonic sensors and a camera to detect an obstacle, and to find the reliable range.
• To present a comparative analysis between HOG descriptors, Haar classifiers and deep learning for pedestrian recognition.

The rest of the paper is organised as follows. Section 2 presents related work; Sect. 3 presents the conversion of the mobility scooter to an autonomous vehicle. In Sect. 4, the computer vision and deep learning algorithm implementation on the Raspberry Pi 3 is presented. Section 5 presents the experiments, results and discussions. Section 6 concludes the paper, highlighting areas of future work.

2 Related Work

An important feature of autonomous driving is recognising pedestrians and obstacles. Computer vision allows autonomous vehicles to process detailed information about images that would not be possible with sensors alone, and has been increasingly used to study facial recognition. Two methods in computer vision have been widely applied in image recognition. The first is the Haar feature-like classifier [2], where a set of positive and negative images is used and Haar-like features are applied to each set. Critical analysis of the technique has shown that background complexity plays a role in the quality of the classifier, which can be easily corrupted by lighting [3]. Although this method was originally used for facial recognition, the same principles can be applied to almost any other object; therefore a pedestrian cascade can be made using this method and is presented here. The second is the Histogram of Oriented Gradients (HOG) descriptor, which has been used in [4] for pedestrian recognition. The principle behind the HOG descriptor [5] is


that it is a type of 'feature descriptor' that has been trained using a Support Vector Machine (SVM), a type of supervised machine learning that works on the classification of positive and negative samples of data. The HOG feature descriptor, unlike conventional techniques, applies a general feature, as opposed to a localised one, to the image area. This is sent to the SVM, which then classifies it as a pedestrian or not. The authors in [6] present an efficient hardware architecture based on an improved HOG version and a linear SVM classifier for pedestrian detection in full high-definition video with reduced hardware resources and power consumption. Image processing algorithms have been used in [7] to remove unwanted noise from the image-based sensor. A vision optical-flow based vehicle collision warning system is proposed in [8] based on computer vision techniques. The Raspberry Pi is a credit-card-sized single-board low-cost computer that provides the flexibility of being used as a microcontroller and is increasingly used in academic research; e.g. in [7] the authors present the implementation of image processing operations on the Raspberry Pi. In [9] ultrasonic sensors are used for object detection from a moving vehicle. In [10] the authors have combined Haar detection with laser distance measurement to recognise pedestrians. The work presented in [11] concludes that a codebook representation of Haar and HOG features outperforms detection based on only HOG and Haar. The codebook is generated from a set of features given by the Bag of Words model [12]. More recently, deep learning based on deep convolutional neural networks has been used for image classification of road signs [13] with 97% accuracy and for pedestrian recognition [14]. The work presented in [15] uses detection based on Haar features and classification based on HOG features with a support vector machine. Pedestrian recognition using OpenCV on the Raspberry Pi was implemented in [16]. In [17] the authors have used the Raspberry Pi for detecting road traffic signs using image processing techniques, whereas in [18] a combination of MATLAB with the Raspberry Pi is used for face detection using a Haar classifier. There has been an increasing interest from the research community in image classification for pedestrian recognition. Most modern vehicles now have some form of autonomous features, and confidence levels in obstacle detection and pedestrian recognition have to increase along the roadmap to fully autonomous vehicles. In addition, utilising the computational capability of the Raspberry Pi in research is still in its early stages. The novelty of our work relative to the literature is that we implement our pedestrian recognition algorithms on the Raspberry Pi 3, compare them, and test the platform's suitability from a research point of view.
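The paper trains its own HOG/SVM and Haar models; purely as an illustration of the HOG-plus-SVM idea, OpenCV also ships a pre-trained pedestrian detector that can be run in a few lines. The frame size, detection parameters and image path below are arbitrary placeholders.

```python
import cv2

# Pre-trained people detector distributed with OpenCV (HOG features + linear SVM).
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_pedestrians(frame_bgr):
    """Return bounding boxes (x, y, w, h) of detected pedestrians."""
    frame = cv2.resize(frame_bgr, (640, 480))          # smaller frames run faster on a Pi
    boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8),
                                          padding=(8, 8), scale=1.05)
    return boxes

# Example usage with a single image file (path is a placeholder).
image = cv2.imread("test_frame.jpg")
if image is not None:
    for (x, y, w, h) in detect_pedestrians(image):
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
```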

3 Autonomous Vehicle Using Raspberry Pi

The vehicle used for this project is a Capricorn Electric Wheelchair from Betterlife Healthcare [19], as shown in Fig. 1(a). It is a small, four-wheeled vehicle with caster-type front wheels and two fixed driven rear wheels, powered by two 12 V batteries. It is driven by two separate electric motors, which are connected directly to each of the rear wheels. It has a maximum speed of 4 mph, a maximum incline of 6° and a turning circle of radius 475 mm. The maximum range of the wheelchair is 9.5 km. The tyres are solid and have a larger radius than many other models of its type, helping to improve performance on rough or uneven surfaces.


This section will present the conversion of the mobility scooter into an autonomous vehicle controlled by the Raspberry Pi 3. It will further describe the connection of the ultrasonic sensors and camera.

3.1 Connecting the Raspberry Pi 3

The autonomous vehicle was built from a mobility scooter, as shown in Fig. 1(a). The scooter had an inbuilt microcontroller, shown in Fig. 1(b), which was used as a communication bridge between the Raspberry Pi version 3 and the vehicle's motors. The joystick voltage levels for the directional actions of each pin (the pins are shown in Fig. 1(c) and circled in red in Fig. 1(b)) are presented in Table 1.


Fig. 1. (a). Original mobility scooter. (b). Autonomous vehicle microcontroller. (c). Autonomous vehicle pins

Table 1. Voltage directional values.

Direction     Voltage applied (volts)   Pin colour
Forward       3.97                      Green/Grey
Reverse       1.13                      Green/Grey
Right         3.97                      Purple/Yellow
Left          1.13                      Purple/Yellow
Static/stop   2.5                       All
Turn ON       2.5                       Black

In order to make space for a platform on which the system can be installed, the chair was removed, as was the housing surrounding the frame of the vehicle. The central column between the chair and the frame was also removed, allowing the new chassis to be placed over the frame. The new chassis is shown in Fig. 2(a), whereas, the block diagram is presented in Fig. 2(b) showing the connections of the Raspberry Pi


with the sensors and the vehicle's controller. The chassis shown in Fig. 2(a) has enough space for the control panel – rewired to connect the Raspberry Pi directly to the joystick input – the Pi itself, and two breadboards with which the circuitry could be modified during the build and testing process. The chassis is designed so that additional components and sensors can be added. Two digital-to-analogue converters were installed, controlling the forwards/backwards motion and the yaw of the vehicle, respectively. The front wheels were fixed in place by the removal of the bearings contained in the shafts. This allowed the connecting bolts to be tightened fully, restricting the motion of the vehicle to forwards and backwards.

Fig. 2. (a) The autonomous vehicle modified from the mobility scooter. (b). Block diagram of the autonomous vehicle

For the vehicle to be autonomous it would have to be controlled by the General-Purpose Input Output (GPIO) pins on the Pi, which would send signals emulating the joystick. The GPIO pins work with digital signals, therefore a Digital to Analogue Converter (DAC) (Fig. 2(b)) is required to alter the signal type. An Adafruit MCP4725 DAC [20] was used and functioned well with the Raspberry Pi. To use the DAC effectively, the Inter-Integrated Circuit (I2C) bus on the Raspberry Pi had to be operated. This is an interface for data exchange between components and microcontrollers; many 'slave' devices (the DACs) are controlled by a 'master' device (the Raspberry Pi). The Pi's GPIO pins can only output 5 V or 3.3 V, and the DAC is a 12-bit converter, which means its input value ranges from 0 to 4095 (4096 levels). Equation 1 shows the formula used:

$$DAC_{voltage} = \left(\frac{V_{desired}}{V_{max}}\right) \times 2^{12} \qquad (1)$$

This formula can then be applied directly to the DAC through a Python script. The DAC contains two inputs, Vdd and A0, allowing two different I2C bus addresses to be used on the Raspberry Pi at once, which can then be referred to as parameters in a function as 0x62 and 0x63 for forwards/backwards and left/right, respectively.
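The project's own script is not reproduced in the paper; the following hedged sketch shows one way such a Python function could look, using the smbus2 library and the MCP4725 "write DAC register" command byte (0x40). The addresses 0x62/0x63 follow the text above, and Vmax = 5 V is an assumption.

```python
from smbus2 import SMBus   # assumed I2C helper library

V_MAX = 5.0          # assumed supply voltage of the DAC
FORWARD_BACK = 0x62  # I2C address used for forwards/backwards
LEFT_RIGHT = 0x63    # I2C address used for left/right

def set_dac_voltage(bus, address, v_desired):
    """Send Eq. (1)'s 12-bit value to an MCP4725 at the given I2C address."""
    value = max(0, min(4095, int((v_desired / V_MAX) * 4096)))
    high = (value >> 4) & 0xFF          # D11..D4
    low = (value & 0x0F) << 4           # D3..D0 in the upper nibble
    bus.write_i2c_block_data(address, 0x40, [high, low])  # 0x40 = write DAC register

# Example: command the scooter to stop (2.5 V on both channels).
with SMBus(1) as bus:
    set_dac_voltage(bus, FORWARD_BACK, 2.5)
    set_dac_voltage(bus, LEFT_RIGHT, 2.5)
```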

3.2 Connecting Ultrasonic Sensors to the Raspberry Pi 3

Ultrasonic sensors provide basic object-detection autonomy to the vehicle. The HC-SR04 sensor was used, which can work with the Pi's GPIO pins through jumper wires. The principle of the HC-SR04 is that there are four pins: power, trigger, echo, and ground. The power and ground were connected directly to the Pi's voltage and ground pins; the trigger acts as a 'starting gun' for the sensor, signifying when to produce a soundwave, and the echo receives the soundwave. While these are binary input/output functions, they can be used in Python to determine the distance of the closest object. The programming logic is shown in the flowchart in Fig. 3.

[Flowchart: declare the sensor GPIO pins, setting the trigger as an output pin and the echo as an input pin; while the echo pin reads 0, keep restarting the timer; once the echo pin reads 1, stop the timer, take the difference between the time values, convert it to the correct distance units and return the result.]

Fig. 3. Ultrasonic sensor function flowchart
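A minimal Python sketch of the routine described by the flowchart, assuming the RPi.GPIO library and BCM pin numbers 23/24 (the actual wiring used in the project is not stated):

```python
import time
import RPi.GPIO as GPIO

TRIG, ECHO = 23, 24  # assumed BCM pin numbers

GPIO.setmode(GPIO.BCM)
GPIO.setup(TRIG, GPIO.OUT)
GPIO.setup(ECHO, GPIO.IN)

def distance_cm():
    """Fire a 10 us trigger pulse and time the echo, as in the Fig. 3 flowchart."""
    GPIO.output(TRIG, True)
    time.sleep(0.00001)
    GPIO.output(TRIG, False)

    pulse_start = pulse_end = time.time()
    while GPIO.input(ECHO) == 0:     # wait for the echo to go high
        pulse_start = time.time()
    while GPIO.input(ECHO) == 1:     # wait for the echo to go low again
        pulse_end = time.time()

    # Sound travels roughly 34300 cm/s; halve for the round trip.
    return (pulse_end - pulse_start) * 34300 / 2

print(f"Closest object at {distance_cm():.1f} cm")
GPIO.cleanup()
```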

This ultrasonic object-detection function can be imported into the DAC script and used to tell the Pi to send a STOP (2.5 V) or 'Reverse' (1.13 V) command to the vehicle through


the DAC if the sensor function detects an object that is below a predefined threshold.

– φi > 1, station i receives more bikes than the ones that start at i.
– φi = 1, station i is "balanced".
– φi < 1, more bikes depart from station i than bikes that arrive at i.

Fig. 3. Pareto chart for the number of arrivals per station

As seen in Fig. 4, station 267 has the highest flow index, equal to 1.65 (φ267 = 0.3955/0.2396), i.e. it is a "receiving" station. Station 193 has the lowest flow index, 0.53 (φ193 = 0.1338/0.2538); thus, it is a "departure" station. From Fig. 4, 46% of the stations have a flow index greater than 1; those stations, classified as "receiving" stations, supply bikes to the system. The percentage of "departure" stations is 54%; their flow index is less than 1, so these stations are demand stations, i.e. they require more bikes than those that arrive. There are no stations with φi = 1, which means that there are no balanced stations. Table 3 shows that the station with the smallest number of trips that depart and arrive is station 94. According to its flow index (φ94 = 1.2727) it is classified as a "receiving" station. The station with the largest number of trips is 27 and its flow index is 1.0069, i.e. φ27 ≈ 1; thus, station 27 is balanced. According to Table 3, when the flow index is computed, a station with a high percentage of arriving trips could be classified either as a receiving or as a departure station. Therefore, it is not enough to know the number of input and output trips to classify stations; the flow index is a metric that relates the two.
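The paper's computations are done with R scripts; as an illustration only, the same flow index can be computed from a trip table with pandas in Python. The column names are assumed, not those of the Ecobici export.

```python
import pandas as pd

def flow_index(trips: pd.DataFrame) -> pd.Series:
    """Flow index per station: share of arrivals divided by share of departures (Eq. 1)."""
    arrivals = trips["end_station"].value_counts(normalize=True)
    departures = trips["start_station"].value_counts(normalize=True)
    return (arrivals / departures).dropna().sort_values(ascending=False)

# Toy example: station 2 receives more trips than it sends out, so its index is > 1.
trips = pd.DataFrame({"start_station": [1, 1, 2, 3, 3, 3],
                      "end_station":   [2, 2, 1, 2, 1, 2]})
print(flow_index(trips))
```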

3.2 Traffic Among Station

So far, the carried-out analysis has been focused on the activity per station. Another piece of information that helps managers to keep high service levels is the number of trips that take place between stations and within a station itself.


Fig. 4. Flow index (Eq. 1) per station.

Thus, the objective is to identify which pair of stations has the largest number of trips between them. In Fig. 5, the number of trips is plotted using a black point; the bigger the point, the more trips between two stations. We identify seven areas (A, B, ..., G) which are darker than others, e.g. the darkest area A is formed by stations 1 to 80, B by stations 110 to 150, and C by stations 200 to 250. Therefore, there are many trips among the stations that form areas A, B, and C. In area G, the heavy traffic is between stations 1-50 and stations 240-272. Other areas are F and D, as shown in Fig. 5. The maximum number of trips is 3419, between stations 211 and 217.

Table 3. Flow index (Eq. 1) for stations in Table 2

Station (i)   Flow index (φi)
94            1.2727
220           0.8512
100           1.1105
225           0.9848
243           0.7206
101           1.1853
182           1.152
64            1.0492
36            0.9686
43            1.1559
1             1.0478
27            1.0069


Those stations are on the same avenue (Horacio Avenue) in the middle of a district of commercial buildings, and station 211 is beside a metro station (Polanco). The distance between them is about 550 m. There are three more pairs of stations with more than 3000 trips: stations 183-174 (3272 trips, 550 m apart), stations 18-1 (3202 trips, 850 m apart), and stations 174-183 (3191 trips, 550 m apart). On the other hand, there are 3491 pairs of stations with just one trip, e.g. stations 275-245 (1 trip, 5.8 km apart) and stations 1-202 (1 trip, 4.8 km apart). As shown in Fig. 5, there is a diagonal dark area from the bottom-left to the upper-right corner. It suggests that many trips begin and end at the same station; e.g. at station 61 there are 2073 trips with this characteristic. This station is located beside a public park (Parque México), which suggests that people ride around the park to exercise; nevertheless, there are 1627 trips of this kind at station 27, which is placed beside the main avenue in the commercial district (Reforma Av.). The flow indices of stations 61 and 27 are φ61 = 1.0410 and φ27 = 1.0067 (see Fig. 4); thus, we can consider them self-balanced given that φi ≈ 1. From Fig. 5, managers could realise that there are six areas (or clusters) in which the number of trips is large; on the contrary, there are areas where the traffic is light, represented by the lighter areas.

3.3 Ten–Minute Interval Activity per Station

The aim of this section is to know the average number of incoming and outgoing bikes per ten-minute interval per station. Therefore, the average number of incoming and outgoing bikes for every single day from July 2014 to January 2015 is computed at intervals of ten minutes. We present the results for some stations on Fridays and Sundays because on those days the heaviest and lightest traffic

Fig. 5. From-To Matrix (number of trips among stations)


takes place, respectively, in Mexico City. Nevertheless, the reader can carry out the analysis for any station and day using our R script. Figures 6 and 7 show the results for station 27, which receives the highest percentage of incoming and outgoing bikes (see Table 2 and Fig. 2) and has a flow index of 1.0069, i.e. it is considered balanced (see Table 3). Figure 6 shows that from 6 to 11 h there are more incoming bikes than those leaving the station. After 11 h the numbers of incoming and outgoing bikes are very similar. The maximum number of bikes in the station is reached at 18:09 (8 departing bikes), and at 19:29 the number of bikes decreases until midnight. On Sundays (Fig. 7), the maximum number of bikes takes place at 13:49 (8 arriving bikes) and the activity of the station begins to decline at 13:59. Therefore, with the heavy traffic on Fridays the maximum activity of the station is reached at 19 h, whereas with the light traffic on Sundays it is reached at 13 h. Figures 8 and 9 show the analysis for station 267, which has the highest index value (φ267 = 1.6507). Figure 8 shows that on Fridays from 7 to 9 h bikes depart, and after 15 h bikes arrive. On the other hand, on Sundays the number of arriving bikes is much higher than the number of departing ones. Station 193 has the lowest index value (φ193 = 0.5272); thus, more departing bikes are expected. On Fridays (Fig. 10), the maximum number of bikes occurs at 8 and 18 h. As expected, the red line (departing bikes) is above the blue one (arriving bikes). Nevertheless, on Sundays (Fig. 11) from 13 to 14:29 h, bikes arrive at the station in greater numbers than those that leave. This kind of graph helps managers to know how many bikes there are at a station in every ten-minute interval; e.g. Table 4 shows the number of arriving and departing bikes at 8:29:59 on Fridays. The available bikes are 6, 4, and −1 at stations 267, 27, and 193, respectively. It means that there are 6 available bikes at station 267 and 4

Fig. 6. Average arrivals and departures at station 27 on Fridays


Table 4. Friday at 8:29:59

Station (i)   Arriving bikes   Departing bikes   Available bikes
267           6                0                 6
27            6                2                 4
193           1                2                 −1

at station 27. On the contrary, there is a need for 1 bike at station 193. Therefore, if managers have to rebalance the system, they know where there are available bikes and where they are needed.
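Again as an illustration rather than the study's R code, a trip table with departure and arrival timestamps can be aggregated into ten-minute counts per station with pandas (the column names and times below are assumed):

```python
import pandas as pd

def ten_minute_activity(trips: pd.DataFrame, station: int) -> pd.DataFrame:
    """Arrivals, departures and net available bikes per 10-minute interval for one station."""
    dep = (trips.loc[trips["start_station"] == station]
                .set_index("start_time").resample("10min").size())
    arr = (trips.loc[trips["end_station"] == station]
                .set_index("end_time").resample("10min").size())
    out = pd.DataFrame({"departing": dep, "arriving": arr}).fillna(0)
    out["available"] = out["arriving"] - out["departing"]
    return out

# Toy example with three trips touching station 27.
trips = pd.DataFrame({
    "start_station": [27, 14, 27],
    "end_station":   [14, 27, 27],
    "start_time": pd.to_datetime(["2014-07-04 08:21", "2014-07-04 08:25", "2014-07-04 08:40"]),
    "end_time":   pd.to_datetime(["2014-07-04 08:33", "2014-07-04 08:29", "2014-07-04 08:55"]),
})
print(ten_minute_activity(trips, station=27))
```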

Fig. 7. Average arrivals and departures at station 27 on Sundays

Fig. 8. Average arrivals and departures at station 267 on Fridays

4 Discussion and Conclusions

After this study, the Ecobici system is described not only by the users' profile but also by the mobility pattern among stations. To the best of our knowledge, there is no published work that provides the information computed here. The analysis is very similar to the one carried out with data from Bicing (Barcelona) [9], but we also provide information about the traffic among stations (Fig. 5). Most studies analyse the system in hourly intervals [5, 24]; in contrast, we analyse the system in ten-minute intervals. According to our results, the average time of a trip is 12.5 min. Moreover, Table 1 shows that 75% of the trips last less than 16.1 min. Therefore, each time

Fig. 9. Average arrivals and departures at station 267 on Sundays

Fig. 10. Average arrivals and departures at station 193 on Fridays


a bike leaves a station, the stock of bikes in the arriving station will be modified after 12.5 min. The second piece of information is which stations receive most of the trips; stations 27, 1, 43, and 36 each receive about 1% of the total trips. Some stations receive less than 0.1% of the trips. On the other hand, 1.5% of the bikes depart from station 27. Therefore, in Fig. 2 the stations are sorted based on the percentage of arriving and departing bikes. One measure that relates the inbound and outbound percentages of trips is the flow index (Eq. 1). The stations are therefore classified according to this index; stations 267, 176 and 88 (see Fig. 3) are receiving ones, i.e. those stations receive more bikes than leave them. On the contrary, more bikes leave stations 193, 249 and 154 (see Fig. 3) than arrive at them. Therefore, managers can know whether a station is classified as a receiving or a departing one. According to our results, there are no self-balanced stations, given that the flow index is never exactly one. According to Table 3, the percentage of inbound and outbound bikes is not enough to classify a station. For example, station 94 is the one that receives the minimum number of bikes, and from station 94 the minimum number of bikes depart. Even so, its flow index is 1.2727; thus, it is a receiving station. Station 27 is classified as a receiving station as well, but it is the station at which the largest numbers of bikes arrive and depart. Figure 5 shows that there are some stations with heavy traffic among them; seven clusters of stations are identified (see Fig. 5). It is also identified that there are many trips that begin and end at the same station. One of the main contributions of this work is to show that a station could have available bikes and, minutes later, the same station could need bikes; e.g. in Fig. 6, from 11 h to 21 h the number of available bikes changes. Specifically, at 12:09:59, more bikes arrive than leave the station; ten minutes later, at 12:19:59, the opposite situation takes place. Therefore, the system cannot be analysed in periods that last hours, because the system changes within minutes.

Fig. 11. Average arrivals and departures at station 193 on Sundays


In summary, the mean time a user uses a bicycle is 12.68 min (Table 1). Station 27 receives the largest number of bicycles, and the largest number of bicycles also ends in this station (Table 2 and Fig. 2). According to Fig. 2, most of the stations receive less than 0.5% of the total number of bicycles; this proportion is the same for the number of arrivals. According to the flow index (Eq. 1), stations can be divided into "supply" and "demand" stations. Stations with index φi < 1 (Fig. 3) need bicycles and stations with φi > 1 hold bicycles. We can use this information to rebalance the system. From Fig. 5, three clusters are identified based on the from-to matrix. We conclude that many trips begin and end in the same station. Managers could focus on the management of the stations, given that they know where the trips begin and end. By computing plots such as Figs. 6, 7, 8, 9, 10 and 11, we can compute the number of bicycles in stock. Those plots can be produced using our R script. Acknowledgment. "Financial support from the Asociación Mexicana de Cultura, A.C. is gratefully acknowledged". Support from the National Council of Science and Technology (CONACyT), Mexico, is also acknowledged.

References 1. Noland, R., Ishaque, M.: Smart bicycles in an urban area: evaluation of a pilot scheme in London. J. Public Transp. 9(5), 71–95 (2006) 2. Bachand-Marleau, J., Larsen, J., El-Geneidy, A.: Much-anticipated marriage of cycling and transit. Transp. Res. Rec.: J. Transp. Res. Board 2247, 109–117 (2011) 3. Fishman, E.: Bikeshare: a review of recent literature. Transp. Rev. 36, 1–22 (2015) 4. Fishman, E., Washington, S., Haworth, N.: Bike share: a synthesis of the literature. Transp. Rev. 33(2), 148–165 (2013) 5. Schuijbroek, J., Hampshire, R., van Hoeve, W.J.: Inventory rebalancing and vehicle routing in bike sharing systems (2013) 6. Nair, R., Miller-Hooks, E.: Fleet management for vehicle sharing operations. Transp. Sci. 45(4), 524–540 (2011) 7. Raviv, T., Tzur, M., Forma, I.A.: Static repositioning in a bike-sharing system: models and solution approaches. EURO J. Transp. Logist. 2(3), 187–229 (2013) 8. Vogel, P., Greiser, T., Mattfeld, D.C.: Understanding bike-sharing systems using data mining: exploring activity patterns. Ph.D. thesis, Cornell University (2011) 9. Kaltenbrunner, A., Meza, R., Grivolla, J., Codina, J., Banchs, R.: Urban cycles and mobility patterns: exploring and predicting trends in a bicycle-based public transport system. Pervasive Mob. Comput. 6(4), 455–466 (2010) 10. O’Mahony, E.: Smarter Tools for (Citi)Bike Sharing. Ph.D. thesis, Cornell University (2015) 11. DeMaio, P.: Bike-sharing: history, impacts, models of provision, and future. J. Public Transp. 12(4), 41–56 (2009) 12. Reades, J., Calabrese, F., Sevtsuk, A., Ratti, C., Reades, J., Andres, F.C., Ratti, C.: Cellular census: explorations in urban data collection. IEEE Pervasive Comput. 6, 30–38 (2007) 13. Brockmann, D., Hufnagel, L., Geisel, T.: The scaling laws of human travel. Nature 439(7075), 462–465 (2006)


14. Hufnagel, L., Brockmann, D., Geisel, T.: Forecast and control of epidemics in a globalized world. Proc. Nat. Acad. Sci. 101, 15124–15129 (2004) 15. Froehlich, J., Neumann, J., Oliver, N.: Measuring the pulse of the city through shared bicycle programs. Urban Sense 08, 16–20 (2008) 16. Borgnat, P., Abry, P., Flandrin, P., Robardet, C., Rouquier, J.B., Fleury, E.: Shared bicycles in a city: a signal processing and data analysis perspective. Adv. Complex Syst. 14(03), 415–438 (2011) 17. Benchimol, M., Benchimol, P., Chappert, B., de la Taille, A., Laroche, F., Meunier, F., Robinet, L.: Balancing the stations of a self service “bike hire” system. RAIRO - Oper. Res. 45(1), 37–61 (2011) 18. Nair, R., Miller-Hooks, E., Hampshire, R.C., Buˇsi´c, A.: Large-scale vehicle sharing systems: analysis of V´elib’. Int. J. Sustain. Transp. 7(1), 85–106 (2013) 19. Liu, S.Y., Li, C., Feng, Y.P., Rong, G.: Clustering structure and logistics: a new framework for supply network analysis. Chem. Eng. Res. Des. 91(8), 1383–1389 (2013) 20. Etienne, C., Latifa, O.: Model-based count series clustering for bike sharing system usage mining: a case study with the V´elib’ system of Paris. ACM Trans. Intell. Syst. Technol. 5(3), 1–21 (2014) 21. Moncayo-Martinez, L.A., Ramirez-Nafarrate, A.: Visualization of the mobility patterns in the bike-sharing transport systems in Mexico City. In: 2016 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), pp. 1851–1855. IEEE (2016) 22. Froehlich, J.E., Neumann, J., Oliver, N.: Sensing and predicting the pulse of the city through shared bicycling. In: Twenty-First International Joint Conference on Artificial Intelligence (2009) 23. Perez-Lopez, R.: Encuesta Ecobici 2014. Technical report, Secretaria del Medio Ambiente de la Ciudad de Mexico (2015) 24. Raviv, T., Kolka, O.: Optimal inventory management of a bike-sharing station. IIE Trans. 45(10), 1077–1093 (2013)

A Path Towards Understanding Factors Affecting Crash Severity in Autonomous Vehicles Using Current Naturalistic Driving Data

Franco van Wyk1, Anahita Khojandi1, and Neda Masoud2

1 University of Tennessee, Knoxville, TN 37996, USA
[email protected], [email protected]
2 University of Michigan, Ann Arbor, MI 48109, USA

Abstract. In the U.S., in 2015 alone, there were approximately 35,000 fatalities and 2.4 million injuries caused by an estimated 6.3 million traffic accidents. In the future, it is speculated that automated systems will help to avoid or decrease the number and severity of accidents. However, before such a time, a broad range of vehicles, from non-autonomous to fully-autonomous, will share the road. Hence, measures need to be put in place to improve both safety and efficiency, while not compromising the advantages of autonomous driving technology. In this study, a Bayesian network model is developed to predict the severity of an accident, should it occur, given the road and the immediate environment conditions. The model is calibrated for the case of traditional vehicles using pre-crash information on driver behavior, road surface conditions, weather and lighting conditions, among other variables, to predict two categories of consequences for accidents, namely property damage and injury/fatality. The results demonstrate that the proposed methodology and the determinant factors used in the models can predict the consequences of an accident, and more importantly, the probability of a crash causing injury/fatality, with high accuracy. Approaches to extend this model are proposed to predict accident severity for autonomous vehicles through leveraging their sensor data. Such a model would assist the development of countermeasures to identify the most important factors impacting severity of accidents for semi- and fully-autonomous vehicles to prevent accidents, decrease accident severity in cases where accidents are bound to occur, and improve transportation safety in general.

Keywords: Accident severity · Bayesian networks · Autonomous vehicles · Transportation safety · Predictive modeling



1 Introduction and Literature Review

More than 1.2 million fatalities and 50 million injuries were recorded globally in 2015, making road traffic accidents a leading cause of death. Fatalities and injuries from traffic accidents cause, on average, an estimated 3% loss in GDP on a global scale [1]. In the U.S., in 2015, there were 35,092 fatalities and 2.44 million injuries resulting from 6.3 million reported traffic accidents causing an estimated 1.9% loss in GDP [2].


Accident severity has a significant impact on the economic costs associated with an accident. Fatalities, injuries, and property damage accidents cost, on average, $1.5 million, $80,700, and $9,300 per accident, respectively [3]. These economic costs account for wage and productivity losses, medical expenses, administrative expenses, and motor vehicle damage including the value of damage to property.

A popular area in traffic safety analysis is the identification of factors that increase the likelihood of an accident occurring. These factors typically include non-behavioral factors affecting accident frequency such as geometric variables (e.g., horizontal and vertical road alignments, or the immediate physical environment), traffic characteristics (e.g., hourly volume, annual average daily traffic, composition of vehicle types), and environmental conditions (e.g., road surface conditions, light conditions, or weather conditions) for a specific road type (e.g., highway, intersection, rural road) [4–8]. However, accident severity presents a less studied and understood aspect of safety in transportation systems [6]. Studying the factors of the conditional distribution of accident severity (i.e., the probability of accident severity, given that an accident has occurred) allows for gaining additional insights into true motorist behavior (e.g., speeding, driving under the influence, sleep deprivation, cell phone use) as well as the interactions of that behavior with environmental and roadway characteristics. The factors influencing accident severity may therefore have different interdependencies and characteristics than those influencing the likelihood of an accident itself. The main focus of this paper is therefore on the interaction between motorist behavior, environmental and roadway conditions and their impact on accident severity.

Studies of the factors that contribute to and cause motor vehicle accidents with different levels of severity rely on either a univariate or a multivariate approach. The former approach aims to investigate the effect of a single factor on accident severity. In contrast, the latter approach considers the effect of a multitude of factors and their interactions on accident severity. For instance, certain studies investigated the effect of gender on accident severity and found that accidents involving men are more severe [9, 10]. Other studies investigated the relationship between alcohol use and accident severity and found that fatality rates increase dramatically with drinking and driving [11, 12]. Other factors studied in isolation include driver distraction, seat-belt usage, driver age, and lighting and weather conditions, to name a few [13–16]. The univariate approach introduces potential ambiguity and bias in severity analyses, prompting the majority of recent studies to employ multivariate analyses that incorporate the influence of a multitude of factors on accident severity. Many factors are typically included in these multivariate studies to develop injury severity models. However, many of these studies choose to concentrate on a subset of traffic data limited to a particular accident type, certain road segments, or vehicle types [6, 17–19]. Typically, this approach is followed in order to obtain accurate prediction models from a somewhat homogeneous dataset [20]. In recent years, various methodologies and statistical techniques have been developed and applied to model accident injury severity.
In general, methodologies and techniques used to model accident severity can be classified into three groups: (1) discrete choice models, (2) soft computing techniques, and (3) data mining techniques. Discrete choice models include logit and probit models. The logit model aims to describe the relationship of one or more independent variables to an outcome
variable. Ouyang et al. [21] implemented a binary logit model (BLM) to investigate the simultaneity of injury severity outcomes in multi-vehicle crashes and found that high speed curve designs decrease the injury severity of car-truck collisions. Malyshkina and Mannering [22] analyzed the accident injury severity for two vehicles or fewer using a multinomial logit model (MNL) and found that adverse weather conditions correlate with severe injuries. Chen et al. [17] developed a hybrid approach to combine multinomial logit models and Bayesian network methods to analyze driver injury severities in rear-end crashes and found that factors such as windy weather conditions and truck involvement increase accident severity.

Probit models address certain limitations of logit models. They can incorporate random variation and any pattern of substitution, and do not suffer from the multinomial logit's assumption of independence of irrelevant alternatives (IIA) [23]. Xie et al. [24] analyzed accident injury severity using a Bayesian ordered probit model and found it an effective method to combine information contained in the data with prior knowledge of the parameters. Mujalli and de Oña [25] provided a comprehensive summary of other types of discrete choice models used to model accident injury severity and address different methodological issues for certain datasets. These models include hierarchical logit, heteroskedastic logit, ordered logit, mixed logit, and ordered probit models [19, 26–29]. A major drawback of discrete choice models is the predefined underlying relationship between variables (e.g., a linear relation), which may lead to errors in injury severity estimation if these assumptions are violated in the dataset.

To add flexibility to their models, some researchers have exploited soft computing techniques in accident injury severity applications. These techniques include the artificial neural network (ANN), genetic algorithm (GA) and recurrent neural network (RNN). Delen et al. [20] used a series of ANNs to model the potentially non-linear relationships between the injury severity levels and causal factors and found that no single factor by itself appears to significantly influence accident injury severity. Kunt et al. [30] compared the performance of ANN and GA in predicting accident severity outcomes and found that ANN outperformed GA. Sameen and Pradhan [31] developed a RNN model to predict the injury severity of traffic accidents and found its performance superior to that of ANNs. The performance of soft computing techniques, in general, is highly dependent on complete data, and such techniques typically cannot incorporate prior knowledge or expert opinion.

Recently, increased attention has been directed at data mining techniques such as Bayesian networks to model accident severity as a result of increasing data availability and computational resources. Bayesian networks, which are graphical models of interactions between a set of variables, have been used in a number of traffic crash and modeling studies [17, 32–35]. For instance, Simoncic [32] developed a Bayesian network to model road accidents involving two cars, incorporating several factors for both vehicles such as seatbelt use, alcohol use, and driver experience. De Oña et al. [33] used Bayesian networks to identify significant factors and analyze the severity of traffic accidents on rural highways by classifying accidents as slightly injured or killed/severely injured. De Oña et al.
[34] employed Bayesian networks to model traffic accident injury severity on Spanish rural highways. Mujalli and de Oña [35] analyzed traffic accident injury severity on two-lane highways using Bayesian networks. Zong et al. [36] compared Bayesian networks and regression models and concluded that
Bayesian networks outperformed regression models. However, driver characteristics related variables, which have an impact on injury severity, were not included due to limited data. Other traffic modeling applications of Bayesian networks include the identification of traffic conditions by estimating traffic accident risks and the analysis of highway safety performance [37, 38].

The review of the literature suggests that most studies addressing accident injury severity prediction are narrow in scope and rely on a highly homogeneous dataset. In addition, the literature review makes clear that no study has modeled, or laid out a plan to model, accident severity for autonomous or semi-autonomous vehicles. The studies involving Bayesian networks provide insights into the typical applications in traffic accident modeling and analysis. Various features of Bayesian networks enable them to model traffic accident situations. These networks can capture interdependencies and statistical associations between dependent and independent variables that ultimately affect the predicted outcome. The directed acyclic graph (DAG) defines the network structure of a Bayesian network, and the conditional probability distributions (CPD) define the quantitative relationship between variables. The network structure and CPD do not require any specified assumptions about variables. In addition, complete data, i.e., data where all variable values are specified for a given observation, are not necessarily required for these models. The network predicts and infers probabilistically, conditional on the evidence provided for variables. Bayesian networks are also capable of incorporating prior knowledge and can predict more than a single output node. Furthermore, Bayesian networks are useful when uncertainty is present regarding the correlation between variables and their combined influence on the predicted outcome.

This study aims to develop a Bayesian network to discover patterns from a non-homogeneous accident dataset in order to estimate the severity of an accident, should it occur. The severity of an accident injury is classified into two categories, namely property damage and injury/fatality. The framework consists of a Bayesian network that integrates pre-crash information including driver behavior, geometric features, and environmental features. Contrary to previous studies, in this work the domain is not narrowed to a particular accident type, road segment, or vehicle type. State-wide accident data from Michigan in 2016 are used to train, validate, and test the Bayesian network [39]. In addition, the impact of balancing the data is investigated, because accident datasets are typically heavily unbalanced with respect to the number of fatal accidents compared to those causing property damage or injury. As an extension of the tested model, a general framework based on Bayesian networks for modeling accident injury severity for various levels of autonomy in vehicles is presented for future calibration and testing as autonomous vehicle technology data become more readily available.

The rest of the paper is organized as follows. In Sect. 2, we provide an overview of Bayesian networks and the performance metrics used to evaluate these networks in classification applications. In Sect. 3, we discuss the data set used in the numerical study. In Sect. 4, a Bayesian network for the driver and autonomous networks is developed.
The driver network is trained with and without data balancing, and the corresponding results and insights are discussed. In addition, a detailed discussion is provided on incorporating the factors influencing accident severity in the autonomous network and ways to calibrate these factors are specified. Lastly, we conclude in Sect. 5.


2 Methodological Approach

In this section, we discuss Bayesian networks and their applicability in modeling traffic accident injury severity. In addition, we present various indicators used to measure the performance of Bayesian networks in classification applications.

2.1 Bayesian Networks

Bayesian networks have gained increasing popularity in recent years and are employed in the modeling process for various applications where expert knowledge is important, including traffic analyses, medicine, bio-informatics, and image processing [40]. The Bayesian network in this study acts as a classifier to analyze the worst injury severity in a traffic accident based on factors identified from the literature and from expert knowledge.

Let $S = \{X_1, \ldots, X_n\}$, $n \geq 1$, denote the set of variables representing the nodes in the Bayesian network. Let $B_S$ denote a DAG over the set $S$, and let $B_p = \{p(X_i \mid pa(X_i)),\; X_i \in S\}$, $i = 1, 2, \ldots, n$, denote the set of conditional probability distributions, where $pa(X_i)$ is the set of parents of $X_i$ in the network structure $B_S$. Edges in the DAG represent the relationship between parent and child nodes; these edges indicate causality, dependence and independence, based on graph theory. A Bayesian network therefore represents the joint probability distribution $P(S) = \prod_{X_i \in S} p(X_i \mid pa(X_i))$. To classify injury severity, the network is used to predict an outcome $y$ after being trained on a dataset $T$ that contains multiple instances of $(X, y)$. In order to use the network as a classifier, the value of $y$ that maximizes $P(y \mid X)$ is selected, i.e., $\operatorname{argmax}_y P(y \mid X)$.

The structure and parameters, i.e., conditional probability distributions, of a Bayesian network can be determined in a number of ways depending on the application [41, 42]. One popular approach is to learn the parameters of a Bayesian network, given the structure, usually using maximum likelihood (ML) estimation. The other popular approach is to employ a score-based approach to learn the structure of the network. Employing a scoring metric requires complete data for all variables in the network. Typical scoring methods include the Bayesian information criterion (BIC), structure entropy, Akaike information criterion (AIC), and Bayes metric. In this study, the structure is specified using expert knowledge and the parameters are learned using the ML method. In addition, the BIC metric is used to justify the network structure from a finite set of alternative network configurations.
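As a concrete illustration of the classification rule above, consider the following minimal Python sketch (not the authors' implementation; the two-node network, its probability tables, and all names are hypothetical). It evaluates the factorised joint distribution for each candidate class and returns argmax_y P(y | X), assuming evidence is available for every non-target node; a full implementation would additionally marginalise over unobserved variables.

def joint_prob(assignment, cpds, parents):
    # P(S) = product over i of p(X_i | pa(X_i)) for one complete assignment.
    p = 1.0
    for var, table in cpds.items():
        parent_values = tuple(assignment[q] for q in parents[var])
        p *= table[parent_values][assignment[var]]
    return p

def classify(evidence, target, target_states, cpds, parents):
    # argmax_y P(y | X): with complete evidence, P(y | X) is proportional to P(y, X).
    scores = {}
    for y in target_states:
        assignment = dict(evidence, **{target: y})
        scores[y] = joint_prob(assignment, cpds, parents)
    total = sum(scores.values())
    posterior = {y: s / total for y, s in scores.items()}
    return max(posterior, key=posterior.get), posterior

# Hypothetical two-node fragment: injury level A is the parent of road surface R.
parents = {"A": (), "R": ("A",)}
cpds = {
    "A": {(): {0: 0.815, 1: 0.185}},           # prior taken from the class frequencies
    "R": {(0,): {0: 0.60, 1: 0.30, 2: 0.10},   # p(R | A = property damage), illustrative
          (1,): {0: 0.80, 1: 0.15, 2: 0.05}},  # p(R | A = injury/fatality), illustrative
}
label, posterior = classify({"R": 0}, "A", [0, 1], cpds, parents)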

2.2 Performance Evaluation Indicators

Several indicators are typically used to evaluate the performance of a Bayesian network used in classification applications [33]. In this study, accuracy, recall, precision, balanced F-score, receiver operating characteristic (ROC) score, and the geometric mean of recall and specificity are presented as classifier performance indicators [43]. The accuracy, which calculates the proportion of correctly classified cases, provides an evaluation of the classifier’s overall performance. However, for heavily unbalanced datasets, this metric may often be very high but does not necessarily indicate good
performance. Recall and precision evaluate the classifier's ability to correctly distinguish between the property damage and injury/fatality cases. The F-score incorporates the trade-off between precision and recall and is used to evaluate the overall accident injury severity classification performance of the Bayesian network; it is computed as the harmonic mean of precision and recall. Furthermore, the ROC score, unlike the F-score, does not give equal weight to precision and recall. Classifiers with an area under the ROC curve (AUC) of 0.5 are considered no better than guessing, whereas an AUC of 1.00 describes a perfect classifier. Lastly, the geometric mean of recall and specificity provides an overall performance evaluation metric, which is particularly useful for unbalanced datasets.
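For concreteness, the indicators above could be computed from true labels, predicted labels, and predicted class-1 probabilities roughly as follows (a sketch using scikit-learn; the function and variable names are placeholders and are not taken from the study):

from math import sqrt
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

def evaluate(y_true, y_pred, y_score):
    recall = recall_score(y_true, y_pred)                     # sensitivity for class 1
    specificity = recall_score(y_true, y_pred, pos_label=0)   # recall of class 0
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "recall": recall,
        "precision": precision_score(y_true, y_pred),
        "f_score": f1_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_score),            # needs probability scores
        "geometric_mean": sqrt(recall * specificity),
    }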

3 Data

Accident data were obtained from the Michigan Traffic Crash Facts (MTCF) website for the year 2016 [39]. The dataset consists of state-wide data for all types of accidents and vehicles on a public traffic way in Michigan resulting in injury, death, or at least $1,000 in property damage. The dataset contains 81.5% property damage cases and 18.5% injury/fatality cases. In this study, 14,000 accident records were used, where accidents causing injury and death are grouped together due to the low frequency of fatal accidents.

4 Results and Discussion

This section presents the results obtained from the numerical study. First, the Bayesian network for the driver and autonomous networks is presented. Next, the driver network is trained with and without balancing, and the corresponding results are discussed. Lastly, a detailed discussion is provided on incorporating the factors affecting accident severity in the autonomous network.

4.1 Driver and Autonomous Networks

As discussed, the objectives of this study include predicting whether an accident results in property damage or injury/fatality and identifying factors that influence accident injury severity. In addition, an extension of the driver network to the autonomous case, referred to as the 'autonomous' network, is presented. The factors included in the driver network are based on previous studies, expert knowledge and the availability of new data such as the distractedness of the driver [17, 33, 34]. The autonomous factors are identified with the aid of recent crashes involving semi-autonomous vehicles and industry experts. The factors included in the driver network are: characteristics of the accident (time, day of week, accident type, total number of involved vehicles, speed limit, worst injury in accident); weather information (weather and lighting conditions); driver behavior (alcohol and drug use, distracted); road characteristics (surface conditions, geometry, class, and zone); and the immediate physical environment as the accident occurred (pedestrian or deer involved). The factors included in the autonomous
network include GPS accuracy, quality of sensor readings, quality of traffic signs, as well as V2I and V2V communication. Table 1 provides information on the factors included for the driver and autonomous networks developed in this study.

Table 1. The definition of variables for the driver and autonomous networks

Driver network
  Accident injury level (A): Target variable (0 – Property damage; 1 – Injury/fatality)
  Time of day (T): 0 – Day (10AM to 4PM); 1 – Rush hours (6AM to 10AM and 4PM to 7PM); 2 – Night (7PM to 6AM)
  Day of the week (DW): 0 – Monday; 1 – Tuesday; 2 – Wednesday; 3 – Thursday; 4 – Friday; 5 – Saturday; 6 – Sunday
  Weather conditions (W): 0 – Cloudy; 1 – Clear; 2 – Snow; 3 – Severe crosswinds; 4 – Fog; 5 – Rain
  Visibility (V): 0 – Daylight; 1 – Dark lighted; 2 – Dark unlighted; 3 – Dusk/dawn
  Road surface (R): 0 – Dry; 1 – Wet; 2 – Ice/snow
  Road geometry (RG): 0 – Straight; 1 – Curve
  Road zone (RZ): 0 – Interstate; 1 – Country road; 2 – City street; 3 – Intersection
  Immediate physical environment (IPE): 0 – Deer; 1 – Pedestrian; 2 – Other vehicles; 3 – No objects in vicinity
  Driver behavior (D): 0 – Sober; 1 – Alcohol use; 2 – Drug use; 3 – Distracted
  Speed limit (SL): 0 – Less than or equal to 35 mph; 1 – Between 35 mph and 55 mph; 2 – Faster than or equal to 55 mph

Autonomous network
  Autonomy (AU): 0 – Driver; 1 – Autonomous vehicle
  GPS accuracy and availability (GPS): 0 – Accurate; 1 – Unavailable due to surroundings; 2 – Out of date information
  Quality of traffic signs and lane markings (LM): 0 – Good quality; 1 – Faded signs or lane markings; 2 – Traffic signs or lane markings not available
  Quality of sensor messages (Q): 0 – Good quality; 1 – Obstruction of sensor; 2 – Malicious messages; 3 – Electronic interference
  V2I communication (V2I): 0 – No communications; 1 – Traffic information; 2 – Speed limit and other driving information; 3 – Malicious messages
  V2V communication (V2V): 0 – Good communication; 1 – Forward collision warning; 2 – Blind spot and lane change warning; 3 – Malicious messages
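For later reference, the discretised state spaces of Table 1 can be written down directly as a mapping from the variable symbols to their admissible values (an illustrative encoding only; the integer codes follow Table 1):

VARIABLE_STATES = {
    # Driver network
    "A":   [0, 1],           # accident injury level: 0 property damage, 1 injury/fatality
    "T":   [0, 1, 2],        # time of day
    "DW":  list(range(7)),   # day of the week
    "W":   list(range(6)),   # weather conditions
    "V":   [0, 1, 2, 3],     # visibility
    "R":   [0, 1, 2],        # road surface
    "RG":  [0, 1],           # road geometry
    "RZ":  [0, 1, 2, 3],     # road zone
    "IPE": [0, 1, 2, 3],     # immediate physical environment
    "D":   [0, 1, 2, 3],     # driver behavior
    "SL":  [0, 1, 2],        # speed limit
    # Autonomous network
    "AU":  [0, 1],           # autonomy
    "GPS": [0, 1, 2],        # GPS accuracy and availability
    "LM":  [0, 1, 2],        # quality of traffic signs and lane markings
    "Q":   [0, 1, 2, 3],     # quality of sensor messages
    "V2I": [0, 1, 2, 3],     # V2I communication
    "V2V": [0, 1, 2, 3],     # V2V communication
}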

As discussed, the structure of the network is specified using expert knowledge and parameters are estimated using ML. In addition, the Bayesian information criterion (BIC), or Schwarz criterion, is employed for the model structure to justify selection among a finite set of possible model structures. Figure 1 presents the Bayesian network that includes the driver and autonomous networks. The autonomous network naturally also includes the factors of the driver network.

Fig. 1. Bayesian network indicating the driver and autonomous networks. The driver network is indicated with solid lines and the autonomous factors are indicated with dashed lines. Refer to Table 1 for node terminology.
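A sketch of how such a score-based comparison of candidate structures could be carried out with an off-the-shelf library is given below. It assumes the pgmpy package and its BicScore and BayesianNetwork interfaces (in older pgmpy versions the model class is named BayesianModel); the candidate edge lists and the file name are purely illustrative and do not reproduce the structure shown in Fig. 1.

import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import BicScore

# One row per accident record, with columns named as in Table 1 (hypothetical file).
crash_df = pd.read_csv("michigan_crashes_2016.csv")

# Two illustrative candidate structures for a small fragment of the driver network.
candidates = {
    "weather_drives_surface": BayesianNetwork([("W", "R"), ("R", "A"), ("V", "A")]),
    "independent_parents":    BayesianNetwork([("W", "A"), ("R", "A"), ("V", "A")]),
}

scorer = BicScore(crash_df[["W", "R", "V", "A"]])
scores = {name: scorer.score(model) for name, model in candidates.items()}
best = max(scores, key=scores.get)   # under pgmpy's convention a larger BIC score is better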

4.2 Results

The total number of 14,000 accident records was split into the following sets: 10,000 records for training, 2,000 records for validation, and 2,000 for testing. To reduce training time, the training set of 10,000 records was split into ten equally sized, balanced sets with 500 property damage and 500 injury/fatality cases to train ten Bayesian networks. Two methods, namely ensembling and majority vote, were then evaluated for combining the results of the ten trained models. Specifically, in the ensembling method, the ten conditional probability distributions obtained from the trained models were averaged to obtain a single model that was used to predict and classify accident injury severity for the validation set. In the majority vote method, the ten models were each used to predict and classify accident injury severity, and the majority vote over the ten predictions was then used as the ultimate prediction. In both methods, in the case of a tie, a random number is generated and the record is assigned to the property damage class if the number is greater than 0.185, and to the injury/fatality class otherwise. The threshold of 0.185 is selected based on the fraction


of injury/fatality accidents in the overall dataset. Results for the two methods on the validation set indicate that the majority vote method performed better than the ensemble method based on both the F-score and accuracy: the majority vote method had an F-score and accuracy of 0.85 and 0.76, respectively, as opposed to an F-score of 0.76 and accuracy of 0.66 for the ensemble method. Hence, the majority vote was used for the remaining analyses.

Specifically, the majority vote method was used on the test set for both balanced and unbalanced training sets. The balanced case provides promising results, as illustrated in Table 2. The F-score of 0.79 indicates that the network classified accidents as property damage and injury/fatality with high precision and recall. The AUC of the ROC curve for the testing dataset was 0.62, illustrating that the trained model discovers correlations between variables effectively in order to classify accident severity [44]. The geometric mean of recall and specificity of 0.60 illustrates that the model was able to predict both classes of severity. In addition, the high precision indicates that the Bayesian network was able to classify the majority of property damage cases correctly. Table 2 also shows the test results for an unbalanced training set with 815 property damage and 185 fatal/injury cases per training set. As expected, the unbalanced case overfits the property damage class, resulting in a high F-score, but lacks the ability to correctly predict fatal/injury cases, as evident from the ROC score and geometric mean of 0.58 and 0.46, respectively.

Table 2. Test results for balanced and unbalanced training sets

Performance metric                           Balanced   Unbalanced
Accuracy                                     0.69       0.81
Recall                                       0.73       0.94
Precision                                    0.87       0.84
ROC score                                    0.62       0.58
F-score                                      0.79       0.89
Geometric mean of recall and specificity     0.60       0.46
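As an illustration of the majority-vote rule and the 0.185 tie-breaking threshold described above, the combination step could look roughly as follows (a sketch only; the ten trained models and their predict interface are assumed and are not taken from the paper's code):

import random

INJURY_FRACTION = 0.185  # share of injury/fatality records in the full dataset

def majority_vote(models, record):
    # Each model returns 0 (property damage) or 1 (injury/fatality) for the record.
    votes = [model.predict(record) for model in models]
    ones = sum(votes)
    zeros = len(votes) - ones
    if ones > zeros:
        return 1
    if zeros > ones:
        return 0
    # Tie: draw a uniform random number; assign property damage if it exceeds 0.185,
    # injury/fatality otherwise, mirroring the class frequencies of the dataset.
    return 0 if random.random() > INJURY_FRACTION else 1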

The relationship between variables in a Bayesian network enables probability inference analyses based on conditional probability distributions for all factors included in the network. By setting evidence for specific variables, the contribution of factors to accident injury severity is quantified. More specifically, the difference between the probabilities of accident injury severity outcomes with and without evidence for a particular level of a given factor, in the absence of any further evidence for the other factors, determines the impact of that particular factor level on the outcomes. Table 3 illustrates the inference results for variables significantly influencing accident injury severity. Specifically, Table 3 presents the ten most influential factor levels, where the corresponding percentages are the percentage difference under the factor level, averaged over the ten trained Bayesian network models using balanced data.


Table 3. Probability inference results for variables affecting injury severities

Factor                                         Fatality/injury
Road surface condition – Dry                   +11.2%
IPE – Pedestrian involved                      +6.5%
Day of week – Saturday                         +6.1%
Weather conditions – Clear                     +5.1%
Road zone – Intersection                       +2.2%
Speed limit – Less than or equal to 35 mph     −3%
Visibility – Dark unlighted                    −4.5%
Weather conditions – Severe crosswinds         −9.2%
Weather conditions – Snow                      −9.9%
Road surface conditions – Ice/snow             −11.1%

As seen in Table 3, the inference results indicate that a dry road surface condition causes the highest increase, approximately 11.2%, in the probability that a crash results in fatality/injury. This is mainly because dry road conditions are typically associated with higher travelling speeds, which can contribute to an increase in severity if an accident occurs. Furthermore, snowy road surfaces and poor weather conditions are associated with a lower likelihood that an accident results in fatality/injury, and therefore a higher likelihood that it causes only property damage if it occurs. These results are consistent with previous studies that highlight the significance of factors influencing accident injury severity [20, 34].
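A minimal sketch of how such an evidence-based inference analysis could be reproduced with a generic Bayesian-network library is shown below. It assumes pgmpy's fit and VariableElimination interfaces, a pandas DataFrame train_df with the Table 1 columns, and an illustrative set of edges that does not reproduce the exact structure of Fig. 1.

from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# train_df: pandas DataFrame of discretised accident records (assumed to exist).
# Illustrative driver-network fragment: severity A with a few parents from Table 1.
model = BayesianNetwork([("W", "R"), ("R", "A"), ("V", "A"), ("D", "A"), ("SL", "A")])
model.fit(train_df[["W", "R", "V", "D", "SL", "A"]],
          estimator=MaximumLikelihoodEstimator)   # ML parameter learning

infer = VariableElimination(model)
baseline = infer.query(variables=["A"]).values[1]                     # P(A = 1), states coded 0/1
dry_road = infer.query(variables=["A"], evidence={"R": 0}).values[1]  # P(A = 1 | dry surface)
impact = dry_road - baseline   # positive values correspond to the increases reported in Table 3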

4.3 Bayesian Network Extension

With reference to the autonomous network in Fig. 1, indicated by the dashed lines, the following discussion aims to justify the selection of the factors that may influence accident injury severity and to outline ways to calibrate these variables as data become more readily available for autonomous vehicles. It is envisioned that the manufacturer data collected during recent autonomous technology studies, such as the Safety Pilot Study conducted by the University of Michigan Transportation Research Institute (UMTRI), will enable the calibration of the following factors.

Experts across various industries anticipate that the gradual introduction of autonomous driving technologies will lead to safer and more efficient roadways. Advocates argue that, in the future, autonomous cars will reduce traffic accidents by up to 90% [45, 46]. Autonomous vehicles will continuously monitor the environment using sensors, cameras, radars, and a global positioning system (GPS), which ultimately feed information to the control software that enables autonomous tasks such as autonomous emergency braking (AEB), adaptive cruise control (ACC), lane keeping assist, and automatic parking [47, 48].

During the transition to an autonomous vehicle network and while the technology is still developing, care needs to be taken to ensure that all factors contributing to accidents and increased severity for accidents involving autonomous vehicles are understood. In addition, new risks related to autonomous driving technologies seem inevitable. For


instance, an area of autonomous technology in vehicles that requires significant research attention is the concept of networking. Networking comes in the forms of vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication. However, this area of autonomous technology introduces certain dangers in the form of cybersecurity risks. In general, anticipated risks for autonomous and semi-autonomous vehicles include sensor failure, the injection of false messages, or contrasting information received from various sources. Below, the calibration of factors influencing accident injury severity in various levels of connected autonomous vehicles is discussed.

GPS Accuracy and Availability. Autonomous vehicles use GPS, radar, and sensor technology to direct and steer the vehicle without the assistance of the driver. However, certain factors influence the accuracy of these readings and may therefore introduce imprecise information that could cause unsafe conditions when used for controlling the vehicle. For instance, GPS accuracy is affected by the presence of tall buildings and dense urban areas [49, 50]. The lack of accurate information about the vehicle position may lead to accidents and possibly increase their severity under certain conditions. Data for this factor can be acquired in the future by generating maps that describe the strength of the GPS signal, in order to estimate GPS accuracy and availability for a journey.

Quality of Traffic Signs and Lane Markings. The quality of traffic signs and lane markings significantly impacts the steering capabilities of autonomous vehicles. For instance, in 2017 a Tesla Model S drove into a highway barrier at a construction zone in Dallas [51]. This occurred as a result of sensors failing to recognize the roadway markings and traffic signs to merge into another lane; the road construction and warning signs were poorly implemented. To calibrate this factor in the Bayesian network, the quality of traffic signs and lane markings can be approximated using classification models trained separately for this purpose. In addition, manufacturer data can be used to estimate the ability of hardware and software to recognize various traffic signs and lane markings.

Quality of Sensor Messages. Autonomous vehicles depend on information received from sensors and radars. Therefore, if a sensor is obstructed or interfered with, the software controlling the steering of the vehicle may receive the wrong information or no information at all [52]. For instance, if mud covers a sensor, if bright lights or electronic interference affect sensor readings, or if malicious information finds its way into the software, the vehicle and its passengers will be at risk. This factor can be calibrated by using manufacturer data regarding the ability of sensors to transmit the correct information under the mentioned circumstances.

Vehicle to Infrastructure (V2I) Communication. V2I communication refers to the exchange of information between the vehicle and its surrounding infrastructure. For instance, traffic information such as speed limits, traffic light readings, and traffic reports are communicated to the vehicle [53]. If incorrect information is obtained by the car, or incorrect information is transmitted by the roadside units (RSUs), dangerous


situations can arise. This factor in the Bayesian network can be calibrated using a combination of manufacturer and city infrastructure data.

Vehicle to Vehicle (V2V) Communication. V2V communication is the exchange of information between vehicles. For instance, emergency brake warnings, forward collision warnings, and intersection movement assistance will be common in autonomous vehicles and could potentially revolutionize vehicle safety [45]. However, if incorrect or imprecise information is obtained from surrounding or oncoming traffic, dangerous situations may arise. This factor can be calibrated in the future as more field studies that include V2V technology are conducted. An example of such studies is the Safety Pilot project at the University of Michigan, where about 3,000 vehicles in the Ann Arbor region were equipped with communication equipment.

5 Conclusions

This paper develops a Bayesian network to estimate the severity of an accident, should it occur, for all types of accidents, road conditions, and road segments, as a function of pre-crash information. The generic Bayesian network is developed for both non-autonomous and autonomous vehicles, which will share the road for the foreseeable future. A model is trained for non-autonomous vehicles using state-wide data from Michigan for the year 2016. Results indicate that the Bayesian network performs well in the classification of accident injury severity, particularly when training sets are balanced to avoid favoring the more represented group, i.e., property damage accidents. The F-score of 0.79, area under the ROC curve of 0.62, and geometric mean of recall and specificity of 0.60 illustrate that the trained model discovers correlations between causal and contributing variables effectively in order to classify accident injury severity. Furthermore, discussions are presented on the calibration and testing of the autonomous model in the future as data from autonomous vehicle technologies become more readily available. It is anticipated that the developed methodology would assist the development of countermeasures to decrease accident severity and improve traffic safety performance.

The study is subject to certain limitations. The data set used in the numerical analysis is limited to the Michigan area for the year 2016; hence, the reported results and insights are only valid for this area and time period, and further analyses including other areas over an extended time period are required to ensure generalizability. Furthermore, due to the low frequency of fatal accidents and limited information regarding the severity of crashes in the data set, only two levels were used to model crash severity in the numerical study. In future research efforts, a more granular approach to modeling the severity of accidents may be employed if more detailed data are acquired.

Further research efforts can be employed to improve the performance of the Bayesian network and investigate the correlation between factors contributing to accident injury severity. Additional factors contributing to accidents can be included in the network, and a variety of structure scoring methods can be utilized to improve the classification performance. Lastly, the autonomous Bayesian network can be calibrated and tested as data become more readily available.


References 1. World Health Organization: Global status report on road safety 2015. World Health Organization (2015) 2. National Highway Traffic Safety Administration: 2015, Motor vehicle crashes: overview. Traffic Saf. Facts Res. Note 2016, 1–9 (2016) 3. National Safety Council: Injury Facts: Library of Congress Catalog Card Number: 99-74142, 2015 edn, Itasca, IL (2015) 4. Mannering, F.L., Grodsky, L.L.: Statistical analysis of motorcyclists’ perceived accident risk. Accid. Anal. Prev. 27(1), 21–31 (1995) 5. Howard, M.E., Desai, A.V., Grunstein, R.R., Hukins, C., Armstrong, J.G., Joffe, D., Swann, P., Campbell, D.A., Pierce, R.J.: Sleepiness, sleep-disordered breathing, and accident risk factors in commercial vehicle drivers. Am. J. Respir. Crit. Care Med. 170(9), 1014–1021 (2004) 6. Shankar, V., Mannering, F., Barfield, W.: Effect of roadway geometrics and environmental factors on rural freeway accident frequencies. Accid. Anal. Prev. 27(3), 371–389 (1995) 7. Lord, D., Mannering, F.: The statistical analysis of crash-frequency data: a review and assessment of methodological alternatives. Transp. Res. Part A: Policy Pract. 44(5), 291–305 (2010) 8. Chang, L.Y., Chen, W.C.: Data mining of tree-based models to analyze freeway accident frequency. J. Saf. Res. 36(4), 365–375 (2005) 9. Hayakawa, H., Fischbeck, P.S., Fischhoff, B.: Traffic accident statistics and risk perceptions in Japan and the United States. Accid. Anal. Prev. 32(6), 827–835 (2000) 10. Valent, F., Schiava, F., Savonitto, C., Gallo, T., Brusaferro, S., Barbone, F.: Risk factors for fatal road traffic accidents in Udine, Italy. Accid. Anal. Prev. 34(1), 71–84 (2002) 11. Zajac, S.S., Ivan, J.N.: Factors influencing injury severity of motor vehicle–crossing pedestrian crashes in rural Connecticut. Accid. Anal. Prev. 35(3), 369–379 (2003) 12. Keall, M.D., Frith, W.J., Patterson, T.L.: The influence of alcohol, age and number of passengers on the night-time risk of driver fatal injury in New Zealand. Accid. Anal. Prev. 36(1), 49–61 (2004) 13. Derrig, R.A., Segui-Gomez, M., Abtahi, A., Liu, L.L.: The effect of population safety belt usage rates on motor vehicle-related fatalities. Accid. Anal. Prev. 34(1), 101–110 (2002) 14. Yannis, G., Golias, J., Papadimitriou, E.: Driver age and vehicle engine size effects on fault and severity in young motorcyclist accidents. Accid. Anal. Prev. 37(2), 327–333 (2005) 15. Edwards, J.B.: The relationship between road accident severity and recorded weather. J. Saf. Res. 29(4), 249–262 (1999) 16. Neyens, D.M., Boyle, L.N.: The influence of driver distraction on the severity of injuries sustained by teenage drivers and their passengers. Accid. Anal. Prev. 40(1), 254–259 (2008) 17. Chen, C., Zhang, G., Tarefder, R., Ma, J., Wei, H., Guan, H.: A multinomial logit modelBayesian network hybrid approach for driver injury severity analyses in rear-end crashes. Accid. Anal. Prev. 80, 76–88 (2015) 18. Yau, K.K.: Risk factors affecting the severity of single vehicle traffic accidents in Hong Kong. Accid. Anal. Prev. 36(3), 333–340 (2004) 19. Milton, J.C., Shankar, V.N., Mannering, F.L.: Highway accident severities and the mixed logit model: an exploratory empirical analysis. Accid. Anal. Prev. 40(1), 260–266 (2008) 20. Delen, D., Sharda, R., Bessonov, M.: Identifying significant predictors of injury severity in traffic accidents using a series of artificial neural networks. Accid. Anal. Prev. 38(3), 434– 444 (2006)


21. Ouyang, Y., Shankar, V., Yamamoto, T.: Modeling the simultaneity in injury causation in multivehicle collisions. Transp. Res. Rec.: J. Transp. Res. Board 1784, 143–152 (2002) 22. Malyshkina, N.V., Mannering, F.L.: Markov switching multinomial logit model: an application to accident-injury severities. Accid. Anal. Prev. 41(4), 829–838 (2009) 23. Train, K.E.: Discrete Choice Methods with Simulation. Cambridge University Press, Cambridge (2009) 24. Xie, Y., Zhang, Y., Liang, F.: Crash injury severity analysis using Bayesian ordered probit models. J. Transp. Eng. 135(1), 18–25 (2009) 25. Mujalli, R.O., de Oña, J.: Injury severity models for motor vehicle accidents: a review. In: Proceedings of the Institution of Civil Engineers (2013) 26. Daniels, S., Brijs, T., Nuyts, E., Wets, G.: Externality of risk and crash severity at roundabouts. Accid. Anal. Prev. 42(6), 1966–1973 (2010) 27. Wang, X., Kockelman, K.: Use of heteroscedastic ordered logit model to study severity of occupant injury: distinguishing effects of vehicle weight and type. Transp. Res. Rec.: J. Transp. Res. Board 1908, 195–204 (2005) 28. Jin, Y., Wang, X., Chen, X.: Right-angle crash injury severity analysis using ordered probability models. In: 2010 International Conference on Intelligent Computation Technology and Automation (ICICTA), vol. 3, pp. 206–209. IEEE, May 2010 29. Zhu, X., Srinivasan, S.: A comprehensive analysis of factors influencing the injury severity of large-truck crashes. Accid. Anal. Prev. 43(1), 49–57 (2011) 30. Kunt, M.M., Aghayan, I., Noii, N.: Prediction for traffic accident severity: comparing the artificial neural network, genetic algorithm, combined genetic algorithm and pattern search methods. Transport 26(4), 353–366 (2011) 31. Sameen, M.I., Pradhan, B.: Severity prediction of traffic accidents with recurrent neural networks. Appl. Sci. 7(6), 476 (2017) 32. Simoncic, M.: A Bayesian network model of two-car accidents. J. Transp. Stat. 7(2/3), 13– 25 (2004) 33. de Ona, J., López, G., Mujalli, R., Calvo, F.J.: Analysis of traffic accidents on rural highways using Latent Class Clustering and Bayesian Networks. Accid. Anal. Prev. 51, 1–10 (2013) 34. de Oña, J., Mujalli, R.O., Calvo, F.J.: Analysis of traffic accident injury severity on Spanish rural highways using Bayesian networks. Accid. Anal. Prev. 43(1), 402–411 (2011) 35. Mujalli, R.O., De ONa, J.: A method for simplifying the analysis of traffic accidents injury severity on two-lane highways using Bayesian networks. J. Saf. Res. 42(5), 317–326 (2011) 36. Zong, F., Xu, H., Zhang, H.: Prediction for traffic accident severity: comparing the Bayesian network and regression models. Math. Probl. Eng. 2013 (2013) 37. Gregoriades, A., Mouskos, K.C.: Black spots identification through a Bayesian networks quantification of accident risk index. Transp. Res. Part C: Emerg. Technol. 28, 28–43 (2013) 38. Mbakwe, A.C., Saka, A.A., Choi, K., Lee, Y.-J.: Modeling highway traffic safety in Nigeria using Delphi technique and Bayesian network. In: TRB 93rd Annual Meeting Compendium of Papers, Washington, D.C., p. 21 (2014) 39. University of Michigan: Michigan Traffic Crash Facts (2017). https://www.michigantraffic crashfacts.org. Accessed 29 July 2017 40. Mittal, A. (ed.): Bayesian Network Technologies: Applications and Graphical Models: Applications and Graphical Models. IGI Global (2007) 41. Jensen, F.V.: An Introduction to Bayesian Networks, vol. 210, pp. 1–178. UCL Press, London (1996) 42. Margaritis, D.: Learning Bayesian network model structure from data (No. 
CMU-CS-03153). Carnegie-Mellon University, Pittsburgh, PA School of Computer Science (2003) 43. Powers, D.M.W.: Evaluation: from precision, recall and f-measure to ROC, informedness, markedness and correlation. J. Mach. Learn. Technol. 2(1), 37–63 (2011)


44. Tape, T.G.: Interpretation of diagnostic tests. Ann. Intern. Med. 135(1), 72 (2001) 45. Fagnant, D.J., Kockelman, K.: Preparing a nation for autonomous vehicles: opportunities, barriers and policy recommendations. Transp. Res. Part A: Policy Pract. 77, 167–181 (2015) 46. Silberg, G., Wallace, R., Matuszak, G., Plessers, J., Brower, C., Subramanian, D.: Selfdriving cars: the next revolution. White paper, KPMG LLP & Center of Automotive Research, p. 36 (2012) 47. Yeomans, G.: Autonomous vehicles: handing over control—opportunities and risks for insurance. Lloyd’s (2014) 48. SAE International: Taxonomy and definitions for terms related to driving automation systems for on-road motor vehicles (2016) 49. Kummerle, R., Hahnel, D., Dolgov, D., Thrun, S., Burgard, W.: Autonomous driving in a multi-level parking structure. In: 2009 IEEE International Conference on Robotics and Automation, ICRA 2009, pp. 3395–3400. IEEE, May 2009 50. Schipperijn, J., Kerr, J., Duncan, S., Madsen, T., Klinker, C.D., Troelsen, J.: Dynamic accuracy of GPS receivers for use in health research: a novel method to assess GPS accuracy in real-world settings. Front. Public Health 2, 21 (2014) 51. Durden, T.: Tesla autopilot crash caught on dashcam (2017). http://www.zerohedge.com/ news/2017-03-02/tesla-autopilot-crash-caught-dashcam. Accessed 17 June 2017 52. National Highway Traffic Safety Administration: Cybersecurity best practices for modern vehicles. Report No. DOT HS, 812, p. 333 (2016) 53. Petit, J., Shladover, S.E.: Potential cyberattacks on automated vehicles. IEEE Trans. Intell. Transp. Syst. 16(2), 546–556 (2015)

Automobile Automation and Lifecycle: How Digitalisation and Security Issues Affect the Car as a Product and Service?

Antti Hakkala1 and Olli I. Heimo2

1 Department of Future Technologies, University of Turku, Turku, Finland
[email protected]
2 Turku School of Economics, University of Turku, Turku, Finland
[email protected]

Abstract. In this paper, we argue that to accommodate the change brought by autonomous and semi-autonomous cars, the current lifecycle model of a car should be re-assessed. Software security issues for vehicles are amplified significantly as vehicles become more dependent on software and the driving process transitions towards autonomous operation. As vehicle automation levels increase and fully autonomous cars are already on public roads, the options for malicious parties to influence and harm traffic by hacking vehicles multiply, increasing the risk levels of traffic. The magnitude of this issue is huge, as most countries in the Western world are dependent on road travel, and the problem is a multidisciplinary one, ranging from business, technology, and security to juridical, sociological, and ethical issues. We state that as digitalisation and AI-based solutions take part in driving processes, we must start thinking of cars more as a service and not as a standalone product. We argue that there should be more focus and further research on the issues of software maintenance and new software models to keep the roads safe from outdated and dangerous automated or semi-automated vehicles.

Keywords: Autonomous vehicles · Security · Ethics · Life cycle

1 Introduction

Road vehicle automation is here. Computers are already controlling the driving process as the modern road vehicles are full of driver aids, and computers are even replacing humans as truck drivers. Google's AI is driving down the streets of California [1–3], and militaries plan to use automated vehicles to support troops in dangerous areas [4]. Drones fly deliveries through our skies [5], and Elon Musk has predicted that no one will be allowed to drive a car in the near future [6], because automation supposedly makes fewer mistakes.

Semi-automated vehicles have already improved the driving process by making driving easier, shortening braking distances, increasing driver's control of


the car, and monitoring and optimising the engines and transmissions. Automated vehicles have the potential to solve many more problems, e.g. people driving while tired, distracted, intoxicated, or with limited skills. Human mistakes can be eliminated by taking the human out of the loop. Even without full autonomy, future cars will host impressive and ever-increasing amounts of new sensor technology and computational capacity designed to improve the performance and safety of the car.

It is important to observe that practically everything controlled by a computer can be hacked, and modern cars are not exempt. This in turn creates enormous pressure for securing car systems on all levels: hardware, software and communications. Hardware must be robust and resilient to both intentional attacks and unintentional failures, communications must be secured from internal networks and buses to Vehicle-to-X (V2X) communication links, and software security and safety must be assured from internal control logic to potential third-party software. This is a daunting task, as failure will put at risk all those people on streets, roads, and alleys where (semi-)autonomous vehicles roam.

The Internet of Things (IoT) has emerged as a platform of interconnected everyday devices equipped with microprocessors and the capability to collect, process and share data with other similar devices and backend servers. As IoT solutions became more common, hacking them became more commonplace too. Nowadays a news report of a refrigerator sending junk mail [7] would not even get published, while similar news went viral just a few years ago! Presently, the news portrays hacking incidents on a massive scale. According to Kaspersky Lab, millions of computers are mining crypto currencies without their owners' knowledge [8]. In addition, major ransomware attacks have been made in recent years, where computers have been taken over and the information contained within them has been encrypted, rendering them inoperable (see e.g. [9–11]).

Given the examples above, the role of the car has changed dramatically, as cars have become mobile, interconnected computational platforms that need regular software updates, similarly to other smart devices. Current smart devices have drastically shorter product life and update cycles compared to cars. If cars are to develop towards a service model and platform economy—Car as a Service (CaaS) on a platform of your choice—in the future, security issues observed in other "smart" devices such as smartphones and IoT devices will certainly affect the car life cycle.

In this paper we discuss the development of autonomous road vehicles and what this development will bring forth in the near future. We focus on the issues of what kind of product and service the automobile of the future is from a life cycle perspective, and how we should approach these at the current phase of development. The paper itself is a literature-based theoretical analysis of the current situation. It also serves as a higher-level guide on what issues should be taken into account both security- and business-wise while developing these mechanical marvels.

1.1 Organization of the Paper

The rest of the paper is organized as follows. In Sect. 2 we overview the principles and status of vehicle automation. In Sect. 3 we discuss car life cycle and how it will change in the future. In Sect. 4 we analyze the effects of cyber security and safety concerns to the car life cycle, and proceed to discuss the developments further in Sect. 5. Finally we conclude the paper with closing remarks in Sect. 6.

2 Automated Road Vehicles

The contemporary automobile already hosts multiple Electronic Control Units (ECUs) responsible for controlling various operational functions of the car, ranging from engine and drivetrain control to brakes and entertainment systems. Each ECU executes its own programming with lines of code numbering in the millions, and given that a modern car has around 50–70 individual ECUs, an estimation that a car runs on 100 million lines of code, or more, is certainly reasonable [12]. Although computers are already responsible for many operational tasks, the driving responsibility is still in the hands of the human driver for most, if not all, situations. Indeed, the share of control a computer has over the actual driving task is a key metric in defining what an autonomous vehicle is. The key terms and concepts for autonomous vehicles are defined in the Society of Automotive Engineers (SAE) standard J3016, “Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles” [13]. It establishes six levels of driving automation for on-road vehicles, going from 0 (no automation) to 5 (full automation). On levels 0–2 the human driver is in charge, while on levels 3–5 the computer has the (main) responsibility for driving the vehicle. Autonomous driving levels are illustrated in Fig. 1. On level 0, all aspects of the Dynamic Driving Task (DDT) (i.e. real-time operational and tactical control of the car) are controlled by the human driver. If there are automatic safety or warning systems that observe the environment, they only warn the human driver if defined safety parameters are violated, and do not interfere with the DDT. On level 1, an automated system can interfere with steering OR acceleration/deceleration, while on level 2 multiple automated systems together can interfere with steering AND acceleration/deceleration, effectively taking full control of the car from the human driver. On level 3 and onwards, the Automated Driving System (ADS) oversees the DDT. On level 3 the human driver is the backup operator should the ADS encounter a situation it cannot handle. On level 4 the ADS is expected to oversee the DDT and, for some driving modes or scenarios (e.g. driving on a freeway, parking, etc.) manage error situations where a human backup operator would be required on a level 3 system. On level 5, the ADS is in full control of the car, in all situations and all driving modes, without the need (or even the possibility of) human intervention. As the automation level increases, the computer must make more decisions (both ethical and operational) as the human driver increasingly becomes a passenger in the vehicle due to decreasing ability to control the vehicle. Indeed, the J3016 standard explicitly states that on SAE levels 4 and 5, the ADS does


not have to immediately disengage from the DDT upon human request for control [13, pp. 20–21].

Fig. 1. Summary of SAE levels of driving automation [13]
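The division of responsibility described above can be summarised compactly. The mapping below restates the SAE J3016 levels from the text as a small data structure for reference (a sketch, not an excerpt from the standard):

# Per SAE level: who performs the dynamic driving task (DDT) and who provides the fallback.
SAE_LEVELS = {
    0: ("human drives; systems may only warn", "human"),
    1: ("human drives; automation assists with steering OR acceleration/deceleration", "human"),
    2: ("human supervises; automation controls steering AND acceleration/deceleration", "human"),
    3: ("ADS performs the DDT", "human backup operator"),
    4: ("ADS performs the DDT in defined driving modes", "ADS within those modes"),
    5: ("ADS performs the DDT in all situations and modes", "ADS"),
}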

2.1 Technical Implementation

As is shown in Fig. 1, autonomous vehicles need to steer the vehicle, monitor the driving environment, and make decisions on how to perform the DDT. For monitoring the environment, an autonomous vehicle must rely on various sensors. Common sensor types include video feeds in various wavelengths (normal light, infrared), ultrasound, LIDAR and RADAR. All these sensors provide information upon which decisions on how to perform the DDT are founded. A model of the dynamic environment based on sensor observations is used to determine the surroundings of the vehicle: whether there are any obstacles on the road, where other vehicles are located, whether pedestrians are present, and so on. Should the data provided by the sensors be somehow compromised, either by accident or by design, the decisions made by the vehicle on how to perform the DDT are not necessarily correct. The integrity of the sensor data is therefore critical to the correct functioning of the car. Should an adversary wish to interfere with an autonomous vehicle, a simple way is to attack the sensors, causing the vehicle, for example, to incorrectly perceive a pedestrian as something else, or to fail to observe them altogether.


A modern non-autonomous car is controlled by the teamwork of computer and driver, and both controllers have their own separate fields of responsibility, as is illustrated by SAE automation level 0–2 definitions. Some overlapping fields, e.g. collision avoidance systems, exist in contemporary vehicles. Many new cars are equipped with systems that take control of the DDT in some predefined scenarios, such as when the car is in danger of colliding with a pedestrian. Most major car manufacturers provide SAE level 1 or 2 Advanced Driving Assistance Systems (ADASs) with collision avoidance as an option in their vehicles.

2.2 Status of Vehicle Automation

Few truly autonomous vehicles (SAE level 3 or higher) are on the road as of this writing, in January 2019. Google reports that their autonomous cars, currently developed under the name Waymo, have travelled over 5 million miles. In addition, BMW, Nissan, Ford, General Motors, Delphi, Tesla, Mercedes Benz, and Bosch have their own projects going on [14]. Thus far, no car company has an autonomous vehicle beyond SAE level 3. Tesla offers autonomous driving capability and claims that its hardware is ready for SAE level 5, although the software does not meet these standards yet. They argue that the software can be updated afterwards to match the SAE level 5 requirements. These autonomous cars have been found to work in optimal conditions in places such as California or Nevada, where the weather conditions are generally favorable. The limitations appear in wintertime and other similar "bad weather" conditions, due to missing road markings and the like. In winter conditions, an autonomous vehicle is so far forced to rely on extremely precise location services and pre-processed route information, limiting the maneuverability and speed of the vehicle [15]. During the brief testing period on public roads, autonomous road vehicles have been involved in numerous accidents, even fatal ones. So far, four accidents involving fatalities have occurred, one in China and three in the United States [16–18]. In some cases, the functionality of the ADS is questioned, while in others there is a clear indication of human operator error.

2.3 Why Autonomous Cars?

As an artefact, the car will change significantly in the following decade due to various developments. Given properly functioning sensors and software, an autonomous (or semi-autonomous) car will have operational capabilities beyond those of human drivers. For example, infrared vision, significantly shorter reaction times, freedom from distractions or tiredness, and more economical driving are all possible with a computer behind the wheel. To emphasize: on paper, a computer should be a better driver than a human. But how about on the tarmac? If a computer beats a human in safe driving, road safety will improve as automation increases. Autonomous vehicles should, if done properly, reduce


accidents and thereby the injuries and deaths caused by human error. As society's aim is to keep its members safe, at first glance it should not be a difficult choice whether to allow autonomous vehicles to operate on public roads. Public transport could also benefit from automation. A major expense in public transport, as in many other fields, is wages. Autonomous vehicles do not require pay or complain about long working hours, being capable of operating 24/7. Moreover, the errors made by overworked bus drivers could be mitigated by automation. In addition, the private sector could benefit from the increased safety brought by autonomous vehicles, as well as from potential savings that would have a huge impact on many economies. To emphasise, the most common job in most U.S. states is driver [19]. The paradigm shift brought by widespread adoption of autonomous vehicles would be comparable to the introduction of the automobile and the subsequent decline of the horse carriage driver profession.

2.4 The Way Forward

The first phase of this major societal shift has already been in progress for a while. Cars have become increasingly automated and thus more dependent on computerised systems, and will increasingly be so in the future. Therefore, these systems must also be protected against unauthorised and unwanted access and tampering. The car is therefore not only a data store or a system that produces data; it is also a tool used for making day-to-day life easier and safer. No one would buy a car that does not inherently promote the values of safety and security. Besides the driver, the security aspects must also be considered from the point of view of passengers and other road users.

3 Car as a Service or to Service a Car?

As an entity, the car has traditionally been seen as a product one buys, but this is not the whole truth. If cars are recalled due to issues in functionality or security, for example, car manufacturers are still responsible for the costs. That is, the car can also be seen partly as a service, and by buying a car one is, to an extent, guaranteed a serviceable and functional car. Even more, car manufacturers market their products with extended warranties, free services, and free insurance, thus blurring the line between product and service. Yet the service rarely includes "free" components, e.g. brake disks, when they must be changed. Car manufacturers include the production and delivery of spare parts in their business models, but depending on the manufacturer, the production of "original" spare parts ends after a decade or two. This leaves the customer with a single option: buy aftermarket parts.

In the software industry, a product is more often sold as a service, due to the requirements of upkeep and updates. Whereas a software product may require updates to work with new operating systems, for example, it also requires security updates when new threats and security issues are discovered.


Other products also require a cloud-based service to function. In the mobile industry, it is standard operating procedure to cease providing the service that is required to keep the product functional and thus force the customer to buy a new phone, should they want to use the latest applications or have the latest security updates. With cars, though, security and functionality updates are more important. Whereas an unsafe mobile phone can lead to many kinds of disastrous consequences, the mobile device is not two tons of steel moving down the highway at 120 km/h. The impact of a hacked phone can thus be seen as lesser than that of a hacked road vehicle.

Therefore, it is reasonable to assume that in the near future, information security will become a criterion for evaluating the roadworthiness of a car. Governments already have systems and processes for road vehicle inspections that aim to guarantee a decent level of security (and emissions) for road traffic. Why should this inspection process not include the information security of a modern automated or semi-automated car? Are we not interested in security, or is it just too new for us in this context? Surely it will be included in the roadworthiness evaluation process, eventually.

So the big question is: if cars are becoming increasingly dependent on security and functionality updates, what is the proper life-cycle for a car? Most cars stay functional for over two decades when used sparingly and maintained properly, and thus in many countries the average car age can be as high as 17 years [20]. This means, in essence, that for every new car there is one over-30-year-old vehicle on the road. From a business perspective, though, this is bad for car manufacturers. Even though the longevity and robustness of cars can be used as a marketing asset, in the end car manufacturers are in the business of selling new cars, not giving away free update packages for old ones. This of course creates a conflict of interest between car owners and manufacturers.

As information security inspections in roadworthiness evaluation processes are due in the near future, the question is who will pay for it all? As of now, car manufacturers are required to fix all faults in their designs, and a hackable computer would seem to be one. Will that also be the case in the future, or should the customer be forced to pay for the updates after a decent time, similarly as they pay for a new set of tires or for changing a leaky radiator? To be more precise: is the software a product sold as is, or is it a service to which the customer is entitled? These questions of course are not something that can be solved in a single article, due to their multidisciplinary nature and especially due to the juridical and political aspects of the issues. Due to the unique nature of security, any serious vulnerability found can render a product, in this case a car, dangerous in mere minutes. That is, when found and made public, a security vulnerability that is maliciously exploited in the wild can endanger a significant number of human lives. If, and when, this possibility is taken into consideration by government vehicle inspection systems, they can, in the worst-case scenario, deny the use and operation of such vehicles altogether until a patch has been made and installed.

4 Security and Car Life Cycle

Whereas the driver of a car is relatively hard to hack, computer systems are more vulnerable. To protect all interest groups, hacking such critical computer systems should be made as hard as possible; unhackable computer systems do not exist. Or as Eiza and Ni state about cars and computers: "if you see word 'software', replace it with 'hackable'; and if you see word 'connected', replace it with 'exposed'" [21].

Vulnerabilities and built-in features in different cars that can be used against the driver have been around for a while. One of the clearest examples was the vulnerability found in the Jeep Cherokee, resulting in the recall of 1.4 million Cherokees after researchers demonstrated they could remotely hijack the car's system over the internet. The attack was described as one ". . . that lets hackers send commands through the Jeep's entertainment system to its dashboard functions, steering, brakes, and transmission, all from a laptop that may be across the country" [22]. Numerous other examples also exist; another clear threat is tampering with the sensors of the car from the outside [21].

It is important to notice that almost everything a hacker can make a computer or a phone do, they can do to a car, and much more. The car is a computer with wheels and an engine. It is not that different from other computational platforms and, with the right expertise, could thus be turned into a (gasoline-powered) cryptocurrency mining platform, for example. A car could also be the target of a ransomware attack that demands payment just to get the car to work, or that holds the health and well-being of the people inside it as ransom. Imagine your vehicle suddenly announcing that unless 200 € worth of bitcoin is paid to a specified bitcoin wallet, the autonomous vehicle will deliberately crash, with the passengers still within. We need cybersecurity in which not only the information and its integrity, and not only the communication, are protected, but in which the physical world and the assets that the security must protect are also taken into account. E.g. Simson Garfinkel has stated that hackers are the real obstacle for self-driving cars [23].

While the car has essentially become a computer (a set of networked computers, actually) with wheels and an engine, it is also approximately 2 tons of metal moving at over 30 m/s (or a bus or truck moving a tad slower but weighing 10 to 80 tons), with humans inside and moving in an environment with people in the vicinity. Therefore, the possibilities for a hacker to cause harm are greater when hacking a car than when hacking a normal computer.

In the context of vehicles in general and cars in particular, cyber security and safety are strongly connected. Safety engineering is already strongly embedded in the automotive industry, but cyber security has not yet been seen as important a factor. As disciplines, both cyber security and safety aim to prevent or mitigate unwanted events, whether intentional or not. When cars move towards more automation, the computational capacity embedded in a car increases, multiple communication channels to and from the car exist, and the reliance on the correct function of various sensors increases. Should the development towards fully autonomous vehicles continue, it will lead to cyber security engineering reaching


parity with safety engineering in the automotive industry, as the role of software and computational platforms and their security will be emphasized overall.

4.1 How Can Cars Be Hacked?

Adding complexity to a computer system increases the attack surface available to a malicious actor. As Bruce Schneier has observed in an interview, "complexity is the worst enemy of security" [24]. And make no mistake, cars are already complex machines. As we already discussed, being controlled by 50–70 ECUs and more than 100 million lines of code must leave room for errors to occur.

At a high abstraction level, the attack surfaces on a modern car can be divided into internal and external [25]. For an adversary to use the internal attack surface of a car, they must have physical access. External attack surfaces do not require physical access, but rather allow an adversary to attack the car over a distance. A well-known piece of conventional wisdom in cyber security states that an adversary gaining physical access to the target system equals game over for the defenders, as with physical access it is possible to hack any device. Even tamper-proof integrated circuits and cryptoprocessors can be hacked (see, e.g., [26]).

The methods for gaining access to the systems of a modern car, absent direct physical access, can be grouped into three categories: indirect physical access, short-range wireless, and long-range wireless methods [25]. Indirect physical access methods include attacking through the OBD-II port indirectly by first compromising a computer used for diagnostics, or attacking the entertainment system with a specifically crafted media file that plays normally but also contains a malicious payload that exploits a vulnerability in the entertainment system and takes control of the car. Short-range wireless attack vectors include Bluetooth systems, RFID car keys, wireless tire pressure monitoring systems, and Wireless Local Area Networks, among others. Long-range wireless attack vectors include broadcast channels such as RDS radio systems and targeted radio channels such as remote telematics systems operating over cellular networks. All the attack methods discussed above can be, and have been, exploited to gain complete control of all systems of a car [25].

Recent examples of serious vulnerabilities using indirect physical access and wireless attack surfaces include those found in Tesla [27] and BMW [28] vehicles, both discovered by the same Chinese security team. The Tesla vulnerability was one of the first practical attacks capable of taking complete control of a state-of-the-art car over a wireless connection. Altogether 14 different vulnerabilities were found in BMW vehicles, with various levels of access required. Six remote vulnerabilities are detailed in the report, using Bluetooth and cellular networks.

But the vulnerabilities in one car model do not directly translate to another make and model, let alone another car manufacturer, right? It would at first seem logical that vulnerabilities in car systems would be limited to a single car model or a limited subset of each model, but unfortunately, this is not the case. Like the aviation industry, the automotive industry relies on standardisation to provide safety, interoperability and cost savings in the manufacturing process. This also means that the same standardized technologies, parts, controllers, and


modules manufactured by automotive industry component providers are used in various car models across different manufacturers. To give a recent example, researchers have found vulnerabilities in keyless entry systems used in cars manufactured by the VW Group, a manufacturer with a market share of more than 25% of cars sold in the EU [29]. The vulnerability affects most car models manufactured by the group between 1995 and 2016, and a vulnerability in another keyless entry system, Hitag2, affects cars of various models from ten or more different manufacturers [30]. Even older cars that do not expose wireless attack vectors by themselves can be attacked if aftermarket entertainment or diagnostic systems are installed. Modern Android-based aftermarket entertainment systems offer wireless connectivity over Wi-Fi and Bluetooth, as well as access to internal car networks through the OBD-II bus. It has been shown that access to the internal network of a vehicle gives an attacker complete control over all systems of the car [25]. Many car owners who just want to upgrade their car entertainment systems may overlook this security aspect. To our knowledge there are so far no known accidents or other security issues involving these systems, but it may just be a matter of time.

4.2 "Your Vehicle Is Missing Critical Security Updates"

It is paramount to understand that as cars become more computerised, they will also have more in common with other modern computerised devices such as smart phones, televisions, or other smart appliances. Like these more mundane devices, the issue of software updates must also be considered for cars. As modern telephones, home appliances, and now cars are controlled by software, the software must be kept up to date to protect them against hackers who try to exploit vulnerabilities. Cybersecurity solutions must thus be implemented and maintained with meticulous care for the whole life-cycle of the car.

A major issue in cyber security is the existence of old and even obsolete devices that are still actively used, regardless of discovered vulnerabilities that leave those devices open for exploitation. For example, the WannaCry ransomware that spread in the wild in May 2017 [10] used EternalBlue, a vulnerability in the Windows operating system that was publicly exposed in April 2017 [31], to spread from device to device. After disclosure, it was immediately patched by Microsoft, even in old and deprecated versions of Windows that no longer received any updates, such as Windows XP. Multiple organizations were still heavily affected by this attack. For example, the UK National Health Service (NHS) still had hundreds of thousands of computers running unpatched Windows XP as their operating system, and these computers were targeted by the ransomware, leading to a serious compromise of NHS systems [32].

EternalBlue exemplifies the issue of obsolete, vulnerable devices that can be exploited with serious consequences to all stakeholders. Autonomous vehicles, due to their unique functionality and long product life cycle (not everyone


can afford to drive a new car), are in danger of being used for malicious purposes if (or rather, when) vulnerabilities are discovered. As we discussed earlier, software is hackable, and software providers must update their software whenever security vulnerabilities are found to ensure correct and safe operation of the system. Sometimes such vulnerabilities are patched without any public scrutiny or incidents, but we have also seen some spectacular security failures in devices. The aforementioned EternalBlue vulnerability had allegedly been used for gaining unauthorized access to computer systems long before its publication, and the swiftness of the response by Microsoft spoke volumes about how serious a threat the vulnerability was.

But what happens to devices that do not receive any updates? And, to extend this to the vehicular domain, what happens when the device is a car? Jang et al. have examined what happens to technology in the event of an economic and infrastructural collapse [33]. While we are not discussing the end of civilization, but rather what happens when the manufacturer of a car goes out of business completely and no new updates are received, some of their conclusions do provide us insight. They found that in the event of a collapse, software (and, specifically, the lack thereof) presents a more serious challenge to computational continuity than hardware. When no one is writing new software or updates, the threats posed by malware, data corruption and disconnection from infrastructure are more serious than hardware issues.

When a smart device reaches the end of its update cycle, it is probably still far from its actual end of product life. This leaves untold numbers of network-connected devices online that are vulnerable to attacks, some of which will never be reported. An alleged CIA hacking tool, codenamed Weeping Angel, can be used to gain remote access to various Samsung Smart TV models [34]. Many Samsung devices with a vulnerable firmware version are still in active use, however, as people do not automatically replace a working device "just" because a vulnerability has been found. Such vulnerable devices can thus be found in living rooms across the globe, sometimes even creating quite the professional dilemma for a security researcher who happens to appreciate privacy, security, and a 65" screen. Earlier "dumb" phones such as the Nokia 3110 are "eternal" with regard to both hardware and software: they are still fully functional even 20 years later. But a smartphone bought five years ago might already have reached the end of its support life, and any subsequent vulnerabilities found in the software will compromise the security of the device and the data stored on it.

Now, let us consider the scenario where a car manufacturer is unable, for whatever reason, to provide updates of any kind to a vehicle. At first there will be little to no observable effect; the car will continue to function normally. Perhaps some subscription-based services will not be available, but the core functionality of the car is still there. For most vehicles, the loss of software support will not affect them for a long time. Problems arise when security and safety issues emerge, and they can effectively render a car nonfunctional, whether by causing a malfunction or by the relevant regulatory agency banning the use of


an unpatched and thus unsafe vehicle on the road. The security dimension of the problem gets worse the more autonomous the vehicles are. SAE level 0–2 vehicles will most likely have next to no issues with a lack of software updates, but as the complexity of the systems increases with higher automation, so do the potential attack surfaces exposed to malicious actors.

In addition to vehicle internal networks and their security, V2X communications pose another dilemma. Cars are developing towards a more connected environment and in the future will be both receiving and transmitting information to and from their surroundings, whether other cars (Vehicle-to-Vehicle, V2V) or the surrounding smart infrastructure (Vehicle-to-Infrastructure, V2I). For example, secure Over-the-Air (OTA) updates to vehicle software and firmware, intrusion detection, and securing internal car networks are all still very much open problems [35]. An acceptable solution to these problems would make managing the overall security problems easier, as the problem of securing a (semi-)autonomous car is already complex even when we focus on the vehicle itself. If we also include the security issues brought by insecure, unpatched, or malfunctioning devices and infrastructure around the car, the problem becomes even more difficult and complex to solve in a satisfactory manner.

4.3 The Car as a Platform and Security Considerations

Cars have become increasingly complex with the integration of computer systems. The computers are getting more operational responsibility, or all of it. Will the cars of tomorrow have sufficient software updates to stay secure and safe to use? Will they have a pre-determined life cycle limited by the end of the technical support lifetime? Will there be museum cars in X years? Similar concerns surface also in the case of manufacturer bankruptcy. Will a car suddenly become dangerous to its passengers should the software update cycle end, whether due to planned obsolescence or bankruptcy? If the car is moving towards a platform model, where you do not buy a car but buy into an ecosystem of different software and hardware producers and select the features and products you want, how does this affect security for vehicles, autonomous or not? As we have already discussed, even contemporary cars are vulnerable to malicious attacks, and the risks involved are amplified with the increase in levels of autonomy.

In the movies, whenever "the baddies" need to get access to systems ("Hack all the cars in a five-mile radius and do it now!"), they start hacking furiously and, usually sooner rather than later, they get access to numerous devices all around the globe, with little to no preparation at all. In real life, though, while a similar feat is arguably within the realm of possibility, it certainly is beyond the realm of plausibility. Real-life cyberattacks that compromise hundreds or thousands (or hundreds of thousands, like the Mirai botnet [36] did) of devices are more of a dull affair. Such attacks are conducted over a long period, and compromised devices are left operating normally, but with a backdoor that allows them to be taken over when necessary. This can be done by the hacker leaving a backdoor open to the system that the hacker can later take into use, or by spreading


autonomously propagating malware which reports to the hacker when the backdoor is open. The latter is more dangerous, but nowadays fortunately quite rare, as operating system vendors have taken security seriously since the worst outbreaks of the Sasser, MyDoom, Sobig and ILOVEYOU worms back in the early 00's. Because the information security issues in modern cars have not been considered with similar gravity as with desktop operating systems, the potential fallout can be even worse should a similarly lax attitude also extend to autonomous vehicles.

If a malicious hacker were to target only, say, Bugatti or Aston Martin cars, the motive would probably be grand theft auto. Should the attacker want to hack as many cars as possible, they would probably target cars from a large manufacturer, thus giving them more attack surface. As we know, car manufacturers use similar parts and software in different models. For example, a Jaguar X400 is, in fact, only "[a] little more than re-shelled Ford Mondeo" [37]. Therefore, if one finds a security vulnerability in one type of car, it is very likely that a similar flaw can be found in most of the cars produced by the manufacturer. Moreover, the vulnerability can be very widespread, because the largest car manufacturers have huge market shares. For example, the VW Group (Volkswagen, Audi, Skoda, Porsche, etc.) has a market share of around 23% of all cars sold in Europe [38]. If one could infect just these cars, and only those from the last 5 years, one would still have access to more than 10% of all the cars on European roads. This gives the attacker a lot of possibilities, for example to create a large network of bots for DDoS attacks, a network of computers for cryptocurrency mining (not very effective computers, but loads of them!), or to use them for more sinister ends such as the mass hacking of autonomous road vehicles.

If a single car can be used for a terror attack, what damage could an attacker do with hundreds or thousands of cars? One of the worst scenarios is using automated cars as a terror or military first-strike weapon [39–41]. In this scenario a nation or an organisation hacks a massive number of vehicles in another area and triggers them simultaneously to hit pedestrians, other cars, trains, bridges, or other targets, causing a massive number of deaths and injuries while crippling the infrastructure, similarly to heavy bombardment. A nation targeted by such an attack would most likely be in chaos for a long time. There have also been claims that US intelligence services (mainly the CIA) use or have used the vulnerabilities in modern cars for assassination purposes, but these are as yet just claims and should be treated as such [42]. The methods shown above could plausibly be used to execute such assassinations, however.

5 Discussion

Formulating a definite set of parameters for safe autonomous vehicles is a complex task. While this is not the focus of this paper, we argue that we should still have such parameters and standards defined for future autonomous vehicles, just as we require car manufacturers to comply with emission or road safety standards. We should require that new cars, while increasingly safe, do not also


pose a hazard in new areas of road safety (such as terror attacks). And should this seem to be an impossible task for car manufacturers, we still have the cars of today, which mainly fulfil these standards.

We should examine the values we have embedded in automobiles. As we yearn for efficiency and ease of use, we should also remember safety and security for the driver, passengers and other people alike. Whereas automated cars can react faster and do not suffer from distractions, tiredness, diseases, or intoxication, they can be hacked. As shown earlier, the autonomous automobile is a complex set of computers working together, and the words "complex" and "computer" increase the possibility of exploitable vulnerabilities. Moreover, the reasons for hacking autonomous cars are numerous, ranging from financial gain to acts of war, and there are many who stand to benefit when the overall security levels of vehicles, whether contemporary or of the future, are low.

But is the development of more autonomous vehicles inevitable? Can we not go back in time? We would argue that we indeed cannot; the potential benefits of more autonomous transportation (increased efficiency, safety, security, usability, and ecological factors) are a strong driver. We also argue that instead of comparing cars to cell phones, a more apt parallel to the development of autonomous vehicles can be found in the aviation industry. This comparison between industries should be made, as the aviation industry is even more paranoid about safety issues. For example, the engine failure accident that resulted in the death of a passenger aboard a Southwest Airlines flight in April 2018 [43] was the first civilian aviation fatality in the US in 9 years [44]. Should autonomous cars reach safety levels even in the vicinity of those mandatory in aviation, the number of road fatalities would be reduced by orders of magnitude. By applying safety design best practices learned from aerospace engineering, while considering the new issues brought by computerization, safe autonomous road transportation may indeed be possible. The problem we see in this domain is not more computer-controlled systems in itself; the issue is with system connectivity and increased complexity. These issues must be addressed should we want to continue on the path eventually leading to truly autonomous vehicles.

6 Conclusions

We must start thinking about the lifecycle of a road vehicle from a different perspective: that of the software lifecycle. Whereas the cars of yesteryear have managed to last for decades without the manufacturer supporting them, new cars need a new kind of support and maintenance infrastructure in place. If the software lifecycle is not carefully designed, there will very probably be consequences, ranging from temporary bans of certain cars and the disabling of driver-aid computers to full bans of models and series of automobiles from certain manufacturers. To prevent this from happening, we must first be aware of the problem. The manufacturers and government officials, as well as representatives from several NGOs, should start a discourse on how the security and lifecycle problems could be managed optimally. Legislation should be updated to serve both the


present and the future, and this process should be begun before the connected issues manifest on a large scale. That might be sooner than we think! One of the solutions could be a service model where a car manufacturer commits to a certain service level for a certain amount of time, announced when the car is first sold. In this way, the property rights of individuals are better respected. Other ideas and business models as possible solutions include separate service providers, full or partial open source, and computer-hardware customisation services, just to name a few. For example, Determann and Perens have examined open-source vehicular platforms and the issues surrounding them from a legal perspective [45]. Other unsolved issues that arise with increased automation in road traffic concern responsibility, ethics, insurance policies, social problems and, of course, security. These are issues the authors will pursue further in their future research on the subject.

To conclude, we, as a society, should also demand that these mobile computers that are autonomous vehicles are sufficiently secure. Autonomous and semi-autonomous vehicles are perhaps inevitable, but we can control the progress with knowledge, discourse, legislation, and forethought. When done correctly, these marvels of technology should increase safety, efficiency, and ecology on the roads, as well as improve the security, safety, and quality of life of people all around the world.

References 1. Hawkins, A.J.: Autonomous cars without human drivers will be allowed on California roads starting next year, October 2017. https://www.theverge.com/2017/ 10/11/16458850/self-driving-car-california-dmv-regulations 2. Levin, S., Harris, M.: The road ahead: self-driving cars on the brink of a revolution in California, November 2017. https://www.theguardian.com/technology/ 2017/mar/17/self-driving-cars-california-regulation-google-uber-tesla 3. Rushe, D.: End of the road. Will automation put an end to the American trucker? The Guardian, October 2017. https://www.theguardian.com/technology/2017/ oct/10/american-trucker-automation-jobs 4. Magnuson, S.: Driverless trucks poised to join military operations. National Defense Magazine, March 2017. http://www.nationaldefensemagazine.org/articles/2017/ 3/21/driverless-trucks-poised-to-join-military-operations 5. Amazon: Amazon Prime Air (2018). https://www.amazon.com/Amazon-PrimeAir/b?node=8037720011 6. Lowensohn, J.: Elon Musk: cars you can drive will eventually be outlawed, March 2015. https://www.theverge.com/transportation/2015/3/17/8232187/elon-muskhuman-drivers-are-dangerous 7. Bort, J.: For the First Time, Hackers have Used a Refrigerator to Attack Businesses. Business Insider, January 2014. http://www.businessinsider.com/hackersuse-a-refridgerator-to-attack-businesses-2014-1?r=US&IR=T&IR=T 8. Lopatin, E., Bulavas, V.: Miners on the Rise. Kasperksy Lab (2017). https:// securelist.com/miners-on-the-rise/81706/ 9. BBC: ‘Bad Rabbit’ ransomware strikes Ukraine and Russia, October 2017. http:// www.bbc.com/news/technology-41740768


10. BBC: Cyber-attack: Europol says it was unprecedented in scale, May 2017. http:// www.bbc.com/news/world-europe-39907965 11. Constantin, L.: Petya ransomware is now double the trouble. Network World, May 2016. https://www.networkworld.com/article/3069990/petya-ransomware-is-nowdouble-the-trouble.html 12. Gaertner, C.: Trends and lookout of the automotive software industries (2015) 13. SAE International: Taxonomy and definitions for terms related to driving automation systems for on-road motor vehicles. Technical report, SAE International (2016) 14. Wang, B.: Uber’ self-driving system was still 400 times worse Waymo in 2018 on key distance intervention metric. Next Big Future, March 2018. https://www. nextbigfuture.com/2018/03/uber-self-driving-system-was-still-400-times-worsewaymo-in-2018-on-key-distance-intervention-metric.html 15. Tervola, J.: VTT:n robottiauto teki ep¨ avirallisen autonomisen auton nopeusenn¨ atyksen—selviytyy lumipeitteisell¨ a tiell¨ a (Video). Tekniikka & Talous, December 2017. https://www.tekniikkatalous.fi/tekniikka/autot/vttn-robottiauto-teki-epavirallisen-autonomisen-auton-nopeusennatyksen-selviytyylumipeitteisella-tiella-video-6692518 16. Horwitz, J., Timmons, H.: There are some scary similarities between Tesla’s deadly crashes linked to Autopilot, September 2016. https://qz.com/783009/the-scarysimilarities-between-teslas-tsla-deadly-autopilot-crashes/ 17. Levin, S., Wong, J.C.: Self-driving Uber kills Arizona woman in first fatal crash involving pedestrian, March 2018. https://www.theguardian.com/technology/ 2018/mar/19/uber-self-driving-car-kills-woman-arizona-tempe 18. Tesla: An Update on Last Week’s Accident (2018). https://www.tesla.com/blog/ update-last-week’s-accident 19. Bui, Q.: Map: The Most Common* Job in Every State, February 2015. https:// www.npr.org/sections/money/2015/02/05/382664837/map-the-most-commonjob-in-every-state 20. ACEA: Average Vehicle Age, January 2019. https://www.acea.be/statistics/tag/ category/average-vehicle-age 21. Eiza, M.H., Ni, Q.: Driving with sharks: rethinking connected vehicles with vehicle cybersecurity. IEEE Veh. Technol. Mag. 12(2), 45–51 (2017) 22. Greenberg, A.: Hackers Remotely Kill a Jeep on the Highway-with Me in It. Wired, July 2015. https://www.wired.com/2015/07/hackers-remotely-kill-jeep-highway/ 23. Garfinkel, S.: Hackers are the real obstacle for self-driving vehicles. MIT Technology Review. https://www.technologyreview.com/s/608618/hackers-are-the-realobstacle-for-selfdriving-vehicles/. Accessed 22 Aug 2017 24. Chan, C.S.: Complexity the Worst Enemy of Security, December 2012. https:// www.computerworld.com/article/2493938/cyberwarfare/complexity-the-worstenemy-of-security.html 25. Checkoway, S., McCoy, D., Kantor, B., Anderson, D., Shacham, H., Savage, S., Koscher, K., Czeskis, A., Roesner, F., Kohno, T.: Comprehensive experimental analyses of automotive attack surfaces. In: USENIX Security (2011) 26. Anderson, R.: Security Engineering: A Guide to Building Dependable Distributed Systems. Wiley, Hoboken (2010) 27. Keen Security Lab: Car Hacking Research: Remote Attack Tesla Motors (2016). https://keenlab.tencent.com/en/2016/09/19/Keen-Security-Lab-of-Tencent-CarHacking-Research-Remote-Attack-to-Tesla-Cars/ 28. Keen Security Lab: Experimental Security Assessment of BMW Cars: A Summary Report (2018). https://keenlab.tencent.com/en/Experimental Security Assessment of BMW Cars by KeenLab.pdf


29. Leggett, D.: VW Group continues to gain market share in Europe. just-auto.com, May 2018. https://www.just-auto.com/news/vw-group-continues-to-gain-marketshare-in-europe id182728.aspx 30. Garcia, F.D., Oswald, D., Kasper, T., Pavlid`es, P.: Lock it and still lose it—on the (in)security of automotive remote keyless entry systems. In: USENIX Security, pp. 929–944 (2016) 31. The Shadow Brokers: “Lost in Translation” leak (2017). https://github.com/ misterch0c/shadowbroker 32. Clarke, R., Youngstein, T.: Cyberattack on Britain’s National Health Service–a wake-up call for modern medicine. N. Engl. J. Med. 377(5), 409–411 (2017) 33. Jang, E., Johnson, M., Burnell, E., Heimerl, K.: Unplanned obsolescence: hardware and software after collapse. In: Proceedings of the 2017 Workshop on Computing Within Limits, LIMITS 2017, pp. 93–101. ACM, New York (2017). http://doi.acm. org/10.1145/3080556.3080566 34. Wikileaks: Vault 7: CIA Hacking Tools Revealed (2017). https://wikileaks.org/ ciav7p1/ 35. Le, V.H., den Hartog, J., Zannone, N.: Security and privacy for innovative automotive applications: a survey. Comput. Commun. 132, 17–41 (2018). http://www. sciencedirect.com/science/article/pii/S014036641731174X 36. Antonakakis, M., April, T., Bailey, M., Bernhard, M., Bursztein, E., Cochran, J., Durumeric, Z., Halderman, J.A., Invernizzi, L., Kallitsis, M., et al.: Understanding the Mirai Botnet. In: USENIX Security Symposium (2017) 37. Adams, K.: Jaguar X-Type development story. AROnline, November 2011. https:// www.aronline.co.uk/cars/jaguar/x-type-jaguar/the-cars-jaguar-x-type/ 38. ACEA: Average Vehicle Age, January 2019. https://www.acea.be/statistics/tag/ category/by-manufacturer-registrations 39. Harris, M.: FBI warns driverless cars could be used as ‘lethal weapons’. The Guardian, July 2014. https://www.theguardian.com/technology/2014/jul/ 16/google-fbi-driverless-cars-leathal-weapons-autonomous 40. Lewis, J.W.: A smart bomb in every garage? Driverless cars and the future of terrorist attacks, September 2015. https://www.start.umd.edu/news/smart-bombevery-garage-driverless-cars-and-future-terrorist-attacks 41. Lima, A., Rocha, F., V¨ olp, M., Esteves-Ver´ıssimo, P.: Towards safe and secure autonomous and cooperative vehicle ecosystems. In: Proceedings of the 2nd ACM Workshop on Cyber-Physical Systems Security and Privacy, CPS-SPC 2016, pp. 59–70. ACM, New York (2016). http://doi.acm.org/10.1145/2994487.2994489 42. Overly, S.: What we know about car hacking, the CIA and those WikiLeaks claims. The Washington Post, March 2017. https://www.washingtonpost.com/ news/innovations/wp/2017/03/08/what-we-know-about-car-hacking-the-ciaand-those-wikileaks-claims/?noredirect=on&utm term=.c66cb11e06f7 43. Stack, L., Stevens, M.: Southwest airlines engine explodes in flight, killing a passenger. The New York Times, April 2018. https://www.nytimes.com/2018/04/17/ us/southwest-airlines-explosion.html 44. NTSB: Accidents Involving Passenger Fatalities: U.S. Airlines (Part 121) 1982–Present, January 2019. https://www.ntsb.gov/investigations/data/Pages/ paxfatal.aspx 45. Determann, L., Perens, B.: Open cars. Berkeley Tech. LJ 32, 915 (2017)

Less-than-Truckload Shipper Collaboration in the Physical Internet

Minghui Lai¹ and Xiaoqiang Cai²

¹ School of Economics and Management, Southeast University, Nanjing, People's Republic of China
² Shenzhen Key Laboratory of IoT Intelligent Systems and Wireless Network Technology, The Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, People's Republic of China
[email protected]

Abstract. This paper studies the shippers' less-than-truckload collaboration in the Physical Internet (PI) for logistics. The PI offers a new way to address the efficiency challenges in logistics consolidation by exploiting the Internet of Things. The PI facilitates the consolidation of loads from various parties in a much quicker and more efficient way. With the PI, shippers can collaboratively consolidate their freight into truckloads or much larger less-than-truckload loads. Their transportation costs are significantly reduced, due to economies of scale. The collaborative planning problem in the PI is formulated as a non-convex integer network flow model. The problem is generally NP-hard, and a local search heuristic combined with a simulated annealing method is developed. The algorithm is evaluated through varied computational examples. This is the first study that investigates shipper collaboration in the PI.

Keywords: Shipper consortium · Physical Internet · Intelligent transportation system · Less-than-truckload

1 Introduction

In less-than-truckload (LTL) shipping, it is a growing trend in the logistics industry for shippers to set up partnerships with each other to reduce freight costs. [1] summarizes five common forms of shipper partnerships: mode optimization, aggregation, multi-stop truckload, pool distribution, and shipper consortium, ordered by the level of implementation difficulty, from easy to more difficult. This paper studies the collaborative planning problem for a shipper consortium. According to [1], in a shipper consortium, several companies collaborate to combine their freight into larger shipments and shift from LTL to truckload (TL) whenever possible, with the ultimate goal of decreasing handling and costs. Shippers may actively change the timing of shipments if desirable. A shipper consortium is particularly beneficial for those companies that have small freight


quantities with overlapping shipping lanes and delivery time windows. However, shipper collaboration has in the past been more the exception than the norm, due to obstacles mostly related to operational difficulties arising from cross-enterprise and cross-department data inconsistencies, highly complex and dynamic interactions, and also the hidden agendas of self-interested participants.

Recently, advancements in information and communication technologies have presented many exciting opportunities for enhancing logistics. One prominent example is the Physical Internet (PI). According to [2], the PI is a contemporary conceptualization of a highly modular logistics network that mimics the routing of packets over a network of hubs in the virtual Internet. It represents an open, global, interconnected and transparent logistics system, based on the Internet of Things (IoT). [3] formally defines the PI system for city logistics. The main physical components of the PI are containers, hubs and movers. Freight is encapsulated in designed-for-logistics standard, modular, smart and reusable PI-containers, from the size of small cases up to that of cargo containers. PI-containers support real-time identification, communication, state memory and reasoning capabilities. PI-hubs are open logistics facilities interconnected with each other, such as semi-trailer transit centres, cross-docking hubs and warehouses, enabling a global "Logistics Web". Vehicles of various types are openly available to move the PI-containers from hub to hub, as part of multimodal transport, irrespective of who owns them.

In the PI, freight is encapsulated in sets of PI-containers according to its specific destinations and then moved to the open hub nearest to its origin. The PI-containers then travel through the web of hyperconnected open hubs via multiple transporters, all the way to their final destinations. The PI uncouples origin-destination trips into multiple segments between open PI-hubs, and each segment is dynamically assigned to the most appropriate mode and transporters. Handling systems, vehicles and carriers are designed or retrofitted to handle the PI-containers seamlessly and efficiently during the process. The Physical Internet constitutes a path-breaking solution to the inefficiencies of traditional proprietary logistics systems [4]. Since its introduction, the PI has received significant attention from researchers and policy makers. According to [5], the Alliance for Logistics Innovation through Collaboration in Europe plans to fully implement the PI by 2050.

Due to container modularization and advanced technologies, cross-docking at the PI-hubs becomes quicker and less expensive. Furthermore, the PI facilitates the consolidation of loads from various parties, allowing for quicker dispatching and efficient routing through the network [2]. Thus, the PI can perfectly implement the collaboration strategy of a shipper consortium. The PI is orchestrated by an intermediary platform. Over a given time horizon, e.g., one week, a group of shippers in the consortium submit to the platform their LTL freight requests that must be delivered within the planning horizon. With the PI, the platform optimally schedules the routes of the PI-containers that encapsulate the shippers' freight through the PI-hubs. The objective is to create as many full truckloads as possible to reduce the cost of shippers through consolidation,


even delaying the delivery of shipments if necessary. For each route, the segments may be carried by different transporters assigned by the platform. The cost function of each segment is determined by the tariff already negotiated by the platform with the transporters. The tariff structure is typically piecewise linear, consisting of many ranges and minimum charges. Moreover, late delivery causes penalty costs to the shippers, representing losses in freight value.

Shipper collaboration is a very significant problem in the LTL logistics industry [1]. The demand for logistics services in urban areas is continuously increasing and the freight flows become more and more fragmented. Transport capacity is often only partially utilized, resulting in inefficient transport movements within urban areas and giving rise to various external costs, such as congestion, air pollution, and noise hindrance [6]. Consolidating shipments on the shippers' side would effectively improve the efficiency of city logistics.

A multi-commodity network flow model is developed for the platform to optimize consolidation routes in the PI. To incorporate the timing of shipments, the time-space expanded network of the PI is constructed and the freight flows in the network are time-dependent. The model utilizes the characteristics of the PI and provides a theoretical guide on how to optimize consolidation from the shippers' perspective. Due to the complicated transportation cost structure, the optimization problem is non-convex and highly intractable. No exact polynomial-time algorithms are known for such problems. Thus, a local search heuristic combined with a simulated annealing method is proposed to search for approximately optimal solutions. The algorithm iteratively searches for local optima. To escape from a local optimum, the algorithm randomly perturbs the current local optimum and then searches for a new one. This process repeats until the probability of further improvement is sufficiently low.

To position this paper in the literature, a brief survey of previous research is summarized here. First, the literature on LTL collaboration has restricted its problems to special cases, e.g., the two-shipper collaboration case [7,8] and network structures with fixed routes [9–13]. Thus, there is a lack of study of more general LTL collaboration problems. Second, most of the literature on the PI has focused on the system design, cost-benefit analyses, and the operational planning of carriers [4,14]. For example, [15] first studied the PI network design problem. [16] proposed transportation protocols for the PI network. [2] compared shipment consolidation in the PI with a traditional network. [17] and [18] studied the auction bidding problem in PI transportation. Both [19] and [20] investigated the routing problem for the PI. [21] and [22] assessed inventory management policies with a PI distribution system. Clearly, this paper is the first study that considers LTL collaboration in the PI from the shippers' perspective.

The remainder of this paper is organized as follows. Section 2 presents a network flow optimization model for the shipper consortium problem. Section 3 describes a solution algorithm for the optimization model. Then, Sect. 4 discusses the computational experiments used to test the algorithm. Finally, Sect. 5 concludes this paper and discusses future research plans.


2 The Optimization Model

Consider a group of shippers N. Each freight request s of the shippers specifies the origin o_s, the destination d_s, the weight w_s (unit: lbs) at the beginning of the planning horizon, the available time t_s^o, and the due time t_s^d. Define S to be the set of all freight requests. Assume that the delivery of a request cannot be split, i.e., each request is shipped through a single path. The available time, i.e., the earliest time at which a request can be picked up, cannot be altered, but the delivery time can be later than the due time. For ease of modelling, time is discretized as {1, 2, ..., T}. The unit of time could be one day or an integral number of hours, depending on the granularity of operational planning.
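For concreteness, a freight request as specified above can be represented by a small record. The following Python sketch is illustrative only; the class and field names are our own and are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FreightRequest:
    """One LTL freight request s, as defined above (field names are illustrative)."""
    origin: str          # o_s
    destination: str     # d_s
    weight: float        # w_s, in lbs
    available_time: int  # earliest pickup period t_s^o (cannot be altered)
    due_time: int        # due period t_s^d (delivery may be later, at a penalty)

# Example: a 4,000 lb shipment available in period 1 and due by period 3.
request = FreightRequest("A", "B", 4000.0, available_time=1, due_time=3)
```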

Fig. 1. An example of a Physical Internet network. Source: [16].

The shipments are delivered through a PI network. An example of PI given by [16] is illustrated in Fig. 1. A freight request is first sent to the nearest PI-hub from the origin, where it is unloaded from the inbound vehicle and reloaded onto an outbound vehicle together with other pending freights for further distribution, in order to build larger LTL or TL shipments if possible. The outbound shipment then travels to other intermediary PI-hubs for further consolidation or directly to

142

M. Lai and X. Cai

the PI-hub closest to the final destination. The freight on each segment, i.e., an arc between the nodes in the network, may be carried by different transporters, as the PI is an open system. For simplicity, assume that all the PI-hubs are cross-docks, and hence it is impossible to hold freight over time periods at any PI-hub. In each period, an inbound shipment must be loaded onto outbound vehicles within the period. However, freight s can be held at its origin waiting for pickup, incurring a per-unit ground cost h_s. Meanwhile, the late delivery of freight s causes a per-unit loss p_s in each unit of time.

Fig. 2. An example of transportation cost function.

The transportation cost function on each segment for shippers is modelled as a combination of LTL and TL rates. The transportation cost function c_e(x) in the weight x for a vehicle traversing segment e is defined as:

$$c_e(x) = \begin{cases} 0, & \text{if } x = 0;\\ r_e^k x + b_e^k, & \text{if } Q^{k-1} < x \le Q^k,\ 1 \le k \le K;\\ F_e, & \text{if } Q^K < x \le U_e, \end{cases} \qquad (1)$$

where Q^0 = 0, each k with 1 ≤ k ≤ K indexes a segment of the LTL rate, b_e^1 is the minimum charge, and F_e is the TL rate.
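To make the piecewise structure of (1) concrete, here is a minimal Python sketch that evaluates c_e(x). The rate table at the bottom uses illustrative numbers chosen by us (odd segments flat, even segments increasing, F_e = r_e^K · Q^K); they are not values from the paper.

```python
def segment_cost(x, rates, charges, breakpoints, F_e, U_e):
    """Evaluate the piecewise transportation cost c_e(x) of Eq. (1).

    rates[k-1], charges[k-1] and breakpoints[k-1] hold r_e^k, b_e^k and Q^k
    for k = 1..K (Q^0 = 0 is implicit).  Loads above Q^K up to U_e pay the
    truckload rate F_e.  All numbers used below are illustrative.
    """
    if x == 0:
        return 0.0
    if x > U_e:
        raise ValueError("load exceeds vehicle capacity U_e")
    prev_q = 0.0
    for r, b, q in zip(rates, charges, breakpoints):
        if prev_q < x <= q:
            return r * x + b
        prev_q = q
    return F_e  # Q^K < x <= U_e: shipped as a full truckload

# Illustrative rate table with K = 4 segments; the flat/increasing pattern and the
# relation F_e = r_e^K * Q^K follow the structure described in the text.
rates       = [0.0, 0.30, 0.0, 0.25]      # r_e^1..r_e^4
charges     = [150.0, 0.0, 1500.0, 0.0]   # b_e^1..b_e^4 (b_e^1 is the minimum charge)
breakpoints = [500, 5000, 6000, 15000]    # Q^1..Q^4
print(segment_cost(4000, rates, charges, breakpoints, F_e=3750.0, U_e=45000))  # -> 1200.0
```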


The LTL rate is a modified all-unit-discount function satisfying: for flat segments, r_e^1 = r_e^3 = ... = r_e^{K-1} = 0 and 0 < b_e^1 < b_e^3 < ... < b_e^{K-1}; for increasing segments, r_e^2 > r_e^4 > ... > r_e^K > 0 and b_e^2 = b_e^4 = ... = b_e^K = 0; Q^k, k = 1, 3, ..., K-1, are the break points of the increasing segments; Q^k is the indifference point between increasing segments k-1 and k+1, for all k = 2, 3, ..., K-2. The TL rate satisfies F_e = r_e^K · Q^K. By definition, c_e(x) is a continuous and increasing function. In the LTL rate part, each pair of successive segments contains one strictly increasing segment and one flat segment. The first segment is flat, representing the minimum charge. An example of the cost function c_e(x) is illustrated in Fig. 2, where a load between Q^6 and U_e is shipped as a truckload. For most carriers, TL shipments normally range from 15,000 lbs to 45,000 lbs, while LTL shipments are commonly less than 15,000 lbs.

To model time-varying freight flows in the PI, construct a time-space network (V, A), where the vertex set V consists of (v, t) for all t ∈ {1, 2, ..., T} and nodes v including origins, destinations and PI-hubs. A pair of vertices (v_1, t_1) and (v_2, t_2) are connected with a service arc if the arc e = (v_1, v_2) exists in the physical transportation network and t_2 = t_1 + a_e, where a_e is the travelling time from v_1 to v_2. Traversing arc [(v_1, t_1), (v_2, t_2)] represents that the shipment departs from v_1 at time t_1 and then arrives at v_2 at time t_2. For each freight s, link each pair of (o_s, t) and (o_s, t+1) with a ground arc to represent that freight s is held at the origin for one unit of time; also add a dummy vertex denoted by \hat{d}_s, which is the sink for freight s, and link each pair of (d_s, t) and \hat{d}_s with a delivery arc to represent that s is delivered at time t, where t ≥ t_s^o + 1. Let A_L, A_G and A_D denote the sets of service arcs, ground arcs and delivery arcs in the time-space network, respectively. Then A = A_L ∪ A_G ∪ A_D. The time-space network is a directed acyclic graph. Let I(v, t) and O(v, t) denote the sets of incoming and outgoing arcs of vertex (v, t) in the expanded network, respectively. The cost function of a service arc e = [(v_1, t_1), (v_2, t_2)] ∈ A_L is given by c_e(x) = c_{(v_1, v_2)}(x). The cost of a ground arc e = [(o_s, t), (o_s, t+1)] ∈ A_G for freight s is given by h_e^s = h_s w_s. Finally, the cost of a delivery arc e = [(d_s, t), \hat{d}_s] ∈ A_D for freight s is given by p_e^s = p_s w_s (t - t_s^d)^+. All the transportation, holding, and lateness costs are paid by the shippers themselves.

The shipper consortium's collaborative transportation planning problem is formulated as a network flow model. Define the binary variable x_e^s = 1 if freight order s is routed through arc e and 0 otherwise, for all s ∈ S and e ∈ A. The total flow through a service arc e = [(v_1, t_1), (v_2, t_2)] ∈ A_L, equal to \sum_{s \in S} w_s x_e^s, is the load of the consolidated shipment executed by the carrier. The load must not exceed the capacity given by U_e = U_{(v_1, v_2)}. The objective is to determine the optimal flow x := (x_e^s)_{e \in A, s \in S} that minimizes the total transportation, ground holding and lateness cost of the shippers:

$$\min\; \Gamma(x) = \sum_{e \in A_L} c_e\Big(\sum_{s \in S} w_s x_e^s\Big) + \sum_{s \in S}\sum_{e \in A_G} h_e^s x_e^s + \sum_{s \in S}\sum_{e \in A_D} p_e^s x_e^s \qquad (2)$$


(CP) s.t.

$$\sum_{e \in I(v)} x_e^s - \sum_{e \in O(v)} x_e^s = q_v^s, \qquad \forall\, s \in S,\ v \in V; \qquad (3)$$

$$\sum_{s \in S} w_s x_e^s \le U_e, \qquad \forall\, e \in A_L; \qquad (4)$$

$$x_e^s \in \{0, 1\}, \qquad \forall\, e \in A,\ s \in S, \qquad (5)$$

where q_v^s, defined below, is the demand for freight s at vertex v ∈ V:

$$q_v^s = \begin{cases} -1, & \text{if } v = (o_s, t_s^o);\\ 1, & \text{if } v = \hat{d}_s;\\ 0, & \text{otherwise.} \end{cases} \qquad (6)$$

In the objective function (2), the three terms represent the total transportation cost, the total ground holding cost and the total lateness cost, respectively. Constraints (3) enforce network flow balance. Constraints (4) impose the vehicle capacity for shipments on all the service arcs. Constraints (5) state the binary requirement on the flow variables to ensure that no shipment is split. In this problem, a feasible delivery path of freight request $s$ is a path from $(o_s, t_{o_s})$ to $\hat{d}_s$ in the time-space network. Assume each freight request has at least one feasible delivery path.

Remark 1. The optimization model uses time-space network flows to represent the time-dependent routing of shipments. This approximates the practical routing operations in the PI, where the error depends on the granularity of time. For large-scale networks and a small unit of time, the time-space network becomes huge and the model correspondingly hard to solve.
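As a concrete, purely illustrative reading of the piecewise rate structure above, the following Kotlin sketch evaluates an arc cost $c_e(x)$ given its LTL segments and TL charge. The segment representation and all class and function names are assumptions introduced for this example; they are not taken from the paper.

```kotlin
// Each LTL segment is linear: cost = rate * load + intercept.
// Flat segments have rate 0 and a positive intercept; increasing
// segments have a positive rate and intercept 0.
data class RateSegment(val upperBreakPoint: Double, val rate: Double, val intercept: Double)

class ArcCostFunction(
    private val ltlSegments: List<RateSegment>, // ordered by upperBreakPoint
    private val tlCharge: Double,               // flat TL charge F_e = r_e^K * Q_K
    private val capacity: Double                // U_e
) {
    fun cost(load: Double): Double {
        require(load in 0.0..capacity) { "load must lie in [0, U_e]" }
        if (load == 0.0) return 0.0
        val segment = ltlSegments.firstOrNull { load <= it.upperBreakPoint }
            ?: return tlCharge                  // above the last LTL break point: truckload
        return segment.rate * load + segment.intercept
    }
}
```

Such a helper is only consistent with the model if the supplied segments satisfy the continuity and monotonicity conditions stated above.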

3 Heuristic Algorithm

The collaborative optimization model (CP) is a capacitated, non-convex integer program with combinatorial structure. Since the fixed-charge network flow problem is a special case, problem (CP) is generally NP-hard. Thus it is necessary to develop a heuristic algorithm that searches for approximately optimal solutions by combining local search and simulated annealing. In the following, the design of the algorithm is described step by step.

3.1 Local Search Subroutine

The core of the heuristic algorithm is the local search subroutine. Given an initial solution of problem (CP), i.e., initial delivery paths of the freight requests, the algorithm iteratively searches for a local optimum that improves the current solution. Since no shipment split is allowed, the local search algorithm only needs to solve a series of shortest path problems, as described in Algorithm 1.


Algorithm 1. Local Search Subroutine (LSS)
0 Initialization. Let $\mathring{x}$ be a given initial extreme flow to problem (CP). Initialize $x = \mathring{x}$. Compute the total cost $\Gamma(x)$.
1 Search. For each shipment $s \in S$, fix the flows of the other shipments and search for an improving path, as follows.
1a. Compute the incremental cost of the arcs in the time-space network, as follows:

$$\bar{c}_e^s = \begin{cases} c_e\big(\sum_{s' \neq s} w_{s'} x_e^{s'} + w_s\big) - c_e\big(\sum_{s' \neq s} w_{s'} x_e^{s'}\big), & \text{if } e \in A_L \text{ and } \sum_{s' \neq s} w_{s'} x_e^{s'} + w_s \leq U_e; \\ +\infty, & \text{if } e \in A_L \text{ and } \sum_{s' \neq s} w_{s'} x_e^{s'} + w_s > U_e; \\ h_e^s, & \text{if } e \in A_G; \\ p_e^s, & \text{if } e \in A_D. \end{cases} \qquad (7)$$

1b. Search for the shortest path from vertex $(o_s, t_{o_s})$ to $\hat{d}_s$ in the time-space network with arc costs given by (7). Update the flow of shipment $s$ using the shortest path in the solution $x$. Denote the updated extreme flow by $x'$. Compute the total cost $\Gamma(x')$.
2 Stopping criterion. Check if there exists any shipment $s$ with $\Gamma(x') < \Gamma(x)$. If so, then randomly choose one such shipment, set $x$ to be the updated flow $x'$ generated in Step 1b for the selected shipment, and go to Step 1. Otherwise, a locally optimal primal solution $x$ and its cost $\Gamma(x)$ are obtained. Return.
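A hedged Kotlin sketch of the incremental arc cost (7) used in Step 1a is given below; the arc representation and all names are assumptions introduced for illustration, not code from the paper.

```kotlin
enum class ArcType { SERVICE, GROUND, DELIVERY }

// costFn plays the role of c_e(.) for service arcs; holdingCost and
// latenessCost stand for the per-shipment values h_e^s and p_e^s.
data class Arc(
    val type: ArcType,
    val capacity: Double = Double.POSITIVE_INFINITY,
    val costFn: (Double) -> Double = { 0.0 },
    val holdingCost: Double = 0.0,
    val latenessCost: Double = 0.0
)

// Incremental cost of routing shipment s (of weight `weight`) through arc e,
// given the total load `othersLoad` already placed on e by the other shipments.
fun incrementalCost(arc: Arc, othersLoad: Double, weight: Double): Double =
    when (arc.type) {
        ArcType.SERVICE ->
            if (othersLoad + weight <= arc.capacity)
                arc.costFn(othersLoad + weight) - arc.costFn(othersLoad)
            else Double.POSITIVE_INFINITY   // adding s would violate the capacity U_e
        ArcType.GROUND -> arc.holdingCost
        ArcType.DELIVERY -> arc.latenessCost
    }
```

With these costs on the arcs, Step 1b reduces to an ordinary shortest-path computation on the directed acyclic time-space network.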

3.2 Random Perturbation

To avoid being trapped in a local optimum, the algorithm randomly perturbs the current local optimum and then applies the local search procedure to find a new one. Randomly select a small subset of the freight requests and randomly generate a new feasible solution for each selected one. The size of this subset must be at least two. If only one is selected, it is quite likely that the algorithm goes back to the old solution after the local search procedure. However, if the size is too large, e.g., nearly all the freight requests, then valuable information from the current local optimum is wasted. The proper size will be investigated in the numerical study.

3.3 Acceptance and Stopping Criterion

After the local search procedure, a new local optimum is found. A simulated-annealing-type criterion is applied to update the local optimum. Suppose that the current local optimum is $x$ and the new local optimum is $\bar{x}$. Define the cost difference as $\Delta\Gamma := \Gamma(\bar{x}) - \Gamma(x)$. Then the probability of accepting $\bar{x}$ is given by

$$Pr(\Delta\Gamma) = \begin{cases} 1, & \text{if } \Delta\Gamma \leq 0; \\ e^{-\Delta\Gamma / temp}, & \text{otherwise,} \end{cases} \qquad (8)$$

where $temp$ is a parameter called the temperature. If the temperature is sufficiently high, this criterion accepts any new local optimum; if the temperature is sufficiently low, it only accepts improving local optima. In simulated annealing, the temperature is usually positive and gradually decreasing through


the iterations, so that the algorithm diversifies the search during the early iterations and then intensifies it during the later iterations. In this paper, an exponential schedule is used to iteratively decrease the temperature; that is, the temperature at iteration $n$ is $temp = temp_0 \cdot \tau^n$, where $temp_0$ is the initial temperature and $\tau \in (0, 1)$ is the decreasing rate. Finally, following the suggestion of [23], the algorithm is stopped when no new local optimum can be accepted or when the number of moves without improvement in the current best solution exceeds a pre-specified number, denoted by MAX. To check the latter condition, the best local optimum found so far is recorded at each iteration.
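The acceptance rule (8) and the exponential cooling schedule can be summarized in the following Kotlin sketch; the outer-loop structure and the shape of the callback are illustrative assumptions, not the authors' implementation.

```kotlin
import kotlin.math.exp
import kotlin.random.Random

// Acceptance rule (8): always accept improvements, otherwise accept with
// probability exp(-deltaGamma / temp).
fun accept(deltaGamma: Double, temp: Double): Boolean =
    deltaGamma <= 0.0 || Random.nextDouble() < exp(-deltaGamma / temp)

// Simplified outer loop: perturbAndSearch() should perturb the incumbent and
// run the Local Search Subroutine, returning the cost of the new local optimum.
fun anneal(initialTemp: Double, tau: Double, maxNoImprove: Int,
           perturbAndSearch: () -> Double): Double {
    var temp = initialTemp
    var current = perturbAndSearch()
    var best = current
    var noImprove = 0
    while (noImprove < maxNoImprove) {
        val candidate = perturbAndSearch()
        if (accept(candidate - current, temp)) current = candidate
        if (candidate < best) { best = candidate; noImprove = 0 } else noImprove++
        temp *= tau                          // exponential temperature decrease
    }
    return best
}
```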

3.4 The Implementation Algorithm

The complete algorithm that implements the heuristic is presented in Algorithm 2, which repeatedly calls the local search procedure.

Algorithm 2. The Heuristic Algorithm
0. Initialization. Set the perturbation size $sn$, the number of allowable non-improvement steps MAX, the initial temperature $temp$, and the temperature decreasing rate $\tau$. Assign a random feasible delivery path in the time-space network to each freight request $s \in S$, represented by the initial local optimum $x$.
1. Random perturbation. Randomly choose $sn$ freight requests. For each selected freight $s$, randomly generate a feasible delivery path represented by $\hat{x}_s$; for each non-selected freight $s'$, set $\hat{x}_{s'} = x_{s'}$.
2. Local search. Call the procedure Local Search Subroutine to find a new local optimum starting with input $\hat{x}$. Let $\bar{x}$ be the new local optimum derived from this procedure.
3. Acceptance criterion. Compute the cost reduction $\Delta\Gamma = \Gamma(\bar{x}) - \Gamma(x)$. Calculate the probability of acceptance $Pr(\Delta\Gamma)$ as in Eq. (8) for the new local optimum $\bar{x}$. Generate a random number $\theta \in [0, 1]$ with uniform distribution.
4. Stopping criterion. If $\theta \leq Pr(\Delta\Gamma)$ and the number of moves without improvement in the current best solution is less than MAX, set $x \leftarrow \bar{x}$, $temp \leftarrow temp \cdot \tau$, and go to Step 1. Otherwise, stop and go to Step 5.
5. Settlement. Return the current best local optimum, denoted by $x^*$, as the final solution.

A small example for the PI network in Fig. 1 is presented here to illustrate the heuristic algorithm. In this example, 10 shipments $s = (o_s, d_s, w_s, t_{o_s}, t_{d_s})$ were generated: $s_1 = (v_7, v_{16}, 7663.1, 2, 6)$, $s_2 = (v_7, v_{21}, 6380.7, 4, 7)$, $s_3 = (v_{17}, v_{20}, 6889.6, 2, 5)$, $s_4 = (v_{13}, v_{17}, 7773.4, 1, 6)$, $s_5 = (v_{12}, v_{22}, 6497.7, 3, 7)$, $s_6 = (v_{15}, v_{17}, 6941.9, 1, 5)$, $s_7 = (v_9, v_{14}, 6550.3, 2, 7)$, $s_8 = (v_{11}, v_{16}, 8749.2, 1, 6)$, $s_9 = (v_8, v_{21}, 6478.7, 3, 7)$, and $s_{10} = (v_{14}, v_{17}, 7535.8, 3, 6)$. The traveling time between each pair of nodes was set to 1 unit. The planning horizon was 7 units of time. LTL freight rates were based on the weight (unit: lbs) of the shipment. The break points of increasing segments in the tariff are 500, 1000, 2000, 5000, 10000,


Fig. 3. An iteration process example of the heuristic algorithm.

15000; the indifference points between increasing segments are 1000 ∗ ρ, 2000 ∗ ρ, 5000 ∗ ρ, 10000 ∗ ρ, where ρ is the discount level based on the previous level. In range (0,500], the freight rate is a fixed minimum charge, and freight with weight in (15000, 45000] is a truckload. Set ρ = 0.9. The holding cost is 1.07 per unit of time per lbs and the lateness cost is 1.70 per unit of time per lbs. For the algorithm, set the initial temperature temp = 60, the limit number of non-improving movement M AX = 10, the decreasing rate τ = 0.8, and the random perturbation number sn = 3. The algorithm stopped after searching 14 local optima and the final system efficiency improvement compared to no collaboration case was 5.3182%. The iteration process for this example is shown in Fig. 3, where the system cost is the local optimum selected by the algorithm at each round.

4 Computational Examples

The heuristic algorithm was tested on randomly generated examples. In each computational instance, the PI network was randomly generated and then freight requests were generated by randomly selecting a pair of nodes, an available time and a due time. The planning horizon is set as T = 7. The travel time between


each pair of PI-hubs was set as 2 units of time, while the travel time between each pair of linked customer location and a hub was set as 1 unit of time. Denote the number of PI-hubs, the number of shipments and the number of shippers in the network by |H|, |S| and |N|, respectively.

Table 1. Results of computational examples.

Instance  |H|  |S|  |N|   iter    timx   round  impr%
No.1      10    50   15    971    3.31     7    12.60
No.2      10    60   17   1001    3.51     7    12.56
No.3      10    80   18   1273    4.06     7     9.28
No.4      10   100   19   2653    6.58    12    15.08
No.5      20   100   28   2265   82.83     8    13.67
No.6      20   150   37   1855   69.92     5    10.67
No.7      20   180   41   4042  133.20     8    12.37
No.8      20   200   43   4595  131.33    11    22.31
No.9      30   200   59   3134  202.25     6    15.99
No.10     30   250   57   2104  189.84     2    14.73
No.11     30   275   67   1950  195.94     1    14.56
No.12     30   300   57   5225   72.62     7    26.51
No.13     40   300   86   5669  182.51     7    19.55
No.14     40   350   70   7610  246.39     6    24.35
No.15     40   375   87   5080  250.87     6    24.27
No.16     40   400   92   7042  242.41     7    25.89
No.17     50   400  110   4301  279.04     4    20.78
No.18     50   450  102   7613  242.44     6    20.57
No.19     50   500  120   4325  217.11     3    24.63
No.20     50   600  111   7334  302.58     4    22.61

The freight rate structure was set the same as in the illustrative example of Sect. 3.4. The unit holding cost $h_s$ was uniformly distributed in [1.0, 1.5]. The unit lateness cost $p_s$ was uniformly distributed in [1.5, 2.0]. The algorithm was evaluated by computing the improvement relative to the non-collaboration case, where each shipper's freight is optimally delivered in the PI without consolidation with the others' freight. Let impr% denote the relative improvement (unit: percent), round the number of unique local optima visited, iter the total number of iterations, and timx the computational time (unit: minutes). After trial tests, the algorithm was found to perform satisfactorily when setting the initial temperature temp = 60, the limit on the number of non-improving moves MAX = 5, the decreasing rate τ = 0.8, and the random perturbation number sn = 4.


In this experiment, 20 random instances of varied sizes were generated. The results are shown in Table 1. This experiment shows that the algorithm can substantially improve the system efficiency. From collaboration, shippers can expect to receive substantial cost savings. In particular, the cost savings tend to be larger when more freight is included in the network. The results in Table 1 also show that the computational time of the algorithm grew quickly as the scale increased.

Remark 2. The heuristic algorithm can quickly solve small- and medium-scale problems while significantly improving system efficiency by searching a number of local optima. However, for large-scale problems, the algorithm becomes slower, and the parameters (i.e., temp, MAX, τ, and sn) may need to be adjusted so as to accelerate convergence, at the cost of a reduced efficiency improvement.

5 Conclusion

In this paper, a non-convex integer multi-commodity network flow model is developed to formulate the shipper LTL collaboration in the PI. The key concept behind the PI logistics system is the routing of highly modular containers through PI hubs to efficiently consolidate LTL shipments. The system is open, global, interconnected and transparent, and the network flow model exploits these characteristics of the PI. Since the model is generally NP-hard, a local search heuristic combined with a simulated annealing method is proposed. The algorithm iteratively searches for local optima and escapes from them by randomly perturbing the current local optimum. The computational experiments verify the effectiveness of the heuristic algorithm. Future work will design cost allocation schemes for the shipper consortium based on cooperative game theory. In particular, the centralized model is highly non-convex and nonlinear, and designing a coalitionally stable cost allocation is very challenging.

Acknowledgments. This research is partially supported by the Natural Science Foundation of China (Nos. 71531003, 71501039, 71432004), the Leading Talent Program of Guangdong Province (No. 2016LJ06D703), the Shenzhen Science and Technology Innovation Committee (Grant No. ZDSYS20170725140921348), and the Fundamental Research Funds of Southeast University (No. 3214008411).

References
1. CH Robinson: Assessing the 5 biggest LTL savings opportunities. Technical report (2016). https://www.chrobinson.com/en-US/Resources/White-Papers/
2. Venkatadri, U., Krishna, K.S., Ülkü, M.A.: On Physical Internet logistics: modeling the impact of consolidation on transportation and inventory costs. IEEE Trans. Autom. Sci. Eng. 13(4), 1517–1527 (2016). https://doi.org/10.1109/TASE.2016.2590823


3. Crainic, T.G., Montreuil, B.: Physical Internet enabled hyperconnected city logistics. Transp. Res. Procedia 12, 383–398 (2016). https://doi.org/10.1016/j.trpro. 2016.02.074 4. Pan, S., Ballot, E., Huang, G.Q., Montreuil, B.: Physical Internet and interconnected logistics services: research and applications. Int. J. Prod. Res. 55(9), 2603– 2609 (2017). https://doi.org/10.1080/00207543.2017.1302620 5. ALICE: Alliance for logistics innovation through collaboration in Europe. Technical report (2017). http://www.etp-logistics.eu/ 6. van Heeswijk, W.J.A., Mes, M.R., Schutten, J.M.: The delivery dispatching problem with time windows for urban consolidation centers. Transp. Sci. 53(1), 203–221 (2019). https://doi.org/10.1287/trsc.2017.0773 7. Vanovermeire, C., S¨ orensen, K.: Measuring and rewarding flexibility in collaborative distribution, including two-partner coalitions. Eur. J. Oper. Res. 239(1), 157–165 (2014b). https://doi.org/10.1016/j.ejor.2014.04.015 8. Tinoco, S.V.P., Creemers, S., Boute, R.N.: Collaborative shipping under different cost-sharing agreements. Eur. J. Oper. Res. 263(3), 827–837 (2017). https://doi. org/10.1016/j.ejor.2017.05.013 9. Vanovermeire, C., S¨ orensen, K.: Integration of the cost allocation in the optimization of collaborative bundling. Transp. Res. Part E: Logist. Transp. Rev. 72, 125–143 (2014a). https://doi.org/10.1016/j.tre.2014.09.009 10. Vanovermeire, C., S¨ orensen, K., Van Breedam, A., Vannieuwenhuyse, B., Verstrepen, S.: Horizontal logistics collaboration: decreasing costs through flexibility and an adequate cost allocation strategy. Int. J. Logist. Res. Appl. 17(4), 339–355 (2014). https://doi.org/10.1080/13675567.2013.865719 11. Hanbazazah, A.S., Abril, L., Erkoc, M., Shaikh, N.: Freight consolidation with divisible shipments, delivery time windows, and piecewise transportation costs. Eur. J. Oper. Res. (2018). https://doi.org/10.1016/j.ejor.2018.12.043 12. Chabot, T., Bouchard, F., Legault-Michaud, A., Renaud, J., Coelho, L.C.: Service level, cost and environmental optimization of collaborative transportation. Transp. Res. Part E: Logist. Transp. Rev. 110, 1–14 (2018). https://doi.org/10.1016/j.tre. 2017.11.008 13. Zhang, W., Uhan, N.A., Dessouky, M., Toriello, A.: Moulin mechanism design for freight consolidation. Transp. Res. Part B: Methodol. 116, 141–162 (2018). https://doi.org/10.1016/j.trb.2018.07.013 14. Ambra, T., Caris, A., Macharis, C.: Towards freight transport system unification: reviewing and combining the advancements in the Physical Internet and synchromodal transport research. Int. J. Prod. Res. (2018). https://doi.org/10.1080/ 00207543.2018.1494392 15. Ballot, E., Gobet, O., Montreuil, B.: Physical Internet enabled open hub network design for distributed networked operations. In: Service Orientation in Holonic and Multi-Agent Manufacturing Control, pp. 279–292. Springer, Heidelberg (2012) 16. Sarraj, R., Ballot, E., Pan, S., Hakimi, D., Montreuil, B.: Interconnected logistic networks and protocols: simulation-based efficiency assessment. Int. J. Prod. Res. 52(11), 3185–3208 (2014). https://doi.org/10.1080/00207543.2013.865853 17. Kong, X.T., Chen, J., Luo, H., Huang, G.Q.: Scheduling at an auction logistics centre with physical internet. Int. J. Prod. Res. 54(9), 2670–2690 (2016). https:// doi.org/10.1080/00207543.2015.1117149 18. Qiao, B., Pan, S., Ballot, E.: Revenue optimization for less-than-truckload carriers in the Physical Internet: dynamic pricing and request selection. Comput. Ind. Eng. (2018). 
https://doi.org/10.1016/j.cie.2018.12.010


19. Ben Mohamed, I., Klibi, W., Labarthe, O., Deschamps, J.C., Babai, M.Z.: Modelling and solution approaches for the interconnected city logistics. Int. J. Prod. Res. 55(9), 2664–2684 (2017). https://doi.org/10.1080/00207543.2016.1267412 20. Yao, J.: Optimisation of one-stop delivery scheduling in online shopping based on the Physical Internet. Int. J. Prod. Res. 55(2), 358–376 (2017). https://doi.org/ 10.1080/00207543.2016.1176266 21. Yang, Y., Pan, S., Ballot, E.: Innovative vendor-managed inventory strategy exploiting interconnected logistics services in the Physical Internet. Int. J. Prod. Res. 55(9), 2685–2702 (2017). https://doi.org/10.1080/00207543.2016.1275871 22. Ji, S.F., Peng, X.S., Luo, R.J.: An integrated model for the production-inventorydistribution problem in the Physical Internet. Int. J. Prod. Res. 57(4), 1000–1017 (2019). https://doi.org/10.1080/00207543.2018.1497818 23. Resende, M. G.: Metaheuristic hybridization with greedy randomized adaptive search procedures. Tutor. Oper. Res. 295–319 (2008).https://doi.org/10.1287/ educ.1080.0045

Smartphone-Based Intelligent Driver Assistant: Context Model and Dangerous State Recognition Scheme

Igor Lashkov and Alexey Kashevnik

1 SPIIRAS, 39, 14 Line, Saint Petersburg 199178, Russia {igla,alexey}@iias.spb.su
2 ITMO University, Kronverkskiy Prospekt, 49, Saint Petersburg 197101, Russia

Abstract. The paper proposes a context model and a dangerous state recognition scheme for an intelligent driver assistant system. The system is aimed at utilizing the smartphone's front-facing camera and other sensors for dangerous state recognition in order to prevent emergencies and reduce the probability of accidents. The proposed context model is divided into the following types of context: driver context, vehicle context, road context, and environment context. The model shows how the smartphone front-facing camera and sensors, as well as accessible Internet services, are used to support the proposed context types. Then, the context-based dangerous state recognition model is presented. The model estimates the computational power of the smartphone and, based on this information, implements a frame skipping algorithm to reduce the computational load on the smartphone processor. The implementation of the proposed frame skipping model shows that for modern smartphones the complexity is decreased by three times compared with the usual scheme. The proposed context model and dangerous state recognition scheme have been implemented in the Drive Safely system that is available in Google Play.

Keywords: Context-aware driver assistant · Dangerous state · Context · Smartphone · Vehicle · Driver

1 Introduction

Road accidents remain among the most disastrous problems of almost any country in the world. Drowsiness, distraction or alcohol intoxication of the driver are the most common causes of vehicle-related dangerous situations. It should be highlighted that one of the increasingly popular approaches presented in previous scientific research lies in the development of advanced driver assistance systems. These safety systems are aimed at reducing road accidents and providing better interaction and engagement with the driver. Some common examples of driver safety technologies for this type of system are vehicle collision avoidance, lane keeping assistance, and driver drowsiness and distraction monitoring and alerting. General use of such systems comprises a certain sequence of steps and can be described in this way: monitoring driver behavior and the state of the vehicle or road situation by using different built-in auxiliary devices,


including short and long-range radars, lasers, lidars, video cameras to perceive the surroundings; continuous analysis of reading from sensors and determining of dangerous situations while driving; notifying driver about recognized unsafe in-cabin and road situations; and taking control of the vehicle if driver reaction is not sufficient or missing. At the moment, driver safety systems heavily rely on data collected from different in-vehicle sensors. This data can aid to describe changes in the surrounding environment by using both single and multiple sensors together in different situations and, therefore, provide relevant context. According to the definition of Dey [1] the context represents any information that can be used to characterize the situation of a person, place or object, that is considered relevant to the interaction between the user and the application. Some examples of information that can be associated with a driver and environment and describe driver’s situation for a safety driver assistance application are weather conditions, traffic information, vehicle characteristics or physiological parameters of the driver. This kind of information can affect the driving performance in different real-time situations. There is a certain research and technological gap in implementing the contextaware approaches for better understanding vehicle driver and current environment situation. That is, situation relevant information is not a contributing factor for determining unsafe driving behavior, alerting driver about dangerous road situation, generating context-based recommendations, preventing or mitigating traffic accidents or adapting system for driver needs, taking into its account preferences and competencies. This paper considers the advantages of context utilization for driver assistance application, presents a context model and the idea of implementing context in the application. The design and prototyping of this application have been described in our previous works [2–4], including the mobile application for Android smartphones1. This paper extends our prior work. The rest of the paper is organized as follows. Section 2 presents a comprehensive related work in the area of context-aware solutions for safe driving and compares the available solutions. Section 3 describes in detail our context-aware model. The contextaware dangerous state detection algorithm based on camera frame skipping is presented in Sect. 4. The implementation and evaluation are presented in Sect. 5. It is followed by discussion (Sect. 6). Finally, the main results and potential future work are summarized in Conclusion (Sect. 7).

2 Related Work Firstly, we can distinguish following types of context related to driver assistance systems: driver, vehicle and environment. On one hand, this type of information is provided, in general, by electronic control unit that is already embedded in automotive electronics and controls vehicle’s systems or subsystems. On the other hand, in recent years modern smartphones became a perspective powerful device not only for calls and SMS messages, but also for a variety of diverse tasks including informational,

1 https://play.google.com/store/apps/details?id=ru.igla.drivesafely.


multimedia, productivity, safety, lifestyle, accessibility related applications and many others. Most modern smartphone already have a set of built-in sensors [5], measuring some kind of physical quantity and, in general, include accelerometer, gyroscope, magnetometer, Global Positioning System, light and proximity sensors. Because of its affordable low price, a set of embedded sensors and small sizes, smartphones are gaining popularity for building driver assistance systems at a large scale. The study [6] estimates the influence of the smartphone based advanced driver assistance system (ADAS) on driving behavior when driver is distracted by mobile social networking applications. The proposed approach utilizes the specially designed driving simulator that replicates the car structure and its parts and includes following third-party components. In order to analyze driver behavior, the eye-tracking system Tobix X120 was used to show where the people are looking exactly. Smartphone was used as a platform for ADAS system and to provide continuously taken images from the front-facing camera to detect the presence of head and eyes of the driver in the scene and alert its through a single beep of 1250 ms. This study showed that using front-facing camera of the smartphone for ADAS aids to track driver behavior, mitigate the distracted tasks and improve the overall driving performance. The data for their research was the driving behavior perceived by the video camera and eye tracking system. Another research [7] tackles the problem of driver behavior classification for driver assistance systems. In this paper the neuro-fuzzy system is proposed to estimate the driving behaviors based on their similarities to fuzzy patterns. Sensor fusion is used by the neural network for recognizing types of driver maneuvers, including driver’s lane change, left and right turns and U-turn. The initial data source for these maneuvers are raw data readings from accelerometer, gyroscope and magnetometer, that measure changes in velocity, magnetic field and rotation velocity respectively. The output results of the proposed method are two different scores, safe driving score and aggressive driving score. The results of this paper state that the evaluation of driving behavior plays a crucial role in improving driver safety. Based on the fact that ignoring road signs by a vehicle driver can lead to major accident, this approach [8] is focused on traffic sign recognition using contour analysis approach. The proposed system makes an audible alert for a driver of road signs and, therefore, helping its to avoid a traffic accident. This paper only considers the information of traffic sign boards mounted along the roads for safe driving, that in its turn describes the context relevant for the environment. Additionally, the vehicle context can provide information about vehicle characteristics and its capabilities in a certain situation scenario. In the paper [9], the developed components utilize on-board diagnostics communication module to acquire vehicle information such as mileage information, sensor data, including fuel usage, velocity, etc. Afterwards it provides information tips about eco-driving and safedriving. This kind of information can aid to influence the driving style and increase the overall driving safety. Patent [10] shows a possible integration of data from smartphone sensor devices and vehicle on-board diagnostic system to address a problem of generated pollution.


Smartphone sensors used in the study are accelerometer and location-based GPS. Onboard diagnostics port provides data about accelerometer, tire pressure, velocity, antilock brake condition, temperature and others. Such system can be used in ranking driving style and estimating its fuel economy. Collected sensor data can be applied for further analysis in remote services. Another study [11] presents an approach that operates driver personal information (age, gender), traffic violations and traffic accident records dataset for furthermore traffic safety assessment. Traffic violations were taken from Public security recommendation standard GA/T 16.30-2017 and some examples of them are parking in the prohibited area, running a red light, speeding, overload, not wearing a seat belt. The performance and accuracy of the proposed data mining framework was evaluated by Kolmogorov-Smirnov chart and the receiver operating characteristic curve. This study is based only on using driver personal information to safety assessment. One more type of information involved in traffic safety is provided by Green Light Optimal Speed Advisory (GLOSA) systems, allowing to achieve reduced emissions of vehicles and increase the efficiency of traffic flow in the areas of the signalized intersections. One of the studies [12] addresses the performance of this type of systems in terms of fuel consumption, carbon dioxide emissions of vehicles and travel time. Conducted experiments show that in certain situations whereas 400–500 vehicles are going per hour, the developed GLOSA system is able to reduce congestion, enhance the environment state and, therefore, improve the overall driving performance. This system is relied on using data taken from wireless vehicle-to-infrastructure [13] communication systems. The parameters measured in this study are travel time, consumption rate, CO2 emissions for every 60 s, including information about acceleration, velocity and position of the vehicles calculated at step of 0.1 s. Nowadays, connected-vehicles technology [14] is gaining popularity among researchers and focused on increasing road safety and driving performance. It utilizes dedicated short-range protocol either on-board units already integrated in some vehicles to transmit information messages between vehicles or infrastructure at a predefined rate. Generally, these transmitted safety messages can include information about vehicle unique identifier, its GPS position, speed, time and driving direction. One of the scenarios of using connected-vehicle technology is vehicle platooning [15], that is built with Vehicle-to-Vehicle communication and aids a group of vehicles to travel very closely together. To maintain a vehicle set, each vehicle in a chain requires environmental information and needs share sensing data, including route information, vehicle speed, acceleration, heading direction, future intent actions (e.g. turning, braking, changing lane, etc.), vehicle parameters or driver behavior information like reaction time, so that other vehicles will be notified before attempting maneuver. This technique allows to increase the capacity of roads, road safety and provides steady state of traffic and, therefore, can reduce fuel consumption. Big scale vehicle systems in terms of growing data flows, comprising sensing data, collecting and sharing information about vehicles, pedestrians, road and weather conditions are built for autonomous vehicles [16]. They utilize a numerous number of


modern sensors, including cameras, LIDARs, radars and lasers to facilitate different safety technologies. Today, user assistants are gaining popularity in many aspects of life. Currently available digital assistants are focused on making user life easier and more comfortable by automating different types of daily tasks. Context-relevant information can be perfectly fitted in vehicle safety systems in a form of intelligent driver assistants [17, 18]. This way driver assistant can provide actionable information for a driver in a certain scenario to increase its driving performance and road safety. Typical use cases for a vehicle driver are context-relevant driving safety tips or recommendations for navigation, eco driving, safe driving using visual format or natural language dialog. One of the positive aspects of the listed research studies is that they consider different types of context for their needs. But the work of these systems generally relies on the utilization of certain types of information. The use of information relevant for various real scenarios can influence the performance of some work or process, optimize it and potentially propose the new ways of solving tasks.

3 Context Model

Driver behavior is a result of complex interactions between the driver, the vehicle and the environment. According to the studies listed above and the survey of research related to smartphone-based context [19], sensor information taken from the smartphone can be categorized into the following four groups. Based upon these groups, the context model for the intelligent driver assistant on the road has been developed as shown in Fig. 1.

Fig. 1. Context model for intelligent driver assistant on the road.


Driver context includes type of smartphone used by a driver; physiological state like drowsiness, distraction or drunk state; reckless (abnormal) driving actions (parameters) the driver makes on public road; the driving style (eco, normal, aggressive driving); smoothness of driving, in-cabin situation (speaking with passengers, noise level), navigation route provided by some navigation system, personal schedule or calendar of tasks provided by some third party service; reminder list; driver preferences (e.g. music genre, audio level); phone contact list; trip information, including trip elapsed time and distance, start point of trip, destination point of trip; internet connection availability; health related information (e.g. from wearable device) that are pulse or pressure; and driver reaction time. Vehicle context: the vehicle location (longitude, latitude, altitude), speed (speedometer), fuel level, fuel usage, accelerator pedal position, airbag state, yaw angle, tilt angle, RPM vehicle infotainment system state, tire pressure (flat tire), lights state (switch on/off), vehicle engine status. Road context: nature of surface, width, conditions, obstacles, road accidents, accident rate, traffic congestion (e.g. traffic volume), road works, road signs (types: guidance, warning, regulation), (e.g. speed limit, driving directions), other road users (e.g. vehicles, pedestrians, cyclists), traffic signals (e.g. red traffic signal), lightning for section of road. Environment context: weather conditions (e.g. humidity, temperature, pressure, wind speed/direction), weather forecast, current time, nearest POIs (Point of Interest) (location, working hours, food and beverages availability, prices, place to sleep, gas availability). These formed groups of context allow to describe driver for every real situation.
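As a purely hypothetical illustration (the class and field names are invented for this sketch and do not come from the paper), the four context groups could be represented on Android as simple Kotlin data classes, each holding a small subset of the parameters listed above:

```kotlin
data class DriverContext(
    val drowsy: Boolean,
    val distracted: Boolean,
    val drivingStyle: String,        // e.g. eco, normal, aggressive
    val reactionTimeSec: Double
)

data class VehicleContext(
    val latitude: Double,
    val longitude: Double,
    val speedKmh: Double,
    val fuelLevel: Double
)

data class RoadContext(
    val speedLimitKmh: Int,
    val trafficCongestion: Boolean,
    val roadWorks: Boolean
)

data class EnvironmentContext(
    val temperatureC: Double,
    val humidityPercent: Double,
    val timestampMs: Long
)

// The full driving context aggregates the four groups of the model in Fig. 1.
data class DrivingContext(
    val driver: DriverContext,
    val vehicle: VehicleContext,
    val road: RoadContext,
    val environment: EnvironmentContext
)
```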

4 Context-Aware Dangerous State Detection Based on Camera Frame Skipping The process of continuously recognizing dangerous driving states for each frame taken from the front-facing camera can significantly reduce the battery charge over time and make a noticeable load not only for the whole application, but also affect the operating system of the smartphone. Therefore, the context can be used to address this problem. Numerous technical improvements had been recently contributed to driver assistance systems to increase road safety and driving performance. This kind of driving safety systems is limited in using personalization and adaptation for a driver. Developed context model can aid system to make context-based decisions while driving by managing driver personal information and situation-relevant information. There is a significant number of use cases whereas context can improve the overall system performance. A number of research studies proves that modern smartphones [20] can be considered as a standalone platform with integrated cameras, GPS, accelerometer, magnetometer, gyroscope sensors inside and can be efficiently utilized by the drivers to reduce or mitigate the occurring road accidents, improve their safety, driving skills and performance. The essential part of the modern smartphone is a


front-facing camera that outputs a set of consecutive image frames. This type of camera is generally applicable to continuously monitoring driver behavior and to dangerous situation detection tasks. Assuming the detection of dangerous situations should work without interruptions for each incoming frame, this process can quickly drain the smartphone battery until full discharge. To improve the smartphone's battery usage and reduce the application's overall load on the CPU and other OS subsystems, we propose the following context-based algorithm for skipping irrelevant camera frames. The generalized process of processing camera frames is shown in Fig. 2. Current research studies and projects show that the time-to-collision parameter is actively used in building driving behavior monitoring systems. This parameter depends on the time that is used for dangerous state recognition and on the driver's reaction time. The driving safety system should place its work, i.e. the time $t_e(i)$ used for detecting a dangerous situation in each frame taken from the front-facing camera, within the interval $t_s$. Since modern smartphones do not lack high-performance computational power, they are able to process a rather high number of camera frames in a certain period of time, reaching a peak of more than 10 frames per second. This number of frames is more than is needed for predicting and recognizing hazardous driving behavior in a real situation within the interval of 1.5 s.

Fig. 2. Camera frame processing on the timeline.

Frames exceeding the upper limit on the number of processed frames $n_{limit}$ can be evenly skipped while recognizing a dangerous situation without a strong influence on the accuracy of the ongoing process. This is achieved by dividing the time of


dangerous state recognition into time periods for recognizing unsafe driving behavior and skip time intervals. A skip time interval is a period of time during which the driver behavior recognition task can be skipped for the frame received from the camera. This parameter is estimated as the ratio of the time needed for dangerous state recognition $t_s$ to the maximum count of processed frames $n_{limit}$ for the dangerous state, minus the average processing time of a single frame $t_e^{avg}$.

Fig. 3. UML diagram of camera frame skipping while detecting dangerous states.

The full scheme of the context-based algorithm for frame skipping while recognizing a dangerous state is presented in Fig. 3 and can be described as follows. The goal of the whole algorithm is to estimate the skip time interval that can then be used for choosing only a limited number of frames $n_{limit}$ to be submitted to the dangerous state recognition task. The preliminary operation of the algorithm is a warming-up action that skips the first "cold" camera frames COLD_F to stabilize the subsequent frame processing calculations. The first task of the algorithm is to estimate the average time to process a single frame $t_e^{avg}$ and the average skip time interval $t_{sk}$ in this way:

$$t_{sk} = \frac{t_s}{n_{limit}} - t_e^{avg} \qquad (1)$$

where $t_e^{avg}$ is the average processing time of a single frame. The total time of skipping frames $t_{sk}^{sum}$ related to the dangerous state is calculated as the sum of all skip time intervals:

$$t_{sk}^{sum} = \sum t_{sk} \qquad (2)$$

Otherwise, if the average skip time interval $t_{sk}$ has already been calculated, we check whether the sum of skip time intervals $t_{sk}^{sum}$ is equal to or greater than a fixed skip time interval $t_{sk}$ and, if it is, we subtract $t_{sk}$ from $t_{sk}^{sum}$ and, finally, we proceed to frame recognition. If there is not enough skip time for the current dangerous state recognition, we skip this camera frame and wait for a new one. This algorithm involves processing every frame read from the smartphone camera one after another. The proposed context-based algorithm can help a driver assistance system reduce battery drain and free up system performance for other vehicle-related tasks.
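As a hedged numerical illustration of formula (1) with assumed values (not measurements from the paper): if the recognition interval is $t_s = 1.5$ s, at most $n_{limit} = 5$ frames are to be processed in that interval, and the average per-frame processing time is $t_e^{avg} = 0.1$ s, then $t_{sk} = 1.5/5 - 0.1 = 0.2$ s of camera frames may be skipped after each processed frame.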

5 Evaluation

The evaluation has been carried out with 15 volunteer drivers of different ages in vehicles and implemented according to the following methodology. The proposed context-based algorithm for frame skipping while recognizing a dangerous state has been developed in the Kotlin language2 (Listing 1) and, afterwards, tested in the mobile application installed on smartphones. This approach estimates the recognition time, the smartphone's battery drain and its performance while determining dangerous driving behavior.

2 https://kotlinlang.org/.


class FrameSkipper { private val maxFrames by lazy { CommonPrefs.instance.maxProcessCameraFrames } private val observableTimeInterval by lazy { RecognizePrefs.getInstance().dangerDetectInPeriodMs } private var coldFrames = 0 private var fixInterval = false private var frameCountProcessInInterval = 0 private var framesCount = 0 private var lastTimeUsed = 0L private var averageTimeProcessFrame = 0L private var totalTimeProcessFrame = 0L private var freeSkipIntervalTimeSum = 0L private var freeSkipIntervalTime = 0L private var timeRemainFreeInIntervalSum = 0L private var timeRemainFreeInInterval = 0L private var lastItemSkip = false private var lastTimeStart = 0L fun canSkipFrame(): Boolean { val timeNow = DateUtils.getCurrentDateInMs() val isSkip = shouldSkipFrame(timeNow) lastItemSkip = isSkip lastTimeUsed = timeNow return isSkip } private fun shouldSkipFrame(timeNow: Long): Boolean { if (coldFrames++ < COLD_START_SKIP_FRAMES) return false if (fixInterval) return whenIntervalFixed(timeNow) if (lastTimeUsed != 0L) { val diffBetweenFrames = timeNow - lastTimeUsed framesCount++ totalTimeProcessFrame += diffBetweenFrames } if (!fixInterval && totalTimeProcessFrame >= observableTimeInterval) { averageTimeProcessFrame = totalTimeProcessFrame / framesCount frameCountProcessInInterval = framesCount - 1 fixInterval = true if (frameCountProcessInInterval > maxFrames) { val redundanceFrames = frameCountProcessInInterval - maxFrames freeSkipIntervalTimeSum = redundanceFrames * averageTimeProcessFrame val intervals = maxFrames - 1 freeSkipIntervalTime = freeSkipIntervalTimeSum / intervals } return false } return false } private fun whenIntervalFixed(timeNow: Long): Boolean { if (freeSkipIntervalTimeSum == 0L) return false val diffTimeFromPrevCall = timeNow - lastTimeUsed if (lastTimeStart == 0L) { lastTimeStart = timeNow } else if (timeNow - lastTimeStart >= observableTimeInterval) { timeRemainFreeInIntervalSum = freeSkipIntervalTimeSum timeRemainFreeInInterval = freeSkipIntervalTime lastTimeStart = timeNow } if (timeRemainFreeInInterval >= diffTimeFromPrevCall) { timeRemainFreeInInterval -= diffTimeFromPrevCall return true } timeRemainFreeInInterval = freeSkipIntervalTime timeRemainFreeInIntervalSum -= freeSkipIntervalTime return false } }

Listing 1. The context-aware algorithm for skipping camera frames written in Kotlin.


The smartphone camera video stream is created and started to continuously receive preview frames from the front-facing camera. The application utilises the developed context-aware algorithm for skipping irrelevant frames to optimize the smartphone's battery usage and free the overall application's performance for other heavy tasks. The applicability and validity were shown in an experiment comparing the recognition time for detecting a dangerous state in driving behavior with smartphones of different models and manufacturers (Fig. 4), where E is the smartphone computing power, n is the number of camera frames being processed, and t is the time that the smartphone uses to determine a dangerous state in driving behavior. The evaluation of the time required to recognize a dangerous state of the driver was conducted with the driver's reaction time defined as 0.5 s, which is quite common for drivers. Most of the presented smartphones, which are mid-range phones or flagships, can recognize more than 12 dangerous situations in two seconds, which is the upper bound of the acceptable range. This means that, for example, the Xiaomi Mi 5 smartphone can skip seven frames and provide nearly the same dangerous state recognition performance for a driver by processing only five camera frames.

Fig. 4. Evaluation of time recognition and smartphone’s performance while determining dangerous driving behavior

The proposed algorithm was integrated into the mobile Drive Safely application intended for Android-based smartphones. The screenshot of the application (Fig. 5) shows that the application recognized the drowsiness dangerous state of the driver and issued a warning in the form of textual messages and audible signals.


Fig. 5. Screenshot of the Drive Safely application at the moment of recognizing drowsy dangerous driver behavior using the front-facing camera of the smartphone and alerting the driver to pay attention.

6 Discussion In this study, we proposed the context-aware approach that accumulates the information about vehicle driver and the current environment situation (context). The context is applicable for monitoring driving behavior, adapting safety system for a driver, and recognizing dangerous situations, early alerting driver about road hazards and preventing or mitigating traffic accidents. This approach eliminates the drawbacks of current research studies and solution by leveraging different types of context. The proposed context-based approach was implemented and evaluated with the aid of Drive Safely mobile application for smartphone. Nevertheless, the presented context-based approach can be also successfully applied in fastest-growing category of advance driver assistance systems, in-cabin cameras focused on monitoring driver facial features and wearable electronic devices, equipped with built-in sensors, tracking driver’s biological measurements while driving. Although advanced driver assistance systems equipped with many high-precision sensors show high accuracy and performance in recognizing driving dangerous situations in different weather conditions, the mobile applications built for smartphones are at much lower price, and are much popular among people in almost every country that is easy to use in every vehicle. The beneficial effect of the proposed context-based approach is to consolidate the information about driver profile, its preferences and environment situation that would allow driver safety system to increase its performance and accuracy and adapt for driver needs in better personalized way.


7 Conclusion

The paper presents the context model for the intelligent driver assistant on the road. This model was divided into the following four groups: driver context, vehicle context, road context and environment context. The developed context model can help the system make context-based decisions while driving by managing driver personal information and situation-relevant information. To improve the smartphone's battery usage and reduce the application's overall load on the different subsystems of the operating system, we proposed the context-based algorithm for skipping irrelevant camera frames without affecting the accuracy of dangerous driving state recognition. The evaluation of the developed context-aware algorithm for skipping camera frames shows that it reduced the computational load on the smartphone processor by three times compared with the general scheme. In future work, we expect to collect a dataset of driving statistics in real scenarios involving people of different ages with different vehicles to test the proposed context model. We plan to estimate the influence and performance of concrete context parameters on the overall driving safety by utilizing machine learning techniques. In these experiments we intend to use the developed Android-based mobile application Drive Safely aimed at recognizing dangerous behavior and alerting the driver to prevent road accidents. As an extension to this work, we consider adding more dangerous driving states, namely drunk driving and aggressive driving, and utilizing non-smartphone external sensor devices, which in turn can expand the use of the proposed context-based approach.

Acknowledgments. The research is funded by the Russian Science Foundation (project # 1871-10065).

References 1. Dey, A.K.: Understanding and using context. Pers. Ubiquitous Comput. 5(1), 4–7 (2001) 2. Smirnov, A., Kashevnik, A., Lashkov, I., Hashimoto, N., Boyali, A.: Smartphone-based two-wheeled self-balancing vehicles rider assistant. In: Proceedings of the 17th IEEE Conference of the Open Innovations Association FRUCT, Yaroslavl, Russia, pp. 201–209 (2015) 3. Smirnov, A., Kashevnik, A., Lashkov, I., Baraniuc, O., Parfenov, V.: Smartphone-based dangerous situation identification while driving: algorithms and implementation. In: Proceedings of the 18th IEEE Conference of the Open Innovations Association FRUCT, Finland, pp. 306–313 (2016) 4. Lashkov, I., Smirnov, A., Kashevnik, A., Parfenov, V.: Ontology-based approach and implementation of ADAS system for mobile device use while driving. In: Proceedings of the 6th International Conference on Knowledge Engineering and Semantic Web, Moscow, CCIS 518, pp. 117–131 (2015) 5. Fazeen, M., Gozick, B., Dantu, R., Bhukhiya, M., Gonzalez, M.C.: Safe driving using mobile phones. IEEE Trans. Intell. Transp. Syst. 13(3), 1462–1468 (2012)


6. Dumitru, A.I., Girbacia, T., Boboc, R.G., Postelnicu, C.-C., Mogan, G.-L.: Effects of smartphone based advanced driver assistance system on distracted driving behavior: a simulator study. Comput. Hum. Behav. 83, 1–7 (2018) 7. Eftekhari, H.R., Ghatee, M.: A similarity-based neuro-fuzzy modeling for driving behavior recognition applying fusion of smartphone sensors. J. Intell. Transp. Syst. 23, 1–12 (2019) 8. Pandey, P.S.K., Kulkarni, R.: Traffic sign detection for advanced driver assistance system. In: 2018 International Conference on Advances in Communication and Computing Technology (ICACCT), Sangamner, pp. 182–185 (2018) 9. Yun, D.S., Lee, J., Lee, S., Kwon, O.: Development of the eco-driving and safe-driving components using vehicle information. In: 2012 International Conference on ICT Convergence (ICTC), Jeju Island, pp. 561–562 (2012) 10. Hirschfeld, R.A.: Integration of Vehicle On-Board Diagnostics and Smart Phone Sensors. US20110012720A1, United States Patent and Trademark Office, 20 January 2011 11. Fang, A., Qiu, C., Zhao, L., Jin, Y.: Driver risk assessment using traffic violation and accident data by machine learning approaches. In: 2018 3rd IEEE International Conference on Intelligent Transportation Engineering (ICITE), Singapore, pp. 291–295 (2018) 12. Suzuki, H., Marumo, Y.: A new approach to green light optimal speed advisory (GLOSA) systems and its limitations in traffic flows. In: Proceedings of the 1st International Conference on Human Systems Engineering and Design (IHSED2018): Future Trends and Applications, 25–27 October 2018. CHU-Université de Reims Champagne-Ardenne, France, pp. 776–782 (2019) 13. Biswas, S., Tatchikou, R., Dion, F.: Vehicle-to-vehicle wireless communication protocols for enhancing highway traffic safety. IEEE Commun. Mag. 44(1), 74–82 (2006) 14. Abdulsattar, H., Mostafizi, A., Siam, M.R.K., Wang, H.: Measuring the impacts of connected vehicles on travel time reliability in a work zone environment: an agent-based approach. In: Journal of Intelligent Transportation Systems Technology Planning and Operations (2019) 15. Kulla, E., Jiang, N., Spaho, E., Nishihara, N.: A survey on platooning techniques in VANETs. In: Complex, Intelligent, and Software Intensive Systems, pp. 650–659. Springer (2019) 16. Mahmood, A., Butler, B., Sheng, Q.Z., Zhang, W.E., Jennings, B.: Need of ambient intelligence for next-generation connected and autonomous vehicles. In: Guide to Ambient Intelligence in the IoT Environment, pp. 133–151. Springer (2019) 17. Aziz, Z., Nandi, A. Microsoft Technology Licensing LLC. Digital assistant for vehicle related activities. US10169794B2, United States Patent and Trademark Office, 01 January 2019 18. Wolverton, M.J., Mark, W.S., Bratt, H., Bercow, D.A.: SRI International. Vehicle personal assistant. US20140136187A1, United States Patent and Trademark Office, 19 January 2012 19. Engelbrecht, J., Booysen, M.J.(Thinus), van Rooyen, G.-J., Bruwer, F.J.: A survey of smartphone-based sensing in vehicles for ITS applications. IET Intell. Transp. Syst. 9, 1–22 (2015) 20. Singh, G., Bansal, D., Sofat, S.: A smartphone-based technique to monitor driving behavior using DTW and crowdsensing. Pervasive Mob. Comput. 40(9), 56–70 (2017)

Voice Recognition Based System to Adapt Automatically the Readability Parameters of a User Interface

Hélène Soubaras

Thales Research & Technology, Palaiseau, France
[email protected]
http://www.thalesgroup.com

Abstract. When a user interface (UI) is displayed on a screen, parameters can be set to make it more readable to the user: font size and type, colors, brightness, widgets, etc. The optimal settings are specific to each user. For example, dark backgrounds are better for many visually impaired people who are dazzled. Adjusting the settings may be time-consuming and inefficient because of the user subjectivity. The proposed approach optimizes them automatically by using a measure of the reading performance. After a survey of existing set-ups for optimizing UIs, a new system composed of a microphone with voice recognition, and an optimization algorithm to perform reinforcement learning (RL), will be proposed. The user reads aloud a text displayed through the UI, and the feedback adaptation signals are the reading performance criteria. The UI parameters are modified while the user is reading, until an optimum is reached. Keywords: Graphical user interface · Ux design · Readability · Machine learning · Optimization · Visually impaired · Voice recognition

1 Introduction Man-machine interaction is a key issue in intelligent systems. The majority of serious accidents and crashes of complex systems are due to human errors that can be reduced by improvements in the human-computer interaction (HCI) function. Moreover, 50% of the cost of software products is dedicated to their HCI. This is why developing efficient methodologies to make the user interfaces (UI) as readable as possible is crucial. When a UI is displayed on a screen, parameters can be set to make it more readable to the user: font size and type, colors, brightness, layout, widgets, etc. The optimal settings are specific to each user. For example, dark backgrounds are better for many visually impaired people who are dazzled. Adjusting the settings may be time-consuming and inefficient because of the user subjectivity. The aim of this work is to propose a system to adapt the readability parameters of a UI, for two fields of application:



– The accessibility to the visually impaired people.
– The improvement of the performances of systems controlled by a human being (cockpits, software development environments, etc.).

One wants to obtain a device which measures the readability performance of a UI for a given user. It allows us to recommend the optimal settings. The user has to take a short test which consists in reading aloud a text displayed in the UI with given shaping parameters (colors, fonts, layout, etc.). These parameters must be set adaptively until an optimum is reached (a minimal sketch of such an adaptation loop is given after the list below). The additional advantage of such a system is that it could also be used for:

– UI design.
– Informing the user about what are the best parameters for her.
– Setting a UI (e.g. at the beginning of use).
– Measuring somebody's intrinsic reading performances (vision capacity, cognitive performance, knowledge of a language, etc.).

2 The Addressed Problem of Settings Optimization
One aims at fitting the display settings of a user interface (UI) in order to optimize its readability for a given user, particularly for visually impaired people. Those parameters may be:
– The screen resolution.
– The font size.
– The font and background colors, and the contrast.
– The font thickness.
– The font type.
– The margins and the spacing of lines, words, letters, paragraphs, etc.
– The column width.
– The thickness, color and type of borders.
– The magnifying functions and associated image processing: binary or smoothed, and if so the kind of interpolation used (see Figs. 1 and 2).
– The scrolling functions, the management of text overflow (wrapping or scrolling), text zone pointing.
– The scrolling speed, blinks and other animation parameters.
– Augmented reality settings.
– Any function in the interface that can have an impact on the reading performance.

Such a device can also help in the design by quantifying the effect of the following data: – The screen dimensions.


Fig. 1. Blocking effect and smoothing in a zoomed-in image [10].

Fig. 2. Blocking effects make zoomed-in, low-resolution texts unreadable for visually impaired people.

– The distance between the screen and the user.
– The trade-off between the font and the screen sizes.
– The ambient luminosity.
– The dazzling factors.
– The trouble due to lag or the lack of fluidity in the displayed content, etc.

Today, especially in the case of visual impairment, the user must manually adjust her display parameters in the "Preferences" menu of her software. The problem is that it is difficult to find the optimum, because the evaluation of the obtained comfort is subjective. People suffering from low vision often know their own needs poorly. For example, the majority of them read better with light characters on a dark background, because they are very often dazzled (see Fig. 3). Nevertheless, many are not yet aware of this fact and continue to work with white backgrounds. The other issue is that some differences are so small that it is difficult to assess them. This is the case, for example, in the choice of the smoothing of the "ClearType" font proposed in Windows 10: the user is asked to make tedious choices between fonts that are impossible for the visually impaired to distinguish! Finally, of course, the adjustment of the parameters requires a good knowledge of the software. That is why an automated solution would be useful. These subjectivity and perception-level issues are also relevant for all normally-sighted users.

3 Related Work
This section provides a state-of-the-art review illustrating existing methods that can solve a part of the problem.


Fig. 3. Most visually-impaired people see better on dark backgrounds. Always think about using the "automatic" font color instead of the black one, since it inverts automatically.

3.1 Methods for Interface Design with User Tests
This section provides an overview of the methods that involve tests on users. The collected data can be questionnaires and results of various measures. After the test campaign, they are taken into account to improve the product, but their exploitation is not automated. First, there exist psychophysical measures that were first used in the medical domain. The most classical are the threshold methods to evaluate the detection threshold of a sensory organ, or the capacity to distinguish two different stimuli. Some ear, nose and throat specialists [16] soon improved these methods for audio diagnosis by automating the generation of test sounds. Such methods in the vision domain allow us to determine the minimal font size and color contrast that can be read by somebody. In France, the CCU (conception centrée utilisateur, or user-centered design) has become a methodology based on the ISO 13407 standard. The steps are:
– The analysis of the need.
– The design.
– The user evaluation.
One evaluates the produced solutions (prototypes) against the requirements. If they are not satisfied, one must restart the phase of understanding of the use context and requirements, and the process is iterated.


The ERGOLAB web site (http://www.ergolab.net) says: "The means for involving the users rely on methods such as interviews or questionnaires, observation, focus groups and user tests." "Running user tests with a precise evaluation protocol makes it easy to detect the interface defects." "The ISO 13407 standard addresses the broad field of software design. It gives the requirements for a project to be human-centered. This standard has five principles. Among them are:
– Taking the user needs into account upstream.
– Active user participation.
– Iteration on solutions."
Here are some examples of criteria that are measured objectively:
– Success rate.
– Number of errors.
– Task execution time.
– Number of steps necessary to complete a task.
– Need for help to use the product.
– Learning time.
– User satisfaction, etc.

All test results (questionnaires and measures) are analyzed by a human expert before being exploited in the following engineering cycle. There is no automation, contrary to the proposed idea described in this paper. More generally, UX design (User Experience Design, called design d'expérience in French) relies on the experience feedback given by the user, such as the "I like" or "I don't like" buttons to be clicked on web sites. This principle is not limited to ergonomics. It provides a test plan in the design cycle of a product. Such a methodology has been applied to test a screen magnifier for the visually impaired controlled by a joystick [10]. The tests were performed on a test software and then on a web page. Two methods were used to collect the data: think-aloud protocols and questionnaires.

3.2 Reading Tests

Apart from the design purposes mentioned above, reading performance evaluation tests have been developed for decades to check whether children learn correctly, or to detect dyslexia. In France, the most famous reading-speed test is "le test de l'Alouette", proposed in 1967 and revised in 2004 [11, 12]. It consists of reading, within 3 minutes, a text whose words carry little meaning. Many other tests have been developed since then [2]. They allow us to detect a number of learning and cognitive issues in the child. For example, some of them test the understanding of the read texts by presenting to the child two different sentences that have the same, or almost the same, meaning. Other tests evaluate the confusion between words that are more or less similar in their meaning, their pronunciation or their spelling.


The knowledge underlying the speed-reading methods [20], which showed that eye movements must be limited, was soon applied to display screens with the "rapid serial visual presentation" (RSVP) technique [3]. Many other reading performance measures have since been used in screen design [8, 21]. A deeper study on readability was performed for Chinese characters [25]. With user tests, it concludes with recommendations about colors and spacings that must be used for good readability, noticing that the screen type (LCD or CRT) had no influence. In particular, there have been many ergonomics studies during the last decades on head-up displays (HUDs) in the aeronautics and automobile industries. Simple criteria are used, such as road sign and speedometer reading speed [9, 13, 18, 26].

3.3 Using Algorithmic Optimization Methods
Today, the use of combinatorial optimization for UI design has started to develop, as shown in a recent survey [17]. One uses algorithms such as branch-and-bound or ant colony optimization to optimize objective functions with heuristics inspired by the model used in the UI. This is also made possible thanks to the emergence of user models and the development of UI description languages. For example, there are many improvements in the visual representation of scientific data such as clouds of points, etc. The team that achieved the most significant work on the automated adaptation of UIs to users with possibly special needs is probably Gajos et al. [5, 6]. These works rely on a tool called SUPPLE [4], a configurable UI they developed over more than six years. In SUPPLE one can set:
– the layout,
– the choice of widgets, and
– the size and colors of characters and elements.
The measure they use for adaptation is the user trace, i.e. the succession of interaction actions. The optimized criterion is the average task execution time. The adaptation algorithm is a branch-and-bound improved with constraint propagation. They also tested a local algorithm but obtained worse results in terms of convergence speed. They did a lot of work on physical disability, and more recently they integrated the needs of visually impaired users into their solution [5]. This way, they could obtain UIs with the different aspects shown in Fig. 4. The problem of adapting an interface to a user is equivalent to the problem of adapting an interface to any kind of terminal. This is affirmed in recent works [15, 23] which provide a good state of the art on UI adaptation to disabled users. They propose a model for the context of use and they adapt the UI with a combinatorial optimization algorithm (an example is shown in Fig. 5). There are also works aiming at optimizing interfaces for educational games [14]. They benefit from an on-line game platform where a great number of children play, which makes statistics possible. They implemented a multi-armed bandit learning algorithm. One also finds in the literature some more general works on reinforcement learning (RL) involving a human [24], in a way that can be applied to UI optimization. This approach does not consider the algorithm itself but the addition of a video camera as an


Fig. 4. Rendering given by SUPPLE for people having or not a physical or visual disability [5].

Fig. 5. UI generated for an iPad (left) and a desktop PC (right) [15].

additional means of measuring user feedback. The principle is illustrated in Fig. 6. The reward data are:
– The satisfaction signals given by the user on a pedal.
– The facial expression analyzed through the camera.


Fig. 6. A test system with user feedback on a pedal and on a video camera, and with a RL algorithm [24].

3.4 Image Processing
Image processing algorithms that can enhance readability already exist in many available products dedicated to images (IrfanView, GIMP, Photoshop, etc.). As mentioned above, [10] proposes image processing associated with zooming to avoid blocking effects. Image processing has also been introduced in products that are dedicated to the visually impaired, such as PortaNum [22].

4 Proposed Solution
As one can notice in the previous section, the literature offers many interesting elements, including a configurable UI (SUPPLE) and combinatorial optimization algorithms for UI optimization. But there is still no approach dedicated to readability with automatic learning and an objective measurement of the user performance.

4.1 Proposed System Architecture
We propose to optimize automatically the readability parameters of a UI based on the reading performance (speed, fluidity, error rate) measured on a displayed text. The tested user must read aloud the proposed text; her voice is then recorded by a microphone and recognized in real time. The scheme of the proposed device is given in Fig. 7.

4.2 Adaptation Algorithms
The problem of optimizing those readability parameters can be treated as a RL problem, since the system will try sets of parameters, measure their positive or negative impact on


Fig. 7. Proposed device to optimize automatically the readability of texts in a UI.

the reading performance, and then readjust them. The problem may be combinatorial (due to the nature of the parameters to set) or continuous, if the study focuses on the characteristics of a transfer function for example. Various solutions may be appropriate depending on the studied system. In the continuous case, a gradient-like algorithm can be efficient. In a mixed case, a simulated annealing metaheuristic can be a good candidate. Additional criteria showing that the optimum is reached may be:
– stability (the user is steady), or
– signals given by the user, etc.

4.3 An Implementation for the Reinforcement Learning

The principle of RL is to learn how to behave in front of an unknown system by taking into account the positive and negative feedback signals, called rewards, that it provides when stimulated. The model which is classically used to represent the unknown system is a Markov Decision Process (MDP). An MDP [19] is a sequential decision-making process, i.e. a system where an action must be performed at each time step t. Each action has two effects: it influences the following system state, and it provides a gain called reward. The MDP is modeled by a tuple (Ω, A, Q, R, P0) where Ω is the set of all possible states s of the system, and A is the space of all possible actions a. The applications Q and R,

Q : Ω × A × Ω → [0, 1],   Q(s, a, s′) = Pr(s′ | a, s),
R : Ω × A × Ω → ℝ,        R(s, a, s′) = R(s′ | a, s),


are respectively the transition probability and the reward function providing the reward r obtained when moving from state s at time t to state s′ at the following time t + 1 on action a. And P0 is the vector of the initial probabilities of all the states.
The use of RL to improve UIs has been proposed recently in [24]. Their system is illustrated in Fig. 6. They focus on the comparison between two families of positive or negative rewards:
– satisfaction signals given willingly by the user through a pedal,
– an estimation of the user's satisfaction obtained through the analysis of her face captured by a webcam.
In the proposed setup (see Fig. 7), the webcam is replaced by a microphone and the user satisfaction criteria are replaced by user reading performance criteria such as, typically:
– r1, related to the reading speed V,
– r2, related to the confusion rate θ in the reading of similar words.
As there are two (or more) rewards, they can be combined by any aggregation function (weighted average, etc.) to obtain a single global reward. Aggregation functions are well known in the multi-criteria decision making domain.

4.4 Fitting Discrete Parameters
We focus here on the case where the actions that can be applied to the system are discrete and their space has no metric, so no neighborhood can be defined. This is the case for the font family to use in the proposed UI (e.g. Courier New, Arial, Comic Sans MS, etc.). Performing action a is applying a given font family. The reward is a gain or a loss in V or θ. The system is modeled as an MDP. When applying an action a to an MDP in state s, the MDP evolves to state s′ and provides a reward r: (s, a) → (s′, r) with probability Pr(s′ | s, a). The state s of the system is the set of its appearance parameters. So, the set of states Ω is a Cartesian product of the space of the font families and the other parameters. When applying a to s, one obtains a new state s′. Here s′ is deterministic given a, but the rewards
r1 = V(s′) − V(s) and r2 = θ(s′) − θ(s)
are probabilistic since they are obtained from measurements (that are necessarily noisy).
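To make the discrete case more concrete, the following sketch shows how such noisy rewards could drive the choice of a font family. It is only an illustration under stated assumptions: the paper does not prescribe a particular algorithm for the discrete case, so an ε-greedy selection rule is used here purely for illustration, and measure_reading is a hypothetical stand-in for the voice-recognition module returning the measured speed V and confusion rate θ.

```python
import random

# Hypothetical sketch: choosing a font family (a discrete, metric-less action space)
# from noisy reading-performance measurements. measure_reading(font) is assumed to
# return a dict {"V": reading_speed, "theta": confusion_rate}.

FONTS = ["Courier New", "Arial", "Verdana"]   # discrete action space A
WEIGHTS = (0.7, 0.3)                          # aggregation weights for r1 and r2

def reward(prev, new):
    """Aggregate r1 (speed gain) and r2 (confusion decrease) into one global reward."""
    r1 = new["V"] - prev["V"]
    r2 = prev["theta"] - new["theta"]         # lower confusion is better
    return WEIGHTS[0] * r1 + WEIGHTS[1] * r2

def adapt(measure_reading, steps=20, eps=0.2):
    estimates = {f: 0.0 for f in FONTS}       # running reward estimate per font
    counts = {f: 0 for f in FONTS}
    state = measure_reading(FONTS[0])         # initial measurement
    for _ in range(steps):
        # epsilon-greedy action selection over the discrete action space
        font = (random.choice(FONTS) if random.random() < eps
                else max(estimates, key=estimates.get))
        new_state = measure_reading(font)     # noisy measurement (V, theta)
        r = reward(state, new_state)
        counts[font] += 1
        estimates[font] += (r - estimates[font]) / counts[font]
        state = new_state
    return max(estimates, key=estimates.get)  # recommended setting
```

The weighted sum in reward corresponds to the aggregation of r1 and r2 by a weighted average mentioned in Sect. 4.3; any other aggregation function could be substituted.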

4.5 Fitting Continuous Parameters

We focus here on continuous features of the UI appearance, such as the font size or the color coordinates (R, G, B). The possible actions consist in increasing or decreasing those parameters. The system behavior will be continuous w.r.t. the action. The most appropriate model is to use a gradient optimization, viewed as a particular case of RL. The gradient is e_n = y_n − y_{n-1}, where the system output at step n is y_n. To optimize y, one must provide to the system an input x_n such that x_{n+1} = x_n + α e_n (x_n − x_{n-1}), where α is a coefficient. This expression can be extended to the multidimensional case with a scalar product. Here, x is the action a and y is the reward r. In other words, if one sees that the change in the parameters decreases the performance, one reverses the change. As the system has both discrete and continuous parameters, the model to apply is hybrid.
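The update rule above can be written in a few lines. The sketch below is only an illustration: measure_reward and the example read_test call are hypothetical stand-ins for the aggregated reading-performance measurement of the proposed device.

```python
def adapt_continuous(measure_reward, x0, x1, alpha=0.5, steps=30):
    """Gradient-like rule x_{n+1} = x_n + alpha * e_n * (x_n - x_{n-1}),
    with e_n = y_n - y_{n-1} the change in the measured reward.
    measure_reward(x) is assumed to return the aggregated reading-performance reward."""
    x_prev, x = x0, x1
    y_prev, y = measure_reward(x_prev), measure_reward(x)
    for _ in range(steps):
        e = y - y_prev                        # sign tells whether the last change helped
        x_next = x + alpha * e * (x - x_prev)
        x_prev, y_prev, x = x, y, x_next
        y = measure_reward(x)                 # noisy measurement on the new setting
    return x

# Example (hypothetical): tuning a font size starting from 12 pt and 14 pt.
# best_size = adapt_continuous(lambda s: read_test(font_size=s), 12.0, 14.0)
```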

4.6 Orders of Magnitude and Practical Aspects

It seems reasonable that the system stabilizes within a couple of minutes, so the amount of text to be read by the user has to be chosen to match this total reading time. The system needs no user profile database since it performs no classification of the user. This is why it should also be quite robust to the user profile. However, this method is to be developed particularly for disabled people, which will guarantee some robustness.

4.7 Choosing the Text to Be Read by the User

Existing databases of texts for reading performance evaluation can be used. Existing tests that were developed for pedagogical purposes [2] are well suited. They need to be long enough to measure the reading speed and fluidity, and the reading error rate.

5 Conclusion
We proposed a new device for adjusting the appearance parameters of a GUI to a given user, whose particular aspects are:
– the use of a microphone and a text to be read by the user,
– the appearance is optimized automatically without being affected by the user's subjectivity and imprecision, and
– the system can help the user to know what is best for her.


A theoretical description of the proposed approach was given in this paper. It can be applied to a wide variety of UIs for the visually impaired or for all users, as for example the SUPPLE UI design software [4]. It can also be used to fit the sensitivity and speed parameters of the new zooming interaction model with gesture control for tablets and smartphones that was developed and published recently [1, 7]. The idea was proposed theoretically here; the future work is its evaluation. To implement the voice recognition feedback loop, one will have to study which measured criteria are the most relevant, how to measure them (voice speed and rhythm), and how to cope with the errors due to the voice recognition itself. A system with some continuous parameter optimization should be developed as a first step before trying reinforcement learning for discrete parameters. The criteria to evaluate the quality of the obtained system can also be studied from a human-factors point of view: user satisfaction, speed of convergence, and performance.

References
1. Cherké, S., Girard, N., Soubaras, H.: Des malvoyants à l'industrie: une nouvelle commande gestuelle de zoom pour tablettes et smartphones. In: Proceedings of HANDICAP, Paris, France, June 2018
2. Ecalle, J.: L'évaluation de la lecture et des compétences associées. Revue française de linguistique appliquée 15(1), 105–120 (2010)
3. Forster, K.I.: Visual perception of rapidly presented word sequences of varying complexity. Atten. Percept. Psychophys. 8(4), 215–221 (1970)
4. Gajos, K., Weld, D.S.: SUPPLE: automatically generating user interfaces. In: Proceedings of the 9th International Conference on Intelligent User Interfaces, pp. 93–100. ACM (2004)
5. Gajos, K.Z., Wobbrock, J.O., Weld, D.S.: Automatically generating user interfaces adapted to users' motor and vision capabilities. In: Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology, pp. 231–240. ACM (2007)
6. Gajos, K.Z., Wobbrock, J.O., Weld, D.S.: Improving the performance of motor-impaired users with automatically-generated, ability-based interfaces. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1257–1266. ACM (2008)
7. Girard, N., Cherké, S., Soubaras, H., Nieuwenhuis, K.: A new gesture control for zooming on tablets and smartphones for visually impaired people. In: International Conference on Computers Helping People with Special Needs, pp. 351–358. Springer (2018)
8. Gish, K.W., Staplin, L.: Human factors aspects of using head up displays in automobiles: a review of the literature. Technical report, US Dept of Transportation, National Highway Traffic Safety Administration, Washington, DC, Interim Report, August 1995
9. Kiefer, R.J.: Quantifying head-up display (HUD) pedestrian detection benefits for older drivers. In: 16th International Technical Conference on Experimental Safety Vehicles, NHTSA, Windsor, pp. 428–437 (1998)
10. Kurniawan, S., King, A., Evans, D.G., Blenkhorn, P.: Design and user evaluation of a joystick-operated full-screen magnifier. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 25–32. ACM (2003)
11. Lefavrais, P.: Test de l'Alouette (1967)
12. Lefavrais, P.: Test de l'Alouette Révisé. Editions du Centre de Psychologie Appliquée, Paris (2005)
13. Lino, T., Otsuka, T., Suzuki, Y.: Development of heads-up display for a motor vehicle. Technical report, SAE Technical Paper (1988)


14. Lomas, J.D., Forlizzi, J., Poonwala, N., Patel, N., Shodhan, S., Patel, K., Koedinger, K., Brunskill, E.: Interface design optimization as a multi-armed bandit problem. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pp. 4142–4153. ACM (2016)
15. Macik, M., Cerny, T., Slavik, P.: Context-sensitive, cross-platform user interface generation. J. Multimodal User Interfaces 8(2), 217–229 (2014)
16. Spillmann, T., Dillier, N.: Test d'audiométrie par ordinateur dans le diagnostic audiométrique (1988)
17. Oulasvirta, A.: User interface design with combinatorial optimization. Computer 50(1), 40–47 (2017)
18. Post, D.L., Lippert, T.M., Snyder, H.L.: Color contrast metrics for head-up displays. In: Proceedings of the Human Factors Society Annual Meeting, vol. 27, pp. 933–937 (1983)
19. Puterman, M.L.: Markov Decision Processes. Wiley, New York (1994)
20. Richaudeau, F., Gauquelin, M., Gauquelin, F.: La lecture rapide: une méthode moderne pour apprendre sans peine: lire mieux et davantage, décupler son information. Marabout (1999)
21. So, J.C.Y., Chan, A.H.S.: Design factors on dynamic text display. Eng. Lett. 16(3), 368–371 (2008)
22. Soubaras, H., Colineau, J.: PortaNum: une nouvelle aide technique à la vision de loin comportant du traitement d'images pour les malvoyants. In: Proceedings of HANDICAP, Paris, France (2002)
23. Tomasek, M., Cerny, T.: Automated user interface generation involving field classification. Softw. Netw. 2018(1), 53–78 (2018)
24. Veeriah, V., Pilarski, P.M., Sutton, R.S.: Face valuing: training user interfaces with facial expressions and reinforcement learning. CoRR, abs/1606.02807 (2016)
25. Wang, A.-H., Chen, C.-H.: Effects of screen type, Chinese typography, text/background color combination, speed, and jump length for VDT leading display on users' reading performance. Int. J. Ind. Ergon. 31(4), 249–261 (2003)
26. Yoo, H.: Display of HUD warnings to drivers: determining an optimal location. Technical report, The University of Michigan Transportation Research Institute, Ann Arbor, MI (1999)

A Machine-Synesthetic Approach to DDoS Network Attack Detection
Yuri Monakhov, Oleg Nikitin, Anna Kuznetsova, Alexey Kharlamov, and Alexandr Amochkin
Vladimir State University, 600000 Vladimir, Russia
[email protected], [email protected]

Abstract. In the authors’ opinion, anomaly detection systems, or ADS, seem to be the most perspective direction in the subject of attack detection, because these systems can detect, among others, the unknown (zeroday) attacks. To detect anomalies, the authors propose to use machine synesthesia. In this case, machine synesthesia is understood as an interface that allows using image classification algorithms in the problem of detecting network anomalies, making it possible to use non-specialized image detection methods that have recently been widely and actively developed. The proposed approach is that the network traffic data is “projected” into the image. It can be seen from the experimental results that the proposed method for detecting anomalies shows high results in the detection of attacks. On a large sample, the value of the complex efficiency indicator reaches 97%.

Keywords: Data networks · Image recognition · Availability · Attack detection

1 Introduction

One of the methods of ensuring network availability is employing network anomaly detection mechanisms. Before defining an anomaly, it is necessary to figure out what is considered a normal state. We consider the state of a system "normal" (or "functionally viable") when it performs all the functions assigned to it. Therefore, an anomaly is a state where the behavior of the system does not correspond to the clearly established characteristics of normal behavior [1]. Implementing prompt detection mechanisms for such anomalies will sufficiently increase the chances of an effective response to network availability violation incidents. Known network anomalies are so diverse that they cannot be categorized using a single classification. There are clear distinctions, however, between active and passive, external and internal, intentional and unintentional anomalies, etc. (This report contains results of the research project supported by the Russian Foundation for Basic Research, grants no. 18-07-01109 and 16-47-330055.)


Since these distinctions do not reflect all the characteristics of the phenomenon under study, the author [2] proposed a classification of anomalies based upon the impact object, i.e. an information system consisting of hardware, software and a network infrastructure. According to the chosen approach, network anomalies can be divided into two main groups: node malfunctions and security breaches. Node malfunctions include hardware faults, design and configuration errors, software errors, and hardware performance issues. Network security breaches include the following anomalies: network scanning, denial of service, malware activity, distribution of network worms, exploitation of vulnerabilities, traffic analyzers (sniffers), and network modifiers (packet injections, header spoofing etc.). The largest financial damage to telecom operators is caused by denial of service incidents. DoS attacks, in turn, can be divided into two types: inadvertently caused “attacks” (design errors and network settings, a small amount of dedicated computing resources, a sharp increase in the number of calls to a network resource) and attacks due to deliberate actions, e.g. UDP flood, TCP SYN flood, Smurf ICMP broadcast flood and ICMP flood. Deliberate attacks pose the greatest threat, as it is more difficult to mitigate them effectively and potentially they can lead to large losses. Analysis of research results published in [3–8], as well as reports of major information security systems developers, showed that there is no single effective algorithm for denial-of-service attack detection and mitigation. Usually, vendors offer an expensive solution implementing a hybrid algorithm based on signature search methods and blacklisting attacker node IP addresses as a form of mitigation. An example is the ATLAS system from Arbor, Ltd. Thus, the problem of developing tools for distributed DoS attack detection with a high degree of efficiency remains relevant. The rest of this paper is organized as follows: in Sect. 2, a review of the existing approaches for detecting anomalies is provided; Sect. 3 discusses the proposed approach, specifically, a strategy of representing traffic metadata into an image and an algorithm for classifying the obtained image are presented; in Sect. 4, experimental results are provided; Sect. 5 concludes the work and gives an outlook for further studies.

2 Existing Approaches

In the authors' opinion, anomaly detection systems, or ADS, seem to be the most promising direction in the subject of attack detection, because these systems can detect, among others, unknown (zero-day) attacks. Almost all the models for detecting anomalies described in the literature can be divided into:
(a) Based on a behavioral pattern storage [9, 10]. The program implementation of this approach needs to be compiled into the operating system kernel, which is difficult to the point of practical impossibility (e.g. in trusted computing systems). In addition, the constant presence of a monitoring component leads to an overall slowdown of the entire system by approximately 4%–50%.


(b) Frequency-based [11, 12]. Common drawbacks of frequency methods are their poor adaptability, since the reference values of frequencies are determined once by training sets or according to expert data. Moreover, these methods are usually "stateless", i.e. the order of feature appearance is not taken into account.
(c) Based on some type of a neural network classifier [13–17]. The disadvantage of many neural networks is their poor suitability for processing non-ordered datasets. Introducing an artificial order on a set of element values will only distort the picture, since the neural network will recalculate weights according to the proximity of numerical values.
(d) Based on finite automaton (state machine) synthesis [6, 9, 18–20]. The main disadvantage of this approach is the complex process of building a state machine by parsing the attack scenario. In addition, there are restrictions on the types of attack algorithms that can be described by regular grammars.
(e) Other, special: based on Bayesian networks [21], genetic algorithms [22], etc. Most of these works offer only the basic idea or the algorithm, often unsuitable for practical use.

3 Proposed Approach

To detect anomalies, the authors propose to use machine synesthesia. In this case, machine synesthesia is understood as an interface that allows using image classification algorithms in the problem of detecting network anomalies, making it possible to use non-specialized image detection methods that have recently been widely and actively developed [23]. The proposed approach is that the network traffic data is “projected” into the image. Accumulating image changes gives us a video stream, analyzing which, we can make a conclusion about the anomalous state of the observed data network. The basis of any anomaly detection system is a module that analyzes network packets and decides on their potential maliciousness. In fact, ADS is trying to classify network traffic into two subsets: “normal” traffic and network attacks (it doesn’t even matter which detection technology is used—signature-based or statistical). Consequently, the very concept of ADS is in very good agreement with the goals of image classification algorithms - matching the original image to a class of images from a set according to some features. Moreover, image classification as a mathematical tool for analyzing network traffic data and detecting network attacks has several advantages compared to the anomaly detection methods discussed earlier. These advantages are represented below. – The mathematical apparatus for the classification of images is well developed and tested in practice in many other areas of science and technology. – A large number of image classification algorithms and wide possibilities for their improvement make this mathematical apparatus very flexible and provide an extensive potential for increasing the efficiency of network intrusion detection.


– Most image classification algorithms, showing high practical efficiency, are relatively easy to understand and implement in software.
– Image classification is very effective even with very large amounts of input data. This fact makes us consider these methods as especially suitable for analyzing large network traffic dumps.
– Classification of images can be applied even in the absence of a priori information about the importance of particular network packet features in the context of detecting certain types of network attacks.
– Interpretation of the results is fairly simple and intuitive.

3.1 Image Representation of Multidimensional TCP/IP Traffic Data

The authors propose to solve the problem of representing network traffic metadata in a way which will allow using a pattern recognition algorithm to detect anomalies in the video stream. Consider a network terminal device collecting traffic in a virtual channel. Each collected packet has a set of metadata, presented as a vector p: p(id, date, x1, x2, . . . , xn), p ∈ P, where n is the vector dimension, P is the set of all vectors, id is a session identifier, date is the timestamp of logging by the terminal, and x1, . . . , xn are the direction, addresses and ports of sender and receiver, packet size, protocol type, timestamp (as in the TCP segment header), different flags and service fields. To project traffic into an image, the "orthogonal projection" method is used [24]: each vector p is represented by a point in multidimensional space, where n is the dimension of the space; then all points (packets) belonging to one session are projected into two-dimensional space:

X′ = det | (ā·ā)  (ā·b̄)  ā ;  (b̄·ā)  (b̄·b̄)  b̄ ;  (X̄×ā)  (X̄×b̄)  0 |  /  det | (ā·ā)  (ā·b̄) ;  (b̄·ā)  (b̄·b̄) |

where a, b are empirically chosen basis vectors for the projection into the two-dimensional space, X is the source vector constructed from p by removing the id and date elements, X′ is the projection result, × is a cross product, and (·) is a scalar product. The next stage of the network session imaging is the connection of all its points, forming a convex figure. The last step is to fill the resulting shape with color. Then everything is repeated for the next network session. The resulting image is obtained when the imaging process has been performed for all network sessions intercepted by the terminal. Accumulating changes or differentiating this image gives us a video stream. Figure 1 shows examples of images that reflect the legitimate ("normal state") network behavior.


Fig. 1. Examples of images that reflect the legitimate (“normal state”) network behavior.
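The projection above can be illustrated with a few lines of NumPy. This is only a sketch under stated assumptions: it interprets the products in the last row of the determinant as scalar products, in which case the projection reduces to solving the 2×2 Gram system for the coordinates of X in the (a, b) basis; the basis vectors and packet features used in the example are placeholders, not the ones used in the paper.

```python
import numpy as np

def project_to_plane(X, a, b):
    """Project an n-dimensional packet-feature vector X onto the plane spanned by the
    empirically chosen basis vectors a and b, returning its 2-D coordinates.
    Solves [[a.a, a.b], [b.a, b.b]] @ [alpha, beta] = [X.a, X.b]."""
    G = np.array([[a @ a, a @ b],
                  [b @ a, b @ b]], dtype=float)
    rhs = np.array([X @ a, X @ b], dtype=float)
    alpha, beta = np.linalg.solve(G, rhs)
    return np.array([alpha, beta])

# Hypothetical session imaging: project every packet of one session to 2-D points,
# which would then be connected into a convex figure and filled with color.
# packets = np.random.rand(40, 12)              # 40 packets, 12 metadata features each
# a, b = np.random.rand(12), np.random.rand(12)
# points = np.array([project_to_plane(p, a, b) for p in packets])
```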

3.2 Image Classification in the Problem of Anomaly Detection

The next step is to solve the problem of classifying the obtained image. In general, the solution to the task of detecting classes (objects) in an image is to use machine learning algorithms for building class models, and then output algorithms to search for classes (objects) in an image. Building a model has two stages: (a) Extraction of characteristic features for a class: construction of characteristic feature vectors for class elements. (b) Training on the obtained features of the model for subsequent recognition tasks. The description of the class object is carried out using feature vectors. Vectors are built from: (a) Color information (oriented gradient histogram). (b) Contextual information. (c) Data on the geometric interposition of object parts.
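As one possible realisation of the "colour information (oriented gradient histogram)" component of such a feature vector, the sketch below uses scikit-image's hog function. This is an assumption for illustration only: the paper does not name a specific library, the function name and parameters here are hypothetical, and the input is assumed to be a 2-D grayscale rendering of a session image.

```python
import numpy as np
from skimage.feature import hog

def gradient_histogram_features(gray_image):
    """Per-image feature vector built from an oriented-gradient histogram (HOG),
    one common way to encode the 'colour/shape' information of a class element."""
    return hog(gray_image,
               orientations=9,
               pixels_per_cell=(16, 16),
               cells_per_block=(2, 2),
               block_norm="L2-Hys")

# img = np.random.rand(128, 128)     # stand-in for a rendered 2-D session image
# vec = gradient_histogram_features(img)
```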


The classification (prediction) algorithm can be divided into two stages: (a) Extracting features from an image. At this stage, two tasks are performed: • Since the image can contain objects of many classes, we need to find all the representatives. To do this, one might use a sliding window, which “runs through” the image from the upper left to the lower right corner. • The image is scaled, since the scale of the objects in an image may vary. (b) Associating an image with a specific class. A formal class description, i.e. a set of features that are highlighted by their test images, is used as an input data. Based on this information, the classifier decides whether the image belongs to the class and assesses the degree of reliability for the conclusion. Classification Methods. Classification methods range from mostly heuristic approaches to formal procedures based on the methods of mathematical statistics. There is no generally accepted classification, but a number of approaches to image classification can be distinguished: – Methods of part-based object modeling. – “bag-of-words” methods. – Spatial pyramid matching methods. For implementation presented in this article the authors chose the bag-of-words algorithm, considering the following reasons: – The algorithms of the parts-based modeling and spatial pyramid matching are sensitive to the position of the descriptors in space and their mutual arrangement. These classes of methods are effective in the tasks of detecting objects in an image; however, due to the characteristic features of the input data, they are poorly applicable to the problem of image classification. – The bag-of-words algorithm is widely tested in other areas of knowledge, it shows good results and is simple enough to implement. To analyze the video stream projected from the traffic, we used a naive Bayes classifier [25]. It is often used to classify texts with the bag-of-words model. In this case, the approach is similar to the analysis of texts, only descriptors are used instead of words. The work of this classifier can be divided into two parts: the training phase and the prediction phase. Training Phase. Each frame (image) is fed to the input of the descriptor search algorithm, in this case the scale-invariant feature transform (SIFT) [26]. After that, the task of correlating singular points between frames is performed. A singular point on the image of an object is a point that will most likely appear on other images of this object. To solve the problem of comparing the singular points of an object in different images, a descriptor is used. Descriptor is a data structure, identifier of a singular point, distinguishing it from the rest. It may or may not be invariant w.r.t. image transformations of the object. In this case, the descriptor is invariant


w.r.t. perspective transformations, i.e. scaling. The descriptor makes it possible to compare a singular point of the object in one image with the same singular point in another image of this object. Next, the set of descriptors obtained from all images is ordered into groups "by similarity" using the k-means clustering method [26, 27]. This is done in order to train the classifier, which will issue a conclusion about whether the image represents anomalous behavior. Below is a step-by-step algorithm for training the image descriptor classifier:
Step 1. Extraction of all descriptors from the sets with attack and without attack.
Step 2. K-means clustering of all descriptors into n clusters.
Step 3. Calculation of the matrix A(m, k), where m is the number of images and k is the number of clusters. The element (i, j) stores how frequently descriptors from the j-th cluster appear in the i-th image. Such a matrix will be called the appearance frequency matrix.
Step 4. Calculation of descriptor weights using the tf-idf formula¹:
tfidf(t, d, D) = tf(t, d) · idf(t, D).
Here tf ("term frequency") is the frequency of occurrence of the descriptor in the given image, defined as
tf(t, d) = n_t / Σ_k n_k,
where t is a descriptor, n_t is the number of occurrences of descriptor t in the image, and the sum in the denominator runs over all descriptors k of the image. And idf ("inverse document frequency") is the inverse frequency of images containing the given descriptor in the sample, defined as
idf(t, D) = log( |D| / |{d_i ∈ D : t ∈ d_i}| ),
where |D| is the number of images in the sample and |{d_i ∈ D : t ∈ d_i}| is the number of images in D where t is found (i.e. n_t ≠ 0).
Step 5. Substituting the corresponding weights instead of the raw frequencies into the matrix A.
Step 6. Classification: we use "boosting" (AdaBoost) of naive Bayes classifiers.
Step 7. Saving the trained model to a file.
Step 8. This concludes the training phase.
¹ Wu, H., Luk, R., Wong, K., Kwok, K.: Interpreting TF-IDF term weights as making relevance decisions. ACM Transactions on Information Systems 26(3) (2008).
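The training steps above map naturally onto standard libraries. The sketch below is an illustration under stated assumptions, not the authors' implementation: OpenCV's SIFT_create is used for descriptor extraction (images assumed to be 8-bit grayscale), scikit-learn's TfidfTransformer stands in for the tf-idf weighting (its smoothing differs slightly from the formula in the text), and the AdaBoost keyword is `estimator` in recent scikit-learn versions (`base_estimator` in older ones); the function name is hypothetical.

```python
import numpy as np
import cv2
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB

def train_bow_classifier(images, labels, n_clusters=1000):
    """Bag-of-visual-words training sketch: SIFT descriptors -> k-means vocabulary ->
    appearance frequency matrix A -> tf-idf weighting -> boosted naive Bayes."""
    sift = cv2.SIFT_create()
    per_image, all_desc = [], []
    for img in images:                                     # Step 1: extract descriptors
        _, desc = sift.detectAndCompute(img, None)
        desc = desc if desc is not None else np.empty((0, 128), np.float32)
        per_image.append(desc)
        all_desc.append(desc)
    vocab = KMeans(n_clusters=n_clusters, n_init=3).fit(np.vstack(all_desc))   # Step 2
    A = np.zeros((len(images), n_clusters))                # Step 3: frequency matrix
    for i, desc in enumerate(per_image):
        if len(desc):
            words, counts = np.unique(vocab.predict(desc), return_counts=True)
            A[i, words] = counts
    A_weighted = TfidfTransformer().fit_transform(A)       # Steps 4-5: tf-idf weights
    clf = AdaBoostClassifier(estimator=GaussianNB())       # Step 6: boosted naive Bayes
    clf.fit(A_weighted.toarray(), labels)
    return vocab, clf                                      # Step 7: persist as needed
```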


Prediction Phase. The differences between the training phase and the prediction phase are small: descriptors are extracted from the image and related to the groups at hand. Based on this relationship, a vector is constructed. Each element of this vector is the frequency of occurrence of descriptors from the corresponding group in the image. Analyzing this vector, the classifier can make a prediction about an attack with a certain probability. The general algorithm for prediction based on a pair of classifiers is presented below:
Step 1. Extraction of all descriptors from the image.
Step 2. Clustering the resulting set of descriptors.
Step 3. Calculation of the vector [1, k].
Step 4. Calculation of the weight for each descriptor by the tf-idf formula presented above.
Step 5. Replacing the frequencies of occurrence in the vector with their weights.
Step 6. Classification of the resulting vector by the previously trained classifier.
Step 7. Conclusion about the presence of an anomaly in the observed network based on the prediction of the classifier.

4 Detection Efficiency Evaluation

The task of evaluating the efficiency of the proposed method was solved experimentally. The experiment used a number of parameters set empirically: 1000 clusters were used for clustering, and the generated images were 1000 by 1000 pixels.

4.1 Experimental Dataset

A setup was assembled for the experiments. It consists of three devices connected by a communication channel. The block diagram of the setup is shown in Fig. 2.

Fig. 2. The block diagram of the setup

The SRV device plays the role of a server under attack (hereinafter referred to as the target server). As the target server, the devices listed in Table 1 with the code SRV were used sequentially. The second is a network device designed to transfer network packets. Characteristics of the device are shown in Table 1 under the code ND-1.


On the target servers, network packets were captured to a PCAP file for later use in the detection algorithms. For this task, the tcpdump utility was used. The datasets are described in Table 2. The following software was used on the target servers: a Linux distribution, the nginx 1.10.3 web server, and the postgresql 9.6 DBMS. To emulate system load, a special web application was written. The application queries a database with a large amount of data; the request is designed to minimize the use of various caching. Throughout the experiments, requests to this web application were generated.

Table 1. Network device characteristics

Code | Description | RAM, MB | Network interface, Mbps | Disk drive, GB | Disk type | Processor
SRV-1 | Acer Atom Nettop | 2048 | 100 | 60 | SSD | 2 × 2 GHz Intel Atom D525
SRV-2 | virtual host | 6144 | 100 | 70 | SSD | 8 × 2 GHz Intel Xeon E7-4850
SRV-3 | virtual host (KVM) | 512 | 1000 | 10 | HDD | 2 GHz QEMU Virtual CPU
ND-1 | WR842ND router | 32 | 100 | 0.008 | Flash | 535 MHz MIPS 74 Ks

Table 2. Sets of captured network packets
Code | Filename | Server | DDoS | Time of record, min | No. of packets | Dump size
D1 | calm network | SRV-1 | No | 71 | 2950108 | 2.8 Gb
D2 | empty net 247 | SRV-2 | No | 71 | 87306 | 15 Mb
D3 | empty net | SRV-3 | No | 17 | 163950 | 11 Mb
D4 | pretty loaded | SRV-3 | Yes | 13 | 53244 | 54 Mb
D5 | loaded | SRV-1 | Yes | 12 | 2949244 | 433 Mb
D6 | loaded 2 | SRV-2 | Yes | 5 | 589706 | 403 Mb

Table 3. Background traffic features
Code | Protocols | Traffic datasets
BT-1 | bittorrent | D1, D5
BT-2 | ssh | All datasets
BT-3 | http | D1, D3, D4, D5
BT-4 | https | D2, D6


The attack was generated from the third client device (Table 1) using the Apache Benchmark utility. The structure of the background traffic during the attack and during the rest of the time is presented in Table 3. As the attack, we implemented a version of the distributed HTTP GET-flood DoS. Such an attack is essentially the generation of a constant stream of GET requests, in this case from the CD-1 device. To generate it, we used the ab utility from the apache-utils package. As a result, files containing information about the state of the network were obtained. The main features of these files are presented in Table 2. The main parameters of the attack scenario are listed in Table 4.

Table 4. DDoS attack features
Code | Dataset code | Requests processed | Speed, pps | Avg. processing time, ms
A-1 | D4 | 900 | 15.90 | 29201
A-2 | D5 | 8300 | 24.45 | 18120
A-3 | D6 | 9950 | 31.20 | 16023

From the resulting network traffic dumps, the sets of generated images TD#1 and TD#2, which were used for the training phase, were obtained. The sample TD#3 was used for the prediction phase. A summary of the test datasets is presented in Table 5.

Table 5. Test image datasets
Image type | Test data TD#1 | Test data TD#2 | Test data TD#3
Legitimate | 1500 images | 3000 images | 1000 images
With DDoS | 500 images | 1500 images | 1000 images
Total | 2000 images | 4500 images | 2000 images

4.2 Efficiency Criteria

The main parameters evaluated in the course of this research were:
(a) DR (Detection Rate): the number of detected attacks in relation to the total number of attacks. The higher this parameter, the higher the efficiency and quality of the ADS.
(b) FPR (False Positive Rate): the number of "normal" objects mistakenly classified as an attack, in relation to the total number of "normal" objects. The lower this parameter, the higher the efficiency and quality of the anomaly detection system.


(c) CR (Complex Rate): a complex indicator that takes into account the combination of the DR and FPR parameters. Since, as part of the study, the DR and FPR parameters were taken to be of equal importance, the complex indicator was calculated as CR = (DR + (1 − FPR))/2.
The classifier was fed 1000 images marked as "anomalous". Based on the recognition performance, DR was calculated depending on the size of the training sample. The following values were obtained: for TD#1, DR = 9.5%, and for TD#2, DR = 98.4%. Next, the second half of the images (the "normal" ones) were classified. Based on the result, FPR was calculated (for TD#1, FPR = 3.2%, and for TD#2, FPR = 4.3%). Thus, the following comprehensive efficiency indicators were obtained: for TD#1, CR = 53.15%, and for TD#2, CR = 97.05%.
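As a quick check of these figures, the sketch below computes DR, FPR and CR from two lists of classifier decisions. The CR formula used is the one reconstructed above (the printed equation is garbled in the source; the form below reproduces the reported CR values), and the example call is hypothetical data matching the TD#2 figures.

```python
def detection_metrics(pred_attack, pred_normal):
    """DR, FPR and the complex rate CR = (DR + (1 - FPR)) / 2 from two prediction lists:
    pred_attack - classifier outputs (True = 'attack') on images known to be anomalous,
    pred_normal - classifier outputs on images known to be legitimate."""
    dr  = sum(pred_attack) / len(pred_attack)   # detected attacks / all attacks
    fpr = sum(pred_normal) / len(pred_normal)   # normal images flagged as attacks
    cr  = (dr + (1.0 - fpr)) / 2.0
    return dr, fpr, cr

# Sanity check against the reported TD#2 figures (DR = 98.4 %, FPR = 4.3 %, CR ~ 97.05 %):
# detection_metrics([True] * 984 + [False] * 16, [True] * 43 + [False] * 957)
```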

5 Conclusions and Future Research

It can be seen from the experimental results that the proposed method for detecting anomalies performs well in the detection of attacks. E.g., on a large sample, the value of the complex efficiency indicator reaches 97%. However, this method has some limitations in its application:
1. The values of DR and FPR show the sensitivity of the algorithm to the size of the training set, which is a conceptual problem for machine learning algorithms. Increasing the sample results in improved detection rates. However, it is not always possible to collect a sufficiently large training sample for a specific network.
2. The developed algorithm is deterministic; the same image is classified each time with the same result.
3. The efficiency indicators of the approach are good enough for a proof of concept, but the number of false positives is also large, which can lead to difficulties in practical implementation.
To overcome the limitation described above (item 3), it is proposed to change the naive Bayes classifier to a convolutional neural network, which, according to the authors, should lead to an increase in the accuracy of the anomaly detection algorithm.

References 1. Mohiuddin, A., Abdun, N.M., Jiankun, H.: A survey of network anomaly detection techniques. J. Netw. Comput. Appl. 60, 19–31 (2016) 2. Afontsev, E.: Network anomalies (2006). https://nag.ru/articles/reviews/15588/ setevyie-anomalii.html 3. Berestov, A.A.: Architecture of intelligent agents based on a production system to protect against virus attacks on the Internet. In: XV All-Russian Scientific Conference Problems of Information Security in the Higher School System, pp. 180–276 (2008)


4. Galtsev, A.V.: System analysis of traffic to identify anomalous network conditions: the thesis for the Candidate Degree of Technical Sciences, Samara (2013) 5. Kornienko, A.A., Slyusarenko, I.M.: Intrusion Detection Systems and Methods: Current State and Direction of Improvement (2008). http://citforum.ru/security/ internet/ids overview/ 6. Kussul, N., Sokolov, A.: Adaptive anomaly detection in the computer systems users behavior using Markov chains of variable order. Part 2: methods of detecting anomalies and the results of experiments. In: Informatics and Control Problems, no. 4, pp. 83–88 (2003) 7. Mirkes, E.M.: Neurocomputer: Draft Standard, pp. 150–176. Science, Novosibirsk (1999) 8. Tsvirko, D.A.: Prediction of a network attack route using production model methods (2012). http://academy.kaspersky.ru/downloads/academycup participants/ cvirko d.ppt 9. Somayaji, A.: Automated response using system-call delays. In: USENIX Security Symposium 2000, pp. 185–197 (2000) 10. Ilgun, K.: USTAT: a real-time intrusion detection system for UNIX. In: Proceedings 1993 IEEE Symposium on Research in Security and Privacy, pp. 16–28. IEEE (1992) 11. Eskin, E., Lee, W., Stolfo, S.J.: Modeling system calls for intrusion detection with dynamic window sizes. In: Proceedings DARPA Information Survivability Conference and Exposition II, DISCEX 2001, vol. 1, pp. 165–175. IEEE (2001) 12. Ye, N., Xu, M., Emran, S.M.: Probabilistic networks with undirected links for anomaly detection. In: 2000 IEEE Workshop on Information Assurance and Security, West Point, NY (2000) 13. Michael, C.C., Ghosh, A.: Two state-based approaches to program-based anomaly detection. ACM Trans. Inf. Syst. Secur. 5(2), 203–237 (2002) 14. Garvey, T.D., Lunt, T.F.: Model-based intrusion detection. In: Proceedings of the 14th Nation Computer Security Conference, Baltimore, MD, vol. 17 (1991) 15. Theus, M., Schonlau, M.: Intrusion detection based on structural zeroes. Stat. Comput. Graph. Newsl. 9(1), 12–17 (1998) 16. Tan, K.: The application of neural networks to UNIX computer security. In: IEEE International Conference on Neural Networks, Perth, Australia, vol. 1, pp. 476–481 (1995) 17. Ilgun, K., Kemmerer, R.A., Porras, P.A.: State transition analysis: a rule-based intrusion detection system. IEEE Trans. Softw. Eng. 21(3), 181–199 (1995) 18. Eskin, E.: Anomaly detection over noisy data using learned probability distributions. In: 17th International Conference on Machine Learning, pp. 255–262. Morgan Kaufmann, San Francisco (2000) 19. Ghosh, K., Schwartzbard, A., Schatz, M.: Learning program behavior profiles for intrusion detection. In: 1st USENIX Workshop on Intrusion Detection and Network Monitoring, Santa Clara, California, pp. 51–62 (1999) 20. Ye, N.: A Markov chain model of temporal behavior for anomaly detection. In: 2000 IEEE Systems, Man, and Cybernetics, Information Assurance and Security Workshop. IEEE (2000) 21. Axelsson, S.: The base-rate fallacy and its implications for the difficulty of intrusion detection. In: Proceedings of the 6th ACM Conference on Computer and Communications Security, pp. 1–7. ACM, New York (1999). https://doi.org/10. 1145/319709.319710


22. Chikalov, I., Moshkov, M., Zielosko, B.: Optimization of decision rules based on methods of dynamic programming. Vestnik of Lobachevsky State University of Nizhni Novgorod 6, 195–200 (2010) 23. Chen, C.H.: Handbook of Pattern Recognition and Computer Vision. University of Massachusetts Dartmouth, Dartmouth (2015) 24. Gantmacher, F.R.: The Theory of Matrices. Science, Moscow (1968) 25. Murty, M.N., Devi, V.S.: Pattern Recognition: An Algorithmic Approach, pp. 93– 94. Springer, Heidelberg (2011) 26. Lowe, D.G.: Distinctive image features from scale-invariant keypoints (2004). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.157.3843&rep=rep1& type=pdf 27. Clustering with K-means in Python (2013). https://datasciencelab.wordpress.com/ 2013/12/12/clustering-with-k-means-in-python

Novel Synchronous Brain Computer Interface Based on 2-D EEG Local Binary Patterning
Daniela De Venuto and Giovanni Mezzina
Department of Electrical and Information Engineering, Politecnico di Bari, Via E. Orabona 4, 70125 Bari, Italy
{daniela.devenuto,giovanni.mezzina}@poliba.it

Abstract. This paper proposes the design, and the validation through in-vivo measurements, of an innovative machine learning (ML) approach for a synchronous Brain Computer Interface (BCI). The proposed system analyzes EEG signals from 8 wireless smart electrodes placed in the motor and sensory-motor cortex areas. For its functioning, the BCI exploits specific brain activity patterns (BAPs) elicited during the measurements by using a clinically-inspired stimulation protocol that is suitable for the evocation of the Movement-Related Cortical Potentials (MRCPs). The proposed BCI analyzes the EEGs through a symbolization-based algorithm, the Local Binary Patterning, which, due to its end-to-end binary nature, strongly reduces the computational complexity of the feature extraction (FE) and real-time classification stages. As a last step, the discrimination of the user intentions is entrusted to a weighted Support Vector Machine (wSVM) with linear kernel. The data have been collected from 3 subjects (aged 26 ± 1), creating an overall dataset that consists of 391 ± 106 observations per participant. The in-vivo real-time validation showed an intention recognition accuracy of 85.61 ± 1.19%. The overall computing chain requires, on average, just 3 ms beyond the storage time.
Keywords: BCI · EEG · Machine learning · Symbolization · Local binary pattern

1 Introduction
The Brain Computer Interface (BCI) is a control-loop system that translates brain activity patterns (BAPs) into commands for an interactive application [1]. A general BCI scheme is shown in Fig. 1. Here, we can distinguish two main parts: the BCI one (gray blocks in Fig. 1) and the operative interface (yellow and red blocks). The operative interface manages the input and the output of the BCI and consists of the signal acquisition interface (e.g., EEG reader) and the mechatronic device that physically realizes the user intention. It also comprises an external block, which is in general optional: the Stimulation Block (red block in Fig. 1). It is present in synchronous BCIs, such as the one proposed in this paper, and absent in asynchronous BCIs [2, 3]. To provide a complete overview of the BCI working scheme, the figure also shows, for each BCI block, some noteworthy state-of-the-art algorithms.


Fig. 1. General BCI application scheme. The figure shows, for each BCI block, some noteworthy state-of-the-art algorithms.

Most current BCIs monitor information from the brain activity through electroencephalography (EEG) [1]. It has been proven that non-invasive BCI can be used in several specific applications, such as cursor movement on a computer monitor without hand movements [3], in assistive technologies [4], mind-controlled text input systems [5], wheelchairs driving [6], rehabilitation devices for stroke patients [6], and so on. Currently, the BAP recognition, via BCI, is based on EEG pre-processing, followed by proper features extraction (FE)/features selection (FS) stages [1]. The processing chain ends with the selection of the classification method that minimizes the recognition errors. The EEG signals are typically filtered both in the time domain (bandpass filter), and spatial domain (optional- spatial filter) before the FE [1, 7]. Then, the FS algorithms are used to find and select the best features subsets to train the chosen classifier. The current FE algorithms can be divided in two general subtypes: the frequency band power features and time domain features. The band power features (BPF) are related to the power (energy) of EEG signals for a given frequency band on a specific channel, averaged over a given time window (typ: 2 s) [1–6]. The BPFs are the most used features [1, 8] for the identification of oscillatory activity (i.e. changes in EEG rhythm amplitudes [1]). They are considered as the gold standard for BCIs that exploit the motor and mental imagery techniques or for steady state visual evoked potential (SSVEP). The time domain features (TDF) are defined as a concatenation of EEG samples from all channels. Typically, the TDF can be considered reliable only after some preprocessing, band-pass or low-pass filtering and coherent down-sampling [1, 8]. The TDF are less used than the BPF, but they find application in the Event Related Potentials (ERP) classification. The ERPs are temporal variations in EEG signals’ after


These latter are the features used in most P300-based BCIs. The BAPs used to realize reliable single-trial classifications (with proper BPF or TDF feature extraction stages) are: the ERPs, the slow cortical potentials (SCP), the event-related (de-)synchronization potentials (ERD/ERS), the SSVEP and the sensorimotor rhythms (SMR) [10, 11]. Several single-trial based BCIs have been proposed in the state of the art and are summarized in Table 1 [12–15]. Some noteworthy solutions are compared with the proposed work in terms of: chosen BAP, FE and classification methods, mean accuracy, EEG trial length after stimulation (AS), number of subjects involved in the test, data rate (commands/min) and number of available choices. In this framework, this paper proposes a computationally light machine learning (ML) algorithm for a synchronous 2-choice Brain Computer Interface (BCI). The proposed system demonstrated a reduction of the training time by about 7 times with respect to the state of the art, preserving a good trade-off with the classification accuracy (typical state-of-the-art BCI training: ~1 h; proposed BCI: ~8 min).

Table 1. Analysis and comparison with the state-of-the-art solutions

| System | [12] | [13] | [14] | [15] | Our work |
|---|---|---|---|---|---|
| BAP | SCP | ERD/ERS | ERPs | Oscillatory rhythms | MRCPs |
| Methods | SCP modulation + threshold method | Motor Imagery + AR + ANN | Standard unsupervised LLP algorithm | Motor Imagery + CVA + Gaussian | LBP + SVM |
| # Electr. | 32 | 56 | 31 | 16 | 8 |
| Mean accuracy (%) | 86% (online) | 90.2% (pseudo real-time) | 83.6% (offline) | 86.2% (online) | 85.6% (pseudo real-time) |
| Training time | ~50 h | ~4 h | ~4 h | 2 weeks | ~10 min |
| EEG trial length AS | 4 s | 8 s | 17 s | 1 s | 0.2 s |
| Data rate (com/min) | 2 | 6 | 2.4 | 2 | 8 |
| # Choices | 2 | 3 | 24 | 2 | 2 |
| Dataset (subjects) | 2 | 3 | 13 | 10 | 3 |

*AR: Autoregressive Coefficients, ANN: Artificial Neural Network, LLP: learning label proportions, CVA: Canonical Variate Analysis

Another noteworthy strength of the proposed BCI system lies in the after-stimulus EEG trial length needed for an accurate BAP recognition. As shown in Table 1, the current solutions ask for 2 s to 8 s after the stimulus to provide a reliable classification, while the proposed system reaches optimal results with only 0.2 s after the stimulation.


This strong reduction is linked to the motion-preparatory nature of the selected BAP. The proposed system also drastically reduces the computational complexity of the FE stage, entrusting the overall stage to end-to-end binary operations. Moreover, an original contribution of the paper lies in the use of a novel set of time-domain features for the characterization of the oscillatory brain activities (typically treated with BPF techniques) linked to hand movements: the movement-related cortical potentials (MRCPs). This opens interesting scenarios for very fast BCIs. Briefly, in the proposed solution, the MRCPs have been experimentally elicited via clinical-inspired protocols. From the MRCP waveforms, the BCI must recognize which hand (left/right) performed a specific movement (detailed in Sect. 2). The algorithm analyzes the EEGs, wirelessly acquired from 8 smart electrodes, through a methodology known as symbolization [16, 17]. The chosen symbolization technique is a custom version of the Local Binary Patterning (LBP). It allows the BCI to treat the EEG as a sequence of discrete binary strings, named symbols in this context [16, 17]. During the ML stage, some EEG observations are progressively selected to realize two "golden template" binary masks (one for each available choice). Then, considering an unlabeled EEG observation, a similar 2D binary matrix is extracted and compared with the two masks through a set of XNOR gates. The output consists of two likelihood matrices. The degree of similarity between the unlabeled observation matrix and the two reference masks is used to train a weighted Support Vector Machine (wSVM) with a linear kernel. The paper is structured as follows: Sect. 2 gives a description of the used stimulation protocols; Sect. 3 provides information about the overall architecture, from the ML to the real-time classification; Sect. 4 describes the experimental results and Sect. 5 underlines the conclusions and future perspectives.

2 Stimulation Protocol

2.1 The Brain Activity Pattern: MRCP

It has been proven by several clinical studies [1, 10, 18] that, during any voluntary movement, the human brain is involved in a preparation routine that precedes the activation of the proper muscle sequence. This cyclical neural mechanism, related to the preparation and execution of a motor command, usually starts about 1 s before the muscle contraction, with maximum evidence about 200 ms before the movement onset. This brain process is characterized, inter alia, by the EEG movement-related cortical potentials (MRCPs) [18]. In a context of movement recognition, the most characterizing MRCPs are the premotor potential (or Bereitschaftspotential, BP), the µ-rhythms, and the β-rhythms [18–20]. These rhythms occupy a frequency band between 2 Hz and 30 Hz: 2 Hz–5 Hz for the BP, 9 Hz–11 Hz for the µ-rhythms and 13 Hz–30 Hz for the β ones [18].


The relevant potential drifts typically occur from 1 s before the movement activation to 0.2 s after it, where the µ-rhythms are suppressed by the motor action [18]. MRCPs are typically more visible in the central and early parietal cortex areas, known as motor and sensory regions, and specifically in the brain hemisphere opposite to the limb that performs the movement.

2.2 The Elicitation Protocol

For the BCI application treated here, an ad-hoc test has been designed in order to emphasize the voluntariness of the movements. The MRCPs have been experimentally elicited via clinical-inspired protocols. Through the MRCP waveforms, the BCI must recognize which hand (left/right) performed a specific movement. In this first stage of the pilot study, the designed test consists of a stimulation-based protocol that requires physical interaction from the user. Figure 2 shows the experimental measurement setup for the BCI training and testing. Figure 2(a) shows the user position and the equipment needed for the experiment. The user, who wears the wireless EEG headset, sits in front of two push buttons placed on a prototype board (Fig. 2(b)). According to their positions in Fig. 2(b), the yellow button must be clicked with the left hand, while the red one with the right hand. On the same breadboard there is a black buzzer which provides the auditory stimulation to the subject under test. The buzzer is programmed to play with a random inter-stimulus timing (from 3 s to 7 s). It randomly emits two different tones (distant in frequency) to indicate the hand that must be used to push the button. The low-frequency tone is linked to the right-hand movement, while the high-frequency tone leads to the movement of the left hand. The hardware responsible for the stimulation timing is a dedicated ATmega328P-based board (white PCB at the top of Fig. 2(b)). The buzzer waveform – which causes the sound – is contextually sent to the EEG base station (the blue box on the right of Fig. 2(b)). When a specific tone plays and the user pushes the button with the prescribed finger, a common 3.3 V trigger signal is generated and sent to the EEG base station. The base station is connected via USB to the central computing unit, which in this case is a PC. For the gateway interface, a pseudo real-time model has been created in the Simulink/Matlab 2017b environment, which synchronizes the stimulation with the EEGs. Figure 2(c) shows the Simulink interface realized for the real-time evaluation of voluntariness via MRCPs. Starting from the top-left scope in Fig. 2(c) and moving to the right we can find: the monitored EEG signals (Scope 1), the buzzer sound tone (Scope 2) and the button-linked step signal (Scope 3). The bottom scopes display in real time the µ-rhythm, the BP and the β-rhythm.


Fig. 2. Measurements setup: (a) experimental acquisition settings (b) prototype board for auditory stimulation and user responses (c) Simulink interface for the MRCPs monitoring.

3 The BCI Architecture

The overall BCI working principle, from acquisition to real-time classification, is chronologically shown in Figs. 3, 4 and 5. The overall architecture describes a specific processing chain: (1) the acquisition and general settings; (2) the pre-processing; (3) the data management; (4) the Movement Reference Masks (MRM) extraction; (5) the XNOR-based likelihood approach; (6) the Maximum Likelihood Analysis and (7) the weighted feature matrix building that leads to (8) the SVM classifier training. All the steps constitute the machine learning (ML) stage, while the steps from (1) to (3) and from (5) to (7) are "shared" with the real-time classification phase.

3.1 The EEG Acquisition Unit

The acquisition unit is made up of a 32-channel EEG headset, as shown in Fig. 2(a). For this application, data from 8 EEG channels over the sensorimotor area are wirelessly acquired: T7, C3, Cp1, Cp5, C4, T8, Cp2 and Cp6 [11, 18]. The AFz electrode is used as GND for a monopolar reading and the right ear lobe is the reference electrode (REF). The selected channels respect the motor cortical homunculus mapping for the MRCPs extraction [21]. EEG samples are recorded in an analog input range of ±187.5 mV with 24-bit resolution at a 500 Hz sampling rate [22–24] and band-pass filtered in loco between 0.5 Hz and 35 Hz (8th-order Butterworth) [22, 23].
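For clarity, the band-pass settings reported above can be reproduced offline with a few MATLAB lines. This is only an illustrative sketch of the filter parameters given in the text (0.5–35 Hz, 8th order, 500 Hz sampling), not the headset firmware; the variable names and the use of the Signal Processing Toolbox functions butter/filtfilt are assumptions.

fs = 500;                                          % sampling frequency [Hz], as reported above
[b, a] = butter(4, [0.5 35]/(fs/2), 'bandpass');   % butter(4,...,'bandpass') yields an 8th-order filter
eeg_raw  = randn(10*fs, 8);                        % placeholder EEG: 10 s x 8 channels (columns)
eeg_filt = filtfilt(b, a, eeg_raw);                % zero-phase filtering, applied column-wise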


Fig. 3. Top-level blocks diagram for pre-processing and 1st stage of feature extraction in the processing chain

3.2 The Machine Learning Phase

The operation of the proposed BCI can be described as follows: the EEGs are acquired from the wireless headset, which collects and filters the data in loco [16], sending them via Bluetooth Low Energy to the base station connected to the PC USB socket. This EEG base station provides a set of 3.3 V tolerant digital inputs that have been interfaced with the external trigger signals, in order to synchronize them with the EEGs. Considering Fig. 3, the EEGs are acquired without other operations for 60 s (the buzzer does not sound in this time window). This time window is useful for the settling of the EEG headset filters [4, 22]. In this time interval the user provides indications about the winsorizing limits [13] for the real-time removal of the outliers from the EEGs. Indeed, after the 60 s for the filter settling, the buzzer sounds 3 times to start the pre-processing phase. For the following 30 s the user must randomly click the buttons at a rate of 1 click/3 s, limiting the eye blinks as much as possible. This step is necessary to extract, according to the winsorizing settings, the fixed thresholds (upper and lower) for the elimination of the EEG outliers. The random clicks make the MRCP-linked deflections statistically relevant (avoiding their elimination). The real-time clipping of the EEG samples above the winsorizing upper and lower limits constitutes the routine shown in Fig. 3. After the first 90 s, the user waits for the unknown buzzer sound focusing on the hand to be moved. When the tone occurs, the user pushes the button with the prescribed hand, triggering a phase of EEG extraction. Indeed, from the trigger signal onset the system waits for 0.2 s and extracts 1.2 s backwards. In this way, an EEG epoch (or observation) is extracted, consisting of 1 s before the movement onset and 0.2 s after it.
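As a rough illustration of the winsorizing-based outlier clipping and of the epoch extraction window described above, the following MATLAB sketch can be considered; the percentile pair, the variable names and the placeholder data are assumptions (Table 3 reports the limits actually used per dataset).

fs = 500;
calib = randn(8, 30*fs);                     % placeholder 30 s calibration EEG (8 channels)
loTh  = prctile(calib, 10, 2);               % lower winsorizing limit, per channel
hiTh  = prctile(calib, 90, 2);               % upper winsorizing limit, per channel
eeg   = randn(8, 120*fs);                    % placeholder continuous EEG stream
eegW  = min(max(eeg, loTh), hiTh);           % clip samples outside the fixed limits
trig  = 60*fs;                               % placeholder trigger (button press) sample index
epoch = eegW(:, trig-fs+1 : trig+0.2*fs);    % 1 s before + 0.2 s after = 8 x 600 samples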


These epochs are stored in a 3D matrix (EEG Epochs in Fig. 3 – Data Management): EEG Epochs ∈ ℝ^{nch × Nsa × Nepochs}, with nch the number of channels, Nsa the number of samples that constitutes an epoch (600 Sa at a sampling frequency of 500 Hz) and Nepochs the number of collected observations for the offline ML phase. The epochs are firstly divided into two subsets, EEG Rch and EEG Lch, which are, respectively, the EEG observations from the right-side electrodes (Rch) and the left-side ones (Lch): EEG R(or L)ch ∈ ℝ^{nch/2 × Nsa × Nepochs}. Then, they are re-organized considering the movement that generated the MRCP reactions. The resulting two 3D matrices are EEG RHM and EEG LHM, where RHM is Right Hand Movement and LHM is Left Hand Movement. Considering balanced datasets, the system analyzes Nepochs/2 observations for each available movement, so EEG R/LHM ∈ ℝ^{nch × Nsa × Nepochs/2}. These matrices must be organized as:

EEG\_L(or\,R)HM = \begin{bmatrix} EEG\_Rch\_L(or\,R)HM \\ EEG\_Lch\_L(or\,R)HM \end{bmatrix} \quad (1)

3.2.1 The Movement Reference Masks

After the matrices building, some observations of the data stores are randomly selected (Mask Indexing Settings – Fig. 3) in order to derive coherent behaviors among all the EEG epochs. The indexing generates two data stores, named EEG L(or R)HM Mask Selection ∈ ℝ^{nch × Nsa × p(Nepochs/2)}, where p is the percentage of the dataset intended for the mask construction. For instance, in an overall dataset of 200 observations, 100 of them are dedicated to the LHM and 100 to the RHM. If p = 0.5, 50 epochs for both LHM and RHM are used to derive the MRMs. In particular, the EEG L(or R)HM Mask Selections are averaged along the third dimension, generating a 1D vector in which only the phased potential deflections "survive". Both vectors undergo the local binary patterning (LBP) routine. This symbolization routine transforms the EEGs (1D vectors in single/double precision) into binary 2D matrices. For instance, considering the first right EEG channel (e.g. C4 – Rch1), the 1D vector has dimension [1 × Nsa] with single precision. After the LBP routine, it is translated into a [d × Nsa] matrix of binary type, with d = 6 [16, 17]. The resulting matrices from all the channels are stacked into a definitive one (2D bEEG Matrix in Fig. 3) composed as follows:

bEEG\ Matrix = \begin{bmatrix} bEEG\ Mat.\ Rch \\ bEEG\ Mat.\ Lch \end{bmatrix}, \qquad bEEG\ Mat.\ L(or\,R)ch = \begin{bmatrix} 2D\ Mat.\ Lch1 \\ \vdots \\ 2D\ Mat.\ Lch4 \end{bmatrix} \quad (2)


These resulting matrices, derived from the mask selection epochs, realize the two MRMs: MRM LHM and MRM RHM. They can be considered as golden templates for the left-hand movements (MRM LHM) and the right-hand ones (MRM RHM). The MRMs contain spatial information (the first d · nch/2 rows are dedicated to the right channels in a specific order, while the last d · nch/2 are dedicated to the left channels) and temporal information.

The LBP Routine. As stated above, symbolization is a data analysis method that describes a single/double precision general process (the EEG epoch waveform) as a set of discrete symbols. In the last years, symbolization found several interesting applications in intracranial EEG studies, with a strict focus on epileptic seizure classification [16, 17]. Indeed, it has been proven that symbolization preserves the dominant signal features and, at the same time, is more efficient than the horizontal graphs approach, which suffers from offset and slow drift [17]. The symbolization method adopted in this BCI application is the LBP, which consists in mapping a sequence of d = 6 EEG samples into a 1D bit string. The LBP computing is summarized in the following Matlab-based pseudo code (the elided bits 3–5 of the original listing follow the same pattern and are here expressed by the inner loop):

LBP Routine (Example of Matlab coding)
L = 6;                          % number of bits per symbol
c = 1;                          % column index (EEG sample to be substituted by the bit string)
for i = L+1:size(EEG,1)
    for z = 1:L                 % z is the bit string position (bit 1 ... bit 6)
        if EEG(i-z+1) - EEG(i-z) > 0
            bEEG(z,c) = 1;      % increasing trend between consecutive samples
        else
            bEEG(z,c) = 0;      % decreasing (or flat) trend
        end
    end
    c = c + 1;
end

The LBP coding provides information between consecutive EEG values emphasizing the relative increment or decrement trend.


3.2.2 The XNOR Validation

The observations used for the masks are excluded from the classifier training set. In this way, the EEG LHM and EEG RHM matrices are resized along the third dimension (Nepochs/2 → (1 − p) · Nepochs/2). The LBP routine is applied to all the observations, properly reorganized as stated above. The LBP output of each epoch is named unlabeled local binary pattern (uLBP). The uLBPs have dimensions that are coherent with the MRM ones. This allows the system to compare the uLBP with the MRMs through a plane of XNOR gates. In particular, 96 XNORs (one for each uLBP row, for a total of 2 MRMs · nch · d) have been used for the application. Formally, two likelihood matrices are derived from the comparisons. In Figs. 3 and 4 these matrices are named "uLBP vs MRM LHM" and "uLBP vs MRM RHM". The system analyzes the uLBP versus the MRMs, row-by-row, as shown in Fig. 4. The i-th uLBP row is sent in parallel to two XNORs: in the first XNOR it is compared with the equivalent position in MRM LHM and through the second XNOR it is compared with the equivalent matrix element of MRM RHM. The sequential nature of the operations generates waveforms that permit counting the number of '1's in each row. This operation determines the feature vector composition, according to Eq. (3) and Fig. 4:

FV = \begin{bmatrix}
@\ 1\text{st row}: \sum_{j=1}^{N_{sa}} uLBP(i,j) \odot MRM\_LHM(i,j) \\
\vdots \\
@\ 48\text{th row}: \sum_{j=1}^{N_{sa}} uLBP(i,j) \odot MRM\_LHM(i,j) \\
@\ 49\text{th row}: \sum_{j=1}^{N_{sa}} uLBP(i,j) \odot MRM\_RHM(i,j) \\
\vdots \\
@\ 96\text{th row}: \sum_{j=1}^{N_{sa}} uLBP(i,j) \odot MRM\_RHM(i,j)
\end{bmatrix} \quad (3)

where ⊙ denotes the XNOR operation.

It is a quantitative evaluation of the row-by-row degree of similarity. An example of the resulting similarity statistics is sketched by the histograms in Fig. 4. The experimental results demonstrated that there are two specular and distinguishable trends depending on whether the uLBP is more similar to the MRM LHM or to the MRM RHM (see Fig. 4). After the count of the '1's from the XNORs, the system returns to operate with integer numbers in single precision, as shown by the colors in Fig. 4. The XNOR comparisons realize the training Feature Matrix, which contains (1 − p) · Nepochs observations, each of which is described by 96 features.
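A compact MATLAB sketch of the XNOR-based counting of Eq. (3) is given below; the matrix sizes follow the text (48 = nch · d rows per mask, Nsa columns), while the random placeholder data and the variable names are assumptions.

Nsa = 600;
uLBP    = rand(48, Nsa) > 0.5;               % placeholder unlabeled binary pattern
MRM_LHM = rand(48, Nsa) > 0.5;               % placeholder left-hand reference mask
MRM_RHM = rand(48, Nsa) > 0.5;               % placeholder right-hand reference mask
% row-by-row XNOR (~xor) and count of matching bits (the '1's of Eq. 3):
FV = [sum(~xor(uLBP, MRM_LHM), 2);           % features 1-48 : similarity to MRM LHM
      sum(~xor(uLBP, MRM_RHM), 2)];          % features 49-96: similarity to MRM RHM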


Fig. 4. Top-level blocks diagram for 2nd stage of feature extraction and SVM training

3.2.3 Thresholds and Weights Assignment

The extracted feature matrix, used to train the SVM, is slightly adjusted to emphasize the differences between the epochs linked to LHM and RHM. To realize this condition, the feature matrix undergoes a statistical-based thresholds extraction and weights assignment. The statistical-based thresholding analyzes the occurrences of each feature along the evaluated (1 − p) · Nepochs observations. An example of the single-feature distribution along the epochs from a real dataset is shown in Fig. 5. Each x-axis tick of the top subplot represents a single feature from 1 to 48 (uLBP vs LHM), while the ticks of the bottom subplot represent the single features from 49 to 96 (uLBP vs RHM). It is notable that a uLBP linked to an RHM has low values for the first 48 features and high values for the remaining ones (49–96). A specular trend has been recorded for a uLBP derived from an LHM. This behavior permits the extraction of two fixed thresholds, according to Fig. 4: the upper threshold is composed of 96 values, the first 48 derived from the 75th percentiles of the uLBPs linked to LHMs, while the last 48 from the uLBPs linked to RHMs.


In a similar manner, the first 48 points of the lower threshold are derived from the uLBPs linked to RHMs and the last 48 come from the uLBPs linked to LHMs. The distance between the thresholds, feature-by-feature, is linearly spaced into 21 weights from −1 to 1 to normalize the matrix. The features are then normalized in a weighted way. According to this step, the uLBPs linked to LHMs will have the first 48 features close to 1 and the last 48 close to −1.
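One possible MATLAB reading of the threshold-based weighting is sketched below: for each feature, the interval between its lower and upper threshold is divided into 21 linearly spaced levels mapped onto weights from −1 to 1. The placeholder thresholds, the interp1-based mapping and the saturation outside the interval are assumptions made for illustration.

loTh = randn(1, 96);                         % placeholder lower thresholds (1 x 96)
upTh = loTh + 0.5 + rand(1, 96);             % placeholder upper thresholds (1 x 96)
w    = linspace(-1, 1, 21);                  % 21 weight levels
fv   = randn(1, 96);                         % one 1 x 96 feature vector to be weighted
fvW  = zeros(1, 96);
for k = 1:96
    levels = linspace(loTh(k), upTh(k), 21); % per-feature levels between the two thresholds
    x      = min(max(fv(k), loTh(k)), upTh(k));  % saturate values outside the interval
    fvW(k) = interp1(levels, w, x);          % map the feature value onto [-1, 1]
end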

Fig. 5. Boxplot view of statistical incidence for all the features in presence of LHM (blue) and RHM (red). Sub. 1 – Dataset with Nepochs = 310 (155 LHM, 155 RHM).

The weighted feature matrix is then used to train a Support Vector Machine (SVM) with linear kernel.

3.2.4 The SVM Classifier

The feature matrix is labelled, row by row, with "1" if the observation to be classified is linked to an RHM and with "−1" if it is related to an LHM. The weighted feature matrix is then used to train a classical SVM [25] with a linear kernel. A classic SVM is a binary discriminator that classifies the extracted features by finding the best hyperplane (linear kernel) that separates all data points relating to a first class (e.g., LHM) from the points of a second one (e.g., RHM). In this application, the accuracy of the trained model has been k-fold validated with k = 50 steps. This ensures a reliable value of accuracy [26].
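The following minimal MATLAB sketch shows how such a linear SVM could be trained and 50-fold validated with the Statistics and Machine Learning Toolbox; the placeholder data, the variable names and the use of fitcsvm (whose 'Weights' argument would provide the weighted variant) are assumptions, not the authors' exact implementation.

FeatMatW = randn(200, 96);                               % placeholder weighted feature matrix
y        = sign(randn(200, 1));                          % placeholder labels: +1 = RHM, -1 = LHM
mdl = fitcsvm(FeatMatW, y, 'KernelFunction', 'linear');  % linear-kernel SVM
cv  = crossval(mdl, 'KFold', 50);                        % k-fold validation, k = 50
acc = 1 - kfoldLoss(cv);                                 % cross-validated accuracy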


3.3 The Real-Time Prediction

In the real-time prediction context, the system operates on a single observation, triggered by the button. In a blind way (but under supervision), the system re-organizes the data from the right and left EEG channels as reported in Sect. 3.2.1, then proceeds to the LBP coding. The LBP routine output is the uLBP matrix, according to the process explained in Sect. 3.2.1. Since the MRMs are stored in the system memory, the algorithm simply compares, through the XNOR plane, the uLBP with MRM RHM and MRM LHM as stated in Sect. 3.2.2. The number of '1's is counted row-by-row, realizing a 1 × 96 feature vector. This latter is weighted through the set of fixed thresholds previously extracted in the ML stage. The weighted version of the feature vector is then used to feed the previously trained SVM model, which outputs the predicted user intention. For the real-time prediction, the trained classifier has been implemented via extrinsic coding in Simulink. Finally, the SVM classifier sends the predicted label to a dedicated block that manages the Bluetooth communication. Firstly, the Simulink model verifies the Bluetooth object and opens the communication by sending a string: {89, command, 89}. The communication is then stopped while waiting for the next trigger edge. The HC-05 placed on the actuator (e.g., a prosthesis or any general mechatronic device) is programmed to operate at 9600 baud.
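A sketch of one real-time prediction step is shown below, assuming the SVM model mdl from the training sketch above; the feature weighting is omitted for brevity, and the encoding of the command byte inside the {89, command, 89} frame is an assumption made only for illustration.

fv    = randn(1, 96);                        % placeholder (already weighted) XNOR feature vector
label = predict(mdl, fv);                    % +1 = right-hand movement, -1 = left-hand movement
frame = uint8([89, label > 0, 89]);          % {89, command, 89} frame for the Bluetooth link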

4 Results

The proposed architecture has been tested in vivo on a dataset of 3 volunteers (aged 26 ± 1), all students of the Politecnico di Bari, Italy. Table 2 summarizes the composition of the training and validation datasets on which the study has been conducted. The table reports the dataset compositions, identifying each one by an ID (1–7), the subject involved in the test, the age of the subject, the day and hour of the test and a brief dataset description. The table also distinctly reports the observations for the right-hand movements (R) and the left-hand ones (L). The validation epochs do not participate in the training of the SVM. On average, for each subject 391 ± 106 observations have been collected and used to train the SVM classifier. The datasets have been balanced with post-processing. The system accuracy, in the following, is defined as the ratio between the correctly detected observations (in a supervised way) and the total number of requested predictions.


Table 2. Datasets composition

Training dataset composition

| ID | Sub | Age | Day | Hour | Dataset Descr. | L | R |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 25 | 1 | 11.00 | ~45 min recording | 155 | 155 |
| 2 | 1 | 25 | 2 | 12.00 | ~45 min recording | 102 | 102 |
| 3 | 1 | 25 | 1+2 | – | Dataset 1 + 2 | 257 | 257 |
| 4 | 2 | 25 | 1 | 16.00 | ~50 min recording | 161 | 161 |
| 5 | 3 | 27 | 1 | 17.00 | ~30 min recording | 89 | 89 |
| 6 | 3 | 27 | 2 | 15.00 | ~30 min recording | 80 | 80 |
| 7 | 3 | 27 | 1+2 | – | Dataset 5 + 6 | 169 | 169 |

Test dataset composition

| ID | Sub | Age | Dataset Descr. | L | R |
|---|---|---|---|---|---|
| 1 | 1 | 25 | Data collected during Day 1 and Day 2 – all excluded from training set | 41 | 41 |
| 2 | 2 | 25 | Data collected during Day 1 – all excluded from training set | 32 | 32 |
| 3 | 3 | 27 | Data collected during Day 1 and Day 2 – all excluded from training set | 35 | 35 |

4.1 The ML Performance

The ML stage has been performed off-line through a Matlab 2017b script. Tested on a PC equipped with an Intel i5 processor and 16 GB of RAM, the ML algorithm required, on average: 9.21 ms for the data management, ~2 ms for the mask selection, 1.61 ms for the MRMs extraction, ~0.2018 ms for each single uLBP coding, ~0.667 ms for a single uLBP comparison with MRM LHM and RHM, 1 ms for the statistical-based thresholds extraction and 0.16 ms for the weights assignment. The weighted feature matrix trains a linear SVM in about 5.79 ± 1.23 s, validating it in a k-folded way at ~190 obs./s. Table 3 summarizes the training performance along all the datasets in terms of type of SVM, number of support vectors (SVs), classification timing, minimum training batch and accuracy.


Table 3. ML performance

| Dat. | Winsorizing limit | Class. type | # SV | Class. timing (ms) | Min. training batch | No-validated accuracy (%) |
|---|---|---|---|---|---|---|
| 1 | 10th–90th | Lin wSVM | 29 | 4.25 | 66 | 96.77 |
| 2 | 10th–90th | Lin wSVM | 30 | 4.01 | 62 | 94.61 |
| 3 | 10th–90th | Lin wSVM | 32 | 4.52 | 64 | 95.91 |
| 4 | 5th–95th | Lin wSVM | 28 | 4.14 | 60 | 96.27 |
| 5 | 10th–90th | Lin wSVM | 29 | 4.00 | 62 | 95.51 |
| 6 | 10th–90th | Lin wSVM | 30 | 3.92 | 62 | 94.37 |
| 7 | 10th–90th | Lin wSVM | 31 | 4.15 | 64 | 96.15 |

The k-fold validation has been used to extract the minimum training batch, which is able to ensure a stable accuracy close to the values in Table 3. The minimum training batch is, on average, 62 ± 2 observations, which corresponds to a needed training time of about 434 s (worst case, considering a fixed inter-stimulus time of 7 s). The last step of the ML performance assessment concerned the extraction of the Area Under the Curve (AUC) and the Receiver Operating Characteristics (ROC). Briefly, the AUC–ROC curve is the most used classifier performance metric, because it analyzes the classifier discerning capability in different conditions [27]. Specifically, the ROC is a probability curve and the AUC represents the degree of separability of the analyzed classes. A high AUC value is representative of a classifier model able to correctly predict the classes. The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings, defined as:

TPR = \frac{TP}{TP + FN} \quad (4.a)

FPR = \frac{FP}{FP + TN} \quad (4.b)

where TP denotes the true positive occurrences, TN the true negative ones, FP the false positives (false alarms) and FN the false negatives. Figure 6 shows both the AUC value and the ROC curve for a specific dataset. In addition, the figure highlights, by using a red dot, the classifier chosen to distinguish the choices of the subject under test. It is notable that the implemented classifier is the one that minimizes the FPR and maximizes the TPR.
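As an illustration of how such a curve can be obtained, the MATLAB sketch below uses perfcurve on the scores of a linear SVM; the placeholder data and variable names are assumptions.

X = randn(200, 96); y = sign(randn(200, 1));         % placeholder features and labels
mdl = fitcsvm(X, y, 'KernelFunction', 'linear');
[~, score] = predict(mdl, X);                        % column 2 = score of the positive class (+1)
[FPR, TPR, ~, AUC] = perfcurve(y, score(:, 2), 1);   % ROC points and area under the curve
plot(FPR, TPR); xlabel('False positive rate'); ylabel('True positive rate');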


4.2 The Real-Time Validation

The real-time validation step has been conducted on the test datasets 1, 2 and 3 from the bottom part of Table 2. The real-time validation interface has been fully realized in the Simulink environment. The previously trained SVM classifier has been stored as a struct and loaded in a dedicated extrinsic function. For the validation of the test dataset (TeD) 1, the used SVM is the one from the training dataset (TrD) 3. For TeD 2, the used SVM is the classifier derived from TrD 4 and, finally, for TeD 3 the SVM model comes from TrD 7. The accuracies for all the evaluated datasets are shown in Fig. 7 through three heatmap views of the confusion matrices. On average, the real-time validation accuracy is 85.61 ± 1.19%.

Fig. 6. ROC curve for the extracted classifier @Dataset 3: Sub 1, Day 1 + 2

Fig. 7. Heatmap view of confusion matrices for real time validation datasets.


4.3 The System Timing

The real-time processing chain, starting from the button trigger, needs: 15 ms for the EEG digitizing and transmission, 200 ms of EEG samples to complete the epoch, 1.61 ± 0.22 ms for the uLBP extraction and 0.69 ± 0.09 ms for the XNOR comparisons. The weights assignment requires ~0.09 ms and the final decision, using the extrinsic coding of the SVM, lasts 0.24 ± 0.07 ms per observation. Overall, in the worst case the processing chain takes 218 ms. Excluding the transmission delay (15 ms) and the unavoidable register filling (200 ms), the computing time is only 3 ms. This strong computing-time reduction is linked to the use of the end-to-end binary operations that realize the proposed FE methodology.

5 Conclusions

In this paper, the design of an innovative machine learning (ML) algorithm based on a symbolization technique for a synchronous 2-choice Brain Computer Interface (BCI) has been detailed. The BCI operates on 8 wireless EEG smart electrodes that monitor the motor cortex and the sensorimotor area, extracting information about the user intention from the Movement Related Cortical Potentials (MRCPs). The proposed BCI analyzes the EEGs through a symbolization-based algorithm: the Local Binary Patterning (LBP). It permits the reduction of the computational complexity in both the FE and real-time prediction stages. The LBP method is used to build binary matrices named reference masks. The masks represent a binary version of the typical behavior of the brain signals during the two allowed choices. These matrices are computed on a statistical basis (offline ML stage). Moreover, the LBP is used to extract a binary matrix for each single observation (on-line classification). The single observations are XNOR-compared with the reference masks, providing in output simple, easily handleable features. These are statistically weighted and sent to a linear Support Vector Machine. The in-vivo validation confirmed an accuracy compatible with the state of the art, 85.61 ± 1.19%, and a computing time of only 3 ms (excluding the 200 ms of unavoidable register filling). In conclusion, the here-described BCI algorithm proposes a novel time-domain FE approach for oscillatory brain potentials such as the MRCPs, typically analyzed in the frequency domain (a time- and resource-consuming approach). Moreover, the BAP choice leads to a strong reduction of the after-stimulation time needed for the user intention recognition (only 203 ms after the stimulus, about 4 times faster than the competitors). Also in terms of learning time, the BCI overcomes the limits of the state-of-the-art solutions, asking for only 10 min to extract the reference matrices. Finally, the introduction of an FE stage fully based on binary operations led to a very light computational effort, which translates into a very small computational time (~3 ms).


On the other hand, with the current implementation, the proposed BCI does not permit increasing the number of available choices, remaining optimal only for 2 classes. This limits the system applicability in some typical BCI fields (e.g., it is not suitable for a fast BCI speller). Moreover, the BCI data rate is still low (about 8 commands/min) due to the time constraints of the proposed experimental protocol. In order to bridge these gaps, future work will concern the design of a faster BCI data rate. The experimental protocol must not contain any external stimulation, but it will need to operate by using only the recognition of the voluntariness status. A next step in the BCI development will also concern the increase of the available choices, for example by using a classification model more suitable for a multi-class discrimination problem.

Acknowledgment. This work was supported by the project AMICO (Assistenza Medicale In COntextual awareness, AMICO_Project_ARS01_00900).

References 1. Lotte, F., et al.: A review of classification algorithms for EEG-based brain–computer interfaces: a 10 year update. J. Neural Eng. 15(3), 031005 (2018) 2. Annese, V.F., De Venuto, D.: FPGA based architecture for fall-risk assessment during gait monitoring by synchronous EEG/EMG. In: 2015 6th International Workshop on Advances in Sensors and Interfaces (IWASI), Gallipoli, pp. 116–121 (2015). https://doi.org/10.1109/ iwasi.2015.7184953 3. Wolpaw, J.R., McFarland, D.J., Neat, G.W., Forneris, C.A.: An EEG-based brain–computer interface for cursor control Electroencephalogr. Clin. Neurophysiol. 78, 252–259 (1991) 4. Annese, V.F., Crepaldi, M., Demarchi, D., De Venuto, D.: A digital processor architecture for combined EEG/EMG falling risk prediction. In: 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, pp. 714–719 (2016). ISBN 978-3-98153707-9 5. Qi, H., et al.: A speedy calibration method using riemannian geometry measurement and other-subject samples on A P300 speller. IEEE Trans. Neural Syst. Rehabil. Eng. 26(3), 602–608 (2018) 6. Kobayashi, N., Nakagawa, M.: BCI-based control of electric wheelchair using fractal characteristics of EEG. IEEJ Trans. Electr. Electron. Eng. 13, 1795–1803 (2018) 7. De Venuto, D., Torre, M.D., Boero, C., Carrara, S., De Micheli, G.: A novel multi-working electrode potentiostat for electrochemical detection of metabolites. In: SENSORS, 2010 IEEE, Kona, HI, pp. 1572–1577 (2010). https://doi.org/10.1109/icsens.2010.5690297 8. Ang, K.K., Guan, C.: Brain-computer interface in stroke rehabilitation. J. Comput. Sci. Eng. 7, 139–146 (2013) 9. De Venuto, D., Annese, V.F., Mezzina, G.: Remote neuro-cognitive impairment sensing based on P300 spatio-temporal monitoring. IEEE Sens. J. 16(23), 8348–8356 (2016). https:// doi.org/10.1109/jsen.2016.2606553 10. Lotte, F.: A tutorial on EEG signal-processing techniques for mental-state recognition in brain–computer interfaces. In: Guide to Brain–Computer Music Interfacing, pp 133–161. Springer, Berlin (2014)


11. Annese, V.F., De Venuto, D.: Fall-risk assessment by combined movement related potentials and co-contraction index monitoring. In: 2015 IEEE Biomedical Circuits and Systems Conference (BioCAS), Atlanta, GA, pp. 1–4 (2015). https://doi.org/10.1109/biocas.2015. 7348366 12. Birbaumer, N., Ghanayim, N., Hinterberger, T., Iversen, I., Kotchoubey, B., Kübler, A., Perelmouter, J., Taub, E., Flor, H.: A spelling device for the paralysed. Nature 398, 297–298 (1999) 13. Peters, B.O., Pfurtscheller, G., Flyvbjerg, H.: Automatic differentiation of multichannel EEG signals. IEEE Trans. Biomed. Eng. 48(1), 111–116 (2001) 14. Hübner, D., Verhoeven, T., Schmid, K., Müller, K.R., Tangermann, M., et al.: Learning from label proportions in brain-computer interfaces: online unsupervised learning with guarantees. PLoS ONE 12(4), e0175856 (2017). https://doi.org/10.1371/journal.pone. 0175856 15. Leeb, R., Tonin, L., Rohm, M., Desideri, L., Carlson, T., Millán, J.D.R.: Towards Independence: a BCI telepresence robot for people with severe motor disabilities. Proc. IEEE 103(6), 969–982 (2015) 16. Shanir, P.P.M., et al.: Automatic seizure detection based on morphological features using one-dimensional local binary pattern on long-term EEG. Clin. EEG Neurosci. 49, 351–362 (2018) 17. Schindler, K., et al.: On seeing the trees and the forest: single signal and multisignal analysis of periictal intracranial EEG. Epilepsia 53, 1658–1668 (2012) 18. de Tommaso, M., Vecchio, E., Ricci, K., Montemurno, A., De Venuto, D., Annese, V.F.: Combined EEG/EMG evaluation during a novel dual task paradigm for gait analysis. In: 2015 6th International Workshop on Advances in Sensors and Interfaces (IWASI), Gallipoli, pp. 181–186 (2015). https://doi.org/10.1109/iwasi.2015.7184949 19. McFarland, D.J., Miner, L.A., Vaughan, T.M., Wolpaw, J.R.: Mu and beta rhythm topographies during motor imagery and actual movements. Brain Topogr. 12(3), 177–186 (2000) 20. Green, J.B., StArnold, P.A., Rozhkov, L., Strother, D.M., Garrott, N.: Bereitschaft (readiness potential) and supplemental motor area interaction in movement generation: spinal cord injury and normal subjects. J. Rehabil. Res. Dev. 40(3), 225–234 (2003). Daw, C.S., et al.: A review of symbolic analysis of experimental data. Rev. Sci. Instrum. 74(2), 915–930 (2003) 21. Nakamura, A., et al.: Somatosensory homunculus as drawn by MEG. Neuroimage 7(4), 377–386 (1998) 22. De Venuto, D., Stikvoort, E., Tio Castro, D., Ponomarev, Y.: Ultra low-power 12-bit SAR ADC for RFID applications. In: 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010), Dresden, pp. 1071–1075 (2010). https://doi.org/10. 1109/date.2010.5456968 23. De Venuto, D., Tio Castro, D., Ponomarev, Y., Stikvoort, E.: 0.8 lW 12-bit SAR ADC sensors interface for RFID applications. Microelectron. J. 41(11), 746–751 (2010). ISSN 0026-2692. https://doi.org/10.1016/j.mejo.2010.06.019 24. Carrara, S., Torre, M.D., Cavallini, A., De Venuto, D., De Micheli, G.: Multiplexing pH and temperature in a molecular biosensor. In: 2010 Biomedical Circuits and Systems Conference (BioCAS), Paphos, pp. 146–149 (2010). https://doi.org/10.1109/biocas.2010.5709592 25. Hearst, M.A., et al.: Support vector machines. IEEE Intell. Syst. Their Appl. 13(4), 18–28 (1998) 26. Christianini, N., Shawe-Taylor, J.C.: An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, Cambridge (2000) 27. 
Zweig, M.H., Campbell, G.: Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin. Chem. 39(4), 561–577 (1993)

LSTM-Based Facial Performance Capture Using Embedding Between Expressions

Hsien-Yu Meng and Jiangtao Wen

Department of Computer Science and Technology, Tsinghua University, Beijing, China
[email protected], [email protected]

Abstract. We present a novel end-to-end framework for facial performance capture given a monocular video of an actor's face. Our framework is comprised of two parts. First, we optimize a triplet loss to learn an embedding space which ensures that semantically closer facial expressions are closer in the embedding space, and the model can be transferred to distinguish expressions that are not presented in the training dataset. Second, the embeddings are fed into an LSTM network to learn the deformation between frames. In the experiments, we demonstrate that, compared to other methods, our method can distinguish the delicate motion around the lips and significantly reduce artifacts between the tracked meshes.

Keywords: Facial animation · Facial performance capture

1 Introduction

In feature films and video games, digital characters are prominent, and it is crucial to deliver accurate facial animation since humans are sensitive to delicate and slight facial motions such as compressed lips, stretched lips, pucker and dimple. Conventional facial animation pipelines require manual correction and artifact removal to ensure that lip and eye contours are natural, and this requires tremendous effort when there are several hours of footage to be processed. In this paper, we present a transfer learning method to train an expression classifier with 23 basic expressions, and the trained model can be used to distinguish, via Euclidean distance, different expressions that do not appear in the training set. From the trained model, an expression embedding space can be constructed. An input video footage can be mapped into this space and the embeddings are fed into an LSTM network, which generates the blendshape coefficients. Our end-to-end framework takes a 10-minute single-view RGBD video footage [19] as training input, and after training it can process the remaining video footage at 137 FPS on our server. Figure 1 illustrates the general pipeline of our framework.


Fig. 1. Capture pipeline

Furthermore, compared to the other deep-learning methods proposed in [10] and [5], our expression embedding space can incorporate more delicate motion around the eyes and lips, and our LSTM network can encode the time-related features to significantly reduce jitters between meshes, as illustrated in Figs. 11 and 12. The rest of this paper is organized as follows: Sect. 2 introduces related work in the facial expression capture field; Sect. 3 defines the architecture of our network, while Sect. 4 describes how we choose the baseline, set up the experiment and compare the results of our design and the baseline; Sect. 5 sums up the whole paper.

2 Related Work

Conventional approaches to capturing highly detailed facial expressions can be divided into two categories. The first one requires depth information, either from multi-view videos [2,8,9] or from structured light [11,20], as in the Digital Emily Project proposed by [1]. The second kind requires no depth information but exploits other information to capture the facial expression. In production-level environments, marker-based techniques with manually drawn contours over the lip and eye regions on the RGB image, to ensure accurate lip and eye contours [3], are widely used. However, in real-time face capture, markerless techniques are more common, which rely on texture information [14,17] to generate the PCA coefficients of vertices and textures. This generally requires a well-collected face dataset, including meshes and textures of different identities, and the construction of PCA basis vectors. An energy function is designed and, by minimizing it, we get the best PCA coefficients. Moreover, the method proposed in [17] employs the Gauss-Newton algorithm to minimize the difference between the input image and the rendered one, which requires rendering the generated mesh and computing the derivatives of the cost function with respect to every PCA coefficient at each iteration. Since the texture generated from the PCA basis is not realistic enough, [15] proposed a deep learning method to synthesize the texture to improve this.


Eigenfaces proposed by [18] represent facial appearances as linear models, and several works were proposed to extend it [4,7,13]. The multi-linear PCA model can be described as below:

M_{geo}(\alpha, \delta) = a_{id} + E_{id} \cdot \alpha + E_{exp} \cdot \delta \quad (1)

M_{alb}(\beta) = a_{alb} + E_{alb} \cdot \beta \quad (2)

where a_{id} is the average shape, a_{alb} is the average reflectance, E_{id} is the basis of shape, E_{exp} is the basis of expression and E_{alb} is the basis of reflectance. Basically, α and β are the parameters in the optimization problems. Another approach to capturing the distinctive expressions of an actor is designated blendshape rigs. The mesh can be described as below:

M_{geo}(\alpha) = B_0 + \sum_{i=1}^{N} \alpha_i \cdot (B_i - B_0) \quad (3)

where N is the number of blendshapes excluding the neutral blendshape, B_0 is the neutral expression and α are the coefficients applied to the B_i. Blendshapes make it easier for a CG system to transfer the controller semantics and expression dynamics from a generic template to the target blendshape model, as mentioned in [12].
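For illustration, the blendshape evaluation of Eq. (3) can be written in a few MATLAB lines, here using the vertex count and number of coefficients adopted later in this paper (7366 vertices, 51 blendshapes); all data below are placeholders.

V = 7366; N = 51;                            % vertices per mesh and number of blendshapes
B0    = randn(V, 3);                         % placeholder neutral mesh (x, y, z per vertex)
B     = randn(V, 3, N);                      % placeholder blendshape targets B_i
alpha = rand(N, 1);                          % placeholder blendshape coefficients
Mgeo  = B0;
for i = 1:N
    Mgeo = Mgeo + alpha(i) * (B(:, :, i) - B0);   % Eq. (3)
end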

3 Network Architecture

Our framework contains two networks. The first one is FaceNet [16], optimized with a triplet loss to perform expression verification and extract embeddings from the input video footage. The second one is an LSTM-based network which takes in a sequence of embedding vectors generated by the optimized FaceNet and attempts to learn the deformation between the embeddings as well as the mapping function between embeddings and blendshape coefficients. Our training data is comprised of two datasets. The FaceWarehouse dataset is composed of 150 persons with 23 different facial expressions, as illustrated in Fig. 2, and the input footage consists of 3500 frames at 30 FPS. Our target output is 51 blendshape coefficients, which are the combinations of different expressions of an actor such as mouth press, mouth stretch, mouth smile, etc. Every blendshape contains 7366 vertices and 14600 triangles.

3.1 Expression Embedding

FaceNet [16] is a popular network for face verification problems. It maps face images to an embedding space where the Euclidean distance represents the similarity of faces, as demonstrated in Fig. 4. By optimizing the triplet loss L(anchor, positive, negative), it minimizes ||anchor − positive||_2^2 and maximizes ||anchor − negative||_2^2. Figure 3 shows its architecture, while Table 1 lists the detailed parameters.


Fig. 2. Part of the FaceWarehouse dataset; each column is labeled the same

Fig. 3. FaceNet model proposed in [16]

Fig. 4. Triplet Loss in face verification problems: the anchor and the positive samples are same people while the anchor and the negative samples are different people


Fig. 5. Triplet Loss in our experiment: the anchor and the positive samples are semantically same expression while the anchor and the negative samples are semantically different expression

Fig. 6. L2-norm between embeddings of frames on evaluation dataset

The goal is to ensure that, in the embedding space, an image x_i^a (anchor) of a specific expression is closer to the images x_i^p (positive) of the same expression than to the images x_i^n (negative) of a different expression, as illustrated in Fig. 5. The loss function L is defined as:

L = \sum_{i}^{N} \left[ \| f(x_i^a) - f(x_i^p) \|_2^2 - \| f(x_i^a) - f(x_i^n) \|_2^2 + \alpha \right] \quad (4)

where α is a margin that is enforced between positive and negative pairs and N is the cardinality of all possible triplets in the training set.
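For illustration, the batch version of this loss can be written in a few MATLAB lines; the hinge (max with 0) follows the original FaceNet formulation, and the batch size, embedding dimension and margin value below are assumptions.

N = 24; d = 128; alpha = 0.2;                           % placeholder batch size, embedding size, margin
fA = randn(N, d); fP = randn(N, d); fN = randn(N, d);   % anchor/positive/negative embeddings
dPos = sum((fA - fP).^2, 2);                            % ||f(x_a) - f(x_p)||_2^2
dNeg = sum((fA - fN).^2, 2);                            % ||f(x_a) - f(x_n)||_2^2
L = sum(max(dPos - dNeg + alpha, 0));                   % triplet loss over the batch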


Table 1. FaceNet architecture [16]

| Name | Kernel | Stride | Input-channels | Output-channels | Activation |
|---|---|---|---|---|---|
| conv1 | 7×7 | 2 | 3 | 64 | ReLU |
| pool1 | 3×3 | 2 | 64 | 64 | |
| rnorm1 | | | | | |
| conv2a | 1×1 | 1 | 64 | 64 | ReLU |
| conv2 | 3×3 | 1 | 64 | 192 | ReLU |
| rnorm2 | | | | | |
| pool2 | 3×3 | 2 | 192 | 192 | |
| conv3a | 1×1 | 1 | 192 | 192 | ReLU |
| conv3 | 3×3 | 1 | 192 | 384 | ReLU |
| pool3 | 3×3 | 2 | 384 | 384 | |
| conv4a | 1×1 | 1 | 384 | 384 | ReLU |
| conv4 | 3×3 | 1 | 384 | 256 | ReLU |
| conv5a | 1×1 | 1 | 256 | 256 | ReLU |
| conv5 | 3×3 | 1 | 256 | 256 | ReLU |
| conv6a | 1×1 | 1 | 256 | 256 | ReLU |
| conv6 | 3×3 | 1 | 256 | 256 | ReLU |
| pool4 | 3×3 | 2 | 256 | 256 | |
| concat | | | | | |
| fc1 | | | | 128 | |
| fc2 | | | | 128 | |

3.2 LSTM Network

After FaceNet is trained and tuned on the FaceWarehouse dataset, the model is applied to infer the embeddings of the frames of a monocular video. We build a sequence LSTM network of length 10; the hidden state is set to 1 × 256 and each cell outputs 51 coefficients for the corresponding blendshapes. The LSTM network architecture is shown in Fig. 7. The hidden state is sent to the next cell so that the network can learn the deformation between embeddings.
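For concreteness, a layer stack with the stated sizes (128-dimensional embeddings, a 1 × 256 hidden state, 51 coefficients per frame) could look as follows. This sketch is written with MATLAB's Deep Learning Toolbox purely for illustration, since the paper does not state which framework was used, and the training options are assumptions.

layers = [
    sequenceInputLayer(128)                      % one FaceNet embedding per frame
    lstmLayer(256, 'OutputMode', 'sequence')     % hidden state passed from cell to cell
    fullyConnectedLayer(51)                      % 51 blendshape coefficients per frame
    regressionLayer];
opts = trainingOptions('adam', 'MaxEpochs', 50, 'MiniBatchSize', 8);
% net = trainNetwork(XTrain, YTrain, layers, opts);  % XTrain: cell array of 128 x 10 sequences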

Fig. 7. VGG-like framework [10]

To examine whether the embedding carries enough information from frames for our LSTM network, we did two experiments as shown in Fig. 8.


Fig. 8. Experiment: the first row is our pretrained FaceNet + LSTM network, the second row is the VGG-like network trained with LSTM network

First, we implement the end-to-end VGG-like framework mentioned in [10]. Figure 7 shows the architecture of the network and Table 2 lists the detailed parameters. Second, we build a deep network similar to the network proposed in [10], feed the fully connected layer's output into the LSTM network and train them together as the baseline. Furthermore, by changing the kernel size in the conv-a layers to 1 × 1, we only change the dimension in filter space. At the pre-processing step, we do not calculate the mean and variance across all training images to whiten the training set. The two networks are illustrated in Tables 2 and 3. In the first comparison experiment, we perform PCA analysis on all training meshes and choose 160 basis vectors, which explain 99.9% of the variance in the training set. The loss-vs-epochs curves of the 3 networks are shown in Fig. 9. We sum up all squared differences between the meshes predicted by the LSTM network and the ground truth meshes as the loss.
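A short MATLAB sketch of the mesh PCA step (keep enough components to explain 99.9% of the variance) is given below; the placeholder data dimensions are illustrative only.

meshes = randn(500, 7366*3);                 % placeholder: one flattened (x,y,z) mesh per row
[coeff, ~, ~, ~, explained] = pca(meshes);   % principal components and explained variance (%)
nBasis = find(cumsum(explained) >= 99.9, 1); % number of basis vectors to keep (160 in the paper)
basis  = coeff(:, 1:nBasis);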

4 Experiments

4.1 Expression Embedding

We aim to train the network to extract good features from the footage images. In the experiment, we use the FaceWarehouse [6] dataset for training, as demonstrated in Fig. 2, which contains frontal images of 150 persons and each person's 23 different expressions. Furthermore, the semantically same expressions are labelled the same, thus there are 23 labels in our training set. In the training, we randomly sample the negative and positive training examples and feed them to the deep architecture to acquire the embeddings of those examples.


Fig. 9. log Loss: the blue line is the training loss of pretrained FaceNet + LSTM network, the red line is the training loss of the VGG-like network trained with LSTM network, the orange line is the network mentioned in [Laine 2017] [10].

In the training step, the FaceWarehouse dataset is cropped and resized to 160×160, the batch size is set to 24 and the loss became 1/125 of the original loss after 375k epochs using 3 GeForce GTX 1080 cards. In the evaluation step, we generated expression embeddings from the pre-trained model mentioned above and calculated the L2-norm between semantically different expressions. The result inferred from the evaluation dataset is shown in Fig. 6 and it indicates that the L2-norm between neutral expressions is smaller than the L2-norm between the mouth-open expressions. Note that the pre-trained model is trained on the FaceWarehouse dataset and evaluated on other footage; therefore, we can safely draw the conclusion that the model is not over-fitting although the training set contains only 3450 images.

4.2 Comparison

Since the LSTM network can capture the time-related features and reduce the jitters between meshes, we visualize the result of our network in comparison with the baseline VGG-like network. Figures 11, 12, 13 and 14 are the results of 5 continuous frames in the validation video footage. The first row shows our result while the second row shows the result of the baseline VGG-like network.


Fig. 10. Comparison: the first column is the footage, the second column is ground truth, the third column is result of VGG-like network as Fig. 7 described in [Laine 2017] [10], the fourth column is the result of our network.

As an example, Fig. 11 illustrates the deformation from a neutral expression to a smile expression. Notice that the baseline (second row) is almost the same within the 5 consecutive frames, while our result (first row) shows a more delicate deformation process in the lip and eye regions, as the eye region is getting smaller when the actor is smiling.


Table 2. Network architecture in [10]

| Name | Kernel | Stride | Input-channels | Output-channels | Activation |
|---|---|---|---|---|---|
| conv1a | 3×3 | 2 | 1 | 64 | ReLU |
| conv1b | 3×3 | 1 | 64 | 64 | ReLU |
| conv2a | 3×3 | 2 | 64 | 96 | ReLU |
| conv2b | 3×3 | 1 | 96 | 96 | ReLU |
| conv3a | 3×3 | 2 | 96 | 144 | ReLU |
| conv3b | 3×3 | 1 | 144 | 144 | ReLU |
| conv4a | 3×3 | 2 | 144 | 216 | ReLU |
| conv4b | 3×3 | 1 | 216 | 216 | ReLU |
| conv5a | 3×3 | 2 | 216 | 324 | ReLU |
| conv5b | 3×3 | 1 | 324 | 324 | ReLU |
| conv6a | 3×3 | 2 | 324 | 486 | ReLU |
| conv6b | 3×3 | 1 | 486 | 486 | ReLU |
| drop | | | | | |
| fc | | | | 160 | Linear |
| fc | | | | PCA num | Linear |

Fig. 11. Neutral expression to smile. The first row is our result; the second row is result of VGG-like network as Fig. 7 described in [Laine 2017] [10]. See Fig. 15 for numerical analysis.

To represent a quantified measurement of the jitters between frames, we define the mean squared difference D between frame f − 1 and frame f as described below:

D^{\{f\}} = \frac{1}{N} \sum_{i=1}^{N} \left\| v_i^{\{f\}} - v_i^{\{f-1\}} \right\|_2^2 \quad (5)


Fig. 12. Neutral expression to mouth-open expression. The first row is our result; the second row is result of VGG-like network as Fig. 7 described in [Laine 2017] [10]. See Fig. 16 for numerical analysis.

Fig. 13. Mouth-press-left expression to compressed-lips expression. The first row is our result; the second row is result of VGG-like network as Fig. 7 described in [Laine 2017] [10].

where f is the frame index in the footage video and N is the number of vertices of the mesh. Figure 15 illustrates D across frames when the actor is changing from a neutral expression to a smile expression. The baseline (VGG) shows a high peak at frame 2, which represents a sudden change in the reconstructed expression. This kind of jitter is common in the baseline's result. However, our result has a steadier D, which suggests that the reconstructed expression is changing more smoothly.
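Equation (5) can be computed directly from the tracked vertex positions, e.g. with the following MATLAB sketch (placeholder data; the array layout is an assumption).

N = 7366; F = 5;                             % vertices per mesh, number of frames
verts = randn(N, 3, F);                      % placeholder tracked vertex positions
D = zeros(1, F-1);
for f = 2:F
    d = verts(:, :, f) - verts(:, :, f-1);   % per-vertex displacement between frames
    D(f-1) = mean(sum(d.^2, 2));             % (1/N) * sum_i ||v_i^f - v_i^{f-1}||_2^2
end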


Fig. 14. Failure case: neutral expression to mouth-open expression. The first row is our result; the second row is result of VGG-like network as Fig. 7 described in [Laine 2017] [10].

Table 3. Network architecture in our experiments

| Name | Kernel | Stride | Input-channels | Output-channels | Activation |
|---|---|---|---|---|---|
| conv1a | 1×1 | 2 | 3 | 64 | ReLU |
| conv1b | 3×3 | 1 | 64 | 64 | ReLU |
| conv2a | 1×1 | 2 | 64 | 96 | ReLU |
| conv2b | 3×3 | 1 | 96 | 96 | ReLU |
| conv3a | 1×1 | 2 | 96 | 144 | ReLU |
| conv3b | 3×3 | 1 | 144 | 144 | ReLU |
| conv4a | 1×1 | 2 | 144 | 216 | ReLU |
| conv4b | 3×3 | 1 | 216 | 216 | ReLU |
| conv5a | 1×1 | 2 | 216 | 324 | ReLU |
| conv5b | 3×3 | 1 | 324 | 324 | ReLU |
| conv6a | 1×1 | 2 | 324 | 486 | ReLU |
| conv6b | 3×3 | 1 | 486 | 486 | ReLU |
| drop | | | | | |
| fc | | | | 160 | Linear |
| fc | | | | PCA num | Linear |

Figure 16 illustrates D between frames in Fig. 12. When f = 3 the actor's mouth is open, and D^{3} is supposed to be the highest value among {D^{1}, D^{2}, ..., D^{5}}. We can notice that the baseline (VGG) is much lower than our result (LSTM+FaceNet), which indicates that the expression reconstructed by the baseline has smaller motion.


Fig. 15. Mean squared difference analysis of Fig. 11 defined in Eq. 5.

Fig. 16. Mean squared difference analysis of Fig. 12 defined in Eq. 5.

This verifies what we have observed in the comparison between the first and the second row in Fig. 12. Figures 17 and 18 show the comparison between our result and the result of the conventional method proposed by [5], which generates the next frame based on the α mentioned in Eq. 1 and the displacement coefficients of the current frame. Comparing Figs. 11, 12 and 13, it is clear that the VGG-like network cannot distinguish the intrinsic motion around the lips while ours can. Figure 14 shows a failure case, which contains 5 consecutive frames from a neutral expression to a mouth-open expression. The first row in Fig. 14 demonstrates a motion from neutral to a mouth-open-large expression and then to a mouth-open-small expression, while the second row demonstrates a motion from a neutral expression to a mouth-open-small expression.


Fig. 17. Comparison: neutral expression to mouth-open expression. The first row is our result; the second row is result of DDE proposed by [Cao 2014] [5].

Fig. 18. Comparison: mouth-press-left expression to compressed-lips expression. The first row is our result; the second row is result of DDE proposed by [Cao 2014] [5].

To examine this issue, we visualize the L2-norm between frames in Fig. 19. The elements on the diagonal have a value of zero, which is as expected. However, notice that element (2, 3) has almost the same value as element (2, 4), which suggests that the expression embedding space is not trained well, as the L2-norm cannot reflect the degree of mouth-opening. This can explain the failure case in Fig. 14.


Fig. 19. Heatmap of L2-norm between neutral expression and mouth-open expressions on diagonal.

5 Conclusion

Figure 10 illustrates the comparison of the ground truth, the VGG-like network [10] and our result, which suggests that the VGG-like network cannot discriminate the mouth-open-like expressions such as mouth-open-right and mouth-open-left. Moreover, Fig. 13 suggests that the model can be transferred to distinguish the delicate lip expressions (mouth-press-left and compressed-lips) which do not appear in the training dataset. For the LSTM network, we claim that this network can learn the deformation between embeddings and produce smoother results, as demonstrated in Figs. 11 and 13.

References 1. Alexander, O., Rogers, M., Lambeth, W., Chiang, J.-Y., Ma, W.-C., Wang, C.-C., Debevec, P.E.: The digital Emily project: achieving a photorealistic digital actor. IEEE Comput. Graph. Appl. 30(4), 20–31 (2010) 2. Beeler, T., Hahn, F., Bradley, D., Bickel, B., Beardsley, P.A., Gotsman, C., Sumner, R.W., Gross, M.H.: High-quality passive facial performance capture using anchor frames. ACM Trans. Graph. 30(4), 75:1–75:10 (2011) 3. Bhat, K.S., Goldenthal, R., Ye, Y., Mallet, R., Koperwas, M.: High fidelity facial animation capture and retargeting with contours. In: The ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA 2013, Anaheim, CA, USA, 19-21 July 2013, pp. 7–14 (2013) 4. Booth, J., Roussos, A., Zafeiriou, S., Ponniah, A., Dunaway, D.: A 3D morphable model learnt from 10, 000 faces. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 27-30 June 2016, pp. 5543–5552 (2016)


5. Cao, C., Hou, Q., Zhou, K.: Displaced dynamic expression regression for real-time facial tracking and animation. ACM Trans. Graph. 33(4), 43:1–43:10 (2014)
6. Cao, C., Weng, Y., Zhou, S., Tong, Y., Zhou, K.: FaceWarehouse: a 3D facial expression database for visual computing. IEEE Trans. Vis. Comput. Graph. 20(3), 413–425 (2014)
7. Donner, R., Reiter, M., Langs, G., Peloschek, P., Bischof, H.: Fast active appearance model search using canonical correlation analysis. IEEE Trans. Pattern Anal. Mach. Intell. 28(10), 1690–1694 (2006)
8. Fyffe, G., Hawkins, T., Watts, C., Ma, W.-C., Debevec, P.E.: Comprehensive facial performance capture. Comput. Graph. Forum 30(2), 425–434 (2011)
9. Ghosh, A., Fyffe, G., Tunwattanapong, B., Busch, J., Yu, X., Debevec, P.E.: Multiview face capture using polarized spherical gradient illumination. ACM Trans. Graph. 30(6), 129:1–129:10 (2011)
10. Laine, S., Karras, T., Aila, T., Herva, A., Lehtinen, J.: Facial performance capture with deep neural networks. CoRR, abs/1609.06536 (2016)
11. Li, H., Adams, B., Guibas, L.J., Pauly, M.: Robust single-view geometry and motion reconstruction. ACM Trans. Graph. 28(5), 175:1–175:10 (2009)
12. Li, H., Weise, T., Pauly, M.: Example-based facial rigging. ACM Trans. Graph. 29(4), 32:1–32:6 (2010)
13. Matthews, I.A., Baker, S.: Active appearance models revisited. Int. J. Comput. Vis. 60(2), 135–164 (2004)
14. Romdhani, S., Vetter, T.: Estimating 3D shape and texture using pixel intensity, edges, specular highlights, texture constraints and a prior. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 20–26 June 2005, San Diego, CA, USA, pp. 986–993 (2005)
15. Saito, S., Wei, L., Hu, L., Nagano, K., Li, H.: Photorealistic facial texture inference using deep neural networks. CoRR, abs/1612.00523 (2016)
16. Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. CoRR, abs/1503.03832 (2015)
17. Thies, J., Zollhöfer, M., Stamminger, M., Theobalt, C., Nießner, M.: Face2face: real-time face capture and reenactment of RGB videos. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 27–30 June 2016, pp. 2387–2395 (2016)
18. Turk, M., Pentland, A.: Eigenfaces for recognition. J. Cogn. Neurosci. 3(1), 71–86 (1991)
19. Weise, T., Li, H., Van Gool, L.J., Pauly, M.: Face/off: live facial puppetry. In: Proceedings of the 2009 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA 2009, New Orleans, Louisiana, USA, 1–2 August 2009, pp. 7–16 (2009)
20. Zhang, S., Huang, P.: High-resolution, real-time 3D shape acquisition. In: 2004 Conference on Computer Vision and Pattern Recognition Workshop, pp. 28–28, June 2004

Automatic Curation System Using Multimodal Analysis Approach (MAA)

Wei Yuan1(B), Yong Zhang1, Xiaojun Hu2, and Mei Song1

1 Beijing University of Posts and Telecommunications, No. 10, Xitucheng road, Haidian District, Beijing, People's Republic of China
[email protected]
2 DeepAit Office, No. 8, Wenhuiyuan north road, Haidian District, Beijing, People's Republic of China

Abstract. Extracting sports highlights and summarizing the most exciting moments of a match is an important task for broadcast media. However, it requires intensive video editing. We propose a Multimodal Analysis Approach (MAA) for auto-curating sports highlights and use it to create a real-world system that aids the editing of soccer highlight reels. MAA fuses information from the players' actions (action recognition), the landmarks (image classification), the scores on the scoreboard (OCR) and the commentator's tone of voice (audio detection) to determine the most exciting moments of a match. In addition, we use face recognition technology to make star highlights. A shot-boundary detection method is developed to accurately identify the start and end frames of highlights for content summaries. The proposed system has performed real-time highlight extraction from the video stream of the FIFA world cup 2018. Moreover, comprehensive user studies with multiple participants show that MAA produces highlights of better quality. Keywords: Automatic curation · Highlight extraction · Multimodal analysis · Deep learning

1 Introduction

In modern life, sports videos, especially soccer videos, are very popular among viewers and receive the highest level of attention. Their long duration and large quantity make it difficult to watch a complete match in fast-paced daily life. Therefore, there is an urgent demand to produce video summaries containing effectively detected and labeled highlights. The tremendous growth in video data has led to a huge demand for tools that can speed up and simplify the production of highlight packs for more effective browsing, search and content summaries. However, most of the process for producing highlight reels is still manual, labor-intensive, and not scalable. In this paper, we present a novel method for auto-curating sports highlights in real time, demonstrating its application in extracting soccer highlights.


MAA uniquely fuses information from video, text and audio to determine a match's most exciting moments. Simultaneously, a variety of cutting-edge computer vision algorithms are combined for automatically extracting highlights. At the video and image level, we use a 3D CNN [1] to recognize player actions, MobileNet [2] to identify landmark targets, and MTCNN [3] for star face recognition; at the text level, OCR is utilized to recognize the score on the scoreboard; at the audio level, time-domain and frequency-domain features are extracted, and a deep neural network is used to detect exciting moments based on the commentator's tone of voice. Our automatic video editing system can detect goals, shots, corner kicks, penalties, fouls, substitutions, video assistant referee (VAR) reviews and other exciting soccer events in real time. The system can also recognize stars' faces in real time and then produce personalized highlights based on the viewer's favorite players. We use shot-boundary detection to accurately detect the start and end frames of highlights, and then add the selected clips to the web for video editors' quick viewing and retrieval. The interface of the system, called the Auto-editing Highlights Management System, is shown in Fig. 1. In summary, the main contributions of our work are listed below:

• We create four kinds of soccer-centric datasets: a video dataset for recognizing player actions, an image dataset for detecting landmark targets in soccer matches, an audio dataset for analyzing the commentator's tone, and a dataset containing the stars' faces.
• According to the characteristics of different exciting events in soccer matches, we combine a variety of deep learning algorithms, such as action recognition in video, image classification and text recognition, to extract multimodal features for highlight detection. We conduct an extensive evaluation of our work and demonstrate the importance of each component in MAA.
• We present a first-of-its-kind system for automatically extracting soccer highlights in real time by uniquely fusing multimodal excitement features from video, text and audio. Our system has been successfully demonstrated at the FIFA world cup 2018, processing live streams and extracting highlights.

2 Related Work

In the field of extracting soccer highlights, most recently reported work related to soccer videos focuses on a single event or a few events using a single approach. In contrast, MAA covers almost all the highlights of a soccer match, including goals, shots, corner kicks, penalties, fouls, substitutions, video assistant referee (VAR) reviews and other important scenes not limited to specific events. Chen et al. [4,5] extracted audio features and lens-type features and used a C4.5 decision tree to detect goals. A controlled Markov chain was also used to detect goals [6]. Huang et al. [7] established Bayesian networks to identify goals, corner kicks, penalties and yellow cards at two semantic levels, frame and lens. Yang [8] used the Top-Hat transform to detect the goal-mouth in order to detect


Fig. 1. The system web for auto-curation of soccer highlights in real time.

wonderful events. In [9], a hidden Markov model was used to detect exciting video segments based on the analysis of the underlying features of the video. Zawbaa et al. [10] used a support vector machine (SVM) and an artificial neural network (ANN) for broadcast soccer video summarization. There are also numerous rule-based approaches. Li et al. [11] used a rule-based algorithm with low-level features to produce soccer video summaries. Researchers have also used rules to detect specific events. Tjondronegoro et al. [12] performed statistical analysis on different types of events and established a series of rules based on the statistical results to classify soccer video events into goals, shots, and fouls. More recently, Liu [13] proposed a multimodal analysis for content-based soccer video retrieval using traditional methods rather than deep learning. Different from existing methods, MAA uniquely combines excitement measures to produce highlights, including information from player actions, landmark objects in the match, the scoreboard on the screen and the commentator's tone.

3 Automatic Curation Using Multimodal Analysis Approach

3.1 System Architecture

The system architecture is illustrated in Fig. 2. Given an input video feed, MAA extracts four multimodal markers of potential highlights in parallel: the player action (recognized by a visual 3D-CNN classifier), landmark targets in soccer matches (detected by MobileNet), the score on the scoreboard (by an OCR


engine), and the commentator excitement (with an audio classifier). We combine face recognition results with highlight detection results to create star collections. In the following sections, we describe each component in detail.
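As a rough illustration of this parallel design, the sketch below runs four placeholder detector callables concurrently over one buffered segment of the stream and fuses their scores into a single highlight decision. The detector names, the threshold and the any-modality fusion rule are our own assumptions for illustration, not the exact implementation used in the system.

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_segment(segment, detectors, threshold=0.5):
    """Run the four modality detectors in parallel and fuse their scores.

    `segment` is assumed to carry the frames and the audio of one buffered
    chunk of the stream; each detector returns a score in [0, 1].
    """
    with ThreadPoolExecutor(max_workers=len(detectors)) as pool:
        futures = {name: pool.submit(fn, segment) for name, fn in detectors.items()}
        scores = {name: f.result() for name, f in futures.items()}
    # Simple fusion rule (an assumption): flag the segment if any modality fires.
    is_highlight = any(score >= threshold for score in scores.values())
    return is_highlight, scores

# Hypothetical detector callables; in the real system these would wrap the
# 3D-CNN, MobileNet, OCR and audio classifiers described below.
detectors = {
    "action": lambda seg: 0.0,        # player action recognition
    "landmark": lambda seg: 0.0,      # landmark image classification
    "score_change": lambda seg: 0.0,  # OCR on the scoreboard
    "commentator": lambda seg: 0.0,   # audio excitement detection
}
```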

Fig. 2. The multimodal analysis approach

3.2 Player Action Recognition

Soccer video event identification and player action recognition have been interesting research tasks during the past few decades. A number of machine learning techniques, such as 2D CNNs, have been used to solve this problem, but 3D-CNNs (3-dimensional convolution) have not yet been applied well to this task. Video Dataset. Because no suitable and effective dataset is available, we categorize soccer videos into five classes of soccer scenes and events and develop a soccer video dataset for training the 3D-CNN. As shown in Fig. 3, the events are (a) shot, (b) corner kick, (c) penalty and (d) foul. There are also (e) background scenes in which none of the four events mentioned above happen. To build the soccer video dataset, we download soccer match videos of the FIFA world cup 2014 and Europe's big five soccer leagues from the Internet; each video is coded in MPEG-4 at 25 frames per second. We use FFmpeg to crop short sequences of the downloaded videos related to our categories. The negative examples (background clips) were randomly sampled from these videos. The time period of each video clip is 3 s (75 frames). This is designed to let the 3D-CNN model process the complete key frames of every event and scene. In the


soccer match, due to the different appearance frequencies of different wonderful events, the amount of data we collect for the different categories of video clips also differs: 1190 shots, 157 corners, 138 penalties, 493 fouls and 1224 background clips. Data Augmentation. In order to make the training samples as balanced as possible, we perform left-right flipping, noise reduction, and color enhancement for the clips of the categories with a small amount of data, thereby expanding the data amount by up to four times (see Table 1).
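The following is a minimal sketch of this clip-level augmentation (left-right flipping, denoising, and a colour adjustment), assuming each clip is loaded as a list of OpenCV BGR frames; the specific filters and parameters are illustrative choices, since the paper does not specify them.

```python
import cv2

def flip_clip(frames):
    """Left-right flip every frame of a clip."""
    return [cv2.flip(f, 1) for f in frames]

def denoise_clip(frames):
    """Per-frame denoising; the authors' exact filter is not specified."""
    return [cv2.GaussianBlur(f, (3, 3), 0) for f in frames]

def adjust_color_clip(frames, alpha=1.15, beta=10):
    """Simple brightness/contrast adjustment as a stand-in for the
    colour enhancement mentioned in the paper."""
    return [cv2.convertScaleAbs(f, alpha=alpha, beta=beta) for f in frames]

def augment_clip(frames):
    """Return the original clip plus three augmented copies (x4 in total)."""
    return [frames, flip_clip(frames), denoise_clip(frames), adjust_color_clip(frames)]
```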

Fig. 3. Five classes in the video dataset: (a) shot, (b) corner, (c) penalty, (d) foul, (e) background.

Non-local Neural Network. CNN-based frameworks for action recognition can be generally classified into 2D-CNNs and 3D-CNNs. 2D two-stream architectures [14,15] feed RGB frames and optical flow fields into the network; however, the optical flow calculation takes a long time. 3D-CNNs have the advantage of not needing to calculate optical flow. Multiple 3D-CNN models have been proposed for spatial-temporal modelling, such as C3D [16], P3D [17], and I3D [18]. Among them, the state-of-the-art framework is the non-local neural network [1], which is based on I3D [18] for video modelling and captures long-range dependencies of human action in a video by computing interactions between each pair of positions in the feature space. We use the non-local neural network [1] to train an action recognizer that detects player actions in real time. Inspired by the observation that when humans view pictures their focus is mainly on the center of the picture, we cut out 2/25 of the width from each of the left and right sides of every frame in our dataset and then resize the frames to 224 * 224 pixels. This not only ensures that the resized video frame is not excessively deformed, but also removes visually redundant information to improve the accuracy of action classification.
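A small sketch of this spatial preprocessing, together with the 16-of-64 temporal sampling used for fine-tuning (described in the next paragraph), might look as follows; the frame-loading details are assumptions.

```python
import cv2

def crop_and_resize(frame, size=224):
    """Trim 2/25 of the width from each side, then resize to size x size."""
    h, w = frame.shape[:2]
    margin = int(w * 2 / 25)
    cropped = frame[:, margin:w - margin]
    return cv2.resize(cropped, (size, size))

def sample_training_clip(frames, start, length=64, stride=4):
    """Take 64 consecutive frames from `start` and keep every 4th one,
    yielding the 16-frame input clip used for fine-tuning."""
    window = frames[start:start + length]
    return [crop_and_resize(f) for f in window[::stride]]
```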


The models are pre-trained on the Kinetics dataset [19]. We fine-tune the models using 16-frame input clips with the Caffe2 deep learning library. These clips are generated by cropping 64 consecutive frames randomly from the original full-length clip and then sampling one frame every four frames. The purpose of setting the training sample length (75 frames) to be larger than 64 frames is to make the video frames input into the neural network random, providing a data augmentation effect. We train on a 2-GPU machine and each GPU has 4 clips in a mini-batch (so in total a mini-batch size of 8 clips). The model is trained for 7000 iterations in total. The learning rate is 0.001 for the first 4000 iterations and 0.0001 for the next 3000. A momentum of 0.9 and a weight decay of 0.0001 are used. Dropout is adopted after the global pooling layer, with a dropout ratio of 0.5. We use 90% of the video clips of each class as the training set and 10% as the test set. The accuracy of each scene has been calculated, and the results are shown in Table 1. The model achieves an overall 91.9% accuracy on the test set.

Table 1. Data augmentation and precisions of 5 scenes

Classes      Raw clips   After augmentation   Training set   Test set   Precision
Shot         1190        1190 (*1)            1070           120        96.7%
Corner       157         628 (*4)             568            60         100%
Penalty      138         552 (*4)             496            56         66.1%
Foul         493         986 (*2)             888            98         96.9%
Background   1224        1224 (*1)            1088           136        91.2%
Total        3202        4580                 4110           470        91.9%

3.3 Image Classification

The action recognition above focuses on the motion in a specific video clip. During our experiments, we found that the action recognition model cannot distinguish between a referee showing a card and a player raising his hand. Therefore, after identifying raised-hand scenes (from the foul scenes), we use image classification to further identify card scenes. Apart from this, substitution scenes have special landmark targets, namely the electronic substitution boards, so image classification can also be used to identify substitution scenes. In the 2018 live broadcasts of soccer matches such as the World Cup, VAR (Video Assistant Referee) was used to replay controversial decisions in specific segments, so we also use image classification to identify VAR events.


Image Dataset. We extract representative keyframes at a fixed interval from collections of soccer match videos of the FIFA world cup 2018; keyframes that include substitutions, red cards and yellow cards (representing foul scenes) and VAR scenes are collected as the image dataset. There are about 300 images per category in the dataset (see Fig. 4).

Fig. 4. Referee with the card, substitution and VAR

Mobilenet V2. We obtain the classification model by training MobileNet V2 [20]. MobileNet V2 is a lightweight classification network. The original MobileNet [2] adopts depth-wise and point-wise convolutions; borrowing the idea of ResNet, MobileNet V2 adopts inverted residual blocks, which incorporate low-level features with high-level features through shortcuts. The neural network is pre-trained on the ImageNet1k set [21] and then fine-tuned on the image dataset. We use the publicly available pre-trained MobileNet V2 model and train the model end-to-end with the PyTorch deep learning framework. The input image is resized to 224 * 224. Synchronized SGD is used to train the model on 1 GPU. A weight decay of 0.0001 and a momentum of 0.9 are set. The model is trained for 12000 iterations in total, starting with a learning rate of 0.0025 and reducing it by a factor of 10 every 4000 iterations.
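A hedged PyTorch sketch of this fine-tuning setup is shown below, using the torchvision MobileNet V2 with its classifier head replaced for the three landmark classes; the data-loading code is omitted and the authors' exact training script may differ.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load the ImageNet-pretrained MobileNet V2 and replace its classifier head
# with one for the three landmark classes (card, substitution, VAR).
model = models.mobilenet_v2(pretrained=True)
model.classifier[1] = nn.Linear(model.last_channel, 3)

optimizer = torch.optim.SGD(model.parameters(), lr=0.0025,
                            momentum=0.9, weight_decay=0.0001)
# Reduce the learning rate by a factor of 10 every 4000 iterations.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=4000, gamma=0.1)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One fine-tuning step on a batch of 224 x 224 images."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    scheduler.step()
    return loss.item()
```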

3.4 Score Detection

Since the scenes of goals are similar to those of ordinary shot events in a soccer match, the player action recognition method mentioned in Sect. 3.2 cannot distinguish goal events from shot events well. However, the scoreboard provides an important clue for goal detection. In a professional soccer match broadcast like the World Cup, the score is displayed on the scoreboard after every scored goal. The scoreboard is a caption region distinct from the surrounding region, which provides information about the score of the match, and it always appears in the upper right part of the image frame. Since the region appears at a specific location of the image frame, we apply OCR (using the Tesseract engine [22]) within the scoreboard region in order to extract the score.
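A minimal sketch of this scoreboard OCR step is given below, assuming the scoreboard coordinates are known (in the real system they are configured via the Web GUI, see Sect. 4.1). pytesseract is used here simply as a Python wrapper around the Tesseract engine, and the preprocessing parameters are illustrative.

```python
import cv2
import pytesseract

def read_score(frame, box):
    """Crop the scoreboard region, binarize it and run Tesseract on it.

    `box` = (x, y, w, h) is the scoreboard position configured in the GUI.
    """
    x, y, w, h = box
    roi = frame[y:y + h, x:x + w]
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    text = pytesseract.image_to_string(binary, config="--psm 7")  # single text line
    return text.strip()
```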

3.5 Commentator Excitement Detection

In order to estimate various exciting scenes not limited to specific scene types, we propose an approach to extract highlights based on commentator excitement. The excitement in the commentators' tone is perhaps the most effective indicator of highlights within the context of any sport. Especially in soccer, commentator excitement can indicate that a significant event has just happened, and it plays a vital role in determining the excitement degree and position of a potential soccer highlight clip. Audio Dataset. Commentator excitement samples from the FIFA world cup 2014 are used for training the audio classifier. For negative examples, audio tracks containing two classes of data, regular speech and regular cheering (without commentator excitement), are utilized. In total the final training set consists of 260 positive and 520 negative samples (2 classes). Each sample is a WAV audio clip with a duration of 1 s. Spectrogram Classification. In audio spectral analysis, the spectrogram of the audio signal [23,24] is a popular frequency-domain transformation. It is a graphic representation of the energy content of a signal: the horizontal axis is time, the vertical axis is frequency, and the coordinate point value is the energy of the audio signal. Since three-dimensional information is expressed on a two-dimensional plane, the magnitude of the energy value is represented by a color. The spectrogram is defined as an intensity map of the Short-Time Fourier Transform (STFT) magnitude. In practice, the STFT is computed as a series of Fast Fourier Transforms (FFTs) of windowed data frames, where the window hops forward through time [25]. In our approach, the data frames are windowed with a Hanning window with 75% overlap and a 1024-point FFT is used. Spectrogram images of the three classes of audio clips in our dataset are shown in Fig. 5.
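A short sketch of this spectrogram computation, using the stated 1024-point FFT and 75% Hanning-window overlap, could look as follows; reading the clip with SciPy and converting to a dB scale are our own assumptions about the preprocessing details.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

def clip_to_spectrogram(path, nfft=1024, overlap=0.75):
    """Turn a 1-second WAV clip into a log-magnitude spectrogram image."""
    fs, samples = wavfile.read(path)
    if samples.ndim > 1:               # mix down if the clip is stereo
        samples = samples.mean(axis=1)
    f, t, sxx = spectrogram(samples, fs=fs, window="hann",
                            nperseg=nfft, noverlap=int(nfft * overlap))
    return 10 * np.log10(sxx + 1e-10)  # dB scale, avoiding log(0)
```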

Fig. 5. The spectrogram image of the three classes in the audio dataset, followed by commentator excitement, regular speech and regular cheer.

The spectrogram is a diagram that can be regarded as an image. In this way, classification methods utilized over images can be applied in the audio field. We


propose an audio classification method using a spectrogram-based CNN. The base model is a Resnet-50 model pre-trained on ImageNet. The TensorFlow deep learning library is used for training the model with stochastic gradient descent, a learning rate of 0.001, a momentum of 0.9, and a weight decay of 0.0001. The leave-one-out cross-validation accuracy on the audio training set is 99.2%.

3.6 Star Recognition

The auto-curating highlights system can also perform real-time face recognition from video streams. 40 star faces (10 photos per star) are collected to build the face registration library. In order to improve accuracy, the various forms of the stars' faces are cut out directly from soccer matches. The face detection and alignment models are based on MTCNN [3], which is extremely fast and robust to changes in lighting, facial expressions and background. The face recognition model is based on InsightFace [26], which learns how similar two face images are. After face detection and five-point alignment, the face image is fed into the face recognition model to obtain its feature (no additional training required), and the feature is then compared with the face features in the face registration library. The maximum and minimum similarity values are removed, and the remaining confidences are averaged as the final confidence result.
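The matching rule can be sketched as below, assuming the embeddings of the query face and of each star's ten registration photos have already been produced by the face recognition model; dropping the highest and lowest similarity before averaging follows the confidence rule described above, while the cosine-similarity measure and the gallery layout are our own assumptions.

```python
import numpy as np

def match_face(query_emb, gallery):
    """Compare a face embedding against each star's registered embeddings.

    `gallery` maps a star name to an array of shape (10, d) holding the
    embeddings of that star's registration photos. For every star the
    maximum and minimum similarities are dropped and the rest averaged.
    """
    q = query_emb / np.linalg.norm(query_emb)
    best_name, best_conf = None, -1.0
    for name, embs in gallery.items():
        e = embs / np.linalg.norm(embs, axis=1, keepdims=True)
        sims = np.sort(e @ q)           # cosine similarities, ascending
        conf = sims[1:-1].mean()        # drop min and max, average the rest
        if conf > best_conf:
            best_name, best_conf = name, conf
    return best_name, best_conf
```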

4 Experiments

4.1 Experimental Setting

We evaluated the auto-curating highlights system in a real-world application, namely the FIFA world cup 2018, and analyzed the content of the live stream in real time. The system ran on an Ubuntu 16.04 Linux box with four Tesla P40 GPUs. Frames were extracted directly from the video stream at a rate of 25 fps, and audio in 1-second clips was encoded as 16-bit PCM at a rate of 44100 Hz.

• Action recognition: Whenever a 64-frame video clip (2.56 s) is collected, a prediction is made, where the sampling strategy is the same as in training, i.e. 16 * 4. Using a sliding-window method, after the window slides forward 32 frames, the next prediction is made, i.e. the overlap is 0.5 (a sliding-window sketch is given below). The model needs 0.67 s to process 2.56 s of video content. Different SoftMax output thresholds are set for different categories: shot 0.999, corner 0.9, penalty 0.98, foul 0.99.
• Image classification: Image classification is performed every 5 frames (0.04 s per frame).
• OCR: The system relies on the coordinates of the scoreboard, which are determined via the Web GUI, to crop the score image; the RGB image is then converted to grayscale and binarized. OCR is applied every 10 frames to achieve score recognition (0.37 s per frame).


• Audio detection: Whenever a one-second audio clip is collected, a prediction is made. The model needs 0.12 s to process 1 s of audio content. The SoftMax output threshold of the commentator excitement category is set to 0.97.
• Face recognition: Face detection and face recognition are performed every 6 frames (0.02 s per frame).

To determine the start and end of a video segment, we perform shot boundary detection [27] after detecting the highlight. The beginning and end of the segment are set at shot change points (two shots forward, two shots backward).
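The sliding-window strategy for the action recognizer can be sketched as follows; the `classify_clip` callable is assumed to wrap the 3D-CNN and return per-class probabilities, and the buffering details of the real system may differ.

```python
from collections import deque

WINDOW, STRIDE, HOP = 64, 4, 32
# Per-class SoftMax thresholds from the experimental setting above.
THRESHOLDS = {"shot": 0.999, "corner": 0.9, "penalty": 0.98, "foul": 0.99}

def stream_action_events(frame_source, classify_clip):
    """Slide a 64-frame window over the stream with 50% overlap and yield
    (frame index, class, probability) events when a class exceeds its threshold."""
    buffer = deque(maxlen=WINDOW)
    for i, frame in enumerate(frame_source):
        buffer.append(frame)
        if len(buffer) == WINDOW and (i + 1) % HOP == 0:
            # Keep every 4th frame of the window (16 frames), as in training.
            probs = classify_clip(list(buffer)[::STRIDE])
            for cls, thr in THRESHOLDS.items():
                if probs.get(cls, 0.0) >= thr:
                    yield i, cls, probs[cls]
```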

4.2 Evaluation of Specific Scenes

For specific scenes, in order to evaluate the system's highlights, we evaluate the system on 8 matches from the FIFA world cup 2018, as reported in Tables 2 and 3. Since the system is a real-time highlight detection system, we applied two criteria: Precision and Recall. Precision is evaluated as Eq. (1):

Precision = tp / (tp + fp)  (1)

tp, also known as True Positive, refers to the number of true highlights that the system labeled as highlights. fp (False Positive) refers to the number of clips that the system wrongly labeled as highlights. Thus, this criterion evaluates how many of the detected highlights are correct. Recall is evaluated as Eq. (2):

Recall = tp / (tp + fn)  (2)

fn (False Negative) refers to the number of highlights that the system failed to label as highlights. This criterion evaluates how many of all true highlight clips are detected.
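For concreteness, a tiny worked example of the two criteria (the counts are made up, not taken from the tables):

```python
def precision_recall(tp, fp, fn):
    """Precision and recall as defined in Eqs. (1) and (2)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# 6 correctly detected highlights, 1 false alarm and 2 missed highlights
# give a precision of about 85.7% and a recall of 75%.
print(precision_recall(6, 1, 2))
```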

4.3 Evaluation of Highlights

Assessing the quality of sports highlights is a challenging task because there is no well-defined ground truth. Similar to [28], we addressed this problem by comparing the highlights automatically generated by our system with two human-based references. The first is manual evaluation of the video clips our system produced. The second is a comparison with the highlights released by professional producers on China Central Television (CCTV). Human Evaluation of Highlights. Without considering specific scenes, in order to evaluate the quality of all the video segments produced by our system for the 8 matches mentioned above, we invited 15 evaluators from different educational backgrounds, all of whom are soccer enthusiasts.


Table 2. Precisions and recalls of 8 scenes from 4 matches (8 in total)

Scenes                    Final               Quarter-final       Quarter-final       Eighth-final
                          France vs Croatia   Russia vs Croatia   Brazil vs Belgium   Colombia vs England
Goal          Precision   85.7%               80%                 75%                 100%
              Recall      100%                100%                100%                50%
Shot          Precision   64%                 83.3%               79.5%               84.8%
              Recall      76.2%               80.6%               86.1%               93.3%
Substitution  Precision   80%                 66.6%               100%                87.5%
              Recall      80%                 75%                 80%                 100%
Corner        Precision   80%                 100%                100%                81.8%
              Recall      100%                92.9%               91.7%               100%
Penalty       Precision   100%                100%                -                   88.9%
              Recall      100%                60%                 -                   72.7%
Foul(card)    Precision   100%                100%                66.7%               87.5%
              Recall      100%                80%                 100%                87.5%
VAR           Precision   50%                 -                   -                   -
              Recall      100%                -                   -                   -

Table 3. Precisions and recalls of 8 scenes from another 4 matches (8 in total)

Scenes                    Eighth-final            Group matches
                          Sweden vs Switzerland   Uruguay vs Russia   Germany vs Sweden   Iran vs Portugal
Goal          Precision   100%                    100%                75%                 100%
              Recall      100%                    66.7%               100%                100%
Shot          Precision   81.8%                   76.2%               73%                 72%
              Recall      90%                     84.2%               76%                 85.7%
Substitution  Precision   100%                    71.4%               62.5%               71.4%
              Recall      80%                     83.3%               83.3%               83.3%
Corner        Precision   92.9%                   85.7%               83.3%               71.4%
              Recall      92.9%                   100%                91%                 83.3%
Penalty       Precision   -                       -                   -                   50%
              Recall      -                       -                   -                   100%
Foul(card)    Precision   80%                     66.7%               100%                71.4%
              Recall      100%                    100%                66.7%               100%
VAR           Precision   -                       -                   -                   75%
              Recall      -                       -                   -                   100%


Each evaluator was required to assign a score of 0 to 10 for each video segment, 0 meaning the least exciting segment and 10 meaning the most exciting segment. For each segment, the lowest and highest of the 15 scores given by the 15 evaluators were removed, and the final score was the average of the remaining 13 scores. We averaged the scores of all the segments from one match as an evaluation indicator for that match. The quantities of clips and the scores for the 8 matches are shown in Table 4. The average of these 8 scores, 8.1, can represent the quality of the highlights extracted by our system.

Table 4. Quantities of clips generated by our system and scores

Matches     France      Russia      Brazil      Colombia    Sweden           Uruguay    Germany    Iran         Average
            vs Croatia  vs Croatia  vs Belgium  vs England  vs Switzerland   vs Russia  vs Sweden  vs Portugal
Quantities  64          74          76          86          65               48         60         63           67
Scores      8.3         8.1         7.9         8.4         8.2              7.7        7.9        8.1          8.2

Comparison with Official Highlights. The previous experiments established the quality of the highlight editing as perceived by potential users of our system. Next, we compared the video segments generated by our system with the highlights released by professional producers on China Central Television (CCTV). For each match, 25 official highlights are released on CCTV, which are available at the official page (http://tv.cctv.com/). We asked evaluators to assign scores to these highlights in the same way as described in the previous section. By comparing Tables 4 and 5, we can see that our automatic editing system provides almost three times the number of videos as the official manual editing, with little difference in scores. In addition, about 80% of the official highlights overlap with highlights generated by our system. Therefore, our system may be able to generate almost all professionally produced content.

Table 5. Quantities of clips generated by professional producers and scores

Matches     France      Russia      Brazil      Colombia    Sweden           Uruguay    Germany    Iran         Average
            vs Croatia  vs Croatia  vs Belgium  vs England  vs Switzerland   vs Russia  vs Sweden  vs Portugal
Quantities  25          25          25          25          25               25         25         25           25
Scores      8.7         7.9         8.6         8.0         8.5              8.1        8.4        8.3          8.4

5 Conclusion

We presented MAA for automatically extracting real-time highlights from sports video streams, including visual analysis of the players, text analysis of the scoreboard and audio analysis of the commentator. Based on this, we developed a first-of-its-kind system for automatic curation of soccer highlight packs,


which was demonstrated at the FIFA world cup 2018. MAA has certain limitations, as different types of video have different excitement characteristics. Besides, the pattern of excitement feature fusion is still unsatisfactory. In the future, we plan to apply MAA to other sports such as basketball and baseball, and to use other methods to produce more exciting video summaries of the matches. Acknowledgment. This work was supported by the National Natural Science Foundation of China (No. 61871046).

References

1. Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794–7803 (2018)
2. Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
3. Zhang, K., Zhang, Z., Li, Z., Qiao, Y.: Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process. Lett. 23(10), 1499–1503 (2016)
4. Chen, S.C., Shyu, M.L., Chen, M., Zhang, C.: A decision tree-based multimodal data mining framework for soccer goal detection. In: 2004 IEEE International Conference on Multimedia and Expo, ICME 2004, vol. 1, pp. 265–268. IEEE, June 2004
5. Xie, Z., Shyu, M.L., Chen, S.C.: Video event detection with combined distance-based and rule-based data mining techniques. In: Multimedia and Expo, pp. 2026–2029. IEEE, July 2007
6. Leonardi, R., Migliorati, P., Prandini, M.: Semantic indexing of soccer audio-visual sequences: a multimodal approach based on controlled Markov chains. IEEE Trans. Circuits Syst. Video Technol. 14(5), 634–643 (2004)
7. Huang, C.L., Shih, H.C., Chao, C.Y.: Semantic analysis of soccer video using dynamic Bayesian network. IEEE Trans. Multimed. 8(4), 749–760 (2006)
8. Yang, Y., Lin, S., Zhang, Y., Tang, S.: Highlights extraction in soccer videos based on goal-mouth detection. In: 9th International Symposium on Signal Processing and Its Applications, ISSPA 2007, pp. 1–4. IEEE, February 2007
9. Kang, Y.L., Lim, J.H., Kankanhalli, M.S., Xu, C.S., Tian, Q.: Goal detection in soccer video using audio/visual keywords. In: 2004 International Conference on Image Processing, ICIP 2004, vol. 3, pp. 1629–1632. IEEE, October 2004
10. Zawbaa, H.M., El-Bendary, N., Hassanien, A.E., Kim, T.H.: Event detection based approach for soccer video summarization using machine learning. Int. J. Multimed. Ubiquitous Eng. 7(2), 63–80 (2012)
11. Li, B., Pan, H., Sezan, I.: A general framework for sports video summarization with its application to soccer. In: Proceedings 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2003), vol. 3, pp. III–169. IEEE, April 2003
12. Tjondronegoro, D.W., Chen, Y.P.P.: Knowledge-discounted event detection in sports video. IEEE Trans. Syst. Man Cybern.-Part A: Syst. Hum. 40(5), 1009–1024 (2010)


13. Liu, H.: Highlight extraction in soccer videos by using multimodal analysis. In: 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), pp. 2169–2173. IEEE, July 2017
14. Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. In: Advances in Neural Information Processing Systems, pp. 568–576 (2014)
15. Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X., Van Gool, L.: Temporal segment networks: towards good practices for deep action recognition. In: European Conference on Computer Vision, pp. 20–36. Springer, Cham, October 2016
16. Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4489–4497 (2015)
17. Qiu, Z., Yao, T., Mei, T.: Learning spatio-temporal representation with pseudo-3D residual networks. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 5534–5542. IEEE, October 2017
18. Carreira, J., Zisserman, A.: Quo Vadis, action recognition? A new model and the kinetics dataset. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4724–4733. IEEE, July 2017
19. Kay, W., Carreira, J., Simonyan, K., Zhang, B., Hillier, C., Vijayanarasimhan, S., Viola, F., Green, T., Back, T., Natsev, P., Suleyman, M.: The kinetics human action video dataset. arXiv preprint arXiv:1705.06950 (2017)
20. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: inverted residuals and linear bottlenecks. arXiv preprint arXiv:1801.04381 (2018)
21. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
22. Smith, R.: An overview of the Tesseract OCR engine. In: Ninth International Conference on Document Analysis and Recognition, ICDAR 2007, vol. 2, pp. 629–633. IEEE, September 2007
23. Son, W., Cho, H.T., Yoon, K., Lee, S.P.: Sub-fingerprint masking for a robust audio fingerprinting system in a real-noise environment for portable consumer devices. IEEE Trans. Consum. Electron. 56(1), 156–160 (2010)
24. Malekesmaeili, M., Ward, R.K.: A local fingerprinting approach for audio copy detection. Sig. Process. 98, 308–321 (2014)
25. Smith, J.O.: Spectral audio signal processing, vol. 1334027739. W3K (2011)
26. Deng, J., Guo, J., Xue, N., Zafeiriou, S.: ArcFace: additive angular margin loss for deep face recognition. arXiv preprint arXiv:1801.07698 (2018)
27. Boreczky, J.S., Rowe, L.A.: Comparison of video shot boundary detection techniques. J. Electron. Imaging 5(2), 122–129 (1996)
28. Merler, M., Joshi, D., Nguyen, Q.B., Hammer, S., Kent, J., Smith, J.R., Feris, R.S.: Automatic curation of golf highlights using multimodal excitement features. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 57–65. IEEE, July 2017

Mixed Reality for Industry? An Empirical User Experience Study

Olli I. Heimo, Leo Sakari, Tero Säntti, and Teijo Lehtonen

University of Turku, Turku, Finland
[email protected]
http://ar.utu.fi

Abstract. Due to the development of both the computational power of mobile devices and display technology, mixed reality solutions have become possible. As the entertainment industry has eagerly taken the technology into use, it has also opened several possibilities to develop solutions to enhance and support work processes in industrial and maritime use. In this paper, user testing of a few prototypes of such mixed reality solutions is presented.

Keywords: Image processing · Industry · Maritime · Mixed reality · Mobile devices

1 Introduction

The art and technology used to mix real-world elements with virtual components is called mixed reality (MR). By altering the incoming information the user receives, one can insert computer-generated information for the user to sense. This is usually, but not necessarily, done by visual and aural means; olfactory, gustatory, and haptic senses can also be utilized (see e.g. [1–3]). Thereby computer-generated 2D and 3D images can be superimposed on a real-world view captured using a smartphone, computer or other device, audio can be added for the user to hear, and various other senses can be simulated to alter the experience of reality [4–7]. A minimum requirement for MR is met with a smartphone or a tablet device, since they include a camera and the capabilities to render and display augmented graphics [8]. Hence, with the explosive growth of smartphone penetration rates, application-based MR has become more accessible to users and industry alike. During the last few years consumer-oriented MR devices with proper resolution and usability have become more common. This widespread availability of MR hardware has given software producers new business opportunities, most of them yet unknown. The hypothesis of this research was that MR solutions for industry will contribute to the trade by aiding the work processes of management, design, implementation and marketing. Due to the novelty of the technology there has been no user testing of MR solutions for industry, nor is there any software to test with. Thus these solutions had to be created to test the hypothesis. While knowledge of MR solutions in industry is lacking, the practices from other fields, such as entertainment, must be adopted and transformed to match the requirements set by the industry.


The MARIN¹ and MARIN2² projects were focused on industrial uses of MR technologies. The projects ran from January 2012 to March 2017 and had several industrial partners, including infrastructure, construction and shipbuilding companies. As this paper focuses on the user experience, more in-depth descriptions of the projects' prototypes and the technology used can be found in [9]. In this paper we present some of the projects' prototypes and introduce qualitative user testing results gathered with these prototypes. The testing was done in several different locations with a variety of users with different educational backgrounds and nationalities. All test subjects were involved in industrial design or work involving 3D-models. The aim of the study was to gather data to support or contradict the hypothesis, and to find practices which lead to more efficient MR solutions in the field.

2 Mixed Reality

Mixed reality is a term for a combination of different levels of reality and digitally generated material in relation to that reality. While virtual reality (VR) applications are mainly focused on recreating the experience completely digitally, augmented reality (AR) aims to complement the natural world with superimposed synthetic material. Hence MR covers the area between physical reality and completely simulated virtual reality, as illustrated in the Reality-Virtuality Continuum in Fig. 1 [10].

Fig. 1. Levels of mixed reality

While the original definition does not include the endpoints of the continuum, we also consider virtual reality solutions representing the real world to be part of mixed reality, even though they can be observed without a direct link to reality. AR, on the other hand, combines real-world elements with virtual elements and thus is more limited by real-world requirements (e.g. you cannot fly over the model) in order to generate an immersive experience. There are various types of devices which enable MR experiences. The most common platform is mobile devices, which have a high penetration rate. The video feed from the rear camera is combined on the screen with virtual content drawn on top of it, creating a seamless viewing experience.

1 Mobile Augmented Reality Tool for Marine Industry, https://tech.utu.fi/ar/research/marin/.
2 Mobile Mixed Reality Applications for Professional Use, https://tech.utu.fi/ar/research/marin2/.


Tablets have a few advantages over AR-glasses:

• The user can instantly switch between AR and reality.
• The weight of the device is not carried by the head.
• Tablets are often already standard equipment in industrial environments, thus the investments are minimal.

However, the computing performance of tablets may not be sufficient for some applications. Typically this may be the case when the models to be displayed are very complex and thus more computing power is required, or the software and displayed content must be optimized for mobile devices. A tablet, when held in front of the user's face, may also block the view, causing hazards in the workplace. This can be at least partly avoided with a dedicated tracking camera that can be aimed in other directions than the tablet display (see Fig. 2).

Fig. 2. IndoorAR application on a tablet

While it is easy to produce MR for the masses with mobile devices, at most they can only provide a limited window-type experience. The next phase, however, is wearable eyewear, both VR-glasses and see-through displays, for example Oculus Rift, HTC Vive, and Microsoft HoloLens [11–13]. AR-glasses with see-through optics have been commercially available for several years, but the speed of technical progress with them has not been very fast. Vuzix,


Epson, Google and Brother are some of the companies that offer wearable AR-glasses (see e.g. [14–17]). However, as there has so far not been a great hit among such products in the markets, the expectations for the Microsoft HoloLens are high. The Microsoft HoloLens – even the current development edition – has a few strong or unique points that give it an advantage over (most of, if not all of) the competition:

• It has very robust tracking that does not need markers and actually adapts to the environment: it recognizes walls and other shapes of the room and can estimate its own location very accurately. This makes the AR experience very plausible.
• Computing occurs in the device itself, so it needs no wires or connected backpacks for the equipment to function.
• The device can interpret the user's hand movements and use them as input. This can be very handy, although long interactive sessions can also be tiresome. A separate hand controller is also supported.

The FOV in HoloLens is very limited and thus the user often loses sight of some content. The quality of the picture, however, is good enough to give a fairly good view of the virtual content [12]. The glasses must be tightened around the wearer's head and thus a long session might get tiring. Head-mounted VR-glasses are being developed by several companies, and new models are released frequently. Currently the two best-known VR-glasses are Oculus Rift and HTC Vive (see e.g. [11, 13]). As these are just display devices, the performance of both systems depends mostly on the computer being used, which is not sold as part of the system and can cost more than the VR-glasses. Google Cardboard and Samsung GearVR are low-cost systems that attach a mobile phone to a head-mounted mask (see e.g. [17, 18]). VR experiences can be created with these devices, but performance is distinctly lower than with dedicated VR-glasses. Industrial development efforts should however be more focused on PC-based glasses, due to the higher performance of those systems and the typical need to use highly complex CAD-models. These devices were also considered during the project but no large-scale prototypes were constructed.

3 Our Prototypes

ShipVR, illustrated in Fig. 3, is a VR application built for the HTC Vive. The main purpose of the application was to create an immersive visualization of a passenger ferry before its construction work was finished. The vessel can be observed from the outside in a simplified dock environment and from the inside in the duty free store of the vessel. A miniature version of the duty free store is also presented. As in all Vive applications, the user can move inside the virtual environment. The application was visually impressive and thus was used to test the marketing aspects of MR in industry.


Fig. 3. HTC Vive running ShipVR virtual reality application.

BridgeVR is a VR application built for the Oculus Rift representing a highway bridge. The application allows dynamically loading IFC-models from an online server into a virtual environment. Several movement options are presented to the user to allow easy and quick navigation in possibly large-scale environments. Visibility of parts of the models can be toggled on and off, and metadata contained in the IFC-model can be inspected inside the scene. The user experience can also be customized by letting the user select between different object highlighting and movement options. BridgeVR was made as a tool for professional designers, engineers, and management. Thus it was made visually similar to existing CAD tools and was tested with the aforementioned professionals. BuildingVR is a VR application based on BridgeVR. The main difference between the two is that, as well as displaying static metadata contained in the loaded IFC-models, BuildingVR also contains components that allow gathering data from the design databases. In this demo a connection to a building information application was utilized. This integration allows combining the models of the buildings/spaces in the application with real-time sensor data. This means that inspecting, for example, the models of air conditioning systems gives the user real-time data about the state of the selected air conditioning components. Thus this system and its testing were more focused on building designers and maintenance personnel. IndoorAR is an Android tablet application that combines several technical solutions. The application guides its user to the maintenance target with a dynamically updating indoor map, downloads the CAD-model of the maintenance target from a server, and allows the inspection of the maintenance target via AR. Several objects can


be selected for inspection, and static metadata from the IFC-model as well as real-time data from a building information application are displayed to the user. The functionality of the application is based on several subsystems. Displaying real-time data from the environment is made possible with a component that communicates with the building information application. The system extracts component identifiers from the metadata contained in the IFC-model and then uses those identifiers to request component-specific real-time data. HoloLensAR is a simple proof of concept created for the Microsoft HoloLens. The application contains the 3D-model of an air conditioning system situated in an office building. The air conditioning system is positioned so that it is overlaid directly on top of the real-life system. After positioning the model, real-time data from the air conditioning system can be fetched from the relevant databases. The main focus in testing HoloLensAR and IndoorAR was to find differences between VR solutions and AR solutions – and between different AR solutions and platforms – in both user experience and use cases. In the next chapter the user testing is discussed in more detail.

4 User Testing

4.1 Test One in Savonlinna

The first test was done in Savonlinna at a bridge construction site, where 13 site engineers and construction workers were each given 5 to 10 min to use the BridgeVR software with the Oculus Rift while the others discussed with the research team and monitored the use from a screen. All test subjects were male, with 7 having a bachelor's degree in engineering, 3 with vocational training and one with a master's degree in engineering. Two test subjects were from age group 18–20, 3 from 25–34 years of age, six in the 35–49 age bracket, one from 50–64 and one over 65 years of age. Ten of the test subjects had no experience of using VR-glasses, two had some experience and one was a more experienced user of VR technology. Five of the subjects do not use 3D models (with a computer or otherwise) at their work, five use them once a week or less, two more than once a week and one almost daily. All the users considered themselves proficient or fairly proficient in using computers, and most of them were interested in technology, computer games and VR solutions. Twelve of the users felt that the VR solution improved their perception of the environment, and nine felt that single elements were also easier to perceive through the VR-glasses. Three of the users did not see a particular advantage in perceiving the elements and one person felt that it was harder. Eight of the test subjects found movement in VR easy and natural, while three of them found it "rather easy". Two test subjects felt that they would require more training with it. Out of the thirteen test subjects, twelve felt that the system would be nicer to use sitting down, while one saw no difference between sitting down and standing up. Eleven would rather use a gamepad (as used in the test) to control the movement, while two of them would rather use a mouse-keyboard combination.


The UI also had a "virtual nose3" as well as "virtual eyelids4" to make using the app feel more natural. No one experienced these attributes as distracting or unattractive, because no one noticed the nose and only two subjects noticed the blinking. Five subjects felt that the VR system could be useful for the company's management, 12 for construction managers, 12 for designers and nine for installers and builders. Supervisors, the sales department, as well as subscribers and the audience were also suggested to be good target groups. The subjects were a very homogeneous group in terms of sex and education level, but this clearly reflects the target organization's employees, those who are most likely to use the system. However, one can plausibly argue that a broader and more heterogeneous sample would certainly have given answers of higher variance. The biggest problem was that the novelty of the technology and the powerful immersion made the test subjects excited about the technology itself (wow-effect). This novelty-driven enthusiasm may obscure the evaluation of the functionality, usefulness, and effectiveness of the system, which is reflected in over-optimistic and over-positive assessments. User evaluations showed no clear correlation between ease of use and age. However, while using the system it was clearly noticeable that the younger subjects embraced the use of the device faster. That might be explained by a more critical view of their own competence displayed by younger people, as well as the fact that the age groups were very broad. In addition, the following problems were detected during the testing:

• The bridge model obtained from the designers was not fully finished and this caused confusion among users. Virtual objects were coloured in accordance with the virtual model, which slightly diminished the convenience and efficiency for some users. These things should have been mentioned to the users.
• Almost everyone forgot the primary advantage of VR-glasses: the ability to move one's head freely. The test subjects could have looked directly over their shoulders, but instead rotated the angle of view only with their controller. The problem disappeared for the majority when they were notified about it, but some reverted back to this habit within a minute or two. We named this phenomenon "neck-lock".
• The user was able to "peel" the 3D-model in order to eliminate the elements blocking the view by using the controller keys. This caused some problems because the controller was not visible while the VR-glasses were worn, and thus neither was the position of the fingers on the controller.
• As the test subjects directed their gaze at a single object, the object became highlighted in white. None of the users commented on the issue, but when they were informed about the possibility to remove the highlighting, they turned the option off and did not turn it back on.

3 http://www.purdue.edu/newsroom/releases/2015/Q1/virtual-nose-may-reduce-simulator-sickness-invideo-games.html
4 see e.g. https://www.youtube.com/watch?v=addUnJpjjv4&t=40m5s


• Most of the properties desired by the test subjects already existed, but they did not find the features without help.
• The test subjects gave no negative feedback during the use.

In addition, we found that during use the scales were not understood properly, and the perception of scale was challenging while moving: some "crawled" on the surface while others flew firmly at an altitude of three meters. To remedy this problem, we recommend implementing well-known objects such as people and vehicles to make the scale clearer, as well as implementing a "measurement and distance tool" in the UI for the user to obtain measurements. It is also advisable to pay more attention to the VR-glasses' lens adjustments in order to facilitate the perception of scale. Subjects also requested models that would indicate where the construction site machinery fits to move (e.g. "the bulldozer model"). In addition, the "fourth dimension" (i.e. time) was requested, so that the movement of the heavy machinery could be planned further. This clearly indicates that the test subjects saw the VR system not as a toy but as a tool to ease their work. The general reception was very positive. Examination of the structures from the inside was found entertaining and informative. Usability was feasible, because all the subjects learned to move and use the system independently in only 5 min. Even though most of the users started with fumbling, some of the test subjects immediately began to use the system as a planning tool and to discuss the structures with others, as well as to plan future steps of the project with the tool.

4.2 Test Two in Helsinki

The second testing was done with a building engineering company. The testing was done by introducing five test subjects to three different systems (BuildingVR, HoloLensAR and IndoorAR), analysing their use and then interviewing them about the MR solutions. One of the test subjects was under 24 years of age, two were between 25 and 34 and two from the age group 35 to 49. Three of them had bachelor's level education, one had master's level, and one a high school education. All of them had used VR or AR before, described themselves as proficient in using computers and used 3D software daily. The target group was more heterogeneous by sex and age but smaller than the group in Savonlinna. All the test subjects were selected because they had expressed interest towards these new solutions. All the test subjects were very positive and enthusiastic about the new solutions, but according to them some designers do not share this initial interest in VR, and thus the results are not fully generalizable. While their 3D models on tablets are already used on construction sites by their customers and they have tested VR in different solutions, AR was something they viewed with great interest. Many of the test subjects enjoyed using VR while standing because it was easier and more natural for them to move about inside the model. All test subjects learned to use the systems very quickly. Yet they were somewhat certain that if the VR-glasses were a tool for everyday use, sitting down could be an easier and more relaxing way


to use the system. Standing while using VR also eased the understanding of scale, and thus the need for objects indicating sizes was not as great; but if people used the system mostly sitting down, the scale problem might still occur. Movement in the VR system was found somewhat awkward until the 'teleportation' feature was discovered. Compared to the Savonlinna study, the users were more keen to "jump around" rather than to fly in the virtual world. It also seems that users prefer the input system they are more accustomed to: in this study the keyboard-mouse interface was requested by all the test subjects with more history in computer (non-console) video games. All the users used the systems instantly, and the minor hardships seen in the Savonlinna study did not occur in this test. This may be because the education and work history differ between construction site engineers and design engineers working indoors. The AR solutions (which were not in use in Savonlinna) were easier to use. The MS HoloLens AR gave a bigger wow-effect with excellent tracking. The hand gestures were found hard, but the "clicker" controller clearly improved usability. Test subjects were keen to "wander around" the corridors to get a proper picture of the 3D model inside the building. Moreover, the imaging of the heating, piping, and air conditioning built into the walls was, according to the users, a fine tool for understanding the building and planning the tasks ahead. AR was seen as a more useful tool than VR for the company. For both VR and AR systems the requirement for accurate information was emphasized: a system with inaccurate or outdated information was described as "unnecessary". A requirement for the system to co-operate with other systems also arose from the discussions. Moreover, the test subjects were keen to have these systems not for single-person use, but for collaboration where several users could simultaneously and in real time (and possibly from different physical locations) co-operate with the models in virtual or augmented space. The general reception was again very positive. The test subjects were confident that the MR solutions will be an improvement to their daily work but were not sure about neighbouring fields. Almost no pondering about the "bigger picture" was done by the test subjects, except for one supervisor who saw that these solutions could benefit all the possible user groups. The most important lesson learned was that good tracking is essential for AR to be useful in work and thus lessens the overall user resistance against the system.

4.3 Test Three in Southampton

The third testing session was conducted at a classification society in Southampton. A total of eight test subjects were interviewed. The interviewees were selected beforehand to ensure a maximally diverse interview group. All test subjects had completed either further or higher education; one test subject was from the age group 25–34, three were aged between 50 and 64, and four were between 35 and 49 years of age. Two of the test subjects were female. Six of the eight test subjects experienced VR for the first time, while every subject experienced AR for the first time.


Each interviewee tested IndoorAR, HoloLensAR, and slightly modified versions of ShipVR and BridgeVR. The interviewees used the applications for roughly thirty minutes just before the interviews. After trying out the demonstrations, six out of the eight participants felt that the MR demonstrations made perceiving environments easier and more natural than common CAD tools. Two participants felt that their expertise with existing CAD tools meant that MR applications could not immediately offer improved spatial sensing; both of them were CAD specialists. Four out of eight participants did not feel nauseous or dizzy in the slightest, while the four remaining participants felt strange for a brief moment but got used to it fairly quickly. Six participants said that movement in the VR demonstrations felt natural and could be learned nearly immediately; the remaining two still thought that movement was relatively easy and could be learned fairly fast. Most observations from the test event in Southampton were in line with the findings from the previous events, as similar problems, comments and feedback were observed. The largest difference between this event and the others was that, even though all test subjects were enthusiastic about the potential of mixed reality applications, only some of them could think of direct application areas within their own field. This could be a direct result of the fact that the test group was very diverse and included personnel outside the planned target group of the demonstration applications.

5 Discussion

During the first two tests the interview contained a part where the test subjects were asked to describe potential areas of use for the system. Their personal views on the usefulness of MR are presented in Table 1.

Table 1. "What kind of image you have of the usefulness of mixed reality in the following positions?"

| Position | Very useful | Somewhat useful | Perhaps not so useful | Equal to the traditional methods | Clearly worse than traditional methods |
| Visualisation tool for customers and managers | 12 | 5 | - | 1 | - |
| A visualisation tool for designers and site managers | 7 | 9 | 1 | 1 | - |
| A tool for inspection in different levels of work process | 6 | 5 | 4 | 2 | - |
| Design tool for building and repairing | 5 | 8 | 3 | 1 | - |
| A tool for improving and altering the plans and design | 5 | 8 | 3 | 1 | - |
| What else: designing work process (1), marketing (1) | 2 | - | - | - | - |


The aforementioned opinions reflect the positions of the interviewees in their tasks. Most of the test subjects agreed that the MR solutions should be used as a visualisation tool for customers and managers. During the interviews, though, it became obvious that the terms used in this chart did not always mean the same things to different people and required some discussion and interpretation; the chart was therefore not included in the following studies. Some users experienced nausea while using the VR glasses, but the great majority were unaffected by it. The most significant pattern, present in the event in Southampton as well as in the other events, was that architects and designers who use CAD tools daily think that VR does not offer much to them. This can be compared to "reading the Matrix flow" from the movie The Matrix (1999). A continuous pattern was that roughly half of the interviewees considered tablet-based AR to be a thing of the past, while the rest thought it had the greatest potential due to tablets being a widespread technology. Ergo, tablets divided the views of the test subjects the most. Across all of the systems and the various backgrounds of the test subjects, the desire to implement more MR solutions in their work was common. Due to the prototype and demonstration level of these solutions, no actual work could be tested within the testing period, and thus it is impossible to properly understand the effect on the individuals and their work (cf. [19–21]). Yet it seems that these solutions have the possibility to benefit the work process in various industrial and maritime tasks. As this research was conducted by members of the same research group that produced the system, the results are obviously not as convincing as those obtained by a third party. Moreover, the interviewees knew they were speaking to the "developers" and thus may have softened their tone and opinions out of politeness. Nonetheless, the results seem convincing enough that, even when toned down, they give a clear view of how these kinds of systems should be developed further.

6 Conclusions

The user testing gave convincing results that industry workers, designers and managers should not show large-scale user resistance if these systems were introduced. The test setting was limited but produced quite consistent results from which some indications can be deduced. Overall, according to the test users, the potential for using VR and AR systems in industry seems good. Most of the test subjects see potential in the technology and are eager to implement the solutions in their work. The AR and VR solutions are easy to learn and use, and thus suit users both new to and experienced with these solutions. If these systems are taken into use, it is important to remember the requirements, because a non-functional or inaccurate system, or a system with outdated information, is seen as unnecessary. The interesting notion that the VR and AR glasses should be focused on "their" specific field (but not necessarily on others) gives a hint that, if everyone feels this way, perhaps they are suitable for many of those fields.


It must be noted that some of the 3D designers in Southampton did not wish AR/VR to be implemented in their work because they consider the traditional design tools more intuitive. Thus, it is possible that the AR/VR solutions do not benefit CAD design as much as other fields. Hence, these results support the hypothesis that these systems contribute to the trade by aiding the work processes of management, design, implementation and marketing. In addition, the systems seem to support the maintenance processes. In contradiction to the hypothesis, however, the CAD designers, as mentioned above, are not as enthusiastic about implementing these systems in their workflow. Thereby, we suggest focusing more on marketing, management, engineering and construction, giving these tools time to develop to meet the needs of the CAD design sector. It is important to notice that these prototypes are among the first of their kind and must be treated as such. As they seem to produce a noticeable wow-effect, they also indicate that these systems could indeed promote a wide range of work processes. As the technology and practices in MR improve, there seems to be wide potential in implementing these kinds of solutions as a part of the normal workflow in the industrial sector. While the solutions are still expensive in both software and hardware, the development and generalisation of the technologies will lower the costs. While the current solutions are a combination of custom software and library components, the tracking solutions provided for MS HoloLens indicate that the requirement for custom software will soon diminish, improving the cost-benefit ratio of these systems. Most of all, it is important to notice that when creating these systems, there are many different fields of use for one single solution, and the expenses can be divided between construction, engineering, management and marketing, thus generating more value with the same money.

Acknowledgment. The research has been carried out during the MARIN2 project (Mobile Mixed Reality Applications for Professional Use) funded by Tekes (The Finnish Funding Agency for Innovation) in collaboration with partners Defour, Destia, Granlund, Infrakit, Integration House, Lloyd's Register, Nextfour Group, Meyer Turku, BuildingSMART Finland, Machine Technology Center Turku and Turku Science Park. We would especially like to thank all those who participated in this study.

References
1. Ranasinghe, N., Karunanayaka, K., Cheok, A.D., Fernando, O.N.N., Nii, H., Gopalakrishnakone, P.: Digital taste and smell communication. In: Proceedings of the 6th International Conference on Body Area Networks (BodyNets 2011), pp. 78–84. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), Brussels, Belgium (2011). http://dl.acm.org/citation.cfm?id=2318795
2. Colwell, C., Petrie, H., Kornbrot, D., Hardwick, A., Furner, S.: Haptic virtual reality for blind computer users. In: Proceedings of the Third International ACM Conference on Assistive Technologies (Assets 1998), pp. 92–99. ACM, New York (1998). http://dx.doi.org/10.1145/274497.274515


3. Ischer, M., Baron, N., Mermoud, C., Cayeux, I., Porcherot, C., Sander, D., Delplanque, S.: How incorporation of scents could enhance immersive virtual experiences. Front. Psychol. 5, 736 (2014). https://doi.org/10.3389/fpsyg.2014.00736
4. Bujak, K.R., Radu, I., Catrambone, R., MacIntyre, B., Zheng, R., Golubskic, G.: A psychological perspective on augmented reality in the mathematics classroom. Comput. Educ. 68, 536–544 (2013)
5. Di Serio, Á., Ibáñez, M.B., Kloos, C.D.: Impact of an augmented reality system on students' motivation for a visual art course. Comput. Educ. 68, 586–596 (2013)
6. Seppälä, K., Heimo, O.I., Pääkylä, J., Latvala, J., Helle, S., Härkänen, L., Jokela, S., Järvenpää, L., Saukko, F., Viinikkala, L., Mäkilä, T., Lehtonen, T.: Examining user experience in an augmented reality adventure game: case Luostarinmäki handicrafts museum. In: 12th IFIP TC9 Human Choice and Computers Conference, 7th–9th September 2016 (2016)
7. Heimo, O.I., Kimppa, K.K., Yli-Seppälä, L., Viinikkala, L., Korkalainen, T., Mäkilä, T., Lehtonen, T.: Ethical problems in creating historically representative mixed reality make-belief. In: CEPE/ETHICOMP 2017 Values in Emerging Science and Technology, University of Turin, Italy, 5–8 June 2017 (2017)
8. Henrysson, A., Ollila, M.: UMAR: ubiquitous mobile augmented reality. In: MUM 2004, Proceedings of the 3rd International Conference on Mobile and Ubiquitous Multimedia, pp. 41–45. ACM, New York (2004)
9. Sakari, L., Helle, S., Korhonen, S., Säntti, T., Heimo, O.I., Forsman, M., Taskinen, M., Lehtonen, T.: Virtual and augmented reality solutions to industrial applications. In: Proceedings of COMPIT 2017 International Conference on Computer Applications and Information Technology in the Maritime Industries, 15–17 May 2017, Cardiff, UK (2017)
10. Milgram, P., Takemura, H., Utsumi, A., Kishino, F.: Augmented reality: a class of displays on the reality-virtuality continuum. In: Proceedings of Telemanipulator and Telepresence Technologies, vol. 2351 (1994)
11. HTC: Vive. https://www.vive.com/. Accessed 13 Mar 2017
12. Microsoft: Microsoft HoloLens. https://www.microsoft.com/microsoft-hololens/en-us. Accessed 13 Mar 2017
13. Oculus: Rift. https://www.oculus.com/rift/. Accessed 13 Mar 2017
14. Brother: AiRScouter. https://www.brother-usa.com/AiRScouter/. Accessed 13 May 2017
15. Epson: Moverio Smart Eyewear. https://epson.com/moverio-augmented-reality. Accessed 13 Mar 2017
16. Vuzix: Vuzix - View the Future. https://www.vuzix.com/. Accessed 13 Mar 2017
17. Google: Google Cardboard | Experience virtual reality in a simple, fun and affordable way. https://vr.google.com/cardboard/. Accessed 13 Mar 2017
18. Samsung: Gear VR - Powered by Oculus. http://www.samsung.com/global/galaxy/gear-vr/. Accessed 13 Mar 2017
19. Nurminen, M.I.: People or Computers: Three Ways of Looking at Information Systems. Studentlitteratur, Cartwell Brat Ltd., Lund (1986)
20. Heimo, O.I., Kimppa, K.K., Nurminen, M.I.: Ethics and the inseparability postulate. In: ETHIComp 2014, Pierre & Marie Curie University, Paris, France, 25th–27th July 2014 (2014)
21. Heimo, O.I.: Icarus, or the idea toward efficient, economical, and ethical acquirement of critical governmental information systems. Ph.D. thesis, Turku School of Economics, University of Turku (2018)

Person Detection in Thermal Videos Using YOLO

Marina Ivasic-Kos, Mate Kristo, and Miran Pobar

Department of Informatics, University of Rijeka, 51000 Rijeka, Croatia
{marinai,mpobar}@uniri.hr, [email protected]

Abstract. In this paper, the task of automatic person detection in thermal images using convolutional neural network-based models originally intended for detection in RGB images is investigated. The performance of the standard YOLOv3 model is compared with a custom trained model on a dataset of thermal images extracted from videos recorded at night in clear weather, rain and fog, at different ranges and with different types of movement – running, walking and sneaking. The experiments show excellent results in terms of average precision for all tested scenarios, and a significant improvement of performance for person detection in thermal imaging with a modest training set.

Keywords: Thermal imaging · Object detector · Convolutional neural networks · YOLO · Person detection

1 Introduction

The purpose of object detection is to classify objects in images and mark their exact positions. Many successful machine learning algorithms have been developed for detecting objects such as the human face [1] or the human figure [2] in RGB images. Thermal cameras are now ubiquitous in video surveillance systems that take care of the safety of people and objects in urban areas, at state borders, in guarded areas, etc. Because of global terrorist threats and illegal migration, concerns about the safety of innocent people have intensified. To prevent unwanted events and protect people and their property, investment in security systems has reached record levels, trying to utilize all available technological achievements and develop sophisticated systems. Thermal cameras are important for surveillance and security because they can be used in unfavorable weather conditions when ordinary RGB cameras cannot be used or give poor results, such as at night and in full darkness (Fig. 1) or in rain and fog. Today, the best object detection results in RGB images are achieved by models based on convolutional neural networks (CNN). The popularity of convolutional neural networks and deep learning began with the great success of AlexNet in the ImageNet Challenge image recognition task in 2012 [3]. Since then, many successful CNN architectures for object detection have been developed, such as R-CNN [4], SSD [5], Mask R-CNN [6], R-FCN [7] and YOLO [8]. Several methods have been proposed specifically for person detection in thermal videos. Papers [9, 10] use the HOG features that are commonly used in the task of pedestrian detection in RGB images as the fundamental feature for detecting persons in thermal images.


Fig. 1. Night vision vs. thermal imaging showing that tree camouflage cannot hide a person from a thermal camera (https://www.youtube.com/watch?v=rAvnMYqj2c0).

In [9] the HOG features are extracted from previously detected regions of interest, and an AdaBoost classifier is used to make the final decision. The method presented in [10], in addition to region proposal and classification based on HOG, tracks the detected persons across several frames using template matching to suppress false positive detections. Background subtraction methods are used in [11] as a first stage to identify regions with movement in images that potentially contain moving pedestrians. These regions are then used as regions of interest from which features are extracted and used for classification with the AdaBoost classifier in the second stage. The goal of this paper is to investigate whether CNN methods can be successful for the task of detecting people in images and videos obtained with a thermal camera. Due to the differences between visual and thermal image features, the aim is to explore how well deep learning methods that are successful for object detection in optical images perform on thermal imaging. For the detection task, the YOLOv3 network [12] is used, as it achieves state-of-the-art object detection results in RGB images for different detection tasks [13, 14]. In the next section, the basic information about thermal imagery is provided. The detection pipeline of the YOLO object detector is given in Sect. 3. The dataset and object detection experiments are described in Sect. 4. The evaluation results are presented and discussed in Sect. 5. The work ends with the conclusion and directions for future research.
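To make the classical two-stage approach described above more concrete, the following sketch outlines a generic pipeline of that kind in Python with OpenCV: background subtraction proposes regions with movement, and a HOG-based person detector confirms them. It is only an illustration under assumed parameters, not the exact method of [9–11]; in particular, OpenCV's default HOG+SVM people detector stands in for the HOG+AdaBoost classifiers of the cited papers, and the input file name is hypothetical.

import cv2
import numpy as np

cap = cv2.VideoCapture("thermal_video.mp4")  # hypothetical input video
bg_sub = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=25)
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Stage 1: regions with movement become regions of interest (ROIs).
    mask = bg_sub.apply(frame)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w * h < 400:  # ignore tiny blobs (threshold is illustrative)
            continue
        roi = frame[y:y + h, x:x + w]
        # Stage 2: confirm persons inside the ROI with a HOG-based detector.
        rects, weights = hog.detectMultiScale(roi, winStride=(4, 4))
        for (rx, ry, rw, rh), score in zip(rects, np.ravel(weights)):
            if score > 0.5:
                cv2.rectangle(frame, (x + rx, y + ry), (x + rx + rw, y + ry + rh), (0, 255, 0), 2)
cap.release()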

2 Thermal Imagery

Thermal cameras record the heat emitted by the subject being monitored and form an image from infrared (IR) radiation. IR radiation is the electromagnetic radiation emitted according to the relative heat generated or reflected by an object; therefore, IR imaging is termed thermal imaging and an IR image a thermogram. IR wavelengths are longer than those of visible light (roughly 400–700 nm), extending from about 700 nm up to 1 mm (Fig. 2), so IR is invisible to humans [15].


Fig. 2. Wavelengths of light in nm (https://www.scienceoflight.org/ir-light/).

As thermal sensors form an image of the environment or object solely according to the detected amount of heat energy of the recorded object, they are, unlike visible-light sensors, insensitive to lighting conditions and changes in light, and robust to different weather conditions and a wide range of light variations [16, 17]. However, thermal sensors provide much less detail than visible-light cameras, because instead of the colour information available in the visible spectrum, only temperature ranges are available in the thermograms (Fig. 3).

Fig. 3. IR sensors provide much fewer details than the optical sensor of visible light (https://www.wildlabs.net/resources/case-studies/hwc-tech-challenge-update-comparing-thermopile-and-microbolometer-thermal).

Unlike optical sensors, IR sensors are very sensitive to changes in ambient temperature. The heat that an object emits is not constant but depends on the internal state of the object and the ambient temperature (Fig. 4). For example, human body temperature comprises the core temperature (the temperature of the abdominal, thoracic and cranial cavities) and the skin temperature (skin and subcutaneous tissue). The thermoregulatory system of the human body maintains a constant core temperature (about 37 °C during rest) by establishing a balance between the production of metabolic heat and the release of heat into the environment. During running or intense exercise, however, metabolic heat production can increase 10 to 20 times in relation to heat production at rest. Skin temperature is also strongly affected by blood flow to the skin and by environmental conditions [18]. The color on the temperature scale does not always correspond to the same temperature in all images; instead, the lowest color (dark blue) corresponds to the coolest part of the image and the brightest (white) to the hottest part. For example, in Fig. 4 (left) the white color corresponds to 26 °C and in Fig. 4 (right) to 36 °C.
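The per-image nature of this colour scale can be made concrete with a short sketch: each raw thermal frame is stretched to its own minimum-maximum range before a colour map is applied, so the same colour corresponds to different absolute temperatures in different frames. This is a generic illustration, not the FLIR toolchain used later in the paper, and the file names are hypothetical.

import cv2
import numpy as np

raw = np.load("thermal_frame.npy")  # hypothetical raw radiometric frame (e.g. 16-bit counts)
# Per-frame min-max stretch: the coolest pixel becomes 0 and the hottest 255,
# regardless of the absolute temperatures present in this particular frame.
frame_8bit = cv2.normalize(raw, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
colored = cv2.applyColorMap(frame_8bit, cv2.COLORMAP_JET)  # dark blue = coolest, red = hottest
cv2.imwrite("thermal_frame_vis.png", colored)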


Fig. 4. IR sensors are very sensitive to changes in ambient temperature (https://www.wildlabs.net/resources/case-studies/hwc-tech-challenge-update-comparing-thermopile-and-microbolometer-thermal).

3 YOLO Detector

The original YOLO object detector (YOLOv1) [8] uses a convolutional neural network to simultaneously predict the bounding boxes of multiple objects in an image and associate them with the confidence of the class they belong to. The YOLOv1 architecture consists of 24 convolutional layers and two fully connected layers. The convolutional layers perform feature extraction, while the fully connected layers predict the locations of bounding boxes and their probabilities. The input image is first divided into an S × S grid. Two bounding boxes and corresponding class confidences are associated with each grid cell, so at most two objects can be detected within a cell. If an object occupies more than one cell, the cell containing the object's center is selected as the holder of the prediction for that object. A bounding box that does not hold any object is assigned a confidence value of zero when training the model. The confidence value of a bounding box that contains an object or intersects an object corresponds to the intersection-over-union (IoU) score of the bounding box and the ground truth box. In version 2 of the YOLO detector (YOLOv2) [19], five of the convolutional layers of the original model are replaced with max-pooling layers, and the way in which the bounding box proposals are generated was changed. Instead of predicting the coordinates of the bounding box for each cell, predefined anchor boxes are used. The anchor boxes are defined on a training set using k-means clustering of ground truth bounding boxes, where box translations are relative to a grid cell. In the third version of the YOLO detector (YOLOv3) [12, 20], instead of the 19-layer feature extraction network, a much deeper network is used, consisting of 53 layers of 3 × 3 and 1 × 1 filters with skip connections and without fully connected layers (Fig. 5). Instead of pooling layers, a convolutional layer with stride 2 is used to downsample the feature map and pass size-invariant features forward. The bounding box prediction was also refined, using features at three different scales to make three sets of box predictions for each location. The classification method has also been changed to multilabel classification: an object may, in that case, belong to more than one class simultaneously, which is achieved by replacing the soft-max with logistic regression.
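The anchor-box selection mentioned for YOLOv2 can be sketched as follows: k-means clustering of ground-truth box widths and heights, with 1 - IoU (boxes compared as if centred at the same point) as the distance. This is a minimal illustration with made-up box sizes, not the clustering code of the YOLO authors.

import numpy as np

def iou_wh(boxes, anchors):
    """IoU between boxes and anchors assuming both are centred at the origin,
    so only widths and heights matter (shapes: [N, 2] and [K, 2])."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
             np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=5, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        # Assign each box to the anchor with the highest IoU (lowest 1 - IoU distance).
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else anchors[i] for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors

# Made-up (width, height) pairs of ground-truth person boxes, in pixels.
gt_wh = np.array([[12, 30], [10, 26], [30, 80], [28, 75], [60, 150], [8, 20]], float)
print(kmeans_anchors(gt_wh, k=3))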


Fig. 5. YoloV3 network architecture (https://www.cyberailab.com/home/a-closer-look-at-yolov3).

4 Experiment Setup

In the experiment, the effectiveness of the YOLOv3 detector in surveillance applications using IR cameras is examined. The task is to detect persons in thermal images collected in different weather conditions. As the baseline model, the original YOLOv3 network (referred to as bY) trained on the COCO RGB image dataset [21] is used, which has proved successful in detecting different object classes [22] in RGB images. Although thermal images differ significantly in color and detail from RGB images, reasonably good results are expected with the bY model on IR imaging, for two reasons. First, it can be assumed that some convolutional layers trained on RGB images extract shape features that are similar in thermal imaging, so those features trained on RGB images will also be useful for thermal imaging. Second, the experiments in [8] showed that the effectiveness of the YOLO detector, when applied to the task of person detection in art images, was degraded less than that of other detectors, even though art images were not used for training the model.
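As an illustration of what evaluating the bY baseline could look like in practice, the sketch below runs a COCO-pretrained YOLOv3 through OpenCV's DNN module on a single thermal frame. This is not the authors' pipeline: the configuration, weights and image file names are hypothetical, and replicating the single thermal channel into three channels is an assumption made here so that the RGB-trained network accepts the input.

import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")  # hypothetical paths
frame = cv2.imread("thermal_frame.png", cv2.IMREAD_GRAYSCALE)     # hypothetical frame
frame = cv2.cvtColor(frame, cv2.COLOR_GRAY2BGR)                   # assumption: 1 channel -> 3 channels
h, w = frame.shape[:2]

blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())

boxes, scores = [], []
for out in outputs:
    for det in out:                       # det = [cx, cy, bw, bh, objectness, 80 class scores]
        class_scores = det[5:]
        if np.argmax(class_scores) == 0:  # class index 0 = "person" in COCO
            conf = float(det[4] * class_scores[0])
            if conf > 0.25:
                cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                scores.append(conf)

# Non-maximum suppression keeps the strongest box among overlapping candidates.
indices = cv2.dnn.NMSBoxes(boxes, scores, 0.25, 0.45)
for i in np.array(indices).flatten():
    x, y, bw, bh = boxes[int(i)]
    cv2.rectangle(frame, (x, y), (x + bw, y + bh), (0, 0, 255), 2)
cv2.imwrite("detections.png", frame)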


The original model was further trained for the person class on thermal images from a custom data set (the resulting model is called tY), and the results of both models are compared on thermal images taken in different weather conditions.

A. Dataset

The data used in the experiment were collected by recording people during the night in different weather conditions and at different distances from the camera. A FLIR ThermalCAM P10 camera with an uncooled focal plane array (FPA) bolometer covering the spectral range between 7.5 and 13 µm (LWIR) was used. A FLIR P/B series telephoto lens with a 7° × 5.3° field of view and 3.5× magnification was used, along with the standard lens with a 24° × 18°/0.3 m field of view supplied as basic equipment. Five men and two women were recorded in the winter period (in February 2017), moving in a normal and a hunched position, either at a normal walking speed or running, in several lens and range configurations [23]. The recording was done in different weather conditions, with appropriate settings. In clear weather, the distance of people from the camera was 110 m (baseline) or 165 m. In heavy fog, with a minimum visibility of up to 5 m, the shooting was done with people moving at less than 30 m from the camera and at 50 m from the camera. In the fog conditions, it was not possible to use standard lenses or to record at larger distances, so only telephoto lenses were used. In heavy rain, people were moving at 30 m, 70 m, 110 m, 140 m, 170 m, 180 m and 215 m from the camera. After the videos had been captured, individual frames were extracted from the video to create a data set. About 15,000 images in the set were shot with a telephoto lens in clear, foggy or rainy weather conditions, and about 6,000 images were shot with a standard lens in clear weather [24]. The images were manually annotated using the VGG Image Annotator (VIA) [25]. In this experiment, about 1,000 images for each weather condition were used for training.

B. Evaluation measure

The mean average precision (mAP) criterion like the one used in the PASCAL VOC 2012 competition [26] is used to evaluate the performance of the models. To get the mAP value, the mean of the Average Precision (AP) values of all classes is calculated, but in this experiment only the Person class is considered. The detection results are compared with the ground truth. A detection is counted as a true positive if the intersection-over-union (IoU) score of the detected bounding box and the corresponding ground truth bounding box is greater than or equal to 50%. An example of positive and negative object detection with respect to the intersection-over-union score in the case of person detection is shown in Fig. 6.
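For reference, the IoU criterion can be written as a few lines of Python. This helper is a straightforward sketch (boxes given as (x_min, y_min, x_max, y_max) corner coordinates), not code from the paper.

def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, about 0.14: below the 0.5 TP threshold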


Fig. 6. Negative (left) and positive (right) representation of IoU criteria

When the same object is detected multiple times, only one detection is counted as a true positive.
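Putting the two rules together (IoU of at least 0.5 and duplicates counted only once), the AP computation described in the next section can be sketched as follows: detections are sorted by confidence, greedily matched against not-yet-matched ground-truth boxes, and AP is approximated as the area under the resulting precision-recall curve. It reuses the iou() helper from the previous sketch; the data layout is an assumption, not the authors' evaluation code.

import numpy as np

def average_precision(detections, ground_truth, iou_thr=0.5):
    """detections: list of (image_id, confidence, box); ground_truth: dict image_id -> list of boxes.
    Assumes at least one ground-truth box exists."""
    n_gt = sum(len(b) for b in ground_truth.values())
    matched = {img: [False] * len(b) for img, b in ground_truth.items()}
    tps, fps = [], []
    for img, conf, box in sorted(detections, key=lambda d: -d[1]):
        best_iou, best_j = 0.0, -1
        for j, gt_box in enumerate(ground_truth.get(img, [])):
            o = iou(box, gt_box)
            if o > best_iou:
                best_iou, best_j = o, j
        if best_iou >= iou_thr and not matched[img][best_j]:
            matched[img][best_j] = True
            tps.append(1); fps.append(0)
        else:  # low overlap, or a duplicate detection of an already matched person
            tps.append(0); fps.append(1)
    tp, fp = np.cumsum(tps), np.cumsum(fps)
    recall = tp / n_gt
    precision = tp / np.maximum(tp + fp, 1)
    # Make precision monotonically decreasing, then take the area under the curve.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    return float(np.trapz(precision, recall))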

5 Evaluation Results and Discussion

A precision versus recall curve for the desired class (here Person) is produced by varying the confidence threshold of the detector. The AP score is then the area underneath the precision versus recall curve. Figure 7 (left) presents the AP score for the original YOLO model, bY, which was not trained on thermal images, while Fig. 7 (right) corresponds to the AP score for the model tY, which was additionally trained for the class Person on the custom dataset. Additional training improved the baseline results: the AP score of 97.93% achieved with the model tY significantly exceeds the AP score of 19.63% achieved by bY. For example, the bY model achieves 100% precision at a recall of 15.5%, while the model tY achieves the same precision at a much higher recall of approximately 50%, meaning that the tY model detects many more of the people present in the images with the same precision.

Fig. 7. AP score and precision versus recall curve for baseline YOLO model, bY (left), and for custom trained YOLO model, tY (right)


Fig. 8. Results of person detection on images recorded with a normal lens in clear weather conditions, distance 110–160 m, using the bY model (a) as opposed to the tY model (b).

Figure 8 shows an example (image no. 6597) of the detection results of both the bY and tY models in clear weather, recorded with a standard lens. The tY model detected all three persons in the image (true positive detections, TP), even though they are about 150 m away from the camera and take up only a few pixels. The bY model managed to detect only one of the three persons present in the image (two false negative detections, FN). This is an unexpectedly good result, because the silhouettes of the persons are tiny and the temperature difference between persons and vegetation is not large, so it is not easy to notice people at that distance even for a security guard. Figures 9, 10 and 11 show the AP scores for the different weather conditions. The tY model achieves an AP score of 97.85% for clear and foggy weather, while for rain the AP score is even better, at 98.8%. Looking at the results fixed at 100% precision, the tY model achieves a recall score of 35% in clear weather, 50% in foggy weather and 75% in rain.

Fig. 9. Precision versus recall curve for clear weather images


Fig. 10. Precision versus recall curve for images taken in foggy conditions.

Fig. 11. Precision versus recall curve for images taken in the rain.

Figures 12, 13, 14 and 15 show a few examples of the detection results in thermal images for the tY model. The examples show that the model provides good detection results for people at different shooting distances, with different types of motion including normal walking, running and sneaking, and in all tested weather conditions. It is interesting that the model managed to detect people at large distances regardless of the mode of movement, even when the thermal difference between a person and a tree trunk was low. The model also managed to detect people when they stood still (Fig. 12(a)). Although there are similarities between persons and tree trunks in their contours and temperature curves, there are no false positive detections in the image.


Fig. 12. Results of person detection on images recorded with telephoto lens in clear weather condition, 110 m distance, (a) normal walk (b) running.

In the case of fog, the dispersion of the temperature is greater than in the other observed weather conditions. Despite this, the model achieved positive detection of persons both near the camera (Fig. 13(b)) and at large distances from it (Fig. 13(a)).

Fig. 13. Results of person detection (normal walk) on images recorded in foggy weather with a normal lens, (a) 50 m, (b) sem-cat-rules => con-rules => syn-rules => morph-rules => final form

5 Although SLIME worked flawlessly during the entire experiment, it did crash once due to a faulty interaction with Swank. It was a major problem that could not be solved using the graphical interface.
6 The Babel software can be downloaded for free at: https://www.fcg-net.org/download/ and http://emergent-languages.org/Babel2/.
7 The algorithm is described in detail in Steels [16, pp. 12–13].


– Parsing: morph-rules => lex-stem-rules => sem-cat-rules => syn-rules => con-rules => final meaning The FCG formalism (bearing great resemblance to Lisp; cf. https://www. fcg-net.org/tech-doc/ for technical details and taxonomies) is logic-based: for instance, variables preceded by a question mark are assumed to be logical variables and may get bound when matching (see Steels in [16, p. 8]). FCG may be, thus, perceived as an echo of the symbolic approach to Artificial Intelligence (cf. Flasi´ nski in [4, pp. 16–25]). Both the semantic and syntactic structures consist of slots.8 As regards the semantic aspect, referent and meaning are the major slots. In the case of the syntactic structure it is the form and the syntactic category. Formally, each rule is a 4-tuple (Steels in [16, p. 11]): def- Where is used to assess the communicative success in multi-agent cognitive experiments. In this paper this parameter will be ignored due to its redundancy. Nevertheless, it ought to be added that measuring the strength of a rule is crucial in studying the phenomenon of language emergence. Thanks to this parameter, agents can update and assess particular utterances, thus tuning the language competence to their linguistic environment. The diagram below (Fig. 1) presents an overview of particular rules employed in FCG:

Fig. 1. The interplay of rules in FCG (reproduced from Steels in [16, p. 6]).

8 See Steels in [16, p. 11] for an in-depth discussion of slots.

4 Setup and Analysis

In order to launch Babel 2 (having installed all the packages successfully), a .lisp file needs to be created.9 The following command (see Fig. 2; in my case all commands are executed in the macOS Terminal) is used to launch Babel 2 (provided that SLIME has been set up properly, it should work automatically):

open -a emacs .ccl-init.lisp

Fig. 2. Launching of Babel 2 in Emacs.

Since FCG comes as a package,10 it has to be loaded separately by means of the following commands:

(asdf:operate 'asdf:load-op :fcg)
(in-package :fcg)

To my mind, one of the most important features of the FCG package is the browser tracker, which enables the researcher to observe the results in a convenient graphical form via the web interface. In order to activate the tracker, the following command has to be executed:

(activate-monitor trace-fcg)

9 It is advised that the launch file be named .ccl-init.lisp and placed in the root user's directory.
10 Another package, IRL (Incremental Recruitment Language), is also employed in studies on language grounding in robots. However, FCG is better suited for the purposes of my paper.


Afterwards, the user’s browser (preferably Safari or Firefox) needs to be opened at http://localhost:8000/. Successful activation is indicated by the following screen (Fig. 3):

Fig. 3. Launching of the FCG package and activation of the browser tracker.

The so-called “initial transient structures” need to be set for parsing, production, and construction (respectively): (defparameter *transient-structure->* nil) (defparameter *transient-structure*) (show *transient-structure