6th EAI International Conference on Robotic Sensor Networks (EAI/Springer Innovations in Communication and Computing) 3031338251, 9783031338250

This book presents the proceedings of the 6th EAI International Conference on Robotic Sensor Networks 2022 (EAI ROSENET 2022).


English Pages 122 [117] Year 2023


Table of contents :
Conference Organization
Steering Committee
Organizing Committee
General Chair
General Co-chairs
TPC Chair and Co-chair
Web Chairs
Publicity and Social Media Chairs
Workshops Chairs
Sponsorship & Exhibits Chairs
Publications Chairs
Panels Chairs
Tutorials Chairs
Demos Chairs
Posters and PhD Track Chair
Local Chairs
Technical Program Committee
Technical Program Committee Chairs
Technical Program Committee Members
Preface
Contents
About the Editors
An Intelligent Learning System Based on Robotino Mobile Robot Platform
1 Introduction
2 System Description and Analysis
2.1 Drive Subsystem
2.2 Sensor Subsystem
2.3 I/O Interface
2.4 Application Programming Interface
3 New Learning System with Robotino
3.1 Graphic Programming Interface
3.2 Application Programming Interface
4 Conclusion
References
A Neocognitron Based on Multi-objective Optimization for Few-Shot Learning
1 Introduction
2 Related Work
3 Network Architecture
4 Multi-Objective Optimization
5 Object Localization with the Neocognitron
6 Conclusion
References
Milk Temperature Control System of Calf Feeding Robot Based on Fuzzy PID
1 First Section
2 Research on Fuzzy PID Control Algorithm of Temperature Control System
2.1 Control Object Modeling
2.2 Design of PID Controller
2.3 Design of Fuzzy PID Controller
2.3.1 Controller Design of Membership Function
2.3.2 Controller Control Rule Design
3 Analysis of Simulation Results
4 Conclusion
Bluff: A Multi-Robot Dispersion Based on Hybrid Reciprocal Velocity Obstacles to Solve the Blind Man's Buff Problem
1 Introduction
2 Problem Formulation
3 Background Information
4 Proposed Implementation
5 Results and Simulations
5.1 Implementation Details
5.2 Solution Quality
6 Conclusion
References
Factors Influencing the Adoption of Robo Advisory Services: A Unified Theory of Acceptance and Use of Technology (UTAUT) Model Approach
1 Introduction
2 Literature Review and Hypotheses Formulation
2.1 Technology Readiness Index (TRI)
2.2 Technology Acceptance Model (TAM)
2.3 Perceived Usefulness
2.4 Innovativeness and Perceived Usefulness
2.5 Trust and Perceived Usefulness
2.6 Insecurity and Perceived Usefulness
2.7 Identification of Research Gap
3 Research Model
3.1 Research Methodology and Research Design
3.2 Data Analysis Method
4 Empirical Findings
4.1 Demographic Information
5 Conclusions and Discussions
5.1 Managerial Implications
5.2 Scope and Limitations
References
A Multi-region Feature Extraction and Fusion Strategy Based CNN-Attention Network for Facial Expression Recognition
1 Introduction
2 The Proposed Method
2.1 Overview
2.2 Shared Feature Extraction Module
2.3 Shared CNN
2.4 Structural Parameters of the Network
3 Experiment
3.1 Datasets
3.1.1 CK+ Dataset
3.1.2 JAFFE Dataset
3.1.3 RAF-DB Dataset
3.1.4 FER-2013 Dataset
3.2 Data Augmentation
3.3 Comparison with Other Works
3.4 Result Analysis
4 Conclusion
References
Real Time Surgical Instrument Object Detection Using YOLOv7
1 Introduction
2 Surgical Instrument Detection Algorithm
2.1 YOLO Algorithm
2.2 Network Structure of YOLOv7
3 Experimental Design and Implementation
3.1 Real Surgical Dataset
3.2 Evaluation Indicators
3.3 Experimental Environment
3.4 Experimental Results and Analysis
4 Conclusions
References
A Lightweight Blockchain Framework for Visual Homing and Navigation Robots
1 Introduction
2 State of the Art
3 Methodology and Framework Design
3.1 Motivating Example
3.2 Navigation Use Case
3.3 Panoramic State Update
4 Evaluation
4.1 Testbed
4.2 Performance Evaluation
4.3 Performance Discussion
4.3.1 Latency and Transmission Time Overhead
4.3.2 Scalability and Throughput
4.3.3 Denial of Service (DoS)
5 Conclusions
References
Index


EAI/Springer Innovations in Communication and Computing

Predrag S. Stanimirović Yudong Zhang Dunhui Xiao Xinwei Cao   Editors

6th EAI International Conference on Robotic Sensor Networks

EAI/Springer Innovations in Communication and Computing

Series Editor: Imrich Chlamtac, European Alliance for Innovation, Ghent, Belgium

The impact of information technologies is creating a new world that is not yet fully understood. The extent and speed of the economic, lifestyle and social changes already perceived in everyday life are hard to estimate without understanding the technological driving forces behind them. This series presents contributed volumes featuring the latest research and development in the various information engineering technologies that play a key role in this process. The range of topics, focusing primarily on communications and computing engineering, includes, but is not limited to, wireless networks; mobile communication; design and learning; gaming; interaction; e-health and pervasive healthcare; energy management; smart grids; internet of things; cognitive radio networks; computation; cloud computing; ubiquitous connectivity; and, more generally, smart living, smart cities, the Internet of Things and more. The series publishes a combination of expanded papers selected from hosted and sponsored European Alliance for Innovation (EAI) conferences that present cutting-edge, global research, as well as new perspectives on traditional related engineering fields. This content, complemented by open calls for contributions of book titles and individual chapters, maintains Springer's and EAI's high standards of academic excellence. The audience for the books consists of researchers, industry professionals, and advanced-level students, as well as practitioners in related fields of activity, including information and communication specialists, security experts, economists, urban planners, doctors, and, in general, representatives of all walks of life affected by and contributing to the information revolution.

Indexing: This series is indexed in Scopus, Ei Compendex, and zbMATH.

About EAI: EAI is a grassroots member organization initiated through cooperation between businesses, public, private and government organizations to address the global challenges of Europe's future competitiveness and link the European research community with its counterparts around the globe. EAI reaches out to hundreds of thousands of individual subscribers on all continents and collaborates with an institutional member base including Fortune 500 companies, government organizations, and educational institutions, to provide a free research and innovation platform. Through its open, free membership model, EAI promotes a new research and innovation culture based on collaboration, connectivity and recognition of excellence by community.

Predrag S. Stanimirović • Yudong Zhang • Dunhui Xiao • Xinwei Cao
Editors

6th EAI International Conference on Robotic Sensor Networks

Editors

Predrag S. Stanimirović, Faculty of Sciences and Mathematics, University of Niš, Niš, Serbia

Yudong Zhang, Department of Informatics, University of Leicester, Leicester, UK

Dunhui Xiao, School of Mathematical Sciences, Tongji University, Shanghai, China

Xinwei Cao, School of Business, Jiangnan University, Wuxi, China

ISSN 2522-8595 ISSN 2522-8609 (electronic) EAI/Springer Innovations in Communication and Computing ISBN 978-3-031-33825-0 ISBN 978-3-031-33826-7 (eBook) https://doi.org/10.1007/978-3-031-33826-7 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland Paper in this product is recyclable.

Conference Organization

Steering Committee
Imrich Chlamtac, Bruno Kessler Professor, University of Trento, Italy
Shuai Li, Swansea University, UK

Organizing Committee

General Chair
Shuai Li, Swansea University, UK

General Co-chairs
T.D. Subash, Mangalam College of Engineering, India
Yudong Zhang, University of Leicester, UK

TPC Chair and Co-chair
J. Ajayan, S R University, India
Predrag S. Stanimirović, University of Niš, Serbia


Web Chairs
Bolin Liao, Jishou University, China
T. Senthil Siva Subramanian, Sharda Group of Institutions, India

Publicity and Social Media Chairs
Titus Issac, Karunya Institute of Technology & Science, India
Vasilios N. Katsikis, National and Kapodistrian University of Athens, Greece

Workshops Chairs
Arunachalla Perumal, S.A Engineering College, India
Wojciech Hunek, Opole University of Technology, Poland

Sponsorship & Exhibits Chairs
Ivona Brajević, University Business Academy in Novi Sad, Serbia
Jeneesh Scaria, Mangalam College of Engineering, India

Publications Chairs
Eugene Peter, Mangalam College of Engineering, India
Xinwei Cao, Shanghai University, China

Panels Chairs
Dunhui Xiao, Tongji University, China
Llyods Raja, National Institute of Technology, India
Zhenzhong Liu, Tianjin University of Technology, China

Tutorials Chairs
Ameer Hamza Khan, Hong Kong Polytechnic University
K J Jeepa, Mangalam College of Engineering, India


Demos Chairs
Yiguo Yang, Shanghai University, China
Zhan Li, Swansea University, UK

Posters and PhD Track Chair
Ameer Tamoor Khan, Hong Kong Polytechnic University, China

Local Chairs
Muhammad Rehan, Pakistan Institute of Engineering and Applied Sciences (PIEAS), Pakistan
Jarin T, Jyothi Engineering College, India

Technical Program Committee

Technical Program Committee Chairs
J. Ajayan, S R University, India
Predrag S. Stanimirović, University of Niš, Serbia

Technical Program Committee Members
Abdul Rehman Khan, Pakistan Institute of Engineering and Applied Sciences, Pakistan
Ameer Hamza Khan, Hong Kong Polytechnic University, Hong Kong
Ameer Tamoor Khan, Hong Kong Polytechnic University, China
Bolin Liao, Jishou University, China
Eugene Peter, Mangalam College of Engineering, India
Ivona Brajević, University Business Academy in Novi Sad, Serbia
J. Ajayan, S R University, Warangal, India
Muhammad Rehan, Pakistan Institute of Engineering and Applied Sciences, Pakistan
Predrag Stanimirović, University of Niš, Faculty of Sciences and Mathematics, Serbia
T.D. Subash, Mangalam College of Engineering, India
Vasilios Katsikis, National and Kapodistrian University of Athens, Greece
Wojciech Hunek, Opole University of Technology, Poland
Xinwei Cao, Shanghai University, China
Yiguo Yang, Shanghai University, China
Zhan Li, Swansea University, UK
Zhenzhong Liu, Tianjin University of Technology, China

Preface

We are delighted to introduce the proceedings of the 6th EAI International Conference on Robotic Sensor Networks 2022 (EAI ROSENET 2022). This conference brought together researchers, developers and practitioners from around the world who work on robotics and knowledge-based AI systems, along with experts from related interdisciplinary fields. EAI ROSENET 2022 targets research at the intersection of AI/machine learning applications, algorithms, software, and hardware with deeply embedded intelligent autonomous robotic systems, to keep up the growing momentum demonstrated by technical progress and ecosystem development.

The technical program of EAI ROSENET 2022 consisted of eight full papers, including three invited papers presented in oral sessions at the main conference tracks. The conference tracks were as follows: Track 1 – A multi-region feature extraction and fusion strategy based CNN-attention network for facial expression recognition; Track 2 – Real-time surgical instrument object detection using YOLOv7; and Track 3 – A lightweight blockchain framework for visual homing and navigation robots. Aside from the high-quality technical paper presentations, the technical program also featured two keynote speeches, one invited talk and two technical workshops.

The two keynote speakers were Prof. Yu-Dong Zhang from the School of Computing and Mathematical Sciences, University of Leicester, UK, and Dr. Zhan Li from Swansea University, UK. The main content of their keynote speeches can be summarized as follows. (1) COVID-19 is a pandemic disease that had caused more than 6.25 million deaths as of 10 May 2022. A CT scan is a medical imaging technique used in radiology to obtain detailed images of the body non-invasively for diagnostic purposes. This talk discussed advanced neural networks for chest CT-based COVID-19 diagnosis and also covered other chest-related diseases: secondary pulmonary tuberculosis and community-acquired pneumonia. (2) Nowadays, industrial robotic manipulators play important roles in manufacturing fields, such as welding and assembling, by performing repetitive and dull work. Such long-term industrial operations usually require redundant manipulators to keep good working conditions and maintain the steadiness of joint actuation. However, some joints of redundant manipulators may fall into a fault status after enduring long periods of heavy manipulation, preventing the desired industrial tasks from being accomplished accurately. We propose a novel fault-tolerant method with a simultaneous fault-diagnosis function for motion planning and control of industrial redundant manipulators. The proposed approach can adaptively localize which joints deviate from the normal state into a fault, and it can guarantee completion of the desired path-tracking control even when these faulty joints lose their velocity to actuate. Simulation and experiment results on a KUKA LBR iiwa manipulator demonstrate the efficiency of the proposed fault-tolerant method for motion control of the redundant manipulator.

The invited talk was presented by Prof. Ata Jahangir Moshayedi from Jiangxi University of Science and Technology, China. The two workshops organized covered robotics and knowledge-based AI systems, along with interdisciplinary approaches to computer science, control systems, computer vision, machine learning, electrical engineering, intelligent machines, mathematics and other disciplines.

Coordination with the steering chairs, Imrich Chlamtac, Shuai Li, T.D. Subash and Yudong Zhang, was essential for the success of the conference, and we sincerely appreciate their constant support and guidance. It was also a great pleasure to work with such an excellent organizing committee team, and we thank them for their hard work in organizing and supporting the conference. The Technical Program Committee, led by our TPC co-chairs Dr. J. Ajayan and Dr. Predrag S. Stanimirović, completed the peer-review process of the technical papers and assembled a high-quality technical program. We are also grateful to the Conference Manager, Veronika Kissova, the Managing Editor, Eliška Vlčková, and the Publishing Chairs Predrag S. Stanimirović, Yudong Zhang, Dunhui Xiao and Xinwei Cao. We thank Yiguo Yang and all staff of the Organizing Committee for their support, and all the authors who submitted their papers to the EAI ROSENET 2022 conference and workshops.

We strongly believe that the EAI ROSENET 2022 conference provides a good forum for all researchers, developers and practitioners to discuss all science and technology aspects relevant to robotic sensor networks. We also expect that future EAI ROSENET conferences will be as successful and stimulating as indicated by the contributions presented in this volume.

Predrag S. Stanimirović, Niš, Serbia
Yudong Zhang, Leicester, UK
Dunhui Xiao, Shanghai, China
Xinwei Cao, Wuxi, China

Contents

An Intelligent Learning System Based on Robotino Mobile Robot Platform
Phan Van Vinh, Phan Xuan Dung, Tran Thi Thuy Hang, and Truong Hong Duc

A Neocognitron Based on Multi-objective Optimization for Few-Shot Learning
Hao Pan, Guiyu Guo, and Daming Shi

Milk Temperature Control System of Calf Feeding Robot Based on Fuzzy PID
Gang He, Xiaohua Cai, Yuntao Hou, and Yan Ye

Bluff: A Multi-Robot Dispersion Based on Hybrid Reciprocal Velocity Obstacles to Solve the Blind Man's Buff Problem
Gokul P

Factors Influencing the Adoption of Robo Advisory Services: A Unified Theory of Acceptance and Use of Technology (UTAUT) Model Approach
Ankita Bhatia, Arti Chandani, Pravin Kumar Bhoyar, and Rajiv Divekar

A Multi-region Feature Extraction and Fusion Strategy Based CNN-Attention Network for Facial Expression Recognition
Yanqiang Yang and Hui Zhou

Real Time Surgical Instrument Object Detection Using YOLOv7
Laiwang Zheng and Zhenzhong Liu

A Lightweight Blockchain Framework for Visual Homing and Navigation Robots
Mohamed Rahouti, Damian M. Lyons, and Lesther Santana

Index

About the Editors

Predrag S. Stanimirović earned his PhD degree from the Faculty of Philosophy, University of Niš, Niš, Serbia, in 1996. He is a full professor in the Department of Computer Science, Faculty of Sciences and Mathematics, University of Niš. He has authored over 260 publications in various scientific journals, including six research monographs. His current research interests encompass diverse fields of mathematics, applied mathematics and computer science, spanning multiple branches of numerical linear algebra, recurrent neural networks, symbolic computation and operations research.

Yudong Zhang is a professor at the School of Computing and Mathematical Sciences, University of Leicester, UK. His research interests include deep learning and medical image analysis. He is a Fellow of IET, a Fellow of EAI and a Fellow of BCS, a senior member of IEEE, IES and ACM, and a distinguished speaker of ACM. He was named a Clarivate Highly Cited Researcher in 2019 and 2021. He has (co)authored over 400 peer-reviewed articles, among which there are more than 50 ESI highly cited papers and 5 ESI hot papers. His citations reached 22817 in Google Scholar (h-index 84). He has conducted many successful industrial projects and academic grants from NIH, Royal Society, GCRF, EPSRC, MRC, Hope, British Council and NSFC. He has served as (co-)chair for more than 60 international conferences and workshops (including more than 10 IEEE or ACM conferences). More than 50 news outlets have reported on his research outputs, including Reuters, BBC, Telegraph, Physics World, and UK Today News.

Dunhui Xiao is a professor at Tongji University (Shanghai, China). He obtained his PhD from Imperial College London, where he also did his post-doc. His research interests include numerical modelling with a focus on non-intrusive reduced-order modelling of Navier-Stokes equations, fluid-structure interactions and multiphase flows in porous media. He is also interested in data-driven modelling, data science, machine learning combined with physical data, and optimisation. He is a PI of a number of grants, including EPSRC grants. He sits on the editorial boards of a number of journals and is a reviewer for many journals and for EPSRC grants.


Xinwei Cao received her bachelor's degree from Shandong University, Jinan, China, in 2009; her master's degree from Tongji University, Shanghai, China, in 2012; and her PhD degree from Fudan University, Shanghai, China, in 2017, all in management. She is currently an assistant professor (lecturer) at Jiangnan University, Wuxi, China, and also a visiting scholar at Swansea University, UK. Her research interests include operations research, operational management, computational and quantitative finance, empirical asset pricing and financial markets.

An Intelligent Learning System Based on Robotino Mobile Robot Platform

Phan Van Vinh, Phan Xuan Dung, Tran Thi Thuy Hang, and Truong Hong Duc

1 Introduction

Nowadays, mobile robotics increasingly attracts scientific research in all aspects of modern life, with a wide range of applications from industrial to nonindustrial and domestic uses [1]. Traditionally, a mobile robot is an autonomous mobile machine that can complete its mission by moving around in a given area, collecting environmental data, and performing tasks based on the information gained from attached sensors. Recently, advances in the design and implementation of mobile robots have paved the way for their adoption in manufacturing, research, and education to satisfy growing user demand. In this context, it is important to apply the advantages of mobile robots to modern educational systems to improve the effectiveness of training and research activities.

In fact, mobile robots have been increasingly used at all levels of education, from elementary school to graduate school, to develop scientific knowledge and experimental skills for students through a set of different activities. For example, students from elementary school to high school usually get acquainted with robotic systems through STEM (science, technology, engineering, and math) programs. A well-known product used in STEM programs is the LEGO robot, a programmable robot based on LEGO building blocks with many built-in features that allow students to practice scientific projects to enhance their knowledge [2].

P. Van Vinh () · T. T. T. Hang · T. H. Duc School of Computing and Information Technology, Eastern International University, Binh Duong City, Vietnam e-mail: [email protected]; [email protected]; [email protected] P. X. Dung School of Engineering, Eastern International University, Binh Duong City, Vietnam e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 P. S. Stanimirović et al. (eds.), 6th EAI International Conference on Robotic Sensor Networks, EAI/Springer Innovations in Communication and Computing, https://doi.org/10.1007/978-3-031-33826-7_1


Fig. 1 Robotino robot and cyber-physical factory system at EIU I4.0 lab

The STEM program is highly effective in developing teamwork and self-confidence for students at any level of education. However, it does not focus seriously on the technical details inside a robotic system, nor does it help students fully understand how it works. Therefore, at the graduate level, which requires higher learning outcomes in knowledge and practical skill, a programmable robot with a flexible and adaptive design can be considered a potential option. Students can work on hands-on projects to design and implement their own robot for a specific purpose, as shown in [3]. However, this kind of task is very challenging, since it takes quite a long time to implement and requires an appropriate level of technical skill. Therefore, to mitigate the effort of such a challenging task, one practical approach is to use a commercial product such as the Robotino robot developed by the German company Festo Didactic [4]. Robotino is a mobile robot with well-designed hardware and software components, so it can be used either standalone or as part of a complex system (e.g., the Festo Cyber-Physical Factory system), as illustrated in Fig. 1. In fact, numerous projects can be paired with Robotino, such as collecting and analyzing sensor data for various purposes, digital image processing and object tracking, navigation systems, and several other projects related to mobile and service robotics. Basically, Robotino is based on an omnidirectional drive assembly enabling it to move freely in all directions at a speed of up to 10 km/h. Robotino is controlled by an industry-standard computer system, which allows it to communicate with other robots via a wireless interface to form a fully autonomous robotic system network. Table 1 lists the technical specifications of the Robotino robot.


Table 1 Technical specifications of Robotino
Power: 2 × 4A-12 V DC lead-gel batteries with a maximum of 4 hours operation time
Drive: 3 × 24 V DC motors with omnidirectional wheels
Computer/controller: Intel i5 (eighth generation, 2.5 GHz), integrated UHD Graphics 630, 8 GB RAM, 64 GB SSD, Linux-based operating system (Ubuntu)
Tools: Robotino View, Robotino Factory, C/C++, Java, Microsoft .NET, MATLAB/Simulink, LabVIEW, ROS, SmartSoft
Connectivity: Wireless LAN communication, Ethernet
Maximum payload: 40 kg
Maximum speed: 10 km/h
Sensors: 9 × proximity sensors, 3 × wheel encoders, gyroscope, line sensors, bumper
Digital inputs: 8 (24 V)
Digital outputs: 8 (24 V)
Analog inputs: 8 (0–10 V)
Output relays: 2

With its significant advantages in hardware and software design, Robotino has already been used in many educational and research projects. From the first release of the Robotino platform, it was used in training and research projects to enhance students' understanding of robots [5] and in international competitions that motivate students and researchers to use mobile robots more efficiently [6]. There are many research topics to consider for improving robot performance in practical environments, such as obstacle avoidance, which helps the robot reach its destination without collision [7–9], or localization and mapping, which builds a map of the surroundings from sensor data and odometry information [10, 11]. However, it is not clear how well such systems perform when these methods are applied in real environments. On the other hand, some projects focus on building practical applications for particular purposes. For example, one interesting project [12] controls the speed of Robotino with a brain-wave platform. Another application uses the Robotino robot to collect and transport mail in an office [13]. In that project, Robotino is implemented and tested in an indoor environment between two offices and a home position; however, it needs the support of a Raspberry Pi device to host the user webpage and cooperate with Robotino to perform task assignments. The authors in [14, 15] presented different applications of Robotino for training and research; however, the potential approach that allows users to develop their own applications through a programming interface was not discussed clearly.

With the aim of providing a deep understanding of the Robotino design and its potential in education and research, in this paper we study the main hardware and software structure of the Robotino robot, analyze its motion model, and then apply it to developing Robotino applications. In addition, we present the design of a new learning system with the Robotino robot that is essential for developing educational programs and research activities. In summary, the main contributions of this paper are as follows: (1) to study the main Robotino design and its motion model and (2) to design a new learning system with an interactive web-based and mobile-based interface framework for system control and administration purposes. The remainder of the paper is organized as follows: Section 2 provides the main design of the Robotino system and its programming interfaces, Sect. 3 presents the new learning system and its applications, and Sect. 4 concludes the paper.

2 System Description and Analysis

This section provides the detailed hardware and software design of the Robotino system and the application programming interfaces (APIs), which can be applied to a wide range of educational projects and research implementations.

2.1 Drive Subsystem

Robotino is equipped with three omnidirectional drive units attached at an angle of 120° to one another. Each unit includes a motor, a gear set, an all-directional roller, a toothed belt, and an incremental encoder, as shown in Fig. 2. Robotino can move freely in all directions at a speed of up to 10 km/h.

Fig. 2 Robotino drive subsystem. (a) Robotino with 3 omnidirectional drive units. (b) Components of each drive unit

Table 2 Robotino drive unit specifications
Motor nominal voltage: 24 V DC
Motor nominal speed: 3600 rpm
Motor nominal torque: 3.8 Ncm
Gear unit ratio: 32:1
Incremental encoder resolution: 2048
All-way roller (wheel) diameter: 80 mm
Maximum carrying capacity: 40 kg
Robotino maximum speed: 10 km/h

Fig. 3 A motion model of the three omnidirectional wheel Robotino robot

An incremental encoder is used to compare the desired speed and the motor's actual speed; the result is then adjusted by a PID (proportional integral derivative) controller connected to the I/O circuit board. For one full turn of the motor shaft, the encoder generates 2048 pulses. The information from the encoder is also used to determine the position of the mobile robot. A gear unit with a ratio of 32:1 is used between the drive shaft and the wheel. The detailed specifications of the drive unit are shown in Table 2.

Next, we present a detailed analysis of the Robotino motion model, which shows the relation between the Robotino speed, the motor speeds, and the measured encoder data (as shown in Fig. 3). These results can be very helpful in implementing a practical application to control and manage the Robotino system. The following notation is used:

SR: the travel path of Robotino (mm)
vR: the translation speed of Robotino (mm/s)
ωR: the rotation speed of Robotino (deg/s)
α: the angle of path movement (rad)
R: the radius of the omnidirectional wheel (mm)

The relationship between the indication of the motor encoder (ne) and the angle of rotation of the motor (ϕm) is given by:

ϕm = (2π/2048) ne = (π/2^10) ne [rad]   (1)

while the relation between the encoder value and the angle of rotation of the wheel (ϕw) is given by:

ϕw = ϕm/32 = (π/2^15) ne [rad]   (2)

Based on the desired travel path and the speed of Robotino, the analytical relations between the speed of Robotino, the wheels, and the motors are given below [15].

The speed of wheel 1 [mm/s]:

vw1 = (πR/180) ωR − vR sin(π/3 − α)   (3)

The angular speed of motor 1 [rpm]:

ϕM1 = 32 × 60 × vw1/(2πR) = (960/(πR)) [(πR/180) ωR − vR sin(π/3 − α)]   (4)

And the indicated value of encoder 1:

ne1 = (SR sin(π/3 − α)/(2πR)) × 2^16 = (2^15/(πR)) SR sin(π/3 − α)   (5)

In the same way, we obtain the results for wheel 2 and wheel 3.

The speed of wheel 2 [mm/s]:

vw2 = (πR/180) ωR − vR sin(α)   (6)

The angular speed of motor 2 [rpm]:

ϕM2 = (960/(πR)) [(πR/180) ωR − vR sin(α)]   (7)

The indicated value of encoder 2:

ne2 = (2^15/(πR)) SR sin(α)   (8)

The speed of wheel 3 [mm/s]:

vw3 = (πR/180) ωR + vR sin(π/3 + α)   (9)

The angular speed of motor 3 [rpm]:

ϕM3 = (960/(πR)) [(πR/180) ωR + vR sin(π/3 + α)]   (10)

The indicated value of encoder 3:

ne3 = (2^15/(πR)) SR sin(π/3 + α)   (11)

From the above analysis, the robot movement depends on the control of all three motors. Table 3 gives an example of the relation between the Robotino motion and the speed of each motor [rpm] with R = 40 mm, ωR = 0 and β = 12√3/π.

Table 3 The relation between Robotino motion and motor speeds
Direction of the Robotino motion | Motor 1 | Motor 2 | Motor 3
Linear travel at α = 0 | −βvR | 0 | βvR
Linear travel at α = π/3 | 0 | −βvR | βvR
Linear travel at α = 2π/3 | βvR | −βvR | 0
Linear travel at α = π | βvR | 0 | −βvR
Linear travel at α = 4π/3 | 0 | βvR | −βvR
Linear travel at α = 5π/3 | −βvR | βvR | 0
Rotation on the spot (counterclockwise) | βvR | βvR | βvR
Rotation on the spot (clockwise) | −βvR | −βvR | −βvR
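The kinematic relations above can be scripted directly for experiments or exercises. The following Python sketch was written for this rewrite and is not part of the chapter or of the Festo software; it simply evaluates Eqs. (3)–(11) using the notation defined earlier and, as a check, reproduces the first row of Table 3.

```python
import math

def wheel_speeds(v_r, alpha, omega_r, R=40.0):
    """Wheel speeds (vw1, vw2, vw3) in mm/s from Eqs. (3), (6), (9).
    v_r: translation speed (mm/s), alpha: path angle (rad),
    omega_r: rotation speed (deg/s), R: wheel radius (mm)."""
    rot = math.pi * R / 180.0 * omega_r          # common rotation term (piR/180)*omega_R
    vw1 = rot - v_r * math.sin(math.pi / 3 - alpha)
    vw2 = rot - v_r * math.sin(alpha)
    vw3 = rot + v_r * math.sin(math.pi / 3 + alpha)
    return vw1, vw2, vw3

def motor_rpm(vw, R=40.0):
    """Motor speed in rpm from a wheel speed in mm/s (Eqs. (4), (7), (10)):
    wheel circumference 2*pi*R and 32:1 gear ratio."""
    return 32.0 * 60.0 * vw / (2.0 * math.pi * R)

def encoder_counts(S_r, alpha, R=40.0):
    """Encoder indications (ne1, ne2, ne3) for a straight travel path S_r in mm
    (Eqs. (5), (8), (11)); 2048 pulses per motor turn x 32 = 2**16 per wheel turn."""
    k = 2 ** 15 / (math.pi * R)
    return (k * S_r * math.sin(math.pi / 3 - alpha),
            k * S_r * math.sin(alpha),
            k * S_r * math.sin(math.pi / 3 + alpha))

if __name__ == "__main__":
    # Linear travel at alpha = 0 with omega_R = 0 gives (-beta*vR, 0, +beta*vR),
    # the first row of Table 3, where beta = 12*sqrt(3)/pi for R = 40 mm.
    v_r = 100.0  # mm/s, arbitrary test value
    for i, vw in enumerate(wheel_speeds(v_r, 0.0, 0.0), start=1):
        print(f"motor {i}: {motor_rpm(vw):8.1f} rpm")
```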

2.2 Sensor Subsystem

As depicted in Fig. 4, Robotino is equipped with a variety of sensors that can gather a range of useful information about its surroundings. In fact, many different types of sensors can be used with Robotino, depending on which type of application we want to implement. The main sensor devices supported by Robotino are listed in Table 4.

Fig. 4 Robotino sensor subsystem

Table 4 Robotino sensor devices
Infrared distance sensors: Get distances to obstacles, used for navigation and motion control
Inductive sensors: Detect metallic objects on the ground, used for path control and fine positioning
Optical sensors: Guide Robotino along a defined path or stop it in a specific position with high accuracy
Gyroscope: Identify changes in orientation and enhance the precision of position sensing
Collision safety sensor (bumper): Stop movement to prevent damage on the moving path (e.g., hitting obstacles)
Color camera: Display live images, used for navigation, object detection, and mapping
Laser rangefinder: Get distances to obstacles, used for mapping and localization

For the purpose of navigation and motion control, Robotino is equipped with nine infrared (IR) sensors arranged around its base at an angle of 40° to one another, as shown in Fig. 5a. This design helps to detect objects or obstacles within a distance of 4–40 cm [16]. Each IR sensor outputs a voltage signal whose value depends on the distance to a reflective object. Figure 5b presents the relationship between the measured distance to a target object and the output voltage.
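Because the voltage-distance curve in Fig. 5b is non-linear, a practical way to turn an IR reading into a distance is to interpolate over a small calibration table. The sketch below only illustrates this idea; the calibration points are placeholder values and would have to be replaced by measurements from the actual sensor (or its datasheet).

```python
# Approximate IR voltage -> distance conversion by linear interpolation.
# NOTE: the calibration table is illustrative only; calibrate against Fig. 5b / the sensor datasheet.
CALIBRATION = [  # (voltage in V, distance in cm), sorted by decreasing voltage
    (2.5, 4), (1.6, 8), (1.0, 15), (0.6, 25), (0.4, 35), (0.3, 40),
]

def ir_voltage_to_distance(voltage):
    """Estimated distance in cm within the 4-40 cm working range."""
    if voltage >= CALIBRATION[0][0]:
        return CALIBRATION[0][1]      # object closer than the minimum measurable distance
    if voltage <= CALIBRATION[-1][0]:
        return CALIBRATION[-1][1]     # at or beyond the 40 cm limit
    for (v_hi, d_lo), (v_lo, d_hi) in zip(CALIBRATION, CALIBRATION[1:]):
        if v_lo <= voltage <= v_hi:
            t = (v_hi - voltage) / (v_hi - v_lo)   # position between the two calibration points
            return d_lo + t * (d_hi - d_lo)
```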

2.3 I/O Interface

The I/O (input/output) interface of Robotino is used to connect additional actuators or sensors, as shown in Fig. 6, and includes:
• 8 × analog inputs (0 to 10 V) (AIN1 to AIN8)
• 8 × digital inputs (DI1 to DI8)
• 8 × digital outputs (DO1 to DO8)
• 2 × relays (REL1 and REL2)


Fig. 5 Robotino with infrared distance sensors. (a) Infrared distance sensor in Robotino. (b) The relationship between the measured distance and voltage output

Fig. 6 The I/O interface of Robotino

2.4 Application Programming Interface

Robotino runs a Linux-based environment that supports various programming languages and systems via an application programming interface (API), including:
– C/C++, Java, MATLAB
– LabVIEW and MATLAB/Simulink
– Robotino View, Robotino Factory
– RestAPI: an HTTP-based interface for retrieval and transmission of information at runtime
– Microsoft .NET, SmartSoft, and ROS

The API allows users to create applications for Robotino in a variety of programming languages. For example, Robotino View, Robotino Factory, MATLAB/Simulink or LabVIEW can be used by new users for simple projects, whereas SmartSoft and ROS can be used for more complex projects. For communication between Robotino and external controllers, links via OPC and TCP/IP are provided. With the latest version of the robotino-daemons package, Robotino supports a RestAPI interface based on a RESTful server listening on TCP port 80. The RestAPI interface can be used not only to control Robotino's motion but also to access Robotino's sensors and actuators. The main RestAPI functions are listed in Table 5 [17].

Table 5 Main RestAPI functions in Robotino
/data/controllerinfo | GET | Get system version
/data/festoolcharger | GET | Get Festool Li-ion batteries
/data/chargerID | GET | Get data of charger
/camID | GET | Get camera image
/sensorimage | GET | Get distance sensor image
/data/distancesensorarray | GET | Get distance sensor values
/data/bumper | GET | Get status of bumper
/data/scan0 | GET | Get data of the laser rangefinder
/data/odometry | GET | Get robot odometry
/data/omnidrive | PUT | Set forward and rotational velocity
/data/services | GET | Get data of services
/data/servicestatus/serviceName | GET | Get status of a specific service
/data/startService | PUT | Start a specific service
/data/stopService | PUT | Stop a specific service
/data/ethinfo | GET | Get the wired network information
/data/ethconfig | PUT | Set the wired network configuration
/data/wlaninfo | GET | Get the wireless network information
/data/wlanconfig | PUT | Set the wireless network configuration
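As a small illustration of how the RestAPI in Table 5 can be used, the Python sketch below reads the odometry, the distance sensor array and the bumper state, and then sends a velocity command. It is a sketch only: the robot address is a placeholder, and the JSON payload assumed for PUT /data/omnidrive (a [vx, vy, omega] triple) should be verified against the Robotino wiki [17].

```python
import requests

ROBOTINO = "http://192.168.0.1"   # placeholder address of Robotino's RESTful server (TCP port 80)

# Read-only endpoints from Table 5 return JSON documents.
odometry = requests.get(f"{ROBOTINO}/data/odometry", timeout=2).json()
distances = requests.get(f"{ROBOTINO}/data/distancesensorarray", timeout=2).json()
bumper = requests.get(f"{ROBOTINO}/data/bumper", timeout=2).json()
print("odometry:", odometry)
print("IR distance array:", distances)
print("bumper:", bumper)

# Motion command: PUT /data/omnidrive sets the forward and rotational velocity.
# The [vx, vy, omega] payload shape is an assumption to check against the wiki [17].
requests.put(f"{ROBOTINO}/data/omnidrive", json=[0.1, 0.0, 0.0], timeout=2)  # drive slowly forward
requests.put(f"{ROBOTINO}/data/omnidrive", json=[0.0, 0.0, 0.0], timeout=2)  # stop
```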

3 New Learning System with Robotino

In this section, we present the new learning environment, including a graphic programming interface and an application programming interface, designed around the Robotino robot for education, training, and research activities at different skill levels.


Fig. 7 An example of basic programming with Robotino View

3.1 Graphic Programming Interface

For beginners, Robotino View (Fig. 7) is a good choice for creating programs that control Robotino over the wireless network. Robotino View contains functional blocks for all hardware components (motor, encoder, sensors, I/O, camera, gripper, robot arm), which are used to control the robot's motion and read data signals from the sensors in its surroundings. To enhance the safety of the Robotino robot when working in a dynamic environment, Robotino Factory can be used to create a map of the surrounding environment, which helps control the robot's movement safely based on data from the laser rangefinder, infrared sensors, and bumpers (Fig. 8).

3.2 Application Programming Interface

For advanced users, the application programming interface (API), with its variety of programming environments, is recommended for working with Robotino. A typical application can be written using C/C++, Java, .NET, LabVIEW, MATLAB/Simulink, ROS, or SmartSoft. The following topics based on Robotino can be considered for training and research activities:
• Real-time control and monitoring systems
• Feedback control systems with computer vision (object detection and recognition)
• Localization and mapping services
• IoT-based applications (sensor data acquisition and visualization)


Fig. 8 Building a map of surrounding environment with Robotino Factory

Fig. 9 Robotino web interface structure

In this context, we introduce a web-based and mobile-based interface framework that is crucial for developing educational programs and research initiatives using the Robotino robot.

Web-Based Application Interface Figure 9 displays the design structure of the web-based interface used to operate the Robotino robot and process the gathered data. There are two types of user accounts available for logging into the Robotino system: a normal account, which only allows viewing of gathered data and live-streamed images of the surrounding environment, and an admin account, which has complete authority to manage the entire system. When an admin user logs into the system, they have complete access rights. On the Home page, depicted in Fig. 10a, the basic information of Robotino is displayed, such as the controller version and the status of the battery and charger. On the Config page, system settings and setup can be performed by changing parameters of the configuration files (Fig. 10b). The system services can be controlled on the Service page, as shown in Fig. 11a, and users can manually control the Robotino movement on the Control page (Fig. 11b). Moreover, the web interface is used to monitor status, set operating parameters, and access online help. This level of training is suitable for beginners or students who are just starting to explore the world of robots.

Fig. 10 Web-based interface of Robotino: Home page and Config page. (a) Home page. (b) Config page

Fig. 11 Web-based interface of Robotino: Service page and Control page. (a) Service page. (b) Control page

Mobile-Based Application Interface Another approach is to build a mobile application interface that allows users to interact with Robotino. This is necessary because mobile phones and smartphones are widely used nowadays. The design of the mobile-based application interface has the same structure as the web-based interface, as shown in Fig. 9. After logging into the system with a registered account (Fig. 12a), the basic information of Robotino, such as the controller version and the status of the battery and charger, is shown in the Home menu (Fig. 12b).


Fig. 12 Mobile-based application interface: Login and Home menu. (a) Login menu. (b) Home menu

Admin users can access more information, such as live images and real-time sensor data, as shown in Fig. 13. All data are retrieved via the RestAPI service. Moreover, the user can easily control the movement of Robotino manually from the Control menu, with live video from the cameras (Fig. 13a). Obstacles on the movement path can be detected from sensor data, so Robotino can move to its destination without collision (Fig. 13b).
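The obstacle behaviour described above can be prototyped on top of the same RestAPI by polling the distance sensor array and stopping or turning when a nearby object is detected. The loop below is a simplified sketch written for this rewrite, not the implementation used in the mobile application; the robot address, the threshold, the assumption that the sensor endpoint returns a list of distances in metres, and the /data/omnidrive payload format are all placeholders to verify.

```python
import time
import requests

ROBOTINO = "http://192.168.0.1"   # placeholder robot address
STOP_THRESHOLD_M = 0.15           # assumed: react when any IR reading is below 15 cm

def drive(vx, vy, omega):
    # [vx, vy, omega] payload shape is an assumption; see the Robotino wiki [17].
    requests.put(f"{ROBOTINO}/data/omnidrive", json=[vx, vy, omega], timeout=2)

try:
    while True:
        readings = requests.get(f"{ROBOTINO}/data/distancesensorarray", timeout=2).json()
        if any(d < STOP_THRESHOLD_M for d in readings):
            drive(0.0, 0.0, 0.5)  # obstacle ahead: stop translating and rotate in place
        else:
            drive(0.2, 0.0, 0.0)  # path clear: move forward
        time.sleep(0.1)
finally:
    drive(0.0, 0.0, 0.0)          # always stop the robot on exit
```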

4 Conclusion

Mobile robots are becoming increasingly significant in our daily lives. This paper presents an intelligent learning system that provides different levels of using the Robotino robot in education and research activities. Each level of education requires a specific robot design with custom functions, hardware components, applications, and programming interfaces. Through well-designed lessons and practical projects with mobile robots, students can enhance their knowledge and abilities, cultivate research skills, and foster creative thinking, decision-making, problem-solving, and teamwork.


Fig. 13 Mobile-based application interface: Control menu and Sensor menu. (a) Control menu. (b) Sensor menu

The new learning system with the Robotino robot has demonstrated its effectiveness in developing educational programs and research activities. However, in order to make the proposed learning system work more practically, some necessary features need to be improved, such as mapping, localization, and self-navigation techniques.

Acknowledgments This research is financially supported by Eastern International University, Binh Duong Province, Vietnam.

References

1. Rubio, F., Valero, F., Llopis-Albert, C.: A review of mobile robots: Concepts, methods, theoretical framework, and applications. Int. J. Adv. Robot. Syst. 16(2), 1–22 (2019)
2. Demetriou, G.A.: Mobile Robotics in Education and Research. IntechOpen, chapter 22293 (2011)
3. Van Vinh, P., Thanh, T.M., Khang, V.D., An, N.H., Nhat, T.D.: A Mobile Robotic System for Rescue and Surveillance in Indoor Environment. In: 4th EAI International Conference on Robotics and Networks, pp. 1–16 (2021)
4. Festo Didactic: Robotino® Mobile robot platform for research and training. Denkendorf, 56940 (2013)
5. Nan, X., Xiaowen, X.: Robot experiment simulation and design based on Festo Robotino. In: IEEE 3rd International Conference on Communication Software and Networks, pp. 160–162 (2011)
6. Weinert, H., Pensky, D.: Mobile robotics in education and student engineering competitions. In: IEEE Africon'11, pp. 1–5 (2011)
7. Ali, T.Y., Ali, M.M.: Robotino obstacles avoidance capability using infrared sensors. In: IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT), pp. 1–6 (2015)
8. Aydin, M., Erdemir, G.: An Object Detection and Identification System for a Mobile Robot Control. Balkan Journal of Electrical and Computer Engineering, pp. 73–76 (2017)
9. Al-Dahhan, R.R., Al-Dahhan, M.R.H., Jebur, M.H.: Target Seeking and Obstacle Avoidance of Omni Robot in an Unknown Environment. In: International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), pp. 1–7 (2020)
10. Wolf, D.F., Sukhatme, G.S.: Mobile Robot Simultaneous Localization and Mapping in Dynamic Environments. Auton. Robot. 19, 53–65 (2005)
11. Ciurea, C.-F., Duka, A.-V., Oltean, S.-E.: Automatic Mapping of an Enclosure Using a Mobile Robot. Procedia Technol. 12, 50–56 (2014)
12. Katona, J., Ujbanyi, T., Sziladi, G., Kővári, A.: Speed control of Festo Robotino mobile robot using NeuroSky MindWave EEG headset based Brain-Computer Interface. In: 7th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), pp. 251–256 (2016)
13. Rawashdeh, N., Alwanni, H., Sheikh, N., Afghani, B.: Control Design of an Office Mail Delivery Robot Based on the Festo Robotino Platform. In: ASEE North Central Section Conference (2019)
14. Crnokić, B., Grubisic, M., Volaric, T.: Different Applications of Mobile Robots in Education. Int. J. Integr. Technol. Educ. 6, 15–28 (2017)
15. Prsic, D., Stojanovic, V., Djordjevic, V.: A Constructive Approach to Teaching with Robotino®. In: 7th International Scientific Conference Technics and Informatics in Education, pp. 273–278 (2018)
16. Festo Didactic: Robotino Workbook (2016)
17. Robotino Wiki: https://wiki.openrobotino.org/index.php?title=Rest_api

A Neocognitron Based on Multi-objective Optimization for Few-Shot Learning

Hao Pan, Guiyu Guo, and Daming Shi

1 Introduction

As we all know, the large-scale sample training paradigm of neural networks is not suited to many practical applications, for example when insufficient training samples are available for certain cancers or for sudden new cases. On the other hand, even when sufficient samples exist in this era of big data, labeling them for network training not only consumes material and human resources but also produces tags that contain human error. As a result, it is of the utmost importance to construct a reliable model from a limited amount of data. The human brain needs only a few samples to learn a pattern and recognize its variations, which is the same concept that few-shot learning brings to neural networks.

Few-shot learning is a cutting-edge approach in deep learning in which the model is trained on a limited number of examples yet is still able to show high performance. Deep learning is typically associated with big data, as having a large data set allows the machine to learn effectively. However, when data are limited, the model may overfit, meaning it performs poorly on unseen data and lacks generalization ability [1]. While data augmentation and regularization can mitigate the problem to some extent, few-shot learning provides a more comprehensive solution for small data sets. Currently, few-shot learning can be classified into three main categories: data augmentation, meta-learning, and metric learning [2]. Data augmentation involves transforming the existing examples to create additional, synthetic examples for the model to learn from.

H. Pan · G. Guo · D. Shi () College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, P.R. China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 P. S. Stanimirović et al. (eds.), 6th EAI International Conference on Robotic Sensor Networks, EAI/Springer Innovations in Communication and Computing, https://doi.org/10.1007/978-3-031-33826-7_2


learn how to learn, so that it can generalize better to new tasks with limited data. Lastly, metric learning focuses on learning an appropriate similarity metric between examples, allowing the model to make more accurate predictions with limited data. Overall, few-shot learning provides a powerful solution to the challenge of training deep learning models on limited data, making it an essential technology in many practical applications. Data augmentation is a technique that increases the size of the training data set by applying various transformations to the existing data, to tackle the issue of limited training data. On the other hand, the meta-learning approach trains a general model through multiple tasks, enabling it to gain cross-task high-level transfer skills and quickly adapt to new tasks with limited data. In metric learning, the focus is on learning discriminative features that lead to improved classification accuracy, achieved through learning an embedding space where learning is more efficient. To summarize, data augmentation in few-shot learning only creates more synthetic samples and does not truly reflect the real sample distribution. While meta-learning requires a large sample training set, few-shot learning is achieved through transfer. In the papers published in 1979 and 1980, Fukushima [3–5] imitated the visual cortex to design neocognitron. The biggest feature of the neocognitron is the addition of an S layer (simple layer) to each level, while the convolutional neural network proposed by the same time only contains the C layer (complex layer). Fukushima designed the S layer as a shape model and called it a pattern. Its function is to extract image features in the receptive field. The C-layer unit receives and responds to the same features returned by different receptive fields. The rapid development of convolutional neural network and the design of deep learning benefit from its strict error back propagation learning method, which expresses features in each weight through a large number of sample learning. Unfortunately, Fukushima only uses simple incremental learning to train the weights, and the shape models of each layer can only be designed by himself based on experience, which greatly limits the scope of its application. For decades, neocognitron has only been applied to digital recognition, and the effect is not good. Therefore, we improve the neocognitron and use it for few-shot learning. We add a 0-1 mask before pattern extraction and form a non-linear constraint on the C layer through the 0-1 mask. Since the shape model in the neocognitron is designed by humans based on experience, the training vector we extracted through the 0-1 mask also needs to be mapped to the corresponding S layer. Here we use multi-objective optimization to automatically select more representative training vectors. Our main contributions are: (1) Extract global features and transform them into 0-1 masks, and use them as shape templates. (2) Through multi-objective optimization, an excellent shape template is selected to provide non-linear constraints for the convolutional layer, and more complex feature information is extracted. The remainder of the chapter is organized as follows. First of all, the related work of few-shot learning is introduced in Sect. 2. Then, the network structure of the studied visual neocognitron is shown in Sect. 3. Next, use multi-objective optimization to optimize the training vector of the visual neocognitron in Sect. 4.


After that, the proposed algorithm is applied to locate objects in a real scene in Sect. 5. Finally, the conclusion of the work is given in Sect. 6.

2 Related Work

In this section, we delve into the related work in the field of few-shot learning. Deep learning has been a significant milestone in the advancement of machine learning, leading to remarkable achievements in various tasks, as documented in many studies [6]. However, the complex structure of deep models, characterized by numerous parameters, requires vast amounts of labeled data for proper training. This limitation severely restricts the practical applications of deep learning. The question arises: can we obtain an effective model using only a limited number of labeled samples? This question has become a topic of significant interest in both the academic and industrial communities, as it holds significant implications for the future development of machine learning. Few-shot learning addresses exactly this need: a powerful model is obtained through training with a small number of samples [7]. Despite its potential, existing few-shot learning methods are prone to overfitting the limited sample size and underfitting the real model. This limits their ability to exhibit analogical reasoning, a critical aspect of human learning. As such, researchers have been devoted to enhancing the generalization ability of few-shot learning for many years.

Wu et al. [8] proposed a new meta-learning target detection framework, Meta-RCNN, which is based on the popular Faster R-CNN detector. Both the region proposal network (RPN) and the object classification branch are meta-learned: the meta-trained RPN learns to provide class-specific proposals, while the object classifier learns to perform few-shot classification. Meta-RCNN's new loss targets and learning strategies can be trained in an end-to-end manner. Chen et al. [9] proposed a mutual conversion between the feature space and the semantic space, enhanced the semantic features in the semantic space, and mapped the generated extra features back to the visual feature space to obtain more sample instances. Li et al. [10] proposed a covariance measurement network to optimize metric-network-based algorithms in few-shot learning tasks. For the feature representation, the algorithm designs a local-descriptor-based feature to represent the image information and calculates the corresponding covariance representation to represent the category.

3 Network Architecture

The hidden layer of the visual neocognitron contains simple cells and complex cells, which are denoted as S cells and C cells. S cells unite to form the S-cell plane, and


Fig. 1 Network architecture

Fig. 2 Concept relationships

S-cell planes unite to form the S layer, which is represented by U_s. There is a similar relationship between C cells, the C-cell plane, and the C layer (U_c). U_sl represents the S layer of level l, and U_cl represents the C layer of level l. The specific structure is shown in Fig. 1. The layer of S cells is responsible for feature extraction, and the layer of C cells tolerates feature displacement and distortion. Compared with a convolutional network, the S-layer pattern, that is, the shape model, is unique to the visual neocognitron: it provides non-linear constraints for the subsequent C-layer training and gives the model its particular advantage in achieving a better recognition effect. As shown in Fig. 2, (a) the sample is an image, represented by a two-dimensional matrix X; (b) the shape is a one-dimensional vector composed of feature points, expressed by S; (c) the mode is the 0-1 mask window of the S layer of the visual neocognitron, and the window size increases by 3^l × 3^l until it reaches the whole image size; (d) the pattern gene string is the representation of a group of 0-1 strings formed by


concatenating each row of the pattern window, which is used by the evolutionary algorithm to search for the optimal pattern automatically.

First, for each class of training samples, the visual neocognitron extracts feature points to form shapes. The feature points are extracted by SIFT [11]. After that, through window sliding and convolution operations, a feature can be identified and extracted at any position, which realizes the shift-invariance ability of the model. The extracted global features are represented by a 0-1 mask. Then the 0-1 mask window is cut in a 3^l × 3^l step-by-step manner: the mask window cut size of the S1 layer is 3 × 3, that of the S2 layer is 9 × 9, and so on up to the whole image size. The shapes obtained after cutting are used as the training vectors, namely the S cells. Generally speaking, local features of the input pattern are extracted at the shallow levels, and global features are extracted at the deeper levels. Finally, the evolutionary algorithm selects excellent training vectors to provide non-linear constraints for the training of the C layer.

The U_c layer receives the shape features from the U_s layer and extracts more complex feature information. The U_c layer is trained with an inhibition signal [12]. According to the error between the output value and the actual value, the error propagates back from the output layer through the hidden layers to the input layer. The function designed by Fukushima is simplified, and the differentiable sigmoid function is used to realize the reverse derivation and deepen the network. The inhibition signal is introduced into the action function of each layer to limit the output of each layer's shape features, and the mode shape is then adjusted to improve the localization accuracy. Compared with the training method of convolutional neural networks, the action function in this chapter is the result of an excitation signal and an inhibition signal, which is more in line with the bionic principle of the human visual cortex. The function is shown in Eq. (1):

y = W_l U_cl − b_l V_cl,    (1)

where W_l is the weight of the level-l U_c layer network, which acts as the excitation signal, b_l is set to 1 in the training phase, and V_cl is the inhibition signal acting on the C layer, with V_cl = U_cl^2 γ_l. Here γ_l is the selectivity, which controls the intensity of inhibition: the larger the value of γ_l, the more selective the response to the specified feature element.
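To make the mask-cutting step and the excitation–inhibition action function of Eq. (1) concrete, the following is a minimal sketch, not the authors' implementation: the window handling, the sigmoid squashing, and all parameter values are illustrative assumptions.

```python
import numpy as np

def cut_mask_windows(mask, level):
    """Cut a binary (0-1) mask into non-overlapping 3**level x 3**level windows.

    Each window stands in for one S-cell training vector (border handling is
    simplified in this sketch)."""
    size = 3 ** level
    h, w = mask.shape
    windows = []
    for r in range(0, h - size + 1, size):
        for c in range(0, w - size + 1, size):
            windows.append(mask[r:r + size, c:c + size].ravel())
    return np.array(windows)

def c_layer_response(u_c, w_l, gamma_l, b_l=1.0):
    """Excitation minus inhibition as in Eq. (1): y = W_l*U_cl - b_l*V_cl,
    with the inhibition signal V_cl = U_cl**2 * gamma_l, followed by the
    differentiable sigmoid mentioned in the text."""
    v_c = (u_c ** 2) * gamma_l          # inhibition signal
    y = w_l * u_c - b_l * v_c           # action function of Eq. (1)
    return 1.0 / (1.0 + np.exp(-y))     # sigmoid for back-propagation

# Illustrative usage with a random 27x27 binary mask
mask = (np.random.rand(27, 27) > 0.5).astype(float)
s1_vectors = cut_mask_windows(mask, level=1)   # 3x3 windows
s2_vectors = cut_mask_windows(mask, level=2)   # 9x9 windows
print(s1_vectors.shape, s2_vectors.shape)
print(c_layer_response(s1_vectors.mean(axis=1), w_l=1.5, gamma_l=0.3)[:5])
```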

4 Multi-Objective Optimization

In this section, we optimize the training vectors of the neocognitron proposed above through a multi-objective evolutionary algorithm, so that more representative training vectors are obtained automatically. In our proposed neocognitron, training vectors are obtained by cutting the mask. The training vectors must satisfy the following: the correlation between training vectors of the same U_s should be large enough, the correlation between training vectors of different U_s should be small enough, and a training vector whose correlation with the other training vectors is too high is redundant [13]. It is therefore quite difficult to design the training vectors, especially by hand, so we use a genetic algorithm to select them automatically. We encode each training vector as a gene string, and each group of genes represents one training vector, as shown in Fig. 3.

Fig. 3 Network structure represented by genes. C cell indicates which U_c it is joined to, S cell indicates which U_s it is trained in, and the following 0-1 string is the training vector

The training vector set is obtained by cutting the mask, and all training vectors are then encoded to form the initial population. First, the fitness value of each training vector is cleared, and ξ individuals are selected from the population to form the training vectors of each U_s in the neocognitron; the interconnection weights of the neocognitron are then obtained through training. When the training phase is over, the correct recognition rate c, the rejection rate r, and the error recognition rate e of the neocognitron are measured on the test set. In actual application scenarios, a wrong recognition by the decision-making system, such as misrecognizing A as B, can lead to serious consequences and unimaginable disasters. Users are therefore more willing to accept a high rejection rate r than a high error recognition rate e, so the value of r − e should be as high as possible. Suppose:

f_1 = r − e.    (2)

In addition, when the rejection rate is too high, the performance of the neocognitron decreases. Users also want the correct recognition rate to be as high as possible, so the value of c − r should be as high as possible. Suppose:

f_2 = c − r.    (3)

In summary, we construct a multi-objective optimization problem that maximizes f_1 and f_2:

F(c, r, e) = max{f_1, f_2}.    (4)

We use the fast non-dominated sorting genetic algorithm to solve this problem. The fitness values f_1 and f_2 of each training vector are set to the sums of the f_1 and f_2 values of the five best networks it participates in. We then sort the individuals by fast non-dominated sorting, maximizing f_1 and f_2. Let A and B be two individuals in a population. If f_1 of A is greater than f_1 of B but f_2 of A is less than f_2 of B, then A does not dominate B, and A and B are placed on the same Pareto level. If f_1 of A is greater than f_1 of B and f_2 of A is greater than f_2 of B, then A dominates B. The number of individuals each individual dominates and the number of individuals it is dominated by are calculated, and the population is divided into multiple Pareto levels. An elite strategy is used to carry the excellent individuals of the parent generation over to the offspring, to prevent the loss of Pareto-optimal solutions. Going from the lowest Pareto level to the highest, entire levels of parent individuals are placed into the offspring population until placing a whole level would exceed the population size, so that the whole level can no longer be inserted. The individuals of that level are then sorted by crowding distance: for each objective function, the average distance between the two neighboring points on either side of a given point is computed, and this value is used as an estimate of the perimeter of the cuboid whose vertices are the nearest neighbors. Individuals with a large crowding distance are preferentially selected, so that the decision variables are evenly distributed in the objective space. According to the crowding distance, individuals from the level that could not be placed as a whole are selected until the offspring population is full. Finally, crossover and mutation operations are performed on the individuals: δ individuals are randomly selected for crossover, each pair being split at a random position and the parts exchanged, and ε individuals are randomly selected for mutation, in which a random bit of the gene is flipped from 0 to 1 or from 1 to 0. The above operations are repeated to continuously evolve the population, and the Pareto-optimal solutions are finally obtained.
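A minimal sketch of the selection step described above (fast non-dominated sorting followed by crowding-distance selection over the two objectives f_1 = r − e and f_2 = c − r). The random fitness values and the population size below are placeholders, not the chapter's experimental settings.

```python
import numpy as np

def fast_non_dominated_sort(f1, f2):
    """Split individuals into Pareto fronts; both objectives are maximized."""
    n = len(f1)
    dominates = [set() for _ in range(n)]        # dominates[a]: individuals dominated by a
    dominated_count = np.zeros(n, dtype=int)     # number of individuals dominating a
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            if f1[a] >= f1[b] and f2[a] >= f2[b] and (f1[a] > f1[b] or f2[a] > f2[b]):
                dominates[a].add(b)
            elif f1[b] >= f1[a] and f2[b] >= f2[a] and (f1[b] > f1[a] or f2[b] > f2[a]):
                dominated_count[a] += 1
    fronts, current = [], [i for i in range(n) if dominated_count[i] == 0]
    while current:
        fronts.append(current)
        nxt = []
        for a in current:
            for b in dominates[a]:
                dominated_count[b] -= 1
                if dominated_count[b] == 0:
                    nxt.append(b)
        current = nxt
    return fronts

def crowding_distance(front, f1, f2):
    """Normalised gap between each point's neighbours, summed over the objectives."""
    dist = {i: 0.0 for i in front}
    for f in (f1, f2):
        order = sorted(front, key=lambda i: f[i])
        dist[order[0]] = dist[order[-1]] = float("inf")
        span = max(f[order[-1]] - f[order[0]], 1e-12)
        for k in range(1, len(order) - 1):
            dist[order[k]] += (f[order[k + 1]] - f[order[k - 1]]) / span
    return dist

# Illustrative selection of pop_size training vectors from random fitness values
rng = np.random.default_rng(0)
f1, f2 = rng.random(20), rng.random(20)          # stand-ins for r - e and c - r
pop_size, selected = 10, []
for front in fast_non_dominated_sort(f1, f2):
    if len(selected) + len(front) <= pop_size:
        selected += front
    else:
        d = crowding_distance(front, f1, f2)
        selected += sorted(front, key=lambda i: d[i], reverse=True)[:pop_size - len(selected)]
        break
print(selected)
```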

5 Object Localization with the Neocognitron

In this section, we show the effect of our proposed algorithm in actual scenarios. We use an unmanned aerial vehicle (UAV) to shoot videos in simulated post-disaster scenes, extract the frames, and integrate them into a data set, which we name Drone. Figure 4 shows part of the scenes of the Drone data set. There is obvious scene clutter, and we split the training and test images 5 to 1. During the flight of the UAV, we record the position of the UAV and then locate the detected objects according to that position. Here we use a monocular camera to locate the person in the image according to our previous research [14]. As shown in Fig. 5, we detect people in the image with our proposed algorithm and then localize them to determine their location on the map. We mark the recorded location of the drone and the location of the detected person on the map and then compare the obtained coordinates of the person with the real coordinates. The results show that the algorithm performs well in precise positioning.


Fig. 4 Part of the pictures in the Drone data set we collected

Fig. 5 The location of the drone and the location of the person located on the map. The red icon indicates the UAV, and the white icon indicates the person located

6 Conclusion

In this chapter, we propose new shape templates to improve the traditional neocognitron for few-shot learning. First, global features are extracted from the image to form a 0-1 mask. Then the 0-1 mask window is cut in a 3^l × 3^l step-by-step manner. Second, the shapes obtained after cutting are selected by evolutionary computation, which provides a non-linear constraint for the convolution layer. Third, since the response of the human visual cortex is the result of excitation and inhibition signals, inhibition signals are introduced into the convolution layer. Finally, we apply our proposed algorithm to object localization. According to tests in actual scenes, our proposed algorithm achieves high positioning accuracy.


Acknowledgments This work is supported by Ministry of Science and Technology China (MOST) Major Program on New Generation of Artificial Intelligence 2030 No. 2018AAA0 102200. It is also supported by Natural Science Foundation China (NSFC) Major Project No. 61827814 and Shenzhen Science and Technology Innovation Commission (SZSTI) Project No. JCYJ20190808153619413. The experiments in this work were conducted at the National Engineering Laboratory for Big Data System Computing Technology, China.

References 1. Kang, B., Liu, Z., Wang, X., Yu, F., Feng, J., Darrell, T.: Few-shot object detection via feature reweighting. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8420–8429 (2019) 2. Zhang, J., Zhao, C., Ni, B., Xu, M., Yang, X.: Variational few-shot learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1685–1694 (2019) 3. Fukushima, K.: Neural network model for a mechanism of pattern recognition unaffected by shift in position-neocognitron. IEICE Technical Report, A 62(10), 658–665 (1979) 4. Fukushima, K., Miyake, S.: Neocognitron: a self-organizing neural network model for a mechanism of visual pattern recognition. In: Competition and Cooperation in Neural Nets, pp. 267–285. Springer, Berlin, Heidelberg (1982) 5. Fukushima, K: Neocognitron trained with winner-kill-loser rule. Neural Netw. 23(7), 926–938 (2010) 6. Pouyanfar, S., Sadiq, S., Yan, Y., Tian, H., Tao, Y., Reyes, M.P., Iyengar, S.S. et al.: A survey on deep learning: Algorithms, techniques, and applications. ACM Comput. Surv. (CSUR) 51(5), 1–36 (2018) 7. Wang, Y., Yao, Q., Kwok, J.T., Ni, L.M.: Generalizing from a few examples: a survey on fewshot learning. ACM Comput. Surv. (CSUR) 53(3), 1–34 (2020) 8. Wu, X., Sahoo, D., Hoi, S.: Meta-RCNN: Meta learning for few-shot object detection. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 1679–1687 (2020) 9. Chen, Z., Fu, Y., Zhang, Y., Jiang, Y. G., Xue, X., Sigal, L.: Multi-level semantic feature augmentation for one-shot learning. IEEE Trans. Image Process. 28(9), 4594–4605 (2019) 10. Li, W., Wang, L., Xu, J., Huo, J., Gao, Y., Luo, J.: Revisiting local descriptor based imageto-class measure for few-shot learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7260–7268 (2019) 11. Lowe, D.G.: Object recognition from local scale-invariant features. In: Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, pp. 1150–1157. IEEE, New York (1999) 12. Buzsáki, G.: Feed-forward inhibition in the hippocampal formation. Prog. Neurobiol. 22(2), 131–153 (1984) 13. Shi, D., Dong, C., Yeung, D.S.: Neocognitron’s parameter tuning by genetic algorithms. Int. J. Neural Syst. 9(06), 497–509 (1999) 14. Yang, S., Shi, D.: RnR: retrieval and reprojection learning model for camera localization. IEEE Access 9, 34626–34634 (2021)

Milk Temperature Control System of Calf Feeding Robot Based on Fuzzy PID Gang He, Xiaohua Cai, Yuntao Hou, and Yan Ye

1 First Section

At present, it is difficult to accurately control the temperature of dairy products during the feeding of domestic calves. In most existing control systems, the PID algorithm still dominates the low-level control because of its simplicity, good adaptability, convenient adjustment, and other advantages, which achieve an adequate control effect for most systems. However, as the controlled objects become increasingly complex, the PID control algorithm also exposes its shortcomings. With appropriate parameters, the PID control algorithm can handle most control tasks, but its dynamic performance does not adapt well. The parameters of the PID algorithm need to be adjusted manually, and their adaptability is poor. When the controlled object changes suddenly, the PID parameters must be retuned to bring the system to a new balance and remove the instability of the temperature control system. Based on the PID control algorithm, this chapter uses the advantages of intelligent control algorithms to overcome the disadvantages of the PID control algorithm. On this basis, a Simulink simulation block diagram is established in MATLAB,

Research and Application of Key Technology Research and Application 2019YFE0125400 G. He College of Engineering, China Agricultural University, Beijing, China Hohhot Branch of Chinese Academy of Agricultural Mechanization Sciences Co., Ltd., Hohhot, China X. Cai () · Y. Hou · Y. Ye Hei Long Jiang Academy of Agricultural Machinery Sciences, Harbin, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 P. S. Stanimirovi´c et al. (eds.), 6th EAI International Conference on Robotic Sensor Networks, EAI/Springer Innovations in Communication and Computing, https://doi.org/10.1007/978-3-031-33826-7_3


and the fuzzy PID controller is designed to realize steady-state precision and adaptive temperature control. This is of great significance for improving the survival rate and stable growth of calves, the economic benefit of the pasture, and the healthy and sustainable development of animal husbandry.

2 Research on Fuzzy PID Control Algorithm of Temperature Control System

2.1 Control Object Modeling

The input of the heat exchanger is 0 ∼ 220 V alternating current. After heating, the electric heating wire in the interlayer releases heat and transmits it to the final heating object (water) through the heat conduction layer. Through the PT100 temperature sensor, which reaches into the final heating object (water), the water temperature is taken as the output of the controlled object. The structure of the heat exchanger is shown in Fig. 1.

The electric heating furnace is essentially described by the relation between the temperature T_f of the heated object in the furnace and the time t, so the model is a one-dimensional relation. After being electrified, the resistance wire in the electric heating furnace begins to generate heat. In the heat conduction model, the following variables are used:

(1) A is the heat transfer area.
(2) T_1 is the temperature of the resistance wire bushing after heating.
(3) Q_1 is the steady value of the heat transferred from the resistance wire bushing to the final heating object.
(4) ΔQ_1 is the incremental value of the heat transferred from the resistance wire bushing to the final heating object.
(5) Q_0 is the steady value of the heat lost from the resistance wire bushing to the outside of the heating furnace.
(6) ΔQ_0 is the incremental value of the heat lost from the resistance wire bushing to the outside of the heating furnace.
(7) T_f is the temperature of the final heated object.

The instantaneous value of the difference between the heat Q_1 transferred from the resistance wire bushing to the final heating object and the heat Q_0 lost from the resistance wire bushing to the outside of the heating furnace is equal to the rate of change of the temperature of the final heating object:


Fig. 1 Heat exchanger architecture

dT_f/dt = ΔQ_1 − ΔQ_0.    (1)

In Eq. (1), ΔQ_1 is caused by the temperature change of the resistance wire bushing. When the physical structure of the heat exchanger remains unchanged, ΔQ_1 is in direct proportion to ΔT_1:

ΔQ_1 = K_T1 ΔT_1,    (2)

where K_T1 is the proportionality coefficient between ΔQ_1 and ΔT_1. The relationship between the heat Q_0 lost from the resistance wire bushing to the outside of the heating furnace and the temperature T_f of the final heating object is as follows:

Q_0 = A √(2g T_f) = K_1 √T_f,    (3)

where A is the heat dissipation area of the resistance wire bushing. Equation (3) is a non-linear relation. Linearization is carried out near the equilibrium point (T_0, Q_0):

R = ΔT_f / ΔQ_0.    (4)

Substituting formulas (3) and (4) into formula (1), we can get

ΔQ_1 − ΔT_f / R = A dT_f/dt.    (5)

Rearranging the formula,

A R dT_f/dt + ΔT_f = R ΔQ_1.    (6)

Then, letting T = RA and K = R, it can be arranged as

T dT_f/dt + ΔT_f = K ΔQ_1.    (7)

Therefore, the transfer function from the temperature of the final heated body to the change in heat transferred from the resistance wire bushing to the final heated body can be obtained as

G(s) = ΔT_f(s) / ΔT_1(s) = K / (T s + 1).    (8)

The above mathematical model is an approximation and differs from the actual controlled object. For example, the heat transfer between the resistance wire and the resistance wire bushing and between the resistance wire bushing and the final heated object takes a certain transmission time, that is, a delay τ_0 before the next object is affected. It is therefore more accurate to replace the control variable Δu(t) in the differential equation by Δu(t − τ_0), and the corresponding transfer function should contain a delay term. The final mathematical model of the heating system can then be expressed as

G(s) = K e^{−τ_0 s} / (T s + 1).    (9)

Parameter values are calculated according to the physical meaning of the parameters in the model:

G(s) = 2 e^{−s} / (15 s + 1).    (10)

So far, the modeling process of the controlled object is completed.
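As a quick sanity check of the model in Eq. (10) (K = 2, T = 15 s, τ_0 = 1 s), the step response of this first-order-plus-dead-time plant can be reproduced by simple Euler integration. This is only an illustrative sketch, not the MATLAB/Simulink model used in the chapter; the step size and horizon are assumptions.

```python
import numpy as np

# First-order-plus-dead-time plant of Eq. (10): G(s) = K * exp(-tau0*s) / (T*s + 1)
K, T, tau0, dt = 2.0, 15.0, 1.0, 0.1
steps = int(200 / dt)
delay = int(tau0 / dt)

u = np.ones(steps)            # unit step input (heat increment)
y = np.zeros(steps)           # temperature increment of the heated water
for k in range(1, steps):
    u_delayed = u[k - delay] if k >= delay else 0.0
    # Euler discretisation of T*dy/dt + y = K*u(t - tau0)
    y[k] = y[k - 1] + dt / T * (K * u_delayed - y[k - 1])

print(round(y[-1], 3))        # approaches the steady-state gain K = 2
```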


2.2 Design of PID Controller

In PID control, the deviation e(t) = r(t) − c(t) is obtained as the difference between the set value and the actual value of the controlled object. After the proportional, integral, and differential operations on the deviation, the results of each part are summed to give the control quantity. In closed-loop control systems, PID controllers optimize the dynamic performance of the system in real time by adjusting the proportional gain, the integral time constant, and the derivative time constant. This chapter uses the Ziegler–Nichols (Z-N) parameter tuning formulas, which provide general theoretical and technical guidance for PID parameter tuning. The Z-N tuning equations are closely related to the coefficients of the mathematical model of the controlled system. According to the formulas in Table 1, we can calculate the proportional gain, integral time constant, and derivative time constant of the PID controller, giving K_P = 9, T_i = 2.2, T_d = 0.5. Then the closed-loop PID control circuit of the boiler temperature control system is built in the Simulink interface, as shown in Fig. 2. The default settings of the system are: the maximum outlet water temperature is 45 °C and the minimum outlet water temperature is 39 °C; the inlet water temperature ranges from 10 °C to 18 °C. In the simulation, the outlet water temperature set-point is 42 °C, the simulation time is 800 s, and the PID parameters are set to the values calculated by the Z-N method. The simulation results are shown in Fig. 3.

Table 1 PID parameter tuning formulas of the Z-N method

Controller   K_p           T_i    T_d
P            T/(Kτ)        —      —
PI           0.9 T/(Kτ)    3τ     —
PID          1.2 T/(Kτ)    2τ     0.5τ

Fig. 2 Closed-loop PID control-loop block diagram
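The Z-N entries of Table 1 can be evaluated directly for the plant parameters identified above (K = 2, T = 15, τ = 1); the small sketch below reproduces the PID row. The chapter reports T_i = 2.2, so the integral time appears to have been rounded or adjusted slightly from 2τ.

```python
# Ziegler-Nichols reaction-curve tuning (PID row of Table 1) for the plant of Eq. (10)
K, T, tau = 2.0, 15.0, 1.0               # gain, time constant, delay

Kp = 1.2 * T / (K * tau)                  # proportional gain
Ti = 2 * tau                              # integral time constant
Td = 0.5 * tau                            # derivative time constant
print(Kp, Ti, Td)                         # 9.0 2.0 0.5
```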


Fig. 3 Temperature closed-loop response curve

It can be seen from the figure that there is a large-amplitude oscillation in the system, which does not meet the engineering requirements. In order to improve the control effect, the parameters were adjusted to K_P = 3, T_i = 1, and T_d = 0.3, giving Fig. 4. Although the number of oscillations is clearly smaller, the overshoot is still large. Adjusting the parameters to K_P = 1, T_i = 0.1, T_d = 0.3 gives Fig. 5. From Figs. 3, 4, and 5, we can see that the system's response time is reduced, but there is still a large overshoot. To get the best control effect, the parameters are tuned further to obtain Fig. 6, with the adjusted parameters K_P = 0.75, T_i = 0.05, T_d = 0.3. The PID control algorithm can handle most temperature control tasks, but its dynamic performance does not adapt well: the algorithm parameters need to be tuned manually, and their adaptability is poor. Faced with rapid changes in the controlled object, the PID parameters must be retuned to bring the system to a new equilibrium, otherwise the system becomes unstable. The fuzzy control algorithm is therefore introduced into the traditional heater control system to form an intelligent control system: fuzzy control rules are used to adaptively change the PID parameters online, forming a self-tuning fuzzy PID control system. This gives a better control effect and better control of the temperature object.


Fig. 4 Temperature closed-loop response curve

Fig. 5 Temperature closed-loop response curve


Fig. 6 Trim parameters

2.3 Design of Fuzzy PID Controller

This chapter adds fuzzy control theory to the PID controller. The fuzzy PID controller takes the response deviation and the rate of change of the deviation as inputs and uses fuzzy control rules to modify the PID controller parameters P, I, and D in real time through parameter increments, so that they meet the requirements of the deviation and its rate of change at different times.

The default settings of the system are: the maximum outlet water temperature is 45 °C and the minimum outlet water temperature is 37 °C; the inlet water temperature ranges from 0 °C to 20 °C. Therefore, let the basic domain of the error E be [−45, 45] and the basic domain of the error change rate EC be [−8, 8]. Having determined the domains of the exact quantities, we choose their graded quantization domains as E = {−6, −5, −4, −3, −2, −1, 0, 1, 2, 3, 4, 5, 6} and EC = {−6, −5, −4, −3, −2, −1, 0, 1, 2, 3, 4, 5, 6}. At the same time, in order to carry out fuzzy reasoning, it is necessary to discretize the basic universe into the fuzzy universe of the fuzzy subsets, which is realized through the quantization factors. From the formula for the quantization factor of the graded universe:

K_e = 2 × 6 / (45 − (−45)) = 6/45.    (11)

K_ec = 2 × 6 / (8 − (−8)) = 3/4.    (12)

The outputs of the fuzzy controller are ΔK_p, ΔK_i, ΔK_d. The fuzzy universe of ΔK_p, ΔK_i, ΔK_d is {−3, −2, −1, 0, 1, 2, 3}, and their basic domains are [−0.3, 0.3], [−0.06, 0.06], and [−3, 3], respectively. The scale factors K_ΔKp, K_ΔKi, K_ΔKd are calculated as follows:

K_ΔKp = 0.3 / 3 = 0.1.    (13)

K_ΔKi = 0.06 / 3 = 0.02.    (14)

K_ΔKd = 3 / 3 = 1.    (15)
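The quantization and scale factors of Eqs. (11)–(15) are simple ratios between each variable's basic domain and its graded fuzzy universe; the sketch below just restates that arithmetic for the values used in this chapter.

```python
def quantization_factor(universe_max, domain_lo, domain_hi):
    """Maps a physical basic domain onto a symmetric fuzzy universe [-n, n]."""
    return 2 * universe_max / (domain_hi - domain_lo)

Ke = quantization_factor(6, -45, 45)      # error E          -> 6/45, Eq. (11)
Kec = quantization_factor(6, -8, 8)       # error change EC  -> 3/4,  Eq. (12)

def scale_factor(domain_max, universe_max):
    """Maps a fuzzy output universe [-n, n] back to the physical increment range."""
    return domain_max / universe_max

K_dKp = scale_factor(0.3, 3)              # 0.1,  Eq. (13)
K_dKi = scale_factor(0.06, 3)             # 0.02, Eq. (14)
K_dKd = scale_factor(3, 3)                # 1.0,  Eq. (15)
print(Ke, Kec, K_dKp, K_dKi, K_dKd)
```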

For the actual working conditions of the temperature system, the linguistic variables of the fuzzy universes of the deviation, the deviation change rate, and the adjustment values ΔK_p, ΔK_i, ΔK_d of the fuzzy PID controller parameters are set as {NB, NM, NS, ZO, PS, PM, PB}: negative big, negative medium, negative small, zero, positive small, positive medium, and positive big.

2.3.1 Controller Design of Membership Function

Considering the complexity of the actual system and other characteristics, the membership functions of the deviation, the deviation change rate, and the control parameter increments ΔK_p, ΔK_i, ΔK_d are all chosen as triangular functions, which require little computation and little memory. The structure of the membership functions is shown in Fig. 7.

Fig. 7 Membership function of fuzzy PID controller

2.3.2 Controller Control Rule Design

In order to make the fuzzy PID constant-temperature controller work reliably, it is necessary to design the fuzzy control rules for ΔK_p, ΔK_i, ΔK_d reasonably for the operating conditions of the system:

(1) In the initial stage, the system deviation of the temperature control system is large. In order to speed up the response of the system, a larger proportional coefficient, a smaller integral coefficient, and a smaller differential coefficient should be used.
(2) When the temperature control system is close to the set temperature, in order to prevent overshoot, a smaller proportional coefficient, an integral coefficient that is small or zero, and a larger differential coefficient should be used.
(3) When the temperature control system produces overshoot, an appropriate proportional coefficient and a larger differential coefficient should be adopted.

The fuzzy control rules for parameter ΔK_p are shown in Table 2, those for ΔK_i in Table 3, and those for ΔK_d in Table 4.

Table 2 Fuzzy control rules of parameter ΔK_p

E\EC  NB  NM  NS  ZO  PS  PM  PB
NB    PB  PB  PM  PM  PS  PS  ZO
NM    PB  PB  PM  PM  PS  ZO  ZO
NS    PM  PM  PM  PS  ZO  NS  NM
ZO    PM  PS  PS  ZO  NS  NM  NM
PS    NS  NS  ZO  PS  PS  PM  PM
PM    PM  ZO  NS  NM  NM  NB  NB
PB    PB  NS  NS  NM  NM  NB  NB

Table 3 Fuzzy control rules of parameter ΔK_i

E\EC  NB  NM  NS  ZO  PS  PM  PB
NB    NB  NB  NB  NM  NM  ZO  ZO
NM    NB  NB  NM  NM  NS  ZO  ZO
NS    NM  NM  NS  NS  ZO  PS  PS
ZO    NM  NS  NS  ZO  PS  PM  PM
PS    NS  NS  ZO  PS  PS  PM  PM
PM    ZO  ZO  PS  PM  PM  PB  PB
PB    ZO  ZO  PS  PM  PM  PB  PB

Table 4 Fuzzy control rules of parameter ΔK_d

E\EC  NB  NM  NS  ZO  PS  PM  PB
NB    PS  PS  ZO  ZO  ZO  PB  PB
NM    NS  NS  NS  NS  ZO  NS  PM
NS    NB  NB  NM  NS  ZO  PS  PM
ZO    NB  NM  NM  NS  ZO  PS  PM
PS    NB  NM  NS  NS  ZO  PS  PS
PM    NM  NS  NS  NS  ZO  PS  PS
PB    PS  ZO  ZO  ZO  ZO  PB  PB

According to the control rules set for the system, the fuzzy inference operation yields the adjustment values of the PID control parameters. Due to the complexity of the process, it is not described in more detail here. The fuzzy control quantity obtained by fuzzy inference must still be converted into a crisp value. After defuzzification of the fuzzy control quantities, the three adjustment values of the PID control parameters are obtained as

ΔK_P = Σ_{j=1}^{7} μ(ΔK_Pj) · ΔK_Pj / Σ_{i=1}^{7} μ(ΔK_Pi).    (16)

ΔK_i = Σ_{j=1}^{7} μ(ΔK_ij) · ΔK_ij / Σ_{i=1}^{7} μ(ΔK_ij).    (17)

ΔK_d = Σ_{j=1}^{7} μ(ΔK_dj) · ΔK_dj / Σ_{i=1}^{7} μ(ΔK_di).    (18)

For the corresponding fuzzy subsets μ(ΔK_Pi), μ(ΔK_ij), μ(ΔK_dj), the calculated values of the PID control parameters K_p, K_i, K_d are as follows:

K_p = K_p0 + ΔK_P,
K_i = K_i0 + ΔK_i,    (19)
K_d = K_d0 + ΔK_d,

where K_p0, K_i0, K_d0 are the initial values of the control parameters.
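A minimal sketch of the self-tuning step of Eqs. (16)–(19): triangular memberships over the seven linguistic terms, a rule-table lookup, weighted-average (centroid-style) defuzzification, and the increment added to the initial parameter. The membership widths, the placeholder rule table, and the initial K_p0 below are illustrative assumptions rather than the chapter's exact design; in practice Tables 2–4 supply the rule outputs.

```python
import numpy as np

TERMS = ["NB", "NM", "NS", "ZO", "PS", "PM", "PB"]
CENTERS = np.array([-3, -2, -1, 0, 1, 2, 3], dtype=float)   # fuzzy universe of the increments

def tri_membership(x, centers, width):
    """Triangular membership degree of x in each of the seven terms."""
    return np.clip(1.0 - np.abs(x - centers) / width, 0.0, 1.0)

def fuzzy_increment(e_q, ec_q, rule_table):
    """Eqs. (16)-(18): weighted average of the outputs of all fired rules."""
    mu_e = tri_membership(e_q, np.linspace(-6, 6, 7), width=2.0)
    mu_ec = tri_membership(ec_q, np.linspace(-6, 6, 7), width=2.0)
    num, den = 0.0, 0.0
    for i in range(7):
        for j in range(7):
            w = min(mu_e[i], mu_ec[j])                 # firing strength of rule (i, j)
            num += w * CENTERS[TERMS.index(rule_table[i][j])]
            den += w
    return num / den if den > 0 else 0.0

# Placeholder 7x7 rule table (each row constant); Table 2 would be used in practice
rule_dKp = [["PB"] * 7, ["PB"] * 7, ["PM"] * 7, ["PM"] * 7, ["PS"] * 7, ["PS"] * 7, ["ZO"] * 7]

Kp0 = 0.75                                  # assumed initial proportional gain
Ke, Kec, K_dKp = 6 / 45, 0.75, 0.1          # quantization and scale factors from Eqs. (11)-(13)
e, ec = 4.0, -1.0                           # current deviation and its change rate
dKp = K_dKp * fuzzy_increment(Ke * e, Kec * ec, rule_dKp)
Kp = Kp0 + dKp                              # Eq. (19)
print(round(Kp, 4))
```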

3 Analysis of Simulation Results

According to the above analysis and design, the Simulink program shown in Fig. 8 is built in MATLAB, with a step input used as the expected value of the system. The control effect of the fuzzy PID controller is then compared with that of the conventional PID controller to verify it. The simulation results are compared as follows: (1) the simulation result is shown in Fig. 9; (2) the comparison curves of the simulation results are shown in Figs. 10 and 11. Compared with conventional PID, fuzzy self-tuning PID has a faster response, shorter settling time, better stability, and smaller overshoot, which greatly


Fig. 8 Simulation program of fuzzy PID control algorithm

Fig. 9 Temperature closed-loop response curve of fuzzy PID control algorithm

improves the dynamic performance of the system. When the step disturbance signal appears, both of them can overcome the disturbance quickly, but in comparison, the anti-interference ability of fuzzy PID is better than that of conventional PID.

4 Conclusion

In the temperature control of the heat exchanger, compared with the conventional PID control algorithm, the temperature control algorithm based on fuzzy self-tuning PID not only has the advantages of conventional PID but also has the adaptability and flexibility of fuzzy control. This shows that the method has obvious advantages in system robustness and stability when applied to the temperature control system.


Fig. 10 Comparison of temperature response curves of two control modes (no interference)

Fig. 11 Comparison of temperature response curves of two control modes (with interference)


Bluff: A Multi-Robot Dispersion Based on Hybrid Reciprocal Velocity Obstacles to Solve the Blind Man’s Buff Problem Gokul P

1 Introduction

Taking inspiration from the social behavior of insects, swarm robots have come a long way from fiction to real-life implementation [1, 2]. In areas such as emergency response and rescue [3, 4] and military applications [5], multi-robot systems have proven to be superior to individual robots when it comes to navigation and exploration in unknown environments. Dispersion of multi-robot systems is one of the most vital tasks in a decentralized network. Over the years, multiple algorithms have been put forward and proven viable for planning the path of each node so that the nodes disperse efficiently under various constraints, while allowing for decentralized communication mechanisms and balancing efficiency, exploration, and exploitation with a varying number of robots. Recent advancements in communication technology have fast-tracked the development of cooperative control to make sure the trajectory is collision-free and as natural as possible. This has in turn reduced the complexity of some arduous tasks such as exploration, target search, and disaster mitigation [6] by utilizing multi-robot systems rather than single deployed robots. The research in the field has attracted many roboticists. Multi-robot aggregation [7], dispersion, task allocation [8], motion coordination, and communication [9] are some of the main tasks involved in an end-to-end multi-robot system working on a specific task. This chapter concentrates on the dispersion of multi-robot systems in a very specific case, known as the Blind Man's Buff, that can be considered analogous to applications where the

Gokul P () Manipal Institute of Technology, Manipal, India © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 P. S. Stanimirovi´c et al. (eds.), 6th EAI International Conference on Robotic Sensor Networks, EAI/Springer Innovations in Communication and Computing, https://doi.org/10.1007/978-3-031-33826-7_4


area is bounded and cluttered with obstacles, and the robots are being hunted down or sought by other robots, humans, or other similar agents. The main aim of dispersion is to cover the maximum area for the robots to perform their tasks while remaining in contact with each other. An aspect that is generally cast aside during dispersion is the collisions that can occur within the cohort of robots if the environment is bounded and compact. This becomes more problematic if there are obstacles, and the situation can get completely out of hand if the obstacles are dynamic. Certain tasks such as disaster mitigation involve immediate dispersion of robots after aggregation, and this should not come at the cost of collisions with other robots or obstacles. Prior work, including extended exploration by dispersion, gradient descent-based dispersion [10], and the clique-intensity algorithm [11], has been developed to modulate multi-robot systems. Even though the aforementioned works have been applied to unexplored environments, they fall short when there is a possibility of a collision while dispersion is underway. A collision could render more than one robot useless, thereby reducing the number of usable particles. Bluff is an implementation of hybrid reciprocal velocity obstacles [12], itself a derivative of velocity obstacles [13] and reciprocal velocity obstacles [14], applied to the dispersion of the robots. It takes care of some of the main shortcomings of the previously proposed works, providing a collision-free, oscillation-free, and natural motion of the robots. It is a decentralized [15] approach with no communication between the robots, be they seekers or cohort robots. The remainder of this chapter is organized as follows: Sect. 2 introduces the problem statement, and Sect. 3 gives the background of the existing algorithms used to formulate the approach. In Sect. 4, the proposed approach is described in detail. We evaluate our approach in Sect. 5. Finally, Sect. 6 concludes the chapter and discusses future work.

2 Problem Formulation

Let there be n particles that form the multi-robot system and m particles that include static obstacles, dynamic obstacles, and robot seekers. As introduced in the previous section, the role of a seeker is to collide with a robot and render it useless. Each robot i of the multi-robot system has a radius ρ_ri, a velocity v_ri, and a position p_ri, and each of the (static and dynamic) obstacles and seekers has a radius ρ_oi, a velocity v_oi, and a position p_oi. The n robots start at a common point, and the goal is to disperse all n robots to different user-defined waypoints p_ri^goal with a goal velocity v_ri^goal, which is the maximum velocity the robot can travel at when the environment is uncluttered. The robots have to compute and choose the right trajectory and velocity v_ri to reach the goal waypoint while making sure that there is no collision with the robots of the


same cohort, seekers, or obstacles. The robots do not communicate with each other, making sure that the entire workflow is decentralized. To put it in simpler words, it is an implementation of Blind Man's Buff, but for robots.

Blind Man's Buff  Blind Man's Buff is a game popularized in Asia and is a variant of yet another popular game, tag. Here the seeker, popularly called "it," is blindfolded, and the objective is to come in contact with as many players as possible. It is played in a bounded, cluttered environment. In our case, the "it" or seeker can be multiple robots, and to make it more challenging, the obstacles can be dynamic as well. The objective of the game remains the same, i.e., to come in contact with as many particles as possible. But unlike the game, the robots do not stay in contact with each other, essentially forming a decentralized network.

3 Background Information

Velocity Obstacles  A very basic explanation of velocity obstacles can be given using Fig. 1, where we take a single robot R and a single obstacle O, with radius ρ_R for the robot, radius ρ_O for the obstacle, and distance d(R, O) between them. To make the problem easier to solve, we can shrink the robot to a point mass with negligible radius and consider the obstacle to have a radius of ρ_R + ρ_O. This does not alter the dynamics and at the same time makes the problem easier to solve. We need to find the velocity at which the robot can travel so that there is no collision. Drawing the tangents from the point R to the inflated obstacle as shown in the figure, we can calculate the angles β and θ as follows:

Fig. 1 Conversion of robots to point objects by scaling down




β = asin((ρ_R + ρ_O) / d(R, O)),    (1)

θ = atan2(O_y − R_y, O_x − R_x).    (2)

Any unit vector whose angle lies in the range [θ − β, θ + β] will incur a collision. Therefore, the collision cone CC defined by the obstacle is given by

CC(O) = {v : ∃λ, q + λv ∩ R(O) ≠ ∅},    (3)

where CC(O) is the collision cone for obstacle O and R(O) is the spatial occupancy of O, that is, the area covered by the obstacle O with radius ρ_R + ρ_O. R(O) can be formally written as

R(O) = {q : d(q, O) < ρ_R + ρ_O}.    (4)

Considering the velocity component V, it is the relative velocity that must not lie in CC(O):

V_θ(O) − V_O = CC(O),    (5)

where V_θ(O) is the set of absolute velocities that result in a collision, i.e., the velocity obstacle for O, and V_O is the velocity of the obstacle. Rearranging the equation,

V_θ(O) = CC(O) ⊕ V_O,    (6)

where ⊕ is the Minkowski addition. The final velocity obstacle can be seen in Fig. 1. Static obstacles are a special case of dynamic obstacles and seekers in which the velocity and direction of movement are zero.

Reciprocal Velocity Obstacles  Even though the velocity obstacle approach largely solves the problem of collision-free navigation, the trajectory is not smooth. As the agents approach each other, a kind of oscillation occurs: although the collision is averted, the motion seems unnatural. This is one of the disadvantages of velocity obstacles. To avoid this unnatural motion [14, 16], we take the average of the current velocity of the agent and a new velocity that is inside the velocity obstacle. This ensures that the whole responsibility to avoid collisions does not rest on a single particle but is divided (equally or unequally) between a pair of particles. The reciprocal velocity obstacle of robot A moving with velocity v_A, for an obstacle B moving with velocity v_B, is given as

RVO_B^A(v_B, v_A) = {v'_A | 2v'_A − v_A ∈ CC_B^A(v_B)},    (7)
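A small sketch of the velocity-obstacle test built from Eqs. (1)–(6): the obstacle is inflated by the robot radius, the cone half-angle β and bearing θ are computed, and a candidate velocity is flagged as colliding if the relative velocity points into the cone. This is an illustrative check, not the chapter's implementation; the example positions and velocities are assumptions.

```python
import math

def in_velocity_obstacle(p_r, p_o, rho_r, rho_o, v_r, v_o):
    """Return True if robot velocity v_r lies inside the VO induced by obstacle O."""
    dx, dy = p_o[0] - p_r[0], p_o[1] - p_r[1]
    dist = math.hypot(dx, dy)
    combined = rho_r + rho_o                      # inflated obstacle radius
    if dist <= combined:
        return True                               # already overlapping
    beta = math.asin(combined / dist)             # Eq. (1): cone half-angle
    theta = math.atan2(dy, dx)                    # Eq. (2): bearing to obstacle
    rvx, rvy = v_r[0] - v_o[0], v_r[1] - v_o[1]   # relative velocity, Eqs. (5)-(6)
    if rvx == 0 and rvy == 0:
        return False
    diff = (math.atan2(rvy, rvx) - theta + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= beta                      # inside the collision cone

print(in_velocity_obstacle((0, 0), (5, 0), 0.5, 0.5, (1.0, 0.05), (0.0, 0.0)))  # True
print(in_velocity_obstacle((0, 0), (5, 0), 0.5, 0.5, (0.0, 1.0), (0.0, 0.0)))   # False
```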


Fig. 2 Depiction of hybrid reciprocal velocity obstacle

where CC is the collision cone defined in the previous subsection. However, we do not always give equal priority to the agents; generalizing the equation,

RVO_B^A(v_B, v_A, α_B^A) = {v'_A | (1/α_B^A) v'_A + (1 − 1/α_B^A) v_A ∈ CC_B^A(v_B)}.    (8)

If α = 0.5, the priority is divided equally between the agents.

Hybrid Reciprocal Velocity Obstacles  Even though the reciprocal velocity obstacle does a very good job of producing a more natural trajectory, the oscillations discussed in the previous subsection can get worse and snowball into what is known as the reciprocal dance. This happens when the agents do not agree on which side to pass; it sometimes even leads to deadlocks or collisions. The hybrid reciprocal velocity obstacle was introduced to solve this problem. Taking Fig. 2, we see that v_A is to the left of the center line of RVO_B^A(v_B, v_A) and v_B is to the left of the center line of RVO_A^B(v_A, v_B), so robot A must choose to pass obstacle B on the left. We can accomplish this by enlarging the reciprocal velocity obstacle on one side. As proven in the original paper [12], if the wrong turn is taken by either robot, full priority is given to the other.
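The α-weighted RVO of Eq. (8) can be evaluated by mapping a candidate velocity back into the plain VO test, and a new velocity can then be chosen as the sampled candidate closest to the preferred (goal) velocity that stays outside every obstacle set. The sampling strategy, α value, and the example scenario below are illustrative assumptions, not the exact HRVO construction of [12].

```python
import math
import random

def in_cone(p_a, p_b, rho, rel_v):
    """True if the relative velocity rel_v points into the collision cone of B."""
    dx, dy = p_b[0] - p_a[0], p_b[1] - p_a[1]
    dist = math.hypot(dx, dy)
    if dist <= rho:
        return True
    beta = math.asin(rho / dist)
    diff = (math.atan2(rel_v[1], rel_v[0]) - math.atan2(dy, dx) + math.pi) % (2 * math.pi) - math.pi
    return bool(rel_v[0] or rel_v[1]) and abs(diff) <= beta

def in_rvo(cand, v_a, v_b, p_a, p_b, rho, alpha=0.5):
    """Eq. (8): cand is inside the RVO iff cand/alpha + (1 - 1/alpha)*v_a lies in the VO."""
    mapped = (cand[0] / alpha + (1 - 1 / alpha) * v_a[0],
              cand[1] / alpha + (1 - 1 / alpha) * v_a[1])
    return in_cone(p_a, p_b, rho, (mapped[0] - v_b[0], mapped[1] - v_b[1]))

def choose_velocity(v_pref, v_a, p_a, rho_a, neighbours, samples=300, v_max=5.0):
    """Pick the admissible sampled velocity closest to the preferred velocity."""
    best, best_cost = (0.0, 0.0), float("inf")
    for _ in range(samples):
        ang, speed = random.uniform(-math.pi, math.pi), random.uniform(0.0, v_max)
        cand = (speed * math.cos(ang), speed * math.sin(ang))
        if any(in_rvo(cand, v_a, v_b, p_a, p_b, rho_a + rho_b)
               for (p_b, v_b, rho_b) in neighbours):
            continue
        cost = math.hypot(cand[0] - v_pref[0], cand[1] - v_pref[1])
        if cost < best_cost:
            best, best_cost = cand, cost
    return best

# One seeker approaching head-on from the right
print(choose_velocity((5.0, 0.0), (5.0, 0.0), (0.0, 0.0), 0.5,
                      [((6.0, 0.0), (-2.0, 0.0), 0.5)]))
```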

4 Proposed Implementation

The proposed implementation is divided into two parts. Algorithm 1 depicts the procedure followed by the robots to optimize their trajectory so as to avoid obstacles and seekers; the obstacles can be both dynamic and static. Here we first construct the collision cones, followed by the center line of the RVO. The velocity is then compared, replaced, and a new velocity is computed and given to the robot. Algorithm 2 depicts the procedure followed by the robots to avoid collisions


Algorithm 1 Algorithm to avoid obstacles and seekers
Assumptions: The robots have been aggregated to a common point
Input: R, the list of robots in the multi-robot system; O, the seekers and obstacles
1: for R_i ∈ R do
2:   Extract p_Ri, v_Ri
3:   for O_j ∈ O do
4:     Extract p_Oj, v_Oj
5:     if v_Oj = 0 then
6:       Construct CC_Ri^Oj and CC_Ri^Oj*
7:     end if
8:     if v_Oj > 0 then
9:       Construct cones for CC_Ri^Oj and RVO_Ri^Oj
10:      Construct the center line of RVO_Ri^Oj
11:      if v_R is on the left of the center line then
12:        (right) RVO_Ri^Oj ←replace→ (right) CC_Ri^Oj = HRVO_Ri^Oj
13:      end if
14:      if v_R is on the right of the center line then
15:        (left) RVO_Ri^Oj ←replace→ (left) CC_Ri^Oj = HRVO_Ri^Oj
16:      end if
17:    end if
18:  end for
19:  Compute the new preferred velocity
20:  Compute the new velocity ∈ HRVO_R*
21:  Apply the velocity to the robot
22: end for

with other robots in the same cohort. This follows 1 where we construct the collision cones and RVO. The velocities are compared and adjusted.

5 Results and Simulations 5.1 Implementation Details The proposed algorithms are implemented in Python with ROS-Melodic as the middleware software suite and executed on a laptop running Ubuntu 18.04 with Intel i5-6300HQ @2.30 GHz CPU and 8 GB of RAM. The project made use of a custom multi-robot testbed using Pioneer as the mobile robot. The simulation has been done on Gazebo. Figure 3 shows the path followed by the multi-robot system in a cluttered environment. The environment is bounded and has static obstacles and seekers goal that are dynamic. The .vr is .5 ms−1 , and the velocity of .seeker1 is .6 ms−1 and −1 .seeker2 is .4 ms . The first subfigure depicts the initial starting position, and the 3rd subfigure depicts the case when the robot is about to come in contact with the seekers. The arrows show the direction of motion.

Bluff: A Multi-Robot Dispersion Based on Hybrid Reciprocal Velocity Obstacles

47

Algorithm 2 Algorithm to avoid robots from the same cohort during dispersion Assumptions: The robots have been aggregated to a common point Input: R List of robots in the multi-robot system 1: for .Ri ∈ R do 2: Extract .pRi , .vRi 3: for .Rj ∈ R do 4: Extract .pRj , .vRj 5: 6: 7: 8: 9: 10:

R

R

replace

R

Construct cones for .CCRij and .RV ORij R Construct Center line of .RV ORi j if .vR is on left of Center line then R

R

(right).RV ORij ←−−→ (right) .CCRij = .H RV ORij end if if .vR is on right of Center line then R

replace

R

R

11: (left).RV ORij ←−−→ (left) .CCRij = .H RV ORij 12: end if 13: end for 14: Compute new preferred velocity 15: Compute new Velocity .∈ H RV OR ∗ 16: Apply velocity to the robots 17: end for

Figure 4 plots out the trajectory of yet another case with two sub-cases. In the figure, green circles depict the multi-robot system with a radius .ρri , and the aqua colored circles depict the seekers and dynamic obstacles with a radius .ρri that are moving with a velocity equivalent to 2 units per second. The first subfigure is the initial starting position of the robots, and the second, third, and fourth figures show how the robot disperses and avoids the first seeker using hybrid reciprocal velocity obstacles. The fifth and 6th figures clearly show how the hybrid algorithm tackles the problem of reciprocal dance; in the fifth subfigure, the speed of the second seeker has been reduced to 1 unit per second, and in the sixth figure, it is 2 units per second. In both cases, the robot has made sure to disperse with a natural trajectory changing its speed and avoiding contact with the seeker.

5.2 Solution Quality Dispersion in a cluttered bounded environment is the main objective of the paper. To make sure that no edge case is left out, the project has pushed the limits of the simulation by reducing the environment, increasing the particles as well as increasing the number of seekers and obstacles. The obstacles and the seekers are randomly initialized with random velocities ranging between .1 ms−1 and .10 ms−1 . The multi-robot system is initialized at a predefined position and has a predefined final location.

14

14

12


8

(f)

Fig. 3 This figure plots out the trajectory followed by the multi-robot system in a cluttered environment. Green color circles are the multi-robot system, and the aqua colored circles are the seekers. The red line depicts the trajectory followed by the multi-robot system. e and f are separate cases where the velocities of the seekers are different. In case of e, the seeker is going at a much slower velocity than f. (a) Initial position. (b) Dispersion and collision avoidance. (c) Dispersion and collision avoidance. (d) Dispersion and collision avoidance. (e) Trajectory planning without oscillations when the speed of the seeker is less. (f) Trajectory planning without oscillations when the speed of the seeker is more

Fig. 4 Simulation of the algorithm using Pioneer 3-AT robots. The aqua colors depict the multi-robot system, the red colored circles depict the seekers, and blue depicts the obstacles. The arrows point in the direction of velocity. (a) Initial position of the multi-robot system. (b) Dispersion and collision avoidance. (c) Dispersion and collision avoidance. (d) Final positions


Table 1 Time taken in seconds by the multi-robot systems to reach the destined waypoints without collision. Any collision would be awarded with 0 points. The simulations have been averaged over 100 epochs to document the values in the table

Number of particles n | Velocity obstacles (VO) | Reciprocal velocity obstacles (RVO) | Hybrid reciprocal velocity obstacles (HRVO)
2  | 3.5712s  | 3.3139s  | 3.3019s
3  | 3.9013s  | 3.6158s  | 3.5901s
5  | 4.3210s  | 4.0019s  | 3.8061s
10 | 15.1103s | 12.6039s | 7.9103s
15 | 30.1189s | 23.9901s | 16.6013s
20 | 41.8016s | 38.6132s | 26.5702s
25 | NA       | 45.7108s | 38.3090s

Table 1 documents the time taken by the multi-robot system to disperse to the final waypoints when it follows velocity obstacles, reciprocal velocity obstacles, and hybrid reciprocal velocity obstacles. The simulations were iterated for 100 epochs keeping the number of particles constant, and the environment was highly cluttered. As is clear from the table, the hybrid reciprocal velocity obstacles outperform the other algorithms. This is even more visible when the number of particles is more than 5, i.e., when the environment becomes very compact and complicated to navigate. Algorithms such as velocity obstacles or reciprocal velocity obstacles then fail to plan the path, either because of edge cases like the reciprocal dance or due to oscillations.

6 Conclusion

This work presented a modular dispersion algorithm for multi-robot systems. The algorithm assumes initial aggregation prior to dispersion. Simulations have demonstrated that the hybrid reciprocal velocity obstacle-based dispersion provides significant improvements not only in the time taken but also in the trajectory produced: the trajectory is more natural, less oscillatory, and collision-free. The effectiveness of the algorithm is most apparent when the number of particles is more than five and the environment is bounded, small, and highly cluttered. The distinctiveness of the dispersion algorithm is that it focuses on acute situations, including when the number of robots in the multi-robot system is greater than 30. Future work will include extending the algorithm by fusing values from multiple sensors, thereby helping the robots localize at every stage.


References 1. Hinchey, M., Sterritt, R., Rouff, C.: Swarms and swarm intelligence. Computer 40, 111–113 (2007) 2. Brambilla, M., Ferrante, E., Birattari, M., Dorigo, M.: Swarm robotics: a review from the swarm engineering perspective. Swarm Intell. 7, 1–41 (2013) 3. Carrillo-Zapata, D., Milner, E., Hird, J., Tzoumas, G., Vardanega, P., Sooriyabandara, M., Giuliani, M., Winfield, A., Hauert, S.: Mutual shaping in swarm robotics: user studies in fire and rescue, storage organization, and bridge inspection. Frontiers in Robotics and AI 7, 53 (2020) 4. Arnold, R., Yamaguchi, H., Tanaka, T.: Correction to: search and rescue with autonomous flying robots through behavior-based cooperative intelligence. Journal of International Humanitarian Action 4, 12 (2019) 5. Sangeetha, M., Srinivasan, K.: Swarm robotics: a new framework of military robots. J. Phys. Conf. Ser. 1717, 012017 (2021) 6. Yie, Y., Solihin, M., Kit, A.: Development of swarm robots for disaster mitigation using robotic simulator software. In: Proceedings of the 9th International Conference on Robotic, Vision, Signal Processing and Power Applications, pp. 377–383 (2016) 7. Zhao, H., Nie, Z., Wang, X.: Design and analysis of multi-robot grouping aggregation algorithm. J. Robot. Networking and Artificial Life 6, 60 (2019) 8. Khamis, A., Hussein, A., Elmogy, A.: Multi-robot task allocation: a review of the state-of-theart. Cooperative Robots and Sensor Networks 2015, 31–51 (2015) 9. Doriya, R., Mishra, S., Gupta, S.: A brief survey and analysis of multi-robot communication and coordination. In: International Conference on Computing, Communication and Automation (2015) 10. Bayert, J., Khorbotly, S.: Robotic swarm dispersion using gradient descent algorithm. In: Proceedings of the 2019 IEEE International Symposium on Robotic and Sensors Environments (ROSE) (2019) 11. Ludwig, L., Gini, M.: Robotic swarm dispersion using wireless intensity signals. In: Distributed Autonomous Robotic Systems, vol. 7, pp. 135–144 (2006) 12. Snape, J., Berg, J., Guy, S., Manocha, D.: The hybrid reciprocal velocity obstacle. IEEE Trans. Robot. 27, 696–706 (2011) 13. Wilkie, D., van den Berg, J., Manocha, D.: Generalized velocity obstacles. In: 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems (2009) 14. van den Berg, J., Lin, M., Manocha, D.: Reciprocal velocity obstacles for real-time multi-agent navigation. In: 2008 IEEE International Conference on Robotics and Automation (2008) 15. Claes, D., Tuyls, K.: Multi robot collision avoidance in a shared workspace. Auton. Robot. 42, 1749–1770 (2018) 16. van den Berg, J., Snape, J., Guy, S., Manocha, D.: Reciprocal collision avoidance with acceleration-velocity obstacles. In: 2011 IEEE International Conference on Robotics and Automation (2011)

Factors Influencing the Adoption of Robo Advisory Services: A Unified Theory of Acceptance and Use of Technology (UTAUT) Model Approach Ankita Bhatia, Arti Chandani, Pravin Kumar Bhoyar, and Rajiv Divekar

1 Introduction

Robo advisory services are now becoming more apparent and prominent in the wealth management landscape [4]. They are built upon mathematical algorithms that run the software, and the logic of these algorithms is derived from big data about investors. Once investors are onboarded on the Robo advisor's portal, the next step is the risk profiling stage, which is questionnaire-based and captures their risk appetite and goals. These services are still in their infancy. Assets Under Management (AUM) in the Robo advisor segment equal US$980,541 m in 2019, and the average AUM per user in the Robo advisor segment amounts to US$21,421 in 2019, but countries like India present an altogether different scenario. The financial market is easier to comprehend when the human emotions associated with it are understood [23]. People preferring traditional services have different characteristics from those who prefer online services: people with less income, lower net worth, and little inheritance were the ones who preferred Robo advisory services, and they were found to be less impulsive [6]. Finance professionals encounter a major challenge while dealing with millennials, namely a lack of trust. Millennials tend to prefer and are fascinated by Robo advisory services, as they have small amounts to invest and want 24/7 access to their portfolio [16]. The day when Robo advisory services become popular enough to replace traditional financial advisors in the market is, however, still far away.

A. Bhatia · P. K. Bhoyar · R. Divekar Symbiosis Institute of Management Studies, Symbiosis International (Deemed University), Pune, India A. Chandani () Jaipuria Institute of Management, Lucknow, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 P. S. Stanimirovi´c et al. (eds.), 6th EAI International Conference on Robotic Sensor Networks, EAI/Springer Innovations in Communication and Computing, https://doi.org/10.1007/978-3-031-33826-7_5


Individual investors in an emerging economy like India are influenced by psychological biases that have a systematic effect on their subsequent trading behavior [14]. Indeed, these biases originate externally and are plausibly quantifiable [3]. Traditional finance theories state that the market behaves efficiently and systematically. According to the efficient market hypothesis (EMH) proposed by economists [4], all market information is reflected in the stock price because investors behave rationally. In line with this, expected utility theory (EUT) describes decisions taken by investors in an uncertain environment, where investors make rational choices after weighing the expected utilities of the possible outcomes. However, massive market anomalies were observed during the energy crisis of the 1970s that were not aligned with the EMH and EUT [11]. According to [8], "Behavioral finance is the study of cognitive errors and emotions in financial decisions." India is a developing economy whose middle-class segment has grown exponentially in income level and lifestyle. India also has masses of millennials who are newcomers to the wealth management industry, and a large share of investors cannot afford to hire a financial advisor to keep their portfolios balanced. Automated investment advice, the Robo advisory service, brings a sigh of relief to these investors. Tech-savvy individuals who are also conscious of investment and savings are expected to show enormous interest in stock market participation. Therefore, the researchers felt the need to study the adoption of Robo advisory services in the Indian context. Financial technology is one of the emerging areas of recent years, and Robo advisors are a by-product of fintech innovation in the wealth management industry. Investment is a crucial area today, as investors' increased income leads to increased savings, which in turn results in increased investment. This study is undertaken to uncover the behavioral factors affecting the investment patterns of individual investors dealing on the stock exchange. The study will be valuable for researchers interested in the significance of Robo advisory services and how this automation has tapped into the wealth management industry.

2 Literature Review and Hypotheses Formulation 2.1 Technology Readiness Index (TRI) The technology readiness index (TRI) captures the readiness of an individual to embrace newer technologies that are likely to increase performance [12]. Individuals are keen to adopt new technology provided it has a positive influence on their emotional and mental state, reflecting their personality [13]. The dimensions of TRI available in the existing literature include insecurity, discomfort, innovativeness, and optimism. The present study attempts to integrate the TRI model with the technology acceptance model (TAM) to understand investors' beliefs about using Robo advisory services in the Indian context. Perceived usefulness, the dependent variable of the study, has been used from TRI for the current study.

2.2 Technology Acceptance Model (TAM) The technology acceptance model is extensively used in the existing literature and is a well-recognized model for studies involving the adoption of new technology. The different measurements of TAM help predict consumers' attitudes toward the adoption of technology. The model has been used in varied fields such as e-commerce, mobile banking, augmented reality, and e-learning, among others. In this study, Robo advisory services are integrated with TRI and TAM 2.0 to understand the behavioral disposition of Indian retail investors toward their usage. Numerous studies have already demonstrated the positive and significant influence of the TAM constructs on consumers' adoption of technology; however, the literature on applying TAM to Robo advisory services is scant.

2.3 Perceived Usefulness Perceived usefulness relates to how an individual views a technology in terms of its ability to make them more efficient and to reduce effort [22]. If individuals see that a particular technology can bring efficiency and benefit to them, its perceived usefulness is higher, which increases the likelihood of adopting that technology. Perceived usefulness is an important indicator that not only influences the adoption of a technology but can also shape consumers' attitudes toward it. Earlier research has highlighted that perceived usefulness is a crucial determinant in adopting Robo advisory services [6]. Further, investors might make better investment decisions in the Indian stock market with the help of these technologies.

2.4 Innovativeness and Perceived Usefulness Investors are of the opinion that innovative technology can satisfy their needs and wants while being convenient and easy to use, which influences the adoption of Robo advisory services. Investors' predisposition toward new technology also helps in understanding their behavioral dimensions. Investors who find new technology beneficial, along with those who learn and experiment with new technology, tend to be more creative. These perspectives are important for gaining insight into investors' behavior and their opinion on perceived usefulness [18, 19]. Robo advisors in the fintech industry are a new technology in the Indian context, capable of providing information, knowledge, and insight for investment decision-making and of helping investors design their investment portfolios. H1: Innovativeness Positively Affects the Perceived Usefulness of Robo Advisory Services

2.5 Trust and Perceived Usefulness Trust plays a crucial role in the perceived usefulness of any new technology. Trust and usefulness are important elements when it comes to the adoption of any new technology. H2: Trust Positively Affects the Perceived Usefulness of Robo Advisory Services

2.6 Insecurity and Perceived Usefulness Investors' anxiety and stress can increase when they feel unable to control a technology and feel insecure about it. Insecurity is a crucial element in understanding why a technology is used less. Robo advisory services can affect end users, i.e., investors who are inclined toward the safe and secure investment that technology-enabled platforms can provide. These algorithm-driven Robo advisory services might help investors make decisions while remaining in control, whereas some investors might feel insecure while using the technology. The academic literature affirms that perceived usefulness is negatively influenced by insecurity. Investors who feel they cannot control their investments may experience insecurity, which affects the perceived usefulness of Robo advisory services. H3: Insecurity Negatively Affects the Perceived Usefulness of Robo Advisory Services

2.7 Identification of Research Gap It is important to understand the precursors of behavioral biases (such as decision inertia) before developing the algorithms and interfaces of digital decision support systems such as Robo advisory services. This would help decision-makers and investors overcome situations that have previously proven unfavorable [10]. The intersection of psychology and these digital advisory support systems can help address behavioral issues more effectively. Diversified and undiversified investors perform differently during volatile markets, depending on the number of stocks they hold and the market-adjusted volatility of their portfolios. There is a clear need for well-designed Robo advisory services that cater to the distinct needs of investors across different categories [12]. The impact of biases is not limited to the data-gathering process; it may eventually creep into portfolio recommendations, which may then affect investment behavior. Based on an extensive literature survey, and to the best of the researchers' knowledge, no study has been carried out to understand the factors influencing the perceived usefulness of Robo advisory services among Indian individual investors, and the present research attempts to fill this gap.

3 Research Model This research focuses on how different factors influence the adoption of Robo advisory services in the Indian context. This section details the steps involved in achieving the research objective, including the proposed research model and the data collection and data analysis process [1]. Based on the research gaps, a research model was prepared and is given in Fig. 1.

3.1 Research Methodology and Research Design The research design is the overall plan for applying statistical techniques in order to achieve the research objectives.

Fig. 1 Conceptual model. (Source: author’s representation)


• Sample population: This paper targets individual investors in the Indian stock market to explore the factors influencing the adoption of Robo advisory services.
• Sampling and data collection: The primary focus of this paper is to examine how various factors influence the adoption of Robo advisory services. To do so, a questionnaire was framed, which is a cost-effective technique for gathering primary data compared with interviews, video conferencing, and similar methods. A total of 175 questionnaires were distributed among investors, of which 62 responses could be used effectively for data analysis. The achievable sample size depended largely on convenience factors such as access to relatives, friends, and available resources.
• Statistical techniques: Quantitative statistical techniques can be applied to a minimum of 100 respondents to obtain statistically fit and significant results [7].
• Data collection mechanism: Primary data were used in the present research and collected with the help of a structured questionnaire. In addition to primary data, secondary data were used in the form of a literature review.
• Instrumentation for data collection: The questionnaire comprised three sections. Demographic questions appear first, followed by behavioral questions. The last section contains questions related to perceived usefulness. All questions are framed on an 11-point Likert scale.
• Questionnaire and scales used: The items below list the constructs and variables along with the sample items considered while framing the questionnaire.
  – Innovativeness: Five sample items were used to measure this construct [21], for example: "In general, I am not hesitant to try out new information technologies" and "I like to experiment with new technologies."
  – Trust: Three sample items were used to measure this construct [17], for example: "I believe that Robo advisors are reliable" and "In general, I trust Robo advisors for achieving my investment goals."
  – Insecurity: Four sample items were used to measure this construct [16], for example: "I fear that while I am using Robo advisory services, the battery may run out or the connection would be lost."
  – Perceived usefulness: This is the dependent variable and a reflective construct comprising five items [22], for example: "I believe using Robo advisory services will allow me to accomplish learning tasks more quickly" and "I believe using Robo advisory services would increase my productivity while investing."

Factors Influencing the Adoption of Robo Advisory Services: A Unified Theory. . .

59

3.2 Data Analysis Method After the questionnaire responses were received, the raw data were first cleaned and incomplete responses were removed. The cleaned data were then analyzed with the IBM SPSS Statistics Version 25 package. Data analysis was carried out with several statistical tools, including reliability statistics such as Cronbach's alpha for all factors under consideration, exploratory factor analysis (EFA), definition of latent constructs, formulation of the measurement model, examination of the convergent and discriminant validity of the latent constructs, and model fit checks.
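As an illustration of the reliability step, the sketch below computes Cronbach's alpha with pandas/NumPy; the construct name, item columns, and randomly generated responses are placeholders, since the chapter's own analysis was carried out in IBM SPSS Statistics 25.

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a block of Likert-scale items (rows = respondents)."""
    items = items.dropna()
    k = items.shape[1]                          # number of items in the construct
    item_vars = items.var(axis=0, ddof=1)       # variance of each individual item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Illustrative stand-in for the 62 cleaned responses on an 11-point scale (0-10).
rng = np.random.default_rng(42)
base = rng.integers(0, 11, size=62)
trust_items = pd.DataFrame({
    f"TRUST{i}": np.clip(base + rng.integers(-1, 2, size=62), 0, 10) for i in (1, 2, 3)
})
print(f"Cronbach's alpha (trust): {cronbach_alpha(trust_items):.3f}")
```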

4 Empirical Findings 4.1 Demographic Information Table 1 presents frequency distributions for the demographic variables and shows that the selected sample for this piece of research is composed of 15.5% females and 84.5% males. Investors are spread across all age brackets, with 9.67% in the above-40-year category. In terms of investment experience, 30.65% of investors have 0–5 years of experience. Table 1 Frequency tables for demographic variables

Gender                              Frequency   Percent
Female                              25          15.5
Male                                37          84.5
Total                               62          100

Age (in years)                      Frequency   Percent
18–25                               8           12.90
25–30                               25          40.34
30–35                               11          17.74
35–40                               12          19.35
Above 40                            6           9.67
Total                               62          100

Investment experience (in years)    Frequency   Percent
0–5 years                           19          30.65
10–15 years                         13          20.96
15–20 years                         12          19.35
More than 20 years                  18          29.03
Total                               62          100

Source: Author's representation


Fig. 2 Structural model of factors influencing perceived usefulness. (Source: PLS-SEM output)

Figure 2 represents the structural model, which comprises four reflective constructs: innovativeness, trust, and insecurity are the three independent variables, and perceived usefulness is the dependent variable. Innovativeness is measured by five items, trust by three items, insecurity by four items, and perceived usefulness by five items. Figure 3 shows the internal consistency (reliability) assessed with Cronbach's alpha; the higher the values, the higher the reliability [9]. Figure 4 shows that the reliability values move in line with the Cronbach's alpha values; all values in the table are satisfactory to good, as they are 0.85 or higher. There are two ways to test discriminant validity. The first is the Fornell and Larcker criterion [5], shown in Fig. 5. This criterion is a matrix in which the square root of the average variance extracted (AVE) for each construct appears on the diagonal. The results can be read simply by inspecting the diagonal values: if every diagonal value is higher than the values below it in the same column, discriminant validity is established, and Fig. 5 shows that all diagonal values are indeed greater than their respective column values. The heterotrait-monotrait (HTMT) ratio was later introduced as a second test: constructs are conceptually distinct only if their HTMT values do not exceed the threshold of 0.85. Figure 6 shows that the HTMT values of all constructs are at or below 0.85. Therefore, we can safely conclude that our constructs possess discriminant validity.
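To make the Fornell–Larcker check concrete, the sketch below compares the square root of each construct's AVE with its correlations to the other constructs; the AVE and correlation values are illustrative placeholders, not the study's actual output.

```python
import numpy as np
import pandas as pd

constructs = ["Innovativeness", "Trust", "Insecurity", "PerceivedUsefulness"]

# Placeholder AVE values and inter-construct correlations (illustrative only).
ave = pd.Series([0.70, 0.75, 0.68, 0.72], index=constructs)
corr = pd.DataFrame(
    [[1.00, 0.55, -0.30, 0.60],
     [0.55, 1.00, -0.25, 0.58],
     [-0.30, -0.25, 1.00, -0.35],
     [0.60, 0.58, -0.35, 1.00]],
    index=constructs, columns=constructs)

sqrt_ave = np.sqrt(ave)
for c in constructs:
    others = corr[c].drop(c).abs()              # correlations with the other constructs
    ok = sqrt_ave[c] > others.max()             # Fornell-Larcker condition
    print(f"{c}: sqrt(AVE) = {sqrt_ave[c]:.2f}, "
          f"max |corr| = {others.max():.2f}, discriminant validity: {ok}")
```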


Fig. 3 Cronbach's alpha of different constructs influencing Robo advisory services. (Source: PLS-SEM output)

Fig. 4 Reliability and validity measures of different constructs. (Source: PLS-SEM output)

Fig. 5 Fornell and Larcker criteria for discriminant validity. (Source: PLS-SEM output)

A structural model comprising innovativeness, trust, and insecurity was estimated; as hypothesized, innovativeness, trust, and insecurity each have a significant influence on the adoption of Robo advisory services. The path coefficients indicate whether the effect of each independent variable on the dependent variable is significant, and their magnitudes, obtained with the PLS algorithm, describe the strength of the relationships. Figure 7 shows that all independent variables (marked green in the output) affect the dependent variable; therefore, it can be safely concluded that all factors significantly influence it. The R-square is the coefficient of determination; it measures the variance explained in each endogenous construct and thereby indicates the model's explanatory power [20]. The prescribed value range is between 0 and 1 [15]. The R-square value of 0.525 shown in Fig. 8 indicates that the model has substantial explanatory power.


Fig. 6 Heterotrait-monotrait (HTMT) ratio of all constructs. (Source: PLS-SEM output)

Fig. 7 Path coefficients of factors influencing adoption of Robo advisory services. (Source: PLS-SEM output)

After performing bootstrapping, we obtained p-values for all the exogenous constructs and found that all are significant; all hypotheses are therefore accepted at the 95% confidence level, as given in Table 2.


Fig. 8 R-square of factors influencing perceived usefulness of Robo advisory services. (Source: PLS-SEM output)

Table 2 Hypotheses summary

S. No.   Hypotheses   Status     p-value
1.       H1           Accepted   0.02
2.       H2           Accepted   0.04
3.       H3           Accepted   0.03

Source: Author's representation

5 Conclusions and Discussions Investors are becoming increasingly aware of the latest technological developments. It is important to identify the different predictors of, and inhibitors to, the perceived usefulness experienced by Indian individual investors. Both the predictors, trust and innovativeness, and the inhibitor, insecurity, significantly influence the adoption of Robo advisory services. Trust and personal innovativeness are two important factors for any investor to perceive a product as useful [2].

5.1 Managerial Implications The onus is now on managers, product owners, and other fintech professionals to incorporate trust-building mechanisms into this fintech product, the Robo advisory service, in order to reassure prospective and potential investors and encourage its optimal use. Innovators among the users of Robo advisors have the added advantage of being leaders in the fintech space, so it is important to keep encouraging them by adding value-added features, as they can serve as word of mouth (WOM) for laggards and potential users of such innovative technologies.


5.2 Scope and Limitations This study is purely empirical in nature and could be of considerable use to researchers. Its main focus is to study the different predictors and inhibitors that affect the adoption of Robo advisory services, and it can help in developing models that relate these critical factors to the role of Robo advisors in the decision-making of individual investors. The study was conducted in India with individual investors, which is one of its limitations. It also considers only a limited set of factors, namely innovativeness, trust, and insecurity, which is another limitation. The small sample size is a further limitation, which future researchers can overcome by conducting similar studies on larger samples.

References 1. Bell, E., Bryman, A.: The ethics of management research: an exploratory content analysis. Br. J. Manag. 18(1), 63–77 (2007) 2. Bhatia, A., Chandani, A., Chhateja, J.: Robo advisory and its potential in addressing the behavioral biases of investors – a qualitative study in Indian context. J. Behav. Exp. Financ. 25, 100281 (2020) 3. Fama, E.F.: Efficient market hypothesis. Ph.D. dissertation, University of Chicago Graduate School of Business (1960) 4. Fein, M.L.: Robo-advisors: a closer look. Available at SSRN 2658701 (2015) 5. Fornell, C.G., Larcker, D.F.: Evaluating structural equation models with unobservable variables and measurement error. J. Mark. Res. 18(1), 39–50 (1981) 6. Fulk, M., Grable, J.E., Watkins, K., Kruger, M.: Who uses Robo-advisory services, and who does not? Finan. Serv. Rev. 27(2) (2018) 7. Hair Jr., J.F., Anderson, R.E., Tatham, R.L., Black, W.C.: Multivariate Data Analysis. Prentice-Hall, Upper Saddle River (1998) 8. Hirschey, M., Nofsinger, J.R.: Investments: Analysis and Behavior, vol. 281. McGraw-Hill Irwin, New York (2008) 9. Jöreskog, K.G.: Simultaneous factor analysis in several populations. Psychometrika. 36 (1971) 10. Jung, D., Erdfelder, E., Glaser, F.: Nudged to win: designing robo-advisory to overcome decision inertia. In: Proceedings of the 26th European Conference on Information Systems (ECIS 2018), p. 19. Karlsruher Institut für Technologie (KIT) (2018) 11. Kahneman, D., Tversky, A.: Prospect theory: an analysis of decision under risk. Econometrica. 47(2), 263–292 (1979) 12. Kim, T., Chiu, W.: Consumer acceptance of sports wearable technology: the role of technology readiness. Int. J. Sports Mark. Spons. (2019) 13. Liljander, V., Gillberg, F., Gummerus, J., Van Riel, A.: Technology readiness and the evaluation and adoption of self-service technologies. J. Retail. Consum. Serv. 13(3), 177–191 (2006) 14. Pandit, A., Yeoh, K.: Psychological tendencies in an emerging capital market: a study of individual investors in India. J. Dev. Areas, 129–148 (2014) 15. Rigdon, E.E.: Rethinking partial least squares path modeling: in praise of simple methods. Long Range Plan. 45(5/6), 341–358 (2012) 16. Rourke, C.: Leveraging the Competition: How Wealth Managers Can Use Robo-Advisors to their Advantage (2019)


17. Ryu, H.S.: What makes users willing or hesitant to use Fintech?: the moderating effect of user type. Industrial Management & Data Systems (2018) 18. Shefrin, H.: Beyond Greed and Fear. Harvard Business School Press (2000) 19. Shefrin, H., Statman, M.: Behavioral portfolio theory. J. Financ. Quant. Anal. 35(2), 127–151 (2000) 20. Shmueli, G., Sarstedt, M., Hair, J.F., Cheah, J.-H., Ting, H., Vaithilingam, S., Ringle, C.M.: Predictive model assessment in PLS-SEM: guidelines for using PLSpredict. Working paper (2019) 21. Thakur, R., Srivastava, M.: Adoption readiness, personal innovativeness, perceived risk and usage intention across customer groups for mobile payment services in India. Internet Res. (2014) 22. Tversky, A., Kahneman, D.: Loss aversion in riskless choice: a reference-dependent model. Q. J. Econ. 106(4), 1039–1061 (1991) 23. Zahera, S.A., Bansal, R.: Do investors exhibit behavioral biases in investment decision making? A systematic review. Qual. Res. Finan. Markets. 10(2), 210–251 (2018)

A Multi-region Feature Extraction and Fusion Strategy Based CNN-Attention Network for Facial Expression Recognition Yanqiang Yang and Hui Zhou

1 Introduction Emotional expression is the physiological manifestation of human psychological activity and an important channel of communication and interaction. Emotion recognition has been applied to human-computer interaction [1], automatic driving [2], virtual reality (VR) [3, 4], and other tasks. Different modalities can be used for emotion recognition, such as the face [5, 6], voice signals [7], and electroencephalogram (EEG) signals [8]. Among these, facial expressions are intuitive and easy to capture, so more and more researchers focus on recognizing emotions through facial expression features. Ekman and Friesen classified facial emotion into seven basic categories: happiness, surprise, sadness, anger, disgust, fear, and neutral [9]. In the early stages, handcrafted features and traditional machine learning methods were commonly used in facial expression recognition (FER). Widely used handcrafted features include geometric appearance features [10], the histogram of oriented gradients (HOG) [11], local binary patterns (LBP) [12], SIFT [13], and Gabor features [14]. However, the limitation of these methods is that handcrafted features cannot recognize facial expressions effectively on datasets with large intra-class differences. Recently, with the rapid development of deep learning, much FER research has been conducted using various deep learning methods [15, 16]. Matsugu [17] found that the convolutional neural network (CNN) is, to some extent, robust to affine transformations in FER. Agrawal [18] evaluated different kernel sizes and filter numbers and proposed two new CNN architectures. Mollahosseini [19] proposed a FER network with parallel convolution layers. Liu [20] built feature extraction, feature selection, and emotion classification into a single unit, which strengthened the discriminative ability of the selected features. Fard [21] proposed an adaptive correlation-based loss, which expanded the differences between classes. Xie [22] proposed a branching framework that improves classification performance by aggregating multiple features. Compared with traditional emotion recognition methods, the recognition accuracy of the above methods is significantly improved. However, these models usually ignore the correlation between local facial features and whole-face features, and their performance has reached a bottleneck. In recent years, the attention mechanism has been applied to computer vision tasks [23–25], and many researchers have focused on FER that combines the attention mechanism with convolutional neural networks. Li [26] proposed an attention-CNN model to improve the recognition accuracy for occluded faces. Minaee [27] focused on utilizing multiple face areas through an attentional convolutional network, reducing the size of the network without decreasing recognition accuracy. Wang [28] proposed a region attention network, which makes the network weights more inclined toward key face areas. Zhao et al. [29] improved recognition in the wild by connecting a local attention network with ResNet-18. Li [30] extracted facial features from 24 regions of interest through convolution and then improved recognition accuracy under partial occlusion through a patch gating unit (PG Unit). Wang [28] enhanced regional weights by using self-attention and relation attention to process facial features of different sizes. Nevertheless, using the attention mechanism to improve recognition accuracy remains challenging. In this work, we propose a network framework for multi-region feature extraction and fusion. The proposed network extracts whole-face image features and salient features of the eyes and mouth, respectively; these multiple features are then fused for FER. The experimental results show that the proposed model can accurately recognize facial expressions.

Y. Yang · H. Zhou () School of Automation, Nanjing University of Science and Technology, Nanjing, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 P. S. Stanimirović et al. (eds.), 6th EAI International Conference on Robotic Sensor Networks, EAI/Springer Innovations in Communication and Computing, https://doi.org/10.1007/978-3-031-33826-7_6

2 The Proposed Method The proposed method is based on a CNN and an attention network. The idea is to preprocess the input image, detect and segment the face region, further segment the eye and mouth regions, and then feed the images of these three regions into the deep network to learn features. The method comprises three steps. First, the input image is preprocessed to segment the face area, and the eye and mouth areas are cropped at the same time. Second, the three images are fed into the CNN and attention network modules to learn features. Finally, the feature information is fused along the channel dimension through the CNN and facial expressions are classified.
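A minimal sketch of this three-region preprocessing is given below. It uses OpenCV's Haar cascade for face detection and crops the eye and mouth areas as fixed fractions of the detected face box; the fractions and output sizes are assumptions for illustration, not the authors' exact procedure.

```python
import cv2

def extract_regions(image_path: str):
    """Return (face, eyes, mouth) crops from a single image, or None if no face is found."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    face = img[y:y + h, x:x + w]
    # Assumed region fractions: eyes in the upper-middle band, mouth in the lower band.
    eyes = face[int(0.20 * h):int(0.50 * h), :]
    mouth = face[int(0.60 * h):int(0.95 * h), int(0.20 * w):int(0.80 * w)]
    # Output sizes are illustrative placeholders, not the sizes used in the chapter.
    return (cv2.resize(face, (96, 96)),
            cv2.resize(eyes, (96, 32)),
            cv2.resize(mouth, (64, 32)))
```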


Fig. 1 The proposed network architecture. The face image in the figure is taken from the JAFFE dataset with permission

2.1 Overview Figure 1 shows the proposed network architecture. The framework consists of three parts: preliminary feature extraction, high-level feature extraction, and classification with fused features. The architecture starts with the low-level feature extraction module, which acts on the images of the eye and mouth regions: the model first extracts preliminary features using a shared-structure CNN module as the backbone, and these features are then fed into the shared attention structure. The attention module can suppress non-salient features, increase the weight of useful features, and guide the network's attention toward the features that are crucial to expression recognition, enabling the network to recognize different expressions more effectively. The face image passes through a convolution layer to extract its primary features, and the output dimensions are transformed to be consistent with the other two outputs. Next, the three outputs are each fed into the CNN module with the shared structure to learn high-level features. Finally, the three features are fused along the channel dimension and input into the two-layer fully connected network for facial expression recognition.

2.2 Shared Feature Extraction Module For the FER task, facial image features are extracted, and images are classified into different categories based on the differences between the extracted features. Many facial features can be extracted by a convolutional neural network. Different features have different meanings, but not all of them are helpful for classification. In recent research [31], the attention mechanism has proven helpful for measuring the degree of attention paid to different regional features, while the channel attention and spatial attention mechanisms can select fine-grained important pixels. Figure 2 depicts the channel attention network structure. Channel attention focuses on which characteristics are meaningful for the task. Because each channel of a feature map acts as a specific feature detector, meaningful features can be extracted through channel attention. To summarize the spatial feature information, global average pooling and global max pooling are used to capture complementary information, and the two pooled descriptors are then integrated through a shared MLP. Suppose the input is F ∈ R^(C×H×W), where C is the channel dimension and H and W are the height and width dimensions; then:

MC(F) = σ1(σ21(Maxpool(F)) + σ22(Avgpool(F)))    (1)

Fig. 2 Channel attention

where MC(F) represents the output, σ1 is the final sigmoid activation function, and σ21 and σ22 denote the ReLU activation of the shared MLP layer, the second subscript indicating that the shared MLP is applied at different times (to the max-pooled and average-pooled descriptors, respectively). Spatial attention focuses on which areas are meaningful for the task; Figure 3 shows its structure. First, a max pooling layer and an average pooling layer are used to summarize the characteristics of each channel; the two resulting maps are then concatenated along the channel dimension, and the feature weight coefficients of different regions are obtained through a convolution layer and an activation function. The weight coefficient stands for the degree of significance. For an input F:

MS(F) = σ3(conv(f(Maxpool(F), Avgpool(F))))    (2)

where MS(F) represents the output, σ3 is the sigmoid activation function, conv denotes a convolution operation that integrates the two pooled channels into one channel, and f denotes the concatenation of the two pooled maps along the channel dimension. Previous studies [32] have shown that the eye and mouth areas are important for FER. The proposed method therefore sends the feature information of these two local regions to the attention network after preliminary extraction, which enhances the feature information of these regions. Assuming the input of the attention module is expressed as Fin ∈ R^(C×H×W), the output can be expressed as:


Fig. 3 Spatial attention

F0 = Fin + Fin ∗ Mc(Fin) ∗ Ms(Fin ∗ Mc(Fin)) = Fin + Mc1(Fin) ∗ Ms(Mc1(Fin))    (3)

where Mc(Fin) and Ms(Fin ∗ Mc(Fin)) represent the outputs of the channel attention network and the spatial attention network, respectively, and Mc1(Fin) = Fin ∗ Mc(Fin).
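A PyTorch sketch of the channel attention, spatial attention, and residual combination in Eqs. (1)–(3) is given below; the reduction ratio and the 7 × 7 spatial kernel are common CBAM-style defaults and should be read as assumptions rather than the exact settings of this chapter.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Eq. (1): shared MLP over global max- and average-pooled descriptors."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))        # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))         # global max pooling branch
        return torch.sigmoid(avg + mx).view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    """Eq. (2): convolution over channel-wise max and mean maps."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)
    def forward(self, x):
        mx, _ = x.max(dim=1, keepdim=True)
        avg = x.mean(dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([mx, avg], dim=1)))

class AttentionBlock(nn.Module):
    """Eq. (3): residual combination of channel and spatial attention."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()
    def forward(self, x):
        xc = x * self.ca(x)                 # Mc1(Fin) = Fin * Mc(Fin)
        return x + xc * self.sa(xc)         # F0 = Fin + Mc1(Fin) * Ms(Mc1(Fin))
```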

2.3 Shared CNN After the shared feature extraction module, the dimensions of the three features are the same. They are then input into CNNs with the same structure and finally fused in the FC layer. Assuming the outputs of the three CNNs are M1(Fin), M2(Fin), and M3(Fin), the final classification result is:

M0(Fin) = σ4(M1(Fin) ⊕ M2(Fin) ⊕ M3(Fin))    (4)

where σ4 denotes the activation functions of the two fully connected layers and ⊕ denotes concatenation along the channel dimension.

2.4 Structural Parameters of the Network This section describes the specific structural parameters of the network, whose general structure is shown in Fig. 1. In the figure, the first part is the conv1 layer and the shared feature extraction module. Here, conv1 is a convolution layer with output channel size = 64, kernel size = 4 × 4, stride = 2 × 2, and padding = 1. Table 1 displays the parameters of the shared feature extraction module. After this first stage of processing, the three inputs enter the shared CNN with the same structure, whose layers are listed in Table 2. The final part is the fully connected stage, which uses two 1024-unit fully connected layers to classify and output expressions.

Table 1 Structure of shared feature extraction module

Layer               Parameter
conv2               output channel = 64, kernel = 2 × 4, stride = 1 × 2, padding = 1
conv3               output channel = 64, kernel = 4 × 3, stride = 1 × 1, padding = 1
Channel attention   avg pooling, max pooling, FC
Spatial attention   avg pooling, max pooling, convolution

Table 2 Structure of shared CNN

Layer          Parameter
conv4          output channel = 128, kernel = 3 × 3, stride = 1 × 1, padding = 1
max pooling1   kernel = 2 × 2, stride = 2 × 2, padding = 1
conv5          output channel = 64, kernel = 3 × 3, stride = 1 × 1, padding = 1
max pooling2   kernel = 3 × 3, stride = 2 × 2, padding = 1
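Assuming each branch ends in the shared CNN of Table 2 and the fused features pass through the fully connected stage of Eq. (4), a sketch of the final layers could look as follows; the ReLU activations and the lazily inferred input size of the first fully connected layer are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class SharedCNN(nn.Module):
    """Shared CNN branch following the layer parameters listed in Table 2."""
    def __init__(self, in_channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=1),
            nn.Conv2d(128, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
    def forward(self, x):
        return self.net(x)

class FusionHead(nn.Module):
    """Eq. (4): concatenate the three branch outputs and classify with two FC layers."""
    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(1024), nn.ReLU(inplace=True),   # input size inferred at runtime
            nn.Linear(1024, num_classes))
    def forward(self, f_face, f_eyes, f_mouth):
        fused = torch.cat([f_face, f_eyes, f_mouth], dim=1)   # channel-dimension fusion
        return self.fc(fused)

# Example with placeholder shapes: three branch inputs with matching spatial size.
f1 = f2 = f3 = torch.randn(4, 64, 24, 24)
branch = SharedCNN()        # "shared structure"; sharing the weights is an assumption here
logits = FusionHead()(branch(f1), branch(f2), branch(f3))     # -> shape (4, 7)
```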

3 Experiment In this section, we verify the performance of our model on four datasets, compare it with recent work, and analyze its performance on each dataset.

3.1 Datasets The proposed model has been tested on four datasets: Cohn-Kanade (CK+) [33, 34], the Japanese Female Facial Expression dataset (JAFFE) [35], the Real-world Affective Faces Database (RAF-DB) [36], and the Facial Expression Recognition 2013 dataset (FER-2013) [37]. Figure 4 shows representative facial expression images from the four datasets.

3.1.1 CK+ Dataset

The CK+ dataset was collected from 123 participants, with a total of 593 video clips. Among these video sequences, 327 were labeled with emotions. Three frames were taken from the end of each video as data samples [33, 34].

Fig. 4 Sample images from the four datasets. From top to bottom, each row shows samples from JAFFE, CK+, RAF-DB, and FER-2013. The images are used with permission from JAFFE, CK+, RAF-DB, and FER-2013, respectively

3.1.2 JAFFE Dataset

The JAFFE dataset is composed of facial expression data from 10 Japanese women, from whom seven different basic facial expressions were captured. Each image was then scored by 60 annotators on the basis of its facial expression [35].

3.1.3 RAF-DB Dataset

The RAF-DB dataset is composed of about 30 K facial images. These images were downloaded from the Internet, and cluttered images were removed. Each image was scored by 40 annotators and divided into seven basic and compound emotion categories [36].

3.1.4 FER-2013 Dataset

The FER-2013 dataset consists of 35,886 grayscale images of size 48 × 48, divided into seven basic expressions. Since most of the data in FER-2013 were collected using web crawlers, the dataset contains many non-facial images and labeling errors [37].


3.2 Data Augmentation In deep learning, a lack of data leads to over-fitting, which reduces a model's predictive performance and generalization. Therefore, data augmentation is critical, particularly for datasets with an uneven distribution. Commonly used augmentation methods include rotating images, cropping images, changing the image size, and adding image noise. To improve the FER results and enhance the generalization performance of the network, random horizontal flipping and random 0–20° rotation are applied to the data of each dataset. In addition, because the label distribution of the RAF-DB dataset is seriously imbalanced, the anger and disgust classes, which have fewer samples, are expanded so that the amount of data for each label is similar. The disgust class of FER-2013 is also expanded because of its very limited number of samples.
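The augmentation described above (random horizontal flipping and random 0–20° rotation) can be expressed with torchvision transforms as in the sketch below; the image size is a placeholder, not a value taken from the chapter.

```python
from torchvision import transforms

# Training-time augmentation: random horizontal flip and random rotation in [0, 20] degrees.
train_transform = transforms.Compose([
    transforms.Resize((96, 96)),                 # placeholder input size
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=(0, 20)),  # angle sampled from the 0-20 degree range
    transforms.ToTensor(),
])

# Evaluation uses only deterministic preprocessing.
eval_transform = transforms.Compose([
    transforms.Resize((96, 96)),
    transforms.ToTensor(),
])
```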

3.3 Comparison with Other Works Our work achieves good results on all four datasets. Tables 3, 4, 5, and 6 show the comparisons with other works; they also report the results of the face-only convolutional neural network with the eye and mouth channels removed (ours, CNN only face). Table 3 shows that the accuracy of our method on the CK+ dataset is 99.14%. In Table 4, the recognition accuracy of our method is 93.81%, and traditional methods produced better results on this dataset. This is because the amount of data in the JAFFE dataset is small and the deep network cannot extract enough features, so the performance of deep learning is not as good as that of handcrafted features. Table 5 shows that our method performs well on RAF-DB, with an accuracy of 88.87%; as far as we know, many large pre-trained models achieve only about 85% recognition accuracy. Table 6 shows that our method's performance on the FER-2013 dataset is comparatively modest. This is mainly because the FER-2013 data are very noisy and the face area cannot be detected accurately in many images, which prevents the network from reaching its full performance.

Table 3 Comparison on CK+

Methods                            Accuracy (%)
Island loss + CNN [38]             94.35
DTAGN [39]                         97.25
ST network [40]                    98.50
DNN [41]                           97.59
FN2EN [42]                         98.60
Ours (CNN only face)               97.75
Ours (proposed CNN + attention)    99.13

Table 4 Comparison on JAFFE

Methods                            Accuracy (%)
CNN [43]                           79.06
mSVM [36]                          88.95
gaCNN [26]                         92.80
VGG19 [44]                         93.00
LBP + HOG [45]                     96.00
Ours (CNN only face)               90.63
Ours (proposed CNN + attention)    95.24

Table 5 Comparison on RAF-DB

Methods                            Accuracy (%)
FSN [46]                           81.10
gaCNN [26]                         85.07
LDL-ALSG [47]                      85.53
RAN [28]                           86.90
SCN [48]                           87.03
Ours (CNN only face)               84.35
Ours (proposed CNN + attention)    88.87

Table 6 Comparison on FER-2013

Methods                            Accuracy (%)
CNN [49]                           62.44
GoogleNet [50]                     65.20
VGG + SVM [51]                     66.17
E-fcnn [52]                        66.31
Deep Emotion [27]                  70.02
Ours (CNN only face)               65.08
Ours (proposed CNN + attention)    67.42

3.4 Result Analysis In this section, we compare performance across the above datasets, using confusion matrices to display the emotion classification results. Figure 5 shows the confusion matrices for the CK+, JAFFE, RAF-DB, and FER-2013 datasets. It can be observed that happiness and disgust are easier to identify, while neutrality and sadness are more difficult to distinguish. In the RAF-DB dataset, the recognition rates for neutrality and sadness are low; the reason may be that people sometimes appear calm when they are sad, making sadness and neutrality hard to separate. On the FER-2013 dataset, the recognition accuracy of the proposed network decreases, which may be due to inconsistent subject poses, face occlusion, side faces, invalid face images, and so on. Our network performs well on the CK+ and JAFFE datasets and can distinguish almost all expressions with high accuracy.
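A typical way to produce such confusion matrices is sketched below with scikit-learn; the label order and the randomly generated y_true/y_pred arrays are placeholders standing in for the real test-set labels and predictions.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

labels = ["anger", "disgust", "fear", "happiness", "neutral", "sadness", "surprise"]

# Placeholder labels/predictions: 200 samples, ~80% of predictions correct.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 7, size=200)
y_pred = np.where(rng.random(200) < 0.8, y_true, rng.integers(0, 7, size=200))

cm = confusion_matrix(y_true, y_pred, normalize="true")   # row-normalized accuracies
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=labels)
disp.plot(cmap="Blues", values_format=".2f")
plt.title("FER confusion matrix")
plt.show()
```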


Fig. 5 Confusion matrices of FER accuracy on the four datasets: (a) JAFFE, (b) CK+, (c) RAF-DB, and (d) FER-2013

4 Conclusion In this paper, we propose a multi-region feature fusion network to recognize facial expressions. The framework extracts whole-face features and salient features of the eye and mouth regions, allowing the network to learn enough information for expression recognition. According to the experimental results on CK+, JAFFE, RAF-DB, and FER-2013, the proposed network shows better classification accuracy than the other listed networks on these four datasets. The experimental results also show that the network has strong generalization ability for facial emotion recognition. However, the proposed network relies on face detection; therefore, in future work it is important to study how to extract local attention information under face occlusion.


References 1. Sheridan, T.B.: Human–robot interaction: status and challenges. Hum. Factors. 58(4), 525–532 (2016) 2. Assari, M.A., Rahmati, M.: Driver drowsiness detection using face expression recognition. In 2011 IEEE international conference on signal and image processing applications (ICSIPA), IEEE. pp. 337–341 (2011) 3. Hickson, S., Dufour, N., Sud, A., Kwatra, V., Essa, I.: Eyemotion: Classifying facial expressions in VR using eye-tracking cameras. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, pp. 1626–1635 (2019) 4. Chen, C.H., Lee, I.J., Lin, L.Y.: Augmented reality-based self-facial modeling to promote the emotional expression and social skills of adolescents with autism spectrum disorders. Res. Dev. Disabil. 36, 396–403 (2015) 5. Aneja, D., Colburn, A., Faigin, G., Shapiro, L., Mones, B.: Modeling stylized character expressions via deep learning. In Computer Vision–ACCV 2016: 13th Asian Conference on Computer Vision, Taipei, Taiwan, November 20–24, 2016, Revised Selected Papers, Part II 13, pp. 136–153. Springer International Publishing (2017) 6. Poulose, A., Reddy, C.S., Kim, J.H., Han, D.S.: Foreground Extraction Based Facial Emotion Recognition Using Deep Learning Xception Model. In 2021 Twelfth International Conference on Ubiquitous and Future Networks (ICUFN), IEEE, pp. 356–360. Jeju Island (2021, August) 7. Han, K., Yu, D., Tashev, I.: Speech emotion recognition using deep neural network and extreme learning machine. In Interspeech 2014. pp. 223–227 (2014) 8. Petrantonakis, P.C., Hadjileontiadis, L.J.: Emotion recognition from EEG using higher order crossings. IEEE Trans. Inf. Technol. Biomed. 14(2), 186–197 (2009) 9. Ekman, P., Keltner, D.: Universal facial expressions of e-motion. In: Segerstrale, U., Molnar, P. (eds.) Nonverbal Communication: Where Nature Meets Culture, pp. 27–46. Routledge (1997) 10. Munasinghe, M.I.N.P.: Facial expression recognition using facial landmarks and random forest classifier. In 2018 IEEE/ACIS 17th International Conference on Computer and Information Science (ICIS), IEEE. pp. 423–427 (2018) 11. Chen, J., Chen, Z., Chi, Z., Fu, H.: Facial expression recognition based on facial components detection and hog features. In International workshops on electrical and computer engineering subfields. pp. 884–888 (2014) 12. Jun, H., Jian-Feng, C.A.I., Ling-zhi, F., Zhong-Wen, H.: A method of facial expression recognition based on LBP fusion of key expressions areas. In The 27th Chinese Control and Decision Conference (2015 CCDC), IEEE. pp. 4200–4204 (2015) 13. Rister, B., Wang, G., Wu, M., Cavallaro, J.R.: A fast and efficient SIFT detector using the mobile GPU. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE. pp. 2674–2678 (2013) 14. Gu, W., Xiang, C., Venkatesh, Y.V., Huang, D., Lin, H.: Facial expression recognition using radial encoding of local Gabor features and classifier synthesis. Pattern Recogn. 45(1), 80–91 (2012) 15. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Commun. ACM. 60(6), 84–90 (2017) 16. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 770–778 (2016) 17. Matsugu, M., Mori, K., Mitari, Y., Kaneda, Y.: Subject independent facial expression recognition with robust face detection using a convolutional neural network. Neural Netw. 16(5–6), 555–559 (2003) 18. 
Agrawal, A., Mittal, N.: Using CNN for facial expression recognition: a study of the effects of kernel size and number of filters on accuracy. Vis. Comput. 36(2), 405–412 (2020)


19. Mollahosseini, A., Chan, D., Mahoor, M.H.: Going deeper in facial expression recognition using deep neural networks. In 2016 IEEE Winter conference on applications of computer vision (WACV), IEEE. pp. 1–10 (2016) 20. Liu, P., Han, S., Meng, Z., Tong, Y.: Facial expression recognition via a boosted deep belief network. In Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 1805–1812. Columbus, OH (2014) 21. Fard, A.P., Mahoor, M.H.: Ad-corre: adaptive correlation-based loss for facial expression recognition in the wild. IEEE Access. 10, 26756–26768 (2022) 22. Xie, S., Hu, H., Wu, Y.: Deep multi-path convolutional neural network joint with salient region attention for facial expression recognition. Pattern Recogn. 92, 177–191 (2019) 23. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., . . . Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) 24. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 7132–7141. Salt Lake City, UT (2018) 25. Song, S., Lan, C., Xing, J., Zeng, W., Liu, J.: An end-to-end spatio-temporal attention model for human action recognition from skeleton data. In Proceedings of the AAAI conference on artificial intelligence. vol. 31(1) (2017) 26. Li, Y., Zeng, J., Shan, S., Chen, X.: Occlusion aware facial expression recognition using CNN with attention mechanism. IEEE Trans. Image Process. 28(5), 2439–2450 (2018) 27. Minaee, S., Minaei, M., Abdolrashidi, A.: Deep-emotion: facial expression recognition using attentional convolutional network. Sensors. 21(9), 3046 (2021) 28. Wang, K., Peng, X., Yang, J., Meng, D., Qiao, Y.: Region attention networks for pose and occlusion robust facial expression recognition. IEEE Trans. Image Process. 29, 4057–4069 (2020) 29. Zhao, Z., Liu, Q., Wang, S.: Learning deep global multi-scale and local attention features for facial expression recognition in the wild. IEEE Trans. Image Process. 30, 6544–6556 (2021) 30. Li, Y., Zeng, J., Shan, S., Chen, X.: Patch-gated CNN for occlusion-aware facial expression recognition. In 2018 24th International Conference on Pattern Recognition (ICPR), IEEE. pp. 2209–2214 (2018) 31. Woo, S., Park, J., Lee, J.Y., Kweon, I.S.: Cbam: Convolutional block attention module. In Proceedings of the European conference on computer vision (ECCV). pp. 3–19 (2018) 32. Gera, D., Balasubramanian, S.: Landmark guidance independent spatio-channel attention and complementary context information based facial expression recognition. Pattern Recogn. Lett. 145, 58–66 (2021) 33. Kanade, T., Cohn, J.F., Tian, Y.: Comprehensive database for facial expression analysis. In Proceedings fourth IEEE international conference on automatic face and gesture recognition (Cat. No. PR00580), IEEE. pp. 46–53 (2000) 34. Lucey, P., Cohn, J.F., Kanade, T., Saragih, J., Ambadar, Z., Matthews, I.: The extended cohnkanade dataset (ck+): A complete dataset for action unit and emotion-specified expression. In 2010 ieee computer society conference on computer vision and pattern recognition-workshops, IEEE. pp. 94–101 (2010) 35. Lyons, M.J., Kamachi, M., Gyoba, J.: Coding facial expressions with Gabor wavelets (IVC special issue). arXiv preprint arXiv:2009.05938 (2020) 36. Li, S., Deng, W., Du, J.: Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 2852–2861. Honolulu, HI (2017) 37. Goodfellow, I.J., Erhan, D., Carrier, P.L., Courville, A., Mirza, M., Hamner, B., . . . , Bengio, Y.: Challenges in representation learning: A report on three machine learning contests. In Neural Information Processing: 20th International Conference, ICONIP 2013, Daegu, Korea, November 3–7, 2013. Proceedings, Part III 20. pp. 117–124. Springer, Berlin/Heidelberg (2013)


38. Cai, J., Meng, Z., Khan, A.S., Li, Z., O’Reilly, J., Tong, Y.: Island loss for learning discriminative features in facial expression recognition. In 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), IEEE. pp. 302–309. Xi’an (2018) 39. Jung, H., Lee, S., Yim, J., Park, S., Kim, J.: Joint fine-tuning in deep neural networks for facial expression recognition. In Proceedings of the IEEE International Conference on Computer Vision (ICCV). pp. 2983–2991. Santiago (2015) 40. Zhang, K., Huang, Y., Du, Y., Wang, L.: Facial expression recognition based on deep evolutional spatial-temporal networks. IEEE Trans. Image Process. 26(9), 4193–4203 (2017) 41. Cui, Z., Song, T., Wang, Y., Ji, Q.: Knowledge augmented deep neural networks for joint facial expression and action unit recognition. Adv. Neural Inf. Proces. Syst. 33, 14338–14349 (2020) 42. Ding, H., Zhou, S.K., Chellappa, R.: Facenet2expnet: Regularizing a deep face recognition net for expression recognition. In 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017). pp. 118–126. IEEE. Washington, DC (2017) 43. Ferreira, P.M., Marques, F., Cardoso, J.S., Rebelo, A.: Physiological inspired deep neural networks for emotion recognition. IEEE Access. 6, 53930–53943 (2018) 44. Ramalingam, S., Garzia, F.: Facial expression recognition using transfer learning. In 2018 International Carnahan Conference on Security Technology (ICCST). pp. 1–5 (2018) 45. Yaddaden, Y., Adda, M., Bouzouane, A.: Facial expression recognition using locally linear embedding with lbp and hog descriptors. In 2020 2nd International Workshop on HumanCentric Smart Environments for Health and Well-being (IHSH). pp. 221–226. Boumerdes (2021) 46. Zhao, S., Cai, H., Liu, H., Zhang, J., Chen, S.: Feature Selection Mechanism in CNNs for Facial Expression Recognition. In BMVC. vol. 12 (2018) 47. Chen, S., Wang, J., Chen, Y., Shi, Z., Geng, X., Rui, Y.: Label distribution learning on auxiliary label space graphs for facial expression recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 13984–13993. Seattle (2020) 48. Wang, K., Peng, X., Yang, J., Lu, S., Qiao, Y. Suppressing uncertainties for large-scale facial expression recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 6897–6906. Seattle (2020) 49. Liu, K., Zhang, M., Pan, Z.: Facial expression recognition with CNN ensemble. In 2016 International Conference on Cyberworlds (CW), IEEE. pp. 163–166. Chongqing, China (2016) 50. Giannopoulos, P., Perikos, I., Hatzilygeroudis, I.: Deep learning approaches for facial emotion recognition: A case study on FER-2013. Advances in Hybridization of Intelligent Methods: Models, Systems and Applications, 1–16 (2018) 51. Georgescu, M.I., Ionescu, R.T., Popescu, M.: Local learning with deep and handcrafted features for facial expression recognition. IEEE Access. 7, 64827–64836 (2019) 52. Shao, J., Cheng, Q.: E-FCNN for tiny facial expression recognition. Appl. Intell. 51, 549–559 (2021)

Real Time Surgical Instrument Object Detection Using YOLOv7 Laiwang Zheng and Zhenzhong Liu

1 Introduction In recent years, the cooperation between robots and neurosurgery has been a research hotspot, and surgical robots have gradually become a hot topic in the fields of medicine and robotics. Consequently, object detection algorithms for surgical instruments have become an active research direction in machine vision. In the past, doctors inserted endoscopes and instruments into patients and watched them on a screen. This method places high demands on the technical skill of clinicians and on the coordination between them [1]. In addition, one or more surgeons are required to operate the endoscope during the operation, and the images generated by the endoscope are affected by human factors such as the doctor's fatigue, emotional fluctuations, and hand tremor, which may even damage the organs or tissues of patients during the actual operation [2]. A robot with high detection accuracy and stability can effectively alleviate the impact of emotion and fatigue; the Da Vinci surgical robot, for example, is currently among the most advanced in the world [3]. Such surgical systems detect and analyze the surgical image in real time, extract the features of the surgical instrument tip in the image, and determine its position and spatial posture, thereby providing a basis for further surgical navigation [4]. At present, there are many interfering factors in the surgical scene that affect image quality; in addition, surgical instruments may exhibit distortions or strong artifacts, which make the detection task more difficult. The emergence of deep learning object detection algorithms largely compensates for the shortcomings of traditional algorithms. Choi et al. [5] proposed an algorithm that improves speed and accuracy by improving YOLO, a single-stage convolutional neural network (CNN). By improving Faster R-CNN, automatic detection, tracking, and classification of surgical instruments in videos of real laparoscopic surgery have been realized [6]. Chen et al. [7] combined a convolutional neural network with line segment detection to detect surgical instruments. Sarikaya et al. [8] predicted image and temporal motion cues by integrating a region proposal network (RPN) and a multimodal dual-stream CNN. Real-time detection of surgical instruments is thus an important topic in the field of surgical robots. To further improve real-time performance, accuracy, and robustness, this paper proposes a real-time detection method for surgical instruments based on YOLOv7.

L. Zheng · Z. Liu () Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, School of Mechanical Engineering, Tianjin University of Technology, Tianjin, China National Demonstration Center for Experimental Mechanical and Electrical Engineering Education, Tianjin University of Technology, Tianjin, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 P. S. Stanimirović et al. (eds.), 6th EAI International Conference on Robotic Sensor Networks, EAI/Springer Innovations in Communication and Computing, https://doi.org/10.1007/978-3-031-33826-7_7

2 Surgical Instrument Detection Algorithm 2.1 YOLO Algorithm YOLO [9] is a high-performance algorithm in the field of object detection. R-CNN [10] no longer requires as much time as traditional methods to perform object detection. However, in the first step of the R-CNN pipeline, Selective Search extracts up to several thousand candidate boxes from the original image, each of which must be classified using CNN [11] features plus an SVM; the computational load is therefore very large, making the detection speed of R-CNN very slow and the overall procedure complex. In contrast, YOLO adopts a regression approach, so detection is very fast and no complex framework is needed. The ability of the YOLO algorithm to achieve real-time object detection largely stems from these differences from R-CNN.

2.2 Network Structure of YOLOv7

YOLOv7 is an algorithm that balances model accuracy and inference performance. Recently open-sourced components, such as model re-parameterization modules, operator structures like Swin v2, and auxiliary detection branches, are incorporated, and the allocation of dynamic labels to the outputs of different layers is handled so as to improve detection accuracy. The network structure of YOLOv7 is shown in Fig. 1.
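As a concrete illustration of the re-parameterization idea (folding training-time branches into a single convolution for inference), the sketch below fuses a Conv2d and its BatchNorm into one equivalent convolution. This is the standard conv–BN folding that underlies RepConv-style modules, not the complete YOLOv7 RepConv implementation, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold a BatchNorm layer into the preceding convolution.

    W' = W * gamma / sqrt(var + eps)
    b' = beta + (b - mean) * gamma / sqrt(var + eps)
    """
    fused = nn.Conv2d(conv.in_channels, conv.out_channels,
                      conv.kernel_size, conv.stride, conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    bias = torch.zeros(conv.out_channels) if conv.bias is None else conv.bias
    fused.bias.copy_(bn.bias + (bias - bn.running_mean) * scale)
    return fused

# Quick check: the fused layer matches conv -> bn in eval mode.
conv, bn = nn.Conv2d(3, 16, 3, padding=1, bias=False), nn.BatchNorm2d(16)
bn.eval()
x = torch.randn(1, 3, 64, 64)
print(torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-5))
```

At inference time such folding removes the extra branches and normalization layers, which is why re-parameterized models keep training-time accuracy while running faster.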


Fig. 1 Network structure of YOLOv7

Fig. 2 Backbone network structure of YOLOv7

Compared with YOLOv5, the input stage of YOLOv7 uses the same preprocessing pipeline as YOLOv5. The difference is that YOLOv7 resizes the input images to 640×640 before feeding them to the backbone network, then outputs three feature maps of different sizes through the head network; the prediction results are produced by RepConv. The backbone of YOLOv7 is shown in Fig. 2. The CBS block consists of Conv + BN + SiLU. After four CBS blocks, the feature map becomes 160×160×256. It then passes through the ELAN module, which is composed of multiple CBS blocks; the input and output feature sizes remain the same throughout. Within ELAN, the first two convolutions change the number of channels, the subsequent convolutions keep their input channels consistent with their output channels, and the final CBS produces the desired number of output channels. In the backbone network, the proposed extended ELAN (E-ELAN) controls the shortest and longest gradient paths so that a deeper network can learn and converge effectively [12], and group convolution is used to increase the number of new features.
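A minimal PyTorch sketch of the CBS building block described above (Conv + BN + SiLU). The channel widths, kernel sizes, and strides below are illustrative defaults rather than the exact YOLOv7 configuration (the paper reports 256 channels at the 160×160 stage).

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Conv + BatchNorm + SiLU, the basic block of the YOLOv7 backbone."""

    def __init__(self, in_ch, out_ch, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

# Two of the four stem CBS blocks use stride 2, so a 640x640 input becomes
# a 160x160 feature map spatially (channel count here is arbitrary).
stem = nn.Sequential(CBS(3, 32), CBS(32, 64, s=2), CBS(64, 64), CBS(64, 128, s=2))
print(stem(torch.randn(1, 3, 640, 640)).shape)  # torch.Size([1, 128, 160, 160])
```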


Fig. 3 Extended efficient layer aggregation networks. (a) VoVNet. (b) CSPVoVNet. (c) ELAN. (d) E-ELAN

To continuously enhance the learning ability of the network without destroying the original gradient path, shuffle and merge cardinality are used to combine the features of different groups. This gradient-analysis-based design makes inference faster and more accurate. Figure 3 shows the main architecture of the extended efficient layer aggregation network.


The head network is a PAFPN structure similar to those of YOLOv4 [13] and YOLOv5, and the overall structure of the detection head is also similar to YOLOv5; it remains an anchor-based design. The difference is that the CSP module of YOLOv5 is replaced by the ELAN-W module and the MP structure. Like YOLOv5, YOLOv7 combines classification loss, localization loss, and confidence (objectness) loss in its loss function, which together constrain the model more completely.
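The loss terms mentioned above can be sketched as a weighted sum of objectness, classification, and box-regression components. The sketch below uses BCE for the objectness and class terms and a plain (1 − IoU) box term with made-up weights; YOLOv7 in practice uses a CIoU-style box loss and anchor matching, so treat this only as an illustration of the structure.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def iou_xywh(a, b, eps=1e-7):
    """IoU for boxes given as (cx, cy, w, h)."""
    ax1, ay1 = a[:, 0] - a[:, 2] / 2, a[:, 1] - a[:, 3] / 2
    ax2, ay2 = a[:, 0] + a[:, 2] / 2, a[:, 1] + a[:, 3] / 2
    bx1, by1 = b[:, 0] - b[:, 2] / 2, b[:, 1] - b[:, 3] / 2
    bx2, by2 = b[:, 0] + b[:, 2] / 2, b[:, 1] + b[:, 3] / 2
    iw = (torch.min(ax2, bx2) - torch.max(ax1, bx1)).clamp(0)
    ih = (torch.min(ay2, by2) - torch.max(ay1, by1)).clamp(0)
    inter = iw * ih
    union = a[:, 2] * a[:, 3] + b[:, 2] * b[:, 3] - inter + eps
    return inter / union

def detection_loss(pred_boxes, pred_obj, pred_cls, gt_boxes, gt_obj, gt_cls,
                   w_box=0.05, w_obj=1.0, w_cls=0.5):
    """Weighted sum of localization, confidence (objectness) and classification losses."""
    l_box = (1.0 - iou_xywh(pred_boxes, gt_boxes)).mean()
    l_obj = bce(pred_obj, gt_obj)
    l_cls = bce(pred_cls, gt_cls)
    return w_box * l_box + w_obj * l_obj + w_cls * l_cls

# Toy example with 8 matched predictions and 7 instrument classes.
n, c = 8, 7
loss = detection_loss(torch.rand(n, 4), torch.randn(n), torch.randn(n, c),
                      torch.rand(n, 4), torch.rand(n), torch.rand(n, c))
print(float(loss))
```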

3 Experimental Design and Implementation

3.1 Real Surgical Dataset

To enable a richer and more comprehensive evaluation on real laparoscopic (minimally invasive) cholecystectomy, this article uses the m2cai16-tool-locations dataset proposed by Jin et al. [6], who applied a region-based convolutional neural network [14] to detect the spatial boundaries of tools. The underlying m2cai16-tool dataset [15] consists of videos of cholecystectomy procedures performed at the University Hospital of Strasbourg, France, and m2cai16-tool-locations extends it with spatial annotations. Sarikaya et al. [8] addressed tool localization in videos of robot-assisted surgical training tasks using a multimodal convolutional neural network. The m2cai16-tool-locations dataset [16] was split 50%/30%/20% into training, validation, and test sets. A total of 2532 frames were selected from the roughly 23,000 available frames and annotated with 3141 spatial tool annotations covering 7 categories of surgical instruments, shown in Fig. 4: graspers, bipolar instruments, hooks, scissors, clippers, irrigators, and specimen bags. Table 1 lists the seven instrument categories and the number of corresponding annotations.
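A minimal sketch of the 50/30/20 split and a YOLO-style data configuration for the seven classes. The directory layout, file names, and class-name spellings are assumptions made for illustration; they are not the verified layout of the released m2cai16-tool-locations package.

```python
import random
from pathlib import Path

import yaml  # pip install pyyaml

random.seed(0)
CLASSES = ["grasper", "bipolar", "hook", "scissors", "clipper", "irrigator", "specimen_bag"]

# Hypothetical directory of annotated frames (one label file per image).
images = sorted(Path("m2cai16-tool-locations/images").glob("*.jpg"))
random.shuffle(images)

n = len(images)
train = images[: int(0.5 * n)]
val = images[int(0.5 * n): int(0.8 * n)]
test = images[int(0.8 * n):]

for name, split in [("train", train), ("val", val), ("test", test)]:
    Path(f"{name}.txt").write_text("\n".join(str(p) for p in split))

# YOLO-style dataset description consumed by the training script.
data_cfg = {"train": "train.txt", "val": "val.txt", "test": "test.txt",
            "nc": len(CLASSES), "names": CLASSES}
Path("surgical_tools.yaml").write_text(yaml.safe_dump(data_cfg, sort_keys=False))
print({k: len(v) for k, v in [("train", train), ("val", val), ("test", test)]})
```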

3.2 Evaluation Indicators

Evaluation metrics are the key parameters for assessing object detection algorithms. Common metrics include precision, recall, mAP, and FPS. Precision is the proportion of detected targets that are correct. Recall is the proportion of all ground-truth instances that are correctly detected, and it is an important measure of the network's performance [17]. mAP (mean average precision) is the average detection accuracy over all classes and reflects the global performance of the model. FPS (frames per second) is used to evaluate the network's speed.
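For reference, the sketch below computes the average precision for one class from detections ranked by confidence (area under the interpolated precision–recall curve). It is a simplified single-class version of the standard VOC/COCO-style procedure, with made-up inputs.

```python
import numpy as np

def average_precision(confidences, is_true_positive, num_ground_truth):
    """AP for one class: area under the precision-recall curve.

    confidences: detection scores; is_true_positive: 1 if the detection matched
    an unmatched ground-truth box (IoU above threshold), else 0.
    """
    order = np.argsort(-np.asarray(confidences))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    fp = 1.0 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / max(num_ground_truth, 1)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-9)
    # Make precision monotonically non-increasing, then integrate over recall.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    recall = np.concatenate(([0.0], recall))
    precision = np.concatenate(([1.0], precision))
    return float(np.sum(np.diff(recall) * precision[1:]))

# Toy example: 6 detections for one instrument class, 5 ground-truth boxes.
print(average_precision([0.9, 0.8, 0.7, 0.6, 0.5, 0.4], [1, 1, 0, 1, 0, 1], 5))
```

mAP is then simply the mean of the per-class AP values, and FPS is measured by timing the forward pass over the test set.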


Fig. 4 Schematic diagram of the surgical instruments: (a) scissor, (b) hook, (c) irrigator, (d) clipper, (e) grasper, (f) bipolar, (g) specimen bag

3.3 Experimental Environment

The YOLOv7 surgical instrument detection model was built in Python with CUDA support under the Windows 11 operating system, and the network was trained on an NVIDIA GeForce RTX 3050 Laptop GPU.

3.4 Experimental Results and Analysis

The test results, including the precision, recall, and FPS of the model, are shown in Table 2. Figure 5 shows the visual detection results for some surgical instruments. The P-R curves of the training results are shown in Fig. 6, with recall on the horizontal axis and precision on the vertical axis; the model is evaluated by the area enclosed under the curve, and the larger the area, the better the model. The AP of each surgical instrument class and the overall mAP are listed in Table 3 and demonstrate the efficiency and accuracy of YOLOv7 for surgical instrument detection. The training curves are shown in Fig. 7, including the mean of each loss term, the precision, and the recall on the training and validation sets. The loss reflects how far the model's predictions deviate from the ground truth, which strongly affects the performance of the model.

Table 1 Annotated surgical instrument data distribution

  Surgical instrument   Number
  Scissors              400
  Hook                  308
  Irrigator             485
  Clipper               400
  Grasper               923
  Bipolar               350
  Specimen bag          275
  Total                 3141

Table 2 Test results of surgical instruments

  Algorithm   Recall (%)   Precision (%)   FPS
  YOLOv7      93.6         94.4            112.3

Fig. 5 Test results of surgical instruments

Table 4 compares these results with those of other models; the comparison shows that the YOLOv7-based surgical instrument detection model outperforms previous detection models.


Fig. 6 P-R curves of surgical instruments

Table 3 The average detection accuracy of surgical instruments

  Surgical instrument   Precision (%)   Recall (%)   AP (%)
  Specimen bag          96.3            93.5         96.8
  Clipper               99.1            93.4         97.3
  Scissors              92.6            93.4         95.1
  Hook                  97.4            96.2         99.3
  Bipolar               92.6            96.4         96.4
  Grasper               91.1            87.9         91.2
  Irrigator             91.9            94.2         94.8
  mAP                   –               –            95.8

Fig. 7 Training results for surgical instruments

Table 4 Comparison to other models

  Methods                   mAP (%)
  Choi et al. [5]           72.3
  Jin et al. [6]            81.8
  Twinanda et al. [15]      52.5
  Sahu et al. [17]          61.5
  Raju et al. [18]          63.7
  Kyungmin et al. [19]      84.7
  Yan Wang et al. [20]      87.6
  Our model                 95.8

4 Conclusions

In this paper, a surgical instrument detection method based on the YOLOv7 convolutional neural network is evaluated experimentally. The comparison shows that this method is superior to other methods in both real-time performance and accuracy. YOLOv7, the latest model in the YOLO series, offers higher accuracy and detection speed than earlier versions and enables detection that was not feasible with them. Although YOLOv7 scores higher than other algorithms on the evaluation criteria, detection errors still occur. Because the dataset comes from real laparoscopic cholecystectomy procedures, in which the surgical instruments and the endoscope move irregularly, the experiments are extremely challenging, and interference factors such as occlusion of the instruments by tissues and organs, motion blur, and insufficient lighting affect the detection accuracy. By improving the quality and quantity of the dataset to increase the richness and complexity of the image scenes, and by using higher-end experimental equipment, the performance of YOLOv7 for surgical instrument detection can be improved further.

Acknowledgments This study was supported by the National Natural Science Foundation of China (Grant No. 61873188).

References

1. Wang, Y., et al.: Visual detection and tracking algorithms for minimally invasive surgical instruments: a comprehensive review of the state-of-the-art. Robot. Auton. Syst., 103945 (2021)
2. Yang, C., Zhao, Z., Sanyuan, H.: Image-based laparoscopic tool detection and tracking using convolutional neural networks: a review of the literature. Comput. Assisted Surg. 25(1), 15–28 (2020)
3. Rosen, J., Hannaford, B., Satava, R.M.: Surgical Robotics: Systems Applications and Visions, pp. 199–217. Springer, New York (2011)
4. Zhou, J., Payandeh, S.: Visual tracking of laparoscopic instruments. J. Autom. Control Eng. 2(3), 234–241 (2014)
5. Choi, B., Jo, K., Choi, S., et al.: Surgical-tools detection based on convolutional neural network in laparoscopic robot-assisted surgery. In: 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 1756–1759. IEEE (2017)
6. Jin, A., Yeung, S., Jopling, J., et al.: Tool detection and operative skill assessment in surgical videos using region-based convolutional neural networks. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 691–699. IEEE (2018)
7. Chen, Z., Zhao, Z., Cheng, X.: Surgical instruments tracking based on deep learning with lines detection and spatio-temporal context. In: 2017 Chinese Automation Congress (CAC), Jinan, pp. 2711–2714 (2017). https://doi.org/10.1109/CAC.2017.8243236
8. Sarikaya, D., Corso, J.J., Guru, K.A.: Detection and localization of robotic tools in robot-assisted surgery videos using deep neural networks for region proposal and detection. IEEE Trans. Med. Imaging 36(7), 1542–1549 (2017)
9. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
10. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017)
11. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017)
12. Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv preprint arXiv:2207.02696 (2022)
13. Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
14. Kranzfelder, M., et al.: Real-time instrument detection in minimally invasive surgery using radiofrequency identification technology. J. Surg. Res. 185(2), 704–710 (2013)
15. Twinanda, A.P., et al.: EndoNet: a deep architecture for recognition tasks on laparoscopic videos. IEEE Trans. Med. Imaging 36(1), 86–97 (2016)
16. Jin, A., et al.: Tool detection and operative skill assessment in surgical videos using region-based convolutional neural networks. In: Winter Conference on Applications of Computer Vision, abs/1802.08774, pp. 691–699 (2018)
17. Sahu, M., et al.: Tool and phase recognition using contextual CNN features. arXiv preprint arXiv:1610.08854 (2016)
18. Wang, S., Raju, A., Huang, J.: Deep learning based multi-label classification for surgical tool presence detection in laparoscopic videos. In: 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017). IEEE (2017)
19. Jo, K., et al.: Robust real-time detection of laparoscopic instruments in robot surgery using convolutional neural networks with motion vector prediction. Appl. Sci. 9(14), 2865 (2019)
20. Wang, Y., et al.: Object detection of surgical instruments based on YOLOv4. In: 2021 6th IEEE International Conference on Advanced Robotics and Mechatronics (ICARM). IEEE (2021)

A Lightweight Blockchain Framework for Visual Homing and Navigation Robots

Mohamed Rahouti, Damian M. Lyons, and Lesther Santana

M. Rahouti (✉)
Fordham University, New York, NY, USA
e-mail: [email protected]
D. M. Lyons
Fordham University, The Bronx, NY, USA
Robotics and Computer Vision Laboratory, The Bronx, NY, USA
e-mail: [email protected]
L. Santana
Fordham University, The Bronx, NY, USA
e-mail: [email protected]

1 Introduction

Visual homing is a lightweight approach to visual navigation for a mobile robot. Using stored visual information of a target "home" location, visual homing can navigate back to this home location from any other location in which it is visible, by comparing the home image with the current image [10]. It does not require a stored map of the environment and can be combined with obstacle avoidance for generality. This makes visual homing very attractive for robot platforms with low computational capacity, such as small UAV drones and ground robots [6]. It is also attractive for applications where global map information may not be available, e.g., GPS-denied or rapidly changing environments. A robot might store multiple home-location images related to the tasks or activities it needs to perform, or the images may be transmitted to it from another robot or camera system. Visual homing systems do not require GPS and can operate reliably in the absence of GPS-based localization. A key limitation of visual homing is that the home location must be within the field of view (FOV) of the robot to start homing, and this restricts the use of the method to the locality of the home location. One way to address this limitation is to break a path into "line of sight" segments or to use a stored topological map whose edges are labeled with intermediate home images. However, this additional map requirement may be problematic for lightweight implementations or for GPS-denied or rapidly changing environments. In this chapter, we propose to address the issue of the home location not being in the FOV by considering the case where the robot is part of a team of robots and leveraging camera information from other robots or camera resources in the team. The research problem of this study is therefore formulated as follows: if a robot is tasked with traveling to a home location and that location is not in its FOV, the robot attempts to find another robot in the team that can see the home location and travels toward that robot's location. To stay within the visual homing paradigm, we do not assume that the first robot has access to, or can navigate to, the spatial coordinates of the second robot. Instead, the two must identify a common visual landmark that can be used as an "intermediate" home location. While this example considers only two robots and one shared visual landmark, the approach generalizes to n robots and n − 1 landmarks, as shown in the graph in Fig. 1, which depicts a team of individual visual homing robots with visible landmarks, several of which are shared. In this chapter, blockchain technology is adopted to provide decentralized communication among visual homing robots. Such decentralized communication is trustworthy in terms of data integrity (immutability), auditability, and transparency, i.e., tamper-proof data sharing among robots as they complete their navigation tasks [4, 16].

Fig. 1 Graph of a team of robots (triangles) with visible landmarks (ellipses) for each, shown as edges from robot to landmark. The chain of robots with common visual landmarks is shown with heavier lines


The deployment of a blockchain structure in visual homing guarantees a tamper-proof record of the robots' transactions (i.e., each robot's location record) and rapid retrieval of field views to facilitate each robot's navigation task [22]. One of the most challenging parts of the problem is the video analysis and identification of common landmarks, and in this chapter we focus on building and evaluating that part. The proposed blockchain architecture ensures key security objectives: trusted data origin, permanent and instant data integrity, back-end resiliency, and trust-based accountability. The remainder of this chapter is organized as follows. Section 1 provides the background of the research problem and its challenges. Section 2 discusses the state-of-the-art literature related to our work. Section 3 presents the methodology and architectural design of the proposed framework, along with the research and practical implications of its implementation. Section 4 provides the experimental setup and the key evaluation findings demonstrating the efficiency of the proposed solution. Finally, Sect. 5 gives the concluding remarks of this study along with potential future plans.

2 State of the Art

Visual homing, originally developed as a model of animal behavior [5], has been extensively studied in robotics; see [9, 13, 15] for reviews. Its advantages are that it is lightweight and does not require a metric map data structure or GPS synchronization. The FOV restriction has been addressed by several authors. Stelzer's Trailmap approach [20] uses a map structure to link several line-of-sight landmarks together, allowing wide-area navigation. However, such a map would need to be constantly updated and shared in a changing environment, e.g., when operating outdoors in all seasons and weather, and a global map data structure might exceed the memory capacity of a small robot. Our approach instead has the robots in a team communicate and identify shared common landmarks; these landmarks, which are up to date and path specific, take the place of the intermediate line-of-sight landmarks used in other work. Identifying common visual imagery is a problem in which deep-learning methods have shown great success. For example, YOLO [17] employs a CNN architecture with 24 convolutional layers followed by two fully connected layers. The image is divided into a fixed-size grid, and when an object is recognized in a grid cell, that cell is responsible for predicting the object class probability and bounding box; this allows YOLO to propose multiple object class matches in a single image pass. Lyons and Petzinger [12] evaluated several combinations of CNN-based YOLO with SIFT-based feature recognition to identify common landmarks for two robots in a simulated urban landscape. They used YOLO to identify candidate objects and then SIFT to compare those candidates, yielding better performance than either alone. However, they propose that robots share visual information through point-to-point transmission of imagery, an approach that does not scale well to a team or to real deployment. We propose instead using a blockchain approach to decentralize communication among the team.
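As a rough illustration of the landmark-comparison step (not the exact pipeline of [12]), the sketch below compares two candidate landmark crops using OpenCV SIFT descriptors and Lowe's ratio test. The image paths are placeholders; in the full system the crops would come from YOLO detections in each robot's panorama.

```python
import cv2

def sift_match_score(img_a, img_b, ratio=0.75):
    """Count ratio-test matches between two grayscale landmark crops."""
    sift = cv2.SIFT_create()
    _, des_a = sift.detectAndCompute(img_a, None)
    _, des_b = sift.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = 0
    for pair in matcher.knnMatch(des_a, des_b, k=2):
        # Lowe's ratio test: keep a match only if it is clearly better
        # than the second-best candidate.
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good += 1
    return good

# Placeholder crops; in practice these are YOLO-detected candidate landmarks
# extracted from the panoramas of two different robots.
crop_r1 = cv2.imread("robot1_landmark.png", cv2.IMREAD_GRAYSCALE)
crop_r2 = cv2.imread("robot2_landmark.png", cv2.IMREAD_GRAYSCALE)
if crop_r1 is not None and crop_r2 is not None:
    print("shared-landmark evidence:", sift_match_score(crop_r1, crop_r2))
```

A high match count between crops from two different robots is taken as evidence that they are observing the same physical landmark.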


Blockchain technology has been integrated into a broad range of modern applications, including, but not limited to, connected and autonomous vehicles (CAVs), the Internet of Things (IoT), and robotics [2, 4, 18, 19]. The deployment of blockchain technology in robotic systems such as visual homing and navigation can be highly useful in tackling limitations beyond decentralized/distributed decision-making and security challenges [14]. Several notable studies integrate blockchain technology into robotic applications. For instance, Castelló Ferrer [6] proposed a blockchain-based framework to improve security and decision-making in robotic swarm systems, while Fernandes and Alexandre [8] developed blockchain-enabled event management for robotic platforms using Tezos technology. Collective decision-making in robotic systems has also been enhanced through blockchain: Strobel et al. [21] address the collaborative decision-making problem in the presence of Byzantine robots through smart-contract-based coordination mechanisms. Moreover, several works address data sharing and data monitoring in robotic systems. Castelló Ferrer et al. [7] developed RoboChain, a blockchain framework to enhance and secure data sharing for human–robot interaction, while Lopes et al. [11] proposed a systematic approach for monitoring a robot workspace using a blockchain-enabled 3D vision mechanism. According to recent studies [3], although state-of-the-art works have enabled specialized teams of robots to handle specific collective behaviors, such as aggregation, flocking, and foraging, there is little to no work that integrates blockchain technology into visual homing-based robotic systems. With its low operational cost, trustworthy functions, provenance, and rigorous access control, blockchain can provide enhancements not only in visual homing systems but also in new robotic use cases and applications [1]. To the best of our knowledge, no study has considered the deployment of blockchain technology in visual homing and navigation systems. The proposed framework enables individual robots in a team to efficiently share and identify up-to-date common landmarks in a timely, secure, and trustworthy manner with low operational cost and overhead.

3 Methodology and Framework Design

This section describes the architectural design and operation of our proposed blockchain-enabled visual homing robotic system. The proposed solution allows individual robots in a visual homing environment to efficiently share and identify up-to-date common landmarks at a low operational cost and in a timely manner.


Fig. 2 Illustration of the proposed blockchain-enabled visual homing framework

Robots are required to create a new transaction and add it to the ledger as soon as they complete a move to a new position. A transaction contains the robot's up-to-date visual panorama (panoramic view). A new position is registered when one or more of the following holds:

1. The robot has traveled more than a threshold distance from its prior location.
2. The current visual panorama differs by more than a threshold amount from the panorama at the last location.
3. More than a threshold amount of time has elapsed since the prior transaction.

A robot's transaction includes the visual panorama seen from the robot's new location and a unique hash value; the hash serves as a unique ID characterizing each transaction in the ledger. The structure of a transaction is depicted in Fig. 2: it consists of the transaction ID, the hash string, the transaction timestamp, and the transaction's core data. The core of the transaction is the panoramic view data, namely a sequence (list) of adjoining images with slightly overlapping fields of view that together form the robot's panorama. Furthermore, since the ledger consists of a series of blocks connected in a chain, each block comprises the core information to be stored, the hash of the data in the block, and the hash of the previous block in the chain. Hashing is performed by converting the string of the transaction's core content into a unique series of numbers and letters.
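A minimal sketch of the transaction record described above, with SHA-256 used to derive the unique ID from the transaction's core content. The field names and the panorama representation (a list of image identifiers) are illustrative assumptions, not the framework's actual data format.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import List

@dataclass
class Transaction:
    robot_id: str
    panorama: List[str]                 # ordered, slightly overlapping field-of-view images
    timestamp: float = field(default_factory=time.time)

    @property
    def tx_hash(self) -> str:
        """SHA-256 over the transaction's core content; serves as the unique transaction ID."""
        core = json.dumps({"robot": self.robot_id,
                           "panorama": self.panorama,
                           "timestamp": self.timestamp}, sort_keys=True)
        return hashlib.sha256(core.encode()).hexdigest()

tx = Transaction("R1", ["fov_01.png", "fov_02.png", "fov_03.png"])
print(tx.tx_hash[:16], "...")
```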


3.1 Motivating Example

A simple motivating example is shown in Fig. 3. In this example, R1 needs to deliver its load of grain to a distant barn storage area. R1 knows what the barn looks like and searches its visual panorama for a match but cannot find one. It concludes that the goal location is not in view and that it must use network information to travel there. As discussed, all robots enter their visual panorama information into the blockchain data structure. R1 queries panorama information from the blockchain and identifies that robot R3 has a visual panorama that includes R1's goal location. It then checks whether R1 and R3 have any common landmarks. Unfortunately, the geographic landscape (the "walls" in Fig. 3) prevents R1 from seeing anything in common with R3. At this point, R1 knows the goal location is in view of R3, but there is still no way for R1 to reach the goal. R1 therefore checks whether any panorama in the blockchain has a common landmark with R3; in this case, R2 has a single common landmark with R3. The problem of R1 finding its way to its goal can now be reduced to: can R1 find its way to the common landmark of R2 and R3? If it can, then it can look around, and the goal location should be in view. R1 next checks whether it has a common landmark with R2; in this case, there is one. R1 now has a potential path to its goal, as shown in Fig. 3:

Fig. 3 Motivating example


1. Use visual homing to reach the common landmark of R1 and R2; this is possible since that landmark is, by definition, already in the view of R1.
2. When R1 reaches this common landmark, it looks around again and identifies the common landmark of R2 and R3.
3. Use visual homing to reach the common landmark of R2 and R3.
4. When R1 reaches the common landmark of R2 and R3, it looks around and identifies the goal.
5. Use visual homing to reach the goal location.

Informally, this assumes that landmarks are spatially grouped, so that if a robot approaches within some distance of one landmark, it will see the other landmarks in that area. The approach distance must be chosen so that the landmark does not occlude some or all of the other landmarks.

3.2 Navigation Use Case

The following use case shows how a robot reaches its goal in the scenario of Fig. 3 with three visual homing robots, R1, R2, and R3:

1. R1 first checks which robot (R2, R3, or both) can see the goal location; in this case, it is R3. This is easily done by inspecting the transaction records in the last blockchain block.
2. R1 checks whether it has a common landmark with R2 by looking in the transaction records of the last block; in this case, there is one common landmark.
3. R1 next checks whether R2 and R3 have a common landmark, again from the transaction records of the last block; in this case, there is one common landmark.
4. R1 now knows the landmark path it must follow to reach the goal destination.
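A sketch of the landmark-path lookup performed by R1 in this use case: robots are nodes, two robots are adjacent if their latest transactions share a landmark, and a breadth-first search returns the sequence of intermediate landmarks to home on. The landmark labels and visibility sets below are invented purely for illustration.

```python
from collections import deque

# Latest panoramic state per robot: the set of landmark labels it can see.
visible = {
    "R1": {"tree", "bench"},
    "R2": {"bench", "fountain"},
    "R3": {"fountain", "barn"},   # R3 sees the goal ("barn")
}

def landmark_path(start, goal_landmark):
    """Return the chain of shared landmarks from `start` to a robot that sees the goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        robot, path = queue.popleft()
        if goal_landmark in visible[robot]:
            return path + [goal_landmark]
        for other in visible:
            if other in seen:
                continue
            common = visible[robot] & visible[other]
            if common:
                seen.add(other)
                queue.append((other, path + [sorted(common)[0]]))
    return None

print(landmark_path("R1", "barn"))   # ['bench', 'fountain', 'barn']
```

Each returned landmark becomes an intermediate home location for the standard visual homing behavior.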

3.3 Panoramic State Update

As discussed earlier, blocks are linked through hashing with SHA-256. Each block includes the previous block's hash, its own hash, a timestamp, and the list of all transactions broadcast by the individual robots in the visual homing team. Each time a robot in the team changes its location, a new block is created and appended to the blockchain. The newly created block includes all unchanged transactions from the previous block (for robots that remained in the same location) in addition to the new transactions of the robots that moved. The blockchain therefore always maintains the up-to-date panoramic views/states of all robots in the last block appended to the ledger.
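Continuing the sketch, a block carries the previous block's hash, a timestamp, and the full set of per-robot transactions; appending a block copies forward unchanged transactions and replaces those of robots that moved. Class and field names are illustrative, not the framework's actual API.

```python
import hashlib
import json
import time

def sha256(payload: dict) -> str:
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

class Ledger:
    def __init__(self):
        genesis = {"prev_hash": "0" * 64, "timestamp": time.time(), "transactions": {}}
        genesis["hash"] = sha256(genesis)
        self.blocks = [genesis]

    def append_block(self, new_transactions: dict):
        """New block = previous block's transactions, overridden by robots that moved."""
        prev = self.blocks[-1]
        txs = dict(prev["transactions"])          # carry forward unchanged robots
        txs.update(new_transactions)              # up-to-date panoramas for moved robots
        block = {"prev_hash": prev["hash"], "timestamp": time.time(), "transactions": txs}
        block["hash"] = sha256(block)
        self.blocks.append(block)

    def latest_panorama(self, robot_id: str):
        """Retrieval only touches the last block, which holds the up-to-date state."""
        return self.blocks[-1]["transactions"].get(robot_id)

ledger = Ledger()
ledger.append_block({"R1": ["fov_01.png", "fov_02.png"], "R2": ["fov_07.png"]})
ledger.append_block({"R1": ["fov_03.png", "fov_04.png"]})   # only R1 moved
print(ledger.latest_panorama("R2"))   # R2's panorama carried forward: ['fov_07.png']
```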


4 Evaluation

A framework similar to that of [12] is used to evaluate our contribution. Pairs of robots are placed at random locations across a 3D simulated urban landscape with a spacing of 1 to 20 m. The quality and number of common landmarks are recorded, along with latency and throughput information for the blockchain operations. The following subsection describes the testbed in more detail, and the subsequent subsection presents the results.

4.1 Testbed

The common landmark recognition testbed software is written in the widely used open-source middleware Robot Operating System (ROS).1 The 3D simulation engine Gazebo2 has been integrated with ROS to allow simulation testing of the robot software. Two Pioneer P3AT robots equipped with cameras are used in conjunction with the modified UCIC Python software from [12] for these experiments. The modifications required for blockchain usage are described below. Figure 4 shows an example scene from our ROS/Gazebo suburban simulation. The simulation models a 130 × 180 m² flat suburban area filled with grass, trees, buildings, vehicles, and other objects. The simulation runs on a Digital Storm Aventum with an Intel Core i9 processor and a GeForce RTX 3080 GPU.

4.2 Performance Evaluation

To evaluate the efficiency of the proposed solution, we examine the following key metrics:

• Blockchain update time: the time the system takes to update the ledger in accordance with FoV state changes, including message transmission/broadcast time and transaction validation.
• Panoramic view retrieval time: the time an individual team robot takes to retrieve a common landmark from the ledger in order to proceed with the navigation step.
• Landmark quality metric: the simulation model database is used to measure how good a common landmark is, as defined in [12].
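The first two metrics can be measured with a simple wall-clock harness around the ledger operations. The sketch below uses a stand-in in-memory ledger (a dictionary plus SHA-256 hashing) purely to show how the update and retrieval times reported later might be obtained; it is not the measurement code of this chapter.

```python
import hashlib
import json
import time

ledger_state = {}   # stand-in for the last block's transaction map

def timed_update(robot_id, panorama):
    start = time.perf_counter()
    core = json.dumps({"robot": robot_id, "panorama": panorama}, sort_keys=True)
    ledger_state[robot_id] = {"hash": hashlib.sha256(core.encode()).hexdigest(),
                              "panorama": panorama}
    return (time.perf_counter() - start) * 1e3   # blockchain update time, ms

def timed_retrieval(robot_id):
    start = time.perf_counter()
    _ = ledger_state[robot_id]["panorama"]
    return (time.perf_counter() - start) * 1e3   # panoramic view retrieval time, ms

update_ms = [timed_update(f"R{i % 2 + 1}", [f"fov_{i}.png"]) for i in range(40)]
retrieval_ms = [timed_retrieval("R1") for _ in range(40)]
print(f"mean update: {sum(update_ms)/len(update_ms):.6f} ms, "
      f"mean retrieval: {sum(retrieval_ms)/len(retrieval_ms):.6f} ms")
```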

1 http://www.ros.org. 2 http://gazebosim.org.


Fig. 4 Top: Example robot pair positions showing ROS/Gazebo 3D Urban landscape. Bottom: Panoramic views from robot cameras

In this evaluation, we examined the proposed framework over varying numbers of simulated positions with panoramic views to demonstrate the system's behavior under both small and large numbers of transactions (i.e., the latency/scalability trade-off). Figure 5 depicts the blockchain update time in a simulated visual homing environment comprising two team robots, R1 and R2, with 40 different panoramic views. As shown in the figure, the time associated with this operation is bounded by 1.6 × 10⁻³ ms for both robots. Similarly, Fig. 6 shows the blockchain state update time with 200 panoramic views in total; the measured time is mostly below 0.01 ms, except in rare cases where the latency approaches 0.04 ms. These experiments demonstrate the considerably low delay/latency of the blockchain update operation in our solution: when an individual robot moves to a new position, a new transaction containing its up-to-date panoramic view can be created and validated in a very short time. Furthermore, Fig. 7 shows the retrieval time in our simulated blockchain-enabled visual homing environment for team robots with 40, 200, and 500 different positions and panoramic views.


Fig. 5 Update time, including transmission time and transaction creation/validation time with 40 positions for robots R1 and R2

The retrieval time is bounded by 5.5 × 10⁻⁵ ms and is mostly below 3 × 10⁻⁵ ms, except in some rare cases where it increases slightly for a few of the 200- and 500-position trials. The latency/communication delay associated with retrieving common landmarks by individual team robots in our blockchain-enabled visual homing system is very small compared with state-of-the-art blockchain solutions. Figure 7 also shows that the overall number of positions (for the individual team robots) does not noticeably affect the time needed by individual robots to retrieve panoramic views from the blockchain. Finally, Table 1 presents the average delay/communication overhead of our blockchain solution over simulated FoVs with varying numbers of visual homing robot positions. The table demonstrates the significantly low delay (overall latency) of the key operations of the visual homing and navigation system, namely the panoramic state update and the panoramic state retrieval. It also shows that there is no loss of average landmark quality compared with the less sophisticated communication mechanism of [12].


Fig. 6 Update time, including transmission time and transaction creation/validation time with 200 positions for robots R1 and R2

4.3 Performance Discussion

4.3.1 Latency and Transmission Time Overhead

It is important to note that distributed solutions for visual homing and navigation-based robotic systems classically broadcast all panoramic images (of each robot's field of view) to all other team robots. As the visual homing team grows, the transmission time associated with broadcasting all panoramas therefore grows on the order of n² (a complete communication graph). The latency of blockchain frameworks may also become significant if team robots in a visual homing environment are used in cooperative navigation tasks, where rapid information exchange is required to orchestrate the movements of the individual robots. Further, collisions may arise when there is a mismatch between the current FoV state and the ledger transaction synchronization. A future extension will be required to improve the latency overhead and transaction throughput through a lightweight consensus mechanism, so that team robots belonging to the same FoV region do not need to wait long before accepting and processing new transactions among themselves.


Fig. 7 Retrieval time of panoramic views with 40, 200, and 500 positions

4.3.2 Scalability and Throughput

In our solution, if a large number of robots are deployed within the visual homing and navigation environment, the size of the ledger grows significantly (i.e., ledger bloat). Critical parameters, such as the block and transaction sizes and the number of transactions per block, must therefore be optimized in future work [23]. The throughput limitation of the blockchain may significantly affect FoV state update and retrieval in busy networks with a large number of team robots. A future solution could be the deployment of parallel ledgers with optimized frequency and block size for different FoV information.

4.3.3 Denial of Service (DoS)

A DoS issue may arise when an overwhelming number of team robots interact with the blockchain, leading to disruption of the visual homing network. However, the visual homing and navigation environment is generally restricted to a small or medium scale, keeping the blockchain infrastructure and network secure against flooding vulnerabilities.


Table 1 Average latency overhead examination over simulated FoVs with varying visual homing robot positions

  # Pos   AVG Ledger Update Time (ms)   AVG Panoramic View Retrieval Time (ms)   AVG Landmark Quality
  40      0.82998 × 10⁻³                1.79 × 10⁻⁵                              3.41
  200     0.100515 × 10⁻²               1.99 × 10⁻⁵                              3.35
  500     1.09 × 10⁻³                   1.98 × 10⁻⁵                              3.40

5 Conclusions

Given stored visual information of a target home location, a robot can navigate back to this location in a lightweight manner using visual homing. This approach requires the home location to be within the FOV, which limits its generality. To address this limitation, we consider the robot as part of a team and integrate blockchain technology into the team's visual homing and navigation system. Based on the decentralized nature of blockchain, the proposed solution enables team robots to share their visual homing information and to access the stored panoramic views synchronously in order to identify common landmarks and establish a navigation path. The evaluation results demonstrate the efficiency of our solution in terms of latency and delay overhead, throughput, and scalability.

References

1. Aditya, U.S., Singh, R., Singh, P.K., Kalla, A.: A survey on blockchain in robotics: issues, opportunities, challenges and future directions. J. Netw. Comput. Appl. 196, 103245 (2021)
2. Afanasyev, I., Kolotov, A., Rezin, R., Danilov, K., Kashevnik, A., Jotsov, V.: Blockchain solutions for multi-agent robotic systems: related work and open questions. arXiv preprint arXiv:1903.11041 (2019)
3. Afanasyev, I., Kolotov, A., Rezin, R., Danilov, K., Mazzara, M., Chakraborty, S., Kashevnik, A., Chechulin, A., Kapitonov, A., Jotsov, V., et al.: Towards blockchain-based multi-agent robotic systems: analysis, classification and applications. arXiv preprint arXiv:1907.07433 (2019)
4. Ali, A., Rahouti, M., Latif, S., Kanhere, S., Singh, J., Janjua, U., Mian, A.N., Qadir, J., Crowcroft, J., et al.: Blockchain and the future of the Internet: a comprehensive review. arXiv preprint arXiv:1904.00733 (2019)
5. Carwright, B.A., Collet, T.S.: Landmark learning in bees: experiments and models. J. Comp. Physiol. 151, 521–543 (1983)
6. Castelló Ferrer, E.: The blockchain: a new framework for robotic swarm systems. In: Proceedings of the Future Technologies Conference, pp. 1037–1058. Springer, Berlin (2018)
7. Castelló Ferrer, E., Rudovic, O.O., Hardjono, T., Pentland, A.S.: RoboChain: a secure data-sharing framework for human-robot interaction (2018)
8. Fernandes, M., Alexandre, L.A.: RobotChain: using Tezos technology for robot event management. In: Ledger (2019)
9. Fu, F., Lyons, D.: An approach to robust homing with stereovision. In: SPIE Defense and Security 2017 Conference on Unmanned Systems Technology XX (2018)
10. Gaussier, P., Joulain, C., Banquet, J.P., Leprêtre, S., Revel, A.: The visual homing problem: an example of robotics/biology cross fertilization. Robot. Auton. Syst. 30(1–2), 155–180 (2000)
11. Lopes, V., Pereira, N., Alexandre, L.A.: Robot workspace monitoring using a blockchain-based 3D vision approach. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (2019)
12. Lyons, D., Petzinger, N.: Visual homing for robot teams: do you see what I see? In: SPIE 2020 Conference on Unmanned Systems Technology XXIV (2020)
13. Lyons, D.M., Barriage, B., Del Signore, L.: Evaluation of field of view width in stereo-vision-based visual homing. Robotica 38(5), 787–803 (2020)
14. Menegay, P., Salyers, J., College, G.: Secure communications using blockchain technology. In: MILCOM 2018 IEEE Military Communications Conference (MILCOM), pp. 599–604. IEEE, New York (2018)
15. Nirmal, P., Lyons, D.M.: Homing with stereovision. Robotica 34(12), 2741–2758 (2016)
16. Rahouti, M., Xiong, K., Ghani, N.: Bitcoin concepts, threats, and machine-learning security solutions. IEEE Access 6, 67189–67205 (2018)
17. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv (2018)
18. Rivera, A.O.G., Tosh, D.K., Njilla, L.: Scalable blockchain implementation for edge-based Internet of Things platform. In: MILCOM 2019 IEEE Military Communications Conference (MILCOM), pp. 1–6. IEEE, New York (2019)
19. Smith, M., Castro, A., Rahouti, M., Ayyash, M., Santana, L.: ScreenCoin: a blockchain-enabled decentralized ad network. In: 2022 IEEE International Conference on Omni-layer Intelligent Systems (COINS), pp. 1–6. IEEE, New York (2022)
20. Stelzer, A., Vayugundla, M., Mair, E., Suppa, M., Burgard, W.: Towards efficient and scalable visual homing. Int. J. Robot. Res. 37(2–3) (2018)
21. Strobel, V., Castelló Ferrer, E., Dorigo, M.: Managing Byzantine robots via blockchain technology in a swarm robotics collective decision making scenario (2018)
22. Vasylkovskyi, V., Guerreiro, S., Sequeira, J.S.: BlockRobot: increasing privacy in human robot interaction by using blockchain. In: 2020 IEEE International Conference on Blockchain (Blockchain), pp. 106–115. IEEE, New York (2020)
23. Zhou, Q., Huang, H., Zheng, Z., Bian, J.: Solutions to scalability of blockchain: a survey. IEEE Access 8, 16440–16455 (2020)

Index

A
Attention mechanism, 68–70

B
Blockchain, ix, 91–103

C
Calf feeding, 27–39
Convolution neural network (CNN), ix, 67–76, 82, 89, 93

D
Deep learning, 17–19, 67, 74, 75, 82, 93
Dispersion, 41–50

E
Education, 1–4, 10, 12, 14, 15

F
Facial expression recognition (FER), 67–76
Feature fusion, 76
Few-shot learning, 17–24
Field of view (FOV), 91–93, 98, 101–103
Fuzzy PID, 27–39

I
Individual investors, 54, 57, 58, 63, 64
Inhibition signal, 21, 24

M
0-1 mask, 18, 20, 21, 24
Mobile robot, 1–15, 46, 91
Multi-objective optimization, 17–24
Multi-robot systems, 41, 42, 46–50

N
Navigation, 2, 7, 8, 15, 41, 81, 91–103

P
Perceived usefulness, 55–58, 60, 63

R
Realtime detection, 81–89
Research, ix, 1–4, 10–12, 14, 15, 23, 28–37, 41, 55–59, 69, 81, 92–94
Robo advisory services, 53–64
Robot, ix, 1–15, 27–39, 41–50, 81, 82, 85, 91–103
Robotino, 1–15

S
Smart partial least squares (PLS), 61
Surgical instruments detection, ix, 81–89
Surgical robot technology, 81, 82

T
Temperature control, 27–39

U
Unified theory of acceptance and use of technology (UTAUT) model, 53–64

V
Velocity obstacles, 41–50
Visual homing, ix, 91–103

Y
YOLOv7, ix, 81–89