Communications, Networking, and Information Systems: First International Congress, CNIS 2023, Guilin, China, March 25–27, 2023, Revised Selected Papers (Communications in Computer and Information Science 1839) 9819935806, 9789819935802

This volume constitutes selected papers presented at the First International Congress on Communications, Networking, and Information Systems, CNIS 2023, held in Guilin, China, during March 25–27, 2023.


English | Pages: 188 [181] | Year: 2023


Table of contents:
Preface
Organization
Contents
Communications and Networking
Deep Reinforcement Learning Based Vehicle Selection for Asynchronous Federated Learning Enabled Vehicular Edge Computing
1 Introduction
2 Related Works
3 System Model
4 Parameters Computing
4.1 Local Training Delay
4.2 Distance
4.3 Transmission Rate
4.4 Transmission Delay
5 Problem Formulation
5.1 State
5.2 Action
5.3 Reward
6 DRL-Based AFL Weight Optimization: DAFL
6.1 Training Stage
6.2 Process of AFL
6.3 Testing Stage
7 Simulation and Results
7.1 Simulation Setup
7.2 Experiment Results
8 Conclusion
References
A Virtual Community Healthcare Framework in Metaverse Enabled by Digital Twins
1 Introduction
2 Background and Related Work
2.1 IoMT Applications
2.2 Digital Twins in Healthcare
3 Virtual Healthcare in the Metaverse Era
3.1 DT-enabled Virtual Healthcare in Metaverse
3.2 VirCom: System Architecture
4 Multi-Sensor Senior Falling Detection: A Case Study
4.1 Information Fusion Using Dempster-Shafer Evidence Theory
4.2 MARS System - Components and Methodology
5 Experimental Study
5.1 Experimental Data Set
5.2 Experimental Results
6 Conclusions
References
An Improved LED Aruco-Marker Detection Method for Event Camera
1 Introduction
2 Related Work
2.1 Event-Based Tracking
2.2 ArUco
3 Method
3.1 Mean Shift Filter
3.2 Corner Point Refinement
3.3 Error Detection
3.4 Weighted Statistic
4 Experiment
5 Conclusion
References
Improvement of CenterNet Based on Feature Pyramid Networks
1 Introduction
1.1 CenterNet
1.2 Feature Pyramid Networks
2 The Improved Method
2.1 Optimization of Feature Pyramid Network
2.2 Data Processing
3 Experiments
4 Conclusion
References
Information Security Protection Techniques for Substation Operation and Maintenance Edge Gateway
1 Introduction
2 Overall Structure of Substation Edge Gateway
3 Host Security
3.1 Trusted Computing Platform
3.2 Network Security Monitoring
4 Microservice Security
4.1 Docker and Namespaces
4.2 Resource Allocation for Microservices
5 Communication Security
5.1 Internal Communication Security
5.2 Security of Southbound Device Access
5.3 Security of Northbound Data Reporting
6 Security of Business Process
6.1 Identity Authentication and Authority Control
6.2 Security Audit and Backup
6.3 App Store Review
7 Test and Pilot Applications
8 Conclusion
References
A Robust MOR-Based Secure Fusion Strategy Against Byzantine Attack in Cooperative Spectrum Sensing
1 Introduction
1.1 Related Work
1.2 Contributions
1.3 Organization
2 System Model
2.1 Cooperative Spectrum Sensing Model
2.2 Fading Channel Model
2.3 Attack Model
3 CSS with MU and Its Influence
3.1 Energy Detection
3.2 Cooperative Spectrum Sensing with Energy Detection
3.3 The Influence of Malicious Users
4 A Robust Modified Outlier Removal Sensing Scheme
4.1 Concept Design
4.2 Detailed Procedure
5 Simulation Results and Discussion
5.1 Parameters Setting
5.2 Simulation Results
6 Conclusion and Future Directions
References
Security Analysis of Blockchain Layer-One Sharding Based Extended-UTxO Model
1 Introduction
2 Problem Statement and Contributions
3 Background
3.1 Ledger State Models
3.2 Sharding
3.3 EUTXO Model
4 Related Work
4.1 Elastico
4.2 Omniledger
4.3 RapidChain
4.4 Chainspace
4.5 Monoxide
4.6 Zilliqa
4.7 Extended-UTXO
5 The Proposed Model
5.1 Addresses and Ownership
5.2 Transaction Structure
5.3 Validity Conditions
5.4 Smart Contract Architecture
5.5 Network Sharding
6 Evaluation Methodology
6.1 Simulation Structure
6.2 Experiment Setup
7 Evaluation
7.1 Results
8 Conclusion
References
Information Systems and Artificial Intelligence
Fundamental Frequency Removal PCA Method and SVM Approach Used for Structure Feature Distilling and Damage Diagnosis
1 Introduction
2 Fundamental Frequency Removal Principal Component Analysis (FFR-PCA)
3 Pattern Recognition Method
4 Experiments and Discussion
4.1 Actual Vibration Testing
4.2 PROE and ADAMS Simulation Testing
4.3 FFR-PCA Feature Distilling
4.4 Pattern Recognition Based on SVM
5 Conclusion
References
Stock Trend Prediction Based on Improved SVR
1 Introduction
2 The Forecasting Model of Stock Price
2.1 SVR Prediction Model of Stock Price
2.2 The Proposed SVR Model for Stock Price Forecasting
2.3 Evaluating Indicator
3 Our Experiment
3.1 The Data Set Used in Experiment
3.2 Experiment
4 Summary
References
Several Misconceptions and Misuses of Deep Neural Networks and Deep Learning
1 Introduction
2 Misconception: Deep Learning Contains Machine Learning
3 Misconception: Deep Structure Is Superior to Shallow Structure
4 Misconception: Deep Learning Is Learning by a Deep Layered Neural Network
5 Misconception: Transfer Learning Is Always Useful
6 Misuse of Deep Learning
7 Conclusion
References
Author Index


Haonan Chen Pingyi Fan Lipo Wang (Eds.)

Communications in Computer and Information Science

1839

Communications, Networking, and Information Systems First International Congress, CNIS 2023 Guilin, China, March 25–27, 2023 Revised Selected Papers

Communications in Computer and Information Science Editorial Board Members Joaquim Filipe , Polytechnic Institute of Setúbal, Setúbal, Portugal Ashish Ghosh , Indian Statistical Institute, Kolkata, India Raquel Oliveira Prates , Federal University of Minas Gerais (UFMG), Belo Horizonte, Brazil Lizhu Zhou, Tsinghua University, Beijing, China


Rationale

The CCIS series is devoted to the publication of proceedings of computer science conferences. Its aim is to efficiently disseminate original research results in informatics in printed and electronic form. While the focus is on publication of peer-reviewed full papers presenting mature work, inclusion of reviewed short papers reporting on work in progress is welcome, too. Besides globally relevant meetings with internationally representative program committees guaranteeing a strict peer-reviewing and paper selection process, conferences run by societies or of high regional or national relevance are also considered for publication.

Topics

The topical scope of CCIS spans the entire spectrum of informatics ranging from foundational topics in the theory of computing to information and communications science and technology and a broad variety of interdisciplinary application fields.

Information for Volume Editors and Authors

Publication in CCIS is free of charge. No royalties are paid, however, we offer registered conference participants temporary free access to the online version of the conference proceedings on SpringerLink (http://link.springer.com) by means of an http referrer from the conference website and/or a number of complimentary printed copies, as specified in the official acceptance email of the event. CCIS proceedings can be published in time for distribution at conferences or as postproceedings, and delivered in the form of printed books and/or electronically as USBs and/or e-content licenses for accessing proceedings at SpringerLink. Furthermore, CCIS proceedings are included in the CCIS electronic book series hosted in the SpringerLink digital library at http://link.springer.com/bookseries/7899. Conferences publishing in CCIS are allowed to use Online Conference Service (OCS) for managing the whole proceedings lifecycle (from submission and reviewing to preparing for publication) free of charge.

Publication process

The language of publication is exclusively English. Authors publishing in CCIS have to sign the Springer CCIS copyright transfer form, however, they are free to use their material published in CCIS for substantially changed, more elaborate subsequent publications elsewhere. For the preparation of the camera-ready papers/files, authors have to strictly adhere to the Springer CCIS Authors' Instructions and are strongly encouraged to use the CCIS LaTeX style files or templates.

Abstracting/Indexing

CCIS is abstracted/indexed in DBLP, Google Scholar, EI-Compendex, Mathematical Reviews, SCImago, Scopus. CCIS volumes are also submitted for the inclusion in ISI Proceedings.

How to start

To start the evaluation of your proposal for inclusion in the CCIS series, please send an e-mail to [email protected].

Haonan Chen · Pingyi Fan · Lipo Wang Editors

Communications, Networking, and Information Systems First International Congress, CNIS 2023 Guilin, China, March 25–27, 2023 Revised Selected Papers

Editors Haonan Chen Colorado State University Fort Collins, CO, USA

Pingyi Fan Tsinghua University Beijing, China

Lipo Wang Nanyang Technological University Singapore, Singapore

ISSN 1865-0929 ISSN 1865-0937 (electronic)
Communications in Computer and Information Science
ISBN 978-981-99-3580-2 ISBN 978-981-99-3581-9 (eBook)
https://doi.org/10.1007/978-981-99-3581-9

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

The 2023 International Congress on Communications, Networking, and Information Systems (CNIS 2023) aimed to bring together researchers and scientists from academia, industry, and government laboratories to present new results and identify future research directions in communications, networking, and information systems.

Recent years have witnessed tremendous advancements in communications and networking, especially 5G and 6G technologies, Networked Cyber Physical Systems, sensor networks, and distributed ledger technologies. At the same time, there have been major breakthroughs in information systems and artificial intelligence, notably in deep learning and federated learning. These new developments in communications, networking, information systems, and artificial intelligence have found many exciting applications in virtually all walks of life, including healthcare, smart cities, and robotics.

We were delighted to receive 32 submissions from around the globe. Each paper underwent a single-blind review by at least 3 reviewers. 10 papers, i.e., 8 regular papers and 2 short papers, were accepted and included in the conference program and proceedings.

We would like to sincerely thank all organizing committee members, program committee members, and reviewers for their hard work and valuable contributions. Without their help, this conference would not have been possible. Special thanks go to Springer for publishing the proceedings, and we thank the Springer staff for their great support. We are very grateful to the keynote speakers and invited speakers for their authoritative speeches. We thank all authors and conference participants for using this platform to communicate their excellent work.

April 2023

Haonan Chen Pingyi Fan Lipo Wang

Organization

Organizing Committee General Chairs Frank Langbein Xiaodong Xu Guanhua Yan

Cardiff University, UK Central South University, China Binghamton University, USA

Program Chairs Haonan Chen Pingyi Fan Tigang Jiang Lipo Wang Dehao Wu

Colorado State University, USA Tsinghua University, China University of Electronic Science and Technology of China, China Nanyang Technological University, Singapore Bournemouth University, UK

Publicity Chairs Joseph Doyle Taogang Hou Zhi Wang

Queen Mary University of London, UK Beijing Jiaotong University, China Florida State University, USA

Publication Chairs Edmund Harbord Bo Liu Yutao Ma

University of Bristol, UK Southwest University, China Wuhan University, China

Program Committee Bo Ai Atm Shafiul Alam Muhammed Ali Bingol Vira Chankong

Beijing Jiaotong University, China Queen Mary University of London, UK De Montfort University, UK Case Western Reserve University, USA


Haonan Chen Yihong Chen Yu Chen Peng Cheng Kenneth Chiu Edwin Chong Daniel Clarke Xiaohui Cui Xiaoheng Deng Baocang Ding Joseph Doyle Yu Du Pingyi Fan Wenjiang Feng Paula Fonseca Peter Gacs Vasileios Germanos Siddhartan Govindasamy Derek Groen Shenglin Gui Yuchun Guo Edmund Harbord Mary He Jane Henriksen-Bulmer Taogang Hou Shaoqing Hu Take Itagaki Mona Jaber Xiaolin Jia Tigang Jiang Bingli Jiao Laleh Kasraian Youngwook Ko Frank Langbein Yingjie Lao Ethan Lau Chuang Li Heng Li

Colorado State University, USA China West Normal University, China Binghamton University, USA Southwest University, China Binghamton University, USA Colorado State University, USA Cranfield University, UK Wuhan University, China Central South University, China Chongqing University of Posts and Telecommunications, China Queen Mary University of London, UK Florida International University, USA Tsinghua University, China Chongqing University, China Queen Mary University of London, UK Boston University, USA De Montfort University, UK Boston College, USA Brunel University London, UK University of Electronic Science and Technology of China, China Beijing Jiaotong University, China University of Bristol, UK De Montfort University, UK Bournemouth University, UK Beijing Jiaotong University, China Brunel University London, UK Brunel University London, UK Queen Mary University of London, UK Southwest University of Science and Technology, China University of Electronic Science and Technology of China, China Peking University, China De Montfort University, UK Queen Mary University of London, UK Cardiff University, UK Clemson University, USA Queen Mary University of London, UK Hainan University, China Central South University, China


Houjun Li Qiang Li Xiao Li Yanhong Li Yongming Li Feng Lin Bo Liu Hui Lu Xin Lu Junchao Ma Qian Ma Yutao Ma Arjuna Madanayake Mahtab Mirmohseni Ivor Morrow Eranjan Udayanga Padumadasa Bernd-Peter Paris Ivan Petrunin Christian Poellabauer Shengbing Ren Muntadher Sallal Mohammad Samie Neetesh Saxena Sunish Kumar Orappanpara Soman Guolin Sun Haifeng Sun Yimao Sun Zhi Sun Matthew (Wai Chung) Tang Atoussa H. Tehrani Xiaoyang Tong Darpan Triboan Uraz Turker Bingchuan Wang Jiahao Wang Juan Wang


Guangxi University of Science and Technology, China Southwest University of Science and Technology, China Southwest University, China South-Central Minzu University, China Chongqing University, China Sichuan University, China Southwest University, China Binghamton University, USA Bournemouth University, UK Shenzhen Technology University, China Sun Yat-sen University, China Wuhan University, China Florida International University, USA University of Surrey, UK Cardiff University, UK Queen Mary University of London, UK George Mason University, USA Cranfield University, UK Florida International University, USA Central South University, China Bournemouth University, UK Cranfield University, UK Cardiff University, UK Ulster University, UK University of Electronic Science and Technology of China, China Southwest University of Science and Technology, China Sichuan University, China Tsinghua University, China Queen Mary University of London, UK Florida International University, USA Southwest Jiaotong University, China De Montfort University, UK Lancaster University, UK Central South University, China University of Electronic Science and Technology of China, China Wuhan University, China


Junsong Wang Meng Wang Ruoyu (Fish) Wang Xinyuan (Frank) Wang Yongqiang Wang Yueyang Wang Zhi Wang Jiayan Wen Dehao Wu Dehao Wu Hejun Wu Weihua Xu Xiaodong Xu Guanhua Yan Shuangyi Yan Siqi Yan Jie Yang Ping Yang Yang Yang Baicheng Yao Lu Yu Fatemeh Zarrabi Feng Zeng Wen Zeng Li Zhang Peng Zhang Shigeng Zhang Zhenghao Zhang Liang Zhao Ping Zhong Jun Zhou Weiping Zhu Runmin Zou

Shenzhen Technology University, China Central South University, China Arizona State University, USA George Mason University, USA Clemson University, USA Chongqing University, China Florida State University, USA Guangxi University of Science and Technology, China Bournemouth University, UK Central South University, China Sun Yat-sen University, China Southwest University, China Central South University, China Binghamton University, USA University of Bristol, UK Huazhong University of Science and Technology, China Florida State University, USA Binghamton University, USA Southwest University, China University of Electronic Science and Technology of China, China Clemson University, USA De Montfort University, UK Central South University, China De Montfort University, UK Shenzhen University, China Shenzhen University, China Central South University, China Florida State University, USA Southwest University of Science and Technology, China Central South University, China University of Electronic Science and Technology of China, China Wuhan University, China Central South University, China

Contents

Communications and Networking

Deep Reinforcement Learning Based Vehicle Selection for Asynchronous Federated Learning Enabled Vehicular Edge Computing . . . . 3
Qiong Wu, Siyuan Wang, Pingyi Fan, and Qiang Fan

A Virtual Community Healthcare Framework in Metaverse Enabled by Digital Twins . . . . 27
Qian Qu, Han Sun, and Yu Chen

An Improved LED Aruco-Marker Detection Method for Event Camera . . . . 47
Shijie Zhang, Yuxuan Huang, Xuan Pei, Haopeng Lin, Wenwen Zheng, Wendi Wang, and Taogang Hou

Improvement of CenterNet Based on Feature Pyramid Networks . . . . 58
Yatao Yang, Zihan Yang, Yao Huang, and Li Zhang

Information Security Protection Techniques for Substation Operation and Maintenance Edge Gateway . . . . 68
Fenfang Li, Lixiang Ruan, Rongrong Ji, Yifei Shen, and Mingguo Hou

A Robust MOR-Based Secure Fusion Strategy Against Byzantine Attack in Cooperative Spectrum Sensing . . . . 81
Lan Guo, Weifeng Chen, Yang Cong, and Xuechun Yan

Security Analysis of Blockchain Layer-One Sharding Based Extended-UTxO Model . . . . 95
Cayo Fletcher-Smith and Muntadher Sallal

Information Systems and Artificial Intelligence

Fundamental Frequency Removal PCA Method and SVM Approach Used for Structure Feature Distilling and Damage Diagnosis . . . . 127
Gang Jiang, Yue Peng, Yifan Huang, Xing'an Hao, Shi Yi, Yuanming Lai, Qian Wang, Jie Jiang, Chuanmei Hu, Lanying Yang, and Song Gao

Stock Trend Prediction Based on Improved SVR . . . . 148
Zhouyuzhe Bai

Several Misconceptions and Misuses of Deep Neural Networks and Deep Learning . . . . 155
K.-L. Du

Author Index . . . . 173

Communications and Networking

Deep Reinforcement Learning Based Vehicle Selection for Asynchronous Federated Learning Enabled Vehicular Edge Computing

Qiong Wu1,2, Siyuan Wang1,2, Pingyi Fan3(B), and Qiang Fan4

1 School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China
[email protected], [email protected]
2 State Key Laboratory of Integrated Services Networks (Xidian University), Xi'an 710071, China
3 Department of Electronic Engineering, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
[email protected]
4 Qualcomm, San Jose, CA 95110, USA

Abstract. In the traditional vehicular network, computing tasks generated by the vehicles are usually uploaded to the cloud for processing. However, since task offloading toward the cloud incurs a large delay, vehicular edge computing (VEC) is introduced to avoid this problem and improve the overall system performance, where a roadside unit (RSU) with certain computing capability processes the data of vehicles as an edge entity. Owing to privacy and security issues, vehicles are reluctant to upload local data directly to the RSU, and thus federated learning (FL) becomes a promising technology for some machine learning tasks in VEC, where vehicles only need to upload the local model hyperparameters instead of transferring their local data to the nearby RSU. Furthermore, as vehicles have different local training times due to various sizes of local data and different computing capabilities, asynchronous federated learning (AFL) is employed so that the RSU can update the global model immediately after receiving a local model, which reduces the aggregation delay. However, in AFL of VEC, different vehicles may have different impacts on the global model update because of their various local training delays, transmission delays and local data sizes. Also, if there are bad nodes among the vehicles (that is, vehicles whose amount of data and local computing resources are small, and whose local model is polluted by random noise), they will affect the global aggregation quality at the RSU. To solve the above problem, we propose a deep reinforcement learning (DRL) based vehicle selection scheme to improve the accuracy of the global model in AFL of the vehicular network. In the scheme, we map the specific problem onto the DRL model, including the state, action and reward. Simulation results demonstrate that our scheme can effectively remove the bad nodes and improve the aggregation accuracy of the global model.

Keywords: Deep reinforcement learning (DRL) · Asynchronous federated learning (AFL) · accuracy · mobility · delay

Supported in part by the National Natural Science Foundation of China (No. 61701197), in part by the open research fund of State Key Laboratory of Integrated Services Networks (No. ISN23-11), in part by the National Key Research and Development Program of China (No. 2021YFA1000500(4)), in part by the 111 Project (No. B23008). (Qiong Wu and Siyuan Wang contributed equally to this work.)

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
H. Chen et al. (Eds.): CNIS 2023, CCIS 1839, pp. 3–26, 2023. https://doi.org/10.1007/978-981-99-3581-9_1

1 Introduction

The emerging Internet of Vehicles (IoV) is a promising technology to make our life more convenient [1–4]. At the same time, intelligent services have become a critical part of various vehicles [5]. Therefore, vehicles driving on the road generate computing tasks according to the high-quality service requirements of users [6,7]. However, in traditional cloud computing, the cloud is far from the moving vehicles, incurring a high task delay when tasks are offloaded to the cloud, which is not suitable for high-speed vehicles. Thus, vehicular edge computing (VEC) [8] is introduced to enable vehicles to offload computing tasks to a roadside unit (RSU) with a certain computing capability to reduce the task processing delay. However, this requires the vehicle to upload local data to the RSU for processing, which is a challenging issue because people are reluctant to expose their local data due to privacy concerns [9,10]. Federated learning (FL) is designed to handle this issue [11,12]. Specifically, FL performs iterative global aggregations at the RSU. In one round, a vehicle first downloads the current global model from the RSU and then uses its local data for local training. The trained local model is uploaded to the RSU. When the RSU has received the trained local models from all vehicles, it performs the global aggregation and broadcasts the updated global model to the vehicles. The next round then proceeds in the same way until the specified number of rounds is reached. Since local data cannot be accessed at the RSU, data privacy is ensured physically.

However, in conventional FL [13], the RSU needs to wait for all vehicles to offload their local models before updating the global model [14]. If a vehicle has a high local training delay and transmission delay, some vehicles may drive out of the coverage of the RSU and thus cannot participate in the global aggregation. Thus, asynchronous federated learning (AFL) is introduced [15–17]. Specifically, a vehicle uploads its local model after finishing one round of local training, and the RSU updates the global model as soon as it receives a local model. This enables a faster update of the global model at the RSU without waiting for other vehicles.

Vehicle mobility causes time-varying channel conditions and transmission rates [18,19], and thus vehicles have different transmission delays [20–23]. At the same time, different vehicles have different time-varying computing resources and different amounts of local data, which causes different local training delays. Since vehicles upload their local models asynchronously in AFL, it is possible that the RSU has already updated the global model according to the local models it received while some vehicles have not yet uploaded theirs. As a result, the local models of these vehicles become stale. Staleness is related to the local training delay and the transmission delay. Therefore, it is important to consider the impact of the above factors on the accuracy of the global model at the RSU.

In AFL, some bad nodes may exist in the network; that is, a vehicle may have few available computing resources, a small amount of local data, or a local model polluted by random noise. Bad nodes can significantly degrade the accuracy of the global model at the RSU [24]. Therefore, it is necessary to select the vehicles participating in the global aggregation. Deep reinforcement learning (DRL) provides a way to select the proper vehicles to solve this problem [25]. Specifically, it takes an action based on the current state of the vehicles and then obtains the corresponding reward, after which the next state is reached and the above steps are repeated. Finally, the neural network provides an optimal vehicle selection policy for the system.

In this paper, we propose an AFL weight optimization scheme that selects vehicles based on the deep deterministic policy gradient (DDPG), while considering the mobility of the vehicles, time-varying channel conditions, time-varying available computing resources of the vehicles, different amounts of local data, and the existence of bad nodes.¹ The main contributions of this paper are as follows:

1) By considering bad nodes with less local data, fewer available computing resources and a local model polluted by random noise, we employ DDPG to select the vehicles participating in the AFL, so as to avoid the impact of bad nodes on the global model aggregation.

2) We consider the impact of vehicle mobility, time-varying channel conditions, time-varying available computing resources of the vehicles, and different amounts of local data to perform a weight optimization and select the vehicles that participate in the global aggregation, thereby improving the accuracy of the global model.

3) Extensive simulation results demonstrate that our scheme can effectively remove the bad nodes and improve the accuracy of the global model.

¹ The source code has been released at: https://github.com/qiongwu86/AFLDDPG.

2 Related Works

history information, and then developed a multi-principal one-agent contractbased policy to maximize the profits of the service provider and vehicles while improving the accuracy of their scheme. In [30], Yan et al. proposed a power allocation scheme based on FL to maximize energy efficiency while getting better accuracy of power allocation strategy. In [31], Ye et al. proposed an incentive mechanism by using multidimensional contract theory and prospect theory to optimize the incentive for vehicles when preforming tasks. In [32], Kong et al. proposed a federated learning-based license plate recognition framework to get a high accuracy and low cost for detecting and recognizing license plate. In [33], Saputra et al. proposed an economic-efficiency framework using FL for an electric vehicular network to maximize the profits for charging stations. In [34], Ye et al. proposed a selective model aggregation approach to get a higher accuracy of global model. In [35], Zhao et al. proposed a scheme combined FL with local differential privacy to get a high accuracy when the privacy budget is small. In [36], Li et al. proposed an identity-based privacy preserving scheme to protect the privacy of vehicular message. It can reduce the training loss while increasing the accuracy. In [37], Taïk et al. proposed a scheme including FL and corresponding learning and scheduling process to efficiently use vehicular-to-vehicular resources to bypass the communication bottleneck. This scheme can effectively improve the learning accuracy. In [38], Hui et al. proposed a digital twins enabled on-demand matching scheme for multi tasks FL to address the two-way selection problem between task requesters and RSUs. In [39], Liu et al. proposed an efficient-communication approach, which consists of the customized local training strategy, partial client participation rule and a flexible aggregation policy to improve the test accuracy and average communication optimization rate. In [40], Lv et al. proposed a blockchain-based FL scheme to detect misbehavior and finally get higher accuracy and efficiency. In [41], Khan et al. proposed a DRLbased FL to minimize the cost considering packet error rate and global loss. In [42], Samarakoon et al. proposed a scheme considering joint power and resource allocation for ultra-reliable low-latency communication in vehicular networks to keep a high accuracy while reducing the average power consumption and the amount of exchanged data. In [43], Hammoud et al. proposed a horizontal-based FL, empowered by fog federations, devised for the mobile environment to improve the accuracy and service quality of IoV intelligent applications. However, these works have not considered the situation that vehicles may usually drive out of the coverage of the RSU before they upload their local models, which deteriorates the accuracy of the global model. A few works have studied the AFL in vehicular networks. In [44], Tian et al. proposed an asynchronous federated deep Q-learning network to solve the task offloading problem in vehicular network, then designed a queue-aware algorithm to allocate computing resources. In [45], Pan et al. proposed a scheme using AFL and deep Q-learning algorithm to get the maximized throughput while considering the long-term ultrareliable and low-latency communication constraints. 
However, these works have not considered the mobility of vehicles, the amount of data and computing capability to select vehicle in the design of the AFL in vehicular networks and the impact of bad nodes. This motivates us to do this

DRL Based Vehicle Selection for AFL Enabled Vehicular Edge Computing

7

work by considering the key factors affecting the AFL applications in vehicular networks.

3 System Model

Fig. 1. System model.

This section describes the system model. As shown in Fig. 1, we consider an edge-assisted vehicular network consisting of an RSU and K vehicles within its coverage. In the network, the bottom of the RSU is the origin, the x-axis points east, the y-axis points south, and the z-axis is perpendicular to the x-axis and y-axis along the direction of the RSU's antenna. Vehicles are assumed to move east with the same velocity within the coverage of the RSU, which matches most highway scenarios. The time domain is divided into discrete time slots. Each vehicle i (1 ≤ i ≤ K) carries a different amount of data Di and has different computing capabilities. At the same time, vehicle mobility incurs time-varying channel conditions. We first use the DRL algorithm to select the vehicles participating in AFL according to each vehicle's transmission rate, amount of available computing resources, and location; the selected vehicles then train and upload their local models to the RSU. That is, each selected vehicle uses its local data to train a local model, and the weight of the local model is then optimized according to the local training delay and the transmission delay. All selected vehicles upload their local models to the RSU asynchronously. After multiple rounds of model aggregation, we obtain a more accurate global model at the RSU. For ease of understanding, the main notations used in this paper are listed in Table 1.

4 Parameters Computing

For simplicity, we first introduce some parameters used in the following sections.

Table 1. Notations used in this paper

Notation      Description
K             Total number of vehicles within the coverage of the RSU
v             Velocity of the vehicles
D_i           Amount of data carried by vehicle i
μ_i           Computing resources of vehicle i
P_i(t)        Position of vehicle i at time slot t
d_x^i(t)      Position of vehicle i along the x-axis from the antenna of the RSU at time slot t
d_y           Position of vehicle i along the y-axis from the antenna of the RSU
d_0^i         Initial position of vehicle i along the x-axis
H_r           Height of the RSU's antenna
P_r           Position of the RSU's antenna
d_i(t)        Distance from vehicle i to the antenna of the RSU at time slot t
tr_i(t)       Transmission rate of vehicle i at time slot t
B             Transmission bandwidth
p_0           Transmission power of each vehicle
h_i(t)        Channel gain
α             Path loss exponent
σ²            Power of the noise
ρ_i           Normalized channel correlation coefficient between consecutive time slots
f_d^i         Doppler frequency of vehicle i
Λ             Wavelength
θ             Angle between the moving direction and the uplink communication direction
C_0           Number of CPU cycles required to train one unit of data
T_l^i         Local training delay of vehicle i
T_u^i(t)      Transmission delay for vehicle i to upload its local model at time slot t
|w|           Size of the local model of each vehicle
γ             Discount factor
N             Total number of time slots
δ             Parameter of the actor network
δ*            Optimized parameter of the actor network
ξ             Parameter of the critic network
ξ*            Optimized parameter of the critic network
δ_1           Parameter of the target actor network
δ_1*          Optimized parameter of the target actor network
ξ_1           Parameter of the target critic network
ξ_1*          Optimized parameter of the target critic network
τ             Update parameter for the target networks
R_b           Replay buffer
Δ_t           Exploration noise at time slot t
I             Size of the mini-batch
μ_δ           Policy approximated by the actor network
μ*            Optimal policy of the system
E_max         Maximum number of episodes in the training stage
K_l           Number of selected vehicles
l             Number of local training iterations
m_1           Parameter of the training weight
m_2           Parameter of the transmission weight
E′_max        Maximum number of episodes in the testing stage

4.1 Local Training Delay

Vehicle i uses its local data to train a local model, so the local training delay $T_l^i$ of vehicle i can be calculated as

$$T_l^i = \frac{D_i C_0}{\mu_i} \qquad (1)$$

where $C_0$ is the number of CPU cycles required to process one unit of data and $\mu_i$ is the computing resource of vehicle i, i.e., its CPU cycle frequency.

4.2 Distance

Denote $P_i(t)$ as the position of vehicle i at time slot t, and $d_x^i(t)$ and $d_y$ as the distances between vehicle i and the antenna of the RSU at time slot t along the x-axis and y-axis, respectively. Thus $P_i(t)$ can be expressed as $(d_x^i(t), d_y, 0)$. Here, $d_y$ is a fixed value, and $d_x^i(t)$ can be denoted as

$$d_x^i(t) = d_0^i + vt \qquad (2)$$

where $d_0^i$ is the initial position of vehicle i along the x-axis. We set the height of the RSU's antenna as $H_r$, so the position of the antenna of the RSU can be expressed as $P_r = (0, 0, H_r)$. Then the distance between vehicle i and the antenna of the RSU at time slot t can be expressed as

$$d_i(t) = \|P_i(t) - P_r\| \qquad (3)$$

4.3 Transmission Rate

We set the transmission rate of vehicle i at time slot t to be $tr_i(t)$. According to Shannon's theorem, it can be expressed as

$$tr_i(t) = B \log_2\!\left(1 + \frac{p_0\, h_i(t)\, (d_i(t))^{-\alpha}}{\sigma^2}\right) \qquad (4)$$

where B is the transmission bandwidth, $p_0$ is the transmission power of each vehicle, $h_i(t)$ is the channel gain of vehicle i at time slot t, $\alpha$ is the path loss exponent, and $\sigma^2$ is the power of the noise.

We use an autoregressive model to formulate the relationship between $h_i(t)$ and $h_i(t-1)$:

$$h_i(t) = \rho_i\, h_i(t-1) + e(t)\sqrt{1-\rho_i^2} \qquad (5)$$

where $\rho_i$ is the normalized channel correlation coefficient between consecutive time slots and $e(t)$ is an error vector following a complex Gaussian distribution. According to Jakes' fading spectrum, $\rho_i = J_0(2\pi f_d^i t)$, where $J_0(\cdot)$ is the zeroth-order Bessel function of the first kind and $f_d^i$ is the Doppler frequency of vehicle i, which can be calculated as

$$f_d^i = \frac{v \cos\theta}{\Lambda} \qquad (6)$$

where $\Lambda$ is the wavelength and $\theta$ is the angle between the moving direction $x_0 = (1, 0, 0)$ and the uplink communication direction $P_r - P_i(t)$. Thus $\cos\theta$ can be calculated as

$$\cos\theta = \frac{x_0 \cdot (P_r - P_i(t))}{\|P_r - P_i(t)\|} \qquad (7)$$

4.4 Transmission Delay

The transmission delay of vehicle i for uploading its local model, $T_u^i(t)$, can be denoted as

$$T_u^i(t) = \frac{|w|}{tr_i(t)} \qquad (8)$$

where $|w|$ is the size of the local model of each vehicle.
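As a concrete illustration of Eqs. (1)–(8), the following Python sketch computes the per-vehicle delays, distances, channel correlation and rates. The constant values mirror the simulation setup of Sect. 7.1, but all function names and the real-valued stand-in for the complex channel are our assumptions, not the paper's released implementation.

```python
import numpy as np
from scipy.special import j0

# Illustrative constants taken from the simulation setup in Sect. 7.1.
B, P0, ALPHA = 1000.0, 0.25, 2.0        # bandwidth (Hz), tx power (W), path loss exp.
SIGMA2 = 1e-9 * 1e-3                    # noise power: 10^-9 mW expressed in W
C0, W_SIZE = 1e6, 5000.0                # CPU cycles per data unit, model size (bits)
V, LAMBDA = 20.0, 7.0                   # vehicle speed (m/s), wavelength (m)
P_R = np.array([0.0, 0.0, 10.0])        # RSU antenna position, H_r = 10 m

def local_training_delay(d_i, mu_i):
    """Eq. (1): T_l^i = D_i * C_0 / mu_i."""
    return d_i * C0 / mu_i

def vehicle_position(d0_i, d_y, t):
    """Eq. (2): x-position grows linearly with the common velocity v."""
    return np.array([d0_i + V * t, d_y, 0.0])

def distance_to_rsu(p_i):
    """Eq. (3): Euclidean distance from the vehicle to the RSU antenna."""
    return np.linalg.norm(p_i - P_R)

def channel_correlation(p_i, t):
    """Eqs. (5)-(7): rho_i = J_0(2*pi*f_d^i*t), f_d^i = v*cos(theta)/Lambda."""
    direction = P_R - p_i
    cos_theta = direction[0] / np.linalg.norm(direction)  # x0 = (1, 0, 0)
    f_d = V * cos_theta / LAMBDA
    return j0(2.0 * np.pi * f_d * t)

def channel_gain_update(h_prev, rho_i, rng):
    """Eq. (5): first-order AR update (real-valued stand-in for the
    complex Gaussian error term e(t))."""
    return rho_i * h_prev + rng.standard_normal() * np.sqrt(1.0 - rho_i ** 2)

def transmission_rate(h_i, d_i):
    """Eq. (4): Shannon rate with path loss d^(-alpha)."""
    return B * np.log2(1.0 + P0 * h_i * d_i ** (-ALPHA) / SIGMA2)

def transmission_delay(tr_i):
    """Eq. (8): time to upload a |w|-bit local model."""
    return W_SIZE / tr_i
```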

5 Problem Formulation

In this section, we formulate the problem and define the state, action and reward. Due to the mobility of the vehicles and their time-varying computing resources and channel conditions, we employ a DRL framework, including state, action and reward, to formulate the vehicle selection problem. Specifically, at each time slot t, the system takes an action according to the policy based on the current state, then obtains the reward and transitions to the next state. Next, the state, action and reward of the system are defined, respectively.

5.1 State

Considering that vehicle mobility can be reflected by position, while the local training delay and transmission delay of a vehicle are related to its time-varying available computing resources and current channel condition, we define the state at time slot t as

$$s(t) = (Tr(t),\, \mu(t),\, d_x(t),\, a(t-1)) \qquad (9)$$

where Tr(t) is the set of the transmission rates of all vehicles at time slot t, i.e., $Tr(t) = (tr_1(t), tr_2(t), \ldots, tr_K(t))$; μ(t) is the set of available computing resources of all vehicles at time slot t, i.e., $\mu(t) = (\mu_1(t), \mu_2(t), \ldots, \mu_K(t))$; $d_x(t)$ is the set of all vehicles' positions along the x-axis at time slot t, i.e., $d_x(t) = (d_x^1(t), d_x^2(t), \ldots, d_x^K(t))$; and a(t−1) is the action at time slot t−1.

5.2 Action

Since the purpose of the DRL is to select the better vehicles for AFL according to the current state, we define the system action at time slot t as

$$a(t) = (\lambda_1(t), \lambda_2(t), \ldots, \lambda_K(t)) \qquad (10)$$

where $\lambda_i(t)$, $i \in [1, K]$, is the probability of selecting vehicle i, and we define $\lambda_1(0) = \lambda_2(0) = \ldots = \lambda_K(0) = 1$. We denote a new set $ad(t) = (ad_1(t), ad_2(t), \ldots, ad_K(t))$ in order to select specific vehicles. After we normalize the action, if the value of $\lambda_i(t)$ is greater than or equal to 0.5, $ad_i(t)$ is recorded as 1, and otherwise as 0. We thus obtain a set composed of 0s and 1s, where the binary value indicates whether a vehicle is selected or not.

5.3 Reward

We aim to select vehicles with better performance for AFL to obtain a more accurate global model at the RSU, where the local training delay, the transmission delay and the accuracy of the global model are all critical metrics. Thus, we define the system reward at time slot t as

$$r(t) = -\left[\omega_1\, \mathrm{Loss}(t) + \omega_2\, \frac{\sum_{i=1}^{K}\left(T_l^i + T_u^i(t)\right) ad_i(t)}{\sum_{i=1}^{K} \lambda_i(t)\, ad_i(t)}\right] \qquad (11)$$

where $\omega_1$ and $\omega_2$ are non-negative weighting factors and Loss(t) is the loss computed by the AFL, which will be discussed later. The expected long-term discounted reward of the system can be expressed as

$$J(\mu) = \mathbb{E}\left[\sum_{t=1}^{N} \gamma^{t-1} r(t)\right] \qquad (12)$$

where γ ∈ (0, 1) is the discount factor, N is the total number of time slots, and μ is the policy of the system. In this paper, we aim to find an optimal policy that maximizes the expected long-term discounted reward of the system.
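A minimal sketch of how the action thresholding of Sect. 5.2 and the reward of Eq. (11) could be computed. The weighting factors w1 and w2 are placeholders, since their values are not given here, and the helper names are ours.

```python
import numpy as np

def select_vehicles(action):
    """Sect. 5.2: threshold the normalized selection probabilities at 0.5
    to obtain the binary selection vector ad(t)."""
    return (np.asarray(action, dtype=float) >= 0.5).astype(int)

def reward(loss_t, t_local, t_upload, lam, ad, w1=0.5, w2=0.5):
    """Eq. (11): negative weighted sum of the AFL loss and the normalized
    (training + transmission) delay of the selected vehicles."""
    num = np.sum((t_local + t_upload) * ad)
    den = np.sum(lam * ad)
    return -(w1 * loss_t + w2 * num / max(den, 1e-12))
```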

6 DRL-Based AFL Weight Optimization: DAFL

In this section, we introduce the overall system framework and the training stage used to obtain the optimal policy, and then present the testing stage used to evaluate the performance of our model.

6.1 Training Stage

Considering that the state and action spaces are continuous and that DDPG is suitable for solving DRL problems with continuous state and action spaces, we employ DDPG to solve our problem. The DDPG algorithm is based on an actor-critic network architecture: the actor network is used for policy improvement, and the critic network is used for policy evaluation. Here, both the actor and critic networks are constructed as deep neural networks (DNNs). Specifically, the actor network is used to approximate the policy μ, and the approximated policy is expressed as μδ. The actor network observes the state and outputs the action based on the policy μδ. We improve and evaluate the policy iteratively in order to obtain the optimal policy. To ensure the stability of the algorithm, target networks composed of a target actor network and a target critic network are also employed in DDPG, whose architectures are the same as those of the original actor and critic networks, respectively. The proposed algorithm is shown in Algorithm 1.
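A possible PyTorch realization of the actor and critic DNNs with two hidden layers of 400 and 300 neurons, matching the setup in Sect. 7.1. The sigmoid output layer squashing the K selection probabilities into [0, 1] is our simplification of the normalization step; the class names are assumptions.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps a state to K selection probabilities lambda_i(t) in [0, 1]."""
    def __init__(self, state_dim, n_vehicles):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 400), nn.ReLU(),
            nn.Linear(400, 300), nn.ReLU(),
            nn.Linear(300, n_vehicles), nn.Sigmoid(),
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Approximates the action-value Q_xi(s, a) for a state-action pair."""
    def __init__(self, state_dim, n_vehicles):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_vehicles, 400), nn.ReLU(),
            nn.Linear(400, 300), nn.ReLU(),
            nn.Linear(300, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```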

Algorithm 1: Training Stage for the DAFL-based Framework

Input: γ, τ, δ, ξ, a(0) = (1, 1, ..., 1)
Output: optimized δ*, ξ*
1. Randomly initialize δ and ξ;
2. Initialize the target networks by δ1 ← δ, ξ1 ← ξ;
3. Initialize the replay buffer Rb;
4. for episode from 1 to Emax do
5.   Reset the simulation parameters of the system model and initialize the global model at the RSU;
6.   Receive the initial observation state s(1);
7.   for time slot t from 1 to N do
8.     Generate the action according to the current policy and exploration noise: a = μδ(s|δ) + Δt;
9.     Compute ad and get the selected vehicles;
10.    The selected vehicles conduct weight-based AFL to train the global model at the RSU;
11.    Get the reward r and the next state s′;
12.    Store the transition (s, a, r, s′) in Rb;
13.    if the number of tuples in Rb is larger than I then
14.      Randomly sample a mini-batch of I transition tuples from Rb;
15.      Update the critic network by minimizing the loss function according to Eq. (16);
16.      Update the actor network according to Eq. (17);
17.      Update the target networks according to Eqs. (18) and (19).

Let δ be the actor network parameter, ξ the critic network parameter, δ* the optimized actor network parameter, ξ* the optimized critic network parameter, δ1 the target actor network parameter and ξ1 the target critic network parameter. τ is the update parameter of the target networks and Δt is the exploration noise at time slot t. I is the size of the mini-batch. Now, we describe our algorithm in detail. First, we initialize δ and ξ randomly, and initialize δ1 and ξ1 in the target networks as δ and ξ, respectively. At the same time, we initialize the replay buffer Rb. Then our algorithm is executed for Emax episodes. In the first episode, we first initialize the positions of all vehicles, the channel states and the computing resources of the vehicles, and set λ1(0) = λ2(0) = ... = λK(0) = 1. Then, at time slot 1, the system gets the state s(1) = (Tr(1), μ(1), dx(1), a(0)). Meanwhile, a convolutional neural network (CNN) is employed as the global model w0 at the RSU. Our algorithm executes from time slot 1 to time slot N. At time slot 1, the actor network produces the output μδ(s|δ) according to the state. Note that we add a random noise Δt to the action, so the system gets the action a(1) = μδ(s(1)|δ) + Δt. We then compute ad(1) based on the action and determine the vehicles selected at this time slot. The selected vehicles conduct AFL; that is, all selected vehicles train their local models on their local data and upload them to the RSU asynchronously for the global model update. Given the action, we can get the reward at time slot 1. After that, we update the positions of the vehicles according to Eq. (2), recalculate the channel states and the available computing resources of the vehicles, and update the transmission rates of the vehicles according to Eq. (4). The system then gets the next state s(2). The corresponding sample (s(1), a(1), r(1), s(2)) is stored in Rb. The system iteratively computes and stores samples into Rb until the capacity of Rb is reached. When the number of tuples in Rb is larger than I, the parameters δ, ξ, δ1 and ξ1 of the actor network, critic network and target networks, respectively, are trained to maximize J(μδ). Here, δ is updated along the gradient direction of J(μδ), i.e., ∇δJ(μδ). We define Qμδ(s(t), a(t)) as the action-value function under policy μδ given s(t) and a(t); it can be expressed as

$$Q_{\mu_\delta}(s(t), a(t)) = \mathbb{E}_{\mu_\delta}\left[\sum_{k_1=t}^{N} \gamma^{k_1 - t}\, r(k_1)\right] \qquad (13)$$

This represents the expected long-term discounted reward at time slot t. It has been shown that solving ∇δJ(μδ) can be replaced by solving the gradient of Qμδ(s(t), a(t)), i.e., ∇δQμδ(s(t), a(t)) [46]. Due to the continuous action space of Qμδ(s(t), a(t)), it cannot be solved by the Bellman equation [47]. To address this problem, the critic network uses ξ to approximate Qμδ(s(t), a(t)) by Qξ(s(t), a(t)).

When the number of tuples in Rb is larger than I, the system samples I tuples randomly from Rb to form a mini-batch. Let $(s_x, a_x, r_x, s'_x)$, $x \in [1, 2, \ldots, I]$, be the x-th tuple in the mini-batch. The system inputs $s'_x$ to the target actor network and gets the output action $a'_x = \mu_{\delta_1}(s'_x|\delta_1)$. It then inputs $s'_x$ and $a'_x$ to the target critic network and gets the action-value function $Q_{\xi_1}(s'_x, a'_x)$. The target value can be calculated as

$$y_x = r_x + \gamma Q_{\xi_1}\left(s'_x, a'_x\right)\big|_{a'_x = \mu_{\delta_1}(s'_x|\delta_1)} \qquad (14)$$

Given $s_x$ and $a_x$, the critic network outputs $Q_\xi(s_x, a_x)$, and the loss of tuple x is given by

$$L_x = \left[y_x - Q_\xi(s_x, a_x)\right]^2 \qquad (15)$$

I 1 Lx I x=1

(16)

In this case, the critic network updates ξ by employing the gradient descent of ∇ξ L (ξ) to the loss function L (ξ). Similarly, actor network updates δ by employing the gradient ascent, i.e., ∇δ J (μδ ), to minimize J (μδ ) [48], where ∇δ J (μδ ) is calculated by action-value function approximated by critic network as follows: ∇δ J(μδ ) ≈

I 1 ∇δ Qξ (sx , aμx )|aμx =μδ (sx |δ) I x=1

I 1 = ∇ μ Qξ (sx , aμx )|aμx =μδ (sx |δ) I x=1 ax

(17)

· ∇δ μδ (sx |δ) Here the input of Qξ is aμx = μδ (sx |δ). In the end of a time slot t, we update the parameters of target networks as follows: (18) ξ1 ← τ ξ + (1 − τ ) ξ1 δ1 ← τ δ + (1 − τ ) δ1

(19)

where τ is a constant and τ  1. Then input s to actor network and start the same procedure for the next time slot. When the time slot t reaches N , this episode is completed. In this case, system will initialize the state s (1) = (T r (1) , μ (1) , dx (1) , a (0)) and execute the next episode. When the number of episodes reaches Emax , the training is finished. We get the optimized δ ∗ , ξ ∗ , δ1∗ and ξ1∗ . The overall DDPG flow diagram is listed in Fig. 2.

DRL Based Vehicle Selection for AFL Enabled Vehicular Edge Computing

15

Fig. 2. DDPG flow diagram

6.2

Process of AFL

In this section, we will introduce the process of AFL in detail, which is used in the step 10 in Algorithm 1. Let Vk , k ∈ [1, Kl ] be the selected vehicles, where Kl is the total number of the selected vehicles. In the AFL, each vehicle will go through three stages: global model downloading, local training, uploading and updating. Specifically, vehicle Vk will first download the global model from the RSU, then it will train a local model using local data for some iterations. Then it will upload the local model to the RSU. Once the RSU receives a local model, it updates the global model immediately. To be clearly, we use the AFL training at time slot t of vehicle Vk as an example. Downloading the Global Model. In time slot t, vehicle Vk downloads the global model wt−1 from the RSU. Note that the global model at the RSU is initialized as w0 using CNN at the beginning of the whole training process. Local Training. Vehicle Vk trains local model (CNN) based on its local data. The local training includes l iterations. In iteration m (m ∈ [1, l]), vehicle Vk first inputs the data a into the CNN of local model wk,m , then outputs the prediction probability yˆa of each label ya of data a. Cross-entropy loss function is used to compute the loss of wk,m : fk (wk,m ) = −

Di  a=1

ya logyˆa

(20)

16

Q. Wu et al.

Then stochastic gradient descent (SGD) algorithm is used to update our model as follows: (21) wk,m+1 = wk,m − η∇fk (wk,m ) where ∇fk (wk,m ) is the gradient of fk (wk,m ), η is the learning rate. Vehicle Vk will use the updated local model in the proceeding iteration of m + 1. The local training will stop when the iteration reaches l. At this time, the vehicle gets the updated local model wk . For local model wk , the loss is: fk (wk ) = −

Di 

ya logyˆa

(22)

a=1

In our proposed scheme, the impact of the delay on the model has also been investigated. Specifically, the local training and local model uploading will incur some delay, during which other vehicles may upload the local models to the RSU. In this situation, the local model of this vehicle will have staleness. Considering this issue, we introduce the training weight and transmission weight. The training weight is related to the local training delay, and it can be expressed as: Vk (23) β1,k = m1 Tl −0.5 where TlVk is the local training delay of vehicle Vk , which can be calculated by Eq. (1). m1 ∈ (0, 1) is the parameter to make β1,k decrease with the increase of local training delay. The transmission weight is related to the transmission delay of vehicles for uploading local models to the RSU. Here, due to the downloading delay, i.e., the duration of vehicles downloading the global model from the RSU, is very small compared with transmission delay so it can be ignored, thus the transmission weight can be denoted as: Vk (t)−0.5

β2,k (t) = m2 Tu

(24)

where TuVk (t) is the transmission delay of Vk , which can be calculated by Eq. (8), m2 ∈ (0, 1) is the parameter to make β2,k decrease with the increase of transmission delay. Then we can get the weight optimized local model, i.e., wkw = wk ∗ β1,k ∗ β2,k

(25)

Uploading and Updating. When vehicle Vk uploads the weight-optimized local model, the RSU updates the global model as

$$w_{new} = \beta w_{old} + (1 - \beta)\, w_k^w \qquad (26)$$

where $w_{old}$ is the current global model at the RSU, $w_{new}$ is the updated global model, and $\beta \in (0, 1)$ is the aggregation proportion.

When the RSU receives the first uploaded weight-optimized local model, $w_{old} = w_{t-1}$. When the RSU has received all the weight-optimized local models of the selected vehicles and the global model wt has been updated for Kl rounds, the global model update at this time slot is finished. At the same time, we obtain the average loss of the selected vehicles:

$$\mathrm{Loss}(t) = \frac{1}{K_l}\sum_{k=1}^{K_l} f_k(w_k) \qquad (27)$$
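The staleness weighting of Eqs. (23)–(26) can be sketched as follows, assuming the models are exchanged as floating-point state dicts. The aggregation proportion β is illustrative, as its value is not stated here, and m1 = m2 = 0.9 follows the simulation setup of Sect. 7.1.

```python
def staleness_weights(t_local, t_upload, m1=0.9, m2=0.9):
    """Eqs. (23)-(24): weights decay exponentially with the two delays
    (m1, m2 in (0, 1))."""
    beta1 = m1 ** (t_local - 0.5)
    beta2 = m2 ** (t_upload - 0.5)
    return beta1, beta2

def async_aggregate(global_sd, local_sd, beta1, beta2, beta=0.5):
    """Eqs. (25)-(26): scale the local model by beta1*beta2 and mix it
    into the global model (assumes all entries are float tensors)."""
    scale = beta1 * beta2
    return {k: beta * global_sd[k] + (1.0 - beta) * scale * local_sd[k]
            for k in global_sd}
```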

This completes the explanation of step 10 in Algorithm 1. The procedure of the AFL training is shown in Algorithm 2.

Algorithm 2: Weight-Optimized AFL Scheme

1. Initialize the global model w0;
2. for each round x from 1 to Kl do
3.   wk ← Vehicle Update(w0);
4.   Vehicle Vk calculates the weight-optimized local model w_k^w based on Eq. (25);
5.   Vehicle Vk uploads the weight-optimized local model w_k^w to the RSU;
6.   The RSU receives the weight-optimized local model w_k^w;
7.   The RSU updates the global model based on Eq. (26);
8.   return w_new
9. Get the updated global model wt after Kl rounds.
10. Vehicle Update(w):
11. Input: w0
12. for each local iteration m from 1 to l do
13.   Vehicle Vk calculates the cross-entropy loss function based on Eq. (20);
14.   Vehicle Vk updates the local model based on Eq. (21);
15. Set wk = wk,l;
16. return wk

6.3 Testing Stage

The testing stage employs the critic network, target actor network and target critic network obtained in the training stage. In the testing stage, the system selects the policy with the optimized parameter δ*. The process of the testing stage is shown in Algorithm 3.

Algorithm 3: Testing Stage for the DAFL-based Framework

1. for episode from 1 to E′max do
2.   Reset the simulation parameters of the system model and initialize the global model at the RSU;
3.   Receive the initial observation state s(1);
4.   for time slot t from 1 to N do
5.     Generate the action according to the current policy: a = μδ(s|δ*);
6.     Compute ad and get the selected vehicles;
7.     The selected vehicles conduct weight-based AFL to train the global model at the RSU;
8.     Get the reward r and the next state s′;

7 Simulation and Results

7.1 Simulation Setup

The simulations are implemented in Python 3.9. The actor network and the critic network are both DNNs with two hidden layers of 400 and 300 neurons, respectively. The exploration noise follows an Ornstein-Uhlenbeck (OU) process with variance 0.02 and decay rate 0.15. We use the MNIST dataset to allocate data to the vehicles, and the computing resources of the vehicles follow a truncated Gaussian distribution, where the unit of the computing resources is CPU cycles/s. We configure one vehicle as the bad node; that is, it has a small amount of data and few computing resources, and its local model is disturbed by random noise. The remaining simulation parameters are shown in Table 2.

Table 2. Parameters of simulation

Parameter   Value            Parameter   Value
γ           0.99             Hr          10 m
τ           0.001            B           1000 Hz
I           64               p0          0.25 W
Emax        1000             σ²          10⁻⁹ mW
E′max       3                Λ           7 m
K           5                |w|         5000 bits
v           20 m/s           α           2
C0          10⁶ CPU cycles   m1          0.9
t           0.5              m2          0.9
dy          5 m
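For reference, one way to instantiate this setup in Python, the stated simulation language. The truncated-Gaussian moments for the CPU frequencies are our assumptions, since the paper only names the distribution, and the parameter dictionary keys are ours.

```python
import numpy as np
from scipy.stats import truncnorm

# A possible instantiation of the setup in Table 2 (names are ours).
PARAMS = dict(gamma=0.99, tau=0.001, batch=64, episodes=1000,
              n_vehicles=5, v=20.0, bandwidth=1000.0, p0=0.25)

def sample_computing_resources(n, mean=5e8, std=2e8, low=1e8, high=1e9,
                               rng=None):
    """Draw per-vehicle CPU frequencies (cycles/s) from a truncated
    Gaussian, as in Sect. 7.1; mean/std/bounds here are illustrative."""
    a, b = (low - mean) / std, (high - mean) / std
    return truncnorm.rvs(a, b, loc=mean, scale=std, size=n, random_state=rng)
```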


Fig. 3. Reward for different epochs

7.2 Experiment Results

Figure 3 shows the system reward with respect to different epochs in the training stage. When the number of epochs is small, the reward varies widely. This is because the system is learning and optimizing the network in the initial phase, so some explorations (i.e., actions) incur poor performance. As the number of epochs increases, the reward gradually becomes stable and smoother, meaning that the system has gradually learned the optimal policy and the training of the neural network is close to completion.

Figure 4 depicts the two components of the reward in the testing stage: the loss calculated in the AFL and the sum of the local training delay and the transmission delay. The loss decreases as the number of steps increases. This is because vehicles constantly upload local models to update the global model at the RSU, so the global model becomes more accurate. The sum of the delays fluctuates because of the dynamically available computing resources of each vehicle and its time-varying location.

Figure 5 shows the accuracy of our scheme, traditional AFL, and traditional FL in the presence of a bad node. The accuracy of our scheme remains at a good level, gradually increases, and finally reaches stability. This indicates that our scheme can effectively remove the bad node from the model training. Since traditional AFL and FL cannot select vehicles, their accuracy is seriously affected by the bad node, resulting in large fluctuations.

Fig. 4. Relation between delay and loss

Fig. 5. Accuracy vs. number of steps in testing stage


Fig. 6. Accuracy with optimized model weights in testing stage

Fig. 7. Training delay vs. number of training rounds

Figure 6 shows the accuracy of our scheme, traditional AFL, and FL after vehicle selection. The accuracy of all schemes increases with the number of steps and finally becomes stable. However, our scheme achieves the highest accuracy among them, because it considers the impact of the local training delay and transmission delay of vehicles in the global model update.


Figure 7 depicts the training delay of our scheme compared to FL as the global round (i.e., step) increases. The delay of FL remains high, while our scheme keeps a small delay. This is because FL only starts updating the global model when all local models of the selected vehicles have been received, whereas in our scheme the RSU updates the global model every time it receives a local model uploaded from a vehicle. We can also observe that the delay of our scheme rises at first, then falls, and rises again. This is because the proposed scheme selects four vehicles to update the global model one by one. Since the local computing delay of a vehicle is large compared to the transmission delay, the local computing delay dominates. In this case, because the vehicle that finishes local training earliest updates the global model first, the training delay gradually increases. After all four vehicles have updated the global model, the vehicles repeat the above update cycle until the maximum number of steps is reached.

Fig. 8. Accuracy vs. different β

Figure 8 depicts the accuracy of our scheme and traditional AFL with selected vehicles under different values of β. When β is small, the accuracy of the model stays relatively high. In contrast, as β increases, the accuracy of the global model gradually decreases. This is because when β is relatively large, the weight of the local model is much smaller, so the update of the global model mainly depends on the previous values of the global model. This decreases the influence and contribution of the new local models of all vehicles and thus significantly impacts the accuracy of the global model. At the same time, the accuracy of our scheme is better than that of AFL, because our scheme considers the influence of the local computing delay and transmission delay of vehicles.

8 Conclusion

In this paper, we considered vehicle mobility, time-varying channel states, time-varying computing resources of vehicles, different amounts of local data at each vehicle, and the presence of bad nodes, and proposed a DAFL-based framework. The conclusions are summarized as follows:

– The accuracy of our scheme is better than that of traditional AFL and FL. This is because our scheme can effectively remove bad nodes and thus prevent the global model update from being affected by them.
– In the absence of bad nodes, the accuracy of our scheme is still higher than that of AFL and FL, because our scheme considers the mobility of the vehicles, time-varying channel conditions, the available computing resources of the vehicles, and the different amounts of local data to allocate different weights to each vehicle's local model in AFL.
– The aggregation proportion β affects the accuracy of the global model. Specifically, a relatively small β yields the desirable accuracy.


A Virtual Community Healthcare Framework in Metaverse Enabled by Digital Twins

Qian Qu, Han Sun, and Yu Chen(B)

Department of Electrical and Computer Engineering, Binghamton University, Binghamton, NY 13902, USA
{qqu2,hsun28,ychen}@binghamton.edu

Abstract. Many developed economies are facing an unprecedented increase in the aging population, which creates high demand for healthcare services and makes it particularly challenging to ensure the health and safety of seniors living alone in a residential community or in large-scale nursing homes. Thanks to the proliferation of the Internet of Things (IoT), Artificial Intelligence (AI)/Machine Learning (ML) algorithms, and fifth-generation and beyond (B5G) communication technologies, digital healthcare services (DHS) provide a promising solution. This paper proposes a virtual community healthcare framework (VirCom) leveraging the Internet of Medical Things (IoMT), Digital Twins (DT), and the Metaverse, which envisions a seamless interweaving of the physical world and cyberspace. In the context of VirCom, each individual senior is considered a physical object (PO), whether living alone at home or in a room of a nursing home, and their activities and health status are mirrored to corresponding logical objects (LO) in a virtual community in the Metaverse, where activity recognition, potential risk prediction, and alert generation are realized. Specifically, a Multi-sensor Action Recognition-based Senior falling detection (MARS) system is presented as a case study of the VirCom framework, aiming at protecting seniors' safety without compromising their privacy while allowing instant alerting when anomalies are detected. An extensive experimental study validated the feasibility of the VirCom framework.

Keywords: Virtual Community · Digital Healthcare · Digital Twins · Internet of Medical Things (IoMT) · Metaverse

1 Introduction

Many developed economies are facing an unprecedented increase in the aging population [8], which creates high demand for healthcare services and makes it particularly challenging to ensure the health and safety of seniors living alone in a residential community or in large-scale nursing homes [32]. Figure 1 presents statistics from the U.S. Census Bureau as of 2020.


Fig. 1. Number of persons age 65 and older in the U.S.

The number of seniors aged 65 and above in the U.S. had reached 54 million by 2019 and is projected to reach approximately 80 million by 2040.

Ensuring the health and safety of seniors living alone in a residential community or a nursing home presents a significant challenge. Seniors may have limited access to healthcare services, which can be compounded by physical limitations that prevent them from traveling to healthcare facilities. They are also more vulnerable to accidents or medical emergencies, such as falls or heart attacks, which can have serious consequences if not addressed promptly. Moreover, these patients may face challenges in managing their health conditions, such as medication management and chronic disease management. This can be especially dangerous for seniors who have multiple chronic conditions or who require frequent medication adjustments.

Digital healthcare services (DHS) are a promising application of digital technologies to address these challenges. DHS involves the use of digital technologies, such as the Internet of Things (IoT), Artificial Intelligence (AI)/Machine Learning (ML) algorithms, and fifth-generation and beyond (B5G) communication technologies, to provide healthcare services remotely [33]. DHS has the potential to revolutionize healthcare delivery by enabling patients to receive high-quality healthcare services from the comfort of their homes.

Besides the technologies mentioned above, Digital Twins (DT) have gained immense popularity in recent years in many areas, from decentralized network performance monitoring [24] to digital healthcare services [23]. DTs enable the creation of virtual models of patients, which can be used to personalize treatment plans and improve patient outcomes. In virtual healthcare, digital twins can be used to monitor patients remotely and provide real-time feedback on their health status.


For example, a digital twin can be created for a patient with a chronic disease, such as diabetes or heart disease. The digital twin can monitor the patient's vital signs, such as blood pressure, heart rate, and glucose levels, and provide real-time feedback to the patient and their healthcare provider. This enables early intervention in case of any abnormal changes in the patient's health status, preventing complications and hospitalizations.

In addition to personalized care, digital twins can also support predictive analysis and machine learning algorithms. By analyzing large amounts of patient data, these algorithms can identify patterns and predict potential health risks, enabling proactive interventions to prevent adverse events. This can be especially beneficial for seniors who live alone, as they may be at higher risk for health complications and require more frequent monitoring.

Another application of digital twins in virtual healthcare is simulation and training. Healthcare professionals can use digital twins to simulate medical procedures and scenarios, allowing them to practice and improve their skills in a safe and controlled environment. This can help reduce medical errors and improve patient safety.

In this paper, we propose a virtual community healthcare framework (VirCom) leveraging the Internet of Medical Things (IoMT) and Digital Twins (DT) under the umbrella of the Metaverse, which envisions a seamless interweaving of physical reality and the virtual world. In the context of VirCom, each individual senior is considered a physical object (PO), whether living alone at home or in a room of a nursing home, and their activities and health status are mirrored to corresponding logical objects (LO) in a virtual community in the Metaverse, where activity recognition, potential risk prediction, and alert generation are realized. Specifically, a Multi-sensor Action Recognition-based Senior falling detection (MARS) system is presented as a case study of the VirCom framework, aiming at protecting seniors' safety without compromising their privacy while allowing instant alerting when anomalies are detected. An extensive experimental study validated the feasibility of the VirCom framework.

In brief, the key contributions of this paper are highlighted as follows:
(1) From the architecture aspect, a comprehensive virtual community healthcare framework called VirCom is envisioned, which illustrates how a physical-world smart residential community, like a large-scale nursing home, can be projected into the virtual Metaverse space leveraging digital twin technology.
(2) Focusing on falling detection, a typical senior safety monitoring scenario, a Multi-sensor Action Recognition-based Senior falling detection (MARS) system is explored as a case study to validate the feasibility of the complex VirCom framework. An instant alerting mechanism is designed to predict the behaviors of objects and notify first responders in real time when anomalous events are identified.
(3) A proof-of-concept prototype of the MARS system is implemented and tested on a physical network that simulates the seniors' safety scenario. The experimental results validated the rationale of the proposed VirCom framework and verified that the MARS system meets its design goals.


The rest of this paper is structured as follows. Section 2 provides the background knowledge of IoMT and Digital Twins. Section 3 presents our VirCom system architecture, and Sect. 4 discusses the Multi-sensor Action Recognition-based Senior falling detection (MARS) system. The experimental results are presented in Sect. 5. Finally, Sect. 6 concludes this paper with discussions of ongoing efforts and future work.

2 Background and Related Work

2.1 IoMT Applications

An IoMT-based healthcare application is a collection of smart devices designed to collect health-related data through sensors connected to a Body Sensor Network (BSN), such as wearables, remote controls, and implants [10]. With the rapid development of hardware devices and machine learning technology, the range of IoMT applications is expanding from personal health management to large-scale hospital management. In practice, IoMT can be divided into indoor and outdoor application scenarios [29]. Indoor applications can take advantage of more sensors because the devices are directly connected to a power supply, enabling the short-term goal of timely alerts through collaboration between multiple devices while storing data for long-term analysis. Meanwhile, outdoor applications usually focus on a single device to handle emergencies, such as smartwatch-based fall detection.

Continuous health tracking is the most common and classic application of IoMT, from recording and analyzing body data in a healthy state to managing chronic diseases [29]. Fall detection is a critical application, especially for older adults who live alone, because falls are a fatal threat and most older adults already have medical problems. Recent advances in this field include intelligent real-time camera-based fall detection using support vector machines, and wearable devices that distinguish between daily activities and falls using 3D acceleration collected by sensors attached to the legs [5]. The focus of sleep detection is not only the quality of sleep but, more importantly, the detection of sleep disorders, a common high-incidence health problem [30]. Most sensors for sleep detection are hidden in mattresses or pillows to collect breathing rates and body movements during sleep without affecting sleep quality. Chronic disease monitoring is another application area for continuous health tracking; continued IoMT assistance will ultimately reduce patient costs and hospital workload [1]. For example, a blood glucose meter works continuously 24/7, and the system can send an alarm in time when abnormal blood sugar is detected. To exclude the influence of other factors, other signals, such as Electrocardiogram (ECG) signals, are also introduced into the system for auxiliary analysis [31].

Seizures arise from a neurological disorder in which sudden electrical disturbances in the brain cause recurring episodes. Currently, motion recognition-based systems alone cannot make accurate judgments for seizure detection [26].


Therefore, more sensors, such as Electroencephalogram (EEG) sensors, must be introduced. The system monitors the patient's condition by continuously processing neural signals acquired from EEG sensors. In addition, innovative pillows can also be used for epilepsy detection [29]. Anxiety is a common phenomenon in daily life, and the IoMT is also used to detect anxiety continuously. One of the most commonly used signals for emotion detection is the EEG: through real-time analysis and uploading of EEG signals, the system can detect user abnormalities and contact them in time [34].

Disease prediction and behavior prediction are two other areas of healthcare-related prediction. Health status can be predicted by analyzing the characteristics and patterns of past data and comparing them with Big Data-based analysis [21]. The lack of comprehensive databases and individual differences in health status are two significant challenges for predictive applications. A recently reported system collects and examines four biological parameters (body temperature, heartbeat, ECG, and posture), employs an SVM algorithm to make predictions, and then sends the predictions along with all the data to doctors for final decision-making [18]. Predictions based on fuzzy theory are also compelling because disease outcomes are inherently uncertain [21]. Behavior prediction aims to warn individuals to adjust their behavior and to provide targeted help in a timely manner.

IoMT-based healthcare applications can provide medical personnel with timely and accurate insights into patient health, enabling remote monitoring and diagnosis, personalized medicine, and optimized healthcare delivery. Data collected from the IoMT is also stored for long-term health analysis, which can be used to identify patterns and predict potential health problems.

2.2 Digital Twins in Healthcare

The initial application of DT in healthcare originates from medical facilities [3], where DT models are created for predictive maintenance and optimization. As an important subset of the future Metaverse landscape, recent literature on DT in healthcare covers various aspects, including hospital management, medical resource allocation, Digital Patients, etc.

Applications of DT in hospital management have been adopted by some medical institutions. A hospital in Baltimore adopted a "Capacity Command Center" to simulate and analyze the demand of patients according to their activities and correspondingly optimize the hospital capacity to improve the quality of service (QoS) [22]. Moreover, a hospital in Dublin leverages DT model-based platforms and ML algorithms to overcome the imbalance between patients' growing demands and the shortage of medical equipment, beds, space, and appointment capacity [27]. DT and Discrete Event Simulation (DES) have been combined to create a predictive decision-making framework that uses real-time collected data to optimize resource allocation for a hospital system [9].


Similarly, DT models have been focused on creating a simulation framework among various hospitals under the circumstance of major disasters [2]; that system also adopted DES and DT to establish monitoring and controlling simulations for crises.

A Digital Patient is a virtual representation of a patient, created using either historical data or real-time data collected through various sensors [17]. By combining current and past data, a comprehensive medical record can be generated, providing reliable information for medical services like diagnosis and regular examination. With the emergence of low-cost medical sensors, such as smartwatches and smart rings, the idea of Digital Patients has become more feasible, enabling continuous medical monitoring without the need for hospitalization. The use of diverse sensors, powerful edge devices, and advanced communication techniques makes the Digital Patient a promising solution for modern medical services. The authors of [15] proposed an example of a cloud DT-based healthcare system (CloudDTH) that aims to provide continuous monitoring and diagnosis for senior patients. In this framework, DT models are implemented to facilitate interaction and convergence between physical and virtual spaces in the context of healthcare.

3 Virtual Healthcare in the Metaverse Era

As an essential subset of the Metaverse landscape, virtual healthcare can leverage various state-of-the-art technologies to improve efficiency, guarantee QoS, and reduce unnecessary costs. This section illustrates the high-level architecture of DT-enabled virtual healthcare in the Metaverse and proposes our virtual community healthcare framework (VirCom).

3.1 DT-enabled Virtual Healthcare in Metaverse

The main architecture of DT-enabled virtual healthcare is shown in Fig. 2. This conceptual architecture includes the key enabling technologies that support the aforementioned and other potential applications of DT in virtual healthcare.

Communications. DT-enabled virtual healthcare relies heavily on the efficiency, accuracy, and robustness of the communications in the system. To address these requirements, technologies such as sixth-generation (6G) networks, Bluetooth Low Energy (BLE), the Tactile Internet, and Network Slicing can be utilized in different areas and circumstances. For example, BLE can provide low-power, short-range wireless communication between lightweight medical sensors and edge devices; Network Slicing can improve network performance by creating virtual networks tailored to specific healthcare needs; and 6G can guarantee the data transfer speed and enable a real-time twinning process for the system.

Multi-layer Computing. Multi-layer computing refers to utilizing the combination of edge, fog, and cloud computing to achieve better collaboration in the context of virtual healthcare. For example, tasks like on-site emergency detection and alerting should be deployed on the edge layer of the whole system.


Fig. 2. DT-enabled virtual healthcare in Metaverse.

However, considering that most edge sensors or devices have limited storage and computation power, heavy tasks like massive data storage or training with large datasets should be carried out on the fog layer or cloud layer, depending on the capability of the system.

Data Collection. Data collection relies heavily on the development of IoMT, including body sensors that collect biosignals, sensors in smart homes that collect environmental data, motion detectors, smart cameras, etc. Body sensors such as heart rate monitors, blood pressure sensors, and glucose sensors are attached to a patient's body to monitor vital signs in real time. This enables healthcare professionals to track a patient's health status and detect any abnormalities, enabling further analysis and prompt medical intervention. Similarly, smart cameras and motion detectors monitor the activities of the patients and provide insight into their activity levels. This can be used to monitor patients with chronic conditions such as Parkinson's disease or elderly patients who may be at risk of falls.

Data Processing. The processing of the collected data is another important aspect of virtual healthcare. As the DT-enabled system relies heavily on the timeliness and continuity of data, real-time processing is essential for creating digital twins and is critical in emergency situations. Online processing, in contrast, handles data as it is received but not necessarily in real time; this approach is used when the data is not time-sensitive. It is less time-critical than real-time processing but still requires timely handling to avoid delays in patient care.


Data fusion is adopted to create a comprehensive and accurate representation of a patient's health status from data collected by various sources. The integration of data enables healthcare professionals to make informed decisions about a patient's care plan, identify potential health risks, and monitor treatment effectiveness in real time.

Modeling. Modeling in DT-enabled virtual healthcare refers to the creation of a digital twin of a patient or a medical environment, such as a hospital room or a nursing home. This virtual replica can be used for a variety of purposes, including medical training, real-time monitoring, and remote consultations. AutoCAD, Unreal Engine, and 3DMax are some of the popular software tools used for creating virtual models in healthcare. These tools can help create 2D and 3D models of the patient, medical devices, or environment. With advanced graphics and physics engines, medical procedures and scenarios can be simulated in a realistic and engaging way.

Artificial Intelligence. Artificial Intelligence (AI) plays a crucial role in DT-enabled virtual healthcare. By analyzing the large amounts of data collected from smart sensors, AI algorithms can detect patterns, make predictions, and provide insights that improve medical diagnosis, treatment, and monitoring.

Security and Privacy. Security and privacy are critical concerns in virtual healthcare, as the transmission and storage of sensitive medical data require robust protection to prevent unauthorized access or misuse. Blockchain is one of the promising technologies to address these concerns [16,36]. Blockchain can help create a secure and tamper-proof ledger of medical records, which can be accessed by authorized parties only. Each transaction or change to a medical record can be securely recorded on the blockchain, making it easy to track who accessed the record and when. The Non-fungible Token (NFT) is another potential technique [7], whose characteristics of immutability, traceability, and uniqueness can help secure access to medical records without compromising patient privacy. Ownership of the NFT can easily be transferred to other authorized parties when needed.

3.2 VirCom: System Architecture

Leveraging these key enabling technologies, we propose our virtual community healthcare framework (VirCom). This system can be regarded as a subset of a future Metaverse which aims to provide healthcare service to the residents living in a community like a large-scale nursing home. Figure 3 demonstrates an overview of VirCom system architecture that consists of three major layers and a service block.


Fig. 3. Illustration of VirCom system architecture.

User Layer. The user layer is the infrastructure layer, consisting of multiple personal DT scenarios. Each scenario contains a smart home environment located in a residential community like a nursing home. A trust support unit is deployed on a personal computer (PC) or an edge server within the smart home network, along with other registered IoMT devices. This support unit acts as a gateway that aggregates data streams from the IoMT devices and performs data processing, virtual space maintenance for senior patients, and primary intelligent decision-making operations like ML-based abnormal event detection and on-site emergency alarms. The senior patient, or PO, lives in the smart home environment and is continuously monitored by a smart camera and wearable devices with various on-body sensing functions. The data collected from POs may be transmitted over various communication protocols, so it needs to be standardized before DT technology is used to create the corresponding LOs in the virtual space.

Community Layer. In the community layer, data storage is facilitated with efficiency and security, whether in a centralized or distributed network environment. For example, the data of the residents in a nursing home can be stored in a central server, while patients in a dispersed community may choose blockchain technology for storage. The medical data may include electrocardiograms (ECGs), which may have large volumes, making it impractical to store them directly on the blockchain. To solve this issue, Distributed Data Storage (DDS) is introduced as off-chain storage; the InterPlanetary File System (IPFS), a robust system for file storage and sharing, is a viable option, as sketched below.
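A minimal sketch of this off-chain storage pattern: bulky records stay in a DDS (a plain dictionary stands in for IPFS here), while only a content hash is anchored on the chain. The dictionaries and function names below are illustrative stand-ins, not a real blockchain or IPFS API.

```python
import hashlib
import json

off_chain_store = {}  # stands in for a DDS such as IPFS
on_chain_ledger = []  # stands in for blockchain transactions

def store_medical_record(patient_id, ecg_samples):
    """Keep the large ECG payload off-chain; anchor only its content hash on-chain."""
    payload = json.dumps({"patient": patient_id, "ecg": ecg_samples}).encode()
    content_hash = hashlib.sha256(payload).hexdigest()  # content address
    off_chain_store[content_hash] = payload
    on_chain_ledger.append({"patient": patient_id, "hash": content_hash})
    return content_hash

def verify_record(content_hash):
    """Integrity check: recompute the hash of the off-chain payload."""
    return hashlib.sha256(off_chain_store[content_hash]).hexdigest() == content_hash

record = store_medical_record("senior-001", [0.12, 0.15, 0.11])
assert verify_record(record)
```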


Metaverse Layer. Essentially, the Metaverse layer can be regarded as the union of various subsets of the Metaverse, such as similar community networks, medical professional networks established by hospitals, and other virtual healthcare systems. This layer enables data sharing and collaboration between different communities and other medical institutions. However, sensitive data is often the target of malicious third parties, such as unauthorized insurance companies, hackers who sell personal information, and even scammers. To guarantee the integrity, traceability, and impenetrability of data sharing, we can leverage NFTs to secure the sharing process. We define agent A as the default agent authorized by the senior patient (PO) when the NFT is minted. Whenever the NFT is created or updated, the patient (PO) and the new owner receive a receipt notifying them of the change in access-control status. If any third party, defined as agent B, asks A to share the data, the NFT is updated. Afterward, B can get the data from the DDS, since the NFT now defines B as the owner of the data. The PO and the last owner can burn the NFT, while the PO always retains the history of all transactions.

Service Block. Integrating AI and other technologies like Big Data, the service block can offer intelligent healthcare applications. The DT model for a senior patient includes comprehensive information such as personal health status, environmental data, and location coordinates. By analyzing this real-time and historical data using statistical algorithms and ML methods, analytical services such as anomaly detection and future prediction can be provided. Additionally, the system can establish emergency privacy policies according to patients' demands and situations. For example, a smart camera can generate a body skeleton image to protect users' privacy.
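The NFT-mediated sharing flow above can be sketched as a toy state machine; in practice this would be realized as a smart contract, and the class and method names (`MedicalNFT`, `transfer`, `burn`) are our illustrative assumptions.

```python
class MedicalNFT:
    """Toy model of the NFT sharing flow: the patient (PO) mints the token
    for a default agent A; every transfer is logged as a receipt."""
    def __init__(self, patient, default_agent, data_hash):
        self.patient, self.owner, self.data_hash = patient, default_agent, data_hash
        self.history = [("mint", patient, default_agent)]
        self.burned = False

    def transfer(self, requester):
        # Third party B asks the current owner for the data: update the NFT.
        assert not self.burned, "token was burned"
        previous = self.owner
        self.owner = requester
        self.history.append(("transfer", previous, requester))  # receipt for PO and new owner
        return self.data_hash  # B can now fetch the data from the DDS

    def burn(self, caller):
        # Only the PO or the last owner may burn; the PO keeps the full history.
        if caller in (self.patient, self.owner):
            self.burned = True
            self.history.append(("burn", caller, None))

nft = MedicalNFT(patient="PO", default_agent="A", data_hash="Qm...")
nft.transfer("B")  # B becomes the owner and may read the DDS record
nft.burn("PO")
```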

4 Multi-Sensor Senior Falling Detection: A Case Study

As a common service provided by IoMT-based healthcare applications, falling detection aims to monitor sudden falls of the elderly who need urgent care. According to a report by the U.S. Centers for Disease Control and Prevention (CDC), the age-adjusted fall death rate increased from 55.3 per 100,000 seniors in 2012 to 78.0 per 100,000 in 2021, an increase of 41%, as shown in Fig. 4. The report also shows that falls are the leading cause of injury-related death among adults 65 and older, and the risk of injury and death increases with age. Therefore, detecting and preventing falls is very important.

Among the typical applications, fall detection can be divided into two categories: wearable device-based and remote monitoring-based. Wearable devices often use inertial sensors such as accelerometers and gyroscopes; beyond dedicated equipment, smart bracelets can also provide data collection services as the technology develops. With the advantages of small size, light weight, and low power consumption, inertial sensors can capture a living body's acceleration and angular velocity.


[Chart: "Deaths from Older Adult Falls" — number of deaths caused by falling per year, 2012–2021]

Fig. 4. Number of deaths caused by falling.

They usually achieve higher accuracy when recognizing walking, running, jumping, etc. At the same time, inertial sensors are generally unaffected by the external environment and can operate in real time. However, inertial sensors are not very accurate for complex motion and are prone to drift over time.

Fall recognition based on remote monitoring has also made significant progress in the past decade. In addition to standard RGB cameras, depth images can be obtained by depth cameras. Computer vision algorithms process the images captured by the camera to extract features for action recognition. The advantage of camera-based data is that it contains more motion information and can achieve higher accuracy for complex motion. However, camera-based data collection is often sensitive to lighting and background conditions and requires higher computing power to extract information, which can introduce delays that violate the real-time requirements of a system in the Internet of Things environment. In addition, action recognition based on a single sensor has gradually hit a bottleneck, so more and more research has begun to focus on algorithms based on multi-sensor fusion.

In this work, a Multi-sensor Action Recognition-based Senior falling detection (MARS) system is investigated as a case study. The detailed description of the multi-sensor fusion-based fall detection system shows the advantage of IoMT in ensuring accuracy while protecting the target's privacy. The MARS system adopts information fusion technology to leverage the abundant sensory data collected by IoMT. In this section, a brief introduction to the core information fusion technology is included to help readers grasp the principles, followed by a detailed discussion of the MARS system.

4.1 Information Fusion Using Dempster-Shafer Evidence Theory

Information Fusion. Since its inception in the 1970s, various data fusion algorithms have been developed, each with different emphases. These algorithms include the weighted sum, the Kalman filter, and Bayesian estimation. In recent years, with the rapid development of science and technology, newer approaches such as fuzzy control theory and machine learning have also been used for data fusion [39].

Weighted majority voting is a technique that assigns weights to each base classifier to indicate the importance of its output in the final decision [4]. The weights are determined based on the accuracy of the base classifiers in separating samples, so more accurate classifiers receive higher weights. However, this method ignores inaccurate base classifiers, and tuning the weights can be challenging.

Kalman filtering, developed in the 1960s, involves two steps: prediction and correction. It calculates the relative confidence and estimated covariance between past and current observations from different sensors to minimize the posterior estimated covariance [14]. Kalman filtering is suitable for low-level data fusion due to its low requirements on the fused data. However, it is limited to linear and Gaussian transformations, which can cause delays in real-time applications.

Machine learning-based fusion methods offer a better approach to information fusion, as they can automatically process the original input data. Ensemble Learning Systems (ELS) are commonly used for heterogeneous sensors [25]. Data fusion based on neural networks can also achieve good results, but the computational complexity and required computing resources increase with the number of sensors.

Dempster-Shafer Evidence Theory. The Dempster-Shafer (DS) evidence theory is a more general framework than probability theory [6,28]. It represents uncertainty and imprecision by accumulating evidence, enabling a multi-sensor system to provide accurate information fusion results without requiring additional prior information or conditional probabilities. The fundamental concepts are as follows.

Definition 1. The frame of discernment (FOD) $\Theta$ is a set of $N$ mutually exclusive and exhaustive hypotheses:

$$\Theta = \{\theta_1, \theta_2, \ldots, \theta_N\} = \{\theta_i \mid i = 1, 2, \ldots, N\} \tag{1}$$

where $N$ is the number of hypotheses and $\theta_i$ is an element of the FOD. $2^\Theta$ is the power set, i.e., the set of all possible subsets of $\Theta$.

Definition 2. The mass function, or Basic Probability Assignment (BPA), $m(A)$ satisfies the conditions

$$\sum_{A \in 2^\Theta} m(A) = 1, \qquad m(\emptyset) = 0 \tag{2}$$

where $\emptyset$ is the empty set and the value $m(A)$ represents the degree of belief assigned to the set $A$.

Definition 3. Let $m_1$ and $m_2$ be two sources of evidence on the FOD. The DS combination rule is defined as

$$m_{1,2}(\emptyset) = 0, \qquad m(A) = \frac{1}{1-K} \sum_{A_i \cap B_j = A} m_1(A_i)\, m_2(B_j) \ \ (A \neq \emptyset), \qquad K = \sum_{A_i \cap B_j = \emptyset} m_1(A_i)\, m_2(B_j) \tag{3}$$

The normalization factor $K$ indicates the degree of conflict between the two sources to be fused. If there are more than two sources, the combination rule can be applied iteratively.
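A compact sketch of Dempster's combination rule (Eq. (3)) follows; representing focal sets as frozensets and the example BPAs over {fall, no_fall} are our illustrative choices, not part of the original paper.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two BPAs given as {frozenset: mass} dicts over the same FOD (Eq. 3)."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # K: total mass assigned to conflicting evidence
    assert conflict < 1.0, "total conflict: sources cannot be combined"
    return {A: m / (1.0 - conflict) for A, m in combined.items()}

# Two sensor branches report beliefs over {fall, no_fall}:
fall, nofall = frozenset({"fall"}), frozenset({"no_fall"})
theta = fall | nofall                         # full FOD expresses ignorance
m_cam = {fall: 0.7, nofall: 0.2, theta: 0.1}  # camera-branch BPA (assumed values)
m_imu = {fall: 0.6, nofall: 0.3, theta: 0.1}  # inertial-branch BPA (assumed values)
print(dempster_combine(m_cam, m_imu))
```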

4.2 MARS System - Components and Methodology

Figure 5 shows an overview of the three-stage MARS system. First, the remote sensor is established using the Kinect V2 camera, which can gather skeleton data without requiring supplementary processing. Subsequently, the skeleton data is forwarded to the edge device for processing and decision-making. Finally, the system outcome is uploaded to the server layer: in an emergency, the system initiates an alert; otherwise, the data is stored.

Fig. 5. An Overview of the Falling Detection Procedure of MARS.
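A minimal sketch of this three-stage flow; `kinect_stream`, `edge_classifier`, and `server` are placeholder interfaces we introduce for illustration, not part of the actual MARS code.

```python
def mars_pipeline(kinect_stream, edge_classifier, server):
    """Three-stage flow of Fig. 5: sense -> edge decision -> server action."""
    for skeleton_frame in kinect_stream:         # stage 1: Kinect V2 skeleton capture
        label = edge_classifier(skeleton_frame)  # stage 2: on-edge action recognition
        if label == "Falling":                   # stage 3: alert in an emergency...
            server.alert(skeleton_frame)
        else:                                    # ...otherwise archive the data
            server.store(skeleton_frame, label)
```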

Feature Extraction. The Kinect sensor simplifies the human skeletal model by utilizing only 20 key-point data instead of 206 bones. This simplified model provides sufficient information to describe the motion of a general human model. The collected data is represented as P(x, y, z), where x and y indicate the position of the point in the two-dimensional plane and z represents the point's distance from the camera.


The torso remains relatively stable during motion while the limbs' points undergo relative motion. In order to minimize the effect of the z-axis and accurately represent the offset of the limb joints relative to the torso, the coordinates of the location points are transformed using the following equation:

$$f = p_n - p_{hip} \quad (n = 2, 3, \ldots, N) \tag{4}$$

where $p_n$ denotes all the nodes except the hip joint, and $p_{hip}$ refers to the hip-center joint [32]. The representation along the x, y, and z axes is:

$$\Delta x_n^m = x_n^m - x_h^m, \qquad \Delta y_n^m = y_n^m - y_h^m, \qquad \Delta z_n^m = z_n^m - z_h^m \tag{5}$$

$$f_x^m = [\Delta x_1^m, \Delta x_2^m, \ldots, \Delta x_n^m] \tag{6}$$
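A numpy sketch of the hip-centering transformation in Eqs. (4)–(6); the joint ordering (hip-center at index 0) is an assumption made for illustration.

```python
import numpy as np

HIP_CENTER = 0  # index of the hip-center joint in the skeleton array (assumed layout)

def hip_centered_features(skeleton):
    """Eqs. (4)-(6): subtract the hip-center joint from every other joint so the
    features encode limb offsets relative to the torso.
    skeleton: (N, 3) array of joint coordinates P(x, y, z) for one frame."""
    offsets = skeleton - skeleton[HIP_CENTER]         # Eqs. (4)-(5): delta x, y, z per joint
    offsets = np.delete(offsets, HIP_CENTER, axis=0)  # drop the (all-zero) hip row
    return offsets.reshape(-1)                        # Eq. (6): flatten into a feature vector

frame = np.random.rand(20, 3)     # 20 key points from the simplified Kinect model
f = hip_centered_features(frame)  # shape: (19 * 3,)
```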

Inertial sensors include acceleration and angular velocity sensors. The 3D accelerometer, gyroscope, and heart rate meter are used as wearable sensors in the proposed algorithm. Hence, the obtained data includes a set of three-dimensional accelerations and a set of three-dimensional angular velocities.

Decision Making. The Recurrent Neural Network (RNN) is a deep neural network that can effectively retain past information in time-series analysis by taking the current input and the previous hidden state as input. However, the RNN and its variant, the Long Short-Term Memory (LSTM) network, may encounter problems such as gradient explosion and vanishing when dealing with long-range dependencies during back-propagation. To address these issues, an improved version of the simple RNN, the Independently Recurrent Neural Network (IndRNN), has been proposed [13]. Unlike the simple RNN, the IndRNN architecture has neurons that are independent within the same layer but connected across layers. Figure 6 illustrates the architectural differences between the simple RNN and the IndRNN. In addition, in the IndRNN architecture, each neuron in a given hidden layer only receives its own past context information instead of being fully connected to all neurons in the same layer, further enhancing the model's performance.

The current study utilizes an IndRNN architecture [13] for the task of action recognition, as shown in Fig. 7. The architecture involves an IndRec+ReLU block that performs input and recurrent processing at each time step using the ReLU activation function. Batch normalization (BN) is applied before and after the activation function [13]. To process the recurrent inputs in the hidden layer, the Hadamard product is utilized. The n-th hidden state $h_{n,t}$ is updated at each time step $t$ following Eq. (7):

$$h_{n,t} = \sigma\left(W_n x_t + u_n \odot h_{n,t-1} + b_n\right) \tag{7}$$

where $W_n$ is the input weight and $u_n$ represents the recurrent weight. The ReLU activation function is represented by $\sigma$, and the Hadamard product is denoted by $\odot$. Finally, the bias is denoted by $b_n$ [32].
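A minimal PyTorch sketch of one IndRNN step (Eq. (7)), without the batch normalization and multi-layer stacking of the full architecture; the class name and dimension choices are illustrative assumptions.

```python
import torch
import torch.nn as nn

class IndRNNCell(nn.Module):
    """One IndRNN step (Eq. 7): h_t = ReLU(W x_t + u * h_{t-1} + b).
    The recurrent weight u is a vector, so each neuron only sees its own past."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.W = nn.Linear(input_size, hidden_size)  # input weights and bias b
        self.u = nn.Parameter(torch.empty(hidden_size).uniform_(-1, 1))  # recurrent vector

    def forward(self, x_t, h_prev):
        return torch.relu(self.W(x_t) + self.u * h_prev)  # elementwise (Hadamard) recurrence

# Unroll over a skeleton feature sequence of length T:
cell, h = IndRNNCell(input_size=57, hidden_size=128), torch.zeros(1, 128)
for x_t in torch.randn(30, 1, 57):  # T=30 frames, batch=1, 19 joints * 3 coords
    h = cell(x_t, h)
```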


Fig. 6. Comparison between simple RNN and IndRNN architectures.

Fig. 7. Architecture of IndRNN used in MARS.

In order to meet the needs of a smart-home environment, a simple 4-layer structure is used for the IndRNN architecture [13].

The basic idea of DS theory is to represent uncertain information as a probability set and then combine these sets using mathematical operations such as Dempster's combination rule. The verification result of the neural network, after passing through softmax, is a probability set in [0, 1], where each number represents the probability that the input belongs to a particular outcome. Information from different data sources can thus yield conclusions after passing through their respective trained neural networks, and the combination of the final decisions is then completed using DS theory.

Fig. 8. Information fusion on decision level.

5 Experimental Study

The proposed approach is evaluated experimentally by training and testing the IndRNN architecture. The training was performed on 4 Nvidia RTX A5000 GPUs, and the experiments were conducted in Python 3.8 using the PyTorch backend and the NVIDIA CUDA 11.6 library to enable parallel computing.

5.1 Experimental Data Set

The NTU RGB+D dataset [29] has been used for training in this research. This dataset comprises 60 action classes performed by 40 subjects from 80 different viewpoints. The actions were captured by three Microsoft Kinect V2 sensors that record RGB images, depth map sequences, 3D skeleton data, and infrared (IR) data. The 3D skeleton data includes the 3D coordinates of 25 body joints for each frame, but the last five joints (21 to 25) are excluded from feature extraction. The dataset contains 56,880 video samples from 60 different classes and provides two segmentation methods: Cross-Subject (40,320 and 16,560 samples for training and testing, respectively) and Cross-View (37,920 and 18,960 samples for training and testing, respectively). The size of the skeleton data file used to represent an action is only 10 KB, compared to 1.88 MB and 878 KB for RGB and depth images, respectively. The compactness of the skeleton data makes it suitable for devices with limited computing power and storage resources, such as IoT devices.

We utilize the publicly available SCUT-NAA dataset [37] and an additional falling dataset [19] as the inertial data. The SCUT-NAA dataset consists of activity data from 44 volunteers, including 34 males and 10 females. Each volunteer performed a single collection of 10 activities, resulting in a well-balanced dataset. In a separate experiment, fall data were collected from 32 volunteers, consisting of 28 males and 4 females. The data collection included four fall postures: forward, backward, left, and right, with sensors primarily placed on the chest and thighs. This dataset contains both acceleration and angular velocity data. After selecting common actions from the above databases and sorting them, we obtained an experimental data set with 8 actions: Sitting, Walking, Step walking, Jumping, Upstairs, Downstairs, Cycling, and Falling.

5.2 Experimental Results

By jointly using the Kinect-based skeleton images and the wearable inertial sensor data, our system obtains relatively good results. Figure 9 illustrates the use of key spatial joints and temporal stages for action recognition. Figure 10 shows the sudden change in inertial sensor data during a fall.

Table 1 presents comparisons of the proposed MARS with other existing methods. While the results reported in [35] show a higher accuracy than what our MARS system achieved, it is worth noting that a 512-dimensional vector is used as the input of the decision-level fusion in MARS. Considering the limitations of edge computing in terms of computing resources and storage space, a small-size method is more appropriate.

Fig. 9. An Illustration of key stages, Joints, and Motion for the Action of Falling.

Fig. 10. Example of Inertial Sensor Data During Falling.

Table 1. Comparison of various action detection methods.

Refs | Year | Sensors                                  | Methods | Accuracy | Datasets
[11] | 2017 | Radar, wearable, Kinect                  | SVM     | 91.36%   | Self-collected
[20] | 2020 | Pressure, stress, magnetic field sensors | SVM     | 96.67%   | Self-collected
[12] | 2020 | Inertial sensors                         | RF      | 97%      | SCUT-NAA
[35] | 2021 | Kinect, inertial sensors                 | 2D-CNN  | 98.90%   | CZU-MHAD
[38] | 2022 | Thermal camera                           | SVM     | 92%      | Self-collected
MARS | 2023 | Kinect, inertial sensors                 | IndRNN  | 97.32%   | NTU RGB+D + SCUT-NAA


The experimental results indicate that data fusion enhances fall detection efficiency. Since the sensing resources on edge devices are limited, efficient data fusion can strike a balance between the number of sensors and device resources. Fall detection is a prime example of an offline application in our design layout: various detectors around the target aim to quickly and accurately capture current information and generate a digital twin that closely represents the physical object.

6 Conclusions

As an important component of the entire Metaverse landscape, the development of virtual healthcare faces significant challenges, especially in senior safety monitoring services. Patients who live alone in a room of a nursing home or in a dispersed community may become victims of accidents or medical emergencies, such as falls or heart attacks. Introducing a DT-enabled monitoring system brings new solutions to these challenging situations, and the virtual community healthcare system it creates contributes substantially to the development of the Metaverse. Inspired by this compelling need, we illustrated a high-level architecture of DT-enabled virtual healthcare services, including the key technologies supporting different applications. We also proposed a virtual community healthcare framework (VirCom) utilizing IoMT and DT technologies in the context of the Metaverse. An elderly fall detection system based on the Multi-Sensor Action Recognition (MARS) approach was presented as a case study within the VirCom framework to protect the safety of the elderly without compromising their privacy. Experimental results demonstrate the effectiveness of the VirCom framework and its ability to meet the design goals. The proposed VirCom framework has the potential to revolutionize healthcare delivery for seniors by providing personalized telemedicine services, improving patient outcomes, and alleviating the shortage of available healthcare professionals.

References

1. Ara, A., Ara, A.: Case study: Integrating IoT, streaming analytics and machine learning to improve intelligent diabetes management system. In: 2017 International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS), pp. 3179–3182. IEEE (2017)
2. Augusto, V., Murgier, M., Viallon, A.: A modelling and simulation framework for intelligent control of emergency units in the case of major crisis. In: 2018 Winter Simulation Conference (WSC), pp. 2495–2506. IEEE (2018)
3. Barricelli, B.R., Casiraghi, E., Fogli, D.: A survey on digital twin: definitions, characteristics, applications, and design implications. IEEE Access 7, 167653–167671 (2019)
4. Chan, A.P., Yeung, D.S., Tsang, E.C., Ng, W.W.: Empirical study on fusion methods using ensemble of RBFNN for network intrusion detection. In: Advances in Machine Learning and Cybernetics: 4th International Conference, ICMLC 2005, Guangzhou, China, August 18–21, 2005, Revised Selected Papers, pp. 682–690. Springer (2006)
5. Chen, Y., Du, R., Luo, K., Xiao, Y.: Fall detection system based on real-time pose estimation and SVM. In: 2021 IEEE 2nd International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE), pp. 990–993. IEEE (2021)
6. Dempster, A.P.: Upper and lower probabilities generated by a random closed interval. The Annals of Mathematical Statistics, pp. 957–966 (1968)
7. Hammi, B., Zeadally, S., Perez, A.J.: Non-fungible tokens: a review. IEEE Internet of Things Magazine 6(1), 46–50 (2023)
8. Juan, S., Adlard, P.A.: Ageing and cognition. Biochemistry and Cell Biology of Ageing: Part II Clinical Science, pp. 107–122 (2019)
9. Karakra, A., Fontanili, F., Lamine, E., Lamothe, J., Taweel, A.: Pervasive computing integrated discrete event simulation for a hospital digital twin. In: 2018 IEEE/ACS 15th International Conference on Computer Systems and Applications (AICCSA), pp. 1–6. IEEE (2018)
10. Ketu, S., Mishra, P.K.: Internet of healthcare things: a contemporary survey. J. Netw. Comput. Appl. 192, 103179 (2021)
11. Li, H., et al.: Multisensor data fusion for human activities classification and fall detection. In: 2017 IEEE Sensors, pp. 1–3. IEEE (2017)
12. Li, R., Li, H., Shi, W.: Human activity recognition based on LPA. Multimed. Tools Appl. 79, 31069–31086 (2020)
13. Li, S., Li, W., Cook, C., Zhu, C., Gao, Y.: Independently recurrent neural network (IndRNN): Building a longer and deeper RNN. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5457–5466 (2018)
14. Liu, X., et al.: Kalman filter-based data fusion of Wi-Fi RTT and PDR for indoor localization. IEEE Sens. J. 21(6), 8479–8490 (2021)
15. Liu, Y., et al.: A novel cloud-based framework for the elderly healthcare services using digital twin. IEEE Access 7, 49088–49101 (2019)
16. Madine, M.M., et al.: Blockchain for giving patients control over their medical records. IEEE Access 8, 193102–193115 (2020)
17. Minerva, R., Lee, G.M., Crespi, N.: Digital twin in the IoT context: a survey on technical features, scenarios, and architectural models. Proc. IEEE 108(10), 1785–1824 (2020)
18. Nandy, S., Adhikari, M., Chakraborty, S., Alkhayyat, A., Kumar, N.: iBoNN: intelligent agent-based internet of medical things framework for detecting brain response from electroencephalography signal using bag-of-neural network. Futur. Gener. Comput. Syst. 130, 241–252 (2022)
19. Ojetola, O., Gaura, E., Brusey, J.: Data set for fall events and daily activities from inertial sensors. In: Proceedings of the 6th ACM Multimedia Systems Conference, pp. 243–248 (2015)
20. Pan, D., Liu, H., Qu, D., Zhang, Z.: Human falling detection algorithm based on multisensor data fusion with SVM. Mob. Inf. Syst. 2020, 1–9 (2020)
21. Phan, D.T., et al.: A flexible, wearable, and wireless biosensor patch with internet of medical things applications. Biosensors 12(3), 139 (2022)
22. Polyniak, K., Matthews, J.: The Johns Hopkins Hospital launches capacity command center to enhance hospital operations (Oct 2016). https://www.hopkinsmedicine.org/news/media/releases
23. Qu, Q., Chen, Y.: Digital twins in the AIoMT. In: Handbook of Security and Privacy of AI-Enabled Healthcare Systems and Internet of Medical Things, pp. 1–17. Taylor & Francis Group (2023)


24. Qu, Q., Xu, R., Chen, Y., Blasch, E., Aved, A.: Enable fair proof-of-work (PoW) consensus for blockchains in IoT by miner twins (MinT). Future Internet 13(11), 291 (2021)
25. Sagi, O., Rokach, L.: Ensemble learning: a survey. Wiley Interdiscip. Rev.: Data Mining Knowl. Disc. 8(4), e1249 (2018)
26. Sayeed, M.A., Mohanty, S.P., Kougianos, E., Zaveri, H.P.: eSeiz: an edge-device for accurate seizure detection for smart healthcare. IEEE Trans. Consum. Electron. 65(3), 379–387 (2019)
27. Scharff, S.: From digital twin to improved patient experience (Sep 2010). https://www.siemens-healthineers.com/en-us/news/mso-digital-twin-mater.html
28. Sentz, K., Ferson, S.: Combination of evidence in Dempster-Shafer theory (2002)
29. Si-Ahmed, A., Al-Garadi, M.A., Boustia, N.: Survey of machine learning based intrusion detection methods for internet of medical things. In: Applied Soft Computing, p. 110227 (2023)
30. Siyang, S., Lokavee, S., Kerdcharoen, T.: The development of IoT-based non-obstructive monitoring system for human's sleep monitoring. In: 2019 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-TW), pp. 1–2. IEEE (2019)
31. Sujaritha, M., Sujatha, R., Nithya, R.A., Nandhini, A.S., Harsha, N.: An automatic diabetes risk assessment system using IoT cloud platform. In: Haldorai, A., Ramu, A., Mohanram, S., Onn, C.C. (eds.) EAI International Conference on Big Data Innovation for Sustainable Cognitive Computing. EICC, pp. 323–327. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-19562-5_32
32. Sun, H., Chen, Y.: Real-time elderly monitoring for senior safety by lightweight human action recognition. In: 2022 IEEE 16th International Symposium on Medical Information and Communication Technology (ISMICT), pp. 1–6. IEEE (2022)
33. Sun, H., Chen, Y.: An overview of AIoMT applications. In: Handbook of Security and Privacy of AI-Enabled Healthcare Systems and Internet of Medical Things, pp. 1–18. Taylor & Francis Group (2023)
34. Sundaravadivel, P., Goyal, V., Tamil, L.: i-RISE: an IoT-based semi-immersive affective monitoring framework for anxiety disorders. In: 2020 IEEE International Conference on Consumer Electronics (ICCE), pp. 1–5. IEEE (2020)
35. Wang, X., Lv, T., Gan, Z., He, M., Jin, L.: Fusion of skeleton and inertial data for human action recognition based on skeleton motion maps and dilated convolution. IEEE Sens. J. 21(21), 24653–24664 (2021)
36. Xu, R., Chen, S., Yang, L., Chen, Y., Chen, G.: Decentralized autonomous imaging data processing using blockchain. In: Multimodal Biomedical Imaging XIV, vol. 10871, pp. 72–82. SPIE (2019)
37. Xue, Y., Jin, L.: A naturalistic 3D acceleration-based activity dataset & benchmark evaluations. In: 2010 IEEE International Conference on Systems, Man and Cybernetics, pp. 4081–4085. IEEE (2010)
38. Yang, Y., Yang, H., Liu, Z., Yuan, Y., Guan, X.: Fall detection system based on infrared array sensor and multi-dimensional feature fusion. Measurement 192, 110870 (2022)
39. Zhang, Y., Jiang, C., Yue, B., Wan, J., Guizani, M.: Information fusion for edge intelligence: a survey. Inform. Fusion 81, 171–186 (2022)

An Improved LED Aruco-Marker Detection Method for Event Camera

Shijie Zhang, Yuxuan Huang, Xuan Pei, Haopeng Lin, Wenwen Zheng, Wendi Wang, and Taogang Hou(B)

School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044, China
[email protected]

Abstract. The event camera is a bio-inspired sensor with numerous advantages over frame-based cameras, including high dynamic range, low latency, and no motion blur, making it ideal for detecting and tracking fast-moving objects while providing inter-frame data. Despite the extensive research and applications of frame-based cameras in detecting natural and fiducial object features, research on event cameras has been relatively sparse. In this paper, we propose a method for enhancing the accuracy of ArUco-like marker identification in event cameras by using LED ArUco markers. Through experiments conducted in dynamic environments, we have demonstrated the high efficiency and accuracy of this method. In the future, we aim to develop a new fiducial marker that is even faster, more precise, and more suitable for robot vision tasks.

Keywords: event camera · aruco code · object detection

1 Introduction

The event camera is a bio-inspired sensor with numerous advantages, including high dynamic range, low latency, and no motion blur. These capabilities make event cameras ideal for use in challenging scenes, particularly in low-light and rapidly changing environments. Compared to traditional cameras, event-based cameras with no motion blur and low latency are preferable for high-speed object tracking [12]. Existing event-based tracking algorithms generally involve extracting corners using corner detection and updating corner positions using optical flow or probability estimation methods to achieve reliable tracking performance. However, the slow speed of corner detection often fails to match the demand for object tracking in high-speed environments. In cluttered backgrounds, extracting the object corners separately from the background corners becomes challenging.

Traditional cameras often use artificial features, such as fiducial markers, for calibration, registration, and tracking [1,13]. Compared to natural features, artificial ones are easier to detect, have higher accuracy, and are more robust. ArUco markers are widely used as fiducial markers. In our prior work, we proposed a method to detect markers contained in event images by means of noise removal, corner point identification, quadrilateral detection, and marker verification. Experimental results demonstrate that our method exhibits high accuracy in marker detection in static environments compared to traditional methods; in dynamic environments, its detection accuracy improved, though only moderately. Our approach is robust under different exposure conditions: whereas RGB cameras only operate well within a limited exposure interval, our method remains effective even in some low-exposure environments. In this paper, we introduce an improved method for event-based ArUco marker detection, which enhances the accuracy of event cameras for ArUco-like marker identification by improving the filtering and corner detection algorithms, along with the integration of an error detection function.

2 Related Work

2.1 Event-Based Tracking

The event camera is an asynchronous and neuromorphic vision sensor that has low latency and no motion blur, making it ideal for sensing and tracking applications. Event-based tracking methods can be divided into two main categories: event-driven and event-and-frame combining approaches. The event-driven method involves clustering, filtering, and registering every generated event. In early research, object tracking was limited to simple shapes such as circles [10] and lines [3] using event clustering. Gaussian Mixtures and other methods are commonly used in microrobotics [10], high-speed robot tracking [4], and traffic monitoring [8]. For more complex object shapes, Mean-shift and Monte-Carlo methods [7], gradient descent [9], and Iterative Closest Point (ICP) [10] algorithms are used to achieve high-speed tracking by updating transformation parameters. In contrast, frame-based cameras are widely used in computer vision tasks for their ability to provide valuable color and texture information. They complement event cameras in object tracking, where the event camera offers dynamic information between the two frames of a traditional camera. Harris [6] and Fast [11] corner detectors are used to extract edge features, while the ICP algorithm is used to find motion parameters by comparing the photometric error between events and frames.

2.2 ArUco

Markers are commonly utilized in computer vision for tasks such as target detection, pose estimation, and augmented reality, among others. Currently, the binary square-based marker is widely used in marker detection [11]. Figure 1 illustrates several binary square markers in (a)-(c), which can be easily detected in the environment using common image processing methods.

Fig. 1. A comparison of visual markers.

The main advantage of an ArUco marker is that it provides enough corresponding points to obtain the camera pose. The internal binary encoding makes these markers particularly robust, allowing for the possibility of error detection and correction. [2] proposes a method to generate dictionaries of square markers with the theoretical maximum Hamming distance among the saved tags, and the ArUco marker is designed according to this method.

Fig. 2. Structure of the ArUco marker and its encoding approach.

An ArUco marker is a synthetic square marker consisting of a white background, a black border, and an internal binary matrix with a unique identifier (ID). The black border and white background facilitate fast detection in the image, and the internal matrix is used for binary encoding (black squares are encoded as "1" and white squares are encoded as "0") to determine the information carried by the marker, as shown in Fig. 2. The size and number of bits of all ArUco markers are obtained by a generic method of configurable dictionaries, instead of using a predefined set of markers. In any standard ArUco marker dictionary, the minimum Hamming distance between all markers effectively reduces the occurrence of error detection in marker identification. In addition, the binary encoding of ArUco allows for the application of identification, error detection, and correction, which enhances the accuracy of its identification. ArUco is renowned for its robustness, accuracy, and wide usage in computer vision.

Traditional ArUco markers cannot be directly captured by an event camera, because the event camera only registers changes in brightness. Simulating an ArUco marker using an LED dot matrix combines the advantages of ArUco and event cameras. By manipulating the changes in brightness of the LED dot matrix, events are generated in the environment, and the event camera responds to capture the simulated ArUco marker. Traditional ArUco marker detection is achieved by using image thresholding and boundary tracking, whereas no thresholding operation is necessary with an event camera. However, due to the physical gaps between the LED beads of the dot matrix, the entire marker cannot be directly detected using a boundary tracking algorithm. Thus, the traditional algorithm is not directly applicable to event camera identification.

3 Method

The methods of mean shift filtering, corner point refinement, error detection, and weighted statistics have been applied to our algorithm to improve the accuracy of event-camera-based marker detection. The methods are described in detail as follows.

3.1 Mean Shift Filter

The previous algorithm utilized mean filtering for active background noise removal. Equation 1 presents the general expression of the mean filtering algorithm. The algorithm operates on a specific pixel and calculates the output by taking a weighted linear combination of the center and neighboring pixels within the set filtering distance. However, this filtering method resulted in some edge pixels carrying marker information being filtered out, thereby blurring the marker's edge and impacting the accuracy of its identification. Consequently, we switched to the mean shift filtering algorithm for improvement.

I'_i = \sum_{j \in \Omega} w_{ij} q_j \quad (1)


Fig. 3. Schematic diagram of the mean shift algorithm. The red target is the original position and the orange one is the updated position. The green vector represents the direction of the mean shift. (Color figure online)

where \Omega is the filtering window centered on pixel i, w is the filtering weight, q is the original pixel value of the image, and I' is the output result.

As shown in Fig. 3, the mean shift filtering algorithm identifies clustering points by tracing the direction of increasing density. The method computes the mean offset value from the center within a sample space as per Eq. 2 and adjusts the center position using Eq. 3. The center point position undergoes successive updates until convergence is achieved.

M(x) = \frac{1}{k} \sum_{x_i \in S_h} (x_i - x) \quad (2)

x^{t+1} = M^t + x^t \quad (3)

where S_h is the 2D region of radius h with x as the center point, k is the number of points contained in the range of S_h, and x_i is a sample point contained in the range of S_h. M^t and x^t are the obtained mean offset value and the center in state t, respectively. Compared with the traditional mean filtering algorithm, the mean shift filter does not need the convolution kernel size to be set in advance and is only slightly affected by outliers. In addition, the algorithm is more suitable for noise removal in dynamic environments. A comparison of the effect after modifying the algorithm is shown in Fig. 4.
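As a concrete illustration of Eqs. 2 and 3, the following is a minimal Python sketch of the mean shift center update, assuming the events are given as 2D pixel coordinates; the function name, parameters, and stopping tolerance are illustrative, not the authors' implementation.

```python
import numpy as np

def mean_shift_point(x, events, h, max_iter=50, tol=1e-3):
    """Shift point x toward the densest cluster of event pixels.

    events: (N, 2) array of event pixel coordinates
    h:      radius of the circular window S_h
    """
    x = np.asarray(x, dtype=float)
    for _ in range(max_iter):
        # Sample points inside the window S_h of radius h around x
        window = events[np.linalg.norm(events - x, axis=1) <= h]
        if len(window) == 0:
            break
        m = (window - x).mean(axis=0)   # Eq. 2: M(x) = (1/k) * sum(x_i - x)
        x = x + m                       # Eq. 3: x_{t+1} = M_t + x_t
        if np.linalg.norm(m) < tol:     # stop once the center has converged
            break
    return x
```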

3.2 Corner Point Refinement

In the previous marker detection process, we framed the markers based on the geometric relationship of the four most peripheral corner points. One problem this causes is that once one of the four corner points is seriously biased, the whole marker identification algorithm fails.


Fig. 4. The effect of filtering. The black pixels are the events captured by the event camera. The region marked with a red circle is one of the important corner points in each image; the result shows that much of the active background noise is removed while important event information is preserved. (Color figure online)

Fig. 5. A common frame from the event packet. It can be clearly observed that two edges are easy to recognize, while some important corner points may not be obvious.

Figure 5 illustrates that when the marker is in motion, the two edges at the front of the marker in the direction of motion are well identified, resulting in more accurate corner points. To improve the algorithm's robustness, we filter out the three detected corner points most likely to be actual corner points in order to estimate the position of the fourth corner point. In the filtering algorithm, Eq. 4 is used to calculate the cosine value four times, once for every combination of three detected points, and the three points whose cosine value is closest to 0 are considered the actual corner points. The position of the remaining point is then determined. The process is depicted in Fig. 6.

\cos \alpha = \frac{a^2 + b^2 - c^2}{2ab} \quad (4)


Fig. 6. Schematic diagram of corner point refinement. The blue circles are the originally detected corner points. The three important points forming the angle closest to 90° are selected, and the position of the remaining point is determined by the selected points. (Color figure online)

where α is the angle of the triangle formed by three corner points, a and b are the lengths of the adjacent edges, and c is the length of the opposite edge. The aforementioned method decreases the likelihood of marker detection failure arising from corner point position bias, thereby enhancing detection accuracy, particularly in dynamic environments.
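One plausible reading of this procedure is sketched below in Python, under the assumptions that exactly four candidate corners are available and that the fourth corner is reconstructed as the remaining vertex of a parallelogram; the function name and the reconstruction rule are illustrative, not taken from the paper.

```python
import itertools
import numpy as np

def refine_corners(pts):
    """Keep the three of four candidate corners whose angle is closest to
    90 degrees (Eq. 4) and re-estimate the fourth as the missing vertex of
    the parallelogram spanned by the selected points."""
    pts = np.asarray(pts, dtype=float)
    best, best_cos = None, np.inf
    for triple in itertools.combinations(range(4), 3):
        for apex in triple:
            p0 = pts[apex]
            p1, p2 = (pts[i] for i in triple if i != apex)
            a, b = np.linalg.norm(p1 - p0), np.linalg.norm(p2 - p0)
            c = np.linalg.norm(p1 - p2)
            if a == 0 or b == 0:        # skip degenerate triangles
                continue
            cos_a = (a**2 + b**2 - c**2) / (2 * a * b)  # Eq. 4
            if abs(cos_a) < best_cos:                   # closest to 90 deg
                best_cos, best = abs(cos_a), (p0, p1, p2)
    p0, p1, p2 = best
    p3 = p1 + p2 - p0  # fourth vertex opposite the near-right-angle apex
    return np.stack([p0, p1, p2, p3])
```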

3.3 Error Detection

The previous method of image encoding and matching necessitated an exact match between the image encoding and a dictionary encoding for successful marker identification; this sacrifices the success rate to enhance marker identification accuracy. Our improved method exploits the large Hamming distance between ArUco markers and chooses the encoding from the dictionary with the smallest Hamming distance to the observed encoding. Marker identification succeeds only when that Hamming distance is within the accepted range [5].
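This matching rule can be sketched as follows in Python; the dictionary layout, the bit-string representation, and the accepted distance max_dist are illustrative assumptions, not values from the paper.

```python
def hamming(a: str, b: str) -> int:
    """Bitwise Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

def match_marker(code: str, dictionary: dict, max_dist: int = 2):
    """Return the ID of the dictionary encoding closest to the observed
    code, or None when even the best match exceeds the accepted range."""
    best_id, best_d = None, len(code) + 1
    for marker_id, ref in dictionary.items():
        d = hamming(code, ref)
        if d < best_d:
            best_id, best_d = marker_id, d
    return best_id if best_d <= max_dist else None
```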


Fig. 7. The diagram of error detection. The example is given for the case of a 2-bit error. The obtained encoding result is compared one by one with the reserved dictionary, the encoding with the minimum distance from the obtained encoding is selected, and it is checked whether that distance is within the specified range; the final result (succeed or failed) is output at last.

3.4 Weighted Statistic

In traditional pixel-based detection methods, the threshold is usually set to half of all pixels in a grid. If the number of black pixels within the grid exceeds the threshold, the grid is encoded as a binary value of “1”, otherwise it is encoded as “0”. However, this approach is not well-suited for detecting markers using an event camera due to several reasons. Firstly, as depicted in Fig. 8, the event image generated by the LED dot matrix used to stimulate the event camera’s response is not ideal, and some pixels in the region of each lamp bead are not captured by the event camera, making it unreasonable to set the threshold as in the traditional method. Secondly, as shown in Fig. 8, when using the event camera, there exists an accumulated time T during which the target’s displacement may occur, especially in fast-moving environments, resulting in slight marker deformation and inter-pixel interference, thus reducing the identification accuracy. In our previous work, we attempted to improve the accuracy by lowering the threshold, but it did not effectively eliminate the effect of target displacement. To address this issue, we propose a new strategy, in which we delineate different regions within the grid, each with different assigned weights to eliminate as many inter-pixel effects as possible. The schematic of this approach is shown in Fig. 8. The weight assignment and pixel statistic principles should satisfy Eqs. 5 and 6, respectively.


Fig. 8. The diagram of weighted statistic. (a) is the schematic of weight assignment. (b) is the process of the statistic, where q_i is the total number of black pixels in a region, and "1" and "0" are the encoding results representing whether the whole grid is black or not.

V = \sum_{i=1}^{N} w_i q_i \quad (5)

1 = \sum_{i=1}^{N} w_i Q_i \quad (6)

where V is the result of the statistic, N is the number of regions delineated, i is the index of a region, and w_i is its weight. Q_i and q_i are the total number of pixels in the region and the number of black pixels, respectively. We have designed a weight assignment strategy in which the weights decrease from the innermost region of each grid toward the periphery, with the innermost region having the highest weight and the outermost region having the lowest weight. This weight assignment order is implemented to minimize the interaction of pixels between neighboring grids, and it has been shown to reduce the probability of marker identification failure caused by algorithmic factors or the motion of the target.
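A minimal sketch of the per-cell encoding in Eqs. 5 and 6 follows, assuming each grid cell is a boolean pixel array partitioned into weighted regions and that a cell is encoded as "1" when the weighted vote exceeds one half; the 0.5 threshold and all names are illustrative assumptions rather than values from the paper.

```python
import numpy as np

def encode_cell(cell, regions):
    """Encode one marker grid cell from an event image.

    cell:    2D boolean array, True where a (black) event pixel fired
    regions: list of (mask, weight) pairs partitioning the cell, with the
             weights normalized so that sum_i w_i * Q_i == 1 (Eq. 6)
    """
    # Weighted vote V = sum_i w_i * q_i (Eq. 5), where q_i counts the
    # black pixels inside region i
    v = sum(w * cell[mask].sum() for mask, w in regions)
    return 1 if v >= 0.5 else 0  # assumed midpoint threshold
```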

4 Experiment

To evaluate the efficiency of marker identification based on an event camera using our developed algorithm, we conducted four sets of comparative experiments to determine its improvement over the original algorithm in identifying markers accurately. The experiments were carried out indoors using an EVK4 event camera (Prophesee, Paris, France) with a resolution of 1280 × 720. The marker was an 8 × 8 LED dot matrix with a power consumption of 0.5 W on average, and a size of 6 cm × 6 cm. The experimental platform was based on CPU (i7-12700H, Intel, California, USA) and GPU (GeForce RTX 3060, NVIDIA, California, USA) hardware. For each set of experiments, we captured video data in both static and dynamic environments, at distances of 30 cm and 50 cm from the marker. The identification success rates of the processed data using the two different methods are presented in Table 1.

Table 1. Accuracy of frame detection in common frames (%)

              Distance (30 cm)         Distance (50 cm)
              Static      Dynamic      Static      Dynamic
Original      99.38%      78.12%       90.05%      20.9%
Developed     92.51%      98.10%       76.10%      81.61%

Note: The identification efficiency of the developed algorithm is more than 8 times higher than before, at the expense of a certain degree of accuracy.

Both methods show high accuracy in detecting markers in static environments. However, in dynamic environments, the accuracy of our developed algorithm is significantly improved compared to the original algorithm.

5 Conclusion

Detection and tracking of objects is an important task in event-based vision, for which event cameras have great potential and applicability. Compared to traditional cameras, event cameras offer the advantages of low latency and no motion blur, providing more motion information. In this paper, we present an LED ArUco-marker detection method for event cameras that improves their accuracy in marker identification. The main improvements are summarized as follows:

1. A mean shift filter method is proposed to enhance robustness in dynamic environments. This method improves stability while minimizing event noise and increases detection efficiency.
2. An error detection and weighted statistic method is presented to effectively reduce the probability of marker identification failure caused by target motion.

The experimental results demonstrate that our method significantly improves accuracy, robustness, and efficiency in dynamic environments. In the future, we plan to design new fiducial markers for event cameras that are faster, more accurate, and better suited for robot vision tasks.


Acknowledgements. Research was supported by the Young Elite Scientists Sponsorship Program by CAST (Grant No. 2022QNRC001); China Postdoctoral Science Foundation (Grant No. 2021M690337); the National Science Foundation of China (Grant No. 62103035); the Fundamental Research Funds for the Central Universities (Grant No. 2020JBM265); the Beijing Natural Science Foundation (Grant No. 3222016) and the Beijing Laboratory for Urban Mass Transit (Grant No. 353203535).

References

1. Asayama, H., Iwai, D., Sato, K.: Fabricating diminishable visual markers for geometric registration in projection mapping. IEEE Trans. Visual Comput. Graphics 24(2), 1091–1102 (2017)
2. Comaniciu, D., Meer, P.: Mean shift analysis and applications. In: Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, pp. 1197–1203. IEEE (1999)
3. Conradt, J., Cook, M., Berner, R., Lichtsteiner, P., Douglas, R.J., Delbruck, T.: A pencil balancing robot using a pair of AER dynamic vision sensors. In: 2009 IEEE International Symposium on Circuits and Systems, pp. 781–784. IEEE (2009)
4. Delbruck, T., Lang, M.: Robotic goalie with 3 ms reaction time at 4% CPU load using event-based dynamic vision sensor. Front. Neurosci. 7, 223 (2013)
5. Garrido-Jurado, S., Muñoz-Salinas, R., Madrid-Cuevas, F.J., Marín-Jiménez, M.J.: Automatic generation and detection of highly reliable fiducial markers under occlusion. Pattern Recogn. 47(6), 2280–2292 (2014)
6. Harris, C., Stephens, M., et al.: A combined corner and edge detector. In: Alvey Vision Conference, vol. 15, pp. 10–5244. CiteSeer (1988)
7. Lagorce, X., Meyer, C., Ieng, S.H., Filliat, D., Benosman, R.: Asynchronous event-based multikernel algorithm for high-speed visual features tracking. IEEE Trans. Neural Netw. Learn. Syst. 26(8), 1710–1720 (2014)
8. Litzenberger, M., et al.: Embedded vision system for real-time object tracking using an asynchronous transient vision sensor. In: 2006 IEEE 12th Digital Signal Processing Workshop & 4th IEEE Signal Processing Education Workshop, pp. 173–178. IEEE (2006)
9. Ni, Z., Ieng, S.H., Posch, C., Régnier, S., Benosman, R.: Visual tracking using neuromorphic asynchronous event-based cameras. Neural Comput. 27(4), 925–953 (2015)
10. Ni, Z., Pacoret, C., Benosman, R., Ieng, S., Régnier, S.: Asynchronous event-based high speed vision for microparticle tracking. J. Microscopy 245(3), 236–244 (2012)
11. Zhang, S., et al.: RPAS: a refined positioning and analysis system based on aerial platform for swimming scenes. In: 2022 12th International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER), pp. 688–693. IEEE (2022)
12. Zhang, S., Sang, F., Li, J., Tang, T., Zhang, J., Hou, T.: ERCM: bionic event-based registration method based on contrast minimum for intelligent unmanned systems. In: 2022 37th Youth Academic Annual Conference of Chinese Association of Automation (YAC), pp. 734–739. IEEE (2022)
13. Zhang, Z.: A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 22(11), 1330–1334 (2000)

Improvement of CenterNet Based on Feature Pyramid Networks

Yatao Yang, Zihan Yang, Yao Huang, and Li Zhang(B)

Shenzhen University, 3688 Nanhai Avenue, Nanshan District, Shenzhen, China
zhang [email protected]

Abstract. To solve the lack of multi-scale feature fusion in the anchor-free object detection algorithm CenterNet, an improved CenterNet-F algorithm combined with a feature pyramid is proposed. Based on the network structure of CenterNet, the data processing method batch normalization is replaced by group normalization, which performs better. Multi-scale feature fusion is carried out in the network output head combined with the feature pyramid structure, which effectively improves the accuracy of small-scale object detection. The experimental results show that the detection accuracy of the improved algorithm is improved from 30.1 AP to 30.6 AP for all scales of objects on the COCO dataset, and by 3.5 AP for small-scale objects.

Keywords: Object detection · Feature pyramid · CenterNet · Group Normalization

1 Introduction

1.1 CenterNet [1]

Object detection algorithms can be divided into anchor-based and anchor-free algorithms; the difference is whether the candidate target boxes are extracted using anchors. An anchor (also known as an anchor box) is one of a set of rectangular boxes clustered on the training set using methods like k-means before training, which represent the length and width scales of the main target distribution in the dataset. The n candidate rectangular boxes are used as sliding windows to extract features from these anchors on the feature map generated at inference time for further classification and regression. Previously, the accuracy of anchor-based methods was generally higher than that of anchor-free methods; thanks to the emergence of FPN (Feature Pyramid Networks) [2] and Focal Loss [3], the limitations of anchor-free algorithms in multi-scale detection and central region prediction have been compensated, giving anchor-free algorithms performance comparable to anchor-based ones.

CenterNet is an anchor-free algorithm that enhances the CornerNet [4] and ExtremeNet [5] object detection networks. It represents the target by locating the target centroid and then regressing some attributes of the target at the centroid location, thus turning the object detection problem into a standard keypoint estimation problem. We simply pass the image into the fully convolutional network to obtain a heat map whose peak points are the object centroids, and at each peak position the network predicts the width and height information of the target. The model is trained using standard supervised learning, and inference is simply a single forward propagation of the network without post-processing such as non-maximum suppression (NMS) [6], making it one of the advanced object detection algorithms currently proposed.
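To make the decoding step concrete, here is a minimal single-class sketch in Python/PyTorch: a 3×3 max-pool keeps only local heat-map maxima, so detections are read off directly without NMS. The tensor layout and function name are illustrative assumptions, not CenterNet's reference implementation.

```python
import torch
import torch.nn.functional as F

def decode_centers(heatmap, wh, k=100):
    """Decode boxes from an (H, W) center heatmap and a (2, H, W)
    width/height map; local maxima of the heatmap are object centers."""
    # A 3x3 max-pool suppresses non-peak responses, replacing NMS
    local_max = F.max_pool2d(heatmap[None], 3, stride=1, padding=1)[0]
    scores = torch.where(heatmap == local_max, heatmap,
                         torch.zeros_like(heatmap))
    topk_scores, topk_idx = scores.flatten().topk(k)
    W = heatmap.shape[-1]
    ys, xs = topk_idx // W, topk_idx % W
    w, h = wh[0, ys, xs], wh[1, ys, xs]  # regressed size at each peak
    boxes = torch.stack([xs - w / 2, ys - h / 2,
                         xs + w / 2, ys + h / 2], dim=1)
    return boxes, topk_scores
```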

1.2 Feature Pyramid Networks

Feature pyramid network (FPN) is a network structure that utilizes the inherent multi-scale pyramid structure of deep convolutional neural networks to construct feature pyramids with very little computational effort. FPN constructs features through three processes: bottom-up pathways, top-down pathways, and same-layer (lateral) connections, where the top-down structure is used to construct feature maps with high-level semantic information at different scales through the lateral connections. Features at different levels of the feature map have different expressiveness: shallow features mainly reflect details such as light and dark and edges, while deep features reflect a richer overall structure. Shallow features alone cannot contain the overall structural information, which weakens the expressiveness of the features. If the deep features are fused into the shallow features, both the details and the whole object are taken into account, and the fused features have a richer expressive power. So, the advantage of this method is that feature extraction is performed for each scale of the image, which can produce multi-scale feature representations with strong semantic information at all levels of feature maps, even including some high-resolution feature maps. Its corresponding disadvantage is a significant increase in inference time due to the large memory consumption. The general structure of the algorithm consists of bottom-up lines, top-down lines, and horizontal connections. Figure 1 shows the structure of the feature pyramid network.

Fig. 1. The structure of the feature pyramid network.


The top-down process essentially constructs a new feature map by transforming the upper layer’s feature map by scale. The new feature map needs to be at the same scale as the lower layer’s feature map, thus ensuring that the feature maps can be fused together. In the length and width directions, an up-sampling method is used to pull the width and height to the same size as the lower feature map; in the depth direction, the depth of the upper feature map is compressed to the same depth as the lower feature map by a 1×1 convolution. Since the bottom feature map contains more localization details, and the top feature map contains more target feature information, in order to use the localization details information of the bottom layer, the lateral connection adds the new feature map and each corresponding element in the original lower feature map, which achieves the fusion of the upper and lower features, and then output each of the fused feature maps as a new feature map with depth d. By doing so, the high resolution and strong semantic features can be obtained, which is beneficial for the detection of small targets.
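The top-down merge described above can be sketched in PyTorch as follows; the 1×1 depth compression, bilinear upsampling, and element-wise lateral addition follow this section's description, while the class name and channel arguments are illustrative.

```python
import torch.nn as nn
import torch.nn.functional as F

class TopDownMerge(nn.Module):
    """One FPN top-down step: a 1x1 convolution compresses the upper
    feature map to the lower map's depth, upsampling matches its width
    and height, and the lateral connection adds the maps element-wise."""
    def __init__(self, upper_channels, lower_channels):
        super().__init__()
        self.reduce = nn.Conv2d(upper_channels, lower_channels, kernel_size=1)

    def forward(self, upper, lower):
        top = self.reduce(upper)  # match depth
        top = F.interpolate(top, size=lower.shape[-2:],
                            mode="bilinear", align_corners=False)  # match size
        return lower + top        # lateral element-wise addition
```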

2 The Improved Method

2.1 Optimization of Feature Pyramid Network

The original CenterNet uses ResNet-50 [7], which first convolves the input image through a convolution kernel, downsampling the original image to change its size from 3*512*512 (3 is the number of channels, 512 is the image size) to a feature map with a scale of 192*64*64. The feature map then passes through the deconvolution module [8], which includes three deconvolution groups; each group includes a 3*3 convolution and a deconvolution, each deconvolution doubles the size of the feature map, and three upsampling steps finally produce a feature map with the size of 64*128*128. In our modified network, named CenterNet-F, the main network framework still uses ResNet-50, and an FPN module is added after the main network as a neck network, which can fuse more layers of features and enhance the network's ability to extract small target objects in images, as mentioned above. However, after directly adding the FPN network we found that the final loss function did not converge, and NaN values appeared after only a few epochs of training. After many experiments, we confirmed that the reason for the non-convergence is that the gradient of the backbone network is too large, which makes the network unstable. One reason for the large backbone gradient is the lack of good data processing; another, more important reason is that the feature pyramid network usually uses the bilinear interpolation method for upsampling. Because bilinear interpolation is a linear function that cannot fit a nonlinear model well, it can lead to non-convergence of the final output loss function. To solve this problem, we replace the traditional bilinear interpolation upsampling method with the deconvolution method, which better handles the nonlinearity and improves the fitting ability of the model. Even so, after a few training cycles the loss function still oscillated strongly, which we found to be caused by the gradient dispersion or gradient explosion occurring during the training process. Finally, we fitted the model with the deconvolution together with the activation function, which used ReLU as shown in Eq. 1.

f(x) = \max(0, x) \quad (1)

Also, we added Group Normalization to the data after each nonlinear processing, and obtained the desired effect. The improved network is shown in Fig. 2.

Fig. 2. The structure of the optimized feature pyramid network.
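A minimal sketch of one such upsampling group follows, assuming a 4×4 deconvolution kernel with stride 2 (which exactly doubles the spatial size) and 32 normalization groups; the exact kernel size is not stated in the paper, so these values are illustrative.

```python
import torch.nn as nn

class DeconvUpsample(nn.Module):
    """Upsampling group used in place of bilinear interpolation:
    deconvolution -> ReLU (Eq. 1) -> Group Normalization."""
    def __init__(self, in_ch, out_ch, groups=32):
        super().__init__()
        self.block = nn.Sequential(
            # (H, W) -> (2H, 2W) with an assumed 4x4 kernel, stride 2
            nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4,
                               stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.GroupNorm(groups, out_ch),  # out_ch must divide by groups
        )

    def forward(self, x):
        return self.block(x)
```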

2.2 Data Processing

When training is performed, the data are generally normalized between the convolution and activation layers to make their distribution consistent. In CenterNet, Batch Normalization [9] is used to normalize the data for each batch of the input network, pulling the data back to a standard normal distribution with mean 0 and variance 1. Batch Normalization addresses the problem that each batch has a different distribution and that this shift in data distribution makes learning difficult for the next layer: it keeps the data distribution consistent and avoids vanishing gradients. However, because it normalizes the batch along the channel direction, it is sensitive to the batch size, which is the drawback of Batch Normalization. If the batch size is too small, the calculated mean and variance are not enough to represent the whole data distribution, which makes the learning of the network difficult and is not conducive to convergence. The operations of Batch Normalization are described by Eqs. 2 to 6.

B = \{x_{1 \ldots m}\} \quad (2)

where x denotes the values over a mini-batch B.

\mu_B = \frac{1}{m} \sum_{i=1}^{m} x_i \quad (3)


Use Eq. 3 to calculate the average value of the mini-batch.

\sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2 \quad (4)

Use Eq. 4 to calculate the variance of the mini-batch.

\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \quad (5)

y_i = \gamma \hat{x}_i + \beta \quad (6)

Normalize x using Eq. 5. Equation 6 scales and shifts the normalized data. The aim is to let the neural network learn the parameters γ and β, so that it can determine whether the preceding normalization operation is beneficial and, if not, use γ and β to offset part of it. The main idea of the improved normalization is to group along the channel dimension and normalize within each group, which is independent of the batch size, is not constrained by it, and therefore behaves better and more robustly with small batch sizes.

\{x_i \mid i = (i_N, i_C, i_H, i_W)\} \quad (7)

In Eq. 7, the index i of x_i represents the coordinates of the four dimensions, and x_i is a point at the specified location in the feature map.

S_i = \left\{ k \;\middle|\; k_N = i_N,\ \left\lfloor \frac{k_C}{C/G} \right\rfloor = \left\lfloor \frac{i_C}{C/G} \right\rfloor \right\} \quad (8)

Here G is the number of groups, which is a pre-defined hyper-parameter and equals to 32 by default. C/G is the number of channels per group. The motivation of such grouping is to perform normalization in the same group of the same feature map, and the group is only divided in the channel dimension, so the normalization operation is independent of the batch size. The subsequent operation is the same procedure as what Eqs. 2 to 5 describe. The data processing before and after using the improved normalization method is shown in Fig. 3.
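For illustration, Eqs. 7 and 8 translate into the following PyTorch sketch of Group Normalization over an (N, C, H, W) tensor; this mirrors the published algorithm [10] rather than the paper's own code, and the function name is illustrative.

```python
import torch

def group_norm(x, gamma, beta, G=32, eps=1e-5):
    """Group Normalization: split the C channels into G groups and
    normalize each group within each sample, independently of the batch."""
    N, C, H, W = x.shape
    xg = x.view(N, G, C // G, H, W)        # grouping per Eq. 8
    mean = xg.mean(dim=(2, 3, 4), keepdim=True)
    var = xg.var(dim=(2, 3, 4), keepdim=True, unbiased=False)
    xg = (xg - mean) / torch.sqrt(var + eps)   # Eq. 5 within each group S_i
    y = xg.view(N, C, H, W)
    return gamma.view(1, C, 1, 1) * y + beta.view(1, C, 1, 1)  # Eq. 6
```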

3 Experiments

We compared the accuracy of object detection of the original CenterNet and our improved CenterNet-F, and the result is shown in Table 1. The average accuracy for all targets is 30.1 AP and 11.9 AP for small targets before the improvement, and the average accuracy for all targets of CenterNet-F is 30.6 AP, which is 0.5 AP higher than that of CenterNet. The detection accuracy for other scales are also improved.


Fig. 3. The data processing before and after using the improved normalization method [10]

Table 1. The accuracy comparison between CenterNet and our improved CenterNet-F.

Method        Backbone    AP    AP(50)  AP(75)  AP(Small)  AP(Medium)  AP(Large)
CenterNet     ResNet-50   30.1  47.5    31.5    11.9       34.3        46.6
CenterNet-F   ResNet-50   30.6  48.1    32.2    15.4       35.4        41.7

Fig. 4. Detection results of CenterNet.

Figures 4 and 5 show the detection results of CenterNet and CenterNet-F respectively. From the figures, we can easily see that the improved algorithm has higher detection accuracy for objects in images, especially for small target objects.


Fig. 5. Detection results of CenterNet-F.

We set the training period to 140 epochs, and Table 2 shows the accuracy comparison between the original algorithm and the improved algorithm at different epochs.

Table 2. Accuracy of object detection before and after improvement.

Method            CenterNet   CenterNet-F
AP(Epochs=20)     16.2        17.5
AP(Epochs=40)     21.7        22.8
AP(Epochs=60)     24.5        25.7
AP(Epochs=80)     25.8        26.7
AP(Epochs=100)    28.4        29.9
AP(Epochs=120)    29.7        30.4
AP(Epochs=140)    30.1        30.6

We also compared the convergence of the different loss functions across training epochs, as shown in Figs. 6, 7 and 8, where hmap loss refers to the heatmap loss function, reg loss refers to the center-shifted loss function, and w h loss refers to the width-height loss function. From these figures, it can be seen that CenterNet-F converges faster and fluctuates less than CenterNet.


Fig. 6. heatmap loss function.

Fig. 7. center-shifted loss function.

From the above experiments, we can conclude that the detection accuracy of the improved detector CenterNet-F is significantly higher than that of CenterNet in scenes with dense and small objects.


Fig. 8. width-height loss function.

4 Conclusion

In this paper, we add an FPN network to the original CenterNet network. Through the FPN network, the features extracted by the backbone can be fused to obtain different semantic features, which enriches the receptive field of the model and makes it more sensitive to small-object feature information, so it can detect features that the original CenterNet network cannot recognize. However, directly adding the FPN network causes the loss function to fluctuate drastically and fail to converge, so we improve the CenterNet network by changing the upsampling method in the FPN network from bilinear interpolation to deconvolution. In the data processing stage, the normalization is changed from batch normalization to group normalization, which groups along the channel dimension, in order to solve the problem that the network loss fluctuates too much with the batch size. The final experimental results prove that the losses converge successfully, with different degrees of improvement for different scales of objects, and with the most obvious improvement in detection accuracy for small-scale objects.

References

1. Zhou, X., Wang, D., Krähenbühl, P.: Objects as points. CoRR, abs/1904.07850 (2019)
2. Lin, T.-Y., Dollár, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936–944 (2017)


3. Lin, T.-Y., Goyal, P., Girshick, R.B., He, K., Dollár, P.: Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 42, 318–327 (2020)
4. Law, H., Deng, J.: CornerNet: detecting objects as paired keypoints. Int. J. Comput. Vision 128, 642–656 (2019)
5. Papadopoulos, D.P., Uijlings, J.R.R., Keller, F., Ferrari, V.: Extreme clicking for efficient object annotation. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 4940–4949 (2017)
6. Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1137–1149 (2015)
7. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
8. Fu, C.-Y., Liu, W., Ranga, A., Tyagi, A., Berg, A.C.: DSSD: deconvolutional single shot detector. ArXiv, abs/1701.06659 (2017)
9. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: ICML (2015)
10. Wu, Y., He, K.: Group normalization. Int. J. Comput. Vision 128, 742–755 (2019)

Information Security Protection Techniques for Substation Operation and Maintenance Edge Gateway

Fenfang Li1(B), Lixiang Ruan2, Rongrong Ji3, Yifei Shen2, and Mingguo Hou1

1 Nari Group Corporation, Nanjing, Jiangsu, China
[email protected]
2 State Grid Zhejiang Electric Power Co. Ltd. Research Institute, Hangzhou, Zhejiang, China
3 State Grid Zhejiang Electric Power Co. Ltd. Supervoltage Branch, Hangzhou, Zhejiang, China

Abstract. The cloud-edge collaborative operation and maintenance system based on edge computing realizes the remote operation and maintenance of substation equipment. As the core equipment of the system, the edge gateway is the carrier of local processing of operation and maintenance business and of cloud-edge collaboration. Its intrinsic safety is directly related to the reliability of the operation and maintenance system. This paper designs a complete information security protection scheme for the substation edge gateway, covering the equipment itself, microservice technology, communication authentication, identity authentication, security audit, and the business process. Through network attack, vulnerability scanning, integrity verification and other security testing methods, the feasibility of the scheme is verified. This scheme can effectively improve the security level of the operation and maintenance system while ensuring the main business performance of the system.

Keywords: Remote Operation and Maintenance · Edge Calculation · Safety Protection · Power Information Security

1 Introduction

The promotion of the digital transformation of the power grid and the implementation of the "unattended + centralized monitoring" mode of the substation have brought new challenges to the operation and maintenance management of substation equipment. The traditional operation and maintenance mode based on on-site operations can no longer meet the needs of "unattended" substations, and remote operation and maintenance technology for substation equipment has become a research hotspot. At the same time, as a new computing paradigm, edge computing has obvious advantages in improving the real-time performance of business processing and the convenience of business expansion, and in ensuring the security of data. Therefore, some experts and scholars have begun to apply edge computing to the remote operation and maintenance of substation equipment. Document [1] introduces a remote operation and maintenance technology for substation automation devices based on the microservice architecture, designs the architecture and functions of the remote operation and maintenance system, and applies container technology to realize the appization of business applications, but it does not study the security protection of operation and maintenance information. Document [2] proposed a wide-area operation and maintenance system security protection technology for substation automation equipment, focusing on information security protection through login authentication and communication authentication for the service management center, the operation and maintenance center and the substation. The regional station intelligent equipment integrated operation and maintenance system introduced in document [3] uses security probe technology to monitor the host equipment in the system, and uses a service agent to reduce the communication security problems of the dispatching data network. Document [4] utilizes cloud-edge collaborative interaction technology and the key technology of intelligent identification of the platen state by mobile terminals to realize full-process, online control of substation switching operations; the mobile application service of the information extranet ensures the communication security between the mobile terminal and the anti-misoperation cloud. The above documents focus more on the realization of operation and maintenance functions; in terms of information security, they propose local reinforcement methods represented by border protection and communication security authentication, and lack a complete security protection system for operation and maintenance services. Therefore, based on the cloud-edge collaborative remote operation and maintenance architecture for substation equipment, this paper introduces an overall scheme for remote operation and maintenance information security protection of substation automation equipment based on the edge gateway, analyzes the main security problems that may occur in the edge gateway, and expounds the security strategies that the edge gateway can adopt in terms of host security, microservice security, communication security, business process security and other aspects. While using edge computing technology to improve business processing speed, the scheme ensures the security of remote operation and maintenance work.

2 Overall Structure of Substation Edge Gateway

The cloud-edge collaborative operation and maintenance system adopts a two-level integration architecture of "cloud side + edge side". The cloud-side operation and maintenance master station is deployed in provincial and municipal dispatch centers or centralized monitoring centers and is oriented to network-wide business services. The operation and maintenance master station provides functions such as a big data center, an APP application store, advanced business processing, graphic display, and human-computer interaction; at the same time, it specifies a standardized model interface to realize cloud-edge data interaction. On the edge side, the operation and maintenance edge gateway is deployed in the substation, where it collects real-time data and computes and processes it locally. The edge gateway connects various types of electrical equipment to the south, and supports common communication protocols such as RS485, Modbus, DL/T 645, and the IEC 61850 protocol. Northbound data is transmitted to the operation and maintenance master platform, and the data reporting communication protocols include DL/T 634.5.104, MQTT, GSP, etc. The edge gateway adopts a micro-service architecture, and uses docker lightweight container technology to package applications and dependent libraries into APPs to isolate the application running environment and reduce the coupling between services. The existing operation and maintenance business APPs include remote virtual human-computer interaction, device intelligent diagnosis, whole-station SCD management and control, and secondary equipment inspection. The APP packages are managed by the application store of the operation and maintenance master station. The application store can complete the functions of APP listing and deletion, security verification, version management, and remote distribution, realizing flexible deployment and expansion of applications. Through the human-computer interaction interface of the operation and maintenance master station, users can perform application store management, advanced application display, edge gateway parameter configuration, and gateway data and alarm query functions. The overall architecture design of the cloud-edge collaborative operation and maintenance system is shown in Fig. 1.

Fig. 1. Cloud-edge collaborative operation and maintenance system architecture diagram.

3 Host Security

The security level of substation operation and maintenance business is strict. In the security protection scheme of the edge gateway, it is first necessary to ensure the security of the gateway equipment itself. Therefore, a trusted computing platform is introduced to strengthen the security of the gateway device at the chip, hardware and operating system levels. At the same time, intranet security monitoring software is installed in the edge gateway system, which can collect network security events and actively report them to the network security monitoring device.

3.1 Trusted Computing Platform

Safe and controllable hardware products and operating systems are used in the edge gateways for substation operation and maintenance. On the hardware, an integrated TPCM (Trusted Platform Control Module) hard core or a trusted computing platform using ARM TrustZone technology is chosen [5]. At present, many domestic hardware and software trusted computing platforms take comprehensive measures across the trusted hardware core, processor, operating system and other modules to establish a complete chain of trust, and these can be well applied to the substation operation and maintenance edge gateway [6]. At the operating system level, the edge gateway formulates access control policies for files, devices and sockets, and prohibits the opening of high-risk ports and high-risk services.

3.2 Network Security Monitoring

Intranet security monitoring software, also known as network security probe software, is installed on the operating system of the edge gateway and is used to actively report the network security information of the edge gateway. The network security probe software has its own security perception technology, which can generate security events and report them to the substation network security monitoring device, and finally access the network security management platform for unified management and control. As shown in Fig. 2, the network security events that the software can monitor include system login and operation, plugging and unplugging of external devices, abnormal network access, illegal port opening, and key file changes.

Fig. 2. Network security incident reporting process.

4 Microservice Security

The edge gateway adopts a micro-service architecture design. Operation and maintenance business programs such as virtual human-computer interaction, device intelligent diagnosis, SCD management, and secondary equipment inspection are each encapsulated in an independent operating environment. The edge gateway relies on docker's own security mechanism and unified orchestration to realize the security isolation of the microservice APP operating environments and a reasonable allocation of resources.

4.1 Docker and Namespaces

The edge gateway adopts docker container technology to realize APP microservice deployment. Docker technology is based on a C-S architecture, and the core modules include the client, daemon, image warehouse and registry. The client is the operation interface provided by docker to users. The background daemon is responsible for receiving and processing operation requests and is the manager of docker services. The application and its runtime dependencies are packaged as an image and stored in the image warehouse; after a startup instruction is issued, a container is generated from the image and run in an independent environment. Docker realizes security isolation between processes and file systems by virtue of the namespace function of the Linux kernel, and a running container cannot perceive the processes and file directories of other containers [7]. Applications with different communication protocols in the edge gateway are packaged and run in different docker containers, so there will be no resource conflicts or configuration overwrites. However, it should be noted that special directories of the gateway host system, such as /sys/ and /proc/sys/, can be directly accessed by super users in the container [8]. Using the Linux capability mechanism, there are therefore two options: run as the super user in the container but deprived of capabilities such as CAP_SYS_ADMIN and CAP_CHOWN, or run as a normal user granted only specific capabilities. The latter is adopted inside the edge gateway.

4.2 Resource Allocation for Microservices

Another security issue with microservices comes from dangerous configurations and dangerous mounts. The Cgroups component of Linux can help docker allocate resources, control the CPU, memory, disk I/O and other resources of each container, and prevent attackers from conducting denial-of-service (DoS) attacks. Beyond that, the dockerfile and docker image themselves may also contain security vulnerabilities, and manual configuration and management of containers can introduce unnecessary problems, such as illegal port exposure and critical directory mounting. Therefore, the management software on the edge gateway provides a unified APP container configuration management entry. Users can enter settings through the interface, or the management software can uniformly allocate container running resources, including port monitoring, network configuration, directory mounting, and read and write permissions. After the submission is confirmed, the management software verifies the validity of the configuration; if it is legal, the APP container is run and the parameters are recorded in the database, otherwise the container is rejected. The running status of the APP container, including resource status such as CPU occupancy, memory occupancy, and traffic volume, is also monitored and displayed by the edge gateway management software. When the resource occupancy of a container reaches a certain limit, the management software issues an alarm, and the container is stopped to secure the gateway host.
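The configuration constraints described in Sects. 4.1 and 4.2 can be expressed, for example, through the docker-py API; the image name, user, capability list, port, mount path, and resource limits below are illustrative assumptions, not the gateway's actual configuration.

```python
import docker

client = docker.from_env()

# Run an operation-and-maintenance APP as a normal user with only the
# specific capabilities it needs, and with explicit resource limits so a
# runaway container cannot exhaust the gateway host (DoS prevention).
container = client.containers.run(
    "om-app:latest",                    # illustrative image name
    detach=True,
    user="appuser",                     # do not run as root in the container
    cap_drop=["ALL"],
    cap_add=["NET_BIND_SERVICE"],       # grant only what is required
    mem_limit="256m",                   # memory ceiling via cgroups
    nano_cpus=500_000_000,              # 0.5 CPU
    ports={"8883/tcp": 8883},           # expose only the vetted port
    volumes={"/data/om-app": {"bind": "/app/data", "mode": "ro"}},
)
```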


5 Communication Security

The communication process of the edge gateway can be divided into internal communication and external communication. Internal communication includes interaction between microservice APPs and the management software, and between microservice APPs themselves; the security protection technologies adopted are mainly certificate authentication and token-based authority authentication. External communication security is divided into two parts, southbound device access and northbound reporting to the master station, which connect to the substation end devices and the master station respectively. Since the device types and communication protocols are not fixed, different information security protection schemes need to be adopted for different business scenarios.

5.1 Internal Communication Security

In common microservice frameworks, REST APIs or remote procedure calls (RPC) are used for communication between microservice APPs [9]. Inside the substation edge gateway, the gRPC protocol, a type of RPC, is used. Compared with REST APIs, RPC can greatly reduce the communication cost between microservices and converts information into binary form through standardised serialisation tools, which is more efficient. gRPC provides two basic security authentication mechanisms: SSL/TLS authentication based on security certificates, and token-based authentication. The SSL authentication process is as follows:

1. The gRPC client first transmits the SSL version, the supported encryption algorithms, and a generated random number random_C to the server;
2. The server returns a handshake response and sends its own public-key certificate and random_S to the client;
3. The client verifies the validity of the server certificate, including whether the certificate has expired, whether the certificate chain is credible, and whether the public key of the issuer's certificate matches the digital signature. After verification is completed, the client generates a random number pre_master and sends it to the server encrypted with the server's public key;
4. The server decrypts pre_master with its private key, and both sides obtain the negotiated key through a fixed function calculation;
5. The client and server use the negotiated key to encrypt a handshake message, and the authentication process is completed after mutual decryption succeeds [10].

Opening an SSL/TLS secure channel for microservice data interaction not only prevents data leakage, but also limits man-in-the-middle attacks and greatly reduces the network security risks faced by the edge gateway. The other method is the token authentication mechanism, which allows the user to assign a token to each microservice application; the server judges whether a remote access is legal according to the token, so as to enforce different access rights for different applications. The generation and verification rules of the token can be implemented by the user, usually with a time limit; if the token expires, a new application is required. The management data communication between the edge gateway management software and the APPs adopts token authentication to control the access rights of the APPs, while the business data communication between APPs adopts SSL/TLS authentication to ensure the security of information exchange.
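To make the two gRPC mechanisms concrete, the sketch below shows a Python client combining TLS channel credentials with per-call token credentials; the host name, port, certificate path, and token value are illustrative assumptions.

```python
import grpc

# Load the CA certificate used to verify the server certificate.
with open("ca.pem", "rb") as f:
    root_ca = f.read()

# TLS channel credentials implement the SSL/TLS handshake described above.
ssl_creds = grpc.ssl_channel_credentials(root_certificates=root_ca)

# Token credentials attach a bearer token to every call, letting the server
# decide the caller's access rights (token-based authentication).
token_creds = grpc.access_token_call_credentials("app-token-123")

# Composite credentials: encrypted channel plus per-application token.
channel_creds = grpc.composite_channel_credentials(ssl_creds, token_creds)
channel = grpc.secure_channel("edge-gateway.local:50051", channel_creds)
```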


5.2 Security of Southbound Device Access

There are various communication protocols for southbound access devices, and different protocols call for different security protection strategies. For example, the MQTT protocol can verify identity, while the IEC 61850 protocol can verify message integrity and legitimacy. This requires the developers of edge gateway service APPs to have a good sense of security protection, and the gateway management software also controls database read and write permissions as an additional layer of protection. For the access of plug-and-play terminal devices, the edge gateway pushes an alarm prompt to the operation and maintenance master station interface after receiving the online message, and the device is added to the terminal equipment assets of the gateway only after manual confirmation. In addition, the substation operation and maintenance edge gateway is deployed in the production control area, while the access cameras and other auxiliary equipment are deployed in the information management area. Therefore, a dedicated horizontal unidirectional security isolation device must be placed between the edge gateway and the cameras, as shown in Fig. 3. The access program of the camera device converts single-frame image information into an E-language file and performs strict content filtering through the reverse isolation device, while the driver APP on the edge gateway restores the E-language file to the original data such as pictures [11].


Fig. 3. Working principle of transverse safety isolation device.

5.3 Security of Northbound Data Reporting

The northbound communication between the substation edge gateway and the operation and maintenance master station is divided into management data and business data transmission. Management data transmission adopts a customised TCP protocol to complete the configuration of the edge gateway's operating parameters. The DL/T634.5.104 protocol or the GSP protocol is selected for business data transmission to complete the query and push of business data. The communication parameters of the operation and maintenance master station are configured on the edge gateway, including the communication address, security certificate, and control authority. The edge gateway initiates an authentication process before data transmission starts, sending its name, IP address, and a random number to the master station in an authentication request message signed with the SM2 algorithm.


The SM2 digital signature algorithm is a public-key cryptographic algorithm based on elliptic curve operations [12]; it is an asymmetric encryption method. Compared with RSA, the most commonly used public-key algorithm, SM2 has a more advanced elliptic curve cryptographic mechanism, higher encryption strength, and faster speed [13]. When the operation and maintenance master station receives the authentication request message from the edge gateway, it disconnects the TCP connection if verification fails; if verification passes, it replies with an authentication response message to the edge gateway, which finally completes the identity authentication process. Business message communication begins after authentication is completed. For the business scenarios of remote control and service proxy, a digital signature field is also added to the message for re-checking, to ensure the security of remote operations. Whether gateway management data and business data are transmitted encrypted is chosen according to the data confidentiality requirements; encrypted transmission usually adopts a symmetric encryption algorithm, and the SM4 algorithm is recommended [14]. For data communication using traditional power system protocols, the SM2 digital signature security authentication process is added after the TCP connection is established. Figure 4 shows the security authentication process of the DL/T634.5.104 transmission protocol.


Fig. 4. Digital signature authentication process of DL/T634.5.104 transmission protocol.
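The sketch below walks through the three-message authentication exchange of Fig. 4 from the gateway side. The sm2_sign, sm2_verify, and recv_msg helpers are hypothetical placeholders (e.g., for primitives from a GM/T-compliant crypto library and a framing layer); the message fields are illustrative assumptions rather than the actual DL/T634.5.104 encoding.

```python
import json
import os
import socket

def sm2_sign(private_key: bytes, data: bytes) -> bytes:
    """Placeholder for an SM2 signing primitive from a GM/T-compliant library."""
    raise NotImplementedError

def sm2_verify(public_key: bytes, data: bytes, sig: bytes) -> bool:
    """Placeholder for the matching SM2 verification primitive."""
    raise NotImplementedError

def recv_msg(sock: socket.socket) -> tuple[bytes, bytes]:
    """Placeholder: read one framed (body, signature) pair from the socket."""
    raise NotImplementedError

def authenticate(sock: socket.socket, gw_key: bytes, master_pub: bytes) -> bool:
    # 1. Authentication request: gateway name, IP, random number R1, SM2 signature.
    r1 = os.urandom(16).hex()
    req = json.dumps({"name": "EDGE-GW-01", "ip": "192.168.1.10", "r1": r1}).encode()
    sock.sendall(req + sm2_sign(gw_key, req))

    # 2. Authentication response: master station info, R1 echoed, new R2, signature.
    body, sig = recv_msg(sock)
    fields = json.loads(body)
    if not (sm2_verify(master_pub, body, sig) and fields.get("r1") == r1):
        sock.close()                     # verification failed: drop the TCP link
        return False

    # 3. Authentication confirmation: echo R2 back, signed by the gateway.
    conf = json.dumps({"r2": fields["r2"]}).encode()
    sock.sendall(conf + sm2_sign(gw_key, conf))
    return True                          # START frame and data interaction follow
```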


6 Security of Business Process

For the substation operation and maintenance edge gateway, in addition to security reinforcement at the technical level, it is also necessary to protect every aspect of the gateway's business process. The security protection measures taken include identity authentication and authority control, security audit, backup and recovery, and access rules and legality checks.

6.1 Identity Authentication and Authority Control

The edge gateway management software interface can be logged into remotely from the operation and maintenance master station, and a multi-factor login method combining password, UKey, and biometrics is used to uniquely identify the logged-in user [14]. Passwords have complexity requirements and are stored and transmitted as ciphertext. The software handles login failures, enabling measures such as ending sessions, limiting the number of illegal login attempts, and automatically logging out after a login connection timeout; after multiple login failures, the account is locked for a period of time to prevent brute-force cracking. Based on the principle of separation of powers, users are divided into three roles, administrator, operator, and auditor, whose permissions restrict each other: administrators can create and manage accounts other than auditor accounts, operators can configure edge gateways and access devices, and auditors can view operation audit logs (a minimal sketch of this role model is given at the end of Sect. 6.2). A whitelist mechanism is established in the edge gateway; the whitelist is enforced at the level of services, files, and database tables, and remote access from network addresses and users outside the whitelist is denied. Identity must be reconfirmed when performing remote control operations.

6.2 Security Audit and Backup

The edge gateway management software has a self-diagnostic function that monitors and raises alarms for key process exceptions, communication exceptions, timing exceptions, excessive CPU or memory usage, insufficient storage space, and other problems. The security audit provides traceability for security event analysis. Log types include login logs, operation logs, and maintenance logs, and the recorded content includes security level, time, user, type, and specific content. Audit behaviour covers all users, and audit logs are stored for more than six months. The gateway management software provides an interface to query audit logs and supports exporting logs as audit reports. The edge gateway can package and export its parameter configuration and key processes to offline file packages, and the configuration files must not be stored in clear text. When there is a problem with the gateway software version, users can import the backup file to quickly restore operation.
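As a minimal sketch of the separation-of-powers role model in Sect. 6.1, the following Python fragment encodes the three mutually restricting roles and a default-deny permission check; the permission names are illustrative assumptions.

```python
# Hypothetical permission sets for the three mutually restricting roles.
ROLE_PERMISSIONS = {
    "administrator": {"create_account", "manage_account"},   # but not auditor accounts
    "operator":      {"configure_gateway", "configure_device"},
    "auditor":       {"view_audit_log"},
}

def is_allowed(role: str, action: str, target_role: str | None = None) -> bool:
    """Deny by default; administrators cannot touch auditor accounts."""
    if action not in ROLE_PERMISSIONS.get(role, set()):
        return False
    if role == "administrator" and target_role == "auditor":
        return False
    return True

assert is_allowed("operator", "configure_gateway")
assert not is_allowed("administrator", "manage_account", target_role="auditor")
```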


6.3 App Store Review

The launch and installation of APPs is a key process of the edge gateway business. The developer uploads the APP installation package to the application store on the operation and maintenance master station, and the application store sends it to the substation edge gateway. The APP installation package bundles the application, configuration files, dependent libraries, and images. For security reasons, to prevent system vulnerabilities introduced by the base image, the edge gateway APP installation package adopts minimal packaging, with several pre-installed base images to choose from. The configuration file adopts a uniformly specified JSON format, and its content includes the developer information, version information, image version, application type, and parameter template of the APP, as shown in Table 1.

Table 1. Uniformly stipulated APP configuration file format.

Field Name      | Field Type | Field Description
----------------|------------|---------------------------------------------
Name            | String     | APP name
Description     | String     | APP description
Type            | String     | Divided into driver class and service class
Version         | String     | Release version
Image           | String     | Image version
ManufactureName | String     | Developer manufacturer name
Settings        | JsonArray  | Operation parameter configuration items
DockerConfig    | JsonArray  | Docker-related configuration
Imports         | JsonArray  | Input data group
Exports         | JsonArray  | Output data group
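A minimal sketch of the configuration legality check performed by the gateway management software, validating an uploaded JSON configuration against the Table 1 field layout. The required-field rules are inferred from the table, and the concrete type values are illustrative assumptions.

```python
import json

# Expected top-level fields and their JSON types, following Table 1.
SCHEMA = {
    "Name": str, "Description": str, "Type": str, "Version": str,
    "Image": str, "ManufactureName": str,
    "Settings": list, "DockerConfig": list, "Imports": list, "Exports": list,
}

def validate_app_config(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the config is legal."""
    errors = []
    try:
        cfg = json.loads(raw)
    except json.JSONDecodeError as e:
        return [f"not valid JSON: {e}"]
    for field, ftype in SCHEMA.items():
        if field not in cfg:
            errors.append(f"missing field: {field}")
        elif not isinstance(cfg[field], ftype):
            errors.append(f"{field} should be {ftype.__name__}")
    if cfg.get("Type") not in ("driver", "service"):   # assumed type values
        errors.append("Type must be 'driver' or 'service'")
    return errors
```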

Before the installation package is put on the shelf, testers check the normalisation of the configuration files in the APP package and the integrity of the dependent libraries, and install and run the package on a simulation platform to check for security problems such as excessive CPU or traffic occupation and buffer leaks. Only tested installation packages are allowed onto the APP store. When a package is sent to the edge gateway, the management software on the gateway checks the configuration file format and image information of the package again; after confirmation, the installation process continues, as shown in Fig. 5.

7 Test and Pilot Applications

Referring to the system architecture in Fig. 1, a test environment was built in the factory, including the operation and maintenance master station, the edge gateway, and various types of substation equipment. The master station and the edge gateway are connected through a LAN. Protocol testing tools, vulnerability scanning tools, and network performance testing tools were used to perform security testing on the edge gateway. As shown in Table 2, the test items cover device access control, access authorization and authentication, operation security audit, data integrity, network attack defence, and illegal service and port scanning. The test results meet the expected requirements.


Fig. 5. APP package launch and installation.

Table 2. Gateway security test project.

Test Content                             | Test Items                                        | Test Results
-----------------------------------------|---------------------------------------------------|---------------
Equipment access control                 | Equipment access in assets                        | Up to standard
                                         | Equipment access outside assets                   | Up to standard
Access authorization and authentication  | User role management                              | Up to standard
                                         | Authorization item test                           | Up to standard
                                         | Configure software certification                  | Up to standard
                                         | Identity authentication and password verification | Up to standard
Operation safety audit                   | Audit record events                               | Up to standard
                                         | Audit log export                                  | Up to standard
Data integrity                           | Data transmission integrity                       | Up to standard
                                         | Data storage integrity                            | Up to standard
                                         | Input data syntax verification                    | Up to standard
Network attack defense                   | Data flooding attack                              | Up to standard
                                         | Fuzzy test of communication protocol              | Up to standard
Illegal service and port                 | Service and port scanning                         | Up to standard

In addition, a pilot application was carried out in a 110 kV substation. The overall function was normal and operation was stable, improving the operation and maintenance efficiency of the pilot substation. The virtual human-computer interaction APP for secondary equipment was the main function applied. As shown in Fig. 6, the APP maps the LCD display and key operations of the secondary equipment in the substation to the interface of the master station, and combines them with real-time pictures captured by the cameras in the station. It realises remote operation and visualised operation and maintenance of the secondary equipment in substations.


Fig. 6. Flow chart of the virtual LCD APP for secondary equipment.

8 Conclusion

With the development of the intelligent and digital power grid, the cloud-edge collaboration architecture is gradually being popularised in the power system. As the key equipment in this architecture, the safe and stable operation of the edge gateway is crucial. Starting from substation operation and maintenance work and focusing on the structure and functions of the edge gateway, this paper has set out the security protection schemes that can be adopted on the edge gateway in terms of the device itself, container technology, microservice management, communication authentication, identity authentication, and security audit, to ensure the overall security of the cloud-edge collaborative operation and maintenance system. In the future, edge gateways will gradually be applied to scenarios with more stringent security requirements, such as scheduling, control, and remote configuration. At the same time, asset data in edge gateways can be associated with production management systems (PMS) and enterprise resource planning (ERP) systems. This can standardise the management of the cloud-edge collaboration system, facilitate the remote operation and maintenance of substation equipment, and help the digital transformation of the power grid.

References

1. Zhang, M., Xu, C., Zhang, Q., et al.: Remote operation and maintenance technology of substation automation device based on micro-services architecture. Electr. Power Eng. Technol. 41(4), 177–182 (2022)
2. Wu, X., Zhang, C., Pan, H., et al.: Design and implementation of safety control technology for substation automation wide-area operation and maintenance system. Zhejiang Electr. Power 39(1), 41–46 (2020)
3. Zhang, Q., Wang, G., Li, J., et al.: Design and implementation of substation remote operation and maintenance platform. Power Syst. Protect. Control 47(10), 164–172 (2019)
4. Chen, J., Lin, W., Xia, W., et al.: Research on key technologies of substation switching anti-misoperation based on cloud-edge collaboration. Electr. Power Inf. Commun. Technol. 20(8), 91–98 (2022)
5. Yang, W., Liu, W., Cui, H., et al.: SG-Edge: key technology of power internet of things trusted edge computing framework. J. Softw. 33(2), 641–663 (2022)
6. Leng, B., Pang, F.: Construction method of trusted computing platform based on domestic CPU. Commun. Technol. 52(8), 2044–2049 (2019)
7. Wu, S., Wang, K., Jin, H.: Research situation and prospects of operating system virtualization. J. Comput. Res. Develop. 56(1), 58–68 (2019)
8. Ren, L., Zhuang, X., Fu, J.: Technical research of Docker container security protection. Telecom Eng. Technics Standardiz. 33(3), 73–78 (2020)
9. Wan, S., Yi, Q., Zhang, K., et al.: Microservice architecture based service choreography technology for new generation dispatching and control system. Autom. Electr. Power Syst. 43(22), 116–121 (2019)
10. Song, L., Luo, Q., Luo, Y., et al.: Encryption on power systems real-time data communication. Autom. Electr. Power Syst. 14, 76–81 (2004)
11. Shi, H., Zhai, G., Lu, X., et al.: File transfer method and system across reverse isolation device. Industr. Control Comput. 33(11), 22–25 (2020)
12. National Cryptography Administration: Public key cryptographic algorithm SM2 based on elliptic curves. https://sca.gov.cn/sca/xwdt/2010-12/17/content_1002386.shtml. Accessed 17 Dec 2010
13. Hu, J., Yang, Y., Xiong, L., et al.: SM algorithm analysis and software performance research. Netinfo Secur. 21(10), 8–16 (2021)
14. ISO/IEC 18033-3:2010/Amd 1: Information technology - Security techniques - Encryption algorithms - Part 3: Block ciphers - Amendment 1: SM4 (2021)

A Robust MOR-Based Secure Fusion Strategy Against Byzantine Attack in Cooperative Spectrum Sensing

Lan Guo, Weifeng Chen(B), Yang Cong, and Xuechun Yan

Yangzhou University, Yangzhou 225000, China
[email protected]

Abstract. Cognitive radio has emerged as a promising technology aimed at improving allocated spectrum utilization and alleviating spectrum shortage through opportunistic spectrum usage, and has been widely studied for its strong robustness and high reliability. However, its vulnerability to potential attacks raises security issues, and extensive research has focused on how to alleviate the negative effect of malicious attacks on cooperative spectrum sensing. This paper first briefly discusses the models of cooperative spectrum sensing, fading channels, and malicious attacks, then illustrates in detail the influence of fading and malicious users on cooperative spectrum sensing. Motivated by this analysis, we propose a robust modified outlier removal (MOR) spectrum sensing scheme: before data fusion, the fusion center conditionally removes the outlier that is most likely to have been tampered with by a malicious user. Through several simulations, we compare the proposed scheme with the traditional scheme to verify its correctness and feasibility. Simulation results show that the proposed scheme offers a strong defense under various sensing environments and different attack strengths, resisting malicious users and improving detection performance more effectively than the traditional scheme.

Keywords: Cooperative spectrum sensing · Malicious attack · Data fusion · Byzantine attacks · Fading · Security

1 Introduction

Spectrum sensing is a crucial technology in cognitive radio (CR) [3], as it can improve the utilization of precious spectrum resources. However, single-node sensing can lead to unreliable performance when affected by hidden-terminal issues caused by fading and shadowing effects [6]. Thus cooperative spectrum sensing (CSS) has been extensively explored and studied. In some situations there are malicious users (MUs) in the CSS, which cause the idle spectrum to be selfishly occupied and considerably degrade the sensing performance. Common malicious attacks include the Byzantine attack [16] and the primary user emulation attack (PUEA). Therefore, it is necessary to propose a reliable and effective spectrum sensing scheme in the presence of MUs.

1.1 Related Work

In recent years, numerous studies have focused on spectrum sensing [2]. To mitigate impairments such as fading, CSS [1] has become an important part of spectrum sensing. Of all algorithms, energy detection (ED) [12] is the most widely used, not only because of its low computational and implementation complexity, but also because it does not need any prior knowledge of the primary transmission characteristics, so much CSS research is based on ED. Due to the openness and complexity of wireless channels, CSS is vulnerable to potential malicious attacks [16], which makes cognitive radio networks (CRNs) face serious security challenges. The Byzantine attack [13] is one of the most well-known security threats in the CSS process. Byzantine attackers in CSS pursue two main objectives: to reduce the detection probability by interfering with the normal operation of the CRN, and to increase the false alarm probability and deny access to honest secondary users (SUs). Therefore, several spectrum sensing schemes against Byzantine attacks have been proposed in recent years. The Byzantine attack model is comprehensively analyzed in [18]. In the presence of Byzantine attacks, a robust CSS method based on evidence theory and credibility calculation is given in [15]. To defend against the spectrum sensing data falsification (SSDF) attack, [8] proposed a neighbor detection-based spectrum sensing algorithm in distributed CRNs. The authors of [5] formulated a simple CSS scheme to counter Byzantine attacks by removing anomalous values from the data fusion process. A secure handoff mechanism based on the trust of each cognitive user (CU) is proposed in [10]; MUs can be effectively distinguished from trusted CUs by looking up the trust values. To ward off MUs, a new defense method was proposed in [17]; the proposed scheme is more effective against attacks than the traditional scheme, but it does not consider the applicability of the scheme under fading channels. In general, reliable CSS remains a major challenge in the field of CR.

1.2 Contributions

This paper focuses on how to defend against malicious users (MUs) in CSS. The main contributions of this paper are summarized as follows.

– An MU will use a relatively large attack value in the sensing process, affecting the final decision. We demonstrate through simulations how MUs can affect detection performance in practice.
– Aiming at Byzantine attacks in CRNs, we propose a new modified outlier removal (MOR) spectrum sensing scheme based on equal gain combining (EGC). The new method conditionally identifies and eliminates the outlier in advance.
– We demonstrate the feasibility and effectiveness of the new scheme through extensive simulations, including the error probability under Nakagami-m fading, the receiver operating characteristic (ROC) curve of the proposed scheme under different SNRs, etc.

1.3 Organization

The remainder of this paper is structured as follows. In Sect. 2, the system model, including the CSS, fading, and attack models, is introduced. Traditional CSS with MUs and its problems are discussed in Sect. 3. To protect CSS from Byzantine attacks, a new CSS scheme is proposed in Sect. 4. Section 5 presents simulations and discussion of the new scheme. Finally, Sect. 6 draws conclusions and presents future research directions.

2 System Model

2.1 Cooperative Spectrum Sensing Model

Typically, the CSS model can be formulated as a binary hypothesis test with H0 (PU signal absent) and H1 (PU signal present):

$$
\begin{cases}
H_0: & r_i(n) = \omega_i(n), \\
H_1: & r_i(n) = h_i(n)\, s_i(n) + \omega_i(n),
\end{cases} \tag{1}
$$

where $r_i(n)$ is the received signal, $\omega_i(n)$ denotes additive white Gaussian noise (AWGN) with zero mean and variance $\sigma_\omega^2$, $n$ is the index of the sampling points of the observed signal, and $s_i(n)$ denotes the primary signal, which is independent of the noise $\omega_i(n)$. $h_i(n)$ is the channel gain; $h_i(n) = 1$ means there is no fading in the channel. In CSS, the false alarm probability $P_{f,i}$, the detection probability $P_{d,i}$, and the total error probability $P_{e,i}$ are three key performance indicators:

$$P_{f,i} = \Pr\{\text{decide} = H_1 \mid \text{actually} = H_0\}. \tag{2}$$

Since there is no prior information, we follow the Neyman-Pearson (NP) criterion [9], where the detector must achieve the greatest probability of detection at a fixed false alarm probability. Hence,

$$P_{d,i} = \Pr\{\text{decide} = H_1 \mid \text{actually} = H_1\}, \tag{3}$$

and then

$$P_{e,i} = \Pr\{H_0\}\, P_{f,i} + \Pr\{H_1\}\,(1 - P_{d,i}). \tag{4}$$

As illustrated in Fig. 1, the cooperative sensing model is discussed under the centralized scenario. The primary users (PUs) hold the authorized frequency band, part of which is not utilized. By collecting the local sensing results of each node, the fusion center (FC) can use a global test statistic to make the final sensing decision, thus improving the detection performance.


Fig. 1. CSS model in CRN

2.2 Fading Channel Model

This paper mainly focuses on Nakagami-m fading [4], a fast fading model proposed by Nakagami; its probability density function (PDF) is

$$f(g) = \frac{2\, m^m\, g^{2m-1}}{\Gamma(m)\, \Omega^m} \exp\!\left(-\frac{m g^2}{\Omega}\right), \quad g \ge 0,\ m \ge \frac{1}{2}, \tag{5}$$

where $\Gamma(\cdot)$ is the Gamma function and $\Omega$ represents the expectation of the square of the channel gain $g$. The parameter $m$ represents the degree of fading: the smaller the value, the more severe the fading. When $m = 0.5$ the distribution is one-sided Gaussian; when $m = 1$ it degenerates into a Rayleigh distribution; and $m = \infty$ corresponds to a non-fading channel. Furthermore, when $g$ follows the Nakagami-m distribution, $|g|^2$ follows a Gamma distribution.
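For simulation purposes, Nakagami-m channel gains can be drawn by exploiting the Gamma relation just mentioned: if $X \sim \mathrm{Gamma}(m, \Omega/m)$, then $g = \sqrt{X}$ is Nakagami-m distributed. A minimal NumPy sketch follows; the parameter values are illustrative.

```python
import numpy as np

def nakagami_gains(m: float, omega: float, size: int, rng=None) -> np.ndarray:
    """Draw Nakagami-m channel gains g with E[g^2] = omega."""
    rng = rng or np.random.default_rng()
    # |g|^2 ~ Gamma(shape=m, scale=omega/m)  =>  g = sqrt(|g|^2)
    return np.sqrt(rng.gamma(shape=m, scale=omega / m, size=size))

g = nakagami_gains(m=1.0, omega=1.0, size=10000)   # m = 1: Rayleigh fading
print(g.mean(), (g**2).mean())                     # second moment ≈ omega
```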

2.3 Attack Model

The collaboration model with MUs is based on CSS, in which there may be MUs among the honest SUs. The primary goal of an MU is to reduce the detection performance of the CRN by intentionally falsifying sensing data. The typical Byzantine attack model [11] can be described as

$$
\tilde{T}_i = \begin{cases}
T_i + \rho_i\, E(T_i \mid H_0), & H_0 \\
T_i - \rho_i\, E(T_i \mid H_0), & H_1
\end{cases} \tag{6}
$$

where $T_i$ represents the test statistic of the $i$th SU and $\tilde{T}_i$ is the falsified sensing result. The intensity of the attack is determined by $\rho_i$ and the mean $E(T_i \mid H_0)$; since $E(T_i \mid H_0)$ remains constant as $i$ changes, we directly regard $\rho_i$ as the attack strength. It must be pointed out that we mainly describe the attack model under soft collaboration; the attack model under hard collaboration is similar and can be inferred from the soft-collaboration case.
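A minimal sketch of the attack of Eq. (6) applied to a vector of local test statistics; the mean E(T_i|H_0) is passed in explicitly, and the parameter values are illustrative.

```python
import numpy as np

def byzantine_attack(T: np.ndarray, rho: float, mean_h0: float,
                     pu_present: bool) -> np.ndarray:
    """Falsify test statistics per Eq. (6): push up under H0, down under H1."""
    shift = rho * mean_h0
    return T - shift if pu_present else T + shift

# Example: one MU tampering with its statistic (N = 200 samples => E[T|H0] = N).
honest_T = np.array([203.1])
forged_T = byzantine_attack(honest_T, rho=0.3, mean_h0=200.0, pu_present=True)
```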

3 CSS with MU and Its Influence

3.1 Energy Detection

As is well known, ED does not need any prior information about the signal; its low computational complexity and simple implementation make it widely used in various fields. The energy value is

$$Y = \sum_{n=1}^{N} |r(n)|^2, \tag{7}$$

where $N$ is the number of samples and $r(n)$ is the received signal. The decision metric of ED [7] can be written as

$$T_{ED} = \frac{Y}{\sigma_\omega^2} \underset{H_0}{\overset{H_1}{\gtrless}} \eta_{ED}, \tag{8}$$

where $T_{ED}$ is the single-user test statistic, $\sigma_\omega^2$ is the variance of the Gaussian noise, and $\eta_{ED}$ is the local threshold. For a large $N$, according to the central limit theorem (CLT), $T_{ED}$ is approximated as a Gaussian random variable under both hypotheses $H_0$ and $H_1$:

$$
\begin{cases}
H_0: & T_{ED} \sim \mathcal{N}(N,\ 2N), \\
H_1: & T_{ED} \sim \mathcal{N}(N(1+\vartheta),\ 2N + 4N\vartheta),
\end{cases} \tag{9}
$$

where $\vartheta$ is the average signal-to-noise ratio (SNR) of the $i$th SU. In this setting, the detection probability $P_d$, an index to evaluate the performance of spectrum sensing, can be represented as

$$P_d = \Pr(T_{ED} > \eta_{ED} \mid H_1) = Q\!\left(\frac{\eta_{ED} - N(1+\vartheta)}{\sqrt{2N + 4N\vartheta}}\right), \tag{10}$$

where $Q(\cdot)$ is the complementary distribution function of the standard Gaussian with zero mean and unit variance. Similarly, the false alarm probability $P_f$ can be expressed as

$$P_f = \Pr(T_{ED} > \eta_{ED} \mid H_0) = Q\!\left(\frac{\eta_{ED} - N}{\sqrt{2N}}\right). \tag{11}$$

Further, the expression of the threshold $\eta_{ED}$ is

$$\eta_{ED} = Q^{-1}(\bar{P}_f)\sqrt{2N} + N, \tag{12}$$

where $\bar{P}_f$ is the required false alarm probability.
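The following sketch simulates single-user energy detection under $H_1$ and checks the result against Eq. (10); the BPSK-like signal generation and parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
N, snr, pf_req = 200, 10**(-10 / 10), 0.05     # samples, -10 dB SNR, required Pf

eta = norm.isf(pf_req) * np.sqrt(2 * N) + N    # local threshold, Eq. (12)

trials = 10000
s = np.sqrt(snr) * np.sign(rng.standard_normal((trials, N)))  # BPSK, unit noise power
r = s + rng.standard_normal((trials, N))       # received signal under H1
T = np.sum(r**2, axis=1)                       # energy statistic, Eqs. (7)-(8), sigma^2 = 1

pd_sim = np.mean(T > eta)
pd_th = norm.sf((eta - N * (1 + snr)) / np.sqrt(2 * N + 4 * N * snr))  # Eq. (10)
print(f"simulated Pd = {pd_sim:.3f}, theoretical Pd = {pd_th:.3f}")
```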

3.2 Cooperative Spectrum Sensing with Energy Detection

In practice, the accuracy of spectrum sensing decreases significantly when a single user is exposed to random or unpredictable factors, and CSS can effectively reduce the impact of such factors. This paper focuses on the soft-decision scheme in the centralized scenario, in which the complete sensing information is reported to the FC and fused by a soft-combining algorithm. Common soft-combining algorithms include the equal gain combining (EGC) algorithm and the maximal ratio combining (MRC) algorithm; here we focus on EGC. Hence, the test statistic at the FC can be described as

$$T_{FC} = \sum_{i=1}^{M} T_i, \tag{13}$$

where $M$ is the total number of users participating in CSS. Based on the previous analysis, the distribution of $T_{FC}$ is

$$
\begin{cases}
H_0: & T_{FC} \sim \mathcal{N}\!\left(\sum_{i=1}^{M} E(T_i \mid H_0),\ \sum_{i=1}^{M} D(T_i \mid H_0)\right), \\
H_1: & T_{FC} \sim \mathcal{N}\!\left(\sum_{i=1}^{M} E(T_i \mid H_1),\ \sum_{i=1}^{M} D(T_i \mid H_1)\right),
\end{cases} \tag{14}
$$

where $E(\cdot)$ and $D(\cdot)$ represent the mean and variance functions, respectively. The false alarm probability of the FC can then be written as

$$P_F = Q\!\left(\frac{\eta_{FC} - \sum_{i=1}^{M} E(T_i \mid H_0)}{\sqrt{\sum_{i=1}^{M} D(T_i \mid H_0)}}\right) = Q\!\left(\frac{\eta_{FC} - MN}{\sqrt{2MN}}\right). \tag{15}$$

If the required false alarm probability $\bar{P}_F$ is fixed, the desired threshold of the FC is

$$\eta_{FC} = Q^{-1}(\bar{P}_F)\sqrt{2MN} + MN. \tag{16}$$

When the primary signal is transmitted, the theoretical detection probability $P_D$ of the FC is directly obtained as

$$P_D = Q\!\left(\frac{\eta_{FC} - \sum_{i=1}^{M} E(T_i \mid H_1)}{\sqrt{\sum_{i=1}^{M} D(T_i \mid H_1)}}\right). \tag{17}$$
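A minimal sketch of EGC fusion at the FC with the threshold of Eq. (16); the parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def egc_decision(T_locals: np.ndarray, N: int, pf_req: float) -> bool:
    """Fuse local test statistics by EGC (Eq. 13) and decide with Eq. (16)."""
    M = len(T_locals)
    T_fc = np.sum(T_locals)                              # Eq. (13)
    eta_fc = norm.isf(pf_req) * np.sqrt(2 * M * N) + M * N   # Eq. (16)
    return T_fc > eta_fc                                 # True => decide H1

print(egc_decision(np.array([210.0, 198.0, 205.0, 220.0]), N=200, pf_req=0.05))
```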

3.3 The Influence of Malicious Users

Potential MUs can imitate the signal characteristics of the PU by masquerading as the PU when the PU has not yet occupied the authorized band, making the SUs mistake them for the PU and preventing the SUs from accessing the spectrum, which wastes spectrum resources and decreases spectrum utilization; this is the PUEA. MUs can also try to tamper with real local data in the cooperation process, making multi-user cooperative sensing reach a wrong conclusion by passing false sensing data; this is the SSDF attack. In severe cases, the entire cognitive system can be paralyzed and unable to function. Therefore, it is of great significance to analyze the influence of MUs and design spectrum sensing methods that can resist such tampering attacks. In this part, the ED performance in the presence of MUs is analyzed first. ED has become the most used detection method in practical applications because of its good performance and low computational complexity, so ED is also used in CRNs. The energy value of the $i$th SU is

$$Y_i = \sum_{n=1}^{N} |r_i(n)|^2; \tag{18}$$

further, the test statistic $T_i$ is compared with a threshold $\eta_i$ to decide whether the PU is present:

$$T_i = \frac{Y_i}{\sigma_i^2} \underset{H_0}{\overset{H_1}{\gtrless}} \eta_i, \tag{19}$$

where $\eta_i$ and $\sigma_i^2$ are the threshold and noise variance of the $i$th SU, respectively. An honest SU truthfully submits its own local sensing value, while a Byzantine attacker submits $\tilde{T}_i$. Under $H_1$, to decrease the detection probability,

$$
\tilde{T}_i = \begin{cases}
T_i + \rho_i\, E(T_i \mid H_0), & \text{with probability } 1 - P_{d,i} \\
T_i - \rho_i\, E(T_i \mid H_0), & \text{with probability } P_{d,i}
\end{cases} \tag{20}
$$

so the MU clearly sends the tampered $\tilde{T}_i$ rather than the original $T_i$. Under $H_0$, to increase the false alarm probability,

$$
\tilde{T}_i = \begin{cases}
T_i + \rho_i\, E(T_i \mid H_0), & \text{with probability } 1 - P_{f,i} \\
T_i - \rho_i\, E(T_i \mid H_0), & \text{with probability } P_{f,i}
\end{cases} \tag{21}
$$

thus preventing other honest SUs from using the idle channel. Combined with the previous analysis, it can be seen that $\tilde{T}_i$ follows a Gaussian distribution under both $H_0$ and $H_1$. Figure 2 shows the PDFs of the MU and SU test statistics [5]; it is obviously difficult to identify the MU. Therefore, it is necessary to propose a defense scheme to identify and eliminate MUs.


Fig. 2. PDF of the test statistic

4 A Robust Modified Outlier Removal Sensing Scheme

4.1 Concept Design

Considering these issues, we propose a robust modified outlier removal (MOR) spectrum sensing scheme based on traditional EGC. The main idea is that if there is any problem with the sensing data, it brings no gain but only loss to CSS, so it should be removed. Since an MU will use a large $\rho_i$, it can profoundly affect the decision-making process; the identification and defensive disposal of outliers before data fusion at the FC are therefore critical. We improve on the EGC scheme by conditionally deleting the outlier that is most likely to come from an MU. The new MOR scheme is based on the box plot from probability theory.

4.2 Detailed Procedure

The criterion for judging outliers to resist MUs can be summarized as follows. Suppose there are $n$ CUs, the test statistic of each user is $T_i$, $i = 1, \ldots, n$, and $T_{(1)} \le \ldots \le T_{(n)}$ represent the ordered data. Then

$$
T_p = \begin{cases}
T_{([np]+1)}, & np \ne \text{integer} \\
\frac{1}{2}\left(T_{(np)} + T_{(np+1)}\right), & np = \text{integer}
\end{cases} \tag{22}
$$

where $T_p$ is the test statistic of the $p$th quantile, $n$ is the sample number of the total test statistics, and $T_{(np)}$ and $T_{(np+1)}$ are the $(np)$th and $(np+1)$th ordered test statistics. According to these statistics, the first quartile $Q_1$ (the 25th percentile) and the third quartile $Q_3$ (the 75th percentile) are

$$Q_1 = T_{0.25}, \tag{23}$$

$$Q_3 = T_{0.75}. \tag{24}$$


Using the quartiles [14] is more objective and can effectively protect against MUs. The interquartile range is

$$IQR = Q_3 - Q_1. \tag{25}$$

A measure of spread similar to the standard deviation is defined by

$$High = Q_3 + 1.5\, IQR, \tag{26}$$

$$Low = Q_1 - 1.5\, IQR. \tag{27}$$

If the test statistic of any user is less than $Low$ (under $H_1$) or greater than $High$ (under $H_0$), the data beyond these extremes are considered to come from MUs and regarded as potential outliers; the FC then marks and removes them before data fusion:

$$
\bar{T}_{FC} = \begin{cases}
T_{FC} - T_{min}, & H_1 \\
T_{FC} - T_{max}, & H_0
\end{cases} \tag{28}
$$

where $T_{min}$ is the minimum of all test statistics, $T_{max}$ is the maximum of all test statistics, and $\bar{T}_{FC}$ is the modified test statistic of the FC. Accordingly, after removing the malicious node, the threshold of the FC is

$$\bar{\eta}_{FC} = Q^{-1}(\bar{P}_f)\sqrt{2(M-1)N} + (M-1)N. \tag{29}$$

In general, the proposed scheme is carried out at the FC. Under $H_1$, when the PU really exists, the normal test statistic should be greater than the decision threshold, while an MU will intentionally tamper with the sensing result, specifically reducing the test statistic to push it below the decision threshold $\bar{\eta}_{FC}$, so that the PU is mistakenly declared absent and the sensing performance is degraded. At this point, the proposed scheme eliminates from the sorted statistics the value most likely to come from the MU using the given algorithm. Since the MU lowers the test statistic in this case, we can directly examine the minimum value $T_{min}$: if $T_{min}$ is smaller than $Low$, the minimum value is considered abnormal and attributed to an MU. The situation with two MUs is similar. In this way, the proposed scheme is easy to operate and its algorithmic complexity is low. It is worth noting that each node actually has a different distribution; if there is no signal, after normalization they are the same. Since detection is performed at low SNR, we can consider the distributions, and the PDFs of the test statistics after normalization, to be approximately the same, so the impact is not very large. Besides, to verify the universality of the proposed scheme, we compare its performance under different malicious attack strengths by changing $\rho_i$, and we discuss the performance when the SNRs of MUs and of SUs change, so as to create different sensing environments. Nakagami-m fading is also added to observe the performance of the proposed scheme under varying degrees of fading by changing the fading factor $m$, thus verifying the reliability of the proposed scheme from different aspects.

5 Simulation Results and Discussion

5.1 Parameters Setting

We use BPSK signals and assume that every SU has the same false alarm probability, $P_{f,1} = P_{f,2} = \cdots = P_{f,M} = P_f$, and the same detection probability, $P_{d,1} = P_{d,2} = \cdots = P_{d,M} = P_d$, with $P_f = 0.05$, attack strength $\rho_i = 0.3$, 10,000 simulation runs, and $\Pr\{H_0\} = \Pr\{H_1\} = 0.5$. The decision threshold of the FC is given by (16), the default SNR is −10 dB, and the total number of SUs is 10. The number of sampling points is $N = 500$ in Fig. 4 and $N = 200$ in the other figures.
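For reproducibility, the sketch below outlines one Monte Carlo run combining the pieces above (honest statistics via the Gaussian approximation of Eq. (9), one attacker per Eq. (6), MOR fusion). It reuses the mor_fused_statistic and mor_threshold functions sketched earlier, and its parameter values mirror this setting.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, snr, rho = 10, 200, 10**(-10 / 10), 0.3

def one_trial(pu_present: bool, num_mu: int = 1) -> bool:
    # Honest local statistics, Gaussian approximation of Eq. (9).
    if pu_present:
        T = rng.normal(N * (1 + snr), np.sqrt(2 * N + 4 * N * snr), M)
    else:
        T = rng.normal(N, np.sqrt(2 * N), M)
    # Byzantine attack of Eq. (6) on the first num_mu users (E[T|H0] = N).
    T[:num_mu] += (-1 if pu_present else 1) * rho * N
    t_fc, kept = mor_fused_statistic(T, pu_assumed_present=pu_present)
    return t_fc > mor_threshold(kept, N, pf_req=0.05)   # decide H1?

pd = np.mean([one_trial(True) for _ in range(10000)])    # detection probability
pf = np.mean([one_trial(False) for _ in range(10000)])   # achieved false alarm
```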

5.2 Simulation Results


Fig. 3. Achieved false alarm probability $P_f$ under different required false alarm probabilities $\bar{P}_f$

Figure 3 shows the achieved false alarm probability $P_f$ under different required false alarm probabilities $\bar{P}_f$. We compare the MOR scheme with the EGC scheme, the OR rule, and the AND rule [5]. First, the simulation results show that among the traditional schemes, the EGC scheme has the best detection performance when MUs exist; therefore, in the subsequent experiments, to highlight the performance of our proposed scheme, we only compare the optimal EGC scheme with the MOR scheme. It is obvious that the presence of an MU seriously raises the achieved false alarm probability $P_f$ of the system and worsens the sensing performance. With the same number of sampling points, when the required false alarm probability is 0.07,


in the presence of one MU the achieved false alarm probability of the EGC scheme is about 0.5, while that of the proposed scheme is about 0.14, significantly lower than the traditional EGC scheme. Above all, the proposed scheme can effectively reduce the false alarm probability of the CRN, better resisting MUs and improving the robustness of the system.


Fig. 4. The total error probability $P_e$ with different required false alarm probabilities $\bar{P}_f$ under two sensing environments

Figure 4 gives the total error probability $P_e$ with different required false alarm probabilities $\bar{P}_f$ under two sensing environments. Obviously, no matter how the sensing environment is changed, the proposed scheme maintains good performance. When there is one MU, the total error probability of the traditional scheme rises markedly, while that of the proposed scheme is clearly lower. It can also be seen from the figure that the performance of the proposed scheme without MUs is even better than that of the traditional scheme without MUs. Figure 5 gives the total error probability $P_e$ with different required false alarm probabilities $\bar{P}_f$ under different numbers of SUs. Obviously, when there is one MU, no matter how the number of SUs is changed, the performance of the proposed scheme is better than the traditional one; even when there is no MU, the total error probability of the proposed scheme is lower than that of the traditional scheme. Figure 6 shows the total error probability $P_e$ with different required false alarm probabilities $\bar{P}_f$ under the Nakagami-m fading channel; we respectively draw the fading curves with $m = 0.5$, $m = 1$, and $m = 2$. Under the fading channel, it can be clearly seen that the performance degradation of the single-point curve is obviously larger than that of cooperative sensing.


Fig. 5. The total error probability $P_e$ with different required false alarm probabilities $\bar{P}_f$ under different numbers of SUs


Fig. 6. The total error probability $P_e$ with different required false alarm probabilities $\bar{P}_f$ under the Nakagami-m fading channel

Comparing the curves with the same number of MUs, the total error probability with fading parameter $m = 2$ is obviously lower. It can be seen that the smaller the $m$, the higher the corresponding error probability, the more serious the fading, and the worse the sensing performance.

6 Conclusion and Future Directions

This paper analyzes and discusses the CSS model, the fading model, and the malicious attack model, and then analyzes the specific impact of MUs on CSS. Against malicious attacks, we propose a robust MOR spectrum sensing scheme based on EGC that is easy to incorporate into cooperative sensing. In this new scheme, the FC conditionally culls the values that are most likely to come from MUs. Simulation results further demonstrate that the proposed scheme is more resistant to Byzantine attacks than the traditional one. Some meaningful work remains to be explored, such as determining the optimal defense scheme under other attack modes, confirming the feasibility of the proposed scheme in distributed scenarios, and designing a fusion algorithm with lower complexity and time under given detection performance requirements.

Acknowledgments. This work was supported by the National Natural Science Foundation of China under Grant 201011348.

References

1. Akyildiz, I., Lo, B.F., Balakrishnan, R.: Cooperative spectrum sensing in cognitive radio networks: a survey. Phys. Commun. 4(1), 40–62 (2011)
2. Ali, A., Hamouda, W.: Advances on spectrum sensing for cognitive radio networks: theory and applications. IEEE Commun. Surv. Tutorials 19(2), 1277–1304 (2017). https://doi.org/10.1109/COMST.2016.2631080
3. Banerjee, A., Maity, S.P.: Jamming in eavesdropping on throughput maximization in green cognitive radio networks. IEEE Trans. Mob. Comput. 22(1), 299–310 (2023). https://doi.org/10.1109/TMC.2021.3068797
4. Bera, D., Chakrabarti, I., Pathak, S.S., Karagiannidis, G.K.: Another look in the analysis of cooperative spectrum sensing over Nakagami-m fading channels. IEEE Trans. Wireless Commun. 16(2), 856–871 (2017). https://doi.org/10.1109/TWC.2016.2633259
5. Gao, R., Zhang, Z., Zhang, M., Yang, J., Qi, P.: A cooperative spectrum sensing scheme in malicious cognitive radio networks. In: 2019 IEEE Globecom Workshops (GC Wkshps), pp. 1–5 (2019). https://doi.org/10.1109/GCWkshps45667.2019.9024531
6. Godugu, K.K., Vappangi, S.: Performance analysis of wideband spectrum sensing (WSS) in cognitive radio networks (CRN) over erroneous sensing and reporting channels, pp. 1–6 (2022). https://doi.org/10.1109/ASIANCON55314.2022.9908624
7. López-Benítez, M., Casadevall, F.: Improved energy detection spectrum sensing for cognitive radio. IET Commun. 6(8), 785–796 (2012)
8. Pei, Q., Li, H., Liu, X.: Neighbor detection-based spectrum sensing algorithm in distributed cognitive radio networks. Chin. J. Electron. 26(2), 399–406 (2017)
9. Rahaman, M.F., Khan, M.Z.A.: Low-complexity optimal hard decision fusion under the Neyman-Pearson criterion. IEEE Signal Process. Lett. 25(3), 353–357 (2018). https://doi.org/10.1109/LSP.2017.2766245
10. Rathee, G., Jaglan, N., Garg, S., Choi, B.J., Choo, K.K.R.: A secure spectrum handoff mechanism in cognitive radio networks. IEEE Trans. Cognit. Commun. Networking 6(3), 959–969 (2020). https://doi.org/10.1109/TCCN.2020.2971703
11. Rawat, A.S., Anand, P., Chen, H., Varshney, P.K.: Collaborative spectrum sensing in the presence of Byzantine attacks in cognitive radio networks. IEEE Trans. Signal Process. 59(2), 774–786 (2011). https://doi.org/10.1109/TSP.2010.2091277
12. Umar, R., Sheikh, A.U.H., Deriche, M.: Unveiling the hidden assumptions of energy detector based spectrum sensing for cognitive radios. IEEE Commun. Surv. Tutorials 16(2), 713–728 (2014). https://doi.org/10.1109/SURV.2013.081313.00054
13. Wu, J., et al.: Analysis of Byzantine attack strategy for cooperative spectrum sensing. IEEE Commun. Lett. 24(8), 1631–1635 (2020). https://doi.org/10.1109/LCOMM.2020.2990869
14. Xu, M., et al.: Early warning of lithium battery progressive fault based on box plot. In: 2019 3rd International Conference on Electronic Information Technology and Computer Engineering (EITCE), pp. 1072–1075 (2019)
15. Ye, F., Zhang, X., Li, Y., Tang, C.: Faithworthy collaborative spectrum sensing based on credibility and evidence theory for cognitive radio networks. Symmetry 9(3), 36 (2017)
16. Zhang, L., Ding, G., Wu, Q., Zou, Y., Han, Z., Wang, J.: Byzantine attack and defense in cognitive radio networks: a survey. IEEE Commun. Surv. Tutorials 17(3), 1342–1363 (2015). https://doi.org/10.1109/COMST.2015.2422735
17. Zhang, L., Nie, G., Ding, G., Wu, Q., Zhang, Z., Han, Z.: Byzantine attacker identification in collaborative spectrum sensing: a robust defense framework. IEEE Trans. Mob. Comput. 18(9), 1992–2004 (2019). https://doi.org/10.1109/TMC.2018.2869390
18. Zhang, L., Wu, Q., Ding, G., Feng, S., Wang, J.: Performance analysis of probabilistic soft SSDF attack in cooperative spectrum sensing. EURASIP J. Adv. Signal Process. 2014(1), 1–9 (2014)

Security Analysis of Blockchain Layer-One Sharding Based Extended-UTxO Model

Cayo Fletcher-Smith and Muntadher Sallal(B)

Department of Computing and Informatics, Bournemouth University, Dorset BH12 5BB, UK
[email protected], [email protected]

Abstract. Blockchain technology facilitates the transfer of digital assets through the distributed storage of a transaction ledger, allowing peer-to-peer participant nodes to agree on valid transactions based on their local records without relying on centralised infrastructure or trusted participants. Distributed ledgers are increasing in public adoption, which can be attributed to permissionless infrastructure and the rise of decentralised finance (DeFi) protocols. In this growth, shortcomings in throughput and latency have been highlighted, especially when compared to traditional payment channels. The extended-UTXO (eUTXO) model offers untapped potential to support a functionally scalable infrastructure by adopting qualities of both the account model and the directed-acyclic-graph-structured UTXO model. We identify the unique benefits of eUTXO as: the ability to bundle the transaction processing of non-conflicting input states, achieving parallelism at the validator nodes; and the ability to implement complex off-chain scaling solutions through smart contracts. This research examines the security impact of sharding when applied alongside an eUTXO ledger. To illustrate this we introduce S-EUTXO, a novel proof-of-concept state-sharding protocol. It leverages distributed randomness to ensure unbiased node-to-shard distribution and introduces an input/output cross-shard transaction architecture to maintain global state synchronisation. Our model demonstrates the potential of sharding alongside eUTXO without compromising security.

Keywords: Blockchain scalability · EUTXO · blockchain security

1 Introduction

Distributed ledger technology (DLT) was incorporated in a decentralized environment as a solution to these challenges in the Bitcoin whitepaper published by Satoshi Nakamoto, introducing the world to blockchain technology. Since its inception in 2008 it has become one of the fastest growing, most disruptive financial technologies since the digitization of currency. Blockchain technology leverages a ledger of transactions stored in a distributed manner across a network of peer-to-peer (p2p) nodes [1]. Transactions within this network facilitate the transfer


of digital currency and assets, without the need for centralised trusted authorities. The trustless nature of transactions relies on the use of cryptography to prove the authenticity of transfers, allowing trust to be placed in the underlying protocol [2]. Each transaction is approved or rejected by the consensus of network participants and stored within blocks. Blocks are appended to the chain state and linked through cryptographic mechanisms, ensuring the ledger remains immutable, traceable, and trusted by users [3,4]. While flagships like Bitcoin handle decentralisation and security well [5], their lack of scalability proves too dire for meaningful impact in the financial sector: Bitcoin's 7 transactions per second (TPS) compare with 170 TPS for PayPal and a peak of 56,000 TPS provided by Visa [6]. These shortcomings are also observable in smart contract projects like Ethereum, with a throughput of 11 TPS. These scalability issues are often solved off-chain, using smart contracts to ship processing onto parallel networks with specialised scalability mechanisms, allowing the primary network to focus on decentralisation and security while sidechains account for scalability [7]. Smart contracts and sidechains are considered off-chain processes, as the logic and architecture sit above the underlying protocol at layer-two (L2). L2 operations present an imperfect scaling solution, since sidechains often produce unintended consequences, such as increased centralisation, when hyper-focused on efficiency. Instead, equal and less severe scaling solutions should be considered within both the main L1 network and subsequent L2 chains [7]. L1 solutions often revolve around an approach known as sharding, where network segmentation and state partitioning occur to enable parallel processing and increase throughput. In essence, the smart contract functionality at layer-two allows for complex possibilities in the way of off-chain scaling, as seen in the Ethereum ecosystem, alongside the potential for transaction parallelism at the validator node level due to predefined input states. These two methods of scaling are not possible in unison on account-based blockchains, due to ambiguous state dependencies (pre-validation), or on UTXO-based ledgers, owing to a general lack of logical expressiveness (beyond Bitcoin Script).

2 Problem Statement and Contributions

Scalability is a serious problem in the blockchain ecosystem. Our in-depth analysis of the literature (see Sect. 4) shows that previous attempts to overcome the blockchain scalability issue have mainly focused on sharding as well as extended-UTXO. However, these previous attempts do not consider the integration of sharding with extended-UTXO as a scalability solution, which may achieve optimal results. Considering the benefits of extended-UTXO in supporting scaling both at L2 and at individual validator nodes, we hypothesize that eUTXO has the potential to be a natural scaling solution that enables various performance mechanisms throughout the blockchain stack. However, the potential for inherent scaling is further amplified if extended-UTXO is deployed alongside sharded network consensus, allowing parallel transaction validation.


We theorize that in such a model, validator nodes could individually process transactions with non-conflicting states in parallel; validation shards could further scale throughput at the consensus and data storage level; and smart contracts could facilitate L2 solutions, ensuring the ecosystem can adopt new scaling mechanisms. We consider whether "extended-UTXO in blockchain sharding can improve blockchain scalability without compromising security". The main contributions of this paper can be summarised as follows. Security evaluation: this paper examines whether sharding based on extended-UTXO can be done safely, without increasing the likelihood of certain classes of attacks, for instance inconsistency between shards caused by malicious nodes. We propose and evaluate a new sharding protocol based on extended-UTXO (S-EUTXO). S-EUTXO is designed by integrating a blockchain sharding protocol with the EUTXO ledger state model, which was proposed to overcome the issues of the UTXO ledger state model. In this paper, we have designed and run extensive simulations to evaluate the security of the S-EUTXO protocol.

3 Background

In this section, we provide a general overview of blockchain scalability. We focus on blockchain scalability techniques by discussing sharding and EUTXO.

3.1 Ledger State Models

Unspent Transaction Output Model. In the unspent-transaction-output (UTXO) model, transactions generate unspent outputs associated with the recipient's address, representing the transferred asset. UTXOs are spendable, with the associated wallet tracking the sum of all unspent outputs assigned to the address [8,9]. To start a new transaction, unspent outputs are declared as inputs. To use these outputs, the owner of a wallet must sign the transaction with their private key, signifying that (i) the outputs are spendable by the wallet, and (ii) the user authorises the transaction. The digital signature is known as a witness to the transaction [10]. Wallet addresses are derived from the public key and can be explicitly linked to the private key that controls authorization. Once processed, the transaction generates UTXOs of equal value assigned to the recipient's address; if the transaction consumes only a fraction of the input, an output of the transaction assigns the remaining funds back to the sender. These processes can be conceptualised as paying in cash and receiving change. This process is illustrated as a directed acyclic graph in Fig. 1.

Account Model. The account model supports smart contract accounts (controlled by code) and externally owned accounts (controlled by a private key), containing account balances. When a transaction occurs, the transaction value is deducted from the sender's account balance, and the recipient's account is increased by the same amount.


Fig. 1. UTXO Directed Acyclic Graph

The global state is then updated on the blockchain to reflect the change [11]. Ethereum approaches smart contracts by isolating them under their own accounts. Both externally owned accounts (wallets) and contract accounts contain a public address, nonce, balance, storage hash, and code hash. These account types differ in that the code and storage in wallets are empty, and contract accounts do not use private keys (because control is automated by the contract itself) [12,13].

3.2 Sharding

Transaction throughput is difficult to scale at L1, since the requirements for participation in consensus produce high latencies when numerous validators are involved in sequential transaction processing. To solve this, sharding has been proposed by researchers as a key feature for achieving 'scale-out', which we define as the ability to continuously scale linearly with network growth. Sharding is the process of segmenting certain network operations within the blockchain to increase performance. There are three types of sharding: (i) network sharding, (ii) transaction sharding, and (iii) state sharding. This section provides an overview, detailing the motivations of each and their associated challenges.

Network Sharding. Network segmentation is fundamental to both state and transaction sharding. The network is segmented into shards, each representing a smaller number of network participants. The challenge associated with this is maintaining BFT within isolated subnets. If the pre-shard network is designed to withstand Byzantine threats comprising 51% of participants, producing shards that each represent 10% of the network raises the threat of decentralised security mechanisms being ineffective under this more concentrated distribution [14]. If malicious actors overload individual shards operating local consensus, blockchain validity could be undermined by Sybil attacks. Sybil attacks can occur when an individual

Security Analysis of Blockchain Sharding Based EUTxO

99

actor represents more than the consensus threshold, allowing the malicious validation of transactions [15]. For this reason network sharding is a delicate process requiring unbiased mechanisms to dictate segmentation. Transaction Sharding. Transaction sharding is the process of segmenting the transaction pool and assigning transactions to the aforementioned network shards for parallel processing. Network shards validate transactions by achieving consensus independently, allowing transaction throughput to be increased by the factor of active shards. Transaction parallelism increases scalability, with the potential for linear scale-out. Isolated consensus may result in multiple submissions of the same transaction to different shards, resulting in double spending [14]. Atomic commit protocols and shard communication are critical to mitigate these threats and avoid conflicts [16]. The mechanism used to assign transactions to shards must be clear and without conflict. State Sharding. State partitioning segments blockchain storage into smaller states, assigning partition maintenance to network shards. State distribution reduces strain associated with ledger storage which can reduce computational overheads associated with validator internal processes. Furthermore, operating on segments of the global state results in smaller transmission messages when bootstrapping nodes per epoch. Bootstrapping is the process of updating validator nodes, occurring on node reassignment [15]. 3.3

3.3 EUTXO Model

The extended-UTxO model is an improvement on the UTxO model introduced by Nakamoto in Bitcoin. eUTxO introduces smart contract functionality, which was not possible in previous UTXO models. When a smart contract is compiled, a binary output is produced. This output is hashed and used as the contract's on-chain address. Script addresses do not have a public/private key pair. This is conceptually similar to smart contract accounts in the account model, which do not have private keys controlling transactions.
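A minimal sketch of this address derivation (assuming SHA-256; the text does not name the hash function used):

import hashlib

def script_address(compiled_script: bytes) -> str:
    # The compiled contract binary is hashed, and the digest serves as
    # the contract's on-chain address; no key pair is involved.
    return hashlib.sha256(compiled_script).hexdigest()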

4 Related Work

This section focuses on research related to scaling transaction throughput and latency in blockchain systems. Typically there are two categorical approaches: off-chain processes at layer two (L2) and on-chain scaling at layer one (L1). L2 scaling leverages smart contracts to offload transactions onto specialised sidechain infrastructures to increase transaction throughput. In this model, sidechains process transactions and store the proof of validity on the main network, thereby inheriting the security and decentralisation of the parent chain.


L1 approaches focus on adjustments and optimisations to the peer-to-peer operations, consensus mechanisms and underlying ledger state model of the main network. Sharding is the main approach for achieving significant scaling at L1, through network segmentation and parallelism. This section mainly focuses on research related to L1 sharding and scalability.

4.1 Elastico

Luu et al. propose Elastico, a Nakamoto-style sharding approach to network and transaction parallel computation [17]. Operating in epochs with interludes of node reassignment, network segmentation is intended to scale indefinitely in proportion to network growth. The global state is maintained by a single randomly elected shard. Node assignment is based on verifiable PoW identity generation, randomising allocation and constraining an actor's ability to overwhelm shards. Assignment occurs uniformly at random, depending on the value of predetermined bits in the output hash of the proof-of-work algorithm [17]. While scaling increases linearly, local shard BFT is sacrificed.

4.2 Omniledger

The works of Kokoris-Kogias et al. detail a fault-tolerant protocol incorporating concepts from Elastico [16]. The main research contributions presented are: (i) cross-shard transaction atomicity, mitigating double spending and conflicts in global state maintenance; and (ii) verifiable random functions (VRF) for reassignment. Instead of binding distributed randomness to a biasable PoW function for node assignment, OmniLedger incorporates RandHound, which relies on the temporary election of a central coordinator. Each node generates a VRF ticket using their private key and broadcasts it throughout the network; nodes then accept the creator of the lowest-value ticket as the temporary coordinator to run RandHound [16]. The coordinator generates a seed dictating the parameters of the next epoch, based on the input VRF tickets distributed from the network. Therefore, epochs can be dictated democratically and verifiably without assumptions being made about the quality of the coordinator [18]. Cross-shard transactions proceed in three steps:

1. Initialisation: Coordination is leveraged client-side, where a final commit shard is designated and the transaction is submitted to multiple input shards.
2. Lock: The input state is locked in each shard, and validators execute an internal consensus function to either accept or reject the transaction. A verifiable hash traceable to the shard is produced, specifying either proof-of-acceptance or proof-of-rejection.
3. Unlock: The client now has the power to commit once or abort the transaction. If the client holds proof-of-acceptance from all input shards, they issue an unlock-to-commit message containing the proofs and the initial transaction. The proofs are verified and the input state is unlocked for the transaction to be committed. If one or more shards returns proof-of-rejection, the client forwards this to all input shards, where the state is unlocked and returned for future transactions [16]. If shards become unresponsive post-lock, the transaction is aborted at the end of the epoch.
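A client-side sketch of this lock/unlock flow follows; the shard interface is hypothetical, not OmniLedger's actual API:

def atomic_commit(tx, input_shards):
    # Lock phase: each input shard locks the input state and returns a
    # verifiable proof-of-acceptance or proof-of-rejection.
    proofs = [shard.lock(tx) for shard in input_shards]

    if all(p.accepted for p in proofs):
        # Unlock-to-commit: the client presents every acceptance proof,
        # the input state is unlocked, and the transaction is committed.
        for shard in input_shards:
            shard.unlock_to_commit(tx, proofs)
        return True

    # One or more rejections: forward them so locked state is released
    # and returned for future transactions.
    for shard in input_shards:
        shard.unlock_to_abort(tx, proofs)
    return False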

4.3 RapidChain

Operating in epochs of reassignment, RapidChain is a synchronous protocol with atomic-commit functionality, where state is partitioned between shards [15]. Cross-shard transactions rely on shard coordinators and input/output shards, alongside a reference committee that generates epoch randomness [19]. Epoch randomness dictates the assignment of nodes and state within an epoch. Generated transactions are sent to a random node, then forwarded to the appropriate output shard concerned with maintaining the state output of that transaction. The output shard coordinator gossips the transaction to the shards responsible for the input state, where consensus is achieved in each, similar to OmniLedger's approach [15]. In principle the output shard acts as a trusted party, although the power to commit relies on compliance from the input shards [19].

4.4 Chainspace

Similar to Omniledger, Chainspace offers cross-shard transactions [20]. As opposed to leveraging the client for temporary coordination, Chainspace's SBAC protocol, a combination of Byzantine agreement and atomic commit, relies on peer-to-peer communication between shards. Operating an account model, immutable objects are used to represent the state of wallets on-chain; when transactions are approved, the object is destroyed and re-initialised with the updated balance [21]. Transactions are conceptually similar to UTXO implementations, requiring an input object, which is destroyed, returning a new output object. In this case the entire account is used as an input, as opposed to a specific unspent output. Transactions are sent to input shards maintaining the associated object's state; each shard runs consensus, communicating the result of their individual validation to each other. The associated shards then run another round of consensus to deactivate the input objects, before forwarding the transaction to a commit shard where the object is re-initialised [22].

4.5 Monoxide

Monoxide performs network segmentation for transaction parallelism, and state partitioning to reduce storage overheads [23]. Account states are assigned to shards based on predetermined bits of the output hash of the account's public key. Transactions affecting accounts split between shards rely on relay cross-shard transactions to update the account states in the neighbouring shard [23]. Consider a transaction from Alice to Bob, where Alice's account is maintained in shard A and Bob's in shard B. Alice derives the shard address from her public key, sending her transaction to the location storing her account. Shard A performs consensus and updates Alice's balance, forwarding a relay transaction containing proof of consensus to shard B, where Bob's balance is updated.
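A toy sketch of this account-to-shard mapping (assuming the leading bytes of a SHA-256 digest select the shard; Monoxide's exact bit parameters differ):

import hashlib

def shard_of(public_key: bytes, num_shards: int) -> int:
    # Predetermined bits of the hashed public key determine the shard
    # maintaining this account's state.
    digest = hashlib.sha256(public_key).digest()
    return int.from_bytes(digest[:4], "big") % num_shards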

4.6 Zilliqa

Zilliqa employs network and transaction sharding on an account-based design [24]. Nodes perform a PoW function determining their shard assignment. Operating a reject-and-retry mechanism, Zilliqa adopts the security benefits associated with cross-shard transactions without the atomic-commit complexity found in similar projects. This is achieved through parameters based on sender addresses, defining which transactions a shard processes. If transactions are sent to the wrong shard, nodes return a reject-and-retry response. This approach mitigates double-spending attacks without implementing a complex coordination mechanism for cross-shard consensus. Blocks are committed to a directory shard, which maintains the global state, in the form of "macroblocks" [24]. Signature aggregation combines the digital signatures from all validating nodes into one smaller signature, which is sent alongside the macroblock. The aggregated signature is verified through reconstruction using the public keys of the macroblock's validator nodes [24]. Aggregation lowers network overheads by eliminating the traffic associated with nodes sending digital signatures individually. The directory shard does not repeat the same validation; instead it verifies the aggregated signatures from the validating nodes. Zilliqa does not allow for smart contract parallelisation, since transactions affecting the same contract are processed sequentially. As the network grows and adoption increases at L2, dApps become bottlenecked by the processing power of a single shard, as opposed to their optimisation requirements [25].
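A sketch of the reject-and-retry routing described above; the modulo mapping over the sender address is an illustrative assumption, and validate stands in for the normal in-shard validation:

def handle(tx, my_shard: int, num_shards: int):
    # Sender-address parameters define which transactions a shard processes.
    target = int(tx.sender_address, 16) % num_shards
    if target != my_shard:
        # Wrong shard: reject and tell the client where to retry,
        # avoiding cross-shard coordination entirely.
        return {"status": "reject-and-retry", "shard": target}
    return validate(tx)  # assumed in-shard validation routine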

4.7 Extended-UTXO

This section covers the designs of state models categorised under the extended-UTXO umbrella. We do not review sharding approaches here, as network segmentation has not yet been implemented at L1 on an extended-UTXO blockchain.

Cardano. In the works of Chakravarty et al., a smart-contract-capable implementation of the traditional UTXO model was proposed for the Cardano platform [10]. This extended-UTXO model introduces logic functionality, traditional to account-based implementations, to the UTXO model. In principle, UTXO models operate a lock-and-key mechanism where public keys lock transaction ownership, and verifiable signatures provide the key to unlock and spend associated outputs. In eUTXO, addresses may contain logic that fulfils the same purpose as signatures (keys) by specifying the conditions for spending locked outputs [10]. Transactions may contain arbitrary data called the datum, alongside outputs, that can determine how the logic at the output address operates when the funds are used as an input. Contract logic defines additional parameters regarding how outputs can be used; these parameters are checked at validation when a transaction uses the locked output. When the output is consumed as an input, additional parameters are passed within the transaction. These parameters are conceptually similar to function arguments; we call this the redeemer [26]. Redeemers dictate the parameters the contract operates under, and dictate how the logic will execute when validated. Imagine a scenario where assets are locked on a vesting schedule, and users can periodically unlock their assets after a period (specified in the datum) has elapsed. The redeemer argument could be 'claim', signifying the user wishes to unlock their assets. This argument tells the contract what to do if the request is valid. In this case, the request is valid if a predefined period has elapsed since the assets were initially locked, and the user is entitled to unlock them. This implementation allows the validity of transactions to be checked off-chain, since the dependencies (inputs, datum and logic) are predefined. Therefore, transaction success is guaranteed if dependencies are unchanged upon validation on-chain. Since the blockchain is in motion, transactions may not always succeed, as the state dependencies may have become invalid by the time validation occurs. The nature of transactions being dependent only on themselves and their inputs protects them from external on-chain states that may cause unexpected consequences [10].

Ergo. Ergo implements boxes in its state model, which are conceptually similar to unspent outputs [27]. Each box is registered to an address under the control of a private key. Boxes can only be used in an operation once, with the only possible operation being a transaction. Once used, they are marked as spent and new output boxes are initiated [28]. Each box contains 10 registers: 4 are designated for mandatory data and always remain full, while the remaining 6 are reserved for client-defined data and may remain unused [27]. Mandatory data registers include the monetary value, the serialised script protecting the box, the asset types stored in the box, and transaction information. Transaction information includes: (i) the creation height; (ii) a unique identifier associated with the transaction creating the box; and (iii) the index of the box in the outputs created by the transaction [29]. Transactions in Ergo can take multiple inputs and create multiple outputs, essentially bundling transfer processes together.

Nervos. Nervos operates a hybrid UTXO/account-based adaptation, most easily classified under the niche eUTXO umbrella. Instead of using unspent outputs, which can be restrictive on data structures, Nervos employs cells to act as the inputs and outputs of transactions [30]. Cells are immutable and contain arbitrary data, which could be state (such as tokens) or executable logic. Contrasting account-model implementations, where state is an internal property of a smart contract, cells can reference data in other cells, allowing assets and governance logic to be separated. Application logic is split into two phases, generation and verification, running in different places [31]. This allows for additional algorithmic flexibility between phases. The generation of new state (a transaction) is run locally, client-side, and broadcast to the network. Assets stored in cell states must follow associated logic, specified by scripts. Verification assesses the generated state variables stored within the transaction. State variables include a reference to the previous cell storing the state, which can only be used once as an input, and a signature verifying that the client owns the input. Associated script logic dictates the parameters of cell usage and is executed by validators locally within the virtual machine (VM). If the transaction variables coincide with the parameters of use, validation is successful. The VM executes the type script upon cell creation to ensure the state is valid under certain parameters. The lock script is executed, taking proof arguments, when the cell is referenced as a transaction input [31]. Cells used in transaction inputs are removed from the active set of states, with output cells being created and included in the amended active global state. In this approach, similarly to UTXO projects, transactions act as the proof of state transition [30].

5 The Proposed Model

The proposed model, named S-EUTO, is a novel smart-contract-capable protocol with state partitioning and transaction parallelism on an extended-UTXO model. The transaction pool is segmented and assigned to specific shards based on certain transaction parameters (see Sect. 5.5), and validated in accordance with the local shard state. Figure 2 illustrates the various layers of this protocol, from the transaction pool to validation in accordance with the UTXO directed acyclic graph. The following system model can be used as a reference point to understand, at a high level of abstraction, how the mechanisms fit together. In Sect. 5.4 we illustrate the smart contract architecture implemented, and discuss the relationship between contract parameters and output states. In Sect. 5.5 we examine the unique challenges of applying sharding alongside eUTXO, discuss how the aforementioned contract architecture works in the context of sharding, and provide insight into the mechanisms dictating node distribution.

Fig. 2. System Design Architecture

5.1 Addresses and Ownership

Externally owned addresses have three key attributes: an address, a public key and a private key. UTXOs are verifiably associated with addresses and controlled by the user's private key. To generate key pairs, we used the RSA asymmetric cryptography algorithm from the cryptography.hazmat.primitives.asymmetric library. We set the padding.PSS salt length used in cryptographic processes to the maximum, ensuring signatures remain resilient to reverse engineering [32]. Each key pair is 2048 bits in size, allowing for slightly shorter verification times compared to 3072 or 4096 bits, with the public exponent set to e = 65537 for compliance with cryptography standards [22].
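A minimal sketch of this key generation using the cryptography library with the stated parameters; the address derivation shown is a hypothetical scheme, since the exact mapping from public key to address is not given:

import hashlib
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

# 2048-bit RSA key pair with the standard public exponent e = 65537.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# Hypothetical address derivation: hash the serialised public key.
pub_bytes = public_key.public_bytes(
    encoding=serialization.Encoding.DER,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)
address = hashlib.sha256(pub_bytes).hexdigest()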

5.2 Transaction Structure

This section outlines the transaction data structures, digital signatures and authenticity requirements, and input/output calculation. Transaction identifiers are unique integers generated by validating nodes before block minting. Identifiers are referenced in the transaction input field of a subsequent transaction if the corresponding unspent output fulfils the expenditure parameters. The sender address defines the transaction's origin wallet and is used to find the associated sender public key. The recipient address dictates the new ownership wallet controlling the specified assets upon transaction approval. The token field specifies the unique identifiers associated with asset types; this dictates which asset the output field represents and the parameters of which inputs can fulfil the transaction. The output field represents the quantity of tokens being transacted. A digital signature contained in each transaction is produced by the sender, using their private key and wallet address as input. The script, datum hash, datum and redeemer attributes are smart-contract-dependent fields that remain empty in normal transactions.

Digital Signatures and Authenticity. Digital signatures are produced using the SHA-256 hashing algorithm, a predefined input, padding, and the private key of the sender. The predefined input is a UTF-8 encoded byte string produced from the value of the sender's address. All participants know this value and verify the authenticity of transactions using this input and the sender's public key.

Input Declaration. Input calculation occurs in the validation function run by validating nodes. The node reads the output history associated with the sending address of a new transaction. It compares each transaction identifier from the history with the spent outputs stored in virtual memory. If the unique identifier does not match any spent object, and the output fulfils the input requirements of the new transaction, the validator appends the unspent output identifier as the input of the new transaction.
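Following the description above, a sketch of signing and verification (SHA-256 with maximum-length PSS salt over the UTF-8-encoded sender address):

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding

PSS = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                  salt_length=padding.PSS.MAX_LENGTH)

def sign(private_key, sender_address: str) -> bytes:
    # The predefined input is the UTF-8 byte string of the sender's address.
    return private_key.sign(sender_address.encode("utf-8"), PSS, hashes.SHA256())

def verify(public_key, signature: bytes, sender_address: str) -> bool:
    # All participants know the predefined input, so anyone holding the
    # sender's public key can check the transaction's authenticity.
    try:
        public_key.verify(signature, sender_address.encode("utf-8"), PSS, hashes.SHA256())
        return True
    except InvalidSignature:
        return False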


Output Calculation. Output calculation is by default defined client-side by the sender. The exception is when the validator selects an input exceeding the specified output amount. In this case, a return transaction containing the value difference between the input and output is created and validated alongside the new transaction. The return transaction's recipient is the sender of the new transaction (the owner of the input exceeding the output value).

5.3 Validity Conditions

When nodes validate the entire chain state, the bytes of each block are hashed, and the recreated output hash is checked against the previous-hash pointer of the next block. This ensures nodes can determine chain validity and identify conflicts. In the event of conflict, there is no mechanism for resolution, such as hard forking or state roll-back, although this may be implemented in future research. Before appending blocks, the chain validity is verified by computing the hash pointers, and the block is verified according to the ledger. The verification process for new blocks encompasses each validator verifying the included transactions according to the transaction validity conditions. Validators independently execute these validity functions, before aggregating votes and achieving consensus according to the consensus parameters.

Transaction Validity. Transaction validation is determined based on the validity of the input used. Inputs are valid if four key conditions are met:

1. The input transaction is owned by the sender's address.
2. The input UTXO value is greater than or equal to the specified output.
3. The digital signature is verifiable with the sender's key.
4. The input identifier does not match any spent input previously used on-chain.

Validators verify the signature by extracting the public key associated with the sender and creating a hash using the predefined global input parameters. This hash is checked against the digital signature included in the transaction. Input validity is calculated by isolating the transaction history associated with the sender's address and iteratively locking a received transaction. Once locked, the spent-transaction history is filtered for spent UTXOs associated with the sender. The locked transaction's identifier is checked against each associated spent transaction. If no matches are found and the UTXO fulfils the output requirements, the validator deems the input valid. If the locked transaction matches the identifier of a spent transaction, or does not fulfil the output requirements, a new transaction is locked and the process is repeated. These conditions are mandatory for all transactions aside from validator-generated return transactions, which return any remaining assets from an input back to the sender. Additional complexities (such as signatures) were unnecessary for return transactions, since a return transaction is the direct output of an already validated transaction. While these conditions hold across all transactions, transactions referencing smart contracts may have additional parameters necessary for expenditure (Sect. 5.4).

Consensus Parameters. Consensus parameters define how independent nodes perform validation functions and agree on the result in a Byzantine network. Transaction validation and block population occur on the minting node; upon reaching the block-size threshold, blocks are broadcast to validators. Validators check the blockchain state validity, validate each individual transaction, and reach a binary conclusion of true or false. This conclusion is broadcast to consensus participants; if the aggregation of all votes is above the consensus threshold, the network is updated with the new state.
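A condensed sketch of the input-validity check against the four conditions above, reusing the verify helper from the signing sketch in Sect. 5.2 (the data structures are hypothetical; the paper's implementation is not published):

def find_valid_input(sender, amount, received, spent_ids, signature, pubkeys):
    # Condition 3: the digital signature must verify against the sender's key.
    if not verify(pubkeys[sender], signature, sender):
        return None
    for utxo in received[sender]:          # condition 1: outputs owned by the sender
        if utxo.tx_id in spent_ids:        # condition 4: identifier already spent
            continue
        if utxo.value >= amount:           # condition 2: input covers the output
            return utxo.tx_id              # lock this identifier as the input
    return None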

5.4 Smart Contract Architecture

Smart contracts are deployed under contract addresses. Unlike wallets, smart contracts are not controlled via a public/private key pair, since assets are only conditionally bound to the address the transaction references, not owned. Contract addresses contain immutable logic, stored on-chain. Contracts can be referenced by transactions between network participants, thereby altering the parameters of output spending [33]. Smart contract logic only impacts how associated inputs are used. Transactions can be sent from one participant to another without the requirement of meeting script parameters. When the recipient of a transaction referencing contract logic spends that output, additional parameters are applied to the validation based on the referenced script. This is accomplished through four transaction attributes:

1. script
2. datum hash
3. datum
4. redeemer

If the sender of a transaction wishes to apply additional parameters to the output, the smart contract address is referenced in the script field. We refer to the sender as the initiator, since they instigate the transaction. Arbitrary data is passed in the datum field, which can dictate the parameters the output is used under, while the datum hash verifies the signature of the sender and the authenticity of the arbitrary data. The datum hash is always necessary when referencing a contract, while the full datum can optionally be included in the initiator transaction (contract dependent). The initiator transaction follows standard validation requirements. The transaction recipient is called the redeemer, and must meet contract parameters to redeem and spend the output. Redeemers must include the script reference, their datum hash, the full datum, and additional script arguments in the redeemer field. By verifying the datum hashes provided by the initiator and the redeemer, the script ensures the data is authentic. The full datum is optional in the initiator transaction but not in the redeemer transaction, as the script must be passed the full datum at some stage to compute datum hash validity. As a method of privately passing data on-chain to the redeemer, the initiator can encrypt the datum using the redeemer's public key. The datum may also be withheld by another contract for a vesting period, locking the redeemer's assets. Finally, the redeemer must include script arguments in the redeemer field, determining how the input should be used. Parameters depend on the structure of the contract logic and the expected script arguments. The architecture of these transactions is illustrated through an expansion of the directed acyclic graph in Fig. 3.

Fig. 3. S-EUTO: Smart Contract Architecture

Example. To exemplify this process in the context of a currency exchange: the initiator specifies the amount, while the redeemer specifies the output asset class.

Logic Incorporation and Spending Conditions. Smart contract logic sits above the on-chain validation functions, in that primary validation must return true before smart contract validation occurs. If a transaction references a smart contract, both the primary validation function and the contract logic must return true for the transaction output to be spent.
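A hypothetical sketch of this two-stage check (the names, the JSON datum encoding and the contract interface are all illustrative assumptions):

import hashlib
import json

def datum_digest(datum: dict) -> str:
    # Hash a canonical encoding of the datum so both parties' datum
    # hashes can be compared for authenticity.
    return hashlib.sha256(json.dumps(datum, sort_keys=True).encode()).hexdigest()

def validate_with_contract(tx, contract, primary_validation):
    # Stage 1: primary (standard) validation must return true first.
    if not primary_validation(tx):
        return False
    # Stage 2: the full datum supplied by the redeemer must match the
    # committed datum hash, and the contract logic must accept the
    # redeemer's script arguments.
    if datum_digest(tx.datum) != tx.datum_hash:
        return False
    return contract(tx.datum, tx.redeemer)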

5.5 Network Sharding

Network segmentation in S-EUTO assigns network participants to specific validation groups. Each node maintains a routing table detailing all other nodes within its specified shard, used for communication associated with distributed consensus functions. Introducing sharding to eUTXO presents additional challenges that must be considered in the design architecture of the system. One key consideration in any sharding protocol is the maintenance of the global state, ensuring individual shard state does not fork between epochs of bootstrapping and redistribution. Maintaining the state is critical in ensuring UTXOs are either spendable or spent across all shards, and that smart contract logic accesses the most recent version of the ledger to avoid exploitation. To maintain the eUTXO state, we propose the use of state partitions, where individual shards maintain the state of a set number of external addresses. To enable shards to distinguish between transactions in the pool, the public key of a transaction sender determines the shard responsible for validating the transaction. This can be determined before submission using public data from the blockchain state, calculated at the most recent redistribution epoch. Considering that redistribution is determined by distributed randomness, transaction-to-validator assignment also remains random. If a transaction reaches an incorrect shard, the recipient node responds with an error code, indicating transaction rejection. This mechanism ensures UTXO transactions and shards have a uniform way of interacting with each other, and transactions reach the appropriate validators. Shards associated with the state of a transaction sender achieve consensus on the validity of that transaction; these are referred to as input shards. Upon reaching consensus, validator signatures are aggregated and sent as proof to the output shard that maintains the state associated with the recipient address. The output shard achieves consensus on the validation proof, and generates a UTXO tied to the recipient. To facilitate smart contract transactions, our model requires that shards maintaining the state of a smart contract address also receive proof of validation from the input shard. This enables smart contracts to update their state with the transaction data critical to the instance. When redeeming a smart contract transaction, the shard maintaining the newly generated output (produced by the initiating transaction) achieves consensus on the output spending, then forwards the transaction to the smart contract shard, where additional parameters are applied in accordance with the contract state. The smart contract shard then responds with proof of validation, allowing the redeeming transaction to continue in validation. This approach effectively makes the shard responsible for an associated contract state aware of the existence of a transaction, while ownership is still held in the output shard maintaining the state of the redeemer's address. We illustrate this process in Fig. 4.

Fig. 4. Cross-shard Smart Contract Transactions

Shard Assignment and Node Redistribution. To determine each shard's participant nodes, we propose a distributed randomness function (DRF) to produce unbiased node allocation. The DRF executes within each shard independently, operating in iterations equal to the number of nodes per shard. Each iteration operates in two phases, argument collection and ticket generation, requiring three participants: a temporary coordinator node, an operator node, and a subject node. At the start of the function, each node randomly generates an input ticket and an operator argument. A coordinator node is elected each iteration based on tickets generated in the previous epoch. In the argument collection phase, the coordinator randomly selects two nodes from the shard routing table and defines each of them as either the subject node or the operator node for this iteration, broadcasting the selection to the network. Upon receiving the broadcast message, the subject (reassignment) node responds by broadcasting their input ticket, while the operator broadcasts their argument back to the coordinator. In the ticket generation phase, the coordinator randomly selects one of the DRF's predefined functions, which takes the provided inputs and aggregates them, producing the final output ticket for the reassignment node. The output ticket, the function and inputs used in its generation, and the associated reassignment node are broadcast to the shard. Shard participants external to the process observe all stages through the message broadcasts, allowing for traceable and verifiable generation. At the end of the iteration, direct participants are removed from potential selection for the role they filled. Throughout the DRF execution, all nodes act as the operator, coordinator, and reassignment node once. On DRF completion, all tickets are broadcast to the entire network, where node reassignment occurs based on the byte value of tickets. Three iterations of this process are visualised in Fig. 5; dotted lines indicate broadcasts, while full lines indicate specific communication between iteration participants. Based on initial simulation results, we further optimised this process by implementing overlapping iterations. In this overlap feature, the next iteration starts as soon as the next predefined coordinator (based on tickets) receives the initial broadcast from the current coordinator. It may be possible to execute iterations in parallel, with nodes performing all roles simultaneously, although we leave this to future research.
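A simplified sketch of one DRF iteration from the coordinator's viewpoint; the aggregation functions shown are illustrative stand-ins for the predefined function set:

import random

def drf_iteration(route_table, tickets, arguments):
    # Argument collection: the coordinator selects a subject (reassignment)
    # node and an operator node, broadcasting the selection.
    subject, operator = random.sample(route_table, 2)
    subject_ticket = tickets[subject]      # subject broadcasts its input ticket
    operator_arg = arguments[operator]     # operator broadcasts its argument

    # Ticket generation: one predefined function is chosen at random and
    # aggregates the inputs into the subject's final output ticket.
    functions = [
        lambda t, a: (t + a) % 2**32,
        lambda t, a: t ^ a,
        lambda t, a: (t * a) % 2**32,
    ]
    chosen = random.choice(functions)
    output_ticket = chosen(subject_ticket, operator_arg)

    # The ticket, the chosen function and its inputs are broadcast so that
    # every shard participant can verify the generation.
    return subject, output_ticket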


Fig. 5. Iterations of Distributed Randomness Function

6 Evaluation Methodology

The evaluation was conducted on quantitative data generated through simulations. The measurements identified in this research were redistribution latency (RL) and the number of malicious validations. Based on link latency data extracted in [34,35] from the Bitcoin network, we define the edge-to-edge latency as 2500 ms. We used this data to inject realism into our network environment, since it was extracted from a blockchain with implementations comparable to our model, such as transaction size Ts and state model [36]. We extracted measurements under the following variable parameters:

s = number of shards
m = number of malicious nodes
Sn = nodes per shard
Bs = block size
Tpool = transaction pool (network usage per second)

Equation 1: We factored in the impact of distance between participants and message size when calculating CL. The distance between participant nodes was calculated using the following equation:

\[ d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \]


The simulation environment maintained an edge-to-edge distance of d = 50.9 with a propagation delay of 2500 ms; therefore we assume d = 1 to incur a latency of 49 ms. We introduced an artificial serialisation delay to evaluate the relationship between Bs and CL. Serialisation delay Sd is the latency associated with serialising a packet into a transmissible format and waiting in a traffic queue for appropriate bandwidth to transmit. We assumed Sd to incur 5.12 µs of delay per 64-byte packet under a throughput of 100 Mb/s [37].

Equation 2: Block size Bs was calculated using the following formula:

\[ B_s = T_{pb} \cdot T_s + E_{bs} \]

where Tpb is transactions per block, Ts is the transaction size and Ebs represents the empty-block baseline size based on mandatory attributes such as index, timestamp, and hash.

Method 1: We evaluated our intra-shard consensus by implementing malicious transactions (MT) alongside introducing malicious nodes. MT were illegitimate for one of several reasons:

1. invalid output value
2. wrong signature input
3. wrong signature private key

These values were selected randomly before transaction submission to assess various parameters of the validation function.

Equation 3: We used the following equation to represent the threshold of malicious nodes the network can securely contain under various levels of shard deployment:

\[ G_r = \frac{m}{n}, \quad m \le \frac{1}{2}(n) \]
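A small sketch of the latency and block-size arithmetic above, using the stated baseline values:

TS, EBS = 360, 232             # transaction size and empty-block size in bytes
MS_PER_UNIT_DISTANCE = 49      # d = 1 costs ~49 ms (2500 ms over d = 50.9)
SERIALISATION_US = 5.12        # per 64-byte packet at 100 Mb/s

def block_size(tx_per_block: int) -> int:
    # Equation 2: Bs = Tpb * Ts + Ebs
    return tx_per_block * TS + EBS

def propagation_delay_ms(d: float, size_bytes: int) -> float:
    # Distance-based latency plus the artificial serialisation delay Sd.
    serialisation_ms = (size_bytes / 64) * SERIALISATION_US / 1000
    return d * MS_PER_UNIT_DISTANCE + serialisation_ms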

6.1 Simulation Structure

We adopted discrete event simulation (DES), in which state is constant between events. DES is commonly applicable to computer systems, since the state depends entirely on events indicating changes in the system. In practice, simulations progress directly to the next event, bypassing the actual time taken to complete the processes [9]. Since blockchain state remains unchanged between processes, we implemented DES to model the blockchain system, allowing complex processes to be expressed as simple sequential event cycles and simplifying mechanisms such as consensus and redistribution. Such events included transaction propagation, redistribution iterations, and consensus processes [2,38].
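A minimal discrete event loop of this kind (a generic heapq scheduler, not the simulator used in the paper):

import heapq
import itertools

_seq = itertools.count()

def schedule(queue, time, action):
    # The counter breaks ties so the heap never compares action callables.
    heapq.heappush(queue, (time, next(_seq), action))

def run(queue):
    while queue:
        # Jump straight to the next event: state is constant in between,
        # so the simulated clock advances without modelling idle time.
        time, _, action = heapq.heappop(queue)
        for next_time, next_action in action(time):
            schedule(queue, next_time, next_action)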

6.2 Experiment Setup

Malicious node thresholds determined the percentage of the network an adversary controlled. This was used to identify shard failure and the overall level of Byzantine fault tolerance in the network.

Table 1. The percentage of the network an adversary controlled, where m represents the malicious nodes and n represents the network size (total number of nodes).

m = 0.1 : 10% of n
m = 0.2 : 20% of n
m = 0.3 : 30% of n
m = 0.4 : 40% of n

Table 2. Five levels of shard deployment, where s represents the number of shards and Sn represents the proportion of nodes per shard.

s = 4  : Sn = 0.25(n)
s = 8  : Sn = 0.125(n)
s = 12 : Sn = 0.083(n)
s = 16 : Sn = 0.0625(n)
s = 32 : Sn = 0.03125(n)

We did not implement a threshold above 50%, since Byzantine fault tolerance would be compromised by default. Thresholds are detailed in Table 1. We introduced five levels of shard deployment ranging from conservative to extreme. This was used in conjunction with the malicious thresholds to identify the baseline adversary model required to cause shard failure at each deployment, outlined in Equation 3. The parameters of shard quantity s and nodes per shard Sn are detailed in Table 2. Network usage was defined as the number of transactions generated by clients and sent to the transaction pool Tpool per second. Usage thresholds were used in conjunction with the Bs and s deployment parameters to generate transaction latency data using method (M2). These were the following parameters: Tpool = (1000, 2500, 5000, 10000, 20000). Each test case was run across 10 epochs of redistribution, ensuring statistical relevance, with varying unique parameters per simulation. Latency tests were executed with 1000 measured transactions per epoch.

Baseline Data. This section outlines how we defined the baseline data used to calculate the measurements in our evaluation. We used the sys.getsizeof function to identify Ts and Ebs, in bytes, of the associated data structures. We identified the baseline block size as Ebs = 232 bytes and the transaction size as Ts = 360 bytes. This was necessary to determine the latency incurred in block and transaction propagation and produce accurate simulation results. This data provided a baseline to determine the block capacity, outlined in Equation 2. We used the same method to identify the sizes of messages per iteration of our distributed randomness function. We had four key messages: the coordinator assignment message Cassignment, the subject response Sresponse, the operator response Oresponse, and the coordination ticket issuance Cticket.

Smart Contract Test Structure. We tested our extended-UTxO smart contracts by probabilistically instigating contract instances throughout the simulated network usage. This was achieved by modifying a preset transaction, before submission by the client, to initiate an instance. The initiating transaction referenced the associated test script and was validated in accordance with normal parameters. After a smart contract exchange was started, and the initiating transaction was distributed on-chain, another preset transaction would be modified and sent to redeem the funds and close the instance. The redeeming transaction would specifically select the unspent output sent by the initiator as an input, and apply the redeemer arguments, the full datum, and the datum hash. In our test instance, the datum was resolved by decrypting the datum field in the initiator transaction. Upon decryption, the redeemer transaction provides the full datum to the script for validation. Alongside the baseline validation parameters, contract logic is executed on the redeemer transaction. The validator node computes the datum hash validity of both participants, using their public keys and the full datum provided by the redeemer. The redeemer (Bob) knew the full datum since the initiator (Alice) encrypted it in the on-chain initiator transaction. Assuming the datum signatures are valid, ensuring data authenticity, the script unlocks Bob's assets in USD as specified by the redeemer arguments.

7 Evaluation Results

This section details our simulation results, which were collected by running the methods outlined in Sect. 6 within the experiment structure introduced in Sect. 6.2. Several measurement results are collected based on shard failure rate, redistribution latency, transaction throughput, and transaction latency. We use the following definitions to illustrate the results of the tests outlined in Sect. 6:

s = number of shards
n = total nodes
m = malicious nodes
h = honest nodes
mr = malicious rate


Fig. 6. Average Distribution: 8 Shards, 40% Malicious Rate

Shard Failure Rates. We define shard failure as performance that compromises the integrity of the global network, resulting in the authorisation of illegitimate transactions. We expect the overall Byzantine fault tolerance of the network to decrease linearly with shard scale-out and malicious adversary growth. In ambitious shard deployments, we predict that growth in malicious nodes will have an increasingly detrimental effect, causing observable shard failure. We discuss peak concentrations and outliers, and provide figures illustrating average distributions across 10 epochs where necessary.

4 Shards: The distributions of nodes under s = 4 were well beneath the consensus threshold in all shards across each epoch. The peak concentration was observed in s1(E6) under mr = 0.4, at h = 77 and m = 67, with m comprising 46.5% of the shard participants.

8 Shards: We observe the distribution of s = 8 to be within a safe threshold across all epochs, although peaks were found at mr = 0.4 in s5(E4), where m comprised 44.7% of the shard. The average distributions at mr = 0.4 are illustrated in Fig. 6.

12 Shards: Distributions under s = 12 became more concentrated around mr = 0.3 and mr = 0.4. High malicious concentrations in mr = 0.3 were observed in s11(E1) and s11(E6), where m was 9.5% higher than mr, comprising 39.5% of the shard (see Fig. 7). In mr = 0.4 we started to see increased frequency of concentrated distributions per epoch, as the average baseline was now observed to sit at similar levels to the peak outliers in mr = 0.3 (see Fig. 8). Peak concentrations were recorded where m comprised 47.9% of a shard across 6 epochs.


Fig. 7. Average Distribution: 12 Shards, 30% Malicious Rate

Fig. 8. Average Distribution: 12 Shards, 40% Malicious Rate

16 Shards: Disproportionate malicious concentration was observed earlier than in previous deployments, at mr = 0.2. We find that concentration was 10.5% higher than mr in s2(E1) and s7(E6), at 30.5%, peaking once at h = 23 and m = 13 in s14(E1), with m comprising 36.1% of the shard participants (see Fig. 9).


Fig. 9. Average Distribution: 16 Shards, 20% Malicious Rate

Fig. 10. Average Distribution: 16 Shards, 30% Malicious Rate

In our simulation of mr = 0.3 we observe m concentration 14.4% higher than mr in s6(E1), s3(E5) and s11(E6), at 44.4%, and peak concentrations at h = 19 and m = 17 in s3(E6), at 47.2% (see Fig. 10).


Fig. 11. Average Distribution: 16 Shards, 40% Malicious Rate

Shard failure, where m > ½(n), was observed on 14 occasions in the deployments with mr = 0.4 across 9 epochs, peaking at 58.8% m concentration in s2(E1), s3(E3) and s1(E10). An equal distribution, where m = ½(n), was observed on 10 occasions across 7 epochs (see Fig. 11).

32 Shards: No specific malicious concentration was recorded at mr = 0.1, although from mr = 0.2 onwards outlier distributions represented a disproportionate percentage of individual shards. In our deployment at mr = 0.2 we observe peak m concentration to be 22.4% higher than mr in s17(E1), s26(E4) and s7(E8), at 44.4% (see Fig. 12). At mr = 0.3 we recorded 7 instances of shard failure across 4 epochs. This resulted in 25.5% higher concentration than mr, with m representing 55.5% of s10, s12, s32 in E8 and s1, s2, s3 in E9, peaking at 61.1% concentration in s19 in E7 (see Fig. 13).

Fig. 12. Average Distribution: 32 Shards, 20% Malicious Rate

Fig. 13. Average Distribution: 32 Shards, 30% Malicious Rate

At mr = 0.4 we observed 45 instances of shard failure across all epochs, peaking at 72.2% m concentration (32.2% higher than mr) in s15(E1), s30(E6) and s25(E7) (see Fig. 14).

Fig. 14. Average Distribution: 32 Shards, 40% Malicious Rate

Peak Concentrations. Figure 15 illustrates the peak concentrations of m across all s deployments and mr variations.


Fig. 15. Peak Malicious Concentrations

The following table (Table 3) details our observations of peak concentrations across each shard deployment.

Table 3. Peak concentrations across each shard deployment, where mr represents the malicious-rate variations and s represents the number of shards.

mr    s = 4    s = 8    s = 12   s = 16   s = 32
0.1   15.27%   18.05%   18.75%   25%      33.33%
0.2   25.69%   30.5%    33.33%   36.6%    44.4%
0.3   37.5%    40.27%   39.5%    47.2%    61.1%
0.4   46.6%    44.7%    47.9%    58.8%    72.2%

8 Conclusion

Through the research conducted in this study, we identified a lack of layer-one scaling solutions implemented on extended-UTXO state-modelled blockchains. We proposed S-EUTO, a novel proof-of-concept sharding protocol, which incorporates an extended-UTXO state ledger, a bias-resistant distributed randomness function, a smart contract framework, and a sharded network to achieve transaction parallelism. Despite slight sacrifices regarding the size of malicious adversaries the network can withstand, our findings illustrate a significant increase in scalability when sharding is implemented on an eUTXO blockchain. This was accomplished without sacrificing state properties such as traceability, anonymity and ownership, while data immutability was maintained. Shard allocation involved aspects of centralisation; however, the impact was managed by distributing elements of centralised trust equally throughout the network over various iterations, thereby minimising the impact of central authorities to a purely semantic observation. The security evaluation determined that S-EUTO is resistant to malicious node distribution across shards in the eUTxO-based model. Certain limitations were outlined, in that additional complexities associated with tracking state validity may cause higher computational overheads, although this was ultimately concluded to be comparatively insignificant.

References

1. Nakamoto, S.: Bitcoin whitepaper (2008)
2. Sallal, M.F.: Evaluation of Security and Performance of Clustering in the Bitcoin Network, with the Aim of Improving the Consistency of the Blockchain. PhD thesis, University of Portsmouth (2018)
3. Golosova, J., Romanovs, A.: The advantages and disadvantages of the blockchain technology. In: 2018 IEEE 6th Workshop on Advances in Information, Electronic and Electrical Engineering (AIEEE), pp. 1–6 (2018)
4. Sallal, M., et al.: VMV: augmenting an internet voting system with Selene verifiability. arXiv e-prints, arXiv-1912 (2019)
5. Sallal, M., Owenson, G., Adda, M.: Security and performance evaluation of master node protocol in the bitcoin peer-to-peer network. In: 2020 IEEE Symposium on Computers and Communications (ISCC), pp. 1–6. IEEE (2020)
6. Göbel, J., Krzesinski, A.: Increased block size and Bitcoin blockchain dynamics. In: 2017 27th International Telecommunication Networks and Applications Conference (ITNAC), pp. 1–6 (2017)
7. Worley, C., Skjellum, A.: Blockchain tradeoffs and challenges for current and emerging applications: generalization, fragmentation, sidechains, and scalability. In: 2018 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pp. 1582–1587 (2018)
8. Delgado-Segura, S., Pérez-Solà, C., Navarro-Arribas, G., Herrera-Joancomartí, J.: Analysis of the bitcoin UTXO set. In: International Conference on Financial Cryptography and Data Security, pp. 78–91 (2018)
9. Sallal, M., de Fréin, R., Malik, A., Aziz, B.: An empirical comparison of the security and performance characteristics of topology formation algorithms for bitcoin networks. Array 15, 100221 (2022)
10. Chakravarty, M.M.T., Chapman, J., MacKenzie, K., Melkonian, O., Jones, M.P., Wadler, P.: The extended UTXO model. In: International Conference on Financial Cryptography and Data Security, pp. 525–539 (2020)
11. Guan, Z., Wan, Z., Yang, Y., Zhou, Y., Huang, B.: BlockMaze: an efficient privacy-preserving account-model blockchain based on zk-SNARKs. IEEE Transactions on Dependable and Secure Computing (2020)
12. Tikhomirov, S., Voskresenskaya, E., Ivanitskiy, I., Takhaviev, R., Marchenko, E., Alexandrov, Y.: SmartCheck: static analysis of Ethereum smart contracts. In: Proceedings of the 1st International Workshop on Emerging Trends in Software Engineering for Blockchain, pp. 9–16 (2019)
13. Sallal, M., et al.: Augmenting an internet voting system with Selene verifiability using permissioned distributed ledger. In: 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS), pp. 1167–1168. IEEE (2020)
14. Hafid, A., Hafid, A.S., Samih, M.: A tractable probabilistic approach to analyze sybil attacks in sharding-based blockchain protocols. IEEE Transactions on Emerging Topics in Computing (2022)
15. Zamani, M., Movahedi, M., Raykova, M.: RapidChain: scaling blockchain via full sharding. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 931–948 (2018)
16. Kokoris-Kogias, E., Jovanovic, P., Gasser, L., Gailly, N., Syta, E., Ford, B.: OmniLedger: a secure, scale-out, decentralized ledger via sharding. In: 2018 IEEE Symposium on Security and Privacy (SP), pp. 583–598 (2018)
17. Luu, L., Narayanan, V., Zheng, C., Baweja, K., Gilbert, S., Saxena, P.: A secure sharding protocol for open blockchains. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 17–30 (2016)
18. Nguyen, Q.: Implementing OmniLedger sharding (2008)
19. Avarikioti, G., Kokoris-Kogias, E., Wattenhofer, R.: Divide and scale: formalization of distributed ledger sharding protocols (2019)
20. Al-Bassam, M., Sonnino, A., Bano, S., Hrycyszyn, D., Danezis, G.: Chainspace: a sharded smart contracts platform. arXiv preprint arXiv:1708.03778 (2017)
21. Al-Bassam, M., Sonnino, A., Bano, S., Hrycyszyn, D., Danezis, G.: Chainspace: a sharded smart contracts platform (2017)
22. Han, R., Yu, J., Lin, H., Chen, S., Esteves-Veríssimo, P.: On the security and performance of blockchain sharding. Cryptology ePrint Archive (2021)
23. Wang, J., Wang, H.: Monoxide: scale out blockchains with asynchronous consensus zones. In: 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19), pp. 95–112 (2019)
24. Schwarz-Schilling, C., Tse, D., et al.: The Zilliqa Project: A Secure, Scalable Blockchain Platform (2017)
25. Skidanov, A.: Limitations of Zilliqa's sharding approach. Near Protocol (2018)
26. Pomogalova, A.V., Martyniuk, A.A., Yesalov, K.E.: Key features and formation of transactions in the case of using UTXO, EUTXO and account based data storage models. In: 2022 International Conference on Modern Network Technologies (MoNeTec), pp. 1–7 (2022)
27. Ergo Developers: Ergo: A resilient platform for contractual money (2019)
28. Chepurnoy, A., Saxena, A.: On Contractual Money (2019)
29. Slesarenko, A.: ErgoTree Specification for Ergo Protocol 1.0 (2020)
30. Xie, J.: Cell Model: a generalized UTXO as state storage (2019)
31. Yang, I., et al.: The Nervos Network Positioning Paper. IEEE Network, pp. 166–173 (2019)
32. Nielson, S., Monson, C.: Asymmetric encryption: public/private keys. In: Practical Cryptography in Python, pp. 111–163 (2019)
33. Sallal, M., de Fréin, R., Malik, A.: PVPBC: privacy- and verifiability-preserving e-voting based on permissioned blockchain. Future Internet 15(4), 121 (2023)
34. Sallal, M., Owenson, G., Adda, M.: Bitcoin network measurements for simulation validation and parametrisation. In: 11th International Network Conference (2016)
35. Owenson, G., Adda, M., et al.: Proximity awareness approach to enhance propagation delay on the bitcoin peer-to-peer network. In: 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pp. 2411–2416. IEEE (2017)
36. Fadhil, M., Owenson, G., Adda, M.: A bitcoin model for evaluation of clustering to improve propagation delay in bitcoin network. In: 2016 IEEE Intl Conference on Computational Science and Engineering (CSE) and IEEE Intl Conference on Embedded and Ubiquitous Computing (EUC) and 15th Intl Symposium on Distributed Computing and Applications for Business Engineering (DCABES), pp. 468–475. IEEE (2016)
37. Sloderbeck, M., Andrus, M., Langston, J., Steurer, M.: High-speed digital interface for a real-time digital simulator. In: Proceedings of the 2010 Conference on Grand Challenges in Modeling & Simulation, pp. 399–405 (2010)
38. Fadhil, M., Owenson, G., Adda, M.: Locality based approach to improve propagation delay on the bitcoin peer-to-peer network. In: 2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), pp. 556–559. IEEE (2017)

Information Systems and Artificial Intelligence

Fundamental Frequency Removal PCA Method and SVM Approach Used for Structure Feature Distilling and Damage Diagnosis

Gang Jiang1,2, Yue Peng2, Yifan Huang3, Xing'an Hao1,2(B), Shi Yi2, Yuanming Lai2, Qian Wang2, Jie Jiang2, Chuanmei Hu2, Lanying Yang2, and Song Gao2

1 State Key Lab of Oil and Gas Reservoir Geology and Exploitation, Chengdu University of Technology, Chengdu 610059, Sichuan, China
[email protected]
2 College of Mechanical and Electrical Engineering, Chengdu University of Technology, Chengdu 610059, Sichuan, China
3 School of Marxism, Chengdu University of Technology, Chengdu 610059, Sichuan, China

Abstract. In Structure Nondestructive Testing (SNT), it is easy to obtain abundant vibration frequency features using advanced sensors and signal collectors. But too many features often cause algorithms to terminate unexpectedly because of computer memory overflow, or to become trapped in local optima because of the sparse nature of solutions in high-dimensional space. The classical Principal Component Analysis (PCA) algorithm can sort features in descending order but cannot directly select features for classification. Many "principal components" chosen by PCA lead to wrong decisions, because they are not always "principal features" that can correctly separate samples of different classes. Same or similar structures have correspondingly same or similar frequencies and amplitudes; these are called "fundamental frequencies". This paper proposes a new feature distilling method named "Fundamental Frequency Removal Principal Component Analysis (FFR-PCA)": fundamental frequencies are removed so as to substantially reduce feature size and successfully improve calculation speed and accuracy. A Support Vector Machine (SVM) was used in pattern recognition experiments and showed superior performance on engineering structure damage diagnosis. The proposed method is scalable and can be extended to fault diagnosis of an entire bridge engineering structure.

Keywords: Structure Nondestructive Testing · Fundamental Frequency Removal Principal Component Analysis (FFR-PCA) · PROE Modeling · ADAMS Simulation · Support Vector Machine (SVM)

1 Introduction

Efficient and effective damage identification is very important in bridge engineering structure safety and reliability monitoring. There have been many famous bridge collapse disasters around the world. The I-35W Bridge over the Mississippi River collapsed in 2007 [1]. In 1916, the Quebec Bridge in Canada fell down. The Kanauga Bridge in Ohio, a famous bridge in the USA, collapsed in 1967. About thirty years later, the SANGSU Bridge in Seoul, Korea, collapsed without any warning. After that, the JIUJIANG Bridge in China collapsed in 2007. During a bridge structure's long service life, loosening of connection joints and beam fractures are inevitable. In order to estimate structures' health status, many researchers have experimented with vibration techniques, ultrasonic testing, magnetic detection, and so on. S. K. Chakrapani, V. Dayal and D. Barnard [2] used air-coupled ultrasound in detection experiments and obtained features of the response signals. A. A. Carvalho, R. R. Silva and J. M. A. Rebello [3] implemented a series of welding pattern recognition experiments based on magnetic techniques. The most widely used methods in the SNT research area are vibration-based approaches. S.H. Sung, H.J. Jung and H.Y. Jung introduced their work in paper [4], in which they collected vibration signals from a beam. Reza V. Farahani and Dayakar Penumadu [5] analyzed bridge structure vibration time-sequence signals and identified health status. Other vibration-based research can be found in the work of F. Magalhaes, A. Cunha, E. Caetano [6], Rajendra Machavaram, K. Shankar [7], Rupika P. Bandara, Tommy H.T. Chan, David P. Thambiratnam [8], and so on. However, vibration signals usually contain rich features. When we use their time-domain or frequency-domain features in pattern recognition experiments, it is easy to become trapped in local optima because of the sparse nature of solutions in high-dimensional space. A more serious consequence is that too many features, organized as huge matrices in an algorithm, will cause the computation to terminate unexpectedly because of memory overflow. Some scholars have studied Principal Component Analysis (PCA). It is well known that PCA is an unsupervised approach, widely utilized for complex system fault diagnosis and multivariate analysis. The works of I.M. Johnstone and A.Y. Lu [9], J. Lei and V.Q. Vu [10], and D. Kim and Yazhen Wang [11] discussed the sparse nature of solutions in high-dimensional space and highlighted the importance of PCA in multivariate systems. R. P. Bandara, T. H. T. Chan and D. P. Thambiratnam [12] studied structures' frequency response signals and carried out health-status pattern recognition experiments. Y. Tharrault, G. Mourot and J. Ragot [13] proposed a robust PCA technique and implemented it in structure fault diagnosis testing. Chiang L. H. and Colegrove L. F. [14] analyzed an online engineering control system and adopted a PCA approach to distil main factors from multivariate datasets. Aimed at the development situation of the steel industry, Kano M. and Nakagawa Y. [15] used PCA to analyze the relationships and main factors among manufacturing monitoring, feedback control and quality promotion. We found that classical PCA can sort features in descending order according to contribution rates, but cannot directly output features for classification. It cannot reduce sample or feature size by itself automatically: the output has as many dimensions as the input. Researchers must choose the features they need by themselves, mostly relying on the cumulative contribution rate. Many "principal components" often lead to faulty decisions because they are almost the same features among different classes in a dataset. "Non-principal components" are discarded directly because of their poor contribution rate to the signals, even if they are in fact the most important factors for pattern recognition.


This paper proposes an improved PCA method named Fundamental Frequency Removed PCA (FFR-PCA) and tries to find a way to separate "principal features", rather than "principal components", from huge datasets. The authors aim at distilling effective features from structural vibration signals so as to reduce sample size, save computational cost, and improve classification accuracy. The main contents of this paper are organized as follows: (1) Build a real experimental object and 3D simulation models according to a bridge structure, and carry out actual vibration tests and virtual experiments. (2) Develop the improved FFR-PCA algorithm, which discards useless features from structural vibration signals so as to generate genuinely worthwhile features for fault diagnosis. (3) Use a Support Vector Machine (SVM) for pattern recognition experiments.

2 Fundamental Frequency Removal Principal Component Analysis (FFR-PCA)

Given an n-by-m dataset matrix X:

X = ( x_ij )_{n×m} = [x₁; …; xᵢ; …; xₙ] = [v₁ … vⱼ … vₘ]   (1)

This n-by-m matrix contains information from n experimental samples, with m features per sample. Each row vector xᵢ represents the features of one sample; likewise, each column vector vⱼ represents one feature over the whole dataset. Since different features have different magnitudes and scales, the original dataset must be normalized before applying any analysis: each variable is re-scaled to zero mean and unit variance by modifying each feature vⱼ as follows:

μ_vj = (1/n) Σ_{i=1}^{n} x_ij   (2)

σ²_vj = (1/(n−1)) Σ_{i=1}^{n} (x_ij − μ_vj)²   (3)

x_ij = (x_ij − μ_vj) / σ_vj   (4)

where μ_vj and σ²_vj are the mean and the variance, and x_ij is the re-scaled sample. The covariance matrix is defined as:

C_X = (1/(n−1)) XᵀX = (1/(n−1)) ( vⱼᵀvₖ )_{m×m}   (5)


It is an m-by-m square symmetric matrix, which measures the degree of linear relationship among all features of the dataset. The diagonal terms are the variances of the corresponding variables:

σ²_vj = (1/(n−1)) vⱼᵀvⱼ = (1/(n−1)) Σ_{i=1}^{n} x²_ij   (6)

The off-diagonal terms are the covariances between pairs of variables:

σ²_{vj,vk} = (1/(n−1)) vⱼᵀvₖ = (1/(n−1)) Σ_{i=1}^{n} x_ij x_ik   (7)

Given an m-by-m linear transformation matrix P, the original data matrix X can be transformed into

Y = PX   (8)

To minimize redundancy, we aim at finding a transformation matrix P such that the covariance matrix of the new data matrix Y is diagonal. With

C_X = (1/n) XXᵀ   (9)

C_Y = (1/n) YYᵀ   (10)

and substituting (8) into (10), the relationship between C_X and C_Y can be written as

C_Y = (1/n) YYᵀ = (1/n)(PX)(PX)ᵀ = (1/n) PXXᵀPᵀ = P((1/n) XXᵀ)Pᵀ = P C_X Pᵀ   (11)

Since C_X is a symmetric matrix, it can be diagonalized by eigenvalue decomposition. Writing C_X = EᵀDE, where D is a diagonal matrix of the eigenvalues of C_X and the rows of the orthogonal matrix E are the corresponding eigenvectors, and choosing P ≡ E, we obtain

C_Y = P C_X Pᵀ = E(EᵀDE)Eᵀ = (EEᵀ)D(EEᵀ) = D   (12)

P is a linear transformation matrix as well as an orthogonal matrix, and each row of P is an eigenvector of C_X. Generally, the eigenvectors are sorted in descending order of their eigenvalues, and the principal component matrix of X can then be solved from P. In practice, the PCA algorithm is performed through the following steps:

Step 1: Organize the dataset as an n-by-m matrix, where m is the number of features and n is the number of samples.
Step 2: Normalize the data to zero mean and unit variance.
Step 3: Calculate the eigenvectors and eigenvalues of the covariance matrix.
Step 4: Select the leading eigenvectors of the characteristic matrix as principal components according to their cumulative contribution rate.


Step 5: Transform the original data by means of the principal components.

Classical PCA can sort features in descending order, but it cannot directly select the features needed for classification by itself. Many "principal components" chosen by PCA lead to faulty results because they are almost identical across the different classes in a dataset. Consider a set of signals defined as

f(t) = A₁ sin(2πω₁t) + A₂ sin(2πω₂t) + A₃ cos(2πω₃t) + A₄ sin(2πω₄t)   (13)

with amplitude and frequency matrices

A = [A₁, A₂, A₃, A₄] = [ 5 4 3 0.5 ; 5 4 3 1.0 ; 5 4 3 1.5 ]

ω = [ω₁, ω₂, ω₃, ω₄] = [ 80 180 320 450 ; 80 180 320 450 ; 80 180 320 450 ]

Their time-domain waveforms and the frequency results calculated by the FFT algorithm are shown in Fig. 1. The frequency feature vector of each sample is defined as

AF = [AF_i, F_i], i = 1, 2, …, 500   (14)

It is easy to see in the frequency-domain spectra that the principal frequencies of the signals, F = [80 Hz, 180 Hz, 320 Hz, 450 Hz], are extracted successfully by the FFT algorithm. Based on the amplitudes of these four frequencies, the classical PCA result is obvious: the first three, that is, 80 Hz, 180 Hz, and 320 Hz, will be chosen as "principal components" by the classical PCA algorithm because their cumulative contribution rate is over 98%.

Fig. 1. Time domain waveform and frequency domain spectrum (time-domain waveforms and power spectral densities of f1(t), f2(t), and f3(t))
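The selection behavior described above can be reproduced numerically. The following sketch is our own illustration (the sampling rate and window length are assumptions, not taken from the paper): it builds the three signals of Eq. (13), computes their FFT amplitudes, and shows that the three shared components carry roughly 96-99% of the spectral energy, which is why a cumulative-contribution-rate criterion keeps them and drops the 450 Hz line.

```python
# Our own numerical illustration of the toy example in Eq. (13): the shared
# 80/180/320 Hz components dominate the spectral energy, so selection by
# cumulative contribution rate keeps them and discards the discriminative
# 450 Hz line. The sampling settings are assumptions, not from the paper.
import numpy as np

fs, T = 2000, 0.5
t = np.arange(0, T, 1 / fs)
A = np.array([[5, 4, 3, 0.5],
              [5, 4, 3, 1.0],
              [5, 4, 3, 1.5]])        # amplitude matrix of Eq. (13)
w = np.array([80, 180, 320, 450])     # frequencies (Hz), shared by all signals

for a in A:
    f = (a[0] * np.sin(2 * np.pi * w[0] * t) + a[1] * np.sin(2 * np.pi * w[1] * t)
         + a[2] * np.cos(2 * np.pi * w[2] * t) + a[3] * np.sin(2 * np.pi * w[3] * t))
    amp = 2 * np.abs(np.fft.rfft(f)) / t.size     # single-sided FFT amplitudes
    energy = amp ** 2
    top3 = np.sort(energy)[::-1][:3].sum() / energy.sum()
    print(f"top-3 components carry {100 * top3:.1f}% of the energy")
```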

Most values of AF = [AF_i, F_i] are equal or very similar, because they are generated by similar models. We call these frequencies "Fundamental Frequencies". The only difference between the signals lies in the fourth frequency, F = 450 Hz, as Fig. 2 shows.


Fig. 2. Detailed features in frequency domain

Without using the fourth frequency, 450 Hz, we cannot separate the signals f(t) by the three "principal components", because those components are exactly equal. The fourth frequency plays the key role, but under the classical PCA algorithm it is not a "principal component". The PCA algorithm is not a panacea: it leads to faulty decisions if we take it for granted that a "principal component" is a "principal feature". "Principal components", which make up the main part of a signal, are not always the "principal features" that can separate one signal from another. Given a transform T that removes these "Fundamental Frequencies" and amplifies the effect of the frequency F = 450 Hz, AF can be transformed into another space:

AF →(T) AF′   (15)

Using AF′ as the input of the PCA algorithm, the frequency F = 450 Hz will be chosen as a "principal component", while the "Fundamental Frequencies" are discarded because they are useless for classification. With a large number of "Fundamental Frequencies" removed from the sample datasets, the computation of the algorithm is greatly reduced, together with a substantial increase in accuracy.
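As a toy illustration of the transform T in Eq. (15), the sketch below drops amplitude columns that show no spread across the classes, keeping only the discriminative 450 Hz feature; the zero-spread threshold is our own assumption for illustration, not a rule stated by the paper.

```python
# A toy illustration of the transform T in Eq. (15): drop feature columns with
# no spread across the classes ("Fundamental Frequencies"), keeping only the
# discriminative 450 Hz amplitude. The threshold is an illustrative assumption.
import numpy as np

AF = np.array([[5.0, 4.0, 3.0, 0.5],     # amplitudes at 80/180/320/450 Hz
               [5.0, 4.0, 3.0, 1.0],
               [5.0, 4.0, 3.0, 1.5]])
fundamental = np.ptp(AF, axis=0) < 1e-9  # columns identical across classes
AF_t = AF[:, ~fundamental]               # AF -> AF' of Eq. (15)
print(AF_t.ravel())                      # only the 450 Hz column survives
```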

3 Pattern Recognition Method

The most important concept of the Support Vector Machine (SVM) is the hyperplane; SVM is a useful method for 2-class classification. Given a hyperplane defined as [16]

(w · x) + b = 0, w ∈ R^N, b ∈ R   (16)

a decision function can be built as [17]

f(x) = sgn[(w · x) + b]   (17)

Maximization of the distance between the two decision planes is formulated as [17]

min T(w) = min (1/2)‖w‖², s.t. yᵢ · [(w · xᵢ) + b] ≥ 1, i = 1, 2, …, n   (18)


The Lagrange formulation is

L(w, b, α) = (1/2)‖w‖² − Σ_{i=1}^{n} αᵢ { yᵢ · [(xᵢ · w) + b] − 1 }   (19)

Based on saddle point theory, the derivatives of L must vanish:

∂L(α, b, w)/∂w = w − Σ_{i=1}^{n} yᵢαᵢxᵢ = 0, ∂L(α, b, w)/∂b = Σ_{i=1}^{n} yᵢαᵢ = 0   (20)

which leads to

w = Σ_{i=1}^{n} yᵢαᵢxᵢ, Σ_{i=1}^{n} yᵢαᵢ = 0   (21)

The solution vector has an expansion in terms of a subset of the training patterns. Those patterns whose Lagrange multipliers are non-zero are called Support Vectors (SVs). By the Karush-Kuhn-Tucker complementarity conditions (the KKT conditions can be found in the works of Vapnik [16] and Cristianini [17]),

αᵢ · { yᵢ · [(xᵢ · w) + b] − 1 } = 0, i = 1, 2, …, n   (22)

Support Vectors lie on the margin. All remaining examples of the training dataset are irrelevant; they play no role in the optimization process. Based on Wolfe dual optimization theory [16], we seek multipliers that satisfy

max W(α) = max [ Σ_{i=1}^{n} αᵢ − (1/2) Σ_{i,j=1}^{n} αᵢαⱼyᵢyⱼ (xᵢ · xⱼ) ]
s.t. Σ_{i=1}^{n} αᵢyᵢ = 0, αᵢ ≥ 0, i = 1, 2, …, n   (23)

After the parameter b is calculated from the above formulation, the hyperplane decision function can be written as [17]

f(x) = sgn[ Σ_{i=1}^{n} αᵢyᵢ · (x · xᵢ) + b ]   (24)


To solve nonlinear problems, a transform function φ should be defined so that the input space can be mapped into a feature space, whose dimension is generally higher than that of the original space. A φ satisfying the Mercer condition is usually called a kernel function K(x, xᵢ), which can substitute for (x · xᵢ). The sigmoid kernel, radial basis function kernel, linear kernel, and polynomial kernel are typical and useful kernel functions, defined as follows [16]:

K(x, xᵢ) = tanh(v⟨x · xᵢ⟩ + c)   (25)

K(x, xᵢ) = exp(−‖x − xᵢ‖²/σ²)   (26)

K(x, xᵢ) = ⟨x · xᵢ⟩   (27)

K(x, xᵢ) = (⟨x · xᵢ⟩ + 1)^d   (28)
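For reference, the following hedged sketch shows how a 4-class damage classifier with the kernels of Eqs. (25)-(28) could be set up in scikit-learn; the random feature matrix is a placeholder, not the paper's data, and the authors' own implementation is in MATLAB (Sect. 4).

```python
# A hedged sketch of a 4-class damage classifier using the kernels of
# Eqs. (25)-(28), written with scikit-learn; the feature matrix below is a
# random placeholder, not the paper's dataset.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((40, 3))            # placeholder for the 3 selected features
y = np.repeat([1, 2, 3, 4], 10)    # labels: health / loose / wound / compound

for kernel in ("sigmoid", "rbf", "linear", "poly"):   # Eqs. (25)-(28)
    clf = SVC(kernel=kernel, C=1.0).fit(X, y)
    print(kernel, "support vectors:", int(clf.n_support_.sum()))
```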

Some scholars have adopted SVM to monitor structural health status, such as the online monitoring work of Villegas, Li, and Yu [18], the damage self-diagnosis work of Xiaoma, Baoli, and Qingzhen [19], the structural damage identification of Rivera-Castillo, Rivas-López, and Nieto-Hipolito [20], and the pattern recognition analysis of Zhang, Yao, and Tian [21]. Their research is meaningful, but it has mainly been theoretical simulation analysis under lab conditions; few reports related to real huge-size engineering structures can be found in recent decades.

4 Experiments and Discussion

The FOUR-RIVER Bridge is the highest bridge in the world so far. We used its structure as our research object and tried to implement our method under actual lab conditions as well as in virtual simulations in the PROE and ADAMS software. A three-dimensional PROE model (3D model) was designed based on 119 engineering drawings, as Fig. 3 shows. Some of the engineering structure parameters we used are listed in Table 1.

Fig. 3. FOUR-RIVER Bridge in China and our 3D model designed by PROE software

It is well known that any destructive testing on a real bridge structure, such as damaging a beam or loosening bolts so as to make the bridge fall down, is strictly prohibited.


Table 1. Some engineering drawings' parameters
(columns: Point | X Coordinate | Y Coordinate | Tangent | Cosine | Vertical Component | Y Coordinate in CAD | Boom Length | X Coordinate in CAD; "n/a" = value not recoverable from the source)

 1  −41.00  8.20  −0.40  0.93  204.64  8.20      n/a    0.00
 2  −36.00  6.32  −0.35  0.94  201.38  6.32  8400.57    5.00
 3  −33.00  5.31  −0.32  0.95  199.60  5.31  7392.59    8.00
 4  −30.00  4.39  −0.29  0.96  197.97  4.39  6472.27   11.00
 5  −27.00  3.56  −0.26  0.97  196.48  3.56  5639.62   14.00
 6  −24.00  2.81  −0.23  0.97  195.14  2.81  4894.62   17.00
 7  −21.00  2.15  −0.20  0.98  193.95  2.15  4237.27   20.00
 8  −18.00  1.58  −0.18  0.98  192.91  1.58  3667.58   23.00
 9  −15.00  1.10  −0.15  0.99  192.02  1.10  3185.54   26.00
11  −12.00  0.70  −0.12  0.99  191.30  0.70  2791.14   29.00
12   −9.00  0.40  −0.09  1.00  190.73  0.40  2484.39   32.00
13   −6.00  0.18  −0.06  1.00  190.33  0.18  2265.28   35.00
14   −3.00  0.04  −0.03  1.00  190.08  0.04  2133.82   38.00
15    0.00  0.00   0.00  1.00  190.00  0.00  2090.00   41.00
16    3.00  0.04   0.03  1.00  190.08  0.04  2133.82   44.00
17    6.00  0.18   0.06  1.00  190.33  0.18  2265.28   47.00
18    9.00  0.40   0.09  1.00  190.73  0.40  2484.39   50.00
19   12.00  0.70   0.12  0.99  191.30  0.70  2791.14   53.00
20   15.00  1.10   0.15  0.99  192.02  1.10  3185.54   56.00
21   18.00  1.58   0.18  0.98  192.91  1.58  3667.58   59.00
22   21.00  2.15   0.20  0.98  193.95  2.15  4237.27   62.00
23   24.00  2.81   0.23  0.97  195.14  2.81  4894.62   65.00
24   27.00  3.56   0.26  0.97  196.48  3.56  5639.62   68.00
25   30.00  4.39   0.29  0.96  197.97  4.39  6472.27   71.00
26   33.00  5.31   0.32  0.95  199.60  5.31  7392.59   74.00
27   36.00  6.32   0.35  0.94  201.38  6.32  8400.57   77.00
28   41.00  8.20   0.40  0.93  204.64  8.20      n/a   82.00

So it is very difficult to obtain enough effective "negative samples" for machine learning. Without enough "negative samples", any algorithm based on datasets consisting entirely of "positive samples" is meaningless and ineffective. Existing research has mainly been implemented on objects under lab conditions or in pure virtual simulation. We know that there is a great "gap" between an "actual structure" and a "virtual model". The most important thing is to fill this "gap" and find the relationship between them, as we do in the following steps.

4.1 Actual Vibration Testing

We made a structural test object and an experiment system, as Fig. 4 shows. In our testing system, three accelerometers (model SD14N14) were used to convert vibration into numerical signals, and an ECON PREMAX 1000 instrument was used as the signal collector.


Fig. 4. Actual experiment objective and system

A stainless steel ball was used to generate the hit force: a pulse force is produced when the ball rolls down a pipe slope. The hit force can be calculated from the following formulas:

(1/2) m v² = m g h   (29)

where m is the steel ball's mass (m = 3 kg), v is the ball's velocity, g is the acceleration of gravity, and h is the drop height;

F · dt = m v   (30)

where F is the hit force and dt is the contact time of the ball hitting the object (in our experiment, dt = 0.25 s); hence

h = (1/(2g)) (F · dt / m)²   (31)

We know that a real structure will be damaged if it is hit by too large a force. In our experiment, the limit is a 20 N force, that is, F = 20 N (for these values, Eq. (31) gives h ≈ 0.14 m). Four kinds of tests were implemented as follows, and a 4-by-10 dataset was built after repeating the experiment 10 times for each kind:

(1) Health (saved as samples No. 1-10 in the dataset, labeled "1")
(2) Screws loose (samples No. 11-20, labeled "2")
(3) Wound status (samples No. 21-30, labeled "3")
(4) Compound damage (samples No. 31-40, labeled "4")

The signals collected by the ECON PREMAX instrument are plotted in Fig. 5.

4.2 PROE and ADAMS Simulation Testing

We built a structure model in PROE software using the geometric parameters of the actual experimental object shown in Fig. 4, imported it into ADAMS software, and made all parts


flexible using the parameters listed in Table 2. The flexible-body parameters should be set carefully by considering the material's engineering performance and the structure's characteristics: inappropriate parameters will lead to false vibration output that cannot match the actual structure's vibration, and in the worst case the conversion of structure parts into flexible bodies fails altogether.

Table 2. Transform parameters of flexible body

Parameter     Pole1        Pole1 (wound)  Pole2        Beam1        Beam2
Material      Stainless    Stainless      Stainless    Stainless    Stainless
Mass (kg)     0.01374      0.01367        0.01374      0.3931       1.1078
Node count    370          370            370          622          3453
Mode count    24           30             24           24           48
Elem. type    Solid        Solid          Solid        Solid        Solid
Elem. shape   Tetrahedral  Tetrahedral    Tetrahedral  Tetrahedral  Tetrahedral
Elem. order   Linear       Linear         Linear       Linear       Quadratic
Edge shape    Straight     Straight       Straight     Straight     Mixed
Elem. size    20 mm        20 mm          20 mm        20 mm        10 mm
Min size      5 mm         5 mm           5 mm         5 mm         0.2 mm
Growth rate   1.5          1.5            1.5          1.5          1.5
Ang./elem     45           45             45           45           45
Shell thick   1.0 mm       1.0 mm         1.0 mm       1.0 mm       1.0 mm

After building the relationship between the actual structure and the virtual three-dimensional model (3D model), the parameters and outputs can finally be checked and matched; that is to say, our method can be generalized to bigger structural parts and even applied to a whole real bridge. The flexible model in ADAMS is shown in Fig. 6. A virtual force is applied at the "Hit Position" in Fig. 6:

F = step(time, 0, 20, 0.25, 0)   (32)

This is a force input formula in ADAMS: "step" denotes a pulse force, "20" is the force peak value, and "0.25" is the total time the force acts on the object. By adjusting the bolt status of joint 1 and joint 2 and selecting the health status of pole1, the actual structure experiments can be simulated under four conditions: health, loose, wound, and compound (loose + wound). In each kind of experiment, the acceleration of Joint 1 is calculated and output as data files. The time-domain signal waves and frequency features can be calculated and plotted, as Fig. 7 shows; they are similar to the real signals of the real engineering structure in Fig. 5. Using the following formula, we can calculate the relative error between the actual experiment outputs and the simulation outputs of all tests:

E = |P − P′| / P × 100%   (33)


Fig. 5. Actual structure’s signal in time domain and frequency domain of No.16 sample (The second class: Loose status)

where P is the actual FFT value, P′ is the ADAMS model's FFT value, and E is the relative error of each FFT dot. The error distribution results are listed in Table 3; the simulation model's vibration matches the actual structure's vibration very well.
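A small sketch of the relative-error check in Eq. (33); both spectra below are placeholders rather than the measured/ADAMS data.

```python
# A sketch of the per-dot relative error of Eq. (33); the two spectra are
# synthetic placeholders, not the experimental data.
import numpy as np

rng = np.random.default_rng(1)
P = np.abs(np.fft.rfft(rng.standard_normal(1024))) + 1e-9  # "actual" FFT values
P_sim = P * (1 + 0.03 * rng.standard_normal(P.size))       # perturbed "model" FFT

E = np.abs(P - P_sim) / P * 100        # Eq. (33), percent error per FFT dot
print((E > 5).sum(), "dots exceed 5% error;", (E > 10).sum(), "exceed 10%")
```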


Fig. 6. ADAMS model transformed from rigid body into flexible body in ADAMS software

Fig. 7. Signal wave in time and frequency domain of ADAMS model (2nd pattern: Loose)

4.3 FFR-PCA Feature Distilling

Applying the FFT to each signal generated 90000 frequency features (30000 each in the x, y, and z axes). We ran 10 tests for each of the four kinds of structure status (health, loose, wounded, and compound damage), and finally obtained a 40-by-90000 data matrix. This matrix was too big for matrix calculation and transformation: the computer frequently halted during the calculation process because of memory overflow. After careful analysis, we found that many frequency features in the 40-by-90000 matrix were equal or very similar because the four patterns share the same basic structure. These equal or similar frequencies, which we call "Fundamental Frequencies", lead to false results in later pattern recognition experiments.

Table 3. Error distribution analysis (40 samples of dataset, 4 classes)

Class  Damage Status                  Total Dots  Error>5%: Number  Percent(%)  Error>10%: Number  Percent(%)
1      Health (10 samples)            900000      160               0.0178      31                 0.0034
2      Bolts Loose (10 samples)       900000      267               0.0297      42                 0.0047
3      Pole Wound (10 samples)        900000      2420              0.2689      701                0.0779
4      Compound damage (10 samples)   900000      1396              0.1551      481                0.0534

At the same time, the heavy calculation burden of huge-size matrix transforms also causes memory overflow. Removing these "Fundamental Frequencies" obviously increases both the calculation speed and the pattern recognition prediction accuracy. Define a discrimination function of frequency between two samples as

Δ_{m,n}(i) = |f_m(i) − f_n(i)| / max_i |f_m(i) − f_n(i)|, m = 1, …, 40, n = 1, …, 40, i = 1, …, 90000   (34)

Setting a trust factor δ = 0.05 and a counter k = 0, we increment the counter by 1 whenever Δ_{m,n}(i) > δ; the total dot count of each sample is then given by k. The values of k for all samples against Class 1 (health status) are listed in Table 4. Define the discrimination percent value as

D(m) = 100 × k / 90000, m = 1, 2, …, 40   (35)

The discrimination percent figures of each class of the dataset are plotted in Fig. 8. The frequency discrimination value between two samples is almost the same within a class, with an obvious difference among different classes. This means that if we can distill the "special frequency features" by removing the "Fundamental Frequencies" of each class, pattern recognition among the four classes can be realized. Setting the trust factor to δ = 0.95, features with Δ_{m,n}(i) < δ are called "Fundamental Frequencies" and are removed before the later SVM experiments, while features with Δ_{m,n}(i) > δ are selected and used in the PCA algorithm. In our experiments, 7650 frequency values were distilled from the 90000 original frequencies, a reduction of 91.5%. This means the time cost of the subsequent PCA is compressed directly to about 8.5%, without any computer memory overflow. These advantages are very important in real engineering applications. Figure 9 shows the distilled features of some samples from the four classes obtained by the method introduced above.
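The selection rule of Eqs. (34)-(35) can be sketched as follows. The paper does not state how the pairwise decisions are merged across all sample pairs, so the union rule below, like the placeholder data, is our own assumption.

```python
# A sketch of the FFR selection rule in Eqs. (34)-(35). How the pairwise
# decisions are merged is not spelled out in the paper; the union rule below
# is our own assumption, and the data matrix is a small random placeholder.
import numpy as np

def ffr_select(F, delta=0.95):
    """Keep frequencies whose normalized pairwise difference exceeds delta."""
    n = F.shape[0]
    keep = np.zeros(F.shape[1], dtype=bool)
    for m in range(n):
        for p in range(m + 1, n):
            d = np.abs(F[m] - F[p])
            d = d / (d.max() + 1e-12)   # Eq. (34)
            keep |= d > delta           # mark discriminative frequencies
    return keep

F = np.random.rand(40, 9000)            # placeholder (the paper: 40 x 90000)
mask = ffr_select(F)
print("kept", int(mask.sum()), "of", mask.size,
      f"frequencies ({100 * mask.mean():.1f}%)")  # the paper keeps 7650/90000
```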


Table 4. Discrimination of Class 1 in whole dataset (Trust factor = 0.05)
(each entry is the counter value k between the row sample and Class 1 (Health) samples 1-10; total dots per sample = 90000)

Class 1: Health
 1:     0    30    36    25    36    37    38    34    27    33
 2:    30     0    32    31    26    33    35    23    28    30
 3:    36    32     0    36    38    43    42    33    28    19
 4:    25    31    36     0    27    29    37    35    38    34
 5:    36    26    38    27     0    32    40    34    33    30
 6:    37    33    43    29    32     0    42    41    31    32
 7:    38    35    42    37    40    42     0    40    41    32
 8:    34    23    33    35    34    41    40     0    37    30
 9:    27    28    28    38    33    31    41    37     0    29
10:    33    30    19    34    30    32    32    30    29     0

Class 2: Loose
11: 79831 79837 79851 79826 79838 79824 79847 79832 79819 79839
12: 79826 79852 79864 79839 79840 79845 79848 79820 79844 79839
13: 79827 79835 79830 79809 79820 79822 79826 79817 79823 79832
14: 79824 79842 79849 79838 79843 79826 79849 79838 79851 79847
15: 79850 79857 79855 79829 79851 79837 79845 79847 79854 79852
16: 79843 79858 79866 79850 79848 79845 79853 79849 79855 79844
17: 79823 79844 79853 79830 79844 79825 79833 79831 79839 79845
18: 79839 79844 79840 79831 79827 79826 79844 79822 79841 79835
19: 79847 79854 79838 79823 79848 79841 79846 79825 79843 79836
20: 79833 79847 79855 79822 79829 79840 79856 79828 79850 79850

Class 3: Wounded
21: 89261 89267 89271 89269 89265 89262 89266 89278 89275 89272
22: 89263 89270 89269 89267 89266 89265 89265 89273 89270 89271
23: 89264 89269 89272 89269 89265 89265 89265 89279 89271 89273
24: 89266 89271 89271 89270 89264 89259 89262 89277 89271 89275
25: 89261 89275 89269 89270 89261 89267 89266 89272 89277 89275
26: 89262 89272 89271 89271 89263 89260 89261 89277 89274 89272
27: 89265 89272 89269 89271 89267 89265 89264 89273 89269 89274
28: 89265 89274 89270 89270 89266 89260 89265 89277 89272 89272
29: 89265 89271 89266 89270 89264 89265 89265 89278 89274 89274
30: 89264 89268 89271 89271 89261 89263 89266 89277 89270 89272

Class 4: Compound
31: 89270 89269 89268 89266 89266 89265 89269 89268 89260 89269
32: 89270 89266 89266 89260 89269 89268 89266 89265 89261 89269
33: 89267 89270 89267 89267 89269 89264 89266 89267 89259 89268
34: 89264 89269 89266 89263 89269 89267 89265 89271 89256 89271
35: 89267 89271 89268 89262 89264 89266 89267 89267 89258 89273
36: 89267 89267 89265 89263 89267 89263 89265 89266 89259 89268
37: 89266 89271 89267 89259 89268 89269 89265 89269 89260 89268
38: 89265 89269 89265 89262 89268 89269 89265 89268 89259 89272
39: 89263 89265 89267 89262 89269 89267 89268 89268 89263 89271
40: 89268 89268 89270 89261 89271 89266 89264 89271 89260 89268

Fig. 8. Discrimination map of each class in whole dataset (discrimination percentage Dis.(%) versus sample No. for Classes 1-4)

The dataset was organized as a 40-by-90000 matrix, as formula (1) in Sect. 2 shows, where m is the number of features and n is the number of samples. We used formulas (2) and (3) to calculate the mean and variance of the samples, calculated the covariance matrix by formulas (5)-(7) and (11), and finally computed the eigenvectors and eigenvalues of the covariance matrix. The whole process was programmed by us in MATLAB (version 2014a). Taking the 40-sample dataset as the input "X" of our program, the frequency features of each sample, organized as a row of "X", were finally mapped into a 39-dimensional space.

Fig. 9. FFR results of four classes (No. 3, 15, 22, 37 samples; total dots = 7650)

We obtain a 40-by-39 transform matrix calculated from the "X" matrix, which we call the "pc" matrix. Each column of the "pc" matrix corresponds to one principal component. The eigenvalue vector, defined as "T" in the program, stores


the variances of the principal components. The "pc" matrix is a coefficient matrix with orthonormal components, meaning that the dimensions are mutually uncorrelated. The eigenvectors in the "pc" matrix are automatically sorted in descending order of the corresponding eigenvalues of the covariance matrix C_X defined by formulas (9)-(12). Principal components are chosen according to the eigenvalues' cumulative contribution rate, defined as

τ_k = 100 × Σ_{i=1}^{k} T(i) / Σ_{i=1}^{39} T(i)   (36)

Setting the cumulative contribution rate threshold to τ_k = 99.5% means that the selected components contain 99.5% of the features of the original samples. The calculation of τ_k showed that the first three components of the transform matrix "pc" satisfy τ_k = 99.5%. They were selected as the "principal components" in our program and used for the pattern recognition experiments based on the Support Vector Machine in the next section.

4.4 Pattern Recognition Based on SVM

Three contrast experiments were implemented to find the advantages and disadvantages of the method:

(1) Direct SVM: the 40-by-90000 vibration-signal frequency-feature matrix, gathered by the ECON PREMAX 1000 signal collector, was used as the SVM input directly.
(2) PCA_SVM: the 40-by-90000 vibration-signal frequency features were first input into the PCA algorithm; the PCA results were then input into the SVM.
(3) FFR_PCA_SVM: the fundamental frequencies were removed in the first step; the output matrix, of size about 40-by-7650, was used for PCA, and the PCA outputs were the SVM input in the final step.

Normalizing the SVM input data X is essential. A normalization equation is defined as follows:

X_NE = (v_max − v_min) × (x − x_min) / (x_max − x_min) + v_min   (37)
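A minimal sketch of the scaling in Eq. (37), with the [−1, 1] setting discussed next:

```python
# A minimal sketch of the min-max scaling in Eq. (37).
import numpy as np

def normalize(x, v_min=-1.0, v_max=1.0):
    x = np.asarray(x, dtype=float)
    return (v_max - v_min) * (x - x.min()) / (x.max() - x.min()) + v_min

print(normalize([2.0, 5.0, 11.0]))  # -> [-1.  -0.333...  1.]
```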

Generally speaking, normalizing all input data into [−1, 1] is a good choice, so the parameters in formula (37) are set as v_max = 1, v_min = −1. The contrast experiments were performed on a computer with 2 GB DDR RAM, MATLAB R2014a, and a 32-bit Windows system. The SVM classification algorithm introduced in Sect. 3 was programmed in the MATLAB language. The normalized samples (row vectors of the input matrix X) were selected as the prediction samples in turn, with the other samples used as training data. In the training procedure, a sigmoid kernel function, as defined in formula (25), was selected in our algorithm. The parameter settings are listed in Table 5, and the training and prediction outputs of the three contrast experiments are listed in Table 6.


Table 5. Parameter settings of three contrast experiments

Parameter Name   Direct SVM  PCA SVM  FFR_PCA SVM
SVM type         C-SVC       C-SVC    C-SVC
Kernel function  Sigmoid     Sigmoid  Sigmoid
Cost of C-SVC    1           1        1
Tube radius ε    0.001       0.05     0.1
Tolerance δ      0.001       0.028    0.02

Table 6. SVM training and predict outputs

Parameter Name        Direct SVM  PCA SVM  FFR-PCA SVM
Total SVs             20          10       8
Total samples         40          40       40
SVs percentage        50%         25%      20%
Predict false number  3           5        0
Predict accuracy      92.5%       87.5%    100%
Time cost (second)    21.3538     1.0783   0.0801

5 Conclusion

This paper proposed a method for nondestructive testing of bridge structure health. Real-structure experiments were used to build a "real dataset", while the PROE and ADAMS software were adopted for numerical simulation tests so as to obtain "negative samples", which are very difficult to obtain from real engineering structures. A feature-distilling approach called Fundamental Frequency Removal Principal Component Analysis (FFR-PCA) was then proposed: fundamental frequency features are removed first, so that genuinely useful features are kept while most useless data are discarded, saving computational cost, accelerating calculation, improving prediction accuracy, avoiding computer memory overflow, and keeping away from local optima. Finally, pattern recognition experiments based on SVM were implemented.

(1) Huge matrix transforms and calculations cost too much time and computer memory. Special techniques such as hard-disk virtual memory, dynamic memory management, and special matrix calculation methods are essential to avoid the computer halting because of memory overflow. It is therefore very important to distill the genuinely useful features and compress the original dataset substantially.
(2) Vibration frequency features are too rich for direct pattern recognition because of the sparse nature of solutions in high-dimensional space. Directly inputting the original vibration features into the SVM obviously increases the number of Support Vectors (SVs), which makes the optimization algorithm prone to local optima and erroneous outputs.


(3) The "principal components" of a signal are its main parts, but they are not always the "principal features" that can separate one signal from another, so many genuinely useful features are ignored by the classical PCA algorithm. Our experimental results show that the Direct SVM method (all frequency features input directly into the SVM) usually has higher prediction accuracy than the classical PCA SVM method, because some useful features are removed by classical PCA.
(4) The FFR_PCA SVM method shows good performance in distilling genuinely useful features, achieving 100% prediction accuracy at about 0.38% of the time cost of the Direct SVM method. It is a powerful and promising method for structural-health nondestructive engineering applications.

Acknowledgments. The authors would like to thank the support from the Major Science and Technology Projects of Sichuan Province (2020ZDZX0019) and the Sichuan Province Science and Technology Department Key Research and Development Projects (2021YFG0075, 2021YFG0076, 2022YFG0347).

References
1. Alampalli, S., Rehm, K.C.: Impact of I-35W bridge failure on state transportation agency bridge inspection and evaluation programs. Struct. Congress 2011, 1019-1026 (2011)
2. Chakrapani, S.K., Dayal, V., Barnard, D.: Detection and characterization of waviness in unidirectional GFRP using Rayleigh wave air coupled ultrasonic testing (RAC-UT). Res. Nondestr. Eval. 24, 191-201 (2013)
3. Carvalho, A.A., Silva, R.R., Rebello, J.M.A., Sagrilo, L.V.S.: Pattern recognition techniques applied to the detection and classification of welding defects by magnetic testing. Res. Nondestr. Eval. 21, 91-111 (2010)
4. Sung, S.H., Jung, H.J., Jung, H.Y.: Damage detection for beam-like structures using the normalized curvature of a uniform load surface. J. Sound Vib. 332, 1501-1519 (2013)
5. Farahani, R.V., Penumadu, D.: Damage identification of a full-scale five-girder bridge using time-series analysis of vibration data. Eng. Struct. 115, 129-139 (2016)
6. Magalhaes, F., Cunha, A., Caetano, E.: Vibration based structural health monitoring of an arch bridge: from automated OMA to damage detection. Mech. Syst. Signal Process. 28, 212-228 (2012)
7. Machavaram, R., Shankar, K.: Joint damage identification using Improved Radial Basis Function (IRBF) networks in frequency and time domain. Appl. Soft Comput. 13, 3366-3379 (2013)
8. Bandara, R.P., Chan, T.H.T., Thambiratnam, D.P.: Frequency response function based damage identification using principal component analysis and pattern recognition technique. Eng. Struct. 66, 116-128 (2014)
9. Johnstone, I.M., Lu, A.Y.: On consistency and sparsity for principal components analysis in high dimensions. J. Am. Stat. Assoc. 104(486), 682-693 (2009)
10. Lei, J., Vu, V.Q.: Sparsistency and agnostic inference in sparse PCA. Ann. Stat. 43(1), 299-322 (2015)
11. Kim, D., Wang, Y.: Sparse PCA-based on high-dimensional Ito processes with measurement errors. J. Multivariate Anal. 152, 172-189 (2016)
12. Bandara, R.P., Chan, T.H.T., Thambiratnam, D.P.: Frequency response function based damage identification using principal component analysis and pattern recognition technique. Eng. Struct. 66(4), 116-128 (2014)


13. Tharrault, Y., Mourot, G., Ragot, J.: Fault detection and isolation with robust principal component analysis. Int. J. Appl. Math. Comput. Sci. 18(4), 429-442 (2013)
14. Chiang, L.H., Colegrove, L.F.: Industrial implementation of on-line multivariate quality control. Chemom. Intell. Lab. Syst. 88(2), 143-153 (2007)
15. Kano, M., Nakagawa, Y.: Data-based process monitoring, process control, and quality improvement: recent developments and applications in steel industry. Comput. Chem. Eng. 32(1-2), 12-24 (2008)
16. Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)
17. Cristianini, N., Taylor, J.S.: An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, Cambridge (2000)
18. Villegas, S., Li, X., Yu, W.: Detection of building structure damage with support vector machine. In: 12th International Conference on Networking, Sensing and Control, pp. 619-624 (2015)
19. Xiaoma, D., Baoli, W., Qingzhen, S., et al.: Investigation on damage self-diagnosis of fiber smart structures based on LS-SVM. In: 2nd IEEE International Conference on Information Management and Engineering, pp. 626-638 (2010)
20. Rivera-Castillo, J., Rivas-López, M., Nieto-Hipolito, J.I., et al.: Structural health monitoring based on optical scanning systems and SVM. In: 23rd International Symposium on Industrial Electronics (ISIE), pp. 1961-1966 (2014)
21. Xiaozhong, Z., Wenjuan, Y., Fan, T.: Structural damage identification based on time-varying ARMA model and support vector machine. J. Basic Sci. Eng. 21(6), 1094-1102 (2013)

Stock Trend Prediction Based on Improved SVR

Zhouyuzhe Bai(B)

Southwest Minzu University, Sichuan, China
[email protected]

Abstract. Forecasting the change trend of stocks and making investment decisions based on the prediction results can effectively avoid risks in stock investment and increase investment income. The combination of machine learning and big data provides an effective way to predict stock trends. At present, many machine learning algorithms have been used for stock trend prediction and have achieved good results. In view of the fact that the support vector regression (SVR) algorithm shows a large deviation when predicting individual stocks, this paper proposes an improved SVR algorithm. Different from traditional SVR, the enhanced SVR uses only recent data to train the model, rather than the whole history. An experiment was performed to compare the proposed SVR with the traditional SVR and decision tree algorithms, using RMSE as the metric. The results show that the proposed SVR is better than the existing comparison algorithms.

Keywords: Support vector regression · Stock price forecast · Determination coefficient · Root mean square error

1 Introduction

As one of the most important financial instruments, stocks reflect the evolution of the domestic economy [1]. After the beginning of the 20th century, and especially after 1970, the equities market expanded rapidly due to the scale and concentration of the economies of industrialized countries, advances in communication network technology, and the growth of transaction activity. Stocks have become an important means of investment in modern society. Unfortunately, most shareholders have limited risk tolerance, and abnormal fluctuations in stock prices are likely to result in significant losses for investors. In order to bring investors more investment income, more effective investment strategies are needed in the stock market than in the past [2]. The stock market is a highly complex mechanism. Many factors, including political, economic, industry, company, human dynamics, and psychological aspects, all have an impact on stock prices [3]. Predicting the stock market is often quite challenging due to the complex, non-linear patterns of stock price fluctuations. In order to create an algorithm for accurate forecasting, the equity market must be precisely modelled [4-6]. Owing to the advantages of data science and the development of AI, the vast quantities of data created by the stock market on a daily


basis can be exploited, and many scholars have begun to think about how these huge data sets can be used to predict stock prices. Many researchers are now working on the topic of stock price forecasting. In order to establish a quantitative relationship between past and future stock prices, some researchers arrange historical stock market data in chronological order and use forecasting methods based on time-series statistical models. For example, the Auto-Regressive and Moving Average (ARMA) model [7, 8], the Generalized Auto-Regressive Conditional Heteroskedasticity (GARCH) model [9, 10], the Markov model [11, 12], and the moving average (MA) model [13, 14] are all used extensively in the academic field. Recently, more and more researchers have begun to utilize machine learning to predict stock prices. The authors of [15] proposed a feedforward neural network model, which outperformed the simple MA, although the prediction did not meet application criteria. An approach combining an Auto-Regressive Integrated Moving Average (ARIMA) model with an artificial neural network was proposed by Zhang and colleagues in [16], and the results show that the combined approach has higher prediction accuracy than the ARIMA model on non-linear data. Some investigators have used models such as the Long Short-Term Memory network (LSTM) [17, 18] and the RNN [19, 20] to predict stock prices. The SVM is also used by more and more researchers to forecast stock prices; the SVM algorithm is based on the structural risk minimization principle and is highly generalizable [21, 22]. Kim [23] used SVR, an essential subset of SVM, for stock price forecasting, and experiments showed that SVR was better than a BP neural network model. In the practical application of a machine learning algorithm, the prediction effect strongly correlates with the distribution characteristics of the training data. For the historical data of most stocks, SVR can achieve prediction performance comparable to other machine learning algorithms, but for the data of individual stocks, the SVR algorithm suffers significant performance degradation. To solve this problem, this paper provides an enhanced SVR algorithm that achieves good results on the known data sets, ensuring the universal applicability of SVR.

2 The Forecasting Model of Stock Price

2.1 SVR Prediction Model of Stock Price

Mathematical Formulation of the Regression Problem. Let the training set of the model be T = {(x₁, y₁), (x₂, y₂), …, (x_l, y_l)} ∈ (X × Y)^l, in which xᵢ ∈ X = R^n, yᵢ ∈ Y = R, i = 1, 2, …, l. Suppose the training set consists of independent, identically distributed sample points drawn from an unknown probability distribution P(x, y), and let a loss function c(x, y, f) be given. The goal is to find a function f that minimizes the expected risk

R(f) = ∫ c(x, y, f) dP(x, y)

Nonlinear SVR. Because stock price index data are nonlinear, nonlinear SVR is utilized to forecast the stock price: the samples x are mapped into a high-dimensional feature


space H through a nonlinear mapping φ(x). Specifically, there is a transformation Φ from the input space R^n to a high-dimensional space H:

Φ : X ⊂ R^n → H, x ↦ Φ(x)

Through this transformation, the original training set in the input space R^n is transformed into a new training set T′ = {(Φ(x₁), y₁), (Φ(x₂), y₂), …, (Φ(x_l), y_l)} in the Hilbert space H. The nonlinear regression problem in the low-dimensional input space then becomes a linear regression problem in the high-dimensional Hilbert space, and using an appropriate kernel function K(xᵢ, x) in the optimal regression function, instead of the inner product φ(xᵢ) · φ(x) in the high-dimensional Hilbert space, achieves a linear fit after the transformation [16]. The optimization problem can be defined as:

max_{α,α*} Σ_{i=1}^{l} [ αᵢ*(yᵢ − ε) − αᵢ(yᵢ + ε) ] − (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} (αᵢ − αᵢ*)(αⱼ − αⱼ*) K(xᵢ, xⱼ)

The regression function is described as follows:

w = Σ_{i=1}^{l} (αᵢ* − αᵢ) φ(xᵢ)

f(x) = Σ_{i=1}^{l} (αᵢ* − αᵢ) K(xᵢ, x) + b

The radial basis function (RBF) kernel K(xᵢ, x) = exp[−‖x − xᵢ‖²/σ²] is used as the kernel function of the nonlinear SVM, and the training set includes all the historical stock data.

2.2 The Proposed SVR Model for Stock Price Forecasting

In our experiments, we found that the SVR model's predictions of some individual stock prices deviated significantly from reality (e.g., Amazon, stock code AMZN, and Tianchang Group, stock code 2182.HK). Although the overall price trends of these stocks are strong, their price-change curves are much less stable than those of other stocks. As can be seen in Fig. 1, Amazon's share price is highly volatile, whereas Microsoft's is not. For curves with large share price changes, the accuracy of modelling with the SVR model is greatly affected. According to our tests, because of these sharply changing stock price curves, the SVR model does not follow the usual data-mining assumption that more data means higher accuracy: if the stock price curve fluctuates fiercely, more data samples result in lower prediction accuracy. In this paper, an enhanced SVR is proposed. The proposed SVR uses only the last N days of data, not all data, to predict the next day's stock price. The enhanced SVR algorithm and the original SVR use the same regression and kernel functions.


Fig. 1. The above figure shows the stock price trend of Amazon. The figure below shows the stock price change of Microsoft.
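A hedged sketch of the "last N days" idea described above: train SVR on a recent window only and predict the next close. The window length, lag-1 feature, and placeholder series are illustrative assumptions, not the author's exact setup.

```python
# A hedged sketch of the "last N days" idea: fit SVR on a recent window only,
# then predict the next close. Window length, lag-1 feature, and the random
# price series are illustrative assumptions, not the author's exact setup.
import numpy as np
from sklearn.svm import SVR

def predict_next_close(close, N=30, C=1.0, epsilon=0.1):
    recent = close[-(N + 1):]
    X = recent[:-1].reshape(-1, 1)    # close on day t ...
    y = recent[1:]                    # ... predicts close on day t+1
    model = SVR(kernel="rbf", C=C, epsilon=epsilon).fit(X, y)
    return model.predict(recent[-1:].reshape(1, 1))[0]

close = 100 + np.cumsum(np.random.randn(500))  # placeholder price series
print(predict_next_close(close))
```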

2.3 Evaluating Indicators

In this paper, R² and RMSE are used as metrics.

The expression of R² is

R² = 1 − Σᵢ (yᵢ − ŷᵢ)² / Σᵢ (yᵢ − ȳ)²

where Σᵢ (yᵢ − ŷᵢ)² is the sum of the squared differences between the real values and the predicted values, and Σᵢ (yᵢ − ȳ)² is the sum of the squared differences between the real values and their mean. A larger R² means a better fit of the model.

The expression of RMSE is

RMSE = sqrt( (1/m) Σ_{i=1}^{m} (yᵢ − ŷᵢ)² )

which measures the distance between the vectors of real values and predicted values. A smaller RMSE indicates a better fit of the model.
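For reference, both metrics can be computed directly; a sketch with NumPy:

```python
# NumPy versions of the two metrics defined above.
import numpy as np

def r2(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

print(r2([1, 2, 3], [1.1, 1.9, 3.2]), rmse([1, 2, 3], [1.1, 1.9, 3.2]))
```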


3 Our Experiment

3.1 The Data Set Used in the Experiment

Via a Python third-party package, the experiment acquires historical stock data for numerous domestic and international firms from January 1, 2000, to December 1, 2018. Each record includes the Time, Open (opening price), High (highest price), Low (lowest price), Close (closing price), and Volume (trading volume) of a stock. For instance, Table 1 shows some of the Amazon stock data.

Table 1. Amazon stock information

Time       Opening price  Maximum price  Minimum price  Closing price  Turnover
1997/5/19  20.5           21.25          19.5           20.5            508900
1997/5/20  20.75          21             19.63          19.63           455600
1997/5/21  19.25          19.75          16.5           17.13          1571100
1997/5/22  17.25          17.38          15.75          16.75           981400
1997/5/23  16.88          18.25          16             18             1328100
1997/5/27  17.75          19.75          17.5           19              724800
1997/5/28  19.31          19.63          18.38          18.38           381200
1997/5/29  18.5           18.5           17.75          18.06           289400
1997/5/30  18             18.13          17.75          18              216200
1997/6/2   18.13          18.38          18             18.13            49300
1997/6/3   18.38          18.38          17.75          17.75            98600
1997/6/4   17.75          17.88          16.75          17              256700
1997/6/5   17             18.5           16.5           18.5            472700

Table 2. Result of experiment

Prediction method        R²        RMSE
Decision tree            0.989896  13.511611
Extreme random tree      0.998994  15.161157
KNN                      0.999013  12.591843
Random forest            0.999291  11.494701
Adaboost                 0.995395  25.989989
Iterative decision tree  0.999304  11.890182
Bagging                  0.999312  11.460001
Logistic regression      0.998994  10.369871
SVR                      0.542353  274.125687
Improved SVR             0.997591  9.100403


3.2 Experiment

In our experiment, the penalty coefficient is C = 1 and the insensitive loss coefficient is ε = 0.1, and Amazon's stock data is used as the training set. The traditional SVR algorithm uses all the historical data for training, while the enhanced SVR algorithm uses only the last N days' data. In addition, the decision tree, logistic regression, and KNN algorithms are included for comparison; all of them are trained on the full stock history. The experimental results are shown in Table 2. From them it can be seen that the improved SVR algorithm is significantly better than the SVR algorithm, and also better than Adaboost, and its evaluation results are not far from those of the decision tree and extreme random tree.

4 Summary

In this paper, we found a marked deviation in the SVR model's predictions of individual stock prices, caused by the high volatility of those stocks. This research proposes an enhanced SVR algorithm that predicts stock prices using only the most up-to-date data rather than all pre-existing data. The modified SVR method is evaluated using two metrics, namely the coefficient of determination and the root mean square error. By comparing the modified SVR algorithm with other machine learning algorithms, including decision trees, we discovered that the experimental results of our improved SVR algorithm significantly outperform those of the original SVR algorithm.

References
1. Chuanjun, Z., Jinfeng, L.: Research on LSTM stock price prediction based on numerical and text characteristics. Journal of Shanxi University (Natural Science Edition) 41(01), 1-14 (2022)
2. Qimiao, Q., Duo, Z.: Application of machine learning in stock price prediction. China Market (2022)
3. Haiyuan, Y., Qingsong, Y.: The impact of investor sentiment on stock price foam based on Bi-LSTM model mining. J. Manag. 41(01), 1-14 (2022)
4. Hong, L., Yong, Z.: Pan-security portfolio selection strategy considering portfolio forecasting stock price. Journal of Management Engineering (2023)
5. Jinlei, H., Pingping, X.: Research on stock price time series prediction based on LSTM and grey model. Journal of Nanjing University of Information Technology (Natural Science Edition) (2023)
6. Dejun, D., Hongzhen, X.: Stock price prediction of E-V-ALSTM model. Computer Engineering and Application (2022)
7. Ying, W.: Analysis and prediction of stock price based on ARMA model. Productivity Research (26), 53, 146-148 (2021)
8. Pan, F., Xianbing, C.: Empirical research on stock price analysis and prediction based on ARMA model. Practice and Understanding of Mathematics (26), 53, 146-148 (2011)
9. Wanrui: Prediction of stock price volatility based on GARCH model. Science and Technology Information (32), 12-14 (2022)
10. Pengwu, W.: Research on China's stock price volatility based on asymmetric GARCH model. Statistics and Decision Making (16), 288 (2020)


11. Pengli, Y.: Research on stock price prediction from the perspective of improved hidden Markov model. Zhejiang University of Finance and Economics (2022)
12. Xuan, Z.: Empirical research on stock price prediction based on hidden Markov model and support vector machine. Shandong University (2019)
13. Lixia, G.: Research on MA trading system based on the kinetic energy of long and short game. South China University of Technology (2016)
14. Tianxi, H., Yinghui, T.: Variable length moving average method and its application in stock investment. Journal of University of Electronic Science and Technology, 466-469 (2007)
15. Gencay, R.: Non-linear prediction of security returns with moving average rules. J. Forecast. 15(3), 43-46 (1996)
16. Zhang, G.P.: Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 50, 159-175 (2003)
17. Tianyang, L.: CNN-LSTM model stock price trend prediction based on attention mechanism. Science and Technology Information 15(3), 43-46 (2022)
18. Chenyang, D.: Composite stock prediction model based on 1DCNN-LSTM and text mining. Northwest University (2022)
19. Comprehensive research on the application of CNN and LSTM in short-term stock price rise and fall prediction of cyclical stocks. Zhejiang University (2022)
20. Cong, L.: Quantitative investment strategy research based on LSTM neural network stock price prediction. Yunnan University of Finance and Economics (2022)
21. Yuxin, D.: Research on stock price prediction model based on SVM. Shanxi University (2021)
22. Chunxue, W., Jingwen, L.: Research on stock forecasting method based on SVM and stock price trend. Software Guide, 43-46 (2018)
23. Kim, K.J.: Financial time series forecasting using support vector machines. Neurocomputing 55(1-2), 307-319 (2003). DOI: https://doi.org/10.1016/S0925-2312(01)00372-2

Several Misconceptions and Misuses of Deep Neural Networks and Deep Learning

K.-L. Du(B)

Department of Electrical and Computer Engineering, Concordia University, Montreal, QC H3G 1M8, Canada
[email protected]
http://www.ece.concordia.ca/~kldu

Abstract. Deep learning is widely accepted as a major drive of this artificial intelligence era. In this article, we first briefly describe the milestones of deep learning. Then, based on a mini-review of the literature, we point out some common misconceptions of deep neural networks and deep learning. We also point out that the deep learning approach is being misused in the present days.

Keywords: Deep neural networks · Deep learning · Machine learning · Misconception · Misuse

1 Introduction

Since 2006, deep learning has become popular, and it has been used in almost all fields that need analysis of data, whether in the form of digits, numerals, characters, or symbols, or in the form of speech, image, video, text, or natural language. There are numerous emerging applications of deep learning or machine learning, such as recommendation and sentiment analysis on social media. Neural networks and machine learning have been employed as an alternative approach since the 1980s, when many traditional models were proposed, such as the multilayer perceptron (MLP), the radial basis function (RBF) network, recurrent models such as the Hopfield model and the Boltzmann machine, the support vector machine (SVM), various clustering methods, ensemble learning, and probabilistic models based on Bayes' theorem. Most of these models draw on hints from the human brain or animal neural systems, or from psychology, hence the name neural networks [1,2]. All machine learning problems are actually optimization problems. Supervised neural network models solve the function approximation problem, which is categorized into regression and classification problems. The function approximation problem is itself an optimization problem that minimizes the approximation error on the training set. Unsupervised learning problems such as clustering are also optimization problems that minimize a total error. Recurrent models


usually correspond to an optimization problem tied to a stability problem; the associative memory problem is an example [2]. Reinforcement learning is another important branch of machine learning. It is inspired by the psychology of animal learning. It is essentially a simulation-based dynamic programming method, used for solving Markov or semi-Markov decision problems that are traditionally solved with optimal control and high-burden dynamic programming.

In 1989, LeCun et al. proposed the seminal convolutional neural network called LeNet-5, a six-layer feedforward neural network for classification of handwritten digits [3]. The LeNet-5 model can be trained with the backpropagation (BP) algorithm. Using LeNet-5, it is possible to directly recognize visual patterns from raw pixels with little preprocessing. This model is believed to be the beginning of deep learning, and many concepts of deep learning are inherited from it. The era of deep learning was ushered in around 2006, when Hinton et al. [4] proposed the deep belief network, also called the deep Bayesian network, which is a sequential stack of restricted Boltzmann machines. The restricted Boltzmann machine is a restricted version of the Boltzmann machine, trained by unsupervised learning. The deep belief network improved the classification of handwritten digits to substantially better accuracy. As deep network structures proved successful for large and complex problems, many new models have emerged by expanding traditional models with deep architectures. Most of the popular deep neural networks are based on the convolutional neural network, which is particularly useful for signals and images where convolution is a basic operation; examples include AlexNet, VGG-16, GoogLeNet, and ResNet. The generative adversarial network (GAN) is also one of the most prominent contributions in deep learning. Continually learning deep neural networks can surpass the most brilliant brain on this planet for a specific scenario. Deep reinforcement learning arose at the same time. It is more like human intelligence and thus has become a prominent direction of artificial intelligence (AI). Its debut in DeepMind's AlphaGo shocked the world: AlphaGo defeated arguably some of the strongest professional human Go players in history, beating the European champion in 2015 and a world champion in 2016 [5]. Later on, AlphaFold, another product of DeepMind, showed that it can efficiently predict the 3D structure of a protein from its amino acid sequence; it outperforms other protein structure prediction methods by a large margin, with a prediction accuracy competitive with experiment [6]. Recently, by using AlphaFold, researchers at DeepMind have predicted the structures of 214 million proteins, essentially all known protein-coding sequences of living beings on this planet [7]; traditionally, researchers experimentally identify protein structures one by one. The powerful continual learning ability of deep learning models has been recognized by all trades and disciplines, including the social sciences. In mobile communications, machine learning is widely used for cognitive radio in 5G, and the projected 6G is defined as AI-driven mobile communication. In this big data era, AI is advocated and permeates all trades of business, non-profit organizations, and our daily life. AI goes with big data, cloud

AI goes hand in hand with big data, cloud computing, and the Internet of Everything. Deep learning and reinforcement learning are the two most important approaches to AI. Deep learning models complex systems, while reinforcement learning simulates the problem-solving process. Both have their roots in brain functions.

Before we proceed, we provide some representative or authoritative definitions of deep learning, machine learning, neural networks, and transfer learning. Many of the definitions are in conflict or misleading, and the borderlines between deep learning, machine learning, and neural networks are blurred. In this article, we try to clarify some of the widely accepted misconceptions.

Definition of Deep Learning

IBM [8]: "Deep learning is a subset of machine learning, which is essentially a neural network with three or more layers. These neural networks attempt to simulate the behavior of the human brain - albeit far from matching its ability - allowing it to 'learn' from large amounts of data."

DeepAI [9]: "Deep Learning is a machine learning technique that constructs artificial neural networks to mimic the structure and function of the human brain. In practice, deep learning, also known as deep structured learning or hierarchical learning, uses a large number of hidden layers - typically more than 6 but often much higher - of nonlinear processing to extract features from data and transform the data into different levels of abstraction (representations)."

Oracle [10]: "Deep learning is a subset of machine learning, where artificial neural networks - algorithms modeled to work like the human brain - learn from large amounts of data."

MIT [11]: "Deep learning is in fact a new name for an approach to artificial intelligence called neural networks, which have been going in and out of fashion for more than 70 years."

Zhang et al. [12]: "Deep learning is a process not only to learn the relation among two or more variables but also the knowledge that governs the relation as well as the knowledge that makes sense of the relation."

Definition of Machine Learning

IBM [13]: "Machine learning is a branch of AI and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy."

Microsoft [14]: "Machine learning is the process of using mathematical models of data to help a computer learn without direct instruction. It's considered a subset of AI. Machine learning uses algorithms to identify patterns within data, and those patterns are then used to create a data model that can make predictions. With increased data and experience, the results of machine learning are more accurate - much like how humans improve with more practice."

Oracle [15]: "Machine learning is the subset of AI that focuses on building systems that learn - or improve performance - based on the data they consume. AI is a broad term that refers to systems or machines that mimic human intelligence."

Google [16]: "Machine learning is a subset of artificial intelligence that enables a system to autonomously learn and improve using neural networks and deep learning, without being explicitly programmed, by feeding it large amounts of data."

Definition of Neural Networks

IBM [17]: "Neural networks, also known as artificial neural networks or simulated neural networks, are a subset of machine learning and are at the heart of deep learning algorithms. Their name and structure are inspired by the human brain, mimicking the way that biological neurons signal to one another."

DeepAI [18]: "An artificial neural network learning algorithm, or neural network, or just neural net, is a computational learning system that uses a network of functions to understand and translate a data input of one form into a desired output, usually in another form. The concept of the artificial neural network was inspired by human biology and the way neurons of the human brain function together to understand inputs from human senses."

Amazon [19]: "A neural network is a method in artificial intelligence that teaches computers to process data in a way that is inspired by the human brain. It is a type of machine learning process, called deep learning, that uses interconnected nodes or neurons in a layered structure that resembles the human brain. It creates an adaptive system that computers use to learn from their mistakes and improve continuously. Thus, artificial neural networks attempt to solve complicated problems, like summarizing documents or recognizing faces, with greater accuracy."

Definition of Transfer Learning

Yu et al. [20]: "Transfer learning reapplies the learned knowledge on source domains to achieve good performance on different but related target domains."

Zhuang et al. [21]: "Transfer learning aims at improving the performance of target learners on target domains by transferring the knowledge contained in different but related source domains."

Du and Swamy [2]: "Transfer learning describes the procedure of using data recorded in one task to boost the performance in another related task. It exploits the insight that generalization may occur not only within tasks, but also across tasks."

2 Misconception: Deep Learning Contains Machine Learning

In recent years, deep learning seems to be everything and almighty. Researchers and companies try their best to include the deep learning concept in their proposals and implementations. Today, deep learning is treated as a synonym of AI, and thus as including everything related to machine learning. Deep learning is, no doubt, a subset of machine learning. However, as deep learning proliferates, many authors effectively treat machine learning as a part of deep learning. This is truly a misconception.

Machine learning, frequently treated as a synonym of neural networks, is an AI approach that exploits the learning ability of neural networks to reconstruct a complex system, and then uses the generalization ability of the reconstructed system to generate the output for a new input. The process of reconstructing a complex system by learning is known as training, while the generalization process is known as performing, or inference. In the usual case, machine learning corresponds to a neural network model used for a specific application such as classification, with a simple network structure. The input and output dimensions of the network are relatively small. A feature selection or extraction phase processes the raw data to extract the input dimensions that are correlated with the output. This feature selection or extraction phase is usually hand-crafted. Prior to the deep learning era, machine learning was typically applied in this way.

Some may argue that machine learning is not equivalent to the neural network approach, since it also contains such models as SVMs, clustering, nearest neighbors, random forests, and ensemble learning. Yet all these methods can be modelled as graph or network architectures with computation nodes, which correspond to biological neurons, and the corresponding architecture parameters can be adjusted by learning. As a result, they can, in a narrow sense, be categorized as neural networks. All the definitions of machine learning [13–16] describe it as simulating the process of human learning, which is rooted in the learning of neural networks. Google's definition of machine learning directly states that machine learning is realized by neural networks and deep learning.

The major discrepancy between deep learning and traditional machine learning is that deep learning gets rid of the feature selection or extraction phase by using a deep architecture. Deep learning builds into its network structure many peculiarities that effectively perform feature extraction. With deep learning, one needs only to adapt one's raw data to a selected deep neural network that has been well trained. This greatly liberates researchers from numerous arduous trials for extracting a group of features best suited to a given architecture. For deep learning, the training process is time-consuming, and GPU hardware is usually used to speed it up. The price paid for deep learning is worthwhile, since the resulting performance improves by a large margin compared to conventional approaches. The trained deep model also performs very fast, since there is no iteration as in the training process.

In summary, deep learning is a subset of machine learning, and includes feature extraction as part of the network functions. The converse is a misconception.
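To make the contrast concrete, here is a minimal sketch (our illustration, not taken from any cited work) of the traditional pipeline on scikit-learn's small digits dataset: a hand-crafted feature extractor, here simple row and column sums, feeding a shallow SVM classifier. A deep model would instead consume the raw pixels and learn such features internally.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hand-crafted feature extraction: reduce each 8x8 digit image to a few
# human-designed statistics (row and column sums) before any learning.
X, y = load_digits(return_X_y=True)
imgs = X.reshape(-1, 8, 8)
features = np.hstack([imgs.sum(axis=1), imgs.sum(axis=2)])  # 16 features

# A shallow classifier is then trained on the extracted features only.
Xtr, Xte, ytr, yte = train_test_split(features, y, random_state=0)
clf = SVC().fit(Xtr, ytr)
print("accuracy with hand-crafted features:", clf.score(Xte, yte))
```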

3 Misconception: Deep Structure Is Superior to Shallow Structure

Deep neural networks may be much more expressive than shallow ones of comparable size. Deep network models with rectified linear units (ReLUs) are capable of approximating smooth functions more efficiently than shallow ones [22]. When approximating a deep neural network by a shallower one, an exponential number of units may be required. Assume that we have a ReLU network with L layers and U = Θ(L) units. In order to approximate it, a network with O(L^(1/3)) layers must have Ω(2^(L^(1/3))) units [23]. Likewise, if a high-dimensional three-layer network is approximated by a two-layer network, an exponential blow-up in the number of nodes is required [24]. Deep neural networks are also more effective than shallow ones, in terms of computational complexity and generalization ability, for approximating functions with a periodic characteristic over binary inputs [25]. Deep neural networks are proved capable of approximating a set of compositional functions with exponentially fewer VC-dimensions for the same accuracy, compared with shallow ones [26].
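The depth-separation results above rest on composing simple maps. The following sketch (our illustration, not code from [23]) composes the ReLU "tent" map and counts oscillations, showing why a shallow network needs exponentially many units to keep up:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def tent(z):
    # The tent map on [0, 1], computed exactly by one two-unit ReLU layer:
    # m(z) = 2*relu(z) - 4*relu(z - 0.5)
    return 2.0 * relu(z) - 4.0 * relu(z - 0.5)

x = np.linspace(0.0, 1.0, 100001)
y = x.copy()
for depth in range(1, 6):
    y = tent(y)  # one more constant-width ReLU layer
    crossings = int(np.sum(np.diff((y > 0.5).astype(int)) != 0))
    print(f"{depth} layers -> {crossings} crossings of the level 0.5")

# The count doubles with each layer: 2, 4, 8, 16, 32. A one-hidden-layer
# ReLU network with k units is piecewise linear with at most k + 1 pieces,
# so matching a depth-L composition needs on the order of 2^L hidden units.
```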

Shallower but wider networks may nevertheless be superior to a deep neural network, though most recent works advocate deep neural networks. By defining the capacity of an architecture as the binary logarithm of the number of functions it can compute, the capacity of layered, fully connected architectures of linear threshold neurons has been derived [27]. In a general sense, shallow networks are capable of computing more functions than their deep counterparts in the same settings, but the functions computed by deep networks are usually more regular.

The residual network (ResNet) is a popular model. Its success has been interpreted as ensembling exponentially many shallow neural networks, in that the paths in a ResNet are not strongly dependent on one another, though trained jointly [28]. The gradient vanishing issue facing deep network architectures has been largely resolved by skip connections [29]. A wide 16-layer ResNet is superior to the original thin thousand-layer ResNet on CIFAR-10, and likewise, a 50-layer ResNet is superior to its 152-layer counterpart on ImageNet [30]. Training wide ResNets is also several times faster. In fact, the performance of a ResNet depends on the number of trainable parameters [30].

Handwritten Digit Recognition. We here give handwritten digit recognition as an example to support our viewpoint that deep is not always better. A two-dimensional grayscale image space is mapped into ten classes. The MNIST dataset is a dataset of handwritten digits. Table 1 lists some of the results from the literature.

In [3], LeCun designed a six-layer convolutional neural network (784-2304-576-768-192-10) trained with BP. Weight sharing and other heuristics were employed. There are 4,635 nodes, 98,442 connections, and a total of 2,578 independent parameters. The network was trained with an approximate-Newton-based incremental BP algorithm, for 30 epochs through the training set. The error rates on the training and testing sets were 1.1% and 3.4%, respectively. An MLP achieved 1.5% when not hand-crafted for this particular application [4]. An MLP with a single 800-unit hidden layer achieved an error rate of 0.70% [31]. More complex methods on the MNIST web page mostly outperformed MLPs. For comparison, the nearest neighbors method has an error rate of 3.1%, or 2.8% when using an L3 norm [4]. The SVM method proposed in [32] produces an error rate of 1.4%.

Table 1. Handwritten digit recognition using neural networks.

Reference  Network                                 Error rate  Model type
[3]        Six-layer convolutional neural network  3.4%        Deep
[4]        MLP                                     1.5%        Shallow
[31]       Three-layer MLP                         0.7%        Shallow
[4]        Nearest neighbors                       2.8%        Shallow
[32]       SVM                                     1.4%        Shallow
[33]       Hand-coded recognition system           0.63%       Shallow
[34]       Domain-specific tricks                  0.95%       Shallow
[4]        Five-layer deep belief network          1.2%        Deep
[32]       Convolutional neural networks           0.56%       Deep
[31]       Convolutional neural networks           0.40%       Deep
[35]       Convolutional neural networks           0.39%       Deep
[36]       Seven-layer MLP                         0.35%       Deep
[36]       Four-layer MLP                          0.49%       Shallow
[37]       Three-layer spiking models              < 1.8%      Shallow

In [33], the best hand-coded recognition algorithm achieved an error rate of 0.63%. Weight sharing and subsampling, two domain-specific tricks, were used in [34] to reduce the error rate of discriminative neural networks from 1.5% to 0.95%. These models are known as shallow models.

For an enhanced MNIST database, on the permutation-invariant version of the task, the deep belief network achieved an error rate of 1.25% on the official test set [4]. As a generative model, it has three hidden layers and a total of about 1.7 million weights. Layerwise training, further fine-tuned by BP, was performed for 30 epochs through the training set. Training-pattern deformations were not applied. The deep belief network achieved an error rate of 1.2% [4].

The error rate can be substantially reduced by supplementing the dataset with deformed versions of its patterns. With translations by one and two pixels, an error rate of 0.56% was obtained [32]. In [31], by using local elastic deformations, a record-breaking error rate of 0.40% was achieved with convolutional neural networks. Later, an error rate of 0.39% was achieved with a convolutional neural network whose hidden layers were pre-trained layer by layer in an unsupervised fashion using an energy-based model, followed by supervised learning [35]. By experimenting on plain MLPs trained with the good old online BP algorithm, an error rate of 0.35% was achieved with a seven-layer (784-2500-2000-1500-1000-500-10) MLP, and even a four-layer (784-1000-50-10) MLP achieved an error rate of 0.49% [36].

Numerous biologically plausible variants of deep learning typically reach a test accuracy of around 98% on the MNIST dataset [37]. In [37], a three-layer network was trained with biologically plausible, local learning rules. The hidden-layer weights are either fixed at random, or trained by unsupervised learning that can be implemented by local rules, and a supervised, local learning rule is used to train the readout layer. The three-layer spiking models achieve a test accuracy above 98.2% on MNIST, which approximates that of three-layer rate networks trained with BP. The shallow spiking network models have performance comparable to most of the biologically plausible deep learning models.
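As a hedged illustration of how accessible the shallow baselines of Table 1 are, the sketch below trains a single-hidden-layer MLP of 800 units, echoing [31]; without the elastic deformations used there, the error rate should be expected to land nearer 1.5-2% than 0.70%.

```python
from sklearn.datasets import fetch_openml
from sklearn.neural_network import MLPClassifier

# Load MNIST (70,000 28x28 images) and use the standard 60k/10k split.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0
Xtr, Xte, ytr, yte = X[:60000], X[60000:], y[:60000], y[60000:]

# One hidden layer of 800 units, as in the shallow MLP of [31].
mlp = MLPClassifier(hidden_layer_sizes=(800,), max_iter=30)
mlp.fit(Xtr, ytr)
print("test error rate:", 1.0 - mlp.score(Xte, yte))
```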

A Summary. From the above literature survey on handwritten digit recognition, we see that both deep and shallow models can achieve a performance much better than that of humans. Both can achieve excellent error rates by exploiting a few domain-specific tricks that improve generalization and feature extraction, such as weight sharing, subsampling, dropout, and the ReLU activation function, plus some enhancement of the dataset by deformation. Moreover, since some digits are illegible to humans due to faulty segmentation, wrong labelling, or ambiguous or erroneous writing, the human labelling of these digits may itself be wrong. A recognition rate by deep learning that is higher than the human rate is therefore somewhat meaningless, since it is measured against flawed ground truth.

In summary, deep neural networks have no competitive edge over shallow ones on the MNIST dataset; a shallow neural network can achieve better performance than a deep one. This is also validated by the No Free Lunch Theorem [2,38]. These days, many researchers pursue a powerful deep neural network that outperforms all existing methods. This is not necessary: even if such a network is very accurate on some datasets, it is not always powerful on other datasets, as the No Free Lunch Theorem clarifies. A deep neural network also inflicts much more stringent restrictions on the training process, as well as on the preparation and selection of the training dataset.

Ensemble Learning. Another line of research for reliable decisions is ensemble learning, which achieves highly reliable decisions at much lower cost. Let us give an example. Assume that each projectile can independently detect an incoming missile and has a success rate of 70% for intercepting it. If three projectiles are launched to intercept a missile, then, in the ideal case, the probability of successfully intercepting the missile is 1 − 0.3 × 0.3 × 0.3 = 97.3%. If each projectile has a success rate of 80%, the final success rate for launching three projectiles is 99.2%. The effort for designing a controller with an accuracy of 70% is substantially lower than that for an accuracy of 97.3%. To achieve high reliability, a simple majority voting strategy over multiple classifiers can likewise yield a very reliable final accuracy [2]. The arithmetic is sketched below.
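A minimal sketch of both calculations, in plain Python (our illustration):

```python
from math import comb

def at_least_one(p, n):
    # Probability that at least one of n independent interceptors succeeds.
    return 1.0 - (1.0 - p) ** n

def majority_vote(p, n):
    # Probability that a majority of n independent classifiers is correct.
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i)
               for i in range(n // 2 + 1, n + 1))

print(at_least_one(0.7, 3))   # 0.973, as computed in the text
print(at_least_one(0.8, 3))   # 0.992
print(majority_vote(0.7, 3))  # 0.784: voting also beats a single classifier
```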

4 Misconception: Deep Learning Is Learning by a Deep Layered Neural Network

Most users believe that a neural network with more than one hidden layer, or with more than three layers counting the input and output layers, can be called a deep neural network, and that the approach of training and using such a model is known as deep learning.

This belief is supported by IBM's [8] and DeepAI's [9] definitions of deep learning. Oracle's definition of deep learning [10] places no restriction on the number of layers, but stresses that the model is trained on large amounts of data. MIT's definition of deep learning [11] states that deep learning is just a new name for neural networks, and in Amazon's definition of neural networks [19], neural networks are themselves called deep learning. Zhang et al.'s definition of deep learning [12] mentions neither the number of layers, nor the number of parameters, nor the amount of data, but stresses that the system learns not only the relation between variables but also the knowledge behind the relation. Therefore, there is no consensus on the definition of deep learning.

We hold the viewpoint that a neural network model obtained by simply stacking layers should not be called a deep neural network, and that the corresponding method should not be called deep learning. It is well known that a three-layer feedforward network such as the MLP or the RBF network is a universal approximator. The three-layer architecture is therefore generally used, and it is very successful when the classical BP learning algorithm is replaced by more efficient second-order training algorithms. In the early days, researchers already found that an MLP with more layers can generally use fewer parameters for the same training accuracy. Most applications used a structure as simple as possible for better generalization [2]. Many researchers designed various methods to remove redundancy in the network architecture. They spent much time discovering the correlation between the input and the output by implementing feature extraction. When the accuracy of a classifier was unsatisfactory, researchers proposed various fusion strategies for combining the outputs of multiple classifiers to achieve almost sure classification results. This line of research was very successful.

Hinton's deep belief network [4] is obtained by stacking restricted Boltzmann machines; it is pretrained layerwise and fine-tuned with error backpropagation. A deep neural network is thus usually trained in two stages: layerwise pretraining and fine-tuning. Due to the large number of layers and parameters, only layerwise training by BP on distributed computing systems such as GPUs is a viable way to combat the complexity. The vanishing gradient problem is a well-known problem of error backpropagation, and the ReLU activation is used to avoid this drawback. The dropout strategy is used to combat network node faults and to increase the robustness and generalization of the model.

In our view, when a traditional multilayer neural network is simply expanded with more hidden layers, the resulting model is still a traditional model, and should not be called a deep neural network. For example, an auto-encoder is usually implemented as a symmetrical five- or seven-layer MLP with a bottleneck layer in the middle, and is trained with the same input and output. It performs principal component analysis (PCA): the output at the bottleneck layer gives the principal components of the input [2]. Such a traditional multilayer model was proposed in the 1980s.
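A minimal sketch of such a bottleneck auto-encoder, using PyTorch with illustrative layer sizes of our own choosing (784-128-16-128-784) and random stand-in data:

```python
import torch
import torch.nn as nn

# Symmetrical five-layer MLP auto-encoder: the 16-unit bottleneck in the
# middle plays the role of the (nonlinear) principal components.
autoencoder = nn.Sequential(
    nn.Linear(784, 128), nn.Tanh(),   # encoder
    nn.Linear(128, 16), nn.Tanh(),    # bottleneck layer
    nn.Linear(16, 128), nn.Tanh(),    # decoder
    nn.Linear(128, 784),
)
opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
x = torch.rand(256, 784)              # stand-in for a real data batch
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(autoencoder(x), x)  # same input and target
    loss.backward()
    opt.step()
```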

In our view, deep neural networks and deep learning are not merely a matter of the number of layers, but rather of the total number of parameters to adjust. In order to train such a complex system, many unconventional strategies should be embodied in the models. To guarantee that all the parameters are "deeply" or fully adjusted, an enormous dataset is needed, so that the model is not over-trained and retains good generalization capability. We here give our own definition of deep learning: deep learning is learning by a neural network with three or more layers and a very large number of parameters, typically more than tens of thousands, that are "deeply" learnt, or fully adjusted, from large amounts of data. For example, LeCun's early convolutional neural network [3] has only six layers and 2,578 parameters to adjust, while Hinton's deep belief network [4] has only five layers but 1.7 million weights to learn; these early methods are known as deep learning. Many models proposed thereafter have hundreds of layers or millions of parameters. Our understanding of deep learning is that a multilayer neural network with a large number of parameters is "deeply" trained such that it can "deeply" represent a complex system.
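This criterion is easy to check mechanically. In the sketch below (the layer sizes are our illustrative choices), a deep narrow MLP and a shallow wide MLP have comparable parameter totals despite very different layer counts:

```python
import torch.nn as nn

def n_params(model):
    # Total number of trainable parameters, the quantity we regard as
    # decisive for calling a model "deep".
    return sum(p.numel() for p in model.parameters())

deep_narrow = nn.Sequential(nn.Linear(784, 64), nn.ReLU(),
                            nn.Linear(64, 64), nn.ReLU(),
                            nn.Linear(64, 64), nn.ReLU(),
                            nn.Linear(64, 10))
shallow_wide = nn.Sequential(nn.Linear(784, 80), nn.ReLU(),
                             nn.Linear(80, 10))
print("deep narrow:", n_params(deep_narrow))    # about 59,000 parameters
print("shallow wide:", n_params(shallow_wide))  # about 64,000 parameters
```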

5 Misconception: Transfer Learning Is Always Useful

Since Hinton’s deep belief network in 2006, many deep neural networks have been proposed. These models go deeper and deeper, from several to dozens to hundreds to thousands of layers, and the total number of model parameters can be in tens of millions. Neural networks of such complexity have strict demand on the training dataset, and the learning time. Training of such a large neural network model requires very large dataset, and only traditional error backpropagation or layerwise idea can be used for training to reduce the computational complexity. For practical use, such deep neural networks have to be pretrained for general use by using for a very large domain dataset. To cater for this purpose, the ImageNet dataset collects over ten million images of one thousand classes. Training of such a deep neural network has to be based on GPU-assisted distributed computing systems. Once the model is trained for classification on ImageNet, it is available for download for various uses. Since many existing deep learning models are available which have already been trained by existing big datasets such as ImageNet. These well-trained models can be directly used, for many other different but related image recognition problems. This is known as transfer learning [2,20,21]. The objective is to generalize to a target task the knowledge gained from a source task. The model is then further trained by using the raw dataset prepared for the specific target task. Although a model is trained for image classification on ImageNet, it may not be useful for classifying music, diagnosing a disease, classifying the state of a machine, or sentiment analysis, since they are not related to image classification. However, due to the availability of large image datasets, most of the

However, due to the availability of large image datasets, most of the popular deep learning models are pretrained and evaluated on image datasets such as ImageNet. A user is unable to restart the training of such a model, since he has neither a domain dataset of sufficient size nor sufficient computational resources for training; he may only be able to collect a specific dataset for his own application. Besides, the trained deep models have fixed structures. In order to use a trained deep model, a common practice is to modify some of its layers to adapt it to a specific problem. For example, layers are attached to the input and output ends of a deep neural network to adapt it to the problem dimensions. The parameters of the trained deep network portion are used as initial parameters for the new model, and are kept fixed, or modified by using BP or the layerwise idea [39] on a specific dataset for a specific application. While most of the parameters of the deep network portion change only slightly, the parameters of the newly added layers are adjusted substantially to adapt to the new dataset. This transfer learning idea is common practice in deep learning.

Transfer learning makes deep learning feasible, and it is successful. But there is no theoretical foundation to support this idea. The deep learning community advocates transfer learning based on a heuristic: a person with many years of schooling can easily learn new things. The community is enchanted by deep learning in the deep learning era: as long as the concept of deep learning is used in an article, the research work is thought to be up to date, and is easy to get published. To our mind, transfer learning as commonly practised is unfounded and is a misconception.

In fact, even a randomly initialized deep neural network trained on a specific domain dataset can give a satisfactory result. Let us give an example. Consider the extreme learning machine (ELM) [40], which is merely an RBF network, or a single-hidden-layer feedforward network, with a large number of hidden nodes. The parameters of the hidden-layer nodes are randomly selected, and the output weights are analytically determined. For a given training dataset, only the linear weights of the output layer are updated, by applying the pseudoinverse of a matrix, while all the other parameters are set randomly. Only one pseudoinverse operation is performed, and thus the learning process is very fast. This training method is thousands of times faster than the BP algorithm or training an SVM, and the generalization performance is also good. The authors of ELM were very proud of this trick and gave the method a grandiose name. They organized annual conferences on ELM, and numerous papers were published, stirring a bustle for a decade. In fact, the low computational burden of ELM arises from the fact that very few neurons are used to synthesize the estimator, while the majority of the neurons are not aroused. However, due to the bias-variance tradeoff, a satisfactory generalization performance may not be obtained for a sparse model. Similar ideas that use random features in training neural networks were implemented long before ELM, such as in a single-hidden-layer network with sigmoidal activation in 1992 [41], in the random vector functional link network in 1994 [42], in reservoir computing [43], and in early deep learning models.
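A minimal numpy sketch of the ELM recipe just described (sizes and data are toy placeholders of our choosing):

```python
import numpy as np

def elm_train(X, Y, n_hidden=1000, seed=0):
    # Hidden weights are random and never trained; only the linear output
    # weights are obtained, by a single pseudoinverse.
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)
    beta = np.linalg.pinv(H) @ Y
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy usage with one-hot targets: one pseudoinverse replaces the whole
# iterative BP training loop.
X = np.random.rand(500, 20)
Y = np.eye(3)[np.random.randint(0, 3, 500)]
W, b, beta = elm_train(X, Y)
labels = elm_predict(X, W, b, beta).argmax(axis=1)
```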

A similar idea is also used in cascade-correlation learning [44], which freezes all previously trained units when a new unit is added. As early as 1988, the RBF network was implemented with a randomized selection of centers [2,42]. According to comparisons of many biologically plausible shallow and deep neural network models on MNIST, networks with localized receptive fields significantly outperform fully connected networks [27]. LeCun even stated that "ELM is officially a fraud" [45].

According to Occam's razor, an over-parameterized model does not generalize well. Many researchers are amazed at the good generalization performance of deep learning: the overfitting problem does not occur. They have developed theory and experiments to explain this, either treating the training process as a regularization problem [46,47], or finding that deep neural network models contain suboptimal local minima and saddle points which can easily be found by gradient descent [48,49]. The long error propagation path over deep layers may cause gradient vanishing at many inner nodes, leading to few node updates, weak representation power of the intermediate layers, and a redundant model. The popular ReLU activation φ(x) = max(x, 0) alleviates the gradient vanishing problem, since it eliminates the saturation zone and enables error propagation over the network. The number of functioning nodes is also reduced substantially: as long as the weighted sum input to a neuron is negative, the ReLU function generates zero. This greatly reduces the effective number of parameters. We believe that this structural self-regularization is the more reasonable explanation: most of the model parameters are idle, so the network is naturally sparse and thus generalizes well.

We believe that transfer learning works in the same way as ELM. An enormous number of parameters of a deep neural network are actually idle and do not participate in the calculations, just as only very few neurons in our brain work on a specific task. Even if the pretrained parameters of a neural network are reset to random numbers before learning a new task, good results can still be achieved. This is validated in [37]. That is, transfer learning has no significant advantage over learning from scratch for a new task. A neural network of enormous size can rapidly adapt part of its parameters to a new application task, while most of the other parameters are fixed randomly and then adjusted only slightly.

We believe that transfer learning arises from the heuristic that a well-trained person can learn a new, related thing very fast. The original definition of transfer learning is that knowledge gained from a source domain is transferred to a different but related target domain to accelerate learning. This is related to case-based or analogical reasoning, and it has a certain rationality. For example, a person who has learned C++ can learn the Java language very rapidly. However, in most practical applications of transfer learning, the knowledge is transferred not to a related target domain, but across domains. There is no theoretical foundation to support this heuristic. When the brain has been trained for some things, then for a new thing in a different domain most of the past training can be useless, and one has to be trained briefly from the start. This resembles the transfer learning process across domains.

As an example, a professor with forty years of experience in chemical engineering is not a faster learner of computer programming than a first-year college student. In summary, transfer learning is reasonable only when the target task is related to the source task, but it is unfounded for the widely practised across-domain applications. The success of transfer learning across domains can be explained by the facts that the transferred knowledge is not actually exploited and that a sparse model is trained for the new task.

6 Misuse of Deep Learning

Deep learning is, no doubt, an effective numerical tool for approximating complex models, and it is usually used as an auxiliary method. In recent years, largely due to DeepMind's successes with AlphaGo and AlphaFold, a wave of industrial investments and academic proposals has been under way. This wave of hype may continue for a few years. In almost all sectors and fields, scientists and engineers are rushing in and publishing numerous papers identified as deep learning-assisted methods for specific applications. Many new journals have been launched for submissions related to AI and machine learning.

In numerous published papers, classical topics in all disciplines are solved repeatedly by various neural network models. Many authors claim that they exploited deep learning, but it turns out to be a classical machine learning method. In some papers, the authors customized the architecture of a popular deep learning model for a specific application, and then claimed a new model. Such combinations and modifications can lead to numerous papers, but make no essential contribution to their disciplines. We do not advocate this kind of research. Deep learning is just a statistical approach, like many other branches of applied mathematics. We should not overstress its importance. However, due to the present hype around deep learning, authors have realized that combining a specific application with deep learning makes their manuscripts very easy to get accepted for publication.

Here we give another example. One of the most important things that will influence our daily life is 6G mobile communication, which is under development and is projected for deployment in 2030. As 6G is consensually anticipated to be AI-driven, deep learning will be in charge of learning the complex channels, user habits, policies, and everything related to the communications. The 6G standards will define an air interface, while the complex protocols will not be defined, and everything will be done ad hoc by AI. To this end, more and more papers on deep learning based communication topics are published in top IEEE journals. Recently the IEEE launched a new journal, IEEE Transactions on Machine Learning in Communications and Networking, for this purpose.

In the past, scientists proposed various channel models based on measured data from experiments, in an attempt to find the underlying physical laws between channels and environments, and then generalized them to more complex systems. They created knowledge. Machine learning, however, is a black-box model. Data are collected and fed to deep neural networks.

People take the results from deep learning for granted, and cannot know the internal mechanism of things. Humans will no longer think about the nature of things. Instead, they will only provide raw data to deep neural networks, and even the measurement of data will eventually be done by AI. The trained deep neural network will emulate a complex system and will generalize for a new input. Under such a trend, scientists will become technicians, and their jobs can also be done by robots. Humans will lose the ability to extract knowledge, and human minds will retrograde. The future of humankind will be highly dependent on AI; when AI falters, humans will be at a loss.

We also notice that numerous papers have been published that use machine learning to classify medical cases collected from medical practice. In the past, statistical tools such as factor analysis were used for data analysis. This helps to gain knowledge by finding the underlying correlations between factors. These days, deep learning is used to classify whatever medical datasets have been collected, and people become highly dependent on a black-box system. Just in the past pandemic, numerous papers on COVID-19 forecasting were published using various combinations of statistical tools, machine learning or deep learning methods, and different datasets. These finally proved to be merely a momentary bustle, and no tangible knowledge was added to the discipline of epidemiology.

Another misuse is that many researchers label whatever machine learning methods they use as deep learning or deep neural networks. They may exploit the present spotlight on deep learning to make publication easier. However, neural networks and machine learning are not new.

7 Conclusion

Deep learning, like many applied mathematics disciplines, is a statistical approach and a tool for modelling complex systems. In this article, we have clarified some of the widely accepted misconceptions regarding deep learning, and have also pointed out how deep learning is misused in the present day. The presently popular deep learning is merely an extension of the conventional neural network or machine learning approach, obtained by substantially increasing the model complexity as described by the number of layers and the total number of parameters. A high-complexity system is powerful, but it is highly dependent on distributed computing and extremely large datasets to train all the model parameters sufficiently. A small network structure is also very useful and efficient for specific applications. When the accuracy of deep learning is well above that of humans, the result may be unreliable, since the training samples provided by humans may themselves be incorrectly labelled. While deep learning is a buzzword and is pursued everywhere, the price paid for achieving high accuracy by deep learning may be enormous; a low-cost alternative is ensemble learning. The transfer learning strategy, widely used in deep learning, is a misconception, and its apparent validity is conjectured to result from the sparsity induced in the model by the training process.

Deep learning is highly useful and helps scientists and practitioners to develop highly reliable systems, but it has already been misused.

Humankind should treat AI, including deep learning, as an auxiliary tool for achieving its objectives, but should never become overly dependent on this black-box system.

Acknowledgment. The author would like to thank Dr. M.N.S. Swamy for insightful discussions.

References

1. Du, K.-L., Swamy, M.N.S.: Neural Networks in a Softcomputing Framework. Springer, London (2006). https://doi.org/10.1007/1-84628-303-5
2. Du, K.-L., Swamy, M.N.S.: Neural Networks and Statistical Learning, 2nd edn. Springer, London (2019). https://doi.org/10.1007/978-1-4471-7452-3
3. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., et al.: Handwritten digit recognition with a back-propagation network. In: Touretzky, D.S. (ed.) Advances in Neural Information Processing Systems, vol. 2, pp. 396–404. Morgan Kaufmann, San Mateo (1989)
4. Hinton, G.E., Osindero, S., Teh, Y.-W.: A fast learning algorithm for deep belief nets. Neural Comput. 18, 1527–1554 (2006)
5. Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., van den Driessche, G., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016). https://doi.org/10.1038/nature16961
6. Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., et al.: Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). https://doi.org/10.1038/s41586-021-03819-2
7. Editorial: How AlphaFold can realize AI's full potential in structural biology. Nature 608(8) (2022). https://doi.org/10.1038/d41586-022-02088-x
8. IBM: What is Deep Learning? https://www.ibm.com/cloud/learn/deep-learning. Accessed 13 Mar 2023
9. DeepAI: Deep Learning Definition. https://deepai.org/machine-learning-glossary-and-terms/deep-learning. Accessed 13 Mar 2023
10. Oracle: What is Deep Learning? https://www.oracle.com/artificial-intelligence/machine-learning/what-is-deep-learning/. Accessed 13 Mar 2023
11. Hardesty, L.: MIT News Office: Explained: Neural networks (2017). https://news.mit.edu/2017/explained-neural-networks-deep-learning-0414. Accessed 13 Mar 2023
12. Zhang, W.J., Yang, G., Lin, Y., Ji, C., Gupta, M.M.: On definition of deep learning. In: Proceedings of the IEEE 2018 World Automation Congress, Stevenson, WA, USA, pp. 232–236 (2018)
13. IBM: What is Machine Learning? https://www.ibm.com/cloud/learn/machine-learning. Accessed 13 Mar 2023
14. Microsoft Azure: What is machine learning? https://azure.microsoft.com/en-us/resources/cloud-computing-dictionary/what-is-machine-learning-platform/. Accessed 13 Mar 2023
15. Oracle: What is Machine Learning? https://www.oracle.com/artificial-intelligence/machine-learning/what-is-machine-learning/. Accessed 13 Mar 2023
16. Google: What is Machine Learning? https://cloud.google.com/learn/what-is-machine-learning. Accessed 13 Mar 2023

17. IBM: What are Neural Networks? https://www.ibm.com/cloud/learn/neural-networks. Accessed 13 Mar 2023
18. DeepAI: What is a Neural Network? https://deepai.org/machine-learning-glossary-and-terms/neural-network. Accessed 13 Mar 2023
19. AWS: What is a Neural Network? https://aws.amazon.com/what-is/neural-network/?nc1=h_ls. Accessed 13 Mar 2023
20. Yu, F., Xiu, X., Li, Y.: A survey on deep transfer learning and beyond. Mathematics 10(3619) (2022). https://doi.org/10.3390/math10193619
21. Zhuang, F., et al.: A comprehensive survey on transfer learning. Proc. IEEE 109(1), 43–76 (2021). https://doi.org/10.1109/JPROC.2020.3004555
22. Yarotsky, D.: Error bounds for approximations with deep ReLU networks. Neural Netw. 94, 103–114 (2017)
23. Telgarsky, M.: Benefits of depth in neural networks. In: Proceedings of the 29th Annual Conference on Learning Theory (PMLR), New York, NY, vol. 49, pp. 1517–1539 (2016)
24. Eldan, R., Shamir, O.: The power of depth for feedforward neural networks. In: Proceedings of the 29th Annual Conference on Learning Theory (PMLR), New York, NY, vol. 49, pp. 907–940 (2016)
25. Szymanski, L., McCane, B.: Deep networks are effective encoders of periodicity. IEEE Trans. Neural Netw. Learn. Syst. 25(10), 1816–1827 (2014)
26. Mhaskar, H., Liao, Q., Poggio, T.: Learning functions: when is deep better than shallow. CBMM Memo No. 045 (2016). https://arxiv.org/pdf/1603.00988v4.pdf
27. Baldi, P., Vershynin, R.: The capacity of feedforward neural networks. Neural Netw. 116, 288–311 (2019)
28. Veit, A., Wilber, M., Belongie, S.: Residual networks behave like ensembles of relatively shallow networks. In: Advances in Neural Information Processing Systems, vol. 29, pp. 550–558 (2016)
29. He, F., Liu, T., Tao, D.: Why ResNet works? Residuals generalize. IEEE Trans. Neural Netw. Learn. Syst. 31(12), 5349–5362 (2020). https://doi.org/10.1109/TNNLS.2020.2966319
30. Zagoruyko, S., Komodakis, N.: Wide residual networks. In: Proceedings of British Machine Vision Conference, Newcastle, UK, pp. 87.1–87.12 (2016)
31. Simard, P.Y., Steinkraus, D., Platt, J.: Best practices for convolutional neural networks applied to visual document analysis. In: Proceedings of International Conference on Document Analysis and Recognition (ICDAR), pp. 958–962. IEEE Computer Society, Los Alamitos (2003)
32. Decoste, D., Schoelkopf, B.: Training invariant support vector machines. Mach. Learn. 46, 161–190 (2002)
33. Belongie, S., Malik, J., Puzicha, J.: Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell. 24(4), 509–522 (2002)
34. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
35. Ranzato, M.A., Poultney, C., Chopra, S., LeCun, Y.: Efficient learning of sparse representations with an energy-based model. In: Advances in Neural Information Processing Systems, vol. 19, pp. 1137–1144 (2006)
36. Ciresan, D.C., Meier, U., Gambardella, L.M., Schmidhuber, J.: Deep, big, simple neural nets for handwritten digit recognition. Neural Comput. 22(12), 3207–3220 (2010)
37. Illing, B., Gerstner, W., Brea, J.: Biologically plausible deep learning - but how far can we go with shallow networks? Neural Netw. 118, 90–101 (2019). https://doi.org/10.1016/j.neunet.2019.06.001

38. Du, K.-L., Swamy, M.N.S.: Search and Optimization by Metaheuristics. Springer, New York (2016). https://doi.org/10.1007/978-3-319-41192-7
39. Du, K.-L., Leung, C.-S., Mow, W.H., Swamy, M.N.S.: Perceptron: learning, generalization, model selection, fault tolerance, and role in the deep learning era. Mathematics 10, 4730 (2022). https://doi.org/10.3390/math10244730
40. Huang, G.-B., Zhu, Q.-Y., Siew, C.-K.: Extreme learning machine: theory and applications. Neurocomputing 70, 489–501 (2006)
41. Schmidt, W.F., Kraaijveld, M.A., Duin, R.P.W.: Feedforward neural networks with random weights. In: Proceedings of 11th IAPR International Conference on Pattern Recognition, The Hague, Netherlands, vol. 2, pp. 1–4 (1992). https://doi.org/10.1109/ICPR.1992.201708
42. Suganthan, P.N., Katuwal, R.: On the origins of randomization-based feedforward neural networks. Appl. Soft Comput. 105, 107239 (2021)
43. Jaeger, H.: The "echo state" approach to analyzing and training recurrent neural networks. GMD Technical Report 148. German National Research Center for Information Technology, Sankt Augustin, Germany (2001)
44. Fahlman, S.E., Lebiere, C.: The cascade-correlation learning architecture. In: Advances in Neural Information Processing Systems, vol. 2, pp. 524–532 (1990)
45. Birost: Yann LeCun: Who can explain where the extreme learning machine (ELM) is? https://blog.birost.com/a?ID=e170d2e1-62f6-43e0-9b64-f6510be36803. Accessed 13 Mar 2023
46. Mhaskar, H.N., Poggio, T.: An analysis of training and generalization errors in shallow and deep networks. Neural Netw. 121, 229–241 (2020). https://doi.org/10.1016/j.neunet.2019.08.028
47. Martin, C.H., Mahoney, M.W.: Implicit self-regularization in deep neural networks: evidence from random matrix theory and implications for learning. J. Mach. Learn. Res. 22, 1–73 (2021)
48. Liu, B., Liu, Z., Zhang, T., Yuan, T.: Non-differentiable saddle points and suboptimal local minima exist for deep ReLU networks. Neural Netw. 144, 75–89 (2021). https://doi.org/10.1016/j.neunet.2021.08.005
49. Petzka, H., Sminchisescu, C.: Non-attracting regions of local minima in deep and wide neural networks. J. Mach. Learn. Res. 22, 1–34 (2021)

Author Index

B
Bai, Zhouyuzhe 148

C
Chen, Weifeng 81
Chen, Yu 27
Cong, Yang 81

D
Du, K.-L. 155

F
Fan, Pingyi 3
Fan, Qiang 3
Fletcher-Smith, Cayo 95

G
Gao, Song 127
Guo, Lan 81

H
Hao, Xing'an 127
Hou, Mingguo 68
Hou, Taogang 47
Hu, Chuanmei 127
Huang, Yao 58
Huang, Yifan 127
Huang, Yuxuan 47

J
Ji, Rongrong 68
Jiang, Gang 127
Jiang, Jie 127

L
Lai, Yuanming 127
Li, Fenfang 68
Lin, Haopeng 47

P
Pei, Xuan 47
Peng, Yue 127

Q
Qu, Qian 27

R
Ruan, Lixiang 68

S
Sallal, Muntadher 95
Shen, Yifei 68
Sun, Han 27

W
Wang, Qian 127
Wang, Siyuan 3
Wang, Wendi 47
Wu, Qiong 3

Y
Yan, Xuechun 81
Yang, Lanying 127
Yang, Yatao 58
Yang, Zihan 58
Yi, Shi 127

Z
Zhang, Li 58
Zhang, Shijie 47
Zheng, Wenwen 47
