3D Imaging―Multidimensional Signal Processing and Deep Learning: 3D Images, Graphics and Information Technologies, Volume 1 (Smart Innovation, Systems and Technologies, 297) 9811924473, 9789811924477

This book gathers selected papers presented at the conference “Advances in 3D Image and Graphics Representation, Analysis …”.


English · 273 pages · 2022


Table of contents:
Preface
Contents
About the Editors
1 Color Restoration of RGB-NIR Images in Low-Light Environment Using CycleGAN
1.1 Introduction
1.2 CycleGAN Structure
1.2.1 The Overall Structure
1.3 Objective Function
1.3.1 The Basic Objective Function of GANs
1.3.2 Objective Function of Cyclic Transformation Consistency
1.3.3 Total Objective Function of CycleGAN
1.4 Experiment and Evaluation
1.4.1 Image Data Set in Low-Light Environment
1.4.2 Training Method
1.5 Quantitative and Qualitative Evaluation
1.6 Conclusion and Discussion
References
2 Dynamic Grey Wolf Optimization Algorithm Based on Quasi-Opposition Learning
2.1 Introduction
2.2 Grey Wolf Optimization Algorithm
2.3 Improved Grey Wolf Algorithm
2.3.1 Opposition-Based Learning (OBL)
2.3.2 Dynamic Search Strategy
2.3.3 Optimization Process of the Improved Grey Wolf Algorithm
2.4 Numerical Experiment and Result Analysis
2.4.1 Influence of Improved Strategy on GWO
2.4.2 Compared with Other Swarm Intelligence Optimization Algorithms
2.5 Conclusion
References
3 Image Recognition Methods Based on Deep Learning
3.1 Introduction
3.1.1 Overview of Image Recognition
3.1.2 Preprocessing in Image Recognition
3.1.3 Feature Extraction in Image Recognition
3.2 Deep Learning Model for Image Recognition
3.2.1 Feedforward Neural Network (FNN)
3.2.2 Convolutional Neural Network (CNN)
3.2.3 Recurrent Neural Network (RNN)
3.2.4 U-Net Convolutional Neural Network
3.2.5 Long Short-Term Memory Network
3.2.6 Auto-Encoder Neural Network
3.2.7 Generative Adversarial Network (GAN)
3.2.8 Deep Belief Network (DBN)
3.3 The Application of Deep Learning in Image Recognition
3.3.1 Face Recognition
3.3.2 Traffic Image Recognition
3.3.3 Medical Image Recognition
3.4 Train a Convolutional Neural Network from a Small Data Set
3.4.1 Data Sorting and Network Building
3.4.2 Data Processing and Reading
3.4.3 Image Processing
3.5 Conclusion
References
4 Longitudinal Structure Analysis and Segmentation Algorithm of Dongba Document
4.1 Overview of Dongba Hieroglyphics
4.2 Structure of Dongba Document Image
4.3 Preprocessing of Dongba Documents
4.4 Automatic Segmentation and Recognition of Columns
4.5 Experiment
4.6 Conclusion
References
5 Overview of SAR Image Change Detection Based on Segmentation
5.1 Introduction
5.2 Traditional Image Change Detection Methods
5.2.1 Image Difference Method
5.2.2 Image Ratio Method
5.2.3 Correlation Coefficient Method
5.2.4 Image Regression Method
5.3 SAR Image Change Detection Method Based on Segmentation
5.4 Simulation Results
5.5 Summary and Expectation
References
6 Full-Focus Imaging Detection of Ship Ultrasonic-Phased Array Based on Directivity Function
6.1 Introduction
6.2 Method
6.2.1 Principle Analysis
6.2.2 Full-Focusing Algorithm in Frequency–Wave Number Domain
6.3 Result Analysis
6.3.1 System Introduction
6.3.2 Test Experiment
6.4 Conclusion
References
7 A Novel Space Division Rough Set Model for Feature Selection
7.1 Introduction
7.2 Related Work
7.3 Our Approach
7.4 Experiments
7.5 Conclusion
References
8 Development of Mobile Food Recognition System Based on Deep Convolutional Network
8.1 Introduction
8.2 Related Work
8.2.1 Food Recognition Models
8.2.2 Mobile Food Recognition System
8.3 Methods
8.3.1 Deep Convolutional Neural Network
8.3.2 Training Food Recognition Models
8.3.3 Deploying the Recognition Application on Android Side
8.4 Experiment Results
8.4.1 “ChineseFood80” Dataset
8.4.2 Train and Select Food Recognition Models
8.4.3 Food Recognition Application on Android Device
8.5 Conclusion
References
9 Water Environmental Quality Assessment and Effect Prediction Based on Artificial Neural Network
9.1 Introduction of Artificial Neural Network Model for Water Environmental Quality Assessment
9.2 Prediction Model Based on Levenberg–Marquardt Optimization Algorithm
9.2.1 Time Series
9.2.2 Algorithm of Prediction Model
9.2.3 Defining the Grid Structure
9.2.4 Sample Selection and Training Methods
9.3 Predictive Analysis
9.4 Conclusion
References
10 Network Intrusion Detection Based on Apriori-Kmeans Algorithm
10.1 Introduction
10.2 Research Status
10.3 Apriori-Kmeans Algorithm
10.4 Intrusion Detection Model Based on Apriori-Kmeans Algorithm
10.5 Simulation Experiment
10.6 Summary
References
11 A Fast Heuristic k-means Algorithm Based on Nearest Neighbor Information
11.1 Introduction
11.1.1 Optimization of the Selection of Initial Centroids
11.1.2 Accelerate Approximate K-means
11.1.3 Accelerate Exact K-means
11.1.4 Our Contribution
11.2 A Heuristic K-means Algorithm
11.2.1 Narrow the Search Space of Sample Points
11.2.2 Reduce the Number of Sample Points for Reallocation
11.2.3 Algorithm Flow Chart
11.3 Experiments
11.4 Conclusion
References
12 Global Analysis of Discrete SIR and SIS Systems
12.1 Introduction
12.2 Continuous Model
12.3 Discretization of Continuous Models
12.4 The Second Discrete Model
12.5 Stability Analysis
12.6 Globally Asymptotically Stable
12.7 Conclusion
References
13 Image-Based Physics Rendering for 3D Surface Reconstruction: A Survey
13.1 Introduction
13.2 Research Status of 3D Reconstruction Based on Image
13.3 Image-Based 3D Surface Reconstruction
13.3.1 Laser Scanning Method
13.3.2 Time-Of-Flight Method
13.3.3 Structured Light Method
13.3.4 Shape from Shading Method
13.3.5 Shape from Silhouettes Method
13.3.6 Shape-from-Motion Method
13.3.7 Shape-from-Texture Method
13.3.8 Shape-from-Focus Method
13.3.9 Photometric Stereo
13.4 Summary
References
14 Insulator Detection Study Based on Improved Faster-RCNN
14.1 Introduction
14.2 Sample Expansion
14.3 Insulator Identification and Positioning
14.3.1 Faster RCNN Detection Principle
14.3.2 Improved RPN
14.3.3 Residual Networks (ResNet)
14.3.4 Multi-Scale Training
14.3.5 Comparison of Different Detection Methods
14.4 Detection of Defective Insulators
14.5 Improved Faster-RCNN
14.6 Experimental Verifications
14.6.1 Conduct a Comparative Experiment
14.6.2 Compare Experimental Results
14.7 Summary
References
15 Citrus Positioning Method Based on Camera and Lidar Data Fusion
15.1 Introduction
15.2 Method
15.2.1 The Citrus Positioning Algorithm
15.2.2 Preliminary Positioning of Pixel Coordinates of Citrus
15.2.3 Camera and Lidar Joint Calibration
15.2.4 Camera and Lidar Data Fusion
15.2.5 Conversion of Citrus Pixel Coordinates to Three-Dimensional Space Coordinates
15.3 Experiments
15.3.1 Environment of System
15.3.2 Detection Effects of Citrus
15.3.3 The Results of Camera Internal Parameter Calibration
15.3.4 The Results of Camera and Lidar Joint Calibration
15.3.5 The Results of Citrus Positioning
15.4 Conclusion
References
16 Comparative Analysis of Automatic Poetry Generation Systems Based on Different Recurrent Neural Networks
16.1 Introduction
16.2 Problem Formation
16.3 The Invariant Testbed
16.4 The Internal Logic of RNN Modules
16.5 Results
16.6 Analysis and Expectation
References
17 Grid False Data Intrusion Detection Method Based on Edge Computing and Federated Learning
17.1 Introduction
17.2 Research Status
17.3 Principles of False Data Injection Attacks
17.4 Design of Intrusion Detection Model Based on Edge Computing and Federated Learning
17.4.1 Edge Computing
17.4.2 Federated Learning
17.4.3 Framework Based on Edge Computing and Federated Learning
17.4.4 CNN-LSTM Joint Detection Model
17.5 Case Analysis
17.6 Summary
References
18 Innovative Design of Traditional Arts and Crafts Based on 3D Digital Technology
18.1 Introduction
18.2 Application Steps
18.2.1 Preparation Period
18.2.2 Design Period
18.2.3 Optimization Period
18.3 Application Direction
18.3.1 Ceramic
18.4 Conclusion
References
19 Research on the Simulation of Informationized Psychological Sand Table Based on 3D Scene
19.1 Introduction
19.2 Method
19.2.1 Design and Application
19.2.2 Hardware Design
19.2.3 Infrared Scanning Design
19.2.4 Software Design
19.3 Result Analysis
19.4 Conclusion
References
20 Research on Graphic Design of Digital Media Art Based on Computer Aided Algorithm
20.1 Introduction
20.2 Method
20.2.1 Digital Media Art Graphic Design Content
20.2.2 Shape Interpolation
20.3 Result Analysis
20.4 Conclusion
References
21 Research on Visual Communication of Graphic Design Based on Machine Vision
21.1 Introduction
21.2 Method
21.2.1 Design Process
21.2.2 Visual Analysis
21.3 Result Analysis
21.4 Conclusion
References
22 Research on the Adaptive Matching Mechanism of Graphic Design Elements Based on Visual Communication Technology
22.1 Introduction
22.2 Method
22.2.1 Hardware Design
22.2.2 Software Design
22.3 Result Analysis
22.4 Conclusion
References
23 Design of Intelligent Recognition English Translation Model Based on Improved Machine Translation Algorithm
23.1 Overview of Intelligent Translation
23.2 Design and Analysis of Intelligent Recognition English Translation Model Based on Improved Machine Translation Algorithm
23.2.1 Applications
23.2.2 Design Model
23.2.3 Experimental Analysis
23.3 Conclusion
References
24 Ore Detection Method Based on YOLOv4
24.1 Introduction
24.2 Related Work
24.2.1 Ore Classification
24.2.2 YOLOv4
24.2.3 Random Forest
24.3 Experiments
24.3.1 Implementation Details
24.3.2 YOLOv4 Object Detection
24.3.3 Neural Network Results
24.3.4 Data Augmentation
24.3.5 Auto MSRCR
24.3.6 Random Forest Classification Experiment
24.4 Conclusions
References
Author Index


Smart Innovation, Systems and Technologies 297

Lakhmi C. Jain · Roumen Kountchev · Yonghang Tai · Roumiana Kountcheva, Editors

3D Imaging—Multidimensional Signal Processing and Deep Learning: 3D Images, Graphics and Information Technologies, Volume 1

Smart Innovation, Systems and Technologies Volume 297

Series Editors Robert J. Howlett, Bournemouth University and KES International, Shoreham-by-Sea, UK Lakhmi C. Jain, KES International, Shoreham-by-Sea, UK

The Smart Innovation, Systems and Technologies book series encompasses the topics of knowledge, intelligence, innovation and sustainability. The aim of the series is to make available a platform for the publication of books on all aspects of single and multi-disciplinary research on these themes in order to make the latest results available in a readily-accessible form. Volumes on interdisciplinary research combining two or more of these areas are particularly sought. The series covers systems and paradigms that employ knowledge and intelligence in a broad sense. Its scope is systems having embedded knowledge and intelligence, which may be applied to the solution of world problems in industry, the environment and the community. It also focusses on the knowledge-transfer methodologies and innovation strategies employed to make this happen effectively. The combination of intelligent systems tools and a broad range of applications introduces a need for a synergy of disciplines from science, technology, business and the humanities. The series will include conference proceedings, edited collections, monographs, handbooks, reference books, and other relevant types of book in areas of science and technology where smart systems and technologies can offer innovative solutions. High-quality content is an essential feature for all book proposals accepted for the series. It is expected that editors of all accepted volumes will ensure that contributions are subjected to an appropriate level of reviewing process and adhere to KES quality principles. Indexed by SCOPUS, EI Compendex, INSPEC, WTI Frankfurt eG, zbMATH, Japanese Science and Technology Agency (JST), SCImago, DBLP. All books published in the series are submitted for consideration in Web of Science.

More information about this series at https://link.springer.com/bookseries/8767

Lakhmi C. Jain · Roumen Kountchev · Yonghang Tai · Roumiana Kountcheva, Editors

3D Imaging—Multidimensional Signal Processing and Deep Learning: 3D Images, Graphics and Information Technologies, Volume 1

Editors Lakhmi C. Jain KES International Shoreham-by-Sea, UK

Roumen Kountchev Technical University of Sofia Sofia, Bulgaria

Yonghang Tai Yunnan Normal University Kunming, China

Roumiana Kountcheva TK Engineering Sofia, Bulgaria

ISSN 2190-3018 ISSN 2190-3026 (electronic) Smart Innovation, Systems and Technologies ISBN 978-981-19-2447-7 ISBN 978-981-19-2448-4 (eBook) https://doi.org/10.1007/978-981-19-2448-4 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

This book (Volume 1) contains high-quality peer-reviewed research papers presented at the Third International Conference on 3D Imaging Technologies—Multidimensional Signal Processing and Deep Learning (3DIT-MSP&DL), which was arranged by IRNet China and held on December 26–28, 2021, at Yunnan Normal University in Kunming, China. The papers cover a wide range of the most topical areas in 3D image representation and technologies and in multidimensional signal, image and video processing and coding, together with related mathematical approaches and applications. The papers present unique research achievements in: color restoration of RGB-NIR images; a dynamic grey wolf optimization algorithm; image recognition methods based on deep learning; SAR image change detection based on segmentation; full-focus imaging detection of a ship ultrasonic phased array; a novel space division rough set model for feature selection; water environmental quality assessment and effect prediction based on artificial NN; network intrusion detection based on the Apriori-Kmeans algorithm; a fast heuristic K-means algorithm based on nearest neighbor information; analysis of discrete SIR and SIS models; image-based physics rendering for 3D surface reconstruction; insulator detection based on improved Faster-RCNN; a citrus positioning method based on camera and lidar data fusion; a grid false data intrusion detection method; graphic design of digital media art based on computer-aided algorithms; visual communication of graphic design based on machine vision; an adaptive matching mechanism of graphic design elements based on visual communication technology; the design of an intelligent recognition English translation model; analysis of automatic poetry generation systems based on recurrent NNs; and an ore detection method based on YOLOv4. There are also some interesting approaches in specific areas, such as: the simulation of an informationized psychological sand table based on a 3D scene; the design of traditional arts and crafts based on 3D digital technology; the development of a mobile food recognition system based on a deep convolutional network; and a longitudinal structure analysis and segmentation algorithm for Dongba documents.


The aim of the book is to present the latest achievements of the authors to a wide range of readers: IT specialists, engineers, physicians, Ph.D. students, and other specialists in the area.

Lakhmi C. Jain, Shoreham-by-Sea, UK
Roumen Kountchev, Sofia, Bulgaria
Yonghang Tai, Kunming, China
Roumiana Kountcheva, Sofia, Bulgaria

February 2022

Acknowledgments The book editors express their special thanks to the book chapter reviewers for their efforts and goodwill in helping with the successful preparation of the book. Special thanks go to Prof. Lakhmi C. Jain (Honorary Chair), Prof. Dr. Srikanta Patnaik, Prof. Dr. Junsheng Shi and Prof. Dr. Roumen Kountchev (General Chairs), Prof. Dr. Yingkai Liu (Organizing Chair), and Prof. Dr. Yonghang Tai (Programme Chair) of 3DIT-MSP&DL. The editors express their warmest thanks to the excellent Springer team which made this book possible.

Contents

1 Color Restoration of RGB-NIR Images in Low-Light Environment Using CycleGAN (Shangjin Lv, Xiaoqiao Huang, Feiyan Cheng, and Junsheng Shi) 1
2 Dynamic Grey Wolf Optimization Algorithm Based on Quasi-Opposition Learning (Tianlei Wang, Junhui Li, Renju Liu, Jinzhao Xu, Xiaoxi Hao, Kenneth Teo Tze Kin, and Jiehong Liang) 11
3 Image Recognition Methods Based on Deep Learning (Zehua Zhang) 23
4 Longitudinal Structure Analysis and Segmentation Algorithm of Dongba Document (Yuting Yang and Houliang Kang) 35
5 Overview of SAR Image Change Detection Based on Segmentation (Mengting Yuan, Zhihui Xin, Xiaoqiao Huang, Zhixu Wang, Yu Sun, Yongxin Li, and Jiayu Xuan) 47
6 Full-Focus Imaging Detection of Ship Ultrasonic-Phased Array Based on Directivity Function (Cailiang Huang and Yongzheng Li) 57
7 A Novel Space Division Rough Set Model for Feature Selection (Shulin Wu, Shuyin Xia, and Xingxin Chen) 67
8 Development of Mobile Food Recognition System Based on Deep Convolutional Network (Yue Geng) 77
9 Water Environmental Quality Assessment and Effect Prediction Based on Artificial Neural Network (Wentian An) 91
10 Network Intrusion Detection Based on Apriori-Kmeans Algorithm (Yiying Zhang, Delong Wang, Yannian Wu, Yiyang Liu, Nan Zhang, and Yingzhuo Li) 101
11 A Fast Heuristic k-means Algorithm Based on Nearest Neighbor Information (Junkuan Wang, Qing Wen, and Zizhong Chen) 111
12 Global Analysis of Discrete SIR and SIS Systems (Fang Zheng) 121
13 Image-Based Physics Rendering for 3D Surface Reconstruction: A Survey (Da Fang, Zhibao Qin, Shaojun Liang, and Renbo Luo) 131
14 Insulator Detection Study Based on Improved Faster-RCNN (Zhuangzhuang Jing) 141
15 Citrus Positioning Method Based on Camera and Lidar Data Fusion (Chengjie Xu, Can Wang, Bin Kong, Bingliang Yi, and Yue Li) 153
16 Comparative Analysis of Automatic Poetry Generation Systems Based on Different Recurrent Neural Networks (Lichao Wang) 169
17 Grid False Data Intrusion Detection Method Based on Edge Computing and Federated Learning (Yiying Zhang, Yiyang Liu, Nan Zhang, Delong Wang, Suxiang Zhang, and Yannian Wu) 179
18 Innovative Design of Traditional Arts and Crafts Based on 3D Digital Technology (Lin Lin) 189
19 Research on the Simulation of Informationized Psychological Sand Table Based on 3D Scene (Guangdong Wei, Tie Liu, Qiuyan Wang, Yuchi Tang, and Yitao Wang) 199
20 Research on Graphic Design of Digital Media Art Based on Computer Aided Algorithm (Saihua Xu) 207
21 Research on Visual Communication of Graphic Design Based on Machine Vision (Qihao Zhou and Teng Liu) 217
22 Research on the Adaptive Matching Mechanism of Graphic Design Elements Based on Visual Communication Technology (Teng Liu and Qihao Zhou) 225
23 Design of Intelligent Recognition English Translation Model Based on Improved Machine Translation Algorithm (Ting Deng) 233
24 Ore Detection Method Based on YOLOv4 (Taozhi Wang) 245
Author Index 259

About the Editors

Lakhmi C. Jain, Ph.D., Dr. H.C., ME, BE (Hons), Fellow (Engineers Australia), is with Liverpool Hope University and the University of Arad. He was formerly with the University of Technology Sydney, the University of Canberra and Bournemouth University. Professor Jain founded KES International to provide the professional community with opportunities for publication, knowledge exchange, cooperation and teaming. Involving around 5000 researchers drawn from universities and companies worldwide, KES facilitates international cooperation and generates synergy in teaching and research. KES regularly provides networking opportunities for the professional community through one of the largest conferences of its kind. His interests focus on artificial intelligence paradigms and their applications in complex systems, security, e-education, e-healthcare, unmanned air vehicles and intelligent agents.

Roumen Kountchev, Ph.D., D.Sc., is a professor at the Faculty of Telecommunications, Department of Radio Communications and Video Technologies, Technical University of Sofia, Bulgaria. His scientific areas of interest are digital signal and image processing, image compression, multimedia watermarking, video communications, pattern recognition and neural networks. Prof. Kountchev has 400 papers published in magazines and conference proceedings, 20 books, 48 book chapters and 20 patents. He has been the principal investigator of 52 research projects. At present, he is a member of the Euro Mediterranean Academy of Arts and Sciences and the President of the Bulgarian Association for Pattern Recognition (member of IAPR). He is Editor-in-Chief of the International Journal of Image Processing and Vision Science and an editorial board member of the International Journal of Reasoning-based Intelligent Systems; the International Journal Broad Research in Artificial Intelligence and Neuroscience; the KES Focus Group on Intelligent Decision Technologies; the Egyptian Computer Science Journal; the International Journal of Bio-Medical Informatics and e-Health; and the International Journal Intelligent Decision Technologies. He is a member of the Institute of Data Science and Artificial Intelligence and of the International Engineering and Technology Institute, and has been a plenary speaker at more than 30 international scientific conferences and symposia.

Dr. Yonghang Tai studied at Yunnan Normal University from 2009 to 2012, obtaining his bachelor's degree, and received his Ph.D. in Computer Science from Deakin University, Melbourne, Australia. He has hosted four funded projects, including a Deakin University Postgraduate Research Full Scholarship and projects of the Yunnan Education Commission and the Yunnan Natural Science Foundation. He has published more than 30 papers, 5 of which are indexed by SCI. He is co-editor of the International Journal of Telemedicine and Clinical Practices and of Machine Learning and Data Analytics. His research interests include VR/AR/MR in surgical simulation, physics-based rendering and medical image processing.

Roumiana Kountcheva received her M.Sc. and Ph.D. at the Technical University of Sofia, Bulgaria, and in 1992 she received the title Senior Researcher. At present, she is the Vice President of TK Engineering, Sofia. She had postgraduate trainings at Fujitsu and Fanuc, Japan. Her main scientific interests are image processing, image compression, digital watermarking, pattern recognition, image tensor representation, neural networks, CNC and programmable controllers. She has more than 180 publications, among which 34 journal papers, 21 book chapters and 5 patents. She was the PI or Co-PI of 48 scientific research projects and was a plenary speaker at 16 international scientific conferences and scientific events. She edited several books published in the Springer SIST series and is a member of international organizations: the Bulgarian Association for Pattern Recognition, the International Research Institute for Economics and Management (IRIEM) and the Institute of Data Science and Artificial Intelligence (IDSAI), and is an Honorary Member of the Honorable Editorial Board of the nonprofit peer-reviewed open-access IJBST Journal Group.

Chapter 1

Color Restoration of RGB-NIR Images in Low-Light Environment Using CycleGAN

Shangjin Lv, Xiaoqiao Huang, Feiyan Cheng, and Junsheng Shi

Abstract Obtaining natural and clear color images in a low-light environment has long been a goal. With the development of generative adversarial network (GAN) models in deep learning, the problem of near-infrared (NIR) image color restoration has received increasing attention; in particular, the Cycle Generative Adversarial Network (CycleGAN), which suits training tasks where paired training data are difficult to obtain, makes such color restoration possible. However, the general NIR color restoration task does not meet the structural requirements of the CycleGAN model, so the expected effect is difficult to obtain. In this paper, based on the basic principle of the CycleGAN model and the characteristics of RGB images containing near infrared (RGB-NIR), the feasibility of color restoration using CycleGAN is investigated. The restoration quality is compared experimentally, under the same conditions, with the latest Asymmetric Cycle Generative Adversarial Network (ACGAN) and the Convolutional and Deconvolutional Neural Network (CDNet), and the results are evaluated qualitatively and quantitatively. The experiments show that CycleGAN adapts well to the RGB-NIR color restoration task and that its overall texture restoration and color naturalness are better than those of the other two networks.

Keywords Image color restoration · Near infrared (NIR) · RGB-NIR · GAN · CycleGAN

S. Lv · X. Huang (B) · F. Cheng (B) · J. Shi School of Physics and Electronic Information, Yunnan Normal University, Kunming, China e-mail: [email protected] F. Cheng e-mail: [email protected] Yunnan Key Lab of Optic-Electronic Information Technology, Kunming, China © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 L. C. Jain et al. (eds.), 3D Imaging—Multidimensional Signal Processing and Deep Learning, Smart Innovation, Systems and Technologies 297, https://doi.org/10.1007/978-981-19-2448-4_1


1.1 Introduction

In low-light conditions, an ordinary color camera captures noisy images that have lost color and texture. Longer exposure times and auxiliary light sources are generally used to obtain a better result. Since silicon (Si) image sensors in ordinary cameras are sensitive up to about 1100 nm, into the near infrared (NIR), surveillance systems in low-light conditions usually turn on near-infrared illumination and capture gray images. In recent years, using NIR to improve the color of images has also become a research focus; for example, image sensor designs have added an extra near-infrared channel to the RGB channels to obtain four-channel (RGBN) images. Alternatively, by removing the infrared-cut filter, a traditional RGB imaging system can capture RGB images containing near infrared (RGB-NIR). Because of this convenience, RGB-NIR images are often used in target detection or artificial assistance systems in low-light conditions [1, 2]. However, in an RGB-NIR image the visible and near-infrared bands overlap, distorting the overall color toward pink. Color restoration is therefore applied to RGB-NIR images, but this problem differs from image colorization, which maps gray images to color images [3–5]. In colorization, the intensity information is already contained in the input gray image, and only the chroma information needs to be estimated from it. For an RGB-NIR image, the task is to keep the luminance information in the RGB domain while removing the chromatic contamination of the near infrared overlapped onto the RGB domain. The RGB distribution is comparatively complex and hard to constrain by any single rule, so the color restoration of RGB-NIR is also more complicated.

Most previous color restoration methods rely on interpolation and spectral decomposition (at the signal processing level) to eliminate the overlap in RGB-NIR images. Although these methods have a certain positive effect on color restoration, they blur image details. In recent years, the rapid development of deep learning has provided new ideas; in particular, convolutional neural networks (CNN) and generative adversarial networks (GAN) have brought hope of solving this kind of problem well. In 2016, Limmer et al. [6] proposed using a CNN to colorize near-infrared images. A CNN-based thermal infrared color restoration technique was proposed by Berg et al. [7] in 2018, and the CNN-based near-infrared color restoration network CDNet was proposed by Soria et al. [14] in the same period. But the generator of a CNN mostly learns a known sample distribution, which does not match the generation behavior we want. The generator of a GAN approximates an unknown distribution rather than learning a fixed sample mapping, so the mapping target can remain ambiguous and one input can correspond to multiple potential targets; GAN-based methods have proved effective in this case. CycleGAN [8], a variant of generative adversarial networks, can additionally realize transformations between different domains without one-to-one paired mappings.


Although CycleGAN does not require strict registration between image pairs, the loss functions defined in its network require the image pairs to have the same number of channels. Consequently, several works that apply CycleGAN to color restoration with visible/near-infrared overlap modify its network structure to remove the identity mapping loss function, which is what imposes the equal-channel requirement; examples are the ACGAN proposed by Sun et al. [9] and the AAFSTNet proposed by Yang et al. [10]. In this way, however, the CycleGAN generator loses the specificity of its generating network, increasing the error rate of CycleGAN and decreasing the correlation between generated samples and input samples. Our idea is instead to change the number of channels of the NIR-containing input: both RGB-NIR and RGB images have three channels, which exactly satisfies CycleGAN's conditions. Therefore, we use CycleGAN to perform color restoration of RGB-NIR images in a low-light environment.

In this paper, the feasibility of color restoration using the CycleGAN model is investigated. The paper is organized as follows. First, problems in the image color restoration process are analyzed based on the basic principle of the CycleGAN model and the characteristics of RGB-NIR images. Second, the color restoration results of CycleGAN are compared experimentally with the latest ACGAN and CDNet under the same conditions, and the results are analyzed qualitatively and quantitatively. The last section presents the conclusion and discussion.

1.2 CycleGAN Structure

1.2.1 The Overall Structure

Given the "channel consistency" required by CycleGAN and the characteristics of our RGB-NIR data set captured in a low-light environment, CycleGAN is in theory suitable for solving the color restoration problem of RGB-NIR images in low-light conditions. CycleGAN was designed for visual tasks in which it is difficult to find matching high-quality target images for the model to learn from. Its idea is to form a universal mapping from distribution domain A to distribution domain B and to learn the transformation between the two distributions rather than a one-to-one mapping between specific data A and data B. Such a mapping is highly unconstrained, so CycleGAN adds an inverse mapping and a cycle-consistency loss function to ensure that the generated distribution retains a correspondence with the input distribution.

Fig. 1.1 Schematic diagram of CycleGAN model structure

As shown in Fig. 1.1, the CycleGAN model has two generators, G_AB and G_BA, and two discriminators, D_AB and D_BA. G_AB is the generation network from distribution A to distribution B, and G_BA the generation network from distribution B to distribution A. D_AB judges whether an input image belongs to distribution A, and D_BA judges whether an input image belongs to distribution B; both discriminator models are PatchGAN structures [11–13]. A real image A passed through G_AB yields a fake image B; the fake image B and genuine type-B images are fed to the discriminator D_BA for identification. Likewise, a real image B passed through G_BA yields a fake image A, and the fake image A together with type-A images is fed to the discriminator D_AB. At the same time, each fake image is passed back through the opposite generator to reconstruct the original input, closing the cycle.

1.3 Objective Function

1.3.1 The Basic Objective Function of GANs

The loss function of a general GAN is composed of two parts. The generator network learns its transformation function (G_AB or G_BA) by minimizing a loss that measures the difference between the generated distribution and the target distribution; the greater the difference, the higher the generator's penalty. The discriminator learns to distinguish the generated distribution from the target distribution accurately by minimizing its own loss function, usually the cross-entropy between the two distributions; the larger that loss, the higher the penalty for the discriminator. Combined, the two play against each other and progress together: the generator is trained to fool the discriminator, and the discriminator is trained to better distinguish real data from generated data. The two generators and two discriminators of CycleGAN use the traditional GAN loss. The first adversarial loss, for generator G_AB and discriminator D_BA, is

L_GAN(G_AB, D_BA, A, B) = E_{b∼p_data(b)}[log D_BA(b)] + E_{a∼p_data(a)}[log(1 − D_BA(G_AB(a)))]   (1.1)

where E_{a∼p_data(a)} and E_{b∼p_data(b)} denote expectations over the distributions of RGB-NIR images (domain A) and real RGB images (domain B), respectively; log D_BA(b) is the discriminator's score for real RGB data and log(1 − D_BA(G_AB(a))) its score for the generated fake RGB data. The second adversarial loss, for generator G_BA and discriminator D_AB, is

L_GAN(G_BA, D_AB, A, B) = E_{a∼p_data(a)}[log D_AB(a)] + E_{b∼p_data(b)}[log(1 − D_AB(G_BA(b)))]   (1.2)

where log D_AB(a) scores the real RGB-NIR data and log(1 − D_AB(G_BA(b))) scores the generated fake RGB-NIR data.

1.3.2 Objective Function of Cyclic Transformation Consistency

With a large enough capacity, CycleGAN could map the same set of input images to any random arrangement of images in the target domain, where any learned mapping can induce an output distribution matching the target distribution. The adversarial loss alone therefore does not guarantee that the learned function maps a single input A_i to the desired output B_i. Hence the identity mapping loss of the cyclic transformation is introduced, defined as the difference between input a and the forward-and-back prediction G_BA(G_AB(a)), and between input b and G_AB(G_BA(b)); that is, the difference between a real image and its reconstruction. The greater the difference, the further the prediction is from the original input. The cyclic-transformation consistency loss is defined as

L_cyc(G_AB, G_BA) = E_{a∼p_data(a)}[ ||G_BA(G_AB(a)) − a||_1 ] + E_{b∼p_data(b)}[ ||G_AB(G_BA(b)) − b||_1 ]   (1.3)

where the L1 norm ||·||_1 measures the loss between real and reconstructed values: the reconstruction of a is obtained by passing a through generator G_AB and then generator G_BA, and the reconstruction of b by passing b through G_BA and then G_AB.


1.3.3 Total Objective Function of CycleGAN

The total CycleGAN loss used for training the network is defined as the sum of the two GAN losses and the identity mapping (cycle-consistency) loss:

Loss(G_AB, G_BA, D_AB, D_BA) = L_GAN(G_AB, D_BA, A, B) + L_GAN(G_BA, D_AB, A, B) + λ L_cyc(G_AB, G_BA)   (1.4)

The weighting factor λ controls the weight of the consistency loss in the total loss; the higher it is, the more important reducing the cycle-consistency loss becomes relative to the other losses. Because CycleGAN is otherwise highly unconstrained, the inverse mapping and the cycle-consistency loss must be present, which in image problems means the input and output images must have the same number of channels.
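To make Eqs. (1.1)–(1.4) concrete, the following is a minimal PyTorch sketch of how the total objective could be assembled. It is an illustration based on the formulas above, not the authors' released code: the module names (G_ab, G_ba, D_ab, D_ba), the assumption that the discriminators end in a sigmoid, and the weight value lam = 10.0 are all ours.

```python
import torch
import torch.nn.functional as F

def cyclegan_total_loss(G_ab, G_ba, D_ab, D_ba, real_a, real_b, lam=10.0):
    """Assemble Eqs. (1.1)-(1.4); real_a is an RGB-NIR batch, real_b an RGB batch."""
    fake_b = G_ab(real_a)  # fake RGB generated from RGB-NIR
    fake_a = G_ba(real_b)  # fake RGB-NIR generated from RGB

    # Adversarial losses of Eqs. (1.1) and (1.2) in cross-entropy form;
    # the discriminators are assumed to output probabilities in [0, 1].
    real_score = D_ba(real_b)
    ones, zeros = torch.ones_like(real_score), torch.zeros_like(real_score)
    loss_gan_ab = (F.binary_cross_entropy(real_score, ones) +
                   F.binary_cross_entropy(D_ba(fake_b), zeros))
    loss_gan_ba = (F.binary_cross_entropy(D_ab(real_a), ones) +
                   F.binary_cross_entropy(D_ab(fake_a), zeros))

    # Cycle-consistency loss of Eq. (1.3): L1 distance to the reconstructions.
    loss_cyc = F.l1_loss(G_ba(fake_b), real_a) + F.l1_loss(G_ab(fake_a), real_b)

    # Total objective of Eq. (1.4), with lam weighting the cycle term.
    return loss_gan_ab + loss_gan_ba + lam * loss_cyc
```

In practice the generator and discriminator terms are minimized alternately rather than jointly, which is the adversarial game described above.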

1.4 Experiment and Evaluation

1.4.1 Image Data Set in Low-Light Environment

The image data set contains RGB and RGB-NIR image pairs captured on a moonlit night by a self-designed imaging system whose key device switches the infrared filter on and off, so the image pairs are registered. The true RGB images were captured with a manual auxiliary light source in the same scene; the RGB-NIR images were captured without it. The 130 image pairs mainly contain scenes of trees, flowers, park buildings, sky, stones, etc.; 110 pairs are used for training and 20 for testing, and the size of each image is 640 × 480 × 3.

1.4.2 Training Method

All network models were run on a Linux operating system using the PyTorch framework and trained on an NVIDIA Tesla100 GPU. We use the Adam solver with a batch size of 1. All networks were trained from scratch with a learning rate of 0.002; we keep this learning rate for the first 90 epochs and linearly decay it to zero over the next 90 epochs. The input and output image sizes were both 352 × 352 × 3.
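As a concrete illustration of this schedule, a hedged PyTorch sketch follows: Adam with learning rate 0.002, a constant rate for the first 90 epochs, then a linear decay to zero over the next 90. The placeholder parameter list is our assumption; the chapter specifies only the solver, batch size, learning rate, and epoch counts.

```python
import torch

params = [torch.nn.Parameter(torch.zeros(1))]   # placeholder for the CycleGAN parameters
optimizer = torch.optim.Adam(params, lr=0.002)  # batch size 1 is handled by the data loader

def lr_factor(epoch, n_keep=90, n_decay=90):
    # Factor 1.0 for the first n_keep epochs, then a linear decay toward zero.
    return 1.0 - max(0, epoch - n_keep) / float(n_decay)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)

for epoch in range(180):
    # ... one training pass over the 110 image pairs would run here ...
    optimizer.step()
    scheduler.step()
```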


1.5 Quantitative and Qualitative Evaluation

The effect of color restoration for RGB-NIR images in a low-light environment using CycleGAN is compared experimentally with the latest ACGAN and CDNet [14] under the same conditions, and the results are evaluated qualitatively and quantitatively.

In the qualitative evaluation, we chose five different scene images; each row of Fig. 1.2 shows the same scene under the different models: (a) are the RGB-NIR images, (b) the true color images, (c) the color restorations by CDNet, (d) by ACGAN, and (e) by the investigated CycleGAN. The lower-left corner of the first scene and the upper-left corner of the fifth scene show that the CycleGAN model better improves the color of restored RGB-NIR images; the middle of the second scene and the sky in the fourth scene show that the CycleGAN model better preserves the texture information of RGB-NIR images.

Fig. 1.2 Examples of color restoration using three different networks: a RGB-NIR; b RGB; c CDNet; d ACGAN; e CycleGAN

In the quantitative evaluation, we use three image quality evaluation algorithms: Mean Absolute Error (MAE), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity (SSIM). Each metric is computed separately for the three channels of an image and then averaged to obtain the final result; all images are normalized first. The results for the 20 test samples are shown in Table 1.1.

Table 1.1 Quantitative evaluation for 20 test samples using different networks

| Network   | MAE   | SSIM  | PSNR   |
| CDNet     | 0.139 | 0.419 | 15.854 |
| ACGAN     | 0.261 | 0.213 | 13.162 |
| CycleGAN  | 0.253 | 0.227 | 13.537 |

The results show that CycleGAN outperforms the recent ACGAN for color restoration of RGB-NIR images in a low-light, small-data-set setting, although its scores are lower than those of CDNet.
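The per-channel protocol above can be sketched as follows with NumPy and scikit-image; the function is our own illustration, not the evaluation script used in the paper.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(pred, truth):
    """pred, truth: H x W x 3 images normalized to [0, 1]; returns (MAE, SSIM, PSNR)."""
    mae, ssim, psnr = [], [], []
    for c in range(3):  # each metric is computed per channel, then averaged
        p, t = pred[..., c], truth[..., c]
        mae.append(np.abs(p - t).mean())
        ssim.append(structural_similarity(t, p, data_range=1.0))
        psnr.append(peak_signal_noise_ratio(t, p, data_range=1.0))
    return float(np.mean(mae)), float(np.mean(ssim)), float(np.mean(psnr))
```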

1.6 Conclusion and Discussion

Compared with the latest ACGAN and CDNet, the CycleGAN model not only preserves texture information more comprehensively but also restores color better for RGB-NIR images from a small data set. Because a manual auxiliary light source was used to collect the real RGB images, data collection was time-consuming, so our data set is relatively small. Under such conditions, the advantage of the CDNet model is that the texture information of the generated image is well preserved, but its color recovery is very poor and easily dominated by the input image; this follows from the one-to-one correspondence between training images and the highly constrained model structure of CDNet. The advantage of the ACGAN model is that the colors of the generated images are more creative and diverse, but it produces too much or too little texture information, a consequence of its highly unconstrained model structure. CycleGAN has a cycle-consistency (identity mapping) loss function and works with unaligned image pairs, so it handles both color restoration and texture preservation well.

The three mainstream quantitative evaluation methods used in this paper mainly measure the similarity of texture information, say little about color restoration, and favor models whose outputs share the texture of their inputs. This matches the strength of the CDNet model, which is heavily influenced by its inputs, so CDNet scored higher than CycleGAN. On the whole, however, the CycleGAN model is more suitable for RGB-NIR image restoration with a small data set in a low-light environment. In future work, we will pay more attention to the texture information of generated images and explore expanding the RGB-NIR low-light data set to enhance the robustness of the model.

Acknowledgments This work is funded by grants from the National Science Foundation of China (Grant Numbers 61875171, 61865015, 61650401) and the Yunnan Education Commission of China (Grant Number ZD2014004).

References
1. Honda, H., Timofte, R., Van Gool, L.: Make my day: high-fidelity color denoising with near-infrared. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 82–90 (2015)
2. Mavadati, S.M., Sadeghi, M.T., Kittler, J.: Fusion of visible and synthesised near infrared information for face authentication. In: 2010 IEEE International Conference on Image Processing, pp. 3801–3804. IEEE (2010)
3. Cao, Y., Zhou, Z., Zhang, W., Yu, Y.: Unsupervised diverse colorization via generative adversarial networks. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 151–166. Springer, Cham (2017)
4. Guadarrama, S., Dahl, R., Bieber, D., Norouzi, M., Shlens, J., Murphy, K.: PixColor: pixel recursive colorization. arXiv preprint arXiv:1705.07208 (2017)
5. Deshpande, A., Lu, J., Yeh, M.C., Chong, M.J., Forsyth, D.: Learning diverse image colorization. arXiv e-prints, arXiv-1612 (2016)
6. Limmer, M., Lensch, H.: Infrared colorization using deep convolutional neural networks. arXiv preprint arXiv:1604.02245 (2016)
7. Berg, A., Ahlberg, J., Felsberg, M.: Generating visible spectrum images from thermal infrared. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1143–1152 (2018)
8. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232 (2017)
9. Sun, T., Jung, C., Fu, Q., Han, Q.: NIR to RGB domain translation using asymmetric cycle generative adversarial networks. IEEE Access 7, 112459–112469 (2019)
10. Yang, X., Chen, J., Yang, Z., Chen, Z.: Attention-guided NIR image colorization via adaptive fusion of semantic and texture clues. arXiv preprint arXiv:2107.09237 (2021)
11. Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: Proceedings of IEEE Conference on CVPR, Honolulu, Hawaii, USA (2017)
12. Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z.H., Shi, W.Z.: Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4681–4690 (2017)
13. Li, C., Wand, M.: Precomputed real-time texture synthesis with Markovian generative adversarial networks. In: European Conference on Computer Vision, pp. 702–716. Springer, Cham (2016)
14. Soria, X., Sappa, A.D., Hammoud, R.I.: Wide-band color imagery restoration for RGB-NIR single sensor images. Sensors 18(7), 2059 (2018)

Chapter 2

Dynamic Grey Wolf Optimization Algorithm Based on Quasi-Opposition Learning

Tianlei Wang, Junhui Li, Renju Liu, Jinzhao Xu, Xiaoxi Hao, Kenneth Teo Tze Kin, and Jiehong Liang

Abstract Although the Grey Wolf Optimization algorithm (GWO) has a simple, easy-to-understand structure, it suffers from an imbalance between exploration and exploitation. To address this problem, this paper proposes an improved Grey Wolf Optimization algorithm based on a quasi-opposition learning strategy and a dynamic search strategy. The improvement is verified on nine benchmark test functions; the experimental results show that the improved Grey Wolf Optimization algorithm has better search performance, and comparisons with the cuckoo search (CS) and particle swarm optimization (PSO) algorithms confirm its advantages.

Keywords Swarm intelligence · Grey Wolf Optimization algorithm · Quasi-opposition-based learning

2.1 Introduction

In the field of scientific research, optimization problems are universal. To solve them, many different kinds of optimization algorithms have been proposed by researchers in recent years, with the main focus on metaheuristic algorithms.

T. Wang · J. Li · R. Liu · X. Hao (B) Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen 529020, Guangdong, China e-mail: [email protected] T. Wang School of Mechanical, Electronic and Control Engineering, Beijing Jiaotong University, Beijing 100044, China J. Xu School of Mathematics and Computer Science, Wuyi University, Jiangmen 529020, Guangdong, China K. T. T. Kin Faculty of Engineering, University Malaysia Sabah, Kota Kinabalu, Sabah, Malaysia J. Liang Jinming Machinery Manufacturing Co., LTD, Shantou, China © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 L. C. Jain et al. (eds.), 3D Imaging—Multidimensional Signal Processing and Deep Learning, Smart Innovation, Systems and Technologies 297, https://doi.org/10.1007/978-981-19-2448-4_2


Table 2.1 Nomenclature

| BOGWO  | Grey Wolf Optimization algorithm based on quasi-opposition learning |
| DGWO   | Dynamic Grey Wolf Optimization algorithm |
| BODGWO | Dynamic Grey Wolf Optimization algorithm based on quasi-opposition learning |

Metaheuristic algorithms can generally be divided into four types: evolutionary algorithms, physics-based algorithms, swarm intelligence algorithms and human-based algorithms. Swarm intelligence algorithms are inspired by the social behaviour of animal groups; an example is the Grey Wolf Optimization algorithm [1]. Evolutionary algorithms simulate the evolutionary behaviour of organisms, such as genetic algorithms [2]. Physics-based algorithms are inspired by the physical laws of nature, such as the gravity search algorithm [3]. Human-based algorithms are inspired by human behaviour, such as the social evolution and learning optimization algorithm [4]. Among swarm intelligence algorithms, GWO has a simple, easy-to-understand structure and few initialization control factors, and it is the research object of this paper. To improve the accuracy and convergence speed of GWO and to balance its exploration and exploitation capabilities, reference [5] uses a normalized mapping method to add chaotic sequences to the search process of GWO, which enhances the algorithm's exploration and exploitation abilities. Reference [6] introduced a hunting search strategy based on dimensional learning to share information among wolves, enhance the balance between global and local search capability, and maintain population diversity. In this paper, quasi-opposition learning and a dynamic search strategy are applied to GWO, and experiments show that the improved GWO performs better. The algorithm names are provided in Table 2.1.

2.2 Grey Wolf Optimization Algorithm

Mirjalili [1] proposed the Grey Wolf Optimization algorithm (GWO), a metaheuristic inspired by the social hierarchy and hunting behaviour of grey wolves. According to leadership level, grey wolves fall into four ranks: α, β, δ and ω; their social hierarchy is shown in Fig. 2.1. The α, at the first level of the pyramid, is the leader of the pack. The β, at the second level, is the dominant group next to α. The δ, at the third level, is dominated by α and β. The ω wolves at the fourth level are the lowest rank, also known as search wolves; they follow the instructions of the first three ranks and are responsible for searching for and catching prey. During hunting, the encircling behaviour of the grey wolves is defined as follows:


Fig. 2.1 Social hierarchy of grey wolves

D = |C · X_P(t) − X(t)|   (2.1)

X(t + 1) = X_P(t) − A · D   (2.2)

In Formula (2.1), D represents the distance between an individual grey wolf and the prey, and Formula (2.2) updates the position of the grey wolf. Here t denotes the current iteration, A and C are coefficient vectors computed by Formulas (2.3) and (2.4), and X_P and X are the position vectors of the prey and of the grey wolf, respectively.

A = 2a · r_1 − a   (2.3)

C = 2 · r_2   (2.4)

a = 2 − 2t / t_max   (2.5)

where a is the equilibrium control factor, which decreases linearly from 2 to 0; t is the iteration counter; t_max is the maximum iteration number; and r_1 and r_2 are random vectors with components in (0, 1). Grey wolves can locate prey by scent and surround it. When prey is found, the pack surrounds it, led by the α wolf. Formula (2.6) gives the mathematical model of an individual grey wolf tracking the prey:

D_α = |C_1 · X_α − X|,  D_β = |C_2 · X_β − X|,  D_δ = |C_3 · X_δ − X|   (2.6)

where D_α, D_β, D_δ represent the distances between α, β, δ and the other individuals, respectively; X_α, X_β, X_δ represent the current positions of α, β, δ; and C_1, C_2, C_3 are random vectors generated by Formula (2.4).

X_1 = X_α − A_1 · D_α,  X_2 = X_β − A_2 · D_β,  X_3 = X_δ − A_3 · D_δ   (2.7)

X(t + 1) = (X_1 + X_2 + X_3) / 3   (2.8)

Formula (2.8) defines the updated location of the next generation of grey wolves.
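To make Formulas (2.1)–(2.8) concrete, here is a minimal NumPy sketch of one standard GWO position update. The function name and interface are ours, and this Python rendering stands in for whatever implementation the authors used.

```python
import numpy as np

def gwo_step(wolves, x_alpha, x_beta, x_delta, t, t_max, rng):
    """One position update per Formulas (2.1)-(2.8); wolves has shape (N, D)."""
    a = 2.0 - 2.0 * t / t_max                       # Formula (2.5)
    new_wolves = np.empty_like(wolves)
    for i, x in enumerate(wolves):
        moves = []
        for leader in (x_alpha, x_beta, x_delta):
            A = 2.0 * a * rng.random(x.shape) - a   # Formula (2.3)
            C = 2.0 * rng.random(x.shape)           # Formula (2.4)
            D = np.abs(C * leader - x)              # Formula (2.6)
            moves.append(leader - A * D)            # Formula (2.7)
        new_wolves[i] = np.mean(moves, axis=0)      # Formula (2.8)
    return new_wolves
```

A call would supply a generator such as rng = np.random.default_rng(0) and the current α, β, δ positions selected by fitness.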

2.3 Improved Grey Wolf Algorithm

2.3.1 Opposition-Based Learning (OBL)

When a heuristic algorithm tries to optimize a problem with an optimal solution, the initial solutions are usually generated randomly, and the optimization process gradually reduces the gap between the current solutions and the optimum. If an initial solution lies near the optimum, the optimization can complete quickly; if it lies far away, the optimization takes a long time. A quasi-opposition learning strategy is therefore introduced to improve the convergence rate of the algorithm [7]. Suppose x is a real number in the range [a, b] with a ≤ b. The opposite number x̌ of x is given by Formula (2.9):

x̌ = a + b − x   (2.9)

To increase the diversity of the grey wolf population further, the quasi-opposite number x̃ of x [8] is adopted, defined by Formula (2.10):

x̃ = rand((a + b)/2, x̌)   (2.10)

where x̃ is a random number between (a + b)/2 and x̌. The same definitions apply in higher dimensions. Let P = {x_1, x_2, …, x_D} be a point in the D-dimensional search space, where each x_i (1 ≤ i ≤ D) is a real number in the range [a_i, b_i]. The opposite point P̌ = {x̌_1, x̌_2, …, x̌_D} and the quasi-opposite point P̃ = {x̃_1, x̃_2, …, x̃_D} are given by

x̌_i = a_i + b_i − x_i   (2.11)

x̃_i = rand · (x̌_i − (a_i + b_i)/2) + (a_i + b_i)/2   (2.12)

In this paper, the quasi-opposite number generation method of the opposition-based learning strategy is used to increase the diversity of the search wolves and to reduce the time the grey wolf pack needs to hunt the prey.

Fig. 2.2 Optimization process of standard GWO
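A minimal sketch of Formulas (2.11)–(2.12), generating the quasi-opposite point of a candidate solution; the vectorized NumPy form and the function name are our assumptions.

```python
import numpy as np

def quasi_opposite(x, lower, upper, rng):
    """Quasi-opposite point of x per Formulas (2.11)-(2.12); all args shape (D,)."""
    center = (lower + upper) / 2.0
    opposite = lower + upper - x                   # Formula (2.11): opposite point
    # Formula (2.12): a random point between the interval center and the opposite point
    return center + rng.random(x.shape) * (opposite - center)
```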

2.3.2 Dynamic Search Strategy

As shown in Fig. 2.2, in the standard GWO every search wolf approaches the same optimal solution and the encirclement around the prey gradually shrinks, but the population diversity also gradually decreases during this period. To preserve the diversity of the grey wolf population in the late update stage, a dynamic structure [9] is introduced. The dynamic structure updates the search wolves in real time, so the three dominant wolves may change during the update process; this merges the two nested loops of the standard grey wolf algorithm into a single one and accelerates the convergence of the algorithm. The specific structure is shown in Fig. 2.3.

Fig. 2.3 Dynamic grey wolf algorithm optimization process


2.3.3 Improved the Grey Wolf Algorithm to Optimize the Process

Step 1: Initialize the population.
Step 2: Find the top three wolves and update the parameters A, a, C and D.
Step 3: Change the locations of the ω wolves and generate their quasi-opposite numbers.
Step 4: Update the fitness values of the ω wolves and their quasi-opposite numbers, and select the top three leading wolves α, β and δ.
Step 5: Check whether all wolves have been updated. If so, go to Step 6; if not, return to Step 2.
Step 6: Judge whether the maximum iteration number has been reached. If so, stop the programme and output the optimal solution and the optimal fitness value; if not, return to Step 2.

2.4 Numerical Experiment and Result Analysis

2.4.1 Influence of Improved Strategy on GWO

The simulation environment was MATLAB R2016b on an Intel(R) Core(TM) i5-8265U CPU @ 1.60 GHz with 8 GB of memory under Windows 10. In this experiment, nine benchmark functions were selected to analyse the performance of the BODGWO algorithm: F1–F3 are unimodal test functions, F4–F6 are multimodal test functions, and F7–F9 are fixed-dimension multimodal test functions. The multimodal test functions are mainly used to estimate the global optimization ability of an algorithm, and the unimodal test functions are mainly used to estimate its convergence rate and accuracy. Table 2.2 shows the details of the benchmark functions. In order to show that the improved algorithm BODGWO is superior to the standard GWO algorithm, experiments were conducted on the algorithms GWO, BOGWO, DGWO and BODGWO. The population size was set to N = 30, the search dimension to D = 30 and the maximum iteration number to Gmax = 500. Meanwhile, to reduce random error, each benchmark function was run independently 20 times in each experiment, and the average value was taken and recorded. The ordinate in Fig. 2.4 is the logarithm of the fitness value of the test function. It can be seen from Fig. 2.4 that BODGWO has a fast convergence rate and high convergence accuracy on the benchmark functions. For example, on F1, BODGWO finds the optimal value at around 250 generations, while GWO and DGWO fall into a local optimum at around 250 generations and do not escape it until generation 500. The same holds for F2 and F3: BODGWO is the fastest of the four algorithms to find the optimal value, while GWO and DGWO never find

Table 2.2 Benchmark functions

Function name | Formula | Bounds | f_min
Sphere | F1 = Σ_{i=1}^{d} x_i² | [−100, 100] | 0
Schwefel's problem 2.22 | F2 = Σ_{i=1}^{d} |x_i| + Π_{i=1}^{d} |x_i| | [−10, 10] | 0
Schwefel's problem 1.2 | F3 = Σ_{i=1}^{d} (Σ_{j=1}^{i} x_j)² | [−100, 100] | 0
Rastrigin | F4 = 10d + Σ_{i=1}^{d} (x_i² − 10 cos(2πx_i)) | [−5.12, 5.12] | 0
Ackley | F5 = −20 exp(−0.2 √((1/d) Σ_{i=1}^{d} x_i²)) − exp((1/d) Σ_{i=1}^{d} cos(2πx_i)) + 20 + e | [−32, 32] | 0
Griewank | F6 = Σ_{i=1}^{d} x_i²/4000 − Π_{i=1}^{d} cos(x_i/√i) + 1 | [−600, 600] | 0
Kowalik's function | F7 = Σ_{i=1}^{11} [a_i − x_1(b_i² + b_i x_2)/(b_i² + b_i x_3 + x_4)]² | [−5, 5] | 0.0003
Branin | F8 = (x_2 − (5.1/4π²) x_1² + (5/π) x_1 − 6)² + 10(1 − 1/8π) cos(x_1) + 10 | x_1 ∈ [−5, 10], x_2 ∈ [0, 15] | 0.39788
Goldstein–Price | F9 = [1 + (x_1 + x_2 + 1)² (19 − 14x_1 + 3x_1² − 14x_2 + 6x_1x_2 + 3x_2²)] × [30 + (2x_1 − 3x_2)² (18 − 32x_1 + 12x_1² + 48x_2 − 36x_1x_2 + 27x_2²)] | [−2, 2] | 3


Fig. 2.4 Evolution curve of fitness value of benchmark function

the optimal value. On the multimodal test functions, the BODGWO algorithm converges within roughly the first 30 generations and achieves higher convergence accuracy than the other three algorithms. However, on the fixed-dimension multimodal test functions the performance of the four algorithms is almost the same; for example, on F8 and F9 the convergence speed and accuracy are nearly identical. In Table 2.3, the mean and best values reflect the convergence accuracy of an algorithm, while the standard deviation reflects its search behaviour: the smaller the standard deviation, the more concentrated the solutions, indicating strong local exploitation ability; the larger the standard deviation, the more dispersed the solutions, indicating strong global search ability. In Tables 2.3 and 2.4, the DGWO, BOGWO and BODGWO algorithms all achieve lower mean values than the original GWO algorithm on the nine benchmark functions, which indicates higher convergence accuracy. Their standard deviations are also lower than those of GWO, indicating that the improved algorithms have strong local exploitation capability in the later stage.


Table 2.3 Comparison of GWO algorithm results in benchmark function

Benchmark function | Optimization algorithm | Best | Mean | STD
F1 | GWO | 1.2E−27 | 1.2E−27 | 6.78E−31
F1 | BOGWO | 0 | 0 | 0
F1 | DGWO | 1.66E−61 | 1.66E−61 | 4.56E−66
F1 | BODGWO | 0 | 0 | 0
F2 | GWO | 7.52E−17 | 7.53E−17 | 2.08E−20
F2 | BOGWO | 1.2E−203 | 1.6E−203 | 0
F2 | DGWO | 1.94E−35 | 1.94E−35 | 1.84E−40
F2 | BODGWO | 0 | 0 | 0
F3 | GWO | 8.62E+03 | 8.82E+03 | 2.59E+02
F3 | BOGWO | 0 | 0 | 0
F3 | DGWO | 8.11E+03 | 8.11E+03 | 4.21E−01
F3 | BODGWO | 0 | 0 | 0
F4 | GWO | 3.02 | 3.73 | 6.19E−3
F4 | BOGWO | 0 | 1.78E−17 | 5.32E−17
F4 | DGWO | 1.15 | 1.15 | 1.61E−05
F4 | BODGWO | 0 | 2.66E−17 | 4.14E−17
F5 | GWO | 1.02E−13 | 1.06E−13 | 0
F5 | BOGWO | 2.49E−15 | 4.02E−15 | 6.16E−16
F5 | DGWO | 2.68E−14 | 3.04E−14 | 0
F5 | BODGWO | 8.88E−16 | 3.72E−15 | 1.29E−15
F6 | GWO | 0 | 5.55E−17 | 0
F6 | BOGWO | 0 | 2.22E−18 | 4.42E−18
F6 | DGWO | 0 | 5.55E−17 | 0
F6 | BODGWO | 0 | 6.66E−18 | 1.19E−17
F7 | GWO | 3.46E−03 | 3.47E−03 | 9.31E−06
F7 | BOGWO | 3.76E−04 | 3.77E−04 | 6.4E−07
F7 | DGWO | 1.12E−02 | 1.12E−02 | 2.2E−09
F7 | BODGWO | 3.26E−04 | 3.26E−04 | 2.45E−09
F8 | GWO | 0.397889 | 0.398288 | 5.20E−04
F8 | BOGWO | 0.397888 | 0.398141 | 3.68E−04
F8 | DGWO | 0.397888 | 0.397888 | 8.84E−08
F8 | BODGWO | 0.397888 | 0.397888 | 1.16E−07
F9 | GWO | 3.000046 | 3.000934 | 1.25E−03
F9 | BOGWO | 3.000018 | 3.001031 | 1.50E−03
F9 | DGWO | 15.15004 | 15.15004 | 5.91E−07
F9 | BODGWO | 3.000042 | 3.000042 | 6.15E−08


Table 2.4 Comparison of results of different swarm intelligence algorithms in test functions

Benchmark function | Optimization algorithm | Best | Mean | STD
F1 | PSO | 1.41E+02 | 2.61E+05 | 2.89E+04
F1 | CS | 1.44E+01 | 3.21E+01 | 1.44E+01
F1 | BODGWO | 0 | 0 | 0
F2 | PSO | 4.58 | 3.07E+29 | 3.24E+29
F2 | CS | 3.29 | 5.29 | 1.61
F2 | BODGWO | 0 | 0 | 0
F3 | PSO | 9.94E+04 | 1.14E+09 | 8.5E+08
F3 | CS | 5.66E+04 | 1.48E+05 | 7.7E+04
F3 | BODGWO | 0 | 0 | 0
F4 | PSO | 6.55E+01 | 8.49E+02 | 8.16E+01
F4 | CS | 9.81E+01 | 1.43E+02 | 3.01E+01
F4 | BODGWO | 0 | 0 | 0
F5 | PSO | 7.94 | 2.17E+01 | 9.69E−02
F5 | CS | 8.63 | 1.44E+01 | 4.37
F5 | BODGWO | 8.88E−16 | 3.62E−15 | 1.32E−15
F6 | PSO | 7.11E−03 | 9.09E+01 | 4.17
F6 | CS | 8.69E−07 | 1.47E−02 | 3.16E−02
F6 | BODGWO | 0 | 5.37E−18 | 1.06E−17
F7 | PSO | 8.26E−03 | 1.06E+08 | 5.4E+08
F7 | CS | 7.33E−04 | 1.15E−02 | 1.61E−02
F7 | BODGWO | 3.57E−04 | 3.57E−04 | 6.63E−09
F8 | PSO | 0.401477 | 3.10E+02 | 1.33E+02
F8 | CS | 0.397887 | 0.628976 | 0.669416
F8 | BODGWO | 0.397888 | 0.397888 | 4.55E−08
F9 | PSO | 3.019255 | 2.75E+05 | 2.98E+05
F9 | CS | 3.000001 | 14.23137 | 2.30E+01
F9 | BODGWO | 3.000038 | 3.000038 | 4.86E−08

2.4.2 Compared with Other Swarm Intelligence Optimization Algorithms

According to the data in Table 2.4, the proposed BODGWO algorithm has great advantages over the PSO and CS algorithms: BODGWO finds the global optimum on the unimodal test functions F1–F3 and the multimodal test functions F4 and F6, and its convergence accuracy and convergence speed on the test functions F5 and F7–F9 are better than those of PSO and CS. Therefore, the improved grey wolf algorithm BODGWO has better performance and can be applied to solve some practical optimization problems in society.


2.5 Conclusion

In this paper, based on the standard grey wolf algorithm, the original search structure is changed to a dynamic search structure, and the quasi-opposition learning strategy is added to ensure that the diversity of the population is maintained during the iterations of the algorithm. The dynamic search structure improves the balance between global search capability and local search capability. Experiments on nine benchmark functions show that the BODGWO algorithm has higher convergence accuracy and faster convergence speed than the original algorithm, the cuckoo search algorithm, and the particle swarm optimization algorithm, indicating that the improved grey wolf algorithm has strong performance and can be applied to some engineering optimization problems. Although the improved grey wolf algorithm converges fast, it also suffers from high computational complexity; these two demands conflict, but it is precisely such conflicts that drive the evolution of the grey wolf algorithm, so its performance still has room for improvement in the future.

Acknowledgements This work was supported by the education reform Projects of Guangdong Province (GDJX2018011), the Science and Technology Project of Jiangmen City (2020JC01035), the 2020 College Students' Innovative Entrepreneurial Training Plan Program (202011349028), the 2021 College Students' Innovative Entrepreneurial Training Plan Program, and the 2021 Enping Industry-University-Research Alliance project of Wuyi University.

References

1. Mirjalili, S., Mirjalili, S.M., Lewis, A.: Grey Wolf optimizer. Adv. Eng. Softw. 69, 46–61 (2014)
2. Paterna, S., Santoni, M., Bruzzone, L.: An approach based on multiobjective genetic algorithms to schedule observations in planetary remote sensing missions. IEEE J. Sel. Topics Appl. Earth Observ. Rem. Sens. 13, 4714–4727 (2020)
3. Rashedi, E., Nezamabadi-Pour, H., Saryazdi, S.: GSA: a gravitational search algorithm. Inf. Sci. 179(13), 2232–2248 (2009)
4. Kumar, M., Kulkarni, A.J., Satapathy, S.C.: Socio evolution and learning optimization algorithm: a socio-inspired optimization methodology. Futur. Gener. Comput. Syst. 81, 252–272 (2017)
5. Saxena, A., Kumar, R., Das, S.: β-chaotic map enabled grey wolf optimizer. Appl. Soft Comput. 75, 84–105 (2019)
6. Nadimi-Shahraki, M.H., Taghian, S., Mirjalili, S.: An improved grey wolf optimizer for solving engineering problems. Exp. Syst. Appl. 166, 113917 (2020)
7. Yu, X., Xu, W., Li, C.: Opposition-based learning grey wolf optimizer for global optimization. Knowl.-Based Syst. 226, 107139 (2021)
8. Chen, H., Li, W., Yang, X.: A whale optimization algorithm with chaos mechanism based on quasi-opposition for global optimization problems. Exp. Syst. Appl. 158, 113612 (2020)
9. Zhang, X., Zhang, Y., Ming, Z.: Improved dynamic grey wolf optimizer. Frontiers Inform. Technol. Electron. Eng. 22(6), 877–890 (2021)

Chapter 3

Image Recognition Methods Based on Deep Learning

Zehua Zhang

Abstract With the development of science and technology worldwide, computer technology has gradually been widely used in various fields such as the military and biomedical domains. In this situation, people's requirements for image recognition are increasing. However, the efficiency of manual image recognition is very low, and image recognition has always been a weak field in computer technology. This paper introduces various deep learning models such as feedforward neural networks, convolutional neural networks, and recursive neural networks; we discuss how these models work and then compare and analyze the different networks. In this paper, we also try to train a convolutional neural network from a small data set, using the "dogs-vs-cats" data set from Kaggle. In the "dogs-vs-cats" experiment, some basic tasks in image processing are discussed, such as feature extraction, which is one of the most important steps in image processing; finally, the machine should be able to distinguish between pictures of dogs and cats. Moreover, we also discuss some preliminary applications of deep learning in image recognition.

Keywords Deep learning · Image recognition · Machine learning

3.1 Introduction

3.1.1 Overview of Image Recognition

Image recognition means using computer technology to process and analyze images, including extracting image features and classifying images. Image recognition generally consists of three steps: image preprocessing, feature extraction, and image recognition. Image preprocessing improves recognition accuracy by removing noise and interference from images to enhance their useful information; feature extraction transforms images into another form, such as using a vector to represent each image; image recognition summarizes the information produced by the previous procedures and decides which category each image belongs to. Nowadays, image comparison is one of the most efficient methods for image recognition. For instance, Bayesian classification is an image recognition method that compares images by comparing their specific information with template information.

3.1.2 Preprocessing in Image Recognition

Preprocessing filters the information in the images: helpful information is kept, and useless information is erased. One central part of image preprocessing is image denoising. The general process of image preprocessing is image grayscaling, image geometric transformation, and image enhancement. Grayscaling is the process of transforming a color image into a grayscale image. A color image always has three components, R, G and B, and a weighted sum of the three components is used to obtain the grayscale image (note that a grayscale image has only one component instead of three). A pixel with a larger gray value is brighter (the maximum pixel value, 255, shows white), and a pixel with a smaller gray value is darker (the minimum pixel value, 0, shows black). Another significant part of preprocessing is color analysis and (eventually) color correction.
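For example, a minimal NumPy sketch of the weighted sum (the ITU-R BT.601 luma weights used here are our assumption; the chapter does not specify exact weights):

```python
import numpy as np

def rgb_to_gray(img):
    """Convert an (H, W, 3) uint8 RGB image to a single-channel grayscale image.

    Uses a weighted sum of the R, G, B components; 0 shows black
    and 255 shows white, as described in the text.
    """
    weights = np.array([0.299, 0.587, 0.114])   # assumed BT.601 weights
    return (img[..., :3].astype(np.float64) @ weights).astype(np.uint8)
```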

3.1.3 Feature Extraction in Image Recognition

Image feature extraction is the second and a very significant step of image recognition. The original image, based on pixels, is low-level signal data of large volume. A classifier has no way to judge these raw pixels; it can only recognize the high-level information extracted from the content of the input image, namely the image features. This process is called feature extraction [1]. The features of an image are like the face of a person: people memorize another person's facial features in their brains, and similarly, machines can extract the features of various images. There are several classic feature extraction algorithms: the scale-invariant feature transform (SIFT), the histogram of oriented gradients (HOG), and local binary patterns (LBP). Table 3.1 compares these three methods.


Table 3.1 Some advantages and disadvantages of the three algorithms

Algorithm | Advantages | Disadvantages
SIFT | It can generate large numbers of features that densely cover the image over the full range of scales and locations [2] | It is still quite slow, costs a long time, and is not effective for low-powered devices [3]
HOG | It keeps good invariance to geometrical and optical deformation of images | The feature dimension is too large; the amount of calculation is huge; it cannot handle occlusion in images
LBP | It is not sensitive to light; it has fast computing speed | It is not able to extract texture features of large size and structure [4]

3.2 Deep Learning Model for Image Recognition

3.2.1 Feedforward Neural Network (FNN)

The feedforward neural network is one of the simplest neural networks: each neuron is connected only to neurons in the previous layer, and there is no feedback between layers. An FNN has three kinds of layers: an input layer, hidden layers, and an output layer. Figure 3.1 shows a basic three-layer feedforward neural network.

Fig. 3.1 A basic three-layer feedforward neural network

The number of layers is called the depth of the model. The nodes in each layer represent neurons, and the number of nodes in each layer is called the width of the model. This connection pattern is the key to the FNN architecture [5]. An FNN transmits information through the following formulas:

z^(l) = W^(l) · a^(l−1) + b^(l),  a^(l) = f_l(z^(l))    (3.1)

The two formulas can also be combined:


z^(l) = W^(l) · f_{l−1}(z^(l−1)) + b^(l)    (3.2)

Here, z^(l) is the net input of the neurons in the lth layer; a^(l) is the output of the neurons in the lth layer; b^(l) is the bias from the (l − 1)th layer to the lth layer; and W^(l) is the weight matrix from the (l − 1)th layer to the lth layer. Although the FNN is one of the simplest networks, researchers are still advancing research on FNNs, further increasing their accuracy and computing speed through improved algorithms.
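A minimal sketch of this forward pass (ours; the tanh activation is an illustrative choice):

```python
import numpy as np

def fnn_forward(x, weights, biases, f=np.tanh):
    """Forward pass per Formula (3.1).

    weights and biases are lists of per-layer matrices W(l) and vectors b(l);
    x is the network input a(0).
    """
    a = x
    for W, b in zip(weights, biases):
        z = W @ a + b   # net input of layer l:  z(l) = W(l) a(l-1) + b(l)
        a = f(z)        # output of layer l:     a(l) = f_l(z(l))
    return a
```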

3.2.2 Convolutional Neural Network (CNN)

The convolutional neural network is a special deep feedforward network, generally composed of four kinds of layers: the input layer, convolutional layers, pooling layers, and fully connected layers. Figure 3.2 shows the construction of a CNN [6]. For the input layer, the feature map is the image itself: if it is a grayscale image, the depth is 1; if it is a color image, the depth is 3. The function of a convolutional layer is to extract features from a local region; convolution kernels of different sizes act like different feature extractors. Convolution kernels and neurons are one-dimensional, but convolutional neural networks mainly process two-dimensional images, so neurons are usually organized into layers with a three-dimensional structure to make full use of the local features of images. The pooling layer is mainly used to reduce the number of parameters: although the convolutional layer can significantly reduce the number of connections in the network, the number of neurons in the feature maps does not decrease significantly, so pooling layers are used to reduce the feature dimension and the number of parameters. The function of the

Fig. 3.2 The construction of CNN with an input layer, a convolutional layer, a pooling layer, a fully connected layer, and a classifier layer


fully connected layer is to connect the outputs of the previous layer to the nodes of the next layer. In recent years, smaller convolution kernels and deeper structures have become more common, making pooling layers smaller and smaller; therefore, in the currently popular convolutional networks, the utilization rate of pooling layers is gradually decreasing.

3.2.3 Recursive Neural Network (RNN)

The RNN, also known as a temporal recursive neural network, is a kind of neural network with short-term memory capacity. An RNN has three layers: an input layer, a hidden layer, and an output layer. Figure 3.3 shows the structure of a basic RNN [7]. When the network receives the input x_t, the value of the hidden layer is s_t and the output value is o_t. The value of s_t depends not only on x_t but also on s_{t−1}. Therefore, we obtain the formulas below (where U is the weight matrix from the input layer to the hidden layer, W is the weight matrix of the recurrent hidden-layer connection, and V is the weight matrix from the hidden layer to the output layer):

o_t = g(V · s_t),  s_t = f(U · x_t + W · s_{t−1})    (3.3)

RNNs are mainly used to solve sequence data problems: in the RNN model, the network remembers previous information and applies it to the computation of the current output. Unlike the feedforward network and the convolutional neural network, the neurons in the hidden layer of a recurrent neural network are interconnected. In recent years, RNNs have been applied mostly in natural language processing, including speech recognition, language modeling, and machine translation.
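A minimal sketch of one recurrence step of Formula (3.3) (ours; tanh and the identity output function are illustrative choices):

```python
import numpy as np

def rnn_step(x_t, s_prev, U, W, V, f=np.tanh, g=lambda z: z):
    """One step of the recurrence: the new state depends on x_t and s_prev."""
    s_t = f(U @ x_t + W @ s_prev)   # hidden state: the short-term memory
    o_t = g(V @ s_t)                # output at time t
    return o_t, s_t
```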

Fig. 3.3 The structure of an RNN


3.2.4 U-Net Convolutional Neural Network

The typical characteristic of the U-Net network is its U-shaped symmetric structure, with convolution layers on the left and up-sampling layers on the right. The weights of the network are initialized, and then the model can be trained; the convolution-layer structure of some existing networks and their corresponding trained weight files can also be used for the training computation. In deep learning model training, if existing weight files can be used, training can be greatly accelerated. Another feature of the model is that the feature map of each convolution layer of the U-Net is concatenated to the corresponding up-sampling layer, so that the feature maps of every layer can be used effectively in subsequent computation. The U-Net convolutional neural network is very suitable for medical image analysis. To be more specific, medical images need more high-resolution information because of their fuzzy boundaries and complex gradients; moreover, the internal structure of the human body is relatively fixed, and the distribution of segmentation targets in human body images is very regular, with simple and clear semantics, so low-resolution information can provide this context for target object recognition.

3.2.5 Long Short-Term Memory Network

LSTM is a special kind of RNN. The difference between the two is that an ordinary RNN has only one state inside a single loop structure, whereas an LSTM has four interacting components inside a single recurrent structure, also known as a cell. In contrast to the RNN, the LSTM keeps a persistent cell state between loop structures, which is used to decide which information to forget and which to pass on. LSTMs are specifically designed to address long-term dependency problems; remembering information over a long period of time is an essential skill for them. All recursive neural networks have a chain form that repeats a neural network module. In standard RNNs, this repeated module has only a very simple structure, such as a single tanh layer. LSTMs also have this chain structure, but the repeating module is different from the RNN structure mentioned above: instead of one simple neural network layer there are four, which interact in a special way. As a deep learning technique, LSTM has a complex basic structure and high computational complexity, which makes deep stacking difficult; for example, Google Translate applies only 7–8 layers of the LSTM network structure.


3.2.6 Auto-Encoder Neural Network

The auto-encoder is an unsupervised neural network model. It can learn the implied features of the input data (this process is called encoding) and reconstruct the original input data from the learned new features (this process is called decoding). Intuitively, this network can be used for feature dimension reduction, similar to PCA; however, its capability is stronger than PCA, because a neural network model can extract more effective new features. Besides dimension reduction, the new features learned by the auto-encoder can be fed into a supervised learning model, so the auto-encoder can serve as a feature extractor. Building an auto-encoder requires three steps: building the encoder, building the decoder, and setting up a loss function that measures the information lost to compression. Encoders and decoders are generally parameterized functions with differentiable loss functions, so their parameters can be optimized by minimizing the loss function, for example with SGD. For instance, an auto-encoder can be viewed as two cascaded networks. The first network is an encoder that takes input x and converts it to the signal y via the function h:

y = h(x)    (3.4)

The second network takes the encoded signal y as its input and obtains the reconstructed signal r through the function f:

r = f(y) = f(h(x))    (3.5)
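A minimal Keras sketch of such a two-network auto-encoder (ours; the layer sizes and activations are illustrative assumptions):

```python
from tensorflow import keras

inputs = keras.Input(shape=(784,))
y = keras.layers.Dense(32, activation="relu")(inputs)    # encoder: y = h(x)
r = keras.layers.Dense(784, activation="sigmoid")(y)     # decoder: r = f(y)
autoencoder = keras.Model(inputs, r)

# The reconstruction loss measures the information lost by compression;
# minimizing it (e.g. with SGD) optimizes encoder and decoder jointly.
autoencoder.compile(optimizer="sgd", loss="mse")
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=128)
```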

3.2.7 Generative Adversarial Network (GAN)

A GAN mainly consists of a generator and a discriminator; the discriminator can be understood as the coach of the generator. Specifically, the generator produces images, and the discriminator judges whether an image is real or machine-generated. The first step is to train the generator with the discriminator fixed: in this way the skill of the generator constantly improves, and it becomes harder and harder for the discriminator to judge the generator's outputs. The second step is to train the discriminator with the generator fixed: the discriminator's skill at judging the generator improves, until finally it can judge all the images. The two steps are repeated again and again, and finally a very strong generator is obtained, which can then be used to generate the images people want.


The basic procedure to train a GAN is as follows. First, the parameters of the generator and the discriminator are initialized. Second, n samples are drawn from the real data and n noise samples are drawn from the prior noise distribution; n generated samples are obtained by passing the noise through the generator. The generator G is fixed, and the discriminator D is trained to distinguish real samples from generated samples as accurately as possible. Third, after the discriminator has been updated k times, the parameters of the generator are updated with a small learning rate, training the generator to reduce the gap between the generated samples and the real samples as much as possible. Finally, after several rounds of updates, the ideal situation is that the discriminator cannot tell whether a sample came from the generator or from the real data; that is, the final discriminant probability is 0.5. GANs help with tasks such as generating images from text, improving image resolution, matching drugs, and retrieving images with specific patterns.
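A sketch of one alternating update in TensorFlow (ours; the models G and D, the noise dimension and k are all assumptions, not details from the chapter; D is assumed to output a probability):

```python
import tensorflow as tf
from tensorflow import keras

bce = keras.losses.BinaryCrossentropy()
g_opt = keras.optimizers.Adam(1e-4)
d_opt = keras.optimizers.Adam(1e-4)

def gan_train_step(G, D, real, noise_dim=100, k=1):
    n = tf.shape(real)[0]
    for _ in range(k):                      # train D with G fixed, k times
        z = tf.random.normal([n, noise_dim])
        with tf.GradientTape() as tape:
            fake = G(z, training=True)
            d_loss = (bce(tf.ones([n, 1]), D(real, training=True))
                      + bce(tf.zeros([n, 1]), D(fake, training=True)))
        grads = tape.gradient(d_loss, D.trainable_variables)
        d_opt.apply_gradients(zip(grads, D.trainable_variables))
    z = tf.random.normal([n, noise_dim])    # then train G with D fixed
    with tf.GradientTape() as tape:
        g_loss = bce(tf.ones([n, 1]), D(G(z, training=True), training=True))
    grads = tape.gradient(g_loss, G.trainable_variables)
    g_opt.apply_gradients(zip(grads, G.trainable_variables))
    return d_loss, g_loss
```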

3.2.8 Deep Belief Network (DBN)

A DBN consists of multiple restricted Boltzmann machines (RBMs). These networks are "restricted" to a visible layer and a hidden layer, with connections between the layers but not between units within a layer. The hidden-layer units are trained to capture the correlations of higher-order data represented in the visible layer. The connections of a DBN are guided by top-down generative weights, with the RBMs acting as building blocks; compared with traditional, deeply layered sigmoid belief networks, this makes the weights easy to learn. During the training phase, a vector v is generated in the visible layer and passes its values to the hidden layer; in turn, the input to the visible layer is randomly selected to reconstruct the original input signal. Finally, these new visible-unit activations are propagated forward to reconstruct the hidden-unit activations, obtaining h. The backward and forward steps constitute Gibbs sampling, and the correlation difference between the hidden-layer activations and the visible-layer input is the main basis for the weight update. Figure 3.4 shows the structure of a DBN, where h_1, h_2, and h_3 are hidden layers and v is the visible layer.


Fig. 3.4 The structure of a DBN

3.3 The Application of Deep Learning in Image Recognition

3.3.1 Face Recognition

Face recognition is the technology of identifying people by comparing facial features. The convolutional neural network is the network most commonly used for face recognition, and there are three main steps. The first step is face detection: the machine receives the input image, detects whether it contains a human face, and determines the bounding box of the face. The second step is image calibration: some key points of the face are located, and the face is aligned according to these key points. The third step is image representation: a convolutional neural network converts the input face image into vector form. Figure 3.5 shows an example

Fig. 3.5 The structure of a very simple kind of CNN named VGG16


model for this step, called VGG16, which is one kind of convolutional neural network [8]. With the development of CNNs, face recognition technology has become a significant research field of deep learning. Face recognition is now more and more commonly used in fields such as electronic commerce and criminal investigation.

3.3.2 Traffic Image Recognition

In the IJCNN 2011 Traffic Sign Recognition Competition, traffic sign recognition algorithms using deep learning performed outstandingly. Since then, many deep-learning-based methods have been applied to traffic sign recognition, and extensive achievements have been made. A series of target detection algorithms with excellent performance, represented by R-CNN and YOLO, have been proposed, truly achieving end-to-end traffic sign detection and recognition. In 2020, the Chinese researcher Zhang Zhong-wen and his team applied the YOLOv3 target detection algorithm to the detection of prohibition signs and achieved an average accuracy of 80% on the TT100K data set.

3.3.3 Medical Image Recognition

Nowadays, many lesion sites cannot be effectively diagnosed, mainly due to the lack of relevant data sets. Using image processing technology combined with deep learning to diagnose diseased parts of the human body is one of the most advanced medical diagnosis methods. In 2018, Mohamed and his team trained a CNN model to classify breast density and determine the risk of breast cancer, using breast images collected by their institution as the data set; experiments showed that the method's area under the curve (AUC) was as high as 98.8%. In 2017, the Chinese researcher Sun proposed a hybrid model combining a convolutional neural network and a support vector machine for functional magnetic resonance imaging recognition; in the experiment, the fused method achieved 99.5% recognition accuracy [9].

3.4 Train a Convolutional Neural Network from a Small Data Set

3.4.1 Data Sorting and Network Building

The original data set contains 12,500 images of dogs and 12,500 images of cats. A new data set containing three subsets is created. The first subset


is a training set of 1000 dog images and 1000 cat images; the second subset is a validation set of 500 dog images and 500 cat images; the third is a test set of 500 dog images and 500 cat images. In the convolutional neural network, Conv2D and MaxPooling2D layers are stacked alternately to build the network, as sketched below. In this specific case, the depth of the feature maps in the network increases gradually, while their size decreases gradually.
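A minimal Keras sketch of such an alternating stack (ours; the filter counts, kernel sizes and input size are illustrative assumptions, not the chapter's exact architecture):

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(150, 150, 3)),
    keras.layers.Conv2D(32, 3, activation="relu"),   # feature-map depth grows...
    keras.layers.MaxPooling2D(2),                    # ...while its size shrinks
    keras.layers.Conv2D(64, 3, activation="relu"),
    keras.layers.MaxPooling2D(2),
    keras.layers.Conv2D(128, 3, activation="relu"),
    keras.layers.MaxPooling2D(2),
    keras.layers.Flatten(),
    keras.layers.Dense(512, activation="relu"),
    keras.layers.Dense(2, activation="softmax"),     # dog vs. cat
])
```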

3.4.2 Data Processing and Reading

Before feeding the data into the neural network, the data should be formatted as preprocessed tensors of floating-point numbers. The data are currently stored on disk as JPEG files, so the preprocessing steps are: read the image files → decode the JPEG files into RGB pixel grids → convert these pixel grids into floating-point tensors → scale the pixel values (in the range 0–255) to the range [0, 1]. The images need to be read into memory and converted to tensor format by TensorFlow (this step is required to train the model). After the images are read into memory, they are decoded; since they are color images, the number of channels is specified as 3 (for the red, green, and blue color channels).
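A minimal sketch of this pipeline with the TensorFlow image APIs (ours; the function name is illustrative):

```python
import tensorflow as tf

def load_image(path):
    """Read a JPEG file and return a float tensor scaled to [0, 1]."""
    raw = tf.io.read_file(path)                           # read the image file
    img = tf.image.decode_jpeg(raw, channels=3)           # RGB pixel grid, uint8
    return tf.image.convert_image_dtype(img, tf.float32)  # 0-255 -> [0, 1]
```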

3.4.3 Image Processing

The images are labeled in two categories: the code can be [0, 1] or [1, 0], where [0, 1] represents dog images and [1, 0] represents cat images. Moreover, all images need to be resized, and the target size is determined by the CNN model used. The input image size for Inception-resnet-v2 is 299 × 299: if the width or height of an image is less than 299, it is padded with black pixels to 299; if the width or height is greater than 299, it is cropped to 299 from the center of the image. The training code uses the Inception-resnet-v2 module preconfigured in the tensorflow.slim module to construct the network, and it restores the network parameters from the CKPT file of the ImageNet pre-trained model provided with tensorflow.slim.
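A sketch of the pad-or-crop step and the label encoding (ours; `load_image` is the illustrative helper from the previous sketch, and the file path is hypothetical):

```python
import tensorflow as tf

# Images smaller than 299 x 299 are padded with zeros (black pixels) and
# larger ones are center-cropped, matching the procedure described above.
img = tf.image.resize_with_crop_or_pad(load_image("example.jpg"), 299, 299)

# One-hot codes as in the text: [0, 1] for dog images, [1, 0] for cat images.
DOG = tf.constant([0.0, 1.0])
CAT = tf.constant([1.0, 0.0])
```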


3.5 Conclusion

In conclusion, this paper mainly introduces the overall definition and the specific steps of image recognition. There are many deep learning networks, such as the feedforward neural network, convolutional neural network, recursive neural network, U-Net convolutional neural network, and so on. The basic definitions and some training methods of these networks have been introduced and analyzed, and most of these models can be used in image recognition. Accordingly, there are a great number of applications of deep learning in image recognition, such as face recognition and traffic image recognition. In the future, more and more applications of deep learning in image recognition will be found and exploited.

References

1. Li, L.Y.: Application of deep learning in image recognition. J. Phys. Conf. Ser. 1693(1), 012128 (2020). IOP Publishing
2. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60(2), 91–110 (2004)
3. Aljutaili, D.S., Almutlaq, R.A., Alharbi, S.A., Ibrahim, D.M.: A speeded up robust scale-invariant feature transform currency recognition algorithm. World Acad. Sci. Eng. Technol. Int. J. Comput. Electr. Autom. Control Inform. Eng. 12(6), 346–351 (2018)
4. Sree, C.S.: Survey on extraction of texture based features using local binary pattern. Int. J. Eng. Res. Technol. (IJERT) 4(7), 334–338 (2015)
5. Ciaburro, G., Venkateswaran, B.: Neural Networks with R: Smart Models Using CNN, RNN, Deep Learning, and Artificial Intelligence Principles (2017)
6. Zaniolo, L., Marques, O.: On the use of variable stride in convolutional neural networks. Multimedia Tools Appl. 79 (2020). https://doi.org/10.1007/s11042-019-08385-4
7. Zhu, J.C., Yang, Z.L., Mourshed, M., Guo, Y.J., Zhou, Y.M., Chang, Y., Wei, Y.J., Feng, S.Z.: Electric vehicle charging load forecasting: a comparative study of deep learning approaches. Energies 12(14), 2692 (2019)
8. Er, M.B., Aydilek, I.B.: Music emotion recognition by using chroma spectrogram and deep visual features. Int. J. Comput. Intell. Syst. 12(2), 1622–1634 (2019)
9. Sun, X., Park, J., Kang, K., et al.: Novel hybrid CNN-SVM model for recognition of functional magnetic resources images, pp. 1001–1006 (2017)

Chapter 4

Longitudinal Structure Analysis and Segmentation Algorithm of Dongba Document

Yuting Yang and Houliang Kang

Abstract With the rapid development of information technology, Dongba hieroglyphs and Dongba culture have also spread rapidly through information technology and the Internet. However, most Dongba documents currently exist only in the form of images, from which the text cannot be extracted or retrieved. Therefore, based on the structural characteristics of Dongba documents, we first normalize the size of the document image; then, using a vertical projection algorithm, we perform a preliminary segmentation of the document and extract the structural characteristics of its vertical columns. Finally, we use statistical methods to analyze the attributes and validity of the different segmentation columns, and by removing abnormal columns we achieve effective vertical segmentation of documents and extraction of text columns. The algorithm is intuitive, vivid, and simple to implement, and it lays a foundation for detailed analysis and digitization of the documents.

Keywords Analysis of Dongba document structure · Vertical segmentation · Statistical analysis · Vertical projection split

4.1 Overview of Dongba Hieroglyphics

Dongba script is a very primitive pictograph. It is called "Senjululu" in the Naxi language, which literally translates as "imprint left on wood and stone." Since this kind of script is mainly used by the Naxi priests, the Dongba, to write the Dongba Sutra that conveys their national culture, people usually call it Dongba script [1]. In 2003, Dongba ancient books written in Dongba script were included in the Memory of the World Register by UNESCO. Figure 4.1 shows part of the Dongba Sutra manuscripts.


Fig. 4.1 Part of the Dongba sutra manuscripts

However, the Dongba script written by Dongba masters is only a brief record of the contents of the Dongba Sutra, which the masters used as a basis for recitation during religious ceremonies [2]. From just a few simple Dongba characters, a Dongba master can read out a dozen or even dozens of words, because the masters' understanding of the Dongba Sutra is mainly passed down by memory and word of mouth from master to apprentice [3]. This mode of transmission makes it almost impossible for people to learn Dongba script and understand the Dongba scriptures independently. With the rapid development of information technology, people can conveniently obtain readings and image materials related to Dongba hieroglyphs through the Internet. However, because most people cannot distinguish and read Dongba hieroglyphs, it is difficult to judge the accuracy and completeness of this information. Therefore, we take the three dictionaries representing the higher level of Dongba hieroglyphics, "Naxi Pictographs" [4], "Naxi Pictographs and Transcription Characters Dictionary" [6] and "A Na-Khi English Encyclopedic Dictionary" [5], as the research objects. Then, by analyzing the document structure of the dictionaries, we realize an extraction algorithm for the vertical column content of the document body, which lays the foundation for the structure analysis and digitization of Dongba documents.


4.2 Structure of Dongba Document Image

The three dictionaries "Naxi Pictographs," "Naxi Pictographs and Transcription Characters Dictionary" and "A Na-Khi English Encyclopedic Dictionary" have different structural features. "Naxi Pictographs" is a document composed of Dongba pictographs and offline handwritten Chinese characters, while the other two are composed of Dongba pictographs and printed Chinese characters. Analyzing the structure of the three dictionaries, we can see that the row and column characteristics of "Naxi Pictographs" are very significant, but due to the influence of the offline handwritten characters, rows and columns may stick together; at the same time, because of the randomness of handwritten text, the height and width of the text cannot be calculated accurately, as shown in Fig. 4.2a. The column structure of the "Naxi Pictographs and Transcription Characters Dictionary" is more prominent, followed by its rows; since the text is mainly printed, the height and width of the text are relatively fixed, but the segmentation lines added in the document still cause some interference to the segmentation of text lines, as shown in Fig. 4.2b. The line structure of "A Na-Khi English Encyclopedic Dictionary" is very significant, especially its paragraph structure, so text line segmentation is less difficult and easy to achieve, as shown in Fig. 4.2c.

It can be seen from the above that the three dictionaries have different structures, including offline handwritten characters, printed characters, offline handwritten Dongba characters, and other characters, and they differ from the text line segmentation problem of ordinary documents. The three documents are all dictionaries, and the Dongba characters must correspond one-to-one with their Chinese comment paragraphs. Therefore, in the process of text line segmentation, in addition to considering the segmentation of a single text line, we also need to lay a foundation for the subsequent combination and segmentation of Dongba characters and Chinese comment paragraphs. Existing offline handwritten Chinese text line segmentation algorithms [7–10] and ancient book layout analysis methods [11] mainly solve the segmentation of a single text line, regardless of the overall structure of the paragraph in which the line sits; they are too narrowly targeted, have few extensible applications, and are algorithmically complex. If we used them directly for text line segmentation of the three dictionaries, the segmentation results would be too fine and fragmented, which is not conducive to analyzing the paragraph structure and overall structure of the document. Therefore, taking the structures of "Naxi Pictographs" and "Naxi Pictographs and Transcription Characters Dictionary" as examples, we focus on analyzing the longitudinal features of the documents and give an effective text column segmentation algorithm. It makes the three types of documents consistent in their horizontal and vertical structure, and a similar method can be used to complete text line segmentation for the three types of documents in the future. The steps to split a text column in a document are as follows: first, we use a preprocessing algorithm to remove the headers and footers, which have nothing to do with the text and easily

Fig. 4.2 Document structure of Dongba hieroglyphic dictionary. a Naxi Pictographs, b Naxi Pictographs and Transcription Characters Dictionary, c A Na-Khi English Encyclopedic Dictionary


interfere with text column segmentation; secondly, we use the vertical projection algorithm to segment the document for the first time, removing the blanks on the left and right sides of the document and reducing interference with its longitudinal structure; finally, by drawing the distribution of the vertical blank spacers and judging their properties, we achieve effective vertical segmentation of the document and accurate extraction of text columns.
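A minimal sketch of the vertical projection step (ours; the chapter describes the idea but not this exact code):

```python
import numpy as np

def vertical_projection_segments(binary, blank_thresh=0):
    """Split a binarized page into text columns by vertical projection.

    binary: 2-D array with text pixels as 1 and background as 0.
    Image columns whose ink count is <= blank_thresh are blank
    separators; runs of non-blank columns are returned as (start, end).
    """
    profile = binary.sum(axis=0)          # ink count per image column
    is_text = profile > blank_thresh
    segments, start = [], None
    for x, flag in enumerate(is_text):
        if flag and start is None:
            start = x                     # a text column begins
        elif not flag and start is not None:
            segments.append((start, x))   # a text column ends
            start = None
    if start is not None:
        segments.append((start, len(is_text)))
    return segments
```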

4.3 Preprocessing of Dongba Documents

The main body of a dictionary page generally consists of three parts: header, main content, and footer, but sometimes numbering, chapter guides, separator columns, and other content are added to help readers quickly locate a position in the document, as shown in Fig. 4.2. In order to get the main body of the document, in addition to removing structural content such as headers and footers, we also need to remove decorations, comments, and partitions. Since auxiliary structures such as headers/footers and decorative signs are added late in document typesetting, they have a relatively stable composition and a fixed position. Therefore, in the preprocessing stage, we first normalize the size of the images of the same document; then we calculate the header and footer positions of a single page to facilitate their removal. The format size of "Naxi Pictographs" is 3508 × 2480 pixels, and the text range is [380, 3200] pixels. The format size of "Naxi Pictographs and Transcription Characters Dictionary" is 1385 × 983 pixels, and the text range is [116, 1210] pixels. The format size of "A Na-Khi English Encyclopedic Dictionary" is 3071 × 2184 pixels, and the text range is [173, 3000] pixels. The effect of removing the headers and footers of the three documents is shown in Fig. 4.3. At this point, the characteristics of the text columns are more obvious: "Naxi Pictographs" includes the text columns number, Dongba hieroglyph, and Chinese annotations, while "Naxi Pictographs and Transcription Characters Dictionary" includes number, Dongba hieroglyph, Chinese annotations, and chapter guide. The document image of "A Na-Khi English Encyclopedic Dictionary" is based mainly on a text-line structure, so it has no significant column features, but we still need to remove the blank areas on the left and right sides of the document to reduce their influence on the structure analysis. Therefore, we use the vertical projection algorithm [12, 13] to initially segment the document; the blank columns on the left and right sides are then clearly separated and removed. This prepares for the analysis of document paragraphs and the extraction and recognition of independent Dongba hieroglyphs, pronunciations, and Chinese annotations. The segmentation result is shown in Fig. 4.4. Obviously, the vertical projection does not produce the segmentation effect we want: the extra separator columns in the document, and text written at the writer's discretion, lead to an uneven distribution, which causes a large amount of over-segmentation in the vertical projection. In addition, these dividing lines are

Fig. 4.3 Remove the header and footer of the document


Fig. 4.4 The first division of the vertical projection


unmarked, and the computer cannot automatically recognize their validity. Therefore, on the basis of the vertical projection algorithm, we combine statistical theory to determine the width and attributes of the document column to achieve effective vertical segmentation of the document.

4.4 Automatic Segmentation and Recognition of Columns

Analyzing the structure of the three types of document images, it is easy to see that "Naxi Pictographs" is more difficult to segment than the other two. Because it combines Dongba pictographs and offline handwritten Chinese characters, the spacing between text lines and between words cannot be exactly the same across the layout, which leads to more over-segmentation during vertical projection. Since the other two are printed documents, they are relatively stable in structure and show fewer cases of over-segmentation. Therefore, we take "Naxi Pictographs" as an example and count the vertical projection segmentation results of all documents in the book. Since the position of a Vertical Blank Spacer (VBS) is relatively fixed and its width does not change much, we choose the VBS as the sample unit for document vertical segmentation and segment documents by determining the type of each VBS. The feature pair VD that characterizes a VBS is defined as:

VD = (Sv, Wid)

where Sv is the starting column coordinate of the VBS and Wid is its width. The two-dimensional distribution of the VBS samples in the document is shown in Fig. 4.5.

Fig. 4.5 The feature distribution of all blank columns in the document
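A sketch of how the VD = (Sv, Wid) pairs could be collected from the same projection profile (ours; the statistical filtering of abnormal spacers then follows the rule described below):

```python
def vbs_features(binary, blank_thresh=0):
    """Return the feature pair (Sv, Wid) of every vertical blank spacer."""
    profile = binary.sum(axis=0)               # ink count per image column
    features, start = [], None
    for x, ink in enumerate(profile):
        if ink <= blank_thresh and start is None:
            start = x                          # a blank run begins at Sv
        elif ink > blank_thresh and start is not None:
            features.append((start, x - start))    # (Sv, Wid)
            start = None
    if start is not None:
        features.append((start, len(profile) - start))
    return features
```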


From the distribution of the VBS features in Fig. 4.5, we can easily see that the clusters in the red box belong to the abnormal category: they are basically distributed inside the comment column, and their blank widths are narrow. Therefore, when a blank column falls within this range, it is an abnormal column and should be deleted to ensure the integrity of the comment column. We judge the blank columns in the document shown in Fig. 4.4 accordingly, and after the second merge, the result is as shown in Fig. 4.6. It is easy to see that, through this statistical method, we can more accurately determine the attributes of the blank segmentation columns in the document and process them effectively, thereby ensuring the validity of the text columns in the document.

Fig. 4.6 Attribute judgment and merging of blank columns


4.5 Experiment

The automatic segmentation and recognition algorithm for document columns can also be applied to the "Naxi Pictographs and Transcription Characters Dictionary" and "A Na-Khi English Encyclopedic Dictionary". Among them, "A Na-Khi English Encyclopedic Dictionary" is paragraph-based and does not involve internal blank columns, so only the widths of the blank columns on its two sides need to be counted. However, there are many functional structures in the "Naxi Pictographs and Transcription Characters Dictionary," such as partitions and chapter guides. Therefore, after preprocessing, we need to remove the chapter guide from the document to avoid its influence on the feature statistics of the blank columns. Since the position and size of the chapter guide are fixed, this process is similar to removing the header/footer. The effect after removing the chapter guide is shown in Fig. 4.7: the structure of the document is more prominent, and the characteristics of the text columns and blank columns are also more obvious. Therefore, combining vertical projection and statistical theory, and comparing the structural characteristics of the documents, we segment the "Naxi Pictographs and Transcription Characters Dictionary" and "A Na-Khi English Encyclopedic Dictionary" separately and then compute the characteristics of their blank columns. The result is shown in Fig. 4.8.

4.6 Conclusion

Column structure analysis and segmentation of Dongba documents is the first and a very important step of document analysis. Through the vertical division of the document, its structure becomes more prominent, and the division and extraction of text lines becomes easier and more effective. Therefore, combining the structural characteristics of Dongba documents, we give a text column segmentation algorithm suitable for Dongba document analysis based on statistical theory and vertical projection. Moreover, experiments show that the algorithm can be applied to Dongba documents of different structures and types, and it lays a foundation for the digitization of Dongba documents.


Fig. 4.7 Remove the chapter guide in the document


Fig. 4.8 The feature distribution of blank columns of two types of documents

Acknowledgements This research was supported by the Scientific Research Fund of Suzhou Vocational University (SVU2021YY02).

References

1. Ge, A.G.: Dongba culture overview. Stud. Nat. Art 2, 71–80 (1999)
2. Zheng, F.Z.: Word Research of Naxi Dongba Hieroglyphic, pp. 1–230. Nationalities Publishing House, Beijing (2005)
3. He, L.M.: On transition of Dongba culture. Soc. Sci. Yunnan 1, 83–87 (2004)
4. Fang, G.Y.: Naxi Pictographs. Yunnan People's Publishing House, Yunnan (2005)
5. Rock, J.F.: A Na-Khi-English Encyclopedic Dictionary. Roma Istituto Italiano Peril Medio ed Estreme Prientale, Roma (1963)
6. Li, L.C.: Naxi Pictographs and Transcription Characters Dictionary. Yunnan Nationalities Publishing House, Yunnan (2001)
7. Yin, Y.L., Liu, A.M., Zhou, X.D.: Offline handwritten text line segmentation based on high-order correlation clustering. J. Central China Normal Univ. (Nat. Sci.) 51(1), 18–22, 34 (2017)
8. Zhu, J.B., Chen, W.L.: An approach based on domain knowledge to text categorization. J. Northeastern Univ. (Natural Science) 26(8), 555–564 (2005)
9. Lei, X., Li, J.Y., Song, Y.: Text segmentation method applied for handwritten Chinese character recognition. Intell. Comput. Appl. 8(2), 126–128 (2018)
10. Huang, L., Yin, F., Chen, Q.H.: Graph-based ensemble method for text line segmentation in offline Chinese handwritten documents. J. Huazhong Univ. Sci. Tech. (Natural Science Edition) 42(3), 33–36 (2014)
11. Jia, Y., Tian, X.D., Zuo, L.N.: Layout image analysis for ancient Chinese books based on local outlier factor and wave threshold. Sci. Technol. Eng. 20, 12021–12027 (2020)
12. Wang, H.Q., Dai, R.W.: Projection based recursive algorithm for document understanding. Pattern Recogn. Artif. Intell. 02, 118–126 (1997)
13. Nagy, G., Seth, S., Viswanathan, M.: A prototype document image analysis system for technical journals. IEEE Comput. 25(7), 10–22 (1992)

Chapter 5

Overview of SAR Image Change Detection Based on Segmentation

Mengting Yuan, Zhihui Xin, Xiaoqiao Huang, Zhixu Wang, Yu Sun, Yongxin Li, and Jiayu Xuan

Abstract Change detection refers to the analysis of images taken at different moments in order to ascertain the characteristics and processes of changes on the earth's surface. Change detection plays an important role in dynamic disaster monitoring, environmental pollution monitoring, and urban planning. In a technological society, change detection of SAR images has become a significant research direction. This paper first describes the concept of SAR image change detection and introduces traditional image change detection methods; then various image change detection methods based on image segmentation are summarized; finally, the existing problems and further development of image change detection are discussed.

Keywords Image change detection · Image segmentation · SAR image

5.1 Introduction

Synthetic Aperture Radar (SAR) has the capability of all-sky and all-weather observation [1]. SAR has applications in soil moisture measurement, agriculture, forestry, geology, hydrology, flood and marine monitoring, oceanography, ice and snow detection, land cover mapping, and earth change detection. Remote sensing images can truly reflect the present distribution of ground objects and the relationships between ground objects or phenomena [2]. How to detect change information from these remote sensing images is an important research direction. Change detection refers to the analysis of images taken at different moments in order to ascertain the characteristics and processes of changes on the earth's surface; its purpose is to filter out irrelevant interference and extract meaningful change information [3]. The content of change detection is as follows:


detecting changes that have occurred, recognizing the nature of the change, determining the extent of the change, and assessing the spatial patterns of change. Texture difference images, gray difference images, and principal component difference images can all be used as the detection target for image change information [4]. Image change detection methods include the image difference method, the principal component transformation method, and others. In recent years, scholars have proposed many image change detection methods, such as the Markov random field model, support vector machines (SVM), object-oriented classification techniques, and segmentation-based methods; different algorithms and different data processing software are used to further the study of image change detection. This paper mainly introduces traditional and innovative change detection methods for SAR images.

5.2 Traditional Image Change Detection Methods

5.2.1 Image Difference Method

The difference method subtracts the gray value of each pixel in one image from the gray value of the corresponding pixel in the other image; the result reflects the change of the scene between the two periods [5]. In the difference map, regions where the difference is 0 or close to 0 are unchanged, while regions with a non-zero difference have changed. The computational formula of the difference method is as follows:

I(i, j) = I_1(i, j) − I_2(i, j)    (5.1)

where i, j is pixel coordinate value, I (i, j) is pixel gray value, I1 is the image taken at time 1, I2 is the image taken at time 2. The advantages of image difference method are simple, straightforward and easy to understand. The disadvantage is that the anti-noise performance is poor. If the nature of the change needs to be further discussed, some other methods need to be combined.
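As a minimal illustration (not from the paper), the difference map of Eq. (5.1) and a simple threshold on its magnitude can be sketched as follows; the threshold value is an arbitrary assumption:

```python
import numpy as np

def difference_map(i1, i2, thresh=30.0):
    """Eq. (5.1): pixel-wise difference of two co-registered images."""
    d = i1.astype(float) - i2.astype(float)
    changed = np.abs(d) > thresh   # large differences mark changed regions
    return d, changed
```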

5.2.2 Image Ratio Method

Similar to the image difference method, the image ratio method is also based on the gray values of pixels in different periods. The image ratio method divides each pixel of one image by the corresponding pixel of the image from the other period, and the resulting map describes the changes of the region over this period [6, 7]. If the ratio is close to 1, there is no change in the region; if the ratio is significantly greater than or less than 1, the region has changed. The calculation formula of the image ratio method is as follows:

I(i, j) = I1(i, j) / I2(i, j)  (5.2)

where (i, j) is the pixel coordinate and I(i, j) is the pixel gray value; I1 is the image taken at time 1 and I2 is the image taken at time 2. Compared with the image difference method, the image ratio method has better anti-noise performance. But the method is very simple, may lose some details, and its accuracy is not very high.
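In the same spirit, a sketch of Eq. (5.2); the tolerance around 1 used to declare a change is an assumption:

```python
import numpy as np

def ratio_map(i1, i2, tol=0.3, eps=1e-6):
    """Eq. (5.2): pixel-wise ratio; ratios far from 1 mark changed regions."""
    r = i1.astype(float) / (i2.astype(float) + eps)   # eps guards against division by zero
    changed = np.abs(r - 1.0) > tol
    return r, changed
```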

5.2.3 Correlation Coefficient Method

Take the N-point neighborhood of pixel (i, j) as Ω(i, j); the correlation coefficient γc(i, j) between the SAR intensities I0 and I1 is:

$$\gamma_c(i,j)=\frac{\frac{1}{N}\sum_{(k,l)\in\Omega_{i,j}} I_1(k,l)\,I_0(k,l)}{\sqrt{\frac{1}{N}\sum_{(k,l)\in\Omega_{i,j}} I_1^{2}(k,l)\cdot\frac{1}{N}\sum_{(k,l)\in\Omega_{i,j}} I_0^{2}(k,l)}} \quad (5.3)$$

where I0 is the image taken at time 1 and I1 is the image taken at time 2. If there is no change, I0 and I1 are highly correlated and the correlation coefficient is close to 1 [8].
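A windowed implementation of Eq. (5.3) can be sketched with local mean filters; the window size is an assumption:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def correlation_map(i0, i1, size=5):
    """Windowed correlation coefficient of Eq. (5.3) over an N-point neighborhood."""
    cross = uniform_filter(i0 * i1, size)   # (1/N) * sum of I1*I0 over the window
    p0 = uniform_filter(i0 * i0, size)      # (1/N) * sum of I0^2
    p1 = uniform_filter(i1 * i1, size)      # (1/N) * sum of I1^2
    return cross / np.sqrt(p0 * p1 + 1e-12) # epsilon guards against division by zero
```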

5.2.4 Image Regression Method

The image regression method assumes that the images of different phases are linearly correlated, reconstructs one image by the least squares method, and then obtains the regression difference image. The linear model is as follows:

Î1(i, j) = a I0(i, j) + b  (5.4)

where I0 is the image taken at time t1, I1 is the image taken at time t2, Î1 is the reconstruction of I1, and a and b are the parameters of the linear function to be determined. The advantage of the image regression method is that it can reduce the adverse effects caused by differences in atmospheric conditions and solar altitude angle between the images. Its disadvantage is that the accuracy is not high.


5.3 SAR Image Change Detection Method Based on Segmentation

The importance of image change detection is self-evident. On the basis of the traditional methods, many scholars have proposed image change detection methods based on image segmentation algorithms. Rui Zhao proposed a change detection method for SAR images based on superpixel segmentation and image regression. Firstly, the superpixel segmentation method is used to segment the images of different phases into multiple superpixel pairs, and then the least squares method is used to perform regression analysis on the images to build a local regression model [9]. The least squares model is as follows:

Î1(i, j) = a I0(i, j) + b  (5.5)

where I0 is the image taken at time t1, I1 is the image taken at time t2, Î1 is the reconstruction of I1, and a and b are the parameters of the linear function to be determined. In order to determine a and b, unchanged areas {(I0n, I1n), n = 1, …, N} in I0 and I1 are selected as samples for regression analysis. The computational formulas for a and b are as follows:

$$a=\frac{N\sum_{n=1}^{N} I_{0n}I_{1n}-\sum_{n=1}^{N} I_{0n}\sum_{n=1}^{N} I_{1n}}{N\sum_{n=1}^{N} I_{0n}^{2}-\left(\sum_{n=1}^{N} I_{0n}\right)^{2}} \quad (5.6)$$

$$b=\frac{\sum_{n=1}^{N} I_{1n}-a\sum_{n=1}^{N} I_{0n}}{N} \quad (5.7)$$

Finally, the final difference map is obtained by using the LR (log-ratio) operator:

$$D(i,j)=\left|\log\frac{\hat{I}_{1}(i,j)}{I_{0}(i,j)}\right| \quad (5.8)$$
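A compact sketch of Eqs. (5.5)–(5.8), assuming the unchanged samples are supplied as a boolean mask (an assumption about the data layout):

```python
import numpy as np

def regression_lr_map(i0, i1, unchanged_mask):
    """Fit I1 ≈ a*I0 + b on unchanged samples, then apply the log-ratio operator."""
    x = i0[unchanged_mask].ravel()
    y = i1[unchanged_mask].ravel()
    a, b = np.polyfit(x, y, 1)                  # least-squares estimates, Eqs. (5.6)-(5.7)
    i1_hat = a * i0 + b                         # reconstructed image, Eq. (5.5)
    return np.abs(np.log((i1_hat + 1e-6) / (i0 + 1e-6)))   # LR operator, Eq. (5.8)
```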

Xianghai Cao et al. proposed a method based on superpixel saliency analysis. First, the logarithmic ratio method is used to obtain an image C, and image C is taken as the saliency query point y. The simple linear iterative clustering (SLIC) method is used to segment the image into a segmentation image I with a certain number of superpixels. Then, the saliency value of each superpixel is calculated through the ranking function f = [f1, f2, …, fn]T:

f* = (D − aW)⁻¹ y  (5.9)

fsal(i) = f(i) l(i)  (5.10)


where f* is the optimal solution of f, D is the degree matrix of the segmentation image, W is the weight matrix, and fsal(i) is the saliency value of each superpixel. The saliency value of each superpixel block is assigned to the set of pixels belonging to it, yielding a saliency image. Next, the saliency image is thresholded to obtain a binary image with gray values of 0 and 1. The difference map is then obtained by using this image to mask the original images I1 and I2. Finally, the optimal membership matrix is obtained by cluster analysis, which gives the final change detection result [10] (a sketch of the segmentation and ranking steps appears at the end of this section).

Xue-mei Hu proposed multi-scale segmentation fusion for remote sensing image change detection. The FH algorithm is improved so that it can segment the two images jointly and produce segmentation images with the same spatial locations; combined with multi-scale segmentation, the method preserves the global information of the source images well and achieves a high segmentation speed. Then, according to the spectral and texture characteristics of the segmented image patches, the difference coefficient between corresponding patches is calculated, and the change areas at different scales are extracted with a set double threshold. Finally, multi-scale fusion is carried out to obtain the final change detection results [11]. According to Narayan's definition:

$$\mathrm{Int}(c)=\frac{1}{N}\sum_{e\in \mathrm{MST}(C,E)} W(e) \quad (5.11)$$

$$\tau(C)=\frac{w_{\max}-w_{\min}}{|C|}\times\frac{\mathrm{Num}_c}{k} \quad (5.12)$$

where N is the number of edges in the minimum spanning tree, wmax is the maximum edge weight in a region, wmin is the minimum edge weight in the region, Numc is the number of regions in the image, k is a constant, Int(c) is the mean of the edge weights in the region, and τ(C) reflects the difference between the maximum and minimum weights in the region.

Guiting Wang proposed a multi-temporal change detection method for SAR images based on image segmentation and fusion. The method uses the conditional probability of the difference image and the two original images to fuse the segmentation results of the two original images [12]. Lu Jia et al. proposed multiple kernel graph cut for SAR image change detection and developed the multi-kernel graph cut (MKGC) algorithm. MKGC not only avoids explicit modeling but also gives a complete description of the changed area and enhances noise immunity [13]. Jianlong Zhang proposed change detection of SAR images based on DSSRM cascaded segmentation; the DSSRM algorithm converges and merges in a semi-semantic superpixel space and simplifies SRM to obtain the final change detection result [14]. Yujun Zhang proposed a change detection method for SAR images based on SRM segmentation, which takes the multi-layer dynamic sorting statistical region merging method as its core to obtain superpixel segmentation results of the difference images, and adopted


the cascade segmentation framework and a Markov random field weight optimization algorithm, respectively, to complete the change detection [15]. Yan Liao proposed a change detection method for SAR images based on spatial threshold segmentation; the algorithm addresses the insufficient fit of the conditional distribution model in the existing KI threshold method and improves the neighborhood ratio difference map [16]. Peiyang Zhang proposed polarimetric SAR image change detection based on collaborative segmentation: firstly, the polarimetric features of the SAR images are extracted and initial segmentation results are obtained through optimization; then, the initial segmentation results are further optimized by a spatial constraint algorithm to obtain the final segmentation, change detection, and change recognition results [17]. Gang Zhang proposed a change detection method for SAR images based on automatic threshold segmentation and a deep neural network. The KI threshold segmentation algorithm does not fit the pixel distribution of remote sensing images accurately enough, and this algorithm improves on that disadvantage; a deep belief network is trained with suitable training samples selected by a two-dimensional Gabor filter, and the final change detection result is obtained [18]. Min Yuan et al. proposed high-resolution remote sensing image change detection based on collaborative segmentation [19]. Lei Zhao proposed unsupervised multi-channel remote sensing image change detection based on segmentation windows [20].

Traditional SAR image change detection takes the pixel as the feature unit but does not consider the spatial neighborhood information of the images, resulting in low detection accuracy. To improve detection accuracy, it is crucial to incorporate the spatial neighborhood information of the image. The image change detection methods mentioned above are combined with segmentation algorithms and use superpixels as processing units. Because superpixels capture the spatial neighborhood information of the image well, they retain good edge information and details, can better locate the changing area, and improve the accuracy. Therefore, the various segmentation methods applied to SAR image change detection can roughly detect the change area, and these algorithms extend the research directions of remote sensing imagery.
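The following sketch illustrates the superpixel saliency pipeline described above: SLIC segmentation of a difference image followed by manifold-ranking scores f* = (D − aW)⁻¹ y (Eq. 5.9). The affinity construction, the query vector, and all parameter values are our assumptions; it requires a recent scikit-image (the channel_axis argument):

```python
import numpy as np
from skimage.segmentation import slic

def superpixel_saliency(diff_img, n_segments=300, alpha=0.99, sigma=0.1):
    """Sketch: SLIC superpixels on a grayscale difference image + manifold ranking."""
    labels = slic(diff_img, n_segments=n_segments, compactness=10, channel_axis=None)
    ids = np.unique(labels)
    means = np.array([diff_img[labels == i].mean() for i in ids])  # superpixel features
    # affinity between superpixels from feature similarity
    W = np.exp(-np.square(means[:, None] - means[None, :]) / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))                  # degree matrix of the segmentation graph
    y = (means > means.mean()).astype(float)    # crude saliency query vector (assumption)
    f_star = np.linalg.solve(D - alpha * W, y)  # ranking scores, Eq. (5.9)
    sal = np.zeros_like(diff_img, dtype=float)
    for i, lab in enumerate(ids):
        sal[labels == lab] = f_star[i]          # assign scores back to the pixels
    return sal
```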

5.4 Simulation Results

Figure 5.1 shows the simulation results of several methods, including traditional methods and several change detection results combined with segmentation. (a) and (b) show two SAR images of Ottawa; the images were taken by satellite and reflect the changes during the rainy season in Ottawa. (c) is the artificially marked variation area. (d)–(g) show the simulation images of the ratio-and-difference method, the logarithmic mean segmentation method, the superpixel saliency analysis algorithm, and the multi-scale cross-saliency analysis, respectively.


Fig. 5.1 Simulation results: a SAR image 1, b SAR image 2, c artificially marked variation area, d ratio and difference method, e logarithmic mean segmentation method, f super pixel significance analysis algorithm, g multi-scale cross-significance analysis

Table 5.1 compares the detection indexes, where MA is the number of pixels detected as unchanged but actually changed, FA is the number of pixels detected as changed but actually unchanged, and OE is the total number of errors, i.e., the sum of MA and FA. ACC is the accuracy rate, the percentage of correctly detected pixels among all pixels. The Kappa coefficient indicates whether the detection results are consistent with the reference results; its value lies between 0 and 1, and the larger the Kappa coefficient, the better the change detection performance of the method.

Table 5.1 Comparison of detection indexes

Method                                        MA    FA    OE    ACC (%)  Kappa
Ratio and difference method                   2409  1358  3767  95.27    0.8026
Logarithmic mean segmentation method          2385  1274  3659  96.40    0.8607
Super pixel significance analysis algorithm   334   1274  1608  98.42    0.9390
Multi-scale cross-significance analysis       144   1102  1656  98.37    0.9379

According to the analysis of the detection results and the comparison of detection indexes, the fused ratio-and-difference method [21] produces the most noise and


more missing details in the detection results; compared with the other algorithms, its detection effect is the worst. The logarithmic mean segmentation method [22] computes the mean values of the two original images and of the difference map obtained by the logarithmic ratio method, selects the minimum of the two original means, and combines it with half of the mean of the difference map to set a threshold with which the difference map is segmented into a change map. The detection result of this method still contains considerable noise, reflecting under-segmentation and more missing edge details. The results of the superpixel saliency analysis algorithm [23] and the multi-scale mutual saliency analysis algorithm [24] are better than those of the previous two algorithms, because different scales of the image show different changes and mutual saliency detection integrates the image information, so the changing areas are detected better. In general, the image change detection method based on saliency analysis of superpixel segmentation and improved manifold ranking can not only roughly determine the change range through saliency detection but also, by taking the spatial information of the image into account, improve the edge retention ability and the accuracy of change detection.
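For reference (not from the paper), the indexes in Table 5.1 can be computed from binary change maps as in the sketch below; the Kappa formula is the standard binary form:

```python
import numpy as np

def change_detection_metrics(pred, ref):
    """Indexes of Table 5.1 for binary change maps (1 = changed, 0 = unchanged)."""
    ma = int(np.sum((pred == 0) & (ref == 1)))   # missed alarms
    fa = int(np.sum((pred == 1) & (ref == 0)))   # false alarms
    oe = ma + fa                                 # overall error
    acc = (pred.size - oe) / pred.size           # accuracy
    p_changed = np.mean(pred) * np.mean(ref)
    p_unchanged = (1 - np.mean(pred)) * (1 - np.mean(ref))
    pe = p_changed + p_unchanged                 # chance agreement
    kappa = (acc - pe) / (1 - pe)
    return ma, fa, oe, acc, kappa
```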

5.5 Summary and Expectation

The applications of image change detection are increasingly extensive, involving disaster, environment, and urban monitoring, among others, and more and more SAR image change detection methods are being combined with various algorithms. Some algorithms, however, cannot handle images at different scales well, and edge details are easily lost. SAR image change detection methods combined with segmentation technology can take the spatial information of the image into account and effectively remove interference information, so edge information is preserved better and the accuracy of change detection is improved. These SAR image change detection methods improve the accuracy of change detection to a certain extent and reduce the false alarm rate. However, current detection methods still face some challenges and need further improvement. The first is how to reduce the false alarm rate while keeping the detection accuracy as high as possible. Second, there is no efficient change detection method for remote sensing images that can be applied to all scenes; SAR images are widely used, and more ground object types and change characteristics need to be considered, so more effective detection methods are required. Third, if deep learning is used for change detection, the training model needs to be further optimized to shorten the training time.


References

1. Shang, R., Xie, K., Okoth, M.A., Jiao, L.: SAR image change detection based on mean shift pre-classification and fuzzy C-means. In: IGARSS 2019–2019 IEEE International Geoscience and Remote Sensing Symposium, pp. 2358–2361 (2019)
2. Ghosh, S., Bruzzone, L., Patra, S., et al.: A context-sensitive technique for unsupervised change detection based on Hopfield-type neural networks. IEEE Trans. Geosci. Rem. Sens. 45(3), 778–789 (2007)
3. Zhao, Z.X., He, J.N.: Change detection method of CD-BS remote sensing image based on super pixel. Sci. Technol. Innov. 22, 44–45 (2021)
4. Gamba, P., Dell'Acqua, F., Lisini, G.: Change detection of multitemporal SAR data in urban areas combining feature-based and pixel-based techniques. IEEE Trans. Geosci. Rem. Sens. 44(10), 2820–2827 (2006)
5. Achanta, R., Shaji, A., Smith, K., et al.: SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2274–2282 (2012)
6. Zhao, G.W., Peng, Y.X.: Semisupervised SAR image change detection based on a siamese variational autoencoder. Inf. Process. Manage. 59(1), 102726 (2022)
7. Wang, Y.H., Gao, L.R., Hong, D.F., et al.: Mask DeepLab: end-to-end image segmentation for change detection in high-resolution remote sensing images. Int. J. Appl. Earth Obs. Geoinf. 104, 102582 (2021)
8. Sun, H., Xia, G.S., Sang, C.W., Su, X.: Synthetic Aperture Radar Image Information Interpretation and Application Technology, p. 205. Science Press, Beijing (2020)
9. Zhao, R., Peng, G.H., Yan, W.D., et al.: Change detection in SAR images based on superpixel segmentation and image regression. Earth Sci. Inf. 14(1), 69–79 (2021)
10. Cao, X.H.: A method of SAR image change detection based on superpixel saliency analysis. Xidian University, Xi'an, Shaanxi Province, 1 December 2018
11. Hu, X.M., Jia, Z.H., Qin, X.Z., Yang, J., Nikola, K.: Remote sensing image change detection based on multi-scale segmentation and fusion. Comput. Eng. Des. 36(09), 2452–2456 (2015)
12. Wang, G.T., Gu, Z.Q., Jiao, L.C., Fan, Y.Z.: Change detection based on image segment and fusion in multitemporal SAR images. In: Proceedings of the 2009 2nd Asian-Pacific Conference on Synthetic Aperture Radar. China Institute of Electronics (CIE), IEEE Beijing Section (2009)
13. Lu, J., et al.: Multiple kernel graph cut for SAR image change detection. Rem. Sens. 13(4), 725 (2021)
14. Zhang, J.L., Wang, B.: SAR image change detection method of DSSRM based on cascade segmentation. J. Rem. Sens. 21(04), 614–621 (2017)
15. Zhang, Y.J.: Research on SAR Image Change Detection Method Based on SRM Segmentation. Xidian University (2017)
16. Liu, Y., et al.: A Novel Approach to SAR Image Change Detection Based on Spatial Threshold Segmentation. Xidian University (2019)
17. Zhang, P.Y.: Polarimetric SAR Image Change Detection Based on Collaborative Segmentation. Xidian University (2020)
18. Zhang, G.: SAR Image Change Detection Based on Automatic Threshold Segmentation and Deep Neural Network. Xidian University (2020)
19. Yuan, M., Xiao, P.F., Feng, X.Z., Zhang, X.L., Hu, Y.Y.: J. Nanjing Univ. (Natural Science) 51(05), 1039–1048 (2015)
20. Zhao, L., Wang, B., Zhang, L.M.: Inform. Electron. Eng. 8(02), 173–179, 185 (2010)
21. Celik, T.: Unsupervised change detection in satellite images using principal component analysis and k-means clustering. IEEE Geosci. Rem. Sens. Lett. 6(4), 772–776 (2009)
22. Sumaiya, M.N., Kumari, R.S.S.: Logarithmic mean-based thresholding for SAR image change detection. IEEE Geosci. Rem. Sens. Lett. 13(11), 1726–1728 (2016)
23. Cao, X.H.: A method of SAR image change detection based on superpixel saliency analysis. Xidian University, Xi'an, Shaanxi Province, 1 December 2018
24. Lempitsky, V., Kohli, P., Rother, C., Sharp, T.: Image segmentation with a bounding box prior. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 277–284. IEEE (2010)

Chapter 6

Full-Focus Imaging Detection of Ship Ultrasonic-Phased Array Based on Directivity Function

Cailiang Huang and Yongzheng Li

Abstract For ship manufacturing and transportation, the metal sheets used directly affect the operating quality and safety of the whole system. Therefore, in order to verify the soundness of internal parts and structures scientifically and effectively, researchers have proposed ultrasonic-phased array full-focus imaging detection technology for practical analysis. In this study, the application advantages of the ultrasonic-phased array frequency–wave number domain full-focusing algorithm are first reviewed systematically, and possible delamination defects are then detected and studied according to the characteristics of metal sheets used in ships. In the experiment, two metal sheets of different thicknesses are selected, and the imaging quality of two algorithms, time domain full focusing and frequency–wave number domain full focusing, is analyzed. The final results show that full-focus algorithms based on time domain and frequency–wave number domain data can accurately detect the defects of metal sheets, and that, compared with the traditional time domain full-focus imaging method, the detection advantage of the algorithm in this paper is more obvious.

Keywords Ship · Metal sheet · Time domain full focusing · Ultrasonic-phased array full-focusing method · Imaging detection

6.1 Introduction

In practical applications, the sheet metal selected in shipbuilding and manufacturing has excellent mechanical properties and low weight, so it has been widely used in the manufacture of ship parts. However, its actual quality directly determines the service life and safety performance of the ship's overall system. Therefore, it is of great significance to carry out systematic nondestructive testing of metal sheets used in ships. For


sheet metal parts, ultrasonic testing technology is superior to other methods in precision and cost. In practical research, electromagnetically excited ultrasonic waves and nonlinear ultrasonic waves have been used to detect and analyze closed cracks in aluminum plates, and the ultrasonic water-jet penetration C-scan method has been used to study sheet metal defects. Delamination defects are essentially the most common problem in sheet metal. Practical detection methods include the oblique-incidence shear wave reflection method and the guided wave method. However, with these two methods, a sloped delamination defect easily weakens the surface reflection and the detection signal. As a result, the most commonly used ultrasonic detection methods cannot quickly determine the detection signal or obtain a specific image of the defect. At present, full-focus imaging detection methods with ultrasonic-phased arrays at their core have attracted the attention of researchers. Compared with traditional phased array imaging technology, the total focusing method (TFM) has higher resolution and signal-to-noise ratio and can effectively detect defects that are too small for traditional methods. To this end, the frequency–wave number domain algorithm is used for full-focus imaging. Since Stolt proposed the frequency–wave number domain algorithm to solve seismic imaging problems in the 1970s, it has been widely used; it not only improves the lateral resolution of images but also provides an effective basis for subsequent technical research. Liu et al. [1] studied the propagation characteristics of laser Lamb waves in the frequency–wave number domain and analyzed the actual defect depth according to the relationship between wave number and aluminum plate thickness. Zhang and Duan [2] used a 16-channel multi-transmit multi-receive phased array instrument and a frequency–wave number domain algorithm to detect surface defects of thin aluminum plates in depth, which not only verified the application advantages of the algorithm but also achieved good defect reconstruction. Therefore, this paper uses two ultrasonic-phased array probes to detect and analyze thin plates and defects of different thicknesses and compares the imaging effects of the traditional full-focus algorithm and the frequency–wave number domain algorithm [3].

6.2 Method

6.2.1 Principle Analysis

The full-focusing algorithm differs considerably from traditional phased array imaging methods: it first requires full matrix capture (FMC) of the data before imaging, and imaging is then performed on the processed data. In FMC, each array element transmits in turn while all elements act as receivers for data collection.


Fig. 6.1 Schematic diagram of traditional full-focus imaging technique

In the traditional sense, full-focus imaging technology refers to the time domain delay-and-sum imaging algorithm based on full-matrix data acquisition [4], which is a virtual focusing method. The N elements of the phased array probe transmit ultrasonic signals in turn in the same repeated cycle, and all elements receive the reflected signals [5, 6]. The schematic diagram is shown in Fig. 6.1.

Assume the center of the phased array is the origin O, the element direction is the x-axis, and the depth direction of the workpiece to be inspected is the z-axis; a two-dimensional Cartesian coordinate system is constructed, and the area to be inspected is divided into grid nodes, all of which can be regarded as virtual focal points. Suppose the inspection area contains a point F with coordinates (x, z). Array element i at (u, 0) sends an ultrasonic signal to point F, which after reflection is received by array element j at (v, 0) [6]. According to Fermat's principle, the sound wave propagates along the shortest path, so the total flight time from element i to point F and back to element j is:

$$t_{ij}=\frac{\sqrt{(x-u)^{2}+z^{2}}}{C}+\frac{\sqrt{(x-v)^{2}+z^{2}}}{C} \quad (6.1)$$

where the first term is the propagation time of the sound wave from i to F, the second term is the time of the reflected wave from F to j, and C is the propagation speed of the ultrasonic wave in the inspected medium, assumed uniform. By applying this delay rule to the transmitting and receiving elements, any point in the inspection area can be focused (N is the number of array elements). From the signals Sij obtained by full-matrix data acquisition, the corresponding pixel value I(x, z) is obtained as:

$$I(x,z)=\sum_{i=1}^{N}\sum_{j=1}^{N} S_{ij}\left(t_{ij}\right) \quad (6.2)$$
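A direct (unoptimized) sketch of Eqs. (6.1)–(6.2); the FMC data layout and nearest-sample lookup are our assumptions:

```python
import numpy as np

def tfm_image(fmc, elem_x, c, fs, grid_x, grid_z):
    """Delay-and-sum TFM from full-matrix capture data.
    fmc[i, j, t]: signal transmitted by element i and received by element j."""
    n = len(elem_x)
    image = np.zeros((len(grid_z), len(grid_x)))
    for iz, z in enumerate(grid_z):
        for ix, x in enumerate(grid_x):
            t_to = np.sqrt((elem_x - x) ** 2 + z ** 2) / c   # one-way times to focal point
            for i in range(n):
                for j in range(n):
                    t = t_to[i] + t_to[j]                    # total flight time, Eq. (6.1)
                    k = int(t * fs)                          # nearest sample index
                    if k < fmc.shape[2]:
                        image[iz, ix] += fmc[i, j, k]        # coherent sum, Eq. (6.2)
    return np.abs(image)
```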

6.2.2 Full-Focusing Algorithm in Frequency–Wave Number Domain

This algorithm is quite different from focusing imaging in the traditional sense; its formula is as follows [7]:

$$E(\omega,u,v)=\frac{-P(\omega)}{(4\pi)^{2}}\iint\frac{\exp(jk_{u}u+jk_{v}v)}{\sqrt{k^{2}-k_{u}^{2}}\sqrt{k^{2}-k_{v}^{2}}}\times\iint f(x,z)\exp\!\left[-j(k_{u}+k_{v})x-j\left(\sqrt{k^{2}-k_{u}^{2}}+\sqrt{k^{2}-k_{v}^{2}}\right)z\right]\mathrm{d}x\,\mathrm{d}z\;\mathrm{d}k_{u}\,\mathrm{d}k_{v} \quad (6.3)$$

In Formula (6.3), u and v are the positions of the transmitting and receiving array elements; E(ω, u, v) is the frequency response of the signal E(t, u, v) obtained by the transmitting and receiving elements from the full-matrix data; P(ω) is the spectrum of the transmitted signal; ω is the angular frequency; f(x, z) is the point scattering function of the target focus; ku is the wave number associated with the transmitting element, and kv is the wave number associated with the receiving element.

In the above formula, the variables u and v are processed by Fourier transform. In order to obtain an image in the appropriate wave number domain, the data wave numbers must be mapped to the image wave numbers. This operation is called Stolt mapping [8]; it not only transforms between the data domain and the image domain but also yields the transformed variables. To clarify the corresponding expression, inverse Stolt mapping is carried out on the Fourier transform of the variables. Since the correspondence is not one-to-one, consistency with the incident wave number ku must be guaranteed, thus obtaining:

$$F(k_{x},k_{z})=\int_{-\infty}^{\infty}F(k_{x},k_{z}\mid k_{u})\,\mathrm{d}k_{u} \quad (6.4)$$

In the above formula, F(kx, kz | ku) is the inverse Stolt mapping for wave number ku. Accumulating the values over all wave numbers and taking the two-dimensional Fourier transform and average effectively suppresses the influence of noise. Finally, by calculating the two-dimensional inverse Fourier transform, the image domain function of the scatterer can be defined.
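For intuition, the sketch below performs a minimal zero-offset Stolt (f–k) remapping under the exploding-reflector model; it is a simplification of the full transmit/receive formulation of Eq. (6.3), and all details (grids, interpolation) are our assumptions:

```python
import numpy as np
from scipy.interpolate import interp1d

def stolt_migrate(data, dt, dx, c):
    """Minimal zero-offset Stolt migration sketch; data is (time samples x positions)."""
    nt, nx = data.shape
    D = np.fft.fft2(data)                        # to the (omega, kx) domain
    omega = 2 * np.pi * np.fft.fftfreq(nt, dt)
    kx = 2 * np.pi * np.fft.fftfreq(nx, dx)
    ce = c / 2.0                                 # exploding-reflector velocity
    out = np.zeros_like(D)
    for j in range(nx):
        kz = omega / ce                          # reuse the omega grid as a kz grid
        # Stolt mapping: evaluate the spectrum at omega(kz) = ce*sqrt(kx^2 + kz^2)
        omega_map = ce * np.sign(kz) * np.sqrt(kx[j] ** 2 + kz ** 2)
        f = interp1d(omega, D[:, j], bounds_error=False, fill_value=0.0)
        out[:, j] = f(omega_map)
    return np.fft.ifft2(out).real                # back to the (z, x) image domain
```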


Table 6.1 Parameter design of phased array probes

Array excitation signal frequency/MHz   Array elements   Center spacing of array elements/mm   Wavelength/mm
5                                       64               1.0                                   1.16
10                                      8                0.6                                   0.58


6.3 Result Analysis

6.3.1 System Introduction

The phased array imaging detection system includes the probe, imaging software, signal acquisition system, and so on. In the empirical study of this paper, a linear array probe with eight elements, a center frequency of 10 MHz, and an element center spacing of 0.6 mm is selected, together with a 64-element linear array probe with a center frequency of 5 MHz and an element center spacing of 1 mm. The specific parameters are shown in Table 6.1. A negative square wave excitation signal with a voltage of 145 V is used, and the signal sampling frequency is 50 MHz. Neglecting the influence of shear waves, the propagation speed of longitudinal waves in the 304 stainless steel sheet is 5800 m s−1, which is entered into the imaging software. The length and width of the 304 stainless steel sheets to be tested are 1200 mm and 800 mm, respectively, and their thicknesses are 1 mm and 3 mm. In each thin plate, a square with a side length of 20 mm and a circle with a diameter of 60 mm are machined as artificial delamination defects, so as to provide an effective basis for the research [9].

6.3.2 Test Experiment

First, the embedding depth of the defects is studied. A longitudinal wave normal probe with a frequency of 10 MHz and a diameter of 6.35 mm is used to measure the delamination defects in the plates with thicknesses of 1 mm and 3 mm, respectively; the obtained ultrasonic waveforms are shown in Figs. 6.2 and 6.3 [10]. The analysis shows that the embedding depth of the defects in plates of different thicknesses is approximately 1/2 of the plate thickness [11].


Fig. 6.2 Defect depth in the 1-mm-thick plate: a square defect, b circular defect

Secondly, the measurement results of the full-focus imaging techniques at 10 MHz are compared and analyzed, as shown in Table 6.2. The results in Table 6.2 show that the measurements of the detection algorithm studied in this paper are closer to the actual values, while traditional full-focus imaging is more easily affected by noise, which increases the measurement difficulty.

Fig. 6.3 Embedding depth of defects in the 3-mm-thick plate: a square defect, b circular defect

Finally, the application results of full-focus imaging technology at 5 MHz are compared and analyzed, as shown in Table 6.3. To further verify the advantages of the algorithm studied in this paper, the 5 MHz 64-element phased array probe is selected for detection and analysis. The actual length of the probe used is larger than the defect size.


Table 6.2 Comparison of measurement results at 10 MHz

Defect type       A-scan   Frequency–wave number domain TFM   Traditional TFM   Deviation of frequency–wave number domain TFM from A-scan   Deviation of traditional TFM from A-scan
Circular defect   1.472    1.492                              1.433             0.020                                                       0.039
Square defect     1.481    1.493                              1.543             0.012                                                       0.062

Table 6.3 Measurement results at 5 MHz

Algorithm                                       Defect size   Measured result   Error
Full focusing in frequency–wave number domain   20 * 20       18.69 * 18.69     6.55
                                                φ60           φ52.29            12.85
Traditional full focusing                       20 * 20       22.23 * 22.23     11.15
                                                φ60           φ45.29            24.52

During the detection, part of the array elements lie above the defect while the rest lie above the defect-free region. From a theoretical perspective, the final image therefore contains two parts: the defect image on the one hand and the defect-free response on the other. Comparing the results of the two full-focus imaging methods, the traditional full-focus imaging algorithm has difficulty accurately distinguishing the defect region from the defect-free region, whereas the algorithm in this paper can accurately locate the defect and produces a clear step contour in the image. It should be noted that it is difficult for either algorithm to quantify the exact length of the defect, but the traditional full-focus imaging algorithm has lower resolution and less accuracy in actual measurement. Therefore, the frequency–wave number domain full-focus imaging algorithm is effective [6–8, 12].

6.4 Conclusion

To sum up, in the new era of economic development, shipping is an important basis for steady social and economic growth, and the quality of the sheet metal used directly determines the subsequent operating efficiency of vessels and the safety of personnel and property. Therefore, in practical research and production, ultrasonic-phased array frequency–wave number domain full-focus imaging should be used for testing and analysis. The empirical analysis in this paper shows that although the traditional full-focus algorithm is simpler in principle, its computing time and imaging efficiency are inferior to those of the frequency–wave number domain full-focus imaging technology. Therefore, in subsequent research and development, the ultrasonic-phased array full-focusing method should be used appropriately for imaging inspection of the relevant applied materials, which can not only obtain more significant image information but also guarantee the actual measurement results and improve the accuracy of imaging.

References

1. Liu, Z.H., Dong, T.C., Peng, Q.L., et al.: Acoustic emission source location of carbon fiber composite plate. Nondestr. Test. (10), 5 (2016)
2. Zhang, H.Y., Duan, W.H.: Acoust. Technol. 39(4), 5 (2020). (in Chinese)
3. Su, D.Q., Liu, Y.: Research on accurate identification and coverage assessment of street shops based on "Internet + MR" big data. Designing Tech. Posts Telecommun. (2) (2019)
4. Zhang, L.J.: Research on user portrait assisted precision marketing based on big data analysis. Telecom Technol. 000(001), 61–62, 65 (2017)
5. Li, Y.: Defect characterization in ultrasonic full-focus imaging detection. Nondestr. Test. (3) (2018)
6. Li, Y.: Full-focus surface detection of elbow weld. Nondestr. Test. (1), 5 (2019)
7. Yang, G.D., Zhan, H.Q., Chen, W., et al.: Nondestr. Test. 40(5), 4 (2018). (in Chinese)
8. Wei, W., Zhu, D.Y., Wu, D.: Radar J. 9(2), 9 (2020). (in Chinese)
9. Liang, J.J., Li, X.B., Ni, P.J., et al.: Ultrasonic phased array detection method for magnesium alloy shell. Ordnance Mater. Sci. Eng. 37(4), 4 (2014)
10. Li, Y., Li, Y., et al.: Ultrasonic phased array full focus imaging detection. Nondestr. Test. 39(386), 62–69 (2017)
11. Lee, J.H., Choi, S.W.: Optimum design of linear phased array transducer for NDE. Key Eng. Mater. 183(187), 619–624 (2000)
12. Chi, Q.Q., Hu, M.H.: Acoust. Technol. 39(02), 52–59 (2020). (in Chinese)

Chapter 7

A Novel Space Division Rough Set Model for Feature Selection

Shulin Wu, Shuyin Xia, and Xingxin Chen

Abstract Feature engineering has been widely used in the fields of pattern recognition, data mining and machine learning, its main application being feature selection. As an excellent branch of feature selection, rough set attribute reduction has also developed greatly in recent years, but the classical rough set can only deal with discrete data. In view of this, combining space partitioning, this paper proposes an efficient rough set attribute reduction theory that makes the model suitable not only for discrete data sets but also for continuous data sets, and obtains an efficient rough set algorithm. The experimental results show that the algorithm reduces the number of attributes, and the KNN classifier is used to test the accuracy of the attribute reduction; compared with existing algorithms, it shows great advantages.

Keywords Feature selection · Rough set · Attribute reduction · Space partition

7.1 Introduction

Feature selection, or attribute selection, is a very important process in data preprocessing. It can not only reduce the difficulty of learning but also alleviate the curse of dimensionality. Therefore, feature selection is a hot issue in current research, and a large number of scholars at home and abroad have made important contributions to it. According to how the search strategy forms the subsets, the research directions can be divided into three methods: global optimal search, random search and heuristic search. So far, the branch and bound method is the only global search method that can obtain the optimal result; it finds the optimal subset under the condition that the number of features in the optimal feature subset is determined in advance [1]. The random search strategy assigns a certain weight to each feature during the operation of the algorithm. The heuristic search strategy is very efficient, but at the expense of global optimality. According to whether the selection method is independent of the learning algorithm, it can be roughly divided into


three categories: filter methods, wrapper methods and embedded methods. The main idea of filter methods is to score the features according to certain evaluation criteria and then sort the features in descending order of their scores. The evaluation criteria can be divided into four categories: distance measures, information measures, dependence measures and consistency measures. Wrapper methods treat feature selection as a feature-subset optimization process and use a classifier to evaluate the feature subsets. Because each feature subset requires training a classifier, most wrapper methods are inefficient. In addition, due to the limitation of the classifier used, the feature subset obtained by a wrapper method often has low generality. Research on wrapper methods usually focuses on the optimization process. Embedded methods embed feature selection into the training process of a learning algorithm to screen out the features that are important for model training.

Feature selection based on rough set theory belongs to the filter methods. Rough set theory was proposed by the Polish scientist Pawlak in 1982 [2]. It is an effective mathematical tool for processing uncertain, inconsistent and incomplete data, and it has been widely applied in machine learning, data mining, decision support systems and other fields [3–9]. The forward heuristic attribute selection algorithm based on rough set theory can effectively reduce the time complexity; thus, many research results have emerged over the decades [6, 10–14]. Hu et al. [15] proposed a neighborhood rough set model, which can be applied to both discrete and continuous data sets. Neighborhood rough sets use neighborhood relations to replace equivalence relations, and experiments show that neighborhood rough sets are more effective than classical rough sets. However, classical neighborhood rough sets require many Euclidean distance calculations, which leads to low efficiency when the data dimensionality is high. Therefore, a new rough set model is proposed in this paper, which is suitable for both discrete and continuous data. The complex steps of Euclidean distance calculation are omitted, and the attribute importance is comprehensively considered by using the subspaces of a space division together with the size of the positive domain. The experimental results show that the model performs well. The main contributions of this paper are as follows:

1. Space division is used to process the relationships between samples, avoiding discretizing the data set before seeking the positive domain and thus achieving de-discretization.
2. Using the idea of granular computing, the relationship between the positive domain and the number of subspaces is comprehensively considered to determine a more effective attribute importance. Experiments show that this achieves better accuracy than the classical rough set.


7.2 Related Work

The essence of filter methods is to score features with indicators from mathematical statistics, including the Pearson correlation coefficient, the Gini coefficient, Kullback–Leibler divergence, the Fisher score, similarity measures, etc. Since filter methods only use the dataset itself and do not rely on a specific classifier, they have strong versatility and are easy to extend. Compared with wrapper and embedded methods, filter methods have lower algorithmic complexity, but at the same time their classification accuracy is usually lower. In addition, because filter methods score single features rather than whole feature subsets, the feature subsets they produce usually have high redundancy. Gu et al. [16] proposed a generalized Fisher score feature selection method, which aims to find a feature subset that maximizes the lower bound of the Fisher score; it transforms feature selection into a quadratically constrained linear program solved with a cutting-plane algorithm. Roffo and Melzi [17] proposed a graph-based feature selection method, which ranks the most important features by treating them as arbitrary sets of cues. The method maps the feature selection problem to an affinity graph by taking features as nodes and then evaluates the importance of the nodes by eigenvector centrality. Another work [18] proposed a filter feature selection method, Inf-FS, which takes features as the nodes of a graph and views a feature subset as a path in the graph. The power series property of matrices is used to evaluate the paths, and the computational complexity is reduced by summing the paths as their length goes to infinity; there are also other feature selection methods [19–22].

For rough set filter methods, Hu, Zhao and Yu proved that samples in the positive region will not enter the boundary region during forward attribute reduction, and proposed the FARNeMF algorithm [23]. The algorithm only considers the samples of the boundary region when computing neighborhoods, which reduces redundant calculation. Gao et al. [24] use a matrix to store the measurement results, so that after the dimension increases only one-dimensional measurement is required, reducing the amount of calculation needed to find the positive region. In neighborhood rough sets, the neighborhood radius is a parameter that greatly affects the reduction results and must be set manually; how to select this parameter is also a frequently studied problem. Peng et al. [25] designed a fitness function that combines the attributes of the data set and the classifier to select the best neighborhood radius from a given interval. Xia et al. [26] proposed GBNRS, an adaptive neighborhood rough set model, by combining granular computing with neighborhood rough sets; the method covers the data set with multiple granular balls and adaptively sets the neighborhood radius according to the average distance between the samples in a ball and its centroid. The most widely used rough set approach is the forward heuristic: it is not only efficient but can also generate different knowledge rules, so the rough set method in this paper defaults to it. On this basis, a new rough set model is proposed, which not only has the characteristics of the classical neighborhood rough set and can process


continuous data with high efficiency, but also uses the leaf-node subspaces to define a new index that improves the classification accuracy more noticeably.

7.3 Our Approach

The classical Pawlak rough set is based on a strict equivalence relation, which refines the sample space into disjoint equivalence classes and uses the upper and lower approximation sets to describe the uncertainty of information granules. The more samples in the positive domain, the higher the dependence of the label on this attribute, and the greater the effect of this attribute in distinguishing labels. However, due to the strict equivalence relation, classical rough sets can only be used on discrete data sets (Fig. 7.1).

Fig. 7.1 Schematic diagram of classical neighborhood rough set


through space division and combines the number of subspaces to comprehensively consider the importance of attributes. Definition 1 Space division [27]. Given a data set D, suppose that S is a compact data space reduced by D. {S1 , S2 , . . . , S M } is a space division of S if S = S1 ∪ S2 ∪ . . . S M , Si ∩ S j = ∅, 0 < i, j < M. We say that Si is a subspace of S. The parent space S of Si is denoted as parent (Si ). We let parent2 (Si ) = parent(parent (Si )), and so on, with the j parent space of Si notated parent j (Si ). The majority of the labels of points that appears in S is denoted label (S) (Fig. 7.2). Definition 2 Upper and lower approximation. Let be a decision system. We notate the partition of the universe U by the decision attribute set D into n equivalence classes by X 1 , X 2 , . . . , X n . ∀B ∈ C there is a corresponding equivalent relationship R B on U. The upper approximation and the lower approximation of D with respect to B are respectively defined as BD =

Fig. 7.2 Spatial hierarchy of the division



.. i=1 B X i

(7.1)

72

S. Wu et al.

BD =



.. i=1 B X i

(7.2)

    where B X i = xk |σ BS (xk ) ⊆ X i , xk ∈ U ,B X i = xk |σ BS (xk ) ∩ X i = ∅, xk ∈ U . Definition 3 Positive domain in single subspace. Let U, C, D be an information system. S is a space division of U, S = {S1 , S2 , . . . , Sn }, Si is different subspace, ∀B ∈ C, and its relative positive domain index RPOS can be defined as follows: RPOS B (D) = POS B (D)/|S|

(7.3)
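A minimal sketch of the index in Eq. (7.3), assuming the space division is represented as a list of sample-index arrays (this representation is our assumption, not the paper's):

```python
import numpy as np

def rpos_index(decision, subspaces):
    """Relative positive-domain index RPOS_B(D) of Eq. (7.3).
    decision: array of decision labels; subspaces: list of index arrays."""
    pos = sum(len(idx) for idx in subspaces
              if len(np.unique(decision[idx])) == 1)   # consistent subspaces -> positive domain
    return pos / len(subspaces)                        # POS_B(D) / |S|
```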

7.4 Experiments

In this part, we compare our method with other excellent algorithms widely used in the field of attribute selection. The 11 public UCI machine learning data sets [28] are shown in Table 7.1.

Table 7.1 Information of the datasets

No.  Dataset                                Samples  Attributes  Classes
1    Algerian_forest_fires_dataset_UPDATE   244      10          2
2    anneal                                 798      37          5
3    Data_for_UCI_named                     10,000   13          2
4    heart2                                 303      13          5
5    hepatitis                              155      19          2
6    sat                                    4435     36          7
7    seismic-bumps                          2583     18          2
8    sensorReadings                         5456     24          4
9    SeoulBikeData                          8760     12          2
10   wdbc                                   569      30          2
11   wifiLocalization                       2000     7           4

Firstly, as a branch of feature selection, attribute reduction removes redundant attributes as far as possible without affecting the accuracy of classification decisions; reducing as many attributes as possible is therefore an important goal of our algorithm. From Table 7.2 we can see that the proposed algorithm achieves a good reduction effect on more than half of the data sets while maintaining high classification accuracy; for Data_for_UCI_named the reduction rate even exceeds 90%. It can be seen that our algorithm performs well in terms of the degree of reduction.

Table 7.2 Results of attribute reduction

Dataset  Reduction result                              Reduction rate (%)
1        [4, 7]                                        80
2        [0, 2, 3, 4, 8, 10, 11, 12, 19, 22, 24, 35]   68.4
3        [12]                                          92.3
4        [1, 2, 4, 6, 10, 11, 12]                      46.2
5        [1, 3, 4, 16, 17]                             73.4
6        [0, 2, 11, 15, 16, 17, 18, 20]                77.8
7        [1, 4, 7, 8]                                  77.8
8        [10, 13, 16, 17, 18]                          79.2
9        [0, 9, 10]                                    75
10       [0, 13, 21, 22, 23, 24, 26]                   76.7
11       [2, 3, 4, 6]                                  42.9

In this part, we compare our new algorithm with a popular neighborhood rough set algorithm (FARNeMF) and the classical rough set algorithm (RS); with GBNRS, a feature selection algorithm that uses granular balls to describe the neighborhood relation; and with the weighted neighborhood rough set (WNRS), which uses attribute weighting to obtain attribute selection results. Using the reduction results of the space partition rough set (SPRS), the rough set algorithms are compared on the same benchmark, that is, the best-performing reduction results are selected. In addition, two well-performing non-rough-set feature selection algorithms are selected, MutInfFS and mRMR; both use information measures to determine the importance of features, and their source code comes from the open-source library FSLib. For these two algorithms, we use grid search to obtain the optimal results. Because the reduction results are independent of the classifier, we uniformly select the widely used KNN classifier for accuracy evaluation. It should be noted that, since the other six algorithms output an importance ranking of the attributes, in the experiments we select the first k features with the best performance. The experimental results are shown in Table 7.3.

As shown in Table 7.3, on the whole our algorithm performs very stably on the eleven data sets: it performs best on most of them, and on the few data sets where it performs worse it is still close to the best-performing algorithm. It is worth mentioning that the average accuracy over all data sets reaches 0.9120, one point higher than the second-highest algorithm, GBNRS, which proves the effectiveness of our algorithm. On the four data sets where it achieves the second-highest accuracy, our algorithm is still relatively effective; however, because the amount of data in these data sets is small and the data are too dispersed, some of the spatial division results are poor.
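For illustration, the KNN accuracy evaluation of a reduced attribute subset can be sketched as follows; the random data here are placeholders standing in for the real UCI feature matrix, and the reduct is row 11 of Table 7.2 (wifiLocalization):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data with the shape of wifiLocalization (2000 samples, 7 attributes, 4 classes)
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 7))
y = rng.integers(0, 4, size=2000)

reduct = [2, 3, 4, 6]                       # attribute subset from Table 7.2
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X[:, reduct], y, cv=10)
print(f"mean 10-fold accuracy: {scores.mean():.4f}")
```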


Table 7.3 Classification accuracy of attribute reduction based on KNN

Dataset  RS      FARNeMF  GBNRS   WNRS    MutInfFS  mRMR    SPRS
1        0.9301  0.9633   0.9631  0.9511  0.9140    0.9428  0.9635
2        0.9222  0.98131  0.9699  0.9649  0.9725    0.9562  0.9800
3        0.9875  0.9997   0.9147  0.9782  0.7569    0.9354  0.9997
4        0.5607  0.5776   0.5772  0.5642  0.5644    0.5542  0.5903
5        0.8129  0.8579   0.8571  0.8714  0.8250    0.8652  0.8650
6        0.8211  0.8475   0.8455  0.8461  0.8536    0.8457  0.8511
7        0.8742  0.9183   0.9167  0.9175  0.8974    0       0.9210
8        0.8539  0.7769   0.7734  0.7725  0.7524    0.8528  0.9235
9        0.9238  0.9902   0.9257  0.9259  0.9776    0.9173  0.9902
10       0.9384  0.9719   0.9649  0.9736  0.9719    0.9631  0.9701
11       0.968   0.9775   0.975   0.975   0.959     0.9745  0.9775
Average  0.8721  0.8965   0.8803  0.9032  0.8503    0.8773  0.8586

7.5 Conclusion

We have introduced a new rough set model that inherits the advantages of the original framework while avoiding Euclidean distance calculations: the classical neighborhood is replaced by a space division, and a new relative positive domain index is adopted to further improve the accuracy of the model. A new definition consistent with the classical rough set principles is given, which makes the model more interpretable and more widely adaptable, and it performs well on a large number of data sets. We will continue this line of research in the future.

Acknowledgements This work was supported in part by the National Key Research and Development Program of China under Grant No. 2019QY(Y)0301, National Natural Science Foundation of China under Grant Nos. 61806030 and 61936001, the Natural Science Foundation of Chongqing under Grant Nos. cstc2019jcyj-msxmX0485 and cstc2019jcyj-cxttX0002 and by NICE: NRT for Integrated Computational Entomology, US NSF award 1631776.

References

1. Dash, M., Liu, H.: Feature selection for classification. Intell. Data Anal. 1(3), 131–156 (1997)
2. Pawlak, Z.: Rough sets. Int. J. Comput. Inform. Sci. 11(5), 341–356 (1982)
3. Ma, W., Huang, Y., Li, H., Li, Z., Jiao, L.: Image segmentation based on rough set and differential immune fuzzy clustering algorithm. J. Softw. 25(11), 2675–2689 (2014)
4. Qian, Y., Liang, X., Wang, Q., Liang, J., Dang, C.: Local rough set: a solution to rough data analysis in big data. Int. J. Approximate Reasoning 97, 38–63 (2018)
5. Vidhya, K., Geetha, T.: Entity resolution framework using rough set blocking for heterogeneous web of data. J. Intell. Fuzzy Syst. 34(1), 659–675 (2018)
6. Hu, M., Yao, Y.: Structured approximations as a basis for three-way decisions in rough set theory. Knowl. Based Syst. 165, 92–109 (2019)
7. Yao, Y., Wong, S., Lingras, P.: A decision-theoretic rough set model. Methodol. Intell. Syst. 17–24 (1990)
8. Ziarko, W.: Variable precision rough set model. J. Comput. Syst. Sci. 46(1), 39–59 (1993)
9. Li, Z., Fan, J., Ren, Y., Tang, L.: A novel feature extraction approach based on neighborhood rough set and PCA for migraine rs-fMRI. J. Intell. Fuzzy Syst. 38(6), 1–11 (2020)
10. Luo, S., Miao, D., Zhang, Z., Zhang, Y., Hu, S.: A neighborhood rough set model with nominal metric embedding. Inf. Sci. 520, 373–388 (2020)
11. Wang, C., Qi, Y., Shao, M., Hu, Q., Chen, D., Qian, Y., Lin, Y.: A fitting model for feature selection with fuzzy rough sets. IEEE Trans. Fuzzy Syst. 25(4), 741–753 (2017)
12. Zhang, K., Zhan, J., Wang, X.: TOPSIS-WAA method based on a covering-based fuzzy rough set: an application to rating problem. Inf. Sci. 539, 397–421 (2020)
13. Zhang, W., Chen, H.: Hyperspectral band selection algorithm based on kernelized fuzzy rough set. J. Comput. Appl. 40(1), 258–263 (2020)
14. Dai, J., Hu, H., Wu, W., Qian, Y., Huang, D.: Maximal-discernibility pair-based approach to attribute reduction in fuzzy rough sets. IEEE Trans. Fuzzy Syst. 26(4), 2174–2187 (2017)
15. Hu, Q., Yu, D., Xie, Z.: Numerical attribute reduction based on neighborhood granulation and rough approximation. J. Softw. 19(3), 640–649 (2008)
16. Gu, Q., Li, Z., Han, J.: Generalized Fisher score for feature selection. In: CoRR (2012)
17. Roffo, G., Melzi, S.: Ranking to learn: feature ranking and selection via eigenvector centrality. In: CoRR, pp. 19–35 (2017)
18. Roffo, G., Melzi, S., Castellani, U.: Infinite feature selection: a graph-based feature filtering approach. IEEE Trans. Pattern Anal. Mach. Intell. 43(12), 4396–4410 (2020)
19. Guyon, I., Weston, J., Barnhill, S., Vapnik, V.: Gene selection for cancer classification using support vector machines. Mach. Learn. 46(1–3), 389–422 (2002)
20. Guo, J., Guo, Y., Kong, X., He, R.: Unsupervised feature selection with ordinal locality. In: Proceedings of the IEEE International Conference on Multimedia and Expo, pp. 1213–1218 (2017)
21. Guo, J., Zhu, W.: Dependence guided unsupervised feature selection. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32(1) (2018)
22. Bradley, P., Mangasarian, O.: Feature selection via concave minimization and support vector machines. In: Proceedings of the International Conference on Machine Learning, vol. 98, pp. 82–90 (1998)
23. Hu, Q., Zhao, H., Yu, D.: Efficient symbolic and numerical attribute reduction with neighborhood rough sets. Pattern Recognit. Artif. Intell. 21(6), 730–738 (2008)
24. Gao, Y., Liu, Z., Ji, J.: Neighborhood rough set attribute reduction algorithm based on matrix reservation strategy. Appl. Res. Comput. 32(12), 3570–3573 (2019)
25. Peng, X., Liu, Z., Ji, J.: Adaptable method for determining neighborhood size of neighborhood rough set. Appl. Res. Comput. 36(327), 150–153 (2019)
26. Xia, S., Zhang, Z., Li, W.: GBNRS: a novel rough set algorithm for fast adaptive attribute reduction in classification. IEEE Trans. Knowl. Data Eng. (2020)
27. Xia, S., Zheng, Y., Wang, G.: Random space division sampling for label noisy classification or imbalanced classification. IEEE Trans. Cybern. (2021)
28. UCI Homepage. https://archive.ics.uci.edu/ml/index.php. Last accessed 2021/06/05

Chapter 8

Development of Mobile Food Recognition System Based on Deep Convolutional Network

Yue Geng

Abstract This paper designs and implements a mobile food recognition system aimed at common foods in China, so as to provide a reference for the nutritional intake of the population. To address the difficulties of Chinese food image recognition, such as the diverse appearance of the same dish and complex food backgrounds, this paper proposes a mobile food image recognition method based on deep convolutional neural networks. A dataset with richer data, "ChineseFood80", is built, and transfer learning is used to train models by importing pre-trained models. Compared with the baseline method, the recognition accuracy is improved by more than 10%. Based on the trained recognition model, this paper also designs and develops a food recognition system for Android devices that can recognize food from real-time photographs.
Keywords Food recognition · Deep learning · CNN · ResNet · Android

8.1 Introduction

Nowadays, people can obtain the food they want more easily than ever. Reasonably planning and recording the types and amounts of food they consume can help people clearly understand their daily nutrition and cultivate healthy lifestyle habits. In response, food recognition based on computer vision technology can provide a reference for nutritional intervention [1]. However, recognizing Chinese food raises many problems: similar dishes may have many different appearances, and the environments of the dishes and the appearances of containers are diverse. Deep learning provides a way to solve the problem of Chinese food identification. Food recognition is an emerging topic in the field of computer vision, and deep convolutional neural networks have achieved good results in many classification and


recognition problems [2]. Based on different datasets, researchers have used different types of deep convolutional networks, such as AlexNet, VGG16 and ResNet, to train recognition models and achieved good recognition accuracy [3–6]. However, due to the complexity of Chinese food, many datasets are not rich enough, and the models cannot achieve high recognition accuracy. At the same time, with the spread of mobile devices, deploying food recognition models directly on the mobile terminal, so that food can be photographed and identified on the spot, offers people a more convenient and efficient way to identify food. Some mobile food recognition systems exist [7, 8], but their recognition models are deployed on remote servers; such methods require an Internet connection during use and may incur a large time delay. However, current deep learning frameworks such as TensorFlow and PyTorch provide libraries for Android development, which makes it possible to deploy the recognition model directly and independently on Android devices.

To solve the above problems, this paper builds a richer dataset, uses deep convolutional networks to train models, deploys the model on an Android device, and finally realizes an efficient mobile food recognition system. The main achievements are as follows: (1) This paper builds the "ChineseFood80" dataset, which contains 80 food types and over 140,000 images, by screening some open-source datasets and collecting food images through web crawlers; data preprocessing and data enhancement are also used to obtain more diverse images that meet the requirements of network training. (2) This paper uses PyTorch to import pre-trained ResNet models and perform transfer learning on the "ChineseFood80" dataset; the Top-1 recognition accuracy exceeds 92%. The accuracies and network complexities of different models are compared, and the model with the best trade-off is selected for deployment on the Android side. (3) This paper uses the methods provided by the PyTorch for Android library to deploy the selected model directly on the Android device, so that captured pictures can be calculated and recognized on the device through the model without relying on a server or the Internet, which improves the efficiency of use.

8.2 Related Work

8.2.1 Food Recognition Models

Researchers have used many types of deep convolutional networks to train food recognition models. For example, based on a pure CNN architecture, Kiourt et al. built a "PureFoodNet" network to train a food recognition model [9], and its Top-1 accuracy on the Food101 dataset reached 78%. The problem with the model is that the architecture


of the network was simple and could not reach high accuracy. For Chinese food, based on the ChineseFoodNet dataset [5], Chen et al. used different convolutional neural networks, including VGG19, ResNet and DenseNet, to train recognition models; they also fused the inference results of the different models, and the best Top-1 recognition accuracy reached 81.42%. Liang et al. built an "MVANet" network [10], which can fuse features of different tasks and has fewer parameters; for single-label food recognition, they achieved 64% Top-1 recognition accuracy on ChineseFoodNet. The general problem in Chinese food recognition is that, given the diversity of Chinese food, the datasets may not be rich enough, so even a very deep network cannot reach very high accuracy.

8.2.2 Mobile Food Recognition System

Ocay et al. [7] implemented a "NutriTrack" application on Android devices. The application captures a photo of the food and transfers the picture through the Android API to cloud-based databases for recognition. The problem with this application is that users must connect the device to the server to recognize food, which can be inconvenient. Fakhrou et al. [11] implemented a food recognition application that loads the model from a server and can then be used without the Internet; however, users must still connect the device to a server to load the model. This paper aims to implement an application that can be used without any server.

8.3 Methods

8.3.1 Deep Convolutional Neural Network

The experiments use convolutional neural networks for model training. The Convolutional Neural Network (CNN) is currently among the most widely used, efficient and stable methods in many fields, especially image classification [12, 13]. CNNs make full use of the local characteristics of the data itself to optimize the network structure. Owing to local perception and shared weights, a CNN can greatly reduce the number of parameters in its structure, and the network structure becomes simpler and clearer [14, 15]. Deeper convolutional neural networks can achieve better accuracy because they can extract more features; however, a network that is too deep hampers forward propagation and causes network degradation. Building on the CNN architecture, the ResNet network proposed by He et al. solves this problem by introducing residual blocks [16, 17], which also resolve the problem of gradient vanishing during back propagation.


ResNet is an improvement of the CNN structure. The structure of a CNN includes the following parts:

Convolutional Layer. The convolutional layer applies weighted convolution kernels to the image to extract different image features. Stacking multiple convolutions extracts progressively higher-level features, which the network learns and uses for image classification tasks.

Batch Normalization Layer (BN). The BN layer adjusts the distribution of the input values to a normal distribution with mean 0 and variance 1, preventing the input distribution from shifting during training and causing the gradient to vanish. This normalization also improves the generalization ability of the network and speeds up convergence.

Activation Function. The activation function is a non-linear function whose main role is to increase the non-linearity of the neural network, enabling a deep network to better approximate arbitrary functions and learn the characteristics of the data more accurately. ResNet uses ReLU as the activation function [18]; its linear, non-saturating behaviour reduces the amount of computation, improves the sparsity of the network, and extracts the characteristics of the data better.

ResNet adds a shortcut connection to the CNN. The shortcut forms an identity mapping between layers, ensuring that performance does not degrade as the number of layers deepens; when dimensions differ, a convolutional layer on the shortcut keeps the dimensions of the added data consistent. ResNet has since evolved into a variety of network structures: by stacking different numbers of residual blocks, ResNet networks of different depths can be built to suit different classification tasks, as sketched below.
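The chapter describes the residual block only in prose; the following is a minimal PyTorch sketch of the idea, not the exact blocks used in torchvision's ResNet — the class name and channel arguments are illustrative.

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Illustrative residual block: two 3x3 convolutions plus a shortcut.
    When stride or channel count changes, a 1x1 convolution on the shortcut
    keeps the dimensions of the added tensors consistent."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # Identity shortcut, or a 1x1 projection when the shapes differ.
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch))
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))  # add the skip connection
```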

8.3.2 Training Food Recognition Models

To further improve recognition accuracy, this paper adopts transfer learning. Using a pre-trained model for transfer learning enables the neural network to apply the experience and knowledge learned in one field to a related field, thereby improving model training [19, 20]: model parameters already trained on a large amount of similar data are transferred and then fine-tuned on our dataset, so the classifier we need can be obtained more quickly and accurately. PyTorch provides ResNet models pre-trained on the ImageNet dataset [21]. ImageNet is a large-scale database commonly used for visual object recognition in computer vision; it has more than 14 million images covering more than 20,000


categories, so it is suitable for the image classification task of this experiment. As shown in Fig. 8.1, transfer learning can be implemented with PyTorch. The key blocks in the structure include:

Record Images and Labels. To load the data, the images and labels need to be recorded. In this paper we use a csv file: images of one kind are placed in one folder, and the writer provided by the csv library records each file name in the first column and the corresponding label in the second column.

Import Data. We use the Dataset class provided by PyTorch to import the data. The images and labels are split into train, validation and test sets at a ratio of 6:2:2 so that the training effect can be verified during training. The data are preprocessed to fit the network: functions provided by PyTorch convert each image to a 224 × 224 tensor, as required by the ResNet network, and data enhancement functions are added to increase image diversity and improve the generalization capability of the model. Finally, the DataLoader class loads the data and feeds it to the network.

Train Network. The right-hand part of the figure trains the network for 100 epochs. In each epoch, forward propagation, loss computation, back propagation and gradient updates are performed, and the best-performing model is saved (a minimal training sketch follows Fig. 8.1).

Fig. 8.1 The process of using PyTorch framework to realize transfer learning
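Figure 8.1 is described but no code is given; the sketch below shows the transfer-learning loop under the steps above. `train_loader` and `val_loader` stand for the csv-indexed Dataset/DataLoader pipeline just described, and the saved file name is illustrative.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a ResNet-34 pre-trained on ImageNet and replace its final fully
# connected layer with an 80-way classifier for ChineseFood80.
model = models.resnet34(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 80)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def evaluate(net, loader):
    """Top-1 accuracy over a DataLoader."""
    net.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            correct += (net(images).argmax(dim=1) == labels).sum().item()
            total += labels.numel()
    return correct / total

best_acc = 0.0
for epoch in range(100):
    model.train()
    for images, labels in train_loader:          # assumed DataLoader over the dataset
        optimizer.zero_grad()
        loss = criterion(model(images), labels)  # forward propagation + loss
        loss.backward()                          # back propagation
        optimizer.step()                         # update the weights
    acc = evaluate(model, val_loader)            # assumed validation DataLoader
    if acc > best_acc:                           # keep the best-performing model
        best_acc = acc
        torch.save(model.state_dict(), "best_resnet34.pth")
```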


8.3.3 Deploying the Recognition Application on the Android Side

By adding the PyTorch for Android library in Android Studio, we can deploy the trained model to an Android device. As shown in Fig. 8.2, to convert the model to fit the Android device, we first load the model network and parameters and set the format of the input layer, then use the trace() function provided by PyTorch to convert the model format. We then deploy the model in an Android application. First, we create a string array containing the names of all identifiable food types. Then we create functions that preprocess the image so that the data fit the input of the network; this process is the same as in model training. Next we create a function that reads the data and obtains the recognition result: it feeds the input into the model, gets the result, and returns the name from the string array created before. Finally, we create functions to call the camera, set the interfaces that transfer data in the main file, and create a layout file that displays the picture and the corresponding recognition result.

The framework and modules of the system mainly include the following parts:

Model Conversion. Before deploying the model on Android, we convert the trained model to the TorchScript format through the trace() function provided by the PyTorch library (see the sketch after Fig. 8.2). Android devices are often far less powerful than computers; this step optimizes the model so that the Android device can deploy and use it better.

Call Up the Camera. We use the MediaStore ACTION_IMAGE_CAPTURE method in Android Studio to call the camera and specify the path of the obtained image for the subsequent image recognition steps. A button created in Android Studio triggers these functions when clicked.

Fig. 8.2 The process of deploying the food recognition model on the Android side
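A minimal sketch of the conversion step named above; the checkpoint and output file names are carried over from the training sketch as assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

# Rebuild the network, load the trained parameters, and trace the model
# with an example input of the shape the network expects.
model = models.resnet34()
model.fc = nn.Linear(model.fc.in_features, 80)
model.load_state_dict(torch.load("best_resnet34.pth", map_location="cpu"))
model.eval()

example = torch.rand(1, 3, 224, 224)          # format of the input layer
traced = torch.jit.trace(model, example)      # convert to TorchScript with trace()
traced.save("chinesefood80_mobile.pt")        # file shipped in the Android app's assets
```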


Image Recognition. First, we create a file containing all the recognizable category tags; the trained model can recognize 80 categories, and all the types are stored as a string array to facilitate the display of the identification results. Then we create a file to call the model: the converted model file is imported into the project folder and its storage location recorded; an image preprocessing function converts the bitmap obtained by taking pictures into the tensor data type used for model inference; an argmax function processes the recognition result, returning the position in the string array whose entry is the output recognition result; and a predict function passes the preprocessed input to the model, obtains the raw result, and returns the final recognition result through the argmax function (a Python sketch of the same pipeline follows).

Display the Result. We create a module to display the result page. Its function is to display the picture and the recognition result after a photo is taken and recognized; the layout shows the image in the interface with the recognition result below it.
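The on-device code is written in Java/Kotlin inside Android Studio; as the paper includes no listings, the same preprocess–forward–argmax pipeline is sketched here in Python, with the label array and file paths as placeholders.

```python
import torch
from PIL import Image
from torchvision import transforms

# Placeholder for the 80-entry string array of category tags.
LABELS = ["kung pao chicken", "mapo tofu", "..."]

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),   # bitmap -> network input size
    transforms.ToTensor(),           # -> tensor data type used for inference
])

def predict(photo_path, traced_model):
    tensor = preprocess(Image.open(photo_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        scores = traced_model(tensor)        # feed the data into the model
    idx = int(scores.argmax(dim=1))          # argmax over the class scores
    return LABELS[idx]                       # name looked up in the string array

# Usage: result = predict("photo.jpg", torch.jit.load("chinesefood80_mobile.pt"))
```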

8.4 Experiment Results

Based on the above method, we first used the existing open-source datasets and a crawler implemented in Python to construct the "ChineseFood80" dataset for model training. Then we used the Python language and the PyTorch library to train and optimize the models. Finally, we used Android Studio to develop the Android application.

8.4.1 "ChineseFood80" Dataset

The ChineseFoodNet and ChinFood1000 datasets cover many food types seen in China, but some categories are uncommon and have little data. As shown in Fig. 8.3, based on the food types of the ChineseFoodNet dataset, this paper first selects the more common categories with larger amounts of data to ensure the diversity and complexity of the dataset.

Filtering Food Categories. Compared with foreign food, common Chinese foods are more abundant and diverse, and the same dish may have different appearances, so training a recognition system for Chinese food requires a larger dataset with more diverse pictures. Therefore, the food categories in the existing datasets are screened, and the categories that are common in the country and have no fewer than 800 varied pictures are selected.

Using a Web Crawler to Expand the Dataset. A web image crawler is implemented to obtain images of different types of food. At present, many image search websites


Fig. 8.3 The process of constructing the ChineseFood80 dataset

have a two-tier structure: the home page is an index of the image pages, and clicking an index entry opens the specific image display page. As shown in Fig. 8.4, the idea of the crawler is that the URLs of the index pages are fairly regular — only the last part of each URL, the index value, changes — so all index pages can be generated in batches; the pictures are then extracted with the re regular-expression module and downloaded with the request function (a minimal sketch follows Fig. 8.4).

Build the "ChineseFood80" Dataset. We first screened the food types with richer data in datasets such as ChineseFoodNet, then crawled pictures of each food type from the three common search engines Google, Baidu and Microsoft, filtered the crawled pictures, removed the wrong ones, and added the remainder to each category. Finally, the dataset was expanded to at least 1500 pictures of each kind, for a total of 140,245 pictures, which met the needs of subsequent model training.

Fig. 8.4 The process of implementing crawlers
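The paper does not list the crawler; below is a minimal sketch of the two-tier idea with re and requests — the site URL, page structure, and image pattern are all hypothetical.

```python
import os
import re
import requests

# Hypothetical two-tier image site: index pages differ only in a trailing
# page number, and each index page embeds links to the images.
INDEX_URL = "https://example.com/search?q={food}&page={page}"   # illustrative
IMG_PATTERN = re.compile(r'<img[^>]+src="([^"]+\.jpg)"')        # illustrative

def crawl(food, pages, out_dir="images"):
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    for page in range(1, pages + 1):
        # Generate the index-page URLs in batches by varying the page value.
        html = requests.get(INDEX_URL.format(food=food, page=page), timeout=10).text
        for url in IMG_PATTERN.findall(html):     # extract image URLs with re
            img = requests.get(url, timeout=10).content
            with open(os.path.join(out_dir, f"{food}_{count}.jpg"), "wb") as f:
                f.write(img)
            count += 1
```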


Data Enhancement. To further enhance picture diversity and improve the recognition performance of the model, data enhancement is performed before the dataset is used for network training. First, each picture is converted to the size required by the neural network with the Resize() function provided by PyTorch. Then the following operations are applied to the image data with functions provided by the PyTorch framework (composed as sketched after Fig. 8.5):

Enlarge the picture: use the Resize() function to enlarge the original picture by 1.25 times.
Rotate the picture randomly: rotate the picture randomly within ±15° through the RandomRotation() function.
Crop the picture: crop the picture to the size required by the network through the CenterCrop() function.
Randomly adjust brightness: use the ColorJitter() function to randomly brighten or darken the image.
Random horizontal flip: flip the picture horizontally at random through the RandomHorizontalFlip() function.
Random vertical flip: flip the picture vertically at random through the RandomVerticalFlip() function.

Figure 8.5 shows the effect of a single picture after the above data enhancement. Through data enhancement, one image can be expanded into images with different representations, which increases the diversity of the dataset and thereby improves the recognition accuracy of the model.

Fig. 8.5 Pictures after data enhancement
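The six operations above can be composed with torchvision as follows; the enlargement size 280 (≈ 224 × 1.25) and the brightness range are assumptions, since the paper gives only the function names.

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.Resize(280),                 # enlarge (~1.25 x 224); size is an assumption
    transforms.RandomRotation(15),          # random rotation within +/-15 degrees
    transforms.CenterCrop(224),             # crop to the network's input size
    transforms.ColorJitter(brightness=0.3), # randomly brighten or darken; range assumed
    transforms.RandomHorizontalFlip(),      # random horizontal flip
    transforms.RandomVerticalFlip(),        # random vertical flip
    transforms.ToTensor(),
])
```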


8.4.2 Train and Select Food Recognition Models

Training Result. On the constructed dataset, we used ResNet18, ResNet34 and ResNet50 pre-trained models for transfer learning and observed the training through the accuracy curve and the cross-entropy loss curve. As shown in Fig. 8.6, the accuracy curve gradually rises until it flattens, and the loss curve gradually drops toward 0. The accuracy and parameter counts of the three trained networks are shown in Table 8.1; the recognition accuracy of the proposed method reaches at least 90%.

Model Selection. To deploy the model on Android devices while keeping recognition accurate, the model should not have too many parameters, so that the device is not overburdened with computation. According to the training results, the accuracies of ResNet34 and ResNet50 differ little while both improve on ResNet18, and ResNet34 has fewer parameters and a smaller model size than ResNet50, so the model trained with the ResNet34 network was used for the subsequent mobile application development.

Fig. 8.6 The recognition accuracy curve and CrossEntropyLoss curve of the model training

Table 8.1 The parameter quantities and accuracies of the three ResNet models

Model type  | Model parameters / model size | Model recognition accuracy (%)
ResNet 18   | 11,197,072 / 43.9 MB          | 90.2
ResNet 34   | 21,305,232 / 83.1 MB          | 92.4
ResNet 50   | 23,590,032 / 93.3 MB          | 92.6


8.4.3 Food Recognition Application on an Android Device

Following the steps described above, we used Android Studio to develop an Android application that deploys the food recognition model. The realized application and its usage flow are shown in Fig. 8.7. The steps to use the application are as follows:

Main Interface. After opening the application, the main interface appears; clicking the "Take a picture" button calls the camera function.

Camera Interface. The system camera opens and is used to photograph the food.

Result Interface. The obtained photo is passed into the model, the recognition result is obtained, and the photo and result are displayed on the result interface.

Through the application developed in this paper, the food recognition model is deployed directly on the Android side, without connecting to a server over the Internet to use the model remotely. After model conversion, the model is relatively small and runs well on Android devices with the support of the PyTorch for Android library. At the same time, calling the camera lets users identify food conveniently and quickly.

Fig. 8.7 The using process of food identification applications on Android device


8.5 Conclusion

This paper implements an application that deploys a food recognition model directly on Android mobile devices. First, in response to the incomplete coverage of the current datasets of common Chinese food, we expanded an existing dataset through a web crawler and built the more data-rich ChineseFood80 dataset to further improve the accuracy of the model. Then we used different pre-trained ResNet models for transfer learning and obtained Top-1 recognition accuracies of more than 90%. Finally, we used Android Studio and the PyTorch for Android library to develop an application that deploys the food recognition model on the Android device, so that people can use mobile phones to take pictures and recognize food. At present, the application's functionality is relatively limited; features such as nutrient composition queries and nutrient intake records can be designed in the future.

References

1. Hassannejad, H., Matrella, G., Ciampolini, P., et al.: Food image recognition using very deep convolutional networks. In: Proceedings of the 2nd International Workshop on Multimedia Assisted Dietary Management, pp. 41–49. ACM (2016)
2. Min, W., Jiang, S., Liu, L., Rui, Y., Jain, R.: A survey on food computing. ACM Comput. Surv. 52(5), 1–36 (2019)
3. Bossard, L., Guillaumin, M., Van Gool, L.: Food-101—mining discriminative components with random forests. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 446–461. Springer, Cham (2014)
4. Kawano, Y., Yanai, K.: Automatic expansion of a food image dataset leveraging existing categories with domain adaptation. In: European Conference on Computer Vision, pp. 3–17. Springer, Cham (2014)
5. Chen, X., Zhu, Y., Zhou, H., Diao, L., Wang, D.: ChineseFoodNet: a large-scale image dataset for Chinese food recognition (2017). arXiv preprint arXiv:1705.02743
6. Fu, Z., Chen, D., Li, H.: ChinFood1000: a large benchmark dataset for Chinese food recognition. In: International Conference on Intelligent Computing, pp. 273–281. Springer, Cham (2017)
7. Ocay, A.B., Fernandez, J.M., Palaoag, T.D.: NutriTrack: android-based food recognition app for nutrition awareness. In: 2017 3rd IEEE International Conference on Computer and Communications (ICCC), pp. 2099–2104. IEEE (2017)
8. Khaled, S.: NutriPal—food recognition android app for nutrition awareness (2019)
9. Kiourt, C., Pavlidis, G., Markantonatou, S.: Deep learning approaches in food recognition. In: Machine Learning Paradigm, pp. 83–108. Springer, Cham (2020)
10. Liang, H., Wen, G., Hu, Y., Luo, M., Yang, P., Xu, Y.: MVANet: multi-tasks guided multiview attention network for Chinese food recognition. IEEE Trans. Multimedia 23, 3551–3561 (2020)
11. Fakhrou, A., Kunhoth, J., Al Maadeed, S.: Smartphone-based food recognition system using multiple deep CNN models. Multimedia Tools Appl. 80(21), 33011–33032 (2021)
12. Bashar, A.: Survey on evolving deep learning neural network architectures. J. Artif. Intell. 1(02), 73–82 (2019)
13. Khan, A., Sohail, A., Zahoora, U., et al.: A survey of the recent architectures of deep convolutional neural networks. Artif. Intell. Rev. 53(8), 5455–5516 (2020)


14. Zhou, D.X.: Theory of deep convolutional neural networks: downsampling. Neural Netw. 124, 319–327 (2020)
15. Zhang, Z., Cui, P., Zhu, W.: Deep learning on graphs: a survey. IEEE Trans. Knowl. Data Eng. (2020)
16. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778. IEEE (2016)
17. He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: European Conference on Computer Vision, pp. 630–645. Springer, Cham (2016)
18. Agarap, A.F.: Deep learning using rectified linear units (ReLU) (2018). arXiv preprint arXiv:1803.08375
19. Zhuang, F., Qi, Z., Duan, K., et al.: A comprehensive survey on transfer learning. Proc. IEEE 109(1), 43–76 (2020)
20. Marcelino, P.: Transfer learning from pre-trained models. Towards Data Science (2018)
21. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012)

Chapter 9

Water Environmental Quality Assessment and Effect Prediction Based on Artificial Neural Network

Wentian An

Abstract Water environmental quality is an important criterion to promote China’s sustainable development strategy and the main direction of China’s scientific research and technological innovation. Based on the current situation of water environmental quality evaluation, the artificial neural network and its prediction model were constructed, and the Levenberg–Marquardt optimization algorithm was used to evaluate the water environmental quality. Keywords Artificial neural network · Water environment · Levenberg–Marquardt · Water environmental quality assessment

9.1 Introduction of Artificial Neural Network Model for Water Environmental Quality Assessment

The artificial neural network simulates the complex structure of the human brain's neural network and has strong adaptive ability: after learning the historical development of a process, it can predict its future changes. In practical research, therefore, most researchers have begun to use this technique for river water quality prediction. Both at home and abroad, water environment quality assessment based on artificial neural networks is at an early stage of development; the most common types include the BP neural network with improved algorithms, the Levenberg–Marquardt optimization algorithm, and neural network prediction models combined with other theories [1–4]. For water environmental quality assessment, a simple BP neural network model is the most commonly used; however, the Levenberg–Marquardt optimization algorithm is mainly used in this paper for model construction and analysis (Fig. 9.1). As a fast algorithm based on standard numerical optimization, it effectively combines the Gauss–Newton method and the gradient descent method. Assuming that $x^{(k)}$ represents the vector constituted by the weights and thresholds of


Fig. 9.1 Prediction model of Levenberg–Marquardt algorithm grid

the k-th iteration, the new weights and thresholds constitute a new vector $x^{(k+1)}$, calculated according to $x^{(k+1)} = x^{(k)} + \Delta x$ with $\Delta x = -\left[\nabla^2 E(x)\right]^{-1} \nabla E(x)$, where $\nabla^2 E(x)$ denotes the Hessian matrix of the error index function $E(x)$ and $\nabla E(x)$ denotes its gradient. If the error index function is set to $E(x) = \frac{1}{2}\sum_{i=1}^{N} e_i^2(x)$, where $e_i(x)$ denotes the i-th error, then $\nabla E(x) = J^T(x)\,e(x)$ and $\nabla^2 E(x) = J^T(x) J(x) + S(x)$ with $S(x) = \sum_{i=1}^{N} e_i(x) \nabla^2 e_i(x)$, where $J(x)$ is the Jacobian matrix:

$$
J(x) =
\begin{bmatrix}
\frac{\partial e_1(x)}{\partial x_1} & \frac{\partial e_1(x)}{\partial x_2} & \cdots & \frac{\partial e_1(x)}{\partial x_n} \\
\frac{\partial e_2(x)}{\partial x_1} & \frac{\partial e_2(x)}{\partial x_2} & \cdots & \frac{\partial e_2(x)}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\frac{\partial e_N(x)}{\partial x_1} & \frac{\partial e_N(x)}{\partial x_2} & \cdots & \frac{\partial e_N(x)}{\partial x_n}
\end{bmatrix}
\qquad (9.1)
$$

Neglecting $S(x)$ gives the Gauss–Newton update $\Delta x = -\left[J^T(x) J(x)\right]^{-1} J^T(x)\,e(x)$; the improved Levenberg–Marquardt form is $\Delta x = -\left[J^T(x) J(x) + \mu I\right]^{-1} J^T(x)\,e(x)$ [5], where $\mu$ is a constant greater than 0 and $I$ is the identity matrix. Because this algorithm uses approximate second-derivative information, it is faster than gradient descent, and the Levenberg–Marquardt optimization algorithm has a faster convergence speed [6]. Although it still needs more memory during


operation, with the improvement of current computer technology, it will definitely show its unique advantages in practical prediction and evaluation in the future [7].
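As a concrete illustration of the update rule above (the chapter gives no code), here is a minimal numpy sketch of one Levenberg–Marquardt step with a toy linear fit; `residual_fn` and `jacobian_fn` are hypothetical user-supplied callables.

```python
import numpy as np

def lm_step(x, residual_fn, jacobian_fn, mu):
    """One Levenberg-Marquardt update of the weight/threshold vector x:
    Delta x = -(J^T J + mu*I)^{-1} J^T e, as in the text above."""
    e = residual_fn(x)                    # error vector e(x)
    J = jacobian_fn(x)                    # Jacobian of the errors, shape (N, n)
    H = J.T @ J + mu * np.eye(x.size)     # regularized Gauss-Newton Hessian
    return x - np.linalg.solve(H, J.T @ e)

# Toy usage: fit y = a*x + b by least squares (converges toward a=2, b=1).
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([1.0, 3.0, 5.0, 7.0])
residual = lambda p: p[0] * xs + p[1] - ys
jacobian = lambda p: np.stack([xs, np.ones_like(xs)], axis=1)
p = np.zeros(2)
for _ in range(20):
    p = lm_step(p, residual, jacobian, mu=1e-2)
```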

9.2 Prediction Model Based on Levenberg–Marquardt Optimization Algorithm

9.2.1 Time Series

This modeling method is based on mathematical statistics: the original data series of the water quality index itself is studied so that, once its pattern of change is mastered, the prediction goal can be reached [8]. Generally, the time sequence of a detection factor in the water environment is regarded as a curve over time, and the monitoring value at a given moment is affected only by the factor's own historical record. The history of the concentration of this factor can then be treated as the time series D(i), with i = 1, 2, 3, …, N (N is the number of monitoring points). Thus, the prediction model for a single water quality monitoring factor is formulated as $D(n+1) = \psi(D(1), D(2), \ldots, D(n))$, where $\psi$ is a nonlinear action function. For multiple water quality monitoring factors, the time series can be set as $D_i(1), D_i(2), \ldots, D_i(n)$, where i = 1, 2, 3, …, m, m is the number of water quality items, and n indexes the monitored concentration values over time. The corresponding water quality prediction model is $D_i(n+1), D_i(n+2), D_i(n+3) = \psi(D_i(1), D_i(2), \ldots, D_i(n))$, where, as in the single-factor case, $\psi$ is a nonlinear function.

9.2.2 Algorithm of the Prediction Model

As the most common form of artificial neural network, the BP neural network has been widely used in practice because of its strong autonomous learning ability and adaptability. In water environmental quality assessment and effect prediction, the actual environment is complex, so the algorithm must be effectively improved before use. There are two common approaches: one is to reduce oscillation; the other is to improve the algorithm itself. This paper mainly chooses the latter and uses the Levenberg–Marquardt optimization algorithm, which exhibits both the local convergence of the Gauss–Newton method and the global behaviour of gradient descent; for reference, the convergence rate of gradient descent with a fixed step size on strongly convex, smooth functions is linear [9].


9.2.3 Defining the Grid Structure

When building a prediction model for a single factor, we obtain $D(t) = F(D(t-1), D(t-2), \ldots, D(t-n))$, where D(t) is the monitoring value of a water quality index at time t, n is the number of input nodes, and F is the input–output mapping defined by the neural grid. Combined with the single-factor model, the prediction model for multiple water factors is $D_i(t) = F(D_i(t-1), D_i(t-2), \ldots, D_i(t-n))$, where i = 1, 2, 3, …, m (m is the number of water quality indicators). Figure 9.1 shows the structure diagram of the prediction model constructed in this study [10].
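To make the windowed formulation concrete, a small helper can build the training pairs; the function name and the toy series are illustrative, not from the paper.

```python
import numpy as np

def make_windows(series, n):
    """Turn a series D(1..N) into pairs (D(t-n), ..., D(t-1)) -> D(t),
    matching the single-factor model defined above."""
    X, y = [], []
    for t in range(n, len(series)):
        X.append(series[t - n:t])   # the n past values feed the input nodes
        y.append(series[t])         # the next value is the prediction target
    return np.array(X), np.array(y)

# Example: n = 3 past values predict the next one.
X, y = make_windows(np.array([1.0, 1.2, 1.1, 1.3, 1.4, 1.6]), n=3)
```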

9.2.4 Sample Selection and Training Methods

The samples selected in this paper come from the central area of a river in a certain region of China and were tested over a fixed period. After the initial raw data were obtained, the samples were pretreated, and cross-validation training was adopted to effectively prevent over-fitting of the neural network. A self-learning procedure is introduced to mine high-confidence samples from the mass of unlabeled samples for automatic labeling; the model is then trained with the selected pseudo-labeled samples together with the manually labeled samples to improve performance.

As an algorithm composed of extensive connections among a large number of simple artificial neurons, the artificial neural network is applied here to water quality evaluation and effect prediction; the BP network and the RBF network are mainly selected, with structures shown in Figs. 9.2 and 9.3. Both are nonlinear multilayer feedforward networks, but they differ: the BP network uses the sigmoid function as its activation function and treats all of its connection weights as adjustable, whereas the RBF network computes its basis functions from the Euclidean distance between the input vector and the kernel centers, and only its output-layer weights are adjusted. The sigmoid function of the BP neural network has the mathematical form

$$ f(x) = \frac{1}{1 + e^{-x}} \qquad (9.2) $$

The kernel of the RBF network is a Gaussian function, with the specific mathematical form

$$ D(\|x - u\|) = e^{-\|x - u\|^2 / \alpha^2}, \quad \alpha > 0 \qquad (9.3) $$


Fig. 9.2 Structure diagram of BP network

Fig. 9.3 Structure diagram of RBF network

In the above formula, u represents the center of the kernel, and α represents the parameters of the kernel.
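For reference, Formulas (9.2) and (9.3), as reconstructed above, translate directly into numpy:

```python
import numpy as np

def sigmoid(x):
    """BP network activation, Formula (9.2)."""
    return 1.0 / (1.0 + np.exp(-x))

def gaussian_rbf(x, u, alpha):
    """RBF basis function, Formula (9.3): the response decays with the
    Euclidean distance between the input x and the kernel centre u."""
    return np.exp(-np.linalg.norm(x - u) ** 2 / alpha ** 2)
```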

9.3 Predictive Analysis

Water quality change data in the experimental area were monitored continuously day and night at a recording interval of 60 s, with a flow meter reading every two hours. After data analysis, it was found that the overall flow


rate was too low, with a maximum value of 0.016 m/s, so flow was not considered during modeling. 252 groups of data were selected to construct the prediction model, and 30 groups were used to verify its prediction accuracy. The stepwise regression function in MATLAB was used to form the regression equation. The adjusted $R^2$ (after the degrees-of-freedom correction) was 0.9240; under the F test and P test, the regression model was highly significant at the significance level α = 0.05. The corresponding regression equation is

$$ Y' = 0.0038 + 0.5291\,X_1 - 0.2704\,X_2 - 0.5512\,X_3, \qquad Y = \sigma Y' + \bar{Y} \qquad (9.4) $$

The constructed equation is used to fit and predict the chlorophyll A of the water quality; the results are shown in Fig. 9.4. The fitted curve and the real values have similar change characteristics, but there are differences in amplitude; in particular, many predicted values deviate from the real data, showing a large error. The relative error of the linear regression fitting is shown in Fig. 9.5: the fitting error is controlled at about 10%, and the error in the middle region of the curve is small, because the factors affecting the change of chlorophyll A content at that stage are relatively few, so the model fits more easily and the corresponding error is lower. At the same time, model analysis can draw on more research indicators, so the specific content of chlorophyll A can be determined quickly and predicted prospectively.

The MATLAB neural network toolbox was used to build a three-layer BP network with three inputs, seven hidden-layer neurons and one output neuron. The training data were consistent with those used to construct the regression model. The training function was trainlm, and a neural network model with a regression coefficient

Fig. 9.4 Analysis of linear regression fitting and prediction results


Fig. 9.5 Error analysis of linear regression fitting results

over 0.99 was obtained after 127 steps of training. The actual prediction results are shown in Fig. 9.6. The predicted results fit the original curve quite closely and accurately predict the direction of change in the regions where the chlorophyll A content changes drastically; however, in places the deviation from the actual values is large and the relative error is high. The specific error curve is shown in Fig. 9.7. Apart from some abnormal values, the error of the BP neural network predictions is controlled at about 5%, and the further from the training values, the higher the error. It can be seen that the artificial neural network fits nonlinear relations strongly, and its black-box operation can effectively handle highly nonlinear problems, which plays a positive role in water quality evaluation and effect prediction. Especially in the prediction of a single index, the BP neural network studied in this paper obtains satisfactory prediction accuracy, and in the evaluation

Fig. 9.6 Prediction results of BP neural network


Fig. 9.7 Error analysis results of BP neural network synchronous prediction

of water quality, both the BP neural network and the RBF artificial neural network can obtain fairly objective evaluation results. The research model in this paper should be realized with a GUI-based design and follow the principle of open input, which not only makes it convenient to study the factors influencing water quality change but also becomes an effective way to solve environmental problems. To facilitate comparative analysis, the original data for dissolved oxygen (DO), chemical oxygen demand (COD), and total nitrogen (TN) obtained in the experimental study are used as the basic indexes of the Levenberg–Marquardt algorithm grid prediction model; their variation trends are shown in Fig. 9.8, in which DO is higher than the other two. From this analysis, it

Fig. 9.8 Trend analysis diagram of DO, COD, and TN monitoring indicators


can be seen that: First, the change trends of all indicators are nonlinear functions, and it is difficult to approximate such a nonlinear function of a single time variable with the earlier explicit-function methods; the problem can be effectively solved with a neural network by increasing the neurons in the hidden layer and adjusting the corresponding weights. Second, the relationships among the indicators are also complex and nonlinear, which means they cannot be substituted for one another during modeling.

9.4 Conclusion

To sum up, a comparative analysis of several common water quality evaluation models based on artificial neural networks shows that each has advantages and disadvantages. From the discussion in this paper, the prediction model based on the Levenberg–Marquardt algorithm grid can control the prediction error within the specified range. In addition, it can identify the main factors affecting water quality, which is of great importance for the sustainable development of the social environment.

References

1. Hosseini, S., Turhan, B., Gunarathna, D.: A systematic literature review and meta-analysis on cross project defect prediction. IEEE Trans. Software Eng. 45(2), 111–147 (2017)
2. Wang, Q., Liu, Q., Xia, R., Zhang, P., Zhou, H., Zhao, B., Li, G.: Automatic defect prediction in glass fiber reinforced polymer based on THz-TDS signal analysis with neural networks. Infrared Phys. Technol. 115, 103673 (2021)
3. Li, H., Luo, M., Zheng, J., Luo, J., Zeng, R., Feng, N., Du, Q., Fang, J.: An artificial neural network prediction model of congenital heart disease based on risk factors: a hospital-based case-control study. Medicine 96(6), e6090 (2017)
4. Nezhadshokouhi, M.M., Majidi, M.A., Rasoolzadegan, A.: Software defect prediction using over-sampling and feature extraction based on Mahalanobis distance. J. Supercomput. 76(1), 602–635 (2020)
5. Haider, S.W., Cangussu, J.W., Cooper, K.M., Dantu, R., Haider, S.: Estimation of defects based on defect decay model: ED3M. IEEE Trans. Softw. Eng. 34(3), 336–356 (2008)
6. Bennin, K.E., Keung, J., Phannachitta, P., Monden, A., Mensah, S.: MAHAKIL: diversity based oversampling approach to alleviate the class imbalance issue in software defect prediction. IEEE Trans. Software Eng. 44(6), 534–550 (2017)
7. Ein, O., Keun, Y.T., Hong, S.: Artificial neural network approach for differentiating open-angle glaucoma from glaucoma suspect without a visual field test. Invest. Ophthalmol. Vis. Sci. 56(6), 3957–3966 (2015)
8. Tantithamthavorn, C., Hassan, A.E., Matsumoto, K.: The impact of class rebalancing techniques on the performance and interpretation of defect prediction models. IEEE Trans. Software Eng. 46(11), 1200–1219 (2018)


9. Rix, G.J.: Interpretation of nondestructive integrity tests using artificial neural networks. NDT E Int. 30(5), 329–330 (1997)
10. Sun, H.L., Wu, Y.W., Yu, Y., Zhu, S.H.: Trace-level lead analysis in environmental water and whitening cosmetics based on solid-phase extraction followed by flame atomic absorption spectrometry determination. At. Spectrosc. 35(3), 127–133 (2014)

Chapter 10

Network Intrusion Detection Based on Apriori-Kmeans Algorithm

Yiying Zhang, Delong Wang, Yannian Wu, Yiyang Liu, Nan Zhang, and Yingzhuo Li

Abstract Timely detection of network intrusions is an indispensable part of network security, so accurate detection of intrusions is particularly important. This paper therefore proposes an intrusion detection model based on the Apriori-Kmeans algorithm to detect network intrusions and ensure the normal operation of the network. First, the intrusion alarm pattern of a feature-based NIDS in the secure state is modeled, and the continuous intrusion alarm output of the NIDS is filtered by the Apriori-Kmeans algorithm; the similarity between points is then determined by the modified distance D of the Apriori-Kmeans algorithm, and whether a new data point is normal is judged from its similarity score, thereby reducing the false alarm rate of the intrusion detection system and improving detection accuracy. A comparative experiment shows that the method proposed in this paper has high accuracy.
Keywords Apriori algorithm · Kmeans algorithm · Intrusion detection · NIDS

10.1 Introduction

In recent years, with the rapid development of mobile payment, e-commerce, and the financial industry, a large amount of private user information is generated on the Internet at all times, which makes establishing intrusion detection systems to ensure network security more and more important. Public network security incidents have occurred frequently [1]. In May 2017, ransomware attacks spread across many countries around the world; the computer files of many universities, hospitals, police stations, and other institutions were encrypted by the virus and a ransom was required to unlock them. This ransomware was spread by the "Eternal Blue" hacking


weapon leaked by the NSA, and various virus variants have appeared over time. In February 2018, someone shared the source code of core components of the iPhone operating system on the open-source code-sharing website GitHub. The leaked code belongs to iBoot, an important part of the iOS security system, roughly equivalent to the BIOS of a Windows computer; its leak could expose hundreds of millions of iOS devices to security threats, and iOS and Mac developer Jonathan Levin called it the most serious leak in the history of iOS. In January 2019, Marriott International revised the number of customers affected by its big data breach down from 500 million to 383 million; more than 5 million unencrypted passport numbers and approximately 8.6 million encrypted credit card numbers were stolen, and the incident remains one of the largest personal data breaches in history. Marriott stated that it had been under attack since 2014.

Traditional intrusion detection methods generally just extract features, perform feature selection, build an intrusion behavior database from the selected features, and detect by feature matching, or they use clustering, classification, and other machine learning algorithms [2] to discover abnormal data. However, these methods do not mine features deeply enough, and traditional clustering and classification algorithms depend heavily on the selection of features: different features produce different, sometimes very different, detection effects [3]. Intrusion detection therefore requires strong adaptability and high accuracy, which demands a deeper understanding of the features. In today's increasingly complex network environment, intelligence is the development trend of intrusion detection. Unlike traditional pattern matching, a neural network detects intrusion behavior by simulating how the human brain works and learning the characteristics of intrusion behavior, and it has been widely used in the field of intrusion detection. Deep learning has strong nonlinear fitting capability and can extract the principal semantic features from complex features; it has achieved great success in complex fields such as text, image, voice, and video processing. Current network intrusion data are varied, huge in volume, and highly concealed, so deep learning methods are used to extract the deep features of intrusion behavior data so that they can be better distinguished.

10.2 Research Status

The intrusion detection system (IDS) is a critical device in the field of network security and is usually deployed behind firewalls [4]. An IDS monitors network traffic and activates defense policies when suspicious connection records are found in the traffic; it is a proactive security protection technology [5]. In April 1980, James Anderson first proposed the concept of intrusion detection. From 1984 to 1986, Dorothy Denning and Peter Neumann proposed IDES, a system model capable of real-time intrusion detection. IDES summarizes and


abstracts earlier intrusion detection models and constructs a general intrusion detection framework. In 1990, L. T. Heberlein and others developed the Network Security Monitor (NSM), which for the first time took network traffic directly as the data source for intrusion detection; after that, IDS gradually divided into network-based and host-based systems. In 1991, Haystack Laboratory, Heberlein, and others put forward the concept of the distributed intrusion detection system, which received extensive attention. In the 1990s, with the rapid development of data mining and machine learning, studying intrusion detection from the perspective of data and intelligent algorithms gradually became mainstream. In 1998, Wenke Lee and Salvatore J. Stolfo recast intrusion detection as data classification and used a decision tree algorithm to classify the extracted features. Subsequently, intrusion detection methods based on decision trees, biological immune theory, Bayesian theory, AdaBoost, clustering, support vector machines, neural networks, and other intelligent algorithms have continued to emerge, along with many improved methods that address the limitations of existing intrusion detection algorithms.

10.3 Apriori-Kmeans Algorithm

The traditional Kmeans algorithm to some extent ignores the relationships between data points [6] and treats them as mutually independent, yet semantic correlations exist between them. The Apriori-Kmeans algorithm converts association rules into a vectorizable form [7] and combines them with the Kmeans algorithm: whether a sample contains the frequent itemset $X_j$ is treated as an attribute ($\mathrm{Bool}(X_j) = 0$ or $1$) and used as a variable in the distance calculation formula of the Kmeans algorithm. $\mathrm{Bool}(X_j)$ indicates whether the sample contains the frequent itemset $X_j$, and its mean value over each category is the confidence $\mathrm{conf}(X_j \to Y_i)$. Combining this with the Kmeans distance formula, the distance D between a new test sample $x'$ and a training sample $(x, y_i)$ is given by Formula (10.1):

$$ D = \sqrt{D^2(x', x) + \sum_{j \in [1, m]} \left(\mathrm{Bool}(X_j) - \mathrm{conf}(X_j \to Y_i)\right)^2} \qquad (10.1) $$

Let the parameter β be the mean word frequency of the vector x. $X^{(1)}$ denotes a frequent 1-itemset, $X^{(2)}$ a frequent 2-itemset, and so on. For classification, itemsets with more items tend to matter more than itemsets with fewer items, so the weight $k_l = l$ is set to revise the itemset $X^{(l)}$. The modified formula is as follows:

$$ D = \sqrt{D^2(x', x) + \beta^2 \sum_{j \in [1, m]} k_l^2 \left(\mathrm{Bool}(X_j^{(l)}) - \mathrm{conf}(X_j^{(l)} \to Y_i)\right)^2} \qquad (10.2) $$
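The chapter gives only the formula; a minimal numpy sketch of the modified distance (10.2) follows, with all argument names hypothetical.

```python
import numpy as np

def modified_distance(x_new, x_train, bool_new, conf_train, k_weights, beta):
    """Sketch of Formula (10.2). bool_new[j] is Bool(X_j) for the new sample,
    conf_train[j] is conf(X_j -> Y_i) for the training sample's class, and
    k_weights[j] is the itemset-size weight k_l = l."""
    d2 = np.sum((x_new - x_train) ** 2)                              # D^2(x', x)
    rule = beta ** 2 * np.sum(k_weights ** 2 * (bool_new - conf_train) ** 2)
    return np.sqrt(d2 + rule)
```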


Based on the above, the execution steps of the Apriori-Kmeans algorithm are as follows:

1. Preprocess the dataset.
2. Obtain the Bool model and the vector space model (VSM) from Step (1). PCA is used to reduce the dimensionality of the data, and the first J features are selected as the new Bool model according to the information gain in Formula (10.3):

$$ IG(t) = -\sum_{i=1}^{m} P(C_i) \log P(C_i) + P(t) \sum_{i=1}^{m} P(C_i \mid t) \log P(C_i \mid t) + P(\bar{t}) \sum_{i=1}^{m} P(C_i \mid \bar{t}) \log P(C_i \mid \bar{t}) \qquad (10.3) $$

3. Mine association rules from the Bool model in Step (2), using the Apriori algorithm to generate frequent itemsets and rules [8], and compute the confidence of each frequent itemset with respect to every class.
4. Use Formula (10.4) to calculate the keyword weights of the VSM model, and rank the first K features:

$$ w_{ik} = \frac{tf_{ik} \cdot \log(N / n_k + 0.01)}{\sqrt{\sum_{k=1}^{n} (tf_{ik})^2 \cdot \log^2(N / n_k + 0.01)}} \qquad (10.4) $$

5. Calculate the similarity between the test samples and the training samples according to the modified Kmeans distance, Formula (10.2), above.
6. Cluster according to the Kmeans clustering rules. To improve the clustering accuracy, the distance-weighted voting formula is used:

$$ y = \arg\max_{v} \sum_{(x_i, y_i) \in D_z} w_i \cdot I(v = y_i) \qquad (10.5) $$

where $w_i = 1/i$.

The Apriori-Kmeans algorithm thus uses the Apriori algorithm to optimize the traditional Kmeans algorithm [9], addressing the problems of semantic relevance and of word frequencies distorted by document length, and improving the clustering accuracy of Kmeans.
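A minimal sketch of the distance-weighted vote in Formula (10.5); the function name and inputs are illustrative.

```python
import numpy as np

def weighted_vote(distances, labels):
    """Formula (10.5): rank neighbours by the modified distance, give the
    i-th nearest the weight w_i = 1/i, and return the heaviest class."""
    order = np.argsort(distances)
    scores = {}
    for rank, idx in enumerate(order, start=1):
        lab = labels[idx]
        scores[lab] = scores.get(lab, 0.0) + 1.0 / rank
    return max(scores, key=scores.get)
```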


10.4 Intrusion Detection Model Based on Apriori-Kmeans Algorithm

The intrusion detection system based on the Apriori-Kmeans algorithm not only has the components of a general network intrusion detection system (NIDS) but also includes an intrusion alarm filtering function, as shown in Fig. 10.1. Most NIDS are rule-based, which makes them not only difficult to implement in software [10] but also unable to detect new network intrusion behavior. To address these shortcomings, an intrusion detection model based on the Apriori-Kmeans algorithm is proposed. With its alarm filtering function, this model can reduce the false positive rate and greatly improve the accuracy of network intrusion detection.

When the network environment is attacked, a feature-based NIDS issues intrusion alerts that differ from those in the secure case, and some special attacks trigger alert types that do not occur at all in the secure case. This can be used to determine whether the input alarm sequence deviates from the normal situation: if it deviates from the normal secure state, a network attack may be under way and users should be careful; if the alarms are very similar to the non-attack scenario, the likelihood of an attack is low.

Fig. 10.1 Intrusion detection model based on Apriori-Kmeans algorithm

(The figure shows the NIDS pipeline — network detector, data preprocessing, event generator, detection engine and event analyzer with the event database and intrusion rule base, and response unit — plus the Apriori-Kmeans alarm filtering module, trained on network data against abnormal network behavior modes.)


Fig. 10.2 Examples of normal points (a) and abnormal points (b) in the false positive model

The simulation of the normal alarm mode and the detection of deviations from the normal state are described below. In the attack-free case, let P be the number of different alarm types issued by the NIDS, and use a P-dimensional space to build the normal alarm mode. The attributes $(A_1, A_2, \ldots, A_P)$ of a data point Q in this space represent the counts of the different alert types within a time period T. Points recorded when there is no attack are defined as "normal" points; intrusion alerts raised during such periods are actually safe and are regarded as false positives. As shown in Fig. 10.2a, the data points represent normal points in the absence of attack behavior.

The algorithm above is used to create a new data point for each incoming intrusion alert and then determine whether the new report is a false positive. The distance between the white point (new point) and the black points (normal points) in the figure corresponds to the deviation of the intrusion report from the secure mode; that is, if the new point is close to the normal points, it is judged to be normal, and the network intrusion alarms issued during that period are considered false alarms. Figure 10.2a is an example of the normal point model. The new point is determined to be abnormal if either of the following conditions is met: the point is far from the normal points (Fig. 10.2b), or it contains an intrusion report type that does not exist at the normal points. In that case the alarm issued is a real alarm; that is, an attack has occurred.

Following this judgment of new intrusion alarms, the Apriori-Kmeans algorithm is used to determine whether new data points are normal. The similarity between points is determined by the distance D in the Apriori-Kmeans algorithm: the smaller D is, the greater the similarity. The final similarity score of a data point is the average distance to its K nearest normal points. If the similarity score is higher than a threshold T, the point is judged to be abnormal; otherwise, the intrusion detection alarm is a false positive and should be filtered out (a minimal sketch follows).
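A minimal sketch of this scoring rule, using plain Euclidean distance in the P-dimensional alarm space for brevity (the model itself uses the modified distance D); names and shapes are assumptions.

```python
import numpy as np

def is_false_positive(new_point, normal_points, K, T):
    """Score a new alarm-count point against the normal model: the score is
    the mean distance to the K nearest normal points. Scores above the
    threshold T mark the point abnormal, i.e. a genuine alarm."""
    d = np.linalg.norm(normal_points - new_point, axis=1)
    score = np.sort(d)[:K].mean()
    return score <= T   # True: the alarm is a false positive and is filtered out
```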

Figure 10.3 shows the structure of the intrusion alert filtering function based on the Apriori-Kmeans algorithm, which consists of data preprocessing, intrusion alert storage, and intrusion alert filtering.

Fig. 10.3 Experimental flowchart
Data preprocessing includes feature extraction and format conversion: feature extraction preprocesses the input intrusion reports, and format conversion transforms the feature-based NIDS reports into standard alerts according to the preprocessed feature set; the specific feature selection depends on the type of NIDS. The intrusion alert storage stores all incoming standard intrusion reports in a database. Intrusion alert filtering filters out erroneous intrusion alarms so that they cause no unnecessary trouble.

10.5 Simulation Experiment

Power grid systems are among the most vulnerable to cyberattacks, suffering hundreds to thousands of attacks a year in many areas, and in some cases tens to hundreds of thousands. Based on this, the experiment selects the power grid of a certain area as the experimental object. Accuracy and false detection rate were chosen as the evaluation indexes: the accuracy rate is the proportion of correctly detected attacks among the total number of attacks, and the false detection rate is the proportion of incorrectly detected attacks among the total number of attacks. The experiment compares the accuracy and false detection rate of the Apriori-Kmeans, Bayes, and KNN algorithms for network intrusion detection; the results are shown in the following figures.

Figure 10.4 shows the intrusion detection accuracy of the three algorithms under different numbers of attacks; the horizontal axis ("Attack number of times") is the number of attacks on the power grid in one year. As the number of attacks increases, the accuracy of the proposed algorithm gradually improves, is higher than that of the Bayes and KNN algorithms on the whole, and reaches more than 95% at best.

Fig. 10.4 Accuracy under different attack times

Figure 10.5 shows the false detection rate of the three algorithms under different numbers of attacks. The false detection rate of the proposed algorithm decreases gradually as the number of attacks increases and is lower than that of the other two algorithms on the whole.

Fig. 10.5 False detection rate under different attack times


10.6 Summary

In this paper, an intrusion detection model based on the Apriori-Kmeans algorithm is proposed to address the high false positive rate of NIDS. Firstly, a p-dimensional space is used to model the intrusion alert pattern in the normal state: each dimension corresponds to one type of intrusion alert, the data points in the model represent the distribution of intrusion alerts over a certain period of time, and each attribute value is the count of a specific alert type in that period. Then, with the help of the distance D in the Apriori-Kmeans algorithm, the similarity between points is determined, and the similarity score is used to decide whether new data points are normal. If a point is judged to be abnormal, an attack has indeed occurred, and users need to be vigilant to avoid unnecessary losses; otherwise, the intrusion detection report is a false positive and should be filtered out. The experimental results show that the proposed method has high accuracy and is helpful for network security.

References

1. Chen, Y.: Network security situational awareness standard architecture design. Inf. Secur. Res. 7(9), 844–848 (2021). (in Chinese)
2. He, R.Z.: Research on instability control method of multi-source network intrusion based on situation awareness. Autom. Instrum. 7, 30–33 (2021). (in Chinese)
3. Zhu, K.: Application of machine learning in network intrusion detection. Data Acquisition Process. 32(3), 79–488 (2017). (in Chinese)
4. Ma, Z.F.: Network intrusion detection based on IPSO-SVM algorithm. Comput. Sci. 45(2), 231–235+260 (2018). (in Chinese)
5. Yan, L.: Network intrusion detection based on GRU and feature embedding. Chin. J. Appl. Sci. 39(4), 559–568 (2021). (in Chinese)
6. Hua, H.Y.: A network intrusion detection algorithm based on Kmeans and KNN. Comput. Sci. 43(3), 158–162 (2016). (in Chinese)
7. Yi, Y.F.: Application research of K-means algorithm in network intrusion detection. Softw. Guide 12(2), 124–126 (2013). (in Chinese)
8. Yang, L.P.: Research and application of Apriori association rule algorithm in network intrusion detection. Digit. Technol. Appl. 36(8), 107–108 (2018). (in Chinese)
9. Xiao, G.Y.: Application of Apriori algorithm in network intrusion detection system. Microcomput. Inf. 26(6), 71–72+113 (2010). (in Chinese)
10. Yu, Y.T.: Intrusion detection alert filtering based on heterogeneous information. Comput. Eng. Design 2006(17), 3181–3183+3298 (2006). (in Chinese)

Chapter 11

A Fast Heuristic k-means Algorithm Based on Nearest Neighbor Information

Junkuan Wang, Qing Wen, and Zizhong Chen

Abstract The k-means algorithm is a widely used clustering algorithm, but its time overhead is relatively high on large-scale and high-dimensional data sets. In this paper, we propose an efficient heuristic algorithm whose main idea is to narrow the search space of sample points and to reduce the number of sample points reallocated during the iterations. Experimental results show that our algorithm has excellent time performance on most of the datasets, with clustering quality almost the same as, or even better than, that of the exact k-means algorithm.

Keywords K-means · Exact K-means · Approximate K-means

11.1 Introduction

The k-means algorithm is one of the classical clustering algorithms. Among the many types of clustering algorithms, it is one of the simpler and more effective ones and belongs to the family of partition-based clustering algorithms [1]. As a cluster analysis method, k-means plays an important role in many fields, such as data mining, machine learning, target detection, and artificial intelligence [2]. Lloyd's k-means algorithm [2], also known as the standard k-means algorithm, is simple and efficient [3] and has been ranked among the top ten data mining algorithms. At the same time, the underlying optimization problem is NP-hard, and the algorithm usually does not reach a global optimal solution but only a local one [4].

The distance measure of the k-means algorithm is the Euclidean distance, and the algorithm alternates between two phases: in the assignment phase, each point computes its distance to all cluster centers and is then assigned to the nearest


cluster center; in the update phase, the cluster-center coordinates of each cluster are recalculated. These two phases are repeated until no cluster center moves. Since the twentieth century, scholars have proposed various improvements of the k-means algorithm, which relate to: (1) optimization of the selection of the initial centroids; (2) accelerating approximate k-means; (3) accelerating the exact k-means algorithm.

11.1.1 Optimization of the Selection of Initial Centroids

Optimizing the initial centroids is an important topic in the study of the k-means algorithm, mainly because the initial centroids affect both the convergence speed and the clustering quality: different initial cluster centers can lead to different final clustering results [5, 6]. Among them, k-means++ is currently the most widely used centroid initialization method. It uses an adaptive sampling scheme called D² sampling [5], a simple non-uniform sampling technique for choosing points from a point set. To improve the efficiency of k-means++, Bachem et al. proposed a sampling method based on MCMC sampling to replace D² sampling [1], and then further improved the efficiency and robustness of the algorithm [6]. Newling et al. improved the Clarans algorithm in terms of complexity and running time [7], making it effective for large data sets [8].
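As an illustration of D² sampling, the following minimal Python sketch implements k-means++ seeding; the function name and parameters are ours, and this is a textbook version of the method in [5], not the exact code evaluated by the cited works.

```python
import numpy as np

def kmeans_pp_init(X, k, seed=None):
    """k-means++ seeding: after the first uniform pick, each subsequent center
    is drawn with probability proportional to D(x)^2, the squared distance of
    x to its closest center chosen so far (D^2 sampling)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = [X[rng.integers(n)]]
    d2 = np.linalg.norm(X - centers[0], axis=1) ** 2
    for _ in range(1, k):
        probs = d2 / d2.sum()                     # D^2 sampling distribution
        centers.append(X[rng.choice(n, p=probs)])
        d2 = np.minimum(d2, np.linalg.norm(X - centers[-1], axis=1) ** 2)
    return np.array(centers)

X = np.random.default_rng(0).normal(size=(500, 2))
print(kmeans_pp_init(X, k=3).shape)   # (3, 2)
```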

11.1.2 Accelerate Approximate K-means

Another improvement strategy for the k-means algorithm is to accelerate approximate k-means. These algorithms share a common trait: they trade some clustering quality for better time performance. Among them, Pérez and Pazos et al. proposed an improvement using a new convergence condition [9], and Sculley [10] introduced a mini-batch sampling method. Wang et al. [3] proposed an accelerated algorithm that uses multiple random space-partition trees to pre-assemble the data into sets of neighboring points, effectively identifying active points and using neighborhood information to construct the closure of each cluster. Pérez et al. proposed a heuristic algorithm for early classification [11]. Shen et al. accelerated large-scale data clustering (CKM) by compressing high-dimensional data into binary codes [12]. Hu et al. proposed accelerating the k-means algorithm through multi-stage filtering (MKM) [13]. Deng et al. compare each data sample with its nearest neighbors by using approximate k-nearest-neighbor (k-NN) graphs [14].
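The mini-batch idea of Sculley [10] can be sketched as follows; this is a simplified single-step version under our own naming, not the reference implementation.

```python
import numpy as np

def minibatch_kmeans_step(X, centers, counts, batch_size, rng):
    """One mini-batch update in the style of Sculley [10]: assign a random
    batch to its nearest centers, then move each center toward the batch
    points with a per-center learning rate 1/counts[j]."""
    batch = X[rng.choice(X.shape[0], size=batch_size, replace=False)]
    # nearest-center index for every batch point
    labels = np.argmin(((batch[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
    for x, j in zip(batch, labels):
        counts[j] += 1
        centers[j] += (x - centers[j]) / counts[j]   # gradient-style step
    return centers, counts

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
centers, counts = X[:4].copy(), np.zeros(4)
for _ in range(50):
    centers, counts = minibatch_kmeans_step(X, centers, counts, 32, rng)
```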


11.1.3 Accelerate Exact K-means

Exact accelerated k-means algorithms produce the same clustering results as Lloyd's algorithm but obtain them more efficiently. Elkan avoids unnecessary distance calculations by using triangle inequalities, but the efficiency of this algorithm is greatly limited by the need to maintain upper and lower bounds for each sample point [15]. Hamerly therefore improved the Elkan algorithm, greatly raising its efficiency, but the Hamerly algorithm must search over all cluster centers for the sample points that fail its test [16]. The Annular algorithm [17] and the Exponion algorithm [18] accelerate this further by narrowing the search range of those sample points. Pelleg et al. proposed a blacklisting algorithm that assigns batches of points to clusters by constructing a k-d tree over the sample points [19]. Curtin et al. proposed the dual-tree k-means algorithm, which builds trees over both the data samples and the cluster centers and applies four pruning strategies [20]. Yinyang k-means is a highly efficient exact accelerated k-means algorithm based on multiple levels of filters [21]. Another fast exact k-means algorithm is Ball k-means [22], which represents each cluster by a ball and accurately divides the cluster into areas [23], then restricts the search range of a query point to the neighbor balls of the current ball, saving many distance calculations.
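The core pruning test behind Elkan's bound maintenance [15] can be illustrated as follows; a minimal sketch (the real algorithms cache bounds instead of recomputing distances):

```python
import numpy as np

def can_skip_center(x, assigned_center, other_center):
    """Triangle-inequality pruning as in Elkan [15]: if
    d(c, c') >= 2 * d(x, c), then d(x, c') >= d(c, c') - d(x, c) >= d(x, c),
    so c' cannot be closer to x and d(x, c') need never be computed."""
    d_xc = np.linalg.norm(x - assigned_center)
    d_cc = np.linalg.norm(assigned_center - other_center)
    return d_cc >= 2.0 * d_xc
```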

11.1.4 Our Contribution

This paper makes two main contributions. The first, motivated by the NP-hardness of the underlying problem, is an exact k-means acceleration strategy that combines approximate and exact k-means: for a given initialization it finds a different solution, which in some cases is provably better than Lloyd's (that is, the within-cluster sum of distortions is smaller). The second is a more efficient heuristic k-means algorithm obtained by applying the exact acceleration algorithm ball k-means within this strategy.

11.2 A Heuristic K-means Algorithm

This chapter proposes a new heuristic k-means algorithm based on nearest neighbor information. The Lloyd k-means algorithm performs a large number of redundant distance calculations that could be avoided entirely, and even though many exact accelerated k-means algorithms reduce these redundant computations, it is inevitable


that considerable redundant computation remains. In general, during the k-means iterations, a queried cluster usually exchanges sample points only with the few clusters whose centers are close to its own. Based on this observation, this paper adopts a nearest neighbor relationship [24] to narrow the search range when sample points are reassigned and draws on the recent ball k-means algorithm [22] to reduce unnecessary distance calculations during reassignment.

11.2.1 Narrow the Search Space of Sample Points

This subsection details how to narrow the search space of sample points. Before presenting the algorithm, some notation is introduced. Given a cluster C, C(t−1) denotes the cluster after the (t−1)-th iteration and C(t) the cluster after the t-th iteration.

Definition 1 Given a sample point x and a cluster C, if in the two adjacent iterations x was allocated to cluster C exactly once, x is called a boundary point (BP), that is,

BPs = {x | x ∈ C(t−1), x ∉ C(t)} ∪ {x | x ∉ C(t−1), x ∈ C(t)}, ∀x ∈ D   (11.1)

where D denotes the sample point set. By Definition 1, the clusters that are close to the queried cluster can be found; these are called the neighbor clusters (NCs) of the queried cluster. The detailed description is given in Definition 2.

Definition 2 Given a cluster C, any cluster Ci, and a moving point x, if in the (t−1)-th iteration there is a sample point exchange between clusters C and Ci, then Ci is called a neighbor cluster (NC) of C, that is,

NCs = {Ci | x ∈ C(t−1), x ∈ Ci(t)} ∪ {Ci | x ∈ Ci(t−1), x ∈ C(t)}, ∀x ∈ BPs   (11.2)

Since a point x in the queried cluster C can only move to a cluster near C during the iteration, we may assume that the sample points in cluster C can only be allocated to the neighbor clusters of Definition 2, thereby narrowing the search space of the sample points in C and achieving the desired acceleration.
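A small Python sketch of Definitions 1 and 2 follows; the function and variable names are illustrative, since the paper itself gives no code.

```python
import numpy as np

def boundary_points_and_neighbor_clusters(prev_labels, curr_labels, c):
    """Boundary points (Definition 1) of cluster c are the samples that were
    in c in exactly one of the two latest iterations; the clusters they were
    exchanged with are c's neighbor clusters NCs (Definition 2)."""
    prev_labels = np.asarray(prev_labels)
    curr_labels = np.asarray(curr_labels)
    moved = (prev_labels == c) ^ (curr_labels == c)   # in c exactly once
    bps = np.where(moved)[0]
    ncs = set(prev_labels[bps]) | set(curr_labels[bps])
    ncs.discard(c)
    return bps, ncs

bps, ncs = boundary_points_and_neighbor_clusters([0, 0, 1, 2], [0, 1, 1, 0], c=0)
print(bps, ncs)   # [1 3] {1, 2}
```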

11.2.2 Reduce the Number of Sample Points for Reallocation

Some points in the queried cluster are closer to their own cluster center than to any other cluster center. Such sample points cannot be


assigned to other clusters during the iteration and therefore need no distance calculations against the neighbor clusters. How to find some of these sample points is described next.

Definition 3 Given a cluster C with center c and xi ∈ C, the cluster is called a ball cluster; its center and radius are

c = (1/|C|) Σ_{i=1}^{|C|} xi,  rc = max_{xi∈C} ‖xi − c‖   (11.3)

Definition 4 Given ball clusters C and Cj centered at c and cj, we define the radius of the stable area as

rc-sta = (1/2) min_{cj∈NCs} dist(c, cj)   (11.4)

which partitions C into two areas, the stable area Csta and the active area Cact: ∀x ∈ Csta, ‖x − c‖ ≤ rc-sta, and ∀x ∈ Cact, ‖x − c‖ > rc-sta.
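The partition of Definition 4 can be sketched as follows (again with illustrative names):

```python
import numpy as np

def split_stable_active(X_c, c, neighbor_centers):
    """Partition a ball cluster (Definitions 3-4): points within half the
    distance to the closest neighbor center form the stable area and keep
    their assignment; only the active-area points are re-examined."""
    r_sta = 0.5 * min(np.linalg.norm(c - cj) for cj in neighbor_centers)
    d = np.linalg.norm(X_c - c, axis=1)
    return X_c[d <= r_sta], X_c[d > r_sta]   # (stable, active)
```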

11.2.3 Algorithm Flow Chart

For reasons of space we do not give pseudo-code in this subsection but instead provide a more intuitive flowchart of the algorithm, shown in Fig. 11.1.

Fig. 11.1 Flowchart of algorithm

11.3 Experiments

To evaluate the performance of the algorithm, we conducted comparison experiments on several data sets, mainly from the UCI public repository; the details are shown in Table 11.1.

Table 11.1 Dataset information

Data set       Size      Dimension
Four-class     862       5
Codrna         59,535    8
Kegg network   65,554    28
Epileptic      11,500    179
Birch          100,000   2
Ijcnn          141,690   22

The comparison algorithms include Lloyd, Hamerly [17], blacklist [19], dual-tree [20], and ball k-means [22]. All algorithms in this experiment were implemented in C++, and the comparison algorithms used the mlpack machine learning library. The specific experimental results are presented in Tables 11.2 and 11.3.

Table 11.2 Time performance comparison (running time (ms))

Data set     k     Lloyd    Hamerly   Blacklist   Dual-tree   Ball    Ours
Four-class   30    3        2         3           4           0       1
             50    6        4         7           8           1       1
             100   6        5         6           7           1       1
             300   8        9         8           8           3       6
Svmguide     30    60       38        47          70          18      19
             50    87       61        64          82          18      14
             100   219      157       166         197         31      32
             300   395      384       278         299         36      36
Codrna       30    1598     797       1103        1469        827     467
             50    1817     1212      1353        1608        583     363
             100   4935     2862      3049        3821        1199    1014
             300   20,280   13,874    10,372      13,793      2418    1421
Kegg         30    2801     1676      1627        2300        2388    1200
             50    4356     2302      2047        2679        2369    1300
             100   6735     3536      2721        3722        1976    759
             300   11,851   7011      5014        5986        1431    788
Epileptic    30    1451     1364      3137        2521        2690    1109
             50    1380     2029      2655        2847        1732    1087
             100   3933     5271      7666        6856        4390    3774
             300   13,177   16,231    23,672      21,165      9116    8953
Birch        30    516      184       95          126         50      68
             50    1841     373       210         220         82      104
             100   4389     940       486         390         116     160
             300   20,283   5404      2574        1086        255     380
Ijcnn        30    3092     1451      2260        2919        1562    483
             50    6441     2811      4801        5890        3863    1195
             100   15,642   7748      10,960      12,983      7077    3203
             300   45,508   31,527    30,802      33,161      7470    3584

From Table 11.2, the time performance of our algorithm is much better on most of the data sets and values of k, and on average it is nearly twice as fast as ball k-means. Combined with Table 11.1, our algorithm performs better than ball k-means on higher-dimensional datasets, mainly because in high-dimensional datasets the ball k-means algorithm produces a very large number of nearest neighbor clusters, some of which are meaningless.

To judge the quality of clustering, we use the SSE metric. From Table 11.3, the SSE of our algorithm and the exact k-means algorithm are almost the same, and on some data sets our algorithm even leads by a larger margin.

Table 11.3 Comparison of the clustering quality (SSE) with Lloyd’s algorithm (lower is better)

Data set     k = 30 (%)   k = 50 (%)   k = 100 (%)   k = 300 (%)   Average (%)
Four-class   −0.54        0.00         −0.39         −0.50         −0.36
Svmguide     −0.05        0.08         0.41          0.10          0.14
Codrna       0.03         0.01         −0.20         0.24          0.02
Kegg         −2.66        −2.97        0.13          0.34          −1.29
Epileptic    −0.08        0.12         0.18          −0.06         0.04
Birch        0.00         0.00         0.00          0.03          0.01
Ijcnn        0.00         0.01         0.00          −0.04         −0.01

11.4 Conclusion

In this paper, we propose a new heuristic k-means algorithm with excellent time performance and clustering quality comparable to that of the exact k-means algorithm. This is mainly attributed to two points: (1) we narrow the search space of sample points, allowing the algorithm to converge quickly; (2) we reduce the number of points that need to be reassigned during the iterative process. The


algorithm proposed in this paper is heuristic, and the acceleration strategy works equally well for algorithms other than ball k-means. The algorithm and the speedup strategy provide a practical way to apply k-means to large-scale as well as high-dimensional datasets.


Acknowledgements This work was supported in part by the National Key Research and Development Program of China (2019QY(Y)0301), the National Natural Science Foundation of China under Grant Nos. 62176033 and 61936001, and the Natural Science Foundation of Chongqing No. cstc2019jcyjcxttX0002.

References

1. Bachem, O., Lucic, M., Hassani, S.H., et al.: Approximate k-means++ in sublinear time. In: Thirtieth AAAI Conference on Artificial Intelligence (2016)
2. Botía, J.A., Vandrovcova, J., Forabosco, P., et al.: An additional k-means clustering step improves the biological features of WGCNA gene co-expression networks. BMC Syst. Biol. 11(1), 1–16 (2017)
3. Wang, J., Wang, J., Ke, Q., et al.: Fast approximate k-means via cluster closures. In: Multimedia Data Mining and Analytics, pp. 373–395. Springer, Cham (2015)
4. Wu, X., Kumar, V., Quinlan, J.R., et al.: Top 10 algorithms in data mining. Knowl. Inf. Syst. 14(1), 1–37 (2008)
5. Arthur, D., Vassilvitskii, S.: k-means++: The advantages of careful seeding. Stanford (2006)
6. Bachem, O., Lucic, M., Hassani, H., et al.: Fast and provably good seedings for k-means. Adv. Neural Inf. Proc. Syst. 29, 55–63 (2016)
7. Ng, R.T., Han, J.: Efficient and effective clustering methods for spatial data mining. In: Proceedings of VLDB, pp. 144–155 (1994)
8. Newling, J., Fleuret, F.: K-medoids for k-means seeding. arXiv preprint arXiv:1609.04723 (2016)
9. Pérez, J., Pazos, R., Cruz, L., et al.: Improving the efficiency and efficacy of the k-means clustering algorithm through a new convergence condition. In: International Conference on Computational Science and Its Applications, pp. 674–682. Springer, Berlin, Heidelberg (2007)
10. Sculley, D.: Web-scale k-means clustering. In: Proceedings of the 19th International Conference on World Wide Web, pp. 1177–1178 (2010)
11. Pérez, J., Pires, C.E., Balby, L., et al.: Early classification: A new heuristic to improve the classification step of k-means. J. Inf. Data Manag. 4(2), 94–94 (2013)
12. Shen, X., Liu, W., Tsang, I., et al.: Compressed k-means for large-scale clustering. In: Thirty-First AAAI Conference on Artificial Intelligence (2017)
13. Hu, Q., Wu, J., Bai, L., et al.: Fast k-means for large scale clustering. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 2099–2102 (2017)
14. Deng, C.H., Zhao, W.L.: Fast k-means based on k-NN graph. In: 2018 IEEE 34th International Conference on Data Engineering (ICDE), pp. 1220–1223. IEEE (2018)
15. Elkan, C.: Using the triangle inequality to accelerate k-means. In: Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 147–153 (2003)
16. Hamerly, G.: Making k-means even faster. In: Proceedings of the 2010 SIAM International Conference on Data Mining, pp. 130–140. Society for Industrial and Applied Mathematics (2010)
17. Hamerly, G., Drake, J.: Accelerating Lloyd's algorithm for k-means clustering. In: Partitional Clustering Algorithms, pp. 41–78. Springer, Cham (2015)
18. Newling, J., Fleuret, F.: Fast k-means with accurate bounds. In: International Conference on Machine Learning, pp. 936–944. PMLR (2016)
19. Pelleg, D.: Extending k-means with efficient estimation of the number of clusters. In: Proceedings of the 17th International Conference on Machine Learning (ICML), pp. 277–281 (2000)
20. Curtin, R.R.: A dual-tree algorithm for fast k-means clustering with large k. In: Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 300–308. Society for Industrial and Applied Mathematics (2017)
21. Ding, Y., Zhao, Y., Shen, X., et al.: Yinyang k-means: A drop-in replacement of the classic k-means with consistent speedup. In: International Conference on Machine Learning, pp. 579–587. PMLR (2015)
22. Xia, S., Peng, D., Meng, D., et al.: A fast adaptive k-means with no bounds. IEEE Transactions on Pattern Analysis and Machine Intelligence (2020)
23. Xia, S., Liu, Y., Ding, X., et al.: Granular ball computing classifiers for efficient, scalable and robust learning. Inf. Sci. 483, 136–152 (2019)
24. Peng, D., Chen, Z., Fu, J., et al.: Fast k-means clustering based on the neighbor information. In: 2021 International Symposium on Electrical, Electronics and Information Engineering, pp. 551–555 (2021)

11 A Fast Heuristic k-means Algorithm Based …

119

15. Elkan, C.: Using the triangle inequality to accelerate k-means. In: Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 147–153 (2003) 16. Hamerly, G.: Making k-means even faster. In: Proceedings of the 2010 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, pp. 130–140 (2010) 17. Hamerly, G., Drake, J.: Accelerating Lloyd’s algorithm for k-means clustering. In: Partitional Clustering Algorithms, pp. 41–78. Springer, Cham (2015) 18. Newling, J, Fleuret, F.: Fast k-means with accurate bounds. In: International Conference on Machine Learning, pp. 936–944. PMLR (2016) 19. Pelleg, D.: Extending K-means with efficient estimation of the number of clusters in ICML. In: Proceedings of the 17th International Conference on Machine Learning. pp. 277–281 (2000) 20. Curtin, R.R.: A dual-tree algorithm for fast k-means clustering with large k. In: Proceedings of the 2017 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, pp. 300–308 (2017) 21. Ding, Y., Zhao, Y., Shen, X., et al.: Yinyang k-means: A drop-in replacement of the classic k-means with consistent speedup. In: International Conference on Machine Learning, pp. 579– 587. PMLR (2015) 22. Xia, S., Peng, D., Meng, D., et al.: A fast adaptive k-means with no bounds. In: IEEE Transactions on Pattern Analysis and Machine Intelligence (2020) 23. Xia, S., Liu, Y., Ding, X., et al.: Granular ball computing classifiers for efficient, scalable and robust learning. Inf. Sci. 483, 136–152 (2019) 24. Peng, D., Chen, Z., Fu, J., et al.: Fast k-means Clustering Based on the Neighbor Information. In: 2021 International Symposium on Electrical, Electronics and Information Engineering, pp. 551–555 (2021)

Chapter 12

Global Analysis of Discrete SIR and SIS Systems

Fang Zheng

Abstract In this paper, an SIS continuous model without disease-induced death is proposed, and it is proved that the model has a unique globally asymptotically stable equilibrium point. For this continuous model, two corresponding discrete models are derived by two different methods, and the stability of the discrete models is analyzed. Both period doubling and chaos are excluded from the models, and the global stability results show that these new models behave very much like their continuous counterpart. The basic reproduction number and positive equilibrium of the first discrete model coincide with those of the continuous model, while those of the second discrete model differ from them. This shows that different discretization methods applied to the same continuous model can change the behavior of the resulting discrete model.

Keywords Discrete model · SI systems · SIS systems · Global stability

12.1 Introduction

In the past, studies of the transmission of infectious diseases mostly used continuous models, because continuous models lend themselves to the analysis of disease characteristics; comparatively few studies have examined discrete models of infectious diseases in depth. Allen [1] studied discrete SI (Susceptible-Infected), SIR (Susceptible-Infected-Recovered), and SIS (Susceptible-Infected-Susceptible) epidemic systems with constant total population and standard incidence, and found that under some natural constraints the discrete SI and SIR systems behave similarly to their continuous counterparts, indicating that these two discrete systems capture the essential dynamics. The SIS model, however, can become periodic or even chaotic for certain parameter values.


Allen and Burgin [3] analyze and compare deterministic and stochastic discrete SIS and SIR infectious disease models and determine the basic reproduction number of the model under certain constraints. Castillo-Chavez and Yakubu [2] study the discrete SIS model with diffusion and nonlinear incidence: if the generative function takes certain complex forms, such as the Ricker model or the Verhulst equation, bistability and period-doubling can occur. Zhou and Fergola [10] consider a discrete age-structured SIS epidemic model, find its basic reproduction number and, assuming that the disease spreads only among adults, propose a method for determining the basic reproduction number. Many discrete models are discretized continuous models rather than direct descriptions of discrete processes; others are built directly from biological theory rather than by discretizing a continuous process, such as [5], and such models can produce period-doubling and chaotic phenomena. Since data are collected over discrete time periods, it is easier to analyze the problem with a model in discrete form if the modeling assumptions are given in discrete form.

In this paper, new discrete SI and SIS infectious disease models are proposed by discretizing one continuous model with two different methods, and the two new discrete models differ in their properties. For the model obtained by direct discretization (the first method), it has been shown in [5] that period-doubling and chaos can arise if certain birth functions are chosen. The other discrete model, which uses the exponential and Poisson probability distributions to describe the survival rate and the infection rate over a discrete time step, differs from the continuous model in both its positive equilibrium and its basic reproduction number unless the time step is very small [6].

12.2 Continuous Model

In [7] it is assumed that the infection rate β(N) = β is a constant, which is contrary to the actual situation: the infection rate is usually not constant but changes with the number of infected individuals. We therefore propose a basic continuous SIS model without disease-induced death:

S' = f(N) − β(I)SI − μS + rI
I' = β(I)SI − (μ + r)I   (12.1)

Here N = S + I is the total population and S is the healthy (susceptible) population. The model has a birth function depending on the total population and the same proportional natural mortality rate μ for both healthy and infected populations. We assume that the contact rate β(I) per person per unit time is a function of the number of infected I, with β(0) < ∞, and that there are no deaths caused by the disease.


Let r ≥ 0 denote the recovery rate at which infected individuals return to the healthy class. The SI model, in which patients remain infected forever, is the special case r = 0; we therefore analyze the SIS model first. We take β(I) = α/(1 + I) in model (12.1), where α is an infection parameter, so system (12.1) becomes

S' = f(N) − αSI/(1 + I) − μS + rI
I' = αSI/(1 + I) − (μ + r)I   (12.2)

For system (12.2), the total population N satisfies the equation

N' = f(N) − μN   (12.3)

Let K satisfy the equation f(K) = μK. If f'(K) < μ, we have N(t) → K as t → ∞. Replacing N by the constant K and S by K − I in system (12.2) [8], the system reduces to

I' = β(I)I(K − I) − (μ + r)I   (12.4)

Analyzing model (12.4), we see that the system has a disease-free equilibrium point I0 = 0. We define the basic reproduction number R0 as

R0 = αK/(μ + r)

Theorem 12.1 If R0 < 1, the disease-free equilibrium I0 is locally asymptotically stable; if R0 > 1, it is unstable. If R0 > 1, system (12.4) has a unique positive equilibrium I1, and I1 is locally asymptotically stable.

12.3 Discretization of Continuous Models

We discretize the continuous SIS model (12.2) following [9], assuming a fixed time step h, so that t_{n+1} − t_n = h and t_n = t_0 + nh, n = 0, 1, ...


From the previous discussion, the continuous model has simple behavior: it has a unique locally stable equilibrium point, even when the birth function in (12.4) is changed. We now consider whether a discrete version of the continuous model (12.2) can produce complex phenomena. Let S_n = S(t_n), I_n = I(t_n), N_n = N(t_n) = S(t_n) + I(t_n). The usual (forward Euler) discretization gives the system

S_{n+1} = hf(N_n) − hμS_n + S_n[1 − αhI_n/(1 + I_n)] + rhI_n
I_{n+1} = I_n[1 − (μ + r)h] + αhI_nS_n/(1 + I_n)   (12.5)

The case r = 0 is the special SI model of the SIS model. Adding the two equations of system (12.5), we obtain

N_{n+1} = hf(N_n) + (1 − hμ)N_n   (12.6)

Equation (12.6) has an equilibrium point N = K, where K is determined by f(K) = μK; it is stable if f'(K) < μ and hf'(K) + (1 − hμ) > −1, i.e., h < 2/(μ − f'(K)). System (12.5) is asymptotically autonomous, so replacing N_n by K and S_n by K − I_n as before gives

I_{n+1} = I_n[1 − (μ + r)h] + αhI_n(K − I_n)/(1 + I_n)   (12.7)

Equation (12.7) has the disease-free equilibrium point I = 0. If R0 = αK/(μ + r) > 1, the system also has a positive equilibrium point I* = (αK − μ − r)/(α + μ + r).

Theorem 12.2 If R0 < 1 and h < 2/(μ + r − αK), the disease-free equilibrium I = 0 of system (12.7) is locally asymptotically stable; if R0 > 1 and h < 2/(αK − μ − r), the positive equilibrium I* of system (12.7) is locally asymptotically stable.
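As a quick numerical check of Theorem 12.2, the following Python sketch iterates the map (12.7) for one hypothetical parameter set satisfying R0 > 1 and the step-size bound; the parameter values are ours, chosen only for illustration.

```python
def iterate_sis(I0, K, alpha, mu, r, h, steps=2000):
    """Iterate the discretized SIS map (12.7) from I0 for a fixed number of steps."""
    I = I0
    for _ in range(steps):
        I = I * (1 - (mu + r) * h) + alpha * h * I * (K - I) / (1 + I)
    return I

# Hypothetical parameters: R0 = alpha*K/(mu+r) = 20 > 1 and
# h = 0.05 < 2/(alpha*K - mu - r) ~ 0.35, so Theorem 12.2 predicts
# convergence to the positive equilibrium I*.
alpha, mu, r, K, h = 0.6, 0.2, 0.1, 10.0, 0.05
print(iterate_sis(0.5, K, alpha, mu, r, h))        # ~6.3333
print((alpha * K - mu - r) / (alpha + mu + r))     # analytic I* = 19/3
```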

12.4 The Second Discrete Model

In this section, we establish a different discrete system for the continuous Eq. (12.1). In this new discrete model, natural mortality and recovery are treated as continuous processes acting over each discrete time period, and we describe their cumulative effect per period. The number of people born in the time period [t_n, t_{n+1}] is ∫_{t_n}^{t_{n+1}} f(N(s))ds, which we approximate by hf(N_n). For each healthy person in the time period


[t_n, t_{n+1}], the effective number of contacts with infected persons is ∫_{t_n}^{t_{n+1}} αI(s)/(1 + I(s)) ds. Thus, using the Poisson distribution, the probability that a healthy person remains uninfected is exp(−∫_{t_n}^{t_{n+1}} αI(s)/(1 + I(s)) ds), which is approximately e^{-αhI/(1+I)}. Suppose the infected and the healthy have the same proportional mortality rate μ; the exponential distribution then gives the proportion of each population surviving from time t_n to time t_{n+1}, and similarly the other proportional probabilities take the exponential form e^{-μh}. So at time t_n we propose the following new discrete counterpart of the continuous system (12.1):

S(n+1) = hf(N(n)) + S(n)e^{-μh}e^{-αhI(n)/(1+I(n))} + I(n)e^{-μh}(1 − e^{-rh})
I(n+1) = I(n)e^{-μh}e^{-rh} + S(n)e^{-μh}(1 − e^{-αhI(n)/(1+I(n))})   (12.8)

From the model above, if the initial values satisfy S(0) ≥ 0 and I(0) ≥ 0, then S(n) ≥ 0 and I(n) ≥ 0 for all n. Summing the two equations of model (12.8) yields

N(n+1) = S(n+1) + I(n+1) = g(N(n)) = hf(N(n)) + e^{-μh}N(n)   (12.9)

The discrete Eq. (12.9) has a positive equilibrium point K when g(K) = K, i.e., f(K) = K(1 − e^{-μh})/h. For the positive equilibrium point K to be asymptotically stable, we need

f'(K) < (1 − e^{-μh})/h,   (12.10)

hf'(K) + e^{-μh} > −1.   (12.11)

Condition (12.11) holds when h < h*, where h* = ∞ if f'(K) ≥ 0, and h* is the positive root of the equation hf'(K) + e^{-μh} = −1 if f'(K) < 0. For the discrete infectious disease model to have a locally asymptotically stable equilibrium point, we assume that conditions (12.10) and (12.11) are satisfied.

Remark 12.1 If the birth function f(N_n) of the model is constant, then conditions (12.10) and (12.11) are naturally satisfied for any h.

Under conditions (12.10) and (12.11), the solution of Eq. (12.9) has the limit K as t → ∞. Substituting K for N(n) and K − I(n) for S(n) as in [10], a one-dimensional difference system is obtained:

I_{n+1} = Ke^{-μh}(1 − e^{-αhI_n/(1+I_n)}) − I_n e^{-μh}[1 − e^{-αhI_n/(1+I_n)} − e^{-rh}]

Let r = e^{-(μ+r)h} ≤ s = e^{-μh} < 1 (with a slight abuse of notation, r henceforth denotes e^{-(μ+r)h}). So

I_{n+1} = s(K − I_n)(1 − e^{-αhI_n/(1+I_n)}) + rI_n   (12.12)

For the SI model we have r = s. Let I_{n+1} = f(I_n), where

f(I) = s(K − I)(1 − e^{-αhI/(1+I)}) + rI ≥ 0.   (12.13)

Lemma 12.1 The inequality 0 < f(I) < K holds for all I with 0 < I < K.

Proof The function f(I) satisfies f(0) = 0 and f(K) = rK < K. For any I ∈ (0, K),

f'(I) = −s(1 − e^{-αhI/(1+I)}) + s(K − I)e^{-αhI/(1+I)} · αh/(1+I)² + r,

f''(I) = −se^{-αhI/(1+I)} · (αh/(1+I)²) · [1 + αh(K − I)/(1+I)² + (1 + 2K − I)/(1+I)] < 0.

Thus f'(0) = sαhK + r > 0 and f'(K) = −s(1 − e^{-αhK/(1+K)}) + r. If f'(K) ≥ 0, then because f''(I) < 0 we have f'(I) > 0 for all 0 < I < K, so f(I) < f(K) < K. For the special case of the SI model, where s = r, we necessarily have f'(K) ≥ 0. If, on the other hand, f'(K) < 0, there exists I_M ∈ (0, K) such that f'(I_M) = 0, i.e.,

se^{-αhI_M/(1+I_M)} = (s − r) / [1 + αh(K − I_M)/(1+I_M)²],

and the function f(I) attains its maximum f(I_M) at the point I = I_M in the interval (0, K). When r < s,

f(I_M) = (K − I_M) · [sαh(K − I_M)/(1+I_M)² + r] / [1 + αh(K − I_M)/(1+I_M)²] + rI_M < (K − I_M)s + rI_M < K.

So in either case, the inequality 0 < f(I) < K holds. Next we consider the equation on the interval (0, K).

12.5 Stability Analysis

Theorem 12.3 If the parameters satisfy shαK + r ≤ 1, the system has only the disease-free equilibrium point I = 0; if shαK + r > 1, the system also has a unique positive equilibrium point I* ∈ (0, K), which satisfies

e^{-αhI*/(1+I*)} = 1 − (1 − r)I* / [s(K − I*)]   (12.14)
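Since (12.14) has no closed-form solution, I* can be located numerically; the sketch below uses bisection on f(I) − I under the existence condition of Theorem 12.3 (function name and parameter values are illustrative).

```python
import math

def positive_equilibrium(alpha, mu, rec, K, h, tol=1e-12):
    """Bisection for the positive equilibrium I* of the map (12.12), i.e. the
    root of f(I) = I in (0, K); it exists when s*h*alpha*K + r > 1
    (Theorem 12.3). Here r = exp(-(mu+rec)*h) and s = exp(-mu*h) as in the text."""
    r, s = math.exp(-(mu + rec) * h), math.exp(-mu * h)
    assert s * h * alpha * K + r > 1, "no positive equilibrium"
    f = lambda I: s * (K - I) * (1 - math.exp(-alpha * h * I / (1 + I))) + r * I
    lo, hi = tol, K - tol          # f(lo) > lo and f(hi) < hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > mid else (lo, mid)
    return 0.5 * (lo + hi)

print(positive_equilibrium(alpha=0.6, mu=0.2, rec=0.1, K=10.0, h=0.05))
```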

Theorem 12.4 If the parameters satisfy shαK + r ≤ 1, the disease-free equilibrium point I = 0 is globally stable, so the infectious disease eventually dies out. If shαK + r > 1, the disease-free equilibrium I = 0 is unstable, so the disease persists.

Proof For all x > 0 we have 1 − e^{-x} < x, so

f(I_n) = s(K − I_n)(1 − e^{-αhI_n/(1+I_n)}) + rI_n < s(K − I_n) · αhI_n/(1 + I_n) + rI_n < shαI_n(K − I_n) + rI_n = I_n(shαK − shαI_n + r) < I_n(shαK + r).

If shαK + r < 1, I_n decreases to 0. If shαK + r = 1, we have I_{n+1} < I_n, so the sequence {I_n} is decreasing and, for I_n > 0, has a limit lim_{n→∞} I_n = I_∞ ≥ 0. Because I_∞ is an equilibrium point of system (12.12) and, by Theorem 12.3, the only equilibrium point when shαK + r = 1 is I = 0, we conclude I_∞ = 0. On the other hand, if shαK + r > 1, then lim_{I→0} f(I)/I = shαK + r > 1, so f(I_n) > I_n when I_n is small enough, which means I_n does not go to zero.

When shαK + r > 1, system (12.12) has a positive equilibrium I*; next we study its local stability.

Theorem 12.5 If shαK + r > 1, the positive equilibrium I* is locally asymptotically stable.

Proof For the positive equilibrium I* to be locally asymptotically stable, it must satisfy |f'(I*)| < 1. From the proof of Lemma 12.1 we know f''(I) < 0; since f'(0) = shαK + r > 1 and f(I*) = I*, the graph of f crosses the diagonal from above at I*, thus f'(I*) < 1. Due to 0 < r ≤ s < 1,

f'(I*) > f'(K) = r − s(1 − e^{-αhK/(1+K)}) > r − s > −1.

Since −1 < f'(I*) < 1, the positive equilibrium point I* is locally asymptotically stable.

We use the conclusion in [6] to calculate the basic reproduction number of model (12.8), written R(h):

R(h) = αhKe^{-μh} / (1 − e^{-(μ+r)h})

Theorem 12.6 If r > μ, there is a positive number h* such that R(h) > R0 when 0 < h < h*; if r ≤ μ, then R(h) ≤ R0 for all h > 0.

Proof We define

g(h) = R(h) − R0 = αK[h(μ + r)e^{-μh} − 1 + e^{-(μ+r)h}] / {(μ + r)[1 − e^{-(μ+r)h}]}.

Let g1(h) = h(μ + r)e^{-μh} − 1 + e^{-(μ+r)h}; then g1(0) = 0 and g1'(h) = (μ + r)e^{-μh}(1 − μh − e^{-rh}) = (μ + r)e^{-μh}g2(h), where g2(h) = 1 − μh − e^{-rh}. So g2'(h) = −μ + re^{-rh} and g2'(0) = r − μ. Two cases are discussed:

(1) If r > μ, we have g2'(0) > 0 and g2''(h) = −r²e^{-rh} < 0, so there is h1* such that g2'(h1*) = −μ + re^{-rh1*} = 0, namely h1* = (1/r)ln(r/μ). Thus g2'(h) > 0 when 0 < h < h1* and g2'(h) < 0 when h > h1*. Because g2(0) = 0 and lim_{h→∞} g2(h) = −∞, there is a positive number h2* > h1* such that g2(h) > 0 when 0 < h < h2* and g2(h) < 0 when h > h2*. Since g1(0) = 0 and lim_{h→∞} g1(h) = −1, there is h* > h2* such that g1(h) > 0 when 0 < h < h* and g1(h) < 0 when h > h*. Since g(h) and g1(h) have the same sign, R(h) > R0 when 0 < h < h*.

(2) If r ≤ μ, then g2'(0) ≤ 0, and since g2''(h) < 0, g2'(h) ≤ 0 for all h > 0; hence g2(h) ≤ g2(0) = 0 and g1(h) ≤ 0 for all h > 0. Therefore R(h) ≤ R0.

Therefore, according to the above theorem, when h is not close to 0, an appropriate choice of the model parameters can give R(h) > 1 > R0 or R(h) < 1 < R0. The new discrete system (12.8) then behaves differently from the original continuous system (12.2) because of the different basic reproduction numbers. Of course, if


we take the limit of R(h) as h becomes small, then lim_{h→0} R(h) = R0. Therefore, the asymptotic behaviors of the discrete model (12.8) and the continuous system (12.2) are similar when h is small enough. This shows that accuracy is very important when turning a continuous model into a discrete one. Because of the different discretization methods, the properties of (12.8) and (12.5) are different.
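The dependence of R(h) on h is easy to examine numerically; a minimal sketch with hypothetical parameter values, here with r > μ so that R(h) > R0 for small h, as in case (1) of Theorem 12.6:

```python
import math

def R_h(alpha, mu, r, K, h):
    """R(h) = alpha*h*K*e^{-mu*h} / (1 - e^{-(mu+r)*h}) from the text."""
    return alpha * h * K * math.exp(-mu * h) / (1 - math.exp(-(mu + r) * h))

alpha, mu, r, K = 0.6, 0.1, 0.5, 10.0     # r > mu, case (1) of Theorem 12.6
R0 = alpha * K / (mu + r)                 # continuous R0 = 10
for h in (0.5, 40.0, 1e-4):
    print(h, R_h(alpha, mu, r, K, h))     # > R0 for small h, < R0 for large h,
                                          # and -> R0 as h -> 0
```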

12.6 Globally Asymptotically Stable

The system (12.12) usually has a locally asymptotically stable equilibrium point; in fact, this equilibrium point is globally asymptotically stable. If shαK + r < 1, all solutions of system (12.12) approach 0, so the equilibrium point I = 0 is globally asymptotically stable. For the positive equilibrium point I*, when it exists it is unique, and the following hold:

1. f(0) = 0.
2. There is only one locally asymptotically stable positive equilibrium Ī with f(Ī) = Ī; when 0 < I < Ī we have f(I) > I, and when I > Ī we have f(I) < I.
3. If f(I) attains its maximum at a point I_M in (0, Ī), then f(I) is monotonically decreasing for I > I_M, and f(I) > 0 there.

According to the theorem in [11], if the function f(I) has no maximum on the interval (0, Ī), or if f(I) > 0 wherever f'(I) < 0, then the equilibrium point Ī is globally asymptotically stable. From the proof of Lemma 12.1, these conditions are satisfied, so the result follows. If the positive equilibrium of system (12.5) is locally asymptotically stable, the same method can be used to prove that it is globally stable [12]. We draw the following conclusion:

Theorem 12.7 If R(h) ≤ 1, system (12.12) has a unique, globally stable disease-free equilibrium I = 0: the infection will not continue to spread and eventually becomes extinct. If R(h) > 1, the disease-free equilibrium I = 0 is unstable, and the unique positive equilibrium of the system is globally stable, so the disease will persist.

12.7 Conclusion

We give two different discrete systems for the same continuous system using two different discretization methods and analyze the stability of each. Different discretizations can change the behavior of the system, so the choice of discretization method is very important when studying such systems.


References

1. Allen, L.J.S.: Some discrete-time SI, SIR, and SIS epidemic models. Math. Biosci. 124(1), 83–105 (1994)
2. Castillo-Chavez, C., Yakubu, A.A.: Discrete-time S-I-S model with complex dynamics. Nonlinear Anal. 47(7), 4753–4762 (2001)
3. Allen, L., Burgin, A.M.: Comparison of deterministic and stochastic SIS and SIR models. Dept. Math. Statistics, Technical Report Series (1999)
4. Castillo-Chavez, C., Yakubu, A.A.: Dispersal, disease and life-history evolution. Math. Biosci. 173(1), 35–53 (2001)
5. Emmert, K.E., Allen, L.J.: Population persistence and extinction in a discrete-time, stage-structured epidemic model. J. Differ. Equations Appl. 10(13–15), 1177–1199 (2004)
6. Cushing, J.M.: An Introduction to Structured Population Dynamics. Soc. Ind. Appl. Math. (1998)
7. Li, J., Ma, Z., Brauer, F.: Global analysis of discrete-time SI and SIS epidemic models. Math. Biosci. Eng. 4(4), 699–710 (2007)
8. Castillo-Chavez, C., Thieme, H.H.: Asymptotically autonomous epidemic models. In: Arino, O., Axelrod, D., Kimmel, M., Langlais, M. (eds.) Mathematical Population Dynamics: Analysis of Heterogeneity. Theory of Epidemics, vol. 1, pp. 33–50. Wuerz, Winnipeg (1993)
9. Li, X., Wang, W.: A discrete epidemic model with stage structure. Chaos, Solitons Fractals 26(3), 947–958 (2005)
10. Zhou, Y., Fergola, P.: Dynamics of a discrete age-structured SIS models. Discr. Cont. Dyn. Syst.-B 4(3), 841–850 (2004)
11. Zhao, X.Q.: Asymptotic behavior for asymptotically periodic semiflows with applications. Comm. Appl. Nonlinear Anal. 3(4), 43–66 (1996)
12. Cull, P.: Local and global stability for population models. Biol. Cybern. 54(3), 141–149 (1986)

Chapter 13

Image-Based Physics Rendering for 3D Surface Reconstruction: A Survey

Da Fang, Zhibao Qin, Shaojun Liang, and Renbo Luo

Abstract Obtaining 3D surface information and physical material information of an object from images is an essential research direction in computer vision and computer graphics. Image-based 3D reconstruction extracts the 3D depth information of scenes and objects from single or multiple images through specific algorithms in order to reconstruct a strongly realistic 3D model of an object or location. It offers fast reconstruction, simple equipment, realistic results, and small technical data requirements, and can thus better realize the virtualization of natural objects.

Keywords Image · 3D reconstruction · Physics rendering

13.1 Introduction

With the development of computer vision technology, 3D object reconstruction has gone through a long evolution. 3D models are mainly obtained in three ways: traditional geometric modelling, i.e., constructing models directly with conventional geometric modelling technology; point-cloud-based 3D reconstruction, i.e., scanning natural objects with 3D scanning equipment and then reconstructing the model; and image-based 3D reconstruction, i.e., reconstructing the model from multiple images of natural objects taken from different perspectives. As image-based 3D reconstruction builds the model from image information, it can supply the colour information of the corresponding object, making the model more vivid. Image-based 3D reconstruction techniques can be divided into two categories: one reconstructs the model from multiple depth images, and the other generates the visual hull of the object from multiple photos [1]. At present, the image-based 3D reconstruction techniques with application value mainly include the laser scanning method [2], time-of-flight method [3], structured light method [4], shape from shading, shape from silhouettes [5], shape-from-motion [6], shape-from-texture [7],


shape-from-focus [8], photometric stereo [9], and so on. They can also be divided, according to whether they contact the target object, into contact and non-contact types [10], and, by the number of cameras used for image capture, into 3D reconstruction based on monocular, binocular, and multi-view vision [11, 12].

13.2 Research Status of 3D Reconstruction Based on Image

3D reconstruction technology aims to faithfully rebuild the virtual surface model of a 3D object in the computer and construct a complete 3D model of the object. Firstly, point cloud data of the object are collected from all angles with image acquisition equipment such as cameras. A single camera can only photograph the object from one angle; to obtain complete information about the object surface, the object must be photographed from multiple angles, and the resulting point clouds are transformed into the same coordinate system to complete the registration of multi-view point cloud data. Secondly, the mesh surface of the model is constructed from the registered point cloud data. As imaging technology becomes more mature while remaining inexpensive, image-based 3D object reconstruction is increasingly favoured by researchers.

Image-based 3D reconstruction is defined as follows: given the physical material, viewing angle, and illumination of the target object, the most likely corresponding 3D shape is estimated by a specific computer algorithm. Because the physical material, viewing angle, and lighting conditions must be known, certain artificial assumptions are made in practice to make the reconstruction more accurate, simplify the reconstruction process, and improve efficiency. For example, the target object is often assumed to be a Lambertian surface: the incident energy is reflected uniformly in all directions, centred on the incident point, and the reflected energy is isotropic over the whole hemispheric space, i.e., the object is assumed to be a perfect diffuse reflector. This assumption makes it possible to approximate the optical interaction between light sources and objects without destroying the original properties of light.
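A minimal sketch of the Lambertian shading model assumed above (illustrative names; the survey itself prescribes no code):

```python
import numpy as np

def lambert_shade(normal, light_dir, albedo=1.0):
    """Lambertian (perfectly diffuse) reflection: radiance depends only on the
    angle between surface normal and light, I = albedo * max(0, n . l), and is
    the same for every view direction."""
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    return albedo * max(0.0, float(n @ l))

print(lambert_shade(np.array([0.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])))  # ~0.707
```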

13.3 Image-Based 3D Surface Reconstruction

13.3.1 Laser Scanning Method

3D laser scanning technology appeared in the 1980s. It uses a laser as the light source and, based on the principle of laser ranging, records the 3D coordinates, reflectivity, and texture of a large number of dense points on the surface of the measured


object, so that the 3D model of the measured object and data such as lines, texture, and volume can be quickly reconstructed [2]. Through high-speed laser scanning measurement, the (x, y, z) coordinates, reflectivity, (R, G, B) colour, and other information of each point on the object surface can be obtained quickly, over large areas, and at high resolution, and a true-colour 1:1 3D point cloud model can be reconstructed promptly from this massive, dense point information. However, the 3D point cloud data collected by scanning equipment often contain defects such as noise, uneven sampling, data loss, and feature loss, and the accuracy is easily affected by the surrounding environment. It is therefore challenging to capture objects with a large depth of field, and the restoration of local detail features, texture colour, and resolution is limited.

13.3.2 Time-of-Flight Method

Time-of-flight (ToF) ranging obtains distance by measuring the flight-time interval between the transmitted and received signal, given the known speed of light or sound [3]. The depth information of the target scene is recovered by calculating the time-of-flight difference of a light pulse after it is reflected by the target surface. The time-of-flight method has the advantages of not being limited by baseline length, being independent of texture, and imaging quickly. However, the resolution of ToF cameras is low, and it is easily affected by environmental factors such as mixed pixels and external light sources, resulting in inaccurate scene depth; systematic and random errors have a significant impact on the measurement results, so post-processing of the data is necessary [3]. Unlike structured light, which is limited by the camera's focal length and field of view, the time-of-flight method offers good stability and high accuracy for long-distance 3D measurement of large scenes and is suitable for obtaining depth information of dynamic scenes [13].
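The ranging principle reduces to one line; a sketch assuming a measured round-trip time:

```python
def tof_depth(round_trip_time_s, c=299_792_458.0):
    """Time-of-flight ranging: the pulse travels to the target and back,
    so depth = c * t / 2."""
    return c * round_trip_time_s / 2.0

print(tof_depth(20e-9))  # a 20 ns round trip is roughly 3 m
```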

13.3.3 Structured Light Method

The structured light method [4] projects standard grating-stripe structured light onto the object surface; the projected light stripes deform with the undulation of the surface shape. The camera captures an image of the object surface, and the 3D information of the object is extracted from the fringe pattern modulated by the surface shape. The method is based on the measurement theory of optical triangulation, and the camera and optical projector must be calibrated before measurement. A typical structured light system consists of a projector, a camera, and a computing unit. The projector first projects a pattern with specific coding information onto the surface of the target scene; based on the coded image modulated by the target surface and captured by the camera, the


decoding algorithm establishes the correspondence of spatial 3D points between the image coordinates of the camera and the projector, and the 3D point cloud is obtained. The structured light method copes with flat surfaces, uniform texture, and slowly varying grey levels. Because of its simple implementation and high precision, it is widely used, for example in Microsoft's Kinect [14].
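For the rectified projector-camera (or two-camera) case, the triangulation step reduces to the standard disparity relation; a minimal sketch with hypothetical calibration values:

```python
def triangulate_depth(focal_px, baseline_m, disparity_px):
    """Optical triangulation in the rectified two-view case: once the projected
    code gives the correspondence, depth follows from the disparity d between
    matched columns as Z = f * b / d."""
    return focal_px * baseline_m / disparity_px

print(triangulate_depth(focal_px=800.0, baseline_m=0.1, disparity_px=40.0))  # 2.0 m
```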

13.3.4 Shape from Shading Method

The shadow boundaries of an image contain its contour feature information, so the depth of the object surface can be computed from the shading and shadows of images under different lighting conditions, using a reflected illumination model for 3D reconstruction. The shading method analyses the vector information of the object from the brightness variation of the imaging surface and converts it into surface depth information. Shape from shading has a wide range of applications and can recover the 3D model of any object except a mirror. Its disadvantages are that the process is dominated by mathematical computation and the reconstruction results are not accurate enough. Another point that cannot be ignored is that the shading method requires accurate light source parameters, including position and direction, so it cannot be used under complex lighting.

13.3.5 Shape from Silhouettes Method

The contour method is also known as shape from silhouettes [5]: a 3D contour model of the object is obtained from its contours at different viewing angles. In 1983, Martin et al. first proposed the contour method, which has low computational complexity and high reconstruction efficiency. However, it places strict requirements on the camera's internal and external parameters, and because it can only reconstruct the convex hull, it cannot recover depressions and holes on the object's surface; it is therefore generally used for low-detail or preliminary 3D reconstruction. The visual hull is a geometric entity constructed by shape-from-silhouette 3D reconstruction, introduced by Laurentini [15]. This technique assumes that the foreground objects in the scene can be separated from the background; based on this assumption, the original image can be thresholded into a foreground/background binary image called a silhouette image.


13.3.6 Shape-from-Motion Method

The shape-from-motion (SFM) method applies when the target moves in front of the camera or the camera moves in a fixed environment: the changing images can be used to recover the relative motion between camera and target and the relationships between multiple targets in the scene. Point correspondences between the images are needed; matching feature points detected in numerous images taken from arbitrary angles are used to estimate the camera positions. SFM [6] can recover the image poses (external orientation elements of the pictures) and the structure of the scene (3D point cloud data) at the same time, but since it can only compute the 3D coordinates of matched feature points, and an image usually contains only a small number of feature points, the accuracy of the model is low.

13.3.7 Shape-from-Texture Method

The texture method measures the deformation of texture in the image and then inversely computes the depth data from the deformed texture elements. In 1981, Witkin proposed the shape-from-texture method under orthogonal projection [7]: by analysing the size and shape of repeated texture units of the object in the image, the normal direction and depth of the object surface are estimated. In 2010, Warren et al. [16] proposed a shape-from-texture method under perspective projection, which eliminated ambiguity and greatly improved the reconstruction. Shape from texture can quickly obtain an accurate 3D reconstruction from a single image, but it places strict requirements on the texture information of the object surface: the distortion of the texture elements under the imaging projection must be known, so its application range is narrow. It is only suitable for special situations, such as surfaces with known texture characteristics or scenes with regular textures.

13.3.8 Shape-from-Focus Method

Shape from focus (SFF) [8] recovers the depth of an object through the relationship between image sharpness and the distance from the object to the projection centre. Because this relationship is nonlinear, the method requires multiple images for reconstruction. Shape from focus has a simple setup, relatively accurate measurements, low light-source requirements and good reconstruction quality. However, the focal length and aperture of the camera must be adjusted constantly, the reconstruction of surfaces with complex texture is poor, and the measurement process takes a long time. In addition, because of the moving parts involved, its accuracy and reliability are reduced, so its application is limited.

13.3.9 Photometric Stereo

Stereo vision [17] mainly comprises three approaches: obtaining range information directly with a rangefinder, inferring 3D information from a single image, and recovering 3D information from two or more images taken from different viewpoints. Among these, the best-known line of research is the photometric stereo method. According to colour theory, the colour of an object's surface differs under different illumination. Photometric stereo captures the same surface of an object under illumination from several directions and uses this group of images to reconstruct the surface shape. In 1979, Woodham first put forward the concept of photometric stereo [9]: under the same viewing angle, the shading information of multiple images taken under different illumination conditions is used to solve for the surface normal direction for 3D reconstruction. Photometric stereo can obtain accurate 3D reconstruction results from a few photographs. Multiple image constraints are used to resolve problems that are unsolvable in the shading method, such as shadows, which increases the robustness and accuracy of the algorithm. The reconstruction accuracy of photometric stereo depends on the stability of the incident light and the surface reflection characteristics. Commonly used surface reflection models include the Lambert reflection model, the specular Phong model, the Blinn–Phong reflection model and the BRDF (bi-directional reflectance distribution function) model (the methods discussed above are compared in Table 13.1).

13.4 Summary

Compared with traditional 3D reconstruction, image-based 3D reconstruction has the advantages of easy data acquisition, simple acquisition equipment and a wide range of applicable scenes. Image-based 3D reconstruction is the approach most likely to achieve accurate restoration of both the texture details and the colour of the target object, especially in environments with complex lighting conditions and interference; at the same time, its equipment is inexpensive, which makes it the preferred method in the film, television and entertainment industries. For 3D reconstruction of complex scenes, each of the single algorithms above has its limitations and weaknesses. Besides improving individual algorithms, multi-sensor fusion schemes that combine different reconstruction algorithms are a critical research direction for compensating for the shortcomings of a single algorithm. In 2009, Kim et al. [18] proposed an integrated multi-view sensor fusion method that combines the information of several colour cameras and multiple TOF depth sensors to reduce the complex noise of the TOF sensors and obtain high-quality, dense and detailed 3D models.


Table 13.1 Comparison of advantages and disadvantages of 3D reconstruction technology

Method | Advantage | Disadvantages
Laser scanning method | Fast speed and high resolution | Expensive equipment, uneven sampling and limited applicability
Time-of-flight method | High data collection frequency, fast speed, less affected by ambient light | Large error, low accuracy and high hardware cost
Structured light | Easy and convenient, high reconstruction accuracy | Slow reconstruction, affected by ambient light
Shape from shading | Simple equipment, low requirements for input images | Special light source needed; reconstruction effect is poor, the process is complicated and the scope of application is small
Shape from silhouettes | High reconstruction efficiency and simple equipment | Low accuracy; cavities and depressions on the surface of the object cannot be reconstructed
Shape-from-motion | Recovers the image position (exterior orientation elements of the picture) and the structure information of the scene | Large amount of calculation and long reconstruction time; accuracy of the model is low
Shape-from-texture | Can quickly obtain accurate 3D reconstruction targets from a single image | Needs the distortion information of the texture elements in the imaging projection; narrow application range
Shape-from-focus | Simple structure, relatively accurate measurement results, low requirements for the light source and good reconstruction effect | Not suitable for complex texture surfaces
Photometric stereo | High reconstruction accuracy and low hardware cost | Affected by ambient light, poor robustness

Santo et al. [19] put forward Deep Photometric Stereo and the Deep Photometric Stereo Network (DPSN), which use the knowledge representation and reasoning ability of deep learning to address the difficulty existing photometric methods have in reconstructing complex reflectance properties. Žbontar et al. [20] use a convolutional neural network (CNN) to densely reconstruct local patches of the image, while other modules still use the traditional semi-global matching method to optimize the depth map. Niu et al. [21] proposed recovering 3D shape structures from a single RGB image. Chen et al. [22] proposed using a deep neural network to learn the light transport matrix of light passing through transparent objects, realizing end-to-end 3D reconstruction of transparent objects. In the future, with the spread of computers with significant computing power and the improvement of camera quality, and especially as 3D reconstruction is applied to areas with higher requirements such as virtual medical treatment, artificial intelligence, autonomous driving and the protection of precious cultural relics, 3D reconstruction is bound to take the road of multi-technology integration: the fusion of multi-view geometry with deep learning methods, multi-sensor fusion, the combination of algorithms with hardware, the combination of algorithms with specific applications, and so on. Multi-sensor and multi-technology fusion can effectively avoid the limitations of a single method while improving the accuracy and completeness of the object information.

References

1. Song, P., Wu, X., Wang, M.Y.: A robust and accurate method for visual hull computation. In: 2009 International Conference on Information and Automation, pp. 784–789. IEEE (2009)
2. Rakitina, E., Rakitin, I., Staleva, V., Arnaoutoglou, F., Koutsoudis, A., Pavlidis, G.: An overview of 3D laser scanning technology. In: Proceedings of the International Scientific Conference (2008)
3. Hosseini, S.J., Araujo, H.: A ToF-aided approach to 3D mesh-based reconstruction of isometric surfaces. In: International Conference on Pattern Recognition Applications and Methods, pp. 146–161. Springer, Cham (2014)
4. Hou, Z., Su, X., Zhang, Q.: 3D shape compression based on virtual structural light encoding. Acta Optica Sinica 31(5) (2011)
5. Martin, W.N., Aggarwal, J.K.: Volumetric descriptions of objects from multiple views. IEEE Trans. Pattern Anal. Mach. Intell. 5(2), 150–158 (1983)
6. Tomasi, C., Kanade, T.: Shape and motion from image streams under orthography: a factorization method. Int. J. Comput. Vision 9(2), 137–154 (1992)
7. Witkin, A.P.: Recovering surface shape and orientation from texture. Artif. Intell. 17(1–3), 17–45 (1981)
8. Nayar, S.K., Nakagawa, Y.: Shape from focus: an effective approach for rough surfaces. In: Proceedings IEEE International Conference on Robotics and Automation, pp. 218–225. IEEE (1990)
9. Woodham, R.J.: Photometric stereo: a reflectance map technique for determining surface orientation from image intensity. In: Image Understanding Systems and Industrial Applications I, vol. 155, pp. 136–143. International Society for Optics and Photonics (1979)
10. Chen, X.R., Cai, P., Shi, W.K.: The latest development of optical non-contact 3D profile measurement. Opt. Precis. Eng. 10(5), 528–532 (2002)
11. Kolmogorov, V., Zabih, R.: Multi-camera scene reconstruction via graph cuts. In: European Conference on Computer Vision, pp. 82–96. Springer, Berlin, Heidelberg (2002)
12. Zabulis, X., Daniilidis, K.: Multi-camera reconstruction based on surface normal estimation and best viewpoint selection. In: Proceedings of the 2nd International Symposium on 3D Data Processing, Visualization and Transmission, 3DPVT 2004, pp. 733–740. IEEE (2004)
13. Lange, R., Seitz, P.: Solid-state time-of-flight range camera. IEEE J. Quantum Electron. 37(3), 390–397 (2001)
14. Ringaby, E., Forssén, P.E.: Scan rectification for structured light range sensors with rolling shutters. In: 2011 International Conference on Computer Vision, ICCV 2011, pp. 1575–1582. IEEE (2011)
15. Laurentini, A.: The visual hull concept for silhouette-based image understanding. IEEE Trans. Pattern Anal. Mach. Intell. 16(2), 150–162 (1994)
16. Warren, P.A., Mamassian, P.: Recovery of surface pose from texture orientation statistics under perspective projection. Biol. Cybern. 103(3), 199–212 (2010)
17. Schmid, K., Hirschmüller, H.: Stereo vision. ICRA (2013)
18. Kim, Y.M., Theobalt, C., Diebel, J., Kosecka, J., Miscusik, B., Thrun, S.: Multi-view image and ToF sensor fusion for dense 3D reconstruction. In: IEEE International Conference on Computer Vision Workshops, pp. 1542–1549 (2009)
19. Santo, H., Samejima, M., Sugano, Y., Shi, B., Matsushita, Y.: Deep photometric stereo network. In: IEEE International Conference on Computer Vision Workshops, pp. 501–509 (2017)
20. Žbontar, J., LeCun, Y.: Stereo matching by training a convolutional neural network to compare image patches. J. Mach. Learn. Res. 17(1), 2287–2318 (2016)
21. Niu, C., Li, J., Xu, K.: Im2Struct: recovering 3D shape structure from a single RGB image. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4521–4529 (2018)
22. Chen, G.Y., Han, K., Wong, K.K.: TOM-Net: learning transparent object matting from a single image. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9233–9241 (2018)

Chapter 14

Insulator Detection Study Based on Improved Faster-RCNN

Zhuangzhuang Jing

Abstract Unmanned aerial vehicles are now widely used in insulator detection. This work addresses the large errors in identifying and positioning insulators in complex environments during aerial photography, the inaccurate detection of defective insulators, and the difficulty of identifying small targets. This paper improves Faster Region-based CNN (Faster-RCNN) to raise insulator positioning accuracy and defect detection precision. Firstly, an improved Region Proposal Network, together with a designed residual module and multi-scale training, increases the accuracy of insulator identification and positioning. Secondly, an Intersection over Union (IoU)-aware module improves the detection accuracy of defective insulators. Finally, compared with the unimproved Faster-RCNN, this study improves insulator positioning accuracy by 4.88% and the detection of defective insulators by 8.91%.

Keywords Faster-RCNN · Insulators · Deep learning · Object detection

14.1 Introduction

Insulators have always been an important object of detection in power systems. Defective insulators can not only damage power equipment and affect the operation of the power system, but can even cause significant economic losses and casualties. In practical terms, studies of insulator detection problems are necessary [1]. Li [2] realized the semantic segmentation of insulator strings in complex backgrounds. Zuo et al. [3] proposed improving the segmentation of the images and using morphological processing to locate the faults, improving the ability to identify faults in complex environments. Wang [4] used the Fully Convolutional Network (FCN) algorithm to build training data from insulator images with the background area removed and constructed a You Only Look Once version 3 (YOLOv3) object detection model, improving mean Average Precision (mAP) by 4.65%.

Z. Jing (B)
Shanghai Dianji University, Shanghai 200120, China
e-mail: [email protected]


Liu and Huang [5] proposed a YOLOv4-based deep learning method combined with an improved watershed algorithm to identify insulator bodies and defect locations relatively quickly. Zijian et al. [6] proposed an image detection method based on the Efficient and Accurate Scene Text (EAST) model and Hu invariant moments. Ling et al. [7] proposed an insulator detection method based on a feature pyramid and multi-task learning to improve detection accuracy. Liu et al. [8] used an improved YOLOv3 network for aerial-image insulator detection under complicated background interference. In the above studies, the detection accuracy for insulators against complex backgrounds and for small targets is insufficient; only one of insulator identification and fault diagnosis could be improved at a time, and the methods are strongly affected by the environment. Therefore, this paper proposes a Faster Region-based CNN (Faster-RCNN) model with higher insulator positioning accuracy and higher fault detection accuracy. The feasibility of the detection model is demonstrated by experiments, which provides a strong reference for live detection by unmanned aerial vehicles (UAVs).

14.2 Sample Expansion

A total of 1500 insulator images in this study were taken from networks and drones. Training a new convolutional neural network model requires a large amount of basic data: if the data set is too small, the network will overfit, meaning that all the characteristics of the samples are easily fitted during parameter training, so the network cannot truly learn the underlying data distribution and the accuracy of the model drops. Therefore, this paper used data enhancement and image segmentation to expand the sample data; the samples were also manually annotated and saved in the Visual Object Classes (VOC) data format, finally creating a sample set containing 4000 insulator images. Among them, the ratio of defective to normal insulators is 1:3, and the image resolution was unified at 800 × 600 pixels. Of the base samples, 70% form the training data, 20% the validation set and the last 10% the test set. Since the original sample set contains few pictures of defective insulators, the positive and negative ratios of the training set are unbalanced, so image segmentation was used to place the same defective insulators against different backgrounds. As shown in Fig. 14.1, the defective insulators in the image are all generated by image segmentation and fusion, with the self-detonating defective part of each insulator marked in red. Figure 14.2 shows data enhancement by mirror flipping, random cropping, colour conversion and adding noise to the sample images, while preserving the integrity of the insulator in each picture. To capture the colour distortion of insulators under different lighting and improve the robustness of the detection model, the RGB values of some images were adjusted (i.e. colour conversion). When detecting defective insulators, the detection accuracy is greatly affected by the scale of the original data, and the more the data, the better. The original data must also be labeled, and label quality is another major factor affecting detection accuracy.
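The enhancement operations described above are standard image transforms; the following is a minimal sketch of how such an expansion step could be implemented. The function and parameter values are illustrative assumptions, not the authors' actual code.

```python
# Hypothetical sketch of the data-enhancement step: mirror flipping, random
# cropping, colour-channel adjustment and additive noise. Probabilities and
# ranges are illustrative; the paper does not specify them.
import random
import numpy as np
import cv2

def augment(image: np.ndarray) -> np.ndarray:
    out = image.copy()
    if random.random() < 0.5:                      # mirror flipping
        out = cv2.flip(out, 1)
    if random.random() < 0.5:                      # random cropping, resized back
        h, w = out.shape[:2]
        y0, x0 = random.randint(0, h // 10), random.randint(0, w // 10)
        out = cv2.resize(out[y0:h - h // 10, x0:w - w // 10], (w, h))
    if random.random() < 0.5:                      # colour conversion: scale channels
        scale = np.random.uniform(0.8, 1.2, size=3)
        out = np.clip(out.astype(np.float32) * scale, 0, 255).astype(np.uint8)
    if random.random() < 0.5:                      # additive Gaussian noise
        noise = np.random.normal(0, 8, out.shape)
        out = np.clip(out.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    return out
```

Note that in practice the bounding-box annotations must be transformed together with the image for flipping and cropping.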


Fig. 14.1 Image segmentation of defective insulators

Fig. 14.2 Data enhancement of insulators: a original image, b mirror flipping, c random cropping, d–e colour conversion, f adding noise

This article used LabelImg in PASCAL VOC format to label the sample images. Labeling must strictly follow the labeling rules; in particular, the label box should be as close as possible to the edge of the insulator. Each target was labeled separately, and two label boxes were allowed to overlap only when the overlapping part of the targets was not more than half. The target insulators had to be clearly visible when marked, and a label box could not cross the image boundary. Finally, the completed labeling data were checked again to ensure the correctness of the data labeling.


Fig. 14.3 Faster RCNN detection schematic

14.3 Insulator Identification and Positioning

14.3.1 Faster RCNN Detection Principle

The detection process can be outlined as follows. First, the input images are pre-processed (flipping, pixel transformations, etc.). Second, convolutional neural networks extract the image features. Third, the Region Proposal Network (RPN) generates candidate regions. Fourth, candidate-box feature maps are extracted from the feature map and the candidate boxes. Fifth, the candidate-box feature maps are brought to the same size by a Region of Interest (ROI) pooling layer. Finally, a fully connected layer performs classification scoring and position regression on each feature box (Fig. 14.3).

14.3.2 Improved RPN

The RPN proposes all target candidate regions on the feature map. Its default anchor scales are 128², 256² and 512², with aspect ratios of 1:1, 1:2 and 2:1, giving a total of nine different anchors. Insulators are elongated objects with unbalanced width-to-height ratios, and they generally occupy a relatively small proportion of the image, which makes identification, positioning and detection unsatisfactory. Therefore, the proportions of the candidate boxes need to be adjusted to achieve better recognition and reduce the missed-detection rate. We added a set of 64²-scale anchors to the model, thus adding three candidate boxes and expanding the range of candidate-box sizes.
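As a sketch of the anchor configuration described above, the snippet below enumerates the twelve anchors (four scales × three aspect ratios); it is an illustrative reconstruction, not the authors' code.

```python
# Hypothetical sketch: enumerate RPN anchor (width, height) pairs for the
# four scales (64, 128, 256, 512) and three aspect ratios (1:1, 1:2, 2:1).
import math

def make_anchors(scales=(64, 128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    anchors = []
    for s in scales:
        area = s * s
        for r in ratios:                 # r = width / height
            w = math.sqrt(area * r)
            h = area / w
            anchors.append((round(w), round(h)))
    return anchors

print(make_anchors())   # 12 anchors in total
```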

14.3.3 Residual Networks (ResNet)

A residual module was designed for the ResNet-50 network by layering overlays. Comparing Residual Networks 50 (ResNet-50) with the Visual Geometry Group 16 (VGG16) network shows that the former extracts image features more effectively. The residual module is shown in Fig. 14.4, where X is the input and F(X) is the function fitted by the residual module, so the module outputs F(X) + X.

Fig. 14.4 The residual module
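A minimal sketch of such a residual block in Keras follows; the layer sizes are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of a residual block: the output is F(X) + X, where F
# is a small stack of convolutions. Assumes the input already has `filters`
# channels; otherwise the shortcut needs a 1x1 projection convolution.
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    shortcut = x                                        # identity branch (X)
    y = layers.Conv2D(filters, 3, padding="same")(x)    # F(X): conv-BN-ReLU stack
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    out = layers.Add()([shortcut, y])                   # F(X) + X
    return layers.ReLU()(out)
```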


14.3.4 Multi-Scale Training

The unmodified Faster-RCNN model is trained on data of a single, consistent scale, but in actual inspection the images are acquired by aerial photography, and it is unrealistic to artificially resize all insulator images to the same size; doing so would cause missed detections. We therefore set three new training scales (480, 600 and 750) and trained a new model on this multi-scale basic data. In this way, the model learns the characteristics of insulators of different sizes, and both its positioning accuracy on the smaller insulators in the input image and its overall detection accuracy are further improved.
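A common way to realize this, sketched below under the assumption that the shorter image side is rescaled to a randomly chosen training scale, is:

```python
# Hypothetical sketch: pick one of the three training scales at random for
# each image and rescale its shorter side to that value.
import random
import cv2

TRAIN_SCALES = (480, 600, 750)

def random_rescale(image):
    target = random.choice(TRAIN_SCALES)
    h, w = image.shape[:2]
    factor = target / min(h, w)
    return cv2.resize(image, (round(w * factor), round(h * factor)))
```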

14.3.5 Comparison of Different Detection Methods

Three detection models were compared on 300 data images (including missing and occluded insulators), judged on whether each model could accurately identify the insulators. The results are in Table 14.1. From Table 14.1 it is easy to see that, compared with Faster-RCNN and YOLOv3, the improved Faster-RCNN significantly increases both recall and precision: recall rises by 10% and 7.34%, and precision by 4.88% and 3.82%, respectively.

Table 14.1 Insulator identification comparison results

Network model | Right | Wrong | Missing | Recall rate/% | Precision/% | Average detection time/s
Faster-RCNN | 224 | 34 | 76 | 74.67 | 86.82 | 0.75
YOLOv3 | 232 | 32 | 68 | 77.33 | 87.88 | 0.78
Improved Faster-RCNN | 254 | 23 | 46 | 84.67 | 91.07 | 0.83

14.4 Detection of Defective Insulators

Aerial surveillance inspections by unmanned aerial vehicles face the problem of identifying small defective insulator targets. To this end, this paper proposes an improved detection method for the original Faster-RCNN using multi-scale feature fusion and an IoU-aware module (as shown in Fig. 14.5). Compared with the earlier region-based and fast region-based convolutional neural networks, Faster-RCNN replaces Selective Search with the RPN to generate proposal boxes and reuses the convolutional features of the object detection network, reducing the number of proposal windows to 300 and thus greatly reducing the amount of computation.


Fig. 14.5 IoU-aware module

When feature maps are extracted from the input image, the three levels of feature maps produced by ResNet-50 are first adjusted to appropriate sizes through a 1 × 1 convolutional layer, because the three levels have different convolutional strides. They then pass through a top-down feature fusion channel (as shown in Fig. 14.6). Before each fusion, the higher-level feature map is enlarged by bilinear interpolation to the same size as the feature map to be fused, and the two are added pixel by pixel. The same operation is carried out on each level, realizing the fusion step by step. The resulting fused feature map is passed through a 3 × 3 convolutional layer to reduce the feature aliasing produced by the repeated upsampling operations. In the end, we obtain a feature map with both higher resolution and rich semantic information, which improves the accuracy of the detection.
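The following is a minimal sketch of this top-down fusion (an FPN-style merge); the channel count and layer names are assumptions for illustration, and each level is assumed to be exactly half the spatial size of the one below it.

```python
# Hypothetical sketch of the top-down feature fusion: align channels with a
# 1x1 conv, upsample the higher-level map bilinearly, add pixel by pixel,
# then smooth with a 3x3 conv to reduce upsampling aliasing.
import tensorflow as tf
from tensorflow.keras import layers

def top_down_fuse(c3, c4, c5, channels=256):
    p5 = layers.Conv2D(channels, 1)(c5)                      # align channel counts
    p4 = layers.Conv2D(channels, 1)(c4)
    p3 = layers.Conv2D(channels, 1)(c3)
    p4 = layers.Add()([p4, layers.UpSampling2D(2, interpolation="bilinear")(p5)])
    p3 = layers.Add()([p3, layers.UpSampling2D(2, interpolation="bilinear")(p4)])
    return layers.Conv2D(channels, 3, padding="same")(p3)    # reduce aliasing
```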

Fig. 14.6 Feature fusion structure from top to bottom (feature fusion module)


Fig. 14.7 Faster-RCNN block diagram with the addition of the IoU module

The improved Faster-RCNN block diagram is shown in Fig. 14.7. The algorithm uses ResNet as the feature extraction network of Faster-RCNN and introduces the upsampling-based feature fusion structure into the original Faster-RCNN network, so that deep and shallow features are fused and the resulting feature maps have higher resolution and richer semantic information. The IoU-aware module is incorporated into the Faster-RCNN network to better match the classification scores with the positioning accuracy, yielding more accurate detection results. As is known, the low performance of drone detection stems from the low correlation between the classification score and the positioning accuracy. We therefore added the IoU-aware module to predict the IoU of each anchor, increasing mAP with minimal additional computation: the final score is simply the predicted IoU multiplied by the classification confidence. Real-time performance is maintained while accuracy is improved.
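As a sketch, the detection score described above can be computed as follows (a plain product, per the text; some IoU-aware variants weight the two terms with an exponent, which is an assumption not made here):

```python
# Hypothetical sketch: combine the classification confidence with the
# predicted IoU to form the final detection score used for ranking and NMS.
def final_score(cls_confidence: float, predicted_iou: float) -> float:
    return cls_confidence * predicted_iou
```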

14.5 Improved Faster-RCNN

In the parts above, we not only used the improved RPN, the residual modules and multi-scale training to improve the accuracy of Faster-RCNN's identification and positioning of insulators, but also improved Faster-RCNN's detection rate for fault defects through multi-scale feature fusion and the IoU-aware module. The network structure of the resulting new two-stage object detection model is shown in Fig. 14.8. First, the UAV performs the inspection flight and captures images containing insulators, i.e. the insulators are located. Then, the captured insulator images are cropped to remove useless information and improve the signal-to-noise ratio. Finally, the processed insulator images are input to the defect detection part for defect and fault judgment.


Fig. 14.8 Improved two-stage object detection model

14.6 Experimental Verification

14.6.1 Comparative Experiment

Based on the new two-stage object detection network designed in this paper, the experiment used the self-expanded database described above for insulator fault detection. ResNet-50, which detects small insulator targets well, is the backbone network, and multi-scale detection prevents small targets from being missed. At the same time, the IoU-aware module was added to improve accuracy. Owing to the limited equipment, the learning rate was set to 0.0001 and the image size to a uniform 416 × 416. Although the database had been enriched, the data were still relatively limited, so 40,000 iterations were set according to the actual situation. The experimental parameters are configured as shown in Table 14.2.

Table 14.2 Experimental training parameters

Program parameters | | Hardware parameters |
Backbone net | ResNet-50 | CPU | i7-9700H
Learning rate | 0.0001 | GPU | GTX1650Ti
Maximum training times | 27,000 | CUDA | CUDA10.0
Image size | 416 × 416 | Program environment | TensorFlow


Fig. 14.9 Improved model-related loss graphs

Figure 14.9 contains the related loss curves of the improved model. Figure 14.9a is the classification loss curve of the feature fusion model; Fig. 14.9b is the improved target bounding-box training loss plot; Fig. 14.9c is the classification loss plot of the modified RPN; and Fig. 14.9d is the target-box regression loss plot of the improved RPN network.

14.6.2 Comparison of Experimental Results

Figure 14.10 shows the training loss curves of the improved and unimproved models: the blue curve is the loss of the unmodified model, and the red curve is the loss of the improved model. It can be seen that the improved network not only makes the loss function converge quickly but also reaches a smaller loss value; the improved model is clearly better.


Fig. 14.10 Improved front and rear model loss curve

Table 14.3 Improved model comparison

Models | Basic net | Right rate (%) | Recall rate (%) | AP (%) | F1
Faster-RCNN | VGG-16 | 51.89 | 98.21 | 85.68 | 0.68
Improved Faster-RCNN | ResNet-50 + tricks | 94.74 | 85.71 | 94.59 | 0.90

As shown in Table 14.3, where the first row is the unimproved model and the second the improved one, the improved network greatly increases the accuracy of insulator detection, which shows that the top-down multi-scale fusion method in this paper correctly addresses the loss of detection detail for small objects; the detection accuracy is improved by 8.91%, proving that the improved model is better. Figure 14.11 shows example results (the red-circled part is the fault): a the unimproved model can locate the insulator but fails to identify its defects; b the improved model, showing the defect location on the insulators; c and d further examples of the improved test results.

14.7 Summary

In this paper, the accuracy of Faster-RCNN's identification and positioning of insulators was improved by improving the RPN, designing residual modules and performing multi-scale training, and the detection rate for defective insulators was improved by adding the IoU-aware module, yielding a new model with better detection. Experimental verification shows that, compared with the unimproved model, this model can not only identify insulators in complex backgrounds and small target insulators, but also improves the accuracy of insulator positioning by 4.88% and the detection accuracy of defective insulators by 8.91%.


Fig. 14.11 Improvement before and after test results and examples

This study of applying UAV aerial photography to the live detection of insulators can provide a practical reference.

References

1. Wang, Y.R., Yang, K., Zhai, Y.J., Guo, C.B.: Transmission line insulator identification based on artificial image data augmentation. J. Syst. Simul. 1–11 (2021)
2. Li, B.Y.: Research on semantic segmentation of insulators based on deep learning. Inf. Commun. 12, 113–115 (2020)
3. Zuo, Y., Liu, W., Ma, Y.Q., Guo, G.K., Tan, S.S.: Insulator defect detection based on improved GrabCut. Comput. Eng. Des. 42(07), 2009–2015 (2021)
4. Wang, Z., Wang, Y.J., Wang, Q.Y., Kang, S.Q., Mikulovich, V.I.: Two-stage insulator fault detection method based on collaborative deep learning. Trans. China Electrotech. Soc. 36(17), 3594–3604 (2021)
5. Liu, Y., Huang, X.B.: Research on the detection and localization of insulator burst based on YOLOv4 and improved watershed algorithm. Power Syst. Clean Energy 37(07), 51–57 (2021)
6. Zhang, Z.J., Ma, J.E., Li, X.F., Fang, Y.T.: Insulator fault detection based on deep learning and Hu invariant moments. J. China Railway Soc. 43(02), 71–77 (2021)
7. Huang, L., Zhao, K., Li, J.D., Feng, H., Wang, Y.Q.: Insulator image detection based on feature pyramid and multi-task learning. Electric. Measur. Instrum. 58(04), 37–43 (2021)
8. Liu, C.Y., Wu, Y.Q., Liu, J.J., Sun, Z.: Improved YOLOv3 network for insulator detection in aerial images with diverse background interference. Electronics 10(7), 771 (2021)

Chapter 15

Citrus Positioning Method Based on Camera and Lidar Data Fusion

Chengjie Xu, Can Wang, Bin Kong, Bingliang Yi, and Yue Li

Abstract Research on citrus picking robots has produced some achievements, but inaccurate positioning remains a problem in citrus-picking application scenarios. To solve this problem, a YOLO V3 convolutional neural network is used in this paper to detect citrus and locate its pixel coordinates. On the basis of joint calibration of the camera and lidar, the image and point cloud data are fused, and a method for obtaining the depth value of the citrus centre based on this fusion is proposed. The proposed method is verified in an experimental scene, and the positioning errors do not exceed 0.01 m, providing a technical method for the research and development of citrus picking robots.

Keywords Detection and positioning · Calibration · Data fusion

15.1 Introduction

In recent years, as the disadvantages of an aging agricultural labor force and rising labor costs have become increasingly prominent, the country has begun to pay attention to the development of agricultural modernization [1]. In the citrus industry, manual picking is the main method, with problems such as high labor costs and high picking intensity [2]. To solve these problems, it is necessary to develop a citrus picking robot, but development faces difficulties such as fruit identification and positioning, path planning and mechanical arm design [3].

C. Xu · C. Wang · B. Kong (B) · B. Yi · Y. Li
Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
e-mail: [email protected]

C. Xu · B. Yi
College of Mechanical Engineering, Chongqing Three Gorges University, Chongqing 404100, China

C. Wang · B. Kong
The Key Laboratory of Biomimetic Sensing and Advanced Robot Technology, Anhui Province, Hefei 230031, China


Citrus recognition and positioning systems are the basis of automatic picking robots, and their recognition accuracy and speed directly affect the subsequent picking work, so research on citrus recognition and positioning systems has important theoretical value and practical significance.

Traditional fruit recognition relies on image processing technology. Yin et al. extracted fruit colour features by combining HIS, LCD, LAB and other colour spaces to achieve fruit image segmentation [4, 5]. Recently, with the rapid development of deep learning, agricultural robot researchers have begun to use convolutional neural networks to develop the environmental sensing modules of agricultural robots [6, 7]. Zhu Mingxiu adopted the Faster R-CNN framework to achieve fruit target detection with high recognition accuracy [8]. For fruit positioning, binocular vision systems, RGB-D cameras and laser rangefinders are used to obtain target depth information [9]. Li et al. used an RGB-D camera as the sensor to simultaneously detect and locate the fruit-bearing branches of multiple litchi clusters [10]; such cameras are easily affected by reflections from the target surface, and the depth-image resolution and positioning accuracy are low. Hu used a Bumblebee2 binocular camera as the depth acquisition equipment and located the centre point of citrus using the trigonometry theorem [11]. Liu Yanping and Xiong Longye used a Kinect V2 depth camera to solve the 3D coordinates of citrus [12, 13]. Danyang used the projection method to determine the two-dimensional coordinates of wolfberries in the image and then calculated their three-dimensional coordinates through binocular stereo vision to achieve positioning [14]. However, binocular vision systems fail when the light is too strong or too weak. Kondo et al. applied laser rangefinders to a variety of fruit picking robots to realize automatic positioning and grasping of fruit stems [15]; the limitation of laser rangefinders is that over long measurement distances they easily lose focus.

In the remainder of this paper, the second part introduces the citrus detection and positioning method and explains the basic principles of camera and lidar joint calibration and data fusion. The third part presents the experimental results, including the detection performance of the citrus detection network, the camera internal parameter calibration, the camera and lidar joint calibration, and the citrus positioning results. Finally, the research work is summarized and future directions are discussed.

15.2 Method

15.2.1 The Citrus Positioning Algorithm

For depth acquisition, lidar measures distance by time of flight, providing accurate depth values that are not affected by environmental illumination. Compared with other sensors, lidar has clear advantages in real-time performance and accuracy [16]. The depth of the citrus centre was therefore obtained by combining image and point cloud data.


Fig. 15.1 Flow chart of citrus positioning algorithm

The realization of citrus positioning is mainly divided into the following steps: (1) camera internal parameter calibration; (2) determination of the pixel coordinates of the citrus by a YOLO V3 convolutional neural network; (3) camera and lidar joint calibration; (4) fusion of the point cloud with the image through the internal and external parameters, projecting the point cloud onto the image with the coordinate transformation matrix; (5) obtaining the citrus centre depth information and solving the three-dimensional space coordinates. The flow is shown in Fig. 15.1.

15.2.2 Preliminary Positioning of the Pixel Coordinates of Citrus

The output of the YOLO V3 convolutional neural network includes the category information of the targets, the probability that a target belongs to a certain category, and the position information of each bounding box (centre point coordinates $t_x, t_y$ and bounding box height and width $b_h, b_w$). Therefore, the YOLO V3 convolutional neural network can be used to obtain the pixel coordinates of the citrus centre. The core idea of YOLO V3 is to input the whole image into the convolutional network and divide it into an S × S grid of cells. When the target to be detected falls in a grid cell, that cell is responsible for detecting it and outputs the confidence and bounding box of the target. The conditional class probability predicted by each grid cell, $\Pr(\text{Class}_i \mid \text{Object})$, is multiplied by the objectness confidence, $\Pr(\text{Object}) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}}$, to obtain the confidence score that the detected object belongs to a certain category:

$$\Pr(\text{Class}_i \mid \text{Object}) \cdot \Pr(\text{Object}) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}} = \Pr(\text{Class}_i) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}} \qquad (15.1)$$

After the confidence scores of all bounding boxes are obtained, a threshold is set to remove results with low scores, and non-maximum suppression (NMS) is applied to the remaining bounding boxes to retain the outputs with high confidence. The structure of the YOLO V3 convolutional neural network is shown in Fig. 15.2.
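The thresholding and NMS step can be sketched as follows (a minimal greedy NMS over axis-aligned boxes; the threshold values are illustrative assumptions):

```python
# Hypothetical sketch of score thresholding followed by greedy NMS.
# Boxes are (x1, y1, x2, y2); thresholds are illustrative.
import numpy as np

def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def nms(boxes, scores, score_thr=0.5, iou_thr=0.45):
    order = [i for i in np.argsort(scores)[::-1] if scores[i] >= score_thr]
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thr]
    return keep
```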


Fig. 15.2 YOLO V3 convolutional neural network structure diagram

The above uses the YOLO V3 convolutional neural network to complete citrus detection and obtain the centre position in pixel coordinates $(t_x, t_y)$, but the depth value of the centre has not yet been given. To transform the pixel coordinate system into a three-dimensional space coordinate system, the key is to obtain the depth information of the citrus centre.

15.2.3 Camera and Lidar Joint Calibration

The internal parameters of the camera must be computed before camera and lidar joint calibration. The relationship between a point $[X_c, Y_c, Z_c]^T$ in three-dimensional space, its pixel coordinates $[u, v]^T$ and the camera internal parameters can be expressed as:

$$Z_c \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} X_c \\ Y_c \\ Z_c \end{pmatrix} \qquad (15.2)$$


The pixel coordinate system is obtained from the image coordinate system by axis scaling and origin translation. Suppose the pixel coordinate system is scaled by a factor $\alpha$ on the $x$ axis and $\beta$ on the $y$ axis, the origin is translated by $[c_x, c_y]^T$, and the focal length of the camera is $f$; then $f_x = \alpha f$ and $f_y = \beta f$. The middle matrix in Eq. (15.2) is the internal parameter matrix of the camera.

To correct the ideal projection model and make the internal parameter results more accurate, the nonlinear distortion of the camera lens is introduced. Suppose the normalized image coordinates of a point $P$ in three-dimensional space are $[x, y]^T$, and $[x_{\text{distorted}}, y_{\text{distorted}}]^T$ are the normalized coordinates after distortion. The distortion is generally assumed to be polynomial in $r$, the distance from the point $p$ to the origin of the coordinates:

$$x_{\text{distorted}} = x\left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) \qquad (15.3)$$

$$y_{\text{distorted}} = y\left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) \qquad (15.4)$$

Two additional parameters are introduced to correct tangential distortion:

$$x_{\text{distorted}} = x + 2 p_1 x y + p_2\left(r^2 + 2x^2\right) \qquad (15.5)$$

$$y_{\text{distorted}} = y + p_1\left(r^2 + 2y^2\right) + 2 p_2 x y \qquad (15.6)$$
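A short sketch of applying Eqs. (15.3)–(15.6) to normalized coordinates, combining the radial and tangential terms as in the common OpenCV distortion model, follows; it is illustrative, not the authors' code:

```python
# Hypothetical sketch: apply radial (k1, k2, k3) and tangential (p1, p2)
# distortion to normalized image coordinates (x, y), per Eqs. (15.3)-(15.6).
def distort(x, y, k1, k2, k3, p1, p2):
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x_d, y_d
```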

According to Eqs. (15.3)–(15.6), the distortion of the camera lens can be expressed by five parameters ($k_1, k_2, k_3, p_1, p_2$), all of which can be obtained by calibration.

The basic principle of camera and lidar joint calibration is modelled as follows (see Fig. 15.3). Pixel coordinates collected by the camera are denoted $[u, v]^T$, and point cloud coordinates collected by the lidar are denoted $[X, Y, Z]^T$. The coordinate system transformation model shown in Fig. 15.3 can be expressed as:

$$Z_c \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} R & t \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \qquad (15.7)$$

Here $f_x, f_y, c_x, c_y$ are the parameters of the camera's internal parameter matrix, acquired by camera internal parameter calibration, while $R$ and $t$ respectively denote the rotation and translation of the relative pose between the camera and the lidar. The purpose of camera and lidar joint calibration is to solve for the pose transformation parameters $R, t$.


Fig. 15.3 Camera and lidar joint calibration model

In the process of camera and lidar joint calibration, the calibration-plate plane must be fitted with minimal measurement error, so it is essential to remove noise points from the lidar point cloud as far as possible. To filter out uninteresting areas in the lidar point cloud data, rqt_reconfigure is used in this paper to dynamically adjust the limits of each coordinate in the lidar coordinate system and segment the ROI of the point cloud data, which reduces the possibility of erroneous detection and facilitates fitting the calibration-plate plane. Even without an empty calibration field, camera and lidar joint calibration can thus be completed accurately. The area marked with white circles in the figure is where the calibration plate to be segmented is located (see Fig. 15.4).

Fig. 15.4 Point cloud region before segmentation


Fig. 15.5 ROI of point cloud data from different perspectives after segmentation

The ROI of the point cloud data from different perspectives after segmentation is shown in Fig. 15.5. After the area containing the calibration plate is segmented, plane fitting of the calibration-plate point cloud is carried out using RANSAC (the random sample consensus algorithm), which generates candidate solutions from the minimum number of observations (data points) required to estimate the basic model parameters and fits the calibration-plate point cloud [17]. On the premise of ensuring the confidence level, the algorithm requires that the probability $p$ of at least one successful sampling (generally set to 0.99) satisfies the following relationship with the number of samples, where $m$ is the minimum size of the basic subset:

$$p = 1 - \left(1 - u^m\right)^N \qquad (15.8)$$

$$N = \frac{\log(1 - p)}{\log(1 - u^m)} \qquad (15.9)$$

In the above formulas, $u^m$ is the probability that all $m$ points in a sample are inliers, and $N$ is the number of samples required. In practice, the segmented point cloud does not lie on an exact plane; it can be projected onto an exact plane using the ProjectInliers function. The normal vector of the calibration-plate plane is then obtained from the fitted and projected point cloud, and the edge contours of the calibration plate are fitted to straight lines by applying the random sample consensus algorithm again. After the four edges of the calibration-plate point cloud are determined, its four corners are obtained with the lineWithLineIntersection function, and the coordinates of the centre point are then calculated.
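As an illustration of this plane-fitting step, the following sketch uses Open3D's RANSAC plane segmentation (an assumption: the paper works in a PCL-based pipeline, where pcl::SACSegmentation plays the same role); the file name and parameter values are illustrative:

```python
# Hypothetical sketch: RANSAC plane fitting of the segmented calibration-plate
# cloud with Open3D; the PCL pipeline used in the paper is analogous.
import open3d as o3d

pcd = o3d.io.read_point_cloud("board_roi.pcd")      # segmented ROI (assumed file)
plane_model, inlier_idx = pcd.segment_plane(
    distance_threshold=0.01,   # inlier distance to the plane, in metres
    ransac_n=3,                # minimum basic subset m for a plane
    num_iterations=1000)       # number of samples N, cf. Eqs. (15.8)-(15.9)
a, b, c, d = plane_model       # plane ax + by + cz + d = 0; (a, b, c) is the normal
board = pcd.select_by_index(inlier_idx)
```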


Fig. 15.6 Calibration plate images and corresponding feature diagrams of point clouds at different positions

For the extraction of camera features, the RGB images are first converted to grayscale, and the findChessboardCorners function is used to extract the sub-pixel interior corner data of the calibration plate. Given the size information of the calibration plate and the internal parameters of the camera, the coordinates of the centre point and the edge corners of the calibration plate in the camera coordinate system and the pixel coordinate system can be obtained. After the orientation of the calibration plate is determined, its plane normal vector in the camera coordinate system is obtained. The feature extraction and fitting effects are shown in Fig. 15.6, where the red, yellow, green and blue points mark the four corners of the calibration plate, the white point marks its centre, and the arrow points along its normal vector.

The centre point coordinates, plane normal vectors and four corner coordinates of the calibration plate in the lidar coordinate system and the camera coordinate system can all be obtained by the above methods. After multiple sets of data are collected at different positions, the transformation matrices between the two coordinate systems are obtained with the perspective-n-point algorithm, and an objective function is constructed to optimize the external parameters of the sensors. The genetic algorithm has good global search ability, is not prone to becoming trapped in local optima, and converges quickly [18]. In this paper, the genetic algorithm from the literature [19, 20] is adopted to optimize the objective function, divided into an initial transformation estimation and a final transformation estimation; the optimized rotation and translation matrices are obtained after the two optimizations. In the results, the rotation matrix $R$ is expressed as Euler angles (roll, pitch and yaw), and the translation matrix $t$ as translations along the $x$, $y$ and $z$ directions.
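The initial transformation between the two coordinate systems can be estimated, for example, with OpenCV's PnP solver (a sketch; the genetic refinement of [19, 20] is not shown, and the function and argument names are standard OpenCV, not the authors' code):

```python
# Hypothetical sketch: estimate the relative pose from 3D-2D correspondences
# of the calibration-plate corners with a perspective-n-point solver.
import cv2
import numpy as np

def initial_extrinsics(object_pts, image_pts, K, dist):
    # object_pts: Nx3 corner coordinates in the lidar frame
    # image_pts:  Nx2 corresponding pixel coordinates in the camera image
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(object_pts, dtype=np.float64),
        np.asarray(image_pts, dtype=np.float64),
        K, dist)
    R, _ = cv2.Rodrigues(rvec)      # rotation vector -> rotation matrix
    return R, tvec
```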

15.2.4 Camera and Lidar Data Fusion

Time synchronization. In a real scene, the camera and the lidar operate independently: different sensors collect data at different frequencies, so the collected point cloud and image data are not synchronized in time.


Fig. 15.7 Time synchronization diagram

Therefore, time synchronization between the camera and the lidar is required. The frame rate of the RS-LiDAR-32 lidar used in this paper is 10 Hz and that of the AVT Manta G-201C industrial camera is 30 Hz; since the acquisition frame rates are integer multiples of one another, the direct registration method is adopted for time synchronization. The lidar point cloud data with the lower frame rate are taken as the benchmark, and the corresponding data are found among the camera image timestamps, as shown in Fig. 15.7.

Spatial synchronization. After time synchronization, the lidar and camera data must be converted into the same coordinate system to fuse the point cloud and image data. Using the external and internal parameters, the poses of the two sensors are converted into one coordinate system, and the point cloud data are projected onto the image with the pinhole imaging model to realize camera and lidar data fusion. The fused model includes both the RGB information of the image and the three-dimensional coordinates of the point cloud (x, y, z, RGB).
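The spatial-synchronization step can be sketched as follows (an illustrative reconstruction: project each point with the calibrated parameters and attach the colour of the pixel it lands on):

```python
# Hypothetical sketch of spatial synchronization: project each lidar point
# into the image with the calibrated intrinsics K and extrinsics (R, t),
# then attach the RGB value of the pixel it hits -> (x, y, z, RGB).
import numpy as np

def fuse(points, image, K, R, t):
    cam = (R @ points.T + t.reshape(3, 1)).T        # lidar -> camera frame
    front = cam[:, 2] > 0                           # keep points in front of the camera
    uv = (K @ cam[front].T).T                       # pinhole projection, Eq. (15.7)
    uv = (uv[:, :2] / uv[:, 2:3]).astype(int)
    h, w = image.shape[:2]
    ok = (0 <= uv[:, 0]) & (uv[:, 0] < w) & (0 <= uv[:, 1]) & (uv[:, 1] < h)
    xyz = points[front][ok]
    rgb = image[uv[ok, 1], uv[ok, 0]]               # sample the pixel colours
    return np.hstack([xyz, rgb])
```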

image data collected are not synchronized in time. Therefore, time synchronization between the camera and the lidar is required. The frame rate of the lidar RS-LiDAR32 used in this paper is 10 Hz and that of the AVT Manta G-201C industrial camera is 30 Hz. The frame rate of data acquisition is an integer multiple, so the direct registration method is adopted for time synchronization. The lidar point cloud data with the low frame rate are taken as the benchmark to find the corresponding data in the camera image timestamp, as shown in the figure (see Fig. 15.7). Spatial synchronization. After time synchronization, the lidar and camera need to be converted to the same coordinate system for fusion point cloud data and image data. Combined with the external parameters and internal parameters, the positions of the two sensors are converted to the same coordinate system, and the point cloud data are projected onto the image by using the keyhole imaging model to realize the data fusion of the camera and lidar. The fused model includes both RGB information of the image and three-dimensional space coordinate information of the point cloud (x, y, z, RGB).

15.2.5 Conversion of Citrus Pixel Coordinates to Three-Dimensional Space Coordinates Suppose point P(X c , Yc , Z c ) is the three-dimensional space coordinate of the citrus center relative to the camera, and its corresponding coordinate in the pixel coordinate system is (u, v). After fusion of the image and point cloud, the interior of the bounding box is endowed with depth value information by lidar. The depth value Z c of point P can be obtained by lidar, and the corresponding pixel coordinates of point P (u, v) are the bounding box center points (tx , t y ) of citrus output by YOLO V3 convolutional neural network detection. Based on the above information, X c , Yc can be expressed as follows: ZC (tx − cx ) fx  YC = Zf Cy t y − c y

XC =

(15.10)
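As a sketch, Eq. (15.10) back-projects the detected box centre once the depth at that pixel has been read from the fused point cloud:

```python
# Hypothetical sketch of Eq. (15.10): back-project the YOLO V3 box centre
# (tx, ty) using the depth Zc read from the fused point cloud at that pixel.
def back_project(tx, ty, Zc, fx, fy, cx, cy):
    Xc = Zc / fx * (tx - cx)
    Yc = Zc / fy * (ty - cy)
    return Xc, Yc, Zc
```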


Fig. 15.8 Detection effect of citrus by the YOLO V3 convolutional neural network

15.3 Experiments

15.3.1 System Environment

All experiments were performed in the following environment: Intel i7-8700 CPU, Nvidia RTX 2080 Ti GPU, Ubuntu 16.04, 64 GB of memory, 64-bit system.

15.3.2 Detection Effects on Citrus

A total of 2000 citrus images in the natural environment were collected to train the YOLO V3 convolutional neural network. The detection effect after training is shown in Fig. 15.8.

15.3.3 The Results of Camera Internal Parameter Calibration

The results of camera internal parameter calibration are shown in Table 15.1. The reprojection error is introduced as the error measure, i.e. the positional difference between the pixel coordinates and the projection of the spatial coordinates under the estimated transformation; the smaller the reprojection error, the higher the calibration accuracy. According to Table 15.1, the minimum reprojection error is 0.14 pixels, indicating a good calibration. The internal parameters of the sample 5 camera were used in the subsequent experiments; the reprojection error of each image in sample 5 is shown in Fig. 15.9.

15.3.4 The Results of Camera and Lidar Joint Calibration

The installation of the camera and lidar in this experiment is shown in Fig. 15.10.


Table 15.1 The results of camera internal parameter calibration

Parameters\samples | Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5
f_x (unit: pixel) | 2758.5 | 2758.7 | 2758.1 | 2761.1 | 2757.9
f_y (unit: pixel) | 2759.0 | 2759.3 | 2758.6 | 2761.9 | 2758.1
c_x (unit: pixel) | 821.7 | 821.9 | 821.4 | 821.5 | 821.5
c_y (unit: pixel) | 619.3 | 619.5 | 619.2 | 618.9 | 619.3
k_1 | −0.067486 | −0.070675 | −0.065030 | −0.062040 | −0.068607
k_2 | 0.192665 | 0.187366 | 0.159822 | 0.138639 | 0.160230
k_3 | 0 | 0 | 0 | 0 | 0
p_1 | 0.000460 | −0.000270 | 0.000076 | −0.000534 | −0.000262
p_2 | 0.000782 | 0.001073 | 0.000217 | 0.000144 | 0.000111
Reprojection error (unit: pixel) | 0.16 | 0.22 | 0.19 | 0.24 | 0.14

Fig. 15.9 Reprojection error diagram of each sample

The results of the five groups of camera and lidar joint calibration are shown in Table 15.2. After the external parameters were solved, the point cloud was projected onto the image, and the degree of coincidence between the object contours and the corresponding point cloud information was observed to judge the quality of the joint calibration results. The projection shows that the point cloud of the first group coincides well with the corresponding images; the fused visualization is shown in Fig. 15.11, where point clouds of different colours represent different depth information. As can be seen from Fig. 15.11, the lidar point cloud essentially coincides with the contours of the objects in the image.


Fig. 15.10 The installation of camera and lidar in this experiment

Table 15.2 The results of camera and lidar joint calibration

Parameters\samples | The first group | The second group | The third group | The fourth group | The fifth group
roll (unit: degree) | −1.5003 | −1.59081 | −1.58799 | −1.57690 | −1.58447
pitch (unit: degree) | 0.0115488 | 0.022189 | 0.0205253 | −0.0128841 | −0.0104183
yaw (unit: degree) | −1.39631 | −1.41331 | −1.40991 | −1.40381 | −1.4098
x (unit: m) | 0.0486672 | 0.0381683 | 0.0483855 | 0.0594681 | 0.0584105
y (unit: m) | 0.022725 | 0.0492618 | 0.0403363 | 0.0300035 | 0.045841
z (unit: m) | −0.117128 | −0.0839367 | −0.0928496 | −0.118581 | −0.0983206

Fig. 15.11 Visual rendering of fusion of image and point cloud

The projection effect shows that the external parameter solution is good, verifying the feasibility of the fusion algorithm.


Fig. 15.12 Visual rendering of citrus fusion detection and positioning

15.3.5 The Results of Citrus Positioning

After the point cloud is fused with the image information, the depth information inside the citrus detection box is obtained; the visual effect is shown in Fig. 15.12. In Eq. (15.2), $f_x, f_y, c_x, c_y$ were solved by the camera internal parameter calibration of Sect. 15.3.3, $t_x, t_y$ are the YOLO V3 convolutional neural network output parameters, and $Z_c$ is measured by the lidar. Once $X_C, Y_C$ are calculated, the three-dimensional space coordinates of the citrus centre are obtained, realizing citrus positioning. Substituting the internal parameter calibration results of Table 15.1 into Eq. (15.10) gives:

$$X_C = \frac{Z_C}{2757.9}\left(t_x - 821.5\right), \qquad Y_C = \frac{Z_C}{2758.1}\left(t_y - 619.3\right) \qquad (15.11)$$

The three-dimensional space coordinates of the five position states in Fig. 15.12 were calculated and are given in Table 15.3. After 50 groups of data were collected and measured, the position error distribution shown in Fig. 15.13 was obtained. The graph clearly shows that the error between the calculated value and the true value does not exceed 0.01 m; the mean error is 0.00582 m in the $X_C$ direction and 0.0058 m in the $Y_C$ direction. This basically meets the positioning accuracy requirements of citrus picking. The error mainly comes from the camera internal parameter calibration error and from subjective factors in the manual measurement.


Table 15.3 Solution results of citrus coordinates

Coordinates\citrus samples | Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5
t_x (unit: pixel) | 435 | 657 | 640 | 1214 | 1457
t_y (unit: pixel) | 186 | 183 | 600 | 580 | 561
Z_C (unit: m) | 0.970 | 0.970 | 0.970 | 0.831 | 1.176
X_C (unit: m) | −0.136 | −0.058 | −0.064 | 0.118 | 0.271
Y_C (unit: m) | −0.152 | −0.153 | −0.007 | −0.012 | −0.025
X_C measured value (unit: m) | −0.131 | −0.052 | −0.059 | 0.114 | 0.264
Y_C measured value (unit: m) | −0.145 | −0.145 | 0 | −0.006 | −0.018
X_C error value (unit: m) | 0.005 | 0.006 | 0.005 | 0.004 | 0.007
Y_C error value (unit: m) | 0.007 | 0.008 | 0.007 | 0.006 | 0.007

Fig. 15.13 Error distribution diagram

15.4 Conclusion

In this paper, a YOLO V3 convolutional neural network was used for citrus target detection, yielding the detection results and the pixel coordinates of the citrus. On the basis of the fusion of camera and lidar data, the point cloud projected onto the image endows it with depth information, enabling the three-dimensional space position of the citrus target to be solved. Compared with a depth map, the depth information of point cloud data is more accurate, requires less processing and is less affected by light; it has clear advantages in positioning time and accuracy and is more in line with the technical requirements of citrus picking robots. The method can also be applied to the identification and positioning systems of other kinds of picking robots and has good generalization ability. However, the lidar used in this method is expensive, and citrus in complex scenes remains difficult to locate. Future research should focus on solving the occlusion and overlapping positioning problems of citrus.


Acknowledgements This work is supported by the following funds: 1. Youth Spark Project of the Dean Fund of Hefei Institutes of Physical Science, CAS: Research on Heterogeneous multi-sensor synchronization and deep data fusion. (Project No.: YZJJ2020QN20). 2. The Open Fund of Artificial Intelligence + Intelligent Agriculture Science Group of Chongqing Three Gorges University. (Project No.: ZNNYKFA201902).

References 1. Zhu, M.X.: Research on fruit detection and positioning of picking robot based on image processing and convolutional neural network. J. Agric. Mech. Res. 44(04), 49–53 (2022). (In Chinese) 2. Wei, B., He, J.Y., Shi, Y., Jiang, G.L., Zhang, X.Y., Ma, Y.: Design and experiment of underactuated citrus end-effector. Trans. Chinese Soc. Agric. Mach. 52(10), 120–128 (2021). (In Chinese) 3. Wang, J., Wang, R.R., Li, X.H.: Research on target recognition method of tomato picking robot. Jiangsu Agric. Sci. 49(20), 217–222 (2021). (In Chinese) 4. Yin, J., Mao, H., Zhong, S.: Segmentation methods of fruit image based on color difference. J. Commun. Comput. 6(7), 40–45 (2009) 5. Cubero, S., Aleixos, N., Albert, F., et al.: Optimized computer vision system for automatic pregrading of citrus fruit in the field using a mobile platform. Precision Agric. 15(1), 80–94 (2014) 6. Yang, D., Li, H., Zhang, L.: Study on the fruit recognition system based on machine vision. Adv. J. Food Sci. Technol. 10(1), 18–21 (2016) 7. Bargoti, S., Underwood, J.P.: Image segmentation for fruit detection and yield estimation in apple orchards. J. Field Robot. 34(6), 1039–1060 (2017) 8. Zhu, M.X.: Research on fruit detection and positioning of picking robot: based on image processing and convolutional neural network. J. Agric. Mech. Res. 44(04), 49–53 (2017). (In Chinese) 9. Tang, Y., Chen, M., Wang, C., et al.: Recognition and positioning methods for vision-based fruit picking robots: a review. Front. Plant Sci. 11, 510 (2020) 10. Li, J., Tang, Y., Zou, X., et al.: Detection of fruit-bearing branches and positioning of litchi clusters for vision-based harvesting robots. IEEE Access 8, 117746–117758 (2020) 11. Hu, Y.C.: Research on Target Recognition and Positioning Method of Citrus Picking Robot in the Natural Environment. Chongqing University of Technology, Chongqing, China (2018).(In Chinese) 12. Liu, Y.P.: Research on Obstacle Identification and Positioning method of Citrus Picking Robot. Chongqing University of Technology, Chongqing, China (2019).(In Chinese) 13. Xiong, L.Y.: Study on Classification Recognition and Positioning Method of Ripe Citrus Fruit Under Natural scene. Chongqing University of Technology, Chongqing, China (2020).(In Chinese) 14. Zhao, D.Y.: Identification and Positioning of Barbarum Based on Binocular Vision. Anhui Agricultural University, Anhui, China (2020).(In Chinese) 15. Kondo, N., Yamamoto, K., Shimizu, H., Yata, K., Kurita, M., Shiigi, T., et al.: A machine vision system for tomato cluster harvesting robot. Eng. Agric. Environ. Food 2(2), 60–65 (2009) 16. Guo, H., Li, M.Q.: Design of automatic ranging and positioning experiment Platform based on UWB. Educ Modern. 6(41), 87-89+101 (2019). (In Chinese) 17. Zou, P., Li, B.Z., Gong, J.X., Yang, J.G.: Improved RANSAC point cloud segmentation algorithm and its application. Mach. Des. Manuf. 11, 121–124 (2020). (In Chinese)



Chapter 16

Comparative Analysis of Automatic Poetry Generation Systems Based on Different Recurrent Neural Networks Lichao Wang

Abstract Work in natural language processing covers a wide range of topics, including machine translation, sentiment analysis, and automatic text generation. Among them, automatic text generation is very important for specific applications such as automatic summarization and automatic question answering. Moreover, unlike general automatic text generation, poetry imposes particular requirements on semantic expression and syllable coordination; completing this task can further improve the semantic expression ability of automatic text generation systems and lend the generated text an aesthetic quality. This article explores how various recurrent neural network structures, such as the basic recurrent network (RNN), long short-term memory (LSTM), and gated recurrent units (GRU), handle this task. Within the model, a word embedding layer, a recurrent neural network layer, and a fully connected layer are used. Comparing the loss decline curves and the generated verses under the same data set, loss function, and parameter update method leads to the following conclusion: during training, the models with LSTM and GRU as the core module effectively reduce the loss value and generate verses whose rhyme satisfies the requirements of poetry and whose semantics are basically reasonable; for the model with RNN as the core module, the loss value fluctuates, and the generated verses only elementarily meet the standards of rhyme and semantics. Keywords Automatic poetry generation · RNN · LSTM · GRU

16.1 Introduction Automatic text generation is one of the most important research directions in the field of natural language processing, with a wide range of applications in production and daily life, such as automatic dialogue, automatic question answering, and automatic summarization. Automatic poetry is an important part


of automatic text generation, and its main purpose is to use computer systems to automatically generate verses of a quality consistent with human writing [1]. Automatic poetry generation emphasizes the rhyme of sentences on top of general automatic text generation [2]. Studying this specific aspect can broaden the application field of automatic text generation and make the articles generated by current automatic text generation systems more aesthetic [3]. Therefore, many domestic and foreign researchers have focused on this area, and some of their results have been widely used in production systems. These studies fall mainly into two categories: one is the generation of poetry using deep learning models, like the model designed by Yang et al. in 2019 [4]; the other is the generation of unsupervised style poetry based on mutual information, like the model designed by Shakeeb et al. in 2021 [5]. At present, most papers focus on the construction of deep learning-based models. These models are basically derived from the encoder-decoder framework [6]. The encoder and decoder are composed of a large number of recurrent neural network layers (cells), some of which introduce the attention mechanism [7]. A small number of combined models are based on multimodal analysis that combines sentences and pictures [8]. Few researchers focus on the basic elements, the neural network modules, that make up the recurrent neural network. However, a suitable basic module can greatly reduce the amount of computation and improve the quality of the verses the model outputs. Therefore, the theme of this article is exploring the performance of three basic recurrent neural network modules (RNN, LSTM, GRU) in automatic verse generation. The study finds that the LSTM and GRU modules can effectively reduce the loss value, whereas the loss value of the RNN increases in the early epochs and fluctuates afterwards. Models based on the LSTM and GRU modules can generate verses whose rhyme meets the requirements of poetry and whose semantics are basically reasonable; the verses generated by the RNN module basically meet the requirements for rhyme, but are slightly worse than LSTM and GRU, and are basically reasonable in semantics. The remainder of this article is organized as follows: (2) Problem Formation; (3) The Invariant Testbed; (4) The Internal Logic of RNN Modules; (5) Results; (6) Analysis and Expectation.

16.2 Problem Formation This section explains the objectives and implementation of the comparative experiment. The goal is to compare the performance of the three recurrent neural network modules, RNN, GRU, and LSTM, as the core module of an automatic poetry generator in terms of loss reduction, the semantics of the generated verse, and the quality of rhyme. To accomplish these goals, the experiment contains three sub-experiments, each of which includes the selection and processing of data sets, the construction and training


of a model based on a single recurrent neural network cell, and the use of the model to generate verses. To ensure that the results of the comparison are objective, the only difference among the three sub-experiments is the choice of recurrent neural network cell in the model.

16.3 The Invariant Testbed The three main parts of the experiment were introduced above; this section explains the invariants of the experiment. First, the data set is a corpus of 9000 five-character quatrain verses. After the data is read in, each word is encoded as a specific decimal number, and the mapping is the same in the three sub-experiments. Second, the model is a combination of an embedding layer, a recurrent neural network module, and a fully connected layer (output layer), where the weight matrix of the embedding layer is initialized to all ones, the weight matrix of the fully connected layer is initialized to all ones, and the bias vector is initialized to all zeros. The model structure is shown in Fig. 16.1. During training, the cross-entropy loss function is used, parameters are optimized with the Adam method [9], and the number of epochs is 10 for all three sub-experiments. When generating verses, a preheating (priming) method is used, and all three models are primed with the same half-line verse. With these variables fixed as invariants, the experimental results objectively reflect the influence of the three recurrent neural network modules on the model.
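To make the testbed concrete, the sketch below shows one way the shared architecture and training loop could be written in PyTorch. It is a minimal illustration, not the authors' code: the vocabulary size, the embedding and hidden dimensions, and the `loader` of integer-encoded verse tensors are assumed placeholders, while the all-ones/all-zeros initialization, cross-entropy loss, Adam optimizer, and 10 epochs follow the description above.

```python
import torch
import torch.nn as nn

class PoetryModel(nn.Module):
    """Embedding -> recurrent module -> fully connected output layer.
    The recurrent cell (nn.RNN, nn.LSTM, or nn.GRU) is the only thing
    that changes across the three sub-experiments."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, cell=nn.RNN):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = cell(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)
        nn.init.ones_(self.embedding.weight)  # paper: embedding weights all ones
        nn.init.ones_(self.fc.weight)         # paper: output-layer weights all ones
        nn.init.zeros_(self.fc.bias)          # paper: bias vector all zeros

    def forward(self, x, state=None):
        out, state = self.rnn(self.embedding(x), state)
        return self.fc(out), state

vocab_size = 6000  # assumed size of the encoded quatrain vocabulary
model = PoetryModel(vocab_size, 128, 256, cell=nn.GRU)  # swap in nn.RNN / nn.LSTM
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())

for epoch in range(10):                  # 10 epochs in all three sub-experiments
    for x, y in loader:                  # x, y: (batch, seq) next-word pairs
        logits, _ = model(x)
        loss = criterion(logits.reshape(-1, vocab_size), y.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```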

16.4 The Internal Logic of RNN Modules This section illustrates the internal logic of the three recurrent neural network modules. The internal calculation logic of the RNN consists of three steps. The first step determines the influence of the hidden state of the previous time step ($H_{t-1}$) on the hidden state of the current time step ($H_t$). The second step determines the influence of the input of the current time step ($X_t$) on the hidden state of the current time step. Each influence is weighed through a weight matrix $W$ and a bias vector $b$: for the input of the current step, $W_2$ is the weight matrix and $b_2$ the bias vector; for the hidden state of the previous step, $W_1$ is the weight matrix and $b_1$ the bias vector. The third step adds the two matrices derived from the first two steps and activates the sum through the tanh function to generate the hidden state $H_t$ at the current time step. Its formula is:

$$H_t = \tanh(X_t W_2 + b_2 + H_{t-1} W_1 + b_1)$$

(16.1)

As for the LSTM, its first feature is that it implements three logic gates on the basis of the RNN to control three behaviors: input I, output O, and forget F [10].


Fig. 16.1 General model structure for all three sub-experiments. $h_{t-1}$ and $h_t$ are the hidden state vectors of the recurrent neural network at two adjacent time steps

The three internal logic gates follow the same operational logic as the RNN, and the formulas are:

$$I_t = \sigma(X_t W_{xi} + H_{t-1} W_{hi} + b_i)$$

(16.2)

$$F_t = \sigma(X_t W_{xf} + H_{t-1} W_{hf} + b_f)$$

(16.3)

$$O_t = \sigma(X_t W_{xo} + H_{t-1} W_{ho} + b_o)$$

(16.4)

$W_{xi}$, $W_{xf}$, $W_{xo}$ and $W_{hi}$, $W_{hf}$, $W_{ho}$ are weight parameters; $b_i$, $b_f$, $b_o$ are bias parameters for the input, forget, and output gates respectively; $\sigma$ denotes the sigmoid function.


Another feature is that the LSTM module introduces the cell state $C$: the $C_{t-1}$ of the previous time step is used directly as an input of the current time step, reflecting the dependence of each LSTM time step on its predecessor. The $C_t$ of the current time step is determined by $C_{t-1}$ and the newly generated candidate cell state $\tilde{C}_t$; their weighting comes from the outputs of the forget gate and the input gate. The calculation formulas of $C_t$ and $\tilde{C}_t$ are:

$$C_t = F_t \odot C_{t-1} + I_t \odot \tilde{C}_t$$

(16.5)

$$\tilde{C}_t = \tanh(X_t W_{xc} + H_{t-1} W_{hc} + b_c)$$

(16.6)

$W_{xc}$, $W_{hc}$ are weight parameters for time steps $t$ and $t-1$ respectively, and $b_c$ is a bias parameter. The hidden state $H_t$ is calculated as

$$H_t = O_t \odot \tanh(C_t)$$

where $C_t$ is the cell state and $O_t$ the output gate vector at time step $t$. The advantage of the LSTM module is that it retains long-term sequence information and reduces the problem of gradient vanishing. The internal operation logic of the GRU also relies on logic gates. Compared with the LSTM, the number of gates in the GRU module is reduced to two (a reset gate and an update gate) [11]. The calculations corresponding to the reset gate $R$ and the update gate $Z$ of the GRU are:

$$R_t = \sigma(X_t W_{xr} + H_{t-1} W_{hr} + b_r)$$

(16.7)

$$Z_t = \sigma(X_t W_{xz} + H_{t-1} W_{hz} + b_z)$$

(16.8)

$X_t$ is the input at time step $t$, $H_{t-1}$ is the hidden state of the previous time step, $W_{xr}$, $W_{xz}$ and $W_{hr}$, $W_{hz}$ are weight parameters, and $b_r$, $b_z$ are bias parameters. Unlike the LSTM module, the GRU module dispenses with the cell state $C$, so the output of the reset gate is used directly when the candidate hidden state $\tilde{H}_t$ is generated. The formula is:

$$\tilde{H}_t = \tanh(X_t W_{xh} + (R_t \odot H_{t-1}) W_{hh} + b_h)$$

(16.9)

$\tilde{H}_t$ is the candidate hidden state at time step $t$, $W_{xh}$ and $W_{hh}$ are weight parameters, and $b_h$ is a bias term. The hidden state of the current time step weighs the hidden state of the previous time step against the newly generated candidate hidden state:


$$H_t = Z_t \odot H_{t-1} + (1 - Z_t) \odot \tilde{H}_t$$

(16.10)

$Z_t$ is the update gate at time step $t$. The advantage of the GRU module is that it simplifies the computation while providing essentially the same functionality as the LSTM.
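As a minimal sketch (not the paper's code), Eqs. (16.7)-(16.10) can be written out directly as a single GRU time step; the parameter names mirror the symbols above, and the weight shapes are assumed.

```python
import torch

def gru_step(x_t, h_prev, Wxr, Whr, br, Wxz, Whz, bz, Wxh, Whh, bh):
    """One GRU time step implementing Eqs. (16.7)-(16.10)."""
    r_t = torch.sigmoid(x_t @ Wxr + h_prev @ Whr + br)         # reset gate, Eq. (16.7)
    z_t = torch.sigmoid(x_t @ Wxz + h_prev @ Whz + bz)         # update gate, Eq. (16.8)
    h_hat = torch.tanh(x_t @ Wxh + (r_t * h_prev) @ Whh + bh)  # candidate, Eq. (16.9)
    return z_t * h_prev + (1 - z_t) * h_hat                    # new hidden, Eq. (16.10)
```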

16.5 Results In this experiment, the loss decline curve, the semantic rationality as judged by humans, and the standardization of rhyme are selected as the evaluation criteria for the three models. The verses used in the preheating test are also five-character quatrains, and the three models use the same verse to ensure the objectivity of the comparison. The verse is shown as follows: The appearance pales every day, missing someone in solitude (Fig. 16.2). Figure 16.3 shows the declining loss curve while training the model with GRU as the core module, Fig. 16.4 shows the curve for the model with RNN as the core module, and Fig. 16.5 shows the curve for the model with LSTM as the core module. The loss behavior during training can be concluded

Fig. 16.2 The preheating verses

Fig. 16.3 Declining curve of loss value (GRU)


Fig. 16.4 Declining curve of loss value (RNN)

Fig. 16.5 Declining curve of loss value (LSTM)

from the three images: the models with GRU and LSTM as the core module effectively reduce the loss value, while the model with RNN as the core module causes the loss value to rise in the first epoch and fluctuate afterward. In Figs. 16.3, 16.4 and 16.5, the X-axis denotes the number of epochs and the Y-axis the loss value. Figure 16.6 shows the verses generated by the model with GRU as the core module, Fig. 16.7 shows the verses generated by the model with RNN as the core module, and Fig. 16.8 shows the verses generated by the model with LSTM as the


In a prosperous dynasty, one writes in depression, still watching the flowers, in the blossom period in which the emperor leads the army as usual. Fig. 16.6 Verses generated by GRU

The palace is frozen and the flowers are covered with frost, spring finally comes, the wagon crashed along with the beads, only remains the Luolan flower twigs and stones. Fig. 16.7 Verses generated by RNN

The fairy flies in the air, with relaxing mood, sometimes dive into the clouds, and do this as an everyday routine. Fig. 16.8 Verses generated by LSTM

core module. The rhyming and semantic performance of the generated verses can be concluded from the three images: the models with LSTM and GRU modules generate verses whose rhymes meet the requirements of poetry and whose semantics are basically reasonable; the verses generated by the RNN module basically meet the requirements on rhyme, but are slightly worse than those of LSTM and GRU, while remaining basically reasonable in terms of semantics.

16.6 Analysis and Expectation In the task of automatically generating verses using a single recurrent neural network cell, the model with GRU performs best in reducing the loss value and in the quality of the generated verses. We can therefore infer that the GRU could serve as the basic module in both deep neural network models and encoder-decoder models. Although the experiment indicates the best candidate basic module, it does not tune hyperparameters to achieve each module's best possible performance. In addition, the modules could be made bidirectional, and whether the optimal module would change in the bidirectional case remains undetermined.


References 1. Liu, Y., Liu, D., Lv, J.: Deep poetry: a Chinese classical poetry generation system. In: Proceedings of the AAAI Conference on Artificial Intelligence, 34(09), pp. 13626–13627 (2020) 2. Šeļa, A., Plecháč, P., Lassche, A.: Semantics of European poetry is shaped by conservative forces: the relationship between poetic meter and meaning in accentual-syllabic verse. arXiv:2109.07148 [cs.CL] (2021) 3. Shen, L., Guo, X., Chen, M.: Compose like humans: jointly improving the coherence and novelty for modern Chinese poetry generation. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8, IEEE (2020) 4. Yang, Z., Cai, P., Feng, Y., Li, F., Feng, W., Chiu, E.Y., Yu, H.: Generating classical Chinese poems from vernacular Chinese. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 6155–6164, NIH Public Access (2019) 5. Mukhtar, S.A., Joglekar, P.S.: Urdu and Hindi poetry generation using neural networks. arXiv:2107.14587 [cs.CL] (2021) 6. Hu, J., Sun, M.: Generating major types of Chinese classical poetry in a uniformed framework. arXiv:2003.11528 [cs.CL] (2020) 7. Zhang, X., Sun, M., Liu, J., Li, X.: Lingxi: a diversity-aware Chinese modern poetry generation system. arXiv:2108.12108 [cs.CL] (2021) 8. Liu, Y., Liu, D., Lv, J., Sang, Y.: Generating Chinese poetry from images via concrete and abstract information. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8, IEEE (2020) 9. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv:1412.6980 [cs.LG] (2014) 10. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997) 11. Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555 [cs.NE] (2014)

Chapter 17

Grid False Data Intrusion Detection Method Based on Edge Computing and Federated Learning Yiying Zhang, Yiyang Liu, Nan Zhang, Delong Wang, Suxiang Zhang, and Yannian Wu Abstract With the development of science and technology, traditional power systems are transforming into smart grids, and many cyber-attacks have followed this transformation. A recent type of network attack is the false data injection attack (FDIA), which can exploit learned knowledge of the power system configuration to launch an attack. Injecting false data into the state variables drives them into an erroneous state, causing the data collection and monitoring system to make wrong decisions, which seriously threatens the safety of the power grid. This paper therefore proposes a new detection method that combines edge computing and federated learning. The system is split into multiple subsystems, and edge node detectors are set in the subsystems to collect and detect data. Drawing on deep learning, the method constructs a CNN-LSTM model detector to extract the characteristics of the data and, combined with federated learning, trains the model locally to realize efficient, low-delay FDIA detection. Keywords Edge computing · Federated learning · False data injection attacks · Intrusion detection · Deep learning

17.1 Introduction The false data injection attack (FDIA) is a recent network attack targeting smart grid state estimation. It can exploit vulnerabilities in the bad data detection of the management system to corrupt the results of state estimation in the power grid. This attack method is highly concealed [1]. Compared with traditional physical attacks, it can launch multiple attacks without being discovered. Therefore, this attack has


a significant impact on the health of the smart grid. In addition, the distribution network has a complex topology and low redundancy, so it carries a greater hidden danger of network attacks. The power grid is an important national infrastructure: once it is attacked, the result is economic loss and even greater hidden dangers to the country. FDIA defense and intrusion detection have therefore become a new challenge in ensuring the information security, resilience, and economic operation of the power system [2]. In the operation of power systems, traditional detection methods are no longer applicable to FDIA. The focus and significance of the research therefore lie in developing efficient and accurate detection methods based on FDIA's attack characteristics and methods. This article presents an intrusion detection method built on edge computing and federated learning. Edge node detectors are set at the edge of the power grid. Unlike traditional methods, this method uses deep learning to build a CNN-LSTM model detector that extracts data features, which are combined with the federated learning method to train the model locally.

17.2 Research Status At present, many researchers have studied FDIA detection technology and achieved good results. The literature [3] proposes an FDIA detection method based on the Kalman filter. The literature [4] proposes a detection method based on extreme gradient boosting combined with an unscented Kalman filter. In recent years, owing to the vigorous development of artificial intelligence, combining machine learning and deep learning in FDIA detection has become a trend [5]. The newly proposed methods can cope effectively with the ever-increasing amount of real-time grid data and improve considerably on traditional detection methods. The paper [6] proposes an FDIA detection method based on a clustering algorithm and a state prediction detection method: it uses the internal relationship between nodes and FDIA to identify vulnerable nodes, uses a clustering algorithm to cluster and classify the vulnerable nodes, and finally completes FDIA detection with the state prediction detection method. The paper [7] proposes a deep learning-based detection model for FDIA attacks on smart meters, judging whether an attack has occurred through pattern recognition on historical measurement data. The literature [8] proposes two machine learning-based detection techniques: one trains an SVM using supervised learning with labeled data; the other uses unsupervised learning without training data. The paper [9] designs an attack detection method based on a distributed event-trigger mechanism and machine learning technology.


17.3 Principles of False Data Injection Attacks As early as 2009, Liu et al. [10] presented a model of the false data injection attack. They showed that FDIA can exploit the shortcomings of the bad data detection module in the energy management system: the result of the state estimation can be modified at will without being flagged by the bad data detection module. This attack poses a serious threat to the safety of the power grid.

17.4 Design of Intrusion Detection Model Based on Edge Computing and Federated Learning 17.4.1 Edge Computing Since 2014, edge computing has attracted increasing attention from all walks of life. There is still no unified definition of edge computing, but its essence is that part of the functions of the cloud center are distributed to the edge of the network, closer to the user, so that those functions can be processed nearby. In the context of the smart grid, the rapid increase in user-generated data has raised the pressure on data centers to process massive amounts of data, and edge computing can share the task of computing over these data. It improves the efficiency of data processing, protects private data, and improves data security. Compared with cloud computing, the distributed processing of edge computing is better suited to IoT applications and promotes the development of smart grid security defenses.

17.4.2 Federated Learning Federated learning (FL) is a distributed training method that uses data sets scattered across multiple parties and fuses their information through privacy protection technology to collaboratively build a global model. During model training, each participant can exchange information about the model (for example, model parameters), whether as plain text or encrypted. However, the local training data is stored only locally and is never shared with other participants. This exchange does not expose local user data, which reduces the possibility of data leakage. Each data participant can share and deploy the trained federated learning model. The federated learning model is shown in Fig. 17.1.


Fig. 17.1 Federated learning model

17.4.3 Framework Based on Edge Computing and Federated Learning This article presents a detection method built on edge computing and federated learning. Data is collected at edge nodes and, combined with federated learning, used for local training. After local training, each node obtains locally updated model parameters and uploads them to the central server. After receiving the model parameters uploaded by all participating edge nodes, the central server aggregates them and generates the model parameters for the next round of learning at each participating edge node, repeating until the entire federated learning model converges. The trained federated learning model can be shared and used by all participating edge nodes. The basic functions of the system are as follows: the relevant data of the smart meter is collected and stored by the edge node; at the edge node, real-time data is processed, and the historical data stored there is trained on locally through federated learning. The trained parameter information is uploaded to the server, which aggregates it and sends the aggregated model back to the edge node. After the edge node receives the trained model, it updates the local model and starts detection. Compared with traditional detection methods, setting up edge nodes can improve processing efficiency, reduce bandwidth pressure in network transmission, and speed up data analysis and processing. At the same time, edge computing is combined with federated learning to conduct model training locally, and the data


Fig. 17.2 Detection framework based on edge computing and federated learning

owned by each participating edge node never leaves the node, which reduces the leakage of private data at the edge. This framework can improve the real-time detection capability of the power grid. The detection framework is shown in Fig. 17.2.
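The round structure described above can be sketched as simple federated averaging. This is an illustrative outline only, not the paper's implementation: it assumes PyTorch models whose state consists of floating-point tensors, uses unweighted averaging (real deployments usually weight by each node's sample count), and omits the parameter encryption mentioned later.

```python
import copy
import torch

def local_update(node_model, data_loader, epochs=1):
    """An edge node trains a copy of the global model on its local data only."""
    opt = torch.optim.Adam(node_model.parameters())
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in data_loader:
            opt.zero_grad()
            loss_fn(node_model(x), y).backward()
            opt.step()
    return node_model.state_dict()          # only parameters leave the node

def federated_round(global_model, edge_loaders):
    """One round: local training at every edge node, then server-side averaging."""
    updates = [local_update(copy.deepcopy(global_model), dl) for dl in edge_loaders]
    averaged = {k: torch.stack([u[k].float() for u in updates]).mean(dim=0)
                for k in updates[0]}
    global_model.load_state_dict(averaged)  # aggregated model sent back to nodes
    return global_model
```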

17.4.4 CNN-LSTM Joint Detection Model

17.4.4.1 Convolutional Neural Network

The convolutional neural network (CNN) is one of the core algorithms of deep learning. By inputting text, images, and other data, it automatically trains and optimizes the structure of the network model to realize functions such as classification and prediction. One of the simplest neural network structures consists of an input layer, a hidden layer, and an output layer, each composed of many neurons. The neurons in one layer are mapped to the neurons in the next layer through an activation function, each neuron has a corresponding weight, and the output


is the final classification. The convolutional neural network used in this article includes a data input layer, a convolutional computation layer, a ReLU excitation layer, a pooling layer, a fully connected layer, and an output layer. This paper uses the CNN algorithm to extract the features in the data collected by the edge nodes.

17.4.4.2 Long and Short-Term Memory Neural Network

The long short-term memory neural network (LSTM) was designed to solve the gradient vanishing and gradient explosion problems that arise in RNNs. Its neural unit is composed of a forget gate, an input gate, and an output gate. It learns well from time series data and is widely used in predictive models. At each sequence time $t$, the LSTM contains three gates: the forget gate, input gate, and output gate. In this paper, the feature vector $x_t$ extracted by the CNN, the state memory unit $c_{t-1}$, and the hidden state $h_{t-1}$ of the previous sequence are fed into the forget gate together, and the output of the forget gate is obtained through a sigmoid activation function:

$$f_t = \sigma(W_f h_{t-1} + U_f x_t + b_f)$$

(17.1)

In the formula, $W_f$, $U_f$ are weights and $b_f$ is a bias. The input gate is divided into two parts: the first uses the sigmoid activation function and outputs $i_t$, and the second uses the tanh function and outputs $a_t$. Together they determine the vector to be retained in the state memory unit:

$$i_t = \sigma(W_i h_{t-1} + U_i x_t + b_i)$$

(17.2)

$$a_t = \tanh(W_a h_{t-1} + U_a x_t + b_a)$$

(17.3)

In the formulas, $W_i$, $U_i$, $W_a$, $U_a$ are weights and $b_i$, $b_a$ are biases. The updated cell state $C_t$ consists of two parts: the first is the product of $C_{t-1}$ and the forget gate output $f_t$, and the second is the product of the input gate outputs $i_t$ and $a_t$:

$$C_t = C_{t-1} \odot f_t + i_t \odot a_t$$

(17.4)

Here $\odot$ is the Hadamard product. The update of the hidden state $h_t$ consists of two parts. The first is $o_t$, obtained from the hidden state $h_{t-1}$ of the previous sequence, the sequence data $x_t$, and the sigmoid activation function. The second is composed of the cell state $C_t$ and the tanh activation function, that is:


$$o_t = \sigma(W_o h_{t-1} + U_o x_t + b_o)$$

(17.5)

$$h_t = o_t \odot \tanh(C_t)$$

(17.6)

The intermediate output $h_t$ is thus determined by the updated cell state $C_t$ and the output gate $o_t$.
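For clarity, Eqs. (17.1)-(17.6) can be assembled into one explicit LSTM time step over a CNN feature vector. This is a sketch with assumed parameter shapes, following the paper's convention that the $W$ matrices act on the previous hidden state and the $U$ matrices on the input.

```python
import torch

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step implementing Eqs. (17.1)-(17.6); p is a dict of weights."""
    f_t = torch.sigmoid(h_prev @ p["Wf"] + x_t @ p["Uf"] + p["bf"])  # forget, Eq. (17.1)
    i_t = torch.sigmoid(h_prev @ p["Wi"] + x_t @ p["Ui"] + p["bi"])  # input, Eq. (17.2)
    a_t = torch.tanh(h_prev @ p["Wa"] + x_t @ p["Ua"] + p["ba"])     # candidate, Eq. (17.3)
    c_t = c_prev * f_t + i_t * a_t                                   # cell state, Eq. (17.4)
    o_t = torch.sigmoid(h_prev @ p["Wo"] + x_t @ p["Uo"] + p["bo"])  # output, Eq. (17.5)
    h_t = o_t * torch.tanh(c_t)                                      # hidden, Eq. (17.6)
    return h_t, c_t
```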

17.4.4.3 CNN-LSTM Detector

The CNN-LSTM detector proposed in this paper is composed of two main parts: the CNN extracts and integrates the features in the data; the LSTM remembers and filters the features integrated by the CNN and then performs fitting and prediction; finally, a Softmax layer identifies the data and achieves classification.
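A minimal PyTorch sketch of such a detector is given below. The layer sizes, kernel width, and two-class output are assumptions for illustration; the pipeline (convolution and pooling for feature extraction, LSTM for sequence filtering, Softmax for classification) follows the description above. When training with CrossEntropyLoss, the Softmax would normally be dropped from the network and applied implicitly by the loss.

```python
import torch.nn as nn

class CNNLSTMDetector(nn.Module):
    """CNN extracts per-step features from the measurement sequence,
    LSTM filters and fits them, Softmax produces the normal/attack decision."""

    def __init__(self, n_features, hidden_dim=64, n_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1),  # feature extraction
            nn.ReLU(),
            nn.MaxPool1d(2),                                      # pooling/integration
        )
        self.lstm = nn.LSTM(32, hidden_dim, batch_first=True)
        self.classifier = nn.Sequential(nn.Linear(hidden_dim, n_classes),
                                        nn.Softmax(dim=-1))

    def forward(self, x):                       # x: (batch, seq_len, n_features)
        z = self.cnn(x.transpose(1, 2))         # convolve along the time axis
        out, _ = self.lstm(z.transpose(1, 2))   # back to (batch, seq', channels)
        return self.classifier(out[:, -1])      # classify from the last hidden state
```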

17.5 Case Analysis Taking the power grid of a certain area as the example scenario, this paper selects the IEEE 14-node system as the test environment and divides the 14 nodes into 4 areas. The experimental data consists of 2000 randomly selected sets of measurement samples, which form the training and test sets. The experimental flowchart is shown in Fig. 17.3. The experimental data are mainly active power, reactive power, voltage amplitude, and voltage phase angle. This article proves the effectiveness of the method from the perspective of accuracy. The variables are defined as follows: TP (true positive) indicates data that is actually normal and is recognized as normal; FN (false negative) indicates data that is actually normal but is recognized as abnormal; FP (false positive) indicates data that is actually abnormal but is recognized as normal; TN (true negative) indicates data that is actually abnormal and is also recognized as abnormal. The calculation formula of the

Fig. 17.3 Experimental flowchart


accuracy rate is then:

$$A = \frac{TP + TN}{TP + FN + FP + TN}$$

(17.7)
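As a small worked example of Eq. (17.7), with hypothetical counts:

```python
def accuracy(tp, fn, fp, tn):
    """Accuracy per Eq. (17.7): correctly classified samples over all samples."""
    return (tp + tn) / (tp + fn + fp + tn)

# hypothetical counts: 950 normal samples kept, 20 normal samples flagged as
# attacks, 30 attacks passed as normal, 1000 attacks caught
print(accuracy(tp=950, fn=20, fp=30, tn=1000))  # (950 + 1000) / 2000 = 0.975
```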

This experiment uses two types of detectors: the traditional central detector and edge detectors assigned to the edge nodes. Since the 14 nodes are divided into 4 regions, there are 4 edge nodes, and each detector is responsible only for its own region. With the traditional centralized detection method, the central detector detects all data; with the detection based on edge computing and federated learning proposed in this paper, the edge detectors collect and process the data, complete the training of the model locally, and pass the parameters to the central server, which aggregates them and sends the result back to the edge detectors. The time and accuracy of each detector are shown in Fig. 17.4. This article also sets up two groups of comparative experiments, against a CNN model and an SVM model respectively. The accuracy of each method is shown in Fig. 17.5, from which it can be seen that the accuracy of the method proposed in this paper is higher than that of the other two models.

Fig. 17.4 The accuracy of each detector


Fig. 17.5 Accuracy rate of each false data detection method

17.6 Summary To effectively detect network intrusion and protect network security in an era of large-scale growth of grid big data, an intrusion detection method for grid false data combining edge computing and federated learning is proposed. The method uses the advantages of edge computing to delegate detection tasks to each edge node; it also draws on the advantages of federated learning, so that the data collected by edge nodes can be trained on locally without leaving the node, and only the relevant parameters are encrypted and transmitted to the server, protecting data privacy. The framework combines deep learning algorithms to achieve efficient detection. Compared with traditional intrusion detection methods, the proposed method is more efficient, reduces transmission delay, and improves detection accuracy.

References 1. Liu, J. H.: Research on False Data Injection Attack Method and Detection Method in Smart Grid. Guangdong University of Technology (2020). (In Chinese)


2. Xue, D. B.: Research on false data injection attack detection in smart grid. Chongqing University of Posts and Telecommunications (2019). (In Chinese) 3. Wang, Y., Wu, J.Y., Chen, X.H., Cheng, Y.Z., Liu, L.L.: Kalman filter-based false data injection attack detection method for electric power. J. Shanghai Electr. Power Univ. 37(02), 205–210 (2021). (In Chinese) 4. Liu, X.R., Chang, P., Sun, Q.Y.: Power grid false data injection attack detection based on adaptive hybrid prediction of XGBoost and unscented Kalman filter. Proc. Chinese Soc. Electr. Eng. 41(16), 5462–5475 (2021). https://doi.org/10.13334/j.0258-8013.pcsee.202354. Last accessed 13 Aug 2021. (In Chinese) 5. Ashrafuzzaman, M., Chakhchoukh, Y., Jillepalli, A.A., Tosic, P.T., de Leon, D.C., Sheldon, F.T., Johnson, B.K.: Detecting stealthy false data injection attacks in power grids using deep learning. In: 2018 14th International Wireless Communications & Mobile Computing Conference (IWCMC), pp. 219–225, IEEE (2018) 6. Ruan, Z.W., Meng, G., Zhou, D.Q., et al.: Research on false data injection attack detection method in smart grid. Autom. Instrum. 3, 49–52 (2019). (In Chinese) 7. He, Y., Mendis, G.J., Wei, J.: Real-time detection of false data injection attacks in smart grid: a deep learning-based intelligent mechanism. IEEE Trans. Smart Grid 8(5), 2505–2516 (2017) 8. Esmalifalak, M., Liu, L., Nguyen, N., et al.: Detecting stealthy false data injection using machine learning in smart grid. IEEE Syst. J. 11(3), 1644–1652 (2017) 9. Chen, L.D., Liu, N.: False data injection attack and detection method for interactive demand response. Autom. Electr. Power Syst. 45(3), 15–23 (2021). (In Chinese) 10. Liu, Y., Ning, P., Reiter, M.K.: False data injection attacks against state estimation in electric power grids. In: Proceedings of the 16th ACM Conference on Computer and Communications Security. Chicago, Illinois, USA: ACM, pp. 21–32 (2009)

Chapter 18

Innovative Design of Traditional Arts and Crafts Based on 3D Digital Technology Lin Lin

Abstract Against the background of the new era, 3D digital technology, as an important basis for innovation and development across industries, has achieved excellent results in practical application. For the innovative design of traditional arts and crafts in particular, art creation and product production carried out with the help of three-dimensional digital technology can not only visually present gradually disappearing traditional crafts, but also enhance the artistic charm of the related products. After examining the relationship between 3D digital technology and traditional arts and crafts, this paper analyzes the main directions of their future integration and development according to the practical application steps. Keywords 3D digital technology · Traditional · Arts and crafts · Ceramic · Xiang embroidery

18.1 Introduction From the actual application of 3D digital technology, we know that staff can use advanced machines and operating software to simulate and create intuitive models of traditional arts and crafts products. However, an examination of traditional arts and crafts shows great difficulty in both design analysis and operational innovation. Especially as the relevant culture and techniques slowly disappear with the development of the times, in order to better inherit excellent traditional arts and crafts and to design for the living needs of contemporary people, it is necessary to make reasonable use of 3D digital technology [1–4].



18.2 Application Steps Traditional arts and crafts, as a part of China's traditional culture, have not received sufficient attention in the new era, which has led to a talent gap in product design for these industries. Combining 3D digital technology for product optimization and innovation can use modern technology to attract more attention and stimulate people's love for the related products and techniques. At the same time, 3D digital technology can further remedy actual product design defects and thereby accelerate innovation in the industry's development. From the application of 3D digital technology in recent years, it can be seen that staff must use professional design software for modeling and adjustment, which involves the following steps:

18.2.1 Preparation Period During this period, following the steps shown in Fig. 18.1, the staff prepare the materials needed for the product design and clearly propose the basic design ideas, such as the chosen color matching, materials, and decorative patterns. Standard color cards in particular serve as an important basis for color contrast and design analysis, and designers need to plan the design scheme of the craft products on the basis of this early preparation. It should be noted that the design data during this period is not accurate, but this has little impact, as it only prepares for later in-depth research.

Fig. 18.1 Steps of innovative design based on 3D digital technology


18.2.2 Design Period At this stage, designers need to subdivide and study the data on the basis of a comprehensive understanding of the product design specifications and put forward an accurate design scheme. Since there are many types of traditional craft products in China, drawing on cultural elements from different periods, designers need their own experience to propose the best plan. For example, after selecting appropriate design software, in the process of making the product model the designer needs to carry out several experimental analyses of its structure and chosen materials according to the product characteristics [5–7].

18.2.3 Optimization Period In this stage, the design of the product model is evaluated, existing problems are identified from all the data and feedback, and a specific optimization plan is put forward. This stage is the key to the creation and design of traditional craft products, and the last link in clarifying the design model. Here the designer needs to correct all the data of the model after putting forward the final plan, then determine the required number of arts and crafts products and calculate the corresponding shrinkage ratio. This can not only improve the success rate of product production but also ensure the application quality of the arts and crafts design technique. In practical application, 3D digital technology can solve some problems of previous design and creation, but it must also respect the limitations of actual processing, so as to ensure the effective integration of the existing production level and the design scheme [8–13].

18.3 Application Direction 18.3.1 Ceramic In the design and creation of this kind of product, more attention is paid to the curvature of the surface, so designers must strictly control the curve rate in model manufacturing and design operations. Past ceramic products, such as the teacup and teapot crafts shown in Fig. 18.2, require designers with rich work experience in both design and creation, observing the various patterns macroscopically. Although the overall operation is not highly difficult, 3D digital technology can be combined with creative innovation to better display the design art and craft of such products: it can not only reduce the probability of errors in the teapot's surface curvature, but also yield more accurate data. Common software such as Rhino can capture the specific specifications of


Fig. 18.2 Teapot model designed based on Rhino software

different components from the whole, effectively connect the data of each component in design and creation, and use tools such as arcs and curves to coordinate and unify them. Against the background of industrial technological innovation, with the steady development of the social division of production, design, manufacturing, and sales began to develop separately. Design then gained an independent, innovative position, and a variety of application software arose from practical exploration. These tools divide mainly into two-dimensional drawing software and three-dimensional modeling software: the former is mainly used to design patterns, while the latter supports product design, art design, and environmental art design through its modeling functions. At present, common modeling approaches divide into three kinds: first, modeling with three-dimensional software; second, measuring and modeling with equipment and instruments; and finally, modeling from images. The case study in this paper uses Rhino software for modeling design. Rhino is increasingly widely used for program plug-in development; software such as RhinoGold and TSpline, built as plug-ins around its core, can be used for design in many industries. According to the curve design results shown in Figs. 18.3 and 18.4, Rhino together with the Grasshopper parametric modeling plug-in can produce more beautiful architectural curve modeling, and Rhino's surface modeling ability can help designers complete complex glass or ceramic forms. Non-uniform rational B-spline (NURBS) technology is used as the core of the 3D modeling: curves are formed by adjusting points, and surfaces are then formed from the movement of curves. The rationality of the curve design directly affects the quality of the model. According to the curve variation diagrams shown in the figures above, a curve includes a starting point, an ending point, CV points, EP points, and so on. Among them, CV points are the points controlling the curve shape; their spacing and distribution directly affect the shape and continuity of the overall curve.


Fig. 18.3 Differences between CV points and EP points

Fig. 18.4 Real-time adjustment of CV points

EP points are the structural points of the curve; their distribution affects the curve's shape and controls the line segments during editing. The advantage of EP points is that they lie on the curve itself, making them easy to capture and convenient for creating non-smooth curves.
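To make the role of CV points concrete, the sketch below evaluates a cubic B-spline curve from a handful of hypothetical control (CV) points using de Boor's algorithm. NURBS as used in Rhino additionally attaches a weight to each control point; this simplified non-rational version omits the weights.

```python
import numpy as np

def de_boor(x, knots, ctrl, p=3):
    """Evaluate a degree-p B-spline curve at parameter x (de Boor's algorithm)."""
    k = int(np.clip(np.searchsorted(knots, x, side="right") - 1, p, len(ctrl) - 1))
    d = [np.asarray(ctrl[j + k - p], dtype=float) for j in range(p + 1)]
    for r in range(1, p + 1):
        for j in range(p, r - 1, -1):
            alpha = (x - knots[j + k - p]) / (knots[j + 1 + k - r] - knots[j + k - p])
            d[j] = (1 - alpha) * d[j - 1] + alpha * d[j]
    return d[p]

# Five hypothetical CV points of a vessel profile; moving any one of them
# reshapes the whole neighbouring stretch of the curve.
ctrl = [(0, 0), (1, 2), (3, 3), (5, 1), (6, 0)]
knots = [0, 0, 0, 0, 1, 2, 2, 2, 2]   # clamped cubic: curve touches the end CVs
curve = [de_boor(u, knots, ctrl) for u in np.linspace(0, 2 - 1e-9, 50)]
```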


3D scanning technology and 3D modeling software play an auxiliary role in technical research and are mainly used in reverse engineering. After all-round scanning of the entity, the spatial coordinates of the object's morphological characteristics are sorted and recorded, point cloud data is constructed from the location information of the unique coordinates, a digital model is then designed using 3D software, and a replica with the same morphology is finally obtained. Image-centered modeling is also called photo modeling: after two-dimensional images are collected, a computer analyzes and processes them to form a three-dimensional model. This method is more effective than traditional modeling, more convenient in actual operation, more realistic in its virtual effect, lower in overall cost, and broad in its range of application. When using 3D digital technology for traditional arts and crafts innovation, the actual operation process divides into the following points: First, preparation. Sufficient design materials should be prepared, and the designer should draw the three views of the product based on the preliminary design concept drawing, with dimension markings providing reference for subsequent model making. Second, design and implementation. Designers should choose the corresponding 3D software according to the design requirements. Different types of product modeling call for different construction models, selected partly from the characteristics of the product form and partly from design thinking. For example, in the production of ceramic products, 3D software modeling requires drawing the outline of the product and the wall-thickness section of the vessel, after which the rotation tool can revolve the profile around the central axis to obtain the model shown in Fig. 18.5; for more complex ceramic products, the curved-surface tools of Rhino are used to ensure that the design modeling meets the basic requirements. Third, design optimization. Designers should evaluate and analyze the existing design scheme to ensure that it meets the design requirements. Generally speaking, an appropriate design scheme should be judged quickly and accurately by combining highly data-based checking calculations with visual representation, as shown in Fig. 18.6:

Fig. 18.5 Rotation tool and Lofting tool
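The rotation-based modeling step just described can be sketched in Rhino's Python scripting (rhinoscriptsyntax), which runs only inside Rhino. The profile points below are hypothetical; the script interpolates a profile curve of a vessel body and revolves it 360 degrees around the vertical central axis, mirroring the rotation tool in Fig. 18.5.

```python
import rhinoscriptsyntax as rs

# Hypothetical profile of a teapot-like body in the XZ plane (mm)
points = [(0, 0, 0), (45, 0, 10), (50, 0, 60), (30, 0, 90), (32, 0, 100)]
profile = rs.AddInterpCurve(points)   # smooth NURBS curve through the points

# Central (vertical) axis of revolution, given as a pair of points
axis = ((0, 0, 0), (0, 0, 100))
body = rs.AddRevSrf(profile, axis)    # revolve the profile a full 360 degrees
```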


Fig. 18.6 3D digital technology scheme based on traditional arts and crafts

Fourth, design in depth. This operation provides an effective basis for the subsequent production process: the designer continues to improve the existing optimization scheme and, combined with calculations on the digital three-dimensional model, obtains the product's values so as to clarify the processing and production constraints. See Table 18.1 for the specific differences; it compares traditional dewaxing casting technology with 3D-printing dewaxing casting technology.

Table 18.1 Comparative analysis of traditional technology and 3D printing technology

Technical types: Traditional dewaxing casting technology / 3D printing demolding casting technology

The casting process:
- Traditional: first, the silica gel mold is selected and the seed mold is made (the seed mold is usually made of blue paraffin wax); second, the seed mold is carved, dressed, ground, and polished; finally, the die is processed on a CNC machine tool.
- 3D printing: first, the virtual digital model is created using 3D modeling software; second, the paraffin model is printed directly by 3D printing; finally, dewaxing casting and processing are carried out.

In innovative traditional arts and crafts design, the production and application of 3D digital technology can be implemented in overall product development and


manufacturing; the required manufacturing time and cost are far lower than for a manual model, and the design can be modified several times along the way. For example, in the design and manufacture of traditional glass craft products, the comparative analysis of lost-wax casting technology and 3D-printing lost-wax casting technology gives the results shown in the table above. The application of digital technology thus makes the whole product's design, research, and development more intuitive and communication more convenient, allowing animation demonstrations or virtual reality to present the overall appearance of traditional arts and crafts products in three dimensions. In the application and promotion of 3D digital technology, 3D-printing technology is the focus for arts and crafts designers. In the international market, some individuals and enterprises have already begun to use this technical concept to manufacture traditional arts and crafts. 3D-printing technology effectively integrates 3D digital design functions and process manufacturing features, helping designers print three-dimensional products directly from finalized design drawings. Although the technology is in its early stages of development, it has been successfully used in industry, jewelry, and apparel. Combined with the analysis in the table, it can be seen that in the remanufacturing of glass art products, casting glass based on 3D technology can not only eliminate part of the complex relief processing but also reduce the production and design time of the product. Therefore, future traditional arts and crafts creation should explore this technical concept further. See Table 18.1.

18.4 Conclusion To sum up, the integration of 3D digital technology and traditional arts and crafts has further promoted social and economic development and cultural publicity. At the same time, with the effective combination of traditional craft and modern technology, and with continuous optimization and promotion keeping pace with technological innovation, all fields will be able to achieve more outstanding results.

References 1. Debs, L., Miller, K.D., Ashby, I., et al.: Students' perspectives on different teaching methods: comparing innovative and traditional courses in a technology program. Res. Sci. Technol. Educ. 37(3), 297–323 (2019) 2. Raman, A.: Brenda King, The Wardle family and its circle: textile production in the arts and crafts era (Woodbridge: Boydell Press, 2019. Pp. xix+218. 14 plates. 15 figures. ISBN 9781783273959 Hbk. £29.95). Econ. Hist. Rev. 73(1), 325–350 (2020) 3. Mitchell, R.: The Wardle family and its circle: textile production in the arts and crafts era, by Brenda M. King. Engl. Hist. Rev. 135(577), 1619–1621 (2020) 4. Jang, M., Aavakare, M., Nikou, S., Kim, S.: The impact of literacy on intention to use digital technology for learning: a comparative study of Korea and Finland. Telecommun. Policy 45(7), 102154 (2021)


5. Shin, D., Hwang, Y.: The effects of security and traceability of blockchain on digital affordance. Online Inf. Rev. 44(4), 913–932 (2020) 6. Seo, H., Britton, H., Ramaswamy, M., Altschwager, D., Blomberg, M., Aromona, S., Schuster, B., Booton, E., Ault, M., Wickliffe, J.: Returning to the digital world: digital technology use and privacy management of women transitioning from incarceration. New Med. Soc. 3, 1461444820966993 (2020) 7. Phillips, N.C., Killian Lund, V.: Sustaining affective resonance: co-constructing care in a school-based digital design studio. Br. J. Edu. Technol. 50(4), 1532–1543 (2019) 8. Reynaldo, T., Mukhopadhyay, T.P.: Digital arts in Latin America: a report on the archival history of intersections in art and technology in Latin America. Digit. Sch. Human. 36(Supplement_1), i113–i123 (2021) 9. Zhou, S., Zhu, Z., Shu, W., et al.: Phoenix centre design and construction of a complex spatial structure based on 3D digital technology. Struct. Eng. Int. 29(3), 377–381 (2019) 10. Wu, Z., Chen, Y.: Digital art feature association mining based on the machine learning algorithm. Complexity 1, 1–11 (2021) 11. Xu, C., Huang, Y., Dewancker, B.: Art inheritance: an education course on traditional pattern morphological generation in architecture design based on digital sculpturism. Sustainability 12(9), 3752 (2020) 12. Li, Q.W., Jiang, P., Li, H.: Prognostics and health management of FAST cable-net structure based on digital twin technology. Res. Astron. Astrophys. 20(05), 49–56 (2020) 13. Amft, O., Baker, M.: Makers of pervasive systems and crafts. IEEE Pervasive Comput. 18(4), 61–70 (2019)

Chapter 19

Research on the Simulation of Informationized Psychological Sand Table Based on 3D Scene Guangdong Wei, Tie Liu, Qiuyan Wang, Yuchi Tang, and Yitao Wang

Abstract The sand table game, also known as box garden (sandplay) therapy, is one of the most common techniques in international psychotherapy; it has attracted attention in current scientific research and has gradually become one of the most common techniques in applied psychology. Because the sand table itself is large, it is inconvenient to move during use, is limited in spatial scope, and cannot effectively save session information. Researchers have therefore proposed using simulation and related equipment in practical exploration to study the sand table comprehensively. Building on an understanding of intelligent electronic sand tables constructed around Internet technology, this paper designs and analyzes an informationized psychological sand table simulation system with 3D scenes at its core, oriented to the development needs of the psychology field, and carries out an empirical study from the perspective of a mobile phone App. The results show that optimizing traditional psychological sand table simulation with 3D scenes and information technology can not only build a high-quality simulation environment, but also perform data simulation analysis and effective storage with visualization of the simulation results. Keywords 3D scene · Informatization · Psychological sand table · The Internet



19.1 Introduction In essence, sand table simulation in the field of psychology lets the client express and create freely using fine sand, clean water, and various miniatures under the attention of the therapist. In this simple arrangement, the inner world of a person can be fully presented, and transformation can be completed through the development of spiritual enrichment. The approach grew out of the active imagination technique guided by Jung's theories. With continued theoretical exploration, a comprehensive approach to children's nature and self-expression took shape and was promoted in the 1960s. Initially it was applied mainly to the treatment of children, and it has since gradually been extended to adult psychoanalytic treatment. Guided by modern development and theory, sand table therapy is widely applied within the psychology and philosophy of Chinese culture. To date, the treatment technique has made substantial breakthroughs, not only in the number and variety of miniature models but also in extending the treatment population from children to adults. At the same time, therapists have deeply explored the ancient symbols in sand table games and their value, further demonstrating that the sand table process not only touches the core of human emotion but also reaches the collective unconscious; sandplay therapy thus shows advantages in psychological diagnosis and treatment while also serving a guidance and education function. Therefore, in current technological development, studying the informationized psychological sand table in combination with 3D scene simulation can not only solve the problems of traditional psychological sand table research but also collect a large amount of data from the system, providing an effective basis for follow-up treatment research [1–3].

19.2 Method

19.2.1 Design and Application

The core of electronic sand table system design is to build virtual interactive technology between people and the geographic environment and to process data information scientifically. A realistic and effective electronic sand table, and a complete electronic sand table system, require a large reserve of remote sensing data as support, so the overall requirements for electronic sand table design are extremely high. At this stage, the most common informatized electronic sand tables on the market use a variety of touch screens, such as infrared, capacitive, and resistive. Although infrared screens have a long service life, their resolution is low, their response is slow, and their positioning is imprecise, so they have not been widely adopted. To complete the informatized psychological sand table simulation in 3D scenes studied in this paper, a new simulation system must be built with intelligent interactive


control technology at its core. Quadratic subdivision scanning is used to improve the positioning accuracy of the infrared touch screen, and Internet-plus closed-loop control technology is chosen for the key facilities' environment, improving the environmental fitness of the system while strengthening the simulation effect of the psychological sand table game [4–6]. Through the effective integration of mobile Internet technology and ordinary electronic sand tables, multiple independent electronic sand tables are connected using 4G or Wi-Fi communication, which facilitates data transmission and application within the system. The specific flow chart is shown in Fig. 19.1, where A refers to the mobile phone App, B to the horizontal electronic sand table, and C to the standing sand table. All facilities within the network are communication nodes that use 4G or Wi-Fi to transmit data to the corresponding server, and users can employ background software or the mobile App to collect and study the operating conditions and data of each device; a minimal sketch of this node-to-server reporting follows.
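To make the node-to-server data path concrete, the following is a minimal sketch, not the authors' implementation, of how a sand table node might report its status over the network. The message fields, the server address, and the use of JSON over UDP are all illustrative assumptions.

```python
import json
import socket
import time

# Hypothetical server endpoint; in the real system every node would be
# configured with the address of the data server it reports to.
SERVER_ADDR = ("192.0.2.10", 9000)

def report_status(node_id: str, node_type: str, temperature_c: float) -> None:
    """Send one status message from a sand table node to the server.

    node_type mirrors the roles in Fig. 19.1: "app" (A), "horizontal" (B)
    or "standing" (C). All field names here are illustrative.
    """
    message = {
        "node_id": node_id,
        "node_type": node_type,
        "temperature_c": temperature_c,  # e.g. read from the temperature control circuit
        "timestamp": time.time(),
    }
    payload = json.dumps(message).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, SERVER_ADDR)

if __name__ == "__main__":
    report_status("table-B1", "horizontal", 23.5)
```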

19.2.2 Hardware Design

In recent years, the continuous development of mixed-signal technology has not only reduced the production cost of touch sensors to a level acceptable for a wide range of consumer electronics, but has also enabled higher sensitivity and reliability in the sensing circuit, which in turn allows a thicker cover layer for better reliability and durability. Temperature-controlled switching is attractive, but it requires an appropriate physical size, and typical infrared touch designs use a dedicated interface of 3 mm or thinner; sensing fingers through such an interface becomes increasingly difficult. The cable tray is easy to use, accessible, and transparent, as can be seen in the circuit pad of the sensor, and the cable bridge layer can facilitate the interface. The overall structure of the informatized psychological sand table comprises four parts: the workstation, the storage server, the interaction terminal, and the peripheral components. A DELL 7910T is chosen for the workstation and a DELL MD3400 for the storage server, the two being connected through a professional interface or a gigabit switch. An Opto 4K HD display with a 75-inch LCD screen is chosen [6]; the infrared touch frame is installed on top of the display and connected to the workstation through a USB 3.0 interface [7]. These four parts form the core of the informatized psychological sand table. A temperature control circuit drives a fan and a humidifier to regulate the environmental fitness of the sand table prototype, a remote control circuit supports remote switching and status monitoring, and quality cables and speakers are configured internally [8] to strengthen the simulated sand table experience. The overall architecture is shown in Fig. 19.2.


Fig. 19.1 Flow chart of electronic sand table design based on mobile Internet

Fig. 19.2 Structure diagram of electronic sand table

19.2.3 Infrared Scanning Design

The infrared touch screen consists of two parts, infrared transmitting tubes and infrared receiving tubes, both arranged closely inside the frame. When the infrared light is blocked by a contact, the signal intensity at the receiving tubes changes accordingly; the controller scans both sets of tubes to judge the signal change at the receivers, accurately calculates the location of the contact, and finally transmits the coordinate information to the display host. A traditional infrared touch screen scans only one group of coaxial transmitting and receiving tubes, so its actual resolution is determined by the number of infrared tubes. To improve the positioning accuracy and effective resolution of the infrared touch screen and to control the probability of contact-position error, an accurate value is obtained with a difference-quantization offset positioning algorithm. This method not only simplifies the calculation but also improves the response speed of the program. The specific process is shown in Fig. 19.3.

If the infrared transmitting tubes, touch screen, and receiving tubes are distributed evenly, the output signal of every transmitting tube is strong. Following the calculation process analyzed above, let the number of infrared tubes in the selected row be M. On the X axis, the tube nearest the contact has serial number A1, with collected coordinate (x1, y1) and light intensity Z1; the farthest transmitting tube seen by the receiver has serial number An, with coordinate (xn, yn). Let the distance between the two points be D, and the light intensity obtained at An be Zn. When this condition is met and only two infrared tubes are involved, the position of contact Z is:

x = x1 + ((A2 − A1)/M + (Z2 − Z1)) · D/2    (19.1)

Under this condition, the two infrared tubes with the lowest received light intensity, Am and An, are selected, with corresponding intensities Zm and Zn. The Y-axis coordinate of contact Z is calculated according to the above formula, and the X-axis position is calculated by:

x = ((Am − An)/M + (Zm − Zn)) · D/2    (19.2)

Fig. 19.3 Calculation flow of offset positioning algorithm

Fig. 19.4 Work flow chart of main interface of the system
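As a concrete illustration of the offset positioning calculation in Eqs. (19.1)–(19.2), the following is a minimal sketch. The function name and the example values are assumptions for demonstration; only the formula itself comes from the text.

```python
def offset_position(a_m: int, a_n: int, z_m: float, z_n: float,
                    m: int, d: float) -> float:
    """Difference-quantization offset positioning, Eq. (19.2).

    a_m, a_n: serial numbers of the two occluded infrared tubes
    z_m, z_n: received light intensities at those tubes
    m:        number of infrared tubes in the scanned row
    d:        distance between the two tube positions
    Returns the sub-tube-pitch X offset of the contact.
    """
    return ((a_m - a_n) / m + (z_m - z_n)) * d / 2.0

# Illustrative values: tubes 12 and 13 are partially shadowed; the
# intensity difference shifts the estimate between the two tubes.
x = offset_position(a_m=12, a_n=13, z_m=0.35, z_n=0.60, m=64, d=5.0)
print(f"contact X offset: {x:.3f} mm")
```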

19.2.4 Software Design

Mobile phone App software is selected for the 3D scene built by the informatized psychological sand table simulation. It communicates with the electronic sand table data mainly over Ethernet, providing remote switching, environmental state monitoring, and extensibility. The App is designed for two systems. For Android, the Java programming language is used, and the official Android SDK and third-party SDKs are called in the Eclipse development environment, yielding general-purpose mobile software with a user interface and human–machine interaction. For iOS, the Objective-C programming language is used to design the corresponding App, realizing human–machine interaction and the user interface in its development environment. The work flow of the main interface is shown in Fig. 19.4.

19.3 Result Analysis

3D technology is applied to the modeling and control of the electronic sand table. The scene model was built with Ethernet technology and 3DS MAX, and Unity3D was used as the 3D engine to realize roaming of the sand table, positioning and reverse control of turnouts, signal changes, operation control, and


human–computer interaction. In view of the above research and analysis of informatization with three-dimensional scenes at its core, it is found that the traditional sand table game can no longer satisfy current demand in the field of psychological analysis; to show the advantages of this psychotherapy technology in application, a new application system combining 3D scene simulation must be built. In practice, this paper carries out an empirical study of an electronic sand table psychological analysis system with interaction at its core. From the point of view of practical application, although products on the market can already help retain sand table information, the electronic equipment is too flat and does not support simulation analysis for psychological research, which directly affects the simulated experience of the psychological sand table and differs greatly from a real sand table. To solve this technical problem, an electronic sand table psychological analysis system based on 3D interaction is proposed, comprising a main controller, a psychological analysis server, and a 3D interaction module. The main controller is connected with the data storage server and mainly receives tool-call commands for the electronic sand table; the storage server holds information such as location, tool type, and sand table orientation. Analysis of the assignment flow chart shows that an electronic sand table psychological analysis system built on this interaction can effectively integrate network information technology with the concept of informatized psychological sand table processing. Integrating the 3D interactive display module with the solid model helps psychotherapy participants complete sand arrangement and psychological analysis faster, improving the scientific level of modern psychology research and meeting practical development needs. At the same time, the psychological sand table module constructed in this study builds a realistic experience environment for psychotherapy and plays a positive role in sand table analysis and promotion: on the one hand, it helps integrate psychological data in modern psychology research; on the other, it improves the accuracy of the actual analysis, in keeping with the characteristics of informatized psychological sand table simulation.

19.4 Conclusion

To sum up, in the development of modernization, 3D scene-based psychological sand table simulation analysis has gradually become a focus of modern scientific and technological research and has drawn the attention of researchers and scholars in the field of psychology. The research is interdisciplinary, and the theoretical knowledge and technical content involved are relatively complex, so practical exploration should actively draw on knowledge and skills from home and abroad while attending to China's basic national conditions, build informatization into 3D scene psychological sand table simulation analysis, and, in view of the field's development needs, strengthen the cultivation of professional talent. Only in this way can the application value of information technology and psychological sand table simulation be fully demonstrated in a market environment of constant innovation [9, 10].

References

1. 2021 recruit students general rules | psychological sand course. J. Psychol. Health (10), 9 (2021)
2. Liu, X.F., Ren, Y., Yang, H.: A case study on psychological sandplay to soothe anger of mild autistic adolescents. J. Campus Psychol. 19(05), 461–463 (2021)
3. Gao, X.T.: Case study on intervention of sandtable game therapy on college students' general psychological problems. J. Langfang Normal Univ. (Natural Science Edition) 21(03), 95–99 (2021)
4. Wang, X.M.: The effect of group sand table game on coping style, mental health level and quality of life in patients with gastric cancer undergoing chemotherapy. J. Contemp. Clin. Med. 34(05), 21+17 (2021)
5. Zhong, W.H.: Exploring the initial sandbox characteristics of college students with different psychological capital. Intelligence 26, 106–108 (2021)
6. Chen, Y.F.: Application of sand table game therapy in psychological counseling of technical college students. Professional 15, 37–38 (2021)
7. Miao, R., Luo, Z.L.: J. Guangzhou Public Sec. Adm. Cadre Coll. 31(02), 56–60 (2021)
8. Ke, X.: Psychological intervention of group sand table game on coping style and therapeutic effect of thoracic cancer patients undergoing chemotherapy. Evid.-Based Nurs. 7(09), 1266–1269 (2021)
9. Wang, R.H.: The courage to speak: a case study of selective mutism. Mental Health Educ. Primary Secondary Schools 21, 35–38 (2021)
10. Zhang, N., Li, X.Y., Wen, S.Q.: Application of sand table game therapy in middle school mental health education. Tianjin Educ. 20, 6–7 (2021)

Chapter 20

Research on Graphic Design of Digital Media Art Based on Computer Aided Algorithm

Saihua Xu

Abstract In the new media environment, digital media art graphic design has begun to be used dynamically in many fields. Especially against the background of the big data era, art graphic design with computer-aided algorithms at its core is not only the main direction of practical exploration, but also important content for meeting the industry's development needs. On the basis of the development history of dynamic graphics in digital media, this paper systematically studies the content and characteristics of dynamic graphic design in the digital age and the unique functions of visual graphic design, and then validates the analysis with a computer-aided shape interpolation algorithm. The final results show that interpolation based on shape can satisfy the demands of subjective visual experience as well as objective physical laws.

Keywords Computer aided algorithm · Digital media · Art graphics · Shape interpolation

20.1 Introduction

The characteristics of the productive forces in different periods all affect the characteristics of art. With the innovation and development of digital technology, digital media graphic design has begun to use computer-aided applications in various fields under the new media environment, and the overall presentation has become more vivid and interesting [1]. From the perspective of art, most modern artists no longer use simple planar image thinking to express their ideas, but choose new creative methods and materials in order to move the audience. For example, in his seminal films of the late 1920s, Marcel Duchamp used curved and rotating text to create entirely new audio-visual spaces, redefining the nature and effects of on-screen motion [2]. From the perspective of technology, researchers saw the development potential of the computer as an artistic tool as early as the

S. Xu (B)
Nanchang Institute of Science & Technology, Nanchang, Jiangxi, China
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
L. C. Jain et al. (eds.), 3D Imaging—Multidimensional Signal Processing and Deep Learning, Smart Innovation, Systems and Technologies 297, https://doi.org/10.1007/978-981-19-2448-4_20


nineteenth century, but, limited by the technological level and social conditions of the time, the combination of computer and artistic creation was not comprehensively promoted [3]. It was not until the early 1960s that designers of interactive computer graphics and graphical user interfaces used sketching software that allowed users to create high-precision digital engineering drawings with a light pen on a CRT screen and to perform basic operations such as copying, moving, and rotating; this is the basis of all graphic design software today [4]. In practical technological innovation, more and more graphics software can now be used. Computer-aided design in this sense refers to software that is mature in application today, including tools such as PS, AI, and CAD, and the actual graphics and images take a variety of forms; the specific structure is shown in Fig. 20.1. The trend of integrating art and technology is gradually changing traditional graphic design from static to dynamic. Motion in the plane brings interest and plot to a work, which not only enriches the design connotation but also attracts a larger audience. Shape interpolation is a basic part of computer animation design and is mainly used to obtain new geometric models. When the given geometric models are different poses of the same object, shape interpolation is in essence also called pose interpolation, and the interpolation sequence directly reflects the motion of the object. Therefore, this paper studies the application of shape interpolation within computer-aided algorithms to digital media art graphic design, and carries out verification analysis based on global optimization [5].

Fig. 20.1 Digital media usage architecture


20.2 Method

20.2.1 Digital Media Art Graphic Design Content

First, words. If text is transformed under computer-aided technology, dynamic and changeable textual expression can be obtained. This dynamic presentation, which fully integrates text with other elements, is simple, fast, and easy to understand, helping the audience grasp intuitive information in digital media more quickly. Characters have changed from their traditional physical form into a dynamic language in the digital environment, which not only allows free expression in time and space but can also construct a complete narrative scene [6].

Second, graphics. Most Internet users now browse digital media on electronic devices such as mobile phones and computers, so most art graphic design focuses on showing its own dynamics. This design form, between video and image, makes digital media more three-dimensional and effective and produces striking visual sensory contrasts for the audience.

Third, still photos. Such works mostly add motion graphics to static pictures to set the whole in motion. Designers and photographers choose seemingly absurd, dream-like visual images that make an otherwise plain design more unique and vivid [7].

Fourth, signs. With the steady development of digital media technology, mobile interactive media has come into full use, and China's digital media logos have gradually developed from traditional static plane symbols to dynamic ones. This new kind of design effectively integrates text, graphics, sound, video, and other factors, which improves the logo in a new context and opens more possibilities for it.

Fifth, illustrations. Dynamic illustration is designed around the dynamic visual experience. During visual communication, dynamic illustration intuitively constructs multiple viewing levels and encourages the audience to interact while viewing. This artistic graphic design makes an otherwise static picture more vivid [8].

20.2.2 Shape Interpolation

Polygons or meshes are used to represent shapes, and all frames have the same number of vertices. The coordinates of all vertices in each frame are concatenated to form a point in a high-dimensional space; the space formed by these points is called the shape space, denoted S. Given the key frames S^0 and S^1 with S^0, S^1 ∈ S, shape interpolation can be expressed as a function:

Φ : S × S × [0, 1] → S,  S^t = Φ(S^0, S^1, t), t ∈ [0, 1]    (20.1)
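The simplest instance of the map Φ in Eq. (20.1) is a naive per-vertex linear blend, which the intrinsic methods below improve upon. The sketch is only this baseline, with numpy and the array layout as assumptions:

```python
import numpy as np

def linear_blend(s0: np.ndarray, s1: np.ndarray, t: float) -> np.ndarray:
    """Naive shape interpolation: S^t = (1 - t) * S^0 + t * S^1.

    s0, s1: (n_vertices, 3) arrays holding the two key frames; both
    frames must have the same vertex count and correspondence.
    """
    assert s0.shape == s1.shape
    return (1.0 - t) * s0 + t * s1

# A degenerate two-vertex example: the midpoint frame at t = 0.5.
s0 = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
s1 = np.array([[0.0, 1.0, 0.0], [1.0, 1.0, 0.0]])
print(linear_blend(s0, s1, 0.5))
```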

In Eq. (20.1), t represents time and S^t represents the shape at time t; evaluating it for t ∈ R \ [0, 1] extrapolates the shape S^t. In the discrete setting, the frame after S^t is written S^{t+Δt}. From now on, superscripts denote quantities related to S^t, single subscripts denote points, double subscripts denote edges, and triple subscripts denote triangles.

First, shape interpolation of intrinsic geometric quantities. The simplest set of intrinsic quantities involves the volume, included angles, edge lengths, and similar content. Requiring the edge lengths of S^t, the dihedral angles between adjacent triangular faces of S^t, and the volume to all change linearly gives:

e^t_ij = (1 − t) e^0_ij + t e^1_ij
θ^t_ij = (1 − t) θ^0_ij + t θ^1_ij
V^t = (1 − t) V^0 + t V^1    (20.2)

Here the length of edge (i, j) is e^t_ij, the included angle between the normal vectors of the two triangles sharing that edge is θ^t_ij, and the total volume of the mesh is V^t; all of these can be expressed as functions of the vertices of S^t. The energy to minimize is therefore:

ε = λ ε_edge + μ ε_angle + ν ε_volume    (20.3)

When rebuilding the intermediate frame shape, the individual terms are:

ε_edge = Σ_{(i,j)∈S} ( e^t_ij − (1 − t) e^0_ij − t e^1_ij )²
ε_angle = Σ_{(i,j)∈S} ( θ^t_ij − (1 − t) θ^0_ij − t θ^1_ij )²
ε_volume = ( V^t − (1 − t) V^0 − t V^1 )²    (20.4)
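To make Eqs. (20.2)–(20.4) concrete, the following sketch evaluates the edge-length and volume terms of the energy for a candidate intermediate frame. The dihedral-angle term is omitted for brevity, and the mesh representation (vertex array plus edge and face index lists) is an assumption:

```python
import numpy as np

def edge_lengths(verts, edges):
    """Length of each edge (i, j): e_ij = |v_i - v_j|."""
    return np.linalg.norm(verts[edges[:, 0]] - verts[edges[:, 1]], axis=1)

def signed_volume(verts, faces):
    """Signed volume of a closed triangle mesh via the divergence theorem."""
    a, b, c = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    return np.einsum("ij,ij->", a, np.cross(b, c)) / 6.0

def intrinsic_energy(vt, v0, v1, edges, faces, t, lam=1.0, nu=1.0):
    """Edge and volume terms of Eq. (20.4) for a candidate frame vt.

    The linear targets (1 - t) * q0 + t * q1 follow Eq. (20.2); an
    optimizer would minimize this energy over the vertices of vt.
    """
    e_target = (1 - t) * edge_lengths(v0, edges) + t * edge_lengths(v1, edges)
    v_target = (1 - t) * signed_volume(v0, faces) + t * signed_volume(v1, faces)
    eps_edge = np.sum((edge_lengths(vt, edges) - e_target) ** 2)
    eps_volume = (signed_volume(vt, faces) - v_target) ** 2
    return lam * eps_edge + nu * eps_volume
```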

In view of this model (Eqs. (20.2)–(20.4)), much further work extends it according to the volume enclosed by the surface. The biggest differences among the available methods lie in how the energy is constructed and how the nonlinear energy is optimized. For example, in practical exploration a multi-level interpolation technique with multiple registrations has been proposed, constructing intermediate frames by layered splicing while guaranteeing the quality of the results. The specific structure is shown in Fig. 20.2.

Fig. 20.2 Interpolating the rotation angles of two triangular frames

For interpolation in differential coordinates, studies have proposed not interpolating vertex positions directly but transforming the mesh into differential coordinates: the embedding-space component f0 is analyzed with respect to the local space, and the mapping relationship between f1 and the local space is described through discrete local frames attached to the vertices, as shown in Fig. 20.3. Specifically, treating the point set as a planar point set: (1) Inner point. The set of all inner points is called the interior of F, recorded as F0. (2) Outer point. If point f1 has a neighborhood containing no point of F0, f1 is called an outer point of F.

Fig. 20.3 Schematic diagram of vertex neighborhood

In addition to straight lines representing object elements and the motion relation between start and end, indirect means can also present object motion, as shown in Fig. 20.4; that is, the changes of the motion process are represented by the relative positions of the elements inside the object. The relative spatial transformation of the object's own geometric elements is also known as the connection mapping.

Fig. 20.4 Grid interpolation of graphic elements

Combined with the analysis above, the frame-extraction operator is F^t and the connection mapping is T^t_ij, so the schematic formula is [9]:

T^t_ij ∘ F^t(P_i) = F^t(P_j), P_j ∈ N(P_i), t ∈ [0, 1]    (20.5)

20.3 Result Analysis

In this paper, global optimization is selected to analyze the two families of methods. On the one hand are methods with geometric constraints at their core, among which the AIAP (fast as-isometric-as-possible) shape interpolation algorithm is the most typical. Assuming the model follows an isometric motion, the geodesic distance between any two points on the surface remains unchanged during the movement; from the manifold perspective, the length of every tangent vector contained in the equidistant surface stays consistent. In other words, the energy to minimize is:

∫₀¹ ∫_φ ‖dτ(t, p)/dt‖² dp dt,  τ(t, p) ∈ T_p(φ)    (20.6)

In the above formula, T_p(φ(S^0, S^1, t)) represents the set of all tangent vectors in the tangent plane of the surface at point p, and τ(t, p) represents any tangent vector in that set. The global optimization algorithm can guarantee certain properties of the interpolation results; as shown in Fig. 20.5, the MSGA and AIAP methods are compared with respect to the quasi-isometric properties of their interpolation sequences. When dealing with large computational problems, global optimization usually handles only the difference between two key frames. The current algorithm uses the velocity field as a guide so that the intermediate shapes form a high-dimensional polyline close to the Riemannian geodesic on the manifold, which involves a large nonlinear optimization problem, and multi-frame interpolation must be tied directly to multi-point geodesics, making the interpolation problem difficult to apply to the multi-frame case. The AIAP method can provide more accurate information on the target


Fig. 20.5 Comparative analysis results of using MSGA and AIAP methods

and clutter likelihood functions through target amplitude information; combining the AIAP method to track multiple targets can effectively improve the detection and tracking performance of the sensor, and a practical auxiliary particle set is constructed with the target amplitude information. In Fig. 20.5, the horizontal axis is the number of interpolation frames and the vertical axis is the maximum edge-length error relative to linear interpolation; the legend percentage indicates the share of edges included in the statistics after poor-quality edges are removed.

On the other hand are methods with physical constraints at their core. Besides the transformation relations of Euclidean space, deformation descriptions in other spaces should also be considered [10]. For example, when physics is combined with planar polygon shape interpolation, the motion between key frames is affected by two forces, bending and stretching, so stretching must be defined and applied on the edges of the drawing and bending solved at the angles; because the overall operation is relatively complex, stroke interpolation of cartoon images can be combined to process the shapes [11]. Some researchers have also proposed considering a reference pose of the periodic mesh, computing the vibration modes by modal analysis and turning them into a spatial basis, so that the interpolation quantities are translated into modal coordinates in the corresponding space. In addition, according to continuum mechanics and elasticity theory, the intermediate frame is guided to deform under linear interpolation. This method resembles the purely geometric methods in that the linearly interpolated tensor cannot guarantee a corresponding real shape, so the result should


be projected back onto the high-dimensional manifold that conforms to the physical hypothesis. In general, physical methods, especially those centered on solid mechanics, require a great deal of computation. Because they consider both the inside and the outside of the object, these methods can effectively avoid self-intersection; however, initialization is required during the calculation, and the final result is affected by that initialization [12].
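Returning to the quasi-isometry comparison of Fig. 20.5, the plotted measure can be computed directly: for each interpolated frame, take the maximum relative deviation of edge lengths from the linear edge-length target. The sketch below is an illustrative implementation of that metric; the function name, array layout, and the fraction of edges kept are assumptions:

```python
import numpy as np

def max_edge_length_error(frames, edges, keep_fraction=0.98):
    """Per-frame max relative edge-length error vs. linear interpolation.

    frames: list of (n_vertices, 3) arrays, frames[0] and frames[-1]
            being the key frames; edges: (n_edges, 2) index array.
    keep_fraction: share of edges kept after dropping the worst ones,
            mirroring the percentage legend described for Fig. 20.5.
    """
    def lengths(v):
        return np.linalg.norm(v[edges[:, 0]] - v[edges[:, 1]], axis=1)

    e0, e1 = lengths(frames[0]), lengths(frames[-1])
    errors = []
    for k, frame in enumerate(frames):
        t = k / (len(frames) - 1)
        target = (1 - t) * e0 + t * e1          # linear target, Eq. (20.2)
        rel = np.abs(lengths(frame) - target) / target
        rel.sort()                              # ascending; tail = worst edges
        kept = rel[: int(len(rel) * keep_fraction)]
        errors.append(kept.max() if len(kept) else 0.0)
    return errors
```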

20.4 Conclusion

To sum up, the innovation of information transmission and reading methods is growing ever stronger. Especially against the background of the big data era, as experience with digital media accumulates, the corresponding art and graphic design has begun to explore in depth in combination with computer-aided algorithms, presenting a more distinctive media art while changing graphic design concepts and modes. Therefore, as digital media develops, enterprises in the industry should strengthen training in computer-aided algorithms and pay attention to technological innovation and research from the perspective of their own long-term development. Only in this way can more distinctive artistic graphics be obtained.

Acknowledgments (1) This work was supported by the Nanchang Key Laboratory of VR innovation development and Application (No. 2018-NCZDSY-001). (2) The research and practice of innovative VR talents training pattern based on the integration of industry and education, taking Digital Media Technology major as an example (jxjg-19-27-1).

References

1. Li, D.: Application and research of computer technology in digital media art. Artist 236(08), 78 (2018)
2. Zhao, H.Y.: Application of virtual reality technology in digital media art creation. Guide Family Life 08, 30–31 (2020)
3. Hao, R.: Application research of dynamic graphics design based on digital technology. Hunan Packag. 2019–2, 27–30 (2021)
4. Liu, J.: Discussion and research on display design based on digital media art. Sci. Educ. Guide (E-Edition) 000(002), 96 (2017)
5. Zhao, N.Y.: "Moving for design"—exploring the teaching of digital media specialty based on the case study of "dynamic graphic design". Beauty Time 748(05), 120–122 (2018)
6. Liu, J.T.: Virtual campus construction based on digital media art. Talent 000(021), 183 (2014)
7. Tang, S., Hanneghan, M.: State-of-the-art model driven game development: a survey of technological solutions for game-based learning. J. Interact. Learn. Res. 22(4), 551–605 (2011)
8. Su, J., Zhang, S.: Research on product shape innovation design method with human-computer interaction through genetic algorithm. In: IEEE International Conference on Computer-Aided Industrial Design & Conceptual Design, vol. 1, pp. 301–305. IEEE (2010)


9. Hao, R.: Research on the application of dynamic graphic design based on digital technology. Hunan Packag. 2019–2, 27–30 (2021)
10. Zhao, H.Y.: The application of virtual reality technology in digital media art creation. Family Life Guide 08, 30–31 (2020)
11. Genda, E.: Digitalization of printing media on graphic design (toward an era of digital designing: form and word on design). Sci. Total Environ., Under Review, 31–36 (2015)
12. Liffiton, M.H., Sakallah, K.A.: Algorithms for computing minimal unsatisfiable subsets of constraints. J. Autom. Reason. 40(1), 1–33 (2008)

Chapter 21

Research on Visual Communication of Graphic Design Based on Machine Vision

Qihao Zhou and Teng Liu

Abstract Visual communication graphic design is a difficult point in modern technology research. Its original intention is to use visual graphics to transmit ideas or ideological content to people; it is a visual image built on creative thinking, from which people can obtain the special language form it contains through their own visual sense. Whether in commercial circulation or social culture, the graphic design of visual communication exerts a profound influence. In this paper, a machine vision graphics processing system is therefore applied to visual communication graphic design; the system structure, functional modules, and design process are outlined, and the approach is verified and analyzed on a practical case of injection mold trademark design and processing. The final results prove that a visual communication graphic design system with machine vision at its core plays a positive role in current technological development.

Keywords Machine vision · Visual communication · Graphic design · CAD · Image processing

21.1 Introduction

Visual communication graphic design is basic content in every field of social development. In the traditional workflow, graphic design and processing comprise several steps: the designer first produces drawings on paper; technical personnel then translate the relevant parameters and requirements into an actual machining process; and finally processing personnel produce the graphic products that meet the predefined design requirements. Because this process relies mainly on manual work, it is not only inefficient but also cannot guarantee product quality. In today's social development, an effective means of improving the efficiency of visual communication graphic design and processing is to comprehensively popularize automated equipment and CAD technology with computer control

Q. Zhou (B) · T. Liu
Wonkwang University, Iksan, Republic of Korea
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
L. C. Jain et al. (eds.), 3D Imaging—Multidimensional Signal Processing and Deep Learning, Smart Innovation, Systems and Technologies 297, https://doi.org/10.1007/978-981-19-2448-4_21


at their core. Much of the advanced processing equipment on the market today develops around automatic machinery replacing manual operation, such as laser cutting and marking machines and pattern engraving machines, where the actual graphic design is performed with CAD drawing software to solve the problems of traditional design. At the same time, such systems store large numbers of fonts and patterns, which indirectly reflects the limitations of their operation [1, 2]: if the design operator lacks graphic design expertise, inputting the design patterns provided by the customer into the system takes considerable time and effort. This paper mainly studies the design and production of visual communication graphics with machine vision at the core, and judges whether it can handle the problems encountered from design to processing. The structure of the system studied in this paper is shown in Fig. 21.1. With the support of the system software, the computer can analyze the image signal in memory and process it through human–computer interaction, outputting a numerical control (NC) code file. The computer can also operate the external equipment directly and run simulation tests on the corresponding machining data. As the execution unit of the whole system, the CNC machining machinery processes the workpiece according to the program instructions issued by the computer. In the hardware design of the system, commercial general-purpose equipment with suitable performance and low price is selected, ensuring that the equipment is widely available and the actual investment cost is low [3].

Fig. 21.1 System structure diagram


21.2 Method

21.2.1 Design Process

Graphic design with machine vision at its core involves two elements: the design process and the design method. The original graphic information must be accurately transformed into visual information after sorting, analysis, and processing, turning abstract concepts into concrete content. From the perspective of practical application, the visualization of graphic design divides into the following points.

First, get the data. Combined with the machine vision structure diagram in Fig. 21.2, data survey is the basic task of graphic design during visual communication design; it helps designers grasp more design parameters and inspiration. When collecting research results and information resources in large quantities, designers need to investigate and analyze the data, which takes longer than the actual design itself [4].

Second, build layers. The visual communication information related to graphic design is composed of many data items. Because the data are presented irregularly, the information they express must not only be translated but also filtered, with the valuable content summarized during design. Mining the key information in the data and clarifying the relationships among the items is therefore an important link in visual communication graphic design. The construction of the information hierarchy should meet the audience's requirements. On the one hand there is the content level: if designers face design chaos when handling large quantities of information, they must group the information, clarify the key and difficult points of information transmission, classify the information according to the different situations and characteristics of the audience, and finally present it in a rationally analyzed state. On the other hand there is the spatial level, which guides

Fig. 21.2 Structure flow chart of machine vision


system design, ensuring that important design information is placed in a prominent position, and then using signs, colors, shapes, and other forms to remind the audience of the manner and order of information observation [5].

Third, transform information. The transformation of graphic information is presented with data information at its core, starting from graphic visualization: abstract information in graphics is transformed into visual symbols, forming a visual language of information transmission. As the basic element of graphic design, the chart is both abstract and specific, and diagrams are a basic guide to graphic information design, so once the icons are defined, the basic representation of graphic information design can be defined.

Fourth, use visual form. When designers work with the software design framework shown in Fig. 21.3, they need to consider visual aesthetics, use visual communication to convey the design intention in time, and effectively integrate the information structure with the form of the information units to produce visual pleasure. Using machine vision to complete the final step of image information processing directly affects the visual effect of the overall graphic design [5].

Fig. 21.3 Software design framework diagram


21.2.2 Visual Analysis

Visual flow refers to the visual communication process of the interface content, which must be analyzed according to human psychological and physiological habits to ensure that the interface design conforms to the characteristics of visual flow. Specifically, it divides into the following points [6].

First, visual flow and visual standpoint. When designing the visual process, research should proceed from the visual standpoint. Generally speaking, the visual center affects the visual standpoint, and the layout density can hinder the transmission efficiency of page information and affect the page's visual communication effect.

Second, process design for different types. The operation interface of graphic design based on machine vision requires users to pay attention to the work object, so the process design must conform to the operation mode and all element layouts must be classified, making the program framework easy to understand and communicate. The specific flow chart is shown in Fig. 21.4.

Finally, design methods. To ensure that the visual communication graphic design process with machine vision at its core presents a good visual effect, the following methods of design analysis are used: first, build unified visual guidance, prompting users to read graphic information according to the prescribed visual program, so that the graphic design can be laid out uniformly and rationally in the specific environment; second, create a visual rhythm, so that changeable and complex information follows basic visual rules under visual guidance and provides an effective reference for user design; third, strengthen contrast. The visual elements displayed by the system interface also have primary and secondary levels, so prominent comparisons should be used during system design, such as arrangement contrast, color contrast, and dynamic–static contrast, which produce a visual buffer while intuitively presenting the design focus [7].

Fig. 21.4 Flow chart of machine vision design analysis


Fig. 21.5 ZR32212 image sensor package diagram

The ZR32212 image sensor is used in this paper. An amplifier is designed for the pixel array, so the pixel response is independent of the CDS circuit and the sensor exhibits extremely low fixed-pattern noise. The analog front end gives each pixel a resolution of 10 bits. This highly parallel approach reduces the speed requirements on each analog circuit and lowers the power loss of the entire circuit. Independently programmable red, green, and blue PGA circuits enable color balance in the analog domain. The image sensor comes in a standard 48-pin LCC package, with the pins defined as shown in Fig. 21.5.

21.3 Result Analysis

Presenting two-dimensional graphics and image information through a variety of media can meet users' visual communication graphic design needs more quickly; engraving, paper-cut decoration, wood carving, and mold marking are all common graphic design content today. Based on the mold trademark pattern designed by a mold company, this paper conducts an empirical analysis to judge whether the visual communication graphic design scheme with machine vision at its core is effective.

The original graphic design method was: first, draw the manuscript pattern at a reasonable scale on coordinate paper; second, select multiple points on the coordinate paper and mark their coordinate values; third, follow


the carving route to connect the points into lines; fourth, design the numerical control instructions and generate the code for the machining files. With the design system of this paper, the specific operation divides into the following steps: first, the machine vision system collects the trademark pattern on the business card; second, image enhancement and wrinkle removal functions are used to obtain a clean design pattern image; third, the contour extraction function automatically obtains the contour of the pattern, with the closed regions identified for attention during machining; fourth, the vectorization function converts the original pattern image into a vector graphics file. The output graphics packet contains the graphic composition, element properties, and their mutual relations; from image design to data output, the operations are handed over to the computer system for automatic processing [1], and a sketch of the contour extraction and vectorization steps is given below.

Comparing the two systems and design forms, it takes less than 30 min to obtain the graphic sample and generate the numerical control code, and the machining process takes about 40 min. The visual communication graphic design with machine vision at its core studied in this paper can therefore not only input and edit multimedia information, but also exhibits the automation, integration, and universality of graphic design, making the graphic design work of various industries more effective [8].
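The contour extraction and vectorization steps can be illustrated with OpenCV. This is only a plausible sketch of the pipeline the chapter describes, not the authors' implementation; the file name, threshold choice, and approximation tolerance are assumptions:

```python
import cv2

# Load the captured trademark image in grayscale (file name assumed).
img = cv2.imread("trademark.png", cv2.IMREAD_GRAYSCALE)

# Enhancement: mild denoising stands in for the "wrinkle removal" step.
smooth = cv2.GaussianBlur(img, (5, 5), 0)

# Binarize so the pattern separates cleanly from the background.
_, binary = cv2.threshold(smooth, 0, 255,
                          cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Contour extraction: closed outer boundaries of the pattern.
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)

# Vectorization: approximate each contour with a polygon whose vertices
# can be written to an NC/vector file as the carving route.
vector_paths = []
for c in contours:
    eps = 0.01 * cv2.arcLength(c, True)   # tolerance relative to perimeter
    poly = cv2.approxPolyDP(c, eps, True)
    vector_paths.append(poly.reshape(-1, 2))

print(f"{len(vector_paths)} closed paths extracted")
```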

21.4 Conclusion

To sum up, computer graphic design languages and visual communication design, as important content of technological innovation at the present stage, have achieved excellent results in many fields of technical research. Optimizing graphic design with machine vision can not only enhance the three-dimensionality and precision of the design content, but also further strengthen people's perception of the design effect. Therefore, while pursuing research on visual communication graphic design, we should on the one hand strengthen the research on machine vision technology systems and keep innovating around the requirements of graphic design; on the other hand, following the development of the times, we should strengthen the training of professional talent, actively learn from excellent machine vision communication graphic design experience at home and abroad, and give priority to advanced technical software and ideas, so as to improve the level of graphic design and present better graphic visual effects.


References

1. Ying, W.: Research on the visual communication design based on technology of computer graphics. Adv. Mater. Res. 846–847, 1064–1067 (2013)
2. Chen, P.: Application research of information graphics design in web site design based on visual communication. In: 2019 International Conference on Smart Grid and Electrical Automation (ICSGEA), pp. 204–207 (2019)
3. Long, H., Su, X.L., Wang, X.T.: Design of embedded high-speed image communication system based on machine vision. Microcomput. Inform. 14, 44–46 (2009)
4. Oszust, M., Wysocki, M.: Recognition of signed expressions using symbolic aggregate approximation. In: International Conference on Artificial Intelligence & Soft Computing, pp. 745–756. Springer, Cham (2014)
5. Ma, Q.: Based on form means of the exhibition design visual communication research. In: IEEE International Conference on Computer-Aided Industrial Design & Conceptual Design, pp. 1760–1763. IEEE (2010)
6. Li, H.D., Miao, C.Y., Yang, Y.L.: Research on conveyor belt image online acquisition technology based on machine vision. Wireless Commun. 003(006), 139–143 (2013)
7. Ouyang, M.Y.: Research on visual communication design system based on embedded network. Microprocess. Microsyst. 81, 103789 (2021)
8. Arvor, D., Durieux, L., Andres, S., Laporte, M.A.: Advances in geographic object-based image analysis with ontologies: a review of main contributions and limitations from a remote sensing perspective. ISPRS J. Photogramm. Remote Sens. 82, 125–137 (2013)

Chapter 22

Research on the Adaptive Matching Mechanism of Graphic Design Elements Based on Visual Communication Technology

Teng Liu and Qihao Zhou

Abstract The operation of the traditional planar graphic element matching system shows that the image matching error is too large during design and application. Combining existing technical software with practical design experience, it is therefore very important to propose an adaptive matching system for planar graphic elements based on visual communication technology. To ensure the orderly operation of this design module, a corresponding inductor circuit board is added while the system software architecture is updated, with the other hardware continuing to run on the original system hardware. In the software design, a contour pixel numerical judgment method is used to obtain the corresponding graphic element features, the HOG algorithm is selected to compute the matching of image elements, and the overall matching process is controlled through programming, so that the system is adaptive. Practical verification shows that the adaptive matching system with visual communication technology at its core has a lower error rate than the original system, demonstrating the system's effectiveness in operation.

Keywords Visual communication technology · Graphic design · Adaptive matching system · Feature extraction

22.1 Introduction

With the rapid development of Internet technology, the pace of innovation in digital image processing systems is also accelerating, making the image information used in daily life and scientific research richer and reducing the difficulty of applying system technology. How to acquire image information quickly, master the features it contains, and improve the matching accuracy of image information is the main subject of current image processing research. In earlier technical research, most researchers proposed using

T. Liu (B) · Q. Zhou
Wonkwang University, Iksan, Republic of Korea
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
L. C. Jain et al. (eds.), 3D Imaging—Multidimensional Signal Processing and Deep Learning, Smart Innovation, Systems and Technologies 297, https://doi.org/10.1007/978-981-19-2448-4_22


the planar image element matching system to carry out the relevant operations, but in practice the matching accuracy of that system is too low. Based on visual communication technology and adaptive technology, this paper therefore continuously strengthens the operational accuracy of the matching system to ensure that the matching results of planar graphic elements meet the expected design requirements. In the context of economic globalization, visual communication technology acts like a messenger, integrating the information and data of different countries, which plays a positive role in cultural development and human progress [1]. As early as the middle of the nineteenth century, European and American countries put forward the theory of visual communication technology in printing art design. Although there are conceptual differences between visual communication and graphic design, there is no contradiction between them: graphic design is a design activity carried out within a space, while visual communication design is more perceptual and intuitive. A systematic study of visual communication technology shows that it is a modern technique that uses visual expression to deliver content to the audience, fully integrating advanced technological concepts with the characteristics of the times. The technique can not only acquire image features quickly but also continuously optimize the accuracy of system matching. To guarantee speed in the adaptive matching system, this paper focuses on system analysis, the design of the hardware structure and integrated circuits, and, in the software design, on the weak area of the matching calculation module, raising the level of the matching calculation while improving the accuracy of the system's image processing [2]. Based on the relationship between visual communication technology and graphic design, this paper discusses how to optimize the results of image feature extraction with visual communication technology and how to improve the original system with a high-precision image matching algorithm. This new design concept can not only remedy the defects of the original system's operation but also be applied reasonably in other fields.

22.2 Method

22.2.1 Hardware Design

Analysis of the problems of the original system during operation shows that, to optimize the system design with visual communication technology, some of the hardware must be improved. The updated hardware framework involves the communication interface, control circuit, image detection circuit, power supply, controller, and so on. Since the circuit output frequency of the original matching system is low, it is difficult to meet the operating requirements of the system software. The AD9850 dedicated chip is therefore selected for the updated circuit and serves as the control center for circuit signal output. The communication interface of the


system should use the RS-232 standard to improve the transmission efficiency of the system's communication architecture and to guarantee the operating effect and calculation accuracy of the adaptive matching system. The circuit design involves three parts, the inductance board, the capacitor board, and the circuit board, and the optimization of the hardware framework focuses on the inductance board. The improved board supplies the system with 10 V and 6 V DC, delivering the basic elements of the image in practical operation. At the same time, the generator on the inductance board can transmit the image signal accurately, which helps further strengthen the matching level of system operation. Image feature data are uploaded in time and the corresponding processing results are transferred to the PC host, so that the data can be matched and analyzed accurately. In addition, the detector on the inductance board keeps the circuit at normal temperature during operation, ensuring the stability of the whole system. The improved circuit is applied to the original system while the rest of the original hardware remains unchanged, and the above hardware becomes the implementation environment of this design study [3].

22.2.2 Software Design

Taking the hardware framework and circuit designed above as the basic conditions of this system design, the errors and related problems of the original system during matching are examined and the system performance is optimized. By redesigning the image matching operation module of the original system, the optimized software framework shown in Fig. 22.1 is obtained.


Fig. 22.1 System framework

process is shown in Fig. 22.2 [5]: The element matching operation by HOG algorithm is used as a flow diagram for extracting features. Combining with the above analysis shows that in the graphic design elements during the running of adaptive matching system, obtain element characteristics of images, not only to manage according to the characteristics of construct the corresponding database storage, and use characteristics table stored in the form of relevant information, to ensure that eventually acquire the characteristics of the results with integrity and effectiveness [6]. On the other hand, the matching algorithm. According to the above operations, the element feature group of the image is obtained, and the HOG algorithm is used for element matching operation. During the acquisition of image features, transform are preprocessed to ensure that they have standardized gamma color space Q(x, y) and gamma space Z(x, y)gamma , so as to compress the image. The specific calculation Q(x, y)Z (x, y)gamma :in the above formula, Z represents the vector gradient of image elements in space, and (x, y) represents the spatial coordinates of image elements. Accurate calculation of gradient vector can intuitively present some features of the image and obtain corresponding edge information. The actual calculation results are shown as follows:


Fig. 22.2 Flow chart of feature extraction



$$o_x(x, y) = t(x + 1, y) - t(x - 1, y), \qquad o_y(x, y) = t(x, y + 1) - t(x, y - 1) \qquad (22.1)$$

In the above formula, $o_x(x, y)$ represents the gradient of an element in the horizontal direction of the image, $o_y(x, y)$ the gradient in the vertical direction, and $t$ the pixel value. Once the direction during image matching has been determined from the formula above, the relevant calculation can be carried out with:

$$\beta(x, y) = \arctan \frac{o_x(x, y)}{o_y(x, y)} \qquad (22.2)$$
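As an illustration only, here is a NumPy sketch of Eqs. (22.1) and (22.2); the function name and the edge-padding choice are mine, not the paper's.

```python
import numpy as np

def gradient_orientation(t):
    # Central differences per Eq. (22.1) on the pixel-value grid t,
    # then the orientation angle beta per Eq. (22.2) at each pixel.
    t = t.astype(np.float64)
    p = np.pad(t, 1, mode="edge")      # replicate borders so shapes match
    ox = p[1:-1, 2:] - p[1:-1, :-2]    # t(x+1, y) - t(x-1, y)
    oy = p[2:, 1:-1] - p[:-2, 1:-1]    # t(x, y+1) - t(x, y-1)
    beta = np.arctan2(ox, oy)          # arctan(ox / oy), quadrant-safe
    return ox, oy, beta
```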


22.3 Result Analysis

According to the above formulas, not only can the orientation of image elements be defined, but the accuracy of image matching can also be improved. The programming method selected in this design further improves the adaptive matching performance of the system. Based on the hardware design above, the optimization of the original system is studied, and the design of an adaptive matching system for graphic design elements with visual communication technology at its core is discussed. Combined with the software framework and system modules proposed above, the design and analysis of the adaptive matching system for plane image elements is completed. In order to quickly grasp the application of the system studied in this paper, as well as its handling of the original system's problems, the system operation must be tested so that the performance differences between the systems can be measured.

First, build the system test platform. A PC is selected to run the design simulation; the specific operating environment is shown in Table 22.1. On this test platform, a variety of plane images are selected for element matching tests, and the performance difference between the original system and the system studied in this paper is compared to obtain the final test results. It should be noted that, to ensure the rationality of the test results, the selected image samples are matched against the image elements pre-stored in the platform, so as to determine the matching error probability of each system.

Second, design the test indicators. Based on the problems in the original system's operation, an adaptive matching system of image elements with visual communication technology at its core was designed. To measure the performance difference between the two systems, the error probability of image element matching is selected as the test index, and the specific error after image matching is compared, so as to evaluate the performance of the system designed in this paper.

Table 22.1 Operating environment

| Structure | Content | Parameter |
|---|---|---|
| Software part | Operating system | Windows 7 |
| | Development tools | VS2010, Qt, OpenCV |
| Hardware part | Processor | Intel Core series |
| | Memory | 6 GB |
| | Hard disk | 200 GB |


Table 22.2 Error rate comparison results of system matching (%)

| Image sample number | Original system matching error rate | Proposed system matching error rate |
|---|---|---|
| 1 | 9.8 | 4.3 |
| 2 | 9.2 | 4.5 |
| 3 | 9.6 | 4.1 |
| 4 | 9.4 | 3 |
| 5 | 10 | 3.7 |
| 6 | 9 | 3.3 |
| 7 | 9.9 | 3.7 |
| 8 | 9.2 | 4.4 |
| 9 | 9 | 4.1 |
| 10 | 9.5 | 4.5 |

During the operation of the platform, the actual processing results are sent back to the control PC to calculate the error probability of the image matching results, and the calculated results are stored as data tables in the module, providing an effective basis for subsequent research.

Finally, the test results. Twelve groups of images were included in the test sample; because of distortion in early processing, two groups had to be discarded, leaving 10 groups for matching analysis. The error rates (%) of image matching for the original system and for the system studied in this paper are shown in Table 22.2. The table shows that the error probability of the original system during matching lies between 9 and 10%, while that of the system studied in this paper lies between 3 and 5%, so the error probability of the former is consistently higher than that of the latter. Even the lowest error probability of the original system, 9%, exceeds the highest of this study, 4.5%. The system studied in this paper therefore effectively addresses the problems of the original system and ensures that the optimized image meets the standard requirements, so the adaptive matching system of graphic design elements with visual communication technology at its core can be widely promoted in current social development [7].

Aiming at the large image matching error rate of the original plane-graphic-element matching system, an adaptive matching system of plane graphic elements based on visual communication technology was proposed and designed, and the software modules were built to ensure its normal operation.


A system test environment was built, test indicators were set, and test results were obtained; these prove that the image matching error rate of the proposed system is much lower than that of the original system. In conclusion, the adaptive matching performance of graphic elements in this system is better than that of the original system.

22.4 Conclusion

To sum up, amid continuous innovation in the social economy and in science and technology, the requirements for image data processing keep rising. Obtaining more effective image information and improving the accuracy of information matching have gradually become the main research topics of digital image development. Based on an understanding of how the traditional adaptive matching system for graphic design elements operates, and on the relationship between visual communication technology and graphic design, this paper discusses how to use visual communication technology to optimize the results of image feature extraction and how to use a high-precision image matching algorithm to improve on the original system. This new design concept not only remedies the defects of the original system but can also be reasonably applied in other fields.

References

1. Zhang, J.: Interaction design research based on large data rule mining and blockchain communication technology. Soft. Comput. 24(21), 16593–16604 (2020)
2. Wang, B., Fan, T., Li, Z.Y., et al.: Research and analysis on the to be coordination mechanism of financial innovation and economic growth based on BP neural network. J. Intell. Fuzzy Syst. 37(5), 6177–6189 (2019)
3. Yao, Y., Chen, M.: The design of adaptive communication frame supporting high-speed transmission based on ModBus protocol. Procedia Comput. Sci. 183(8), 551–556 (2021)
4. Ma, C., Pan, Y.H., Zeng, C.Y.: Intelligent interaction design research based on block chain communication technology and fuzzy system. J. Intell. Fuzzy Syst. 1, 1–7 (2020)
5. Arakelian, V., Zhang, Y.: An improved design of gravity compensators based on the inverted slider-crank mechanism. J. Mech. Robot. 11(3), 034501 (2019)
6. Qiao, Y.P.: Research on graphic design method based on virtual vision technology. Contemp. Tourism 12, 339–339 (2019)
7. Zhao, R.J.: Research on the construction of visual aesthetic elements in graphic design. Tokyo Lit. 000(006), 47–48 (2018)

Chapter 23

Design of Intelligent Recognition English Translation Model Based on Improved Machine Translation Algorithm

Ting Deng

Abstract Whether for learning a language or traveling abroad, translation is a skill that must be mastered. Traditional human translation is too slow and expensive. Therefore, as artificial intelligence and the translation industry integrate and develop, intelligent translation has become a main direction of scientific research and exploration. Building on the current state of development of intelligent translation, this paper designs an intelligent recognition English translation model based on an improved machine translation algorithm and analyzes its value from the perspective of practical application. An intelligent recognition model for English translation is proposed to address the low accuracy of the traditional algorithm model. Based on the improved algorithm framework, an intelligent recognition model covering automatic search, data acquisition, processing, and output planning is designed to collect and process English signals in a planned way and extract characteristic parameters, realizing intelligent recognition of English translation. Experimental analysis shows that the designed English translation recognition model has high accuracy and can meet the needs of English translation work.

Keywords Translation algorithm · Intelligent identification · Translation model · Data collection · Data combining

23.1 Overview of Intelligent Translation

Against the background of economic globalization, information flows between countries at high speed, and English is currently one of the most common languages of international communication. As intelligent recognition technology shows its application value in every field, the intelligent recognition translation model based on an improved machine translation algorithm has been comprehensively promoted [1–4].


Artificial intelligence (AI) translation, which combines a number of technological forms, originated in the 1930s and was not applied to the market until computer science and technology came into wide use. In essence, early AI translation used simple mechanical devices to translate words; in other words, dictionaries were used to transform individual words. Nowadays, AI translation is applied mainly in two directions: machine translation on the one hand and speech recognition on the other. The former refers to using computers to translate one language into another without manual processing; the latter refers to using computers to recognize and understand speech signals, transform them into text on the basis of signal processing, and then carry out further processing [5–7].

23.2 Design and Analysis of Intelligent Recognition English Translation Model Based on Improved Machine Translation Algorithm

23.2.1 Applications

With the rapid development of intelligent recognition technology, a large number of high-quality intelligent machine translation tools have appeared on the market. There are still defects in practical application, however, especially at full-text scope: when a server is used to compare and learn from data in different languages, the grammar and character rules obtained across languages are not very accurate, so in-depth discussion and comprehensive improvement are needed. For example, if a user is unsure of the machine translation of "exchange rate," experience analysis can yield "rate"; entering this word in a search engine for analysis then leads to the accurate machine translation of "exchange rate."

23.2.2 Design Model

Assume the Chinese sentence is f and the English sentence is e. The probability that e is translated into f by the machine is

$$p(e \mid f) \qquad (23.1)$$

and the problem of machine-translating f into e can be viewed as the computation

$$\hat{e} = \arg\max_{e} P(e \mid f) \qquad (23.2)$$
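Purely as an illustration of the computation in Eq. (23.2) under the noisy-channel decomposition derived next (P(e | f) is proportional to P(e) · P(f | e)), here is a toy sketch; the candidate list and the two probability callables lm_prob and tm_prob are assumptions, not the paper's API.

```python
import math

def decode(f, candidates, lm_prob, tm_prob):
    # Pick the English hypothesis e maximizing P(e) * P(f | e),
    # i.e. the arg max of Eq. (23.2) after Bayes' rule. Summing
    # log-probabilities avoids underflow on long sentences.
    return max(candidates,
               key=lambda e: math.log(lm_prob(e)) + math.log(tm_prob(f, e)))
```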


The derivation of this formula is as follows. Under the noisy-channel model, statistical machine translation retrieval is guaranteed to satisfy the formula above. Common instances of the decomposition are:

- Speech recognition: P(text | speech signal) is proportional to P(speech signal | text) · P(text)
- Machine translation: P(translation | source text) is proportional to P(source text | translation) · P(translation)
- Spelling correction: P(correct text | erroneous text) is proportional to P(erroneous text | correct text) · P(correct text)

If the lengths of the English string e and the Chinese string f are l and m, respectively, it follows that

$$f = f_1^m = f_1 f_2 \ldots f_m \qquad (23.3)$$

The alignment a uses the position information of each value to give, for each word of the Chinese sentence, the position of the corresponding English word, so

$$a = a_1^m \equiv a_1 a_2 \ldots a_m \qquad (23.4)$$

At this point each value $a_j$ ranges over $[0, l]$, and the following formula is obtained:

$$P(f, a \mid e) = p(m \mid e) \prod_{j=1}^{m} p\left(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e\right) \, p\left(f_j \mid a_1^{j}, f_1^{j-1}, m, e\right) \qquad (23.5)$$

The formula is analyzed according to the alignment process between Chinese and English sentences: from the English sentence, the length of the Chinese sentence is obtained, then the position of the first Chinese word string, and then the first word of the Chinese sentence; cycling through this process completes the sentence translation. To simplify the formula above, the IBM machine translation model sets certain preconditions. First, $P(m \mid e)$ is assumed to have no correlation with the target language e or the length l. Second, if

$$p\left(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e\right) \qquad (23.6)$$

depends only on the target-language length l, one gets

$$p\left(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e\right) = \frac{1}{l+1} \qquad (23.7)$$

Third, suppose $p(f_j \mid a_1^{j}, f_1^{j-1}, m, e)$ depends only on $f_j$ and $e_{a_j}$; then, writing $\varepsilon \equiv P(m \mid e)$, one gets

$$t\left(f_j \mid e_{a_j}\right) = p\left(f_j \mid a_1^{j}, f_1^{j-1}, m, e\right) \qquad (23.8)$$

where $t(f_j \mid e_{a_j})$ is the probability of $f_j$ given $e_{a_j}$.

After the Lagrange multipliers $\lambda_e$ are introduced, the corresponding auxiliary function of the translation model is

$$h(t, \lambda) = \frac{\varepsilon}{(l+1)^m} \sum_{a_1=0}^{l} \cdots \sum_{a_m=0}^{l} \prod_{j=1}^{m} t\left(f_j \mid e_{a_j}\right) - \sum_{e} \lambda_e \left( \sum_{f} t(f \mid e) - 1 \right) \qquad (23.9)$$
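For illustration, a minimal expectation-maximization sketch that estimates the lexical table t(f | e) maximized in Eq. (23.9). The tokenized sentence pairs, the NULL token for alignment position 0, and the initialization value are my assumptions; real systems add smoothing and pruning.

```python
from collections import defaultdict

def ibm1_em(pairs, iters=10):
    # pairs: list of (chinese_tokens, english_tokens) sentence pairs.
    t = defaultdict(lambda: 1e-3)          # t[(f, e)] ~ t(f | e), near-uniform start
    for _ in range(iters):
        count = defaultdict(float)         # expected counts c(f, e)
        total = defaultdict(float)         # normalizers per English word
        for f_sent, e_sent in pairs:
            e_null = ["NULL"] + e_sent     # alignment position 0 of Eq. (23.7)
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_null)
                for e in e_null:
                    c = t[(f, e)] / z      # posterior of aligning f to e
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():    # M-step: t(f | e) = c(f, e) / c(e)
            t[(f, e)] = c / total[e]
    return t
```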

According to the IBM statistical model outlined above, the model is transformed into a reverse machine translation model, which can use the maximum-entropy statistical machine method to perform accurate machine translation of English. With this transformation of the statistical source-channel model, the efficiency and accuracy of the machine translation model are effectively improved, and maximum likelihood estimation can be used to obtain the improved parameters:

$$\hat{\theta} = \arg\max_{\theta} \prod_{s=1}^{S} p_{\theta}(f_s \mid e_s) \quad \text{and} \quad \hat{\gamma} = \arg\max_{\gamma} \prod_{s=1}^{S} p_{\gamma}(e_s) \qquad (23.10)$$

The improved formula is as follows:

$$\hat{e}_1^l = \arg\max_{e_1^l} \left\{ p_{\hat{\gamma}}\left(e_1^l\right) \cdot p_{\hat{\theta}}\left(f_1^j \mid e_1^l\right) \right\} \qquad (23.11)$$

After the new attribute is incorporated, $p_{\hat{\theta}}(f_1^j \mid e_1^l)$ is replaced by $p_{\hat{\theta}}(e_1^j \mid f_1^l)$, and the final extended statistical machine translation framework is

$$\hat{e}_1^l = \arg\max_{e_1^j} \left\{ p_{\hat{\gamma}}\left(e_1^l\right) \cdot p_{\hat{\theta}}\left(e_1^j \mid f_1^l\right) \right\} \qquad (23.12)$$

from which retrieval can proceed, and more accurate translation results can be obtained [8–11]. Machine translation (MT) applications in the field of intelligent English recognition are mainly of two types: one takes rules as its core method, the other takes a corpus as its core. The former, from the angle of rationalism, looks for the common rules of language, whereas the latter is empirical, studying translation on the basis of large quantities of real text. According to


Fig. 23.1 Evaluation results of machine translation algorithms with different methods

Fig. 23.2 Source-channel model structure of machine translation

existing research results, translation quality can be guaranteed while the actual translation speed is improved. The evaluation results of machine translation systems using different methods, shown in Fig. 23.1, indicate that the best-performing ISI system, proposed by the Information Sciences Institute of the University of Southern California, reaches an actual BLEU4 value of 0.3393; Google's system is second only to the ISI system, with an actual value of 0.3316. This paper mainly discusses the machine translation model with the word method at its core. The corresponding source-channel model, shown in Fig. 23.2, was proposed by Brown et al. of the IBM T.J. Watson Research Center [12]. The empirical study takes short-sentence extraction as an example and constructs the machine translation algorithm model shown in Fig. 23.3. For the correspondence between source language and target language, the constraint conditions are satisfied and the phrase translation probability table is trained, providing decoding services for the later stage of translation. By planning and analyzing the intelligent recognition model over the processes of data collection, processing, and output, the intelligent recognition architecture for English translation shown in Fig. 23.4 can be constructed. First of all, the model design process should be established to clarify the application and implementation functions of the overall model. According to the design flow chart of the model (Fig. 23.5), the model studied in this paper collects and processes data before output: a data acquisition device gathers the speech signals, an audio input device passes them to the signal processing system, the data signals are processed scientifically, and the obtained results are finally output to the client and the display; the user views the translation recognition results on the display or client.


Fig. 23.3 Training flow chart of short sentence translation model

Secondly, the English signals must be processed. Because English speech signals contain many interference factors, the accuracy of the speech signal information cannot be guaranteed; to improve signal accuracy, the collected signals are processed as shown in Fig. 23.6. Finally, the characteristic parameters are extracted; the parameter extraction structure shown in Fig. 23.7 provides an effective basis for subsequent calculation. To verify the effect of the improved machine translation algorithm in the intelligent recognition English translation model, the experiment in this paper selects 400 character-proofreading words and 500 short-article-proofreading words and carries out an empirical analysis at a word recognition speed of 25 kB/s. The translation results are shown in Table 23.1: with the improved machine translation algorithm, the mean accuracy of the English translation results rises from 69.06% before proofreading to 98.74% after proofreading.
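The paper does not name the exact characteristic parameters extracted in the stage of Fig. 23.7; purely as a sketch of that stage, the following computes log power spectra per frame after pre-emphasis and Hamming windowing. All frame sizes and the feature type are assumptions.

```python
import numpy as np

def speech_features(signal, rate=16000, frame_ms=25, step_ms=10):
    # Pre-emphasis boosts high frequencies that carry consonant detail.
    signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    flen = int(rate * frame_ms / 1000)
    step = int(rate * step_ms / 1000)
    n = 1 + (len(signal) - flen) // step   # assumes len(signal) >= flen
    frames = np.stack([signal[i * step:i * step + flen] for i in range(n)])
    frames *= np.hamming(flen)             # taper frame edges
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / flen
    return np.log(power + 1e-10)           # one feature row per frame
```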


Fig. 23.4 Architecture diagram of intelligent recognition English translation model based on improved machine translation algorithm

This proves that the intelligent recognition English translation model based on the improved machine translation algorithm is strongly effective. Meanwhile, the distributions of recognition nodes shown in Figs. 23.8 and 23.9 indicate that the improved machine translation algorithm checks the distribution of the points monitored by the system more accurately, identifies problems of contextual incoherence, and finally obtains translation results that meet the requirements of the context. In general, aiming at the problems of the traditional machine translation algorithm, this paper puts forward an improved machine translation algorithm and builds an intelligent recognition model for English translation. At its center it designs an improved phrase-structure-based algorithm, strengthens the linear-table syntax function, and optimizes part-of-speech recognition and the English–Chinese structure, which improves on the accuracy of the traditional algorithm during identification and provides a more effective way to recognize short sentences. The experimental results show that an intelligent recognition English translation model built with the improved machine translation algorithm can guarantee the accuracy of English–Chinese machine translation while the system processes the English corpus, so as to meet the needs of actual users and solve the problems of traditional machine translation.


Fig. 23.5 Flow chart of model design

Fig. 23.6 Flow chart of English signal processing

23.2.3 Experimental Analysis

An improved phrase-structure-based algorithm is designed at the center of the English translation intelligent recognition model established above, and practical application tests are carried out here to demonstrate the applicability of the model. A Chinese–English parallel corpus of 586,538 sentence pairs provided by an enterprise is selected; 1000 sentences are randomly chosen for testing, 2000 for development, and the rest for training.


Fig. 23.7 Structure diagram of parameter extraction

Table 23.1 Translation accuracy before and after proofreading

| Serial number | Accuracy before proofreading (%) | Accuracy after proofreading (%) |
|---|---|---|
| 1 | 58.2 | 99.1 |
| 2 | 72.4 | 98.6 |
| 3 | 67.5 | 98.4 |
| 4 | 72.1 | 99.1 |
| 5 | 75.1 | 98.5 |
| Mean | 69.06 | 98.74 |

Fig. 23.8 Distribution of node control points in traditional text recognition system


Fig. 23.9 Distribution of node control points of the system based on the improved algorithm

In the experimental design and analysis, the test corpus is divided by sentence length into three test sets: simple sentences, general sentences, and complex sentences, as shown in Table 23.2. The experimental analysis mainly uses the above model and the traditional machine translation method based on syntactic analysis to calculate BLEU values on these test sets; the results are given in Table 23.3. Here the BLEU value is an automatic evaluation metric for machine translation: the larger the value, the better the machine translation (a sketch of the metric follows Table 23.2). The comparative analysis of the tables shows that the handling of over-long sentences in this design is far better than with traditional methods; the best combination of different English language features in complex sentences can be obtained from it, and ambiguity in some language structures can be resolved, improving the efficiency of practical translation. Of course, in addition to the above translation model, machine translation methods based on specific forests and tree-based translation can also be compared from other perspectives. The final result is shown in Fig. 23.10; the method outlined in this paper still has the highest decoding performance.

Table 23.2 Classification of the test set

| | 8L32 | 6L32 | C32 ACERT | C18 ACERT |
|---|---|---|---|---|
| Speed/rpm | 720 | 720 | 1800 | 1800 |
| Number of cylinders | 8 | 6 | 12 | 6 |
| Excitation frequency | 48.0 | 36.0 | 180.0 | 90.0 |
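Since the BLEU4 metric is quoted above, here is a minimal sentence-level sketch of it (token lists in, score out). Real evaluations, such as the system comparison in Fig. 23.1, use corpus-level statistics, multiple references, and smoothing; this simplified form is mine.

```python
import math
from collections import Counter

def bleu4(candidate, reference):
    # Geometric mean of 1- to 4-gram precisions times a brevity penalty.
    precisions = []
    for n in range(1, 5):
        cand = Counter(tuple(candidate[i:i + n])
                       for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n])
                      for i in range(len(reference) - n + 1))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty punishes candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)
```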

Table 23.3 Comparative analysis results of translation

| Physical quantity | System of units | Physical quantity | System of units |
|---|---|---|---|
| Length | m | Speed | m/s |
| Mass | kg | Acceleration | m/s² |
| Time | s | Density | kg/m³ |
| Area | m² | Force | N = kg·m/s² |
| Volume | m³ | Modulus of elasticity, stress | Pa = N/m² |

Fig. 23.10 Decoding performance results of different methods

23.3 Conclusion

To sum up, amid continuous innovation in the field of AI translation, the design and application of an intelligent recognition English translation model based on machine translation are the focus of sustainable development. The algorithm is used to create an intelligent recognition model covering automatic search, data acquisition, processing, and output planning; English signals are collected and processed in a planned way, and characteristic parameters are extracted to realize intelligent recognition of English translation. This fundamentally solves the low accuracy of the traditional algorithm model, and the English translation recognition model designed here has high accuracy and can meet the needs of English translation work.

References

1. Liu, J.: English machine translation model based on modern intelligent recognition technology. In: 2020 International Conference on Computer Engineering and Application (ICCEA), pp. 444–447. IEEE (2020)
2. Lin, L., Liu, J., Zhang, X., Liang, X.: Automatic translation of spoken English based on improved machine learning algorithm. J. Intell. Fuzzy Syst. 40(2), 2385–2395 (2021)
3. Li, B.: Study on the intelligent selection model of fuzzy semantic optimal solution in the process of translation using English corpus. Wirel. Commun. Mob. Comput. 5, 1–7 (2020)
4. Liu, J., Lin, L., Liang, X.: Intelligent system of English composition scoring model based on improved machine learning algorithm. J. Intell. Fuzzy Syst. 40(2), 2397–2407 (2021)
5. Song, X.: Intelligent English translation system based on evolutionary multi-objective optimization algorithm. J. Intell. Fuzzy Syst. 10, 1–11 (2020)
6. Ji, C.Y., Xiong, Z.J., Hou, Y.F., Chen, M.: Design of network intelligent translation system based on human-computer interaction. Autom. Instrum. 08, 25–28 (2019)
7. Bi, S.: Intelligent system for English translation using automated knowledge base. J. Intell. Fuzzy Syst. 39(5), 1–10 (2020)
8. Wang, P., Cai, H.G., Wang, L.K.: Design of intelligent English translation algorithms based on a fuzzy semantic network. Intell. Autom. Soft Comput. 26(3), 519–529 (2020)
9. Yang, H., Yang, Y.: Design of English translation computer intelligent scoring system based on natural language processing. J. Phys.: Conf. Ser. 1648(2), 022084 (2020)
10. Wen, H.: Intelligent English translation mobile platform and recognition system based on support vector machine. J. Intell. Fuzzy Syst. 38(153), 1–12 (2020)
11. Ding, Q., Ding, Z.: Machine learning model for feature recognition of sports competition based on improved TLD algorithm. J. Intell. Fuzzy Syst. 40(1), 1–12 (2020)
12. Kirz, J., Rarback, H., Shu, D., et al.: IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA

Chapter 24

Ore Detection Method Based on YOLOv4

Taozhi Wang

Abstract Traditional ore identification methods usually analyze the physical or chemical properties of ore samples to reach a conclusion, but those methods are not efficient when faced with large quantities of samples. In this paper, the YOLOv4 object detection algorithm is introduced and trained as a classifier of 7 common ores, which can predict the class and position of different kinds of ore samples in an image. The model has high recognition accuracy, and its loss function converges easily. Moreover, the model adopts several data augmentation methods and utilizes Auto Multi-Scale Retinex with Color Restoration (MSRCR) to reduce the influence of the photographic environment of the ore specimen. The paper also uses the random forest algorithm to enhance the adaptability of the model in complex environments. The model has the advantages of low cost, short detection time, and strong generalization ability, covering most application scenarios from raw ore to ore products.

Keywords Ore identification · YOLO · Model · Random forest algorithm · Data augmentation

24.1 Introduction

Traditional ore identification methods are based on the physical characteristics or chemical composition of ore samples. By analyzing the color, gloss, streak, transparency, cleavage, hardness, and elemental composition of ore samples, researchers can obtain a classification result with high accuracy [1, 2]. However, this approach has a high cost and low efficiency when facing a large number of samples, which makes it difficult to apply on a larger scale. Furthermore, these methods usually need special hardware or environmental conditions and place certain technical requirements on users. Traditional ore classification methods thus demand a high level of expertise, which limits their application scenarios.


Previously, neural networks were mainly used in the analysis and interpretation of geological data, grade estimation of ore, geotechnical engineering, mining method selection, and production capacity prediction in mining [3]. In the field of computer vision, image recognition algorithms such as convolutional neural networks (CNN), BP neural networks, or R-CNN are used to identify or classify ore images. This paper uses image data of seven common minerals; after data preprocessing, I train the network of the YOLOv4 object detection algorithm. The model has extremely high detection efficiency while maintaining accuracy, can classify and make predictions for seven kinds of raw ores and their ore products, and can locate the ore in the picture. In the experiments it was found that images of individual ore samples can show serious color differences due to different shooting conditions and the cross-sections selected; in those cases it is difficult for the classifier to give an accurate classification based on features from the original image. Moreover, the model has unsatisfactory accuracy on images with low resolution or with obstructions. To solve these problems, I utilize the random forest algorithm, which can provide an alternative result, improving the accuracy of the classifier by combining the results of the random forest and YOLOv4. Furthermore, this paper tries several data augmentation methods and Auto MSRCR to improve the generalization ability of the model.

The main feature of this model is fast and low-cost identification of ore. It is suitable for large-scale preliminary detection or for identification as an auxiliary device, such as robot recognition, pipeline recognition, sorting large numbers of samples, or deployment on mobile devices to assist professionals in judging. The method uses the image data of ore samples to train the YOLOv4 object detection model for prediction, so large-scale detection can be achieved in a short time. The detection method also has strong generalization ability and can be applied in most scenarios. The detection process does not need to contact or move the sample, so the data collection requirements are low. We also find that, by applying data augmentation, the model can reach high accuracy with only a small number of samples. In addition, the variable cost of this method is close to zero: as detection volume increases, the short-run average cost keeps declining. The fixed cost is also low, since the detection system only needs image data of the ore samples.

24.2 Related Work

24.2.1 Ore Classification

In recent years, researchers have attempted different kinds of algorithms for ore classification. Laine et al. used an online clustering algorithm and the Kohonen feature


map to classify ore from the Outokumpu Hitura mine [4]. Singh, Singh, and Singh classified ores for blast furnace feed based on the visual texture of the ore particles [5]. A multi-class support vector machine (SVM) model has also been developed for monitoring iron ore classes [6]. With the advancement of neural networks, more and more researchers use deep learning to classify ore images. Naresh et al. [7] use RGB or grayscale images to train a neural network with 27 numerical parameters. Xiao et al. [2] proposed a neural network framework based on visible near-infrared spectroscopy of iron ore. This method provides an accurate and economical way for the primary selection of iron, but it only covers ore samples from a specific area, and iron ore samples must be collected and their spectral data measured with spectrometers before training; these data acquisition requirements make it difficult to apply to other fields. Furthermore, Singh and Mohan Rao [8] also use neural network techniques to remove gangue material from ore particles.

24.2.2 YOLOv4

Traditional object detection algorithms such as R-CNN or Fast R-CNN first generate object candidate boxes and then perform regression training [9]. The YOLO (You Only Look Once) algorithm, proposed by Redmon et al. [10] in 2016, completes object classification and object localization in one step, which leads to faster detection than two-stage methods: the object category and detection box are regressed in the output layer, improving detection efficiency. Bochkovskiy et al. proposed YOLOv4 [11] in 2020. Based on the original YOLO detection structure, YOLOv4 adopts several CNN optimization strategies from recent years, including mosaic data augmentation, cross mini-batch normalization, self-adversarial training, and other methods, giving a large improvement in accuracy. The backbone network of YOLOv4 is CSPResNext50, and the detection head uses the YOLOv3 model. The model also incorporates residual block structures and changes the activation functions and the bounding-box regression method (Fig. 24.1). Compared with other object detection algorithms, YOLOv4 has relatively higher accuracy and detection speed; therefore, this paper chooses it as the basic classification algorithm.

24.2.3 Random Forest

The random forest algorithm uses bootstrap resampling to build multiple decision trees, randomly selects samples for prediction, and then combines all the


Fig. 24.1 Network structure of YOLOv4

results to predict the data [12]. The algorithm handles input samples with high-dimensional features well. In this paper, I crop the ore regions of the training-set pictures as new training data, then resize the photos and transform them into a training matrix. Finally, the classifier is fitted, and its prediction is used as an alternative answer when the object detection method cannot work effectively.

24.3 Experiments

24.3.1 Implementation Details

The experiments in this paper are all done on the Windows 10 operating system, without GPU acceleration, using Python 3 and the PyTorch framework. The dataset comes from a public dataset on the Kaggle platform [13] by Albert Klu of the Geological Engineering Department of the University of Mines and Technology. It contains pictures and annotations of seven common minerals. I selected 744 pictures in seven categories: biotite, bornite, chrysocolla, malachite, muscovite, pyrite, and quartz. The training set for each type of ore contains about 50–150 images, and the data used in the training and validation sets were normalized before training. The training pictures cover various content, from raw ores and polished ores to ore products, meeting the requirements of the ore classification use scenarios. Figure 24.2 shows an ore sample from the training set.


Fig. 24.2 An ore sample from the training set. The annotation of the photo is "chrysocolla"

24.3.2 YOLOv4 Object Detection

All the data were shuffled and screened, and 744 pictures were finally selected. For each image I drew the bounding box of the ore and labeled it, obtained the labels in VOC format, and converted them into the YOLOv4 dataset label format. I then used the processed dataset to train the model. After parameter adjustment, some of the hyperparameters of the model are set as follows: input shape (416, 416); learning rate 1e−3; batch size 16; total epochs 47; validation set 10% of the data.

The model also uses a series of optimization methods. The dataset is relatively small and the ore features are not obvious, so the model was found to be prone to overfitting in earlier experiments. I utilize DropBlock normalization [14] to reduce this problem: DropBlock drops contiguous regions with the intention of removing specific semantic information, so the weight of information in other parts is strengthened, and the network yields a simpler model with less overfitting. Moreover, the model uses the Adam optimization algorithm, which combines momentum and RMSProp and uses an exponential moving average to compute the first-order momentum, configured as

optimizer = optim.Adam(net.parameters(), lr, weight_decay=5e-4)  (24.1)
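Since the chapter states the model is built in PyTorch, here is a minimal sketch of the DropBlock regularization described above [14]. This is my illustrative implementation (assuming an odd block_size and feature maps at least block_size wide), not the code used in the paper.

```python
import torch
import torch.nn.functional as F

def drop_block2d(x, drop_prob=0.1, block_size=7, training=True):
    # Drop contiguous block_size x block_size regions of the feature map
    # during training, so the network cannot rely on local clusters of
    # activations (unlike plain dropout, which drops isolated units).
    if not training or drop_prob == 0.0:
        return x
    n, c, h, w = x.shape
    # Scale the Bernoulli rate so roughly drop_prob of activations
    # end up inside dropped blocks.
    gamma = (drop_prob * h * w
             / block_size ** 2
             / ((h - block_size + 1) * (w - block_size + 1)))
    # Sample block centers only where a full block fits.
    seeds = (torch.rand(n, c, h - block_size + 1, w - block_size + 1,
                        device=x.device) < gamma).float()
    seeds = F.pad(seeds, [block_size // 2] * 4)
    # Expand each sampled center into a full block via max pooling.
    mask = 1.0 - F.max_pool2d(seeds, kernel_size=block_size,
                              stride=1, padding=block_size // 2)
    # Rescale so the expected activation magnitude is unchanged.
    return x * mask * mask.numel() / mask.sum().clamp(min=1.0)
```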

250

T. Wang

24.3.3 Neural Network Results

I use the Complete IoU loss (CIoU) as the model's loss function:

$$L_{CIoU} = 1 - IoU(A, B) + \frac{\rho^2(A_{ctr}, B_{ctr})}{c^2} + \alpha \cdot v \qquad (24.2)$$

The IoU loss term of the model is $1 - IoU(A, B)$, where $IoU(A, B)$ is calculated by the following formula, A being the area of the detection box and B the area of the prediction box:

$$IoU(A, B) = \frac{|A \cap B|}{|A \cup B|} \qquad (24.3)$$

In order to prevent the loss function from providing no gradient when the predicted box and the ground-truth box do not overlap, CIoU introduces two additional penalty terms that quickly reduce the normalized center distance and the aspect-ratio difference between the predicted and ground-truth boxes [15]; this speeds up the convergence of the model, as shown in Fig. 24.5. After completing 47 epochs of training, the total loss of the model is 5.1020, and the loss on the validation set is 5.2196. Figures 24.4 and 24.5 show the convergence of the total loss function and the validation-set loss function. After data augmentation and Auto MSRCR processing of certain kinds of ore photos, the model's accuracy is shown in Fig. 24.6.

Fig. 24.3 An instance of ore sample target detection. The model gives the position of the ore and recognizes it
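A sketch of the CIoU loss of Eq. (24.2) for axis-aligned boxes in (x1, y1, x2, y2) format; this is an illustrative implementation rather than the paper's code.

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    # IoU term: intersection over union of prediction and target boxes.
    x1 = torch.max(pred[..., 0], target[..., 0])
    y1 = torch.max(pred[..., 1], target[..., 1])
    x2 = torch.min(pred[..., 2], target[..., 2])
    y2 = torch.min(pred[..., 3], target[..., 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + eps)
    # Squared distance between box centers (the rho^2 term).
    cx_p = (pred[..., 0] + pred[..., 2]) / 2
    cy_p = (pred[..., 1] + pred[..., 3]) / 2
    cx_t = (target[..., 0] + target[..., 2]) / 2
    cy_t = (target[..., 1] + target[..., 3]) / 2
    rho2 = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2
    # Squared diagonal of the smallest enclosing box (the c^2 term).
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term v with its tradeoff weight alpha.
    w_p = pred[..., 2] - pred[..., 0]
    h_p = (pred[..., 3] - pred[..., 1]).clamp(min=eps)
    w_t = target[..., 2] - target[..., 0]
    h_t = (target[..., 3] - target[..., 1]).clamp(min=eps)
    v = (4 / math.pi ** 2) * (torch.atan(w_t / h_t) - torch.atan(w_p / h_p)) ** 2
    with torch.no_grad():
        alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```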

24 Ore Detection Method Based on YOLOv4 Fig. 24.4 Total loss function of the model

Fig. 24.5 The loss function on validation set

Fig. 24.6 The precision of each kind of ore and the accuracy of the model


It can be seen that different kinds of ores perform quite differently: the highest precision is for muscovite ore and the lowest for biotite ore. The total accuracy of the model is 93.28%.

24.3.4 Data Augmentation

It was found that the model's accuracy differs greatly across different kinds of ores. The YOLOv4 detection algorithm extracts features such as color, gloss, surface texture, and reflectivity; ores with similar physical structure and characteristics, or ore samples photographed with similar contrast and environment, are hard to distinguish correctly (Fig. 24.7). To reduce overfitting and improve the accuracy of the model, researchers usually apply data augmentation methods such as flipping, rotation, scaling, cropping, translation, or Gaussian noise to their dataset; Generative Adversarial Nets (GANs) can also be used to generate images of different styles [16]. I expanded the dataset for several specific kinds of ores and utilized mosaic data augmentation (Fig. 24.8); a minimal sketch of the mosaic operation appears below.

The loss functions of the two datasets are shown in Fig. 24.9. The expanded dataset converges quickly at first since the training set is bigger, but after 35 epochs the smaller dataset has a smaller loss than the expanded dataset on both the training set and the validation set. The training photos cover ores in multiple situations, such as raw ore and ore products, and the photos have complex backgrounds, so it is difficult for the model to calibrate the prediction box; enlarging the dataset may therefore increase the loss, since it introduces more photos from more complex shooting environments. The choice of hyperparameters may also contribute to this. However, the expanded dataset did well at distinguishing ores with similar colors and structures: the accuracy on quartz increases from 82.8% to about 95%, and the accuracy on malachite rises from 87.4% to about 94%.

Fig. 24.7 Two pairs of ores that have a lower average precision. The first pair is muscovite and quartz. The second pair is bornite and chrysocolla
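A minimal sketch of the mosaic operation (four images stitched around a random center). The remapping of bounding boxes and labels, which a real detection pipeline needs, is omitted, and the center-crop ratios are assumptions; this resizes cells rather than cropping them, as a full implementation would.

```python
import random
from PIL import Image

def mosaic(images, out_size=416):
    # Stitch four training images into one canvas so the detector sees
    # objects at new scales and in new contexts; `images` is a list of
    # four PIL images.
    cx = int(out_size * random.uniform(0.3, 0.7))
    cy = int(out_size * random.uniform(0.3, 0.7))
    canvas = Image.new("RGB", (out_size, out_size))
    cells = [(0, 0, cx, cy), (cx, 0, out_size - cx, cy),
             (0, cy, cx, out_size - cy), (cx, cy, out_size - cx, out_size - cy)]
    for img, (x, y, w, h) in zip(images, cells):
        canvas.paste(img.resize((max(w, 1), max(h, 1))), (x, y))
    return canvas
```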


Fig. 24.8 The dataset arrangement before and after expanding

Fig. 24.9 The loss function of smaller dataset and expanded dataset

Data analysis indicates that most of the incorrect predictions involve small ore objects. When ore samples are photographed together with other kinds of ores, or covered with material similar in color to the surrounding environment, it is difficult for the model to recognize them; most such cases involve small ore objects, which contain fewer anchors during detection. Therefore, I utilize mosaic, a method that combines four training images in a certain proportion into a new one, to encourage the model to learn to detect smaller ore samples, and I also oversample the pictures that contain small ore samples [17]. When the expanded dataset is trained further with mosaic data augmentation and learning-rate decay, the validation loss decreases to about 4.5081 at epoch 55. By applying data augmentation, the model thus improves its accuracy on certain ores, but there is no obvious improvement in predicting the ore's location (Fig. 24.10).


Fig. 24.10 The loss function of expanded dataset with data augmentation

24.3.5 Auto MSRCR

During detection there are plenty of images that are overexposed (mostly ore crafts) or that share similar veins and color traits with their surroundings (photographs of raw ores); in some cases images are shot in dark or backlit conditions. This severely affects the accuracy of the model. I therefore enhance the images using Auto MSRCR (Auto Multi-Scale Retinex with Color Restoration) to alleviate the problem. Auto MSRCR averages the Retinex output over three scales and then applies color balance, normalization, and linearly weighted deviation correction to the result (Fig. 24.11). After the photos of several kinds of ore are enhanced, I train the YOLOv4 model for 86 epochs. The loss function converges more slowly because of the data augmentation methods; after 86 epochs, the smallest total loss is 5.0429.

Fig. 24.11 An instance of Auto MSRCR (left: the original photo; right: the image after enhancement)
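The exact Auto MSRCR variant used is not specified beyond the description above; the following is a standard MSRCR sketch with three Gaussian scales, a color restoration factor, and a final per-channel stretch standing in for the "auto" normalization step. All constants are illustrative.

```python
import cv2
import numpy as np

def msrcr(img, sigmas=(15, 80, 250), gain=5.0, offset=25.0):
    # img: uint8 BGR image. Work in float and avoid log(0).
    img = img.astype(np.float64) + 1.0
    msr = np.zeros_like(img)
    for s in sigmas:  # multi-scale retinex: log(I) - log(I blurred at scale s)
        blur = cv2.GaussianBlur(img, (0, 0), s)
        msr += np.log(img) - np.log(blur)
    msr /= len(sigmas)
    # Color restoration: weight each channel relative to pixel intensity.
    crf = np.log(gain * img / img.sum(axis=2, keepdims=True))
    out = offset * msr * crf
    # Stretch each channel to 0..255 (the automatic normalization step).
    for c in range(3):
        ch = out[..., c]
        out[..., c] = (ch - ch.min()) / (ch.max() - ch.min() + 1e-9) * 255
    return out.astype(np.uint8)
```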


Fig. 24.12 The loss function with augmented extended dataset

Since Auto MSRCR affects only a small portion of the data, with this hyperparameter selection it does not change the model much overall. However, Auto MSRCR has an obvious effect on pictures with overexposure problems: the accuracy of the model increases from 90.75 to 93.28%. The generalization ability of the model is also improved, as it now gives satisfactory results in overexposed or dark environments (Fig. 24.12).

24.3.6 Random Forest Classification Experiment

This paper first resizes the pictures to (100, 100, 3) and standardizes them; 20% of the data form the validation set. I then divide each channel equally into eight groups and calculate their histograms, giving a training matrix of size (516 × n), where n is the number of photos. The random forest classifier finally builds 50 random trees with a maximum depth of 7. For photos with low clarity, or pictures that are partially obscured, the random forest performs better than YOLOv4. I therefore set a threshold of 0.2: if the YOLOv4 prediction confidence is below the threshold, the model marks the image, saves the area inside the YOLOv4 prediction box as a new photo, and sends it to the random forest algorithm, which returns a final prediction label for the ore sample to replace the previous one. In this way the model gains higher accuracy (Table 24.1).
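A sketch of the fallback scheme just described. The paper's exact 516-dimensional feature layout is not fully specified, so a simple 8-bin per-channel color histogram stands in for it, and names such as final_label are mine.

```python
import cv2
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def hist_features(img):
    # Resize to 100x100 and build an 8-bins-per-channel color histogram,
    # a hand-crafted stand-in for the feature rows described above.
    img = cv2.resize(img, (100, 100))
    feats = [cv2.calcHist([img], [c], None, [8], [0, 256]).ravel()
             for c in range(3)]
    return np.concatenate(feats)

# Fit on crops of the labeled ore regions (X: feature rows, y: class ids).
rf = RandomForestClassifier(n_estimators=50, max_depth=7)
# rf.fit(np.stack([hist_features(im) for im in train_crops]), train_labels)

THRESHOLD = 0.2  # YOLO confidence below this triggers the fallback

def final_label(yolo_conf, yolo_label, crop):
    # Keep the YOLOv4 label when it is confident; otherwise classify
    # the cropped detection box with the random forest.
    if yolo_conf >= THRESHOLD:
        return yolo_label
    return rf.predict(hist_features(crop)[None, :])[0]
```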

24.4 Conclusions

This paper designs an ore classification method based on the YOLOv4 object detection algorithm and makes certain optimizations for the data. By combining the random forest algorithm and Auto MSRCR, I improve the accuracy of the model in certain situations and finally obtain an ore detection method covering seven categories with high accuracy and detection speed.


Table 24.1 The confusion matrix of the random forest algorithm

| | Biotite | Bornite | Chrysocolla | Malachite | Muscovite | Pyrite | Quartz |
|---|---|---|---|---|---|---|---|
| Biotite | 7 | 1 | 0 | 0 | 3 | 3 | 0 |
| Bornite | 2 | 22 | 0 | 0 | 0 | 1 | 0 |
| Chrysocolla | 0 | 1 | 21 | 1 | 0 | 0 | 0 |
| Malachite | 0 | 0 | 1 | 26 | 0 | 0 | 0 |
| Muscovite | 1 | 2 | 0 | 0 | 3 | 7 | 2 |
| Pyrite | 2 | 1 | 0 | 0 | 0 | 12 | 4 |
| Quartz | 1 | 1 | 0 | 0 | 0 | 24 | 2 |

This method is suitable for most application scenarios. At the same time, however, there are uncontrollable deviations in the recognition accuracy of the model, so it is not suitable for recognition tasks that require stable accuracy or professional-grade detection results. In addition, the analysis stops at classification and position and cannot provide further information about a specific ore sample. Subsequent research can focus on the differences in image features between minerals: principal component analysis (PCA) can be used to map texture features and improve the recognition accuracy of minerals with similar colors, the classification criteria can be subdivided using grayscale images, and other object detection methods can also be tried [18, 19].

References

1. Cutmore, N.G., Liu, Y., Middleton, A.G.: Ore characterization and sorting. Miner. Eng. 10(4), 421–426 (1997)
2. Xiao, D., Le, B.T., Ha, T.T.L.: Iron ore identification method using reflectance spectrometer and a deep neural network framework. Spectrochimica Acta Part A: Mol. Biomol. Spectrosc. 248, 119168 (2021)
3. Donskoi, E., et al.: Utilization of optical image analysis and automatic texture classification for iron ore particle characterization. Miner. Eng. 20(5), 461–471 (2007)
4. Laine, S., Lappalainen, H., Jämsä-Jounela, S.L.: On-line determination of ore type using cluster analysis and neural networks. Miner. Eng. 8(6), 637–648 (1995)
5. Singh, V., Singh, T.N., Singh, V.: Image processing applications for customized mining and ore classification. Arab. J. Geosci. 4(7), 1163–1171 (2011)
6. Kumar, P.A., Chatterjee, S., Gorai, A.K.: Development of machine vision-based ore classification model using support vector machine (SVM) algorithm. Arab. J. Geosci. 10(5), 107 (2017)
7. Naresh, S., et al.: Textural identification of basaltic rock mass using image processing and neural network. Comput. Geosci. 14(2), 301–310 (2010)
8. Veerendra, S., Rao, S.M.: Application of image processing and radial basis neural network techniques for ore sorting and ore classification. Miner. Eng. 18(15), 1412–1420 (2005)
9. Lee, C.H., Lin, C.W.: A two-phase fashion apparel detection method based on YOLOv4. Appl. Sci. 11(9), 3782 (2021)
10. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the 2016 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016, pp. 779–788 (2016)
11. Bochkovskiy, A., Wang, C.Y., Liao, H.Y.M.: YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
12. Mahesh, P.: Random forest classifier for remote sensing classification. Int. J. Remote Sens. 26(1), 217–222 (2005)
13. Minerals Identification Dataset [EB/OL]. https://www.kaggle.com/asiedubrempong/minerals-identification-dataset (2019)
14. Ghiasi, G., Lin, T.Y., Le, Q.V.: DropBlock: a regularization method for convolutional networks. arXiv preprint arXiv:1810.12890 (2018)
15. Zheng, Z.H., et al.: Enhancing geometric factors in model learning and inference for object detection and instance segmentation. arXiv preprint arXiv:2005.03572 (2020)
16. Wang, J., Perez, L.: The effectiveness of data augmentation in image classification using deep learning. Convolutional Neural Netw. Vis. Recognit. arXiv:1712.01621 (2017)
17. Kisantal, M., Wojna, Z., Murawski, J., Naruniec, J., Cho, K.: Augmentation for small object detection. arXiv preprint arXiv:1902.07296 (2019)
18. Wang, C.Y., Bochkovskiy, A., Liao, H.Y.M.: Scaled-YOLOv4: scaling cross stage partial network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13029–13038 (2021)
19. Li, H., Deng, L., Yang, C., Liu, J., Gu, Z.: Enhanced YOLOv3 tiny network for real-time ship detection from visual image. IEEE Access 9, 16692–16706 (2021)

Author Index

A
An, Wentian, 91

C
Cheng, Feiyan, 1
Chen, Xingxin, 67
Chen, Zizhong, 111

D
Deng, Ting, 233

F
Fang, Da, 131

G
Geng, Yue, 77

H
Hao, Xiaoxi, 11
Huang, Cailiang, 57
Huang, Xiaoqiao, 1, 47

J
Jing, Zhuangzhuang, 141

K
Kang, Houliang, 35
Kin, Kenneth Teo Tze, 11
Kong, Bin, 153

L
Li, Junhui, 11
Liang, Jiehong, 11
Liang, Shaojun, 131
Lin, Lin, 189
Liu, Renju, 11
Liu, Teng, 217, 225
Liu, Tie, 199
Liu, Yiyang, 101, 179
Li, Yingzhuo, 101
Li, Yongxin, 47
Li, Yongzheng, 57
Li, Yue, 153
Luo, Renbo, 131
Lv, Shangjin, 1

Q
Qin, Zhibao, 131

S
Shi, Junsheng, 1
Sun, Yu, 47

T
Tang, Yuchi, 199

W
Wang, Can, 153
Wang, Delong, 101, 179
Wang, Junkuan, 111
Wang, Lichao, 169
Wang, Qiuyan, 199
Wang, Taozhi, 245
Wang, Tianlei, 11
Wang, Yitao, 199
Wang, Zhixu, 47
Wei, Guangdong, 199
Wen, Qing, 111
Wu, Shulin, 67
Wu, Yannian, 101, 179

X
Xia, Shuyin, 67
Xin, Zhihui, 47
Xuan, Jiayu, 47
Xu, Chengjie, 153
Xu, Jinzhao, 11
Xu, Saihua, 207

Y
Yang, Yuting, 35
Yi, Bingliang, 153
Yuan, Mengting, 47

Z
Zhang, Nan, 101, 179
Zhang, Suxiang, 179
Zhang, Yiying, 101, 179
Zhang, Zehua, 23
Zheng, Fang, 121
Zhou, Qihao, 217, 225