Genetic and Evolutionary Computing: Proceedings of the Fifteenth International Conference on Genetic and Evolutionary Computing (Volume II), October 6–8, 2023, Kaohsiung, Taiwan
ISBN 981999411X, 9789819994113

This second volume of conference proceedings contains selected papers presented at ICGEC 2023, the 15th International Conference on Genetic and Evolutionary Computing, held in Kaohsiung, Taiwan, on October 6–8, 2023.


English · 522 pages · 2024


Table of contents :
Preface
Organization
Contents
Vehicle Traveling Speed Prediction Based on LightGBM Algorithm
1 Introduction
2 Literature Review
3 Methodology
3.1 Data Preprocessing
3.2 Travel Speed Prediction on Expressway
4 Experiments and Analysis of Results
4.1 Data Sources and Pre-processing
4.2 Parameter Selection
4.3 Evaluation Indicators
4.4 Experimental Results and Analysis
5 Conclusions
References
A Feature Matching Method Based on Rolling Guided Filter and Collinear Triangular Matrix Optimal Transport
1 Introduction
2 Proposed Framework
2.1 The Keypoint Response Computation Method Based on Rolling Guided Filter Constraints
2.2 Feature Matching Based on Property Optimization with Keypoint Response Constraints
2.3 Collinear Triangular Matrix Optimal Transport
3 Experimental Analysis
3.1 Analysis of Scale Factor of Rolling-Guided Filter
3.2 Property Constraint Analysis in Feature Matching
3.3 Comparison and Analysis with Existing Methods
4 Conclusion
References
Short-Time Driving Style Classification and Recognition Method on Expressway
1 Introduction
2 Methodology
2.1 Data Preprocessing
2.2 Feature Construction
2.3 Selection of the Number of Driving Style Categories
2.4 Short-Time Driving Style Classification Method
2.5 Short-Time Driving Style Recognition
3 Experimental Results and Analysis
3.1 Analysis of Driving Style Classification Results
3.2 Short-Time Driving Style Recognition Results
4 Conclusion
References
Expressway Short-Term Traffic Flow Prediction Based on CNN-LSTM
1 Introduction
1.1 Background
1.2 Related Work
2 Methods
2.1 Convolutional Neural Network
2.2 Long Short-Term Memory
2.3 Establishment of CNN-LSTM Model
3 Experimental Research
3.1 Experimental Analysis
3.2 Model Parameter Adjustment
3.3 Model Performance Evaluation Index
3.4 Comparison Model Experiment
3.5 Result Analysis
4 Conclusions
References
Expressway Traffic Speed Prediction Method Based on KF-GRU Model via ETC Data
1 Introduction
2 Algorithm
2.1 Problem Description and Definition
2.2 Overall Framework
2.3 Construction of Section Speed Dataset
2.4 Kalman Filter
2.5 Gated Recurrent Unit
3 Experimental Results and Analysis
4 Conclusion
References
Dynamic Brightness Adjustment of Tunnel Lighting Based on ETC Transaction Data
1 Introduction
2 Bidirectional LSTM Model
2.1 Basic Principle
2.2 BiLSTM Network
3 Simulation Experiments
3.1 Dataset Introduction
3.2 Data Preprocessing
3.3 Evaluation
3.4 Experiment Results
4 Conclusion
References
Short-Time Traffic Flow Prediction of Highway Toll Station Based on Combined GRU-MLP Model
1 Introduction
2 Related Work
3 GRU-MLP Combination Model
3.1 GRU Neural Network
3.2 MLP Neural Network
3.3 Combination Model
4 Experiment
4.1 Data Presentation and Analysis
4.2 Evaluation Indicators
4.3 GRU-MLP Combined Model Predictions and Analysis of Results
5 Conclusion
References
Real-Time Carbon Emission Monitoring and Prediction Method of Expressway Based on LSTM
1 Introduction
2 Carbon Emission Prediction Model Based on RNN-LSTM
2.1 RNN-LSTM Neural Network Model
2.2 Carbon Dioxide Emissions Accounting Method
2.3 Vehicle Carbon Emission Calculation Model
3 Simulation Experiments
3.1 Data Description
3.2 Introduction of Comparative Experiment
3.3 Evaluation Index and Parameter Setting
3.4 Experimental Results and Analysis
4 Conclusion
References
PSR-GAN: Unsupervised Portrait Shadow Removal Using Evolutionary Computing
1 Introduction
2 Related Work
3 Data Synthesis
3.1 Shadow Model
3.2 Shadow Image Synthesis
3.3 Dataset Composition
4 Methodology
4.1 Evolutionary Computing
4.2 Network Architecture
4.3 Loss Function
5 Experiments
5.1 Experimental Settings
5.2 Experimental Results
6 Conclusion and Future Works
References
Chebyshev Inequality and the Identification of Genes Associated with Alzheimer’s Disease
1 Introduction
1.1 Introduction to AD and Gene Chip Technology
1.2 Simple Idea of Chebyshev’s Inequality Applied to Genetic Screening
2 Materials and Methods
2.1 Data Collection
2.2 Data Preprocessing
2.3 Correlation Obtained by Two Ways
2.4 Changes in Disease Relevance at Different Stages
2.5 Chebyshev’s Inequality Screens for Genes Associated with AD
3 Results
3.1 Correlation Between Cosine Similarity and the Pearson Correlation Coefficient Measure for AD Genes
3.2 Pearson’s Correlation Coefficient and Cosine Similarity Measures the Difference in Correlation Between Different Stages
3.3 The Chebyshev Inequality was Used to Screen Candidate Genes from Changes Based on Pearson Correlation Coefficient and Cosine Similarity
4 Discussion
4.1 Discussion of Correlation Calculation Results
4.2 Analysis of AD Candidate Genes
5 Conclusions
References
A Novel Crossover Operator Based on Grey Wolf Optimizer Applied to Feature Selection Problem
1 Introduction
2 Material and Methods
2.1 Grey Wolf Optimizer as Crossover Operator
2.2 Description of Datasets and Evaluation Methods
3 Results
3.1 Compare with Different Feature Selection Algorithms
3.2 Compare with Different Crossover Operators
3.3 Real AD Data Applications
4 Discussion
5 Conclusion
References
Research on the Detection Method of Sybil Attacks on Wireless Terminals in Power Internet of Things
1 Introduction
2 The Impact of Sybil Attacks
2.1 The Impact of Sybil Attacks on Localization
2.2 The Impact of Sybil Attacks on Data Transmission
2.3 The Impact of Sybil Attacks on Routing
3 Sybil Attack Detection and Resistance
3.1 Sybil Attack Detection Scheme
3.2 Improved Positioning Strategy of Anti-Sybil Attack
3.3 Improved Routing Strategies to Counter Sybil Attacks
4 Simulation Experiment and Analysis
4.1 Simulation Experiment and Analysis
4.2 Algorithm Simulation and Analysis
5 Conclusion
References
Probability Vector Enhanced Tumbleweed Optimization Algorithm
1 Introduction
2 Related Work
2.1 Compact Particle Swarm Optimization
2.2 Tumbleweed Optimization Algorithm
3 Probability Vector Enhanced Tumbleweed Optimization Algorithm
3.1 Basic Idea
3.2 Seedling Growth Stage
3.3 Seed Propagation Stage
4 Numerical Experimental Analysis
5 Conclusion
References
Improving BFGO with Apical Dominance-Guided Gradient Descent for Enhanced Optimization
1 Introduction
2 Related Work
2.1 Metaheuristic Algorithms and Gradient Descent
2.2 Bamboo Forest Growth Optimization (BFGO) Algorithm
3 Methodology
3.1 Apical Dominance in BFGO-ADGD
3.2 Gradient Descent in BFGO-ADGD
3.3 BFGO with Apical Dominance-Guided Gradient Descent
4 Experiment and Analysis
5 Conclusion
References
Research on Innovative Social Service Training Mode in Higher Vocational Colleges Under the New Situation
1 Introduction
2 The Connotation and Significance of Social Service in Vocational Colleges
2.1 The Connotation
2.2 The Significance
3 Analysis of Social Service Training Model in Higher Vocational Colleges
3.1 Content of Social Service Training Model
3.2 Issues in the Social Service Training Model
4 Strategies for Innovating the Social Service Training Model in Higher Vocational Colleges
4.1 Establishing New Social Training Organizational Structures
4.2 Building a High-Quality Training Faculty
4.3 Innovating the Social Service Training Model
4.4 Introduction of New Generation Information Technology Training Methods
4.5 Strengthening School-Enterprise Collaborative Training
5 Conclusion
References
Research on the Precise Teaching Path of Higher Vocational Colleges Under the Concept of OBE in the Digital Era
1 Introduction
2 The Connotation of OBE Concept and Precise Teaching
2.1 The Concept of OBE
2.2 Precise Teaching
3 The Necessity of Implementing Precise Teaching in Higher Vocational Colleges Under the Concept of OBE
3.1 An Important Avenue to Meet Personalized Development Needs
3.2 An Important Means of Adapting Diverse Learning Paths
3.3 A Significant Approach to Implementing Generative Teaching Strategies
3.4 A Significant Form of Digital Empowerment Education
4 The Implementation Path of Precise Teaching in Higher Vocational Colleges Under the Concept of OBE in the Digital Age
4.1 Define Learning Outcomes and Establish Precise Teaching Objectives
4.2 Attain Learning Outcomes by Precision Designing of Instructional Processes
4.3 Evaluate Learning Outcomes and Precisely Diagnosing Achievements
References
An Optimal Inventory Replenishment Strategy with Cross-docking System and Time Window Problem
1 Introduction
2 Problem Description and Assumptions
3 Model Development
4 A Case Study of a High-Tech Industry in Taiwan
5 Conclusion
References
Proposal of a DDoS Attack Detection Method Using the Communication Interval
1 Introduction
2 Research Background
2.1 DDoS Attack
2.2 DDoS Attack Detection Methods
2.3 Previous Research
3 Proposed Method
3.1 Packet Filtering Using the Communication Interval
3.2 Implementation
4 Evaluation Experiments
4.1 Purpose of Experiment
4.2 Experimental Methods
4.3 Experimental Conditions
4.4 Experimental Procedure
4.5 Experimental Results
5 Conclusions
References
Study of an Image-Based CAPTCHA that is Resistant to Attacks Using Image Recognition Systems
1 Introduction
2 Related Work
2.1 CAPTCHA
2.2 Fooling Image Recognition by Many Discontinuous Points and Its Application to CAPTCHA
3 Proposed Method
3.1 Summary
3.2 Image Processing Methods
3.3 Suggested CAPTCHA
4 Evaluation Experiment
4.1 Experiments to Evaluate the Utility of the Images
4.2 Evaluation Experiments on Resistance to Image Recognition Attacks
4.3 Evaluation Experiments on Resistance to Noise Reduction Filters
5 Conclusion
References
Blind Image Quality Assessment Using Standardized NSS and Multi-pooled CNN
1 Introduction
2 Proposed NSS Integrated Multi-pooled CNN Method
2.1 Image Features Extraction Using Multi-Pooled CNN
2.2 Natural Scene Statistics Features Extraction
2.3 Methodology
3 Experimental Results
4 Conclusion
References
Abnormal Activity Detection Based on Place and Occasion in Virtual Home Environments
1 Introduction
2 Related Work
3 Proposed Approach
3.1 Activities Simulation Task Using VirtualHome2KG
3.2 Object Detection
3.3 Feature Extraction
3.4 Awareness of Human-Object Relationships
3.5 Decision-Making Process Using Hidden Markov Model
4 Experimental Results
4.1 Abnormal Activity Detection Based on Place and Occasion
4.2 Limitations and Discussion
5 Conclusion
References
Proposal of Fill in the Missing Letters CAPTCHA Using Associations from Images
1 Introduction
2 Proposed CAPTCHA
2.1 Basic Idea
2.2 Adoption of Generic/Specific Concept Relations
3 Exploring Suitable Parameters of the Proposed CAPTCHA
3.1 Number of Missing Letters of Answer Words
3.2 Positions of Missing Letters
4 Security Evaluation
4.1 WordNet
4.2 Bot Attack Procedure
4.3 Experimental Conditions
4.4 Results
5 Conclusion
References
Digital Transformation (DX) Solution for Monitoring Mycoplasma Infectious Disease in Calves: A Worldwide Health Challenge
1 Introduction
2 Proposed System
2.1 Stage 1: Development of Digital Transformation (DX) Tools
2.2 Stage 2: Data Interpretation and Analysis
2.3 Stage 3: Information Integration
2.4 Stage 4: Utilization of AI Advances for Decision Making Process
3 Some Illustrative Simulation Results
3.1 Mahalanobis Distance-Based Mycoplasma Classification
4 Conclusions
References
AI Driven Movement Rate Variability Analysis Around the Time of Calving Events in Cattle
1 Introduction
2 Some Related Works
3 Materials and Methods
3.1 Data Collection and Preprocessing
3.2 AI Driven Movement Detection and Tracking Process
3.3 Feature Extraction and Movement Variability Calculation
3.4 Calving Time Prediction Process
4 Experimental Results
4.1 Cattle Detection and Evaluation Results
4.2 Cattle Tracking and Evaluation Results
4.3 Calving Prediction and Analysis Results
5 Discussions and Conclusions
References
Channel-Wise Pruning via Learn Gates&BN for Object Detection Tasks
1 Introduction
2 Related Work
3 Method
3.1 Pruning from Scratch Method
3.2 Learn Gates & Batch Normalization
4 Experiment
4.1 Network Model
4.2 Datasets
4.3 Learning Gates&BN and Fine-Tuning
4.4 Experiment Results
4.5 Analysis
5 Discussion and Conclusion
References
Prediction of Sepsis Mortality Risk Based on Ensemble Learning Algorithm FBTV
1 Introduction
2 Related Work
2.1 MIMIC-IV Database Introduction
2.2 Ensemble Learning
3 Methodology
3.1 Data Extraction
3.2 Data Preprocessing
3.3 Feature Selection
3.4 Experimental Dataset
3.5 Ensemble Learning Model Construction
4 Experiments
4.1 Ensemble Learning Algorithm Based on Voting Mechanism
4.2 Weighted Fusion Ensemble Learning Algorithm
4.3 Experimental Analysis
5 Conclusion
References
Research on the Influence Mechanism of College Students’ Learning Satisfaction Based on the Chain Mediation Model
1 Introduction
2 Method
2.1 Research Subjects
2.2 Research Tools
2.3 Research Hypothesis
2.4 Research Model
3 Experiment
3.1 Common Method Biases Test
3.2 Reliability and Validity Analysis
3.3 Descriptive Statistical Analysis
3.4 Correlation Analysis Among Variables
3.5 Chain Intermediary Effect Analysis
4 Conclusion
References
Design of License Plate Recognition System Based on Machine Vision
1 Introduction
2 YOLO-v5-Based License Plate Detection
2.1 YOLO-v5 Training Process
2.2 Experimental Hardware Environment and Parameter Settings
2.3 Experimental Results and Analysis
3 License Plate Recognition Based on LPRNet
3.1 LPRNet Training Process
3.2 Network Structure
3.3 Experimental Hardware Environment and Parameter Settings
3.4 Analysis of Experimental Results
4 Conclusion
References
Construction and Application of Knowledge Graph in the Field of Medical Food Supplements
1 Introduction
2 GPLinker Intends to Identify Slot Analysis Models
2.1 The Overall Structure of the Model
2.2 Training Data Preprocessing
2.3 Parameter Settings
2.4 Experimental Evaluation Indicators
2.5 Experimental Results and Analysis
3 Design and Implementation of Intelligent Question Answering Application Based on Domain Knowledge Graph
3.1 Domain Knowledge Graph Construction Method and Implementation
3.2 Q&A Application Design
4 Conclusion
References
Practice and Exploration of Discrete Mathematics Teaching in Applied Undergraduate Colleges Under the Background of New Engineering
1 Introduction
2 The Teaching Concept of “Discrete Mathematics” in Applied Undergraduate Colleges Under the Background of “New Engineering”
3 Discrete Mathematics Teaching Practice in Applied Undergraduate Colleges Under the Background of “New Engineering”
3.1 Under the Background of New Engineering, Reform the Teaching Content of “Discrete Mathematics” Courses to Adapt to the Construction of Knowledge System of Applied Talents
3.2 Under the Background of New Engineering, Reform the Teaching Method of “Discrete Mathematics” Courses and Improve the Interdisciplinary Ability of Applied Talents
4 Conclusion
References
Research on Production Line Balance Optimization Based on Improved PSO-GA Algorithm
1 Introduction
2 Scheduling Mathematical Model
2.1 Analysis of Job Shop Scheduling Problems
2.2 IPPS Mathematical Model
3 Workshop Scheduling Algorithm Based on PSO-GA
4 Simulation and Experimental Analysis
5 Conclusion
References
Prediction and Analysis of Stroke Risk Based on Ensemble Learning
1 Introduction
2 Related Work
3 Method
3.1 Data Preprocessing
3.2 Principal Component Analysis
3.3 Prediction Model
4 Experiment
4.1 SVM
4.2 Decision Tree
4.3 Voting Model
4.4 Bagging Model
4.5 Stacking Model
5 Conclusion
5.1 Results Analysis
5.2 Future Work
References
The Users’ Purchase Behavior Research of Buying Clothing of Live Streaming eCommerce on Tiktok
1 Introduction
2 Construction of Hypothesis Model
3 Assumption Model
4 Data Collection and Research Design
5 Empirical Analysis and Model Revision
6 Model Correction
7 Conclusion
Bibliography
Exploration on the Evaluation Index System of E-commerce Application Talents’ Literacy for Industry Needs
1 Introduction
2 Assumption Evaluation Indicator System
3 Evaluation Index System for the Literacy of E-commerce Applied Talents
3.1 Data Collection and Descriptive Analysis
3.2 Revision of Hypothesis Evaluation Indicator System and Establishment of E-commerce Applied Talent Literacy Evaluation Indicator System
4 Conclusion
Bibliography
Research on Cosmetics Consumer Behavior Analysis and Marketing Strategy Optimization on E-Commerce Platform “Xiaohongshu”
1 Introduction
2 Data Collection
2.1 Distribution and Recovery of Questionnaires
2.2 Statistical Analysis of Sample Status
2.3 Statistical Analysis of the Usage of “Xiaohongshu”
3 Empirical Results and Analysis
3.1 Analysis of Factors Influencing Consumers’ Use of “Xiaohongshu” for Shopping
3.2 Cross Analysis of Survey Data
4 Conclusion
References
DOA Estimation of Special Non-uniform Linear Array Based on Quantum Honey Badger Search Algorithm
1 Introduction
2 DOA Estimation Model
3 DOA Estimation for Special Nonuniform Linear Array Based on Quantum Honey Badger Algorithm
3.1 Quantum Honey Badger Algorithm
3.2 DOA Estimation for Special Non-uniform Linear Array Based on Quantum Honey Badger Algorithm
4 Simulation Experiment Results
4.1 The First Set of Simulation Experiments
4.2 The Second Set of Simulation Experiments
4.3 The Third Set of Simulation Experiments
5 Conclusions
References
Chewing Behavior Detection Based on Facial Dynamic Features
1 Motivation and Purpose
1.1 Motivation
1.2 Purpose
2 Literature Review and Technical Analysis
2.1 Visual Characteristics of Chewing During Eating
2.2 Past Literature on Chewing Recognition
2.3 Dlib Facial Landmark Model
3 Methods and Procedures
3.1 Application Scenarios
3.2 Chewing Calculation Process
3.3 Algorithm Design for Chewing Instances Calculation
4 Experimentation and Performance Analysis
4.1 Purpose of the Experiment
4.2 Characteristics of Chewing
4.3 Chewing Signal Data Pre-Processing
4.4 Analysis of Experimental Results for Chewing Instance Calculation
5 Results and Discussion
References
Unknown DDoS Attack Detection Using Open-Set Recognition Technology and Fuzzy C-Means Clustering
1 Introduction
2 Related Works
2.1 DDoS Attack
2.2 AlexNet
2.3 Open Set Recognition
2.4 Spatial Location Constraint Prototype Loss
2.5 Fuzzy C-Means Algorithm
2.6 Unknown DDoS Detection
3 Methodology
3.1 Proposed Framework
3.2 Unknown DDoS Attack Model
3.3 Spatial Location Constraint Prototype Loss
3.4 Unknown DDoS Attack Identification
4 Experiment Result
4.1 CICIDS2017 Dataset
4.2 Experimental Environment
4.3 Evaluation Metric
4.4 Conventional DDoS Identification Modules
4.5 Unknown Identification Module
5 Conclusion
References
Dairy Cow Behavior Recognition Technology Based on Machine Learning Classification
1 Introduction
1.1 Background
1.2 Motivation and Purpose
2 Literature Review
2.1 Common Cattle Behaviors
2.2 Comparison with Relevant Research
3 Cow Behavior Recognition Technology
3.1 Introduction to the Dataset
3.2 Behavior Recognition Server System Architecture
3.3 Calculation Process
3.4 Data Preprocessing
3.5 Machine Learning Methods
3.6 Model Optimization and Performance Tuning
4 Performance Evaluation and Analysis
4.1 Restated Validation Objectives
4.2 Machine Learning Prediction Results
5 Conclusion
References
PSO-Based CI Agent with Learning Tool for Student Experience in Real-World Application
1 Introduction
2 PSO-Based Sandbox for Teaching and Learning in CI Model
2.1 CI Sandbox Workshop and Competition @ IEEE CEC 2023 and FUZZ-IEEE 2023
2.2 Program of CI Sandbox @ IEEE CEC 2023 and FUZZ-IEEE 2023
3 CI Agent with Learning Tool for Young Student Experience
3.1 Introduction to CI&AI-FML Learning Tool
3.2 CI Agent Structure for the Learning Tool
4 Experimental Results
4.1 Extended ADAS Application @ IEEE CEC 2023
4.2 Intelligent Agriculture System (IAS) Application @ FUZZ-IEEE 2023
4.3 Smart Greenhouse System (SGS) Application @ FUZZ-IEEE 2023
5 Conclusions
Appendix
References
A GA-Based Scheduling Algorithm for Semiconductor-Product Thermal Cycling Tests
1 Introduction
2 Related Work
3 Problem Definition
3.1 Symbols
3.2 Order Grouping
3.3 Group Scheduling
4 The Proposed GA-Based Group Scheduling Algorithm
4.1 Encoding
4.2 Fitness and Selection
4.3 Crossover and Mutation
4.4 Chromosomes Repair
4.5 The Proposed Algorithm
5 Experimental Results
6 Conclusion
References
Power Internet of Things Security Evaluation Method Based on Fuzzy Set Theory
1 Introduction
2 Power Internet of Things Safety Evaluation Index System
3 Power Internet of Things Security Evaluation System Based on Fuzzy Set
3.1 Determining the Set of Factors and the Assessment Collection
3.2 Determining Membership
3.3 Determining Index Weights
4 Experimental Results
5 Conclusion
References
A Q-Learning Based Power-Aware Data Dissemination Protocol for Wireless Sensor Networks
1 Introduction
2 Related Work
3 Q-Learning Based Power-Aware Data Dissemination
4 Simulation Results
5 Conclusions
References
Application of Entropy – TOPSIS Method in Service Quality Assessment
1 Introduction
2 TOPSIS Method
2.1 Definition
2.2 Implementation Formula
3 Developing Criteria for Evaluating Service Quality
4 Application in Service Quality Selection
5 Conclusion
References
Machine Learning Algorithm-Based Prediction of Hyperglycemia Risk After Acute Ischemic Stroke
1 Introduction
2 Related Work
3 Approach
3.1 Basic Modules
3.2 Feature Extraction
3.3 Classification Prediction Model Based on 8 Machine Learning Algorithms
4 Experiment
4.1 Experiment Setting
4.2 Model Evaluation and Performance Comparison
5 Conclusion
References
Implementation of Campus Pedestrian Detection Using YOLOv5
1 Introduction
2 Related Works
3 YOLO Model
4 The Implementation and Results
5 Conclusion
References
Fick's Law Algorithm with Gaussian Mutation: Design and Analysis
1 Introduction
2 Proposed GM-FLA
2.1 Gaussian Mutation
2.2 GM-FLA Mathematical Model
2.3 Diffusion Operator (DO)
2.4 Equilibrium Operator (EO)
2.5 Steady-State Operator (SSO)
3 Experimental Design and Results
3.1 Results for 10D
3.2 Results for 50D
3.3 Convergence Curve
4 Conclusions
References
Barnacle Growth Algorithm (BGA): A New Bio-Inspired Metaheuristic Algorithm for Solving Optimization Problems
1 Introduction
2 Proposed Optimization Algorithm
2.1 Inspiration
2.2 Algorithm Structure
3 Experimental Result
3.1 Experimental Environment
3.2 Experimental Results on Convergence
4 City Power Transmission Management with BGA
5 Conclusions
References
The Effect of Wooden Floor Processing Procedures on Product Quality
1 Introduction
2 Research Purposes
3 Case Study Product Introduction
4 Methodology
5 Data Collection and Statistical Analysis
5.1 The Relationship Between Data Collection and Statistical Analysis
5.2 Analysis of Wear Resistance of Five Types of Unpainted Solid Wood
6 Conclusions
References
The Effect of Heat Treatment on the Quality Characteristics of Metal Materials
1 Introduction
2 Research Purposes
3 Case Study Product Introduction
4 Methodology
5 Data Collection and Analysis
5.1 Analysis of Medium Carbon Wheel Mold S45C115 mm
5.2 Analysis of Medium Carbon Gear S45C150 mm
5.3 Analysis of Guangyuan Iron Sleeve 46 mm Carburizing Heat Treatment
5.4 Analysis of Guangyuan Iron Sleeve 41.0 mm Carburizing Heat Treatment
6 Conclusions
References
Author Index


Lecture Notes in Electrical Engineering 1114

Jeng-Shyang Pan · Zhigeng Pan · Pei Hu · Jerry Chun-Wei Lin (Editors)

Genetic and Evolutionary Computing Proceedings of the Fifteenth International Conference on Genetic and Evolutionary Computing (Volume II), October 6–8, 2023, Kaohsiung, Taiwan

Lecture Notes in Electrical Engineering 1114

Series Editors
Leopoldo Angrisani, Department of Electrical and Information Technologies Engineering, University of Napoli Federico II, Napoli, Italy
Marco Arteaga, Departament de Control y Robótica, Universidad Nacional Autónoma de México, Coyoacán, Mexico
Samarjit Chakraborty, Fakultät für Elektrotechnik und Informationstechnik, TU München, München, Germany
Jiming Chen, Zhejiang University, Hangzhou, Zhejiang, China
Shanben Chen, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Tan Kay Chen, Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore
Rüdiger Dillmann, University of Karlsruhe (TH) IAIM, Karlsruhe, Baden-Württemberg, Germany
Haibin Duan, Beijing University of Aeronautics and Astronautics, Beijing, China
Gianluigi Ferrari, Dipartimento di Ingegneria dell’Informazione, Sede Scientifica Università degli Studi di Parma, Parma, Italy
Manuel Ferre, Centre for Automation and Robotics CAR (UPM-CSIC), Universidad Politécnica de Madrid, Madrid, Spain
Faryar Jabbari, Department of Mechanical and Aerospace Engineering, University of California, Irvine, CA, USA
Limin Jia, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
Janusz Kacprzyk, Intelligent Systems Laboratory, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Alaa Khamis, Department of Mechatronics Engineering, German University in Egypt El Tagamoa El Khames, New Cairo City, Egypt
Torsten Kroeger, Intrinsic Innovation, Mountain View, CA, USA
Yong Li, College of Electrical and Information Engineering, Hunan University, Changsha, Hunan, China
Qilian Liang, Department of Electrical Engineering, University of Texas at Arlington, Arlington, TX, USA
Ferran Martín, Departament d´Enginyeria Electrònica, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain
Tan Cher Ming, College of Engineering, Nanyang Technological University, Singapore, Singapore
Wolfgang Minker, Institute of Information Technology, University of Ulm, Ulm, Germany
Pradeep Misra, Department of Electrical Engineering, Wright State University, Dayton, OH, USA
Subhas Mukhopadhyay, School of Engineering, Macquarie University, Sydney, NSW, Australia
Cun-Zheng Ning, Department of Electrical Engineering, Arizona State University, Tempe, AZ, USA
Toyoaki Nishida, Department of Intelligence Science and Technology, Kyoto University, Kyoto, Japan
Luca Oneto, Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of Genova, Genova, Italy
Bijaya Ketan Panigrahi, Department of Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, Delhi, India
Federica Pascucci, Department di Ingegneria, Università degli Studi Roma Tre, Roma, Italy
Yong Qin, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
Gan Woon Seng, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
Joachim Speidel, Institute of Telecommunications, University of Stuttgart, Stuttgart, Germany
Germano Veiga, FEUP Campus, INESC Porto, Porto, Portugal
Haitao Wu, Academy of Opto-electronics, Chinese Academy of Sciences, Haidian District Beijing, China
Walter Zamboni, Department of Computer Engineering, Electrical Engineering and Applied Mathematics, DIEM—Università degli studi di Salerno, Fisciano, Salerno, Italy
Junjie James Zhang, Charlotte, NC, USA
Kay Chen Tan, Department of Computing, Hong Kong Polytechnic University, Kowloon Tong, Hong Kong

The book series Lecture Notes in Electrical Engineering (LNEE) publishes the latest developments in Electrical Engineering—quickly, informally and in high quality. While original research reported in proceedings and monographs has traditionally formed the core of LNEE, we also encourage authors to submit books devoted to supporting student education and professional training in the various fields and applications areas of electrical engineering. The series covers classical and emerging topics concerning:

• Communication Engineering, Information Theory and Networks
• Electronics Engineering and Microelectronics
• Signal, Image and Speech Processing
• Wireless and Mobile Communication
• Circuits and Systems
• Energy Systems, Power Electronics and Electrical Machines
• Electro-optical Engineering
• Instrumentation Engineering
• Avionics Engineering
• Control Systems
• Internet-of-Things and Cybersecurity
• Biomedical Devices, MEMS and NEMS

For general information about this book series, comments or suggestions, please contact [email protected]. To submit a proposal or request further information, please contact the Publishing Editor in your country:
China: Jasmine Dou, Editor ([email protected])
India, Japan, Rest of Asia: Swati Meherishi, Editorial Director ([email protected])
Southeast Asia, Australia, New Zealand: Ramesh Nath Premnath, Editor ([email protected])
USA, Canada: Michael Luby, Senior Editor ([email protected])
All other Countries: Leontina Di Cecco, Senior Editor ([email protected])
** This series is indexed by EI Compendex and Scopus databases. **

Jeng-Shyang Pan · Zhigeng Pan · Pei Hu · Jerry Chun-Wei Lin Editors

Genetic and Evolutionary Computing Proceedings of the Fifteenth International Conference on Genetic and Evolutionary Computing (Volume II), October 6–8, 2023, Kaohsiung, Taiwan

Editors
Jeng-Shyang Pan, School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, China
Zhigeng Pan, School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, China
Pei Hu, School of Computer and Software, Nanyang Institute of Technology, Nanyang, China
Jerry Chun-Wei Lin, Faculty of Automatic Control, Electronics and Computer Science, Department of Distributed Systems and IT Devices, Silesian University of Technology, Gliwice, Poland

ISSN 1876-1100 | ISSN 1876-1119 (electronic)
Lecture Notes in Electrical Engineering
ISBN 978-981-99-9411-3 | ISBN 978-981-99-9412-0 (eBook)
https://doi.org/10.1007/978-981-99-9412-0

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Paper in this product is recyclable.

Preface

This volume comprises the proceedings of the 15th International Conference on Genetic and Evolutionary Computing (ICGEC 2023), which was held in Kaohsiung City, Taiwan, on October 6–8, 2023. The aim of ICGEC 2023 is to provide an internationally respected forum for scientific research in the areas of artificial intelligence, genetic and evolutionary computing, intelligent data analysis, machine learning, and all the associated applications. ICGEC 2023 was co-sponsored by Western Norway University of Applied Sciences (Norway), National Kaohsiung University of Science and Technology (Taiwan), Shu-Te University (Taiwan), Nanchang Institute of Technology (China), the Taiwan Association for Web Intelligence Consortium, Ubiquitous International Co., Ltd., and Changzhou College of Information Technology (China). We would like to express our sincere thanks to the authors, reviewers, and organizing committee members for their contributions to making this conference a success. We would also like to thank the publisher, Springer, for their work and support in publishing the proceedings. October 2023

Jerry Chun-Wei Lin
Chin-Shiuh Shieh
Mong-Fong Horng
Shu-Chuan Chu

Organization

Honorary Chairs

Jau-Shyong Wang, Shu-Te University, Taiwan
Vaclav Snasel, Technical University of Ostrava, Czech Republic
Jeng-Shyang Pan, Shandong University of Science and Technology, China

Advisory Committee Chair

Tien-Chin Wang, National Kaohsiung University of Science and Technology, Taiwan

Conference Chairs

Jerry Chun-Wei Lin, Western Norway University of Applied Sciences, Norway
Mong-Fong Horng, National Kaohsiung University of Science and Technology, Taiwan
Zhigeng Pan, Nanjing University of Information Science and Technology, China

Program Committee Chairs

Chin-Shiuh Shieh, National Kaohsiung University of Science and Technology, Taiwan
Shu-Chuan Chu, Shandong University of Science and Technology, China
Rajesh K. Shukla, Oriental Institute of Science and Technology, India
Gautam Srivastava, Brandon University, Canada


Invited Session Chairs

Jimmy Ming-Tai Wu, National Kaohsiung University of Science and Technology, Taiwan
Shih-Pang Tseng, Changzhou College of Information Technology, China
Yendeng Huang, Changzhou College of Information Technology, China
Fuquan Zhang, Minjiang University, China
Jia Zhao, Nanchang Institute of Technology, China

Local Organizing Chairs

Shi-Huang Chen, Shu-Te University, Taiwan
Tien-Szu Pan, National Kaohsiung University of Science and Technology, Taiwan
Chien-Liang Chiu, National Kaohsiung University of Science and Technology, Taiwan

Publication Chairs

Tien-Wen Sung, Fujian University of Technology, China
Pei Hu, Shandong University of Science and Technology, China

Finance Chair

Yun-Chi Huang, Taiwan Ubiquitous Information Co., Ltd., Taiwan

Program Committee Members

Stefania Tomasiello, University of Tartu, Estonia
Vo Bay, Ho Chi Minh City University of Technology, Vietnam
Youcef Djenouri, University of South-Eastern Norway, Norway
Jaroslav Frnda, University of Zilina, Slovakia
Vicente Garcia Diaz, University of Oviedo, Spain
Dariusz Mrozek, Silesian University of Technology, Poland
Marcin Fojcik, Western Norway University of Applied Sciences, Norway
Jia-Chun Lin, Norwegian University of Science and Technology, Norway
Ming Chang Lee, Western Norway University of Applied Sciences, Norway
Ji Zhang, University of Southern Queensland, Australia
Keping Yu, Hosei University, Japan
Muhammad Aleem, National University of Computer and Emerging Sciences, Pakistan
Ko-Wei Huang, National Kaohsiung University of Science and Technology, Taiwan
Brij B. Gupta, Asia University, Taiwan
Ala Al-Fuqaha, Hamad Bin Khalifa University, Qatar
Basant Kumar Verma, Panipat Institute of Engineering and Technology, India
Muhammad Baqer Mollah, University of Massachusetts Dartmouth, USA
Lawton Liao, Intelligent Cloud Plus Co., Ltd., Taiwan
Zakia Hammouch, Moulay Ismail University, Morocco
Denis Miu, Genie-Networks Co., Ltd., Taiwan
Sarfraz Hussain, Government Postgraduate College of Commerce, Sahiwal, Pakistan
Tarek Gaber, University of Salford, Manchester, UK
Dhrubajyoti Das, Patanjali Research Institute, Haridwar, India
Pei-Wei Tsai, Swinburne University of Technology, Australia
Valentina Emilia Balas, Aurel Vlaicu University of Arad, Romania
Ying-Chieh Chao, ICP DAS Co., Ltd., Taiwan
Yuh-Chung Lin, Sanda University, Shanghai, China
Abhishek Kumar, Chandigarh University, Punjab, India
Tin-Yu Wu, National Pingtung University of Science and Technology, Taiwan
Zubair Hussain, Riphah International University, Pakistan
Shashi Kant Gupta, Chinmay Research, Education and Publication Pvt. Ltd., India
Satendra Kumar, Moradabad Institute of Technology, India
Muthmainnah, Universitas Al Asyariah Mandar, Sulawesi Barat, Indonesia
Thanh-Tuan Nguyen, Nha Trang University, Vietnam
Shaowei Weng, Fujian University of Technology, China
Lingping Kong, Technical University of Ostrava, Czech Republic
Junbao Li, Harbin Institute of Technology, China
Ling Wang, Northeast Electric Power University, China
Hao Luo, Zhejiang University, China
Zhenyu Meng, Fujian University of Technology, China
Wei Li, Harbin Engineering University, China
Jianpo Li, Northeast Electric Power University, China
Xin Wang, Harbin Institute of Technology, China
Trong-The Nguyen, Vietnam National University, Vietnam
Yulong Qiao, Harbin Engineering University, China
Kuo-Kun Tseng, Harbin Institute of Technology, Shenzhen Campus, China
Veerpratap Meena, Malaviya National Institute of Technology, India

Contents

Vehicle Traveling Speed Prediction Based on LightGBM Algorithm . . . 1
Nan Li, Fumin Zou, and Feng Guo

A Feature Matching Method Based on Rolling Guided Filter and Collinear Triangular Matrix Optimal Transport . . . 11
Liu Xiaoming, Yuan Yizhao, Li Qiqi, and Zhao Huaqi

Short-Time Driving Style Classification and Recognition Method on Expressway . . . 19
GuangHao Luo, FuMin Zou, Feng Guo, and ChenXi Xia

Expressway Short-Term Traffic Flow Prediction Based on CNN-LSTM . . . 29
Ting Ye, Fumin Zou, and Feng Guo

Expressway Traffic Speed Prediction Method Based on KF-GRU Model via ETC Data . . . 37
ChenXi Xia, FuMin Zou, Feng Gou, and GuangHao Luo

Dynamic Brightness Adjustment of Tunnel Lighting Based on ETC Transaction Data . . . 47
Shilong Zhuo, Fumin Zou, Feng Guo, and Xinrui Zhao

Short-Time Traffic Flow Prediction of Highway Toll Station Based on Combined GRU-MLP Model . . . 58
Wenyu Chen, Fumin Zou, and Feng Guo

Real-Time Carbon Emission Monitoring and Prediction Method of Expressway Based on LSTM . . . 68
Xinrui Zhao, Fumin Zou, Feng Guo, and Sirui Jin

PSR-GAN: Unsupervised Portrait Shadow Removal Using Evolutionary Computing . . . 79
Tianlong Ma, Longfei Zhang, Xiaokun Zhao, and Zixian Liu

Chebyshev Inequality and the Identification of Genes Associated with Alzheimer’s Disease . . . 87
Lei Yu, Xueli Tan, Delin Luo, Lin Yang, Xinping Pang, Zhengchao Shan, Chengjiang Zhu, Jeng-Shyang Pan, and Chaoyang Pang

A Novel Crossover Operator Based on Grey Wolf Optimizer Applied to Feature Selection Problem . . . 98
Wenbo Guo, Yue Sun, Xinping Pang, Lin Yang, Lei Yu, Qi Zhang, Ping Yang, Jeng-Shyang Pan, and Chaoyang Pang

Research on the Detection Method of Sybil Attacks on Wireless Terminals in Power Internet of Things . . . 108
Daming Xu, Kelin Gao, Zhongbao Hou, Li Zhang, Zhipei Wei, Qi Li, and Yuzhu Ding

Probability Vector Enhanced Tumbleweed Optimization Algorithm . . . 118
Yang-Zhi Chen, Ruo-Bin Wang, Hao-Jie Shi, Rui-Bin Hu, and Lin Xu

Improving BFGO with Apical Dominance-Guided Gradient Descent for Enhanced Optimization . . . 128
Hao-Jie Shi, Feng Guo, Yang-Zhi Chen, Lin Xu, and Ruo-Bin Wang

Research on Innovative Social Service Training Mode in Higher Vocational Colleges Under the New Situation . . . 138
Wei Zong and Dawei Luo

Research on the Precise Teaching Path of Higher Vocational Colleges Under the Concept of OBE in the Digital Era . . . 146
Juan Luo

An Optimal Inventory Replenishment Strategy with Cross-docking System and Time Window Problem . . . 156
Yen-Deng Huang, Simon Wu, and Xue-Fei Yuan

Proposal of a DDoS Attack Detection Method Using the Communication Interval . . . 165
Kosei Iwasa, Shotaro Usuzaki, Kentaro Aburada, Hisaaki Yamaba, Tetsuro Katayama, Mirang Park, and Naonobu Okazaki

Study of an Image-Based CAPTCHA that is Resistant to Attacks Using Image Recognition Systems . . . 175
Sojiro Nishikawa, Shotaro Usuzaki, Kentaro Aburada, Hisaaki Yamaba, Tetsuro Katayama, Mirang Park, and Naonobu Okazaki

Blind Image Quality Assessment Using Standardized NSS and Multi-pooled CNN . . . 185
Nay Chi Lynn, Yosuke Sugiura, and Tetsuya Shimamura


Abnormal Activity Detection Based on Place and Occasion in Virtual Home Environments . . . 193
Swe Nwe Nwe Htun, Shusaku Egami, Yijun Duan, and Ken Fukuda

Proposal of Fill in the Missing Letters CAPTCHA Using Associations from Images . . . 206
Hisaaki Yamaba, Muhammad Nur Firdaus Bin Mustaza, Shotaro Usuzaki, Kentaro Aburada, Masayuki Mukunoki, Mirang Park, and Naonobu Okazaki

Digital Transformation (DX) Solution for Monitoring Mycoplasma Infectious Disease in Calves: A Worldwide Health Challenge . . . 218
Cho Nilar Phyo, Pyke Tin, Hiromitsu Hama, and Thi Thi Zin

AI Driven Movement Rate Variability Analysis Around the Time of Calving Events in Cattle . . . 227
Wai Hnin Eaindrar Mg, Pyke Tin, Masaru Aikawa, Ikuo Kobayashi, Yoichiro Horii, Kazuyuki Honkawa, and Thi Thi Zin

Channel-Wise Pruning via Learn Gates&BN for Object Detection Tasks . . . 238
Min-Xiang Chen, Po-Han Chen, and Chia-Chi Tsai

Prediction of Sepsis Mortality Risk Based on Ensemble Learning Algorithm FBTV . . . 250
Xuan Zhang, Lixin Huang, Teng Fu, and Yujia Wu

Research on the Influence Mechanism of College Students’ Learning Satisfaction Based on the Chain Mediation Model . . . 260
Xin Guo, YiChen Yang, Zilong Yin, Ying Chen, and Yujia Wu

Design of License Plate Recognition System Based on Machine Vision . . . 271
Ming Hui Zhang, Xu Yang, and Ming Chao Zhang

Construction and Application of Knowledge Graph in the Field of Medical Food Supplements . . . 280
Ming Hui Zhang, Wei Hong Yu, and Ming Chao Zhang

Practice and Exploration of Discrete Mathematics Teaching in Applied Undergraduate Colleges Under the Background of New Engineering . . . 290
Ming Chao Zhang and Ming Hui Zhang

Research on Production Line Balance Optimization Based on Improved PSO-GA Algorithm . . . 299
Zhijian Pei, Zhihui Deng, and Xinmin Shi


Prediction and Analysis of Stroke Risk Based on Ensemble Learning . . . 311
Xiuji Zuo, Xin Guo, Zilong Yin, and Shih-Pang Tseng

The Users’ Purchase Behavior Research of Buying Clothing of Live Streaming eCommerce on Tiktok . . . 320
Huang Zheng and Zhiqiang Zhu

Exploration on the Evaluation Index System of E-commerce Application Talents’ Literacy for Industry Needs . . . 330
Zhiqiang Zhu

Research on Cosmetics Consumer Behavior Analysis and Marketing Strategy Optimization on E-Commerce Platform “Xiaohongshu” . . . 336
Xuwei Zhang and Zhiqiang Zhu

DOA Estimation of Special Non-uniform Linear Array Based on Quantum Honey Badger Search Algorithm . . . 345
Yaqing Zheng, Hongyuan Gao, and Yulong Qiao

Chewing Behavior Detection Based on Facial Dynamic Features . . . 355
Cheng-Zhe Tsai, Chun-Chih Lo, Lan-Yuen Guo, Chin-Shiuh Shieh, and Mong-Fong Horng

Unknown DDoS Attack Detection Using Open-Set Recognition Technology and Fuzzy C-Means Clustering . . . 366
Hao Kao, Thanh-Tuan Nguyen, Chin-Shiuh Shieh, Mong-Fong Horng, Lee Yu Xian, and Denis Miu

Dairy Cow Behavior Recognition Technology Based on Machine Learning Classification . . . 381
Che-Wei Chou, Chang-Ang Lee, Shu-Wei Guo, Chin-Shiuh Shieh, and Mong-Fong Horng

PSO-Based CI Agent with Learning Tool for Student Experience in Real-World Application . . . 392
Chang-Shing Lee, Mei-Hui Wang, Chih-Yu Chen, Che-Chia Liang, Sheng-Chi Yang, and Mong-Fong Horng

A GA-Based Scheduling Algorithm for Semiconductor-Product Thermal Cycling Tests . . . 403
Yeong-Chyi Lee, Tzung-Pei Hong, Yi-Chen Chiu, and Chun-Hao Chen


Power Internet of Things Security Evaluation Method Based on Fuzzy Set Theory . . . 413
Yuman Wang, Hongbin Wu, Yilei Wang, Zixiang Wang, Xinyue Zhu, and Kexiang Qian

A Q-Learning Based Power-Aware Data Dissemination Protocol for Wireless Sensor Networks . . . 423
Neng-Chung Wang, Chao-Yang Lee, Ming-Fong Tsai, Hsu-Fen Fu, and Wei-Jung Hsu

Application of Entropy – TOPSIS Method in Service Quality Assessment . . . 429
Tien-Chin Wang and Thuy Thi Thu Nguyen

Machine Learning Algorithm-Based Prediction of Hyperglycemia Risk After Acute Ischemic Stroke . . . 438
Yating Hao, Xuan Zhang, and Lihua Dai

Implementation of Campus Pedestrian Detection Using YOLOv5 . . . 447
Yuh-Chung Lin, Ta-Wen Kuan, Shih-Pang Tseng, and Xinhang Lv

Fick’s Law Algorithm with Gaussian Mutation: Design and Analysis . . . 456
Haonan Li, Shu-Chuan Chu, Saru Kumari, and Tsu-Yang Wu

Barnacle Growth Algorithm (BGA): A New Bio-Inspired Metaheuristic Algorithm for Solving Optimization Problems . . . 468
Ankang Shao, Shu-Chuan Chu, Yeh-Cheng Chen, and Tsu-Yang Wu

The Effect of Wooden Floor Processing Procedures on Product Quality . . . 480
Tien-Chin Wang, Tzu-Lien Hung, and Jiunn-Ming Yu

The Effect of Heat Treatment on the Quality Characteristics of Metal Materials . . . 493
Tien-Chin Wang, Shun-Tung Wu, and Jiunn-Ming Yu

Author Index . . . 505

Vehicle Traveling Speed Prediction Based on LightGBM Algorithm

Nan Li¹, Fumin Zou¹, and Feng Guo²(B)

¹ Fujian University of Technology, Fuzhou 350118, China
² Fujian Provincial Key Laboratory of Automotive Electronics and Electric Drive, Fujian University of Technology, Fuzhou 350118, China
[email protected]

Abstract. Vehicle speed is an important indicator of the efficiency and safety of highway traffic, so accurate prediction of vehicle speed on expressways can help reduce traffic accidents and improve the service level of intelligent traffic control. Existing machine-learning-based vehicle speed prediction models cannot simultaneously achieve high computational accuracy, strong generalization ability, and fast computation, so a vehicle speed prediction model based on the LightGBM algorithm is proposed. The model takes ETC transaction data as the research object and uses the Light Gradient Boosting Machine (LightGBM) algorithm to establish a vehicle traveling speed prediction model with the road features and vehicle traveling features in the feature library as inputs. It is compared with eight machine learning algorithms: Decision Tree (DT), eXtreme Gradient Boosting (XGBoost), Gradient Boosting Decision Tree (GBDT), Ridge Regression (RR), Support Vector Regression (SVR), Random Forest (RF), Classification and Regression Tree (CART), and Back Propagation Neural Network (BPNN). The results show that the LightGBM-based model has the best overall performance and can quickly predict vehicle traveling speed with high prediction accuracy and strong generalization ability.

Keywords: Vehicle speed prediction · LightGBM · feature library · machine learning

1 Introduction

In recent years, with the rapid development of intelligent transportation technology, vehicle speed prediction has attracted increasing attention. Accurate vehicle speed prediction is a crucial element in achieving a “known, measurable, controllable, and serviceable” traffic management system, and it is therefore an essential component of Intelligent Transportation Systems (ITS). Researchers have conducted many studies related to speed prediction in the field of transportation research, mainly divided into traffic flow speed prediction [1] and short-term speed prediction [2].


Considering that existing prediction methods pay little attention to the speed of vehicles traveling on highways and cannot balance prediction speed and accuracy, this paper proposes a LightGBM-based speed prediction model for vehicles traveling on highways. First, to address the problem of insufficient data volume, this paper uses ETC transaction data with a user penetration rate of more than 80%, mines the characteristics of the ETC transaction data in depth, and designs preprocessing rules for the ETC data. Second, the characteristics of vehicle traveling speed are studied in depth, feature vectors are constructed, and a relevance analysis of the feature vectors is carried out. On this basis, a LightGBM-based speed prediction model for vehicles traveling on highways is proposed. By comparing it with eight machine learning algorithms (DT, XGBoost, GBDT, RR, SVR, RF, CART, and BPNN), we verify the accuracy of LightGBM in prediction as well as its superiority in generalization ability and computation speed. The rest of the paper is organized as follows. We first review related research on traffic parameter prediction and traffic speed prediction in Sect. 2. Second, the related methods are presented in Sect. 3, including the overall framework, data preprocessing, feature vector construction, and the vehicle speed prediction model. Third, experiments and results are analyzed in Sect. 4. Finally, the conclusions of this paper are given and future research work is outlined.

2 Literature Review

Traffic speed, as an important factor in traffic flow analysis, has become a focus of traffic research, and scholars have used different methods to predict speed and compared the advantages and disadvantages of each method. Existing vehicle speed prediction methods can be summarized as follows: prediction methods based on statistical theory, prediction methods based on deep learning, and prediction methods based on combined models. Prediction methods based on statistical theory [3–5] usually assume that the driving speed of vehicles in a future period shares the same or similar characteristics with historical vehicle speed data, and then apply mathematical statistics to the historical data to predict vehicle driving conditions in the future period. Advances in machine learning have provided new methods for capturing more complex information from existing traffic datasets; methods such as K-Nearest Neighbors (KNN) [6], Bayesian networks [7], and SVM [8] have been successfully applied in speed prediction studies. Deep learning models can better handle nonlinear representations given large amounts of data, and they can automatically extract features and capture correlations in the data, so a large number of deep learning models have been used for traffic prediction [9, 10]. Currently, prediction methods based on combined models show good prediction performance [11]: researchers combine two or more prediction models for speed prediction, exploiting the advantages of different models to achieve better prediction accuracy [12, 13].


3 Methodology

3.1 Data Preprocessing

Vehicle Travel Speed Construction. Due to the high discreteness of ETC data, it is difficult to construct a vehicle’s speed from isolated data points. Therefore, the following definitions are made.

Definition 1. ETC gantries and toll station entrances/exits on an expressway are collectively referred to as Nodes; two adjacent Nodes on the road constitute an expressway segment, referred to as QD:

QD = <Q, Distance>    (1)

where Q = <Node1, Time1, Node2, Time2>, Node1 is the starting point of the section, Node2 is the end point of the section, and Distance is the length of the section.

Definition 2. All segments QD together form an expressway network, referred to as TOPO:

TOPO = <QD1, QD2, ..., QDn>    (2)

Definition 3. The time-ordered sequence of Nodes formed by a vehicle VID passing through ETC gantries on consecutive expressway segments is called the traveling track Etraj:

Etraj = <traj1, traj2, ..., trajm>    (3)

where traj1 and trajm are the starting point and the end point of the VID’s driving track, respectively, and traji is the transaction data of an ETC gantry that the vehicle passes while driving, containing the gantry ID traji.ID, the gantry transaction time traji.T, the vehicle ID traji.VID, the entrance number traji.EID, and the entrance time traji.ET; m is the number of records in the traveling track.

Definition 4. The average speed of a vehicle VID passing through a certain section QDi is called the average speed of the vehicle, calculated as follows:

v = Distance / (Time2 − Time1)    (4)
where Distance is the distance of the section, Time1 and Time2 are the time when the vehicle passes through the starting point of the section and the time when the vehicle passes through the end point of the section. The algorithm for constructing vehicle speed includes three parts: trajectory construction, data cleaning, and vehicle speed calculation. Firstly, the transaction data excluding abnormal data are grouped based on vehicle IDtraji.VID , entrance IDtraji.EID , and entrance timestamptraji.ET . Secondly, after sorting each data set by transaction timestamptraji.T , we eliminate duplicate entries. Thirdly, we examine the topological information of two adjacent data points within each set to determine correctness. This involves removing redundant data generated by opposite gantries and repairing missing data. Then, the trajectory dataset is updated with the vehicle trajectories that meet the


Finally, the algorithm traverses the vehicle's trajectory to extract the information of each pair of adjacent gantries, namely traji.ID, traji.T, traji+1.ID, and traji+1.T, and calculates the vehicle's travel speed through the section using Formula (4).

Algorithm for Vehicle Traveling Speed Outlier Detection. After the vehicle traveling speed data are constructed, most records reflect reasonable, objectively occurring speeds. However, the data still contain ETC transactions produced in abnormal states, such as a traveling speed that is obviously much larger or smaller than that of the same vehicle type under normal circumstances. Therefore, a vehicle traveling speed outlier detection algorithm is constructed to further eliminate such data. Its basic idea is to use simultaneously the upper and lower fences of the boxplot and the mean-centered threshold of the statistical speed distribution to determine the threshold interval for filtering abnormal traveling speed data, and to exclude data outside this interval, thereby quickly filtering abnormal records out of the massive ETC gantry transaction data. The final valid interval of vehicle traveling speed is v ∈ [vdown, vup], computed as follows:

vdown = max(v25% − 1.5 × (v75% − v25%), vmean − 2σ)  (5)

vup = min(v75% + 1.5 × (v75% − v25%), vmean + 2σ)  (6)

where v25% and v75% denote the 25th and 75th percentiles of vehicle traveling speed, vmean = (Σ_{i=1}^{N} vi)/N is the mean vehicle traveling speed, and σ denotes the standard deviation of vehicle traveling speed.
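The threshold computation in Formulas (5)–(6) is straightforward to express in code. The following is a minimal Python sketch (not the authors' implementation); the commented filtering step assumes a hypothetical DataFrame column named speed.

```python
import numpy as np

def speed_valid_interval(v):
    """Combine the boxplot fences with the mean +/- 2*sigma bounds
    of Formulas (5)-(6) into one valid speed interval."""
    v = np.asarray(v, dtype=float)
    q25, q75 = np.percentile(v, [25, 75])
    iqr = q75 - q25
    v_down = max(q25 - 1.5 * iqr, v.mean() - 2 * v.std())
    v_up = min(q75 + 1.5 * iqr, v.mean() + 2 * v.std())
    return v_down, v_up

# Keep only records whose speed lies inside [v_down, v_up], e.g.:
# lo, hi = speed_valid_interval(df["speed"])
# df = df[df["speed"].between(lo, hi)]
```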

3.2 Travel Speed Prediction on Expressway

Feature Vector. Many factors affect the speed of vehicles on expressways, and they are highly nonlinear. For this reason, this paper draws on the results of previous work and constructs feature vectors from road characteristics and vehicle driving characteristics, as follows:

(1) Road characteristics: The effect of road features on speed prediction is significant. The length of the section, the presence or absence of tunnels in the section, and the traffic flow of the section in the current and previous 15-min periods all influence vehicle speed. The road feature vector is therefore constructed as:

R = (Len, tunnel, F, Fprev)^T  (7)

where Len is the length of the section in km; tunnel indicates whether the section contains a tunnel (0 if not, 1 if so); F is the traffic flow of the section in the current 15-min period, in vehicles/15 min; and Fprev is the traffic flow of the section in the previous 15-min period, also in vehicles/15 min.


(2) Vehicle driving characteristics: In general, the performance and characteristics of different vehicle types affect driving speed. For example, trucks usually drive at lower speeds because they carry heavy loads, whereas passenger cars usually drive at higher speeds. A vehicle's historical driving speed can reflect the driver's habits and behavior: a high historical speed may mean the driver tends to drive faster, while a lower historical speed may indicate a preference for slower driving. Moreover, the driving speed in the previous section can affect the speed in the current section, and traffic flow and road conditions may differ at different travel times, changing the driving speed accordingly. The vehicle feature vector is therefore constructed as:

Vel = (class, ctype, vhist, vprev, time, date)^T  (8)

where class is the vehicle type, of which there are 16 categories; ctype is the passenger/freight characteristic; vhist is the historical driving speed of the vehicle; vprev is the average speed of the vehicle in the previous section; time is the time-period characteristic of the trip, with the whole day divided into 96 segments of 15 min each, so its value ranges from 0 to 95; and date is the working-day characteristic, taking the value 0 for a non-working day and 1 for a working day. Constructing the feature vector means transforming all feature values into this vector representation.

Vehicle Traveling Speed Prediction Model Based on LightGBM Algorithm. GBDT is a representative ensemble learning framework using the Boosting strategy [14]. It builds strong classifiers or regressors by combining weak learners, the commonly used base learner being CART. GBDT is an additive model that accumulates the predicted values of all CART trees as the final prediction, training the model iteratively in the direction of gradient descent of the base learner's loss function. LightGBM [15] is a decision-tree-based GBDT framework that optimizes the shortcomings of XGBoost. In XGBoost, features are sorted before feature gains are calculated, which consumes a lot of memory, and computing the gain for every split point of every feature also consumes considerable computational resources. To address these problems, LightGBM uses the histogram algorithm instead of feature sorting to reduce the number of candidate split points, the Gradient-based One-Side Sampling (GOSS) algorithm to reduce the sample size, and the Exclusive Feature Bundling (EFB) algorithm to reduce the number of features. Together these improve training efficiency and save training time without reducing classification or regression accuracy.
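As a concrete illustration (not the authors' code), fitting such a model with the LightGBM Python package on the 10-dimensional feature vectors of Sect. 3.2 might look as follows; X and y stand for the assembled feature matrix and observed section speeds, and the parameter values anticipate Table 2:

```python
import lightgbm as lgb
from sklearn.model_selection import train_test_split

# X: rows of (Len, tunnel, F, Fprev, class, ctype, vhist, vprev, time, date)
# y: observed vehicle traveling speed for each row
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

model = lgb.LGBMRegressor(
    n_estimators=2201,    # parameter values from Table 2
    learning_rate=0.01,
    max_depth=9,
    num_leaves=850,
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```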


4 Experiments and Analysis of Results

4.1 Data Sources and Pre-processing

The experimental data come from the ETC transaction data of the Fuxia Expressway, a key road network of Fujian Province, for the 5 days from June 1 to June 5, 2021, totaling 18,726,700 entries. The data contain fields such as vehicle identification, transaction time, and gantry number after desensitization. According to the expressway vehicle classification and toll standard, vehicles are divided into 4 categories of passenger cars and 6 categories of trucks. The set of vehicle driving speed feature vectors on the expressway is constructed from the 10 statistical features of the feature vector model, as shown in Table 1; each vector contains 10 attribute dimensions together with the corresponding driving speed.

Table 1. Sample vehicle driving speed feature vectors on expressway.

Len     tunnel  F    Fprev  class  ctype  vhist   vprev   time  date  v
2.633   0       453  434    11     2      90.53   95.22   62    0     93.84
2.633   0       326  319    1      1      101.84  103.16  76    1     103.03
...     ...     ...  ...    ...    ...    ...     ...     ...   ...   ...
14.376  1       254  225    4      1      98.04   98.40   37    0     96.73
14.376  1       177  172    1      1      112.43  109.97  50    1     110.82

4.2 Parameter Selection

An important part of using the LightGBM algorithm to predict vehicle speed is setting the values of the different parameters, as well as the learning task and the corresponding learning objective, to enhance the controllability of the model. Among all LightGBM parameters, those with the greatest impact on performance include num_leaves, n_estimators, learning_rate, and max_depth. To obtain the optimal parameter combination, the parameters must be tuned before training the model. Compared with grid search and random search, the Bayesian algorithm overcomes their limitations in both the speed and the quality of parameter optimization. Bayesian optimization can be realized with the Bayesian Optimization library for Python, whose built-in functions allow the range of each input parameter to be specified. After a random search of the initial space, the search space is continuously adjusted and narrowed according to feedback from iterations of the objective function, and multiple parameters are optimized over a wide range. The parameter combination with the best overall performance is finally obtained, achieving the optimization objective of maximizing the coefficient of determination; the optimization function is given in Eq. (9). The optimal parameter settings of the LightGBM model used in this paper are shown in Table 2.

max R² = Σ_{i=1}^{n} (ŷ(leaves, estimators, learning_rate, max_depth)_i − ȳ)² / Σ_{i=1}^{n} (y_i − ȳ)²  (9)

where i is the serial number of the sample; n is the number of samples; y_i is the actual speed of the vehicle in sample i; leaves is the value of the parameter num_leaves; estimators is the number of iterations n_estimators; learning_rate is the model learning rate; and max_depth is the maximum depth of the tree.

Table 2. Optimal parameters for the LightGBM model.

Parameter name  Value
Base learner    GBtree
n_estimators    2201
learning_rate   0.01
max_depth       9
num_leaves      850
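A hedged sketch of this tuning loop with the bayes_opt package is shown below; the search ranges in pbounds are illustrative rather than the paper's, and X_train/X_val, y_train/y_val denote an assumed train/validation split:

```python
from bayes_opt import BayesianOptimization
import lightgbm as lgb
from sklearn.metrics import r2_score

def objective(num_leaves, n_estimators, learning_rate, max_depth):
    """Validation R^2 for one parameter combination (the target of Eq. 9)."""
    model = lgb.LGBMRegressor(
        num_leaves=int(num_leaves),
        n_estimators=int(n_estimators),
        learning_rate=learning_rate,
        max_depth=int(max_depth),
    )
    model.fit(X_train, y_train)
    return r2_score(y_val, model.predict(X_val))

optimizer = BayesianOptimization(
    f=objective,
    pbounds={                          # illustrative search ranges
        "num_leaves": (50, 1000),
        "n_estimators": (100, 3000),
        "learning_rate": (0.005, 0.3),
        "max_depth": (3, 12),
    },
    random_state=0,
)
optimizer.maximize(init_points=5, n_iter=30)  # random init, then guided search
print(optimizer.max)                          # best R^2 and parameter values
```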

4.3 Evaluation Indicators

To evaluate the prediction performance of the model, five metrics were used in the experiments: the coefficient of determination (R²), mean squared error (MSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and operating time. The specific equations are as follows:

R² = 1 − Σ_{i=1}^{N} (y_i − ŷ_i)² / Σ_{i=1}^{N} (y_i − ȳ)²  (10)

MSE = (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)²  (11)

MAE = (1/N) Σ_{i=1}^{N} |y_i − ŷ_i|  (12)

MAPE = (1/N) Σ_{i=1}^{N} |(y_i − ŷ_i)/y_i| × 100%  (13)

operating time = end_time − start_time  (14)

where y_i is the actual vehicle traveling speed, ŷ_i is the predicted vehicle traveling speed, ȳ is the mean of the actual speeds, and N is the sample size. MAE and MSE express the error between the actual speed y_i and the predicted value ŷ_i; smaller values indicate better prediction accuracy. R² expresses the model's ability to explain the data, with larger values indicating a better fit. MAPE measures the generalization ability of the model.
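For reference, these indicators can be computed with scikit-learn and the standard library. The following is a minimal sketch, assuming the fitted model and test split from the previous subsections; here only the prediction step is timed, for illustration:

```python
import time
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

start_time = time.perf_counter()
y_pred = model.predict(X_test)                   # step timed as in Eq. (14)
operating_time = time.perf_counter() - start_time

r2 = r2_score(y_test, y_pred)                    # Eq. (10)
mse = mean_squared_error(y_test, y_pred)         # Eq. (11)
mae = mean_absolute_error(y_test, y_pred)        # Eq. (12)
mape = np.mean(np.abs((y_test - y_pred) / y_test)) * 100  # Eq. (13), in %
```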

4.4 Experimental Results and Analysis

From the actual speeds of the samples in the test set and the speeds produced by the prediction model, a frequency distribution histogram of the sample relative errors can be obtained, as shown in Fig. 1. As can be seen from Fig. 1, for 96% of the samples the relative error between the predicted and true vehicle traveling speed lies within [−6%, 6%], indicating good prediction performance.

[Figure 1: frequency distribution of the sample relative errors; x-axis: Relative Error (−0.2 to 0.2), y-axis: Frequency (0 to 8000); the annotated sum of the displayed bars is 96%.]

Fig. 1. Histogram of the frequency distribution of the relative errors of the samples

To assess the overall performance of the experimental model among regression algorithms, different algorithmic models were applied to the same valid data, and their prediction accuracy, generalization ability, and computing speed were compared. The comparison algorithms are representative, well-performing algorithms in current research, covering traditional machine learning, deep learning, and ensemble learning: DT, XGBoost, GBDT, RR, SVR, RF, CART, and BPNN. With each algorithm tuned to its optimal settings within the hyperparameter search range, the trained models were evaluated on the test set, and the differences between predicted and actual values were analyzed to obtain the R², MSE, MAE, MAPE, and operating time of each prediction model; the detailed results are shown in Table 3.


Table 3. Performance comparison among different algorithms.

Algorithm  R²      MSE      MAE     MAPE/%  operating time
LightGBM   0.9495  10.6461  2.4001  2.4809  1.1553
DT         0.8687  20.2879  3.3361  3.4463  0.9220
XGBoost    0.9342  10.6658  2.3388  2.4231  9.7745
GBDT       0.9131  13.4351  2.6599  2.7791  14.4025
RR         0.8721  19.7844  2.9659  3.1373  0.0186
SVR        0.8696  20.1592  2.9561  3.1217  1617.4928
RF         0.9311  10.6517  2.4021  2.4829  121.2121
CART       0.8691  20.2264  3.3322  3.4442  0.9318
BPNN       0.9215  12.1242  2.5194  2.5942  280.3894

Table 3 shows that the prediction accuracy and generalization ability of the LightGBM-based vehicle speed prediction model rank first among the nine algorithms compared, while its operating time is greatly shortened at the same level of prediction accuracy. XGBoost has the best generalization ability, but its running time is longer. RR has the fastest operation speed, but its prediction accuracy is lower and its generalization ability ranks sixth. Therefore, LightGBM has the best overall performance in terms of prediction accuracy, generalization ability, and operation speed.

5 Conclusions

Using ETC gantry transaction data on the expressway, the driving speed of vehicles equipped with OBU devices is selected as the prediction object. Compared with section average speeds, this accurately reflects the running condition of individual vehicles and enables short-term prediction of vehicle speed, while parameter optimization improves the prediction performance of the model without sacrificing computational efficiency. Compared with eight machine learning algorithms (DT, XGBoost, GBDT, RR, SVR, RF, CART, and BPNN), the LightGBM-based vehicle speed prediction model shows clear improvements in prediction accuracy, generalization ability, and computation speed, and the following conclusions are obtained: (a) The prediction accuracy of the LightGBM-based vehicle speed prediction model is high, with R² reaching 0.9495 and MAPE only 2.4809%; its generalization ability is strong and its computation fast, with model training and prediction taking 1.1553 s. (b) Compared with the comparison algorithms, the LightGBM-based model has the best overall performance: it achieves fast prediction of vehicle traveling speed while guaranteeing high prediction accuracy and strong generalization ability.


References

1. Gao, Y., Zhou, C., Rong, J., et al.: Short-term traffic speed forecasting using a deep learning method based on multitemporal traffic flow volume. IEEE Access 10, 82384–82395 (2022)
2. Tran, Q.H., Fang, Y.M., Chou, T.Y., et al.: Short-term traffic speed forecasting model for a parallel multi-lane arterial road using GPS-monitored data based on deep learning approach. Sustainability 14(10), 6351 (2022)
3. Guo, J., He, H., Sun, C.: ARIMA-based road gradient and vehicle velocity prediction for hybrid electric vehicle energy management. IEEE Trans. Veh. Technol. 68(6), 5309–5320 (2019)
4. Shin, J., Sunwoo, M.: Vehicle speed prediction using a Markov chain with speed constraints. IEEE Trans. Intell. Transp. Syst. 20(9), 3201–3211 (2018)
5. Lin, X., Zhang, G., Wei, S.: Velocity prediction using Markov chain combined with driving pattern recognition and applied to dual-motor electric vehicle energy consumption evaluation. Appl. Soft Comput. 101, 106998 (2021)
6. Rasyidi, M.A., Kim, J., Ryu, K.R.: Short-term prediction of vehicle speed on main city roads using the k-nearest neighbor algorithm. J. Intell. Inf. Syst. 20(1), 121–131 (2014)
7. Wang, L.H., Cui, Y.H., Zhang, F.Q., et al.: Stochastic speed prediction for connected vehicles using improved Bayesian networks with back propagation. Sci. China Technol. Sci. 65(7), 1524–1536 (2022)
8. Rahman, M., Kang, M.W., Biswas, P.: Predicting time-varying, speed-varying dilemma zones using machine learning and continuous vehicle tracking. Transp. Res. Part C: Emerg. Technol. 130, 103310 (2021)
9. Zhao, J., Gao, Y., Bai, Z., et al.: Traffic speed prediction under non-recurrent congestion: based on LSTM method and BeiDou navigation satellite system data. IEEE Intell. Transp. Syst. Mag. 11(2), 70–81 (2019)
10. Jeong, M.H., Lee, T.Y., Jeon, S.B., et al.: Highway speed prediction using gated recurrent unit neural networks. Appl. Sci. 11(7), 3059 (2021)
11. Li, Q., Cheng, R., Ge, H.: Short-term vehicle speed prediction based on BiLSTM-GRU model considering driver heterogeneity. Physica A 610, 128410 (2023)
12. Zhang, A., Liu, Q., Zhang, T.: Spatial–temporal attention fusion for traffic speed prediction. Soft Comput., 1–13 (2022)
13. Pan, C., Zhu, J., Kong, Z., et al.: DC-STGCN: dual-channel based graph convolutional networks for network traffic forecasting. Electronics 10(9), 1014 (2021)
14. Li, D., Ma, C.: Research on lane change prediction model based on GBDT. Physica A 608, 128290 (2022)
15. Wang, D., Li, L., Zhao, D.: Corporate finance risk prediction based on LightGBM. Inf. Sci. 602, 259–268 (2022)

A Feature Matching Method Based on Rolling Guided Filter and Collinear Triangular Matrix Optimal Transport Liu Xiaoming, Yuan Yizhao, Li Qiqi, and Zhao Huaqi(B) The Heilongjiang Provincial Key Laboratory of Autonomous Intelligence and Information Processing, School of Information and Electronic Technology, Jiamusi University, Heilongjiang 154000, Jiamusi, China [email protected] , [email protected]

Abstract. Unmanned aerial vehicle (UAV) cross-view object matching is widely used in applications such as field work, disaster detection, and express delivery. The feature-matching method is a key technology for UAV cross-view object localization. This paper proposes a feature-matching method based on collinear triangular matrix optimal transport. First, a rolling-guided filter is introduced to calculate the keypoint response, which addresses the problem of scale difference between the two images. Then, the keypoint response is employed to calculate the network loss, which addresses the problem of viewing angle differences between images. Finally, collinear triangular matrix optimal transport is proposed to improve the accuracy and speed of the algorithm. The experimental results show that the proposed method has at least 4% higher AR than existing methods.

Keywords: Cross-view geo-localization · Object matching · Properties optimization feature matching · Optimal transport

1 Introduction

Cross-view geo-localization (CGL) is widely utilized in various fields, such as agriculture, aerial photography, navigation, event detection, and accurate delivery [18]. However, the satellite image and the UAV image are acquired from different views, which causes the problem of view difference [16]. Feature matching is an important method for solving the view difference problem in CGL. SIFT is regarded as one of the traditional hand-crafted feature matching methods and has been verified on a large number of natural images; however, because the satellite image and the airborne object differ in view, cross-view point matching with such methods is not satisfactory [9]. FAST [13] and ORB [8] are commonly considered fast feature matching methods, but they do not have good scale invariance. TILDE [5] is a temporally invariant keypoint detector, which can readily adapt to image changes caused by weather, season, time, and


other factors. SuperPoint [19] uses self-supervised learning to learn interest points and descriptors, which has proven effective for the view difference problem. These optimization algorithms perform well on natural images, but cross-view image matching is limited by complex scenes. To improve the performance of cross-view point matching, Yan et al. proposed a general learning-theoretic framework for keypoint detectors and feature description [14], which considers the sparsity, repeatability, and distinguishability properties of convolutional neural networks to optimize the keypoint detector and feature description models. In practical applications, several challenges remain when state-of-the-art works are applied to object matching. First, existing learning methods consider only a few property features and thus do not obtain keypoints with high distinguishability and matchability. Second, the latest methods do not consider the global information of the point set, which makes the algorithms inaccurate. Motivated by these challenges, we present a new method to improve the performance of feature matching. (1) A rolling-guided filter is introduced to generate an image-scale pyramid that preserves image edge information and improves the repeatability of keypoints. (2) We introduce the keypoint response to constrain the feature matching convolutional neural network, which improves the performance of feature matching. (3) Collinear triangular matrix optimal transport is employed to exploit the global information of the keypoint set and improve the performance and speed of the method. The rest of the paper is organized as follows: Sect. 2 presents the proposed framework, Sect. 3 reports the experimental analysis of feature matching on the University-1652 dataset, and Sect. 4 concludes the paper.

2 Proposed Framework

Aimed at the problem of view angle difference in the matching of UAV objects, this paper studies a feature matching method based on collinear triangular matrix optimal transport, shown in Fig. 1. First, given the satellite image and the UAV image, the keypoint response is computed by a feature detection method based on rolling-guided filter constraints. Second, a convolutional neural network combined with the keypoint response function optimizes the keypoint detector and feature descriptor, which are used for coarse feature matching. Finally, the feature matching is refined by the collinear triangular matrix optimal transport.


Fig. 1. Network structure of optimized feature matching

2.1 The Keypoint Response Computation Method Based on Rolling Guided Filter Constraints

In this section, we propose a keypoint response computation method based on rolling-guided filter constraints. Traditionally, small-scale structures are removed using a Gaussian filter:

G(p) = (1/Kp) Σ_{q∈N(p)} exp(−||p − q||² / (2σs²)) I(q)  (1)

where Kp = Σ_{q∈N(p)} exp(−||p − q||² / (2σs²)) denotes the normalization term, I is the input image, G is the output image, p and q are pixel coordinates in the image, σs is the Gaussian standard deviation, and N(p) is the set of coordinates of the pixels neighboring p. This filter eliminates structures smaller than σs in scale. Instead of this traditional approach, we choose a rolling-guided filter to establish the difference of Gaussian rolling (DoGR) guided pyramid. First, we set the number of octaves in the scale-space pyramid to O and the number of layers to S. The scale at any position is given by

σ(i, j) = k^(i−1) σ0 2^(j−1)  (2)

where σ0 is the initial scale and k = 2^(1/c) is the scale transfer coefficient, with S = c + 3, i ∈ [1, O], and j ∈ [1, S]. The rolling-guided filter is convolved with the input image to obtain the scale-space image, which is expressed by the


following equation:

J^(t+1)(p, σ(i, j)) = (1/Kp) Σ_{q∈N(p)} exp(f1 + f2) I(q)
f1 = −||p − q||² / (2σ²(i, j))
f2 = −||J^t(p) − J^t(q)||² / (2σr²)  (3)

where Kp = Σ_{q∈N(p)} exp(f1 + f2) is the normalization term and I(q) is the input image. The scale space is constructed by repeatedly applying Eq. (3) to generate a Gaussian rolling guided pyramid with O octaves and S layers. Local extrema are detected on the DoGR pyramid to obtain candidate keypoints, and the keypoint response is computed.
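A direct (unoptimized) NumPy rendering of the rolling guidance iteration in Eq. (3) is sketched below for a single-channel float image; it is an illustration of the filter's structure, not the authors' implementation, and it uses wrap-around borders via np.roll for brevity:

```python
import numpy as np

def rolling_guided_filter(I, sigma_s=2.0, sigma_r=0.1, iters=4):
    """Iterate Eq. (3): a joint bilateral step whose range term is
    evaluated on the previous iterate J^t rather than on the input I."""
    radius = int(3 * sigma_s)
    offsets = [(dy, dx) for dy in range(-radius, radius + 1)
                        for dx in range(-radius, radius + 1)]
    # J^0 = 0 makes the first pass a plain Gaussian blur (f2 vanishes).
    J = np.zeros_like(I)
    for _ in range(iters):
        num = np.zeros_like(I)
        den = np.zeros_like(I)
        for dy, dx in offsets:
            w_s = np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))  # f1 term
            Iq = np.roll(I, (dy, dx), axis=(0, 1))
            Jq = np.roll(J, (dy, dx), axis=(0, 1))
            w = w_s * np.exp(-(J - Jq) ** 2 / (2 * sigma_r ** 2))    # f2 term
            num += w * Iq
            den += w
        J = num / den
    return J
```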

2.2 Feature Matching Based on Property Optimization with Keypoint Response Constraints

In this section, we propose a feature matching method based on property optimization constrained by the keypoint response.

[Figure 2 sketches the pipeline: the satellite and UAV images yield keypoint responses used to generate training images; a keypoint detector and descriptor convolutional neural network is trained under sparsity, repeatability, and distinguishability probabilities; keypoint detection and similarity computation then produce the corresponding keypoints.]

Fig. 2. Network structure of optimized feature matching

First, the keypoint response is calculated by the rolling-guided filter. Then, a loss function is constructed from the keypoint response while jointly optimizing sparsity, repeatability, and distinguishability. The joint probability of the properties is maximized using the Expectation Maximization (EM) algorithm, resulting in a trained feature detector θF and feature descriptor θD [3]. Finally, the keypoint detector and feature descriptor are used to compute the keypoints and feature descriptors of the images. As in the SIFT feature matching algorithm, the corresponding keypoint sets are obtained through nearest-neighbor matching (Fig. 2).

2.3 Collinear Triangular Matrix Optimal Transport

To advance the performance of feature matching, we employ collinear triangular matrix optimal transport to improve the accuracy and speed of the algorithm. We exploit the special structure of the kernel matrices by defining and using the properties of the Lower-Collinear Triangular Matrix (L-CoLT matrix) and the Upper-Collinear Triangular Matrix (U-CoLT matrix) [6]. For these matrices, matrix-vector multiplication can be realized at O(N) cost using dynamic programming. The L-CoLT matrix is defined as

C_L^N = {M ∈ R^(N×N) | m_(i+1,j)/m_(i,j) = r_i, j ≤ i; m_(i,j) = 0, i < j; r ∈ (R\{0})^(N−1)}  (4)

and the U-CoLT matrix is defined as

C_U^N = {M ∈ R^(N×N) | m_(i−1,j)/m_(i,j) = r′_(i−1), i < j; m_(i,j) = 0, i ≥ j; r′ ∈ (R\{0})^(N−2)}  (5)

C^N = C_L^N + C_U^N = {A + B | A ∈ C_L^N, B ∈ C_U^N}  (6)

The space and time complexities of the resulting algorithms are O(N), which is much better than the original matrix-vector multiplication.
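To make the O(N) claim concrete, here is a small sketch (written from the definition in Eq. (4), not taken from [6]) of the matrix-vector product for an L-CoLT matrix: since row i+1 of the lower triangle is r_i times row i, the partial sums can be reused by dynamic programming.

```python
import numpy as np

def lcolt_matvec(d, r, v):
    """O(N) evaluation of y = M @ v for a Lower-CoLT matrix M, which is
    fully determined by its diagonal d (d[i] = m_ii) and ratio vector r
    (m_{i+1,j} = r[i] * m_{i,j} for j <= i):
        y[0]   = d[0] * v[0]
        y[i+1] = r[i] * y[i] + d[i+1] * v[i+1]
    """
    y = np.empty(len(v))
    y[0] = d[0] * v[0]
    for i in range(len(v) - 1):
        y[i + 1] = r[i] * y[i] + d[i + 1] * v[i + 1]
    return y

# Sanity check against an explicitly built dense L-CoLT matrix.
rng = np.random.default_rng(0)
n = 6
d, r, v = rng.random(n) + 0.1, rng.random(n - 1) + 0.1, rng.random(n)
M = np.diag(d)
for i in range(n - 1):              # row i+1 = r[i] * row i on columns <= i
    M[i + 1, : i + 1] = r[i] * M[i, : i + 1]
assert np.allclose(M @ v, lcolt_matvec(d, r, v))
```

The U-CoLT case is symmetric, with the recursion running from the last row upward.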

3 Experimental Analysis

To validate the performance of the proposed algorithm, experiments were conducted on the University-1652 dataset released by the University of Technology Sydney (UTS) [17]. The repeatability rate (RPR), recall rate (RR), accuracy rate (AR), and quantity rate (QR) were chosen as evaluation metrics [7]. In the following sections, we analyze the scale factor of the rolling-guided filter, analyze the property constraints in feature matching, and compare the method with existing methods.

3.1 Analysis of Scale Factor of Rolling-Guided Filter

In this section, the rolling-guided filter is introduced to construct the scale space, which helps preserve the edge information of the image and improves the performance of feature matching, as shown in Figs. 3, 4, 5, 6, 7 and 8. When σr = 0.05, the matching accuracy (AR) is 96.36%, indicating good matching performance; however, as σr increases, AR first decreases to below 95%. The highest AR of 98.44% is achieved when σr = 0.2. When σr = 0.25, the AR drops sharply to 76.36%, indicating a significant decrease in matching performance.


Fig. 3. Matching accuracy rate at different scales

3.2 Property Constraint Analysis in Feature Matching

The property constraint analysis in this experiment verifies the feature matching performance under keypoint response constraints. To verify the feasibility of the proposed method, this section experimentally evaluates different property constraint strategies. The compared variants are FM(S), FM(R), FM(C), FM(SC), FM(SR), FM(RC), FM(SRC), and FM(SRCK), where FM stands for feature matching and the letters in parentheses denote the combined properties: S for sparsity, R for repeatability, C for distinguishability, and K for the keypoint response constraint. The experimental results are shown in Table 1, which indicates that the proposed method considering all four properties achieves the highest performance.

Table 1. Changeable trend of point matching accuracy and recall rate with different properties

method     RPR     RR      AR      QR
FM(S)      0.5745  0.0066  0.1545  0.0036
FM(R)      0.5736  0.0083  0.1627  0.0036
FM(C)      0.5466  0.0110  0.1848  0.0027
FM(SC)     0.5404  0.0107  0.1670  0.0027
FM(SR)     0.5657  0.0095  0.1668  0.0031
FM(RC)     0.5680  0.0068  0.1425  0.0032
FM(SRC)    0.4186  0.0170  0.1856  0.0018
FM(SRCK)   0.4325  0.0182  0.1923  0.0021

3.3 Comparison and Analysis with Existing Methods

In this section, the proposed method is compared with several mainstream algorithms, including SIFT [9], SURF [2], ORB [8], ASIFT [4], AKAZE [1], SuperPoint [3], SuperGlue [10], DISK [11], TILDE [12], and POP-Net [15]. The experimental results are presented in Table 2. Among all algorithms, the proposed method has the highest overall performance across RPR, RR, AR, and QR, which indicates that it is suitable for cross-view target matching. In particular, the AR of the proposed method is at least 4% higher than that of the other methods.

Table 2. Comparison of the proposed method with existing methods

method           RPR     RR      AR      QR      TE
SURF             0.8799  0.0020  0.224   0.0112  0.93
ORB              0.3811  0.0242  0.1690  0.0019  0.63
AKAZE            0.7032  0.0067  0.2390  0.0042  1.39
ASIFT            0.9665  0.0046  0.3290  0.1608  14.6
TILDE            0.2794  0.0015  0.0038  0.0019  2.56
DISK             0.9075  0.0033  0.1860  0.0124  1.5
Superpoint       0.5392  0.0245  0.2722  0.0021  0.72
SuperGlue        0.5849  0.0649  0.2775  0.0026  2.2
POP-Net          0.5831  0.0146  0.2317  0.0033  0.67
Proposed method  0.9550  0.014   0.3644  0.0316  0.83

4 Conclusion

A feature matching method based on a rolling-guided filter and collinear triangular matrix optimal transport has been introduced in this study. The proposed method adopts the rolling-guided filter to calculate the keypoint response, addressing the problem of scale difference between the two images. To solve the problem of viewing angle differences, the keypoint response is used to calculate the network loss. Furthermore, collinear triangular matrix optimal transport is proposed to improve the accuracy and speed of the algorithm, which performs better than existing algorithms. However, practical applications reveal that there is still room to improve the stability of the optimal transport. In the future, we will improve the optimization capability of the optimal transport by introducing different optimization strategies.

Acknowledgments. This work is supported by the Natural Science Foundation of Heilongjiang (LH2022F052), the Doctoral Program of Jiamusi University (JMSUBZ2022-13), and the National Natural Science Foundation Training Project of Jiamusi University (JMSUGPZR2022-016).


References

1. Alcantarilla, P.F., Solutions, T.: Fast explicit diffusion for accelerated features in nonlinear scale spaces. IEEE Trans. Pattern Anal. Mach. Intell. 34(7), 1281–1298 (2011)
2. Bay, H., Tuytelaars, T., Van Gool, L.: SURF: speeded up robust features. Lect. Notes Comput. Sci. 3951, 404–417 (2006)
3. DeTone, D., Malisiewicz, T., Rabinovich, A.: SuperPoint: self-supervised interest point detection and description. In: IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, pp. 337–349 (2018)
4. Gao, J., Sun, Z.: An improved ASIFT image feature matching algorithm based on POS information. Sensors 22(20), 7749 (2022)
5. Huo, Z., Zhang, Y., Liu, H., Wang, J., Liu, X., Zhang, J.: Improved covariant local feature detector. Pattern Recogn. Lett. 135, 1–7 (2020)
6. Liao, Q., Wang, Z., Chen, J., Bai, B., Jin, S., Wu, H.: Fast Sinkhorn II: collinear triangular matrix and linear time accurate computation of optimal transport. arXiv preprint arXiv:2206.09049 (2022)
7. Liu, X., Li, J.B., Pan, J.S.: Feature point matching based on distinct wavelength phase congruency and log-Gabor filters in infrared and visible images. Sensors 19(19), 4244–4264 (2019)
8. Loncomilla, P., del Solar, J.R., Martínez, L.: Object recognition using local invariant features for robotic applications: a survey. Pattern Recogn. 60, 499–514 (2016)
9. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60, 91–110 (2004)
10. Sarlin, P.E., DeTone, D., Malisiewicz, T., Rabinovich, A.: SuperGlue: learning feature matching with graph neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4938–4947 (2020)
11. Tyszkiewicz, M., Fua, P., Trulls, E.: DISK: learning local features with policy gradient. Adv. Neural Inf. Process. Syst. 33, 14254–14265 (2020)
12. Verdie, Y., Yi, K., Fua, P., Lepetit, V.: TILDE: a temporally invariant learned detector. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5279–5288 (2015)
13. Wang, J., Wang, H., Nie, F., Li, X.: Sparse feature selection via fast embedding spectral analysis. Pattern Recogn. 139, 109472 (2023)
14. Yan, P., Tan, Y., Tai, Y., Wu, D., Hao, X.: Unsupervised learning framework for interest point detection and description via properties optimization. Pattern Recogn. 112, 1–13 (2021)
15. Yang, J., Shi, Y., Qi, Z.: Learning deep feature correspondence for unsupervised anomaly detection and segmentation. Pattern Recogn. 132, 108874 (2022)
16. Zheng, Z., Wei, Y., Yang, Y.: University-1652: a multi-view multi-source benchmark for drone-based geo-localization. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 1395–1403 (2020)
17. Zheng, Z., Yunchao, W., Yi, Y.: University-1652: a multi-view multi-source benchmark for drone-based geo-localization. arXiv:2002.12186 (2020)
18. Zhu, P., et al.: Detection and tracking meet drones challenge. IEEE Trans. Pattern Anal. Mach. Intell. 44(11), 7380–7399 (2021)
19. Zou, B., Li, H., Zhang, L.: Self-supervised SAR image registration with SAR-SuperPoint and transformation aggregation. IEEE Trans. Geosci. Remote Sens. 61, 1–15 (2023)

Short-Time Driving Style Classification and Recognition Method on Expressway GuangHao Luo, FuMin Zou, Feng Guo(B) , and ChenXi Xia Fujian Key Laboratory of Automotive Electronics and Electric Drive, Fujian University of Technology, FuZhou 350118, China [email protected]

Abstract. This article proposes a short-term driving style classification method that considers spatiotemporal features. Short-term driving style features are constructed by deeply mining the ETC (Electronic Toll Collection) transaction data of vehicles equipped with onboard units (OBU) on the highways of Fujian Province, including historical segment speeds, segment flow rates, passage time points, vehicle types, and other data. Subsequently, the silhouette coefficient method is employed to determine the optimal number of clusters, and the K-means algorithm is used to cluster the spatiotemporal features of vehicles' driving behavior. Finally, a support vector machine is used to recognize driving styles. This approach accurately captures the real-time features of vehicles, thereby enhancing the accuracy and reliability of driving style classification. With this method, more personalized driving style recognition and analysis can be provided to drivers, potentially playing a positive role in intelligent driving and traffic management.

Keywords: Short-term driving style · Silhouette coefficient method · K-means · Support vector machine

1 Introduction

Driving style is a generalized portrayal of a driver's habitual driving conduct, encompassing inherent behavioral patterns [1]. Driving styles can be classified into short-term and long-term variants, each with distinct applications. Short-term driving style primarily characterizes immediate vehicle operation, often influenced by the ongoing driving task and scenario; it finds utility in applications such as driving behavior monitoring and correction [2, 3] and predicting driver intentions [4, 5]. Conversely, long-term driving style describes vehicle operation over extended periods, typically derived from the analysis of extensive driving data. This article concentrates on short-term driving style for expressway vehicles. Diverse factors such as gender, age, and experience shape distinct driving styles, yielding varying degrees of dissimilarity. Typically, driving styles fall into aggressive, neutral, and conservative categories [6–11]. However, these styles are not static; they shift with driving tasks and environments, often transitioning between categories. Notably, short-term style


shifts across driving scenarios display notable randomness. For instance, even typically conservative drivers may exhibit aggressive behavior under urgency, affecting safety. This highlights that dynamic vehicle control is influenced not just by long-term style but also by short-term, task-related features. Prompt access to short-term style assessments of the other parties in a traffic conflict thus enables drivers to make targeted operational choices, bolstering traffic safety and flow efficiency. Moreover, in the era of intelligent vehicles, smart cars positioned as consumer products must align with individual preferences to sell well. In conclusion, research on short-term driving style evaluation will play a role at various stages in the future, especially in significantly enhancing road traffic safety and efficiency, and it holds significant research value for improving road safety and traffic flow efficiency.

Existing driving style classification methods are divided into non-machine-learning and machine-learning approaches, the latter subdivided into unsupervised and supervised learning. Non-machine-learning algorithms classify driving styles with manually defined rules or thresholds; they mainly include questionnaire analysis, rule-based algorithms, and fuzzy logic. These algorithms are usually relatively simple, highly interpretable, and computationally light, but they demand a high level of expertise from the algorithm designer. Unsupervised learning requires less prior knowledge of driving style identification than non-machine-learning algorithms, and it does not rely on labeled data, making data easier to obtain. The K-means clustering algorithm is a commonly used method for driving style recognition: it partitions the given data into k clusters in a way that maximizes the inter-cluster distance while minimizing the intra-cluster distance [12]. Besides K-means, other unsupervised methods such as DBSCAN [13], GMM [14], and DP-means [15] have been employed for driving style classification and recognition. Supervised learning requires algorithm designers to have a fundamental understanding of driving style recognition and to manually label the relevant data, but it yields higher recognition accuracy. Liu et al. [16] used onboard signals and physiological signals as inputs to a KNN model for driving style recognition. Zhang et al. [17] developed a window-based SVM model for classifying the drivers of three different vehicles, with average accuracies of 75.83%, 85.83%, and 86.67%, respectively. However, SVM may not perform well on imbalanced datasets; in one study [18], random SMOTE was applied to balance the numbers of positive and negative samples, and a cost-sensitive SVM then set different penalty factors for positive and negative samples, improving the model's ability to recognize hazardous driving behavior.

This article proposes a method for short-term driving style classification and recognition on highways based on ETC big data. Using features such as segment passing speed and segment flow rate generated by vehicles, driving styles are classified, and a short-term driving style recognition model is established on this basis to accurately identify the driving styles of vehicles. This


approach utilizes extensive ETC data to analyze the driving speed and traffic flow characteristics of vehicles on different road segments, enabling more precise classification and recognition of various driving styles. It provides strong support for driving style research and traffic management.

2 Methodology This section introduces the deep mining of implicit spatiotemporal features from ETC transaction data, including vehicle’s historical segment passing speed, segment flow rate, and vehicle type, which serve as indicators for constructing short-term driving style metrics. Then, the silhouette coefficient method is utilized to determine the optimal number of clusters for the data. Next, the PCA-Kmeans algorithm is employed to classify vehicles’ driving styles. Finally, a support vector machine (SVM) is used to identify the short-term driving styles of the vehicles. The overall process is shown in Fig. 1. Driving style related characteriscs Data preprocessing and feature extraction

Driving style classification number selection

Data cleaning of missed detection, false detection and repeated detection

Determine the selection range of cluster center number

Velocity dimension feature extraction

Traffic dimension feature extraction

Calculate the profile coefficients for each sample.

Calculate the average profile coefficients of all samples for each cluster number.

K clustering centers

Support vector machine

Driving style classification

Driving style recognition

K initial cluster centers are selected as the centers of the initial cluster

Feature input

Initialize cluster center, sample planning

Parameter setting

New clustering centers are calculated Model training Iterate until convergence

Vehicle type feature extraction

The number of clusters with the maximum average profile coefficients is chosen as K

Output clustering results

Driving style recognition

Vehicle driving style

Fig. 1. Overall frame diagram

2.1 Data Preprocessing

Because the original data come from an actual production system, they contain dirty records that need to be cleaned. The sequence in Formula (1) represents the normal path of a vehicle's driving trajectory, where each identifier is a FLAGID representing the vehicle passing through the corresponding gantry. Formula (2) shows duplicate data: gantry 34025B is detected twice, and the two transaction records are nearly identical. Formula (3) shows a false detection: gantry 34025B should be followed by 34025D, but due to issues in the data system gantry 350501 communicated with the vehicle, so gantry 350501 appears in the transaction data and forms an impossible passing relationship in the trajectory. Formula (4) shows a missed detection: gantry 34025D is absent from the vehicle's path, so a certain amount of data is missing from the trajectory information.

[340259, 34025B, 34025D, 34025F, 340261, 340263, 340265]

(1)


[340259, 34025B, 34025B, 34025D, 34025F, 340261, 340263, 340265]

(2)

[340259, 34025B, 350501, 34025D, 34025F, 340261, 340263, 340265]

(3)

[340259, 34025B, , 34025F, 340261, 340263, 340265]

(4)

For duplicate detections, this paper uses the pandas function drop_duplicates to remove repeated records. For false detections, the completeness of the path is determined from the OD (origin-destination) information: if the path is complete, the trajectory is rearranged according to the topology of the highway; if it is incomplete, it is removed from the data. Missed detections are deleted from the dataset directly.

2.2 Feature Construction

Based on the ETC data, vehicle trajectories are constructed as time series and used to analyze historical driving characteristics. First, the trajectories give the passing time at each pair of adjacent gantries, from which the travel time between them is calculated. Second, the coordinates of adjacent gantries are acquired and the distances between them are calculated using the Gaode API; kinematic equations then yield the historical driving speed. To capture driving style, the study also obtains the traffic speeds of the target section and its two preceding sections. To preserve the authenticity of the traffic flow and mitigate distortion from extended time gaps, the study counts the vehicles passing each segment in the ten minutes before the vehicle's transaction at the preceding gantry; this count represents the segment traffic during passage. In line with the preceding segment speeds, the traffic flow of the present segment and its two predecessors is computed. Finally, since the ETC data contain vehicle types, this information is extracted directly from the data.
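The cleaning and feature construction steps of Sects. 2.1–2.2 can be sketched with pandas as follows; the column names (vehicle_id, gantry_id, trans_time) and the seg_dist distance lookup are illustrative placeholders, not the paper's schema:

```python
import pandas as pd

# df: ETC transactions; seg_dist: {(gantry, next_gantry): distance in km}
df = df.drop_duplicates(subset=["vehicle_id", "gantry_id", "trans_time"])

# Order each vehicle's records by transaction time to form its trajectory,
# then pair every gantry record with the next gantry the vehicle passed.
df = df.sort_values(["vehicle_id", "trans_time"])
grp = df.groupby("vehicle_id")
df["next_gantry"] = grp["gantry_id"].shift(-1)
df["next_time"] = grp["trans_time"].shift(-1)

# Historical segment speed (km/h) from gantry spacing and time gap.
hours = (df["next_time"] - df["trans_time"]).dt.total_seconds() / 3600
df["seg_km"] = [seg_dist.get((a, b))
                for a, b in zip(df["gantry_id"], df["next_gantry"])]
df["seg_speed"] = df["seg_km"] / hours
```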

2.3 Selection of the Number of Driving Style Categories

Determining the number of driving style categories is inherently subjective; there are no fixed rules or standards. Classification depends on road conditions, driver preferences, and habits, and given the diversity of styles, accurate subjective classification is challenging. The silhouette coefficient method offers an objective, quantitative evaluation that aids category selection without fixed rules, enhancing the credibility, scientific validity, and reliability of the style classification. The silhouette coefficient (SC) gauges the compactness of samples within clusters and the separation between clusters; higher SC values signify better clustering, with intra-cluster samples closer together and inter-cluster samples farther apart. SC is a common metric for assessing clustering performance, computed as

SC(i) = (b(i) − a(i)) / max{a(i), b(i)}  (5)

In Formula (5), SC(i) ∈ [−1, 1] is the silhouette coefficient of sample i, and b(i) = min{bi1, bi2, ..., bik} is the minimum dissimilarity between the cluster


containing sample i and the other clusters, where bik is the average distance from sample i to all samples in another cluster Ck different from its own; a(i) is the average distance from sample i to all other samples within its own cluster. When SC(i) is close to 1, the sample is well clustered and the clustering is reasonable. When SC(i) is close to −1, the sample's assignment is inappropriate and it should belong to another cluster. When SC(i) is close to 0, the sample lies near the boundary between two clusters, indicating some ambiguity in its classification.

2.4 Short-Time Driving Style Classification Method

Once the number of driving style categories is determined, the data need to be classified according to the driving characteristics. The original data contain high-dimensional features, and high-dimensional datasets increase model complexity and computational cost, so PCA is used to reduce the dimensionality. PCA linearly projects high-dimensional data into a lower-dimensional space while maximizing the variance of the data in the target space; this prevents the loss of significant information from the original data and avoids introducing errors into subsequent analyses. Given a sample set M = {X1, X2, ..., XM} with N-dimensional features Xi = (x1i, x2i, ..., xNi), after feature centering the covariance matrix is as shown in Eqs. (6)–(7):

C = [cov(x1, x1)  cov(x1, x2); cov(x2, x1)  cov(x2, x2)]  (6)

cov(x1, x1) = Σ_{i=1}^{M} (x1i − x̄1)(x1i − x̄1) / (M − 1)  (7)

The eigenvalues λ of the covariance matrix C and their corresponding eigenvectors u are sorted in descending order, and the top k eigenvectors are selected. The new feature values after dimensionality reduction are calculated as

(y1i, y2i, ..., yki)^T = (u1^T·(x1i, x2i, ..., xNi)^T, u2^T·(x1i, x2i, ..., xNi)^T, ..., uk^T·(x1i, x2i, ..., xNi)^T)^T  (8)

Using the new feature values after dimensionality reduction, K-means assesses the similarity of data objects through a chosen distance formula; distance varies inversely with similarity, with smaller similarity yielding greater distance. K-means first determines the number of driving style categories from the results of Sect. 2.3, initializes the corresponding cluster centers C, and computes the distances to the other data objects. The chosen distance measure is the Euclidean distance between a cluster center and another data object:

d(x, Ci) = sqrt(Σ_{j=1}^{m} (xj − Cij)²)  (9)


In the formula, x is a data object, Ci is the i-th cluster center, m is the dimensionality of the data objects, and xj and Cij are the j-th attribute values of the data object x and of the cluster center Ci, respectively. With the Euclidean distance as the similarity measure, each data object is assigned to the cluster of its nearest center. After assignment, the new cluster centers are computed as the averages of the data objects in each of the k clusters, minimizing the sum of squared errors over the dataset:

SSE = Σ_{i=1}^{k} Σ_{x∈Ci} |d(x, Ci)|²  (10)

The SSE (sum of squared errors) measures the quality of the clustering results. When the SSE no longer changes, i.e., converges, the iteration stops and the final result is obtained.
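A compact scikit-learn sketch of this classification pipeline (dimensionality reduction, silhouette-based choice of K, and clustering) is given below; the feature matrix X and the candidate range of K are assumptions for illustration:

```python
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# X: short-term driving style features (segment speeds, flows, vehicle type)
X_red = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(X))

# Average silhouette coefficient (Eq. 5) over candidate cluster numbers,
# then a final K-means clustering (Eqs. 9-10) with the best K.
scores = {
    k: silhouette_score(X_red,
                        KMeans(n_clusters=k, n_init=10).fit_predict(X_red))
    for k in range(2, 11)
}
best_k = max(scores, key=scores.get)
labels = KMeans(n_clusters=best_k, n_init=10).fit_predict(X_red)
```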

2.5 Short-Time Driving Style Recognition

Based on the classification in Sect. 2.4, a support vector machine (SVM) is used for driving style recognition. SVM, a machine learning approach grounded in statistical learning theory, seeks the optimal classification hyperplane that completely separates the two sample classes when they are linearly separable in the original space. In non-linear scenarios, the kernel method maps the samples into a high-dimensional space in which the non-linear problem becomes linearly separable. The optimal classification hyperplane is

wϕ(x) + b = 0  (11)

where w represents the weights and b the bias. The two-class problem in the original sample space can then be expressed as

yi(wϕ(xi) + b) ≥ 1  (12)

where yi is the output class of sample i. Given that some samples may be misclassified, slack variables εi and a penalty factor C are introduced, and the hyperplane is obtained from the constrained problem

min (1/2)||w||² + C Σ_{i=1}^{l} εi
s.t. yi(wϕ(xi) + b) ≥ 1 − εi  (13)

Introducing Lagrange multipliers αi, the original problem is transformed into its dual:

max Σ_{i=1}^{l} αi − (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} αi αj yi yj ϕ(xi)ϕ(xj)
s.t. 0 ≤ αi ≤ C, Σ_{i=1}^{l} yi αi = 0  (14)

According to the Kuhn-Tucker conditions, αi must satisfy

αi [yi(wϕ(xi) + b) − 1 + εi] = 0  (15)

so the samples with non-zero αi are the support vectors. Solving the above problem yields the optimal classification function

f(x) = sgn(Σ_{i=1}^{m} yi αi K(x, xi) + b)  (16)

where m is the number of support vectors and K(x, xi) = ϕ^T(x)ϕ(xi) is the kernel function. Kernel functions come in various types, such as the linear kernel, the polynomial kernel, and the radial basis kernel. In this paper the radial basis kernel is chosen:

K(x, xi) = exp(−γ ||x − xi||²)  (17)
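An RBF-kernel SVM of this form can be trained with scikit-learn as sketched below; X and labels refer to the features and K-means cluster labels from the previous steps, and the C and gamma values are illustrative defaults rather than the paper's settings:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.metrics import classification_report

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2,
                                          stratify=labels, random_state=0)

# Penalty factor C as in Eq. (13), RBF kernel as in Eq. (17).
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))  # accuracy, recall, F1
```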

3 Experimental Results and Analysis

To assess model performance in driving style classification, this study employs accuracy, recall, and F1 score, and a confusion matrix visually presents the classification outcomes. Accuracy, a widely used classification indicator, gauges the proportion of correctly predicted samples relative to the total, often expressed as a percentage. Recall, also termed sensitivity or true positive rate (TPR), measures the model's effectiveness in correctly identifying positive samples (those belonging to a specific class), reflecting its ability to recognize true positives. The F1 score, a comprehensive metric, balances precision and recall as their harmonic mean, capturing both the accuracy of positive predictions and the model's ability to find positive samples.

3.1 Analysis of Driving Style Classification Results

We categorized vehicle speeds as slow, moderate, and fast, and classified road conditions as smooth, slow-moving, and congested based on vehicle flow. Analyzing the driving style recognition results from the speed and flow perspective, Table 1 shows that class 0 corresponds to vehicles maintaining slow speeds on smooth roads, class 1 to vehicles maintaining fast speeds in congestion, class 2 to vehicles with slow speeds on smooth roads, class 3 to vehicles maintaining slow speeds under slow-moving conditions, class 4 to vehicles exhibiting abnormal behavior marked by very low speeds, and class 5 to vehicles maintaining moderate speeds under slow-moving conditions. VALL1 and VALL2 are the overall speeds of segments 1 and 2, V1 and V2 are the vehicle's speeds in segments 1 and 2, and F1 and F2 are the traffic flows of segments 1 and 2.

Table 1. Classification of driving styles.

      VALL1  VALL2  V1   V2   F1   F2   Vehclass  Category
Car1  91     95     70   72   205  216  16        0
Car2  88     86     78   80   113  110  16        2
Car3  86     101    74   81   373  293  16        5
Car4  93     81     70   56   375  404  16        3
Car5  88     85     39   68   307  366  12        4
Car6  93     95     109  108  540  676  1         1

3.2 Short-Time Driving Style Recognition Results

As shown in Fig. 2, the accuracies of the compared models in driving style classification are 88%, 79%, 84%, 68%, 79%, and 63%, respectively. Our model achieved an accuracy of 88%, the highest among all models. The recall rates of the models are 84%, 65%, 85%, 57%, 75%, and 84%, respectively; our model's recall of 84% is slightly lower than that of Gaussian Naive Bayes. Gaussian Naive Bayes performs well on recall because it assumes that, given the class, features follow a Gaussian (normal) distribution. Although this assumption may not hold completely in real-world data, the distribution of continuous features can often be approximated as Gaussian, so Gaussian Naive Bayes models the data distribution well, improving its classification accuracy and recall. As for the F1 score, the models achieve 0.88, 0.79, 0.87, 0.68, 0.79, and 0.85, respectively; our proposed model achieves the best F1 score.

Fig. 2. Model performance comparison

Taking all evaluation metrics into consideration, our model outperforms all other models in terms of accuracy and F1 score. Although it slightly lags behind Gaussian Naive Bayes in recall, the difference between the two is not significant. Therefore, it can be observed that our proposed model exhibits the best overall performance among all the models.


4 Conclusion The model proposed in this paper effectively categorizes short-term vehicle driving styles based on speed, traffic flow, and vehicle type, and accurately distinguishes diverse styles through classification. It outperforms comparable machine learning and deep learning models. However, the limited information contained in ETC data restricts the range of driving style factors that can be considered; for instance, continuous driving duration, weather conditions, and fatigue driving were not accounted for. In future research, we aim to integrate multiple data sources, including vehicle GPS data, to further enrich the driving style factors and establish multi-dimensional driving style profiles.

References
1. Mohammadnazar, A., Arvin, R., Khattak, A.J.: Classifying travelers' driving style using basic safety messages generated by connected vehicles: application of unsupervised machine learning. Transp. Res. Part C Emerg. Technol. 122, 1–18 (2021)
2. Johnson, D.A., Trivedi, M.M.: Driving style recognition using a smartphone as a sensor platform. In: 2011 14th International IEEE Conference on Intelligent Transportation Systems (ITSC), pp. 1609–1615. IEEE (2011)
3. Meseguer, J.E., Toh, C.K., Calafate, C.T., Cano, J.C., Manzoni, P.: Drivingstyles: a mobile platform for driving styles and fuel consumption characterization. J. Commun. Networks 19(2), 162–168 (2017)
4. Wang, X., et al.: Safety-balanced driving-style aware trajectory planning in intersection scenarios with uncertain environment. IEEE Trans. Intell. Vehicles (2023)
5. Yuan, J., Yang, L.: Predictive energy management strategy for connected 48V hybrid electric vehicles. Energy 187, 115952 (2019)
6. Yang, L., Li, X., Guan, W., et al.: Assessing the relationship between driving skill, driving behavior and driving aggressiveness. J. Transp. Saf. Secur., 1–17 (2020)
7. García, J.L.P., Castro, C., Doncel, P., et al.: Adaptation of the multidimensional driving styles inventory for Spanish drivers: convergent and predictive validity evidence for detecting safe and unsafe driving styles. Accid. Anal. Prev. 136, 105413 (2020)
8. Fountas, G., Sonduru Pantangi, S., Hulme, K.F., et al.: The effects of driver fatigue, gender, and distracted driving on perceived and observed aggressive driving behavior: a correlated grouped random parameters bivariate probit approach. Anal. Methods Accid. Res. 22 (2019)
9. Li, X., Yan, X., Wong, S.C.: Effects of fog, driver experience and gender on driving behavior on S-curved road segments. Accid. Anal. Prev. 77, 91–104 (2015)
10. Ge, Y., Qu, W., Jiang, C., et al.: The effect of stress and personality on dangerous driving behavior among Chinese drivers. Accid. Anal. Prev. 73, 34–40 (2014)
11. Delhomme, P., Cristea, M., Paran, F.: Self-reported frequency and perceived difficulty of adopting eco-friendly driving behavior according to gender, age, and environmental concern. Transp. Res. Part D 20, 55–58 (2013)
12. Hartigan, J.A., Wong, M.A.: Algorithm AS 136: a k-means clustering algorithm. J. Roy. Stat. Soc. Ser. C (Appl. Stat.) 28(1), 100–108 (1979)
13. Schubert, E., Sander, J., Ester, M., Kriegel, H.P., Xu, X.: DBSCAN revisited, revisited: why and how you should (still) use DBSCAN. ACM Trans. Database Syst. (TODS) 42(3), 1–21 (2017)
14. Shakib, M.N., Shamim, M., Shawon, M.N.H., Isha, M.K.F., Hashem, M.M.A., Kamal, M.A.S.: An adaptive system for detecting driving abnormality of individual drivers using Gaussian mixture model. In: 2021 5th International Conference on Electrical Engineering and Information & Communication Technology (ICEEICT), pp. 1–6. IEEE (2021)
15. Brambilla, M., Mascetti, P., Mauri, A.: Comparison of different driving style analysis approaches based on trip segmentation over GPS information. In: 2017 IEEE International Conference on Big Data (Big Data), pp. 3784–3791. IEEE (2017)
16. Yang, L., Ma, R., Michael Zhang, H., Guan, W., Jiang, S.: Driving behavior recognition using EEG data from a simulated car-following experiment. Accid. Anal. Prev. 116, 30–40 (2018)
17. Zhang, C., Patel, M., Buthpitiya, S., Lyons, K., Harrison, B., Abowd, G.D.: Driver classification based on driving behaviors. In: Proceedings of the 21st International Conference on Intelligent User Interfaces, pp. 80–84 (2016)
18. Zhang, L., Tan, B., Liu, T., Li, J.: Research on recognition of dangerous driving behavior based on support vector machine. In: Twelfth International Conference on Graphics and Image Processing (ICGIP 2020), vol. 11720, pp. 471–476. SPIE (2021)

Expressway Short-Term Traffic Flow Prediction Based on CNN-LSTM

Ting Ye1, Fumin Zou1, and Feng Guo2(B)

1 Fujian University of Technology, Fuzhou 350118, China
2 Fujian Provincial Key Laboratory of Automotive Electronics and Electric Drive, Fujian University of Technology, Fuzhou 350118, China
[email protected]

Abstract. In recent years, with the continuous improvement of people's quality of life, motor vehicle ownership has grown steadily, and the accompanying traffic congestion has caused great inconvenience to travel. Short-term traffic flow prediction can help traffic departments understand the future congestion of a given section in time and respond in advance; it also helps travelers avoid congested sections. To address the problem that traditional forecasting methods consider only the temporal characteristics of traffic flow while ignoring its spatial characteristics, a short-term traffic flow forecasting model combining a convolutional neural network (CNN) and a long short-term memory network (LSTM) is proposed. The spatial correlation of traffic flow is mined by the CNN, the temporal characteristics are mined by the LSTM, and the extracted temporal and spatial features are integrated to achieve short-term traffic prediction. Experimental results show that the error of the CNN-LSTM traffic flow prediction model is clearly smaller than that of other models. Keywords: Traffic Flow Prediction · Convolutional Neural Network · Long Short-Term Memory

1 Introduction 1.1 Background With the rapid development of modern cities, increasingly widespread traffic problems affect people's travel experience, and choosing the optimal route according to traffic conditions has become a major everyday need. Accurate prediction of traffic flow can assist optimal travel path planning [1–3]. Short-term traffic flow forecasting on expressways must take into account the characteristics of the expressway traffic system: it is nonlinear and periodic. Because of these characteristics, the accuracy of short-term predictions has struggled to meet the needs of practical work [4]. Traditional statistical prediction models find this problem difficult, but the development of traffic big data analysis provides a basis for building a short-term expressway traffic prediction model


that can be iteratively optimized by using machine learning. Through data accumulation, such a model can continually revise itself, gradually improving prediction accuracy, providing reliable support for the future operation and management of expressways, and improving the precision of management [5]. 1.2 Related Work Traffic flow prediction methods can be divided into three kinds: parametric methods, nonparametric methods, and simulation methods. Parametric methods include the time series method, the Kalman filter method, and others; nonparametric methods include artificial neural networks and machine learning; simulation methods use traffic simulation tools to model traffic. More and more researchers have applied deep learning methods to short-term traffic flow prediction and achieved remarkable results. Yuhan Jia et al. [6] showed that a DBN traffic flow prediction model considering rainfall factors outperforms the traditional model. Moretti F et al. [7] established an urban traffic flow prediction model using a neural network ensemble hybrid. Luo Xianglong et al. [8] proposed a traffic flow prediction model combining a deep belief network and a support vector machine. Luo Wenhui et al. [9] performed short-term traffic flow prediction with a CNN-SVR hybrid deep learning model, applying a CNN at the bottom of the network to extract traffic flow features and feeding the extracted results into an SVR regression model for prediction. Traffic flow is typical spatio-temporal data, yet most of the above methods rely on the historical data of the series itself, considering only the temporal characteristics of traffic flow without fully exploring its spatial characteristics [10]. In order to fully explore the spatio-temporal characteristics of traffic data, comprehensively consider the relationship between time and space, and improve the accuracy and efficiency of short-term traffic flow prediction, this paper puts forward a CNN-LSTM short-term traffic flow prediction model.

2 Methods 2.1 Convolutional Neural Network Convolutional neural networks (CNNs) have become a hot topic in deep learning. Their advantage is a network architecture with weight sharing and local perception, which reduces the computational complexity of the network and the number of weights, and which can take image encodings directly as input for feature extraction, avoiding image preprocessing and explicit feature engineering [11, 12]. Convolutional neural networks have three characteristics. The first is local connectivity: in a traditional neural network, two adjacent layers are fully connected, that is, every neuron of layer n−1 connects to all neurons of layer n, whereas in a convolutional neural network each neuron of layer n connects only to part of layer n−1. The second is weight sharing, which ensures that the weight factors applied to each pixel are shared across the entire image, greatly reducing


the number of parameters in the convolution kernel [13, 14]. By adding convolution operations, features can be extracted automatically by exploiting local correlation in image space. The third is pooling, often called downsampling, another important feature of convolutional neural networks; the most common variants are max pooling, min pooling, and average pooling. Pooling reduces the resolution of the feature maps and makes the whole network less prone to overfitting.

2.2 Long Short-Term Memory As the number of layers increases, an RNN model inevitably suffers from vanishing or exploding gradients, and the LSTM greatly alleviates these phenomena through its distinctive structure. The LSTM adds gate structures to the RNN model and uses three kinds of gates to control information. The first is the forget gate [15], which discards unimportant information and chooses to remember important data features; the second is the input gate, which updates the state with new content to fill in information the forget gate has discarded; the third is the output gate, which processes the accumulated information and produces the output. The input of the traffic flow prediction model is the input X_{t-1} of the previous stage (the traffic flow value at the previous time), the hidden state H_{t-1} of the previous stage, and the cell state C_{t-1} of the previous stage. The output is the hidden state (H) and cell state (C) of the present stage (the traffic flow at a certain point at the present moment). By processing the hidden state (H) of the current stage, the predicted traffic flow value Y of the current stage is obtained. The specific computation is as follows:

f_t = \sigma(X_t * U_f + H_{t-1} * W_f)    (1)

\tilde{C}_t = \tanh(X_t * U_c + H_{t-1} * W_c)    (2)

I_t = \sigma(X_t * U_i + H_{t-1} * W_i)    (3)

O_t = \sigma(X_t * U_o + H_{t-1} * W_o)    (4)

C_t = f_t * C_{t-1} + I_t * \tilde{C}_t    (5)

H_t = O_t * \tanh(C_t)    (6)

where Xt is the input at the present stage (the traffic flow at a certain point in the present moment), Ht−1 is the output at the previous stage (the value of the traffic flow at the previous moment after processing by the previous neural unit), Ct−1 is the storage state at the previous stage, Ht is the output at the present stage (the traffic flow at a certain point in the present moment after processing by this neural unit), and Ct is the storage state at the present stage. U and W are the weights.
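To make Eqs. (1)–(6) concrete, the following NumPy sketch performs one LSTM step; the weight dictionaries U and W and the function name are hypothetical, mirroring the notation above rather than any library API.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, U, W):
    """One LSTM step following Eqs. (1)-(6); U/W map gate names to matrices."""
    f_t = sigmoid(x_t @ U["f"] + h_prev @ W["f"])      # Eq. (1): forget gate
    c_hat = np.tanh(x_t @ U["c"] + h_prev @ W["c"])    # Eq. (2): candidate state
    i_t = sigmoid(x_t @ U["i"] + h_prev @ W["i"])      # Eq. (3): input gate
    o_t = sigmoid(x_t @ U["o"] + h_prev @ W["o"])      # Eq. (4): output gate
    c_t = f_t * c_prev + i_t * c_hat                   # Eq. (5): cell update
    h_t = o_t * np.tanh(c_t)                           # Eq. (6): hidden output
    return h_t, c_t
```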


2.3 Establishment of CNN-LSTM Model The traffic flow at a location is usually related not only to its own historical traffic flow but also to the traffic flow of neighboring locations. Since a CNN can effectively process data with local structure, this paper uses a CNN to capture the spatial features of traffic flow and constructs a two-dimensional feature matrix containing temporal and spatial information. By combining the weight-sharing property of the CNN with the memory property of the LSTM, the temporal and spatial characteristics of traffic flow are mined, and the two models are combined to achieve short-term traffic flow prediction. The process of the CNN-LSTM short-term traffic flow prediction model can be described as follows (a model sketch is given after the steps):

Step 1: Preprocess the original traffic flow data.
Step 2: Input the processed data into the CNN network to extract the spatial characteristics of traffic flow.
Step 3: Input the CNN-processed data into the LSTM layer to predict the value of the next moment of the time series.
Step 4: Apply inverse normalization to obtain the final predicted value.
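A minimal Keras sketch of Steps 1–4 is given below; the window length, number of sections, filter counts, and layer sizes are illustrative assumptions, not the paper's reported configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_lstm(n_steps: int = 12, n_sections: int = 5):
    model = models.Sequential([
        # CNN part: extract spatial features across neighboring sections.
        layers.Conv1D(32, kernel_size=3, padding="same", activation="relu",
                      input_shape=(n_steps, n_sections)),
        layers.MaxPooling1D(pool_size=2),
        # LSTM part: model the temporal dependence of the extracted features.
        layers.LSTM(64),
        layers.Dense(1),  # predicted (normalized) flow at the next time step
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model
```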

3 Experimental Research 3.1 Experimental Analysis In the experiments we used the data generated by the ETC gantries on the expressway section from Qingkou Junction to Yutian Changle in Fujian Province from May 2 to 5, 2021. The data generated when a vehicle passes a gantry and communicates with it is called ETC transaction data; it mainly includes the transaction time, transaction gantry ID, vehicle license plate, and other fields. This section uses data cleaned of abnormal records such as missing license plates and duplicate records. The training set consists of the total transaction volume data from May 2 to May 4, and the test set is the total transaction volume data of May 5. A custom function displays the learning curve of the LSTM neural network during training, as shown in Fig. 1 below. The left subplot shows how the loss of the model on the training and validation datasets changes with the training epochs, while the right subplot shows how the model's accuracy metric (here, MAE) on the training and validation datasets changes with the training epochs. In the left subplot, the loss on the training dataset starts high, but as the number of iterations increases the error gradually decreases and converges to a small value; the validation curve (in blue) is also continuously optimized. In the right subplot, the MAE of the model on the training and validation datasets likewise decreases gradually.


Fig. 1. The left figure shows the model loss rate on the model training and validation data set, and the right figure shows the model accuracy on the model training and validation data set.

3.2 Model Parameter Adjustment When applying the established neural network model, the parameters involved in the model must be set. In this paper these include the convolution kernels and bias terms in the convolutional neural network, the weight matrices and bias terms in the LSTM, the weights for fusing the prediction results of the three time spans, and the fully connected parameters for processing external information. These parameters are learned through iterative training: the network backpropagates the training error and continually adjusts the intermediate weight parameters, updating them after each batch of training data, and training stops when the required number of iterations is reached or the error falls below the set threshold.

3.3 Model Performance Evaluation Index To judge the accuracy of the prediction, evaluation indexes are needed. The mean absolute error (MAE) and root mean square error (RMSE) of the CNN-LSTM model's predictions on the test set are calculated. MAE measures the mean absolute error between predicted and true values; RMSE measures the root mean square error between them. Their formulas are:

MAE = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right|    (7)

RMSE = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2}    (8)

where y_i is the real traffic flow data collected from the sensor, \hat{y}_i is the traffic flow predicted by the model, and m is the number of traffic flow data points.
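Equations (7) and (8) translate directly into NumPy; the function names below are ours, for illustration only.

```python
import numpy as np

def mae(y_true, y_pred):
    """Eq. (7): mean absolute error."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def rmse(y_true, y_pred):
    """Eq. (8): root mean square error."""
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))
```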


3.4 Comparison Model Experiment To verify the effectiveness of traffic flow prediction based on the CNN and LSTM models, this paper compares against the following benchmark prediction methods, covering both time series models and deep-learning-based models: the historical average method (HA), the autoregressive integrated moving average model (ARIMA), and the seasonal autoregressive integrated moving average model (SARIMA).

HA model: the historical average method uses the average of historical data as the predicted value for future moments. For example, to predict the traffic flow between 8:30 and 9:00 am, we can use the average traffic flow between 8:30 and 9:00 am in the historical records as the forecast. This method is simple and easy to implement, but its forecasting performance is limited.

ARIMA model: the autoregressive integrated moving average model is a classical method in time series forecasting. ARIMA gives good forecasts but places higher requirements on the data.

SARIMA model: the seasonal autoregressive integrated moving average model, another time series forecasting method.

The performance evaluation indexes of the above models are shown in Table 1.

Table 1. Performance evaluation indexes of each model.

| Algorithm | MAE   | RMSE  |
|-----------|-------|-------|
| HA        | 23.33 | 27.78 |
| ARIMA     | 20.24 | 23.74 |
| SARIMA    | 39.10 | 47.76 |
| CNN-LSTM  | 7.41  | 9.51  |
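For orientation, a hedged sketch of the two simplest baselines follows; here `train` is assumed to be a pandas Series of flow counts with a DatetimeIndex, and the ARIMA order is chosen purely for illustration.

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

def ha_forecast(train: pd.Series, test_index) -> pd.Series:
    """HA baseline: average flow of each time-of-day slot in the history."""
    slot_mean = train.groupby(train.index.time).mean()
    return pd.Series([slot_mean[t.time()] for t in test_index],
                     index=test_index)

def arima_forecast(train: pd.Series, steps: int):
    """ARIMA baseline; the (2, 1, 2) order is an illustrative assumption."""
    return ARIMA(train, order=(2, 1, 2)).fit().forecast(steps=steps)
```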

As can be seen from Table 1, on the dataset we used (data generated by the ETC gantries of the expressway section from Qingkou Junction to Yutian Changle from May 2 to 5, 2021), the prediction performance of the proposed combined CNN-LSTM model is better than that of the benchmark comparison models, and clearly better than the pure time series prediction methods. The improvement of the model in this paper is the addition of an LSTM network layer on top of the convolutional neural network. The experimental results show that the LSTM can capture the time dependence in traffic flow, which helps produce more accurate time series predictions. The model used in this paper has a simple structure and can complete training without a large amount of training data; with limited data, the combination of CNN and LSTM performs better.

3.5 Result Analysis We plot the real and predicted values of the time series data over time. First, the index of the test set is converted into date format and matched to the length of the predicted data. Then,


Fig. 2. Relationship between the predicted value and the real value of this model.

the date range to be displayed is obtained and the corresponding real data selected according to that range. Finally, the real data and predicted data are drawn with the matplotlib library, in which the pink curve represents the real data and the yellow curve represents the predicted total number of transactions. As shown in Fig. 2, the plot contains three curves of total transaction volume: the real total transaction volume, the total transaction volume in the test dataset, and the predicted total transaction volume. By observing the trends of the three curves, we can judge how well the model predicts and see whether the gap and trend between the real transaction data and the predicted values are similar. A sketch of this plotting is given below.
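The plot described above can be produced along the following lines; the series here are synthetic stand-ins, and the colors simply follow the description in the text.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-ins for the real and predicted transaction totals.
idx = pd.date_range("2021-05-05", periods=96, freq="15min")
real = pd.Series(100 + 30 * np.sin(np.linspace(0, 6, 96)), index=idx)
pred = real + np.random.normal(0, 5, size=96)

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(real.index, real, color="pink", label="real total transactions")
ax.plot(pred.index, pred, color="gold", label="predicted total transactions")
ax.set_xlabel("time")
ax.set_ylabel("total transaction volume")
ax.legend()
plt.show()
```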

4 Conclusions Expressways are an important mode of vehicle travel, and real-time perception of expressway traffic conditions is an important guarantee for smooth vehicle operation. Expressway traffic flow is an important index for evaluating expressway traffic status. Accurate short-term traffic flow prediction can not only provide decision-making assistance for expressway managers but also effectively alleviate traffic congestion, reduce traffic accidents, and relieve traffic pressure. At present, traffic flow prediction methods are proliferating, and using big data to mine traffic information and explore travel patterns has achieved notable results. To solve the problem that traditional traffic flow prediction methods cannot combine temporal and spatial characteristics well, this paper proposes a deep neural network model combining CNN and LSTM. The experimental results show that the prediction accuracy of the CNN-LSTM model, which combines spatio-temporal characteristics, is higher than that of prediction models considering only temporal characteristics, and that it can effectively predict short-term traffic flow.


References
1. Zhuang, W., Cao, Y.: Short-term traffic flow prediction based on CNN-BILSTM with multicomponent information. Appl. Sci. 12(17), 8714 (2022)
2. Qu, Z., Li, J.: Short-term traffic flow forecast on basis of PCA-interval type-2 fuzzy system. J. Phys. Conf. Ser. 2171(1), 012051 (2022)
3. Hao, S., Zhang, M., Hou, A.: Short-term traffic flow forecast based on DE-RBF fusion model. J. Phys. Conf. Ser. 1910(1), 012035 (2021)
4. Li, Y., Liu, G., Cheng, Y., Wu, J., Xiong, Y., Ma, R., Wang, Y.: Application of artificial intelligence technology in traffic flow forecast. J. Phys. Conf. Ser. 1852(2), 022076 (2021)
5. Wang, R.: Research on short-term traffic flow forecast and auxiliary guidance based on artificial intelligence theory. J. Phys. Conf. Ser. 1544(1), 012164 (2020)
6. Jia, Y., Wu, J., Xu, M.: Traffic flow prediction method based on depth planning. J. Traffic Transp. 22, 1–10 (2017)
7. Moretti, F., Pizzuti, S., Panzieri, S., Annunziato, M.: Urban traffic flow forecasting through statistical and neural network bagging ensemble hybrid modeling. Neurocomputing 167, 3–7 (2015)
8. Luo, X., Qinqin, J., Liyao, N., et al.: Short-term traffic flow prediction based on deep learning. Appl. Res. Comput. 34(01), 91–93 (2017)
9. Luo, W., Dong, B., Wang, Z.: Short-term traffic flow prediction based on CNN-SVR hybrid deep learning model. J. Transp. Syst. Eng. Inform. Technol. 17(5), 68–74 (2017)
10. Wei, D., Liu, H.: An adaptive-margin support vector regression for short-term traffic flow forecast. J. Intell. Transp. Syst. 17(4), 317–327 (2013)
11. Lan, T., Zhang, X., Qu, D., Yang, Y., Chen, Y.: Short-term traffic flow prediction based on the optimization study of initial weights of the attention mechanism. Sustainability 15(2), 1374 (2023)
12. Xu, X., et al.: A hybrid autoregressive fractionally integrated moving average and nonlinear autoregressive neural network model for short-term traffic flow prediction. J. Intell. Transp. Syst. 27(1), 1–18 (2023)
13. Tian, R., Li, S., Yang, G.: Retraction note: research on emergency vehicle routing planning based on short-term traffic flow prediction. Wirel. Pers. Commun. 128(2), 1509 (2022). https://doi.org/10.1007/s11277-022-10146-w
14. Zhao, L., Bai, Y., Zhang, S., Wang, Y., Kang, J., Zhang, W.: A novel hybrid model for short-term traffic flow prediction based on extreme learning machine and improved kernel density estimation. Sustainability 14(24), 16361 (2022)
15. Mohammed, G.P., Alasmari, N., Alsolai, H., Alotaibi, S.S., Alotaibi, N., Mohsen, H.: Autonomous short-term traffic flow prediction using Pelican optimization with hybrid deep belief network in smart cities. Appl. Sci. 12(21), 10828 (2022)

Expressway Traffic Speed Prediction Method Based on KF-GRU Model via ETC Data

ChenXi Xia, FuMin Zou, Feng Gou(B), and GuangHao Luo

Fujian Key Laboratory of Automotive Electronics and Electric Drive, Fujian University of Technology, Fuzhou 350118, China
[email protected]

Abstract. Accurate prediction of expressway traffic speed can not only improve traffic efficiency, reduce road congestion, and bring convenience to travelers, but also help management departments plan and manage road resources more effectively. Using and extending expressway ETC gantry transaction data, this paper proposes an expressway traffic speed prediction method combining a Kalman Filter (KF) and a Gated Recurrent Unit (GRU) model. A vehicle trajectory dataset is built from ETC transaction data to generate a section speed dataset. The KF is then used to reduce noise and smooth the data, addressing the non-stationarity and nonlinearity of section speed, after which the GRU mines the traffic speed characteristics for prediction. Finally, the method was validated using real ETC transaction data from Fuzhou to Xiamen. The results show that the KF-GRU method surpasses traditional models at different time intervals, verifying the correctness and reliability of the method. Keywords: Expressway · Speed Prediction · ETC Data · Gated Recurrent Unit · Kalman Filter

1 Introduction With the development of the modern transportation network, expressways, as major transportation arteries between cities, play a decisive role in ensuring convenient travel and the functioning of the transportation system. Faced with growing demand, expressways urgently need efficient intelligent transportation systems (ITS). Accurate prediction of traffic information (e.g., travel time, traffic speed, traffic flow) is essential for an ITS to provide real-time, reliable road traffic information. Traffic speed prediction is a significant component of ITS: it can improve the efficiency of expressway operation, enable responses to traffic congestion in advance, provide travelers with accurate road condition information, and give traffic managers a basis for decisions that helps them manage traffic resources more effectively [1]. Accurately predicting expressway traffic speed is therefore particularly important. To further increase the efficiency of expressways, the electronic toll collection (ETC) system has been widely deployed, providing a valuable data resource for studying and


predicting traffic speed [2]. Based on ETC transaction data, many scholars have carried out research on intelligent driving topics, including expressway dynamic speed limit recognition [3], expressway traffic flow prediction [4, 5], and vehicle speed prediction [6]. Due to the complexity and variability of traffic speed, accurately predicting expressway traffic speed remains a challenging problem [7]. Many researchers have studied traffic prediction [8, 9]; the main methods include statistical models, traditional machine learning models, and deep learning models. Statistical models predict future values from the analysis of historical time series. Han et al. [10] used an autoregressive integrated moving average model to predict traffic flow. Statistical models suit simple and stable data, whereas expressway traffic conditions vary greatly and such methods become computationally intensive; moreover, they capture only linear relationships, while traffic speed data is nonlinear. Traditional machine learning models improve on this. Zhu et al. [11] proposed a model combining a Kalman filter (KF) and a support vector machine (SVM) to predict traffic flow, but it places high demands on data preprocessing and parameter tuning and does not perform well on prediction tasks with complex regularities and factors. Zhang et al. [12] used K-nearest-neighbor nonparametric regression to predict short-term traffic flow; the method requires extensive computation on large datasets, and when the samples are unbalanced the prediction deviation is relatively large, requiring other algorithms to balance the dataset. Deep learning models show high prediction accuracy and broad applicability: they have strong learning ability and can automatically extract features and capture correlations in the data, so many are used for traffic prediction. Since traffic data can be represented as time series, the Gated Recurrent Unit (GRU) [13] and Long Short-Term Memory (LSTM) [14] are increasingly used in speed forecasting. Because an ETC gantry may encounter various interference factors when collecting transaction data, including data noise and the influence of spatial and temporal factors, the accuracy and validity of the data can be affected and must be considered during data collection and processing. To tackle this problem, an expressway section speed prediction model based on KF and GRU is proposed. The constructed speed dataset is processed with the KF to reduce noise and improve data quality; the processed data is then input to the GRU model for deep learning to predict future traffic speeds more accurately, which can raise the precision of expressway traffic speed predictions and thereby provide more effective data support for expressway management and planning. Finally, section traffic speed prediction is carried out on data from the expressway ETC gantry system from Fuzhou to Xiamen, and the feasibility of the model is evaluated.

2 Algorithm This section introduces the relevant definitions, describes how the proposed KF-GRU expressway section traffic speed prediction model performs prediction, and specifies the features input to the GRU model.


2.1 Problem Description and Definition Based on this paper's goal of predicting the traffic speed of expressway sections, the following definitions are given. Definition 1 Expressway section (QD): the entrances and exits of expressway gantries and toll stations (including cross-provincial ones) are called nodes, and two adjacent nodes form a QD. Definition 2 Trajectory (Traj): the sequence of nodes a vehicle passes on the expressway gantry network is called its Traj. Definition 3 Section traffic speed (Vave): the average speed of all vehicles passing the same section within a certain period of time, i.e.

V_{ave} = \frac{1}{n}\sum_{i=1}^{n} V_i    (1)

where V_i indicates the average speed of the i-th vehicle passing the section within the period, and n indicates the number of vehicles passing the section within the period. Problem description: the goal of this paper is section traffic speed prediction, that is, given the feature information of the k time intervals before a time t on an expressway section, as shown in formula (2):

X = (x_{t-k}, x_{t-k+1}, x_{t-k+2}, \ldots, x_{t-1})    (2)

the model predicts the traffic speed y_t within a time interval at time t; that is, it learns a mapping function f from the features of the k periods before time t to the traffic speed y_t at time t, as shown in formula (3):

y_t = f(X(x_{t-k}, x_{t-k+1}, x_{t-k+2}, \ldots, x_{t-1}); \theta)    (3)

where y_t indicates the traffic speed within one time interval after time t on the target section, X indicates the feature matrix containing the k feature vectors from time t−k to time t−1, and θ indicates the parameters the model must learn. The input X of the GRU model is thus a k-step sequence (x_{t-k}, ..., x_{t-2}, x_{t-1}), where x_{t-k} is the feature vector at time t−k. In this paper, five features are used to capture the dynamic changes and influencing factors of traffic speed, so each x is a vector of length 5: the traffic speed of the target section, the traffic speeds of the front and rear sections, and the hour t_hour and day-of-week t_weekday corresponding to that time. By using the traffic speed information of the front and rear sections together with time information, the traffic condition of the target section can be predicted more accurately. 2.2 Overall Framework This paper uses the KF-GRU model to predict expressway traffic speed. It is mainly divided into three modules; the overall flowchart of the model is shown in Fig. 1.

40

C. Xia et al.

Section speed dataset construction module: clean the data preliminarily (removing missing values and outliers), construct each vehicle's passing time for each section, compute the section traffic speed for each vehicle using the formula above, and finally construct the time series section speed dataset (a sketch follows below). Data processing module: use the KF to smooth the data and suppress noise, and use the smoothed data as GRU model input. Model training module: input the processed data into the GRU model for learning and training, and finally output the module's prediction.
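A hypothetical sketch of the windowing performed by the first module, following the k feature vectors of length 5 defined in Sect. 2.1; the DataFrame layout and column names are assumptions.

```python
import numpy as np
import pandas as pd

def make_samples(speeds: pd.DataFrame, k: int):
    """speeds: DatetimeIndex rows, columns 'target', 'front', 'rear'."""
    hours, weekdays = speeds.index.hour, speeds.index.weekday
    X, y = [], []
    for t in range(k, len(speeds)):
        window = [[speeds["target"].iloc[i], speeds["front"].iloc[i],
                   speeds["rear"].iloc[i], hours[i], weekdays[i]]
                  for i in range(t - k, t)]   # k feature vectors of length 5
        X.append(window)
        y.append(speeds["target"].iloc[t])    # section speed to predict at t
    return np.asarray(X, dtype=float), np.asarray(y, dtype=float)
```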


Fig. 1. Algorithm flowchart.

2.3 Construction of Section Speed Dataset The travel trajectory of each vehicle is constructed in chronological order from the ETC transaction data, and the section set of the expressway network is used to look up each vehicle's driving trajectory. Each pair of adjacent ETC gantries in the trajectory is traversed, and the vehicle's travel time is calculated as the difference between the transaction times collected at the two gantries. Based on the distance between the two adjacent gantries, the vehicle's speed over the section is computed. This yields the speed of each vehicle over each section of the expressway network, hence the section speed dataset of all vehicles, from which Vave is calculated using formula (1). A hedged sketch of this construction is given below.
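The pandas sketch below follows the construction just described; the column names (plate, gantry_id, trade_time) and the section_km distance table are assumptions, not the actual field names of the ETC data.

```python
import numpy as np
import pandas as pd

def section_speeds(tx: pd.DataFrame, section_km: dict) -> pd.DataFrame:
    """tx: cleaned ETC transactions; section_km maps (gantry_a, gantry_b) -> km."""
    tx = tx.sort_values(["plate", "trade_time"]).copy()
    # Pair each transaction with the vehicle's next gantry along its trajectory.
    tx["next_gantry"] = tx.groupby("plate")["gantry_id"].shift(-1)
    tx["next_time"] = tx.groupby("plate")["trade_time"].shift(-1)
    tx = tx.dropna(subset=["next_gantry"])
    hours = (tx["next_time"] - tx["trade_time"]).dt.total_seconds() / 3600.0
    dist = [section_km.get((a, b), np.nan)
            for a, b in zip(tx["gantry_id"], tx["next_gantry"])]
    tx["speed_kmh"] = pd.Series(dist, index=tx.index) / hours
    return tx.dropna(subset=["speed_kmh"])
```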


2.4 Kalman Filter The KF is an important estimation method in statistics [15], widely used to smooth noisy measurement data and obtain optimal estimates. In processing the section traffic speed data, the KF is used to optimize the data, which improves its precision and dependability and reduces the prediction error of the subsequent GRU model. Its state equation and observation equation can be expressed as:

x_k = A x_{k-1} + B U_k + W_k    (4)

y_k = H x_k + V_k    (5)

where x_k and x_{k-1} represent the system state variable in states k and k−1 (the actual section traffic speed); A is the transition matrix describing the evolution of the state variable over time; U_k is the control input of state k, and B is an optional control input gain; y_k is the observed variable of state k; H is the gain of the state variable x_k on the observed variable y_k; W_k ~ N(0, Q), where Q is the covariance of the system process noise; and V_k ~ N(0, R), where R is the measurement noise covariance. Appropriate W_k and V_k were set after repeated experiments. The prediction and measurement updates are:

x_{k|k-1} = A x_{k-1|k-1} + B U_k
P_{k|k-1} = A P_{k-1|k-1} A^T + Q
K_k = P_{k|k-1} H^T [H P_{k|k-1} H^T + R]^{-1}
x_{k|k} = x_{k|k-1} + K_k [y_k - H x_{k|k-1}]    (6)

Based on the known state x_{k-1} at state k−1, the result x_k at state k is estimated. Here x_{k|k-1} is the prior state estimate of state k, made from the posterior estimate of state k−1; x_{k-1|k-1} is the posterior state estimate of state k−1, the optimal estimate obtained from the current observation and the prior estimate; P_{k|k-1} is the prior estimate covariance of state k (the uncertainty of the prior estimate); P_{k-1|k-1} is the posterior estimate covariance of state k−1 (the uncertainty of the posterior estimate); x_{k|k} is the posterior state estimate; and K_k is the Kalman gain. The time update of the covariance is:

P_{k|k} = [I - K_k H] P_{k|k-1}    (7)

where P_{k|k} is the posterior estimate covariance of state k; when the system enters state k+1, P_{k|k} serves as the prior estimate covariance P_{k+1|k} of state k+1. The algorithm then recurses, continuing to estimate the state and obtain the optimal estimate until the whole series is processed, giving the final result. A minimal one-dimensional sketch follows.
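The sketch below applies Eqs. (4)–(7) in one dimension (A = H = 1, no control input) to smooth a speed series; the noise covariances q and r are illustrative, since the paper's tuned values are not given.

```python
import numpy as np

def kalman_smooth(z, q=1e-3, r=1.0):
    """1-D Kalman filtering of measurements z; returns posterior estimates."""
    z = np.asarray(z, dtype=float)
    x, p = z[0], 1.0                       # initial state estimate / covariance
    out = np.empty_like(z)
    for k, zk in enumerate(z):
        x_prior, p_prior = x, p + q        # predict: prior estimate, covariance
        kk = p_prior / (p_prior + r)       # Kalman gain
        x = x_prior + kk * (zk - x_prior)  # update with measurement zk
        p = (1.0 - kk) * p_prior           # posterior covariance, Eq. (7)
        out[k] = x
    return out
```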


2.5 Gated Recurrent Unit The scene of vehicles passing ETC gantries in the expressway network evolves dynamically in real time. The time series traffic speed features smoothed by the KF, together with the corresponding time features, are used for GRU learning and training. The GRU [13] overcomes shortcomings of the plain Recurrent Neural Network (RNN), such as the inability to retain long-term dependency information and the vanishing and exploding gradients that occur during backpropagation. Compared with the LSTM, the GRU improves computational speed with little impact on performance. It preserves the higher-frequency features in the feature values of different time periods, and its simpler structure and fewer parameters yield faster training and a reduced risk of overfitting. Its structure is shown in Fig. 2.

Fig. 2. GRU model structure diagram.

The GRU obtains the states of the reset gate r_t and the update gate z_t from the hidden state h_{t-1} of the previous moment and the input x_t of the current moment:

r_t = \sigma(W_r \cdot [x_t, h_{t-1}])
z_t = \sigma(W_z \cdot [x_t, h_{t-1}])    (8)

where W_r and W_z represent weight matrices and σ represents the sigmoid activation function. After that, the candidate output \tilde{h}_t of the current neuron is calculated:

\tilde{h}_t = \tanh(W_h \cdot [r_t * h_{t-1}, x_t])    (9)

where W_h represents a weight matrix. The reset gate r_t determines the extent of the influence of the previous moment's h_{t-1} on the candidate output \tilde{h}_t. Finally, the output h_t at the current moment is calculated:

h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t    (10)

where the update gate z_t determines the extent of the influence of the previous moment's output h_{t-1} on the output h_t at the current moment.


The GRU model takes the hidden state at time t−1 and the speed information at time t as inputs, thereby obtaining the speed information at time t; by retaining historical speed information as input at time t+1, it captures time dependencies and better predicts speed over time. In addition, the speed of a QD also has spatial dependence: this paper uses the traffic speeds of the sections in front of and behind the target section as input features to capture the spatial characteristics of speed. This design fully utilizes the traffic speed correlation between sections, which helps raise the accuracy and robustness of the model. A minimal model sketch follows.
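A minimal Keras sketch of such a GRU predictor, fed with the k × 5 feature windows defined in Sect. 2.1; the window length and layer sizes are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_gru(k: int = 12, n_features: int = 5):
    model = models.Sequential([
        layers.GRU(64, input_shape=(k, n_features)),  # k steps x 5 features
        layers.Dense(1),                              # section speed at time t
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model
```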

3 Experimental Results and Analysis The data used in the experiments comes from ETC transaction data collected by a provincial expressway information technology company through the ETC gantry system from May 1 to May 10, 2021, used as the training dataset. From the preliminarily cleaned data, section traffic speed sets for 5 sections are obtained every 5 min from 00:00 to 24:00, generating time series with a total of 2880 data points over the 10 days. The first 80% of the data is used as the training set and the remaining 20% as the testing set. In addition, this paper also calculates QD speeds at time intervals of 15 min and 30 min as further experimental datasets. Because the amount of data at a 30-min interval is relatively small, that experiment uses ETC transaction data from May 1 to May 20, 2021. To evaluate the performance of the model, the experiments use the root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²) as evaluation indicators, defined as follows:

RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}    (11)

MAE = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|    (12)

R^2 = 1 - \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}    (13)

where y_i represents the actual vehicle speed, \hat{y}_i the predicted vehicle speed, \bar{y} the mean of the actual speeds, and N the sample size. RMSE and MAE show the extent of deviation between actual and predicted values: the smaller the value, the better the model and the more accurate the prediction. R² indicates whether the forecast error is smaller than that of the mean reference; the larger the value, the better the forecast. To verify the reliability of the KF-GRU model, this paper compares it against several traditional models at the 5-min interval, including GRU, bidirectional GRU (Bi-GRU), LSTM, and Random Forest (RF). The model comparison is shown in Fig. 3, from which we can see that the KF-GRU model clearly outperforms the other models, with RMSE, MAE, and R² of 1.8815, 1.2754, and 0.9884, respectively.


Fig. 3. Comparison of 5-min interval algorithms.

For noisy time series data, the KF can effectively remove noise; combined with the GRU, the model can also handle long-term dependence, so it obtains the best results on this task and outperforms the plain GRU and Bi-GRU models: although those models can handle long-term dependence, they are more affected by noisy data than a model combined with the KF, resulting in poorer predictions. LSTM is similar to GRU in handling long-term dependence but may be slightly less effective on some tasks, probably because its more complex structure and larger number of parameters lead to insufficient training. The base learner of RF is a decision tree; although it can handle nonlinear and high-dimensional problems, it may not effectively capture the temporal dependence of time series data, so its performance on this task is poor. To better observe the experimental effect of the model, a section was randomly selected and the KF-GRU model used to visualize the predicted vehicle speed over the next 5 min. The results are shown in Fig. 4, where orange is the predicted section traffic speed and blue the true section traffic speed. The predicted traffic speed follows a trend similar to the actual speed, indicating that the KF-GRU model predicts traffic speed correctly. To further validate the feasibility of the model, different models are compared at time intervals of 15 min and 30 min. Tables 1 and 2 show that the KF-GRU model outperforms the other evaluated models at these intervals as well, and its stability is also better than that of the comparison models; thus the model can increase the accuracy of speed prediction.


Fig. 4. Visualization of Traffic Speed Prediction.

Table 1. Comparison of 15-min interval algorithms.

| Model  | RMSE   | MAE    | R²     |
|--------|--------|--------|--------|
| KF-GRU | 1.6875 | 0.8233 | 0.9253 |
| GRU    | 5.6873 | 4.0566 | 0.5319 |
| LSTM   | 5.8416 | 4.1448 | 0.5062 |
| Bi-GRU | 5.453  | 3.8936 | 0.5697 |
| RF     | 3.0003 | 2.2124 | 0.6344 |

Table 2. Comparison of 30-min interval algorithms.

| Model  | RMSE   | MAE    | R²     |
|--------|--------|--------|--------|
| KF-GRU | 0.7317 | 0.6172 | 0.9772 |
| GRU    | 4.579  | 2.1472 | 0.5998 |
| LSTM   | 5.3122 | 3.8396 | 0.536  |
| Bi-GRU | 4.5006 | 3.2188 | 0.6669 |
| RF     | 3.0482 | 2.1472 | 0.5998 |

4 Conclusion In this study, we proposed a speed prediction method for expressway sections. On top of a traditional recurrent neural network, we combined a KF to address the noise in the original ETC data, improving data quality, and conducted a series of experiments on real expressway data. The experimental results show that the KF-GRU model outperforms traditional models at different time intervals, verifying the role of the KF in data smoothing. It compensates for the deficiency of the plain GRU in traffic speed prediction, thereby significantly improving the accuracy of expressway traffic speed prediction.

References
1. Smith, M., Huang, W., Viti, F., et al.: Quasi-dynamic traffic assignment with spatial queueing, control and blocking back. Transp. Res. Part B Methodol. 122, 140–166 (2019)
2. Li, Y.: The Application and Research of Forecast Analysis Based on Expressway Networking Toll Data. Beijing Jiaotong University (2017)
3. Zou, F., Guo, F., Tian, J., et al.: The method of dynamic identification of the maximum speed limit of expressway based on electronic toll collection data. Sci. Program. 2021, 1–15 (2021)
4. Chen, Z., Zou, F.M., Guo, F., et al.: Short-term traffic flow prediction of expressway based on Seq2seq model. In: International Conference on Frontiers of Electronics, Information and Computation Technologies, pp. 1–5 (2021)
5. Tian, J.S., Zou, F.M., Guo, F., et al.: Expressway traffic flow forecasting based on SF-RF model via ETC data. In: International Conference on Frontiers of Electronics, Information and Computation Technologies, pp. 1–7 (2021)
6. Zou, F., Ren, Q., Tian, J., et al.: Expressway speed prediction based on electronic toll collection data. Electronics 11(10), 1613 (2022)
7. Zeng, X., Guan, X., Wu, H., et al.: A data-driven quasi-dynamic traffic assignment model integrating multi-source traffic sensor data on the expressway network. ISPRS Int. J. Geo Inf. 10(3), 113 (2021)
8. Jeong, M.H., Lee, T.Y., Jeon, S.B., et al.: Highway speed prediction using gated recurrent unit neural networks. Appl. Sci. 11(7), 3059 (2021)
9. Zafar, N., Haq, I.U., Chughtai, J.R., et al.: Applying hybrid LSTM-GRU model based on heterogeneous data sources for traffic speed prediction in urban areas. Sensors 22(9), 3348 (2022)
10. Han, C., Song, S., Wang, C.H.: A real-time short-term traffic flow adaptive forecasting method based on ARIMA model. Acta Simulata Systematica Sinica (2004)
11. Zhu, Z.Y., Liu, L., Cui, M.: Short-term traffic flow forecasting model combining SVM and Kalman filtering. Computer Science (2013)
12. Zhang, T., Chen, X., Xie, M., Zhang, Y.: K-NN based nonparametric regression method for short-term traffic flow forecasting. Syst. Eng.-Theory Practice 30(02), 376–384 (2010)
13. Zhang, D., Kabuka, M.R.: Combining weather condition data to predict traffic flow: a GRU-based deep learning approach. IET Intel. Transport Syst. 12(7), 578–585 (2018)
14. Abduljabbar, R.L., Dia, H., Tsai, P.W., et al.: Short-term traffic forecasting: an LSTM network for spatial-temporal speed prediction. Future Transp. 1(1), 21–37 (2021)
15. Kalman, R.E.: A new approach to linear filtering and prediction problems (1960)

Dynamic Brightness Adjustment of Tunnel Lighting Based on ETC Transaction Data

Shilong Zhuo1, Fumin Zou1, Feng Guo2(B), and Xinrui Zhao1

1 Fujian University of Technology, Fuzhou 350118, China
2 Fujian Provincial Key Laboratory of Automotive Electronics and Electric Drive, Fujian University of Technology, Fuzhou 350118, China
[email protected]

Abstract. In recent years, with the continuous extension of the highway network, the power demand of highway lighting systems has grown increasingly serious, and the energy-saving problem posed by tunnel lighting has gradually attracted attention. As a special section of the highway network, tunnel lighting facilities must run uninterruptedly 24 h a day to counter the "black hole phenomenon" and "zebra effect" during driving, which has become one of the largest expenses in highway operating costs. Moreover, excessive brightness runs counter to the national policies of "carbon neutrality" and "energy conservation and emission reduction". To improve lighting efficiency and thus minimize operating costs, this paper proposes a dynamic adjustment of tunnel lighting brightness based on ETC transaction data. The method uses the transaction data of the ETC gantry in front of the tunnel as input to train a bidirectional LSTM (BiLSTM) model, establishes a long- and short-term tunnel traffic flow prediction model, and realizes dynamic adjustment of the tunnel lighting system's brightness under the premise of ensuring traffic safety, thereby reducing energy consumption by 35%–40% and minimizing the energy loss of highway tunnels. Keywords: Deep learning · Traffic forecasting · Tunnel lighting

1 Introduction Over the past decade, energy conservation has become a major global issue and an important parameter of sustainability [1]. The transport sector accounts for about one third of total energy consumption [2]; road transport accounts for 89% of all transport modes, and road tunnels account for 85% of total road transport energy consumption. A road tunnel is a relatively closed space, and tunnel lighting is the key to driving safety. It is therefore necessary to install lighting fixtures and provide artificial lighting 24 h a day, 365 days a year [3], giving drivers a safe and comfortable environment when passing through, approaching, and leaving the tunnel [4]. However, many current tunnel lighting systems waste energy: excessive lighting in tunnels has led to 70% of the electricity being wasted [5]. Therefore,


various methods and technologies have been proposed to improve the energy efficiency of operations and thereby minimize the energy consumption of road tunnels. Studies on tunnel energy efficiency can be divided into four categories. The first introduces natural light into the tunnel, extending the threshold area beyond the road tunnel and supplementing electrical lighting with sunlight in daylight conditions [6]. Cantisani et al. [7] used an analytical method to calculate the illuminance on the road surface under pre-tunnel lighting conditions and verified that pre-tunnel lighting can reduce the energy consumption of tunnel lighting. The second reduces the brightness requirement through greening of the surroundings and special pavements. Peña-García et al. [8] proposed reducing the required illumination around the tunnel entrance by planting climbing plants around the entrance portal. The third is extensive research on and improvement of tunnel lamps. Avotins et al. [9] found that replacing high-pressure sodium (HPS) lamps with LEDs provided up to 47% energy savings for street lighting. These three approaches reduce energy consumption through additional systems or hardware upgrades. The fourth uses intelligent control methods to adjust lamp brightness in tunnel lighting systems [10]; these are the most common and effective ways to reduce energy consumption through internal optimization of the system. Zhao et al. [11] adopted an adaptive fuzzy control strategy that adjusts the brightness inside the tunnel according to changes in external brightness, traffic volume, and vehicle speed to meet the basic requirements of tunnel lighting, significantly reducing energy consumption. Qin et al. [12] used the K-means clustering algorithm to divide the daily traffic distribution into six periods for timing optimization and proposed a dynamic brightness adjustment method with different operation strategies for different periods; this provides good energy savings while improving the sustainability of the lighting system. However, their classification of the daily traffic distribution within the tunnel is too simplistic, and timeliness issues may arise. To solve the above problems and improve tunnel lighting effectiveness, driving comfort, safety, and energy saving, this paper analyzes and predicts the traffic flow at the gantry in front of the tunnel and automatically controls the brightness of the lighting in the tunnel: during peak traffic flow the brightness of tunnel lighting is increased, and when traffic flow is low it is appropriately reduced. Considering the high randomness and regularity of tunnel traffic, this paper proposes a traffic prediction model based on a bidirectional long short-term memory network, which can minimize the manpower, material, and power consumption of tunnel lighting while ensuring tunnel traffic safety and achieve green, energy-saving development.


2 Bidirectional LSTM Model 2.1 Basic Principle The LSTM is an improved recurrent neural network that not only alleviates the common problems of gradient explosion and gradient vanishing but also addresses the long-distance dependencies that RNNs cannot handle [13]. Unlike traditional recurrent neural networks, which do not selectively remember information, the LSTM-RNN adopts the idea of "gate control", managing and updating the network's memory content through forget gates, input gates, and output gates. Its core idea is to change the gradient flow from a continued-multiplication form to an additive form, which enables the LSTM to avoid vanishing gradients when processing long time series. The specific network structure is shown in Fig. 1.

Fig. 1. LSTM-RNN structure.

The first gate is the forget gate, which discards cross-time information by multiplying the cell state C by a number in 0 ∼ 1 produced by the sigmoid output. This operation is given by Formula (1):

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{1}$$

In the formula, $W$ represents a weight matrix, $[h_{t-1}, x_t]$ denotes the concatenation of the previous hidden state with the current input, and $b$ denotes a bias. The second gate is the input gate, which determines how much new information enters the cell. Its operation has two parts, shown in the following two equations:

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{2}$$

$$\tilde{c}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \tag{3}$$

The meaning of the symbols is the same as in Eq. (1), but the values of the weights and biases differ. Multiplying the results of these two formulas element-wise yields the content written to the new cell state. The cell state C is updated by the combined operation of the forget gate and the input gate; the memory signal carried by $\tilde{c}_t$ alleviates the network's vanishing-gradient problem. The final hidden state is obtained by multiplying the tanh of the cell state by $O_t$, where $O_t$ combines the previous hidden output with the current input through the sigmoid activation, as in Formulas (4) and (5):

$$O_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{4}$$

$$h_t = O_t * \tanh(C_t) \tag{5}$$
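To make the gate operations concrete, the following is a minimal NumPy sketch of a single LSTM cell step implementing Formulas (1)-(5); the dimensions, random initialization, and toy sequence are illustrative assumptions, not the configuration used in this paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step per Formulas (1)-(5); W/b hold the four gate parameters."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate, Formula (1)
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate, Formula (2)
    c_tilde = np.tanh(W["C"] @ z + b["C"])   # candidate memory, Formula (3)
    C_t = f_t * C_prev + i_t * c_tilde       # cell state update
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate, Formula (4)
    h_t = o_t * np.tanh(C_t)                 # hidden state, Formula (5)
    return h_t, C_t

# Illustrative dimensions: 1 input feature, 4 hidden units.
n_in, n_hid = 1, 4
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(n_hid, n_hid + n_in)) * 0.1 for k in "fiCo"}
b = {k: np.zeros(n_hid) for k in "fiCo"}
h, C = np.zeros(n_hid), np.zeros(n_hid)
for x in np.array([[0.2], [0.5], [0.1]]):    # a toy 3-step sequence
    h, C = lstm_step(x, h, C, W, b)
print(h)
```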

In order to capture more time series information, this paper uses the Bidirectional LSTM network to predict the traffic flow in front of the tunnel.

2.2 BiLSTM Network

The BiLSTM is one of the preferred models for sequence modeling and enhances feature learning from large-scale spatial time series data [14]. It uses two LSTMs, one in the forward direction and one in the backward direction [15], and combines their outputs as follows.

Forward LSTM:

$$\overrightarrow{A}_t = \phi\left(w_{xA} X_t + w_{\overrightarrow{A}\overrightarrow{A}} \overrightarrow{A}_{t-1} + b_{\overrightarrow{A}}\right) \tag{6}$$

Backward LSTM:

$$\overleftarrow{A}_t = \phi\left(w_{x\overleftarrow{A}} X_t + w_{\overleftarrow{A}\overleftarrow{A}} \overleftarrow{A}_{t+1} + b_{\overleftarrow{A}}\right) \tag{7}$$

Combined output:

$$Y_t = \phi\left(w_Y\left[\overrightarrow{A}_t, \overleftarrow{A}_t\right] + b_Y\right) \tag{8}$$

where $w_Y$ is the weight of the output neuron, $b_Y$ is its bias, and $\overrightarrow{A}_t$, $\overleftarrow{A}_t$ are the forward and backward hidden states, both considered at time t. Further, $b_{\overrightarrow{A}}$ and $b_{\overleftarrow{A}}$ are the bias parameters of the forward and backward layers. Fig. 2 shows the block diagram of the BiLSTM: for time steps t − 1, t, and t + 1, the input sequence is $X_{t-1}, X_t, X_{t+1}$ and the output sequence is $Y_{t-1}, Y_t, Y_{t+1}$, respectively, and the activation function $\phi$ produces the forward and backward hidden states $\overrightarrow{A}_t$ and $\overleftarrow{A}_t$.


Fig. 2. Block diagram of bidirectional LSTM.
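As a rough illustration of the BiLSTM described above, the sketch below builds a small bidirectional regressor with Keras/TensorFlow (the stack named in Sect. 3) and keeps the best epoch via a checkpoint callback, as the experiments below do. The window length, unit count, checkpoint file name, and toy data are assumptions, not values from the paper.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, Dense
from tensorflow.keras.callbacks import ModelCheckpoint

WINDOW = 12  # assumed input window: twelve 5-min counts = 1 h of history

# The Bidirectional wrapper runs one LSTM forward and one backward (Eqs. 6-8).
model = Sequential([
    Bidirectional(LSTM(32), input_shape=(WINDOW, 1)),  # 32 units is an assumption
    Dense(1),                                          # next 5-min flow value
])
model.compile(optimizer="adam", loss="mse")

# Toy sliding-window data; the real inputs are the normalized 5-min flow series.
X = np.random.rand(256, WINDOW, 1)
y = np.random.rand(256, 1)

# Keep the parameters of the best epoch, as the paper's callback does.
ckpt = ModelCheckpoint("best_bilstm.keras", monitor="val_loss", save_best_only=True)
model.fit(X, y, validation_split=0.2, epochs=50, callbacks=[ckpt], verbose=0)
```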

3 Simulation Experiments

In terms of hardware, we use an Intel i7 CPU and an NVIDIA RTX 4060 GPU. The BiLSTM model is implemented in Python on the Keras platform, based on the TensorFlow API. 80% of the combined data is used as training samples and the remaining 20% as test samples; 50 training epochs are run, with a callback function used to retain the parameters of the best-performing model and prevent overfitting.

3.1 Dataset Introduction

In this study, the ETC transaction data collected by the gantry in front of the Dapingshan Tunnel in Fujian Province is used as the analysis object. The Dapingshan Tunnel section has a relatively uniform speed and a large traffic flow; it is the main tunnel section of the Quanxia Expressway in Fujian Province, and there is no diversion point in front of the tunnel, making it a section with favorable conditions. It therefore provides a good reference for dynamic brightness adjustment of the in-tunnel lighting system. The data used in this study comprise transaction records for the 5 days from May 7, 2021 to May 11, 2021, about 120,000 data items with 70 data dimensions. Using the transaction time dimension, the traffic flow of the Dapingshan Tunnel is quantified with 5 min as the statistical time window.

3.2 Data Preprocessing

Because the data dimensionality is large and the actual acquisition environment of the equipment is complex, the collected data contain some losses and anomalies. Outliers and missing data can degrade model building and prediction, so the data must be preprocessed before training. Since most outliers are nonlinear, they are replaced with the mean of the historical data, as shown in Formula (9). After the missing and abnormal data are processed, normalization is needed to map the data features into the same space. Normalization resolves the incomparability of different types of data in their respective feature spaces and improves the prediction accuracy of the model [16], as shown in Formula (10).

$$x_n^i = \frac{x_{n-1}^i + x_{n-2}^i + \dots + x_{n-h}^i}{h} \tag{9}$$

$$x^i = \frac{x - x_{min}}{x_{max} - x_{min}} \tag{10}$$

$x_{n-h}^i$ represents the data for the ith time period h days earlier, $x_n^i$ represents the missing data for the ith time period of the nth day, $x^i$ represents the value normalized to the range [0, 1], and $x_{min}$, $x_{max}$ represent the minimum and maximum values of such data. To make the experimental results more accurate, the subsequent experiments split the data into a training set and a test set, as shown in Fig. 3: the training set uses the traffic flow data from May 7 to May 10, and the test set uses the traffic flow data from May 11.

Fig. 3. Schematic diagram of training set and test set.
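A minimal sketch of the preprocessing in Formulas (9) and (10), assuming the 5-min flow series is arranged as a (day, slot) matrix; the array names and toy values are hypothetical.

```python
import numpy as np

def fill_missing(history, h):
    """Formula (9): replace a missing slot with the mean of the same
    slot over the previous h days. `history` has shape (days, slots)."""
    filled = history.copy()
    days, slots = filled.shape
    for n in range(h, days):
        for i in range(slots):
            if np.isnan(filled[n, i]):
                filled[n, i] = np.nanmean(filled[n - h:n, i])
    return filled

def min_max_normalize(x):
    """Formula (10): map values into [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

# Toy (day, slot) matrix of 5-min flow counts; day 2 has one missing slot.
flows = np.array([[10., 12., 13.,  9.],
                  [11., 12., 14., 10.],
                  [12., np.nan, 15., 11.]])
clean = fill_missing(flows, h=2)      # the gap becomes mean(12, 12) = 12
print(min_max_normalize(clean))
```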

3.3 Evaluation

In order to evaluate the performance of the proposed model, the coefficient of determination (R2), root-mean-square error (RMSE), and mean absolute error (MAE) are used. The formulas are as follows:

$$R^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(\bar{y} - y_i\right)^2} \tag{11}$$

$$RMSE = \sqrt{\frac{1}{m}\sum_{i=1}^{m} \left(y_i - \hat{y}_i\right)^2} \tag{12}$$

$$MAE = \frac{100\%}{m}\sum_{i=1}^{m} \left|\frac{y_i - \hat{y}_i}{y_i}\right| \tag{13}$$

$y_i$ represents the true value, $\hat{y}_i$ the predicted value, $\bar{y}$ the average value of the data, and m the total number of samples.
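The three indicators can be computed directly from the true and predicted arrays; a small NumPy sketch follows, with Formula (13) implemented in the relative-percentage form given above. The toy flow values are illustrative.

```python
import numpy as np

def r2(y, y_hat):
    """Formula (11): coefficient of determination."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def rmse(y, y_hat):
    """Formula (12): root-mean-square error."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae_pct(y, y_hat):
    """Formula (13) as written above: mean absolute relative error, in %."""
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

y = np.array([120., 95., 140., 80.])        # true 5-min flows (toy values)
y_hat = np.array([110., 100., 150., 78.])   # predicted flows
print(r2(y, y_hat), rmse(y, y_hat), mae_pct(y, y_hat))
```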


Table 1 shows the results of predicting the traffic flow in Dapingshan Tunnel using the RNN, GRU, LSTM, and BiLSTM models. The experiments show that the BiLSTM model is superior to the other neural networks in terms of the coefficient of determination (R2), root mean square error (RMSE), and mean absolute error (MAE), improving on the original LSTM model.

Table 1. Comparison of Prediction Results by Neural Network.

        RNN          GRU          LSTM         BiLSTM
RMSE    14.371954    12.433409    12.306760    12.017861
MAE     10.694378    9.400774     9.461162     9.043254
R2      0.879151     0.909554     0.911387     0.915498

3.4 Experiment Results

Tunnel Traffic Forecasting. The tunnel traffic predicted by the BiLSTM model is compared with the actual tunnel traffic flow in Fig. 4. As can be seen from the figure, the predicted results follow the general trend of the real traffic flow, and when the traffic fluctuation is small, the prediction curve essentially coincides with the actual traffic flow curve. Therefore, the BiLSTM model proposed in this paper can predict the short-term traffic flow of tunnels well.

Fig. 4. Forecast of 5-min transaction totals for May 11.

The BiLSTM neural network model is trained, yielding the loss-function and accuracy-function curves shown in Fig. 5. During training, the train loss and test loss both decreased and finally stabilized close to coincidence, indicating that the training process was effective and that the training and test sets were well fitted.

Fig. 5. Plot of the loss function and accuracy function during training.

Through the analysis of the above evaluation indicators and the comparison of the graphical simulation results, the effectiveness of the BiLSTM recurrent neural network prediction model is verified.

Tunnel Lighting System Dynamic Brightness Adjustment. Since there is some distance between the gantry and the tunnel, the adjustment of the in-tunnel lighting system must lag the prediction results; the lag time is

$$t_1 = \frac{S}{V} \tag{14}$$

$t_1$ is the time it takes for a vehicle to travel from the ETC gantry to the tunnel, S is the distance from the ETC gantry to the tunnel, and V is the speed limit of the tunnel section. By predicting the traffic flow at the ETC gantry in front of the tunnel, the brightness of the lighting system at the time the vehicles reach the tunnel can be simulated, as shown in Fig. 6, where $t_2$ is the transition period over which the tunnel brightness is adjusted. The maximum simulated tunnel brightness is matched to the real maximum brightness of the tunnel, and the minimum simulated brightness is matched to 40% of the real brightness. To verify the influence of different brightness change frequencies on energy loss, this study set the brightness conversion period $t_2$ to 10 min, 25 min, and 50 min for comparison experiments. The experimental results are shown in Fig. 7, and the effect of the different brightness change cycles on energy saving is shown in Table 2. The table shows that the energy-saving effect of the different conversion cycles within one hour differs little. However, frequent changes in the brightness of the in-tunnel LED lamps may shorten their service life, and frequent brightness changes, especially rapid ones, can cause discomfort to the driver, leading to visual fatigue, glare, and distraction that affect driving safety.
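As a hedged sketch of the adjustment logic: the predicted gantry flow is shifted by the lag $t_1 = S/V$ of Formula (14) and mapped linearly to a brightness level between 40% and 100%, held constant within each conversion period $t_2$. The distance, speed limit, linear mapping, and toy flows are illustrative assumptions, not values from the paper.

```python
import numpy as np

S_KM, V_KMH = 2.0, 80.0                 # assumed gantry-to-tunnel distance and speed limit
LAG_MIN = S_KM / V_KMH * 60.0           # Formula (14): lag t1, in minutes
SLOT_MIN, T2_MIN = 5, 50                # 5-min flow slots; 50-min conversion period

def brightness_schedule(pred_flow):
    """Map predicted 5-min flows to brightness in [0.4, 1.0], updated every t2."""
    lag_slots = int(round(LAG_MIN / SLOT_MIN))
    delayed = np.roll(pred_flow, lag_slots)            # apply the lag t1
    lo, hi = delayed.min(), delayed.max()
    level = 0.4 + 0.6 * (delayed - lo) / (hi - lo)     # 40%..100% brightness
    step = T2_MIN // SLOT_MIN
    for start in range(0, len(level), step):           # hold the level within each t2
        level[start:start + step] = level[start:start + step].mean()
    return level

pred = np.array([30, 45, 80, 120, 110, 60, 40, 35, 90, 130, 70, 50], dtype=float)
print(brightness_schedule(pred).round(2))
```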


Fig. 6. Plot of the change in luminance of the lighting system after the vehicle arrives at the tunnel.

Fig. 7. Comparison of experiments with luminance transformation period t2 equal to 10 min, 25 min, and 50 min.

Therefore, while ensuring traffic safety, this study selects as long a brightness conversion period as possible.

Table 2. Energy Savings Effectiveness Table.

                     10 min          25 min          50 min          24 h
Definite integral    92352.265625    92226.098633    92339.904785    24*60*100 = 144000
Energy efficiency    35.8664%        35.9541%        35.8751%        0%

4 Conclusion

Tunnel traffic flow prediction is an important part of tunnel energy saving and has great economic value and social significance. In this study, through analysis and mining of the ETC transaction data collected in front of the tunnel, a tunnel traffic flow prediction based on the BiLSTM model is proposed, and the prediction results are applied to the dynamic adjustment of the brightness of the in-tunnel lighting system. The predictions of the BiLSTM model are compared with those of other neural networks using the coefficient of determination (R2), root mean square error (RMSE), and mean absolute error (MAE), and the experimental results show that the proposed BiLSTM model predicts well. By comparing different brightness change cycles, the 50-min cycle was selected as the optimal solution, achieving energy savings of 35%-40%. Future research will consider tunnel sections with different conditions and more special situations, such as merges in front of the tunnel and congestion inside the tunnel, and will further optimize the original algorithm in order to better study the generalization of the model.

References

1. Yoomak, S., Jettanasen, C., Ngaopitakkul, A., et al.: Comparative study of lighting quality and power quality for LED and HPS luminaires in a roadway lighting system. Energy Build. 159, 542–557 (2018)
2. Wang, Y.F., Li, K.P., Xu, X.M., et al.: Transport energy consumption and saving in China. Renew. Sustain. Energy Rev. 29, 641–655 (2014)
3. Carli, R., Dotoli, M., Cianci, E.: An optimization tool for energy efficiency of street lighting systems in smart cities. IFAC-PapersOnLine 50(1), 14460–14464 (2017)
4. Boyce, P.R.: The benefits of light at night. Build. Environ. 151, 356–367 (2019)
5. Roberts, A.C., Christopoulos, G.I., Car, J., et al.: Psycho-biological factors associated with underground spaces: what can the new era of cognitive neuroscience offer to their study? Tunn. Undergr. Space Technol. 55, 118–134 (2016)
6. Xu, R., Ye, H., Hu, B., et al.: Intelligent dimming control and energy consumption monitoring system of tunnel lighting. Light. Res. Technol. (2023)
7. Cantisani, G., D'Andrea, A., Moretti, L.: Natural lighting of road pre-tunnels: a methodology to assess the luminance on the pavement – Part I. Tunn. Undergr. Space Technol. 73, 37–47 (2018)
8. Peña-García, A., López, J.C., Grindlay, A.L.: Decrease of energy demands of lighting installations in road tunnels based in the forestation of portal surroundings with climbing plants. Tunn. Undergr. Space Technol. 46, 111–115 (2015)
9. Avotins, A., Apse-Apsitis, P., Kunickis, M., Ribickis, L.: Towards smart street LED lighting systems and preliminary energy saving results. In: 2014 55th International Scientific Conference on Power and Electrical Engineering of Riga Technical University, pp. 130–135 (2014)
10. Zhao, J., Feng, Y., Yang, C.: Intelligent control and energy saving evaluation of highway tunnel lighting: based on three-dimensional simulation and long short-term memory optimization algorithm. Tunn. Undergr. Space Technol. 109, 103768 (2021)
11. Zhao, L., Qu, S., Zhang, W., Xiong, Z.: An energy-saving fuzzy control system for highway tunnel lighting. Optik 180, 419–432 (2019)
12. Qin, L., Shi, X., Leon, A.S., Tong, C., Ding, C.: Dynamic luminance tuning method for tunnel lighting based on data mining of real-time traffic flow. Build. Environ. (2020)
13. Mikolov, T., Karafiát, M., Burget, L., et al.: Recurrent neural network based language model. In: INTERSPEECH (2010)
14. Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45, 2673–2681 (1997)
15. Saurabh, S., Gupta, P.K.: Deep learning-based modified bidirectional LSTM network for classification of ADHD disorder. Arab. J. Sci. Eng. (2023)
16. Chen, Z., Zou, F., et al.: Short-term traffic flow prediction of expressway based on Seq2Seq model. Int. Conf. Front. Electron. (2021)

Short-Time Traffic Flow Prediction of Highway Toll Station Based on Combined GRU-MLP Model

Wenyu Chen1, Fumin Zou1, and Feng Guo2(B)

1 Fujian University of Technology, Fuzhou 350118, China
2 Fujian Provincial Key Laboratory of Automotive Electronics and Electric Drive, Fujian University of Technology, Fuzhou 350118, China
[email protected]

Abstract. In this paper, based on seven days of traffic flow data from a toll station on a highway in Fujian Province, a combined GRU-MLP model is used to predict short-time traffic flow at 15-min intervals and is compared with single models such as SVM, LSTM, GRU, and MLP. Four evaluation indexes, MAE, MAPE, RMSE, and R2, are adopted to analyze the five models. The results show that the proposed GRU-MLP combined model performs better on all four indexes: the MAE value is reduced by 14.25%, the MAPE value by 36.61%, and the RMSE value by 8.16%, while the R2 value is improved by 0.73%. The combined model outperforms the other four models in terms of prediction, which verifies its superiority and shows that it can provide better decision support for highway management.

Keywords: Freeway · GRU-MLP Combined Model · Traffic Flow Prediction

1 Introduction

The highway is the main choice for intercity travel by passenger cars and freight trucks, offering comfort, speed, and smoothness. As the number of motor vehicles continues to increase, the coverage of electronic toll collection (ETC) technology on highways has also grown steadily. The highway toll station is an important node of the intercity transportation road network, and toll station congestion occurs frequently, especially during holiday periods, when the pressure on toll stations is most significant [1]. As one of the basic parameters of highway traffic flow, toll station flow is of great significance to highway management. Understanding the traffic flow pattern of a toll station in real time and predicting it accurately would greatly help the highway management department master the flow situation of the toll station and implement traffic diversion, shunting, and congestion information release. Therefore, accurate toll station traffic flow prediction is crucial for highway control and management.


Traffic flow prediction can be divided into three major categories [2]: 1) short-term prediction, which usually refers to prediction within a time interval of 5-30 min; 2) medium-term prediction, within a time interval of 30 min to several hours; and 3) long-term prediction, over a period of one to several days. The goal of highway toll station traffic flow prediction is to use the station's historical traffic flow data for prediction and evaluation, and short-term prediction usually receives the most attention. Early short-term traffic flow prediction methods include wavelet prediction, Kalman filtering, the historical averaging method, and chaos theory. These traditional methods do not predict very well and show large errors against the actual traffic flow, so more scholars have gradually adopted artificial intelligence algorithms to improve prediction accuracy. Currently, AI algorithms are widely applied in prediction tasks such as building energy consumption prediction [3], photovoltaic system output prediction [4], temperature prediction [5], human body size prediction [6], water quality index prediction [7], economic operation index prediction [8], rockburst condition prediction [9], and lithium battery life prediction [10]. Among artificial intelligence algorithms, the MLP algorithm has strong fitting ability and nonlinear feature learning ability, and its network structure and parameters can be analyzed to understand the model; the GRU algorithm has a more streamlined structure and fewer parameters than LSTM and similar algorithms, and so trains more efficiently. Both perform well, but each single algorithmic model has its own characteristics and limitations in application, so using only one model may lose data information. In order to obtain better prediction results, this paper combines these two artificial intelligence algorithms into a GRU-MLP combined model for short-term prediction of highway toll station traffic flow, and uses the real traffic flow data of a toll station on a Fujian Province highway to validate the superiority and effectiveness of the combined model.

2 Related Work

Currently, many scholars use a single artificial intelligence model for traffic flow prediction research [11-13], but with the development of technology and higher accuracy requirements, more and more combined models are receiving attention. Li Lei and other scholars [14] proposed a short-term traffic flow prediction method combining K-nearest neighbor (KNN) and Long Short-Term Memory (LSTM) to predict the traffic flow of urban road sections and verified its prediction effect; Sun Yue and other scholars [15] used an optimization algorithm to combine ARMA (autoregressive moving average) and LSTM models to predict railroad passenger flow, and compared the prediction results with those of the ARMA, LSTM, and gray models. Highway traffic flow prediction is usually studied on the basis of highway toll data, combined with the periodicity of the flow data, to verify model performance and characteristics. Liu Yongle and other scholars [16] combined a convolutional neural network (CNN) with a bidirectional long short-term memory model (BiLSTM) to construct a CNN-BiLSTM model that considers spatio-temporal correlation, and validated the model on the California highway dataset in the U.S.; the results showed that the model predicts well and fits the real values better. Cai Yanguang and other scholars [17] designed a highway traffic flow prediction method based on a cuckoo search (CS) algorithm improved with a radial basis function (RBF) neural network, which predicts highway traffic flow under heavy rainfall with an accuracy higher than 90%. In summary, current research on highway traffic flow prediction still focuses mainly on road sections, with relatively little work on traffic flow prediction at toll stations. The traffic flow at a toll station has a great impact on the management and operation of the highway, and accurate prediction of toll station flow helps managers grasp the traffic situation of the station and deploy decisions in advance.

3 GRU-MLP Combination Model

3.1 GRU Neural Network

The Gated Recurrent Unit (GRU) neural network is an improvement on the structure of the Recurrent Neural Network (RNN) that addresses the vanishing- and exploding-gradient problems in long-sequence training [18]. The basic principles of GRU and LSTM are similar: at the current time step, a gating mechanism controls the input and memory information and produces a prediction. To solve the RNN vanishing-gradient problem, the GRU is structured with an update gate and a reset gate. These two gating vectors determine the output of the gated recurrent unit, allowing information memorized over long sequences to be preserved over time. Since the GRU has only two gates, its computational complexity is lower and training is more efficient. A GRU unit contains an update gate $z_t$ (t is the time step), a reset gate $r_t$, a candidate vector $\tilde{h}_t$, and a new state vector $h_t$; $x_t$ is the input vector at time step t. A matrix concatenation operation forms each gate together with its activation function. The update gate $z_t$ determines how much past memorized state information is retained in the current state:

$$z_t = \sigma\left(W^{(z)} x_t + U^{(z)} h_{t-1}\right) \tag{1}$$

where W and U are weight matrices and σ is the sigmoid activation function. The reset gate $r_t$ determines how previously memorized state information is combined with the new input information:

$$r_t = \sigma\left(W^{(r)} x_t + U^{(r)} h_{t-1}\right) \tag{2}$$

The new memory content stores relevant past information through the reset gate output:

$$\tilde{h}_t = \tanh\left(W x_t + r_t \odot U h_{t-1}\right) \tag{3}$$


where $\odot$ is the Hadamard product and tanh is the hyperbolic tangent activation function. The new state vector retains the information from the current cell and passes it to the next cell:

$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t \tag{4}$$
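A minimal NumPy sketch of one GRU step per Formulas (1)-(4); the dimensions, random initialization, and toy sequence are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, U):
    """One GRU step implementing Formulas (1)-(4)."""
    z_t = sigmoid(W["z"] @ x_t + U["z"] @ h_prev)               # update gate, Formula (1)
    r_t = sigmoid(W["r"] @ x_t + U["r"] @ h_prev)               # reset gate, Formula (2)
    h_tilde = np.tanh(W["h"] @ x_t + r_t * (U["h"] @ h_prev))   # candidate, Formula (3)
    return z_t * h_prev + (1.0 - z_t) * h_tilde                 # new state, Formula (4)

n_in, n_hid = 1, 3
rng = np.random.default_rng(1)
W = {k: rng.normal(size=(n_hid, n_in)) * 0.1 for k in "zrh"}
U = {k: rng.normal(size=(n_hid, n_hid)) * 0.1 for k in "zrh"}
h = np.zeros(n_hid)
for x in np.array([[0.3], [0.7], [0.2]]):   # toy 3-step sequence of scaled flows
    h = gru_step(x, h, W, U)
print(h)
```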

The training of the GRU is divided into two processes: forward propagation and backward error propagation. The forward propagation process calculates the output value of the GRU unit; the backward error propagation process calculates the error between the output value and the actual value and, following the gradient descent method, derives and updates the connection weights of the neurons in each layer in the direction of decreasing error gradient. The two processes alternate continuously until the minimum error or the maximum number of iterations is reached, at which point the iterative process ends.

3.2 MLP Neural Network

The basic structure of a multilayer perceptron network consists of 1 input layer, 1 or 2 hidden layers, and 1 output layer. The network has the typical characteristics of a feedforward neural network: information values are transmitted from the neurons in the input layer to the neurons of hidden layer 1, then to the neurons of hidden layer 2, and finally to the neurons of the output layer. The neurons in each layer are connected to the neurons in the adjacent layers, while there are no connections between neurons within the same layer. Since the hidden layer neurons are linear, the selection of neurons and of the learning algorithm is simpler than for other neural networks. The input layer has n neurons with inputs $x_1, x_2, \ldots, x_n$; hidden layer 1 has $n_1$ neurons; hidden layer 2 has $n_2$ neurons; the output layer has k neurons. The input-output relationship of each layer is:

$$h_j = f\left(\sum_{i=1}^{n} w_{ij} x_i - \theta_j\right), \quad j = 1, 2, \ldots, n_1 \tag{5}$$

$$c_l = f\left(\sum_{j=1}^{n_1} v_{lj} h_j - \theta_l\right), \quad l = 1, 2, \ldots, n_2 \tag{6}$$

$$y_m = f\left(\sum_{l=1}^{n_2} u_{ml} c_l - \theta_m\right), \quad m = 1, 2, \ldots, k \tag{7}$$

In Eqs. (5)-(7), $h_j$ is the output of the jth neuron of hidden layer 1, $w_{ij}$ is the connection weight from the ith neuron of the input layer to the jth neuron of hidden layer 1, and $\theta_j$ is the threshold of the jth neuron of hidden layer 1; $c_l$ is the output of the lth neuron of hidden layer 2, $v_{lj}$ is the connection weight from the jth neuron of hidden layer 1 to the lth neuron of hidden layer 2, and $\theta_l$ is the threshold of the lth neuron of hidden layer 2; $y_m$ is the output of the mth neuron of the output layer, $u_{ml}$ is the connection weight from the lth neuron of hidden layer 2 to the mth neuron of the output layer, and $\theta_m$ is the threshold of the mth neuron of the output layer. $f(\cdot)$ is the transfer function.
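A compact NumPy sketch of the forward pass of Formulas (5)-(7), assuming sigmoid as the transfer function f; the layer sizes and random parameters are illustrative.

```python
import numpy as np

def f(z):
    """Assumed transfer function: sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, w, theta_h1, v, theta_h2, u, theta_out):
    h = f(w @ x - theta_h1)      # hidden layer 1, Formula (5)
    c = f(v @ h - theta_h2)      # hidden layer 2, Formula (6)
    return f(u @ c - theta_out)  # output layer, Formula (7)

n, n1, n2, k = 4, 8, 6, 1        # illustrative layer sizes
rng = np.random.default_rng(2)
w = rng.normal(size=(n1, n))
v = rng.normal(size=(n2, n1))
u = rng.normal(size=(k, n2))
x = rng.random(n)                # one toy input sample
print(mlp_forward(x, w, np.zeros(n1), v, np.zeros(n2), u, np.zeros(k)))
```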


3.3 Combination Model

In order to obtain better prediction results, this paper adopts the combined GRU-MLP model for highway toll station traffic flow prediction. After the GRU model and the MLP model are trained, their predicted values are weighted to obtain the predicted value of the combined model. The model structure relationship is:

$$y_i^* = \omega_1 y_{i\_gru}^* + \omega_2 y_{i\_mlp}^* \tag{8}$$

$$\omega_1 + \omega_2 = 1 \tag{9}$$

where $y_i^*$ is the predicted value of the combined model, $y_{i\_gru}^*$ is the predicted value of the GRU model, $y_{i\_mlp}^*$ is the predicted value of the MLP model, and $\omega_1$ and $\omega_2$ are the weights of the GRU and MLP predictions, respectively.

Common methods for assigning weights in a combined model include the arithmetic average method and the inverse error method. The inverse error method assigns weights according to prediction error: the larger a single model's prediction error, the worse its prediction performance and the smaller its contribution to the combined model, so its predicted value is given a smaller weight. The inverse error method tends to yield a superior combined prediction [19], so this paper uses it to assign the weights and ensure the accuracy of the combined model, with the mean absolute error (MAE) as the error measure. The calculation formulas are as follows:

$$\omega_1 = \frac{MAE_{mlp}}{MAE_{gru} + MAE_{mlp}} \tag{10}$$

$$\omega_2 = \frac{MAE_{gru}}{MAE_{gru} + MAE_{mlp}} \tag{11}$$

where $MAE_{gru}$ is the mean absolute error of the GRU model and $MAE_{mlp}$ is the mean absolute error of the MLP model. The structure of the combined GRU-MLP model is shown in Fig. 1. The steps are as follows:

Step 1: Split the collected traffic flow data of the highway toll station, recorded every 15 min for seven days, setting the data of the first six days as the training set and the data of the seventh day as the test set;
Step 2: Normalize the training set and test set, which improves the convergence speed and accuracy of the model;
Step 3: Construct the feature set and label set, both as a data structure with 96 time steps and 1 output feature. Each row of the training set holds the preceding 96 fifteen-minute traffic flow values, i.e., one day's traffic flow data;
Step 4: Convert the constructed feature dataset into the tensor shape required by the neural network;
Step 5: Put the converted feature dataset into the GRU model for training, and output the predicted values and the mean absolute error after denormalization;
Step 6: Put the converted feature dataset into the MLP model for training, and output the predicted values and the mean absolute error after denormalization;
Step 7: Weight the two models using Eqs. (8)-(11) and generate the combined GRU-MLP model (see the sketch after this list);
Step 8: Based on the generated GRU-MLP combined model, calculate the final prediction values and evaluation indicators, and compare the performance with the single models.
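A compact sketch of Steps 5-8, assuming the two single-model predictions and their validation MAEs are already available; the inverse-error weights of Formulas (10)-(11) are then applied via Formula (8). The toy prediction arrays are hypothetical, while the two MAE values are those reported for the GRU and MLP models in Table 1 below.

```python
import numpy as np

def inverse_error_weights(mae_gru, mae_mlp):
    """Formulas (10)-(11): the worse model (larger MAE) gets the smaller weight."""
    total = mae_gru + mae_mlp
    return mae_mlp / total, mae_gru / total   # (w1 for GRU, w2 for MLP)

def combine(pred_gru, pred_mlp, w1, w2):
    """Formula (8): weighted combination of the two single-model predictions."""
    return w1 * pred_gru + w2 * pred_mlp

# MAE values from Table 1; the weights come out to about 0.4875 and 0.5125.
w1, w2 = inverse_error_weights(16.8927, 16.0700)
pred_gru = np.array([210., 185., 240.])   # toy 15-min flow predictions
pred_mlp = np.array([200., 190., 235.])
print(w1, w2, combine(pred_gru, pred_mlp, w1, w2))
```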

Fig. 1. Structure of the GRU-MLP combined model

4 Experiment

4.1 Data Presentation and Analysis

This paper takes the real traffic flow data of a toll station on the Fujian Provincial Expressway from May 7, 2021 to May 13, 2021 as the research object. The data are aggregated in 15-min units, yielding the toll station's traffic flow for 672 time periods over 7 days; the variation of traffic flow over time is shown in Fig. 2.

4.2 Evaluation Indicators

In order to accurately analyze the accuracy of the combined GRU-MLP model, this paper uses four error metrics to evaluate the combined model: mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), and the coefficient of determination R2 (R-squared), calculated as follows:

$$MAE = \frac{1}{n}\sum_{i=1}^{n} \left|y_i - y_i^*\right| \tag{12}$$

$$MAPE = \frac{1}{n}\sum_{i=1}^{n} \left|\frac{y_i - y_i^*}{y_i}\right| \times 100 \tag{13}$$


Fig. 2. Graph showing the variation of traffic flow over a seven-day period at a toll plaza.

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(y_i - y_i^*\right)^2} \tag{14}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - y_i^*\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2} \tag{15}$$

where n is the number of samples, $y_i$ is the true value, $y_i^*$ is the predicted value, and $\bar{y}$ is the average of the true values.

4.3 GRU-MLP Combined Model Predictions and Analysis of Results

In this paper, we first use the GRU model and the MLP model separately for short-term traffic flow prediction on the selected data and construct the combined model; the same training data are then put into the LSTM model and the SVM model for prediction, and the results are compared with those of the proposed GRU-MLP combined model. The traffic flow data from May 7, 2021 to May 12, 2021 are selected for model construction and training and used to predict the traffic flow on May 13. After the weights of the combined model are determined from Eqs. (10)-(11), the combined model is constructed using Eqs. (8)-(9); the determined weights and the results of the combined model are shown in Table 1.

From Table 1, it can be seen that in the combined model the weight of the GRU prediction is $\omega_1 = 0.4875$ and the weight of the MLP prediction is $\omega_2 = 0.5125$, so the predicted value of the combined GRU-MLP model is $y_i^* = 0.4875\, y_{i\_gru}^* + 0.5125\, y_{i\_mlp}^*$.

After training the SVM and LSTM models with the same data, the obtained results are compared and analyzed against the GRU model, the MLP model, and the proposed GRU-MLP combined model. The prediction results of the combined model are shown in Fig. 3, from which it is easy to see that the prediction curve of the combined model fits the true values well.


Table 1. Weights and Evaluation Indicators for Combined Models

                     Weighting Factors (ω)         MAE        MAPE       RMSE       R2
GRU Model            0.4875                        16.8927    20.4712    22.2311    0.9492
MLP Model            0.5125                        16.0700    22.2420    20.3874    0.9573
Combination Model    ω1 = 0.4875, ω2 = 0.5125      13.7797    14.1001    18.6326    0.9643

Fig. 3. Comparison curve between true and predicted values of the GRU-MLP combined model

The comparison of the models' prediction performance is shown in Table 2, which lists the values of the four evaluation indexes MAE, MAPE, RMSE, and R2 for all five models. From Table 2, it can be seen that compared to the MLP model, the MAE value of the GRU-MLP model is reduced by 14.25%, the MAPE value by 36.61%, and the RMSE value by 8.61%, while the R2 value is improved by 0.73%. In summary, after this comparison and verification of results, the combined GRU-MLP model proposed in this paper has better performance and stability than the single models, is best suited for predicting the short-term traffic flow of highway toll stations, and produces predictions in line with the actual traffic situation at the toll station.

Table 2. Comparison of evaluation indicators for the five models

           MAE        MAPE       RMSE       R2
SVM        19.2149    36.3107    23.4816    0.9433
LSTM       17.2198    19.1494    22.3029    0.9489
GRU        16.8927    20.4712    22.2311    0.9492
MLP        16.0700    22.2420    20.3874    0.9573
GRU-MLP    13.7797    14.1001    18.6326    0.9643

5 Conclusion

In this paper, the traffic flow data of a toll station on a highway in Fujian Province is taken as the object of study. Data from May 7, 2021 to May 13, 2021 are selected to analyze the trend and periodicity of daily traffic flow, and single SVM, LSTM, GRU, and MLP models as well as a combined GRU-MLP model are constructed for short-time traffic flow prediction at this toll station at 15-min intervals. The prediction results of the combined model conform better to the objective trend of toll station traffic flow than those of the four single models, realizing short-time traffic flow prediction at 15-min intervals and providing convenience and decision-making support for highway management and control.

References

1. Jianxiong, C., Yanjun, X.: Analysis and forecast of traffic flow at entrance and exit of highway toll station. J. Shanghai Inst. Ship Transp. Sci. 46(01), 42–48 (2023)
2. Hou, Z., Li, X.: Repeatability and similarity of freeway traffic flow and long-term prediction under big data. IEEE Trans. Intell. Transp. Syst. 17(6), 1786–1796 (2016). https://doi.org/10.1109/TITS.2015.2511156
3. Sun, H.: Prediction of building energy consumption based on BP neural network. Wirel. Commun. Mobile Comput. 2022, 7876013 (2022). https://doi.org/10.1155/2022/7876013
4. Huijuan, Z., Qi, L., Zeyao, C., et al.: Short-term prediction model of photovoltaic system output power based on GWO-MLP. Electr. Measur. Instrum. 59(07), 72–77+113 (2022). https://doi.org/10.19753/j.issn1001-1390.2022.07.010
5. Gang, C., Jingfan, P., Hailong, M., et al.: Boiler reheat steam temperature prediction based on multilayer perceptron neural network. Hunan Electr. Power 42(01), 71–75 (2022)
6. Yingmei, X., Zhujun, W., Jianping, W., et al.: Research on human body size prediction based on multilayer perceptron neural network. J. Wuhan Text. Univ. 32(04), 37–42 (2019)
7. Weilun, Y., Yuxuan, G., Lei, C.: Linear regression method combined with MLP to predict the comprehensive water quality index of Lijiahe Reservoir. Shaanxi Water Resources 2023(06), 19–21+25 (2023). https://doi.org/10.16747/j.cnki.cn61-1109/tv.2023.06.061
8. Zhang, Z.: Prediction of economic operation index based on support vector machine. Mobile Inf. Syst. 2022, 3232271 (2022). https://doi.org/10.1155/2022/3232271
9. Owusu-Ansah, D., Tinoco, J., Lohrasb, F., Martins, F., Matos, J.: A decision tree for rockburst conditions prediction. Appl. Sci. 13, 6655 (2023). https://doi.org/10.3390/app13116655
10. Yang, Y.: Life prediction of lithium battery based on combined ARIMA and BP neural network model. Hainan Univ. (2020). https://doi.org/10.27073/d.cnki.ghadu.2020.000073
11. Peng, J., Mujun, L., Zihan, C., et al.: Charging station traffic flow prediction based on Gray theory. Electrotechnology 20, 32–34 (2021). https://doi.org/10.19768/j.cnki.dgjs.2021.20.011
12. Xiaoxia, Z., Nano, G.: Research on short-time traffic flow prediction of Shanghai-Chongqing expressway based on LSTM. China Transp. Inf. 09, 133–137 (2022). https://doi.org/10.13439/j.cnki.itsc.2022.09.012
13. Yaofang, Z., Jian, C.: Short-time prediction model of highway traffic flow by vehicle type based on GBDT algorithm. Highway 67(01), 221–227 (2022)
14. Lei, L.: Research on short-time traffic flow prediction method based on KNN-LSTM. Modern Inf. Technol. 6(10), 169–173 (2022). https://doi.org/10.19850/j.cnki.2096-4706.2022.10.043
15. Yue, S., Xiaoyu, S., Liting, J., et al.: Railroad passenger flow prediction based on ARMA-LSTM combined model. Comput. Appl. Softw. 38(12), 262–267+273 (2021)
16. Yongle, L., Yuanli, G.: Prediction of spatio-temporal characteristics of highway traffic flow based on CNN-BiLSTM. Transp. Sci. Econ. 24(01), 9–18 (2022). https://doi.org/10.19348/j.cnki.issn1008-5696.2022.01.002
17. Yanguang, C., Bing, L., Cai, R., et al.: Short-time traffic flow prediction on highways under heavy rainfall. Comput. Eng. 46(06), 34–39 (2020). https://doi.org/10.19678/j.issn.1000-3428.0055520
18. Tiying, W., Pengchao, S., Jiangqiong, L., et al.: Research on traffic flow prediction method based on threshold recurrent unit recurrent neural network. J. Chongqing Jiaotong Univ. (Natural Science Edition) 37(11), 76–82 (2018)
19. Xianzhun, P.: Research on iron ore price forecasting based on ARIMA and BP neural network combination model. Dalian Univ. Technol. (2020). https://doi.org/10.26991/d.cnki.gdllu.2020.001509

Real-Time Carbon Emission Monitoring and Prediction Method of Expressway Based on LSTM

Xinrui Zhao1, Fumin Zou1, Feng Guo2(B), and Sirui Jin1

1 Fujian University of Technology, Fuzhou 350118, China
[email protected]
2 Fujian Provincial Key Laboratory of Automotive Electronics and Electric Drive, Fujian University of Technology, Fuzhou 350118, China
[email protected]

Abstract. Under the background of the "double carbon" policy, maintaining ecological balance and reducing carbon emissions is imperative. Traffic is the main source of carbon emissions in China, with road traffic accounting for a high proportion, so reducing road traffic carbon emissions is an important way to achieve emission reduction. Improving the unified statistical monitoring system for traffic carbon emissions and quantifying the level of traffic carbon emissions are the basis for achieving traffic carbon emission reduction. To this end, this study proposes a method for measuring the carbon emissions of highway vehicles based on fused data. Firstly, the basic highway vehicle data are cleaned. Secondly, a highway carbon emission calculation model based on an RNN-LSTM network is established and the relevant calculation process is designed. Finally, taking a gantry of a highway in Fujian Province as an example, the vehicle carbon emissions from May 6, 2021 to May 30, 2021 are calculated.

Keywords: Deep learning · RNN-LSTM · Carbon emission

1 Introduction

In recent years, China's car ownership has continued to rise and highway traffic has steadily increased. With the rapid development of China's expressway electronic toll collection (ETC) technology, the infrastructure has become increasingly complete, and huge volumes of ETC charging data have accumulated, providing strong data support for the information construction of intelligent expressways. Meanwhile, with accelerating industrialization and urbanization and the continuous upgrading of the consumption structure, resource and environmental problems remain one of the bottlenecks restricting China's economic and social development, and energy conservation and emission reduction are imperative. Traffic carbon emissions are the main source of carbon emissions in China, and road traffic accounts for a high proportion: existing studies have shown that road transportation accounts for 77.8%


of traffic carbon emissions [1]. Improving the unified statistical monitoring system for traffic carbon emissions and quantifying the level of traffic carbon emissions are the basis for achieving the carbon peak and reducing traffic carbon emissions. Vehicle carbon emissions mainly refer to the emission of carbon dioxide. There are four main methods for calculating carbon emissions: the "total structure method" [2], the full life cycle method [3, 4], and the IPCC (Intergovernmental Panel on Climate Change) top-down and bottom-up methods [5]. Based on the expressway data of Guangdong Province, Li Yuanjun et al. [6] established a full-sample, high-precision carbon emission measurement model for different vehicle types and analyzed the spatial characteristics of expressway carbon emissions. Lin Xukun et al. [7] proposed a vehicle carbon emission measurement method based on multi-source data fusion and measured vehicle carbon emissions on an example highway network. In summary, the current deficiencies are as follows: first, most studies use the top-down total estimation method, whose accuracy is low; second, the influencing factors considered in the calculation of carbon emissions are few and not comprehensive enough; third, high-precision studies usually cover only local road sections. The purpose of this study is to construct and verify a highway carbon emission measurement model based on ETC driving data using integrated traffic data [8, 9].

2 Carbon Emission Prediction Model Based on RNN-LSTM

2.1 RNN-LSTM Neural Network Model

The recurrent neural network is a kind of neural network whose biggest difference from other types is that it establishes its own memory mechanism and adds cyclic transmission of time-related state information among the neurons of each layer. Therefore, the data set input to a recurrent neural network should contain not only the current feature values but also the state of the previous moment or of a few moments earlier. For data sets with a large time span, LSTM (Long Short-Term Memory) can overcome the difficulty of capturing long-term temporal correlation. An LSTM neuron is composed of a forget gate, an input gate, an output gate, and a memory cell that records additional information. The memory cell is responsible for remembering time-related information, and the three gates regulate the information flowing in and out of the neuron [10]. In this process, each memory unit obtains a continuous gradient flow and can learn sequences of hundreds of time steps while the error signal is preserved, thereby solving the vanishing-gradient problem [11]. The structure of LSTM networks is therefore well suited to processing time series.

In this paper, a sequence model based on the recurrent neural network (RNN) and long short-term memory (LSTM) is proposed to predict the traffic flow at a highway gantry. The proposed model consists of three LSTM layers with Dropout regularization and a Dense layer with a single output value for prediction. First, an LSTM layer with 64 memory units is added, followed by a Dropout layer that randomly discards the output of some neurons with a specified probability during training, thereby preventing overfitting. Second, an LSTM and Dropout layer with the same configuration is added to increase the complexity and expressive ability of the model. A third LSTM layer and Dropout layer are then added, with this LSTM layer returning only the output of the last time step. Finally, a Dense layer is added: a fully connected layer with a single neuron whose output serves as the predicted value for the regression problem.
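A minimal Keras sketch of the architecture just described: three stacked 64-unit LSTM layers, each followed by Dropout, and a single-neuron Dense output. The dropout rate and input window are assumptions; only the 64-unit size and the layer layout come from the text.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

WINDOW, FEATURES = 48, 1  # assumed: 48 thirty-min slots = 1 day of history

model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(WINDOW, FEATURES)),
    Dropout(0.2),                       # dropout probability is an assumption
    LSTM(64, return_sequences=True),
    Dropout(0.2),
    LSTM(64),                           # last LSTM returns only the final step
    Dropout(0.2),
    Dense(1),                           # single-value regression output
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```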


2.2 Carbon Dioxide Emissions Accounting Method

International Mainstream Methods

Globally, common transportation carbon emission accounting methods mainly include [12]:

1. The 'IPCC Guidelines for National Greenhouse Gas Inventories' (Revised 1996) and 'IPCC Guidelines for National Greenhouse Gas Inventories' (Revised 2006). The road traffic accounting method in the 'IPCC Guidelines' (2006 edition) is as follows:

(1) Optimal option:

$$\sum_a \left(Fuel_a \times EF_{a\_CS}\right) \tag{1}$$

(2) Sub-optimal option:

$$\sum_a \left(Fuel_a \times EF_{a\_Default}\right) \tag{2}$$

In the formula, Fuel is the fuel consumption, a is the fuel type, EF is the carbon dioxide emission factor, and CS denotes a localized emission factor (rather than the default factor).

2. The 'International Standard for Urban Greenhouse Gas Accounting' (GPC): the GPC is a methodological framework for urban greenhouse gas emission accounting based on the IPCC Guidelines. It is mainly used at the local level and is applied more widely than the 'IPCC Guidelines'. It is used not only to calculate the history and current state of carbon emissions in the transportation field, but also to predict future trends and identify the main emission sources.

Domestic Common Methods

1. 'Top-down' method

At the national level, China strictly follows the 'top-down' method of the 'IPCC Guidelines' (1996 edition) and 'IPCC Guidelines' (2006 edition), calculating national traffic carbon emissions from the energy consumption statistics of the national energy balance sheet.


2. 'Bottom-up' method

Although the 'bottom-up' method is not the national transportation greenhouse gas accounting method recommended by the IPCC, many provinces and cities in China combine the 'top-down' and 'bottom-up' methods when calculating transportation carbon emissions, in order to identify the main emission sources, predict future emission trends, and support the preparation of low-carbon action plans [13].

2.3 Vehicle Carbon Emission Calculation Model

Theory of Calculation. By analyzing and integrating transportation data, the relevant variables of the model are determined and the model framework is obtained: the theoretical energy consumption and average carbon emission coefficient of each vehicle type under different fuel classifications are derived, and the total energy consumption and total carbon emissions of each vehicle type are calculated by combining indicators such as highway vehicle mileage, vehicle speed, and vehicle type [14, 15].

Calculation Process

(1) Run times calculation

The flow through the expressway gantry during the statistical period is counted to obtain the number of runs of each vehicle type on the expressway. The number of vehicle runs in the expressway network is recorded as c:

$$c = \sum_{i=1}^{n} c_i \tag{3}$$

In the formula, i is the vehicle type, divided into 18 categories (n = 18, as shown in Table 1) according to the expressway vehicle classification standard used in this paper, and $c_i$ is the number of runs of type i vehicles.

Table 1. Classification standard of highway vehicle carbon emission calculation model.

Field Name   Vehicle Class   Standard of Classification
VehClass     1 ~ 6           Passenger car I ~ VI
VehClass     11 ~ 16         Lorry I ~ VI
VehClass     21 ~ 26         Special operation vehicle I ~ VI

(2) Average speed calculation

The average speed of vehicles between the highway gantries during the statistical period is calculated. The actual distance $s_i$ between the two gantries


and the travel time t are obtained; the records are then classified by vehicle type and averaged to obtain the average travel speed of each vehicle type:

$$v_i = \frac{\sum_{k=1}^{c_i} s_i / t}{c_i} \tag{4}$$

In the formula, $v_i$ is the average driving speed of type i vehicles, $s_i$ is the actual distance between the two gantries passed by the kth vehicle of type i, t is the time taken, and $c_i$ is the total number of type i vehicles.

(3) Establishment of the vehicle fuel consumption-average speed model

According to the distribution of the test data and based on generalized linear regression theory, a linear regression analysis of the test results determines the relationship between fuel consumption and the average speed of the test vehicles. The model is as follows:

$$FC_i = -10.011 \ln(v_i) + 0.133\, v_i + 37.955 \quad (R^2 = 0.874) \tag{5}$$

In the formula, $FC_i$ is the fuel consumption per hundred kilometers (kg) of type i vehicles, and $v_i$ is their average driving speed (km/h).

(4) Vehicle total energy consumption calculation

$$F = p \times \frac{v \times \left(-10.011 \ln(v) + 0.133\, v + 37.955\right)}{100} \times L \tag{6}$$

In the formula: F is the total fuel consumption of the driving vehicles, p is the conversion coefficient from fuel consumption to carbon dioxide emissions (2.254 for gasoline), v is the hourly traffic volume, and L is the mileage.

(5) Calculation of carbon emissions

$$CEF = NCV \times PF \times COF \times K \times 10^{-6} \tag{7}$$

$$CEFL_i = CEF \times \rho_i \tag{8}$$

$$E_{ir} = \frac{FC_i \times CEFL_i}{100} \times D_r \times c_{ir} \tag{9}$$

In the formula: CEF is the carbon dioxide emission coefficient (kg CO2/kg); NCV is the average low calorific value of the fuel (kJ/kg); PF is the potential carbon emission factor, expressed as the mass of carbon per unit calorific value of fuel (t-C/TJ); COF is the carbon oxidation rate of the fuel (%); K is the carbon conversion factor, i.e., the ratio of the relative molecular masses of carbon dioxide and carbon, 44/12 ≈ 3.67; the result is multiplied by $10^{-6}$ for unit conversion. $CEFL_i$ is the carbon emission coefficient of type i fuel (kg CO2/kg); $\rho_i$ is the density of type i fuel; $E_{ir}$ is the carbon emission (kg) of type i vehicles driving on road section r; $FC_i$ is the fuel consumption per 100 km (L/100 km) of type i vehicles; $D_r$ is the length of road section r (km); $c_{ir}$ is the number of type i vehicles on section r. According to official data (Table 2) [16, 17], the carbon emission coefficient of diesel is 3.10 kg CO2/kg, and that of gasoline is 2.93 kg CO2/kg.
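A sketch chaining Formulas (5) and (9) for one vehicle type on one road section; the gasoline emission coefficient 2.93 kg CO2/kg is the official value quoted above, while the section length, average speed, and vehicle count are toy inputs.

```python
import math

def fuel_per_100km(v):
    """Formula (5): fuel consumption (kg/100 km) as a function of average speed."""
    return -10.011 * math.log(v) + 0.133 * v + 37.955

def section_emissions(v_avg, d_km, n_vehicles, cefl=2.93):
    """Formula (9): CO2 (kg) for one vehicle type on one section.
    cefl = 2.93 kg CO2/kg is the gasoline coefficient quoted in the text."""
    return fuel_per_100km(v_avg) * cefl / 100.0 * d_km * n_vehicles

# Toy example: 500 class-1 cars over a 12 km section at 95 km/h average speed.
print(section_emissions(v_avg=95.0, d_km=12.0, n_vehicles=500))

# Fuel use at 20 km/h vs. 80 km/h, the boundary speeds discussed in Sect. 3.4.
print(fuel_per_100km(20.0) / fuel_per_100km(80.0))
```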


Table 2. Variable values in the formula of carbon dioxide emissions factor.

Fuel type               Density /(kg·L−1)   Average low calorific value /(kJ·kg−1)   Potential carbon emission coefficient /(t-C·TJ−1)   Carbon oxidation rate /%
diesel oil (5–0–10)     0.81~0.85           42 705                                   20.2                                                 98
diesel oil (20–35–50)   0.79~0.84           42 705                                   20.2                                                 98
Gasoline (89–92–95)     0.72~0.78           43 124                                   18.9                                                 99

3 Simulation Experiments

3.1 Data Description

This experiment relies on real transaction data collected by the ETC gantry system of the Fujian Expressway to evaluate the performance of the proposed model. Gantry 34025F, one of the gantries with the highest traffic flow in Fujian Province, is selected as the target gantry. The gantry data are sampled in real time in chronological order. The sample covers 25 days of transaction data from May 6 to May 30, 2021, about 810,000 records. In the experiment, the first 80% of the data is used as the training set and the last 20% as the test set.

3.2 Introduction of Comparative Experiment

In order to verify the effectiveness of the proposed model, this paper uses several representative control methods for comparison: a feedforward neural network model (MLP), a time-dependent recurrent neural network model (RNN-GRU), and a convolutional neural network model (CNN) that considers spatio-temporal features. Specifically, MLP stands for multilayer perceptron, the most basic and common artificial neural network model; its information propagates unidirectionally through the network without forming loops. Compared with long short-term memory (LSTM), RNN-GRU has a similar gating mechanism but fewer parameters, which may be more suitable for smaller data sets or resource-constrained environments in some cases. The convolutional neural network (CNN) is a special type of artificial neural network widely used in image processing and computer vision tasks.

3.3 Evaluation Index and Parameter Setting

This paper uses the following three evaluation indicators to evaluate the accuracy of the prediction results of each model:


(1) Mean absolute error (MAE):

$$MAE = \frac{1}{n}\sum_{i=1}^{n} \left|y_i - \hat{y}_i\right| \tag{10}$$

(2) Coefficient of determination (R2):

$$R^2 = 1 - \frac{SSR}{SST} \tag{11}$$

(3) Root mean square error (RMSE):

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2} \tag{12}$$

where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, and n is the sample size. SSR (Sum of Squared Residuals) is the sum of the squared deviations between the model's predicted values and the actual observations; SST (Total Sum of Squares) is the sum of the squared deviations between the actual observations and their mean. MAE is the mean of the absolute error between the predicted and observed values. The coefficient of determination R2 measures how well the predicted values explain the actual values: the closer it is to 1, the better the model explains the variability of the actual data and the stronger its predictive ability. RMSE is the sample standard deviation of the residuals between the predicted and observed values, indicating the dispersion of the sample.

3.4 Experimental Results and Analysis

RNN-LSTM Model Validation. The vehicles passing through gantry 34025F are classified by vehicle type and visualized. As Fig. 1 shows, vehicle type 1, i.e., class-1 passenger cars, is absolutely dominant in number and varies periodically and regularly between day and night; the counts of the other vehicle types differ little from one another and also vary periodically between day and night. The number distribution of each vehicle type can be seen clearly from Fig. 1. The vehicles of type 1 were then filtered and aggregated in 30-min windows: the number of vehicles passing through gantry 34025F and the speed between gantry 34025F and gantry 340261 were counted. The speed distribution is shown in Fig. 2. Next, the model is trained and evaluated. The data from May 6 to May 25, 2021 are used as the training set, and the transaction data from May 26 to May 30, 2021 as the test set. After prediction, inverse_transform is applied to denormalize the predicted values; otherwise the predictions would remain between 0 and 1. The prediction results are shown in Fig. 3.
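A brief sketch of the normalization round-trip described above, assuming scikit-learn's MinMaxScaler; the toy counts and predicted values are hypothetical. Without inverse_transform, the network outputs would remain in [0, 1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

flows = np.array([120., 340., 80., 510., 260.]).reshape(-1, 1)  # toy 30-min counts

scaler = MinMaxScaler()
scaled = scaler.fit_transform(flows)          # values mapped into [0, 1] for training

pred_scaled = np.array([[0.42], [0.77]])      # model outputs, still in [0, 1]
pred = scaler.inverse_transform(pred_scaled)  # back to vehicle counts
print(pred)
```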


The RNN-LSTM neural network model is trained, yielding the loss-function and accuracy-function curves shown in Fig. 4. During training, the training loss and test loss both decrease and finally stabilize close to each other, indicating that the training process is effective and that the training and test sets are well fitted. After calculation, the total carbon emissions generated by type 1 vehicles passing between the two gantries amount to 33,244 kg. The calculation also shows that in extremely congested conditions, i.e., when the average vehicle speed is below 20 km/h, fuel consumption is more than 2.5 times that of free-flowing conditions (average speed above 80 km/h).

Fig. 1. Quantity change curve divided by vehicle type every 4 h.

Fig. 2. Speed distribution of Type 1 vehicle.

76

X. Zhao et al.

Fig. 3. Change curve of trading volume forecast value of Type 1 vehicle.

Fig. 4. The graph of loss function and precision function in the training process.

Overall Performance Comparison Table 3 shows the prediction results of each model for the traffic flow of the 34025F gantry. Each control experiment is tested on the relevant gantry traffic data set. For all the control experiments based on recurrent neural network, 10 independent repeated experiments were carried out to test the stability of the prediction results, and the average value of the prediction results of each group was counted. It can be seen that the error of the proposed model is lower than that of other control models. Among them, CNN has the worst effect. This model lacks a memory mechanism, and the prediction results for time series with poor stability are often not accurate enough. The effect of RNN-GRU is relatively poor; compared with Long Short-Term Memory (LSTM), GRU has a similar gating mechanism, but the number of parameters is smaller, and in some cases it may be more suitable for smaller data sets or resourceconstrained environments. MLP has no internal memory mechanism, cannot capture the

Real-Time Carbon Emission Monitoring and Prediction Method

77

Table 3. The experimental results of the proposed method and the comparison method. Model name

Flow prediction based on related gantry MAE

R2

RMSE

MLP

42.16

0.957

54.26

RNN-GRU

43.51

0.970

53.36

CNN

52.67

0.949

69.25

RNN-LSTM

38.64

0.974

49.89

time dependence and long-term dependence in sequence data, and is less sensitive to the order and time information in the input data. Based on the previous theoretical analysis, this section introduces the construction process of the proposed model in detail, and tests the prediction effect of the proposed model through real cases, which enhances the interpretability of the model.

4 Conclusion Model construction and quantitative monitoring have always been a hot field in the study of traffic carbon emissions. Under the background of China’s clear goal of “double carbon,” it is urgent to innovate the measurement method of traffic carbon emissions, and to accurately locate and scientifically evaluate the mobile carbon emission sources. This study improves the measurement method of highway carbon emission, and provides ideas for constructing and expanding the problem of traffic carbon emission model. Limited by objective conditions such as data, there are still some deficiencies in time scale research, energy consumption analysis of motor vehicles (especially trucks), traffic congestion and the impact of terrain on energy consumption. In the future, it will continue to deepen in order to make new progress in the field of spatial and temporal evolution of carbon emissions.

References 1. Pan, Y., Cheng, S.: Research on prediction and simulation technology of running speed and engine fuel consumption. Road Traffic Sci. Technol. (5), 96–99 (2004) 2. Guo, S.: Research on carbon emission assessment system and evaluation method of urban transportation system. Hefei University of Technology, Hefei (2014) 3. Zhang, X., Yang, X., Yan, Y.: Research on the statistical calculation method of urban traffic energy consumption and carbon emissions. China Soft Science 6, 142–150 (2014) 4. Zhang, X.,Yang, X., Yan, Y.: Research on energy consumption and carbon emission of urban transportation. China Soft Sci. (6), 142–150 (2014) 5. Xu, Z., Zou, Z., Cao, B.: Estimation of carbon emissions from urban passenger transport and low-carbon ways-Taking Tianjin as an example. J. Beijing Univ. Technol. 39(7), 1007–1013, 1020 (2013)

78

X. Zhao et al.

6. Li, Y., Wu, Q., Wang, C., Wu, K., Zhang, H., Jin, S.: Carbon emission measurement model and spatial pattern of expressways in Guangdong Province based on traffic big data. Tropical Geography 42(06), 952–964 (2022). https://doi.org/10.13284/j.cnki.rddl.003491 7. Lin, X., Zhang, Y., Luo, Z.: Research on the calculation method of vehicle carbon emissions in expressway network. J. South China Univ. Technol. (Nat. Sci. Ed.) (9), 22–28 (2022) 8. Xu, Z., Zo, Z., Cao, B.: Carbon emission level estimation and low-carbon approach of urban passenger transport: a case study of Tianjin. J. Beijing Univ. Technol. 39(7), 1007–1013, 1020 (2013) 9. Intergovernmental Panel on Climate Change. 2006 IPCC guidelines for national greenhouse gas Inventories. Institute for Global Environmental Strategies (IGES), Japan (2006) 10. Wen, X., Li, W.: Time series prediction based on LSTM-attention-LSTM model, digital object identifier. https://doi.org/10.1109/ACCESS.2023.3276628 11. Deng, J., Lu, L., Qiu, S.: Software defect prediction via LSTM. IET Softw. 14(4), 443–450 (2020) 12. Xue, L., Liu, D.: Precise policy: research on carbon dioxide emission accounting methods in China ‘s provincial transportation sector. World Resources Institute 13. Xu, J., Dong, Y., Yan, M.: A model for estimating passenger-car carbon emissions that accounts for uphill. Downhill Flat Roads 12, 1–21 (2020) 14. Nocera, S., Ruiz-Alarcón-Quintero, C., Cavallar, F.: Assessing carbon emissions from road transport through traffic flow estimators. Transp. Res. Part C 95, 125–148 (2018) 15. Rui, S., Yi, Z., Zuo-Jun, M.S., et al.: Analyzing the effects of road type and rainy weather on fuel consumption and emissions: a mesoscopic model based on big traffic data. Digital Object Identifierhttps://doi.org/10.1109/ACCESS.2021.3074303 16. General Principles for Calculation of Comprehensive Energy Consumption (State Administration for Market Regulation, etc. (2020) 17. National Standard for Automotive Gasoline/Fuel (General Administration of Quality Supervision, Inspection and Quarantine, etc. (2016a/2016b)

PSR-GAN: Unsupervised Portrait Shadow Removal Using Evolutionary Computing Tianlong Ma, Longfei Zhang(B) , Xiaokun Zhao, and Zixian Liu Beijing Institute of Technology, Beijing 100081, China [email protected]

Abstract. Because of unwanted occlusion and bad lighting conditions, portrait photographs often suffer from shadows, the presence of which in an image can both decrease its aesthetic quality and increase the difficulty of performing high-level vision tasks. Since most of the shadow removal methods do not specifically remove portrait shadows and hardly delve into the face characteristics, these methods cannot achieve perfect results when removing portrait shadows. Inspired by evolutionary computing, in this paper, we propose a novel unsupervised portrait shadow removal framework, PSR-GAN. To make good use of face characteristics, we introduce a face extraction module, namely FEM, in which we utilize a network to obtain the portrait matte, thereby allowing the network to focus more on the face regions and ignore the interference of background redundant information. Experiments on our collected dataset show that our method is able to effectively remove portrait shadows, and outperforms other existing shadow removal methods. Keywords: portrait shadow removal · evolutionary computing · unsupervised learning · portrait matte · portrait shadow dataset

1

Introduction

Due to environmental conditions, casually-taken portrait photographs are easily affected by shadows. On one hand, the presence of shadows in an image can decrease its aesthetic quality. On the other hand, this phenomena greatly increase the difficulty of performing high-level vision tasks. Unfortunately, removing shadows from a single portrait image is a very challenging task, for shadows have a wide variety of shapes in different scenes. Moreover, it is necessary to restore shadow regions without changing colors on non-shadow regions. With the idea of evolutionary computing, we adopt an iterative optimization approach to solve this problem. To make the most of face information and reduce the interference of background redundant information, we introduce a face extraction module in our PSR-GAN. In this module, we use a network to obtain a portrait matte, which indicates the pixels that belong to face regions. Then, c The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024  J.-S. Pan et al. (Eds.): ICGEC 2023, LNEE 1114, pp. 79–86, 2024. https://doi.org/10.1007/978-981-99-9412-0_9

80

T. Ma et al.

by this portrait matte, we can obtain the portrait image without background information. Specifically, we utilize the portrait matting network, ModNet [4]. In this way, we can focus more on face regions. Also, we demonstrate the effectiveness of our model on various experiments. In summary, our contributions are as follows: 1. We propose a portrait shadow removal generative adversarial network, namely PSR-GAN, which is an unsupervised network guided by face information to focus on face regions. 2. We put forward a face extraction module, FEM, being able to effectively eliminate the background and extract face regions. 3. Experiments demonstrate that our method achieve state-of-the-art results on our collected dataset.

2

Related Work

Early shadow removal methods are based on physics models. Some of these methods model images as combinations of shadow and shadow-free layer. Fredembach et al. [1] propose a shadow removal method based on two observations. Later, statistical-learning methods are explored to utilize features [6], including intensity, color, texture, and gradient, to remove shadows. Since these features lack high-level semantics, it is extremely difficult for this kind of methods to understand shadows. In order to address the aforementioned limitations of physics-based methods and statistical-learning methods, many deep learning methods are proposed. Qu et al. [5] propose an automatic and end-to-end deep neural network. It is designed with a multi-context architecture and embeds information from three different perspectives. Hu et al. [2] formulate a new deep framework that automatically learns to remove shadows and generate shadow masks using unsupervised learning methods. Zhang et al. [10] employ a GridNet architecture to remove portrait shadows. However, none of these methods specifically design a separate module to process face regions. Our PSR-GAN is able to concentrate on the face regions and make the most of face characteristics.

3

Data Synthesis

Since there is no publicly available dataset for portrait shadow removal, we need to collect data by ourselves. Specifically, we utilize the data synthesis method [10]. In this section, We first propose the shadow model. Then, we give a brief description of the synthesis of shadow images. Lastly, the basics of the dataset are introduced. 3.1

Shadow Model

We model shadow images as a linear blend between a image I and a image Is , in terms of a shadow mask M : I = I ◦ (1 − M ) + Is ◦ M

(1)

PSR-GAN

81

In the image I , faces are lit by all light sources in the scene, such as the lamp, the sky, and the sun. In the image Is , faces are lit by everything other than the key light source. As to the shadow mask M , it indicates which pixels are shadowed. If the pixel value in the shadow mask M is 1, it means the corresponding pixel in the shadow image I is shadowed. While if the pixel value is 0, it implies the corresponding pixel is lit. Specifically, I , which does not contain any shadows, is selected from Flicker-Faces-HQ Dataset [3]. Is can be obtained by performing a color transform from I . And M comes from a Perlin noise function. 3.2

Shadow Image Synthesis

We utilize the pre-trained ModNet [4] to get the images that contain only the face regions. Then, we perform the data synthesis operations [10], according to the characteristics of portrait shadows. 3.3

Dataset Composition

To construct our dataset, we carefully select 100 portrait images from FlickerFaces-HQ Dataset. These images must not contain any shadows and the main body of the image should be the human face. And we generate 10 images from each portrait image through data augmentation. Our portrait shadow dataset contains 800 pairs of training data and 200 pairs of testing data, covering various ages, races, and genders. Because of its comprehensive data distribution, our dataset can make the model more generalizable.

4 4.1

Methodology Evolutionary Computing

Generative adversarial networks involve an adversarial competition between the generator and the discriminator. The generator’s objective is to generate samples that are as realistic as possible, while the discriminator’s goal is to correctly distinguish between real and generated samples. This adversarial competition resembles the competitive nature among individuals in evolutionary computation, where superior individuals have a higher chance of being selected for subsequent reproduction and evolution. 4.2

Network Architecture

In order to focus more on face regions and make the most of the face information, we introduce a face extraction module, FEM, by which portrait shadow removal can be effectively performed. In addition, our PSR-GAN adopts an improved generative adversarial network structure based on Mask-ShadowGAN [2].

82

T. Ma et al.

Fig. 1. Background regions are incorrectly changed, and some shadows are ignored.

Face Extraction Module. The face extraction module, FEM, is the key part of our method, for it enables our model to make good use of face characteristics and decreases the interference of background redundant information, which makes our method different from others. If not extracting the face regions, many problems could arise. See Fig. 1, the blue boxes show that the background part of the result [2] is severely damaged, which greatly reduces the aesthetic quality. Moreover, cases could happen that the portrait shadow is partly ignored, as displayed in the green box. Overall Structure. Our PSR-GAN consists of two generative adversarial networks. Gf takes the shadow image Is as input and then outputs a shadow-free image I˜f . Gs generates the shadow image I˜s from the corresponding shadow-free image If . Df is used to distinguish between the real shadow-free image If and the generated shadow-free image I˜f and Ds aims to discriminate between the real shadow image Is and the generated shadow image I˜s (Fig. 2).

Fig. 2. The schematic illustration of our PSR-GAN.

PSR-GAN

4.3

83

Loss Function

Our loss function consists of three parts, namely cycle-consistency loss, adversarial loss, and identity loss. Cycle-Consistency Loss. Traditional cycle-consistency loss cannot deal with this many-to-one mapping problem [11]. Specifically, a shadow-free image can be transformed into an infinite number of different shadow images according to the distribution of shadow positions, while multiple shadow images might only correspond to the same shadow-free image. Hence, we add a shadow mask to indicate the shadow area when generating a shadow image. Starting from a shadow image Is , the corresponding cycle-consistency loss can be expressed as follows:   (2) Lacycle (Gf , Gs ) = EIs ∼pdata (Is ) Gs (Gf (Is ) , M ) − Is 1 Here, we adopt the L1 loss ·1 to calculate the pixel-by-pixel difference between the input image and the output image. It is notable that the shadow mask M is obtained by calculating the difference between the real shadow image Is and the generated shadow-free image I˜f and binarizing the result. Starting from a shadow-free image If , the corresponding cycle-consistency loss is similarly expressed as follows:   (3) Lbcycle (Gs , Gf ) = EIf ∼pdata (If ) Gf (Gs (If , Mr )) − If 1 Different from M , the shadow mask Mr are selected from shadow masks learned from the real shadow image. Adversarial Loss. As to the generated shadow-free image I˜f and the real shadow-free image If , we use FEM MF EM to remove their background in the first place. Then, we add them to the discriminator Df . Here, we optimize the following objective function: LaGAN (Gf , Df ) = EIf ∼pdata (If ) [log (Df (MF EM (If )))] + EIs ∼pdata (Is ) [log (1 − Df (MF EM (Gf (Is )))]

(4)

Similarly, with regard to the generator Gs and the discriminator Ds , the corresponding objective function is expressed as follows: LbGAN (Gs , Ds ) = EIs ∼pdata (Is ) [log (Ds (MF EM (Is )))] + EIf ∼pdata (If ) [log (1 − Ds (MF EM (Gs (If , Mr )))]

(5)

Identity Loss. We take the shadow image Is and the shadow mask Mn which does not indicate any shadow regions as the input of the generator Gs . Here, we encourage that no shadows will be added on the input shadow image under the

84

T. Ma et al.

guidance of Mn . Moreover, we can preserve the color composition between the input and output images. The objective function is shown below: Laidentity (Gs ) = EIs ∼pdata (Is ) [Gs (Is , Mn ) − Is 1 ]

(6)

In addition, we take the shadow-free image If as the input of the generator Gf . For the fact that no shadows exist in the image, we expect the output image to be the same as the input one. This is the objective function:   (7) Lbidentity (Gf ) = EIf ∼pdata (If ) Gf (If ) − If 1 Loss Function. With the assistance of cycle-consistency loss, the outline and details of the image can be effectively maintained. Adversarial loss can greatly help improve the quality of generated images and make the generated image more realistic. What’s more, in order to ensure the effectiveness of the shadow mask and maintain the color composition of the input image, we introduce the identity loss. The final loss function is the weighted sum of the three loss functions. Lf inal (Gs , Gf , Ds , Df )   = ω1 LaGAN (Gf , Df ) + LbGAN (Gs , Ds )   + ω2 Lacycle (Gf , Gs ) + Lbcycle (Gs , Gf )   + ω3 Laidentity (Gs ) + Lbidentity (Gf )

(8)

After extensive experiments, we set ω1 , ω2 , and ω3 as 1, 10, and 5, respectively.

5 5.1

Experiments Experimental Settings

Comparing Methods and Evaluation Metrics. We evaluate the shadow removal performance by computing the peak signal-to-noise ratio(PSNR), structure similarity index measure(SSIM) [8], and learned perceptual image patch similarity(LPIPS) [9] between the ground truth and predicted shadow-free images. For PSNR and SSIM, larger value indicates better results. However, smaller value means better results for LPIPS. We compare our PSR-GAN with the following methods: unsupervised method CycleGAN [11] and MaskShadowGAN [2], supervised method GridNet [10] and ST-CGAN [7]. 5.2

Experimental Results

Quantitative Results. We conduct quantitative evaluations on our proposed portrait shadow dataset, and the corresponding results are shown in Table 1. Specifically, Table 1 shows the experimental results in the entire regions, the face regions, and the shadow regions. As shown in the table, our method achieves the highest PSNR and SSIM and the lowest LPIPS.

PSR-GAN

85

Table 1. Experimental results. Method

Training

PSNR

SSIM

LPIPS

Unpaired Unpaired Unpaired Paired Paired

31.258 30.889 31.020 30.100 29.453

0.9990 0.9986 0.9986 0.9980 0.9944

0.0649 0.0686 0.0660 0.0917 0.0686

Unpaired Unpaired Unpaired Paired Paired

32.288 31.840 31.886 31.108 30.892

0.9995 0.9991 0.9990 0.9984 0.9970

0.0576 0.0603 0.0583 0.0679 0.0598

Unpaired Unpaired Unpaired Paired Paired

32.979 32.652 32.687 32.085 31.781

0.9996 0.9993 0.9992 0.9984 0.9980

0.0532 0.0568 0.0548 0.0646 0.0610

Entire Regions Our PSR-GAN CycleGAN[11] Mask-ShadowGAN[2] GridNet[10] ST-CGAN[7] Face Regions Our PSR-GAN CycleGAN[11] Mask-ShadowGAN[2] GridNet[10] ST-CGAN[7] Shadow Regions Our PSR-GAN CycleGAN[11] Mask-ShadowGAN[2] GridNet[10] ST-CGAN[7] Input

GT

Ours

GridNet

ST-CGAN CycleGAN

MSG

Fig. 3. Comparison results on our proposed portrait shadow dataset.

Qualitative Results. Figure 3 shows the comparison results between our PSRGAN and other methods. It is obvious that our results look more realistic and contain fewer artifacts. Moreover, except for our PSR-GAN, the results of all comparison methods have a serious problem that the background regions are lit up in many cases.

86

6

T. Ma et al.

Conclusion and Future Works

In this work, we adopt the idea of evolutionary computing and present a novel portrait shadow removal framework, named as PSR-GAN, and propose a face extraction module, FEM. Making the most of face characteristics, our method performs best among all compared methods on our collected dataset. However, there is still some room for improvement in our method. Despite the fact that our dataset conforms to the law of the shadow phenomena to some degree, these synthesized shadow images might differ slightly from the real shadow images. In the future, we plan to collect real shadow and shadow-free images under more complex scenes, and boost the capability of our network to fit data. Acknowledgements. This work has been supported by National Key R&D Program of China under Grant NO.2018YFB1403905.

References 1. Fredembach, C., Finlayson, G.: Hamiltonian path-based shadow removal. In: Proceedings of the 16th British Machine Vision Conference (BMVC), vol. 2, pp. 502– 511 (2005) 2. Hu, X., Jiang, Y., Fu, C.W., Heng, P.A.: Mask-shadowGAN: learning to remove shadows from unpaired data. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2472–2481 (2019) 3. Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4401–4410 (2019) 4. Ke, Z., Sun, J., Li, K., Yan, Q., Lau, R.W.: MODNet: real-time trimap-free portrait matting via objective decomposition. arXiv e-prints arXiv:2011.11961 (2020) 5. Qu, L., Tian, J., He, S., Tang, Y., Lau, R.W.: DeshadowNet: a multi-context embedding deep network for shadow removal. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4067–4075 (2017) 6. Vicente, T.F.Y., Hoai, M., Samaras, D.: Leave-one-out kernel optimization for shadow detection and removal. IEEE Trans. Pattern Anal. Mach. Intell. 40(3), 682–695 (2017) 7. Wang, J., Li, X., Yang, J.: Stacked conditional generative adversarial networks for jointly learning shadow detection and shadow removal. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1788–1797 (2018) 8. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004) 9. Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 586–595 (2018) 10. Zhang, X., et al.: Portrait shadow manipulation. ACM Trans. Graph. 39(4), 78 (2020) 11. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2223–2232 (2017)

Chebyshev Inequality and the Identification of Genes Associated with Alzheimer’s Disease Lei Yu1 , Xueli Tan1 , Delin Luo1 , Lin Yang1 , Xinping Pang2 , Zhengchao Shan3 , Chengjiang Zhu1 , Jeng-Shyang Pan4(B) , and Chaoyang Pang1(B) 1 College of Computer Science, Sichuan Normal University, Chengdu 610101, China

[email protected]

2 West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University,

Chengdu 610041, China 3 Labor Union, Sichuan Normal University, Chengdu 610101, China 4 College of Computer Science and Engineering, Shandong University of Science and

Technology, Qingdao 266590, China [email protected]

Abstract. Alzheimer’s disease (AD) is the most common form of dementia. However, its pathogenesis is not fully understood, and one of the methods to explore the pathogenesis of AD is to search for causative genes. In this study, we conducted research using AD-related DNA microarray data, learned from earthquakes as an analogy. We considered that earthquakes release varying amounts of energy, causing different levels of impact on the ground. Similarly, we compared this phenomenon to the gene correlations during different stages of the disease. We regarded the energy released by an earthquake as the differences in gene correlations at the disease stage and likened the ground surface damage caused by earthquakes to AD’s pathogenesis. Based on these insights, we developed a Chebyshev inequality screening algorithm that utilizes correlation calculations to identify genes associated with AD. The results showed this approach identified 46 AD candidate genes, most of which are closely associated with AD. Computational validation supports the reliability of the algorithm, providing further possibilities to explore potential molecular mechanisms. This study makes significant contributions to advancing our understanding of AD and offers promising directions for future research and potential therapeutic targets. Keywords: Alzheimer’s disease · Chebyshev inequality · DNA microarray

1 Introduction 1.1 Introduction to AD and Gene Chip Technology Dementia, a clinical syndrome disrupting cognitive function, predominantly comprises Alzheimer’s disease (AD), constituting 80% of dementia diagnoses [1]. Research analysis forecasts 152 million AD cases by 2050, with significant economic ramifications [2, 3]. AD’s primary pathogenesis stems from amyloid and tau protein depositions [4]. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 J.-S. Pan et al. (Eds.): ICGEC 2023, LNEE 1114, pp. 87–97, 2024. https://doi.org/10.1007/978-981-99-9412-0_10

88

L. Yu et al.

Under normal conditions, APP, a type I transmembrane protein, maintains high concentrations [5]. The KPIAPP isoform’s AD brain protein links to increased Aβ deposition [6], and 80% of mutated APP genes in AD escalate Aβ content [7, 8]. Tau, a microtubuleassociated protein (MAPT), destabilizes microtubules by binding tubulin [9]. In AD, numerous phosphorylated tau proteins exacerbate disease, disassembling microtubules and compromising neuronal integrity [10]. Genechip, also referred to as microarray technology, represents a pioneering molecular biology approach [11]. This technology empowers researchers to enhance AD candidate gene identification. Among the frequently employed methods are K-means clustering algorithms [12], as well as ACO and PCA [13]. Furthermore, a multitude of studies delves into novel algorithms for AD candidate gene identification [14–16], including the amplitude bias algorithm [17]. 1.2 Simple Idea of Chebyshev’s Inequality Applied to Genetic Screening To pinpoint potential AD candidate genes, we draw an analogy rooted in seismic energy release principles observed during earthquakes. Just as earthquakes discharge varying energy levels, generating diverse ground impacts, we liken the dissimilarity in genedisease marker correlations across stages to seismic energy release. This analogy equates the effect on AD severity to ground impact. Under normal physiological circumstances, gene expression tends to stabilize within the human body. However, in conditions like Alzheimer’s, disease marker genes wield regulatory influence, inducing altered expression in related genes. Consequently, correlation shifts between these genes and disease markers transpire. If these shifts are significant, they can wield a profound AD impact, akin to an earthquake’s potent energy causing ground damage. By gauging these fluctuations in gene correlations, we glean insights into specific gene significance in AD development.

2 Materials and Methods 2.1 Data Collection The gene expression dataset used in this study was sourced from the atlas GSE1297 dataset in the Gene Expression Omnibus (GEO), which is based on the GPL 96 platform (Affymetrix Human Genome U133A array) and contains 31 samples (9 representing control groups and 22 AD patients with three severity levels). Since there are multiple loci in the sample corresponding to one gene, we sum and average the same genes at different sites as the final expression value for each gene. The processed data are represented as follows:   (1) G = G1 . . . Gn . . . Gj T  Gn = g1n · · · gmn··· gin (n ∈ {1 . . . , j}, m ∈ {1 . . . , i})

(2)

In this context, “i” represents the count of genes, with “i” being equal to 13,515. Similarly, “j” signifies the number of samples, where “j” equals 31. Consequently, symbolizes the collective gene expressions within the nth sample, and gmn signifies the expression of the mth gene in the nth sample.

Chebyshev Inequality and the Identification of Genes Associated

89

2.2 Data Preprocessing To mitigate the impact of experimental variables on sample expression values, Z-score standardization is employed for data processing. The computation, as detailed in Eqs. 3– 5, involves utilizing and to denote the mean and standard deviation across all genes in the nth sample. Signifies the normalized expression of the mth gene within the nth sample. The resultant standardized dataset is presented in Eq. 6: 1 i gmn m=1 i

(3)

1 i (gmn − μn )2 m=1 i

(4)

μn = E(Gn ) =  σn =

2

gmn − μn σn ⎞ ⎛ x11 · · · x1j   ⎜ ⎟ X = X1 . . . Xj = ⎝ ... . . . ... ⎠ xi1 · · · xij xmn =

(5)

(6)

2.3 Correlation Obtained by Two Ways This paper examines gene correlations, which reveal linear gene expression relationships and shared change trends. By selecting key AD-associated genes and computing their correlations with others, we unveil relative trends across various AD stages. Leveraging APP and MAPT genes, encoding amyloid beta and tau proteins, as markers, we gauged correlations of other genes with these references. Our goal was to discern potential AD relevance through insights into gene associations with these pivotal markers. Gene Correlation Obtained by Pearson Correlation Coefficient. Pearson’s correlation coefficient serves as a statistical gauge of linear correlation, ranging from −1 to 1. When the coefficient approaches ±1, it signifies a robust correlation between variables, with ±1 indicating perfect positive or negative correlation, respectively. A coefficient of 0 signifies no linear correlation. In our study, we computed correlation coefficients for distinct genes with marker genes (APP and MAPT) across four data stages (control, mild, moderate, severe). Equations 7–10 elucidate the computation procedure for these stages.   (Con) (Con) cov Xm , X(APP/MAPT ) (Con)    =  (7) rm (Con) (Con) D X(APP/MAPT ) D Xm   (Inc) (Inc) cov Xm , X(APP/MAPT ) (Inc)    =  (8) rm (Inc) (Inc) D X(APP/MAPT ) D Xm

90

L. Yu et al.

(Mod ) rm

(Sev) rm

  (Mod ) (Mod ) cov Xm , X(APP/MAPT )    =  (Mod ) (Mod ) D X(APP/MAPT ) D Xm   (Sev) (Sev) cov Xm , X(APP/MAPT )    =  (Sev) (Sev) D X(APP/MAPT ) D Xm

(9)

(10)

(Con)

denotes the array of expression values for the mth gene in the In the formula, Xm (Con) control stage. X(APP/MAPT ) denotes the array of expression values for the genes APP or MAPT in the control stage. Gene Correlation Obtained by Cosine Similarity. Cosine similarity is a metric that quantifies the similarity between two vectors. In correlation analysis, data can be represented as vectors, with each dimension representing a variable or feature. By using cosine similarity, the degree of similarity between these vectors can be determined, and gain insights into the correlation between the variables.In this study, we divide it into four stages, such as in Eqs. 11–14: (Con)

(Con)

Xm ·X   (APP/MAPT )  = cosine(Con) m  (Con)   (Con)  Xm  × X(APP/MAPT )  (Inc)

(Inc)

Xm · X(APP/MAPT )    cosine(Inc) = m  (Inc)   (Inc)  Xm  × X(APP/MAPT )  (Mod )

·X   (APP/MAPT )  =  (Mod )   (Mod )  Xm  × X(APP/MAPT ) 

cosine(Sev) m

Xm ·X   (APP/MAPT )  =  (Sev)   (Sev)  Xm  × X(APP/MAPT ) 

(Sev)

(12)

(Mod )

) cosine(Mod m

Xm

(11)

(13)

(Sev)

(14)

2.4 Changes in Disease Relevance at Different Stages Genes exhibiting substantial correlation shifts are likely linked to AD. To quantify these shifts, we determined correlations among stages through subtraction. As Alzheimer’s disease progresses, gene correlations might exhibit gradual changes, with potential insignificance between adjacent stages. Hence, contrasting different AD severities against the control group could render correlations significant, as depicted in Eqs. 15–19. C (Con_Inc) = r (Inc) − r (Con)

(15)

C (Inc_Mod ) = r (Mod ) − r (Inc)

(16)

Chebyshev Inequality and the Identification of Genes Associated

91

C (Mod _Sev) = r (Sev) − r (Mod )

(17)

C (Con_Mod ) = r (Mod ) − r (Con)

(18)

C (Con_Sev) = r (Sev) − r (Con)

(19)

2.5 Chebyshev’s Inequality Screens for Genes Associated with AD The chebyshev’s inequality is used to screen for genes where these changes are significant. In this paper, let C belong to the 5 change vectors in Sect. 2.5. And let C have an expectation of E(C) and a variance of D(C)2 . Then, for any ε > 0, Eq. 20 holds. P{|Cm − E(C)| ≥ ε} ≤

  D(C)2  C ∈ C (Con_Inc) , C (Con_Mod ) , C (Con_Sev) , C (Inc_Mod ) , C (Mod _Sev) 2 ε

(20)

2

denotes the maximum possible probability that this difference is greater where D(C) ε2 than ε. Thus, ε can be manually set. It is worth noting that when ε is given as different multiples of the D(C) value (i.e., the standard deviation of C), its maximum probability of occurrence is shown in Table 1. Table 1. Table of maximum probabilities corresponding to different values of ε ε

Probability of the maximum occurrence of the difference in correlation

1 ∗ D(C)

100%

2 ∗ D(C)

25%

3 ∗ D(C)

11.1111%

4 ∗ D(C)

6.25%

5 ∗ D(C)

4%

6 ∗ D(C)

2.7778%

7 ∗ D(C)

2.0408%

The table shows that when ε is set to 3 ∗ D(C), the maximum probability of the correlation difference for the mth gene is 11.1111%. As ε increases beyond 3 ∗ D(C), the probability gradually decreases. Therefore, in this paper, ε=3 ∗ D(C) is chosen as the threshold value, and the genes whose difference in correlation with the disease marker genes is greater than 3 ∗ D(C) are considered candidate genes for AD. Meanwhile, we also calculated the results for ε = 4 ∗ D(C), 5 ∗ D(C), 6 ∗ D(C), and 7 ∗ D(C) and further analyzed them in the result.

92

L. Yu et al.

3 Results 3.1 Correlation Between Cosine Similarity and the Pearson Correlation Coefficient Measure for AD Genes The Pearson correlation coefficient outcomes (Fig. 1A and Fig. 1B) unveiled robust correlations between most genes and both APP and MAPT in the control stage. Postdisease onset, gene sequences shifted from ordered to disordered, a trend accentuated with disease severity. Cosine similarity findings (Fig. 1C and Fig. 1D) also highlighted substantial gene correlations with APP and MAPT in the control stage. Nonetheless, genes with correlations nearing 0 displayed disorderliness in incipient, moderate, and severe stages, a trend that escalated with disease severity. Moreover, select genes displaying strong control-stage correlations exhibited reduced correlation as the disease advanced. These altered genes could potentially bear relevance to Alzheimer’s disease and contribute to its progression.

Fig. 1. Correlations between AD marker genes and all genes. (A, C) The graph shows the correlation of MAPT with all other genes at different stages. (B, D) The graph shows the correlation of APP with all other genes at different stages. The ordinate is used to represent the correlation coefficient values. The correlation in (A) and (B) is calculated by Pearson Correlation Coefficient. The correlation in (C) and (D) is calculated by Cosine Similarity.

3.2 Pearson’s Correlation Coefficient and Cosine Similarity Measures the Difference in Correlation Between Different Stages Analyzing the variation in correlation across different disease stages offers insights into genetic responses. We examined changes between control and incipient, control and moderate, control and severe, incipient and moderate, as well as moderate and severe stages (Fig. 2). For Pearson-based results (Fig. 2 A-B), gene correlation shifts were observed on both sides of the horizontal axis, corresponding to genes highly correlated

Chebyshev Inequality and the Identification of Genes Associated

93

with APP or MAPT in the control stage. This indicates reduced relevance of strongly correlated genes during disease onset. Cosine-based findings (Fig. 2 C-D) displayed correlation changes concentrated on the right side of the axis, representing genes with low correlation in the control stage. Changes became more prominent with disease progression, particularly “Sev minus Mod” compared to “Mod minus Inc,” suggesting greater correlation shifts in later stages. This implies that more genes undergo significant changes in later disease stages.

Fig. 2. The differences in correlations between different stages. (A, B) Differences in Pearson correlations with AD marker genes between different stages. (C, D) Differences in Cosine Similarity with AD marker genes between different stages.

3.3 The Chebysheff Inequality was Used to Screen Candidate Genes from Changes Based on Pearson Correlation Coefficient and Cosine Similarity In this analysis, we applied the Chebyshev inequality (Eq. 10) to calculate the correlation changes between genes and APP and MAPT, respectively. In the Pearson correlation coefficient, we set the threshold to 3, 3.25 and 3.5 times the standard deviation to further screen for AD candidate genes. The results based on a threshold of 3.5 times standard deviation for APP and MAPT are shown in Table 2. While in cosine similarity, more than 500 genes exceeded 3 times the standard deviation after screening, so a stepwise higher threshold was required for accurate screening. We used a 3-to-7-fold standard deviation as the threshold for genetic screens. In Table 2 and Table 3, each column represents genes that changed significantly between the indicated stages. An empty cell in a column indicates that there are no genes with significant changes in correlation at that particular threshold between the respective stages. These results highlight the subset of genes that exhibit substantial alterations in correlation with the marker genes during different stages of AD.

94

L. Yu et al.

Table 2. AD Candidate Genes Passing the 3.5-Fold Standard Deviation Threshold Based on APP and MAPT Correlation (MAPT in underline) Inc minus Con

Mod minus Con

Sev minus Con

Mod minus Inc

Sev minus Mod

GINS1

MINA

SUN1

DPM3

GOLGA6

CDH13

YWHAE

APBB1

YTHDF3

BCL7A

RABGAP1

TNKS

DET1

Table 3. AD Candidate Genes Passing the 7-Fold Standard Deviation Threshold Based on APP and MAPT Correlation (MAPT in underline) Inc minus Con

Mod minus Sev minus Mod minus Inc Con Con

ATG14

ITGA7

C1S

GPX3

RBM8A

EFEMP2

DECR2

TNFRSF21 RBM15

SLC8A2

NDP

TMEM208

SF3A1

ACOT13

SF3A1

PCSK2

KATNBL1

WIF1

KATNBL1

CACNG3

RAPGEFL1 ALG8 ATG14

CUL1

Sev minus Mod CIZ1

DNAJ4

PLEKHM2 TMEM208

GPX3

DNAJA4

OXA1L

ALG8

FAM49A

KCNAB2

WIF1

IKBKG

TYW1

TMEM208

ALG8

RALB

KCNJ16

CREB3

RAPGEFL1 PLXNB3

4 Discussion 4.1 Discussion of Correlation Calculation Results This study employs Pearson’s correlation coefficient and cosine similarity as primary metrics to evaluate gene correlations with AD marker genes, with a specific focus on wellestablished markers MAPT and APP. These markers serve as benchmarks for assessing correlations with other genes, aiming to identify potential candidates pivotal in AD’s development or progression. The methodology involves sorting correlations from the control stage and extending this sequence to incipient, moderate, and severe stages, as depicted in Fig. 1. This approach offers clear insights into AD-related changes relative to the control stage. The observed disturbances in the diseased stages suggest substantial gene alterations in AD. Discrepancies in correlations between stages illuminate gene dynamics during AD progression. Pearson correlation coefficient results (Fig. 2 A-B) underscore shifts primarily on both sides of the horizontal axis, aligning with genes strongly correlated during the control stage. This signifies significant changes in their relationships with APP or MAPT post AD onset. Conversely, cosine similarity outcomes (Fig. 2 C-D) spotlight

Chebyshev Inequality and the Identification of Genes Associated

95

shifts primarily on the right side of the horizontal axis, indicating regions of uncorrelated genes during the control stage. This suggests certain gene vectors progressively align with the disease stage’s direction, indicating notable expression pattern alterations. In summary, these collective findings underscore disrupted gene coordination homeostasis in AD. Genes manifesting substantial correlation shifts with APP or MAPT emerge as potential contributors to the disease’s development and progression. These genes hold promise as essential candidates for delving into AD’s fundamental mechanisms, warranting further comprehensive investigation. 4.2 Analysis of AD Candidate Genes The Analysis of AD Candidate Genes from Pearson Correlation Coefficient-Based Changes Using Chebyshev’s Inequality. The results of AD candidate genes obtained based on Pearson’s correlation coefficient are presented in Table 2. These tables include genes that exhibit significant changes between different stages of the disease, which are considered as potential AD candidate genes. Among these identified genes, several are already known to be associated with AD. For instance, APBB1 encodes a protein that is known to interact with the Alzheimer’s disease amyloid precursor protein (APP) [18]. It has been shown to increase the translocation of APP to the cell surface and promote the secretion of αAPPs and Aβ [19]. As a result, APBB1 is believed to play a critical role in the pathogenesis of AD. CDH13, which encodes a member of the calreticulin superfamily. This protein is involved in the negative regulation of axon growth during neural differentiation. Studies have demonstrated that increased CDH13 expression can inhibit neuronal growth, and it has been suggested that there is a direct correlation between CDH13 expression levels and Alzheimer’s disease [20]. The Analysis of AD Candidate Genes from Cosine Similarity-Based Changes Using Chebyshev’s Inequality. The results of AD candidate genes based on cosine similarity are shown in Table 3. After research, the majority of the significant genes listed in the table have demonstrated associations with AD. For instance, TNFRSF21 has been shown to bind to APP and regulate neuroinflammatory effects in AD. Inhibiting TNFRSF21 has been found to reduce APP expression and decrease neuroinflammation, indicating its potential role in disease pathology [21]. The gene encoding the DNA replication factor CIZ1 was found to be more highly expressed in Alzheimer’s disease tissues compared to healthy brains [22]. Notably, the detrimental effects of CIZ1 deficits in the disease become more pronounced with age, highlighting its relevance to Alzheimer’s disease progression [23]. Additionally, several other genes, such as ATG14, RAPGEFL1, WIF1, RBM15, SF3A1, GPX3, NDP, CUL1, IKBKG, and RALB, have all been reported to have direct or indirect associations with AD. These findings indicate their potential involvement in the disease’s underlying mechanisms and suggest their significance as potential targets for further investigation and therapeutic development.

96

L. Yu et al.

5 Conclusions Although Alzheimer’s disease has been known for many years, definitive pathogenesis has remained elusive. In this paper, we introduce a novel Chebyshev inequality screening method based on correlation changes, aimed at identifying AD candidate genes. By specifically examining the correlation of genes with APP and MAPT, two genes that encode Abeta and tau proteins (recognized as typical AD markers), a total of 46 genes exhibiting significant changes in Alzheimer’s disease were identified. Among them, 13 genes with substantial alterations were detected using the Pearson correlation coefficientbased method, while 33 genes with significant changes were identified through the cosine similarity-based method. Remarkably, most of these genes demonstrated direct or indirect associations with AD, thus substantiating the robustness of our proposed computational techniques. Furthermore, correlation analysis provided additional indirect support for the soundness of our approach. Furthermore, genes such as DNAJ4A, not discussed in detail, may also be potentially associated with AD, offering new avenues for future research in this field. Our findings contribute to a better understanding of AD and may provide valuable leads for potential therapeutic targets.

References 1. Crous-Bou, M., Minguillón, C., Gramunt, N., Molinuevo, J.: Alzheimer’s disease prevention: from risk factors to early intervention. Alzheimer’s Res. Therapy 9, 1–9 (2017) 2. Yiannopoulou, K., Papageorgiou, S.: Current and future treatments in Alzheimer disease: an update. J. Central Nervous Syst. Disease 12, 1179573520907397 (2020) 3. Livingston, G., et al.: Dementia prevention, intervention, and care: 2020 report of the Lancet Commission. Lancet 396(10248), 413–446 (2020) 4. Maurer, K., Volk, S., Gerbaldo, H.: Auguste D and Alzheimer’s disease. Lancet 349(9064), 1546–1549 (1997) 5. Tan, J., Evin, G.: β-Site APP-cleaving enzyme 1 trafficking and Alzheimer’s disease pathogenesis. J. Neurochem.Neurochem. 120(6), 869–880 (2012) 6. Menendez-Gonzalez, M., Perez-Pinera, P., Martinez-Rivera, M., Calatayud, M., Blazquez, M.B.: APP processing and the APP-KPI domain involvement in the amyloid cascade. Neurodegener. Dis.. Dis. 2(6), 277–283 (2006) 7. Julia, T.,Goate, A.: Genetics of β-amyloid precursor protein in Alzheimer’s disease. Cold Spring Harbor Perspect. Med. 7(6) (2017) 8. Bi, C., Bi, S., Li, B.: Processing of mutant β-amyloid precursor protein and the clinicopathological features of familial Alzheimer’s disease. Aging Disease 10(2), 383 (2019) 9. Gabbouj, S., et al.: Altered insulin signaling in Alzheimer’s disease brain–special emphasis on PI3K-Akt pathway. Front. Neurosci. 13, 629 (2019) 10. Hong, M., Lee, V.: Insulin and insulin-like growth factor-1 regulate tau phosphorylation in cultured human neurons. J. Biol. Chem. 272(31), 19547–19553 (1997) 11. Li, B., Zhu, X., Zhang, R., Wang, C., Xia, X.: The role of gene chip technology in microbiology studies. Chin. J. Pathogenic Biol. 704–706 (2011) 12. Gabig-Cimi´nska, M., W˛egrzyn, G.: An introduction to DNA chips: principles, technology, applications and analysis. Acta Biochim. Pol.Biochim. Pol. 48, 615–622 (2001)

Chebyshev Inequality and the Identification of Genes Associated

97

13. Podtelezhnikov, A., Tanis, K., Nebozhyn, M., Ray, W., Stone, D., Loboda, A.: Molecular insights into the pathogenesis of Alzheimer’s disease and its relationship to normal aging. PLoS ONE 6(12), e29610 (2011) 14. Zhang, Q., et al.: Preliminary exploration of the co-regulation of Alzheimer’s disease pathogenic genes by microRNAs and transcription factors. Front. Aging Neurosci. 14, 1069606 (2022). https://doi.org/10.3389/fnagi.2022.1069606 15. Yang, X., et al.: The relationship between protein modified folding molecular network and Alzheimer’s disease pathogenesis based on BAG2-HSC70-STUB1-MAPT expression patterns analysis. Front. Aging Neurosci. 15, 1090400 (2023). https://doi.org/10.3389/fnagi. 2023.1090400 16. Zhang, Q., Chen, B., Yang, P., Wu, J., Pang, X., Pang, C.: Bioinformatics based study reveals that AP2M1 is regulated by the circRNA-miRNAmRNA interaction network and affects Alzheimer’s disease. Front. Genet. 13, 1049786 (2022). https://doi.org/10.3389/fgene.2022. 1049786 17. Pang, C., et al.: Identification and analysis of Alzheimer’s candidate genes by an amplitude deviation algorithm. J. Alzheimer’s Disease Parkinsonism 9(1) (2019) 18. Guénette, S., et al.: Evidence against association of the FE65 gene (APBB1) intron 13 polymorphism in Alzheimer’s patients. Neurosci. Lett.. Lett. 296(1), 17–20 (2000) 19. Sabo, S., et al.: Regulation of β-amyloid secretion by FE65, an amyloid protein precursorbinding protein. J. Biol. Chem. 274(12), 7952–7957 (1999) 20. Liu, F., Zhang, Z., Chen, W., Gu, H., Yan, Q.: Regulatory mechanism of microRNA-377 on CDH13 expression in the cell model of Alzheimer’s disease. Eur. Rev. Med. Pharmacol. Sci. 22(9) (2018) 21. Zhang, T., Yu, J., Wang, G., Zhang, R.: Amyloid precursor protein binds with TNFRSF21 to induce neural inflammation in Alzheimer’s disease. Eur. J. Pharm. Sci. 157, 105598 (2021) 22. Dahmcke, C., Büchmann-Møller, S., Jensen, N., Mitchelmore, C.: Altered splicing in exon 8 of the DNA replication factor CIZ1 affects subnuclear distribution and is associated with Alzheimer’s disease. Mol. Cell. Neurosci.Neurosci. 38(4), 589–594 (2008) 23. Khan, M., Xiao, J., Patel, D., LeDoux, M.: DNA damage and neurodegenerative phenotypes in aged Ciz1 null mice. Neurobiol. Aging. Aging 62, 180–190 (2018)

A Novel Crossover Operator Based on Grey Wolf Optimizer Applied to Feature Selection Problem Wenbo Guo1 , Yue Sun1 , Xinping Pang2 , Lin Yang1 , Lei Yu1 , Qi Zhang1 , Ping Yang1 , Jeng-Shyang Pan3(B) , and Chaoyang Pang1(B) 1

3

College of Computer Science, Sichuan Normal University, Chengdu 610100, China [email protected] 2 West China School of Basic Medical Sciences and Forensic Medicine, Sichuan University, Chengdu 610065, China College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China [email protected] Abstract. Alzheimer’s disease (AD) poses challenges in feature selection due to the limited number of features relative to the large number of samples. To address this issue, we propose a novel crossover operator called the Grey Wolf Optimizer Genetic Algorithm (GWOGA) to enhance classification accuracy with a reduced number of features in AD research. GWOGA combines the Grey Wolf Optimizer and a genetic algorithm, addressing limitations and improving feature selection performance. We compared GWOGA with five state-of-the-art feature selection methods and seven crossover operators on integrated and real AD datasets. Results demonstrate that GWOGA outperforms all other methods in terms of accuracy and dimensionality, achieving similar classification accuracy with a significantly smaller feature set. The fitness score analysis indicates that GWOGA identifies a subset of features effectively representing the entire dataset, despite not always having the highest accuracy among state-of-the-art methods. GWOGA shows promise for future AD research, providing a balance between accuracy and computation time and exhibiting potential for identifying novel AD biomarkers. Keywords: Alzheimer Disease

1

· Feature Selection · Genetic Algorithm

Introduction

Alzheimer’s disease (AD) has become a primary focus of 21st-century public health research due to its severe cognitive impact and prevalence among individuals over 65 years old [1]. With AD affecting 4–7% of this population, and Wenbo Guo, Yue Sun: These authors contributed equally to this work.

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/978-981-99-9412-0 11. c The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024  J.-S. Pan et al. (Eds.): ICGEC 2023, LNEE 1114, pp. 98–107, 2024. https://doi.org/10.1007/978-981-99-9412-0_11

A Novel Crossover Operator

99

its prevalence rising with an aging demographic, reaching 20–30% in those over 85 years old, it is a significant cause of loss of independence and mortality [2,3]. Next-generation sequencing (NGS) technology offers promising avenues for investigating AD at a systems level, but its complexity and high dimensionality present challenges [4]. To address this, biomarker identification is approached as a classification problem, and efficient genetic algorithms (GAs) are used for feature selection to find near-optimal solutions for high-dimensional data [5,6]. GAs have shown promise in predicting disease-causing genes, including in AD [7–10]. However, despite their potential, there are still limitations that need to be addressed to optimize their performance. • Most feature-selection methods struggle with high-dimensional data, posing challenges for predicting disease-causing genes, particularly in neurodegenerative diseases like AD. Acquiring AD samples is particularly difficult due to the need for access to specific brain lobes, unlike diseases that can be diagnosed using blood or tissue cells. • High-dimensional data in feature selection leads to a vast search space, limiting the effectiveness of traditional crossover and mutation operators in GA, which can impact the final accuracy. • The traditional GA’s crossover operator randomly selects parents, resulting in poor stability and potential suboptimal solutions in subsequent iterations. • Existing research on GA’s feature selection emphasizes final classification accuracy but overlooks the number of selected feature subsets, leading to situations where the classification result may be better, but the selected feature subset is larger. To tackle the limitations of traditional feature selection methods, we introduces a novel crossover operator, inspired by the Gray Wolf Optimization (GWO) algorithm, to select the best feature subset for predicting Alzheimer’s disease (AD) patients using next-generation sequencing (NGS) gene expression data [11]. The proposed Grey Wolf Optimization Genetic Algorithm (GWOGA) shows superior performance compared to state-of-the-art algorithms and traditional crossover operators in terms of prediction accuracy. GWOGA adapts the crossover process based on individual adaptability, yielding efficient solutions for high-dimensional datasets. Extensive simulations on public datasets and real AD data demonstrate the effectiveness of GWOGA, making it a valuable tool for identifying important genes in AD research.

2 2.1

Material and Methods Grey Wolf Optimizer as Crossover Operator

In this paper, we propose a new GA crossover operator, the GWO-based GA (GWOGA), to improve feature selection performance in AD datasets. As a wrapper method, the GA uses the classifier accuracy of the model to filter features. The accuracy of the random forest was used to determine which features should

100

W. Guo et al.

Start

Initialize the parameters A, C and current individual i = 0 of GWO

Initialize parameters and set the iteration times t = 0 of GA

According iter times t computing the steps a

Randomly initialize the positions of population X(t)

Find the α, β and γ according to the fitness

Apply GA standard selection operator on the whole X(t) i = population size? No Yes

No

t = max_iter? Compute bstep and cstep for current individual Yes

Output the best individuals

End

Updating the position of current individual

Apply GA standard mutation operator on the whole X(t)

i=i+1

t = t +1

Fig. 1. The flowchart of proposed GWOGA method.

remain in the next generation. The flowchart proposed by the GWOGA is shown in Fig. 1: • Step 1 Initialize the population and express the problem to be solved as individual chromosomes in the genetic space, usually using binary coding. After coding, the initial population P(0) is randomly generated, the iteration counter t = 0 is set, and the maximum iteration number T is set. • Step 2 The adaptability of the individuals, P(t) in the population P(0), to the environment is measured by the accuracy of the random forest. According to Darwin’s theory of evolution, individuals with higher adaptability to their environment have a higher likelihood survival. • Step 3 The end condition is set to “reaching the T generation”. The grey wolf algorithm is applied to the entire population as a crossover operator when t < T. The updating method is described in Eq. (1). In the binary wolf colony algorithm, the location update for the most important wolf colony is represented by the following functions: = Crossover(x1 , x2 , x3 ) xt+1 t

(1)

Wherein, Crossover(x1 , x2 , x3 ) is the appropriate cross between solution x, y, z and x1 , x2 , x3 . They are the binary vectors representing the wolf moving towards Alpha, Beta and Delta wolves. Calculate using Eqs. (2), (3) and (4) respectively:  1, if (xdα + bstepαd ≥ 1) d x1 = (2) 0, otherwise

A Novel Crossover Operator

 1, if (xdβ + bstepβ d ≥ 1) = 0, otherwise  1, if (xdδ + bstepδ d ≥ 1) xd3 = 0, otherwise

xd2

101

(3)

(4)

The GWOGA solution is represented as a d-dimensional vector whose elements correspond to features (genes); the binary vector indicates selection, with 0 for not selected and 1 for selected. $x_\alpha^d$, $x_\beta^d$, $x_\delta^d$ are the position vectors of α, β, δ in dimension d. The alpha primarily leads the hunt, with participation from the beta and delta. The mathematical model of grey wolf hunting assumes that the top solutions (alpha, beta, delta) know the prey’s location and influence the updates of the other agents, including the omega wolves (based on Eq. (1)).

• Step 4: Initialize the random vectors A and C, set the current individual index i = 0, and calculate the step size a according to t:

$$\vec{A} = 2a \cdot \vec{r}_1 \quad (5)$$

$$\vec{C} = 2\vec{r}_2 \quad (6)$$

a decreases linearly from two to zero during the iterations, maintaining a balance between exploration and exploitation, and $\vec{r}_1$ and $\vec{r}_2$ are random vectors in [0, 1]:

$$a = 2 - t\,\frac{2}{T} \quad (7)$$

• Step 5: Find the α, β, and δ wolves according to the accuracy of the random forest.
• Step 6: For all individuals in the population except the alpha, beta, and delta wolves, calculate the binary step sizes bstep and cstep and the candidate position $x_\alpha^d$. The calculation of bstep and cstep for α is described in formulas (8) and (9):

$$bstep_\alpha^d = \begin{cases} 1, & \text{if } cstep_\alpha^d \ge rand \\ 0, & \text{otherwise} \end{cases} \quad (8)$$

where rand is a random number with a normal distribution in the interval [0, 2]. $cstep_\alpha^d$ is the continuous value of the step size in dimension d, which can be calculated by the following formulas:

$$cstep_\alpha^d = \frac{1}{1 + e^{-10\,(A_1^d D_\alpha^d - 0.5)}} \quad (9)$$

$$D_\alpha = \left|\vec{C}_1 \cdot \vec{X}_\alpha - \vec{X}\right| \quad (10)$$

$\vec{A}$ and $\vec{C}$ are the random vectors described in formulas (5) and (6), and $\vec{X}$ is the current individual, which needs to move to a better location. According to bstep and the current position $x_i^d$, the value of $x_\alpha^d$ is

$$x_\alpha^d = \begin{cases} 1, & \text{if } bstep_\alpha^d + x_i^d \ge 1 \\ 0, & \text{otherwise} \end{cases} \quad (11)$$

The value of $bstep_\alpha^d$ is calculated using (8); $x_\alpha^d$ is the value calculated from the d-th dimension of the position of α and the current individual.

• Step 7: Calculate the values of $x_\beta^d$ and $x_\delta^d$ based on the β and δ wolves. Subsequently, the current individual is updated as:

$$x^d = \begin{cases} x_\alpha^d, & \text{if } rand < 0.33 \\ x_\beta^d, & \text{if } 0.33 \le rand < 0.66 \\ x_\delta^d, & \text{otherwise} \end{cases} \quad (12)$$

where rand is a random number in the interval [0, 1] and $x^d$ is the next position of the current individual in the d-th dimension.

• Step 8: Apply the bit-string mutation operator to each individual; each dimension has the same probability of mutation.
• Step 9: Repeat Steps 3–8 until t = T and the final generation is reached. The optimal individual is the alpha wolf vector, and the elements of α that equal 1 represent the selected genes.
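As an illustration of Steps 4–7, the GWO-based crossover for one binary individual can be sketched in a few lines of Python. This is a minimal sketch written for this text, not the authors’ released code: the function name and NumPy vectorization are our own choices, and uniform random thresholds are used for Eq. (8) as a common simplification.

import numpy as np

def gwo_crossover(x, alpha, beta, delta, a, rng):
    # One GWO-based crossover of binary individual x toward the
    # alpha, beta and delta wolves, following Eqs. (5)-(12).
    dim = x.size
    candidates = []
    for leader in (alpha, beta, delta):
        A = 2.0 * a * rng.random(dim)                         # Eq. (5)
        C = 2.0 * rng.random(dim)                             # Eq. (6)
        D = np.abs(C * leader - x)                            # Eq. (10)
        cstep = 1.0 / (1.0 + np.exp(-10.0 * (A * D - 0.5)))   # Eq. (9)
        bstep = (cstep >= rng.random(dim)).astype(int)        # Eq. (8)
        candidates.append(((bstep + x) >= 1).astype(int))     # Eq. (11)
    r = rng.random(dim)                                       # Eq. (12)
    return np.where(r < 0.33, candidates[0],
                    np.where(r < 0.66, candidates[1], candidates[2]))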

2.2 Description of Datasets and Evaluation Methods

In this study, 11 public datasets from the UCI database were downloaded and categorized into high-dimensional and low-dimensional datasets based on their features/samples ratios (Table 1). The focus is on feature selection for Alzheimer’s disease (AD) datasets, which typically exhibit high dimensionality, with approximately 3–40 samples and 23,000 genes. Moreover, to assess the performance of the proposed Grey Wolf Optimization Genetic Algorithm (GWOGA), four real AD datasets were selected for evaluation. Performance was measured using the accuracy, precision, recall, and F1 score of the random forest classifier. We also used a fitness score to jointly evaluate model accuracy and the size of the selected feature subset; it is calculated by adding the error rate of the binary classifier to the ratio of the size of the identified feature subset to the size of the total feature set.
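Written out, the fitness score just described takes the form below (our transcription of the stated definition; S denotes the selected feature subset, F the full feature set, and lower values are better):

$$\text{fitness} = (1 - \text{accuracy}) + \frac{|S|}{|F|}$$

The first term is the error rate of the binary classifier and the second penalizes large feature subsets, so a low fitness rewards small subsets that still classify well.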

3 Results

In this section, we used 5-fold cross-validation (CV) for stable performance estimates, via the ‘cross_val_score’ function of the scikit-learn package in Python 3.7.
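A minimal sketch of this evaluation loop is shown below; the synthetic data and forest size are illustrative stand-ins, not values from the paper:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# stand-in for an NGS expression matrix restricted to the selected genes
X, y = make_classification(n_samples=80, n_features=30, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")  # 5-fold CV
print(f"mean accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")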


Table 1. The detailed information of the different datasets

Name             | Num. Samples | Num. Features | Features/Samples
Gas              | 30           | 434           | 14.466
Rejafada         | 800          | 6824          | 8.53
Gastrointestinal | 122          | 509           | 4.172
SetaProcessT2    | 44           | 84            | 1.909
SetaProcessT1    | 74           | 84            | 1.135
Breast Cancer    | 569          | 30            | 0.052
Heart Disease    | 303          | 13            | 0.042
Nba              | 1340         | 20            | 0.014
Codon            | 12167        | 65            | 0.005
Abalone          | 2640         | 8             | 0.003
News Popularity  | 32294        | 59            | 0.001

3.1 Compare with Different Feature Selection Algorithms

To assess GWOGA’s effectiveness, we compared it with five prominent feature selection algorithms: Variance Threshold, Select K Best, Recursive Feature Elimination (RFE) [12], Binary Particle Swarm Optimization (BPSO), and Simulated Annealing (SA). As shown in Table 2, GWOGA showed superior or equal mean classification accuracy on the high-dimensional datasets compared to the other methods. RFE performed equally well on the Gas and Rejafada datasets. SA demonstrated results similar to GWOGA, with more stability than RFE. Overall, the proposed method showed promising accuracy on high-dimensional datasets.

Table 2. The mean accuracy on the high-dimensional datasets

Mean Accuracy    | GWOGA | Variance Threshold | Select K Best | RFE   | BPSO  | SA
Gas              | 0.967 | 0.867              | 0.933         | 0.967 | 0.933 | 0.933
Rejafada         | 0.976 | 0.923              | 0.974         | 0.976 | 0.968 | 0.966
Gastrointestinal | 0.992 | 0.811              | 0.885         | 0.844 | 0.877 | 0.983
SetaT2           | 0.614 | 0.592              | 0.614         | 0.547 | 0.525 | 0.547
SetaT1           | 0.669 | 0.608              | 0.655         | 0.654 | 0.606 | 0.686

Supplementary Table 2 presents the results for the low-dimensional datasets, showing minimal differences among the five methods. GWOGA and BPSO achieved the best results on two datasets, while the variance threshold, select K best, and RFE methods each outperformed the others on one dataset. However, the RFE algorithm failed to find solutions for two datasets within the specified time frame. Table 3 shows that GWOGA consistently outperformed the other algorithms on all datasets, indicating its ability to identify small, representative feature subsets. This is particularly crucial for AD research, where biologists typically use limited AD samples for experiments.


GWOGA’s fitness score makes it highly suitable for AD data analysis and potential biomarker identification.

Table 3. The mean fitness scores of the different datasets

Mean Fitness     | GWOGA | Variance Threshold | Select K Best | RFE   | BPSO  | SA
Gas              | 0.047 | 0.154              | 0.267         | 0.533 | 0.728 | 0.115
Rejafada         | 0.104 | 0.116              | 0.223         | 0.524 | 0.655 | 0.299
Gastrointestinal | 0.028 | 0.286              | 0.316         | 0.655 | 0.75  | 0.091
SetaT2           | 0.458 | 0.992              | 0.588         | 0.953 | 1.225 | 0.5
SetaT1           | 0.378 | 0.916              | 0.547         | 0.846 | 1.132 | 0.643
Breast Cancer    | 0.153 | 0.692              | 0.254         | 0.544 | 0.832 | 0.153
Heart Disease    | 0.315 | 0.76               | 0.498         | 0.651 | 1.275 | 0.318
NBA              | 0.377 | 1.01               | 0.571         | 0.836 | 1.236 | 0.394
Codon            | 0.135 | 0.285              | 0.223         | NA    | 0.814 | 0.165
Abalone          | 0.353 | 0.354              | 0.488         | 0.69  | 1.173 | 0.363
News Popularity  | 0.443 | 0.699              | 0.577         | NA    | 1.127 | 0.49

3.2 Compare with Different Crossover Operators

This section evaluates the performance of the Grey Wolf Optimizer (GWO) as a crossover operator in genetic algorithms, comparing it to various other crossover operators. The comparison is based on the GSE15222 dataset, with feature subsets containing one, three, five, seven, and nine genes. From Fig. 2, it is evident that GWOGA achieves improved average accuracy, precision, recall, and F1 score compared to the other crossover operators. The single-point crossover operator shows the worst classification performance, with accuracy decreasing as the feature subset grows; this issue can occur with other fixed-rate crossover operators as well. Using GWO as the crossover operator in the GA mitigates this problem and enhances classification accuracy and stability, identifying more representative characteristic genes from the entire gene set than the other operators.

3.3 Real AD Data Applications

In this section, we applied the GWOGA feature selection algorithm to four real AD datasets, comparing it with five state-of-the-art methods. As shown in Supplementary Figs. 1 and 2, GWOGA achieved the highest accuracy and precision values on three datasets, and slightly lower values on one dataset due to probe mismatches. The average fitness score showed that GWOGA outperformed all methods across the AD datasets, especially in cases with a large number of features and a small sample size (Fig. 3). On one dataset, the SA method performed as well as GWOGA, but SA showed poor performance on the other datasets. The Variance Threshold method’s scores were also not as favorable as GWOGA’s, indicating GWOGA’s stability across all datasets.

Fig. 2. Comparison of accuracy, F1 score, precision, and recall between GWOGA and other crossover operators (uniform, one-point, two-point, partially matched, uniform partially matched, ordered, and simulated binary) on the GSE15222 data when 1, 3, 5, 7, and 9 genes were selected.

Fig. 3. Comparison of mean fitness between GWOGA and other feature selection methods (no feature selection, Variance Threshold, Select K Best, RFE, BPSO, and SA) on four real AD datasets (GSE15222, GSE173955, GSE203206, and GSE1297).

4 Discussion

Alzheimer’s disease research faces challenges in feature selection due to data complexity, heterogeneity, and limited availability. Biomarker selection for diagnosis and monitoring is particularly demanding given the high dimensionality and noise of the data. The genetic algorithm (GA) is a powerful optimization method commonly used for feature selection, but its performance depends on various factors. To enhance the GA’s efficacy, developing a novel crossover operator is essential, especially for the high-dimensional data encountered in Alzheimer’s research; this advancement may lead to the identification of novel biomarkers, providing new insights into disease mechanisms.


This study aimed to enhance the performance of genetic algorithms on high-dimensional data, specifically in Alzheimer’s disease research. A novel crossover operator, GWOGA, inspired by the grey wolf optimizer, was proposed and compared with state-of-the-art feature selection methods on UCI and real AD datasets. GWOGA consistently outperformed traditional crossover operators, achieving comparable accuracy with a smaller feature set, and the fitness score analysis indicated that GWOGA effectively represented the datasets. This study has significant implications for improving the performance of genetic algorithms on high-dimensional data, particularly in the context of Alzheimer’s disease: by proposing a novel crossover operator that overcomes the limitations of traditional methods, it has demonstrated the potential of GWOGA to improve feature selection and classification accuracy in such datasets.

5 Conclusion

We applied GWOGA, a novel crossover operator based on the Grey Wolf Optimizer, to solve feature selection problems in high-dimensional and low-sample datasets. It can accurately predict crucial genes associated with AD. In comparison to five state-of-the-art feature selection methods and seven traditional crossover operators, GWOGA demonstrated a higher classification accuracy in high-dimensional data. The results of real AD data applications proved the effectiveness of this new crossover operator in both gene classification and selection.

Conflict of Interest. On behalf of all authors, the corresponding author states that there is no conflict of interest.

Code Availability. The code of GWOGA is available at https://github.com/guowenbo1/GWOGA1.

References

1. Huang, Y., Mucke, L.: Alzheimer mechanisms and therapeutic strategies. Cell 148(6), 1204–1222 (2012)
2. Mukherjee, S., et al.: Identifying and ranking potential driver genes of Alzheimer’s disease using multiview evidence aggregation. Bioinformatics 35(14), 568–576 (2019)
3. Wang, M., Hao, X., Huang, J., Shao, W., Zhang, D.: Discovering network phenotype between genetic risk factors and disease status via diagnosis-aligned multi-modality regression method in Alzheimer’s disease. Bioinformatics 35(11), 1948–1957 (2019)
4. Trinh, H.-C., Kwon, Y.-K.: A novel constrained genetic algorithm-based Boolean network inference method from steady-state gene expression data. Bioinformatics 37(Supplement 1), 383–391 (2021)
5. Hussein, F., Kharma, N., Ward, R.: Genetic algorithms for feature selection and weighting, a review and study. In: Proceedings of the Sixth International Conference on Document Analysis and Recognition, pp. 1240–1244. IEEE (2001)


6. Jalili, M.: Graph theoretical analysis of Alzheimer’s disease: discrimination of AD patients from healthy subjects. Inf. Sci. 384, 145–156 (2017)
7. Abd El Hamid, M.M., Omar, Y.M., Mabrouk, M.S.: Identifying genetic biomarkers associated to Alzheimer’s disease using support vector machine. In: 2016 8th Cairo International Biomedical Engineering Conference (CIBEC), pp. 5–9. IEEE (2016)
8. Kang, C., Huo, Y., Xin, L., Tian, B., Yu, B.: Feature selection and tumor classification for microarray data using relaxed lasso and generalized multi-class support vector machine. J. Theor. Biol. 463, 77–91 (2019)
9. Alshamlan, H.M., Badr, G.H., Alohali, Y.A.: Genetic bee colony (GBC) algorithm: a new gene selection method for microarray cancer classification. Comput. Biol. Chem. 56, 49–60 (2015)
10. Hussain, I., et al.: Optimizing energy consumption in the home energy management system via a bio-inspired dragonfly algorithm and the genetic algorithm. Electronics 9(3), 406 (2020)
11. Mirjalili, S., Mirjalili, S.M., Lewis, A.: Grey wolf optimizer. Adv. Eng. Softw. 69, 46–61 (2014)
12. Guyon, I., Weston, J., Barnhill, S., Vapnik, V.: Gene selection for cancer classification using support vector machines. Mach. Learn. 46(1), 389–422 (2002)

Research on the Detection Method of Sybil Attacks on Wireless Terminals in Power Internet of Things

Daming Xu1, Kelin Gao1, Zhongbao Hou1, Li Zhang1, Zhipei Wei1, Qi Li1, and Yuzhu Ding2(B)

1 State Grid Baicheng Power Supply Company, Baicheng 137000, Jilin, China
2 Northeast Electric Power University, Jilin 132012, Jilin, China

[email protected]

Abstract. As a key technology for achieving comprehensive coordinated control of new power systems, the Wireless Sensor Network is widely used in the perception layer of the power Internet of Things. The Sybil attack, as a complex form of attack, severely disrupts the positioning, routing, and data transmission functions of Wireless Sensor Networks. This paper proposes a Sybil attack detection scheme based on the Received Signal Strength Indicator. It utilizes special detection nodes in the network to detect Sybil attack nodes. Additionally, a routing strategy based on trust thresholds is proposed. Simulation results show that the algorithm can effectively detect and eliminate the adverse effects of Sybil attacks on network positioning and routing.

Keywords: Power Internet of Things · Wireless Sensor Network · Sybil Attack · Network Localization · Routing

1 Introduction

As the bottom layer of the power Internet of Things (PIoT), the perception layer contains a massive number of terminals and is crucial for achieving comprehensive sensing in the PIoT. To achieve uninterrupted and all-around monitoring of power terminals, a large number of sensing devices need to be deployed. As a wireless access technology, the Wireless Sensor Network (WSN) plays a key role in enabling massive terminal access in the PIoT. A WSN is a wireless network composed of sensor nodes that possess sensing, computation, and communication capabilities. These nodes self-organize to form a network and transmit monitoring information to users in specific environments. Due to limitations in memory, computing power, and energy, it is nearly impossible to replace individual nodes in a WSN [1]. To ensure the normal operation of a WSN under malicious attacks, several necessary measures can be taken [2]. The Sybil attack is a highly threatening attack method against WSNs [3]. Because Sybil attack nodes have obtained legitimate identity credentials within the network, the Sybil attack possesses characteristics such as being difficult to detect, highly


covert, and highly threatening [4]. During the positioning process, a Sybil attack node can impersonate one or multiple beacon nodes to send false coordinate information to other nodes [5], which can increase localization errors. During the routing process, a Sybil attack node can attract other nodes’ data transmissions so that they pass through the attack node [6]; alternatively, the Sybil attack node can collect and transmit information data [7]. Indeed, once such Sybil attack nodes exist in the network, they can have a significant impact on the algorithms used for network localization, data transmission, and energy consumption [9]. As a key technology for achieving comprehensive perception in the smart grid, research on the security of WSNs is crucial to support reliable data transmission and comprehensive perception in the PIoT. Therefore, detecting and defending against Sybil attack nodes in WSNs is an important issue in ensuring the security of wireless terminals in the PIoT.

Currently, researchers have proposed voting algorithms based on the network itself to detect malicious nodes: when locating unknown nodes, the anchor nodes are grouped and the results of each group are calculated; suspicious nodes with large errors are identified and given an additional vote, allowing malicious attack nodes to be identified [10]. Most defense strategies against Sybil attacks use encryption algorithms. Researchers have proposed a key management approach: during the initialization phase, the network pre-distributes a key to each sensor node for generating a unidirectional key chain, and a hash tree is used to allocate an identity certificate to each sensor node, which associates the node with the unidirectional key chain. During the node communication and data transmission phases, the nodes first exchange their identity certificates to verify the legitimacy of each other’s identities; only after successful verification can the nodes begin communication and data transmission [11]. In [12], an OKCIDA mechanism was proposed, which uses one-way key chains for encryption, and an improved LFNA protocol was introduced to address the Sybil attack problem at the routing level. In [13], two robust distance-based methods are proposed to verify malicious attacks discovered through beacon-based localization in sensor networks; these methods filter out malicious beacon signals by examining inconsistencies in the position reference values of different beacon signals, and by removing such malicious data they prevent malicious attacks.

Regarding the adverse effects of Sybil attacks on routing, in the DD routing algorithm [14] the network first evaluates the selected path: if the average trust value of all nodes on the selected path is higher than the trust threshold set by the base station for interest messages, the path is considered secure, and routing selection and data transmission proceed along the original route. Moreover, when Sybil attack nodes are already present, it is important to identify forged data. The tolerance algorithm of [15] exploits the redundancy of reference information in node localization systems, using the unbiased estimate of variance to test conformance with the error hypothesis; this enhances the network’s tolerance to malicious data when Sybil attacks are present.


2 The Impact of Sybil Attacks

2.1 The Impact of Sybil Attacks on Localization

Due to the physical conditions required by localization algorithms, such as signal strength and signal transmission time, as well as the limited communication range of Sybil attack nodes, only a portion of the network may be affected. To completely disable the network, attackers would need to deploy a large number of Sybil attack nodes to cover the entire network. As a result, anchor nodes in the network would be unable to provide coordinate references for unknown nodes, leading to localization failures.

2.2 The Impact of Sybil Attacks on Data Transmission

During data transmission, Sybil attack nodes can disguise themselves as legitimate nodes in the network and present themselves as more suitable for data transmission based on distance coordinates or energy advantage information. This leads the nodes in the network to treat the attack nodes as normal nodes and grant them transmission conditions that cannot be matched by normal nodes. As a result, data intended for normal nodes is redirected to the attack nodes, which replace the normal nodes in the network. This poses a serious threat to the information security of the network.

2.3 The Impact of Sybil Attacks on Routing

Sybil attacks can have the following impacts on routing:

(1) Once a Sybil attack node obtains a legitimate identity within the network, it can disguise itself as a cluster head node. All network nodes within its communication range will send information to the attack node, which can then manipulate or delete the information and send unreliable data to the base station.
(2) The attack node can disguise itself as a regular node and send incorrect information to the cluster head node of its cluster. Alternatively, it can continuously send messages to the cluster head nodes within its region, leaving them unable to properly receive information from other nodes or causing them to receive a large amount of erroneous data.
(3) Attack nodes can change their identities to impersonate important nodes in the routing path, changing the data flow within the network and affecting the distribution of energy consumption among network nodes.

3 Sybil Attack Detection and Resistance

3.1 Sybil Attack Detection Scheme

This paper proposes a Sybil attack detection scheme based on RSSI (Received Signal Strength Indication). The specific process is as follows:

(1) In the network initialization stage, all ordinary nodes broadcast their remaining energy and their own ID number, find two surrounding detection nodes within their communication range, and communicate with them. Assume node S is the attack node; at time t1, the Sybil attack node S sends its own packets to detection nodes D1 and D2 under a disguised identity.


(2) Taking detection node D1 as an example: detection node D1 receives the identity packet of the Sybil attack node S at t1 and records the disguised identity X of the attack node S (X also refers to the node to be detected). At this time, detection node D1 cannot confirm whether node S is really a Sybil attack node. According to the log-path-loss model, the signal strength value (RSSI) of node S observed by detection node D1 is calculated as shown in (1):

$$R_X^{D_1} = \frac{P_t k}{d_{D_1}^{\alpha}} \quad (1)$$

where $P_t$ is the transmission power, $k$ is a tunable parameter, $d_{D_1}$ denotes the Euclidean distance between node D1 and node S, and $\alpha$ is the path loss index, which depends on the network environment: its value ranges from 1.6 to 1.8 in free space and from 4 to 6 when there are many obstacles.

(3) While detection node D1 works, detection node D2 also receives the packet information of the node X to be detected at time t1, performs the same sequence of calculations, and finally obtains $R_X^{D_2}$. After $R_X^{D_2}$ is calculated, the RSSI signal strength ratio is computed using (2):

$$\frac{R_X^{D_2}}{R_X^{D_1}} = \frac{P_t k}{d_{D_2}^{\alpha}} \bigg/ \frac{P_t k}{d_{D_1}^{\alpha}} \quad (2)$$

(4) After a period t0 has passed, that is, at time t1 + t0, the attack node S starts to broadcast its information packet under another identity Y. At this time, detection nodes D1 and D2 perform the same algorithm steps as before to calculate the RSSI signal strength ratio at time t1 + t0, and the calculated results are substituted into (3) to obtain the decision ratio:

$$\frac{R_X^{D_2}}{R_X^{D_1}} - \frac{R_Y^{D_2}}{R_Y^{D_1}} \approx 0 \quad (3)$$
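As a compact illustration of Eqs. (1)–(3), the detection decision can be sketched as follows. This is a minimal sketch written for this text (the function names and the tolerance eps are assumed implementation details, not specified in the paper):

def rssi(p_t, k, d, alpha):
    # Eq. (1): log-path-loss model, received strength at distance d
    return p_t * k / d ** alpha

def decision_ratio(r_d2, r_d1):
    # Eq. (2): ratio of the strengths observed by detection nodes D2 and D1
    return r_d2 / r_d1

def is_sybil(ratio_at_t1, ratio_at_t1_plus_t0, eps=1e-3):
    # Eq. (3): two identities broadcast from the same position yield
    # (almost) identical ratios, so a near-zero difference flags node S
    return abs(ratio_at_t1 - ratio_at_t1_plus_t0) < eps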

Since common nodes have fixed ID numbers and the distances from node S to detection nodes D1 and D2 differ, if the two decision ratios are almost equal, node S has held two identities at the same location at two different times. If node S keeps changing identities, repeating the above steps can conclusively determine whether node S is an attack node. Detection nodes D1 and D2 report the decision result to the base station and record it, and the base station takes countermeasures against the Sybil attack nodes in the network.

3.2 Improved Positioning Strategy Against Sybil Attacks

This article proposes an improved positioning method based on the sum of squared residuals. First, anchor node authentication is performed; the process is as follows:


(1) The anchor node broadcasts its own position coordinates and ID number, while also listening for and saving the position coordinates and ID numbers of all other anchor nodes within its communication range.
(2) The anchor node uses trilateration to determine its own position. After positioning is completed, the positioning error rate is calculated. If the positioning error rate exceeds 50%, the anchor node broadcasts to the network the group of anchor nodes whose positioning errors exceed the threshold.
(3) When other nodes receive this packet, they record the ID and location information of the anchor nodes in the packet and mark them with a count of 1. If the same node appears in different packets, and one of the packets was broadcast because of excessive errors, the receiving nodes increment that node’s count by 1, and so on until the process ends. Packets whose anchor node information yields normal calculation results are broadcast to inform surrounding nodes; if an anchor node in a normal packet had previously been incremented, the receiving nodes decrement its count based on the anchor node’s ID.
(4) If the same anchor node is recorded multiple times by surrounding nodes, it is identified as a camouflage node, i.e., a false coordinate disguised by a Sybil attack node.

Secondly, this article proposes a localization method that uses the sum of squared residuals to resist Sybil attacks. Assume the anchor nodes closest to the unknown node U are B1, B2, and the Sybil attack node S. The specific steps are as follows:

(1) The unknown node U collects the location and ID information of all surrounding anchor nodes that declare an identity, and divides the declared anchor positions within its communication range into three groups.
(2) The unknown node U locates itself based on each group of anchor nodes and obtains a positioning result for each group. Denoting a positioning result as $(x_u, y_u)$, the distance between this result and each of the three anchor nodes in its group is calculated using (4):

$$d_{UB} = \sqrt{(x_u - x_B)^2 + (y_u - y_B)^2} \quad (4)$$

where $(x_B, y_B)$ are the position coordinates of an anchor node in the group. Formula (5) gives the residual measure β of the positioning result of each group:

$$\beta = \frac{\sum_{i=1}^{n}\left|\sqrt{(x_u - x_B)^2 + (y_u - y_B)^2} - \bar{d}\right|}{n} \quad (5)$$

where $\bar{d}$ is the sum of the distances between each group’s positioning coordinates and each anchor node, divided by the number of groups.


Then, the calculated β values are averaged over the groups of anchor nodes, as shown in (6):

$$\bar{\beta} = \frac{\beta_1 + \beta_2 + \ldots + \beta_n}{n} \quad (6)$$

(3) Compare the β value of each group with $\bar{\beta}$. When $\beta > \bar{\beta}$, the positioning error of that group is significantly greater than the average error, and the group is no longer used for positioning.

3.3 Improved Routing Strategy to Counter Sybil Attacks

This article proposes a routing strategy for multi-path information transmission. The specific process is as follows:

1) Initialize the trust threshold. All nodes in the network are initialized with a trust value of η = 0.5. The detection nodes in the network detect Sybil attack nodes: since the RSSI from a Sybil attack node to a detection node does not change with its disguise at a given time, when two detection nodes simultaneously detect a Sybil attack node nearby, they can determine the approximate location of the attack node in the network, as well as the IDs of the nodes contained within its region.
2) Adjust the trust threshold. Once the detection nodes determine the Sybil attack region, they broadcast the ID numbers of the nodes within that region. Each time such a broadcast is received, the trust threshold of the listed nodes is halved. For areas where no Sybil attack is detected, the detection nodes broadcast trust-strengthening packets keyed by node ID; each time a node receives such a packet, its trust threshold increases by 0.1 (see the sketch after this list).
3) Cluster head election phase. During the election of cluster head nodes, nodes with higher trust thresholds have a greater probability of being elected as cluster head nodes.
4) Routing phase. Trust thresholds are compared on a gradient basis: the previous node always passes the routing path to the neighbor with the highest trust threshold, so routing paths transmit information along the path with the highest trust gradient. Cluster head nodes select two or more of the most trustworthy nodes within their neighbor range to forward information, so that the same information is transmitted over multiple routing paths. When the base station receives the information, it first checks whether the copies are identical; if the transmitted data are the same, the data have not been tampered with.
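The trust-threshold bookkeeping of steps 1) and 2) can be summarized in a short sketch. This is our illustration of the rules stated above (the helper names and dictionary representation are assumptions; the paper publishes no code):

def init_trust(node_ids):
    # Step 1): every node starts with trust eta = 0.5
    return {nid: 0.5 for nid in node_ids}

def on_sybil_region_broadcast(trust, region_ids):
    # Step 2): halve the trust of every node reported inside a Sybil attack region
    for nid in region_ids:
        trust[nid] /= 2.0

def on_strengthen_broadcast(trust, clean_ids):
    # Step 2): strengthen the trust of nodes in regions with no detected attack
    for nid in clean_ids:
        trust[nid] += 0.1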


4 Simulation Experiment and Analysis

4.1 Simulation Experiment and Analysis

To validate the effectiveness of the algorithm, simulations were conducted in Matlab. The simulation area was set to a 300 × 300 region, and in each round a variable number of Sybil attack nodes, denoted S, was introduced. Regular nodes had an energy of 2 J, while detection nodes had an energy of 5 J. Table 1 lists the other parameters.

Table 1. Simulation parameters

Simulation parameter                      | Value
Number of nodes                           | 400
Node energy consumption, E_elec           | 50 nJ/bit
Data fusion energy consumption, E_DA      | 5 nJ/bit/packet
Amplifier free-space energy, ε_fs         | 100 pJ/bit/m²
Amplifier multipath loss, ε_mp            | 0.013 pJ/bit/m⁴
Packet length, k                          | 2000 bits
Information packet, ifo                   | 40 bits
Signal sampling rate                      | 500

4.2 Algorithm Simulation and Analysis

Figure 1 shows the detection of Sybil attack nodes. The red line represents the number of Sybil attack nodes in the network, and the black line represents the number of detected attack nodes and forged legitimate identities. As can be seen from the figure, the number of counterfeit legitimate identities detected in the network is far greater than the actual number of attack nodes.

Figure 2 presents a comparison of localization accuracy under various scenarios: the positioning errors in the network under normal circumstances, after algorithm processing, and when subjected to a Sybil attack. In this figure, 50 rounds of unknown nodes under Sybil attack were selected and their errors averaged, and the results were compared with the unknown nodes with the same IDs under normal conditions and under algorithm correction. The resulting positioning error is the average error value, computed as shown in (7):

$$E = \frac{1}{N}\sum_{i=1}^{N}\frac{\sqrt{(x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2}}{R} \quad (7)$$

where E is the average error value, N is the number of attacked nodes in the network, $(x_i, y_i)$ and $(\hat{x}_i, \hat{y}_i)$ are the true and estimated coordinates of node i, and R is the communication radius.
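Eq. (7) translates directly into code; a small sketch follows (the array names are illustrative, not from the paper):

import numpy as np

def avg_positioning_error(true_xy, est_xy, R):
    # Eq. (7): per-node localization error, normalised by the
    # communication radius R, averaged over the N attacked nodes
    d = np.linalg.norm(true_xy - est_xy, axis=1)
    return np.mean(d / R)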


Fig. 1. The detection rate of Sybil attack nodes and the number of forged identities

As can be seen from Fig. 2, the positioning error after algorithm correction decreases by approximately 37%, although a certain gap remains compared with the positioning error under normal conditions. The error, however, stays within an acceptable range.

Fig. 2. Comparison of average positioning errors of attacked nodes

Figure 3 compares node localization errors under different communication radii, including the localization error under attack, the localization error after algorithm correction, and the localization error under normal conditions. When the communication radius is 30 m, the localization error decreases by 30.7%; when the communication radius is 40 m, it decreases by approximately 27.2%. The graph also shows that increasing the communication radius significantly improves node localization accuracy.

Figure 4 shows the variation in the data packet reception rate in the network under different conditions. The data packet reception rate is defined as the ratio of the number of data packets successfully received at the base station to the number of data packets sent.


Fig. 3. Average positioning error of nodes under different communication radii of this algorithm

From the graph, it can be observed that the packet reception rate under this algorithm remains stable within an acceptable range.

Fig. 4. Packet reception rate

5 Conclusion

This paper provides a detailed analysis of Sybil attacks and proposes an RSSI-based Sybil attack detection scheme that uses detection nodes within the network to identify forged disguise nodes during a specific time period. Additionally, a routing strategy based on trust thresholds is proposed. Experimental simulation results demonstrate that the proposed detection strategy effectively detects disguised nodes and Sybil attack nodes within the network. In terms of resisting Sybil attacks, the proposed algorithm improves localization accuracy and data transmission quality. It helps to collect accurate perception data for the power-grid Internet of Things, enabling uninterrupted and comprehensive perception in the PIoT.


References

1. Li, J.P., Li, G.C., Chu, S.C., Gao, M., Pan, J.S.: Modified parallel tunicate swarm algorithm and application in 3D WSNs coverage optimization. J. Internet Technol. 23(2), 227–244 (2022)
2. Li, J.P., Gao, M., Pan, J.S., Chu, S.C.: A parallel compact cat swarm optimization and its application in DV-Hop node localization for wireless sensor network. Wirel. Netw. 27(3), 2081–2101 (2021)
3. Gou, S.N.: Application research about wireless sensor network nodes location algorithm based on improved RSSI. Appl. Res. Comput. 29(5), 1867–1869 (2012)
4. Hou, S.F., Zhou, X.J., Yan, B.: Mobile node localization for wireless sensor networks. J. Chinese Comput. Syst. 32(6), 1081–1084 (2011)
5. Zhu, X.J., Meng, X.R.: An algorithm of mobile node localization based on weighted centroid for wireless sensor networks. Comput. Eng. Sci. 33(11), 15–19 (2011)
6. Want, R., Hopper, A., Falcao, V.: The active badge location system. ACM Trans. Inf. Syst. 10(1), 91–102 (1992)
7. Yu, Q., Sun, S.Y., Xu, B.G., Chen, S.J.: Node localization in wireless sensor networks based on improved particle swarm optimization. J. Comput. Appl. 35(6), 1519–1522 (2015)
8. Li, J.P., Han, Q., Wang, W.T.: Characteristics analysis and suppression strategy of energy hole in wireless sensor networks. Ad Hoc Netw. 135, 1–12 (2022)
9. Yaghoubi, F., Abbasfar, A., Maham, B.: Energy-efficient RSSI-based localization for wireless sensor networks. IEEE Commun. Lett. 18(6), 973–976 (2014)
10. Abu-Shaban, Z., Zhou, X.Y., Abhayapala, T.D.: A novel TOA-based mobile localization technique under mixed LOS/NLOS conditions for cellular networks. IEEE Trans. Veh. Technol. 65(11), 8841–8853 (2016)
11. Shan, Z.L., Liu, L.H., Zhang, Y.S., Huang, G.X.: A strong self-adaptivity localization algorithm based on gray prediction model for mobile nodes. J. Electron. Inf. Technol. 36(6), 1492–1497 (2014)
12. Baggio, A., Langendoen, K.: Monte Carlo localization for mobile wireless sensor networks. Ad Hoc Netw. 6(5), 718–733 (2008)
13. Abouzar, P., Michelson, D.G., Hamdi, M.: RSSI-based distributed self-localization for wireless sensor networks used in precision agriculture. IEEE Trans. Wireless Commun. 15(10), 6638–6650 (2016)
14. He, T., Huang, C.D., Blum, B., Stankovic, J., Abdelzaher, T.: Range-free localization schemes for large scale sensor networks. In: 9th Annual International Conference on Mobile Computing and Networking, pp. 81–95. San Diego (2003)
15. Chen, L., Tian, B., Lin, W.L., Ji, B., Li, J.Z., Pan, H.H.: Analysis and prediction of the discharge characteristics of the lithium-ion battery based on the grey system theory. IET Power Electron. 8(12), 2361–2369 (2015)

Probability Vector Enhanced Tumbleweed Optimization Algorithm

Yang-Zhi Chen1, Ruo-Bin Wang1(B), Hao-Jie Shi1, Rui-Bin Hu1, and Lin Xu2

1 School of Information Science and Technology, North China University of Technology, Beijing 100144, China
[email protected]
2 STEM, University of South Australia, Adelaide 5095, Australia

Abstract. As optimization problems become increasingly complex, the demand on hardware capabilities has also grown. In this paper, we propose the Probability Vector Enhanced Tumbleweed Optimization Algorithm (PVE-TOA), inspired by compact Particle Swarm Optimization (cPSO) and the Tumbleweed Optimization Algorithm (TOA), aiming to improve the performance and spatial utilization of the TOA algorithm. Unlike the modifications that cPSO makes to the Particle Swarm Optimization algorithm, PVE-TOA introduces modifications that enhance the performance of TOA in specific optimization scenarios while retaining the ability to reduce memory consumption.

Keywords: Probability Vector Enhanced Tumbleweed Optimization Algorithm · Tumbleweed Optimization Algorithm · Compact Optimization

1 Introduction

There are numerous methods available to enhance the performance of intelligent algorithms. One such approach is to leverage parallel strategies, which can lead to improved optimization solutions and faster convergence rates. Notable examples of parallel strategies include the parallel genetic algorithm (PGA) [1], the parallel gannet optimization algorithm (PGOA) [2], and the adaptive parallel arithmetic optimization algorithm (APAOA) [3]. Another effective improvement method involves using probability models to replace populations in meta-heuristic algorithms, resulting in reduced runtime memory usage. This technique has been successfully applied in algorithms like the compact genetic algorithm (cGA) [4], the compact co-evolutionary algorithm (CCEA) [5], and the improved compact cuckoo search algorithm (icCS) [6]. Additionally, the binary conversion of algorithms provides a valuable solution for tackling discrete problems, such as feature selection; algorithms like the improved binary symbiotic organism search algorithm (IBSOS) [7] and the binary quasi-affine transformation evolution algorithm (BQUATRE) [8] demonstrate the efficacy of this approach. Moreover, combining multiple improvement methods can lead to algorithms with enhanced overall performance. Examples of such combinations include the parallel


compact gannet optimization algorithm (PCGOA) [9] and the particle binary rafflesia optimization algorithm (PBROA) [10].

In the current complex optimization landscape, there is a growing demand for advanced algorithms. However, certain real-world applications face hardware limitations, such as cost and space constraints. To address this challenge, we propose the probability vector enhanced tumbleweed optimization algorithm (PVE-TOA), a novel optimization algorithm that utilizes probability vectors to enhance the tumbleweed optimization algorithm (TOA) [11], resulting in improved memory usage and algorithm performance.

In the subsequent sections, we briefly introduce compact particle swarm optimization (cPSO) [12] and TOA in the Related Work section, followed by a detailed presentation of PVE-TOA. We then analyze and compare experimental data to gain insights into its performance. Finally, the paper concludes with a summary of findings and potential future research directions.

2 Related Work

2.1 Compact Particle Swarm Optimization

cPSO is an improved version of particle swarm optimization (PSO) [13]; it is a population-less optimization algorithm that uses a virtual population in place of an actual population of solutions. Unlike fuzzy hierarchical surrogate-assisted probabilistic particle swarm optimization (FHSAPPSO) [14], the comprehensive learning particle swarm optimizer (CLPSO) [15], and the dynamic neighborhood learning based particle swarm optimizer (DNLPSO) [16], which focus on improving algorithm speed or performance, cPSO emphasizes efficient memory utilization. The virtual population is encoded in a data structure known as the probability vector (PV) [4], represented by an n × 2 matrix:

$$PV^t = [\mu^t, \sigma^t] \quad (1)$$

n represents the number of dimensions, μ denotes the mean values, and σ the standard deviation values, describing a Gaussian probability distribution function. The algorithm initializes the virtual population and uses a sampling mechanism to obtain the values of the design variables. During the optimization process, the winning solution influences the mean and standard deviation values of the PV. By leveraging the PV, cPSO achieves efficient memory consumption, yielding improved optimization results while conserving resources. The update formulas for μ and σ in the population are as follows:

$$\mu^{t+1}[i] = \mu^t[i] + \frac{1}{N_p}\left(winner[i] - loser[i]\right) \quad (2)$$

$$\sigma^{t+1}[i] = \sqrt{(\sigma^t[i])^2 + (\mu^t[i])^2 - (\mu^{t+1}[i])^2 + \frac{1}{N_p}\left(winner[i]^2 - loser[i]^2\right)} \quad (3)$$


$N_p$ represents the population size; after comparing two individuals within the population, the one with the better fitness value is considered the winner, while the one with the worse fitness value is considered the loser. For more details, please refer to the formulas presented in [4].

2.2 Tumbleweed Optimization Algorithm

The TOA algorithm is inspired by the growth process of tumbleweeds, where the optimization process is divided into two stages, global exploration and local exploitation, analogous to the dynamic movement of adult tumbleweeds with the wind and the stationary growth of seedlings, respectively. The algorithm uses mathematical modeling based on the behavior of tumbleweeds to solve optimization problems. TOA possesses a multi-group structure, and these groups change after a certain number of iterations, similar to the bamboo forest growth optimization (BFGO) [17] algorithm. Each subpopulation is determined by the k-means algorithm [18] and represents a subregion of the search space. The TOA algorithm consists of two stages, seedling growth and seed propagation, corresponding to local exploitation and global exploration [19], respectively. During the seedling growth stage, individual fitness is computed to determine the interaction with external populations: individuals with higher fitness values can communicate with external populations, while those with lower fitness values can only interact within their own subpopulation. In the seed propagation stage, the algorithm updates its parameters to imitate the process of mature tumbleweeds spreading seeds. TOA exhibits strong performance on numerical optimization problems, with robust global search capabilities.

3 Probability Vector Enhanced Tumbleweed Optimization Algorithm

3.1 Basic Idea

PVE-TOA improves the original TOA by using a PV to generate random points, enhancing its exploration and exploitation capabilities. The value of σ controls the range of the random points: a smaller σ yields points closer to μ, while a larger σ allows points farther from μ. PVE-TOA uses this property to aid exploration and exploitation around the optimal position.


The core idea is to reduce σ when μ is close to the optimum, so that the generated random points land near it and can discover even better positions. Conversely, when μ is in a less favorable position, increasing σ allows the random points to explore more distant locations. However, the original update formulas for μ and σ cannot achieve this objective, prompting the PVE modifications. To implement this concept in TOA’s multi-group structure, PVE-TOA replaces each group’s stored particle swarm with a PV represented by μ and σ; updating the PV now serves as updating the group. As the local exploitation and global exploration phases alternate, the PV update at each stage must anticipate the next stage and adjust σ appropriately, ensuring a smooth transition and optimal performance as the algorithm progresses between the two stages.

3.2 Seedling Growth Stage

This stage focuses on local exploitation, aiming to find better points in the vicinity of μ. A smaller value of σ is therefore required so that the randomly generated points lie near μ and assist the exploitation process. Before each group update, a particle swarm is randomly generated from the PV, the fitness values of these random particles are evaluated, and the group’s local best (pbest) and its fitness value (pbestval) are updated. Subsequently, the particle update of TOA’s seedling growth stage is applied to the randomly generated swarm. After the update, the fitness of pbest is compared with that of μ to determine the winner and loser:

$$[winner, loser] = compete[pbest, \mu] \quad (4)$$

After the comparison, μ is updated using Eq. (2) to ensure convergence to a better position. Additionally, to prepare for the subsequent global exploration phase, the original Eq. (3) is used to adjust σ, with absolute-value notation added to ensure that σ remains a real number:

$$\sigma^{t+1}[i] = \sqrt{\left|(\sigma^t[i])^2 + (\mu^t[i])^2 - (\mu^{t+1}[i])^2 + \frac{1}{N_p}\left(winner[i]^2 - loser[i]^2\right)\right|} \quad (5)$$

This formula exhibits a certain level of adaptability: as all groups converge in the later stages, it prevents large fluctuations in σ.


3.3 Seed Propagation Stage

During the global exploration phase, a relatively large value of σ should be used so that the points randomly generated from the PV lie far from μ, supporting the exploration process. In this stage, a group-sized set of points is again randomly generated from the PV and then updated following the original TOA seed propagation stage. After the update, winners and losers are determined according to Eq. (4). In preparation for the next local exploitation phase, σ must be reduced during this stage; for this purpose, Eq. (3) is adjusted as follows:

$$\sigma^{t+1}[i] = \frac{1}{N_p}\sqrt{\left|(\sigma^t[i])^2 + (\mu^t[i])^2 - (\mu^{t+1}[i])^2 + \frac{1}{N_p}\left(winner[i]^2 - loser[i]^2\right)\right|} \quad (6)$$

With this adjustment, σ decreases rapidly during the global exploration phase, yielding a smaller σ for the subsequent local exploitation phase. This adaptive behavior lets the algorithm switch efficiently between exploration and exploitation, improving its overall ability to find optimal solutions.

Throughout the optimization process, memory consumption was a central consideration. Initially, the states of all particles had to be stored; this is replaced by the need to store only a few sets of probability vectors. While runtime still requires storing one group of particles, with an appropriate number of groups this approach considerably reduces memory usage, making the algorithm more resource-friendly and adaptable to real-world applications with limited hardware resources.
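The stage-dependent PV update of Eqs. (2), (5), and (6) can be condensed into a short sketch. This is our illustration (the function names and the small positive floor that keeps σ > 0, as required in Algorithm 1, are assumptions):

import numpy as np

def sample_individual(mu, sigma, rng):
    # Draw one virtual individual from the Gaussian PV (Sect. 2.1)
    return rng.normal(mu, sigma)

def update_pv(mu, sigma, winner, loser, n_p, propagation_stage):
    mu_new = mu + (winner - loser) / n_p                      # Eq. (2)
    var = sigma**2 + mu**2 - mu_new**2 + (winner**2 - loser**2) / n_p
    sigma_new = np.sqrt(np.abs(var))                          # Eq. (5)
    if propagation_stage:
        sigma_new = sigma_new / n_p                           # Eq. (6): shrink sigma
    return mu_new, np.maximum(sigma_new, 1e-12)               # keep sigma > 0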

4 Numerical Experimental Analysis

In this section, we present the pseudocode used and the results obtained from extensive testing on the well-known CEC2013 [20] benchmark. The experiments were carried out in two settings, with 30 and 40 dimensions. Each run consisted of 1000 iterations and was repeated independently 30 times, using a population of 30. Throughout the experiments, the seed propagation stage and seedling growth stage were executed alternately every 25 iterations, ensuring a robust evaluation of the proposed PVE-TOA algorithm. To gauge the algorithm’s performance, we compared it with the original TOA under two conditions, k = 3 and k = 4 (Tables 1 and 2):


Algorithm 1. Probability Vector Enhanced Tumbleweed Optimization Algorithm

Input: f(x): objective function; ps: population size; K: maximum number of groups; dim: problem dimension; gc: growth cycle; Max_gen: maximum number of iterations
Output: optimal gbest and optimal value f(gbest)

ps_g = round(ps / K)
for k = 1 to K do
    Initialize Group(k).μ and Group(k).σ
end for
for gen = 2 to Max_gen do
    grow_iter = mod(gen, gc)
    for k = 1 to K do
        for i = 1 to ps_g do
            Create X(i) via Group(k).PV
            Get fit(i) from X(i)
        end for
        if grow_iter < gc/2 then
            for i = 1 to ps_g do
                update X(i) by TOA's Seedling Growth Stage
            end for
            [winner, loser] = compete[Group(k).pbest, Group(k).μ]
            for i = 1 to dim do
                update Group(k).PV using Eq. (2), Eq. (5)
            end for
        else
            for i = 1 to ps_g do
                update X(i) by TOA's Seed Propagation Stage
            end for
            [winner, loser] = compete[Group(k).pbest, Group(k).μ]
            for i = 1 to dim do
                update Group(k).PV using Eq. (2), Eq. (6)
                keep Group(k).σ > 0
            end for
        end if
    end for
end for

PVE-TOA generally outperforms TOA in most functions, showing smaller mean values and lower standard deviations for increased stability. In some functions, PVE-TOA performs similarly to TOA, with no significant differences in mean and standard deviation. However, in a few functions, PVE-TOA slightly lags behind TOA with minor variations. Overall, PVE-TOA demonstrates superior or comparable performance in the majority of functions, showcasing its potential for complex optimization tasks. Further research will explore enhancements in specific functions and reveal insights into PVE-TOA’s broader applications.

Table 1. The comparison between PVE-TOA and TOA on 30D.

Function | PVE-TOA (K=3) mean / std | TOA (K=3) mean / std | PVE-TOA (K=4) mean / std | TOA (K=4) mean / std
F1  | 8.55E-06 / 3.63E-05 | 5.42E-01 / 1.45E-01 | 2.32E-07 / 4.89E-07 | 1.20E+00 / 2.86E-01
F2  | 8.73E+06 / 3.80E+06 | 1.89E+07 / 8.76E+06 | 9.20E+06 / 4.29E+06 | 1.66E+07 / 8.35E+06
F3  | 5.02E+08 / 5.79E+08 | 2.75E+09 / 3.14E+09 | 2.58E+08 / 2.46E+08 | 2.03E+09 / 2.12E+09
F4  | 3.60E+04 / 9.97E+03 | 6.51E+04 / 2.06E+04 | 4.37E+04 / 1.37E+04 | 7.57E+04 / 2.11E+04
F5  | 3.02E-04 / 2.81E-04 | 8.23E-01 / 3.20E-01 | 1.05E-04 / 9.87E-05 | 1.37E+00 / 5.07E-01
F6  | 3.70E+01 / 1.72E+01 | 5.64E+01 / 2.97E+01 | 4.36E+01 / 2.32E+01 | 6.62E+01 / 3.11E+01
F7  | 1.08E+02 / 3.87E+01 | 1.10E+02 / 3.53E+01 | 1.02E+02 / 3.66E+01 | 8.67E+01 / 3.32E+01
F8  | 2.10E+01 / 4.91E-02 | 2.10E+01 / 4.44E-02 | 2.10E+01 / 5.65E-02 | 2.10E+01 / 5.92E-02
F9  | 3.05E+01 / 2.12E+00 | 2.69E+01 / 3.66E+00 | 3.02E+01 / 2.59E+00 | 2.69E+01 / 3.37E+00
F10 | 4.40E+00 / 2.06E+00 | 1.65E+01 / 6.37E+00 | 3.47E+00 / 2.97E+00 | 1.82E+01 / 7.30E+00
F11 | 7.11E+01 / 2.43E+01 | 1.14E+02 / 5.81E+01 | 6.46E+01 / 2.68E+01 | 1.10E+02 / 4.02E+01
F12 | 1.33E+02 / 3.91E+01 | 1.62E+02 / 1.17E+02 | 1.52E+02 / 3.21E+01 | 1.45E+02 / 5.13E+01
F13 | 1.90E+02 / 4.09E+01 | 1.95E+02 / 4.14E+01 | 1.73E+02 / 3.48E+01 | 1.96E+02 / 3.13E+01
F14 | 4.52E+03 / 1.12E+03 | 2.36E+03 / 4.88E+02 | 5.20E+03 / 1.01E+03 | 3.41E+03 / 1.38E+03
F15 | 5.48E+03 / 7.18E+02 | 6.40E+03 / 1.18E+03 | 5.79E+03 / 6.12E+02 | 6.95E+03 / 4.38E+02
F16 | 2.39E+00 / 7.99E-01 | 3.10E+00 / 3.54E-01 | 2.29E+00 / 6.19E-01 | 3.10E+00 / 3.47E-01
F17 | 1.62E+02 / 4.99E+01 | 2.30E+02 / 2.02E+01 | 1.42E+02 / 4.07E+01 | 2.26E+02 / 1.51E+01
F18 | 2.14E+02 / 3.83E+01 | 2.51E+02 / 2.04E+01 | 2.31E+02 / 3.74E+01 | 2.44E+02 / 1.50E+01
F19 | 7.99E+00 / 3.63E+00 | 1.52E+01 / 1.71E+00 | 7.66E+00 / 4.11E+00 | 1.57E+01 / 1.51E+00
F20 | 1.35E+01 / 1.22E+00 | 1.33E+01 / 9.19E-01 | 1.30E+01 / 1.08E+00 | 1.34E+01 / 1.00E+00
F21 | 2.97E+02 / 9.72E+01 | 3.27E+02 / 8.34E+01 | 3.08E+02 / 8.79E+01 | 2.96E+02 / 6.79E+01
F22 | 4.95E+03 / 1.45E+03 | 2.70E+03 / 7.87E+02 | 5.63E+03 / 1.02E+03 | 3.35E+03 / 1.31E+03
F23 | 6.25E+03 / 7.54E+02 | 6.10E+03 / 1.41E+03 | 6.41E+03 / 7.16E+02 | 6.64E+03 / 6.19E+02
F24 | 2.84E+02 / 1.34E+01 | 2.76E+02 / 8.36E+00 | 2.84E+02 / 1.17E+01 | 2.71E+02 / 1.03E+01
F25 | 2.99E+02 / 1.35E+01 | 2.89E+02 / 9.67E+00 | 3.01E+02 / 1.15E+01 | 2.83E+02 / 9.14E+00
F26 | 3.00E+02 / 8.89E+01 | 3.45E+02 / 5.79E+01 | 3.16E+02 / 8.34E+01 | 3.45E+02 / 5.78E+01
F27 | 1.10E+03 / 1.05E+02 | 9.98E+02 / 9.46E+01 | 1.12E+03 / 7.52E+01 | 9.78E+02 / 1.04E+02
F28 | 4.65E+02 / 4.26E+02 | 4.80E+02 / 4.29E+02 | 4.18E+02 / 3.59E+02 | 3.79E+02 / 2.36E+02

Table 2. The comparison between PVE-TOA and TOA on 40D.

Function | PVE-TOA (K=3) mean / std | TOA (K=3) mean / std | PVE-TOA (K=4) mean / std | TOA (K=4) mean / std
F1  | 4.34E-01 / 7.99E-01 | 5.79E+00 / 1.67E+00 | 5.20E-02 / 1.60E-01 | 9.65E+00 / 2.92E+00
F2  | 2.49E+07 / 9.73E+06 | 4.58E+07 / 1.89E+07 | 2.23E+07 / 1.15E+07 | 5.19E+07 / 1.92E+07
F3  | 3.98E+09 / 2.60E+09 | 1.55E+10 / 8.21E+09 | 3.39E+09 / 2.45E+09 | 1.69E+10 / 8.87E+09
F4  | 7.48E+04 / 1.77E+04 | 1.29E+05 / 2.29E+04 | 7.38E+04 / 1.52E+04 | 1.42E+05 / 2.59E+04
F5  | 1.03E+00 / 9.13E-01 | 2.67E+01 / 1.21E+01 | 2.38E-01 / 1.96E-01 | 2.65E+01 / 1.39E+01
F6  | 7.50E+01 / 3.52E+01 | 1.01E+02 / 5.93E+01 | 5.72E+01 / 2.10E+01 | 8.38E+01 / 3.61E+01
F7  | 1.58E+02 / 3.39E+01 | 1.74E+02 / 3.31E+01 | 1.57E+02 / 3.21E+01 | 1.59E+02 / 2.96E+01
F8  | 2.12E+01 / 4.27E-02 | 2.12E+01 / 3.65E-02 | 2.12E+01 / 4.05E-02 | 2.12E+01 / 3.86E-02
F9  | 5.96E+01 / 4.30E+00 | 5.58E+01 / 5.54E+00 | 6.07E+01 / 3.97E+00 | 5.62E+01 / 5.84E+00
F10 | 4.73E+01 / 1.63E+01 | 1.19E+02 / 3.12E+01 | 3.43E+01 / 1.36E+01 | 1.13E+02 / 3.70E+01
F11 | 2.02E+02 / 5.72E+01 | 2.78E+02 / 8.66E+01 | 1.80E+02 / 5.40E+01 | 2.34E+02 / 4.73E+01
F12 | 3.69E+02 / 8.69E+01 | 3.33E+02 / 7.36E+01 | 3.64E+02 / 7.08E+01 | 3.70E+02 / 8.50E+01
F13 | 4.76E+02 / 8.34E+01 | 4.75E+02 / 5.16E+01 | 4.46E+02 / 6.35E+01 | 4.56E+02 / 3.13E+01
F14 | 9.97E+03 / 2.07E+03 | 5.97E+03 / 8.94E+02 | 1.09E+04 / 1.33E+03 | 6.97E+03 / 2.36E+03
F15 | 1.19E+04 / 7.72E+02 | 1.33E+04 / 1.31E+03 | 1.21E+04 / 1.07E+03 | 1.36E+04 / 5.27E+02
F16 | 2.86E+00 / 7.87E-01 | 4.14E+00 / 3.68E-01 | 3.41E+00 / 5.72E-01 | 4.09E+00 / 3.76E-01
F17 | 4.02E+02 / 1.14E+02 | 4.82E+02 / 3.35E+01 | 3.92E+02 / 7.51E+01 | 4.76E+02 / 2.42E+01
F18 | 5.21E+02 / 1.03E+02 | 5.18E+02 / 2.76E+01 | 5.08E+02 / 8.57E+01 | 4.97E+02 / 2.46E+01
F19 | 2.48E+01 / 8.18E+00 | 3.31E+01 / 3.12E+00 | 2.19E+01 / 7.40E+00 | 3.34E+01 / 2.48E+00
F20 | 2.36E+01 / 9.24E-01 | 2.32E+01 / 8.40E-01 | 2.28E+01 / 1.13E+00 | 2.28E+01 / 3.61E-01
F21 | 9.05E+02 / 3.09E+02 | 9.01E+02 / 3.52E+02 | 7.69E+02 / 4.19E+02 | 7.40E+02 / 4.14E+02
F22 | 1.10E+04 / 1.99E+03 | 6.48E+03 / 1.12E+03 | 1.18E+04 / 1.88E+03 | 8.10E+03 / 2.27E+03
F23 | 1.29E+04 / 1.09E+03 | 1.31E+04 / 1.93E+03 | 1.33E+04 / 8.42E+02 | 1.33E+04 / 1.46E+03
F24 | 3.61E+02 / 2.20E+01 | 3.55E+02 / 1.41E+01 | 3.63E+02 / 1.45E+01 | 3.40E+02 / 1.33E+01
F25 | 3.93E+02 / 2.51E+01 | 3.73E+02 / 1.61E+01 | 3.94E+02 / 2.07E+01 | 3.71E+02 / 1.78E+01
F26 | 4.00E+02 / 1.02E+02 | 4.28E+02 / 4.44E+01 | 3.95E+02 / 1.08E+02 | 4.13E+02 / 5.98E+01
F27 | 1.86E+03 / 1.61E+02 | 1.70E+03 / 1.04E+02 | 1.86E+03 / 1.58E+02 | 1.71E+03 / 1.68E+02
F28 | 1.99E+03 / 1.72E+03 | 2.00E+03 / 1.72E+03 | 1.41E+03 / 1.57E+03 | 1.90E+03 / 1.72E+03

5 Conclusion

This paper introduces the Probability Vector Enhanced Tumbleweed Optimization Algorithm (PVE-TOA). PVE-TOA employs multiple sets of Probability Vectors (PVs) to replace the entire particle swarm, thereby reducing memory consumption, although during the iteration process it still requires storing one group of particles. PVE-TOA enhances the algorithm's exploration and exploitation capabilities by generating random points from the PVs: depending on the value of σ, the generated random points can assist in either exploring or exploiting the optimal positions. By modifying the update formulas for μ and σ, PVE-TOA reduces σ during the seed propagation stage to facilitate the development of better solutions. Throughout the process, memory consumption is reduced while performance is improved on most functions. The improvement approach of PVE-TOA is not limited to TOA and can also be applied to other algorithms.
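To make the mechanism concrete, the following is a minimal sketch of Gaussian probability-vector sampling with a shrinking σ, in the spirit of the description above; the function names, the pairwise winner/loser update rule, and all parameter values are illustrative assumptions rather than the paper's exact formulas.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_from_pv(mu, sigma, n_points):
    """Draw candidate solutions from a Gaussian probability vector (PV)."""
    return rng.normal(mu, sigma, size=(n_points, mu.size))

def update_pv(mu, sigma, winner, loser, lr=0.1, shrink=0.99):
    """Shift mu toward the winner and shrink sigma, echoing the idea of
    reducing sigma during the seed propagation stage (illustrative rule)."""
    mu = mu + lr * (winner - loser)
    sigma = np.maximum(sigma * shrink, 1e-3)  # floor keeps some exploration
    return mu, sigma

# Usage: evolve a single PV on the sphere function.
dim = 10
mu, sigma = np.zeros(dim), 3.0 * np.ones(dim)
f = lambda x: float(np.sum(x ** 2))
for _ in range(500):
    a, b = sample_from_pv(mu, sigma, 2)
    winner, loser = (a, b) if f(a) < f(b) else (b, a)
    mu, sigma = update_pv(mu, sigma, winner, loser)
print(f(mu))  # approaches 0 as the PV concentrates on the optimum
```

A PV of this kind stores only a mean and spread per dimension, which is where the memory saving relative to a full sub-population comes from.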


Acknowledgement. This work is supported by Beijing College Students Innovation and Entrepreneurship Training Project 2023.


Improving BFGO with Apical Dominance-Guided Gradient Descent for Enhanced Optimization

Hao-Jie Shi1, Feng Guo1, Yang-Zhi Chen1, Lin Xu2, and Ruo-Bin Wang1(B)

1 School of Information Science and Technology, North China University of Technology, Beijing 100144, China
[email protected]
2 STEM, University of South Australia, Adelaide 5095, Australia

Abstract. The Bamboo Forest Growth Optimization (BFGO) algorithm is a popular meta-heuristic for diverse optimization problems. However, its performance can be unstable due to problem-specific parameters, leading to suboptimal outcomes. To enhance efficiency and overcome these limitations, we present a novel extension: BFGO with Apical Dominance-Guided Gradient Descent (BFGO-ADGD). BFGO-ADGD leverages the modified exploitation capabilities of BFGO to function as the exploration mechanism within the framework, incorporating Apical Dominance guidance and employing Gradient Descent for effective exploitation. Experimental evaluations on the CEC2017 test set and engineering problems showcase BFGO-ADGD's superior performance over existing heuristics. This approach not only advances BFGO but also demonstrates the integration of biological growth principles with optimization methodologies. BFGO-ADGD holds promise for tackling challenging optimization tasks and inspiring future research in this domain.

Keywords: Bamboo Forest Growth Optimization · Apical Dominance · Gradient Descent

1 Introduction

Optimization lies at the heart of numerous real-world challenges, ranging from engineering design and finance to machine learning and artificial intelligence. As optimization problems become increasingly complex and diverse, the demand for effective and versatile optimization algorithms continues to grow. Meta-heuristic algorithms, inspired by natural and biological processes, have emerged as powerful tools for addressing complex optimization tasks [1]. Among these algorithms, the Bamboo Forest Growth Optimization (BFGO) algorithm has gained prominence for its unique growth-inspired approach to optimization [2]. BFGO draws inspiration from the growth patterns of bamboo forests, where the plant first establishes strong roots and then exhibits rapid and robust growth. While BFGO exhibits promising performance in various optimization scenarios, its effectiveness can be hindered by unstable performance and sensitivity to problem-specific parameters.


In response to the limitations of the original BFGO algorithm, we present an innovative extension: BFGO with Apical Dominance-Guided Gradient Descent (BFGO-ADGD). This improved algorithm combines the principle of Apical Dominance, observed in plant growth, with the powerful optimization technique of Gradient Descent. By synergistically integrating these features, BFGO-ADGD aims to enhance the algorithm's adaptability, convergence speed, and overall optimization efficiency.

The primary objective of this paper is to present the BFGO-ADGD algorithm as an advancement over the original BFGO, emphasizing the integration of Apical Dominance and Gradient Descent. We seek to address the challenges faced by conventional heuristic algorithms, such as instability and sensitivity to the landscape, by infusing the optimization process with biologically inspired growth principles and powerful local search techniques. The key contributions of this research can be summarized as follows:

1. Introduction of BFGO-ADGD, highlighting Apical Dominance guidance and Gradient Descent exploitation.
2. Rigorous evaluation and comparison on benchmark functions and engineering optimization problems.
3. Valuable insights into the synergistic integration of Apical Dominance and Gradient Descent in optimization algorithms, revealing the untapped potential of applying biological principles within the realm of meta-heuristic methods.

The rest of this paper is structured as follows: In Sect. 2, we review related works in optimization. Section 3 presents the methodology of the BFGO-ADGD algorithm. Section 4 details the experimental setup, results, and performance analysis. Finally, Sect. 5 concludes the paper, summarizing the contributions and significance of BFGO-ADGD in the field of optimization.

2 Related Work

2.1 Metaheuristic Algorithms and Gradient Descent

Heuristic algorithms encompass a broad class of optimization methods that draw inspiration from natural processes and problem-solving strategies. Such algorithms aim to efficiently explore the solution space, although they may not guarantee the optimal solution. Meta-heuristics, on the other hand, draw inspiration from behaviors, experiences, and rules observed in natural or production activities. These algorithms model specific optimization problems and design strategies based on natural laws. Common examples of metaheuristics include GOA [3], TOA [4], and ROA [5]. Metaheuristics offer robust exploration and exploitation capabilities, making them well-suited for tackling complex optimization landscapes.

Gradient Descent is a powerful optimization technique that iteratively finds function minima or maxima by moving in the direction of steepest change. Its effectiveness in local search makes it fundamental in various optimization domains. RMSprop adapts learning rates for faster convergence and better handling of gradients. It uses an exponential moving average of squared gradients to ensure stable parameter updates during optimization.
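For reference, the standard RMSprop update just mentioned can be written as follows, where \(g_t\) is the gradient at step \(t\), \(\rho\) the decay rate, \(\eta\) the learning rate, and \(\epsilon\) a small stabilizing constant (these symbols are not defined in the original text):

\[
v_t = \rho\, v_{t-1} + (1-\rho)\, g_t^{2}, \qquad
\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t} + \epsilon}\, g_t
\]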


In the field of optimization, methods can be broadly categorized into two groups: gradient-based (GB) methods and modern non-gradient-based methods, i.e., metaheuristic algorithms (MAs). Gradient-based methods, such as the Levenberg-Marquardt (LM) algorithm [6], gradient descent (GD) [7], and Newton's method [8], have been extensively utilized for solving optimization problems. These methods aim to identify extreme points by locating positions where the gradient of the objective function is zero [9]. Approaches like conjugate direction and Newton's method are based on this principle, selecting search directions to iteratively approach the optimal solution. However, the significant challenges with gradient-based methods are their slow convergence and the lack of a guarantee of reaching the global optimum [10]. The necessity to compute derivatives of the objective function and constraints further adds to their computational overhead.

In contrast, modern non-gradient-based methods have gained popularity for their robustness in finding global optima. Unlike gradient-based methods, MAs initiate the optimization process with randomly generated initial points forming an initial population. Each point possesses a search direction, updated based on previous results during the iterative optimization process until the convergence criterion is satisfied. Metaheuristic algorithms demonstrate resilience in escaping local optima, a common limitation of gradient-based methods. However, due to their population-based nature, MAs may require higher computational resources, especially when dealing with high-dimensional search spaces.

While MAs have shown great promise in solving complex optimization problems, they often face challenges in achieving a balanced trade-off between exploration and exploitation. Numerous approaches have been proposed to improve the performance of MAs [11–14], including the integration of new strategies and techniques. However, these traditional improvements may still struggle to strike the ideal balance between exploration and exploitation, limiting their overall effectiveness. Integrating Gradient Descent into meta-heuristic algorithms enhances exploitation, refining solutions around local optima [7, 15]. This empowers algorithms to converge efficiently and achieve high-quality solutions in complex landscapes. Therefore, the novel contribution of BFGO-ADGD lies in bridging the gap between gradient-based and non-gradient-based optimization methods.

2.2 Bamboo Forest Growth Optimization (BFGO) Algorithm

The Bamboo Forest Growth Optimization (BFGO) algorithm is a promising metaheuristic approach inspired by the growth patterns of bamboo forests. Just like the growth stages of bamboo, the BFGO algorithm consists of two distinctive phases: the All-Round Extension of the Bamboo Whip (Exploitation) and the Bamboo Growth Stage (Exploration) [2].

All-Round Extension of the Bamboo Whip (Exploitation). During this phase, the bamboo whip undergoes significant growth and expansion. The meristem of the whip section divides, and the underground stem lengthens and expands its territory. Strong buds on the bamboo whip germinate and differentiate into bamboo shoots, while other buds grow sideways to form new underground stems. The root system extends in various directions, including the direction of the group cognition item, the bamboo whip memory item, and the central item of the bamboo forest [2].

To guide the exploitation, the BFGO algorithm employs three directional terms represented as α, β, and γ. The algorithm updates the individual's position using Eq. (1) [2]:

\[
X_{t+1} =
\begin{cases}
X_G + Q \cdot (c_1 \cdot X_G - X_t) \cdot \cos\alpha, & r_1 < 0.4 \quad (a)\\
X_P(k) + Q \cdot (c_1 \cdot X_P(k) - X_t) \cdot \cos\beta, & 0.4 \le r_1 < 0.7 \quad (b)\\
C(k) + Q \cdot (c_1 \cdot C(k) - X_t) \cdot \cos\gamma, & \text{else} \quad (c)
\end{cases}
\tag{1}
\]

\[
\cos\alpha = \frac{X_t \cdot X_G}{|X_t| \times |X_G|} \tag{2}
\]

\[
\cos\beta = \frac{X_t \cdot X_P(k)}{|X_t| \times |X_P(k)|} \tag{3}
\]

\[
\cos\gamma = \frac{X_t \cdot C(k)}{|X_t| \times |C(k)|} \tag{4}
\]

\[
Q = 2 - \frac{t}{T} \tag{5}
\]

Here, X_G represents the global optimal individual, X_P(k) represents the optimal individual on the k-th bamboo whip, and C(k) represents the center point of the k-th bamboo whip. The directional terms α, β, and γ correspond to the directions of the current individual with respect to the group cognition term, the bamboo whip memory term, and the bamboo forest center term [2]. The parameter Q decreases from 2 to 1 to balance exploration and exploitation, while c_1 is a random number between 0 and 2.

Bamboo Growth Stage (Exploration). In this stage, the BFGO algorithm carries out selection, similar to the bamboo shoots' selection process in natural bamboo growth. Only a small portion of unearthed bamboo shoots have the potential to grow into mature bamboo, while others fend for themselves. The bamboo shoots with growth potential gain sufficient energy to experience rapid growth. The update formula used in the Exploration phase is as follows [2]:

\[
X_{temp} =
\begin{cases}
X_t + X_D \times H\\
X_t - X_D \times H
\end{cases}
\tag{6}
\]

\[
X_D = 1 - \left| \frac{X_t - C(k) + 1}{X_G - C(k) + 1} \right| \tag{7}
\]

\[
H = \frac{q(t) - q(t-1)}{X_G - X_t} \tag{8}
\]

\[
q(t) = X_G \times e^{-d^{\varphi}} \times e^{\frac{b}{\varphi \times t}} \tag{9}
\]

Here, X_D represents the relationship between the distance from an individual to the center position and the distance between the optimal individual of the group and the center position. The parameter H captures the difference between the two iterations of growth, while q denotes the cumulative growth of the t-th generation. The parameter d takes values in the range (−1, 1). Both b and φ are akin to the site conditions of bamboo, influencing the exploration behavior of the algorithm.
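As a concrete illustration, here is a minimal Python sketch of the exploitation update in Eq. (1), with the cosine terms computed as in Eqs. (2)–(4); the vector representation, the random number generator, and the small denominator guard are assumptions made for the sake of a runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)

def cos_dir(a, b):
    """Cosine of the angle between vectors a and b, as in Eqs. (2)-(4)."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def bfgo_exploit(X_t, X_G, X_Pk, C_k, t, T):
    """One exploitation step of BFGO following Eq. (1)."""
    Q = 2 - t / T                      # Eq. (5): decreases from 2 to 1
    c1 = rng.uniform(0, 2)             # random number in [0, 2]
    r1 = rng.random()
    if r1 < 0.4:                       # toward the global best, case (a)
        return X_G + Q * (c1 * X_G - X_t) * cos_dir(X_t, X_G)
    elif r1 < 0.7:                     # toward the whip's best, case (b)
        return X_Pk + Q * (c1 * X_Pk - X_t) * cos_dir(X_t, X_Pk)
    else:                              # toward the whip center, case (c)
        return C_k + Q * (c1 * C_k - X_t) * cos_dir(X_t, C_k)
```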

3 Methodology

The methodology of BFGO-ADGD revolves around the seamless integration of Apical Dominance and Gradient Descent principles to achieve a balanced and efficient optimization process.

3.1 Apical Dominance in BFGO-ADGD

Apical Dominance, a concept inspired by a natural phenomenon observed in plant growth [16], forms the cornerstone of BFGO-ADGD's innovative exploitation strategy. In the biological world, the main stem of a plant exerts dominance over lateral branches, redirecting resources towards its own growth while suppressing the growth of neighboring branches. This principle serves as a powerful guiding force in BFGO-ADGD's exploitation phase.

The strategy of Apical Dominance in BFGO-ADGD dynamically adapts to changes in the current global best solution. As the optimization process progresses and a new best solution is found, the dominant central region of exploration is updated accordingly. By continually prioritizing the current best solution as the central focus, BFGO-ADGD intelligently allocates computational resources to the areas that hold the most promise for further improvement. This dynamic nature of Apical Dominance ensures that the algorithm remains responsive to evolving landscapes in the solution space: as better solutions are discovered, the dominant region moves to embrace these new promising regions. Consequently, BFGO-ADGD concentrates its exploitation efforts where they are most needed, effectively optimizing the allocation of computational resources.

The exploitation driven by Apical Dominance ensures that the algorithm can effectively explore diverse regions of the solution space while maintaining a clear focus on areas that demonstrate promise, akin to the main stem's growth in bamboo forests. This guidance enhances the algorithm's adaptability to complex landscapes, making it well-suited for solving optimization problems with multifaceted and challenging solution spaces.

By incorporating Apical Dominance into the exploitation phase, BFGO-ADGD strikes a balance between global exploration and local refinement, effectively leveraging the strengths of both gradient-based and population-based optimization methods. The strategic exploitation driven by Apical Dominance complements the local refinement capabilities provided by Gradient Descent with RMSprop, leading to an optimization process that is not only robust but also efficient.


In BFGO-ADGD, the algorithm strategically prioritizes the current global best solution as the dominant central region, similar to how the main stem of a plant holds central prominence in its growth. This prioritization serves as a focal point for the exploitation process, directing computational efforts towards promising regions of the solution space.

3.2 Gradient Descent in BFGO-ADGD

Gradient Descent constitutes a crucial aspect of BFGO-ADGD's exploitation phase. This classical optimization technique enables the algorithm to refine solutions with precision around local optima. As BFGO-ADGD converges towards promising regions guided by Apical Dominance, Gradient Descent complements the exploration process by intensively exploiting these regions for further refinement.

To enhance the effectiveness of Gradient Descent in the algorithm, BFGO-ADGD incorporates the Root Mean Square Propagation (RMSprop) optimizer. RMSprop adapts the learning rate for each parameter during the optimization process, facilitating faster convergence and improved handling of varying gradients. By employing an exponential moving average of squared gradients, RMSprop normalizes the parameter updates, reducing oscillations and promoting stability during the exploitation phase.

3.3 BFGO with Apical Dominance-Guided Gradient Descent

While the foundation of BFGO remains intact, BFGO-ADGD introduces significant modifications and enhancements to improve overall performance.

BFGO-ADGD's flipped strategy diversifies the search to overcome local optima. It swaps positions between top-performing bamboo shoots (top 20%) and lower-performing bamboo roots (bottom 40%) within each bamboo whip group. The algorithm selects random elite individuals and generates new positions for bamboo shoots using a weighted combination of these elite positions. For bamboo roots, new positions are generated by combining the positions of randomly chosen elite individuals from the top-performing ones. This random exchange encourages exploration and novel solutions, enabling escape from local optima (a sketch of this strategy follows Eq. (10) below).

To capitalize on the metaheuristic's capacity for exploring diverse regions and evading local optima, we introduce a refined version of the direction update Eq. (1), denoted as Eq. (10):

\[
X_{t+1} =
\begin{cases}
X_P(k) + Q \cdot (c_1 \cdot X_P(k) - X_t) \cdot \cos\beta, & r_1 < 0.5 \quad (a)\\
C(k) + Q \cdot (c_1 \cdot C(k) - X_t) \cdot \cos\gamma, & \text{else} \quad (b)
\end{cases}
\tag{10}
\]
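The following Python sketch illustrates the flipped strategy described above. The text does not give the exact weighting scheme or how the swap and regeneration interact, so the weighted combination of two random elites used here, and the folding of the swap into a single regeneration step, are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def flipped_strategy(positions, fitness, elite_frac=0.2, root_frac=0.4):
    """Regenerate the top-20% 'shoots' and bottom-40% 'roots' of one
    bamboo whip group from weighted combinations of random elites."""
    n = len(positions)
    order = np.argsort(fitness)                    # ascending: best first
    shoots = order[: max(1, int(elite_frac * n))]  # top performers
    roots = order[n - int(root_frac * n):]         # bottom performers
    elites = positions[shoots]                     # snapshot of elite positions
    for idx in np.concatenate([shoots, roots]):
        pick = elites[rng.choice(len(elites), size=2, replace=True)]
        w = rng.random()
        positions[idx] = w * pick[0] + (1 - w) * pick[1]  # weighted combination
    return positions
```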


In BFGO-ADGD, the core innovation lies in the replacement of BFGO's exploitation stage with the Apical Dominance Guided Descent (ADGD) strategy. After the exploration phase, ADGD takes charge of further exploiting the current global best solution to refine it towards high-quality local optima.

Upon completion of the exploration phase, BFGO-ADGD identifies the current global best solution as the dominant central region, akin to the main stem in bamboo forests. ADGD then conducts multiple iterations, denoted by m, using the Gradient Descent approach with RMSprop optimization to update the current global best solution. The application of RMSprop during these m iterations enhances the algorithm's ability to handle varying gradients and adaptively adjust the learning rate for efficient convergence. This ensures that the exploitation process is carefully and precisely focused around the promising regions discovered during the exploration phase. Following the m iterations of ADGD, the algorithm updates the historical global best solution based on the results obtained. This continual improvement of the historical best solution enables BFGO-ADGD to maintain a record of the most promising solutions encountered throughout the optimization process.

BFGO-ADGD harmoniously combines the strengths of Gradient Descent and Apical Dominance guidance, synergistically enhancing its optimization capabilities. The exploitation prowess of Gradient Descent empowers the algorithm to navigate efficiently towards the feasible regions of the search space. Coupled with RMSprop's adaptive learning rate mechanism, BFGO-ADGD ensures steady progress towards high-quality local optima while gracefully avoiding premature convergence to suboptimal solutions. Simultaneously, Apical Dominance strategically directs exploration efforts towards promising regions of the solution space. This integration allows BFGO-ADGD to make well-informed decisions about resource allocation during the optimization process, striking a harmonious balance between robust global exploration and efficient local exploitation.
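A minimal sketch of the ADGD step described above is given below. Since the text does not specify how gradients of the objective are obtained, central finite differences are used here as an assumption, and all hyperparameter values are illustrative.

```python
import numpy as np

def adgd_refine(x_best, f, m=20, lr=0.01, rho=0.9, eps=1e-8, h=1e-6):
    """Refine the current global best with m RMSprop iterations (ADGD sketch).
    Gradients are estimated by central finite differences, an assumption;
    the paper leaves the gradient computation unspecified."""
    x = np.asarray(x_best, dtype=float).copy()
    v = np.zeros_like(x)                      # running average of squared grads
    for _ in range(m):
        g = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                      for e in np.eye(x.size)])
        v = rho * v + (1 - rho) * g ** 2      # RMSprop accumulator
        x -= lr * g / (np.sqrt(v) + eps)      # adaptive step
    return x if f(x) < f(x_best) else x_best  # keep the better solution
```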

4 Experiment and Analysis

In this section, we evaluate the optimization performance of the BFGO-ADGD algorithm using the widely adopted CEC2017 benchmark [17]. It contains four types of functions: unimodal, multimodal, hybrid, and composite functions. To ensure robustness, we conduct 30 independent runs of each algorithm on each benchmark function, with the dimension set to 10. The pseudo code of the BFGO-ADGD algorithm is shown in Algorithm 1.


Algorithm 1: BFGO-ADGD
Input: N: population size; D: problem dimension; T: maximum number of iterations
Output: optimal GB_position and optimal value GB_fitness
Initialization
for t < T do
    Divide the population of bamboo according to sorted fitness into K groups
    if t > 10 && GB_fitness(t-1) == GB_fitness(t-3) then
        Do Flipped Strategy
    end
    for k = 1:K do
        if r1 < 0.5 then
            Update Bamboo Position{k} via Equation (10a)
        else
            Update Bamboo Position{k} via Equation (10b)
        end
        Calculate Bamboo Fitness{k} value of Bamboo Position{k}
        Update pp{k}, Gp based on Bamboo Fitness{k}
    end for
    for i = 1:m do
        Update Gp via RMSprop
        Calculate fitness value Gf of Gp
    end for
    Update GB_position and GB_fitness based on Gp and Gf
    Update one of Bamboo Position based on GB_position
end for

The obtained results are compared against those of typical algorithms, including BFGO, PSO [18], SSA [19] and GWO [20]. We measure the average optimal values (mean) and corresponding standard deviations (std). The comparison results are presented in Table 1, where the best values are highlighted in bold. As presented in Table 1, the results demonstrate a notable performance improvement of BFGO-ADGD over its predecessor, BFGO. BFGO-ADGD consistently achieves nearly optimal solutions for all four types of benchmark functions, outperforming the other three algorithms in the comparison. The robustness of BFGO-ADGD is evident from the 30 independent runs conducted on each benchmark function. The algorithm consistently exhibits superior performance, reflected by the narrow standard deviations (std) across the runs. The success of BFGO-ADGD can be attributed to its ability to strike a fine balance between exploration and exploitation. The seamless integration of Apical Dominance and Gradient Descent principles empowers the algorithm with efficient exploration, focusing on promising regions of the solution space. Concurrently, the incorporation of RMSprop within Gradient Descent facilitates local refinement around high-quality local optima. This synergistic interplay between Apical Dominance and Gradient Descent enables BFGO-ADGD to adapt dynamically to the complexity of the optimization landscape. As a result, the algorithm demonstrates an exceptional ability to efficiently explore diverse regions while effectively refining solutions towards optimal solutions.

Table 1. Comparison of optimization results on the CEC2017 benchmark functions

| Function | BFGO-ADGD mean | BFGO-ADGD std | BFGO mean | BFGO std | PSO mean | PSO std | SSA mean | SSA std | GWO mean | GWO std |
|---|---|---|---|---|---|---|---|---|---|---|
| F1 | 9.72E+02 | 1.05E+03 | 6.60E+05 | 6.53E+05 | 1.23E+08 | 3.69E+08 | 3.56E+03 | 3.82E+03 | 3.41E+06 | 1.02E+07 |
| F2 | 2.00E+02 | 4.83E-03 | 7.02E+03 | 7.09E+03 | 8.69E+07 | 4.21E+08 | 2.45E+03 | 3.31E+03 | 1.39E+05 | 2.46E+05 |
| F3 | 3.03E+02 | 2.79E+00 | 3.20E+02 | 1.85E+01 | 3.00E+02 | 2.17E-09 | 3.00E+02 | 7.93E-10 | 9.95E+02 | 1.70E+03 |
| F4 | 4.00E+02 | 7.48E-01 | 4.15E+02 | 2.34E+01 | 4.23E+02 | 4.36E+01 | 4.09E+02 | 1.55E+01 | 4.11E+02 | 9.44E+00 |
| F5 | 5.09E+02 | 3.43E+00 | 5.22E+02 | 1.26E+01 | 5.26E+02 | 1.02E+01 | 5.21E+02 | 1.05E+01 | 5.15E+02 | 8.45E+00 |
| F6 | 6.02E+02 | 1.33E+00 | 6.06E+02 | 5.60E+00 | 6.07E+02 | 5.25E+00 | 6.08E+02 | 1.02E+01 | 6.01E+02 | 8.65E-01 |
| F7 | 7.21E+02 | 5.16E+00 | 7.24E+02 | 7.45E+00 | 7.36E+02 | 9.54E+00 | 7.35E+02 | 1.31E+01 | 7.29E+02 | 8.77E+00 |
| F8 | 8.11E+02 | 4.24E+00 | 8.16E+02 | 6.87E+00 | 8.20E+02 | 6.72E+00 | 8.25E+02 | 1.01E+01 | 8.13E+02 | 5.47E+00 |
| F9 | 9.04E+02 | 4.54E+00 | 9.17E+02 | 3.99E+01 | 9.22E+02 | 3.61E+01 | 9.13E+02 | 2.66E+01 | 9.06E+02 | 1.44E+01 |
| F10 | 1.39E+03 | 2.21E+02 | 1.86E+03 | 2.98E+02 | 1.89E+03 | 2.63E+02 | 1.88E+03 | 2.54E+02 | 1.52E+03 | 2.73E+02 |
| F11 | 1.12E+03 | 1.13E+01 | 1.18E+03 | 5.38E+01 | 1.18E+03 | 6.82E+01 | 1.20E+03 | 8.99E+01 | 1.12E+03 | 1.05E+01 |
| F12 | 1.22E+04 | 8.66E+03 | 3.34E+05 | 6.87E+05 | 1.30E+06 | 2.99E+06 | 1.56E+06 | 1.79E+06 | 6.66E+05 | 7.92E+05 |
| F13 | 3.20E+03 | 2.25E+03 | 1.39E+04 | 1.18E+04 | 3.85E+03 | 6.64E+03 | 1.45E+04 | 1.09E+04 | 1.19E+04 | 8.52E+03 |
| F14 | 1.43E+03 | 1.27E+01 | 1.47E+03 | 3.84E+01 | 1.46E+03 | 3.77E+01 | 1.58E+03 | 2.31E+02 | 2.79E+03 | 1.72E+03 |
| F15 | 1.58E+03 | 1.08E+02 | 1.85E+03 | 3.81E+02 | 1.65E+03 | 1.88E+02 | 3.29E+03 | 2.45E+03 | 2.99E+03 | 1.47E+03 |
| F16 | 1.63E+03 | 4.94E+01 | 1.73E+03 | 1.02E+02 | 1.73E+03 | 1.23E+02 | 1.73E+03 | 1.32E+02 | 1.69E+03 | 9.28E+01 |
| F17 | 1.74E+03 | 1.43E+01 | 1.75E+03 | 2.76E+01 | 1.77E+03 | 4.86E+01 | 1.77E+03 | 2.28E+01 | 1.75E+03 | 2.79E+01 |
| F18 | 3.97E+03 | 2.38E+03 | 2.88E+04 | 3.24E+04 | 2.34E+04 | 1.99E+04 | 2.38E+04 | 1.64E+04 | 2.90E+04 | 1.53E+04 |
| F19 | 2.05E+03 | 3.05E+02 | 2.07E+03 | 2.55E+02 | 3.00E+03 | 5.65E+03 | 4.98E+03 | 5.56E+03 | 7.64E+03 | 5.99E+03 |
| F20 | 2.04E+03 | 1.46E+01 | 2.09E+03 | 5.61E+01 | 2.06E+03 | 5.14E+01 | 2.11E+03 | 6.05E+01 | 2.06E+03 | 4.75E+01 |
| F21 | 2.20E+03 | 1.33E+00 | 2.24E+03 | 5.45E+01 | 2.29E+03 | 5.94E+01 | 2.27E+03 | 6.19E+01 | 2.30E+03 | 3.93E+01 |
| F22 | 2.30E+03 | 1.69E+01 | 2.30E+03 | 2.15E+01 | 2.32E+03 | 2.86E+01 | 2.29E+03 | 2.88E+01 | 2.30E+03 | 2.39E+01 |
| F23 | 2.61E+03 | 4.47E+00 | 2.63E+03 | 1.31E+01 | 2.63E+03 | 1.38E+01 | 2.62E+03 | 7.32E+00 | 2.62E+03 | 8.65E+00 |
| F24 | 2.61E+03 | 1.20E+02 | 2.72E+03 | 1.02E+02 | 2.76E+03 | 5.26E+01 | 2.73E+03 | 6.38E+01 | 2.73E+03 | 6.17E+01 |
| F25 | 2.92E+03 | 2.44E+01 | 2.93E+03 | 2.43E+01 | 2.94E+03 | 3.75E+01 | 2.93E+03 | 2.45E+01 | 2.94E+03 | 1.24E+01 |
| F26 | 2.92E+03 | 5.38E+01 | 2.98E+03 | 7.65E+01 | 3.03E+03 | 8.14E+01 | 2.91E+03 | 2.42E+01 | 3.01E+03 | 2.94E+02 |
| F27 | 3.09E+03 | 2.83E+00 | 3.08E+03 | 8.49E+00 | 3.11E+03 | 1.95E+01 | 3.09E+03 | 2.72E+00 | 3.10E+03 | 1.67E+01 |
| F28 | 3.13E+03 | 6.33E+01 | 3.28E+03 | 2.31E+01 | 3.39E+03 | 1.16E+02 | 3.23E+03 | 1.10E+02 | 3.36E+03 | 8.40E+01 |
| F29 | 3.16E+03 | 1.53E+01 | 3.24E+03 | 5.10E+01 | 3.22E+03 | 6.04E+01 | 3.20E+03 | 3.99E+01 | 3.19E+03 | 5.73E+01 |
| F30 | 7.22E+03 | 6.83E+03 | 1.01E+04 | 1.60E+04 | 8.41E+05 | 1.03E+06 | 3.96E+05 | 5.75E+05 | 5.32E+05 | 6.73E+05 |

5 Conclusion

In this paper, we introduced BFGO-ADGD, a novel optimization algorithm integrating Apical Dominance and Gradient Descent. Our extensive experiments on the CEC2017 benchmark showcased the algorithm's superior performance over BFGO, PSO, SSA, and GWO. BFGO-ADGD strikes an effective balance between exploration and exploitation, converging efficiently to high-quality solutions. The algorithm's adaptability, robustness, and versatility make it a valuable addition to the field of heuristic optimization, promising real-world applications in diverse domains.

Acknowledgement. This work is supported by Beijing College Students Innovation and Entrepreneurship Training Project 2023.


References

1. Lukasiewycz, M., Glaß, M., Reimann, F., Teich, J.: Opt4J: a modular framework for meta-heuristic optimization. In: Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation, Boston, MA, USA, pp. 1723–1730 (2011)
2. Chu, S.-C., Feng, Q., Zhao, J., Pan, J.-S.: BFGO: bamboo forest growth optimization algorithm. J. Internet Technol. 24(1), 1–10 (2023)
3. Pan, J.-S., Wang, R.-B., Chu, S.-C.: Gannet optimization algorithm: a new meta-heuristic algorithm for solving engineering optimization problems. Math. Comput. Simul. 202, 343–373 (2022)
4. Pan, J.-S., Yang, Q., Shieh, C.-S., Chu, S.-C.: Tumbleweed optimization algorithm and its application in vehicle path planning in smart city. J. Internet Technol. 23, 927–945 (2022)
5. Pan, J.-S., Fu, Z., Hu, C.-C., Tsai, P.-W., Chu, S.-C.: Rafflesia optimization algorithm applied in the logistics distribution centers location problem. J. Internet Technol. 23, 1541–1555 (2022)
6. Moré, J.J.: The Levenberg-Marquardt algorithm: implementation and theory. In: Numerical Analysis, pp. 105–116. Springer, Cham (1978). https://doi.org/10.1007/BFb0067700
7. Madgwick, S.O., Harrison, A.J., Vaidyanathan, R.: Estimation of IMU and MARG orientation using a gradient descent algorithm. In: 2011 IEEE International Conference on Rehabilitation Robotics, pp. 1–7. IEEE (2011)
8. Ypma, T.J.: Historical development of the Newton-Raphson method. SIAM Rev. 37, 531–551 (1995)
9. Shahidi, N., Esmaeilzadeh, H., Abdollahi, M., Ebrahimi, E., Lucas, C.: Self-adaptive memetic algorithm: an adaptive conjugate gradient approach. In: IEEE Conference on Cybernetics and Intelligent Systems, pp. 6–11. IEEE (2004)
10. Salajegheh, F., Salajegheh, E.: PSOG: enhanced particle swarm optimization by a unit vector of first and second order gradient directions. Swarm Evol. Comput. 46, 28–51 (2019)
11. Kong, L., Pan, J.-S., Tsai, P.-W., Vaclav, S., Ho, J.-H.: A balanced power consumption algorithm based on enhanced parallel cat swarm optimization for wireless sensor network. Int. J. Distrib. Sens. Netw. 11, 729680 (2015)
12. Wang, R.-B., Wang, W.-F., Xu, L., et al.: Improved DV-Hop based on parallel and compact whale optimization algorithm for localization in wireless sensor networks. Wirel. Netw. (2022). https://doi.org/10.1007/s11276-022-03048-z
13. Du, Z.-G., Pan, J.-S., Chu, S.-C., Chiu, Y.-J.: Improved binary symbiotic organism search algorithm with transfer functions for feature selection. IEEE Access 8, 225730–225744 (2020)
14. Wang, R.-B., Wang, W.-F., Xu, L., et al.: An adaptive parallel arithmetic optimization algorithm for robot path planning. J. Adv. Transp. 2021(8), 1–22 (2021)
15. Ahmadianfar, I., Bozorg-Haddad, O., Chu, X.: Gradient-based optimizer: a new metaheuristic optimization algorithm. Inf. Sci. (2020). https://doi.org/10.1016/j.ins.2020.06.037
16. Cline, M.G.: Apical dominance. Bot. Rev. 57, 318–358 (1991)
17. Mohamed, A.W., Hadi, A.A., Fattouh, A.M., Jambi, K.M.: LSHADE with semi-parameter adaptation hybrid with CMA-ES for solving CEC 2017 benchmark problems. In: 2017 IEEE Congress on Evolutionary Computation (CEC), pp. 145–152 (2017). https://doi.org/10.1109/CEC.2017.7969307
18. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of ICNN'95 - International Conference on Neural Networks, pp. 1942–1948. IEEE (1995)
19. Mirjalili, S., Gandomi, A.H., Mirjalili, S.Z., Saremi, S., Faris, H., Mirjalili, S.M.: Salp Swarm Algorithm: a bio-inspired optimizer for engineering design problems. Adv. Eng. Softw. 114, 163–191 (2017)
20. Emary, E., Zawbaa, H.M., Hassanien, A.E.: Binary grey wolf optimization approaches for feature selection. Neurocomputing 172, 371–381 (2016)

Research on Innovative Social Service Training Mode in Higher Vocational Colleges Under the New Situation

Wei Zong and Dawei Luo(B)

Changzhou College of Information Technology, Changzhou 213000, China
[email protected]

Abstract. In the new era, vocational colleges are devoted to internal development and high-quality growth, exploring innovative models of social service training. This is an effective measure to enhance the social recognition and influence of vocational colleges, build a bridge between vocational education and society, and promote the innovative development of vocational education. Therefore, vocational colleges should clarify the basic situation of the social service training model, including the areas that require innovation and the existing issues. On this basis, strategies such as establishing social training organizations, building high-quality training teams, innovating social service training models, adopting cutting-edge information technology training methods, promoting social service training through innovative government policies and regulations, and strengthening collaborative training between educational institutions and enterprises should be implemented to enhance the overall quality of social service training in vocational colleges and cultivate exceptional talents to serve socio-economic development. This research makes three contributions. First, we analyze the social service modes that are suitable for vocational colleges in China. Second, on that basis, we propose several innovative strategies to strengthen cooperation among vocational colleges, companies, and government. Third, a specific college is sampled, and its performance in social service provides more data and insight into the main topic of our research.

Keywords: vocational colleges · social service training model

1 Introduction

In the context of rapid socio-economic development, China has become the world's second-largest economy. The importance of training within organizations has significantly increased, shifting from a "substitute" role to a "mainstream" role and from "amateur" to "professional." Training high-quality talents has become a key means to enhance core competitiveness and drive organizational development. How to effectively carry out talent training has become a top priority and a new challenge for vocational colleges in China, especially after the

2 The Connotation and Significance of Social Service in Vocational Colleges

2.1 The Connotation

As an important component of higher education, vocational colleges share the same functions as general universities, including teaching, research, and social service. However, vocational colleges are committed to the path of technological development and are classified as skill-based higher education institutions. They have a differentiated positioning and different service content and methods compared with general universities in terms of talent cultivation, scientific and technological development, and contribution to socio-economic progress. By comparison, vocational colleges exhibit more direct and tangible forms of social service.

Vocational colleges, as skill-based higher education institutions, play a crucial role in serving society and the economy. They should focus on social and market demands, innovate social service and training models, and contribute to the cultivation of technical and skilled professionals. By providing high-quality technical services, actively participating in industry development and innovation, and addressing industry-specific challenges, vocational colleges can meet the needs of societal development and effectively contribute to the overall advancement of the vocational education sector.

2.2 The Significance

The participation of vocational colleges in social service training is of significant importance to their own development and to society as a whole. Firstly, it helps enhance the social recognition and influence of vocational colleges. Secondly, it helps build bridges of communication between vocational colleges and society [2]. Lastly, it contributes to the innovation and development of vocational education. Therefore, the active engagement of vocational colleges in social service training benefits the colleges themselves and serves the larger society. It enhances their social recognition, strengthens their connection with society, and fosters the innovation and development of vocational education, ultimately contributing to the overall advancement of the vocational education sector.

3 Analysis of the Social Service Training Model in Higher Vocational Colleges

3.1 Content of the Social Service Training Model

The traditional social service training model in higher vocational colleges is primarily focused on addressing the issue of knowledge obsolescence. It leverages the significant advantage of disciplinary knowledge in higher vocational colleges to impart relevant market theories and methods to trainees, thus solving practical problems in social development. In this training model, teachers typically combine social demands with classroom teaching to provide professional knowledge and leverage the social service function of higher vocational colleges.

The basic content of social service training mainly covers two aspects. Firstly, there is training in professional ethics and corporate culture. Professional ethics training aims to stimulate employees' qualities of dedication and professionalism, enhancing their awareness and abilities to serve their respective jobs. Corporate culture training helps improve organizational cohesion by using corporate values as guidance, fostering employees' sense of identity and belonging, and boosting their work motivation and enthusiasm. Secondly, there is a focus on strengthening vocational skills training. Social service training activities organized by higher vocational colleges should include training in job skills and emerging skills. Moreover, vocational skills training allows employees to understand industry trends and contribute to the productivity of the company through their own learning and development.

3.2 Issues in the Social Service Training Model

In recent years, as higher vocational colleges undergo transformation and development, reshaping educational concepts and models, the overall management and educational outcomes have significantly improved. However, there are still several issues in the social service training model, resulting in less than ideal training effectiveness. The main problems are as follows. Firstly, there is a lack of a well-established mechanism for social service training. As a result, social service training lacks planning and standardization, and the establishment of a comprehensive framework is incomplete [3]. Secondly, the development of social service training concepts lags behind. Some higher vocational colleges even adopt a formalized social service training model, primarily focusing on completing tasks assigned by higher education authorities and excessively emphasizing the economic benefits derived from their involvement in social service, while neglecting the comprehensive development of trainees' qualities and abilities, leading to limited social benefits [4]. Thirdly, there is a shortage of high-quality training faculty. Consequently, colleges struggle to provide trainees with comprehensive training content, hampering the fulfillment of the social service function of higher vocational colleges. Fourthly, as shown in Tables 1 and 2, where the data were collected by our research team, the majority of training participants in higher vocational colleges are engaged in public welfare training. The proportion of training income from organizers has been decreasing over the years, indicating a growing reliance on government funding for training. This signifies a low capacity for self-generated training revenue and a lack of attractiveness in the social training market.

Table 1. Number of Trainees in Social Training Enrollment of a Dual High-level College in the Past Five Years (People)

| Year | Public welfare training | Sponsor's input | Self-financing | Gov. support | Total |
|---|---|---|---|---|---|
| 2018 | 5603 | 4481 | — | 822 | 10906 |
| 2019 | 4273 | 2943 | 7745 | 1798 | 16750 |
| 2020 | 48301 | 385 | 2183 | 1768 | 52637 |
| 2021 | 41706 | 358 | 2474 | 1215 | 45753 |
| 2022 | 755171 | 212 | — | 1316 | 756699 |

Table 2. Amount of Income Received in Social Training Enrollment of a Dual High-level College in the Past Five Years (Yuan)

| Year | Public welfare training | Sponsor's input | Self-financing | Gov. support | Total |
|---|---|---|---|---|---|
| 2018 | 0 | 3509698 | 0 | 2861691 | 6371389 |
| 2019 | 0 | 3126143 | 1605185 | 3496048 | 8227376 |
| 2020 | 0 | 680185 | 676300 | 3740900 | 5097385 |
| 2021 | 0 | 1438680 | 1568680 | 3859000 | 6866360 |
| 2022 | 0 | 269138 | 0 | 3083468 | 3352607 |

4 Strategies for Innovating the Social Service Training Model in Higher Vocational Colleges

4.1 Establishing New Social Training Organizational Structures

Building well-established social training organizations and departments is a fundamental prerequisite and solid guarantee for achieving ideal outcomes in social service training in higher vocational colleges [5]. On one hand, it is essential to establish independent management departments such as Continuing Education Institutes. On the other hand, leveraging the main role of secondary colleges in higher vocational colleges, establishing dedicated training centers and utilizing the professional characteristics and advantages of secondary colleges can enhance communication and exchange with industries and enterprises, thereby strengthening the social service training capabilities of secondary colleges. Higher vocational colleges should establish specialized training centers with dedicated personnel responsible for training management [6].

4.2 Building a High-Quality Training Faculty

The comprehensive qualities and professional competence of training teachers are key factors influencing the quality of social service training in higher vocational colleges. Given the new circumstances, higher vocational colleges should attach great importance to the construction of the faculty for social service training, building a high-quality and competent training team that ensures teachers are knowledgeable about social development trends and industry advancements and possess strong training capabilities. In the process of building the training faculty, firstly, it is important to establish clear standards and requirements for the professional ethics and demeanor of training teachers. Secondly, conventional training methods should be innovated by regularly sending training teachers from higher vocational colleges to learn at industry and enterprise frontline positions. Lastly, evaluation standards for dual-qualified teachers should be established. By combining external recruitment with internal training, the comprehensive qualities and abilities of training teachers can be enhanced across the board.

4.3 Innovating the Social Service Training Model

Under the new situation, vocational colleges need to establish a people-oriented educational philosophy in social service training, emphasizing the flexibility and diversity of training methods, innovating traditional training approaches and models, and focusing on cultivating the practical skills of trainees (see Fig. 1).

Fig. 1. Integrated resource, training for research promotion, and multidimensional collaborative training model

Firstly, social service training in vocational colleges generally targets adults engaged in social work. This group has distinct characteristics: shaped by their life and work, they often demonstrate rich practical experience and strong comprehension abilities, yet they are also constrained by practical factors, may have weaker knowledge retention and mastery, and their learning time is relatively fragmented. Therefore, it is necessary to consider the characteristics of trainees and select training content that is targeted and reasonable. Flexible arrangements should be made for training time to meet the diverse learning demands of trainees.

Secondly, innovative teaching methods should be employed. In classroom teaching, practical problems should serve as guides, and a variety of teaching methods, such as case analysis and interactive teaching, should be incorporated based on the selected training content. This will stimulate the enthusiasm and proactiveness of trainees in their learning and reinforce the effectiveness of training through positive interactions.

Thirdly, the relevance of social service training should be enhanced by establishing a demand-oriented social service model and innovating service mechanisms. Since social service training in vocational colleges mainly targets industry and corporate employees who exhibit significant individual differences, it is important for teachers to strengthen communication and interaction with employees, fully understand their educational ideas and needs, and create a favorable training and educational environment. By doing so, the effectiveness and efficacy of social service training can be effectively improved. It is essential to explicitly position the development goals of regional vocational education centers as improving the aggregation of high-quality vocational education resources, promoting educational reform and opening up, and demonstrating the radiating functions of vocational education.

4.4 Introduction of New Generation Information Technology Training Methods

In the context of the Internet information era, modern information technology is widely used in various fields, providing new opportunities for innovative educational training methods. Therefore, in the process of conducting social service training, vocational colleges need to introduce modern educational training methods and approaches that span the entire process and various stages of employee training. For example, online live streaming platforms, multimedia information technology, simulation technology, as well as common online teaching methods like micro-courses and massive open online courses (MOOCs), can transform complex knowledge and skills training into vivid videos, audios, and other materials, thereby enhancing trainees' receptiveness to knowledge and skills. In particular, the application of digital twin technology can create a teaching environment consistent with the real working environment of enterprises, providing a stronger sense of experience for corporate employees and narrowing the gap between training and actual work.


In addition, 5G technology supports the sharing of training resources and equipment, cloud computing enables fine-grained operation of training management modes and processes, big data enables precise customization of training needs and business, and the Internet of Things facilitates the integration of diverse resources and faculty [7]. These new-generation information technology methods can continuously empower and enhance social service training, gradually improving the comprehensive qualities and abilities of trainees.

4.5 Strengthening School-Enterprise Collaborative Training

School-enterprise collaborative training is an expansion and complement to social service training in vocational colleges. By leveraging the resources and technological advantages of enterprises, vocational colleges can closely integrate social service training with the market and enterprises, integrate and effectively utilize multiple educational resources and enterprise technologies, and provide trainees with high-quality educational resources and training environments. For example, in organizing social service training activities, relying on school-enterprise cooperation to understand the internal employment standards and needs of enterprises, and developing talent training plans that match them, can significantly improve the efficiency and quality of training. It also allows trainees to gain a deep understanding of enterprise characteristics, corporate culture, business models, etc., enabling them to unleash their inherent potential and value in their work in enterprises, thus contributing to the development of enterprises through vocational college social service training. In addition, establishing social training demonstration bases for vocational colleges to conduct social service training activities can enhance the influence of vocational college social service training [8]. During the construction of demonstration bases, based on the characteristics of local economic development, organizing distinctive human resource training and development activities, closely connecting with enterprise business development strategies, and improving the social talent cultivation system, aim to cultivate high-quality skilled and innovative talents for regional economic development.

5 Conclusion

In conclusion, achieving the grand goals of national rejuvenation and development requires the joint efforts and dedication of professionals from all industries, especially outstanding technical and skilled workers, who play an indispensable role. It is essential to increase society's recognition and appreciation of ordinary workers, enhance the focus on continuing education, and cultivate a mindset of lifelong learning to supply high-quality talents for sustainable social development. This research focuses on analyzing the social service modes that are suitable for vocational colleges in China. On that basis, we propose several innovative strategies to strengthen cooperation among vocational colleges, companies, and government. In addition, a specific college is sampled, and its performance in social service provides more data and insight into the main topic of our research.

As a dual platform for academic education and vocational training, higher vocational education needs to fully recognize its social service function, leverage its distinctive resources and advantages, and promote innovation in the social service training model while implementing academic education. This will cultivate high-quality professionals with both moral integrity and technical skills, open windows for social continuing education and lifelong learning, and enable higher vocational education to contribute to social construction and development.

References

1. Wang, Z.: Strategy for the high-quality development of vocational colleges' social services under the background of the Double High Initiative. China Adult Educ. 19, 3 (2020)
2. Ding, X.: The realistic dilemma and concrete path of improving teachers' social service ability in higher vocational colleges. Educ. Career 13, 5 (2021)
3. Luo, J., Jiang, W.L., Gao, S.: Innovation of social training mechanism in higher vocational colleges under the background of Article 20 of Vocational Education. J. Higher Educ. 8, 34–37 (2022)
4. Chen, X.: Optimization of social training mechanism in higher vocational colleges under the background of Article 20 of Vocational Education. China Adult Educ. 24, 5 (2022)
5. Li, S.: Analysis of the current situation and quality improvement strategies of social training undertaken by higher vocational colleges. China Train. (2020)
6. Huang, L.Q.: Analysis on the construction of social service ability in higher vocational colleges. Educ. Career 24, 5 (2022)
7. Hu, G.H.: The dilemma and path of improving quality and improving excellence in social training in vocational colleges. Vocat. Educ. Res. 2, 5 (2023)
8. Zhang, M.: Current situation analysis of social service function in higher vocational colleges and ways to improve it. Cont. Educ. Res. 1, 3 (2020)

Research on the Precise Teaching Path of Higher Vocational Colleges Under the Concept of OBE in the Digital Era

Juan Luo(B)

School of Digital Economics, Changzhou College of Information Technology, Changzhou 213164, Jiangsu, China
[email protected]

Abstract. The concept of Outcome-Based Education (OBE) advocates student-centered, outcome-based, and continuously improving education. Given that the student structure of higher vocational colleges is complex and students' personalized development needs are prominent, it is urgent to implement this philosophy in every aspect of teaching, and precise teaching in the digital era makes this vision a reality. This paper discusses the role of precise teaching under the concept of OBE, including meeting personalized development needs, accommodating diverse learning paths, implementing generative teaching strategies, and exploring digitally empowered education. Moreover, "precision" is not only the result of information technology support but also the product of process decision-making in the bilateral activities between teachers and students. Therefore, based on the reverse thinking of OBE, this paper designs the implementation path of precise teaching. It defines learning outcomes by precisely understanding students' learning situations and setting instructional objectives, attains learning outcomes by precisely aligning teaching contents and designing instructional activities, and evaluates and optimizes learning outcomes by precisely initiating value-added evaluation and implementing teaching interventions, ultimately promoting continuous improvement in teaching quality and deepening the supply-side reform of talent cultivation.

Keywords: Digitalization · The Concept of OBE · Precise Teaching · Higher Vocational Colleges

1 Introduction

The "Opinions on Promoting High-quality Development of Modern Vocational Education" issued by the General Office of the CPC Central Committee and the General Office of the State Council emphasizes the importance of deepening the reform of education and teaching, innovating teaching modes and methods, and developing high-quality technical and skilled personnel who meet the needs of social and economic development. As higher vocational colleges move from scale development to connotation construction, it is necessary to deepen the supply-side reform, prioritize the improvement of talent cultivation quality, recognize the curriculum as the core of talent cultivation, and view the classroom as the interface for teaching and learning, playing a crucial role in talent cultivation. However, higher vocational colleges face challenges regarding the complex sources of students and the demand for individualized development. OBE emphasizes cultivating individuals with full personal development by guiding learning outcomes and achieving educational goals through continuous improvement. With the deep integration of vocational education and information technology, precise teaching supported by big data and artificial intelligence can respect individual differences, realize the implementation of the OBE concept in classroom teaching, and effectively improve the teaching effect and the quality of personnel training. Therefore, in the digital era, precise teaching based on the concept of OBE serves as a crucial pathway for achieving personalized education and fostering the potential of every individual to succeed.

2 The Connotation of the OBE Concept and Precise Teaching

2.1 The Concept of OBE

The concept of OBE, synthesized from the three elements of student-centered, outcome-based, and continuous improvement [1], was first proposed by Spady in 1981. With the adoption of the OBE concept by numerous engineering education accreditation institutions worldwide, it has gained widespread acceptance and application in China's education sector for teaching reform and talent cultivation.

Firstly, the concept of "student-centered" is rooted in humanistic theory and emphasizes that learners have an inherent autonomous motivation for self-actualization. In the teaching process, teachers should not only focus on students' cognitive development but, more importantly, respond to their needs and differences, and guide and inspire students to become individuals with full personality development. Secondly, "outcome-based" means that teaching is designed around the expected peak achievements that students can reach, and the teaching process is implemented through the backward design of courses and appropriate teaching strategies to promote deep and collaborative learning among students. Finally, the concept of "continuous improvement" is consistent with the PDCA cycle in quality management: teachers organize teaching based on expected learning outcomes, monitor the entire teaching process, and provide timely interventions, and students then self-diagnose and improve based on teaching evaluations, ultimately achieving the expected learning outcomes.

Therefore, the OBE concept overcomes the problems inherent in traditional vocational education, such as a curriculum-centric approach, unidirectional teaching methods, and vaguely defined learning outcomes. Instead, the OBE approach interweaves all the elements and aspects of teaching into a student-centered, outcome-based network, ensuring alignment among teaching objectives, processes, and outcomes, and achieves the ultimate goal of talent cultivation.

2.2 Precise Teaching

Precise teaching, originally proposed by Lindsley and rooted in the idea of individualized instruction and Skinner's behaviorist learning theory [3], refers to the use of scientific tools and measures to continuously observe learners' behavior and accurately record and present it, thus providing teachers with methods for making instructional decisions and interventions. With the deep integration of information technology and education, such as artificial intelligence and big data, the connotation of precise teaching has expanded, and its core function of "measurement for learning" has become more prominent, making it an important carrier for exploring the integration of information technology and curriculum teaching. Most research on precise teaching focuses on the construction and implementation framework of teaching modes driven by big data and supported by intelligent technology, highlighting the important role played by information technology in the teaching process [4]. However, Wang et al. [5] argue that precise teaching belongs to inquiry-based teaching, which requires the support of various types of data but should not overly rely on them, and they emphasize the importance of teachers' wise instructional decisions in achieving human-machine collaboration and effectively improving teaching quality. Therefore, this study considers that precise teaching cannot be separated from the enablement of emerging technologies but, more importantly, requires a return to the essence of education. The student-centered philosophy advocated by OBE and the idea of precise teaching are highly compatible, taking learning outcomes as the logical starting point for reflection: What are the learning outcomes of students? Why do students need to acquire these outcomes? How can students be helped to achieve the learning outcomes? How can it be evaluated whether students have acquired these outcomes? To address these questions, this study follows the OBE concept to explore the path for higher vocational colleges to carry out precise teaching.

3 The Necessity of Implementing Precise Teaching in Higher Vocational Colleges Under the Concept of OBE

In higher vocational colleges, classroom teaching still places considerable emphasis on the transmission of disciplinary knowledge, which neglects the characteristics of higher vocational education and the individual attributes of students. Consequently, this leads to a disconnection between learning and employment, as well as between theory and practice. In the connotation construction of higher vocational colleges, the most significant aspect is the enhancement of students' abilities and their development as individuals. Therefore, it is necessary to adopt the concept of OBE to redesign the direction of talent cultivation and to implement precise teaching based on intelligent and refined development, thus truly incorporating it into classroom teaching and facilitating students' comprehensive development and personalized growth.

3.1 An Important Avenue to Meet Personalized Development Needs

Higher vocational college students are diverse: they mainly come from general high schools, vocational high schools, and technical secondary schools. In addition, due to the expansion of enrollment, higher vocational colleges also admit social students such as retired soldiers, in-service employees, laid-off workers, and migrant workers. However, the "massification" teaching model is no longer suitable for such diverse student backgrounds and learning needs. To achieve personalized development centered on students and supported by information technology, precise teaching is necessary. Firstly, it should be able to identify individual learning situations and start teaching from students' existing cognitive levels, clarifying learning starting points and teaching references. Secondly, it should be capable of determining learning needs by diagnosing individual learning obstacles and demands, assessing the gap between cognition and needs, and setting precise teaching goals to define learning outcomes. Thirdly, it should scientifically match learning content, focusing on teaching objectives and using cognitive maps and knowledge graphs to process and accurately deliver content that conforms to individual cognitive laws. Therefore, precise teaching under the concept of OBE can promote the realization of learners' personalized development vision through learning situation analysis, goal setting, and content selection, stimulate learning motivation, and maximize individual talents.

3.2 An Important Means of Adapting to Diverse Learning Paths

Higher vocational college students exhibit a strong ability to acquire experience and strategies, yet they lack proficiency in independent exploration. This reflects their limited capacity for self-reflection and for constructing their own learning pathways. Moreover, given the differences in cognitive patterns and learning needs among students, relying solely on teachers' subjective planning of learning paths is clearly inappropriate. Therefore, it is necessary to design instructional activities that are centered on students and aimed at achieving the desired results, while also planning diverse learning paths to adapt to their developmental needs.

Firstly, it is important to establish authentic scenarios that drive task-oriented learning. Teachers can rely on intelligent technology to accurately match students with tasks and resources of varying difficulty, guiding them in constructing and solving problems independently. Secondly, the concept of OBE emphasizes the evaluation of the value-added aspects of learning outcomes. Precise teaching makes this subtle consideration a reality by diagnosing the completion of students' preceding activities based on the value-added evaluation system and combining it with individual cognitive development paths to arrange subsequent activities. Thirdly, real-time monitoring of students' learning processes allows differentiated intervention methods to be provided to students who deviate from the learning trajectory. Therefore, precise teaching under the concept of OBE can accommodate diverse learning paths through instructional activity design, diagnosis and evaluation, and monitoring and intervention.

3.3 A Significant Approach to Implementing Generative Teaching Strategies

The cognition, emotions, and skills of students are not immutable but are dynamically generated throughout the teaching process. Precise teaching, embodying the lean thinking of continuous improvement, is an effective approach for implementing dynamic generative teaching strategies. Firstly, generative teaching strategies advocate the integration of affective and cognitive aspects in humanistic theories, emphasizing the cultivation and stimulation of students' learning motivation. Precise teaching, for its part, recognizes individual differences among students and utilizes tiered instructional goals, content, and evaluation methods to stimulate their learning initiative. Secondly, generative teaching strategies emphasize positive interaction between teachers and students, as well as among students themselves, promoting active thinking and knowledge transfer during instruction. Precise teaching enables teachers to identify changes in students' learning situations and adjust their strategies accordingly, while also providing a scientific basis for students to recognize their own learning gaps and modify their behaviors. This facilitates continuous iteration and optimization of the teaching process. Therefore, precise teaching achieves dynamic generative teaching through the interaction between teachers' precise instruction and students' individualized learning.

3.4 A Significant Form of Digitally Empowered Education

The 20th National Congress of the Communist Party of China called for the promotion of digitization in education, making the digital transformation of vocational education an inevitable trend. As vocational education is closely linked to regional economic and social development, it is necessary to enhance its service capabilities on the talent supply side, meet students' personalized development needs, adapt to diversified learning paths, and implement generative teaching strategies to achieve meaningful learning. Precise teaching under the concept of OBE utilizes modern information technologies such as big data, AI, and 5G to optimize teaching [2]. This makes personalized guidance a reality, provides a scientific basis for teaching decisions, promotes collaborative development between teachers and students, and becomes an important form of exploring digital transformation to empower educational reform.

4 The Implementation Path of Precise Teaching in Higher Vocational Colleges Under the Concept of OBE in the Digital Age

Based on the concept of OBE, this study proposes an implementation framework for precise teaching in higher vocational colleges (see Fig. 1). This framework, synthesized from the three elements of OBE and centered on students, is outcome-based and considers what students should learn, why they need to learn it, how to help them achieve their learning outcomes, and how to evaluate their achievements. Supported by information technologies such as artificial intelligence and big data, the implementation of precise teaching in higher vocational colleges is divided into three stages, namely defining learning outcomes, attaining learning outcomes, and evaluating learning outcomes, presented in an upward spiral structure. At the same time, teachers' important role in teaching design and organization is emphasized, and the concept of "precision" is implemented throughout the teaching process in six steps: precisely understanding students' learning situations and setting teaching goals, precisely matching teaching content and designing teaching activities, and precisely carrying out value-added evaluation and implementing teaching interventions. This cyclic process ultimately leads to continuous improvement of teaching quality and learning outcomes for both teachers and students.


Fig. 1. Implementation framework of precise teaching in higher vocational colleges under the concept of OBE.

4.1 Define Learning Outcomes and Establish Precise Teaching Objectives

Precisely Understanding the Learning Situation of Students. Under the concept of OBE, learning outcomes are equivalent to course objectives. In setting these objectives, it is crucial not only to have well-defined descriptions but also to ensure that they accurately match the students' learning situation. Learning analytics, a value-based assessment, is used to explore multiple dimensions of educational data through channels such as intelligent campus platforms and intelligent teaching platforms. This allows students to be analyzed and profiled in terms of their basic traits, knowledge foundation, learning ability, learning needs, learning style, and emotional attitude, with the goal of gaining a precise understanding of group and individual learning situations. As the main drivers of teaching, teachers can rely on artificial intelligence and big data technologies to obtain learners' process data, which can be used to form a "knowledge distribution diagram", "score distribution diagram", "teacher-student interaction network diagram", etc., to analyze group learning situations and focus on common weaknesses among students. More importantly, teachers can use individual learners' "personal knowledge graph", "knowledge point time-effect matrix", "dynamic change diagram of achievement", "learning behavior diagram", "psychological test diagram", etc., to analyze individual learning situations and focus on personalized differences among students. Based on the analysis of group and individual learning situations, "learning pain points" can be precisely identified, providing support for formulating differentiated teaching objectives and designing teaching activities.

Precisely Setting Instructional Objectives. On the basis of understanding the learning situation, teaching objectives are formed, that is, expected learning outcomes are defined, in order to accurately express the mapping relationship between the individual characteristics of students and the expected learning outcomes. In line with the concept of OBE, setting precise teaching objectives requires attention to the upper limit of the zone of proximal development [6], using high-quality peak achievements as a guide for reverse design and breaking down the overall course objectives that support graduation requirements into detailed teaching unit objectives, thereby truly inspiring students' learning motivation and cultivating individuals who can adapt to change and fully develop their personalities. The teaching objectives should be accurately described and precisely mapped. In higher vocational colleges, teaching objectives are generally divided into knowledge objectives, skill objectives, and quality objectives, which are often ambiguously described using terms such as "understand, master, analyze, form, and apply." It is necessary to transform these general descriptions into specific, clear, and measurable ones to express learning outcomes effectively. Additionally, by combining learner profiles and knowledge graphs created through information technology, fundamental objectives appropriate to the group learning situation can be set, while hierarchical and classified objectives based on individual characteristics can be established to create a direct mapping relationship, thereby achieving meaningful learning and, ultimately, optimizing the learning process.

4.2 Attain Learning Outcomes by Precisely Designing Instructional Processes

Precisely Aligning Teaching Contents. Accurate matching of teaching content is an effective guarantee for achieving teaching objectives; it requires precise customization and delivery based on dynamic tracking of each student's learning situation. Higher vocational colleges often tailor talent development to local economic demands by using real or virtual simulation projects of enterprises as carriers and planning and restructuring appropriate teaching content, thus achieving precise customization. The precise delivery of teaching content relies primarily on intelligent recommendation algorithms, which mine the associations between cognitive maps and knowledge graphs and, around the mapping relationship between learners and teaching content, deliver customized content to the corresponding students. The main delivery channels include: 1) learner-based recommendation, which provides the same teaching content to a group of students with similar learning situations, taking the associations between students into account; 2) content-based recommendation, which delivers highly matched content to the corresponding students based on the matching degree between students and content; and 3) association rule-based recommendation, which provides content to students according to their learning trajectories, by combining the relationships between unrevised and revised content.

Precisely Designing Instructional Activities. The achievement of differentiated teaching objectives relies on precise teaching activities in the classroom. "Precision" is reflected in the interactive process between teachers' "precise teaching" and students' "personalized learning". Therefore, the precise conduct of teaching activities requires support from technologies such as data mining, artificial intelligence, and intelligent recommendation, using created real-world situations as carriers and employing diverse learning paths across the before-, during-, and after-class stages.

Before class, personalized learning is implemented through online resources. Teachers rely on an intelligent teaching platform to push teaching resources intelligently and accurately, and students complete ability assessments and grouping after autonomous learning. Teachers obtain student data to detect individual differences, clarify teaching difficulties, and adjust teaching strategies in a timely manner. In class, precise guidance is implemented through hierarchical tasks. Teachers release hierarchical tasks of different difficulty levels, and students take on tasks based on their existing foundation and their own will. They raise questions during the completion process, and the teacher collects issues while circulating. Meanwhile, the teacher utilizes the information collection and analysis technology integrated into the smart classroom to quickly identify student confusion and implement hierarchical precise guidance: for common problems, whole-class teaching activities are adopted, and for individual issues, differentiated learning guidance is adopted. After class, dynamic adjustments are made based on feedback results. After completing the tasks, student groups submit their hierarchical task results, and teachers obtain data on learning attitudes and behaviors as well as learning processes and results, forming visual evaluation feedback for students. Finally, students self-assess and dynamically adjust their groups, co-learning to achieve peak results, while teachers dynamically adjust their teaching strategies, forming a closed-loop operation and continuous improvement mechanism before, during, and after class.

4.3 Evaluate Learning Outcomes and Precisely Diagnose Achievements

Precisely Initiating Value-Added Evaluation. Under the concept of OBE, evaluating learning outcomes means measuring whether students have achieved the expected results, namely, diagnosing the attainment of teaching objectives, which helps to enhance the consistency of "teaching, learning, and evaluation". However, traditional teaching evaluation in vocational education emphasizes identification and selection functions while ignoring incentive and diagnostic functions. Therefore, it is necessary to explore value-added evaluation, which measures added value against teaching objectives using evaluation standards.

One approach is to establish a multi-level value-added assessment system. Firstly, evaluation indicators are established around teaching objectives and contents, with the evaluation of vocational knowledge, skills, and competencies as the core content. Secondly, a multi-level value-added assessment standard is formed based on these indicators. As learning effectiveness and value-added space differ among students learning the same content, a uniform standard cannot be used to evaluate all students; multiple levels of value-added quotas need to be established. For students with weaker learning foundations and abilities but high progress potential, a higher quota should be set, whereas for those with lower progress potential, a lower quota should be set to motivate them to explore their potential and achieve higher levels of development.

The second aspect involves building a diversified value-added evaluation support system. With regard to the evaluation subjects, higher vocational colleges are the main platform for cultivating professional and technical talent for enterprises, so the evaluation subjects should include not only teachers and students but also industry and enterprise experts, forming a diversified evaluation system. In terms of evaluation methods, a comprehensive combination of diagnostic evaluation before learning, formative evaluation during learning, and summative evaluation after learning is needed. Regarding technical support, it is necessary to utilize intelligent teaching platforms and integrated smart classrooms to collect information on learning behaviors, habits, and emotional attitudes; record test, final exam, and layered task assessment results; analyze and create "dynamic grade change charts", "learning behavior charts", and other visual reports; and provide accurate diagnosis for students who have not achieved the corresponding value-added quota.

Precisely Implementing Teaching Intervention. This is the core aspect of precise teaching. Combining existing research and practical teaching experience, a three-level "class-group-individual" intervention system can effectively solve students' common and individual problems. Considering the reality of large-scale teaching in higher vocational colleges, whole-class teaching intervention should be conducted first. Based on the "knowledge distribution map", "score distribution map", and "teacher-student interaction network map" generated from evaluation, teachers can analyze the common problems of the class, provide guidance on the learning path, and organize activities such as scenario simulations, on-site teaching, game interactions, and classroom debates to facilitate knowledge construction, internalization, and overall learning motivation.

Furthermore, group-based differentiated instruction is employed, based on the OBE concept's emphasis on cooperation and communication rather than competition and exclusivity. Teachers first form groups based on students' profiles and accurately establish the teaching objectives and content for each group, while providing corresponding classroom tasks and post-class training activities to achieve differentiated, classified teaching. After each learning stage, highly rated groups are encouraged to challenge more difficult tasks, while low-rated groups are provided with profiles to analyze the reasons for their performance. In particular, students who cannot keep up or who are insufficiently challenged are encouraged to adjust their group, resulting in dynamic differentiation.

Finally, individualized instructional intervention is carried out, focusing on underachieving and overachieving students. For underachieving students, primary intervention measures include setting clear and concise teaching goals, delivering basic teaching content and tasks, and conducting one-on-one tutoring and Q&A sessions. In addition, complementary intervention measures such as emotional motivation, psychological counseling, and institutional constraints are provided to foster the development of learning interests and abilities. For overachieving students, teachers adopt primary intervention measures such as setting challenging higher-order goals, matching advanced teaching content and tasks, and implementing one-on-one guidance. Furthermore, complementary intervention measures such as extended projects and variant training projects are offered to facilitate the formation of critical thinking and problem-solving abilities. Therefore, based on instructional evaluation, diagnosis, feedback, and intervention, teaching decisions can be continuously optimized, leading to an iterative cycle toward the expected outcomes.


Acknowledgment. This research is sponsored by QingLan Project of colleges and universities in Jiangsu and also by the educational reform project at Changzhou College of Information Technology “Research on the Precise Teaching Path of Higher Vocational Colleges under the Concept of OBE in the Digital Era” under grant No.2023CXJG19. I thank the anonymous referees for their comments.

References

1. Zhang, N.X., Zhang, L., Wang, X.F., Sun, J.H.: Origin, core and practical boundary of OBE: discussion on paradigm change in professional certification. Res. High. Educ. Eng. 03, 109–115 (2020)
2. An, F.H.: Precision instruction: historical evolution, realistic examination and value interpretation. Curriculum Teach. Mater. Method 41(08), 56–62 (2021)
3. Sun, H.M., Cai, Y.H., Li, X.Y., Xuan, Y.Z.: Instructional design of learning activities based on OBE. High. Educ. Dev. Eval. 38(06), 99–111+123–124 (2022)
4. Guo, L.M., Yang, X.M., Zhang, Y.: Analysis on new development and value orientation of precise teaching in the era of big data. E-education Res. 40(10), 76–81+88 (2019)
5. Wang, L.H., Xia, L.L., He, W.T.: Precise teaching returning to pedagogy—towards human-machine cooperation. E-education Res. 42(12), 108–114 (2021)
6. Liu, N., Yu, S.Q.: Research on precise teaching based on zone of proximal development. E-education Res. 41(07), 77–85 (2020)

An Optimal Inventory Replenishment Strategy with Cross-docking System and Time Window Problem

Yen-Deng Huang1,2, Simon Wu3,4(B), and Xue-Fei Yuan1

1 School of Digital Economics, Changzhou College of Information Technology, Science Education City, Changzhou, Jiangsu 213164, China
2 School of Information and Technology, Sanda University, Shanghai Pudong District, Shanghai 201209, China
3 Department of Industrial and Systems Engineering, Chung Yuan Christian University, 200, Chung Pei Road, Zhongli District, Taoyuan 320314, Taiwan
[email protected]
4 Ford Motor, Taoyuan, Taiwan

Abstract. Innovative products not only have ever shorter product life cycles (PLC) but also face highly variable demand from a market with diversified customers. It is important to apply different price discounts as an effective pricing strategy so that suppliers can respond to the different demands of their downstream partners. Therefore, this study formulates a model that minimizes the overall associated costs subject to a time window constraint and a service level (SL), while considering cross-docking operating techniques, a price discount strategy, and admissible stock-outs. The focus is an innovative, high-tech inventory replenishment problem considering a cross-docking operating system, uncertain demand, the comprehensive ordering system (s, T, Q), service level, and time windows with multiple suppliers, multiple periods, and quantity discounts. A stochastic optimization meta-heuristic, the Particle Swarm Optimization Retrospective Approximation (PSORA) method, is applied to solve the integrated inventory replenishment model. This study provides managerial insights into managing the supply chain of innovative consumer electronic products and reducing the total operating cost.

Keywords: Inventory management · cross-docking system · Time Windows · stochastic optimization meta-heuristics algorithm

1 Introduction

In recent years, the life cycle of consumer electronic innovative products has become shorter and customers have more choices than before, which makes market demand very difficult to predict precisely. This demand uncertainty may lead industries to purchase more products than needed to meet all downstream demand, resulting in excessive inventory cost. In order to manage inventory well, many companies are seeking ways to reduce inventory through advanced decision-making techniques (Arnold and Chapman [1]).

In the 1980s, Wal-Mart first successfully introduced cross-docking, an inventory management technique that effectively reduced the standing time of products in the warehouse and delivered goods to their destinations quickly. Napolitano [2] constructed a program that briefly illustrates the diverse types of cross-docking operations. Lee et al. [3] developed an integrated model of vehicle routing scheduling and cross-docking in the supply chain and applied a Tabu Search (TS) algorithm to solve the resulting NP-hard problems. More detailed studies can be found in Mousavi and Tavakkoli-Moghaddam [4], Shi et al. [5], and Agustina et al. [6]. Agustina et al. [6] examined fresh food distribution with short shelf lives and respected delivery time windows in the supply chain.

As mentioned above, we discuss the strategy of price discounts for innovative products (as in Shen and Willems [7]). Furthermore, this study considers a time window constraint to control the delivery time from the suppliers to the destination (as in Lim et al. [8]), and formulates a model that minimizes the overall associated costs subject to the time window constraint and service level (SL) while considering cross-docking operating techniques. The comprehensive ordering system (s, T, Q) is illustrated in Fig. 1. Finally, the integrated inventory model with cross-docking provides managerial insights into reducing the total operating cost.

Owing to its complexity, the proposed problem is NP-hard and cannot be solved with analytical or mathematical methods. Furthermore, the proposed model is classified as a stochastic optimization problem because customer demands are assumed to be unknown. Chen and Huang [9] developed a new stochastic optimization algorithm, called the finite-difference retrospective algorithm (FDRA), to solve the bus schedule problem; FDRA solves a series of sample-path approximation problems. Huang et al. [10] extended the concepts of RA and combined them with an improved global-local neighbor particle swarm optimization (GLNPSO), proposed by Boonmee and Sethanan [11], to solve stochastic optimization problems. GLNPSO expands and improves the basic framework of PSO by considering a self-adaptive inertia weight, the velocity and dispersion index of the swarm, self-adaptive acceleration constants, and multi-social learning mechanisms that include the global, personal best, local, and near-neighbor best positions. Finally, the stochastic optimization meta-heuristic Particle Swarm Optimization Retrospective Approximation (PSORA) method proposed by Huang et al. [10] is applied to solve the integrated inventory replenishment model.
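The GLNPSO velocity update described above can be pictured as standard PSO with two extra social terms. The fragment below is only an illustrative Python sketch of that idea, not the authors' implementation; the function name and the coefficient values are our assumptions, not the tuned values of [11].

```python
import random

def glnpso_velocity(v, x, pbest, gbest, lbest, nbest,
                    w=0.7, cp=1.5, cg=1.5, cl=1.0, cn=1.0):
    """One GLNPSO-style velocity update for a single particle.

    Extends the basic PSO update (inertia + personal best + global best)
    with local-best and near-neighbor-best social terms, as described
    above for GLNPSO."""
    r1, r2, r3, r4 = (random.random() for _ in range(4))
    return [w * vi
            + cp * r1 * (pb - xi)    # cognitive term: personal best
            + cg * r2 * (gb - xi)    # social term: global best
            + cl * r3 * (lb - xi)    # social term: local (subswarm) best
            + cn * r4 * (nb - xi)    # social term: near-neighbor best
            for vi, xi, pb, gb, lb, nb in zip(v, x, pbest, gbest, lbest, nbest)]
```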

Fig. 1. The flowchart of the comprehensive ordering system (s, T, Q). (Flowchart summary: starting from the available inventory with t = 1, each demand request advances the cycle counter t until t = T; the inventory position (holding quantity + goods in transit − shortage, plus any received quantity) is then checked against the reorder point s, and a purchase order of the fixed order quantity Q is placed if the inventory is at or below s; otherwise no order is placed and the cycle restarts.)
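Read as pseudocode, Fig. 1 reduces to a short periodic-review routine. The sketch below is our Python rendering of the flowchart only; the variable names are illustrative and not part of the paper's notation.

```python
def review_step(on_hand, in_transit, shortage, t, T, s, Q):
    """One pass through the (s, T, Q) flowchart of Fig. 1.

    Returns (order_quantity, next_t): an order of the fixed quantity Q
    is placed only when the review cycle t has reached T and the
    inventory position is at or below the reorder point s."""
    if t != T:
        return 0, t + 1                      # cycle not complete: no order
    position = on_hand + in_transit - shortage
    if position <= s:
        return Q, 1                          # reorder point hit: order Q
    return 0, 1                              # enough stock: restart cycle
```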

2 Problem Description and Assumptions

First, this study considers multiple suppliers, a distribution center (DC) with a cross-docking system, and one or more retailers. The customer demand for each product follows a Poisson distribution with arrival rate λ. Under the comprehensive ordering system (s, T, Q), the retailer places an order and purchases the order quantity (Q) when the inventory period (T) has elapsed and the ending inventory level (I) is lower than the reorder point (s), as shown in Fig. 1. In this situation, product shortages may occur. All retailers place orders and notify the DC, and the DC then informs each supplier to prepare the products via an electronic ordering system (EOS). Suppliers ship the products by truckload (TL) to the DC, which then transports them to all retailers by TL. When a retailer places an order, the lead time (LT) differs across suppliers, and delivery times also differ; therefore, product shortages may occur and increase the inventory costs at the DC. For simplicity, this study assumes that the LT of each supplier is the same and that the delivery time, whether from the upstream suppliers to the DC or from the DC to the retailers, follows a Normal distribution. The TL leaves for the retailers immediately once the last vehicle from a supplier arrives at the DC. Finally, this study assumes that the expected value (μ) of the delivery time from a supplier to the DC is positively correlated with Q, because the loading and unloading time also increases when the retailer orders a large quantity Q. Under the limitations of the soft time window and the SL threshold (α), this study aims to minimize the overall associated cost of the inventory model under the comprehensive ordering system (s, T, Q). The basic assumptions are as follows:

• The lead time of every supplier is the same.
• The quantity discount on the purchasing prices is an incremental discount, and the unit purchasing cost of a product depends on the order quantities from downstream.
• A single vehicle type is used, and the TL capacity is treated as infinite, whether the truck transports products from the suppliers to the DC or from the DC to the retailers.
• The delivery time follows a Normal distribution, whether from the suppliers to the DC or from the DC to the retailers.
• The supply quantity of every supplier is known and constant, and the demand rate (λ) of each retailer follows a Poisson distribution.
• The DC is used only for transshipment, because products do not stay in the DC for long.
• Loading and unloading during transportation are not considered, either from the suppliers to the DC or from the DC to the retailers.
• The comprehensive ordering system (s, T, Q) is adopted by all retailers: a retailer places an order and purchases the order quantity (Q) when the inventory period (T) has elapsed and the ending inventory level (I) is below the reorder point (s_kr).
• Each supplier produces only one product, and a soft time window limitation is considered.
• The inventory period (T) of each retailer is assumed to be the same, meaning that all retailers are reviewed at the same time.
• Table 1 defines the decision variables used in this study.

Table 1. Notations of decision variables

Decision variable   Definition                                                        Unit
T                   The length of the inventory period                                (hour)
Q_k                 The total supply amount from supplier k to the DC                 (piece(s))
Q_kr                Order quantity of product k for retailer r                        (piece(s))
Q_rk                Fixed order quantity of product k for retailer r                  (piece(s))
s_kr                Reorder point of product k for retailer r                         (piece(s))
s                   Reorder point (ROP) (all ROPs are set to s_kr = s)                (piece(s))
X_kr                Set to 1 when retailer r orders and supplier k supplies,
                    otherwise 0 (binary variable)
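For readers implementing the model, the decision variables of Table 1 map naturally onto a small record per (supplier k, retailer r) pair. A hedged Python sketch (our names, not the paper's code):

```python
from dataclasses import dataclass

@dataclass
class Decision:
    """Decision variables of Table 1 for one (supplier k, retailer r) pair."""
    T: float       # length of the inventory period (hours), shared by all
    Q_k: int       # total supply amount from supplier k to the DC (pieces)
    Q_kr: int      # order quantity of product k for retailer r (pieces)
    Q_rk: int      # fixed order quantity of product k for retailer r (pieces)
    s_kr: int      # reorder point of product k for retailer r (= s for all)
    X_kr: bool     # True when retailer r orders and supplier k supplies
```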

3 Model Development

The minimal cost function includes the ordering costs, transportation costs, inventory and stock-out costs, and penalty costs for violating the soft time window constraint, as shown in Eq. 1. The overall cost is minimized under the comprehensive ordering system (s, T, Q) when the constraints of the soft time windows and the SL threshold α are both satisfied, as shown in Eqs. 2–8. The proposed model with the cross-docking strategy is a well-known NP-hard problem and cannot be solved with analytical or mathematical methods. Furthermore, this type of problem is also classified as a stochastic optimization problem due to uncertain demands. Consequently, the PSORA algorithm is applied to solve the integrated inventory model under the comprehensive ordering system (s, T, Q) and the cross-docking policy. The main objective function of the integrated supply chain system is:

$$\min\; E[F_O(Q)] + E[F_{kD}(Q)] + E[F_{Dr}(Q)] + E[F_R(Q,I)] + E[F_D(Q)] + E[F_B(Q)] + E[F_P(Q)] \quad (1)$$

Subject to:

$$W_k - \beta Q_{kr} \ge C_{pk} + V_k \quad (2)$$

$$T_{kr} = LT + T_{kD} + T_D + T_{Dr} \quad (3)$$

$$P_r = \begin{cases} P_e(e_r - T_{kr}) & \text{if } \max\{T_{kr}X_{kr}\} \le e_r,\ X_{kr}=1\ \forall k,r \\ P_l(T_{kr} - l_r) & \text{if } \max\{T_{kr}X_{kr}\} \ge l_r,\ X_{kr}=1\ \forall k,r \\ 0 & \text{otherwise} \end{cases} \quad (4)$$

$$Q_k = \sum_r Q_{kr} X_{kr} \quad (5)$$

$$Q_{kr} = \begin{cases} Q_{rk} & \text{if } I^n_{kr} < s_{kr} \\ 0 & \text{otherwise} \end{cases} \quad (6)$$

$$E\!\left[\frac{\min\!\big(I^n_{kr},\, d^{n1}_{kr}\big) + \min\!\big(\max\!\big(I^n_{kr} - d^{n1}_{kr} + Q_{kr},\, 0\big),\, d^{n2}_{kr}\big)}{\lambda_{kr} T}\right] = \alpha_{kr}, \quad \alpha_{kr} \ge \alpha\ \ \forall k,r \quad (7)$$

$$X_{kr} = \begin{cases} 1 & \text{if } I^n_{kr} < s_{kr} \\ 0 & \text{otherwise} \end{cases}, \qquad s_{kr} = s\ \ \forall k,r \quad (8)$$

where Eq. 2 states that the unit selling price after the discount policy is applied is not lower than supplier k's acceptable price. Equation 3 calculates the total delivery time $T_{kr}$, which is the sum of LT, the time from supplier k to the DC, the processing time in the DC, and the time from the DC to retailer r. Equation 4 calculates retailer r's penalty cost $P_r$ for violating the soft time window when supplier k ships products through the DC, which then transfers them to retailer r: if the total transportation time $T_{kr}$ is less than the lower bound of the soft time window, the early penalty cost $P_e(e_r - T_{kr})$ is incurred, whereas the late penalty cost $P_l(T_{kr} - l_r)$ is incurred when $T_{kr}$ exceeds the upper bound. Because the transportation times of the suppliers differ, it is assumed that all delivery trucks must wait until the last truck arrives; therefore, the maximum transportation time $\max\{T_{kr}X_{kr}\}$ is taken over the suppliers k from which retailer r has placed an order ($X_{kr} = 1$). Equation 5 states that the supply quantity from supplier k equals the total order quantity from all retailers. Equation 6 assumes that when the initial inventory level $I^n_{kr}$ of product k for retailer r is lower than the ROP $s_{kr}$ in inventory period n, retailer r places a fixed order quantity $Q_{rk}$; otherwise, it does not order in period n. For simplicity, all ROPs $s_{kr}$ are set to the same parameter s. Equation 7 requires the expected SL value of product k for retailer r to be no smaller than the predetermined SL threshold α, i.e., the SL of every retailer must satisfy the predetermined threshold. Finally, Eq. 8 defines the binary decision variable $X_{kr}$: when the initial inventory level $I^n_{kr}$ is lower than the ROP $s_{kr}$ of product k for retailer r in period n, retailer r places an order in the comprehensive ordering system (s, T, Q) and $X_{kr}$ is set to 1; otherwise it is 0.

Subsequently, the components of the objective function in Eq. 1 are described. In Eq. 9, the ordering costs of the retailers consist of two parts: the fixed ordering cost and the purchasing cost, which is determined by the order quantity of retailer r and the discount policy.

$$F_O(Q_{kr}) = \sum_r \big[\, O + (W_k - \beta Q_{kr})\, Q_{kr} \,\big] \quad (9)$$

The transportation cost consists of two stages: the cost of delivering products from supplier k to the DC and the cost of delivering products from the DC to the retailers, as shown in Eq. 10.

$$F_{kD}(Q_{kr}) + F_{Dr}(Q_{kr}) = \sum_k C_{kD}\, Q_k + \sum_r \sum_k C_{Dr}\, Q_{kr} \quad (10)$$

The total transportation cost shown above is transformed into a piecewise linear concave cost function in this study. The multiple-choice model is used to construct the piecewise linear concave function, which is expressed as follows:

$$C_{kD}\Big(\sum_r Q_{kr}\Big) = \sum_w \big(a^w Z^w + f^w y^w\big) \quad \text{or} \quad C_{Dr}\Big(\sum_k Q_{kr}\Big) = \sum_w \big(a^w Z^w + f^w y^w\big) \quad (11)$$

$$Z^w = \begin{cases} \sum_r Q_{kr} & \forall k \ (\text{link K--D}) \\ \sum_k Q_{kr} & \forall r \ (\text{link D--R}) \end{cases}, \qquad b^{w-1} y^w \le Z^w \le b^w y^w, \qquad \sum_w y^w \le 1, \qquad y^w \in \{0,1\}$$

The inventory costs include the inventory models of the DC and the retailers, as shown in Eqs. 12–18. First, we briefly introduce the inventory cost in the DC based on the order quantities from all retailers. The total average inventory cost is:

$$F_D(Q_{kr}) = \sum_r \sum_k H_D\, \frac{\{\max(T_{kD}X_{kr}) - T_{kD}X_{kr}\}\, Q_{kr}}{\max\{T_{kD}X_{kr}\}} \quad (12)$$

Secondly, the inventory and shortage costs for all retailers are classified into six situations:

a. The initial inventory level of product k exceeds the ROP for retailer r in period n (i.e., $I^n_{kr} > s_{kr}$), and there is still inventory at the end of period n (i.e., $I^{n+1}_{kr} = I^n_{kr} - d^n_{kr} > 0$). The inventory and shortage costs in case a are defined separately as:

$$F_R(Q_{kr}, I) = H_{kr}\, \frac{I^n_{kr} + I^n_{kr} - d^n_{kr}}{2}, \qquad F_B(Q_{kr}) = 0 \quad (13)$$

b. The initial inventory level of product k exceeds the ROP for retailer r in period n (i.e., $I^n_{kr} > s_{kr}$); however, there is no inventory at the end of period n (i.e., $I^n_{kr} < d^n_{kr}$). The inventory and shortage costs in case b are formulated as:

$$F_R(Q_{kr}, I) = H_{kr}\, \frac{I^n_{kr} \times I^n_{kr}}{2 \times d^n_{kr}}, \qquad F_B(Q_{kr}) = B\big(d^n_{kr} - I^n_{kr}\big) \quad (14)$$

c. The initial inventory level of product k is at or below the ROP for retailer r in period n (i.e., $I^n_{kr} \le s_{kr}$). After retailer r places an order for product k with supplier k, there is still inventory while the products are being delivered (i.e., $I^n_{kr} \ge d^{n1}_{kr}$), and there is inventory at the end of period n (i.e., $I^{n+1}_{kr} = I^n_{kr} - d^{n1}_{kr} + Q_{kr} - d^{n2}_{kr} > 0$). The inventory and shortage costs in case c are obtained as:

$$F_R(Q_{kr}, I) = H_{kr}\, \frac{\dfrac{I^n_{kr} + \big(I^n_{kr} - d^{n1}_{kr}\big)}{2}\, T_{kr} + \dfrac{\big(I^n_{kr} - d^{n1}_{kr} + Q_{kr}\big) + I^{n+1}_{kr}}{2}\, (T - T_{kr})}{T}, \qquad F_B(Q_{kr}) = 0 \quad (15)$$

d. The initial inventory level of product k is at or below the ROP for retailer r in period n (i.e., $I^n_{kr} \le s_{kr}$). After retailer r places an order for product k with supplier k, there is still inventory while the products are being delivered (i.e., $I^n_{kr} \ge d^{n1}_{kr}$), but there is no inventory at the end of period n (i.e., $I^n_{kr} - d^{n1}_{kr} + Q_{kr} \le d^{n2}_{kr}$). The inventory and shortage costs in case d are obtained as:

$$F_R(Q_{kr}, I) = H_{kr}\, \frac{\dfrac{I^n_{kr} + \big(I^n_{kr} - d^{n1}_{kr}\big)}{2}\, T_{kr} + \dfrac{\big(I^n_{kr} - d^{n1}_{kr} + Q_{kr}\big) \times \big(I^n_{kr} - d^{n1}_{kr} + Q_{kr}\big)}{2 \times d^{n2}_{kr}}\, (T - T_{kr})}{T}, \qquad F_B(Q_{kr}) = B\big(d^{n1}_{kr} + d^{n2}_{kr} - I^n_{kr} - Q_{kr}\big) \quad (16)$$

e. The initial inventory level of product k is at or below the ROP for retailer r in period n (i.e., $I^n_{kr} \le s_{kr}$). After retailer r places an order for product k with supplier k, the inventory runs out while the products are being delivered (i.e., $I^n_{kr} \le d^{n1}_{kr}$), but there is inventory at the end of period n (i.e., $I^{n+1}_{kr} = I^n_{kr} - d^{n1}_{kr} + Q_{kr} - d^{n2}_{kr} > 0$). The inventory and shortage costs in case e are:

$$F_R(Q_{kr}, I) = H_{kr}\, \frac{\dfrac{I^n_{kr} \times I^n_{kr}}{2 \times d^{n1}_{kr}}\, T_{kr} + \dfrac{\big(I^n_{kr} - d^{n1}_{kr} + Q_{kr}\big) + I^{n+1}_{kr}}{2}\, (T - T_{kr})}{T}, \qquad F_B(Q_{kr}) = B\big(d^{n1}_{kr} - I^n_{kr}\big) \quad (17)$$

f. The initial inventory level of product k is at or below the ROP for retailer r in period n (i.e., $I^n_{kr} \le s_{kr}$). After retailer r places an order for product k with supplier k, the inventory runs out while the products are being delivered (i.e., $I^n_{kr} \le d^{n1}_{kr}$), and there is also no inventory at the end of period n (i.e., $I^n_{kr} - d^{n1}_{kr} + Q_{kr} \le d^{n2}_{kr}$). The inventory and shortage costs in case f are:

$$F_R(Q_{kr}, I) = H_{kr}\, \frac{\dfrac{I^n_{kr} \times I^n_{kr}}{2 \times d^{n1}_{kr}}\, T_{kr} + \dfrac{\max\big(I^n_{kr} - d^{n1}_{kr} + Q_{kr},\, 0\big) \times \max\big(I^n_{kr} - d^{n1}_{kr} + Q_{kr},\, 0\big)}{2 \times d^{n2}_{kr}}\, (T - T_{kr})}{T}, \qquad F_B(Q_{kr}) = B\big(d^{n1}_{kr} + d^{n2}_{kr} - I^n_{kr} - Q_{kr}\big) \quad (18)$$

When the transportation time accumulated from supplier k to retailer r violates the soft time window, the retailers' penalty cost is calculated as:

$$F_P(Q_{kr}) = \sum_r P_r, \qquad P_r = \begin{cases} P_e(e_r - T_{kr}) & \text{if } \max\{T_{kr}X_{kr}\} \le e_r,\ X_{kr}=1\ \forall k,r \\ P_l(T_{kr} - l_r) & \text{if } \max\{T_{kr}X_{kr}\} \ge l_r,\ X_{kr}=1\ \forall k,r \\ 0 & \text{otherwise} \end{cases} \quad (19)$$
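Because the expectation in Eq. 1 has no closed form under Poisson demand and Normal delivery times, PSORA must score candidate policies by simulation. The fragment below is a deliberately simplified, single-product Python sketch of such a sample-path evaluation; every cost parameter, the lead-time handling, and the function names are illustrative assumptions, not the authors' code.

```python
import math
import random

def sample_poisson(rng, lam):
    """Knuth's method for sampling Poisson(lam); adequate for moderate lam."""
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

def sample_path_cost(s, T, Q, periods=200, lam=2.0, hold=0.5,
                     short=4.0, order_fix=50.0, price=10.0,
                     beta=0.01, seed=0):
    """Average per-period cost of one (s, T, Q) policy on one sample path."""
    rng = random.Random(seed)
    inv, total = Q, 0.0
    for _ in range(periods):
        if inv <= s:                          # review: at/below reorder point
            total += order_fix + (price - beta * Q) * Q
            inv += Q                          # delivery delay ignored here
        d = sample_poisson(rng, lam * T)      # demand over one period of length T
        total += hold * max(inv - d, 0)       # holding cost on ending stock
        total += short * max(d - inv, 0)      # stock-out penalty
        inv = max(inv - d, 0)
    return total / periods

# Retrospective approximation averages several such sample paths:
est = sum(sample_path_cost(5, 8.0, 30, seed=i) for i in range(20)) / 20
print(round(est, 2))
```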

4 A Case Study of a High-Tech Industry in Taiwan

An innovative, high-tech inventory replenishment problem considering a cross-docking operating system, uncertain demand, the comprehensive ordering system (s, T, Q), service level, and time windows with multiple suppliers, multiple periods, and quantity discounts is studied here. Figure 2 illustrates the distribution flow of the cross-docking operating system, which contains multiple suppliers, a DC with a cross-docking system, and one or more retailers. For simplicity, this study considers only two retailers.

Fig. 2. The diagram of DC with cross-docking operating system - N to 2 network

5 Conclusion

This study develops a cross-docking operating system, time windows, quantity discounts, and the comprehensive ordering system (s, T, Q) for the integrated supply chain inventory model for innovative products. Owing to the complexity of the proposed NP-hard problem, which cannot be solved by analytical or mathematical methods, the integrated model is classified as a stochastic optimization problem in which customer demands are assumed to be uncertain. Therefore, the PSORA algorithm is applied to solve the integrated inventory model under the comprehensive ordering system (s, T, Q) and the cross-docking policy while the service level is satisfied. In future research, dynamic revenue sharing and/or different discount strategies with smart price mechanisms will be incorporated into the integrated supply chain model to make it more realistic and practical.

Acknowledgment. This research is sponsored by the Project of Sciences and Research Foundation of Changzhou College of Information Technology under Grant No. SG050201010110, Research on Higher Vocational Colleges Serving Rural Revitalization under Grant No. KYPT202204R, and Multiple Basis Projects of Social Sciences in Jiangsu Province of China under Grant No. 22XTB-41. We thank the anonymous referees for their comments.

References

1. Arnold, J.R.T., Chapman, S.N.: Introduction to Materials Management, 5th edn. Pearson Prentice Hall, New Jersey (2003)
2. Napolitano, M.: Making the Move to Cross Docking - a Practical Guide. Warehousing Education and Research Council (2002)
3. Lee, Y.H., Jung, J.W., Lee, K.M.: Vehicle routing scheduling for cross-docking in the supply chain. Comput. Ind. Eng. 51, 247–256 (2006)
4. Mousavi, S.M., Tavakkoli-Moghaddam, R.: A hybrid simulated annealing algorithm for location and routing scheduling problems with cross-docking in the supply chain. J. Manuf. Syst. 32, 335–347 (2013)
5. Shi, W., Liu, Z.X., Shang, J., Cui, Y.J.: Multi-criteria robust design of a JIT-based cross-docking distribution center for an auto parts supply chain. Eur. J. Oper. Res. 229, 695–702 (2013)
6. Agustina, D., Lee, C.K.M., Piplani, R.: Vehicle scheduling and routing at a cross docking center for food supply chains. Int. J. Prod. Econ. 152, 29–41 (2014)
7. Shen, Y., Willems, S.P.: Strategic sourcing for the short-lifecycle products. Int. J. Prod. Econ. 139, 575–585 (2012)
8. Lim, A., Miao, Z., Rodrigues, B., Xu, Z.: Transshipment through cross docks with inventory and time windows. Nav. Res. Logist. 52(8), 724–733 (2005)
9. Chen, H.F., Huang, Y.D.: Stochastic optimization for system design. J. Chin. Inst. Indust. Eng. 23(5), 357–370 (2006)
10. Huang, Y.D., Song, W., Wee, H.M., Tseng, S.P., Yu, S.H.: Revisiting meta-heuristic optimization method in solving stochastic system design problem. In: 2021 9th International Conference on Orange Technology (ICOT), pp. 1–6 (2021)
11. Boonmee, A., Sethanan, K.: A GLNPSO for multi-level capacitated lot-sizing and scheduling problem in the poultry industry. Eur. J. Oper. Res. 250, 652–665 (2016)

Proposal of a DDoS Attack Detection Method Using the Communication Interval

Kosei Iwasa1, Shotaro Usuzaki1, Kentaro Aburada1(B), Hisaaki Yamaba1, Tetsuro Katayama1, Mirang Park2, and Naonobu Okazaki1

1 University of Miyazaki, Miyazaki, Japan
[email protected]
2 Kanagawa Institute of Technology, Atsugi, Japan

Abstract. As the scale of Distributed Denial of Service (DDoS) attacks has been escalating in recent years, the need for real-time detection of attacks has increased. Existing intrusion detection systems (IDSs) perform detection with a fixed window size (assumed to be in hours). In previous research, attack detection was performed by preparing windows of multiple sizes, selecting the appropriate window based on the state of the data, and using features learned in advance for that window size. Although this method yielded a high DDoS attack detection rate of 98.30%, it exhibited a considerable false-positive rate of 7.37%. The proposed method measures the communication intervals of identical packets within a window identified as attack-related by the previous method, and classifies packets whose average communication interval falls below a set threshold as attacks. The experiment resulted in a 50.2% decrease in the false-positive rate.

Keywords: DDoS · Window Size · Communication Interval

1 Introduction

166

K. Iwasa et al.

Existing intrusion detection systems (IDSs) perform detection with a fixed window size (assumed to be in hours in this study). However, the appropriate window size is believed to vary depending on the state of the data, and fixing the window size may worsen detection accuracy and prevent real-time detection. Usuzaki et al. attempted to solve this problem by monitoring multiple windows simultaneously. However, as attacks were judged on a per-window basis, the false-positive rate was undesirably high. The Aggregation Pyramid is used for simultaneous monitoring of window size [3]. The appropriate window size is determined by applying the ADWIN algorithm. Under normal circumstances, monitoring is performed with a large window size, but if there is a sudden change in distribution, the window size is reduced to prioritize real-time response. In previous research, multiple window sizes were monitored concurrently, and the appropriate window size was used depending on the state of the data. In this study, for each window that is deemed to be under attack, the communication interval of the same type of packets is examined to determine if it constitutes a DDoS [4]. The goal of this study is to reduce false positives by removing normal packets in the window identified as under attack. This paper is organized as follows. Section 2 provides a description of DDoS attacks, existing DDoS attack detection methods, and two previous studies that were used as references in this investigation. Section 3 details the functions necessary for the design of the proposed system, and Sect. 4 describes the implementation of the proposed system and the results of operational experiments using test cases. Section 5 concludes the paper with a summary of the findings.

2 2.1

Research Background DDoS Attack

As illustrated in Fig. 1, a Denial of Service (DoS) attack aims to disrupt service by using a single computer to send a large number of malicious packets to a server or network, thus consuming its resources. DoS attacks are classified into two categories: “vulnerability attacks,” which take advantage of server or application vulnerabilities to execute unauthorized processing and cause service outages, and “flood attacks,” which flood the bandwidth by sending large numbers of unauthorized communication packets to the server, causing a denial of service condition. The latter is particularly efficient, and various reports of attacks have been confirmed. Connection requests made by DoS attacks are difficult to distinguish from legitimate connection requests made by ordinary users, making it difficult to identify these as attacks. Distributed Denial-of-Service (DDoS) attacks are an evolution of flood-type DoS attacks. As shown in Fig. 2, this attack uses a large number of hijacked devices to send packets in a distributed fashion. The scale of such an attack is reported to range from tens of thousands to tens of millions of devices. In addition, this attack requires far fewer connection requests per attacking machine than a DoS attack, making it more difficult to identify the attacker’s computer.

DDoS Attack Detection Using the Communication Interval

167

Fig. 1. DoS attack

Fig. 2. DDoS attack

2.2

DDoS Attack Detection Methods

DDoS attack detection methods are classified into two categories: signaturebased detection and anomaly-based detection. Signature-based detection is a method in which the characteristics of existing attacks are stored in advance as patterns. Incoming traffic is then compared to these patterns to identify matching attacks. This method offers high accuracy in attack detection if the correct patterns are registered in advance. However, this approach may present real-time processing time challenges as the volume of registered patterns and incoming packets increases. It also cannot respond to unknown attacks and necessitates regular pattern updates. On the other hand, anomaly-based detection is a method that considers all patterns that do not resemble registered normal patterns as attacks. The advantage of this detection method lies in its potential to detect unknown threats, its efficiency, and real-time operation. However, it has the drawback that it is difficult to define what constitutes “normal” in Internet traffic, and careful adjustment of parameters is required. Among anomaly-based detection methods, entropy-based methods are widely used because of their high computational speed and accuracy. Entropy-based methods detect attacks by calculating the entropy value using the packet header as the information source and monitoring its increase or decrease. Shannon’s entropy is one of the most widely used measures in this context. The primary

168

K. Iwasa et al.

challenge with entropy-based methods is how to determine the appropriate window size. To minimize the impact of outliers, a window size of about 1 min in hours [6] and tens of thousands in packets [7] is recommended. However, it has been noted that increasing window size decreases processing speed [8]. Anomalybased DDoS attack detection methods also posit that window size affects the accuracy of attack detection. 2.3
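As a concrete illustration, the following is a minimal Python sketch (ours, not any of the cited implementations) of computing Shannon's entropy over a window of packet-header values; treating the source IP address as the information source is an assumption for the example.

import math
from collections import Counter

def shannon_entropy(values):
    # Entropy (in bits) of the empirical distribution of the observed values.
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Example: source IPs seen in one window; an abrupt change in this value
# between windows is the signal that entropy-based detectors monitor.
window = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3"]
print(shannon_entropy(window))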

2.3 Previous Research

Usuzaki et al. proposed a method to detect DDoS attacks by monitoring Internet traffic with multiple window sizes using the Aggregation Pyramid [5]. By monitoring multiple windows simultaneously, attacks can be detected quickly in narrow windows while decisions can be made with less noise in wider windows. However, this method did not take the state of the traffic data into account: an attack was determined to have occurred when an anomaly was detected in the majority of the monitoring windows, which reduced the effectiveness of the minimum and maximum windows. Mizoguchi et al. expanded on the work of Usuzaki et al. and used the ADWIN algorithm to select windows and make attack decisions based on network traffic conditions [4]. A drawback of both methods was that normal packets contained within a window determined to be under attack were incorrectly classified as attack packets. In particular, in the study by Mizoguchi et al., the percentage of packets determined to be attacks that actually were attacks (i.e., the precision) was 67.6%.

3 Proposed Method

3.1 Packet Filtering Using the Communication Interval

In this study, the communication interval is used to distinguish attack packets from normal packets. The rationale is that in a DDoS attack, a large number of packets intended to bring down a service are sent in a short period of time, resulting in a shorter communication interval than in normal communication. More concretely, packets with the same source and destination IP addresses and port numbers are considered part of the same group. If the average communication interval of a group is less than a threshold value, the group is deemed to be part of an attack. Arrival counts for each group are recorded, and if a window is marked as under attack, these counts are used to determine whether the arriving packets belong to an attack, following Mizoguchi's method. A problem in previous research was that all packets in a window identified as under attack were classified as attack packets. Figure 3 shows the classification of packets in windows judged to be under attack in the previous study and in the present study, respectively. As the figure shows, each such packet is either a true positive (TP) or a false positive (FP); the objective of this study is to convert FPs into true negatives (TNs) using the communication interval.


Fig. 3. Identification results of the packets in a window judged to be under attack in the previous research and proposed method

The present study does not address how to choose an appropriate threshold value; the experiments described below evaluate performance at various values.

3.2 Implementation

This study utilizes three associative arrays whose keys are a concatenated string of source IP address, source port number, destination IP address, and destination port number. Each key records the number of packets, the length of communication time, and the packet arrival interval. Table 1 summarizes the properties of the unordered map objects used in our implementation. The specific processing steps are shown in Algorithm 1 below.

Table 1. Properties of the unordered maps

unordered map       Value corresponding to key
packet count map    Number of packets
interval sum map    Sum of communication intervals for same-group packets
latest time map     Last arrival time of the same-group packet


Algorithm 1. Measurement of communication intervals

1: if no packets of the same type have been observed then
2:    Prepare a new key for this packet type in the three unordered maps.
3:    Put "1" into packet count map at the new key.
4:    Put "1" into interval sum map at the new key.
5:    Put the current time into latest time map at the new key.
6: else
7:    Increase the value at the key corresponding to the packet by one in packet count map.
8:    Increase the value at the key corresponding to the packet by the elapsed time since the latest arrival in interval sum map.
9:    Assign the current time to the corresponding value in latest time map.
10: end if

An average communication interval for each packet group is calculated and compared to a set threshold value. If it is smaller than the threshold value, it is deemed an attack packet, and if it is larger, it is deemed a normal packet.
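The following is a compact Python sketch of Algorithm 1 together with this threshold test; the dictionary names mirror Table 1, while the key format, the Packet type, and the default threshold (the best-F value reported in Sect. 4.5) are illustrative assumptions.

from collections import namedtuple

Packet = namedtuple("Packet", "src_ip src_port dst_ip dst_port")

packet_count_map, interval_sum_map, latest_time_map = {}, {}, {}

def observe(pkt, now):
    # Group key: concatenation of source/destination addresses and ports.
    key = f"{pkt.src_ip}:{pkt.src_port}-{pkt.dst_ip}:{pkt.dst_port}"
    if key not in packet_count_map:            # first packet of this group
        packet_count_map[key] = 1
        interval_sum_map[key] = 1              # initial value per Algorithm 1
        latest_time_map[key] = now
    else:
        packet_count_map[key] += 1
        interval_sum_map[key] += now - latest_time_map[key]
        latest_time_map[key] = now

def is_attack_group(key, threshold=0.29):
    # Groups with a short average interval are deemed attack traffic.
    avg = interval_sum_map[key] / packet_count_map[key]
    return avg < threshold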

4 Evaluation Experiments

4.1 Purpose of Experiment

This experiment aims to assess whether the proposed method can reduce false positives compared to Mizoguchi's method. Since criteria for selecting an appropriate threshold value have yet to be established, we test multiple values in the experiments and use the best results for the comparison.

4.2 Experimental Methods

The performance of Mizoguchi's method and that of the proposed method are compared based on the following indices:

(1) The rate of packets correctly judged as TN by the proposed method among those incorrectly judged as FP by Mizoguchi's method
(2) The rate of packets judged as FN by the proposed method among those judged as TP by Mizoguchi's method

The values used for the proposed method are those for which the F-score is best across the tested threshold values, which range from 0 to 1 in increments of 0.01. The evaluation indicators are described in Sect. 4.3.

4.3 Experimental Conditions

The CICIDS2017 dataset is used for the experiments. CICIDS2017 is a dataset for IDS performance evaluation provided by the Canadian Institute for Cybersecurity (CIC). The dataset, captured from Monday, July 3, 2017, at 9:00 a.m. to Friday, July 7, 2017, at 5:00 p.m., is divided into separate subsets for each day of the week. No attacks are recorded on the Monday, but there are attacks on all other days. Attack detection performance was evaluated using data from the Friday set, which includes DDoS attacks; Friday's data also includes botnet communications using the ARES tool and an nmap port scan. Note that since the system is intended to monitor inbound packets, only inbound packets were extracted from the dataset in the experiment.

Precision, recall, and F-score are used as evaluation metrics. These values are calculated using the following indicators:

True Positive (TP): Number of packets correctly identified as attack packets.
False Positive (FP): Number of normal packets judged as attack packets.
True Negative (TN): Number of packets correctly identified as normal packets.
False Negative (FN): Number of attack packets judged as normal packets.

\mathrm{Precision} = \frac{TP}{TP + FP} \quad (1)

\mathrm{Recall} = \frac{TP}{TP + FN} \quad (2)

F = \frac{2 \times Pr \times Rc}{Pr + Rc} \quad (3)

Higher precision means fewer normal packets falsely classified as attacks, while higher recall means fewer missed attacks. The F-score is the harmonic mean of precision and recall; the closer the two values are to each other, the higher the F-score. Since the proposed method performs binary classification, it is necessary to show that its detection accuracy does not depend on the anomaly rate of the dataset. Therefore, the Matthews correlation coefficient (MCC) is also used [9]. The MCC takes values in [−1, 1]; the higher the MCC, the less the result depends on the anomaly ratio of the dataset.

MCC = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \quad (4)
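These four metrics can be computed directly from the confusion counts; the following sketch (with arbitrary example counts, not the paper's data) mirrors Eqs. (1)-(4).

import math

def metrics(tp, fp, tn, fn):
    precision = tp / (tp + fp)                               # Eq. (1)
    recall = tp / (tp + fn)                                  # Eq. (2)
    f_score = 2 * precision * recall / (precision + recall)  # Eq. (3)
    mcc = (tp * tn - fp * fn) / math.sqrt(                   # Eq. (4)
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return precision, recall, f_score, mcc

print(metrics(tp=900, fp=100, tn=950, fn=50))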

The experimental environment is summarized in Table 2.


Table 2. Development environment

CPU                   Intel(R) Core(TM) i7-7700 @ 3.60 GHz
Computer memory       16 GB
OS                    Host OS: Windows 10; Guest OS: Ubuntu 20.04.3 (4 threads, memory set to 8192 MB)
Development language  C++
Library               libpcap

4.4 Experimental Procedure

In the experiment, the proposed method was applied to the CICIDS2017 dataset, and the precision, recall, and F-score were calculated. Parameters were set according to Mizoguchi's research: the window size is fixed at 1.0 s, the level L is fixed at 60, and the Mahalanobis-distance anomaly threshold d is fixed at 4.9. In the proposed method, to determine whether a packet is part of an attack based on the communication interval, the threshold t is set arbitrarily, and packets with an average communication interval less than the threshold are categorized as attacks. The Mahalanobis distance is computed using the pre-training mean and variance. The pre-training uses the Monday data from CICIDS2017, which contains no attacks, as the training dataset; as in the evaluation, only inbound packets were extracted.
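For concreteness, a minimal sketch of the window-level anomaly test follows; modeling the monitored statistic as a single value with a pre-trained mean and variance is our simplifying assumption, not the exact feature set of the system.

import math

def is_window_anomalous(value, mean, variance, d=4.9):
    # One-dimensional Mahalanobis distance reduces to |x - mean| / std;
    # mean and variance come from pre-training on the attack-free Monday data.
    distance = abs(value - mean) / math.sqrt(variance)
    return distance > d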

4.5 Experimental Results

The proposed method reduced the number of false positives compared to Mizoguchi's research. However, the threshold giving the best F-score was 0.29 s, a very large value for a communication interval. Table 3 summarizes the breakdown of packets, Table 4 lists the detection accuracy when the F-score is at its maximum (the results for Mizoguchi's method were recalculated and differ from the values in their paper because a programming error was found in the original work), Table 5 provides the decrease in the false-positive rate and the increase in the missed rate when comparing the proposed method and Mizoguchi's method at maximum F-score, and Table 6 shows the detection and false-positive rates for DDoS attacks. The false-positive rate decreased by approximately 50% and the false-negative rate increased by approximately 11%. As a result, the detection rate of DDoS attacks decreased, but false positives were reduced and the F-score improved.

Table 3. Packet details

Total packets  Number of DDoS attack packets  Number of other packets
4993335        674422                         4318913

Table 4. Detection accuracy at maximum F-score

                    Pr      Rc      F       MCC
Proposed method     0.7885  0.8746  0.8293  0.8025
Mizoguchi's method  0.6757  0.9830  0.8009  0.7822

Table 5. Percentage decrease in false positives and increase in false negatives

Percentage decrease in false positives  Percentage increase in false negatives
50.2%                                   11.0%

Table 6. Detection and false-positive rates of DDoS attacks

                    DDoS attack detection rate  False-positive rate of DDoS attacks
Proposed method     87.46%                      3.66%
Mizoguchi's method  98.30%                      7.37%

5 Conclusions

In this study, we investigated a method to determine whether a window is under attack or not by calculating the average communication interval of packets in the same group and comparing it to a threshold value. Experimental results showed that the proposed method reduces the number of false positives. Future research topics include adjusting the window size so that DDoS attacks that appear only once in a window are not missed, and considering mechanisms that can accurately distinguish normal communications with short communication intervals, such as those related to OS updates, from attack activities. Acknowledgments. This work was supported by JSPS KAKENHI Grant Numbers JP21K11849, JP22K12013, and JP20K11812.

References

1. Garber, L.: Denial-of-service attacks rip the internet. Computer 33(4), 12-17 (2000). https://doi.org/10.1109/MC.2000.839316
2. Usuzaki, S., et al.: A proposal of highly responsive distributed denial-of-service attacks detection using real-time burst detection method. J. Inf. Process. 26, 257-266 (2018)
3. Bifet, A., Gavalda, R.: Learning from time-changing data with adaptive windowing. In: Proceedings of the 2007 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics (2007)
4. Momoka, M., Shotaro, U., Kentaro, A., Hisaaki, Y., Naonobu, O.: A real-time DDoS attack detection method using dynamic window size monitoring. In: 2022 Hinokuni - Land of Fire Information Processing Symposium (2022). (in Japanese)
5. Shotaro, U., Kentaro, A., Hisaaki, Y., Mirang, P., Naonobu, O.: Elastic denial-of-service attack detection method by monitoring with multiple window size. In: Multimedia, Distributed, Cooperative, and Mobile Symposium, pp. 495-504 (2019). (in Japanese)
6. Vitali, D., et al.: DDoS detection with information theory metrics and Netflows - a real case. In: SECRYPT (2012)
7. Feinstein, L., et al.: Statistical approaches to DDoS attack detection and response. In: Proceedings DARPA Information Survivability Conference and Exposition, vol. 1. IEEE (2003)
8. Oshima, S., Nakashima, T., Sueyoshi, T.: Fast anomaly detection method using entropy-based Mahalanobis distance. Inf. Process. Soc. Jpn. 52(2), 656-668 (2011)
9. Matthews, B.W.: Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta (BBA) - Protein Structure 405(2), 442-451 (1975)

Study of an Image-Based CAPTCHA that is Resistant to Attacks Using Image Recognition Systems

Sojiro Nishikawa1, Shotaro Usuzaki1, Kentaro Aburada1(B), Hisaaki Yamaba1, Tetsuro Katayama1, Mirang Park2, and Naonobu Okazaki1

1 University of Miyazaki, Miyazaki, Japan
[email protected]
2 Kanagawa Institute of Technology, Atsugi, Japan

Abstract. In today's digital age, image-based CAPTCHAs are increasingly vulnerable to attacks using annotation services, which tag images and classify images according to their contents, or reverse image search services. To prevent such attacks, an image-based CAPTCHA was proposed that takes advantage of the fact that humans can correctly recognize images containing many discontinuous points, while existing image recognition systems misrecognize them. However, this CAPTCHA proved susceptible to attacks using noise reduction filters. The objective of the present study is to create a CAPTCHA using images that are resistant to such filters. Images used in the new CAPTCHA were realized by increasing the proportion of lines forming discontinuous surfaces in images. Experimental results demonstrated a human recognition rate of 95.8%, with the image recognition systems successfully identifying only one image overall. Moreover, when a noise reduction filter was applied, the recognition rate was lower than those reported in previous studies.

Keywords: CAPTCHA · image recognition · discontinuous point · noise reduction filter

1 Introduction

In recent years, CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) [1] have become increasingly important. As automated programs have begun mass-acquiring accounts and posting spam comments on bulletin boards, web services have faced notable challenges. CAPTCHAs have been employed to address these issues and are used to prove that Internet users are humans, not bots. String-type CAPTCHAs and image-type CAPTCHAs are the most common examples. String-type CAPTCHAs typically involve the recognition of distorted or noisy strings, while image-based ones ask users to select a specific image that matches given criteria from an array of displayed images.

However, advances in optical character recognition (OCR) and machine learning technologies have resulted in the breaching of these conventional CAPTCHAs, and new, more resilient CAPTCHAs are needed. Moreover, even image-based CAPTCHAs have been compromised using annotation services, which tag images and classify image content, and Google Reverse Image Search [2]. To address this, an image-based CAPTCHA has been proposed that takes advantage of the fact that humans can accurately recognize images containing lines that form discontinuities, a task where current image recognition systems fall short [3]. However, these CAPTCHAs are still vulnerable to attacks using noise reduction filters: in a previous study, it was found that applying a noise reduction filter to the images improved machine recognition rates by up to 46%. In the present study, we introduce a CAPTCHA that is devised to be more resilient against noise reduction filters, a known weak point of the existing CAPTCHA. The design enhancement involves increasing the number of lines that make up the discontinuities in the images.

The remainder of this paper is organized as follows. Section 2 reviews related studies, Sect. 3 explains the proposed method, Sect. 4 describes evaluation experiments assessing the practicality of the proposed method, and Sect. 5 provides a summary and discusses future challenges.

2 Related Work

2.1 CAPTCHA

CAPTCHA is a technology used to verify that a website user is a human and not a bot. This is achieved by asking questions that are easy for a human to solve but difficult for machines. Typical CAPTCHAs include string-type and image-type CAPTCHAs. A string-type CAPTCHA displays a string image with distortion and noise and judges the user to be human if the user can correctly type the string into a text box. String-type CAPTCHAs are the most prevalent type, but their effectiveness has been greatly undermined by advances in OCR technology. Image-based CAPTCHAs, on the other hand, utilize the advanced image recognition capabilities of humans and ask users to answer questions about the content of the images and what they have in common; if answered correctly, the user is authenticated as a human. This type of CAPTCHA is used to deal with malicious bots that repeatedly perform fraudulent tasks. However, annotation services, which tag images and classify image content, are now being used to attack image-based CAPTCHAs; with such systems, these CAPTCHAs can be easily compromised, so new countermeasures are needed.

2.2 Fooling Image Recognition by Many Discontinuous Points and Its Application to CAPTCHA

Hara et al. created an image that can be easily recognized by humans but misinterpreted by existing image recognition systems, and proposed a CAPTCHA based on this image [3]. Their approach involved generating the image shown in Fig. 1, demonstrating that adding a large number of discontinuities to the image leads image recognition systems to misinterpret it. The image processing was performed in two steps:

1. To reduce the amount of information, the image is first converted to a grayscale version and then converted to n-value.
2. To create discontinuities, lines of luminance 0 and 255 are drawn at regular intervals across the image.

Fig. 1. Image proposed by Hara et al.

In experiments with human subjects, the average correct response rate was 100% and the average solution time was 4.65 s, suggesting that humans could easily interpret these images. None of the tested image recognition systems produced any tags or descriptions related to the content of the images created by Hara et al. However, applying the noise reduction filters described below to the processed images and submitting them to the same image recognition systems improved the overall recognition rate. Notably, when a Gaussian filter was applied, the recognition rate increased to 46%.
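A minimal sketch of Hara et al.'s two processing steps above follows; the number of quantization levels and the line interval are illustrative assumptions, not values from the original paper.

import numpy as np
from PIL import Image

def hara_style(path, n_levels=4, interval=4):
    # Step 1: grayscale, then quantize to n levels (n-value conversion).
    g = np.asarray(Image.open(path).convert("L"), dtype=np.float32)
    g = np.round(g / 255 * (n_levels - 1)) / (n_levels - 1) * 255
    # Step 2: lines of luminance 0 and 255 at regular intervals create
    # the discontinuities that fool image recognition systems.
    g[::interval, :] = 0
    g[interval // 2::interval, :] = 255
    return Image.fromarray(g.astype(np.uint8))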

3 Proposed Method

3.1 Summary

The objective of this study is to create images that are resistant to noise reduction filters and to utilize these images in a CAPTCHA. Since the CAPTCHA proposed by Hara et al. is vulnerable to noise reduction filters, as discussed in Sect. 2.2, we improved the image generation method while maintaining the property of being easily recognized by humans but misinterpreted by image recognition systems. A sample of the proposed image is shown in Fig. 2. The image size is 672 × 418 pixels and the image format is PNG.


Fig. 2. Example image created using the proposed method

This study distinguishes itself in two key aspects:

(1) To address the vulnerability of Hara et al.'s method to noise reduction filters, the proportion of the lines comprising the discontinuous surfaces drawn in the image was increased.
(2) To mitigate the problem that (1) makes images darker and decreases the human recognition rate, the Color Assimilation Grid Illusion was introduced to make the processed image easier for humans to recognize.

We expected that increasing the proportion of the lines that make up the discontinuity surfaces in the image would enhance its resistance to noise reduction filters, because such filters process information based on surrounding pixels. In the proposed method, the proportion of lines constituting discontinuities in the image is about 83%. The color assimilation grid illusion, proposed by Øyvind Kolås, is a visual effect in which colored grid lines placed on a grayscale image appear to add color to the entire image [4]. In this case, the color of the lines drawn in the grayscale image is the color of the original image with increased saturation. This illusion also arises with objects other than grid lines, such as dots and parallel lines.

3.2 Image Processing Methods

To create images that are robust against noise reduction filters and cannot be accurately interpreted by image recognition systems, the following procedure is used in this study:

Step 1. The image is converted to a grayscale image and then into n-value to reduce the amount of information.
Step 2. Black lines are drawn to form grids at regular intervals on the image, thereby introducing discontinuity points.


Fig. 3. Image after processing Step 2


Fig. 4. Line width and spacing

Step 3. Dots for the Color Assimilation Grid Illusion are drawn at regular intervals on the image.

The first step reduces color information, making it more difficult for image recognition systems to recognize the content. In Step 2, black lines are drawn to incorporate discontinuity points; the image produced at the end of Step 2 is shown in Fig. 3. As shown in Fig. 4, lines are drawn on the image at 3-pixel intervals with 2-pixel spacing between lines, and repeating this pattern vertically and horizontally forms a grid of black lines. When lines are drawn in this manner, the black lines occupy about 84% of the image, rendering it less susceptible to noise reduction filters; however, the overall image appears dark, which may affect human perception. In Step 3, dots for the Color Assimilation Grid Illusion are drawn at regular intervals to aid human recognition; the result is shown in Fig. 5. The color of each illusion dot is based on the color of the original image at the corresponding coordinates. Since the illusion dots are drawn not on a grayscale image but on an overall dark image, we considered that solely increasing the saturation of the color points would have a weak effect; therefore, the dots were drawn with both increased lightness and saturation. As shown in Fig. 6, the illusion dots are drawn one pixel at a time, spaced ten pixels apart. Even with the addition of the illusion dots, the black lines still occupy about 83% of the image, preserving its resistance to noise reduction filters.
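The following is a sketch of the three steps, assuming 3-pixel-wide grid lines separated by 2-pixel gaps (so that the grid covers 1 - (2/5)^2, or about 84%, of the pixels) and one-pixel illusion dots every ten pixels; the parameter names and enhancement factors are ours, not the paper's.

import numpy as np
from PIL import Image, ImageEnhance

def make_captcha(path, n_levels=4, period=5, gap=2, dot_step=10):
    # Step 1: grayscale, then n-value quantization.
    gray = np.asarray(Image.open(path).convert("L"), dtype=np.float32)
    q = np.round(gray / 255 * (n_levels - 1)) / (n_levels - 1) * 255
    out = np.stack([q] * 3, axis=-1)

    # Step 2: black grid; width-3 lines with 2-pixel gaps in both
    # directions cover about 84% of the pixels.
    h, w = q.shape
    rows = (np.arange(h) % period) < (period - gap)
    cols = (np.arange(w) % period) < (period - gap)
    out[rows[:, None] | cols[None, :]] = 0

    # Step 3: color-assimilation dots taken from the original image with
    # boosted saturation and lightness.
    color = ImageEnhance.Color(Image.open(path).convert("RGB")).enhance(2.0)
    color = np.asarray(ImageEnhance.Brightness(color).enhance(1.5),
                       dtype=np.float32)
    out[::dot_step, ::dot_step] = color[::dot_step, ::dot_step]
    return Image.fromarray(out.astype(np.uint8))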

3.3 Suggested CAPTCHA

As shown in Fig. 7, the proposed image is displayed together with a choice of m words, including one word that best describes the image. The user answers by selecting the single most appropriate word from the list. The displayed words are predetermined by the author.


Fig. 5. Image after processing Step 3

Fig. 6. Size and spacing of illusion dots

4 Evaluation Experiment

In this chapter, the practicality of the images, their resistance to image recognition attacks, and their robustness against noise reduction filters are evaluated through experiments. Section 4.1 describes an experiment to evaluate the utility of the images, Sect. 4.2 verifies resilience against attacks that use image recognition, and Sect. 4.3 assesses resistance to attacks using noise reduction filters.

4.1 Experiments to Evaluate the Utility of the Images

The purpose of this experiment is to evaluate whether images generated by the proposed method can be correctly recognized by humans. The experiment involved showing subjects the generated images, recording whether they answered correctly and the time it took them to answer, and then asking them to fill out a questionnaire on the practicality of the process. The participants were 12 students in their 20s, and the experiment consisted of 10 questions using the proposed CAPTCHA with eight choices each. The System Usability Scale (SUS) was used for the post-experiment questionnaire. Table 1 summarizes the results, including the average correct response rate, average solution time, maximum solution time, minimum solution time, standard deviation of the solution time, and SUS score; Document [3] is used as a comparison benchmark. The results show that humans could solve the CAPTCHA, but in some cases the solution time was too long. The average correct response rate of the proposed method is 95.8% and the average response time is 5.35 s. However, there is more variation in the solution time than in Document [3], and the longest solution time is notably long, at 30.78 s. The questions that took the most time to solve were those in which the size of the object was small relative to the size of the image, and the percentage of correct answers for these images was low.

Fig. 7. Proposed CAPTCHA Screen

Table 1. Experiment results and comparison with previous studies

                                           Proposed image  Document [3]
Average percentage of correct answers (%)  95.8            100
Average solution time (s)                  5.35            4.65
Maximum solution time (s)                  30.78           7.43
Minimum solution time (s)                  2.28            2.59
Standard deviation of solution time        3.39            1.22
SUS score                                  87.2            82.3

4.2 Evaluation Experiments on Resistance to Image Recognition Attacks

The purpose of this experiment is to determine whether the proposed image can be recognized by an image recognition system and to assess the impact of the optical illusion dots on such systems. The proposed image and the image produced after Step 2 were input into general image recognition systems, and the output results were analyzed. The 10 images used were the same 10 used in the experiment in Sect. 4.1. The image recognition systems tested were Google Vision API [5], Clarifai [6], Google Image Search [7], and CLIP [8]. If the output of an image recognition system contained a description related to the original image, the system was deemed to have successfully recognized the image. The results show that the proposed image is resistant to image recognition systems: as summarized in Table 2, the proposed image was recognized in one case by the Google Vision API and in no cases by the other systems. Table 3 lists the output of each image recognition system for an example image of a rooster; as the outputs include no description of a rooster, the systems misrecognized the image. Table 2 further indicates that the Step 2 images have a lower recognition rate than the images created using the proposed method; the optical illusion dots could therefore potentially increase the recognition rate of image recognition systems.

Table 2. Correct recognition rate of each image recognition system

Image Recognition System  Number of successfully recognized proposed images  Number of successfully recognized Step 2 images
Google Vision API         1                                                  0
Clarifai                  0                                                  0
Google Image Search       0                                                  0
CLIP                      0                                                  0

4.3 Evaluation Experiments on Resistance to Noise Reduction Filters

The purpose of this experiment is to determine whether the machine's recognition performance improves when a noise reduction filter is applied to images created with the proposed method. A noise reduction filter was applied to the proposed image, which was then input into a standard image recognition system for output analysis. The noise reduction filters used were an average filter, a Gaussian filter, a median filter, and an opening process, with all other conditions kept the same as in Sect. 4.2. As an example, Fig. 8 shows "rooster" images with each filter applied. The median filter and the opening process rendered the entire image black, making it completely unrecognizable. The experimental results show that the success rate of the image recognition systems is at most 2.5%, suggesting that the image is not susceptible to noise reduction filters. Table 4 summarizes the recognition status of each image recognition system after each filtering process; the success rates for all of the denoising filters are lower than those reported in Document [3].

Table 3. Output results by image recognition systems for the example "rooster" image

Fig. 8. Examples of images with a noise reduction filter applied

Table 4. Recognition status results for each filtered image

Image Recognition System      Average filter  Gaussian filter  Median filter  Opening process
Google Vision API             1               1                0              0
Clarifai                      0               0                0              0
Google Image Search           0               0                0              0
CLIP                          0               0                0              0
Success rate                  2.5%            2.5%             0%             0%
Success rate in Document [3]  4%              46%              10%            10%

5 Conclusion

In this study, we attempted to improve the resistance of CAPTCHAs to noise reduction filters, a known vulnerability of the method proposed by Hara et al., by increasing the proportion of straight lines that make up the discontinuous surfaces drawn in the image. Furthermore, to ease human recognition, we incorporated dots that create an illusion of the image being colored. Experimental results show the effectiveness of the proposed method: the proposed image is readily recognizable by humans but difficult for image recognition systems, and it is resilient to noise reduction filters. However, it was found that drawing the dots that cause the illusion effect may increase the recognition rate of image recognition systems. Future issues include consideration of the optimal size of image content that is easily recognized by humans, consideration of the appropriate number of choices, and proposals for CAPTCHA methods that are resistant to bots.

Acknowledgments. This work was supported by JSPS KAKENHI Grant Numbers JP21K11849, JP22K12013, and JP20K11812.

References

1. Von Ahn, L., Blum, M., Langford, J.: Telling humans and computers apart automatically. Commun. ACM 47(2), 56-60 (2004)
2. Sivakorn, S., Polakis, J., Keromytis, A.D.: I'm not a human: breaking the Google reCAPTCHA. Black Hat 14, 1-12 (2016)
3. Toru, H., Takashi, S.: Fooling images recognition by many discontinuous points and its application to CAPTCHA. Inform. Process. Soc. Jpn. 60(12), 2139-2146 (2019). (in Japanese)
4. Color assimilation grid illusion. https://www.patreon.com/posts/color-grid-28734535. Accessed 28 Jan 2023
5. Vision AI cloud vision API google cloud. https://cloud.google.com/vision?hl=ja. Accessed 28 Jan 2023
6. General-image-recognition - Clarifai Community. https://clarifai.com/clarifai/main/models/general-image-recognition?inputId=C5yYr. Accessed 28 Jan 2023
7. Google image search. https://www.google.co.jp/imghp?hl=ja&tab=ri&ogbl. Accessed 28 Jan 2023
8. CLIP: connecting text and images. https://openai.com/blog/clip/. Accessed 28 Jan 2023

Blind Image Quality Assessment Using Standardized NSS and Multi-pooled CNN

Nay Chi Lynn(B), Yosuke Sugiura, and Tetsuya Shimamura

Graduate School of Science and Engineering, Saitama University, Saitama, Japan
[email protected], {ysugiura,shima}@mail.saitama-u.ac.jp
http://www.sie.ics.saitama-u.ac.jp/

Abstract. This paper proposes a blind image quality assessment (BIQA) method that combines natural scene statistics (NSS) based feature extraction and multi-pooled image feature extraction. The two features are concatenated, and fully connected layers are utilized to output the image quality score. In the NSS feature extraction part, mean subtracted contrast normalization is first conducted, followed by a CNN structure. In the multi-pooled image feature extraction part, the structure of spatial pyramid pooling (SPP) is effectively embedded in a CNN structure. The proposed BIQA method is an end-to-end learning technique. In experiments, the performance of the proposed method is compared with that of state-of-the-art methods on several databases. The experimental results show the superior accuracy of the proposed method for BIQA.

Keywords: blind image quality assessment · natural scene statistics · convolutional neural network · spatial pyramid pooling

1 Introduction

Humans can easily assess the quality of images, for example judging them as noisy or blurry. However, assessing image quality is not an easy task for algorithms, particularly when no reference image is available [1]. For blind (or no-reference) image quality assessment (BIQA), the only input to the algorithm is the distorted image whose quality should be measured. A reference image for assessing the quality of the distorted image is not always available in real-world applications, which is the reason and motivation why BIQA algorithms are required and discussed from a computational point of view.

The BIQA methods found in the literature can be classified into two major categories: (i) analyzing the image features using a statistical approach [2-6], and (ii) learning the image features using a data-driven approach [7-14]. The first category relies on measuring the statistical distribution of intensity values of the distorted image, based on the property that natural pristine images possess a particular regularity in the distribution of pixel intensities. The perceptual quality of an image affected by some kind of visual distortion can be determined based on the so-called natural scene statistics (NSS). NSS-based methods usually extract features from a specific transform domain, such as the spatial [2], wavelet transform [3], discrete cosine transform (DCT) [4], curvelet transform [5], or shearlet transform [6] domain. After the appearance of deep learning techniques, learning-based approaches became popular due to the ability of convolutional neural network (CNN) architectures to automatically learn image features. Since deep learning based features are data-driven (the features are learned from the available data), they allow the model to learn the most important features for the task at hand. The data-driven approaches, however, can perform poorly on unseen data in the IQA task, such as unseen distortion types, images, and levels. Besides, although existing IQA methods have demonstrated that CNNs are powerful for learning discriminative features from distorted images, a CNN with multi-scale feature representations is notably more effective for image quality assessment than fixed-scale learning.

The NSS- and CNN-based IQA methods thus each have their strengths and weaknesses. To overcome this, this paper proposes a BIQA technique that combines NSS features with CNN-based deep multi-pooled features. We draw inspiration for this architecture from previous IQA methods [15-17], which first extract NSS features and then utilize convolutional layers to learn the mapping between the feature vectors and the corresponding quality score. In our method, we utilize a spatial pyramid pooling (SPP) based CNN instead of several convolutional layers to extract the feature vectors [18-23], and extract the NSS features in parallel. These parallel extracted feature vectors are then concatenated into a new feature vector for the fully connected layers. Finally, the quality score is predicted from the new feature vector. Each part of the architecture of our proposed model is detailed in the next section.

2 Proposed NSS Integrated Multi-pooled CNN Method

This paper proposes an end-to-end blind synthesized image quality assessment method using a two-stream CNN, the architecture of which is shown in Fig. 1. Our CNN shares two pipelines in feature extraction. The first, main stream is an end-to-end CNN based on the SPP method; it takes an image of size 224 × 224 as input, and the feature maps after pyramid pooling are concatenated to form a feature pyramid within the network structure. The second, auxiliary stream is a multi-layer perceptron network whose input is the NSS features extracted from the image. We then apply concatenation to aggregate the multi-level features. Finally, several fully connected layers map the multi-scale features to perceptual image quality scores. The proposed method provides improved IQA performance and exhibits generalization capability when tested on various IQA databases.


Fig. 1. Architecture of the proposed system

2.1 Image Feature Extraction Using Multi-Pooled CNN

Existing IQA methods have demonstrated that applying a CNN is powerful for learning discriminative features from image data. However, developing more effective CNNs that consider multi-scale feature representations for distorted image quality assessment remains an open problem: the local regions or the whole region of a distorted image can look quite different when examined under different sparse samplings. Therefore, multi-scale feature learning, which considers the discriminative feature scales per region, is important for image quality evaluation. Our proposed BIQA method designs an end-to-end multi-pooled CNN to learn multi-scale features, thereby overcoming the limitation of using single fixed-scale features for perceptual visual quality. Low-level features are learned in the early layers, while more detailed information emerges in the deeper layers. We utilize the SPP for multi-scale feature extraction in the network structure. The SPP considers both local and global spatial features: it divides the input feature map into a grid of spatial bins, and the features from each bin are pooled together to create a single feature vector for that bin. This allows CNNs to be trained on images of different sizes without explicitly resizing the images to a fixed size.

2.2 Natural Scene Statistics Feature Extraction

NSS features are a set of statistical properties that describe the distribution of intensity values in images. This natural property identifies the parts of the image that have been corrupted, regardless of the distortion type. NSS features are regarded as distortion-generic and hence exhibit robust performance in cross-database validation. Therefore, we propose the multi-pooled CNN architecture in parallel combination with NSS features as an aid to the quality assessment task, to obtain better performance.
If there is any kind of distortion or artificiality in the image, the NSS features vary accordingly with the intensity of the distortion or artificiality. We extract the NSS statistical features by following the well-known blind image spatial quality evaluator (BRISQUE) method [2]. BRISQUE extracts NSS features in the spatial domain by means of the statistical properties of the pixel intensity distribution. A generalized Gaussian distribution can be simulated by the mean subtracted contrast normalization (MSCN) coefficients. The MSCN coefficients take the locally normalized mean and variance as features, obtained by subtracting the local mean of the current pixel intensity value (μ) and dividing by the local standard deviation (σ). Mathematically, the image intensity I(i, j) at pixel (i, j) is transformed to the luminance \hat{I}(i, j) as

\hat{I}(i,j) = \frac{I(i,j) - \mu(i,j)}{\sigma(i,j) + C} \quad (1)

where i ∈ 1, 2, ..., M; j ∈ 1, 2, ..., N, in which M and N are the height and width of the image, respectively. μ(i, j) and σ(i, j) are the local mean and local variance, respectively, formulated as

\mu(i,j) = \sum_{p=-P}^{P} \sum_{q=-Q}^{Q} G_{p,q} I_{p,q}(i,j) \quad (2)

\sigma(i,j) = \sqrt{\sum_{p=-P}^{P} \sum_{q=-Q}^{Q} G_{p,q} \left( I_{p,q}(i,j) - \mu(i,j) \right)^2} \quad (3)

The difference between pristine and distorted images is not limited to pixel intensity distributions. Thus, BRISQUE also captures the joint distribution relationships of the center pixel with its four neighbors. To capture these neighborhood relationships, the pair-wise products of the MSCN image with shifted versions of itself are considered. Four orientations are used to form the pairwise products of the MSCN coefficients, namely Horizontal (H), Vertical (V), Left-Diagonal (D1), and Right-Diagonal (D2):

H(i,j) = \hat{I}(i,j)\,\hat{I}(i,j+1)
V(i,j) = \hat{I}(i,j)\,\hat{I}(i+1,j)
D1(i,j) = \hat{I}(i,j)\,\hat{I}(i+1,j+1)
D2(i,j) = \hat{I}(i,j)\,\hat{I}(i+1,j-1) \quad (4)

189

Fig. 2. Spatial pyramid pooling

scale of the data is preserved, just making it more interpretable. We assume that maintaining the original range of the NSS features is important, min-max standardization is preferred to apply here. The formula for the min-max standardization is given as S(i, j) = (O(i, j) − Omin )/(Omax − Omin )

(5)

where S(i,j) is the standardized output, O(i,j) is the original input, and Omin and Omax are the minimum and maximum values in O(i,j), respectively. 2.3

Methodology

Our proposed method has three convolutional layers, two max pooling layers in which one is the SPP, three flatten layers, two concatenate layers, and five fully connected (FC) layers as shown in Fig. 1. When an image is inputted, we utilize the Conv-Pool (convolution and pooling) block. The convolution layer is applied to extract the features by taking the rectified linear unit (ReLU) activation function, and the maximum pooling layer is followed to reduce the spatial dimensions of convolution layers. The filter sizes are 32, 64, 64, and the kernel sizes are 3, 3, 3 for these three convolutional layers, respectively. Then in the pooling layer of the second block, the extracted features of the first block are pooled through three different scales by the SPP instead of the fixed pooling layer. The filter scales that are applied here are 1×1, 3×3, 5×5, and 7×7 as shown in Fig. 2. Afterward, we obtain the output feature maps from the third convolutional layer of distorted image. Inspired by [19–22] where multi-scale features are applied for IQA, we utilize multi-scale features learning in the proposed network. With the aid of learning multi-scale image features, we add the SPP in the network structure of the proposed network to the output feature maps of the

190

N. C. Lynn et al.

Table 1. Performance comparison on the CSIQ, TID2008, TID2013, and KADID-10k databases. The best results are shown in bold. Database

Method MeoN [10]

LLM [22]

BosI [7]

DeepS [13]

DiqaM NssA [11] [15]

DeepB [8]

DmsF [23]

Ours

CSIQ

PLCC SRCC

0.850 0.839

0.900 0.905

– –

0.919 0.919

– –

0.927 0.893

– –

0.952 0.960

0.952 0.951

TID2008

PLCC SRCC

– –

0.897 0.908

– –

– –

– –

– –

– –

0.937 0.928

0.939 0.931

TID2013

PLCC SRCC

0.828 0.811

0.907 0.904

0.929 0.926

0.872 0.846

0.850 0.839

0.910 0.844

0.949 0.951

0.922 0.906

0.950 0.948

– –

– –

0.628 0.630

– –

0.882 0.890

– –

0.912 0.896

– –

0.939 0.938

KADID10k PLCC SRCC

third convolutional layer. The pyramid pooling in our SPP consists four-levels with bin size as 1, 3, 5, and 7 respectively, which can capture both local and global spatial features with multi-scales. We flatten each output of the pyramid pooling modules, and then the flattened feature pyramid with size 4096, 1024, 64, and 64 are concatenated into a single vector that is the input for the next stage. After concatenating, the feature vectors are fully connected using the dense layer. Then, these fully connected feature vectors are flatten and concatenated with the NSS feature vector The concatenated (image pyramid and NSS) feature vector becomes the input of the quality prediction module.
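As a concrete illustration of the two-stream design described above, the following PyTorch sketch combines a three-layer CNN, SPP over bins 1/3/5/7, and an auxiliary NSS MLP; the layer widths are illustrative and do not reproduce the exact flattened sizes (4096, 1024, 64, 64) quoted above.

import torch
import torch.nn as nn

class SPP(nn.Module):
    # Spatial pyramid pooling: adaptive max pooling at several bin sizes,
    # flattened and concatenated into one multi-scale feature vector.
    def __init__(self, bins=(1, 3, 5, 7)):
        super().__init__()
        self.pools = nn.ModuleList([nn.AdaptiveMaxPool2d(b) for b in bins])

    def forward(self, x):
        return torch.cat([p(x).flatten(1) for p in self.pools], dim=1)

class TwoStreamIQA(nn.Module):
    def __init__(self, nss_dim=36):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.spp = SPP()
        spp_dim = 64 * (1 + 9 + 25 + 49)          # 5376 pyramid features
        self.nss = nn.Sequential(nn.Linear(nss_dim, 64), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(spp_dim + 64, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, image, nss_features):
        x = self.spp(self.cnn(image))             # main image stream
        y = self.nss(nss_features)                # auxiliary NSS stream
        return self.head(torch.cat([x, y], dim=1))

# model = TwoStreamIQA(); score = model(torch.rand(1, 3, 224, 224), torch.rand(1, 36))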

3 Experimental Results

In this section, we present comprehensive experiments on public IQA databases that demonstrate the validity of the proposed IQA method. Four widely used image quality benchmark databases are utilized for performance evaluation: TID2008 [24], CSIQ [25], TID2013 [26], and KADID-10k [27]. Each database is randomly divided into 80% for training and 20% for testing. Two commonly used criteria, the Spearman Rank Order Correlation Coefficient (SRCC) and the Pearson Linear Correlation Coefficient (PLCC), are used to evaluate the performance of the proposed BIQA method. The SRCC evaluates prediction monotonicity, while the PLCC evaluates prediction accuracy.

To validate the performance of the proposed method, we compare it with state-of-the-art IQA methods: MeoN [10], LLM [22], BosI [7], DeepS [13], DiqaM [11], NssA [15], DeepB [8], and DmsF [23]. The performance results are listed in Table 1, where the best performance in each row is highlighted in boldface. From Table 1, we can see that the proposed method outperforms the state-of-the-art IQA methods, especially the deep learning based algorithms, which include LLM [22], MeoN [10], and DiqaM [11]. One probable explanation for this result is that multi-scale characteristics are not taken into account in those CNN-based methods. We also compare with the method that employs NSS features in a deep learning framework, NssA [15]; our method outperforms it on all reported databases. In addition, our BIQA method achieves performance comparable to that of the full-reference IQA method that learns multi-scale features using pyramid pooling, DmsF [23]. This further confirms the quality prediction ability and robustness of our proposed method.

4 Conclusion

In this paper, we proposed a natural scene statistics integrated, end-to-end optimized CNN with multi-scale feature learning for BIQA. Our method is designed around multi-task learning, combining natural scene statistics features as an auxiliary task with the multi-scale image feature learning main task for image quality prediction. The spatial pyramid pooling structure is used for the multi-scale image feature representation task. To evaluate the performance of our method, we utilized four IQA benchmark databases. Experimental results confirmed that our proposed method outperforms the state-of-the-art IQA methods and demonstrated the effectiveness of combining multi-scale image features with natural scene statistics features.

References

1. Kamble, V., Bhurchandi, K.M.: No-reference image quality assessment algorithms: a survey. Optik 126(11-12), 1090-1097 (2015)
2. Mittal, A., Moorthy, A.K., Bovik, A.C.: No-reference image quality assessment in the spatial domain. IEEE Trans. Image Process. 21(12), 4695-4708 (2012)
3. Moorthy, A.K., Bovik, A.C.: Blind image quality assessment: from natural scene statistics to perceptual quality. IEEE Trans. Image Process. 20(12), 3350-3364 (2011)
4. Saad, M.A., Bovik, A.C., Charrier, C.: Blind image quality assessment: a natural scene statistics approach in the DCT domain. IEEE Trans. Image Process. 21(8), 3339-3352 (2012)
5. Liu, L., Dong, H., Huang, H., Bovik, A.C.: No-reference image quality assessment in curvelet domain. Signal Process.: Image Commun. 29(4), 494-505 (2014)
6. Li, Y., Po, L.M., Xu, X., Feng, L.: No-reference image quality assessment using statistical characterization in the shearlet domain. Signal Process.: Image Commun. 29(7), 748-759 (2014)
7. Bosse, S., Maniry, D., Wiegand, T., Samek, W.: A deep neural network for image quality assessment. In: Proceedings of IEEE International Conference on Image Processing, pp. 3773-3777 (2016)
8. Bianco, S., Celona, L., Napoletano, P., Schettini, R.: On the use of deep learning for blind image quality assessment. Signal Image Video Process., 355-362 (2018)
9. Kim, J., Nguyen, A.D., Lee, S.: Deep CNN-based blind image quality predictor. IEEE Trans. Neural Netw. Learn. Syst. 30(1), 11-24 (2018)
10. Ma, K., Liu, W., Zhang, K., Duanmu, Z., Wang, Z., Zuo, W.: End-to-end blind image quality assessment using deep neural networks. IEEE Trans. Image Process. 27(3), 1202-1213 (2017)
11. Bosse, S., Maniry, D., Müller, K.R., Wiegand, T., Samek, W.: Deep neural networks for no-reference and full-reference image quality assessment. IEEE Trans. Image Process. 27(1), 206-219 (2017)
12. Ma, K., Liu, X., Fang, Y., Simoncelli, E.P.: Blind image quality assessment by learning from multiple annotators. In: Proceedings of IEEE International Conference on Image Processing, pp. 2344-2348 (2019)
13. Gao, F., Wang, Y., Li, P., Tan, M., Yu, J., Zhu, Y.: DeepSim: deep similarity for image quality assessment. Neurocomputing 257, 104-114 (2017)
14. Cheng, Z., Takeuchi, M., Kanai, K., Katto, J.: A fully-blind and fast image quality predictor with convolutional neural networks. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 101(9), 1557-1566 (2018)
15. Yan, B., Bare, B., Tan, W.: Naturalness-aware deep no-reference image quality assessment. IEEE Trans. Multimedia 21(10), 2603-2615 (2019)
16. Ge, D., Song, J.: Blind image quality assessment based on natural scene statistics and deep learning. In: Proceedings of IEEE International Conference on Computer Sciences and Automation Engineering, pp. 939-945 (2016)
17. Jain, P., Shikkenawis, G., Mitra, S.K.: Natural scene statistics and CNN based parallel network for image quality assessment. In: Proceedings of IEEE International Conference on Image Processing, pp. 1394-1398 (2021)
18. Varga, D.: Multi-pooled inception features for no-reference image quality assessment. Appl. Sci. 10(6), 2186 (2020)
19. Chen, J., Qin, F., Lu, F., Guo, L., Li, C., Yan, K., Zhou, X.: CSPP-IQA: a multi-scale spatial pyramid pooling-based approach for blind image quality assessment. Neural Comput. Appl., 1-12 (2022)
20. Lu, Y., et al.: Blind image quality assessment based on the multiscale and dual-domains features fusion. Concurrency Comput.: Pract. Exper., e6177 (2021)
21. Wang, X., Wang, K., Yang, B., Li, F.W., Liang, X.: Deep blind synthesized image quality assessment with contextual multi-level feature pooling. In: Proceedings of International Conference on Image Processing, pp. 435-439 (2019)
22. Wang, H., Fu, J., Lin, W., Hu, S., Kuo, C.C., Zuo, L.: Image quality assessment based on local linear information and distortion-specific compensation. IEEE Trans. Image Process. 26(2), 915-926 (2016)
23. Zhou, W., Chen, Z.: Deep multi-scale features learning for distorted image quality assessment. In: Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-5 (2021)
24. Ponomarenko, N., Lukin, V., Egiazarian, K., Astola, J., Carli, M., Battisti, F.: Color image database for evaluation of image quality metrics. In: 10th Workshop on Multimedia Signal Processing, pp. 403-408 (2008)
25. Larson, E.C., Chandler, D.M.: Categorical image quality database. http://vision.okstate.edu/csiq
26. Ponomarenko, N., et al.: Image database TID2013: peculiarities, results and perspectives. Signal Process. Image Commun. 30, 57-77 (2015)
27. Lin, H., Hosu, V., Saupe, D.: KADID-10k: a large-scale artificially distorted IQA database. In: 11th Proceedings of International Conference on Quality of Multimedia Experience, pp. 1-3 (2019)

Abnormal Activity Detection Based on Place and Occasion in Virtual Home Environments

Swe Nwe Nwe Htun, Shusaku Egami, Yijun Duan, and Ken Fukuda(B)

Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, 2-4-7 Aomi, Koto-Ku, Tokyo 135-0064, Annex, Japan
{swenwe.nwehtun,s-egami,yijun.duan,ken.fukuda}@aist.go.jp

Abstract. This paper focuses on enhancing the safety of older adults in their home environment by analyzing and distinguishing between abnormal and normal states in their daily living activities. We initially propose simulating virtual abnormal and everyday activities to address the challenge of limited real-world datasets containing abnormal activities. The technical approach consists of three main components to achieve our purpose: object detection, feature extraction, and a context-aware decision-making process. Specifically, object detection is performed using transfer learning from the YOLO pre-trained model to identify objects within the environment. Then, we propose a virtual grounding point feature extracted from skeleton images, enabling the prediction of transition from a normal to an abnormal human posture. Furthermore, the likelihood of posture deformities is calculated using skeleton joint points. Finally, the decision-making process takes into account the place and occasion within the home environment by using the Hidden Markov Model to provide the abnormal and normal state discrimination for context-aware safety assessments. Our proposed approach utilizes the VirtualHome2KG dataset, which has demonstrated its effectiveness in identifying abnormal and normal situations to enhance the safety of older adults. Keywords: Older Adults · Abnormal and Normal · Virtual Grounding Point · Human Posture · Place and Occasion · Hidden Markov Model

1 Introduction

The digitization of diverse fields has brought about significant advancements in human action recognition, risk identification in daily activities, and the training of nursing robots for performing daily tasks and monitoring individuals in different environments. The growing global aging population presents unique challenges and opportunities, especially concerning the safety, quality of life, and health welfare of older adults living in their homes. To effectively address these needs, the development of more diverse and inclusive datasets, along with efficient recognition methods, is crucial. Creating virtual datasets of daily activities in cyberspace allows for the simulation of various scenarios and interactions. Contextually enriched activity detection methods based on virtual datasets can enhance accurate activity recognition, behavioral analysis, anomaly detection, safety and risk mitigation, and continuous learning and improvement of the activity detection system.


This paper focuses on supporting the safety of older adults in their home environment by discriminating between abnormal and normal states in daily living activities using a virtual activity dataset. The work comprises activity scenario simulation and a methodological processing approach. The simulated activity scenarios represent typical everyday living situations, such as walking, sitting, standing, cooking, and cleaning, as well as abnormal falls. This simulation provides a powerful tool for obtaining a large number of samples by composing complex human activities. The technical processing part involves object detection as a preprocessing step, feature extraction from human posture, and a decision-making process using the Hidden Markov Model (HMM) that incorporates awareness of human-environment relationships. Our proposed approach performs hierarchical decision-making from two aspects: the pose, estimating posture deformities as the occasion, and human-environment relationships, estimating the place of the human among the surrounding objects. The main purpose of this paper is to develop an HMM model that discriminates abnormal from normal states using data acquired through the virtual human activity dataset. Additionally, this work aims to provide insight into generating daily activity data in cyberspace and into human-robot interaction (HRI) simulation technologies to promote safety management systems. Overall, this proposed work is a valuable contribution to supporting the safety and well-being of individuals in their home environment. By leveraging a virtual activity dataset and activity scenario simulation, along with a methodological processing approach, this work strives to effectively discern between normal and abnormal states in daily living activities, leading to enhanced safety and timely assistance for older adults.

2 Related Work

The generation of simulated activity video data of avatars in cyberspace has received significant attention in recent years due to its potential benefits in improving embodied AI systems that can uncover everyday hazards that are challenging to detect [1]. Our previous work [2], which considered critical criteria for fulfilling human daily activity recognition requirements and surveyed existing datasets for activities of daily living in both real-life environments and virtual spaces, is essential for advancing research in this area. The VirtualHome2KG framework1,2 [3, 4], proposed by Egami et al., focuses on generating synthetic Knowledge Graphs (KGs) that represent daily life activities in virtual spaces. This framework is designed to facilitate the generation of virtual environments populated with avatars that perform various daily activities. These synthetic KGs represent the context of simulated activity video data, enabling researchers to study and analyze human behavior in virtual settings. Through their analysis, Egami et al. [5] showcased the capabilities of VirtualHome2KG for learning daily life activities in virtual space. The framework not only provides a rich source of synthetic data for behavior recognition research but also enables the exploration of different AI techniques to address real-world challenges, such as fall risk detection in older adults.

1 https://github.com/KnowledgeGraphJapan/KGRC-RDF/blob/kgrc4si/README.md.
2 https://challenge.knowledge-graph.jp/2022/index_en.html.


Our work utilizes the VirtualHome2KG framework to generate daily activity data in order to detect and identify risks in daily activities. Numerous studies have used machine learning techniques for human activity recognition (HAR) and anomaly detection [6–8]. These techniques leverage the power of algorithms and statistical models to analyze and understand patterns in data, enabling the automatic identification and categorization of human activities and the detection of abnormal behavior. For instance, our previous work [6] focused on abnormal event detection, particularly in human posture; feature extraction techniques were applied to human posture data, and statistical analysis was used to identify and detect abnormal events with high precision. In this paper, our work follows a similar feature extraction concept, but the use of different data and computation methods, which increases the diversity of the data, can lead to unique contributions and insights in the field of abnormal activity detection based on occasion. The studies [9–11] survey and design human-object interaction methods to improve the recognition of spatial relations between humans and objects, enhancing the performance of HAR systems by accurately understanding how humans interact with various objects in their environment. The study [10] proposed a Far Near Distance Attention module that facilitates information propagation between humans and objects while skillfully considering their spatial distance. In [11], a ratio-transformer network demonstrates the capability to learn feature information not only from the target (person or object) itself but also from the association features between persons and objects. Our work simply focuses on understanding the relationship between a person and his or her surroundings by calculating the direction of each object from the person's perspective. For abnormal activity detection, an HMM implementation has been utilized, as demonstrated in our previous study [12]. Our current work's application of the HMM for abnormal activity detection, along with the exploration of different input features in the hierarchical HMM model, contributes to the ongoing efforts to enhance anomaly detection systems.

3 Proposed Approach

The proposed architecture for detecting abnormal and normal activities using the Hidden Markov Model (HMM) with VirtualHome2KG data is described in this section. The architecture comprises four main modules, as illustrated in Fig. 1. The technical procedures for each part are explained in detail in Sects. 3.1 to 3.5.

3.1 Activities Simulation Task Using VirtualHome2KG

In this paper, we contribute a simulation task to generate realistic virtual environments representing daily household activities and abnormal activities using the VirtualHome2KG framework [3, 4] along with the Unity3 platform. Our contribution to the activity simulation is providing the knowledge needed to fully grasp the concepts.

3 https://github.com/xavierpuigf/virtualhome.

Fig. 1. Overall architecture of the proposed approach for discrimination of abnormal state. (The figure's modules are: Activity Simulation Task, Object Detection, Feature Extraction, Human-Environment Awareness, and Detection of Abnormal/Normal State.)

The activities of daily living (ADLs) comprise the basic daily life skills that people need in order to live independently. When individuals learn the logical progression of household cleanliness, they are taught how to complete healthy basic household chores. For example, in the "Washing dishes" story, the first step is to fill the sink with soap and water. The second step is to place dirty dishes in the soapy water and scrub each one using a sponge. The third is to rinse the soap off the cleaned dishes and dry them with a kitchen towel or paper. Thus, we simulate the activity stories as a step-by-step logical progression applied in a real-life home. In addition, we contribute to increasing the variability of abnormal activities through simulation using the VirtualHome2KG framework. From our observations on identifying the risks faced by older adults at home, we found that most research works focus on two directions for discriminating abnormal situations from normal daily activities. The first is fall detection [13], and the second is extracting unusual resources from the daily routine of individuals [14]. Abnormal activity detection started many years ago; however, it is difficult to precisely define abnormal activity or behavior, since each person's behavior is different. Another issue is that there is not enough data for analysis due to privacy concerns. To address the first research direction, we considered the risks that might occur at home and simulated abnormal activities including falls, e.g., "Fall in bathroom." Furthermore, we executed simulations that broaden the scope of typical daily activities to establish links to everyday routines; this enhancement holds potential advantages for the second research direction. Our simulated activities4 are available under the VirtualHome2KG framework.

4 https://github.com/KnowledgeGraphJapan/KGRC-RDF/releases.

3.2 Object Detection

In this work, we perform transfer learning using the pre-trained You Only Look Once (YOLO v4) object detection network [15, 16], a one-stage vision technique for detecting multiple objects of interest in an image.


This section presents how to perform transfer learning using the pre-trained YOLO v4 network for object detection. Our VirtualHome2KG [3, 4] dataset generates a JSON data file with 2D bounding boxes of objects in the image for every five-frame interval. Using the VirtualHome2KG dataset, the 2D bounding boxes in the JSON files and the corresponding images are first collected to prepare the ground-truth data. However, the 2D bounding boxes obtained from Unity are incompatible with the YOLO v4 network due to different image coordinate systems. Thus, the positions of the 2D bounding boxes of objects in sequential images are recalculated into the form [x y width height]. The prepared data is used to train the YOLO v4 network via transfer learning, to make it able to detect the custom VirtualHome2KG objects. This work uses three categories of daily human activities (EatingDrinking, HouseCleaning, Work) and one category of abnormal activities to prepare the ground-truth data, which contains 61,520 images. 60% of the dataset is split into a training set for training the network, and 40% into a test set for evaluating the network.

Skeleton Pose Data. This section introduces the acquisition of the human skeleton structure from VirtualHome2KG [3, 4]. The dataset provides multiple views, including first-person and third-person perspectives, as well as three additional views recorded from cameras installed in the corners of each room. These multiple camera angles ensure comprehensive coverage of every action in various activity scenarios. The dataset goes beyond actions confined to a single room and includes actions involving room transitions: it captures not only the interactions and activities within a room but also the movement of individuals between different rooms. This makes the dataset versatile and applicable to a broad range of scenarios and activities that may occur in a home environment. However, the virtual skeleton data is directly extracted from the Unity platform for a single view only. The skeleton posture and position are not generated identically for each camera view in the dataset, which can be a limitation when trying to align the skeleton data with the agent in each scene of a video. Nevertheless, using the virtual skeleton data of the VirtualHome2KG dataset has advantages, one of which is the ability to provide complete pose data even when certain body parts are hidden from view. Complete pose data is significant when developing and evaluating algorithms for tasks like action recognition, pose estimation, or activity monitoring, as it allows a more comprehensive understanding of the human body's movements and interactions in complex scenarios.
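As a concrete illustration of the ground-truth preparation described above, the following is a minimal Python sketch (not the authors' code; the bottom-left-origin convention assumed for the Unity boxes is our illustrative assumption) of converting a box into the top-left [x y width height] form and performing the 60/40 split.

```python
import random

def unity_box_to_xywh(box, image_height):
    # Convert a Unity-style box, assumed here to be (x_min, y_min, width, height)
    # with a bottom-left image origin, into top-left-origin [x, y, width, height].
    x, y_bottom, w, h = box
    y_top = image_height - (y_bottom + h)  # flip the vertical axis
    return [x, y_top, w, h]

def split_dataset(samples, train_ratio=0.6, seed=0):
    # Shuffle the labeled images and split them 60% / 40% into train and test sets.
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Example on a 640x480 frame: a box stored in bottom-left coordinates.
print(unity_box_to_xywh((100, 50, 80, 120), image_height=480))  # -> [100, 310, 80, 120]
```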

3.3 Feature Extraction

According to the basic principles of balance and stability, the human body's center of mass (CoM) is ideally located directly above the base of support (BoS). The BoS refers to the area of contact between the body and the supporting surface; for a standing person, it includes the area covered by both feet on the ground. Based on this concept, we proposed a virtual grounding point (VGP) feature in our previous work [6]. However, in that work the feature was extracted from silhouette images, and when a complete silhouette image is not obtained, or certain parts of the image are missing, calculating the VGP accurately becomes challenging. In this research work, we propose a VGP derived from the Base of Support (BoS) and apply it in the context of skeleton images.


The VGP refers to a calculated point representing the effective contact point between the human body and the supporting surface (the ground).

Virtual Grounding Point Calculation. To extract the VGP, we first construct the base of support (BoS). We calculate its parameters from skeleton joint points, specifically those representing the left and right toes, and then form a line representing the BoS of a standing human body, as demonstrated in Fig. 2a. The equation that forms the BoS is shown in (1); it provides a stable BoS even when the camera viewpoint changes.

$$\mathrm{BoS} = m \cdot (x_{lim} - a) + b \qquad (1)$$

where BoS refers to the line of contact between the body and the supporting surface, $x_{lim}$ denotes the respective limits of the current x-axis, and $m$ represents the slope, i.e., the ratio of the change along the y-axis to the change along the x-axis. The parameters $a$ and $b$ are obtained from the toe points to estimate the area of the BoS, where $a = x_{LeftToes} - x_{RightToes}$ and $b = y_{LeftToes} - y_{RightToes}$, respectively. We then consider the spine point as the center of mass (CoM), as illustrated in Fig. 2b. The CoM of the human body typically lies along the vertical axis, somewhere near the lower abdomen, and changes with body position and movement. Investigating the CoM location and motion can provide valuable insights into how individuals maintain stability during various activities, such as walking, standing, or performing daily tasks. Since the CoM of the human body typically lies along the vertical axis, the next step is to form a vertical line over the BoS. To achieve this, we find the maximum values of the head and spine points of the human skeleton. After that, a vertical line from the head point to the BoS is formed by passing through the spine point. Then, we calculate the intersection point of the vertical line with the BoS line. The parameters of the line segments' intersection are computed as shown in Eqs. (2) and (3):

$$u = \frac{(x_{lim(1)} - x_{max(1)})(y_{BoS(1)} - y_{BoS(2)}) - (y_{BoS(1)} - y_{lim(1)})(x_{lim(1)} - x_{lim(2)})}{(x_{lim(1)} - x_{lim(2)})(y_{lim(1)} - y_{lim(2)}) - (y_{BoS(1)} - y_{BoS(2)})(x_{max(1)} - x_{max(2)})} \qquad (2)$$

$$v = \frac{(x_{lim(1)} - x_{max(1)})(y_{lim(1)} - y_{lim(2)}) - (y_{BoS(1)} - y_{lim(1)})(x_{max(1)} - x_{max(2)})}{(x_{lim(1)} - x_{lim(2)})(y_{lim(1)} - y_{lim(2)}) - (y_{BoS(1)} - y_{BoS(2)})(x_{max(1)} - x_{max(2)})} \qquad (3)$$

where $u$ and $v$ are the line-segment intersection parameters, $x_{lim(1)}$ and $x_{lim(2)}$ refer to the respective limits of the current x-axis of the skeleton image, $y_{lim(1)}$ and $y_{lim(2)}$ refer to the respective limits of the current y-axis of the skeleton image, $y_{BoS(1)}$ and $y_{BoS(2)}$ denote the base of support, and $x_{max(1)}$ and $x_{max(2)}$ refer to the maximum values of the spine and head points, respectively. Using these parameters to find the intersection point, we derive Eqs. (4) and (5). The resultant (x, y) point is considered the virtual grounding point (VGP) obtained from human skeleton images. Figure 2c demonstrates the serial computation process of the VGP.

$$VGP_x = \frac{\big(x_{max(1)} + u\,(x_{max(2)} - x_{max(1)})\big) + \big(x_{lim(1)} + v\,(x_{lim(2)} - x_{lim(1)})\big)}{2} \qquad (4)$$

$$VGP_y = \frac{\big(y_{lim(1)} + u\,(y_{lim(2)} - y_{lim(1)})\big) + \big(y_{BoS(1)} + v\,(y_{BoS(2)} - y_{BoS(1)})\big)}{2} \qquad (5)$$

where $VGP_x$ and $VGP_y$ represent the virtual grounding point on the x- and y-axes, respectively.
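A minimal Python sketch of Eqs. (1)–(5) as reconstructed above (a direct transcription for illustration, not the authors' implementation; the argument pairs are placeholders for the quantities defined in the text):

```python
def bos_line(left_toes, right_toes, x):
    # Eq. (1): the BoS line parameterized by the toe points,
    # with a = x_LT - x_RT, b = y_LT - y_RT and slope m = b / a.
    a = left_toes[0] - right_toes[0]
    b = left_toes[1] - right_toes[1]
    m = b / a if a != 0 else 0.0
    return m * (x - a) + b

def virtual_grounding_point(x_lim, y_lim, y_bos, x_max):
    # Eqs. (2)-(5): intersection of the vertical head-spine line with the BoS line.
    #   x_lim, y_lim : (first, second) limits of the skeleton image axes
    #   y_bos        : BoS line values at the two x-limits
    #   x_max        : x-values of the spine and head points
    den = ((x_lim[0] - x_lim[1]) * (y_lim[0] - y_lim[1])
           - (y_bos[0] - y_bos[1]) * (x_max[0] - x_max[1]))
    u = ((x_lim[0] - x_max[0]) * (y_bos[0] - y_bos[1])
         - (y_bos[0] - y_lim[0]) * (x_lim[0] - x_lim[1])) / den   # Eq. (2)
    v = ((x_lim[0] - x_max[0]) * (y_lim[0] - y_lim[1])
         - (y_bos[0] - y_lim[0]) * (x_max[0] - x_max[1])) / den   # Eq. (3)
    vgp_x = ((x_max[0] + u * (x_max[1] - x_max[0]))
             + (x_lim[0] + v * (x_lim[1] - x_lim[0]))) / 2.0      # Eq. (4)
    vgp_y = ((y_lim[0] + u * (y_lim[1] - y_lim[0]))
             + (y_bos[0] + v * (y_bos[1] - y_bos[0]))) / 2.0      # Eq. (5)
    return vgp_x, vgp_y
```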

We then calculate the distance between the CoM and the VGP using the Euclidean distance, as described in Eq. (6). Analyzing pairs of changes in the CoM and VGP along the axes can provide valuable insights into the patterns of human posture during different activities and transitions, as shown in Fig. 2d. Consider the observations in the context of a "fall": when a person is standing, the distance between the CoM and the VGP is quite long, because most of the body's length is supported by the feet, which creates a large distance between the CoM and the VGP at ground level. As the person starts to fall, the distance between the CoM and the VGP shortens, because during the transition from standing to lying the body's CoM shifts downward and the VGP moves closer to the CoM.

$$d = \sqrt{(CoM_x - VGP_x)^2 + (CoM_y - VGP_y)^2} \qquad (6)$$

where $d$ is the Euclidean distance between the center of mass (CoM) and the virtual grounding point (VGP).

Estimation of Posture Deformities. In this section, we observe unusual posture by calculating the angle formed by three points of the upper body: the head, the spine, and the point $(x_{max(2)}, y_{lim(2)})$ of the vertical line over the BoS, as demonstrated in Fig. 2e. When a person is falling or experiencing an unusual posture, the angle between these three points can exceed 90°. Specifically, the curve at the spine joint relative to the head can increase beyond 90° during a fall; the spine may arch backward or twist unnaturally as the body loses balance, especially if the person tilts the head backward during the fall. The angle $\theta_1$ formed by the three points, used to estimate posture deformities, is given in Eq. (7):

$$\theta_1 = \frac{180}{\pi}\,\big(\mathrm{atan2}(vectorA.y, vectorA.x) - \mathrm{atan2}(vectorB.y, vectorB.x)\big) \qquad (7)$$

where $vectorA$ is the vector from the head to the CoM (spine) and $vectorB$ is the vector from the point $(x_{max(2)}, y_{lim(2)})$ of the vertical line over the BoS to the CoM (spine).
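The two posture features can then be computed as follows (a hedged sketch; the choice of the spine as CoM and the vector endpoints follow the definitions above):

```python
import math

def com_vgp_distance(com, vgp):
    # Eq. (6): Euclidean distance between the center of mass and the VGP.
    return math.hypot(com[0] - vgp[0], com[1] - vgp[1])

def posture_angle(head, spine, top_point):
    # Eq. (7): signed angle (degrees) between vectorA (head -> spine/CoM)
    # and vectorB (top of the vertical line over BoS -> spine/CoM).
    vec_a = (spine[0] - head[0], spine[1] - head[1])
    vec_b = (spine[0] - top_point[0], spine[1] - top_point[1])
    return math.degrees(math.atan2(vec_a[1], vec_a[0])
                        - math.atan2(vec_b[1], vec_b[0]))
```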

Fig. 2. Feature extraction using construction of VGP [from (a) to (c)], distance between CoM and VGP (d) and calculating degree from 3 points of upper body part (e).

3.4 Awareness of Human-Object Relationships

We consider a crucial aspect of human-object relationships: interpreting the context of a person's environment to identify whether conditions are abnormal or normal for his or her actions. For instance, the approach needs to differentiate between a person lying on a bed in the bedroom (a common and normal resting position) and a person fallen in the kitchen amidst kitchen objects (an abnormal and potentially dangerous situation). To achieve such contextual understanding, this section focuses on understanding the relationship between a person and his or her surroundings. As presented in Sect. 3.2, the YOLO object detection network produces bounding boxes and scores for each object class. Then, we calculate the overlapping ratio of the bounding boxes of objects located near a person. After that, the centroid of the bounding box of each object class is obtained. Once the centroid of the human object's bounding box is obtained, four coordinate points are calculated to define the region of the human object. We then establish the scene and the spatial relationship between a person and the objects surrounding them. The first step is linking the person to the centroids of the surrounding objects. Then, in the first quadrant of the coordinate system of the person's bounding box, a point along the x-axis is obtained. From this point on the x-axis, we form the lines (vectors) connecting the human region's centroid and each surrounding object's centroid. Finally, the direction of each object from the person's perspective is calculated based on the angles between the vectors formed in the previous step, as shown in Eq. (8). By following this process, we can understand the spatial arrangement of the objects surrounding a person, as well as the direction in which each object is located relative to the person, as shown in Fig. 3.

$$\theta_2 = \frac{180}{\pi}\,\big(\mathrm{atan2}(vectorC.y, vectorC.x) - \mathrm{atan2}(vectorD.y, vectorD.x)\big) \qquad (8)$$

where $\theta_2$ is the angle representing the direction in which each object is located relative to the person; the vectors $C$ and $D$ are illustrated in Fig. 3. As demonstrated in Fig. 3, we define three types of locations: in front of, on, and beside. When $\theta_2$ indicates one of the locations we define, we obtain the spatial relationship between the human and the objects. However, across various camera angles it is not easy to determine the spatial relationship between a person and an object using only the $\theta_2$ value when it indicates that the person is on the object. Therefore, when the overlapping ratio of the bounding boxes on the person's region of interest is small, we assume the person is beside the object.
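The direction computation can be sketched as follows (the x-axis reference vector follows the description above, but the angle band for "on" and the overlap threshold are invented for illustration, not values from the paper):

```python
import math

def object_direction(person_centroid, object_centroid):
    # Eq. (8): angle (degrees) between vectorC (person centroid -> object centroid)
    # and vectorD (the reference direction along the x-axis of the person's box).
    vec_c = (object_centroid[0] - person_centroid[0],
             object_centroid[1] - person_centroid[1])
    vec_d = (1.0, 0.0)
    theta = math.degrees(math.atan2(vec_c[1], vec_c[0])
                         - math.atan2(vec_d[1], vec_d[0]))
    return theta % 360.0

def spatial_relation(theta2, overlap_ratio, overlap_min=0.2):
    # Map the angle and bounding-box overlap to {in front of, on, beside};
    # the "on" angle band and the overlap threshold are illustrative assumptions.
    if overlap_ratio < overlap_min:
        return "beside"
    if 60.0 <= theta2 <= 120.0:
        return "on"
    return "in front of"
```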


Fig. 3. Calculation of direction of each object from person’s region to see spatial arrangement.

3.5 Decision-Making Process Using Hidden Markov Model

In this section, we perform the decision-making process to detect abnormal and normal occasions of a person by employing the HMM. Similar to our previous work [12], we define two states $S = \{S_1, S_2\}$ to form the Markov chain: $S_1$ refers to an abnormal condition, which includes "fall," and $S_2$ represents a normal condition. To compute the Markov transition matrix, the video sequences provided in the VirtualHome2KG dataset are manually observed to obtain the occurring state symbol ($S_1$ or $S_2$) for every interval. After that, a co-occurrence matrix $M$ is formed as shown in Eq. (9), and the state transition matrix of the Markov chain is obtained as shown in Eq. (10).

$$M = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix} \qquad (9)$$

where $C_{11}$, $C_{12}$, $C_{21}$, and $C_{22}$ denote the numbers of pairs $(S_1, S_1)$, $(S_1, S_2)$, $(S_2, S_1)$, and $(S_2, S_2)$, respectively.

$$T = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \qquad (10)$$

where $a_{11} = C_{11}/C_1$, $a_{12} = C_{12}/C_1$, $a_{21} = C_{21}/C_2$, and $a_{22} = C_{22}/C_2$. Here, $C_1 = C_{11} + C_{12}$ and $C_2 = C_{21} + C_{22}$ are obtained by summing the rows of the matrix $M$. We define the observable features used to calculate the emission probabilities by selecting a suitable threshold value for each feature. Specifying observable symbols determines whether the state changes abnormally. To do so, we use the Euclidean distance values, the degree orientation of the upper body part, and the direction of each object from the human region. For the features that characterize human posture, we analyze and select the optimal threshold that can distinguish abnormal from normal. For the human-environment relationship, the object's properties are used, and an abnormality is identified if the human does not have a proper relationship with the object; for example, a kitchen table is not a standable object, so it is considered abnormal if a person is standing on it. Following our proposed HMM structure [12], we obtain the emission probability of the observed symbol $o_k$ ($k = 1, 2, \ldots, 8$) in state $S_j$ ($j = 1, 2$) as described in Eq. (11):

$$e_j(k) = \frac{\text{expected number of occurrences of } o_k \text{ in state } S_j}{\text{expected number of occurrences of state } S_j} \qquad (11)$$


We then obtain the emission probability matrix and utilize the Viterbi algorithm [17], which efficiently computes the most likely hidden state sequence (abnormal and normal states) for a given observation sequence under the HMM.
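A compact sketch of this decision stage (Eqs. (9)–(11) plus Viterbi decoding); the label sequence and the emission matrix below are toy placeholders, not values from our experiments:

```python
import numpy as np

def transition_matrix(labels, n_states=2):
    # Eqs. (9)-(10): co-occurrence counts of consecutive state labels
    # (0 = abnormal S1, 1 = normal S2), normalized row-wise into T.
    M = np.zeros((n_states, n_states))
    for s, s_next in zip(labels[:-1], labels[1:]):
        M[s, s_next] += 1
    return M / M.sum(axis=1, keepdims=True)

def viterbi(obs, pi, T, E):
    # Most likely hidden state sequence for observation symbol indices `obs`,
    # given initial probabilities pi, transition matrix T, emission matrix E.
    n_obs, n_states = len(obs), len(pi)
    delta = np.zeros((n_obs, n_states))           # best log-score per state
    back = np.zeros((n_obs, n_states), dtype=int)  # backpointers
    delta[0] = np.log(pi) + np.log(E[:, obs[0]])
    for t in range(1, n_obs):
        scores = delta[t - 1][:, None] + np.log(T)
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(E[:, obs[t]])
    path = [int(delta[-1].argmax())]
    for t in range(n_obs - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

pi = np.array([0.8, 0.2])                        # initial probabilities (Sect. 4.1)
T = transition_matrix([1, 1, 1, 0, 0, 1, 1])     # toy manually observed labels
E = np.full((2, 8), 1.0 / 8)                     # placeholder emission matrix e_j(k)
print(viterbi([0, 3, 7, 2], pi, T, E))
```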

4 Experimental Results

To evaluate the efficacy of our proposed approach, we performed an empirical study. In this section, we present the obtained results together with a discussion of limitations and future work. VirtualHome2KG [3, 4] generates abnormal activity categories in seven apartments using one virtual agent. In this work, we utilize 26 activities that include the abnormal action "fall."

4.1 Abnormal Activity Detection Based on Place and Occasion

Object detection is conducted using our custom YOLO v4 network trained on 61,520 VirtualHome2KG images. According to the experimental results and observations, successfully detecting an object requires having at least one similar object in the training dataset; the similarity factors include size, shape, color, angle of rotation, and illumination. For future improvements to the object detection method, including more diverse images for each class would be beneficial: the model would encounter a broader set of examples during training, leading to more robust and accurate detection. Then, two features are extracted for feature representation: the distance (d) from the center of mass (CoM) to the virtual grounding point (VGP) and the degree orientation (θ1) of the three points of the upper body. By analyzing pairs of changes in the CoM and VGP along the axes, we found that the distance between the CoM and the VGP shortens when the person starts to fall. Here, we set the optimal distance threshold for determining abnormal versus normal to a positive value less than one. When a person is falling or experiencing an unusual posture, the angle (θ1) between the three points (head, spine, and the point (x_max(2), y_lim(2)) of the vertical line over the BoS) can exceed 90°. To evaluate the effectiveness of our proposed features, we define four possible outcomes: detected abnormal state, undetected abnormal state, normal state, and mis-detected normal state [12]. The performance of d and θ1 is evaluated in terms of precision, recall, and accuracy, as shown in Table 1. In spatial relationship awareness, we define three types of locations: in front of, on, and beside. The object's properties are used to identify abnormalities when the human has no proper relationship with the object; e.g., a coffee table is not a standable object, so it is considered abnormal if a person is on it. All the observed features (d, θ1, and the spatial relationship) are used as inputs to solve the Hidden Markov Model. In this case, we define the initial probability values (π) as [0.8, 0.2]; these values indicate the likelihood of starting in each of the two states when the model begins. The performance evaluation of the abnormal activity categories using the HMM is shown in Table 2. Figure 4 displays the experimental results of the final detected states using the HMM.
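For reference, the metrics in Tables 1 and 2 follow from the four outcomes above when they are read as TP, FN, TN, and FP (this mapping and the counts below are our illustrative assumptions):

```python
def evaluate(detected_abn, undetected_abn, normal_ok, misdetected_norm):
    # detected abnormal = TP, undetected abnormal = FN,
    # correctly kept normal = TN, mis-detected normal = FP.
    tp, fn, tn, fp = detected_abn, undetected_abn, normal_ok, misdetected_norm
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return precision, recall, accuracy

print(evaluate(138, 52, 900, 3))  # hypothetical counts for one activity category
```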


Table 1. Performance evaluation on features extraction (PRC = precision, RCL = recall, ACC = accuracy; all in %).

| Activity's Name | Homes | d: PRC | d: RCL | d: ACC | θ1: PRC | θ1: RCL | θ1: ACC |
|---|---|---|---|---|---|---|---|
| Fall in bathroom1 | 1, 3, 5–6 | 100 | 66.27 | 96.93 | 100 | 71.07 | 97.37 |
| Fall while during getting up or rising1 | 1–6 | 100 | 99.25 | 96.75 | 100 | 69.68 | 97.06 |
| Fall while preparing meal1 | 1–7 | 79.37 | 93.42 | 97.09 | 82.41 | 91.40 | 97.32 |
| Fall while sitting down or lowering1 | 1, 3, 6 | 78.69 | 74.28 | 92.09 | 100 | 28.57 | 87.52 |
| Fall while standing and turning1 | 1, 2, 5–6 | 71.51 | 93.75 | 97.88 | 72.60 | 95.98 | 98.19 |
| Fall while standing at somewhere height1 | 1, 4 | 84.92 | 77.07 | 95.96 | 93.51 | 84.39 | 97.59 |

Table 2. Performance evaluation on the context-aware Hidden Markov Model (in %).

| Activity's Name | Homes | PRC | RCL | ACC |
|---|---|---|---|---|
| Fall in bathroom1 | 1, 3, 5–6 | 97.86 | 72.63 | 97.37 |
| Fall while during getting up or rising1 | 1–6 | 97.85 | 71.72 | 97.12 |
| Fall while preparing meal1 | 1–7 | 84.36 | 94.18 | 97.89 |
| Fall while sitting down or lowering1 | 1, 3, 6 | 91.12 | 41.71 | 88.56 |
| Fall while standing and turning1 | 1, 2, 5–6 | 76.30 | 95.98 | 98.48 |
| Fall while standing at somewhere height1 | 1, 4 | 100 | 79.85 | 97.80 |

4.2 Limitations and Discussion

The non-identical skeleton posture and position issue in the dataset can limit the alignment of skeleton data with the virtual agent in video scenes; an enhanced technique is needed to align the skeleton data with the virtual agent's posture in each camera view. Moreover, abnormal posture is currently estimated from sitting and standing positions, focusing on potential falls; to improve accuracy and provide comprehensive support, future detection tasks could consider a broader range of actions and postures. Additionally, we emphasize the limitations of using synthetic data for feature extraction of abnormal postures: real-life postural dynamics are influenced by individual variations, age-related differences, and health conditions like Parkinson's disease, all of which impact posture and stability. Simulating action patterns specific to older adults is crucial to improve analysis and understanding in research supporting them. This paper discusses understanding the scene and the spatial relationship between a person and surrounding objects; however, integrating spatial and contact relationships is necessary to improve recognition accuracy. For instance, detecting a person standing near a refrigerator and making contact with the handle allows the system to infer activities like getting something to eat or drink, adding valuable detail to the recognition process.


Fig. 4. Demonstration of detected abnormal and normal states: Fall in bathroom1 (upper part of figure) and Fall while getting up or rising1 (lower part of figure).

5 Conclusion

In summary, our proposed approach to detecting synthetic abnormal activities, which incorporates place and occasion context and integrates the Hidden Markov Model (HMM), brings compelling benefits that significantly improve accuracy, efficiency, and contextual awareness. Through the fusion of posture estimation and spatial relationships, harnessed by the HMM, we have introduced a holistic and contextually aware detection task that can enhance safety and offer personalized assistance, contributing to the improved well-being of older adults. This research lays the foundation for further advancements in caregiving technologies, ensuring a brighter and more independent future for the aging population. Future work will focus on refining abnormal activity detection with modern machine learning, particularly deep learning, to further improve performance.

Acknowledgement. This paper is based on results obtained from a project, JPNP20006, commissioned by the New Energy and Industrial Technology Development Organization (NEDO).

References

1. Fukuda, K., et al.: Daily activity data generation in cyberspace for semantic AI technology and HRI simulation. In: 40th Annual Meeting of the Robotics Society of Japan, 3J1-03 (2022)
2. Htun, S.N.N., Egami, S., Fukuda, K.: A survey and comparison of activities of daily living datasets in real-life and virtual spaces. In: 2023 IEEE/SICE International Symposium on System Integration (SII), Atlanta, pp. 1–7 (2023)


3. Egami, S., Nishimura, S., Fukuda, K.: A framework for constructing and augmenting knowledge graphs using virtual space: towards analysis of daily activities. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), Washington, pp. 1226–1230 (2021). https://doi.org/10.1109/ICTAI52525.2021.00194
4. Egami, S., et al.: VirtualHome2KG: constructing and augmenting knowledge graphs of daily activities using virtual space. In: International Workshop on the Semantic Web (2021)
5. Egami, S., Ugai, T., Oono, M., Kitamura, K., Fukuda, K.: Synthesizing event-centric knowledge graphs of daily activities using virtual space. IEEE Access 11, 23857–23873 (2023). https://doi.org/10.1109/ACCESS.2023.3253807
6. Htun, S.N.N., Zin, T.T., Hama, H.: Virtual grounding point concept for detecting abnormal and normal events in home care monitoring systems. Appl. Sci. 10, 3005 (2020). https://doi.org/10.3390/app10093005
7. Modi, R., et al.: Video action detection: analysing limitations and challenges. In: 2022 IEEE/CVF Conference on CVPR Workshops, pp. 4907–4916 (2022)
8. Ariza-Colpas, P.P., et al.: Human activity recognition data analysis: history, evolutions, and new trends. Sensors 22, 3401 (2022). https://doi.org/10.3390/s22093401
9. Antoun, M., Asmar, D.: Human object interaction detection: design and survey. Image Vis. Comput. 130, 104617 (2022)
10. Wang, G., et al.: Distance matters in human-object interaction detection. In: Proceedings of the 30th ACM International Conference on Multimedia (2022)
11. Wang, T., Lu, T., Fang, W., Zhang, Y.: Human-object interaction detection with ratio-transformer. Symmetry 14, 1666 (2022). https://doi.org/10.3390/sym14081666
12. Htun, S.N.N., Zin, T.T., Tin, P.: Image processing technique and hidden Markov model for an elderly care monitoring system. J. Imaging 6(6), 49 (2020)
13. Abou, L.: Fall detection from a manual wheelchair: preliminary findings based on accelerometers using machine learning techniques. Assistive Technology (2023)
14. Yahaya, S.W., et al.: Detecting anomaly and its sources in activities of daily living. SN Comput. Sci. 2, 14 (2021). https://doi.org/10.1007/s42979-020-00418-2
15. Bochkovskiy, A., et al.: YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
16. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
17. Jurafsky, D., Martin, J.H.: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 3rd edn. Stanford University, Stanford (2019)

Proposal of Fill in the Missing Letters CAPTCHA Using Associations from Images

Hisaaki Yamaba1(B), Muhammad Nur Firdaus Bin Mustaza1, Shotaro Usuzaki1, Kentaro Aburada1, Masayuki Mukunoki1, Mirang Park2, and Naonobu Okazaki1

1 University of Miyazaki, 1-1, Gakuen Kibanadai-nishi, Miyazaki 889-2192, Japan
[email protected]
2 Kanagawa Institute of Technology, 1030, Shimo-Ogino, Atsugi, Kanagawa 243-0292, Japan

Abstract. This paper proposes a new fill-in-the-missing-letters type of CAPTCHA that uses associations from images. Many web sites have adopted CAPTCHAs to prevent bots and other automated programs from malicious activities such as posting comment spam. Text-based CAPTCHA is the most common and earliest type of CAPTCHA, but as optical character recognition (OCR) technology has improved, the intensity of the distortions that must be applied to a CAPTCHA for it to remain unrecognizable by OCR has increased, to the point where humans have difficulty recognizing the CAPTCHA text. The idea of the proposed CAPTCHA is to ask users to spell a word by filling in some blanks. Since only a few letters are shown, it is difficult to guess the correct word; however, one or more images that can be used as hints to guess the answer word are also shown to the users. A series of experiments was carried out to evaluate the performance of the proposed CAPTCHA. First, a computer program was developed, using several software languages, for the usability evaluation. The system was used in experiments to find suitable parameters of the CAPTCHA, such as the number of disclosed letters and the positions of the disclosed letters. Next, security evaluation experiments were carried out using the system under the obtained parameters. The results of the experiments show the performance and limitations of the proposed CAPTCHA.

Keywords: CAPTCHA · Associations · Priming effect · Fill in the blank

1 Introduction

CAPTCHA — Completely Automated Public Turing test to tell Computers and Humans Apart — has become quite common on websites and applications.


CAPTCHA is a type of challenge-response test used to distinguish between human users and automated programs. It is used to prevent bots and other automated programs from signing up for email, posting comment spam, violating privacy, and making brute-force login attacks on user accounts. A CAPTCHA must be both highly secure and easy to use. To date, many versions of CAPTCHA have been proposed and developed so as to be not only difficult for computer programs to solve but also easy for humans. Text-based CAPTCHAs are the most common and earliest CAPTCHAs. They request users to enter the string of characters that appears in a distorted form on the screen. For example, Gimpy [1], EZ-Gimpy [2], and Gimpy-r [3] have been proposed as text-based CAPTCHAs. It was easy for human beings to read the distorted strings, but difficult for computer programs, when this type of CAPTCHA was introduced. However, as optical character recognition (OCR) technology has improved, the amount of distortion that must be applied to CAPTCHA strings has also increased, to the point where humans have difficulty solving CAPTCHAs. Thus, there is a need to develop a new text-based CAPTCHA that does not rely on the distortion of letters. The CAPTCHA using digraphia [4] is a text-based CAPTCHA that is easy for human users to solve but difficult for OCR software to recognize. Digraphia is the use of two writing scripts in one language and one writing system [5]. In this CAPTCHA, two strings with the same meaning, spelt in two letter systems, are prepared, and obstructing figures are placed on top of them to make some characters unreadable, so that the original word can be found by combining the two. However, there might be a bot that can read those two strings even when obstructing figures are placed on top of them; in such a case, the security of a website using this CAPTCHA might be compromised. Instead of using two different character strings to identify the original word, this paper proposes a new CAPTCHA that asks users to spell an answer word by filling in missing letters. Since very few letters are shown to users, it is difficult to find the correct answer, but images that are related to the answer word are also shown. It is expected that human users can reach the correct answer by associating the hint images with the missing-letters word, using the high cognitive ability of human beings. To realize the proposed CAPTCHA, several parameters, namely the suitable length of the answer word and the number of missing letters, have to be decided. A prototype system was developed and used to explore good parameters. The rest of this paper is organized as follows. The proposed CAPTCHA scheme is presented in Sect. 2. Section 3 explains the experiments to determine suitable CAPTCHA parameters (the number of letters to display and the positions of the shown letters) that make the CAPTCHA perform well. Section 4 describes the security evaluation experiment. Lastly, Sect. 5 concludes the paper.

2 Proposed CAPTCHA

This section describes the basic idea of the proposed CAPTCHA and the tasks needed to realize it.

2.1 Basic Idea

The digraphia CAPTCHA [4] relies on the human ability to obtain a correct answer from incomplete input. It is supposed that the priming effect helps human beings recognize two partially hidden character strings. Priming is a phenomenon whereby exposure to one stimulus influences the response to a subsequent stimulus, without conscious guidance or intention [6–8]. For example, after seeing a word or image related to food, many people answer "RICE," placing an "R" in the blank of "_ICE." On the other hand, after viewing words and images related to animals, the probability of answering "MICE," with the letter "M," increases. Although the two Malay strings in [4] are presented at the same time, a user sees them one after the other, so the stimulus of seeing the first one leads to recognition of the other string. It is expected that the priming effect can be applied to other types of CAPTCHAs; for example, a CAPTCHA that pairs a picture and a word may work well. To improve the performance of CAPTCHAs, they should be based on more advanced human cognitive processing abilities. Several studies have proposed CAPTCHAs following this approach: a CAPTCHA using the human ability to understand the humor of four-panel cartoons [9], a CAPTCHA using phonetic punning riddles found in knock-knock jokes [10], and a CAPTCHA using the human ability for mathematical or logical thinking [11]. CAPTCHAs using priming are also expected to satisfy this condition. In this paper, a new CAPTCHA that uses both text and images is proposed. To solve it, association between text and images is needed: first, users must identify the objects in the images; then, users find the answer word that satisfies the missing-letters string and is associated with the identified objects.

2.2 Adoption of Generic/Specific Concept Relations

As the first step of this study, hypernymy and hyponymy relations are introduced for the associations between images and a missing-letters word. The CAPTCHA developed in this paper is shown in Fig. 1. One missing-letters string is shown in the lower part of the CAPTCHA screen, and four hint images are shown in the upper part. The four words represented by the four images have the same hypernym. The answer word in Fig. 1 is "LUNG," the objects in the images are "HEART," "LIVER," "KIDNEY," and "BRAIN," and their hypernym is "ORGAN." The CAPTCHA system was developed using HTML, CSS, JavaScript, and PHP. MySQL is used as the database to store and retrieve the experimental data. Users are asked to enter the answer into the text box and submit it by clicking the submit button.

CAPTCHA Using Associations from Images

209

will then decide whether the answer is correct or incorrect by comparing the data inside the database. Finally, all the data are being stored in the database for further investigation. CAPTCHA problems are generated as follows:

Hint images

Missing letter string

Fig. 1. The layout of the developed CAPTCHA system

1. Two databases are prepared in advance: eight noun words (hypernyms) and their hyponyms (60 words), and multiple images for each of the all words (hypernyms and their hyponyms) that will become hint images for the proposed CAPTCHA questions. The images were downloaded using Google image search. 2. One hyponym word is selected from the dictionary as an answer word. 3. Its hypernym is specified using the dictionary. 4. Four lower concept words of the upper concept are collected. The answer word must not be included in the set. 5. The four images of the four word are selected at random. 6. Missing letters are selected for the answer word. To realize this CAPTCHA, applying suitable position patterns to answer words is important. Suitable parameters shown below must be determined. – A suitable number of missing letters of answer words. – Suitable positions of missing letters in answer words. In the next section, acquisition of the suitable parameter candidates are attempted empirically.

210

3

H. Yamaba et al.

Exploring Suitable Parameters of the Proposed CAPTCHA

This section describes the experiments to determine the suitable CAPTCHA parameters such as the number of letters to display and the position of missing letters that make the CAPTCHA performs well. Table 1. Total number of possible position patterns Word length shown letters: 2 shown letters: 3 4 letters

6

4

5 letters

10

10

6 letters

15

20

7 letters

21

35

8 letters

28

56

Table 2. Success rate using dictionary information only Word length shown letters: 2 shown letters: 3

3.1

4 letters

1.82%

10.82%

5 letters

1.52%

7.43%

6 letters

1.08%

7.03%

7 letters

1.15%

7.14%

8 letters

1.13%

8.27%

Average

1.34%

8.14%

Number of Missing Letters of Answer Words

First, we carried out an experiment to determine how many letters should be displayed for missing letter answer words in the proposed CAPTCHA. Since computer programs that attack the CAPTCHA will use some dictionary to solve the CAPTCHA, the number of words that match each missing letter answer words have large influence on the security of the CAPTCHA. If the number of missing letter is large, the number of matched words is expected to be large. This means the CAPTCHA becomes safer as the number of missing letters is larger. There are several combinations of positions of missing letters and the combinations of shown letter positions are called position patterns in this paper. All possible position patterns for the cases that two and three letters are shown are tested in this experiment. For example, there are six position patterns for

CAPTCHA Using Associations from Images

211

four-letter words and the results of applying the patterns on “LUNG” are “L U ”, “L N ”, “L G”, “ U N ”, “ U G”, and “ N G”. Total number of possible position patterns for two and three letters are shown in Table 1. The number of words that match any of position patterns represents the ease of arriving the correct answers. In this experiment, word dictionary from gwicks.net [13] was adopted and 100 words were selected at random from the dictionary in advance. But the 100 words includes four-letter, five-letter, sixletter, seven-letter, and eight-letter words and the number of each length words are all 20 words. The average of the number of matched words was used for the evaluation index. Concretely, a reciprocal of the average can be regarded as the probability of the success rate when the words are the answers of the proposed CAPTCHA. Table 3. Position patterns used in the experiment Position Pattern Shown letters

Example (Word:PANDA)

P P1

First and second

PA

P P2

First and middle

P

P P3

First and last

P

P P4

Middle and last

P P5

First letter and first letter P in stressed syllable

N A N

A D

The results of this experiment for displaying two and three letters are shown in Table 2. When three letters are shown, the average success rate was 8.14% but when two letters are shown, the average success rate was 1.34%. Based on the results of this experiment, two letters of answer word are shown in the following experiments. Although showing two letter is a hard condition for human users to solve this CAPTCHA, this parameter was used in the following experiments in order to investigate the performance of the bot program used in the Sect. 5. 3.2

Positions of Missing Letters

Under the condition that two letters are displayed, which letter should be displayed is investigated here to improve the performance of the proposed CAPTCHA. Positions that are easy for a humans to answer but difficult for machine to solve are desirable. In this study, we restricted the number of letter’s positions to five position patterns P Pi shown in Table 3. By applying the five position patterns, five missing letter strings are generated from one answer word. As for P P5 , the first letter and the first letter of the first strong syllable will be displayed. But if the first syllable is the first strong syllable, the first letters and the first letter of the next strong syllable will be displayed. If there is no next strong syllable, the first letter of the first weak syllable will be shown instead. The word length used for

212

H. Yamaba et al.

missing CAPTCHA letter in this experiment ranges from five letters to eight letters. To compare the performances between position patterns, five CAPTCHA problems using the five missing letter string were assigned to different experimental participants, each. CAPTCHA problems were prepared as follows: 1. First, 20 hyponyms were selected as answer words, which have two or three syllables. Their length were from five to eight and the number of words of each length were all five. 2. From the 20 answer words, five word groups were arranged (denoted by W G1 , W G2 , W G3 , W G4 , W G5 ). Length of answer words are 5, 6, 7, and 8 and each word group includes one of each. 3. Missing letter strings were created by applying position patterns to answer words in word groups. A missing letter string set M Sij expresses that they are results of assigning a position pattern j (P P1 to P P5 ) to words in group i (W G1 to W G5 ). 4. Four hint words were selected for each of the 20 answer words and four hint images were selected for the four hind words. One CAPTCHA problem consists of one missing letter string and the corresponding four images. CAPTCHA problems that have the same answer word use the same hint images. 5. A problem group P Gij is arranged from CAPTCHA problems that have missing letter strings in M Sij . 6. Finally problem sets P Sk were created as the union set of five problem sets as shown below. Since each problem group has five questions, each P Sk includes 20 CAPTCHA questions. – P S1 : P G11 , P G22 , P G33 , P G44 , P G55 – P S2 : P G12 , P G23 , P G34 , P G45 , P G51 – P S3 : P G13 , P G24 , P G35 , P G41 , P G52 – P S4 : P G14 , P G25 , P G31 , P G42 , P G53 – P S5 : P G15 , P G21 , P G32 , P G43 , P G54 There was a total of 60 participants for this experiment and the participants were all fluent in English. They were divided into five groups Gn and different problem set were assigned to each group. The experiment was carried out according to the following procedure. 1. First, the participants are asked to answer a training section to understand the whole experiment before the main experiment. The experimental system developed in 3.2 was also used in this experiment. In this step, one P Sm was assigned to one user group according to the list shown below: – G1 : P S1 – G2 : P S2 – G3 : P S3 – G4 : P S4 – G5 : P S5

CAPTCHA Using Associations from Images

213

2. After one week, the main experiment was carried out. In the main experiment, the experimental system recorded the submitted answers of each participant. In this step, each group’s problem set was changed as follows: – G1 : P S2 – G2 : P S3 – G3 : P S4 – G4 : P S5 – G5 : P S1 The results, the average correct answer rate of the five position patterns, are shown in Table 4. Regardless of the position of the other letter, the human success rates are almost same when the first letter was shown. Therefore, when showing letters for missing CAPTCHA letters, the first letter should be shown for human users to increase the success rate of passing the proposed CAPTCHA. Table 4. Human success rates of each position pattern Position pattern Average score P P1

71.67%

P P2

70.67%

P P3

71.01%

P P4

53.00%

P P5

70.50%

Fig. 2. Image identification

4

Security Evaluation

This section describes the security evaluation experiment of the proposed method. A bot program was implemented to attack the proposed CAPTCHA using a word dictionary introduced in 4.1.

214

4.1

H. Yamaba et al.

WordNet

WordNet [12] is a database of words in the English language. It contains sizable lexical database of English words. Each of the nouns, verbs, adjectives, and adverbs in a set of cognitive synonyms known as a “synset” expresses a different concept. Conceptual-semantic and lexical relations like hypernymy and hyponymy are used to connect synsets. Hypernymy is the semantic relation of being superordinate or belonging to a higher rank or class. Meanwhile, hyponymy is the semantic relation of being subordinate or belonging to a lower rank or class. For example, between “FOOD” and “CAKE”, the superordinate from “CAKE” perspective is “FOOD” and the subordinate from “FOOD” perspective is “CAKE”. Therefore, hypernym of “CAKE” is “FOOD” and hyponym of “FOOD” is “CAKE”.

Fig. 3. Search for the most specific common hypernym of hint words

Fig. 4. Collection of all hyponyms succession to the hypernym

CAPTCHA Using Associations from Images

4.2

215

Bot Attack Procedure

A bot that can attack the proposed CAPTCHA was developed using hypernymy and hyponymy relation in WordNet. The steps to solve the proposed CAPTCHA are as follows: 1. Words expressed in the hint images are identified (see Fig. 2). 2. The most specific common hypernym of the words identified in Step 1. is explored in WordNet [12] as shown in Fig. 3. 3. All hyponyms are of the hypernym obtained in Step 2. collected as shown in Fig. 4. 4. Words that match the missing letter string are collected from the words obtained in Step 3. as shown in Fig. 5. 5. One words is selected at random as the answer. The probability that the selected word is correct is the reciprocal of the number of words obtained in Step 4. 1 × 100 F inal candidates number

Fig. 5. Selection of words matching the missing letter string

This bot needs image recognition system to identify the hint image name but for now the process but the function have not been implemented. In this experiment, it is assumed that names of hint images are already given. 4.3

Experimental Conditions

The word length for missing CAPTCHA letter used in this experiment ranges from four letters to eight letters. The number of letters to be shown is two. The first letter of the answer word were always shown and the other letter to be displayed was selected at random in this experiment. 294 CAPTCHA questions were prepared for this experiment. The questions generation of this experiment is almost same with question generation in Sect. 3.2 except that applied position patterns were limited to the patterns that includes the first letter. The implemented bot program tried to solve all of the 294 CAPTCHA questions. The bot success rate for each question was recorded and the average bot success rate was calculated.

216

4.4

H. Yamaba et al.

Results

The average bot success rate is 62.81%. This means that many of the CAPTCHA question was answered correctly by the bot using WordNet or using the hierarchical structure of concepts. To improve the security of the proposed CAPTCHA, introduction of other relations between hint images and answer words is needed.

5

Conclusion

This paper proposes a new fill in the missing letters type CAPTCHA using associations from images. Using the idea obtained from the study on Digraphia CAPTCHA, two types of hints (missing letter words and images) were introduced to help users solve the proposed CAPTCHA. Concretely, relations of hypernymy and hyponymy was used in the developed CAPTCHA system. The hypernym of the answer word is obtained and its hyponyms are collected. The answer word is shown to users but some of its letters are hidden but since the images of the hyponyms are shown together, users can fill the blanks of the answer word associated with the hint images. To realize the proposed CAPTCHA, two important parameters, the number of hidden/shown letters of an answer word and positions of the hidden/shown letters, were explored using the experimental system implemented based on the proposed CAPTCHA. Next, using the obtained parameters, security evaluation and usability evaluation were carried out. On the security evaluation, a bot program was implemented using WordNet and offered to the security evaluation. The bot could solve the proposed CAPTCHA using only the WordNet dictionary with a high probability. One of the reason is that since the relations between an answer word and hint images were limited to hypernymy/hyponmy, it was easy to simulate association using WordNet. And also, correct hint words were given in the experiment because image recognition function was not implemented in the bot program. This might help the program to obtain true answer. In the future work, the proposed CAPTCHA will be improved to introduce other relationships that cause associations, for example “apple” is associated with its production area or persons who connected with apples. Adjustment of difficulty, such as increasing of the number of displaying letters and further investigation of suitable positions of letters, will be also attempted. On the other hand, the bot program used in evaluation experiments will be improved by adding the ability of image recognition. Also, usability evaluation has to be carried out to confirm the performance is enough.

References 1. The CAPTCHA Project. Gimpy. http://www.captcha.net/captchas/gimpy. Accessed 31 Jul 2023 2. Mori, G., Malik, J.: Recognizing objects in adversarial clutter: breaking a visual CAPTCHA. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recogn. 1, 134–144 (2003)

CAPTCHA Using Associations from Images

217

3. Moy, G., Jones, N., Harkless, C., Potter, R.: Distortion estimation techniques in solving visual CAPTCHAs. In: Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2004) 4. Yamaba, H., et al.: Proposal of Jawi CAPTCHA using digraphia feature of the Malay language. In: Nakanishi, T., Nojima, R. (eds.) IWSEC 2021. LNCS, vol. 12835, pp. 119–133. Springer, Cham (2021). https://doi.org/10.1007/978-3-03085987-9 7 5. Ian, R.H.: Dale: digraphia. Int. J. Sociol. Lang. 26, 5–13 (1980) 6. Meyer, D.E., Schvaneveldt, R.W.: Facilitation in recognizing pairs of words: evidence of a dependence between retrieval operations. J. Exp. Psychol. 90(2), 399– 413 (1971) 7. Collins, A.M., Loftus, E.F.: A spreading activation theory of semantic processing. Psychol. Rev. 82(6), 407–428 (1975) 8. Tulving, E., Schacter, D.L., Stark, H.A.: Priming effects in word-fragment completion are independent of recognition memory. J. Exp. Psychol. Learn. Mem. Cogn. 8(4), 336–342 (1982) 9. Kani, J., Suzuki, T., Uehara, A., Yamamoto, T., Nishigaki, M.: Four-panel Cartoon CAPTCHA. Inf. Process. Soc. Jpn 54(9), 2232–2243 (2013). (In Japanese) 10. Ximenes, P., dos Santos, A., Fernandez, M., Celestion Jr, J.: A CAPTCHA in the Text Domain, On the Move to Meaningful Internet Systems 2006: OTM 2006 Workshops, pp. 605–615 (2006) 11. Kaur, R., Choudhary, P.: A novel CAPTCHA design approach using Boolean Algebra. Int. J. Comput. Appl. 127(11), 13–18 (2015) 12. George, A.: Miller: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995) 13. Word Dictionary from wicks. http://www.gwicks.net/dictionaries.htm. Accessed 31 Jul 2023

Digital Transformation (DX) Solution for Monitoring Mycoplasma Infectious Disease in Calves: A Worldwide Health Challenge

Cho Nilar Phyo1, Pyke Tin2, Hiromitsu Hama3, and Thi Thi Zin2(B)

1 University of Miyazaki, Miyazaki, Japan
2 Graduate School of Engineering, University of Miyazaki, 1-1, Gakuen Kibanadai-Nishi, Miyazaki 889-2192, Japan
[email protected]
3 Graduate School of Engineering, Osaka Metropolitan University, 3-3-138, Sugimoto, Sumiyoshi-Ku, Osaka-shi 558-8585, Japan

Abstract. Mycoplasma bovis (M. bovis) is a serious threat to cattle health, resulting in significant economic losses worldwide, particularly in the veal calf sector. While the disease can circulate undetected, early identification of subclinical carriers is crucial. To this end, a fully automated monitoring system for Mycoplasma Infectious Disease in calves is proposed using digital transformation technologies and AI advances. The proposed system consists of four stages. In the first stage, an image processing technique will be developed to automatically or manually record behavioral and physiological parameters in calves while feeding at milk feeding robots. The second stage will integrate multiple data resources, such as DX records and image data, to analyze the data for the detection and diagnosis of Mycoplasma infection. The third stage will employ DX and AI advances to enable the proposed monitoring system to make accurate decisions, such as whether to treat calves and what to treat them for. In the fourth stage, experimental results will be displayed. In conclusion, the proposed automated monitoring system will provide a valuable tool for the early detection of Mycoplasma Infectious Disease in calves, leading to reduced economic losses and offering timely information to address a major worldwide problem.

Keywords: Video Monitoring for Cattle · Mycoplasma Infectious Disease in Calves · Support Vector Machine (SVM) · Real-life Experiments

1 Introduction

Mycoplasma Infectious Disease, caused by Mycoplasma bovis (M. bovis), poses a significant threat to cattle health and causes substantial economic losses in various cattle production systems worldwide, particularly in the veal calf sector. Despite its impact, efficient health-monitoring tools to identify the disease in time are lacking, and it can circulate without being detected or showing clinical signs. Therefore, there is a pressing need for a fully automated monitoring system that can provide new insights into the dynamics of disease transmission and supply the timely information needed to address this worldwide problem of cattle health.


In recent years, M. bovis has emerged as an increasingly important cause of respiratory disease, otitis media, and arthritis in young calves less than three months old [1–3]. Disease outbreaks with high morbidity rates occur and can be economically devastating for the affected farm. However, most studies have focused on the clinical aspects of Mycoplasma disease in calves [4, 5]; there is a lack of research on the use of image processing techniques to detect Mycoplasma-related behaviors in calves. Therefore, there is a need to investigate the potential of using image processing techniques, computer vision methods, and AI and DX technologies to monitor and detect Mycoplasma-related behavior in calves. Such an automated system could provide real-time monitoring, rapid disease detection, and early intervention to prevent outbreaks and minimize economic losses.

2 Proposed System

To address the issue of timely and accurate detection of Mycoplasma Infectious Disease in calves, a fully automated monitoring system is proposed that utilizes digital transformation tools and AI advances. The system will monitor the behavioral and physiological parameters of calves while feeding at milk feeding robots and interpret the collected data to detect any changes linked to the calves' health status and Mycoplasma-related action parameters. The proposed monitoring system will consist of four stages, as illustrated in Fig. 1.

Fig. 1. Overview of the proposed system.

2.1 Stage 1: Development of Digital Transformation (DX) Tools

Image processing and digital transformation technologies will be applied to automatically or manually record behavioral and physiological parameters in calves while they feed at milk feeding robots.


2.2 Stage 2: Data Interpretation and Analysis

In this stage, changes in the data will be detected and connected to changes in behavior and physiology, establishing a link to the calves' health status and Mycoplasma-related action parameters. This stage will involve the development of software tools that can analyze images and data to detect any changes in calf behavior or physiological parameters that may be indicative of Mycoplasma Infectious Disease. In this connection, the following behaviors have been observed:

Lameness. M. bovis infection can cause arthritis, which can result in limping or other signs of lameness.

Fever. Infected calves may show signs of a high fever, which can be detected using a rectal thermometer.

Head shaking. If a calf frequently shakes its head, this may be a sign of discomfort or irritation in the ears, which could be caused by an ear infection or another health issue.

Ear drooping. Calves with M. bovis infection may show drooping of one or both ears, which can be an indicator of pain or inflammation in the ear.

2.3 Stage 3: Information Integration

Multiple data resources, such as DX records and image data, will be integrated to select key parameters for the behaviors observed in Stage 2. The extracted key parameters are shown in Table 1.

Table 1. Observed Behaviors and Their Key Parameters.

Behaviors      Key Parameters
Lameness       Frequency of limping episodes per unit time; number of steps taken by the calf during a specified period; duration of time that the calf spends limping; variability of the length of limping episodes (i.e., standard deviation of limping episode length)
Head shaking   Frequency of head shaking per unit time; duration of time that the calf spends shaking its head
Ear drooping   Duration of time that the calf's ear(s) remain drooped; frequency of drooping episodes per unit time; number of drooping episodes during a specified period; severity of drooping (e.g., measured by the angle of the droop)

Then the system will employ data analysis techniques, such as machine learning algorithms, to detect any anomalies or patterns that may be indicative of the disease. One machine learning algorithm that could be used to detect anomalies or patterns indicative of Mycoplasma disease with respect to lameness and ear drooping or flicking is the Support Vector Machine (SVM). SVM is a supervised machine learning algorithm that can be used for classification or regression tasks.


In this case, SVM is used for binary classification, where the algorithm is trained to distinguish between healthy cattle and those with Mycoplasma disease based on their lameness and ear movements. To train the SVM algorithm, a dataset of labeled examples is needed, where each example represents a calf labeled as either healthy or infected with Mycoplasma. The dataset also includes features relevant to the task, such as lameness and ear movements. Once the dataset is prepared, the SVM algorithm is trained to learn the patterns and anomalies that distinguish healthy cattle from those with Mycoplasma disease. The algorithm will then be able to predict the health status of new cattle based on their lameness and ear movements; a minimal sketch of this step is given at the end of Sect. 2.

2.4 Stage 4: Utilization of AI Advances for Decision Making Process

In this stage, the latest developments in DX and AI will be utilized to enhance the proposed monitoring system and enable accurate decision-making processes. The system will use the data collected and analyzed in the previous stages to display the experimental results in a user-friendly manner, providing valuable insights for effective decision making. Based on the data analysis, the system will provide recommendations on whether to treat the calves and, if so, which treatments would be most effective. The recommendations will be based on the results of previous experiments and the latest scientific research, ensuring that the decisions are evidence-based and accurate. Finally, the system will display the experimental results in an easy-to-understand format, such as graphs and charts, allowing for informed decision-making by farmers and veterinarians. The data visualization will also enable the tracking of the calves' health status over time, providing a comprehensive overview of their health and well-being.
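As referenced in Sect. 2.3, the following is a minimal, hedged sketch of the SVM classification step, assuming scikit-learn is available; the feature layout and the randomly generated data are illustrative placeholders, not the paper's dataset.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: one row per calf, columns such as
# [limping frequency, limping duration, ear-droop duration, droop angle].
X = np.random.rand(60, 4)             # placeholder data for illustration
y = np.random.randint(0, 2, size=60)  # 1 = infected, 0 = healthy

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Standardize features so no single behavioral parameter dominates.
scaler = StandardScaler().fit(X_train)
clf = SVC(kernel="rbf")               # binary classifier: healthy vs. infected
clf.fit(scaler.transform(X_train), y_train)

accuracy = clf.score(scaler.transform(X_test), y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```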

3 Some Illustrative Simulation Results

Several clinical sign assessments have been conducted to detect M. bovis infection in young calves. According to empirical studies [6], the main clinical signs observed were general clinical signs, respiratory signs, and nasal discharge. Among the many empirical findings, the authors of [7, 8] tested Mycoplasma-infected calves and found that the general clinical signs scored 175, the respiratory signs scored 75, and nasal discharge scored 230. By normalizing these scores, a prior probability distribution of these clinical signs can be defined to predict the occurrence of Mycoplasma infectious disease in young calves. Specifically, Table 2 shows this prior probability distribution, from which the prior probability distribution for non-infected calves can be inferred, as shown in Table 3. These probabilities are used as features for infected and non-infected calves, labeled F1 = general signs, F2 = respiratory signs, and F3 = nasal discharge, together with infected (1) and non-infected (0) labels, respectively. The data are generated using the Markov Chain Monte Carlo simulation method based on the prior probability distributions; a simulation sketch is given after Table 3. Table 4 shows sample simulation results for infected and non-infected calves.

3.1 Mahalanobis Distance-Based Mycoplasma Classification

For the classification process, the Mahalanobis distance-based Mycoplasma classification method was utilized.

Table 2. Clinical Scores for Infected Calves.

Clinical Signs   General Signs   Respiratory Signs   Nasal Discharge   Total
Scores           175             75                  230               480
Probability      0.364583333     0.15625             0.479166667       1

Table 3. Prior Probability Distributions for Infected and Non-Infected.

Clinical Signs    General Signs   Respiratory Signs   Nasal Discharge   Total
Scores            175             75                  230               480
Probability       0.364583333     0.15625             0.479166667       1
Prob Non-Infect   0.635416667     0.84375             0.520833333       2
Normalize         0.317708333     0.421875            0.260416667       1
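The paper generates Table 4 with a Markov Chain Monte Carlo simulation from the priors above. As a simplified, hedged stand-in for that procedure, the sketch below draws multinomial sign counts from the class priors of Tables 2 and 3 and normalizes them; the choice of 30 observations per calf is an assumption inferred from the 1/30-spaced values visible in Table 4, not something stated in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Class priors over (F1, F2, F3) taken from Tables 2 and 3.
prior_infected     = np.array([0.364583333, 0.15625, 0.479166667])
prior_non_infected = np.array([0.317708333, 0.421875, 0.260416667])

def simulate_calves(prior, label, n_calves=30, n_obs=30):
    """Draw n_obs sign observations per calf from the class prior and
    normalize the counts, yielding feature vectors that sum to 1."""
    counts = rng.multinomial(n_obs, prior, size=n_calves)
    features = counts / n_obs
    labels = np.full(n_calves, label)
    return features, labels

X_inf, y_inf = simulate_calves(prior_infected, label=1)
X_non, y_non = simulate_calves(prior_non_infected, label=0)
X = np.vstack([X_inf, X_non])
y = np.concatenate([y_inf, y_non])
print(X[:3], y[:3])
```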

Mahalanobis distance is a powerful metric that calculates the distance between two samples while taking multiple feature attributes into account. This unitless measure is parameterized by a positive semi-definite (PSD) matrix and offers numerous advantages over other metrics. One of the main advantages of the Mahalanobis distance is that it considers the correlations between different variables, which helps establish a more accurate relationship between variables and the labels of Multivariate Time Series (MTS). Another benefit is that it has a multivariate effect size, meaning that the scale of the Mahalanobis distance does not impact the classification or clustering performance of MTS. These advantages make the Mahalanobis distance a reliable local distance metric for comparing MTS.

In this experiment, the training samples and the test sample are combined into a data set {x_1, x_2, …, x_m, V}. Each sample can be described as a column vector composed of N characteristic attributes {z_1, z_2, …, z_n}ᵀ, where μ_i is the expected value of the i-th element, μ_i = E(z_i). The correlation between the dimensions of these samples is expressed by the covariance matrix Σ, i.e.,

    Σ = [ C_ij ],  i = 1, …, n,  j = 1, …, m        (1)

where C_ij is the covariance defined as

    C_ij = cov(z_i, z_j) = E[ (z_i − μ_i)(z_j − μ_j) ]        (2)

The Mahalanobis distance between data points x and y is

    MD = √( (x − y)ᵀ Σ⁻¹ (x − y) )        (3)
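A minimal NumPy sketch of Eqs. (2)–(3) follows, computing the Mahalanobis distance of a test sample to each class mean; the sample data and variable names are illustrative assumptions, and the pseudo-inverse is an implementation guard not discussed in the paper.

```python
import numpy as np

def mahalanobis(x, y, cov):
    """Eq. (3): sqrt((x - y)^T Sigma^{-1} (x - y))."""
    diff = x - y
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse guards against singular Sigma
    return float(np.sqrt(diff @ inv_cov @ diff))

# Illustrative data: rows are samples, columns are the features F1-F3.
X_infected = np.random.rand(30, 3)
X_healthy = np.random.rand(30, 3)
V = np.random.rand(3)  # test sample

cov = np.cov(np.vstack([X_infected, X_healthy]).T)  # Eq. (2), estimated from data

d_inf = mahalanobis(V, X_infected.mean(axis=0), cov)
d_hea = mahalanobis(V, X_healthy.mean(axis=0), cov)
print("predicted:", "infected" if d_inf < d_hea else "healthy")
```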


Table 4. Sample Simulation Results. (Each row lists the simulated feature values F1, F2, F3 and the class label: infected = 1, non-infected = 0; the full grid of simulated values is not reproduced here.)


Following the description above, the Mahalanobis distances between the m training samples and the test sample V are calculated as

    d_i = √( (x_i − V)ᵀ Σ⁻¹ (x_i − V) ),  i = 1, …, m        (4)

In our simulation experiment, the 3 × 3 covariance matrix of Eq. (2) was computed as

    C = [  0.438834   −0.00827   −0.10184
          −0.00827     0.010428   0.000804
          −0.10184     0.000804   0.069037 ]

Then the Mahalanobis distance between the corresponding class means and the test data was calculated, as shown in Table 5. An average accuracy of 95% results, as shown in Table 6.

Table 5. Mahalanobis distance between the corresponding class means and the test data. (Each row lists the ground-truth label (GT), the Mahalanobis distances to the two class means, the predicted label (Predict), and the match indicator; the full grid of values is not reproduced here.)

Table 6. Overall Classification Accuracy.

Total   Correct   Accuracy
60      57        0.95

4 Conclusions

This study has highlighted the potential benefits of using Digital Transformation (DX) solutions for monitoring Mycoplasma infectious disease in calves in the context of dairy farm management systems. The usefulness of statistical measures such as the Mahalanobis distance, in conjunction with DX technologies, has been explored to identify and track the spread of the disease. The simulation results presented in this study are based on real-life clinical findings from previous studies and demonstrate the practical application of the proposed method. However, further research and development are needed to fully realize the potential of DX solutions in this area. Overall, this study has shown that the integration of DX technologies and statistical analysis can provide valuable insights and improve disease management strategies for dairy farms facing the challenge of Mycoplasma infectious disease in calves. These findings should encourage further investigation and adoption of these innovative approaches in the agricultural industry.

Acknowledgements. This publication is subsidized by JKA through its promotion funds from KEIRIN RACE.


References

1. Lowe, G.L., et al.: Physiological and behavioral responses as indicators for early disease detection in dairy calves. J. Dairy Sci. 102(6), 5389–5402 (2019)
2. Maunsell, F.P., Donovan, G.A., Risco, C., Brown, M.B.: Field evaluation of a Mycoplasma bovis bacterin in young dairy calves. Vaccine 27(21), 2781–2788 (2009)
3. Becker, C.A.M., et al.: Monitoring Mycoplasma bovis diversity and antimicrobial susceptibility in calf feedlots undergoing a respiratory disease outbreak. Pathogens 9, 593 (2020)
4. Zin, T.T., et al.: Automatic cow location tracking system using ear tag visual analysis. Sensors 20(12), 3564 (2020)
5. Zin, T.T., Seint, P.T., Tin, P., Horii, Y., Kobayashi, I.: Body condition score estimation based on regression analysis using a 3D camera. Sensors 20(13), 3705 (2020)
6. Dudek, K., Nicholas, R.A.J., Szacawa, E., Bednarek, D.: Mycoplasma bovis infections: occurrence, diagnosis and control. Pathogens 9(8), 640 (2020)
7. Junqueira, N., et al.: Detection of clinical bovine mastitis caused by Mycoplasma bovis in Brazil. J. Dairy Res. 87(3), 306–308 (2020)
8. Stipkovits, L., Ripley, P., Varga, J., Palfi, V.: Clinical study of the disease of calves associated with Mycoplasma bovis infection. Acta Vet. Hung. 48(4), 387–395 (2000)

AI Driven Movement Rate Variability Analysis Around the Time of Calving Events in Cattle

Wai Hnin Eaindrar Mg1, Pyke Tin2, Masaru Aikawa3, Ikuo Kobayashi4, Yoichiro Horii5, Kazuyuki Honkawa6, and Thi Thi Zin2(B)

1 Interdisciplinary Graduate School of Agriculture and Engineering, University of Miyazaki, Miyazaki, Japan
[email protected]
2 Graduate School of Engineering, University of Miyazaki, Miyazaki, Japan
[email protected]
3 Organization for Learning and Student Development, University of Miyazaki, Miyazaki, Japan
[email protected]
4 Sumiyoshi Livestock Science Station, Field Science Center, Faculty of Agriculture, University of Miyazaki, Miyazaki, Japan
[email protected]
5 Center for Animal Disease Control, University of Miyazaki, 1-1, Gakuen Kibanadai-Nishi, Miyazaki 889-2192, Japan
[email protected]
6 Honkawa Ranch, Oita 877-0056, Japan
[email protected]

Abstract. In modern cattle management, the timely detection of cattle events is crucial for ensuring both animal welfare and farm profitability. This paper introduces an innovative approach that leverages AI-driven movement rate variability analysis to predict calving events in cattle. By harnessing advanced motion tracking technologies and machine learning algorithms, this methodology offers a non-intrusive and automated means of detecting physiological and behavioral changes associated with impending calving events. Through a comprehensive exploration of data collection, pre-processing, and feature engineering, this paper establishes the foundation for training accurate AI models. These models utilize distinct movement patterns, including changes in speed, frequency, direction, and rest behavior, as predictive indicators of calving events. Real-world validation on cattle farms underscores the practical viability of the proposed approach, demonstrating its potential to revolutionize calving event detection. By transcending traditional methods, this AI-driven solution exhibits superior accuracy and efficiency, thereby contributing to enhanced animal care, optimized farm operations, and improved economic outcomes. The paper concludes by highlighting future research avenues and underscoring the transformative implications of AI-driven movement analysis for calving event prediction in the realm of agricultural technology.

Keywords: AI Driven Movement Rate Variability · Advanced Motion Tracking · Time of Calving Events



1 Introduction

In modern cattle management, timely detection of calving events is essential for both animal well-being and farm profitability. This interplay between livestock health and economic success requires an understanding of calving factors. Calving significantly impacts cow health, calf survival, and post-calving reproductive performance. Rapid intervention during calving can mitigate issues, enhance neonatal outcomes, and optimize herd health. However, accurately identifying and monitoring calving is challenging [1]. Traditional visual methods are error-prone and impractical for 24/7 monitoring. Recent advancements in AI and motion tracking are transforming dairy farming: AI extracts insights from cattle behavioral data, autonomously recognizing and interpreting movements [2, 3]. Our study focuses on AI-driven movement rate variability analysis to predict calving. Our proposal has two aims: establishing a framework for data collection, preprocessing, detection, tracking, and feature engineering to train AI models; and validating AI-driven movement analysis on cattle farms. The models identify movement patterns such as speed changes, frequency, direction, and rest behavior to predict calving. This paper introduces an innovative AI-driven motion analysis approach for early calving prediction, offering benefits for animal care, farm operations, and economic outcomes.

Our study is structured into five sections. Section 1 introduces the research. Section 2 discusses related work. Section 3 describes the materials utilized for the analysis and explains the methods. In Sect. 4, the experimental implementation results and analysis are presented in detail. Finally, Sect. 5 presents the conclusions of this research.

2 Some Related Works

In the pursuit of timely calving event prediction, researchers have ventured into various facets of cattle management and behavior analysis [4, 5]. Calving time prediction has been addressed through the integration of physiological sensors [6, 7]: by monitoring hormone levels and body temperature changes, researchers identified patterns that precede calving events. The behavioral cues surrounding calving events have garnered significant attention as well. Restlessness and nesting behaviors emerged as reliable indicators, emphasizing the importance of behavioral observation. Later researchers refined this understanding by highlighting the increased frequency of lying-to-standing transitions during the lead-up to calving [8]. These diverse research endeavors collectively contribute to the foundation upon which our study is built.

3 Materials and Methods

In this section, we introduce our innovative AI-Driven Movement Rate Variability Framework designed for the analysis of motion patterns during calving events in cattle. Figure 1 provides an overview of the framework's architecture. The framework comprises four key components: (i) Data Collection and Pre-processing, (ii) AI-Driven Movement Detection and Tracking, (iii) Feature Extraction and Movement Variability Analysis, and (iv) Calving Time Prediction. A comprehensive breakdown of each component is presented in the subsequent sections.


Fig. 1. Overall Framework Architecture

3.1 Data Collection and Preprocessing

To facilitate a robust analysis, a systematic data collection approach was adopted. Video-equipped devices, namely a 360° surveillance camera with a fisheye view, were placed to capture cattle movement data in a calving pen. Specifically, the data collection process was performed in the real-life environment of the second largest dairy farm in Japan.

3.2 AI Driven Movement Detection and Tracking Process

Our framework leverages advanced AI algorithms for real-time movement detection and tracking. Deep learning techniques, namely Detectron2-based detection and a Customized Tracking Algorithm, were employed to accurately identify and follow cattle movements. The integration of AI significantly enhances the precision and efficiency of movement pattern recognition.

3.2.1 Automatic Cattle Detection

The automated cattle detection facet of our system harnesses the robust and versatile Detectron2 framework [9]. By capitalizing on the power of transfer learning, our custom-tailored Detectron2 detection model refines its performance by fine-tuning pre-trained weights on our cattle-specific dataset. Non-cattle regions, such as those corresponding to humans or tank cars, are filtered out through binary mask area calculations and predefined thresholds; a filtering sketch is given at the end of Sect. 3.2.

3.2.2 Automatic Cattle Tracking

In our study, operating within calving pens housing 4 to 8 cows accentuates the significance of accurate tracking and well-defined trajectories. Our Customized Tracking Algorithm (CTA) strategically addresses these multifaceted tracking hurdles. In essence, the algorithm anticipates tracking challenges, ensuring that missed detections and various occlusion problems are handled adeptly.
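As referenced in Sect. 3.2.1, the following is a minimal sketch of mask-area filtering; the area and score thresholds are hypothetical assumptions, since the paper does not give its values. With Detectron2, the masks and scores would come from outputs["instances"].pred_masks and outputs["instances"].scores.

```python
import numpy as np

# Hypothetical thresholds (pixels / confidence); the paper's values are not given.
MIN_AREA, MAX_AREA = 5_000, 200_000

def filter_cattle_masks(pred_masks, scores, score_thresh=0.7):
    """Keep detections whose binary-mask area falls inside a plausible
    cattle-size range, dropping small or oversized non-cattle regions."""
    keep = []
    for mask, score in zip(pred_masks, scores):
        area = int(np.count_nonzero(mask))  # binary mask area in pixels
        if score >= score_thresh and MIN_AREA <= area <= MAX_AREA:
            keep.append(mask)
    return keep

masks = [np.zeros((720, 1280), dtype=bool) for _ in range(3)]
masks[0][100:400, 200:700] = True  # one plausible cattle-sized region
print(len(filter_cattle_masks(masks, scores=[0.9, 0.8, 0.95])))  # -> 1
```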


3.3 Feature Extraction and Movement Variability Calculation

Our focus lies on capturing and comprehending critical aspects of cattle behavior. This phase encapsulates five key features: Trajectory, Triangle Area, Moving Average, Poincaré, and Entropy. Each of these features holds unique insights into the nuances of cattle movement patterns. Trajectory delineates the path of movement, Triangle Area quantifies motion density, Moving Average presents the average trajectory trend, the Poincaré plot offers a visual narrative of converging paths, and Entropy assesses the complexity of movement behaviors.

3.3.1 Trajectory Calculation

The extraction of crucial feature points constitutes a fundamental step in our analysis, enabling a comprehensive understanding of cattle movement behaviors. Among these key features are the centroid coordinates, denoted by gravity center values. By calculating successive distances and directions between centroid positions across frames, we unlock nuanced insights into the dynamics of cattle movement. Simultaneously, the directions between consecutive centroids illuminate the trajectory of movement, revealing intricate patterns and shifts in motion. These are described in the experimental results section.

3.3.2 Triangle Area Calculation

To derive an additional feature, we first computed centroid points to represent the cow's body. Subsequently, we constructed triangular regions from sets of three consecutive frames using these centroid points. For instance, during regular cow movement, the centroids' triangular arrangement approximates a straight line, resulting in a lower triangular motion area. The triangle area for a centroid triple is given by Eq. (1), and the arrangement is represented in Fig. 2:

    TriangleArea = (1/2) | x_1 (y_2 − y_3) + x_2 (y_3 − y_1) + x_3 (y_1 − y_2) |        (1)

where (x_1, y_1), (x_2, y_2), and (x_3, y_3) are the coordinates of three consecutive centroid points. Cattle detection is conducted over intervals of 7200 frames (equivalent to 1 h), with frame sequences extracted at a rate of 2 frames per second (2 FPS). From the 7200-frame sequences, we obtained a total of 3600 triangular regions. These regions are formed from overlapping frame triples, e.g., frames 1, 2, 3; frames 3, 4, 5; frames 5, 6, 7.
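A minimal sketch of Eq. (1) applied to overlapping centroid triples follows; centroids are assumed to be (x, y) tuples extracted per frame, and the values are illustrative, not the paper's data.

```python
def triangle_area(p1, p2, p3):
    """Shoelace formula (Eq. 1) for the area spanned by three centroids."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return 0.5 * abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))

def triangle_areas(centroids):
    """Overlapping triples of consecutive centroids: (0,1,2), (2,3,4), ..."""
    return [triangle_area(centroids[i], centroids[i + 1], centroids[i + 2])
            for i in range(0, len(centroids) - 2, 2)]

# Near-collinear centroids (steady walking) give a small area.
print(triangle_areas([(0, 0), (1, 0.1), (2, 0.05), (3, 2.5), (4, 0.2)]))
```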

AI Driven Movement Rate Variability Analysis Frame 1

Frame 2

Frame 3

Frame 4

2

1

Frame 5

3

Frame 7

Frame 6

6

4

3

231

5

5

7

Fig. 2. Working Structure for Triangle Area Calculation

for every three consecutive data points. The utilization of a moving average method can effectively filter out anomalies like peaks and valleys, facilitating the identification of trends with clarity. In the context of this study, the moving average method presents distinct advantages, notably its ability to narrow down the detection area ranges. Furthermore, it demonstrates a high capability to identify almost all outlier regions, encompassing transitions, as well as certain normal and abnormal behaviors. By applying this method, we can effectively capture the trend and direction of changes in motion density over time. The moving average is calculated with Eq. 2 and the working structure is shown in Fig. 3. N MovingAverage =

i=1 T area i

N

(2)

where, N = 3 (Interval Value), Tarea = Triangular Area
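A minimal sketch of Eq. (2) over a sequence of triangle areas; the input values are illustrative placeholders.

```python
def moving_average(values, n=3):
    """Average every n consecutive values (Eq. 2 with N = 3)."""
    return [sum(values[i:i + n]) / n for i in range(len(values) - n + 1)]

areas = [0.1, 0.2, 5.0, 0.15, 0.1, 4.8, 0.2]  # spiky triangle areas
print(moving_average(areas))                   # smoothed trend
```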

Fig. 3. Working Structure for Moving Average

3.3.4 Poincaré Calculation

The Poincaré plot incorporates the concept of Euclidean distance to analyze cattle movement behaviors around calving events. It offers insights into how the distance between points changes over time, providing a dynamic representation of cow movement dynamics. It is calculated using Eq. (3):

    Distance = √( (x_i − x_j)² + (y_i − y_j)² )        (3)

where (x_i, y_i) and (x_j, y_j) are the coordinates of the centroid points at times i and j, respectively. In the Poincaré analysis, a distinct phenomenon is observed: the primary cluster of data points becomes denser and more tightly converged as the calving event approaches. The results are shown in the experimental results section.

3.3.5 Entropy Calculation

In our research, entropy serves as a valuable tool to measure the unpredictability or variability of cattle movement patterns around calving events. By calculating entropy, we can gain insights into the diversity and complexity of cow behaviors during the lead-up to calving. The entropy is expressed as Eq. (4). Additionally, we incorporated the exponential distribution in our analysis, expressed as Eq. (5). By applying the exponential distribution, we can estimate the probability of a cow calving within a specific time frame based on historical data (a small sketch follows at the end of this section):

    H(X) = − Σ_{i=1}^{n} P(x_i) log₂ P(x_i)        (4)

    P(calving within t) = 1 − e^(−λt)        (5)

where H(X) represents the entropy, n is the number of events (n = 3), P(x_i) is the probability of the Poincaré distance values, and λ is the rate (inverse scale) parameter.

3.4 Calving Time Prediction Process

An integral component of our framework involves predicting calving times based on the observed movement patterns. Our system integrates movement data and extracts features to forecast impending calving events with a high degree of accuracy.
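As referenced in Sect. 3.3.5, the following is a minimal sketch of Eqs. (4)–(5); the binned probabilities, λ, and t are illustrative assumptions, not the paper's measured values.

```python
import math

def entropy(probs):
    """Shannon entropy, Eq. (4), over n event probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def calving_probability(lam, t):
    """Exponential-distribution calving probability, Eq. (5)."""
    return 1.0 - math.exp(-lam * t)

p = [0.6, 0.3, 0.1]  # n = 3 binned Poincare-distance probabilities (illustrative)
print(f"H(X) = {entropy(p):.3f} bits")
print(f"P(calving within 3 h) = {calving_probability(lam=0.5, t=3):.3f}")
```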

4 Experimental Results

To comprehensively assess the efficacy of our proposed system, we conducted three primary experiments: (1) a cattle detection experiment, (2) a cattle tracking experiment, and (3) an experiment analyzing cattle calving prediction. The experimental data were annotated from calving video images sampled at 1 frame per minute, with 80% used for training and 20% for validation. Detailed dataset information is presented in Table 1.

4.1 Cattle Detection and Evaluation Results

Working through the detailed procedures described in Sect. 3, we can summarize the experimental results of each process. Notably, our system achieved an impressive average detection accuracy of 98.70%. Figure 4 showcases sample detection results.


Table 1. Dataset Information

Dataset      Date                                                                      #Frames   #Instances
Training     Oct, Nov-2021; March, Apr, June, Sept, Oct, Nov-2022; Jan, Feb, Jul-2023   4,445     27,275
Validation   Dec-2021; Jan-2022; Jan, May-2023                                          1,111      5,275

Fig. 4. Sample Detection Results

4.2 Cattle Tracking and Evaluation Results

Figure 5 depicts a densely populated scene in which multiple cows engage in dynamic interactions. We conducted a series of experiments at three distinct time intervals, with each session involving a group of four cows identified as Cow IDs 1, 2, 3, and 4. Furthermore, Fig. 6 provides a visual representation of the trajectories accumulated over a one-hour testing period, captured through the camera's lens. This figure complements our experiments by illustrating the unique paths taken by the four cows. For ease of identification, the trajectories in Fig. 6 are color-coded by the cows' identification numbers: blue lines indicate Cow ID 1, orange lines Cow ID 2, green lines Cow ID 3, and red lines Cow ID 4. Remarkably, our system achieved an impressive average tracking accuracy of 99.98%.

4.3 Calving Prediction and Analysis Results

To comprehensively assess cattle calving prediction, we conducted an in-depth analysis using four key features calculated from the trajectory results: Triangle Area, Moving Average, Poincaré, and Entropy.


Fig. 5. Sample Tracking Results

Fig. 6. Sample Trajectory Results

4.3.1 Trajectory Analysis Results

Given the distinct calving conditions among cows, a comparative analysis between Track ID 2 and Track ID 3 was conducted to understand their calving behavior. Figure 7 shows sample results of the trajectory testing analysis, both observed within the 12-h period prior to calving time.

Fig. 7. Sample Trajectory Results for IDs 2 and 3


4.3.2 Triangle Area Analysis Results

Figure 8 presents sample outcomes of the triangular area testing analysis. Using the Triangle Area data, we analyzed the number of values within the designated green box that surpass the assigned threshold; a threshold value of 100 is denoted by the green line. The area subsequently reduces and stabilizes before the 2-h mark. The red shape serves as a visual indicator that communicates fluctuations in movement intensity. Our investigation of ID 2 and ID 3 behavior reveals the same pattern: increased, decreased, decreased, increased, decreased, decreased.

Fig. 8. Sample Triangular Area Results for IDs 2 and 3

4.3.3 Moving Average Analysis Results

Figure 9 presents sample results of the moving average testing analysis. Using the Moving Average data, we analyzed the number of values within the designated green box that surpass the assigned threshold. Our investigation of ID 2 and ID 3 behavior reveals the same pattern: increased, decreased, decreased, increased, decreased, decreased.

Fig. 9. Sample Moving Average Results for IDs 2 and 3


4.3.4 Poincaré Analysis Results

Figure 10 presents sample outcomes of the Poincaré testing analysis. In the Poincaré analysis, the primary cluster of data points shifts towards higher density and tighter convergence as the anticipated calving event draws nearer.

Fig. 10. Sample Poincaré Results for IDs 2 and 3

4.3.5 Entropy Analysis Results

In Fig. 11, we delve into the analysis of calving time prediction for the cows. In this phase of the analysis, we introduced a calibrated calving threshold of 0.998. The red points denote cows that are anticipated to calve within the subsequent 3 h; the visualization is provided in Fig. 11.

Fig. 11. Prediction Calving Time Results for IDs 2 and 3

5 Discussions and Conclusions

Our study focused on cattle trajectory analysis using five key features: trajectory, triangle area, moving average, Poincaré, and entropy. Through the examination of these features, we conducted tests on three calving cows within the 12-h period before calving. The results underscore our system's efficacy in predicting calving time. Specifically, our research demonstrated the potential to automatically predict calving time within three hours before the event.


What sets our study apart is its real-time applicability, seamlessly integrating into live calving farm operations. This automated prediction capability not only enhances animal care but also improves overall farm productivity by enabling timely interventions. Furthermore, our system's adaptability suggests its potential for broader applications in animal behavior analysis, marking a significant advancement in animal management practices. In conclusion, our study's success in predicting calving time and its real-time utility mark substantial contributions to animal management. By bridging innovative technology with practical farming needs, our system holds the promise of transforming animal care and behavior analysis, making strides towards more efficient and informed agricultural practices.

Acknowledgements. This publication is subsidized by JKA through its promotion funds from KEIRIN RACE.

References

1. Zin, T.T., Maung, S.Z.M., Tin, P., Horii, Y.: Feature detection and analysis of cow motion classification for predicting calving time. Int. J. Biomed. Soft Comput. Hum. Sci. 26(1), 11–20 (2021)
2. Sumi, K., Maw, S.Z., Zin, T.T., Tin, P., Kobayashi, I., Horii, Y.: Activity-integrated hidden Markov model to predict calving time. Animals 11(2), 385 (2021)
3. Mg, W.H.E., Zin, T.T.: Cattle face detection with ear tags using YOLOv5 model. ICIC Exp. Lett. Part B: Appl. 14(01), 65 (2023)
4. Maw, S.Z., Zin, T.T., Tin, P., Kobayashi, I., Horii, Y.: An absorbing Markov chain model to predict dairy cow calving time. Sensors 21(19), 6490 (2021)
5. Neethirajan, S.: The role of sensors, big data and machine learning in modern animal farming. Sens. Bio-Sens. Res. 29, 100367 (2020)
6. Cangar, Ö., et al.: Automatic real-time monitoring of locomotion and posture behaviour of pregnant cows prior to calving using online image analysis. Comput. Electron. Agric. 64(1), 53–60 (2008)
7. Mee, J.F.: Managing the dairy cow at calving time. Vet. Clin. Food Anim. Pract. 20(3), 521–546 (2004)
8. Koyama, K., et al.: Prediction of calving time in Holstein dairy cows by monitoring the ventral tail base surface temperature. Vet. J. 240, 1–5 (2018)
9. Wang, Y., Xu, X., Wang, Z., Li, R., Hua, Z., Song, H.: ShuffleNet-Triplet: a lightweight re-identification network for dairy cows in natural scenes. Comput. Electron. Agric. 205, 107632 (2023)

Channel-Wise Pruning via Learn Gates&BN for Object Detection Tasks

Min-Xiang Chen, Po-Han Chen, and Chia-Chi Tsai(B)

Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan
{n28111089,cctsai}@gs.ncku.edu.tw

Abstract. Network pruning is an important research area aimed at addressing the high computational costs of deep neural networks. Previous studies [1] have indicated that it is not necessary to follow the conventional pruning process of training a large, redundant network, but rather, more diverse and higher-performing potential models can be directly pruned from randomly initialized weights. However, our experimental results on the MS COCO 2017 dataset [2] demonstrate that this approach is not applicable when compressing object detection models. We believe this outcome is associated with the complex network structures involved in object detection, making it relatively challenging to explore pruned architectures from random weights. To address this issue, we improved the existing Learn gates method and incorporated Batch normalization [3] to jointly learn channel importance. This enhances the learning capability of channel importance in a shorter time frame and facilitates the exploration of suitable pruned network structures within pre-trained weights. When applying our network pruning method to object detection models YOLOv3 [4] and YOLOv4 [5], our approach achieves higher accuracy with only a brief period of network structure learning. Keywords: Network Pruning · Object Detection · machine learning · computer vision

1 Introduction

In recent years, with the rapid development of Convolutional Neural Networks (CNNs), fields such as computer vision and natural language processing have undergone revolutionary changes. CNNs are renowned for their exceptional capability in handling complex tasks and large-scale datasets. However, the effectiveness of these state-of-the-art deep learning models comes at the cost of significant scale and computational complexity, making their deployment in resource-constrained devices or real-time applications challenging. As a result, researchers are actively exploring innovative techniques for model optimization and efficiency improvement. Network pruning has emerged as an effective approach to address the computational burden associated with large neural networks. By removing redundant or less critical components, such as neurons, channels, or layers, from the neural architecture, network pruning can significantly reduce the number of parameters and operations in the model, thus accelerating the inference process.


The conventional pruning process involves three steps: pre-training, pruning, and fine-tuning. However, the structure of pruned models often needs to be explored from a fully trained model, which consumes a considerable amount of time for weight training. Although the Pruning from scratch method [1] suggests that reliable pruned structures can be found from random weights, our research has found that for complex original network structures, particularly for object detection tasks (such as YOLOv3 and YOLOv4), potential pruned network structures cannot be easily discovered from random weights. We believe that the weights of these complex networks possess specific characteristics that make direct pruning from random weights unsuitable. To overcome this challenge, we propose the Learn gates & BN pruning method, which fully utilizes the Learn gates [1] and batch normalization mechanisms [3], enabling us to rapidly explore the importance of channels in the pre-trained network. With the Learn gates & BN pruning method, we do not need to update the weights of convolutional layers, significantly speeding up the pruning process while maintaining excellent performance even at high pruning rates. Experimental results demonstrate that our Learn gates & BN pruning method performs exceptionally well in object detection tasks, such as YOLOv3 [4] and YOLOv4 [5] pruning experiments on the MS COCO 2017 dataset. Compared to the Pruning from scratch method, our approach can identify superior pruning structures and maintain excellent performance even at high pruning rates.

2 Related Work

In order to deploy models in real-world applications, latency, throughput, and efficiency present a challenge. However, model compression algorithms can transform a large, complex model into a smaller, cleaner version while maintaining the model's precision. This significantly reduces the hardware's storage, bandwidth, and computational requirements, thereby accelerating model inference and facilitating its implementation. Common model compression methods can be divided into several categories. (1) Weight quantization [6] is mainly used to simulate the behavior of the model on the hardware. Typically, models are represented by 32-bit floating-point numbers. However, studies have shown that a model can achieve good results without requiring that much precision. Therefore, most quantization methods lean towards INT8 quantization [7, 8], and there are even more extreme methods that use 1-bit quantization [9–11]. With the right hardware design, energy-consuming multiplication and division operations can be replaced with faster and more energy-efficient shift operations. (2) Knowledge distillation [12] is the process of extracting and distilling the knowledge contained in a well-trained model into another model. The overall concept follows a teacher-student scheme, where the teacher model (the complex model) "distills" its knowledge into the student model (the simplified model). (3) Model pruning [13–16] is distinct from the previous methods: rather than quantizing model parameters or distilling a simplified model, it directly eliminates some of them. The principle of pruning involves eliminating 'unimportant' weights within the model to reduce the number of parameters and computations, all while striving to maintain the model's accuracy. These three methods complement each other in the model compression process.


The following mainly introduces model pruning and related techniques, including pruning methods, the granularity of pruning, and the targets of pruning.

Pruning Methods. We divide pruning methods into two categories: (1) multi-shot pruning [13, 17] and (2) one-shot pruning [15, 16]. Multi-shot pruning, also known as iterative pruning, breaks the overall pruning goal down into several smaller objectives. For instance, to achieve a 70% pruning rate, the pruning process might be divided into 10 steps of 7% each. Typically, iterative training is performed after each pruning step to ensure that the model's accuracy does not decrease significantly. One-shot pruning [15, 16], in contrast, aims to complete the pruning in a single step.

Granularity of Pruning. Based on the granularity of pruning, methods are divided into weight pruning [13] and filter pruning [16, 18]. Weight pruning can lead to irregular arrangements of weights, necessitating additional methods to restore regularity and requiring special hardware support. In contrast, filter pruning allows the weights to maintain their regularity after pruning, simplifying the process of mapping onto hardware for execution. As a result, filter pruning is the more commonly adopted granularity.

The Targets of Pruning. Depending on the target of pruning, methods can be divided into pruning with a pre-trained model and pruning from scratch [1]. Pruning with a pre-trained model prunes models that have already been trained. This approach is based on the lottery ticket hypothesis [17], which suggests that the weights have a significant influence on the pruned network, and therefore a well-trained model is required to complete downstream tasks. On the other hand, pruning from scratch (without a pre-trained model) [1] argues that the network structure is more critical for downstream tasks, and therefore achieving pruning through pre-trained weights is not the primary consideration.

Beyond the pruning process and methods, the most important aspect is determining whether an object should be pruned. Taking filter pruning as an example, this can primarily be divided into (1) offline pruning [13, 18] and (2) online pruning [19, 20]. Offline pruning mainly judges the importance of a filter from the filter data itself; the main evaluation methods include the norm, standard deviation, entropy, geometric median, average percentage of zero activations, etc. A representative norm-based method is ThiNet [16]. Online pruning, on the other hand, mainly judges importance based on the scaling factor of the batch normalization layer; a representative method is Network Slimming [18]. In recent years, many methods have attempted to integrate pruning into the training period to speed up producing the final model. Among these, soft filter pruning [21] and dynamic pruning [22] are the most popular. In the broad semantics of the machine learning field, many methods can be combined with pruning. In pruning with supervised learning, annotating data is a time-consuming task; therefore, derivative methods like pruning with self-supervised labels [23, 24] have been used to mitigate the cost of data labeling. For privacy-sensitive data, there is also pruning with federated learning [25, 26]. For different applications, in addition


to accuracy requirements, other measures such as recall rate may be demanded. In summary, pruning is not limited to model compression; appropriate strategies must be selected in conjunction with the application to achieve greater benefit.

(a) Gates in the convolutional layer: Conv → Batch normalization → Gates → Mish. (b) Gates in the residual layer.

Fig. 1. We incorporate a Gates Layer into each convolutional layer. However, to circumvent the dimension mismatch problem, we refrain from adding a Gates layer within the residual layer.

3 Method

3.1 Pruning from Scratch Method

Existing one-shot pruning methods [15, 16] require a considerable amount of time for sparse training, with the aim of training a sparse yet redundant network; unimportant channels are then eliminated through model pruning. The downside of this approach is the large amount of time spent training the sparse network. For more efficient network compression, [1] proposed the "Pruning from scratch" method. This method allows more diverse pruned structures, including potentially better-performing models, to be derived directly from a model with randomly initialized weights. Unlike existing one-shot pruning methods [15, 16], by introducing gates into the channels to determine channel importance, pruning can start from the initial weights. This not only significantly reduces the pre-training burden of traditional pruning methods, but also allows the discovery of more diverse pruned structures. As shown in Fig. 1(a), we add a Gates layer after the BN layer; the gates are added to learn the importance of channels under the random weights. As shown in Fig. 1(b), Gates parameters can be added to all hidden convolution layers except the last convolution layer in the Residual block. To prevent a dimension mismatch between the shortcut layer and the convolution layer within the Residual block after pruning, no gates are added to the final layer in the Residual block. A minimal sketch of such a gated block is given below.
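The following is a minimal PyTorch sketch of the gated block in Fig. 1(a); layer sizes and gate initialization are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class GatedConvBlock(nn.Module):
    """Conv + BN + per-channel gates + Mish, as in Fig. 1(a).
    Channels whose gate value falls below a global threshold are pruned."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        # One learnable gate per output channel, initialized to 1 (fully open).
        self.gates = nn.Parameter(torch.ones(out_ch))
        self.act = nn.Mish()

    def forward(self, x):
        x = self.bn(self.conv(x))
        x = x * self.gates.view(1, -1, 1, 1)  # scale each channel by its gate
        return self.act(x)

block = GatedConvBlock(3, 16)
y = block(torch.randn(1, 3, 32, 32))
print(y.shape)  # torch.Size([1, 16, 32, 32])
```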


between of the shortcut layer and the convolution layer within the Residual block after pruning, no gates were added to the final layer in the Residual block. (a)Pruning from scratch Structure Learning Random weight model

Learn gates

Model pruning

Pruned structure

Train from scratch

Compact model

Pruned structure

Fine-tune

Compact model

(b)Learn gate & BN procedure Structure Learning Pre-trained model

Learn gates&BN

Model pruning

Fig. 2. Different from Pruning from scratch (a), we use a pre-trained model as input. During training, we freeze the model weights and only train Gates and Batch Normalization (BN) weights. After the model is pruned, we utilize the pre-trained weights for fine-tuning.

Figure 2(a) displays the process of "Pruning from scratch," which consists of two stages: Structure Learning and Weight Optimization. In the Structure Learning stage, all parameters excluding the Gates are frozen, and sparse training is conducted on the Gates for ten epochs to probe the importance distribution of channels under randomly initialized weights. Subsequently, pruning is performed based on the values of the Gates: if the value of a Gate falls below a preset threshold, the corresponding channel is eliminated. However, according to our experimental findings, when "Pruning from scratch" is applied to an object detection model, its precision is lower than that of existing one-shot pruning methods such as the S&T method [27]. To address this problem, we propose a method called "Learn Gates & BN". Its goal is to counter the limitation of Pruning from scratch, which relies exclusively on Gates to ascertain the importance of channels in a network with randomly initialized weights. By adopting this method, we aim to gain better control over the model's sparsity and enhance the accuracy of the object detection model.

3.2 Learn Gates & Batch Normalization

As stated in Sect. 3.1, "Learn Gates & BN" aims to address the lower precision observed when applying "Pruning from scratch" to an object detection model. Relative to the "Pruning from scratch" method, we propose the following two changes.

Learn Gates and Batch Normalization. The primary purpose of adding Batch Normalization layers to the network architecture is to normalize the data in each training iteration, thereby enhancing the stability of training, promoting faster convergence, and mitigating the vanishing gradient problem.


However, the original Pruning from scratch method performs sparse training for only a few epochs during the Learn gates process, and Batch Normalization cannot take full effect within so few training iterations, which may cause instability in gradient updates. Consequently, relying solely on Gates may not accurately determine the importance distribution of the channels. To resolve this issue, in the process of learning the channel importance distribution, we not only use the original Gates but also include the parameters of the batch normalization layer in the updates. Through this approach, we can better control the importance of channels and improve the accuracy of the Pruning from scratch method. In this way, during the training process, the information of both the Gates and Batch Normalization is taken into account to more effectively determine the importance distribution of each channel, improving the overall performance of the model.

Learn from a Pre-trained Model. The Learn Gates phase in the "Pruning from scratch" method determines the pruned network structure based on random weights. However, the absolute randomness of the initial weights might hinder the Gates' capability to discover the optimal pruned network structure: the Gates only find a pruned structure suitable for the given random weights. Therefore, even comprehensive training under this pruned network structure fails to attain the best precision. To address this issue, we introduce pre-trained weights as the initial weights during the Learn Gates and Batch Normalization phase. Compared to random weights, the pre-trained model represents the best outcome of the unpruned model and can accurately guide the Gates in determining the importance of channels in the correct direction.

As depicted in Fig. 2(b), "Learn gates & BN" also consists of two steps: Structure Learning and Weight Optimization. In the Structure Learning phase, pre-trained weights are loaded into the initial network. The parameters within the convolution layers are frozen, and only the Gates and Batch Normalization layers are trained for 10 epochs, allowing the Gates to explore the distribution of channel importance under the pre-trained weights through sparsity. The network model derived from Learn Gates & Batch Normalization is then pruned. Smaller Gate values imply that the corresponding channel is less important: if a Gate value is lower than the preset global threshold, the corresponding channel is removed. This results in a compact network model with a drop from the original accuracy. In the Weight Optimization phase, training is carried out using the network architecture derived from pruning, with the network model from the pruning phase serving as the initial weights. As in the aforementioned Pruning from scratch method, the Gates are used only to learn and determine the importance of the corresponding channels, so redundant Gates are not required in the retraining process. A sketch of the global-threshold channel selection follows.
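The following is a minimal sketch of global-threshold channel selection as described above; choosing the threshold from a target keep ratio is an illustrative assumption, since the paper only states that a preset global threshold is used.

```python
import torch

def select_channels(gates, keep_ratio=0.5):
    """Pick a global threshold from all gate magnitudes and return, per layer,
    a boolean mask of channels to keep (gate at or above the threshold)."""
    all_gates = torch.cat([g.abs().flatten() for g in gates])
    k = int(len(all_gates) * (1 - keep_ratio))
    threshold = torch.sort(all_gates).values[k]  # global threshold
    return [g.abs() >= threshold for g in gates]

# Illustrative gate vectors for two layers after structure learning.
layer_gates = [torch.tensor([0.9, 0.01, 0.5, 0.02]), torch.tensor([0.7, 0.03])]
masks = select_channels(layer_gates, keep_ratio=0.5)
print(masks)  # [tensor([True, False, True, False]), tensor([True, False])]
```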

4 Experiment

4.1 Network Model

YOLO. The YOLO (You Only Look Once) series [5, 28–30] is a model architecture for object detection. When deployed on edge devices and other resource-constrained hardware, it requires a streamlined model structure to meet real-time requirements. We applied our method to YOLOv3 [4] and YOLOv4 [5]. As shown in Fig. 1, the positions of the gate


layers are indicated. We placed the gate layers after the batch normalization layers in both the YOLOv4 and YOLOv3 network structures. However, we did not add gate layers before shortcut connections, to keep the dimensions consistent. We used multi-scale image sizes for channel importance learning and fine-tuning, while the testing stage used an image size of 416 × 416.

4.2 Datasets

MS COCO 2017. The Microsoft COCO (Common Objects in Context) dataset consists of high-resolution images from real-world scenes covering 80 object categories. The images carry detailed annotations, including bounding box positions, object class labels, and, where applicable, object keypoint positions. We trained our method on the 118,287 images of the train2017 set. To evaluate our approach, we compared it with the S&T method [27] on the 5,000 validation images of val2017 and the 20,288 test images of test-dev2017 for object detection.

4.3 Learning Gates&BN and Fine-Tuning

Learning Gates&BN. During channel importance learning, we employed the SGD optimizer with a batch size of 8, an initial learning rate of 2.324 × 10⁻², and a total of 10 epochs. At 70% and 90% of the training epochs, the learning rate was reduced to one-tenth of its value.

Fine-Tuning. For pruned models with a MACs drop ratio below 50%, the initial learning rate was set to 2.324 × 10⁻³. We fine-tuned with a batch size of 8 and the ReduceLROnPlateau learning rate scheduler: whenever the training loss failed to decrease for 6 epochs, the learning rate was reduced to one-tenth of its previous value, down to a floor of 2.324 × 10⁻⁶. For pruned models with a MACs drop ratio above 50%, we employed the cosine learning rate scheduler [31] with an initial learning rate of 2.324 × 10⁻² and trained for 300 epochs with a batch size of 16.
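As a rough illustration of the two fine-tuning schedules described above (the paper gives no code, so the model below is a stand-in), the corresponding standard PyTorch schedulers could be configured as follows:

import torch
import torch.nn as nn

model = nn.Conv2d(3, 16, 3)  # placeholder for the pruned YOLO network

# MACs drop ratio < 50%: start at 2.324e-3 and divide the learning rate by 10
# whenever the training loss has not decreased for 6 epochs, down to 2.324e-6
opt_low = torch.optim.SGD(model.parameters(), lr=2.324e-3, momentum=0.9)
plateau = torch.optim.lr_scheduler.ReduceLROnPlateau(
    opt_low, mode="min", factor=0.1, patience=6, min_lr=2.324e-6)

# MACs drop ratio > 50%: cosine annealing from 2.324e-2 over 300 epochs
opt_high = torch.optim.SGD(model.parameters(), lr=2.324e-2, momentum=0.9)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(opt_high, T_max=300)

for epoch in range(300):
    train_loss = 1.0 / (epoch + 1)  # dummy value; use the real epoch loss here
    plateau.step(train_loss)        # the plateau scheduler monitors the loss
    cosine.step()                   # the cosine scheduler steps once per epoch

Only one of the two schedulers would be used for a given pruned model, depending on its MACs drop ratio.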

4.4 Experiment Results

In the experiments on YOLOv3 (Table 1), our method and S&T [27] yielded similar results at 16% and 28.6% MACs reduction, indicating comparable performance at lower pruning rates. However, at a more aggressive MACs reduction of 61.7%, our method achieved an AP50 of 51%, while S&T reached only 40.6%, an improvement of 10.4 points. Our method thus clearly outperforms S&T at higher pruning rates, demonstrating superior performance under extensive pruning. In the experiments on YOLOv4 (Table 2), at the same low pruning rate (16% MACs reduction) the results of our method and S&T are again quite similar, differing by only 0.2% AP50 on the COCO test-dev2017 dataset. However, at the high pruning rate (65.8% MACs reduction), our method achieves an


Table 1. The comparison of our method with S&T and Pruning from Scratch (PFS) in YOLOv3.

Method             | MACs ↓ (%) | MACs   | Params ↓ (%) | Params | val AP50
YOLOv3 (unpruned)  | –          | 30.16G | –            | 64.36M | 60.1%
Ours               | 16.08%     | 25.31G | 9.28%        | 58.39M | 59.1%
S&T                | 16.09%     | 25.31G | 32.21%       | 43.63M | 59.8%
Ours               | 28.64%     | 21.52G | 19.06%       | 52.11M | 57.6%
S&T                | 28.67%     | 21.54G | 50.62%       | 31.79M | 58.8%
PFS                | 29.98%     | 23.14G | 26.04%       | 45.81M | 41.6%
Ours               | 61.73%     | 12.64G | 63.37%       | 22.69M | 51%
S&T                | 61.75%     | 12.64G | 68.68%       | 19.40M | 40.6%

Table 2. The comparison of our method with S&T in YOLOv4.

Method             | MACs ↓ (%) | MACs   | Params ↓ (%) | Params | test-dev AP50
YOLOv4 (unpruned)  | –          | 30.16G | –            | 64.36M | 60.1%
Ours               | 16.08%     | 25.31G | 9.28%        | 58.39M | 59.1%
S&T                | 16.09%     | 25.31G | 32.21%       | 43.63M | 59.8%
Ours               | 28.64%     | 21.52G | 19.06%       | 52.11M | 57.6%
S&T                | 28.67%     | 21.54G | 50.62%       | 31.79M | 58.8%
Ours               | 38.29%     | 18.61G | 27.86%       | 46.45M | 57.4%
S&T                | 38.31%     | 18.60G | 61.18%       | 24.98M | 57.3%
Ours               | 50.64%     | 14.86G | 40.95%       | 38.01M | 55.4%
S&T                | 50.61%     | 14.9G  | 73.52%       | 17.05M | 52.5%
Ours               | 65.80%     | 10.31G | 58.58%       | 26.67M | 52.2%
S&T                | 65.86%     | 10.29G | 85.9%        | 9.07M  | 38.9%

AP50 of 52.2%, while S&T achieves only 38.9%, an improvement of 13.3 points over S&T. Notably, our method performs as well here as in the YOLOv3 setting at high pruning rates, again outperforming S&T.

4.5 Analysis

Based on an analysis of the network architecture, we investigated how the distribution of preserved filters across convolution layers affects accuracy. As depicted in Fig. 3, convolution layer indices 1–52 correspond to the backbone (Darknet53) of YOLOv3, while indices 53–75 correspond to the FPN [32] (Feature Pyramid Network). Our method tends to retain more filters in the backbone, whereas S&T preserves more filters in the FPN. Despite the significant difference in network pruning


structures between our method and S&T, both approaches can achieve high accuracy with comparable performance [1] in diverse network architectures.


Fig. 3. Our method (left figure) and S&T (right figure) in YOLOv3, both pruned at ratios of 65.8% and 65.86%, respectively, depict the percentage of preserved filters in each convolution layer. The X-axis of the graphs represents the index of the convolution layer, while the Y-axis denotes the percentage of retained filters compared to the original number of filters.


In high pruning scenarios, as depicted in Fig. 4, our method exhibits a more evenly distributed preservation percentage across the convolution layers, whereas S&T shows an uneven distribution. S&T retains less than 10% of the original filters in the later part of the backbone, resulting in a substantial loss of complex image features and high-level semantic learning capability. Conversely, our method retains a considerable number of filters in the deeper layers of the network, which prevents a significant drop in AP50 performance at high pruning rates.


Fig. 4. Our method (left figure) and S&T (right figure) in YOLOv3, both pruned at ratios of 16.08% and 16.09%, respectively, illustrate the percentage of preserved filters in each convolution layer. The X-axis of the graphs represents the index of the convolution layer, while the Y-axis denotes the percentage of retained filters compared to the original number of filters.


5 Discussion and Conclusion

Through our research and experiments, we have explored approaches for network pruning in Convolutional Neural Networks (CNNs) and proposed the Learn Gates&BN pruning method to handle complex network architectures and challenging tasks. Our main contributions can be summarized as follows.

Firstly, we identified limitations of traditional Pruning from scratch methods when dealing with complex original network structures, especially in object detection tasks, where they fail to identify suitable pruning structures. This observation prompted us to explore pruning methods better suited to these specialized tasks.

Secondly, we introduced the Learn Gates&BN pruning method, which combines gates with the batch normalization mechanism and allows the importance of network channels to be assessed rapidly. Because this approach does not require updating the convolution layer weights, it significantly speeds up the pruning process while still achieving excellent results even at high pruning rates.

Lastly, we conducted pruning experiments on complex network architectures, YOLOv3 and YOLOv4. The results demonstrate that the Learn Gates&BN pruning method achieves outstanding performance on the MS COCO 2017 dataset, further validating the effectiveness and applicability of the proposed method.

References

1. Wang, Y., et al.: Pruning from scratch. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 07, pp. 12273–12280 (2020)
2. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
3. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, PMLR, pp. 448–456 (2015)
4. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
5. Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
6. Han, S., Mao, H., Dally, W.J.: Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint arXiv:1510.00149 (2015)
7. Krishnamoorthi, R.: Quantizing deep convolutional networks for efficient inference: a whitepaper. arXiv preprint arXiv:1806.08342 (2018)
8. Banner, R., Hubara, I., Hoffer, E., Soudry, D.: Scalable methods for 8-bit training of neural networks. In: Advances in Neural Information Processing Systems, vol. 31 (2018)
9. Courbariaux, M., Bengio, Y., David, J.-P.: BinaryConnect: training deep neural networks with binary weights during propagations. In: Advances in Neural Information Processing Systems, vol. 28 (2015)
10. Courbariaux, M., Hubara, I., Soudry, D., El-Yaniv, R., Bengio, Y.: Binarized neural networks: training deep neural networks with weights and activations constrained to +1 or -1. arXiv preprint arXiv:1602.02830 (2016)


11. Rastegari, M., Ordonez, V., Redmon, J., Farhadi, A.: XNOR-Net: ImageNet classification using binary convolutional neural networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 525–542. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_32
12. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
13. Han, S., Pool, J., Tran, J., Dally, W.: Learning both weights and connections for efficient neural network. In: Advances in Neural Information Processing Systems, vol. 28 (2015)
14. Molchanov, P., Tyree, S., Karras, T., Aila, T., Kautz, J.: Pruning convolutional neural networks for resource efficient inference. arXiv preprint arXiv:1611.06440 (2016)
15. Li, H., Kadav, A., Durdanovic, I., Samet, H., Graf, H.P.: Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710 (2016)
16. Luo, J.-H., Wu, J., Lin, W.: ThiNet: a filter level pruning method for deep neural network compression. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5058–5066 (2017)
17. Frankle, J., Carbin, M.: The lottery ticket hypothesis: finding sparse, trainable neural networks. In: International Conference on Learning Representations (2018)
18. Liu, Z., Li, J., Shen, Z., Huang, G., Yan, S., Zhang, C.: Learning efficient convolutional networks through network slimming. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2736–2744 (2017)
19. Guo, Y., Yao, A., Chen, Y.: Dynamic network surgery for efficient DNNs. In: Advances in Neural Information Processing Systems, vol. 29 (2016)
20. Gao, X., Zhao, Y., Dudziak, Ł., Mullins, R., Xu, C.-Z.: Dynamic channel pruning: feature boosting and suppression. arXiv preprint arXiv:1810.05331 (2018)
21. He, Y., Kang, G., Dong, X., Fu, Y., Yang, Y.: Soft filter pruning for accelerating deep convolutional neural networks. arXiv preprint arXiv:1808.06866 (2018)
22. Lin, T., Stich, S.U., Barba, L., Dmitriev, D., Jaggi, M.: Dynamic model pruning with feedback. arXiv preprint arXiv:2006.07253 (2020)
23. Gidaris, S., Singh, P., Komodakis, N.: Unsupervised representation learning by predicting image rotations. arXiv preprint arXiv:1803.07728 (2018)
24. Noroozi, M., Favaro, P.: Unsupervised learning of visual representations by solving jigsaw puzzles. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910, pp. 69–84. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46466-4_5
25. Konečný, J., McMahan, H.B., Yu, F.X., Richtárik, P., Suresh, A.T., Bacon, D.: Federated learning: strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492 (2016)
26. Hard, A., et al.: Federated learning for mobile keyboard prediction. arXiv preprint arXiv:1811.03604 (2018)
27. 張恩誌: Design method and cases of deep learning models with multiple hybrid pruning optimization. Master's thesis, Institute of Electronics, National Yang Ming Chiao Tung University, Hsinchu (2021). https://hdl.handle.net/11296/eagsgv
28. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
29. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017)
30. Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7464–7475 (2023)


31. Loshchilov, I., Hutter, F.: SGDR: stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983 (2016)
32. Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)

Prediction of Sepsis Mortality Risk Based on Ensemble Learning Algorithm FBTV

Xuan Zhang, Lixin Huang, Teng Fu, and Yujia Wu(B)

School of Information Science and Technology, Shanghai Sanda University, Shanghai 201209, China
[email protected]

Abstract. Sepsis is a serious complication in ICU patients and one of the most common causes of death. Working with the data of ICU sepsis patients in the MIMIC-IV database, we propose an ensemble learning algorithm with a voting mechanism, FBTV, to predict mortality risk six hours in advance. First, we extract 15,558 sepsis records from the MIMIC-IV database and select key features through de-duplication, filtering, padding, and outlier processing. Principal component analysis then yields 31 key features, and 2,281 records that meet the requirements form the study dataset. Next, an ensemble learning model named FBTV is built on a voting mechanism over Random Forest, XGBoost, and Decision Tree. Finally, accuracy, recall, F1 score, and AUC are used to evaluate the model. FBTV achieves an accuracy of 0.956, compared with 0.927 for the best single algorithm, Decision Tree. This work can assist clinicians in making decisions and adjusting treatment plans to reduce the occurrence of adverse outcomes.

Keywords: Sepsis · Mortality Risk Prediction · Ensemble Learning · Machine Learning

1 Introduction

Sepsis is a life-threatening organ dysfunction caused by a dysregulated host response to infection, as defined by the Third International Consensus [1]. Sepsis and septic shock are major healthcare problems, affecting millions of people around the world each year and killing between one in three and one in six of those affected [2–4]. The clinical manifestations of sepsis mainly include fever (or hypothermia), chills, palpitations, shortness of breath, and altered consciousness [5]. They are difficult to detect because they resemble those of common conditions such as hypotension and fever, which poses a great challenge to clinical management [6]. Machine learning is used to uncover unknown information by learning from known information, converting data into useful knowledge that supports scientific analysis and decision making. Predicting the condition of ICU patients and providing early warning of mortality risk with machine learning and data mining methods


has been a difficult and active research topic in recent years. Analyzing the massive physiological data generated by ICU patients during treatment and extracting the valuable information they contain is the key to grasping a patient's status and potential dangers. In ICU sepsis research, studies abroad started earlier but have focused on medical evidence and clinical validation. Since 2004, the European Society of Intensive Care Medicine (ESICM) and the Society of Critical Care Medicine (SCCM) have jointly published guidelines for the management of sepsis, now in their fifth edition: "Surviving sepsis campaign (SSC): international guidelines for the treatment of sepsis and septic shock, 2021 edition" [7]. A meta-analysis also showed that the systemic inflammatory response syndrome (SIRS) diagnostic criteria were superior to qSOFA (quick SOFA) for diagnosing sepsis, while qSOFA was a better predictor of in-hospital mortality [8]. However, the assessment accuracy of traditional medical scoring systems is insufficient for precision medicine, which is why more studies use machine learning methods to assist diagnosis. Richard Andrew Taylor et al. of Yale University used random forests to predict the risk of death within 28 days of hospitalization in sepsis patients based on data from four New York hospitals, and showed that random forests outperformed traditional computational methods and medical scoring systems [9]. Pirracchio et al. proposed the Super ICU Learner Algorithm, integrating 12 algorithms over 17 selected variables to predict patient mortality within 24 h [10]. Bu Xiaoxuan proposed a model combining a FUZZY ARTMAP neural network and a decision tree for predicting the death risk of ICU patients using ICU time-series data provided by the MIT Computational Physiology Laboratory, and comparisons with other machine learning methods confirmed its effectiveness [11]. Tang et al. summarized and compared the performance of MLP/RF/GB/LSTM/CNN-LSTM models combining different features on mortality prediction, diagnosis coding, disease detection, and readmission rate; their combined method achieved 0.94 on mortality prediction [12]. Lin Ke et al. conducted prediction studies based on the MIMIC-III database using random forest and XGBoost algorithms for the risk of in-hospital mortality in ICU patients and sepsis patients, respectively, and the results exceeded the SAPS-II model [13]. Domestic sepsis research started relatively late in both the clinical and machine learning fields; in particular, machine learning has been applied to few studies of sepsis-assisted diagnosis, and most studies still focus on clinical trials conducted by medical personnel. Early diagnosis and effective treatment of patients who may develop sepsis is the key to reducing the risk of death. This study focuses on sepsis mortality risk prediction for ICU patients in the MIMIC-IV database using machine learning methods. We build an ensemble learning algorithm, FBTV, based on a voting mechanism over 8 baseline algorithms, intended to help healthcare providers identify septic patients at risk of death as early as possible.


2 Related Work

2.1 MIMIC-IV Database Introduction

MIMIC (Medical Information Mart for Intensive Care) is a large, freely available database comprising de-identified health-related data from patients who were admitted to the critical care units of the Beth Israel Deaconess Medical Center. MIMIC-IV contains data from 2008–2019, collected from Metavision bedside monitors. The patient-related medical information in the MIMIC database is retrospective and does not affect the actual clinical treatment or disease prognosis of the patients. The MIMIC-IV database has erased all patient privacy tags to ensure the security of patient information; the database is therefore exempt from informed consent of the patients concerned, as determined by the ethics committee. The authors obtained permission to download and use the database by passing the human ethics test required by the database (Record ID: 55359005). MIMIC-IV is grouped into three modules: core, hosp, and icu, organized by intended use and provenance. The core module stores the patient tracking information necessary for any data analysis using MIMIC-IV. The hosp module contains data derived from the hospital-wide EHR, and the icu module contains data sourced from the clinical information system at the BIDMC.

2.2 Ensemble Learning

Ensemble learning is a general term for a class of algorithms. The main idea is to train several different individual learners (also called base learners or weak learners) and combine their strengths through a particular strategy (e.g., averaging, voting, or learning) to obtain better predictions. Ensemble learning yields good predictions on datasets of any size and is one of the most popular families of machine learning algorithms today. Ensemble learning algorithms can be broadly classified into three categories: Bagging, Boosting, and Stacking, of which the first two are more commonly used. Random Forest (RF) is the representative Bagging algorithm, while XGBoost, AdaBoost, and GBDT are representative Boosting algorithms. In Bagging (Bootstrap Aggregating), m new training datasets are randomly sampled with replacement from the training data. Each sampled set is used to train a weak classification model, and the m weak models are combined by a certain strategy into the final strong classifier. The common combination strategies are averaging and voting. Averaging is usually used for numerical regression problems, where the final prediction is the arithmetic or weighted average of the outputs of the weak learners. Voting is mainly used for classification problems and generally follows a majority rule. Ensemble learning algorithms are widely used in data mining tasks because of their powerful combinatorial learning capability, achieving high efficiency and accuracy on datasets of various sizes.
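As a generic illustration of the Bagging procedure just described (not code from this study), scikit-learn's BaggingClassifier draws bootstrap samples with replacement and combines the weak learners by majority vote:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# m = 10 bootstrap resamples of the training set, one decision tree per resample;
# the final prediction follows the majority of the 10 weak classifiers
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                        bootstrap=True, random_state=0)
bag.fit(X_tr, y_tr)
print("bagging accuracy:", bag.score(X_te, y_te))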


3 Methodology

In this study, we propose an ensemble learning algorithm based on a voting mechanism, named FBTV, to predict the mortality risk of sepsis patients six hours in advance using the treatment data of ICU sepsis patients in the MIMIC-IV database. Considering the specificity of medical data and the requirement of interpretability, we select RandomForest, XGBoost, DecisionTree, AdaBoost, GradientBoosting, LGBMClassifier, CatBoost, and LSTM as the baseline algorithms. We experimentally confirm the effectiveness of the multi-algorithm fusion model FBTV, which applies a voting mechanism to RandomForest, XGBoost, and DecisionTree (see Fig. 1).

Fig. 1. Workflow of research to mortality predictive models of Sepsis patients

3.1 Data Extraction

As shown in Fig. 2, we extract sepsis cases from the MIMIC-IV database. The filtered case records are sorted in time sequence to ensure the chronological order and continuity of the data. We screen sepsis cases by keywords and ICD codes to find the records of patients with the corresponding admission numbers. We also extract a variety of information, such as vital signs at admission (blood pressure, heart rate, respiratory rate), laboratory microbiology tests, and drug administration. Finally, we generate datasets for analysis and mining by de-duplicating, filtering, splitting, and integrating the selected data. The dataset includes information in multiple dimensions, such as time series and static and dynamic features, and provides the basis for subsequent sepsis prediction model development and performance evaluation.
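A minimal pandas sketch of this keyword/ICD screening, assuming the public MIMIC-IV CSV layout (the table and column names diagnoses_icd, d_icd_diagnoses, hadm_id, etc. reflect our understanding of that schema, and the file paths are illustrative):

import pandas as pd

diag = pd.read_csv("hosp/diagnoses_icd.csv.gz")
icd_dict = pd.read_csv("hosp/d_icd_diagnoses.csv.gz")
adm = pd.read_csv("hosp/admissions.csv.gz", parse_dates=["admittime"])

# Keyword screen: keep ICD codes whose description mentions sepsis
sepsis_codes = icd_dict.loc[
    icd_dict["long_title"].str.contains("sepsis", case=False, na=False), "icd_code"]
sepsis_diag = diag[diag["icd_code"].isin(sepsis_codes)]

# Join to admissions via the admission number (hadm_id) and sort chronologically
cohort = (sepsis_diag.merge(adm, on=["subject_id", "hadm_id"])
                     .drop_duplicates(subset="hadm_id")
                     .sort_values("admittime"))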


Fig. 2. The patient selection processes for data extraction from MIMIC-IV database.

3.2 Data Preprocessing

Missing data are common in the dataset because examinations and treatments vary from patient to patient. For numerical data, we tried four padding methods: forward padding, backward padding, zero-value padding, and mean-value padding. Following expert suggestions and training results, we selected zero-value padding for this study. For categorical data, we fill missing values with the mode. For outliers, we tried several approaches. Besides manually removing anomalous data that contradict common sense and experience, we used outlier detection for other outliers that are difficult to spot: feature-specific data are analyzed globally or locally to detect and remove outliers. In addition, we applied statistical treatments such as the Z-score and the IQR (interquartile range) to detect and handle outliers.

3.3 Feature Selection

Feature extraction is an important part of data mining and directly affects the classification accuracy and generalization ability of a model. Because of the specialized nature of medical data and our lack of the corresponding domain knowledge, we did not attempt more elaborate feature engineering; simply crossing features without any medical meaning would not have helped. Instead, after removing features with more than 70% null values from the original data, we selected 31 key features using principal component analysis (PCA). These features include basic information, such as the patient's age and gender, as well as test data, such as blood biochemical indicators and vital signs. Finally, we confirmed the validity of these features against expert opinion and the literature, and successfully transformed the raw EHR data into feature vectors that can be used for model training and prediction (see Table 1).
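The padding, outlier, and PCA steps of Sects. 3.2–3.3 could be sketched as follows; this is a simplification under our own assumptions (the 1.5×IQR clipping shown is one common variant of the IQR treatment mentioned above), not the authors' pipeline:

import pandas as pd
from sklearn.decomposition import PCA

def preprocess(df, num_cols, cat_cols):
    # zero-value padding for missing numeric data, mode filling for categorical data
    df[num_cols] = df[num_cols].fillna(0)
    for c in cat_cols:
        df[c] = df[c].fillna(df[c].mode().iloc[0])
    # IQR rule: clip values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR], column by column
    q1, q3 = df[num_cols].quantile(0.25), df[num_cols].quantile(0.75)
    iqr = q3 - q1
    df[num_cols] = df[num_cols].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr, axis=1)
    return df

def select_features(df, cat_cols, k=31):
    # one-hot encode categorical fields, then reduce to k principal components
    X = pd.get_dummies(df, columns=cat_cols)
    return PCA(n_components=k).fit_transform(X)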


Table 1. Thirty-one Selected Key Features

No | Field                        | Type
1  | gender                       | text(2)
2  | age                          | num
3  | admission_type               | text(9)
4  | insurance                    | text(3)
5  | race                         | text(33)
6  | marital_status               | text(5)
7  | Anion_Gap                    | num(mmol/L)
8  | Bicarbonate                  | num(mmol/L)
9  | Calcium_Total                | num(mmol/L)
10 | Chloride                     | num(mmol/L)
11 | Creatinine                   | num(µmol/L)
12 | Glucose                      | num(mmol/L)
13 | Magnesium                    | num(mmol/L)
14 | Phosphate                    | num(mmol/L)
15 | Potassium                    | num(mmol/L)
16 | Sodium                       | num(mmol/L)
17 | Urea_Nitrogen                | num(mmol/L)
18 | Hematocrit                   | num(L/L)
19 | Hemoglobin                   | num(g/L)
20 | MCH                          | num(pg)
21 | MCHC                         | num(g/L)
22 | MCV                          | num(fL)
23 | Platelet_Count               | num(×10^9/L)
24 | RDW                          | num(%)
25 | Red_Blood_Cells              | num(×10^12/L)
26 | White_Blood_Cells            | num(×10^9/L)
27 | Heart_Rate                   | num(times/min)
28 | Heart_Rhythm                 | text(26)
29 | Respiratory_Rate             | num(times/min)
30 | O2_saturation_pulseoxymetry  | num(%)
31 | Ectopy_Type_1                | text(15)

3.4 Experimental Dataset

Through data preprocessing and feature extraction, we obtained 2,281 sepsis cases with 31 key features, with 1,901 patients in the survival group and 380 in the death group, providing strong support for developing and evaluating our sepsis prediction models. We divide the dataset into two parts: 80% of the data serve as the training set for model training and tuning, and 20% as the test set to evaluate the performance and generalization ability of the model. In particular, since we want to predict the life-or-death outcome of sepsis patients six hours in advance, the time series must be respected when partitioning the dataset. The data are therefore sorted chronologically, and the first 80% are used as the training set and the last 20% as the test set, so that the training and test sets do not overlap in time.

3.5 Ensemble Learning Model Construction

At the beginning of the model construction, we first evaluated classical algorithms such as Bayesian classification, support vector machines, decision trees, random forests, and K-nearest neighbors. The experimental results show that the average accuracy of tree-based algorithms such as decision trees and random forests is significantly higher than


other types of algorithms. We therefore mainly choose tree-based algorithms, RandomForest, XGBoost, DecisionTree, AdaBoost, GradientBoosting, LGBMClassifier, and CatBoost, as baseline algorithms. We also include LSTM (Long Short-Term Memory) as a baseline because of its clear advantages in processing temporal data. We integrate the baseline algorithms by two methods: a voting mechanism and weighted fusion. The experimental results confirm that the voting method is superior to weighted fusion. Three of the eight baseline algorithms at a time take part in the voting process. The specific operations are as follows (see Fig. 3):

Step 1: Divide the training dataset into three subsets, one for each of the 3 baseline algorithms.
Step 2: Use the three algorithms to predict the same validation dataset separately, obtaining three prediction sequences.
Step 3: Merge the three prediction sequences and take the most frequent result as the integrated prediction.
Step 4: Evaluate the integrated predictions and compare their accuracy with the predictions of each individual algorithm.
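A minimal sketch of Steps 1–4 for the binary survival/death labels, assuming scikit-learn and the xgboost package (the helper name and hyperparameters are ours):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

def fbtv_predict(X_train, y_train, X_val):
    models = [RandomForestClassifier(random_state=0),
              XGBClassifier(eval_metric="logloss"),
              DecisionTreeClassifier(random_state=0)]
    # Step 1: split the training data into three subsets, one per base learner
    subsets = np.array_split(np.arange(len(X_train)), 3)
    # Step 2: train each model on its subset and predict the same validation set
    preds = np.vstack([m.fit(X_train[i], y_train[i]).predict(X_val)
                       for m, i in zip(models, subsets)])
    # Steps 3-4: take the majority label across the three 0/1 prediction sequences
    return (preds.sum(axis=0) >= 2).astype(int)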

Fig. 3. Pseudo-code of ensemble machine learning models based on voting mechanism.

The ensemble learning results show that the fusion of the RandomForest, XGBoost, and DecisionTree algorithms performs best, so this study defines this combination as the FBTV (randomForest-xgBoost-decisionTree-Voting) model for mortality risk prediction of septic patients in the ICU (see Table 2).


Table 2. Performance comparison of ten machine learning models

Model          | Accuracy | Precision | Recall | F1    | AUC
randomForest   | 0.920    | 0.762     | 0.847  | 0.802 | 0.802
xgBoost        | 0.919    | 0.655     | 0.954  | 0.776 | 0.823
decisionTree   | 0.927    | 0.789     | 0.859  | 0.822 | 0.877
AdaBoost       | 0.791    | 0.150     | 0.584  | 0.238 | 0.560
Gbdt           | 0.797    | 0.167     | 0.587  | 0.261 | 0.568
lgbm           | 0.804    | 0.102     | 0.827  | 0.181 | 0.548
catBoost       | 0.795    | 0.181     | 0.558  | 0.273 | 0.571
LSTM           | 0.897    | 0.871     | 0.588  | 0.702 | 0.915
FBTW Ensemble  | 0.955    | 0.844     | 0.942  | 0.890 | 0.915
FBTV Ensemble  | 0.956    | 0.849     | 0.938  | 0.891 | 0.917

4 Experiments

4.1 Ensemble Learning Algorithm Based on Voting Mechanism

Among the voting experiments over the eight baseline algorithms, the combination of RandomForest, XGBoost, and DecisionTree performs best, with an accuracy of 0.956 (see Table 3).

Table 3. Performance comparison of ensemble learning algorithms based on voting mechanism (partial)

Model                                              | Accuracy | Precision | Recall | F1    | AUC
FBTV Ensemble                                      | 0.956    | 0.849     | 0.938  | 0.891 | 0.917
Voting(randomforest,xgboost,adaboost)              | 0.896    | 0.540     | 0.950  | 0.688 | 0.766
Voting(randomforest,xgboost,catboostclassifier)    | 0.897    | 0.543     | 0.953  | 0.692 | 0.768
Voting(randomforest,decisiontree,gradientboosting) | 0.955    | 0.842     | 0.941  | 0.889 | 0.914
Voting(randomforest,decisiontree,lgbmclassifier)   | 0.955    | 0.839     | 0.944  | 0.888 | 0.913
Voting(xgboost,decisiontree,catboostclassifier)    | 0.897    | 0.543     | 0.951  | 0.691 | 0.768

4.2 Weighted Fusion Ensemble Learning Algorithm

In this experiment, the three best-performing baseline models are again RandomForest, XGBoost, and DecisionTree; their weighted combination is defined as FBTW (randomForest-xgBoost-decisionTree-Weighting). The accuracy of FBTW is 0.955, with weights wf = 0.8, wb = 0.1, and wt = 0.1 for the three baseline algorithms, respectively (see Table 4).
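A minimal sketch of the weighted fusion, assuming three already-fitted scikit-learn-style classifiers (the function name is ours):

import numpy as np

def fbtw_predict(models, weights, X):
    # weighted fusion: sum the class-probability outputs with fixed weights
    # (wf = 0.8 for randomForest, wb = 0.1 for xgBoost, wt = 0.1 for decisionTree)
    fused = sum(w * m.predict_proba(X) for m, w in zip(models, weights))
    return np.argmax(fused, axis=1)

# e.g. y_pred = fbtw_predict([rf, xgb, dt], [0.8, 0.1, 0.1], X_test)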


Table 4. Performance comparison of ensemble learning algorithms based on weighting mechanism (partial)

Model                                               | Accuracy | Precision | Recall | F1    | AUC
FBTW Ensemble                                       | 0.955    | 0.844     | 0.942  | 0.890 | 0.915
Weighting(randomforest,xgboost,lgbmclassifier)      | 0.898    | 0.528     | 0.989  | 0.688 | 0.763
Weighting(randomforest,xgboost,catboostclassifier)  | 0.901    | 0.541     | 0.987  | 0.699 | 0.769
Weighting(randomforest,decisiontree,lgbmclassifier) | 0.955    | 0.837     | 0.945  | 0.888 | 0.912
Weighting(xgboost,decisiontree,gradientboosting)    | 0.894    | 0.529     | 0.957  | 0.681 | 0.761
Weighting(decisiontree,adaboost,lgbmclassifier)     | 0.954    | 0.850     | 0.931  | 0.889 | 0.917

4.3 Experimental Analysis

Comparing the results of the two types of ensemble learning algorithms shows that, among the single algorithms, the tree-based algorithms predict best. Tree-based algorithms such as Random Forest and XGBoost can output feature importances after training, which can be used not only for feature selection but also for exploring the factors that strongly influence the prediction task. The weighted-fusion ensemble is comparable to the voting method, reaching 0.955. However, the weight of randomForest is 0.8 while xgBoost and decisionTree each receive 0.1, so the weighted model depends heavily on a single algorithm.

5 Conclusion

We propose FBTV, an ensemble learning algorithm based on a voting mechanism, to predict mortality risk six hours in advance from the treatment data of ICU sepsis patients in the MIMIC-IV database. After evaluating ensemble constructions based on both voting and weighted fusion, we select the more reasonable FBTV algorithm as the mortality risk prediction model. This model can assist clinicians in making decisions and adjusting treatment plans to reduce the occurrence of adverse outcomes. At the same time, the study has certain limitations. First, it is a single-center retrospective study based on the public MIMIC-IV database; although sepsis cases were collected to the extent possible, the sample size remains smaller than in some large-sample studies, and no external database validation has been performed. Second, because of the long time span of the MIMIC-IV database, sepsis-specific indicators identified in recent years have a high rate of missing data. Third, the MIMIC-IV database is derived from the Beth Israel Deaconess Medical Center, with a low percentage of Asian patients and possible population heterogeneity. In later work we will incorporate local patient data, validate the model against a local database, and adjust its parameters to better match local population characteristics.


Acknowledgement. This work is sponsored by 2023 Research Foundation of Shanghai Sanda University (No. 2023YB23).

References

1. Singer, M., Deutschman, C.S., Seymour, C.W., et al.: The Third International Consensus definitions for sepsis and septic shock (Sepsis-3). JAMA 315(8), 801–810 (2016)
2. Fleischmann, C., Scherag, A., Adhikari, N.K., et al.: Assessment of global incidence and mortality of hospital-treated sepsis. Current estimates and limitations. Am. J. Respir. Crit. Care Med. 193(3), 259–272 (2016)
3. Fleischmann-Struzek, C., Mellhammar, L., Rose, N., et al.: Incidence and mortality of hospital- and ICU-treated sepsis: results from an updated and expanded systematic review and meta-analysis. Intensive Care Med. 46(8), 1552–1562 (2020)
4. Rhee, C., Dantes, R., Epstein, L., et al.: Incidence and trends of sepsis in US hospitals using clinical vs claims data, 2009–2014. JAMA 318(13), 1241–1249 (2017)
5. Cecconi, M., Evans, L., Levy, M., et al.: Sepsis and septic shock. Lancet 392(10141), 75–87 (2018)
6. Fu, M.S.: Machine learning-based diagnosis method for sepsis in ICU and death risk prediction. Nanjing University of Aeronautics and Astronautics (2020)
7. Weiss, S.L., Peters, M.J., Alhazzani, W., et al.: Surviving sepsis campaign: international guidelines for management of sepsis and septic shock. Intensive Care Med. 47(11), 1181–1247 (2021)
8. Serafim, R., Gomes, J.A., Salluh, J., et al.: A comparison of the quick-SOFA and systemic inflammatory response syndrome criteria for the diagnosis of sepsis and prediction of mortality: a systematic review and meta-analysis. Chest 153, 646 (2018)
9. Taylor, R.A., Pare, J.R., Venkatesh, A.K., et al.: Prediction of in-hospital mortality in emergency department patients with sepsis: a local big data-driven, machine learning approach. Acad. Emerg. Med. 23(3), 269–278 (2016)
10. Pirracchio, R., Petersen, M.L., Carone, M., et al.: Mortality prediction in intensive care units with the Super ICU Learner Algorithm (SICULA): a population-based study. Lancet Respir. Med. 3(1), 42–52 (2015)
11. Bu, X.-X.: Research on life and death prediction of ICU patients based on FAM-CART. Beijing Jiaotong University (2017)
12. Tang, F., Xiao, C., Wang, F., et al.: Predictive modeling in urgent care: a comparative study of machine learning approaches. JAMIA Open 1(1), 87–98 (2018)
13. Junqing, X., Ke, L., Chunxiao, L., et al.: Application of random forest model in predicting the risk of in-hospital mortality of ICU patients. China Digit. Med. 12(11), 81–84 (2017)

Research on the Influence Mechanism of College Students' Learning Satisfaction Based on the Chain Mediation Model

Xin Guo1, YiChen Yang1, Zilong Yin2, Ying Chen1, and Yujia Wu1(B)

1 School of Information Science and Technology, Sanda University, No. 2727, Jinhai Rd., Pudong New Area, Shanghai, China
{guoxin,ychen,wuyujia}@sandau.edu.cn
2 Department of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China

Abstract. This paper draws on authoritative scales such as the SSI (US college student satisfaction assessment) and the NSS (UK college student satisfaction assessment) to compile a learning satisfaction questionnaire. The questionnaire was published online to survey students from different colleges and universities across the country. After collecting the survey data, a chain mediation model was used to analyze the factors that affect college students' learning satisfaction and the interactions among these factors. The results show that learning value, learning process, learning conditions, learning resources, and learning satisfaction are positively correlated, and that learning conditions, learning process, and learning resources mediate the relationship between learning value and learning satisfaction. These conclusions can inform efforts to improve college students' learning satisfaction and teaching quality.

Keywords: College students · learning satisfaction · teaching quality · chain mediation model · Bootstrap

1 Introduction

Students are the fundamental participants in college education and witnesses to the quality of teaching; their learning satisfaction is a pivotal measure of college teaching quality [1]. The first survey of student learning satisfaction was conducted by the American Council on Education, which employed the Cooperative Institutional Research Program (CIRP) in 1966 to measure first-year student satisfaction [2]. Student learning satisfaction stems from the concept of customer satisfaction: while customer satisfaction reflects the role of business in generating and providing value to customers, student learning satisfaction portrays the impact of schools on student learning and self-growth. With the deepening reform of China's higher education, research on college students' learning satisfaction has gradually gained attention in China, and scholars have conducted various studies on the topic [3–5]. Jing Wen [6] investigated


and analyzed the current situation of college students' learning satisfaction in China, finding that Chinese college students are currently "basically satisfied" and recognize the importance of learning satisfaction. Xian Li [7] conducted an empirical analysis of the factors affecting English teaching satisfaction from three aspects, including student factors, based on data from nine universities in Hainan Province. Lijun Guo et al. [8] studied the factors and mechanisms influencing the learning satisfaction of MOOC learners at colleges of different levels, based on the information system success model and self-efficacy theory. However, previous studies have primarily relied on statistical analysis to measure learning satisfaction and have not comprehensively examined the indirect effects among different factors, leading to a partial understanding of the overall effect of various factors on learning satisfaction. The chain mediation effect model is widely used in fields such as supply chain management, enterprise management, and psychological analysis. For example, Madhavan et al. [9] investigated the relationship between supply chain functions and sustainable supply chain strategy and their effects on corporate performance, and demonstrated how sustainable supply chain strategies mediate between supply chain functions and company performance by providing reliable indicators of company success in sustainable supply chain activities. Similarly, Chen et al. [10] studied the impact of knowledge coupling, both complementary and substitutive, on exploratory and exploitative innovation; their empirical analysis of survey responses from 229 technology companies in China found that complementary knowledge coupling positively influences both types of innovation. Pu et al. [11] examined the mediating effect of information sharing on the relationship between relational structure and firms' intentions to adopt electronic supply chain management systems, distributing a questionnaire to 212 companies in China and testing the research model with a three-stage least squares regression analysis. Zhang et al. [12] used social capital theory to construct a model explaining the determinants of the academic impact of supply chain management scholars; data collected from 450 scholars in various countries showed that scholars' social capital mediates the relationship between their research skills and academic influence. Wang et al. [13] used a chain mediation model to study adverse psychological outcomes of COVID-19 patients: in total, 4,612 participants from 8 countries completed questionnaires on physical symptoms, health inquiry, depression, anxiety, and stress, and the results show that the Philippines and Poland displayed the highest levels of anxiety, depression, and stress. Drawing on previous research in data mining and related fields, this study explores the factors that affect college students' learning satisfaction by developing a chain mediation effect model.
Although existing research [14, 15] shows that factors such as learning conditions, learning resources, learning process, and learning value each have a direct impact on learning satisfaction, no research has examined their combined impact or explored underlying relationships such as chain mediation effects. This study focuses on the chain


mediation effects between learning value and students' learning satisfaction, thus providing insight for schools seeking to enhance overall learning satisfaction.

2 Method

2.1 Research Subjects

This study distributed a "College Student Learning Satisfaction Survey" to students of varying majors at colleges and universities all over the country. A total of 585 valid questionnaires were collected and analyzed. Females (63%) constituted a higher proportion than males (37%). Regarding majors, management (28%) and science and engineering (26%) accounted for the highest proportions, followed by education and art (16%) and other majors (14%); humanities (9%), philosophy and law (4%), and medicine and nursing (3%) had relatively low proportions. The class-year distribution of the responses was: freshmen (48.7%), juniors (23.6%), sophomores (15.4%), and seniors (12.3%). Overall, the sample is diverse in gender, major, and class year, making it a reasonable representation of the study population.

2.2 Research Tools

The learning satisfaction questionnaire was developed by referencing authoritative scales such as the Student Satisfaction Inventory (SSI) used in American universities and the National Student Survey (NSS) in the United Kingdom. The questionnaire covers six factors that affect college students' learning satisfaction: learning conditions, course settings, learning resources, teacher teaching, learning process, and learning value. The survey employs a 5-point Likert scale, with each item ranging from "very satisfied" to "very dissatisfied," scored 1 to 5 respectively, so a higher score indicates lower learning satisfaction. This study focuses on the mechanisms by which learning value, learning process, learning conditions, and learning resources affect learning satisfaction.

2.3 Research Hypothesis

The present study is based on several assumptions.

(1) Learning value reflects whether students consider the knowledge they acquire at school valuable to themselves. Learning value is crucial in determining students' satisfaction with the learning process: although it does not directly contribute to their learning attempts, it stimulates positive learning emotions, attention, and state of mind, increasing the efficiency of the learning process [16]. It is therefore crucial for the objective learning value to match students' subjective learning value, instilling positive emotions that motivate them to engage with the relevant background knowledge and enhancing their learning satisfaction.


(2) The learning process involves group cooperation, learning pressure, extracurricular activities, assessment methods, and instructor management, all of which are classified here as learning process factors. Studies have shown that the learning process significantly affects students' learning efficiency. Learning pressure is the stress or psychological burden arising from a person's interactions with the learning environment and the gap between their abilities and the demands of learning [17]. Group cooperation encourages student participation and engagement, creating a classroom environment in which students interact actively with both classmates and instructors [18]. Extracurricular activities are crucial for students' holistic development and engagement. An effective learning process encourages student initiative and constant progress, while the school's assessment methods and the guidance of counselors play key roles in students' learning motivation and mental health, respectively.

(3) Learning conditions and resources are vital indicators of college students' learning satisfaction. A conducive learning environment improves students' behavioral habits and thinking patterns, leading to more effective and satisfying learning, and adequate hardware facilities and a comfortable environment raise satisfaction. Complete learning resources, such as a well-stocked library, give students access to a wealth of up-to-date information, broadening their learning opportunities and their thinking.

2.4 Research Model

Mediating effect analysis is widely used in social science research, such as psychology, management, and communication [19]. A mediation analysis asks how an independent variable X influences a dependent variable Y, that is, the mechanism of X's effect on Y, and is frequently used in path analysis to determine whether the model contains significant mediating factors. When considering the effect of X on Y, if X affects a variable M which in turn affects Y, then M is called a mediating variable. If X influences variables M1 and M2 jointly, and these in turn affect Y, M1 and M2 form a chain of mediating variables. Chain mediation is essentially simple mediation with multiple, ordered mediating variables. The chain mediation model can study the mediating effect of each mediator while controlling for the others, reducing the parameter estimation errors that other mediators cause in simple mediation models; it can also reveal which mediating variable has the stronger effect. The chain mediation model is shown in Fig. 1. Analysis with a chain mediation model generally includes three parts: the overall mediation effect analysis, the analysis of individual mediation effects, and the comparison among individual or combined mediation effects.
The effect equations are presented as (1)–(5): (1)–(3) are the indirect effect equations, (4) is the total indirect effect equation, and (5) is the total effect equation.


Fig. 1. Basic chain mediating effect model

The path identifiers used in the equations are as follows: Int1 denotes path 1, Int2 path 2, and Int3 path 3; c represents the direct effect, and a, b, d, e, f are the regression coefficients between the corresponding pairs of variables.

Int1 = a × b                                   (1)
Int2 = d × e                                   (2)
Int3 = a × f × e                               (3)
Total indirect effect = Int1 + Int2 + Int3     (4)
Total effect = Total indirect effect + c       (5)
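To show how Eqs. (1)–(5) combine, here is a minimal Python sketch, assuming the path coefficients a, b, d, e, f and the direct effect c have already been estimated from the component regressions:

def chain_mediation_effects(a, b, d, e, f, c):
    int1 = a * b                         # X -> M1 -> Y, Eq. (1)
    int2 = d * e                         # X -> M2 -> Y, Eq. (2)
    int3 = a * f * e                     # X -> M1 -> M2 -> Y, Eq. (3)
    total_indirect = int1 + int2 + int3  # Eq. (4)
    total = total_indirect + c           # Eq. (5)
    return int1, int2, int3, total_indirect, total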

The current study looks into how different variables interact, which cannot be fully explained by a simple mediation paradigm. Therefore, a chain mediation model with three mediating variables, as shown in Fig. 2, was deployed as the research model. The independent variable is denoted X, the dependent variable Y, and the mediating variables M1, M2, and M3; this model can explore the mediating effects of several mediating variables at once. By evaluating the values of the various chain effects, it is possible to identify which chain has the greatest influence on Y and thus develop more targeted solutions.


Fig. 2. Model of chain mediating effects of three mediating variables


3 Experiment

3.1 Common Method Biases Test

The research data were collected through a questionnaire survey and may therefore suffer from common method bias, i.e., artificial covariation between predictor and criterion variables caused by factors such as a common data source and shared item characteristics. Because such artificial covariation can distort research results, the Harman single-factor test was applied. The first factor explained 46.33% of the variance, below the critical criterion of 50%, so there is no serious common method bias problem.

3.2 Reliability and Validity Analysis

Cronbach's alpha was used to assess the questionnaire's reliability. The overall Cronbach's alpha for the sample data was 0.962, well above the generally accepted cutoff of 0.8, indicating strong reliability. The KMO and Bartlett's sphericity tests were run to evaluate validity: the overall KMO index was 0.969 and the approximate chi-square of Bartlett's test reached 10156.836, indicating very good overall validity. All dimensions and the overall data passed the reliability and validity tests, so the questionnaire data are trustworthy and suitable for further research.

3.3 Descriptive Statistical Analysis

Descriptive statistics showed that students' satisfaction with learning conditions, curriculum design, learning resources, teacher teaching, learning process, and learning value was good and relatively stable.

3.4 Correlation Analysis Among Variables

Learning satisfaction was positively correlated with learning value, teachers' teaching, learning process, learning resources, curriculum design, and learning conditions, and the correlations were statistically significant.

3.5 Chain Intermediary Effect Analysis

Previous research has shown that learning value is a key factor in determining satisfaction with the learning experience [20], and the Pearson correlation coefficient here indicates the strongest positive association. This study uses the SPSS macro program Process to analyze the chain mediating effect, and the results show a mediating effect between learning value and learning satisfaction. To determine the significance of the mediating effect, we used the bias-corrected percentile bootstrap recommended by Wen


et al. [21]. The mediating effects among learning process, learning conditions, learning resources, and learning satisfaction were analyzed; the specific results are shown in Table 1. Learning value significantly and positively predicts the learning process (t = 29.3065, p = 0). Both learning value (t = 11.3877, p = 0) and learning process (t = 10.5612, p = 0) significantly and positively predict learning conditions. Further, learning value (t = 4.9799, p = 0), learning process (t = 9.7531, p = 0), and learning conditions (t = 8.5859, p = 0) significantly and positively predict learning resources. Lastly, learning value (t = 8.86, p = 0), learning process (t = 4.9906, p = 0), learning conditions (t = 4.6876, p = 0), and learning resources (t = 3.8018, p = 0.0002) significantly and positively predict learning satisfaction. The coefficient of every path in the mediation model is significant. Based on the values of β, a chain mediation model was developed and is presented in Fig. 3 (see Table 1).

Table 1. Regression results

Dependent Variable    | Independent Variable | R     | R-squared | F       | β     | t      | p
Learning Process      | Learning Value       | 0.772 | 0.596     | 858.871 | 0.669 | 29.307 | 0
Learning Conditions   | Learning Value       | 0.803 | 0.645     | 527.860 | 0.413 | 11.388 | 0
                      | Learning Process     |       |           |         | 0.441 | 10.561 | 0
Learning Resources    | Learning Value       | 0.835 | 0.697     | 446.230 | 0.180 | 4.98   | 0
                      | Learning Process     |       |           |         | 0.401 | 9.753  | 0
                      | Learning Conditions  |       |           |         | 0.321 | 8.586  | 0
Learning Satisfaction | Learning Value       | 0.835 | 0.698     | 334.987 | 0.36  | 8.86   | 0
                      | Learning Process     |       |           |         | 0.244 | 4.991  | 0
                      | Learning Conditions  |       |           |         | 0.205 | 4.688  | 0
                      | Learning Resources   |       |           |         | 0.173 | 3.802  | 0


Fig. 3. Chain mediation model (paths among learning value, learning process, learning conditions, learning resources, and learning satisfaction; e.g., learning process → learning resources: 0.4017, learning value → learning satisfaction: 0.3597)

The results are shown in Table 2. The mediating effect between learning value and learning satisfaction is significant, with a total indirect effect of 0.4253. The mediating effect is mainly generated through four chains. The first chain, "learning value -> learning process -> learning satisfaction", has an indirect effect of 0.1631; the second, "learning value -> learning conditions -> learning satisfaction", an indirect effect of 0.0845; the third, "learning value -> learning process -> learning conditions -> learning satisfaction", an indirect effect of 0.0605; and the fourth, "learning value -> learning process -> learning resources -> learning satisfaction", an indirect effect of 0.0466. In each case the bootstrap 95% confidence interval does not contain 0, indicating that the mediating effect of the chain is significant. The test results therefore show that learning process, learning conditions, and learning resources all mediate the effect of learning value on students' learning satisfaction, through both independent and chain mediating effects. The analysis also reveals that the direct coefficient of learning value on learning satisfaction is the largest (β = 0.3597, p < 0.001), which implies that students' recognition of learning value has a crucial impact on learning satisfaction. Students are more likely to devote their time and energy to studying, and to feel satisfied, when they believe that the educational program and teaching materials provided by the school are consistent with the learning they value. Additionally, the chain "learning value → learning process → learning satisfaction" exerts the highest indirect effect, which means that students' learning experience significantly influences their satisfaction with the learning process. Educational institutions should therefore use high-quality technology and software to help students understand their lessons effectively.

Table 2. Mediating effect between learning value and learning satisfaction

| Path | Effect | BootSE | BootLLCI | BootULCI | Relative Mediation Effect |
|---|---|---|---|---|---|
| Total indirect effect | 0.425 | 0.046 | 0.340 | 0.519 | 54.17% |
| Learning Value -> Learning Process -> Learning Satisfaction | 0.163 | 0.043 | 0.079 | 0.251 | 20.77% |
| Learning Value -> Learning Conditions -> Learning Satisfaction | 0.085 | 0.028 | 0.035 | 0.143 | 10.76% |
| Learning Value -> Learning Resources -> Learning Satisfaction | 0.031 | 0.015 | 0.007 | 0.063 | 3.97% |
| Learning Value -> Learning Process -> Learning Conditions -> Learning Satisfaction | 0.061 | 0.018 | 0.027 | 0.097 | 7.71% |
| Learning Value -> Learning Process -> Learning Resources -> Learning Satisfaction | 0.047 | 0.019 | 0.011 | 0.085 | 5.94% |
| Learning Value -> Learning Conditions -> Learning Resources -> Learning Satisfaction | 0.023 | 0.011 | 0.005 | 0.047 | 2.93% |
| Learning Value -> Learning Process -> Learning Conditions -> Learning Resources -> Learning Satisfaction | 0.017 | 0.008 | 0.0035 | 0.034 | 2.10% |

4 Conclusion

This paper provides theoretical insights into the relationship between learning value and learning satisfaction, and draws practical implications that can guide teaching reform in higher education. The findings suggest that universities should take steps to enhance students' learning values by identifying their learning goals and providing tailored instruction and resources. To promote students' satisfaction and engagement in their studies, diverse teaching methods should be employed, content difficulty leveled appropriately, and group activities promoted. However, the sample size of this study is limited; we will expand the sample and further refine the analysis with big data technology.


Acknowledgments. This work was sponsored by the Shanghai Municipal Education Commission under contract Z90004.23.001 (Professional Master's Degree Authorized School Training Project), and partially sponsored by Sanda University under contract A020201.23.058 (Key Courses Construction Project).

References
1. Gao, B., Zhu, S., Wu, J.: The relationship between mobile phone addiction and learning engagement in college students: the mediating effect of self-control and moderating effect of core self-evaluation. Psychol. Dev. Educ. 03, 400–406 (2021)
2. Slotnick, S.O.: CIRP (Cooperative Institutional Research Program) Freshman Survey Report, Fall 1992. College Freshmen, 142 (1993)
3. Chang, Y., Hou, X., Liu, Y.: A study on the evaluation system and evaluation model of student satisfaction in Chinese universities. J. Higher Educ. 09, 82–87 (2007)
4. Lenton, P.: Determining student satisfaction: an economic analysis of the national student survey. Econ. Educ. Rev. 47, 118–127 (2015). https://doi.org/10.1016/j.econedurev.2015.05.001
5. Wen, J.: The types and characteristics of Chinese college students' learning satisfaction and the improvement space. Decis. Inf. 08, 120–129 (2016)
6. Wen, J.: Study on college students' learning satisfaction and its influencing factors. China Higher Educ. Rev. 00, 134–144 (2013)
7. Li, X.: Analysis on the influencing factors of college students' satisfaction with college teaching—taking Hainan province as an example. World Surv. Res. 02, 49–54 (2019)
8. Guo, L., Cao, Y.: Research on the influencing mechanism of college students' learning satisfaction with MOOCs. J. Higher Educ. 12, 69–75 (2018)
9. Madhavan, M., Kaliaperumal, C., Muthuvel, S.: Mediation effects of sustainable supply chain strategies on supply chain functions and firm performance. Int. J. Bus. Perform. Supply Chain Model. 7(3), 292–304 (2015)
10. Chen, H., Yao, Y., Zhou, H.: How does knowledge coupling affect exploratory and exploitative innovation? The chained mediation role of organisational memory and knowledge creation. Technol. Anal. Strateg. Manag. 33(6), 713–727 (2021)
11. Pu, X., Wang, Z., Chan, F.: Adoption of electronic supply chain management systems: the mediation role of information sharing. Ind. Manag. Data Syst. 120(11), 1977–1999 (2020)
12. Zhang, Y., Wu, Y., Goh, M., et al.: Supply chain management scholar's research impact: moderated mediation analysis. Libr. Hi Tech 37(1), 118–135 (2019)
13. Wang, C., Chudzicka-Czupała, A., Tee, M.L., et al.: A chain mediation model on COVID-19 symptoms and mental health outcomes in Americans, Asians and Europeans. Sci. Rep. 11, 6481 (2021)
14. Zhu, L., Wang, N., Du, Y.: Research on the influencing factors and promotion strategies of online learning satisfaction of college students. J. Natl. Acad. Educ. Adm. 05, 82–88 (2020)
15. Zhao, Y.: The influencing factors on students' satisfaction with MOOCs. J. Higher Educ. 02, 73–78 (2018)
16. Chen, Q., Liu, R.: Contemporary Educational Psychology, 2nd edn., pp. 213–214. Beijing Normal University Press (2007)
17. Wang, Y.: Study on the relationship mechanism between learning stress and online game addiction in adolescents. Master's dissertation, Shenzhen University (2019)
18. Zheng, J., Wu, X., Min, X.: The application of peer teaching method in the context of group cooperative learning—based on the perspective of literature review. Reform Open. 13, 101–107 (2020)
19. Su, X., Lai, J., Zhan, X.: Influence of helping behavior on employees' performance appraisal in the tourism service industry: chain mediation model of motive attribution and regulatory focus. Tour. Tribune 36(08), 101–111 (2021)
20. Gao, J., Yu, P.: Research on influential factors of students' satisfaction in virtual experiment learning under MOOC environment. Exp. Technol. Manag. 35(01), 221–225 (2018)
21. Wen, Z., Ye, B.: Mediation effect analysis: methodology and model development. Adv. Psychol. Sci. 22(5), 731–745 (2014)

Design of License Plate Recognition System Based on Machine Vision

Ming Hui Zhang1, Xu Yang1, and Ming Chao Zhang2(B)

1 School of Information Science and Technology, Sanda University, Shanghai, China
[email protected]
2 Changchun Humanities and Sciences College, Changchun, China
[email protected]

Abstract. A complete license plate recognition system is designed using YOLO-based license plate detection and LPRNet-based license plate recognition. The system adopts the CCPD (Chinese City Parking Dataset) of China's urban parking lots. Combined with YOLOv5 and an improved LPRNet network model, the accuracy of license plate localization and recognition on the CCPD validation data reaches 99% and 97.04%, respectively, indicating that the license plate recognition system designed in this paper has good stability and reliability in practical applications.

Keywords: machine vision · license plate recognition · CCPD · YOLO · LPRNet

1 Introduction

License plate recognition is a hot research direction in the field of machine vision. At present, license plate localization relies mainly on traditional positioning algorithms and on neural-network-based recognition algorithms. Many scholars have proposed feature detection algorithms based on image processing to detect and locate license plates. J. Bulas-Cruz et al. proposed a straight-line-based real-time license plate number reading method, Waing and Aye studied a traffic violation system, and Tan et al. proposed an edge-geometry approach. Prabhakar et al. located the straight-line region with a median filter and the Hough transform and localized the license plate by morphological processing, and Saini et al. proposed a method based on wavelet transform and empirical mode decomposition (EMD). In recent years, deep learning has become a hot research direction in the field of image processing. In object detection, Ross Girshick and other scholars proposed the R-CNN algorithm, whose large amount of computation and slow speed limit its use in practical applications. The Fast R-CNN algorithm followed, improving detection speed by introducing a RoI pooling layer that combines target region extraction and classification into one network. Abdullah et al. proposed a Bangladeshi license plate recognition method based on the YOLOv3 algorithm. Kakani et al. proposed a method that segments characters using the extremal-region


method and recognizes characters with a restricted Boltzmann machine. Omar et al. use a cascade network to realize license plate recognition: license plate positioning through preprocessing and a semantic neural network, a decoder network to extract location information, and finally a convolutional neural network for character recognition. Kim et al. used a deep convolutional neural network to achieve Korean license plate recognition. In license plate character recognition, P. Comelli et al. first extracted the feature points of license plate characters and then matched them by template matching. Amir et al. used a low-pass Gaussian filter and a Laplace transform matrix to attenuate the effects of noise and lighting, and then efficiently identified license plate characters through a feedforward neural network. Zhang J. F. et al. combined color features with template matching, used vertical projection to segment characters, and compared the similarity of template characters and license plate images to identify characters [1].

2 YOLO-v5-Based License Plate Detection

The license plate recognition system adopts the YOLO-v5 model [2] as the object detection algorithm for license plate detection. For optimal performance, the hyperparameters and training parameters of the model are adjusted.

2.1 YOLO-v5 Training Process

For the license plate detection task, the CCPD_base dataset is used, which contains 7925 license plate pictures. The dataset is divided into three parts: the training set (70%), the test set (20%), and the validation set (10%). CCPD_base contains labels and images, and the network is trained with standard txt label files. The standard txt format can easily be compared and exchanged with other datasets, making it highly practical and versatile. Since only license plates need to be detected, each label file contains a single line with the category set to 0, indicating a license plate. The coordinates of the upper-left and lower-right corners of the license plate are extracted from the CCPD picture file name and converted into normalized values, and the result is saved in txt format in the labels folder. The specific calculation formulas are as follows [3]:

x_center = (x_max + x_min) / (2 · width)   (1)
y_center = (y_max + y_min) / (2 · height)   (2)
w = (x_max − x_min) / width   (3)
h = (y_max − y_min) / height   (4)


Select the images in the CCPD_base dataset as the training and validation sets, initialize the network with the weight parameters of the pre-trained model, and train the model with the Adam optimizer. The network parameters are tuned through the main hyperparameters such as weight decay, learning rate, and momentum, and measures such as learning rate decay, weight decay, and regularization terms are used to avoid overfitting. 300 epochs were trained and the optimal model was saved to obtain better training results. The learning rate is adjusted continuously during the experiments to obtain the best training results, as given by the following equation:

η(s) = η0 / (1 + s · ηd)   (5)

In training a deep learning model, some parameters must be set to optimize performance. Here the number of iterations is denoted s, the initial learning rate η0, and the learning rate decay factor ηd. To speed up convergence and suppress oscillation, momentum is added to the training process, which makes learning progress faster. Table 1 describes the specific training parameters.

Table 1. Training parameter settings

| Parameter name | Value |
|---|---|
| img | 608 * 608 |
| batch_size | 8 |
| epochs | 300 |
| Initial learning rate | 0.001 |
| Learning rate decay strategy | epoch |
| Learning rate decay steps | 3, 5, 10 |
| Learning rate decay size | 0.5, 0.1, 0.1 |
| Weight decay | 0.0005 |
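A sketch of how the inverse-time schedule of Eq. (5) could be wired up in PyTorch (η0 follows Table 1; the decay factor and the stand-in model are illustrative assumptions):

```python
import torch

model = torch.nn.Conv2d(3, 16, 3)   # stand-in for the detector network
eta0, eta_d = 0.001, 0.5            # initial LR from Table 1; decay factor assumed

optimizer = torch.optim.Adam(model.parameters(), lr=eta0)

# Eq. (5): eta(s) = eta0 / (1 + s * eta_d), expressed as a multiplier of eta0
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda s: 1.0 / (1.0 + s * eta_d))

for epoch in range(5):
    # ... forward/backward passes and optimizer.step() would go here ...
    scheduler.step()                 # advance s and decay the learning rate
    print(epoch, scheduler.get_last_lr())
```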

2.2 Experimental Hardware Environment and Parameter Settings

The input image size used in the experiment is 416 * 416, and minibatch stochastic gradient descent with momentum 0.9 is used. The initial learning rate is set to 0.001, and a segmented (stepwise) learning rate schedule is used


to decay the learning rate: when the number of iterations exceeds a specified value, the learning rate becomes 0.1 times its previous value. To improve the convergence speed of the deep network, batch normalization is carried out after each convolution operation.

2.3 Experimental Results and Analysis

The YOLO-v5 model was trained on the CCPD_base dataset, and after 300 epochs the model reached convergence. The training results are analyzed as follows:
(1) During model training, the improvement of precision and recall is very stable. After a period of training, the performance of the model reaches saturation: precision stays stably above 87%, and recall stays around 90%. The mean average precision at IoU 0.5 (mAP@0.5) and the stricter mAP@0.5:0.95 are also maintained at a high level, stably around 0.7 and above 0.75, respectively.
(2) Training produced two weight files, best.pt and last.pt, both 13.6 MB in size. In practical applications, the recognition speed of this model is very fast. In testing on a GTX 1080 graphics card, the detection rate on the CCPD_base license plate data reached 100% and the license plate recognition accuracy was 87% (Fig. 1). The confusion matrix (Fig. 2) further confirms the model's accuracy and reliability; the training curves (Fig. 3) show the error decreasing and the accuracy gradually increasing, and the experimental results on the validation set (Fig. 4) further verify the excellent performance of the YOLO-v5 model in license plate detection. The speed of the model basically meets the requirements of real-time detection. These experimental results show that the performance of the model is excellent.

Fig. 1. YOLO license plate recognition accuracy


Fig. 2. Confusion matrix results

Fig. 3. Changes during training

3 License Plate Recognition Based on LPRNet

3.1 LPRNet Training Process

The LPRNet model [4–6] is used as the recognition algorithm for license plates, and its hyperparameters and training parameters are optimized to improve performance. During training, a dataset of about 7925 Chinese license plates was used to obtain high recognition accuracy. In addition, in view of the runtime performance requirements of practical applications, the LPRNet base model is simplified by modifying all pooling layers to 2 * 2 strides, which reduces the size of the intermediate feature maps and the total inference cost. The computational overhead and accuracy of the simplified and base LPRNet models are compared on different datasets, and the results show that the simplified model greatly improves performance and efficiency, providing a better choice for practical applications, as shown in Table 2.

Fig. 4. Actual identification of license plates

Table 2. Recognition accuracy of different models

| Model | Model Size | Inference Speed (FPS) | Recognition Accuracy (character level) |
|---|---|---|---|
| LPRNet baseline | 7.4 MB | 24.1 | 95.2% |
| LPRNet basic | 3.4 MB | 90.9 | 95.4% |
| LPRNet reduced | 2.1 MB | 148.8 | 95.3% |

3.2 Network Structure

The YOLO-v5 model was used to infer 2000 license plate pictures, and the detected license plate boxes were cropped out as the training dataset. The backbone network of the LPRNet model consists of 3 convolutional layers and 3 custom base modules, where each base module includes 4 convolutional layers and a feature output layer, using batch normalization and ReLU activations. The input of the network is a 94 * 24 pixel image; the output layer is a convolutional layer producing an 18 * 68 matrix, which is output after average pooling over the second dimension. To prevent the indirect errors caused by character segmentation, the CTC loss function is used, and beam search is used to obtain the optimal output sequence.

Design of License Plate Recognition System

277

The CTC (Connectionist Temporal Classification) loss function is often used in character recognition pipelines that combine LSTM, Seq2Seq, or CNN encoders with CTC, so that recognition requires no explicit character segmentation. After the LPRNet backbone extracts features, the license plate sequence is obtained through a series of convolution kernels. Since the output length of LPRNet differs from the length of the target sequence, the CTC loss function is used. With a Softmax loss, one would need to mark the position of every character in every training image so that each character corresponds to one output column, which wastes a great deal of time. CTC instead handles the misalignment between network outputs and labels, and computes the text probability through the following formula:

L_CTC(X, W) = −log Σ_{C:κ(C)=W} p(C | X) = −log Σ_{C:κ(C)=W} Π_{t=1}^{T} p(c_t | X)   (6)

Given the input X, the probability of label W is obtained by summing over every path C that maps to W under the collapsing transformation κ, where t indexes the time steps: the probabilities of the per-step characters along a path are multiplied, the path probabilities are summed, and the negative logarithm of the sum gives the CTC loss.
Beam search is a heuristic search algorithm that evaluates each node through a heuristic function. It is often used when the search space is large: to reduce space and time consumption, poor-quality nodes are pruned during depth expansion and only the best beam-width nodes are retained. The search builds a tree breadth-first; in each layer, nodes are ordered by probability, the beam-width best nodes are retained and expanded into the next layer, and the remaining nodes are discarded, improving search efficiency. LPRNet uses beam search to obtain the top n sequences with the highest probability and returns the first that successfully matches a template set based on the motor vehicle number plate standard; determined by the beam_size parameter, n is the number of candidates retained at each step.

3.3 Experimental Hardware Environment and Parameter Settings

The LPRNet model is trained with the PyTorch framework. The Adam optimizer is used to optimize the model parameters with a batch size of 32, an initial learning rate of 0.001, and a gradient noise figure of 0.001. After every 100 iterations the learning rate is reduced by a factor of 10, and the network is trained for a total of 250 iterations. Data augmentation with stochastic affine transformations such as rotation, scaling, and translation was used in the experiments.

3.4 Analysis of Experimental Results

The application of the LPRNet model to license plate recognition is explored, and the model is optimized by gradually adjusting its parameters, increasing the dataset size, and applying data augmentation. Several experiments show that a well-performing LPRNet model is obtained:


its error decreases and its accuracy gradually improves, reaching a recognition accuracy of 97.04% on the CCPD_base dataset (Fig. 5); the experimental results further confirm the accuracy and reliability of the model. In the license plate recognition system, the original image is first processed by the pretrained YOLO model, which recognizes the target objects in the picture and generates the corresponding bounding boxes. When a license plate area is identified, the YOLO model marks a prediction box and displays the object type and confidence on it (as shown in Fig. 6). To record the recognition results, the system generates a txt file containing the prediction box information, such as box coordinates, object type, and confidence. Next, the system performs license plate recognition on the plate area detected by the YOLO model. Before the data is handed over, it is normalized and scaled so that its format and dimensions match the requirements of the LPRNet model; the license plate position information and the license plate sub-image are passed to the LPRNet model as parameters. The LPRNet model then receives this information and recognizes the license plate (as shown in Fig. 7).

Fig. 5. The accuracy of LPRNet on the CCPD_base dataset

Fig. 6. YOLO model detection results


Fig. 7. License plate number identified by the LPRNet model

4 Conclusion

The YOLO algorithm is used for license plate localization; trained on a dataset of 7925 license plate images, it achieves 87% recognition accuracy and can accurately locate images containing multiple license plates and large tilt angles. The improved LPRNet network model is applied to license plate character recognition, and a sequence-based recognition method is proposed to address the character imbalance problem. Experimental verification shows that the method reaches 97.04% recognition accuracy on a test set of 150,344 license plate images. The two algorithms, YOLOv5 and LPRNet, are combined and applied to the field of license plate recognition.

References
1. Zhang, H.: Research on license plate recognition technology based on classification BP network. North University of China (2011)
2. Zhou, Y.: Research on human behavior recognition algorithm based on multi-scale CNN features. Southwest University (2018)
3. Zhao, Y.: Research and application of license plate recognition technology based on deep learning. Donghua University (2019)
4. Zhang, Y.: Design and implementation of video surveillance management platform based on image processing. Xidian University (2020)
5. Chen, K., Zhu, Z., Deng, X.: Deep learning for multi-scale object detection: a survey. J. Softw. 32(4), 1201–1227 (2021)
6. Zherzdev, S., Gruzdev, A.: LPRNet: license plate recognition via deep neural networks (2018)

Construction and Application of Knowledge Graph in the Field of Medical Food Supplements

Ming Hui Zhang1, Wei Hong Yu1, and Ming Chao Zhang2(B)

1 School of Information Science and Technology, Sanda University, Shanghai, China
2 Changchun Humanities and Sciences College, Changchun, China
[email protected]

Abstract. Taking the field of medical food supplements as an example, this paper analyzes the technologies for constructing a domain knowledge graph and its intelligent question answering application, studies in depth the semantic analysis techniques used in intelligent question answering, verifies the validity of the model and algorithm, and finally builds an intelligent question answering application for the medical food supplement domain. The application supports structured data graphing, graph construction and maintenance, and intelligent question answering that returns user-readable answers, helping people obtain relevant knowledge faster and more accurately, which gives it practical value.

Keywords: knowledge graph · intelligent question answering · BERT · natural language processing

1 Introduction

First proposed by Google in 2012, the knowledge graph is a semantics-based way of organizing knowledge that abstracts real-world entities, concepts, attributes, and their relationships, and represents and stores them graphically [1]. In recent years, knowledge graphs have been widely used in natural language processing, intelligent question answering, recommendation systems, search engines, and other fields. Agrawal, Deng, et al. propose a bottom-up approach to collating entity–relationship pairs and constructing a knowledge graph and question answering model for cybersecurity education; students found these tools helpful in learning core concepts and used the knowledge graph as a visual reference to cross-check their project tasks [2]. Feng, Tang, Gao, et al. integrate genomic datasets and annotations from more than 30 consortia and portals, covering 34.71 billion genomic entities, 3.63 billion relationships, and 900 million entity and relationship attributes; the graph visually connects siloed data matrices, supports efficient queries for scientific discovery, emphasizes relationships between genomic entities, turns complex analyses across multiple genomic entities and relationships into code-free queries, and facilitates future data-driven genomic discovery [3]. In 2018, Yang Yuji et al. conducted systematic research on the construction of


domain knowledge graphs, proposed a rapid construction method for domain knowledge graphs, and used this method to construct a Chinese subject knowledge graph containing 670,000 entities and 14.12 million facts using nine subjects of basic education in China as the original data [4].

2 GPLinker Intent Recognition and Slot Analysis Model

Based on a BERT-wwm-GlobalPointer intent recognition and slot analysis model, GPLinker completes the joint task of multi-intent recognition and slot filling.

2.1 The Overall Structure of the Model

Firstly, the semantic feature vector of the input question is obtained with the BERT-wwm pre-trained language model. The encoding output by the pre-trained model is then used as input, and GlobalPointer recognizes intents and slots using the idea of global normalization, which solves the problem of entity nesting. The analyzed slots and trigger words are then fully matched to divide the intents, completing the joint task of multi-intent recognition and slot filling.

BERT-wwm Pre-Trained Model
The model used in this experiment is BERT-wwm, a pre-trained language model based on BERT whose full name is "Chinese pre-trained BERT with Whole Word Masking". Its advantage is better handling of Chinese word segmentation, which improves the performance and generalization ability of the model. BERT-wwm also uses a larger dataset and more training steps, allowing the model to excel on a variety of Chinese natural language processing tasks. Intent recognition for single-sentence questions does not require sentence segmentation, so the initial word embedding vector can be represented as Eq. 1:

h_0 = W_s + W_p   (1)

BERT-wwm uses a 12-layer Transformer encoder, and the output vector of layer a can be expressed as Eq. 2:

h_a = Transformer(h_{a−1}), a ∈ [1, 12]   (2)
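A minimal PyTorch sketch of Eqs. (1)–(2) (the vocabulary size, hidden width, and head count are typical BERT-wwm values, assumed here rather than taken from the paper):

```python
import torch
import torch.nn as nn

vocab_size, d_model, n_layers = 21128, 768, 12   # typical BERT-wwm sizes (assumed)
W_s = nn.Embedding(vocab_size, d_model)          # token (word) embedding
W_p = nn.Embedding(512, d_model)                 # position embedding

ids = torch.randint(0, vocab_size, (1, 16))      # one tokenized question
h = W_s(ids) + W_p(torch.arange(ids.size(1)))    # Eq. (1): h_0 = W_s + W_p

layer = nn.TransformerEncoderLayer(d_model, nhead=12, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
h12 = encoder(h)                                 # Eq. (2) applied for a = 1..12
```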

GlobalPointer-Based Slot Tagging Method
GlobalPointer uses the idea of global normalization: it converts the input sentence into the set of all possible consecutive character fragments and learns a global pointer that decides whether each fragment is a trigger word or slot of any type, thereby performing the joint intent recognition and slot analysis task. The method identifies nested and non-nested entities without distinction. GlobalPointer addresses the inconsistency between training and prediction that arises when two separate modules identify the head and the tail of an entity, by treating the head and tail as one whole to discriminate [5].

Multi-Intent Search Method Based on Complete Subgraphs


In the joint intent recognition and slot analysis task, trigger words are related to slot words, and this connection can be described by an undirected graph, turning intent division into a search for complete subgraphs on the graph. The undirected graph is built as follows: when two trigger words or slot words are related, their (head, head) and (tail, tail) positions can be matched, and GlobalPointer can be used to predict this matching relationship. Since only one undirected graph needs to be constructed, the lower triangle can be masked and all edges described by the upper triangle alone.

Fig. 1. Schematic diagram of the search strategy of the complete subgraph

As shown in Fig. 1, two complete subgraphs can be found among the 9 nodes, with node D appearing in both, which means two intents can be divided that share node D. The recursive search algorithm is as follows, with an implementation sketch given after this description. Step 1: enumerate all node pairs in the graph; if all pairs are adjacent, the graph itself is complete, so return it directly. If there are non-adjacent pairs, proceed to Step 2. Step 2: for each pair of non-adjacent nodes, take each node's set of adjacent nodes (including itself) to form a subgraph, and apply Step 1 to each of these subsets. Taking the diagram above as an example, A and E are a pair of non-adjacent nodes whose adjacent sets are {A, B, C, D} and {D, E, F, G, H, I}, respectively; searching these sets for non-adjacent pairs finds none, so {A, B, C, D} and {D, E, F, G, H, I} are complete subgraphs. Note that the result does not depend on the order in which non-adjacent pairs are processed, because the same procedure is applied to all of them. The process may produce many duplicate results, but it never misses a result and is unaffected by recognition order; duplicates are removed at the end. In addition, each search only covers nodes under the same intent, and in most cases the nodes contained in one intent number only in the single digits, so although the algorithm looks complicated, it actually runs very fast. The GPLinker model is therefore relatively simple, efficient, and complete, and in theory does not suffer from exposure bias [6].
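A sketch of the recursive search just described (the graph below reproduces the Fig. 1 example under the assumption that its two groups are fully connected cliques sharing node D):

```python
from itertools import combinations

def complete_subgraphs(nodes, edges):
    """Step 1/Step 2 recursion: a group with no non-adjacent pair is complete;
    otherwise recurse on the closed neighbourhoods of every non-adjacent pair,
    deduplicating the (possibly repeated) results at the end."""
    adj = {n: {n} for n in nodes}
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)

    results = set()

    def search(group):
        nonadjacent = [(u, v) for u, v in combinations(sorted(group), 2)
                       if v not in adj[u]]
        if not nonadjacent:                      # Step 1: already complete
            results.add(frozenset(group))
            return
        for u, v in nonadjacent:                 # Step 2: split and recurse
            search(group & adj[u])
            search(group & adj[v])

    search(set(nodes))
    return [sorted(g) for g in results]

clique1, clique2 = list("ABCD"), list("DEFGHI")
edges = list(combinations(clique1, 2)) + list(combinations(clique2, 2))
print(complete_subgraphs(list("ABCDEFGHI"), edges))
# -> [['A', 'B', 'C', 'D'], ['D', 'E', 'F', 'G', 'H', 'I']] (order may vary)
```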

Loss Function
GlobalPointer in effect solves a "choose k out of n(n + 1)/2" multi-label classification problem, where n is usually the length of the text, which is far larger than the number of entities of each type in the sentence. This creates an imbalanced classification. For this purpose, this paper uses the multi-label cross-entropy as the loss function [7], calculated as shown in Eq. 3:

L = log(1 + Σ_{i∈P} e^{−s_i}) + log(1 + Σ_{i∈N} e^{s_i})   (3)

Assuming the entity type being classified is a, P and N denote the set of head–tail pairs of all entities of type a and the set of head–tail pairs that are not entities or not of type a, respectively.
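A commonly used PyTorch realization of this loss, following the GlobalPointer references [5, 6] (a sketch rather than the authors' exact code; y_true is a 0/1 indicator of the P/N sets and y_pred holds the raw scores s_i):

```python
import torch

def multilabel_categorical_crossentropy(y_true, y_pred):
    """Eq. (3): log(1 + sum_{i in P} e^{-s_i}) + log(1 + sum_{i in N} e^{s_i})."""
    y_pred = (1 - 2 * y_true) * y_pred            # negate scores of positives
    y_pred_neg = y_pred - y_true * 1e12           # mask positives out of the N term
    y_pred_pos = y_pred - (1 - y_true) * 1e12     # mask negatives out of the P term
    zeros = torch.zeros_like(y_pred[..., :1])     # the "1 +" inside each log
    neg_loss = torch.logsumexp(torch.cat([y_pred_neg, zeros], dim=-1), dim=-1)
    pos_loss = torch.logsumexp(torch.cat([y_pred_pos, zeros], dim=-1), dim=-1)
    return neg_loss + pos_loss

scores = torch.randn(2, 10)                       # candidate-span scores
labels = torch.zeros(2, 10); labels[:, :2] = 1    # first two spans are entities
print(multilabel_categorical_crossentropy(labels, scores).mean())
```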

2.2 Training Data Preprocessing

Question Classification
In the joint intent recognition and slot analysis task, intents and slots are selected from a finite, agreed set, so the intents and their slots must be classified before generating training data. According to the 4 entity types and 4 relationship types in the semi-structured data, this paper divides questions in the medical food supplement domain into 14 single-intent and 18 multi-intent types, 32 question structures in total, and defines the four entity types as slots.

Dataset Generation
Datasets are generated with syntax trees. Firstly, the names of the four entity types are obtained from the semi-structured data and stored by entity type. Then, for each type of question template structure, a syntax tree is constructed and the obtained entity names are filled into the corresponding slots, completing the construction of the syntax tree. After that, the sentence structure generated by the syntax tree is defined, and the intent type, trigger word, slot word, and corresponding indexes are saved with the statement. These 32 types of syntax trees are traversed a certain number of times to generate more than 20,000 items each of single-intent and multi-intent data, which are then combined half-and-half into a dataset of more than 20,000 items. Finally, the dataset is split into test, validation, and training sets at a ratio of 1:3:6, completing the preparation of structured model training data. The format and content of the generated results are as follows:
Single-intent training statement: {"id": "c73cf1129f1a47c9aee4ce6ef0b8d968", "text": "Can gastric leiomyosarcoma eat floating wheat porridge", "event_list": [{"event_type": "disease_donot_food", "trigger": "OK", "trigger_start_index": 6, "arguments": [{"argument": "gastric leiomyosarcoma", "role": "disease", "argument_start_index": 0}, {"argument": "floating wheat porridge", "role": "food", "argument_start_index": 11}]}]}
Multi-intent training statement: {"id": "fced6afdd35240b48a6eba7d2fb2c059", "text": "What are the foods that should be eaten in the elderly with hyperkalemia", "event_list": [{"event_type": "disease_not_food", "trigger": "forbidden", "trigger_start_index": 7, "arguments": [{"argument": "Hyperkalemia in the elderly", "role": "disease", "argument_start_index": 0}]}, {"event_type": "disease_do_food", "trigger": "Eatable", "trigger_start_index": 9, "arguments": [{"argument": "Hyperkalemia in the elderly", "role": "disease", "argument_start_index": 0}]}]}


2.3 Parameter Settings

Considering the influence of different BERT pre-trained model versions and model parameters on the GPLinker event extraction model, the following model parameters were determined after multiple comparative experiments: the total number of training epochs (train_epochs) is set to 8 and the batch size (batch_size) to 32. The main experimental parameter settings are shown in Table 1.

Table 1. Model parameter settings

| Parameter Name | Value |
|---|---|
| train_epochs | 8 |
| learning_rate | 2e-5 |
| batch_size | 32 |
| max_len | 128 |
| dropout_rate | 0.1 |

Here, train_epochs is the number of rounds for which the neural network is trained, with the entire dataset used in each round. learning_rate is the learning rate used when training the network, controlling the step size of each weight update: a larger learning rate means larger updates but may prevent convergence, while a smaller one means smaller updates but more time to reach the optimum. dropout_rate is the dropout ratio used during training, i.e., the proportion of neurons randomly discarded in each layer's output to prevent overfitting. batch_size is the number of samples used in each iteration, i.e., the samples are divided into small batches for training; in general, the larger the batch_size, the greater the memory consumption, and the training speed changes accordingly.

2.4 Experimental Evaluation Indicators

Recall R measures the model's ability to find all relevant samples: the proportion of all truly positive samples that are correctly predicted as positive. The higher the recall, the greater the probability that a true positive case is detected, as shown in Eq. 4:

R = TP / (TP + FN)   (4)

Precision P measures how accurate the model's positive predictions are: the proportion of samples predicted as positive that are truly positive. The higher the precision, the higher the share of true positives among the positive cases predicted by the model, as shown in Eq. 5:

P = TP / (TP + FP)   (5)

The F1 score combines recall and precision as their harmonic mean, as shown in Eq. 6:

F1 = 2PR / (P + R)   (6)

In the medical food supplement Q&A scenario, missing a relevant case may have serious consequences for patients, so high recall may be more important than high precision; when two settings yield the same F1 value, this article therefore prefers the model parameters with the higher recall R.

2.5 Experimental Results and Analysis

To demonstrate the effectiveness of the GPLinker event extraction model, controlled comparisons are made along two dimensions: the training results of different versions of the BERT pre-trained model, and the training results under different model parameters.

Comparison of Experimental Results Under Various Parameters
The experiments cover two aspects: the batch_size (batch size) and num_train_epochs (training rounds) parameters. batch_size refers to the number of samples contained in each batch; data are usually processed in batches during training, and the size of each batch is controlled by this parameter. batch_size has a threshold beyond which model performance easily falls into sharp minima, resulting in overfitting and poor generalization. num_train_epochs specifies how many times the training dataset is traversed (i.e., the number of epochs); it controls the amount of learning and the training time, and often needs to be adjusted according to dataset size, model complexity, and compute resources. Too small a num_train_epochs can cause the model to underfit, while too large a value can cause it to overfit.
(1) The influence of batch size on experimental results. This experiment is carried out with num_train_epochs set to 5 and 10, respectively; following the controlled-variable idea, only the batch_size value is changed, set to 8, 16, 32, 64, 128, and 256 in turn, and the results are compared at both the intent and slot levels to select the better batch_size values. The experimental results are shown in Fig. 2 and Fig. 3. The figures show that when batch_size is set to 16 or 32, the model achieves the best results in fewer training rounds, so the subsequent comparative experiments start from batch_size values of 16 and 32.


Fig. 2. Influence of batch size on evaluation index (number of iterations is 5)

Fig. 3. Influence of batch size on evaluation index (number of iterations is 10)

(2) The influence of training rounds on experimental results. Based on the batch size comparison, this experiment is carried out with batch_size set to 16 and 32, respectively, selecting the optimal num_train_epochs value within 10 and determining the better batch_size between 16 and 32. The experimental results are shown in Fig. 4 and Fig. 5. The figures show that with batch_size 16, the F1 value of the model gradually rises and then fluctuates within 10 iterations, while with batch_size 32 it gradually rises and then declines; in both cases, the F1 values at the intent and slot levels reach their maximum at num_train_epochs = 8 (higher by about 0.5% with batch_size 32 and about 0.3% with batch_size 16). In the medical food supplement Q&A scenario, missing a relevant case may have serious consequences, i.e., high recall is more important than high precision, and with batch_size 32 the recall R at both the intent and slot levels is notably higher, by about 2%, than with batch_size 16. Therefore, num_train_epochs is finally set to 8 and batch_size to 32.

Fig. 4. Influence of training rounds on evaluation index (batch size is 16)

Fig. 5. Influence of training rounds on evaluation index (batch size is 32)

3 Design and Implementation of Intelligent Question Answering Application Based on Domain Knowledge Graph

3.1 Domain Knowledge Graph Construction Method and Implementation

Data Source Preparation
Focusing on the field of medical food supplements, the project uses semi-structured data from an open-source project as the source. After data preprocessing, a knowledge graph is built with diseases at its core, covering food, drugs, symptoms, and diseases: 4 entity types with a scale of about 23,000 entities and 4 relationship types with a scale of about 100,000 relationships.

Construction of the Knowledge Graph in the Field of Medical Food Supplements
To store the complex relationships between data, provide highly flexible queries, and persist the knowledge, the Neo4j graph database is used, storing data as nodes and edges. Nodes represent entities such as food, diseases, and symptoms; edges represent relationships between entities, such as disease symptoms, recommended recipes for a disease, and common medicines for a disease. The Neo4j graph database emphasizes the connectivity between entities, has good traversal performance, handles complex relational queries well, and supports knowledge reasoning.

3.2 Q&A Application Design

Overall Process Design of the Q&A Application
This paper builds an application example of a knowledge graph platform that includes the construction, exploration, and maintenance of a medical food supplement knowledge graph, together with an intelligent question answering application based on the platform and the GPLinker model. The whole framework consists of three modules: a domain knowledge graph construction module, a question analysis module, and a knowledge graph retrieval module. In the construction module, the semi-structured text corpus is first preprocessed into structured data and stored in MySQL; the data and indexes are then written to the Neo4j graph database and the ES (Elasticsearch) database, respectively, through the graph construction function. As the knowledge base, the graph database supports exploration and supplementary modification of knowledge, and serves as the base of the graph retrieval module, providing input parameters for the graph retrieval component API. The ES database stores and retrieves information and indexes of the entities and relationships in the graph database to achieve fuzzy matching of entity relationships. In the question parsing module, the question is passed to the question parsing API; intent recognition and slot filling are performed by the GPLinker model, and the generated graph search conditions are passed to the graph retrieval component. In the knowledge graph retrieval module, the retrieval component first performs fuzzy matching in ES according to the input parameters provided by the question parsing API to obtain candidate sets for each entity and relationship; it then assembles Neo4j query statements from the candidate sets, performs knowledge retrieval, assembles the output according to answer templates designed for the different question intents, and finally outputs user-readable answers.

Q&A Application Function Module Design
The application is divided into a data import and storage module, a graph construction application module, and an intelligent question answering application module. The data import and storage module stores the structured data used to build the graph. The graph construction application module uses the stored structured data to build the domain knowledge graph and provides users with services for displaying, maintaining, and configuring the answer templates. The intelligent question answering application module provides users with knowledge retrieval in the field of medical food supplements in the form of intelligent question answering. A data source creation function lets users create a schema in MySQL to store the structured data.
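As an illustration of the node-and-edge storage described in Sect. 3.1, a minimal sketch using the official Neo4j Python driver (neo4j 5.x; the connection details and the Disease/Food labels and RECOMMENDED_FOOD relationship type are assumptions, since the paper does not name its exact schema):

```python
from neo4j import GraphDatabase  # pip install neo4j

URI, AUTH = "bolt://localhost:7687", ("neo4j", "password")  # assumed credentials

def add_recommended_food(tx, disease, food):
    # MERGE keeps entities unique and mirrors the
    # "recommended recipes for a disease" edge described above
    tx.run(
        "MERGE (d:Disease {name: $disease}) "
        "MERGE (f:Food {name: $food}) "
        "MERGE (d)-[:RECOMMENDED_FOOD]->(f)",
        disease=disease, food=food,
    )

with GraphDatabase.driver(URI, auth=AUTH) as driver:
    with driver.session() as session:
        session.execute_write(add_recommended_food,
                              "hyperkalemia in the elderly",
                              "floating wheat porridge")
```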

4 Conclusion

Starting from the field of medical food supplements, the GPLinker intent recognition and slot analysis model based on BERT-wwm and GlobalPointer is used to realize semantic analysis of questions, solving the problems of template-based question answering: the heavy workload of manually written templates, low flexibility, and insufficient accuracy in determining the specific intent of user input. Using the idea of controlled variables, the model was compared across different model parameters and different pre-trained models, and the parameter settings of the model were determined. The domain knowledge graph and the intent recognition and slot analysis model for medical food supplements are integrated, and an intelligent question answering application based on the domain knowledge graph is designed and implemented. The relatively complete knowledge graph of the medical food supplement domain, the fuzzy matching function of the combined Neo4j and ES query mode, the GPLinker model, and the retrieval component configuration function together improve the performance of the question answering application and the accuracy and readability of its answers. The implemented application provides complete visual functionality from structured data import, through knowledge graph construction, display, and maintenance, to intelligent question answering; it returns highly readable answers and makes the knowledge graph convenient to accumulate and update.

References
1. Feng, F., et al.: GenomicKB: a knowledge graph for the human genome. Nucleic Acids Res. 51(D1), D950–D956 (2022)
2. Yuji, Y., Bin, X., Jiawei, H., Meihan, T., Peng, Z., Li, Z.: An accurate and efficient method for constructing domain knowledge graph. J. Softw. 29(10), 2931–2947 (2018)
3. Weizenbaum, J.: ELIZA: a computer program for the study of natural language communication between man and machine. Commun. ACM 9(1), 36–45 (1966)
4. Wallace, R.: Artificial linguistic internet computer entity (ALICE). City (1995)
5. Su, J.: GlobalPointer: event joint extraction based on GlobalPointer. https://spaces.ac.cn/archives/8926 (2022)
6. Su, J., Murtadha, A., Pan, S., et al.: Global Pointer: novel efficient span-based approach for named entity recognition. arXiv preprint arXiv:2208.03054 (2022)
7. Yuan, Q., Shi, Y., Liu, J., Yu, G., Chen, B.: Construction method of kitchen diet knowledge map. In: Proceedings of 2020 China Household Appliance Technology Conference, pp. 1385–1390 (2020)

Practice and Exploration of Discrete Mathematics Teaching in Applied Undergraduate Colleges Under the Background of New Engineering

Ming Chao Zhang1 and Ming Hui Zhang2(B)

1 Changchun Humanities and Sciences College, Changchun, China
2 School of Information Science and Technology, Sanda University, Shanghai, China
[email protected]

Abstract. "New Engineering" cultivates interdisciplinary compound talents with solid professional knowledge and computational thinking ability, and "Discrete Mathematics" is the core course for developing computational thinking skills. Under the background of new engineering, this paper practices and explores the "Discrete Mathematics" course from the aspects of updating teaching concepts, rationally setting teaching content, reforming teaching models, optimizing the curriculum evaluation system, and improving teachers' ability and quality, in order to improve students' computational thinking abilities such as abstract generalization and representation, formal reasoning and proof, model construction and solution, and divergent thinking, and to adapt the teaching of "Discrete Mathematics" to applied undergraduate colleges.

Keywords: New Engineering · Applied Undergraduate · Discrete Mathematics · Practice and Exploration

1 Introduction

The concept of "new engineering" began with a new idea for cultivating higher education talent proposed by China's Ministry of Education in 2017. The construction of new engineering takes innovation, integration, and the full cycle as its main educational purposes; it requires establishing a new educational concept and building a new engineering professional structure through three paths: incremental optimization, stock adjustment, and cross-integration. It further calls for exploring and establishing a new model of new engineering development, shifting from discipline orientation to industrial demand, from professional segmentation to cross-border integration, and from adapting to services to supporting and leading [1]. The new engineering construction guide proposes continuously deepening engineering education reform in seven aspects: clear goals, concept leadership, structural optimization, model innovation, quality assurance, classified development, and the formation of demonstration results [2].


Application-oriented undergraduate colleges are required to cultivate talents who combine knowledge and technology with applied practice, pay attention to both ability and quality, and meet society's demand for high-quality application-oriented talents.

2 The Teaching Concept of "Discrete Mathematics" in Applied Undergraduate Colleges Under the Background of "New Engineering"

"New Engineering" sets the direction of engineering education reform based on the new needs of development, new competition, and the new requirements of education in the new era and new environment; it is an optimization, reengineering, and content upgrade of engineering discipline construction, as well as an exploration of the training goals, methods, and content for future engineering students [3]. With the Internet and industrial intelligence at its core, "New Engineering" cultivates interdisciplinary compound talents who adapt to and lead industrial development, laying the foundation for cross-border integration and for solving complex engineering problems [4]. The compound talents it cultivates not only have solid professional knowledge but also computational thinking capabilities such as abstract representation, logical thinking, formal proof, model construction, and discrete solution of professional knowledge [5]. "Discrete Mathematics" is the core course for cultivating computational thinking ability and a bridge between pure mathematics and computer science; it develops the ability of computer science and big data students to explore potential relationships among discrete objects, construct abstract models, design discrete algorithms, and assess the feasibility of solutions [6]. Under the background of "new engineering", applied undergraduate education is based on the intersection of multidisciplinary theories, guided by engineering applications, and uses artificial intelligence and big data analysis to solve complex engineering problems, cultivating interdisciplinary computational thinking, divergent thinking, and innovation ability [7]. The course team has continuously explored and formed a teaching concept for "Discrete Mathematics" under the background of "new engineering": student cultivation adheres to "student learning as the center", cultivates students' computational thinking ability, integrates teaching content, improves the diversity and novelty of teaching methods, emphasizes the combination of theory and practice, practical application, and multidisciplinary intersection [8], focuses on improving students' learning interest, participation, learning effect, and ability cultivation, and stimulates creativity.


3 Discrete Mathematics Teaching Practice in Applied Undergraduate Colleges Under the Background of “New Engineering”

Under the background of “new engineering”, application-oriented undergraduate colleges emphasize the acquisition of application skills: how students apply the knowledge they have learned and how they improve their hands-on ability. Our university is an applied undergraduate college, and the “Discrete Mathematics” course is offered to second-year students in the computer science category. The course covers many knowledge points with strong theory and scattered, abstract content, and students do not know where the material can be applied during the learning process, resulting in a lack of interest, enthusiasm and initiative in learning the course. In view of this teaching status of “discrete mathematics”, the course team adjusted the teaching content, methods and means, and continuously explored a teaching practice of “discrete mathematics” suitable for applied colleges under the background of “new engineering”.

3.1 Under the Background of New Engineering, Reform the Teaching Content of “Discrete Mathematics” Courses to Adapt to the Construction of the Knowledge System of Applied Talents

“Discrete Mathematics” is the core basic course of computer science. To cultivate interdisciplinary compound talents, new engineering requires computational thinking capabilities across disciplines: symbolizing professional statements, performing profession-related formal reasoning, deduction and proof, abstractly representing relevant professional knowledge, logical thinking, innovative model construction and discrete solution, mining object relationships, and computer representation. Facing these cultivation needs, the teaching content of discrete mathematics should inherit the advantages of traditional engineering teaching, combine computer science and industrial intelligence, and cultivate the inductive ability, generalization ability, computational thinking and engineering practice ability of interdisciplinary students [7]. According to the 48 hours allotted to the “discrete mathematics” course at our school, the teaching content is reformed and integrated; the teaching balances attention to fundamentals with highlighting key points, takes into account the training goals of application-oriented talents and the needs of subsequent professional courses, and establishes a knowledge system for application-oriented talents. Teaching under the background of “new engineering” needs to integrate the latest theoretical knowledge, technologies and research results into the course teaching process, so as to keep pace with the times and constantly update the teaching content. In teaching, the course team modularizes the knowledge system of the course, integrates the knowledge points of the teaching content, reduces the teaching difficulty of some chapters, integrates and compresses the classic content of discrete mathematics, weakens difficult topics and partial proofs, and combines the latest technical theories and research results.


The teaching focuses on explaining basic theories, basic concepts and basic principles in connection with practical needs and application backgrounds, enabling students to truly appreciate that the theoretical knowledge of their majors has guiding value in practice [1, 5]. In terms of teaching content, the course is committed to cultivating students’ ability to abstract different problems at a high level, so that students can grasp the essence of things and in turn guide in-depth application in practice, with a clear focus on key points. The mathematical logic section enables students to master the solution process from “natural language logic” to “symbolic calculus model”, focusing on the propositions used to express the constraints of a model and how their truth values are determined, while weakening the formal calculus system of logic. The set theory part lets students lay a good foundation for the epistemology of discrete systems, moving from the correspondence between “special relations” and “set structure” to an abstract understanding of the “phenomenon” and “essence” of discrete systems; it focuses on the sets used to express model elements, the relations and functions used to express model structure, and how to determine correspondences between the elements of different sets, without discussing axiomatic set theory. The graph theory section allows students to experience the engineering process of “discrete mathematics” through the specific field of a “symbolic calculus system”, focusing on the use of vertices, edges and their incidences to express the structure of a model, and on the practical application of special graphs such as trees, planar graphs, Euler graphs and Hamiltonian graphs [9, 10]. Reforming the teaching content of “discrete mathematics” courses, reorganizing and optimizing the knowledge modules, and highlighting the potential connections between them make the modules an organic, mutually supporting whole, cultivate students’ ability to analyze problems, stimulate students to put forward innovative ideas from different angles, and adapt the course to the construction of the applied-talent knowledge system.

3.2 Under the Background of New Engineering, Reform the Teaching Methods of “Discrete Mathematics” Courses and Improve the Interdisciplinary Ability of Applied Talents

Apply Knowledge Maps in Teaching to Build a Multidisciplinary Knowledge System
As an important core basic course of computer science, discrete mathematics should adapt its knowledge structure, teaching content and teaching methods to the cultivation of interdisciplinary divergent thinking and innovation ability. Interdisciplinary divergent thinking requires building the knowledge system of the discipline and, more importantly, integrating related disciplines to cultivate students’ innovation ability. In teaching, according to the lesson time and teaching objectives and on the basis of classroom teaching methods, the content and knowledge modules taught are summarized and sorted, a knowledge map is formed, and the intrinsic relationships between knowledge modules are expressed vividly and concretely so as to improve learning efficiency and stimulate students’ divergent thinking. Hierarchical lines connect the knowledge points, building the internal relationships between them, promoting the integration of knowledge, and forming a clear knowledge structure map. Figure 1 shows the relationships between the modules of the “Discrete Mathematics” course and the map of related knowledge points within the modules.
From the relationships between the various knowledge modules, it can be seen that set theory and mathematical logic, as the theoretical basis of discrete mathematics, give students a macro understanding of the overall knowledge framework of the course. In classroom teaching, a hierarchically structured knowledge system is constructed with the help of the knowledge map, which not only gives the overall architecture but also represents the internal relationships between the knowledge subsystems and the specific logical composition and characteristics of the concepts, reasoning and proofs within each subsystem. At the end of each chapter, students are guided to construct their own mind maps, which improves their ability to summarize and synthesize knowledge points while also helping them sort out solution ideas and gradually refine their knowledge [11].

[Figure 1 here. The map groups the course into three modules: mathematical logic (propositional logic, predicate logic and its symbolization, reasoning theory), set theory (set concepts; relations, including relation definitions, relation matrices and binary, equivalence and partial-order relations; functions, including characteristic and membership functions), and graph theory (directed and undirected graphs, subgraphs, Euler, Hamiltonian and bipartite graphs; directed and undirected trees, tree traversal, binary and optimal binary trees, minimal spanning trees).]

Fig. 1. Relationship diagram of knowledge modules in discrete mathematics


Discrete mathematics is widely used across disciplines, especially in the field of computer science and technology, and it is an essential prerequisite for many professional courses in computer science, such as operating systems, computer communication and networks, artificial intelligence, databases, and algorithm design and analysis. Discrete mathematics knowledge is also widely applied in data mining, network security, artificial intelligence, computer graphics, and computer vision. To help students build interdisciplinary divergent thinking and integrate related disciplines, the teaching process sorts out the connections between different courses, introduces more course relevance, strengthens the vertical connections between professional basic courses, highlights the application of knowledge, and builds a knowledge map between discrete mathematics and other professional courses, so that students can understand how the discrete mathematics knowledge system is applied in professional courses and learn to integrate them. As shown in Fig. 2, a knowledge graph connecting discrete mathematics with several core courses of big data science and technology is constructed. Through it, students can experience the effective connection with professional courses, understand the internal logic and relationships of professional courses, and develop divergent and innovative thinking.

[Figure 2 here. The graph connects set theory, mathematical logic and graph theory to professional courses: Data Structure (element and set storage logic, abstract data types of graphs and trees), Principles of Big Data Technology (big data storage, management, processing and analysis; binary relations and their properties; select, project and join operations), Computer Operating System (processor, storage, file and device management; process scheduling mechanisms), Principles of Computer Circuits (logic design, combination-lock problems, IC/PCB design; principal disjunctive normal forms, truth tables, inference), Computer Networks (network topology; graph connectivity and adjacency matrices), Computer Principles (HDD storage design), and Algorithm Analysis and Design (greedy and related techniques; directed/undirected graphs, spanning trees, minimal spanning trees, shortest paths).]

Fig. 2. Discrete mathematics and knowledge graph of some professional courses


Optimize Teaching Resources and Broaden Students’ Learning Channels
The new engineering education concept requires focusing on mobilizing and improving students’ interest in learning, improving their participation, improving their learning effects, and paying attention to the cultivation of learning ability. The course team optimizes the existing teaching resources, gradually improves the online teaching resources, and broadens the channels for students’ independent learning. The specific methods are as follows: publish online learning resources such as PPTs, videos and documents organized by knowledge module; build a question bank with a variety of subjective and objective questions to facilitate teaching tests, homework and other activities; establish extended learning resources by collecting and sharing excellent curriculum resources (videos, etc.) to deepen students’ understanding; and introduce typical algorithms on the online teaching platform to deepen students’ understanding of discrete mathematics knowledge points [12]. At present, the teaching resources of the online platform are still being improved and constructed.

Build a “Student-Centered” Teaching Model to Tap Students’ Potential
In the traditional teaching mode, students are basically in a state of passive acceptance, and their autonomy and enthusiasm are not high. “New Engineering” requires attention to mobilizing and improving students’ interest in learning, improving their participation, improving their learning effects, and paying attention to the cultivation of learning ability. Teaching should therefore follow the principle of “teacher-led, student-oriented”, form a “student-centered” teaching model, teach students “how to learn”, encourage discussion and communication among students [13], cultivate students’ innovative spirit, learning ability and practical ability, and tap students’ potential.

Build a Team of Qualified Teachers to Achieve Mutual Learning in Teaching
Under the background of new engineering, the organic combination of engineering and computing puts forward new requirements for computational thinking, which poses new challenges to the knowledge structure, knowledge application and logical-thinking training in “discrete mathematics” teaching and places higher requirements on teachers. The enrichment and improvement of teaching content and methods are inseparable from the accumulation of teachers’ knowledge. Teachers’ own abilities, knowledge level and quality of cultivation subtly influence students [8], and the continuous improvement of teachers’ knowledge has a direct demonstration and guiding effect on students, striving to achieve mutual learning in teaching.

Establish Multiple Evaluation Methods and Optimize the Course Evaluation System
Process-based assessment methods are implemented and a diversified evaluation system is established, with the learning process and the initiative, depth and effect of participation as the main dimensions of evaluation. The overall course evaluation score = usual grade + final grade.


The usual grade is calculated proportionally from attendance, preview, discussion, class notes, classroom question answering, homework, online teaching platform learning time, test results, chapter tests, and so on. Attendance grades are based on classroom attendance; pre-study grades on pre-study reports and mind-map completion; class-note grades on note-taking; and online assignment grades on online tests and homework completion. Relative evaluation and assessment criteria are established for each part of the score (including the number of completions, the effects and the corresponding scores). The final examination is a closed-book written examination that flexibly tests students’ knowledge application and mathematical modeling abilities, including fill-in-the-blank, multiple-choice, judgment, drawing, calculation, proof and comprehensive application questions. Objective and reasonable diversified assessment methods can stimulate students’ initiative in learning, help cultivate independent learning and application abilities, enhance students’ sense of innovation, strengthen their use of course knowledge to model and solve practical problems, cultivate innovative ability, and improve students’ comprehensive abilities in finding, analyzing and solving problems and in mathematical modeling.

4 Conclusion

In view of the new requirements for talent training in application-oriented undergraduate colleges under the background of “new engineering”, a teaching reform practice has been carried out for “Discrete Mathematics”, an important core course of computer and related majors. By updating teaching concepts, reforming the teaching content, building a “student-centered” teaching model, using knowledge maps to support multidisciplinary cross-integration, establishing multiple evaluation methods, and continuously improving teachers’ ability and literacy to achieve mutual benefit in teaching, these practices and explorations stimulate students to explore and learn innovatively, cultivate students’ abstract generalization ability, divergent thinking ability and comprehensive problem analysis, form a scientific and open mode of computational thinking, and play an important role in the cultivation of application-oriented talents in the context of new engineering.

References

1. Wu, X., Gao, Y.: Research on “Discrete Mathematics” curriculum reform in the context of new engineering. J. Tonghua Normal Univ. 43(2), 136–140 (2022)
2. Ministry of Education: New engineering construction guide, 09 May 2017
3. Wang, X., Wang, R., Yang, J., Li, S.: Research on a practical teaching system for discrete mathematics as the core professional basic course in the background of new engineering. Comput. Educ. (10), 146–149 (2018)
4. Lin, J.: Reform and curriculum construction of the curriculum system of new engineering majors. Higher Educ. Teach. Res. Dyn. (1), 1–13+24 (2020)
5. Tao, L., Gong, F.: Research on the teaching of “discrete mathematics” under the background of new engineering. Sci. Technol. Wind 427(23), 41–42 (2020)


6. Li, G.: Exploration on teaching reform of discrete mathematics courses for applied computer majors under the background of new engineering. Herald Sci. Technol. Econ. 28(32), 164–166 (2020)
7. He, K.: Discrete mathematics teaching content reform under new engineering education. Comput. Age (5), 82–84+88 (2022)
8. Luo, W.: Research on discrete mathematics curriculum teaching under the development of new engineering. Softw. Guide 12(19), 157–159 (2020)
9. Zhou, X., Qiao, H., Li, L.: Refinement of teaching objectives and curriculum content integration of discrete mathematics courses. Educ. Teach. Forum (6), 261–264 (2020)
10. Zheng, Z., Fan, C., Liu, X.: Curriculum reform and practice of “discrete mathematics” in the context of new engineering. Educ. Teach. Forum 4(17), 66–69 (2021)
11. Huai, L., Xuan, Y.: Discussion and practice of discrete mathematics curriculum teaching methods in the context of new engineering. Res. Contemp. Educ. Pract. Teach. (12), 184–185 (2020)
12. Wang, M., Liu, Q., Wu, L.: Research and practice of hybrid teaching of discrete mathematics under the background of informatization. Comput. Knowl. Technol. 18(10), 167–168 (2022)
13. Limei, D., Jianmei, Z., Xincheng, Z.: Research on the teaching reform ideas of discrete mathematics course. J. Changzhi Univ. 38(2), 109–111 (2021)

Research on Production Line Balance Optimization Based on Improved PSO-GA Algorithm

Zhijian Pei1(B), Zhihui Deng2, and Xinmin Shi1

1 Department of Intelligent Equipment, Changzhou College of Information Technology, Changzhou 213164, China
[email protected]
2 Academic Affairs Office of Changzhou College of Information Technology, Changzhou 213164, China

Abstract. According to the complementary characteristics of process planning and workshop scheduling, integrating them can improve the performance of intelligent manufacturing systems. However, traditional exact methods cannot effectively solve the large-scale integrated process planning and scheduling (IPPS) problem. To minimize the maximum completion time, this paper establishes a resource scheduling model for intelligent manufacturing workshops. On this basis, a combination of the genetic algorithm and the particle swarm optimization algorithm is proposed, and a hierarchical encoding and decoding method is designed to further improve the search and optimization ability. Finally, tests on benchmark cases and actual production and processing problems verify the superiority and effectiveness of the algorithm.

Keywords: Genetic algorithm · Encoding and decoding method · Integrated process planning and scheduling · Intelligent manufacturing

1 Introduction

Scheduling has become the core of modern manufacturing and management, and it is one of the restrictive factors in the production process of enterprises. Compared with traditional manufacturing systems, modern flexible processing is a highly flexible automated production system [1] that offers more choices and a greater scope for each process, leading to multiple choices of tasks and equipment for the same process characteristics [2, 3]. Therefore, dealing with production scheduling resource allocation and planning constraints is the key to relaxing these constraints and improving efficiency.

Fund projects: Jiangsu Qing Lan Project 2020 for young teachers; Jiangsu Qing Lan Project 2021 for teacher team; High-level training for professional leaders of teachers in Jiangsu vocational colleges (2021GRGDYX090); Scientific research platform project of Changzhou College of Information Technology (KYPT202103G).

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
J.-S. Pan et al. (Eds.): ICGEC 2023, LNEE 1114, pp. 299–310, 2024. https://doi.org/10.1007/978-981-99-9412-0_31


The essence of production scheduling is to decompose the mapping between production tasks and the corresponding processing resources [4, 5]. Traditional scheduling and planning are generally carried out sequentially [7, 8]: each job has only one process plan, and scheduling flexibility cannot be realized, which causes production problems such as unbalanced loads, conflicting optimization goals and resource bottlenecks [9]. To solve these problems, scholars proposed the IPPS method. Moon et al. [9] established a mixed-integer programming model considering sequence and priority constraints to solve process planning and scheduling problems, and developed a search method based on topological sorting; experiments on problems of different sizes verified the efficiency of the developed evolutionary search method. Torkashvand et al. [10] performed mathematical modeling via mixed-integer linear programs and developed a biogeography-based optimization method; its performance was compared with three well-known algorithms in the literature, proving it superior to the other tested algorithms. Zhang et al. [11] adopted an object-coding genetic algorithm (OCGA) to solve the IPPS problem in job-shop-type flexible manufacturing systems; corresponding genetic operations were proposed to encode processing operations onto chromosomes, and experiments showed that the proposed genetic algorithm can generate excellent results for complex IPPS instances. Wang et al. [12] built an IPPS optimization model and proposed an ACO algorithm: by designing the transfer probabilities of ants between nodes and the different nodes they visit, a scheduling scheme was established, the search rate was adjusted according to the maximum completion time of the scheduling scheme at different stages, and the feasibility of the algorithm was verified by simulation.

IPPS is an NP-hard problem [3], and traditional exact methods [13] cannot effectively solve large-scale IPPS problems; a coding method designed according to its characteristics is needed. Aiming at the IPPS problem, this paper proposes a new encoding and decoding method based on OR nodes by analyzing the selection mechanism of OR nodes in the process network diagram. Compared with existing coding methods, it has better integration characteristics. The OR-node chromosome determines the process selection of each job; the operation chromosome, whose genes are composed of a job number and an operation number, determines the sequence of the processes and the scheduling sequence; and the machine chromosome indicates the machine assignment of each operation. On this basis, this paper further proposes an improved genetic algorithm, and simulation results show that the algorithm has clear advantages in solving the IPPS problem.

2 Scheduling Mathematical Model

2.1 Analysis of Job Shop Scheduling Problems

The IPPS problem can be described as follows: given a set of operations and M machines with N parts for process planning, optimize the schedule and the process plans of all operations simultaneously according to the constraints and objectives; in other words, formulate a reasonable process route and workshop scheduling plan under the given constraints and optimization objectives [7, 14–16]. There are mainly the following types of flexibility in process planning.


1) Machine flexibility: the same operation or feature can be processed on different machines. 2) Sequence flexibility: the order of certain processing features can be exchanged freely, because not all features carry strict sequence constraints, so several operation sequences can realize the same part. 3) Processing flexibility: different processes can produce the same feature, because each manufacturing feature has different processing plans and each plan involves different processing methods. This paper draws on the flexible process planning network proposed by Kim et al. [14] and Liu et al. [17], as shown in Fig. 1. The network is composed of four types of nodes and directed edges describing the priority relationships between nodes, and it is a directed acyclic graph. In the figure, S and E represent the production start and end nodes, respectively; a rectangular box represents an operation node; the arrows indicate the processing order; and OR and JOIN express the flexibility of processing, that is, a feature can be processed by different processes. An OR node opens a processing path that ends at the matching JOIN node: if the successor routes of a node are connected through OR, the job is processed through exactly one OR branch; if a path is not connected through OR, all operations on that path must be visited. The numbers in circles represent operation numbers, and the numbers in “{}” and “()” represent the optional machine numbers and corresponding processing times, respectively. For example, one possible path for a job is: O1 → O3 → O4 → O5 → O7 → O9 → O12 → O13 → O14 → O15 → O16 → O6 → O11
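To make the OR-branch semantics concrete, the short sketch below represents a small flexible process plan in Python and enumerates its feasible routes. The data structure and names are illustrative assumptions for this paper's OR/JOIN idea, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): a flexible process plan with an
# OR branch, stored as nested lists. A plain string means "process this
# operation next"; a dict with key "OR" means "choose exactly one branch".
plan = ["O1", {"OR": [["O2"], ["O3", "O4", "O5"]]}, "O7"]

def enumerate_routes(plan):
    """Expand every OR choice into a concrete operation sequence."""
    routes = [[]]
    for step in plan:
        if isinstance(step, dict):          # OR node: branch the partial routes
            routes = [r + branch for r in routes for branch in step["OR"]]
        else:                               # ordinary operation node
            routes = [r + [step] for r in routes]
    return routes

print(enumerate_routes(plan))
# [['O1', 'O2', 'O7'], ['O1', 'O3', 'O4', 'O5', 'O7']]
```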

2.2 IPPS Mathematical Model

In the traditional scheduling scheme, each job has only one process plan and flexibility is not considered. With the widespread use of smart and flexible manufacturing systems, each job may involve multiple feasible process plans; IPPS is such an NP-hard problem [18]. Since different jobs have different process network diagrams, the number of alternative network diagrams may be large. To study the workshop scheduling problem in the intelligent manufacturing environment, the following assumptions are made: 1) workpieces are independent, and one machine can process only one job at a time; 2) different operations of a workpiece cannot be processed simultaneously, and a workpiece can be processed by only one machine at a time; 3) every machine and every workpiece has the same priority, and once a workpiece starts to be processed it cannot be interrupted unless the machine breaks down; 4) after an operation is finished, the workpiece is immediately transferred to the next process, with a transfer time of 0; 5) the time when a machine starts processing a workpiece is its start-up time, and the time when processing is completed is its downtime. Based on these assumptions, the mathematical model of IPPS is described as follows:

$$F_{goal} = \max\{CT_{ie}\} \tag{1}$$

In formula (1), $CT_{ie}$ represents the machining end time of the workpiece $N_i$.

$$CT_{ije} = CT_{ijs} + ct_{ijk} \tag{2}$$

In formula (2), $CT_{ijs}$ represents the processing start time of $O_{ij}$, $CT_{ije}$ the processing end time of $O_{ij}$, and $ct_{ijk}$ the processing time of $O_{ij}$ on machine $M_k$.

$$CT_{i(j+1)s} \geq CT_{ije} \tag{3}$$

Formulas (2) and (3) indicate that the operations of the same workpiece must be processed in a certain order and that each machine can process only one workpiece at a time.

$$\sum_{k=1}^{M} x_{ijk} = 1 \tag{4}$$

In formula (4), $x_{ijk} \in \{0, 1\}$ indicates whether $O_{ij}$ is processed on the $k$-th machine tool (1 if so, 0 otherwise); it also ensures that each operation is assigned to exactly one machine at a time.
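As a worked illustration of objectives (1)–(3), the following sketch computes the makespan of an already-decoded schedule under the one-job-per-machine and operation-order assumptions above. The tuple layout is an assumption made for illustration, not the paper's code.

```python
# Minimal sketch (assumed data layout): evaluate F_goal = max(CT_ie) for a
# decoded schedule. Each entry is (job, operation, machine, processing_time)
# listed in scheduling order.
schedule = [(1, "O1", "M1", 20), (1, "O3", "M6", 17),
            (2, "O7", "M4", 23), (1, "O4", "M7", 7)]

def makespan(schedule):
    machine_free = {}   # earliest time each machine is available
    job_ready = {}      # finish time of each job's previous operation
    for job, op, machine, t in schedule:
        start = max(machine_free.get(machine, 0), job_ready.get(job, 0))
        end = start + t                      # CT_ije = CT_ijs + ct_ijk  (formula 2)
        machine_free[machine] = end          # one job per machine at a time
        job_ready[job] = end                 # CT_i(j+1)s >= CT_ije       (formula 3)
    return max(job_ready.values())           # F_goal = max(CT_ie)        (formula 1)

print(makespan(schedule))  # 44 for this toy schedule
```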

3 Workshop Scheduling Algorithm Based on PSO-GA

The genetic algorithm has strong optimization ability, and each chromosome represents a potential optimal solution of the problem to be solved. However, it easily falls into local optima during the search, and its search speed in the later stages is slow. In addition, for complex optimization problems, it is often difficult for a single chromosome to fully express the solution of the problem. Meanwhile, the scheduling of multiple processes in an intelligent workshop involves difficulties such as dynamic fuzziness and multiple constraints, so the traditional genetic algorithm cannot meet these requirements. Multi-layer coding GA divides the individual code into multiple layers; each layer expresses a different meaning, and together the layers completely express the solution of the problem [25]. The PSO algorithm, in turn, can avoid falling into local optima and speed up the search. Therefore, this paper adopts a GA based on PSO-optimized multi-layer coding to solve the flexible workshop scheduling problem. The flow chart of this algorithm is shown in Fig. 2, and the specific operations are as follows.

Encoding and decoding: This paper follows the MGA real-number multi-layer encoding method adopted by Li et al. [19]. The encoding is composed of three parts: the OR-node chromosome, the operation chromosome and the machine chromosome. Figure 1 shows the specific process plan to be scheduled, and its corresponding mathematical expression is shown in Table 1. Figure 3 gives an example of the encoding method. In the operation chromosome, each gene contains a job number and an operation number. On the basis of this encoding method, the decoding method requires the pre-processing steps below.


Fig. 1. Process Planning Network


Fig. 2. PSO-GA Algorithm Flow
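Before walking through the decoding steps, the following sketch shows one way the three-layer encoding could be held in Python. The sizes and the random initialization are illustrative assumptions, not the authors' Matlab data structures.

```python
import random

# Hedged sketch of the three-layer encoding described above.
n_or_nodes, n_ops, n_machines = 4, 8, 6

# OR-node chromosome: which branch (0 or 1) is taken at each OR node.
or_chrom = [random.randint(0, 1) for _ in range(n_or_nodes)]

# Operation chromosome: each gene is (job number, operation number); the gene
# order gives the scheduling sequence, subject to precedence repair later.
op_chrom = [(random.randint(1, 3), op) for op in range(1, n_ops + 1)]
random.shuffle(op_chrom)

# Machine chromosome: the machine selected for each operation.
machine_chrom = [random.randint(1, n_machines) for _ in range(n_ops)]

print(or_chrom, op_chrom, machine_chrom, sep="\n")
```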

Step 1: First decode the operation chromosome, and obtain the operations of each OR node in sequence from left to right. As shown in Table 1 and Fig. 3, the operations obtained for each OR node are: J1(O1), J2(O4, O3, O5), J3(O6), J4(O7), J5(O9), J6(O11), J7(O12), J8(O13), J9(O14), J10(O15), J11(O16).

Step 2: Decode the OR-node chromosome; this chromosome only represents the processing order of the workpieces. As shown in Table 1 and Fig. 3, the processing order of the feature sequences is: J1 → J2 → J4 → J5 → J7 → J8 → J9 → J10 → J11 → J3 → J6. Combined with the result of Step 1, the processing sequence of the operations is: O1 → O3 → O4 → O5 → O7 → O9 → O12 → O13 → O14 → O15 → O16 → O6 → O11.

Step 3: Decode the machine chromosome, and obtain the processing machine selected for each operation from left to right in turn. As shown in Table 1 and Fig. 3, the sequence of processing machines is: O1(M1), O2(M10), O3(M6), O4(M7), O5(M16), O6(M3), O7(M5), O8(M12), O9(M14), O10(M2), O11(M8), O12(M42), O13(M10), O14(M21), O15(M18), O16(M47).


Step 4: Based on the results of Steps 2 and 3, determine the final routing of the job, together with the processing machine and processing time of each operation. As shown in Table 1 and Fig. 3, the operation path of this encoding scheme is: O1(M1) → O3(M6) → O4(M7) → O5(M2) → O7(M4) → O9(M14) → O12(M48) → O13(M9) → O14(M23) → O15(M18) → O16(M42) → O6(M3) → O11(M8). The processing time corresponding to each operation can be obtained from Table 1.

Table 1. Workpiece Plan Information.

| Workpiece | Operation order | Machine | Processing time | Priority constraints |
|---|---|---|---|---|
| J1 | O1 | M1, M11 | 20, 18 | prior to J2, J3 |
| J2 | O2; O3, O4, O5 | M10, M13; M6, M9 / M7 / M2, M1 | 42, 38; 17, 18 / 7 / 2, 16 | prior to J3 |
| J3 | O6 | M3, M8 | 40, 48 | |
| J4 | O7 | M4, M5 | 23, 47 | prior to J5, J6 |
| J5 | O8, O10; O9 | M12 / M2, M8; M14, M12 | 20 / 18, 22; 50, 28 | prior to J6 |
| J6 | O11 | M8, M6 | 8, 6 | |
| J7 | O12 | M48, M42 | 48, 42 | prior to J8, J9, J10, J11 |
| J8 | O13 | M9, M11, M10 | 9, 11, 10 | prior to J10, J11 |
| J9 | O14 | M23, M24, M21 | 23, 24, 21 / 18, 32 | prior to J10, J11 |
| J10 | O15 | M18, M32 | 42, 47 | prior to J11 |
| J11 | O16 | M42, M47 | 42, 47 | |

Fig. 3. Example of coding method

Initialization: At each OR node, one of the two paths is selected; the machine chromosome randomly selects a processing machine for each operation; and the operation chromosome is generated as a random operation sequence, which must follow the precedence constraints of the operation sequence in the network diagram. The fitness of a chromosome is taken as the time at which all jobs are completed.


Selection: The roulette method is used to select individuals with better fitness; if a random number between 0 and 1 is less than the set probability (0.8 in this paper), the individual is discarded.

$$P_i = \frac{Fitness(i)}{\sum_{i=1}^{n} Fitness(i)}, \qquad Fitness(i) = \frac{1}{fitness(i)} \tag{5}$$

where $P_i$ represents the probability that chromosome $i$ is selected in each selection round.

Crossover: To preserve the feasibility of the operation chromosome, two crossover points A and B are randomly selected, dividing the chromosome into three parts; the two-point crossover operation is designed as shown in Fig. 4(a). The genes between positions A and B of parent P1 are passed to the offspring chromosome O1, and the same operation is performed on parent P2 and O2. The genes of P1 that already appear in O2 are deleted, and the remaining genes of P1 are placed in O2 outside positions A–B. The same crossover method is applied to P2 and O1.

Fig. 4. Cross operation; (a) OR node chromosome crossing, (b) operation chromosome crossing, (c) machine chromosome crossing.


Since the gene positions of the two chromosomes represent the OR-node numbers, crossover only needs to exchange genes at the same positions. The crossover process is shown in Fig. 4(b) and Fig. 4(c). Variation: a two-point exchange mutation is used for the OR-node chromosome. For the operation and machine chromosomes, the mutation randomly selects an element in the string and changes its value to another value in the corresponding range, as shown in Fig. 5.

Fig. 5. Variation Operation
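The sketch below illustrates the roulette selection of formula (5) and a simplified two-point crossover that keeps each gene exactly once. It is a minimal approximation of the operators described above (the fill-in rule is slightly simplified), not the authors' Matlab implementation.

```python
import random

def roulette_select(population, raw_fitness):
    """Roulette-wheel selection for minimization: Fitness(i) is the reciprocal
    of the completion-time fitness(i), as in formula (5)."""
    weights = [1.0 / f for f in raw_fitness]
    total = sum(weights)
    probs = [w / total for w in weights]           # P_i of formula (5)
    return random.choices(population, weights=probs, k=len(population))

def two_point_crossover(p1, p2):
    """Two-point crossover for operation chromosomes: the A-B segment of one
    parent is kept, and the remaining genes are filled in the other parent's
    order, preserving each gene exactly once."""
    a, b = sorted(random.sample(range(len(p1)), 2))
    middle = p1[a:b]
    rest = [g for g in p2 if g not in middle]
    return rest[:a] + middle + rest[a:]

pop = [random.sample(range(8), 8) for _ in range(4)]   # toy permutations
fits = [40, 55, 62, 38]                                # toy completion times
parents = roulette_select(pop, fits)
print(two_point_crossover(parents[0], parents[1]))
```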

4 Simulation and Experimental Analysis

Three experiments are given to illustrate the effectiveness and performance of the proposed mathematical model and algorithm. For comparison purposes, the first and second experiments are cited from other papers, while the last one consists of various jobs with several alternative process schemes. This paper takes the completion time as the objective; after the IPPS problem is expressed as a mathematical model, the solution is obtained by the proposed PSO-GA method. To conduct the experimental comparisons under the same conditions, the parameters and hardware platform are set as follows: the PSO-GA algorithm is implemented in Matlab 2020a on an Intel Core(TM) i7-7700 processor with 16 GB of RAM. The basic parameter settings of the proposed PSO-GA algorithm are: initial population size N = 100, maximum number of iterations ite = 50, crossover probability P_C ∈ [0.4, 0.8] and variation probability P_V ∈ [0.03, 1.0] changed adaptively, and mutation probability P_m = 0.2. Experiment 1 is taken from Jain et al. [20]; 6 problems were constructed with 18 processes and 4 machines. Table 2 shows the experimental results, and Fig. 6 shows the Gantt chart of Experiment 1. The results of Experiment 1 show that the results of the non-IPPS model are worse than those of IPPS, which proves the feasibility of the model established in this paper.


Experiment 2 includes the problems proposed by Moon et al. [9] and by Amin and Afshari [21], where P1–1 and P1–3 are extensions of P1, and P2–1 and P2–2 are extensions of P2. Table 3 shows the experimental results and the comparison with the latest methods; the results show that the proposed PSO-GA achieves better results than the reported methods in less CPU time. Experiment 3 is based on actual data of an intelligent manufacturing workshop with 6 workpieces and 10 machines; on the premise that each workpiece requires 8 processing procedures, the optional machines for each process and the corresponding processing times are given in Table 4.

Table 2. Comparison Experiment Results.

| No. of problems | Process quantity | Maximum completion time (s), no IPPS | Maximum completion time (s), IPPS |
|---|---|---|---|
| 1 | 8 | 620 | 519 |
| 2 | 10 | 805 | 646 |
| 3 | 12 | 1063 | 963 |
| 4 | 14 | 1428 | 1125 |
| 5 | 16 | 1508 | 1498 |
| 6 | 18 | 1791 | 1607 |

Fig. 6. Gantt chart corresponding to Experiment 1.

Using the algorithm proposed in this paper to search and optimize, the iterative fitness curves of the PSO-GA algorithm and the GA are compared in Fig. 7, and the corresponding Gantt chart is shown in Fig. 8; 505 in the figure represents the fifth machining process of workpiece 5. It can be seen from Fig. 7 that the PSO-GA algorithm has already found the optimal value by the 38th generation, which is 22% faster than the GA, and the fitness value is reduced from 65 to 60, demonstrating the speed and optimization effectiveness of the PSO-GA algorithm.

Table 3. Comparison of Results of Experiment 2.

| Problem | Process quantity | No. of machines | Optimal completion time (cited paper) | Optimal completion time (PSO-GA) | CPU time (cited paper) | CPU time (PSO-GA) | References |
|---|---|---|---|---|---|---|---|
| P1 | 2 | 5 | 21 | 16 | / | 1.84 | Moon et al. [9] |
| P1–1 | 25 | 5 | 82 | 62 | 5 | 3.2 | |
| P1–3 | 100 | 5 | 447 | 347 | 267 | 85.15 | |
| P2 | 8 | 6 | 33 | 26 | / | 1.85 | Amin et al. [21] |
| P2–1 | 40 | 6 | 156 | 105 | 27 | 12.63 | |
| P2–2 | 80 | 6 | 301 | 258 | 196 | 85.28 | |

Table 4. Process Machine List and Processing Time (optional machines / processing times).

| | Workpiece 1 | Workpiece 2 | Workpiece 3 | Workpiece 4 | Workpiece 5 | Workpiece 6 |
|---|---|---|---|---|---|---|
| Process 1/time | 3,10/3,5 | 2/6 | 3,9/1,4 | 4/7 | 5/6 | 2/2 |
| Process 2/time | 1/10 | 3/8 | 4,7/5,7 | 1,9/4,3 | 2,7/10,12 | 4,7/4,7 |
| Process 3/time | 2/9 | 5,8/1,4 | 6,8/5,6 | 3,7/4,6 | 3,10/7,9 | 6,9/6,9 |
| Process 4/time | 4,7/5,4 | 6,7/5,6 | 1/5 | 2,8/3,5 | 6,9/8,8 | 1/1 |
| Process 5/time | 6,8/3,3 | 1/3 | 2,10/9,11 | 5/1 | 1/5 | 5,8/5,8 |
| Process 6/time | 5/10 | 4,10/3,3 | 5/1 | 6/3 | 4,8/4,7 | 3/3 |
| Process 7/time | 4,8/6,5 | 6/8 | 9,1/7,8 | 7/9 | 3/7 | 3,8/2,7 |
| Process 8/time | 2,6/3,5 | 5/6 | 6,8/8,6 | 9/3 | 3/1 | 3,9/1,4 |

The above results show that the proposed new encoding and decoding method can effectively solve complex integrated scheduling problems. They also confirm the speed and optimization effectiveness of the proposed PSO-GA algorithm, which can effectively solve the IPPS problem.


Fig. 7. Comparison of Algorithm Iteration Curves.


Fig. 8. Gantt chart of Experiment 3.

5 Conclusion

This paper addresses the workshop IPPS problem in the intelligent manufacturing environment. With minimum completion time as the goal, an intelligent manufacturing workshop resource scheduling model is established. To avoid falling into local optima and to further improve the search ability, a PSO-GA algorithm is proposed on the basis of this model. By analyzing the selection mechanism of OR nodes for operations, an encoding and decoding method for OR nodes with better integration characteristics is proposed. In this encoding and decoding method, the OR chromosome is used to determine the process selection of a job, while the operation chromosome and the machine chromosome determine the process scheduling sequence and machine allocation. Finally, multiple experimental studies were performed to compare the method with other previously developed methods. Experimental results show that this method is more effective for IPPS problems and achieves better overall optimization results.

References

1. Shin, S.J., Woo, J., Rachuri, S., et al.: An energy-efficient process planning system using machine-monitoring data: a data analytics approach. Comput. Aided Des. 110, 92–109 (2019)
2. Cong, Y., Tian, D., Feng, Y., et al.: Speedup 3-D texture-less object recognition against self-occlusion for intelligent manufacturing. IEEE Trans. Cybern. 49(11), 3887–3897 (2018)
3. Yuan, M., Zhou, Z., Cai, X., et al.: Service composition model and method in cloud manufacturing. Robot. Comput.-Integr. Manuf. 61, 101840 (2020)
4. Chen, M., Zhu, H., Zhang, Z., et al.: Multi-agent job shop scheduling strategy based on pheromone. China Mech. Eng. 29(22), 2659 (2018)
5. Cui, D., Bo, J., Bureau, W.W., et al.: Improved bird swarm algorithm and its application to reservoir optimal operation. J. China Three Gorges Univ. (Natural Sciences) (2016)
6. Liu, X., Yi, H., Ni, Z.: Application of ant colony optimization algorithm in process planning optimization. J. Intell. Manuf. 24(1), 1–13 (2013)
7. Barzanji, R., Naderi, B., Begen, M.A.: Decomposition algorithms for the integrated process planning and scheduling problem. Omega 93, 102025 (2020)


8. Li, X., Gao, L., Wang, W., et al.: Particle swarm optimization hybridized with genetic algorithm for uncertain integrated process planning and scheduling with interval processing time. Comput. Ind. Eng. 135, 1036–1046 (2019)
9. Moon, C., Lee, Y.H., Jeong, C.S., et al.: Integrated process planning and scheduling in a supply chain. Comput. Ind. Eng. 54(4), 1048–1061 (2008)
10. Torkashvand, M., Naderi, B., Hosseini, S.A.: Modelling and scheduling multi-objective flow shop problems with interfering jobs. Appl. Soft Comput. 54, 221–228 (2017)
11. Zhang, L., Wong, T.N.: An object-coding genetic algorithm for integrated process planning and scheduling. Eur. J. Oper. Res. 244(2), 434–444 (2015)
12. Wang, J., Yin, G., et al.: Integrated process planning and scheduling based on an ant colony algorithm. J. Southeast Univ. (Nat. Sci. Edn) 42(S1), 173–177 (2012)
13. Zhou, Y.Z., Yi, W.C., Gao, L., et al.: Adaptive differential evolution with sorting crossover rate for continuous optimization problems. IEEE Trans. Cybern. 47(9), 2742–2753 (2017)
14. Liu, Q., Li, X., Gao, L., et al.: A modified genetic algorithm with new encoding and decoding methods for integrated process planning and scheduling problem. IEEE Trans. Cybern. 51(9), 4429–4438 (2020)
15. Zhang, S., Wong, T.N.: Integrated process planning and scheduling: an enhanced ant colony optimization heuristic with parameter tuning. J. Intell. Manuf. 29(3), 585–601 (2018)
16. Kim, Y.K., Park, K., Ko, J.: A symbiotic evolutionary algorithm for the integration of process planning and job shop scheduling. Comput. Oper. Res. 30(8), 1151–1171 (2003)
17. Roshanaei, V., Azab, A., ElMaraghy, H.: Mathematical modelling and a meta-heuristic for flexible job shop scheduling. Int. J. Prod. Res. 51(20), 6247–6274 (2013)
18. Gaowei, J., Jianfeng, W., Peng, W., et al.: Using multi-layer coding genetic algorithm to solve time-critical task assignment of heterogeneous UAV teaming. In: 2019 International Conference on Control, Automation and Diagnosis (ICCAD), pp. 1–5. IEEE (2019)
19. Li, X., Gao, L., Pan, Q., et al.: An effective hybrid genetic algorithm and variable neighborhood search for integrated process planning and scheduling in a packaging machine workshop. IEEE Trans. Syst. Man Cybern. Syst. 49(10), 1933–1945 (2018)
20. Jain, A., Jain, P.K., Singh, I.P.: An integrated scheme for process planning and scheduling in FMS. Int. J. Adv. Manuf. Technol. 30(11), 1111–1118 (2006)
21. Amin-Naseri, M.R., Afshari, A.J.: A hybrid genetic algorithm for integrated process planning and scheduling problem with precedence constraints. Int. J. Adv. Manuf. Technol. 59(1), 273–287 (2012)

Prediction and Analysis of Stroke Risk Based on Ensemble Learning

Xiuji Zuo1, Xin Guo1(B), Zilong Yin2, and Shih-Pang Tseng3

1 School of Information Science and Technology, Sanda University, Shanghai 201209, China
[email protected]
2 Department of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
3 School of Software and Big Data, Changzhou College of Information Technology, Changzhou 213164, China
[email protected]

Abstract. With the development of science and technology, the application of data mining in the medical field is becoming more and more popular, and machine learning methods also play an important role in disease prediction. Stroke is characterized by high incidence, disability, mortality and recurrence rates, and it is also likely to cause other kinds of complications. In this paper, each feature in a stroke dataset is analyzed in order to identify the factors affecting stroke, and classification and prediction research is conducted on whether a disease risk exists. Specifically, the PCA (Principal Component Analysis) algorithm is used to extract the main feature components of the data, and the SMOTE (Synthetic Minority Oversampling Technique) algorithm is used to adjust imbalanced feature categories. Traditional machine learning classification algorithms, such as the decision tree and SVM (support vector machine), as well as various ensemble learning algorithms, are used for the prediction of stroke risk, so as to study the relationship between stroke and each feature and to build a classification prediction model that allows strokes to be prevented in time and the risk of stroke to be reduced. Among all the models, Bagging (Bootstrap aggregating) has the best performance, with an AUC value of 0.97.

Keywords: Stroke risk · Data Mining · Ensemble Learning · PCA

1 Introduction

Stroke is an acute cerebrovascular disease in which brain tissue is damaged because a blood vessel in the brain suddenly ruptures or because vascular obstruction prevents blood from flowing into the brain. With the four major features of high morbidity, disability, mortality and recurrence [1], it can also cause other kinds of complications and dysfunctions [2]. According to the data of the “China Stroke Prevention and Control Report (2020)”, 4 out of 10 people aged 40 years and above in China are likely to suffer from stroke. Therefore, it is important to grasp the factors in stroke incidence and take preventive measures against them to improve the survival rate of stroke patients.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
J.-S. Pan et al. (Eds.): ICGEC 2023, LNEE 1114, pp. 311–319, 2024. https://doi.org/10.1007/978-981-99-9412-0_32


This paper aims to use machine learning methods to analyze the risk factors associated with stroke and to predict whether a stroke will occur, so that strokes can be prevented in advance.

2 Related Work

In recent years, with the development of big data technology and the popularization of machine learning methods in the medical industry, many scholars at home and abroad have worked on prediction models for cardiovascular and cerebrovascular diseases. Research has covered different aspects of stroke, mainly including pre-onset prediction, recurrence prediction, prognosis prediction and recovery prediction for stroke patients [3]. Xiumei Gao et al. used risk factors that cause stroke recurrence as input variables of a logistic regression model to predict the probability of stroke recurrence [4]. Zhijie Zhang et al. used a collection of clinical electronic medical records to construct a BP neural network, with the mean absolute error as the evaluation criterion, to predict whether a stroke would recur, and compared its performance with other classical machine learning models such as decision trees and support vector machines [5]. Xiaoqing Yang et al. applied a multivariate logistic regression model combined with multimodal MRI to predict patient outcomes after mechanical thrombectomy in acute stroke [6]. Wei Ye et al. fused an AIS clinical dataset with radiomics data, used a three-class prognostic outcome, and optimized the parameters of the ensemble algorithm to improve the accuracy of prognostic classification prediction; among all their ensemble learning models, soft-voting integration had the best prediction effect, with an AUC (Area Under the Curve) value of 0.95, which reached 96% after optimization by the IABC (Improved Artificial Bee Colony) algorithm [7]. Pingju Lin et al. used clinical measurements, EEG measurements, and stroke survivors’ scores on completing stretch rehabilitation training as raw data; a deep learning algorithm took the raw data as input and output a recovery prediction category for each patient (poor recovery or scaled recovery), achieving an AUC value of 0.95 [8]. Xinyu Zhang et al. combined a logistic regression algorithm with an artificial neural network to predict the risk of frailty in elderly stroke patients [9]. The Framingham study determined the points of each risk factor and calculated the risk probability through logistic regression [10].

In summary, stroke prediction research has explored everything from the simplest statistical models to complex and varied machine learning models, but no model is yet recognized as the most effective. In this paper, the SMOTE algorithm is used to deal with the imbalance of feature categories, the PCA algorithm is used to extract features, and ensemble learning algorithms are used to build a stroke risk prediction model, so as to improve the effect of stroke risk prediction.


3 Method

The stroke dataset from the Kaggle website is collected and preprocessed before modeling; the preprocessed features are then reduced with PCA and fed to the classifiers described below.

3.1 Data Preprocessing

In order to improve the quality of data mining, the data must be preprocessed, which includes duplicate-value processing, missing-value processing, outlier processing, data format conversion and other steps. Specifically, the drop_duplicates function of the DataFrame module in Python is used to delete duplicate values, and the mean-value method is used to fill in missing values. Analysis showed that the outliers in the dataset occur in the gender and smoking-status variables. For the gender variable, since there was only one case of gender “Other”, which is not significant for the study, it was deleted directly. For smoking status, there are 789 cases of “Unknown”, accounting for 15% of the overall dataset, so this paper used the replace function in Python to replace “Unknown” with null values and then used the fillna function to fill in the missing values by the mode (plurality) method.

3.2 Principal Component Analysis

The main steps of principal component analysis include test analysis, data standardization, calculation of the covariance matrix and calculation of contribution rates. The data was first subjected to the sphericity test and the KMO test to ensure that PCA is valid. Either the correlation coefficient matrix or the covariance matrix can be used to solve for the components; before solving, the data need to be standardized. This paper used the covariance-matrix method. The component contribution rate is the importance of each principal component among all principal components. First, the cumulative contribution of each component was calculated with the cumsum function; then, a for loop selected the number of principal components reaching a cumulative contribution rate of 85%, together with the specific components; finally, the eigenvector matrix corresponding to the eligible principal components was filtered out with the matrix function, the scores of each principal component were calculated, and the components with higher scores were selected as the features for subsequent analysis.
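A minimal sklearn-based sketch of this PCA step, selecting the smallest number of components whose cumulative contribution rate reaches 85% (the random matrix is only a stand-in for the stroke features):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.rand(200, 10)                 # placeholder for the stroke features

X_std = StandardScaler().fit_transform(X)   # standardize before PCA
pca = PCA().fit(X_std)

# Keep the smallest number of components whose cumulative contribution >= 85%.
cum = np.cumsum(pca.explained_variance_ratio_)
k = int(np.argmax(cum >= 0.85)) + 1
X_pca = PCA(n_components=k).fit_transform(X_std)
print(k, X_pca.shape)
```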


3.3 Prediction Model

Data Imbalance Processing
In this paper, the SMOTE (Synthetic Minority Oversampling Technique) algorithm is used to balance the data. The basic idea of SMOTE is to analyze the minority-class samples and synthesize new minority-class samples from the existing ones to add to the dataset, thereby expanding the sample set.

Support Vector Machines
SVM (support vector machine) is a type of supervised learning that separates the samples by finding a hyperplane; the principle of separation is margin maximization, which is ultimately transformed into a convex quadratic programming problem to solve.

Decision Tree
The decision tree is a nonparametric supervised learning classifier. A decision tree contains a number of internal nodes where decisions are made by selecting features, and the final result is represented by leaf nodes. For a node of a decision tree, after the sample data are classified through it, the entire sample set should be as orderly as possible within the respective categories; the purpose is to minimize the entropy of the sample data after classification by a chosen feature. The entropy is calculated by the following formula:

$$H(X) = -\sum_{i=1}^{k} p_i \log p_i \tag{1}$$
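As a quick illustration of formula (1), the snippet below computes the empirical entropy of a label sample (a base-2 logarithm is assumed here, so the result is in bits):

```python
import numpy as np

def entropy(labels):
    """Empirical entropy H(X) = -sum(p_i * log p_i) from formula (1)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

print(entropy([0, 0, 1, 1]))   # 1.0 bit: a maximally mixed binary split
```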

Voting Algorithm
Voting is an ensemble learning method built on the basic idea that the minority is subordinate to the majority, and it is mainly used for classification problems. The voting algorithm comes in two forms: soft voting and hard voting. In this paper, the soft voting algorithm is used: the probabilities of the different classification results predicted by the base learners are summed, and the class with the highest total probability is the final classification result.

Bagging Algorithm
The Bagging (Bootstrap Aggregating) algorithm is an ensemble learning algorithm based on the bootstrap resampling technique. Specifically, the Bagging algorithm constructs multiple base models by randomly sampling the data several times, and averages or votes the predictions of these base models in a “parallel” way to improve model performance.

Stacking Algorithm
Stacking is a strong learner that combines Boosting and Bagging in a “series-parallel” fashion, and its base learners can be models of different classes. The Stacking model is generally divided into two layers: it first trains the base learners, and then uses the outputs of the first-layer base learners as input to the second-layer model.
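A hedged sketch of the soft-voting setup used later in the experiments, with SVM and a decision tree as base learners; the synthetic data and parameters are illustrative assumptions, not the paper's exact configuration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)

voter = VotingClassifier(
    estimators=[("svm", SVC(kernel="rbf", probability=True)),
                ("dt", DecisionTreeClassifier(random_state=42))],
    voting="soft")            # sum predicted probabilities, pick the largest
voter.fit(X, y)
print(voter.predict(X[:5]))
```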


4 Experiment

4.1 SVM

In this paper, the SVC function in the sklearn package is called to construct the SVM prediction model; rbf is chosen as the kernel function of the model, and the probability parameter is set to “True” so that SVC performs probability estimation, allowing the model performance of the SVM to be visualized. The precision_score, recall_score, f1_score and accuracy_score functions in the sklearn package are used to evaluate the performance of the SVM model. The results show that the Accuracy Rate is 0.818, the Recall Rate is 0.887, the Precision Rate is 0.779, and the F1 value is 0.830. The low Precision Rate indicates that there are relatively many false positives. The confusion matrix shows the specific data of each category in the prediction; the confusion_matrix function in the sklearn package is called to calculate the confusion matrix, which is visualized with the heatmap function. Figure 1 shows the confusion matrix of the experimental results of the SVM model. The label of stroke patients is 1, with a total of 968 cases, of which 859 cases were correctly classified by the model. Correspondingly, the label of non-stroke patients is 0, with a total of 976 cases, of which 732 cases were correctly classified by the model.

Fig. 1. Heat map of SVM model confusion matrix
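The following sketch reproduces the shape of this evaluation pipeline (SVC with probability estimation, the four metric functions, and a heatmap of the confusion matrix); the synthetic data only stands in for the balanced stroke features:

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = SVC(kernel="rbf", probability=True).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print(accuracy_score(y_te, pred), recall_score(y_te, pred),
      precision_score(y_te, pred), f1_score(y_te, pred))

# Heat map of the confusion matrix, as in Fig. 1.
sns.heatmap(confusion_matrix(y_te, pred), annot=True, fmt="d")
plt.show()
```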

The ROC (Receiver Operating Characteristic) curve is a visual method to evaluate the performance of a model; its main function is to judge the recognition ability of a classifier for the samples at each threshold. The AUC is the area under the ROC curve: the higher the AUC value, the better the classification and identification effect. The AUC value of the SVM model is 0.88, which is not ideal.

4.2 Decision Tree

In this paper, the DecisionTreeClassifier function in sklearn is called to construct the decision tree model; the random seed of the decision tree is set to 42, and the remaining parameters take their default values. The results show that the Accuracy Rate is 0.882, the Recall Rate is 0.887, the Precision Rate is 0.877, and the F1 value is 0.882. It can be seen that the performance indicators are all around 88%, indicating that the predictive ability of the decision tree model still has room for improvement.

316

X. Zuo et al.

Figure 2 shows the confusion matrix for the experimental results of the decision tree model. The label of a stroke patient is 1, with a total of 968 cases, of which 859 were correctly classified by the model. Correspondingly, the label of a non-stroke patient is 0, with a total of 976 cases, of which 856 were correctly classified by the model.

Fig. 2. Heat Map of Decision Tree Model Confusion Matrix

The AUC value of the decision tree model is 0.88, and its performance still needs to be improved.

4.3 Voting Model

This paper uses SVM and Decision Tree as the base learners of the Voting model. The results show that the Accuracy Rate is 0.882, the Recall Rate is 0.887, the Precision Rate is 0.877, and the F1 value is 0.882. All performance indicators exceed 80%, which improves on the SVM model and matches the performance of the Decision Tree model. Figure 3 shows the confusion matrix for the experimental results of the model. The label of a stroke patient is 1, with a total of 968 cases, of which 859 were correctly classified by the model. Correspondingly, the label of a non-stroke patient is 0, with a total of 976 cases, of which 856 were correctly classified by the model.
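A soft-voting ensemble over these two base learners can be sketched as below; any estimator settings beyond those stated in the paper are assumptions.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Soft voting: the per-class probabilities of the base learners are summed and
# the class with the highest total probability wins, as described in Sect. 3.
voting = VotingClassifier(
    estimators=[("svm", SVC(kernel="rbf", probability=True)),
                ("tree", DecisionTreeClassifier(random_state=42))],
    voting="soft",
)
voting.fit(X_train, y_train)  # X_train/y_train as in the earlier sketches
print("Voting accuracy:", voting.score(X_test, y_test))
```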

Fig. 3. Heat Map of the Voting Model Confusion Matrix

The AUC value of the Voting model reaches 0.93, a marked improvement over both the SVM model and the Decision Tree model.


4.4 Bagging Model

With Bagging ensemble learning on the SVM model, the Accuracy Rate is 0.824, the Precision Rate is 0.795, the Recall Rate is 0.870, and the F1 value is 0.831; compared with the plain SVM, the model performance is improved. With Bagging ensemble learning on the Decision Tree model, the Accuracy Rate is 0.916, the Precision Rate is 0.878, the Recall Rate is 0.966, and the F1 value is 0.920. All metrics except the precision rate exceed 0.9, indicating that the Bagging ensemble with the Decision Tree as its base learner performs excellently. The AUC value of Bagging on the SVM model is the same as that of the SVM model alone, but the AUC value of Bagging on the Decision Tree model is greatly improved, reaching 0.97.

4.5 Stacking Model

This paper uses SVM and Decision Tree as the base learners of the Stacking model. The results show that the Accuracy Rate is 0.880, the Recall Rate is 0.870, the Precision Rate is 0.886, and the F1 value is 0.878. The performance indicators are a large improvement over the SVM model, and the model performance is not much different from that of the Decision Tree model. The AUC value of the Stacking model reaches 0.93, a big improvement over both the SVM model and the Decision Tree model constructed in this paper.
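The Bagging and Stacking variants can be sketched as below; the number of estimators and the second-layer logistic regression are assumptions (the paper does not name its meta-learner), and the estimator parameter requires scikit-learn 1.2 or later.

```python
from sklearn.ensemble import BaggingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Bagging: many trees fitted on bootstrap resamples, predictions aggregated.
bag_tree = BaggingClassifier(estimator=DecisionTreeClassifier(random_state=42),
                             n_estimators=50, random_state=42)
bag_tree.fit(X_train, y_train)

# Two-layer stacking: first-layer base learners feed a second-layer model.
stack = StackingClassifier(
    estimators=[("svm", SVC(kernel="rbf", probability=True)),
                ("tree", DecisionTreeClassifier(random_state=42))],
    final_estimator=LogisticRegression(),  # assumed meta-learner
)
stack.fit(X_train, y_train)
print(bag_tree.score(X_test, y_test), stack.score(X_test, y_test))
```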

5 Conclusion

5.1 Results Analysis

Table 1 summarizes the accuracy, precision, recall, and F1 value of each machine learning model. From the table, it is easy to see that the best-performing model is the Bagging learner with the decision tree model as its base learner. In addition, Fig. 4 compares the ROC curves of all the models. The ROC curve of the SVM model is steeper, with an AUC value of 0.88 that can be further optimized. The ROC curve of the Decision Tree model has a sharp turning point, with the same AUC value of 0.88, so its performance also leaves room for improvement. The AUC value of the Voting model reaches 0.93, a large improvement over both the SVM model and the Decision Tree model. The AUC value of Bagging based on the SVM model is the same as that of the SVM model, while the AUC value of Bagging based on the decision tree model reaches 0.97 and that of the Stacking learner reaches 0.93, both big improvements over the SVM and decision tree models. To summarize, the ensemble learning models in this paper are more effective, among which the Bagging ensemble learning model based on the decision tree performs best.

Table 1. Summary of the performance of each model

Model Name | Accuracy Rate | Recall Rate | Precision Rate | F1-value | AUC Value
SVM | 0.8184156378600823 | 0.887396694214876 | 0.7787851314596554 | 0.8295509415741188 | 0.88
Decision Tree | 0.882201646090535 | 0.887396694214876 | 0.8774259448416751 | 0.8823831535695942 | 0.88
Voting | 0.882201646090535 | 0.887396694214876 | 0.8774259448416751 | 0.8823831535695942 | 0.93
Bagging (SVM) | 0.823559670781893 | 0.8698347107438017 | 0.7950897072710104 | 0.830784410458806 | 0.88
Bagging (Decision Tree) | 0.9161522633744856 | 0.9659090909090909 | 0.8779342723004695 | 0.9198229217904574 | 0.97
Stacking | 0.8796296296296297 | 0.8698347107438017 | 0.8863157894736842 | 0.8779979144942648 | 0.93

Fig. 4. ROC curves of all models

5.2 Future Work

This paper studied methods for predicting stroke risk and obtained good prediction results, but shortcomings remain. The following aspects can be studied more deeply in future work. (1) A grid search algorithm can be used to optimize the parameters of the SVM and decision tree models, which can then serve as the base learners of the Stacking and Voting models, so as to further optimize the ensemble learning. (2) The best-performing Bagging model can be used as the main algorithm for predicting stroke risk. A stroke risk prediction system can be designed that visualizes the various features of stroke and stores the relevant information of each patient; using this information, the system can predict whether a stroke will occur through the Bagging model, thereby making the project more comprehensive.
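For item (1), a grid search over the SVM base learner might be sketched as follows; the parameter grid is illustrative only.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Hypothetical search ranges; in practice they would be tuned to the data.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]}
search = GridSearchCV(SVC(kernel="rbf", probability=True), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```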


Acknowledgements. This work was sponsored by the Shanghai Municipal Education Commission under contract Z90004.23.001 (Professional Master's Degree Authorized School Training Project), and partially sponsored by Sanda University under contract A020201.23.058 (Key Courses Construction Project).


The Users’ Purchase Behavior Research of Buying Clothing of Live Streaming eCommerce on Tiktok

Huang Zheng and Zhiqiang Zhu(B)

Sanda University, 2727 Jinhai Rd., Shanghai 201209, China
[email protected]

Abstract. This paper focuses on the Tiktok platform and the clothing commodity category, and takes consumers who watch live shopping streams as the main research object to study user purchase behavior. Based on the S-O-R theoretical model, this study established a research model of the impact of Tiktok live streaming e-commerce on consumers' purchase behavior for clothing goods, with the characteristics of the Tiktok live broadcast room and the anchor as the stimulus (S), perceived value and satisfaction as the mediators (O), and purchase behavior as the response (R). After obtaining data through a questionnaire survey, SPSS was used to test the reliability and validity of the collected data, linear regression was used to further verify the hypotheses, and the mediating effect was tested through the stepwise regression method. After testing and modifying the hypothesized theoretical model, a revised model was obtained, and operational suggestions were put forward on this basis. Keywords: Clothing purchase behavior · Live Streaming eCommerce · S-O-R

1 Introduction

With the rapid development of short video platforms represented by Tiktok, the online consumption channels of online shoppers have gradually expanded from traditional e-commerce platforms such as Taobao to short video and social platforms. In the first half of 2022, users consuming on traditional e-commerce platforms accounted for 27.3% of online shoppers, while the proportion consuming on short video live streaming platforms reached 49.7%. The majority of these users are young people, which is of great significance for the development of live streaming platforms. A literature search shows many studies on how live streaming e-commerce affects consumers' purchase behavior for clothing products, but none on this effect specifically on the Tiktok platform. This paper takes consumers who purchase clothing goods through Tiktok live streaming e-commerce as the research object, explores the factors influencing consumers' purchase of clothing goods in the live streaming e-commerce environment, and then explores its influencing mechanism.


The aim is to provide a theoretical and empirical basis for the sales strategy of clothing goods sold through Tiktok live streaming e-commerce.

2 Construction of the Hypothesis Model

S-O-R Model. The S-O-R (Stimulus-Organism-Response) model was proposed by Mehrabian and Russell to explain and analyze the impact of the environment on human behavior, and it has been widely applied in consumer behavior theory. The model has matured through continuous innovation, and many scholars have used it for research on consumer psychology and purchasing behavior, achieving results of both theoretical and practical significance. This study takes S-O-R theory as its theoretical basis and innovates on the model and adjusts its variables in light of existing research.

Model Variable Selection. Based on research into the Tiktok platform, the following five dimensions are selected to build the model: visibility, interactivity, and entertainment among the live broadcast room characteristics; professionalism, opinion leadership, and discounts among the anchor characteristics of Tiktok live streaming e-commerce clothing products; functional value, cognitive value, and emotional value within perceived value; satisfaction with the live broadcast room, the anchor, and the clothing; and user stickiness, purchase, and traffic attraction within purchasing behavior.

Live Broadcast Room Features of Tiktok Clothing Goods. By extracting and analyzing the factors identified in related research on live streaming e-commerce and on what drives consumers to buy clothing in the live streaming context, combined with the characteristics of the Tiktok platform itself, this factor is summarized into three measurement indicators: visibility, interactivity, and entertainment. Visibility is defined as the appearance, flat-lay effect, and wearing effect of the clothing displayed in the live broadcast room; interactivity as bullet-screen Q&A in the live broadcast room, responses to bullet-screen requests, and bullet-screen interaction between consumers; and entertainment as interest in the content of the live broadcast room, a good atmosphere in the live broadcast room, and a comfortable overall interface, as shown in Table 1.

Anchor Features of Tiktok Clothing Products. This factor was likewise obtained through observation and investigation of previous studies. In research on the characteristics of live streaming e-commerce anchors, the anchor's professionalism, opinion leadership, and trustworthiness are common factors. This paper summarizes the relevant factors affecting the sale of clothing products via live streaming e-commerce into three measurement indicators: professionalism, opinion leadership, and discounts.


Table 1. Measurement Indicators of Live Broadcast Room Characteristics of Clothing Commodities on Tiktok Platform.

Dimension | Sub-dimension | Measurement indicators
Live broadcast room characteristics (S1) of clothing goods on the Tiktok platform | X1 Apparel visibility | X11 appearance display; X12 flat-lay effect; X13 upper-body (wearing) effect
 | X2 Interactivity | X21 bullet-screen function; X22 bullet-screen interaction with the anchor; X23 bullet-screen interaction with other audience members
 | X3 Entertainment | X31 enjoys watching live streaming e-commerce; X32 live broadcast atmosphere; X33 simple and comfortable live streaming interface

Professionalism is defined as the anchor's understanding of the clothes being sold, the clarity of their explanations of product characteristics, and the shaping of their own sales persona. Opinion leadership combines opinion-leader characteristics with trust in the anchor, and is defined as the degree of trust in the anchor and recognition of the clothes as fashionable. Discounts are defined as the price reductions and gifts the anchor can offer on the clothing while selling, as shown in Table 2.

Table 2. Measurement Indexes of Anchor Characteristics of Clothing Goods on Tiktok Platform.

Dimension | Sub-dimension | Measurement indicators
Anchor characteristics (S2) | X4 Professionalism | X41 anchor's understanding of the products; X42 anchor explains the product clearly; X43 successful anchor persona
 | X5 Opinion leadership | X51 consumers satisfied with the quality of the goods; X52 consumers recognize the goods as the current trend; X53 unique insights in the field of clothing matching
 | X6 Discount | X61 price discounts; X62 additional gifts when placing an order during live streaming; X63 coupons included when placing orders during live streaming


Perceived Value and Satisfaction. Perceived value is often used as a mediating variable in the S-O-R model, which gives it a certain degree of reliability. In line with the research topic, this article takes perceived value as a mediating variable, with its functional value, cognitive value, and emotional value as measurement indicators. Functional value here refers to the demand for the clothing being sold, cognitive value to a deeper understanding of that clothing, and emotional value to greater recognition of live streaming e-commerce as a marketing method for clothing products. Satisfaction, as a commonly used mediating variable in consumer behavior research, is likewise reliable. This article replaces the mediating variables commonly used in the S-O-R model, such as trust and demand release, with satisfaction. Satisfaction, as the psychological feeling of pleasure or displeasure that users form toward the products or services they experience, strongly affects customer expectations, perceived quality, customer complaints, and customer loyalty. Live streaming e-commerce is essentially a vivid marketing tool that lets consumers fully understand the pre-sales display and services of goods. Satisfaction as a mediating variable is thus highly consistent with the research content of this article, and since it is not commonly seen in the S-O-R model, using it is an innovative attempt. This article divides the measurement indicators of satisfaction into satisfaction with the live broadcast room, the anchor, and the products, as shown in Table 3.

Table 3. Intermediary variable measurement indicators.

Mediating variable | Measurement indicators
Z1 Perceived value | Z11 functional value; Z12 cognitive value; Z13 emotional value
Z2 Satisfaction | Z21 satisfaction with the live broadcast room; Z22 satisfaction with the anchor; Z23 satisfaction with the carried clothing

Purchasing Behavior. Purchasing, as the ultimate goal of marketing, has always been the most commonly used dependent variable in the S-O-R model, and this article likewise uses purchasing behavior as the dependent variable. The article supplements purchasing behavior with two elements important to live streaming e-commerce, user stickiness and traffic flow: in addition to purchasing, two further measurement indicators are added, namely increasing the number of views of the live stream and sharing the live broadcast room with family and friends. The details are shown in Table 4.

Table 4. Measurement indicators of purchasing behavior.

Dimension | Measurement indicators
Y1 Purchasing behavior | Y11 increased user stickiness; Y12 clothing purchase; Y13 traffic attraction

3 Hypothesized Model

According to the basic theoretical framework of S-O-R, the stimulus variables (S) are the visibility, interactivity, and entertainment of the Tiktok clothing live broadcast room, together with the professionalism, opinion leadership, and discounts of the anchor; the organism's internal state (O) is divided into two mediating variables, perceived value and satisfaction; and the response (R) is summarized as purchasing behavior. The S-O-R model diagram is drawn according to this description (see Fig. 1).

Fig. 1. S-O-R research model diagram.

4 Data Collection and Research Design

In order to obtain data for verifying the hypothesized model, this article collects measurement indicator data for each dimension through a questionnaire survey. The questionnaire is divided into two parts. The first part covers the gender, age, income, and occupation commonly used in demographics, to understand consumers' basic personal information. The second part covers the specific measurement of the live broadcast room characteristics and anchor characteristics of clothing goods on the Tiktok platform, perceived value, satisfaction, and consumers' purchase behavior for clothing goods, which is the main subject of this survey.


A total of 341 questionnaires were obtained, of which 314 were valid, an effective rate of 92%.

Descriptive Statistics of the Data. In the collected questionnaires, women accounted for 44.3% and men for 55.7%; the female sample was slightly smaller than the male sample, but overall the split is relatively balanced. In the age column, consumers aged 18–25 account for 32.8%, those aged 26–30 for 27.7%, and those aged 31–40 for 21.3%, while the other age groups are relatively small; consumers in these main age groups have distinct personal tastes and consumption power. In the occupation column, individual merchants account for 36.6% of the overall data, followed by government and public institution personnel at 21.3%. In the monthly income column, incomes of 5001–10000 RMB and 10001–15000 RMB together account for 87.3% of the overall data, which is consistent with the social income profile of these age groups. Based on this descriptive analysis, the survey sample is reasonably representative.

Reliability and Validity Testing. The stimulus variables (independent variables) set in this paper comprise two general characteristics, namely the live broadcast room characteristics and the anchor characteristics of Tiktok e-commerce clothing products. The live broadcast room characteristics include visibility, interactivity, and entertainment; the anchor characteristics include professionalism, opinion leadership, and discounts. Two mediating variables are set, namely perceived value and satisfaction, and one response (dependent variable) is set, namely purchasing behavior, giving nine measurable factors in total. Most of the Cronbach's alpha values for the live broadcast room characteristics (clothing visibility, interactivity, entertainment), the anchor characteristics (professionalism, opinion leadership, discounts), perceived value, satisfaction, and purchase behavior exceed 0.7; the overall reliability is 0.955, and the cumulative variance contribution rate of the extracted factors exceeds 50%. Therefore, the factor-analysis validity of this questionnaire survey is also reliable.
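For reference, a minimal sketch of the Cronbach's alpha computation is given below; the Likert-scale item matrix is invented, and the reported statistics were produced with SPSS in the study itself.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score),
    for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-point Likert responses for one dimension (e.g., interactivity X2).
rng = np.random.default_rng(0)
base = rng.integers(1, 6, size=(314, 1))
scores = np.clip(base + rng.integers(-1, 2, size=(314, 3)), 1, 5)
print(cronbach_alpha(scores))  # values above 0.7 are usually deemed acceptable
```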

5 Empirical Analysis and Model Revision

The questionnaire revolves around nine model variables: live broadcast room clothing visibility, interactivity, entertainment, anchor professionalism, opinion leadership, discounts, perceived value, satisfaction, and purchase behavior. Regression analysis is used to test each hypothesis and verify its correctness.

Regression Analysis of the Variables and Perceived Value. Regression analysis was conducted using SPSS 26.0, resulting in Table 5.


The Durbin-Watson coefficient is 1.805, close to 2.00, indicating that the sample observations have a reasonable degree of independence. Because only one predictor variable was measured, R-squared is examined: even with the modest sample size, the live broadcast room characteristics of the Tiktok platform explain 23.1% of the variance in perceived value, which is acceptable. Table 6 shows that the F value is 93.948 and p is less than 0.05, indicating that at least one variable has an impact on the dependent variable, so the overall regression of the model is significant.

Table 5. Tiktok platform live broadcast room model summary.

Model | R | R² | Adjusted R² | Std. error of the estimate | Durbin-Watson
1 | .481a | .231 | .229 | 1.017 | 1.805

a Predictor: (constant), Tiktok platform live broadcast room characteristic. b Dependent variable: Perceived value (Z1)

The p-value of the t-test is 0.000, less than 0.05, which indicates that the predictor variables have a significant impact on the dependent variable: the live broadcast room characteristics of clothing products on the Tiktok platform regress significantly on perceived value, so further analysis is possible. The three dimensions of the Tiktok live broadcast room, namely clothing visibility, interactivity, and entertainment, are taken as independent variables, with perceived value as the dependent variable, and hierarchical regression analysis is carried out. The VIF values are all less than 5, indicating no collinearity among the independent variables. The linear regression shows that the t-test p-values are less than 0.05 and the regression coefficients are all positive, indicating that the visibility, interactivity, and entertainment of clothing in the live broadcast room each have a positive impact on perceived value. The analysis therefore supports the hypotheses that the characteristics of the Tiktok clothing live broadcast room positively affect consumers' perceived value, and that the visibility, interactivity, and entertainment within the live broadcast room each positively affect consumers' perceived value. The following formula is obtained:

Perceived Value (Z1) = 1.162 + 0.161X1 + 0.251X2 + 0.282X3    (1)

It follows that, to enhance users' perceived value, anchors need to pay attention to the visibility of the merchandise in the live broadcast room, interact with the audience, and enliven the atmosphere of the live broadcast room to make it more interesting. The resulting increase in perceived value is then likely to lead to purchasing behavior.

The Users’ Purchase Behavior Research

327

Similarly, the three dimensions of the Tiktok clothing anchor, namely professionalism (X4), opinion leadership (X5), and discounts (X6), are taken as independent variables with perceived value as the dependent variable, and hierarchical regression analysis yields the following result:

Perceived Value (Z1) = 1.289 + 0.265X4 + 0.281X5    (2)

From the regression process, the t-test p-values of anchor professionalism and opinion leadership are less than 0.05 and the regression coefficients are positive, indicating that professionalism and opinion leadership have a positive impact on perceived value, while the t-test p-value for discounts is 0.075, greater than 0.05, so discounts do not have a significant positive impact on perceived value. The analysis therefore supports the hypotheses that the anchor characteristics of Tiktok clothing live streaming e-commerce positively affect consumers' perceived value, that the anchor's professionalism positively affects consumers' perceived value, and that the anchor's opinion leadership positively affects consumers' perceived value; the hypothesis that the anchor's discounts positively affect consumers' perceived value is not supported.

Regression Analysis of the Variables and Satisfaction. Through the same analysis process as the regression analysis of the variables and perceived value, the hypotheses that the live broadcast room characteristics, clothing visibility, interactivity, and entertainment each positively affect consumer satisfaction were all validated. The hypotheses that the anchor characteristics, the anchor's opinion leadership, and the anchor's discounts each positively affect consumer satisfaction were validated, while the hypothesis that the anchor's professionalism positively affects consumer satisfaction was not.

Regression Analysis of Perceived Value, Satisfaction, and Consumer Buying Behavior. Regression analysis of perceived value, satisfaction, and consumer purchasing behavior verifies that both perceived value and satisfaction positively affect users' purchasing behavior for clothing products, with positive impacts on user stickiness, purchasing, and traffic attraction. Perceived value also positively affects satisfaction; the two complement each other.

Mediating Effects of Perceived Value and Satisfaction. The regression analysis shows that perceived value plays a partial mediating role between live broadcast room clothing visibility, interactivity, entertainment, and purchasing behavior. The assumption that the visibility of clothing in the live broadcast room positively affects purchasing behavior through perceived value is valid.


Likewise, the hypotheses that interactivity within the live broadcast room and entertainment in the live broadcast room each positively affect purchasing behavior through perceived value are valid. Satisfaction plays a mediating role between the anchor's opinion leadership, discounts, and purchase behavior: it plays a complete mediating role between opinion leadership and purchase behavior, and a partial mediating role between discounts and purchase behavior. The assumptions that the anchor's opinion leadership and the anchor's discounts positively affect purchase behavior through satisfaction are supported, while the assumption that the anchor's professionalism positively affects purchase behavior through satisfaction is not.
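As a rough illustration of the stepwise mediation test described above, the sketch below runs the three classic regressions (X→Y, X→M, X+M→Y) with statsmodels on hypothetical survey scores; the variable names and data are placeholders, and the original analysis was carried out in SPSS.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 314
x = rng.normal(size=n)            # e.g., anchor opinion leadership (X5)
m = 0.5 * x + rng.normal(size=n)  # mediator, e.g., satisfaction (Z2)
y = 0.6 * m + rng.normal(size=n)  # outcome, e.g., purchasing behavior (Y1)

def ols(dep, *preds):
    X = sm.add_constant(np.column_stack(preds))
    return sm.OLS(dep, X).fit()

step1 = ols(y, x)     # total effect of X on Y
step2 = ols(m, x)     # effect of X on the mediator M
step3 = ols(y, x, m)  # direct effect of X on Y controlling for M
# Full mediation is suggested when X is significant in steps 1-2 but not step 3.
print(step1.params, step2.params, step3.params, sep="\n")
```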

6 Model Correction

Based on the hypothesis test results, the model was modified by removing the effect of discounts on perceived value, the effect of professionalism on satisfaction, perceived value as a mediator between discounts and purchasing behavior, and satisfaction as a mediator between professionalism and purchasing behavior. The modified model is shown in Fig. 2.

Fig. 2. Correction Model.

7 Conclusion

Through the data analysis, the following conclusions can be drawn:


The visibility of clothing in the Tiktok live broadcast room, the interaction within the live broadcast room, and its entertainment effect influence consumers' purchase behavior through perceived value of and satisfaction with the live broadcast room. The professionalism and opinion leadership of Tiktok clothing anchors influence consumers' purchase behavior through the perceived value of the anchor. The opinion leadership and discount characteristics of Tiktok clothing anchors influence consumers' purchase behavior through satisfaction with the anchor. Finally, the audience's perceived value increases their satisfaction with the live broadcast room, the anchors, and the clothing.


Exploration on the Evaluation Index System of E-commerce Application Talents’ Literacy for Industry Needs

Zhiqiang Zhu(B)

Sanda University, 2727 Jinhai Rd., Shanghai 201209, China
[email protected]

Abstract. The rapidly developing e-commerce industry has a huge demand for talents, and currently, the cultivation of talents in universities cannot meet the industry’s needs in terms of quantity and quality. To help universities effectively evaluate the quality of talent cultivation, this study revised the three-level indicators and weights of the professional literacy evaluation index system for vocational undergraduate students based on enterprise recruitment information, and established an e-commerce application talent literacy evaluation index system. Keywords: E-commerce talent cultivation · professional literacy evaluation index system · third level indicators

1 Introduction

“The 50th Statistical Report on the Development of China's Internet in 2022” shows that by June 2022 the number of online shopping users in China had reached 841 million, accounting for 80.0% of all Internet users. The epidemic slowed the development of e-commerce in China, but as the epidemic has been brought fully under control, the momentum of China's economic development is gradually rebounding. The digital economy, live streaming e-commerce, rural e-commerce, cross-border e-commerce, and industrial internet platforms have brought new development opportunities to e-commerce. The industry's demand for e-commerce talent has grown by leaps and bounds, and in an increasingly competitive environment the industry's talent gap has widened sharply. According to the China E-commerce Report (2020), future employment demand for e-commerce and related services will reach 70 million. The demand is not only quantitative: there is also a significant gap between the quality of e-commerce talents cultivated by universities and the requirements of the industry. To meet market demand, universities have in recent years continuously explored and refined training models and approaches for applied talents. They have conducted beneficial research and practice in areas such as school-enterprise cooperation, integration of industry and education, optimization of talent training plans, and exploratory teaching models, and have achieved certain results.


However, these studies and practices focus on exploring model innovation and training methods, and there is currently no market-based research on the literacy of e-commerce talents. Therefore, this study starts from the recruitment needs of market-related positions, builds on the existing theoretical indicator system of applied talent literacy, summarizes the requirements for e-commerce applied talent literacy, establishes an evaluation indicator system for e-commerce applied talent literacy, and explores targeted evaluation of the effectiveness of talent cultivation.

2 Hypothesized Evaluation Indicator System

Liu Chunguang et al. [1] established a professional literacy evaluation index system for vocational undergraduate students. Their system determines the first-level indicators using brainstorming and the Delphi method after soliciting expert opinions, and on this basis forms the second-level and third-level indicators through expert consultation forms. The system includes two first-level indicators (specialized professional literacy and universal professional literacy), seven second-level indicators (professional knowledge literacy, professional skills literacy, professional ethics and personality literacy, professional ideal and belief literacy, professional awareness literacy, professional psychological literacy, and professional behavior literacy), and 32 third-level indicators, with weights set for the second- and third-level indicators. In the second-level indicators of this system, the weights of professional ethics and personality literacy and of professional psychological literacy are the highest, both exceeding 0.2; the weights of professional awareness literacy and professional ideal and belief literacy are close to 0.2; the weight of professional skill literacy is only about 0.1; and the weights of professional knowledge literacy and professional behavior literacy are very low (see Fig. 1).


Fig. 1. Weight distribution of secondary indicators for applied talent literacy evaluation.


The cultivation of e-commerce talents belongs to the category of vocational education. This study assumes that the indicators at all levels of the professional literacy evaluation index system for vocational undergraduate students, together with their weights, are also applicable to the evaluation index system for the literacy of e-commerce applied talents.

3 Evaluation Index System for the Literacy of E-commerce Applied Talents

3.1 Data Collection and Descriptive Analysis

To verify the accuracy of the evaluation system for e-commerce talents, this study verifies and revises it based on the employment needs of enterprises. This article collects recruitment information published in July 2023 by three major Chinese recruitment websites for 270 companies in Shanghai, Beijing, and Shenzhen. The information collected covers seven fields: position, region, education background, company nature, company size, number of recruits, and job description. Preliminary classification and statistics show that, grouped by the nature of the unit, the top three categories among the 270 enterprises are private (165 companies), other (35 companies), and wholly foreign-owned (22 companies), together accounting for over 80% of the total. The rest are joint-stock enterprises (17), joint ventures (14), state-owned enterprises (8), listed companies (8), and Hong Kong, Macao, and Taiwan companies (1). Grouped by size, the top three categories are enterprises with 20–99 employees, 100–299 employees, and fewer than 20 employees, at 45.19%, 23.33%, and 13.70%, respectively. The nature and size distribution of the recruiting enterprises is consistent with the overall enterprise population, and small and medium-sized enterprises can be seen to provide 89.64% of the job opportunities.

3.2 Revision of the Hypothesized Evaluation Indicator System and Establishment of the E-commerce Applied Talent Literacy Evaluation Indicator System

The literacy evaluation index system for e-commerce applied talents retains the first- and second-level indicators of the vocational undergraduate system and revises only the third-level indicators. Based on the third-level indicators of the professional literacy evaluation index system for vocational undergraduate students, the recruitment information of the 270 enterprises was analyzed to obtain the weights of the third-level indicators of the e-commerce applied talent literacy evaluation index, and the vocational undergraduate system was revised accordingly. For each of the 32 original third-level indicators, the number of companies (frequency, X) whose recruitment information states a corresponding requirement was counted across the 270 companies.


These frequencies are compared with the total number of times (Y) that all third-level indicators appear in the recruitment information, and the revised global weight (Z) of each third-level indicator is obtained. The calculation formula is as follows:

Z = X / Y    (1)

X is obtained by counting the number of times the keywords of each indicator appear in the recruitment information. Each third-level indicator may have several keywords; for example, “organizational management ability” is captured by keywords such as “organization”, “management ability”, “coordination”, and “planning”. In the statistical process, X is corrected so that repeated occurrences of multiple keywords within one company are counted only once. Meanwhile, the data analysis found that the specific requirements raised by enterprises usually cover several third-level indicators, so this study merged such indicators: professional ethics, dedication, patriotism, social responsibility, and labor consciousness were merged into one item; the depth and breadth of professional knowledge and humanities and social sciences knowledge were merged into one; willpower, emotional management, and self-management and control abilities were merged into one; excellence and preciseness were combined into one item; professional identity, professional values, and career development were combined into one item; and regulatory awareness and safety awareness were combined into one item. The third-level indicators were thereby reduced from the original 32 items to 20 items. Following the design principles of the hypothesized evaluation index system, the system is organized by first-level indicators and weights, second-level indicators and weights, and third-level indicators and weights; the weights at all levels are recalculated according to the proportions of the third-level indicators, yielding the e-commerce applied talent literacy evaluation index system shown in Table 1.
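A minimal sketch of this frequency-based weight computation (Eq. (1)) is given below; the keyword dictionary and job descriptions are invented for illustration.

```python
from collections import Counter

# Hypothetical keyword dictionary: third-level indicator -> trigger keywords.
KEYWORDS = {
    "organizational management ability": ["organization", "management ability",
                                          "coordination", "planning"],
    "expression and communication skills": ["communication", "expression"],
}

# Hypothetical job descriptions, one string per company posting.
postings = [
    "strong communication and coordination skills, planning experience",
    "good expression, teamwork and management ability",
]

x = Counter()
for text in postings:
    for indicator, words in KEYWORDS.items():
        # Count each company at most once per indicator, even if several of
        # its keywords appear in the same posting (the correction applied to X).
        if any(w in text for w in words):
            x[indicator] += 1

y = sum(x.values())  # total occurrences across all third-level indicators
weights = {ind: cnt / y for ind, cnt in x.items()}  # Z = X / Y
print(weights)
```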

Table 1. Evaluation Index System for the Literacy of E-commerce Applied Talents.

First level indicators (weight) | Second level indicators (weight) | Third level indicators (weight)
Specialized Professional Literacy (0.162407723) | Professional knowledge literacy (0.234266) | Professional and social science knowledge (1)
 | Professional skills literacy (0.765734) | A rigorous and refined work attitude (0.223744); Operational ability (0.776256)
Universal Professional Literacy (0.837592277) | Professional ethics and personal qualities (0.041356) | Social ethics and professional ethics (1)
 | Professional ideals, beliefs, and literacy (0.081356) | Career value and career development (1)
 | Professional awareness literacy (0.288814) | Service awareness (0.215962); Cooperative awareness (0.50939); Obedience consciousness (0.129108); Regulatory awareness and safety awareness (0.037559); Competitive awareness (0.107981)
 | Professional psychological literacy (0.03661) | Anti-frustration ability (0.537037); Aggressiveness (0.462963)
 | Professional behavior literacy (0.551864) | Expression and communication skills (0.250614); Self-management (0.017199); Problem solving and execution ability (0.223587); Information collection and processing capabilities (0.128993); Management ability (0.194103); Innovation (0.030713); Learning ability (0.127764); Professional etiquette performance (0.027027)


Comparison shows that the weights in the revised evaluation index system for the literacy of e-commerce applied talents differ markedly from those in the professional literacy evaluation index system for vocational undergraduate students. The weight of professional knowledge literacy changed from 0.0402 before the revision to 0.2342 after it, professional skills literacy rose from 0.0536 to 0.7657, and professional behavior literacy rose from 0.2613 to 0.5519. These substantial changes reflect the different emphases of recruiting companies and of the experts behind the original system: companies need applicants who possess substantial knowledge and skills that can be applied to their work as soon as possible, while experts place more emphasis on cultivating students' internal professional qualities, solid basic knowledge and abilities, and future career development. The two second-level indicators of professional ethics and personality literacy and of professional ideal and belief literacy decreased significantly after revision, from 0.1005 and 0.1551 to 0.0414 and 0.0814, respectively. This does not mean that companies overlook the moral character and professional beliefs of applicants; rather, companies examine these two indicators through employees' attitudes toward work and their abilities. Similarly, professional psychological literacy decreased significantly after the revision, which can be explained by the fact that enterprises evaluate employees' psychological literacy through their professional behavior; accordingly, the weight of professional behavior literacy increased significantly in the revised indicator system.

4 Conclusion

The establishment of an evaluation index system for the literacy of e-commerce applied talents offers a useful reference for the cultivation of e-commerce talents in vocational colleges. The traditional educational philosophy of universities rests on a talent cultivation model of “emphasizing foundations and broadening scope”, which gives students a solid knowledge base, lets them meet the requirements of multiple fields of work, and supports long-term career development. There is a certain gap between this and enterprises' requirement that applicants master professional skills and adapt quickly to current job demands. How to balance these two conceptions still requires continuous research and exploration.

Bibliography

1. Liu, C., Xie, J.: The exploration and construction of the evaluation index system of professional quality of students in vocational colleges. Forum Contemp. Educ. 2 (2023)
2. Min, Z.: Application-oriented college students based on employment orientation: exploration of professional literacy cultivation. Employ. Entrep. https://doi.org/10.19424/j.cnki.41-1372/d.2023.08.001
3. Bing, W.: Research on vocational literacy education for applied undergraduate college students in the new era. J. Dezhou Univ. 38(6) (2022)
4. Yue, K., Kan, Y., Wang, Y., Qian, W.: Knowledge graph association query processing for e-commerce application. Comput. Integr. Manuf. Syst. 26(5) (2020)
5. Peng, L., Wang, J.: The problems and countermeasures of cultivating applied talents in e-commerce in universities. Theor. Pract. Educ. 40(30), 13–15 (2020)

Research on Cosmetics Consumer Behavior Analysis and Marketing Strategy Optimization on E-Commerce Platform “Xiaohongshu”

Xuwei Zhang1 and Zhiqiang Zhu2(B)

1 School of Information Science and Technology, Sanda University, 2727 Jinhai Rd., Shanghai 201209, China
2 Sanda University, 2727 Jinhai Rd., Shanghai 201209, China
[email protected]

Abstract. Based on consumer behavior, marketing strategies, and related theories, this article takes the “Xiaohongshu” e-commerce platform as the research object. Through questionnaire surveys, literature research, and data analytics, it studies the consumer behavior and marketing strategies of China’s cross-border ecommerce platforms, and proposes targeted improvement suggestions from product, price, service, and website visibility, to provide reference for the development of “Xiaohongshu” and other cross-border e-commerce platforms. Keywords: Cross-border e-commerce platforms · Consumer Behavior · “Xiaohongshu”

1 Introduction

In recent years, with the significant improvement of people's income, consumers' demand for overseas goods has become increasingly strong, leading to the emergence of cross-border e-commerce platforms. At the same time, consumer behavior and habits have also changed, which puts forward new requirements for these platforms. Cosmetics, as daily necessities, are among the top-selling products on overseas purchasing websites, and “Xiaohongshu” is one of the main purchasing sites for overseas cosmetics. The platform initially served as a lifestyle sharing platform and later moved into cross-border e-commerce to support consumers' purchase decisions. Within just five months, sales reached over 200 million RMB. As of July 2019, the number of “Xiaohongshu” users had exceeded 300 million and monthly active users had exceeded 100 million, making it a dark horse in China's cross-border e-commerce. There is already considerable research on online marketing strategies for cosmetics. These studies have identified problems such as an immature online marketing and sales environment for cosmetics, an incomplete logistics network, and an incomplete credit system.


In addition, many enterprises have issues such as low sensitivity to customers' product demands, incomplete brand strategies, chaotic product pricing, and slow channel construction and updates. But these issues are all examined at the enterprise level; no research has been found on the online sales platforms that are the main channels for cosmetics sales. This study takes the “Xiaohongshu” platform as an example and analyzes data obtained from user surveys to study the platform's online marketing strategies, providing a reference for “Xiaohongshu” and other cross-border e-commerce platforms.

2 Data Collection

2.1 Distribution and Recovery of Questionnaires

The questionnaire for the study was mainly administered on the Internet (the Questionnaire Star platform), and the survey links were distributed through WeChat, QQ, and other platforms. A total of 320 questionnaires were collected, of which 300 were valid, a rate of 93.75%.

2.2 Statistical Analysis of the Sample

The collected data are shown in Table 1.

Table 1. Basic information of the respondents.

Variable | Option | Number | Percent (%)
Gender | Female | 174 | 58.0
 | Male | 126 | 42.0
Age | Under 18 years old (included) | 60 | 20.0
 | 18–30 (included) years old | 109 | 36.3
 | 31–40 (included) years old | 45 | 15.0
 | 41–50 (included) years old | 40 | 13.3
 | Above 50 years old | 46 | 15.3
Education | High school and under (included) | 83 | 27.7
 | College | 62 | 20.7
 | Undergraduate | 102 | 34.0
 | Postgraduate | 53 | 17.7
Occupation | Enterprise staff | 55 | 18.3
 | Employees of educational/scientific research institutions | 41 | 13.7
 | Government staff | 35 | 11.7
 | Freelance | 53 | 17.7
 | Student | 69 | 23.0
 | Others | 47 | 15.7


From a gender perspective, there are more female respondents (58.0%) than male (42.0%). By age group, the 18–30 bracket is the largest, with 109 people, about 36.3% of the total, followed by those aged 18 and below (60 people, 20.0%); the 31–40, 41–50, and over-50 groups are smaller, at about 15.0%, 13.3%, and 15.3%, respectively. By educational background, undergraduates are the largest group (102, 34.0%), followed by high school and below (83, about 27.7%); respondents with an associate degree or with a master's degree and above are fewer, at about 20.7% and 17.7%, respectively. By occupation, students are the largest group (69, 23.0%), followed by enterprise staff (55, 18.3%) and freelancers (53, 17.7%), while staff of educational/research institutions, government staff, and others are fewer, at 13.7%, 11.7%, and 15.7%, respectively.

2.3 Statistical Analysis of the Usage of “Xiaohongshu”

To better analyze the usage of “Xiaohongshu”, this article describes it from three aspects based on the survey: usage duration, shopping frequency, and reasons for choosing the platform. As shown in Fig. 1, the largest group of respondents has used “Xiaohongshu” for 1 to 3 years, about 40.67% of the total, followed by those who have used it for 6–12 months and for 3 to 5 years, at 20.00% and 16.33%, respectively; those who have used it for more than 5 years and for less than 6 months are the fewest, at 12.67% and 10.33%, respectively. It can be seen that the past three years have been a period of rapid growth in “Xiaohongshu” users.

Fig. 1. Statistical distribution of the duration of consumer use of “Xiaohongshu”.

From Fig. 2, it can be seen that the largest share of respondents shop on the “Xiaohongshu” app 1–3 times a month, about 43% of the total. Those who shop 1–3 times or more per week account for 34%, those who shop 3–5 times per half year for 17%, and those who shop fewer than 2 times per half year for the lowest share, 6%. In summary, the survey shows that current “Xiaohongshu” users shop on the platform quite frequently.


Fig. 2. Frequency distribution of consumers using “Xiaohongshu” for shopping.

Among the respondents, the most common reasons for choosing the platform were that the quality of goods on “Xiaohongshu” is guaranteed and that the platform is cheap, at about 20.67% and 19.00% of the total, respectively, followed by website awareness, good service attitude, and complete product range, at 16.33%, 12.67%, and 11.33%. The share choosing “Xiaohongshu” for fast logistics was the lowest, about 8.67%, and a further 11.33% of consumers chose other reasons. Quality assurance and low prices are thus important drivers of consumers' use of “Xiaohongshu”. In summary, the research on usage shows that 77% of consumers purchase on “Xiaohongshu” once a month or more, indicating a high level of trust in the platform; the concentration of usage durations within 3 years indicates that, with promotion, “Xiaohongshu” has gained more and more recognition from consumers; and the analysis of the reasons for choosing it shows that product quality is consumers' greatest concern and has become the main factor affecting shopping on “Xiaohongshu”.

3 Empirical Results and Analysis

3.1 Analysis of Factors Influencing Consumers' Use of “Xiaohongshu” for Shopping

To better analyze the reasons affecting consumers' use of “Xiaohongshu”, this article describes them from three aspects based on the survey questionnaire: products, shopping community recommendations, and the logistics system.

Product Factors. This article studies the level of consumer recognition of “Xiaohongshu” product types, prices, quality, product introductions, and update speed; the details are shown in Table 2. Among the respondents, more people believe that product quality on the “Xiaohongshu” platform is guaranteed, that prices are low, and that product introductions are detailed, accounting for 58.27%, 52.59%, and 50.92% of the total, respectively. This indicates that the majority of consumers are relatively satisfied with the quality, prices, and introductions of products on the platform. Fewer consumers believe that the platform offers a wide variety of product brands and a fast update speed, at 49.12% and 46.58% of the total, respectively. This indicates that “Xiaohongshu” needs further optimization in product variety and update speed.

Table 2. Consumer recognition of “Xiaohongshu” products and related factors (%).

Question                         Strongly Agree   Agree   Not sure   Disagree   Strongly Disagree
Wide range of brands             22.55            26.57   30.00      17.58      3.30
Low price                        28.24            24.35   24.73      15.29      7.39
Quality assured                  27.72            30.55   20.19      13.34      8.20
Detailed product introduction    17.31            33.61   22.47      19.41      7.20
Fast product updates             22.34            24.24   23.68      20.43      9.31

Shopping Community Factors. From Table 3, it can be seen that 54.88%, 64.25%, and 53.55% of the respondents, respectively, believe that the “Xiaohongshu” shopping sharing community has a significant impact on their shopping decisions, that it deepens their understanding of products, and that “Xiaohongshu” notes are trustworthy. This indicates that the shopping sharing community has a significant impact on consumers' shopping and a positive effect on improving the platform's customer conversion rate.

Table 3. Consumer recognition of “Xiaohongshu” shopping sharing community (%).

Question                                                          Strongly Agree   Agree   Not sure   Disagree   Strongly Disagree
The shopping community will affect my shopping decisions          29.43            25.45   21.63      14.97      8.52
The shopping community allows me to fully understand the products 30.23            34.02   21.43      11.11      3.21
“Xiaohongshu” notes are authentic and trustworthy                 27.88            25.67   20.67      16.13      9.65

Logistics Factors. From Table 4, it can be seen that more respondents believe that “Xiaohongshu” offers fast logistics delivery and a good customer service attitude, accounting for 53.49% and 51.46% of the total, respectively. This indicates that most consumers are relatively satisfied with the platform's delivery speed and customer service attitude. Fewer consumers believe that returns and exchanges are fast and that transaction security is guaranteed, at 38.79% and 49.10% of the total, respectively. This indicates that “Xiaohongshu” needs further optimization of returns, exchanges, and transaction security, especially in terms of convenience.

Table 4. Consumer recognition level of “Xiaohongshu” logistics system (%).

Question                                          Strongly Agree   Agree   Not sure   Disagree   Strongly Disagree
Fast logistics delivery                           25.02            28.47   20.27      18.01      8.23
Good customer service personnel                   28.43            23.03   20.56      18.44      9.54
Convenient and effective return and exchange      16.34            22.45   33.43      19.68      8.10
Transaction security                              18.76            30.34   27.45      16.76      6.69

In summary, research on the factors affecting consumers' use of “Xiaohongshu” for shopping found the following. In terms of product factors, consumers are relatively satisfied with the quality, prices, and product introductions on the “Xiaohongshu” platform, but product variety and update speed need optimization. In terms of shopping community factors, shopping sharing communities have a significant impact on consumers' shopping and should be actively developed. In terms of logistics factors, consumers are relatively satisfied with the platform's delivery speed and customer service, but returns, exchanges, and transaction security need optimization.

3.2 Cross Analysis of Survey Data

To better promote the development of “Xiaohongshu”, we must understand why consumers choose the platform. Therefore, this article uses SPSS 20.0 to conduct a cross analysis of consumers' gender, age, education, and occupation against the reasons why consumers choose “Xiaohongshu”, in order to understand the relationships between these factors.

The reasons for choosing “Xiaohongshu” online shopping vary with the gender of consumers. From Table 5, it can be seen that male consumers choose “Xiaohongshu” more for its guaranteed quality and website visibility, while female consumers choose it more for its low prices and guaranteed quality. According to the Chi-squared test results of gender and selection reasons (see Table 6), gender has no significant impact on the reasons why consumers choose to shop on “Xiaohongshu”.

Table 5. Cross Table of Gender and Reasons for Selection.

                      Reasons for choosing “Xiaohongshu” online shopping
Gender                Rich variety   Good service   Quality assured   Low price   Fast logistics   High website reputation   Other    Total
M       Number        14             12             31                18          15               19                        17       126
        Percent/%     11.1%          9.5%           24.6%             14.3%       11.9%            15.1%                     13.5%    100.0%
F       Number        20             26             31                39          11               30                        17       174
        Percent/%     11.5%          14.9%          17.8%             22.4%       6.3%             17.2%                     9.8%     100.0%
Total   Number        34             38             62                57          26               49                        34       300
        Percent/%     11.3%          12.7%          20.7%             19.0%       8.7%             16.3%                     11.3%    100.0%

Table 6. Chi-squared test results of gender and selection reasons.

                                 Value    df   Asymptotic Sig. (2-sided)
Pearson chi-square               9.604a   6    .142
Likelihood ratio (LR)            9.663    6    .140
Linear-by-linear association     .368c    1    .544
N of valid cases                 300
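The Pearson statistic in Table 6 can be reproduced directly from the counts in Table 5. A minimal sketch with SciPy is shown below (the paper itself used SPSS 20.0):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts from Table 5 (rows: male, female; columns: the seven
# selection reasons, from "Rich variety" to "Other").
observed = np.array([
    [14, 12, 31, 18, 15, 19, 17],   # male, n = 126
    [20, 26, 31, 39, 11, 30, 17],   # female, n = 174
])

chi2, p, df, expected = chi2_contingency(observed)
print(f"Pearson chi-square = {chi2:.3f}, df = {df}, p = {p:.3f}")
# Matches Table 6 (chi-square ≈ 9.604, df = 6, p ≈ 0.142): independence of
# gender and selection reason is not rejected at the 0.05 level.
```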

The reasons for choosing “Xiaohongshu” online shopping also vary with the age of consumers. It can be seen that consumers aged 20 and below, 41 to 50, and 51 and above care more about product quality: those aged 20 and below show a degree of comparison-driven consumption, while those aged 41 to 50 and 51 and above pursue quality more. Consumers aged 21 to 30 and 31 to 40 focus more on cost-effectiveness and hope to obtain affordable products. According to the Chi-squared test results of age and selection reasons, age has a significant impact on the reasons why consumers choose to buy on “Xiaohongshu”.

There are also certain differences in the reasons for choosing “Xiaohongshu” online shopping across educational backgrounds. Consumers with higher education levels pay more attention to product quality, while consumers with lower education levels pay more attention to product prices. It was also found that consumers with a bachelor's degree place more emphasis on website visibility. According to the Chi-squared test results of educational background and selection reasons, educational background has a significant impact on the reasons why consumers choose to buy on “Xiaohongshu”.

There are likewise certain differences across professions. It can be seen from Table 4-11 that staff of education/research institutions and civil servants pay more attention to product quality and service attitude, while students, freelancers, and enterprise staff pay more attention to product price and website popularity. According to the Chi-squared test results of occupation and selection reasons (Table 4-12), occupation has a significant impact on the reasons why consumers choose to buy on “Xiaohongshu”.

4 Conclusion

The past three years have been a period of rapid user growth for “Xiaohongshu”, but its product variety and update speed need to be optimized. Based on the research results, the following suggestions for optimizing marketing strategies are proposed.

Product Strategy. “Xiaohongshu” still needs to actively expand its suppliers and improve the platform's product supply chain, and should appropriately increase the types of cosmetics offered based on consumer preferences and actual conditions. In terms of product description, the “Xiaohongshu” shopping sharing community has greatly improved consumers' purchase rate; if the platform can further provide detailed product descriptions while ensuring that the description information is authentic and reliable, it will further increase consumers' purchasing demand. The “Xiaohongshu” e-commerce platform should also develop the men's and infant markets and actively expand the rural market.

Price Strategy. “Xiaohongshu” must develop a reasonable pricing strategy, and differentiated pricing should be adopted. The platform should also actively use combination pricing to offer product bundles, run holiday discount promotions, and introduce featured goods to attract customers.

Service Strategy. “Xiaohongshu” should actively strengthen the training and management of customer service personnel and improve service quality, enabling consumers to fully understand the information, performance, and added value of cosmetics and providing a basis for their purchasing decisions. Customer inquiries must be handled reasonably and effectively to improve customer perception. “Xiaohongshu” must strengthen cooperation with logistics companies, both to improve delivery accuracy and to supervise order execution. In addition, personalized recommendations can be made based on consumers' purchase history and consumption habits, with precisely targeted advertising to help consumers discover products of interest.


Enhancing Website Visibility. Website popularity is an important influence on consumer behavior. The platform should actively expand publicity through public channels such as web pages, WeChat, Weibo, blogs, Zhihu, TikTok, and Kwai. When promoting, it is important to vigorously showcase the characteristics of the “Xiaohongshu” website and use live streaming to attract more traffic. In terms of advertising placement, online celebrity marketing and variety show advertising can be used to directly obtain traffic. Topics can also be created on Weibo to attract consumer discussion and attention, achieving promotional effects and thereby increasing awareness.


DOA Estimation of Special Non-uniform Linear Array Based on Quantum Honey Badger Search Algorithm

Yaqing Zheng, Hongyuan Gao(B), and Yulong Qiao

Harbin Engineering University, Harbin 150001, China
{zhengyaqing,gaohongyuan,qiaoyulong}@hrbeu.edu.cn

Abstract. Aiming at the current lack of research on the application of special non-uniform linear arrays in impulsive noise and multi-coherent-signal environments, this paper proposes a direction of arrival (DOA) estimation method with higher effectiveness and robustness. The proposed matrix based on the sine transform exponential kernel low-order moment (SCELOM) can effectively suppress impulsive noise. The maximum likelihood (ML) algorithm is used to obtain good direction-finding performance, and quantum optimization theory is applied to the honey badger bionic mechanism to construct the quantum honey badger algorithm (QHBA), which addresses the large computational complexity of the multidimensional nonlinear optimization problem in the maximum likelihood algorithm and improves search efficiency and estimation accuracy. Finally, the maximum likelihood equation with the SCELOM matrix, named QHBA-SCELOM-NLA-ML, is designed. Monte Carlo simulation results show that the proposed algorithm has the advantages of fast convergence, high precision, strong robustness against impulsive noise, a scalable array aperture, good decoherence, and wide applicability.

Keywords: Impulsive noise · Special non-uniform linear array · Quantum honey badger algorithm · Sine transform exponential kernel low-order moment · Maximum likelihood method

1 Introduction

At present, as a key technology and research focus of array signal processing, DOA estimation has been extensively applied in many areas such as seismic detection, radar, passive sonar, and wireless communication [1, 2]. Most existing DOA estimation methods are proposed under the assumption that the array element noise is white Gaussian noise [3]. However, many applications of DOA estimation involve non-Gaussian noise environments with impulsive characteristics, such as underwater noise and some man-made noise, which can be described by alpha-stable processes with different characteristic exponents [4]. The characteristic exponent determines the impulsiveness of the alpha-stable distribution: the smaller its value, the more pronounced the impulsive behavior. For non-Gaussian noise without finite second-order and higher moments, models designed for the Gaussian case do not match it, making traditional algorithms based on second-order and higher-order cumulants invalid. Therefore, it is necessary to design low-order moments of the received signal for the impulsive noise environment [5–7].

The special non-uniform linear array is an extended array structure in which the set of relative position differences of the array elements is extended. It can expand the effective array aperture, so that direction finding can be applied even when the number of sources is larger than the number of elements, and it overcomes the ambiguity problem that can arise with non-equidistant linear arrays; it has therefore drawn extensive attention and research from scholars [8]. However, the application of special non-uniform linear arrays to environments with impulsive noise and multiple coherent sources is still little studied. This paper gives a method to solve this problem and broadens the application range of special non-uniform linear arrays.

A novel stable maximum likelihood method based on SCELOM is designed for the DOA problem in impulsive noise. In general, the resulting fitness function is a nonlinear multi-dimensional function, and optimizing it involves a large amount of calculation. To avoid this, a new intelligent computation algorithm is designed [9]: the quantum honey badger algorithm (QHBA), which is used to optimize the designed maximum likelihood equation over the search domain and obtain the corresponding optimal angles.

The rest of this paper is organized as follows. Section 2 develops the DOA estimation model in the impulsive noise environment. Section 3 designs the new DOA estimation algorithm QHBA-SCELOM-NLA-ML based on QHBA and SCELOM-NLA-ML. Section 4 presents the simulation parameter settings and experimental results. Section 5 concludes the paper.

2 DOA Estimation Model

Assume that the special non-uniform linear array is composed of M isotropic antenna elements. On the basis of the characteristics of the special non-uniform linear array, a uniform linear array composed of more elements can be simulated. If the maximum correlation delay produced by the special non-uniform linear array is M̃ − 1, the number of virtual elements in the virtual linear array is M̃. The element position vector of the special non-uniform linear array is defined as p = [p_1, p_2, ..., p_M], p_1 < p_2 < ... < p_M, where p_m denotes the position of the mth element relative to the first element, m = 1, 2, ..., M. With the first element at p_1 = 0, we have p = Hd = [h_1 d, h_2 d, ..., h_M d], where H = [h_1, h_2, ..., h_M] is a set of natural numbers and d is the minimum element spacing of the uniform linear array that can be constructed from the special non-uniform linear array, 0 < d ≤ λ/2, with λ the wavelength of the incident signal. The difference set H̃ = {h_r̃ − h_m̃ | h_r̃ ∈ H, h_m̃ ∈ H, r̃ > m̃} can then be formed; its elements are the continuous natural numbers from 1 to M̃ − 1. The maximum element difference, i.e. the aperture of the virtual uniform linear array, is M(M − 1)/2, and the natural numbers in the relative position difference set H̃ are continuous.

Suppose that, in the far field of the array, N narrowband point sources are incident from the directions θ = [θ_1, θ_2, ..., θ_N] on the special non-uniform linear array with incident wavelength λ. The kth snapshot of the received sampling data is then given by

$$z(k) = A(\theta)\,s(k) + n(k) \tag{1}$$

where k = 1, 2, ..., K, K is the maximum number of snapshots, z(k) = [z_1(k), z_2(k), ..., z_M(k)]^T is the M × 1 snapshot data vector received by the array, s(k) = [s_1(k), s_2(k), ..., s_N(k)]^T is the N × 1 signal vector, and n(k) = [n_1(k), n_2(k), ..., n_M(k)]^T is the M × 1 noise vector, which is spatially and temporally independent impulsive noise. A(θ) = [a(θ_1), a(θ_2), ..., a(θ_N)] is the M × N steering matrix, where a(θ_i) is the ith steering vector, i = 1, 2, ..., N, and θ = [θ_1, θ_2, ..., θ_N] is the source direction vector. The steering vector of the special non-uniform linear array for incident angle θ_i is a(θ_i) = [e^{−j2πp_1 sin(θ_i)/λ}, e^{−j2πp_2 sin(θ_i)/λ}, ..., e^{−j2πp_M sin(θ_i)/λ}]^T, where j is the imaginary unit.

The sine transform exponential kernel low-order matrix of the array received signal is defined as R̃; its element R̃_{ij} in row i and column j can be expressed as

$$\tilde{R}_{ij} = E\left\{\exp\!\left(-\frac{\sin\!\big[(\tilde{c}_1\,|\tilde{z}_i(k)-\tilde{z}_j^{*}(k)|)^2\big]}{2\tilde{\sigma}^2}\right)\cdot \tilde{z}_i(k)\,\tilde{z}_j^{*}(k)\right\} \tag{2}$$

where c̃_1 is a weight constant; z̃_i(k) and z̃_j(k) denote the ith and jth components of the received kth snapshot vector z̃(k) = [z̃_1(k), z̃_2(k), ..., z̃_M(k)]^T, with 1 ≤ i ≤ M and 1 ≤ j ≤ M; |·| denotes the absolute value; (·)^* is the conjugate operation; and σ̃ is the kernel length of the kernel function. The sine transform exponential kernel low-order matrix R̃ can be further expressed as

$$\tilde{R} = \begin{bmatrix} \tilde{R}_{11}^{h_1-h_1} & \tilde{R}_{12}^{h_1-h_2} & \cdots & \tilde{R}_{1M}^{h_1-h_M} \\ \tilde{R}_{21}^{h_2-h_1} & \tilde{R}_{22}^{h_2-h_2} & \cdots & \tilde{R}_{2M}^{h_2-h_M} \\ \vdots & \vdots & \ddots & \vdots \\ \tilde{R}_{M1}^{h_M-h_1} & \tilde{R}_{M2}^{h_M-h_2} & \cdots & \tilde{R}_{MM}^{h_M-h_M} \end{bmatrix} \tag{3}$$

The low-order moment matrix of the sine transform exponential kernel of the normalized signal, constructed on the basis of the virtual uniform array, is then

$$\bar{R} = \begin{bmatrix} \bar{R}_{11} & \bar{R}_{12} & \cdots & \bar{R}_{1\tilde{M}} \\ \bar{R}_{21} & \bar{R}_{22} & \cdots & \bar{R}_{2\tilde{M}} \\ \vdots & \vdots & \ddots & \vdots \\ \bar{R}_{\tilde{M}1} & \bar{R}_{\tilde{M}2} & \cdots & \bar{R}_{\tilde{M}\tilde{M}} \end{bmatrix} \tag{4}$$

where M̃ is the largest expanded dimension of the matrix R̃, $\bar{R}_{rm} = E[\tilde{R}_{lf}^{(h_l-h_f)}]$, E is the mathematical expectation, and r − m = h_l − h_f with r, m ∈ H̃ and h_l − h_f ∈ H̃. The steering matrix of the special non-uniform linear array after virtual expansion into a virtual uniform linear array is B̄(θ) = [b̄(θ_1), b̄(θ_2), ..., b̄(θ_N)], where the extended steering vector corresponding to the ith angle is b̄(θ_i) = [1, e^{−j2πd sin(θ_i)/λ}, ..., e^{−j2π(M̃−1)d sin(θ_i)/λ}]^T, i = 1, 2, ..., N. The maximum likelihood equation based on the low-order moment of the sine transform exponential kernel can then be written as

$$\hat{\theta} = \arg\max_{\theta}\,\mathrm{tr}\!\left(P_{\bar{B}(\theta)}\,\bar{R}\right) \tag{5}$$

where tr(·) is the matrix trace and $P_{\bar{B}(\theta)} = \bar{B}(\theta)\big(\bar{B}^{H}(\theta)\bar{B}(\theta)\big)^{-1}\bar{B}^{H}(\theta)$ is the projection matrix onto B̄(θ).
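As a concrete illustration of Eq. (2), the following NumPy sketch (not the authors' code) estimates the SCELOM matrix from an M × K snapshot matrix. It assumes the expectation is replaced by a snapshot average; the constants follow the paper's simulation settings (c̃_1 = 0.5, σ̃ = 1.5), and the virtual-array averaging of Eq. (4) is omitted for brevity:

```python
import numpy as np

def scelom_matrix(Z, c1=0.5, sigma=1.5):
    """SCELOM matrix of Eq. (2): Z is the M x K snapshot matrix z(k);
    the expectation E{.} is replaced by an average over the K snapshots."""
    M, _ = Z.shape
    R = np.zeros((M, M), dtype=complex)
    for i in range(M):
        for j in range(M):
            diff = np.abs(Z[i, :] - np.conj(Z[j, :]))
            # Sine-transform exponential kernel that damps impulsive samples
            kernel = np.exp(-np.sin((c1 * diff) ** 2) / (2 * sigma ** 2))
            R[i, j] = np.mean(kernel * Z[i, :] * np.conj(Z[j, :]))
    return R
```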

3 DOA Estimation for Special Non-uniform Linear Array Based on Quantum Honey Badger Algorithm

3.1 Quantum Honey Badger Algorithm

The honey badger algorithm (HBA) is a recent intelligent optimization algorithm inspired by the honey badger's foraging behavior [10]. The quantum honey badger algorithm (QHBA) is proposed by combining HBA with quantum theoretical mechanisms. In QHBA, the size of the quantum honey badger swarm is Ñ, the number of unknown parameters to be optimized is B, the maximum number of iterations is T_max, and ε denotes the iteration index. Each quantum honey badger in the swarm has its own quantum position and position. The quantum positions of the Ñ quantum honey badgers are randomly initialized within [0, 1]; the nth quantum honey badger has quantum position x_n^ε = [x_{n1}^ε, x_{n2}^ε, ..., x_{nB}^ε], n = 1, 2, ..., Ñ, where 0 ≤ x_{nb}^ε ≤ 1, b = 1, 2, ..., B. It is mapped to the corresponding position x̃_n^ε = [x̃_{n1}^ε, x̃_{n2}^ε, ..., x̃_{nB}^ε] by the rule

$$\tilde{x}_{nb}^{\varepsilon} = \tilde{x}_{b}^{\min} + x_{nb}^{\varepsilon}\,\big(\tilde{x}_{b}^{\max} - \tilde{x}_{b}^{\min}\big) \tag{6}$$

where x̃_{nb}^ε ∈ [x̃_b^min, x̃_b^max], with x̃_b^min and x̃_b^max the lower and upper bounds of the bth dimension variable.

For each quantum honey badger, two quantum position update strategies are used with a certain probability; according to the quantum evolution rules, the corresponding quantum rotation angle is generated and the quantum position is updated using the simplified quantum rotation gate. For the nth quantum honey badger, when p_n^ε ≤ η the first update strategy is used, where η ∈ [0, 1] is a probabilistic choice constant and p_n^ε is a uniformly distributed random number in [0, 1]. The first strategy updates the quantum position through the simplified quantum rotation gate according to the current quantum position; the quantum rotation angle of the bth dimension of the nth quantum honey badger is

$$\delta_{nb}^{\varepsilon+1} = \tilde{c}\,r_{nb}^{\varepsilon}\,F_{nb}^{\varepsilon}\,\frac{S_{n}^{\varepsilon}}{4\pi (\bar{d}_{n}^{\varepsilon})^{2}}\,x_{gb}^{\varepsilon} + F_{nb}^{\varepsilon}\,\tilde{r}_{nb}^{\varepsilon}\,\alpha^{\varepsilon}\,\bar{d}_{n}^{\varepsilon}\,\cos(2\pi \dot{r}_{nb}^{\varepsilon})\,\big[1-\cos(2\pi \ddot{r}_{nb}^{\varepsilon})\big] \tag{7}$$

where r_{nb}^ε, r̃_{nb}^ε, ṙ_{nb}^ε, and r̈_{nb}^ε are uniform random numbers in [0, 1]; c̃ is a weight constant; S_n^ε is the source strength (also called the concentration strength) of the nth quantum honey badger in the εth iteration; d̄_n^ε denotes the distance from the nth quantum honey badger to the prey at the εth iteration; x_{gb}^ε is the bth dimension of the global optimal quantum position; and δ_{nb}^{ε+1} is the quantum rotation angle representing the evolutionary trend of the nth quantum honey badger in iteration ε + 1. F_{nb}^ε is a flag that changes the search direction of the bth dimension variable of the nth quantum honey badger:

$$F_{nb}^{\varepsilon} = \begin{cases} 1, & p_{n}^{\varepsilon} \le \eta \\ -1, & p_{n}^{\varepsilon} > \eta \end{cases} \tag{8}$$

The density factor α^ε is defined as

$$\alpha^{\varepsilon} = c \times \exp\!\left(-\frac{\varepsilon}{T_{\max}}\right) \tag{9}$$

When p_n^ε > η, the second search strategy is used. The second update strategy changes the search step size and direction; the nth quantum honey badger updates the bth dimension quantum rotation angle as

$$\delta_{nb}^{\varepsilon+1} = F_{nb}^{\varepsilon}\,\hat{r}_{nb}^{\varepsilon}\,\alpha^{\varepsilon}\,\bar{d}_{n}^{\varepsilon} \tag{10}$$

where r̂_{nb}^ε is a random number in [0, 1].

The quantum position update is completed by the simplified simulated quantum rotation gate. The update formula for the bth quantum position of the nth quantum honey badger is

$$x_{nb}^{\varepsilon+1} = \left| x_{nb}^{\varepsilon}\cos(\delta_{nb}^{\varepsilon+1}) + \sqrt{1-(x_{nb}^{\varepsilon})^{2}}\,\sin(\delta_{nb}^{\varepsilon+1}) \right| \tag{11}$$

where x_{nb}^{ε+1} is the bth dimension quantum position of generation ε + 1 obtained from x_{nb}^ε, n = 1, 2, ..., Ñ, b = 1, 2, ..., B. The quantum honey badger algorithm can be applied not only to DOA estimation problems but also to other optimization problems.
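A minimal NumPy sketch of one quantum-position update is given below. It is an illustrative reading of Eqs. (7)–(11) as reconstructed above, not the authors' implementation; the inputs S_n and d̄_n follow the paper's notation, and the default constants follow its simulation settings:

```python
import numpy as np

def qhba_update(x, x_gb, d_n, S_n, eps, T_max, c1=0.5, c=2.0, eta=0.5):
    """One QHBA quantum-position update (Eqs. (7)-(11)).
    x, x_gb: current and global-best quantum positions in [0, 1]^B;
    d_n: distance to the prey; S_n: concentration strength."""
    B = x.size
    alpha = c * np.exp(-eps / T_max)              # density factor, Eq. (9)
    p = np.random.rand()
    F = 1.0 if p <= eta else -1.0                 # direction flag, Eq. (8)
    r0, r1, r2, r3 = np.random.rand(4, B)
    if p <= eta:                                  # first strategy, Eq. (7)
        delta = (c1 * r0 * F * S_n / (4 * np.pi * d_n ** 2) * x_gb
                 + F * r1 * alpha * d_n
                 * np.cos(2 * np.pi * r2) * (1 - np.cos(2 * np.pi * r3)))
    else:                                         # second strategy, Eq. (10)
        delta = F * r0 * alpha * d_n
    # Simplified quantum rotation gate, Eq. (11); the result stays in [0, 1]
    return np.abs(x * np.cos(delta) + np.sqrt(1.0 - x ** 2) * np.sin(delta))
```

The updated quantum position is then mapped to a search angle via Eq. (6) before fitness evaluation.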

3.2 DOA Estimation for Special Non-uniform Linear Array Based on Quantum Honey Badger Algorithm

The fitness function is constructed from the maximum likelihood equation (5). The fitness of the nth quantum honey badger is calculated by substituting its position x̃_n^ε = [x̃_{n1}^ε, x̃_{n2}^ε, ..., x̃_{nB}^ε] into the fitness function; the fitness value of the position of the nth quantum honey badger in the εth generation is

$$f(\tilde{x}_{n}^{\varepsilon}) = \mathrm{tr}\!\left(P_{\bar{B}(\tilde{x}_{n}^{\varepsilon})}\,\bar{R}\right) \tag{12}$$

where $P_{\bar{B}(\tilde{x}_{n}^{\varepsilon})} = \bar{B}(\tilde{x}_{n}^{\varepsilon})\big[\bar{B}^{H}(\tilde{x}_{n}^{\varepsilon})\bar{B}(\tilde{x}_{n}^{\varepsilon})\big]^{-1}\bar{B}^{H}(\tilde{x}_{n}^{\varepsilon})$, n = 1, 2, ..., Ñ. The calculated fitness values are sorted from large to small, the quantum position with the maximum fitness value up to the current generation is found, and it is taken as the global optimal quantum position x_g^ε = [x_{g1}^ε, x_{g2}^ε, ..., x_{gB}^ε].

According to the mapping rules, the new bth dimensional quantum position x_{nb}^{ε+1} of the nth quantum honey badger is mapped to the newly generated bth dimensional position x̃_{nb}^{ε+1}, n = 1, 2, ..., Ñ, b = 1, 2, ..., B. The fitness value f(x̃_n^{ε+1}) of the newly generated position is calculated from the fitness function, and the quantum position is then chosen by greedy selection: if f(x̃_n^ε) > f(x̃_n^{ε+1}), then x_n^{ε+1} = x_n^ε, x̃_n^{ε+1} = x̃_n^ε, and f(x̃_n^{ε+1}) = f(x̃_n^ε). The quantum honey badgers are then sorted by fitness from large to small, the one with the largest fitness is found, and its quantum position is recorded as the global optimal quantum position up to generation ε + 1, updating x_g^{ε+1} = [x_{g1}^{ε+1}, x_{g2}^{ε+1}, ..., x_{gB}^{ε+1}].
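The fitness of Eq. (12) is the same projection trace as Eq. (5), now evaluated at the candidate angles carried by each quantum honey badger. A minimal sketch, assuming half-wavelength virtual spacing as in the simulations:

```python
import numpy as np

def ml_fitness(theta, R_bar, d=0.5, wavelength=1.0):
    """Fitness of Eq. (12): trace of the projection of the virtual-array
    SCELOM matrix onto the subspace spanned by the extended steering matrix.
    theta: candidate angles in radians; R_bar: M_tilde x M_tilde matrix."""
    M_tilde = R_bar.shape[0]
    m = np.arange(M_tilde)[:, None]
    # Extended steering matrix B_bar(theta) of the virtual uniform array
    B = np.exp(-2j * np.pi * m * d * np.sin(theta)[None, :] / wavelength)
    P = B @ np.linalg.inv(B.conj().T @ B) @ B.conj().T   # projection matrix
    return np.real(np.trace(P @ R_bar))
```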

4 Simulation Experiment Results

To demonstrate and verify the effectiveness and superiority of the QHBA-SCELOM-NLA-ML algorithm, it is compared with the DOA estimation methods ROC-MUSIC [5], FLOM-MUSIC [6], FLOC-MUSIC [7], ROC-NLA-MUSIC [12], and FLOC-NLA-MUSIC [12]. For the model parameters, the special non-uniform linear array has M = 4 elements with position vector H = [0, 1, 4, 6], and the smallest adjacent elements are separated by half a wavelength. The maximum number of snapshots is K = 100, σ̃ = 1.5, and the number of virtual array elements after expansion is M̃ = 7. The simulations are carried out in an impulsive noise environment, and the number of Monte-Carlo experiments is set to 500 in all scenarios to ensure the reliability of the results. For the QHBA parameters, the population size is Ñ = 100, the search space is between −90° and 90°, the weight constants are c̃_1 = 0.5 and c̃ = 6, the maximum number of iterations is T_max = 100, the learning factor is c = 2, and the probability selection constant is η = 0.5.

4.1 The First Set of Simulation Experiments

This experiment considers two independent narrowband signal sources incident from θ_1 = 0° and θ_2 = 20°. Under impulsive noise, the conventional SNR calculation is no longer applicable, so the generalized signal-to-noise ratio (GSNR) is introduced:

$$\mathrm{GSNR} = 10\lg\frac{E\{|s(l)|^{2}\}}{\gamma_{\alpha}} \tag{13}$$

where E{|s(l)|²} is the average power of the signals, γ is the scale parameter, and α is the characteristic exponent of the noise. Setting GSNR = 10 dB and α = 1.8, the performance of QHBA-SCELOM-NLA-ML under different snapshot numbers is evaluated in Fig. 1(a) and (b). Figure 1(a) shows the estimation success probability (an estimate within 2° of error counts as a success) as a function of the number of snapshots for two independent sources. The root mean square error (RMSE) is defined as

$$\mathrm{RMSE} = \sqrt{\sum_{i=1}^{N}\sum_{\dot{n}=1}^{N_{e}} \frac{\big(\theta_{i}-\hat{\theta}_{i\dot{n}}\big)^{2}}{N N_{e}}} \tag{14}$$

where N is the number of sources and N_e the number of Monte-Carlo experiments, θ_i denotes the true incoming direction of the ith source, and θ̂_{iṅ} denotes the DOA estimate for the ith source in the ṅth experiment. Figure 1(b) shows the RMSE as a function of the number of snapshots for two independent sources. The simulation results show that QHBA-SCELOM-NLA-ML is more accurate and has a higher probability of successful direction estimation for two independent sources, especially with small snapshot numbers. Compared with the other methods, QHBA-SCELOM-NLA-ML has better robustness and effectiveness.
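For reference, Eq. (14) amounts to the following short computation over the Monte-Carlo estimates (a sketch, with the array shapes noted in the comments):

```python
import numpy as np

def rmse(theta_true, theta_est):
    """RMSE of Eq. (14). theta_true: shape (N,) true DOAs;
    theta_est: shape (Ne, N), one row of N estimates per Monte-Carlo run."""
    err = theta_est - theta_true[None, :]
    return np.sqrt(np.mean(err ** 2))
```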

Fig. 1. The simulation contrast curve of direction-finding accuracy as the snapshots number changes for two independent sources.

4.2 The Second Set of Simulation Experiments

Here the virtual array expansion capability of the special non-uniform linear array is evaluated. This experiment considers five independent narrowband signal sources incident from θ_1 = −40°, θ_2 = −20°, θ_3 = 0°, θ_4 = 20°, and θ_5 = 40°. Setting GSNR = 5 dB and α = 1.8, the performance of QHBA-SCELOM-NLA-ML under different generalized signal-to-noise ratios and characteristic exponents is evaluated in Fig. 2(a) and (b). Figure 2(a) shows the estimation success probability (within 2° of error) as a function of the GSNR for five independent sources, and Fig. 2(b) shows the success probability as a function of the characteristic exponent.

Fig. 2. The simulation contrast curve of success rate for five independent sources.

The simulation results demonstrate that QHBA-SCELOM-NLA-ML can realize the expansion of the array aperture and can estimate the directions of more sources than the number of physical array elements. It has better robustness and performance at low GSNR and with small snapshot numbers. Compared with the other methods, QHBA-SCELOM-NLA-ML has better universality and stability.

4.3 The Third Set of Simulation Experiments

Since the previous simulation experiments address independent sources, and coherent source direction finding is the more important problem, the performance for coherent sources is evaluated here. This experiment considers two coherent narrowband signal sources incident from θ_1 = 0° and θ_2 = 20°. Setting GSNR = 10 dB and α = 1.8, the performance of QHBA-SCELOM-NLA-ML under different GSNR and characteristic exponents is evaluated in Fig. 3(a) and (b). Figure 3(a) shows the estimation success probability (within 2° of error) as a function of the GSNR for two coherent sources, and Fig. 3(b) shows the success probability as a function of the characteristic exponent. The simulation results show that QHBA-SCELOM-NLA-ML has obvious advantages for coherent sources: it can resolve them without additional decoherence operations and without loss of array aperture. Compared with the other methods, QHBA-SCELOM-NLA-ML has better application superiority.


Fig. 3. The simulation contrast curve of success rate for two coherent sources.

5 Conclusions

In this paper, a novel QHBA-SCELOM-NLA-ML method is proposed to locate the directions of incoming source waves in impulsive noise environments. Using the quantum honey badger algorithm to search for the optimal angles of the proposed maximum likelihood equation over the search domain greatly reduces the amount of calculation and quickly yields high-precision solutions. Compared with ROC-MUSIC, FLOM-MUSIC, FLOC-MUSIC, ROC-NLA-MUSIC, and FLOC-NLA-MUSIC, the proposed QHBA-SCELOM-NLA-ML method improves on the performance of the previous methods. It achieves high-precision direction finding under strong impulsive noise, small snapshot numbers, low GSNR, and the presence of coherent sources, whereas the comparison algorithms fail to achieve accurate direction finding under such conditions and their performance degrades sharply in harsh environments.

References

1. Chen, Y., Liu, H., Li, Y.: DOA estimation of underwater low noise target technique based on focusing matrix. In: 2020 7th International Forum on Electrical Engineering and Automation (IFEEA), Hefei, pp. 417–421 (2020)
2. Xu, F., Morency, M.W., Vorobyov, S.A.: DOA estimation for transmit beamspace MIMO radar via tensor decomposition with Vandermonde factor matrix. IEEE Trans. Sig. Process. 70, 2901–2917 (2022)
3. Wei, W., Liu, R., Yu, X., Cui, G.: Fast single-snapshot DOA estimation of coherent sources for distributed mmWave radar system. IEEE Trans. Circ. Syst. II Exp. Briefs 69(8), 3615–3619 (2022)
4. Javaheri, A., Zayyani, H., Figueiredo, M.A.T., Marvasti, F.: Robust sparse recovery in impulsive noise via continuous mixed norm. IEEE Sig. Process. Lett. 25(8), 1146–1150 (2018)
5. Tsakalides, P., Nikias, C.L.: The robust covariation-based MUSIC (ROC-MUSIC) algorithm for bearing estimation in impulsive noise environments. IEEE Trans. Sig. Process. 44(7), 1623–1633 (1996)
6. Ma, X., Nikias, C.L.: Joint estimation of time delay and frequency delay in impulsive noise using fractional lower order statistics. IEEE Trans. Sig. Process. 44(11), 2669–2687 (1996)
7. Li, S., He, R., Lin, B., Sun, F.: DOA estimation based on sparse representation of the fractional lower order statistics in impulsive noise. IEEE/CAA J. Automatica Sinica 5(4), 860–868 (2018)
8. Zheng, Z., Fu, M., Wang, W.-Q., Zhang, S., Liao, Y.: Localization of mixed near-field and far-field sources using symmetric double-nested arrays. IEEE Trans. Antennas Propag. 67(11), 7059–7070 (2019)
9. Gao, H., Su, Y., Zhang, S., Hou, Y., Jo, M.: Joint antenna selection and power allocation for secure co-time co-frequency full-duplex massive MIMO systems. IEEE Trans. Veh. Technol. 70(1), 655–665 (2021)
10. Hashim, F.A., Houssein, E.H., Hussain, K., Mabrouk, M.S.: Honey badger algorithm: new metaheuristic algorithm for solving optimization problems. Math. Comput. Simul. 192, 84–110 (2022)
11. Park, Y., Gerstoft, P., Lee, J.-H.: Difference-frequency MUSIC for DOAs. IEEE Sig. Process. Lett. 29, 2612–2616 (2022)
12. Gao, H., Han, X.: Direction finding of signal subspace fitting based on cultural bee colony algorithm. In: 2010 IEEE Fifth International Conference on Bio-Inspired Computing: Theories and Applications (BIC-TA), Changsha, pp. 966–970 (2010)

Chewing Behavior Detection Based on Facial Dynamic Features

Cheng-Zhe Tsai1, Chun-Chih Lo1, Lan-Yuen Guo2, Chin-Shiuh Shieh1, and Mong-Fong Horng1(B)

1 Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
[email protected]
2 Department of Sports Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan

Abstract. Diet serves as the primary source of calorie intake for human beings, and maintaining a regular dietary intake is crucial for overall health. The pace or speed of chewing can significantly impact the body's response to food consumption. Traditionally, dietary monitoring has relied on manual assessment by clinicians, a process that is labor-intensive, time-consuming, and susceptible to inaccuracies. In this study, we introduce a novel image processing-based approach for quantitatively evaluating chewing and swallowing capabilities. Facial recognition is employed to detect and calibrate facial features using the Dlib facial landmark model, which enables precise identification of the mandible's position and facilitates the capture of the subject's chewing movements. Signal processing techniques are then applied to calculate the number of chewing instances. Experiments were conducted with five subjects of diverse genders and ages; the results indicated a mean absolute error of 6.48% in the chewing count calculation. The proposed method offers the advantages of convenience and minimal error in comparison to similar studies.

Keywords: chewing instances · dietary intake · facial feature recognition

1 Motivation and Purpose

1.1 Motivation

Ingested food undergoes a pivotal chewing process within the mouth before being swallowed and subsequently digested. Swallowing immediately follows chewing, creating a strong correlation between the two actions. Chewing plays a vital role in breaking food down into smaller fragments; through coordinated efforts involving the tongue and oral muscles, saliva is absorbed to form cohesive food masses. After ingestion, food progresses through the oral cavity, pharynx, and esophagus, eventually reaching the stomach. Figure 1 illustrates the sequential flow of swallowing, divided into five phases [1]: (a) preparation phase, (b) preparatory action phase, (c) oral phase, (d) pharyngeal phase, and (e) esophageal phase. These phases collectively form crucial components of the swallowing process, and any abnormality in their function can be a significant contributor to swallowing difficulties. In this study, our focus is on understanding chewing movements during the oral phase. We aim to characterize these movements comprehensively and propose a method for quantifying their frequency.

Fig. 1. The five phases of swallowing process [1]

1.2 Purpose

This study employs image recognition technology to precisely capture chewing events. Specifically, its objective is to characterize the chewing motion so that it can be acquired from images taken from an external perspective. To achieve this goal, the investigation uses camera equipment to systematically monitor subjects' eating behaviors: the camera is positioned to observe and document the subjects' dietary habits, and captured images of the eating process are transmitted to a computer for subsequent analysis. This phase involves a thorough assessment of the distinctive characteristics exhibited in the subjects' chewing movements during their meals. Upon receiving the transmitted images, the study performs comprehensive image analysis, which serves as the foundation for the design of advanced algorithms calibrated to evaluate the subjects' eating behaviors. These algorithms encompass essential parameters, including the frequency of chewing occurrences and the pace of chewing. This approach provides an efficient methodology for accurately quantifying and comprehending the intricate dynamics of chewing during dietary intake.

2 Literature Review and Technical Analysis

2.1 Visual Characteristics of Chewing During Eating

During eating, chewing is characterized by a consistent pattern of cyclic opening and closing motions of the mandible. This orchestrated movement, facilitated by the masticatory muscles, allows the teeth to grind and compress the food, transforming it into a cohesive mass suitable for swallowing, digestion, and absorption. As depicted in Fig. 2, chewing involves precise coordination of the masticatory muscles to regulate the movement of both the teeth and the mandible, resulting in a rhythmic sequence of jaw opening and closing that processes the ingested food within the oral cavity. Hence, the visual depiction of chewing is prominently tied to the motion of the mandible, and the extent of mandibular displacement emerges as a crucial quantitative measure for assessing the frequency of chewing.

Fig. 2. Mandible motion during chewing [2]

2.2 Past Literature on Chewing Recognition

In 2008, Nishimura et al. introduced a novel approach involving a wireless wearable in-ear microphone for monitoring eating habits [3]. Their study installed a wireless microphone on the subject's ear to capture distinct chewing sound features. The sound signals generated while biting and chewing food were used to track chewing motion, and the number of chewing instances was computed through signal processing; the final experimental outcomes showed a 1.93% margin of error in estimating the number of chewing instances. In 2016, Farooq et al. presented an automated methodology for quantifying the number of chewing instances using a wearable piezoelectric sensor [4]. The authors affixed a piezoelectric sensor element beneath the participant's ear, secured with medical adhesive, enabling the capture of vibrations generated by jaw movement during chewing. The piezoelectric sensor translated these vibrations into a voltage, producing a chewing signal that could be identified and quantified through peak detection; their approach reported an average calculation error of 7% in determining the number of chews.

Commonly employed techniques for capturing chewing activity thus include using microphones to record chewing sounds or piezoelectric sensors to detect and analyze chewing signals. Although these methods offer commendable accuracy, microphone recordings may be susceptible to environmental interference, potentially compromising their precision. Additionally, both approaches require physical attachment to the body, which can be cumbersome and uncomfortable for some subjects and inconveniences the testing environment. In contrast, the approach proposed in this study relies solely on image analysis to assess the number of chewing instances, offering a relatively more convenient alternative.


2.3 Dlib Facial Landmark Model

Dlib is a C++-based machine learning toolkit [5] designed for developing machine learning, image processing, and natural language processing applications. It offers an array of machine learning algorithms and tools tailored to image-related tasks, including image processing, feature extraction, face detection, object detection, and more. A noteworthy component is the Dlib facial landmark model, depicted in Fig. 3. This model builds on face detection and annotates features within the facial structure using deep learning technology, trained with convolutional neural networks (CNN) [6]. It can autonomously identify facial components within images, including the eyes, nose, mouth, and other distinctive attributes. Its efficacy is further substantiated by its training data, an extensive collection of real facial images captured in diverse scenarios, which empowers the model to recognize and annotate facial features across varying environments. In this study, the Dlib facial landmark model is leveraged to recognize the number of chewing instances performed by participants: the quantification of chewing is predicated on the extent of mandibular displacement during chewing and swallowing, and the model proficiently and accurately predicts mandibular features, which proves invaluable for the current study's objectives.

Fig. 3. Dlib Facial Landmark Model [7]
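As a minimal sketch of how such landmark coordinates can be queried with Dlib in Python (the 68-point predictor file below is dlib's standard pretrained model; its use here is an assumption, not something the paper specifies):

```python
import dlib

detector = dlib.get_frontal_face_detector()
# Pretrained 68-point model distributed with dlib's examples (assumed file name)
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def landmarks(frame_gray):
    """Return the 68 (x, y) facial landmarks for the first detected face,
    or None if no face is found in the grayscale frame."""
    faces = detector(frame_gray, 1)
    if not faces:
        return None
    shape = predictor(frame_gray, faces[0])
    return [(shape.part(i).x, shape.part(i).y) for i in range(68)]
```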

3 Methods and Procedures

3.1 Application Scenarios

The fundamental objective of this study is to employ an imaging approach that lets subjects have their chewing instances quantified while simply positioned in front of a camera, eliminating any need to manipulate or wear sensing equipment. Figure 4 illustrates the envisioned application scenario: the subject is seated in a chair in a well-illuminated environment, with a color camera and a depth camera configured in front of them. This configuration captures the distinctive attributes of the subject's mouth and mandible, facilitating subsequent determination and analysis of chewing actions. During the testing phase, a consistent quantity of food is provided to the participant, ensuring a uniform testing condition in which each participant consumes the same amount of food. Once the participant is prepared, they are instructed to begin eating; at this point the camera starts recording images of the feeding process, which are transmitted to a computer for analysis. The collected data are then input into a pre-designed program crafted for chewing instance quantification.

Fig. 4. Application Scenarios [7]

3.2 Chewing Calculation Process

The chewing signal is further refined by signal processing, after which peak detection is applied and the number of chews is calculated. The sequential procedure is depicted in the flowchart in Fig. 5. The process is initiated by segmenting the captured images of the subject's eating activity into individual frames. Each frame then undergoes face identification using the Dlib facial landmark model; this recognition procedure annotates the facial features and generates corresponding coordinates. Notably, the coordinates of the subject's mandible during the chewing motion in the video are extracted as the chewing signal. To facilitate comprehensive analysis, the chewing signal is subjected to signal processing techniques; the processed signal then undergoes peak detection, culminating in the computation of the number of chewing instances.


Fig. 5. Flowchart for Calculating the Number of Chewing Instances

3.3 Algorithm Design for Chewing Instances Calculation

The algorithm presented here facilitates the precise calculation of chewing instances. Within the chewing signal, each peak manifests as a distinct chewing characteristic, reflecting the upward and downward displacement of the mandible during food consumption; these peaks serve as the basis for computing the frequency of chewing events during eating. Figure 6 illustrates the calculation of peak occurrences within the chewing signal. The initial phase involves generating the chewing signal by converting the continuous sequence of the subject's eating activity into individual images suitable for facial recognition. Each image is analyzed to identify the coordinates of the facial region, which are then used with the landmark model to characterize the facial features and obtain the corresponding feature coordinates (i, j). Because facial features and sizes vary across individuals, the algorithm calculates the distance d_(i,j)(n) between the two eye points and the distance from the nose to the chin using Eq. (1), and the ratio of the nose-to-chin distance d_(i2,j2)(n) to the eye distance d_(i1,j1)(n) using Eq. (2); this ratio constitutes the raw masticatory signal x(n).

Initial analysis of the raw chewing signal reveals significant noise in its pattern. To address this, the raw signal first undergoes processing by a shift-averaging filter, as in Eq. (3), which averages the raw samples x(n − 1), ..., x(0) and yields an improved chewing or swallowing signal x′(n). Subsequently, the algorithm identifies peaks in the signal using a moving average filter as a threshold T, which plays a decisive role in identifying the signal peaks. When the chewing signal surpasses the threshold T, the binarized signal is y(n) = 1; when the signal is less than or equal to T, y(n) = 0 (Eq. (4)). Finally, the peak count C is accumulated by evaluating whether the condition y(n) = 1 and y(n − 1) = 0 is satisfied; upon meeting this condition, the count C is incremented, otherwise the assessment continues.

Fig. 6. Flowchart of Chewing Instance Calculation

$$d_{(i,j)}(n) = \sqrt{(x_i(n) - x_j(n))^2 + (y_i(n) - y_j(n))^2} \tag{1}$$

where d_(i,j)(n) represents the Euclidean distance between the specified points, with x_i(n) and y_i(n) denoting the coordinates of feature point i, and x_j(n) and y_j(n) the coordinates of feature point j.

$$x(n) = \frac{d_{(i_2,j_2)}(n)}{d_{(i_1,j_1)}(n)} \tag{2}$$

where x(n) is the ratio of the distance between feature points i_2, j_2 to the distance between feature points i_1, j_1, and it serves as the raw chewing signal.

$$x'(n) = \frac{1}{n}\sum_{k=0}^{n-1} x(k) \tag{3}$$

where x′(n) is the average of the raw samples x(n − 1), x(n − 2), ..., x(0), giving the signal a low-pass filtering effect.

$$y(n) = \begin{cases} 1, & x'(n) > T \\ 0, & \text{otherwise} \end{cases} \tag{4}$$

where T is the threshold: if x′(n) is greater than T, y(n) is assigned the value 1; otherwise it is assigned the value 0.
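A compact Python sketch of this pipeline is given below. It is an illustrative reading of Eqs. (1)–(4), not the authors' code: the landmark indices (eye corners 36/45, nose tip 33, chin 8 in the common 68-point convention) and the window lengths are assumptions, and a sliding-window average stands in for the cumulative average of Eq. (3):

```python
import numpy as np

def chew_ratio(pts):
    """Raw chewing signal x(n) of Eqs. (1)-(2): nose-to-chin distance over
    eye-to-eye distance, computed from 68-point facial landmarks."""
    d = lambda a, b: np.hypot(pts[a][0] - pts[b][0], pts[a][1] - pts[b][1])
    return d(33, 8) / d(36, 45)

def count_chews(x, window=15):
    """Count chewing instances in the raw signal x(n).
    Smoothing approximates Eq. (3); a longer moving average provides the
    threshold T; rising edges of y(n) (Eq. (4)) are counted as chews."""
    x = np.asarray(x, dtype=float)
    x_smooth = np.convolve(x, np.ones(window) / window, mode="same")
    T = np.convolve(x, np.ones(3 * window) / (3 * window), mode="same")
    y = (x_smooth > T).astype(int)
    return int(np.sum((y[1:] == 1) & (y[:-1] == 0)))
```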

362

C.-Z. Tsai et al.

4 Experimentation and Performance Analysis

4.1 Purpose of the Experiment

The experiments in this study have two primary goals: first, to establish the viability of capturing masticatory image features, and second, to demonstrate the efficacy of the newly developed image-processing-based masticatory capture technique. In this section, a sequence of experiments explores the chewing characteristics exhibited by subjects during eating. This includes formulating a comprehensive set of measurement protocols for the participants, alongside the design, evaluation, and comparison of digital image-based feature capture methods using various metrics. The subjects' chewing counts are also analyzed, leading to a novel approach for quantifying the number of chewing instances.

4.2 Characteristics of Chewing

In Fig. 7, the participant is seated in front of a USB camera while consuming food, and the feeding images are fed into the Dlib facial landmark model. The resulting algorithm generates the chewing signal, with the horizontal axis showing time in seconds and the vertical axis the amplitude of the chewing signal. A discernible pattern of changes in mandibular displacement during chewing can be observed: the peaks in Fig. 7 result from the cyclical up-and-down movement of the mandible. The ensuing step is the computation of these peaks in the masticatory signal, which are then presented as masticatory indicators.


Fig. 7. Raw Signals of Chewing

4.3 Chewing Signal Data Pre-Processing

Due to the substantial noise present in the raw signal, it must be removed to improve the accuracy of the assessment. To achieve this, a moving average filtering technique is employed to refine the raw chewing signal, retaining its fundamental attributes while effectively mitigating noise interference. The outcome is shown in Fig. 8, where the low-pass filtering amplifies the significant chewing features while preserving the essential characteristics of the data.

Fig. 8. Chewing Signal Low-Pass Filtering

4.4 Analysis of Experimental Results for Chewing Instance Calculation

The dynamic threshold obtained from the moving average filter plays an important role in this study by identifying the peaks within the chewing signal. Figure 9 presents the signal processing results for calculating the number of chewing instances: the blue line is the raw chewing signal after low-pass filtering, the red line is the threshold generated by the moving average filter, and the green line is the signal used for chewing determination. Furthermore, a diverse group of five subjects, varying in age and gender, consumed different foods in front of the camera. Each subject's chewing instances were recorded manually as a benchmark for comparison against the program's calculations, and accuracy was assessed with Eq. (5), the absolute percentage error of the calculated number of chews:

$$\delta = \frac{|x - y|}{y} \times 100\% \tag{5}$$

where δ is the absolute percentage error, x denotes the measured value, and y the actual value.


Fig. 9. Chewing Signal Processing

The experiment employed the developed chewing instance calculation system with five healthy participants. The experimental protocol involved placing a uniform-sized cookie into the subject's mouth, after which the program was initiated to detect and track chewing movements until the food was swallowed. The entire process, from initiation to completion, was timed and monitored by the program. The outcomes of the chewing tests are presented in Table 1, including each of the five subjects' ages, the corresponding counts of chewing instances, the duration of their chewing activities, and other relevant information.

Age

Chewing Quantity Measurement

Actual number of chews

Time (sec)

Number of Chewing Instances (times/second)

Absolute error of calculation (%)

1

23

28

28

27.1

1.0

0

2

22

34

36

33

1.1

5.6

3

26

28

30

20

1.5

6.7

4

25

15

16

16

1.0

6.3

5

24

33

29

39

0.7

13.8

The experimental results demonstrated that the proposed method in this study yielded a mean absolute error of 6.48% in chewing calculation. By comparing the outcomes of this study to the chewing recognition results presented by Nishimura et al. and Farooq et al. [3, 4], our approach displayed a slightly higher error rate in calculating the number of chewing instances. However, the approach adopted here offers distinct advantages such as non-contact and convenience. These outcomes substantiate the effectiveness of the chewing instance calculation method employed in this study. The measurement technique and algorithm devised herein facilitated the derivation of chewing counts, thereby enabling an analysis of the subjects’ eating behaviors.

5 Results and Discussion

In this study, we have developed a calculation process for determining the number of chewing instances using an image-based approach. This method effectively captures the chewing characteristics of subjects and incorporates an algorithm to accurately compute the number of chews. The experiment involved five subjects from various age groups, and the obtained results demonstrated a mean absolute error of 6.48% in the calculated number of chewing instances. This approach offers convenience and non-contact advantages compared to previous chewing capture methods. Finally, the system developed through this study holds the potential to enhance participants' comprehension of their individual chewing patterns, offering valuable insights into their dietary habits.


Acknowledgement. The authors would like to thank the National Science Council in Taiwan, R.O.C., for supporting this research, which is part of the projects numbered MOST 109-2221-E-992-073-MY3, NSTC 112-2622-8-992-009-TD1, and NSTC 112-2221-E-992-057-MY3.

References

1. Carbo, A.I., Brown, M., Nakrour, N.: Fluoroscopic swallowing examination: radiologic findings and analysis of their causes and pathophysiologic mechanisms. Radiographics 41(6), 1733–1749 (2021)
2. Mrzezo: Mechanics of mandibular movement. https://pocketdentistry.com/4-mechanics-of-mandibular-movement/
3. Nishimura, J., Kuroda, T.: Eating habits monitoring using wireless wearable in-ear microphone. In: 2008 3rd International Symposium on Wireless Pervasive Computing, pp. 130–132. IEEE (2008)
4. Farooq, M., Sazonov, E.: Automatic measurement of chew count and chewing rate during food intake. Electronics 5(4), 62 (2016)
5. Dlib. http://dlib.net/
6. O'Shea, K., Nash, R.: An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458 (2015)
7. Rosebrock, A.: Facial landmarks with dlib, OpenCV, and Python. https://pyimagesearch.com/2017/04/03/facial-landmarks-dlib-opencv-python/

Unknown DDoS Attack Detection Using Open-Set Recognition Technology and Fuzzy C-Means Clustering

Hao Kao1(B), Thanh-Tuan Nguyen2(B), Chin-Shiuh Shieh1, Mong-Fong Horng1, Lee Yu Xian1, and Denis Miu3

1 Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
{F110152139,F111152183}@nkust.edu.tw
2 Department of Electronics and Automation Engineering, Nha Trang University, Nha Trang, Vietnam
[email protected]
3 Genie Networks, Taipei, Taiwan

Abstract. In contemporary society, internet users face various cyber threats, including malware, phishing websites, and hacker attacks. Conventional defense software struggles to prevent unknown attack methods, making the development of effective detection techniques crucial. Distributed Denial-of-Service (DDoS) attacks represent a persistent and evolving cybersecurity challenge, so enhancing the defense against and detection of DDoS attacks is imperative. This study addresses the open set recognition (OSR) problem, a crucial aspect of pattern recognition that involves managing unknown categories. To address this challenge, we employ Spatial Location Constraint Prototype Loss (SLCPL) and Fuzzy C-Means methods. The findings of the experiment indicate that the proposed unknown attack detection technique, which relies on Fuzzy C-Means open set recognition, exhibits superior performance compared to conventional known attack detection methods in detecting and preventing unknown attacks. The misjudgment rate is extremely low given the high accuracy rate of 98% and minimal sample overlap. Finally, improving the dependability and consistency of models in real-world implementations can aid in mitigating intricate and ever-changing attack situations.

Keywords: Distributed Denial-of-Service (DDoS) · Open set recognition · Spatial Location Constraint Prototype Loss · Fuzzy C-Means

1 Introduction

The COVID-19 pandemic, which emerged in 2020, brought about a heightened reliance on the internet, resulting in a notable surge in Distributed Denial of Service (DDoS) attacks. For enterprises functioning as service providers, ensuring the stable operation of their networks and services has become paramount.


Any disruption caused by an attack can lead to substantial business losses and reputational harm. However, the continually evolving DDoS attack techniques have rendered conventional methods inadequate in countering these novel threats [1]. In light of these circumstances, there is an imperative to empower existing Intrusion Detection Systems (IDS) to effectively detect and report unknown traffic patterns, assisting network engineers in discerning unprecedented anomalies. Leveraging machine learning and deep learning, IDS have demonstrated heightened efficacy, leaving malicious actors with limited concealment opportunities [2]. Nevertheless, machine learning-based IDS still exhibit vulnerabilities, particularly when confronted with unknown patterns, leading to a significant decline in accuracy. Thus, the primary objective of our research is to propose multiple IDS approaches that can simultaneously detect known and unknown DDoS attacks and rigorously evaluate their testing performance. By employing the Fuzzy C-Means clustering method and the Spatial Location Constraint Prototype Loss, we aim to equip organizations with a robust open-set identification technology capable of effectively safeguarding their network infrastructures and preserving uninterrupted service delivery amidst the evolving landscape of cyber threats. The findings of this study contribute to advancing the field of network security and offer valuable insights for enhancing the resilience of IDS in contemporary cybersecurity scenarios. The main contributions of this study are focused on the following aspects:

• We adopted AlexNet as the neural network architecture for our training procedure. Simultaneously, we improved the AlexNet architecture in order to classify conventional data more efficiently.
• Through the clustering methods of SLCPL and FCM, we adjusted the positions of unknown attack samples, bringing them closer to specific categories and thereby enabling the recognition of unknown attacks.

2 Related Works

2.1 DDoS Attack

In the absence of DDoS attacks, network systems can operate smoothly and provide uninterrupted services. Achieving such a robust network environment involves considering several critical factors. Firstly, meticulous network planning and architecture design are essential, encompassing bandwidth allocation, network topology, and redundancy mechanisms [3]. Secondly, the implementation of effective intrusion detection and defense systems plays a pivotal role in promptly identifying and mitigating various attack vectors. Furthermore, regular security vulnerability scanning and timely updates are crucial measures for maintaining network security at its optimal level. Lastly, the deployment of robust traffic monitoring and management mechanisms aids in identifying and


addressing anomalous traffic patterns. By integrating these measures, network systems can sustain stable operations and continuous service provision, even in the presence of DDoS attacks.

2.2 AlexNet

AlexNet is a deep Convolutional Neural Network (CNN) introduced by Alex Krizhevsky et al. in 2012, which achieved remarkable success in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [4]. This network architecture marked a significant milestone at the time, exerting a substantial influence on the advancement of deep learning and the progress of image classification tasks. The primary characteristics of AlexNet lie in its combination of depth, convolution, and pooling layers, along with a considerable number of trainable parameters. Its architecture is as follows:

1. Combination of Convolutional and Pooling Layers: AlexNet employs convolutional and pooling layers for feature extraction. These layers effectively capture local features within images and utilize pooling layers to reduce the dimensionality of feature maps while retaining essential features.
2. Rectified Linear Unit (ReLU) Activation Function: AlexNet applies the ReLU activation function after each convolutional layer, which addresses the vanishing gradient problem and accelerates the training process.
3. Dropout: The dropout technique is introduced in the fully connected layers of AlexNet, mitigating overfitting and enhancing the model's generalization capability.
4. Multi-GPU Training: AlexNet pioneered the use of multiple GPUs for training deep learning models, significantly accelerating the training process.

The success of AlexNet underscores the formidable capabilities of deep learning in tasks like image classification. While deeper neural network architectures emerged in subsequent years, AlexNet's role as a starting point in deep learning has cast a lasting influence on the design and development of subsequent models (Fig. 1).

2.3 Open Set Recognition

Open Set Recognition (OSR) [5] is a significant problem in machine learning and pattern recognition, aiming to address the identification challenges posed by unknown classes or unseen samples in the real world. OSR seeks to tackle this issue by extending the recognition problem to scenarios that encompass both known and unknown classes. This approach often leverages category modeling and anomaly detection techniques to establish an identification model that distinguishes between known and unknown classes while detecting and excluding unknown samples. In this context, the category modeling process is employed to train the model for recognizing known classes, whereas anomaly detection is utilized for identifying unknown classes or unseen samples.


Fig. 1. Architecture of AlexNet

2.4 Spatial Location Constraint Prototype Loss

The Spatial Location Constraint Prototype Loss (SLCPL) [6] is a loss function designed for deep learning models with the aim of enhancing the model's sensitivity to the spatial location of objects. Specifically, the SLCPL algorithm incorporates the positional information of class prototypes as constraint conditions, enabling samples from the same class to be more concentrated in the feature space while maintaining a similar spatial distribution among them. By minimizing the loss function, the model is encouraged to allocate samples from the same class to regions that correspond to their respective positions, thereby enhancing spatial accuracy.

2.5 Fuzzy C-Means Algorithm

Fuzzy C-Means (FCM) [7] is a widely used fuzzy clustering algorithm for partitioning data points into various clusters. In contrast to conventional K-Means clustering, FCM permits each data point to possess a membership degree between 0 and 1 for different clusters, representing the extent to which the data point belongs to each cluster. The primary objective of FCM is to minimize the squared error between each data point and the center of the cluster to which it belongs, while accounting for the influence of membership degrees. The fundamental idea behind this algorithm involves iterative computations that progressively adjust cluster centers and membership degrees until a convergence condition is met. FCM's strengths include its capability to handle fuzziness and overlapping data points, showing good performance in various practical applications. However, the algorithm is sensitive to the choice of initial values and to noisy data. Additionally, FCM exhibits relatively high computational complexity, especially when dealing with large datasets. Overall, FCM is a flexible fuzzy clustering algorithm that effectively approaches data distributions characterized by fuzziness and overlap. In practical applications, clustering accuracy and efficiency can be improved by appropriately adjusting algorithm parameters and initial values and integrating other techniques.

2.6 Unknown DDoS Detection

In existing approaches, machine learning and data mining techniques have been widely applied to the detection of unknown DDoS attacks. For instance, methods based on traffic analysis utilize statistical features and behavioral patterns to detect anomalous attack traffic [8], or employ Bidirectional Long Short-Term Memory (BI-LSTM) models, Gaussian Mixture Models (GMM), and incremental learning [9]. Similarly, machine learning-based approaches utilize training datasets to build models and detect attacks by comparing actual traffic with predicted outcomes. However, detecting unknown DDoS attacks still poses several challenges. Firstly, the constantly evolving strategies and techniques of attackers make it difficult to predict and detect attack features. Secondly, traditional detection methods often require prolonged learning and training times, leading to delays in real-time detection. Additionally, the complexity and high transmission rates of large-scale networks further compound the difficulty of detection. In research by Hsu et al. [10], the authors addressed the problem of detecting out-of-distribution (OOD) images without the need for training data from unknown distributions in an unlabeled open-set environment. In various application domains, such as computer vision and image classification, machine learning models are typically trained on specific data distributions. When confronted with unknown data different from the training data, the model's performance may deteriorate. Consequently, detecting and identifying images from unknown distributions is crucial for ensuring model stability and reliability. In a 2017 publication by Schlegl et al. [11], an approach was proposed that employs Generative Adversarial Networks (GANs) for unsupervised anomaly detection and guided discovery of anomalies. The paper introduced a GAN-based unsupervised anomaly detection method where the combination of a generator and a discriminator is used for training. The key idea of this method is to introduce stochastic variation through noise into the generator, making the generated images more diverse. Simultaneously, the discriminator is trained to differentiate between normal and anomalous samples. During training, the generator is compelled to produce images similar to benign samples but different from anomalous samples.

3 Methodology

3.1 Proposed Framework

Figure 2 illustrates the architecture of our Unknown Distributed Denial of Service Intrusion Detection System (Unknown DDoS IDS) developed in this study, which is based on a variety of deep learning models. The architecture consists of three main modules: the Data Preprocessing module, the OSR module, and the Unknown Identification module. The system is trained using the CICIDS2017 dataset [12], and performance, detection methodologies, accuracy on known attacks, and detection rate on unknown attacks are compared and validated. We employ the AlexNet model to assess system performance, and the OSR method SLCPL is employed for detecting unknown DDoS attacks.


Fig. 2. Architecture of Unknown DDoS Detection System

Furthermore, we introduce the Fuzzy C-Means clustering technique to differentiate between known and unknown attacks. As the SLCPL method alone may not entirely distinguish between known and unknown attacks, the clustering method is included to enhance the classification performance.

3.2 Unknown DDoS Attack Model

In our model, we have adapted the AlexNet [4] architecture by employing one-dimensional convolutions. The structure of 1D AlexNet encompasses several crucial components. Firstly, convolutional layers employ convolution operations to capture features from one-dimensional data. Following this, activation function layers introduce non-linearity, often utilizing the ReLU function. Subsequently, pooling layers are integrated to reduce the dimensions of feature maps and the computational load while retaining essential features. Additionally, 1D AlexNet includes fully connected layers that map features to the final classification outcome. Finally, the output layer utilizes softmax to generate the ultimate classification result. Typically, 1D AlexNet consists of interleaved convolutional and pooling layers to extract features at various levels. Batch normalization layers can be added after each convolutional layer to expedite convergence and enhance model stability. Preceding the fully connected layers, Dropout layers can be employed to alleviate overfitting. 1D AlexNet exhibits formidable capabilities in processing one-dimensional data, in fields such as audio recognition, natural language processing, and bioinformatics. By appropriately adjusting the parameters and layer configuration of 1D AlexNet, models can be constructed and trained according to the specific requirements of the task.
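A minimal PyTorch sketch of such a 1D adaptation is given below; the channel sizes, kernel sizes, and class count are illustrative assumptions, since the paper does not list the exact layer configuration:

```python
import torch.nn as nn

class AlexNet1D(nn.Module):
    """Illustrative 1D adaptation of AlexNet for traffic-feature vectors."""
    def __init__(self, in_ch=1, num_classes=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_ch, 64, kernel_size=11, stride=2, padding=5),
            nn.BatchNorm1d(64), nn.ReLU(inplace=True), nn.MaxPool1d(3, stride=2),
            nn.Conv1d(64, 192, kernel_size=5, padding=2),
            nn.BatchNorm1d(192), nn.ReLU(inplace=True), nn.MaxPool1d(3, stride=2),
            nn.Conv1d(192, 256, kernel_size=3, padding=1),
            nn.BatchNorm1d(256), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool1d(4),  # fixed-length output regardless of input length
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(256 * 4, 256), nn.ReLU(inplace=True),
            nn.Dropout(0.5), nn.Linear(256, num_classes),  # softmax applied in the loss
        )

    def forward(self, x):  # x: (batch, channels, length)
        return self.classifier(self.features(x).flatten(1))
```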

3.3 Spatial Location Constraint Prototype Loss

Spatial Location Constraint Prototype Loss (SLCPL) [6] is a type of loss function designed for deep learning models with the aim of enhancing the spatial location


sensitivity of the model towards objects. The SLCPL attempts to address this issue by introducing location constraints. It is based on a key concept: the distribution of object features should be closely related to their spatial location. This method associates the prototype vector of each class with the corresponding position information of that class and applies location constraints to the prototype learning process. Specifically, the SLCPL utilizes the positional information of class prototypes as constraint conditions, resulting in a concentration of samples from the same class within the feature space while maintaining a similar distribution of positions among them. By minimizing the loss function, the model is encouraged to assign samples from the same class to regions that correspond to their respective positions, thereby enhancing spatial positioning accuracy. By introducing the spatial location constraint prototype loss, the model can better utilize object location information, enhancing its spatial awareness of objects and improving recognition and localization accuracy. This approach holds potential for various computer vision tasks, such as object detection, object tracking, and scene segmentation (Fig. 3).

Fig. 3. Feature Space Distributions using SLCPL [6]

The following equation defines the Spatial Location Constraint Prototype Loss:

$$L_s = \sum_{i=1}^{M}\sum_{j=1}^{N} \left| f(x_{ij}) - P_i \right|^2 + \lambda \sum_{i=1}^{M}\sum_{j=1}^{N} \left| L_{ij} - L_i \right|^2 \quad (1)$$

In this equation, a positional constraint term is introduced in addition to the original prototype distance term. Here, L_ij represents the position of the j-th sample of the i-th class, and L_i represents the position of the i-th class prototype. By minimizing the positional constraint term, the loss function projects samples closer to the prototype of their respective class at their corresponding positions.
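A minimal PyTorch sketch of Eq. (1), computed over a batch, is given below; the tensor shapes and the function name slcpl_loss are illustrative assumptions rather than the authors' implementation:

```python
import torch

def slcpl_loss(features, positions, labels, prototypes, proto_positions, lam=0.1):
    """Batch estimate of Eq. (1).

    features:        (B, D) deep features f(x)
    positions:       (B, P) spatial locations of the samples
    labels:          (B,)   class indices
    prototypes:      (M, D) class prototypes P_i
    proto_positions: (M, P) class position prototypes L_i
    """
    proto_term = ((features - prototypes[labels]) ** 2).sum(dim=1)
    location_term = ((positions - proto_positions[labels]) ** 2).sum(dim=1)
    return (proto_term + lam * location_term).mean()
```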

3.4 Unknown DDoS Attack Identification

We employ SLCPL (Spatial Location Constraint Prototype Loss) together with the Fuzzy C-Means method to address unknown attack scenarios through further cluster analysis. This approach aims to more accurately distinguish, in the feature space, between samples belonging to known attack classes and those belonging to unknown attack classes. Fuzzy C-Means is a clustering analysis method that partitions samples into multiple fuzzy categories; the algorithm considers the degree of membership of each sample to every category, rather than employing traditional hard clustering. The Fuzzy C-Means algorithm is formulated as follows. Given N samples and C cluster centers, each sample is denoted as x_i and each cluster center as v_j. Define the fuzzy membership matrix U, where U_ij represents the degree of membership of sample x_i to cluster center v_j. The objective of Fuzzy C-Means is to minimize the cost function

$$J(U, V) = \sum_{i=1}^{N} \sum_{j=1}^{C} U_{ij}^{m} \, \|x_i - v_j\|^2 \quad (2)$$

Here, m is the fuzziness parameter that controls the degree of fuzziness in clustering. This objective function aggregates the weighted sum of squared Euclidean distances between samples and cluster centers. The algorithm proceeds as follows:

1. Initialize the fuzzy assignment matrix U and the cluster centers V.
2. Perform iterative updates until a stopping condition is met (the maximum number of iterations or convergence of the objective function):

Update U: Compute the degree of membership of each sample to each cluster center using the membership update formula

$$U_{ij} = \left[ \sum_{k=1}^{C} \left( \frac{\|x_i - v_j\|}{\|x_i - v_k\|} \right)^{\frac{2}{m-1}} \right]^{-1} \quad (3)$$

Update V: Calculate the coordinates of each cluster center using the coordinate update formula

$$v_j = \frac{\sum_{i=1}^{N} U_{ij}^{m} \, x_i}{\sum_{i=1}^{N} U_{ij}^{m}} \quad (4)$$

3. Return the final fuzzy assignment matrix U and the cluster centers V.

The core principle of this algorithm lies in iteratively updating the fuzzy assignment matrix and the cluster centers, continuously adjusting the membership degrees of samples and the positions of cluster centers to minimize the objective function. Through the fuzzy assignment matrix, one obtains the fuzzy membership degree of each sample with respect to each cluster center, rather than a rigid classification outcome.
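A compact NumPy sketch of this iteration, following Eqs. (2)-(4), is given below; the random initialization scheme and the stopping tolerance are illustrative assumptions:

```python
import numpy as np

def fuzzy_c_means(X, C, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Minimal Fuzzy C-Means following Eqs. (2)-(4). X: (N, D) samples."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], C))
    U /= U.sum(axis=1, keepdims=True)  # each row is a membership distribution
    for _ in range(max_iter):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]  # Eq. (4): cluster centers
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        # Eq. (3): ratio d_ij / d_ik raised to 2/(m-1), summed over k, inverted.
        U_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
        if np.abs(U_new - U).max() < tol:
            return U_new, V
        U = U_new
    return U, V
```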

4 Experiment Result

4.1 CICIDS2017 Dataset

CICIDS2017 is a widely used dataset for the research and development of IDS. Developed by the Canadian Institute for Cybersecurity (CIC), CICIDS2017 aims to simulate various network attacks and normal traffic in real-world networks [13]. Below is a detailed overview of the CICIDS2017 dataset:

1. Dataset Structure: The CICIDS2017 dataset consists of six different subsets, namely Benign Traffic, DoS, DDoS, PortScan, DDoSSmurf, and BruteForce. Each subset contains different attack types and normal traffic, totaling over 8,500,000 network connections.
2. Attack Types: The dataset encompasses several attack types, including DoS attacks, DDoS attacks, port scans, and brute force attempts. These attack types simulate common network attack behaviors encountered in real-world scenarios, making it suitable for training and testing intrusion detection systems.
3. Feature Extraction: CICIDS2017 provides an extensive range of features describing various aspects of network connections. These features include statistical characteristics based on the transport, network, and application layers, such as transport layer traffic, protocol types, source IP, and destination IP. These features aid in the analysis and detection of network attacks.
4. Data Authenticity: The generation of the CICIDS2017 dataset is based on real network data, including actual network traffic and attack behaviors. This enhances the dataset's realism, contributing to the accuracy and efficacy of intrusion detection systems.
5. Applicability: CICIDS2017 can be used for training, validating, and testing the performance of intrusion detection systems. Researchers and developers can utilize this dataset for performance evaluation, feature selection, model optimization, algorithm comparison, and related tasks.

In conclusion, the CICIDS2017 dataset is widely employed in intrusion detection research due to its diverse attack types, rich feature set, and reliance on authentic network data. Its applicability and authenticity render it a valuable resource for researchers and developers of intrusion detection systems.

Table 1. CICIDS2017 attack data form [13]

| Date of Attack | Event of Attack |
|---|---|
| Monday July 3, 2017 | Benign |
| Tuesday July 4, 2017 | FTP-Patator, SSH-Patator |
| Wednesday July 5, 2017 | DoS slowloris, DoS Slowhttptest, DoS Hulk, DoS GoldenEye, Heartbleed Port 444 |
| Thursday July 6, 2017 | Web - Brute Force, Web - XSS, Web - Sql Injection, Infiltration |
| Friday July 7, 2017 | Botnet ARES, Port Scan, DDoS LOIT |

4.2 Experimental Environment

In this research, we employ the CICIDS2017 and CICDDoS2019 datasets for experimentation, utilizing a workstation running the Ubuntu 20.04 operating system. The workstation is equipped with an AMD Ryzen 5700X 8C16T processor, 96 GB of DDR4 memory, and an Nvidia RTX 3070 and an Nvidia RTX 2060 as computational acceleration devices. We use NVIDIA driver version 510 and VSCode in conjunction with Conda as the development environment. Regarding the model framework, we develop with PyTorch 1.11.0, sklearn, and Python 3.9.12.

4.3 Evaluation Metric

In machine learning and pattern recognition, evaluation metrics serve as quantifiable measures to gauge model performance and effectiveness. These metrics assess and compare a model's performance on a specific task or problem. Several common evaluation metrics are as follows:

1. Accuracy: Measures the correctness of model predictions, i.e., the proportion of correctly predicted samples out of the total samples.
2. Precision: Evaluates the ratio of true positive predictions to the total predicted positive samples.
3. Recall: Assesses the ratio of true positive predictions to the total true positive samples.
4. F1 Score: An evaluation metric that comprehensively considers precision and recall, allowing for a balance between these aspects.
5. ROC Curve (Receiver Operating Characteristic curve): A curve plotted from the model's true positive rate and false positive rate, used to assess the model's performance at various threshold levels.
6. AUC (Area Under the ROC Curve): A metric that measures the area under the ROC curve, providing an evaluation of the model's performance that balances precision and recall.

These evaluation metrics can be chosen based on the specific problem and task, and their interpretation can be adjusted according to the requirements. It is customary to consider multiple evaluation metrics together to attain a comprehensive assessment and comparison of model performance. In the context of this research, the metrics used for assessing model performance are accuracy, precision, recall, and the F1-Score; they measure the model's performance in distinguishing between different classes.
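These four metrics can be computed directly with scikit-learn; in the following minimal sketch, y_true and y_pred are assumed arrays of ground-truth and predicted labels:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# y_true: ground-truth labels, y_pred: model predictions (assumed to exist)
accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro"  # macro-average over the traffic classes
)
print(f"Acc={accuracy:.4f} P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```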

4.4 Conventional DDoS Identification Modules

In this study, we employed the 1D AlexNet for training purposes. However, we observed that the training process increased dispersion within the distribution of known categories, as depicted in Fig. 4. This dispersion phenomenon emerged


due to our transformation of the original two-dimensional AlexNet into a one-dimensional counterpart, which left the samples in our feature space inadequately separated. In order to address this issue, modifications were introduced to the 1D AlexNet architecture: our approach entailed the incorporation of fully connected layers following each convolutional layer. As illustrated in Fig. 5, the integration of these fully connected layers resulted in a significant improvement in the outcome.

Fig. 4. Unmodified 1D AlexNet

Fig. 5. 1D AlexNet with a fully connected layer added after the convolutional layer

Despite our continuous experimentation with various modifications to the 1D AlexNet model during the experiments, the results did not yield improvements comparable to the outcome achieved by incorporating fully connected layers after each convolutional layer. Concurrently, we evaluated this result through the assessment of a confusion matrix. The visualization of this evaluation can be observed in Fig. 6. It is observed that the accuracy for each category exceeded 98%, yielding highly satisfactory results.


Fig. 6. Known attack confusion matrix

4.5 Unknown Identification Module

The poorest results were observed when we projected the unknown attack samples onto the feature space: the majority of the unknown attack samples were covered by the known attack samples, which contradicts our expectations for the feature space. We hypothesize that the issue is related to the clustering method. It is important to note that SLCPL was not the aspect we considered modifying, as, without SLCPL, the distribution became highly scattered. Thus, following the output of SLCPL, we further passed the data through another clustering method and opted for Fuzzy C-Means, which effectively repositions the distribution of the unknown classes. Initially, we increased the number of classes from 7 to 10 in order to better reposition the samples. From Fig. 7, it can be observed that the distribution when applying Fuzzy C-Means is very favorable, effectively avoiding the known attack samples. This is a significant discovery in our context.

Fig. 7. Distribution status of unknown attack samples


We conducted a cumulative calculation of the distribution region of the unknown attack classes to compute precision, yielding excellent results. Subsequently, we evaluated the false positive rate, accuracy, and F1 score, all of which exceeded 97%, even surpassing 99% in some cases. Such outcomes are highly satisfactory. In the context of unknown DDoS detection, if we directly input the unknown dataset into the model, then, due to the nature of closed-set learning, the deep feature distribution of the closed set does not allocate any space for unknown features. Consequently, unknown features overlap with known features, leading to a sharp decrease in accuracy. As observed from Table 2, directly feeding the unknown dataset into the model results in a significant decrease in accuracy, which fails to reach even 90%.

Table 2. Known DDoS detection performance

| Test Data Set | Precision | Recall | Accuracy | F1 |
|---|---|---|---|---|
| CICIDS2017 Wednesday | 0.9882 | 0.9976 | 0.9926 | 0.9917 |
| CICIDS2017 Friday | 0.8386 | 0.8383 | 0.7856 | 0.7518 |

As per the outcomes presented in Table 3, after processing by Fuzzy C-Means, all evaluation metrics of the model exceeded 97%, surpassing our expectations. This demonstrates that the proposed open-set recognition method is effective in distinguishing known and unknown data within the deep feature space.

Table 3. Known DDoS detection performance

| Test Data Set | Precision | Recall | Accuracy | F1 |
|---|---|---|---|---|
| CICIDS2017 Wednesday | 0.9882 | 0.9976 | 0.9926 | 0.9917 |
| CICIDS2017 Friday | 0.9999 | 0.9814 | 0.9888 | 0.9778 |

5 Conclusion

Information security has witnessed numerous advanced applications with the flourishing development of the Internet. Service providers now strive to achieve stable service quality, which has become their crucial objective. However, some individuals view DDoS attacks as a means of generating revenue. Current research primarily focuses on training and testing with known attack types. Nevertheless, intrusion detection systems trained solely on datasets have limitations in identifying novel unknown traffic. Therefore, this study proposes a hybrid approach that combines the characteristics of unsupervised and supervised networks. Experimental results demonstrate that the proposed framework provides


a method for the closed-set training model to reject outputs or identify them as unknown attacks. This method relies on data labeling by domain experts and utilizes incremental learning to enhance the model further. In conclusion, this research yields two significant findings. We initially employed a neural network model designed for two-dimensional images; during the research process, we modified the model to meet the processing requirements of one-dimensional data. Additionally, we made multiple adjustments to the neural network model to enhance its performance throughout the study. This specific neural network model might not be the optimal choice; therefore, on the neural network side, we aim to explore more complex models, expecting better results when dealing with unknown attacks. Moreover, we also seek to further improve the clustering model to enable better aggregation of samples from unknown attacks.

Acknowledgement. This research was supported by the National Science and Technology Council with grant numbers NSTC 112-2221-E-992-045, NSTC 112-2221-E-992-057-MY3, and MOST 109-2221-E-992-073-MY3.

References

1. Mirkovic, J., Prier, G., Reiher, P.: Attacking DDoS at the source. In: 10th IEEE International Conference on Network Protocols, 2002. Proceedings, pp. 312–321. IEEE (2002)
2. Zhang, B., Zhang, T., Yu, Z.: DDoS detection and prevention based on artificial intelligence techniques. In: 2017 3rd IEEE International Conference on Computer and Communications (ICCC), pp. 1276–1280 (2017)
3. Tripathi, N., Hubballi, N.: Application layer denial-of-service attacks and defense mechanisms: a survey. ACM Comput. Surv. (CSUR) 54(4), 1–33 (2021)
4. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017)
5. Geng, C., Huang, S.-J., Chen, S.: Recent advances in open set recognition: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 43(10), 3614–3631 (2020)
6. Xia, Z., Wang, P., Dong, G., Liu, H.: Spatial location constraint prototype loss for open set recognition. Comput. Vis. Image Underst. 229, 103651 (2023)
7. Bezdek, J.C., Ehrlich, R., Full, W.: FCM: the fuzzy c-means clustering algorithm. Comput. Geosci. 10(2–3), 191–203 (1984)
8. Ahmed, M., Mahmood, A.N., Hu, J.: A survey of network anomaly detection techniques. J. Netw. Comput. Appl. 60, 19–31 (2016)
9. Zhang, Z., Zhang, Y., Niu, J., Guo, D.: Unknown network attack detection based on open-set recognition and active learning in drone network. Trans. Emerg. Telecommun. Technol. 33(10), e4212 (2022)
10. Hsu, Y.-C., Shen, Y., Jin, H., Kira, Z.: Generalized ODIN: detecting out-of-distribution image without learning from out-of-distribution data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10951–10960 (2020)
11. Schlegl, T., Seeböck, P., Waldstein, S.M., Schmidt-Erfurth, U., Langs, G.: Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In: Niethammer, M., et al. (eds.) IPMI 2017. LNCS, vol. 10265, pp. 146–157. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59050-9_12
12. Maseer, Z.K., Yusof, R., Bahaman, N., Mostafa, S.A., Foozy, C.F.M.: Benchmarking of machine learning for anomaly based intrusion detection systems in the CICIDS2017 dataset. IEEE Access 9, 22351–22370 (2021)
13. Brunswick, U.O.N.: Intrusion detection evaluation dataset (CIC-IDS2017) (2023). http://www.unb.ca/cic/datasets/ids-2017.html

Dairy Cow Behavior Recognition Technology Based on Machine Learning Classification

Che-Wei Chou, Chang-Ang Lee, Shu-Wei Guo, Chin-Shiuh Shieh, and Mong-Fong Horng(B)

Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
[email protected]

Abstract. For the livestock industry, the health of cattle is closely related to the quality of their products and the profitability of the enterprise. Timely detection of the health condition of cattle during the farming process is essential to prevent widespread infections or diseases. This study employs inertial sensing devices combined with various machine learning techniques for the classification and comparison of cattle behavior, thus achieving cattle behavior recognition technology. Initially, accelerometer data is collected from different cattle behaviors, and features are extracted from the data. Machine learning algorithms are then applied to classify these features, resulting in the implementation of cattle behavior recognition technology. The computational analysis presents recognition rates of more than 90% for six cattle behaviors, and in the best case some behaviors even reach 95%. With accurate cattle behavior analysis technology in place, livestock operators can gain insights into the activity patterns of cattle, enabling them to identify abnormal health conditions and facilitate early treatment, thus reducing losses.

Keywords: Cattle Behavior Recognition · Machine Learning · Inertial Sensing Technology

1 Introduction

1.1 Background

Currently, there are various diseases affecting cattle, such as anthrax, Bovine Viral Diarrhea (BVD), bovine tuberculosis, Infectious Bovine Keratoconjunctivitis (IBK), Infectious Bovine Rhinotracheitis (IBR), and more. Livestock diseases pose significant challenges to the livestock industry worldwide. Timely detection and resolution of issues are crucial to reduce livestock mortality rates. Early livestock monitoring techniques were time-consuming, had higher error rates, and incurred high equipment costs. Confirming the health status of livestock often required waiting for veterinary observation, delaying necessary treatment and potentially resulting in livestock deaths. This situation significantly impacts the overall economic growth of the livestock industry [1]. Currently in Taiwan, there are fewer than 40 large animal veterinarians. Estimated data suggests that


on average, one veterinarian is responsible for nearly 5000 head of cattle, which is five times the ratio in Japan [2]. Therefore, if early detection of abnormal cattle behavior can be achieved through cattle behavior recognition technology, there is a potential opportunity to administer relevant treatment before the condition worsens.

1.2 Motivation and Purpose

Benefiting from the advancements in Internet of Things (IoT) devices and cloud-based applications, the adoption of high-tech applications in farm operations worldwide is gradually increasing. The integration of technological innovations has enhanced farmers' production efficiency, ranging from drones capable of automated pesticide spraying and robots for crop harvesting to applications of smart agriculture utilizing artificial intelligence and big data to monitor the conditions of confined livestock. This technological integration holds immense potential for farm management. This study employs intelligent collars equipped with inertial sensing components on cattle to record the variations in these sensors during different behaviors. It aims to recognize behaviors such as resting, various feeding patterns, rumination, drinking, and salt licking. If certain behavioral data falls below standard values, the system can promptly alert farmers to take necessary actions. The information farmers desire to obtain while raising cattle includes the following:

• Injured legs resulting in an inability to walk
• Stomach discomfort leading to abnormal eating or rumination behavior
• Insufficient water intake due to short drinking time
• Prolonged absence of salt licking causing nutritional deficiencies

2 Literature Review

2.1 Common Cattle Behaviors

Ruminating is a unique digestive process in cattle, which is crucial for their growth and development. The main differences between resting and ruminating (as shown in Fig. 1) include differences in posture, where cattle slightly raise their heads during rumination; differences in breathing, with a lower respiratory rate during rest and a faster, shallower respiratory rate during rumination; and differences in chewing, as cattle perform chewing motions during rumination to regurgitate and re-chew their food. Cattle licking salt refers to the behavior of cattle satisfying their body's need for sodium and other minerals by licking salt. A deficiency of sodium and other minerals in cattle can lead to nutritional deficiencies and health issues, including reduced appetite, low milk production, delayed growth, and weakened immune function. On farms, cattle feeding behaviors are generally categorized into two types: stall feeding and grazing (as shown in Fig. 2). This differentiation is mainly due to the fact that during stall feeding, cattle consume relatively uniform feed, involving simpler chewing actions that primarily use the upper and lower teeth for biting and chewing. During grazing, on the other hand, cattle need to move their heads and mouths to select and


chew on grass, involving more complex chewing actions. The advantage of grazing is that cattle can choose and feed according to their preferences, which facilitates better digestion and nutrient absorption for the cattle themselves.

Fig. 1. Illustration of Cattle Resting (RES), Ruminating (RUS), and Licking Salt (SLT) behaviors [3]

Fig. 2. Illustration of Cattle Grazing Feeding (GRZ) and Bunk Feeding (FES) [3]

2.2 Comparison with Relevant Research

In the study by R. Dutta et al. [4], multiple machine learning classifiers were used to classify different cow behaviors, including eating, rumination, resting, and walking, with random forest showing the best performance. The accelerometers they used were attached near the cow's neck to record their behaviors. J. Kaler et al. [5] used random forest, multi-layer perceptron, SVM, AdaBoost, and KNN to classify lame and non-lame sheep, with random forest yielding the best results. To recognize the posture of cows using accelerometers, L. Riaboff et al. [6] placed accelerometers on cows in a manner similar to earrings. They applied various machine learning methods including Extreme Gradient Boosting (XGB), random forest, and SVM, with XGB demonstrating the best performance.

3 Cow Behavior Recognition Technology

3.1 Introduction to the Dataset

This study utilized the Japanese Black Cattle behavior dataset collected by Shinshu University in Nagano, Japan [7]. The dataset consists of tri-axial accelerometer data capturing the behavior of six different Japanese Black Cattle individuals. The sampling frequency of the data is 25 Hz. The cattle were allowed to move freely in two open spaces, namely a grassy field and an enclosure within a farm. Simultaneously, video recordings were taken to document the behaviors. The labeling of behaviors was carried out by experienced caretakers based on the video footage. This study focused on six common behaviors exhibited by the cattle:

• Resting while standing, labeled as RES.
• Ruminating, labeled as RUS.


• Feeding at a feeding station, labeled as FES.
• Grazing on the grassy field, labeled as GRZ.
• Drinking water, labeled as DRN.
• Licking salt, labeled as SLT.

Fig. 3. Proportions of Different Behaviors in the Dataset

Fig. 4. Application Scenario Diagram

The proportions of these six behaviors in the dataset are depicted in Fig. 3. Figure 4 illustrates the application scenario of this study.

3.2 Behavior Recognition Server System Architecture

This study utilized an Intel i5-9500 CPU running on the Windows 10 operating system for cattle behavior recognition. The Python and MATLAB functions used during development are listed in Fig. 5.

Fig. 5. System Architecture Diagram of Behavior Recognition Server


3.3 Calculation Process

The calculation process of this study is illustrated in Fig. 6. Firstly, the collected three-axis acceleration values undergo data preprocessing such as normalization and moving averaging, followed by feature extraction. Next, the dynamic vector sum is calculated to quantify the intensity of the object's movement, enhancing recognition accuracy. Subsequently, the accumulated distance values between the dynamic vector sum and the predicted value are computed over a 4-s time interval to distinguish whether the cattle are in a moving state. Finally, the data that is not in a moving state is classified into the remaining four behaviors using the decision tree and random forest methods.

Fig. 6. Flowchart of the Calculation Process

3.4 Data Preprocessing

Environmental noise easily leads to inaccurate recognition when collecting cattle behavior data, for example when distinguishing between resting and rumination, where the difference lies only in slight head raising and lower jaw movement. Therefore, prior to analyzing the data, data preprocessing is necessary to enhance subsequent behavior recognition. This study employed moving average filtering and normalization as preprocessing techniques on the raw data. The moving average (MA) helps to eliminate noise or fluctuations in the signal, aiding in the removal of momentary interferences. This is represented by Eq. (1):

$$y[x_i] = \frac{1}{N} \sum_{k=0}^{N-1} x_i[n-k], \quad i \in \{x, y, z\} \quad (1)$$


where y[x_i] represents the filtered result, x_i[n] denotes the values of the original input sequence, and N indicates the size of the moving average window. After applying the moving average, this study normalized the data to the range [0, 1] to reduce the differences between different cattle, aiming to enhance the performance and stability of the machine learning algorithms. This is shown in Eq. (2):

$$Nr[n] = \frac{y[n] - \min(n)}{\max(n) - \min(n)} \quad (2)$$

where y[n] represents the data after the moving average, and min(n) and max(n) are the minimum and maximum values, respectively.
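As a minimal illustration of Eqs. (1)-(2), the sketch below applies a per-axis moving average and min-max normalization; the window size N=5 is an illustrative assumption, and np.convolve with mode="same" gives a centered approximation of the trailing average in Eq. (1):

```python
import numpy as np

def moving_average(x, N=5):
    """Eq. (1): moving average over N samples (centered approximation)."""
    return np.convolve(x, np.ones(N) / N, mode="same")

def normalize(y):
    """Eq. (2): min-max normalization to the range [0, 1]."""
    return (y - y.min()) / (y.max() - y.min())

# acc: (num_samples, 3) tri-axial accelerometer data, assumed to exist
# processed = np.column_stack(
#     [normalize(moving_average(acc[:, i])) for i in range(3)]
# )
```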

3.5 Machine Learning Methods

Machine learning is well suited to training effective classifiers from known behavioral samples to distinguish patterns in cattle behavior. These patterns can then be used to classify the behaviors of unknown cattle. The machine learning classification methods used in this study are introduced as follows.

Decision Tree. In this study, the sklearn.tree package was used to implement the decision tree algorithm. The imported dataset was split into an 80% training set and a 20% test set for classification. Through experimentation, it was determined that setting the maximum depth of the tree (max_depth) to 12 achieved the highest accuracy. The minimum sample split (min_samples_split) is one of the conditions controlling the stopping criterion for tree splitting: when the sample count at a node is less than the specified value, the decision tree stops splitting at that node to prevent overfitting. This parameter is typically set between 2% and 10% of the total dataset. Given the unequal proportions of behavior sample labels in this study, the behavior with the fewest samples was chosen and this parameter was set to 2%. Another parameter is the maximum number of features (max_features), used to ensure that each node considers a certain number of features, avoiding too few features, which could result in an insufficient sample count.

Random Forest. There are two common voting mechanisms in Random Forest: majority voting and weighted voting. Majority voting selects the class that appears most frequently as the prediction, while weighted voting assigns weights based on the accuracy of each decision tree and selects the final prediction after summing the weighted results. To implement the multi-decision-tree voting of Random Forest, this study fine-tuned the parameters of the decision trees. Firstly, the data underwent preprocessing; then the dataset was split into a 20% testing set and an 80% training set, with different behaviors labeled accordingly. The model was constructed using the RandomForestClassifier package from the sklearn library. When adjusting the Random Forest model, the following three parameters were involved. Number of trees (n_estimators): a higher number of trees leads to higher accuracy but increases computation time; this study set the number of decision trees to 150 for voting. Tree depth (max_depth): in comparison with


individual decision trees, the maximum depth of the trees in Random Forest was also set to 12. Minimum samples for split (min_samples_split): since Random Forest involves multi-decision-tree voting, it tolerates a relatively higher minimum sample requirement; setting it too high would reduce the diversity among the decision trees and undermine the advantages of Random Forest, so this study set the minimum samples for split to 40. Additionally, concerning the maximum number of features (max_features), setting it too low would result in each decision tree considering only a few features, affecting the model's performance, while setting it too high might lead to overfitting. Considering that some behavior labels had a smaller proportion in the dataset, the maximum number of features was calculated from the average of these behavior labels, followed by taking the square root. Finally, this study used a confusion matrix to present the prediction results, comparing the performance of Random Forest and decision trees in classification. Through the multi-decision-tree voting mechanism, Random Forest embodies the concept of collective intelligence.

Extreme Gradient Boosting (XGBoost). In this study, after employing the machine learning methods of decision trees and random forests for classification, XGBoost was introduced for comparison. The distinction between XGBoost and random forests can be explained as the difference between Bagging and Boosting. Bagging generates individual trees through random sampling, without interconnections, and random forests are an implementation of Bagging. Boosting, on the other hand, generates trees successively, with each subsequent tree built upon the previous one; XGBoost is a realization of the Boosting method. Each tree in a Boosting model addresses the shortcomings of the preceding tree, making it generally more precise than Bagging. In the model building process, preprocessed acceleration data was used. The dataset was split into an 80% training set and a 20% testing set, with behaviors labeled accordingly. The model was constructed using the ensemble package from the sklearn library. When selecting the XGBoost model, the softmax function in Eq. (3) was utilized. This function, also known as the normalization function, maps a vector to the [0, 1] interval, representing the probability distribution over the classes; the vector elements sum to 1.

$$\mathrm{Softmax}(z_j) = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \quad \text{for } j = 1, \ldots, K \quad (3)$$

Next, important model parameters need to be determined, including the maximum tree depth, total number of iterations, learning rate, and maximum number of features. After hyperparameter tuning, a model with a depth of 14 layers and 400 iterations was chosen. The learning rate reflects the degree to which features of the previous set of trees carry over when generating the subsequent set; a higher learning rate indicates a higher degree of similarity. In this study, a learning rate of 0.1 was adopted, meaning that 10% of the features from the previous tree are retained in each iteration. Lastly, since the dataset size remains unchanged, the choice of maximum number of features aligns with the previous decision for the decision tree and random forest models.
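The three classifiers with the parameters reported above can be assembled roughly as follows; X and y denote the preprocessed feature matrix and behavior labels (assumed to exist), and XGBClassifier from the xgboost package is used here as a stand-in, since the text mentions sklearn's ensemble module without naming the exact class:

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

# X, y: preprocessed feature matrix and behavior labels, assumed to exist
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "decision_tree": DecisionTreeClassifier(max_depth=12),
    "random_forest": RandomForestClassifier(
        n_estimators=150, max_depth=12, min_samples_split=40
    ),
    "xgboost": XGBClassifier(max_depth=14, n_estimators=400, learning_rate=0.1),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))  # accuracy on the 20% test split
```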


3.6 Model Optimization and Performance Tuning

GridSearchCV provides a straightforward and effective method for selecting the best combination of hyperparameters. It operates by conducting an exhaustive search within the specified ranges of hyperparameters, performing cross-validation for each combination, and calculating the corresponding model performance metrics. Ultimately, it selects the optimal set of hyperparameters based on the results of cross-validation. Table 1 and Table 2 respectively present the tuned hyperparameters in the Random Forest and XGBoost models of this study. Among these, max_depth refers to the depth of the decision trees, where a higher value indicates a more complex model; n_estimators represents the number of decision trees within the model; learning_rate denotes the learning rate, with higher values implying a greater proportion of the previous tree's contribution in the subsequent tree; subsample signifies the proportion of sub-samples used during training (for instance, a value of 0.8 means each tree is trained on 80% of the training samples); and colsample_bytree refers to the proportion of features sampled when building each decision tree.

Table 1. Random Forest Hyperparameter Tuning Metrics.

| Parameter | Range |
|---|---|
| max_depth | 7, 8, 9, 10, 11, 12, 13, 14 |
| n_estimators | 100, 150, 200, 250, 300 |

Table 2. XGBoost Hyperparameter Tuning Metrics.

| Parameter | Range |
|---|---|
| max_depth | 7, 8, 9, 10, 11, 12, 13, 14 |
| learning_rate | 0.01, 0.02, 0.05, 0.1, 0.2, 0.5 |
| subsample | 0.5, 0.6, 0.7, 0.8, 0.9 |
| colsample_bytree | 0.8, 0.9, 1.0 |
| n_estimators | 100, 150, 200, 250, 300 |
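A minimal sketch of this search over the Table 1 ranges is shown below; X_train and y_train are assumed preprocessed training data, and the cv and scoring settings are illustrative assumptions:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "max_depth": [7, 8, 9, 10, 11, 12, 13, 14],
    "n_estimators": [100, 150, 200, 250, 300],
}
search = GridSearchCV(
    RandomForestClassifier(), param_grid, cv=5, scoring="accuracy", n_jobs=-1
)
search.fit(X_train, y_train)  # X_train, y_train: assumed training data
print(search.best_params_, search.best_score_)
```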

4 Performance Evaluation and Analysis

4.1 Restated Validation Objectives

In this study, three different machine learning methods were employed for cattle behavior classification: Decision Tree, Random Forest, and XGBoost.


4.2 Machine Learning Prediction Results

Figures 7 and 8 present the confusion matrices obtained when using the Decision Tree and Random Forest methods for classification. From Fig. 7, it can be observed that while the Decision Tree exhibits a certain level of recognition capability for each behavior, its performance is relatively poor in distinguishing Resting (RES) from Rumination (RUS), as well as Stationary Feeding (FES) from Grazing Feeding (GRZ), due to the higher similarity between these behaviors. In Fig. 8, the Random Forest benefits from its internal voting mechanism, which, through random sampling and feature selection, effectively mitigates the risk of overfitting compared to a single Decision Tree. The voting mechanism's influence is also notable in classifying similar behaviors such as Resting (RES) and Rumination (RUS), as well as Stationary Feeding (FES) and Grazing Feeding (GRZ), leading to improved results in these cases.

Fig. 7. Confusion Matrix of Decision Tree Classification Results.

Fig. 8. Confusion Matrix of Random Forest Classification Results.

Within the XGBoost model (as shown in Fig. 9), owing to its boosting nature, each set of decision trees generated retains certain features from the preceding set, enhancing the classification performance. Consequently, the XGBoost model improves classification accuracy, particularly for similar behaviors like Stationary Feeding (FES) and Grazing Feeding (GRZ), where its ability to separate the classes is most evident.


Fig. 9. Confusion Matrix of XGBoost Classification Results.

5 Conclusion

The main objective of this study is to explore how accelerometer and machine learning techniques can be employed to identify cattle behavior. The results demonstrate that the combination of accelerometers and machine learning techniques can accurately monitor various behaviors of cattle. This research introduces a novel approach to cattle behavior recognition, aiding farmers in gaining a more precise understanding of cattle behavior. This methodology contributes to the timely detection of potential health issues and the improvement of cattle husbandry practices.

Acknowledgement. This research was supported by the National Science and Technology Council with grant numbers MOST 109-2221-E-992-073-MY3, NSTC 112-2622-8-992-009-TD1, and NSTC 112-2221-E-992-057-MY3.

References 1. Rony, M., Riad, D.B., Hasan, Z.: Cattle External Disease Classification Using Deep Learning Techniques. In: 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), Kharagpur, India, pp. 1–7 2. Bettermilk. The talent dilemma in Taiwan’s large animal veterinary field: Reasons for the severe imbalance in human-cattle ratio (13 September 2022). https://www.bettermilk.com.tw/ blogs/veterinary/108510 3. Williams, L.R., Bishop-Hurley, G.J., Anderson, A.E., Swain, D.L.: Application of accelerometers to record drinking behaviour of beef cattle. Animal Product. Sci. 59(1), 122–132 (2017) 4. Dutta, R., et al.: Dynamic cattle behavioural classification using supervised ensemble classifiers. Comput. Electr. Agricult. 111, 18–28 (2015)


5. Kaler, J., Mitsch, J., Vázquez-Diosdado, J.A., Bollard, N., Dottorini, T., Ellis, K.A.: Automated detection of lameness in sheep using machine learning approaches: novel insights into behavioural differences among lame and non-lame sheep. Royal Soc. Open Sci. 7(1), 190824 (2020)
6. Riaboff, L., Poggi, S., Madouasse, A., Couvreur, S.: Development of a methodological framework for a robust prediction of the main behaviours of dairy cows using a combination of machine learning algorithms on accelerometer data, vol. 169 (2020)
7. Ito, H., et al.: Japanese Black Beef Cow Behavior Classification Dataset (v2.0.0) (2022). Zenodo. https://doi.org/10.5281/zenodo.5849025

PSO-Based CI Agent with Learning Tool for Student Experience in Real-World Application

Chang-Shing Lee1,3(B), Mei-Hui Wang1, Chih-Yu Chen1, Che-Chia Liang2, Sheng-Chi Yang3, and Mong-Fong Horng4

1 Department of Computer Science and Information Engineering, National University of

Tainan, Tainan, Taiwan [email protected] 2 Department of Engineering Science, National Cheng Kung University, Tainan, Taiwan 3 KWS Center, National University of Tainan, Tainan, Taiwan 4 Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan

Abstract. This paper proposes a particle swarm optimization (PSO)-based computational intelligence (CI) agent with learning tools for young students to experience and learn CI applications. The CI agent combines human languages, which inspire fuzzy systems, with machine languages based on PSO. CI, including fuzzy systems (FSs), evolutionary computation (EC), and neural networks (NNs), is an imperative branch of artificial intelligence (AI). As a core technology of AI, it plays a vital role in developing intelligent systems and agents. During the CI Sandbox Workshop and Competition at IEEE CEC 2023 in the USA and FUZZ-IEEE 2023 in Korea, we organized a workshop with an associated competition for young students to learn and experience CI using the CI&AI-FML learning tool with a PSO-based CI agent. First, young students receive learning materials and guidance from tutors to learn CI-related basic concepts. Then, they test the learned materials with the CI learning tool in a simple real-world application. Three experiments, namely an advanced driver assistance system (ADAS) at IEEE CEC 2023, and an intelligent agriculture system (IAS) as well as a smart greenhouse system (SGS) at FUZZ-IEEE 2023, were conducted in these two CI sandboxes to collect real-time data from the sensors of the CI&AI-FML learning tool and send them to the PSO-based CI agent for inference. Finally, the learning tool activates according to the inferred result of the CI agent for young students' experience in real-world applications. In the future, we will extend the PSO-based CI agent with Quantum Computational Intelligence (QCI) to more countries for young students to co-learn CI with smart machines.

Keywords: Computational Intelligence (CI) · CI&AI-FML Co-Learning · CI Agent · Particle Swarm Optimization (PSO) · Learning Tool · Quantum Computational Intelligence (QCI)

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 J.-S. Pan et al. (Eds.): ICGEC 2023, LNEE 1114, pp. 392–402, 2024. https://doi.org/10.1007/978-981-99-9412-0_40


1 Introduction

This paper proposes a particle swarm optimization (PSO)-based computational intelligence (CI) agent with CI&AI-FML learning tools for young students to experience and learn CI applications. CI encompasses the theory, design, application, and development of biologically and linguistically motivated computational paradigms. Fuzzy systems (FSs), evolutionary computation (EC), and neural networks (NNs) are the three pillars of CI and play a major role in developing successful intelligent systems [1]. The basic concept of the Heart Sutra-inspired student and machine co-learning CI model [2] contains human intelligence and machine intelligence, such as: "Agent observes student's learning field based on time/space domain, and interacts with the environment ontology, then deep learning human intelligence, as well as designs the suitable learning goals for each student, finally student will achieve the learning tasks of each learning goal."

[Fig. 1 panels: Fuzzy Systems (using human language as a source of inspiration), Evolutionary Computation (using biological evolution as a source of inspiration; Episode 2: EC concept and experience of human intelligence and human-centered logic with PSO machine learning), Neural Networks (using the human brain as a source of inspiration; Episode 3: NN concept and experience with Teachable Machine), and CI real-world applications.]

Fig. 1. Diagram illustrating Heart Sutra-inspired humans and machines co-learning CI model.

Figure 1 shows a diagram of the Heart Sutra-inspired humans and machines co-learning CI model and how it can be applied to CI real-world applications, described as follows: 1) Humans and machines co-learn CI by following the six steps proposed by Lee et al. [2]. 2) FSs use human language as a source of inspiration, enabling learners to construct a human knowledge model based on acquired domain knowledge. 3) EC, inspired by biological evolution, allows learners to optimize human intelligence (HI) and human-centered logic with PSO machine learning experience. 4) NNs draw inspiration from the human brain, combining HI and human-centered logic with the Google Teachable Machine experience. 5) Learners apply their acquired knowledge to real-world applications. MQTT is used for machine-to-machine communication to transmit and receive data over the proposed interactive human-machine co-learning environment. Figure 2 shows the MQTT communication diagram of humans and machines co-learning CI. The CI agent communicates with the CI&Artificial Intelligence-Fuzzy Markup Language learning tools (CI&AI-FML-LT), including the AI-FML Robot, the AI-FML MoonCar, and the AI-FML Learning Tool (AI-FML-LT), through the MQTT server. An agent will observe its


time/space domain within the interactive human-machine co-learning environment ontology and then deeply learn human-agent interaction intelligence. Humans finally design the objective goals and loss functions for the CI agent to help it accomplish the tasks of those goals. MQTT publishers, namely the ZAI-FML learning platform or the AI-FML learning platform, send messages containing the topic AI-FML and the inferred results of the CI agent to the MQTT broker. MQTT subscribers, which are the CI&AI-FML-LT, subscribe to the AI-FML topic at the MQTT broker to receive messages of interest. The CI&AI-FML learning devices use MQTT for data transmission, facilitating efficient communication between human language and machine language to achieve co-learning objectives.


Fig. 2. MQTT communication diagram between humans and machines for CI&AI-FML-LT.
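The MQTT exchange described above can be sketched with the paho-mqtt client library; the topic name AI-FML follows the text, while the broker host and the JSON payload format are assumptions for illustration only.

```python
# Minimal publisher/subscriber sketch with paho-mqtt (1.x API style;
# paho-mqtt 2.x needs mqtt.Client(mqtt.CallbackAPIVersion.VERSION1)).
# Broker host and payload format are assumptions; the topic "AI-FML"
# follows the description above.
import json
import paho.mqtt.client as mqtt

BROKER, PORT, TOPIC = "broker.example.org", 1883, "AI-FML"

def on_message(client, userdata, msg):
    # The learning tool would react to the CI agent's inferred result here.
    print("inferred result:", json.loads(msg.payload.decode()))

# Subscriber side, e.g. the AI-FML Robot or MoonCar
sub = mqtt.Client()
sub.on_message = on_message
sub.connect(BROKER, PORT)
sub.subscribe(TOPIC)
sub.loop_start()

# Publisher side, e.g. the ZAI-FML / AI-FML learning platform
pub = mqtt.Client()
pub.connect(BROKER, PORT)
pub.publish(TOPIC, json.dumps({"Alert": "Recommended"}))
```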

2 PSO-Based Sandbox for Teaching and Learning in CI Model

2.1 CI Sandbox Workshop and Competition @ IEEE CEC 2023 and FUZZ-IEEE 2023

Figure 3 illustrates the structure of the CI Sandbox Workshop and Competition at IEEE CEC 2023 in the USA and FUZZ-IEEE 2023 in Korea. Initially, tutors introduced and explained CI-related topics and basic concepts. After the topics and concepts had been briefly introduced, participants utilized the provided CI&AI-FML learning tools and teaching materials to complete assigned tasks. Subsequently, the learned model was applied in real-world test scenarios, including demonstrations and competitions. During the event at IEEE CEC 2023, participants from Taiwan, the USA, and Germany attended in person, while some participants from Hong Kong, Taiwan, and Japan joined online. Three participants from Taiwan and Germany applied their applications to the Advanced Driver Assistance System (ADAS) by collecting real-world data from the environment. They used the Ultrasound module of the CI learning tool to acquire distance data, which was then employed to render circles on the Liquid-Crystal Display (LCD) module, following the human rule that the circle's radius increases as an object approaches the sensor. Moreover, they employed the Light Sensor module of the CI learning tool to collect light data, enabling them to adjust the circle's color according


to the rule that in darker environments the displayed circle would be darker [3]. Video 1, provided in the Appendix, gives more information about the ADAS application at IEEE CEC 2023 in the USA. More than 50 participants, on-site in Korea or online, attended the FUZZ-IEEE 2023 event from Korea, Taiwan, Japan, Canada, Hong Kong, and India. They applied their constructed human knowledge models, which considered variance in distance, light, humidity, and temperature measured by the CI&AI-FML-LT from the environment, to various applications. These included intelligent agriculture, smart greenhouses, and comfortable surgeon rooms, allowing communication with machine language through MQTT.

[Fig. 3 panels: CI&AI-FML Sandbox @ IEEE CEC 2023, Chicago, USA (participants from Taiwan, the USA, Germany, Hong Kong, and Japan; Advanced Driver Assistance System) and CI&AI-FML Sandbox @ FUZZ-IEEE 2023, Incheon, Korea (participants from Korea, Taiwan, Japan, Hong Kong, and India; Intelligent Agriculture, Winner of the Elementary School Group; Smart Greenhouse, Runner-Up of the High School & Undergraduate Group; Comfortable Surgeon Room, Winner of the High School & Undergraduate Group).]

Fig. 3. CI Sandbox Workshop and Competition @ IEEE CEC 2023 and FUZZ-IEEE 2023.

2.2 Program of CI Sandbox @ IEEE CEC 2023 and FUZZ-IEEE 2023

Figure 4 displays the program of the CI Sandbox Workshop and Competition @ IEEE CEC 2023 and FUZZ-IEEE 2023, which includes Part I (Concept-based Learning and Experience-based Learning) and Part II (Practice-based Learning, Operation-based Learning, and Expression-based Learning) [4]. These two events feature seven tracks: Track A: Fuzzy Sets and Systems (FSs); Track B: Evolutionary Computation (EC); Track C: Neural Networks (NNs); Track D: Online CI Training Demonstration / Wireless BCI Drone Control; and Tracks E, F, and G: Competition. Track A focuses on experiencing the CI model, while Tracks B and C are designed for PSO-based learning experience on the CI model, comparing pre-learning and post-learning models. During Tracks A, B, and C, tutors prepare a simple real-world application and guide the participants to experience CI&AI-FML using the related CI&AI-FML-LT software and hardware. Tracks E, F, and G involve designing, practicing, and operating what participants have learned from Tracks A, B, and C, applying the basic CI knowledge to real-world applications. Furthermore, teams of similar ages compete against each other by presenting and demonstrating their learning performance. The evaluation criteria, including technical content, presentation quality, and Q&A, determine the winner.

[Fig. 4 contents: Part I (Concept-based Learning and Experience-based Learning) covers Track A: Fuzzy Systems (FSs), Track B: Evolutionary Computation (EC), Track C: Neural Networks (NNs), and Track D: Online CI Training Demonstration / Wireless BCI Drone Control; each track pairs a short introduction with the human knowledge model (KB/RB, IEEE 1855 standard), the machine language model (Python/Blockly, plus the PSO algorithm and before/after PSO learning where applicable), and human-machine interaction over MQTT. Part II (Practice-based, Operation-based, and Expression-based Learning) covers Tracks E, F, and G: Competition, requiring at least 2, 3, and 4 input fuzzy variables for the elementary school, high school, and undergraduate groups, respectively.]

Fig. 4. Program of CI Sandbox Workshop and Competition @ IEEE CEC and FUZZ-IEEE 2023.

3 CI Agent with Learning Tool for Young Student Experience

3.1 Introduction to CI&AI-FML Learning Tool

Figure 5 illustrates the fourteen components of the CI&AI-FML Learning Tool (CI&AI-FML-LT), which is built upon the GenioPy board [5]. The IEEE Computational Intelligence Society (IEEE CIS) supported the CI&AI-FML-LT for young students' experience and learning of CI in the USA, Japan, Canada, Italy, Hong Kong, Germany, and Taiwan.

[Fig. 5 detail: CI&AI-FML-LT deployments in Canada (U of Alberta), Germany (Uni-Hannover), Italy (UNINA), Japan (TMU / OMU), Taiwan (NUTN / TKU), and Hong Kong (U of Hong Kong).]

Fig. 5. Modules of the CI&AI-FML Learning Tool for IEEE CIS student experience and learning.

This tool facilitates the connection of HI with Machine Intelligence (MI) and their application to real-world scenarios, bridging human language (AI-FML) and machine language (Python or Blockly code) to co-learn CI applications through MQTT. The detailed hardware specification of the CI&AI-FML-LT can be found in [5]. The CI&AI-FML-LT provides learners with six input components, five output components, and three connectivity components. Table 1 lists the functions of the CI&AI-FML-LT modules and components. The complete code is available on GitHub (https://github.com/NUTNKWS/OpenLearningTool).


Table 1. The function of CI&AI-FML-LT modules and components.

Function      | Module or Component Name
Input         | Button Module / Camera Module; Microphone Module / Light Sensor Module; Humidity/Temperature Sensor Module (DHT11); Ultrasound Sensor Module (HC-SR04)
Output        | Light Emitting Diodes (LED) Module; Servomotor Module / Fan Module; LCD Module / Speaker Module
Connectivity  | WIFI Module / SD Card Module; USB Serial Connection

3.2 CI Agent Structure for the Learning Tool

The demonstration and competition of the Sandbox for Teaching and Learning in CI for Pre-University and Undergraduate Students @ IEEE CEC 2023 and FUZZ-IEEE 2023 aimed to develop a real-world application using the CI&AI-FML-LT. However, the CI&AI-FML-LT is limited to an 800 MHz platform with only 256 MB of RAM, so it cannot independently run complex CI algorithms, and the algorithm had to be simplified. In the competition, a server-client approach was adopted, allowing complex computations to occur on the server while the client handles the reactive processing of the results [3]. Hence, the CI agent for the learning tool is presented in this paper.


Fig. 6. CI agent structure for the CI&AI-FML learning tools with real-world application.

Figure 6 shows the structure of the CI agent for the learning tool, described as follows: 1) In the human and machine co-learning environment, multiple clients are present, including Client No. 1, Client No. 2, …, and Client No. N. 2) Initially, humans activate


the CI agent to establish a connection with the specific MQTT server. They configure matching topics for both the subscriber and publisher to align with the topics of the CI&AI-FML-LT. Subsequently, humans upload their human language (HL) along with the constructed knowledge base (KB) and rule base (RB) to the CI agent. 3) The CI&AI-FML-LT, equipped with machine languages (MLs) such as Python or Blockly, publishes the collected machine data to the MQTT server. 4) Upon receiving subscribed data, the CI agent employs computational intelligence to extract human knowledge and subsequently publishes the results to the MQTT server. 5) Once the CI&AI-FML-LT receives the published human knowledge from the CI agent, it manages the reactive processing of the results. Figure 7 shows the MQTT communication between the CI agent and the CI&AI-FML-LT by setting the matched parameters of the output fuzzy set.

[Fig. 7 detail: an Activate_Hardware handler and the output fuzzy variable's terms Not Recommended, Recommended, and Very Recommended, constructed from the ZAI-FML learning platform.]

Fig. 7. MQTT Communication between CI agent and CI&AI-FML-LT for student learning.

4 Experimental Results

4.1 Extended ADAS Application @ IEEE CEC 2023

The Extended ADAS (EADAS) application has its roots in the collaborative efforts that emerged from the CI Sandbox held at IEEE CEC 2023 in the USA. Furthermore, we combine the collected data, including distance and light, with the human knowledge model. The developed CI agent sends the acquired human knowledge back to the CI&AI-FML-LT, which activates based on the received human knowledge. For more information about the EADAS application in the CI Sandbox @ FUZZ-IEEE 2023, please refer to Video 2 in the Appendix. There are two input fuzzy variables, Distance and Light, while the output fuzzy variable is Alert. These variables and their terms give the system 5 × 4 = 20 fuzzy rules, one for each combination of the input fuzzy variables. Subsequently, we simulate 301 records as the training data, with 121 in the dangerous category, 59 in the medium category, and 121 in the safe category. Figures 8(a)-(b) present the Mean Squared Error (MSE) and accuracy values under different evolution generations using the PSO-based learning method. Figure 8 shows that the model learned over 2000 generations exhibits better performance than the others.

[Fig. 8 data: MSE values of 109.693, 114.88, 112.250, 99.619, and 98.600, and accuracy values of 0.897, 0.897, 0.927, 0.963, and 0.980, over 100, 200, 500, 1000, and 2000 iterations, respectively.]

Fig. 8. (a) MSE and (b) accuracy values under different evolution generations.
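To make the PSO-based learning step concrete, here is a minimal, generic PSO sketch that tunes a parameter vector (for example, fuzzy membership function endpoints) to minimize an MSE-style objective; the objective function, bounds, and swarm settings are assumptions, not the paper's actual implementation.

```python
# Generic PSO sketch (assumed settings), minimizing an MSE-style objective
# over a parameter vector such as fuzzy membership function endpoints.
import numpy as np

rng = np.random.default_rng(0)

def objective(params):
    # Placeholder objective: squared distance to an arbitrary target vector.
    # In the paper's setting this would be the MSE of the fuzzy model's
    # predictions over the training records.
    target = np.array([35.0, 65.0, 120.0, 180.0])
    return np.mean((params - target) ** 2)

n_particles, dim, iters = 10, 4, 2000
w, c1, c2 = 0.7, 1.5, 1.5              # inertia and acceleration weights
pos = rng.uniform(0, 255, (n_particles, dim))
vel = np.zeros((n_particles, dim))
pbest = pos.copy()
pbest_val = np.array([objective(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

for _ in range(iters):
    r1, r2 = rng.random((n_particles, dim)), rng.random((n_particles, dim))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    vals = np.array([objective(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()

print("best parameters:", gbest, "MSE:", pbest_val.min())
```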

4.2 Intelligent Agriculture System (IAS) Application @ FUZZ-IEEE 2023

The second application of the CI&AI-FML-LT is the Intelligent Agriculture System (IAS), which originated from chrysanthemum cultivation. Accurately controlling the amount of water or fertilizer is crucial for cultivating chrysanthemums; inaccurate control carries a significant risk of stunting plant growth or even causing plant death. To address these issues, the IAS collects measured data from the sensors, including temperature, humidity, brightness, and distance to the irrigation devices, to control the speed of plant growth. With this collected information, farmers can adjust the irrigation volume based on the human knowledge from the CI agent to facilitate plant growth under optimal conditions.


Fig. 9. Fuzzy sets of (a) Distance, (b) Light, (c) Temperature, (d) Humidity, and (e) Degree of Growth and (f) accuracy values under different evolution generations.

Figures 9(a)-(e) show the fuzzy sets of Distance, Light, Temperature, Humidity, and Degree of Growth, respectively. Each of the four input variables consists of 3 linguistic terms, resulting in a total of 3 × 3 × 3 × 3 = 81 fuzzy rules. We consider two conditions to construct the fuzzy rules: 1) when temperature, humidity, light, and the distance to the irrigation device are all at optimal levels, chrysanthemum growth can be maximized, and 2) if the temperature and humidity are too high or too low, chrysanthemum growth can be hindered,


leading to the construction of the fuzzy rules. This paper adopts an evolutionary computation method, PSO-based learning, to train this human knowledge model to better fit practical applications. In this experiment, a total of 300 training records were used. The results after training are shown in Fig. 9(f), indicating that higher accuracy is achieved after approximately 2000 generations of training.

4.3 Smart Greenhouse System (SGS) Application @ FUZZ-IEEE 2023

The third application of the CI&AI-FML-LT is the Smart Greenhouse System (SGS), which monitors the greenhouse environment to ensure it is suitable for plant growth. We used indoor temperature and light levels in the greenhouse, along with the measured distance among plants, as input features to assess whether the condition of plant growth is categorized as dangerous, medium, or safe after connecting to the CI agent. A smaller distance value indicates a greater horizontal deviation and thus poorer growth; conversely, a larger distance value indicates better plant growth. The human knowledge model of the SGS has three input fuzzy variables (Distance, Light, and Temperature) and an output fuzzy variable Alert, resulting in a total of 5 × 4 × 3 = 60 fuzzy rules. Figure 10(a) displays the accuracy values and elapsed time under different evolutionary generations; it indicates that the model learned after 2000 generations performs better. In Fig. 10(b), various particle counts (5, 10, 15, 20, and 25) were tested over 1000 evolutionary generations. The results suggest that using 25 particles outperforms the other options. Nevertheless, when elapsed time is also considered, we believe that 10 particles trained for 2000 generations is a more suitable configuration.

[Fig. 10 data: (a) accuracy values of 0.846, 0.88, 0.905, 0.926, and 0.948, with elapsed times of 7, 18, 55, 118, and 295 s, at 100, 200, 500, 1000, and 2000 iterations; (b) accuracy values of 0.874, 0.926, 0.911, 0.929, and 0.963, with elapsed times of 66, 118, 182, 197, and 375 s, for 5, 10, 15, 20, and 25 particles.]

Fig. 10. Accuracy values and elapsed time under (a) different evolution generations and (b) different numbers of particles with 1000 evolution generations.

5 Conclusions

This paper introduces a PSO-based CI agent that combines human and machine language to teach young students CI concepts. The IEEE CIS supported the CI&AI-FML-LT for young students' experience and learning of CI in Japan, Canada, Italy, Hong Kong, Germany, and Taiwan. Workshops held at IEEE CEC 2023 and FUZZ-IEEE 2023 enabled students to learn the basics of CI, apply them using the CI&AI-FML learning tools, and


conduct real-world experiments, for example, in EADAS, IAS, and SGS. The developed PSO-based CI agent receives machine data and sends human knowledge back to the learning tool to activate its hardware via the MQTT server. Experimental results show that the proposed structure of the PSO-based CI agent is beneficial for young students to learn and experience CI. In the future, we plan to expand the PSO-based CI agent with QCI to more countries, enabling young students to co-learn CI with smart machines.

Acknowledgment. The authors would like to express their gratitude for the partial financial support sponsored by the National Science and Technology Council (NSTC) of Taiwan under the grants NSTC 112-2221-E-992-057-MY3 and NSTC 112-2622-E-024-002, as well as the IEEE CIS and IEEE Region 10 (IEEE R10). We would also like to extend our sincere appreciation to Jim Keller, Alexander Dockhorn, Guilherme N. DeSouza, Yusuke Nojima, Marek Reformat, Dominik Woiwode, Pei-Ying Wu, Pei-Yu Wu, Chun Che Lance Fung, Chi-Un Lei, and Ray Cheung for their contributions. The results presented in this paper are rooted in the collaborative efforts that emerged from the co-organized workshop on "A Sandbox for Teaching and Learning in Computational Intelligence for Pre-University and Undergraduate Students," held at IEEE CEC 2023 in the USA and FUZZ-IEEE 2023 in Korea. Finally, we thank the participants from Taiwan, Japan, Korea, Hong Kong, Germany, the USA, and India for their involvement in the Sandbox for Teaching and Learning in CI.

Appendix

In this Appendix, the short textual descriptions of the videos are listed in Table 2.

Table 2. Short textual descriptions of the videos.

No. 1
Topic: ADAS application @ A sandbox for teaching and learning in CI for pre-university and undergraduate students of IEEE CEC 2023
Links: https://youtu.be/9xepPmpD6wc and https://youtu.be/1HyNRH9nsII
Websites: https://sites.google.com/asap.nutn.edu.tw/ieee-cec-2023/home and https://oase.nutn.edu.tw/cec2023-ciworkshop

No. 2
Topic: Extended ADAS application @ A sandbox for teaching and learning in CI for pre-university and undergraduate students of FUZZ-IEEE 2023
Link: https://youtu.be/Z4cBq0qWYVs
Websites: https://sites.google.com/asap.nutn.edu.tw/fuzz-ieee-2023/home and https://oase.nutn.edu.tw/fuzz2023-cicompetition/


References

1. IEEE Computational Intelligence Society (IEEE CIS). https://cis.ieee.org/about/what-is-ci. Accessed 19 Aug 2023
2. Lee, C.S., Wang, M.H., Reformat, M., Huang, S.H.: Human intelligence-based Metaverse for co-learning of students and smart machines. J. Ambient. Intell. Humaniz. Comput. 14, 7695–7718 (2023)
3. Woiwode, D.: Learning experience report on a sandbox for teaching and learning in CI for pre-university and undergraduate students. In: 2023 IEEE Congress on Evolutionary Computation (IEEE CEC 2023), Jul 1–5, Chicago, USA (2023)
4. Lee, C.S., Wang, M.H., Nojima, Y., Reformat, M., Guo, L.: AI-fuzzy markup language with computational intelligence for high-school student learning. https://arxiv.org/abs/2112.01228. Accessed 20 Aug 2023
5. Lee, C.S., et al.: Robotic assistant agent for student and machine co-learning on AI-FML practice with AIoT application. In: 2021 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2021), Jul 11–14, Luxembourg (2021)

A GA-Based Scheduling Algorithm for Semiconductor-Product Thermal Cycling Tests Yeong-Chyi Lee1(B) , Tzung-Pei Hong2 , Yi-Chen Chiu2 , and Chun-Hao Chen3 1 Cheng Shiu University, Kaohsiung 833, Taiwan

[email protected]

2 National University of Kaohsiung, Kaohsiung 811, Taiwan

[email protected]

3 National Kaohsiung University of Science and Technology, Kaohsiung 807, Taiwan

[email protected]

Abstract. In this paper, we study the scheduling problem of semiconductor-product thermal cycling tests (TCT), in which a batch of TCT test orders is issued to a testing machine. Each test order contains one or several test items, each with its own parameter conditions that must be satisfied. A machine has its capacity and overall equipment effectiveness (OEE). Additionally, order priorities are also considered. We propose a genetic-algorithm-based grouping scheduling method that handles grouping and scheduling at the same time with the objective of minimizing lateness. It groups genes and encodes chromosomes with variable lengths. Furthermore, to evaluate each chromosome, we propose a fitness function that simultaneously considers the delay time, number of test items, priority sequence, and equipment effectiveness rate. We also discuss strategies to eliminate non-feasible solutions after genetic operations. The effectiveness of the proposed method is validated through experiments using simulated production data and is compared against traditional methods. The experimental results for data with different densities of orders (off-peak season and peak season) show that the proposed method outperforms the others under the different types of orders.

Keywords: genetic algorithm · scheduling · thermal cycling test · overall equipment effectiveness

1 Introduction

During usage, electronic components often become dysfunctional due to wear and tear. Environments with high temperatures and humidity can accelerate this process. Therefore, at distinct stages of engineering, it is necessary to verify whether the material combinations provided by suppliers meet customer standards using predefined tolerance limits. Research indicates that product failure issues are often detectable in the preliminary stages of production. Detecting and rectifying these issues during product development can help mitigate losses incurred due to quality problems.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 J.-S. Pan et al. (Eds.): ICGEC 2023, LNEE 1114, pp. 403–412, 2024. https://doi.org/10.1007/978-981-99-9412-0_41


Hence, through environmental testing, researchers can grasp the failure cycles of products under high-temperature and high-humidity conditions. This information can be used to assess the lifespan and production quality of supplier materials or products. The thermal cycling test (TCT) is a type of environmental testing used to assess the reliability and durability of a product under temperature variations. During the testing process, the product undergoes repeated temperature cycles within a specified range of maximum and minimum temperatures, being subjected to these conditions for a particular duration. In this paper, we investigate the scheduling of TCT test orders on a testing machine, grouped according to parameter conditions, with the aim of minimizing lateness. Since each order contains a varying quantity of test items, the grouping of orders needs to consider the machine's capacity constraints and overall equipment effectiveness (OEE) [15]. Additionally, the study considers the varying priority demands of different orders due to customer preferences at different engineering stages.

2 Related Work

As the market changes rapidly, companies maintain competitiveness through flexible production strategies. In the literature on scheduling problems, the main focuses are optimizing completion time, total processing time [21], or total delay time [16]. Many heuristic algorithms have been developed to solve related problems [18]. Scheduling problems can be divided into job shop scheduling problems (JSSP) [20] and flow shop scheduling problems (FSSP) [19] based on the operation mode. The competitive market and shortened product life cycles force businesses to maintain flexible production lines to respond to product and market demands [22]. Different production layouts and machine quantities result in variations in scheduling operations [1]. Worker factors in flexible flowshop scheduling have also been considered: Gong et al. propose algorithms for the flexible multi-objective flowshop scheduling problem [10] and a hybrid approach that is flexible in terms of both machines and manpower [9]. Kress et al. present an approach based on heuristic algorithms for the FSSP [13]. The grouping problem in this study can be considered a high-dimensional bin packing problem, which involves not only the capacity constraints of the orders themselves but also factors such as weight, arrival time, operation time, and machine load. Different grouping methods affect the earliest start time of operations and the allocation of machine resources [14]. How these attributes affect the results is explained in detail in the following sections. Variations of the bin packing problem include the minimum number of bins problem, the minimum remaining space problem, and the maximum packing rate problem [11]. The goal of bin packing problems is typically to minimize the number of containers used, minimize the remaining space, or maximize the packing rate [3]. Most current research focuses on designing good heuristics and approximation algorithms, as the complexity and difficulty of high-dimensional bin packing problems have significantly increased. Strategies for algorithm design include local search [6], greedy search [4], and tabu search [2]. In comparison, genetic algorithms find more efficient solutions through global random search [5, 8, 12, 19]. Pankratz uses a grouping genetic algorithm for route planning of logistics delivery vehicles [17].


3 Problem Definition

The problem investigated in this study is to conduct reliability testing on a batch of TCT orders using a testing machine and to schedule the orders so that they are completed with minimal lateness based on parameter conditions. Since each order contains a different quantity of test items, the grouping of orders must account for the capacity limit of the machine as well as the machine's OEE. In addition, this study also considers the different priority demands of orders from customers at different engineering stages.

3.1 Symbols

• $O$: a set of $n$ test orders, represented as $O = \{o_1, o_2, \cdots, o_n\}$.
• $P$: a set of test experiment parameters.
• $Q$: a set of priority weight values.
• $o_i$: the $i$-th test order, $o_i \in O$. The test order $o_i$ consists of given test parameters $\rho_i \in P$, the quantity of test items $\gamma_i$, the arrival time $\tau_i$, the due date $\delta_i$, and the priority weight $\alpha_i \in Q$.
• $|o_i|$: the number of test items contained in the test order $o_i$.
• $m$: the maximum loading capacity of the testing machine.
• $B_j$: the set of orders of the $j$-th test group, represented as $B_j = \{o_1, o_2, \cdots, o_k\}$, $B_j \subset O$, where $k$ is the number of orders in the test group $B_j$. All orders belonging to the same test group use the same test experimental parameters; hence, if the test experimental parameters for $B_j$ are $\rho_j$, then $\rho_i = \rho_j$, $\forall o_i \in B_j$.
• $|B_j|$: the loading of the test group $B_j$, i.e., the total quantity of test items of the orders belonging to $B_j$, calculated as

$$|B_j| = \sum_{o_i \in B_j} \gamma_i, \qquad (1)$$

where $|B_j| \le m$.
• $\upsilon_j$: the workload cost of the $j$-th group, calculated as

$$\upsilon_j = 1 - \frac{\sum_{o_i \in B_j} \gamma_i}{m} + \varepsilon, \qquad (2)$$

where $\frac{\sum_{o_i \in B_j} \gamma_i}{m} = \frac{|B_j|}{m} \ge 0$ represents the machine-loading factor of the current group test operation, and $\varepsilon \ge 0$ is a bias used to avoid $\upsilon_j = 0$.
• $\varphi_i$: the end time of the testing order $o_i$.
• $\Delta_i$: the lateness of the testing order $o_i$.
• $S_l$: the $l$-th chromosome.
• $f(o_i)$: the fitness value of the testing order $o_i$.
• $f(S_l)$: the fitness value of the chromosome $S_l$.


3.2 Order Grouping

Orders with the same parameters must be processed concurrently. Allocating a set of orders to a machine of a specific capacity is a type of bin packing problem, which has been recognized as NP-complete [7]. The order grouping in this study constitutes an application of the bin packing problem, given the various parameter conditions imposed on grouping. Orders are allocated to a group based on their own parameter conditions. To ensure the reliability of the testing experiment, all units within an order must operate in exactly the same environment; thus, all units within an order must be loaded into the machine simultaneously. Assuming the maximum number of test products that can be accommodated by the testing machine is $m$, the machine selects a set of parameters $\rho$ for each testing experiment. Once the testing experiment starts, the conditions cannot be changed until the conclusion of the experiment. Therefore, only orders with identical experimental parameters can be placed in the same testing group.

3.3 Group Scheduling

This paper aims to find scheduling strategies for a single machine. It is assumed that the testing machine can accommodate a maximum of $m$ test units. Once the testing experiment begins, the parameters cannot be changed until the completion of the experiment. Therefore, only orders with the same experimental parameters can be grouped together for testing. A group of orders, denoted as $B_j$, comprises several individual orders. The earliest possible start time for the operations of the group $B_j$ is determined by the latest arrival time of the orders within it. Scheduling is then employed to determine the operational sequence of the groups. If $\varphi_i$ represents the actual completion time of a testing order $o_i$, the lateness $\Delta_i$ of the testing order $o_i$ is calculated as

$$\Delta_i = \begin{cases} \varphi_i - \delta_i, & \varphi_i > \delta_i;\\ 0, & \text{otherwise}. \end{cases} \qquad (3)$$

The objective of scheduling in this paper is to minimize the total lateness for completing all test orders.
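As a concrete, hedged reading of Sect. 3.3, the sketch below computes each group's earliest start time (the latest arrival among its orders), sequential completion times, and the per-order lateness of formula (3); the Order fields and the fixed test duration are illustrative assumptions, not the paper's data model.

```python
# Illustrative sketch of group scheduling and lateness (formula (3)).
# Order fields and the fixed per-group test duration are assumptions.
from dataclasses import dataclass

@dataclass
class Order:
    name: str
    arrival: float   # tau_i
    due: float       # delta_i
    qty: int         # gamma_i

TEST_DURATION = 10.0  # assumed uniform duration of one TCT run

def schedule(groups):
    """groups: list of lists of Orders, in their scheduled sequence."""
    t, lateness = 0.0, {}
    for group in groups:
        start = max(t, max(o.arrival for o in group))  # earliest start
        end = start + TEST_DURATION
        for o in group:
            lateness[o.name] = max(0.0, end - o.due)   # formula (3)
        t = end
    return lateness

orders = [Order("o1", 0, 15, 5), Order("o2", 2, 12, 3), Order("o3", 4, 30, 4)]
print(schedule([[orders[0], orders[1]], [orders[2]]]))
```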

4 The Proposed GA-Based Group Scheduling Algorithm

The problem investigated in this study is conducting reliability testing on a batch of TCT test orders using a testing machine and scheduling the orders to be completed with minimal lateness. Since each order contains a different quantity of test items, the grouping of orders must account for the capacity limit of the machine as well as the machine's OEE. Furthermore, a fitness function that simultaneously considers the delay time, number of test items, priority sequence, and equipment effectiveness rate is proposed to evaluate each chromosome. Some strategies to eliminate non-feasible solutions after genetic operations are also designed.


4.1 Encoding

In this study, a chromosome represents a possible outcome of grouping and ordering a set of orders for TCT experiments. An order is represented by an integer label, such as 1, 3, and 5, indicating orders $o_1$, $o_3$, and $o_5$, respectively. A test parameter is represented by a letter, such as A, B, and C. A grouped order $B_j$ represents a set of orders sharing a test parameter, denoted as $B_j$-{$o_1, o_2, o_3, \ldots, o_k$}. For instance, A-{1, 7} represents $o_1$ and $o_7$ being grouped for testing with parameter A. Additionally, the arrangement of grouped orders from left to right in the chromosome signifies the scheduling order in which each grouped order enters the machine for testing experiments. The lengths of different chromosomes may vary depending on the grouping results. An example chromosome is shown in Fig. 1.

Fig. 1. Example of an encoded chromosome.
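One direct way to realize this variable-length encoding is an ordered list of (parameter, orders) groups, as in the minimal sketch below; the order labels, quantities, and capacity are made up for illustration and are not the paper's implementation.

```python
# Illustrative chromosome encoding: an ordered list of (parameter, orders)
# groups, matching the style of Fig. 1 (labels are made up).
import random

chromosome = [("A", [1, 7]), ("B", [2, 4]), ("A", [3]), ("C", [5, 6])]

def random_chromosome(orders_by_param, capacity, qty):
    """Randomly group same-parameter orders (respecting capacity),
    then shuffle the groups into a schedule."""
    groups = []
    for param, orders in orders_by_param.items():
        random.shuffle(orders)
        current, load = [], 0
        for o in orders:
            if current and load + qty[o] > capacity:  # start a new group
                groups.append((param, current))
                current, load = [], 0
            current.append(o)
            load += qty[o]
        if current:
            groups.append((param, current))
    random.shuffle(groups)                            # random schedule order
    return groups

print(random_chromosome({"A": [1, 3, 7], "B": [2, 4]}, capacity=6,
                        qty={1: 3, 2: 2, 3: 4, 4: 1, 7: 2}))
```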

The initial population is formed by randomly generating k initial chromosomes, where k denotes the size of the population. First, orders with the same parameters are randomly divided into several groups according to the grouping method, and then all groups are randomly combined to generate a job sequence. Since the assignment is random, the length of each chromosome may differ.

4.2 Fitness and Selection

After completing the grouping and adjusting the sequence, the operation time for each group is determined based on the latest order arrival time in the group. Each order has a deadline given by the customer. The fitness value $f(o_i)$ of an order $o_i$ multiplies its delay time $\Delta_i$ by its weight $\alpha_i$, its quantity $\gamma_i$, and the machine-load rate $\upsilon_j$ of its test group $B_j$:

$$f(o_i) = \Delta_i \cdot \alpha_i \cdot \gamma_i \cdot \upsilon_j. \qquad (4)$$

The fitness value $f(S_l)$ of a chromosome is the sum of the fitness values of all the orders within the chromosome:

$$f(S_l) = \sum_{i=1}^{n} f(o_i), \qquad (5)$$

where $o_i \in O = \{o_1, o_2, \cdots, o_n\}$. Since our goal is to find the minimal-lateness scheduling result, the lower the fitness value, the better the chromosome. The fitness value $f(S_l)$ is used as the criterion for selecting and retaining offspring. After the fitness of each chromosome is calculated, the roulette wheel method is used to select chromosomes for reproducing offspring for the next generation.
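Formulas (4) and (5) can be read as the short function below; the group structure and the example weights are illustrative assumptions under the symbols of Sect. 3.1.

```python
# Illustrative fitness computation for formulas (4) and (5):
# f(o_i) = Delta_i * alpha_i * gamma_i * upsilon_j, summed over all orders.
EPS = 0.01  # bias epsilon from formula (2); value assumed

def chromosome_fitness(groups, lateness, alpha, qty, capacity):
    """groups: list of lists of order names; lateness: order -> Delta_i."""
    total = 0.0
    for group in groups:
        load = sum(qty[o] for o in group)
        upsilon = 1.0 - load / capacity + EPS           # formula (2)
        for o in group:
            total += lateness[o] * alpha[o] * qty[o] * upsilon  # formula (4)
    return total                                         # formula (5)

groups = [["o1", "o2"], ["o3"]]
print(chromosome_fitness(groups,
                         lateness={"o1": 0.0, "o2": 3.0, "o3": 1.0},
                         alpha={"o1": 1.0, "o2": 2.0, "o3": 1.5},
                         qty={"o1": 5, "o2": 3, "o3": 4},
                         capacity=10))
```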


4.3 Crossover and Mutation

When crossover occurs, two chromosomes are selected as parents according to the selection operator in Sect. 4.2. A set of indices determines the positions from which chromosome segments are extracted. The extracted chromosome segments are exchanged to generate new offspring. In Fig. 2, for example, segments 3 and 4 of chromosome 2 are exchanged with the same positions of chromosome 1, and two new chromosomes are generated as offspring.

Fig. 2. An example of crossover

For mutation, two groups with the same test parameter are selected in a chromosome. One order is taken from the later group and moved to the earlier group. If the move would violate the capacity conditions of the receiving group, the operation is skipped.

4.4 Chromosome Repair

After the genetic operators are performed, two defects, order duplication and order missing, may appear in the newly generated chromosomes. Order duplication refers to the occurrence of the same order in multiple groups within a newly generated chromosome. Order missing refers to the situation where certain orders are absent from a newly generated chromosome. The method for repairing non-feasible chromosomes is described in the following.

STEP 1: Scan each order in the newly generated chromosome $S^{new}$ and create two lists: the D-list (Duplicate Order List) and the L-list (Missing Order List). Add orders that appear repeatedly in $S^{new}$ to the D-list, and add orders that are missing from $S^{new}$ to the L-list. If both lists are empty, terminate the repair process; otherwise, proceed to Step 2.

STEP 2: For each order $o_i$ in the D-list, retain only the occurrence that appears earliest in $S^{new}$, and remove all other duplicate instances of the order.

STEP 3: Search for empty order groups in $S^{new}$ and remove them. After removal, shift the subsequent order groups forward to fill the gap.


STEP 4: Sort the missing orders in the L-list in ascending order of their arrival times.

STEP 5: Execute the implantation procedure for each order $o_i$ in the L-list: each order $o_i$ is implanted into the chromosome $S^{new}$, scanned from the beginning to the end. If a group $B_j$ is found in $S^{new}$ with the same testing parameters as the order $o_i$ and $|B_j| < m$, then the order $o_i$ is implanted into $B_j$, denoted as $o_i \in B_j$. Otherwise, a new order group is created in $S^{new}$ and placed at the end. This process continues until every order in the L-list has been implanted into $S^{new}$.

For example, offspring 1 in Fig. 2 is a non-feasible chromosome. Orders $o_8$ and $o_{10}$ were omitted after crossover, and the orders in the fifth group duplicated those in the third group, so the orders in the fifth group are deleted. Then, $o_8$ is inserted from the front into the empty space in the fourth group, and $o_{10}$ is inserted into the empty space in the sixth group. This results in the repaired chromosome shown in Fig. 3.

Fig. 3. An example of a repaired chromosome.
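The repair steps above can be sketched in a few lines; the chromosome format follows the earlier encoding sketch, and the helper inputs (parameter lookup, quantities, capacity, arrival times) are assumptions rather than the authors' implementation.

```python
# Illustrative repair of a non-feasible chromosome (duplicate/missing
# orders), following STEPs 1-5 above; structures match the encoding sketch.
def repair(chromosome, all_orders, param_of, qty, capacity, arrival):
    # STEPs 1-2: keep only the earliest occurrence of each duplicated order
    # (seen.add returns None, so "not seen.add(o)" records o while keeping it).
    seen = set()
    for _, orders in chromosome:
        orders[:] = [o for o in orders if o not in seen and not seen.add(o)]
    # STEP 3: drop groups left empty after duplicate removal.
    chromosome[:] = [(p, g) for p, g in chromosome if g]
    # STEP 4: collect missing orders, sorted by arrival time.
    missing = sorted(set(all_orders) - seen, key=lambda o: arrival[o])
    # STEP 5: implant each missing order into the first compatible group
    # with spare capacity, or append a new group at the end.
    for o in missing:
        for p, g in chromosome:
            if p == param_of[o] and sum(qty[x] for x in g) + qty[o] <= capacity:
                g.append(o)
                break
        else:
            chromosome.append((param_of[o], [o]))
    return chromosome
```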

4.5 The Proposed Algorithm

Input: A batch of orders (with different product quantities, arrival times, deadlines, engineering phases, and experimental parameters), a set of weights for the different engineering phases, and the machine capacity constraint.
Output: An approximately optimal scheduling result.
Initial Stage: Encode the orders and testing parameters using the method described in Sect. 4.1 and create the initial population Pop_init consisting of r chromosomes, where r is the predetermined population size.
Begin:
  Iterate for a predetermined number of generations:
    Calculate the fitness value of every chromosome in the population using formula (5).
    Employ the roulette wheel selection method to choose parent chromosomes.
    Repeat until the size r of the next generation is reached:
      Perform the crossover operator to generate new-generation chromosomes.
      Utilize the repair procedure in Sect. 4.4 to repair non-feasible chromosomes.
      Perform the mutation operator to generate a new chromosome.
    Obtain the new population.
  Return the chromosome with the minimal fitness value.
End.


5 Experimental Results

To validate the performance of the proposed algorithm under practical production conditions, we note that the annual orders of a semiconductor company follow a seasonal pattern with peak and off-peak seasons corresponding to the release of customer products. Simulated production data of a particular machine in a certain company was selected for May and December of a specific year. These two months were chosen as the experimental period because the production line was in the off-peak and peak seasons, respectively, allowing observation of the algorithm's effectiveness under different production pressures. When OEE is set to 0.9, the proposed GOGA performs best, scoring higher than both FIFO and the traditional genetic algorithm; this indicates that GOGA has a significant advantage in high-OEE scenarios. The algorithm reaches convergence around 500–1000 generations, and the rate of decrease is faster as OEE increases. When OEE is 0.9, the production efficiency of GOGA is significantly higher than that of FIFO and the traditional genetic algorithm, as shown in Fig. 4. When OEE is set to 0.5, the performance of GOGA is significantly better than the other two strategies in the off-peak season, as shown in Fig. 5. It converges earlier under low loads, converging almost completely after 500 generations.

Fig. 4. Experimental results of a peak season


Fig. 5. Experimental results of an off-peak season

6 Conclusion

In this paper, the proposed GOGA combines the grouping characteristics of grouping genetic algorithms with the scheduling capabilities of traditional genetic algorithms to solve the problem. Experimental results show that controlling the machine's load rate improves the efficiency of the overall schedule. In the production of consumer electronics, product launches follow a fixed schedule, and order demand varies periodically. When capacity is not at its maximum, blind pursuit of high load rates may delay the overall schedule. Therefore, we analyze historical production data to assess the production resources currently needed. We simulate two different levels of order congestion and adjust parameters to yield better resource allocation results. In situations with many orders, increasing the machine's utilization rate ensures that most orders are delivered on time. Conversely, with fewer orders, it suffices to reduce the load rate and operate the machine at only half its capacity; although this increases the number of start-up operations, it reduces the number of delayed orders.

Acknowledgement. This work was supported by the grant MOST 112-2221-E-390-014-MY3, the National Science and Technology Council, Taiwan.

References

1. Brown, E.C., Sumichrast, R.T.: Evaluating performance advantages of grouping genetic algorithms. Eng. Appl. Artif. Intell. 18(1), 1–12 (2005)
2. Crainic, T.G., Perboli, G., Tadei, R.: TS2PACK: a two-level tabu search for the three-dimensional bin packing problem. Eur. J. Oper. Res. 195(3), 744–760 (2009)
3. Crainic, T.G., Perboli, G., Rei, W., Tadei, R.: Efficient lower bounds and heuristics for the variable cost and size bin packing problem. Comput. Oper. Res. 38(11), 1474–1482 (2011)
4. de Castro Silva, J.L., Soma, N.Y., Maculan, N.: A greedy search for the three-dimensional bin packing problem: the packing static stability case. Int. Trans. Oper. Res. 10(2), 141–153 (2003)


5. Falkenauer, E.: A new representation and operators for genetic algorithms applied to grouping problems. Evol. Comput. 2(2), 123–144 (1994)
6. Faroe, O., Pisinger, D., Zachariasen, M.: Guided local search for the three-dimensional bin-packing problem. INFORMS J. Comput. 15(3), 267–283 (2003)
7. Garey, M.R., Johnson, D.S.: Computers and intractability: a guide to the theory of NP-completeness. J. Symbolic Logic 48(2), 498–500 (1979)
8. Goldberg, D.E., Lingle, R.: Alleles, loci, and the traveling salesman problem. In: Proceedings of an International Conference on Genetic Algorithms and Their Applications, vol. 154, pp. 154–159. Lawrence Erlbaum, Hillsdale, NJ (1985)
9. Gong, G., Deng, Q., Gong, X., Liu, W., Ren, Q.: A new double flexible job-shop scheduling problem integrating processing time, green production, and human factor indicators. J. Clean. Prod. 174, 560–576 (2018)
10. Gong, X., Deng, Q., Gong, G., Liu, W., Ren, Q.: A memetic algorithm for multi-objective flexible job-shop problem with worker flexibility. Int. J. Prod. Res. 56(7), 2506–2522 (2018)
11. Hong, T.P., Chen, C.H., Lin, F.S.: Using group genetic algorithm to improve performance of attribute clustering. Appl. Soft Comput. 29, 371–378 (2015)
12. Janovec, M., Kohani, M.: Grouping genetic algorithm (GGA) for electric bus fleet scheduling. Transp. Res. Procedia 55, 1304–1311 (2021)
13. Kress, D., Müller, D., Nossack, J.: A worker constrained flexible job shop scheduling problem with sequence-dependent setup times. OR Spectrum 41, 179–217 (2019)
14. Lee, M.S., et al.: Chip to package interaction risk assessment of FCBGA devices using FEA simulation, meta-modeling and multi-objective genetic algorithm optimization technique. In: 2021 IEEE International Reliability Physics Symposium (IRPS), Monterey, CA, USA, pp. 1–6 (2021)
15. Liu, W., Zhu, X., Wang, L., Zhang, Q., Tan, K.C.: Integrated scheduling of yard and rail container handling equipment and internal trucks in a multimodal port. IEEE Trans. Intell. Transp. Syst., 1–22 (2023)
16. Morinaga, E., Tang, X., Iwamura, K., Hirabayashi, N.: An improved method of job shop scheduling using machine learning and mathematical optimization. Procedia Comput. Sci. 217, 1479–1486 (2023)
17. Pankratz, G.: A grouping genetic algorithm for the pickup and delivery problem with time windows. OR Spectrum 27, 21–41 (2005)
18. Su, C., et al.: Evolution strategies-based optimized graph reinforcement learning for solving dynamic job shop scheduling problem. Appl. Soft Comput. 145, 110596 (2023)
19. Syarif, A., Wamiliana, W., Lumbanraja, P., Gen, M.: Study on genetic algorithm (GA) approaches for solving flow shop scheduling problem (FSSP). IOP Conf. Ser. Mater. Sci. Eng. 857(1), 012009 (2020)
20. Tamssaouet, K., Dauzère-Pérès, S.: A general efficient neighborhood structure framework for the job-shop and flexible job-shop scheduling problems. Eur. J. Oper. Res. 311, 455–471 (2023)
21. Xiong, W., Li, J., Qiao, Y., Bai, L., Huang, B., Wu, N.: An efficient scheduling method for single-arm cluster tools with multifunctional process modules. IEEE Trans. Syst. Man Cybern. Syst. 53(6), 3270–3283 (2023)
22. Zhou, J., Wang, T., Cong, P., Lu, P., Wei, T., Chen, M.: Cost and makespan-aware workflow scheduling in hybrid clouds. J. Syst. Architect. 100, 101631 (2019)

Power Internet of Things Security Evaluation Method Based on Fuzzy Set Theory Yuman Wang1 , Hongbin Wu1 , Yilei Wang2 , Zixiang Wang2 , Xinyue Zhu1,3(B) , and Kexiang Qian1,4 1 SGRI Power Grid, Digitizing Technology Department, State Grid Laboratory of Power Cyber-Security Protection and Monitoring Technology, State Grid Smart Grid Research Institute Co., Ltd., Beijing 102200, China [email protected] 2 Internet Technology Center, State Grid Zhejiang Electric Power Co., Ltd., Research Institute, Hangzhou 310000, China 3 School of Computer Science, Northeast Electric Power University, Jilin City 132012, Jilin, China 4 School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100000, China

Abstract. Aiming at the high fuzziness and low integrity of current Power Internet of Things network security evaluation index systems, fuzzy set theory is used to evaluate the security of Power IoT terminals, protocols, strategies, business operation, business management, employee operation, and user operation, from the two aspects of the Power Internet of Things itself and manpower. A fuzzy judgment matrix is constructed, and the weights of the index items are determined by AHP. The aim is to establish a comprehensive and reasonable Power Internet of Things safety evaluation index system, overcome the shortcomings of single, purely quantitative analysis of Power Internet of Things network risk, and determine fuzzy index weights by combining quantitative and qualitative analysis, so as to improve the accuracy of Power Internet of Things safety evaluation.

Keywords: Power Internet of Things · Fuzzy sets · Analytic Hierarchy Process · Security evaluation

1 Introduction

The safety of the Power Internet of Things (Power IoT) plays an important role in maintaining its normal operation [1, 2]. Once Power IoT accidents happen, people's daily life and even social and economic development will be seriously affected. At the end of the 20th century, the concept of Power IoT security assessment was first proposed by the international Power IoT association CIGRE [3]. In the 21st century, the Chinese government has attached great importance to the safe operation of the Power IoT. It not only sets security requirements for the Power IoT in different states but also actively explores new technologies to improve the security and stability of Power IoT operation [4].

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 J.-S. Pan et al. (Eds.): ICGEC 2023, LNEE 1114, pp. 413–422, 2024. https://doi.org/10.1007/978-981-99-9412-0_42


At present, research on fuzzy set theory and the analytic hierarchy process has achieved remarkable results. The purpose of fuzzy decision making is to effectively deal with sets whose boundaries are not clearly defined. The literature [5, 6] mainly describes the development of fuzzy set theory, fuzzy decision making, and various models based on fuzzy theory. The Analytic Hierarchy Process (AHP) is a widely used multi-criteria decision-making method, employed to derive the weights of criteria and the priorities of alternatives in a structured way on the basis of pairwise comparison [7]. Because subjective judgments may be imprecise during the comparison, fuzzy sets are combined with the analytic hierarchy process into the fuzzy Analytic Hierarchy Process, which is currently the mainstream method for comprehensive index evaluation of the power market [8]. Most existing papers describe different methods, and the relative benefits of each, for deriving the weights (priorities) from a fuzzy judgment matrix. These methods can be classified according to four characteristics of the fuzzy analytic hierarchy process [9]: relative importance; evaluation factor set, weights, and priority decision; fuzzy set processing method; and determination of the consistency measure. In recent years, Power IoT in many countries have been undergoing major shifts toward new energy generation methods, which have created new challenges for the industry [10]. Due to the uncertainty and intermittency of system data, experts are required to provide new solutions and adaptive adjustment strategies to maintain the reliability of Power IoT operation [11]. So far, power researchers at home and abroad have made some achievements in the field of electricity market evaluation systems [12]. Reliability indices are used to measure the effectiveness of the Power IoT. The literature [13, 14] describes in detail a variety of Power IoT reliability evaluation methods and efficient, accurate calculation methods based on model simulation technology; it also proposes a Power IoT reliability evaluation method based on Monte Carlo simulation (MCS) and variance reduction techniques (VRT) to quantitatively estimate system reliability, which can greatly improve the computing power of Power IoT reliability evaluation. Highly available, reliable Power IoT are particularly important today as cloud data centers increasingly rely on the Internet of Things [15], but IP-based communication technologies in smart grids increase their exposure to network attacks, such as IP spoofing and distributed denial-of-service attacks, which can falsify smart meter readings, introduce power demand errors, and compromise protective Power IoT terminals. Therefore, a Power IoT security assessment system with an adaptive response mechanism plays an important role. The literature [16] puts forward an internal security risk framework for Power IoT assessment: through security risk quantification, a security scoring model is used to accurately evaluate the overall security status of the Power IoT.
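For reference, the classical (non-fuzzy) AHP weight derivation mentioned above can be sketched as follows: the principal eigenvector of a pairwise comparison matrix yields the weights, and the consistency ratio checks the judgments; the example 3 × 3 matrix is made up.

```python
# Classical AHP weight derivation (principal eigenvector) with a
# consistency check; the 3x3 pairwise comparison matrix is made up.
import numpy as np

A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])   # judgment matrix (Saaty 1-9 scale)

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)
w = np.abs(eigvecs[:, k].real)
w = w / w.sum()                    # normalized priority weights

n = A.shape[0]
lam_max = eigvals[k].real
CI = (lam_max - n) / (n - 1)       # consistency index
RI = 0.58                          # random index for n = 3
CR = CI / RI                       # consistency ratio; acceptable if < 0.1
print("weights:", w, "CR:", round(CR, 4))
```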

2 Power Internet of Things Safety Evaluation Index System

Since the safety evaluation of the Power IoT network needs to consider the uncertainty and fuzziness of Power IoT safety production, this paper starts from both the Power IoT itself and the user side. Power IoT security is evaluated in terms of Power IoT terminal security, business operation and management security, employee operation security, protocol security, policy security, etc. The Power IoT security evaluation index system is established through a three-level structural division and a comprehensive analysis of the indicator information reflecting the state of the system. Table 1 shows the framework structure of the Power IoT security evaluation index system, and Fig. 1 shows the basic steps of Power IoT security assessment.

Table 1. Frame structure of the Power Internet of Things security evaluation index system (G: Power IoT security evaluation index)

| First-order index | Secondary index | Three-level index |
|---|---|---|
| A: The Power Internet of Things side | A1: Power IoT terminal safety | A11 Number of terminal connections; A12 Type of stored information; A13 Terminal permissions; A14 Vulnerabilities; A15 Number of Trojans and viruses; A16 Number of attacks; A17 Number of successful attacks |
| | A2: Protocol security | A21 Number of protocols; A22 Number of vulnerabilities; A23 Data transmission encryption; A24 Broadcast suppression; A25 Programmability |
| | A3: Policy security | A31 Firewall; A32 Firewall configuration; A33 Access control policy; A34 Backup and restoration mechanism; A35 Security audit; A36 Regular vulnerability check and repair; A37 Frequency of vulnerability inspection and repair |
| B: The user side | B1: Service operation security | B11 Number of disconnections; B12 Off-network time; B13 Data transmission time; B14 Service data transfer rate |
| | B2: Service management security | B21 Authentication mechanism; B22 Authority differentiation; B23 Number of service maintenance events; B24 Single service maintenance time |
| | B3: Employee operation safety | B31 Operation and maintenance skills training; B32 Pass rate of operation tasks; B33 Employee safety awareness assessment status |
| | B4: User operation security | B41 User password complexity; B42 Number of all users; B43 Number of successful login attempts |


Fig. 1. Basic steps of Power Internet of Things security assessment

3 Power Internet of Things Security Evaluation System Based on Fuzzy Set

Since all the above factors are fuzzy, a fuzzy evaluation of the Power IoT security evaluation indices is necessary [16]. Based on the Power IoT security evaluation index system and research results at home and abroad, Power IoT security is analyzed with the fuzzy comprehensive evaluation method.

3.1 Determining the Set of Factors and the Assessment Collection

Monitoring and evaluating Power IoT security requires comprehensive evaluation from many aspects; all these factors constitute the set of the evaluation index system, i.e., the factor set A = {a1, a2, ..., an}, where ai represents the i-th evaluation factor and n is the total number of evaluation factors. Here, 12 factors are selected to constitute the factor set of the fuzzy evaluation of the Power IoT: a1 number of connected Power IoT terminals; a2 type of stored information; a3 Power IoT terminal rights; a4 number of protocols; a5 number of vulnerabilities; a6 data transmission encryption; a7 firewall configuration; a8 access control policy; a9 number of disconnections; a10 authentication mechanism; a11 operation task qualification rate; a12 number of all users. The fuzzy comments are divided into five levels: very relevant v1, relatively relevant v2, generally relevant v3, not very relevant v4, and basically unrelated v5. The corresponding grade scores are set to 1, 0.8, 0.6, 0.4, and 0.2, giving the score vector S = (1, 0.8, 0.6, 0.4, 0.2). The result of the Power IoT security evaluation is directly reflected by the score.


3.2 Determining Membership

There are many ways to determine a membership function; the membership function used in this article is:

$$f(x) = \begin{cases} 0, & 0 \le x \le a \\ \dfrac{x-a}{b-a}, & a < x \le b \\ 1, & b < x \le c \\ \dfrac{d-x}{d-c}, & c < x \le d \end{cases} \quad (1)$$

According to the index hierarchy shown in Table 1, a first-level index $U_t$ contains $k$ second-level indicators, and the fuzzy evaluation matrix of the indicator $U_t$ can be calculated as:

$$R^{t}_{k \times 5} = \begin{bmatrix} u^{t}_{11} & u^{t}_{12} & \ldots & u^{t}_{15} \\ u^{t}_{21} & u^{t}_{22} & \ldots & u^{t}_{25} \\ \ldots & \ldots & \ldots & \ldots \\ u^{t}_{k1} & u^{t}_{k2} & \ldots & u^{t}_{k5} \end{bmatrix} \quad (2)$$
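As a concrete illustration of the trapezoidal membership function in Eq. (1), here is a minimal Python sketch; the breakpoints a, b, c, d are hypothetical, since the paper does not give per-indicator values.

```python
def trapezoid_membership(x, a, b, c, d):
    """Trapezoidal membership function of Eq. (1): 0 up to a, a linear ramp
    on (a, b], 1 on (b, c], and a linear descent on (c, d]."""
    if x <= a:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    if x <= d:
        return (d - x) / (d - c)
    return 0.0  # outside the support; Eq. (1) does not define x > d

# Hypothetical breakpoints for one indicator, e.g. "number of attacks"
print(trapezoid_membership(35.0, a=0.0, b=20.0, c=50.0, d=80.0))  # -> 1.0
```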

The meanings of the 33 fuzzy judgment matrices formed by the above method are shown in Table 2.

Table 2. Fuzzy evaluation matrix list

| Number | Matrix | Order | Meaning |
|---|---|---|---|
| 1 | RG | 5×5 | Power Internet of Things security assessment fuzzy evaluation matrix |
| 2 | RA | 3×5 | Power Internet of Things side safety fuzzy evaluation matrix |
| …… | …… | …… | …… |

3.3 Determining Index Weights

The analytic hierarchy process (AHP) is a multi-criteria decision-making method proposed by Thomas L. Saaty of the University of Pittsburgh. Its main steps are:

Establishing judgment matrices. Thirty-three AHP judgment matrices are established; the comparison scale is shown in Table 3, and the meaning of each matrix is shown in Table 4.

Comparing element weights. By solving for the eigenvector corresponding to the largest eigenvalue of each judgment matrix, the weights of lower-level indicators relative to higher-level indicators are obtained and checked against the consistency requirements.

Establishing the index weight set.
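As a sketch of the eigenvector step above (assuming numpy), the example below derives weights from a hypothetical 3×3 judgment matrix on the 1–9 scale of Table 3 and applies the standard AHP consistency check CR = CI/RI, which the paper does not spell out:

```python
import numpy as np

def ahp_weights(J):
    """Weights from a pairwise judgment matrix J: the eigenvector of the
    largest eigenvalue, normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(J)
    k = np.argmax(eigvals.real)
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()
    # Standard consistency check: CI = (lambda_max - n)/(n - 1), RI(n=3) = 0.58
    n = J.shape[0]
    ci = (eigvals.real[k] - n) / (n - 1)
    return w, ci / 0.58

# Hypothetical comparison of A1, A2, A3 (terminal / protocol / policy security)
J = np.array([[1, 3, 5], [1/3, 1, 3], [1/5, 1/3, 1]])
w, cr = ahp_weights(J)
print(w.round(3), f"CR = {cr:.3f}")  # CR < 0.1 indicates acceptable consistency
```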


Evaluation index assignment. This paper uses the max–min composition operation:

$$U_t = w_{1 \times k} \circ R^{t}_{k \times 5} \quad (3)$$

where $U_t$ is the fuzzy evaluation vector of the superior index, $w_{1 \times k}$ is the weight (membership degree) vector of the subordinate sub-indices relative to $U_t$, and $R^{t}_{k \times 5}$ is the fuzzy evaluation matrix of the subordinate sub-indices. The element $U_{tj}$ of $U_t$ is calculated as:

$$U_{tj} = \bigvee_{i=1}^{k} \left( w_i \wedge r^{t}_{ij} \right), \quad j = 1, 2, \ldots, 5 \quad (4)$$

where $w_i$ and $r^{t}_{ij}$ are elements of the matrices $W$ and $R$, respectively, and $\vee$ and $\wedge$ denote the max and min operators. According to the maximum membership principle, the comprehensive evaluation vector $U_t$ is computed level by level from the low-level indices using the above formula. With the score vector $S = (1, 0.8, 0.6, 0.4, 0.2)$, the total score $Q$ of the index system is obtained as:

$$Q = S^{T} U_t \quad (5)$$
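A minimal sketch of Eqs. (3)–(5), assuming numpy; the weight vector and the 3×5 fuzzy evaluation matrix are hypothetical illustrations, not values from the paper:

```python
import numpy as np

def max_min_compose(w, R):
    """Eq. (4): U_j = max_i min(w_i, r_ij), the fuzzy max-min composition."""
    return np.array([np.minimum(w, R[:, j]).max() for j in range(R.shape[1])])

w = np.array([0.5, 0.3, 0.2])             # hypothetical sub-index weights
R = np.array([[0.1, 0.2, 0.4, 0.2, 0.1],  # hypothetical 3x5 evaluation matrix
              [0.0, 0.3, 0.3, 0.3, 0.1],
              [0.2, 0.2, 0.2, 0.2, 0.2]])
U = max_min_compose(w, R)                 # Eq. (3)
S = np.array([1.0, 0.8, 0.6, 0.4, 0.2])   # score vector from Sect. 3.1
print(U.round(2), "Q =", round(float(S @ U), 3))  # total score, Eq. (5)
```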

Table 3. The 1–9 scale

| Scale | Linguistic description |
|---|---|
| 1 | Equally important |
| 3 | Slightly important |
| 5 | Significantly important |
| 7 | Strongly important |
| 9 | Extremely important |

Note: 2, 4, 6, 8 represent intermediate values between adjacent scales.

Table 4. AHP judgment matrix list

| Number | Matrix | Order | Meaning |
|---|---|---|---|
| 1 | HG | 5×5 | Power Internet of Things security assessment AHP judgment matrix |
| 2 | HA | 3×5 | Power Internet of Things side safety AHP judgment matrix |
| …… | …… | …… | …… |


4 Experimental Results

Based on the method in this paper, the AHP judgment matrices and index weight sets were calculated using simulation data; the index evaluation grade membership curves shown in Fig. 2 were generated, and finally the Power IoT security index evaluation results shown in Table 5 were obtained.

Fig. 2. Evaluation Grade Subordination Curve

Table 5. Evaluation results of the indicators of the Power Internet of Things

| Evaluation objective | Fuzzy evaluation vector | Total score | Evaluation |
|---|---|---|---|
| Power IoT terminal security | [0.20, 0.40, 0.48, 0.40, 0.30] | 0.59 | medium |
| Protocol security | [0.05, 0.10, 0.50, 0.10, 0.25] | 0.52 | medium |
| Policy security | [0.20, 0.45, 0.20, 0.10, 0.15] | 0.68 | medium |
| Service operation security | [0.40, 0.15, 0.10, 0.05, 0.05] | 0.82 | good |
| Service management security | [0.10, 0.20, 0.10, 0.60, 0.40] | 0.46 | good |
| Employee operation security | [0.25, 0.30, 0.20, 0.10, 0.20] | 0.66 | excellent |
| User operation security | [0.30, 0.25, 0.60, 0.05, 0.40] | 0.43 | poor |
| Comprehensive evaluation index | [0.15, 0.05, 0.40, 0.30, 0.25] | 0.71 | good |

The total score of the seven evaluation objectives in Table 5 is shown in Fig. 3 below.


Fig. 3. Overall Score Chart of Evaluation Objectives

5 Conclusion

This paper describes a Power IoT security evaluation method based on fuzzy set theory. The main work is as follows: the ideas and steps for constructing a Power IoT security evaluation index system are put forward, and a comprehensive, hierarchical structure of Power IoT security evaluation indices is established; a comprehensive evaluation method based on fuzzy set theory is presented to determine the membership degrees of the safety evaluation indices, with the weight of each index determined by the analytic hierarchy process; and the basic characteristics of this method are illustrated.

Acknowledgment. This work is supported by the science and technology project of State Grid Corporation of China, "Research on Edge Attack Defense Technology for Power IoT Based on Threat Capture and Adaptive Response Linkage" (Grant No. 5700-202319297A-1-1-ZN).

References
1. Li, J.P., Gao, M., Pan, J.S., Chu, S.C.: A parallel compact cat swarm optimization and its application in DV-Hop node localization for wireless sensor network. Wireless Netw. 27(3), 2081–2101 (2021)
2. Li, J.P., Han, Q., Wang, W.T.: Characteristics analysis and suppression strategy of energy hole in wireless sensor networks. Ad Hoc Netw. 135, 1–12 (2022)
3. Simic, D.: A review of applications of fuzzy sets to safety and reliability engineering. J. Appl. Log. 24, 85–96 (2017)
4. Li, J.P., Liu, K., Wang, J.: Data query routing algorithm with cluster bridge for wireless sensor network. In: 2022 IEEE International Performance, Computing, and Communications Conference, 11–13 November 2022, pp. 313–318 (2022)
5. Liu, Y.: A review of fuzzy AHP methods for decision-making with subjective judgements. Expert Syst. Appl. 161, 35–45 (2020)
6. Forouli, A.: Assessment of demand side flexibility in European electricity markets: a country level review. Energies 14(8), 110–118 (2021)
7. Kadhem, A.A.: Computational techniques for assessing the reliability and sustainability of electrical power internet of things: a review. Renew. Sustain. Energy Rev. 80, 1175–1186 (2017)
8. Maziku, H.: Security risk assessment for SDN-enabled smart grids. Comput. Commun. 133, 1–11 (2019)
9. Chang, Z.: Safety risk assessment of electric power operation site based on variable precision rough set. J. Circuits Syst. Comput. 31(14) (2022)
10. Karuna, K., Manoha, C.S.: Structural analysis with alternative uncertainty models: from data to safety measures. Struct. Saf. 62, 116–127 (2016)
11. Ling, C.: Importance analysis of different components in a multicomponent system under fuzzy inputs. Struct. Multidisc. Optim. 65(3) (2022)
12. Mechri, W.: Uncertainties handling in safety system performance assessment by using fuzzy Bayesian networks. J. Intell. Fuzzy Syst. 33(2), 995–1006 (2017)
13. Pan, Y.: Improved fuzzy Bayesian network-based risk analysis with interval-valued fuzzy sets and D-S evidence theory. IEEE Trans. Fuzzy Syst. 28(9), 2063–2077 (2020)
14. Purba, J.H.: A fuzzy-based reliability approach to evaluate basic events of fault tree analysis for nuclear power plant probabilistic safety assessment. Ann. Nucl. Energy 70, 21–29 (2014)
15. Xia, S.: A new method for decision making problems with redundant and incomplete information based on incomplete soft sets: from crisp to fuzzy. Int. J. Appl. Math. Comput. Sci. 32(4), 657–669 (2022)
16. Zhou, X., Peng, T.: Application of multi-sensor fuzzy information fusion algorithm in industrial safety monitoring system. Saf. Sci. 122 (2020)

A Q-Learning Based Power-Aware Data Dissemination Protocol for Wireless Sensor Networks

Neng-Chung Wang1(B), Chao-Yang Lee2, Ming-Fong Tsai3, Hsu-Fen Fu1, and Wei-Jung Hsu1

1 Department of Computer Science and Information Engineering, National United University, Miaoli 360302, Taiwan [email protected]
2 Department of Computer Science and Information Engineering, National Yunlin University of Science and Technology, Douliou 640301, Taiwan
3 Department of Electronic Engineering, National United University, Miaoli 360302, Taiwan

Abstract. In order to improve the lifetime of wireless sensor networks (WSNs), this paper proposes a power-aware data dissemination protocol based on Q-learning, called PADD-QL. When looking for a data transmission path, the proposed scheme can select dissemination nodes along the vertical, horizontal, and diagonal directions to establish the path. In addition, the scheme can find an optimal data transmission path from the source node to the sink by using Q-learning. In terms of the number of rounds executed, the performance gains of PADD-QL over TTDD and PADD are approximately 66% and 35%, respectively. Simulation experiments confirm that the proposed solution can significantly improve the lifetime of WSNs. Keywords: data dissemination · Q-learning · power-aware · sink · wireless sensor network

1 Introduction

In recent years, Internet of Things (IoT) technologies and applications have developed vigorously. As the basic core of the IoT, wireless sensor networks (WSNs) have become increasingly important. A WSN consists of a large number of self-organizing sensor node devices that detect the occurrence of specific events within a sensing area; the devices automatically organize themselves into a multi-hop wireless network. Due to the rapid development of applications such as healthcare, vehicle tracking, home automation, and environmental monitoring [1, 2], research on WSNs has received extensive attention. In general, sensors have sensing functions, and their characteristics include small size, power saving, and low price [3]. Since advances in electronic technology can greatly reduce the size of sensors, there


are an increasing number of application products integrating sensor and wireless network technologies. Due to their low price and small size, sensors can be deployed across a wide range of sensing fields. In a WSN, sensor nodes collect the data of interest and then transmit it to the sink. A source node is a sensor node that detects the occurrence of an event and generates data to report the detected information, while a receiver node, also known as a sink, is a sensor node that collects data in the WSN. In some special environments, such as forests and oceans, large numbers of sensor nodes are deployed over vast areas. Since sensors have limited power, it is quite difficult to replace batteries and to maintain and manage nodes in a harsh or wide environment. Therefore, current research on WSNs focuses on designing energy-saving data transmission protocols to improve the lifetime of WSNs.

2 Related Work

The grid-based data transmission scheme is a cost-effective data transmission method. In a WSN, the grid-based approach mainly uses the location information of sensor nodes to construct a grid structure that assists data transmission. In TTDD [4], the source node divides the sensing field into a grid structure containing many cells. After the grid structure is constructed, each intersection on the grid is assigned a grid point, and the sensor node closest to the grid point becomes a dissemination node. Only sensors located at dissemination nodes need to handle forwarding information such as query messages and sensing data. The scheme finds the nearest dissemination node in the vertical or horizontal direction to establish the path and repeats the same process until the data transmission path reaches the sink.

In PADD [5], only a small number of dissemination nodes are responsible for forwarding messages and transmitting data. This scheme selects a suitable cell size such that a dissemination node can communicate directly with its eight neighboring dissemination nodes, so a packet can be sent not only to a dissemination node in the horizontal or vertical direction but also to one in the diagonal direction. When establishing the transmission path, the scheme selects the dissemination node with the highest residual energy, so the sensors' energy consumption can be evened out.

Machine learning collects a large amount of raw data, learns from and trains on the data, and selects a corresponding mathematical model; it then validates the classification results to determine whether the model is suitable for prediction or classification. Q-learning is a reinforcement learning algorithm [6]. Q-learning records the learned strategy and tells the agent which of all possible actions to take in different states to obtain the maximum reward value. Q-learning can be applied without special changes to reward functions with random factors, and it can find a strategy that maximizes the expected reward over all actions, yielding an optimal action-selection strategy. In Q-learning, the learning agent records the values (Q values) of all actions for each state to build a Q table. For each state, the learning agent chooses the action with the highest current value. After each action, the Q values


in the Q table will be updated until an optimal strategy can be obtained after multiple iterations of learning.

3 Q-Learning Based Power-Aware Data Dissemination

First, a grid structure is constructed. The sensor node that detects an event in the sensing field is called a source node; it divides the sensing field into a grid of cells of the same size. In addition, so that each node knows its own location, each sensor node has a global positioning system (GPS) device.

Second, the dissemination nodes are selected. After the grid structure is constructed, each intersection point on the grid is assigned a grid point. The scheme then selects the sensor node with the smallest distance to the grid point as the dissemination node, responsible for data storage and forwarding.

Third, the data transmission path is established. A Q-learning strategy is added to the data dissemination protocol to search for an energy-saving transmission path. The learning agent records the Q values of all actions for each state, i.e., all possible transmission paths taken by each dissemination node. When the current dissemination node selects a certain transmission path, the Q values in the Q table are updated according to the obtained reward value, and during data transmission the path of the current dissemination node can be read off from the Q values. In the process of Q-learning, the Q values are updated using Eq. (1):

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_t + \gamma \max_{A} Q(S_{t+1}, A) - Q(S_t, A_t) \right] \quad (1)$$

Here, $Q(S_t, A_t)$ is the expected value obtained by performing action $A_t$ in the current state $S_t$ and moving to state $S_{t+1}$; $R_t$ is the reward obtained by performing action $A_t$; $\alpha$ is the learning rate, $0 \le \alpha \le 1$; $\gamma$ is the discount rate, $0 \le \gamma \le 1$; and $\max_{A} Q(S_{t+1}, A)$ is the maximum Q value obtainable over all actions $A$ in state $S_{t+1}$. After the Q-learning process, an energy-saving data transmission path is found based on the Q values. The reward value takes into account two factors, residual energy and distance, and is obtained according to Eq. (2):

$$Reward = Energy / Distance \quad (2)$$

When the sink needs to query data, the query message is sent from the sink to its adjacent dissemination node, forwarded to the upstream dissemination nodes, and finally reaches the source node. After receiving the query message, the source node starts to establish an energy-saving data transmission path using Q-learning. When looking for the data transmission path, the proposed scheme can select dissemination nodes along the vertical, horizontal, and diagonal directions, repeating the same procedure until the dissemination node with the smallest distance to the sink is reached. In each learning iteration, the Q values are updated; after several iterations the Q values in the Q table converge, and an energy-saving data dissemination path is found based on them. The establishment of the data transmission path is then complete, as shown in Fig. 1. Finally, data transmission is conducted along the pre-established path, which completes the data transmission for this round.

Fig. 1. An example of data transmission path.
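A minimal sketch of one update step of Eq. (1) with the reward of Eq. (2); the grid-point states, the neighbor set, and the numeric values are hypothetical stand-ins for the dissemination-node topology:

```python
alpha, gamma = 0.9, 0.8   # learning and discount rates used in Sect. 4
Q = {}                    # Q[(state, action)] -> value; states are grid points

def q_update(state, action, reward, next_state, actions):
    """Eq. (1): Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    q = Q.get((state, action), 0.0)
    Q[(state, action)] = q + alpha * (reward + gamma * best_next - q)

# Hypothetical hop from grid point (0, 0) toward its diagonal neighbor (1, 1)
actions = [(0, 1), (1, 0), (1, 1)]      # vertical, horizontal, diagonal moves
energy, distance = 0.18, 141.4          # residual energy (J) and hop length (m)
q_update((0, 0), (1, 1), energy / distance, (1, 1), actions)  # Eq. (2) reward
print(Q[((0, 0), (1, 1))])
```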

4 Simulation Results

This study uses MATLAB for simulation. In the simulations, the first-order radio model is adopted as the energy model, and the proposed PADD-QL scheme is compared with TTDD and PADD. The simulation parameters are as follows: the network size is 400 m × 400 m, the transmission range is 150 m, the location of the sink is (300, 300), the initial power is 0.25 J/node, the cell size δ is 100 m, the number of nodes is 200, the packet size is 2000 bits, the learning rate α is 0.9, and the discount rate γ is 0.8.

We explore the effect of the node death rate on the number of executed rounds for TTDD, PADD, and PADD-QL. In Fig. 2, the number of rounds executed by PADD-QL is superior to that of TTDD and PADD. When the node death rate reaches 80%, the executed rounds of TTDD, PADD, and PADD-QL are 3052, 3741, and 5068, respectively. In this case, in terms of the number of executed rounds, the performance gains of PADD-QL over TTDD and PADD are approximately 66% and 35%, respectively. Since PADD-QL uses a Q-learning strategy to find energy-saving data transmission paths, it reduces the sensors' energy consumption, thereby prolonging the lifetime of WSNs.
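For reference, a sketch of the first-order radio model mentioned above; the paper does not list its radio constants, so E_ELEC and EPS_AMP below are common literature defaults, while the packet size (2000 bits) and cell size (100 m) follow the parameters in this section:

```python
E_ELEC = 50e-9     # TX/RX electronics energy per bit (J/bit), common default
EPS_AMP = 100e-12  # transmit amplifier energy (J/bit/m^2), common default

def tx_energy(k_bits, d_m):
    """First-order radio model: cost to transmit k bits over distance d."""
    return E_ELEC * k_bits + EPS_AMP * k_bits * d_m ** 2

def rx_energy(k_bits):
    """Cost to receive k bits."""
    return E_ELEC * k_bits

# One 2000-bit packet over a 100 m cell hop
print(tx_energy(2000, 100), rx_energy(2000))  # ~0.0021 J to send, 0.0001 J to receive
```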


Fig. 2. Percentage of node death versus number of rounds.

5 Conclusions

To improve the energy efficiency of wireless sensor networks, we propose a Q-learning based power-aware data dissemination protocol (PADD-QL). In the proposed scheme, the reward value takes into account two factors, residual energy and distance, so an energy-saving data transmission path is found by using Q-learning. Based on the simulation results, the performance of the proposed PADD-QL is better than that of TTDD and PADD.

Acknowledgment. This work was supported by the National United University of Taiwan under grant 112-NUUPRJ-01.

References
1. Akyildiz, I., Su, W., Sankarasubramanian, Y., Cayirci, E.: A survey on sensor networks. IEEE Commun. Mag. 40(8), 102–114 (2002)
2. Gupta, A., Gulati, T., Bindal, A.K.: WSN based IoT applications: a review. In: Proceedings of the 10th International Conference on Emerging Trends in Engineering and Technology - Signal and Information Processing, Nagpur, India, pp. 1–6, June 2022
3. Kanwar, V., Kumar, A.: Distance vector hop based range free localization in WSN using genetic algorithm. In: Proceedings of the 6th International Conference on Computing for Sustainable Global Development, New Delhi, India, pp. 724–728, March 2019
4. Ye, F., Haiyun, L., Jerry, C., Songwu, L., Zhang, L.: A two-tier data dissemination model for large-scale wireless sensor networks. In: Proceedings of the ACM International Conference on Mobile Computing and Networking, pp. 148–159, September 2002
5. Wang, N.-C., Chiang, Y.-K.: Power-aware data dissemination protocol for grid-based wireless sensor networks with mobile sinks. IET Commun. 5(18), 2684–2691 (2011)
6. Fukao, T., Sumitomo, T., Ineyama, N., Adachi, N.: Q-learning based on regularization theory to treat the continuous states and actions. In: Proceedings of Neural Networks of the IEEE World Congress on Computational Intelligence, vol. 2, pp. 1057–1062, May 1998

Application of Entropy – TOPSIS Method in Service Quality Assessment

Tien-Chin Wang1 and Thuy Thi Thu Nguyen2(B)

1 International Business Department, National Kaohsiung University of Science and Technology, Kaohsiung 80778, Taiwan [email protected]
2 Dong Nai Technology University, Dong Nai 810000, Bien Hoa, Vietnam [email protected]

Abstract. Services play a significant role in a country's economy because they provide added value to customers and businesses. Service quality is therefore one of the critical factors determining the survival of an organization or business in the market. This paper presents the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) combined with entropy weighting and its application to service quality selection. A hypothetical situation is applied to five service packages with ten quality evaluation criteria designed according to the PZB Service Quality Model, and the best service package is chosen. Based on this result, business managers can devise appropriate service development strategies for their customers. Keywords: Entropy-TOPSIS · Service quality · PZB Service Quality Model

1 Introduction

The development of services is an inevitable trend in the context of globalization. Services play a vital role in speeding up the movement of goods and currency in the economy, contributing significantly to economic growth across countries [1]. The manufacturing and service industries create links between manufacturing sectors, domestic regions, and the world, and service activities also help connect demand and supply in the market. In social life, the service industry creates new directions, career choices, and jobs for many workers; it helps change ways of thinking and creates many opportunities as well as challenges for service workers. The service industry also helps meet people's needs more quickly and easily, for example in travelling, shopping, and entertainment. Studying service quality theory and the characteristics of service quality shows that it is an extremely important factor determining the survival of an organization or enterprise in the market. For service industries such as hotels or restaurants, service quality is considered a key factor for competing in the market and earning customers' trust [2].


Factors affecting the quality of service affect customer satisfaction, and service quality concerns the provision of services by a business to its customers [3]. Therefore, there are many studies on evaluating and selecting service quality to satisfy customer needs, for example in hotel services, airports, health care, and education [4]. This paper introduces the TOPSIS method based on entropy weights to select service quality using ten evaluation criteria designed according to the PZB Service Quality Model. A hypothetical situation is applied to five packages of online banking services for individual customers, and the best service package is chosen. Based on this result, business managers can devise appropriate service development strategies for their customers.

2 TOPSIS Method

2.1 Definition

The TOPSIS ideal-score method of Hwang and Yoon (1981) has become a popular instrument for solving multi-criteria decision-making (MCDM) problems [5]. TOPSIS is an effective ranking method based on the distances to the positive and negative ideal solutions: the selected option should be farthest from the negative ideal solution and closest to the positive ideal solution [6]. Compared with MAIRCA, VIKOR, MABAC, and COPRAS, the TOPSIS method is considered advantageous because it is easy to understand, simple, and conceptually reasonable. Recent publications have paid much attention to the weighting of criteria in evaluation problems, and the entropy-based approach to objectively allocating weights is one of the best known [6]. In multi-criteria decision-making, the larger the entropy value of an attribute, the less information it carries and the smaller the weight assigned to it; such an attribute has less discriminating ability in the decision process [7].

2.2 Implementation Formula

• The general expression of the evaluation matrix. Suppose there are n objects and z evaluation criteria; the decision matrix is

$$B = [x_{ij}]_{n \times z} = \begin{bmatrix} x_{11} & x_{12} & \ldots & x_{1z} \\ x_{21} & x_{22} & \ldots & x_{2z} \\ \ldots & \ldots & \ldots & \ldots \\ x_{n1} & x_{n2} & \ldots & x_{nz} \end{bmatrix}, \quad i = 1, 2, \ldots, n; \ j = 1, 2, \ldots, z \quad (1)$$

where the rows correspond to the alternatives $Al_1, Al_2, \ldots, Al_n$, the columns correspond to the measurement criteria $T_1, T_2, \ldots, T_z$, and the crisp value $x_{ij}$ is the score of alternative $Al_i$ with respect to indicator $T_j$.

• Data normalization. The value $r_{ij}$ is calculated as:

$$r_{ij} = \frac{x_{ij}}{\sqrt{\sum_{i=1}^{n} x_{ij}^2}}, \quad j = 1, 2, \ldots, z \quad (2)$$

• Entropy value calculation. From the concept of entropy, the entropy value is expressed as:

$$E_j = -c \sum_{i=1}^{n} r_{ij} \ln r_{ij}, \quad j = 1, 2, \ldots, z \quad (3)$$

where the constant $c$ is taken as $1/\ln n$, consistent with the worked example in Sect. 4.

• Entropy weight calculation:

$$w_j = \frac{1 - E_j}{\sum_{j=1}^{z} (1 - E_j)}, \quad j = 1, 2, \ldots, z \quad (4)$$

• Constructing the standardized weighted matrix:

$$V = [v_{ij}]_{n \times z}, \quad v_{ij} = r_{ij} \cdot w_j, \quad i = 1, 2, \ldots, n; \ j = 1, 2, \ldots, z \quad (5)$$

• Finding the positive ideal solution (PIS) $V^+$ and the negative ideal solution (NIS) $V^-$:

$$V^+ = \left[ \max_i(v_{i1}), \max_i(v_{i2}), \ldots, \max_i(v_{iz}) \right] = (v_1^+, v_2^+, \ldots, v_z^+) \quad (6)$$

$$V^- = \left[ \min_i(v_{i1}), \min_i(v_{i2}), \ldots, \min_i(v_{iz}) \right] = (v_1^-, v_2^-, \ldots, v_z^-) \quad (7)$$

• Calculating the separation measures $D_i^+$ from the positive ideal and $D_i^-$ from the negative ideal:

$$D_i^+ = \sqrt{\sum_{j=1}^{z} (v_{ij} - v_j^+)^2}, \quad i = 1, 2, \ldots, n \quad (8)$$

$$D_i^- = \sqrt{\sum_{j=1}^{z} (v_{ij} - v_j^-)^2}, \quad i = 1, 2, \ldots, n \quad (9)$$

• Calculating the relative closeness coefficient $CC_i$ to the ideal solution:

$$CC_i = \frac{D_i^-}{D_i^+ + D_i^-}, \quad 0 \le CC_i \le 1 \quad (10)$$

• Ranking based on the calculated $CC_i$ results: the higher the $CC_i$, the higher the alternative is ranked.
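The whole procedure of Eqs. (1)–(10) condenses into a short numpy sketch; it uses the Table 2 decision matrix from Sect. 4 and, as in the worked example there, treats all ten criteria as benefit criteria:

```python
import numpy as np

def entropy_topsis(X):
    """Closeness coefficients CC_i by entropy-weighted TOPSIS, Eqs. (1)-(10)."""
    n = X.shape[0]
    R = X / np.sqrt((X ** 2).sum(axis=0))              # Eq. (2) normalization
    E = -(R * np.log(R)).sum(axis=0) / np.log(n)       # Eq. (3), c = 1/ln(n)
    w = (1 - E) / (1 - E).sum()                        # Eq. (4) entropy weights
    V = R * w                                          # Eq. (5) weighted matrix
    v_pos, v_neg = V.max(axis=0), V.min(axis=0)        # Eqs. (6)-(7): PIS, NIS
    d_pos = np.sqrt(((V - v_pos) ** 2).sum(axis=1))    # Eq. (8)
    d_neg = np.sqrt(((V - v_neg) ** 2).sum(axis=1))    # Eq. (9)
    return d_neg / (d_pos + d_neg)                     # Eq. (10)

# Decision matrix of Table 2: 5 service packages x 10 criteria
X = np.array([[86, 75, 73, 66, 73, 75.0, 86, 80, 78, 89],
              [83, 72, 75, 70, 75, 73.0, 83, 85, 89, 80],
              [81, 78, 76, 80, 76, 89.8, 81, 86, 67, 73],
              [80, 89, 77, 75, 77, 90.0, 80, 83, 75, 75],
              [85, 67, 78, 65, 78, 80.0, 85, 81, 72, 75]])
cc = entropy_topsis(X)
print(cc.round(3), "best: Ser%d" % (cc.argmax() + 1))
```

On this input the sketch reproduces the worked values in Sect. 4 (E1 ≈ 1.118, w1 ≈ 0.102) and the Table 8 ranking, with Ser4 first.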


3 Developing Criteria for Evaluating Service Quality

In this study, the service quality evaluation criteria are built according to the PZB service quality model of Parasuraman (1985). This model is a scale for measuring service quality from the customer's experience, also known as the five-gap service quality improvement model [8]. The PZB model is a well-known reputational risk assessment model for measuring service quality [9]. The reliability and accuracy of this scale are highly appreciated in many fields, such as hotels, banks, airlines, restaurants, schools, and hospitals, because the scale relies on customers' perceptions when using the service. The PZB service quality model originated from the research of Parasuraman, Zeithaml, and Berry in 1985 and includes ten components: Reliability, Responsiveness, Competence, Accessibility, Courtesy, Communication, Credibility, Security, Understanding/Knowing the Customer, and Tangibles [10] (Fig. 1).

Table 1. Definition of criteria according to Parasuraman et al.'s model [11]

| Factors | Definition | Criteria |
|---|---|---|
| Reliability level | Reliability is one of the biggest factors influencing customer psychology; here it refers to the accuracy of the service | Crt1 |
| Responsiveness | Measures the ability to solve customers' problems and complaints effectively and quickly | Crt2 |
| Service capacity | The professional ability of the staff to provide and perform the service | Crt3 |
| Accessibility | Reflected in customers' easy access to the service, quick access times, convenient service hours, and appropriate service locations | Crt4 |
| Courtesy | The business has a warm and respectful attitude toward customers | Crt5 |
| Communication | The degree to which service information is communicated to customers in the easiest way to understand | Crt6 |
| Credibility | The level of customers' trust in the company's brand and past service quality | Crt7 |
| Security | A guarantee of the security and privacy of customers' information | Crt8 |
| Customer understanding | The ability to continuously learn about, understand, and respond to the changing needs of customers | Crt9 |
| Tangible factors | Objective elements that customers can see when interacting with the service, such as staff uniforms, decoration, product and service layout, and premises | Crt10 |


The criteria are summarized in Table 1; each criterion uses a 100-point scale, and five service packages are assumed to have been surveyed. Managers want to choose the most suitable service package to satisfy customers' needs.

Fig. 1. Selecting service quality according to the 10 components of the PZB Service Quality Model

4 Application in Service Quality Selection

Suppose a bank offers online banking services for individual customers, including five service packages: the Standard, Platinum, Flexible, VIP, and Foreigners packages. These packages are surveyed based on customer satisfaction after use, and the bank then chooses which packages are suitable to continue offering. After surveying and evaluating a group of customers, the scores on the evaluation criteria are presented in Table 2.

• Present the original decision matrix.
• Normalize the research data. Normalizing raw data to eliminate different units of scale and to handle outliers is necessary for some statistical methods, especially when applying the TOPSIS method.

Table 2. The original decision matrix

| Service | Crt1 | Crt2 | Crt3 | Crt4 | Crt5 | Crt6 | Crt7 | Crt8 | Crt9 | Crt10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Ser1 | 86 | 75 | 73 | 66 | 73 | 75 | 86 | 80 | 78 | 89 |
| Ser2 | 83 | 72 | 75 | 70 | 75 | 73 | 83 | 85 | 89 | 80 |
| Ser3 | 81 | 78 | 76 | 80 | 76 | 89.8 | 81 | 86 | 67 | 73 |
| Ser4 | 80 | 89 | 77 | 75 | 77 | 90 | 80 | 83 | 75 | 75 |
| Ser5 | 85 | 67 | 78 | 65 | 78 | 80 | 85 | 81 | 72 | 75 |

The normalization matrix is presented in Table 3. From Eqs. (1) and (2), an example result is:

$$r_{11} = \frac{86}{\sqrt{86^2 + 83^2 + 81^2 + 80^2 + 85^2}} = \frac{86}{185.7} = 0.463$$

Table 3. The normalization matrix

| Service | Crt1 | Crt2 | Crt3 | Crt4 | Crt5 | Crt6 | Crt7 | Crt8 | Crt9 | Crt10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Ser1 | 0.463 | 0.438 | 0.431 | 0.413 | 0.431 | 0.41 | 0.463 | 0.431 | 0.456 | 0.506 |
| Ser2 | 0.447 | 0.421 | 0.442 | 0.438 | 0.442 | 0.399 | 0.447 | 0.458 | 0.52 | 0.455 |
| Ser3 | 0.436 | 0.456 | 0.448 | 0.501 | 0.448 | 0.49 | 0.436 | 0.463 | 0.391 | 0.415 |
| Ser4 | 0.431 | 0.52 | 0.454 | 0.47 | 0.454 | 0.492 | 0.431 | 0.447 | 0.438 | 0.427 |
| Ser5 | 0.458 | 0.391 | 0.46 | 0.407 | 0.46 | 0.437 | 0.458 | 0.436 | 0.421 | 0.427 |

• Entropy weight calculation. From Eqs. (3) and (4), the entropy weights are obtained (Table 4). For example:

$$E_1 = -\frac{1}{\ln 5}\left[ 0.463 \ln(0.463) + 0.447 \ln(0.447) + \cdots + 0.458 \ln(0.458) \right] = 1.118$$

$$w_1 = \frac{1 - E_1}{\sum_{j=1}^{z}(1 - E_j)} = \frac{1 - 1.118}{(1 - 1.118) + (1 - 1.113) + \cdots + (1 - 1.115)} = \frac{-0.118}{-1.15756} = 0.102$$

Table 4. The entropy weight of criteria

| Criterion | Crt1 | Crt2 | Crt3 | Crt4 | Crt5 | Crt6 | Crt7 | Crt8 | Crt9 | Crt10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Entropy weight | 0.102 | 0.098 | 0.102 | 0.099 | 0.102 | 0.098 | 0.102 | 0.102 | 0.098 | 0.099 |

• Constructing the normalized weighted matrix. From Eq. (5), the normalized weighted matrix is presented in Table 5; for example, $v_{11} = r_{11} \cdot w_1 = 0.463 \times 0.102 = 0.047$.

Table 5. The normalized weighted matrix

| Service | Crt1 | Crt2 | Crt3 | Crt4 | Crt5 | Crt6 | Crt7 | Crt8 | Crt9 | Crt10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Ser1 | 0.047 | 0.043 | 0.044 | 0.041 | 0.044 | 0.04 | 0.047 | 0.044 | 0.044 | 0.05 |
| Ser2 | 0.045 | 0.041 | 0.045 | 0.043 | 0.045 | 0.039 | 0.045 | 0.047 | 0.051 | 0.045 |
| Ser3 | 0.044 | 0.044 | 0.046 | 0.05 | 0.046 | 0.048 | 0.044 | 0.047 | 0.038 | 0.041 |
| Ser4 | 0.044 | 0.051 | 0.046 | 0.046 | 0.046 | 0.048 | 0.044 | 0.045 | 0.043 | 0.042 |
| Ser5 | 0.047 | 0.038 | 0.047 | 0.04 | 0.047 | 0.043 | 0.047 | 0.044 | 0.041 | 0.042 |

• Seeking the positive ideal solution (PIS) and the negative ideal solution (NIS). From Eqs. (6) and (7), $V^+$ and $V^-$ are presented in Table 6.

Table 6. PIS and NIS solutions

| Solution | Crt1 | Crt2 | Crt3 | Crt4 | Crt5 | Crt6 | Crt7 | Crt8 | Crt9 | Crt10 |
|---|---|---|---|---|---|---|---|---|---|---|
| V+ | 0.047 | 0.051 | 0.047 | 0.05 | 0.047 | 0.048 | 0.047 | 0.047 | 0.051 | 0.05 |
| V− | 0.044 | 0.038 | 0.044 | 0.04 | 0.044 | 0.039 | 0.044 | 0.044 | 0.038 | 0.041 |

• Calculating the positive-ideal and negative-ideal separations. From Eqs. (8) and (9):

$$D_1^+ = \sqrt{(0.047 - 0.047)^2 + (0.043 - 0.051)^2 + \cdots + (0.05 - 0.05)^2} = 0.016487$$

$$D_1^- = \sqrt{(0.047 - 0.044)^2 + (0.043 - 0.038)^2 + \cdots + (0.05 - 0.041)^2} = 0.012848$$

The results for all services are shown in Table 7.

Table 7. The distance to the ideal solution

| Solution | Service 1 | Service 2 | Service 3 | Service 4 | Service 5 |
|---|---|---|---|---|---|
| D+ | 0.01649 | 0.01592 | 0.01721 | 0.01269 | 0.02091 |
| D− | 0.01285 | 0.01437 | 0.01499 | 0.01775 | 0.00754 |

• Calculating the relative closeness coefficient $CC_i$ to the ideal solution. From Eq. (10), for example:

$$CC_1 = \frac{D_1^-}{D_1^+ + D_1^-} = \frac{0.01285}{0.01649 + 0.01285} = 0.438$$

Calculating the relative closeness coefficients for the remaining services in the same way gives the ranking of the choices (Table 8).

Table 8. Closeness coefficients and service ranking

| No | Service | CCi | Ranking |
|---|---|---|---|
| 1 | Ser1 | 0.438 | 4 |
| 2 | Ser2 | 0.474 | 2 |
| 3 | Ser3 | 0.466 | 3 |
| 4 | Ser4 | 0.583 | 1 |
| 5 | Ser5 | 0.265 | 5 |

Thus, the fourth service is the best choice, followed by the second, third, first, and fifth services. From these results, the bank's managers can choose suitable online banking service packages for individual customers: the VIP package is the best choice, followed by the Platinum, Flexible, Standard, and Foreigners packages. Managers can consider removing unsuitable service packages or improving those that have not yet achieved customer satisfaction.

5 Conclusion

In multi-criteria decision analysis problems, the TOPSIS method is suitable for limiting subjective factors when making decisions. This article introduced the TOPSIS method built on entropy weights and its application to service evaluation and selection with ten criteria designed according to the PZB Service Quality Model. A hypothetical situation was applied to five packages of an online personal banking service, and the best service package was selected, allowing a suitable customer service development strategy to be devised. This study can serve as a reference for practitioners in service quality management and for researchers concerned with service selection.


This study is based on a hypothetical situation with assumed service quality assessment results. Future studies will apply the method to a specific service industry and use practical questionnaires to obtain accurate quality assessments.

References
1. Huong, N.: What is service? The role of the service industry in the economy (2023). https://luatvietnam.vn/linh-vuc-khac/dich-vu-la-gi-883-94100-article.html
2. Nam, G.V.: What is service quality? Methods for assessing service quality (2022). https://chungnhanquocgia.com/chat-luong-dich-vu-la-gi-cac-phuong-phap-danh-gia-chat-luong-dich-vu/
3. Asubonteng, P., McCleary, K.J., Swan, J.E.: SERVQUAL revisited: a critical review of service quality. J. Serv. Mark. 10(6), 62–81 (1996)
4. Sureshchandar, G.S., Rajendran, C., Anantharaman, R.N.: The relationship between service quality and customer satisfaction – a factor specific approach. J. Serv. Mark. 16(4), 363–379 (2002). https://doi.org/10.1108/08876040210433248
5. Hwang, C.-L., Masud, A.S.M.: Multiple Objective Decision Making—Methods and Applications: A State-of-the-Art Survey, vol. 164. Springer Science & Business Media, Heidelberg (2012). https://doi.org/10.1007/978-3-642-45511-7
6. Mao, N., Song, M., Deng, S.: Application of TOPSIS method in evaluating the effects of supply vane angle of a task/ambient air conditioning system on energy utilization and thermal comfort. Appl. Energy 180, 536–545 (2016)
7. Shannon, C.E.: The mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1949)
8. Parasuraman, A., Berry, L.L., Zeithaml, V.A.: More on improving service quality measurement. J. Retail. 69(1), 140–147 (1993)
9. Simcic Brønn, P.: Adapting the PZB service quality model to reputation risk analysis and the implications for CSR communication. J. Commun. Manage. 16(1), 77–94 (2012). https://doi.org/10.1108/13632541211197978
10. Franceschini, F., Galetto, M., Turina, E.: Service quality monitoring by performance indicators: a proposal for a structured methodology. Int. J. Serv. Oper. Manage. 5(2), 251–273 (2009)
11. Parasuraman, A., Zeithaml, V.A., Berry, L.L.: A conceptual model of service quality and its implications for future research. J. Mark. 49(4), 41–50 (1985)

Machine Learning Algorithm-Based Prediction of Hyperglycemia Risk After Acute Ischemic Stroke

Yating Hao1, Xuan Zhang2(B), and Lihua Dai3(B)

1 School of Health Sciences and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
2 School of Information Science and Technology, Shanghai Sanda University, Shanghai 201209, China [email protected]
3 Emergency Department, Shidong Hospital of Shanghai Yangpu District, Shanghai 200438, China [email protected]

Abstract. The importance of early prediction of hyperglycemia after acute ischemic stroke in the intensive care unit (ICU) is addressed to explore the application of machine learning methods in disease prediction. This research focuses on predicting the risk of hyperglycemia using data on acute ischemic stroke patients from the MIMIC-IV (Medical Information Mart for Intensive Care) database. We construct and compare eight machine learning models: logistic regression, support vector machines, decision trees, random forests, gradient boosting, Bayesian classifiers, k-nearest neighbors, and XGBoost. Feature selection and parameter tuning were performed to improve model performance and to find the optimal prediction model for hyperglycemia after acute ischemic stroke. The support vector machine-based model had the best prediction accuracy, with accuracy, precision, recall, and F1 score reaching 97.84%, 97%, 98%, and 97%, respectively. The machine learning method has high accuracy in the early identification of hyperglycemia after acute ischemic stroke in the ICU and is expected to serve as an objective and effective decision-support tool for clinicians. Keywords: Machine Learning · Predictive Modeling · Acute Ischemic Stroke · Post-Stroke Hyperglycemia

1 Introduction

Post-stroke hyperglycemia is a common complication associated with poor prognosis and increased patient mortality and morbidity [1]. In stroke patients who develop hyperglycemia, timely and appropriate therapeutic measures can help improve prognosis and stroke recovery [2]. Therefore, the discovery and development of predictive models applicable to such problems has practical value.


The emergence of machine learning (ML) has provided viable tools for solving numerous real-world problems; it involves various techniques for tackling complex big-data problems by identifying correlations between variables [3]. ML is increasingly used in biomedical applications such as image segmentation for recognition and diagnosis, pharmacotherapy, and adverse event prediction. Few studies have used machine learning for the prediction of hyperglycemia in acute ischemic stroke, and existing methods have focused on traditional statistics; for example, using the Cox proportional hazards model, nondiabetic acute ischemic stroke patients who experienced a hyperglycemic event were found to have poorer clinical characteristics than patients without hyperglycemia [4]. Some scholars have applied machine learning to recurrent stroke outcomes: Hassan et al. modeled recurrent stroke on a public clinical dataset and compared multiple machine learning algorithms, showing that an ANN achieved the highest accuracy of 80.00% [5]. This demonstrates that machine learning methods can increase predictive effectiveness. To further the application of machine learning methods, this paper selects eight algorithms for experimentation. The main work includes: (1) building and comparing logistic regression (LR), support vector machines (SVM), decision trees (DT), random forests (RF), gradient boosting (GB), naïve Bayes (NB) classifiers, k-nearest neighbors (KNN), and XGBoost; (2) using principal component analysis (PCA) to find key feature sets beneficial to the prediction task; (3) using grid search for parameter tuning to improve the prediction performance of the models; and (4) expecting the predictive models to assist physicians in predicting and categorizing patients based on various metrics from multiple perspectives.

2 Related Work

Machine learning algorithms can be categorized as supervised learning, unsupervised learning, and deep learning [6]. Representative machine learning algorithms applied in medicine include KNN [7], NB [8], RF [9], DT [10], SVM [11], XGBoost [12], and LR [13]. LR is often used for binary classification; it estimates the probability of the possible output variable given specified input variables. SVM aims to find a hyperplane that divides the data into different classes while maximizing the distance to the classification boundary; the sample points closest to the boundary are the so-called support vectors. DT makes decisions based on a tree structure and can perform both classification and regression analysis. Shaikhina et al. identified the potential of a DT classification model for risk prediction in kidney transplantation, developing a DT model that predicts early transplant rejection with an accuracy of 85% [14]. RF is a basic classification method whose main idea is to arrive at a more stable and accurate model by constructing multiple decision trees and averaging their results. Shahadat Uddin et al. compared several machine learning algorithms for disease prediction and showed that the RF algorithm achieved relatively higher accuracy [15]. Gradient boosting trains the next weak learner on the residuals of the previous weak learner and obtains the final


prediction through weighted summation. Natekin et al. demonstrated applications of gradient boosting to a robot-arm myoelectric controller, electromyography (EMG) physical action classification, and text categorization, and the gradient boosting machine showed excellent results in terms of accuracy and interpretability [16]. Since NB makes conditional independence assumptions, features that are highly correlated with the classes but not with other features are expected to receive greater weight [17]. The basic idea of KNN is to compare a new sample with all samples in the training set and select the k nearest neighbors to form the prediction for the new sample based on similarity. Granata et al. used multiple machine learning models to predict clinical outcomes after resection of colorectal liver metastases, and KNN had the best predictive ability with 95% accuracy [18]. XGBoost is an ensemble learning algorithm based on decision trees; it integrates multiple decision trees into a powerful model in which each tree is a prediction of the target variable. In predicting the occurrence of acute myocardial infarction, XGBoost showed near-perfect accuracy and more precise classification of disease risk [19]. Across all studies, no single model is best in all situations, so it is challenging to select the right model for a particular clinical problem. Badriyah et al. classified stroke disease by collecting CT image data from stroke patients and applying eight machine learning methods; RF yielded the highest accuracy of 95.97%, along with high precision (94.39%), recall (96.12%), and F1 score (95.39%) [20]. In addition, choosing methods to optimize model performance, such as key feature extraction, is also crucial. PCA is a commonly used dimensionality reduction and feature extraction method; it linearly transforms the original high-dimensional features into a low-dimensional space that retains as much of the data's variance as possible, preserving key information and removing redundant features [21]. The dimensionality of the data affects machine learning [22], so research on feature selection methods has received more and more attention. In view of these works, eight machine learning models are used in this research to predict the risk of hyperglycemia after acute ischemic stroke in ICU patients, with principal component analysis and parameter tuning performed and compared to obtain the best model.

3 Approach

3.1 Basic Modules

The whole experimental process includes data extraction, feature selection, data cleaning and transformation, machine learning model building, and model performance evaluation; Fig. 1 illustrates the operational flow. The basic steps for constructing the models are: (1) data extraction — collecting datasets containing the target variable and multiple feature variables from MIMIC-IV, together with data preprocessing; (2) feature selection — principal component analysis is chosen to screen key feature information; (3) data partitioning — dividing the original feature set into a training


set and a test set in a ratio of 8:2; (4) model training — the model is constructed using the training set; (5) model evaluation — the model is tested on the test set, and the prediction accuracy, precision, recall, and F1 value are calculated to evaluate performance; and (6) model optimization — the parameters are adjusted for optimization.

Fig. 1. Technical Flowchart: the operational flow of the entire experiment.

3.2 Feature Extraction

Principal component analysis (PCA) was used to extract key information and reduce the dimensionality of the data [23]. The first step in PCA is to calculate the covariance matrix of the raw data; covariance measures the degree of association between two variables. The sample data form a matrix $X \in \mathbb{R}^{n \times p}$, where $n$ is the number of samples and $p$ the number of features. First, the data are standardized by subtracting from each column its mean $\bar{x}_j$ and dividing by the column's standard deviation $s_j$, giving $X'$:

$$x'_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j} \quad (1)$$

Next, the covariance matrix $C_{p \times p}$ is derived from Eq. (1):

$$C = \frac{1}{n-1} X'^{T} X' \quad (2)$$

The covariance matrix is then subjected to eigenvalue decomposition (EVD), which expresses it in terms of eigenvectors and eigenvalues. The eigenvectors $P_{p \times p}$ represent the orientation of the original data in the new feature space, and the eigenvalues $\Lambda_{p \times p}$ indicate the importance of each eigenvector: the magnitude of an eigenvalue gives the variance of the data along the corresponding eigenvector. That is,

$$C = P \Lambda P^{-1} \quad (3)$$

Finally, the principal components are selected according to the size of the eigenvalues: choosing the top $k$ ranked eigenvectors $P_k$, the data reconstructed with the first $k$ principal components are obtained as

$$X_k = X' P_k \quad (4)$$

where $X_k \in \mathbb{R}^{n \times k}$ is the reconstructed data matrix.
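A sketch of Eqs. (1)–(4) with numpy; the mini-dataset is hypothetical, and in practice the same reduction can be obtained with sklearn.decomposition.PCA applied to standardized data:

```python
import numpy as np

def pca_reduce(X, k):
    """Project standardized data onto the top-k principal components."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # Eq. (1) standardization
    C = (Xs.T @ Xs) / (X.shape[0] - 1)                 # Eq. (2) covariance
    eigvals, P = np.linalg.eigh(C)                     # Eq. (3) EVD (C symmetric)
    order = np.argsort(eigvals)[::-1]                  # rank by explained variance
    return Xs @ P[:, order[:k]]                        # Eq. (4) reconstruction

# Hypothetical mini-dataset: 6 patients x 4 laboratory features
X = np.array([[72., 1.2, 140., 5.6], [65., 0.9, 138., 7.1],
              [80., 1.5, 141., 6.3], [55., 1.1, 137., 9.8],
              [68., 1.3, 142., 6.0], [77., 0.8, 139., 8.2]])
print(pca_reduce(X, k=2).shape)  # (6, 2)
```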

4 Experiment 4.1 Experiment Setting Data Sets. Datasets were obtained from the publicly available the medical information mart for intensive care (mimic) - iv database, which contains extensive clinical data covering tens of thousands of patients at beth israel deaconess medical center in the boston area of the united states between 2008 and 2019 [24]. After obtaining permission to use the data, we extracted the data of the study subjects in this paper from the mimic-iv

Machine Learning Algorithm-Based Prediction

443

database, and included 6506 study subjects according to the following criteria: (1) age greater than 18 years; (2) stroke patients in icu; and (3) only the records of the patients who were first admitted to the icu were selected. Each patient contained 13 attributes, including age, gender, measurement of total bilirubin and direct bilirubin, temperature, glycated hemoglobin, creatinine, sodium, potassium, hemoglobin, lactate, platelet and glucose. Data preprocessing and Parameter Settings. For features and category labels, data preprocessing work is required, which may include operations such as data cleaning, feature selection, feature scaling, etc., to ensure the quality and usability of the data. “glucose” is a label attribute used to determine whether hyperglycemic events occurred in patients with acute ischemic stroke, which is divided into two categories, records with glucose higher than normal and others with no hyperglycemia occurred. The method of filling the missing values with “0” is adopted, and the obtained data are used for PCA later, which in turn is used as an input dataset to train the model and validate the model. The dataset is divided into training set and testing set in the ratio of 8:2. In addition, parameter tuning is required. The performance of the learned models tends to differ significantly with different parameter configurations. The regularization parameter C is set to [0.1, 1, 10]. The kernel function parameter is set to [‘linear’, ‘rbf’]. The maximum tree depth parameter is set to [None, 5, 10]. The number of counts is set to [100, 200, 300]. The minimum number of samples for node splitting is set to [2, 5, 10], and the number of neighbors k parameter is set to [3, 5, 7], and the learning rate parameter is set to [0.01, 0.1, 1.0].

4.2 Model Evaluation and Performance Comparison

The above 13 attributes form the feature set of the training dataset, with the "glucose" attribute as the prediction target for the constructed models. After the tuning described above, the models were tested on the stroke dataset to obtain four evaluation metrics: accuracy, precision, recall, and F1 score, each taking values in the range [0, 1]. Table 1 shows the performance metrics of the eight models on this feature set. Overall, the prediction accuracy of all eight models is high, with the SVM model achieving the highest accuracy at 97.84%. Compared with the other models, SVM performed better at classifying whether a hyperglycemic event occurred after acute ischemic stroke. However, SVM did not have the best precision, recall, or F1 score, reaching 97% precision, 98% recall, and a 97% F1 score; NB achieved the best precision, while GB and XGBoost achieved the best recall. This difference may be related to the size of the dataset, the quality of the data, the selection of feature attributes, or limitations of the algorithms themselves. In our study, missing values were filled with "0", which may lead to a high proportion of zeros; if the sample categories in the dataset are imbalanced, a model may tend to predict the majority category.
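In sklearn, the four metrics can be computed as below; this sketch reuses the hypothetical `grid`, `X_test`, and `y_test` names from the earlier pipeline sketch.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Evaluate a fitted model (here the tuned SVM) on the held-out 20% test split.
y_pred = grid.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall:", recall_score(y_test, y_pred))
print("F1:", f1_score(y_test, y_pred))
```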


Table 1. Performance comparison of 8 ML models.

ML       Accuracy  Precision  Recall  F1-score
SVM      0.9784    0.97       0.98    0.97
XGBoost  0.9777    0.98       1       0.99
RF       0.9769    0.98       0.99    0.99
GB       0.9769    0.98       1       0.99
KNN      0.9761    0.99       0.99    0.99
LR       0.9754    0.98       0.99    0.99
DT       0.9746    0.98       0.99    0.99
NB       0.9516    1          0.95    0.97

5 Conclusion

For the prediction of hyperglycemic events after acute ischemic stroke, eight machine learning algorithms, namely LR, SVM, DT, RF, GB, NB, KNN, and XGBoost, were compared. The relevant data of 6506 patients were extracted from the MIMIC-IV database to build predictive classification models, with principal component analysis used for feature selection and grid search for parameter tuning. The support vector machine-based model had the best accuracy in predicting whether hyperglycemia occurs after stroke: its accuracy, precision, recall, and F1 score reached 97.84%, 97%, 98%, and 97%, respectively. A limitation of this paper is that the dataset and feature set are small; adding more informative features or feature transformations may improve overall model performance, and dataset balancing approaches could also be tried. Only eight supervised learning algorithms were used to predict post-stroke hyperglycemia; follow-up research can apply further intelligent algorithms to obtain models with superior predictive ability. With the feature attributes selected in this paper, the machine learning approach achieves high accuracy in the early identification of hyperglycemia after acute ischemic stroke in the ICU, and has the potential to serve as an objective and effective reference tool to assist clinicians in decision-making.

Acknowledgement. This work is sponsored by the 2023 Research Foundation of Shanghai Sanda University (No. 2023YB23), the 2023 Key Course Construction Foundation of Shanghai Sanda University (No. A020201.23.060), and the 2021 Key Educational Reform Foundation of Shanghai Sanda University (No. A020203.21.010).


References

1. Gutiérrez-Zúñiga, R., Alonso de Leciñana, M., Delgado-Mederos, R., et al.: Beyond hyperglycemia: glycaemic variability as a prognostic factor after acute ischemic stroke. Neurología (English Edition) 38, 150–158 (2023)
2. Kim, J.-T., Lee, J.S., Kim, B.J., et al.: Admission hyperglycemia, stroke subtypes, outcomes in acute ischemic stroke. Diabetes Res. Clin. Pract. 196, 110257 (2023)
3. Ngiam, K.Y., Khor, I.W.: Big data and machine learning algorithms for health-care delivery. Lancet Oncol. 20, e262–e273 (2019)
4. Shih, H.-S., Wang, W.-S., Yang, L.-Y., et al.: The role of nondiabetic hyperglycemia in critically ill patients with acute ischemic stroke. JCM 11, 5116 (2022)
5. Hassan, F.H., Omar, M.A.: Recurrent stroke prediction using machine learning algorithms with clinical public datasets: an empirical performance evaluation. Baghdad Sci. J. 18, 1406 (2021)
6. Sarker, I.H.: Machine learning: algorithms, real-world applications and research directions. SN Comput. Sci. 2(3), 1–21 (2021). https://doi.org/10.1007/s42979-021-00592-x
7. Guo, G., Wang, H., Bell, D., et al.: KNN model-based approach in classification. In: Meersman, R., Tari, Z., Schmidt, D.C. (eds.) On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE, pp. 986–996. Springer, Berlin, Heidelberg (2003)
8. Rish, I.: An empirical study of the naive Bayes classifier. In: IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence (2001)
9. Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)
10. Myles, A.J., Feudale, R.N., Liu, Y., et al.: An introduction to decision tree modeling. J. Chemom. 18, 275–285 (2004)
11. Dietrich, R., Opper, M., Sompolinsky, H.: Statistical mechanics of support vector networks. http://arxiv.org/abs/cond-mat/9811421 (1999)
12. Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 785–794. ACM, San Francisco, CA (2016)
13. Application of logistic regression models to assess household financial decisions regarding debt. Procedia Comput. Sci. 176, 3418–3427 (2020)
14. Shaikhina, T., Lowe, D., Daga, S., et al.: Decision tree and random forest models for outcome prediction in antibody incompatible kidney transplantation. Biomed. Signal Process. Control 52, 456–462 (2019)
15. Uddin, S., Khan, A., Hossain, M.E., et al.: Comparing different supervised machine learning algorithms for disease prediction. BMC Med. Inform. Decis. Mak. 19, 281 (2019)
16. Natekin, A., Knoll, A.: Gradient boosting machines, a tutorial. Front. Neurorobot. 7, 21 (2013)
17. Jiang, L., Zhang, L., Li, C., et al.: A correlation-based feature weighting filter for naive Bayes. IEEE Trans. Knowl. Data Eng. 31, 201–213 (2019)
18. Granata, V., Fusco, R., De Muzio, F., et al.: Contrast MR-based radiomics and machine learning analysis to assess clinical outcomes following liver resection in colorectal liver metastases: a preliminary study. Cancers 14, 1110 (2022)
19. Khera, R., Haimovich, J., Hurley, N.C., et al.: Use of machine learning models to predict death after acute myocardial infarction. JAMA Cardiol. 6, 633–641 (2021)
20. Badriyah, T., Sakinah, N., Syarif, I., et al.: Machine learning algorithm for stroke disease classification. In: 2020 International Conference on Electrical, Communication, and Computer Engineering (ICECCE), pp. 1–5 (2020)
21. Principal components analysis (PCA). Comput. Geosci. 19, 303–342 (1993)


22. Gong, Y., Liu, G., Xue, Y., et al.: A survey on dataset quality in machine learning. Inf. Softw. Technol. 162, 107268 (2023)
23. Li, L., Liu, S., Peng, Y., et al.: Overview of principal component analysis algorithm. Optik 127, 3935–3944 (2016)
24. Johnson, A., Bulgarelli, L., Pollard, T., et al.: MIMIC-IV. https://physionet.org/content/mimiciv/2.2/

Implementation of Campus Pedestrian Detection Using YOLOv5

Yuh-Chung Lin1(B), Ta-Wen Kuan1, Shih-Pang Tseng1,2, and Xinhang Lv1

1 School of Information Science and Technology, Sanda University, Shanghai 201209, China
[email protected]
2 School of Software and Big Data, Changzhou College of Information Technology, Changzhou 213164, China

Abstract. This study focuses on the development of accurate human detection using the YOLO open-source deep learning framework, built on OpenCV, with the aim of implementing campus pedestrian detection. YOLO is a detection algorithm that has evolved rapidly and gained popularity in recent years. A campus pedestrian detection system can improve students' safety and contribute to a safe, pleasant campus. This paper begins with a requirement analysis, categorizing campus image information into pedestrians, traffic signs, fences, cones, fire hydrants, speed bumps, manhole covers, etc. A YOLOv5 dataset is created for training the model. The training process includes adjusting the model based on the training behavior and employing data augmentation techniques such as mosaic enhancement, flipping, and grayscale conversion to balance the richness of data labels and improve the recognition performance of the model. The experimental goal is to achieve a recognition rate of 70% to 80%. Overall, this research proposes a project design for campus pedestrian detection using the YOLO framework. By exploiting YOLO's speed and efficiency, the project aims to enhance pedestrian safety and contribute to a safe campus environment.

Keywords: YOLO · Object Detection · Deep Learning · Data Augmentation

1 Introduction

The safety of pedestrians on campus has always been a major concern, especially on large campuses where pedestrians and heavy vehicles are not separated. In addition to students walking around the campus, heavy motor vehicles also drive within it. Therefore, how to enhance the safety of students on campus is an important aspect that a school should address when building a comprehensive and safe learning environment. Artificial intelligence has been applied in many fields, and image recognition technology has made significant progress. To enhance campus safety, we can therefore leverage AI technology to understand the real-time flow of people and vehicles on campus, guide the paths of moving vehicles, and alert drivers. To achieve this goal, we


first focus on the issue of pedestrian detection. Only with fast and accurate pedestrian detection technology can we provide a solid foundation for building a campus safety monitoring system. This article mainly uses the YOLO [1] open-source deep learning framework, built on OpenCV, to design and develop more accurate person recognition. The YOLO framework is fast and efficient, so its object detection algorithm is well suited to real-time campus pedestrian detection. It can be applied to vehicles or unmanned self-driving vehicles to detect pedestrians on the road through visual image recognition and alert drivers or unmanned vehicle control systems, improving campus pedestrian safety. Although YOLO is a general-purpose algorithm with good detection performance, the official dataset may not be applicable to all situations. Therefore, to better achieve the goal of campus pedestrian detection, this article uses images taken in actual campus environments as the training dataset for the model. Additionally, improvements were made to the YOLOv5 model, including data augmentation and network structure improvements. Experimental results show that the improved model and dataset yield better recognition performance for the targets and some improvement for small-object prediction, while still maintaining real-time detection efficiency.

2 Related Works

Object detection is an important task in visual recognition. Its history can be divided into two periods according to whether CNNs are used: the traditional object detection period and the deep learning-based detection period. The Viola-Jones detector [2, 3] and Histogram of Oriented Gradients (HOG) [4] are traditional object detection models built on hand-crafted feature extractors. In 2012, AlexNet won the ImageNet Large Scale Visual Recognition Challenge and inspired the development of deep learning-based detection models. Deep learning-based models are further divided into one-stage and two-stage models. One-stage detectors, such as YOLO, SSD [5], RetinaNet [6], SqueezeNet [7], CornerNet [8], and EfficientDet [9], use predefined boxes of various scales and aspect ratios to localize objects in a single shot. As the name suggests, two-stage detectors, such as RCNN [10], Fast-RCNN [11], FPN [12], etc., divide

Fig. 1. The roadmap of object detection models.


the object detection task into two stages. In the first stage, candidate object regions in an image are found; these objects are then classified and localized in the second stage. Therefore, in terms of accuracy, two-stage detection models outperform one-stage models; on the other hand, in terms of speed, one-stage models have a clear advantage. Figure 1 shows the roadmap of the development of object detection models.

3 YOLO Model

YOLO is a convolutional deep neural network that, compared with traditional neural networks, better extracts basic features while reducing the number of parameters. In this section, the YOLOv5 network structure is analyzed, the detection process is introduced, and the evaluation metrics of the model are described. There are many versions of YOLO; we use YOLOv5 to implement pedestrian detection. YOLOv5 is a single-stage object detection algorithm based on YOLOv4; through a series of improvements, its speed and accuracy have both increased. The structure of YOLOv5 is introduced below. Figure 2 shows the network architecture of the YOLOv5s model. The red, green, and blue boxes in the top left corner represent the image input, which is fixed at 640 × 640 × 3 in YOLOv5. The leftmost column is the backbone of the YOLOv5 network, which extracts features from the input image. The three red parts on the far right are the output parts of YOLO, also known as the heads, which compute the network's output. The two middle columns are the neck, added between backbone and head in recent years to fuse features of different scales; typically, necks consist of feature pyramids or Path Aggregation Networks (PAN) [13]. YOLOv5 uses PAN, which adds a top-down feature fusion path on top of the original bottom-up feature pyramid. In the figure, 640 × 640 × 3 represents an RGB image with a size of 640 × 640 pixels. In modules with four parameters, such as Focus, CBL, and YOLO, the parameters represent the input channel number, output channel number, kernel size, and stride, respectively. Modules with two parameters, such as C3, list the input and output channel numbers. Modules with a single parameter, such as SPP, Concat, and Upsample, list the number of output channels. The pink module in the figure represents the basic convolutional unit CBL in YOLOv5. C denotes the convolutional layer, computed as described in the previous section. B denotes batch normalization (BN), which normalizes the outputs of all convolutional layers in the same batch to zero mean and unit variance; this regularizes the network and helps it reach a good optimum faster, reducing training time. L denotes the LeakyReLU activation function, which introduces non-linearity to enhance the expressive power of the network. In the YOLOv5 network, a standalone CBL module is mainly used to downsample feature maps. The light green module in the figure represents the Focus module, whose structure is shown in Fig. 3. The Focus layer samples every other pixel of the input image, cutting it into four slices that are concatenated along the channel dimension. This is


equivalent to downsampling the image without any information loss while reducing the number of network parameters.
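As a concrete reading of the CBL unit and the Focus slicing just described, a minimal PyTorch sketch follows. This is our own illustration of the described structure, not the official YOLOv5 code; the LeakyReLU slope is an assumption.

```python
import torch
import torch.nn as nn

class CBL(nn.Module):
    """Conv + BatchNorm + LeakyReLU: the basic convolutional unit described above."""
    def __init__(self, in_ch, out_ch, kernel_size, stride):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride,
                              padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)  # zero-mean, unit-variance normalization
        self.act = nn.LeakyReLU(0.1)      # the "L" non-linearity (slope assumed)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Focus(nn.Module):
    """Sample every other pixel into four half-resolution slices, concatenate
    them on the channel axis (lossless 2x downsampling), then convolve."""
    def __init__(self, in_ch, out_ch, kernel_size=1, stride=1):
        super().__init__()
        self.conv = CBL(in_ch * 4, out_ch, kernel_size, stride)

    def forward(self, x):
        # (B, C, H, W) -> (B, 4C, H/2, W/2)
        return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                                    x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1))

# Example with the 640 x 640 x 3 input of the YOLOv5s diagram.
out = Focus(3, 32, kernel_size=3)(torch.randn(1, 3, 640, 640))
print(out.shape)  # torch.Size([1, 32, 320, 320])
```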

Fig. 2. The network architecture of the YOLOv5s model.

Fig. 3. The structure of the Focus module.

The light blue module in Fig. 2 is the C3 module, which contains the deep red residual units. The residual block, introduced by residual networks, adds the block's input to its output (a skip connection), overcoming the degradation problem in which traditional convolutional neural networks become harder to train as depth increases. Residual structures also mitigate vanishing and exploding gradients in deep neural networks. The parameters of C3, written as (128, 128) × 3, indicate


Fig. 4. The structure of SPP layer.

that the input channel number of the C3 structure is 128, the output channel number is 128, and the C3 module contains three residual units. Likewise, (64, 64) indicates that the input and output channel numbers of the C3 module are 64, with only one residual unit. C3 is YOLOv5's refinement of the Cross-Stage Partial connections (CSP) structure. The CSP module splits the input feature map into two parts for computation, letting gradients propagate along different paths and yielding more gradient combinations. Additionally, because only half of the feature maps in the CSP structure enter the residual units, the computational cost of the network is significantly reduced. The C3 module has one convolutional layer fewer than the CSP module in YOLOv4, making it less computationally demanding. The light green module in Fig. 2 also represents the Spatial Pyramid Pooling (SPP) layer, whose structure is illustrated in Fig. 4, where the green components represent max pooling layers. The SPP layer pools the input with max pooling layers of different sizes and strides and concatenates the results along the channel dimension, thereby enlarging the network's receptive field. The YOLO detection procedure is as follows (a sketch of step 4 is given after the list):

1. The input image is divided into an S × S grid.
2. For each grid cell, B bounding boxes and a confidence score for each box are predicted.
3. Each bounding box predicts C class probabilities.
4. Non-maximum suppression (NMS) is performed based on the confidence scores.
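As an illustration of step 4, a minimal greedy NMS for one class might look as follows. This is a sketch, not YOLOv5's optimized implementation; boxes are assumed to be in corner (x1, y1, x2, y2) format.

```python
import torch

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy non-maximum suppression (step 4 above).
    boxes: (N, 4) tensor as (x1, y1, x2, y2); scores: (N,) confidences.
    Returns the indices of the kept boxes."""
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        i = order[0].item()
        keep.append(i)
        if order.numel() == 1:
            break
        rest = order[1:]
        # Intersection of the current top box with the remaining boxes.
        x1 = torch.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = torch.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = torch.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = torch.minimum(boxes[i, 3], boxes[rest, 3])
        inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]  # drop heavily overlapping boxes
    return keep
```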

4 The Implementation and Results

Although YOLO is a general algorithm with good detection performance, the official datasets may not suit every situation. Therefore, to better achieve the research goal of campus pedestrian detection, some improvements to the dataset are necessary. In this section, we first analyze the dataset and the YOLO model, and then propose several improvement measures. Before training the YOLO deep learning network model, it is necessary to conduct exploratory analysis of the dataset and observe the characteristics


of most of the data in the dataset. With a preliminary but sufficient understanding of the dataset, it is possible to determine how to improve it in the following steps. The campus pedestrian dataset used in this study was collected from campus streets. The street view images contain a large amount of campus pedestrian data, which was manually labeled and converted into YOLO annotation files to complete the initial dataset construction. A YOLOv5 dataset consists of the annotated images and label files, txt files containing the class of each marked object and the xy coordinates of its bounding box. The two file sets correspond one-to-one, so during training the computer knows the exact location of each object category it needs to learn. After organization, the dataset contains a total of 830 images, with 547 used for training, 188 for validation, and 95 for testing. The detection categories include pedestrians, manhole covers, fire hydrants, cones, guardrails, and traffic signs. There are a total of 4751 instances of pedestrians, 414 instances of guardrails, 343 instances of traffic signs, 242 instances of manhole covers, 170 instances of cones, 78 instances of speed bumps, and 11 instances of fire hydrants. Clearly there is a severe data imbalance, with significantly more pedestrian instances than any other category. To address this, data augmentation is applied to the dataset before training. With the path aggregation network in the YOLOv5 algorithm, the detection performance for small objects has greatly improved compared with YOLOv1-YOLOv4; however, overall detection performance is still not ideal, especially for small objects. The algorithm has difficulty distinguishing between manhole covers and road markings, which may be due to the uneven distribution of data in the dataset and poor image angles for manhole covers and road markings, leading to insufficient feature extraction. Additionally, the YOLOv5 model has difficulty accurately locating some densely packed small objects, producing localization errors, and can also mistake the background for an object or merge multiple objects into a single detection. There are two ways to address data imbalance: one is to add an advanced loss function such as focal loss to the training loss, and the other is to apply data augmentation to the training data before training. This article adopts data augmentation, since modifying the loss function is comparatively complex. Data augmentation is a pre-processing stage applied to images before they are input into the network: while building the dataset, minor changes are made to input images, such as random erasing, color and saturation adjustment, and grayscale conversion, so that the deep neural network perceives each variant as new material. In this experiment, mosaic augmentation, grayscale conversion, and automatic histogram contrast were used. The basic principle of mosaic augmentation is to randomly compress, flip, color-shift, place, and stitch together four random images.
This approach not only effectively expands the amount of data for object detection but also enriches the backgrounds of detected targets in the dataset, reducing the likelihood of misidentifying a target because of its background environment; a minimal sketch of the image-side operation follows.
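The sketch below shows only the image side of mosaic augmentation; the corresponding YOLO txt labels would need matching coordinate offsets, and the fixed tile layout is an illustrative simplification of the random placement described above.

```python
import random
import numpy as np
import cv2

def mosaic(images, out_size=640):
    """Stitch four images into one mosaic sample (image side only; label
    coordinate shifts are omitted, and tiles use a fixed 2x2 layout)."""
    assert len(images) == 4
    half = out_size // 2
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    for img, (y, x) in zip(images, [(0, 0), (0, half), (half, 0), (half, half)]):
        if random.random() < 0.5:                     # random horizontal flip
            img = np.ascontiguousarray(img[:, ::-1])
        canvas[y:y + half, x:x + half] = cv2.resize(img, (half, half))
    return canvas
```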


One possible reason for the suboptimal performance of YOLOv5 in detecting small objects and classifying large objects is insufficient feature extraction for small objects. One improvement strategy is to modify the backbone of YOLOv5 by adding an additional convolutional layer for downsampling, so that the convolutional layers can capture more basic features. The drawback is that the increase in the number of network parameters may slightly reduce computational speed; although speed is not always the most important factor, a balance must be struck between speed and accuracy. The YOLOv5 model detects small objects mainly in the feature map produced by 8× downsampling, which means instances smaller than 8 × 8 pixels are difficult to detect. There are 16,560 instances in the dataset smaller than 8 × 8, and only 1,200 instances smaller than 4 × 4. Therefore, using a 4× downsampling feature map for detecting small objects can cover the whole dataset. In addition, the performance of YOLOv5 in locating densely packed small objects is not optimal. To address this, a feature fusion improvement is proposed: the path aggregation network in YOLOv5 is adjusted to take five feature maps at 4×, 8×, 16×, 32×, and 64× downsampling and fuse them, detecting large, medium, small, and very small objects hierarchically. This improvement also trades speed for accuracy. After processing the dataset, the model can be trained. As CPU-only deep learning is inefficient and yields poor results, this project configured a CUDA environment locally and used a GPU for training. Given the hardware environment of an NVIDIA 3070 graphics card, the batch size was set to 4 to reduce memory consumption and prevent memory overflow. The project did not use backbone freezing to avoid GPU memory overflow. Freezing training is a concept widely used in transfer learning and commonly applied to object detection tasks; although freezing can greatly accelerate training and save GPU memory, it may slightly reduce final accuracy. In a reference run, YOLOv5s pre-trained on COCO was used with the backbone frozen and trained on VOC for 70 epochs, reaching an AP50 of 89.03%, 0.21% lower than without freezing. The detection results are shown in Fig. 5; a sketch of the freezing mechanism appears after the figure.

Fig. 5. The detection results.
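Freezing amounts to disabling gradients for the chosen layers. The sketch below uses the torch.hub entry point published by Ultralytics for YOLOv5; the assumption that the backbone occupies parameter groups named "model.0" through "model.9" reflects YOLOv5's usual checkpoint layout and should be verified against the actual model.

```python
import torch

# Load a COCO-pretrained YOLOv5s via Ultralytics' torch.hub entry point,
# then freeze the backbone by disabling gradients for its layers.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
for name, param in model.named_parameters():
    # Backbone layers are assumed to be named "model.0." ... "model.9.".
    if any(f"model.{i}." in name for i in range(10)):
        param.requires_grad = False  # frozen: excluded from gradient updates
```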

Figure 6 shows the loss results over the 300 training epochs. Train/box_loss: the mean GIoU loss; as the x and y values approach 0, the loss becomes smaller, indicating


Fig. 6. The loss results during the 300-epoch training.

better detection performance. From Fig. 6, we can see that the central values of x and y are around 0.06, indicating good training results. Train/obj_loss: the mean loss for object detection and recognition; the training value is relatively high, indicating a direction for future optimization. Train/cls_loss: the mean loss for model classification; the closer the value is to 0, the more accurate the classification of the target. The value is around 0.01, indicating that the trained model classifies targets accurately.

Fig. 7. The loss results during the 300-epoch validation.

Figure 7 shows the loss results over the 300 validation epochs. Val/box_loss: compared with the training curve, the loss settles around 0.04 with a smooth curve, indicating excellent training results. Val/obj_loss: the curve fluctuates strongly, indicating poor performance that needs further optimization. Val/cls_loss: compared with the training set, it is closer to 0, around 0.005, indicating excellent performance.

5 Conclusion

This paper presents the implementation of campus pedestrian detection using YOLOv5. Through experiments we found that the imbalance between the number of pedestrian images and the other image categories causes interference and affects discrimination. To reduce the impact of an insufficient dataset, mosaic and grayscale data augmentation were used to improve and expand it. We also found that YOLO's recognition results degrade when many small objects are present; when the dataset is not rich enough, recognition rates suffer, so we used mosaic augmentation, grayscale conversion, mirror flipping, and other methods to increase the diversity of the data and to


improve the recognition performance. An additional down-sampling layer is used to obtain more features. Furthermore, for the detection of small targets, the feature fusion part of the model is adjusted, and 4× down-sampling is used to improve small-target detection.

Acknowledgements. This research is supported by the school research funding, Sanda University (research project no. A020203.23.004.04).

References

1. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788. IEEE, Las Vegas, NV, USA (2016)
2. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. I-I. IEEE, Kauai, HI, USA (2001)
3. Viola, P., Jones, M.J.: Robust real-time face detection. Int. J. Comput. Vision 57(2), 137–154 (2004). https://doi.org/10.1023/B:VISI.0000013087.49260.fb
4. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), vol. 1, pp. 886–893. San Diego, CA, USA (2005)
5. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision – ECCV 2016, Part I, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
6. Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell. 42, 318–327 (2018)
7. Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., Keutzer, K.: SqueezeNet: AlexNet-level accuracy with 50× fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360 (2016)

2.3 Diffusion Operator (DO)

In this stage, the two groups are considered as two regions with a notable disparity in concentration between them, denoted i and j, respectively. The diffusion direction of molecules is governed by $T_{DO}^t$, computed as in Eq. (5), and the diffusion process is outlined in Eq. (6), where $X_m^t$ denotes the position of molecule m and rand is a random number in the interval (0, 1).

$$T_{DO}^t = C_2 \times TF^t - rand, \quad C_2 = 2 \qquad (5)$$

$$X_m^t \text{ diffuses } \begin{cases} \text{from region } i \text{ to region } j, & T_{DO}^t < rand \\ \text{from region } j \text{ to region } i, & \text{otherwise} \end{cases} \qquad (6)$$

Initially, we discuss the scenario where region i exhibits a higher concentration than region j. Certain molecules within region i diffuse into region j, while the others remain within region i and are updated accordingly. The number of molecules diffusing from region i to region j is:

$$NT_{ij} \approx N_i \times rand \times (C_3 - C_4) + N_i \times C_4, \quad C_3 = 0.2, \; C_4 = 0.1 \qquad (7)$$

where $N_i$ represents the total number of molecules in region i. Therefore, the number of molecules remaining in region i is:

$$NR_i \approx N_i - NT_{ij} \qquad (8)$$


The position-update formula for molecules diffusing from region i to region j is:

$$X_{ij,m}^{t+1} = X_{j,best}^t + DF_{ij}^t \times rand \times DOF^t \times (J_{ij,m}^t \times X_{j,best}^t - X_{ij,m}^t) + G \times X_{ij,m}^t \qquad (9)$$

where $X_{ij,m}^t$ signifies the position of molecule m moving from region i to region j, $X_{j,best}^t$ represents the optimal position in region j, $DF_{ij}^t$ is the direction factor taking a random value of either -1 or 1, and $DOF^t$ stands for the time-varying flow direction, calculated as:

$$DOF^t = e^{-C_5 \times (TF^t - rand)}, \quad C_5 = 2 \qquad (10)$$

Additionally, $J_{ij,m}^t$ stands for the diffusion flux, calculated as:

$$J_{ij,m}^t = -D \frac{dC_{ij}^t}{dx_{ij,m}^t} \qquad (11)$$

where D is the diffusion coefficient with a value of 0.01, $dC_{ij}^t / dx_{ij,m}^t$ represents the concentration gradient, and $dC_{ij}^t$ and $dx_{ij,m}^t$ are computed as:

$$dC_{ij}^t = X_{j,mean}^t - X_{i,mean}^t \qquad (12)$$

$$dx_{ij,m}^t = (X_{j,best}^t)^2 - (X_{ij,m}^t)^2 + eps \qquad (13)$$

where $X_{i,mean}^t$ and $X_{j,mean}^t$ respectively denote the average positions of molecules in regions i and j. Furthermore, G is the Gaussian mutation value computed via Eq. (2). There are three distinct ways of updating the positions of the molecules remaining in region i: updating toward the optimal position within region i, updating using both the optimal position within region i and the problem boundaries, or no change. These are performed via:

$$X_{i,m}^{t+1} = \begin{cases} X_{i,best}^t & 0 < rand < 0.8 \\ X_{i,best}^t + DOF^t \times (L + rand \times (U - L)) & 0.8 \le rand < 0.9 \\ X_{i,m}^t & \text{otherwise} \end{cases} \qquad (14)$$

where $X_{i,best}^t$ signifies the optimal position within region i, and U and L stand for the upper and lower bounds of the problem. Molecules within region j move within that region, with the update formula:

$$X_{j,m}^{t+1} = X_{j,best}^t + DOF^t \times (L + rand \times (U - L)) \qquad (15)$$

In the case where region j exhibits a higher concentration than region i, the process above is executed with i and j interchanged.
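For concreteness, a minimal NumPy sketch of the direction test of Eqs. (5)-(6) and the i-to-j move of Eqs. (9)-(10) follows. The flux J is taken as a precomputed input, and the Gaussian mutation term G of Eq. (2) is approximated here by a plain normal draw with an assumed scale sigma.

```python
import numpy as np

rng = np.random.default_rng()

def moves_i_to_j(TF, C2=2.0):
    """Direction test of Eqs. (5)-(6): diffuse from region i to j if T_DO < rand."""
    T_DO = C2 * TF - rng.random()              # Eq. (5)
    return T_DO < rng.random()

def diffusion_move(X_ij, X_j_best, TF, J, C5=2.0, sigma=0.1):
    """Region-i -> region-j position update of Eq. (9).
    X_ij, X_j_best: (dim,) arrays; TF: transfer factor TF^t;
    J: diffusion flux of Eq. (11), precomputed; sigma: assumed mutation scale."""
    DF = rng.choice([-1.0, 1.0])               # direction factor DF_ij^t
    DOF = np.exp(-C5 * (TF - rng.random()))    # Eq. (10)
    G = rng.normal(0.0, sigma)                 # Gaussian mutation term (Eq. (2))
    return (X_j_best + DF * rng.random() * DOF * (J * X_j_best - X_ij)
            + G * X_ij)                        # Eq. (9)
```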

2.4 Equilibrium Operator (EO)

In this stage, the updating targets are groups $g_1$ and $g_2$; we first address group $g_1$. The update formula for molecules within group $g_1$ is:

$$X_{g_1,m}^{t+1} = X_{g_1,best}^t + Q_{g_1,m}^t \times X_{g_1,m}^t + Q_{g_1,m}^t \times (MS_{g_1,m}^t \times X_{g_1,best}^t - X_{g_1,m}^t) + G \times X_{g_1,m}^t \qquad (16)$$

where $X_{g_1,m}^t$ signifies the position of molecule m within group $g_1$, $X_{g_1,best}^t$ represents the optimal position within group $g_1$, and $Q_{g_1,m}^t$ denotes the relative quantity of region for group $g_1$, calculated as:

$$Q_{g_1,m}^t = rand \times DF_{g_1}^t \times DRF_{g_1,m}^t \qquad (17)$$

where $DRF_{g_1,m}^t$ represents the diffusion rate factor within group $g_1$:

$$DRF_{g_1,m}^t = e^{-J_{g_1,m}^t / TF^t} \qquad (18)$$

The flux $J_{g_1,m}^t$ is calculated as:

$$J_{g_1,m}^t = -D \frac{dC_{g_1}^t}{dx_{g_1,m}^t} \qquad (19)$$

$$dC_{g_1}^t = X_{g_1,best}^t - X_{g_1,mean}^t \qquad (20)$$

$$dx_{g_1,m}^t = (X_{g_1,best}^t)^2 - (X_{g_1,m}^t)^2 + eps \qquad (21)$$

Additionally, $MS_{g_1,m}^t$ denotes the step size of movement within group $g_1$:

$$MS_{g_1,m}^t = e^{-\frac{FS_{g_1,best}^t}{FS_{g_1,m}^t + eps}} \qquad (22)$$

where $FS_{g_1,best}^t$ denotes the best fitness value within group $g_1$ and $FS_{g_1,m}^t$ the fitness value of molecule m within group $g_1$. The update formula for group $g_2$ is obtained by replacing $g_1$ with $g_2$ throughout.

2.5 Steady-State Operator (SSO)

In this stage, the update formula for molecule m within group $g_1$ is:

$$X_{g_1,m}^{t+1} = X_{best}^t + Q_{g_1,m}^t \times X_{g_1,m}^t + Q_{g_1,m}^t \times (MS_{g_1,m}^t \times X_{best}^t - X_{g_1,m}^t) + G \times X_{g_1,m}^t \qquad (23)$$

In Eq. (23), $X_{best}^t$ represents the global best position, and $Q_{g_1,m}^t$ is calculated using Eqs. (17), (18), and (19), where the formulas for $dC_{g_1}^t$ and $dx_{g_1,m}^t$ are modified to:

$$dC_{g_1}^t = X_{g_1,mean}^t - X_{best}^t \qquad (24)$$

$$dx_{g_1,m}^t = \sqrt{(X_{best}^t)^2 - (X_{g_1,m}^t)^2 + eps} \qquad (25)$$

Furthermore, the calculation of $MS_{g_1,m}^t$ is modified to:

$$MS_{g_1,m}^t = e^{-\frac{FS_{best}^t}{FS_{g_1,m}^t + eps}} \qquad (26)$$

The update formula for group $g_2$ is obtained by replacing $g_1$ with $g_2$ throughout.
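A compact sketch wiring Eqs. (17)-(19) and (23)-(26) together for the steady-state case follows. The absolute value guarding the square root of Eq. (25) against negative arguments is an implementation choice of ours, as is the Gaussian mutation scale sigma.

```python
import numpy as np

rng = np.random.default_rng()
eps = np.finfo(float).eps

def sso_update(X, X_best, X_mean, FS, FS_best, TF, D=0.01, sigma=0.1):
    """Steady-state update of Eq. (23) for one molecule of group g1.
    X: (dim,) position; X_best: global best position; X_mean: group mean;
    FS, FS_best: fitness of the molecule and of the global best; TF: TF^t."""
    dC = X_mean - X_best                           # Eq. (24)
    dx = np.sqrt(np.abs(X_best**2 - X**2) + eps)   # Eq. (25); abs() keeps it real
    J = -D * dC / dx                               # flux, Eq. (19) form
    DRF = np.exp(-J / TF)                          # Eq. (18)
    Q = rng.random() * rng.choice([-1.0, 1.0]) * DRF   # Eq. (17)
    MS = np.exp(-FS_best / (FS + eps))             # Eq. (26)
    G = rng.normal(0.0, sigma)                     # Gaussian mutation term
    return X_best + Q * X + Q * (MS * X_best - X) + G * X  # Eq. (23)
```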

3 Experimental Design and Results

In this section, to validate the performance of the proposed GM-FLA, we tested the algorithm on the benchmark functions from CEC2013 and compared GM-FLA with several well-known meta-heuristic algorithms and the original FLA. The comparative algorithms are:

– Sine Cosine Algorithm (SCA) [8]
– Whale Optimization Algorithm (WOA) [4]
– Fick's Law Algorithm (FLA) [21]

For a fair comparison, we set the population size of all algorithms to 50, the iteration count to 1000, the lower search bound to -100, and the upper search bound to 100. The remaining parameter settings are presented in Table 1.

Table 1: Parameter settings.

Algorithm  Parameters
SCA        a = 2, r1 decreases linearly from a to 0
WOA        a decreases linearly from 2 to 0, a2 decreases linearly from −1 to −2, b = 1
FLA        C1 = 0.5, C2 = 2, C3 = 0.1, C4 = 0.2, C5 = 2, D = 0.01

Furthermore, to mitigate anomalies, we executed each algorithm 50 times and selected the best fitness value of the 50 runs for comparison. We then conducted comparative experiments between GM-FLA and the three comparison algorithms on the 28 benchmark functions from CEC2013 in 10 and 50 dimensions.

3.1 Results for 10D

The results of GM-FLA and the three comparative algorithms on the 28 benchmark functions in 10 dimensions are presented in Table 2. In this table, the symbol "+" indicates that GM-FLA outperforms the respective algorithm, the symbol "≈" signifies that GM-FLA exhibits comparable performance to the respective algorithm, and the symbol "−" denotes that GM-FLA performs less favorably than the respective algorithm.


Table 2: Performance of Four Algorithms on CEC2013 Benchmark Functions in 10 Dimensions.

Function  SCA       WOA       FLA       GM-FLA
F1        1.70E+02  4.65E-02  3.87E-03  1.25E-03
F2        1.69E+06  2.95E+05  2.29E+05  7.26E+04
F3        2.08E+08  1.97E+07  1.31E+06  5.56E+04
F4        2.78E+03  7.49E+03  4.57E+03  4.29E+03
F5        6.33E+01  2.31E+00  1.35E-02  8.23E-03
F6        2.10E+01  2.76E-01  2.32E-01  2.48E-03
F7        3.48E+01  3.45E+01  1.18E+01  1.31E+01
F8        2.03E+01  2.02E+01  2.03E+01  2.02E+01
F9        6.24E+00  4.45E+00  3.14E+00  3.04E+00
F10       3.26E+01  2.89E+00  1.07E+00  8.78E-01
F11       4.43E+01  1.17E+01  3.19E-03  3.07E-03
F12       4.60E+01  2.71E+01  1.01E+01  3.05E+00
F13       3.97E+01  2.50E+01  1.10E+01  1.00E+01
F14       9.48E+02  4.18E+02  5.34E-01  2.47E-01
F15       1.09E+03  5.60E+02  5.55E+02  3.94E+02
F16       7.99E-01  3.83E-01  3.75E-01  2.92E-01
F17       4.60E+01  3.57E+01  4.19E-01  1.29E-01
F18       5.32E+01  3.89E+01  2.57E+01  2.38E+01
F19       5.89E+00  1.88E+00  1.97E-01  4.64E-02
F20       2.92E+00  3.22E+00  2.45E+00  2.05E+00
F21       4.04E+02  1.04E+02  1.02E+02  1.01E+02
F22       1.08E+03  6.33E+02  2.24E+00  4.42E-01
F23       1.27E+03  9.42E+02  6.70E+02  1.31E+02
F24       2.21E+02  2.18E+02  2.11E+02  1.18E+02
F25       2.20E+02  2.17E+02  1.31E+02  1.25E+02
F26       1.56E+02  1.31E+02  1.14E+02  1.12E+02
F27       5.22E+02  4.01E+02  3.62E+02  3.33E+02
F28       6.21E+02  3.11E+02  1.01E+02  1.01E+02
+/≈/−     27/0/1    28/0/0    26/1/1    –

In Table 2, it is evident that GM-FLA exhibits superiority over SCA on functions F1, F2, F3, F5, F6, F7, F8, F9, F10, F11, F12, F13, F14, F15, F16, F17, F18, F19, F20, F21, F22, F23, F24, F25, F26, F27, and F28; GM-FLA demonstrates an advantage over WOA across all functions; and GM-FLA surpasses FLA on functions F1, F2, F3, F4, F5, F6, F8, F9, F10, F11, F12, F13, F14, F15, F16,


F17, F18, F19, F20, F21, F22, F23, F24, F25, F26, and F27, with similar performance on function F28. In conclusion, we contend that GM-FLA demonstrates favorable performance in 10 dimensions.

3.2 Results for 50D

The results of GM-FLA and the three comparative algorithms on the 28 benchmark functions in 50 dimensions are presented in Table 3.

Table 3: Performance of Four Algorithms on CEC2013 Benchmark Functions in 50 Dimensions.

Function  SCA       WOA       FLA       GM-FLA
F1        2.69E+04  8.32E+02  1.83E+01  1.73E+01
F2        3.33E+08  8.81E+07  1.72E+07  2.20E+07
F3        7.20E+10  4.02E+10  2.24E+09  1.61E+09
F4        6.94E+04  6.82E+04  4.39E+04  4.42E+04
F5        2.64E+03  7.62E+02  5.75E+00  5.09E+00
F6        1.80E+03  2.69E+02  4.48E+01  4.09E+01
F7        1.82E+02  2.02E+02  1.10E+02  9.46E+01
F8        2.11E+01  2.11E+01  2.11E+01  2.10E+01
F9        7.06E+01  6.53E+01  4.81E+01  4.39E+01
F10       3.33E+03  5.33E+02  3.19E+01  3.78E+01
F11       6.64E+02  7.11E+02  5.82E+01  5.67E+01
F12       6.71E+02  8.02E+02  2.67E+02  2.48E+02
F13       7.26E+02  8.26E+02  3.54E+02  3.74E+02
F14       1.28E+04  8.15E+03  1.01E+03  8.67E+02
F15       1.37E+04  1.04E+04  7.96E+03  7.40E+03
F16       3.00E+00  1.94E+00  2.10E+00  1.73E+00
F17       9.33E+02  1.00E+03  1.20E+02  1.08E+02
F18       9.03E+02  9.89E+02  5.06E+02  4.89E+02
F19       2.88E+04  2.20E+02  1.07E+01  9.82E+00
F20       2.36E+01  2.45E+01  2.17E+01  2.14E+01
F21       3.79E+03  9.10E+02  2.83E+02  2.81E+02
F22       1.40E+04  1.08E+04  1.92E+03  1.48E+03
F23       1.43E+04  1.16E+04  8.19E+03  8.83E+03
F24       4.16E+02  3.90E+02  3.38E+02  3.36E+02
F25       4.42E+02  4.01E+02  3.66E+02  3.59E+02
F26       2.66E+02  4.70E+02  2.03E+02  2.02E+02
F27       2.29E+03  2.09E+03  1.64E+03  1.44E+03
F28       5.00E+03  5.33E+03  4.33E+02  4.29E+02
+/≈/−     28/0/0    28/0/0    23/1/4    –

Fig. 1: Convergence curves of GM-FLA and the comparison algorithms (SCA, WOA, FLA) on some functions of CEC2013: (a) F3, (b) F9, (c) F12, (d) F13, (e) F15, (f) F16, (g) F21, (h) F24. Horizontal axis: iteration; vertical axis: fitness value.


Table 3 shows that GM-FLA demonstrates an advantage over SCA across all functions and over WOA across all functions; GM-FLA surpasses FLA on functions F1, F3, F5, F6, F7, F8, F9, F11, F12, F14, F15, F16, F17, F18, F19, F20, F21, F22, F24, F25, F26, F27, and F28, with similar performance on function F4. In conclusion, we contend that GM-FLA demonstrates favorable performance in 50 dimensions.

3.3 Convergence Curve

The partial convergence curves of GM-FLA and the three comparison algorithms in 50 dimensions are shown in Fig. 1. These curves illustrate how the fitness values vary with the number of iterations during the optimization process of each algorithm; the horizontal axis represents the number of iterations and the vertical axis the fitness value. The curves show significant fluctuation of the fitness values in the early iterations; as the number of iterations increases, the curves gradually converge toward the optimal solution. On these curves, GM-FLA exhibits superior convergence performance compared with SCA, WOA, and FLA.

4 Conclusions

In this paper, we have proposed a meta-heuristic algorithm called the Fick's law algorithm with Gaussian mutation (GM-FLA). To mitigate the tendency of the Fick's Law Algorithm to become trapped in local optima, we enhanced it with Gaussian mutation. We then conducted comparative experiments between GM-FLA and three other optimization algorithms on the CEC2013 benchmark functions. The experimental results demonstrate significant advantages of GM-FLA over the other algorithms in 10 and 50 dimensions. Furthermore, the comparison of convergence curves shows that GM-FLA exhibits superior convergence performance. We therefore conclude that the proposed GM-FLA effectively avoids falling into local optima and possesses the capability to escape from them.

References

1. Squires, M., Tao, X., Elangovan, S., Gururajan, R., Zhou, X., Acharya, U.R.: A novel genetic algorithm based system for the scheduling of medical treatments. Expert Syst. Appl. 195, 116464 (2022)
2. Song, Y., et al.: Dynamic hybrid mechanism-based differential evolution algorithm and its application. Expert Syst. Appl. 213, 118834 (2023)
3. Poli, R., Kennedy, J., Blackwell, T.: Particle swarm optimization: an overview. Swarm Intell. 1, 33–57 (2007)
4. Mirjalili, S., Lewis, A.: The whale optimization algorithm. Adv. Eng. Softw. 95, 51–67 (2016)


5. Atashpaz-Gargari, E., Lucas, C.: Imperialist competitive algorithm: an algorithm for optimization inspired by imperialistic competition. In: 2007 IEEE Congress on Evolutionary Computation, pp. 4661–4667. IEEE (2007)
6. Rao, R.V., Savsani, V.J., Vakharia, D.P.: Teaching–learning-based optimization: a novel method for constrained mechanical design optimization problems. Comput.-Aided Des. 43(3), 303–315 (2011)
7. Hatamlou, A.: Black hole: a new heuristic optimization approach for data clustering. Inf. Sci. 222, 175–184 (2013)
8. Mirjalili, S.: SCA: a sine cosine algorithm for solving optimization problems. Knowl.-Based Syst. 96, 120–133 (2016)
9. Wu, T.-Y., Li, H., Chu, S.-C.: CPPE: an improved phasmatodea population evolution algorithm with chaotic maps. Mathematics 11(9), 1977 (2023)
10. Wu, T.-Y., Shao, A., Pan, J.-S.: CTOA: toward a chaotic-based tumbleweed optimization algorithm. Mathematics 11(10), 2339 (2023)
11. Chen, C.-M., Lv, S., Ning, J., Wu, J.M.-T.: A genetic algorithm for the waitable time-varying multi-depot green vehicle routing problem. Symmetry 15(1), 124 (2023)
12. Wang, Y.-J., Chen, M.-C., Ku, C.S.: An improved Archimedes optimization algorithm (IAOA). J. Netw. Intell. 8(3), 693–709 (2023)
13. Tan, W., Lv, Q., Chengcai, J., Yikun, H.: Knee solution-driven, decomposition-based multi-objective particle swarm optimization for ontology meta-matching. J. Netw. Intell. 8(3), 965–990 (2023)
14. Phulara Shaik, A.L.H., Manoharan, M.K., Pani, A.K., Avala, R.R., Chen, C.-M.: Gaussian mutation–spider monkey optimization (GM-SMO) model for remote sensing scene classification. Remote Sens. 14(24), 6279 (2022)
15. Kang, L., Chen, R.-S., Xiong, N., Chen, Y.-C., Yu-Xi, H., Chen, C.-M.: Selecting hyper-parameters of Gaussian process regression based on non-inertial particle swarm optimization in internet of things. IEEE Access 7, 59504–59513 (2019)
16. Zhang, X., et al.: Gaussian mutational chaotic fruit fly-built optimization and feature selection. Expert Syst. Appl. 141, 112976 (2020)
17. Xiong, G., Zhang, J., Shi, D., Zhu, L., Yuan, X., Tan, Z.: Winner-leading competitive swarm optimizer with dynamic Gaussian mutation for parameter extraction of solar photovoltaic models. Energy Convers. Manage. 206, 112450 (2020)
18. Song, S., et al.: Dimension decided Harris hawks optimization with Gaussian mutation: balance analysis and diversity patterns. Knowl.-Based Syst. 215, 106425 (2021)
19. Rajesh, P., Shajin, F.H.: Optimal allocation of EV charging spots and capacitors in distribution network improving voltage and power loss by quantum-behaved and Gaussian mutational dragonfly algorithm (QGDA). Electr. Power Syst. Res. 194, 107049 (2021)
20. Zhou, W., Wang, P., Heidari, A.A., Zhao, X., Chen, H.: Spiral Gaussian mutation sine cosine algorithm: framework and comprehensive performance optimization. Expert Syst. Appl. 209, 118372 (2022)
21. Hashim, F.A., Mostafa, R.R., Hussien, A.G., Mirjalili, S., Sallam, K.M.: Fick's law algorithm: a physical law-based algorithm for numerical optimization. Knowl.-Based Syst. 260, 110146 (2023)

Barnacle Growth Algorithm (BGA): A New Bio-Inspired Metaheuristic Algorithm for Solving Optimization Problems

Ankang Shao1, Shu-Chuan Chu1, Yeh-Cheng Chen2, and Tsu-Yang Wu1,3(B)

1 College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
{201703204419,scchu0803}@sdust.edu.cn, [email protected], [email protected]
2 Department of Computer Science, University of California, Davis, CA, USA
3 School of Artificial Intelligence (School of Future Technology), Nanjing University of Information Science and Technology, Nanjing 210044, China

Abstract. Metaheuristic algorithms are an important area of artificial intelligence research and a popular method for solving complex optimization problems. In this paper, we propose a new metaheuristic algorithm called the Barnacle Growth Algorithm (BGA). BGA simulates three stages of barnacle growth, namely the planktonic stage, the exploratory stage, and the mature stage; in each stage the particles exhibit different behaviors and parameters. In the first stage, BGA finds as many local optimal solutions as possible in the solution space. In the second stage, it searches for the global optimal solution in their vicinity. In the third stage, BGA re-explores the surroundings of some search individuals to ensure that they do not fall into local optima. The convergence speed of BGA is tested experimentally on the CEC 2013 benchmark functions. The results show that BGA has better convergence speed and search ability on unimodal and multimodal functions than optimization algorithms of the same type. Applying the BGA algorithm to the optimization of a city power transmission network verifies that it can be applied to real problems; the experimental results also show that the transmission route generated by BGA achieves the lowest power loss.

Keywords: Metaheuristic algorithm · Barnacle Growth Algorithm · Convergence speed · Power transmission · Optimization problems

1 Introduction

An optimization problem is to obtain the optimal solution under certain constraints. Traditional optimization methods mainly include the exhaustive method


and the mathematical strategy method [6]. However, with the development of society and growing demand, the complexity of the problems people need to optimize increases exponentially [5,30]. Traditional optimization algorithms struggle to obtain satisfactory results within a reasonable time, so research on efficient intelligent optimization algorithms has attracted the attention of researchers [2]. Metaheuristic algorithms are able to solve large-scale and complex optimization problems [11,29]. They draw inspiration from nature, including various biological and physical phenomena, and can be roughly divided into evolutionary algorithms, swarm intelligence algorithms, and physics-based algorithms [20]. These methods have been used to solve many optimization problems, such as feature selection [22,25], scheduling strategies [1,9], hyperparameter optimization [23,28,33], constrained problems [12,16,21], and so on. In this paper, a new metaheuristic algorithm named the Barnacle Growth Algorithm is proposed. Barnacles are ancient creatures dating back to the Jurassic period that can swim freely in seawater when they are born [14,26]. As barnacles grow, they secrete an adhesive that keeps them firmly attached to surfaces. The Barnacle Growth Algorithm imitates the three phases of the barnacle growth cycle, namely the planktonic phase, the exploration phase, and the maturation phase, designing a different position-update formula from the barnacles' behavior in each phase. To verify the performance of the proposed algorithm, we conduct a performance evaluation using CEC2013, selecting four popular optimization algorithms for comparison: PSO [18], DE [24], GOA [19], and MFO [32]. Experimental results show that, compared with other optimization algorithms of the same type, BGA has better convergence speed and search ability on unimodal and multimodal functions. Furthermore, we apply BGA to solve the urban power transmission problem, taking the sequence of each power transmission network as a feasible solution; after several iterations, the path with the minimum loss is selected. The experimental results show that the power transmission scheme generated by BGA achieves the lowest power loss. The remainder of this paper is organized as follows. Section 2 explains the inspiration and process of the Barnacle Growth Algorithm. Section 3 describes the experimental analysis and a comparative study with existing methods. An application of the Barnacle Growth Algorithm to power transmission is given in Sect. 4. Section 5 concludes the paper.

2 Proposed Optimization Algorithm

In this section, BGA is presented as a new metaheuristic optimization algorithm, together with its inspiration and algorithmic structure.

2.1 Inspiration

The barnacle is a widespread arthropod that can live in almost any sea; the species adapts so strongly that it attaches to the skin of sea turtles and whales and even to the bottoms of fishing boats. A barnacle's life is roughly divided into three stages: the naupliar larva (planktonic stage), the cyprid larva (exploratory stage), and the adult barnacle (mature stage), as shown in Fig. 1.

Fig. 1. The three stages of barnacles: naupliar larvae, cyprid larvae, and adult barnacle.

The planktonic stage reflects the fact that barnacle larvae move very weakly, drifting mainly with the sea currents. However, to increase the size of the population, they secrete a pheromone that attracts more larvae to swim toward it. During the exploration stage, a barnacle larva selects a landing position and relies on the glia secreted by its antennae to fix its body to the surface of the chosen site [3]. At this stage, larvae also weigh factors such as humidity, temperature, and salinity when adjusting their position; however, as their stored energy runs out, their selectivity toward fixing sites decreases. This fixing process is irreversible, and the barnacle enters "residence" [7,10]. The mature stage is the third stage of barnacle growth. Once fixed, the barnacle develops feeding organs that capture food from the seawater, providing energy for reproduction. During reproduction, eggs are released into the seawater, and these eggs cycle through the three stages again.

2.2 Algorithm Structure

Based on the three stages of the barnacle growth cycle, the Barnacle Growth Algorithm (BGA) is proposed. Before introducing the algorithm, the basic symbols are explained in Table 1.


Table 1. Relevant notations.

Symbol       Meaning
X_i^t        The ith individual in the tth iteration
X_gbest      Global best individual
fit(X_i^t)   Fitness value of individual X_i^t

Planktonic Stage. Barnacle larvae mainly move with the flow of seawater or under the influence of pheromones, swimming toward positions close to the population. Therefore, in the BGA algorithm, a pheromone concentration factor β selects between two motion modes: when β is below the concentration threshold, the individual performs a random Brownian motion with the seawater; when β exceeds the threshold, it moves toward the average position of the population. These behaviors are represented by the following equation: X_i^t + Move, if β