Advanced Intelligent Computing Technology and Applications: 19th International Conference, ICIC 2023, Zhengzhou, China, August 10–13, 2023, ... I (Lecture Notes in Computer Science, 14086) [1st ed. 2023] 9819947545, 9789819947546

This three-volume set of LNCS 14086, LNCS 14087 and LNCS 14088 constitutes - in conjunction with the double-volume LNAI set - the refereed proceedings of the 19th International Conference on Intelligent Computing, ICIC 2023, held in Zhengzhou, China, in August 2023.


Table of contents :
Preface
Organization
Contents – Part I
Evolutionary Computation and Learning
A Region Convergence Analysis for Multi-mode Stochastic Optimization Based on Double-Well Function
1 Introduction
2 Double-Well Function
3 Region Convergence—Definition and Discussion
4 Experiment and Result Analysis
4.1 Experiment Environment and Basic Setting
4.2 2-D Experiment
4.3 10-D Experiment
5 Conclusion
References
ADMM with SUSLM for Electric Vehicle Routing Problem with Simultaneous Pickup and Delivery and Time Windows
1 Introduction
2 Problem Description and Mathematical Formulation
3 Solution Method
3.1 Construction of the Augmented Lagrangian Model
3.2 Decomposition and Linearization of the Augmented Lagrangian Model
3.3 Labeling-Setting Algorithm
3.4 Sequential Updating Scheme of the Lagrangian Multiplier
4 Experiments and Results
5 Conclusions and Future Research
References
A Nested Differential Evolution Algorithm for Optimal Designs of Quantile Regression Models
1 Introduction
2 Preliminaries
2.1 Differential Evolution Algorithm
2.2 Locally D-Optimal Designs for Quantile Regression
3 The Nested DE Algorithm for Maximin Optimal Designs
4 Applications of Dose-Response Models
4.1 Optimal Designs for Michaelis-Menten Models
4.2 Optimal Designs for Emax Models
4.3 Optimal Designs for Exponential Models
5 Conclusion
References
Real-Time Crowdsourced Delivery Optimization Considering Maximum Detour Distance
1 Introduction
2 Problem Definition
2.1 Model and Parameters Definition
2.2 Mathematical Model
3 Solution Approach Under Static Model
3.1 Lagrangian Relaxation Decomposition Algorithm
3.2 Infeasible Solution Repair Strategy
3.3 Subgradient Algorithm
3.4 LRDRM Algorithm Process
4 Improved Dynamic Optimization Algorithm
5 Simulation Result and Comparisons
5.1 Experimental Setup
5.2 Results and Comparison
6 Conclusion
References
Improving SHADE with a Linear Reduction P Value and a Random Jumping Strategy
1 Introduction
2 LRP-SHADE
2.1 SHADE
2.2 Linear Reduction P Value with a Random Jumping Strategy
3 Experiment
3.1 Strategy Verification
3.2 LRP-SHADE Performance Verification
3.3 Convergence Graph
4 Conclusion
References
Detecting Community Structure in Complex Networks with Backbone Guided Search Algorithm
1 Introduction
2 Related Works
3 Backbone Guided Search Algorithm
3.1 Local Search Procedure
3.2 Backbone Guided Search Procedure
4 Experimental Results
5 Conclusions
References
Swarm Intelligence and Optimization
Large-Scale Multi-objective Evolutionary Algorithms Based on Adaptive Immune-Inspirated
1 Introduction
2 Related Works
2.1 Large-Scale Multi-objective Optimization
2.2 Immune Optimization Algorithm
2.3 Competitive Swarm Optimization
3 Proposed Method
3.1 The Framework of LMOIA
3.2 Adaptive Immunization Strategy
3.3 Immune-Inspired Strategy
3.4 Competitive Learning Strategy
3.5 Environment Selection
4 Experimental Studies
4.1 Experimental Setting
4.2 Discussions of the Proposed Strategies
4.3 Comparisons Between LMOIA and Existing MOEAs
5 Conclusions
References
Hybrid Hyper-heuristic Algorithm for Integrated Production and Transportation Scheduling Problem in Distributed Permutation Flow Shop
1 Introduction
2 Problem Description and Formulation
2.1 Problem Description
2.2 Mathematical Model
3 Proposed Algorithm
3.1 Encoding and Decoding
3.2 LLHs and HLS
3.3 The HHHA
4 Computational Results
5 Conclusion Remarks
References
A Task Level-Aware Scheduling Algorithm for Energy Consumption Constrained Parallel Applications on Heterogeneous Computing Systems
1 Introduction
1.1 Background
1.2 Main Contributions
2 Related Work
3 Models and Preliminaries
3.1 Application Model
3.2 Energy Model
3.3 Preliminaries
4 Our Solution
4.1 Problem Definition
4.2 Satisfying the Energy Consumption Limit
4.3 Algorithm for Minimizing the Scheduling Length
4.4 Example of the TSAECC Algorithm
5 Experiments
5.1 Experimental Metrics
6 Conclusion
References
An Energy-Conscious Task Scheduling Algorithm for Minimizing Energy Consumption and Makespan in Heterogeneous Distributed Systems
1 Introduction
2 Related Work
3 Models
3.1 System Model
3.2 Energy Model
3.3 Problem Description
4 Proposed Task Scheduling Algorithm
4.1 Chromosome Representation and Initial Population
4.2 Evaluation
4.3 Selection
4.4 Superior Individual Strategy
4.5 Crossover, Mutation, Termination
5 Experiments
5.1 Metrics
5.2 Comparison Experiments
6 Conclusions
References
A Hyper-Heuristic Algorithm with Q-Learning for Distributed Permutation Flowshop Scheduling Problem
1 Introduction
2 DPFSP
3 HHQL for DPFSP
3.1 Encoding and Decoding
3.2 Initialization
3.3 HHQL
4 Simulation and Comparisons
5 Conclusion and Further Work
References
Robot Path Planning Using Swarm Intelligence Algorithms
1 Introduction
2 Problem Statement and Related Work
3 Methodology
3.1 Proposed Model Framework
3.2 Genetic Algorithm Implementation
3.3 Particle Swarm Optimization Implementation
4 Results and Discussion
4.1 Outcomes Using Genetic Algorithm
4.2 Outcomes Using Particle Swarm Optimization
4.3 Comparison Between GA and PSO Models
5 Conclusions and Future Work
References
Deep Reinforcement Learning for Solving Multi-objective Vehicle Routing Problem
1 Introduction
2 Problem Description
3 Algorithm Framework
3.1 Decomposition Strategy
3.2 Sub-problem Modeling
3.3 Training
4 Computational Results
4.1 Settings
4.2 Result
5 Experimental Results
References
Probability Learning Based Multi-objective Evolutionary Algorithm for Distributed No-Wait Flow-Shop and Vehicle Transportation Integrated Optimization Problem
1 Introduction
2 DNFVTIOP
2.1 Notation Definition
2.2 Problem Description
3 PLMOEA for DNFVTIOP
3.1 Solution Representation
3.2 Proposed Probability Matrices and Updating Mechanisms
3.3 Proposed Cooperation Strategy Based on the Nondominated Sorting
4 Experimental Results and Comparison
5 Conclusions and Future Research
References
Hyper-heuristic Ant Colony Optimization Algorithm for Multi-objective Two-Echelon Vehicle Routing Problem with Time Windows
1 Introduction
2 Problem Description
3 HHACOA for MO2E-VRPTW
3.1 Solution Representation
3.2 Population Initialization
3.3 Low-Level Heuristic Operations
3.4 Generation of High-Level Populations
3.5 The Flow of the HHACOA
4 Simulation Experiments and Result Analysis
4.1 Experimental Setup
4.2 Measures for Comparing Non-dominated Sets
5 Results and Discussion
6 Conclusion and Future Work
References
A Task-Duplication Based Clustering Scheduling Algorithm for Heterogeneous Computing System
1 Introduction
2 Related Work
3 Models and Problem Definition
3.1 Application Models
3.2 Problem Definition
4 Proposed Algorithm
4.1 Preparation Phase
4.2 Initial Clustering
4.3 Processor Selection Phase
5 Experimental Results and Discussion
5.1 Comparative Indicators
5.2 Randomly Generated Task Maps
5.3 Real-World Application Graphs
6 Conclusion
References
Hyper-heuristic Q-Learning Algorithm for Flow-Shop Scheduling Problem with Fuzzy Processing Times
1 Introduction
2 FSPF
2.1 Problem Descriptions
2.2 Triangular Fuzzy Numbers and Related Operations
3 HHQLA for FSPF
3.1 LLHs and Encoding
3.2 Q-learning Algorithm
3.3 Sampling Strategy
3.4 General Framework of HHQLA
4 Computational Result and Comparisons
5 Conclusions and Future Research
References
Hyper-heuristic Estimation of Distribution Algorithm for Green Hybrid Flow-Shop Scheduling and Transportation Integrated Optimization Problem
1 Introduction
2 GHFSSTIOP
2.1 Problem Assumptions and Symbol Definition
2.2 Problem Model
3 Feature of the Solution
3.1 Encoding and Decoding
4 Hyper-heuristic Estimation of Distribution Algorithm
4.1 Population Initialization
4.2 Probabilistic Model
4.3 High-Level Strategic Domain Population Update
4.4 Local Search
5 Simulation Testing and Comparisons
6 Conclusions and Future Research
References
Improved EDA-Based Hyper-heuristic for Flexible Job Shop Scheduling Problem with Sequence-Independent Setup Times and Resource Constraints
1 Introduction
2 FJSP with SISTs and RCs
2.1 Problem Description
2.2 FJSP with SISTs and RCs
3 IEDA-HH for FJSP with SISTs and RCs
3.1 Encoding and Decoding of High-Level Strategy Domains
3.2 Encoding and Decoding of Low-Level Problem Domains
3.3 Probabilistic Model and Update Mechanism
3.4 Represent for Low-Level Heuristics
3.5 IEDA-HH for FJSP with SISTs and RCs
3.6 Computational Complexity Analysis of IEDA-HH
4 Computational Result and Comparisons
5 Conclusions and Future Research
References
Learning Based Memetic Algorithm for the Monocrystalline Silicon Production Scheduling Problem
1 Introduction
2 Description of the MSP Scheduling Problem
3 Learning Memetic Algorithm
3.1 Encoding and Decoding, Initialization
3.2 Learning Mechanism
3.3 Local Optimization Strategy
3.4 LMA for the MSP Scheduling Problem
4 Simulation Results and Comparison
4.1 Experimental Setup
4.2 Computational Results and Comparison
5 Conclusions and Future Work
References
Learning Variable Neighborhood Search Algorithm for Solving the Energy-Efficient Flexible Job-Shop Scheduling Problem
1 Introduction
2 Problem Description
3 LVNSPM for EEFJSP
3.1 Chromosome Representation and Initialization
3.2 Three-Dimensional Matrix Guide for Global Search
3.3 Variable Neighborhood Local Search
4 Experiment Results
5 Conclusions and Future Work
References
A Q-Learning-Based Hyper-Heuristic Evolutionary Algorithm for the Distributed Flexible Job-Shop Scheduling Problem
1 Introduction
2 Problem Description
3 Q-Learning-Based Hyper-heuristic Evolutionary Algorithm
3.1 Encoding and Decoding Strategy
3.2 Hybrid Initialization of Populations
3.3 Neighborhoods of DFJSP
3.4 Q-Learning-Based High-Level Strategy
3.5 The Procedure of QHHEA for DFJSP
4 Experimental Comparisons and Statistical Analysis
5 Conclusion and Future Work
References
A Firefly Algorithm Based on Prediction and Hybrid Samples Learning
1 Introduction
2 Standard FA
3 Proposed Approach
3.1 Prediction-Based Tournament
3.2 Hybrid Sample Attraction Model
3.3 Modification of Motion Strategy and Parameters
3.4 Framework of PHSFA
4 Experiments
4.1 Experiment Settings
4.2 Comparison of Attraction Strategies
4.3 PHSFA Compared to Other FA Variants
4.4 Time Complexity Analysis
5 Conclusion
References
Fireworks Algorithm for Dimensionality Resetting Based on Roulette
1 Introduction
2 Fireworks Algorithm
3 Proposed Approach
3.1 Offset Position
3.2 Mapping Rule
3.3 Roulette-Based Dimension Reset
4 Experiments and Discussions
4.1 Parameters
4.2 CEC2020 Test Functions
4.3 Results
5 Conclusion
References
Improved Particle Swarm Optimization Algorithm Combined with Reinforcement Learning for Solving Flexible Job Shop Scheduling Problem
1 Introduction
2 Problem Description and Mathematical Model
2.1 Problem Description
2.2 Mathematical Model
3 IPSORL
3.1 Opposition-Based Learning (OBL)
3.2 Q-Learning
3.3 Improved PSO Algorithm
4 Experimental Results and Discussion
5 Conclusion
References
A Learning-Based Multi-Objective Evolutionary Algorithm for Parallel Machine Production and Transportation Integrated Optimization Problem
1 Introduction
2 MOPMPTIOP
2.1 Problem Description
2.2 Mathematical Model of MOPMPTIOP
3 LMOEA for MOPMPTIOP
3.1 Solution Representation
3.2 Genetic Operations
3.3 New Population Generation
4 Computational Result and Comparisons
5 Conclusions and Future Research
References
Q-Learning Based Particle Swarm Optimization with Multi-exemplar and Elite Learning
1 Introduction
2 Background Information
2.1 Classic PSO
2.2 Q-Learning
2.3 QL-Based PSO Variants
3 The Proposed Algorithm
3.1 The QPSOEL Structure
3.2 Multi-exemplar Selection Strategy
3.3 Elite Learning Strategy
4 Experiments and Comparison
4.1 Benchmark Functions and Compared Algorithm
4.2 Experimental Results
5 Conclusion and Future Work
References
A Branch and Bound Algorithm for the Two-Machine Blocking Flowshop Group Scheduling Problem
1 Introduction
2 Problem Definition
2.1 Notations
2.2 Formulate the BFGSP
3 The Branch-and-Bound Algorithm
3.1 Solution Representation
3.2 Heuristics
3.3 Lower Bounds and Dominance Rules
3.4 The Branch and Bound Algorithm
4 Computational Experiments
5 Conclusions and Future Research
References
Deep Reinforcement Learning for Solving Distributed Permutation Flow Shop Scheduling Problem
1 Introduction
2 Distributed Permutation Flow Shop Scheduling Problem
2.1 Problem Description
2.2 Mathematical Model of DPFSP
3 Algorithm Design
3.1 Factory Allocation
3.2 State
3.3 Action Space
3.4 Reward Function
3.5 Deep Q-Network
4 Computational Experiments
4.1 Settings of Parameters
4.2 Computational Results
5 Conclusions and Future Research
References
Mobile Edge Computing Offloading Problem Based on Improved Grey Wolf Optimizer
1 Introduction
2 System Design
2.1 System Model
2.2 Communication Model
2.3 Calculation Model
2.4 Cost Model
3 Decision-Making Method Based On GWO
3.1 Improved GWO
3.2 Algorithm Complexity Analysis
3.3 OPGWO Validation
3.4 V-function Mapping Strategy
3.5 Set Up Edge Computing Model Encoding
4 Simulation Experiments
4.1 Setting of Experimental Parameters
4.2 Analysis of Results
5 Conclusion
References
Runtime Analysis of Estimation of Distribution Algorithms for a Simple Scheduling Problem
1 Introduction
2 Preliminaries
2.1 Sorting Problems with Deteriorating Effect
2.2 First Hitting Time and Convergence Time
3 FHT of EDA with Truncation Selection on SMSDE
4 Conclusion and Further Work
References
Nonlinear Inertia Weight Whale Optimization Algorithm with Multi-strategy and Its Application
1 Introduction
2 Whale Optimization Algorithm
2.1 Enclosing Prey Stage
2.2 Bubble Net Attacking Stage
2.3 Searching Prey Stage
3 Proposed Whale Optimization Algorithm
3.1 Latin Hypercube Sampling
3.2 Cauchy Distribution
3.3 Nonlinear Weight
3.4 Algorithm MSNWOA
4 Algorithm Simulation Analysis
5 Pressure Vessel Design
6 Conclusion
References
Sparrow Search Algorithm Based on Cubic Mapping and Its Application
1 Introduction
2 Sparrow Search Algorithm
3 Improvement of Sparrow Search Algorithm
3.1 Section of Population Initialization
3.2 Part of Producer and Scrounger
4 Simulation Experiment and Result Analysis
5 Design of Three-Bar Truss
6 Conclusion
References
Hyper-heuristic Three-Dimensional Estimation of Distribution Algorithm for Distributed Assembly Permutation Flowshop Scheduling Problem
1 Introduction
2 Problem Statement
3 Algorithm Design
3.1 Encoding and Decoding
3.2 HH3DEDA for DAPFSP
4 Simulation Result and Comparisons
5 Conclusions and Future Research
References
An Improved Genetic Algorithm for Vehicle Routing Problem with Time Windows Considering Temporal-Spatial Distance
1 Introduction
2 Problem Description
2.1 Symbolic Representations
2.2 Assumptions and Formulas
3 The Proposed Algorithm
3.1 Encoding
3.2 Initialization Method
3.3 Problem-Specific Crossover Method
4 Computational Experiments
4.1 Experimental Instances and Performance Criteria
4.2 Numerical Experiments
5 Conclusions and Future Work
References
A Reinforcement Learning Method for Solving the Production Scheduling Problem of Silicon Electrodes
1 Introduction
2 Problem Description and Modeling
2.1 Variable Definitions
2.2 Permutation-Based Model of the SEP
3 Proposed RL Algorithm
3.1 Solution Encoding and Decoding Schemes
3.2 Action Representation
3.3 State Representation
3.4 Action Selection Method
3.5 Reward R
3.6 Algorithmic Framework
4 Experimental Results and Discussion
4.1 Data and Experiment Design
4.2 Experiment Results and Discussion
5 Conclusions
References
Information Security
CL-BOSIC: A Distributed Agent-Oriented Scheme for Remote Data Integrity Check and Forensics in Public Cloud
1 Introduction
2 Related Work
2.1 Provable Data Possession
2.2 Blockchain and Smart Contract
2.3 Certificateless Public Key Encryption with Keyword Search
3 Preliminaries
3.1 Bilinear Pairing
3.2 Security Assumptions
4 Problem Statement
4.1 System Model
4.2 Design Goals
4.3 Security Definition
5 Framework
5.1 Algorithm Specification
5.2 Our Construction
6 Security Analysis
6.1 Correctness
6.2 Soundness
6.3 Perfect Privacy Preservation
7 Implementation and Evaluation
8 Conclusion
References
Zeroth-Order Gradient Approximation Based DaST for Black-Box Adversarial Attacks
1 Introduction
2 Related Work
2.1 Black-Box Adversarial Attacks
2.2 DaST in Black-Box Adversarial Attacks
3 Zeroth-Order Gradient Approximation Based DaST
3.1 Overview
3.2 GAN
3.3 Gradient Approximation
4 Experiments
4.1 Environment Settings
4.2 Model Architecture
4.3 Evaluation Standard
4.4 Results
5 Conclusion
References
A Dynamic Resampling Based Intrusion Detection Method
1 Introduction
2 Related Work
2.1 Intrusion Detection
2.2 Class Imbalance
3 Methodology
3.1 Dynamic Resampling
3.2 Parallel-Residual Feature Fusion (PRFF)
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Experimental Results
5 Conclusion
References
Information Security Protection for Online Education Based on Blockchain
1 Introduction
2 Current Situation of Online Education Information Security
2.1 Software Inherent Risks
2.2 Public WiFi Usage Risks
2.3 Risks of Virus Attack
3 Overall Explanation of Blockchain
3.1 Basic Concepts
3.2 Basic Characteristics of Blockchain
4 The Current Status of Blockchain Application in Online Education
5 The Feasibility of Blockchain in Online Education
5.1 Using Blockchain for User Information Protection
5.2 Using Blockchain for Teaching Resource Protection and Sharing
5.3 Using Blockchain for Teaching Tracking
6 Conclusion
References
Blockchain-Based Access Control Mechanism for IoT Medical Data
1 Introduction
2 Related Works
3 System Model
3.1 System Model
3.2 Smart Contract Access Control Module
4 Our BAC-IOMT Scheme
4.1 User Management Module
4.2 Diagnosis Module
4.3 Secure Access Control Module
4.4 Delegate Module
5 Performance Analysis and Security Proofs
5.1 Performance Analysis
5.2 Security Proofs
6 Conclusions
References
LNPB: A Light Node for Public Blockchain with Constant-Size Storage
1 Introduction
2 Design of LNPB
2.1 Framework
2.2 Design Goals
3 The Transaction Verification Protocol
3.1 Parameter Initialization
3.2 Block Summary Generation
3.3 Proof Generation
3.4 Proof Verification
4 A New Proof Generation Method
5 Security Analysis
6 Simulation Experiment
6.1 Simulation Experiment Environment and Parameters
6.2 Experimental Results and Analysis
7 Related Work
8 Conclusion
References
Collaborative Face Privacy Protection Method Based on Adversarial Examples in Social Networks
1 Introduction
2 Face Privacy Protection Related Work
2.1 Face De-identification
2.2 Face Attribute De-identification
2.3 Face Adversarial Attack
2.4 Face Anti-compression Adversarial Attack
3 Method
3.1 Definition
3.2 Model
4 Experiment
4.1 Experimental Setup
4.2 Experimental and Result Analysis
5 Conclusion
References
Duty-Based Workflow Dynamic Access Control Model
1 Introduction
2 Related Work
3 Problem Raising
4 The Proposed DBAC
4.1 Overview of the Proposed Model
4.2 Formal DBAC Model
4.3 Separation of Duties Verification Mechanism
4.4 Dynamic Authorization
5 Performance Analysis and Comparison
5.1 Performance Analysis
5.2 Comparison with Traditional Models
6 Conclusion
References
A High-Performance Steganography Algorithm Based on HEVC Standard
1 Introduction
2 Theoretical Framework
2.1 Intra-frame Prediction
2.2 Intra-frame Distortion Drift
3 Description of Algorithm Process
3.1 Embedding
3.2 Data Extraction
4 Case Study
5 Conclusion
References
Using N-Dimensional Space Coding of Transform Coefficients for Video Steganography in H.265/HEVC
1 Introduction
2 DCT/DST Residual Coefficients Space
2.1 Selection of 4 × 4 DCT/DST Residual Coefficient Blocks
2.2 Construction and Coding of N-dimensional Residual Coefficients Space
3 Embedding and Extraction
3.1 Embedding
3.2 Extraction
4 Experimental Evaluation
5 Conclusion
References
CSMPQ: Class Separability Based Mixed-Precision Quantization
1 Introduction
2 Related Work
2.1 Network Quantization
2.2 TF-IDF
3 Methodology
3.1 Pre-processing
3.2 Transforming the Features into Words
3.3 The TF-IDF for Network Quantization
3.4 Mixed-Precision Quantization
4 Experiments
4.1 Implementation Details
4.2 Quantization-Aware Training
4.3 Post-Training Quantization
4.4 Ablation Study
5 Conclusion
References
Open-World Few-Shot Object Detection
1 Introduction
2 Open-World Few-Shot Object Detection
2.1 Preliminary
3 OFDet
3.1 Stage I
3.2 Stage II
3.3 Inference
4 Experiments
4.1 Experiments Setting
4.2 Results on Class-Agnostic Object Detection
4.3 Results on Few-Shot Object Detection
4.4 Results on OFOD
4.5 Ablation Study
5 Conclusion
References
Theoretical Computational Intelligence and Applications
RWA Optimization of CDC-ROADMs Based Network with Limited OSNR
1 Introduction
2 Problem Description
2.1 Physical Model
2.2 Graph Model
2.3 Small Scale vs Large Scale
3 Optimization Method
3.1 ILP Formulation for Small Scale Network
3.2 ILP Formulation for Large Scale Network
4 Experimental Results
5 Conclusion
References
Resource Optimization for Link Failure Recovery of Software-Defined Optical Network
1 Introduction
2 Introduction
2.1 Physical Model
2.2 Graph Model
3 Optimization Method
3.1 ILP Formulation of Centralized Optimization
3.2 Distributed Optimization for Large Scale Network
3.3 Wavelength and Interface Assignment
4 Performance Evaluation
4.1 Parameters Setting of Simulation
4.2 Results Discussion
5 Conclusion
References
Multi-view Coherence for Outdoor Reflective Surfaces
1 Introduction
2 Pipeline Method
2.1 Drone Path Planning for Image Collection
2.2 Multi-view Control Points Matching Algorithm
2.3 Application to Multi-view Stereo
3 Experimental Results
3.1 Comparison of Control Points Matching
3.2 Comparison of 3D Reconstruction
3.3 Comparison of Texture Remapping
4 Limitations and Conclusion
References
MOVNG: Applied a Novel Sparse Fusion Representation into GTCN for Pan-Cancer Classification and Biomarker Identification
1 Introduction
2 Proposed Methodology
2.1 Combined Model Framework
2.2 Sparse Fusion Representation Method
2.3 Graph Tree Convolution Networks
3 Performance Evaluation
3.1 Dataset
3.2 Representation Ability
3.3 Ablation Experiment
3.4 Biomarker Identification
4 Conclusion and Future Work
References
A Blockchain-Based Network Alignment System for Power Equipment Data Inconsistency
1 Introduction
2 Related Work
3 System Design
3.1 Overall System Architecture
3.2 Access Control via HAC Tree
3.3 Data Compression Algorithm Based on Composite Hashing
3.4 Similar Equipment Data Retrieval Based on Representative Learning
4 Evaluation
4.1 Performance Evaluation of Data Compression Algorithm
5 Conclusion
References
TrafficSCINet: An Adaptive Spatial-Temporal Graph Convolutional Network for Traffic Flow Forecasting
1 Introduction
2 Methodology
2.1 Problem Definition
2.2 Framework of TrafficSCINet
2.3 Adaptive Graph Convolution Layer
2.4 Sample Convolution and Interaction Network
3 Experiments
3.1 Datasets and Data Preprocessing
3.2 Baseline Methods
3.3 Experiment Settings
3.4 Experiment Results and Analysis
3.5 Ablation Experiments
4 Conclusion and Future Work
References
CLSTGCN: Closed Loop Based Spatial-Temporal Convolution Networks for Traffic Flow Prediction
1 Introduction
2 Preliminary
3 Methodology
3.1 Network Structure
3.2 Spatial Correlation Information
3.3 Spatial-Temporal Attention
3.4 Spatial-Temporal Convolution
4 Experiment
4.1 Dataset
4.2 Model Evaluation
4.3 Experiment Result and Analysis
5 Conclusion
References
A Current Prediction Model Based on LSTM and Ensemble Learning for Remote Palpation
1 Introduction
2 Related Work
2.1 Remote Palpation System
2.2 Recurrent Neural Network and Ensemble Learning
3 Method
3.1 Temporal Relationships in Spatial Position and Force Feedback
3.2 Current Prediction Model Based on LSTM and GRNN
3.3 Model Fusion with Ensemble Learning
4 Experiments
4.1 Dataset and Training Process
4.2 Evaluation Metrics and Compared Method
4.3 Experimental Result
5 Conclusion
References
Multi-step Probabilistic Load Forecasting for University Buildings Based on DA-RNN-MDN
1 Introduction
2 Related Work
2.1 Load Value Forecasting
2.2 Probabilistic Load Forecasting
3 Methodology
3.1 Technical Preliminaries
3.2 The Proposed Method
4 Experiments
4.1 Dataset
4.2 Data Processing
4.3 Parameters Settings and Evaluation Metrics
4.4 Results
5 Conclusion
References
A Quantum Simulation Method with Repeatable Steady-State Output Using Massive Inferior Solutions
1 Introduction
2 Theoretical Basis
2.1 GSE–A Quantum Energy Minimization Process
2.2 Solving a Single Ground State via DMC
2.3 Simulating QA with MGSE and Multi-scale DMC
3 Algorithm Proposal and Analysis
4 Experiment and Analysis
4.1 Convergence Experiment
4.2 A Practical Case with Simple Function Landscape
4.3 A Practical Case with Rugged Function Landscape
5 Conclusion
References
Metal Oxide Classification Based on SVM
1 Background
2 Feature Input
2.1 Data Set
2.2 Characteristic Variable Input
3 Modeling
3.1 Principal Component Analysis Theory
3.2 Support Vector Machine Theory
3.3 Modeling
3.4 Parameter Optimization
3.5 Model Evaluation Indicators
4 Model Results and Analysis
4.1 Performance Evaluation of the Model
4.2 Analysis of Experimental Results
5 Conclusion
References
Food Image Classification Based on Residual Network
1 Introduction
2 Experimental Data
2.1 Food Dataset
2.2 Data Preprocessing
3 Food Image Recognition Model
3.1 Pyramid Split Attention Unit
3.2 Soft Thresholding Subnetwork
3.3 Residual Module
3.4 Optimization Algorithm
4 Experimental Result and Analysis
4.1 Model Accuracy
4.2 Ablation Experiments
5 Conclusion
References
BYOL Network Based Contrastive Clustering
1 Introduction
2 BYOL Network-Based End-to-End Contrastive Clustering
2.1 The Contrastive Network for Representation Learning
2.2 The Discriminative Network for Data Clustering
2.3 Training of the BCC
3 Experiments
3.1 Implementation Detail
3.2 Comparisons Methods
3.3 Analysis of Batch Size
4 Conclusion
References
Deep Multi-view Clustering Based on Graph Embedding
1 Introduction
2 The Proposed Approach
2.1 Reconstruction Loss
2.2 Clustering Loss
2.3 Graph Embedding
2.4 Optimization
3 Experiments
3.1 Dataset
3.2 Comparing Methods
3.3 Experiment Setup
3.4 Results
4 Conclusion
References
Graph-Based Short Text Clustering via Contrastive Learning with Graph Embedding
1 Introduction
2 Model
2.1 Graph-Based Short Text Clustering via Contrastive Learning with Graph Embedding
3 Experiments
3.1 Datasets
3.2 Implementations
3.3 Compare with Baseline
3.4 Parameter Experiment
4 Conclusion
References
An Improved UAV Detection Method Based on YOLOv5
1 Introduction
2 Dataset Preparation
3 Algorithm Improvement
3.1 A Model Pruning Method Based on the BN Layers
3.2 Secondary Classification with EfficientNet
4 Experimental Analysis
4.1 Dataset and Experimental Platform
4.2 YOLOv5 Optimal Parameter Adjustment
4.3 Comprehensive Experimental Results
5 Summary
References
Large-Scale Traffic Signal Control Based on Integration of Adaptive Subgraph Reformulation and Multi-agent Deep Reinforcement Learning
1 Introduction
2 Methodology
2.1 Framework
2.2 Adaptive Subgraph Reformulation Algorithm
2.3 Multi-agent A2C for ATSC
3 Numerical Experiments
3.1 Simulation Environment and Experimental Parameters
3.2 Baseline Methods
3.3 Synthetic Traffic Grid
3.4 Jinan Urban Transportation Network
4 Conclusion and Future Work
References
An Improved AprioriAll Algorithm Based on Tissue-Like P for Sequential Pattern Mining
1 Introduction
2 Preliminaries
2.1 AprioriAll Algorithm
2.2 Tissue-Like P System with Promoters and Inhibitor
3 TP-AprioriAll Algorithm
3.1 Algorithm and Rules Description
3.2 Algorithm Flow
3.3 Complexity Analysis
4 Case Analyze
5 Experiment Results
6 Conclusion
References
Joint Spatiotemporal Collaborative Relationship Network for Skeleton-Based Action Recognition
1 Introduction
2 Method
2.1 Spatial Stream
2.2 Temporal Stream
3 Experiments
3.1 Ablation Experiment
3.2 Comparative Experiment
4 Conclusion
References
A Digital Human System with Realistic Facial Expressions for Friendly Human-Machine Interaction
1 Introduction
2 Related Work
3 System Framework
4 Proposed Method
4.1 3D Reconstruction of a User
4.2 Head Rotation Simulation
4.3 Blendshape Theory
5 Experiments
5.1 Experimental Setup
5.2 Experimental Results and Analysis
5.3 Comparison and Evaluation
6 Conclusions and Future Work
References
Author Index


LNCS 14086

De-Shuang Huang · Prashan Premaratne · Baohua Jin · Boyang Qu · Kang-Hyun Jo · Abir Hussain (Eds.)

Advanced Intelligent Computing Technology and Applications 19th International Conference, ICIC 2023 Zhengzhou, China, August 10–13, 2023 Proceedings, Part I

Lecture Notes in Computer Science 14086

Founding Editors: Gerhard Goos, Juris Hartmanis

Editorial Board Members: Elisa Bertino (Purdue University, West Lafayette, IN, USA), Wen Gao (Peking University, Beijing, China), Bernhard Steffen (TU Dortmund University, Dortmund, Germany), Moti Yung (Columbia University, New York, NY, USA)

The series Lecture Notes in Computer Science (LNCS), including its subseries Lecture Notes in Artificial Intelligence (LNAI) and Lecture Notes in Bioinformatics (LNBI), has established itself as a medium for the publication of new developments in computer science and information technology research, teaching, and education. LNCS enjoys close cooperation with the computer science R & D community, the series counts many renowned academics among its volume editors and paper authors, and collaborates with prestigious societies. Its mission is to serve this international community by providing an invaluable service, mainly focused on the publication of conference and workshop proceedings and postproceedings. LNCS commenced publication in 1973.

De-Shuang Huang · Prashan Premaratne · Baohua Jin · Boyang Qu · Kang-Hyun Jo · Abir Hussain Editors

Advanced Intelligent Computing Technology and Applications 19th International Conference, ICIC 2023 Zhengzhou, China, August 10–13, 2023 Proceedings, Part I

Editors

De-Shuang Huang, Department of Computer Science, Eastern Institute of Technology, Zhejiang, China
Prashan Premaratne, University of Wollongong, North Wollongong, NSW, Australia
Baohua Jin, Zhengzhou University of Light Industry, Zhengzhou, China
Boyang Qu, Zhong Yuan University of Technology, Zhengzhou, China
Kang-Hyun Jo, University of Ulsan, Ulsan, Korea (Republic of)
Abir Hussain, Department of Computer Science, Liverpool John Moores University, Liverpool, UK

ISSN 0302-9743 ISSN 1611-3349 (electronic) Lecture Notes in Computer Science ISBN 978-981-99-4754-6 ISBN 978-981-99-4755-3 (eBook) https://doi.org/10.1007/978-981-99-4755-3 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

The International Conference on Intelligent Computing (ICIC) was started to provide an annual forum dedicated to emerging and challenging topics in artificial intelligence, machine learning, pattern recognition, bioinformatics, and computational biology. It aims to bring together researchers and practitioners from both academia and industry to share ideas, problems, and solutions related to the multifaceted aspects of intelligent computing.

ICIC 2023, held in Zhengzhou, China, August 10–13, 2023, constituted the 19th International Conference on Intelligent Computing. It built upon the success of ICIC 2022 (Xi’an, China), ICIC 2021 (Shenzhen, China), ICIC 2020 (Bari, Italy), ICIC 2019 (Nanchang, China), ICIC 2018 (Wuhan, China), ICIC 2017 (Liverpool, UK), ICIC 2016 (Lanzhou, China), ICIC 2015 (Fuzhou, China), ICIC 2014 (Taiyuan, China), ICIC 2013 (Nanning, China), ICIC 2012 (Huangshan, China), ICIC 2011 (Zhengzhou, China), ICIC 2010 (Changsha, China), ICIC 2009 (Ulsan, South Korea), ICIC 2008 (Shanghai, China), ICIC 2007 (Qingdao, China), ICIC 2006 (Kunming, China), and ICIC 2005 (Hefei, China).

This year, the conference concentrated mainly on theories and methodologies as well as emerging applications of intelligent computing. Its aim was to unify the picture of contemporary intelligent computing techniques as an integral concept that highlights the trends in advanced computational intelligence and bridges theoretical research with applications. Therefore, the theme for this conference was "Advanced Intelligent Computing Technology and Applications". Papers that focused on this theme were solicited, addressing theories, methodologies, and applications in science and technology.

ICIC 2023 received 828 submissions from 12 countries and regions. All papers went through a rigorous peer-review procedure and each paper received at least three review reports. Based on the review reports, the Program Committee finally selected 337 high-quality papers for presentation at ICIC 2023, and inclusion in five volumes of proceedings published by Springer: three volumes of Lecture Notes in Computer Science (LNCS), and two volumes of Lecture Notes in Artificial Intelligence (LNAI). This volume of LNCS_14086 includes 68 papers.

The organizers of ICIC 2023, including Eastern Institute of Technology, China, Zhongyuan University of Technology, China, and Zhengzhou University of Light Industry, China, made an enormous effort to ensure the success of the conference. We hereby would like to thank the members of the Program Committee and the referees for their collective effort in reviewing and soliciting the papers. In particular, we would like to thank all the authors for contributing their papers. Without the high-quality submissions from the authors, the success of the conference would not have been possible. Finally, we are especially grateful to the International Neural Network Society, and the National Science Foundation of China for their sponsorship.

June 2023

De-Shuang Huang Prashan Premaratne Boyang Qu Baohua Jin Kang-Hyun Jo Abir Hussain

Organization

General Co-chairs

De-Shuang Huang (Eastern Institute of Technology, China)
Shizhong Wei (Zhengzhou University of Light Industry, China)

Program Committee Co-chairs

Prashan Premaratne (University of Wollongong, Australia)
Baohua Jin (Zhengzhou University of Light Industry, China)
Kang-Hyun Jo (University of Ulsan, Republic of Korea)
Abir Hussain (Liverpool John Moores University, UK)

Organizing Committee Co-chair

Hui Jing (Zhengzhou University of Light Industry, China)

Organizing Committee Members

Fubao Zhu (Zhengzhou University of Light Industry, China)
Qiuwen Zhang (Zhengzhou University of Light Industry, China)
Haodong Zhu (Zhengzhou University of Light Industry, China)
Wei Huang (Zhengzhou University of Light Industry, China)
Hongwei Tao (Zhengzhou University of Light Industry, China)
Weiwei Zhang (Zhengzhou University of Light Industry, China)

Award Committee Co-chairs

Michal Choras (Bydgoszcz University of Science and Technology, Poland)
Hong-Hee Lee (University of Ulsan, Republic of Korea)


Tutorial Co-chairs

Yoshinori Kuno (Saitama University, Japan)
Phalguni Gupta (Indian Institute of Technology Kanpur, India)

Publication Co-chairs

Valeriya Gribova (Far Eastern Branch of Russian Academy of Sciences, Russia)
M. Michael Gromiha (Indian Institute of Technology Madras, India)
Boyang Qu (Zhengzhou University, China)

Special Session Co-chairs

Jair Cervantes Canales (Autonomous University of Mexico State, Mexico)
Chenxi Huang (Xiamen University, China)
Dhiya Al-Jumeily (Liverpool John Moores University, UK)

Special Issue Co-chairs

Kyungsook Han (Inha University, Republic of Korea)
Laurent Heutte (Université de Rouen Normandie, France)

International Liaison Co-chair

Prashan Premaratne (University of Wollongong, Australia)

Workshop Co-chairs

Yu-Dong Zhang (University of Leicester, UK)
Hee-Jun Kang (University of Ulsan, Republic of Korea)


Publicity Co-chairs

Chun-Hou Zheng (Anhui University, China)
Dhiya Al-Jumeily (Liverpool John Moores University, UK)
Jair Cervantes Canales (Autonomous University of Mexico State, Mexico)

Exhibition Contact Co-chair

Fubao Zhu (Zhengzhou University of Light Industry, China)

Program Committee Members Abir Hussain Antonio Brunetti Antonino Staiano Bin Liu Bin Qian Bin Yang Bing Wang Binhua Tang Bingqiang Liu Bo Li Changqing Shen Chao Song Chenxi Huang Chin-Chih Chang Chunhou Zheng Chunmei Liu Chunquan Li Dahjing Jwo Dakshina Ranjan Kisku Dan Feng Daowen Qiu Dharmalingam Muthusamy Dhiya Al-Jumeily Dong Wang

Liverpool John Moores University, UK Polytechnic University of Bari, Italy Università di Napoli Parthenope, Italy Beijing Institute of Technology, China Kunming University of Science and Technology, China Zaozhuang University, China Anhui University of Technology, China Hohai University, China Shandong University, China Wuhan University of Science and Technology, China Soochow University, China Harbin Medical University, China Xiamen University, China Chung Hua University, Taiwan Anhui University, China Howard University, USA University of South China, China National Taiwan Ocean University, Taiwan National Institute of Technology Durgapur, India Huazhong University of Science and Technology, China Sun Yat-sen University, China Bharathiar University, India Liverpool John Moores University, UK University of Jinan, China



Dunwei Gong Eros Gian Pasero Evi Sjukur Fa Zhang Fengfeng Zhou Fei Guo Gaoxiang Ouyang Giovanni Dimauro Guoliang Li Han Zhang Haibin Liu Hao Lin Haodi Feng Hongjie Wu Hongmin Cai Jair Cervantes Jixiang Du Jing Hu Jiawei Luo Jian Huang Jian Wang Jiangning Song Jinwen Ma Jingyan Wang Jinxing Liu Joaquin Torres-Sospedra Juan Liu Jun Zhang Junfeng Xia Jungang Lou Kachun Wong Kanghyun Jo Khalid Aamir Kyungsook Han L. Gong Laurent Heutte

China University of Mining and Technology, China Politecnico di Torino, Italy Monash University, Australia Beijing Institute of Technology, China Jilin University, China Central South University, China Beijing Normal University, China University of Bari, Italy Huazhong Agricultural University, China Nankai University, China Beijing University of Technology, China University of Electronic Science and Technology of China, China Shandong University, China Suzhou University of Science and Technology, China South China University of Technology, China Autonomous University of Mexico State, Mexico Huaqiao University, China Wuhan University of Science and Technology, China Hunan University, China University of Electronic Science and Technology of China, China China University of Petroleum, China Monash University, Australia Peking University, China Abu Dhabi Department of Community Development, UAE Qufu Normal University, China Universidade do Minho, Portugal Wuhan University, China Anhui University, China Anhui University, China Huzhou University, China City University of Hong Kong, China University of Ulsan, Republic of Korea University of Sargodha, Pakistan Inha University, Republic of Korea Nanjing University of Posts and Telecommunications, China Université de Rouen Normandie, France


Le Zhang Lejun Gong Liang Gao Lida Zhu Marzio Pennisi Michal Choras Michael Gromiha Ming Li Minzhu Xie Mohd Helmy Abd Wahab Nicola Altini Peng Chen Pengjiang Qian Phalguni Gupta Prashan Premaratne Pufeng Du Qi Zhao Qingfeng Chen Qinghua Jiang Quan Zou Rui Wang Saiful Islam Seeja K. R. Shanfeng Zhu Shikui Tu Shitong Wang Shixiong Zhang Sungshin Kim Surya Prakash Tatsuya Akutsu Tao Zeng Tieshan Li Valeriya Gribova

Vincenzo Randazzo


Sichuan University, China Nanjing University of Posts and Telecommunications, China Huazhong Univ. of Sci. & Tech., China Huazhong Agriculture University, China University of Eastern Piedmont, Italy Bydgoszcz University of Science and Technology, Poland Indian Institute of Technology Madras, India Nanjing University, China Hunan Normal University, China Universiti Tun Hussein Onn Malaysia, Malaysia Polytechnic University of Bari, Italy Anhui University, China Jiangnan University, China GLA University, India University of Wollongong, Australia Tianjin University, China University of Science and Technology Liaoning, China Guangxi University, China Harbin Institute of Technology, China University of Electronic Science and Technology of China, China National University of Defense Technology, China Aligarh Muslim University, India Indira Gandhi Delhi Technical University for Women, India Fudan University, China Shanghai Jiao Tong University, China Jiangnan University, China Xidian University, China Pusan National University, Republic of Korea IIT Indore, India Kyoto University, Japan Guangzhou Laboratory, China University of Electronic Science and Technology of China, China Institute of Automation and Control Processes, Far Eastern Branch of Russian Academy of Sciences, Russia Politecnico di Torino, Italy



Waqas Haider Wen Zhang Wenbin Liu Wensheng Chen Wei Chen Wei Peng Weichiang Hong Weidong Chen Weiwei Kong Weixiang Liu Xiaodi Li Xiaoli Lin Xiaofeng Wang Xiao-Hua Yu Xiaoke Ma Xiaolei Zhu Xiangtao Li Xin Zhang Xinguo Lu Xingwei Wang Xinzheng Xu Xiwei Liu Xiyuan Chen Xuequn Shang Xuesong Wang Yansen Su Yi Xiong Yu Xue Yizhang Jiang Yonggang Lu Yongquan Zhou Yudong Zhang Yunhai Wang Yupei Zhang Yushan Qiu

Kohsar University Murree, Pakistan Huazhong Agricultural University, China Guangzhou University, China Shenzhen University, China Chengdu University of Traditional Chinese Medicine, China Kunming University of Science and Technology, China Asia Eastern University of Science and Technology, Taiwan Shanghai Jiao Tong University, China Xi’an University of Posts and Telecommunications, China Shenzhen University, China Shandong Normal University, China Wuhan University of Science and Technology, China Hefei University, China California Polytechnic State University, USA Xidian University, China Anhui Agricultural University, China Jilin University, China Jiangnan University, China Hunan University, China Northeastern University, China China University of Mining and Technology, China Tongji University, China Southeast Univ., China Northwestern Polytechnical University, China China University of Mining and Technology, China Anhui University, China Shanghai Jiao Tong University, China Huazhong University of Science and Technology, China Jiangnan University, China Lanzhou University, China Guangxi University for Nationalities, China University of Leicester, UK Shandong University, China Northwestern Polytechnical University, China Shenzhen University, China


Yunxia Liu Zhanli Sun Zhenran Jiang Zhengtao Yu Zhenyu Xuan Zhihong Guan Zhihua Cui Zhiping Liu Zhiqiang Geng Zhongqiu Zhao Zhuhong You


Zhengzhou Normal University, China Anhui University, China East China Normal University, China Kunming University of Science and Technology, China University of Texas at Dallas, USA Huazhong University of Science and Technology, China Taiyuan University of Science and Technology, China Shandong University, China Beijing University of Chemical Technology, China Hefei University of Technology, China Northwestern Polytechnical University, China

Contents – Part I

Evolutionary Computation and Learning

A Region Convergence Analysis for Multi-mode Stochastic Optimization Based on Double-Well Function . . . 3
Guosong Yang, Peng Wang, and Xinyu Yin

ADMM with SUSLM for Electric Vehicle Routing Problem with Simultaneous Pickup and Delivery and Time Windows . . . 15
Fei-Long Feng, Bin Qian, Rong Hu, Nai-Kang Yu, and Qing-Xia Shang

A Nested Differential Evolution Algorithm for Optimal Designs of Quantile Regression Models . . . 25
Zhenyang Xia, Chen Xing, and Yue Zhang

Real-Time Crowdsourced Delivery Optimization Considering Maximum Detour Distance . . . 37
Xianlin Feng, Rong Hu, Nai-Kang Yu, Bin Qian, and Chang Sheng Zhang

Improving SHADE with a Linear Reduction P Value and a Random Jumping Strategy . . . 47
Yanyun Zhang, Guangyu Chen, and Li Cheng

Detecting Community Structure in Complex Networks with Backbone Guided Search Algorithm . . . 59
Rong-Qiang Zeng, Li-Yuan Xue, and Matthieu Basseur

Swarm Intelligence and Optimization

Large-Scale Multi-objective Evolutionary Algorithms Based on Adaptive Immune-Inspirated . . . 71
Weiwei Zhang, Sanxing Wang, Chao Wang, Sheng Cui, Yongxin Feng, Jia Ding, and Meng Li

Hybrid Hyper-heuristic Algorithm for Integrated Production and Transportation Scheduling Problem in Distributed Permutation Flow Shop . . . 85
Wenbo Chen, Bin Qian, Rong Hu, Sen Zhang, and Yijun Wang


A Task Level-Aware Scheduling Algorithm for Energy Consumption Constrained Parallel Applications on Heterogeneous Computing Systems . . . 97
Haodi Li, Jing Wu, Jianhua Lu, Ziyu Chen, Ping Zhang, and Wei Hu

An Energy-Conscious Task Scheduling Algorithm for Minimizing Energy Consumption and Makespan in Heterogeneous Distributed Systems . . . . . . . . . . 109 Wei Hu, Ziyu Chen, Jing Wu, Haodi Li, and Ping Zhang A Hyper-Heuristic Algorithm with Q-Learning for Distributed Permutation Flowshop Scheduling Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 Ke Lan, Zi-Qi Zhang, Bi Qian, Rong Hu, and Da-Cheng Zhang Robot Path Planning Using Swarm Intelligence Algorithms . . . . . . . . . . . . . . . . . . 132 Antanios Kaissar, Sam Ansari, Meshal Albeedan, Soliman Mahmoud, Ayad Turky, Wasiq Khan, Dhiya Al-Jumeily OBE, and Abir Hussain Deep Reinforcement Learning for Solving Multi-objective Vehicle Routing Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 Jian Zhang, Rong Hu, Yi-Jun Wang, Yuan-Yuan Yang, and Bin Qian Probability Learning Based Multi-objective Evolutionary Algorithm for Distributed No-Wait Flow-Shop and Vehicle Transportation Integrated Optimization Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 Ziqi Ding, Zuocheng Li, Bin Qian, Rong Hu, and Changsheng Zhang Hyper-heuristic Ant Colony Optimization Algorithm for Multi-objective Two-Echelon Vehicle Routing Problem with Time Windows . . . . . . . . . . . . . . . . . 168 Qiu-Yi Shen, Ning Guo, Rong Hu, Bin Qian, and Jian-Lin Mao A Task-Duplication Based Clustering Scheduling Algorithm for Heterogeneous Computing System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181 Ping Zhang, Jing Wu, Di Cheng, Jianhua Lu, and Wei Hu Hyper-heuristic Q-Learning Algorithm for Flow-Shop Scheduling Problem with Fuzzy Processing Times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194 Jin-Han Zhu, Rong Hu, Zuo-Cheng Li, Bin Qian, and Zi-Qi Zhang Hyper-heuristic Estimation of Distribution Algorithm for Green Hybrid Flow-Shop Scheduling and Transportation Integrated Optimization Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206 Ling Bai, Bin Qian, Rong Hu, Zuocheng Li, and Huai-Ping Jin


Improved EDA-Based Hyper-heuristic for Flexible Job Shop Scheduling Problem with Sequence-Independent Setup Times and Resource Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218 Xing-Han Qiu, Bin Qian, Zi-Qi Zhang, Zuo-Cheng Li, and Ning Guo Learning Based Memetic Algorithm for the Monocrystalline Silicon Production Scheduling Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229 Jianqun Gong, Zuocheng Li, Bin Qian, Rong Hu, and Bin Wang Learning Variable Neighborhood Search Algorithm for Solving the Energy-Efficient Flexible Job-Shop Scheduling Problem . . . . . . . . . . . . . . . . . 241 Ying Li, Rong Hu, Xing Wu, Bin Qian, and Zi-Qi Zhang A Q-Learning-Based Hyper-Heuristic Evolutionary Algorithm for the Distributed Flexible Job-Shop Scheduling Problem . . . . . . . . . . . . . . . . . . . 251 Fang-Chun Wu, Bin Qian, Rong Hu, Zi-Qi Zhang, and Bin Wang A Firefly Algorithm Based on Prediction and Hybrid Samples Learning . . . . . . . 262 Leyi Chen and Jun Li Fireworks Algorithm for Dimensionality Resetting Based on Roulette . . . . . . . . . 275 Senwu Yu and Jun Li Improved Particle Swarm Optimization Algorithm Combined with Reinforcement Learning for Solving Flexible Job Shop Scheduling Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288 Yi-Jie Gao, Qing-Xia Shang, Yuan-Yuan Yang, Rong Hu, and Bin Qian A Learning-Based Multi-Objective Evolutionary Algorithm for Parallel Machine Production and Transportation Integrated Optimization Problem . . . . . 299 Shurui Zhang, Bin Qian, Zuocheng Li, Rong Hu, and Biao Yang Q-Learning Based Particle Swarm Optimization with Multi-exemplar and Elite Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310 Haiyun Qiu, Bowen Xue, Qinge Xiao, and Ben Niu A Branch and Bound Algorithm for the Two-Machine Blocking Flowshop Group Scheduling Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322 Sen Zhang, Bin Qian, Rong Hu, Changsheng Zhang, and Kun Li Deep Reinforcement Learning for Solving Distributed Permutation Flow Shop Scheduling Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333 Yijun Wang, Bin Qian, Rong Hu, Yuanyuan Yang, and Wenbo Chen


Mobile Edge Computing Offloading Problem Based on Improved Grey Wolf Optimizer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343 Wenyuan Shang, Peng Ke, and Tao Zhou Runtime Analysis of Estimation of Distribution Algorithms for a Simple Scheduling Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356 Rui Liu, Bin Qian, Sen Zhang, Rong Hu, and Nai-Kang Yu Nonlinear Inertia Weight Whale Optimization Algorithm with Multi-strategy and Its Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365 Cong Song li, Feng Zou, and Debao Chen Sparrow Search Algorithm Based on Cubic Mapping and Its Application . . . . . . 376 Shuo Zheng, Feng Zou, and DeBao Chen Hyper-heuristic Three-Dimensional Estimation of Distribution Algorithm for Distributed Assembly Permutation Flowshop Scheduling Problem . . . . . . . . . 386 Xiao Li, Zi-Qi Zhang, Rong Hu, Bin Qian, and Kun Li An Improved Genetic Algorithm for Vehicle Routing Problem with Time Windows Considering Temporal-Spatial Distance . . . . . . . . . . . . . . . . . . . . . . . . . . 397 Juan Wang and Jun- qing Li A Reinforcement Learning Method for Solving the Production Scheduling Problem of Silicon Electrodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410 Yu-Fang Huang, Rong Hu, Xing Wu, Bin Qian, and Yuan-Yuan Yang Information Security CL-BOSIC: A Distributed Agent-Oriented Scheme for Remote Data Integrity Check and Forensics in Public Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425 Xiaolei Zhang, Huilin Zheng, Qingni Shen, and Zhonghai Wu Zeroth-Order Gradient Approximation Based DaST for Black-Box Adversarial Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442 Yanfei Zhu, Yaochi Zhao, Zhuhua Hu, Xiaozhang Liu, and Anli Yan A Dynamic Resampling Based Intrusion Detection Method . . . . . . . . . . . . . . . . . . 454 Yaochi Zhao, Dongyang Yu, and Zhuhua Hu Information Security Protection for Online Education Based on Blockchain . . . . 466 Yanran Feng Blockchain-Based Access Control Mechanism for IoT Medical Data . . . . . . . . . . 475 Tianling Yang, Shuanglong Huang, Haiying Ma, and Jiale Guo


LNPB: A Light Node for Public Blockchain with Constant-Size Storage . . . . . . . 487 Min Li Collaborative Face Privacy Protection Method Based on Adversarial Examples in Social Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499 Zhenxiong Pan, Junmei Sun, Xiumei Li, Xin Zhang, and Huang Bai Duty-Based Workflow Dynamic Access Control Model . . . . . . . . . . . . . . . . . . . . . 511 Guohong Yi and Bingqian Wu A High-Performance Steganography Algorithm Based on HEVC Standard . . . . . 522 Si Liu, Yunxia Liu, Guoning Lv, Cong Feng, and Hongguo Zhao Using N-Dimensional Space Coding of Transform Coefficients for Video Steganography in H.265/HEVC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 532 Hongguo Zhao, Yunxia Liu, Yonghao Wang, Hui Liu, and Zhenghang Zhao CSMPQ: Class Separability Based Mixed-Precision Quantization . . . . . . . . . . . . 544 Mingkai Wang, Taisong Jin, Miaohui Zhang, and Zhengtao Yu Open-World Few-Shot Object Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556 Wei Chen and Shengchuan Zhang Theoretical Computational Intelligence and Applications RWA Optimization of CDC-ROADMs Based Network with Limited OSNR . . . . 571 Pengxuan Yuan, Yongxuan Lai, Liang Song, and Feng He Resource Optimization for Link Failure Recovery of Software-Defined Optical Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 581 Yongxuan Lai, Pengxuan Yuan, Liang Song, and Feng He Multi-view Coherence for Outdoor Reflective Surfaces . . . . . . . . . . . . . . . . . . . . . 593 Shuwen Niu, Jingjing Tang, Xingyao Lin, Haoyang Lv, Liang Song, and Zihao Jian MOVNG: Applied a Novel Sparse Fusion Representation into GTCN for Pan-Cancer Classification and Biomarker Identification . . . . . . . . . . . . . . . . . . 604 Xin Chen, Yun Tie, Fenghui Liu, Dalong Zhang, and Lin Qi A Blockchain-Based Network Alignment System for Power Equipment Data Inconsistency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 616 Yuxiang Cai, Xin Jiang, Qifan Yang, Wenhao Zhao, and Chen Lin


TrafficSCINet: An Adaptive Spatial-Temporal Graph Convolutional Network for Traffic Flow Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 628 Kai Gong, Shiyuan Han, Xiaohui Yang, Weiwei Yu, and Yuanlin Guan CLSTGCN: Closed Loop Based Spatial-Temporal Convolution Networks for Traffic Flow Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 640 Hao Li, Shiyuan Han, Jinghang Zhao, Yang Lian, Weiwei Yu, and Xixin Yang A Current Prediction Model Based on LSTM and Ensemble Learning for Remote Palpation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 652 Fuyang Wei, Jianhui Zhao, and Zhiyong Yuan Multi-step Probabilistic Load Forecasting for University Buildings Based on DA-RNN-MDN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 662 Lei Xu, Liangliang Zhang, Runyuan Sun, Na Zhang, Peihua Liu, and Pengwei Guan A Quantum Simulation Method with Repeatable Steady-State Output Using Massive Inferior Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 674 Guosong Yang, Peng Wang, Gang Xin, and Xinyu Yin Metal Oxide Classification Based on SVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 685 Kai Xiao, Zhuo Wang, and Wenzheng Bao Food Image Classification Based on Residual Network . . . . . . . . . . . . . . . . . . . . . . 695 Xueyan Yang, Jinping Sun, Zhuo Wang, and Wenzheng Bao BYOL Network Based Contrastive Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 705 Xuehao Chen, Weidong Zhou, Jin Zhou, Yingxu Wang, Shiyuan Han, Tao Du, Cheng Yang, and Bowen Liu Deep Multi-view Clustering Based on Graph Embedding . . . . . . . . . . . . . . . . . . . . 715 Chen Zhang, Weidong Zhou, Jin Zhou, Yingxu Wang, Shiyuan Han, Tao Du, Cheng Yang, and Bowen Liu Graph-Based Short Text Clustering via Contrastive Learning with Graph Embedding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 727 Yujie Wei, Weidong Zhou, Jin Zhou, Yingxu Wang, Shiyuan Han, Tao Du, Cheng Yang, and Bowen Liu An Improved UAV Detection Method Based on YOLOv5 . . . . . . . . . . . . . . . . . . . 739 Xinfeng Liu, Mengya Chen, Chenglong Li, Jie Tian, Hao Zhou, and Inam Ullah


Large-Scale Traffic Signal Control Based on Integration of Adaptive Subgraph Reformulation and Multi-agent Deep Reinforcement Learning ... 751
  Kai Gong, Qiwei Sun, Xiaofang Zhong, and Yanhua Zhang
An Improved AprioriAll Algorithm Based on Tissue-Like P for Sequential Pattern Mining ... 763
  Xiaojun Ma and Xiyu Liu
Joint Spatiotemporal Collaborative Relationship Network for Skeleton-Based Action Recognition ... 775
  Hao Lu and Tingwei Wang
A Digital Human System with Realistic Facial Expressions for Friendly Human-Machine Interaction ... 787
  Anthony Condegni, Weitian Wang, and Rui Li
Author Index ... 799

Evolutionary Computation and Learning

A Region Convergence Analysis for Multi-mode Stochastic Optimization Based on Double-Well Function

Guosong Yang1,2,3, Peng Wang2,3(B), and Xinyu Yin2

1 Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, China
2 School of Computer Science and Engineering, Southwest Minzu University, Chengdu, China
3 University of Chinese Academy of Sciences, Beijing, China
[email protected]

Abstract. In multi-mode heuristic optimization, the output fitness of an algorithm cannot converge to the global optimal value if its search points have not converged to the region with optimal solution. Generally, more samplings and more converged points in this optimal region may result in a higher probability of fitness convergence toward the optimal value. However, studies focus mainly on fitness convergence rather than region convergence (RC) of search points. This is partly because, for most objective functions, it is usually hard to track the region of search points in dynamic optimization. To remedy this, a novel analysis method is proposed using the double-well function (DWF), since it has a unique fitness landscape that makes it convenient to trace these points. First, a mathematical analysis of the DWF is given to explore its landscape. Then, RC is defined and discussed using DWF. On these bases, experiments are conducted and analyzed using Particle Swarm Optimization (PSO), and much useful information about its RC is revealed. Besides, this method can be used to analyze the RC of similar optimization algorithms as well. Keywords: Region Convergence · Multi-mode Optimization · Double-well Function · Heuristic Optimization · Procedure-oriented Testing

1 Introduction

Heuristic optimization algorithms [1] are designed to locate the global optimal solution of the objective function f(x) in a given feasible solution space. With a limited computing budget, they explore the solution space by stochastic search and output a deeply optimized solution with acceptable accuracy and time cost. The function value of the sampled point (namely fitness) is the only requirement in heuristic optimization, and these algorithms serve as important solutions to many problems [2, 3]. In practice, it's inconvenient to evaluate solution quality by the distance between the global optimal solution and the sampled point. Instead, fitness convergence is widely employed to study algorithm convergence. First, many benchmark functions with known global minima are employed to test fitness convergence [4]. Researchers usually use the


error between the algorithm output and the global minimum to evaluate the solution. In the CEC benchmark tests [5, 6], scholars introduced the convergence tendency of the best fitness and defined unified testing methods, giving insight into the converging dynamics. Moreover, some algorithms' convergences are discussed mathematically. Trelea pointed out that PSO is not a global convergence algorithm [7], and the global fitness convergences of quantum-behaved particle swarm optimization and the bat algorithm were also mathematically analyzed [8, 9].

However, in multi-mode optimization, these methods mainly concentrated on fitness convergence, and little attention has been paid to the search points' convergence towards the region containing the global optimal solution (referred to as the global optimal region hereafter). Actually, if an algorithm's search points have not converged to the global optimal region, the algorithm cannot realize fitness convergence to the global optimum. Therefore, RC to the global optimal region is a necessary but not sufficient condition for fitness convergence. Moreover, for a given algorithm, with more search points converged to the optimal region, more explorations are likely to be carried out there, and a greater probability of global fitness convergence would generally be attained, so timely RC from a local optimal region to the global one should be encouraged. However, too fast RC may lead to severe structural bias [10], wrongly identify a local optimal region as the global optimal one, and cause premature convergence. Thus, RC plays a vital role, and its further study is necessary.

To investigate this important feature, a special objective function is introduced as an RC analysis tool in Sect. 2, with a brief discussion to lay the foundation for this study. In Sect. 3, definitions and assumptions about RC are made for further analysis. Experiments are conducted and analyzed in Sect. 4, revealing the dynamics of RC in multi-mode optimization. Lastly, conclusions are given in Sect. 5.

2 Double-Well Function

Most benchmark functions [5, 6] are designed for performance-oriented tests and have overly complicated landscapes, which makes RC analysis difficult. Thus, the DWF [4], whose regular fitness landscape makes a point's jump between different areas easy to track, is used to analyze the search points' RC:

f(x) = h(x² − l²)²/l⁴ + kx                                                  (1)

where k ≥ 0 is the linear factor, l > 0 is the width factor, and h > 0 is the height factor. According to calculus and the Shenjin Formula [4, 11], formula (1) can be discussed according to the value of the discriminant Δ = 27l²k² − 64h².

(a) If Δ < 0, then formula (1) has its global minimum, local maximum, and local minimum at x_1 = −2l·cos(θ/3)/√3, x_2 = l·(cos(θ/3) − √3·sin(θ/3))/√3, and x_3 = l·(cos(θ/3) + √3·sin(θ/3))/√3, respectively, where θ = arccos(3√3·lk/(8h)). Moreover, f(x) decreases in (−∞, x_1) ∪ (x_2, x_3) and increases in (x_1, x_2) ∪ (x_3, ∞). In this case, h mainly determines the fluctuation of f(x), l plays a decisive role in defining the distance between the two minima, and the difference between the two minima is mainly influenced by h and k.


If k = 0, then f(x) is symmetric about x = 0. If k > 0, f(x) is asymmetric.

(b) If Δ ≥ 0, then f(x) has only one minimum and is not a DWF.

The optimization difficulty of the DWF can be increased by extending the dimension, and the corresponding n-D DWF is

F_n(x) = F_n(x_1, ..., x_n) = Σ_{i=1}^{n} f_i(x_i) = Σ_{i=1}^{n} [h_i(x_i² − l_i²)²/l_i⁴ + k_i x_i]      (2)

where n ∈ N⁺. In this case, x_i1, x_i2, x_i3, h_i, l_i, and k_i denote the corresponding factors in the i-th dimension. The Taylor expansion of F_n(x) at x_0 is

F_n(x) = F_n(x_0) + [∂F_n(x)/∂x_1, ..., ∂F_n(x)/∂x_n]|_{x_0} (dx_1, ..., dx_n)ᵀ + ½ (dx_1, ..., dx_n) A (dx_1, ..., dx_n)ᵀ + ...      (3)

where

A = [∂²F_n(x)/(∂x_i ∂x_j)]_{n×n} evaluated at x_0                           (4)

is the Hessian matrix. Since ∂²F_n(x)/(∂x_i ∂x_j) = 0 when i ≠ j, A turns out to be a diagonal matrix. If (∂F_n(x)/∂x_1, ..., ∂F_n(x)/∂x_n)|_{x_0} ≠ 0, then F_n(x_0) is not a minimum or maximum. If (∂F_n(x)/∂x_1, ..., ∂F_n(x)/∂x_n)|_{x_0} = 0:

(a) and A is a negative definite matrix, then F_n(x_0) is a local maximum. Thus, if and only if all Δ_i < 0 and x_2 = (x_12, ..., x_n2), F_n(x) attains the local maximum

F_n(x)_lmax = F_n(x_2) = F_n(x_12, ..., x_n2) = Σ_{i=1}^{n} f_i(x_i2)       (5)

(b) and A is a positive definite matrix, then F_n(x_0) is a local minimum. Thus, if Δ_i < 0 and x_i = x_i1 or x_i = x_i3 is satisfied in each dimension, F_n(x) attains a local minimum

F_n(x)_lmin = F_n(x_1k, ..., x_nk) = Σ_{i=1}^{n} f_i(x_ik)                  (6)

where k = 1 or 3. Obviously, F_n(x) attains the global minimum at x = (x_11, ..., x_n1) and the largest local minimum at x = (x_13, ..., x_n3).

(c) and A is an indefinite matrix, then F_n(x_0) is not a minimum or maximum.
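To make the landscape concrete, the following is a minimal Python sketch (our own illustration, not code from the paper; all function and variable names are assumptions) of the 1-D DWF in formula (1) and its n-D extension in formula (2), which also enumerates the 2ⁿ candidate local minima implied by formulas (5) and (6) by locating the two wells of each dimension numerically.

```python
import numpy as np
from itertools import product

def dwf_1d(x, h, l, k):
    """1-D double-well function, formula (1): f(x) = h*(x^2 - l^2)^2 / l^4 + k*x."""
    return h * (x**2 - l**2) ** 2 / l**4 + k * x

def dwf_nd(x, h, l, k):
    """n-D DWF, formula (2): sum of independent 1-D DWFs over the dimensions."""
    x, h, l, k = map(np.asarray, (x, h, l, k))
    return float(np.sum(h * (x**2 - l**2) ** 2 / l**4 + k * x))

def candidate_minima(h, l, k, grid=100001):
    """Approximate the two per-dimension well minima numerically and enumerate
    the 2^n candidate local minima of the n-D DWF (cf. formulas (5)-(6))."""
    mins_per_dim = []
    for hi, li, ki in zip(h, l, k):
        xs = np.linspace(-2 * li, 2 * li, grid)
        fs = dwf_1d(xs, hi, li, ki)
        left = xs[xs < 0][np.argmin(fs[xs < 0])]     # minimum of the left well
        right = xs[xs >= 0][np.argmin(fs[xs >= 0])]  # minimum of the right well
        mins_per_dim.append((left, right))
    return [np.array(p) for p in product(*mins_per_dim)]

if __name__ == "__main__":
    # 2-D setting used later in Sect. 4.2: h_i = 12, l_i = 20, k_i = 0.002.
    h, l, k = [12.0, 12.0], [20.0, 20.0], [0.002, 0.002]
    for p in candidate_minima(h, l, k):
        print(p, dwf_nd(p, h, l, k))
```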

3 Region Convergence—Definition and Discussion

In multi-mode optimization, search behaviors can be roughly divided into two types. First, they search the solution space and locate the potential optimal region from a non-optimal one. Generally, this kind of search exists widely in the initial stage, and algorithms are designed to employ large-scale search in this process; they are supposed to search the entire space sufficiently to avoid premature convergence and structural bias [10]. Second, they focus on locating the potential global optimal solution by sampling inside a specific


local optimal region, consuming much computing resource there to improve the output accuracy. Dominated by inner-region search, this period is usually similar to a uni-mode optimization, though limited cross-region searches may still exist. In practice, these processes are mixed with each other, and it is hard to distinguish them completely, posing challenges to RC analysis. For distinct algorithms, the RC processes also differ. To track the cross-region search and analyze the RC, definitions and assumptions are made as follows based on the DWF.

Definition 1. For an n-D DWF defined by formula (2), suppose k_i ≥ 0, l_i > 0, h_i > 0, and Δ_i < 0 are satisfied in each dimension, and the global minimum, local maximum, and local minimum of f_i(x) in the i-th dimension are x_i1, x_i2, and x_i3, respectively. A = {x_1l < x_1 < x_1u, ..., x_nl < x_n < x_nu} ⊂ Rⁿ is an interval. For each dimension i, if (x_il, x_iu) ∩ [x_i2 − 2l_i, x_i2] = [x_i2 − 2l_i, x_i2) or (x_il, x_iu) ∩ [x_i2, x_i2 + 2l_i] = (x_i2, x_i2 + 2l_i] is satisfied, then A is a local minimum region of F_n(x). Specifically, if (x_il, x_iu) ∩ [x_i2 − 2l_i, x_i2] = [x_i2 − 2l_i, x_i2) is satisfied for each dimension, then A is the global minimum region of F_n(x).

Definition 1 ensures one and only one local minimum in a local minimum region, and in this interval f_i(x) of each dimension has a downward convex curve. F_n(x) has 2ⁿ local minima. By comparing the coordinates of the sampled points in each dimension, one can determine the present local minimum region, track the point jumps between different regions, and investigate the RC towards the global minimum.

Definition 2. For a stochastic optimization algorithm used to optimize the DWF in Definition 1, if A is a local minimum region and the corresponding local minimum is x_i, then the convergence of the algorithm's sampling points toward A is defined as the RC of A, or the RC of the local minimum x_i. Specifically, if A is the global minimum region, then this convergence is defined as the RC of the global minimum.

For an ideal optimization, a sampling point should converge to the global optimal region as the iteration time t → ∞. Thus, in a population-based stochastic optimization, we can record the number of sampling points in a local minimum region to study its RC, and the convergence dynamics can be revealed by some independent


and repeated runs. To evaluate the optimization budget, we make two assumptions as follows.

Assumption 1. For the t-th optimization iteration of a population-based stochastic optimization algorithm, the optimization computing resource allocated to region A is proportional to the number of search points sampled within and to A.

For some population-based optimization algorithms, search points only take up a part of the whole computing resource. However, they generally still dominate the computing budget allocation, and different points share similar computing resources. Hence, Assumption 1 roughly describes the assignment of the computing budget.

Assumption 2. For a population-based stochastic optimization algorithm, more computing resource allocated to the global minimum region leads to a higher probability of fitness convergence towards the global minimum with good accuracy.

The multi-mode optimization result is influenced by the initial condition, stochastic factors, and the algorithm dynamics. With massive independent repetitions of optimization, the effect of stochastic factors can be weakened; in this case, global convergence is ensured by massive samplings driven by the computing resource and optimization dynamics rather than by a casual sampling into the neighborhood of the global minimum. Assumption 2 emphasizes the importance of allocating a sufficient computing budget to the global optimal region to achieve global convergence.
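As a concrete illustration of Definition 1 and Assumption 1, the following is a minimal Python sketch (our own illustration, not the authors' code; names such as region_code are assumptions) that assigns each sampled point to one of the 2ⁿ local minimum regions by comparing its coordinates with x_i2 in every dimension, and counts the points per region for one swarm snapshot.

```python
import numpy as np
from collections import Counter

def region_code(point, x2):
    """Binary region code per Definition 1: dimension i contributes bit 1 if the
    point lies on the larger-value side of the local maximum x_i2, else bit 0."""
    code = 0
    for i, (xi, xi2) in enumerate(zip(point, x2)):
        if xi > xi2:
            code |= 1 << i
    return code

def count_regions(population, x2):
    """Number of sampling points currently located in each local minimum region."""
    return Counter(region_code(p, x2) for p in population)

# Example: 2-D DWF with x_12 = x_22 = 0; region 0 contains the global minimum when k_i > 0.
population = np.random.uniform(-40, 40, size=(36, 2))   # one swarm snapshot
print(count_regions(population, x2=[0.0, 0.0]))
```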

4 Experiment and Result Analysis

4.1 Experiment Environment and Basic Setting

We set the optimization interval for dimension i as [x_i2 − 2l_i, x_i2 + 2l_i]. With a tiny positive k_i, f_i(x_i) has two local minima near (−l_i, 0) and (l_i, 0), it is approximately symmetric about the y-axis, and its landscape is rugged only within this interval. To simplify the discussion, we divided this interval into two parts at x_i = x_i2 and named the regions by their value in binary: the part with smaller coordinate values is encoded as 0 and the other part as 1, with the code of dimension i placed in the i-th position from the right. For a point (x_1, x_2) in the 2-D case, if x_1 < x_12 and x_2 > x_22, then this point is in region (10)₂, or area 2. Experiments with different dimensions and initial distributions were designed, and five independent runs were conducted in each case. The experiments were run on a PC with an i7-10750H CPU at 2.60 GHz and Windows 10, and the experiment data was processed and the figures were plotted using Python 3.7.

PSO with a time-varying weight [12] was employed as the testing algorithm. The maximum number of function evaluations (maxFEs) was 10000 times the dimension. We set the initial weight wi = 0.9, the end weight we = 0.4, and the weight in the t-th iteration to w(t) = wi − (wi − we)·t/maxFEs. The acceleration coefficients were c1 = c2 = 2.0, and the maximum velocity in dimension i was 0.8·l_i. Lastly, the population size and the initial positions were set differently according to the optimization problem.
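For reference, here is a compact Python sketch of a PSO with the linearly decreasing inertia weight described above (a simplified re-implementation in the spirit of [12]; the parameter values follow this subsection, while the helper names and structure are our own assumptions rather than the authors' actual test code).

```python
import numpy as np

def pso_time_varying_weight(objective, dim, lower, upper, pop=36,
                            wi=0.9, we=0.4, c1=2.0, c2=2.0, max_fes=None, vmax=None):
    """PSO with inertia weight w = wi - (wi - we) * used_FEs / maxFEs."""
    rng = np.random.default_rng(0)
    max_fes = max_fes or 10000 * dim
    iters = max_fes // pop
    # vmax = 0.8 * l_i; the search range per dimension is 4 * l_i wide.
    vmax = vmax if vmax is not None else 0.8 * (upper - lower) / 4.0
    x = rng.uniform(lower, upper, (pop, dim))
    v = np.zeros((pop, dim))
    pbest, pbest_f = x.copy(), np.apply_along_axis(objective, 1, x)
    gbest = pbest[np.argmin(pbest_f)].copy()
    for t in range(iters):
        w = wi - (wi - we) * (t * pop) / max_fes          # time-varying inertia weight
        r1, r2 = rng.random((pop, dim)), rng.random((pop, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -vmax, vmax)                       # velocity limit
        x = np.clip(x + v, lower, upper)
        f = np.apply_along_axis(objective, 1, x)
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[np.argmin(pbest_f)].copy()
    return gbest, float(pbest_f.min())
```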


4.2 2-D Experiment

Here, the population size was 36. To limit the search space to roughly −40 < x_1, x_2 < 40, l_i was set to 20, and h_i was set to 12.00 to create a high barrier between the minima and increase the optimization difficulty. Moreover, two kinds of initialization were considered. First, to avoid structural bias caused by the initial positions, 9 search points were initialized in each region with random positions, and k_i was set to 0.002 since this value makes the fitness gaps among the four minima very small. Second, to study the dynamics of jumping out of a local minimum region, all sampling points were initialized in x_i3 − 0.1l_i < x_1, x_2 < x_i3 + 0.1l_i, and k_i was set to 0.01 since this initialization is much more challenging. Note that all Δ_i < 0 here. After the experiment, the data was analyzed from different perspectives as follows.

2D Distribution of All Sampling Points for Distinct Local Optimal Regions. In Fig. 1, with global initialization, the sampling point distributions concentrate on the linear areas determined by some pairs of the local minima, the densest distribution is located near the global minimum or a local minimum, and the distributions in regions 1 and 2 are greatly different despite their symmetric landscapes. However, according to the landscape of the DWF, an ideal distribution should focus on the four local minimum regions, with the densest distribution near (−20, −20), denser and similar distributions near (20, −20) and (−20, 20), and sparser points near (20, 20). These indicate that the RC of PSO is clearly biased [10]. The velocity update equation of PSO emphasizes searching the linear part determined by the personal best point and the global best point [12]; thus, as per Assumption 1, the RC of PSO is apt to be limited to these linear regions, causing an irrational allocation of computing resource and potential RC towards a local minimum region. As shown in Figs. 1c, h, and i, if the global best is tracked in a local minimum region in the early stage, then it is inclined to attract all other search points to its local minimum region, risking the loss of global fitness convergence according to Assumption 2. These suggest that the employed PSO is handicapped by the simple topology of its search net. With global initialization, the four successful optimizations share similar kernel density estimation curves with the largest value near (−20, −20); the only failure, in Fig. 1c, has the largest kernel density estimation near (−20, 20), and its best output solution is also located near there. This matches well with Assumption 2. In the second case, the distributions also show obvious linear patterns and structural bias. Moreover, due to the biased initialization, the distributions in area 3 are denser, and more computing resources are allocated to the right half.

Dynamic Area Stack Percentage for Distinct Local Optimal Regions. In Fig. 2, with global initialization, the sampling point percentages all converged to the global minimum region except for the third run. However, for the four successful trials, the RC tendencies were quite different. As shown by the sharp changes at the beginning, once a point sampled a location with the smallest fitness near a local minimum, it became the global best and induced RC toward that region for the remaining points, tending to cause structural bias according to Assumption 1. Later, in Figs. 2a and e, the proportion of a local minimum region increased dramatically, followed by stable RC toward the global minimum.
This may be caused by the region change of the sampled best points. For Figs. 2b and d, the


(a)–(e) Sampling points' distributions of the five runs with global initialization; (f)–(j) sampling points' distributions of the five runs with initial points near the largest local minimum.

Fig. 1. Distribution of all sampling points for asymmetric DWF optimization using PSO with different initializations. Each dot represents a sampling point, the kernel density estimation curve for each dimension is placed on the right/upper edge, and the distribution density is shown by contour.

early global minimum RC was easily realized: the global best point located the global minimum region within a short time. However, in Fig. 2c, PSO failed to obtain the global minimum RC. A global best point with very small fitness was obtained in region 2 early on, and most points were attracted to that area, leading to clear premature convergence. Consequently, very little computing budget was allocated to the global minimum region, making it difficult to realize global fitness convergence according to Assumption 2.

As shown in Figs. 2f to j, for the second initialization, other regions gradually took larger percentages as the optimization continued. However, the structural bias was obvious, since large percentage differences between the symmetric regions (namely regions 1 and 2) occurred in Figs. 2f, h, and i. Moreover, the RCs towards the global minimum were very different. In Figs. 2f and i, a local minimum region first took a large proportion from the initial area, and then the points gradually converged to the global minimum region. In Figs. 2g and j, by contrast, very few points can be found in regions 1 and 2, since most sampling points realized global minimum RC directly from region 3. In Fig. 2h, region 1 dominated the whole process after the first stage, and only a small percentage can be found in the global minimum area. Thus, little computing resource was allocated to the optimal region, and it is difficult to achieve global convergence there, as per the assumptions above.

Cross-Region Sampling. To further investigate the cross-region sampling in the four areas, we recorded every sampling in the two experiments and define the matrix in formula (7), where j_mn(t) denotes the total number of samplings that jump from region m to region n in the first t iterations:

J(t) = [ j_00(t)  j_01(t)  j_02(t)  j_03(t)
         j_10(t)  j_11(t)  j_12(t)  j_13(t)
         j_20(t)  j_21(t)  j_22(t)  j_23(t)
         j_30(t)  j_31(t)  j_32(t)  j_33(t) ]                               (7)

In Fig. 3, j_mn(t) is plotted in the (m + 1)-th row and (n + 1)-th column. For both initializations, inner-region sampling dominated


(a)–(e) Dynamic area stack percentages of sampling points in the five runs with global initialization; (f)–(j) dynamic area stack percentages in the five runs with initial points near the largest local minimum.

Fig. 2. Dynamic area stack percentage of sampling points in four local minimum regions for different initializations.

the optimization process, and cross-region sampling only took small proportions. Quantitatively, the numbers of inner-region samplings ranged approximately from 16000 to 18000, taking percentages higher than 80%. This indicates that, for both cases, PSO allocated most of the computing budget to the inner-region search in a potential global minimum area to improve optimization accuracy, and limited computing resources were assigned to the cross-region search to achieve RC, which explains the premature convergence of PSO. According to the assumptions above, to ensure the global minimum RC, more cross-region samplings are encouraged, especially in the initial stage. Moreover, for both initializations, the symmetric tendency between j_mn(t) and j_nm(t) was clear, indicating that the cross-region sampling is, to some extent, an undirected back-and-forth process. The two cases also differed in a few aspects. The numbers of samplings between the same two regions could differ considerably between the two cases; this was partly influenced by the distinct initializations, so initialization is also important for RC.
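The cumulative jump-count matrix J(t) in formula (7) can be maintained directly from the region codes of consecutive samples. Below is a small Python sketch of this bookkeeping (our own illustration; it reuses the region_code helper sketched earlier and assumes one position sample per particle per iteration).

```python
import numpy as np

def update_jump_matrix(J, prev_codes, new_codes):
    """Accumulate jumps: J[m, n] counts samplings that moved from region m to region n,
    including inner-region samplings on the diagonal (m == n)."""
    for m, n in zip(prev_codes, new_codes):
        J[m, n] += 1
    return J

# Example for the 2-D case (4 regions): accumulate J(t) over the run.
n_regions = 4
J = np.zeros((n_regions, n_regions), dtype=int)
prev = [0, 3, 2, 1]          # illustrative region codes of the swarm at iteration t-1
new = [0, 0, 2, 3]           # illustrative region codes at iteration t
J = update_jump_matrix(J, prev, new)
print(J)
```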

4.3 10-D Experiment

Here, the population size was 60. Similarly, we set h_i = 12 and l_i = 20. Considering the increased optimization difficulty in 10-D, we set k_i = 0.01. Optimizations were carried out with two types of initialization. To weaken the influence of initial bias, for dimension i, all points were initialized in (x_i2 − 0.1l, x_i2 + 0.1l) in the first test; they were then initialized in (x_i3 − 0.1l, x_i3 + 0.1l) to study global minimum RC when all points are


Fig. 3. The total number of sampling points’ jump between different regions in optimization. The x/y-axis denotes the iteration/number. The number of jumps from region m to region n is shown in the (m + 1)-th row and (n + 1)-th column. A solid (dashed) line illustrates an optimization with global initialization (initialization near the largest local minimum).

limited in the region with the largest local minimum. Five independent optimizations were conducted in each case, and the results are analyzed as follows.

Number of Sampling Points' Components that have Converged to the Global Minimum Interval. Here, the number of sampling points' components that have converged to the global minimum interval (x_i2 − 2l_i, x_i2) was recorded and analyzed. For the first initialization, shown in Fig. 4a, each run started with a number between 180 and 310, and, as shown by the sharp curves, about 2/3 of the components converged to the optimal interval in about 20 iterations, indicating fast RC. However, many search points reached local minimum RC rather than global minimum RC, since they first stabilized near a number smaller than 600. With a period of further search, these numbers repeatedly grew and dropped, suggesting that PSO was conducting cross-region sampling. Then they gradually reached another stable state, in which PSO converged to another local minimum region with better fitness. This phenomenon mainly took place in the first 600 iterations; thereafter,


the curves remained stable. This suggests that the remaining computing budget was used to search for the minimum within a given region to improve accuracy. Overall, about 1/3 of the computing budget was used for the RC.

(a) Global minimum RC with initial points near the local maximum in the center.

(b) Global minimum RC with initial points near the largest local minimum.

Fig. 4. Two dynamic processes of global minimum RC with different initial conditions. Each curve illustrates one trial. If a dimension of any sampling point converges to (jumps out of) its global minimum interval (x i2 − 2l i , x i2 ), the corresponding curve increases (decreases) by one.

For the local minimum initialization shown in Fig. 4b, all the numbers started from 0. As the curves increased after a few iterations, many points jumped out of the largest local minimum region and started to converge toward the global minimum area. Short stable states then followed near the 50th iteration, and the convergence rates were limited; this may be explained by the limited population diversity and Assumption 1. Similar to the first experiment, all curves then increased sharply, suggesting fast RC. Finally, one run failed to realize global minimum RC; it only found the global minimum interval in 9 dimensions. Moreover, the RC ended within the first 600 iterations, indicating a similar computing budget allocation.

Sampling Points' Jump to the Global Minimum Interval. Here we plot the number of component jumps from (x_i2, x_i2 + 2l_i) to (x_i2 − 2l_i, x_i2). In optimization, each jump denotes that one dimension of a search point shifted from the local minimum interval to the global one. The situation with the first initialization is shown in Fig. 5a. Matching well with the analysis above, the curves first increased rapidly. Then the increases slowed down, and a few turning points can be seen, indicating new search rates towards the global minimum region. Finally, every curve reached a stable state within the first third of the iterations. For the second initialization, shown in Fig. 5b, the jump numbers are clearly larger due to the more challenging initialization. Besides, it is interesting that the RC was realized a little faster here, since every curve enters a stable state in about 500 iterations.
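The curves in Figs. 4 and 5 can be produced from the swarm positions alone. The following Python sketch (our own illustration, not the authors' code) counts, for one iteration, how many components of all sampling points lie in the per-dimension global minimum interval (x_i2 − 2l_i, x_i2).

```python
import numpy as np

def converged_components(population, x2, l):
    """Count components lying in the global minimum interval (x_i2 - 2*l_i, x_i2)
    over all sampling points and all dimensions."""
    population, x2, l = np.asarray(population), np.asarray(x2), np.asarray(l)
    in_interval = (population > x2 - 2 * l) & (population < x2)
    return int(in_interval.sum())

# Example: 60 particles in 10-D with x_i2 = 0 and l_i = 20 in every dimension.
swarm = np.random.uniform(-40, 40, size=(60, 10))
print(converged_components(swarm, x2=np.zeros(10), l=np.full(10, 20.0)))
```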


(a) Global minimum RC with initial points near the local maximum in the center.


(b) Global minimum RC with initial points near the largest local minimum.

Fig. 5. The global minimum RC shown by the total jumps to the optimal interval; each curve illustrates an optimization. If a dimension of any sampling point jumps from its local minimum interval to the global minimum one, the corresponding curve increases by one.

5 Conclusion

To investigate the search points' dynamics of locating the potential global minimum region in multi-mode optimization, a new analysis method based on the DWF is proposed. Unlike many other performance-oriented benchmark tests, this method can track the RC in black-box optimization. In addition, using PSO, numerical experiments with distinct dimensions and initializations were performed to verify the method's effectiveness. The experimental results reveal important hidden dynamics of the RC, giving much useful information that cannot be obtained by ordinary performance-oriented tests. Moreover, this method can be extended to analyze other population-based algorithms and study their convergence behavior.

References

1. Yang, X.: Nature-Inspired Computation and Swarm Intelligence: Algorithms, Theory and Applications. Elsevier Ltd., Amsterdam (2020)
2. Dechanupaprittha, S., Jamroen, C.: Self-learning PSO based optimal EVs charging power control strategy for frequency stabilization considering frequency deviation and impact on EV owner. Sustain. Energy Grids Netw. 26, 100463 (2021)
3. Latifa, N.B., Aguili, T.: Optimization of coupled periodic antenna using genetic algorithm with Floquet modal analysis and MoM-GEC. Open J. Antennas Propag. 10(1), 1–15 (2022)
4. Wang, P., Yang, G.: Using double well function as a benchmark function for optimization algorithm. In: 2021 IEEE Congress on Evolutionary Computation, pp. 886–892. IEEE, New York (2021)
5. Liang, J., Qu, B., Suganthan, P.N.: Problem definitions and evaluation criteria for the CEC 2014 special session and competition on single objective real-parameter numerical optimization. Technical report. Zhengzhou University, Zhengzhou, China and Nanyang Technological University, Singapore (2013)
6. Awad, N.H., Ali, M.Z., Liang, J., Qu, B., Suganthan, P.N.: Problem definitions and evaluation criteria for the CEC 2017 special session and competition on single objective real-parameter numerical optimization. Technical report. Nanyang Technological University, Singapore, Jordan University of Science and Technology, Jordan, and Zhengzhou University, Zhengzhou, China (2016)


7. Cristian, I.T.: The particle swarm optimization algorithm: convergence analysis and parameter selection. Inf. Process. Lett. 85(6), 317–325 (2003)
8. Sun, J., Wu, X., Palade, V., Fang, W., Lai, C.-H., Xu, W.: Convergence analysis and improvements of quantum-behaved particle swarm optimization. Inf. Sci. 193, 81–103 (2012)
9. Chen, S., Peng, G., He, X., Yang, X.: Global convergence analysis of the bat algorithm using a Markovian framework and dynamical system theory. Expert Syst. Appl. 114, 173–182 (2018)
10. Kononova, A.V., Corne, D.W., Wilde, P.D., Shneer, V., Caraffini, F.: Structural bias in population-based algorithms. Inf. Sci. 298, 468–490 (2015)
11. Fan, S.: A new extracting formula and a new distinguishing means on the one variable cubic equation. Nat. Sci. J. Hainan Teach. Coll. 2(2), 91–98 (1989)
12. Shi, Y., Eberhart, R.: A modified particle swarm optimizer. In: 1998 IEEE International Conference on Evolutionary Computation Proceedings. IEEE World Congress on Computational Intelligence, pp. 69–73. IEEE, New York (1998)

ADMM with SUSLM for Electric Vehicle Routing Problem with Simultaneous Pickup and Delivery and Time Windows

Fei-Long Feng, Bin Qian(B), Rong Hu, Nai-Kang Yu, and Qing-Xia Shang

School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
[email protected]

Abstract. This paper studies the electric vehicle routing problem with simultaneous pickup and delivery and time windows (EVRPSPDTW). A novel alternating direction multiplier method with a sequential updating scheme of the Lagrangian multiplier is proposed to optimize EVRPSPDTW with the goal of minimizing cost. This method first decomposes the problem into a series of augmented Lagrangian submodels using augmented Lagrangian decomposition technology, and then solves each submodel with a labeling-setting algorithm. Finally, the method iteratively updates the subproblems and the Lagrangian multiplier through the sequential updating scheme of the Lagrangian multiplier. In benchmark experiments, the proposed method shows excellent performance and obtains tight lower bounds during the solution process.

Keywords: electric vehicle routing problem with simultaneous pickup and delivery and time windows · alternating direction multiplier method with sequential updating scheme of Lagrangian multiplier · labeling-setting algorithm

1 Introduction

In recent years, with the development of the electronic economy, the logistics industry has been facing severe challenges. To improve their competitiveness, logistics companies began to strengthen logistics management and reduce operating costs (transportation costs, delay compensation, etc.). Meanwhile, logistics companies began to reduce the use of fuel vehicles as fossil fuel prices rise and the environmental pressure from policies grows [1]. Electric vehicles have high energy efficiency and zero greenhouse gas emissions [2], and they have attracted wide attention from the logistics industry and academia. The electric vehicle routing problem with time windows (EVRPTW) was first proposed by Schneider [3]. EVRPTW is a variant of the classic vehicle routing problem. The goal of this problem is to find lower-cost vehicle routes that serve all customers while also considering the charging of electric vehicles along the routes. Because of the huge solution space of EVRPTW,


most research uses intelligent algorithms or heuristic algorithms to solve it. Montoya [4] developed a hybrid metaheuristic algorithm for the EVRP with a nonlinear charging function. Verma [5] developed an improved genetic algorithm to solve EVRPTW. Yindong [6] developed an improved estimation of distribution algorithm to solve the multi-compartment EVRP. The vehicle routing problem with simultaneous pickup and delivery (VRPSPD) considers customers that have both delivery and pickup demands: the vehicle needs to deliver goods from the depot to the customer and collect goods from the customer back to the depot. Wang [7] developed a parallel simulated annealing algorithm for VRPSPD. Park [8] studied the waiting strategy for VRPSPD and proposed to solve the problem using a genetic algorithm. Oztas [9] proposed a hybrid metaheuristic algorithm based on iterated local search to solve the vehicle routing problem with simultaneous pickup and delivery.

In this paper, we study the electric vehicle routing problem with simultaneous pickup and delivery and time windows (EVRPSPDTW), aiming at minimizing costs. Xu [10] gave a mathematical model of EVRPSPDTW and solved the problem with an adaptive large neighborhood search method. Yao [11] showed that the alternating direction multiplier method (ADMM) can be applied to the vehicle routing problem with time windows and can obtain better solutions. Due to the slow convergence rate of ADMM, Shen [12] proposed to improve the convergence of the algorithm through the Sequential Updating Scheme of the Lagrangian Multiplier (SUSLM). The main contributions of this paper are as follows: 1) This paper develops EVRPSPDTW, which combines EVRPTW with VRPSPD to make the problem conform to the actual logistics process. 2) This paper establishes a mixed integer programming model for EVRPSPDTW and verifies the feasibility of the model through Gurobi. 3) This paper proposes an ADMM with SUSLM algorithm for EVRPSPDTW. Considering the slow convergence rate of the ADMM algorithm, we designed the SUSLM iteration scheme to enhance the convergence of the algorithm on the EVRPSPDTW problem.

The remainder of this paper is structured as follows. In Sect. 2, we describe the EVRPSPDTW and establish its mixed integer programming model. In Sect. 3, the ADMM with SUSLM algorithm and the solution procedure are presented. In Sect. 4, the proposed model and solution method are applied to the Solomon benchmark. Finally, we draw our conclusions in Sect. 5.

2 Problem Description and Mathematical Formulation

This paper addresses the EVRPSPDTW, which commonly arises in logistics processes. This section describes EVRPSPDTW in detail and presents the mixed integer programming model constructed in this paper. EVRPSPDTW aims to minimize costs while fulfilling customer demands. Each vehicle starts from the depot and returns to the depot. Figure 1 shows an example of EVRPSPDTW: the route of vehicle 1 is 0-1-r-2-3-0, and the route of vehicle 2 is 0-5-4-0. The battery capacity of vehicle 1 is not enough to visit all customers on its route, so the route visits recharging station r, where the vehicle charges.


Fig. 1. Electric vehicle routing with simultaneous pickup and delivery and time window (EVRPSPDTW).

EVRPSPDTW is defined on a directed graph G = (N, A), where N and A are the node set and arc set, respectively. The node set N consists of a depot N0 = {0}, a set of customers Nc, and a set of recharging stations Nr, with N = N0 ∪ Nc ∪ Nr. The vehicles v ∈ V are homogeneous; each vehicle has capacity cap and battery capacity FB. Each customer i ∈ Nc has a delivery demand dc_i, a pickup demand pc_i, and a time window [e_i, l_i]. Each charging station r ∈ Nr has the same charging time ct. Each arc (i, j) ∈ A has a transport cost denoted by c_ij. The notation is summarized in Table 1, and the mixed integer programming model of EVRPSPDTW is formulated as follows.

Objective function:

min Σ_{v∈V} Σ_{i,j∈N} C_ij x_ij^v                                           (1)

Subject to:

Σ_{i∈N} x_0i^v = 1, ∀v ∈ V                                                  (2)

Σ_{i∈N} x_i0^v = 1, ∀v ∈ V                                                  (3)

Σ_{v∈V} Σ_{i∈N} x_ij^v = 1, ∀j ∈ Nc                                         (4)

Σ_{i∈N} x_ij^v = Σ_{i'∈N} x_ji'^v, ∀j ∈ N, ∀v ∈ V                           (5)

Σ_{v∈V} x_ij^v · (dcap_i^v − dcap_j^v) = dc_j, ∀j ∈ Nc                      (6)

Σ_{v∈V} x_ij^v · (pcap_j^v − pcap_i^v) = pc_j, ∀j ∈ Nc                      (7)

Table 1. The notation used in the EVRPSPDTW mixed integer programming model.

Sets
  N: set of all nodes
  A: set of all arcs
  V: set of all vehicles
  Nc: set of all customers
  Nr: set of all recharging stations
  N0: the depot

Parameters
  C_ij: transportation cost from i to j
  dt_ij: transportation time from i to j
  d_ij: transportation electricity consumption from i to j
  dc_i: delivery demand of customer i
  pc_i: pickup demand of customer i
  [e_i, l_i]: time window of customer i
  st: service time of a customer
  cap: capacity of a vehicle
  at_i: arrival time of the vehicle at node i
  FB: battery capacity
  b_i: battery level of the vehicle at node i
  wt_i: waiting time of the vehicle at node i
  dcap_j^v: delivery cargo weight of vehicle v at node j
  pcap_j^v: pickup cargo weight of vehicle v at node j

Variables
  x_ij^v: x_ij^v = 1 if vehicle v travels directly from i to j; otherwise x_ij^v = 0

pcap_i^v + dcap_i^v < cap, ∀v ∈ V, ∀i ∈ Nc                                  (8)

Σ_{i∈{0}∪Nc} (b_i − d_ij) · x_ij^v + Σ_{i∈Nr} Σ_{v∈V} (FB − d_ij) · x_ij^v = b_j, ∀j ∈ Nc      (9)

b_i − Σ_{v∈V} x_i0^v · d_i0 ≥ 0, ∀i ∈ Nc                                    (10)

FB − x_i0^v · d_i0 ≥ 0, ∀i ∈ Nr, ∀v ∈ V                                     (11)

b_i − Σ_{v∈V} x_ij^v · d_ij ≥ 0, ∀i ∈ Nc, ∀j ∈ Nr                           (12)

b_i ≥ 0, ∀i ∈ Nc                                                            (13)

at_j = Σ_{i∈N} Σ_{v∈V} x_ij^v · (dt_ij + st + at_i + wt_i), ∀j ∈ N          (14)

e_i ≤ at_i + wt_i ≤ l_i, ∀i ∈ Nc                                            (15)

e_0 ≤ t_i ≤ l_0, ∀i ∈ Nr ∪ {0}                                              (16)

x_ij^v ∈ {0, 1}                                                             (17)

The objective function (1) minimizes the total cost. Constraints (2) and (3) ensure that every vehicle starts from the depot and returns to the depot. Constraint (4) ensures that each customer is visited exactly once. Constraint (5) is the flow balance constraint for each node. Constraints (6) and (7) ensure that customer delivery demands and pickup demands are satisfied. Constraint (8) is the capacity constraint of the vehicle. Constraint (9) calculates the electric quantity of the vehicle at customers. Constraints (10), (11), (12) and (13) ensure that the vehicle's electric quantity remains greater than 0 along the route. Constraint (14) calculates the time of vehicle arrival at each node. Constraint (15) requires the vehicle to begin service within the customer time window. Constraint (16) requires the vehicle to return to the depot within the depot time window. Constraint (17) states that the decision variable x_ij^v is binary.
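The paper verifies the model with Gurobi. As an illustration of how such a model can be set up, the following is a partial, hedged gurobipy sketch covering only the objective (1) and the routing constraints (2)-(5); the remaining constraints and all data containers (nodes, customers, vehicles, depot, cost) are placeholders that would have to be filled in from an actual instance, and the function name is our own.

```python
# Partial sketch only: objective (1) and constraints (2)-(5).
import gurobipy as gp
from gurobipy import GRB

def build_partial_model(nodes, customers, vehicles, depot, cost):
    m = gp.Model("evrpspdtw_partial")
    # Binary routing variables x[i, j, v], constraint (17).
    x = m.addVars(nodes, nodes, vehicles, vtype=GRB.BINARY, name="x")
    # Objective (1): minimize total transportation cost.
    m.setObjective(gp.quicksum(cost[i, j] * x[i, j, v]
                               for i in nodes for j in nodes for v in vehicles), GRB.MINIMIZE)
    # Constraints (2)-(3): each vehicle leaves the depot once and returns once.
    m.addConstrs((x.sum(depot, "*", v) == 1 for v in vehicles), name="depart")
    m.addConstrs((x.sum("*", depot, v) == 1 for v in vehicles), name="return")
    # Constraint (4): each customer is visited exactly once over all vehicles.
    m.addConstrs((x.sum("*", j, "*") == 1 for j in customers), name="visit")
    # Constraint (5): flow balance at every node for every vehicle.
    m.addConstrs((x.sum("*", j, v) == x.sum(j, "*", v) for j in nodes for v in vehicles),
                 name="flow")
    return m, x
```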

3 Solution Method

The alternating direction multiplier method (ADMM) is a common method for solving optimization problems. ADMM decomposes the optimization problem into a series of smaller problems and obtains a global solution by solving the subproblems and coordinating the subproblem solutions at each iteration. In this paper, a dynamic programming algorithm is used to solve the decomposed subproblems, and the SUSLM (Sequential Updating Scheme of the Lagrangian Multiplier) is introduced to update the Lagrangian multiplier within the algorithm. In the following sections, the construction of the augmented Lagrangian model, the decomposition and linearization of the augmented Lagrangian model, the labeling-setting algorithm for the subproblems, the SUSLM, and the overall framework of ADMM with SUSLM are outlined.

3.1 Construction of the Augmented Lagrangian Model

First, the coupling constraint (4) is relaxed by the Augmented Lagrangian Relaxation (ALR) technique. ALR constructs the augmented Lagrangian objective function (18) by adding the relaxed constraint to the objective function (1) and introducing the Lagrangian


multiplier λ_c and the quadratic multiplier μ_c, which are used to balance the relationship between each customer's demand and the cost. The augmented Lagrangian model obtained by the ALR technique can be expressed as (objective function: (18); subject to: (2), (3), and (5)–(17)):

min Σ_{v∈V} Σ_{i∈N} Σ_{j∈N} c_ij x_ij^v + Σ_{j∈Nc} λ_j (Σ_{v∈V} Σ_{i∈N} x_ij^v − 1) + Σ_{j∈Nc} μ_j (Σ_{v∈V} Σ_{i∈N} x_ij^v − 1)²      (18)

Since the coupling constraint (4) is relaxed by the ALR technique, the augmented Lagrangian model can be decomposed into a series of single-vehicle augmented Lagrangian submodels. Meanwhile, the decision variable x is binary, so the quadratic objective function of the augmented Lagrangian model and the objective function of the single-vehicle augmented Lagrangian submodel can be linearized.

Since the coupling vehicle constraint (4) of is relaxed by the ALR technology, the augmented lagrangian model can be decomposed into a series of single vehicle augmented lagrangian submodels. Meanwhile, the decision variable x is binary. The objective function of the augmented lagrangian model with quadratic terms and the objective function of the single vehicle augmented lagrangian submodel can be linearized. 3.2 Decomposition and Linearization of the Augmented Lagrangian Model Constraints (2), (3) and (5)–(17) are independent of each vehicle, so the augmented lagrangian model can be decomposed into augmented lagrangian submodels for each vehicle. Where, the augmented lagrangian submodel of vehicle v is expressed as (objective function: (19); Subject to: (2), (3) and (5)–(17)). Lv = min



cij xijv +

i∈N j∈N

   2   λj xijv − 1 + μj xijv − 1 j∈Nc v∈V

(19)

j∈Nc v∈V

In the augmented lagrangian submodel of vehicle v, the decision variables of other vehicles are constant. The auxiliary variable ηi represents the number of times that customer i is served except for vehicle v. The objective function (19) of the augmented lagrangian submodel of vehicle v can be rewritten as (20) by variables ηi . Lv = min



cij xijv +

i∈N j∈N



    2 λj xijv + ηj − 1 + μj xijv + ηj − 1

j∈Nc

(20)

j∈Nc

The decision variable x of the augmented lagrangian submodel of vehicle v is a binary variable. The augmented lagrangian submodel objective function (20) with a quadratic term is rewritten as the linearized augmented lagrangian submodel objective function (21). The objective function (21) can be rewritten as Eq. (22) and (23), where, Q is a constant.           Lv = min cij xijv + λj xijv + ηj − 1 + μj xijv + 2xijv + ηj − 1 • ηj − 1 i∈N j∈N

j∈Nc

j∈Nc

(21) Lv = min



cˆ ij xijv + Q

(22)

i∈N j∈N

 cˆ ij =

  cij + λj + μj 2ηj − 1 , j ∈ Nc cij , j ∈ / Nc

(23)
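A small Python sketch of the linearized arc-cost update in formula (23) (an illustration with our own function and variable names): given the multipliers λ_j, μ_j and the service counts η_j from the other vehicles' current routes, each arc cost entering a customer is shifted before the subproblem is solved.

```python
def modified_arc_costs(cost, customers, lam, mu, eta):
    """Compute c_hat[i, j] per formula (23):
    c_hat = c + lambda_j + mu_j * (2*eta_j - 1) for customer nodes j, else c."""
    c_hat = {}
    for (i, j), c in cost.items():
        if j in customers:
            c_hat[(i, j)] = c + lam[j] + mu[j] * (2 * eta[j] - 1)
        else:
            c_hat[(i, j)] = c
    return c_hat

# Example with two customers (1 and 2), depot 0 and one station 'r'.
cost = {(0, 1): 5.0, (1, 2): 4.0, (2, 0): 6.0, (1, 'r'): 2.0}
lam = {1: 1.0, 2: -0.5}
mu = {1: 2.0, 2: 2.0}
eta = {1: 0, 2: 1}   # customer 1 not yet served by other vehicles, customer 2 served once
print(modified_arc_costs(cost, customers={1, 2}, lam=lam, mu=mu, eta=eta))
```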


The linearization of the augmented Lagrangian submodel simplifies its solution. To the best of our knowledge, the labeling-setting algorithm can successfully solve the single-vehicle augmented Lagrangian submodel and has excellent performance. Therefore, this paper solves the augmented Lagrangian submodel using a time-discrete forward labeling-setting algorithm.

3.3 Labeling-Setting Algorithm

The labeling-setting algorithm is a widely used dynamic programming method for finding feasible paths. The state generated by the partial route of a vehicle from the depot to a node is called a label. In this paper's labeling-setting algorithm, we define the label l_i = (N_i, t_i, g_i, b_i, c_i) as a forward label with the following components:

N_i: the set of nodes visited by the vehicle.
t_i: the time of vehicle arrival at node i.
g_i: the cargo load of the vehicle at node i.
b_i: the electric quantity of the vehicle at node i.
c_i: the accumulated augmented Lagrangian cost of the partial route.

We first initialize the label l_0 = (N_0, t_0, g_0, b_0, c_0) at the depot. Label l_i can be extended to label l_j provided the constraints are satisfied. The extension function from label l_i to label l_j is described as follows.

If j is a customer node:
• N_j = N_i ∪ {j}
• t_j = t_i + st + wt_i + dt_ij
• g_j = g_i − dc_j + pc_j
• b_j = b_i − d_ij
• c_j = c_i + C_ij

If j is a recharging station node:
• N_j = N_i ∪ {j}
• t_j = t_i + st + dt_ij
• g_j = g_i
• b_j = FB
• c_j = c_i + C_ij
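Below is a minimal Python sketch of this forward label extension (our own illustration, not the authors' implementation). The feasibility checks shown (battery, capacity, and time window) follow the model's constraints, waiting is simplified to pushing the arrival time up to the window start, and the instance-data access names in `data` are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Label:
    nodes: frozenset      # N_i: visited nodes
    t: float              # arrival time at the current node
    g: float              # cargo load
    b: float              # remaining battery
    c: float              # accumulated augmented Lagrangian cost
    node: int = 0         # current node

def extend(label, j, data):
    """Extend a forward label from its current node to node j; return None if infeasible.
    `data` is an assumed instance container with dt, d, c_hat, dc, pc, tw, st, cap, FB, stations."""
    i = label.node
    if j in label.nodes:
        return None                                   # elementary route: no revisits
    if label.b - data.d[i][j] < 0:
        return None                                   # cannot reach j on the remaining battery
    arrive = label.t + data.st + data.dt[i][j]
    if j in data.stations:                            # recharging station: full battery, cargo unchanged
        g, b = label.g, data.FB
    else:                                             # customer node: serve delivery and pickup demand
        arrive = max(arrive, data.tw[j][0])           # wait until the time window opens
        g = label.g - data.dc[j] + data.pc[j]
        b = label.b - data.d[i][j]
    if g > data.cap or arrive > data.tw[j][1]:
        return None                                   # capacity or time-window violation
    return Label(label.nodes | {j}, arrive, g, b, label.c + data.c_hat[i][j], node=j)
```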

In this paper, the labeling-setting algorithm is used to obtain the feasible route set for the augmented Lagrangian submodel. The route with the minimum augmented Lagrangian cost is then selected from the feasible route set as the solution of the submodel and passed to the sequential updating scheme of the Lagrangian multiplier.

3.4 Sequential Updating Scheme of the Lagrangian Multiplier

Updating the Lagrangian multiplier is computationally much cheaper than solving an augmented Lagrangian submodel. Therefore, the SUSLM is a feasible way to accelerate the convergence speed of ADMM. In this paper, the SUSLM is used to iterate


the augmented Lagrangian submodels and update the Lagrangian multiplier. The SUSLM process is as follows:

x_1^{k+1} = argmin { L_β(x_1, x_2^k, ..., x_v^k, λ^k) : x_1 ∈ X_1 }
λ^{k+1/v} = λ^k − β (Σ_{i=1}^{v} A_i x_i − c)
...
x_i^{k+1} = argmin { L_β(x_1^{k+1}, ..., x_{i−1}^{k+1}, x_i, x_{i+1}^k, ..., x_v^k, λ^{k+(i−1)/v}) : x_i ∈ X_i }
λ^{k+i/v} = λ^{k+(i−1)/v} − β (Σ_{i=1}^{v} A_i x_i − c)                      (24)
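To make the sequential update concrete, here is a schematic Python sketch of one ADMM-with-SUSLM iteration for this problem (our own illustration, not the authors' code): after each vehicle's subproblem is solved, the multipliers λ_j are refreshed immediately from the current service counts instead of waiting until all subproblems have been solved.

```python
def suslm_iteration(vehicles, customers, routes, lam, mu, beta, solve_subproblem):
    """One outer ADMM-with-SUSLM iteration (schematic): solve each vehicle's submodel in
    turn and refresh the Lagrangian multipliers right after every subproblem.
    `solve_subproblem(v, lam, mu, eta)` is assumed to return the best route of vehicle v
    (e.g. from the labeling-setting algorithm) as a set of visited customers."""
    for v in vehicles:
        # eta_j: how many of the other vehicles' current routes serve customer j.
        eta = {j: sum(j in routes[u] for u in vehicles if u != v) for j in customers}
        routes[v] = solve_subproblem(v, lam, mu, eta)
        # Sequential multiplier update, matching the sign of the penalty term in (18):
        # customers served more (less) than once push lambda_j up (down).
        for j in customers:
            served = eta[j] + (1 if j in routes[v] else 0)
            lam[j] = lam[j] + beta * (served - 1)
    return routes, lam, mu
```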

4 Experiments and Results

To demonstrate the optimization quality and computational efficiency of the proposed algorithm, this paper uses Solomon's data set as the test instances. Meanwhile, this paper designs a comparative experiment between the ADMM algorithm and the ADMM with SUSLM algorithm, and takes the solution results of Gurobi as the experimental reference. All the models and algorithms were coded in Python and run on a Windows computer with an Intel(R) Core(TM) i7-12700 CPU at 2.10 GHz and 32 GB RAM.

It can be seen from Table 2 that the feasible solutions and solution times of the ADMM with SUSLM algorithm are better than those of the ADMM algorithm. The ADMM with SUSLM algorithm and the ADMM algorithm have the same solution time and optimization value as the Gurobi solver on the instances with 5 and 10 customers. For the instances with 15 customers, a feasible solution can be obtained in a reasonable time. This demonstrates that the ADMM algorithm and the ADMM with SUSLM algorithm can be effectively applied to the EVRPSPDTW problem and obtain feasible solutions in reasonable time.

Table 2. Comparison of the solutions of Gurobi, ADMM and ADMM with SUSLM.

[Table 2 reports, for the C101 and C102 instances with 5, 10 and 15 customers, the columns Cn, Rn and Vn together with the lower bound (Lb), upper bound (Ub), gap and solution time obtained by Gurobi, ADMM and ADMM with SUSLM.]
5 Conclusions and Future Research In this paper, the mixed integer programming model of EVRPTW-RS is established. Meanwhile, this paper proposes a multi-block ADMM with SUSLM algorithm to construct the augmented Lagrange model of the problem, and decomposes and linearizes it into a series of shortest path problems. Finally, the label algorithm is used to solve each shortest path problem. Experiments show that the ADMM with SUSLM algorithm proposed in this paper is superior to the ADMM algorithm in terms of solution quality and computational efficiency. In the future, we will study more effective algorithms to solve EVRPSPDTW-RS based on decomposition strategy. In addition, we will develop the EVRPSPDTW-RS problem to make it more suitable for the actual transportation needs. Acknowledgments. This research was supported by the National Natural Science Foundation of China (61963022 and 62173169) and the Basic Research Key Project of Yunnan Province (202201AS070030).

References

1. Shen, Z., Feng, B., Mao, C., et al.: Optimization models for electric vehicle service operations: a literature review. Transp. Res. Part B Methodol. 128, 462–477 (2019)
2. Keskin, M., Catay, B.: Partial recharge strategies for the electric vehicle routing problem with time windows. Transp. Res. Part C Emerg. Technol. 65, 111–127 (2016)
3. Schneider, M., Stenger, A., Goeke, D.: The electric vehicle-routing problem with time windows and recharging stations. Transp. Sci. 48(4), 500–520 (2014)
4. Montoya, A., Guéret, C., Mendoza, J.E., et al.: The electric vehicle routing problem with nonlinear charging function. Transp. Res. Part B Methodol. 103, 87–110 (2017)
5. Verma, A., Bierlaire, M.: Electric vehicle routing problem with time windows, recharging stations and battery swapping stations. EURO J. Transp. Logist. 7, 415–451 (2018)


6. Shen, Y., Peng, L., Li, J.: An improved estimation of distribution algorithm for multi-compartment electric vehicle routing problem. J. Syst. Eng. Electron. 32(2), 365–379 (2021)
7. Chao, W., Dong, M., Fu, Z., et al.: A parallel simulated annealing method for the vehicle routing problem with simultaneous pickup–delivery and time windows. Comput. Ind. Eng. 83(5), 111–122 (2015)
8. Park, H., Son, D., Koo, B., et al.: Waiting strategy for the vehicle routing problem with simultaneous pickup and delivery using genetic algorithm. Expert Syst. Appl. 165, 113959 (2021)
9. Oeztas, T., Tus, A.: A hybrid metaheuristic algorithm based on iterated local search for vehicle routing problem with simultaneous pickup and delivery. Expert Syst. Appl. 202, 117401 (2022)
10. Xu, W., Zhang, C., Cheng, M., Huang, Y.: Electric vehicle routing problem with simultaneous pickup and delivery: mathematical modeling and adaptive large neighborhood search heuristic method. Energies 15(23), 9222 (2022)
11. Yao, Y., Zhu, X., Dong, H., et al.: ADMM-based problem decomposition scheme for vehicle routing problem with time windows. Transp. Res. Part B Methodol. 129, 156–174 (2019)

A Nested Differential Evolution Algorithm for Optimal Designs of Quantile Regression Models

Zhenyang Xia(B), Chen Xing, and Yue Zhang

Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, Shandong, China
[email protected]

Abstract. The Differential Evolution (DE) is an influential heuristic algorithm effective in attaining global optimization of any real vector-valued function. Easy to construct and use, this algorithm is increasingly popular in the field of solving complex optimization problems without any assumptions about the objective function. This article outlines the fundamental DE algorithm and subsequently proposes a nested DE algorithm to address the issue of finding the maximin optimal designs in quantile regression models. This algorithm can effectively address the issue of premature convergence and local optimum in many current theories and algorithms when searching for the maximin optimal designs of quantile regression. The proposed algorithm was applied to common dose response models, including the Michaelis-Menten model, Emax model, and Exponential model, with multiple sets of experiments conducted to analyze the impact of different parameters and connection functions on the algorithm’s performance. The numerical results obtained from these experiments suggest that the algorithm can be applied to a number of complex models and the maximin optimal design of the quantile regression model can be effectively obtained. Keywords: Differential Evolution · Optimal Designs · Quantile Regression

1 Introduction

Heuristic algorithms have gained popularity due to their excellent calculation speed, flexibility, and ease of use. These algorithms can solve various complex optimization problems without a strict convergence proof [1, 2]. Hence, this paper focuses on a widely used population-based heuristic algorithm, the differential evolution algorithm, which is also an efficient global optimization algorithm. Introduced by Price and Storn in 1997 [3], the differential evolution algorithm comprises mutation, crossover, and selection operations. This algorithm can solve multi-objective problems and train neural networks, and it can alleviate the premature convergence problem of the particle swarm optimization algorithm [4, 5], which is commonly used by researchers to solve optimal design problems. Therefore, this paper proposes using the differential evolution algorithm to solve the optimal design problem of quantile regression models.


Quantile regression was proposed by Koenker and Bassett in 1978 [6]. Due to its excellent robustness, it has been widely used in various fields in the following decades, including pharmacology [7], medicine [8], and informatics [9]. However, due to the complex calculation of quantile regression, the optimal experimental design for quantile regression has not received much attention from researchers. Although Dette and Trampisch provided a theorem for locally and standardized maximin D-optimal designs of quantile regression in 2012 [10], the theorem has significant limitations. Researchers have also proposed robust optimal designs for quantile regression [11, 12]; however, their theories are based on a large number of assumptions about the model. In recent years, some researchers have proposed a quantile regression optimal design for logistic models in restricted spaces [13]. However, this design has strict limitations on both the model and the design space.

So far, finding the optimal design of a quantile regression model remains a challenging task. Despite the existence of some theories attempting to solve this problem, they have limitations and require numerous assumptions, resulting in limited practical applications. The aim of this paper is to propose a numerical algorithm using a nested DE algorithm to address this challenge. The proposed algorithm is capable of avoiding local optima and efficiently finding the maximin optimal designs of multiple complex quantile regression models without any assumptions.

The next section describes the key parameters and steps of the differential evolution algorithm and briefly reviews the theory of constructing locally optimal designs for quantile regression models. Section 3 demonstrates how to use the differential evolution algorithm to find the maximin optimal designs for quantile regression models, while Sect. 4 shows the numerical results of applying the algorithm to common dose-response models. The final section ends with a conclusion.

2 Preliminaries

The current section aims to provide a detailed understanding of the differential evolution algorithm's parameters and operation steps. Firstly, we introduce the various parameters of the algorithm, including their function and tuning methods. Subsequently, we provide an in-depth explanation of the mutation, crossover, and selection operations of the differential evolution algorithm. Additionally, we demonstrate how this algorithm can efficiently minimize real-valued functions. Moreover, we discuss the construction of D-optimal designs for quantile regression, which is an essential aspect of optimal design. Specifically, we describe the process of constructing quantile regression maximin D-optimal designs for specified scale functions and conditions. This information is necessary to understand how the differential evolution algorithm can be applied to optimal experimental design for quantile regression models.


2.1 Differential Evolution Algorithm

The Differential Evolution (DE) algorithm is a heuristic optimization algorithm with only four tuning parameters: the population size NP, the number of iterations Gmax, the mutation operator F, and the crossover ratio CR. The value of each of these parameters has a significant impact on the whole algorithm, and if they are not selected properly there is a risk that the algorithm will fall into a local optimum. This article only discusses the parameters of the basic differential evolution algorithm; many variants have been proposed, some of which change the basic process of the algorithm [14, 15] and modify, add, or delete its parameters.

In the initialization phase, the population size NP and the number of iterations Gmax need to be selected. A well-chosen NP improves the performance of the algorithm. This parameter should balance exploration of the search space against time complexity: there must be enough points to explore, but the time complexity must not become too high. In general, NP should be at least ten times the number of inputs of the function. The number of iterations usually serves as the stopping criterion and determines the trade-off between running time and solution quality, so it needs to be chosen carefully to obtain the best result in the least running time.

The mutation operator F determines the amplification ratio of the deviation vector and is one of the most important parameters in the differential evolution algorithm. Its value determines the balance between exploration and exploitation of the search space. A large mutation operator quickly moves the points in the space toward the global optimum, but makes it difficult to obtain an accurate value; a smaller mutation operator makes the points move too slowly, resulting in poor convergence. In the basic DE algorithm, F ∈ [0, 2]. Empirically, if the value of the mutation operator lies between 0.6 and 0.9, the algorithm performs well on numerical optimization problems.

The crossover ratio CR is less sensitive than the other three parameters, but an inappropriate value can still affect the optimization speed and the ability to reach the global optimum. CR determines the probability of the crossover operation and is a measure of the population mutation rate. In general, CR lies between 0 and 1; according to research, the most suitable values are CR ∈ [0, 0.3] ∪ [0.8, 1]. This range of values enables the algorithm to achieve better optimization performance.


After initializing the parameters of the differential evolution algorithm and specifying the stopping rules, the algorithm proceeds with three key steps:

• Mutation. In the DE algorithm, mutations are regarded as disturbances of random elements. The target vector in generation G is expressed as X_i^G. The mutation operation takes the weighted difference of two target vectors and adds it to a third target vector to generate a new individual, which we generally call the donor vector V_i^G. Note that each target vector is different from the donor vector, and each is chosen randomly. The donor vector is obtained according to the formula

V_i^G = X_{r0}^G + F (X_{r1}^G − X_{r2}^G),   (1)

where r0, r1, r2 are mutually distinct indices.

• Crossover. The crossover operation mixes the donor vector and the target vector to form a candidate for the next generation, which we generally call the test vector U_i^G. The specific formula is

U_{i,j}^G = { V_{i,j}^G, if rand(0, 1) ≤ CR or j = j_rand;  X_{i,j}^G, otherwise },   (2)

where U_{i,j}^G, V_{i,j}^G and X_{i,j}^G denote the j-th components of the respective vectors, and rand(0, 1) is a random number between 0 and 1. CR controls the proportion of the test vector inherited from the target vector, and j_rand is a randomly selected index that guarantees U_i^G receives at least one component from V_i^G.

• Selection. The selection operation compares each test vector with its target vector and uses the fitness function h to keep the more suitable one as the next-generation vector X_i^{G+1}:

X_i^{G+1} = { U_i^G, if h(U_i^G) < h(X_i^G);  X_i^G, otherwise }.   (3)

The DE algorithm has been shown to be effective in finding optimal designs for various statistical models [16, 17]. Therefore, in this article, we propose to use the DE algorithm to find optimal designs for quantile regression. By combining the power of the DE algorithm with the statistical method of quantile regression, we aim to provide a novel approach to finding optimal designs for quantile regression models.

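The following Python sketch (not the authors' implementation) puts Eqs. (1)–(3) together into a minimal DE loop for minimizing a real-valued function. The function name `de_minimize`, the default parameter values, and the sphere-function example are illustrative assumptions.

```python
import numpy as np

def de_minimize(h, bounds, NP=40, Gmax=200, F=0.8, CR=0.9, seed=0):
    """Minimize h over a box using the basic DE scheme of Eqs. (1)-(3)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, float).T          # bounds: list of (low, high) per dimension
    dim = len(lo)
    X = lo + rng.random((NP, dim)) * (hi - lo)    # initial population
    fit = np.array([h(x) for x in X])
    for _ in range(Gmax):
        for i in range(NP):
            r0, r1, r2 = rng.choice([k for k in range(NP) if k != i], 3, replace=False)
            V = X[r0] + F * (X[r1] - X[r2])       # mutation, Eq. (1)
            V = np.clip(V, lo, hi)
            jrand = rng.integers(dim)
            mask = rng.random(dim) <= CR
            mask[jrand] = True                    # crossover, Eq. (2)
            U = np.where(mask, V, X[i])
            fU = h(U)
            if fU < fit[i]:                       # selection, Eq. (3)
                X[i], fit[i] = U, fU
    best = np.argmin(fit)
    return X[best], fit[best]

# Example: minimize the sphere function on [-5, 5]^3.
x_best, f_best = de_minimize(lambda x: float(np.sum(x**2)), [(-5, 5)] * 3)
```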


2.2 Locally D-Optimal Designs for Quantile Regression

In this paper, we introduce the univariate nonlinear quantile regression model, defined as

y(x) = g(x, θ) + σ(x, θ) ε,   (4)

where g and σ are the position function and the scale function, respectively, and ε is an independent and identically distributed error that follows the distribution function F with τ-quantile equal to 0, that is, F^{−1}(τ) = 0. Here x is an explanatory variable in the design space X = (x_l, x_u), and θ^T = (θ_1, θ_2, ..., θ_NP) is the vector of position parameters of the model. We assume that the position function g is differentiable with respect to the parameter θ, and use

ġ(x, θ) = (∂g(x, θ)/∂θ_1, ..., ∂g(x, θ)/∂θ_NP)^T   (5)

to denote the vector of partial derivatives of the predicted value with respect to θ. The quantile regression estimate of the parameter θ in the model is defined as

θ̂_N(τ) = arg min_{θ∈Θ} Σ_{i=1}^{N} ρ_τ(y(x_i) − g(x_i, θ)),   (6)

where τ is given and ranges from 0 to 1. If the approximate design on X is ξ = {x_i, w_i} (i = 1, 2, ..., n), where w_i is the weight of the corresponding support point x_i and Σ_{i=1}^{n} w_i = 1, then y(x_i) represents the corresponding observation. Θ denotes the range of values of θ, and ρ_τ is the check function, which also represents the piecewise loss of this quantile regression. It can be proved that, under certain assumptions, θ̂_N is asymptotically normally distributed, that is,

√N (θ̂_N − θ) → N(0, τ(1 − τ) D_1^{−1} D_0 D_1^{−1}),   (7)

where the matrices D_0 and D_1 are respectively defined as

D_0(ξ, θ) = ∫_X ġ(x, θ) ġ^T(x, θ) dξ(x),   (8)

D_1(ξ, θ) = ∫_X (1/σ(x, θ)) ġ(x, θ) ġ^T(x, θ) dξ(x).   (9)

The asymptotic covariance matrix of the quantile regression estimate is then

H(ξ, θ) = D_1^{−1}(ξ, θ) D_0(ξ, θ) D_1^{−1}(ξ, θ).   (10)

Under the conditions discussed in this paper, the mapping of the covariance matrix H is usually not a convex function, which makes the classic standard convex optimization theory no longer applicable to the optimization design problem of quantile regression. In the following section, we will discuss the application of the DE algorithm in the context of maximin optimal designs for quantile regression.


3 The Nested DE Algorithm for Maximin Optimal Designs

In the previous section, we discussed the methodology for finding the locally optimal design of quantile regression. Although the locally optimal design is difficult to implement because it depends on unknown parameters, it is often used as a benchmark for other optimal designs. In this paper, we use the locally optimal design as the basis for the maximin optimality criterion and define

eff(ξ, θ) = ( |H(ξ*_θ, θ)| / |H(ξ, θ)| )^{1/p}   (11)

as the D-efficiency of a given design ξ, where ξ*_θ is the locally D-optimal design for quantile regression at θ, and p is the number of unknown parameters in the model. For the standardized maximin quantile regression optimal experimental design, we need to maximize the function

Ψ(ξ) = min_{θ∈Θ} {eff(ξ, θ)}.   (12)

To achieve this, we first need to find the locally D-optimal design, then maximize the function Ψ(ξ), and finally obtain the quantile regression maximin optimal design. In constructing the nested DE algorithm for optimal designs of quantile regression models, we set the fitness function obj of the outer DE algorithm to −Ψ(ξ). Within the given ranges X = (x_l, x_u) and Θ_2 = (θ_2l, θ_2u), we set the fitness function of the inner DE algorithm to |H|. The pseudocode is given in Algorithm 1. The algorithm presented in this section aims to calculate the quantile regression maximin optimal design of the Michaelis-Menten model; it takes as input the unknown parameters and the range of the design space, as detailed in the following section. The traditional differential evolution algorithm is used for the calculation, with the parameters F = 0.9 and CR = 0.8 set during initialization; based on previous research, these settings give a good balance between accuracy and efficiency. In the subsequent section, we apply the proposed algorithm to obtain optimal designs for dose-response models.

4 Applications of Dose-Response Models

In this section, we apply the nested DE algorithm to some dose-response models, namely the Michaelis-Menten, Emax, and Exponential models, to obtain the maximin optimal designs for quantile regression. We provide a detailed description of these three examples and present the results of obtaining the optimal designs.


Algorithm 1. Pseudocode for the nested DE

initialization
set the outer fitness function obj(ξ):
    for each θ do
        initialize the parameters of the inner DE algorithm and set its fitness function to |H|
        repeat until the inner stopping rule is met:
            for each individual do mutation, crossover and selection by Eqs. (1), (2) and (3) end for
        calculate the ξ*_θ corresponding to the minimum of |H|
        plug ξ and ξ*_θ into eff and save the value
    end for
    return −min(saved efficiency values)
repeat until the outer stopping rule is met:
    for each individual do mutation, crossover and selection by Eqs. (1), (2) and (3) end for
calculate the ξ corresponding to the minimum of obj
output ξ
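The sketch below illustrates one way the nested structure of Algorithm 1 could be realized in Python using scipy's differential_evolution at both levels. The Michaelis-Menten mean with link h1, the finite grid over Θ_2, the two-point design parametrization, and all parameter values are illustrative assumptions rather than the authors' settings (the paper uses its own DE with F = 0.9 and CR = 0.8); the locally optimal values |H(ξ*_θ, θ)| are precomputed once per θ for efficiency.

```python
import numpy as np
from scipy.optimize import differential_evolution

theta1, t1 = 1.0, 1.0                    # assumed parameter values (link h1 with exponent t1)
theta2_grid = np.linspace(100, 2000, 8)  # assumed finite grid over Theta_2 = (100, 2000)
x_lo, x_hi = 1e-3, 2000.0                # design space (0, 2000]; avoid x = 0 for numerical safety

def g_grad(x, th2):                       # gradient of g(x, theta) = theta1 * x / (theta2 + x)
    return np.array([x / (th2 + x), -theta1 * x / (th2 + x) ** 2])

def sigma(x, th2):                        # scale function via link h1: sigma = 1 / g**t1
    return (theta1 * x / (th2 + x)) ** (-t1)

def logdetH(z, th2):                      # z = (x1, x2, w1); two-point design with weights (w1, 1-w1)
    pts, w = np.array(z[:2]), np.array([z[2], 1.0 - z[2]])
    G = [g_grad(x, th2) for x in pts]
    D0 = sum(wi * np.outer(gi, gi) for wi, gi in zip(w, G))
    D1 = sum(wi / sigma(x, th2) * np.outer(gi, gi) for wi, x, gi in zip(w, pts, G))
    s0, ld0 = np.linalg.slogdet(D0)
    _, ld1 = np.linalg.slogdet(D1)
    return ld0 - 2.0 * ld1 if s0 > 0 else np.inf      # log|H| = log|D0| - 2 log|D1|

bounds = [(x_lo, x_hi), (x_lo, x_hi), (0.05, 0.95)]

# Inner DE: locally D-optimal value log|H(xi*_theta, theta)| for every theta on the grid.
local_opt = {th2: differential_evolution(logdetH, bounds, args=(th2,), seed=1, tol=1e-8).fun
             for th2 in theta2_grid}

def neg_psi(z):                           # outer objective: -Psi(xi) = -min_theta eff(xi, theta)
    effs = [np.exp((local_opt[th2] - logdetH(z, th2)) / 2.0) for th2 in theta2_grid]  # p = 2
    return -min(effs)

res = differential_evolution(neg_psi, bounds, seed=2, maxiter=300)
print("maximin design:", res.x, "Psi =", -res.fun)
```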

4.1 Optimal Designs for Michaelis-Menten Models

The Michaelis-Menten model is a commonly used dose-response model in pharmacokinetic studies; it describes the relationship between dose and drug response. The expected response of the Michaelis-Menten model is

g_1(x, θ) = θ_1 x / (θ_2 + x),   (13)

where x ∈ (x_l, x_u), the parameter θ_1 represents the maximum speed or maximum response, and θ_2 represents the concentration at which the response reaches half its maximum. In a previous study [10], two connection (link) functions were proposed, given by

h_1(z) = 1/z^{t_1},   (14)

h_2(z) = exp(−t_2 z),   (15)

where t_1, t_2 > 0. The scale function of the model is obtained from the link function as σ(x, θ) = h(g(x, θ)).


Table 1. Maximin D-optimal designs for the Michaelis-Menten model with link functions h1 and h2 for selected parameter values (θ1, Θ2, and t1 or t2). All reported designs are two-point designs with equal weights 0.5 and with the upper support point at 2000; for example, under h1 with θ1 = 1, Θ2 ∈ (100, 2000) and t1 = 0 the design is {267.4, 2000; 0.5, 0.5}, and under h2 with t2 = 1 it is {331.9, 2000; 0.5, 0.5}.

Table 1 presents the numerical results for the maximin optimal designs of the quantile regression Michaelis-Menten model obtained by the nested DE algorithm for x ∈ (0, 2000). The table shows that when the link function is h1, the results for θ1 = 1, Θ2 ∈ (100, 2000), t1 = 0 and for θ1 = 10, Θ2 ∈ (100, 2000), t1 = 0 are both supported at 267.4 and 2000 with weights 0.5 and 0.5, which suggests that θ1 has no effect on the results under h1. Similarly, for θ1 = 1 and Θ2 ∈ (100, 2000), the designs under h1 and h2 are {267.4, 2000; 0.5, 0.5} and {331.9, 2000; 0.5, 0.5}, respectively. The position of the interior design point changes considerably, indicating that the link (scale) function has a significant influence on the structure of the optimal design.

A necessary condition for the quantile regression D-optimal design, proposed by Dette [10], is

(2/σ(x, θ)) ġ^T(x, θ) D_1^{−1}(ξ*_θ, θ) ġ(x, θ) − ġ^T(x, θ) D_0^{−1}(ξ*_θ, θ) ġ(x, θ) ≤ p.   (16)

In this article, we denote the left-hand side of inequality (16) by ψ(x, ξ*_θ, θ). As shown in Fig. 1, we can use it to check the numerical results obtained by the algorithm. Although this condition only establishes necessity, it is clear that the calculated values fall below the dashed line in both cases. This does not by itself imply optimality of the design, but it does show that the theoretical and numerical results are consistent.


Fig. 1. Plots of ψ(x, ξ*_θ, θ) for the D-optimal design of the Michaelis-Menten model on the design space x ∈ [0, 2000] with scale function h1, for the parameter values θ1 = 1, Θ2 ∈ (100, 2000), t1 = 0 and θ1 = 1, Θ2 ∈ (100, 2000), t1 = 1.
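As a complement to Fig. 1, the following sketch evaluates ψ(x, ξ*_θ, θ) from Eq. (16) numerically over the design space for a candidate two-point design. The mean function, the link h1, the nominal parameter values, and the candidate design are assumptions for illustration only; the code is a checking utility, not the authors' implementation.

```python
import numpy as np

theta1, theta2, t1 = 1.0, 1000.0, 0.0     # assumed nominal parameter values
p = 2                                      # number of unknown parameters

def g(x):      return theta1 * x / (theta2 + x)          # assumed Michaelis-Menten mean
def ggrad(x):  return np.array([x / (theta2 + x), -theta1 * x / (theta2 + x) ** 2])
def sigma(x):  return g(x) ** (-t1)                       # scale via link h1(z) = 1/z**t1

def info_matrices(points, weights):
    D0 = sum(w * np.outer(ggrad(x), ggrad(x)) for x, w in zip(points, weights))
    D1 = sum(w / sigma(x) * np.outer(ggrad(x), ggrad(x)) for x, w in zip(points, weights))
    return D0, D1

def psi(x, D0, D1):
    gd = ggrad(x)
    return (2.0 / sigma(x)) * gd @ np.linalg.solve(D1, gd) - gd @ np.linalg.solve(D0, gd)

# Hypothetical candidate design: two equally weighted support points.
pts, wts = [267.4, 2000.0], [0.5, 0.5]
D0, D1 = info_matrices(pts, wts)
grid = np.linspace(1.0, 2000.0, 400)
print("max psi over the design space:", max(psi(x, D0, D1) for x in grid), "vs p =", p)
```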

4.2 Optimal Designs for Emax Models

The Emax model is commonly used in pharmacology; its expected response is

g_2(x, θ) = θ_0 + θ_1 x / (θ_2 + x).   (17)

Compared to the Michaelis-Menten model, the Emax model has an additional parameter, θ_0, which represents the baseline of the response curve. Table 2 presents the numerical results obtained for the optimal design of the Emax model. The table shows that when the link function is h1, neither θ_0 nor θ_1 significantly affects the results, whereas when the link function is h2, θ_1 does have an impact. This sensitivity of the optimal design to the chosen link function shows that the choice of link function plays a crucial role when optimizing experimental designs for different statistical models.


Table 2. Maximin D-optimal designs for the Emax model with link functions h1 and h2 for selected parameter values (θ0, θ1, Θ2, and t1 or t2). All reported designs are three-point designs with equal weights 1/3; for example, for θ0 = 10, θ1 = 1, Θ2 ∈ (100, 2000) and t1 = 0 the design under h1 is supported at {0, 230.8, 2000}, and under h2 with t2 = 1 at {0, 331.9, 2000}.

4.3 Optimal Designs for Exponential Models

We use an exponential model to describe an unbounded response to dose: the response increases exponentially with increasing dose, which makes the model useful when the response is not limited or bounded. The model is expressed as

g_3(x, θ) = θ_0 + θ_1 exp(x/θ_2),   (18)

where θ_0 is the maximum drug concentration, θ_1 is the rate of elimination of the drug from the body (or its effectiveness), and θ_2 is the half-life of the drug. In the exponential model experiment, we chose two link functions, h1 and

h_3(z) = z^{t_3}.   (19)

Table 3 presents numerical results from multiple experiments with various parameter values and link functions, in order to observe their influence on the optimal design of the exponential model. The results show that, among the four parameters, θ_1 has no substantial effect on the optimal design. When the link function is h1, θ_0 also has no impact on the optimal design results.


However, when the link function is h3, θ_0 has a greater influence. This demonstrates that the impact of different link functions on the optimal design results is significant, which is consistent with the theoretical analysis.

Table 3. Maximin D-optimal designs for the Exponential model with link functions h1 and h3 for selected parameter values (θ0, θ1, Θ2, and t1 or t3). All reported designs are three-point designs with equal weights 1/3; for example, for θ0 = 0, θ1 = 1, Θ2 ∈ (100, 1000) and t1 = 0 the design under h1 is supported at {0, 1702.5, 2000}, and under h3 with t3 = 1 at {0, 569.5, 2000}.

The examples presented in this study demonstrate the excellent performance of the nested DE algorithm in solving the optimal design for quantile regression. For many complex models, it is challenging to theoretically determine the optimal design results, and numerous assumptions are required. However, the proposed nested DE algorithm can easily find the quantile regression maximin optimal design.

5 Conclusion

The nested DE algorithm is a powerful optimization algorithm that can efficiently search for optimal designs in complex models. Building on the traditional DE algorithm, we use a nested structure to improve the search for the optimal design. We tested the code and carried out optimization and analysis on several examples. The results show that the algorithm can be applied to a variety of models, including the Michaelis-Menten, Emax, and Exponential models. In summary, the nested DE algorithm is a valuable tool for finding optimal designs in quantile regression models: it can efficiently search for optimal designs in complex models without requiring numerous assumptions.


References 1. Liu, X., Yue, R.X., Kee Wong, W.: Equivalence theorems for c and DA-optimality for linear mixed effects models with applications to multitreatment group assignments in health care. Scand. J. Stat. 49, 1842–1859 (2022) 2. Sebastià Bargues, À., Polo Sanz, J.-L., Martín Martín, R.: Optimal experimental design for parametric identification of the electrical behaviour of bioelectrodes and biological tissues. Mathematics 10, 837 (2022) 3. Storn, R., Price, K.: Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 11, 341 (1997) 4. Chen, R.-B., Chang, S.-P., Wang, W., Tung, H.-C., Wong, W.K.: Minimax optimal designs via particle swarm optimization methods. Stat. Comput. 25, 975–988 (2014) 5. Shi, Y., Zhang, Z., Wong, W.K.: Particle swarm based algorithms for finding locally and Bayesian D-optimal designs. J. Stat. Distrib. Appl. 6, 1–17 (2019) 6. Koenker, R., Bassett Jr., G.: Regression quantiles. Econometrica: J. Econom. Soc. 46, 33–50 (1978) 7. Chen, X., Tang, N., Zhou, Y.: Quantile regression of longitudinal data with informative observation times. J. Multivar. Anal. 144, 176–188 (2016) 8. Fang, Y., Xu, P., Yang, J., Qin, Y.: A quantile regression forest based method to predict drug response and assess prediction reliability. PLoS ONE 13, e0205155 (2018) 9. Wang, H., Ma, Y.: Optimal subsampling for quantile regression in big data. Biometrika 108, 99–112 (2021) 10. Dette, H., Trampisch, M.: Optimal designs for quantile regression models. J. Am. Stat. Assoc. 107, 1140–1151 (2012) 11. Kong, L., Wiens, D.P.: Model-robust designs for quantile regression. J. Am. Stat. Assoc. 110, 233–245 (2015) 12. Selvaratnam, S., Kong, L., Wiens, D.P.: Model-robust designs for nonlinear quantile regression. Stat. Methods Med. Res. 30, 221–232 (2021) 13. Zhai, Y., Wang, C., Lin, H.-Y., Fang, Z.: D-optimal designs for two-variable logistic regression model with restricted design space. Commun. Stat.-Theory Methods 77, 1–18 (2023) 14. Das, S., Suganthan, P.N.: Differential evolution: a survey of the state-of-the-art. IEEE Trans. Evol. Comput. 15, 4–31 (2010) 15. Das, S., Mullick, S.S., Suganthan, P.N.: Recent advances in differential evolution–an updated survey. Swarm Evol. Comput. 27, 1–30 (2016) 16. Xu, W., Wong, W.K., Tan, K.C., Xu, J.-X.: Finding high-dimensional D-optimal designs for logistic models via differential evolution. IEEE Access 7, 7133–7146 (2019) 17. Stokes, Z., Mandal, A., Wong, W.K.: Using differential evolution to design optimal experiments. Chemometr. Intell. Lab. Syst. 199, 103955 (2020)

Real-Time Crowdsourced Delivery Optimization Considering Maximum Detour Distance

Xianlin Feng, Rong Hu(B), Nai-Kang Yu, Bin Qian, and Chang Sheng Zhang

School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
[email protected], [email protected]

Abstract. Previous research on crowdsourced delivery has primarily focused on task assignment and delivery path planning for deliverers, but has neglected the maximum detour distance of deliverers during the delivery process. This paper proposes a novel model, the Crowdsourced Delivery Optimization Model with Maximum Detour Distance (CDOM-MDD), which incorporates maximum detour distance, soft time windows, and sequential pickup and delivery constraints. The objective of CDOM-MDD is to maximize the total profit of the deliverers. To solve the static CDOM-MDD model, this paper proposes a Lagrangian relaxation algorithm with a repair mechanism (LRDRM). For the dynamic CDOM-MDD model, a dynamic optimization algorithm based on Rolling Horizon Control (RHC) is developed, which performs local optimization for subproblems. Simulation experiments demonstrate that the algorithm significantly reduces solution time for problem instances of varying sizes.

Keywords: Crowdsourced delivery · Lagrangian relaxation · Dynamic optimization

1 Introduction

The increased usage of mobile devices and GPS technology has motivated retailers and logistics companies to investigate Crowdsourced Delivery (CD) as an efficient, scalable, and cost-effective solution to the challenges of rising transportation costs and last-mile delivery [1]. CD belongs to the sharing economy, which aims to allocate underutilized social resources and exploit their surplus value through Internet platforms. CD is essentially an open delivery model that outsources delivery tasks, via a mature mobile Internet platform, to ordinary people with free vehicles and spare time, replacing the traditional role of professional deliverers [2, 3]. The delivery process can be approximated as a Multi-Depot Open Vehicle Routing Problem with Soft Time Windows (MDOVRPSTW). In terms of computational complexity, the VRP has been proved to be NP-hard [4]; since the VRP is a special case of MDOVRPSTW, MDOVRPSTW is also NP-hard. Thus, developing effective algorithms for crowdsourced delivery is of significant theoretical value. Additionally, CD has gained engineering significance in recent years, as evidenced by its successful adoption by numerous companies including Amazon, Walmart, and DoorDash [5].


Several researchers have proposed mathematical models to address the task assignment problem in crowdsourced delivery. To maximize the number of completed tasks, To et al. [6] developed a mathematical model and designed a heuristic algorithm based on location entropy priority. Deng et al. [7] also developed a mathematical model with the same objective and used dynamic programming and branch-and-bound algorithms for the small-scale problem and a hybrid heuristic-based algorithm for the large-scale problem. For the task assignment and path planning problems, Deng et al. [8] developed a mathematical model to maximize the number of completed tasks while minimizing the travel cost and designed a three-stage optimization algorithm. In a dynamic environment, Asghari and Shahabi [9] proposed a framework to maximize the platform profit and designed a dynamic planning and heuristic-based algorithm for solving the online task assignment problem in crowdsourced delivery. Finally, Li et al. [10] investigated the problem of dynamically planning delivery paths for individual deliverers and solved it quickly using distance nearest and earliest deadline heuristics and pruning strategies. In summary, we propose a crowdsourced delivery optimization model with maximum detour distance (CDOM-MDD) in a static environment, which is more reflective of real-world production scenarios. To solve the static CDOM-MDD model, a Lagrangian relaxation decomposition algorithm with repair mechanism (LRDRM) is presented. Additionally, an improved dynamic optimization algorithm is proposed to solve the dynamic CDOM-MDD model. The rest of this paper are structured as follows: Sect. 2 defines the CDOM-MDD problem and introduces its model. Section 3 proposes and details the LRDRM. Section 4 presents an improved dynamic optimization algorithm. In Sect. 5, experimental results are presented and analyzed. Finally, conclusions are drawn in Sect. 6.

2 Problem Definition

2.1 Model and Parameters Definition

Definition 1 (Task): Let a tuple ⟨p_s, d_s⟩ denote a task s, where p_s is the pickup point of task s and d_s is the delivery point of task s. Denote by S = {s | s = 1, 2, 3, ..., |S|} the set of all tasks over time.

Definition 2 (Crowdsourced deliverer): A crowdsourced deliverer w can be denoted by the tuple ⟨o_w, f_w, t_w^−, t_w^+, Q_w^max, v_w, γ_w⟩, and the set of all deliverers by W = {w | w = 0, 1, 2, ..., |W|}. Here o_w and f_w denote the starting and ending points of deliverer w, respectively; (t_w^−, t_w^+) is the time period in a day during which deliverer w is willing to accept tasks; Q_w^max is the maximum load of deliverer w; v_w is the delivery speed of deliverer w; and γ_w is the maximum detour distance of deliverer w. At time t = t_w^−, deliverer w is located at o_w and sends a request to the crowdsourcing platform to accept tasks, and at t = t_w^+, deliverer w must reach the destination f_w and no longer accepts tasks.

Definition 3 (Node): In a directed network graph G = (V, E), the set of all nodes is denoted by V = {1, 2, 3, ..., |V|} and the set of all edges by E = {(i, j) | i, j ∈ V}. The tuple ⟨t_i^−, t_i^+, q_i, Υ_i, loc_i^x, loc_i^y⟩ is used to denote a node


in the network G. The sets start, end, pick and delivery are the sets of deliverer origins, deliverer destinations, task pickup points and task delivery points, respectively, and the four sets satisfy start ∪ end ∪ pick ∪ delivery = V. Here (t_i^−, t_i^+) denotes the time window in which node i, i ∈ pick ∪ delivery, can be served by a deliverer; q_i denotes the demand at node i, i ∈ pick ∪ delivery; Υ_i denotes the compensation that node i pays to the deliverer; and (loc_i^x, loc_i^y) are the xy-coordinates of node i, i ∈ V, in the 2D plane.

Definition 4 (Profit): The total profit Z of the deliverers equals the sum of the compensation of all tasks minus the total travel cost and the total time cost.

Definition 5 (Detour distance): The detour distance of deliverer w from node i to node j is denoted by φ_ijw. It consists of three components: the vertical distance d⊥, the horizontal distance d‖, and the angular distance dθ. Denote the xy-coordinates of node i by (loc_x^i, loc_y^i), of the starting point of deliverer w by (loc_x^{ow}, loc_y^{ow}), and of the ending point of deliverer w by (loc_x^{fw}, loc_y^{fw}). The path from node i to node j is the vector L_ij = (loc_x^j − loc_x^i, loc_y^j − loc_y^i), and the path from the deliverer's starting point to the deliverer's ending point is the vector L_{ow fw} = (loc_x^{fw} − loc_x^{ow}, loc_y^{fw} − loc_y^{ow}). The horizontal distance is d‖ = min(l‖1, l‖2), the vertical distance is d⊥ = (l⊥1² + l⊥2²)/(l⊥1 + l⊥2), and the angular distance between path L_ij and path L_{ow fw} is dθ = |L_ij| sin θ for 0 ≤ θ ≤ π/2 and dθ = |L_ij| for π/2 ≤ θ ≤ π. Here l⊥1, l⊥2, l‖1, l‖2 and θ are calculated as follows:

l⊥1 = |loc_y^{ow} − loc_y^i|,   (1)

l⊥2 = |loc_y^{fw} − loc_y^j|,   (2)

l‖1 = |loc_x^{ow} − loc_x^i|,   (3)

l‖2 = |loc_x^{fw} − loc_x^j|,   (4)

θ = arccos( (L_ij · L_{ow fw}) / (|L_ij| |L_{ow fw}|) ).   (5)

Then φ_ijw is calculated from Eq. (6):

φ_ijw = 1 / (κ_1 d⊥ + κ_2 d‖ + κ_3 dθ).   (6)

In Eq. (6), κ_1, κ_2 and κ_3 are the weights of the three distances, set to 1, 1 and 2, respectively.
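For illustration, the following Python function computes the detour distance exactly as printed in Eqs. (1)–(6); the coordinate tuples, the default weights, and the example call are hypothetical.

```python
import math

def detour_distance(i, j, ow, fw, k1=1.0, k2=1.0, k3=2.0):
    """phi_ijw for arc (i, j) of deliverer w with start ow and end fw; points are (x, y) tuples."""
    l_perp1 = abs(ow[1] - i[1])                      # Eq. (1)
    l_perp2 = abs(fw[1] - j[1])                      # Eq. (2)
    l_par1  = abs(ow[0] - i[0])                      # Eq. (3)
    l_par2  = abs(fw[0] - j[0])                      # Eq. (4)
    d_par  = min(l_par1, l_par2)
    d_perp = (l_perp1**2 + l_perp2**2) / (l_perp1 + l_perp2) if (l_perp1 + l_perp2) > 0 else 0.0
    Lij = (j[0] - i[0], j[1] - i[1])
    Lwf = (fw[0] - ow[0], fw[1] - ow[1])
    norm_ij, norm_wf = math.hypot(*Lij), math.hypot(*Lwf)
    cos_t = (Lij[0] * Lwf[0] + Lij[1] * Lwf[1]) / (norm_ij * norm_wf)
    theta = math.acos(max(-1.0, min(1.0, cos_t)))    # Eq. (5)
    d_theta = norm_ij * math.sin(theta) if theta <= math.pi / 2 else norm_ij
    return 1.0 / (k1 * d_perp + k2 * d_par + k3 * d_theta)   # Eq. (6) as printed

# Hypothetical example: pickup (2, 3), delivery (8, 4), deliverer route (0, 0) -> (10, 5).
print(detour_distance((2, 3), (8, 4), (0, 0), (10, 5)))
```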


Definition 6 (CDOM-MDD Problem): Given a set of deliverers W and a set of tasks S, the objective of the CDOM-MDD problem is to maximize the total profit Z. Without loss of generality, this paper makes the following assumptions about the CDOM-MDD problem.

– The geographic coordinates of the start and end points of the deliverers do not coincide with the geographic coordinates of the tasks.
– The posting time of a task is equal to the earliest service time of the task pickup point.

2.2 Mathematical Model

The objective function Z consists of three components: 1) the total profit obtained by all deliverers; 2) the travel costs of all deliverers; 3) the waiting time cost and delayed delivery cost. The time cost function H_iw for a deliverer w visiting node i is defined in Eq. (7). The decision variable x_ijw equals 1 when arc (i, j) is traversed by deliverer w, and 0 otherwise.

H_iw = { ω_1 (t_i^− − a_iw), if a_iw < t_i^−;  0, if t_i^− ≤ a_iw ≤ t_i^+;  ω_2 (a_iw − t_i^+), if a_iw > t_i^+ }.   (7)

(MILP)  Z = max Σ_{w∈W} Σ_{i∈V} y_iw Υ_i − Σ_{w∈W} Σ_{i∈V} Σ_{j∈V} x_ijw d_ij − Σ_{w∈W} Σ_{i∈V} H_iw   (8)

subject to

Σ_{w∈W} Σ_{j∈V} x_ijw ≤ 1,  ∀i ∈ V   (9)

a_{ow,w} ≥ t_w^−,  a_{fw,w} ≤ t_w^+,  ∀w ∈ W   (10)

φ_ijw > γ_w → x_ijw = 0,  ∀i, j ∈ V, i ≠ j, w ∈ W   (11)

Σ_{j} x_ijw = Σ_{j} x_jiw = 0,  i ∈ start ∪ end, i ≠ o_w, i ≠ f_w, ∀w ∈ W   (12)

Σ_{j≠i} x_ijw − Σ_{j≠i} x_jiw = 0,  ∀w ∈ W, i ∈ pick ∪ delivery   (13)

Σ_{j≠i} x_ijw − Σ_{j≠i} x_jiw = 1,  ∀w ∈ W, i ∈ start   (14)

Σ_{j≠i} x_ijw − Σ_{j≠i} x_jiw = −1,  ∀w ∈ W, i ∈ end   (15)

Σ_{j≠i} x_ijw − Σ_{j≠n+i} x_{j,n+i,w} = 0,  ∀w ∈ W, i ∈ pick   (16)

x_ijw = 1 → q_iw^arrive + q_j = q_jw^arrive,  ∀w ∈ W, i, j ∈ V   (17)

q_iw^arrive ≤ Q_w^max,  ∀w ∈ W, i ∈ V   (18)

x_ijw = 1 → a_iw + τ_ijw + wait_t_iw = a_jw   (19)

wait_t_iw = max(0, t_i^− − a_iw),  ∀w ∈ W, i ∈ pick ∪ delivery   (20)

delay_t_iw = max(0, a_iw − t_i^+),  ∀w ∈ W, i ∈ pick ∪ delivery   (21)

t_i^− ≤ a_iw + wait_t_iw ≤ t_i^+,  ∀w ∈ W, i ∈ pick ∪ delivery   (22)

a_iw + τ_{i,n+i,w} ≤ a_{n+i,w},  ∀w ∈ W, i ∈ P_w   (23)

x_ijw ∈ {0, 1},  ∀i, j ∈ V, w ∈ W   (24)

a_iw ∈ R_+,  wait_t_i ∈ R_+,  delay_t_i ∈ R_+,  q_i^arrive ∈ R.   (25)



In the above model, Eq. (8) is the objective function of the CDOM-MDD problem, composed of three components: the total reward earned by all deliverers, the transportation cost, and the cost incurred by waiting or delayed delivery. Constraint (9) restricts each node i in the directed graph G to be visited by a deliverer at most once. Constraint (10) is the time window constraint for the deliverers. Constraint (11) states that the detour distance of deliverer w traveling from node i to node j must not exceed the deliverer's maximum detour distance. Since we assume that the pickup or delivery point of a task never shares a location with the start or end point of any deliverer, constraint (12) requires that a deliverer cannot visit the start and end points of other deliverers. Constraints (13)–(15) are the route balance constraints. Constraint (16) states that a deliverer visiting the pickup point of a task must also visit the corresponding delivery point. Constraint (17) is the load balance constraint, and constraint (18) is the maximum load constraint for each deliverer. Constraint (19) is the time balance constraint. Constraints (20)–(22) are the soft time window constraints for the tasks. Constraint (23) is the sequential pickup-and-delivery constraint. Constraints (24)–(25) define the ranges of the variables and parameters.
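As an illustration of how parts of this model can be written with Gurobi (the solver the paper uses for its subproblems), the sketch below builds the arc variables, constraint (9), the detour-distance restriction (11), and a reduced form of objective (8). All data containers and sizes are placeholders, and the remaining constraints (10), (12)–(25) are omitted; this is not the authors' implementation.

```python
import gurobipy as gp
from gurobipy import GRB

# Placeholder data: nodes V, deliverers W, distances d, detour distances phi, limits gamma.
V, W = range(6), range(2)
d = {(i, j): abs(i - j) for i in V for j in V}
phi = {(i, j, w): 1.0 for i in V for j in V for w in W}   # stand-in for Eq. (6) values
gamma = {w: 0.8 for w in W}
reward = {i: 5.0 for i in V}

m = gp.Model("cdom_mdd_sketch")
x = m.addVars(V, V, W, vtype=GRB.BINARY, name="x")        # arc (i, j) served by deliverer w
y = m.addVars(V, W, vtype=GRB.BINARY, name="y")           # node i served by deliverer w

# Constraint (9): every node is visited by at most one deliverer.
m.addConstrs((x.sum(i, "*", "*") <= 1 for i in V), name="visit_once")

# Constraint (11): arcs whose detour distance exceeds gamma_w are forbidden.
for i in V:
    for j in V:
        for w in W:
            if i != j and phi[i, j, w] > gamma[w]:
                x[i, j, w].ub = 0.0

# Part of objective (8): reward minus travel cost (the time costs H_iw are omitted here).
m.setObjective(gp.quicksum(reward[i] * y[i, w] for i in V for w in W)
               - gp.quicksum(d[i, j] * x[i, j, w] for i in V for j in V for w in W),
               GRB.MAXIMIZE)
m.optimize()
```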

3 Solution Approach Under Static Model

3.1 Lagrangian Relaxation Decomposition Algorithm

A Lagrangian relaxation algorithm relaxes the problem by removing intractable constraints and moving them into the objective function with Lagrangian multipliers. Lagrangian relaxation algorithms have been widely used to solve VRP and task assignment problems


[11–13]. First, we relax the coupling constraints that significantly affect solution speed by incorporating them into the objective function, creating the Lagrangian relaxation problem. We then decompose this problem into independent subproblems based on its properties. Constraint (9) is the only one that links the deliverers together and is thus referred to as the "coupling constraint". The Lagrangian multiplier vector λ ∈ R_+^{|V|} is introduced to transform constraint (9) into a penalty term in the objective function through the multipliers λ_i. The Lagrangian relaxation function is shown in Eq. (26):

L_LR(x, y, λ) = Σ_{w∈W} Σ_{i∈V} y_iw c_i − Σ_{w∈W} Σ_{i∈V} Σ_{j∈V} x_ijw d_ij − Σ_{w∈W} Σ_{i∈V} H_iw + Σ_{i∈V} λ_i (1 − Σ_{w∈W} y_iw).   (26)

Let SP(w) = Σ_{i∈V} (c_i − λ_i) y_iw − Σ_{i∈V} Σ_{j∈V} x_ijw d_ij − Σ_{i∈V} H_iw. Then the mathematical model of the relaxation problem L_LR can be formulated as follows:

(LR Model)  L_LR = max Σ_{w∈W} [ Σ_{i∈V} (c_i − λ_i) y_iw − Σ_{i∈V} Σ_{j∈V} x_ijw d_ij − Σ_{i∈V} H_iw ] + Σ_{i∈V} λ_i   (27)

s.t. (10)–(25).

The dual function of L_LR can be defined as

φ(λ) = max_{x,y} L_LR(x, y) = Σ_{w∈W} [ Σ_{i∈V} (c_i − λ_i) y_iw − Σ_{i∈V} Σ_{j∈V} x_ijw d_ij − Σ_{i∈V} H_iw ] + Σ_{i∈V} λ_i.   (28)

Then the dual of the original problem is

(Dual Model)  min_{λ ∈ R_+^{|V|}} φ(λ).   (29)

43

3.3 Subgradient Algorithm The most used and effective algorithm for solving the Lagrangian dual problem is the subgradient optimization algorithm. First, the initial subgradient is constructed according to the relaxed constraint (9) as in Eq. (30) and the step size θ (k) is constructed as in Eq. (31): W V ×1  (k) (k) s = y|V |w − 1 (30) w

θ (k)

ZUP (k) − ZLB (k) (k)   = β s(k) 

(31)

2

In Eq. (31), 0 ≤ β (k) ≤ 2, and generally take β (0) = 2. As LLR rises, β (k) remains unchanged. When LLR does not change within a given number of steps, half of it is taken. 3.4 LRDRM Algorithm Process The specific algorithmic steps of the LRDRM are as follows: Step1: Initialize k = 0, λ(k) = 0. Step2: Use Gurobi to solve each subproblem and obtain the objective function value of the subproblem, i.e., Obj(SP(w)). Step3: Solve the relaxation function L(k) (Eq. (27)) based on the subproblem objective LR function value Obj(SP(w)) and the Lagrange multiplier λ(k) . Step4: The subgradient (Eq. (30)) is updated according to the solution set of the decision variables obtained by solving the subproblem. Step5: For λ(k) , choose any sub-gradient s(k) , if the sub-gradient s(k) = 0, then obtain an optimal solution and stop iteration; otherwise, λ(k+1) = max{λ(k) + θ (k) s(k) , 0}, k = k + 1, repeat Step2–Step5.

4 Improved Dynamic Optimization Algorithm In the dynamic crowdsourcing distribution model, the release time of tasks is unknown in advance, i.e., the problem is a real-time scheduling problem. To solve this problem, the paper proposes an improved dynamic optimization algorithm that combines the LRDRM algorithm and RHC algorithm. The algorithm introduces symbols t, Tend , Tc , Th and period to represent the system time, system end time, program optimization time, rolling time domain duration and number of periods, respectively. The conventional RHC algorithm optimizes all tasks, which leads to slow solution speed. The proposed algorithm assigns newly added tasks to subproblems by partial constraints and performs scheduling optimization for these subproblems. Other subproblems without the newly added tasks do not need to be optimized again. This approach reduces computation time compared to global optimization. The steps of the improved dynamic optimization algorithm are as follows. Step1: Initialize t = 0, period = 0, the set of tasks NowTask = ∅ for the current time period, the set of uncompleted tasks UnfiniTask = ∅. Input W .

44

X. Feng et al.

Step2: ∀s, if t ≤ ts− < t + Th , NowTask ← s. Step3: If period < 1, call the optimization procedure for solving. Output the scheduling plan Solperiod for time period [t, t + Th ); otherwise proceed to Step4. Step4: ∀s, if task s satisfies the constraint (11) and (18) of w, optimization Subproblem SP(w). Output the scheduling plan Solperiod for time period [t, t + Th ). Step5: The unfinished task s up to time t + Th is obtained from Solperiod . s → UnfiniTask. Step6: If UnfiniTask = ∅, then NowTask = ∅, otherwise NowTask ← UnfiniTask. Step7: t = t + Th , period = period + 1. Step8: Cycle Step2–Step6 until t ≥ Tend .

5 Simulation Result and Comparisons 5.1 Experimental Setup This paper restructures the Solomn dataset by adding delivery node data and starting/ending points for deliverers. This paper use 8 test problems of varying sizes for experimentation. All algorithms are tested on personal computers equipped with an Intel I7 processor (3.2 GHz), 8G RAM, Win10 OS, python 3.7 and Gurobi 9.0 programming environments. The algorithms’ running time for each experiment is set to |V | × 20 seconds, and each problem instance is independently tested 10 times, with the average of results used for comparison. 5.2 Results and Comparison The table below displays the solution efficiency results of the traditional RHC algorithm and the improved RHC algorithm. The experimental results indicate that the improved RHC algorithm exhibits significantly lower solution times compared to the RHC algorithm, while both algorithms yield similar solution quality results (Table 1). Table 1. Comparison experimental results of improved RHC and RHC algorithm Problem Number of tasks 3

Number of workers 3

Improved RHC

RHC

Runtime (s)

Runtime (s)

Gap (%)

0.60

0.01

0.76

Gap (%) 0.01

6

8

7.21

0.01

6.69

0.02

5

10

52.89

0.06

58.13

0.04

10

15

259.12

0.11

388.01

0.33

20

20

313.05

0.83

407.17

0.56

30

40

691.96

1.24

831.61

2.30

50

60

1398.43

1.01

1703.18

1.81

60

80

1979.44

2.46

2248.05

5.01

Real-Time Crowdsourced Delivery Optimization

45

To analyze the influence of Th on the experimental results, this study tested different values of Th for each experiment under the 60 × 80 scale test problem. The results are presented in the following table, which indicates a positive correlation between smaller Th values and lower average waiting time, average delay time, and average solution time (Table 2). Table 2. Effect of parameter Th on experimental results Th (min)

Improved RHC Average waiting time (s)

Average delay time (s)

Average solving time (s)

1

23.1

20

3

72

77.5

10.13

5

152.6

167.8

59.48

10

220.4

260.8

89.65

15

376.6

381.4

268.33

4.89

6 Conclusion By incorporating a maximum detour distance constraint, the proposed CDOM-MDD model is better suited to real-world delivery scenarios. The LRDRM algorithm is developed to solve the static CDOM-MDD problem by analyzing its properties. For the dynamic CDOM-MDD problem, we introduce an improved rolling horizon control (RHC) algorithm that reduces the solution time compared to traditional RHC algorithms. Our approach has shown promising results in improving the efficiency of crowdsourced delivery. This study provides valuable insights for researchers and practitioners in logistics management and planning. Acknowledgement. This research was supported by the National Natural Science Foundation of China (61963022 and 62173169) and the Basic Research Key Project of Yunnan Province (202201AS070030).

References 1. Buldeo Rai, H., Verlinde, S., Macharis, C.: Who is interested in a crowdsourced last mile? A segmentation of attitudinal profiles. Travel Behav. Soc. 22, 22–31 (2021). https://doi.org/10. 1016/j.tbs.2020.08.004 2. Borsenberger, C.: The sharing economy and the “Uberization” phenomenon: what impacts on the economy in general and for the delivery operators in particular? In: Crew, M., Parcu, P.L., Brennan, T. (eds.) The Changing Postal and Delivery Sector. TREP, pp. 191–203. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-46046-8_12

46

X. Feng et al.

3. Carbone, V., Rouquet, A., Roussat, C.: A typology of logistics at work in collaborative consumption. IJPDLM 48, 570–585 (2018). https://doi.org/10.1108/IJPDLM-11-2017-0355 4. Dantzig, G.B., Ramser, J.H.: The truck dispatching problem. Manag. Sci. 6, 80–91 (1959). https://doi.org/10.1287/mnsc.6.1.80 5. Alnaggar, A., Gzara, F., Bookbinder, J.H.: Crowdsourced delivery: a review of platforms and academic literature. Omega 98, 102139 (2021). https://doi.org/10.1016/j.omega.2019.102139 6. To, H., Shahabi, C., Kazemi, L.: A server-assigned spatial crowdsourcing framework, vol. 1 (2015) 7. Deng, D., Shahabi, C., Demiryurek, U.: Maximizing the number of worker’s self-selected tasks in spatial crowdsourcing. In: Knoblock, C.A., Schneider, M., Kröger, P., Krumm, J., Widmayer, P. (eds.) 21st SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL 2013, Orlando, FL, USA, 5–8 November 2013, pp. 314– 323. ACM (2013) 8. Deng, D., Shahabi, C., Zhu, L.: Task matching and scheduling for multiple workers in spatial crowdsourcing. In: Bao, J., et al. (eds.) Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, Bellevue, WA, USA, 3–6 November 2015, pp. 21:1–21:10. ACM (2015) 9. Asghari, M., Shahabi, C.: On on-line task assignment in spatial crowdsourcing. In: Nie, J.-Y., et al. (eds.) 2017 IEEE International Conference on Big Data (IEEE BigData 2017), Boston, MA, USA, 11–14 December 2017, pp. 395–404. IEEE Computer Society (2017) 10. Li, Y., Yiu, M.L., Xu, W.: Oriented online route recommendation for spatial crowdsourcing task workers. In: Claramunt, C., et al. (eds.) SSTD 2015. LNCS, vol. 9239, pp. 137–156. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-22363-6_8 11. Imai, A., Nishimura, E., Current, J.: A Lagrangian relaxation-based heuristic for the vehicle routing with full container load. Eur. J. Oper. Res. 176, 87–105 (2007). https://doi.org/10. 1016/j.ejor.2005.06.044 12. Chen, C.-H., Yan, S., Chen, M.: Applying Lagrangian relaxation-based algorithms for airline coordinated flight scheduling problems. Comput. Ind. Eng. 59, 398–410 (2010). https://doi. org/10.1016/j.cie.2010.05.012 13. Wang, Y., Song, R., He, S., Song, Z., Chi, J.: Optimizing train routing problem in a multistation high-speed railway hub by a Lagrangian relaxation approach. IEEE Access 10, 61992–62010 (2022). https://doi.org/10.1109/ACCESS.2022.3181815

Improving SHADE with a Linear Reduction P Value and a Random Jumping Strategy Yanyun Zhang1 , Guangyu Chen2(B) , and Li Cheng1(B) 1 School of Computer Science and Information Engineering, Hubei University, Wuhan, China

{yyzhang,chengli}@hubu.edu.cn

2 Informatization Office, China University of Geosciences, Wuhan 430074, China

[email protected]

Abstract. DE/current-to-pbest/1 is a novel mutation scheme introduced in the JADE algorithm, and is inherited by the famous JADE variant, SHADE. Thereafter, it has continued to be used in various JADE and SHADE variants. Considering the parameter ‘p’ in mutation operator DE/current-to-pbest/1 controls the greediness of mutation directly, this paper adopts a linear reduction strategy of ‘p’ during the evolution process under the framework of SHADE algorithm (LRP-SHADE), and then helps to further trade off the exploration and exploitation abilities. Meanwhile, a random jumping strategy is also adopted, assisting the population to jump out the local optima and keep evolving. Finally, to verify the effectiveness of the proposed LRP-SHADE algorithm, groups of experiments have been conducted on benchmark CEC2014, including both a step-by-step validation experiment for strategies and performance comparisons between LRP-SHADE and other peer algorithms. According to the experimental results, the efficiency and effectiveness of the LRP-SHADE algorithm have been confirmed. Keywords: Differential Evolution · Mutation Operation · Evolutionary Computation · Jumping Strategy

1 Introduction

Due to its simple structure and efficiency in dealing with complex optimization problems, Differential Evolution (DE) has emerged as one of the most popular optimizers since its inception in 1997 [18], and it has achieved noticeable progress over the last two decades [5, 15]. DE has also been successfully applied in many fields, such as neural networks [6], image processing [13], and space trajectory optimization [24]. Every search-based optimization algorithm faces the same problem: how to balance exploration ability and exploitation ability [11]. To better balance the exploration and exploitation capabilities of DE, significant work has been carried out at the level of the algorithm, the evolution operators, and the control parameters. At the algorithm level, properly hybridizing DE with local search (LS) techniques can compensate for its deficiency in local exploitation capability and finally achieve



effective and efficient DE variants [14]. Peng et al. [16] adopted both the BFGS algorithm as the local search operator and a periodic reinitialization to balance the exploration and exploitation of DE. Considering the characteristic of extreme non-linearity of optimization problem in the interplanetary trajectory design, Zuo et al. [25] proposed a case learning-based Differential Evolution algorithm (CLDE), and further combined a global version of CLDE and a local version of CLDE together to find optimizing solutions of the interplanetary trajectory design problem. As for enhancing the performance of DE through evolution operators, the combination of different strategies or adaptive strategy selection have often been considered. For example, Gong et al. [8] proposed a cheap surrogate model-based multi-operator search strategy for evolutionary optimization instead of choosing an operator according to probability. What’s more, through the multi-population framework, the combination of different strategies can be easily achieved. Wu et al. [22] realized a new DE variant, namely MPEDE, under a multi-population framework, which consisted of three mutation strategies, i.e., current-to-pbest/1, current-to-rand/1 and rand/1. And in 2019, Ma et al. [12] summarized the published techniques related to the multi-population methods in nature-inspired optimization algorithms. The control parameter setting of DE also has an impact on exploration and exploitation. In the work of Zhang et al. [23], a JADE algorithm was proposed. JADE developed a new mutation operator DE/current-to-pbest/1, and adopted a control parameter adaptation strategy based on updating a normal distribution from which the Cr values were sampled and a Cauchy distribution from which the F values were sampled. And under the framework of JADE, two novel variants have been developed in succession, namely SHADE [19] and L-SHADE [20]. Compared to JADE, SHADE added two memory archives M Cr and M F to store historical mean values of the normal distribution and historical the location parameters of the Cauchy distribution respectively. And then the Cr values and F values were sampled according to the corresponding distributions, which were updated by choosing the mean value and location parameter from M Cr and M F respectively. As for L-SHADE, it enhanced the performance of SHADE with a linear reduction population size. Afterwards, a series of works have been achieved based on SHADE and L-SHADE, such as SPS-L-SHADE-EIG [9], L-SHADE-Epsin [3], L-SHADE-cnEpsin [2], L-SHADE-RSP [1], COLSHADE [10] and so on. Although plenty of works have been done based on SHADE as mentioned above, the effect of the parameter ‘p’ introduced within the mutation operator DE/current-topbest/1 has been rarely discussed. Actually, the parameter ‘p’ controls the greediness of the mutation operator, and then trades off the exploration and exploitation [20]. So, this paper treats ‘p’ as another control parameter of DE and adopts a linear reduction strategy of ‘p’ during the evolution process under the framework of SHADE, namely LRP-SHADE, to further improve the performance. What’s more, in view of the situation that the population falls into the local optima, a random jumping strategy has also been adopted here to help the population keep evolving. To verify the proposed algorithm, experiments have been conducted on CEC2014 compared with several peer algorithms. The rest of the paper is organized as following. Section 1 gives quick look at the classic DE algorithm. 
The details of the proposed LRP-SHADE are described in Sect. 2.



Section 3 shows the experimental results. And Sect. 4 summarizes the whole work of this paper.

2 LRP-SHADE To further balance the exploration and exploitation capabilities of SHADE, an LRPSHADE algorithm is proposed with a linear reduction ‘p’ value in the mutation operation DE/current-to-pbest/1, as well as a random jumping strategy to avoid stagnation. This section will briefly introduce the framework of SHADE1.1 [20] at first and describe the improved work in detail later. 2.1 SHADE DE/current-to-pbest/1 with External Archive. SHADE inherited the mutation operation DE/current-to-pbest/1 with external archive from JADE algorithm, which is shown as shown below.   (1) Vi = Xi + Fi · Xpbest − Xi + Fi · (Xr1 − Xr2 ) To alleviate the greediness, an extra parameter ‘p’ was introduced in DE/currentto-pbest/1 on the basis of DE/current-to-best/1, where p ∈ [0, 1] and the recommended setting is 0.11 [20]. Specifically, in DE/current-to-best/1 the best individual of the population Xbest is directly adopted to the mutation, while in DE/current-to-pbest/1, Xpbest is an individual randomly chosen from the top NP × p superior individuals of the current generation. For Xr1 and Xr2 , the former is randomly chosen from the population P, and the latter is randomly chosen from P ∪ A, where A is an external archive storing inferior solutions. Meanwhile, Xpbest , Xr1 and Xr2 are different from each other. Adaptation Strategies for F and Cr. In SHADE, the parameters F and Cr are set at the level of individual, which means each member of the population has its own scaling factor Fi and crossover rate Cr i , where Fi obeys the Cauchy distribution, denoted by Fi = randc(MF,ri , 0.1), and Cr i obeys the normal distribution, denoted by Cr i = randn(MCr,ri , 0.1). MF,ri and MCr,ri are the ri − th members of the historical memory archives MCr and MF respectively, and ri is randomly chosen from [1, H] where H denotes the size of MCr and MF . All of the members in MF and MCr are initialized as 0.5 at the beginning of evolution. To update MF and MCr , SHADE firstly adopts two sets SF and SCr to archive the Fi and Cr i of the survival individuals from selection. And then calculate the weighted Lehmer mean values μF and μCr according to the members of sets SF and SCr respectively. And over generations, the members of MF and MCr will be updated by μF and μCr respectively in turn. The specific calculation formulas are as follows. μF = meanWL (SF )
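The following sketch shows one way the DE/current-to-pbest/1 mutation with an external archive could be written in Python. The assumption that the population array is ordered from best to worst fitness, as well as all names and parameter values, are illustrative rather than taken from SHADE's reference implementation.

```python
import numpy as np

def current_to_pbest_mutation(X, i, F, p, archive, rng):
    """Donor vector V_i for DE/current-to-pbest/1 with an external archive A.

    X       : population (NP x D), assumed ordered from best to worst fitness
    archive : list of previously discarded (inferior) solutions, each of shape (D,)
    """
    NP = len(X)
    top = max(1, int(round(NP * p)))          # X_pbest: one of the top NP*p individuals at random
    pbest = X[rng.integers(top)]
    r1 = rng.choice([k for k in range(NP) if k != i])     # X_r1 drawn from the population P
    pool = list(X) + list(archive)            # X_r2 drawn from P union A
    r2 = rng.integers(len(pool))
    while r2 == i or r2 == r1:                # keep the chosen vectors distinct
        r2 = rng.integers(len(pool))
    return X[i] + F * (pbest - X[i]) + F * (X[r1] - pool[r2])

rng = np.random.default_rng(0)
X = rng.random((20, 5))                       # population assumed sorted best-to-worst by fitness
donor = current_to_pbest_mutation(X, i=3, F=0.5, p=0.11, archive=[], rng=rng)
```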

(2)

μCr = meanWL (SCr )

(3)

50

Y. Zhang et al.

where meanWL ( ) refers to the weighted Lehmer mean operator, and the calculation formula is as following. |S| meanWL (S) = k=1 |S|

wk · S 2

k=1 wk

·S

fk wk = |S| k=1 fk

(4) (5)

where fk represents the improvement of the trial vector compared to the target vector corresponding to the k-th member of SF and SCr . 2.2 Linear Reduction P Value with a Random Jumping Strategy According to the formula of DE/current-to-pbest/1, when p = 1, the mutation operation tends to be the DE/current-to-rand/1, which is more helpful to explore within the entire search space. When the value of ‘p’ is small enough, the formula degenerates into DE/current-to-best/1, which is more likely to guide the population to the optimal region. Therefore, the parameter ‘p’ controls the greediness of the mutation operation DE/current-to-pbest/1, and then trades off the exploration and exploitation. Considering this property of the parameter ‘p’, a linear reduction strategy of p is adopted under the framework of SHADE. It is expected that a large ‘p’ value would make for the exploration at the early stage and then guide the algorithm find as more promising region as possible. As the evolution progresses, it is also believed that a gradually decreased ‘p’ might useful for the exploitation at the later stage. The ‘p’ value reduces as following. p=(

pmin − pmax ) × NFEs + pmax MaxNFEs

(6)

where pmax and pmin are the maximum and minimum of the ‘p’ respectively, and MaxNFEs is the given maximum number of evaluations. Here, pmax = 0.5 and pmin = 0.05, and for details, it will be discussed in Sect. 3. In view of the situation that the population may falls into the local optima and thus affect the evolving, the proposed algorithm first makes judgement according to the ratio of the optima’s fitness values of the offspring and the parent populations fbest,g+1 /fbest,g , and then a random jumping strategy is adopted to help jumping out of the scope of the local optima. Iftheratiofbest,g+1 /fbest,g isgreaterthanarandomnumberbetween0and1,itislikelythat theoptimafoundsofarisupdatedslowly.Andifthishappensovercountmax = 8generations, itisjudgedthatthepopulationmayfallsintothelocaloptima.Consideringthegreat‘p’value will alleviate the situation in the early stage of evolutionary process, the judgement will be conducted in the three-quarters of the evolutionary process (0.5 × MaxNFEs < NFEs < 0.75 × MaxNFEs). The judgement will not be adopted in the last quarter to avoid affecting theconvergencespeedofthepopulation.Oncethepopulationistrappedinalocaloptimum, a new individual will be generated randomly within the whole search space, and replace the worst one among the population. The pseudocode for LRP-SHADE is shown below.


3 Experiment

To analyze the performance of LRP-SHADE, comparative experiments are conducted on CEC2014 between LRP-SHADE and SHADE [19], jDE [4], CoDE [21], SaDE [17], MPEDE [22] and UMOEAs [7]. CEC2014 consists of 30 test problems in total, including Unimodal Functions (F1–F3), Simple Multi-modal Functions (F4–F16), Hybrid Functions (F17–F22), and Composite Functions (F23–F30). Each algorithm is tested on all of the problems in CEC2014 with the dimension D set to 30, 50 and 100 respectively, and the search space of each optimization problem is [−100, 100]^D. Running an algorithm on a problem at a certain dimension is called a round of experiment, and the maximum number of evaluations is MaxNFEs = 10000 × D. Each algorithm is run for 51 rounds on each problem, and the results are presented in the form of the function error value (FEV), namely the difference f(X) − f(X*), where f(X) denotes the best objective function value found so far and f(X*) is the actual global optimum of the problem. When the FEV is less than 10^-8, the global optimal solution is considered to have been found. At this point, the FEV is


recorded as 0. Finally, the 51 results of each algorithm on each function are averaged. To better analyze the results, Wilcoxon's rank-sum nonparametric test in MATLAB and Friedman's nonparametric test in the KEEL software are adopted. Groups of experiments have been done on the CEC2014 benchmarks. Section 3.1 shows the performance of the improved strategies in LRP-SHADE. The results of LRP-SHADE and all of the compared algorithms are analyzed in Sect. 3.2. Section 3.3 illustrates the convergence speeds of the different algorithms.

3.1 Strategy Verification

To show the performance of the linear reduction 'p' and random jumping strategies, a variant of SHADE with only the linear reduction 'p' strategy, denoted as p-SHADE, and LRP-SHADE are compared with SHADE on CEC2014. The statistical results obtained with Wilcoxon's rank-sum test are shown in Table 1. The symbols '+', '−' and '=' indicate that the performance of the corresponding algorithm is significantly superior to, significantly inferior to, or equivalent to SHADE, and the numbers in the table denote the number of functions matching each of the three cases among the 30 CEC2014 benchmark functions.

Table 1. Wilcoxon statistical results of strategy verification experiment vs. SHADE

              D = 30            D = 50            D = 100
              +    −    =       +    −    =       +    −    =
p-SHADE       7    2    21      6    3    21      9    6    15
LRP-SHADE     11   4    15      11   5    14      13   6    11

According to the results between SHADE and p-SHADE, it can be found that when D = 30 or D = 50, the linear reduction 'p' strategy causes significant impacts on about one-third of the functions, and when D = 100, the proportion of affected functions rises to one-half. Meanwhile, among these affected functions, the performance of p-SHADE is superior to that of SHADE in most cases. That is to say, the linear reduction 'p' strategy has a significant impact on the performance of SHADE and most of the impacts are positive. Considering the statistical results of both p-SHADE and LRP-SHADE, LRP-SHADE provides a further improvement in overall performance over p-SHADE. Therefore, it can be inferred that the random jumping strategy also plays a positive role in enhancing SHADE. From the data of these two variants under the symbol '−', there is no obvious change in quantity. This also shows that the two strategies are compatible with each other.
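The paper performs the significance tests with MATLAB's rank-sum routine and KEEL; purely as an illustration, an equivalent '+/−/=' tally could be produced with SciPy as sketched below. The function name and data layout (one list of 51 FEV values per benchmark function) are assumptions, not part of the paper.

```python
from scipy.stats import ranksums

def tally_vs_shade(fev_variant, fev_shade, alpha=0.05):
    """Count wins ('+'), losses ('-') and ties ('=') of a SHADE variant against
    SHADE over the 30 CEC2014 functions; each entry holds 51 FEV values."""
    wins = losses = ties = 0
    for a, b in zip(fev_variant, fev_shade):
        _, p = ranksums(a, b)
        if p >= alpha:
            ties += 1
        elif sum(a) < sum(b):          # smaller FEV is better (mean as tie-break)
            wins += 1
        else:
            losses += 1
    return wins, losses, ties
```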


3.2 LRP-SHADE Performance Verification

To show the performance of LRP-SHADE and the other peer algorithms on all CEC2014 benchmark functions, the Friedman statistical analysis of all experimental results is shown in Table 2. The analysis takes into account the results of each algorithm on all the benchmark functions and then ranks each algorithm. The smaller the ranking value is, the better the comprehensive performance of the algorithm is. From Table 2, LRP-SHADE performed best on all dimensions of the test problems.

Table 2. Friedman's average ranking values of all algorithms

Algorithms    D = 30    D = 50    D = 100
jDE           4.6167    4.3167    4.4
CoDE          3.8667    4.4167    4.7833
SaDE          6.1333    6.9333    6.1167
MPEDE         3.3667    3.6667    3.8833
UMOEAs        3.6833    3.2167    2.8167
SHADE         3.5       2.8833    3.4667
LRP-SHADE     2.8333    2.5667    2.5333

In summary, by comparing all algorithms with LRP-SHADE on the three dimensions of CEC2014, LRP-SHADE could achieve better performance in general, which could also fully verify the effectiveness of the algorithm. 3.3 Convergence Graph To better illustrate the convergence of each algorithm, multiple observation points were set during the operation of each algorithm. When progressing to the preset number of evaluations, the current optimal solution was recorded, and then the entire convergence curve of each algorithm was obtained. These observation points were 0.01, 0.02, 0.03, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0 × MaxNFEs, respectively. In order to show the convergences of all types of functions, the convergence curves of 9 functions covering four types were drawn, including F1, F4, F10, F16, F17, F22, F23, F27 and F30. Among them, F1, F4, F17 and F23 were the first functions of the Unimodal Functions (F1–F3), Simple Multi-modal Functions (F4–F16), Hybrid Functions (F17–F22) and Composite Functions (F23–F30) respectively, while F16, F22 and F30 were the last ones. F10 and F27 were functions in the middle of Simple Multi-modal Functions and Composite Functions respectively.
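A minimal sketch of how such a convergence curve can be sampled, assuming a helper best_so_far_at(nfes) that returns the best objective value found after a given number of evaluations (a hypothetical name, not from the paper):

```python
OBS_FRACTIONS = [0.01, 0.02, 0.03, 0.05, 0.1, 0.2, 0.3,
                 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

def sample_convergence(best_so_far_at, max_nfes):
    """Record the best-so-far value at each preset observation point."""
    return [(f, best_so_far_at(int(f * max_nfes))) for f in OBS_FRACTIONS]
```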

Fig. 1. Examples of convergence curves on CEC2014 (D = 30); panels (a)–(i): F1, F4, F10, F16, F17, F22, F23, F27, F30.

From Figs. 1, 2 and 3, it can be observed that the improved strategies did not significantly change the convergence trends compared with the original SHADE in most cases, for example on F1, F10, F16, F22 and F30. For F4 and F10, especially in the 50-dimensional and 100-dimensional cases, LRP-SHADE could achieve further convergence in the late stage. For F23 and F27, it is obvious that when the evolution was about halfway through, populations that had already reached a plateau restarted to converge further, and the restart point coincided with the trigger time of the random jumping strategy. Therefore, on the whole, the proposed strategies help to improve the convergence. Meanwhile, compared with the other algorithms, LRP-SHADE had a relatively faster convergence speed while achieving promising search results in general.

Fig. 2. Examples of convergence curves on CEC2014 (D = 50); panels (a)–(i): F1, F4, F10, F16, F17, F22, F23, F27, F30.

Fig. 3. Examples of convergence curves on CEC2014 (D = 100); panels (a)–(i): F1, F4, F10, F16, F17, F22, F23, F27, F30.


4 Conclusion

Considering the role played by the parameter 'p' in the mutation scheme DE/current-to-pbest/1 and the necessity of helping the population escape from local optima, a SHADE variant, namely LRP-SHADE, has been proposed with a linear reduction 'p' strategy and a random jumping strategy. According to the experimental results on the CEC2014 benchmark functions, the two improved strategies are compatible with each other and together help improve the performance of SHADE. Meanwhile, the effectiveness and efficiency of LRP-SHADE have also been verified through comparisons with the other six peer algorithms. However, the analysis of convergence shows that the random jumping strategy works well for some functions, for example F23 and F27, and needs to be further investigated for others. Moreover, whether the linear reduction 'p' strategy can be used as a general strategy in other algorithms using DE/current-to-pbest/1 is also worth further discussion.

References 1. Akhmedova, S., Stanovov, V., Semenkin, E.: LSHADE algorithm with a rank-based selective pressure strategy for the circular antenna array design problem. In: ICINCO (1), pp. 159–165 (2018) 2. Awad, N.H., Ali, M.Z., Suganthan, P.N.: Ensemble sinusoidal differential covariance matrix adaptation with Euclidean neighborhood for solving CEC2017 benchmark problems. In: 2017 IEEE Congress on Evolutionary Computation (CEC), pp. 372–379. IEEE (2017) 3. Awad, N.H., Ali, M.Z., Suganthan, P.N., Reynolds, R.G.: An ensemble sinusoidal parameter adaptation incorporated with L-SHADE for solving CEC2014 benchmark problems. In: 2016 IEEE Congress on Evolutionary Computation (CEC), pp. 2958–2965. IEEE (2016) 4. Brest, J., Greiner, S., Boskovic, B., Mernik, M., Zumer, V.: Self-adapting control parameters in differential evolution: a comparative study on numerical benchmark problems. IEEE Trans. Evol. Comput. 10(6), 646–657 (2006) 5. Das, S., Mullick, S.S., Suganthan, P.N.: Recent advances in differential evolution an updated survey. Swarm Evol. Comput. 27, 1–30 (2016) 6. Dragoi, E.N., Curteanu, S., Galaction, A.I., Cascaval, D.: Optimization methodology based on neural networks and self-adaptive differential evolution algorithm applied to an aerobic fermentation process. Appl. Soft Comput. 13(1), 222–238 (2013) 7. Elsayed, S.M., Sarker, R.A., Essam, D.L., Hamza, N.M.: Testing united multi-operator evolutionary algorithms on the CEC2014 real-parameter numerical optimization. In: 2014 IEEE Congress on Evolutionary Computation (CEC), pp. 1650–1657. IEEE (2014) 8. Gong, W., Zhou, A., Cai, Z.: A multi-operator search strategy based on cheap surrogate models for evolutionary optimization. IEEE Trans. Evol. Comput. 19(5), 746–758 (2015) 9. Guo, S.M., Tsai, J.S.H., Yang, C.C., Hsu, P.H.: A self-optimization approach for L-SHADE incorporated with eigenvector-based crossover and successful-parent selecting framework on CEC 2015 benchmark set. In: 2015 IEEE Congress on Evolutionary Computation (CEC), pp. 1003–1010. IEEE (2015) 10. Gurrola-Ramos, J., Hernàndez-Aguirre, A., Dalmau-Cedeño, O.: COLSHADE for real-world single-objective constrained optimization problems. In: 2020 IEEE Congress on Evolutionary Computation (CEC), pp. 1–8. IEEE (2020) 11. Jerebic, J., et al.: A novel direct measure of exploration and exploitation based on attraction basins. Expert Syst. Appl. 167, 114353 (2021)


12. Ma, H., Shen, S., Yu, M., Yang, Z., Fei, M., Zhou, H.: Multi-population techniques in nature inspired optimization algorithms: a comprehensive survey. Swarm Evol. Comput. 44, 365–387 (2019) 13. Mesejo, P., Ugolotti, R., Di Cunto, F., Giacobini, M., Cagnoni, S.: Automatic hippocampus localization in histological images using differential evolution-based deformable models. Pattern Recogn. Lett. 34(3), 299–307 (2013) 14. Noman, N., Iba, H.: Accelerating differential evolution using an adaptive local search. IEEE Trans. Evol. Comput. 12(1), 107–125 (2008) 15. Pant, M., Zaheer, H., Garcia-Hernandez, L., Abraham, A., et al.: Differential evolution: a review of more than two decades of research. Eng. Appl. Artif. Intell. 90, 103479 (2020) 16. Peng, L., Zhang, Y., Dai, G., Wang, M.: Memetic differential evolution with an improved contraction criterion. Comput. Intell. Neurosci. 2017, 1395025 (2017) 17. Qin, A.K., Suganthan, P.N.: Self-adaptive differential evolution algorithm for numerical optimization. In: 2005 IEEE Congress on Evolutionary Computation, vol. 2, pp. 1785–1791. IEEE (2005) 18. Storn, R., Price, K.: Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 11(4), 341–359 (1997) 19. Tanabe, R., Fukunaga, A.: Success-history based parameter adaptation for differential evolution. In: 2013 IEEE Congress on Evolutionary Computation, pp. 71–78. IEEE (2013) 20. Tanabe, R., Fukunaga, A.S.: Improving the search performance of shade using linear population size reduction. In: 2014 IEEE Congress on evolutionary Computation (CEC), pp. 1658–1665. IEEE (2014) 21. Wang, Y., Cai, Z., Zhang, Q.: Differential evolution with composite trial vector generation strategies and control parameters. IEEE Trans. Evol. Comput. 15(1), 55–66 (2011) 22. Wu, G., Mallipeddi, R., Suganthan, P.N., Wang, R., Chen, H.: Differential evolution with multi-population based ensemble of mutation strategies. Inf. Sci. 329, 329–345 (2016) 23. Zhang, J., Sanderson, A.C.: JADE: adaptive differential evolution with optional external archive. IEEE Trans. Evol. Comput. 13(5), 945–958 (2009) 24. Zuo, M., Dai, G., Peng, L., Tang, Z., Gong, D., Wang, Q.: A differential evolution algorithm with the guided movement for population and its application to interplanetary transfer trajectory design. Eng. Appl. Artif. Intell. 110, 104727 (2022) 25. Zuo, M., Dai, G., Peng, L., Wang, M., Liu, Z., Chen, C.: A case learning-based differential evolution algorithm for global optimization of interplanetary trajectory design. Appl. Soft Comput. 94, 106451 (2020)

Detecting Community Structure in Complex Networks with Backbone Guided Search Algorithm Rong-Qiang Zeng1,2(B) , Li-Yuan Xue3 , and Matthieu Basseur4 1 Department of Computer Science, Chengdu University of Information Technology,

Chengdu 610225, Sichuan, China [email protected] 2 Chengdu Documentation and Information Center, Chinese Academy of Sciences, Chengdu 610041, Sichuan, China 3 James Watt School of Engineering, University of Glasgow, Glasgow G12 8QQ, U.K. 4 LERIA, Université d’Angers, Boulevard Lavoisier, 49045 Cedex 01 Angers, France [email protected]

Abstract. Detecting communities is of great importance in complex networks, which is very hard and not yet satisfactorily solved. This paper investigates a backbone guided search algorithm in order to optimize the modularity of the community structure in complex networks. Our proposed algorithm consists of two main procedures, which are local search and backbone guided search. When the local search procedure can not improve the modularity function any more, the backbone guided search procedure is applied for further improvements. The computational results indicate the effectiveness of the proposed algorithm compared with the classical methods. Keywords: complex networks · community detection · modularity function · local search · backbone guided search

1 Introduction

Many real-world complex systems can usually be represented as networks, such as social relationship networks, the World Wide Web, protein-protein interaction networks, and so on. Generally, complex networks can be modeled as graphs, where the nodes denote the objects and the edges denote the interactions among these objects. One of the most relevant features of such graphs is community structure, which plays an important role in understanding the intrinsic properties of networks. In order to tackle this problem, many meaningful quality functions and useful approaches have been proposed in the literature, including graph partitioning [9], hierarchical clustering [15], spectral algorithms [1, 18], metaheuristics [6, 7, 12], etc. One of the most popular quality functions is the modularity proposed by Newman and Girvan in [21], which is based on the idea that a random network is not expected


to have a community structure. Now, the modularity is widely accepted by the scientific community. Given a simple undirected graph G = (V, E), where V is the set of vertices and E is the set of undirected edges, suppose the vertices are divided into communities such that vertex v belongs to community C_v. The modularity is defined as follows [21]:

Q = \frac{1}{2m} \sum_{vw} \left[ A_{vw} - \frac{k_v k_w}{2m} \right] \delta(C_v, C_w)    (1)

where A is the adjacency matrix of graph G: A_vw = 1 if node v is connected to node w, otherwise A_vw = 0. The δ function δ(i, j) is equal to 1 if i = j and 0 otherwise. The degree k_v of a vertex v is defined as k_v = \sum_w A_{wv}, and the number of edges in the graph is m = \sum_{vw} A_{vw} / 2. Furthermore, the modularity function can be represented in a simpler way, which is formulated below [21]:

Q = \sum_i (e_{ii} - a_i^2)    (2)

where i runs over all communities in the graph, and e_{ij} and a_i are respectively defined as follows [21]:

e_{ij} = \frac{1}{2m} \sum_{vw} A_{vw} \, \delta(C_v, i) \, \delta(C_w, j)    (3)

which is the fraction of edges that join vertices in community i to vertices in community j, and

a_i = \frac{1}{2m} \sum_v k_v \, \delta(C_v, i)    (4)

which is the fraction of the ends of edges that are attached to vertices in community i. Actually, maximizing the modularity has been proven to be NP-hard in [5]. For such an NP-hard problem, exact algorithms usually cannot solve it in polynomial time, and even small networks may require considerable computational time. Therefore, heuristic and metaheuristic methods constitute a natural and useful approach for tackling this problem. There exist numerous algorithms for optimizing the modularity in the literature, such as greedy techniques [20], simulated annealing [13], extremal optimization [3], spectral optimization [22], etc. In this paper, we present a backbone guided search algorithm to optimize the modularity of the community structure, which is a hybrid metaheuristic integrating backbone guided search techniques for further improvements. Our proposed algorithm follows a general framework composed of two main procedures: local search and backbone guided search. The local search procedure iteratively divides the communities into smaller ones until the modularity function value cannot be improved any more. Then, the backbone guided search procedure is employed to further improve the modularity while maintaining a number of fixed community structures. Experimental results


on six networks from the literature indicate the proposed algorithm is very competitive in comparison with the classical methods. The rest of this paper is organized as follows. In Sect. 2, we briefly review the literature which optimizes the modularity of large networks with heuristic and metaheuristic algorithms. Then, we present the ingredients of our Backbone Guided Search (BGS) algorithm which includes a basic local search procedure and backbone guided search procedure for further improvements in Sect. 3. Afterwards, our experimental results are reported in Sect. 4. Finally, the conclusions and perspectives are given in Sect. 5.
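As a minimal illustration of Eqs. (1)–(4), the modularity of a given partition can be computed directly from the adjacency matrix; the following NumPy sketch (with a toy example) is illustrative and not the authors' implementation.

```python
import numpy as np

def modularity(A, labels):
    """Newman-Girvan modularity Q (Eq. 1) of an undirected graph, given its
    adjacency matrix A and one community label per vertex."""
    k = A.sum(axis=1)                  # vertex degrees k_v
    m = A.sum() / 2.0                  # number of edges
    q = 0.0
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        e_cc = A[np.ix_(idx, idx)].sum() / (2.0 * m)   # fraction of edge ends inside c (Eq. 3)
        a_c = k[idx].sum() / (2.0 * m)                 # fraction of ends attached to c (Eq. 4)
        q += e_cc - a_c ** 2                           # Eq. (2)
    return q

# toy example: a triangle plus one pendant vertex, split into two communities
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(modularity(A, np.array([0, 0, 0, 1])))
```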

2 Related Works In recent years, many metaheuristics have been proposed to optimize the modularity in order to obtain good partitions in complex networks. In the following paragraphs, we present the studies which maximize the modularity with metaheuristics. M. Sales-Pardo et al. [13] investigate a simulated annealing algorithm in order to find the modularity of a network, which is analogous to finding the ground-state energy of a spin system. Moreover, they point out stochastic network models give rise to modular networks due to fluctuations. Specifically, the experimental results indicate that both the random graphs and the scale-free networks have modularity numerically and analytically. A. Arenas et al. [2] present a tabu search algorithm with a multiple resolution procedure, which allows the optimization of the modularity process to go deep into the structure. This approach consists of rescaling the topology by defining a new network from the original one and providing each node with a self-loop of the same magnitude r. Then, the new network not only presents the same characteristics as the original network in terms of connectivity, but also allows the search of modules at different topological scales. Zhipeng Lü et al. [17] propose an iterated tabu search algorithm for optimizing the modularity of community structure in complex networks. The proposed algorithm employs a postrefinement procedure to further optimize the objective function after the basic optimization can not improve the modularity any more. Experimental results on seven benchmark instances show their algorithm is highly effective. N. C. V. Nascimento et al. [19] present a heuristic algorithm combining greedy randomized adaptive search procedure with path relinking technique, in order to compute the clustering with maximum modularity in a unweighted or weighted graph. In this algorithm, a class of {0, 1} matrices is introduced that characterizes the family of clusterings in a graph, and a distance function is given that enables us to define an l-neighborhood local search. Computational experiments indicate that their proposed algorithm consistently produces better quality solutions than other heuristics from the literature in a set of artificially generated graphs and some well known benchmark instances. I. Hamid et al. [14] introduce a heuristic algorithm that identifies the seed nodes, marks them as starting communities and uses an objective function to calculate objective scores of the nodes, in order to detect the communities in large networks, which effectively decreases layout complexity of visualization due to its ability to construct larger community size. The extensive experiments on different real-world complex networks of different sizes validate its effectiveness in reducing complexity of and visualization of big networks.


3 Backbone Guided Search Algorithm

The backbone guided search (BGS) algorithm is designed to optimize the modularity of the community structure in complex networks, and it is composed of two main components: local search and backbone guided search. The general scheme of our proposed algorithm is described in Algorithm 1, and the main components are detailed in the following sections.

Algorithm 1 Pseudo-code of the backbone guided search algorithm
01: Input: the network adjacency matrix A
02: Output: the best value of the modularity function
03: P = {x1, . . . , xp} ← Random Initialization (P)
04: repeat /* Section 3.1 */
05:   xi ← Local Search (xi)
06: until a stop criterion is met
07: Backbone Guided Search: /* Section 3.2 */
08: repeat:
09:   randomly choose an individual xi from P and a vertex w
10:   find w ∈ Ci in m individuals and w ∈ Cj in the remaining p − m individuals
11:   ΔT(w, Ci, Cj) ← Σ_{k=1}^{m} ΔQ(w, Ci^k, Cj^k)
12:   if ΔT(w, Ci, Cj) ≥ 0 then
13:     move w to Cj in the m individuals
14:   end if
15:   update {x1, . . . , xp}
16: until a stop criterion is met

Starting from an initial random population, the BGS algorithm uses the fast incremental neighborhood evaluation technique given in [17] to carry out the local search procedure efficiently. Afterwards, the backbone guided search procedure is employed to improve the quality of the entire population, after which the best value of the modularity function is obtained.

3.1 Local Search Procedure

In the local search procedure, we divide the whole network into two communities, and each smaller community is further divided into two communities. This process is repeated until the modularity cannot be improved, as done in [17]. Generally, one community C_k is divided into two communities C_i and C_j, and all the vertices belonging to C_k are randomly assigned to C_i and C_j. After the initialization, we introduce a special data structure named move value, used in [17], to compute the incremental value of the modularity function for each possible move of the current solution. Let C_i and C_j be two communities and w be a vertex from C_i or C_j. We assume that w ∈ C_i; the corresponding change obtained by moving vertex w from C_i to C_j can be computed


as follows [17]:

\Delta Q(w, C_i, C_j) = \frac{k_w^j - k_w^i}{m} + \frac{k_w (a_i - a_j)}{m} - \frac{k_w^2}{2m^2}    (5)

where k_w^i and k_w^j are respectively the number of edges connecting vertex w to the other vertices in communities C_i and C_j. On the other hand, for any vertex v in community C_i, we can also obtain the updated ΔQ value ΔQ'(v, C_i, C_j) with the formula below [17]:

\Delta Q'(v, C_i, C_j) = \Delta Q(v, C_i, C_j) - \left( \frac{k_w^2}{m^2} - \frac{2A_{wv}}{m} \right)    (6)

Correspondingly, for any vertex v in community C_j, there are two possible cases for the updated ΔQ value ΔQ(v, C_j, C_i). When considering the same vertex (v = w), ΔQ(v, C_j, C_i) is updated as follows [17]:

\Delta Q'(v, C_j, C_i) = -\Delta Q(v, C_i, C_j)    (7)

When considering two different vertices v and w (v ≠ w), ΔQ(v, C_j, C_i) is updated as follows [17]:

\Delta Q'(v, C_j, C_i) = \Delta Q(v, C_j, C_i) + \left( \frac{k_w^2}{m^2} - \frac{2A_{wv}}{m} \right)    (8)

According to Eqs. (5), (6), (7) and (8), the simple local search procedure chooses the best move in the current neighborhood at each step until the modularity does not improve.

3.2 Backbone Guided Search Procedure

The terminology backbone originates from solving the well-known satisfiability problem (SAT). Wilbaut et al. [24] present an effective backbone-based heuristic algorithm for the multi-dimensional knapsack problem. With a similar strategy, Yang Wang et al. [23] propose a backbone guided tabu search algorithm for the unconstrained binary quadratic programming problem. As suggested in [23], we consider a variable to be strongly determined if changing its assigned value in a high-quality solution will cause the quality of that solution to deteriorate significantly. Essentially, we just borrow the terminology backbone from the SAT literature as a vehicle for naming our procedure. After the local search procedure, some solutions in the population have similar community structures, while the others are different. First, we randomly select a solution x_i and randomly select a vertex w in the graph. Without loss of generality, we assume w ∈ C_i in x_i. Then we can find that vertex w belongs to the same community in m solutions, while the community structures of the remaining p − m solutions are different; we assume that vertex w ∈ C_j (i ≠ j) in these p − m solutions.


Afterwards, we move the vertex w from C_i to C_j in the m solutions. Based on Eq. (5), we can compute the total change by the formula below:

\Delta T(w, C_i, C_j) = \sum_{k=1}^{m} \Delta Q(w, C_i^k, C_j^k)    (9)

where ΔQ(w, C_i^k, C_j^k) denotes the ΔQ value of x_k. If ΔT(w, C_i, C_j) ≥ 0, we keep this move in the m solutions; otherwise the move is canceled. We continue this process until the modularity does not improve.
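A minimal sketch of the single-vertex move gain of Eq. (5) and the backbone acceptance test of Eq. (9), assuming each solution is stored as a vector of community labels. The function names are hypothetical and the move gain follows the reconstruction given above.

```python
import numpy as np

def delta_q(A, labels, w, ci, cj):
    """Modularity gain (Eq. 5) when vertex w is moved from community ci to cj."""
    k = A.sum(axis=1)
    m = A.sum() / 2.0
    k_w_i = A[w, labels == ci].sum()              # edges from w into ci
    k_w_j = A[w, labels == cj].sum()              # edges from w into cj
    a_i = k[labels == ci].sum() / (2.0 * m)
    a_j = k[labels == cj].sum() / (2.0 * m)
    return (k_w_j - k_w_i) / m + k[w] * (a_i - a_j) / m - k[w] ** 2 / (2.0 * m ** 2)

def delta_t(A, solutions, w, ci, cj):
    """Total gain over the m solutions that place w in ci (Eq. 9); the move is
    kept only when the sum is non-negative."""
    total = sum(delta_q(A, labels, w, ci, cj) for labels in solutions)
    return total, total >= 0
```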

4 Experimental Results

In order to evaluate the efficiency of the backbone guided search algorithm, we carry out experiments on six networks of different sizes from the literature, which include the Zachary karate club network (Zachary) [17], a randomly generated network (GN) [17], a jazz musician collaborations network (Jazz) [11], the USA air network in 1997 (USAir97) [14], a metabolic network for the nematode (C. Elegans) [16] and a university e-mail network (Email) [4]. The exact information of these networks is presented in Table 1. Our backbone guided search algorithm is programmed in C++ and compiled with the Dev-C++ 5.0 compiler on a PC running Windows 10 with a 2.50 GHz Core CPU and 4 GB RAM. In this section, we provide the computational results on the six networks, which are generated by five heuristic algorithms. The information about these five algorithms is described in Table 2 and the computational results are summarized in Table 3.

Table 1. The information of the networks.

Network           Vertices (n)   Edges (m)   Description
Zachary [17]      34             78          Zachary karate club network
GN [17]           128            1024        Randomly generated network
Jazz [11]         198            2742        Jazz musician collaborations network
USAir97 [14]      332            2126        USA air network in 1997
C. Elegans [16]   453            2052        Metabolic network
Email [4]         1133           5451        A university e-mail network

In Table 3, the columns from the second to the fifth give the results obtained by K-Mean Clustering (K-Mean) [10], Spectral Clustering (Spectral) [10], Newman’s fast algorithm (NF) [20] and the extremal optimization algorithm (EO) [8] respectively. The last column gives the result obtained by BGS. From Table 3, we observe that five best results are obtained by BGS. It evidently outperforms the other four approaches on five networks. Moreover, the most significant result is achieved on the network Email.


Table 2. The information of the algorithms.

Algorithm       Description
K-Mean [10]     K-Mean clustering algorithm
Spectral [10]   Spectral clustering algorithm
NF [20]         Newman's fast algorithm
EO [8]          Extremal optimization algorithm
BGS             Backbone guided search algorithm

Actually, the BGS algorithm adopts the incremental neighborhood evaluation technique, which saves a considerable amount of computational effort, especially on the large networks such as C. Elegans and Email. However, the selection of the vertex w usually determines whether the modularity can be further improved in the backbone guided search procedure. When the selected vertex w is located between two communities, there is often a good chance of obtaining a good value of the modularity, whereas when the selected vertex w is located near the center of one community, it is difficult to improve the modularity. This is particularly visible on the network Jazz, where the result obtained by BGS is worse than the result obtained by EO.

Table 3. The computational results on six networks.

              K-Mean    Spectral   NF        EO        BGS
Zachary       0.1255    0.3600     0.3810    0.4188    0.4196
GN            0.3052    0.1830     0.2370    0.2987    0.3076
Jazz          0.3765    0.2899     0.4379    0.4452    0.4382
USAir97       0.2876    0.2933     0.3034    0.3578    0.3672
C. Elegans    0.1156    0.2209     0.4001    0.4342    0.4460
Email         0.3682    0.3306     0.4796    0.5738    0.5807

5 Conclusions In this paper, we have investigated a backbone guided search algorithm for maximizing the modularity of the community structure in complex networks. The proposed algorithm is mainly composed of the local search procedure and the backbone guided search procedure, in which the modularity function value is evidently improved. We have carried out the experiments on six networks, and the experimental results indicate the BGS algorithm is more effective than the classical methods. The performance analysis provides several future research directions. The first possibility is to employ tabu search to optimize the initial population instead of local search


as done in [17]. With the fast incremental neighborhood evaluation technique, this may improve the modularity more efficiently. Second, it is worth further studying the backbone guided search mechanism applied to the modularity, which has the potential to jump out of local optima more quickly. Finally, it should be very interesting to integrate other quality functions with the modularity into metaheuristics for detecting community structure in complex networks, since higher values of the modularity do not in general guarantee better partitions [10].

Acknowledgments. The work in this paper was supported by the Fundamental Research Funds for the Central Universities (Grant No. KYTZ202257) and by the West Light Foundation of the Chinese Academy of Sciences (Grant No. Y4C0011006).

References 1. Alves, N.A.: Unveiling community structures in weighted networks. Phys. Rev. E 76(3), 036101 (2007) 2. Arenas, A., Fernandez, A., Gomez, S.: Analysis of the structure of complex networks at different resolution levels. New J. Phys. 10(5), 053039 (2008) 3. Boettcher, S., Percus, A.G.: Optimization with external dynamics. Phys. Rev. Lett. 86, 5211– 5214 (2001) 4. Boguna, M., Pastor-Satorras, R., Arenas, A.: Models of social networks based on social distance attachment. Phys. Rev. E 70, 056122 (2004) 5. Brandes, U., et al.: Maximizing modularity is hard, p. 0608255. arXiv:physics (2006) 6. Cai, Q., Gong, M., Ma, L., Ruan, S., Yuan, F., Jiao, L.: Greedy discrete particle swarm optimization for large scale social network clustering. Inf. Sci. 316, 503–516 (2015) 7. Cai, Q., Gong, M., Ruan, S., Miao, Q., Du, H.: Network structural balance based on evolutionary multiobjective optimization: a two-step approach. IEEE Trans. Evol. Comput. 19(6), 903–916 (2015) 8. Duch, J., Arenas, A.: Community detection in complex networks using extremal optimization. Phys. Rev. E 72, 027104 (2005) 9. Flake, G.W., Lawrence, S., Giles, C.L., Coetzee, F.M.: Self-organization and identification of web communities. IEEE Comput. Soc. 35, 66–71 (2002) 10. Fortunato, S.: Community detection in graphs. Phys. Rep. 486, 75–174 (2010) 11. Gleiser, P.M., Danon, L.: Community structure in Jazz. Adv. Comp. Syst. 6(4), 565–573 (2003) 12. Gong, M., Cai, Q., Chen, X., Ma, L.: Complex network clustering by multiobjective discrete particle swarm optimization based on decomposition. IEEE Trans. Evol. Comput. 18(1), 82–97 (2014) 13. Guimera, R., Sales-Pardo, M., Amaral, L.A.N.: Modularity from fluctuations in random graphs and complex networks. Phys. Rev. E 70, 025101 (2004) 14. Hamid, I., Wu, Y., Nawaz, Q., Zhao, R.: A fast heuristic detection algorithm for visualizing structure of large community. J. Comput. Sci. 25, 280–288 (2018) 15. Hastie, L., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning. Springer, Berlin (2001). https://doi.org/10.1007/978-0-387-84858-7 16. Jeong, H., Tombor, B., Albert, R., Oltvai, Z.N., Barab’asi, A.L.: The large-scale organization of metabolic networks. Nature 407, 651–654 (2000) 17. Lü, Z.P., Huang, W.Q.: Iterated Tabu search for identifying community structure in complex networks. Phys. Rev. E 80, 026130 (2009)


18. Mitrovic, M., Tadic, B.: Spectral and dynamical properties in classes of sparse networks with mesoscopic inhomogeneities. Phys. Rev. E 80(2), 026123 (2009) 19. Nascimento, M.C.V., Pitsoulis, L.: Community detection by modularity maximization using grasp with path relinking. Comput. Oper. Res. 40, 3121–3131 (2013) 20. Newman, M.E.J.: Fast algorithm for detecting community structure in networks. Phys. Rev. E 69(6), 066133 (2004) 21. Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 026113 (2004) 22. Richardson, T., Mucha, P.J., Porter, M.A.: Spectral tripartitioning of networks. Phys. Rev. E 80(3), 036111 (2009) 23. Wang, Y., Lü, Z.P., Glover, F., Hao, J.K.: Backbone guided Tabu search for solving the UBQP problem. J. Heuristics 19, 679–695 (2013) 24. Wilbaut, C., Salhi, S., Hanafi, S.: An iterative variable-based fixation heuristic for the 0-1 multidimensional Knapsack problem. Eur. J. Oper. Res. 199(2), 339–348 (2009)

Swarm Intelligence and Optimization

Large-Scale Multi-objective Evolutionary Algorithms Based on Adaptive Immune-Inspirated Weiwei Zhang1 , Sanxing Wang1 , Chao Wang2(B) , Sheng Cui3 , Yongxin Feng4 , Jia Ding4 , and Meng Li5 1 School of Computer and Communication Engineering, Zhengzhou University of Light

Industry, Zhengzhou 450000, China 2 Technology R&D Center, Jilin Tobacco Industry Co., Ltd., Changchun 130031, China

[email protected]

3 Liuzhou Cigarette Factory, Tobacco Guangxi Industry Co., Ltd., Liuzhou 545005, China 4 Raw Materials Department, Tobacco Hebei Industrial Co., Ltd., Shijiazhuang 050051, China 5 School of Food and Bioengineering, Zhengzhou University of Light Industry,

Zhengzhou 450001, China

Abstract. This paper proposes an adaptive immune-inspired algorithm to tackle the issues of insufficient diversity and local optima in large-scale multi-objective optimization problems. The algorithm utilizes an immune multi-objective evolutionary algorithm as its framework and adaptively selects between two different antibody generation strategies based on the concentration of high-quality antibodies. One approach utilizes the proportional cloning operator to generate offspring, which ensures convergence speed and population diversity and prevents the algorithm from getting trapped in local optima. The other approach introduces a competitive learning strategy to guide individuals in the population towards the correct direction. Additionally, the proposed algorithm employs a shift-based density estimation strategy to determine the antibody status. Experimental results demonstrate that the proposed algorithm outperforms five state-of-the-art multi-objective evolutionary algorithms on large-scale multi-objective optimization problems with up to 500 decision variables.

Keywords: Large-scale multi-objective optimization · proportional cloning · adaptive immunity · competitive learning

1 Introduction

Multi-objective optimization problems (MOPs) are commonly seen in real-world applications [1], such as industrial scheduling, bioinformatics, fighter vehicle control, and artificial intelligence, to name a few. While mathematical programming methods can quickly converge to a single optimum in a high-dimensional decision space, population-based evolutionary algorithms are better at obtaining a set of optimal solutions among multiple conflicting objectives. Multiple conflicting objectives are involved in MOPs to be


optimized simultaneously. Among them, multi-objective genetic algorithms, differential evolutionary algorithms, swarm algorithms and multi-objective immune algorithms have been of interest to many scholars for their good convergence, simple computation, and fewer parameter settings. However, in practical engineering applications, such as space mission design [2], feature selection [3], and reverse engineering of gene regulation networks [4], it often needs to deal with optimization problems that involve more than one objective and thousands or even millions of decision variables at the same time. Existing MOEAs have been well assessed on MOPs with a small number of decision variables, but their performance degenerates dramatically on MOPs with large numbers of decision variables. It is mainly attributed to the “curse of dimensionality” [5], where the volume, as well as complexity, of the search space will increase exponentially as the number of decision variables increases linearly. To address this problem, existing scholars have attempted to improve the original MOP algorithms used to solve LSMOPs. In the third generation of the cooperative coevolution differential evolution algorithm (CCGDE3) [6], the differential evolution algorithm introduces the cooperative coevolution framework (CC). It decomposes the decision variables into multiple groups, and each group of decision variables undergoes cooperative evolution. In the large-scale multiobjective evolutionary algorithm (LMEA) based on variable clustering [7], decision variables are classified into two categories, convergent variables, and diversity variables, and then alternate iterations are performed to optimize the different types of variables using genetic evolutionary algorithms or differential evolutionary algorithms. LSMOPs are transformed into small-scale optimization problems using the idea of problem transformation in directional sampling-assisted large-scale evolutionary multi-objective optimization (LMOEA-DS) [8] and solved optimally using traditional genetic algorithms. A social learning strategy is used in the Social Learning-based particle swarm optimization (SLPSO) algorithm [9], where each dimension of each particle is learned from any particle that is better than it and the corresponding dimension of the average position of the population. In the large-scale optimization algorithm based on competing particle swarms (CSO) [10], a competitive mechanism is used in which the loser learns from the successor in the generation of offspring to update the position and velocity of the particles. Besides, generative adversarial networks and distributional adversarial networks are utilized to generate offspring approximating the true PS by learning the probability distributions of current solutions. All these improved methods address to some extent the problem that LSMOPs are too large in space, difficult to search, and prone to fall into local optima. Therefore, combining techniques for dealing with high-dimensional decision spaces with multi-objective evolutionary algorithms is a straightforward and effective approach to solving LSMOPs. In addition, nature-inspired algorithms, i.e. immuneinspired algorithms, with their uncommon convergence capability and ability to locate multiple local optima, also have significant advantages in dealing with large-scale multiobjective problems. Thus, integrating the two would have great potential to deal with the curse of dimensionality in LSMOP. 
In this paper, competitive learning is combined with an immune heuristic algorithm. It can improve the global search capability in large-scale decision spaces. Furthermore, the advantage of immune multi-objective evolutionary algorithms is that they can increase the speed of convergence, and the disadvantage is that they can


reduce the diversity of the population to some extent. Therefore, it is a crucial issue to find the Pareto optimal set of solutions in the decision space and thus increase the diversity of solutions in the objective space. This requires the algorithm to overcome the drawback of premature convergence and to have strong local search capability. To solve the above problems, we propose an adaptive immunization strategy in the framework of immune evolutionary algorithms and incorporate proportional cloning operators to regulate the population distribution and avoid falling into local optima. At the same time, multi-objective optimization based on shift-based density estimation (SDE) [11] is combined with CSO, and the winners and losers involved are redefined. To verify the effectiveness of the algorithm, experiments are conducted using the LSMOP test functions, and the simulation results are compared with existing algorithms in the literature. The main contributions of this paper are summarized as follows:

(1) The immune-inspired algorithm is improved and a proportional cloning strategy is used so that the population adaptively chooses between two different antibody production strategies under different circumstances. This ensures the convergence speed of the algorithm while taking into account the diversity of the population and avoiding the algorithm falling into a local optimum.

(2) A competitive learning strategy is introduced, and losers and winners are redefined according to the degree of antibody incentive. This ensures that individuals search in the right direction within the population while the population converges.

(3) The shift-based density estimation (SDE) strategy is applied in this algorithm as the excitation degree of the antibody. This alleviates the conflict between multiple objectives and thus gradually improves the quality of the solutions.

2 Related Works

In this section, the mathematical definition of large-scale multi-objective optimization is first introduced. Then, the immune-inspired algorithm and the competing particle swarm algorithm are presented respectively.

2.1 Large-Scale Multi-objective Optimization

Generally, the mathematical formulation of a multi-objective optimization problem (MOP) can be represented as:

\text{Minimize:}\; F(x) = (f_1(x), f_2(x), \cdots, f_M(x))^T    (1)

where x = (x_1, \cdots, x_D) ∈ Ω is the decision variable, D is the dimension of the decision space Ω, and M is the number of objectives. Instead of a single optimal solution, there is a set of trade-off solutions in multi-objective optimization, called the Pareto-optimal front (PF) in the objective space; the corresponding solutions in the decision space are denoted as the Pareto-optimal set (PS). A problem is called a large-scale multi-objective optimization problem (LSMOP) when the dimension of the decision space D is large, normally over 100 [12].


2.2 Immune Optimization Algorithm

Inspired by the clonal selection principle, immune-inspired algorithms have been proposed to tackle complex optimization problems. To solve multi-objective optimization problems, an immune-inspired algorithm involves three main operations, namely cloning [7], mutation [13], and selection [14], where the antibody concentration indicates the diversity of the antibody population. According to the clonal selection principle, antibodies with higher antibody-antigen affinity are capable of producing multiple clones. There is a variety of options to perform clonal selection in real-world applications; among them, the proportional cloning operation [15, 16] is one of the most popular. Assume that P = {x_1, x_2, \cdots, x_N} is a population on which to conduct the cloning operator, where N denotes the number of antibodies. The proportional cloning operation T^C is defined as follows:

T^C(x_1, x_2, \cdots, x_N) = \{T^C(x_1), T^C(x_2), \cdots, T^C(x_N)\} = \{\underbrace{(x_1, x_1, \cdots, x_1)}_{q_1}, \underbrace{(x_2, x_2, \cdots, x_2)}_{q_2}, \cdots, \underbrace{(x_N, x_N, \cdots, x_N)}_{q_N}\}    (2)

where T^C(x_j) = q_j × x_j, j = 1, \cdots, N, and q_j indicates the number of clones of x_j.

2.3 Competitive Swarm Optimization

Competitive swarm optimization (CSO) is a relatively new swarm algorithm that differs from PSO and has been shown to be effective in solving large-scale single-objective problems in recent years. In CSO, two particles are randomly selected at a time, and the velocity of the particle with the worse fitness value is updated based on the position of the particle with the better fitness value. The velocity and position update methods are described in Eqs. (3) and (4), respectively.

v_{l,k}(t + 1) = R_1(k, t)\, v_{l,k}(t) + R_2(k, t)\,(x_{w,k}(t) - x_{l,k}(t)) + \varphi R_3(k, t)\,(\bar{x}_k(t) - x_{l,k}(t))    (3)

x_{l,k}(t + 1) = x_{l,k}(t) + v_{l,k}(t + 1)    (4)

where x_{l,k}(t), v_{l,k}(t) and x_{w,k}(t), v_{w,k}(t) are the positions and velocities of the loser and the winner in the k-th round of competition in the t-th generation, respectively, N is the population size, and R_1, R_2, R_3 are three uniformly distributed random values in [0, 1]. \bar{x}_k(t) is the mean position of all particles in the swarm and \varphi is a parameter controlling the influence of \bar{x}_k(t).
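A minimal Python sketch of one CSO generation following Eqs. (3)–(4); names such as cso_step are illustrative and the fitness is assumed to be minimized.

```python
import numpy as np

rng = np.random.default_rng(0)

def cso_step(X, V, fitness, phi=0.1):
    """One CSO generation: particles are paired at random, and the loser of each
    pair updates its velocity and position by learning from the winner (Eqs. 3-4)."""
    n, d = X.shape
    x_bar = X.mean(axis=0)                          # mean position of the swarm
    order = rng.permutation(n)
    for a, b in zip(order[::2], order[1::2]):
        w, l = (a, b) if fitness[a] < fitness[b] else (b, a)   # winner / loser
        r1, r2, r3 = rng.random((3, d))
        V[l] = r1 * V[l] + r2 * (X[w] - X[l]) + phi * r3 * (x_bar - X[l])
        X[l] = X[l] + V[l]
    return X, V
```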


3 Proposed Method

This section presents an adaptive immunization large-scale multi-objective evolutionary algorithm (LMOIA). The algorithm is designed to combine the original immunization algorithm with an adaptive mechanism that dynamically maintains the concentration of high-quality antibodies. To improve the local search capability, the adaptive mechanism utilizes proportional cloning. Additionally, a competitive learning update strategy is introduced to preserve the diversity of the population.

3.1 The Framework of LMOIA

LMOIA follows a simple procedure, which is detailed in Algorithm 1. It begins by randomly initializing two populations and a set of uniformly distributed reference vectors. The populations then undergo iterative evolution (steps 1–2). In each generation, the concentration of quality antibodies is first calculated (step 4). Note that a quality antibody is defined as a non-dominated solution in the population, and the concentration NC represents the proportion of the whole population accounted for by the non-dominated solutions. Next, a set of offspring solutions Q is generated by our adaptive immunization operation, and the resulting offspring population is combined with the current population (step 5). The adaptive immunization operation uses proportional cloning to further improve the local search capability, and a competitive learning update strategy is introduced to maintain the diversity of the population. Finally, the environmental selection strategy is used to update the combined population (step 6). The process is repeated until the algorithm meets the end condition.
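Since the Algorithm 1 listing is not reproduced in this extraction, the main loop described above can be summarized schematically as follows; all operator names are placeholders for the procedures detailed in Sects. 3.2–3.5, not the authors' code.

```python
def lmoia(population, reference_vectors, max_fes, n_threshold,
          adaptive_offspring, environmental_selection, count_nondominated):
    """Schematic LMOIA main loop: compute the quality-antibody concentration NC,
    generate offspring adaptively, merge, and apply environmental selection."""
    fes = 0
    while fes < max_fes:
        nc = count_nondominated(population) / len(population)    # concentration NC
        offspring = adaptive_offspring(population, nc, n_threshold)
        fes += len(offspring)
        # population and offspring are assumed to be plain Python lists here
        population = environmental_selection(population + offspring,
                                             reference_vectors)
    return population
```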

3.2 Adaptive Immunization Strategy In this paper, a new adaptive immunization operation is proposed, as shown in Algorithm 2, which aims to dynamically maintain the concentration of high-quality antibodies in this population in order to improve the overall solution quality. Our proposed strategy consists of two main parts, the first focusing on enhancing convergence, and the second


on maintaining diversity. Firstly, the shift-based density estimation (SDE) is calculated as the incentive degree of the antibodies, as shown in Eqs. (5) and (6), and the antibodies are then ranked based on the magnitude of the incentive degree.

Fit(x_j, P) = \min\big(d(x_j, x_1), d(x_j, x_2), \cdots, d(x_j, x_N)\big)    (5)

d(x_j, x_n) = \sqrt{\sum_{i=1}^{M} \big(\max\{0, f_i(x_j) - f_i(x_n)\}\big)^2}    (6)

where x_j and x_n are the j-th and n-th individuals in the contemporary population P, respectively. Then, the adaptive selection of updated offspring is based on the concentration of quality antibodies. Specifically, if NC falls below the threshold value NT, we consider that there are too few quality antibodies and the number of high-quality antibodies needs to be increased. In this case, we select high-quality antibodies from the population for proportional cloning. On the other hand, if the concentration of quality antibodies is above NT, we use a competitive particle population to renew the population and maintain diversity.
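A minimal sketch of the SDE incentive of Eqs. (5)–(6), written from the reconstruction above; excluding an individual's zero distance to itself is an added assumption, since otherwise the minimum would be trivially zero.

```python
import numpy as np

def sde_fitness(F):
    """Shift-based density estimation: F is an (N, M) array of objective values;
    the incentive of x_j is its minimum shifted distance to the other members."""
    n = len(F)
    fit = np.empty(n)
    for j in range(n):
        diff = np.maximum(0.0, F[j] - F)           # max{0, f_i(x_j) - f_i(x_n)}
        dist = np.sqrt((diff ** 2).sum(axis=1))    # shifted distance to each x_n
        dist[j] = np.inf                           # exclude the individual itself
        fit[j] = dist.min()
    return fit
```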

The motivation for this adaptive immunization strategy is to prevent the population from stagnating at a local optimum, which is particularly relevant in high-dimensional search spaces. The proposed strategy involves selecting high-quality antibodies as clones to facilitate population convergence, followed by renewing weaker individuals with a competitive particle population for diversity maintenance when NC becomes too large. This approach is expected to maintain a balance between population diversity and convergence.

3.3 Immune-Inspired Strategy

In order to improve the ability of the population to converge and to jump out of local optima easily, antibodies with a high degree of excitation are selected for proportional


cloning. Specifically, we regulate the concentration of high-quality antibodies in the population through proportional cloning. When the concentration is too small, i.e. the number of non-dominated solutions is small, the population tends to search in only a few directions, which is very detrimental to the global search. Proportional cloning uses Eq. (2), with the number of clones defined as follows:

q_j = \left\lceil C_n \times \frac{cd(x_j, P)}{\sum_{k=1}^{N} cd(x_k, P)} \right\rceil    (7)

where C_n denotes the clone scale and cd(x_j, P) denotes the crowding degree of antibody x_j in the objective space. Note that the size of the population changes dynamically from generation to generation, so we take C_n = NT × N to dynamically maintain the number of quality antibodies. This mechanism for dynamically maintaining high-quality antibodies is of great interest in preventing populations from stagnating at local optima.

3.4 Competitive Learning Strategy

In the LMOIA algorithm, the competitive learning strategy is introduced with the aim of increasing the global search capability of the population so as to improve its diversity. Since two mutually non-dominated solutions can exist in multi-objective optimization [17], the winners and losers in CSO need to be redefined. Specifically, we divide the population equally into two subpopulations P_winner and P_loser based on the SDE incentive, where all individuals in subpopulation P_winner have better incentive values than those in subpopulation P_loser, i.e. all individuals in P_winner are winners and all individuals in P_loser are losers. In addition, this paper simplifies the update method in CSO as shown in Eqs. (8) and (9).

v = x_{P_{winner}} - x_{P_{loser}}    (8)

x_{P_{loser}} = x_{P_{loser}} + R * v    (9)

where x_{P_{winner}} ∈ P_winner, x_{P_{loser}} ∈ P_loser, and R is a random number in [0, 1].

3.5 Environment Selection

This paper uses the angle-penalized distance (APD) in the environmental selection to assess the quality of the solutions [18]. This environmental selection strategy first associates the particles in the population P with the reference vectors. Then, based on the angle between a particle and the reference vectors, its closest reference vector is obtained, and finally the particle with the smallest APD is selected from all particles associated with the same reference vector. Specifically, the convergence criterion is measured in APD by the distance between the candidate solution and the ideal point, and the diversity criterion is measured by the acute angle between each candidate solution and the reference vector. The APD is given in Eq. (10) to assess the quality of a solution.

APD(x, w) = \left( 1 + M \times \left( \frac{FE}{MaxFE} \right)^{\alpha} \times \frac{\langle f(x), w \rangle}{\min_{s \in W, s \neq w} \langle s, w \rangle} \right) \times \| f(x) \|    (10)

where f(x) denotes the objective vector of x, M is the number of objectives, FE and MaxFE are the current and maximum numbers of evaluations respectively, α is the penalty parameter, W is the set of reference vectors, and ⟨s, w⟩ represents the angle between vectors s and w.
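A minimal sketch of the APD of Eq. (10); the default α = 2.0 is an assumption (the paper does not state its value), and W is taken to be the set of reference vectors.

```python
import numpy as np

def apd(f, w, W, fe, max_fe, M, alpha=2.0):
    """Angle-penalised distance (Eq. 10) of objective vector f with respect to
    its associated reference vector w; W holds all reference vectors."""
    def angle(a, b):
        cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        return np.arccos(np.clip(cos, -1.0, 1.0))
    theta = angle(f, w)                                    # <f(x), w>
    others = [angle(s, w) for s in W if not np.array_equal(s, w)]
    gamma = min(others) if others else np.pi               # min angle to neighbours
    penalty = M * (fe / max_fe) ** alpha * theta / gamma
    return (1.0 + penalty) * np.linalg.norm(f)
```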

4 Experimental Studies In this section, a series of experiments were conducted to empirically investigate the performance of the proposed adaptive immunization algorithm. The first part aims to verify the effectiveness of the particle update strategy in LMOIAs. In the second part, the proposed LMOIA was compared with several state-of-the-art MOEAs on largescale benchmark MOPs. To evaluate the effectiveness of the proposed algorithm, the experiments were implemented on the well-known benchmark suite LSMOP. Detailed information about these test problems can be found in [19]. 4.1 Experimental Setting For benchmark functions, the objective number is set to 3 with the decision variable numbers varying from 100 to 500. For a fair comparison, the population size N is set at 153 for all LSMOPs, and total number of function evaluations is adopted as the termination condition for all the test instances, i.e., 15000 × D for a problem with D decision variables. For each test instance, each algorithm was run 20 times independently and we used the Wilcoxon rank sum test to compare the statistical results of the LMOIA and the comparison algorithms. Two commonly used performance metrics, the inverted generational distance (IGD) [20] and hypervolume (HV) [21], are adopted to evaluate the performance of the compared algorithms in this study. A smaller IGD value indicates better algorithm performance, while the opposite is true for the HV indicator. 4.2 Discussions of the Proposed Strategies In this section, we conduct ablation studies to evaluate the effectiveness of our proposed adaptive strategy. Two versions of ablation without the adaptive strategy are compared, namely LMOIA-PC and LMOIA-CSO. LMOIA-PC refers to a proportional clonal immunization algorithm only, while LMOIA-CSO refers to an update iteration using all competing particle populations. We conducted 20 independent tests with 100 decision variables on the triple objective LSMOP9, using the IGD metric to investigate the behavior of comparative reproductive strategies during population evolution. Figure 1 shows the mean values of IGD for the three comparison algorithms on each problem.

Fig. 1. IGD values achieved by LMOIA-PC, LMOIA-CSO and LMOIA on LSMOP1 to LSMOP9.
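For reference, the IGD metric reported throughout this section can be computed as the mean distance from a set of sampled Pareto-front reference points to their nearest obtained solutions; the sketch below is a generic illustration, not tied to the authors' experimental code.

```python
import numpy as np

def igd(reference_front, obtained_set):
    """Inverted generational distance: average, over the reference points, of the
    distance to the nearest obtained solution; smaller values are better."""
    R = np.asarray(reference_front, dtype=float)
    S = np.asarray(obtained_set, dtype=float)
    d = np.linalg.norm(R[:, None, :] - S[None, :, :], axis=2)   # pairwise distances
    return d.min(axis=1).mean()
```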

4.3 Comparisons Between LMOIA and Existing MOEAs

Table 1 presents the IGD values of the six compared MOEAs on the three-objective LSMOP1–LSMOP9, each with 100, 200, and 500 decision variables. The proposed LMOIA outperforms the other MOEAs in terms of overall performance. In particular, LMOIA achieves the best results in 17 out of 27 test instances, followed by DGEA, NSGAII, and RVEA. The Wilcoxon rank-sum test indicates that LMOIA performs better than or similarly to the other MOEAs in 24/27, 25/27, 27/27, 22/27, and 23/27 test instances, respectively, compared to NSGAII, RVEA, CCGDE3, DGEA, and LMOEADS. It is noteworthy that LMOIA outperforms CCGDE3 and LMOEADS in all instances. The reason is that CCGDE3 is based on a co-evolutionary framework, and inappropriate grouping may affect its performance negatively; LMOEADS, on the other hand, is only competitive with a small number of evaluations. Table 2 lists the HV values of the six compared MOEAs on the three-objective LSMOP1–LSMOP9. Unlike Table 1, LMOIA achieves the best results in 15 out of 27 test instances, followed by NSGAII and DGEA.

Table 1. Statistics of IGD values achieved by six compared algorithms on 27 test instances. Problem

(Table 1 reports, for each of LSMOP1–LSMOP9 with D = 100, 200, and 500, the mean IGD and standard deviation of NSGAII, RVEA, CCGDE3, DGEA, LMOEADS, and LMOIA over 20 independent runs; the symbols +, −, and = mark whether a compared algorithm is significantly better than, worse than, or statistically similar to LMOIA. Summary row (+/−/=): NSGAII 3/19/5, RVEA 2/21/4, CCGDE3 0/27/0, DGEA 5/12/10, LMOEADS 4/20/3.)

Table 2. Statistics of HV values achieved by six compared algorithms on 27 test instances.

(Table 2 reports, for each of LSMOP1–LSMOP9 with D = 100, 200, and 500, the mean HV and standard deviation of NSGAII, RVEA, CCGDE3, DGEA, LMOEADS, and LMOIA over 20 independent runs, with +, −, and = defined as in Table 1. Summary row (+/−/=): NSGAII 3/15/9, RVEA 2/18/7, CCGDE3 0/26/1, DGEA 2/16/9, LMOEADS 1/19/7.)

5 Conclusions

In this paper, we propose LMOIA, an immune-algorithm-based evolutionary algorithm for large-scale multi-objective problems. The adaptive immune framework is designed to balance the exploration and exploitation of MOEAs by selecting between two different ways of generating antibodies depending on the concentration of high-quality antibodies. Additionally, we introduce a competitive learning strategy that strengthens the global search ability so as to avoid getting trapped in local optima. Experimental results show that the proposed LMOIA outperforms several state-of-the-art MOEAs in solving large-scale MOPs and has the advantages of a simple structure and low complexity. In future work, we will apply the model to product optimization and factory scheduling tasks for tobacco companies.

Acknowledgement. This work was supported in part by the Key Research and Development and Promotion Special Project of Henan Province under Grant 222102210037.


References 1. Zhang, H., Zhang, Q., Ma, L., et al.: A hybrid ant colony optimization algorithm for a multiobjective vehicle routing problem with flexible time windows. Inf. Sci. 490, 166–190 (2019) 2. Everson, R.M., Fieldsend, J.E.: Multi-objective optimization of safety related systems: an application to short-term conflict alert. IEEE Trans. Evol. Comput. 10(2), 187–198 (2006) 3. Maltese, J., Ombuki-Berman, B.M., Engelbrecht, A.P.: A scalability study of many-objective optimization algorithms. IEEE Trans. Evol. Comput. 22(1), 79–96 (2016) 4. Tian, Y., Lu, C., Zhang, X., et al.: Solving large-scale multi-objective optimization problems with sparse optimal solutions via unsupervised neural networks. IEEE Trans. Cybern. 51(6), 3115–3128 (2020) 5. Wang, H., Jiao, L., Shang, R., et al.: A memetic optimization strategy based on dimension reduction in decision space. Evol. Comput. 23(1), 69–100 (2015) 6. Antonio, L.M., Coello, C.A.C.: Use of cooperative coevolution for solving large scale multiobjective optimization problems. In: 2013 IEEE Congress on Evolutionary Computation, pp. 2758–2765. IEEE (2013) 7. Zhang, X., Tian, Y., Cheng, R., et al.: A decision variable clustering-based evolutionary algorithm for large-scale many-objective optimization. IEEE Trans. Evol. Comput. 22(1), 97–112 (2016) 8. Qin, S., Sun, C., Jin, Y., et al.: Large-scale evolutionary multi-objective optimization assisted by directed sampling. IEEE Trans. Evol. Comput. 25(4), 724–738 (2021) 9. Cheng, R., Jin, Y.: A social learning particle swarm optimization algorithm for scalable optimization. Inf. Sci. 291, 43–60 (2015) 10. Cheng, R., Jin, Y.: A competitive swarm optimizer for large scale optimization. IEEE Trans. Cybern. 45(2), 191–204 (2014) 11. Li, M., Yang, S., Liu, X.: Shift-based density estimation for Pareto-based algorithms in manyobjective optimization. IEEE Trans. Evol. Comput. 18(3), 348–365 (2013) 12. Ma, X., Liu, F., Qi, Y., et al.: A multi-objective evolutionary algorithm based on decision variable analyses for multi-objective optimization problems with large-scale variables. IEEE Trans. Evol. Comput. 20(2), 275–298 (2015) 13. Huang, Z., Zhou, Y.: Runtime analysis of immune-inspired hypermutation operators in evolutionary multi-objective optimization. Swarm Evol. Comput. 65, 100934 (2021) 14. Caraffini, F., Neri, F., Epitropakis, M.: Hyper SPAM: A study on hyper-heuristic coordination strategies in the continuous domain. Inf. Sci. 477, 186–202 (2019) 15. Gong, M., Jiao, L., Du, H., et al.: Multi-objective immune algorithm with nondominated neighbor-based selection. Evol. Comput. 16(2), 225–255 (2008) 16. Zhang, W., Zhang, N., Zhang, W., et al.: A cluster-based immune-inspired algorithm using manifold learning for multimodal multi-objective optimization. Inf. Sci. 581, 304–326 (2021) 17. Yue, C., Qu, B., Liang, J.: A multi-objective particle swarm optimizer using ring topology for solving multimodal multi-objective problems. IEEE Trans. Evol. Comput. 22(5), 805–817 (2017) 18. Deb, K.: Multi-objective genetic algorithms: problem difficulties and construction of test problems. Evol. Comput. 7(3), 205–230 (1999) 19. Cheng, R., Jin, Y., Olhofer, M.: Test problems for large-scale multiobjective and manyobjective optimization. IEEE Trans. Cybern. 47(12), 4108–4121 (2016) 20. Zhou, A., Jin, Y., Zhang, Q., et al.: Combining model-based and genetics-based offspring generation for multi-objective optimization using a convergence criterion. In: 2006 IEEE International Conference on Evolutionary Computation, pp. 
892–899. IEEE (2006) 21. While, L., Hingston, P., Barone, L., et al.: A faster algorithm for calculating hypervolume. IEEE Trans. Evol. Comput. 10(1), 29–38 (2006)

Hybrid Hyper-heuristic Algorithm for Integrated Production and Transportation Scheduling Problem in Distributed Permutation Flow Shop Wenbo Chen, Bin Qian(B) , Rong Hu, Sen Zhang, and Yijun Wang Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China [email protected]

Abstract. Researching the integrated production and transportation scheduling problem (IPTSP) is an effective way to improve the overall benefits of companies, especially global ones, in the supply chain. In this study, we propose a novel hybrid hyper-heuristic algorithm (H_HHA) to solve a little-studied problem that integrates the distributed permutation flow-shop problem (DFSP) and the multi-depot vehicle routing problem (IDFS_MDVRP), with the aim of minimizing the total delivery time. The H_HHA consists of a genetic algorithm (GA) and a hyper-heuristic algorithm (HHA). We employ five pre-designed heuristic operations as the low-level heuristics (LLHs) to improve local search performance, while the GA is used to improve the performance of the high-level search (HLS) of the HHA. The interaction of the LLHs and the HLS improves the overall performance of the H_HHA. Simulation experiments and statistical results demonstrate the effectiveness of the proposed H_HHA in addressing the problem.

Keywords: Integrated scheduling · Hyper-heuristic · Distributed flow-shop

1 Introduction

The production scheduling problem (PSP) and the transportation scheduling problem (TSP) are two supply chain challenges that have received considerable attention. In a classic supply chain, because production factories and transportation vehicles belong to different parties, the PSP and TSP are mostly solved by different personnel as two separate and independent problems. However, for make-to-order (MTO) or time-sensitive (e.g., perishable or seasonal) products, a just-in-time (JIT) strategy is usually used to satisfy customers' strict delivery-time requirements [1], and in this setting the PSP and TSP involved in the same supply chain are strongly coupled. Optimizing either one independently will inevitably disregard the requirements and constraints of the other and lead to a suboptimal overall solution [2]. Therefore, it is necessary to study the integrated PSP and TSP (IPTSP) from a higher decision level to ensure overall optimal performance.


The IPTSP has two phases, the production stage (PS) and the transportation stage (TS). In the IPTSP, the decisions of both phases are made simultaneously, which can effectively shorten the delivery time, reduce inventory and costs, and improve the global competitiveness of enterprises [3]. There are two categories of IPTSP: the integrated production and in-plant transportation scheduling problem (IPITSP), and the integrated production and off-plant transportation scheduling problem (IPOTSP). The IPITSP focuses on reducing the makespan via enhanced coordination between the PSP and automated guided vehicle (AGV) systems [4–6]. The IPOTSP integrates the PSP and TSP into a single problem and then uses optimization algorithms to obtain solutions with a 5%–20% average improvement compared with the unintegrated problem [7]. There are many achievements in IPOTSP research. Belo-Filho et al. [8] developed an adaptive large neighborhood search (ALNS) algorithm to solve the integrated production and transportation scheduling problem of perishable goods, minimizing the total costs of production and transportation. Ramezanian et al. [9] developed an improved imperialist competitive algorithm (I-ICA) to solve the integrated M-flow shop with two distinct routing problems, with the objective of minimizing the total costs of production and delivery. Aiming at minimizing the sum of transportation and penalty costs, Liu et al. [10] developed a hybrid multi-level optimization algorithm for the integrated vehicle routing and production problem with flexible departure times. Aiming at minimizing the weighted sum of delivery time and transportation costs, He et al. [3] proposed an enhanced branch-and-price algorithm for the integrated 3D printing and JIT delivery scheduling problem. Aiming at maximizing the total profit, Ghasemkhani et al. [11] combined a hybrid imperialist competitive algorithm (HICA) with a self-adaptive differential evolution (SADE) algorithm to solve the integrated production-inventory-routing scheduling problem for multiple perishable products. In most existing research, the PS of the IPOTSP involves a single machine or parallel machines in a single factory [2], the TS involves a basic VRP, and non-inventory research on the IPOTSP is limited [12]. In today's supply chain, distributed production with MTO and JIT is becoming one of the most important means for manufacturing companies to improve economic benefits and resist risks. Therefore, a model that integrates the distributed permutation flow-shop problem (DFSP) and the multi-depot VRP (IDFS_MDVRP) is established, and a corresponding algorithm is designed to solve it. As a complex combinatorial optimization problem (COP), the IDFS_MDVRP is NP-hard because it generalizes the NP-hard DFSP [13]. Using intelligent optimization algorithms is one of the popular approaches for solving NP-hard COPs. Thus, an effective intelligent optimization algorithm, the hyper-heuristic algorithm (HHA), is selected as the main framework of our proposed algorithm for the IDFS_MDVRP. The HHA has a two-layer structure: the low level is a set of pre-designed heuristics or neighborhood operations (LLHs), and the high level is a search sequence (HLS) formed from the low-level operations. The HHA has been widely applied to PSPs [14–17]. Based on the HHA's excellent exploration ability, we propose a hybrid hyper-heuristic algorithm (H_HHA) to solve the IDFS_MDVRP.


The remainder of this study is organized as follows. The problem description and mathematical model are presented in Sect. 2. The proposed algorithm is introduced in Sect. 3. Next, the experimental results are discussed in Sect. 4. Finally, Sect. 5 gives the conclusions of this study.

2 Problem Description and Formulation

The integrated scheduling problem in this study consists of the PSP and the TSP, which correspond to the PS and the TS in the supply chain. The start time of the TS is always later than the start time of the PS but equal to or earlier than its end time.

2.1 Problem Description

The problem addressed in this study can be defined as follows. In a distributed permutation flow-shop environment, a set of n (n ≥ k) jobs needs to be processed in k (k ≥ 2) separate factories with the same configuration; each factory contains m machines and N vehicles, and each of the n jobs must be processed sequentially on all m machines. Once a job is completed, it is loaded into a vehicle immediately, and once a vehicle is fully loaded, it immediately transports the completed jobs to the customers, who correspond one-to-one with the jobs. After all loaded jobs have been delivered, the vehicle must return to its departure factory. The total time covers both stages (PS and TS); for a vehicle, it is the sum of the makespan of its loaded jobs and the time from the vehicle's departure to its return. The objective is to determine the sequences that minimize the total time. Figure 1 presents an example Gantt chart with 10 jobs and 2 factories, where each factory has 2 machines and 2 vehicles. Jobs 3, 5, 8, and 7 are processed sequentially in factory 1 and transported by vehicle 1 with the sequence [7, 8, 3, 5]. Jobs 2, 4, 10, 6, 9, and 1 are processed sequentially in factory 2; jobs 2, 4, and 10 are transported by vehicle 1 with the transport sequence [4, 2, 10], and jobs 6, 9, and 1 are transported by vehicle 2 with the transport sequence [6, 1, 9]. The total time equals the return time of vehicle 2 of factory 2. The following assumptions are made to ensure the rigor of the mathematical model:

• All machines run continuously without any breakdowns.
• Each machine can only handle one job at a time.
• Each job can only be processed in a single factory.
• Each factory has at least one job to process.
• Each customer is visited exactly once, by exactly one vehicle.
• Each vehicle can be assigned tasks only once.


Fig. 1. Example of a Gantt chart for the IDFS_MDVRP

2.2 Mathematical Model

The notations used in the mathematical model are as follows:

– k: number of factories, k ≥ 2.
– n: number of jobs (or customers), n ≥ k.
– m: number of machines in each factory.
– N: number of vehicles in each factory.
– i: index of a machine, i ∈ {1, 2, ..., m}.
– j: index of a job (or customer), j ∈ {1, 2, ..., n}.
– f: index of a factory, f ∈ {1, 2, ..., k}.
– h: index of a vehicle, h ∈ {1, 2, ..., N}.
– n_f: number of jobs assigned to factory f, n_f ∈ {1, 2, ..., (n − k + 1)}.
– l: index of a job in the sub-sequence π_f, l ∈ {1, 2, ..., n_f}.
– π: total sequence of jobs, π = [π_1, π_2, ..., π_k].
– π_f: sub-sequence of jobs in factory f, π_f = [π_f(1), π_f(2), ..., π_f(n_f)].
– p_{j,i}: processing time of job j on machine i.
– C_{j,i}: completion time of job j on machine i.
– n_f^h: number of jobs loaded (or customers served) by vehicle h in factory f.
– C_f^h: makespan of the jobs loaded by vehicle h in factory f.
– z: index of a customer in the sub-sequence π'_f^h, z ∈ {1, 2, ..., n_f^h}.
– π_f^h: sub-sequence of jobs loaded by vehicle h in factory f, π_f^h = [π_f^h(1), π_f^h(2), ..., π_f^h(n_f^h)].
– π'_f^h: sub-sequence of customers served by vehicle h in factory f, π'_f^h = [π'_f^h(1), π'_f^h(2), ..., π'_f^h(n_f^h)].
– d_j: distance from the previous customer to the current customer j.
– d_j^S: distance from the departure factory to the current customer j.
– d_j^R: distance from the current customer j to the return factory.
– v_j: speed from the previous customer to the current customer j.
– v_j^S: speed from the departure factory to the current customer j.
– v_j^R: speed from the current customer j to the return factory.
– t_j: transport time of the current customer j.
– t_f^h: return time of vehicle h in factory f.
– T_f^h: total time of vehicle h in factory f.
– T_f: total time of factory f.
– T: total time of all factories.
– w_j: weight of job j.
– L_max: maximum load of a vehicle.

In flow-shop scheduling problems, job-based or operation-based sequence models are typically used to determine the start or completion time of each job or operation on various machines. When applying evolutionary algorithms to these sequence models, the optimization variables are job or operation sequences, which can be optimized independently by the algorithms [18]. This means that such algorithms optimize the variables of sequence models rather than the mathematical models themselves. The advantage of sequence models is that their computational process is similar to the iterative computational process of evolutionary algorithms, and some constraints can be implicitly included in the solution process. Therefore, based on Sect. 2.1, we propose a sequence model for the IDFS_MDVRP.

In the IDFS_MDVRP, the production stage always precedes the transportation stage, and the makespan of the production stage is calculated as follows:

C_{\pi_f(1),1} = p_{\pi_f(1),1},  ∀ f ∈ {1, 2, ..., k}.   (1)

C_{\pi_f(l),1} = C_{\pi_f(l-1),1} + p_{\pi_f(l),1},  ∀ f ∈ {1, 2, ..., k}, l ∈ {2, 3, ..., n_f}.   (2)

C_{\pi_f(1),i} = C_{\pi_f(1),i-1} + p_{\pi_f(1),i},  ∀ f ∈ {1, 2, ..., k}, i ∈ {2, 3, ..., m}.   (3)

C_{\pi_f(l),i} = \max\{C_{\pi_f(l-1),i}, C_{\pi_f(l),i-1}\} + p_{\pi_f(l),i},  ∀ f ∈ {1, 2, ..., k}, i ∈ {2, 3, ..., m}, l ∈ {2, 3, ..., n_f}.   (4)

C_f^h = C_{\pi_f^h(n_f^h),m},  ∀ f ∈ {1, 2, ..., k}, h ∈ {1, 2, ..., N}.   (5)

In the production stage, the decision variable is the sub-sequence π_f or π_f^h. Following the production stage, the transport times of the vehicles can be calculated as follows:

t_{\pi'^h_f(1)} = d^S_{\pi'^h_f(1)} / v^S_{\pi'^h_f(1)},  ∀ f ∈ {1, 2, ..., k}, h ∈ {1, 2, ..., N}.   (6)

t_{\pi'^h_f(z)} = t_{\pi'^h_f(z-1)} + d_{\pi'^h_f(z)} / v_{\pi'^h_f(z)},  ∀ f ∈ {1, 2, ..., k}, h ∈ {1, 2, ..., N}, z ∈ {2, 3, ..., n_f^h}.   (7)

t_f^h = t_{\pi'^h_f(n_f^h)} + d^R_{\pi'^h_f(n_f^h)} / v^R_{\pi'^h_f(n_f^h)},  ∀ f ∈ {1, 2, ..., k}, h ∈ {1, 2, ..., N}.   (8)

In the transportation stage, the decision variable is π'_f^h. After calculating the makespan of the production stage and the transport time of the transportation stage, the total time can be obtained as the sum of both, as follows:

T_f^h = C_f^h + t_f^h,  ∀ f ∈ {1, 2, ..., k}, h ∈ {1, 2, ..., N}.   (9)

T_f = \max\{T_f^1, T_f^2, ..., T_f^N\},  f ∈ {1, 2, ..., k}.   (10)

T = \max\{T_1, T_2, ..., T_k\}.   (11)

The loaded jobs cannot exceed the maximum load of the vehicle, and the production sequence of jobs and the transportation sequence of customers must contain the same elements for the same vehicle. Therefore, all calculations must comply with the following constraints:

w_{\pi_f^h(1)} + w_{\pi_f^h(2)} + ... + w_{\pi_f^h(n_f^h)} ≤ L_max,  ∀ f ∈ {1, 2, ..., k}, ∀ h ∈ {1, 2, ..., N}.   (12)

\{\pi_f^h(x) \mid x ∈ \{1, 2, ..., n_f^h\}\} = \{\pi'^h_f(x) \mid x ∈ \{1, 2, ..., n_f^h\}\},  ∀ f ∈ {1, 2, ..., k}, ∀ h ∈ {1, 2, ..., N}.   (13)

In the IDFS_MDVRP, the goal is to determine the optimal production sequence and transportation sequence that minimize the total time. Using the vehicle as the basic unit, the objective function can be presented as follows:

(\pi_f^{h*}, \pi'^{h*}_f) = \arg\{T(\pi_f^h, \pi'^h_f) \to \min\},  f ∈ {1, 2, ..., k}, h ∈ {1, 2, ..., N}.   (14)

Here arg{T(π_f^h, π'_f^h) → min} means finding a production sequence and a transportation sequence that minimize the total time T for vehicle h.

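To make the sequence model concrete, the sketch below evaluates Eqs. (1)–(11) for a fixed assignment of jobs to factories and vehicles. It is only an illustration under our own assumptions: the function and parameter names are ours, the travel legs are passed in as precomputed distance/speed ratios, and no feasibility checks such as Eq. (12) are performed.

```python
from typing import Dict, List, Tuple

def total_time(
    proc: Dict[int, List[float]],          # proc[j][i]: processing time of job j on machine i
    factory_seq: List[List[int]],          # factory_seq[f]: production order of jobs in factory f
    vehicle_jobs: List[List[List[int]]],   # vehicle_jobs[f][h]: jobs loaded by vehicle h of factory f
    route_time: Dict[Tuple[int, int], List[float]],  # route_time[(f, h)]: travel times of all legs of the
                                                     # route, incl. factory -> first and last -> factory
) -> float:
    m = len(next(iter(proc.values())))     # number of machines per factory
    T_all = 0.0
    for f, seq in enumerate(factory_seq):
        # Permutation flow-shop recursion, Eqs. (1)-(4).
        C = {}                              # C[(j, i)]: completion time of job j on machine i
        prev_job = None
        for j in seq:
            for i in range(m):
                ready_machine = C[(prev_job, i)] if prev_job is not None else 0.0
                ready_job = C[(j, i - 1)] if i > 0 else 0.0
                C[(j, i)] = max(ready_machine, ready_job) + proc[j][i]
            prev_job = j
        T_f = 0.0
        for h, batch in enumerate(vehicle_jobs[f]):
            if not batch:
                continue
            C_fh = max(C[(j, m - 1)] for j in batch)   # Eq. (5): last loaded job finishes
            t_fh = sum(route_time[(f, h)])             # Eqs. (6)-(8): sum of the route's leg times
            T_f = max(T_f, C_fh + t_fh)                # Eqs. (9)-(10)
        T_all = max(T_all, T_f)                        # Eq. (11)
    return T_all
```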
3 Proposed Algorithm

H_HHA is an algorithm that blends a genetic algorithm (GA) with an HHA. The HHA is used to explore the solution space of the IDFS_MDVRP, and the GA is used to improve the performance of the HHA. The structure of H_HHA is shown in Fig. 2.


Fig. 2. Structure diagram of H_HHA

3.1 Encoding and Decoding

In this study, random initialization encoding is used to generate a total sequence π. After encoding, π is decoded step by step according to the nature of the IDFS_MDVRP. To solve the IDFS_MDVRP, the jobs are first allocated to the factories, and then the completed jobs in every factory are loaded and transported to the customers in batches by vehicles. The corresponding decoding process can be briefly introduced as follows:

(1) Jobs are allocated to factories one at a time from π, and an allocated job is removed from π.
(2) The next job from the remaining π is pre-allocated to every factory in turn, and T_f is calculated for each factory; the job is finally allocated to the factory with the minimum T_f. The pre-allocation and allocation steps are repeated until all jobs in π are allocated.
(3) During step (2), once a job is completely processed, it can be loaded into a vehicle; once a vehicle cannot load any more jobs, it immediately starts to deliver the loaded jobs to the customers.

After decoding, T, T_f, T_f^h, π_f^h and π'_f^h can all be calculated.

3.2 LLHs and HLS

The LLHs are a set of heuristics or neighborhood operations that explore better π, π_f^h or π'_f^h in the problem's solution space. The LLHs are designed as follows; each operates on a selected sequence (π, π_f^h or π'_f^h), and a code sketch of the operators is given after this list.

– LLH1: Randomly select two different jobs from the sequence, and insert the second one directly in front of the first one.
– LLH2: Randomly select two different jobs from the sequence, and insert the first one directly behind the second one.
– LLH3: Randomly select two different jobs from the sequence, and swap them.
– LLH4: Randomly select two different jobs from the sequence, and reverse the sub-sequence of jobs between them.
– LLH5: Randomly select one job from the sequence, and swap it with the job next to it.

Executing the LLHs on the sequence [5, 1, 4, 2, 3] gives the results shown in Fig. 3.
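The five neighbourhood moves can be written compactly as below. This is a minimal sketch under our own assumptions: the function names, the uniform random choices, and the wrap-around neighbour in LLH5 are ours, since the paper does not specify boundary handling.

```python
import random

def _pick_two(seq):
    return random.sample(range(len(seq)), 2)

def llh1(seq):
    """Insert the second selected job directly in front of the first one."""
    s = seq[:]
    i, j = _pick_two(s)
    job = s.pop(j)
    i = i if j > i else i - 1          # removal shifts positions left of the gap
    s.insert(i, job)
    return s

def llh2(seq):
    """Insert the first selected job directly behind the second one."""
    s = seq[:]
    i, j = _pick_two(s)
    job = s.pop(i)
    j = j if i > j else j - 1
    s.insert(j + 1, job)
    return s

def llh3(seq):
    """Swap two randomly selected jobs."""
    s = seq[:]
    i, j = _pick_two(s)
    s[i], s[j] = s[j], s[i]
    return s

def llh4(seq):
    """Reverse the sub-sequence between two randomly selected jobs."""
    s = seq[:]
    i, j = sorted(_pick_two(s))
    s[i:j + 1] = reversed(s[i:j + 1])
    return s

def llh5(seq):
    """Swap one randomly selected job with its right neighbour (wrapping at the end)."""
    s = seq[:]
    i = random.randrange(len(s))
    j = (i + 1) % len(s)
    s[i], s[j] = s[j], s[i]
    return s

LLHS = [llh1, llh2, llh3, llh4, llh5]
print(llh3([5, 1, 4, 2, 3]))   # one random neighbour of the example sequence
```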


Fig. 3. Illustration of executing the LLHs on the sequence [5, 1, 4, 2, 3]

The HLS is essentially a search sequence consisting of elements from the LLHs. Typically, the length of the HLS sequence exceeds the total number of elements in the LLHs, and each element in the LLHs can be repeated in the sequence. When the H_HHA is running, the LLHs’ operations are performed on π with the HLS sequence.

Fig. 4. The flow chart of H_HHA


3.3 The H_HHA

The H_HHA is a hybrid algorithm that combines the GA and the HHA. In the H_HHA, the total time T is used to evaluate the quality of π, and the performance of an HLS sequence is evaluated by ΔT, the improvement of T that it achieves. A smaller T means a better π, and a bigger ΔT means a better HLS sequence. The HLS sequence is employed to find a smaller T, and the GA is used to find a bigger ΔT. The flow chart of the H_HHA is shown in Fig. 4.
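The following sketch illustrates how an HLS sequence can be scored by the improvement ΔT it produces. It is our own illustration: the greedy accept-only-if-better rule, the function names, and the evaluation callback are assumptions, since the paper does not spell out these details.

```python
def apply_hls(pi, hls_sequence, llh_ops, evaluate):
    """Apply low-level heuristics in the order given by an HLS sequence.

    Returns the improved solution, its total time T, and delta_T, the
    improvement of T that serves as the fitness of the HLS sequence."""
    start_T = evaluate(pi)
    best, best_T = pi, start_T
    for op_index in hls_sequence:            # each entry selects one LLH operator
        candidate = llh_ops[op_index](best)
        cand_T = evaluate(candidate)
        if cand_T < best_T:                  # a smaller total time T means a better pi
            best, best_T = candidate, cand_T
    return best, best_T, start_T - best_T    # a bigger delta_T means a better HLS sequence
```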

4 Computational Results

All algorithms involved in this study are coded and compiled in MATLAB R2022a. The computational experiments are executed on a PC with an Intel(R) Core(TM) i5-12400 CPU @ 2.5 GHz and 16 GB of RAM under Microsoft Windows 10. The test data are generated based on Naderi et al.'s data [13] and the well-known Solomon benchmarks [19]. In the H_HHA, the population sizes of π and HLS are both 30, the probabilities of selection and crossover are both 0.9, the probability of mutation is 0.05, and the length of the HLS sequence is set to 15. To keep the computational results fair, all tested algorithms are run under the same termination condition: the maximum elapsed CPU time is set to n × m × f × 100 ms. All computational comparisons are conducted independently with 15 replications on all 10 instances to ensure the stability and reliability of the experimental results, and average numerical results are reported to eliminate random errors. The computational tests compare three algorithms: the H_HHA, heuristic rule 2 [13] combined with NEH [20] (HR2_NEH), and the IGA [21]. As shown in Table 1, the minimum computational results (MCR) and average values (AVG) are listed, and the best ones are highlighted in bold.

Table 1. Computational results of HR2_NEH, IGA and H_HHA

Instance        HR2_NEH (MCR)   HR2_NEH (AVG)   IGA (MCR)   IGA (AVG)   H_HHA (MCR)   H_HHA (AVG)
2 × 5 × 20         722.3            758.5         621.4       639.1        618.1         632.4
2 × 5 × 50        1358.4           1431.9        1200.8      1236.7       1180.4        1215.3
2 × 10 × 80       2005.6           2047.2        1985.3      2024.6       1971.4        2001.5
2 × 10 × 100      2194.7           2246.4        2228.0      2256.4       2022.1        2038.1
3 × 5 × 20         480.4            492.1         452.7       463.8        444.8         462.3
3 × 5 × 50         766.5            788.9         763.9       787.3        748.2         758.8
3 × 10 × 80       1409.5           1437.9        1419.7      1446.0       1276.9        1298.6
4 × 5 × 50         710.2            725.4         669.3       685.7        663.5         681.9
5 × 10 × 75       1028.3           1045.3        1032.2      1048.0       1001.4        1009.2
6 × 10 × 72       1455.8           1486.5        1156.7      1174.8       1149.2        1173.7


From the results, it is clear that the solutions obtained by the H_HHA are better than those of the comparison algorithms, which demonstrates the effectiveness of the two-layer structure of the H_HHA. In the H_HHA, the fitness of the encoding sequence not only represents the quality of the solution but also indicates the performance of the HLS search sequence. This performance correlates with the search ability and the relative positions of the LLH operators in the HLS search sequence. By modifying the combination order of the LLH operators, the performance of the HLS search sequence can be altered, which in turn changes the fitness of the encoding sequence. These mechanisms create strong feedback interactions between the global and local search in the H_HHA. In contrast, most existing combinatorial optimization algorithms lack substantial interaction between global and local search, and the local search is typically executed using a fixed pre-designed operation sequence. This may explain why the H_HHA finds better results than the comparison algorithms. However, the H_HHA also has several disadvantages, including its complex structure, difficulty for newcomers to understand, and high storage requirements.

5 Conclusion Remarks

The IPTSP is becoming increasingly important. In this study, the integrated optimization model IDFS_MDVRP is established based on the nature of the integrated distributed permutation flow-shop problem and the multi-depot VRP, and the H_HHA is developed to solve the model. The H_HHA consists of a GA and an HHA: the HHA has a two-layer structure in which the LLHs provide a candidate set of heuristic operations and the HLS explores the solution space with a constructed sequence of these operations, while the GA is used to help construct more exploratory sequences. Simulation experiments show the efficiency of our algorithm. Although this study demonstrates the effectiveness of the H_HHA in solving the IDFS_MDVRP, several challenges remain. For instance, one of the biggest challenges is determining the optimal design of the LLH operators based on problem properties. Additionally, given the urgent need to mitigate environmental issues such as global warming and the energy crisis, it is critical to integrate environmental effects into our model. Therefore, in future research, we plan to explore two primary directions. First, we aim to develop multiple knowledge-based LLH operators to improve the exploratory capacity of the H_HHA. Second, we intend to extend the proposed H_HHA and IDFS_MDVRP to more realistic settings that better account for environmental issues.

Acknowledgments. This study was partially supported by the National Natural Science Foundation of China (62173169, 61963022) and the Basic Research Key Project of Yunnan Province (202201AS070030).


References 1. Chen, Z.: Integrated production and outbound distribution scheduling: review and extensions. Oper. Res. 58(1), 130–148 (2010) 2. Moons, S., Ramaekers, K., Caris, A., Arda, Y.: Integrating production scheduling and vehicle routing decisions at the operational decision level: a review and discussion. Comput. Ind. Eng. 104, 224–245 (2017) 3. He, P., Li, K., Kumar, P.N.R.: An enhanced branch-and-price algorithm for the integrated production and transportation scheduling problem. Int. J. Prod. Res. 60(6), 1874–1889 (2022) 4. Reddy, N.S., Ramamurthy, D.V., Rao, K.P., Lalitha, M.P.: Practical simultaneous scheduling of machines, agvs, tool transporter and tools in a multi machine fms using symbiotic organisms search algorithm. Int. J. Comput. Integr. Manuf. 34(2), 153–174 (2021) 5. Reddy, N.S., Ramamurthy, D.V., Lalitha, M.P., Rao, K.P.: Minimizing the total completion time on a multi-machine FMS using flower pollination algorithm. Soft. Comput. 26(3), 1437– 1458 (2022) 6. Li, W., Han, D., Gao, L., Li, X., Li, Y.: Integrated production and transportation scheduling method in hybrid flow shop. Chin. J. Mech. Eng. 35(1), 1–20 (2022) 7. Meinecke, C., Scholz-Reiter, B.: A heuristic for the integrated production and distribution scheduling problem. World Acad. Sci. Eng. Technol. Int. J. Mech. Aerosp. Indus. Mech. Manufact. Eng. 8, 280–287 (2014) 8. Belo-Filho, M.A.F., Amorim, P., Almada-Lobo, B.: An adaptive large neighbourhood search for the operational integrated production and distribution problem of perishable products. Int. J. Prod. Res. 53(20), 6040–6058 (2015) 9. Ramezanian, R., Mohammadi, S., Cheraghalikhani, A.: Toward an integrated modeling approach for production and delivery operations in flow shop system: trade-off between direct and routing delivery methods. J. Manuf. Syst. 44, 79–92 (2017) 10. Liu, H., Guo, Z., Zhang, Z.: A hybrid multi-level optimisation framework for integrated production scheduling and vehicle routing with flexible departure time. Int. J. Prod. Res. 59(21), 6615–6632 (2021) 11. Ghasemkhani, A., Tavakkoli-Moghaddam, R., Rahimi, Y., Shahnejat-Bushehri, S., TavakkoliMoghaddam, H.: Integrated production-inventory-routing problem for multi-perishable products under uncertainty by meta-heuristic algorithms. Int. J. Prod. Res. 60(9), 2766–2786 (2022) 12. Kumar, R., Ganapathy, L., Gokhale, R., Tiwari, M.K.: Quantitative approaches for the integration of production and distribution planning in the supply chain: a systematic literature review. Int. J. Prod. Res. 58(11), 3527–3553 (2020) 13. Naderi, B., Ruiz, R.: The distributed permutation flowshop scheduling problem. Comput. Oper. Res. 37(4), 754–768 (2010) 14. Park, J., Mei, Y., Nguyen, S., Chen, G., Zhang, M.: An investigation of ensemble combination schemes for genetic programming based hyper-heuristic approaches to dynamic job shop scheduling. Appl. Soft Comput. 63, 72–86 (2018) 15. Lin, J.: Backtracking search based hyper-heuristic for the flexible job-shop scheduling problem with fuzzy processing time. Eng. Appl. Artif. Intell. 77, 186–196 (2019) 16. Song, H., Lin, J.: A genetic programming hyper-heuristic for the distributed assembly permutation flow-shop scheduling problem with sequence dependent setup times. Swarm Evol. Comput. 60, 100807 (2021) 17. Zhao, F., Di, S., Wang, L., Xu, T., Zhu, N., Jonrinaldi: A self-learning hyper-heuristic for the distributed assembly blocking flow shop scheduling problem with total flowtime criterion. Eng. Appl. Artif. Intell. 116, 105418 (2022)


18. Li, Z.C., Qian, B., Hu, R., Chang, L.L., Yang, J.B.: An elitist nondominated sorting hybrid algorithm for multi-objective flexible job-shop scheduling problem with sequence-dependent setups. Knowl.-Based Syst. 173, 83–112 (2019) 19. Multiple Depot VRP Instances Vehicle Routing Problem. http://neo.lcc.uma.es/vrp/vrp-ins tances/multiple-depot-vrp-instances/. Accessed 13 June 2022 20. Nawaz, M., Enscore, E.E., Ham, I.: A heuristic algorithm for the M-machine, N-job flow-shop sequencing problem. Omega 11(1), 91–95 (1983) 21. Feng, X., Xu, Z.: Integrated production and transportation scheduling on parallel batchprocessing machines. IEEE Access 7, 148393–148400 (2019)

A Task Level-Aware Scheduling Algorithm for Energy Consumption Constrained Parallel Applications on Heterogeneous Computing Systems Haodi Li1,2 , Jing Wu1,2(B) , Jianhua Lu1,2 , Ziyu Chen1,2 , Ping Zhang1,2 , and Wei Hu1,2 1 College of Computer Science and Technology, Wuhan University of Science and Technology,

Wuhan, China {lihaodi,wujingecs,czyuer,huwei}@wust.edu.cn, [email protected], [email protected] 2 Hubei Key Laboratory of Intelligent Information Processing and Real-Time Industrial Systems, Wuhan, China

Abstract. Energy consumption is a significant concern for systems ranging from small-scale embedded devices to large-scale data centers. The problem of minimizing the schedule length of parallel applications with energy constraints in heterogeneous computing systems has garnered widespread attention. The crux of this problem lies in rationally converting the energy constraint of the application into an energy constraint for each task, utilizing dynamic voltage and frequency scaling (DVFS) technology. To determine the energy constraint of each task, the commonly used method is the task energy pre-allocation strategy. Previous studies focused only on energy allocation for individual tasks, neglecting task levels. However, an equal energy allocation for each task is not necessarily optimal. If task levels are taken into account in the energy allocation, more energy can be allocated to tasks that have a greater impact on the overall application scheduling time, thus effectively reducing it. Therefore, we propose a task level-aware scheduling algorithm that allocates energy to each level based on the proportion of the minimum energy consumption of the level, and then assigns energy to each task based on the proportion of the task's weighted time within its level. Extensive experimental results demonstrate that our approach allocates energy reasonably and achieves better performance, effectively reducing the application's schedule length.

Keywords: Heterogeneous computing systems · Energy consumption · DVFS · Directed acyclic graphs · Parallel applications · Scheduling length


1 Introduction

1.1 Background

With the rapid development of computer technology and the steadily increasing needs of industry, heterogeneous computing systems (HCS) are widely adopted due to their powerful data processing capability and efficient performance. However, higher performance also leads to greater energy consumption, and excessive energy consumption raises many environmental protection and resource waste issues [1]. Energy constraints have become one of the primary limiting factors in the design of HCS. Dynamic voltage and frequency scaling (DVFS) technology [2] is a popular technique for reducing the power consumption of computing systems: it adjusts the supply voltage and frequency according to different tasks to achieve energy savings. However, focusing solely on energy savings may significantly increase the scheduling time. Therefore, many designers and developers seek to strike a balance between energy consumption and scheduling time. In the mutual optimization of energy consumption and scheduling time, one objective is typically constrained while the other is minimized.

1.2 Main Contributions

Our primary focus is to minimize the scheduling time of an application under a given energy consumption constraint. The main contributions of this paper are outlined below:

• We design an energy pre-allocation strategy that uses task-level relations, first assigning energy to task levels according to the level energy demand rate (ER) and then to each task according to the task weight time ratio (MSR).
• We develop a new algorithm to minimize the schedule length of the application while satisfying the energy consumption constraint.
• We test our algorithm using realistic applications; the experimental results demonstrate that our algorithm obtains smaller schedule lengths.

The remaining sections of this paper provide a detailed account of our work. Section 2 reviews related studies. Section 3 elaborates on the models adopted in this paper. Section 4 describes the details of our solution. Section 5 presents the results of our experiments, and Sect. 6 presents our conclusions.

2 Related Work

To address the task scheduling problem in heterogeneous computing systems, many studies [3, 6] have used list-based heuristic algorithms to reduce the application scheduling length. The task and processor models they provided are classical and still in use today. In recent years, the optimization of energy consumption and scheduling time in task scheduling for heterogeneous computing systems has become an important research topic [4]. After the proposal of DVFS technology [2], many studies have utilized this technology to achieve the desired optimization effect.


Optimization of energy consumption and completion time generally refers to minimizing one objective under a constraint on the other, such as schedule length or energy consumption. In [9], an effective scheduling algorithm, ESECC, was proposed, which assigns the same energy to each task by choosing a relatively average value, but the effect of this average allocation was not ideal. In [10], an algorithm named WALECC was proposed, which uses a relative weighting method to allocate energy to each task. An enhanced scheduling algorithm, EECC, was proposed in [5], which assigns energy to each task by selecting a relatively middle value among unequal allocations. However, allocating energy to each task individually is not necessarily the best approach: these methods do not take into account how the level at which a task is located affects the completion time of the application. If the relationship between task levels is considered, appropriately assigning different amounts of energy to different tasks can effectively reduce the schedule length of the application. In this paper, we propose a task level-aware energy pre-allocation scheduling algorithm, which utilizes task levels to further address the same problem.

3 Models and Preliminaries

3.1 Application Model

As in previous studies [4, 9], we use a directed acyclic graph (DAG) to represent the application model. Let U = {u_1, u_2, ..., u_{|U|}} denote the set of heterogeneous processors, where |U| is the number of processors (for any set X, |X| denotes its size). The application model represented by the DAG is defined as G = {T, A, C, W}, where T = {t_1, t_2, ..., t_{|N|}} denotes the set of task nodes in G. Since the processors in a parallel computing system are heterogeneous, the execution time of each task may vary depending on the processor on which it is executed. A denotes the set of communication edges, and a_{i,j} ∈ A denotes the communication message from task t_i to task t_j. If t_i and t_j are assigned to the same processor, the communication cost between these two tasks is ignored. C denotes the set of communication times, and c_{i,j} ∈ C denotes the communication cost between tasks t_i and t_j. W is a matrix of size |N| × |U|, and w_{i,k} denotes the running time of task t_i on processor u_k at the maximum frequency. The set of direct predecessor tasks of task t_i is denoted by pred(t_i), and the set of direct successor tasks is denoted by succ(t_i). t_entry denotes the entry task (such as t_1 in Fig. 1(a)), a task without a predecessor node, and t_exit denotes the exit task (such as t_10 in Fig. 1(a)), a task without a successor node. Figure 1(a) shows an example of a DAG with three heterogeneous processors and 10 different tasks; if task t_1 and task t_6 are not assigned to the same processor, the communication cost between them is 14, denoted c_{1,6}. Figure 1(b) shows the execution time of each task on different processors; since the processors are heterogeneous, the execution time of the same task varies across processors.

Fig. 1. An example of a directed acyclic graph (DAG).

3.2 Energy Model

Dynamic voltage and frequency scaling (DVFS) technology enables processors to dynamically adjust their supply voltage and frequency. The relationship between supply voltage and frequency is almost linear [4, 9], so energy saving can be achieved by tuning the frequency. The model adopted in this paper is based on the widely used models presented in [4, 5]. The power consumption P(f) of the system at frequency f is calculated as follows:

P(f) = P_s + h(P_ind + P_d) = P_s + h(P_ind + C_ef · f^m),   (1)

where P_s represents the static power consumption, which is equal to 0 only when the whole system is turned off. As in the literature [9, 11], static power consumption is not considered in this paper because of its unmanageability. h represents the operating state of the system: h = 1 when the system is running, and h = 0 when the system is stopped. P_ind represents the frequency-independent power, and P_d represents the frequency-dependent dynamic power. C_ef and m are system-dependent constants: C_ef represents the effective capacitance and m the dynamic power index. Since the energy consumption may not always decrease with lower frequency, the minimum effective frequency can be calculated as

f_ee = (P_ind / ((m − 1) · C_ef))^{1/m}.

The actual minimum frequency can then be expressed as f_low = max(f_min, f_ee), which means that the interval of the actual frequency f is [f_low, f_max]. The frequency set of processor u_k is {f_{k,low}, f_{k,a}, ..., f_{k,max}}. The parameters of each processor are different, so we define the frequency-independent power set {P_{1,ind}, P_{2,ind}, ..., P_{|U|,ind}}, the frequency-dependent dynamic power set {P_{1,d}, P_{2,d}, ..., P_{|U|,d}}, the effective capacitance set {C_{1,ef}, C_{2,ef}, ..., C_{|U|,ef}}, and the dynamic power index set {m_1, m_2, ..., m_{|U|}}. In this way, we can calculate the energy consumption E(t_i, u_k, f_{k,h}) of task t_i executed at frequency f_{k,h} on processor u_k:

E(t_i, u_k, f_{k,h}) = (P_ind + C_{k,ef} · f_{k,h}^{m_k}) · w_{i,k} · f_{k,max} / f_{k,h}.   (2)

Thus, the energy consumption of the entire application can be expressed as:

E(G) = Σ_{i=1}^{|N|} E(t_i, u_k, f_{k,h}).   (3)

Since each task has its own actual energy consumption interval on different processors, the minimum energy consumption of task t_i can be calculated as E_min(t_i) = min_{u_k ∈ U} E(t_i, u_k, f_{k,low}), and the maximum energy consumption of task t_i can be calculated as E_max(t_i) = max_{u_k ∈ U} E(t_i, u_k, f_{k,max}).

Therefore, based on the minimum and maximum energy consumption of each task, the minimum energy consumption of the entire application can be calculated as E_min(G) = Σ_{i=1}^{|N|} E_min(t_i), and the maximum energy consumption as E_max(G) = Σ_{i=1}^{|N|} E_max(t_i). We assume that E_given(G) is the given energy consumption limit, and this value must satisfy E_min(G) ≤ E_given(G) ≤ E_max(G); otherwise, the study would be meaningless.

can be calculated as wi =

k=1 wi,k

|U |

.

Processor Allocation Phase. After doing the task prioritization work, the next step is to select the right processor to execute the highest-priority task. This part involves calculating the earliest start time (EST) and earliest finish time (EFT) for each task on different processors. The start task is assigned an earliest start time of 0 on each processor. The values of EST and EFT of other tasks are calculated by the following equations:

      EST ti , uk , fk,h = max avail[k], max AFT tj + ci,j (5) tj ∈pred (ti )

    fk,max EFT ti , uk , fk,h = EST ti , uk , fk,h + wi,k × fk,h

(6)

Where avail[k] is the earliest time that processor uk can execute the task, AFT (ti ) is the actual end time of task ti , and ci,j represents the actual communication cost of task ti to tj , which is 0 if both tasks are assigned to the same processor. After that, the task

102

H. Li et al.

is assigned to the processor with the smallest completion time by an insertion-based scheduling policy. From this, we can define the scheduling length SL(G) of the application as SL(G) = AFT (texit ). Scheduling Time of Tasks Without Regard to Frequency. In the model of this paper, if each task is scheduled without considering frequency and energy consumption, the scheduling results are the same as those of the classical algorithm HEFT. The value of this part is an important help for energy consumption pre-allocation, which allows to allocate more energy for the appropriate tasks to reduce the scheduling time of the application. Scheduling with HEFT algorithm can get the weight time MS(ti ) of each task, calculated as MS(ti ) = AFT (ti ) − AST (ti ).

4 Our Solution 4.1 Problem Definition The problem addressed in this study is to assign each task to an appropriate processor and find suitable frequencies to minimize the scheduling time of the application under energy consumption constraints in heterogeneous computing systems. The mathematical formulation of this issue can be described as follows: Minimize : SL(G) = Subject to : E(G) =

max AFT(ti )

ti ∈exit task

|N |

  E ti , pk , fk,h ≤ Egiven (G)

(7)

(8)

i=1

Where SL(G) represents the scheduling length of the application, and the value is determined by the actual finish time of the exiting task. Our goal is to minimize SL(G). E(G) represents the energy consumption of the application and is obtained by summing the energy consumption of each task. Egiven (G) represents the energy consumption limit, which is given by our calculation. Finally, the energy consumption of the application E(G) must be less than the Egiven (G) energy limit, which is the constraint in the problem we are solving. 4.2 Satisfying the Energy Consumption Limit After sorting the tasks according to their priorities, we use {ts(1) , ts(2) , . . . , ts(|N |) } to denote the set of sorted tasks, assuming ts(j) is the current task to be assigned, which is the jth th task in the task set. Then the set of tasks before the jth task is {ts(1) , ts(2) , . . . , ts(j−1) } to denote the tasks that have been assigned and the set of tasks {ts(j+1) , ts(k+2) , . . . , ts(|N |) } to represents the tasks that have not been assigned. The remaining available energy can be calculated as: Era (G) = Egiven (G) − Emin (G)

(9)

A Task Level-Aware Scheduling Algorithm

The task hierarchy is defined as follows:    L tentry = 0 L(ti ) = maxtx ∈pred(ti ) {L(tx ) + 1}

103

(10)

We define Ll = {ti , tk , ...}, Ll is the set of tasks in level l. The level of tentry is 0, and the level of texit is the maximum level. The tier values of tasks ti ,tk etc. in level l are l. Ll,ti denotes the level l to which task ti belongs, and the number of tasks in level l is denoted as |Ll |. The method of our algorithm is to pre-allocate the remaining available energy to each task so that the energy consumption limit of the current task can be found. Since the task hierarchy is added to be considered, it can be understood that this energy is first allocated to the level and then distributed to each task through the level. Next, we introduce the concepts of level energy demand rate ER and task weight time ratio MSR, which can help to understand the energy consumption pre-allocation part of the task. We define Emin (Ll,ti ) as the minimum energy demand of the level to which task ti belongs, calculated as:  

Emin Ll,ti = Emin (ti )

(11)

ti∈Tl

To determine the energy share of the level Ll in the application, we define the level     Emin Ll,ti energy demand rate ER, calculated as ER Ll,ti = E L  . l

min

l,ti

To determine the energy share of task ti in its level Ll , we define the task weight time ratio MSR(ti , Ll ), which is obtained by the weight time calculation of each task, i) calculated as MSR(ti , Ll ) = ms(tms(t . ) ti ∈Ll

i

After that, the allocable energy consumption of task ti can be expressed by Eae(ti), calculated as:

Eae(ti) = Emin(ti) + Era(G) × ER(Ll,ti) × MSR(ti, Ll)   (12)

The pre-allocated energy of a task cannot exceed its own maximum energy demand, so the pre-allocated energy of an unexecuted task is defined as Epre(ti), calculated as:

Epre(ti) = min{Eae(ti), Emax(ti)}   (13)

Therefore, each task in the model can satisfy the following energy consumption constraint:

Es(j)(G) = Σ_{i=1}^{j−1} E(ts(i), upr(s(i)), fpr(s(i)),h(s(i))) + E(ts(j), uk, fk,h) + Σ_{i=j+1}^{|N|} Epre(ts(i)) ≤ Egiven(G)   (14)

where upr(i) represents the processor to which task ti is assigned and fpr(i),h(i) represents the frequency of the task on that processor. Here, the energy consumption of the application consists of three parts: the first part is the energy consumption of the


executed task, the second part is the energy consumption of the current task, and the third part is the pre-allocated energy of the unexecuted task. If the current task is the first task, the energy consumption of the executed part is 0. If the current task is the last task, the energy consumption of the unexecuted part is 0. If each task can satisfy Es(j) (G) ≤ Egiven (G), then the actual energy consumption of the application E(G) must satisfy E(G) ≤ Egiven (G).
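To make the pre-allocation step concrete, the following is a minimal Python sketch of Eqs. (9)–(13), assuming the per-task Emin, Emax and MS values have already been computed; the function and variable names are illustrative, not taken from the paper.

# Minimal sketch of the level-based energy pre-allocation (Eqs. 9-13).
# Task levels and the Emin/Emax/MS values are assumed inputs.
def preallocate_energy(tasks, level_of, e_min, e_max, ms, e_given):
    """tasks: list of task ids; level_of[t]: level index of task t."""
    e_ra = e_given - sum(e_min[t] for t in tasks)      # Eq. (9): remaining available energy

    levels = {}
    for t in tasks:                                    # group tasks by level
        levels.setdefault(level_of[t], []).append(t)

    level_demand = {l: sum(e_min[t] for t in ts) for l, ts in levels.items()}  # Eq. (11)
    total_demand = sum(level_demand.values())

    e_pre = {}
    for l, ts in levels.items():
        er = level_demand[l] / total_demand            # level energy demand rate ER
        ms_sum = sum(ms[t] for t in ts)
        for t in ts:
            msr = ms[t] / ms_sum                       # task weight time ratio MSR
            e_ae = e_min[t] + e_ra * er * msr          # Eq. (12): allocable energy
            e_pre[t] = min(e_ae, e_max[t])             # Eq. (13): pre-allocated energy
    return e_pre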

4.3 Algorithm for Minimizing the Scheduling Length The energy consumption limit of the current task can be calculated as:

Egiven(ts(j)) = Egiven(G) − Σ_{i=j+1}^{|N|} Epre(ts(i)) − Σ_{i=1}^{j−1} E(ts(i), upr(s(i)), fpr(s(i)),h(s(i)))   (15)

This converts the energy consumption constraint of the application into an individual energy consumption constraint for each task: if the energy consumption of every task stays within its limit, the energy consumption of the application also meets the given constraint. Since the maximum energy consumption of each task ts(j) is Emax(ts(j)), the actual energy consumption constraint Econs(ts(j)) for each task is Econs(ts(j)) = min{Egiven(ts(j)), Emax(ts(j))}.
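The conversion in Eq. (15) can be sketched in a few lines; the names below are illustrative, with 'consumed' standing for the energy already used by assigned tasks and 'e_pre_unassigned' for the pre-allocated energies of the tasks not yet assigned.

# Sketch of the per-task energy limit (Eq. 15) and the resulting constraint Econs.
def task_energy_limit(e_given_app, consumed, e_pre_unassigned, e_max_task):
    e_given_task = e_given_app - sum(e_pre_unassigned) - consumed   # Eq. (15)
    return min(e_given_task, e_max_task)                            # Econs(ts(j))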


As described above, each task is then given its own energy consumption constraint, and every task is assigned by insertion to the processor with the minimum EFT, provided that its energy consumption constraint is satisfied. We describe the entire TSAECC algorithm and its main steps as follows. In line 1, calculate the priority of each task and rank the tasks. In line 2, calculate the maximum and minimum energy consumption of the entire application. In lines 3–4, calculate the pre-assigned energy consumption of each task based on the level energy demand rate (ER) and the task weight time ratio (MSR). In line 6, calculate the energy limit of each task. In lines 7–17, iterate over all processors and frequencies and select, for each task, the processor and frequency with the minimum EFT under that task's energy constraint. The time-consuming part of the algorithm is finding the processor and frequency for each task. The time complexity of traversing |N| tasks is O(|N|), and the time complexity of finding the processor and frequency with minimum EFT is O(|N| × |U| × |F|), where |F| is the maximum number of discrete frequencies from the minimum effective frequency to the maximum frequency of a processor. Overall, the TSAECC algorithm therefore has the same time complexity as the algorithms in [5, 9–11], namely O(|N|² × |U| × |F|). 4.4 Example of the TSAECC Algorithm For our example, we use the task model depicted in Fig. 1 as the parallel application. The power parameters of all processors are shown in Fig. 2(a), where the maximum frequency fk,max of each processor is set to 1.0 and the frequency precision is set to 0.01. We set the energy consumption limit of the application, Egiven(G), to Emax(G) × 0.5. The results in Table 1 show that the schedule length is 82.0345. Furthermore, the actual energy consumption of the application is 78.3526, which is less than the energy consumption limit Egiven(G). The scheduling Gantt chart is shown in Fig. 2(b).
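As a rough illustration of the per-task selection in lines 7–17 above, the following sketch picks the minimum-EFT processor/frequency pair under the task's energy limit; eft() and energy() are stand-in callables, not the paper's implementation.

# Rough sketch of the per-task processor/frequency selection (lines 7-17 above).
def select_processor_and_frequency(task, processors, frequencies, e_cons, eft, energy):
    """Pick the (processor, frequency) pair with minimum EFT whose energy
    consumption does not exceed the task's constraint e_cons[task]."""
    best = None
    for u in processors:
        for f in frequencies(u):                     # discrete frequencies of processor u
            if energy(task, u, f) > e_cons[task]:    # per-task energy limit (Eq. 15)
                continue
            finish = eft(task, u, f)                 # insertion-based earliest finish time
            if best is None or finish < best[0]:
                best = (finish, u, f)
    return best                                      # (EFT, processor, frequency) or None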

Fig. 2. Example of the TSAECC Algorithm

5 Experiments In this section, we use four algorithms: WALECC [10], EECC [5], ISAECC [11], ESECC [9] as comparison algorithms. These algorithms have the same objective as in this paper, and they are used to compare with our proposed method to evaluate the


performance of our proposed method. The final scheduling length SL(G) is the only evaluation criterion for these algorithms. To facilitate a uniform comparison, we use the energy consumption EHEFT(G) of the application when scheduled with the HEFT algorithm as a benchmark to set the energy constraint limits for the experiments.

Table 1. Results of using TSAECC to assign tasks to the application in Fig. 1

ti   | Econs(ti) | u(ti) | f(ti) | AST(ti) | AFT(ti) | E(ti)
t1   | 9.8907    | u3    | 1.0   | 0       | 9.0     | 9.63
t3   | 11.4668   | u1    | 1.0   | 21.0    | 32.0    | 9.13
t4   | 8.3141    | u2    | 1.0   | 18.0    | 26.0    | 6.72
t2   | 10.2273   | u3    | 0.58  | 9.0     | 40.0345 | 10.1233
t5   | 7.1056    | u2    | 0.72  | 26.0    | 44.0556 | 7.076
t6   | 10.1242   | u1    | 0.96  | 32.0    | 45.5417 | 10.0301
t9   | 9.9376    | u2    | 0.98  | 56.0344 | 68.2794 | 9.8032
t7   | 7.5375    | u1    | 1.0   | 45.5417 | 52.5417 | 5.81
t8   | 5.4189    | u1    | 1.0   | 59.0345 | 64.0345 | 4.15
t10  | 8.5274    | u2    | 1.0   | 75.0345 | 82.0345 | 5.88

E(G) = 78.3526, SL = 82.0345

5.1 Experimental Metrics The processor and application parameters are as follows: 10 ms ≤ wi,k ≤ 100 ms, 10 ms ≤ ci,j ≤ 100 ms, 0.03 ≤ Pk,ind ≤ 0.07, 0.8 ≤ Ck,ef ≤ 1.2, 2.5 ≤ mk ≤ 3.0, fk,max = 1 GHz. The number of processors is 8, and these values are randomly generated. The experimental platform was an Apple M1 chip with an 8-core CPU and 8-core GPU running macOS Ventura 13, and the simulator was implemented in Python. To assess the effectiveness of our algorithm, we selected two real-world applications (Fast Fourier Transform and Gaussian elimination) as the DAG models for evaluation. The size parameter ρ defines the matrix size of the application; the number of tasks in the FFT graph is |N| = (2 × ρ − 1) + ρ × log2 ρ, where ρ = 2^y and y is an integer. The Fast Fourier Transform application of size ρ has ρ exit tasks. To apply it in our study, a virtual exit task with schedule time 0 is added as a direct successor of the ρ exit tasks. The number of tasks in the GE application is |N| = (ρ² + ρ − 2)/2. Experiment 1. In this experiment, we compare the final schedule lengths of the different algorithms on the Fast Fourier transform application and the Gaussian elimination application under varying energy constraints. We set the size of the Fast Fourier transform graph to ρ = 64, resulting in a total of 511 tasks (due to the addition of a virtual exit task, the


actual number of tasks is 512). We set the size of the Gaussian elimination graph to ρ = 32, resulting in a total of 527 tasks. For comparison, the energy consumption of the application scheduled under the HEFT algorithm is used as the reference, and the energy constraint Egiven(G) ranges from EHEFT(G) × 0.4 to EHEFT(G) × 0.8. Figure 3(a) shows the comparison of the scheduling lengths of the algorithms on the Fast Fourier transform application under different energy constraints, and Fig. 3(b) shows the comparison on the Gaussian elimination application. It can be seen that our algorithm yields shorter scheduling times regardless of the application or the given energy consumption constraint. When the energy consumption constraint is small, our algorithm improves performance by up to 5.4%–7% over the other algorithms, and when the energy consumption constraint is large, it improves performance by up to 3.3%–6.1%.
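For reference, the task counts quoted above follow directly from the size formulas in Sect. 5.1; a quick Python check (using only the formulas stated in the text):

import math

def fft_tasks(rho):          # |N| = (2*rho - 1) + rho*log2(rho), plus one virtual exit task
    return (2 * rho - 1) + rho * int(math.log2(rho))

def ge_tasks(rho):           # |N| = (rho^2 + rho - 2) / 2
    return (rho * rho + rho - 2) // 2

print(fft_tasks(64))   # 511 (512 with the virtual exit task)
print(ge_tasks(32))    # 527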

[Figure 3 panels: (a) FFT application and (b) GE application under different energy constraints Egiven(G); (c) FFT application and (d) GE application at different application scales (numbers of tasks). Each panel plots the schedule length of EECC, ISAECC, ESECC, WALECC, and TSAECC.]

Fig. 3. Example of real parallel applications.

Experiment 2. In this experiment, we compare the final schedule lengths of the different algorithms on Fast Fourier transform and Gaussian elimination applications of varying scales. Specifically, we set the ρ values of FFT to {16, 32, 64, 128, 256}, which changes the number of tasks from 95 to 2559, and the ρ values of GE to {13, 21, 31, 47, 71}, which changes the number of tasks from 90 to 2555. The energy consumption constraint is fixed at EHEFT(G) × 0.5. Figure 3(c) shows the comparison of the scheduling lengths of the algorithms on the Fast Fourier transform application for different numbers of tasks, and Fig. 3(d) shows the comparison on the Gaussian elimination application. It can be seen that our algorithm still yields shorter scheduling times across applications as the number of tasks grows. In the Fast Fourier transform experiments, our algorithm improves by up to 8.2% over the others, and in the Gaussian elimination experiments by up to 6.1%. This shows that the algorithm proposed in this paper does obtain good performance on applications of different scales.

6 Conclusion In this paper, we propose an algorithm called TSAECC, aimed at minimizing the scheduling length of parallel applications in heterogeneous computing systems under energy constraints. In the energy pre-allocation stage, we propose an idea of task-level based


allocation that assigns more appropriate energy to tasks that have a greater impact on the scheduling length of the application. We conducted extensive experiments using Fast Fourier transform application and the Gaussian elimination application. The results demonstrate that our algorithm is more effective in reducing the scheduling length of the application than existing algorithms with lower complexity under given energy constraints. In the future, we can further enhance our algorithm with the following points: • After completing the task allocation, there is usually some remaining energy that can be utilized. This energy can be allocated to tasks that do not have a frequency of 1. The scheduling time of the application can be further reduced by modifying the pre-allocated energy value for that task. • Extend our algorithm to investigate other metrics, such as reliability.

References 1. Omer, A.M.: Energy, environment and sustainable development. Renew. Sustain. Energy Rev. 12(9), 2265–2300 (2008) 2. Weiser, M., Welch, B., Demers, A., et al.: Scheduling for reduced CPU energy. Mob. Comput. 449–471 (1996) 3. Topcuoglu, H., Hariri, S., Wu, M.Y.: Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Trans. Parallel Distrib. Syst. 13(3), 260–274 (2002) 4. Xiao, X., Xie, G., Li, R., et al.: Minimizing schedule length of energy consumption constrained parallel applications on heterogeneous distributed systems. In: 2016 IEEE Trustcom/BigDataSE/ISPA., pp.1471–1476. IEEE (2016) 5. Li, J., Xie, G., Li, K., et al.: Enhanced parallel application scheduling algorithm with energy consumption constraint in heterogeneous distributed systems. J. Circ. Syst. Comput. 28(11), 1950190 (2019) 6. Zhou, N., Qi, D., Wang, X., et al.: A list scheduling algorithm for heterogeneous systems based on a critical node cost table and pessimistic cost table. Concurrency Comput. Pract. Exp. 29(5), e3944 (2017) 7. Huang, J., Li, R., An, J., et al.: A DVFS-weakly dependent energy-efficient scheduling approach for deadline-constrained parallel applications on heterogeneous systems. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 40(12), 2481–2494 (2021) 8. Tang, Z., Cheng, Z., Li, K., et al.: An efficient energy scheduling algorithm for workflow tasks in hybrids and DVFS-enabled cloud environment. In: 2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming, pp. 255–261. IEEE (2014) 9. Song, J., Xie, G., Li, R., et al.: An efficient scheduling algorithm for energy consumption constrained parallel applications on heterogeneous distributed systems. In: 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), pp. 32–39. IEEE (2017) 10. Hu, F., Quan, X., Lu, C.: A schedule method for parallel applications on heterogeneous distributed systems with energy consumption constraint. In: Proceedings of the 3rd International Conference on Multimedia Systems and Signal Processing, pp.134–141 (2018) 11. Quan, Z., Wang, Z.J., Ye, T., et al.: Task scheduling for energy consumption constrained parallel applications on heterogeneous computing systems. IEEE Trans. Parallel Distrib. Syst. 31(5), 1165–1182 (2019)

An Energy-Conscious Task Scheduling Algorithm for Minimizing Energy Consumption and Makespan in Heterogeneous Distributed Systems Wei Hu1,2 , Ziyu Chen1,2(B) , Jing Wu1,2 , Haodi Li1,2 , and Ping Zhang1,2 1 College of Computer Science and Technology, Wuhan University of Science and Technology,

Wuhan, China {huwei,czyuer,wujingecs,lihaodi}@wust.edu.cn, [email protected] 2 Hubei Key Laboratory of Intelligent Information Processing and Real-Time Industrial Systems, Wuhan, China

Abstract. Heterogeneous distributed systems have been widely used in the industrial field to address the demand for scalability and performance. However, with the increase in the number of computing nodes, the energy consumption of the system has sharply risen. Therefore, reducing energy consumption has become an important objective in the field of sustainable computing. To address this challenge, this paper proposes an energy-conscious scheduling algorithm based on heterogeneous computing systems for allocating and scheduling tasks with different priorities. The algorithm initializes the population based on the Earliest Finish Time (EFT) and processor allocation strategy and adopts the superior individual selection strategy to reduce the influence of inferior solutions. Additionally, the MECMA algorithm introduces a novel adaptive mutation operator to enhance the diversity of the population and accelerate the convergence speed. Simulation experiments have been conducted on a set of randomly generated tasks, and the experimental results have demonstrated the efficiency of the proposed algorithm. The proposed MECMA algorithm has significantly reduced energy consumption and shortened task completion time. The research results have great potential to contribute to the development of sustainable computing and the optimization of energy utilization in industrial fields. Keywords: Heterogeneous distributed systems · Energy-conscious · Energy consumption · Scheduling algorithm

1 Introduction The surging advent of technologies such as AI, big data, and IoT necessitates diverse computing capabilities [1]. These are catered to by heterogeneous distributed systems (HDS), which, while promising, are constrained by energy consumption considerations impacting both economy and environment. Consequently, the design of energy-efficient scheduling algorithms is paramount for HDS. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D.-S. Huang et al. (Eds.): ICIC 2023, LNCS 14086, pp. 109–121, 2023. https://doi.org/10.1007/978-981-99-4755-3_10


The issue of energy efficiency becomes complex when dealing with HDS involving multiple platforms [2]. Energy consumption spans across platforms ranging from compact embedded systems to expansive data centers. Maintaining energy usage within a reliable limit boosts operational certainty. Several technologies are deployed to mitigate this energy challenge, notably dynamic voltage and frequency scaling (DVFS) [3] and virtualization integration. DVFS, a wellestablished method for enhancing processor energy efficiency, requires careful coordination of task scheduling, processor allocation, and power supply phases in the context of heterogeneous multi-core processors. However, energy-conscious scheduling in this scenario is an NP-hard problem [4], with different voltage configurations leading to disparate task completion times and energy consumption. Hence, there is an immediate need to strike a balance between energy usage and task completion time. Therefore, to address the above challenges, this paper focuses on parallel applications in heterogeneous distributed systems and makes the following contributions: • Propose a minimize application execution time and energy-conscious algorithm (MECMA) to optimize the entire scheduling time and energy consumption simultaneously. • A new multi-objective fitness function is proposed to balance task scheduling optimization problems using genetic algorithms. This function integrates normalized fitness functions for energy consumption and completion time, simplifying the balancing process and avoiding overemphasis on a single indicator. • The effectiveness of the MECMA algorithm in parallel applications was verified through experimental evaluation. The results of the experiments indicate that the proposed algorithm outperforms existing algorithms with respect to minimizing completion time and energy consumption, thereby exhibiting superior performance.

2 Related Work Existing scheduling algorithm research focuses on local optimal solutions, including heuristic algorithms, metaheuristic algorithms, and integer programming algorithms. Applications are represented as directed acyclic graphs [5], showing task dependencies. Modifying task allocation directly impacts system performance parameters like maximum completion time and energy consumption. Current low-energy scheduling methods can be categorized into three types: energy-efficient, energy-aware parallel, and energyconscious parallel algorithms. These algorithms can be further classified as heuristic, meta-heuristic. Common heuristic algorithms, such as list scheduling, clustering, and replicationbased task scheduling, employ various strategies to allocate tasks to processors. List scheduling calculates task priorities and assigns tasks to optimal processors, while dynamic list scheduling recalculates priorities after task assignment [6]. Clustering algorithms are utilized to group tasks based on their attributes and map these groups onto processors. The execution order is then determined by a set of predetermined rules. Replication-based algorithms reduce communication latency by replicating tasks on different processors [7]. Despite lower time complexity and effective search space reduction, heuristic methods may exhibit low scheduling efficiency in large-scale applications.


Meta-heuristic algorithms improve upon heuristic methods by integrating random and local search algorithms [8]. Utilizing intelligent combinations and learning strategies, they effectively find approximate optimal solutions [9]. Various optimization algorithms, such as Genetic Algorithms (GA), Particle Swarm Optimization (PSO), Simulated Annealing (SA), Artificial Bee Colony (ABC), and Differential Evolution (DE), fall into the categories of evolutionary, biological, and swarm intelligence. These algorithms have proven to exhibit robust performance and adaptability in addressing complex problems. Distinct scheduling objectives impact algorithm development. To reduce energy consumption, DVFS technology is extensively employed. In our study, we integrate the global search capacity of genetic algorithms with the efficacy of heuristic task scheduling algorithms. This approach optimizes both completion time and energy consumption. Leveraging the advantages of genetic algorithms, the proposed algorithm balances task dependencies and overall efficiency using a heuristic approach. Through optimization, it identifies a trade-off solution between completion time and energy consumption.

3 Models In this section, we present the system model, energy model, and problem description. 3.1 System Model We generalize a heterogeneous distributed system as a model consisting of a set of heterogeneous processors U = {u1, u2, ..., um}, where m is the number of processors, and the program running on the processors can be represented as a directed acyclic graph (DAG) G = (T, E, W). The set of task nodes is denoted T = {t1, t2, ..., tn}, where the i-th task is denoted ti, and the set of dependencies among tasks is denoted E. The communication cost matrix of size n × m is also denoted E, and the dependency and communication cost between tasks ti and tj is denoted ei,j. The execution times of the tasks are given by W. The time that task ti takes when executed at maximum frequency on processor uk is denoted ωi,k. Figure 1(b) shows an example of a DAG that depicts task dependencies in a heterogeneous distributed system. W is a computational cost matrix in which each ωi,j gives the estimated execution time of task ti on processor uj; each task is marked with an average execution cost before scheduling starts. The average execution cost of task ti can be defined as:

ωi = (Σ_{j=1}^{m} ωi,j) / m   (1)

The communication overhead ci,j between tasks ti (scheduled on processor um) and tj (scheduled on processor un) can be defined as:

ci,j = Lm + datai,j / Bm,n   (2)

where Lm denotes the communication delay between ti and tj , datai,j denotes the data size transferred from ti to tj , and Bm,n denotes the communication bandwidth.


Before we propose the objectives, it is necessary to define EST and EFT [13], which evolve as the partial schedule is built. EST(ti, uj) and EFT(ti, uj) are the earliest start time and the earliest finish time of task ti on processor uj, respectively, defined as:

EST(tentry, uj) = 0   (3)

EST(ti, uj) = max{ avail_j, max_{tm ∈ Pred(ti)} (AFT(tm) + cm,i) }   (4)

EFT(ti, uj) = ωi,j + EST(ti, uj)   (5)

avail_j is the earliest time at which processor uj is ready to execute the task, and AFT(tm) is the actual finish time of tm, as mentioned earlier. We refer to the first task as the entry task and the last task as the exit task, denoted tentry and texit, respectively (Table 1). Figure 1(a) shows the execution time matrix at maximum frequency for the application in Fig. 1(b). Figure 1(b) shows an example of a parallel application with eight tasks executed on four processors {u1, u2, u3, u4}.

Fig. 1. DAG application model example
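The EST/EFT recursion in Eqs. (3)–(5) can be sketched as follows, assuming a partial schedule is available; avail, aft, comm and w are illustrative containers, not data structures defined in the paper.

# Minimal sketch of the EST/EFT computation (Eqs. 3-5).
def est(task, proc, preds, avail, aft, comm):
    """Earliest start time of 'task' on processor 'proc' (Eqs. 3-4)."""
    if not preds[task]:                                   # entry task
        return 0.0
    ready = max(aft[p] + comm[(p, task)] for p in preds[task])
    return max(avail[proc], ready)

def eft(task, proc, preds, avail, aft, comm, w):
    """Earliest finish time of 'task' on processor 'proc' (Eq. 5)."""
    return w[(task, proc)] + est(task, proc, preds, avail, aft, comm)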

3.2 Energy Model In this study, we use a system-level power model with DVFS [11], in which the relationship between the supply voltage and the operating frequency is approximately linear. Therefore, when the clock frequency is adjusted, DVFS also adjusts the supply voltage; we use the term frequency regulation to indicate that the supply voltage and frequency are regulated together. The dynamic energy consumption of the processors in the operating state can be defined as:

Ed = Σ_{i=1}^{n} Cef · V²i,j · fi,j · ωi,j   (6)


Table 1. Voltage and corresponding frequency values for the four processors

Level | u1 (V, f)   | u2 (V, f)    | u3 (V, f)   | u4 (V, f)
  1   | 1.50, 2.0   | 1.484, 1.4   | 1.30, 2.6   | 1.20, 1.8
  2   | 1.40, 1.8   | 1.463, 1.2   | 1.25, 2.4   | 1.15, 1.6
  3   | 1.30, 1.6   | 1.318, 1.0   | 1.20, 2.2   | 1.10, 1.4
  4   | 1.20, 1.4   | 1.190, 0.8   | 1.15, 2.0   | 1.05, 1.2
  5   | 1.10, 1.2   | 0.956, 0.6   | 1.10, 1.8   | 1.00, 1.0

Cef is the effective capacitance; V is the supply voltage; Cef is a constant associated with a particular processor and can be obtained from processor power measurements; f denotes the current frequency of the processor. When a processor is idle, we use its lowest frequency to calculate the energy consumption, and we define the energy consumption at idle time as:

Es = Σ_{j=1}^{n} Cef · V²j,s · fj,s · ωj,s   (7)

Assuming that E(G) is the energy consumption of the application G, E(G) comprises Es(G) (the idle energy consumption) and Ed(G) (the dynamic energy consumption), and is defined as:

E(G) = Es(G) + Ed(G)   (8)

3.3 Problem Description This study focuses on optimizing task assignment and frequency levels in a heterogeneous distributed system to minimize makespan and energy consumption:

Minimize {Makespan(G), E(G)}
subject to: (1) AFT(ti) + ci,k ≤ max_{tk ∈ Succ(ti)} {AST(tk)}
            (2) E(G) ≤ Egiven   (9)

Our goal, as denoted in Eq. (9), is to minimize the scheduling problem's makespan and energy consumption. The constraints emerge from the following conditions. Constraint (1) stipulates that a task can only commence after its predecessors' execution and inter-task communication have concluded. Constraint (2) guarantees that the cumulative energy consumption of all tasks remains below the given energy constraint.

4 Proposed Task Scheduling Algorithm This paper proposes an energy-conscious scheduling algorithm (MECMA), which aims to minimize energy consumption and makespan. The algorithm consists of six parts: (i) chromosome representation, (ii) population initialization, (iii) evaluation, (iv) selection, (v) superior individual strategy, and (vi) crossover, mutation, and termination.


Through these steps, the MECMA algorithm can solve scheduling problems while minimizing energy consumption and makespan, thereby improving industrial production efficiency. 4.1 Chromosome Representation and Initial Population As shown in Fig. 2, the chromosome representation employed in the MECMA algorithm is presented, where each chromosome denotes a scheduling solution. The first layer indicates the priority order of tasks, the second layer specifies the processor allocated to each task, and the third layer signifies the voltage/frequency level utilized by the corresponding processor. Traditional genetic algorithms (GA) typically initialize their population randomly, resulting in the potential inclusion of numerous non-optimal individuals. To address this issue, an enhanced initialization method is proposed. By employing the upward ranking of the Heterogeneous Earliest Finish Time (HEFT) algorithm, an initial scheduling sequence adhering to task dependencies and incorporating priority relationships is obtained [10]. The priority for scheduling is defined as follows:

ranku(ti) = ωi + max_{tj ∈ Succ(ti)} { ci,j + ranku(tj) }   (10)

Succ(ti) is the set of direct successors of task ti, ci,j is the average communication overhead of edge (i, j), and ωi is the average computational overhead of task ti. Random assignments of processors and voltage/frequency levels are made for each task, resulting in an initial chromosome. From this chromosome, the initialized population is generated while ensuring that each task satisfies its task dependencies through appropriate constraints. Multiple initial solutions form population Q after each iteration, preserving task dependencies through a scheme in which two tasks ti and tj are randomly exchanged only if no direct or indirect dependency exists between them; the corresponding constraint is given by Eq. (11).

H(ti → tj) = 0 ∧ H(tj → ti) = 0,  ti, tj ∈ T   (11)

Here, H represents the transitive closure matrix of the task graph. After a predetermined number of iterations, we generate the corresponding initial population. 4.2 Evaluation The MECMA algorithm appraises scheduling solution quality by analyzing completion time and energy consumption. It employs a normalized fitness approach, placing energy consumption and makespan on an equal footing in the fitness function, which is defined as follows:

F = 1 / (ω1 · F1 + ω2 · F2),  ω1 + ω2 = 1   (12)

F1 = (Ei − Emin) / (Emax − Emin),  F2 = (SLi − SLmin) / (SLmax − SLmin)   (13)


Here, ω1 and ω2 are weight factors between 0 and 1, with their sum being 1. By adjusting the weight factors appropriately, the trade-off between energy consumption and makespan can be balanced to meet practical needs. Emax and Emin represent the highest and lowest energy consumption in the population, respectively, while SLmin and SLmax represent the shortest and longest makespan in the population, respectively.
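A minimal sketch of the normalized fitness in Eqs. (12)–(13) is given below; the population's energy and makespan values are assumed given, and the small epsilon guard is added here only for numerical safety, not part of the paper's definition.

# Minimal sketch of the normalized fitness evaluation (Eqs. 12-13).
def fitness(energy, makespan, energies, makespans, w1=0.5, w2=0.5):
    e_min, e_max = min(energies), max(energies)
    sl_min, sl_max = min(makespans), max(makespans)
    f1 = (energy - e_min) / (e_max - e_min) if e_max > e_min else 0.0
    f2 = (makespan - sl_min) / (sl_max - sl_min) if sl_max > sl_min else 0.0
    return 1.0 / (w1 * f1 + w2 * f2 + 1e-9)   # epsilon guard added here, not in the paper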

Fig. 2. Chromosome representation diagram

4.3 Selection The selection phase is a survival-of-the-fittest process based on the fitness values calculated in the previous section. We use roulette wheel selection to select chromosomes with high fitness: the wheel stops in one of several areas, and the probability of stopping in an area is proportional to its size. Thus, the probability of being selected can be defined as:

p(i) = F(i) / Σ_{j=1}^{N} F(j)   (14)

where N represents the total number of chromosomes in the population and F(i) denotes the fitness value of the i-th individual. The algorithm for the selection phase is shown in Algorithm 1.
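Roulette wheel selection as in Eq. (14) can be sketched as follows (a simple illustrative implementation, not the paper's Algorithm 1):

import random

def roulette_select(population, fitness_values):
    """Pick one individual with probability proportional to its fitness (Eq. 14)."""
    total = sum(fitness_values)
    r = random.uniform(0.0, total)
    acc = 0.0
    for individual, f in zip(population, fitness_values):
        acc += f
        if acc >= r:
            return individual
    return population[-1]   # numerical fallback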

4.4 Superior Individual Strategy In order to address the issue of suboptimal solutions and the loss of high-quality solutions during crossover and mutation operations, the MECMA algorithm introduces an elite


selection stage, which distinguishes it from the standard genetic algorithm. Specifically, prior to each iteration, the MECMA algorithm sorts the current population based on the fitness function and retains the top k individuals with the highest fitness values as elite individuals, where k denotes the number of individuals to be preserved. These elite individuals are directly replicated into the next generation population. By adopting this strategy, the algorithm not only enhances population diversity but also reduces the number of inferior solutions, thereby maintaining a desirable level of individual quality throughout the entire iteration process. 4.5 Crossover, Mutation, Termination Crossover. Crossover operation in genetic algorithms involves exchanging information between two chromosomes to create a novel chromosome. To preserve task dependencies, this study proposes a task-dependency-based crossover operation. In this operation, a pair of parent chromosomes is randomly selected, and partial mapping crossover is applied. Variable-length segments are assigned and exchanged to optimize parental information preservation, resulting in a novel chromosome. During the task-dependency-based crossover operation, tasks outside the exchange interval are analyzed for potential duplications in the offspring chromosomes. If duplicates are detected, corresponding tasks from the parent chromosome are substituted to ensure task dependency preservation. Algorithm 2 provides detailed procedures for the proposed crossover operation. Lines 1–2 first randomly select two positions s and e as the start and end positions of the exchange segment. Lines 3–4 represent the traversal of each position in the entire subinterval and the exchange of tasks at position ti . Lines 5–9 first traverse each task tj in the offspring chromosome. If the task tj is not in the interval [s, e], we find the position of the same task tj in another child chromosome. If the task at the found known position is not in the interval [s, e] and there is no dependency between the task tj and the task at the found position, it is called swapping the current position and the task at the found position. Since we only replace the tasks outside the swap interval, this keeps the task dependencies unbroken. Finally the newly generated offspring chromosome is returned. By this method, we can repair duplicate tasks while maintaining task dependencies and generate feasible task scheduling solutions in the algorithm. Mutation. Mutation operation is a crucial genetic operation used to enhance population diversity, prevent local optimal solutions, and improve search efficiency. In this study, a task sequence-based mutation operation is introduced for task scheduling problems. The operation assesses the need for mutation based on a random probability and, if necessary, selects an individual for the mutation process. The proposed mutation method involves swapping task positions, where two tasks are exchanged. A randomly selected task is relocated to a position that preserves task dependency relationships. For example, task ti is inserted after all its predecessors and before all its successors to maintain task sequence constraints. Regarding processor allocation and level mutation, the algorithm randomly selects positions in the processor sequence and voltage/frequency level sequence to assign


different processors. This approach maximizes diversity while ensuring algorithm effectiveness, leading to improved solutions. Algorithm 3 provides the pseudocode for the mutation operation. It begins by randomly selecting a task and populating sets Pi and Si with its predecessor and successor tasks respectively (Line 1–2). Lines 3–6 traverse the sequence to identify tasks neither predecessors nor successors of the selected task and devoid of dependency constraints with it. On meeting these conditions, the selected task is exchanged with task at position j, with random processor and frequency level alterations applied to both tasks, eventually yielding a new offspring.
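The dependency check used when exchanging two tasks (the Eq. (11) condition) can be sketched as follows; H is assumed here to be a reachability map derived from the transitive closure, an illustrative representation rather than the paper's data structure.

import random

def swap_if_unrelated(order, H):
    """order: task sequence (list); H[a]: set of tasks reachable from a.
    Exchange two tasks only if H records no ordering between them (Eq. 11)."""
    i = random.randrange(len(order))
    candidates = [j for j in range(len(order))
                  if j != i
                  and order[j] not in H[order[i]]
                  and order[i] not in H[order[j]]]
    if candidates:
        j = random.choice(candidates)
        order[i], order[j] = order[j], order[i]
    return order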

Termination. The termination operation in a genetic algorithm refers to the process of stopping the algorithm upon reaching a certain condition. This condition is often met when the algorithm has discovered satisfactory solutions after numerous iterations or has been running for a considerable amount of time.


5 Experiments This section showcases the superiority of the proposed MECMA algorithm through comparative experiments against established algorithms: HEFT and MOGA [10, 12]. Despite HEFT's "unconscious" scheduling approach, its proven effectiveness in task scheduling problems necessitates comparison. MOGA, a genetic-algorithm-based technique, aims to minimize workflow completion time under user-defined deadlines and energy limits, using dynamic voltage and frequency scaling (DVFS) to optimize energy consumption. Both algorithms are widely representative in the field. The experimental setup includes a population size of 2·|N|, 5000 iterations, a crossover probability of 0.5, and a mutation probability of 0.15. The experiments were carried out with MATLAB simulations. 5.1 Metrics To robustly assess the algorithm's superiority, this study normalizes the task graph's maximum completion time and energy consumption to lower bounds computed over the critical path tasks (CP tasks), excluding communication costs. The performance indicators employed are the Schedule Length Ratio (SLR) and the Energy Consumption Ratio (ECR). For the maximum completion time M and energy consumption Et of the schedule generated by an algorithm on task graph G, they are defined as:

ECR = Et / Σ_{ni ∈ CP} min_{pj ∈ P} (ωi,j · Pdyj,max)   (15)

SLR = M / Σ_{ni ∈ CP} min_{pj ∈ P} ωi,j   (16)

CP denotes the set of tasks on the critical path. 5.2 Comparison Experiments To assess the comparative performance of the MECMA algorithm, we focus on randomly generated application graphs with the following parameters: number of tasks n: [60, 80, 100, 120, 140, 180, 200, 400]; communication-to-computation ratio CCR: [0.3, 0.5, 0.7, 1, 2, 3]; number of processors m: [4, 6, 8, 10, 12]. Experiment 1. As illustrated in Fig. 3, the experimental outcomes reveal the energy consumption ratio (ECR) for various task quantities. The proposed MECMA algorithm, contrasted with the HEFT algorithm, exhibits a reduction in mean energy consumption of 7.127%; compared to the MOGA algorithm, the average energy consumption is reduced by 4.318%. Additionally, Fig. 3 shows the schedule length ratio (SLR) for different numbers of tasks. The MECMA algorithm demonstrates a 7.767% decrease in average makespan compared with the HEFT algorithm and a 21.227% reduction compared with the MOGA algorithm. Therefore, it can be concluded that the MECMA algorithm outperforms the HEFT and MOGA algorithms in terms of energy consumption and makespan for different numbers of tasks.


Fig. 3. Comparison of ECR and SLR with different number of tasks

Experiment 2. Comparative analyses were performed under different number of processors. As delineated in Fig. 4, it is discernible that the MECMA algorithm, across various processor counts, diminishes the average energy consumption by 5.829% in relation to the HEFT algorithm and by 1.252% when juxtaposed with the MOGA algorithm. Moreover, concerning makespan, the MECMA algorithm curtails the mean makespan by 5.781% when contrasted with the HEFT algorithm and by 18.479% compared to the MOGA algorithm. Therefore, the energy consumption and makespan of the MECMA algorithm are typically superior to other algorithms, regardless of the number of processors used.

Fig. 4. Comparison of ECR and SLR with different number of processors

Experiment 3. We assessed performance under different communication-to-computation ratios (CCR). As depicted in Fig. 5, under varying CCR conditions the MECMA algorithm attains a 10.912% reduction in mean makespan relative to the HEFT algorithm and a 19.47% reduction in comparison to the MOGA algorithm. Additionally, the MECMA algorithm achieves a 4.247% reduction in average energy consumption compared with the HEFT algorithm. As a meta-heuristic, the MECMA algorithm outperforms single heuristic algorithms but requires more computing resources and time. Despite its higher complexity, the MECMA algorithm offers flexibility and superior performance, making it the optimal choice for our needs.


Fig. 5. Comparison of ECR and SLR with different CCR

6 Conclusions In this paper, we propose the MECMA algorithm for energy-aware scheduling in heterogeneous distributed systems. The MECMA algorithm considers both makespan and energy consumption to minimize the completion time of tasks while satisfying energy constraints. By incorporating task dependencies at different stages of the algorithm, the MECMA algorithm retains the global search capabilities of genetic algorithms and effectively prunes out a large number of suboptimal solutions, leading to better convergence towards the optimal solution. Experimental results demonstrate that our algorithm outperforms the HEFT and MOGA algorithms in terms of both completion time and energy consumption. Our research focuses on single application programs within the system, but in realworld scenarios, multiple application programs often run concurrently. While our method provides feasible solutions in a short time, we aim to identify more efficient techniques to improve problem-solving performance. Additionally, we seek to evaluate the practical applicability of our method in multi-processor scheduling challenges and incorporate multiple optimization parameters such as utilization, energy, temperature, and fault tolerance.

References 1. Xie, G., Xiao, X., Peng, H., et al.: A survey of low-energy parallel scheduling algorithms. IEEE Trans. Sustain. Comput. 7(1), 27–46 (2021) 2. Taghinezhad-Niar, A., Pashazadeh, S., Taheri, J.: Energy-efficient workflow scheduling with budget-deadline constraints for cloud. Computing 1–25 (2022) 3. Deng, Z., Cao, D., Shen, H., et al.: Reliability-aware task scheduling for energy efficiency on heterogeneous multiprocessor systems. J. Supercomput. 77, 11643–11681 (2021) 4. Hartmanis, J., Garey, M.R., Johnsons, D.S.: Computers and intractability: a guide to the theory of np-completeness. SIAM Re. 24(1), 90 (1982) 5. Digitale, J.C., Martin, J.N., Glymour, M.M.: Tutorial on directed acyclic graphs. J. Clin. Epidemiol. 142, 264–267 (2022) 6. Shimizu, T., Nishikawa, H., Kong, X., et al.: A fair-policy dynamic scheduling algorithm for moldable gang tasks on multicores. In: 2022 11th Mediterranean Conference on Embedded Computing (MECO), pp. 1–4. IEEE (2022) 7. Zhu, D., Melhem, R., Mossé, D.: The effects of energy management on reliability in real-time embedded systems. In: IEEE/ACM International Conference on Computer Aided Design, ICCAD-2004, pp. 35–40. IEEE (2004)


8. Usman, M.J., Ismail, A.S., Abdul-Salaam, G., et al.: Energy-efficient nature-inspired techniques in cloud computing datacenters. Telecommun. Syst. 71, 275–302 (2019) 9. Chen, S., Li, Z., Yang, B., et al.: Quantum-inspired hyper-heuristics for energy-aware scheduling on heterogeneous computing systems. IEEE Trans. Parallel Distrib. Syst. 27(6), 1796–1810 (2015) 10. Topcuoglu, H., Hariri, S., Wu, M.Y.: Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Trans. Parallel Distrib. Syst. 13(3), 260–274 (2002) 11. Cao, E., Musa, S., Chen, M., et al.: Energy and reliability-aware task scheduling for cost optimization of DVFS-enabled cloud workflows. IEEE Trans. Cloud Comput. (2022) 12. Rehman, A., Hussain, S.S., Ur Rehman, Z., et al.: Multi-objective approach of energy efficient workflow scheduling in cloud environments. Concurrency Comput. Pract. Exp. 31(8), e4949 (2019)

A Hyper-Heuristic Algorithm with Q-Learning for Distributed Permutation Flowshop Scheduling Problem Ke Lan1 , Zi-Qi Zhang1,2(B) , Bi Qian1,2 , Rong Hu1,2 , and Da-Cheng Zhang1,2 1 School of Information Engineering and Automation, Kunming University of Science and

Technology, Kunming 650500, China [email protected] 2 Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China

Abstract. The Distributed Permutation FlowShop Scheduling Problem (DPFSP) is a challenging combinatorial optimization problem with many real-world applications. This paper proposes a Hyper-Heuristic Algorithm with Q-Learning (HHQL) approach to solve the DPFSP, which combines the benefits of both Q-learning and hyper-heuristic techniques. First, based on the characteristics of DPFSP, a DPFSP model is established, and coding scheme is designed. Second, six simple but effective low-level heuristics are designed based on swapping and inserting jobs in the manufacturing process. These low-level heuristics can effectively explore the search space and improve the quality of the solution. Third, a high-level strategy based on Q-learning was developed to automatically learn the execution order of low-level neighborhood structures. Simulation results demonstrate that the proposed HHQL algorithm outperforms existing state-of-the-art algorithms in terms of both solution quality and computational efficiency. This research provides a valuable contribution to the field of DPFSP and demonstrates the potential of using Hyper-Heuristic techniques to solve complex problems. Keywords: Distributed permutation flowshop scheduling problem · Hyper-Heuristic · Q-learning

1 Introduction The Permutation Flowshop Scheduling Problem (PFSP) is a widely studied variant of the flowshop scheduling problem, with the objective of minimizing the total production time by minimizing makespan [1]. In PFSP, each job must follow a fixed sequence of machines, meaning that the order of processing on each machine is predetermined and cannot be changed. This type of scheduling problem arises in many real-world applications, including manufacturing, transportation, and logistics [2]. However, finding an optimal solution becomes increasingly challenging due to the large search space of possible job permutations. In fact, it has been shown that the PFSP problem is an NP-hard problem when the number of machines is greater than two [3]. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D.-S. Huang et al. (Eds.): ICIC 2023, LNCS 14086, pp. 122–131, 2023. https://doi.org/10.1007/978-981-99-4755-3_11


In the context of the PFSP, it is typically assumed that all jobs are processed in a single production center. However, due to the globalization of the economy and the increasing interconnectedness of multinational enterprises, many modern production models are shifting from traditional centralized manufacturing methods to distributed manufacturing models across regions. To address scheduling problems that arise in such settings, researchers have proposed the Distributed Permutation Flow-shop Scheduling Problem (DPFSP) [4], which aims to optimize scheduling performance in distributed manufacturing environments. Naderi and Ruiz [4] developed six mixed integer linear programming (MILP) models to address the DPFSP and presented 14 constructive heuristics and fast local search techniques based on variable neighborhood descent (VND) to reduce the makespan. In this context, various optimization methods have been proposed and applied to solve complex problems. For instance, Ruiz et al. [5] put forwarded an iterative greedy algorithm based on DFSP that improves the results of Naderi et al. [4]. Although intelligent optimization algorithms have shown satisfactory performance on DPFSP, their iterative search process is usually time-consuming and rarely uses search history information to adjust the search behavior [6]. Recent trends show that the hyperheuristic has become an effective search method. Hyper-Heuristic (HH) is a new class of intelligent optimization algorithms. Heuristic algorithms operate directly on the problem domain, whereas HH algorithms are separated from the specific problem situation. Unlike traditional heuristics, which use problem-specific knowledge to generate solutions, HH focus on developing general methods for solving a wide range of problems. They do this by using a set of high-level heuristics or by combining low-level heuristics in innovative ways. The goal of HH algorithms is to develop a set of general heuristics that can adapt to different problem domains and produce high-quality solutions with minimal input from domain experts. To address complex problems, HH algorithm framework adaptively selects the optimizer, characterized by simple operations and a small number of parameters. Unlike the traditional meta-heuristic algorithms, this algorithm leads the Low-Level Heuristics (LLHs) through a specific High-Level Strategy (HLS) to form a new heuristic algorithm to achieve the search of different regions of the solution space. Hyper-heuristic tries to find the best sequence of heuristics by manipulating the underlying heuristic. There is a growing literature in the field of hyper-heuristic techniques [7].Many hyper-heuristic have proven to be very effective, such as hyper-heuristic based on particle swarm optimization [8], hyper-heuristic based on harmonic search [9], and hyper-heuristic based on bacterial foraging [10]. Q-Learning, as a type of Reinforcement Learning (RL), has shown great performance in solving various problems. In addition, HH algorithms based on RL have shown excellent capabilities in solving a wide range of scheduling problems. These algorithms leverage the power of reinforcement learning to improve their optimization strategies over time by learning from experience. The 0–1 knapsack problem was tackled using Q-Learning and HH algorithms to identify the best bio-inspired algorithm [11]. This study used four bio-inspired algorithms as LLHs, allowing automatic heuristic selection by the hyper-heuristic and Q-learning at each optimization cycle. 
Another study developed a Q-learning-based hyper-heuristic (QHH) algorithm to solve the semiconductor final testing scheduling problem (SFTSP), where Q-learning was employed to choose a heuristic from the LLH set as a high-level strategy [12]. In addition, A hyperheuristic with Q-learning is proposed to solve the Multi-objective Energy-Efficient Distributed


Blocking Flow Shop Scheduling Problem (DBFSP) [13]. These studies inspired the development of an QHH algorithm to tackle the DPFSP. The remainder of this paper is structured as follows: Sect. 2 provides a brief introduction to DPFSP. In Sect. 3, we propose the Hyper-Heuristic Algorithm with Q-Learning (HHQL) to address DPFSP. Section 4 presents simulation results and comparisons. Finally, in Sect. 5, we conclude this paper and discuss future directions for research.

2 DPFSP The DPFSP is described as follows. There is a set of f identical factories, each consisting of several processing machines. Each of the n jobs ω = {ω1, ω2, ω3, ..., ωn} can be assigned to any one of the factories, because the factories are identical. Every factory is a relatively independent flow shop, and all jobs in a factory must be processed in the same machine order, i.e., first on machine M1, then on machine M2, and so on up to machine Mm. The operation that processes the i-th job on machine Mj is denoted Oi,j, and its processing time is denoted pi,j. Each operation Oi,j must be performed without interruption once it has started (Table 1). The notation used is presented below: Table 1. Symbol Description

Parameters:
f — the set of factories, i.e., f ∈ {1, 2, ..., F}
j — the set of machines, i.e., j ∈ {1, 2, ..., M}
i — the set of jobs, i.e., i ∈ {1, 2, ..., N}
Nf — the number of jobs belonging to factory f
π — the total sequence of job sets, i.e., π = {π1, π2, ..., πN}
π^f — the subsequence of jobs in factory f, i.e., π^f = {π1^f, π2^f, ..., π_Nf^f}
pi,j — the processing time of job i on machine j
Ci,j — the completion time of job i on machine j
Cf — the completion time of factory f
C — the completion time of all factories

For a schedule π of the DPFSP, the makespan C(π) can be calculated as follows:

C(π1^f, 1) = p(π1^f, 1),  f = 1, 2, ..., F;   (1)

C(πi^f, 1) = C(π_{i−1}^f, 1) + p(πi^f, 1),  f = 1, 2, ..., F; i = 2, ..., Nf;   (2)

C(π1^f, j) = C(π1^f, j−1) + p(π1^f, j),  f = 1, 2, ..., F; j = 2, ..., M;   (3)

C(πi^f, j) = max{C(π_{i−1}^f, j), C(πi^f, j−1)} + p(πi^f, j),  f = 1, 2, ..., F; i = 2, ..., Nf; j = 2, ..., M;   (4)

Cf = C(π_{Nf}^f, M),  f = 1, 2, ..., F;   (5)

C = max_{f = 1, 2, ..., F} {Cf}   (6)

3 HHQL for DPFSP 3.1 Encoding and Decoding In this study, we represent a solution using a set of f sequences, with each sequence corresponding to a specific factory. These sequences consist of permutations of all parts allocated to their respective factory, indicating the processing order of the jobs in the flow shop. With this representation, a solution can be denoted by π = {π 1 , π 2 , ..., π f }, where π k = {π1k , π2k , ..., πNk k }, k = 1, 2, ..., f , is a sequence related to factory k, and Nk is the total number of jobs assigned to factory k. For ease of understanding, a simple example is provided. Consider a problem instance with 8 parts, 2 factories, and 2 identical machines in each factory. Table 2. Displays the processing times of the parts for one instance. One feasible encoding for this instance is shown as follows: π = {{2, 3, 1}, {7, 8, 6, 5, 4}}. In this instance, the first sequence π 1 = {2, 3, 1} represents the processing order of jobs in the first factory, and the second sequence π 2 = {7, 8, 6, 5, 4} represents the processing order of parts in the second factory. Table 2. A simple example of DPFSP.

jobs M1 M2

1 8 4

2 2 7

3 6 8

4 4 7

5 3 3

6 3 2

7 7 2

8 1 2

Decoding refers to the process of determining the makespan of a scheduling plan by assigning each job to a factory and determining the processing order in every factory. In this paper, a decoding strategy is used, which consists of two stages: first, assigning the jobs to different factories, and second, determining the processing order of each job within each factory. The final completion time of each factory is calculated based on the start and complete times of each job in processing stage, and makespan which is the maximum completion time of all factories is obtained. The Gantt chart of an instance of DPFSP is shown in Fig. 1.

126

K. Lan et al.

M3

Factory 2

M2 M1

J2

M3

Factory 1

M2

J2

J5

J1

J2

J-5

J1

J4

J5

J1

J4 J1

J1

M1

J1

J6

J6

Makespan J6

J3

J3

J3

50

0

J4

100

Time

Fig. 1. The Gantt chart of one instance for DPFSP

3.2 Initialization Initialization is a crucial step in optimization algorithms as it determines the quality of the optimization results and the stability of the algorithm. In this paper, we adopt the NEH2 algorithm proposed by Naderi and Ruiz [4] as the initialization procedure, which is an effective heuristic algorithm. The NEH2 algorithm is based on the NEH algorithm and introduces a priority queue for processing optimal job sequencing. Specifically, the NEH2 algorithm sequentially inserts each job into all positions of all factories, calculates the makespan after each insertion, selects the position that results in the shortest makespan, and places the job in that position. This process continues until all jobs are inserted into the working sequence of each factory. This generates an initial solution that provides good quality and stability, thereby providing a better foundation for subsequent optimization. 3.3 HHQL The specification framework of the HH algorithm consists of two levels: high-level strategies and a set of LLH. In this study, a Q-learning-based high-level strategy, HHQL, was adopted, which seeks the most suitable LLH based on the change of state in the evolution process. The key to using Q-learning as a heuristic selection mechanism is to design appropriate actions and states that accurately describe the scheduling environment and thus improve the efficiency of the learning process. In this study, each feasible solution obtained is treated as a state, and each LLH is considered as an action. In the iteration process, the initial solution is selected as the initial state. The initial values of the Q-table and R-table are set to zero. The Q-table is a data structure used to store the learned experiences of an agent in a given environment. Each row of the Q-table represents a state, and each column represents an action. The Q-value for each action at a given state is calculated using the following formula: Qt+1 (st , at ) = Qt (st , at ) + αt {rt + γ arg maxQ(st+1 , a) − Qt (st , at )}, a∈A

(7)

A Hyper-Heuristic Algorithm

127

where Q(st , at ) is the q-value of the state st when taking action at . γ , which is a value between 0 and 1, represents the discount rate and determines the importance of future rewards in the update process. rt is the reward obtained by the agent immediately after taking action at at state st . max Q(st+1 , a) represents the maximum Q-value over all possible actions a in the next state St+1 .αt , which is also a value between 0 and 1, represents the learning rate and determines the extent to which the newly obtained reward influences the Q-value update. In HHQL, αt is decreased over evaluation times, and is calculated as Eq. (8): αt = 1 −

0.9 ∗ t tmax

(8)

Action Selection Strategy. The action selection strategy of the HH algorithm balances exploration and exploitation during the learning process. The ε-greedy strategy is the most common action selection method in Q-learning and is given as follows:

a = { argmax_{a∈A} Q(s_t, a),   if rand < 1 − ε
    { random,                   otherwise                                   (9)

ε = 0.5 / (1 + e^{10·(t − 0.6·t_max)/t_max})                                (10)

where rand is a random number in [0, 1], ε is the exploration rate computed by Eq. (10) at time t, and t_max denotes the stopping criterion. The formula indicates that at the beginning of training there is an approximately 50% probability of exploring new actions, but as t increases the algorithm tends to choose the action with the maximum Q-value. In other words, the action selection strategy (i.e., the selection of the LLH) retains a certain degree of exploration in the early stages and increasingly uses the learned knowledge to guide action selection as the number of training iterations grows.

Reward Mechanism. The reinforcement signal r_t is used in the Q function to assess the performance of actions. The reward mechanism used in Q-learning is expressed in Eq. (11), which involves a random number u in [0, 1]; r_t depends on the probability ε_t and the aggregated state P_f after an episode is finished:

r_t = { 1,  if (u < ε_t ∧ P_f ∈ [0.85, 1)) ∨ (u > ε_t ∧ P_f ∈ [0, 0.85))
      { 2,  if (u < ε_t ∧ P_f ∈ [0, 0.85)) ∨ (u > ε_t ∧ P_f ∈ [0.85, 1))
      { 0,  otherwise                                                        (11)

Low-Level Heuristics of HHQL. Low-level heuristics (LLHs) are simple, easy-to-implement heuristic methods used to guide the search direction at each decision step. They are typically designed based on local features and empirical knowledge of the problem, such as domain knowledge and heuristic rules. Compared with high-level heuristics, LLHs have lower computational costs but provide practical guidance during the search and contribute to improving the optimization results. To the best of our knowledge, LLHs have a significant impact on the performance of hyper-heuristics. In this paper, we designed eight simple
heuristic methods to construct an LLH pool based on the characteristics of the DPFSP. The details are as follows.

LLH1: Each job is removed from the critical factory and inserted into all possible positions in the original factory. The position with the best makespan is selected.
LLH2: Each job is removed from the critical factory and swapped with all possible positions in the same factory. The position with the best makespan is selected.
LLH3: Each job is removed from the critical factory and inserted into all possible positions in all other factories. The position with the best makespan is selected.
LLH4: Each job is removed from the critical factory and swapped with all possible positions in all other factories. The position with the best makespan is selected.
LLH5: Randomly remove two jobs from all factories and sequentially insert them into all possible positions in all factories. The position with the best makespan is selected.
LLH6: Randomly remove three jobs from all factories and sequentially insert them into all possible positions in all factories. The position with the best makespan is selected.

The Procedure of HHQL for DPFSP. Table 3 describes the detailed process of HHQL for DPFSP.

Table 3. The procedure of HHQL for DPFSP.

Input: Parameters.
1:  Generate an initial sequence by NEH2 and save it as the global best solution π*.
2:  Initialize the Q-table to 0 and randomly initialize a state s_t.
3:  while the stopping condition is not satisfied do
4:      Select an action a by the ε-greedy strategy and update the state (i.e., set s_t ← s_{t+1}).
5:      Apply the action a to the best solution π*.
6:      if π* is improved do
7:          Update the best individual π*.
8:      else
9:          Obtain the reward r_t based on Eq. (11).
10:     end if
11:     Update Q_{t+1}(s_t, a_t) based on Eq. (7).
12: end while
Output: π*.

To begin with, we generate an initial solution based on NEH2 and save it as the current optimal solution. Then we initialize the Q-table and determine the initial state randomly. Note that each action is also used as a state; for example, after executing LLH3, the state is set to s_t = 3. Next, the algorithm determines the following action based on the ε-greedy strategy (Table 3, Line 4). After executing the action, it checks whether the obtained makespan is better and whether the optimal solution should be replaced by the new solution. Subsequently, the Q-table is updated according
to the reinforcement signal r_t. Finally, the steps of selecting, executing, evaluating, and updating the Q-table are repeated until the stopping condition is satisfied.
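To make the selection mechanism concrete, the following is a minimal Python sketch of the HHQL high-level loop described above. It assumes hypothetical helper functions neh2_initial_solution, apply_llh, and makespan (not given in the paper), uses the ε-greedy rule of Eq. (9), the learning-rate schedule of Eq. (8), and the Q-update of Eq. (7), and simplifies the reward relative to Eq. (11).

import math
import random

def hhql(llhs, t_max, gamma=0.95):
    """Sketch of the Q-learning hyper-heuristic loop (Table 3).

    llhs  : list of low-level heuristics, each mapping a solution to a neighbour.
    t_max : maximum number of evaluations (stopping criterion).
    """
    n = len(llhs)
    q_table = [[0.0] * n for _ in range(n)]       # states = index of the last executed LLH
    best = neh2_initial_solution()                 # assumed helper (NEH2 of Sect. 3.2)
    state = random.randrange(n)                    # random initial state

    for t in range(t_max):
        alpha = 1.0 - 0.9 * t / t_max              # learning rate, Eq. (8)
        eps = 0.5 / (1.0 + math.exp(10 * (t - 0.6 * t_max) / t_max))  # Eq. (10)

        # epsilon-greedy action selection, Eq. (9)
        if random.random() < 1.0 - eps:
            action = max(range(n), key=lambda a: q_table[state][a])
        else:
            action = random.randrange(n)

        candidate = apply_llh(llhs[action], best)  # assumed helper
        if makespan(candidate) < makespan(best):   # assumed helper
            best, reward = candidate, 1.0          # improvement rewarded
        else:
            reward = 0.0                           # simplified; Eq. (11) is richer

        # Q-value update, Eq. (7); each action is also used as the next state
        next_state = action
        q_table[state][action] += alpha * (
            reward + gamma * max(q_table[next_state]) - q_table[state][action])
        state = next_state

    return best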

4 Simulation and Comparisons

Through parameter calibration experiments, we determined the optimal combination of parameter values for HHQL: learning rate α = 0.5, discount factor γ = 0.95, and initial greedy rate ε = 0.9. To evaluate the effectiveness of HHQL for solving the DPFSP, it was compared with two other algorithms, IG [5] and GA [14]. All algorithms are coded in Python 3.8 and run in the same environment. The maximum elapsed CPU time of n × f × m seconds is used as the stopping condition for all experiments. Ten independent runs were conducted for each test problem, and the best result for each problem is shown in bold in Table 4.

Table 4. Comparison of HHQL with IG and GA.

Instance    GA             IG             HHQL
n,m,f       Best    AVG    Best    AVG    Best    AVG
20,5,2      770     803    746     766    746     770
20,5,3      588     662    575     630    580     623
20,5,4      517     577    517     550    519     568
20,10,2     940     1078   917     1022   911     1012
20,10,3     731     875    767     834    769     800
20,10,4     782     833    762     791    762     793
50,5,2      1533    1649   1489    1568   1488    1501
50,5,3      1050    1224   1018    1145   1013    1132
50,5,4      845     1049   832     998    825     921
50,10,2     1739    1901   1688    1790   1682    1779
50,10,3     1451    1642   1384    1553   1364    1521
50,10,4     1160    1359   1133    1291   1123    1266
100,5,2     2729    2800   2650    2718   2645    2709
100,5,3     1950    2198   1930    2123   1918    2098
100,5,4     1432    1499   1406    1462   1391    1459
100,10,2    3093    3379   2994    3210   2991    3126
100,10,3    2157    2185   2063    2168   2037    2093
100,10,4    1716    1831   1648    1753   1627    1698

According to Table 4, IG occasionally obtains better results on some small-scale instances. However, on the large-scale instances, which require more computation time, HHQL shows better performance. This indicates that the HHQL algorithm
is effective in solving large-scale problems by continuously learning and improving the order in which the LLHs are executed. The average makespan of each instance, obtained by repeating the experiments, also confirms that HHQL is more stable. Overall, HHQL outperforms IG and GA on most of the test problems, indicating that HHQL is an effective algorithm for solving the DPFSP.

5 Conclusion and Further Work

In this study, an HHQL algorithm is proposed for the DPFSP with the objective of minimizing makespan, and experiments are conducted on 18 instances. The experimental results show that HHQL performs well. Future research will be devoted to further optimizing the parameter selection, improving the efficiency of the algorithm, and applying the method to other scheduling problems.

Acknowledgements. The authors are sincerely grateful to the anonymous reviewers for their insightful comments and suggestions, which greatly improved this paper. This work was financially supported by the National Natural Science Foundation of China (Grant Nos. 72201115, 62173169, and 61963022), the Yunnan Fundamental Research Projects (Grant Nos. 202201BE070001-050 and 202301AU070069), and the Basic Research Key Project of Yunnan Province (Grant No. 202201AS070030).

References 1. Framinan, J.M., Gupta, J.N.D., Leisten, R.: A review and classification of heuristics for permutation flow-shop scheduling with makespan objective. J. Oper. Res. Soc. 55(12), 1243– 1255 (2004) 2. Qian, B., Wang, L., Hu, R., Wang, W.L., Huang, D.X., Wang, X.: A hybrid differential evolution method for permutation flow-shop scheduling. Int. J. Adv. Manuf. Technol. 38(7–8), 757–777 (2008) 3. Gonzalez, T., Sahni, S.: Flowshop and jobshop schedules: complexity and approximation. Oper. Res. 26(26), 36–52 (1978) 4. Naderi, B., Ruiz, R.: The distributed permutation flowshop scheduling problem. Comput. Oper. Res. 37(4), 754–768 (2010) 5. Ruiz, R., Pan, Q.K., Naderi, B.: Iterated Greedy methods for the distributed permutation flowshop scheduling problem. Omega-Int. J. Manag. Sci. 83(1), 213–222 (2019) 6. Gao, K., Yang, F., Zhou, M., Pan, Q., Suganthan, P.N.: Flexible job-shop rescheduling for new job insertion by using discrete Jaya algorithm. IEEE Trans. Cybern. 49(5), 1944–1955 (2019) 7. Burke, E.K., et al.: Hyper-heuristics: a survey of the state of the art. J. Oper. Res. Soc. 64(12), 1695–1724 (2013) 8. Koulinas, G., Kotsikas, L., Anagnostopoulos, K.: A particle swarm optimization based hyperheuristic algorithm for the classic resource constrained project scheduling problem. Inf. Sci. 277, 680–693 (2014) 9. Anwar, K., Khader, A.T., Al-Betar, M.A., Awadallah, M.A.: Harmony Search-based Hyperheuristic for examination timetabling. In: 2013 IEEE 9th International Colloquium on Signal Processing and Its Applications, pp. 176–181. Publishing (2013)


10. Rajni, I.C.: Bacterial foraging based hyper-heuristic for resource scheduling in grid computing. Future Gener. Comput. Syst. 29(3), 751–762 (2013) 11. Gölcük, ˙I, Ozsoydan, F.B.: Q-learning and hyper-heuristic based algorithm recommendation for changing environments. Eng. Appl. Artif. Intell. 102, 104284 (2021) 12. Lin, J., Li, Y.-Y., Song, H.-B.: Semiconductor final testing scheduling using Q-learning based hyper-heuristic. Expert Syst. Appl. 187, 115978 (2022) 13. Zhao, F., Di, S., Wang, L.: A hyperheuristic with Q-learning for the multiobjective energyefficient distributed blocking flow shop scheduling problem. IEEE Trans. Cybern. 1–14 (2022) 14. Gao, J., Chen, R., Liu, Y.: A knowledge-based genetic algorithm for permutation flowshop scheduling problems with multiple factories. Int. J. Adv. Comput. Technol. 4, 121–129 (2012)

Robot Path Planning Using Swarm Intelligence Algorithms Antanios Kaissar1 , Sam Ansari2 , Meshal Albeedan3 , Soliman Mahmoud2 , Ayad Turky4 , Wasiq Khan3 , Dhiya Al-Jumeily OBE5 , and Abir Hussain2,3(B) 1 Department of Computer Engineering, University of Sharjah, Sharjah, United Arab Emirates

[email protected]

2 Department of Electrical Engineering, University of Sharjah, Sharjah, United Arab Emirates

{Samansari,Solimanm,abir.hussain}@sharjah.ac.ae

3 School of Computer Science and Mathematics, Liverpool John Moores University,

Liverpool L3 3AF, UK [email protected], [email protected] 4 Department of Computer Science, University of Sharjah, Sharjah, United Arab Emirates [email protected] 5 School of Computer Science and Mathematics, Liverpool John Moores University, Liverpool L3 5UX, UK [email protected]

Abstract. Nowadays, robotic applications exist in various fields, including medical, industrial, and educational. The critical aspect of most of these applications is robot movement, where an efficient path-planning algorithm is required in order to guarantee a safe and cost-effective movement. The main goal of the path planning technique is to find the shortest possible path to the destination while avoiding the obstacles on the route. This study proposes a framework employing swarm intelligence optimization techniques based on an improved genetic algorithm and particle swarm optimization to obtain the optimum trajectory. The simulations are conducted using MATLAB R2022b. It is observed that the proposed particle swarm optimization achieves better accuracy of up to 99.5% and faster convergence time when compared with the genetic algorithm that attains 74.6% accuracy. The proposed optimized path planning algorithm is considerably advantageous, especially in realistic applications such as rescue robots and item delivery. Keywords: Genetic algorithm · Particle swarm optimization · Path planning · Robotics · Swarm intelligence

1 Introduction

Similar to any simple system that takes an input, processes it, and produces an output, the input for a robot is programmed data that the robot should translate into actions and perform accurately according to a specific task. Back in 1954, George Devol invented the first programmable and digitally operated robot, called "Unimate". Later, in 1961, this robot was sold to General Motors. The job of the
robot was to grab hot pieces of metal and stack them [15]. That laid the foundations of today's robotic industry, where the main purpose of robots was to provide industrial aid. However, in the mid-1990s robotic applications started to spread to other areas such as health care, deep-water repairs, mine clearing, and heritage preservation [1]. A few years later, they entered personal services such as home cleaning [20]. Today, robots are applied almost everywhere, with the ongoing growth of applications, especially when combined with artificial intelligence (AI) and the Internet of Things (IoT). Hence, one crucial aspect that must be considered is the robot's movement. Various algorithms and techniques are used to achieve path planning for robots; according to [14], path planning techniques are divided mainly into two categories depending on whether the environment is static or dynamic. The detailed classification of existing path planning techniques is shown in Fig. 1. Each approach depends on the environment and the robot's capabilities. The algorithm must converge to find the path to the goal where one exists; otherwise, it should stop and inform the user that no route is available. Furthermore, there are several attributes on which each algorithm depends, such as path length, computation time, rotation, robustness, memory requirements, and simplicity [12].

Fig. 1. Classification of path planning techniques.

One of the biologically inspired methods is swarm intelligence, which is suitable for dynamic environments. In this paper, a robot path planning algorithm is proposed utilizing an improved genetic algorithm (GA) and particle swarm optimization (PSO). Such algorithms guide the robot to acquire the optimal path to the destination along

with obstacle avoidance. The proposed method is robust and beneficial and improves the robotic applications that rely significantly on robot motion. The rest of this paper is structured as follows. In Sect. 2, the problem statement and literature review are discussed. The research methodology and proposed framework are presented in Sect. 3. Section 4 provides the results and discussion. Lastly, Sect. 5 highlights the conclusions and future work.

2 Problem Statement and Related Work Robot motion plays a considerable role in specific applications such as search and rescue robots [17], product delivery [5] as well as assembling items. Therefore, the robot’s path must be carefully planned to guarantee arrival in the shortest possible time. In addition, obstacle avoidance should be considered because any collision might delay or damage the robot. This work focuses on the most up-to-date biologically inspired algorithms. Specifically, the swarm intelligence methods and how they address robot motion planning with obstacle avoidance. Then, a comparison between them and the environment of each experiment is presented. The study in [16] discusses path planning using the GA as well as utilizing Dijkstra’s algorithm for obstacle avoidance. The authors implement three main experiments. In each experiment, the number of obstacles changes starting with five, eight, and 10 obstacles. They compare their results to the traditional GA and claim their modified GA achieved better results in terms of a shorter path and less time consumed in seconds. Their main drawback is using the same size and shape of small square obstacles. Moreover, the robot’s path is made up of straight lines, which is not very effective and smooth. Curves are necessary and helpful to achieve a shorter route. An improved particle swarm optimization combined with grey wolf optimization (IPSO-GWO) was proposed in [6]. To increase the diversity of the particles and help in faster convergence, they initialized the position and speed of particles using chaos. They compared their hybrid model with multiple algorithms such as GA, ACO, and PSO using two grid maps of sizes 20 × 20 and 30 × 30. Moreover, they compared the model convergence with ten benchmark functions. The authors claim that the IPSOGWO performed better with collision avoidance and faster convergence time. However, they did not consider non-edgy obstacles such as circles. Ali et al. [2] have developed robot motion planning based on the combination of the firefly algorithm and cubic polynomial equation. They used A ∗ algorithm to find the shortest path between two selected points. They examine their model using seven different maps, each with different starting and ending points. In addition, the paper compares its results to another published paper. All the maps were of size 20 × 20. The water cycle algorithm (WCA) is utilized in [18]. The authors assert they are the first to employ WCA for robot path planning. They tested the algorithm using three maps with different sizes and obstacle shapes. WCA achieved better results in comparison with another algorithm in the literature. However, the largest map was of size 14 × 14, presented in Fig. 2a, which is considerably small. Moreover, they did not show the convergence speed of WCA.


Guangqiang et al. [10] exploited an improved artificial fish swarm algorithm (IAFSA) to address robot path planning. They have compared their results to the traditional AFSA implementation of their own utilizing two different maps of size 100 × 100. The improved model achieves a shorter path and reduces the convergence time by approximately 52% on both maps. The second simulation is illustrated in Fig. 2b. GA is used in [8] for static environment robot motion planning. They verify the model on two maps, both of size 7 × 7 grids. Every map has different obstacles, but they are not completely closed shapes, only lines. They use a different number of steps in each map, starting with five steps, then six, seven, and eight, respectively. Nonetheless, they did not compare GA performance with other algorithms or work. Tuba et al. [19] utilize brain storm optimization (BSO) technique to solve robot path planning problems. They improve the optimization process by introducing a new local search procedure. They set up five test environments, each map of size 20 × 20, as depicted in Fig. 2c, with different shapes and a number of obstacles. Moreover, they compare their results with three other literature methods: PSO, NLI-PSO, and SA-PSO. Their BSO model achieves better results in terms of finding the shorter path. An improved chicken swarm optimization (ICSO) is proposed in [11]. They enhance the original CSO with a problem falling into a local minimum. They add nonlinear weight reduction ability so that the position update of chickens is better. Compared to the traditional CSO, it is claimed that the ICSO achieved better performance with fewer iterations. Using a 10 × 10 grid map, the authors assert that their method achieves a 15% shorter path, as shown in Fig. 2d. Hongyan [13] used an improved ant colony optimization (IACO) to address robot motion planning. By enhancing the traditional ant colony optimization (ACO), they are able to reduce the number of iterations from 45 to 25. They examine the model on twodimensional as well as three-dimensional space. The map size is 25 × 25, containing only uniform edgy shapes. Table 1 indexes the key parameters of the state-of-the-art investigated in this paper. After presenting the existing work tackling the robot motion planning problem employing different swarm intelligence methods, this study proposes and implements modified GA and PSO algorithms exploiting MATLAB. Nonetheless, to create a better simulation environment, these changes are considered: 1. The movement does not occur per unit square as the previous methods did; instead, the movement occurs per pixel of the image in order to make the simulations more realistic and accurate, 2. Unlike all the previous simulations, the algorithms are tested on maps containing non-edgy obstacles like circles. 3. Two different sources of maps are considered, the first one by reading an image created from another software. The second one is by building the map in the MATLAB environment.


Fig. 2. Representation of multiple simulation environments using various algorithms: (a) WCA, (b) IAFSA, (c) BSO, (d) ICSO.

Table 1. Summary of the state-of-the-art.

Ref.   Year   Algorithm          Map Size                  Obstacle Shapes   No. of Maps
[16]   2022   GA                 -                         1                 3
[2]    2022   Firefly with CPE   20 × 20                   4                 7
[6]    2021   IPSO-GWO           20 × 20, 30 × 30          6                 2
[13]   2021   IACO               25 × 25                   5                 2
[11]   2020   ICSO               10 × 10                   5                 1
[10]   2020   IAFSA              100 × 100                 8                 2
[8]    2019   GA                 7 × 7                     3                 2
[18]   2019   WCA                14 × 14, 6 × 6, 9 × 12    5                 3
[19]   2018   BSO                20 × 20                   8                 5


3 Methodology

In this section, the proposed model design is presented. Moreover, the detailed equations and the implementation of GA and PSO for robot path planning are presented.

3.1 Proposed Model Framework

The main objectives of using optimization for robot motion are summarized in two points:

1. Search for and find the shortest possible path (where available) between the start and end points.
2. Guarantee arrival at the destination without collisions (if possible).

The detailed proposed framework is presented in Fig. 3. Initially, the map is either created inside MATLAB or read from outside after creating it using external drawing software. Then, the start and end points are defined, and the proposed framework proceeds to the next step of initializing the parameters and the number of iterations. This is a crucial step because it determines the performance of the algorithm. The parameters specify the speed at which the algorithm converges, while the number of iterations determines the computational resources required. Choosing the wrong parameters or number of iterations can result in the algorithm not converging or taking an unacceptably long time to find the optimal path. The algorithm works by iteratively searching for the path with the lowest cost from the start point to the end point. At each iteration, the model considers all possible trajectories that could lead to the end point and selects the path with the lowest cost. This process is repeated until the optimal route is found. During execution, the proposed framework checks whether the algorithm has converged by testing whether the change in the cost of the optimal path between two iterations is below a certain threshold or the maximum number of iterations is reached.

3.2 Genetic Algorithm Implementation

GA was introduced in the 1960s and is used to solve nonlinear/non-differentiable optimization problems. Nevertheless, today's concept of GA is somewhat different from the original concept [4]. Each bit is called an "allele", and the bit string is called a "chromosome"; the group of chromosomes forms the population. Each chromosome is replaced depending on its score, which is calculated using an objective function, the so-called "fitness function". The function can be linear or nonlinear and may also include trigonometric functions. Each full loop is called a "generation"; typically, the GA loops from 50 to 500 times [9]. In the beginning, the program reads the image containing the map in bmp format and saves it in a global variable called "img". After that, the main variables are initialized:

1. Start: A global variable to save the starting point.
2. Target: A global variable to save the destination point.
3. Smooth: A global variable of type bool that indicates whether the line is smoothed.
4. SolutionPoints: A local variable that holds the number of points used to form the line connecting the starting and ending points.
5. Generations: A local variable, where each generation corresponds to a single iteration of the algorithm.
6. Population: A local variable that holds the population size.

Fig. 3. Proposed framework design.


After initializing these variables, the program checks, using the function feasible point, whether the starting and ending points are valid, ensuring they are inside the map and do not lie on an obstacle. Then it starts with a generated path consisting of multiple segments, depending on the solution-points variable. The objective function is the length of this path, which is calculated by [3]

Fitness = Σ_{i=1}^{n} L(i)    (1)

The fitness value is computed by adding the lengths of all line segments. However, a penalty weight is added if any part of a segment lies in an obstacle. Accordingly, the cost of each segment alone is calculated using

L(i) = √((n_1 − n_2)²),    (2)

where n_1 is the previous position and n_2 is the new position. However, if this value is not less than 1, the algorithm searches for a better segment. This is done by fixing the previous point n_1 and looking for a new point whose cost value is less than 1, using

P(x,y) = n_1 + r · [sin(d_i), cos(d_i)],    (3)

where r changes each time, starting from one with an increment of 0.5, and the cost between the new point P and the old point n_1 is calculated using (2). This is repeated until a feasible point with a value less than one is found. Here, d_i is the four-quadrant inverse tangent calculated by

d_i = atan2[(n_1 − n_2), (P − n_1)].    (4)
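The following is a minimal Python sketch of the fitness evaluation of Eqs. (1)-(2), including the obstacle penalty described in the text. The paper's implementation is in MATLAB; this transcription is only illustrative, and in_obstacle, penalty, and samples are assumed placeholders not taken from the paper.

import math

def segment_cost(p1, p2):
    """Euclidean length of one segment between two way-points, Eq. (2)."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])

def path_fitness(points, in_obstacle, penalty=100.0, samples=20):
    """Sum of segment lengths plus a penalty for obstacle crossings, Eq. (1).

    points      : list of (x, y) way-points from start to target.
    in_obstacle : assumed helper, returns True if (x, y) lies on an obstacle.
    penalty     : assumed penalty weight (the paper does not give its value).
    """
    total = 0.0
    for p1, p2 in zip(points, points[1:]):
        total += segment_cost(p1, p2)
        # sample intermediate points; add the penalty once if the segment hits an obstacle
        for k in range(samples + 1):
            t = k / samples
            x = p1[0] + t * (p2[0] - p1[0])
            y = p1[1] + t * (p2[1] - p1[1])
            if in_obstacle(x, y):
                total += penalty
                break
    return total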

When a feasible cost value less than one is found, the total cost is updated, and the cost of the next segment is calculated until all segments are finished. This constitutes one generation; the program keeps running until the maximum number of generations is reached, and the best total cost found is the answer.

3.3 Particle Swarm Optimization Implementation

PSO is a heuristic algorithm with applications in multiple fields, including image processing, power systems, networking, and routing. The main principle of PSO was motivated by the social behavior of bird flocking and fish schooling. However, compared with GA, PSO does not have evolutionary operators such as mutation and crossover. In PSO, the particles represent solutions moving in the search space, and each particle combines its own experience with that of its neighbors to find the best solution. Every particle can communicate with its nearest particles, called a "neighborhood" [7]. Unlike the GA implementation, where the map is read from an image file, here the map with the obstacles is created in the program. However, the same map is chosen to make a fair comparison. There are no global variables; all of them are local and are as follows.

1. Map: This variable calls the function Model, where the starting and ending points are initialized and the obstacles are created.
2. NVar: This holds the number of decision variables.
3. VarSize: A matrix of the size of the decision variables.
4. Iterations: This variable holds the number of iterations.
5. Population: This variable holds the swarm size.
6. Gbest: This variable holds the value of the best solution.
7. Empty_Particle: A set of seven fields that are initially empty; each one stores a specific value, such as the velocity and position, to be used in the calculations.

After initializing the map environment with the obstacles and setting these variables, the program starts by initializing the positions of the particles by calling the function RandomSolution. This function generates the initial random positions using the built-in function "unifrnd()". After that, the velocity is initialized to zero, and the function costm is called to calculate the cost of that position and to check whether it violates an obstacle. Then the best position found so far by each particle is updated by

pbest(i, t) = arg min_{k=1,...,t} f(P_i(k)),  i ∈ {1, 2, ..., N_p},    (5)

where N_p is the population size, i is the particle index, and t is the current time step. The value computed by (5) is then compared with gbest and replaces it if it is better:

gbest(t) = arg min_{i=1,...,N_p; k=1,...,t} f(P_i(k)).    (6)

After the global best position is updated, the velocity is calculated using

V_i(t + 1) = w·V_i(t) + c_1 r_1 (pbest(i, t) − P_i(t)) + c_2 r_2 (gbest(t) − P_i(t)),    (7)

where r_1 and r_2 are uniformly distributed random variables, c_1 and c_2 are constant positive parameters, and w is an inertia weight that can be adjusted to control the search behavior in different stages of the iteration. Consequently, the position is updated by

P_i(t + 1) = P_i(t) + V_i(t + 1).    (8)

When the position is updated, it checks if the position is the target or not; if not, it goes back to (5) and starts over.
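As an illustration of Eqs. (5)-(8), here is a minimal Python sketch of one PSO iteration. The paper's implementation is in MATLAB; this is only a sketch, and the cost function f and the particle dictionary layout are assumptions, not the paper's data structures.

import random

def pso_step(particles, f, w=0.7, c1=1.5, c2=1.5):
    """One iteration of PSO over a list of particle dicts.

    Each particle holds 'pos', 'vel', 'pbest_pos', 'pbest_cost' (Eq. (5)).
    Returns the updated global best position and cost (Eq. (6)).
    f : assumed cost function mapping a position (list of floats) to a float.
    """
    gbest = min(particles, key=lambda p: p['pbest_cost'])
    gbest_pos, gbest_cost = gbest['pbest_pos'][:], gbest['pbest_cost']

    for p in particles:
        for d in range(len(p['pos'])):
            r1, r2 = random.random(), random.random()
            # velocity update, Eq. (7)
            p['vel'][d] = (w * p['vel'][d]
                           + c1 * r1 * (p['pbest_pos'][d] - p['pos'][d])
                           + c2 * r2 * (gbest_pos[d] - p['pos'][d]))
            # position update, Eq. (8)
            p['pos'][d] += p['vel'][d]

        cost = f(p['pos'])
        if cost < p['pbest_cost']:               # personal best update, Eq. (5)
            p['pbest_pos'], p['pbest_cost'] = p['pos'][:], cost
            if cost < gbest_cost:                # global best update, Eq. (6)
                gbest_pos, gbest_cost = p['pos'][:], cost

    return gbest_pos, gbest_cost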

4 Results and Discussion Firstly, the GA results are presented, followed by the results obtained by PSO. Finally, a comparison between the presented GA and PSO is conducted. In order to verify the accuracy and feasibility of the utilized algorithms, the simulation experiments were conducted on Intel Core i7-10870H CPU, 2.2 GHz, and 16 GB RAM, using the latest version of MATLAB which is R2022b.


4.1 Outcomes Using Genetic Algorithm

The population size has been fixed at 100, and different executions are conducted to measure the effect of changing the number of generations on the path length and the mean value. The results are summarized in Table 2. The best result is achieved when the number of generations is set to 50. The GA simulation is shown in Fig. 6a. The starting point is (1, 1), and the target point is (500, 500). The convergence in each generation is illustrated in Fig. 6b. The best solution is 954, and the mean is 960.

Table 2. Genetic algorithm simulations part 1.

No   Population   Generations   Best Value   Mean
1    100          10            974          976.35
2    100          20            965          970.12
3    100          30            957          965.26
4    100          40            961          965.97
5    100          50            954          960.13

As the number of generations increases, the path length and the mean value tend to decrease. However, the overall values are still far from the shortest possible distance from point (1, 1) to (500, 500), which is 707. In the next experiment, the number of generations is fixed at 50 and the population size is varied, as indexed in Table 3. The best value was 934, which occurred at a population of 125.

Table 3. Genetic algorithm simulations part 2.

No   Population   Generations   Best Value   Mean
1    50           50            963          966.18
2    75           50            961          965.05
3    100          50            965          966.26
4    125          50            934          945.10
5    150          50            967          970.51


4.2 Outcomes Using Particle Swarm Optimization

The results presented in Table 4 demonstrate the impact of varying the number of iterations on the performance of the algorithm. It is worth noting that the population size is fixed at 100 throughout these experiments. The table shows that the best score is achieved when the algorithm is run for 100 iterations, indicating that the algorithm converges to the optimal solution after 100 iterations. Running the algorithm for a larger number of iterations does not yield significant performance improvements. However, the performance of the algorithm is still relatively good even when the number of iterations is reduced to 50 or 25. This suggests that the algorithm is able to converge to a good solution even with a smaller number of iterations, although the quality of the solution may not be as high as when the algorithm is run for 100 iterations. Overall, these results provide valuable insights into the behavior of the model and can help inform decisions about how to set the number of iterations when using this algorithm in practice.

Table 4. PSO simulations part 1.

No   Population   Iterations   Best Value   Mean
1    100          25           716          730.31
2    100          50           750          810.15
3    100          75           711.9        725.61
4    100          100          711.6        760.72
5    100          125          712.3        720.91

Next, the number of iterations is fixed at 100, and the population size is varied. The results are indexed in Table 5; the best score is achieved with a population of 125.


Table 5. PSO simulations part 2.

No   Population   Iterations   Best Value   Mean
1    25           100          760          811.31
2    50           100          712          760.50
3    75           100          711.8        726.61
4    100          100          712.5        732.15
5    125          100          711.4        722.81

For a fair comparison with GA, the number of iterations is set to 50 and the population size to 100. The simulation results are shown in Fig. 4a, and the PSO convergence is depicted in Fig. 4b.

Fig. 4. Illustration of PSO-related simulations: (a) PSO simulation; (b) PSO convergence.

4.3 Comparison Between GA and PSO Models

Multiple executions are conducted for GA and PSO while fixing the number of iterations to 50 and the population to 100. The results are summarized in Table 6. PSO is dominant overall, with an average accuracy of 99.5%, whereas GA's best accuracy was only approximately 74.6%. Nonetheless, it is important to point out that, in terms of Big-O complexity, GA has the higher complexity of O(n²), while PSO has the lower complexity of O(n log n). On the other hand, unlike PSO, which needs proper initialization, GA is very simple and easier to implement.

Table 6. PSO simulations part 2.

No   Population   Iterations   Best Value   Mean
1    25           100          760          811.31
2    50           100          712          760.50
3    75           100          711.8        726.61
4    100          100          712.5        732.15
5    125          100          711.4        722.81

5 Conclusions and Future Work Humans are getting closer each year to being nearly fully dependent on robotic applications. They are already available in almost every field and increasing rapidly. Most of these robots are not static robots. Thus, their movement must be controlled and optimized using an algorithm. Such an algorithm for path planning could significantly increase their work efficiency and productivity. A detailed model framework has been proposed, which includes implementing GA and PSO optimization methods. PSO showed more accurate results with an average accuracy of up to 99.5% and faster convergence time. It was dominant over the GA. Future work could involve moving obstacles, making it more challenging to calculate the shortest path because each obstacle has a different velocity and direction. This is beneficial in real-case scenarios where multiple robots operate on the same map. In addition, the ability to connect all the robots in the same environment using IoT could be considered allowing them to communicate together and coordinate their movement.

References 1. Acemoglu, D., Restrepo, P.: Robots and jobs: evidence from us labor markets. J. Polit. Econ. 128(6), 2188–2244 (2020) 2. Ali, S.M., Yonan, J.F., Alniemi, O., Ahmed, A.A.: Mobile robot path planning optimization based on integration of firefly algorithm and cubic polynomial equation. J. ICT Res. Appl. 16(1), 1–22 (2022). https://doi.org/10.5614/itbj.ict.res.appl.2022.16.1.1 3. Ansari, S., Alnajjar, K.A.: Multi-hop genetic-algorithm-optimized routing technique in diffusion-based molecular communication. IEEE Access 11, 22689–22704 (2023). https:// doi.org/10.1109/ACCESS.2023.3244556 4. Ansari, S., Alnajjar, K.A., Saad, M., Abdallah, S., El-Moursy, A.A.: Automatic digital modulation recognition based on genetic-algorithm optimized machine learning models. IEEE Access 10, 50265–50277 (2022). https://doi.org/10.1109/ACCESS.2022.3171909 5. Brooks, A., et al.: Path planning for robotic delivery systems. In: SoutheastCon 2022, pp. 421– 426. IEEE (2022) 6. Cheng, X., Li, J., Zheng, C., Zhang, J., Zhao, M.: An improved PSO-GWO algorithm with chaos and adaptive inertial weight for robot path planning. Front. Neurorobot. 15, 770361 (2021)

7. Chopard, B., Tomassini, M.: Particle swarm optimization. In: Chopard, B., Tomassini, M. (eds.) An Introduction to Metaheuristics for Optimization, pp. 97–102. Springer International Publishing, Cham (2018). https://doi.org/10.1007/978-3-319-93073-2_6 8. Choueiry, S., Owayjan, M., Diab, H., Achkar, R.: Mobile robot path planning using genetic algorithm in a static environment. In: 2019 Fourth International Conference on Advances in Computational Tools for Engineering Applications (ACTEA), pp. 1–6. IEEE (2019) 9. Gupta, S.K.: An overview of genetic algorithms: a structural analysis. Int. J. Innov. Sci. Res. Tech. 15, 58 (2021) 10. Li, G., et al.: Improved artificial fish swarm algorithm approach to robot path planning problems. In: 2020 5th International Conference on Automation, Control and Robotics Engineering (CACRE), pp. 71–75. IEEE (2020) 11. Liang, X., Kou, D., Wen, L.: An improved chicken swarm optimization algorithm and its application in robot path planning. IEEE Access 8, 49543–49550 (2020) 12. Lu, C.Y., Kao, C.C., Lu, Y.H., Juang, J.G.: Application of path planning and image processing for rescue robots. Sensors and Materials 34(1), 65–80 (2022) 13. Ma, H.: Application of an improved ant colony algorithm in robot path planning and mechanical arm management. Int. J. Mechatron. Appl. Mech. 10, 196–203 (2021) 14. Mahajan, B.D., Marbate, P.: Literature review on path planning in dynamic environment (2013) 15. Pagallo, U.: Vital, Sophia, and co.—the quest for the legal personhood of robots. Information 9(9), 230 (2018) 16. Rahmaniar, W., Rakhmania, A.E.: Mobile robot path planning in a trajectory with multiple obstacles using genetic algorithms. J. Rob. Control (JRC) 3(1), 1–7 (2022) 17. Shao, X., Wang, G., Zheng, R., Wang, B., Yang, T., Liu, S.: Path planning for mine rescue robots based on improved ant colony algorithm. In: 2022 8th International Conference on Control, Automation and Robotics (ICCAR), pp. 161–166. IEEE (2022) 18. Tuba, E., Dolicanin, E., Tuba, M.: Water cycle algorithm for robot path planning. In: 2018 10th International Conference on Electronics, Computers and Artificial Intelligence (ECAI), pp. 1–6. IEEE (2018) 19. Tuba, E., Strumberger, I., Zivkovic, D., Bacanin, N., Tuba, M.: Mobile robot path planning by improved brain storm optimization algorithm. In: 2018 IEEE Congress on Evolutionary Computation (CEC), pp. 1–8. IEEE (2018) 20. Wang, Z., Wu, C., Xu, J., Ling, H.: Research on path planning of cleaning robot based on an improved ant colony algorithm. In: MATEC Web of Conferences, vol. 336, p. 07005. EDP Sciences (2021)

Deep Reinforcement Learning for Solving Multi-objective Vehicle Routing Problem Jian Zhang, Rong Hu(B) , Yi-Jun Wang, Yuan-Yuan Yang, and Bin Qian School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China [email protected]

Abstract. This paper presents a novel approach for solving the multi-objective vehicle routing problem (MOVRP) using deep reinforcement learning. The MOVRP considered in this study involves two objectives: travel distance and altitude difference. To address this problem, we employ a decomposition strategy based on weights to decompose the multi-objective problem into multiple scalar subproblems. We then use a pointer network to solve each subproblem and train the policy network's parameters using the policy gradient algorithm of reinforcement learning to obtain the Pareto front solutions of the entire problem. The proposed method provides an effective solution to the MOVRP, and experimental results demonstrate its superiority over traditional optimization methods in terms of solution quality and computational efficiency. Furthermore, the method exhibits strong generalization and adaptability, enabling it to handle multi-objective vehicle routing problems of varying sizes and features with significant flexibility and practicality. The proposed method's distinct advantages make it a promising solution to the MOVRP. Keywords: Multi-objective vehicle routing problem · Deep reinforcement learning · Decomposition strategy

1 Introduction

Multi-objective optimization is crucial for solving practical problems in a variety of domains, such as logistics networks. The Multi-Objective Vehicle Routing Problem (MOVRP) is a variant of the well-known Vehicle Routing Problem (VRP) that involves balancing multiple objectives while taking mutual constraints into account [1]. Altitude is a significant factor in the VRP that affects comfort, environmental protection, driving safety, and the ability to meet specific criteria. In a logistics scenario, a vehicle might be needed to move a load from a low-altitude area to a high-altitude area, which could involve traveling across multiple altitude zones. A smaller altitude difference indicates a gentler change in altitude, which improves the vehicle's ability to adjust to altitude changes. This helps improve driving stability and safety, while also reducing fuel consumption and vehicle wear and tear. Driving distance is a crucial factor in addition to altitude. Vehicles may need to detour if the route is not adequately chosen, which would considerably

increase the driving distance and time, raise logistical expenses, and lengthen delivery timeframes. Therefore, considering both altitude and driving distance in vehicle routing planning optimizes logistics operations, reduces costs, and achieves a more balanced and practical solution. This approach enables logistics companies to better utilize resources, complete delivery tasks faster, and minimize operational expenses. Multi-objective evolutionary algorithms (MOEAs) are popular for solving multiobjective problems due to their population-based approach that can explore the solution space and obtain a set of solutions in a single run. NSGA-II [2] and MOGLS [3] are widely studied and used algorithms for this purpose, particularly for solving the multiobjective traveling salesman problem [4–6]. However, both evolutionary algorithms and manual heuristics have limitations that make them unsuitable for large-scale problems, and they may need to be re-executed if the problem changes slightly [7]. Reinforcement learning, a branch of machine learning, has shown great advantages in solving complex problems [8]. Compared with hand-designed evolutionary strategies and heuristics, reinforcement learning can autonomously learn decision strategies by interacting with the environment [9]. In particular, in the area of combinatorial optimization problems, Bello et al. [10] successfully applied deep reinforcement learning to the Traveling Salesman Problem (TSP), where they modeled the optimization problem using a pointer network model with negative itinerant path lengths as reward signals. By updating the network parameters using a policy gradient approach, deep reinforcement learning (DRL) has been successfully applied to different combinatorial optimization problems. Li [11] proposed a novel DRL model that used a decomposition approach for solving the multi-objective TSP problem, achieving remarkable results. These results show that reinforcement learning can play an important role in solving complex decision-making problems [12, 13]. This paper proposes a DRL-based framework for solving the MOVRP (DRLMOVRP), which offers multiple advantages over traditional methods. Firstly, DRL has powerful generalization capability and fast solution speed. Secondly, unlike conventional methods that need to be resolved when the problem size changes, our approach is more robust and can handle MOVRP instances of different sizes once the model is trained. Finally, our proposed method can obtain Pareto optimal solutions directly through simple forward propagation, which reduces running time compared to traditional methods that require complicated processes such as population updates or iterative search. Overall, our DRL-MOVRP framework offers an efficient and effective solution to the MOVRP. The rest of this paper is organized in the following manner: Sect. 2 introduces the problem to be solved, and Sect. 3 describes the general framework of the Deep Reinforcement Learning-based MOVRP Optimization Algorithm (DRL-MOVRP), which illustrates the idea of using deep reinforcement learning to solve MOVRP. Finally, experiments in Sects. 3 and 4 respectively confirm the method’s efficacy.

2 Problem Description The multi-objective vehicle routing problem studied in this paper can be described as follows: Multiple vehicles are distributed from a distribution center to multiple customers; the location and demand of each customer are determined, and the load capacity

of each vehicle is known. It is required to rationally arrange the vehicle routes to minimize the distance traveled and the altitude difference, while meeting the following conditions (Fig. 1):

• The sum of the demands of the customers on each delivery route does not exceed the capacity of the vehicle;
• Each customer must receive a delivery and can only be served once.

Fig. 1. Schematic diagram.

Let L be the number of customers that the distribution center needs to deliver to, K be the number of vehicles, d_i (i = 1, ..., L) be the demand of each customer, and q_k (k = 1, ..., K) be the weight capacity of each vehicle. Let c_ij denote the transportation cost from point i to point j, which includes the driving distance and the altitude difference. The distribution center is denoted by 0, and the customers are denoted by 1, 2, ..., L. The decision variables are defined as:

x_ijk = 1 if vehicle k travels from customer i to customer j, and 0 otherwise;
y_ik = 1 if vehicle k completes the task for customer i, and 0 otherwise.

The mathematical model is expressed as follows:

min Z = Σ_{i=0}^{L} Σ_{j=0}^{L} Σ_{k=1}^{K} c_ij · x_ijk    (1)

Σ_{i=1}^{L} d_i · y_ik ≤ q_k,   k = 1, 2, ..., K    (2)

Σ_{k=1}^{K} y_ik = 1 for i = 1, 2, ..., L, and Σ_{k=1}^{K} y_0k = K    (3)

Σ_{i=0}^{L} x_ijk = y_jk,   j = 1, 2, ..., L; k = 1, 2, ..., K    (4)

Σ_{j=0}^{L} x_ijk = y_ik,   i = 1, 2, ..., L; k = 1, 2, ..., K    (5)

Equation (1) above ensures that the total cost Z is minimized. Equation (2) is the load capacity constraint of the vehicles. Equation (3) ensures that the transportation tasks at each customer point are performed by only one vehicle, while all transportation tasks are performed by K vehicles together. Equations (4) and (5) ensure that each customer can and will be served by one vehicle only once.
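The following is a minimal Python sketch, under stated assumptions, of how a candidate set of routes can be checked against constraints (2)-(5) and scored with the objective (1). The route representation, cost matrix, demands, and capacities are hypothetical inputs; the paper does not provide such an implementation.

def total_cost(routes, cost):
    """Objective (1): summed transportation cost of all routes.

    routes : list of routes, one per vehicle, e.g. [0, 3, 1, 0] (0 is the depot).
    cost   : cost[i][j] combining driving distance and altitude difference.
    """
    return sum(cost[a][b] for r in routes for a, b in zip(r, r[1:]))

def is_feasible(routes, demand, capacity, n_customers):
    """Constraints (2)-(5): capacity respected and every customer served exactly once."""
    served = []
    for k, r in enumerate(routes):
        if r[0] != 0 or r[-1] != 0:          # each route starts and ends at the depot
            return False
        load = sum(demand[i] for i in r if i != 0)
        if load > capacity[k]:               # constraint (2)
            return False
        served.extend(i for i in r if i != 0)
    # constraints (3)-(5): each customer is visited by exactly one vehicle, once
    return sorted(served) == list(range(1, n_customers + 1))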

3 Algorithm Framework

In this section, the framework established for solving the MOVRP through DRL is presented, as shown in Fig. 2.

Fig. 2. Algorithm flowchart: the MOVRP is decomposed into subproblems by the decomposition strategy, a pointer network builds the policy network, RL optimizes the policy network parameters (offline training), and the trained policy produces the solution.

First, the MOVRP is decomposed into multiple scalar subproblems by the decomposition strategy. A pointer network is used to model the subproblems and construct the policy network. The policy developed by the network is then assessed, and RL is used to optimize the policy network parameters based on the policy gradient. The training process is offline. To solve a specific MOVRP instance, the original problem is decomposed into multiple subproblems, and the already trained policy network is used to generate the final solution.

3.1 Decomposition Strategy

The decomposition strategy in multi-objective optimization algorithms is crucial because it simplifies the problem and improves the efficiency and quality of the algorithm's solutions by

breaking down the original problem into multiple scalar subproblems, where each subproblem's multiple objectives are weighted into a single objective. Additionally, the problem's size and complexity are reduced. The DRL-MOVRP proposed in this paper uses a weight-based decomposition strategy [14]; specifically, a set of uniformly distributed weight vectors λ^1, ..., λ^n is given, where λ^j = (λ^j_1, λ^j_2)^T; in this paper the weights (1, 0), (0.9, 0.1), ..., (0, 1) are used. Thus, the MOVRP is decomposed into subproblems assigned different weights. Then a Pareto optimal solution is found for each subproblem, and finally the Pareto front (PF) is formed through a weighted combination of the Pareto optimal solutions from each subproblem.

3.2 Sub-problem Modeling

Recall from the definition of the MOVRP that the travel distance and the altitude difference need to be minimized. To address this, the problem is decomposed into multiple scalar subproblems, with each subproblem's objectives weighted into a single objective. These subproblems are then modeled using a pointer network to construct the policy network [9]. The pointer network can be described as a neural network model that maps sequences to sequences. Its core idea is to encode the input sequence of the VRP using an encoder, which generates feature vectors. Then, by utilizing the attention mechanism in conjunction with a decoder, the solution is constructed in an autoregressive manner: a node is selected at each step, and the next node is selected based on the previously selected nodes until a complete solution is obtained. Overall, this approach improves the efficiency and quality of the algorithm by reducing the size and complexity of the problem and providing simpler solutions to each subproblem.
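Referring back to the weight-based decomposition of Sect. 3.1, the following is a minimal Python sketch of the weight vectors and the scalarization of the two objectives. It assumes a simple weighted-sum aggregation; the paper only states that the objectives are weighted into a single objective.

def weight_vectors(n=11):
    """Uniformly distributed 2-D weight vectors (1,0), (0.9,0.1), ..., (0,1)."""
    return [(round(1 - i / (n - 1), 2), round(i / (n - 1), 2)) for i in range(n)]

def scalarize(distance, altitude_diff, weights):
    """Weighted-sum aggregation of the two objectives for one subproblem (assumption)."""
    w1, w2 = weights
    return w1 * distance + w2 * altitude_diff

# Example: scalar objective of a candidate route under the 4th weight vector.
ws = weight_vectors()
value = scalarize(distance=123.4, altitude_diff=7.8, weights=ws[3])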

Fig. 3. Pointer network (customer inputs s_i and d_t^i, embedding layer, attention layer, and decoder state c_t).

As illustrated in Fig. 3, our model is composed of two main components. The first is a set of embeddings that maps the inputs into a D-dimensional vector space. We might have multiple embeddings corresponding to different elements of the input, but they are shared among the inputs. The second component of our model is a decoder that points to an input at every decoding step. The network model takes an input set X = {si , dti , i = 1, · · · , L}, the number of customers is represented by L, and the customer locations are denoted by the tuple {s1i , s2i }, where s1i = (xi , yi , zi ) represents

the x, y, and z coordinates of the ith customer; these coordinates are used to calculate the distance between two customers. s2_i = z_i denotes the z coordinate of the ith customer, which is used to calculate the altitude difference between two customers; both cost functions are defined by Euclidean distances. Because the customer demand changes with the time at which the vehicle visits the customer node, d^i_t represents the dynamic element of the input, i.e., the demand of customer i at time t. The input structure is 3-dimensional. A pointer Y_0 is used to refer to any input in X_0, and at each decoding time t (1, 2, ...), Y_{t+1} points to one of the available inputs X_t, which determines the input for the next decoder step. This process continues until a termination condition is satisfied, the terminating condition being that there is no more demand to satisfy. The model's output is a customer sequence Y = {ρ_t, t = 0, ..., T} of length T. As the vehicle is in the process of delivery, it also needs to return to the warehouse for refilling operations; therefore, the output sequence length T differs from the input sequence length L. Let Y_t = {ρ_0, ..., ρ_t} denote the sequence decoded up to time t. We use the probability chain rule to decompose the probability of generating the sequence Y, i.e., P(Y | X_0), where X_t denotes the set of input states at time t, as follows:

P(Y | X_0) = Π_{t=0}^{T} P(ρ_{t+1} | Y_t, X_t)    (6)

X_{t+1} = f(ρ_{t+1}, X_t)    (7)

The state transition function f represents the recursive update in the above equation. Each factor on the right-hand side of Eq. (6) is computed by the attention mechanism [15]. Intuitively, the attention mechanism calculates how relevant every input is at the next decoding step t; the most relevant one receives more attention and can be selected as the next city to visit. The calculation is as follows:

P(ρ_{t+1} | ρ_1, ..., ρ_t, X_t) = softmax(ũ^i_t),   ũ^i_t = v_c^T tanh(W_c [x^i_t ; c_t])    (8)

Here, ";" means that two vectors are concatenated, v_c and W_c are learnable parameters, and c_t is the state of the RNN decoder, which summarizes the information of the previously decoded steps ρ_0, ..., ρ_t. Let x^i_t = (s_i, d^i_t) denote the ith embedded input. A greedy decoder can be used to select the next city; however, instead of greedily selecting the city with the largest probability, during training the model selects the next city by sampling from the probability distribution.

3.3 Training

Vinyals et al. [16] employed supervised learning to train neural networks for the traveling salesman problem, which is challenging to apply in combinatorial optimization due to the need for labeled datasets. In contrast, our subproblem model uses the actor-critic method [9, 11] for unsupervised training, requiring only the reward signal, i.e., the negative of the total travel path length, to train the network. The training process involves two networks: the actor network, which is the pointer network providing a probability

distribution for selecting the next client node, and the critic network, which evaluates the expected reward based on the problem state. The critic network shares the same architecture as the pointer network encoder, mapping the encoder’s hidden states to the critic’s output.

Algorithm 1. REINFORCE Algorithm

1:  initialize the actor network with random weights θ and the critic network with random weights φ
2:  for iteration = 1, 2, ... do
3:      reset gradients: dθ ← 0, dφ ← 0
4:      sample N instances according to Φ_M
5:      for n = 1, ..., N do
6:          initialize step counter t ← 0
7:          repeat
8:              choose y^n_{t+1} according to the distribution P(y^n_{t+1} | Y^n_t, X^n_t)
9:              observe new state X^n_{t+1}
10:             t ← t + 1
11:         until termination condition is satisfied
12:         compute reward R^n = R(Y^n, X^n_0)
13:     end for
14:     dθ ← (1/N) Σ_{n=1}^{N} (R^n − V(X^n_0; φ)) ∇_θ log P(Y^n | X^n_0)
15:     dφ ← (1/N) Σ_{n=1}^{N} ∇_φ (R^n − V(X^n_0; φ))²
16:     update θ using dθ and φ using dφ
17: end for

As shown in Algorithm 1, the method contains two neural networks: the actor network, represented by the weight vector θ, and the critic network, represented by the weight vector φ. In this algorithm, we take N sample problems from the problem set M and use Monte Carlo simulation to generate feasible sequences for these problems in order to evaluate the current policy π_θ. We use the superscript n to denote the variables of the nth sample problem. After the decoding of the N problems is completed, we calculate the corresponding rewards and update the actor network using the policy gradient at step 14, where V(X^n_0; φ) is the approximation of the reward estimated by the critic network for the nth problem. We also update the critic network in step 15 to reduce the difference between the expected and observed rewards during the Monte Carlo rollout.
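A minimal Python sketch of the updates in steps 14-16 is given below, using PyTorch autograd. The actor and critic interfaces are hypothetical stand-ins (the paper does not provide code): the actor is assumed to roll out a tour and return its reward and summed log-probability, and the critic to return the baseline V(X_0; φ).

import torch

def reinforce_update(actor, critic, actor_opt, critic_opt, instances):
    """Steps 14-16 of Algorithm 1 for one batch of N sampled instances.

    Assumed interfaces (not from the paper):
      actor(inst)  -> (reward, log_prob): rolls out a tour and returns its reward
                      and the summed log-probability (a tensor).
      critic(inst) -> baseline estimate V(X_0; phi) as a scalar tensor.
    """
    rewards, log_probs, baselines = [], [], []
    for inst in instances:                        # Monte Carlo rollouts (steps 5-13)
        reward, log_prob = actor(inst)
        rewards.append(torch.as_tensor(reward, dtype=torch.float32))
        log_probs.append(log_prob)
        baselines.append(critic(inst))

    rewards = torch.stack(rewards)
    log_probs = torch.stack(log_probs)
    baselines = torch.stack(baselines)

    advantage = rewards - baselines.detach()
    actor_loss = -(advantage * log_probs).mean()        # policy gradient, step 14
    critic_loss = (rewards - baselines).pow(2).mean()   # baseline regression, step 15

    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()     # step 16
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()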


4 Computational Results

4.1 Settings

The DRL model proposed in this paper is trained on an RTX 3060 using Python, and the solution process and the comparison algorithms are run on an Intel Core i5-12400F CPU with 32 GB of RAM. The customer coordinates of the test cases are drawn from [0, 1] × [0, 1] × [0, 1], and the demands are drawn from [0, 9], except for the warehouse point. To test the effectiveness and generalization ability of the algorithm, cases of different sizes were generated, with 40, 70, 100, 150, and 200 customers and vehicle capacities of 30, 50, 80, 100, and 120, respectively. The proposed network model is trained on 40-customer MOVRP instances, using 500,000 generated examples over 9 h. Each algorithm is run 20 times on each case; NSGA-II and MOGLS are set to a maximum of 500-4000 iterations with a population of 100, the Decomposition Strategy with Genetic Algorithm (DS-GA) is set to a maximum of 100 iterations with a population of 100, and DRL-MOVRP is set to 100 subproblems, with only non-dominated solutions saved in the final PF.

4.2 Result

In this paper, we propose a novel algorithmic framework, DRL-MOVRP, for solving the MOVRP and validate its performance through an ablation experiment. Specifically, we compare DRL-MOVRP against DS-GA by gradually removing components to compare their performance. As shown in Table 1, the results demonstrate that DRL-MOVRP outperforms DS-GA, providing evidence of the effectiveness of our proposed approach. This ablation experiment provides valuable guidance for further refinement and improvement of our algorithm. All of the algorithms in this paper, including the proposed one, perform well on small-scale problems, as shown in Table 1 and Fig. 4. DS-GA, NSGA-II, and MOGLS improve their convergence ability with increasing iterations, but at the cost of longer computation times. In contrast, DRL-MOVRP exhibits better convergence ability and a shorter running time. On large-scale problems with 200 customer points, DS-GA, NSGA-II, and MOGLS perform worse than the proposed algorithm in terms of diversity and convergence, while DRL-MOVRP has the best performance and a significantly shorter running time. The experimental results demonstrate its effectiveness in solving large-scale multi-objective VRPs, and its performance does not degrade as the number of customers increases. In contrast, DS-GA, NSGA-II, and MOGLS fail to converge in reasonable computation time on large-scale MOVRP instances. Additionally, the PF obtained by DRL-MOVRP has a wider distribution range than those obtained by DS-GA, NSGA-II, and MOGLS. The hypervolume (HV) is the volume of the objective-space region enclosed by the algorithm's set of non-dominated solutions and a reference point; the larger the HV value, the better the overall performance of the algorithm.
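To make the metric concrete, here is a minimal sketch of how the hypervolume of a two-objective (minimization) non-dominated set can be computed against a reference point. This is a generic illustration with an assumed example front, not the exact HV routine or data used in the paper.

def hypervolume_2d(points, ref):
    """Hypervolume of a non-dominated set for two minimized objectives.

    points : list of (f1, f2) non-dominated objective vectors.
    ref    : reference point (r1, r2), worse than every point in both objectives.
    """
    # sort by the first objective; the second objective then decreases along the front
    pts = sorted(points)
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        hv += (ref[0] - f1) * (prev_f2 - f2)   # rectangle contributed by this point
        prev_f2 = f2
    return hv

# Example with an assumed front and reference point
front = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
print(hypervolume_2d(front, ref=(6.0, 6.0)))   # 17.0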


Fig. 4. The left graph shows the experimental results of 40 customer points, and the right graph shows the experimental results of 200 customer points.

Table 1. This table presents HV values and running times (in seconds) obtained by different algorithms for multi-objective VRP instances with varying customer numbers. Gray background indicates the best HV value, while bold text denotes the longest running time.

                40-customer      70-customer      100-customer      150-customer      200-customer
                HV      Time     HV      Time     HV       Time     HV       Time     HV       Time
NSGA-II-500     1376    7.5      4919    10.9     17616    13.9     38513    19.2     61884    23.8
NSGA-II-1000    1683    14.3     5613    21.6     21932    24.8     39987    34.2     64514    42.4
NSGA-II-2000    2230    29.1     5879    49.6     26236    49.1     40772    67.7     68304    83.9
NSGA-II-4000    2414    57.5     6365    98.3     27953    104.1    43711    136.7    70982    169.4
MOGLS-500       1109    14.1     4304    19.1     13680    28       34201    36.4     60430    44.1
MOGLS-1000      1630    26.3     5558    37.7     20684    51.3     38782    69.6     62531    85.4
MOGLS-2000      1930    32       5632    45.5     25977    61.1     40122    82.1     65056    99.3
MOGLS-4000      2396    109      6012    163      26628    214.1    41415    289.9    70633    357.5
DS-GA           2510    255.5    7025    412.3    29060    650.3    45907    985.4    76896    1394
DRL-MOVRP       2403    6.4      8225    10.6     33289    14.1     57499    21.7     91796    28.7

5 Conclusion

In this paper, we address the multi-objective vehicle routing problem (MOVRP) involving vehicle travel distance and height. We propose a novel algorithmic framework that decomposes the original problem into scalar subproblems and merges each subproblem into a single-objective subproblem based on the decomposition weights. We model and solve each subproblem with a pointer network and adjust the network parameters through deep reinforcement learning. Finally, we obtain the Pareto front of the original problem by weighting the solutions of all subproblems. Experimental results demonstrate that our DRL-MOVRP approach is effective and efficient and has promising characteristics such as generalization capability, fast solution speed, and the potential for high-quality solutions. However, due to the nature of pointer networks, they can only use distance as the objective function, which limits their ability to solve multi-objective problems. Therefore, designing the objective function of the network is essential and a


key area for future research. Our study offers a novel solution to assist with real-world logistics distribution problems. Acknowledgements. This research was supported by the National Natural Science Foundation of China (61963022 and 62173169) and the Basic Research Key Project of Yunnan Province (202201AS070030).


Probability Learning Based Multi-objective Evolutionary Algorithm for Distributed No-Wait Flow-Shop and Vehicle Transportation Integrated Optimization Problem

Ziqi Ding1, Zuocheng Li1(B), Bin Qian1,2, Rong Hu1,2, and Changsheng Zhang1

1 School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
[email protected]
2 Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China

Abstract. This paper proposes a probability learning based multi-objective evolutionary algorithm (PLMOEA) to solve the distributed no-wait flow-shop and vehicle transportation integrated optimization problem (DNFVTIOP). The objectives to be minimized are the total completion time for each factory (TT) and the total transportation cost (TC). Firstly, we propose, for the first time, a mathematical description of DNFVTIOP that is quite different from previous works. Secondly, a probability learning approach based on two probability matrices is introduced to improve the global search ability of PLMOEA. Specifically, a positioning probability matrix (PM) is used to accumulate the probability of each job occurring at each position in high-quality solutions, and a conjoint structure probability matrix (CM) is used to learn the dependency of two conjoint jobs in high-quality solutions. Next, the new population is generated through a dedicated cooperation strategy based on the nondominated sorting method. Finally, computational results show that PLMOEA is an effective solution method for DNFVTIOP.

Keywords: Probability learning · multi-objective evolutionary algorithm · distributed no-wait flow-shop · vehicle transportation · integrated optimization

1 Introduction

In the field of industrial production, the flow-shop scheduling problem (FSSP) is a generalized version of conventional scheduling problems and is one of the most widely researched topics in production scheduling [1]. Compared with FSSP, the no-wait flow-shop scheduling problem requires that jobs are not interrupted during the production process; this setting arises in many real-life industries, including the chemical, metallurgy, pharmaceutical, food manufacturing, and iron and steel industries [2]. With the intensification of market competition and the trend of economic globalization, manufacturing processes considering multiple production centers are becoming a common


production pattern. Manufacturing activities are changing from centralized to distributed. In addition, jobs processed by distributed factories often need to be transported by vehicle to a designated aggregation point for distribution. In this sense, the distributed no-wait flow-shop and vehicle transportation integrated optimization problem (DNFVTIOP) is becoming increasingly important in real-life applications. The conventional distributed flow-shop scheduling problem (DFSP) has been proved to be NP-hard [3]. Since DNFVTIOP is an extension of DFSP, DNFVTIOP is also NP-hard. Therefore, the study of DNFVTIOP has important theoretical and practical significance.

For the distributed no-wait flow-shop scheduling problem (DNFSP), Zhu et al. [4] proposed a knowledge-guided learning fruit fly optimization algorithm. Zhao et al. [5] presented an optimal block knowledge driven backtracking search algorithm for DNFSP. Engin et al. [6] introduced an effective hybrid ant colony algorithm for the no-wait flow-shop scheduling problem (NFSP). Shao et al. [7] proposed an iterated greedy algorithm for DNFSP. Based on the above analysis, there are a number of solution approaches for DNFSP in existing works, but there are relatively few studies on DNFSP with transportation. Therefore, this paper aims to propose an effective solution method for the considered DNFVTIOP.

In reality, decision makers need to consider multiple objectives for a scheduling problem, and there will be some conflict between the objectives, so multi-objective problems are of particular significance for researchers. Pan et al. [8] proposed a discrete differential evolution algorithm for the NFSP with manufacturing time and maximum delay criteria. Tavakkoli et al. [9] presented a new hybrid multi-objective algorithm to solve the NFSP. Shao et al. [10] introduced a Pareto-based estimation of distribution algorithm for the multi-objective DNFSP. In this paper, we target the total completion time for each factory (TT) and the total transportation cost (TC), taking into account both efficiency and economic factors; these objectives are closely related to reality. As a result, we need a multi-objective evolutionary algorithm (MOEA) to solve DNFVTIOP.

The elitist nondominated sorting genetic algorithm (NSGA-II) [11] is a MOEA based on the Pareto optimal solution. Wang et al. [12] proposed an NSGA-II based memetic algorithm to solve FSSP. Ishibuchi et al. [13] used a biased neighborhood structure and local search to improve the performance of a hybrid algorithm based on NSGA-II. From the above literature, the framework of NSGA-II is effective for solving multi-objective problems, but it is known that NSGA-II suffers from defects such as slow convergence and low computational efficiency.

Motivated by the above studies, we propose a probability learning based MOEA (PLMOEA) to solve DNFVTIOP. In PLMOEA, a probability learning method based on two probability matrices is introduced to learn the location and conjoint-structure information of jobs in high-quality solutions, with the aim of improving the global search ability of the algorithm. Afterwards, the probability learning approach and the nondominated sorting method work together to discover promising search regions.


2 DNFVTIOP

2.1 Notation Definition

The notation used is presented below.

MI: The set of jobs, MI = {MI1, MI2, ..., MIn}; n is the number of jobs
MJ: The set of machines, MJ = {MJ1, MJ2, ..., MJm}; m is the number of machines in each factory
P_{MIi,MJj}: The processing time of job MIi on machine MJj
W_{MIi}: The weight of job MIi
MK: The set of factories and the aggregation point, MK = {MK0, MK1, ..., MKk_num}, where MK0 is the aggregation point and k_num is the number of factories
π_k^{JF}: The job processing sequence in factory k, k ∈ MK \ {MK0}
E_{MIi,MI(i+1)}: The difference of the start times of jobs MIi and MI(i+1) on the first machine
C_{MIi,MJj}: The completion time of job MIi on machine MJj
MH: The set of vehicles, MH = {MH1, MH2, ..., MHh_num}; h_num is the number of vehicles, which is calculated through the proposed decoding method
Q: The maximum capacity of the vehicles
D_{MKk,MKk'}: The distance from factory (or aggregation point) MKk to MKk'
V: The velocity of the vehicles
DV_{h,k}: The departure time of vehicle h ∈ MH from factory or aggregation point k ∈ MK
AV_{h,k}: The arrival time of vehicle h ∈ MH at factory or aggregation point k ∈ MK
π_h^{SV}: The sequence of jobs in vehicle h ∈ MH
π_h^{SR}: The travel route sequence of vehicle h ∈ MH
C^{ST1}_{MIi}: The completion time of job MIi in the production stage
C^{ST2}_{MIi}: The time at which job MIi is transported to the aggregation point
α: The usage cost per vehicle
β: The cost of a vehicle travelling a unit distance
TT: The total processing time for each factory
TD: The total transportation distance
TC: The total transportation cost

2.2 Problem Description

The DNFVTIOP can be described as follows. The jobs must be processed in identical factories, each of which is a no-wait permutation flow-shop. After processing, the jobs are transported to the aggregation point by vehicles. The whole process is divided into two stages, a production stage and a transportation stage; the diagram is shown in Fig. 1. The constraints of the production stage are as follows: a machine can only process one job at a time, and jobs are processed from beginning to completion without any waiting time. In the transportation stage, the vehicles travel at a constant speed. Vehicles with a load constraint start from the aggregation point, and each vehicle carries completed jobs from different factories back to the aggregation point according to the plan (Fig. 2). Vehicles can visit multiple factories without violating


load constraints. In this paper, the objective is to minimize TT and TC, which are calculated as follows:

E_{π^{JF}_{k,l}, π^{JF}_{k,(l+1)}} = max{ Σ_{j=1}^{m} P_{π^{JF}_{k,l},MJ_j} − Σ_{j=1}^{m−1} P_{π^{JF}_{k,(l+1)},MJ_j},  P_{π^{JF}_{k,l},MJ_1} },  k ∈ MK \ {MK_0},  l ∈ {1, 2, ..., |π^{JF}_k| − 1}.   (1)

C^{ST1}_{π^{JF}_{k,l}} = { Σ_{j=1}^{m} P_{π^{JF}_{k,l},MJ_j},  l = 1;   Σ_{L=1}^{l−1} E_{π^{JF}_{k,L}, π^{JF}_{k,(L+1)}} + Σ_{j=1}^{m} P_{π^{JF}_{k,l},MJ_j},  l ∈ {2, 3, ..., |π^{JF}_k|} }.   (2)

TT = Σ_{k ∈ MK \ {MK_0}} C^{ST1}_{π^{JF}_{k,|π^{JF}_k|}}.   (3)

Σ_{d ∈ {1, ..., |π^{SV}_h|}} W_{π^{SV}_{h,d}} ≤ Q,  ∀h ∈ MH.   (4)

DV_{h,MK_0} = max{ 0,  max_{L ∈ {1,...,|π^{JF}|} ∩ {1,...,|π^{SV}|}} C^{ST1}_{π^{JF}_{π^{SR}_{h,2},L}} − D_{MK_0, π^{SR}_{h,2}} / V },  ∀h ∈ MH.   (5)

AV_{h,π^{SR}_{h,b}} = DV_{h,π^{SR}_{h,(b−1)}} + D_{π^{SR}_{h,(b−1)}, π^{SR}_{h,b}} / V,  b ∈ {2, ..., |π^{SR}_h|},  ∀h ∈ MH.   (6)

DV_{h,π^{SR}_{h,b}} = max{ AV_{h,π^{SR}_{h,b}},  max_{L ∈ {1,...,|π^{JF}|} ∩ {1,...,|π^{SV}|}} C^{ST1}_{π^{JF}_{π^{SR}_{h,b},L}} },  ∀b ∈ {2, ..., |π^{SR}_h| − 1},  ∀h ∈ MH.   (7)

C^{ST2}_{π^{SV}_{h,d}} = DV_{h,π^{SR}_{h,(|π^{SR}_h|−1)}} + D_{π^{SR}_{h,(|π^{SR}_h|−1)}, MK_0} / V,  ∀d ∈ {1, ..., |π^{SV}_h|},  ∀h ∈ MH.   (8)

TD = Σ_{h=MH_1}^{MH_{h_num}} Σ_{b=1}^{b_num−1} D_{π^{SR}_{h,b}, π^{SR}_{h,(b+1)}}.   (9)

TC = α × h_num + β × TD.   (10)
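As a small illustration of Eqs. (9) and (10), the transportation cost of a set of planned vehicle routes could be computed as follows. This is a sketch only; the route and distance representations are assumptions, not the paper's data structures.

```python
def transportation_cost(routes, D, alpha, beta):
    """routes: list of vehicle route sequences such as [0, 3, 1, 0] (0 = aggregation point MK0).
    D: distance matrix indexed by factory/aggregation-point ids; alpha, beta as in Eq. (10)."""
    TD = sum(D[r[b]][r[b + 1]] for r in routes for b in range(len(r) - 1))  # Eq. (9)
    return alpha * len(routes) + beta * TD                                   # Eq. (10)
```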


Fig. 1. Schematic diagram of DNFVTIOP.


Fig. 2. Vehicle routing diagram at transportation stage.

3 PLMOEA for DNFVTIOP

3.1 Solution Representation

We design a novel decoding method for DNFVTIOP in which both the total completion time and the transportation cost are taken into account. The method can be summarized as follows. 1) Jobs are assigned to each vehicle in turn while respecting the vehicle's load limit. 2) Each vehicle is treated as a unit, and the jobs on the vehicle are handled in turn: each job is tentatively assigned to each factory, the time for the job to arrive at the aggregation point under each scheme is calculated, and the scheme that minimizes this arrival time is chosen. 3) A 0 is inserted between two adjacent vehicles to obtain the decoding sequence. A rough sketch of this decoding procedure is given below.
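The following is an illustrative sketch only, with assumed helper names: `arrival_time(assignment, job, factory)` stands for whatever routine evaluates the time at which a job would reach the aggregation point if produced in a given factory.

```python
def decode(jobs, weights, capacity, factories, arrival_time):
    """jobs: job sequence of an individual; weights[j]: weight of job j; capacity: vehicle limit Q."""
    vehicles, current, load = [], [], 0
    for job in jobs:                                     # step 1: pack jobs into vehicles by load
        if load + weights[job] > capacity:
            vehicles.append(current); current, load = [], 0
        current.append(job); load += weights[job]
    if current:
        vehicles.append(current)

    assignment, sequence = {}, []
    for v, vehicle_jobs in enumerate(vehicles):          # step 2: greedy factory choice per job
        for job in vehicle_jobs:
            assignment[job] = min(factories, key=lambda k: arrival_time(assignment, job, k))
        sequence += vehicle_jobs + ([0] if v < len(vehicles) - 1 else [])   # step 3: 0 separates vehicles
    return assignment, sequence
```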


3.2 Proposed Probability Matrices and Updating Mechanisms

DNFVTIOP is a combinatorial optimization problem, and individual fitness can be determined by decoding the order of jobs in an individual. To solve this problem, it is crucial to establish a probability model that reflects the position information of each job in the individual. In this paper, the PM and CM are established by using, respectively, the position information of each job and the dependency of two conjoint jobs in high-quality solutions. It is assumed that n jobs are processed on m machines. According to the coding rule, the length of the individual chromosome is n, and each job can be arranged at any position in the chromosome. The PM that has evolved to generation gen can be expressed as:

PM(gen) = [ pm_{1,1}(gen)  pm_{1,2}(gen)  ...  pm_{1,n}(gen);
            pm_{2,1}(gen)  pm_{2,2}(gen)  ...  pm_{2,n}(gen);
            ...
            pm_{n,1}(gen)  pm_{n,2}(gen)  ...  pm_{n,n}(gen) ],   (11)

where Σ_{i=1}^{n} pm_{i,j}(gen) = 1 and pm_{i,j}(gen) denotes the probability of job i appearing at position j in high-quality individuals of generation gen. Since the probability of a job appearing at any position is the same in the initial state, the initial value of the PM is 0.

Let π^{best}(gen) = (π_1^{best}, π_2^{best}, ..., π_n^{best}) be one of the high-quality solutions in the nondominated solution set at generation gen, and let LR be the learning rate. pm_{i,j}(gen) is updated through the high-quality solutions as follows:

a) Set j = 1.
b) Let x = π_j^{best}(gen); then pm_{x,j}(gen + 1) = pm_{x,j}(gen) + LR, and j = j + 1.
c) If j > n, continue the update with the next nondominated solution; otherwise, go to step (b). If all nondominated solutions have been used, go to step (d).
d) Normalization: pm_{w,j}(gen + 1) = pm_{w,j}(gen + 1) / Σ_{y=1}^{n} pm_{y,j}(gen + 1), ∀w ∈ {1, 2, ..., n}, j ∈ {1, 2, ..., n}.

The CM that has evolved to generation gen can be expressed as:

CM(gen) = [ 0              cm_{1,2}(gen)  ...  cm_{1,n}(gen);
            cm_{2,1}(gen)  0              ...  cm_{2,n}(gen);
            ...
            cm_{n,1}(gen)  cm_{n,2}(gen)  ...  0 ],   (12)

where Σ_{v=1}^{n} cm_{w,v}(gen) = 1 and cm_{w,v}(gen) represents the probability that job w occupies a position and job v occupies the next adjacent position; [w, v] is the dependency mentioned in the text. Since a job can appear only once in any permutation, the probabilities on the diagonal are 0. cm_{i,j}(gen) is updated through the above high-quality solution π^{best}(gen) = (π_1^{best}, π_2^{best}, ..., π_n^{best}) as follows:

a) Set j = 2.
b) Let x1 = π_{j−1}^{best}(gen) and x2 = π_j^{best}(gen).
c) cm_{x1,x2}(gen + 1) = cm_{x1,x2}(gen) + LR, and j = j + 1.
d) If j > n, continue the update with the next nondominated solution; otherwise, go to step (b). If all nondominated solutions have been used, go to step (e).
e) Normalization: cm_{w,v}(gen + 1) = cm_{w,v}(gen + 1) / Σ_{y=1}^{n} cm_{w,y}(gen + 1), ∀w ∈ {1, 2, ..., n}, v ∈ {1, 2, ..., n}.
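For illustration, the two update procedures might be sketched as follows. This is an assumption-laden sketch, not the authors' implementation: solutions are treated as 0-based permutations and PM/CM as NumPy arrays, and it assumes every column (row) receives at least one increment so that the normalization does not divide by zero.

```python
import numpy as np

def update_pm(PM, nondominated, LR):
    """Steps (a)-(d): each nondominated solution adds LR to entry (job, position)."""
    for sol in nondominated:                     # sol is a permutation of the n jobs
        for pos, job in enumerate(sol):
            PM[job, pos] += LR
    return PM / PM.sum(axis=0, keepdims=True)    # column-wise normalization (step d)

def update_cm(CM, nondominated, LR):
    """Steps (a)-(e): accumulate the dependency of adjacent job pairs [w, v]."""
    for sol in nondominated:
        for w, v in zip(sol[:-1], sol[1:]):
            CM[w, v] += LR
    return CM / CM.sum(axis=1, keepdims=True)    # row-wise normalization (step e)
```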

When the algorithm generates a new individual through the probability model, the first position of the individual is determined by PM, and each subsequent position is determined by CM. Therefore, the selection of the first position is significant. In order to discover more high-quality solutions, the job placed at the first position when generating a new solution is chosen by a random method. In addition, after selecting a job with the roulette wheel method, the corresponding column should be cleared and the CM should be normalized before selecting the next job in sequence.

3.3 Proposed Cooperation Strategy Based on the Nondominated Sorting

In the process of new population generation, in addition to obtaining population 1 by executing selection, crossover and mutation on the old population, another population 2 is generated by sampling the probability model. We combine the individuals from both populations and use the nondominated sorting method to select the best candidates for the new population. Let P1 be population 1, P2 population 2, and P the new population. The procedure is given in Algorithm 1.

Although P2 may produce individuals with higher fitness values owing to their high-quality location and dependency information, we also recognize the potential of P1, which is inherited from the parental generation, to evolve better individuals. Therefore, we propose a cooperation strategy based on nondominated sorting to generate P, which ensures that more high-quality solutions can be obtained and guides the population to evolve towards the Pareto front at a faster speed. With the design above, we propose the framework of PLMOEA, as shown in Fig. 3.

Algorithm 1: Proposed cooperation strategy based on the nondominated sorting
Input: population size popsize, P1 and P2
Output: P
1: Combine P1 and P2 into a new population R;
2: Perform the nondominated sorting of R;
3: Calculate the crowding degree of each solution;
4: Stratify the solutions according to the number of dominating solutions, and sort the solutions within the same stratum by crowding degree from largest to smallest;
5: For i = 1 to popsize do
6:   P(i) = R(i);
7: End For
8: Return P
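A compact sketch of the idea behind Algorithm 1 is given below. The helper names are hypothetical; as a simplification it uses the count of dominating solutions instead of full front construction, and it computes crowding over the merged set rather than per front. `objs(ind)` is assumed to return the (TT, TC) pair of an individual, both minimized.

```python
def cooperate(P1, P2, size, objs):
    R = P1 + P2
    F = [objs(ind) for ind in R]
    dom = lambda a, b: all(x <= y for x, y in zip(a, b)) and a != b
    rank = [sum(dom(F[j], F[i]) for j in range(len(R))) for i in range(len(R))]

    d = [0.0] * len(R)                                   # crowding distance over the merged set
    for k in range(len(F[0])):
        idx = sorted(range(len(R)), key=lambda i: F[i][k])
        span = (F[idx[-1]][k] - F[idx[0]][k]) or 1.0
        d[idx[0]] = d[idx[-1]] = float("inf")            # boundary solutions are always kept first
        for a in range(1, len(R) - 1):
            d[idx[a]] += (F[idx[a + 1]][k] - F[idx[a - 1]][k]) / span

    order = sorted(range(len(R)), key=lambda i: (rank[i], -d[i]))
    return [R[i] for i in order[:size]]
```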



Fig. 3. The main framework of PLMOEA for DNFVTIOP.

4 Experimental Results and Comparison

To further show the effectiveness of the proposed PLMOEA, we compare PLMOEA with MOEA/D [14], MOGLS [15] and NSGA-II [11]. Since there is no suitable standard test set for DNFVTIOP, test problems of different sizes are randomly generated for different numbers of jobs, machines and factories based on the approach used for flow-shop scheduling problems in the literature [16]. With n × m × k ∈ {{20, 30, 50, 70} × {5, 10} × {2, 4}}, a total of 16 test cases are generated. The parameters are: population size PS = 20, crossover probability CR = 0.8, mutation probability MR = 0.8, and learning rate LR = 0.75. We coded all the algorithms in Delphi 7 and performed the experiments on a computer with a 2.70 GHz CPU and 16 GB of memory. To make a fair comparison, each algorithm independently solves each test problem 20 times with the same termination condition of n × m × k × 0.4 s. We use three performance metrics from the literature [17]: Ω, Δ and β measure dominance, diversity, and convergence, respectively. Suppose that the nondominated solution sets S1, S2, ..., SK are obtained by the K algorithms, and let S* be the approximate Pareto front of S1 ∪ S2 ∪ ... ∪ SK.


The first performance metric is the dominance metric Ω. Ω_k is the proportion of solutions in S_k that are not dominated by S*:

Ω_k = |S_k − {x ∈ S_k | ∃y ∈ S* : y ≺ x}| / |S_k|,   (13)

where y ≺ x denotes that the solution x in S_k is dominated by the solution y in S*. Obviously, the bigger Ω_k is, the better S_k is.

The second performance metric is the diversity metric Δ:

Δ_k = ( d_{ξk} + d_{ηk} + Σ_{λ=1}^{|S_k|−1} |d_{λk} − d̄_k| ) / ( d_{ξk} + d_{ηk} + (|S_k| − 1) × d̄_k ),  k = 1, 2, ..., K,   (14)

where d_{ξk} denotes the Euclidean distance on the vertical axis between the highest solution of S_k and that of S*, d_{ηk} denotes the Euclidean distance on the horizontal axis between the furthest solution of S_k and that of S*, d_{λk} represents the Euclidean distance between two consecutive solutions in S_k, and d̄_k denotes the average of these distances over S_k. Obviously, the smaller Δ_k is, the better S_k is.

The third performance metric is the convergence metric β. β_k is based on the sum of the Euclidean distances from the solutions in S_k to the nearest solutions of the approximate Pareto front:

β(S_k) = sqrt( Σ_{x=1}^{|S_k|} d_{x,y}^2(S_k) ) / |S_k|,  k = 1, ..., K,   (15)

where d_{x,y}(S_k) represents the Euclidean distance from a solution in S_k to its nearest solution in S*. Clearly, the smaller β_k is, the better the convergence is.

From Table 1, the optimal values of Ω and β both appear in PLMOEA. Optimal values of Δ appear in the other algorithms in only a few cases, and the gap does not exceed 0.08. Therefore, we can conclude that PLMOEA achieves better performance than MOEA/D, MOGLS and NSGA-II on test problems of different sizes. At the same time, the effectiveness of PLMOEA in solving the DNFVTIOP is verified.

Table 1. Comparisons of MOEA/D, MOGLS, NSGA-II and PLMOEA.

Ins. n,m,f   MOEA/D                      MOGLS                        NSGA-II                     PLMOEA
             Ω       Δ       β           Ω       Δ       β            Ω       Δ       β           Ω       Δ       β
20,5,2       0.1125  0.7692  95.4601     0.0000  0.7155  232.0108     0.0480  0.7977  57.7667     0.9263  0.6926  1.3239
20,10,2      0.1250  0.7914  114.9240    0.0000  0.7350  281.7000     0.1376  0.7416  79.7007     0.8898  0.6936  5.7469
20,5,4       0.0750  0.8371  117.3111    0.0000  0.7953  241.5463     0.3089  0.7530  48.8305     0.8092  0.7551  5.8951
20,10,4      0.0750  0.7516  100.7620    0.0000  0.8260  231.6185     0.2283  0.7522  55.8824     0.8659  0.7558  7.1383
30,5,2       0.1308  0.7144  100.0938    0.0000  0.8089  333.2233     0.1292  0.7197  85.2042     0.8683  0.7937  3.1485
30,10,2      0.2375  0.7682  103.0687    0.0000  0.7815  533.5380     0.0846  0.8181  115.9389    0.9032  0.8010  6.0673
30,5,4       0.0967  0.8170  112.2499    0.0000  0.7363  275.6747     0.4001  0.7763  39.7139     0.8038  0.7428  8.4678
30,10,4      0.2125  0.8109  133.5933    0.0000  0.7909  367.1588     0.3488  0.8008  64.2862     0.8008  0.8264  11.8755
50,5,2       0.1417  0.7922  215.6405    0.0000  0.8647  727.1781     0.2088  0.8338  108.5350    0.8875  0.7500  6.6506
50,10,2      0.1500  0.8429  199.7025    0.0000  0.8684  825.9680     0.0766  0.8468  133.2941    0.9875  0.7048  0.4757
50,5,4       0.1375  0.8045  223.4307    0.0000  0.8521  521.5351     0.2128  0.8855  74.2188     0.9554  0.8017  1.4827
50,10,4      0.1750  0.8467  268.7300    0.0000  0.8217  654.3251     0.2963  0.8717  91.1490     0.8260  0.7624  7.3355
70,5,2       0.3917  0.8873  177.4622    0.0000  0.8580  922.4879     0.1056  0.9519  141.2298    0.9197  0.7618  5.5590
70,10,2      0.2250  0.8730  269.7419    0.0000  0.9325  1287.4075    0.2014  0.9651  196.9941    0.8809  0.7988  7.5241
70,5,4       0.0750  0.8156  276.5110    0.0000  0.8601  643.6912     0.1477  0.8582  140.7015    0.8839  0.8105  8.4695
70,10,4      0.0917  0.8213  451.9176    0.0000  0.8827  867.3594     0.1873  0.8612  173.7865    0.9290  0.7972  4.0127



5 Conclusions and Future Research

In this paper, we propose PLMOEA for DNFVTIOP. Firstly, based on the problem characteristics of DNFVTIOP, we establish a mathematical model and design the decoding method. Secondly, we use a random strategy to generate an initial population in order to maintain diversity. Then, two two-dimensional probability matrices are used to accurately record the positions of jobs and the dependencies of conjoint jobs in high-quality solutions, so as to improve the global search efficiency. In addition, we propose a cooperative strategy based on nondominated sorting to generate new populations that maximize the retention of high-quality solutions near the parent populations. Finally, the effectiveness of PLMOEA has been verified through simulation experiments and algorithm comparisons on standard test problems of different sizes. For future research, the model can be improved to consider more actual production and transportation factors, such as processing speed and vehicle speed instability.

Acknowledgements. This research is supported by the National Natural Science Foundation of China (62173169 and 61963022), the Basic Research Key Project of Yunnan Province (202201AS070030) and the Yunnan Fundamental Research Projects (202301AT070458).

References
1. Cheng, C., Ying, K., Li, S., Hsieh, Y.: Minimizing makespan in mixed no-wait flowshops with sequence-dependent setup times. Comput. Ind. Eng. 130, 338–347 (2019)
2. Asefi, H., Jolai, F., Rabiee, M., Tayebi Araghi, M.E.: A hybrid NSGA-II and VNS for solving a bi-objective no-wait flexible flow-shop scheduling problem. Int. J. Adv. Manufact. Technol. 75(5–8), 1017–1033 (2014). https://doi.org/10.1007/s00170-014-6177-9
3. Ying, K., Lin, S.: Minimizing makespan for no-wait flowshop scheduling problems with setup times. Comput. Ind. Eng. 121, 73–81 (2018)
4. Zhu, N., Zhao, F., Wang, L., Ding, R., Xu, T.: A discrete learning fruit fly algorithm based on knowledge for the distributed no-wait flow shop scheduling with due windows. Expert Syst. Appl. 198, 116921 (2022)
5. Zhao, F., Zhao, J., Wang, L., Tang, J.: An optimal block knowledge driven backtracking search algorithm for distributed assembly no-wait flow shop scheduling problem. Appl. Soft Comput. 112, 107750 (2021)
6. Engin, O., Güçlü, A.: A new hybrid ant colony optimization algorithm for solving the no-wait flow shop scheduling problems. Appl. Soft Comput. 72, 166–176 (2018)
7. Shao, W., Pi, D., Shao, Z.: Optimization of makespan for the distributed no-wait flow shop scheduling problem with iterated greedy algorithms. Knowl. Based Syst. 137, 163–181 (2017)
8. Pan, Q., Wang, L., Qian, B.: A novel differential evolution algorithm for bi-criteria no-wait flow shop scheduling problems. Comput. Oper. Res. 36, 2498–2511 (2009)
9. Tavakkoli-Moghaddam, R., Rahimi-Vahed, A., Mirzaei, A.H.: A hybrid multi-objective immune algorithm for a flow shop scheduling problem with bi-objectives: weighted mean completion time and weighted mean tardiness. Inf. Sci. 177, 5072–5090 (2007)
10. Shao, W., Pi, D., Shao, Z.: A Pareto-based estimation of distribution algorithm for solving multi-objective distributed no-wait flow-shop scheduling problem with sequence-dependent setup time. IEEE Trans. Autom. Sci. Eng. 16, 1344–1360 (2019)
11. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multi-objective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6, 182–197 (2002)
12. Wang, H., Fu, Y., Huang, M., Huang, G.Q., Wang, J.: A NSGA-II based memetic algorithm for multi-objective parallel flowshop scheduling problem. Comput. Ind. Eng. 113, 185–194 (2017)
13. Ishibuchi, H., Hitotsuyanagi, Y., Tsukamoto, N., Nojima, Y.: Use of biased neighborhood structures in multi-objective memetic algorithms. Soft Comput. 13, 795–810 (2009)
14. Zhang, Q., Li, H.: MOEA/D: a multi-objective evolutionary algorithm based on decomposition. IEEE Trans. Evol. Comput. 11, 712–731 (2007)
15. Ishibuchi, H., Murata, T.: A multi-objective genetic local search algorithm and its application to flow-shop scheduling. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 28, 392–403 (1998)
16. Allahverdi, A., Aydilek, H.: Total completion time with makespan constraint in no-wait flowshops with setup times. Eur. J. Oper. Res. 238, 724–734 (2014)
17. Li, Z.C., Qian, B., Hu, R., Chang, L.L., Yang, J.B.: An elitist nondominated sorting hybrid algorithm for multi-objective flexible job-shop scheduling problem with sequence-dependent setups. Knowl. Based Syst. 173, 83–112 (2019)

Hyper-heuristic Ant Colony Optimization Algorithm for Multi-objective Two-Echelon Vehicle Routing Problem with Time Windows

Qiu-Yi Shen1, Ning Guo1,2(B), Rong Hu1,2, Bin Qian1, and Jian-Lin Mao1

1 School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
[email protected]
2 School of Mechanical and Electronic Engineering, Kunming University of Science and Technology, Kunming 650500, China

Abstract. In this paper, a hyper-heuristic ant colony optimization algorithm (HHACOA) is proposed to solve the multi-objective two-echelon vehicle routing problem with time windows (MO2E-VRPTW). Minimizing the total carbon emissions and maximizing the customer satisfaction are the objectives to be optimized. On the one hand, at the low level of HHACOA, the quality of the initial population is improved by using two problem-based heuristic rules, and a permutation consisting of nine neighborhood operations is applied as an algorithm to each individual in the low-level population, which drives HHACOA to search different regions of the solution space. On the other hand, at the high level of HHACOA, the ant colony algorithm is used to learn the permutation information acting on each low-level individual, and new permutations are generated based on the probabilistic transition matrix; a permutation is an individual in the high-level population. In this way HHACOA is quickly guided to the regions where high-quality nondominated solutions are concentrated. Finally, simulation experiments and algorithm comparisons verify the effectiveness of the proposed HHACOA.

Keywords: two-echelon vehicle routing problem · time windows · multi-objective optimization · hyper heuristic algorithm

1 Introduction

The classic vehicle routing problem (VRP) was first proposed by Dantzig and Ramser [1]. The vehicle routing problem refers to a problem in which a fleet assigns paths to a group of customers in a given directed graph, so that all customers are served and certain restrictions are met. The two-echelon VRP (2E-VRP) is a type of VRP in which goods must first be delivered from a central depot to a transfer station (stage 1) and then from the transfer station to the customer (stage 2). The 2E-VRP is widely used in real-world scenarios such as urban logistics delivery [2] and emergency relief material distribution [3]. With the increasing competition in the logistics industry,


in addition to pursuing low costs, customer satisfaction has also become an important means for logistics companies to improve their competitive advantage. At the same time, the global climate keeps deteriorating and carbon emissions remain excessive. Therefore, it is of great practical significance to study the multi-objective 2E-VRP with time windows (MO2E-VRPTW). Since the VRP is NP-hard and can be reduced to the MO2E-VRPTW, the MO2E-VRPTW is also NP-hard, so research on the MO2E-VRPTW also has important theoretical value.

In the past few years, there have been some studies on the 2E-VRPTW and its variants. Li et al. [4] proposed an adaptive large neighborhood search heuristic to solve the 2E-VRPTW with the goal of minimizing the integrated cost. Dellaert et al. [5] proposed a branch-and-price approach to tackle a multi-commodity two-echelon capacitated VRPTW (MC-2E-VRPTW), with the optimization goal of minimizing the total routing costs. It should be noted that current research predominantly employs meta-heuristic algorithms to tackle the 2E-VRPTW, with the primary objective being the optimization of total costs; no study has investigated the simultaneous minimization of carbon emissions and maximization of customer satisfaction in the 2E-VRPTW. Therefore, this paper focuses on the MO2E-VRPTW and devises a hyper-heuristic ant colony optimization algorithm (HHACOA) to address this multi-objective problem.

A hyper-heuristic is a higher-level algorithm that does not operate directly on the problem domain, but rather on other heuristic algorithms, to find the optimal solution. This enables a more comprehensive search of the solution space and generates higher-quality solutions. Compared with other optimization algorithms, the ACO algorithm has the advantages of being computationally simple, robust, easy to implement, and applicable to large-scale problems. Therefore, hyper-heuristic algorithms combined with ant colony optimization have been successfully applied to various combinatorial optimization problems. Chen et al. [6] proposed an ant-based hyper-heuristic algorithm for the travelling tournament problem to minimize the travel distances. Duhart et al. [7] applied an ACO hyper-heuristic to solve the knapsack problem. In the above studies, the high level uses an ant colony algorithm to record the sequence information of high-quality individuals, but the position information of high-quality individuals also has a great impact on the convergence of the algorithm. Therefore, this paper designs an effective ant colony algorithm at the high level of HHACOA to reasonably record both the sequence information and the position information of high-quality individuals.

The main features of HHACOA lie in two aspects: the low-level population is generated by several heuristic rules, and a novel probabilistic transition matrix is proposed in the ant colony algorithm. For the initialization of the low-level population, the quality of the initial population is improved by using two problem-based heuristic rules, and the remaining individuals are generated randomly to ensure population dispersion. At the high level of HHACOA, an ant colony algorithm is used to learn information about the permutations of high-quality individuals, and the novel probabilistic transition matrix is sampled to generate new permutations, allowing HHACOA to rapidly explore regions of high-quality non-dominated solutions.


The following sections of this paper are divided into five parts. Section 2 outlines the mathematical model of the MO2E-VRPTW. Section 3 elaborates on the details of HHACOA. Section 4 presents and analyzes the experimental outcomes and comparisons. Finally, Sect. 5 offers conclusions and suggestions for future research.

2 Problem Description

The MO2E-VRPTW can be modeled on a directed complete graph G(V, E) with a set of vertices V = {0, 1, ..., ns + nc} and edges E = {(i, j) | i, j ∈ V, i ≠ j}. In set V, the number 0 represents the central depot, the set N1 = {1, 2, ..., ns} denotes the transfer depots, and the first-echelon distribution network node set is V1 = {0, 1, 2, ..., ns}; the set N2 = {ns + 1, ns + 2, ..., ns + nc} represents the customers, and the second-echelon distribution network node set is V2 = {1, 2, ..., ns, ns + 1, ..., ns + nc}. In the 2E-VRP, there are two types of vehicles that make up the sets Kw = {1, 2, ..., nKw}, w = 1, 2, where w represents either the first or the second echelon. The primary vehicles transport goods at speed v1 between the central depot and the set of transfer depots, and the auxiliary vehicles transport goods at speed v2 from the transfer depots to the customers. Each vehicle initially departs from the depot, serves customer i for time si, travels from customer i to j for time rij, and then continues to serve customer j. After serving a group of customers, each vehicle finally returns to the depot. Each transfer depot i has quantity qi and each customer j has quantity qj to be picked up, and each can be served exactly once by one vehicle. The picking quantity of each vehicle cannot exceed its capacity constraint Qw.

Let μi(zi) denote the satisfaction of customer i when the vehicle arrives at customer i at time zi. If customer i is served within the appointed time interval [ei, li], the satisfaction μi(zi) = 1. If customer i is served earlier than the extended earliest time Ei or later than the extended latest time Li, the satisfaction μi(zi) = 0. It should be noted that there will be a waiting time wi when customer i is served earlier than time Ei. If customer i is served within (Ei, ei) or (li, Li), as shown in Fig. 1, the satisfaction μi(zi) varies linearly with the arrival time zi. The customer satisfaction can be calculated by formula (1):

μi(zi) = { 0, zi < Ei;   (zi − Ei)/(ei − Ei), Ei ≤ zi < ei;   1, ei ≤ zi ≤ li;   (Li − zi)/(Li − li), li < zi ≤ Li;   0, zi > Li }.   (1)

Moreover, let dij denote the distance between nodes i and j, Qik the total quantity carried by vehicle k when leaving node i, and xijk the decision variable that equals 1 if vehicle k travels from node i to node j. Then, the MO2E-VRPTW can be defined as follows:

Minimize Z1 = Σ_{i∈N1} Σ_{j∈N1} Σ_{k∈K1} x_{ijk} [ c1 d_{ij}/v1 + c2 d_{ij} v1^2 + c3 (u1 + Q_{ik}) d_{ij} ] + Σ_{i∈N2} Σ_{j∈N2} Σ_{k∈K2} x_{ijk} [ c1 d_{ij}/v2 + c2 d_{ij} v2^2 + c3 (u2 + Q_{ik}) d_{ij} ]   (2)


Fig. 1. The customer satisfaction function

Maximize Z2 = Σ_{i∈Nw} μi(zi),  w = 1, 2   (3)

Subject to:

Σ_{k∈Kw} Σ_{j∈Vw} x_{ijk} = 1,  ∀i ∈ Nw   (4)

Σ_{k∈Kw} Σ_{i∈Vw} x_{ijk} = 1,  ∀j ∈ Nw   (5)

Σ_{i∈N1} x_{0ik} = Σ_{j∈N1} x_{j0k} = 1,  ∀k ∈ K1   (6)

Σ_{i∈N2} Σ_{j∈N1} x_{jik} = Σ_{m∈N2} Σ_{j∈N1} x_{jmk} = 1,  ∀k ∈ K2   (7)

qi ≤ Q_{ik} ≤ Qw,  ∀i ∈ Nw, k ∈ Kw   (8)

wi = si = 0,  ∀i ∈ V1   (9)

Σ_{k∈Kw} Σ_{i∈Vw} (zi + wi + si + rij) × x_{ijk} = tj,  ∀j ∈ Nw   (10)

Ei ≤ ti + wi ≤ Li,  ∀i ∈ Nw   (11)

wi = max{0, Ei − ti},  ∀i ∈ Nw   (12)

x_{ijk} ∈ {0, 1},  ∀i, j ∈ Vw, i ≠ j, k ∈ Kw   (13)

In the above problem model, Eqs. (2) and (3) are the objective functions: the minimization of total carbon emissions and the maximization of customer satisfaction, respectively. Constraints (4) and (5) state that each customer is served by exactly one vehicle exactly once. Constraints (6) and (7) guarantee that each vehicle starts and ends its trip at the depot. Constraint (8) ensures that each vehicle cannot exceed its capacity. Constraint


(9) gives the waiting and service time of the depot and transfer station. Constraint (10) indicates the relationship between the arrival time at the next customer and the departure time from current customer. Constraint (11) ensures each customer is served within the required time. Constraint (12) specifies the waiting time. Constraint (13) defines the decision variable.
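For illustration, the piecewise-linear satisfaction function of Eq. (1) can be written as a small helper; this is a sketch only, with parameter names following the text above.

```python
def satisfaction(z, E, e, l, L):
    """mu_i(z_i) of Eq. (1): z is the arrival time, [e, l] the appointed window,
    [E, L] the extended (tolerated) window."""
    if z < E or z > L:
        return 0.0
    if z < e:
        return (z - E) / (e - E)     # early but within the tolerated window
    if z <= l:
        return 1.0                   # inside the appointed interval [e, l]
    return (L - z) / (L - l)         # late but within the tolerated window
```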

3 HHACOA for MO2E-VRPTW

3.1 Solution Representation

The solution in this paper is denoted by decimal coding. It represents the vehicle routes that serve all transfer stations and customers. The solution consists of two parts: the first part is composed of the sequence of the central depot and the transfer stations, and the second part is composed of the sequence of the transfer stations and the customers.

3.2 Population Initialization

The low-level population is generated using two heuristic rules and a random rule based on the problem characteristics. Assume the vehicle departs from the current position i, and check whether the set U of positions that satisfy the maximum vehicle load constraint is empty. If it is empty, the vehicle returns to the original depot and another vehicle is selected to continue serving the remaining positions. If not, a position j is selected from the set U for service. The specific rules are described as follows:

Rule 1: S(i, j) is defined as the satisfaction gained by moving from the current position i to j. Select the maximum value over all feasible positions: j* = arg max_{j∈U} {S(i, j)}.

Rule 2: E(i, j) is defined as the carbon emission incurred by moving from the current position i to j. Select the minimum value over all feasible positions: j* = arg min_{j∈U} {E(i, j)}.

Rule 3: Select a position from U at random: j* = random{j | j ∈ U}.

If the solutions in the solution space are too concentrated, the region that can be searched around the current solution by the various local operators is limited, so a random approach is used to generate the high-level population to ensure that it has a certain degree of dispersion and diversity.

3.3 Low-Level Heuristic Operations

To conduct a thorough search of high-quality regions of the solution space, nine effective neighborhood operations are designed at the low level of HHACOA; they are expanded from three basic operators: insert, swap, and inverse. Insert has four variants, inserting a single element or two elements within or between sub-paths. Similarly, swap has four variants, swapping a single element or two elements with another element within or between sub-paths. Inverse only considers reversing the order of the elements within a sub-path. Executing the above nine heuristic operations in different permutations and combinations realizes an effective search in multiple regions of the solution space. Simple illustrative versions of the three basic operators are sketched below.
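The following are illustrative, generic versions of the three base operators (not the paper's nine concrete variants); `sol` is assumed to be a route permutation given as a list of node ids with at least two elements.

```python
import random

def op_insert(sol):
    s = sol[:]
    i, j = random.sample(range(len(s)), 2)
    s.insert(j, s.pop(i))            # move one element to a new position
    return s

def op_swap(sol):
    s = sol[:]
    i, j = random.sample(range(len(s)), 2)
    s[i], s[j] = s[j], s[i]          # exchange two elements
    return s

def op_inverse(sol):
    s = sol[:]
    i, j = sorted(random.sample(range(len(s)), 2))
    s[i:j + 1] = reversed(s[i:j + 1])  # reverse a segment within the route
    return s
```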


3.4 Generation of High-Level Populations

In the traditional ant colony algorithm, the probability transition matrix composed of the pheromone matrix and the heuristic function matrix is sampled to generate new individuals. Similarly, the formation of new individuals in this paper is based on a novel probability transition matrix.

Probabilistic Transition Matrix: Define W^p_{n×n} as the probability transition matrix in HHACOA; it is used to learn the position and sequence information of the non-dominated solution sets composed of the nine low-level heuristic operations. Let W^p_{n×n}(x, y) be the elements of W^p_{n×n}; the specific calculation is as follows:

W^p_{n×n} = (M^p_{n×n})^α ∗ (N^p_{n×n})^β,   (14)

where M^p_{n×n} is the pheromone matrix and N^p_{n×n} is the heuristic function matrix in Eq. (14); G^p denotes the high-level population of the p-th generation; G^p_H is the non-dominated solution set in G^p; SHP is the size of G^p_H; A^{p,k}_H is the k-th individual of G^p_H; Z1(π_k) and Z2(π_k) are the carbon emissions and satisfaction of A^{p,k}_H, respectively; F^{p,k}_{n×n} in Eq. (15) records the sequence information of A^{p,k}_H, and E^p_{n×n} in Eq. (16) records the sequence information of G^p_H; U^{p,k}_{n×n} in Eq. (19) records the location information of A^{p,k}_H, and R^p_{n×n} in Eq. (20) records the location information of G^p_H. The maximum number of iterations is max_p.

F^{p,k}_{n×n}(x, y) = { Z2(π_k)/Z1(π_k), if x = A^{p,k}_H(i), y = A^{p,k}_H(i + 1);  0, otherwise },  i = 1, ..., n − 1, x = 1, ..., n, y = 1, ..., n, k = 1, ..., SHP.   (15)

E^p_{n×n}(x, y) = Σ_{k=1}^{SHP} F^{p,k}_{n×n}(x, y),  x = 1, ..., n, y = 1, ..., n.   (16)

k=1 p

The Mn×n can be detailed as be follows: p

0 (x, y) = 1, x = 1, . . . , n, y = Step1: Initialize the pheromone matrix Mn×n , Mn×n 1, . . . , n. Step2: Set p to 1. p Step3: Update the Mn×n according to Eq. (17). p

p−1

p

Mn×n (x, y) = (1 − Rho1 ) × Mn×n (x, y) + Rho1 × En×n (x, y)

(17)

174

Q.-Y. Shen et al.

Step4: Let p = p + 1, if p < max p, jump to Step3; otherwise, terminate the loop. Let sxy be the number of occurrences of heuristic operation y at position x and after p,k position x, and Un×n (x, y) be the probability that heuristic operation y is selected at position [8], which can be calculated by Eq. (18). p,k

Un×n (x, y) =

sxy n 

(18)

sxy

y=1 p

Rn×n (x, y) =

SHP 

p,k

Un×n (x, y), x = 1, . . . , n, y = 1, . . . , n, p ≥ 1

(19)

k=1 p

Based on the above description, the Nn×n can be detailed as be follows: p

p

Step1: Initialize the inspire function matrix Nn×n , Nn×n (x, y) = 1, x = 1, . . . , n, y = 1, . . . , n. Step2: Set p to 1. p Step3: Update the Nn×n according to Eq. (20). p

p−1

p

Nn×n (x, y) = (1 − Rho2 ) × Nn×n (x, y) + Rho2 × Rn×n (x, y)

(20)

Step4: Let p = p + 1, if p < max p, jump to Step3; otherwise, terminate the loop. High-Level Population Update: Define Op,k to be the k th individual in Gp , n is the number of heuristic operations, Sopt (Op,k , i) is an heuristic operation selection function, which is used to determine the low-level heuristic operation that appears at the i th position in Op,k . The specific description of Sopt (Op,k , i) is as follows: Step1: Determine whether i is equal to 1, if yes, execute step 2 step2; if not, execute step3. Step2: In order to prevent the algorithm from converging prematurely, considering the particularity of the individual at the first position, set Sopt (Op,k , 1) to randomly generate Lrand when selecting the first heuristic operation, then Sopt (Op,k , 1) = Lrand , return Sopt (Op,k , 1), and the operation selection ends. Step3: Randomly Generate Probability Numbers r,and ensure that its range is within n  p−1 Wn×n (Sopt (Op,k , i), h)). r ∈ [0, h=1

Step4: Take the method of roulette to choose the corresponding heuristic operation Lc , if sop sop+1  p−1  p−1 r∈[ Wn×n (Sopt (Op,k , i), h), Wn×n (Sopt (Op,k , i), h)), then c = sop, therefore, h=1

Sopt (Op,k , i) = Lc .

h=1

Hyper-heuristic Ant Colony Optimization Algorithm for Multi-objective

175

Step5: Output Sopt (Op,k , i), the process of operation selection ends. The above description is to select a heuristic operator, while the update of the entire high-level population can be described as follows: Step1: Set k to 1. Step2: Set i to 1. Step3: Set Op,k (i) = Sopt (Op,k , i). Step4: Set i = i + 1, if i ≤ n, jump to Step3; otherwise, jump to Step5. Step5: Set k = k + 1, if k ≤ popsize, jump to Step2; otherwise, jump to Step6. Step6: Output Gp . 3.5 The Flow of the HHACOA According to the above algorithm description, the specific process is as follows: Step1: Use the rules mentioned in Subsect. 3.2 to initialize the low-level population L = {Lp1 , Lp2 , Lp3 , · · · , Lpk } and high-level populations H = {Hp1 , Hp2 , Hp3 , · · · , Hpk },and set their size to popsize. Step2: Evaluate the dominance relationship of the population L, and calculate the corresponding dominance level of each individual. Select the non-dominated solution set E = {Ep1 , Ep2 , · · · , Epe } to form an excellent individual set, its size is pe , and use the individuals in to decode the individuals in turn. Let g = 1, h = 1. Step3: When Eph executes a low-level heuristic operation in Hpg , a Ep h will be generated, and Eph will be compared with Ep h . If Ep h dominates Eph , then use Ep h instead of Eph ; otherwise, keep Eph . And continue to execute Hpg in accordance with this rule The remaining low-level heuristic operations are used to update Eph . After Eph completes all the heuristic operations in Hpg , the fitness function value of Eph is the fitness function value of Hpg , and the fitness function value of Hpg is stored in the set in X . Step4: Set g = g + 1, if g ≤ popsize, execute step 4, otherwise, execute step 6. Step5: Set h = h + 1, if h ≤ pe , execute step 4, otherwise, execute step 7. Step6: Evaluate the dominance relationship of the set X and select the nondominated solution set K as the low-level archive set, select the individual set F = {Fp1 , Fp2 , · · · , Fpf } corresponding to K in H to update the pheromone matrix and heuristic function matrix of the high-level. Step7: The mutation operation is performed on the individuals in the set K to generate popsize − pf new individuals, and the final population is formed by merging the new individuals and the set K. Step8: Follow the steps described in Subsect. 3.4 to sample the probability transition matrix to generate high-level populations. Step9: Determine whether the termination condition is satisfied. If not, go to step2; otherwise, output the low-level archive set, terminate the loop.

176

Q.-Y. Shen et al.

4 Simulation Experiments and Result Analysis 4.1 Experimental Setup To test the performance of the proposed HHACOA, the 15 sets of test data used in this paper, namely set2 to set4, are derived from the standard instances of 2E-VRP. Where Set 2 and 3 comes from Perboli et al. [9], Set 4 derived from Crainic et al. [10]. In order to evaluate the effectiveness of HHACOA, simulations were conducted to compare it with other state-of-the-art multi-objective evolutionary algorithms, namely NSGAII [11], MOEA/D [12], MOGLS [13], and SPEAII [14]. In order to verify the effectiveness of using ant colony algorithm at the high-level of hyper-heuristic algorithm, two variant experiments were set up: HH_V1 (random execution of nine types of operation operators) and HH_V2 (sequential execution of nine types of operation operators). The parameter values of HHACOA and its variant experiments are set as follows: the high-level and low-level population size is popsize = 30, four importance factors α = 1, β = 2, rho1 = 0.7, rho2 = 0.9, The parameters for other four algorithms were set the same as in their original literature. To ensure a fair comparison, each algorithm was independently run 20 times with the same runtime limit of nc ∗ ns ∗ 0.1 seconds as the termination criterion. To facilitate the comparison, all comparison algorithms are re-implemented and coded in Matlab2021a, which is also the implementation platform for HHACOA. All experiments were conducted on an Intel 2.5GHz PC with 16GB RAM. 4.2 Measures for Comparing Non-dominated Sets The evaluation of the non-dominated solution set Sj in this study utilizes two performance measures described in [15]. Let S represent the union of all non-dominated solution sets. The first measure calculates the ratio of solutions in Sj that are not dominated by any other solutions in S. This can be expressed as follows: R_N (Sj ) =

|Sj − {x ∈ Sj |∃y ∈ S : y ≺ x}| |Sj |

(21)

Here, y ≺ x denotes that the solution x is dominated by the solution y, and |Sj | represents the number of solutions obtained by the j th algorithm. The second measure is the number of solutions in Sj which are not dominated by y, the larger the number R_N (Sj ) and N _N (Sj ) is, the better the solution set Sj is. N _N (Sj ) can be written as: N _N (Sj ) = |Sj − {x ∈ Sj |∃y ∈ S : y ≺ x}|

(22)

5 Results and Discussion Table 1 and Table 2 present the statistical results of the two performance measures, R_N and N_N, described in Subsect. 4.2, The best value for each instance is highlighted in bold for clarity. The NB at the last row in Tables 1 and 2 denotes the number of optimal

Hyper-heuristic Ant Colony Optimization Algorithm for Multi-objective

177

results obtained for each algorithm in all instances. Based on the statistically obtained NB values, we have plotted Figs. 2 and 3 to visualize the results obtained. It is evident that HHACOA outperforms the other four algorithms and variant algorithms in most instances, demonstrating the effectiveness of combining the hyper-heuristic algorithm with the ant colony optimization algorithm to reasonably record excellent individual position information and sequence information to solve MO2E-VRPTW. Table 1. Comparison results of HHACOA with its two variants Instances

HH_V1

HH_V2

HHACOA

R_N

N_N

R_N

N_N

R_N

N_N

Set2a_E-n22-k4-s6-17

0.68

5.45

0.11

0.60

0.62

7.55

Set2a_E-n33-k4-s1-9

0.71

7.05

0.01

0.05

0.59

10.75

Set2b_E-n51-k5-s2-17

0.60

3.35

0.00

0.00

0.70

14.35

Set2b_E-n51-k5-s6-12-32-37

0.78

5.65

0.00

0.00

0.59

13.85

Set2c_E-n51-k5-s2-17

0.55

4.35

0.00

0.00

0.70

16.00

Set2c_E-n51-k5-s6-12-32-37

0.59

5.00

0.00

0.00

0.73

16.20

Set3_E-n22-k4-s13-14

0.60

2.70

0.07

0.35

0.66

7.10

Set3_E-n33-k4-s16-22

0.49

4.10

0.01

0.05

0.76

15.10

Set3_E-n51-k5-s12-18

0.71

5.40

0.00

0.00

0.70

19.70

Set4a_Instance50-1

0.45

0.55

0.15

0.40

0.51

1.10

Set4a_Instance50-19

0.55

0.60

0.11

0.25

0.53

1.15

Set4a_Instance50-37

0.60

0.70

0.03

0.05

0.70

1.40

Set4b_Instance50-1

0.55

0.55

0.00

0.00

0.75

1.10

Set4b_Instance50-19

0.45

0.45

0.03

0.05

0.83

0.95

Set4b_Instance50-37

0.58

0.70

0.12

0.40

0.80

2.00

NB

5

0

0

0

10

15

Table 2. Comparison with the results of four algorithms

Instances                        MOEAD          MOGLS          NSGAII         SPEAII         HHACOA
                                 R_N    N_N     R_N    N_N     R_N    N_N     R_N    N_N     R_N    N_N
Set2a_E-n22-k4-s6-17             0.42   2.85    0.04   0.20    0.31   4.10    0.45   4.20    0.66   2.80
Set2a_E-n33-k4-s1-9              0.23   2.70    0.00   0.00    0.07   1.25    0.37   4.05    0.96   6.75
Set2b_E-n51-k5-s2-17             0.23   1.30    0.00   0.00    0.33   6.85    0.39   4.45    0.93   5.40
Set2b_E-n51-k5-s6-12-32-37       0.15   0.65    0.00   0.00    0.16   4.45    0.24   2.55    0.97   5.65
Set2c_E-n51-k5-s2-17             0.08   1.05    0.00   0.00    0.19   3.75    0.21   2.55    0.98   7.40
Set2c_E-n51-k5-s6-12-32-37       0.18   2.25    0.00   0.00    0.33   8.10    0.23   2.65    0.97   7.05
Set3_E-n22-k4-s13-14             0.06   0.10    0.00   0.00    0.23   2.85    0.18   1.55    0.99   3.80
Set3_E-n33-k4-s16-22             0.11   1.55    0.00   0.00    0.28   6.20    0.44   4.45    0.94   3.50
Set3_E-n51-k5-s12-18             0.15   0.80    0.00   0.00    0.05   1.75    0.13   1.30    1.00   9.75
Set4a_Instance50-1               0.20   0.25    0.05   0.15    0.19   0.40    0.17   0.25    1.00   1.10
Set4a_Instance50-19              0.13   0.15    0.03   0.10    0.23   0.50    0.19   0.45    1.00   1.35
Set4a_Instance50-37              0.15   0.25    0.00   0.00    0.35   0.55    0.35   0.90    1.00   1.10
Set4b_Instance50-1               0.20   0.20    0.00   0.00    0.55   0.75    0.18   0.25    1.00   1.20
Set4b_Instance50-19              0.10   0.10    0.00   0.00    0.18   0.25    0.03   0.05    1.00   1.10
Set4b_Instance50-37              0.18   0.20    0.08   0.30    0.41   1.05    0.28   0.75    0.95   1.25
NB                               0      0       0      0       0      3       0      1       15     11

Fig. 2. Comparison of HHACOA and its two variants on NB



Fig. 3. Comparison of HHACOA and four algorithms on NB

6 Conclusion and Future Work In this paper, a hyper-heuristic ant colony optimization algorithm (HHACOA) is proposed to solve the multi-objective two-echelon vehicle routing problem with time windows (MO2E-VRPTW). HHACOA utilizes two problem-based heuristic rules to enhance the quality of the initial population at the low-level. Additionally, a permutation algorithm, consisting of nine neighborhood operations, is applied to each individual in the low-level population to perform a series of operations. This enables HHACOA to explore diverse regions in the solution space. At the high-level, the ant colony algorithm is employed to acquire the permutation information acting on each low-level individual. The probabilistic transition matrix is used to generate new permutations, which serve as individuals in the high-level population. In future research, HHACOA will be further extended to solve the multi-objective green multi-period two-echelon vehicle routing problem with time windows, and effective algorithms will be designed for its solution. Acknowledgments. This research was supported by National Natural Science Foundation of China (61963022 and 62173169), the Basic Research Key Project of Yunnan Province (202201AS070030) and the Science Research Foundation of Yunnan Education Bureau, China (2022J0062).

References

1. Dantzig, G.B., Ramser, J.H.: The truck dispatching problem. Manag. Sci. 6(1), 80–91 (1959)
2. Pang, Y., Luo, H.L., Xing, L.N., Ren, T.: A survey of vehicle routing optimization problems and solution methods. Control Theory Appl. 36(10), 1573–1584 (2019)
3. Döyen, A., Aras, N., Barbarosoğlu, G.: A two-echelon stochastic facility location model for humanitarian relief logistics. Optim. Lett. 6(6), 1123–1145 (2012)
4. Li, H., Wang, H., Chen, J., Bai, M.: Two-echelon vehicle routing problem with time windows and mobile satellites. Transp. Res. Part B: Methodol. 138, 179–201 (2020)


5. Dellaert, N., Van Woensel, T., Crainic, T.G., Saridarq, F.D.: A multi-commodity two-echelon capacitated vehicle routing problem with time windows: model formulations and solution approach. Comput. Oper. Res. 127, 105154 (2021)
6. Chen, P.C., Kendall, G., Berghe, G.V.: An ant based hyper-heuristic for the travelling tournament problem. In: 2007 IEEE Symposium on Computational Intelligence in Scheduling, Honolulu, HI, USA, pp. 19–26. IEEE (2007)
7. Duhart, B., Camarena, F., Ortiz-Bayliss, J.C., Amaya, I., Terashima-Marín, H.: An experimental study on ant colony optimization hyper-heuristics for solving the knapsack problem. In: Martínez-Trinidad, J.F., Carrasco-Ochoa, J.A., Olvera-López, J.A., Sarkar, S. (eds.) MCPR 2018. LNCS, vol. 10880, pp. 62–71. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-92198-3_7
8. Guo, N., Qian, B., Hu, R., Jin, H.P., Xiang, F.H.: A hybrid ant colony optimization algorithm for multi-compartment vehicle routing problem. Complexity 2020, 1–14 (2020). https://doi.org/10.1155/2020/8839526
9. Perboli, G., Tadei, R., Vigo, D.: The two-echelon capacitated vehicle routing problem: models and math-based heuristics. Transp. Sci. 45(3), 364–380 (2011)
10. Crainic, T.G., Perboli, G., Mancini, S., Tadei, R.: Two-echelon vehicle routing problem: a satellite location analysis. Procedia Soc. Behav. Sci. 2(3), 5944–5955 (2010)
11. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002)
12. Zhang, Q., Li, H.: MOEA/D: a multiobjective evolutionary algorithm based on decomposition. IEEE Trans. Evol. Comput. 11(6), 712–731 (2007)
13. Jaszkiewicz, A.: On the performance of multiple-objective genetic local search on the 0/1 knapsack problem - a comparative experiment. IEEE Trans. Evol. Comput. 6(4), 402–412 (2002)
14. Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: improving the strength Pareto evolutionary algorithm. In: Proceedings of the 5th Conference on Evolutionary Methods for Design, Optimization and Control with Applications to Industrial Problems, pp. 95–100 (2001)
15. Ishibuchi, H., Yoshida, T., Murata, T.: Balance between genetic search and local search in memetic algorithms for multiobjective permutation flowshop scheduling. IEEE Trans. Evol. Comput. 7(2), 204–223 (2003)

A Task-Duplication Based Clustering Scheduling Algorithm for Heterogeneous Computing System Ping Zhang1,2(B) , Jing Wu1,2 , Di Cheng1,2 , Jianhua Lu1,2 , and Wei Hu1,2 1 College of Computer Science and Technology, Wuhan University of Science and Technology,

Wuhan, China [email protected] 2 Hubei Key Laboratory of Intelligent Information Processing and Real-Time Industrial Systems, Wuhan, China

Abstract. Efficient scheduling algorithms have always been a hot research topic in heterogeneous computing systems. The Task Duplication Based (TDB) scheme is an important technique for this problem; its main idea is to trade the computational cost of tasks for the communication cost. In this paper, we propose a Task-Duplication based Clustering Scheduling (TDCS) algorithm that uses task duplication and clustering to incur as little communication cost as possible and thus reduce the overall completion time. The TDCS scheduling process consists of three main phases. Firstly, the algorithm calculates the critical predecessor of each task. Secondly, clusters are generated along the critical-predecessor trails of the tasks; this phase is accompanied by the generation of task duplication. Finally, an appropriate processor is selected for each cluster, taking the communication costs into account. The experiments and analysis are based on randomly generated graphs with various parameters, including the degree of DAG regularity and the communication-to-computation cost ratio, as well as the number of processors and the degree of heterogeneity. The results showed that the algorithm is highly competitive and can effectively reduce the scheduling length. Keywords: Scheduling Algorithm · Task Clustering · Task Duplication · Heterogeneous Environment

1 Introduction Due to the emergence of advanced technologies such as big data and machine learning, and the ever-growing size and complexity of applications, parallel and distributed computing has become an increasingly important research priority [1]. The main goal of the scheduling mechanism is to map tasks to processors and then sequence their execution to meet precedence requirements and achieve the minimum overall completion time. Yet, in the majority of cases, it has been shown that the problem of scheduling tasks subject to the required precedence relations is NP-complete [2]. As a result, previous


research has focused on finding sub-optimal scheduling solutions with low complexity [3], where heuristics have been widely used. In recent years, numerous heuristics have been proposed; these can be classified into task clustering scheduling algorithms, task duplication scheduling algorithms and list scheduling algorithms. In addition, two of these three approaches have also been combined to improve the scheduling quality, and significant improvements have been reported. Therefore, this paper focuses on the DAG scheduling problem in a heterogeneous computing system, combining task clustering and task duplication-based approaches with the objective of minimizing the maximum completion time. The main contributions of this work are as follows: • We propose a new clustering method in which, once each task is assigned to its critical processor, the critical predecessor of every task except the entry task is recorded, and clusters are generated along the critical-predecessor trails of the tasks. This process is accompanied by the generation of task duplication. • The allocation of appropriate processors to clusters is based on the sum of the execution costs of the tasks in the cluster on each processor and the communication costs required by the cluster. The simulation experiments showed that the algorithm obtains better results under various test parameters, outperforming the LDLS algorithm in 57.79% of cases and HEFT in 64.47% of cases. The full text is organized as follows. In Sect. 2, we review the relevant work. The problems considered are described in Sect. 3. Section 4 describes the algorithm in detail. Experimental results are given in Sect. 5, and conclusions are given in Sect. 6.

2 Related Work In this section, we present a concise overview of task scheduling algorithms, with a specific focus on the three primary types of DAG scheduling algorithms: list scheduling, cluster-based scheduling and task duplication-based scheduling. Along with the description of these algorithms, the related key strategies, such as the list, duplication and clustering techniques, are also described. DAG scheduling assigns tasks to appropriate processors for execution; it determines both how the tasks are assigned and the order in which they are executed, with the aim of minimizing the total completion time, or makespan. List scheduling algorithms generate a list of tasks based on their priority and then assign them to processors in turn. These algorithms differ in how the priority is defined or how the tasks are assigned to the processors [4]. Examples include the Heterogeneous Earliest Finish Time (HEFT) algorithm [5] and the Predict Earliest Finish Time (PEFT) algorithm [6]. Cluster-based scheduling algorithms first divide tasks into clusters and then allocate these clusters to different processors in order to minimize the communication costs among processors. Their solutions are particularly efficient if the number of available processors is greater than the number of clusters. Several advanced algorithms exist, such as the improved ACS algorithm by CA for task scheduling [7], the distributed energy-efficient clustering algorithm [8] and the time-constrained single-arm cluster scheduling algorithm [9].


Task-duplication based algorithms duplicate critical tasks when needed, and the duplicated tasks are assigned to different processors. Examples include the limited duplication-based list scheduling algorithm (LDLS) [10], the improved task duplication-based clustering algorithm (TDCF) [11], the list scheduling-based duplication and insertion algorithm (DILS) [12], and the task duplication-based clustering algorithm (TDCA) [13].

3 Models and Problem Definition 3.1 Application Models We consider a heterogeneous computing system containing p processors, with the set of processors denoted as P = {p1, p2, . . . , pp}. Assuming that interoperability between processors is not otherwise constrained, there is no competition for communication. The communication overhead within the same processor is negligible compared to the communication overhead between different processors. A weighted directed acyclic graph DAG = {V, E} is used to represent the precedence constraints between tasks, where V represents the set of nodes or tasks, V = {v1, v2, . . . , vn}, and n is the total number of tasks. The set of edges E represents the dependencies between tasks: each edge ei,j ∈ E means that only after the successful completion of task vi can task vj start. The description of the edge weights is detailed in Definition 1.


Fig. 1. An example task graph. (a) Directed acyclic graph of tasks. (b) Execution times of all tasks in the DAG on different processors.

The computational heterogeneity of heterogeneous distributed systems results in varying execution times of the same task on different processors and differing execution times of different tasks on the same processor, as noted in previous studies [14]. Figure 1 shows an example of a task graph. The heterogeneity of the processors is shown in detail in Fig. 1(b). In this paper, the sets pre(vi) and suc(vi) denote the predecessor and successor tasks of the current task, respectively. A task without any predecessor is an entry task,


and a task without a successor task is referred to as an exit task. For task graphs with multiple entry nodes, we introduce a pseudo-entry node connected to these entry nodes. Similarly, a pseudo-exit node is introduced and connected to the exit nodes of the graph. It is worth noting that both the pseudo-entry node and the pseudo-exit node have a runtime of 0 on any processor, and the resulting edges have a weight of 0. The scheduling length corresponds to the time when the last exit node finishes its execution.

3.2 Problem Definition

Some common attributes of task scheduling are defined below; these definitions are referenced in the next sections.

Definition 1. wi,j. For vj ∈ pre(vi), if tasks vi and vj are assigned to processors pm and pn respectively, the communication cost between the two tasks is defined as:

w_{i,j} = \begin{cases} 0, & m = n \\ c_{i,j}, & m \neq n \end{cases}    (1)

Each edge ei,j ∈ E corresponds to a non-negative communication time ci,j, which represents the data transfer time from task vi to its direct successor vj when vj is allocated to another processor.

Definition 2. EC(i). For vi ∈ V and pm ∈ P, EC(i, m) denotes the execution time of task vi on processor pm. The average execution time of task vi is defined as:

EC(i) = \frac{1}{p} \sum_{m=1}^{p} EC(i, m)    (2)

Definition 3. est(i, m). For vi ∈ V and pm ∈ P, est(i, m) denotes the earliest start time of task vi if it is assigned to processor pm. For the entry task, est(i, m) = 0; otherwise,

est(i, m) = \max_{v_j \in pre(v_i)} \left\{ \min_{p_n \in P} ect(j, n) + w_{j,i} \right\}    (3)

Definition 4. ect(i, m). For vi ∈ V and pm ∈ P, ect(i, m) is the earliest completion time of task vi on processor pm; it equals est(i, m) plus the computational cost of task vi on processor pm:

ect(i, m) = est(i, m) + EC(i, m)    (4)

Definition 5. cpro(vi). For vi ∈ V, cpro(vi) is the critical processor of task vi. In further detail, when the inter-task communication time is not considered, est(i, m) and ect(i, m) are calculated for each task on all processors, and the processor that minimizes the earliest completion time of task vi is taken as its critical processor:

cpro(v_i) = \arg\min_{p_m \in P} ect(i, m)    (5)
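As a rough illustration of Definitions 3-5 as they are used in the preparation phase (where communication costs are ignored, so w_{j,i} = 0), the following Python sketch computes est, ect and the critical processor for tasks processed in priority order; the data structures and function names are assumptions made here for illustration, not part of the paper.

```python
def critical_processors(tasks_in_priority_order, pred, EC, num_procs):
    """Assign each task to its critical processor, ignoring communication costs.

    tasks_in_priority_order: task ids sorted by non-increasing priority
    pred[i]: list of predecessor task ids of task i
    EC[i][m]: execution time of task i on processor m
    Returns dicts est, ect (on the critical processor) and cpro per task.
    """
    est, ect, cpro = {}, {}, {}
    for i in tasks_in_priority_order:
        # Earliest start: all predecessors must have finished (w = 0 here)
        ready = max((ect[j] for j in pred[i]), default=0.0)
        # Eq. (5): pick the processor with the minimum earliest completion time
        best_m = min(range(num_procs), key=lambda m: ready + EC[i][m])
        est[i] = ready
        ect[i] = ready + EC[i][best_m]          # Eq. (4)
        cpro[i] = best_m
    return est, ect, cpro
```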


Definition 6. cpre(vi). For vi, vj ∈ V, cpre(vi) is the critical predecessor of task vi, provided that the predecessors of task vi have been assigned to their critical processors in the sense of Definition 5. cpre(vi) is the last predecessor to deliver its result to task vi:

cpre(v_i) = \arg\max_{v_j \in pre(v_i)} ect(j, cpro(v_j))    (6)

Definition 7. rank(vi). For vi ∈ V, rank(vi) is the priority of task vi, which refers to the length of the longest path from vi to vexit in the graph. The weight of node vi corresponds to the maximum computational cost of the task across the different processors, while the weight of an edge represents the communication cost between the connected nodes.

rank(v_i) = \max_{v_k \in suc(v_i)} \{ rank(v_k) + w_{i,k} \} + \max_{p_m \in P} EC(i, m)    (7)

Definition 8. s. A cluster s consists of a number of tasks. The cluster set is denoted as S = {s1, s2, . . . , st}. For vi ∈ V, s(i) refers to the cluster in which task vi is present.

Definition 9. sc(sk). For sk ∈ S, sc(sk) refers to the total communication cost required by the tasks in cluster sk:

sc(s_k) = \sum_{v_i \in s_k,\; v_j \in pre(v_i),\; s(j) \neq s(i)} C(i, j)    (8)

Definition 10. fprc(k, r). For sk ∈ S and integer r ∈ [1, p], fprc(k, r) refers to the r-th favorite processor of cluster sk, satisfying Inequality (9):

\sum_{v_i \in s_k} EC(i, fprc(k, 1)) \le \cdots \le \sum_{v_i \in s_k} EC(i, fprc(k, r))    (9)

Cluster sk has the lowest sum of task execution costs on fprc(k, 1), and the order in which cluster sk selects processors is obtained by sorting these sums of execution costs in non-decreasing order.

Definition 11. level(vi). level(vi) denotes the level at which task vi is placed, which is the maximum number of path edges from the entry node to vi itself. Suppose level(ventry) = 0. Then

level(v_i) = \max_{v_j \in pre(v_i)} \{ level(v_j) \} + 1    (10)
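For illustration, a small Python sketch of how rank(vi) (Eq. 7) and level(vi) (Eq. 10) could be computed by memoized recursion over the DAG is given below; the dictionary-based graph representation and the example values are assumptions for this sketch only.

```python
from functools import lru_cache

# Hypothetical DAG: succ[i] lists successors, pred[i] lists predecessors,
# w[(i, k)] is the communication cost of edge (i, k), EC[i] the execution times.
succ = {1: [2, 3], 2: [4], 3: [4], 4: []}
pred = {1: [], 2: [1], 3: [1], 4: [2, 3]}
w = {(1, 2): 5, (1, 3): 7, (2, 4): 3, (3, 4): 2}
EC = {1: [4, 6], 2: [3, 5], 3: [6, 4], 4: [2, 3]}

@lru_cache(maxsize=None)
def rank(i):
    # Eq. (7): longest path to the exit, node weight = max execution cost
    node_w = max(EC[i])
    tail = max((rank(k) + w[(i, k)] for k in succ[i]), default=0)
    return tail + node_w

@lru_cache(maxsize=None)
def level(i):
    # Eq. (10): maximum number of edges from the entry node, level(entry) = 0
    return max((level(j) + 1 for j in pred[i]), default=0)

print([rank(i) for i in sorted(succ)], [level(i) for i in sorted(succ)])
```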

4 Proposed Algorithm Clustered scheduling can group tasks with high dependencies onto a single processor so that the communication cost between them is negligible, resulting in a smaller overall schedule length.


The main steps in the execution of the algorithm are the preparation phase, the initial clustering and the assignment of processors. Task duplication is generated during the cluster generation process.

4.1 Preparation Phase

The algorithm uses task duplication and intends to incur as little communication cost as possible, so communication costs between tasks are not considered during the generation of clusters. The critical predecessors of the tasks are used for cluster generation, so this stage is significant.

• Without considering communication costs, each task is evaluated on all processors in non-increasing order of priority, and the critical processor of each task is obtained by calculating its earliest start time and earliest completion time (Eq. 5).
• Assign every task to its critical processor to obtain the critical predecessor of each task (Eq. 6); the critical predecessor of the entry task is null.

4.2 Initial Clustering

The critical predecessor of a task is the bottleneck task that prevents the task from starting execution earlier, and the communication time between tasks in the same cluster is zero. Therefore, we can exploit this advantage of clustering by putting a task and its critical predecessor in the same cluster so that the task can start as early as possible. The details of generating the clusters are shown in Algorithm 1.

• Iterate through all tasks and find the tasks that are not the critical predecessor of any task; put these tasks into the readylist in turn, then iterate through the tasks in the readylist. If vi is a task in the readylist, generate a cluster sk and add vi to it, sk = {vi}. Then Inequality (11) needs to be checked. (The tasks stored in the readylist can be understood as the seed tasks for generating the initial clusters.)

cpre(v_i) \neq null    (11)

If Inequality (11) is satisfied, let cpre(vi) = vj, add vj to cluster sk so that the cluster becomes sk = {vi, vj}; then test vj against Inequality (11) again and repeat these operations until the inequality is no longer satisfied. If Inequality (11) is not satisfied for vi itself, the generated cluster is simply sk = {vi}.


• Sort the tasks in the generated cluster by non-increasing priority.
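A minimal Python sketch of this cluster-generation idea (seed tasks that are nobody's critical predecessor, then following the critical-predecessor trail, which naturally duplicates tasks shared by several trails) is shown below; the variable names and the priority handling are assumptions made for illustration.

```python
def initial_clusters(tasks, cpre, rank):
    """Generate clusters by following critical-predecessor trails.

    tasks: iterable of task ids
    cpre[i]: critical predecessor of task i, or None for the entry task
    rank[i]: priority of task i (higher = scheduled earlier)
    Returns a list of clusters; a task may appear in several clusters,
    which is where task duplication comes from.
    """
    critical_preds = {cpre[i] for i in tasks if cpre[i] is not None}
    # Seed tasks: tasks that are not the critical predecessor of any task
    readylist = [i for i in tasks if i not in critical_preds]

    clusters = []
    for seed in readylist:
        cluster, t = [seed], seed
        while cpre[t] is not None:          # Inequality (11)
            t = cpre[t]
            cluster.append(t)
        # Sort tasks in the cluster by non-increasing priority
        cluster.sort(key=lambda i: rank[i], reverse=True)
        clusters.append(cluster)
    return clusters
```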

4.3 Processor Selection Phase

After the previous steps have been completed, we have generated a number of clusters and obtained the order fprc(k, r) in which each cluster selects processors. The details of assigning processors to clusters are shown in Algorithm 2.

• For sk ∈ S, iterate through fprc(k, 1) of all clusters. If a cluster sk is the only one whose fprc(k, 1) is processor pm, then the tasks in sk are assigned to pm. If several clusters share the same fprc(k, 1), these clusters are temporarily left unassigned.
• If several clusters have the same fprc(k, 1), the one with the larger communication cost sc(sk) has priority in choosing the processor. A larger sc(sk) means that the tasks in cluster sk wait longer for the data they need, so it is harder for this cluster to complete all its tasks than for clusters with smaller sc(sk); globally, this cluster should therefore choose its processor first.
• Each remaining cluster sk now has its fprc(k, 1) occupied by other clusters. Arrange the remaining clusters by non-increasing sc(sk), again letting the cluster with the larger sc(sk) choose first. Iterate through the processors of cluster sk in the order of (12) until the r-th processor fprc(k, r) is free, and assign the tasks of cluster sk to that processor. If all processors are occupied, cluster sk is assigned directly to fprc(k, 1).

fprc(k, 1), . . . , fprc(k, r)    (12)


• After the previous steps, there may be two duplicate tasks assigned to the same processor, so it is necessary to remove the duplicate tasks on the same processor and arrange the tasks on each processor in a non-increasing order of priority.
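The following Python sketch illustrates one way this processor-selection logic could be expressed; it collapses the paper's case analysis into a single pass (clusters ordered by non-increasing sc, each taking its first free favourite processor and falling back to fprc(k, 1) if none is free), and it omits the final removal of duplicate tasks on the same processor. The data structures are assumptions for illustration only.

```python
def assign_clusters(clusters, sc, fprc):
    """Assign each cluster to a processor.

    clusters: list of cluster ids
    sc[k]: total communication cost of cluster k (Definition 9)
    fprc[k]: processors for cluster k sorted by non-decreasing total
             execution cost (fprc[k][0] is the favourite, Definition 10)
    """
    occupied = set()
    assignment = {}
    for k in sorted(clusters, key=lambda k: sc[k], reverse=True):
        # First free favourite processor, else fall back to fprc[k][0]
        target = next((p for p in fprc[k] if p not in occupied), fprc[k][0])
        assignment[k] = target
        occupied.add(target)
    return assignment
```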

5 Experimental Results and Discussion In this section, we present an evaluation of our proposed TDCS algorithm, comparing it against existing algorithms such as PEFT [6], LDLS [10], and HEFT [5] with respect to each comparison metric.


5.1 Comparative Indicators

In the evaluation of algorithm performance, the scheduling length (or makespan) of a DAG is the basic criterion for evaluation. When dealing with larger task graphs, a scheduling length calculation method normalized to a smaller range is used. This measure, referred to as the Scheduling Length Ratio (SLR), is defined mathematically in Eq. (13):

SLR = \frac{makespan}{\sum_{v_i \in CP_{min}} \min_{p_m \in P} \{ EC(i, m) \}}    (13)
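As a small illustration, the SLR could be computed as below once the makespan and the tasks on the critical path CP_min are known; the inputs in the example are hypothetical.

```python
def slr(makespan, cp_min_tasks, EC):
    """Eq. (13): makespan normalized by the sum of the minimum execution
    times of the tasks on the critical path CP_min."""
    denom = sum(min(EC[i]) for i in cp_min_tasks)
    return makespan / denom

# Hypothetical example: three critical-path tasks, two processors each
EC = {0: [4, 6], 3: [2, 3], 5: [7, 5]}
print(slr(18.0, [0, 3, 5], EC))  # 18 / (4 + 2 + 5)
```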

The denominator of the formula is obtained by summing the minimum execution times of all tasks along CP_min of the graph (CP_min is the critical path of the graph when the execution time of every task is taken as its minimum over the processors). The scheduling performance is better when the SLR is smaller.

5.2 Randomly Generated Task Graphs

In this section, we introduce the method and parameters used to generate random graphs, as well as an evaluation analysis of the algorithm.

Random Graph Parameters. We performed scheduling experiments with six distinct parameters to gain a comprehensive understanding of how our algorithm behaves on different task graphs, and we provide a detailed analysis of the effects of three of them. We implemented these parameters using the DAG generation procedure provided in [15] to generate DAG graphs with varying shapes.

• n: the number of tasks in the task graph of the application. n = [10, 20, 30, 40, 50, 60].
• regularity: the regularity of the number of nodes between different layers. Regularity = [0.2, 0.5, 0.8].
• jump: the maximum number of layers that an edge is allowed to traverse in the DAG, i.e., all tasks satisfy jump > level(vj) − level(vi) for vj ∈ suc(vi). Jump = [2, 4, 7].
• balance (β): controls the balance of computation times of the random tasks; it can also be interpreted as a processor-heterogeneity factor and specifies the range of values of EC(i, m). β = [0.5, 1, 1.5].
• CCR (communication-to-computation ratio): the ratio of the sum of edge weights to the sum of node weights in the task graph. The overall cost of a task graph consists of the computation cost of the tasks themselves and the communication cost between tasks, so CCR measures the proportion of communication to computation in the task graph. CCR = [0.5, 1, 4, 8, 10, 20].
• Processor: the number of processors; the larger this number, the more parallel computing power is available. Processor = [8, 16, 32].

Performance of Random Graphs. The parameters listed above were used to generate the random task graphs utilized for evaluation.


Figure 2 presents the comparison of the average scheduling length for different numbers of tasks and CCRs. It can easily be seen that the TDCS algorithm improves significantly on the HEFT, PEFT and LDLS algorithms; in particular, when the number of tasks is greater than 30, TDCS reduces the average scheduling length relative to LDLS by (7.78%, 7.69%, 8.41%, 8.81%). When the CCR is less than 2.5, there is almost no difference between the scheduling length ratios of the four algorithms; when the CCR is greater than 2.5, the SLR of TDCS becomes significantly lower than that of HEFT, PEFT and LDLS as the CCR grows, and at CCRs of (4, 8, 10, 20) the performance of TDCS improves relative to LDLS by (13%, 15.07%, 14.90%, 12.90%).

Fig. 2. Average makespan for random graphs of task numbers and CCR.

Fig. 3. Average SLR for Gaussian elimination as a function of the CCR of the DAG and the heterogeneity of the processors.

5.3 Real-World Application Graphs In addition to the randomly generated task graphs, we considered the performance of the algorithms on realistic applications. We use Gaussian elimination and the fast Fourier transform as the basic graph structures and generate task graphs with different weights based on the parameters CCR and β. Gaussian Elimination. Figure 3 shows that for task graphs with higher heterogeneity (higher β values), the TDCS algorithm improves on LDLS by (4.13%, 6.72%, and 11.32%), respectively.


Fig. 4. Average SLR for fast Fourier transform as a function of CCR of DAG and heterogeneity of processors.

For low communication rates (CCR = 0.1), the scheduling results of the algorithms are generally consistent, and the average speedup of TDCS over LDLS for the higher CCR cases (1, 2, 5) is (5.58%, 7.65%, 9.01%). Fast Fourier Transform. As shown in Fig. 4, the scheduling results of TDCS are slightly better than those of LDLS at CCR = 5. The total average speedups obtained by TDCS over LDLS, PEFT and HEFT correspond to (0.69%, 8.01%, and 10.54%), respectively. The performance improvement of TDCS on this highly interconnected graph is not significant compared to LDLS, but compared to PEFT and HEFT it is still considerable. Table 1 gives the percentages of better, equal and worse makespan results, which are used to compare the performance of the algorithms. Compared to LDLS, TDCS has a shorter task scheduling length in 57.79% of cases, and compared to PEFT and HEFT, TDCS obtains an improved solution in 60.93% and 64.47% of cases, respectively. From the experimental results, we can conclude that the proposed algorithm is suitable for solving the problem under consideration.

Table 1. A pairwise comparison of the scheduling algorithms in terms of overall makespan

               TDCS       LDLS       PEFT       HEFT
TDCS   better  *          57.79%     60.93%     64.47%
       equal   *          2.57%      0.81%      0.25%
       worse   *          39.64%     38.26%     35.23%
LDLS   better  39.64%     *          40.04%     61.12%
       equal   2.57%      *          28.63%     2.02%
       worse   57.79%     *          31.33%     36.79%
PEFT   better  38.26%     31.33%     *          62.78%
       equal   0.81%      28.63%     *          5.91%
       worse   60.93%     40.04%     *          31.31%
HEFT   better  35.23%     36.79%     31.31%     *
       equal   0.25%      2.02%      5.91%      *
       worse   64.47%     61.12%     62.78%     *
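The kind of pairwise tally shown in Table 1 could be computed from per-instance makespans as in the following Python sketch; the input format (a dict mapping algorithm names to lists of makespans over the same instances) is an assumption made here for illustration.

```python
def pairwise_comparison(makespans):
    """Return better/equal/worse percentages for every ordered pair of algorithms.

    makespans: dict {algorithm: [makespan per instance]}, all lists aligned.
    """
    algos = list(makespans)
    n = len(next(iter(makespans.values())))
    result = {}
    for a in algos:
        for b in algos:
            if a == b:
                continue
            better = sum(x < y for x, y in zip(makespans[a], makespans[b]))
            equal = sum(x == y for x, y in zip(makespans[a], makespans[b]))
            worse = n - better - equal
            result[(a, b)] = (100 * better / n, 100 * equal / n, 100 * worse / n)
    return result
```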

6 Conclusion In this paper, we proposed a cluster scheduling algorithm for heterogeneous computing systems based on task duplication. In contrast to existing algorithms, its task duplication is generated along with the cluster generation process, which exploits the advantage of cluster scheduling with task duplication by exchanging the computational cost of tasks for the communication cost. The communication cost of the tasks in a cluster is also taken into account in the selection of its processor. Simulation experiments showed that the TDCS algorithm outperforms the LDLS, PEFT and HEFT algorithms in terms of the frequency of best results, performing better than LDLS in 57.79% of cases and better than HEFT in 64.47% of cases. In the future, we can add reliability considerations to this model.

References 1. Ahmad, W., Alam, B.: An efficient list scheduling algorithm with task duplication for scientific big data workflow in heterogeneous computing environments. Concurrency and Computation: Practice and Experience 33(5), e5987 (2021) 2. Pandey, V., Saini, P.: A heuristic method towards deadline-aware energy-efficient mapreduce scheduling problem in Hadoop YARN. Cluster Comput. 24(2), 683–699 (2020). https://doi. org/10.1007/s10586-020-03146-7 3. Wang, L., Wu, W., Xu, Z., et al.: Blasx: a high performance level-3 blas library for heterogeneous multi-gpu computing. In: Proceedings of the 2016 International Conference on Supercomputing, pp. 1–11 (2016) 4. Hu, Y., Zhou, H., de Laat, C., et al.: Concurrent container scheduling on heterogeneous clusters with multi-resource constraints. Futur. Gener. Comput. Syst. 102, 562–573 (2020) 5. Topcuoglu, H., Hariri, S., Wu, M.Y.: Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Trans. Parallel Distrib. Syst. 13(3), 260–274 (2002) 6. Arabnejad, H., Barbosa, J.G.: List scheduling algorithm for heterogeneous systems by an optimistic cost table. IEEE Trans. Parallel Distrib. Syst. 25(3), 682–694 (2013) 7. Liu, N., Ma, L., Ren, W., et al.: An improved ACS algorithm by CA for task scheduling in heterogeneous multiprocessing environments, pp. 216–235. Springer Nature Singapore, Singapore (2022). https://doi.org/10.1007/978-981-19-8152-4_16


8. Arafat, M.Y., Pan, S., Bak, E.: Distributed energy-efficient clustering and routing for wearable IoT enabled wireless body area networks. IEEE Access 11, 5047–5061 (2023). https://doi. org/10.1109/ACCESS.2023.3236403 9. Yuan, F., Zhao, Q., Huang, B., et al.: Scheduling of time-constrained single-arm cluster tools with purge operations in wafer fabrications. J. Syst. Architect. 134, 102788 (2023) 10. Guo, H., Zhou, J., Gu, H.: Limited duplication-based list scheduling algorithm for heterogeneous computing system. Micromachines 13(7), 1067 (2022) 11. Fan, W., Zhu, J., Ding, K.: An improved task duplication based clustering algorithm for DAG task scheduling in heterogenous and distributed systems. In: 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 878–883. IEEE (2022) 12. Shi, L., Xu, J., Wang, L., et al.: Multijob associated task scheduling for cloud computing based on task duplication and insertion. Wirel. Commun. Mob. Comput. 2021, 1–13 (2021) 13. He, K., Meng, X., Pan, Z., et al.: A novel task-duplication based clustering algorithm for heterogeneous computing environments. IEEE Trans. Parallel Distrib. Syst. 30(1), 2–14 (2018) 14. Cheng, D., Hu, W., Liu, J., et al.: Permanent fault-tolerant scheduling in heterogeneous multi-core real-time systems. In: 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 673–678. IEEE (2021) 15. DAGGEN: A Synthetic Task Graph Generator. https://github.com/frs69wq/daggen. Accessed 10 Feb 2023

Hyper-heuristic Q-Learning Algorithm for Flow-Shop Scheduling Problem with Fuzzy Processing Times Jin-Han Zhu1 , Rong Hu1,2(B) , Zuo-Cheng Li1 , Bin Qian1,2 , and Zi-Qi Zhang1 1 School of Information Engineering and Automation, Kunming University of Science and

Technology, Kunming 650500, China [email protected] 2 School of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming 650500, China

Abstract. The flow-shop scheduling problem (FSP) has been extensively studied over the past few decades. Q-learning algorithms (QLAs) have been shown to be effective in evolving efficient heuristics for scheduling problems. In this paper, we propose a hyper-heuristic Q-learning algorithm (HHQLA) for the FSP with fuzzy processing times (FSPF) to minimize the fuzzy makespan. FSPF accounts for the existence of various disturbing factors in the manufacturing industry that make the processing time fuzzy. To capture this, we use triangular fuzzy numbers (TFNs) to express the processing time more realistically. Firstly, four low-level heuristics (LLHs) are designed based on the characteristics of FSPF, and the permutations of the LLHs are used as high-level individuals. Secondly, the Q-learning algorithm (QLA) is used to extract valuable information from excellent high-level individuals. Thirdly, a new sampling strategy is designed to apply the information obtained by QLA to generate new high-level individuals and search for potential regions in the solution space. Finally, the experimental results demonstrate that the proposed HHQLA can effectively solve FSPF. Keywords: Fuzzy processing times · Hyper-heuristic algorithm · Q-learning algorithm

1 Introduction The flow-shop scheduling problem (FSP) is a prominent combinatorial optimization problem with many real-world applications that has received extensive attention from scholars [1, 2]. FSP aims to determine the optimal sequence for processing the jobs. In actual production, the processing time is often not a definite value due to raw material differences [3] and operator proficiency [4]; therefore, fuzzy processing times are introduced. FSP with fuzzy processing times (FSPF) is an extension of FSP. FSPF is known to be NP-hard [5], and exact methods cannot obtain a solution for FSPF in a very short time. Therefore, designing an effective algorithm to solve FSPF is a challenging and urgent task.


Different from FSP, FSPF considers the fuzziness that exists in actual production, using fuzzy sets to express processing times. The concept of fuzzy sets was proposed by Zadeh [6]. The scheduling problem with fuzzy processing times was first presented by McCahon and Lee [7]. Since then, especially in recent years, an increasing number of scholars have studied scheduling problems with fuzzy characteristics. The fuzzy job-shop scheduling problem (FJSP) [8], the fuzzy flexible job-shop scheduling problem (FFJSP) [9], and the distributed fuzzy hybrid flow-shop scheduling problem (DFHFSP) [10] have been studied by domestic and foreign scholars. In much of the literature, the fuzzy processing time is expressed in the form of triangular fuzzy numbers (TFNs). In this paper, we use the ranking criterion of triangular fuzzy numbers proposed by Li et al. [11] to rank the triangular fuzzy numbers more reasonably. Using an efficient algorithm to solve the proposed FSPF is essential. The effectiveness of heuristics varies depending on the scheduling scenario, and creating heuristics manually is a tedious process. As a result, hyper-heuristics have been proposed to learn effective heuristics [12]. The Q-learning algorithm (QLA) possesses strong learning capabilities; QLA and its variants have been used for the semiconductor final test scheduling problem (SFTSP) [13], the job-shop scheduling problem (JSP) [14], and the assembly flow-shop scheduling problem (AFSP) [15]. The hyper-heuristic algorithm (HHA) consists of a series of low-level heuristics (LLHs) and a high-level strategy (HLS) to search different neighborhoods of the solution space. Because QLA is efficient in learning, the Q-learning algorithm is adopted as the HLS, which constitutes the hyper-heuristic Q-learning algorithm (HHQLA). HHQLA implements a tight combination of QLA and LLHs, which may be more effective for guiding the algorithm to search promising regions of the solution space of FSPF. The main contributions of our paper are summarized as follows: • We consider the uncertainty of executing operations in the actual production process; the FSPF model is proposed so that it applies to more scenarios. • We propose new Q-table states and actions, which quickly guide the global search to find potential regions in the solution space. • We design an efficient sampling strategy for the proposed Q-table to generate new populations of high-level strategies. The paper is organized as follows. The problem description and triangular fuzzy numbers are introduced in Sect. 2. Section 3 proposes the hyper-heuristic Q-learning algorithm. Section 4 discusses the results of the comparison experiments. Finally, conclusions and future work are provided in Sect. 5.


2 FSPF

2.1 Problem Descriptions

The FSPF can be described as follows. There are n jobs J = {J1, J2, · · · , Jn} that must be processed on m machines M = {M1, M2, · · · , Mm}. Each job consists of m operations, and each operation is assigned to a different and uniquely designated machine. Oi,j denotes the j-th operation of the i-th job, and the operation Oi,j is assigned to the machine Mj for processing. \tilde{P}_{i,j} = (p^1_{i,j}, p^2_{i,j}, p^3_{i,j}) indicates the fuzzy processing time of operation Oi,j, which is represented by a triangular fuzzy number (TFN); p^1_{i,j}, p^2_{i,j} and p^3_{i,j} are the most optimistic, most probable, and most conservative processing times, respectively. The fuzzy completion time of Oi,j is denoted as \tilde{C}_{i,j} = (c^1_{i,j}, c^2_{i,j}, c^3_{i,j}). The FSPF aims to determine the processing sequence of the n jobs that minimizes the fuzzy makespan \tilde{C}_{max}, subject to satisfying the following processing and precedence constraints:

• The same processing path for each job, without any change being allowed.
• Each machine can process only one operation at a time and the operation cannot be interrupted.
• A job can only be processed on one machine at a time.
• The preparation time for the operation is negligible or included in the processing time.

The fuzzy makespan \tilde{C}_{max} is calculated as follows:

\tilde{C}_{1,1} = \tilde{P}_{1,1}    (1)

\tilde{C}_{i,1} = \tilde{C}_{i-1,1} + \tilde{P}_{i,1}, \quad i = 2, 3, \cdots, n    (2)

\tilde{C}_{1,j} = \tilde{C}_{1,j-1} + \tilde{P}_{1,j}, \quad j = 2, 3, \cdots, m    (3)

\tilde{C}_{i,j} = \max\{\tilde{C}_{i-1,j}, \tilde{C}_{i,j-1}\} + \tilde{P}_{i,j}, \quad i = 2, 3, \cdots, n, \; j = 2, 3, \cdots, m    (4)

\tilde{C}_{max} = \tilde{C}_{n,m}    (5)

Equations (1)–(5) calculate the completion times of the jobs once the processing sequence of the jobs is determined. The FSPF is to find a permutation π∗ in the set of all permutations Π such that:

\pi^{*} = \arg\min \{ \tilde{C}_{max}(\pi);\; \pi \in \Pi \}    (6)
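To make Eqs. (1)-(6) concrete, the Python sketch below computes the fuzzy makespan of a given job permutation using component-wise TFN addition; for brevity, the max operation is reduced here to comparing the sums of the three components, whereas the paper uses the full ranking criteria of Sect. 2.2. All names and example values are illustrative assumptions.

```python
def tfn_add(a, b):
    # Component-wise addition of two TFNs (Eq. 8)
    return tuple(x + y for x, y in zip(a, b))

def tfn_max(a, b):
    # Simplified max: compare by the sum of the three components.
    # The paper's ranking criteria (Sect. 2.2) are used instead in the original work.
    return a if sum(a) >= sum(b) else b

def fuzzy_makespan(perm, P):
    """perm: job indices in processing order; P[i][j]: TFN processing time
    of job i on machine j. Returns the fuzzy makespan (Eqs. 1-5)."""
    m = len(P[perm[0]])
    C = [[None] * m for _ in perm]
    for a, i in enumerate(perm):
        for j in range(m):
            if a == 0 and j == 0:
                C[a][j] = P[i][j]
            elif a == 0:
                C[a][j] = tfn_add(C[a][j - 1], P[i][j])
            elif j == 0:
                C[a][j] = tfn_add(C[a - 1][j], P[i][j])
            else:
                C[a][j] = tfn_add(tfn_max(C[a - 1][j], C[a][j - 1]), P[i][j])
    return C[-1][-1]

# Two jobs, two machines (hypothetical TFNs)
P = {0: [(2, 3, 4), (1, 2, 3)], 1: [(3, 4, 6), (2, 3, 5)]}
print(fuzzy_makespan([0, 1], P))
```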

2.2 Triangular Fuzzy Numbers and Related Operations Due to the many uncertainties in the production process, the processing time for each operation can only be determined as an approximate range. In FSPF, the TFNs are used to


represent the fuzzy processing time of each operation. The membership function of a TFN is as follows:

\mu_{TFN}(x) = \begin{cases} 0, & x \le t_1 \\ \dfrac{x - t_1}{t_2 - t_1}, & t_1 < x \le t_2 \\ \dfrac{t_3 - x}{t_3 - t_2}, & t_2 < x < t_3 \\ 0, & x \ge t_3 \end{cases}    (7)

The membership image of the TFN is shown in Fig. 1(a).

Fig. 1. Illustration of the TFN.

In Fig. 1(a), t1 is the most optimistic processing time, t2 is the most probable processing time with membership one, which is the kernel of this TFN, and t3 is the most conservative processing time. Similarly, the fuzzy completion time is also expressed by a TFN. Operations between TFNs follow specific rules. In the calculation of the fuzzy makespan of FSPF, the addition, ranking, and max operations are applied. The following definitions are given for the above operations to generate feasible schedules [11]. Given two TFNs Ã = (a1, a2, a3) and B̃ = (b1, b2, b3):

Addition Operation: The addition operation is defined as:

\tilde{A} + \tilde{B} = (a_1 + b_1, a_2 + b_2, a_3 + b_3)    (8)

Ranking Operation: The ranking operation is conducted based on the following three criteria:

• When the two TFNs satisfy a2 < b2, use the crossing of the corresponding sides and the kernels to judge: Z1(Ã, B̃) = (1/2 − y)(a3 − b1) + (1/2)(b3 − a1), where y is the membership value at which the corresponding sides cross. If Z1(Ã, B̃) < 0, then Ã > B̃; if Z1(Ã, B̃) > 0, then Ã < B̃.
• When Z1(Ã, B̃) = 0, use the kernel value of the TFN, Z2(Ũ) = u2, to judge. If Z2(Ã) > Z2(B̃), then Ã > B̃, else Ã < B̃.
• When Z2(Ã) = Z2(B̃), use Z3(Ũ) = u3 − u1 to judge. If Z3(Ã) > Z3(B̃), then Ã > B̃, else Ã < B̃.

Max Operation: The max of Ã and B̃ is determined by the following rule: based on the ranking operation, if Ã > B̃, then Ã ∨ B̃ = Ã, else Ã ∨ B̃ = B̃.
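For illustration, the addition and max operations can be wrapped in a small Python class as below. Since the exact ranking criterion of Li et al. [11] is only partially recoverable here, the comparison uses a commonly seen simplification for TFNs (mean value, then kernel, then spread) purely as a stand-in; it is not the paper's criterion.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TFN:
    t1: float  # most optimistic
    t2: float  # most probable (kernel)
    t3: float  # most conservative

    def __add__(self, other):
        # Eq. (8): component-wise addition
        return TFN(self.t1 + other.t1, self.t2 + other.t2, self.t3 + other.t3)

    def _key(self):
        # Stand-in ranking key: mean value, then kernel, then spread.
        # This replaces the three criteria of Sect. 2.2 for illustration only.
        return ((self.t1 + 2 * self.t2 + self.t3) / 4, self.t2, self.t3 - self.t1)

    def __gt__(self, other):
        return self._key() > other._key()

def tfn_max(a, b):
    # Max operation: keep the larger TFN according to the ranking
    return a if a > b else b

A, B = TFN(3, 5, 8), TFN(4, 5, 6)
print(A + B, tfn_max(A, B))
```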

3 HHQLA for FSPF

In this section, we propose HHQLA for solving the FSPF. Firstly, the problem-related low-level heuristics and the encoding form are presented. Secondly, a new design for the states and actions of QLA is proposed. Thirdly, a sampling strategy is designed for QLA to generate new high-level strategy populations effectively. Finally, the framework of HHQLA is described.

3.1 LLHs and Encoding

In combinatorial optimization problems, using a single LLH to perform the search easily causes the search to fall prematurely into a local optimum of the corresponding neighborhood structure, and the quality of its solutions is usually poor. The solution space of the FSPF studied in this paper is complex, and an effective search cannot be achieved using a single LLH. Therefore, based on efficient operations commonly used to solve scheduling problems, the following four low-level heuristics are designed to perform an efficient search of the solution space of FSPF.

• Jobs swap (L1): randomly selects two jobs from the job sequence and swaps them.
• Jobs adjacent swap (L2): randomly selects two adjacent jobs from the job sequence and swaps them.
• Job insert (L3): randomly selects two jobs from the job sequence and inserts the first one before the second one.
• Jobs inverse (L4): randomly selects two jobs from the job sequence and inverses all the jobs between them.

Encoding of the high-level strategy population: each individual of the high-level strategy population contains four LLHs, and the same LLH is allowed to appear repeatedly in it. From left to right, the LLHs in a high-level strategy individual are performed to update the Q-table. Figure 2 illustrates a high-level strategy individual, with the four LLHs forming a specific heuristic algorithm in a certain order. Encoding of the low-level problem population: the length of a low-level problem individual is equal to the number of jobs n, and each individual is a solution of the FSPF.
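A compact Python sketch of the four LLHs as neighborhood operators on a job permutation is given below; the function names are illustrative, and each operator returns a new permutation rather than modifying the input.

```python
import random

def jobs_swap(perm):                      # L1: swap two random jobs
    p = perm[:]
    i, j = random.sample(range(len(p)), 2)
    p[i], p[j] = p[j], p[i]
    return p

def jobs_adjacent_swap(perm):             # L2: swap two adjacent jobs
    p = perm[:]
    i = random.randrange(len(p) - 1)
    p[i], p[i + 1] = p[i + 1], p[i]
    return p

def job_insert(perm):                     # L3: remove one job, insert it before another
    p = perm[:]
    i, j = random.sample(range(len(p)), 2)
    job = p.pop(i)
    p.insert(p.index(perm[j]), job)       # position of the second job after removal
    return p

def jobs_inverse(perm):                   # L4: reverse the segment between two jobs
    p = perm[:]
    i, j = sorted(random.sample(range(len(p)), 2))
    p[i:j + 1] = reversed(p[i:j + 1])
    return p

LLHS = [jobs_swap, jobs_adjacent_swap, job_insert, jobs_inverse]
```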


Fig. 2. Illustration of the high-level strategy individual.

3.2 Q-learning Algorithm Each state and action in the QLA is associated with a Q-value which reflects the reward received for performing the action in that state. The selection of states and actions is crucial in the QLA. To better guide the algorithm to potential regions, we consider the combination of states and actions with the HHA. The crucial aspect of the HHA is in accumulating information about the permutations of LLHs in high-quality solutions. Therefore, we propose the following definitions of states and actions. States set: Obviously, the region reached after performing the four LLHs can be defined as four states denoted as {s1 , s2 , s3 , s4 }. In addition, there is a special state, the initial state in which no LLHs have been performed which is s0 . The states set can be expressed as S = {s0 , s1 , s2 , s3 , s4 }. Actions set: Define performing four LLHs as four actions. Then the actions set is denoted as A = {a1 , a2 , a3 , a4 }. To better understand the relationship between states and actions, Fig. 3 gives an illustration of the relationship between three states {s0 , s1 , s2 } and two actions {a1 , a2 }.

Fig. 3. Illustration of the relationship between states and actions.

The proposed Q-table of states and actions can effectively reflect the relationships between LLHs. Each Q-value in the Q-table is updated by the following equation:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( R_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right)    (9)
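A minimal Python sketch of this update over a 5-state by 4-action Q-table (states s0-s4 and actions a1-a4 as defined above) is shown below; the reward in the example is a placeholder, since the exact reward used by HHQLA is described only abstractly here. The learning and discount rates match the values α = 0.7 and γ = 0.8 reported in Sect. 4.

```python
ALPHA, GAMMA = 0.7, 0.8   # learning rate and discount rate from Sect. 4

def q_update(Q, s, a, reward, s_next):
    """One application of Eq. (9) on a nested-list Q-table Q[state][action]."""
    best_next = max(Q[s_next])
    Q[s][a] += ALPHA * (reward + GAMMA * best_next - Q[s][a])

# 5 states (s0..s4) x 4 actions (a1..a4), initialized to zero
Q = [[0.0] * 4 for _ in range(5)]
# Hypothetical step: from s0 perform action a2 (index 1), landing in state s2
q_update(Q, s=0, a=1, reward=1.0, s_next=2)
print(Q[0])
```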


In Eq. (9), Q(st, at) is the Q-value of the state st when performing the action at, and max Q(st+1, at+1) is the maximum Q-value at the state st+1 over the actions at+1. Rt is the reward received after performing the action at in the state st. α represents the learning rate, where α ∈ [0, 1], and γ represents the discount rate, where γ ∈ [0, 1].

3.3 Sampling Strategy

QLA can estimate the distribution characteristics of high-quality solutions in the solution space by converting the information between LLHs into the corresponding Q-values. Therefore, Q-tables can effectively describe the characteristics of promising regions. To apply the information in these Q-tables in a rational way and generate the new high-level strategy population, the following sampling strategy was designed:

Step 1: Initialize the sampling table with the same size as the Q-table.
Step 2: Sort the Q-values of each row and record the order in the sampling table.
Step 3: Calculate the cumulative sum of a row in the sampling table (each row has an equal cumulative sum), denoted as sum.
Step 4.1: Produce a random value pr where pr ∈ [0, sum).
Step 4.2: According to pr, generate an LLH using roulette wheel selection based on the sampling table.
Step 4.3: Use the LLH generated in Step 4.2 to form the individuals of the high-level strategy population.
Step 4.4: If the new high-level strategy population has not yet been generated completely, go to Step 4.1; otherwise, finish.

3.4 General Framework of HHQLA

HHQLA is a hyper-heuristic-based algorithm in which each iteration consists of learning and searching. The learning stage accumulates the characteristics of the distribution of high-quality solutions in the solution space. The search stage searches the solution space based on the learned characteristics. To better learn the characteristics of high-quality solutions in the solution space of FSPF, the learning stage should have good prediction and learning capabilities, and QLA has these advantages to better guide the search. The general framework of HHQLA is given in Algorithm 1.
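The sampling strategy could be sketched in Python as follows; since how the sorted Q-values of a row are turned into sampling weights is described only qualitatively above, the rank-based weights used here are an assumption made for illustration.

```python
import random

def build_sampling_table(Q):
    """Steps 1-2: per row of the Q-table, convert Q-values into rank-based
    sampling weights (larger Q-value -> larger weight). This particular
    weighting scheme is an assumption for illustration."""
    table = []
    for row in Q:
        order = sorted(range(len(row)), key=lambda a: row[a])
        weights = [0] * len(row)
        for rank, a in enumerate(order, start=1):
            weights[a] = rank
        table.append(weights)
    return table

def sample_individual(table, length=4, start_state=0):
    """Steps 3-4: roulette-wheel (weighted random) selection of LLHs to
    build one high-level strategy individual of the given length."""
    individual, state = [], start_state
    for _ in range(length):
        weights = table[state]
        action = random.choices(range(len(weights)), weights=weights)[0]
        individual.append(action)
        state = action + 1   # performing the selected LLH moves the search to the corresponding state row
    return individual

# Hypothetical 5-state x 4-action Q-table
Q = [[0.2, 0.5, 0.1, 0.4]] * 5
print(sample_individual(build_sampling_table(Q)))
```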


Algorithm 1: General framework of HHQLA
Input: n (number of jobs), m (number of machines), and Gmax (number of iterations)
1:  Initialize the Q-table.
2:  Encode the low-level problem population randomly, and encode the high-level strategy population with LLHs. // LLHs and encoding: see Sect. 3.1
3:  Calculate the fuzzy makespan of all individuals in the low-level problem population.
4:  Set π* := ∅, G := 0.
5:  While G ≤ Gmax do
6:      G := G + 1.
7:      Use the proposed QLA to evaluate the high-level strategy population and update the Q-table. // QLA: see Sect. 3.2
8:      Sample the Q-table to generate a new high-level strategy population. // Sampling strategy: see Sect. 3.3
9:      Use the high-level strategy population to generate a new low-level problem population.
10:     Store the best individual found so far in π*.
11: End while
Output: π* (best individual)


4 Computational Result and Comparisons To further show the effectiveness of the proposed HHQLA, we compare HHQLA with AGA [16], ES10 [17], and HHGA [18]. To ensure the generality of the experimental results, we generated 16 problem sizes (n ∈ {10, 30, 50, 70}, m ∈ {5, 10, 15, 20}). The parameters are: popsize = 50, sps = 12, α = 0.7 and γ = 0.8. We coded all the algorithms in IntelliJ IDEA and conducted the experiments on a computer with a 2.5 GHz CPU and 16 GB of memory. To verify the algorithm performance, each algorithm was independently run 20 times under the same termination condition. The SD metric is given as follows:

SD = \sqrt{\frac{1}{20} \sum_{k=1}^{20} \left( C_k - C_{avg} \right)^2}    (10)

where Ck is the sum of the components of the fuzzy makespan of the k-th run, denoted as C_k = C_k^1 + C_k^2 + C_k^3, and Cavg is the average value of Ck, k = 1, 2, · · · , 20. To check the significance of the differences between HHQLA and the other algorithms, the p-values of the nonparametric Friedman test at the 95% confidence level (CI) are reported as well. The results of the comparison tests are listed in Table 1. In addition, to make the experimental results clearer, the best values in each row are highlighted in bold. To compare the performance of all algorithms more clearly, Fig. 4 illustrates box plots for two instances. From Table 1, HHQLA outperforms all other algorithms in terms of the average value on all instances. HHQLA achieves strong stability on almost all instances in terms of SD, being inferior on only four instances. Regarding significance, the p-values for the three comparative algorithms (AGA, ES10, and HHGA) are all smaller than 0.05. Therefore, HHQLA is an effective algorithm for FSPF. The main reason is that HHQLA has a powerful search engine to drive global exploration. Furthermore, the strong learning ability of QLA and the scalability of the HHA help to explore the search space. In conclusion, the HHQLA proposed in this paper can effectively solve the FSPF.
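For illustration, the SD measure of Eq. (10) could be computed from the per-run fuzzy makespans as follows; the input list of TFN makespans is hypothetical.

```python
from math import sqrt

def sd_metric(run_makespans):
    """Eq. (10): population standard deviation of C_k, where C_k is the sum
    of the three TFN components of the k-th run's fuzzy makespan."""
    C = [sum(tfn) for tfn in run_makespans]
    c_avg = sum(C) / len(C)
    return sqrt(sum((c - c_avg) ** 2 for c in C) / len(C))

# Hypothetical results of a few runs (normally 20)
print(sd_metric([(33.0, 82.0, 138.0), (33.5, 82.4, 139.0), (32.8, 81.9, 138.2)]))
```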

Table 1. Comparison of AGA, HHGA, ES10 and HHQLA

Instance   AGA                           ES10                          HHGA                          HHQLA
n × m      Average value        SD       Average value        SD       Average value        SD       Average value        SD
10 × 5     (32.8,82.4,138.4)    1.12     (32.9,82.2,138.0)    0.22     (33.0,82.0,138.0)    0.00     (33.0,82.0,138.0)    0.00
10 × 10    (39.8,121.3,195.3)   1.52     (39.6,121.5,194.8)   2.10     (42.2,118.9,195.9)   1.80     (37.4,121.6,194.4)   0.49
10 × 15    (49.1,155.9,246.9)   1.19     (51.2,155.0,247.5)   2.01     (50.0,155.2,247.9)   1.87     (48.0,156.0,247.0)   0.00
10 × 20    (64.3,181.4,300.1)   2.05     (63.8,182.0,299.6)   2.08     (65.9,179.9,300.4)   1.56     (70.0,179.0,296.0)   0.00
30 × 5     (73.3,209.6,344.7)   4.58     (73.2,208.2,344.2)   5.98     (71.0,210.5,339.3)   3.66     (74.1,201.7,352.2)   2.73
30 × 10    (85.9,238.2,399.0)   3.31     (82.9,239.7,397.9)   4.64     (85.3,238.1,398.3)   3.37     (84.1,232.1,396.8)   3.28
30 × 15    (93.4,288.6,453.1)   3.71     (94.4,287.0,454.3)   4.80     (98.4,283.9,455.6)   5.55     (89.7,282.6,449.8)   3.43
30 × 20    (115.0,317.2,505.8)  3.04     (115.7,315.9,506.7)  3.78     (115.3,315.7,506.9)  3.12     (115.6,305.2,506.4)  2.36
50 × 5     (101.4,319.8,542.2)  4.04     (104.6,316.9,542.7)  5.21     (102.9,317.7,543.6)  3.59     (107.3,308.4,547.0)  2.65
50 × 10    (122.1,375.6,596.0)  5.62     (122.5,375.7,594.4)  5.89     (123.6,376.0,597.0)  3.40     (117.6,368.2,591.0)  4.71
50 × 15    (141.7,410.2,650.8)  6.09     (142.3,410.1,651.6)  4.70     (139.5,411.4,652.1)  6.45     (135.3,401.2,647.7)  5.98
50 × 20    (160.8,443.0,705.7)  5.14     (160.2,442.2,706.8)  5.20     (161.7,439.6,709.7)  6.16     (156.6,431.0,700.6)  6.13
70 × 5     (146.1,448.7,735.6)  3.68     (148.6,446.2,735.7)  3.67     (148.1,446.4,735.4)  2.32     (146.8,440.2,740.2)  4.57
70 × 10    (181.9,491.6,792.1)  5.43     (181.7,491.8,791.7)  4.60     (181.4,492.4,791.1)  3.98     (179.0,480.8,785.5)  3.97
70 × 15    (183.9,532.6,859.3)  5.98     (185.9,533.3,857.6)  6.35     (185.3,533.2,856.4)  6.37     (180.0,520.0,852.1)  4.88
70 × 20    (202.7,570.9,910.0)  7.60     (201.1,570.5,911.4)  7.67     (200.3,571.6,912.4)  5.98     (195.2,555.5,904.7)  4.96
p-values   0.000                         0.000                         0.001


Fig. 4. The box plots for all compared algorithms.

5 Conclusions and Future Research In this work, we address the flow-shop scheduling problem with fuzzy processing times (FSPF), which has a strong engineering background. Owing to the NP-hardness of FSPF, we propose a hyper-heuristic Q-learning algorithm (HHQLA) to solve it. Firstly, the states and actions of QLA are designed based on the characteristics of the HHA, which ensures that the relationships between LLHs can be efficiently translated into the corresponding Q-values. Secondly, the Q-table and its update equation are used to accurately record the information contained in excellent individuals. Thirdly, a sampling strategy is designed to generate new populations of high-level strategies and enable an efficient search of the solution space. Finally, the effectiveness of HHQLA has been verified by an experimental comparison on problems of different sizes. In the future, we will consider the HHQLA algorithm for solving fuzzy integrated production and transportation scheduling problems or fuzzy energy-efficient production problems. In addition, designing a local search that is closely integrated with HHQLA to improve the search capability of the algorithm is also a valuable research direction. Acknowledgments. This research was supported by the National Natural Science Foundation of China (61963022 and 62173169), the Basic Research Key Project of Yunnan Province (202201AS070030) and the Yunnan Fundamental Research Projects (202301AT070458).

References 1. Zhao, Z., Zhou, M., Liu, S.: Iterated greedy algorithms for flow-shop scheduling problems: a tutorial. IEEE Trans. Autom. Sci. Eng. 19(3), 1941–1959 (2021) 2. Gmys, J., Mezmaz, M., Melab, N., Tuyttens, D.: A computationally efficient branch-andbound algorithm for the permutation flow-shop scheduling problem. Eur. J. Oper. Res. 284(3), 814–833 (2020) 3. Kong, M., Pei, J., Xu, J., Liu, X., Yu, X., Pardalos, P.M.: A robust optimization approach for integrated steel production and batch delivery scheduling with uncertain rolling times and deterioration effect. Int. J. Prod. Res. 58(17), 5132–5154 (2020) 4. Defersha, F.M., Obimuyiwa, D., Yimer, A.D.: Mathematical model and simulated annealing algorithm for setup operator constrained flexible job shop scheduling problem. Comput. Ind. Eng. 171, 108487 (2022)


5. Wang, H., Wang, W., Sun, H., Cui, Z., Rahnamayan, S., Zeng, S.: A new cuckoo search algorithm with hybrid strategies for flow shop scheduling problems. Soft. Comput. 21(15), 4297–4307 (2016). https://doi.org/10.1007/s00500-016-2062-9 6. Zadeh, L.A.: Fuzzy sets. Inf. Control 8(3), 338–353 (1965) 7. McCahon, C.S., Lee, E.S.: Job sequencing with fuzzy processing times. Comput. Math. Appl. 19(7), 31–41 (1990) 8. Wang, G.G., Gao, D., Pedrycz, W.: Solving multiobjective fuzzy job-shop scheduling problem by a hybrid adaptive differential evolution algorithm. IEEE Trans. Ind. Inf. 18(12), 8519–8528 (2022) 9. Li, R., Gong, W., Lu, C.: Self-adaptive multi-objective evolutionary algorithm for flexible job shop scheduling with fuzzy processing time. Comput. Ind. Eng. 168, 108099 (2022) 10. Zheng, J., Wang, L., Wang, J.J.: A cooperative coevolution algorithm for multi-objective fuzzy distributed hybrid flow shop. Knowl. Based Syst. 194, 105536 (2020) 11. Li, S., Hu, R., Qian, B., Zhang, Z., Jin, H.: Hyper-heuristic genetic algorithm for solving fuzzy flexible job shop scheduling problem. Control Theory Appl. 37(02), 316–330 (2020) 12. Sánchez, M., Cruz-Duarte, J.M., Carlos Ortíz-Bayliss, J., Ceballos, H., Terashima-Marin, H., Amaya, I.: A systematic review of hyper-heuristics on combinatorial optimization problems. IEEE Access 8, 128068–128095 (2020) 13. Lin, J., Li, Y.Y., Song, H.B.: Semiconductor final testing scheduling using Q-learning based hyper-heuristic. Expert Syst. Appl. 187, 115978 (2022) 14. Wang, Y.-F.: Adaptive job shop scheduling strategy based on weighted Q-learning algorithm. J. Intell. Manuf. 31(2), 417–432 (2018). https://doi.org/10.1007/s10845-018-1454-3 15. Wang, H., Sarker, B.R., Li, J., Li, J.: Adaptive scheduling for assembly job shop with uncertain assembly times based on dual Q-learning. Int. J. Prod. Res. 59(19), 5867–5883 (2021) 16. Aminzadegan, S., Tamannaei, M., Fazeli, M.: An integrated production and transportation scheduling problem with order acceptance and resource allocation decisions. Appl. Soft Comput. 112, 107770 (2021) 17. Khurshid, B., Maqsood, S., Omair, M., Sarkar, B., Saad, M., Asad, U.: Fast evolutionary algorithm for flow shop scheduling problems. IEEE Access 9, 44825–44839 (2021) 18. Bacha, S.Z.A., Belahdji, M.W., Benatchba, K., Tayeb, F.B.S.: A new hyper-heuristic to generate effective instance GA for the permutation flow shop problem. Procedia Comput. Sci. 159, 1365–1374 (2019)

Hyper-heuristic Estimation of Distribution Algorithm for Green Hybrid Flow-Shop Scheduling and Transportation Integrated Optimization Problem Ling Bai1 , Bin Qian1,2(B) , Rong Hu1,2 , Zuocheng Li1 , and Huai-Ping Jin1 1 School of Information Engineering and Automation, Kunming University of Science and

Technology, Kunming 650500, China [email protected] 2 School of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming 650500, China

Abstract. With the deepening of economic globalization, the integrated production and transportation mode has become an inevitable tendency of the modern supply chain. At the same time, facing increasingly serious ecological and environmental problems, green production and green transportation are important ways to reduce carbon emissions. In this study, a green hybrid flow-shop scheduling and transportation integrated optimization problem (GHFSSTIOP) with the objective of minimizing total cost is investigated. To cope with this problem, we present a hyper-heuristic estimation of distribution algorithm (HHEDA). According to the features of GHFSSTIOP, a novel loading strategy is first proposed. Second, an estimation of distribution algorithm (EDA) is used as the high-level strategy to learn and accumulate the sequence and location information of the high-quality solutions in the high-level population; new high-level solutions are then generated by sampling the probabilistic model of the EDA to enhance the global search ability of HHEDA. Subsequently, four effective low-level heuristic operations are designed at the low level of HHEDA to enhance its local search capability, and a new update strategy is designed to ensure the diversity of the high-level population. Finally, the effectiveness of HHEDA is demonstrated by numerical simulation and algorithm comparison. Keywords: Green hybrid flow-shop · Production and transportation integration scheduling · Hyper-heuristic · Estimation of distribution algorithm

1 Introduction

With the continuous development of economic globalization, a rising number of companies are focusing on the coherence and collaboration of all parts of the supply chain. In addition, in the face of a changing climate and environment, production and logistical transportation must change the way they are developed. The hybrid flow-shop scheduling problem (HFSP) has a strong engineering background and widely exists in the fields of paper, metallurgy, chemistry, machinery, architecture, textiles, etc. [1]. In this context, the importance of the green hybrid flow-shop production and transportation integrated optimization problem is becoming increasingly evident. In terms of computational complexity, the production-transportation problem has been proven to be NP-hard [2]. Moreover, GHFSSTIOP is NP-hard because it belongs to the class of production-transportation problems. Therefore, it is of great academic significance and application value to study models and solution methods for GHFSSTIOP. The schematic of GHFSSTIOP is displayed in Fig. 1.

Fig. 1. Schematic diagram of GHFSSTIOP.

For the past few years, the integration of production and transportation problem (IPTP) has received much attention, and it has been found that optimizing the IPTP as a whole can improve performance more than optimizing its sub-problems separately [3]. Some research on the IPTP already exists. For example, on the single-objective side, Lee et al. [4] designed a large neighborhood search (LNS) algorithm for the parallel machine integrated scheduling problem with multi-trip transportation, and Abdollahzadeh et al. [5] presented a new whale optimization algorithm (WOA) for an integrated production and distribution scheduling problem. On the multi-objective side, Ganji et al. [6] designed a multi-objective ant colony optimization algorithm (MOACOA) for a single-plant integrated scheduling problem with multi-vehicle batch transportation, and Karimi et al. [7] designed a branch and bound (B&B) approach for a multi-objective integrated scheduling problem of production and vehicle distribution. Based on the above literature, there is no existing research on GHFSSTIOP. Therefore, we establish a GHFSSTIOP model and present an effective way to solve it.

The hyper-heuristic algorithm (HHA) is a type of intelligent optimization algorithm. It manipulates or manages low-level heuristic (LLH) operations through a high-level strategy (HLS), and the LLH operations achieve a deep search of different regions of the solution space with specific rules. In recent years, HHAs have been used effectively to tackle various combinatorial optimization problems [8–10]. However, no HHA has yet been applied to GHFSSTIOP.

EDA is an intelligent optimization algorithm based on statistical learning. It relies on the construction and maintenance of a probabilistic model, and new individuals are generated by sampling this model. Unlike traditional evolutionary operators, the evolutionary mechanism of EDA can prevent solutions that are evolving in a good direction from being destroyed. Owing to its fast convergence, global search ability and inherent parallelism, EDA has been increasingly studied and applied to various problems with very good results [11–14]. According to our literature review, EDA has not yet been applied to GHFSSTIOP, so it is necessary to carry out the related research. As a result, this paper selects EDA as the high-level algorithm of HHEDA.

This study makes three contributions: (1) a loading strategy is proposed to better link production and transportation; (2) a population update method is designed to improve the performance of HHEDA; (3) HHEDA provides an effective solution for the considered optimization problem.

2 GHFSSTIOP

GHFSSTIOP can be described as follows. The factory produces according to orders: the manufacturer first receives N orders from different geographical locations, and then N jobs are processed in a factory to generate the final products. Finally, the goods are loaded onto vehicles and transported to the corresponding customer points by a third-party logistics provider (3PL). There are three stages in this procedure: the production stage, the loading stage and the transportation stage. In the production stage, every job has S process stages which are processed by m machines in the plant, and the processes of each job must be performed in a fixed machine sequence along the line. The speed set of each machine is V = (v1, v2, ..., vb), where b is the number of speed levels of each variable-speed machine; different processing speeds lead to different machine energy consumption and job processing times. During processing, a machine has two statuses: processing and idle. If it is idle, the instantaneous energy consumption per unit time is IECkj; if machine j processes at speed vl, the energy consumption per unit time is ECkjl. When process k of job j is processed on machine i at speed vl, the corresponding actual processing time is pijkl = pijk/vl. In the loading stage, each vehicle has the same load capacity Q, and all jobs are loaded without exceeding the vehicle load constraint while respecting the completion-time and transport-time constraints. In the transportation stage, each vehicle departs from the factory at a fixed speed and transports the products to the corresponding customer points according to the route planned in the loading stage.

2.1 Problem Assumptions and Symbol Definition

The related mathematical notations are defined as follows: j is the job number; i is the machine number; k is the process number; h is the vehicle number; c1 is the production cost factor; c2 is the transportation cost factor; c3 is the machine energy cost factor; c4 is the vehicle start-up cost factor; pj,k denotes the processing time of the kth process of job j; Q is the maximum load capacity of a vehicle; πh is the sequence of jobs carried by vehicle h; f is the total cost; f1 is the processing stage cost; f2 is the energy consumption cost; f3 is the transportation stage cost; Th denotes the pick-up time of vehicle h; N denotes the total number of customers (equal to the total number of jobs); H denotes the total number of vehicles; M denotes the total number of machines; qj is the weight of job j; tij is the transportation time from customer i to customer j; Cj denotes the completion time of job j; and Cj,k,i is the completion time of process k of job j on machine i. The assumptions of GHFSSTIOP are shown in Table 1.


Table 1. Assumptions of GHFSSTIOP.

Production stage:
(1) All machines are turned on and idle before processing jobs.
(2) The machine can only produce a job according to the sequence of its processes.
(3) Each process can only be processed on one machine at the same time, and each machine can process only one job at the same time.
(4) Each job must be machined continuously and without interruption.
(5) The processing speed of the machine cannot be changed during job processing.

Transportation stage:
(1) Vehicles owned by 3PL companies are not always engaged in distribution tasks.
(2) 3PL companies own homogeneous vehicles.
(3) Each vehicle makes only one delivery in a task assignment.
(4) A vehicle departs from the factory and does not need to return to the factory after service.
(5) The vehicles need to serve each customer, and each customer can only be served once.

2.2 Problem Model

The optimization objective of GHFSSTIOP is to minimize the total cost f, which consists of three parts: the processing cost f1, the energy consumption cost f2 and the transportation cost f3.

Subject to:

$C_{1,1,i} = P_{1,1}$   (1)

$C_{1,k,i} = C_{1,k-1,i} + P_{1,k}$   (2)

$C_{j,1,i} = C_{j-1,1,i} + P_{j,1}$   (3)

$C_{j,k,i} = \max\{C_{j,k-1,i},\ C_{j-1,k,i}\} + P_{j,k}$   (4)

$\sum_{i=1}^{N} z_{ijk} = 1, \quad j = 1, 2, \ldots, N$   (5)

$\sum_{j=1}^{N} z_{0jk} = 1, \quad k = 1, 2, \ldots, H$   (6)

$T_h = C_{\pi_h(-1)}$   (7)

$\sum_{j=1}^{N}\sum_{i=1}^{N} z_{ijk} \times q_i \le Q, \quad i \ne j,\ \forall k \in H$   (8)

Decision variables:

$x_{kjl} = \begin{cases} 1, & \text{if process } k \text{ is processed on machine } j \text{ at speed } v_l \\ 0, & \text{otherwise} \end{cases} \quad (\forall j \in M,\ k \in S,\ l \in b)$   (9)

$y_{jk} = \begin{cases} 1, & \text{if machine } j \text{ is idle when process } k \text{ is produced} \\ 0, & \text{otherwise} \end{cases} \quad (j \in N,\ k \in S)$   (10)

$z_{ijk} = \begin{cases} 1, & \text{if vehicle } k \text{ loads job } i \text{ and job } j \\ 0, & \text{otherwise} \end{cases} \quad (\forall i, j \in N,\ k \in H)$   (11)

Objective function:

$f_1 = c_1 \times \max_{1 \le i \le N} C_i$   (12)

$f_2 = \left(\sum_{k=1}^{S}\sum_{j=1}^{M}\sum_{l=1}^{b} IEC_{kjl} \times x_{kjl} + \sum_{k=1}^{S}\sum_{j=1}^{M} EC_{kj} \times y_{kj}\right) \times c_2$   (13)

$f_3 = \sum_{k=1}^{H}\left(c_3 \times \sum t_{ij} \times z_{ijk}\right) + c_4 \times H, \quad \forall i, j \in N$   (14)

$f = \min(f_1 + f_2 + f_3)$   (15)

Equations (1)–(4) calculate the completion times of the jobs. Constraint (5) means that if a client is served, the client is the predecessor or successor of another client on some route. Constraint (6) denotes that each client can only be served by one vehicle once. Equation (7) indicates that the pick-up time of each vehicle is the completion time of the last job loaded onto it. Constraint (8) means that the total weight of the jobs loaded onto a vehicle must not exceed its load capacity. Equations (12)–(14) calculate the cost of each stage, and Eq. (15) computes the total cost.
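The completion-time recursion in Eqs. (1)–(4) can be illustrated with a small sketch. The function below, with a hypothetical processing-time table and job order, computes the completion time of each process stage by stage; it assumes a single machine per stage and is not the authors' implementation.

```python
def completion_times(job_order, p):
    """Completion times following Eqs. (1)-(4).

    job_order: list of job indices in processing order.
    p[j][k]: processing time of process k of job j.
    Returns a dict mapping (job, process) -> completion time.
    """
    S = len(p[job_order[0]])                     # number of process stages
    C = {}
    for pos, j in enumerate(job_order):
        for k in range(S):
            prev_same_job = C[(j, k - 1)] if k > 0 else 0
            prev_same_stage = C[(job_order[pos - 1], k)] if pos > 0 else 0
            C[(j, k)] = max(prev_same_job, prev_same_stage) + p[j][k]
    return C

# Example with 3 jobs and 2 process stages (hypothetical processing times).
p = {0: [3, 2], 1: [2, 4], 2: [4, 1]}
C = completion_times([0, 1, 2], p)
print(max(C[(j, 1)] for j in p))                 # completion time of the last stage
```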

3 Features of the Solution

In this paper, we propose a loading strategy based on the "first-completed-first-transported" rule, where the vehicle pick-up time Th is not fixed in advance but is set by the manufacturer. The manufacturer completes loading and transportation by working with a 3PL. To reduce long customer waiting times and high costs, the loading strategy and the transportation routes are constructed so as to enhance customer satisfaction and decrease the total cost. During loading, the vehicle capacity limit is respected while the weighted sum of completion time and transportation time is minimized; if the capacity limit would be violated, the next vehicle is loaded, until all products have been loaded. As soon as a vehicle is loaded it leaves to deliver, and the departure time of each vehicle is the completion time of the last product loaded onto it. Figure 2 shows an example of GHFSSTIOP.


Fig. 2. An example of GHFSSTIOP when N = 6, M = 6 and S = 3.

3.1 Encoding and Decoding

For the HLS domain, each individual in the population consists of an arrangement of the four low-level heuristic operations LLHc, where c is the serial number of an LLH operation. When decoding an individual in the HLS domain, the LLH operations are performed from left to right on an individual in the low-level problem domain. If the new solution is better than the old one, the old solution is replaced with the new one; otherwise, the old solution is kept and the remaining low-level heuristics are executed. After all LLH operations of the high-level individual have been executed, the fitness value of the high-level individual is the difference between the fitness values of its corresponding old solution and new solution. The diagram of a high-level policy domain individual is shown in Fig. 3.

Fig. 3. The diagram of a high-level policy domain individual.
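A small sketch of the high-level decoding just described: the LLH sequence is applied from left to right, only improving moves are accepted, and the high-level fitness is the accumulated improvement. The toy objective and the single swap operator below are assumptions for illustration, not the paper's operators.

```python
import random

def swap(p):
    """Illustrative neighbourhood move: swap two random positions."""
    q = p[:]
    a, b = random.sample(range(len(q)), 2)
    q[a], q[b] = q[b], q[a]
    return q

def decode_high_level(llh_sequence, solution, evaluate, operators):
    """Apply an LLH sequence to a low-level solution with greedy acceptance.

    llh_sequence: list of operator indices (the high-level individual).
    solution: the low-level individual (a job permutation).
    evaluate: objective function to minimise.
    Returns the improved solution and the high-level fitness (old cost - new cost).
    """
    best, best_cost = solution[:], evaluate(solution)
    start_cost = best_cost
    for c in llh_sequence:
        candidate = operators[c](best)
        cost = evaluate(candidate)
        if cost < best_cost:                     # keep only improving moves
            best, best_cost = candidate, cost
    return best, start_cost - best_cost

# Toy objective: weighted position cost of a permutation (illustrative only).
evaluate = lambda perm: sum((i + 1) * j for i, j in enumerate(perm))
print(decode_high_level([0, 0, 0], [3, 1, 4, 2], evaluate, [swap]))
```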

For the low-level problem domain, every individual is a solution of the original problem. A low-level individual encodes the job processing sequence. In addition, this paper designs a novel decoding method for GHFSSTIOP, which assigns the jobs to the vehicles according to the load constraint and the transportation time constraint on a first-completed-first-transported basis. The decoded individual consists of the job processing sequence and the vehicle routing sequence. The diagram of the decoded low-level individual is shown in Fig. 4.

Fig. 4. The diagram of the decoded low-level individual.
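As a rough sketch of the first-completed-first-transported decoding just described, the helper below groups jobs, ordered by completion time, into vehicles of capacity Q and records each vehicle's departure time as the completion time of its last loaded job. The data and the simple greedy split are illustrative assumptions, not the exact procedure of the paper.

```python
def decode_routes(completion_time, weight, Q):
    """Split jobs into vehicle routes by the first-completed-first-transported rule.

    completion_time[j]: completion time of job j.
    weight[j]: weight of job j; Q: vehicle capacity.
    Returns the list of routes and each vehicle's departure (pick-up) time.
    """
    order = sorted(completion_time, key=completion_time.get)  # earliest finished first
    routes, departures = [], []
    route, load = [], 0
    for j in order:
        if route and load + weight[j] > Q:        # current vehicle is full
            routes.append(route)
            departures.append(completion_time[route[-1]])
            route, load = [], 0
        route.append(j)
        load += weight[j]
    if route:
        routes.append(route)
        departures.append(completion_time[route[-1]])
    return routes, departures

routes, Th = decode_routes({1: 12, 2: 9, 3: 15, 4: 11}, {1: 2, 2: 3, 3: 2, 4: 2}, Q=5)
print(routes, Th)
```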


4 Hyper-heuristic Estimation of Distribution Algorithm

4.1 Population Initialization

In this paper, we sample the initial probabilistic model given by Eq. (18) to produce the population of the high-level strategy domain. For the low-level problem domain, in order to ensure the diversity and dispersion of solutions and to allow a sufficient search for subsequent operations, the low-level individuals are generated by a random initialization method.

4.2 Probabilistic Model

The probabilistic model is the core of EDA; it describes the solution distribution and the overall evolutionary trend of the population through the model and its updates. Thus, a suitable probabilistic model and update mechanism must be designed for each specific problem. We define Pop(gen) as the high-level strategy domain population of the gen-th generation and PopB(gen) = {O_B^{gen,1}, O_B^{gen,2}, ..., O_B^{gen,bps}} as the set of high-quality solutions in Pop(gen), where ps is the size of Pop(gen), bps = ps × γ is the size of PopB(gen), γ is the percentage of high-quality individuals in the high-level strategy domain population, and O_B^{gen,k} is the k-th high-quality solution in PopB(gen).

A 2D probabilistic model P^{gen}_{n×n} is used in the high-level strategy domain of HHEDA to learn and accumulate information about the sequences of low-level heuristic operations from the high-quality solutions of the HLS domain, where two adjacent operations are considered as an operation block. D_B^k(gen) = [D_B^k(1), D_B^k(2), ..., D_B^k(n)] is defined as the operation sequence of O_B^{gen,k} in the high-level strategy domain, where n is the sequence length. M^{gen,k}_{n×n}(x, y) records the occurrence of the operation pair [x, y] in D_B^k(gen), as shown in Eq. (16); Sum^{gen}_{n×n}(x, y) is the total number of occurrences of the operation block [x, y] over the low-level heuristic operation sequences of all individuals in PopB(gen), as shown in Eq. (17).

$M^{gen,k}_{n\times n}(x,y) = \begin{cases} 1, & \text{if } x = D_B^k(i) \text{ and } y = D_B^k(i+1),\ i = 1,\ldots,n-1 \\ 0, & \text{else} \end{cases}, \quad x, y = 1,\ldots,n;\ k = 1,\ldots,bps$   (16)

$Sum^{gen}_{n\times n}(x,y) = \sum_{k=1}^{bps} M^{gen,k}_{n\times n}(x,y), \quad x, y = 1, 2, \ldots, n$   (17)

The initialization of the two-dimensional probabilistic model is shown in Eq. (18):

$P^{0}_{n\times n}(x,y) = 1/n, \quad x, y = 1, 2, \ldots, n$   (18)

The 2D probability model of the gen-th generation is updated as shown in Eq. (19), where λ is the learning rate of the 2D probability model:

$P^{gen+1}_{n\times n}(x,y) = (1-\lambda) \times P^{gen}_{n\times n}(x,y) + \lambda \times \frac{Sum^{gen}_{n\times n}(x,y)}{x \times bps}, \quad x, y = 1, 2, \ldots, n$   (19)
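The probability-model machinery of Eqs. (16)–(19) can be sketched as follows, assuming n operations and a list of elite operation sequences. The normalisation of the elite frequencies here (by the number of elite sequences) is a simplifying assumption; the extracted form of Eq. (19) suggests a slightly different denominator.

```python
import numpy as np

def update_probability_model(P, elite_sequences, lam):
    """Learning-based update of a 2D operation-block model (cf. Eqs. (16)-(19)).

    P: n x n matrix, P[x][y] ~ probability that operation y follows operation x.
    elite_sequences: list of operation sequences (0-based LLH indices) from PopB.
    lam: learning rate (lambda in the paper).
    """
    n = P.shape[0]
    counts = np.zeros((n, n))
    for seq in elite_sequences:                  # Eqs. (16)-(17): count adjacent blocks
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
    freq = counts / max(len(elite_sequences), 1)
    return (1 - lam) * P + lam * freq            # Eq. (19)-style convex combination

n = 4
P = np.full((n, n), 1.0 / n)                     # Eq. (18): uniform initialisation
elites = [[0, 2, 1, 3], [0, 2, 3, 1], [2, 0, 1, 3]]
P = update_probability_model(P, elites, lam=0.35)
print(P.round(3))
```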

4.3 High-Level Strategy Domain Population Update

The 2D probabilistic model is employed at the high level of HHEDA to sample and update the HLS domain population: the position information of the LLH operations of high-quality solutions is recorded in the 2D probability matrix, and new solutions are then generated from the matrix. This is repeated to accumulate information about high-quality solutions and, finally, to find high-quality solutions. To ensure the diversity of solutions, we design a new update mechanism for the HLS domain individuals: the number of identical LLH operations between any two high-level individuals cannot exceed 75% of the high-level individual length. Individuals exceeding this range are replaced, in order, by high-quality individuals of the previous generation, and if no high-quality individual is available for replacement, a new individual is generated by sampling the probability model. To make this operation more efficient, sampling is performed by the roulette-wheel method: each row of the 2D probabilistic model is normalized and the probabilities are accumulated to obtain the normalized matrix P^{gen}_{n×n}. Denote Λ^{gen,k} = [Λ^{gen,k}(1), Λ^{gen,k}(2), ..., Λ^{gen,k}(n)] as the sequence of LLH operations of the k-th individual in Pop(gen). SelLLH(Λ^{gen,k}, i) is an operation selection function that determines the LLH operation placed at the i-th position of Λ^{gen,k}. The sequential generation of the low-level heuristic operations of a high-level strategy domain individual is shown in Algorithm 1.

$Sum\_P^{gen-1}_{init}(y) = \sum_{y=1}^{n} P^{gen-1}_{n\times n}(x,y), \quad x = 1, 2, \ldots, n$   (20)


Algorithm 1. The process of SelLLH(Λ^{gen,k}, i)

Input: the probability matrix P^{gen-1}_{n×n}; the high-level strategy domain individual Λ^{gen,k}.
1:  While i ≤ n
2:    If i = 1
3:      Calculate Sum_P^{gen-1}_{init}(y) by Eq. (20); generate a random number δ ∈ [0, Σ_{h=1}^{n} Sum_P^{gen-1}_{init}(h));
4:      If δ ∈ [0, Sum_P^{gen-1}_{init}(1))
5:        c ← 1;
6:      Else if δ ∈ [Σ_{h=1}^{pos} Sum_P^{gen-1}_{n×n}(h), Σ_{h=1}^{pos+1} Sum_P^{gen-1}_{n×n}(h)), pos ∈ {1, ..., n−1}
7:        c ← pos + 1;
8:      End if
9:    Else
10:     Generate a random number r ∈ [0, Σ_{h=1}^{n} P^{gen-1}_{n×n}(Λ^{gen,k}(i−1), h)];
11:     If r ∈ [0, P^{gen-1}_{n×n}(Λ^{gen,k}(i−1), 1))   (select LLH_c by roulette)
12:       c ← 1;
13:     Else if r ∈ [Σ_{h=1}^{pos} P^{gen-1}_{n×n}(Λ^{gen,k}(i−1), h), Σ_{h=1}^{pos+1} P^{gen-1}_{n×n}(Λ^{gen,k}(i−1), h)), pos ∈ {1, ..., n−1}
14:       c ← pos + 1;
15:     End if
16:   End if
17:   Λ^{gen,k}(i) ← LLH_c, i = i + 1;
18: End While
Output: the generated Λ^{gen,k}
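The roulette-wheel selection of Algorithm 1 can be sketched in a few lines of Python. The helper below samples a whole operation sequence from a row-wise probability matrix; how the weights for the first position are aggregated (cf. Eq. (20)) is not fully clear from the extracted text, so the column-sum used here is an assumption.

```python
import random

def sample_llh_sequence(P, length):
    """Sample an LLH sequence of the given length from matrix P by roulette wheel.

    P[x][y]: (unnormalised) probability of operation y following operation x.
    """
    n = len(P)

    def roulette(weights):
        total = sum(weights)
        r = random.uniform(0, total)
        acc = 0.0
        for idx, w in enumerate(weights):
            acc += w
            if r <= acc:
                return idx
        return n - 1

    # Aggregated weights for the first position (cf. Eq. (20); aggregation assumed).
    first_weights = [sum(P[x][y] for x in range(n)) for y in range(n)]
    seq = [roulette(first_weights)]
    while len(seq) < length:
        seq.append(roulette(P[seq[-1]]))         # condition on the previous operation
    return seq

P = [[0.1, 0.4, 0.3, 0.2],
     [0.3, 0.1, 0.4, 0.2],
     [0.2, 0.3, 0.1, 0.4],
     [0.4, 0.2, 0.3, 0.1]]
print(sample_llh_sequence(P, length=6))
```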


4.4 Local Search

To improve the search performance of HHEDA for high-quality solutions in the solution space, this paper designs the following effective neighborhood operations at the low level of HHEDA. By dynamically mixing them, many different heuristics are constituted, which keeps HHEDA searching until it reaches solutions that are locally optimal with respect to multiple neighborhood structures. This enhances the search depth of HHEDA in the solution space and helps HHEDA obtain high-quality solution sets. Specifically, based on the neighborhood operations (swap, insert, etc.) frequently used for combinatorial optimization problems, four neighborhood operations are designed as low-level heuristic operations, as follows.
(1) LLH1: randomly select an individual, then randomly select two jobs from its process sequence and swap them.
(2) LLH2: randomly select an individual, then randomly select a job from its process sequence and exchange it with its predecessor or successor.
(3) LLH3: randomly select an individual, then randomly select two jobs from its process sequence and reverse the subsequence between them.
(4) LLH4: randomly select an individual, then randomly select two jobs from its process sequence and insert the job in the latter position before the job in the former position.
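The four neighbourhood moves can be illustrated as simple list operations on a job permutation; the following sketch uses assumed function names and random choices purely for illustration.

```python
import random

def llh1_swap(seq):
    """LLH1: swap two randomly chosen jobs."""
    s = seq[:]
    a, b = random.sample(range(len(s)), 2)
    s[a], s[b] = s[b], s[a]
    return s

def llh2_adjacent_swap(seq):
    """LLH2: exchange a randomly chosen job with its successor (or predecessor at the end)."""
    s = seq[:]
    a = random.randrange(len(s))
    b = a + 1 if a + 1 < len(s) else a - 1
    s[a], s[b] = s[b], s[a]
    return s

def llh3_reverse(seq):
    """LLH3: reverse the subsequence between two randomly chosen positions."""
    s = seq[:]
    a, b = sorted(random.sample(range(len(s)), 2))
    s[a:b + 1] = reversed(s[a:b + 1])
    return s

def llh4_insert(seq):
    """LLH4: insert the job at the latter position before the job at the former position."""
    s = seq[:]
    a, b = sorted(random.sample(range(len(s)), 2))
    s.insert(a, s.pop(b))
    return s

jobs = [1, 2, 3, 4, 5, 6]
for op in (llh1_swap, llh2_adjacent_swap, llh3_reverse, llh4_insert):
    print(op.__name__, op(jobs))
```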

5 Simulation Testing and Comparisons

To prove the validity of HHEDA, we compare HHEDA with AGA [15] and IICA [16]. Since there are no standard GHFSSTIOP test cases, this paper generates 12 cases based on the cases provided by Urlings et al. [17], and the parameters are: ps = 30, λ = 0.35, γ = 0.4. We implemented all algorithms in Python 3.9 and performed all experiments on a computer with a 2.50 GHz CPU and 16 GB of memory. To make a fair comparison and to assure the stability and credibility of the results, all algorithms were run independently 15 times under the same time limit. The results of AGA are saved as integers. The performance metric is calculated as

$I_A = (AVG_A - AVG_B)/AVG_A \times 100\%$   (21)

where I_A is the percentage by which the AVG of the current algorithm (AVG_B) improves upon the AVG of the reference algorithm A (AVG_A). From Table 2, we can see that HHEDA performs better than AGA and IICA in most cases. The reasons can be summarized in three points: (1) the probability matrix and the update mechanism help HHEDA perform a deeper search of the solution space; (2) population diversity is ensured by the proposed update mechanism; (3) the four low-level heuristic operations increase the local search ability of HHEDA and enhance solution quality.

Table 2. Comparison of BST, AVG and I_AGA of HHEDA and IICA.

Inst. (N, M, S) | AGA BST | AGA AVG | IICA BST | IICA AVG | IICA I_AGA | HHEDA BST | HHEDA AVG | HHEDA I_AGA
5,6,2    | 5546   | 5554   | 5567   | 5585   | −0.6% | 5521   | 5529   | 0.5%
5,9,3    | 9169   | 9169   | 8994   | 9150   | 0.2%  | 8847   | 8954   | 2.3%
9,6,2    | 6900   | 6984   | 6839   | 6853   | 1.9%  | 6447   | 6541   | 6.3%
9,9,3    | 13745  | 13911  | 13825  | 13831  | 0.6%  | 12820  | 12902  | 7.3%
13,6,2   | 10749  | 10949  | 10896  | 11023  | −0.7% | 9941   | 10022  | 8.5%
13,9,3   | 15562  | 15827  | 15448  | 15639  | 1.2%  | 13747  | 13961  | 11.8%
15,6,2   | 13405  | 13499  | 12876  | 13167  | 2.5%  | 12163  | 12250  | 9.3%
15,9,3   | 21365  | 21892  | 21779  | 21995  | −0.5% | 18522  | 18631  | 14.9%
50,8,4   | 126753 | 127088 | 125529 | 126587 | 0.4%  | 107886 | 108797 | 14.4%
50,16,4  | 125163 | 126161 | 123419 | 125517 | 0.5%  | 106989 | 107989 | 14.4%
50,16,8  | 470688 | 479603 | 483749 | 491404 | −2.5% | 426906 | 435608 | 9.2%
100,8,4  | 265951 | 266705 | 259989 | 262663 | 1.5%  | 251268 | 252975 | 5.1%

6 Conclusions and Future Research

This study developed a green hybrid flow-shop scheduling and transportation integrated optimization problem (GHFSSTIOP) model with total cost as the optimization objective, and designed a hyper-heuristic estimation of distribution algorithm (HHEDA) to solve it. Based on the characteristics of GHFSSTIOP, we first propose a new encoding and decoding strategy to better link production and transportation, so as to improve the benefits of manufacturers and customer satisfaction. Second, the probabilistic model is used to learn and accumulate the sequence information of high-quality individuals. Third, four heuristic operations are designed to perform a deeper search around the high-quality solutions found by the global search, and an update mechanism of the high-level population is designed to ensure high-level population diversity. Finally, the effectiveness of HHEDA is verified through simulation experiments and algorithm comparisons on cases of different sizes. For future research, HHEDA can be extended to other optimization problems, such as multi-factory production and transportation integrated scheduling problems.

Acknowledgements. This research was supported by the National Natural Science Foundation of China (62173169 and 61963022), the Basic Research Key Project of Yunnan Province (202201AS070030) and the Yunnan Fundamental Research Projects (grant NO. 202301AT070458).


References 1. Qian, B., Wang, L., Huang, D.X., et al.: An effective hybrid DE-based algorithm for multiobjective flow shop scheduling with limited buffers. Comput. Oper. Res. 36, 209–233 (2009) 2. Hochbaum, D.S., Hong, S.P.: On the complexity of the production-transportation problem. SIAM J. Optim. 6, 250–264 (1996) 3. Chandra, P., Fisher, M.L.: Coordination of production and distribution planning. Eur. J. Oper. Res. 72, 503–517 (1994) 4. Lee, J., Kim, B.L., Johnson, A.L., Lee, K.: The nuclear medicine production and delivery problem. Eur. J. Oper. Res. 236, 461–472 (2014) 5. Abdollahzadeh, V., Nakhaikamalabadi, I., Hajimolana, S.M., Zegordi, S.H.: A multifactory integrated production and distribution scheduling problem with parallel machines and immediate shipments solved by improved whale optimization algorithm. Complexity 2018, 1–21 (2018) 6. Ganji, M., Kazemipoor, H., Molana, S.M.H., Sajadi, S.M.: A green multi-objective integrated scheduling of production and distribution with heterogeneous fleet vehicle routing and time windows. J. Clean. Prod. 259, 120824 (2020) 7. Karimi, N., Davoudpour, H.: A branch and bound method for solving multi-factory supply chain scheduling with batch delivery. Expert Syst. Appl. 42, 238–245 (2015) 8. Song, H.B., Lin, J.: A genetic programming hyper-heuristic for the distributed assembly permutation flow-shop scheduling problem with sequence dependent setup times. Swarm Evol. Comput. 60, 100807 (2021) 9. Park, J., Mei, Y., Ngsuyen, S., Chen, G., Zhang, M.J.: An investigation of ensemble combination schemes for genetic programming based hyper-heuristic approaches to dynamic job shop scheduling. Appl. Soft Comput. 63, 72–86 (2018) 10. Akarsu, C.H., Küçükdeniz, T.: Job shop scheduling with genetic algorithm-based hyperheuristic approach. Int. Adv. Res. Eng. J. 6, 16–25 (2022) 11. Zhang, Y., Li, X.P.: Estimation of distribution algorithm for permutation flow shops with total flowtime minimization. Comput. Ind. Eng. 60, 706–718 (2011) 12. Shao, W.S., Pi, D.C., Shao, Z.S.: A Pareto-based estimation of distribution algorithm for solving multiobjective distributed no-wait flow-shop scheduling problem with sequencedependent setup time. IEEE Trans. Autom. Control 16, 1344–1360 (2019) 13. Salhi, A., Rodríguez, J.A.V., Zhang, Q.F.: An estimation of distribution algorithm with guided mutation for a complex flow shop scheduling problem. In: Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, pp. 570–576 (2007). https://dl.acm. org/doi/abs/10.1145/1276958.1277076 14. Pan, Q.K., Ruiz, R.: An estimation of distribution algorithm for lot-streaming flow shop problems with setup times. Omega 40, 166–180 (2012) 15. Aminzadegan, S., Tamannaei, M., Fazeli, M.: An integrated production and transportation scheduling problem with order acceptance and resource allocation decisions. Appl. Soft Comput. 112, 107770 (2021) 16. Marandi, F., Ghomi, S.M.T.F.: Integrated multi-factory production and distribution scheduling applying vehicle routing approach. Int. J. Prod. Res. 57, 722–748 (2019) 17. Urlings, T., Ruiz, R., Stützle, T.: Shifting representation search for hybrid flexible flowline problems. Eur. J. Oper. Res. 207, 1086–1095 (2010)

Improved EDA-Based Hyper-heuristic for Flexible Job Shop Scheduling Problem with Sequence-Independent Setup Times and Resource Constraints

Xing-Han Qiu1, Bin Qian1,2(B), Zi-Qi Zhang1,2, Zuo-Cheng Li1,2, and Ning Guo1,2

1 School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
[email protected]
2 Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China

Abstract. In this paper, an improved estimation of distribution algorithm-based hyper-heuristic (IEDA-HH) is proposed for the flexible job shop scheduling problem with sequence-independent setup times and resource constraints (FJSP_SISTs_RCs) with the objective of minimizing the makespan. First, a single-vector encoding scheme is employed to represent the ranking of solutions, and an LPT (Longest Processing Time) based method is designed to select the machines that process the jobs. Second, six simple and efficient heuristics are incorporated into IEDA-HH to constitute a set of low-level heuristics (LLHs). Third, the improved estimation of distribution algorithm (IEDA) is introduced as the high-level strategy to manage the heuristic sequences built from the pre-designed LLH set. In each generation, the heuristic sequences are evolved by the IEDA and then sequentially performed on the solution space for better results. Finally, the performance of IEDA-HH is evaluated on a typical benchmark dataset, and the computational results demonstrate the superiority of the proposed IEDA-HH. Keywords: IEDA-HH · hyper-heuristic · flexible job shop scheduling · low-level heuristic

1 Introduction

In recent years, the flexible job shop scheduling problem (FJSP) has received wide attention in both academia and industry [1–3]. As a special case of the job shop scheduling problem (JSSP), FJSP has also been proven to be a typical NP-hard problem [4]. However, compared with the actual manufacturing process, FJSP is still somewhat simplified, since it does not take into account many of the constraints that exist in practice, such as the sequence-independent setup times (SISTs) of a machine when changing the mold [5] and the resource constraints (RCs) of the machines. These constraints are all encountered in the actual production process and are of great research value. Meanwhile, there is little domestic or international research on the FJSP with SISTs and RCs [6–10]. This paper considers this important issue.


Nowadays, hyper-heuristic algorithms (HHAs) have been successfully applied to solve a variety of scheduling problems. Park et al. [11] designed a genetic programming-based hyper-heuristic (GPBHH) algorithm with the objective of minimizing the average weighted delay time for the JSP, incorporating the dynamic arrival of jobs and machine breakdowns. For an HHA, researchers focus on how the high-level strategy (HLS) guides the low-level heuristics (LLHs), which determines how the HLS learns the characteristics of the LLHs and guides their selection for a deeper search of the solution space of the problem. The estimation of distribution algorithm (EDA), an intelligent optimization algorithm based on statistical learning, can be used as an effective HLS. EDA uses a probabilistic model to learn the structural information of the problem solutions, which, to a certain extent, avoids the problem of traditional evolutionary algorithms destroying the excellent patterns in high-quality solutions [12]. Thus, we propose an improved EDA-based hyper-heuristic (IEDA-HH) to solve the FJSP with SISTs and RCs.

2 FJSP with SISTs and RCs

2.1 Problem Description

The FJSP_SISTs_RCs can be described as follows. There are n jobs {J1, J2, ..., Jn} to be tested on m machines {M1, M2, ..., Mm}. Each job Ji consists of a sequence of Ki operations {Oi1, Oi2, ..., OiKi}. Each operation Oik can be processed on a number of candidate machines, and each machine requires a number of resources. In addition, the FJSP with SISTs and RCs has the following additional constraints:
(1) At the initial moment, all jobs, machines and resources are available.
(2) At most one job can be processed by a machine at the same time; the next job can be processed only after the current operation is completed, and this process is not allowed to be interrupted.
(3) When different operations of the same job are transferred between machines, a SIST is required.
(4) The RCs of each machine are predetermined, and a machine can start working only when all its required resources are available.

2.2 FJSP with SISTs and RCs

The notation used is described in Table 1. The objective of the FJSP_SISTs_RCs is to find an optimal solution sequence π with the minimum makespan Fmax, which is the time required for all jobs to complete processing while satisfying the constraints. The permutation model of the FJSP_SISTs_RCs can be written as follows:

$F_{\max} = \max\{F_{O_{1k_1}}, F_{O_{2k_2}}, \ldots, F_{O_{nk_n}}\}$   (1)

$F_{O_{ik}} = \max\{RT_{M_{ik}}, F_{M_{ik}}\} + P_{O_{ik}, M_{ik}}, \quad k = 1$   (2)

$F_{O_{ik}} = \max\{RT_{M_{ik}},\ F_{M_{ik}},\ F_{O_{ik-1}} + ST_{M_{ik-1}, M_{ik}}\} + P_{O_{ik}, M_{ik}}, \quad k > 1$   (3)

$RT_{M_{ik}} = \max\{WT^{1}_{M_{ik}}, WT^{2}_{M_{ik}}, \ldots, WT^{r}_{M_{ik}}\}$   (4)

Table 1. Nomenclature.

sum: the total number of operations of all jobs
π = (πj, πm): a solution vector of the problem
πj: an operation vector of length sum
πm: a machine vector of length sum
Mik: the machine selected to process operation Oik
FOik: finishing time of operation Oik
FMik: finishing time of the last operation on machine Mik
RTMik: the readiness time of the resources required by Mik
STa,b: the SIST for transferring the same job from machine Ma to machine Mb
POik,Mik: the processing time of operation Oik on machine Mik
RC^r_M: the type of resource r required by machine M (r = 1, 2, ...)
WT^r_M: waiting time until the resource of type r required by machine M is ready

$\pi^{*} = \arg\{F_{\max}\} \to \min, \quad \forall \pi \subset \Pi$   (5)

where Eqs. (1)–(4) are the formulae for the makespan Fmax, and Eq. (5) indicates that the optimal solution sequence π* is the one, among the set of all solution sequences, that minimizes the makespan Fmax. An example illustrates the FJSP_SISTs_RCs under consideration. Consider a set of 5 jobs and a group of 4 machines with πj = {2, 1, 3, 5, 4, 1, 3, 2, 5} and πm = {1, 2, 3, 4, 3, 2, 4, 3, 1}. The parameters of the example are provided in Tables 2, 3 and 4. The scheduling Gantt chart is shown in Fig. 1.

Table 2. Machine resource configuration.

Mac | Tester | Handler | Accessory
M1  | Type 1 | Type 1  | Type 3
M2  | Type 1 | Type 2  | Type 2
M3  | Type 2 | Type 3  | Type 1
M4  | Type 3 | Type 2  | Type 3

As the first operation on M2, O11 could in principle start at time 0 on M2, but the type-1 resource RC1 required by M2 is occupied by M1 at that time, so O11 has to wait until the operation on M1 is completed and the RC1 required by M2 is released; only then can O11 be processed. Therefore, the start time of O11 is 3.
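The waiting rule in this example can be expressed as a small start-time computation. The sketch below only demonstrates the max-of-ready-times logic behind Eqs. (2)–(3); the function name and inputs are assumptions, not part of the paper.

```python
def operation_start(machine_free, resource_free, prev_op_finish=0, setup=0):
    """Earliest start of an operation on a machine under resource constraints.

    machine_free: time at which the machine finishes its previous operation (F_Mik).
    resource_free: list of times at which each required resource becomes available (RT_Mik).
    prev_op_finish: finish time of the job's previous operation (0 for the first operation).
    setup: sequence-independent setup time when the job changes machine (ST).
    """
    return max(machine_free, max(resource_free), prev_op_finish + setup)

# The example from the text: O11 is the first operation on M2, but the type-1
# resource needed by M2 is held by M1 until time 3, so O11 starts at time 3.
print(operation_start(machine_free=0, resource_free=[3]))  # -> 3
```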


Table 3. Number of resources.

Number | RC1 | RC2 | RC3
Type 1 | 1   | 1   | 1
Type 2 | 2   | 2   | 1
Type 3 | 1   | 1   | 1

Table 4. Processing time and setup time.

Columns O11–O52 give the processing time of each operation on the machine ("–" means the operation cannot be processed on that machine); columns M1–M4 give the setup time between machines.

Mac | O11 | O12 | O21 | O22 | O31 | O32 | O41 | O51 | O52 | M1 | M2 | M3 | M4
M1  | 3   | –   | 3   | 2   | 4   | 5   | 2   | 2   | 4   | 0  | 1  | 3  | 2
M2  | 2   | 3   | –   | 2   | –   | 3   | –   | 2   | 2   | 1  | 0  | 2  | 4
M3  | 4   | 2   | 2   | 4   | 4   | –   | 3   | –   | –   | 3  | 2  | 0  | 1
M4  | 2   | 3   | 3   | –   | 6   | 3   | 3   | 4   | 2   | 2  | 4  | 2  | 0

Fig. 1. Gantt chart for a solution to the example

3 IEDA-HH for FJSP with SISTs and RCs

3.1 Encoding and Decoding of High-Level Strategy Domains

In the IEDA-HH high-level strategy domain population, each individual is composed of the 6 LLHs, and the length of a high-level individual is 10. When decoding a high-level individual, the individuals in the low-level problem domain execute the corresponding LLHs of the high-level individual in order from front to back. Each time an LLH is performed on a low-level individual, the fitness of the individual is calculated; the fitness of the original low-level individual is subtracted from its updated fitness, and the difference is recorded as χ. If a new difference is greater than χ, χ is replaced with the new difference; otherwise, χ is retained. After all LLHs have been performed, the fitness of the individual in the high-level strategy domain is χ.


3.2 Encoding and Decoding of Low-Level Problem Domains

In the low-level problem domain of the algorithm, IEDA-HH employs a single-vector encoding scheme based on the operation sequence of jobs, where each job has a corresponding sequence number and different scheduling schemes are obtained from different arrangements. It can be represented as πj = {J1, J2, ..., Jsum}, where the k-th occurrence of a job in πj refers to the k-th operation of this job. For a given sequence of jobs πj, this paper designs an LPT_RC (Longest Processing Time with resource constraints) method, based on the LPT (Longest Processing Time) rule, to select the machine that is currently available for scheduling, thereby completing the decoding process. The specific steps are as follows:
Step 1: Sort the jobs in decreasing order of processing time.
Step 2: Select the first job in the sequence obtained in Step 1. When assigning a job, always assign it to the machine that becomes free first; if that machine's resource constraint cannot be satisfied, consider the next job.
Step 3: When the job has been assigned, remove it from the sequence.
Step 4: Repeat Steps 2 and 3 until all jobs have been allocated.

3.3 Probabilistic Model and Update Mechanism

The probability matrix p is adopted to characterize the distribution estimation model at the high level of IEDA-HH, and pm,n(g) denotes the probability of LLHm occurring around position n of the high-level individuals in generation g. The word 'around' is used because the update strategy in this paper calculates the probability of LLHm being around position n, not just at the fixed position n.

$p_{m,n}(g) = \begin{bmatrix} p_{1,1}(g) & \cdots & p_{1,10}(g) \\ \vdots & \ddots & \vdots \\ p_{6,1}(g) & \cdots & p_{6,10}(g) \end{bmatrix}_{6\times 10}$

In the initial period of IEDA-HH, all probability values are set to 1/6 to ensure that all LLHs are sampled uniformly. The update mechanism considers the probability distribution of LLHs at the current position and in its neighborhood. The specific update method is as follows:

$p_{m,n}(g+1) = (1-\gamma)\,p_{m,n}(g) + \frac{\gamma}{OInd\_Size}\sum_{i=1}^{OInd\_Size}\frac{q^{i}_{m,n}}{|neighbor(n)|}$   (6)

$q^{i}_{m,n} = \begin{cases} 1, & \text{if } LLH_m \text{ is within the neighborhood of position } n \\ 0, & \text{else} \end{cases}$

where OInd_Size is the number of dominant individuals, γ (0 < γ) is the learning rate, and |neighbor(n)| is the size of the one-sided neighborhood of position n. The high level of IEDA-HH constructs the probability distribution of the dominant individuals by counting the LLH information at each position according to this update mechanism, updates the probability matrix p by learning the probability distribution characteristics of the LLHs of the dominant individuals, and then decides the LLH at each position in turn by roulette-based proportional selection, so as to generate a new high-level individual.
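Equation (6) can be sketched as follows, assuming 0-based positions, a one-sided window that is clipped at the sequence ends, and illustrative inputs; the paper's exact definition of neighbor(n) may differ.

```python
import numpy as np

def update_position_model(P, dominant, gamma, window=1):
    """Neighbourhood-aware update of p[m][n] (cf. Eq. (6)).

    P: (num_llh x ind_len) matrix, P[m][n] ~ probability of LLH m around position n.
    dominant: list of dominant high-level individuals (sequences of 0-based LLH indices).
    gamma: learning rate; window: one-sided neighbourhood size around each position.
    """
    num_llh, ind_len = P.shape
    contrib = np.zeros_like(P)
    for seq in dominant:
        for n in range(ind_len):
            lo, hi = max(0, n - window), min(ind_len, n + window + 1)
            neighborhood = seq[lo:hi]
            for m in range(num_llh):
                if m in neighborhood:            # q^i_{m,n} = 1
                    contrib[m, n] += 1.0 / len(neighborhood)
    contrib /= max(len(dominant), 1)             # average over the dominant individuals
    return (1 - gamma) * P + gamma * contrib

P = np.full((6, 10), 1.0 / 6)                    # uniform initialisation of the 6 x 10 model
dominant = [list(np.random.randint(0, 6, size=10)) for _ in range(4)]
P = update_position_model(P, dominant, gamma=0.4)
print(P.sum(axis=0).round(2))                    # column masses after one update
```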


Fig. 2. Represent for low-level heuristics

3.4 Representation of the Low-Level Heuristics

In this paper, six simple and effective LLHs are designed as the low-level heuristics of IEDA-HH: LLH1: swap; LLH2: forward insertion; LLH3: backward insertion; LLH4: swap between subsequences 1 and 2; LLH5: swap between subsequences 1 and 3; LLH6: swap among subsequences 1, 2 and 3. A detailed description of each LLH is shown in Fig. 2.

3.5 IEDA-HH for FJSP with SISTs and RCs

In summary, the overall flow of IEDA-HH is shown in Table 5.

3.6 Computational Complexity Analysis of IEDA-HH

For IEDA-HH, the complexity is measured by the number of additions, subtractions, multiplications and divisions the algorithm performs. IEDA-HH contains four parameters: the population size of the high-level strategy domain Pop_Size, the number of dominant individuals in the high-level strategy domain population OInd_Size, the number of iterations gen, and the length of individuals in the high-level strategy domain hl. For an FJSP_SISTs_RCs of size sum × m, the complexity of each part of the algorithm is analyzed as follows:

(1) High-level strategy domain: the computational complexity of sampling the probabilistic model to generate an individual in the high-level strategy domain is O(hl²); considering the population size, the computational complexity of generating the high-level strategy domain population is O(hl² × Pop_Size); the computational complexity of updating the probabilistic model with the selected dominant individuals of the high-level strategy domain is O(hl² × OInd_Size); and the computational complexity of bubble-sorting all individuals in the population is O(Pop_Size²).

Table 5. IEDA-HH for FJSP_SISTs_RCs.

(2) Low-level problem domain: the computational complexity of one heuristic pass formed by the arrangement of LLHs over an individual is O(hl × sum × m); the computational complexity of calculating the fitness of an individual in the low-level problem domain is O(sum × m); and the computational complexity of randomly generating the low-level problem domain population, considering the population size, is O(sum × m × Pop_Size). In summary, the whole computational complexity of IEDA-HH is:

$O(\text{IEDA-HH}) = O(hl^2 \times Pop\_Size) + O(sum \times m \times Pop\_Size) + gen \times \big(O(hl^2 \times OInd\_Size) + O(hl^2 \times Pop\_Size) + O(Pop\_Size^2) + O(hl \times sum \times m \times Pop\_Size)\big)$   (7)

From Eq. (7), it can be seen that the highest-order term of the overall computational complexity of IEDA-HH involves O(hl² × OInd_Size) (hl is a small value, 10 in this paper), so IEDA-HH has a small computational complexity.

4 Computational Result and Comparisons

Considering that the semiconductor final testing scheduling problem (SFTSP) is a type of FJSP_SISTs_RCs, this paper uses 10 SFTSP instances available at http://dalab.ie.nthu.edu.tw/newsen_content.php?id=0 to carry out numerical simulations for testing and comparing the performance of IEDA-HH against the compared algorithms. The parameters of the instances are provided in Table 6. The parameters of IEDA-HH are: Pop_Size = 120, OInd_Size = 16, γ = 0.4. All algorithms and test programs are coded in Delphi 2010; the operating system is Windows 10 and the CPU frequency is 4.5 GHz. To make a fair comparison, all algorithms use the same running time as the termination condition.

Table 6. The parameters of the instances.

Instance | n   | m  | Processing time | RC1        | RC2        | RC3          | SISTs
1–5      | 100 | 36 | {1, 2, …, 15}   | {10, 5, 3} | {10, 8, 4} | {7, 7, 5, 5} | {1, 2, …, 5}
6–10     | 60  | 36 | {1, 2, …, 60}   | {10, 5, 3} | {10, 8, 4} | {7, 7, 5, 5} | {1, 2, …, 15}

In this paper, IEDA-HH is compared with the recent algorithms NFOA [13], KMBS [14] and CCIWO [15]. The experiments are conducted with the minimum makespan as the optimization objective, and 25 independent runs are performed on the 10 instances to verify the effectiveness of IEDA-HH. The best value (BST) and the average value (AVG) are chosen as performance metrics, and the experimental comparison results are shown in Table 7, where the best value in each row is shown in bold. From Table 7, we can conclude that IEDA-HH performs better than NFOA, KMBS and CCIWO for problems of different sizes. The reasons can be attributed to the following: NFOA, KMBS, CCIWO and other metaheuristic algorithms essentially use a few fixed neighborhood operations, and due to the limited number of neighborhoods and the single search method, the actual search depth of these algorithms is limited. In contrast, IEDA-HH accumulates information about the type and arrangement of LLHs at the high level by learning through the probability matrix model, and then samples the probability matrix model to generate new high-level individuals, and so on. The number, location and arrangement of the different LLHs in the newly generated high-level strategy domain individuals are continuously optimized, and the low-level individuals are then controlled to perform a heuristic search with multiple combinations of LLHs, searching the solution space of the problem more efficiently and enhancing the ability of the algorithm to find high-quality solutions. In addition, to demonstrate the effectiveness of the obtained schedule, the Gantt chart of the solution found by IEDA-HH for instance 10 is presented in Fig. 3.

Table 7. Comparison of NFOA, KMBS, CCIWO and IEDA-HH.

Instance | NFOA BST | NFOA AVG | KMBS BST | KMBS AVG | CCIWO BST | CCIWO AVG | IEDA-HH BST | IEDA-HH AVG
1        | 118      | 123.10   | 100      | 103.95   | 100       | 106.05    | 98          | 96.80
2        | 122      | 128.40   | 104      | 107.95   | 116       | 124.80    | 102         | 105.15
3        | 109      | 113.25   | 99       | 101.55   | 107       | 113.95    | 96          | 100.20
4        | 133      | 137.90   | 118      | 121.05   | 123       | 134.85    | 111         | 114.95
5        | 129      | 132.75   | 109      | 111.75   | 114       | 124.65    | 104         | 106.95
6        | 267      | 281.20   | 232      | 247.25   | 234       | 245.30    | 220         | 229.00
7        | 218      | 229.00   | 184      | 191.75   | 186       | 197.15    | 179         | 183.85
8        | 232      | 244.55   | 210      | 214.95   | 215       | 226.20    | 200         | 206.35
9        | 195      | 204.95   | 185      | 188.10   | 185       | 201.40    | 185         | 185.10
10       | 233      | 239.75   | 205      | 212.50   | 210       | 221.75    | 196         | 204.55
Average  | 175.60   | 183.49   | 154.60   | 159.78   | 159.00    | 169.61    | 148.60      | 153.59

Fig. 3. Gantt chart of the solutions found by IEDA-HH for instance 10


5 Conclusions and Future Research

In this paper, IEDA-HH is proposed to solve the FJSP with SISTs and RCs with the objective of minimizing the makespan. First, in the encoding and decoding phase of IEDA-HH, a machine selection scheme based on the LPT method is designed to improve the quality of the solutions. Second, six simple and efficient LLHs are designed to guarantee the depth of the search of the solution space by IEDA-HH. Third, the IEDA-based probability matrix model is used at the high level of IEDA-HH to learn and accumulate information from high-quality high-level individuals, to dynamically control the update of the high-level population, and to guide the low-level individuals to perform the corresponding LLHs, so that IEDA-HH can find high-quality solutions more efficiently. Future work will extend IEDA-HH to solve the multi-objective FJSP_SISTs_RCs with distributed production and vehicle transportation integration.

Acknowledgement. This research was supported by the National Natural Science Foundation of China (62173169 and 72201115), the Basic Research Key Project of Yunnan Province (202201AS070030), and the Basic Research Project of Yunnan Province (202201BE070001-050).

References 1. Wang, C., Tian, N., Ji, Z., Wang, Y.: Multi-objective fuzzy flexible job shop scheduling using memetic algorithm. J. Stat. Comput. Simul. 87, 2828–2846 (2017) 2. Tamssaouet, K., Dauzère-Pérès, S., Knopp, S., Bitar, A., Yugma, C.: Multiobjective optimization for complex flexible job-shop scheduling problems. Eur. J. Oper. Res. 296, 87–100 (2021) 3. Li, R., Gong, W., Lu, C.: Self-adaptive multi-objective evolutionary algorithm for flexible job shop scheduling with fuzzy processing time. Comput. Ind. Eng. 168, 108099 (2022) 4. Kacem, I., Hammadi, S., Borne, P.: Approach by localization and multiobjective evolutionary optimization for flexible job-shop scheduling problems. IEEE Trans. Syst. Man Cybern. Part C 32, 1–13 (2002) 5. Laguna, M.: A heuristic for production scheduling and inventory control in the presence of sequence-dependent setup times. IIE Trans. 31, 125–134 (1999) 6. Lei, D., Guo, X.: Variable neighbourhood search for dual-resource constrained flexible job shop scheduling. Int. J. Prod. Res. 52, 2519–2529 (2013) 7. Zheng, X.-L., Wang, L.: A knowledge-guided fruit fly optimization algorithm for dual resource constrained flexible job-shop scheduling problem. Int. J. Prod. Res. 54, 5554–5566 (2016) 8. Soofi, P., Yazdani, M., Amiri, M., Adibi, M.A.: Robust fuzzy-stochastic programming model and meta-heuristic algorithms for dual-resource constrained flexible job-shop scheduling problem under machine breakdown. IEEE Access 9, 155740–155762 (2021) 9. Kress, D., Müller, D., Nossack, J.: A worker constrained flexible job shop scheduling problem with sequence-dependent setup times. OR Spectr. 41(1), 179–217 (2018). https://doi.org/10. 1007/s00291-018-0537-z 10. Wu, R., Li, Y., Guo, S., Xu, W.: Solving the dual-resource constrained flexible job shop scheduling problem with learning effect by a hybrid genetic algorithm. Adv. Mech. Eng. 10, 1687814018804096 (2018) 11. Park, J., Mei, Y., Nguyen, S., Chen, G., Zhang, M.: An investigation of ensemble combination schemes for genetic programming based hyper-heuristic approaches to dynamic job shop scheduling. Appl. Soft Comput. 63, 72–86 (2018)


12. Larrañaga, P., Lozano, J.A.: Estimation of distribution algorithms applied to combinatorial optimization problems 19, 149–168 (2003) 13. Zheng, X.-L., Wang, L., Wang, S.-Y.: A novel fruit fly optimization algorithm for the semiconductor final testing scheduling problem. Knowl. Based Syst. 57, 95–103 (2014) 14. Wang, S., Wang, L.: A knowledge-based multi-agent evolutionary algorithm for semiconductor final testing scheduling problem. Knowl. Based Syst. 84, 1–9 (2015) 15. Sang, H.-Y., Duan, P.-Y., Li, J.-Q.: An effective invasive weed optimization algorithm for scheduling semiconductor final testing problem. Swarm Evol. Comput. 38, 42–53 (2018)

Learning Based Memetic Algorithm for the Monocrystalline Silicon Production Scheduling Problem

Jianqun Gong, Zuocheng Li(B), Bin Qian, Rong Hu, and Bin Wang

School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
[email protected]

Abstract. In recent years, with the rapid development of the global semiconductor industry, monocrystalline silicon has become increasingly important. This paper addresses a novel monocrystalline silicon production (MSP) scheduling problem. We analyze the characteristics of the MSP, and the MSP is modeled as an unrelated parallel machine scheduling problem with maintenance cycle and machine setup time. To solve the MSP problem, a learning based memetic algorithm (LMA) is proposed. In the LMA, first, a high-quality initial population is constructed according to problem-specific knowledge, and an initial probability distribution model is established to accumulate valuable information about superior individuals. Second, to improve the ability of global exploration, an adaptive update mechanism based on the probability distribution model is developed, and a sampling method that keeps excellent patterns is designed to generate a new population. Third, a local search strategy based on variable neighborhood descent (VND) is designed to enhance the local exploitation ability; in the VND component, to enrich the search behavior, four self-adaptive neighborhood operators are devised. Moreover, we apply simulated annealing (SA) as the acceptance criterion of solutions to prevent the algorithm from falling into a local optimum. Finally, simulation experiments based on some real-world instances from an MSP process demonstrate the effectiveness of the proposed LMA in solving the MSP scheduling problem. Keywords: Monocrystalline silicon production · Unrelated parallel machine scheduling · Learning based memetic algorithm · Variable neighborhood descent

1 Introduction

In recent years, interest in practical scheduling problems has been growing [1]. In this paper, we consider a realistic and typical production scheduling problem found in the production process of monocrystalline silicon at Long-ji Silicon Materials Co., Ltd. in Yunnan Province. The monocrystalline silicon production process includes six processes: charging and melting of the polysilicon raw materials, seeding, necking, shouldering and rotating, equal-diameter, and ending. However,
the entire production process is completed in only one monocrystal furnace, and each monocrystal furnace can produce any type of monocrystalline silicon rods. Therefore, the whole production process is modeled as an unrelated parallel machine scheduling problem with maintenance cycle and machine setup time. The parallel machine scheduling problem (PMSP) exists in a wide variety of manufacturing systems. Adan et al. [2] studied an unrelated parallel machine scheduling problem (UPMSP) with the objective of minimizing maximum completion time (makespan). In order to be closer to the actual situation, the problem considers the sequence dependent setup time and machine qualification constraints. Sels et al. [3] proposed a hybrid algorithm combining tabu search, genetic algorithm and truncated branch and bound method for PMSP. Lei et al. [4] considered a distributed unrelated parallel machine scheduling problem (DUPMSP) with preventive maintenance to minimize the makespan. Fang et al. [5] studied an UPMSP with machine and job sequence dependent setup time to minimize the makespan. In practical production, Shermetov et al. [6] modeled the problem cyclic steam stimulation (CSS) of petroleum wells as a parallel uniform machines scheduling problem with release dates. A two-stage scheduling algorithm combining heuristic and genetic algorithm is presented to solve this problem. Berthier et al. [7] modeled the problem as an UPMSP for the production process of a knitting shop in the textile industry and proposed a new mathematical programming and an improved genetic algorithm to solve this complex problem. The identical parallel machine scheduling problem has been considered as a typical NP-hard scheduling problem [8], the MSP problem, a more complex problem related to unrelated parallel machines, is also an NP-hard problem. Meta-heuristic is a suitable method for solving NP-hard problems because they can obtain proximity-optimal solutions within acceptable running time [9]. The memetic algorithm (MA) is a metaheuristic algorithm, which uses a hybrid search model combining evolutionary algorithm [10] and local search to achieve a good balance between global exploration and local exploitation. It has been proven that the MA has been successfully applied to various combinatorial optimization problems [11]. Li et al. [12] proposed a MA based on probabilistic learning (MA/PL) to solve the MKP problem, and designed a new framework of MA/PL, which is of great significance for the research of combinatorial optimization problems. Wang et al. [13] proposed an energy-efficient distributed hybrid flow shop scheduling problem under the premise of simultaneously minimizing makespan and energy consumption. A cooperative memetic algorithm (CMA) based on reinforcement learning (RL) policy agent is designed to solve the problem. Mao et al. [14] studied the distributed permutation flow shop scheduling problem with preventive maintenance with the goal of minimizing the total flow time. A MA based on hash mapping is proposed to solve this problem. According to the literature research, the application of learning based MA (LMA) on MSP has not been studied yet. The remainder of this paper are organized as follows. Section 2 introduces the MSP scheduling problem. Section 3 elaborates the learning memetic algorithm in detail, which is the encoding and decoding, initialization, learning mechanism, local optimization strategy, and LMA for the MSP scheduling problem. 
The simulation results and comparisons are presented and analyzed in Sect. 4. Finally, Sect. 5 draws conclusions and presents future work.


2 Description of the MSP Scheduling Problem

Although the production of monocrystalline silicon contains six processes, the entire production process is approximated as single-process production because it is completed in only one monocrystal furnace (machine). The entire furnace run (from the start of production to the stopping of the furnace to replace the quartz crucible) is treated as one scheduling unit. Furthermore, considering the aging of the monocrystal furnace, the service life of the quartz crucible, and actual production conditions such as machine maintenance, we model the problem as a UPMSP with maintenance cycle and machine setup times. For convenience and readability, the detailed symbol descriptions, including indexes, parameters and variables, are stated as follows:
• M = {1, 2, ..., Nm}: a set of machines, indexed by i.
• A set of furnaces {1, 2, ..., Nf} to be scheduled, indexed by f.
• π^{T(i)} = (π_1^{T(i)}, ..., π_k^{T(i)}, ..., π_{T(i)}^{T(i)}): the sequence of furnaces to be processed on machine i, i ∈ (1, ..., Nm), f ∈ (1, ..., Nf).
• Parameter Nm: the total number of machines.
• Parameter Nf: the total number of furnaces.
• Parameter π: the set of all furnace sequences.
• Parameter T(i): the total number of furnaces processed on machine i.
• Parameter P(π_k^{T(i)}): the processing time required by furnace k on machine i.
• Parameter TD(π_{k-1}^{T(i)}, π_k^{T(i)}): the difference in start times of two adjacent furnaces on machine i.
• Parameter St(π_{k-1}^{T(i)}, π_k^{T(i)}): the setup time between two adjacent furnaces on machine i.
• Parameter tR: a time constant, the setup time for the machine to replace the quartz crucible.
• Parameter tM: a time constant, the machine maintenance time.
• Parameter TM: a time constant, the machine maintenance cycle.
• Parameter Total_{P,s}: the total processing time of the first s furnaces, s ∈ (1, ..., Nf).

The problem is described as follows: the set of furnaces must be assigned to the set of machines M. Each machine is always available and can process at most one job at a time without interruption. The processing time P(π_k^{T(i)}) of furnace k depends on machine i. The setup time St(π_{k-1}^{T(i)}, π_k^{T(i)}) of two adjacent furnaces on machine i is either the time tR needed to replace the quartz crucible on machine i or the maintenance time tM of the machine. The scheduling goal is to find a π* that minimizes the maximum completion time (makespan). Based on the above description, Cmax(π) is calculated as follows:

Cmax(π) = max_{i=1,...,m} [ Σ_{k=1}^{T(i)-1} TD(π_{k-1}^{T(i)}, π_k^{T(i)}) + P(π_{T(i)}^{T(i)}) ]    (1)
The start time difference TD(π_{k-1}^{T(i)}, π_k^{T(i)}) between two adjacent furnaces on machine i is calculated as follows:

TD(π_{k-1}^{T(i)}, π_k^{T(i)}) = P(π_{k-1}^{T(i)}) + St(π_{k-1}^{T(i)}, π_k^{T(i)}), k ∈ (1, ..., T(i)), i ∈ (1, ..., m)    (2)

St(π_{k-1}^{T(i)}, π_k^{T(i)}) = 0, k = 1    (3)

If Total_{P,s} = Σ_{k=1}^{s} P(π_k^{T(i)}) ≥ TM, s ∈ (1, ..., Nf): St(π_{k-1}^{T(i)}, π_k^{T(i)}) = tM and Total_{P,s} is reset to 0; else St(π_{k-1}^{T(i)}, π_k^{T(i)}) = tR, k ≠ 1    (4)

Equation (3) indicates that the machine setup time St(π_{k-1}^{T(i)}, π_k^{T(i)}) is set to 0 for the first processing, that is, the machine is always available and the furnace to be processed is available at time 0. Except for the first processing, the setup time St(π_{k-1}^{T(i)}, π_k^{T(i)}) is calculated as shown in Eq. (4), where tR and tM are time constants of 10 h and 36 h (considering an ideal state in a real production process), respectively. In addition, TM is also a time constant, which is set to 480 h in line with the actual production process. Hence, Cmax(π*) can be calculated as follows:

Cmax(π*) = min_{π} Cmax(π)    (5)

The goal of this article is to find a solution π* in the set of all schedules such that

π* = arg min_{π} Cmax(π)    (6)

Obviously, the model used in this section is a permutation-based model, which consists of several equations to calculate the completion time of the furnaces on each machine to obtain the minimum completion time.

3 Learning Memetic Algorithm

3.1 Encoding and Decoding, Initialization

The most common representation of a solution to the parallel machine scheduling problem is an array of jobs per machine, which represents the processing order of the assigned jobs [5]. We first randomly generate a non-repetitive sequence of positive integers (from 1 to the total number of furnaces Nf). Then, according to the earliest completion time (ECT) decoding rule, each furnace is assigned to one of the m machines. That is, a solution consists of m one-dimensional integer arrays whose elements represent the sequence of furnaces assigned to each machine. To facilitate understanding, we give the encoding and decoding of an instance with 10 furnaces and 3 machines, from which an initial solution is obtained. Table 1 shows the processing times of the different furnaces on the different machines. We first randomly generate an initial solution π = [9, 2, 6, 7, 8, 5, 3, 10, 4, 1] and then decode it according to the ECT rule. A feasible solution of the MSP is shown in Fig. 1.

Table 1. Processing time of furnaces on machine 1, machine 2 and machine 3

Furnace | M1  | M2  | M3
1       | 150 | 152 | 210
2       | 164 | 169 | 197
3       | 172 | 161 | 176
4       | 179 | 155 | 183
5       | 154 | 208 | 154
6       | 170 | 214 | 175
7       | 204 | 172 | 198
8       | 209 | 200 | 171
9       | 161 | 173 | 182
10      | 167 | 207 | 169

We can see that furnaces 9, 5, 3 and 1 are assigned to machine 1 for processing, furnaces 2, 7 and 4 are processed sequentially on machine 2, and furnaces 6, 8 and 10 are processed by machine 3; the blank spaces indicate the setup times tR and tM, respectively. The maximum completion time is determined by machine 1 and is calculated as Cmax(π) = 161 + 10 + 154 + 10 + 172 + 36 + 150 = 693. For metaheuristic algorithms, a good initial solution can reduce time consumption and improve solution quality. Our initialization strategy randomly generates a set of non-repeating positive integer sequences, which guarantees feasible solutions and increases the diversity of the population.

Fig. 1. Encoding and decoding (initial solution)
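To make the decoding example concrete, the following minimal Python sketch (our own illustration, not from the paper; function and variable names are assumptions) computes the completion time of one machine's furnace sequence under the setup rules of Eqs. (3)–(4), reproducing the value 693 for the sequence [9, 5, 3, 1] on machine 1 of Table 1.

```python
# Minimal sketch (our own illustration): completion time of one machine's
# furnace sequence under the setup rules of Eqs. (3)-(4).
def machine_completion_time(seq, proc_time, t_r=10, t_m=36, t_cycle=480):
    """seq: furnaces in processing order; proc_time: furnace -> time on this machine."""
    total, accumulated = 0, 0          # accumulated processing time since the last maintenance
    for idx, furnace in enumerate(seq):
        if idx > 0:                    # Eq. (3): no setup before the first furnace
            if accumulated >= t_cycle: # Eq. (4): maintenance once the cycle TM is reached
                total += t_m
                accumulated = 0
            else:                      # otherwise only the quartz crucible is replaced
                total += t_r
        total += proc_time[furnace]
        accumulated += proc_time[furnace]
    return total

# Machine 1 processing times from Table 1 for furnaces 9, 5, 3, 1.
m1_times = {9: 161, 5: 154, 3: 172, 1: 150}
print(machine_completion_time([9, 5, 3, 1], m1_times))  # 161+10+154+10+172+36+150 = 693
```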


3.2 Learning Mechanism

Search strategies based on learning mechanisms are widely used to improve the search performance of MA. For LMA, probabilistic models are used to learn problem-specific knowledge and generate offspring. As pointed out by Li [12], an accurate probabilistic model helps to improve the global exploration performance of MA, so a suitable probabilistic model is crucial for the LMA. The application of statistical learning to combinatorial optimization problems has attracted more and more researchers. The estimation of distribution algorithm (EDA) is a metaheuristic based on statistical learning: it builds a probabilistic model to estimate the distribution of promising solutions and generates new individuals by sampling that model. Unlike traditional evolutionary algorithms, EDA's evolutionary mechanism can avoid destroying the building blocks of promising solutions [9]. Therefore, the probabilistic learning mechanism of EDA is also used in this paper.

3.3 Local Optimization Strategy

The solution generated by the probabilistic learning mechanism of EDA is generally not the optimal solution of the problem, so further improvement is needed. It is generally believed that introducing an effective local search strategy into a metaheuristic increases the depth of its exploitation of the solution space and thereby improves the search performance of the algorithm. In this subsection, we design four self-adaptive neighborhood search operators based on the characteristics of the problem and then propose a local search strategy based on simulated annealing (SA) and variable neighborhood descent (VND) to obtain an improved solution. The four neighborhood operators (two of which are sketched in code after this list) are described as follows:
• Swap_furnace: a furnace-based neighborhood exchange operation. On the same machine, randomly select four positions, exchange the furnaces at position 1 and position 3, and exchange the furnaces at position 2 and position 4, as shown in Fig. 2.
• Swap_machine: a machine-based neighborhood exchange operation. First select two machines, select one position on each machine, and then exchange the furnaces at these two positions, as shown in Fig. 3.
• Insert_furnace: a furnace-based neighborhood insertion operation. On the same machine, arbitrarily select two positions, then insert the furnace at the later position before the other position, as shown in Fig. 4.
• Insert_machine: a machine-based neighborhood insertion operation. Similar to Insert_furnace: select two machines, select a position on each machine, and then insert the furnace from one position in front of the other position, as shown in Fig. 5.
The local search strategy is then built from these neighborhood operators. The local search pseudocode based on VND and SA is shown in Algorithm 1.
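As a rough illustration (our own sketch, not the authors' code), the two furnace-based operators can be written as simple list manipulations on one machine's furnace sequence; the machine-based variants act analogously on two different machines.

```python
import random

# Rough sketch of the furnace-based neighborhood moves described above
# (names follow the paper; the implementation details are our assumption).
def swap_furnace(seq):
    """Swap the furnaces at two pairs of randomly chosen positions of one machine."""
    s = seq[:]
    if len(s) >= 4:
        p1, p2, p3, p4 = random.sample(range(len(s)), 4)
        s[p1], s[p3] = s[p3], s[p1]
        s[p2], s[p4] = s[p4], s[p2]
    return s

def insert_furnace(seq):
    """Remove the furnace at one random position and reinsert it before another."""
    s = seq[:]
    if len(s) >= 2:
        src, dst = random.sample(range(len(s)), 2)
        furnace = s.pop(src)
        s.insert(dst, furnace)
    return s

print(swap_furnace([9, 5, 3, 1, 7, 2]))
print(insert_furnace([9, 5, 3, 1, 7, 2]))
```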

Fig. 2. An example of Swap_furnace

Fig. 3. An example of Swap_machine

Fig. 4. An example of Insert_furnace

Fig. 5. An example of Insert_machine

The proposed local search strategy can perform deep mining in different promising regions, which increases the search performance of the algorithm and helps it reach the optimal solution of the problem.

3.4 LMA for the MSP Scheduling Problem

The basic idea of the proposed LMA is to hybridize probabilistic learning and a local optimization strategy within the framework of an evolutionary algorithm. The purpose is to achieve an appropriate balance between exploration and exploitation during the search: valuable information is learned and extracted from the population to obtain promising high-quality solutions, which are further improved by local optimization. The combination of probabilistic learning and local optimization therefore enables the algorithm to continuously explore new promising search areas and to deeply exploit specific areas. The pseudo-code of LMA is shown in Algorithm 2. LMA starts with an initialization procedure to obtain a set of high-quality solutions and updates the population; then the local optimization process is called to further improve them. It can be seen that LMA not only uses a probability model for global exploration but also uses problem-dependent local search to exploit high-quality individuals. A good balance between exploration and exploitation is achieved, and better results are expected when solving the MSP scheduling problem.


Algorithm 1: Local search based on VND and SA
Input: The solution π, S = 2*f
Output: The best found solution π*
begin
  // Improve the current solution π
  π' = π, h = 1, flag = true;
  while h ≤ S do
    Switch_pro = random();              // select neighborhood by probability
    if Switch_pro ≤ 0.5 then
      π'' ← Perform Swap_furnace(π');
    else
      π'' ← Perform Insert_furnace(π');
    end if
    if f(π') ≤ f(π'') then
      π* ← π'; flag = true;
    else
      Switch_pro = random();            // select neighborhood by probability
      if Switch_pro ≤ 0.5 then
        π'' ← Perform Swap_machine(π');
      else
        π'' ← Perform Insert_machine(π');
      end if
      if f(π') ≤ f(π'') then
        π* ← π';
      else
        π* ← π'';
      end if
    end if
    h = h + 1;
  end while
  Perform SA(π*);                       // perform SA to further optimize the solution
  return the best found solution π*;
end

4 Simulation Results and Comparison

4.1 Experimental Setup

The MSP problem is abstracted from a practical production process and is proposed here for the first time, so no benchmark instances exist. Therefore, we randomly generate test instances according to the combinations of f ∈ {10, 20, 30, 40, 60, 70, 90} and m ∈ {3, 5, 10, 15, 20}. The processing time P(π_k^{T(i)}) is randomly generated from the uniform distribution on [150, 220] (close to actual production), and the setup time St(π_{k-1}^{T(i)}, π_k^{T(i)}) is determined by tR and tM, as detailed in Sect. 2. The population size is popsize = 30 and the learning rate is α = 0.3. All algorithms and experiments were programmed in Delphi 2007 on a computer with a 2.50 GHz CPU, 16 GB RAM, and Windows 11 as the operating system.
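For illustration only (the paper does not publish its generator, and its experiments were implemented in Delphi), an instance of the form described above could be produced with a few lines like the following; the function name and structure are our assumptions.

```python
import random

# Hypothetical instance generator matching the description in Sect. 4.1:
# processing times drawn uniformly from [150, 220] for every furnace/machine pair.
def generate_instance(num_furnaces, num_machines, low=150, high=220, seed=None):
    rng = random.Random(seed)
    return [[rng.randint(low, high) for _ in range(num_machines)]
            for _ in range(num_furnaces)]

instance = generate_instance(10, 3, seed=1)   # e.g. f = 10, m = 3
print(instance[0])                            # processing times of furnace 1 on the 3 machines
```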

Algorithm 2: The LMA algorithm for the MSP scheduling problem
Input: Problem instance I with a minimization objective fit, population size k, learning rate α = 0.3
Output: The best found solution π*
begin
  // generate a population (POP)
  POP ← Initialize Population();
  π* ← arg min{fit(π_i) : i = 1, 2, ..., k};
  // sampling learning and update population
  POP_new ← Learning Procedure(POP, k);
  while a stopping condition is not reached do
    // improve the current solution π
    π' ← VND Optimize(π);
    π'' ← SA Optimize(π');
    // update the best solution found so far
    if f(π') ≤ f(π'') then
      π* ← π';
    else
      π* ← π'';
    end if
  end while
  return the best found solution π*;
end

4.2 Computational Results and Comparison

To verify the effectiveness of LMA, it is compared with HGA [15] and EVNS [16]. Since these algorithms were not designed for the problem considered here, we adapt them using the ECT criterion described in Sect. 3.1. The best result for each problem is shown in bold, and the results for the different problems are listed in Table 2, where AVG is the average of the best results over 20 runs and BST is the best value found over 20 independent runs.

Table 2. Comparison of LMA with HGA and EVNS

Instances | HGA         | EVNS        | LMA
f × m     | AVG  | BST  | AVG  | BST  | AVG  | BST
10 × 3    | 666  | 659  | 659  | 659  | 659  | 659
20 × 3    | 1245 | 1207 | 1221 | 1206 | 1206 | 1206
30 × 5    | 1077 | 1057 | 1060 | 1053 | 1054 | 1050
30 × 10   | 515  | 503  | 504  | 499  | 500  | 499
40 × 5    | 1467 | 1442 | 1445 | 1427 | 1429 | 1418
40 × 10   | 728  | 710  | 710  | 698  | 693  | 684
60 × 10   | 1089 | 1060 | 1068 | 1055 | 1057 | 1050
60 × 15   | 727  | 716  | 710  | 699  | 690  | 669
60 × 20   | 515  | 499  | 500  | 493  | 495  | 492
70 × 10   | 1286 | 1263 | 1267 | 1248 | 1236 | 1225
70 × 15   | 860  | 851  | 849  | 843  | 843  | 841
70 × 20   | 650  | 647  | 646  | 644  | 642  | 641
90 × 10   | 1660 | 1627 | 1634 | 1614 | 1607 | 1596
90 × 15   | 1086 | 1066 | 1063 | 1042 | 1045 | 1036
90 × 20   | 848  | 840  | 840  | 836  | 836  | 834

As can be seen from Table 2, the results of LMA are significantly better than those of HGA and EVNS on most instances, indicating that LMA has better performance in solving the MSP scheduling problem.

5 Conclusions and Future Work

We proposed a new MSP scheduling problem for the first time and modeled it as an unrelated parallel machine scheduling problem with maintenance cycle and machine setup time. To solve this problem, we designed four neighborhood search operators based on the problem features and then proposed an LMA with a local search strategy based on VND and SA built on these neighborhood operators. The effectiveness of LMA in solving the MSP problem is verified by simulation experiments. Future work can proceed in two directions. First, this study mainly discusses the learning mechanism; it is worth studying other learning models, such as Bayesian networks, to obtain more experimental data and identify the best learning mechanism. Second, the MSP problem considered here is an idealized scheduling problem; more complex scheduling modes can be considered (e.g., fuzzy processing times or an integrated scheduling problem for monocrystalline silicon production and transportation).

Acknowledgements. This research was supported by the National Natural Science Foundation of China (62173169 and 61963022), the Basic Research Key Project of Yunnan Province (202201AS070030), and Yunnan Fundamental Research Projects (grant NO. 202301AT070458).

References
1. Pan, Q.-K.: An effective co-evolutionary artificial bee colony algorithm for steelmaking-continuous casting scheduling. Eur. J. Oper. Res. 250 (2015)
2. Adan, J.: A hybrid genetic algorithm for parallel machine scheduling with setup times. J. Intell. Manuf. 33, 1–15 (2022)
3. Sels, V., Coelho, J., Dias, A., Vanhoucke, M.: Hybrid tabu search and a truncated branch-and-bound for the unrelated parallel machine scheduling problem. Comput. Oper. Res. 53, 107–117 (2015)
4. Meiyao, L., Lei, D.: An artificial bee colony with division for distributed unrelated parallel machine scheduling with preventive maintenance. Comput. Ind. Eng. 141, 106320 (2020)


5. Fang, W., Zhu, H., Mei, Y.: Hybrid meta-heuristics for the unrelated parallel machine scheduling problem with setup times. Knowl. Based Syst. 241, 108193 (2022)
6. Sheremetov, L., Muñoz, J., Chi-Chim, M.: Two-stage genetic algorithm for parallel machines scheduling problem: cyclic steam stimulation of high viscosity oil reservoirs. Appl. Soft Comput. 64 (2017)
7. Berthier, A., Yalaoui, A., Chehade, H., Yalaoui, F., Amodeo, L., Bouillot, C.: Unrelated parallel machines scheduling with dependent setup times in textile industry. Comput. Ind. Eng. 174, 108736 (2022)
8. Yalaoui, F., Chu, C.: An efficient heuristic approach for parallel machine scheduling with job splitting and sequence-dependent setup times. IIE Trans. 35, 183–190 (2003)
9. Zhang, Z.-Q., Hu, R., Qian, B., Jin, H.-P., Wang, L., Yang, J.-B.: A matrix cube-based estimation of distribution algorithm for the energy-efficient distributed assembly permutation flow-shop scheduling problem. Expert Syst. Appl. 194 (2022)
10. Wu, X., Che, A.: A memetic differential evolution algorithm for energy-efficient parallel machine scheduling. Omega 82 (2018)
11. Wang, J.-j., Wang, L.: A cooperative memetic algorithm with feedback for the energy-aware distributed flow-shops with flexible assembly scheduling. Comput. Ind. Eng. 168, 108126 (2022)
12. Li, Z., Tang, L., Liu, J.: A memetic algorithm based on probability learning for solving the multidimensional knapsack problem. IEEE Trans. Cybern. PP, 1–16 (2020)
13. Wang, J.-j., Wang, L.: A cooperative memetic algorithm with learning-based agent for energy-aware distributed hybrid flow-shop scheduling. IEEE Trans. Evol. Comput., 1 (2021)
14. Mao, J.-Y., Pan, Q.-K., Miao, Z.-H., Gao, L., Chen, S.: A hash map-based memetic algorithm for the distributed permutation flowshop scheduling problem with preventive maintenance to minimize total flowtime. Knowl. Based Syst. 242, 108413 (2022)
15. Joo, C., Kim, B.: Hybrid genetic algorithms with dispatching rules for unrelated parallel machine scheduling with setup time and production availability. Comput. Ind. Eng. 85 (2015)
16. Abdullah, S., Turky, A., Ahmad Nazri, M.Z., Sabar, N.: An evolutionary variable neighbourhood search for the unrelated parallel machine scheduling problem. IEEE Access, 1 (2021)

Learning Variable Neighborhood Search Algorithm for Solving the Energy-Efficient Flexible Job-Shop Scheduling Problem

Ying Li1, Rong Hu1,2(B), Xing Wu2, Bin Qian1, and Zi-Qi Zhang1

1 School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
[email protected]
2 School of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming 650500, China

Abstract. With an increasingly challenging environment, green manufacturing is attracting widespread attention. Moreover, facing fierce market competition and frequently changing product demand, flexible manufacturing is closer to real-life applications. In this article, we propose a probability matrix-based learning variable neighborhood search (LVNS_PM) algorithm for solving the energy-efficient flexible job-shop scheduling problem (EE_FJSP). The optimization objective is to minimize the makespan and the total energy consumption (TEC). Concretely, on the one hand, a problem-dependent three-dimensional probability matrix is proposed as an information accumulation matrix to learn and retain the characteristic information of elite solutions, which guides the algorithm's global exploration. On the other hand, we design multiple problem-specific neighborhood operators and present a variable neighborhood search procedure, which balances the algorithm's global exploration and local exploitation. Finally, simulation studies and computational comparisons demonstrate the practicality and robustness of LVNS_PM.

Keywords: Learning Variable Neighborhood Search · Energy-Efficient Scheduling · Flexible Job-shop Scheduling

1 Introduction

To cope with increasingly serious global environmental issues, green manufacturing (GM) is receiving growing attention [1]. Moreover, the flexible job-shop scheduling problem (FJSP) removes the limitation of resource uniqueness: each operation can be processed by multiple machines. Compared to the job-shop scheduling problem (JSP), FJSP is NP-hard [2] and requires not only operation sequencing but also machine selection. Accordingly, to adapt to the modern discrete manufacturing production mode and achieve energy saving and emission reduction, the energy-efficient FJSP (EE_FJSP) was proposed and has gradually attracted researchers' attention.


Most existing research on FJSP applies intelligent optimization algorithms (IOA) to optimize objectives related to the maximum completion time (makespan) and achieves good performance, for example the genetic algorithm (GA) [3], differential evolution [4], the memetic algorithm [5], and a hybrid many-objective evolutionary algorithm [6]. However, most of these are improvements of existing GAs under the classical evolutionary algorithm framework. In addition, they rarely design problem-specific evolutionary operators, mostly using mapping schemes to convert the discrete FJSP into a continuous problem. Although this simplifies the operators, it does not fully utilize the problem's information and leads to unfocused exploration. Furthermore, the above works on FJSP ignored machine energy consumption (EC). With the promotion of GM, EC reduction is drawing increasing interest from scholars. Wu [7] designed an improved NSGA-II for solving the FJSP to minimize the makespan, EC, and the number of on/off switches. Li [8] proposed an improved artificial bee colony algorithm for a multi-objective low-carbon FJSP. Wei [9] developed a hybrid energy-efficient scheduling measure for the FJSP with variable machining speeds. Although these works contribute to emission reduction, the frequent on/off strategy wears the machines and shortens their lifetime. Considering that most machine EC is spent in execution, including processing and standby, reducing EC by shortening machines' standby time or choosing low-power machines is promising. Thus, it is of great theoretical value and practical significance to model the EE_FJSP and design a suitable algorithm.

The variable neighborhood search (VNS) algorithm is a metaheuristic [10]. Unlike population-based IOAs, VNS has no "population" concept; it focuses on rapidly reaching high-quality solutions in the solution space through fast transformations between different neighborhood operators. The VNS algorithm has therefore been widely applied in the production scheduling field [10], since it has few parameters, a simple structure, and good convergence. Furthermore, a probability matrix learns and accumulates the structural features of good solutions through statistics and then yields offspring by sampling, giving better global search capability. Inspired by the above, to fully utilize the problem's characteristics and achieve a good trade-off between global and local search, a probability matrix-based learning variable neighborhood search (LVNS_PM) algorithm is proposed to solve the EE_FJSP. Most works on probability matrices adopt a two-dimensional matrix (2DM) with good effect [11, 12]. Nevertheless, a 2DM can hardly retain the information of job blocks and their positions simultaneously, which leads to the destruction of block structures. Fortunately, a few studies on the three-dimensional matrix (3DM) have emerged for various scheduling problems. Zhang [13] designed a matrix-cube-based EDA (MCEDA) for the distributed assembly permutation flow-shop problem (FSP); this algorithm records the information of high-quality solutions and avoids damaging the block structure. Qian [14] also proposed an MCEDA for the no-wait FSP with sequence-dependent setup times and release times. Guo [15] proposed a three-dimensional ant colony optimization algorithm for the green multi-compartment vehicle routing problem.
The above literature research shows that there are relatively few studies on 3DM, and there are no reports on the application of 3DM for EE_FJSP. Therefore, we develop a three-dimensional

probability matrix-based VNS algorithm and design various neighborhood optimization operators to solve the EE_FJSP. The remaining parts of this article are organized as follows. Sect. 2 gives the problem description. Sect. 3 gives the details of the LVNS_PM algorithm. Some experimental results and test instances are given in Sect. 4. Sect. 5 summarizes the conclusions.

2 Problem Description

The EE_FJSP can be briefly described as follows: n jobs need to be processed on m machines. Each job i has ni operations; each operation can be processed on one or more machines, and the processing times differ. Figure 1 illustrates the difference between JSP and FJSP, where machine h is denoted Mh. A solid arrow denotes that the machine for the next operation is fixed, while a dotted arrow denotes that the machine for the next operation can be selected. So the EE_FJSP not only addresses operation sequencing but also machine selection. The relevant symbols and descriptions are given in Table 1.

Fig. 1. JSP vs FJSP.

According to the above description, the mathematical model is built as follows:

f(1) = min(max Cijh)    (1)

f(2) = min(TEC_P + TEC_I)    (2)

TEC_P = Σ_{i=1}^{n} Σ_{j=1}^{ni} Σ_{h=1}^{m} (EPh × Tijh × Xijh)    (3)

TEC_I = Σ_{i=1}^{n} Σ_{j=1}^{ni} Σ_{h=1}^{m} (EIh × Th)    (4)

Sijh + Tijh ≤ Cijh, ∀i, j, h.    (5)

Σ_{h=1, h≠t}^{m} Yiht = 1, ∀i, t.    (6)

Table 1. Symbol description.

Parameters:
• n, m: number of jobs and machines, respectively
• i, h: index of jobs and machines, respectively, where i = 1, ..., n and h = 1, ..., m
• U: total number of operations
• ni: number of operations of job i, where j = 1, ..., ni
• Oij: the jth operation of job i, Oi,1, ..., Oi,ni
• Tijh, Th: processing time of Oij on machine h and idle time of machine h, respectively
• Sijh, Cijh: start time and completion time of Oij on machine h, respectively
• Q: a large enough positive number
• EPh, EIh: per-unit EC of machine h at processing and at standby, respectively
• TEC_P, TEC_I: total processing EC and total standby EC, respectively

Decision variables:
• Xijh: equals 1 if Oij is processed on machine h, else 0
• Yiht: equals 1 if machine t is the successor of machine h for job i, else 0
• Ahir: equals 1 if job r is the successor of job i on machine h, else 0

(Siqh + Tijh) Yiht ≤ Sijh, ∀i, q, t, j ≠ q, h ≠ t.    (7)

Srjh + Tijh ≤ Sijh + Q(1 − Ahir), ∀r, i ≠ r, j, h.    (8)

Equation (1) is the criterion for minimizing the makespan, and Eqs. (2)–(4) define the criterion for minimizing TEC. Equation (5) is used in calculating the makespan. Equations (6)–(7) restrict each job to be processed on at most one machine at any time, and Eq. (8) limits each machine to processing at most one job at any time. Some constraints must be met: 1) Only one machine can be selected for an operation, and one operation can be processed by only one machine at any time. 2) Machines cannot be interrupted during processing. 3) There is no processing priority between jobs, but there is between operations of the same job. 4) The processing time of each operation on each machine is known.
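To illustrate objectives (2)–(4), a minimal sketch (our own illustration; the data structures and names are assumptions) could compute the two energy terms as follows.

```python
# Minimal sketch (our assumption of data structures) of the TEC objective in Eqs. (2)-(4):
# processing energy = sum of EP_h * T_ijh over assigned operations,
# idle energy       = sum of EI_h * idle time of each machine.
def total_energy(operations, ep, ei, idle_time):
    """operations: list of (machine h, processing time T_ijh) for every scheduled O_ij;
    ep/ei: per-unit processing/standby energy of each machine; idle_time: idle time per machine."""
    tec_p = sum(ep[h] * t for h, t in operations)
    tec_i = sum(ei[h] * idle_time[h] for h in idle_time)
    return tec_p + tec_i

ops = [(0, 3), (1, 2), (0, 4)]                 # (machine index, processing time), toy data
print(total_energy(ops, ep={0: 5, 1: 4}, ei={0: 1, 1: 1}, idle_time={0: 2, 1: 3}))  # 48
```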

3 LVNS_PM for EE_FJSP This section details the LVNS_PM, starting with the solution representation and population initialization, then the global search guided by a three-dimensional probability matrix, and immediately followed by the variable neighborhood local search. The framework of the LVNS_PM algorithm is first outlined as follows: Step 1: Set the related parameters of LVNS_PM;

Step 2: Initialize the population PG (G = 0) and the probability matrix, and set the archive (non-dominated solution set, i.e., Pareto solution set) T = ∅, G = 1, 2, ..., MaxG;
Step 3: Evaluate all individuals in PG, perform non-dominated ranking and crowding distance calculation for each solution, and update the archive T [16];
Step 4: G = MaxG? If yes, go to Step 7; else, go to Step 5;
Step 5: Generate the new population:
Step 5.1: Select the top 30% of solutions based on the ranking results as the subpopulation SPG;
Step 5.2: Calculate the information accumulation matrix M^G_{U×n×n} according to SPG and update the probability matrix A^G_{U×n×n};
Step 5.3: Sample the probability matrix A^G_{U×n×n} to generate the new population PG;
Step 5.4: Perform VNS for each non-dominated solution in PG;
Step 6: G = G + 1, go to Step 3;
Step 7: Output all individuals in T.

3.1 Chromosome Representation and Initialization

The EE_FJSP contains two subproblems: machine selection and operation sequencing. A feasible solution is composed of two vectors, as in Fig. 2, with n = 3, m = 3, U = 8, n1 = 3, n2 = 2, and n3 = 3. The operation sequence O consists of job numbers, and the machine sequence M consists of machine numbers. For decoding, we use the proven FJSP decoding method of [17]. The "pre-interpolation" in this decoding method enables each operation to start as early as possible, thus reducing both the makespan and TEC by shortening machines' idle time (IT). As the Gantt chart in Fig. 2 shows, operations O21 and O22 are inserted forward into the machines' idle time. The initial population is generated randomly.

Fig. 2. Feasible solution example and pre-interpolation decoding.
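As a simplified illustration of this two-vector encoding (our own sketch; the paper's actual decoding additionally applies the pre-interpolation left-shift, which is omitted here), a solution can be stored as an operation sequence and a machine assignment and decoded greedily into a makespan.

```python
# Simplified decoding sketch (our own; without the paper's pre-interpolation step):
# O lists job numbers in processing order; M gives the machine chosen for the
# corresponding operation at the same position of O.
def decode(O, M, proc_time):
    """proc_time[(job, op_index, machine)] -> processing time; returns the makespan."""
    machine_free = {}          # machine -> time it becomes available
    job_free = {}              # job -> completion time of its previous operation
    op_count = {}              # job -> index of its next operation
    makespan = 0
    for job, machine in zip(O, M):
        j = op_count.get(job, 0)
        start = max(machine_free.get(machine, 0), job_free.get(job, 0))
        finish = start + proc_time[(job, j, machine)]
        machine_free[machine] = finish
        job_free[job] = finish
        op_count[job] = j + 1
        makespan = max(makespan, finish)
    return makespan

times = {(1, 0, 3): 2, (2, 0, 1): 3, (1, 1, 2): 4, (2, 1, 3): 2}   # toy data
print(decode(O=[1, 2, 1, 2], M=[3, 1, 2, 3], proc_time=times))     # 6
```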

3.2 Three-Dimensional Matrix Guide for Global Search

First, let ps and sps denote the sizes of PG and SPG, respectively. SPG = {π_{s,1}^G, π_{s,2}^G, ..., π_{s,sps}^G}, where π_{s,k}^G (k = 1, 2, ..., sps) is the kth individual in SPG and π_{s,k}^{G,O} = [π_{s,k}^{G,O}(1), π_{s,k}^{G,O}(2), ..., π_{s,k}^{G,O}(U)] is its operation sequence. In π_{s,k}^{G,O}, a job block is defined as two successive adjacent jobs.

For example, in π_{s,k}^{G,O} = [1, 1, 2, 3] the job blocks at the three positions (x = 1, 2, 3) are (y, z) = [1, 1], [1, 2], [2, 3]. Define a three-dimensional information accumulation matrix M^G_{U×n×n} to reserve the job blocks and their position information of π_{s,k}^{G,O} (k = 1, 2, ..., sps) in SPG. The details of M^G_{U×n×n} are described as follows:

I_{xyz}^{G,k}(x, y, z) = { 1, if y = π_{s,k}^{G,O}(x) and z = π_{s,k}^{G,O}(x + 1); 0, else }, ∀x, y, z    (9)

M^G_{U×n×n}(x, y, z) = Σ_{k=1}^{sps} I_{xyz}^{G,k}, x = 1, 2, ..., U − 1; y, z = 1, 2, ..., n.    (10)

I_{xyz}^{G,k} in Eq. (9) is the indicator function used to record the position of the job block (y, z) in the kth elite individual in SPG. In Eq. (10), M^G_{U×n×n}(x, y, z) is the total number of occurrences of job block (y, z) at position x in SPG. Second, we create the three-dimensional probability matrix A^G_{U×n×n} to reserve the job block distribution features of SPG; that is, the elements of M^G_{U×n×n} are converted to the corresponding probability values and kept in A^G_{U×n×n}. Let A^G_{U×n×n}(x, y, z) be the element of A^G_{U×n×n}, which represents the probability that the job block at position x is (y, z) in π_{s,k}^{G,O}. A^G_{U×n×n} is initialized by Eq. (11) and updated by Eq. (12), where α is the learning rate. The total probability of the different job blocks at position x in SPG (i.e., the sum of all elements in A^G_{U×n×n}(x)) is N_A^G(x) = Σ_{y=1}^{n} Σ_{z=1}^{n} A^G_{U×n×n}(x, y, z). The total number of job blocks at position x in SPG is N_M^G(x) = Σ_{y=1}^{n} Σ_{z=1}^{n} M^G_{U×n×n}(x, y, z).

A^G_{U×n×n}(x, y, z) = { 1/U, x = 1, y, z = 1, ..., n; 1/U², x = 2, 3, ..., U − 1, y, z = 1, ..., n }    (11)

A^{G+1}_{U×n×n}(x, y, z) = (1 − α) × A^G_{U×n×n}(x, y, z) + α × M^G_{U×n×n}(x, y, z) / N_M^G(x), G > 1, x = 1, 2, ..., U − 1, y, z = 1, 2, ..., n.    (12)

Finally, define π_k^O = [π_k^O(1), π_k^O(2), ..., π_k^O(U)] as the operation sequence of the kth individual in PG. It can be obtained by sampling A^G_{U×n×n}; thereafter, random selection is adopted to generate a legal machine sequence. Let NewJob(π_k^O, x) be the job selection function that determines the candidate job Jx at the xth position of π_k^O. Since the probability of the job block [π_k^O(x − 1), π_k^O(x)] being selected is stored in the (x − 1)th layer of A^{G−1}_{U×n×n}, NewJob(π_k^O, x) only needs to sample from the π_k^O(x − 1) row of A^{G−1}_{U×n×n}(x − 1), i.e., A^{G−1}_{U×n×n}(x − 1, π_k^O(x − 1)). The detailed steps of NewJob(π_k^O, x) are as follows:
Step 1: Generate a random number γ, γ ∈ [0, Σ_{h=1}^{n} A^{G−1}_{U×n×n}(x − 1, π_k^O(x − 1), h)];
Step 2: If γ ∈ [0, A^{G−1}_{U×n×n}(x − 1, π_k^O(x − 1), 1)], set Jx = 1; else, go to Step 3;
Step 3: If γ ∈ [Σ_{h=1}^{t} A^{G−1}_{U×n×n}(x − 1, π_k^O(x − 1), h), Σ_{h=1}^{t+1} A^{G−1}_{U×n×n}(x − 1, π_k^O(x − 1), h)], set Jx = t + 1.
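A compact sketch of this learning-and-sampling scheme is given below (our own illustration, not the authors' code; array names, the uniform fallback, and the omission of the machine vector and of the repair needed to keep each job's operation count feasible are all assumptions).

```python
import numpy as np

# Sketch of the 3D accumulation/probability matrices of Eqs. (9)-(12) and a
# roulette-wheel job selection in the spirit of NewJob (simplified illustration).
def accumulate(elite_sequences, U, n):
    """Count job blocks (y, z) at every position x over the elite operation sequences."""
    M = np.zeros((U - 1, n, n))
    for seq in elite_sequences:                       # seq entries are jobs 1..n
        for x in range(U - 1):
            M[x, seq[x] - 1, seq[x + 1] - 1] += 1
    return M

def update_probability(A, M, alpha=0.2):
    """Eq. (12): smooth A toward the per-position block frequencies."""
    freq = M / np.maximum(M.sum(axis=(1, 2), keepdims=True), 1)
    return (1 - alpha) * A + alpha * freq

def sample_sequence(A, U, n, rng):
    """Build an operation sequence by roulette-wheel sampling of the previous job's row."""
    seq = [int(rng.integers(1, n + 1))]               # first job chosen uniformly here
    for x in range(1, U):
        row = A[x - 1, seq[-1] - 1]
        probs = row / row.sum() if row.sum() > 0 else np.full(n, 1.0 / n)
        seq.append(int(rng.choice(np.arange(1, n + 1), p=probs)))
    return seq

rng = np.random.default_rng(0)
U, n = 8, 3
elites = [[1, 1, 2, 3, 1, 2, 3, 3], [1, 2, 1, 3, 2, 1, 3, 3]]
A = np.full((U - 1, n, n), 1.0 / U**2)                # rough uniform initialization, cf. Eq. (11)
A = update_probability(A, accumulate(elites, U, n))
print(sample_sequence(A, U, n, rng))
```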

3.3 Variable Neighborhood Local Search

The critical path (CT) is defined as the longest path from the start of the first processed operation to the completion of the last one; operations on the critical path are called critical operations. The CT is marked in the Gantt chart of Fig. 2, where O12, O13, O31 and O32 are the critical operations. In other words, the makespan is decided by the CT. Thus, four neighborhood operators aiming at reducing the makespan and two aiming at reducing TEC are designed (a code sketch of one of these moves follows this subsection); the details are as follows:
(1) Cmax_N1: Randomly select two critical operations that do not belong to the same job and swap their positions.
(2) Cmax_N2: Randomly select a critical operation and a non-critical operation that do not belong to the same job and swap their positions.
(3) Cmax_N3: Randomly select two critical operations π_k^O(u) and π_k^O(v) (u ≠ v), then insert π_k^O(u) before or after π_k^O(v).
(4) Cmax_N4: Randomly select a critical operation π_k^O(u) and a non-critical operation π_k^O(v) (u ≠ v), then insert π_k^O(u) before or after π_k^O(v).
(5) TEC_N5: Randomly select an operation and assign it to the machine with the lowest energy consumption.
(6) TEC_N6: Randomly select an operation and assign it to the machine with the minimum processing time.
For the example of Fig. 2, the six neighborhood operators are depicted in Fig. 3. The steps of the VNS in LVNS_PM are as follows:
Step 1: For each solution in the non-dominated solution set, find a critical path; if there are multiple critical paths, select one at random;
Step 2: Execute the above six neighborhood operations sequentially for each non-dominated solution, evaluating the solution after each neighborhood operation;
Step 2.1: If the new solution dominates the old solution, replace the old solution, redefine the critical path, and go to the next non-dominated solution;
Step 2.2: If the two solutions do not dominate each other, add the new solution to the non-dominated solution set and proceed to the remaining neighborhood operations;
Step 2.3: Otherwise, keep the old solution;
Step 3: Sort all solutions and calculate the crowding distance, eliminating the solutions with the smallest crowding distance until the population size is ps.
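As one concrete example of these moves (our own sketch, not the authors' code; the data structures are assumptions), the energy-oriented neighborhood TEC_N5 can be written as reassigning a randomly chosen operation to the candidate machine with the lowest processing energy.

```python
import random

# Sketch of TEC_N5: move one randomly chosen operation to its cheapest-energy machine.
# machine_assignment: operation -> machine; candidates: operation -> feasible machines;
# energy(op, machine) is assumed to return EP_h * T_ijh for that pairing.
def tec_n5(machine_assignment, candidates, energy):
    new_assignment = dict(machine_assignment)
    op = random.choice(list(new_assignment))
    new_assignment[op] = min(candidates[op], key=lambda h: energy(op, h))
    return new_assignment

assignment = {"O11": 1, "O12": 2, "O21": 1}
cand = {"O11": [1, 2], "O12": [2, 3], "O21": [1, 3]}
energy_table = {("O11", 1): 6, ("O11", 2): 4, ("O12", 2): 5, ("O12", 3): 7,
                ("O21", 1): 3, ("O21", 3): 2}
print(tec_n5(assignment, cand, lambda op, h: energy_table[(op, h)]))
```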

4 Experiment Results Parameter calibration has a significant impact on the performance of the algorithm. The proposed LVNS_PM involves three key parameters, i.e., population size ps, subpopulation rate ϕ(sps = ps × ϕ), and learning rate α. Due to the wide and flexible range of each parameter value, we refer to the prior literature [13] and adopt trial-and-error to define the reasonable range of each parameter. Then the Design-of-Experiments [18] method is applied for parameter tuning to determine each parameter’s value, i.e., ps = 100, ϕ = 0.4, and α = 0.2. To verify the validity of the proposed LVNS_PM algorithm in this article, we use the benchmark dataset MK01-MK15 [19] and compare the LVNS_PM

Fig. 3. Illustration of the six neighborhood operators.

algorithm with two advanced algorithms, i.e., EMA [20] and NSGA-II [16]. The three algorithms are implemented in Python 3.10 and run on an Intel Core i5-9400 processor with a 2.90 GHz CPU and 16 GB RAM. To be fair, each instance is run 10 times with the same termination condition (i.e., n × m × 0.1 s). Moreover, the non-dominated solutions obtained by each algorithm are not evaluated immediately after each instance. Instead, a non-dominated ranking is performed over the ten non-dominated solution sets obtained by each algorithm to determine the two objective values for all algorithms on that instance, which makes the evaluation of each algorithm's non-dominated solutions fair. There are various multi-objective evaluation metrics; due to space limits, we adopt the following two metrics to validate the convergence and diversity of the solutions:
(1) Coverage Metric (CM): It measures the dominance relationship between two non-dominated solution sets E1 and E2, denoted C(E1, E2). C(E1, E2) represents the proportion of the solution set E2 that is dominated by solutions in E1; a larger C(E1, E2) means that the convergence of E1 is better than that of E2 (a small computational sketch is given after the results discussion below). It is represented as Eq. (13):

C(E1, E2) = |{e2 ∈ E2 | ∃ e1 ∈ E1 : e1 ≺ e2 or e1 = e2}| / |E2|    (13)

(2) Overall Nondominated Vector Generation (ONVG): It measures the size of the Pareto solution set E. A larger ONVG means that the algorithm has a wide search range and better solution diversity.
In Table 2, good results are highlighted in bold with gray shading. C(A, B) and C(A, D) are larger than C(B, A) and C(D, A), respectively, in most instances, which shows that the non-dominated solutions obtained by A dominate a large share of those obtained by B and D. In addition, the number of non-dominated solutions obtained by A is larger than that obtained by B and D in nearly all tests.

Table 2. Comparison of CM and ONVG of LVNS_PM, EMA, and NSGA-II (A: LVNS_PM, B: EMA, D: NSGA-II).

Instance | Size (n, m) | C(A, B) | C(B, A) | C(A, D) | C(D, A) | ONVG A | ONVG B | ONVG D
MK01     | 10, 6       | 1.000   | 0.000   | 1.000   | 0.000   | 20     | 15     | 13
MK02     | 10, 6       | 1.000   | 0.000   | 1.000   | 0.000   | 21     | 19     | 9
MK03     | 15, 8       | 1.000   | 0.000   | 1.000   | 0.000   | 14     | 12     | 4
MK04     | 15, 8       | 1.000   | 0.000   | 0.895   | 0.000   | 10     | 10     | 9
MK05     | 15, 4       | 0.857   | 0.000   | 0.308   | 0.433   | 10     | 6      | 5
MK06     | 10, 15      | 0.810   | 0.000   | 0.900   | 0.080   | 15     | 14     | 12
MK07     | 20, 5       | 0.722   | 0.286   | 0.864   | 0.095   | 9      | 13     | 10
MK08     | 20, 10      | 0.850   | 0.000   | 0.917   | 0.000   | 11     | 10     | 8
MK09     | 20, 10      | 0.800   | 0.214   | 0.895   | 0.071   | 14     | 8      | 7
MK10     | 20, 15      | 0.727   | 0.111   | 0.769   | 0.111   | 12     | 15     | 12
MK11     | 30, 5       | 1.000   | 0.000   | 0.053   | 0.769   | 9      | 5      | 4
MK12     | 30, 10      | 1.000   | 0.000   | 0.647   | 0.250   | 15     | 14     | 11
MK13     | 30, 10      | 0.857   | 0.267   | 0.450   | 0.400   | 19     | 14     | 12
MK14     | 30, 15      | 0.765   | 0.182   | 0.842   | 0.091   | 10     | 7      | 6
MK15     | 30, 15      | 0.722   | 0.150   | 0.955   | 0.050   | 10     | 8      | 8

Therefore, we can draw the conclusion that the proposed LVNS_PM solves the considered EE_FJSP well. Two reasons can be identified: 1) the strong global search capability of the three-dimensional probability matrix and its sampling mechanism; and 2) the variable neighborhood local search balances global exploration and local exploitation well, further enhancing the quality of the solutions.
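For reference, the coverage metric C(E1, E2) of Eq. (13) can be computed with a short routine such as the following (our own sketch; the dominance test assumes both objectives, makespan and TEC, are minimized).

```python
# Sketch of the coverage metric C(E1, E2) of Eq. (13) for two minimization objectives.
def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def coverage(e1, e2):
    """Fraction of solutions in E2 dominated by (or equal to) some solution in E1."""
    covered = sum(1 for b in e2 if any(dominates(a, b) or a == b for a in e1))
    return covered / len(e2)

front_a = [(10, 5), (12, 4)]       # (makespan, TEC) pairs, toy values
front_b = [(11, 6), (13, 3)]
print(coverage(front_a, front_b))  # 0.5: only (11, 6) is dominated
```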

5 Conclusions and Future Work

In this article, we propose an LVNS_PM algorithm for the energy-efficient flexible job-shop scheduling problem with the objectives of minimizing the makespan and TEC. LVNS_PM obtains better solutions on the 15 FJSP benchmark instances and shows good stability. However, the 3DM is more complex than a 2DM and takes more time in the initialization, updating, and sampling processes, so the computational complexity still needs to be optimized. This work can provide practical help and guidance for new manufacturing enterprises under emission-reduction requirements. Future work will focus on distributed and multi-objective problems with transportation or assembly, analyzing the problem properties and designing more efficient algorithms.

Acknowledgements. This research was supported by the National Natural Science Foundation of China (62173169 and 72201115), the Basic Research Key Project of Yunnan Province (202201AS070030), and the Basic Research Project of Yunnan Province (202201BE070001-050).

References
1. Mouzon, G., Yildirim, M.B., Twomey, J.: Operational methods for minimization of energy consumption of manufacturing equipment. Int. J. Prod. Res. 45(18–19), 4247–4271 (2007)
2. Meng, L.-l., Zhang, C., Shao, X., Ren, Y.: MILP models for energy-aware flexible job shop scheduling problem. J. Clean. Prod. (2019)
3. Lin, C.-S., Li, P.-Y., Wei, J.-M., Wu, M.-C.: Integration of process planning and scheduling for distributed flexible job shops. Comput. Oper. Res. 124, 105053 (2020)
4. Wu, X., Liu, X., Zhao, N.: An improved differential evolution algorithm for solving a distributed assembly flexible job shop scheduling problem. Memet. Comput., 1–21 (2019)
5. Luo, Q., Deng, Q.W., Gong, G.L., Zhang, L.K., Han, W.W., Li, K.X.: An efficient memetic algorithm for distributed flexible job shop scheduling problem with transfers. Expert Syst. Appl. 160(1), 113721 (2020)
6. Sun, J., Zhang, G., Lu, J., Zhang, W.: A hybrid many-objective evolutionary algorithm for flexible job-shop scheduling problem with transportation and setup times. Comput. Oper. Res. 132, 105263 (2021)
7. Wu, X., Sun, Y.: A green scheduling algorithm for flexible job shop with energy-saving measures. J. Clean. Prod. 172, 3249–3264 (2018)
8. Li, Y., Huang, W., Wu, R., Guo, K.: An improved artificial bee colony algorithm for solving multi-objective low-carbon flexible job shop scheduling problem. Appl. Soft Comput. 95, 106544 (2020)
9. Wei, Z., Liao, W., Zhang, L.: Hybrid energy-efficient scheduling measures for flexible job-shop problem with variable machining speeds. Expert Syst. Appl. 197, 116785 (2022)
10. Mladenović, N., Hansen, P.: Variable neighborhood search. Comput. Oper. Res. 24(11), 1097–1100 (1997)
11. Jarboui, B., Eddaly, M., Siarry, P.: An estimation of distribution algorithm for minimizing the total flowtime in permutation flowshop scheduling problems. Comput. Oper. Res. 36(9), 2638–2646 (2009)
12. Wang, F., Tang, Q., Rao, Y.Q., Zhang, C., Zhang, L.: Efficient estimation of distribution for flexible hybrid flow shop scheduling. Zidonghua Xuebao 43, 280–293 (2017)
13. Zhang, Z.Q., Qian, B., Hu, R., Jin, H.P., Wang, L.: A matrix-cube-based estimation of distribution algorithm for the distributed assembly permutation flow-shop scheduling problem. Swarm Evol. Comput. 60(1), 100785 (2021)
14. Qian, B., Zhang, Z.Q., Hu, R., Jin, H.P., Yang, J.B.: A matrix-cube-based estimation of distribution algorithm for no-wait flow-shop scheduling with sequence-dependent setup times and release times. IEEE Trans. Syst. Man Cybern. 1–12 (2022)
15. Guo, N., Qian, B., Na, J.-X., Hu, R., Mao, J.L.: A three-dimensional ant colony optimization algorithm for multi-compartment vehicle routing problem considering carbon emissions. Appl. Soft Comput. 127, 109326 (2022)
16. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002)
17. Wang, L., Zhou, G., Xu, Y., Liu, M.: An effective artificial bee colony algorithm for the flexible job-shop scheduling problem. Int. J. Prod. Res. 51 (2013)
18. Chandra, S.: Design and Analysis of Experiments. Publishing (2005)
19. Brandimarte, P.: Routing and scheduling in a flexible job shop by tabu search. Ann. Oper. Res. 41, 157–183 (1993)
20. Luo, Q., Deng, Q., Gong, G., Zhang, L., Han, W., Li, K.: An efficient memetic algorithm for distributed flexible job shop scheduling problem with transfers. Expert Syst. Appl. 160, 113721 (2020)

A Q-Learning-Based Hyper-Heuristic Evolutionary Algorithm for the Distributed Flexible Job-Shop Scheduling Problem

Fang-Chun Wu1, Bin Qian1,2(B), Rong Hu1,2, Zi-Qi Zhang1,2, and Bin Wang1,2

1 School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
[email protected]
2 Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China

Abstract. As an important branch of distributed scheduling, the distributed flexible job shop scheduling problem (DFJSP) has become an emerging production pattern. This article proposes a Q-learning-based hyper-heuristic evolutionary algorithm (QHHEA) to minimize the maximum completion time (i.e., makespan) of the DFJSP. First, a hybrid initialization strategy is introduced to acquire a high-quality initial population with a certain diversity. Second, according to the characteristics of the DFJSP, we design a three-dimensional vector coding scheme, and a left-shift method is embedded into the decoding stage to improve machine utilization. Third, six simple but effective low-level neighborhood structures are devised. Fourth, a Q-learning-based high-level strategy (QHLS) is developed to automatically learn the execution order of the low-level neighborhood structures. Moreover, a novel definition of the state and a dynamic adaptive parameter mechanism are developed to prevent QHHEA from being trapped in local optima. Finally, comprehensive comparisons of QHHEA against several state-of-the-art algorithms are conducted on 18 instances, and the statistical results demonstrate the efficiency and effectiveness of the proposed QHHEA in solving the DFJSP.

Keywords: Q-learning · Hyper-heuristic · Evolutionary algorithm · Distributed flexible job shop scheduling

1 Introduction

With increasingly fierce market competition, flexible manufacturing systems (FMS) have become the choice of numerous enterprises seeking to improve production efficiency. Meanwhile, under the trend of global production, manufacturing is transforming from the traditional centralized mode to a distributed mode, and distributed manufacturing is gradually becoming an emerging production pattern [1]. As one of the most significant modes of distributed production in FMS, the distributed flexible job shop scheduling problem (DFJSP) has become a research hotspot due to its engineering value. In DFJSP, three tasks need to be completed: assign a factory to

each job, determine the production sequence, and allocate machines to all operations. FJSP is already strongly NP-hard [2], and DFJSP reduces to FJSP when the number of factories is 1, so DFJSP is also NP-hard. Therefore, seeking efficient scheduling methods for DFJSP has important application value.
The existing methods for solving DFJSP mainly include exact algorithms, heuristic algorithms, and hybrid intelligent optimization algorithms (HIOAs). Nevertheless, for large-scale DFJSPs the performance of exact algorithms, such as mathematical programming [3], is poor. Moreover, designing effective heuristics is a nontrivial task given the lack of problem features. HIOAs, such as the genetic algorithm [4], memetic algorithm [5], and hybrid estimation of distribution algorithm [6], obtain satisfactory solutions within acceptable time, but their iterative search is time-consuming. Thus, we utilize a hyper-heuristic algorithm (HHA) to address DFJSP, since an HHA can adaptively select and execute proper low-level neighborhood structures under the guidance of a high-level strategy (HLS). In addition, reinforcement learning (RL) is a knowledge-based method that leads the agent to choose effective actions according to experience obtained by interacting with the environment [7]. As a kind of RL algorithm, Q-learning has been widely applied, and Q-learning-based HHAs have demonstrated outstanding performance on various problems [8–10]. Inspired by these insights, we introduce a Q-learning-based hyper-heuristic evolutionary algorithm (QHHEA) to deal with DFJSP.

2 Problem Description

As depicted in Fig. 1, DFJSP can be described as follows. A set of n jobs {J1, J2, ..., Jn} needs to be assigned to S different factories {F1, F2, ..., FS}, which together contain m processing machines {M1, M2, ..., Mm}. Each job contains hi operations Oi,1, Oi,2, ..., Oi,hi, and each operation can be processed on multiple candidate machines according to the production priority. Note that all operations of a job must be processed in the same factory. The goal of DFJSP is to find a schedule π* with the minimum makespan among all feasible scheduling solutions. For a detailed description of DFJSP, please refer to [6].

3 Q-Learning-Based Hyper-Heuristic Evolutionary Algorithm

3.1 Encoding and Decoding Strategy

In the encoding phase, a vector π = (π^F, π^M, π^O) is used to represent a feasible solution (an individual), where π^F is the factory selection subsequence, π^M is the machine allocation subsequence, and π^O is the operation scheduling subsequence. Table 1 gives an example, where * indicates that the operation cannot be processed on the corresponding machine. A feasible encoding is shown in Fig. 2, in which machines 1, 2, and 3 belong to factory 1, and machines 4, 5, and 6 belong to factory 2. Job 1 is assigned to factory 1 and job 2 to factory 2. In factory 1, job 1 is first processed on machine 3; immediately afterwards, the second operation of job 1 is processed on machine 1. In factory 2, job 2 is first processed on machine 6.

Fig. 1. The diagram of DFJSP.

Decoding refers to the process of converting a feasible solution into a scheduling scheme. In the decoding process, we first decompose the DFJSP into a series of flexible job shop problems (FJSPs). Then the left-shift decoding method, suggested by [11], is used to tackle each FJSP. Finally, the feasible scheduling solution of the DFJSP is formed by merging the FJSPs. With the left-shift decoding method, the operations are shifted to the left as compactly as possible, which significantly improves machine utilization.

3.2 Hybrid Initialization of Populations

To obtain a high-quality population with a certain diversity, a hybrid population initialization method (HPIM) is presented. In a population, global selection (GS) generates 60% of the individuals, local selection (LS) produces 30%, and the remaining individuals are created by random selection (RS). This method has been proved effective for DFJSP. Unlike [5], when GS, LS and RS are executed, it is necessary to ensure

Table 1. A simple example of DFJSP.

Job | Operation | F1: M1 | M2 | M3 | F2: M4 | M5 | M6
J1  | O1,1      | 2      | *  | 3  | 3      | 4  | 2
J1  | O1,2      | 3      | 5  | *  | 3      | 2  | 3
J1  | O1,3      | 3      | 5  | 2  | 4      | *  | 1
J2  | O2,1      | 4      | 6  | *  | 5      | 4  | 3
J2  | O2,2      | 3      | 2  | 4  | 5      | 2  | *

Fig. 2. Illustration of the representation of a feasible solution.

that the operations of the same job are processed in the same factory, since the operations of a job cannot be transferred to another factory. For a deeper review of HPIM, please refer to [5].

3.3 Neighborhoods of DFJSP

In this subsection, we design six simple neighborhood structures to explore the search space of the DFJSP. The details are as follows.
(1) Mutation between different machines (MDM): Select the critical machine u (i.e., the machine that determines the makespan) and identify the factory k where machine u is located. Then randomly select an operation Oi,h on machine u and assign it to another available machine in factory k.
(2) Swap in same machine (MSM): Randomly select an operation Oi,h on the critical machine u. Then randomly select another operation Oi',h' on machine u and exchange Oi,h and Oi',h'.
(3) Swap in same factory (SSF): Select the critical factory k (i.e., the factory that determines the makespan), then randomly select two operations in factory k and swap them.
(4) Mutation between different factories (MDF): Randomly select a job i and identify the critical factory k where job i is located. Next, randomly allocate job i to a factory k' (k' ≠ k), and assign each operation of job i to an available machine in factory k' at random.
(5) Sequence half-side reverse (SHR): Divide the scheduling subsequence of the critical factory into two equal sequences, then randomly select one of the sequences and reverse it.

(6) Swap between different factories (SDF): Randomly select two jobs i1 and i2 (i1 ≠ i2), and assume that job i1 is in critical factory k1 and job i2 is in factory k2. Next, assign job i1 to factory k2 and job i2 to factory k1. Finally, assign each operation of jobs i1 and i2 to available machines in factory k2 and factory k1, respectively.

3.4 Q-Learning-Based High-Level Strategy

Q-Learning Algorithm. The goal of the Q-learning algorithm is to learn the best behavior for reaching the desired state through reward or punishment interactions between the agent and the environment. In Q-learning, each state-action pair has a cumulative reward value, which is calculated by the Q-function in Eq. (1). Suppose s = {s | s1, s2, ..., st} denotes the set of possible states and a = {a | a1, a2, ..., at} the set of selectable actions; r_{t+1} and s_{t+1} represent the immediate reinforcement signal and the state reached after the agent performs action at in state st, respectively. The learning rate αt determines the balance between accepting new learning information and using existing information. In the early iterations of Q-learning, the current solution is often far from optimal, and a large αt favors exploring a large part of the solution space; as the algorithm runs, the obtained solution gradually approaches the optimum, and a small αt is more conducive to using the existing learning information to search the solution space finely. We therefore design a dynamic adaptive mechanism to regulate αt, given by Eq. (2). In addition, γ denotes the discount factor, and Q(st, at) is the Q-value at time t.

Q_{t+1}(st, at) = Q(st, at) + αt × [r_{t+1} + γ × max_{a∈A} Q(s_{t+1}, a) − Q(st, at)]    (1)

αt = 1 − 0.9 × t / t_max    (2)

Definition of State and Action. Traditionally, the states of Q-learning represent the characteristics of the external environment, but for the DFJSP, with its huge search space, this easily causes a dimension-disaster problem. Thus, we define the DFJSP neighborhoods as the state set. The state set is formed by the six neighborhood structures listed in Sect. 3.3 and is represented by Eq. (3):

S = {MDM, MSM, SSF, MDF, SHR, SDF}    (3)

An action is defined as the transfer from one neighborhood to another ("shift to"). In this way, states and actions can be represented by a directed complete graph G = (V, E), as shown in Fig. 3, where V represents the vertex set (i.e., the state set) and E denotes the arc set. Each action is represented by an arc connecting two states (vertices of the graph).
Action Selection Strategy. The ε-greedy strategy is commonly used to randomly choose a neighborhood with a

Fig. 3. The graph of states and actions.

probability εt, or the neighborhood that possesses the highest Q-value with probability 1 − εt, as shown in Eq. (4):

a_t = { arg max_{a∈A} Q(st, a), if u ≥ εt; random, if u < εt }    (4)

where εt is called the greedy rate; its initial value ε is a large number between 0 and 1 and decreases at a decay rate during the evolving process. The value of u is a random value in [0, 1], and we set the decay factor to 0.999.
Reward Mechanism. In QHHEA, the performance of each neighborhood is evaluated by rt. Since the goal of DFJSP is to find a better scheduling solution, the reward indicates whether the execution of the action contributes to improving the best solution. Thus, rewards are calculated by Eq. (5), where C^t_max(π*) is the makespan of π* at time t:

r_t = { 1, if C^{t+1}_max(π*) < C^t_max(π*); 0, otherwise }    (5)
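The high-level strategy described by Eqs. (1), (2), (4) and (5) can be sketched in a few lines of Python; this is our own illustration with assumed names (the actual QHHEA also manages the population and local search as described in Table 2).

```python
import random

STATES = ["MDM", "MSM", "SSF", "MDF", "SHR", "SDF"]   # Eq. (3): neighborhoods as states
Q = {(s, a): 0.0 for s in STATES for a in STATES}     # Q-table initialized to 0

def learning_rate(t, t_max):
    return 1 - 0.9 * t / t_max                        # Eq. (2)

def select_action(state, epsilon):
    """Eq. (4): greedy with probability 1 - epsilon, random otherwise."""
    if random.random() >= epsilon:
        return max(STATES, key=lambda a: Q[(state, a)])
    return random.choice(STATES)

def update_q(state, action, reward, next_state, alpha, gamma=0.95):
    """Eq. (1): one-step Q-learning update."""
    best_next = max(Q[(next_state, a)] for a in STATES)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Toy usage: pretend the makespan improved (reward 1, Eq. (5)) after applying "SSF".
state, epsilon, t, t_max = "MDM", 0.9, 1, 100
action = select_action(state, epsilon)
update_q(state, action, reward=1, next_state="SSF", alpha=learning_rate(t, t_max))
epsilon *= 0.999                                      # greedy-rate decay used in the paper
print(action, Q[(state, action)])
```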

3.5 The Procedure of QHHEA for DFJSP

The procedure of QHHEA for DFJSP is described in Table 2. A local search method is embedded in the proposed QHHEA (lines 11–24 of Table 2): when the solution is not improved, QHHEA continues to visit other states through the Q-learning algorithm until all states have been visited or the solution has been improved.

4 Experimental Comparisons and Statistical Analysis

To verify the effectiveness of the proposed QHHEA, we compare QHHEA with three algorithms: EMA [5], IGA [2], and EDA-VNS [6]. All algorithms are coded in Python 3.8 and run in the same environment. To ensure the fairness of the experiments, the

Table 2. The procedure of QHHEA for DFJSP.

Input: Parameters.
1:  Generate a population pop by HPIM and save the global best individual π*.
2:  Initialize the Q-table to 0 and randomly initialize a state st (neighborhood nt).
3:  while not satisfied with the stopping condition do
4:    Select an action by the ε-greedy strategy and update the state (i.e., set st = st+1, nt = nt+1).
5:    Apply the neighborhood nt on pop, add the accepted improved solutions to pop.
6:    Select the first ps individuals as the new population in pop.
7:    if π* is improved do
8:      Update the best individual π* and obtain the reward rt based on Eq. (5).
9:      Update Qt+1(st, at) based on Eq. (1).
10:   else
11:     while the solution is not improving do
12:       Select an action by the ε-greedy strategy and update the state (i.e., set st = st+1, nt = nt+1).
13:       Apply the neighborhood nt on pop and accept improved solutions.
14:       Select the first ps individuals as the new population in pop.
15:       if π* is improved do
16:         Update the best individual π* and obtain the reward rt based on Eq. (5).
17:         Update Qt+1(st, at) based on Eq. (1).
18:         break
19:       else
20:         if all states have been visited and π* has not improved for 10 iterations do
21:           break
22:         end if
23:       end if
24:     end while
25:   end if
26:   εt+1 = εt × 0.999 and update the learning rate αt based on Eq. (2).
27: end while
Output: π*.

To ensure the fairness of the experiment, the maximum elapsed CPU time n × f × m × ρ (seconds) is used as the stopping condition for all experiments, where n is the number of jobs, f is the number of factories, m is the number of machines in each factory, and ρ is a time factor (ρ1 = 0.4 and ρ2 = 0.8). Moreover, since there is no standard benchmark available for the DFJSP, all test instances were generated automatically and are available at https://pan.baidu.com/s/1J4yrlxEKMYiO7kMyjOzEOw?pwd=k49a (password: k49a). All methods are run independently 20 times.


By carrying out parameter calibration experiments, we obtained the best combination of parameter values for QHHEA as follows: population size ps = 30, discount factor γ = 0.95, and initial greedy rate ε = 0.9. The makespan of QHHEA and the reference algorithms is listed in Table 3. The interaction plots between the makespan of each algorithm under the different stopping criteria are shown in Fig. 4. As shown in Fig. 4, QHHEA significantly outperforms the other three algorithms, which testifies to the effectiveness and efficiency of the proposed QHHEA. The Gantt chart of the obtained optimal solution for instance 20 × 2 × 3 is illustrated in Fig. 5; the colored rectangles represent machine processing time, and the numbers in the rectangles give the job number and the operation number. From Table 3, it can be seen that the proposed QHHEA obtains the best performance on all instances under the different stopping criteria, which demonstrates its efficiency and effectiveness in tackling the DFJSP. There are three findings as follows: (1) HPIM generates a high-quality initial population, which provides a sound starting point for QHHEA. (2) As an HLS, Q-learning can effectively select an appropriate neighborhood for each state and guide the global exploration towards promising regions. (3) The local search of QHHEA increases the search depth and improves the exploitation ability.

Fig. 4. The interaction plots between makespan of algorithm and different running time.

Fig. 5. The Gantt chart of the obtained optimal solution for instance 20 × 2 × 3.


Table 3. The average makespan of QHHEA and other algorithms under different stopping conditions.

Scale    | EMA ρ1 | EMA ρ2 | EDA-VNS ρ1 | EDA-VNS ρ2 | IGA ρ1 | IGA ρ2 | QHHEA ρ1 | QHHEA ρ2
20×2×3   | 208.70 | 208.45 | 215.60     | 215.55     | 209.30 | 204.85 | 181.85   | 181.00
20×2×5   | 112.25 | 112.80 | 137.25     | 135.45     | 129.35 | 124.55 | 96.95    | 98.35
20×3×3   | 124.05 | 124.00 | 138.10     | 143.10     | 132.15 | 130.35 | 110.30   | 108.30
20×3×5   | 90.85  | 93.00  | 111.25     | 111.80     | 100.80 | 99.25  | 79.20    | 78.65
20×5×3   | 115.10 | 114.35 | 115.40     | 114.55     | 104.50 | 101.65 | 87.15    | 87.85
20×5×5   | 74.00  | 74.95  | 89.95      | 90.20      | 81.20  | 79.25  | 64.25    | 63.85
40×2×3   | 362.50 | 364.25 | 382.00     | 387.30     | 377.35 | 371.30 | 325.20   | 326.10
40×2×5   | 219.80 | 222.25 | 278.80     | 271.50     | 263.50 | 260.35 | 207.10   | 205.75
40×3×3   | 248.00 | 248.00 | 266.05     | 264.20     | 256.90 | 254.45 | 213.35   | 212.75
40×3×5   | 167.80 | 161.45 | 199.85     | 195.50     | 183.90 | 180.85 | 140.45   | 140.20
40×5×3   | 186.40 | 181.90 | 194.40     | 193.05     | 180.15 | 176.80 | 158.30   | 156.70
40×5×5   | 119.00 | 118.65 | 142.05     | 144.70     | 130.80 | 127.70 | 103.20   | 101.10
60×2×3   | 608.05 | 603.00 | 631.70     | 629.25     | 619.15 | 612.20 | 541.95   | 541.65
60×2×5   | 308.45 | 310.80 | 391.25     | 381.85     | 361.30 | 358.00 | 292.50   | 289.80
60×3×3   | 420.25 | 418.40 | 442.25     | 433.00     | 425.15 | 421.20 | 369.20   | 362.00
60×3×5   | 217.10 | 217.65 | 299.40     | 289.45     | 265.75 | 263.60 | 202.65   | 200.60
60×5×3   | 265.80 | 268.25 | 278.65     | 285.60     | 264.05 | 260.85 | 220.85   | 219.45
60×5×5   | 161.40 | 161.15 | 196.60     | 199.55     | 184.55 | 180.50 | 146.30   | 145.15
Average  | 222.75 | 222.41 | 250.59     | 249.20     | 237.21 | 233.76 | 196.71   | 195.51

5 Conclusion and Future Work

For the DFJSP with the objective of minimizing the makespan, this study put forward QHHEA. Experimental results indicate that the proposed QHHEA outperforms the compared algorithms on all 18 instances. In future work, we will further consider the transfer of jobs between machines in the flexible shop and an energy consumption (EC) objective, and design an efficient multi-objective evolutionary algorithm to optimize both makespan and EC.

Acknowledgement. This research was supported by the National Natural Science Foundation of China (62173169 and 72201115), the Basic Research Key Project of Yunnan Province (202201AS070030), and the Basic Research Project of Yunnan Province (202201BE070001-050).


References

1. Wu, X.L., Sun, Y.J.: A green scheduling algorithm for flexible job shop with energy-saving measures. J. Clean. Prod. 172, 3249–3264 (2018)
2. Zhang, G., Hu, Y., Sun, J., Zhang, W.: An improved genetic algorithm for the flexible job shop scheduling problem with multiple time constraints. Swarm Evol. Comput. 54, 100664 (2020)
3. Meng, L., Zhang, C., Ren, Y., Zhang, B., Lv, C.: Mixed-integer linear programming and constraint programming formulations for solving distributed flexible job shop scheduling problem. Comput. Ind. Eng. 142, 106347 (2020)
4. De Giovanni, L., Pezzella, F.: An improved genetic algorithm for the distributed and flexible job-shop scheduling problem. Eur. J. Oper. Res. 200(2), 395–408 (2010)
5. Luo, Q., Deng, Q., Gong, G., Zhang, L., Han, W., Li, K.: An efficient memetic algorithm for distributed flexible job shop scheduling problem with transfers. Expert Syst. Appl. 160, 113721 (2020)
6. Du, Y., Li, J.-Q., Luo, C., Meng, L.-L.: A hybrid estimation of distribution algorithm for distributed flexible job shop scheduling with crane transportations. Swarm Evol. Comput. 62, 100861 (2021)
7. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. IEEE Trans. Neural Netw. 9(5), 1054 (1998)
8. Lin, J., Li, Y.Y., Song, H.B.: Semiconductor final testing scheduling using Q-learning based hyper-heuristic. Expert Syst. Appl. 187, 115978 (2022)
9. Lixin, C., Qiuhua, T., Liping, Z., Zikai, Z.: Multi-objective Q-learning-based hyper-heuristic with bi-criteria selection for energy-aware mixed shop scheduling. Swarm Evol. Comput. 69, 100985 (2021)
10. Choong, S.S., Wong, L.P., Lim, C.P.: Automatic design of hyper-heuristic based on reinforcement learning. Inf. Sci. 436, 89–107 (2018)
11. Wang, L., Zhou, G., Xu, Y., Wang, S., Liu, M.: An effective artificial bee colony algorithm for the flexible job-shop scheduling problem. Int. J. Adv. Manuf. Technol. 60(1), 303–315 (2012)

A Firefly Algorithm Based on Prediction and Hybrid Samples Learning Leyi Chen1,2 and Jun Li1,2(B) 1 College of Computer Science and Technology, Wuhan University of Science and Technology,

Wuhan 430065, China [email protected] 2 Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan 430065, China

Abstract. The firefly algorithm is an optimization algorithm developed based on the interactive behavior of fireflies in nature. It is based on the simulation of the mutual attraction and flickering behavior between fireflies to obtain the optimal solution by optimizing the objective function in the problem. To address the slow convergence speed, high time complexity, and easy trapping in local optima of the firefly algorithm during the search process, this paper proposes a firefly algorithm based on prediction and hybrid sample learning (PHSFA). Firstly, fireflies with insufficient progress space will be eliminated, and new fireflies will replace them to search for a better optimal value of the search function. Secondly, a hybrid sample is designed to enable each generation of fireflies to learn and enhance the convergence speed of the algorithm while reducing time complexity. In addition, a new adaptive step size strategy is adopted to adapt to the proposed algorithm. To verify the performance of PHSFA, experiments are conducted on CEC2020 test functions. The experimental results show that PHSFA performs the best on most test functions and has better solution accuracy. Keywords: Firefly Algorithm · Hybrid Sample · Prediction-based Tournament

1 Introduction

Firefly Algorithm (FA) was proposed by Yang Xinshe in 2008 [1]. It is a stochastic heuristic algorithm inspired by the biological behavior of fireflies in nature. The algorithm simulates the natural behavior of fireflies, which emit light and are attracted to each other based on the intensity of the light. The core idea of FA is that the brightness of the firefly represents the fitness of the solution, while the position of the firefly represents the variable value of the solution. Fireflies adjust their positions based on their own brightness and the brightness of neighboring fireflies, in order to search for better solutions. Due to the good convergence and robustness of the Firefly Algorithm, as well as its suitability for a variety of optimization problems, it has been widely applied in many fields [2–4]. However, FA also has some notable drawbacks, such as being prone to premature convergence and high complexity. In response to the standard FA, many scholars have


made different improvements to the FA. Yu et al. [5] used a wise step strategy to adjust the step factor of each firefly based on the Euclidean distance between an individual’s best position and the current global best position. Wang et al. [6] proved that as FA converges, the step factor approaches 0, and based on this conclusion, proposed an adaptive control parameter strategy. Yu et al. [7] proposed a personalized step strategy, where the strategy uses a large step for the best firefly and a linearly decreasing step for other fireflies to improve exploration capability. Tao et al. [8] proposed an adaptive strategy for a random model and an adaptive penalty function for handling constraint conditions to enhance the algorithm’s performance in optimization problems. In addition, there are also many improvements for the attraction model. Wang et al. [9] proposed a random attraction firefly algorithm, where each firefly randomly learns from another firefly to update its position. Xu et al. [10] developed an average-condition partial attraction model, where only fireflies with brightness higher than the average can learn from other individuals. Bei et al. [11] proposed an adaptive probability attraction model to attract fireflies based on their brightness level, which can minimize the number of brightness comparisons in the algorithm and adjust the attraction times. This article proposes a firefly algorithm based on prediction and hybrid sample learning (PHSFA) to address the shortcomings of the traditional firefly algorithm. The PHSFA algorithm consists of the following parts: (1) a prediction-based tournament mechanism is constructed, which predicts the performance of each firefly after each generation of learning. If the predicted result is worse than the current best individual, the firefly is reset; (2) a mixed sample attraction model is proposed, which attracts not only the currently better individuals but also those with greater potential for improvement; (3) a new step size adaptive strategy is used and combined with the previously proposed attraction strategy. The remaining parts of this article are as follows: Sect. 2 briefly introduces the process of standard FA. Section 3 describes our proposed method. Section 4 verifies the optimization performance of PHSFA through experiments and provides an analysis of the experimental results. Finally, some main conclusions are presented in Sect. 5.

2 Standard FA FA is a population-based stochastic search algorithm, and based on the social behavior of fireflies, it has three idealized rules, as shown below: (1) All fireflies are neutral, so any two fireflies can attract each other. (2) The attraction between fireflies is proportional to their brightness. For a pair of fireflies with different brightness, the darker one moves towards the brighter one. (3) The brightness of a firefly is related to the value of the objective function. For a minimization problem, the smaller the value of the objective function, the brighter the firefly. In FA, the brightness of a firefly and the attraction between fireflies are two important indicators. The attraction of each firefly is related to its brightness (i.e., fitness value). The higher the brightness of a firefly, the stronger its attraction, and the more it can attract other fireflies. During the position update process, a firefly moves towards another firefly with stronger attraction in the hope of finding a better solution. In general, the attraction between fireflies gradually decreases with the increase of distance, that is, the attraction


is smaller between fireflies that are far apart, and larger between those that are close. The attraction between two fireflies is given by

\beta_{ij} = \beta_0 \times e^{-\gamma r_{ij}^2},   (1)

where β0 is the attraction at r = 0, γ is the light absorption coefficient (a constant), and r_ij is the Euclidean distance between fireflies i and j, calculated as

r_{ij} = \|x_i - x_j\| = \sqrt{\sum_{d=1}^{D} (x_{id} - x_{jd})^2},   (2)

where D is the dimension of the objective function and x_{id} represents the d-th component of the i-th firefly. When comparing two fireflies i and j, if j is brighter than i, then i moves towards j according to

x_i^{t+1} = x_i^t + \beta_{ij} \times (x_j^t - x_i^t) + \alpha \times (rand - 0.5),   (3)

where x_i^t and x_j^t are the spatial coordinates of fireflies i and j at the t-th iteration, α is the step factor, and rand is a random number uniformly distributed in [0, 1].
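A compact sketch of the standard FA position update given by Eqs. (1)–(3) is shown below; the parameter values and the use of NumPy arrays are assumptions for illustration only.

```python
import numpy as np

def fa_move(x_i, x_j, beta0=1.0, gamma=1.0, alpha=0.2, rng=np.random.default_rng()):
    """Move firefly i towards a brighter firefly j (Eqs. (1)-(3))."""
    r2 = np.sum((x_i - x_j) ** 2)                  # squared Euclidean distance, Eq. (2)
    beta = beta0 * np.exp(-gamma * r2)             # attraction, Eq. (1)
    return x_i + beta * (x_j - x_i) + alpha * (rng.random(x_i.shape) - 0.5)  # Eq. (3)
```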

3 Proposed Approach

3.1 Prediction-Based Tournament

In reference [12], Li et al. proposed a loser-out tournament strategy that introduces the competition mechanism found in nature into the fireworks algorithm: during the search, each generation of fireworks compete with each other, and the losers are reset as new individuals to continue the search. Simulation experiments showed that this loser-out tournament strategy can effectively improve the performance of the fireworks search. In our method, we improve it and apply it to FA. For each individual in the population, its change amount is defined as

\varphi_i^t = F(x_i^{t-1}) - F(x_i^t),   (4)

where F(x_i^t) is the objective function value of firefly i at generation t, and \varphi_i^t measures how much firefly i has changed between two iterations. When solving a minimization problem, the fitness of each firefly in the next generation must be at least as good as in the previous generation, i.e., the corresponding objective function value must not increase; the change amount therefore satisfies \varphi_i^t ≥ 0. A relatively large change amount means there is still large room for improvement in the area where the individual is located; as the individual gradually approaches a local optimum, the change amount gradually decreases. After obtaining the change amount, we predict the fitness value of each individual at the maximum number of iterations; the predicted value of an individual is given by

F_{pre}(x_i^T) = F(x_i^t) - (T - t)\,\varphi_i^t,   (5)


where T is the maximum number of iterations. After obtaining the predicted value for firefly i, it is compared with the current global best individual. If the predicted value is worse than the current global best, i.e., F_{pre}(x_i^T) > F_{best}, the individual is reinitialized and allowed to search the space randomly again. This effectively prevents individuals from getting stuck in local optima: individuals with low potential for change are abandoned, and new individuals are given a chance to move towards the global best. Algorithm 1 shows how the prediction-based tournament mechanism works in each generation of the population.
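Since Algorithm 1 is not reproduced in this excerpt, the following minimal sketch only illustrates the prediction-based tournament of Eqs. (4)–(5) for a minimization problem; the bound handling, the reinitialization routine, and the use of an infinite fitness flag are assumptions.

```python
import numpy as np

def predict_and_reset(pop, prev_fit, fit, best_fit, t, T, lower, upper, rng):
    """Reset fireflies whose linearly predicted final fitness cannot beat the current best."""
    for i in range(len(pop)):
        phi = prev_fit[i] - fit[i]              # change amount, Eq. (4), phi >= 0
        predicted = fit[i] - (T - t) * phi      # predicted fitness at iteration T, Eq. (5)
        if predicted > best_fit:                # loser: reinitialize in the search space
            pop[i] = rng.uniform(lower, upper, size=pop[i].shape)
            fit[i] = np.inf                     # force re-evaluation before the next comparison
    return pop, fit
```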

3.2 Hybrid Sample Attraction Model

To improve the search efficiency of the firefly algorithm, the selection of learning objects is crucial. Learning from the currently better solutions can make the algorithm converge quickly, but it may also cause premature convergence to local optima. On the other hand, learning from ordinary individuals slows convergence but can help escape local optima. Therefore, a hybrid sample attraction model is proposed. The variation rate of firefly i is defined as

\rho_{x_i}^t = \frac{\varphi_i^t}{|x_i^{t-1} - x_i^t|}.   (6)

From (6), we can see that firefly individuals with a higher change rate achieve a significant improvement over a relatively short displacement, so the region where they are located may contain a local optimum. Individuals with a high change rate, even if they are not the current global best, have considerable potential and can be selected as excellent individuals for other individuals to learn from. To exploit the advantages of both elite individuals and individuals with a high variation rate, two samples are used to store them separately in each iteration, sorted by the fitness values and variation rates of the current population, respectively. The two samples X_e^t and X_\rho^t are defined as

X_e^t = [x_{i1}^t, x_{i2}^t, \dots, x_{iN}^t \mid F(x_{i1}^t) \le F(x_{i2}^t) \le \dots \le F(x_{iN}^t)],   (7)

X_\rho^t = [x_{j1}^t, x_{j2}^t, \dots, x_{jN}^t \mid \rho_{x_{j1}}^t \ge \rho_{x_{j2}}^t \ge \dots \ge \rho_{x_{jN}}^t],   (8)

where N is the population size. After the two samples are obtained in each generation, in order to fully utilize the advantages of the individuals in both samples, the top 10 individuals of each sample are selected and put into a new sample with a total of 20 individuals, which is represented as

X_k^t = [x_{i1}^t, x_{i2}^t, \dots, x_{i10}^t, x_{j1}^t, x_{j2}^t, \dots, x_{j10}^t].   (9)

As shown in Fig. 1, in the Full Attraction model proposed in standard FA, each firefly learns from another firefly that is better than itself. In our proposed method, however, the learning targets for all fireflies are no longer other fireflies that are simply better than themselves, but individuals from the hybrid sample Xkt , as shown in Fig. 2. The individuals in hybrid sample Xkt not only lead to faster convergence of the algorithm, but also ensure the diversity of learning.

Fig. 1. Full attraction model

Fig. 2. Hybrid samples attraction model
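To make Eqs. (6)–(9) concrete, the sketch below builds the hybrid sample X_k^t from the current and previous populations; the small epsilon that avoids division by zero and the NumPy data layout are assumptions, not part of the paper.

```python
import numpy as np

def build_hybrid_sample(pop, prev_pop, prev_fit, fit, top=10):
    """Build X_k^t from the 10 best-fitness and 10 highest-variation-rate fireflies (Eqs. (6)-(9))."""
    disp = np.linalg.norm(prev_pop - pop, axis=1) + 1e-12   # |x^{t-1} - x^t|; eps avoids /0 (assumption)
    rho = (prev_fit - fit) / disp                           # variation rate, Eq. (6)
    elite_idx = np.argsort(fit)[:top]                       # X_e^t: sorted by fitness, Eq. (7)
    rate_idx = np.argsort(-rho)[:top]                       # X_rho^t: sorted by variation rate, Eq. (8)
    return pop[np.concatenate([elite_idx, rate_idx])]       # X_k^t, Eq. (9)
```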

3.3 Modification of Motion Strategy and Parameters

In the early stage of FA, a larger search range is needed to explore more of the unknown solution space, while in the later stage a smaller search range is needed to obtain more accurate solutions. However, in the standard FA the step size factor α is a constant, which hinders both exploration and exploitation: if α is set too large, the search accuracy of FA is low; if α is set too small, the convergence speed is very slow. The step size factor is therefore designed as

\alpha(t + 1) = \alpha(t) \times e^{\,t/T - 1}.   (10)

The attraction β is calculated using the same strategy as in MFA [10]:

\beta = \beta_{min} + (\beta_0 - \beta_{min}) \times e^{-\gamma r_{ij}^2},   (11)

where β_min is the prescribed minimum value of β, ensuring that the fireflies always retain at least some attraction. Because of the improved attraction model, fireflies no longer learn from every better individual but from the hybrid sample. The motion equation is

x_i^{t+1} = x_i^t + \beta_{ij} \times (x_k^t - x_i^t) + \alpha \times (rand - 0.5).   (12)
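The sketch below combines Eqs. (10)–(12) into one movement step; how the learning target x_k is chosen from the hybrid sample is not specified in this excerpt, so the random pick here is an assumption.

```python
import numpy as np

def phsfa_move(x_i, hybrid_sample, alpha, beta0=1.0, beta_min=0.2, gamma=1.0,
               rng=np.random.default_rng()):
    """Move firefly i towards a member of the hybrid sample X_k^t (Eqs. (11)-(12))."""
    x_k = hybrid_sample[rng.integers(len(hybrid_sample))]   # learning target (assumed random pick)
    r2 = np.sum((x_i - x_k) ** 2)
    beta = beta_min + (beta0 - beta_min) * np.exp(-gamma * r2)          # Eq. (11)
    return x_i + beta * (x_k - x_i) + alpha * (rng.random(x_i.shape) - 0.5)  # Eq. (12)

def next_alpha(alpha, t, T):
    """Adaptive step factor of Eq. (10)."""
    return alpha * np.exp(t / T - 1.0)
```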


3.4 Framework of PHSFA

4 Experiments 4.1 Experiment Settings In this section, we chose to use 10 classic benchmark functions from CEC2020 to test and compare the performance of PHSFA under different environments. Table 1 provides a basic description of the CEC2020 test functions and the global optimum value of each function. In order to fully validate the effectiveness of the strategies in PHSFA, the experimental study of algorithm performance can be divided into two parts. The first part is to evaluate the solution accuracy of different attraction strategies under the same parameter settings and operational environment, in order to exclude the impact of parameter modification strategies on the algorithm. In the second part, FA and several published variants of FA are selected for comparison. In our test, the dimensions of CEC2020 benchmark

Table 1. CEC2020 test functions

No. | Functions                                           | Fi* = Fi(x*)
Unimodal Function:
1   | Shifted and Rotated Bent Cigar Function             | 100
Basic Functions:
2   | Shifted and Rotated Schwefel's Function             | 1100
3   | Shifted and Rotated Lunacek bi-Rastrigin Function   | 700
4   | Expanded Rosenbrock's plus Griewangk's Function     | 1900
Hybrid Functions:
5   | Hybrid Function 1 (N = 3)                           | 1700
6   | Hybrid Function 2 (N = 4)                           | 1600
7   | Hybrid Function 3 (N = 5)                           | 2100
Composition Functions:
8   | Composition Function 1 (N = 3)                      | 2200
9   | Composition Function 2 (N = 4)                      | 2400
10  | Composition Function 3 (N = 5)                      | 2500
Search range: [−100, 100]^D

functions are set to D = 10 and D = 20, each function is evaluated 30 times, and the average and standard deviation over the 30 runs are calculated as the basis for comparing the performance of the algorithms. In addition, to compare the performance of the algorithms more intuitively, the Friedman test is applied to the results of all compared algorithms. The experimental environment is: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz, 32.00GB memory, Win10 64-bit operating system, Matlab 2018a.

4.2 Comparison of Attraction Strategies

PHSFA includes a new attraction model and a new strategy, namely the hybrid sample attraction model and the prediction-based tournament strategy. To ensure a fair comparison, our algorithm uses only the new attraction model and strategy, with parameter settings consistent with the other FAs. In this section, NaFA [13], LVFA [14], SWFA [15], and the proposed PHSFA are compared. "+/−/≈" represents the comparison result between the proposed algorithm and another algorithm, where "+" means that the proposed algorithm is better, "−" means that it is worse, and "≈" means that the two algorithms perform similarly on the function; Best is the number of times an algorithm obtains the best optimization result among the 10 test functions. In Table 2, PHSFA is either the best or performs comparably with the other algorithms on 7 functions, which demonstrates the feasibility of the attraction strategy in PHSFA. PHSFA outperforms LVFA on almost all CEC2020 functions; compared with NaFA and SWFA it has both advantages and disadvantages, but PHSFA performs better on more functions. For the functions f4, f5, and f7, on which PHSFA does not obtain the best solutions, its results are only slightly inferior to those of the best-performing algorithms rather than significantly behind them.


Table 3 presents the experimental results with dimension D = 20. By comparing the data in Table 2 and Table 3, it can be clearly seen that the algorithm accuracy of NaFA, LVFA, SWFA, and PHSFA all decrease with the increase of dimension, but PHSFA still achieves the best experimental results, demonstrating its stronger convergence and robustness. The Friedman test results for several algorithms are shown in Table 4, which also indicates that PHSFA has the best overall performance.

Table 2. Mean ± Std for various attraction strategy of FA (D = 10)

Functions | NaFA                 | LVFA                 | SWFA                 | PHSFA
f1        | 3.57E+03 ± 4.83E+02  | 5.85E+03 ± 1.31E+02  | 1.84E+03 ± 3.72E+02  | 8.41E+02 ± 5.14E+01
f2        | 2.48E+03 ± 2.67E+02  | 2.46E+03 ± 1.75E+02  | 3.05E+03 ± 6.16E+02  | 2.46E+03 ± 2.36E+02
f3        | 7.82E+02 ± 2.32E+00  | 7.84E+02 ± 2.63E+01  | 8.64E+02 ± 6.07E+00  | 7.40E+02 ± 3.74E+00
f4        | 1.90E+03 ± 5.19E-01  | 1.92E+03 ± 3.65E+00  | 1.90E+03 ± 7.24E-01  | 1.93E+03 ± 8.96E+00
f5        | 2.35E+03 ± 4.36E+02  | 4.26E+03 ± 8.92E+02  | 2.27E+03 ± 2.15E+02  | 2.73E+03 ± 3.92E+02
f6        | 1.60E+03 ± 6.02E-01  | 1.62E+03 ± 3.21E+01  | 1.60E+03 ± 7.40E+00  | 1.60E+03 ± 9.13E-01
f7        | 2.77E+03 ± 4.15E+02  | 3.60E+03 ± 2.43E+02  | 2.75E+03 ± 3.22E+02  | 2.88E+03 ± 3.40E+02
f8        | 2.38E+03 ± 2.56E+00  | 2.71E+03 ± 6.08E+02  | 2.68E+03 ± 8.61E+02  | 2.35E+03 ± 6.61E+01
f9        | 2.70E+03 ± 1.06E+02  | 2.68E+03 ± 2.52E+02  | 2.64E+03 ± 1.33E+02  | 2.57E+03 ± 8.63E+01
f10       | 2.95E+03 ± 2.25E+01  | 3.23E+03 ± 1.64E+02  | 2.86E+03 ± 2.47E+01  | 2.79E+03 ± 7.76E+00
+/−/≈     | 6/3/1                | 8/1/1                | 6/3/1                | -
Best      | 2                    | 1                    | 4                    | 7

4.3 PHSFA Compared to Other FA Variants In this section, to demonstrate the performance of PHSFA, we compare it with FA [1], MFA [16], VSSFA [17], and SAFA [8]. The common parameter settings for all algorithms are as follows: population size N = 40, dimensions are still set to D = 10 and 20, maximum number of iterations T = 2000, step factor α = 0.2, maximum attraction β0 = 1, minimum attraction βmin = 0.2, and light absorption coefficient γ = 1. Table 5 shows the optimization results of PHSFA and other FA variants at D = 10,

Table 3. Mean ± Std for various attraction strategy of FA (D = 20)

Functions | NaFA                 | LVFA                 | SWFA                 | PHSFA
f1        | 4.42E+03 ± 1.73E+02  | 1.07E+04 ± 3.33E+03  | 2.78E+03 ± 6.18E+02  | 1.29E+03 ± 2.30E+02
f2        | 3.95E+03 ± 8.70E+02  | 2.76E+03 ± 2.13E+02  | 4.45E+03 ± 1.31E+03  | 6.31E+03 ± 6.58E+02
f3        | 8.08E+02 ± 4.03E+00  | 1.03E+03 ± 2.90E+01  | 9.33E+02 ± 5.88E+01  | 8.08E+02 ± 1.79E+01
f4        | 1.93E+03 ± 7.11E-01  | 1.92E+03 ± 2.77E+01  | 1.92E+03 ± 2.18E+00  | 1.92E+03 ± 2.31E+01
f5        | 4.65E+03 ± 2.92E+03  | 7.90E+03 ± 3.61E+03  | 3.54E+03 ± 4.80E+02  | 4.07E+03 ± 6.60E+02
f6        | 1.60E+03 ± 8.97E-01  | 1.63E+03 ± 4.29E+01  | 1.61E+03 ± 3.57E+00  | 1.60E+03 ± 1.15E+00
f7        | 7.15E+03 ± 2.39E+03  | 5.92E+03 ± 3.74E+03  | 4.66E+03 ± 9.63E+02  | 5.35E+03 ± 9.95E+02
f8        | 2.47E+03 ± 4.92E+00  | 2.83E+03 ± 9.23E+02  | 2.80E+03 ± 7.25E+02  | 2.41E+03 ± 4.84E+01
f9        | 2.89E+03 ± 2.75E+00  | 3.40E+03 ± 1.88E+02  | 3.39E+03 ± 8.82E+01  | 2.83E+03 ± 2.96E+02
f10       | 3.08E+03 ± 6.56E-01  | 4.02E+03 ± 1.72E+01  | 3.23E+03 ± 1.54E+02  | 3.08E+03 ± 8.68E+00
+/−/≈     | 6/1/3                | 8/1/1                | 6/3/1                | -
Best      | 3                    | 2                    | 3                    | 7

Table 4. Result of Friedman test on various attraction strategy

Algorithm | 10D | 20D | Mean rank
PHSFA     | 1.7 | 1.5 | 1.6
SWFA      | 2.1 | 2.3 | 2.2
NaFA      | 2.3 | 2.3 | 2.3
LVFA      | 3.4 | 3.3 | 3.35

from which it can be seen that the accuracy of the optimization results of PHSFA is the best or equivalent to other algorithms in functions f1 , f3 , f4 , f6 , f8 , f9 . For function f1 , the optimization results of PHSFA and SAFA are very close to the optimal value, MFA’s optimization results are slightly worse, while the optimization results of FA and VSSFA are much worse. For function f4 , the optimization results of FA, VSSFA, and SAFA are very close to the optimal value, while the results of MFA and PHSFA reach the optimal


value. For function f6, except for the standard FA, all other algorithms found the optimal value. Table 6 shows the comparison results between PHSFA and the other FA variants at D = 20. The experimental data clearly show that the increase in dimension affects the accuracy of the results of all five FAs. Compared to D = 10, FA and VSSFA only obtain results close to the optimal value on functions f4 and f6, while the accuracy of their results on the other eight functions decreases significantly. The performance of PHSFA is only greatly affected on functions f2 and f7, while it still performs well on the other eight functions, and its number of optimal solutions has increased by one compared to D = 10. To compare the five algorithms more clearly, Fig. 3 shows the convergence curves of six functions. It can be seen that PHSFA had the fastest convergence speed and the highest accuracy on functions f4, f8, and f9. For function f1, PHSFA converged quickly in the early stage, was surpassed by SAFA in the middle stage, and then surpassed SAFA again in the later stage. For function f9, the convergence speed of MFA in the early stage was similar to that of PHSFA, but it slowed down in the middle stage and fell into a local optimum, while PHSFA achieved higher accuracy. In addition, Table 7 shows the Friedman test results of all FA variants on the CEC2020 test functions in different dimensions. From the results, the proposed PHSFA algorithm is clearly superior to FA, MFA, VSSFA, and SAFA, achieving the lowest rank, thus demonstrating the superiority of the algorithm in terms of search ability and solution accuracy.

Table 5. Mean ± Std of the comparative algorithms on CEC2020 test functions (D = 10)

Func | FA                   | MFA                  | VSSFA                | SAFA                 | PHSFA
f1   | 2.37E+03 ± 5.95E+02  | 8.01E+02 ± 7.83E+02  | 4.57E+03 ± 9.54E+02  | 1.07E+02 ± 2.18E-02  | 1.07E+02 ± 2.22E+01
f2   | 2.10E+03 ± 1.29E+02  | 1.70E+03 ± 2.50E+02  | 2.14E+03 ± 1.53E+02  | 1.99E+03 ± 2.01E+02  | 2.07E+03 ± 1.37E+02
f3   | 1.17E+03 ± 7.77E+01  | 7.23E+02 ± 3.65E+00  | 1.16E+03 ± 9.17E+02  | 7.65E+02 ± 2.45E+01  | 7.19E+02 ± 3.16E+00
f4   | 1.99E+03 ± 6.14E+00  | 1.90E+03 ± 1.63E-01  | 1.93E+03 ± 1.66E+00  | 1.92E+03 ± 8.02E+00  | 1.90E+03 ± 3.66E-01
f5   | 2.86E+03 ± 1.65E+02  | 2.29E+03 ± 1.92E+02  | 2.69E+03 ± 1.33E+02  | 2.32E+03 ± 2.88E+02  | 2.41E+03 ± 2.29E+02
f6   | 1.61E+03 ± 1.41E+01  | 1.60E+03 ± 6.64E+00  | 1.60E+03 ± 1.84E+01  | 1.60E+03 ± 8.58E+00  | 1.60E+03 ± 1.83E-01
f7   | 2.94E+03 ± 1.99E+02  | 2.82E+03 ± 2.46E+02  | 2.91E+03 ± 2.29E+02  | 2.41E+03 ± 2.20E+02  | 2.81E+03 ± 1.96E+02
f8   | 2.85E+03 ± 2.28E+02  | 2.31E+03 ± 1.11E+00  | 2.64E+03 ± 2.83E+02  | 2.50E+03 ± 1.02E+02  | 2.30E+03 ± 3.78E-01
f9   | 2.60E+03 ± 1.11E+02  | 2.64E+03 ± 1.16E+02  | 2.50E+03 ± 2.72E+01  | 2.83E+03 ± 1.30E+02  | 2.50E+03 ± 7.31E-03
f10  | 2.77E+03 ± 1.17E+02  | 2.90E+03 ± 1.40E+01  | 2.73E+03 ± 1.24E+02  | 2.95E+03 ± 3.56E+01  | 2.88E+03 ± 9.40E-01
+/−/≈| 9/1/0                | 6/2/2                | 7/1/2                | 5/3/2                | -
Best | 0                    | 3                    | 2                    | 2                    | 6

Table 6. Mean ± Std of the comparative algorithms on CEC2020 test functions (D = 20)

Func | FA                   | MFA                  | VSSFA                | SAFA                 | PHSFA
f1   | 1.70E+04 ± 2.30E+03  | 6.48E+02 ± 6.34E+02  | 2.89E+04 ± 2.90E+03  | 3.51E+02 ± 7.40E+01  | 1.62E+02 ± 5.87E+01
f2   | 3.50E+03 ± 1.78E+02  | 3.76E+03 ± 8.59E+02  | 3.47E+03 ± 1.65E+02  | 3.35E+03 ± 3.50E+02  | 4.70E+03 ± 2.30E+02
f3   | 2.10E+03 ± 1.38E+02  | 8.02E+02 ± 1.44E+01  | 2.04E+03 ± 1.89E+02  | 1.00E+03 ± 8.20E+02  | 7.82E+02 ± 2.02E+01
f4   | 2.17E+03 ± 1.01E+01  | 1.91E+03 ± 4.17E-01  | 2.01E+03 ± 5.48E+00  | 1.97E+03 ± 1.56E+01  | 1.90E+03 ± 9.21E-01
f5   | 3.77E+04 ± 7.11E+03  | 9.56E+03 ± 3.78E+03  | 2.98E+04 ± 6.97E+03  | 2.87E+03 ± 3.15E+02  | 9.95E+03 ± 3.19E+03
f6   | 1.61E+03 ± 2.45E+01  | 1.60E+03 ± 6.35E+00  | 1.63E+03 ± 4.71E+01  | 1.60E+03 ± 7.75E+01  | 1.60E+03 ± 8.31E-01
f7   | 1.52E+04 ± 3.69E+03  | 4.31E+03 ± 7.00E+02  | 1.27E+04 ± 3.07E+03  | 2.93E+03 ± 3.14E+02  | 4.09E+03 ± 2.44E+02
f8   | 5.29E+03 ± 2.70E+02  | 2.30E+03 ± 4.13E-03  | 5.13E+03 ± 4.67E+02  | 3.74E+03 ± 1.52E+03  | 2.28E+03 ± 5.56E-01
f9   | 2.95E+03 ± 1.67E+02  | 2.80E+03 ± 3.06E+00  | 2.80E+03 ± 2.19E+02  | 3.27E+03 ± 7.02E+01  | 2.70E+03 ± 1.08E+01
f10  | 2.94E+03 ± 5.65E+00  | 2.91E+03 ± 8.07E-02  | 2.92E+03 ± 8.08E+00  | 3.00E+03 ± 4.68E+01  | 2.91E+03 ± 2.83E-01
+/−/≈| 9/1/0                | 7/1/2                | 9/1/0                | 6/3/1                | -
Best | 0                    | 2                    | 0                    | 3                    | 7


Table 7. Result of Friedman test on different FAs.

Algorithm | 10D | 20D | Mean rank
PHSFA     | 1.7 | 1.7 | 1.7
MFA       | 2.2 | 2.3 | 2.25
SAFA      | 2.5 | 2.4 | 2.45
VSSFA     | 3.3 | 3.7 | 3.5
FA        | 4.3 | 4.4 | 4.35

Fig. 3. Convergence curves of 6 functions

4.4 Time Complexity Analysis

In this section, the time complexity of the standard FA and of PHSFA is analyzed based on their implementations. For a given problem f, assuming the time complexity of evaluating the problem is O(f), the time complexity of the standard FA is O(T·N²·f), because its full attraction model iterates over every pair of fireflies. The PHSFA proposed in this paper has two operations: the first is hybrid sample learning with time complexity O(T·N·f), and the second is the prediction-based tournament selection with time complexity O(T·f). The time complexity of PHSFA is the sum of these two operations, which simplifies to O(T·N·f) after neglecting lower-order terms. Therefore, PHSFA not only outperforms the standard FA in performance but also has a much smaller time complexity.

5 Conclusion The paper proposes a firefly algorithm called PHSFA based on prediction and mixed sample learning to address the issues of the full attraction model in the standard FA. Three strategies are proposed, including (1) a prediction-based tournament mechanism, (2) a hybrid sample attraction model, (3) an adaptive iteration of the step factor and attraction parameter. The prediction-based tournament mechanism maximizes the learning


potential of individual fireflies, while the mixed sample attraction model enhances the efficiency of individual learning without sacrificing population diversity. Modifying the step factor and attraction parameter can better adapt to the proposed strategies. Two comparative experiments on the CEC2020 test functions demonstrate that the proposed PHSFA method outperforms other algorithms. Acknowledgement. This research is supported by National Natural Science Foundation of China (Grant No. 62271359).

References

1. Yang, X.S.: Firefly algorithm. In: Nature-Inspired Metaheuristic Algorithms, pp. 79–90. Luniver Press, London, U.K. (2008)
2. Tsuya, K., Takaya, M., Yamamura, A.: Application of the firefly algorithm to the uncapacitated facility location problem. J. Intell. Fuzzy Syst. 32(4), 3201–3208 (2017)
3. Bacanin, N., Zivkovic, M., Bezdan, T., Venkatachalam, K., Abouhawwash, M.: Modified firefly algorithm for workflow scheduling in cloud-edge environment. Neural Comput. Appl. 34(11), 9043–9068 (2022)
4. Melin, P., Sanchez, D., Monica, J.C., Castillo, O.: Optimization using the firefly algorithm of ensemble neural networks with type-2 fuzzy integration for COVID-19 time series prediction. Soft Computing, 1–38 (2021)
5. Shuhao, Y., Su, S., Lu, Q., Huang, L.: A novel wise step strategy for firefly algorithm. Int. J. Comput. Math. 91(12), 2507–2513 (2014)
6. Wang, H., et al.: Firefly algorithm with adaptive control parameters. Soft Comput. 21(17), 5091–5102 (2016). https://doi.org/10.1007/s00500-016-2104-3
7. Yu, S., Zuo, X., Fan, X., Liu, Z., Pei, M.: An improved firefly algorithm based on personalized step strategy. Computing 103, 735–748 (2021)
8. Tao, R., Zeng, M., Zhou, H.: A self-adaptive strategy based firefly algorithm for constrained engineering design problems. Appl. Soft Comput. 107, 107417 (2021)
9. Wang, C., Liu, K.: A randomly guided firefly algorithm based on elitist strategy and its applications. IEEE Access 7, 130373–130387 (2019)
10. Xu, G., Zhang, T., Lai, Q.: A new firefly algorithm with mean condition partial attraction. Applied Intelligence, 1–14 (2022)
11. Bei, J., Zhang, M., Wang, J., Song, H., Zhang, H.: Improved hybrid firefly algorithm with probability attraction model. Mathematics 11, 3706–3716 (2023)
12. Li, J., Tan, Y.: Loser-out tournament-based fireworks algorithm for multimodal function optimization. IEEE Trans. Evol. Comput. 22(5), 679–691 (2017)
13. Wang, H., et al.: Firefly algorithm with neighborhood attraction. Inf. Sci. 382, 374–387 (2017)
14. Zhao, J., Chen, W., Ye, J., Wang, H., Sun, H., Ivan, L.: Firefly algorithm based on level-based attracting and variable step size. IEEE Access 8, 58700–58716 (2020)
15. Peng, H., Qian, J., Kong, F., Fan, D., Shao, P., Wu, Z.: Enhancing firefly algorithm with sliding window for continuous optimization problems. Neural Comput. Appl. 34(16), 13733–13756 (2022)
16. Fister, I., Jr., Yang, X.S., Fister, I., Brest, J.: Memetic firefly algorithm for combinatorial optimization. arXiv preprint arXiv:1204.5165 (2012)
17. Yu, S., Zhu, S., Ma, Y., Mao, D.: A variable step size firefly algorithm for numerical optimization. Appl. Math. Comput. 263, 214–220 (2015)

Fireworks Algorithm for Dimensionality Resetting Based on Roulette Senwu Yu1,2 and Jun Li1,2(B) 1 College of Computer Science and Technology, Wuhan University of Science and Technology,

Wuhan 430065, China [email protected] 2 Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan 430065, China

Abstract. In order to enhance the local and global search abilities of the fireworks algorithm (FWA), this paper proposes a fireworks algorithm with dimensionality resetting based on roulette (FWADRR). First, the algorithm improves the exponentially decaying explosion and the mapping rules of EDFWA, avoiding interference when the algorithm uses regional information and enhancing its local search ability. Then, a new firework reset method is proposed: based on the change of the optimal firework's position, roulette selection is used to reset one dimension of the optimal firework, and the resulting position is used as the position of the reset firework. The explosion range of the reset firework is set to the explosion range of the optimal firework, ensuring that the reset firework and the optimal firework are in the same search stage. This method ensures that the other fireworks can contribute until the algorithm converges, improving the global search ability of the algorithm. Tested on the CEC2020 benchmark suite, the experimental results show that FWADRR significantly outperforms previous fireworks algorithms. Keywords: Fireworks Algorithm · Roulette-Based Dimension Reset · CEC2020

1 Introduction

The Fireworks Algorithm (FWA) is a metaheuristic algorithm inspired by the natural phenomenon of fireworks exploding in the night sky, proposed by Tan et al. [1]. FWA generates sparks through the explosion of fireworks, and then selects the next generation of fireworks to explode from the current population of fireworks and sparks. However, this process lacks sufficient utilization of information from the sparks. To address this issue, Li et al. [2] used information from the sparks generated by each explosion to guide the search for better solutions. In order to propagate the inspiration from each firework to the next iteration, Zheng et al. [3] selected new fireworks from the current population of fireworks and sparks, and used a crowding strategy to maintain diversity. Building on these two variations, Li et al. [4] proposed a reset strategy to reset unpromising fireworks. This strategy preserves promising fireworks while avoiding those that continue to explore


unpromising regions. However, when the optimal firework keeps improving over the iterations, each reset of the other fireworks uses the initial explosion amplitude and a random position, so every reset widens the gap between the reset fireworks and the optimal firework. This strategy therefore only gives other fireworks a chance to replace the optimal firework in the early stage; afterwards they no longer contribute to the final solution. Through the calculation of the information utilization rate [6], Chen et al. [5] found that the guiding operator in [2] has good information-utilization ability, so they replaced the single explosion per iteration in [4] with an exponentially decaying sequence of explosions, in which each explosion moves the explosion center through the guidance strategy of [2] and reduces the explosion amplitude and the number of generated sparks. Although each explosion produces sparks uniformly within the explosion range, the center of gravity of the area represented by the sparks is not necessarily the location of the firework; therefore, continuing to offset from the firework position (or its offset position) after each explosion may misuse the firework information. At the same time, the guidance vector generated from the spark information in [2] is based on the area information of the explosion range: when a firework lies at the boundary of the search area, sparks exceeding the search space may be mapped to regions outside the explosion range, which interferes with the guiding vector. Aiming at these deficiencies of [5], we propose a fireworks algorithm with dimensionality resetting based on roulette (FWADRR), with the following three improvements: 1. The offset position of the continuous explosion in [5] is changed to the center of gravity of the generated sparks. 2. Dimensions of a spark that exceed the search area are regenerated within the explosion range, and the guide vector is reduced in any dimension that would move the explosion center out of the search space, so that the explosion center always stays in the search space. 3. The explosion range of a reset firework is set to the explosion range of the current optimal firework, and, according to the changes in each dimension of the optimal firework, one dimension of the optimal firework is selected by roulette to be reset. The remainder of this article is organized as follows. The fireworks algorithm and related variants are presented in Sect. 2. A detailed description and analysis of our proposed method is given in Sect. 3. In Sect. 4, we present experimental results to demonstrate the performance of FWADRR on the CEC2020 [7] benchmark suite. Finally, we conclude in Sect. 5.

2 Fireworks Algorithm The basic process of the fireworks algorithm is to initialize multiple fireworks first, and then perform multiple explosion and selection operations in order to find the optimal solution. During the Explosion phase, each Firework creates explosion sparks around it that affect other Fireworks. Next, in the selection phase, after evaluation and ranking, the next generation of fireworks will be selected. The traditional selection strategy is similar to the evolutionary strategy, which is to put all fireworks in a selection pool. However, later research [4] showed that letting each firework form its own selection pool is a better choice in the case of multimodal optimization. In addition to selection, Fireworks also


cooperate with each other according to a global cooperation strategy. These strategies can be very general; for example, the number of sparks can be globally coordinated [4, 8], restart schedules can be designed [4], etc. In fact, the overall framework of the fireworks algorithm can be abstracted into two main processes: 1. local search by explosion; 2. global coordination of local explosions [9]. Together, these two principles enable the fireworks algorithm to efficiently solve various optimization problems.

During its development, FWA has attracted extensive attention from the research community. On the one hand, FWA has been widely used to solve various real-world problems, such as image processing [10], matrix factorization [11], spam detection [12], the vehicle routing problem [13], and the large-scale traveling salesman problem [14]. On the other hand, to improve the performance of FWA itself, researchers have proposed many variants; some of the important ones include enhanced FWA (EFWA) [15], AFWA [16], dynFWA [9], bare-bones FWA (BBFWA) [17], guided FWA (GFWA) [2], CoFFWA [3], loser-out tournament FWA (LoTFWA) [4], and exponentially decaying FWA (EDFWA) [5].

In order to enhance the local search ability of the fireworks algorithm, a guiding strategy is proposed in GFWA to realize the mutation operation of fireworks. In each generation, each firework first undergoes a uniform explosion to generate sparks, whose fitness values are then evaluated and ranked. The guiding vector is computed by presetting the guiding mutation ratio σ and computing the difference vector between the best spark set and the worst spark set; the guide vector is then added to the position of the firework to form the guiding spark. The process is described in Algorithm 1 (see the sketch below). LoTFWA introduces competition among fireworks on the basis of GFWA. In each generation, each firework calculates its expected final fitness by linear estimation and compares it with the fitness expected at the end of the optimization process: when a firework's estimated fitness is worse than the current best true fitness in the population, it is considered a loser and restarted randomly in the next generation. Algorithm 2 describes the knockout process of LoTFWA. Compared with GFWA, LoTFWA greatly enhances the ability to handle more complex multimodal problems, but for some unimodal and large-scale problems GFWA is still the better choice. Since the guidance strategy in GFWA utilizes regional information well, EDFWA applies it to the explosion of fireworks and changes the single explosion of each generation into continuous explosions with exponential decay. The ratio between the explosion amplitude and the number of generated sparks is kept the same, i.e., each explosion in the continuous sequence searches at the same search density. Algorithm 3 describes this exponentially decaying continuous explosion process, where f is the fitness function, U0 is the spark set produced by the initial explosion, P is the explosion center, A is the explosion amplitude, M is the number of sparks the explosion can produce, γ is the exponential decay ratio parameter, and ε is the guide mutation ratio. Because similar mutation operations already exist in the exponentially decaying explosion, the original mutation operator is removed in EDFWA.
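Because Algorithms 1–3 are not reproduced in this excerpt, the following minimal sketch only illustrates the guiding-spark idea described above (the difference between the centroids of the best and worst σ-fraction of a firework's sparks, added to the firework's position); the function names, array shapes, and the minimization assumption are illustrative, not the authors' implementation.

```python
import numpy as np

def guiding_spark(firework, sparks, spark_fit, sigma=0.2):
    """Guiding spark as described for GFWA: firework position plus the difference
    between the centroids of the best and worst sigma-fraction of its sparks."""
    n = max(1, int(sigma * len(sparks)))
    order = np.argsort(spark_fit)                    # ascending fitness (minimization assumed)
    guide = sparks[order[:n]].mean(axis=0) - sparks[order[-n:]].mean(axis=0)
    return firework + guide
```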


3 Proposed Approach

3.1 Offset Position

In both GFWA and LoTFWA, the guidance strategy is used in the mutation operation to offset the positions of the fireworks by the guiding vector and to evaluate these offset positions. In these variants its role is only to maintain population diversity and prevent individuals from falling into local optima. In EDFWA, the guiding strategy is instead used to find more promising regions, which are then explored once found. When a better area is sought through the regional information carried by the sparks, the expected better area should be the region around the shifted center of gravity of that information. Theoretically, the position of a firework with a uniform explosion is the center of gravity of the sparks within its explosion range, but in practice there is an error between the actual center of gravity of the sparks and the position of the firework. As shown in Fig. 1, the position of the firework does not coincide with the actual center of gravity of the sparks. Therefore, we change the guide vector offset


position in EDFWA to the center of gravity of the generated sparks, i.e., we replace P_{i+1} = P_i + G_{i-1} in Algorithm 1 with formula (1):

P_{i+1} = \frac{\sum_{k=1}^{M_i} U_i[k]}{M_i}.   (1)

Fig. 1. Schematic diagram of a single explosion
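As a concrete reading of Eq. (1), the snippet below (assuming the sparks are given as a NumPy array of positions) computes the centroid that becomes the next explosion center; it is an illustration rather than the paper's implementation.

```python
import numpy as np

def next_explosion_center(sparks):
    """Eq. (1): use the center of gravity of the generated sparks as the next explosion center."""
    return np.asarray(sparks).mean(axis=0)   # average over the M_i sparks U_i[1..M_i]
```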

3.2 Mapping Rule

For sparks that fall outside the search space, most fireworks variants randomly remap the out-of-range dimensions back into the search range, which avoids generating invalid sparks and increases population diversity. However, each continuous explosion in EDFWA uses the information of the sparks generated within the explosion area to build a guide vector that controls the direction of the continuous explosion. If invalid sparks are generated during an explosion and then remapped, the remapped sparks are likely to lie outside the explosion range of the firework; if their information is nevertheless used when generating the guide vector, a wrong guide vector can easily result, causing the firework to explore the wrong area. To avoid this, we regenerate invalid sparks within the effective explosion range. In addition, to prevent the offset vector from shifting the explosion center out of the search space, we reduce its value in any dimension where it exceeds the search range, ensuring that the explosion center always stays in the search space.

3.3 Roulette-Based Dimension Reset

The reset strategy used in EDFWA is the loser-out tournament strategy of LoTFWA. Its purpose is to find, through the other individuals, an area better than that of the current optimal firework. By observing the changes of the current optimal firework, we found that this strategy only works in the early stage and no longer works in the middle and late stages. The main reason is that the other fireworks reinitialize their position and explosion range every time they are reset, so each reset widens the gap between the reset firework and the current optimal firework. To avoid this situation, we propose a roulette-based dimension reset strategy. The reset of the

282

S. Yu and J. Li

firework's position is no longer a random reset of all dimensions; instead, according to the recent update of each dimension of the current optimal firework, roulette selection is used to choose one dimension of the current optimal firework to reset, and the resulting position is used as the position of the reset firework. The explosion range of the reset firework inherits the explosion range of the current optimal firework, so that the reset firework and the current optimal firework are in the same search stage. Algorithm 4 describes the roulette-based dimension reset process, and Algorithm 5 gives the overall procedure of FWADRR.
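Since Algorithms 4 and 5 are not reproduced in this excerpt, the sketch below only illustrates the roulette-based dimension reset described above; in particular, making the selection probability proportional to how much each dimension of the best firework changed recently is an assumption about how "according to the changes in each dimension" is quantified.

```python
import numpy as np

def roulette_dimension_reset(best_pos, best_prev_pos, best_amplitude, lower, upper,
                             rng=np.random.default_rng()):
    """Reset a loser firework near the current best firework via roulette over dimensions.

    The probability of picking a dimension is assumed proportional to the recent change
    of that dimension of the best firework; the reset firework copies the best firework's
    position with one re-drawn dimension and inherits its explosion amplitude."""
    change = np.abs(best_pos - best_prev_pos)
    if change.sum() > 0:
        probs = change / change.sum()
    else:
        probs = np.full(len(best_pos), 1.0 / len(best_pos))   # uniform fallback (assumption)
    d = rng.choice(len(best_pos), p=probs)                    # roulette selection of one dimension
    new_pos = best_pos.copy()
    new_pos[d] = rng.uniform(lower[d], upper[d])              # re-draw only the selected dimension
    return new_pos, best_amplitude                            # inherit the best firework's explosion range
```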


4 Experiments and Discussions

To demonstrate the effectiveness of FWADRR, it is compared with GFWA, CoFFWA, LoTFWA, and EDFWA. The CEC2020 benchmark functions [7] are selected for testing. Each function is tested 30 times in 20 dimensions, according to the setup of the bound-constrained single-objective optimization competition; the termination condition is 1,500,000 evaluations. The means and standard deviations of the functions are shown in Table 2, where the bold entries indicate the best result for each function. The comparison between the proposed algorithm and the other algorithms is summarized as "w/t/l" in the last line, i.e., the proposed algorithm wins on w functions, draws on t functions, and loses on l functions.

4.1 Parameters

For the two parameters of the dynamic explosion amplitude used by the fireworks variants, namely Cr and Ca, the settings Cr = 0.9 and Ca = 1.2 are used. The other parameters of FWADRR are the number of fireworks μ, the initial number of explosion sparks λ0, the guided mutation rate σ, and the decay factor γ; the same settings as in EDFWA [5] are used, namely μ = 3, λ0 = 30, σ = 0.2, γ = 0.75. The constant ε is set to 1E-06. The parameters of LoTFWA are taken from [4]: μ = 5, λ = 60, σ = 0.2, A = (UpperBound − LowerBound)/2. The relevant parameters of the other fireworks variants also use the same settings to ensure that each generation of fireworks in the different variants has a similar number of evaluations. Experimental environment: Intel(R) Core(TM) i7-11800U processor, 16.00GB RAM, Win10 64-bit operating system, MATLAB R2021a.


4.2 CEC2020 Test Functions

This article selects the latest CEC2020 test functions for experimental verification. There are 10 test functions in total, as shown in Table 1.

Table 1. CEC2020 test functions

No. | Functions                                           | Best
Unimodal Function:
1   | Shifted and Rotated Bent Cigar Function             | 100
Basic Functions:
2   | Shifted and Rotated Schwefel's Function             | 1100
3   | Shifted and Rotated Lunacek bi-Rastrigin Function   | 700
4   | Expanded Rosenbrock's plus Griewangk's Function     | 1900
Hybrid Functions:
5   | Hybrid Function 1 (N = 3)                           | 1700
6   | Hybrid Function 2 (N = 4)                           | 1600
7   | Hybrid Function 3 (N = 5)                           | 2100
Composition Functions:
8   | Composition Function 1 (N = 3)                      | 2200
9   | Composition Function 2 (N = 4)                      | 2400
10  | Composition Function 3 (N = 5)                      | 2500
Search range: [−100, 100]^D

4.3 Results

From the experimental results shown in Table 2, in the 20-dimensional test of the CEC2020 benchmark, FWADRR performs relatively poorly on the unimodal function (1), although it is still at the same level as the other algorithms. On the composition function (9), it is only slightly worse than CoFFWA, and on the other functions FWADRR is better than (or at least equal to) the other comparison algorithms, with average fitness values basically close to the optimal values of the corresponding problems. This further verifies that FWADRR has higher solution accuracy and better robustness. The convergence curves of F1, F2, F5, and F8 in 20 dimensions are shown in Fig. 2, representing the ability of each algorithm to solve the unimodal, basic, hybrid, and composition functions, respectively. On the F2 function, FWADRR exhibits a stronger ability to find accurate solutions than the other algorithms. On F5, although the improved algorithm converges more slowly than EDFWA in the early stage, the final convergence result is significantly better than that of EDFWA. The main reason is that the reset strategy adopted by LoTFWA is only effective in the early stage and no longer works after the algorithm has explored for a period of time, whereas the improved reset strategy uses the information of the current optimal firework, so that the reset fireworks and the optimal firework are in the same search stage; the gap between the reset fireworks and the optimal firework is reduced, and the reset strategy keeps working until the algorithm converges.


Table 2. Mean ± Std of the comparative algorithms on the CEC2020 test functions (D = 20)

FunId | GFWA                 | CoFFWA               | LoTFWA               | EDFWA                | FWADRR
1     | 1.79E+03 ± 1.58E+03  | 2.82E+03 ± 2.70E+03  | 1.72E+03 ± 1.89E+03  | 1.22E+03 ± 1.54E+03  | 3.11E+03 ± 2.95E+03
2     | 3.21E+03 ± 6.09E+02  | 2.34E+03 ± 3.16E+02  | 3.17E+03 ± 5.77E+02  | 2.39E+03 ± 4.38E+02  | 1.34E+03 ± 5.61E+02
3     | 7.61E+02 ± 1.97E+01  | 7.60E+02 ± 8.36E+00  | 7.64E+02 ± 1.99E+01  | 7.49E+02 ± 1.42E+01  | 7.30E+02 ± 5.16E+00
4     | 1.90E+03 ± 1.58E+00  | 1.90E+03 ± 6.87E-01  | 1.90E+03 ± 2.07E+00  | 1.90E+03 ± 6.06E-01  | 1.90E+03 ± 5.65E-01
5     | 5.24E+03 ± 1.94E+03  | 8.90E+03 ± 5.01E+03  | 9.47E+03 ± 4.61E+03  | 4.77E+03 ± 1.59E+03  | 3.62E+03 ± 1.72E+03
6     | 2.05E+03 ± 9.25E-13  | 2.05E+03 ± 9.25E-13  | 2.05E+03 ± 9.25E-13  | 2.05E+03 ± 9.25E-13  | 2.05E+03 ± 9.25E-13
7     | 4.66E+03 ± 1.35E+03  | 4.26E+03 ± 1.43E+03  | 6.54E+03 ± 2.85E+03  | 4.56E+03 ± 1.3E+03   | 2.86E+03 ± 4.57E+02
8     | 2.30E+03 ± 1.46E-13  | 2.50E+03 ± 6.14E+02  | 2.30E+03 ± 7.2E+00   | 2.30E+03 ± 2.07E-13  | 2.30E+03 ± 2.07E-13
9     | 2.88E+03 ± 3.10E+01  | 2.79E+03 ± 1.35E+02  | 2.87E+03 ± 2.97E+01  | 2.88E+03 ± 4.83E+01  | 2.82E+03 ± 9.11E+00
10    | 2.94E+03 ± 2.88E+01  | 2.92E+03 ± 1.30E+01  | 2.94E+03 ± 2.77E+01  | 2.93E+03 ± 2.92E+01  | 2.91E+03 ± 2.74E+00
w/t/l | 6/3/1                | 6/2/2                | 6/3/1                | 6/3/1                |

Fig. 2. Convergence curves


5 Conclusion

This article proposes a fireworks algorithm with dimensionality resetting based on roulette (FWADRR). In our proposed method, hopeless fireworks are reset based on the information of the current best firework, so that the reset fireworks are in the same search stage as the current best firework. This enhances the information exchange between the best firework and the other fireworks, and the reset strategy remains effective until the algorithm converges. In addition, experimental results demonstrate that FWADRR has strong convergence capabilities.

Acknowledgment. This research is supported by National Natural Science Foundation of China (Grant No. 62271359).




Improved Particle Swarm Optimization Algorithm Combined with Reinforcement Learning for Solving Flexible Job Shop Scheduling Problem Yi-Jie Gao, Qing-Xia Shang(B) , Yuan-Yuan Yang, Rong Hu, and Bin Qian Kunming University of Science and Technology, Kunming 650500, China [email protected]

Abstract. Particle Swarm Optimization (PSO) is widely used to solve optimization problems. Most existing PSO variants improve only the inertia weight, not the velocity and position updates or the learning factors, so particles cannot locate the optimum more accurately from their current positions. In this paper, an improved particle swarm optimization algorithm combined with reinforcement learning (IPSO_RL) is proposed to solve the flexible job shop scheduling problem (FJSP) with the optimization goal of minimizing the maximum completion time (makespan). First, in the particle update stage of the algorithm, a Q-learning algorithm is used to dynamically adjust the inertia weight and acceleration parameters to balance the algorithm's global exploration and local exploitation capabilities, thereby guiding the search direction reasonably. Second, the particle position update strategy is redesigned to accelerate the convergence speed and accuracy of the algorithm and to improve search efficiency. In addition, the algorithm introduces an opposition-based learning strategy that enriches the search directions in the solution space and enhances the algorithm's ability to jump out of local optima. Finally, simulation experiments and comparisons demonstrate that IPSO_RL can effectively solve the FJSP. Keywords: Flexible job shop scheduling · Q-learning · opposition-based learning · Particle swarm optimization

1 Introduction Job shop scheduling is an important link in intelligent manufacturing and is key to optimizing resource allocation and improving economic efficiency for enterprises. The flexible job shop scheduling problem (FJSP), as an extension of the job shop scheduling problem, has attracted widespread attention in recent years. FJSP involves assigning a set of jobs that can be processed on multiple machines. Each job consists of several operations that must be processed in sequence, and the operations of different jobs can be interleaved in the schedule. The flexibility of FJSP lies in the fact that each operation (possibly all operations) of a given set can be processed across multiple


machines. Although small-scale FJSP instances can be solved optimally by enumeration, the complexity of large-scale FJSP increases sharply with the number of machines and jobs, which poses a challenge for researchers. Currently, many scholars have achieved rich research results on flexible job shop scheduling. Brandimarte [1] first used a hybrid Tabu search (TS) algorithm to solve the single-objective FJSP. However, this algorithm can only accept a solution that differs from the solutions saved in the tabu list and cannot use the information of the global optimal solution and the local optimal solution at the same time. Xing et al. [2] then proposed an Ant Colony Optimization (ACO) algorithm to solve FJSP. Wang et al. [3] proposed a multi-population collaborative genetic algorithm based on a collaborative optimization algorithm to study the makespan minimization problem of FJSP. Compared with the above algorithms, the particle swarm optimization algorithm (PSO) [4-6] has the characteristics of fast search speed and a small number of parameters, and it has unique advantages for solving flexible job shop scheduling problems. Maroua et al. [7] regarded the scheduling of embedded real-time production systems as a flexible job shop scheduling problem and proposed a distributed PSO to solve it. In the above models, only the inertia weight of PSO was improved, while the velocity and position updates and the learning factors were not. However, in the search performance of the PSO algorithm, the balance between global exploration and local search largely depends on the control parameters of the algorithm [8, 9], while fixed parameters ignore the flexibility of particles to find more accurate optimal solutions according to their different surrounding environments [10, 11]. For FJSP, this paper establishes a sequencing model to minimize the maximum completion time and proposes an improved particle swarm optimization combined with reinforcement learning (IPSO_RL) [12-15] to solve it. Firstly, in the particle update stage, a Q-learning algorithm is used to dynamically adjust the inertia weight and acceleration parameters to guide the algorithm's search direction reasonably. Secondly, the particle position update strategy is redesigned to accelerate the convergence speed and accuracy of the algorithm. In addition, the algorithm introduces an opposition-based learning strategy to enhance its ability to jump out of local optima.

2 Problem Description and Mathematical Model 2.1 Problem Description FJSP can be described as processing n jobs on m machines. Each job must be completed in the given operation sequence. Each operation has a set of machinable machines. Each operation can only be processed by one machine. Selecting different machines requires different processing times. After each operation is completed, it is transported to the next machine for processing. FJSP aims to determine the optimal processing time by reasonably arranging the job sequence and selecting suitable processing machines. The assumptions made regarding this issue include: (1) All machines are idle at the start of the scheduling process; (2) All jobs can be processed at the start of the scheduling process; (3) All jobs have the same priority;


(4) There are no constraints on the order of operations between different jobs; (5) Each processing operation cannot be interrupted; (6) There are no limits to the buffer area for each machine to store jobs. Overall, the FJSP is a complex scheduling problem that requires careful consideration of various factors to optimize processing times and maximize efficiency.

2.2 Mathematical Model The mathematical model describes the problem, and the variables are defined as follows: π refers to the sequence of jobs; J represents the job set, J = {J_1, J_2, ..., J_n}; O_ij represents the j-th operation of job i, O_i(j-1) represents the previous operation of operation O_ij, and O_i'j' represents the previous operation processed on the same machine as operation O_ij; M represents the machine set, M = {M_1, M_2, ..., M_m}; the machinable machine set of O_ij is M_h, M_h ⊆ M; P(π_{O_ij,h}) represents the processing time of operation O_ij on machine h, h ∈ M_h; S(π_{O_ij,h}) represents the start time of operation O_ij on machine h; C(π_{O_ij,h}) represents the completion time (makespan) of operation O_ij on machine h; C(π_i) represents the makespan of job i; C(π) represents the makespan of all jobs. The objective of the FJSP is to minimize the makespan of all sequences of jobs by establishing an optimization model. The model can be expressed as follows:

f = min C(π)  (1)

C(π) = max_i C(π_i)  (2)

C(π_i) = max_j C(π_{O_ij,h})  (3)

C(π_{O_ij,h}) = min_{h ∈ M_h} [ S(π_{O_ij,h}) + P(π_{O_ij,h}) ]  (4)

S(π_{O_ij,h}) = max{ C(π_{O_i(j-1),h}), C(π_{O_i'j',h}) }  (5)

Equation (1) is the objective function, and f denotes minimizing the makespan over all job sequences. Equation (2) denotes the maximum completion time of the job sequence. Equation (3) gives the maximum completion time of job i. Equation (4) gives the completion time of an operation as the minimum over its machinable machines. Equation (5) represents the constraints on machine resources and job resources: a machine can only handle one operation at a time, and a job can only be processed on one machine at a time.
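To make the model concrete, the following minimal Python sketch (not part of the original paper; the names and the tiny instance are illustrative) computes the makespan of a given FJSP schedule according to Eqs. (2)-(5), given an operation sequence, a machine assignment, and the processing times.

```python
# Minimal sketch (not the authors' code): computing the makespan of a feasible FJSP
# schedule according to Eqs. (2)-(5). A solution is an ordered list of operations
# plus a machine assignment; proc_time[(job, op, machine)] holds the processing times.
def makespan(op_sequence, machine_of, proc_time, n_machines):
    job_ready = {}                       # completion time of each job's last scheduled operation
    machine_ready = [0] * n_machines     # completion time of the last operation on each machine
    for job, op in op_sequence:          # operations of each job must appear in their order
        h = machine_of[(job, op)]
        # Eq. (5): start after both the job predecessor and the machine predecessor finish
        start = max(job_ready.get(job, 0), machine_ready[h])
        finish = start + proc_time[(job, op, h)]     # Eq. (4), machine h already selected
        job_ready[job] = finish
        machine_ready[h] = finish
    return max(job_ready.values())       # Eqs. (2)-(3): the makespan over all jobs

# Tiny illustrative instance: 2 jobs x 2 operations on 2 machines.
p = {(0, 0, 0): 3, (0, 1, 1): 2, (1, 0, 1): 2, (1, 1, 0): 4}
seq = [(0, 0), (1, 0), (0, 1), (1, 1)]
assign = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
print(makespan(seq, assign, p, n_machines=2))   # prints 7
```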


3 IPSO_RL IPSO_RL improves the particle update process of the particle swarm optimization algorithm and adaptively controls the parameter settings. Firstly, an opposition-based learning (OBL) strategy is introduced in the particle update process. Compared with traditional particle swarm optimization algorithms, it expands the current search space and enhances the algorithm's ability to jump out of local optima. Then, to address the problem that fixed parameters in particle swarm optimization affect the evolution speed, we propose to use a reinforcement learning algorithm to dynamically control the parameters and balance the global and local search capabilities of particle swarm optimization in different evolution scenarios. The overall process of the algorithm is shown in Fig. 1: First, initialize the population and calculate the particle fitness values to find the current individual optimal value and global optimal value; then, after OBL, select appropriate actions from the Q-table according to the current state of the particles to adaptively adjust the particle swarm update parameters; next, update the particles, feed the updated results back to the Q-table, and adjust the individual and global optimal values; finally, repeat the above steps until the termination criteria are met. This section is arranged as follows: Sect. 3.1 describes the opposition-based learning strategy. Section 3.2 describes the Q-learning algorithm that can adaptively control the PSO parameters. Section 3.3 describes the improved particle swarm optimization.

Fig. 1. The process of improved particle swarm optimization algorithm combined with reinforcement learning.

3.1 Opposition-Based Learning (OBL) OBL is a mechanism that can effectively expand the search space and cover feasible solution regions, and it has been effectively applied in intelligent algorithms. OBL calculates candidate particles and their corresponding opposite candidate particles in a single iteration (Eq. 6) and uses them selectively to jump out of local optima and accelerate the search process.

B_i^k = O(π_i^k) = π_i^k, if fitness(π_i^k) > fitness(π̄_i^k);  π̄_i^k, if fitness(π_i^k) ≤ fitness(π̄_i^k).  (6)


where π_i^k represents the i-th particle at the k-th generation, π̄_i^k denotes the opposite candidate particle, fitness is the fitness scalar of a particle under the objective function, and B_i^k is the child obtained by applying the opposition-based learning strategy to particle π_i^k. Let π_i^k = (π_1, π_2, ..., π_L) be a point in the L-dimensional space, where π_1, π_2, ..., π_L represent the sequence numbers of the operations, a is the minimum value of the sequence, b is the maximum value of the sequence, and π_x ∈ [a, b]. Then its opposite π̄_i^k is defined by its coordinates π̄_1, π̄_2, ..., π̄_L, where (Fig. 2):

π̄_x = a + b − π_x, x = 1, 2, ..., L

(7)

Fig. 2. The learning process of opposite candidate particle.
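As an illustration of Eqs. (6)-(7), the following short sketch (illustrative only, not the authors' implementation) builds the opposite candidate of an operation-sequence particle and keeps the candidate preferred by the fitness comparison of Eq. (6). Note that when the particle is a permutation of 1..L and a = 1, b = L, the opposite is again a permutation.

```python
# Illustrative sketch (not the authors' implementation) of Eqs. (6)-(7).
def opposite(particle, a, b):
    # Eq. (7): every coordinate is mirrored inside [a, b]
    return [a + b - x for x in particle]

def obl_select(particle, fitness, a, b):
    opp = opposite(particle, a, b)
    # Eq. (6): keep pi if fitness(pi) > fitness(opposite), otherwise keep the opposite
    return particle if fitness(particle) > fitness(opp) else opp

# Example: an operation-sequence particle with values in [1, 6].
pi = [3, 1, 6, 2, 5, 4]
print(opposite(pi, 1, 6))                        # [4, 6, 1, 5, 2, 3]
print(obl_select(pi, fitness=sum, a=1, b=6))     # both sum to 21, so the opposite is kept
```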

3.2 Q-Learning The Particle Swarm Optimization algorithm (PSO) adapts its parameters dynamically by leveraging Q-learning's capacity to learn through environmental feedback. In this article, we elaborate on the methods for designing the Q-table and adjusting the parameters adaptively with Q-learning. State Setting. The particle's position in the solution space is used as its state. This state S_i is represented by the ratio of the fitness distance between the particle B_i^k and the current global historical optimal particle P_g^k to the fitness of P_g^k, as shown in Eq. (8):

S_i = |fitness(B_i^k) − fitness(P_g^k)| / fitness(P_g^k)  (8)

Since the states of the Q-table need to be represented discretely, we normalize S_i to S̃_i and discretize it into four states as shown in Eq. (9). The classification criteria for the four different states are shown in Fig. 3.

S̃_i = S1, if 0 ≤ S_i ≤ 0.2;  S2, if 0.2 < S_i ≤ 0.3;  S3, if 0.3 < S_i ≤ 0.5;  S4, if S_i > 0.5.  (9)


Fig. 3. Particles generate different states based on their location in the state space (distance from the current global historical optimal particle P_g^k). Particle 1 is in S2 and particle 2 is in S4.

Action Setting. The particle swarm update process has three parameters: the inertia weight ω and the acceleration factors C1 and C2. ω is used to balance the global and local search ability, and ω > 0. C1 is the cognitive coefficient used to regulate the speed of particles converging to the individual historical optimal particle P_i^k. C2 is the social coefficient used to regulate the speed of particles converging to the global historical optimal particle P_g^k. According to the effects of different parameter changes on optimization, different combinations of parameters are designed as Q-learning actions to obtain the dynamic parameter selection strategy. We propose particle parameter adaptive adjustment as the Q-table actions: global exploration, local search, slow convergence, and fast convergence. Each action corresponds to a set of parameter combinations, as shown in Table 1.

Table 1. Q-table actions that can adaptively adjust particle swarm parameters.

Action | ω | C1 | C2
A1 (Global exploration) | 0.1 | 0.4 | 0.7
A2 (Local search) | 0.15 | 0.5 | 0.9
A3 (Slow convergence) | 0.2 | 0.5 | 0.9
A4 (Fast convergence) | 0.25 | 0.5 | 0.7

Reward Setting. To speed up the particle search, the fitness difference before and after a particle update is used as the reward to guide the search in the best direction:

r_i^k = fitness(π_i^k) − fitness(π_i^{k+1})  (10)

where r_i^k denotes the reward value obtained by the particle π_i^k after the update, and π_i^k and π_i^{k+1} denote the particle before and after the update, respectively. Q-Table Update. Based on the design of the Q-learning states, actions, and rewards mentioned above, the action corresponding to the maximum Q value in the Q-table is selected with probability ε according to the current particle's state. Update the


state information of the particles based on the parameter combination of the selected action, calculate the rewards during the update process, and update the Q-table with the new state and action of the particles. The Q-table adopts a random initialization method, and the update formula is shown in Eq. (11).

Q(S_{i+1}, a_{t+1}) = (1 − α) · Q(S_i, a_i) + α · [ r_i^t + γ · max_a Q(S_{i+1}, a) ]  (11)

where α is the learning rate. γ is the discount factor. ai is the action of particle Bik selection. And Q(Si , ai ) is the Q-value of the state Si taking action ai . To balance the exploration and exploitation ability of Q-table, we set α = 0.1 and γ = 0.85.
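The state, action, reward, and update designs above can be combined into a small controller. The following sketch is illustrative only (the ε value and the function names are assumptions, not taken from the paper); it discretizes the state via Eqs. (8)-(9), maps the four actions of Table 1 to (ω, C1, C2), and applies the update of Eq. (11) with α = 0.1 and γ = 0.85.

```python
import random

# Illustrative sketch (not the authors' code) of the Q-learning parameter controller.
ACTIONS = {0: (0.10, 0.4, 0.7),   # A1: global exploration  (w, C1, C2)
           1: (0.15, 0.5, 0.9),   # A2: local search
           2: (0.20, 0.5, 0.9),   # A3: slow convergence
           3: (0.25, 0.5, 0.7)}   # A4: fast convergence

def discretize_state(fit_particle, fit_gbest):
    s = abs(fit_particle - fit_gbest) / fit_gbest    # Eq. (8)
    if s <= 0.2:
        return 0                                     # S1
    if s <= 0.3:
        return 1                                     # S2
    if s <= 0.5:
        return 2                                     # S3
    return 3                                         # S4

def select_action(Q, state, eps=0.9):
    # pick the greedy action with probability eps, otherwise explore (eps is assumed here)
    if random.random() < eps:
        return max(range(len(ACTIONS)), key=lambda a: Q[state][a])
    return random.randrange(len(ACTIONS))

def update_q(Q, s, a, reward, s_next, alpha=0.1, gamma=0.85):
    # Eq. (11): blend the old Q-value with the reward plus the discounted best next value
    Q[s][a] = (1 - alpha) * Q[s][a] + alpha * (reward + gamma * max(Q[s_next]))

# Usage: Q starts as a random 4 x 4 table; each particle update then looks like
Q = [[random.random() for _ in range(4)] for _ in range(4)]
s = discretize_state(fit_particle=120.0, fit_gbest=100.0)
a = select_action(Q, s)
w, c1, c2 = ACTIONS[a]          # parameters used in the particle update of Sect. 3.3
update_q(Q, s, a, reward=5.0, s_next=discretize_state(115.0, 100.0))
```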

3.3 Improved PSO Algorithm We have redesigned the update operation for the particle position and propose the following particle position update operation:

π_i^{k+1} = C2 ⊗ Cross_2( C1 ⊗ Cross_1( ω ⊗ Swap(B_i^k), P_i^k ), P_g^k )  (12)

where Swap, Cross_1 and Cross_2 are three different particle update operations and ⊗ denotes an operational relationship. The following provides a detailed introduction to these operations. 1) Swap is the swap process of operations within the particle B_i^k. Randomly select an operation O_ij. Due to the operation sequence constraints, it is necessary to determine the positions of the front operation O_i(j-1) and the rear operation O_i(j+1) of this operation O_ij. Then, randomly select another operation O_i'j' between these two positions and swap the two genes O_ij and O_i'j' (Fig. 4).

E_i^k = ω ⊗ Swap(B_i^k) = Swap(B_i^k), if rand[0, 1] < ω;  B_i^k, if rand[0, 1] ≥ ω.  (13)

Fig. 4. The process of Swap.
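A possible implementation of the Swap operation of Eq. (13) is sketched below (illustrative only; the encoding as a list of (job, operation) genes is an assumption). It keeps the particle unchanged with probability 1 − ω and otherwise swaps a randomly chosen operation with another operation lying between its job predecessor and successor, so the operation order of each job is preserved.

```python
import random

# Illustrative sketch (not the authors' code) of the Swap operation in Eq. (13).
# A particle is encoded here as a list of (job, operation_index) genes (an assumption).
def swap_operator(seq, omega):
    if random.random() >= omega:       # Eq. (13): keep the particle with probability 1 - omega
        return seq[:]
    child = seq[:]
    pos = random.randrange(len(child))
    job, op = child[pos]
    # positions of the same job's preceding and following operations (if present)
    lo = next((p for p, g in enumerate(child) if g == (job, op - 1)), -1)
    hi = next((p for p, g in enumerate(child) if g == (job, op + 1)), len(child))
    # swap with a random operation located strictly between those two positions
    candidates = [p for p in range(lo + 1, hi) if p != pos]
    if candidates:
        q = random.choice(candidates)
        child[pos], child[q] = child[q], child[pos]
    return child
```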

Eik is a feasible child who performs Swap based on the probability of ω. 2) Cross_1 represents the cross process between the particle Eik and each individual historical optimal particle Pik . The job set J = {J1 , J2 , · · · , Jn } is randomly split into complementary subsets J1 and J2 . J1 ’s job numbers in Pik are sequentially copied to the child Fik from the original location, while J2 ’s job numbers in Eik are copied to Fik in


turn. If two processes have the same order number, they are sorted randomly by their adjacent process (Fig. 5).   Cross_1E k , P k , rand [0, 1] 1is a monotonically decreasing function, the calculation shows that when μj = 10 is close to 0. Also, considering the load balancing state, a reasonable allocation of resources makes the edge calculator’s LA(t 0 ) smaller, which is both a more adequate allocation of resources and a high utilization of resources. The relationship between 1 < μj < 10 and the number of edge servers satisfies. The multitasking offload and resource allocation strategy is shown in Fig. 3.

Fig. 3. Multi-task offloading and resource allocation strategy flowchart.

3.5 Set Up Edge Computing Model Encoding To apply the OPGWO to the MEC problem, a mapping relationship needs to be established where the location of the individual grey wolf represents the offloading decision. Assuming N mobile end devices and J MEC servers, the computational resources of the


edge computing servers are allocated according to the V-function allocation policy, and a cloud server is deployed in the CSS. The encoding of the cloud server is set to J + 1. If five end devices send task offload requests and three edge servers are deployed in the ESS, the edge computing model coding is adjusted as shown in Fig. 4.

Fig. 4. Edge computing model coding.

Here, the task generated by device 1 is processed locally, the tasks generated by devices 2, 3, and 4 are offloaded to edge servers 2, 1, and 3, respectively, and the task generated by device 5 is processed by the cloud server. With this encoding adjustment, the algorithm can be applied to the edge computing task offloading problem.
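The encoding can be summarized in a few lines of Python (a sketch, not the authors' code). The convention that 0 denotes local processing is an assumption made here for illustration; codes 1..J denote the edge servers and J + 1 denotes the cloud server, as described above.

```python
# Minimal sketch of the offloading-decision encoding in Sect. 3.5 (not the authors' code).
# Assumed convention: 0 = process locally, 1..J = offload to that edge server,
# J + 1 = offload to the cloud server.
J = 3                                    # number of edge servers in the example of Fig. 4
position = [0, 2, 1, 3, J + 1]           # one integer (decision) per end device

def decode(position, num_edge_servers):
    decisions = []
    for device, code in enumerate(position, start=1):
        if code == 0:
            decisions.append((device, "local"))
        elif code <= num_edge_servers:
            decisions.append((device, f"edge server {code}"))
        else:
            decisions.append((device, "cloud server"))
    return decisions

print(decode(position, J))
# [(1, 'local'), (2, 'edge server 2'), (3, 'edge server 1'),
#  (4, 'edge server 3'), (5, 'cloud server')]
```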

4 Simulation Experiments 4.1 Setting of Experimental Parameters Since a real edge computing environment cannot be implemented, in order to simulate the real environment as closely as possible, this paper refers to the environment settings in the literature [15] and makes appropriate modifications for the experimental simulation, using MATLAB R2021 to generate the experimental data and conduct the experiments. The parameters of the edge computing model are set as follows: the number of MECs deployed in the ESS is J = 10, the number of cloud servers is 1, the maximum number of iterations is t_max = 100, and the size of the task generated by each device and the resources required to compute that task are generated from a normal distribution. The other parameters are set in Table 3.

Table 3. Experimental parameter setting.

Parameter | Value | Parameter | Value
BCSS (MHz) | 0.5 | fi (GHz) | 10
BESS (MHz) | 5 | σ² (dB) | −100
fijESS (GHz) | [2,10] | ci (MB) | [5,50]
fiCSS (GHz) | 30 | si (GHz) | [0.5,5]
pe (mW) | [100,200] | pj (dBm) | [20,25]
pc (mW) | [110,250] | pcdeal (dBm) | [20,25]

After completing the system model parameter settings, it is necessary to determine the parameters of V-function. As mentioned earlier, μj have been determined to be


Fig. 5. System parameter setting

between 1 and 10, so μ should lie in [0.1, 1]. Therefore, different values of μ (0.3, 0.5, 0.7, 1.0, 1.2, and 1.5) were combined with OPGWO, and experiments were conducted with N set to 50, 100, 150, 200, 250, and 300, respectively. The best total energy consumption was averaged over 30 runs for each group and compared, as shown in Fig. 5(a). The results show that as the number of mobile devices increases, the disadvantage of not reusing resources becomes more apparent. μ = 0.5 is significantly better than μ = 0.3 as the number of devices increases, and the cost for different numbers of devices at μ > 0.5 is higher than under the allocation policy with μ = 0.5. Therefore, when the number of MEC servers is 10, μ = 0.5 is used for the V-function. As the weight values of latency and energy consumption are usually not on the same order of magnitude, they need to be normalized. Assuming n is 20, the average energy consumption and latency of the system under different weight values are shown in Fig. 5(b). The average latency and average energy consumption of the system tend to level off when g_i^T ≥ 0.4. As g_i^T increases, the system reduces latency at the expense of energy consumption; once g_i^T is large, increasing it further cannot reduce the average energy consumption or the average latency of the system, so g_i^T = g_i^E = 0.5 is selected in this paper. 4.2 Analysis of Results Simulation experiments were conducted using the above parameters to evaluate the performance of the OPGWO algorithm when combined with the V-function for offloading decisions. The OPGWO was compared with the Random algorithm, the improved genetic algorithm (GA) [16], and the Deep Q-network (DQN) [17]. The experiments were conducted under different numbers of devices, with a maximum of 100 iterations for each algorithm. Each algorithm was run 30 times to obtain the total cost convergence curves of the different algorithms. The results are shown in Fig. 6. Based on the experimental results, all the algorithms have a good convergence speed in the first 20 iterations, indicating a fast optimum-seeking ability. However, as the number of devices increases, the optimization accuracy of the Random algorithm decreases significantly. The GA had better convergence accuracy than Random, and the use of genetic mutation helped to avoid falling into local optima during the solution process. However, the GA relied on randomness to search the solution space, resulting in a lack of diversity in the search process and potentially causing the algorithm to fall into local optima.


Fig. 6. Total cost convergence curves for different algorithms.

Compared to the Random algorithm and GA, the DQN method was quick and highly accurate in finding the optimal solution. DQN analyzed and replayed experience during the optimization search to avoid entering local optima. However, over-estimation could lead to a slow search speed and cause the optimization process to flatten out. The experimental findings showed that OPGWO could quickly and efficiently find the best solution early in the iterations. The ORPS helped to increase the population size, allowing every potential optimal solution in the search space to be fully explored and preventing the algorithm from entering local optima. The population distribution was denser, and the optimization performance was better on higher-dimensional problems.

Fig. 7. System performance of different algorithms in relation to the number of tasks.

From Fig. 7(a), it is evident that as the number of devices increases, the load values of all the algorithms show an increasing trend. It is essential to control the load values within a lower range to ensure the stability and availability of the system. In both high-dimensional and low-dimensional problems, the OPGWO demonstrates excellent performance in load balancing compared to other algorithms. In terms of resource utilization (Fig. 7(b)) and task completion rate (Fig. 7(c)), the random offloading policy leads to server overload due to task stacking, resulting in low resource utilization and a low task completion rate. The other three algorithms show higher system resource utilization and task completion rate, with OPGWO performing


the best. When combined with the inverse method, the V-function mapping strategy can prevent tasks from piling up on low-performance servers while high-performance server resources sit idle.

5 Conclusion This study proposes a joint optimization strategy based on the OPGWO and the V-function mapping strategy for the computational offloading problem in a multi-user, multi-MEC-server scenario. The effectiveness of the OPGWO is confirmed using the CEC 2017 test functions. The simulation studies confirm the viability of the OPGWO in conjunction with the V-function mapping strategy in managing task offloading, enhancing load balancing, and improving resource utilization. In future studies, this work will be extended to mobile edge systems to address the problem of excessive delay and energy consumption during task transfer and processing.

References 1. Cisco Systems, Inc. Cisco annual internet report (2018–2023) whitepaper. https://www. cisco.com/c/en/us/solutions/collateral/execut-ive-perspectives/annual-internet-report/whitepaper-c11-741490.htm 2. Mach, P., Becvar, Z.: Mobile edge computing: a survey on architecture and computation offloading. IEEE Commun. Surv. Tutorials 19(3), 1628–1656 (2017) 3. Yang, G., Hou, L., He, X., et al.: Offloading time optimization via markov decision process in mobile-edge computing. IEEE Internet of Things J. 8(4), 2483–2493 (2021) 4. Liu, J., Zhang, Q.: Offloading schemes in mobile edge computing for ultra-reliable low latency communications. IEEE Access 6, 12825–12837 (2018) 5. Mirjalili, S., Mirjalili, S.M., Lewis, A.: Grey wolf optimizer. Adv. Eng. Softw. 69, 46–61 (2014) 6. Gobalakrishnan, N., Arun, C.: A new multi-objective optimal programming model for task scheduling using genetic gray wolf optimization in cloud computing. The Comput. J. 61(10), 1523–1536 (2018) 7. Jiang, K., Ni, H., Sun, P., et al.: An improved binary grey wolf optimizer for dependent task scheduling in edge computing. In: 2019 21st International Conference on Advanced Communication Technology (ICACT) (2019) 8. Rahnamayan, S., Jesuthasan, J., Bourennani, F., et al.: Computing opposition by involving entire [17] population. IEEE Congress on Evolutionary Computation, pp. 1800–1807, Beijing, China (2014) 9. Zhou, L.Y., Ding, L.X., Ma, M.D.: An orthogonal reverse learning firefly algorithm. J. Electr. Inform. Technol. 41(01), 202–209 (2019) 10. Awad, N.H., Ali, M.Z., Liang, J.J., et al.: Problem Definitions and Evaluation Criteria for the CEC 2017Special Session and Competition on Single Objective Real-Parameter Numerical Optimization. Nanyang TechnologicalUniversity, Singapore (2016) 11. Tan, F.M., Zhao, J.J., Wang, Q.: Research on gray wolf optimizer algorithm with improved nonlinear convergence mode. Microelectron. Comput. 36(05), 89–95 (2019) 12. Gai, W., Qu, C., Liu, J., Zhang, J.: An improved grey wolf algorithm for global optimization. In: 2018 Chinese Control And Decision Conference (CCDC), pp. 2494–2498. Shenyang, China (2018)


13. Gao, Z.M., Zhao, J.: An improved grey wolf optimization algorithm with variable weights. Comput. Intell. Neurosci. 2019(1–3), 1–13 (2019) 14. Wang, Z.T., Cheng, F.Q., You, W.: Gray wolf optimizer algorithm based on somersault for aging strategy. Appl. Res. Comput. 38(05), 1434–1437 (2021) 15. Tran, T.X., Pompili, D.: Joint task offloading and resource allocation for multi-server mobileedge computing networks. IEEE Trans. Veh. Technol. 68, 856–868 (2017) 16. Li, Z.: Genetic algorithm-based optimization of offloading and resource allocation in mobileedge computing. Information (Switzerland) 11(2), 83 (2020) 17. Yan, J., Bi, S., Zhang, Y.: Offloading and resource allocation with general task graph in mobile edge computing: a deep reinforcement learning approach. IEEE Trans. Wireless Commun. 19(8), 540 (2020)

Runtime Analysis of Estimation of Distribution Algorithms for a Simple Scheduling Problem Rui Liu, Bin Qian(B) , Sen Zhang, Rong Hu, and Nai-Kang Yu School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China [email protected]

Abstract. Estimation of Distribution Algorithm (EDA) is an intelligent optimization technique widely applied in production scheduling. In algorithm applications, time complexity is an important criterion of concern. However, there is relatively little theoretical research on the time complexity of these algorithms. We propose a single-machine scheduling problem with a deteriorating effect (SMSDE) and prove a property of it. Under the objective of minimizing the makespan, the convergence time (CT) of EDA for solving SMSDE is obtained. Then, we derive the first hitting time (FHT) of EDA for solving SMSDE from the CT. This study provides some theoretical support for the application of EDA. Keywords: Estimation of distribution algorithms · first hitting time · deteriorating effect · computational time complexity

1 Introduction Evolutionary algorithms (EAs) are a general term for a large class of stochastic optimization algorithms inspired by Darwin's theory of evolution. The idea of these algorithms is to treat each solution of the optimization problem as an individual in a population and to let them produce offspring through operations such as recombination and mutation, following the law of "survival of the fittest". As the evolution progresses, the quality of the surviving individuals is continuously improved, and eventually an optimal or near-optimal solution to the corresponding problem is obtained. Estimation of Distribution Algorithm (EDA) is a new type of evolutionary algorithm that guides the search for the optimum by building and sampling explicit probabilistic models of promising candidate solutions. Evolutionary algorithms have been widely applied in various fields, such as computer applications, industrial design, and automation. Although evolutionary algorithms have strong generality, theoretical research on them is relatively weak. This has led to a lack of reliable theoretical guarantees for the performance of evolutionary algorithms in practical applications, which has limited the further promotion and application of these algorithms. Doerr and Krejca [1] gave an upper bound on the running time of the Univariate Marginal Distribution Algorithm (UMDA) for optimizing the LeadingOnes function. Dang et al. [2] provided upper bounds on the running time of UMDA for optimizing both the LeadingOnes


function and the BinVal problem. Krejca and Witt [3] proved a lower bound on the running time of UMDA for optimizing the OneMax function. Sutton [4] analyzed the running time of (1 + 1)EA for solving the graph coloring problem and 2-CNF problem. Zhang [5] analyzed the running time of (1 + 1)EA for solving the traveling salesman problem and assignment problem, as well as the running time of (1 + λ)EA for solving the 0–1 knapsack problem. Oliveto et al. [21] analyzed the running time of (1 + 1)EA for solving the vertex cover problem. The existing research on the running time of EDA is usually based on several common benchmark functions. For evolutionary algorithms solving scheduling problems, (1 + 1)EA and (1 + λ)EA are commonly used. There is little theoretical research related to the running time analysis of EDA for solving scheduling problems. During the operation of evolutionary algorithms, offspring solutions are often generated from current solutions, and therefore this process can be modeled as a Markov chain [13, 14]. To analyze the running time of evolutionary algorithms, they can be modeled as Markov chains, and then the first hitting time (FHT) can be analyzed. Wegener [15] proposed an fitness level analysis method, which is a method for analyzing the expected running time of evolutionary algorithms using elite preservation strategies. Drift analysis is another tool for analyzing Markov chain FHTs. It was initially employed by He and Yao [16] to analyze the running time of evolutionary algorithms, and has since evolved into many variants [17, 18]. Yu and Zhou [19] used convergence-based analysis to analyze the running time of evolutionary algorithms. Yu et al. proposed a switch analysis method [20], which analyzes the expected running time of two evolutionary algorithms by comparison. In past research, researchers in the field of evolutionary computing have proposed various analysis tools for analyzing the running time of traditional evolutionary algorithms. However, these analysis tools cannot be directly applied to the analysis of typical distribution estimation algorithms, mainly because distribution estimation algorithms do not explicitly use operators such as mutation and crossover like traditional evolutionary algorithms do. One of the difficulties in analyzing distribution estimation algorithms is that the random errors in the algorithm’s random sampling are difficult to handle directly using traditional analysis tools. Specifically, these random errors occur in the random sampling done to update the probability model. This paper handles random errors by assuming that the distribution estimation algorithm samples infinitely many solutions in each generation. According to the law of large numbers in probability theory, the sampled results will converge to the expected results, thus avoiding discussions about random errors [10, 11].

2 Preliminaries 2.1 Sorting Problems with Deteriorating Effect In classical sorting problems, the processing time of each job is generally constant. However, in some sorting problems with strong practical applications, the actual processing time of a job depends on its position in the sequence. One model is that the processing time of a job increases as it is positioned later in the sequence, which was proposed by Mosheiov [6] and is known as the deteriorating jobs problem. The background for such


models is that after a machine has been operating for a long time, its efficiency gradually declines due to reasons such as aging, which leads to an increase in the processing time of jobs processed later. This paper proposes and considers a single machine scheduling with deteriorating effect (SMSDE), which is described as follows: There are n independent jobs J_1, J_2, ..., J_n that need to be processed on a single machine, and all jobs arrive at time zero. Only one job can be processed by the machine simultaneously, and processing cannot be interrupted. For each job J_i, there is a basic processing time p_{j_i} = n − i + 1. The actual processing time of job J_i when it is placed in the r-th position is p_{j_i, r} = p_{j_i} · n^r. The makespan of a feasible solution π = (π_1, π_2, ..., π_n) on SMSDE can be calculated as follows:

C_max(π) = Σ_{i=1}^{n} p_{π_i, i}.

Minimizing C_max as the objective, the global optimum of SMSDE is π* = (j_n, j_{n−1}, ..., j_1).

2.2 First Hitting Time and Convergence Time The First Hitting Time of an evolutionary algorithm for solving a given problem is represented by τ and defined as τ = min{t : x* ∈ ξ_t}, where ξ_t refers to the population of the evolutionary algorithm at generation t, and x* refers to the global optimal of the problem. Chen [7] pointed out that FHT can be used to analyze the time complexity of EDA. Due to the high randomness of EDA, which is a probability sampling-based algorithm, it is difficult to give a definite FHT when assuming a very large population. Therefore, a new concept of "convergence" is introduced to provide an upper bound for FHT [8]. It originates from the concept of convergence of random sequences and is often used to describe the extreme state of EA [9], where all individuals in the population reach the global optimum. Therefore, "EA converging to the global optimum on a problem" is a sufficient but not necessary condition for "EA finding the global optimum on a problem". The strict definition of convergence is given by Zhang and Muhlenbein as follows [10]:

lim_{t→∞} F(t) = G*,

where F(t) is the average fitness value of individuals in the t-th generation, and G* is the fitness value of the global optimal individual. Convergence describes the extreme state of EA, rather than the time complexity of EA. However, under the premise that EA can converge to the global optimum within a finite time, we can measure the time complexity of EA by the minimum number of generations required for EA to converge. This minimum number of generations is called the convergence time (CT), denoted as T. The CT of EDA can be defined as follows:

T = min{ t : p(x* | ξ_t^(s)) = 1 },


where x* is the global optimal of the given problem, and p(x* | ξ_t^(s)) is the estimated distribution of EDA in the t-th generation. Let E[T] be the expected value of CT and E[τ] be the expected value of FHT; we have:

E[τ] ≤ E[T]  (1)

As a consequence, we can use the upper bound of the expected value of CT to estimate the upper bound of the expected value of FHT [8].

3 FHT of EDA with Truncation Selection on SMSDE The single-machine scheduling problem with deteriorating effects considered in this article can be defined by the following minimization function:

SMSDE(x) = Σ_{i=1}^{n} (n − x_i + 1) · n^i,

where x = (x1 , x2 , · · · , xn ) is a permutation of 1 to n, the global optimal of SMSDE is x∗ = (1, 2, · · · , n). Table 1. Estimation of Distribution Algorithm With Truncation Selection

The EDA using truncation selection is shown in Table 1, where p_{t,i}(x_i) is the probability that an individual takes x_i at the i-th position in the t-th generation, and r_{t,i}(x_i) is the frequency with which individuals take x_i at the i-th position in the t-th generation:

r_{t,i}(x_i) = (1 / N) Σ_{x ∈ ξ_t} δ(x | x_i),


δ(x | x_i) is defined as follows:

δ(x | x_i) = 1, if the i-th position of x takes the value x_i;  0, otherwise.
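Since the pseudocode in Table 1 is presented as a figure, the following Python sketch gives only an illustrative reconstruction of an EDA with truncation selection on SMSDE: sample N individuals from the marginals, keep the best M, and set the new marginals to the observed frequencies. The rule used to keep each sampled solution a valid permutation (restricting every position to values not used yet) is an assumption and is not taken from Table 1.

```python
import random

# Illustrative sketch (not the pseudocode of Table 1) of an EDA with truncation
# selection on SMSDE. p[i][v-1] is the marginal probability of value v at position i.
def smsde(x):
    n = len(x)
    return sum((n - xi + 1) * n ** (i + 1) for i, xi in enumerate(x))

def sample_permutation(p, n):
    perm, used = [], set()
    for i in range(n):
        weights = [p[i][v - 1] if v not in used else 0.0 for v in range(1, n + 1)]
        if sum(weights) == 0.0:                          # fallback if the marginals collapsed
            weights = [0.0 if v in used else 1.0 for v in range(1, n + 1)]
        v = random.choices(range(1, n + 1), weights=weights)[0]
        perm.append(v)
        used.add(v)
    return perm

def eda_truncation(n=6, N=200, M=50, generations=60):
    p = [[1.0 / n] * n for _ in range(n)]                # uniform initial marginals (1/n each)
    best_ever = None
    for _ in range(generations):
        pop = [sample_permutation(p, n) for _ in range(N)]
        pop.sort(key=smsde)                              # truncation selection: keep the best M
        selected = pop[:M]
        if best_ever is None or smsde(pop[0]) < smsde(best_ever):
            best_ever = pop[0]
        for i in range(n):                               # new marginals = observed frequencies
            for v in range(1, n + 1):
                p[i][v - 1] = sum(ind[i] == v for ind in selected) / M
    return best_ever

print(eda_truncation())   # tends toward the global optimum (1, 2, ..., n)
```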

Our analysis is based on the following assumptions:

Assumption 1: ∀t > 0, 0 ≤ i ≤ n : r_{t,i}(x_i) = p_{t−1,i}(x_i). This assumption reflects our standing premise that EDA maintains a very large population, so that, by the Glivenko-Cantelli theorem, the frequency of each value in the population generated by the probability matrix is equivalent to the probability of taking that value. This also means that, under the condition of randomly generating the initial population, we assume the initial value of each marginal probability to be 1/n. For each generation's probability matrix P_t(x) (∀t = 1, 2, ...), we denote P̄_t(x) as follows: P̄_{t+1}(x) = E(P_{t+1}(x) | P_t(x)). We then have Assumption 2.

Assumption 2: P_t(x) = P̄_t(x). We assume that the values of the probability matrix during EDA iteration are the same as their expected values. These two assumptions are also used in [8, 10]. In the remaining parts of this article, we will analyze the first hitting time of SMSDE solved by EDA under these two assumptions. In the analysis, we will use the following property of SMSDE.

Property 1: For ∀i ∈ {1, 2, ..., n}, when two solutions of SMSDE are the same in the last i − 1 bits, the value of the last i-th bit overwhelmingly determines the relative size of the fitness values of the two solutions.

Proof: Consider an SMSDE of size n, where x = (x_1, x_2, ..., x_n) is a permutation of 1 to n. In the base-(n + 1) numeral system, SMSDE(x) is an n-digit number whose leftmost digit is (n − x_n + 1), whose second leftmost digit is (n − x_{n−1} + 1), and so on. For two n-digit numbers, when the first i − 1 digits are the same, the relative size of the i-th digit determines their relative size, which proves the property.

Due to Property 1, in EDA based on fitness selection for solving the SMSDE problem, each bit of the solution will start to converge after all bits to its right have converged. This phenomenon is called domino convergence [11, 12]. We will focus on marginal probabilities and use the following two lemmas.

Lemma 1: For EDA on SMSDE, if at the t_0-th generation p_{t_0,i}(x*) = 1, then for ∀t > t_0 we have p_{t,i}(x*) = 1.

Lemma 2: For EDA with truncation selection, the proportions of the best individuals before and after selection in the t-th generation satisfy:

R_t^(s) = R_t · N / M, if R_t ≤ M / N;  1, if R_t > M / N.  (2)


where R_t and R_t^(s) are the proportions of the best individuals before and after truncation selection. The proof of the above two lemmas can be found in [8]. Then we have the following theorem. Theorem 1: For EDA with truncation selection on SMSDE, its convergence time under the average behavior satisfies: ln nM + ln N − 2 ln M

E T |∀t > 0, 0 ≤ i ≤ n : rt,i (xi ) = pt−1 (xi ) < 2n − 1 ln N − ln M Proof: From Lemma 1, we know that once the value of a bit in EDA solving SMSDE converges to the global optimal, it will not take a suboptimal value in the subsequent iterations. Meanwhile, due to Property 1, EDA solving SMSDE will experience domino convergence, that is, each bit from right to left will converge to the global optimal value in sequence. Therefore, we can define Ti as the iteration at which the i-th bit from the right converges to the global optimal value, and its strict definition is as follows: ∗ 

=1 , Ti = min t : pt,n−i+1 xn−i+1 where xi∗ is the value of i-th bit of global optimal. Because the first bit will converge automatically when the second bits have converged, Tn−1 is convergence time. We first calculate T1 . From Lemma 2, we can write the following inequality: ⎧ ⎪ ⎨ 1 N T1 −1 ≥ M , n M N (3) ⎪ T −2 ⎩1 N 1 < M. n M N Solving (3) and considering the upper bound of T1 , we can obtain: T1
0, 0 ≤ i ≤ n : rt,i (xi ) = pt−1 (xi ) < 2n − 1 ln N − ln M So far, we have obtained mean FHT under the average behavior satisfies of EDA with truncation selection on SMSDE. Now we discuss the impact of selection pressure in EDA with Truncation Selection on the time complexity of the algorithm. Define the real number ε = M N , we have ε ∈ (0, 1). The smaller the ε, the greater the selection pressure of the algorithm. Rewriting (7) by replacing M N with ε, we obtain:   ln n

n E τ |∀t > 0, 0 ≤ i ≤ n : rt,i (xi ) = pt−1 (xi ) < 2 − 1 +1 (8) ln 1ε < 2Tn−2 +

This indicates that as the selection pressure of EDA with Truncation Selection increases, the time complexity of solving SMSDE decreases.

Runtime Analysis of Estimation of Distribution Algorithms

363

4 Conclusion and Further Work EDA is a population-based optimization algorithm that learns statistical patterns from good parent individuals by building a probability model and generates offspring through sampling the probability model. Time complexity is an important criterion in algorithm application. In this paper, we use FHT to measure the time complexity of EDA. In the analysis, we first obtain CT of EDA solving SMSDE and then derive FHT from CT. Our research indicates that a high selection pressure in EDA can reduce the time complexity when solving SMSDE. This conclusion is consistent with our expectation. For a simple problem like SMSDE, EDA has a clear direction to optimize. Higher selection pressure can make the algorithm optimize faster. However, for some complex optimization problems, excessively high selection pressure may cause the algorithm to fall into local optima. Another point to note is that we assume that EDA always maintains a huge population size. In some small population algorithms, a large selection pressure may cause the algorithm to lose some important information, leading to premature convergence. This is the first analysis of the time complexity of EDA for solving scheduling problems. However, in our analysis, we assume that the algorithm maintains a very large population, which is different from actual situations. Our future work includes studying the time complexity of EDA with small population, more complex scheduling problems, and more complex EDA. Acknowledgement. This research was supported by the National Natural Science Foundation of China (62173169 and 61963022) and the Basic Research Key Project of Yunnan Province (202201AS070030).

References 1. Doerr, B., Krejca, M.S.: A simplified run time analysis of the univariate marginal distribution algorithm on Leading Ones. Theoret. Comput. Sci. 851(1), 121–128 (2021) 2. Dang, D.-C., Lehre, P.K., Nguyen, P.T.H.: Level-based analysis of the univariate marginal distribution algorithm. Algorithmica 81(2), 668–702 (2018) 3. Krejca, M.S., Witt, C.: Lower bounds on the run time of the univariate marginal distribution algorithm on onemax. contribution title. In: 9th International Proceedings on Proceedings, pp. 1–2. Publisher, Location (2010) 4. Sutton, A.M.: Superpolynomial lower bounds for the (1 + 1) EA on some easy combinatorial problems. Algorithmica 75(3), 507–528 (2016) 5. Zhang, Y., Hao, Z., Huang, H., et al.: Runtime analysis of (1+1) evolutionary algorithm for two combinatorial optimization instances. J. Inform. Comput. Sci. 8(15), 3497–3506 (2011) 6. Mosheiov, G.: Scheduling problems with a learning effect. Eur. J. Oper. Res. 132(3), 687–693 (2001) 7. Chen, T., Tang, K., Chen, G., et al.: Analysis of computational time of simple estimation of distribution algorithms. IEEE Trans. Evol. Comput. 14(1), 1–22 (2010) 8. Chen, T., Tang, K., Chen, G., et al.: On the analysis of average time complexity of estimation of distribution algorithms. In: IEEE Congress on Evolutionary Computation. IEEE (2007) 9. Rudolph, G.: Finite markov chain results in evolutionary computation: a tour d’horizon. Fundamenta Informaticae 35(1–4), 67–89 (1998)

364

R. Liu et al.

10. Zhang, Q., Muhlenbein, H.: On the convergence of a class of estimation of distribution algorithms. IEEE Trans. Evol. Comput. 8(2), 127–136 (2004) 11. Rudnick, W.M.: Genetic algorithms and fitness variance with an application to the automated design of neural netoworks (1992) 12. Thierens, D., Goldberg, D.E., Pereira, A.G.: Domino convergence, drift, and the temporalsalience structure of problems. In: 1998 IEEE International Conference on Evolutionary Computation Proceedings. IEEE (2002) 13. He, J., Xin, Y.: Drift analysis and average time complexity of evolutionary algorithms. Artif. Intell. 127(1), 57–85 (2001) 14. He, J., Xin, Y.: Towards an analytic framework for analysing the computation time of evolutionary algorithms. Artif. Intell. 145(1–2), 59–97 (2008) 15. Wegener, I.: Methods for the analysis of evolutionary algorithms on pseudo-boolean functions. In: Evolutionary Optimization. International Series in Operations Research & Management Science, vol. 48. Springer, Boston, MA (2003). https://doi.org/10.1007/0-306-48041-7_14 16. He, J., Yao, X.: A study of drift analysis for estimating computation time of evolutionary algorithms. Nat Comput 3, 21–35 (2004) 17. Doerr, B., Johannsen, D., Winzen, C.: Multiplicative drift analysis. Algorithmica 64, 673–697 (2012) 18. Doerr, B., Goldberg, L.A.: Adaptive drift analysis. Algorithmica (New York) 65(1), 224–250 (2013) 19. Yu, Y., Zhou, Z.H.: A new approach to estimating the expected first hitting time of evolutionary algorithms. Artif. Intell. 172(15), 1809–1832 (2006) 20. Yu, Y., Qian, C., Zhou, Z.-H.: Switch analysis for running time analysis of evolutionary algorithms. IEEE Trans. Evol. Comput. 19(6), 777–792 (2015) 21. Oliveto, P.S., He, J., Yao, X.: Analysis of the (1 + 1)-EA for finding approximate solutions to vertex cover problems. IEEE Trans. Evol. Comput. 13(5), 1006–1029 (2009)

Nonlinear Inertia Weight Whale Optimization Algorithm with Multi-strategy and Its Application Cong Song li, Feng Zou(B) , and Debao Chen Institute of Physics and Electronic Information, Huaibei Normal University, Huaibei 235000, China [email protected]

Abstract. Whale optimization algorithm (WOA) suffers from slow convergence speed, low convergence accuracy, also difficulty in escaping local optima. To address these issues, we propose an improved whale algorithm that incorporates Latin hypercube sampling for population initialization. This ensures a more uniform distribution of the population in the initial stage compared to the random initialization. And then, introducing Cauchy Distribution into WOA’s searching prey stage. This prevents premature convergence to local optima and avoids affecting the later convergence. Furthermore, applying a nonlinear inertia weight to make an improvement on the convergence speed and accuracy of the algorithm. And compared the improved whale algorithm with other algorithms using 18 benchmark functions, and all results given indicated that the proposed algorithm better than original WOA and other algorithms. Finally, applying the improved WOA, original WOA and a mutate WOA to optimize the design of a pressure vessel, and the optimization results demonstrated that the effectiveness of the proposed method. Keywords: Whale optimization algorithm · Cauchy Distribution · Nonlinear inertia weight · Pressure vessel design

1 Introduction Since the emergence of intelligent optimization algorithms, various algorithms have been announced to handle a pile of optimization problems. Researchers have created different kinds of types of algorithms inspired by different phenomena, such as evolutionary mechanisms, physical principles, and animal behaviors. Those algorithms approximately fall into three broad categories. First, evolutionary algorithms, for instance, genetic algorithms [1], evolution strategies [2], and cultural algorithms [3]. Second, intelligent algorithms related to physical principles, such as the Big Bang-Big Crunch algorithm [4] and the Ray Optimization algorithm [5]. And third, swarm intelligence algorithms, which are inspired by some phenomenon or behavior of animals or plants in nature, for example, particle swarm optimization [6] and ant colony optimization [7]. What’s more, recently proposed algorithms such as the hunter-prey optimization algorithm [8] as well as the wild dog optimization algorithm [9]. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D.-S. Huang et al. (Eds.): ICIC 2023, LNCS 14086, pp. 365–375, 2023. https://doi.org/10.1007/978-981-99-4755-3_32

366

C. S. li et al.

Whale Optimization Algorithm (WOA) was presented in 2016 by Mirjalili et al., from Griffith University in Australia as a new swarm intelligence optimization algorithm. Due to its simplicity, ease of implementation, low requirements on objective function conditions, and few parameter controls, WOA has been put into use in so many fields. However, traditional WOA still has some problems, such as low search accuracy, tardy convergence speed, as well as a tendency to trap in local optima. This paper raises an improved Whale Optimization Algorithm (MSNWOA) to overcome traditional WOA weaknesses, as well as enhances the efficiency and accuracy of the algorithm. There are many ways to improve the global search ability, slow convergence speed, moreover, have a tendency to fall into the local optima of WOA. One approach is to improve its global search ability by adding a probability-biased selection mechanism [10], which selectively executes actions at different probabilities during different iteration periods to improve its global search ability. Another is to introduce the Cauchy mutation (WOAWC) [11] to enhance algorithm’s global search ability. A second approach is to introduce the local optimization mechanism from bat algorithm [12], add a Gaussian random walk strategy, create a multi-leader mechanism [13], add Lévy flight, improve WOA’s ability to escape local optima. A third approach is to add adaptive weights, nonlinear adaptive weights to improve the algorithm’s convergence speed and accuracy. In addition, there are many integrated methods to compensate for the algorithm’s shortcomings. The improved WOA algorithm (MSNWOA) proposed in this paper incorporates Latin hypercube sampling as the population initialization strategy to cover as much search space as possible. Furthermore, implementing Cauchy distribution into WOA searching prey stage to avoid premature in early iteration. Nonlinear weights are also added to significantly improve the algorithm’s convergence accuracy and speed. Whale optimization algorithm and other improved whale optimization algorithm, have a wide range of applications, such as image processing field, image segmentation, image enhancement is using whale optimization algorithm to optimize the algorithm in image processing, and BP neural network regression prediction algorithm optimization problems and so on.

2 Whale Optimization Algorithm Whale Optimization Algorithm [14] is an intelligent optimization algorithm based on the foraging behavior of whales in nature. The inspiration comes from the humpback whale’s feeding strategy called “bubble net feeding,” in which they cooperatively create a circle made of bubbles to guide herring into the circle before swallowing them whole. The core idea of Whale Algorithm is to simulate the search and position update process of whales during feeding. It is split into three stages: searching for prey, enclosing prey, and bubble-net attacking. 2.1 Enclosing Prey Stage Whale Optimization Algorithm selects the best whale in the current population during this stage, as well as other whales move towards it using a certain mechanism. The

Nonlinear Inertia Weight Whale Optimization Algorithm

367

mathematical expression for this mechanism is:

where

− → − → − − → → X (t + 1) = X ∗ (t) − A · D

(1)

→ → − − → − − →  D =  C · X ∗ (t) − X (t)

(2)

− → − → − → and X ∗ (t) is the best whale in the current iteration population, moreover, A , C are both − → − → coefficient vectors, The specific form of A and C is: − → → C =2·− r

(3)

− → → → → A = 2− a ·− r −− a

(4)

where a diminishes linearly from 2 to 0, and r is a random vector from 0 to 1. 2.2 Bubble Net Attacking Stage This stage mimics the behavior of whales attacking prey and encloses it with a spiral path while releasing bubbles to confuse and trap prey. Shrink Enclosing Mechanism: This mechanism reduces the value of a in Eq. (1) by applying  it to Eq. (4), causing individuals to gradually shrink and approach each other. − → As  A  decreases, the step size of individuals; movement decreases, which can help search for local optimal solutions. Spiral Updating Mechanism: During this mechanism, individuals other than the best individual calculate their distance to the current best individual and move towards it in a spiral path while searching for the optimal solution in the vicinity. The mathematical expression for this mechanism is: − → − → − → X (t + 1) = D · ebl · cos(2π l) + X ∗ (t) (5) → − → − − →  where, D = X ∗ (t) − X (t) is the distance between best whale and current whale in the population, b is constant coefficient, and l is random number from 0 to 1. The two mechanisms are selected with a probability p, when p >0.5, contraction is selected, and when p 0.5


2.3 Searching Prey Stage

During the searching-prey stage, a random whale individual is selected as the leader for the search. This stage is performed when |A| > 1, which means that in the initial phase, when |A| is large, the population carries out as much global search as possible. The position update of this stage follows Eq. (7):

X(t + 1) = X_rand - A · D    (7)

where

D = |C · X_rand - X|    (8)

and X_rand is a randomly selected individual.

3 Proposed Whale Optimization Algorithm

In response to the weaknesses of the Whale Optimization Algorithm (WOA), namely its tendency to become trapped in local optima, its slow convergence speed, and its low convergence accuracy, this paper proposes three improvements. First, Latin hypercube sampling is used to initialize the population positions uniformly over the search space. Second, the Cauchy distribution is introduced in the searching-prey stage of WOA to help escape local optima. Finally, a nonlinear weighting method is introduced to enhance the search ability and improve convergence accuracy.

3.1 Latin Hypercube Sampling

The concept of Latin hypercube sampling (LHS) was first introduced by McKay et al. in 1979 as a way to improve the efficiency of Monte Carlo simulation studies. The basic idea behind LHS is to divide the range of each input variable into equally spaced intervals and then select one sample from each interval at random. This ensures that the sampled values cover the entire range of each variable while avoiding clustering or redundancy of samples. Compared with random sampling, LHS stratifies the sampling space and then draws samples from each stratum; as the number of sampling points increases, the number of strata increases as well. In this way, points in the corners of the sampling space can also be sampled, whereas random sampling is less likely to reach them. The uniformity of LHS is even more pronounced in higher-dimensional problems, so LHS was chosen as the initialization method to make the initial population as diverse as possible. Two sampling examples are shown in Figs. 1 and 2: even with fewer samples, the distribution of Latin hypercube samples is at least as spread out as that of random sampling, and it is able to reach distant points of the space.

Nonlinear Inertia Weight Whale Optimization Algorithm

369

Fig. 1. Random sample 50 times.

Fig. 2. Latin hypercube sample 40 times
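As a concrete illustration of this initialization step, the following is a minimal sketch of Latin-hypercube population initialization with NumPy; it is not the authors' code, and the function name and bound handling are assumptions.

```python
import numpy as np

def lhs_init(pop_size, dim, lower, upper, rng=None):
    """Latin hypercube sampling: one sample per equal-width stratum in each dimension."""
    rng = np.random.default_rng() if rng is None else rng
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    pop = np.empty((pop_size, dim))
    for d in range(dim):
        # one uniform draw inside each of the pop_size strata, then shuffle the strata
        strata = (np.arange(pop_size) + rng.random(pop_size)) / pop_size
        rng.shuffle(strata)
        pop[:, d] = lower[d] + strata * (upper[d] - lower[d])
    return pop

# Example: 100 whales in 30 dimensions on [-100, 100]^30
population = lhs_init(100, 30, [-100] * 30, [100] * 30)
```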

3.2 Cauchy Distribution

Compared with other distributions, for example the Gaussian distribution, the Cauchy distribution has much heavier tails, so large values are generated with a non-negligible probability; the Gaussian distribution concentrates its values near the mean and rarely produces the large steps that help an algorithm escape. For this reason the Cauchy distribution is chosen. In the searching-prey stage of the original WOA, the behavior is governed by the parameter A, and as A decreases the search step size decreases as well. The Cauchy distribution is introduced in this stage to enlarge the initial search step of the algorithm: with a small probability it produces a large value, which gives the algorithm a chance to avoid falling into a local optimum. The modified expression is:

X(t + 1) = X_rand - A · D · CDF    (9)

where CDF is a value generated by the Cauchy distribution.

3.3 Nonlinear Weight

To enhance the global search ability of WOA as well as its convergence speed and accuracy, this paper introduces nonlinear weights inspired by the literature [15]. Linear weights may not provide sufficient flexibility during the iterations, whereas a nonlinear weight adapts better to complex problems and offers more freedom. The weight is applied in the bubble-net attacking stage and the enclosing-prey stage:

w = λ · e^{-π·t / (3·Maxiter)}    (10)

X(t + 1) = w · D' · e^{bl} · cos(2πl) + X*(t)    (11)

X(t + 1) = w · X*(t) - A · D    (12)

where Maxiter is the maximum number of iterations, t is the current iteration, and λ in Eq. (10) is set to 0.08.
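A minimal sketch of how Eqs. (9)-(12) could be realized in Python follows; it is an illustration under the stated formulas, not the authors' implementation, and the vector shapes and random-number choices are assumptions.

```python
import numpy as np

def nonlinear_weight(t, max_iter, lam=0.08):
    # Eq. (10): w = lambda * exp(-pi*t / (3*max_iter))
    return lam * np.exp(-np.pi * t / (3.0 * max_iter))

def search_prey_cauchy(x, x_rand, A, C, rng):
    # Eq. (9): Cauchy-scaled random search around a random whale
    D = np.abs(C * x_rand - x)
    cdf = rng.standard_cauchy(size=x.shape)      # heavy-tailed step factor
    return x_rand - A * D * cdf

def encircle_weighted(x, x_best, A, C, w):
    # Eq. (12): weighted shrink-enclosing update
    D = np.abs(C * x_best - x)
    return w * x_best - A * D

def spiral_weighted(x, x_best, w, b=1.0, rng=None):
    # Eq. (11): weighted spiral update, with l drawn in [0, 1) as in the text
    rng = np.random.default_rng() if rng is None else rng
    l = rng.random(size=x.shape)
    D_prime = np.abs(x_best - x)
    return w * D_prime * np.exp(b * l) * np.cos(2 * np.pi * l) + x_best
```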


3.4 Algorithm MSNWOA

In this paper, the Whale Optimization Algorithm is improved by using Latin hypercube sampling to initialize the population uniformly, by using the Cauchy distribution to overcome the tendency of WOA to get stuck in local optima, and by using nonlinear weighting to improve the convergence accuracy and convergence rate. The overall procedure of the improved algorithm is summarized below.
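The original pseudocode figure is not reproduced here; the following is a re-sketch of the overall MSNWOA loop as described in Sects. 3.1-3.3. It is an illustration only: the greedy replacement, the exact probability splits and the per-vector |A| test are assumptions, and `lhs_init` and the update helpers are the hypothetical functions sketched above.

```python
import numpy as np

def msnwoa(obj, lower, upper, pop_size=100, max_iter=1500, lam=0.08, b=1.0, seed=0):
    rng = np.random.default_rng(seed)
    dim = len(lower)
    X = lhs_init(pop_size, dim, lower, upper, rng)          # Sect. 3.1 initialization
    fit = np.apply_along_axis(obj, 1, X)
    best = X[np.argmin(fit)].copy()

    for t in range(max_iter):
        a = 2.0 - 2.0 * t / max_iter                        # a decreases linearly 2 -> 0
        w = nonlinear_weight(t, max_iter, lam)              # Eq. (10)
        for i in range(pop_size):
            r1, r2 = rng.random(dim), rng.random(dim)
            A, C = 2 * a * r1 - a, 2 * r2                   # Eqs. (3)-(4)
            if rng.random() < 0.5:
                if np.all(np.abs(A) >= 1):                  # exploration: searching prey
                    x_rand = X[rng.integers(pop_size)]
                    X[i] = search_prey_cauchy(X[i], x_rand, A, C, rng)   # Eq. (9)
                else:                                       # exploitation: enclosing prey
                    X[i] = encircle_weighted(X[i], best, A, C, w)        # Eq. (12)
            else:
                X[i] = spiral_weighted(X[i], best, w, b, rng)            # Eq. (11)
            X[i] = np.clip(X[i], lower, upper)
        fit = np.apply_along_axis(obj, 1, X)
        if fit.min() < obj(best):
            best = X[np.argmin(fit)].copy()
    return best, obj(best)
```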

4 Algorithm Simulation Analysis

The experiments were conducted on a computer with a 64-bit Windows 10 operating system, 12 GB of memory and an i5-6300HQ processor, using Python 3.0. Eighteen benchmark test functions were selected from the literature [11] for testing (detailed expressions can be found there): F1 to F8 are unimodal functions, F9 to F13 are multimodal functions, and F14 to F18 are fixed-dimension functions. MSNWOA is compared with ABC [17], PSOw [18], EDSDE [16], WOA and WOAWC [11]. The PSOw parameters c1 and c2 are set to 2, and the EDSDE parameters are set according to the original paper. The results are listed in Table 1, and convergence curves for three functions selected from each of the three categories are shown in Fig. 3. The algorithms were run with 1500 iterations, a population size of 100, 30 dimensions for the unimodal and multimodal functions, and 10 repeated runs.

Nonlinear Inertia Weight Whale Optimization Algorithm

371

Table 1. Experimental results on the 18 benchmark functions.

Function  Metric  ABC        PSOw       EDSDE       WOA         WOAWC      MSNWOA
F1        Mean    1.04E-05   2.21E-03   6.68E-105   3.11E-92    0          0
          Std     4.0E-06    1.35E-03   5.51E-105   9.11E-92    0          0
          Best    5.34E-06   5.14E-03   1.67E-105   3.39E-101   0          0
F2        Mean    7.42E-06   1.18       2.22E-53    1.78E-56    0          0
          Std     2.50E-06   1.14       1.09E-53    5.12E-56    0          0
          Best    3.81E-06   0.32       8.06E-54    1.53E-60    0          0
F3        Mean    2.14E+04   1.25       4.76E-44    3.49E-23    0          0
          Std     3.77E+03   0.74       8.19E-44    1.05E-22    0          0
          Best    1.33E+04   0.35       1.05E-45    7.53E-34    0          0
F4        Mean    4.54E+01   0.80       4.45E-39    8.08E-20    0          0
          Std     2.70       0.28       2.28E-39    1.29E-19    0          0
          Best    3.83E+01   0.42       1.14E-39    3.70E-22    0          0
F5        Mean    2.40E+02   5.75E+01   2.76E+01    2.67E+01    2.81E+01   2.82E+01
          Std     4.90E+01   2.59E+01   0.35        0.73        0.17       0.32
          Best    1.74E+02   2.46E+01   2.67E+01    2.52E+01    2.77E+01   2.78E+01
F6        Mean    1.00E-05   1.75E-03   0.71        0.77        0.95       3.78
          Std     2.84E-06   1.16E-03   6.53E-02    0.36        0.14       0.36
          Best    5.99E-06   2.34E-04   0.63        0.24        0.80       3.42
F7        Mean    0.15       1.18E-02   2.69E-04    3.25E-05    1.21E-05   7.62E-06
          Std     0.02       2.91E-03   9.43E-05    3.97E-05    1.19E-05   6.72E-06
          Best    0.11       5.94E-03   1.10E-04    3.14E-06    6.77E-07   9.89E-07
F8        Mean    -5.62E+03  -6.39E+03  -8.55E+03   -7.71E+03   -7.21E+03  -4.12E+03
          Std     1.30E+02   2.21E+02   3.31E+02    3.98E+02    2.48E+02   5.68E+03
          Best    -5.83E+03  -7.75E+03  -8.98E+03   -8.36E+03   -7.81E+03  -4.58E+03
F9        Mean    1.96E+02   2.99E+01   0           0           0          0
          Std     5.43       6.16       0           0           0          0
          Best    1.87E+02   1.92E+01   0           0           0          0
F10       Mean    1.01E-03   2.36       3.99E-15    4.44E-16    4.44E-16   4.44E-16
          Std     2.14E-04   0.37       0           0           0          0
          Best    7.47E-04   1.78       3.99E-15    4.44E-16    4.44E-16   4.44E-16
F11       Mean    3.21E-03   9.65E-03   0           0           0          0
          Std     5.95E-03   9.24E-03   0           0           0          0
          Best    4.20E-04   2.58E-04   0           0           0          0
F12       Mean    6.87       0.59       0.08        0.23        0.12       1.22
          Std     0.97       0.76       0.01        0.09        0.03       0.04
          Best    5.29       0.13       0.06        0.12        0.06       1.13
F13       Mean    10.15      1.81E-02   0.44        1.22        0.48       1.87
          Std     2.91       1.97E-02   0.04        0.36        0.07       0.12
          Best    5.37       1.05E-03   0.39        0.62        0.38       1.64
F14       Mean    0.998      1.20       0.998       0.998       0.998      1.198
          Std     7.02E-17   0.398      6.93E-04    3.73E-05    8.67E-06   5.24E-03
          Best    0.998      0.998      0.998       0.998       0.998      0.998
F15       Mean    5.65E-04   8.42E-04   3.79E-04    3.48E-04    3.66E-04   4.16E-04
          Std     8.51E-05   3.95E-04   4.34E-05    1.64E-04    2.47E-05   2.63E-05
          Best    3.65E-04   3.07E-04   3.34E-04    3.07E-04    3.29E-04   3.11E-04
F16       Mean    -1.032     -1.032     -1.032      -1.032      -1.032     -1.032
          Std     1.86E-16   0          6.45E-05    7.80E-10    2.05E-05   1.02E-05
          Best    -1.032     -1.32      -1.32       -1.032      -1.032     -1.032
F17       Mean    0.398      0.398      0.398       0.398       0.398      0.398
          Std     0          1.18E-06   1.84E-03    3.24E-08    3.20E-05   6.18E-05
          Best    0.398      0.398      0.398       0.398       0.398      0.398
F18       Mean    3.0        3.0        3.0         3.0         3.0        3.0
          Std     7.56E-16   5.62E-16   1.92E-02    9.56E-09    4.35E-07   5.05E-09
          Best    3.0        3.0        3.0         3.0         3.0        3.0



Fig. 3. Convergence diagram

As Table 1 and Fig. 3 show, MSNWOA exhibits strong optimization ability on F1, F2, F3, F4, F7, F9, F10 and F11, and it also reaches the optimal values on F16 to F18. This indicates that MSNWOA has better search ability and convergence accuracy. Compared with the original WOA and with WOAWC, its convergence speed is also improved.

5 Pressure Vessel Design

In pressure vessel design, minimizing the cost is a nonlinear programming problem with multiple constraints. Algorithms that have been used to optimize the pressure vessel design problem include the gravitational search algorithm, evolution strategies, and the genetic algorithm. The mathematical formulation of the problem is:

x = [x1, x2, x3, x4] = [Ts, Th, R, L]    (13)

min f(x) = 0.6224·x1·x3·x4 + 1.7781·x2·x3² + 3.1661·x1²·x4 + 19.84·x1²·x3    (14)

subject to the constraints:

g1(x) = -x1 + 0.0193·x3 ≤ 0    (15)


g2(x) = -x2 + 0.00954·x3 ≤ 0    (16)

g3(x) = -π·x3²·x4 - (4/3)·π·x3³ + 129600 ≤ 0    (17)

g4(x) = x4 - 240 ≤ 0    (18)

0 ≤ x1 ≤ 100,  0 ≤ x2 ≤ 100,  10 ≤ x3 ≤ 100,  10 ≤ x4 ≤ 100    (19)

where L is the length of the cylindrical section (not counting the head), R is the inner radius of the cylinder, Ts is the wall thickness of the cylinder, and Th is the wall thickness of the head. In this paper, MSNWOA is compared with the original WOA and with WOAWC on the pressure vessel design problem, with a population size of 50 and 500 iterations. The results are presented in Table 2.

Table 2. Optimization results for the pressure vessel design problem.

Parameter   WOA         WOAWC       MSNWOA
L           10          10          10
R           68.007      68.666      68.595
Ts          1.731       1.498       1.369
Th          0.744       0.672       0.689
Fitness     10999.118   9405.355    8962.949

Although its values for R and Th are not better than those of WOA and WOAWC, MSNWOA is superior to both in terms of the final fitness value. Overall, MSNWOA performs better on the pressure vessel design problem.
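For illustration, a minimal penalty-based evaluation of this design problem could look like the following sketch; the constants follow Eqs. (14)-(19), while the penalty weight and the static-penalty scheme are assumptions and are not taken from the paper.

```python
import numpy as np

def pressure_vessel_fitness(x, penalty=1e6):
    """x = [Ts, Th, R, L]; returns the penalized cost of Eq. (14) under Eqs. (15)-(18)."""
    x1, x2, x3, x4 = x
    cost = (0.6224 * x1 * x3 * x4 + 1.7781 * x2 * x3 ** 2
            + 3.1661 * x1 ** 2 * x4 + 19.84 * x1 ** 2 * x3)
    g = [
        -x1 + 0.0193 * x3,                                           # g1
        -x2 + 0.00954 * x3,                                          # g2
        -np.pi * x3 ** 2 * x4 - (4 / 3) * np.pi * x3 ** 3 + 129600,  # g3
        x4 - 240,                                                    # g4
    ]
    violation = sum(max(0.0, gi) for gi in g)
    return cost + penalty * violation
```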

6 Conclusion

In this paper, a multi-strategy Whale Optimization Algorithm (MSNWOA) is proposed, which uses Latin hypercube sampling for population initialization, introduces the Cauchy distribution to avoid local optima, and adds nonlinear weighting to improve convergence accuracy and speed. Experiments on 18 benchmark test functions show that MSNWOA improves significantly on the original WOA, especially in avoiding local optima and in convergence accuracy and speed. The comparison with WOAWC also shows that MSNWOA achieves higher convergence accuracy.


Acknowledgement. This work is partially supported by the National Natural Science Foundation of China (No. 61976101) and the University Natural Science Research Project of Anhui Province (No. 2022AH040064). This work is also partially supported by the Funding Plan for Scientific Research Activities of Academic and Technical Leaders and Reserve Candidates in Anhui Province (Grant No. 2021H264) and the Top Talent Project of Disciplines (Majors) in Colleges and Universities in Anhui Province (Grant No. GxbjZD2022021).

References 1. Ma, Y.J., Yun, W.X.: Research progress of genetic algorithm. Comput. Appl. Res. 29(04), 1201–1206+1210 (2012) 2. Zhang, M., Wang, X.J., Ji, D., Zhou, F.J.: A new evolutionary planning algorithm. J. Nav. Univ. Eng. 03, 40–43 (2008) 3. Liu, M.D.: Research progress of Memetic Algorithm. Autom. Technol. Appl. 2007(11), 1– 4+18 (2007) 4. Ma, A.F., Wang, J.J.: PID parameter tuning based on big bang-big convergence algorithm. J. Hangzhou Dianzi Univ. (Nat. Sci. Ed.) 38(06), 56–61 (2018) 5. Li, X., Fu, Y.F., Wang, L., Lu, C.T.: Dynamic collision optimization algorithm based on ray detection. J. Syst. Simul. 31(11), 2393–2401 (2019) 6. Lei, K.Y.: Particle Swarm Optimization and Its Application Research. Southwest University (2006) 7. Zhang, J.H., Xu, X.H.: A new evolutionary algorithm - ant colony algorithm. Syst. Eng. Theory and Pract. 1999(03), 85–88+110 (1999) 8. Naruei, I., Keynia, F., Sabbagh Molahosseini, A.: Hunter-prey optimization: algorithm and applications. Soft. Comput. 26, 1279–1314 (2022) 9. Hernán, P.-V., Adrián, F.P.-D., Gustavo, E.-C., et al.: A bio-inspired method for engineering design optimization inspired by dingoes hunting strategies. Math. Probl. Eng. 2021, 19 (2021) 10. Liu, W., et al.: Improved whale optimization algorithm and its application in weight threshold optimization of shallow neural networks. Control Decis. 38(04), 1144–1152 (2023) 11. Guo, Z.Z., Wang, P., Ma, Y.F., Wang, Q., Gong, C.Q.: Whale optimization algorithm based on adaptive weight and Cauchy mutation. Microelectron. Comput. 34(09), 20–25 (2017) 12. Wang, T.Y., He, X.B., He, C.L.: A hybrid whale optimization algorithm based on adaptive strategy. J. Chin. West Normal Univ. (Natural Sciences) 42(01), 92–99 (2021) 13. Yu, X.X.: An improved multi-leader whale optimization algorithm. Softw. Eng. 25(11), 28–34 (2022) 14. Mirjalili, S., Lewis, A.: The whale optimization algorithm. Adv. Eng. Softw. 95(5), 51–67 (2016) 15. Liu, Z.J., Tian, W.Y.: Optimization of whale algorithm. Internet of Things Technology 11(01), 42–46 (2021) 16. Peng, X., Desxuan, Z., Qiang, Z.: An efficient dynamic adaptive differential evolution algorithm. Comput. Sci. 46(S1), 124–132 (2019) 17. Quande, Q., Shi, C., Li, L., Yuhui, S.: A review of artificial bee colony algorithms. J. Intell. Syst. 9(02), 127–135 (2014) 18. Shi, Y., Eberhart, R.: A modified particle swarm optimizer. In: 1998 IEEE International Conference on Evolutionary Computation Proceedings. IEEE World Congress on Computational Intelligence, pp. 69–73, IEEE (1998)

Sparrow Search Algorithm Based on Cubic Mapping and Its Application

Shuo Zheng, Feng Zou(B), and DeBao Chen

Huaibei Normal University, Huaibei 235000, China
[email protected]

Abstract. The sparrow search algorithm (SSA) is prone to problems such as falling into local optima and lacking search precision. To address these issues, an improved sparrow search algorithm based on Cubic mapping (CSSA) is proposed. To obtain a better initial population, a Cubic chaotic initialization method is used to improve population quality. During the producer position update, Lévy flight is added to widen the search area of the algorithm, and an inertia weight is introduced in the scrounger phase. The algorithm is tested on the CEC2022 test functions and compared with multiple algorithms. The results show that CSSA can, to some extent, overcome the tendency of SSA to get stuck in local optima and improves convergence accuracy and stability. Finally, CSSA is applied to the engineering problem of the three-bar truss and compared with other algorithms. Keywords: Sparrow search algorithm · Cubic mapping · Lévy flight · Three-bar truss

1 Introduction

Intelligent optimization algorithms are a class of computational techniques that use iterative search procedures to find optimal solutions to complex problems efficiently. They draw on artificial intelligence techniques such as machine learning, neural networks and evolutionary computation, and they allow researchers and practitioners to optimize complex systems and processes quickly and accurately, leading to more efficient and effective outcomes. Many such algorithms have been proposed, for example the grasshopper optimization algorithm (GOA) [1], the whale optimization algorithm (WOA) [2], the artificial bee colony algorithm (ABC) [3], and SSA [4]. Among them, SSA was proposed in 2020. Because SSA can find approximate optimal solutions in a relatively short time when solving combinatorial optimization problems, it offers faster convergence and higher solution efficiency than many other heuristic algorithms; this article therefore focuses on improving SSA. Among the many optimization algorithms, SSA has few parameters and a simple principle, but in some cases it cannot find the optimal solution and its convergence rate is too slow. To make the algorithm perform better, many scholars have


studied and improved it. For example, reverse learning was used in the initialization step in [5], and chaotic Logistic mapping and Tent mapping were used for initialization in [6] and [7], respectively, to avoid the drawbacks of a purely random population. There are also improvements to the individual position updates, such as the adaptive spiral sparrow search algorithm proposed in [8]; in [9] the authors introduced a learning coefficient into the producer position update to improve the search performance. In [10], a Lévy flight disturbance mechanism was introduced into the foraging process to lead the population to move an appropriate step length and increase the diversity of the spatial search. Multi-strategy updates have also been proposed, such as the combination of a sine-cosine search method and a diversity mutation used by Zhang et al. [11] to improve SSA and raise its convergence rate, and the work of Fu Hao et al. [12], who used elite chaotic reverse learning to initialize the population and integrated the random-following strategy of the chicken swarm algorithm into the producer position update, balancing local exploitation and global exploration. These works improve the algorithm from different perspectives and, to some extent, avoid local optima and improve exploration ability, but the accuracy of the algorithm still falls short. To improve the execution efficiency of the algorithm, prevent it from falling into local optima easily, and improve its convergence speed, this article proposes a sparrow search algorithm based on Cubic mapping (CSSA). In the initialization step, the Cubic chaotic map is introduced to improve the quality of the population distribution. In the producer position update, Lévy flight is added to avoid trapping in local optima. In the scrounger update phase, adaptive weights are introduced to strengthen the search for better solutions. Simulations on the CEC2022 benchmark functions are carried out, and the algorithm is applied to the engineering problem of three-bar truss design, verifying the feasibility of CSSA.

2 Sparrow Search Algorithm

SSA is based on the predation behavior of sparrows. The sparrows consist of producers and scroungers: producers are responsible for finding food, and scroungers for following and plundering. When sparrows detect danger, they must move their positions in time. The positions are updated as follows. First, the producer positions are updated:

x_{id}^{t+1} = \begin{cases} x_{id}^{t} \cdot \exp\left(\frac{-i}{\alpha \cdot T}\right), & R_2 < ST \\ x_{id}^{t} + Q \cdot L, & R_2 \ge ST \end{cases}    (1)

where x_{id}^{t} is the position of sparrow i in dimension d at iteration t, T is the maximum number of iterations, α ∈ (0, 1] is a uniformly distributed number, Q follows the standard normal distribution, and L is a 1 × d matrix with all elements equal to 1. R_2 ∈ [0, 1] is the warning value and ST ∈ [0.5, 1] is the safety threshold; while R_2 < ST there is no danger.


The remaining sparrows are regarded as scroungers, and their positions are updated as follows:

x_{id}^{t+1} = \begin{cases} Q \cdot \exp\left(\frac{x_{wd}^{t} - x_{id}^{t}}{i^{2}}\right), & i > n/2 \\ x_{bd}^{t+1} + \left| x_{id}^{t} - x_{bd}^{t+1} \right| \cdot A^{+} \cdot L, & \text{otherwise} \end{cases}    (2)

In this formula, i is the index of the scrounger, x_{bd}^{t+1} is the current optimal position, x_{wd}^{t} is the current globally worst position, A is a 1 × d matrix, and A^{+} = A^{T}(AA^{T})^{-1}.

Finally, the sparrows that are aware of the danger update their positions. Their initial positions are generated randomly in the population, and their number is 10% to 20% of the total. The update is:

x_{id}^{t+1} = \begin{cases} x_{bd}^{t} + \beta \left| x_{id}^{t} - x_{bd}^{t} \right|, & f_i > f_g \\ x_{id}^{t} + K \cdot \dfrac{\left| x_{id}^{t} - x_{wd}^{t} \right|}{(f_i - f_{\omega}) + \varepsilon}, & f_i = f_g \end{cases}    (3)

where x_{bd}^{t} is the current optimal position, β is the step-size control parameter, K is a random number in [-1, 1], f_i is the fitness value of the current sparrow, f_g and f_ω are the current best and worst fitness values, respectively, and ε is a very small number that avoids division by zero.

3 Improvement of Sparrow Search Algorithm

Regarding the improvements, a chaotic map is first used in the initialization stage. Then, the Lévy flight strategy and an adaptive weight are introduced in the position update stages of the producer and the scrounger. At the same time, the producer formula is adjusted to expand the scope of the preliminary search. The improvements are described in detail below.

3.1 Population Initialization

In the population initialization stage, the distribution of the population is improved using the Cubic chaotic map. Chaotic maps are widely used in optimization because of their ergodicity and regularity. The Cubic map is given by Eq. (4):

x_{n+1} = ρ x_n (1 - x_n²)    (4)

where x_n ∈ (0, 1) and ρ is the control parameter; when x_0 = 0.3 and ρ = 2.595, the Cubic map exhibits excellent chaotic ergodicity. According to [13], the chaoticity of the Cubic map is comparable to the maximum Lyapunov exponent of the Logistic and Tent maps and is superior to that of one-dimensional maps such as the Sine, Circle, Singer and Kent maps.
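A minimal sketch of Cubic-map population initialization is given below; it illustrates Eq. (4), and the step that maps the chaotic sequence onto the search bounds is an assumption, since the paper does not spell it out here.

```python
import numpy as np

def cubic_map_init(pop_size, dim, lower, upper, x0=0.3, rho=2.595):
    """Generate a population by iterating the Cubic map x_{n+1} = rho*x_n*(1 - x_n^2)."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    seq = np.empty(pop_size * dim)
    x = x0
    for k in range(seq.size):
        x = rho * x * (1.0 - x * x)            # Eq. (4); values stay in (0, 1)
        seq[k] = x
    chaos = seq.reshape(pop_size, dim)
    return lower + chaos * (upper - lower)     # scale chaotic values to the bounds

population = cubic_map_init(50, 10, [-100] * 10, [100] * 10)
```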


3.2 Producer and Scrounger Updates

Producer position. To boost the exploration ability, the Lévy flight strategy is introduced into the producer update. When R_2 < ST, the position of the individual shrinks after each iteration, which limits the search range of SSA in the early stage. To address this, the update formula is revised as:

x_{id}^{t+1} = \begin{cases} x_{id}^{t} \cdot e^{\frac{-i}{2 \cdot t + k}}, & R_2 < ST \\ x_b + x_b \cdot Levy, & R_2 \ge ST \end{cases}    (5)

where k ∈ [0, 2] and x_b is the global optimal position. The Lévy flight mechanism is:

Levy = \frac{u}{|v|^{1/\xi}}    (6)

where v ~ N(0, 1), u ~ N(0, σ²) and ξ = 1.5. The expression for σ is:

\sigma = \left( \frac{\Gamma(1+\xi) \cdot \sin(\pi \xi / 2)}{\Gamma\left(\frac{1+\xi}{2}\right) \cdot \xi \cdot 2^{(\xi-1)/2}} \right)^{1/\xi}    (7)

Lévy flight is a method for generating random step lengths proposed by Mantegna in 1994. It is a type of random walk in which the step lengths are drawn from a heavy-tailed probability distribution, such as a power law or a Lévy distribution. Unlike the normal random walk, where the step lengths follow a Gaussian distribution, Lévy flights allow occasional long-distance jumps, which can lead to a much faster exploration of the space.

Scrounger position. To improve the convergence speed of SSA, an adaptive weight is introduced into the scrounger update:

x_{id}^{t+1} = \begin{cases} w_1 \cdot \exp\left(\frac{x_{wd}^{t} - x_{id}^{t}}{i^{2}}\right), & i > n/2 \\ x_{bd}^{t+1} + \left| x_{id}^{t} - x_{bd}^{t+1} \right| \cdot A^{+} \cdot L, & \text{otherwise} \end{cases}    (8)

where the formula for w_1 is:



w_1 = d - \tan\left( \frac{\pi}{2} \cdot \left( \frac{t+n}{T} \right)^{2} \right) \cdot d    (9)

d = \frac{f_i - f_b}{f_{mean} - f_b}    (10)

Here f_i and f_b are the current fitness value and the best fitness value, respectively, f_{mean} is the average fitness of the current population, and n ∈ [0, 1]. The introduction of w_1 is more conducive to exploring unknown regions than the random numbers in the original formula.
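A small sketch of the Mantegna-style Lévy step of Eqs. (6)-(7) and its use in the revised producer update of Eq. (5) follows; it is an illustration only, and the vectorization choices and the handling of k are assumptions.

```python
import math
import numpy as np

def levy_step(size, xi=1.5, rng=None):
    """Mantegna's method: Levy = u / |v|^(1/xi), with u ~ N(0, sigma^2), v ~ N(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = (math.gamma(1 + xi) * math.sin(math.pi * xi / 2)
             / (math.gamma((1 + xi) / 2) * xi * 2 ** ((xi - 1) / 2))) ** (1 / xi)
    u = rng.normal(0.0, sigma, size)
    v = rng.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1 / xi)

def producer_update(x_i, i, t, x_best, R2, ST=0.8, k=1.0, rng=None):
    """Revised producer rule of Eq. (5)."""
    rng = np.random.default_rng() if rng is None else rng
    if R2 < ST:
        return x_i * math.exp(-i / (2 * t + k))
    return x_best + x_best * levy_step(np.shape(x_i), rng=rng)
```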


The Cubic mapping and the Lévy flight strategy also have limitations. The Cubic chaotic map is sensitive to its initial conditions, and its search performance depends strongly on them; improper initial conditions may prevent the algorithm from converging or slow its convergence. Because of the complexity of the Lévy distribution, computing the step size at every step requires additional computational resources and time, so a higher computational cost may be incurred in practical applications.

4 Simulation Experiment and Result Analysis

To test the feasibility and superiority of CSSA, comparative experiments were conducted with four other algorithms: ITLBO [14], MeanPSO [15], SSA and ISSA [16]. The parameters of ITLBO, MeanPSO and ISSA are kept consistent with the original papers, and the CEC2022 [17] test functions are used for the simulation experiments. The parameters shared by CSSA, ISSA and SSA are PD = 10%, SD = 10% and R_2 = 0.8. All runs were repeated 10 times, and the results are reported as the average, the standard deviation and the best value. The experiments were conducted in Python 3.10 on a 64-bit Windows 11 operating system, and each algorithm was given 200,000 function evaluations. Table 1 lists the experimental data of the five algorithms. The table shows that the overall search ability of CSSA is better than that of the other algorithms, although its performance on a few test functions is not ideal. Among the compared algorithms, SSA performs relatively well on functions f1 and f3, and obtains the smallest optimal values on f7, f9 and f11. The optimal values found by CSSA on f1 to f5 and f11 are very close to the corresponding functions' minimum values, while the optimal values obtained on the other test functions differ considerably from the functions' minimum


Table 1. Experimental results on the CEC2022 functions (dim = 10).

Function            Metric  ITLBO      MeanPSO    SSA        ISSA       CSSA
F1 (min = 300)      Mean    8534.33    3535.05    300.00     9619.29    320.24
                    Std     1467.01    1126.08    2.31E-05   2365.85    16.65
                    Opt     5859.21    1482.77    300.00     4957.81    300.11
F2 (min = 400)      Mean    1474.57    1898.44    417.54     1539.05    402.40
                    Std     526.61     493.25     28.11      683.77     3.56
                    Opt     1060.08    1151.38    400.00     735.99     400.00
F3 (min = 600)      Mean    600.48     600.32     600.00     600.39     600.00
                    Std     0.14       0.08       4.51E-10   0.10       2.15E-06
                    Opt     600.29     600.19     600.00     600.20     600.00
F4 (min = 800)      Mean    800.72     801.26     800.23     801.87     800.17
                    Std     0.30       0.45       0.09       0.28       0.09
                    Opt     800.29     800.60     800.09     801.31     800.08
F5 (min = 900)      Mean    902.11     901.68     902.90     904.44     900.07
                    Std     0.60       0.88       1.58       1.28       0.20
                    Opt     900.69     900.76     900.72     902.44     900.00
F6 (min = 1800)     Mean    4.37E+08   4.02E+08   4.37E+04   1.11E+08   5184.90
                    Std     5.05E+08   2.68E+08   1.17E+04   8.40E+07   2392.10
                    Opt     1.46E+06   2.73E+07   2.28E+04   1.07E+07   2468.70
F7 (min = 2000)     Mean    3135.22    2622.16    2093.00    2755.84    2033.01
                    Std     565.43     237.48     49.31      419.57     6.84
                    Opt     2477.06    2314.47    2021.72    2273.79    2025.44
F8 (min = 2200)     Mean    1.93E+10   2.12E+11   2378.38    1.23E+08   2228.90
                    Std     5.04E+10   6.35E+11   130.89     3.73E+08   5.40
                    Opt     4002.15    8211.54    2245.97    2680.11    2224.65
F9 (min = 2300)     Mean    3236.46    3485.42    2706.51    3202.19    2416.13
                    Std     224.86     329.93     189.72     208.32     218.31
                    Opt     2888.80    3040.53    2300.00    2958.25    2300.50
F10 (min = 2400)    Mean    2906.65    2809.94    2689.98    2928.01    2605.35
                    Std     140.32     140.56     126.58     112.07     6.89
                    Opt     2667.56    2631.74    2600.82    2714.65    2598.89
F11 (min = 2600)    Mean    3182.43    2826.81    2802.46    2984.62    2600.10
                    Std     154.16     70.17      358.82     231.23     0.04
                    Opt     2887.00    2724.99    2600.00    2734.35    2600.03
F12 (min = 2700)    Mean    3166.85    3254.05    2945.28    3174.86    2866.82
                    Std     140.69     100.63     64.62      115.97     0.69
                    Opt     2960.73    3079.57    2878.34    3008.98    2865.88

values. The other algorithms show more mediocre performance. This is also an aspect to be improved in future research. The convergence curves of the test functions are shown in Fig. 1.


Table 2. Optimization results for the three-bar truss design problem.

Algorithm   X1           X2           Fitness
SSA         0.78851535   0.40896747   263.92256782
ISSA        0.86542871   0.25767629   270.54783336
CSSA        0.78995273   0.40471242   263.90361633

Fig. 1. Convergence curves

From the figure, it can be seen that due to the introduction of Levy flight strategy, CSSA is more likely to jump out of local optima and find the optimal value compared to


SSA. The effective improvement of the quality of the initial population by Cubic chaos can be clearly seen from F8, F9, F10, and F12, where the initial solutions obtained by CSSA are relatively better.

5 Design of Three-Bar Truss

Truss engineering deals with the design, analysis and construction of structures built from truss systems. Trusses are composed of interconnected triangles that distribute loads and provide stability to a structure. A common problem in truss engineering is determining the optimal size and shape of the truss members so that the structure can withstand the expected loads without excessive deflection or failure; this requires careful consideration of the material properties, the loading conditions and the geometric constraints. The three-bar truss design is shown in Fig. 2, where x1, x2 and x3 are the cross-sectional areas of the three bars; by symmetry, x1 equals x3. The goal is to adjust the cross-sectional areas x1 and x2 to minimize the volume of the truss while each member satisfies the stress constraint σ. The mathematical model is:

\min f(x) = \left( 2\sqrt{2}\, x_1 + x_2 \right) \cdot L

subject to

G_1 = \frac{\sqrt{2}\, x_1 + x_2}{\sqrt{2}\, x_1^2 + 2 x_1 x_2} \, P - \sigma \le 0

G_2 = \frac{x_2}{\sqrt{2}\, x_1^2 + 2 x_1 x_2} \, P - \sigma \le 0

G_3 = \frac{1}{\sqrt{2}\, x_2 + x_1} \, P - \sigma \le 0

with variable ranges

0.001 \le x_1 \le 1,  0.001 \le x_2 \le 1

and L = 100 cm, P = 2 kN/cm², σ = 2 kN/cm². SSA, ISSA and CSSA are compared on this problem with a population size of 50 and a maximum of 500 iterations; the experimental data are shown in Table 2.
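A minimal penalty-based fitness for this model might look like the sketch below; the constants come from the formulation above, while the penalty handling is an assumption and is not taken from the paper.

```python
import math

def three_bar_truss_fitness(x1, x2, L=100.0, P=2.0, sigma=2.0, penalty=1e6):
    """Penalized volume of the three-bar truss (objective and constraints G1-G3 above)."""
    volume = (2 * math.sqrt(2) * x1 + x2) * L
    g = [
        (math.sqrt(2) * x1 + x2) / (math.sqrt(2) * x1 ** 2 + 2 * x1 * x2) * P - sigma,
        x2 / (math.sqrt(2) * x1 ** 2 + 2 * x1 * x2) * P - sigma,
        1.0 / (math.sqrt(2) * x2 + x1) * P - sigma,
    ]
    return volume + penalty * sum(max(0.0, gi) for gi in g)

# Evaluating the CSSA solution reported in Table 2 gives a value close to 263.9:
print(three_bar_truss_fitness(0.78995273, 0.40471242))
```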


Fig. 2. Three-bar truss

As Table 2 shows, under the above settings CSSA performs best in the three-bar truss application, which also indicates the feasibility of CSSA in engineering applications.

6 Conclusion

This article proposes a modified sparrow search algorithm. The population is initialized with the Cubic chaotic map to achieve a more uniform distribution, the Lévy flight trajectory helps the algorithm avoid local optima, and the adaptive weight enhances local exploitation. The CEC2022 test functions are used to verify the algorithm's ability, and in the three-bar truss engineering application CSSA delivers better results than the other algorithms. CSSA still has shortcomings, and further exploration and research will continue in the future, with a focus on applying it to more practical problems.

Acknowledgement. This work is partially supported by the National Natural Science Foundation of China (No. 61976101) and the University Natural Science Research Project of Anhui Province (No. 2022AH040064).

References 1. Saremi, S., Mirjalili, S., Lewis, A.: Grasshopper optimization algorithm: theory and application. Adv. Eng. Softw. 105, 30–47 (2017) 2. Mirjalili, S., Lewis, A.: The whale optimization algorithm. Adv. Eng. Softw. 95, 51–67 (2016) 3. Ma, W., Sun, S., Li, J., et al.: An improved artificial bee colony algorithm based on the strategy of global reconnaissance. Soft Comput. 20(12), 1–33 (2015) 4. Xue, J., Shen, B.: A novel swarm intelligence optimization approach: sparrow search algorithm. Syst. Sci. Control Eng. 8(1), 22–34 (2020) 5. Liu, T., Yuan, Z., Wu, L., et al.: An optimal brain tumor detection by convolutional neural network and enhanced sparrow search algorithm. Proc. Inst. Mech. Eng.—Part H: J. Eng. Med. 235(4), 459–469 (2021)


6. Ibrahim, R.A., Elaziz, M.A., Lu, S.: Chaotic opposition based grey-wolf optimization algorithm based on differential evolution and disruption operator for global optimization. Expert Syst. Appl. 108, 1–27 (2018) 7. Teng, Z.-J., Lv, J.-L., Guo, L.-W.: An improved hybrid grey wolf optimization algorithm. Soft Comput. 23(15), 6617–6631 (2019) 8. Ouyang, C., Qiu, Y., Zhu, D.: Adaptive spiral fly in sparrow search algorithm. Sci. Program. 2021, 6505253 (2021) 9. Yuan, J., Zhao, Z., Liu, Y., et al.: DMPPT control of photovoltaic microgrid based on improved sparrow search algorithm. IEEE Access 9, 16623–16629 (2021) 10. Ma, W., Zhu, X.: Sparrow search algorithm based on Levy flight disturbance strategy. J. Appl. Sci. 40(01), 116–130 (2022) 11. Zhang, X., Zhang, Y., Liu, L., et al.: An improved sparrow search algorithm combining multiple strategies. Appl. Res. Comput. 39(04), 1086–1091+1117 (2022) 12. Fu, H., Liu, H.: An improved sparrow search algorithm based on multi-strategy fusion and its application. Control Decis. 37(01), 87–96 (2022) 13. Feng, J., Zhang, J., Zhu, X., et al.: A novel chaos optimization algorithm. Multimed. Tools Appl. 76(16), 17405–17436 (2016, 2017) 14. Yu, K., Wang, X., Wang, Z.: Study and application of improved teaching-learning-based optimization algorithm. Chem. Ind. Eng. Prog. 33(4), 850–854 (2014) 15. Deep, K., Bansal, J.C.: Mean particle swarm optimisation for function optimisation. Int. J. Comput. Intell. Stud. 1(1), 72–92 (2009) 16. Mao, Q., Zhang, Q., Mao, C., et al.: Mixing sine and cosine algorithm with Lévy flying chaotic sparrow algorithm. J. Shanxi Univ. (Nat. Sci. Ed.) 44(06), 1086–1091 (2021) 17. Kumar, A., Price, K.V., Mohamed, A.W., et al.: Problem definitions and evaluation criteria for the 2022 special session and competition on single objective bound constrained numerical optimization. Technical report (2021)

Hyper-heuristic Three-Dimensional Estimation of Distribution Algorithm for Distributed Assembly Permutation Flowshop Scheduling Problem

Xiao Li1, Zi-Qi Zhang1(B), Rong Hu1,2, Bin Qian1, and Kun Li1

1 School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
[email protected]
2 School of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming 650500, China

Abstract. For the distributed assembly permutation flowshop scheduling problem (DAPFSP) with the objective of minimizing the maximum completion time, this study proposes a hyper-heuristic three-dimensional estimation of distribution algorithm (HH3DEDA). HH3DEDA consists of a high-level strategy (HLS) domain and a low-level problem (LLP) domain: the HLS domain guides the global search direction of the algorithm, while the LLP domain searches local information in the problem domain. HH3DEDA combines several optimization strategies and metaheuristics that allow global search and optimization; nine variable neighborhood local search operations are designed, and their arrangement forms an HLS-domain individual. A three-dimensional estimation of distribution algorithm (3DEDA) is used in the HLS domain to learn both the block structure of high-quality HLS-domain individuals and its positional information. New HLS-domain individuals are generated by sampling the 3DEDA probability model, and the ordered sequence of heuristic operators represented by each new individual is then used as a new heuristic in the LLP domain to perform a deeper neighborhood search in the problem domain. Keywords: Distributed Assembly · Hyper-heuristic Algorithm · Estimation of Distribution Algorithm · Variable neighborhood local search

1 Introduction

With the increase of production scale and complexity, traditional flowshop scheduling methods gradually expose many limitations and drawbacks, and the advantages of distributed flowshop scheduling have become apparent. The distributed assembly permutation flowshop scheduling problem (DAPFSP) is a composite of the distributed permutation flowshop scheduling problem (DPFSP) and the assembly flowshop scheduling problem (AFSP) [1]. In terms of computational complexity, the DAPFSP has been proven to be


NP-hard. For the minimization of makespan in the DAPFSP, Hatami et al. [2] built a mixed-integer linear programming model, presented a series of heuristics to construct the initial solution, and designed a variable neighborhood descent algorithm to improve it. Lin et al. [3] proposed a backtracking search hyper-heuristic (BBSH) algorithm for this problem and compared it with the methods in [2], with estimation of distribution algorithms, and with other intelligent algorithms, which highlighted the effectiveness of the BBSH algorithm. Pan et al. [4] put forward a mixed-integer linear model together with three differently structured heuristic algorithms, and verified their advantages in performance and running time through simulation experiments involving 16 existing algorithms. Zhang et al. [5] presented an enhanced EDA for the flowshop scheduling problem, combining a longest-common-subsequence mechanism with a statistical probability matrix model to improve the algorithm's ability to mine information from high-quality solutions and thus its global search performance. Wang et al. [6] proposed a fuzzy logic-based hybrid estimation of distribution algorithm (FL-HEDA) to solve the DAPFSP under machine breakdowns. Most of the algorithms in the above literature use two-dimensional (2D) probability models to accumulate information about high-quality solutions. However, because of their structure, 2D probability models cannot learn and accumulate the positional information of the different block structures in high-quality solutions, which greatly limits their ability to guide the search; three-dimensional (3D) models are therefore more effective. In addition, the hyper-heuristic algorithm (HHA) is a newer intelligent optimization framework that searches different regions of the solution space by having a high-level strategy operate on low-level heuristics. In recent years, HHAs have been successfully applied to various combinatorial optimization problems [7, 8]. Qin et al. [9] proposed a reinforcement-learning-based hyper-heuristic for the heterogeneous vehicle routing problem. Lin et al. [10] designed a genetic programming hyper-heuristic (GP-HH) algorithm to optimize the makespan of the DAPFSP with sequence-dependent setup times. Zhou et al. [11] presented a hyper-heuristic co-evolutionary algorithm to optimize the minimum average weighted delay, the maximum delay and the average flow time of the multi-objective dynamic flexible job-shop scheduling problem. According to the characteristics of the DAPFSP, HH3DEDA is proposed in this paper as an effective combination of HHA and EDA that can quickly search for the optimal solution. The rest of this paper is organized as follows. Section 2 establishes the permutation model of the DAPFSP. Section 3 describes the HH3DEDA procedure. Section 4 reports computational comparisons and statistical analyses. Finally, concluding remarks and suggestions for future research are provided in Sect. 5. The notation used in the model is listed in Table 1.

Table 1. The symbols used in the permutation model of DAPFSP.

Symbol              Definition
f                   The index of factories, f = 1, 2, ..., F
M_j                 The index of machines, {M_1, M_2, ..., M_m}
Π                   The index set of products, Π = {Π_1, Π_2, ..., Π_δ}
Ω                   The index set of all parts, Ω = {Ω_1, Ω_2, ..., Ω_n}
Ω_{Π_l}             The set of parts that belong to product Π_l, Ω_{Π_l} = {Ω_{l,1}, Ω_{l,2}, ..., Ω_{l,n_l}}
Ω^f                 The set of parts assigned to factory f, Ω^f = {Ω_1, Ω_2, ..., Ω_{n_f}}
O                   The set of operations, O = {O_{i,1,l}, O_{1,j,l}, ..., O_{i,j,l}}
M_A                 The assembly machine
δ                   The number of all products
n                   The number of all parts
p^f_{Ω_i,j,Π_l}     The processing time of operation O_{Ω_i,j,Π_l} on machine M_j in factory f
p_{Π_l}             The assembly time of product Π_l on machine M_A
C^f_{Ω_i,j,Π_l}     The completion time of operation O_{Ω_i,j,Π_l} on machine M_j in factory f
C_{Π_l}             The completion time of product Π_l
C^f_l               The completion time of the products in factory f
C_max               The makespan of the solution

2 Problem Statement

The DAPFSP aims to determine the allocation of parts to the factories, the processing order of the parts on the machines, and the assembly order of the products; Fig. 1 shows a diagram of the DAPFSP. The production efficiency criterion is to minimize the makespan, defined as the completion time of the last product on the assembly machine. According to the above description, the permutation model of the DAPFSP can be given as follows.

C^f_{Ω_1,1,Π_l} = p^f_{Ω_1,1,Π_l},  f = 1, 2, ..., F;  l = 1, ..., δ    (1)

C^f_{Ω_i,1,Π_l} = C^f_{Ω_{i-1},1,Π_l} + p^f_{Ω_i,1,Π_l},  i = 2, 3, ..., n_f;  f = 1, 2, ..., F;  l = 1, ..., δ    (2)

C^f_{Ω_1,j,Π_l} = C^f_{Ω_1,j-1,Π_l} + p^f_{Ω_1,j,Π_l},  j = 2, 3, ..., m;  f = 1, 2, ..., F;  l = 1, ..., δ    (3)

C^f_{Ω_i,j,Π_l} = max{ C^f_{Ω_{i-1},j,Π_l}, C^f_{Ω_i,j-1,Π_l} } + p^f_{Ω_i,j,Π_l},  i = 2, 3, ..., n_f;  j = 2, 3, ..., m;  f = 1, 2, ..., F;  l = 1, ..., δ    (4)

C^f_{Π_1} = C^f_{Ω_i,m,Π_1} + p_{Π_1},  f = 1, 2, ..., F    (5)

C^f_{Π_l} = max{ C^f_{Ω_i,m,Π_l}, C^f_{Π_{l-1}} } + p_{Π_l},  i = 2, 3, ..., n_f;  f = 1, 2, ..., F;  l = 1, ..., δ    (6)

C_max = max{ C^1_{Π_δ}, C^2_{Π_δ}, ..., C^F_{Π_δ} }    (7)

Equations (1)-(4) give the completion times of the parts on the processing machines of each factory, Eqs. (5)-(6) give the completion times of the products on the assembly machine, and Eq. (7) gives the makespan to be minimized.

⎤ G (x, 1) MCn×n×n ⎢ ⎥ .. G MCn×n×n (x) = ⎣ ⎦ . G MCn×n×n (x, n)

n×1



⎤ G G (x, 1, 1) · · · MCn×n×n (x, 1, n) MCn×n×n ⎢ ⎥ .. .. .. =⎣ ⎦ , . . . G G MCn×n×n (x, n, 1) · · · MCn×n×n (x, n, n) n×n

x = 1, ... , n − 1 G MCn×n×n (y, z)

G G G = MCn×n×n (1, y, z), MCn×n×n (2, y, z), ... , MCn×n×n (n, y, z)

(10)

1×n

,

x = 1, ... , n − 1, y, z = 1, ... , n (11)

Hyper-heuristic Three-Dimensional Estimation

391

G G Generating a model P_MCn×n×n based on MCn×n×n learning and accumulating the position information of similar blocks in high-level strategy domain high-quality G G (x, y, z) is an element in P_MCn×n×n . The sum of the solutions, so that P_MCn×n×n probabilities that all similar blocks of good solutions in the high-level strategy domain are distributed at position x is: ⎤ ⎡ G (x, 1) P_MCn×n×n ⎥ ⎢ .. G (x) = ⎣ P_MCn×n×n ⎦ .



G (x, n) P_MCn×n×n

G (x, 1, 1) P_MCn×n×n

n×1 ⎤ G P_MCn×n×n (x, 1, n)

··· ⎥ ⎢ . .. .. . =⎣ ⎦ . . . G G P_MCn×n×n (x, n, 1) · · · P_MCn×n×n (x, n, n) n×n

(12)

All good solution similar blocks in the high-level strategy domain are distributed at the x position, the probability sum on is: n n G Sum_P_MC G (x) = P_MCn×n×n (x, y, z) (13) y=1

z=1

All good solution similar blocks in the high-level strategy domain are distributed at the x position, the number of times on is: n n G Sum_MC G (x) = MCn×n×n (x, y, z) (14) y=1

z=1

The update process for a probability model is as follows: 1/n, x = 1, y, z = 1, ..., n 0 P_MCn×n×n (x, y, z) = 1/n2 , x = 2, 3, · · · , n − 1,y, z = 1, ..., n

(15)

Step 2: Design low-level heuristic operators: LLH1 : Randomly swap any two products within a critical factory. LLH2 : Take a product from a critical factory and insert it randomly into another position. LLH3 : Randomly reverse the order of products within the critical factory. LLH4 : Random swap of products between two factories. LLH5 : Take a product from a critical factory and insert it randomly into another factory. LLH6 : Randomly swap any two products within a critical factory. LLH7 : Select a product from the critical factory, then select a part from the sequence included in that product and randomly insert it elsewhere in the sequence. LLH8 : Select any product from the critical factory, then select any 2 parts from the part sequence included in the product, and randomly swap the positions of the two parts. LLH9 : Select any product from the critical factory, then select a segment from the part sequence included in the product, and reverse the segment sequence. Step 3: For the high-level strategy domain, each individual in the population is composed of 9 kinds of low-level heuristic operators, the individual length is 12, and the same low-level heuristic operators are allowed to appear in the same individual. Similarly, the high-level strategy domain population size is set as ps. When decoding the high-level strategy domain individual, the scheduling solution of the low-level problem domain

392

X. Li et al.

successively performs the low heuristic operators in the high-level strategy domain individual from left to right. After each execution of the corresponding low-level heuristic operation, the new solution is compared with the original solution. If the new solution is better than the old one, the new solution is replaced by the old one. Otherwise, leave the original solution and continue with the remaining low-level heuristics. When all low-level heuristic operators in the high-level strategy domain individual are completed, the fitness value of the high-level strategy domain individual is calculated. The fitness value is the average fitness value of the updated low-level problem domain individual. Step 4: The operation selection function is used to evaluate the population of the low-level problem domain. There are two situations: one is that the initial position of the individuals in the high-level strategy domain is not at position 1, and the other is that the initial position of the individuals in the high-level strategy domain is at position 1. Step 4.1: for situation 1, suppose the selective operation function is opt (I G,k , i), used to determine the low-level heuristic operation occurring at the position i in I G,k , Since the probability of a similar block [G,k (i − 1), G,k (i)] being selected is stored G (i − 1), opt (I G,k , i) samples according to the in the 3D probability model P_MCn×n×n G 3D probability model P_MCn×n×n (i − 1). The specific operation process opt (I G,k , i) is as follows: Step 4.1.1: Random generation probability number r,

n

G−1 r ∈ 0, P_MCn×n×n (i − 1, I G,k (i − 1), h) (16) h=1

Step 4.1.2: Select the low-level heuristic operation Lc by roulette method, if

G−1 r ∈ 0, P_MCn×n×n (i − 1, I G,k (i − 1), 1) Then c = 1, go to Step 4.1.3. if

pos G−1 r∈ P_MCn×n×n (i − 1, I G,k (i − 1), h), h=1

pos+1 G−1 P_MCn×n×n (i − 1, I G,k (i − 1), h) ,

(17)

(18)

h=1

pos ∈ {1, ... , n − 1} Then c = pos + 1. Step 4.1.3: Return Lc Step 4.2: For situation 2, because the high-level strategy domain individual is at the initial position 1, there is no similar block [I G,k (i − 1), I G,k (i)], Sampling function opt (I G,k , i) cannot be performed at the initial position 1, so an initial sampling strategy is proposed for the initial position, which is described as follows: G−1 (y). Step 4.2.1: Calculate SumP_MCinit n G−1 (y) = P_MC G−1 (1, y, z), y = 1, ..., n (19) SumP_MCinit z=1

Step 4.2.2: The roulette method is used to determine the low-level heuristic operation Lc  when the individual I G,k , k = 1, ... , ps is in the initial position 1, Random generation probability number r  , n G−1 (y) = P_MC G−1 (1, y, z), y = 1, ..., n (20) SumP_MCinit z=1

Hyper-heuristic Three-Dimensional Estimation

Then c = 1, go to Step 4.2.3. if

pos G−1 SumP_MCn×n×n (i − 1, I G,k (i − 1), h), r ∈ h=1  pos+1 G−1 G,k SumP_MCn×n×n (i − 1, I (i − 1), h) ,

393

(21)

h=1

pos ∈ {1, ... , n − 1} Then c = pos + 1. Step 4.2.3: Let I G,k (1), k = 1, ... , ps be Lc  Step 5: All individuals in the high-level strategy domain population are decoded and new solutions are obtained according to the low-level heuristic operators in the high-level strategy domain individuals. Step 6: update the next generation population PG (G + 1), The update process is as follows: Step 6.1: let k = 1, the low-level heuristic operation Lc  of I G,k at initial position 1 according to the initial sampling strategy. Step 6.2: let i = 2, I G,k (i) = opt (I G,k , i), i = i + 1, if i ≤ n then go to step 3. Step 6.3: set k = k + 1, if k ≤ ps then turn to step 1. Step 6.4: output PG (G + 1). 0 (x, y, z) and update the probability model Step 7: Calculate the 3D matrix MCn×n×n 1 P_MCn×n×n . ⎧ 0 MCn×n×n (x,y,z) ⎪ ⎪ ⎪ 0 (x) , ⎪ Sum_MC ⎪ ⎪ ⎪ ⎪ ⎨ x = 1; y, z = 1, . . . , n 1 P_MCn×n×n (x, y, z) = (22) 0 0 ⎪ P_MCn×n×n (x,y,z)+MCn×n×n (x,y,z) ⎪ ⎪ , ⎪ Sum_P_MC 0 (x)+Sum_MC 0 (x) ⎪ ⎪ ⎪ ⎪ ⎩ x = 2, . . . , n − 1; y, z = 1, . . . , n Step 7.1: Let G = 1, G G and update the probability model P_MCn×n×n . Step 7.2: Calculate MCn×n×n G G−1 (x, y, z) = (1 − r) × P_MCn×n×n (x, y, z) P_MCn×n×n

+

G−1 (x, y, z) r × MCn×n×n

, Sum_MC G−1 (x) x = 1, 2, · · · , n − 1, y, z = 1, ..., n

(23)

Step 7.3: Set G = G + 1, if G < max G go to step 7.2. Otherwise, the loop is terminated.

4 Simulation Result and Comparisons The algorithm program was programmed in PyCharm Community Edition 2022.2.3 and the experimental running environment was a 12th Gen Intel (R) Core (TM) i5–12400 2.50 GHz computer with 12 GB RAM. To verify the validity of HH3DEDA in solving

394

X. Li et al.

DAPFSP, a Genetic algorithm (GA) in literature [12] and Iterative greed (IG) in literature [4] were selected for comparison, all test studies in this paper are generated based on the test studies provided by literature [4]. There are 20 test examples in this paper. Each best is the algorithm runs 5 times independently for each test case at the same time. Cmax optimal value of the output results of the algorithm running 5 times independently, and ARD is the average of the output results of the algorithm running 5 times independently value. And obtains the following experimental data: The number of parts n = {8, 12, 16}, the number of machines m = {2, 3, 4, 5}, the number of factories F = {2}, and the number of products δ = {3, 4}. ARD =

R  C r − C best 1 ( max best max × 100) × R Cmax

(24)

r=1

Table 2 is the experimental results of each algorithm and each group of examples. Table 2. Comparison of HH3DEDA, GA, and IG n×m×f ×δ

best Cmax

ARD

HH3DEDA

GA

IG

HH3DEDA

GA

IG

8×2×2×3

764

831

631

0.096

0.340

0.187

8×2×2×4

620

549

657

0.303

0.361

0.262

8×3×2×3

953

955

966

0.076

0.115

0.130

8×3×2×4

695

851

870

0.128

0.292

0.123

8×4×2×3

1032

1066

1072

0.063

0.106

0.107

8×4×2×4

859

967

971

0.157

0.183

0.127

8×5×2×3

1067

1125

1091

0.127

0.146

0.183

8×5×2×4

1031

1121

936

0.077

0.083

0.225

12 × 2 × 2 × 3

962

1072

959

0.186

0.255

0.340

12 × 2 × 2 × 4

1047

1116

1177

0.162

0.212

0.166

12 × 3 × 2 × 3

1210

1350

1319

0.154

0.171

0.147

12 × 3 × 2 × 4

1019

1136

1025

0.232

0.166

0.325

12 × 4 × 2 × 3

1094

1094

1384

0.191

0.238

0.109

12 × 4 × 2 × 4

1172

1190

1279

0.142

0.220

0.149

12 × 5 × 2 × 3

1530

1423

1350

0.090

0.182

0.153

12 × 5 × 2 × 4

1466

1452

1328

0.094

0.113

0.151

16 × 2 × 2 × 3

1212

1278

1362

0.166

0.223

0.210 (continued)

Hyper-heuristic Three-Dimensional Estimation

395

Table 2. (continued) n×m×f ×δ

best Cmax

ARD

HH3DEDA

GA

IG

HH3DEDA

GA

IG

16 × 2 × 2 × 4

1156

1471

16 × 3 × 2 × 3

1299

1299

1483

0.254

0.091

0.127

1299

0.232

0.299

0.332

16 × 3 × 2 × 4

1394

16 × 4 × 2 × 3

1270

1479

1507

0.139

0.188

0.181

1384

1395

0.167

0.325

0.310

16 × 4 × 2 × 4

1468

1638

1593

0.067

0.121

0.129

16 × 5 × 2 × 3

1466

1381

1460

0.094

0.146

0.109

16 × 5 × 2 × 4

1620

1838

1648

0.376

0.301

0.415

5 Conclusions and Future Research In this paper, HH3DEDA is proposed to solve the DAPFSP aiming at minimizing the maximum completion time. The experimental results show that the algorithm has good search performance and effect through a large number of standard examples for scheduling problems. HH3DEDA uses the idea of heuristic search to continuously adjust the search direction in the search process, effectively avoid the local optimal solution, and quickly find the global optimal solution, which can ensure the accuracy of the solution while making the algorithm the highest efficiency. At the same time, HH3DEDA has broad universality: it is possible to estimate the joint distribution of three variables from the data in the problem scale, which may represent position, velocity, or any other continuous variable in 3D space. HH3DEDA can not only be used to estimate the distribution of three-dimensional data, but also can be extended to higher dimensions, and can be applied to different types of data, with strong universality. Compared with the 2D probability model used in EDA, the 3D probability model can simultaneously learn the block structure information and its location information of high-quality individuals, and determine the location of high-quality block structure in the scheduling solution more accurately when sampling and generating new individuals. In this way, HH3DEDA can quickly guide the algorithm to reach the high-quality solution area for search, so as to improve the search efficiency of the algorithm. This paper only discusses the singleobjective DAPFSP. In the future, we will further explore the HH3DEDA to solve the more complex multi-objective or multi-constraint DAPFSP. Acknowledgments. The authors are sincerely grateful to the anonymous reviewers for their insightful comments and suggestions, which greatly improve this paper. This work was financially supported by the National Natural Science Foundation of China (Grant Nos. 72201115, 62173169, and 61963022), the Yunnan Fundamental Research Projects (Grant No. 202201BE070001050 and 202301AU070069), and the Basic Research Key Project of Yunnan Province (Grant No. 202201AS070030).



An Improved Genetic Algorithm for Vehicle Routing Problem with Time Windows Considering Temporal-Spatial Distance

Juan Wang1(B) and Jun-qing Li2

1 School of Information Science and Engineering, Shandong Normal University, Jinan 250014, China
[email protected]
2 School of Computer, Liaocheng University, Liaocheng 252059, China

Abstract. In recent years, the impact of the epidemic has led to a gradual recovery of the logistics industry, bringing new opportunities to improve the country’s economy. Therefore, the study of the vehicle routing problem with time windows (VRPTW) is of great practical importance. To the best of our knowledge, most studies only consider the distance in customer space and ignore the effect of time windows on objectives. In this paper, we propose an improved genetic algorithm (IGA) to solve the vehicle routing problem with time windows considering temporal-spatial distance (VRPTWTSD). In the proposed algorithm, a hybrid initialization method is first designed, which includes two problem-specific methods, namely the temporal-spatial distance insertion heuristic (TSDIH) and the earliest ready time heuristic (ERH). In addition, two knowledge-based crossover operators are designed for the encoding method to expand the search space. Finally, a series of problem-applicable instances are generated and the effectiveness of the algorithm is proved by statistical analysis through comparison with several well-known algorithms. Keywords: Vehicle routing problem · Time windows · Temporal-spatial distance · Multiobjective optimization · Improved genetic algorithm

1 Introduction

The vehicle routing problem (VRP) has been studied extensively in recent years and has been applied in many practical transportation scenarios, such as supply chain management [1], drone delivery [2] and refuse collection [3]. VRPTW is a generalization of the classic VRP and is an NP-hard problem, in which a set N of n customers is served by a fleet of homogeneous vehicles. Each customer is given a soft time window in advance, and early or late arrival affects customer satisfaction. Because VRPTW is closer to realistic transportation scenarios, it has become the focus of much research. Solomon [4] proposed the VRPTW in 1987, required that each vehicle arrive and serve the customer within the customer's time window,


and gave the typical instances of VRPTW. Subsequently, many researchers added various new constraints and used different algorithms to solve the VRPTW. Desrosiers et al. [5] and Cook et al. [6] used exact algorithms to solve the VRPTW. However, exact algorithms have limitations for large-scale VRPTW. To move closer to real transportation scenarios, heuristics and metaheuristics have been proposed. Liu et al. [7] investigated a two-stage method for the VRPTW. Hashimoto et al. [8] designed an iterated local search algorithm to determine the optimal service start time. Many metaheuristic algorithms have better global search abilities, while others have better local search abilities; therefore, a well-designed hybrid algorithm often performs better than a single one. Wang et al. [9] developed a multi-ant system to solve the VRPTW considering service time customization. Recently, Fan et al. [10] used a hybrid GA to solve the time-dependent multi-depot green vehicle routing problem. However, most of the VRPTW literature does not design the initialization method around the characteristics of the problem, but instead uses a single initialization method. In recent years, most published multiobjective algorithms have been proposed for continuous optimization problems. Considering the foraging behaviour of bees in artificial bee colonies, Iqbal et al. [11] used a hybrid hyper-heuristic algorithm to solve the multiobjective VRPTW. Cai et al. [12] investigated a hybrid evolutionary multitask algorithm for the VRPTW with five objectives. From the above analysis, it can be concluded that little literature considers the multiobjective structure of VRPTW, and especially the temporal-spatial distance.

The genetic algorithm has been widely used to solve many types of scheduling problems, such as the flow shop problem (FSP), TDVRP and VRPTW. The advantages of genetic algorithms are as follows: 1) the mutation mechanism helps the algorithm avoid falling into local optima; 2) the idea of probability in natural selection and the randomness of individual selection are introduced; 3) GA is extensible and easy to combine with other algorithms; and 4) GA has been shown to be effective in many scheduling applications. Therefore, in this paper we use GA to solve the VRPTWTSD. In realistic transportation scenarios, customers who are spatially far away may have narrower time windows. Therefore, it is necessary to consider both the spatial location of customers and the temporal distance between them. In this paper, we investigate the VRPTW with temporal-spatial distance and take minimizing travel time and maximizing customer satisfaction as the optimization objectives. The main contributions of this paper are as follows: 1) a mathematical model is built based on the problem and its constraints; 2) a specific encoding approach is proposed based on the objectives and constraints of the problem; 3) a hybrid initialization method containing two heuristics is proposed; and 4) three problem-specific crossover operators are designed to expand the search space.

The rest of this article is organized as follows. Section 2 describes the VRPTWTSD. The IGA is presented in detail in Sect. 3. The results of several comparative experiments are given in Sect. 4. Finally, we end this article with a summary and future research directions in Sect. 5.


2 Problem Description

2.1 Symbolic Representations

To facilitate the understanding of the problem, the symbols and decision variables used in this paper are given in Table 1.

Table 1. Summary of notations and variables

Notation       Description
V'             V' = V ∪ {v0} is the set of all nodes, where V is the set of customers and v0 represents the depot
E              E = {(i, j) | i, j ∈ V', i ≠ j} is the edge set
K              A set of homogeneous vehicles
h              The grade of the road
l_ij           The distance between customers i and j
v_ij           The speed of the vehicle between customers i and j
t_ij^{pr}      The departure time of vehicle p from customer i to j in time zone r
t_ij^{p}       The travel time of arc (i, j) by vehicle p
d_i            The demand of customer i
t_i            The vehicle arrival time at customer i
s_i            The service time at customer i
w_i            The waiting time at customer i
cs_(i)(t_i)    The satisfaction of customer i
x_ij^{pr}      x_ij^{pr} = 1 if vehicle p travels from customer i to customer j within time zone r; otherwise x_ij^{pr} = 0
y_i^{p}        y_i^{p} = 1 if customer i is served by vehicle p; otherwise y_i^{p} = 0

2.2 Assumptions and Formulas

Similar to the classical VRPTW, the goal is to find the set of routes with maximum customer satisfaction and minimum travel time under the given constraints. The assumptions are as follows:

1) The departure time of each vehicle from the depot is 0.
2) All vehicles are homogeneous.
3) Vehicles must depart and return before the depot is closed.
4) The location information of all customers and the time windows are predefined.
5) Each customer can only be served once.
6) Early or late arrival is allowed, but it will affect customer satisfaction.
7) The capacity limit of the vehicle is not allowed to be exceeded.

Based on the above statements and the multiple objectives, the VRPTWTSD is formulated as follows:

\min f_1 = \sum_{p \in P} \sum_{i \in V'} \sum_{j \in V' \setminus \{i\}} t_{ij}^{p}   (1)

\min f_2 = \sum_{i \in V} \frac{1}{cs_{(i)}(t_i)}   (2)

s.t.

\sum_{j \in V} \sum_{r \in T} x_{0j}^{pr} = 1, \quad \forall p \in P,   (3)

\sum_{i \in V} \sum_{r \in T} x_{i0}^{pr} = 1, \quad \forall p \in P,   (4)

y_i^{p} = \sum_{j \in V' \setminus \{i\}} \sum_{r \in T} x_{ji}^{pr} = \sum_{j \in V' \setminus \{i\}} \sum_{r \in T} x_{ij}^{pr}, \quad \forall p \in P, i \in V,   (5)

\sum_{p \in P} y_i^{p} = 1, \quad \forall i \in V,   (6)

\sum_{i \in V} d_i y_i^{p} \le Q, \quad \forall p \in P,   (7)

t_j \ge t_i + t_{ij}^{p} + s_i, \quad \forall i, j \in V, i \ne j, \forall p \in P,   (8)

t_{ij}^{pr} = t_i + w_i + s_i, \quad \forall i, j \in V, i \ne j, \forall p \in P, \forall r \in T,   (9)

e_i \le t_i + w_i \le l_i, \quad \forall i \in V,   (10)

w_i = \max\{0, e_i - t_i\}, \quad \forall i \in V,   (11)

t_{0j}^{pr} = w_0 = s_0 = 0, \quad \forall j \in V, \forall p \in P, \forall r \in T,   (12)

e_0 + \sum_{i \in V'} \sum_{j \in V'} t_{ij}^{p} x_{ij}^{pr} + \sum_{j \in V} w_j + \sum_{j \in V} s_j \le l_0, \quad \forall p \in P, \forall r \in T,   (13)

x_{ij}^{pr} \in \{0, 1\}, \quad \forall i, j \in V', i \ne j, \forall p \in P, \forall r \in T,   (14)

y_i^{p} \in \{0, 1\}, \quad \forall i \in V, \forall p \in P.   (15)

The objective functions (1) and (2) represent minimizing total travel time and minimizing the inverse of customer satisfaction, respectively. Constraints (3) and (4) ensure that each vehicle starts from the depot and finally returns to the depot. Constraint (5)


states that the flow of each customer node is conserved. Constraint (6) indicates that each customer can only be visited exactly once by one vehicle. Constraint (7) is the capacity constraint of trips. Constraint (8) guarantees that if customers i and j are consecutive on the route of vehicle p, the arrival time at customer j is greater than or equal to the arrival time at customer i plus the service time of customer i plus the travel time from customer i to customer j. Constraint (9) describes the departure time from customer i to customer j. Constraint (10) ensures that customers are served within the required time. Constraint (11) defines the waiting time if the vehicle arrives at customer i early. Constraint (12) represents the time settings of the depot. Constraint (13) ensures that the return time of each vehicle does not exceed the closing time of the depot. Constraints (14) and (15) represent the attributes of the decision variables.
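As a concrete illustration of how the two objectives are evaluated for a candidate set of routes, a minimal Python sketch follows; the data layout (route lists, a travel-time matrix, a satisfaction callback) and all names are our own assumptions rather than part of the original formulation, and waiting times are omitted for brevity.

# Minimal sketch (Python): evaluating f1 (Eq. 1) and the inverse-satisfaction sum f2 (Eq. 2).
# Assumptions: `routes` is a list of customer-index lists (depot = index 0, implicit),
# `travel_time[i][j]`, `service_time[i]`, and `satisfaction(i, arrival)` are supplied by the user.
def evaluate(routes, travel_time, service_time, satisfaction):
    f1 = 0.0          # total travel time, Eq. (1)
    inv_sat = 0.0     # sum of 1 / cs_i(t_i), Eq. (2)
    for route in routes:
        t, prev = 0.0, 0                        # each vehicle leaves the depot at time 0
        for cust in route:
            t += travel_time[prev][cust]        # arrival time at `cust`
            f1 += travel_time[prev][cust]
            inv_sat += 1.0 / satisfaction(cust, t)
            t += service_time[cust]             # leave after service
            prev = cust
        f1 += travel_time[prev][0]              # return leg to the depot
    return f1, inv_sat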

3 The Proposed Algorithm

3.1 Encoding

In this paper, a two-dimensional encoding approach is used to represent the solution. The first dimensional vector L1 represents the order in which customers are served and its length N is the number of customers. The second dimensional vector L2 represents the serial number of the corresponding service vehicle and has the same length as L1. A feasible solution is shown in Fig. 1.

Fig. 1. An example of a solution
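As an illustration of this encoding, the short sketch below decodes the two vectors into per-vehicle routes; the function and variable names are our own, and the example data is hypothetical (the actual values of Fig. 1 are not reproduced here).

# Minimal sketch (Python): decoding the (L1, L2) representation into routes.
# L1 gives the service order of customers, L2 the vehicle serving each position.
def decode(L1, L2):
    routes = {}
    for cust, veh in zip(L1, L2):
        routes.setdefault(veh, []).append(cust)   # keep the service order per vehicle
    return routes

# Hypothetical example in the spirit of Fig. 1:
# decode([3, 1, 5, 2, 4], [1, 1, 2, 2, 2])  ->  {1: [3, 1], 2: [5, 2, 4]}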

3.2 Initialization Method

Assuming that the population size is m, the three components of the initial population are as follows: 1) the first m/3 solutions are generated by the random method; 2) the second m/3 solutions are generated by the TSDIH method; and 3) the last m/3 solutions are generated by the ERH method.

The TSDIH Method. For ease of understanding and description, the following assumptions are made: 1) the time windows of customers i and j are [ETi, LTi] and [ETj, LTj],

and ETi < ETj; 2) vehicle p arrives at customer i at time t_i^p ∈ [ETi, LTi]; then t_j^p lies within [ETi + s_i + t_ij^p, LTi + s_i + t_ij^p], which is abbreviated as [a, b]. The temporal distance can be formulated as:

D_{ij}^{T} =
\begin{cases}
ET_j - b, & b < ET_j \\
t_j^{p} - t_i^{p}, & ET_j \le a < LT_j \ \text{or} \ a \le ET_j < b \\
\infty, & a > LT_j
\end{cases}   (16)
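A small Python sketch of Eq. (16) follows; the argument names and the convention of returning float('inf') for the infeasible case are our own choices, and the boundary cases are simplified.

def temporal_distance(a, b, t_i, t_j, ET_j, LT_j):
    """Eq. (16): temporal distance D_ij^T given the reachable arrival window [a, b] at j."""
    if b < ET_j:              # the vehicle can only arrive before j's window opens
        return ET_j - b
    if a > LT_j:              # the vehicle can never arrive before j's window closes
        return float('inf')
    return t_j - t_i          # the windows overlap: use the actual time difference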

The ERH Method. Customer satisfaction is often directly related to whether or not they are served within their time windows. Based on this requirement, ERH method is designed. The ERH is implemented in Algorithm 1.

Algorithm 1: ERH method
Input: The information of customers;
Output: The initial solution s;
1  Generate the random sequence S;
2  For i = 1 to N do
3      If the current route is empty do
4          Insert the customer S(i) into the current route;
5      Else If there is only one customer in the current route
6          Insert the customer S(i) into the current route according to the ready time;
7      Else
8          Insert the customer S(i) between a pair of consecutive customers by the order of ready time;
9      End If
10     If the capacity limit is not exceeded do
11         Update the load of vehicle p;
12     Else
13         Update s;
14         Add a new vehicle p and an empty route;
15     End If
16 End For
17 Return s

3.3 Problem-Specific Crossover Method

Based on the proposed encoding method, three crossover operators are presented in this section. Customer pairs preservation crossover (CPP) keeps the most common customer pairs in the population and populates the remainder in the genetic order of the parent. An example for the CPP is shown in Fig. 2. The improved order crossover (IO) saves different genes from randomly selected gene sequences, and the remaining genes are populated according to the order of the corresponding parent. The complete process is shown in Fig. 3.


Fig. 2. The procedure of CPP

Fig. 3. The procedure of IO

Minimum delay time crossover (MDT) retains the route with the least delay time, and the remaining genes are redivided according to the capacity limits of the vehicles. The crossover process of MDT is shown in Fig. 4.

Fig. 4. The procedure of MDT
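Since the exact operator details live in Figs. 2–4, the following is only a generic order-preserving crossover sketch in Python that conveys the flavour of IO (keep a segment of one parent, fill the rest in the other parent's order); it is not the authors' exact operator, and all names are illustrative.

def order_preserving_crossover(p1, p2, rng):
    """Keep a random slice of parent 1 and fill the remaining customers in parent 2's order."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]                   # inherited segment from parent 1
    fill = [c for c in p2 if c not in child]       # remaining customers, in parent-2 order
    k = 0
    for pos in range(n):
        if child[pos] is None:
            child[pos] = fill[k]
            k += 1
    return child

# Example:
#   import random
#   order_preserving_crossover([1, 2, 3, 4, 5], [5, 4, 3, 2, 1], random.Random(0))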

4 Computational Experiments

In this section, extensive experiments are conducted on IGA for solving the VRPTWTSD. All experiments are implemented in MATLAB R2018a on the PlatEMO platform [13], and all tests are run on an Intel(R) Core(TM) i5-7300HQ CPU @ 2.50 GHz laptop computer with 8 GB of memory. To evaluate the effectiveness of the proposed algorithm, after a specified number of iterations, the best solutions were collected for performance comparisons.

4.1 Experimental Instances and Performance Criteria

The VRPTWTSD instances are generated on the basis of Solomon's benchmark data; for each instance class (C, R and RC), instances with 25, 50 and 100 customers are used. Table 2 lists the relevant information of the instances.

Table 2. Instances information

Instances class   Vehicle capacity   Service time   Customer number
C1                200                900            25, 50, 100
C2                700                900            25, 50, 100
R1                200                100            25, 50, 100
R2                1000               100            25, 50, 100
RC1               200                100            25, 50, 100
RC2               1000               100            25, 50, 100

In this paper, the relative percentage increase (RPI) is used to analyse the experimental data obtained by the algorithms. Formula (17) is used to calculate the RPI value:

RPI = \frac{f_c - f_b}{f_b}   (17)

where f_c is the HV or IGD value obtained by the algorithm, and f_b is the optimal value of HV or IGD.

4.2 Numerical Experiments

Efficiency of the Initial Strategy. To investigate the efficiency of the proposed hybrid initial strategy, a comparison is conducted between IGA using the random initial strategy (denoted as IGA-R) and IGA using the hybrid initial strategy (denoted as IGA-H). Figure 5 shows the average HV values of 10 independent runs on all instances. It is concluded from Fig. 5 that the proposed hybrid initial strategy is significantly better than the random initial strategy in 153 of the 168 instances. Moreover, Fig. 6 shows the single-factor analysis of variance (ANOVA) of the HV and IGD values obtained from 10 independent runs on all instances, which further indicates that IGA-H has better convergence and diversity than IGA-R within 95% least-significant-difference (LSD) intervals.

Fig. 5. Mean HV comparisons between IGA-R and IGA-H

The reason why the proposed hybrid initialization strategy works better is that random initialization does not consider the characteristics of the problem, while the initialization strategy proposed in this paper considers both the temporal-spatial distance and the ready time of customers, resulting in a higher quality initial population. Efficiency of Crossover Operator. To verify the performance of the proposed crossover operator, IGA using the order crossover strategy (denoted as IGA-O) and IGA using the


Fig. 6. The ANOVA comparisons between IGA-R and IGA-H

proposed crossover strategy (denoted as IGA-PC) are designed. A total of 36 instances are randomly selected, and the mean value of HV obtained by 5 independent runs is shown in Table 3, and the better HV value is marked in bold. It can be observed from Table 3 that IGA with the proposed crossover strategy obtains almost all of the optimal results out of the given 36 instances, which is significantly better than the other method. The advantages of the proposed crossover operator are as follows: 1) CPP retains the most common customer pairs in the population, keeping the excellent genes; 2) MDT is favorable to reduce the delay time of the route.

Table 3. Comparison results of IGA-O and IGA-PC

Instances      HV values
               IGA-O         IGA-PC
C101_25        3.2270E-01    4.8642E-01
C104_25        4.1802E-01    5.2994E-01
C205_25        3.6498E-01    5.1864E-01
C206_25        4.9470E-01    5.2214E-01
R107_25        5.3682E-01    5.4068E-01
R108_25        5.1978E-01    5.3258E-01
R210_25        3.3158E-01    5.1254E-01
R211_25        3.2774E-01    5.0906E-01
RC107_25       4.5250E-01    5.5918E-01
RC108_25       3.2126E-01    4.9558E-01
RC207_25       4.2624E-01    5.4052E-01
RC208_25       3.9938E-01    5.1924E-01
C101_50        5.4278E-01    6.0086E-01
C104_50        5.6552E-01    6.2026E-01
C205_50        4.2050E-01    5.2250E-01
C206_50        5.3344E-01    5.7446E-01
R107_50        5.6336E-01    6.2152E-01
R108_50        5.3324E-01    5.8428E-01
R210_50        3.4848E-01    5.1420E-01
R211_50        5.9884E-01    6.2862E-01
RC107_50       3.9200E-01    5.1818E-01
RC108_50       3.8278E-01    5.2634E-01
RC207_50       5.1808E-01    5.6314E-01
RC208_50       4.4560E-01    5.0902E-01
C101_100       5.1434E-01    5.8008E-01
C104_100       5.9728E-01    6.6356E-01
C205_100       5.9558E-01    6.4232E-01
C206_100       5.6812E-01    6.0580E-01
R107_100       6.1530E-01    6.5264E-01
R108_100       5.9136E-01    6.3920E-01
R210_100       3.6010E-01    5.2910E-01
R211_100       5.7520E-01    6.1646E-01
RC107_100      4.1042E-01    5.4846E-01
RC108_100      3.7438E-01    5.1254E-01
RC207_100      5.8982E-01    6.1898E-01
RC208_100      4.6968E-01    5.3452E-01

Efficiency of Proposed Algorithm. To test the effectiveness of IGA, we compare IGA with MOEA/D [14], MaOEA-CSS [15] and hpaEA [16]. MOEA/D, MaOEA-CSS and hpaEA are typical optimization algorithms for multiobjective problems. Each comparison algorithm is run independently 5 times on each of the 168 instances. The results of the HV and IGD comparisons obtained from 34 randomly selected 50-customer instances are shown in Table 4. It can be observed from Table 4 that: (1) for HV values, IGA obtains almost all of the optimal results out of the given 34 instances, which is significantly better than the other three algorithms; (2) for IGD values, IGA obtains 33 optimal results out of the given 34 instances, which further proves the effectiveness of the proposed strategy. The good performance of IGA is mainly due to its initialization heuristic and knowledge-based crossover strategy. The hybrid initialization method considering temporal-spatial distance can avoid generating initial populations of lower quality, and


the knowledge-based crossover strategy can expand the search space while preserving the good genes of the parents.

5 Conclusions and Future Work

In this article, the VRPTW with temporal-spatial distance is considered and solved via an improved GA. Firstly, the mathematical model is built and encoded according to the problem characteristics and constraints. Secondly, according to the encoding method, a hybrid initialization method that considers both the temporal-spatial distance of customers and the narrow time-window requirement is proposed to obtain high-quality initial solutions. Thirdly, three crossover strategies based on the Pareto front are designed to expand the search space. Finally, a comprehensive set of VRPTWTSD instances covering different numbers of customers and time windows was generated for a systematic study of the problem. Furthermore, detailed comparisons demonstrate the effectiveness and practicality of the proposed IGA in solving the VRPTWTSD.

Although IGA has experimentally demonstrated its effectiveness, a random approach is used to select local search strategies, thus ignoring the knowledge generated during the evolutionary process. In future research, we will focus on the following: (1) taking other realistic constraints into consideration, such as the uncertainty of customer demand, transportation with heterogeneous vehicles, and transportation with multiple depots and multiple vehicles; (2) applying the proposed algorithm to more types of vehicle routing problems, such as UAV scheduling, electric vehicle scheduling, and rail and molten iron transportation; and (3) adding other knowledge-based heuristic strategies to IGA.

Table 4. Comparison results for the 50-customers instances

HV MOEA/D

IGD MaOEA-CSS

IGA

hpaEA

MOEA/D

IGA

hpaEA

C101_50

0.3140

0.3282

0.5100

0.3809

1.4340E+04 2.0798E+04

MaOEA-CSS

0.0000E+00

1.7057E+04

C103_50

0.3975

0.4190

0.5054

0.5197

3.6211E+03 7.3585E+03

0.0000E+00

9.8610E+03

C106_50

0.3285

0.3483

0.5187

0.5541

1.1671E+04 1.5537E+04

0.0000E+00

1.9101E+04

C108_50

0.3910

0.3817

0.5153

0.3269

9.4965E+03 1.4119E+04

0.0000E+00

1.3599E+04

C109_50

0.4045

0.3892

0.5167

0.4397

8.9601E+03 1.3120E+04

0.0000E+00

1.4310E+04

C202_50

0.2752

0.3089

0.5385

0.3594

2.8315E+04 2.6397E+04

0.0000E+00

2.8294E+04

C203_50

0.4293

0.4369

0.5662

0.5654

1.5693E+04 1.7349E+04

0.0000E+00

1.9667E+04

C204_50

0.6022

0.6150

0.6499

0.3742

7.3799E+03 6.5926E+03

2.9354E+03

6.8125E+03

C205_50

0.3257

0.3405

0.5299

0.5151

2.8626E+04 2.9315E+04

0.0000E+00

2.7824E+04

C207_50

0.4375

0.4292

0.5794

0.4131

1.9704E+04 2.1932E+04

0.0000E+00

2.2616E+04

C208_50

0.3286

0.3251

0.5159

0.5037

2.3066E+04 2.4297E+04

0.0000E+00

2.4704E+04

R102_50

0.3685

0.3733

0.5316

0.5416

5.2859E+03 8.8367E+03

0.0000E+00

8.5145E+03

(continued)

408

J. Wang and J. Li Table 4. (continued)

Instance

HV MOEA/D

IGD MaOEA-CSS

IGA

hpaEA

MOEA/D

IGA

hpaEA

R105_50

0.3493

0.3403

0.5164

0.3719

6.9298E+03 9.6337E+03

MaOEA-CSS

0.0000E+00

1.0489E+04

R107_50

0.4336

0.4266

0.5554

0.4820

5.1269E+03 6.7009E+03

4.5425E+02

7.0926E+03

R108_50

0.5289

0.5446

0.6112

0.4948

3.8766E+03 3.8660E+03

1.9104E+03

4.8643E+03

R109_50

0.4087

0.3815

0.5407

0.4410

5.7049E+03 8.4921E+03

0.0000E+00

9.5872E+03

R112_50

0.5335

0.5504

0.6126

0.5190

3.0631E+03 3.4766E+03

2.4077E+03

3.6442E+03

R201_50

0.3496

0.3925

0.5915

0.5461

1.0456E+04 9.7353E+03

0.0000E+00

9.1727E+03

R203_50

0.4548

0.4651

0.5835

0.3928

5.3705E+03 5.6996E+03

0.0000E+00

6.5802E+03

R206_50

0.3802

0.3951

0.5689

0.3858

8.3885E+03 8.1532E+03

0.0000E+00

9.5364E+03

R207_50

0.4537

0.4525

0.5692

0.4695

4.3492E+03 4.7136E+03

0.0000E+00

6.8104E+03

R208_50

0.5584

0.5710

0.6328

0.3938

2.4871E+03 2.2274E+03

1.6830E+03

2.2790E+03

R211_50

0.4016

0.4098

0.5471

0.4592

5.5205E+03 5.7887E+03

0.0000E+00

6.7406E+03

RC101_50

0.3391

0.3047

0.5117

0.5433

1.0621E+04 1.4602E+04

1.6503E+04

1.5282E+04

RC102_50

0.5446

0.5497

0.6349

0.5650

7.7106E+03 9.2748E+03

3.9777E+03

1.0009E+04

RC105_50

0.5132

0.5145

0.6124

0.3887

6.6018E+03 9.4926E+03

1.4811E+03

1.1038E+04

RC106_50

0.5491

0.5369

0.6379

0.3988

6.6546E+03 9.6178E+03

1.5912E+03

9.5135E+03

RC108_50

0.6170

0.6262

0.6845

0.5102

4.9341E+03 4.4386E+03

1.4181E+03

4.3937E+03

RC201_50

0.3316

0.3261

0.5551

0.3887

1.1950E+04 1.5203E+04

0.0000E+00

1.4064E+04

RC202_50

0.3789

0.3951

0.5714

0.4347

9.8042E+03 9.9079E+03

0.0000E+00

1.1975E+04

RC203_50

0.4274

0.4350

0.5714

0.6099

6.5387E+03 7.4159E+03

0.0000E+00

7.0666E+03

RC205_50

0.3773

0.3790

0.5792

0.5642

9.7281E+03 1.0698E+04

0.0000E+00

1.1311E+04

RC206_50

0.3630

0.3606

0.5720

0.5771

9.5436E+03 1.2307E+04

0.0000E+00

1.0428E+04

RC208_50

0.3658

0.3991

0.5427

0.3803

8.2713E+03 6.2813E+03

0.0000E+00

9.5166E+03

References 1. Lahyani, R., Coelho, L.C., Khemakhem, M., Laporte, G., Semet, F.: A multi-compartment vehicle routing problem arising in the collection of olive oil in Tunisia. OMEGA-The Int. J. Manage. Sci. 51, 1–10 (2015) 2. Dorling, K., Heinrichs, J., Messier, G.G., Magierowski, S.: Vehicle routing problems for drone delivery. IEEE Trans. Syst. Man Cybern. Syst. 47, 70–85 (2017) 3. Mourao, M.C., Almeida, M.T.: Lower-bounding and heuristic methods for a refuse collection vehicle routing problem. Eur. J. Oper. Res. 121, 420–434 (2000) 4. Solomon, M.M.: Algorithms for the vehicle routing and scheduling problems with time window constraints. Oper. Res. 35, 254–265 (1987) 5. Desrosiers, J., Dumas, Y., Solomon, M.M., Soumis, F.: Chapter 2 Time constrained routing and scheduling. In: Handbooks in Operations Research and Management Science. Elsevier, pp. 35–139 (1995) 6. Cook, W.a.R., Jennifer, L.: A Parallel Cutting-Plane Algorithm for the Vehicle Routing Problem with Time Windows (1999) 7. Liu, S.-C., Chen, J.-R.: A heuristic method for the inventory routing and pricing problem in a supply chain. Expert Syst. Appl. 38, 1447–1456 (2011) 8. Hashimoto, H., Ibaraki, T., Imahori, S., Yagiura, M.: The vehicle routing problem with flexible time windows and traveling times. Discret. Appl. Math. 154, 2271–2290 (2006)


9. Wang, Y., Wang, L., Peng, Z., Chen, G., Cai, Z., Xing, L.: A multi ant system based hybrid heuristic algorithm for vehicle routing problem with service time customization. Swarm Evol. Comput. 50, 100563 (2019) 10. Xiaoju, F., et al.: Active vitamin D3 protects against diabetic kidney disease by regulating the jnk signaling pathway in rats. Int. J. Diabetes Endocrinology 6, 105–113 (2021) 11. Iqbal, S., Kaykobad, M., Rahman, M.S.: Solving the multi-objective vehicle routing problem with soft time windows with the help of bees. Swarm Evol. Comput. 24, 50–64 (2015) 12. Cai, Y., Cheng, M., Zhou, Y., Liu, P., Guo, J.-M.: A hybrid evolutionary multitask algorithm for the multiobjective vehicle routing problem with time windows. Inf. Sci. 612, 168–187 (2022) 13. Tian, Y., Cheng, R., Zhang, X., Jin, Y.: PlatEMO: a MATLAB platform for evolutionary multiobjective optimization [educational forum]. IEEE Comput. Intell. Mag. 12, 73–87 (2017) 14. Zhang, Q., Li, H.: MOEA/D: a multiobjective evolutionary algorithm based on decomposition. IEEE Trans. Evol. Comput. 11, 712–731 (2007) 15. He, Z., Yen, G.G.: Many-objective evolutionary algorithms based on coordinated selection strategy. IEEE Trans. Evol. Comput. 21, 220–233 (2017) 16. Chen, H., Tian, Y., Pedrycz, W., Wu, G., Wang, R., Wang, L.: Hyperplane assisted evolutionary algorithm for many-objective optimization problems. IEEE Trans. Cybern. 50, 3367–3380 (2020)

A Reinforcement Learning Method for Solving the Production Scheduling Problem of Silicon Electrodes

Yu-Fang Huang1, Rong Hu1,2(B), Xing Wu2, Bin Qian1, and Yuan-Yuan Yang1

1 School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
[email protected]
2 Faculty of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming 650500, China

Abstract. In this paper, a new silicon electrode production (SEP) scheduling problem arising from the silicon electrode manufacturing procedure is discussed. We model the problem as two coupled subproblems: one is a silicon rod cutting scheduling problem on parallel machines, and the other is a silicon electrode product scheduling problem in a hybrid flowshop. We present a reinforcement learning (RL) method to address this SEP problem. RL uses the Q-learning algorithm to autonomously select heuristics from a pre-designed low-level heuristic set (LLHs). The selected heuristic is used to search the solution space for better results. Considering the "single-batch" coupling relationship in the manufacturing process, a two-stage encoding strategy is used for the subproblems, and the related decoding mechanisms are designed for the silicon rod cutting stage and the silicon electrode processing stage. Experimental results on newly introduced instances demonstrate that the suggested method effectively competes with cutting-edge algorithms.

Keywords: Q-learning · Silicon electrode · Reinforcement learning · Hyper-heuristic algorithm · Production scheduling

1 Introduction

In recent years, there has been growing interest in finding solutions to practical scheduling problems [1]. This paper studies the silicon electrode production (SEP) process found in the semiconductor industry. It is a realistic and typical scheduling problem that integrates a silicon rod cutting scheduling problem on parallel machines with a silicon electrode product scheduling problem in a hybrid flowshop. Moreover, SEP is a bottleneck process in the semiconductor industry and is gaining attention because of its complexity and indispensability. The SEP process includes three consecutive production stages, namely cutting, shaping, and surface treatment. Effective scheduling methods for the SEP process are crucial to improving the productivity of the semiconductor industry [2].


Parallel machine scheduling and traditional hybrid flow-shop scheduling are combined in SEP scheduling. With the characteristics of both parallel machines and a hybrid flowshop, SEP scheduling is difficult to solve; even the two-stage hybrid flow-shop scheduling problem is NP-hard (non-deterministic polynomial, NP) [3]. SEP scheduling has been neglected in the last 40 years; the majority of researchers have concentrated on improving the physical characteristics of silicon electrodes [4–6]. Hence, SEP scheduling is an important topic that deserves investigation.

We find that SEP scheduling is the inverse scheduling of steelmaking-continuous casting (SCC). Therefore, by studying the SCC problem, we can derive a solution approach for SEP scheduling. The SCC scheduling problem has primarily been addressed through mathematical programming and artificial intelligence in research and practice [7]. Among mathematical programming methods, Lagrangian relaxation is widely used [8–10]. Theoretically, mathematical programming methods can obtain the best answer, but in practice they typically take too long to compute, making them useful only for small-scale problems; obviously, this is not applicable to SEP. For artificial intelligence methods, an array of intelligent algorithms has been proposed in the past few years that can be applied to solve SCC effectively, including the ant colony optimization algorithm [11], the genetic algorithm (GA) [12], the fruit fly optimization algorithm [13], and so on. In the above literature, the parallel machine stage is fixed in advance and remains unchanged, and the researchers focused on scheduling the HFSP stage. This paper, by contrast, assumes that the schedule of the parallel machine stage is determined by the scheduling algorithm. That is, the new SEP scheduling problem optimizes parallel machine scheduling and HFSP scheduling jointly to minimize the makespan. This results in a substantially higher level of flexibility, which can greatly increase production. To the best of the authors' knowledge, nevertheless, there is still no existing research on RL methods for this kind of problem.

RL is an algorithmic approach to sequential decision problems in which an agent picks up knowledge by making mistakes and interacting with its environment [14]. In the past few years, RL has been a hot topic in machine learning research. Q-learning, one of the most popular RL algorithms, has demonstrated excellent performance on various scheduling problems [15, 16]. Besides, Q-learning can use evolutionary mechanisms to quickly solve complex scheduling problems. These previous findings motivate us to design a Q-learning based RL scheme (see Fig. 3) to solve the SEP. In RL, Q-learning is used to select LLHs during optimization. The selected LLH is used to find better solutions. The LLH is given a reward or punishment according to a reward function after the execution period. Then, the Q-table stores a Q-value computed by the Q-function to determine which action will be chosen in the next stage.

The main contributions of this paper can be described as follows: (1) We develop a scheduling model for silicon electrode production. (2) According to the characteristics of SEP, a targeted decoding method is designed. (3) We generate a total of 20 new test instances with different dimensions based on a realistic production process, which will be valuable for future research on SEP. (4) An RL method is proposed to solve SEP.

The rest of the paper is organized as follows: In Sect. 2, the SEP sequencing model is constructed and analyzed with a sample case. In Sect. 3, the RL structure is designed in detail. In Sect. 4, the computational results of the test examples and the comparison with


some state-of-the-art algorithms are analyzed. Finally, Sect. 5 concludes the paper and proposes some future work.

2 Problem Description and Modeling

2.1 Variable Definitions

The definitions of the relevant mathematical symbols involved in this paper are shown in Table 1.

Table 1. Notations applied in the model of SEP.

Symbol          Constants
i               Indices of silicon rods, where i = 1,2,…,N_Si; N_Si is the total number of silicon rods
b               Indices of batches, where b = 1,2,…,N_batch; N_batch is the total number of batches
s               Indices of stages, where s = 1,2,…,N_stage; N_stage is the total number of stages
m_s             Indices of machines in stage s, where m_s = 1,2,…,N^s_machine; N^s_machine is the total number of machines in stage s
k               Indices of tools, where k = 1,2,…,N_tool; N_tool is the total number of tools
t_s             Transportation time of each batch from stage s−1 to stage s
q_i             Setting time of silicon rod i
b_s             Tool for producing batch b in stage s, s = 2,…,N_stage
p_i^1           Processing time of silicon rod i
p_b^s           Processing time for producing batch b on stage s, s = 2,…,N_stage
T_m^s           The total number of batches or silicon rods processed on the m-th machine in stage s
π_{m,j}         The j-th batch or silicon rod processed on the m-th machine in stage s
α_i             Number of batches contained in silicon rod i
π_i^1           The i-th silicon rod in stage 1
π_b^s           The b-th batch in stage s, s = 2,…,N_stage
q^s_{m_s,k}     On stage s, the time to switch to tool k, s = 2,…,N_stage
c_{π_b^s}       Completion time of a batch
f_Prod          Total production cost

2.2 Permutation-Based Model of the SEP

SEP can be described as follows: there are N_batch types of silicon electrodes that need to be produced in the silicon electrode factory, and N_Si silicon rods are cut into the N_batch silicon electrodes on several wire saw devices. Each replacement of a silicon rod requires a setting time. The silicon electrodes need to complete all the production stages to finish production, and each silicon electrode takes t_s time to be transported to the next stage. There are m_s independent isomorphic parallel machines with tools in the Shaping and Surface treatment stages. In those stages, each silicon electrode uses a specific tool for processing. So, when a silicon electrode arrives at a parallel machine, it is necessary to check whether the tool of the current parallel machine is compatible; if not, it takes q^s_{m_s,k} time to change the tool. Our goal is to minimize the total time of the production process. The schematic diagram of this problem is shown below (see Fig. 1).

Fig. 1. Schematic diagram of SEP

Based on the above discussion, the permutation-based model of SEP can be expressed as:

c_{\pi^1_{m_1,i}} =
\begin{cases}
p^1_{\pi^1_{m_1,i}}, & \text{if } T^1_{m_1} = 0 \\
c_{\pi^1_{m_1,i-1}} + q_{\pi^1_{m_1,i}} + p^1_{\pi^1_{m_1,i}}, & \text{else}
\end{cases}   (1)

m_1 = 1, 2, \ldots, N^1_{machine}, \quad i = 1, 2, \ldots, N_{Si}

c_{\pi^s_{m_s,b}} = \max\left\{ c_{\pi^{s-1}_{b}} + t_s,\; c_{\pi^s_{m_s,b-1}} + q^s_{m_s,\pi^s_b} \right\} + p^s_{\pi^s_{m_s,b}}   (2)

s = 2, \ldots, N_{stage}, \quad m_s = 1, 2, \ldots, N^s_{machine}, \quad b = 1, 2, \ldots, T^s_{m_s}   (3)

f_{Prod} = \max_{m_{N_{stage}} = 1, 2, \ldots, N^{N_{stage}}_{machine},\; b = 1, 2, \ldots, T^{N_{stage}}_{m_{N_{stage}}}} c_{\pi^{N_{stage}}_{m_{N_{stage}}, b}}   (4)

414

Y.-F. Huang et al.

3 Proposed RL Algorithm In this section, a Q-learning based RL algorithm is proposed for solving the SEP. First, the encoding and decoding schemes are proposed to represent solution and obtain feasible schedules, respectively. Then, a set of low-level heuristics (LLHs), which are also regarded as the executable actions in RL scheme, are created using some problemspecific heuristics. Additionally, the state representation and action selection approach are introduced, and Q-learning is used as a high-level strategy to choose the right action during the optimization process. Finally, the algorithmic framework of RL is presented in the last subsection. 3.1 Solution Encoding and Decoding Schemes In this subsection, the two-segment coding vector x = (u, v) (see Fig. 2) contains the first half is the silicon rod sequence u, and the second half is the silicon electrode sequence v. The silicon electrode is produced by the silicon rod with the same background color, for example, silicon electrode 4 and silicon electrode 5 are produced by the silicon rod 2.

Fig. 2. The two-segment coding strategy.

Common coding rules are designed only to provide the processing order of a certain stage[17], so relying on these rules is insufficient to obtain a complete scheduling solution for SEP. Because the proposed SEP problem contains two processing stages, it is necessary to design the corresponding decoding rules for the two processing stages: (1) Silicon rod cutting phase decoding mechanism Before cutting, it is necessary to assign silicon electrodes to silicon rods. Only silicon electrodes have the same shape can be divided into the same silicon rod. Based on the principle of earliest completion time, in this study, the decoding steps in the silicon rod cutting stage were designed as follows: Step 1: Placing the silicon rod on the wire saw devices according to u, and delete the silicon rod from u, until there is a machinable silicon rod on all wire saw devices. Step 2: According to the earliest completion precedence rules–for the new task assigned a wire saw devices, the preferred choice is a fastest completed wire saw devices.

A Reinforcement Learning Method for Solving the Production

415

Step 3: Steps 1 and 2 are repeated until all silicon rods have been assigned to the wire saw device. (2) Silicon electrodes processing phase decoding mechanism Silicon electrodes processing decoding is composed of two parts: silicon electrodes sorting and equipment dispatching. For silicon electrodes sorting, the commonly used heuristic rule is “come first, first served” (FCFS) - that is, cutting completed earlier in the previous stage of the remaining processing stage are given priority for processing. Based on the principle of earliest completion time, the decoding steps in the silicon electrodes processing stage were designed as follows: Step 1: Silicon electrodes b is removed from sequence v, and silicon electrodes b is deleted from sequence v. The completion time of silicon electrodes b on its optional processing equipment is calculated, and the equipment with the shortest completion time is selected. Step 2: The processing equipment’s total processing time is updated. Step 3: Steps 1 and 2 are repeated until all silicon electrodes in sequence v are removed. 3.2 Action Representation The design of a low-level heuristic algorithm will greatly affect the efficiency of the super heuristic algorithm. In this section, we construct low-level heuristic algorithms based on SEP characteristics and treat them as different executable actions. We designed four simple heuristic algorithms in a single silicon rod and four heuristic algorithms between silicon rods to form the low-level heuristic operations (LLHs). The domain structure between silicon is described as follows: (1) Inverse_differ: Select two non-adjacent silicon rods randomly from the silicon rod sequence and inverting the silicon rod sequence between the selected silicon rod. (2) Interchange_differ: Select two adjacent silicon rods randomly from the silicon rod sequence and exchanging the positions of the two selected silicon rod. (3) Swap_differ: Select two non-adjacent silicon rods randomly from the silicon rod sequence and swapping the positions of the two selected silicon rod. (4) Insert_differ: Randomly select two different silicon rods i and j from the silicon rod sequence and inserting silicon rod j before or after silicon rod i. (5) Inverse_same: Select two non-adjacent silicon electrodes randomly from a silicon rod and inverting the silicon electrode sequence between the selected silicon electrodes. (6) Interchange_same: Select two adjacent silicon electrodes randomly from a silicon rod and exchanging the positions of the two selected silicon electrodes. (7) Swap_same: Select two non-adjacent silicon electrodes randomly from a silicon rod and swapping the positions of the two selected silicon electrodes. (8) Insert_same: Randomly select two different silicon electrodes i and j from a silicon rod and inserting silicon electrode j before or after silicon electrode i.

416

Y.-F. Huang et al.

3.3 State Representation We used a state aggregation technique based on the minimum maximum completion time is used to separate the state space into several disjoint parts. The improvement area is divided using the completion time proportion r t since the ranges of each episode’s completion times different. The calculation of r t is given as in Eq. (5), where ct Init and ct EP are the minimum maximum completion time of the solution before and after the search process during an episode period EP, respectively, and t = 1,2,…, E, E is the total evaluation time. In Agent, the proportion is divided into three states including [1,1.5), [1.5,2), and [2, ∞). rt =

ctInit ctPE

(5)

3.4 Action Selection Method During the learning process, the εt -greedy policy is a common method for action selection in Q-learning [18], which is presented as in Eq. (6), where εt is the exploration probability, k is a random number within [0, 1]. In RL, εt is dynamically controlled by Eq. (7), where ecur is the current time.

random, if k < εt (6) at = arg max Q(s , a), else t a∈A

εt = 1 −

ecur E

(7)

3.5 Reward R The reward function is essentially used to reinforce the effective action for each state. In the Q-function, the hardening signal R is used to evaluate the performance of an action. The reward mechanism is shown in Eq. (8). ⎧ 0.5, rt ∈ (1, 1.5) ⎪ ⎪ ⎨ 1, rt ∈ [1.5, 2) R= (8) ⎪ 2, rt ∈ [2, ∞) ⎪ ⎩ 0, otherwise

3.6 Algorithmic Framework The proposed RL is essentially a single-solution based selection algorithm. Eight heuristics are used to construct LLHs and define executable actions. The Q-learning is employed as the high-level hyper-heuristic strategy to manipulate LLHs. After taking action to produce a new solution, the decoding scheme evaluates the fitness of solution.

A Reinforcement Learning Method for Solving the Production

417

The algorithmic framework is presented as below (see Fig. 3), where the chosen action tries to create a global optimum schedule for SEP while the Q-learning directly explores the heuristic space to get a good reward. The algorithm flow is as follows: Step 1: Initialize the solution by uniform random distribution and make all stateaction pairs with zero Q-values. The initial s0 is set to 1, belonging to the state [1, 1.5). Step 2: Save the initial solution as the overall best solution after evaluating the solution fitness using the decoding strategy. Step 3: Select an action at according to Eq. (6). Step 4: Perform the action at . If a better solution is discovered, update the current best option. If an EP ends, calculate r t by Eq. (5) and identify the next state st+1 . The relevant reward R is given by Eq. (8). Step 5: Based on the provided Q-function, obtain the argmax Qt (st+1 , a) value and a∈A

update the Qt + 1 (st , at ) value. Then, update the current state St = St + 1. Step 6: Repeat the procedures from Step 2 to Step 5 until the terminating condition is satisfied, then, output the best solution.

Fig. 3. Framework of the proposed RL.
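The Q-value update used in Step 5 is the standard Q-learning rule; a brief Python sketch is given below, with the learning rate alpha and discount gamma set to the parameter values reported in Sect. 4.1 purely for illustration — the table layout and names are our own assumptions.

def q_update(q_table, s, a, reward, s_next, alpha=0.15, gamma=0.8):
    """Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q_table[s_next])
    q_table[s][a] += alpha * (reward + gamma * best_next - q_table[s][a])

# Hypothetical usage with 3 states x 8 actions (one per LLH):
#   q_table = {s: [0.0] * 8 for s in (1, 2, 3)}
#   q_update(q_table, s=1, a=4, reward=0.5, s_next=2)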

4 Experimental Results and Discussion

To verify the effectiveness of the proposed RL algorithm for addressing the SEP, in this section we compare RL with some frontier intelligent algorithms. All algorithms are developed in PyCharm using Python 3.9 and run on a PC with an Intel(R) Core(TM) i7-5200U 2.20 GHz CPU and 16 GB RAM in a Microsoft Windows 11 environment. The following subsections introduce the data and experiment design and the experimental results of the comparison algorithms, respectively.


4.1 Data and Experiment Design

Because the SEP problem in this paper is a new scheduling problem, no benchmark instances are available in the literature. To evaluate the performance of the proposed RL, we generate a total of 20 new test instances with different dimensions based on a realistic silicon electrode production process. We choose N_stage ∈ {3, 4, 5, 6} and N_Si ∈ {3, 9, 18, 24, 30}, and combine them in the form N_stage × N_Si. The other data of the instances are generated uniformly in the following ranges: m_s ∈ [3, 5], α_i ∈ [5, 12], t_s ∈ [8, 15], q_i ∈ [20, 35], p_b^s ∈ [60, 100], b_s ∈ [3, 5], q^s_{m_s,k} ∈ [4, 6].

To study the effectiveness of RL, we use the Taguchi method of design-of-experiments (DOE) to set the three key parameters of RL [19], namely the greediness ε, the learning rate α, and the discount factor γ. The best combination of parameters is: ε = 0.6, α = 0.15, γ = 0.8. Then, we compare the performance of RL with algorithms published in recent literature; by comparing the results of these algorithms, we can verify the effectiveness of RL. These algorithms are PGR [20], GATS [21], and GA [22]. The compared algorithms run in the same operating environment as RL, and all of them have been reimplemented according to the original literature. The parameters of the algorithms follow the recommended settings in the literature, with appropriate adjustments for the problems considered.

The relative error (RE) is used to evaluate the effectiveness of the scheduling algorithms and is directly used to evaluate their performance, as shown in Eq. (9):

RE = \frac{f_{Total\_i} - f_{Total\_best}}{f_{Total\_best}} \times 100   (9)

where f_Total_best represents the minimum total cost obtained by running all comparison algorithms, and f_Total_i represents the total cost obtained by running the i-th algorithm. To obtain more reliable experimental results, the algorithms are executed 30 times for each instance, with 10,000 evaluations per run. The best RE (BRE), average RE (ARE), and worst RE (WRE) are used to measure performance. The algorithm with lower BRE, ARE, and WRE solves the SEP better.

4.2 Experiment Results and Discussion

The comprehensive comparison results of all algorithms are shown in Fig. 4 and Table 2, and the lowest BRE, ARE, and WRE for each test problem are highlighted in bold in the table. According to Fig. 4 and Table 2, RL has the lowest BRE, ARE and WRE values, which demonstrates the superiority of the proposed RL algorithm. In addition, RL obtains the lowest overall average of BRE, ARE, and WRE on 16 instances of different scales, indicating that the average performance of RL is better than that of the other three comparison algorithms.


Fig. 4. Comparisons of RL performance with the state-of-the-art algorithms.

Table 2. Comparison results of RL with the state-of-the-art methods.

Scale    PGR                  GATS                 GA                   RL
         BRE   ARE   WRE      BRE   ARE   WRE      BRE   ARE   WRE      BRE   ARE   WRE
3×9      0.66  0.66  0.66     0.66  0.81  1.01     0.66  1.02  1.26     0.00  0.42  0.66
3×18     0.31  0.58  0.79     0.36  0.49  0.58     0.19  0.29  0.57     0.00  0.12  0.18
3×30     0.34  0.51  0.67     0.46  0.54  0.65     0.28  0.45  0.59     0.00  0.04  0.08
4×9      0.93  0.93  0.93     0.81  0.91  0.93     0.93  0.93  0.93     0.00  0.14  0.33
4×18     0.03  0.38  0.52     0.03  0.35  0.56     0.13  0.29  0.37     0.00  0.02  0.03
4×30     0.65  0.71  0.76     0.12  0.48  0.71     0.31  0.57  0.75     0.00  0.32  0.53
5×9      0.82  0.84  0.92     0.82  0.86  0.92     0.82  0.89  0.96     0.00  0.07  0.82
5×18     0.40  0.49  0.58     0.17  0.32  0.46     0.05  0.29  0.40     0.00  0.12  0.23
5×30     0.30  0.31  0.32     0.16  0.28  0.39     0.16  0.22  0.32     0.00  0.05  0.13
6×9      0.27  0.41  0.48     0.27  0.37  0.48     0.27  0.32  0.48     0.00  0.07  0.23
6×18     0.18  0.27  0.40     0.00  0.22  0.31     0.11  0.26  0.55     0.00  0.01  0.10
6×30     0.50  0.57  0.66     0.23  0.57  1.16     0.10  0.25  0.34     0.00  0.11  0.18


5 Conclusions

In this paper, an RL algorithm is presented to solve the SEP problem. Experiments show that RL can effectively solve the SEP problem. However, the stability of the algorithm still needs to be strengthened, and its complexity can be further reduced. For future work, we can consider building the actions and states of the RL algorithm more efficiently to improve algorithm efficiency, studying more complex integrated production and transportation scheduling problems for silicon electrodes, and applying the RL method to such integrated scheduling problems.

Acknowledgement. This research was supported by the National Natural Science Foundation of China (61963022 and 62173169) and the Basic Research Key Project of Yunnan Province (202201AS070030).

References 1. Tang, L., Guan, J., Hu, G.: Steelmaking and refining coordinated scheduling problem with waiting time and transportation consideration. Comput. Ind. Eng. 58, 239–248 (2010) 2. Pan, Q.K.: An effective co-evolutionary artificial bee colony algorithm for steelmakingcontinuous casting scheduling. Eur. J. Oper. Res. 250(3), 702–714 (2016) 3. Hoogeveen, J.A., Lenstra, J.K., Veltman, B.: Preemptive scheduling in a two-stage multiprocessor flow shop is NP-hard. Eur. J. Oper. Res 89, 172–175 (1996) 4. Madou, M.J., et al.: Bulk and surface characterization of the silicon electrode. Surf. Sci. 108(1), 135–152 (1981) 5. Chandrasekaran., R., et al.: Analysis of lithium insertion/deinsertion in a silicon electrode particle at room temperature. J. Electrochemical Soc. 157(10), A1139 (2010) 6. Huertas, Z.C., et al.: High performance silicon electrode enabled by titanicone coating. Sci. Rep. 12(1), 1–8 (2022) 7. Pacciarelli, D., Pranzo, M.: Production scheduling in a steelmaking-continuous casting plant. Comput. Chem. Eng. 28, 2823–2835 (2004) 8. Tang, L., Lub, P.B., Liu, J., Fang, L.: Steelmaking process scheduling using Lagrangian relaxation. Int. J. Prod. Res. 40(1), 55–70 (2002) 9. Xuan, H., Tang, L.: Scheduling a hybrid flowshop with batch production at the last stage. Comput. Oper. Res. 34(9), 2718–2733 (2007) 10. Mao, K., Pan, Q.-K., Pang, X., Chai, T.: A novel Lagrangian relaxation approachfor a hybrid flowshop scheduling problem in the steelmaking-continuous casting process. Eur. J. Oper. Res. 236, 51–60 (2014) 11. Atighehchian, A., Bijari, M., Tarkesh, H.: A novel hybrid algorithm for scheduling steelmaking continuous casting production. Comput. Oper. Res. 36(8), 2450–2461 (2009) 12. Zhu, D.-F., Zheng, Z., Gao, X.-Q.: Intelligent optimization-based production planning and simulation analysis for steelmaking and continuous casting process. Int. J. Iron Steel Res. 17(9), 19–24 (2010) 13. Pan, Q.-K., Wang, L., Mao, K., Zhao, J.-H., Zhang, M.: An effective artificial bee colony algorithm for a real-world hybrid flowshop problem in steelmaking process. IEEE Trans. Autom. Sci. Eng. 10(2), 307–322 (2013) 14. Han, W., Guo, F., Su, X.: A reinforcement learning method for a hybrid flow-shop scheduling problem. Algorithms 12(11), 222 (2019)


15. Watkins, C.J., Dayan, P.: Q-learning. Mach. Learn 8, 279–292 (1992) 16. Choong, S.S., Wong, L.-P., Lim, C.P.: Automatic design of hyper-heuristic based on reinforcement learning. Inf. Sci. 436–437, 89–107 (2018) 17. Li, X., Guo, X., Tang, H., et al.: Improved cuckoo algorithm for the hybrid flow-shop scheduling problem in sand casting enterprises considering batch processing. Available at SSRN 41, 18–112 (2018) 18. Falcao, D., Madureira, A., Pereira, I.: Q-learning based hyper-heuristic for scheduling system self-parameterization. In: 10th Iberian Conference on Information Systems and Technologies (CISTI), pp. 1–7. Aveiro, Portugal: IEEE (2015) 19. Montgomery, D.C.: Design and analysis of experiments. Wiley (2008) 20. Goodarzi, M., Raheleh, F.A., Farughi, H.: Integrated hybrid flow shop scheduling and vehicle routing problem. J. Ind. Syst. Eng., 223–244 (2021) 21. Li, W., et al.: Integrated production and transportation scheduling method in hybrid flow shop. Chinese J. Mech. Eng., 1–20 (2022) 22. Qin, H., Li, T., Teng, Y., Wang, K.: Integrated production and distribution scheduling in distributed hybrid flow shops. Memetic Comput. 13(2), 185–202 (2021). https://doi.org/10. 1007/s12293-021-00329-6

Information Security

CL-BOSIC: A Distributed Agent-Oriented Scheme for Remote Data Integrity Check and Forensics in Public Cloud

Xiaolei Zhang1,2, Huilin Zheng1,2, Qingni Shen1,2(B), and Zhonghai Wu1,2

1 Peking University, Beijing 100871, China
[email protected]
2 PKU-OCTA Laboratory for Blockchain and Privacy Computing, Peking University, Beijing 100871, China

Abstract. As electronic evidence, electronic data is easily tampered with and forged, so it is very important to ensure the authenticity, reliability and integrity of electronic data in forensics. In this paper, we propose a certificateless distributed agent-oriented data outsourcing and integrity checking scheme for forensics scenarios — CL-BOSIC — which utilizes smart contracts deployed on the blockchain to check the integrity of outsourced data periodically. Besides, the smart contracts guarantee fair payment between the user and the cloud server. During forensics, the judiciary verifies the validity of electronic data according to the records on the blockchain, which supports the judgment of the legal force of electronic evidence. CL-BOSIC embeds a non-interactive framework in both the verification process and the data retrieval process during forensics to reduce communication overhead and data leakage. In addition, CL-BOSIC also realizes data privacy protection and user identity anonymity. Security analysis is presented in the context of completeness, soundness, and privacy protection. We implement a prototype of CL-BOSIC based on Fabric and evaluate its performance from different aspects; the evaluation results indicate that our proposal is feasible and incurs tolerable overheads for each party.

Keywords: cloud storage · remote data integrity check · non-interactive · certificateless cryptography · forensics · privacy-preserving · blockchain

1 Introduction More and more people outsource data to the cloud to reduce the burden of data management and to share data conveniently. However, cloud servers are semi-trusted entities that may hide data corruption from users to protect their reputation, or even deliberately remove some data for greater business interests. Therefore, Provable Data Possession (PDP) is widely used in cloud storage scenarios to ensure the availability and integrity of users' data. Many recent studies have successfully applied PDP schemes to e-health [26], the Internet of Things (IoT) [25] and other fields. X. Zhang and H. Zheng contributed equally to this work. This work was supported by the National Key R&D Program of China (No. 2022YFB2703301).


The rapid development of instant messaging applications has made them an important medium for negotiation, so chat records often need to appear as electronic evidence (e-evidence) in judicial trials. Consider an electronic lending scenario in which a loan agreement is formed over an instant messaging application: the chat record then exists as an "Electronic Debit Note", which may be used as e-evidence to prove the facts of the case under certain circumstances. E-evidence is easily tampered with. To guarantee the validity of e-evidence, the parties need to store auxiliary information and apply for notarization from an authority, which places a great burden on them. In general, actual litigation cases involve many such problems and place high demands on the parties' ability to store and present evidence. Therefore, there is an urgent need for a scheme that reduces the burden of local storage on users while enabling the judiciary to assess the probative value of e-evidence more effectively. In this paper, we propose a certificateless blockchain-based outsourced storage and integrity check scheme (CL-BOSIC) to solve the above problems. We use PDP to ensure the integrity of e-evidence, and utilize the blockchain as a distributed agent to perform non-interactive auditing and build a fair payment environment via smart contracts. The judiciary makes accurate requests for e-evidence through keyword search, improving judicial efficiency and protecting users' privacy. CL-BOSIC is based on certificateless cryptography, which avoids the security risks of the key generation center (KGC). Our contributions can be summarized as follows: • We introduce a PDP scheme based on certificateless cryptography in a forensics scenario, which reduces the cost of certificate management and solves the key escrow problem. We are the first to combine PDP and blockchain to solve the problems of e-evidence depositing and authentication. • Our scheme is based on a non-interactive framework. The blockchain is introduced as a distributed agent, which is responsible for periodic data integrity audits and for data retrieval during forensics. The smart contracts deployed on the chain execute the operation logic according to the invocation arguments and record the operations and results on the chain. We build a fair and open data operation platform while reducing communication overhead. • We implement a prototype of CL-BOSIC based on Fabric, and provide a security analysis and performance evaluation of CL-BOSIC, which indicate that our proposal is feasible and incurs tolerable overheads for each party.

2 Related Work 2.1 Provable Data Possession Ateniese et al. [1] first introduced the concept of PDP, which allows users to verify the integrity of outsourced data without downloading the complete data. Wang et al. [2] introduced a Third Party Auditor (TPA) into the PDP scheme for the first time to perform the integrity verification process on behalf of the user, and realized data privacy protection using random mask technology. Since then, a large number of PDP schemes [3, 4] have also adopted random mask technology to achieve data privacy protection. The concept of zero-knowledge public auditing was proposed to further enhance users' privacy protection in schemes [5, 6]. To reduce the overhead of certificate management, solutions


[10, 11] utilize PKI cryptography and solutions [7, 8] utilize identity-based encryption (IBE) [9]. To prevent the KGC from being able to decrypt the data of all users, some researchers proposed PDP schemes [17, 18] based on certificateless cryptography [12]. 2.2 Blockchain and Smart Contract Blockchain was proposed by Satoshi Nakamoto in 2008 [13]. A blockchain records each transaction or piece of data in a block, which is public and can be queried by anyone in the blockchain system. Distributed nodes update the hash chain synchronously by running a consensus protocol. Therefore, unless an attacker controls 51% of the computing power of the entire network, the blockchain can be considered immutable. A smart contract is executable code that runs automatically on the blockchain through consensus nodes. Smart contracts are used to solve problems in various fields, such as electronic voting [14], cloud computing [15] and the Internet of Things [16]. 2.3 Certificateless Public Key Encryption with Keyword Search Song et al. [22] first proposed the concept of searchable encryption and constructed a scheme based on symmetric encryption to search encrypted data. Boneh et al. [23] proposed public-key encryption with keyword search (PEKS) to reduce the overhead of key management. Peng et al. [21] first combined certificateless cryptography with public-key encryption with keyword search (CLPEKS) to solve the key escrow problem.

3 Preliminaries 3.1 Bilinear Pairing Let G1, G2 be two multiplicative groups of the same prime order q, and let g be a generator of G1. A bilinear pairing is a map e : G1 × G1 → G2 with the following properties: • Bilinear: e(u^a, v^b) = e(u, v)^{ab}, ∀u, v ∈ G1 and a, b ∈ Zq. • Non-Degenerate: e(g, g) ≠ 1_{G2}, in which 1_{G2} is the identity of G2. • Computable: There is an efficient algorithm to compute e(u, v) for all u, v ∈ G1. 3.2 Security Assumptions Discrete Logarithm (DL) Assumption: For a probabilistic polynomial time (PPT) adversary A_DL, the advantage of A_DL in solving the DL problem in G1 is negligible, which can be defined as below (ε is a negligible value): Pr[A_DL(g, g^a) = a : a ←R Zq*] ≤ ε. Computational Diffie-Hellman (CDH) Assumption: For a probabilistic polynomial time (PPT) adversary A_CDH, the advantage of A_CDH in solving the CDH problem in G1 is negligible, which can be defined as below (ε is a negligible value): Pr[A_CDH(g, g^a, g^b) = g^{ab} : a, b ←R Zq*] ≤ ε.


Fig. 1. The system model of our scheme.

Decisional Diffie-Hellman (DDH) Assumption: Given a tuple (g, g^α, g^β, R), where α, β ∈ Zq* and g, R ∈ G1, for a probabilistic polynomial time (PPT) adversary A_DDH, the advantage of A_DDH in distinguishing whether R is g^{αβ} or a random element is negligible, which can be defined as below (ε is a negligible value): |Pr[A_DDH(g, g^α, g^β, g^{αβ}) = 1] − Pr[A_DDH(g, g^α, g^β, R) = 1]| ≤ ε.

4 Problem Statement 4.1 System Model CL-BOSIC includes five entities: the Key Generation Center (KGC), the user group, the cloud server (CS), the blockchain, and the judiciary. Figure 1 illustrates the relationships and interactions among these entities. The KGC is responsible for generating system parameters and partial private keys. Users in the user group outsource data to the CS. The CS is semi-trusted: it provides data storage services for users but may hide data corruption from users to maintain its reputation. The blockchain is delegated by users to check the data possession of the CS periodically and to retrieve e-evidence metadata during the forensic process. The judiciary obtains the metadata and verification results about e-evidence from the blockchain during the forensic process. We also prevent the leakage of any private information about users and their data during the integrity verification and metadata retrieval processes. 4.2 Design Goals The design goals of CL-BOSIC are as follows: • Soundness. The CS can pass the verification only if the user's data is complete. • Perfect Privacy Preservation. In the whole process, zero leakage of users' data and identity information to the verifier must be realized.


• Non-interactive. The CS and the verifier do not need to interact with each other during the auditing process. • Fair-payment. Only if the CS stores the user's data correctly will it obtain the reward from the user; if the data is corrupted, the CS will pay compensation to the user. • Public Verifiability. Everyone can verify the integrity of the data based on the proof generated by the CS, or directly obtain the verification result. • Certificateless. The scheme avoids the security risks of the KGC and the overhead of certificate management. 4.3 Security Definition We consider four types of probabilistic polynomial time (PPT) adversaries, namely A1, A2, A3 and A4. • Type-I Adversary (A1): A1 (malicious outsider) tries to substitute the user's public key with another value, but A1 cannot access the master secret key (msk). • Type-II Adversary (A2): A2 (malicious KGC) tries to mount an impersonation attack having access to the master secret key (msk) of the system, but it cannot replace the user's public key. • Type-III Adversary (A3): A3 (untrusted CS) tries to forge the auditing proof to pass the integrity verification initiated by the verifier. • Type-IV Adversary (A4): A4 (curious verifier) tries to gain access to private information about users' data and identities during the audit process. We define the security model of our scheme by constructing three games between A1, A2, A3 and a challenger C, respectively. The details are as follows: Game 1 (played between adversary A1 and challenger C): Setup: Initially, C executes the Setup algorithm to obtain the msk and params, and then forwards params to A1 while keeping msk secret. Oracles: • Partial Private Key Oracle: On input of a query on the user's identity ID, C runs the PartialKeyGen algorithm to obtain the partial private key of ID and returns it to A1. • UserKeyGen Oracle: On input of a query on the user's identity ID, C runs the UserKeyGen algorithm to generate the secret value and the public key for ID, and then returns them to A1. • Public Key Replacement: A1 replaces the user's public key with a value of his choice.


• Tag Gen Oracle: A1 adaptively chooses the tuple (ID, m) and submits it to C. C runs the TagGen algorithm to generate the tag σ of m, and then returns it to A1. Forge: Finally, A1 outputs {ID*, m*, σ*} as its forgery with identity ID*. If the forged tag σ* is valid after the above queries, A1 is regarded as winning this game. Game 2 (played between adversary A2 and challenger C): Setup: Initially, C executes the Setup algorithm to obtain the msk and params, and then forwards params and msk to A2. Oracles: • UserKeyGen Oracle: On input of a query on the user's identity ID, C runs the UserKeyGen algorithm to generate the secret value and the public key for ID, and returns them to A2. • Tag Gen Oracle: A2 adaptively chooses the tuple (ID, m) and submits it to C. C runs the TagGen algorithm to generate the tag σ of m, and then returns it to A2. Forge: Finally, A2 outputs {ID*, m*, σ*} as its forgery with identity ID*. If the forged tag σ* is valid after the above queries, A2 is regarded as winning this game. Game 3 (played between adversary A3 and challenger C): Setup: Initially, C generates the msk, params, and partial private keys for users, and then forwards only params to A3. Tag Gen Oracle: A3 adaptively chooses the tuple (ID, m) and submits it to C to query the tag of m. C runs the TagGen algorithm to generate the tag σ of m and returns it to A3. Challenge: C generates a random challenge chal for A3 and requests A3 to provide a corresponding data possession proof P for chal. Forge: For chal, A3 generates a proof P and sends it to C. If P can pass the verification process while A3 does not possess the correct data, A3 wins the game. Definition 1. A CL-BOSIC scheme is secure against adaptive impersonation and tag forging attacks if any PPT adversary A (A1, A2 or A3) who plays the above games with the challenger C has only a negligible probability ε of winning the games: Pr(A wins) ≤ ε, in which the probability is taken over all coin tosses made by A and C. Definition 2. A CL-BOSIC scheme achieves perfect privacy if the adversary A4 cannot obtain any information about users' data and identities during the interaction with the cloud server and the user.


5 Framework 5.1 Algorithm Specification We suppose the number of users is z, id_i represents the unique identity of the user u_i (1 ≤ i ≤ z), and the judiciary is u_j. The outsourced data CT is split into n blocks, denoted as CT = {m_j | 1 ≤ j ≤ n, m_j ∈ Zq*}. We employ a short signature and a randomization method to produce the partial private key in the PartialKeyGen algorithm. We use certificateless cryptography [19] to construct the tags in the TagGen algorithm and use certificateless public key encryption with keyword search [21] to achieve accurate requests for e-evidence in the forensic process. We design the interaction between the CS and the blockchain with zero knowledge proof [20]. The details of the proposed protocol are as follows:
• Setup(1^λ) → (params, msk, δ). On input of a security parameter λ, KGC chooses two cyclic multiplicative groups G1 and G2 with prime order q, log2 q ≤ λ. g is a generator of G1. There exists a bilinear map e : G1 × G1 → G2. KGC selects four secure hash functions H1, H2 : {0, 1}* → G1, H3 : {0, 1}* → Zp*, H4 : G2 → {0, 1}^l. KGC initializes a public log file LF, which is used to record the indexes of the data blocks and the corresponding tag generators. KGC randomly chooses Sm, δ ∈ Zp*, where Sm is the master secret key and δ is an anonymous secret value. Then, KGC computes Ppub = g^{Sm}. KGC keeps the master secret key msk and δ private, and publishes the system parameters params = {G1, G2, Ppub, H1, H2, H3, H4, e, g, p, LF}.
• PartialKeyGen(params, msk, id) → psk. Upon receiving an identity id_i (of a user or the judiciary), KGC computes the partial private key psk_{id_i} = H1(id_i + δ)^{Sm} and sends psk_{id_i} and H1(id_i + δ) to the entity through a secure channel.
• UserKeyGen(params, psk) → (SK, PK). The user u_i or the judiciary u_j randomly selects S_{id_i} ∈ Zp* as the secret value and keeps it private, and uses the secret value to compute the public key PK_{id_i} = g^{S_{id_i}}. The private key of the user or the judiciary is SK_{id_i} = (psk_{id_i}, S_{id_i}), and the public key is PK_{id_i} = g^{S_{id_i}}.
• EncData(D, key, PK_j) → (CT, CT_key). The user u_i randomly selects key and encrypts the original data D with a symmetric encryption algorithm (e.g., AES) to obtain the ciphertext CT. Then u_i encrypts key with the judiciary's public key PK_j to obtain the ciphertext of the symmetric encryption key, CT_key. CT_key will be sent to the blockchain and will be returned to the judiciary as the search result when the judiciary performs the keyword search successfully.
• TagGen(SK_u, CT) → σ. As mentioned earlier, the message CT is split into n blocks, and the user u_i generates an authentication tag for each data block m_j, 1 ≤ j ≤ n. The algorithm takes u_i's private key SK_{id_i} = (psk_{id_i}, S_{id_i}) and a data block m_j as inputs and outputs the tag σ_j of m_j. The equation for computing the tag is σ_j = psk_{id_i}^{m_j} · H2(ω_j)^{S_{id_i}}, in which ω_j = j||fname, j is the index of data block m_j, and fname denotes the unique identity of the outsourced data. Each time u_i generates a tag for a data block m_j, u_i updates the information in the public log file LF with the file name fname, the index j of m_j, S_{id_i}, and H1(id_i + δ). Actually, LF is a table, and one line of it can be shown as follows:


Besides, u_i randomly selects t ∈ Zp and generates three parameters, namely, T1 = g^t, T2 = e(H1(id_i + δ), Ppub)^t, and T3 = {H2(j||fname)^t}_{j∈J}. These three parameters are used in the process of proof generation and verification.
• EncInd(params, id_j, PK_j, id_u, SK_u, W) → C_W.
o Extract a set of keywords W = {ω_i ∈ ψ | 1 ≤ i ≤ m}, where ψ represents the collection of all keywords and m represents the number of keywords.
o Select r ∈ Zp* randomly.
o Compute C1 = g^r, C2 = (H1(id_i + δ)^{Sm})^{1/S_{id_i}}, and C3 = (H1(id_i + δ)/H1(id_j))^r.
o Compute E_i = (g^{H3(ω_i)} · PK_{id_j})^{r/S_{id_i}} if the data file contains keyword ω_i; otherwise set E_i = 1.
o Set the encrypted indexes C_W = (C1, C2, C3, E1, . . . , Em).
• ProofGen(seed) → proof. The CS takes the nonce of the latest block in the blockchain

as seed and calculates the challenge chal = {(i, v_i)}_{i∈I}, v_i ←R Zp, I ⊂ [0, n). Then the CS computes μ = ∏_{j∈J} T2^{v_j m_j}, σ = ∏_{j∈J} σ_j^{v_j}, and proof = H4(e(σ, T1) · μ^{-1}), and sends proof and chal to the blockchain.
• ProofCheck(proof, PK_u, T3) → (True or False). Upon receiving the proof from the CS, the blockchain first searches the public log file LF to get the information (j, PK_{id_i}) and checks Eq. (1):
proof = H4(∏_{j∈J, j→i} e(T3_j^{v_j}, PK_{id_i}))   (1)
in which j → i means that the information of u_i can be found in the public log file LF by the index j of data block m_j. If the equation holds, the blockchain accepts the proof; otherwise, the proof is invalid.



• Trapdoor(params, SK_j, PK_u, W') → T_W'. The judiciary first produces the trapdoor by performing the following steps:
o Choose a set of search keywords W' = {ω'_i ∈ ψ | 1 ≤ i ≤ l}, where l represents the number of search keywords, and select r ∈ Zp* randomly.
o Compute N1 = g^{r·S_{id_j}} · H1(id_j + δ)^{Sm·r}, N2 = Ppub^r, and N3 = ∏_{τ=1}^{l} (g^{H3(ω'_τ)} · PK_{id_D})^r. Here id_D is the identity of the user.


o Set the trapdoor T_W' = (N1, N2, N3). Then, the judiciary sends T_W' to the blockchain to search the metadata.
• Search(params, C_W, T_W') → (res or ⊥). After receiving the search request, the blockchain searches the metadata by performing the following steps:
o Compute σ1 = e(C2 · ∏_{τ=1}^{l} E_{τ→i}, N3), σ2 = e(N1, C1), and σ3 = e(C3, N2), where τ → i represents the mapping between the subscripts of the search keywords W' and the encrypted keywords W.
o Check σ1 = σ2 · σ3. If it holds, the search result res is returned to the judiciary. Otherwise, ⊥ is returned. The search result res contains CT_key.
• DecData(CT, CT_key, SK_j) → D. Upon receiving the search result, the judiciary fetches the data CT from the CS and decrypts CT_key with S_{id_j}. Finally, the judiciary decrypts CT with key and obtains the original data D.
5.2 Our Construction 5.2.1 Outsource Data Preprocess. Firstly, the user randomly selects a key as the data encryption key and runs the EncData algorithm to obtain the ciphertext CT of the data and the ciphertext CT_key of the data encryption key. Secondly, the user runs the TagGen algorithm with the ciphertext CT and the private key SK_u to generate the tags σ of CT. Besides, the user generates the secret values T1, T2, T3 that are used to verify the integrity of CT.

Fig. 2. The smart contracts used in our scheme.

Thirdly, the user runs the EncInd algorithm with a set of keywords W related to CT to generate the encrypted indexes C_W for CT. Upload Data to CS. After preprocessing, the user uploads CT, the tags σ, and the secret values T1, T2 to the CS, and sends the secret value T3 to the blockchain for data integrity checking. Upon receiving CT and σ from the user, the CS first checks the validity of each tag using Eq. (2):
e(σ_j, g) = e(H1(id_i + δ)^{m_j}, Ppub) · e(H2(ω_j), PK_{id_i})   (2)
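As a quick consistency check, Eq. (2) follows directly from the TagGen equation σ_j = psk_{id_i}^{m_j} · H2(ω_j)^{S_{id_i}} with psk_{id_i} = H1(id_i + δ)^{Sm}, using only the bilinearity of e:

```latex
\begin{aligned}
e(\sigma_j, g) &= e\!\left(H_1(id_i+\delta)^{S_m m_j}\cdot H_2(\omega_j)^{S_{id_i}},\, g\right)\\
 &= e\!\left(H_1(id_i+\delta)^{m_j},\, g^{S_m}\right)\cdot e\!\left(H_2(\omega_j),\, g^{S_{id_i}}\right)\\
 &= e\!\left(H_1(id_i+\delta)^{m_j},\, P_{pub}\right)\cdot e\!\left(H_2(\omega_j),\, PK_{id_i}\right).
\end{aligned}
```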


Deploy Smart Contracts. If all tags are valid, the CS deploys the smart contract C1 to the blockchain, which is responsible for compensating the user when the proof provided by the CS does not pass the integrity check. Meanwhile, after uploading the data, the user deploys the smart contracts C0, C2 and C3 to the blockchain. C0 is used to pay remuneration to the CS after the integrity check is passed. C2 is used to check the proof generated by the CS; if the check passes, C2 calls C0 to pay remuneration, otherwise C2 calls C1 to pay compensation. C3 is used to search the data with the trapdoor and return the search result to the judiciary. The smart contracts are shown in Fig. 2. 5.2.2 Remote Data Integrity Audit Periodically Proof Generation. The CS uses the nonce of the latest block in the blockchain as the seed and runs the ProofGen algorithm to generate the proof. Then the CS calls the smart contract C2 to complete the remote data integrity check. Proof Verification. Upon receiving the proof from the CS, the smart contract C2 checks the correctness of the proof with the ProofCheck algorithm. 5.2.3 Forensics Trapdoor Generation. The judiciary runs the Trapdoor algorithm with the correct keywords of the data to generate the trapdoor for searching the metadata. Then, the judiciary calls the smart contract C3 to search the corresponding data with the trapdoor. Search. C3 runs the Search algorithm to search the data with the trapdoor. If there is data matching the trapdoor, C3 returns the metadata (i.e., the data indexes and the ciphertext CT_key of the data encryption key) to the judiciary; otherwise, C3 returns ⊥. C3 deducts the query fee from the judiciary's deposit for each search. Decrypt Data. After obtaining the metadata, the judiciary first fetches the ciphertext CT of the original data from the CS. Then, the judiciary decrypts CT_key with its private key SK_j and obtains the data encryption key key. Finally, the judiciary decrypts CT with key and obtains the original data D.
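To make the interplay of C0, C1, C2 and C3 concrete, the following is a minimal illustrative sketch of the fair-payment control flow described above. It is not the authors' Fabric chaincode (the prototype is implemented in Java with JPBC); all names and amounts below are illustrative assumptions, and the cryptographic proof check is abstracted as a boolean callback.

```python
# Illustrative sketch (assumed names, not the paper's chaincode) of the
# fair-payment flow among contracts C0-C3 from Sect. 5.2.
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

@dataclass
class Ledger:
    user_deposit: float
    cs_deposit: float
    judiciary_deposit: float
    records: List[str] = field(default_factory=list)   # stands in for on-chain records

def c0_pay_remuneration(ledger: Ledger, amount: float) -> None:
    ledger.user_deposit -= amount
    ledger.cs_deposit += amount
    ledger.records.append(f"C0: paid {amount} remuneration to CS")

def c1_pay_compensation(ledger: Ledger, amount: float) -> None:
    ledger.cs_deposit -= amount
    ledger.user_deposit += amount
    ledger.records.append(f"C1: paid {amount} compensation to user")

def c2_audit(ledger: Ledger, proof_check: Callable[[], bool],
             reward: float, penalty: float) -> bool:
    ok = proof_check()                      # ProofCheck on the proof sent by the CS
    if ok:
        c0_pay_remuneration(ledger, reward)
    else:
        c1_pay_compensation(ledger, penalty)
    ledger.records.append(f"C2: audit result = {ok}")
    return ok

def c3_search(ledger: Ledger, matched: bool, metadata: Tuple[str, str],
              query_fee: float) -> Optional[Tuple[str, str]]:
    ledger.judiciary_deposit -= query_fee   # query fee deducted per search
    ledger.records.append("C3: search executed")
    return metadata if matched else None

# Usage sketch:
#   ledger = Ledger(user_deposit=100, cs_deposit=100, judiciary_deposit=100)
#   c2_audit(ledger, proof_check=lambda: True, reward=1.0, penalty=10.0)
```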

6 Security Analysis 6.1 Correctness The correctness of our scheme can be demonstrated as follows:
proof = H4(e(σ, T1) · μ^{-1})
      = H4( e(σ, T1) / ∏_{j∈I} T2^{v_j m_j} )
      = H4( e(σ, T1) / ∏_{j∈I, j→i} e(H1(ID + δ)^{m_j}, Ppub)^{t v_j} )
      = H4( e(∏_{j∈I} σ_j^{v_j}, T1) / ∏_{j∈I, j→i} e(psk_ID^{m_j}, T1)^{v_j} )
      = H4( ∏_{j∈I, j→i} e(σ_j^{v_j}, T1) / e(psk_ID^{m_j}, T1^{v_j}) )
      = H4( ∏_{j∈I, j→i} e(σ_j / psk_ID^{m_j}, T1^{v_j}) )
      = H4( ∏_{j∈I, j→i} e(H2(ω_j)^{y_ID}, g^{t v_j}) )
      = H4( ∏_{j∈I, j→i} e(H2(ω_j)^{v_j}, g^{t y_ID}) )
      = H4( ∏_{j∈I, j→i} e(H2(ω_j)^{v_j}, Y_ID^t) )
where y_ID denotes the user's secret value S_{id_i} and Y_ID = g^{y_ID} = PK_{id_i}, so the final expression coincides with the right-hand side of Eq. (1).


6.2 Soundness We prove that CL-BOSIC is secure against the adversaries defined in Sect. 4.3. Theorem 1. In the random oracle model, if A1 wins Game 1 with a non-negligible probability ε, then we could construct an algorithm B that simulates a challenger C to solve the CDH problem with a non-negligible probability. Proof. Algorithm B is given g, g^a, g^b ∈ G1; it simulates the challenger C and interacts with A1 as follows. Setup: B generates params and the secret value δ ∈ Zq*, and sets Ppub = g^a, keeping a secret. Initially, B maintains two hash lists LH1 and LH2 and a partial private key list Lpsk. These lists are all empty initially. H1-Query: If A1 makes an H1-Query with ID, B checks whether (ID, k1, K) ∈ LH1. If it holds, B returns K to A1. Otherwise, B chooses a random k1 ∈ Zq* and computes K = g^{b k1}. Then B adds (ID, k1, K) to LH1 and returns K to A1. Partial-Private-Key-Query: If A1 makes a Partial-Private-Key-Query with identity ID, B checks whether (ID, k1, K) ∈ LH1. If ID is not in LH1, B first makes the H1-query. Then B checks whether (ID, psk) ∈ Lpsk. If ID ∈ Lpsk, B returns psk to A1. Otherwise, B computes psk = K^a, adds the tuple (ID, psk) to Lpsk and then returns psk to A1. UserKeyGen-Query: If A1 makes a UserKeyGen-Query with identity ID, B checks whether (ID, k1, K) ∈ LH1 and (ID, psk) ∈ Lpsk. If not, B first makes the H1-Query or Partial-Private-Key-Query with ID. Then B chooses a random value x ∈ Zq*, computes PK = g^x, and returns x and PK to A1. H2-Query: If A1 makes an H2-Query with ω, B checks whether (ω, W) ∈ LH2. If it holds, B returns W to A1. Otherwise, B randomly chooses k2 ∈ Zq* and computes W = g^{k2}. B adds the tuple (ω, W) to LH2 and returns W to A1. Tag-Query: If A1 makes a Tag-Query with (m, ID), B checks whether ID ∈ LH1, ID ∈ Lpsk and ω ∈ LH2. If not, B makes the corresponding queries and updates LH1, Lpsk and LH2 respectively. After that, B runs the TagGen algorithm to compute the tag T for (ω, m, ID) using the corresponding information in LH1, Lpsk and LH2, and then returns T to A1. Forge: Finally, A1 outputs its forgery (ID, K, psk, x, PK, W, m, T). T is the forged tag of data m under the identity ID with the public key PK. Analysis: If A1 wins Game 1, B can get e(T, g) = e(H1(ID + δ)^m, Ppub) · e(H2(ω), PK) according to the verification Eq. (2). Meanwhile, B can retrieve H1(ID + δ) = g^{b k1} from LH1 and H2(ω) = g^{k2} from LH2. Thus, B gets e(T, g) = e(g^{b k1 m}, g^a) · e(g^{k2}, PK). Eventually, we can derive that
g^{ab} = (T / PK^{k2})^{1/(k1 m)}


which means that B solves the CDH problem with a non-negligible probability ε. Therefore, A1 cannot win Game 1 under the CDH assumption. Theorem 2. In the random oracle model, if A2 wins Game 2 defined in Sect. 4.3 with a non-negligible probability ε, then we could construct an algorithm B that simulates a challenger C to solve the CDH problem with a non-negligible probability. Proof. Algorithm B is given g, g^a, g^b ∈ G1; it simulates the challenger C and interacts with A2. The interaction process between A2 and C can be derived from Theorem 1 based on the differences between Game 1 and Game 2 defined in Sect. 4.3. Forge: After the interaction, A2 outputs its forgery (ID, m, ω, T). T is the forged tag of data m under the identity ID. Analysis: If A2 wins Game 2, B can get e(T, g) = e(H1(ID + δ)^m, Ppub) · e(H2(ω), Y) according to the verification Eq. (2). Meanwhile, B can retrieve H1(ID + δ) = g^{l1} from LH1, Y = g^{a l2} from Lval and H2(ω) = g^{b l3} from LH2. Thus, B gets e(T, g) = e(g^{l1 m}, g^s) · e(g^{b l3}, g^{a l2}). Eventually, we can derive that
g^{ab} = T^{1/(l1 l2 l3 s m)}
which means that B solves the CDH problem with a non-negligible probability ε. Therefore, A2 cannot win Game 2 under the CDH assumption. Theorem 3. If the DL assumption holds, the adversary A3 wins Game 3 only with a negligible probability.

Proof. Let the challenge be chal = {Q, T1, T2}. If A3 outputs a forged proof' and wins Game 3 with a non-negligible probability, we can get the verification equation
proof' = H4(e(σ', T1) · μ'^{-1}) = H4( ∏_{j∈I} e(H2(j||fname)^{v_j}, Y_ID^t) )
in which σ' is the forged tag for the forged data block m' and μ' is produced by A3. Assume that the valid proof is proof and the corresponding information is (σ, μ). Then we can also get the verification equation
proof = H4(e(σ, T1) · μ^{-1}) = H4( ∏_{j∈I} e(H2(j||fname)^{v_j}, Y_ID^t) )
Thus, we can derive the following equation from the above two equations:
e(σ, T1) / μ = e(σ', T1) / μ'
Because A3 wins Game 3, there exists σ' = σ while at least one data block m'_j ≠ m_j. Suppose that m'_j − m_j = Δm_j. Then we have μ'/μ = ∏_{j∈I} T2^{v_j m'_j} / ∏_{j∈I} T2^{v_j m_j} = 1, from which we can get ∏_{j∈I} T2^{v_j Δm_j} = 1. Based on this conclusion, the DL problem can be solved


Fig. 3. The overhead of indexes encryption and trapdoor generation process

Fig. 4. The time cost of preprocessing. The number of keywords is indicated after EncInd.

as follows: Given two elements g, y ∈ G1 in which y = g^a, we will compute a ∈ Zq*. We randomly choose α_j, β_j ∈ Zq* and let T2^{v_j} = X_j = g^{α_j} y^{β_j}. We can get the following equation:
∏_{j∈I} T2^{v_j Δm_j} = ∏_{j∈I} X_j^{Δm_j} = ∏_{j∈I} (g^{α_j} y^{β_j})^{Δm_j} = g^{∑_{j∈I} α_j Δm_j} · y^{∑_{j∈I} β_j Δm_j} = 1
Then we can derive y = g^{−∑_{j∈I} α_j Δm_j / ∑_{j∈I} β_j Δm_j}. Since Δm_j ≠ 0 and β_j is a random value from Zq*, the probability that ∑_{j∈I} β_j Δm_j = 0 is only 1/q. Therefore, we can output the right value with a non-negligible probability 1 − 1/q. Theorem 4. If the DDH assumption holds, the keyword search in our scheme is semantically secure against inside keyword guessing attacks in the random oracle model. Proof. The proof process is the same as that of the scheme in [24], so we do not repeat it here due to space limitations.

6.3 Perfect Privacy Preservation Data Privacy Preservation. In the auditing process, the verifier obtains proof = H4(e(σ, T1) · μ^{-1}), in which the values related to the user's data (σ and μ) are all


hidden by a hash function. The verifier just needs to check Eq. (1) without knowing any information about the user's data and tags. Furthermore, users encrypt their data before uploading it to the CS, so the verifier cannot decrypt the original data even if it obtains data during auditing. User Identity Privacy Preservation. A log file is designed to maintain the indexes of data blocks and the information of the tag generators, including H1(id_i + δ). The user identity is randomized by δ ∈ Zp*, which is kept secret by the KGC. So it is impossible for the verifier to obtain the user's real identity information. Table 1. Time cost of the proof generation and verification process in different cases.

Process / stage | 100 blocks | 200 blocks | 400 blocks | 600 blocks | 800 blocks | 1000 blocks (time cost in ms)
Proof generation: ChalGen | 16 | 19 | 17.6 | 18.6 | 20.4 | 22
Proof generation: RespGen | 625.8 | 1219.2 | 2390.2 | 3517 | 4655.1 | 5817.4
Proof generation: Total | 641.8 | 1238.2 | 2407.8 | 3535.6 | 4675.5 | 5839.4
Verification: off-chain | 474.4 | 910.6 | 1811.6 | 2714.2 | 3582.8 | 4467.2
Verification: on-chain | 625.8 (+32%) | 1219.2 (+34%) | 2390.2 (+32%) | 3517 (+30%) | 4655.1 (+30%) | 5817.4 (+30%)

7 Implementation and Evaluation We utilize Java to implement a prototype of CL-BOSIC on Ubuntu 18.04 with an Intel(R) Core(TM) i7-8750H CPU @ 2.20 GHz and 12 GB memory. We adopt GMP and JPBC for big integer and pairing operations, and utilize Hyperledger Fabric as the distributed agent. We choose the Type-A elliptic curve with a 160-bit order, which provides a symmetric pairing with the fastest speed among all default parameters and has a security level equivalent to 1024-bit RSA. Performance of Indexes Encryption and Trapdoor Generation. We evaluate the time overhead of the indexes encryption and trapdoor generation processes with different numbers of keywords. These operations are only performed once in the whole lifecycle of the data. As shown in Fig. 3, in the case with 20 keywords, the time cost of both processes is no more than 500 ms, which is acceptable in actual usage scenarios. Performance of Preprocessing. As shown in Fig. 4, the time cost of preprocessing increases with the number of keywords. Since the tag generation and indexes encryption processes are only executed once during the whole lifecycle of the data, and the maximum time


cost in our experiments does not exceed 900 ms, the preprocessing overhead of our scheme is acceptable. Performance of Proof Generation. The proof generation on the CS includes two phases: challenge generation (denoted as ChalGen in Table 1) and response generation (denoted as RespGen in Table 1). In the challenge generation phase, the CS utilizes the nonce of the latest block in the blockchain to generate the indexes of the sampling blocks. In the response generation phase, the CS utilizes the indexes of the sampling blocks and the tags of these blocks to generate the response that is sent to the blockchain. As shown in Table 1, the total time cost increases with the number of sampling blocks. According to [1], as a balance between security and overhead, generally 460 sampling blocks are selected for verification. According to the results, when 460 blocks are sampled, the overhead of proof generation is about 2500 ms. We consider this overhead acceptable. Performance of Verification. To evaluate the additional overhead brought by the blockchain in the verification process, we calculate the time cost of performing the same verification algorithm on the blockchain (denoted as on-chain in Table 1) and on the host (denoted as off-chain in Table 1). We evaluate the time cost of verification with different numbers of sampling blocks. The percentage overhead relative to the off-chain case is shown next to the corresponding on-chain value. According to the results, due to the consensus mechanism of the blockchain, the time cost of the on-chain case is greater than that of the off-chain case. The additional overhead is about 30%, which is acceptable.

8 Conclusion We apply PDP to the judicial scenario and construct an e-evidence storage and integrity check scheme to solve the problems of e-evidence depositing and authentication. CL-BOSIC introduces the blockchain as a distributed agent, uses smart contracts deployed on the chain to create a fair and reciprocal environment, and builds a non-interactive auditing framework to avoid the communication overhead between the verifier and the CS. Meanwhile, CL-BOSIC allows the judiciary to make precise requests for e-evidence through keyword search, reduces the cost of certificate management, and avoids the security risks of the KGC by utilizing certificateless cryptography. Finally, we provide a security analysis and performance evaluation of CL-BOSIC, which indicate that our proposal is secure and incurs tolerable overheads for each party.

References 1. Ateniese, G., Burns, R., Curtmola, R., et al.: Provable data possession at untrusted stores. In: Proceedings of the 14th ACM Conference on Computer and Communications Security, pp. 598–609 (2007) 2. Wang, C., Wang, Q., Ren, K., et al.: Privacy-preserving public auditing for data storage security in cloud computing. In: 2010 Proceedings IEEE Infocom, pp. 1–9. IEEE (2010) 3. Yang, K., Jia, X.: An efficient and secure dynamic auditing protocol for data storage in cloud computing. IEEE Trans. Parallel Distrib. Syst. 24(9), 1717–1726 (2012)


4. Li, J., Zhang, L., Liu, J.K., et al.: Privacy-preserving public auditing protocol for lowperformance end devices in cloud. IEEE Trans. Inf. Forensics Secur. 11(11), 2572–2583 (2016) 5. Hao, Z., Zhong, S., Yu, N.: A privacy-preserving remote data integrity checking protocol with data dynamics and public verifiability. IEEE Trans. Knowl. Data Eng. 23(9), 1432–1437 (2011) 6. Yu, Y., et al.: Enhanced privacy of a remote data integrity-checking protocol for secure cloud storage. Int. J. Inf. Secur. 14(4), 307–318 (2014). https://doi.org/10.1007/s10207-014-0263-8 7. Yu, Y., et al.: Identity-based remote data integrity checking with perfect data privacy preserving for cloud storage. IEEE Trans. Inf. Forensics Secur. 12, 767–778 (2017) 8. Wang, H.: Identity-based distributed provable data possession in multicloud storage. IEEE Trans. Ser. Comput. 8, 328–340 (2015) 9. Boneh, D.; Franklin, M.: Identity-based encryption from the weil pairing. In: Proceedings of the Annual International Cryptology Conference, Santa Barbara, CA, USA, 19–23 August 2001; pp. 213–229 (2001) 10. Liu, X., Zhang, Y., Wang, B., et al.: Mona: secure multi-owner data sharing for dynamic groups in the cloud. IEEE Trans. Parallel Distrib. Syst. 24(6), 1182–1191 (2012) 11. Wang, B., Li, B., Li, H.: Panda: public auditing for shared data with efficient user revocation in the cloud. IEEE Trans. Serv. Comput. 8(1), 92–106 (2013) 12. Al-Riyami, S.S., Paterson, K.G.: Certificateless public key cryptography. Asiacrypt. 2894, pp. 452–473 (2003) 13. Nakamoto, S.: Bitcoin: A peer-to-peer electronic cash system (2008) 14. McCorry, P., Shahandashti, S.F., Hao, F.: A smart contract for boardroom voting with maximum voter privacy. In: Financial Cryptography and Data Security: 21st International Conference, FC 2017, Sliema, Malta, April 3-7, 2017, Revised Selected Papers 21. Springer International Publishing, pp. 357–375 (2017) 15. Dong, C., Wang, Y., Aldweesh, A., et al.: Betrayal, distrust, and rationality: Smart countercollusion contracts for verifiable cloud computing. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 211–227 (2017) 16. Christidis, K., Devetsikiotis, M.: Blockchains and smart contracts for the internet of things. IEEE Access 4, 2292–2303 (2016) 17. Gudeme, J.R., Pasupuleti, S., Kandukuri, R.: Certificateless privacy preserving public auditing for dynamic shared data with group user revocation in cloud storage. J. Parallel Distributed Comput. 156, 163–175 (2021) 18. Zhang, J., Cui, J., Zhong, H., Gu, C., Liu, L.: Secure and efficient certificateless provable data possession for cloud-based data management systems. In: Jensen, C.S., et al. (eds.) DASFAA 2021. LNCS, vol. 12681, pp. 71–87. Springer, Cham (2021). https://doi.org/10.1007/978-3030-73194-6_5 19. Al-Riyami, S.S.; Paterson, K.G.: Certificateless public key cryptography. In Proceedings of the International Conference on the Theory and Application of Cryptology and Information Security, Taipei, Taiwan, 30 November–4 December 2003, pp. 452–473 20. Fiege, U., Fiat, A., Shamir, A.: Zero knowledge proofs of identity. In: Proceedings of the Nineteenth Annual ACM Symposium on Theory of computing, pp. 210–217 (1987) 21. Yanguo, P., Jiangtao, C., Changgen, P., et al.: Certificateless public key encryption with keyword search. China Commun. 11(11), 100–113 (2014) 22. Song, D.X., Wagner, D., Perrig, A.: Practical techniques for searches on encrypted data. In: Proceeding 2000 IEEE Symposium on Security and Privacy. S&P 2000, pp. 
44–55. IEEE (2000) 23. Boneh, D., Di Crescenzo, G., Ostrovsky, R., Persiano, G.: Public key encryption with keyword search. In: Cachin, C., Camenisch, J.L. (eds.) EUROCRYPT 2004. LNCS, vol. 3027, pp. 506– 522. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24676-3_30


24. Yang, X., Chen, G., Wang, M., et al.: Multi-keyword certificateless searchable public key authenticated encryption scheme based on blockchain. IEEE Access 8, 158765–158777 (2020) 25. Liu, B., Yu, X.L., Chen, S., et al.: Blockchain based data integrity service framework for IoT data. In: 2017 IEEE International Conference on Web Services (ICWS). IEEE, pp. 468–475 (2017) 26. Shen, W., Qin, J., Yu, J., et al.: Enabling identity-based integrity auditing and data sharing with sensitive information hiding for secure cloud storage. IEEE Trans. Inf. Forensics Secur. 14(2), 331–346 (2018)

Zeroth-Order Gradient Approximation Based DaST for Black-Box Adversarial Attacks

Yanfei Zhu1, Yaochi Zhao1(B), Zhuhua Hu2, Xiaozhang Liu3, and Anli Yan1

1 School of Cyberspace Security, Hainan University, Haikou, China
[email protected]
2 School of Information and Communication Engineering, Hainan University, Haikou, China
3 School of Computer Science and Technology, Hainan University, Haikou, China

Abstract. In recent years, adversarial attacks have attracted widespread attention in the domain of deep learning. Compared to white-box attacks, black-box attacks can be launched with only output information, which is more realistic under real-world attack conditions. An important means of black-box attack is to train a substitute model of the target model. However, obtaining the training data of the target model is difficult. Previous works use a GAN to accomplish data-free training of the generator and the substitute model. However, the gradient of the generator has no relationship with the target model, which requires massive iterations and limits the Attack Success Rate (ASR). To address this issue, we propose zeroth-order gradient approximation based Data-free Substitute Training (DaST) for black-box adversarial attacks. It estimates the gradient of the target model using forward differences and back-propagates the gradient to the generator, which improves ASR and training efficiency. Four popular auxiliary white-box attack algorithms are used to compare our method with previous works. Experiment results on MNIST and CIFAR-10 demonstrate the higher ASR and the lower training time and memory of our method. Specifically, on MNIST, our method reaches an ASR of 99.87% consuming only 10 min and 1092.56 MB of memory, an increase of 3.51% in ASR, a time reduction of 38 h and a memory decrease of 929.72 MB compared to the state-of-the-art method. Moreover, our method remains effective even when the structure of the substitute model differs from that of the target model. Keywords: Black-Box Adversarial Attacks · Data-Free Substitute Training · Zeroth-Order Gradient Approximation

This work was supported in part by the National Natural Science Foundation of China (62161010, 61963012), the Natural Science Foundation of Hainan Province (623RC446), and the Key Research and Development Project of Hainan Province (ZDYF2022GXJS348, ZDYF2022SHFZ039), China. 1 Introduction Machine learning models are vulnerable to malicious and imperceptible disturbances [1, 2], which makes adversarial attacks and defenses a challenging problem. Adversarial attacks can be divided into white-box attacks [1, 3–7] and black-box attacks


[8–14] according to the access permission granted to attackers. White-box attack methods, including the Fast Gradient Sign Method (FGSM) [1], DeepFool [7], the Jacobian-based Saliency Map Attack (JSMA) [5], etc., generate perturbations by calculating the gradient of the victim model. Therefore, white-box attacks require full knowledge of the target model. In contrast, black-box attacks only require the outputs of the target model, which is more practical under realistic attack conditions. In this paper, we focus on black-box attacks. Training substitute models is a common approach to implementing black-box attacks [8]. The training efficiency of these methods is greatly enhanced by reservoir sampling, which is used to reduce the number of queries [15]. However, previous works require the original training dataset or a partial sub-training set of the target model, which is challenging, resource-intensive and even infeasible in practical scenarios, thereby limiting the practical applicability of this approach. Research in the Data-free Substitute Training (DaST) [16, 17] domain offers a solution by utilizing Generative Adversarial Networks (GANs) [18]. The generator produces synthetic samples and the substitute model mimics the prediction behavior of the target model during training. However, the gradient of the generator has no relationship with the target model, leading to massive iterations, time and memory as well as a limited Attack Success Rate (ASR). To overcome the above shortcoming, we propose zeroth-order gradient approximation based DaST for black-box adversarial attacks. Zeroth-order gradient approximation refers to a derivative-free method for estimating the model gradient in black-box attacks, where only zeroth-order predictions are available. With the convergence of zeroth-order methods proven [19], they have been used to optimize and produce adversarial samples [20]. In this paper, the zeroth-order estimation method is used to approximate the gradient of the target model, and the estimated gradient is back-propagated into the generator for efficient training. In detail, small perturbations are cast onto synthetic samples and forward differences are used to estimate the gradient. After obtaining the optimal substitute model, 4 popular auxiliary white-box attack algorithms are employed, comprising FGSM, the Basic Iterative Method (BIM) [3], Projected Gradient Descent (PGD) [4] and Carlini and Wagner (C&W) [6], to compare our method with previous works. Experiment results on MNIST and CIFAR-10 demonstrate the higher ASR and the lower training time and memory of our method. Moreover, our method remains effective even when the structure of the substitute model differs from that of the target model. Our primary contributions are described in the following: (1) We propose zeroth-order gradient approximation based DaST for black-box adversarial attacks. Specifically, forward differences are utilized to estimate the gradient of the target model and provide valid information about the target model for the training of the generator and the substitute model. (2) Our method achieves higher similarity and ASR while requiring significantly less training time and memory, without requiring consistency between the target model and the substitute model. (3) On MNIST, our method reaches an ASR of 99.87% consuming only 10 min and 1092.56 MB of memory, an increase of 3.51% in ASR, a time reduction of 38 h and a memory decrease of 929.72 MB compared with the state-of-the-art method. Our method outperforms existing methods in DaST-based black-box adversarial attacks.


2 Related Work The vulnerability of machine learning models to intentional and imperceptible perturbations was first discovered by Szegedy et al. [2]. Adversarial attacks against this vulnerability can be divided into white-box attacks, which require full knowledge of the target model, and black-box attacks, which only have access to the input data and output predictions. 2.1 Black-Box Adversarial Attacks Black-box attacks were first introduced by Papernot et al. [8] to describe attack scenarios where adversaries only have access to input data and output predictions. In black-box attacks, adversaries often craft adversarial examples with a local substitute model, relying on the transferability of adversarial examples [1]. To enhance the transferability of malicious samples, Xie et al. [21] propose the Dense Adversary Generation (DAG) algorithm, which combines two or more heterogeneous perturbations. However, low training efficiency and the difficulty of obtaining the original dataset hinder the practicality of these methods. Other research generates adversarial samples by estimating the gradient of the target model. To obtain the estimated gradient, Chen et al. [20] propose zeroth-order optimization and coordinate descent to estimate the gradient and produce adversarial samples simultaneously. Meunier et al. [14] extend such attacks by proposing evolution strategies and other derivative-free optimization methods. 2.2 DaST in Black-Box Adversarial Attacks To overcome the difficulty of obtaining the training dataset of the target model in black-box adversarial attacks, Zhou et al. [16] propose the DaST method, which uses a specially designed multi-branch GAN and a label control loss to train substitute models with evenly generated samples. Yet the complexity of the generator leads to low training efficiency and effectiveness. To solve this problem, Yu et al. [17] use a single-branch generator with an information entropy loss function to correct the uneven distribution of synthetic samples. The upgraded model structure reduces the parameter space and promotes attack performance, but the attack still requires massive training time and computing resources.

3 Zeroth-Order Gradient Approximation Based DaST 3.1 Overview The zeroth-order gradient approximation method is used during the training of the GAN (see Fig. 1). The gradient estimation process is indicated by the red dashed line. In the figure, G is the generator that produces synthetic samples, T is the target model being attacked, and S represents the substitute model which imitates the prediction behavior of T. Data z is random noise while X denotes the synthetic samples used to train the substitute model; LG and LS are the loss values of the generator and the substitute model


Fig. 1. The schematic diagram of zeroth-order gradient approximation based DaST for black-box adversarial attacks.

respectively. ω denotes the auxiliary white-box attack algorithms, and A denotes the ultimate adversarial samples against the target model. Black arrows indicate data streams while red arrows indicate the back-propagation process, among which the red dashed arrows represent the gradient approximation. Our method aims to efficiently launch attacks without accessing the original training data of the target model. During training, G produces synthetic images X from z. Then X is fed into both T and S to calculate the loss values LG and LS until S is well-trained. Finally, the optimal S and the testing set are input into ω to obtain A, which is used to deceive the target model. 3.2 GAN The training process of the GAN can be regarded as a two-player minmax game (see Eq. 1), where the generator tries to maximize the objective function and the surrogate model tries to minimize it. After adversarial training of the GAN, we can reach the ultimate goal of obtaining a substitute model that predicts similarly to the target model.
max_G min_S V(G, S)   (1)
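The following PyTorch-style code is a minimal sketch of the alternating optimization implied by Eq. (1): the substitute model S descends on its loss while the generator G ascends on its objective. It is an illustration, not the authors' implementation; loss_S_fn and loss_G_fn stand for the loss functions of Eqs. (2) and (4) defined below, and all other names are assumed PyTorch objects. In this plain form the generator gradient flows only through S; the zeroth-order estimate of the target-dependent term is sketched after Eq. (6).

```python
import torch

def train_step(G, S, T, opt_G, opt_S, z, loss_S_fn, loss_G_fn):
    """One alternating update of the minmax game in Eq. (1) (illustrative sketch)."""
    # Substitute step: S minimizes its imitation loss on the current synthetic batch.
    x = G(z).detach()
    with torch.no_grad():
        t_out = T(x)                      # black-box target outputs (probabilities)
    loss_S = loss_S_fn(t_out, S(x))
    opt_S.zero_grad()
    loss_S.backward()
    opt_S.step()

    # Generator step: G maximizes its objective, i.e. descends on the negative.
    x = G(z)
    with torch.no_grad():
        t_out = T(x)
    loss_G = loss_G_fn(t_out, S(x))
    opt_G.zero_grad()
    (-loss_G).backward()                  # only opt_G is stepped here
    opt_G.step()
    return loss_S.item(), loss_G.item()
```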

The similarity between the substitute model and the target model is assessed by the L1 distance between their output probabilities, which is used as part of the loss function (see Eq. 2):
L1 = ∑_{i=1}^{n} |Ti(G(z)) − Si(G(z))|   (2)

G(z) represents the synthetic samples that are fed into both the target model T and the substitute model S to obtain the corresponding probability outputs, and L1 is the summation of the probability distances over all classes i. The substitute model gets close to the target model by minimizing L1. To facilitate an even distribution of generated samples, an information entropy loss is computed according to the predicted probability outputs of the target model. Information


entropy describes the uncertainty of various possible events. When the output probability is excessively biased towards a certain category, the value of the information entropy loss function increases, thereby prompting the generator to explore samples of other categories and increasing the probability of generating other types of samples. The information entropy loss is set as another part of the loss function of the generator (see Eq. 3):
Linfo = −H(G(z)) = ∑_{i=1}^{n} Ti(G(z)) log Ti(G(z))   (3)

The loss function of the substitute model is the L1 distance, while the complete loss function of the generator is given by the following equation (see Eq. 4):
LG = L1 + β Linfo   (4)

where β controls the weight of Linfo and its value is set to 1.0. As the substitute model is fully available, back-propagation can be achieved by stochastic gradient descent (SGD). Additionally, the generator is also optimized via the target model, with the zeroth-order gradient approximation method used to estimate and back-propagate the estimated gradient. 3.3 Gradient Approximation The gradient of the generator is formulated as the following equation (see Eq. 5):
∇_{θG} LG = (∂LG/∂x) × (∂x/∂θG)   (5)

The partial derivative of the generator loss function with respect to x is denoted as ∂LG/∂x, and ∂x/∂θG represents the partial derivative of x with respect to the parameters of G. By applying the chain rule, we can compute the gradient of the loss function with respect to the generator parameters. Since we have full access to the generator, we can calculate ∂x/∂θG and achieve optimization by traditional back-propagation. However, the internal information of the target model required to compute the gradient ∇x LG(x) is not available in black-box attacks. Thus, zeroth-order gradient estimation is used to approximate the real gradient of the target model. Specifically, we employ forward differences (see Eq. 6) to estimate the gradient by adding perturbations to the input samples and querying the target model for the difference in loss values.

∇x LG(x) = (1/n) ∑_{i=1}^{n} [(LG(x + δ u_i) − LG(x)) / δ] · u_i,  u_i ∈ S_d   (6)

In the above formula, the generator loss function LG(x) is the objective function, S_d is a unit sphere with the same dimension as x, and u_i is a random variable sampled on the sphere S_d. δ is a small positive constant called the smoothing factor [22]. To obtain a stable and accurate estimate of the gradient, we set n as the number of gradient estimations and take the average over the n estimates as the estimated gradient.
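As an illustration of Eq. (6), the following is a minimal PyTorch-style sketch of the forward-difference estimator (not the authors' code). It assumes a loss_fn that evaluates the generator loss per sample using only black-box queries to the target model; the generator_loss helper in the usage comment is a hypothetical name for that evaluation.

```python
import torch

def zeroth_order_grad(loss_fn, x, n=20, delta=1e-3):
    """Forward-difference estimate of dL_G/dx as in Eq. (6).

    loss_fn(x) must return the per-sample generator loss (shape: (batch,))
    using only black-box queries to the target model; x: (batch, *dims).
    """
    base = loss_fn(x)                               # L_G(x)
    grad = torch.zeros_like(x)
    expand = (-1,) + (1,) * (x.dim() - 1)           # broadcast shape for per-sample scalars
    for _ in range(n):
        u = torch.randn_like(x)
        u = u / u.flatten(1).norm(dim=1).view(*expand)     # u_i on the unit sphere S_d
        diff = (loss_fn(x + delta * u) - base) / delta      # forward difference
        grad = grad + diff.view(*expand) * u
    return grad / n

# Plugging the estimate into the generator update via the chain rule of Eq. (5):
#   x = G(z)
#   g_x = zeroth_order_grad(lambda s: generator_loss(T, S, s), x.detach())
#   opt_G.zero_grad()
#   x.backward(gradient=-g_x)    # ascend on L_G: accumulate -dL_G/dtheta_G, then step
#   opt_G.step()
```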


4 Experiments In this part, we validate the effectiveness of our method. The experiments simulate a real-world environment, assuming that the target model is unknown. The structures of the substitute models are selected in line with the function of the target model, and the datasets used in the experiments are MNIST and CIFAR-10. 4.1 Environment Settings Experiments are conducted on an Ubuntu 22.04 system equipped with a GeForce RTX 3090. The code is developed using torch 1.12.0 and programmed with Python 3.9. To assess the effectiveness of our method, it is compared with two existing approaches: DaST [16] and FE-DaST [17]. DaST first applied the data-free training method to black-box adversarial attacks and serves as our baseline method. FE-DaST is the state-of-the-art method in this domain and is used as a controlled experiment to verify the superiority of our method. The experiments are conducted in the probability-only scenario, where target models output probabilities as feedback for training. 4.2 Model Architecture Generator. We utilize a single-branch generator to produce samples from random noise. The generator consists of several layers, including a fully connected layer for extending the size of the noise vectors, three convolutional layers for extracting potential features, four batch-norm layers for normalization, and two upsampling layers for matching the dimension of the generated samples. The middle layers use LeakyReLU as their activation function, and the last activation function is Tanh. The generator is trained by Adam with an initial learning rate of 5 × 10−4, which is multiplied by 0.3 at 0.1×, 0.3× and 0.5× of the total number of epochs. Target and Substitute Model. Substitute models are trained from scratch and used to generate adversarial examples against the target models. The attack effectiveness of the proposed method can be measured by the experimental evaluation standards. On MNIST, we utilize a pretrained CNN with 4 layers as our target model, while the substitute model has 5 convolution layers. On CIFAR-10, a pretrained VGG-16 is used as our target model and ResNet-18 is chosen as our substitute model. As for the optimizer, we adopt SGD for the substitute models. The initial learning rate is set to 0.1, and a 0.3 descent multiplier is applied at the same percentages of the epoch numbers as for the generator. The weight decay is 5 × 10−4. 4.3 Evaluation Standard To ensure a comprehensive evaluation of our approach, we use several measurement criteria. Firstly, we measure the time and computational memory required in the training process to evaluate training efficiency. Secondly, since the transferability of adversarial samples is the theoretical basis of training substitute models, we calculate the accuracy of


substitute models on the testing set, and the similarity between the predicted probabilities of the target and substitute models. Finally, the attack success rate is the most direct way to compare the attack capability of various approaches, and it is defined as the ratio of the number of successfully misclassified samples to the total number of samples. 4.4 Results Training Time and Memory Consumption. The use of gradient approximation enables our method to train the substitute model with information from the target model, which cuts down the training iterations and the consumption of computing resources. To better quantify the efficiency of our method, we compare the training time and the estimated GPU memory required by the different methods; the results are recorded in Table 1. As shown in Table 1, the proposed method achieves significantly lower time consumption and memory requirements, because the gradient approximation cuts down the number of iterations necessary for training. Thus, our method outperforms the other compared methods on these two criteria. Table 1. Training time and memory requirement of various methods on MNIST and CIFAR-10. Bold indicates better performance.

Datasets | Approach | Training Time | Estimated Memory
MNIST | DaST | 150 h 26 min | 14211.99 MB
MNIST | FE-DaST | 38 h 10 min | 2022.28 MB
MNIST | our method | 10 min | 1092.56 MB
CIFAR-10 | DaST | 268 h 34 min | 26171.82 MB
CIFAR-10 | FE-DaST | 75 h 7 min | 11007.73 MB
CIFAR-10 | our method | 1 h 24 min | 2706.86 MB

Specifically, compared with DaST, the time required for training the substitute model using our method is almost negligible on both MNIST and CIFAR-10 (10 min and 1 h 24 min, respectively). The memory usage of our method also indicates that the proposed method has a low computational cost. Accuracy and Similarity. In this section, we present the accuracy of the substitute models and the similarity between the target and substitute models, which are shown in Table 2. On MNIST, both the accuracy and the similarity of our method are higher than those of DaST and FE-DaST. Our method achieves an accuracy of 99.45%, an improvement of 1.63% over DaST and 1.06% over FE-DaST. Moreover, the similarity of our method outperforms that of the other methods. This is due to the fact that the gradient estimation provides more effective information for training. On CIFAR-10, our method


Table 2. Accuracies (%) of substitute models and similarities (%) between target and substitute models in various methods. The bold indicates better performance.

Datasets    Approach     Accuracies   Similarities
MNIST       DaST         97.82        97.89
            FE-DaST      98.39        99.01
            our method   99.45        99.97
CIFAR-10    DaST         23.44        23.05
            FE-DaST      46.68        46.73
            our method   39.65        64.89

On CIFAR-10, our method obtains a similarity of 64.89%, which is 18.16% higher than the state-of-the-art method. Yet, the number of queries limits the accuracy of the substitute models to some extent.

Attack Success Rate. In this subsection, we input the trained substitute model and the testing set into four distinct white-box attack algorithms, namely FGSM, BIM, PGD, and C&W, to generate adversarial examples and calculate ASRs for the target models. ASRs are recorded in Table 3 and Table 4, and examples of original and non-targeted adversarial images for MNIST and CIFAR-10 are shown in Fig. 2. As shown in Table 3, our method achieves a higher ASR under all four auxiliary attack methods due to the higher accuracy and similarity of its substitute model. For non-targeted attacks, the BIM method provides the highest ASR of 99.87%, which is 3.51% and 5.44% higher than DaST and FE-DaST respectively. The C&W method provides the lowest ASR of 76.74%, which is still about 50% higher than the previous methods. Meanwhile, except for C&W, our method achieves the lowest L2 distance. For targeted attacks, our method achieves the highest ASR of 85.64%, with a considerable advantage over the other methods.

Table 4 shows the ASRs of the different methods on CIFAR-10. For non-targeted attacks, the ASRs of our method are much higher than all previous works. For targeted attacks, our method also obtains significant advantages over DaST and FE-DaST with FGSM, BIM and PGD. Additionally, the L2 distance of our method is smaller than that of the previous methods.

Due to the unknown structure of the target model, we can only choose substitute models functionally similar to the target model. However, even with similar functionalities, there are still multiple substitute model architectures available. In order to evaluate the attack performance of our method under various substitute models, different model structures are used for each of the two datasets, and the ASRs are recorded in Table 5 and Table 6. Table 5 reports the experiments conducted on MNIST, where the target model is a 4-layer convolution network, while a 5-layer convolution network and LeNet5 are chosen as substitute models. As demonstrated in the table, the 5-layer convolution substitute model achieves the highest ASR, followed by LeNet5. As shown in Table 6, on CIFAR-10, the target model is VGG16, and we choose VGG13, ResNet18, and ResNet50 as our substitute models.
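As an illustration of how the substitute model is used at attack time, the sketch below runs non-targeted FGSM on the substitute and measures the ASR on the black-box target. The perturbation budget is an assumed value; the paper's exact attack settings may differ.

```python
import torch
import torch.nn.functional as F

def fgsm_transfer_asr(substitute, target, images, labels, eps=0.3):
    """Non-targeted FGSM computed on the white-box substitute model and
    evaluated on the black-box target. eps is an illustrative budget."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(substitute(images), labels)
    loss.backward()
    adv = (images + eps * images.grad.sign()).clamp(0, 1).detach()
    with torch.no_grad():
        preds = target(adv).argmax(dim=1)
    # ASR: fraction of adversarial samples the target misclassifies
    return (preds != labels).float().mean().item()
```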


Table 3. Attack success rates (%) of various methods on MNIST. The target model is a 4-layer convolution network and the substitute model is a 5-layer convolution network. The L2 distance for each attack is presented in parentheses. The bold and underline indicate better performance, which is higher ASR and lower L2 distance.

ASRs                      DaST          FE-DaST       our method
Non-targeted   FGSM       69.76(5.41)   57.66(5.42)   79.59(5.37)
               BIM        96.36(4.81)   94.43(4.86)   99.87(4.74)
               PGD        53.99(3.99)   66.68(3.98)   79.04(3.95)
               C&W        23.80(2.99)   28.53(2.79)   76.74(2.86)
Targeted       FGSM       23.93(5.45)   15.11(4.53)   28.84(5.47)
               BIM        57.22(4.87)   61.83(4.84)   85.64(4.84)
               PGD        47.57(4.63)   52.11(4.69)   69.50(4.59)
               C&W        23.80(2.99)   26.49(3.01)   61.76(3.02)

Table 4. Attack success rates (%) of different methods on CIFAR-10. The target model is VGG16 and the substitute model is ResNet18. The L2 distance for each attack is presented in parentheses. The bold and underline indicate better performance, which is higher ASR and lower L2 distance.

ASRs                      DaST          FE-DaST       our method
Non-targeted   FGSM       36.63(1.65)   58.01(1.65)   90.23(0.85)
               BIM        39.67(1.08)   41.58(1.04)   100.00(0.61)
               PGD        29.75(1.08)   41.66(1.04)   100.00(0.52)
               C&W        16.47(6.20)   49.34(5.26)   62.85(0.90)
Targeted       FGSM       4.69(1.65)    14.97(1.65)   16.08(0.10)
               BIM        1.01(1.08)    5.03(1.04)    76.98(0.26)
               PGD        1.11(1.08)    10.82(1.04)   52.39(0.52)
               C&W        6.46(6.26)    28.70(6.83)   15.87(0.43)

All of the substitute models exhibit good attack performance. Among them, VGG13 and ResNet18 achieve better results due to their structural similarity to the target model.


Table 5. Attack success rates (%) of our method with different substitute models on MNIST. The target model is a 4-layer convolution network and the substitute models are a 5-layer convolution network and LeNet5. L2 distances for every attack are provided in parentheses, with bold and underline indicating superior performance, which means higher ASR and lower L2 distance.

ASRs                      5-layer conv   LeNet5
Non-targeted   FGSM       79.59(5.37)    67.63(5.36)
               BIM        99.87(4.74)    93.61(4.77)
               PGD        79.04(3.95)    46.17(3.95)
               C&W        76.74(2.86)    22.89(2.76)
Targeted       FGSM       28.84(5.47)    23.05(5.46)
               BIM        85.64(4.84)    49.49(4.87)
               PGD        69.50(4.59)    37.86(4.62)
               C&W        61.76(3.01)    16.52(3.01)

Table 6. Attack success rates (%) of our method with different substitute models on CIFAR-10. The target model is VGG16 and the substitute models are VGG13, ResNet18 and ResNet50. L2 distances for every attack are provided in parentheses, with bold and underline indicating superior performance, which is higher ASR and lower L2 distance.

ASRs                      VGG13          ResNet18       ResNet50
Non-targeted   FGSM       91.50(0.85)    90.23(0.85)    88.71(0.84)
               BIM        100.00(0.61)   100.00(0.61)   99.98(0.67)
               PGD        100.00(0.50)   100.00(0.52)   99.99(0.52)
               C&W        35.93(0.39)    62.85(0.90)    35.75(0.38)
Targeted       FGSM       13.74(0.84)    16.08(0.10)    13.65(0.85)
               BIM        65.36(0.52)    76.98(0.26)    73.88(0.53)
               PGD        55.21(0.50)    52.39(0.52)    54.95(0.50)
               C&W        20.05(0.44)    15.87(0.43)    19.38(0.43)


Fig. 2. Visualization of clean and adversarial images generated by the proposed method on MNIST (left) and CIFAR-10 (right).

5 Conclusion

The vulnerability of machine learning models to adversarial attacks remains a notable research topic, which has prompted us to explore innovative and effective approaches. In this work, we proposed a zeroth-order gradient approximation based DaST for black-box adversarial attacks. Experimental results on the MNIST and CIFAR-10 datasets demonstrate the effectiveness of our method, with higher ASR as well as lower training time and memory, even when the structure of the substitute model differs from that of the victim model. In the future, we will study diffusion models to obtain adversarial samples more effectively.

References 1. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014) 2. Szegedy, C., et al.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013) 3. Kurakin, A., Goodfellow, I.J., Bengio, S.: Adversarial examples in the physical world. In: Yampolskiy, R.V. (ed.) Artificial Intelligence Safety and Security, pp. 99–112. Chapman and Hall/CRC, New York (2018) 4. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017) 5. Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z.B., Swami, A.: The limitations of deep learning in adversarial settings. In: 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 372–387. IEEE, New York (2016) 6. Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE, New York (2017) 7. Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.: Deepfool: A simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574–2582. IEEE, New York (2016)


8. Papernot, N., Mcdaniel, P., Goodfellow, I., Jha, S., Celik, Z.B., Swami, A.: Practical blackbox attacks against machine learning. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 506–519. ACM, New York (2016) 9. Narodytska, N., Kasiviswanathan, S.P.: Simple black-box adversarial attacks on deep neural networks. In: CVPR Workshops, vol. 2, pp. 6–14. IEEE, New York (2017) 10. Xiao, C., Li, B., Zhu, J.Y., He, W., Liu, M., Song, D.: Generating adversarial examples with adversarial networks. arXiv preprint arXiv:1801.02610 (2018) 11. Chen, J., Jordan, M.I., Wainwright, M.J.: Hopskipjumpattack: a query-efficient decisionbased attack. In: 2020 IEEE Symposium on Security and Privacy (SP), pp. 1277–1294. IEEE, New York (2020) 12. Fu, Q.A., Dong, Y., Su, H., Zhu, J., Zhang, C.: AutoDA: Automated decision-based iterative adversarial attacks. In: 31st USENIX Security Symposium (USENIX Security 22), pp. 3557– 3574. USENIX Association, Boston, MA (2022) 13. Li, X.C., Zhang, X.Y., Yin, F., Liu, C.L.: Decision-based adversarial attack with frequency mixup. IEEE Trans. Inf. Forensics Secur. 17, 1038–1052 (2022) 14. Meunier, L., Atif, J., Teytaud, O.: Yet another but more efficient black-box adversarial attack: tiling and evolution strategies. arXiv preprint arXiv:1910.02244 (2019) 15. Ghadimi, S., Lan, G.: Stochastic first-and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4), 2341–2368 (2013) 16. Zhou, M., Wu, J., Liu, Y., Liu, S., Zhu, C.: Dast: Data-free substitute training for adversarial attacks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 234–243. IEEE, New York (2020) 17. Yu, M., Sun, S.: FE-DaST: Fast and effective data-free substitute training for black-box adversarial attacks. Comput. Secur. 113, 102555 (2021) 18. Goodfellow, I., et al.: Generative adversarial nets. In: Neural Information Processing Systems. MIT Press, Cambridge (2014) 19. Papernot, N., McDaniel, P., Goodfellow, I.: Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277 (2016) 20. Chen, P.Y., Zhang, H., Sharma, Y., Yi, J., Hsieh, C.J.: Zoo: zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 15–26. ACM, New York (2017) 21. Xie, C., Wang, J., Zhang, Z., Zhou, Y., Xie, L., Yuille, A.: Adversarial examples for semantic segmentation and object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1369–1378. IEEE, New York (2017) 22. Kariyappa, S., Prakash, A., Qureshi, M.K.: Maze: Data-free model stealing attack using zeroth-order gradient estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13814–13823. IEEE, New York (2021)

A Dynamic Resampling Based Intrusion Detection Method

Yaochi Zhao1, Dongyang Yu1, and Zhuhua Hu2(B)

1 School of Cyberspace Security, Hainan University, Haikou, China
2 School of Information and Communication Engineering, Hainan University, Haikou, China
[email protected]

Abstract. With the development of Internet of Things (IoT), IoT security becomes a focal point of attention. Intrusion detection is one of the key technologies for protecting IoT security. In real IoT environments, there always exists serious class imbalance in traffic data, which likely causes the poor performance of machine learning models for intrusion detection. An effective solution is using resampling methods. However, the current resampling methods are inherently static because they don’t change the class distribution during the whole training process, which may lead to the overfitting of models. To address the problem, we propose a dynamic resampling method, where the attenuation rate of oversampling is introduced. Our method dynamically adjusts the sizes of classes and the distribution of dataset during training. It can be easily implemented on the existing resampling methods and alleviate the overfitting problem. In addition, to improve the feature extraction ability of model and further boost the classification performance, we propose a parallel-residual feature fusion module (PRFF). PRFF uses a residual structure including two parallel convolutional layers to extract multiple residual features, and fuse them with the original data features. The experiment results on CICIDS2017 and NSL-KDD demonstrate that our method performs better than the existing methods. In particular, on CICIDS2017, our method improves F1-Score by 0.3% and Macro F1-Score by 0.4% over the best existing approach. On NSL-KDD, our method also boosts the performances with 2.6% F1-Score and 4.1% Macro F1-Score, respectively. Keywords: Intrusion detection · Class Imbalanced · Resampling · Oversampling

1 Introduction

(This work was supported in part by the National Natural Science Foundation of China (62161010, 61963012), the Natural Science Foundation of Hainan Province (623RC446), and the Key Research and Development Project of Hainan Province (ZDYF2022GXJS348, ZDYF2022SHFZ039), China. The authors would like to thank the referees for their constructive suggestions.)

In recent years, with the rapid development of Internet of Things (IoT) technologies, the number of IoT devices has increased rapidly. Due to the limitations of IoT devices in terms of information transmission, performance, and other aspects, these devices are at


high risk of being attacked by hackers. Therefore, how to improve the performance of intrusion detection and ensure the security of IoT devices gradually becomes a research hot spot. Intrusion detection is one of the key technologies for protecting IoT security. It can detect abnormal device access or user behavior, and then protect the IoT system through access denial, issuing alarms, etc. Currently, many research works are carried out to develop better intrusion detection models. In these works, intrusion detection is treated as classification problems. Machine learning models, such as K-nearest neighbors (KNNs) [13], random forest (RF) [4], and support vector machine (SVM) [24], are widely used. Specially, deep learning models, such as convolutional neural network (CNN) [7] and recurrent neural network (RNN) [21], gradually attract the attention of researchers. In real-world IoT environments, there always exists a significant difference in the number of normal and abnormal flows, with normal flows being the vast majority. Therefore, the collected data is also highly imbalanced. This can adversely affect the performance of machine learning models, causing the models to be biased towards normal flows. To solve the above problem, an effective solution is to use resampling methods. Resampling methods contains undersampling and oversampling. Since undersampling methods delete some samples, they may undesirably lead to the loss of important information. Oversampling methods generates more samples of minority classes to balance the proportion of samples from different classes. For conventional oversampling methods, oversampling occurs before training, and once training begins, the distribution of classes will be never changed. This may lead to some negative effects, such as the model focusing too much on the minority class and reducing the classification accuracy of the majority class. Even the latest work [1, 15, 17, 25] still uses static oversampling to deal with data imbalance. In this paper, we propose a dynamic resampling method. It dynamically controls the attenuation rate of oversampling to adjust the sample numbers of classes and the distribution of dataset during training. By this way, the overfitting problem caused by the constant class distribution in the existing methods can be alleviated. Moreover, our dynamic resampling method can be easily implemented on the existing resampling methods. To improve the capability of feature extraction and further improve classification performance, we propose parallel-residual feature fusion (PRFF) module, which together with the subsequent fully connected layers forms a complete neural network. PRFF contains two parallel 1D convolutional layers for extracting features in different dims. To provide more information for multi-classification, PRFF extracts residual features through its residual connection and then fuses the extracted features with original features. In summary, there are the following contributions in this paper: • A novel dynamic resampling method is proposed, which dynamically adjusts the attenuation rate of oversampling to change distribution of dataset during training process. It can be easily implemented on the existing resampling methods and alleviate the overfitting problem. • A PRFF module is proposed, which uses a residual structure including two parallel convolutional layers to extract multiple residual features, and fuse them with the


original data features. PRFF can improve the ability of features extraction and multiclassification performance. • We evaluate our method by comparing it with the several newest intrusion detection approaches on CICIDS2017 and NSL-KDD. On CICIDS2017, our method improves F1-Score by 0.3% and Macro F1-Score by 0.4% over the best existing approach. On NSL-KDD, our method also boosts the performance with 2.6% F1-Score and 4.1% Macro F1-Score, respectively. The remaining structure of the paper is arranged as follows. In Sect. 2, we review related work on intrusion detection and class imbalance. Then, we provide a detailed introduction to the dynamic resampling method and our model with PRFF in Sect. 3. The details and results of experiments are presented in Sect. 4. Finally, we draw conclusions.

2 Related Work 2.1 Intrusion Detection At present, intrusion detection technology is mainly divided into two categories: featurebased intrusion detection and anomaly-based intrusion detection. The former detects intrusion by matching the features of current flow with the features of previous intrusions and vulnerabilities, which has low performance. The latter detects any behavior that deviates from the learned normal data pattern as an anomaly, and it has better detection performance for new attacks. Anomaly-based intrusion detection mainly uses machine learning to learn abnormal mode. Today, various machine learning methods are used for intrusion detection, such as KNN [13], SVM [24], and random forests [4], which utilize labeled training data to train classifiers to classify network flows. In recent years, deep learning models such as CNN [2, 3], RNN [21], long short-term memory (LSTM) [6], and other networks are used for intrusion detection [23]. Elsherif et al. [6] combine RNN, bidirectional RNN, LSTM, and bidirectional LSTM to establish a model for binary and multi-class intrusion detection. Andresini et al. [3] combine nearest neighbor search with clustering algorithm, transform input data into 2D representation, and then perform binary classification through CNN. Different from the above works, considering the performance and efficiency of intrusion detection, we propose an 1D CNN based PRFF module to improve the performance of features extraction and further improve performance of multi-classification. 2.2 Class Imbalance In the real IoT environment, normal flows account for the vast majority, that is, they exist a serious class imbalance in the data used to train model for intrusion detection. To address the problem, two approaches are studied by researchers, algorithm-level methods and data-level methods. Algorithm-level methods are mainly cost-sensitive methods. Cost-sensitive learning methods incorporate class or sample costs into the objective function during the training process. Cost parameters can be arranged in the form of a cost matrix such that higher costs are associated with misclassification of samples from the minority class. Khan et al.


[11] propose a cost-sensitive method in which they simultaneously optimize the model parameters and cost parameters. In the field of computer vision, the locally sensitive loss function proposed by Lin et al. [12] for object detection attracts widespread attention. Data-level methods balance the dataset by changing the class distribution in the original data using resampling methods. Resampling methods include oversampling and undersampling. Undersampling methods that discard samples from the majority class lose valuable information and are not feasible when the class imbalance is too high. Recent work uses oversampling methods such as ROS [1, 5, 20], SMOTE [8, 25], ADASYN [17], etc., to address the problem of data imbalance. A new oversampling method for computer vision is proposed by [15], which transfers the context information of the majority class to the minority class samples, and is experimentally proven to be effective. Islam et al. [10] propose a novel oversampling method that uses KNearest Neighbor to find critical regions for data augmentation, and demonstrate good performance on multiple imbalanced image datasets. However, the current resampling methods are all static strategies, meaning that once resampling is completed, the data distribution remains constant throughout the training process, which causes the model’s emphasis on each class to remain constant, resulting in the limitation of generalization ability. In contrast, we propose a generalized dynamic resampling method to solve the problem of class imbalance.

3 Methodology

3.1 Dynamic Resampling

Traditional resampling methods only sample once before training and do not change the distribution of the dataset throughout training. They are classified as static resampling methods in this paper. These methods are likely to cause overfitting and limit the model performance. We focus on oversampling and propose a dynamic oversampling method. Let S_j and S'_j denote the oversampling number for class j in the static and dynamic oversampling methods, respectively. S_j is a constant, which is typically set to the sample number of the class with the largest size. In contrast, S'_j varies with the current epoch, defined as:

$$S'_j(t) = \alpha_j(t) \times S_j \tag{1}$$

where t is the current epoch during training, and α_j is the attenuation rate of oversampling for class j.

Attenuation Rate of Oversampling. The attenuation rate α_j is used to adjust the distribution of the dataset during training, and thereby alleviate the overfitting of the model. We define α_j as a cosine function that varies with the current epoch t. It is formulated as follows:

$$\alpha_j(t) = \frac{1}{2}\bigl[\cos(\beta(t)\pi) + 1\bigr] \tag{2}$$

where β is the progress rate of training, which is in the range of [0, 1], defined by:

$$\beta(t) = \frac{\lceil t/T \rceil}{\lceil M/T \rceil} \tag{3}$$

where M is a hyperparameter representing the total number of epochs, and T is the period of dynamic oversampling, that is, the attenuation rate is adjusted every T epochs. ⌈t/T⌉ reflects how many times dynamic oversampling has been performed during training, and ⌈M/T⌉ is the expected number of times dynamic oversampling is performed in the entire process. To adjust the output range of the cosine function, we add 1 to its result and multiply the output by 1/2, so that the function value is always a non-negative number in the range of [0, 1].

In Eq. (1), α_j decreases over time, and S'_j gradually decreases until it equals 0, which causes the occurrence of undersampling in the later period of training. Therefore, we incorporate the original sample number N_j of class j into Eq. (1). Specifically, we set N_j as the lower bound of S'_j, and control S'_j to vary from S_j to N_j, as shown in Eq. (4):

$$S'_j(t) = \alpha_j(t) \times (S_j - N_j) + N_j = \alpha_j(t) \times S_j + \bigl(1 - \alpha_j(t)\bigr) \times N_j \tag{4}$$

where S_j and S'_j denote the oversampling number for class j in the static and dynamic oversampling methods, and α_j(t) is defined by Eq. (2).

Fig. 1. Changing curve of α. The X-axis is the progress rate of training β, and the Y-axis is the attenuation rate of oversampling α. α^l and β^u work together to limit the range of α.

For those classes with extremely small sizes and high complexities, it is not suitable to decrease S'_j to N_j during the later training phase, because it may cause the underfitting of the model. To solve the problem, we introduce two hyperparameters, α^l and β^u, to limit the ranges of α and β, respectively, as shown in Fig. 1 and Eq. (5):

$$\alpha(t) = \max\left\{\frac{1}{2}\left[\cos\left(\min\left(\frac{\lceil t/T \rceil}{\lceil M/T \rceil}, \beta^u\right)\pi\right) + 1\right], \alpha^l\right\} \tag{5}$$

where α^l is the lower bound of α and α is limited to the range [α^l, 1]. When α goes down to α^l, it will no longer change. β^u is the upper bound of β and β is limited to the range [0, β^u] to finally influence the value of α. The introduction of β^u is to more flexibly control when to stop dynamic oversampling.
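A minimal sketch of Eqs. (2)-(5) is given below. Ceiling rounding of t/T and M/T is assumed, and the default β^u and α^l follow the ablation setting reported in Sect. 4; these are assumptions of the sketch, not claims about the paper's exact implementation.

```python
import math

def attenuation_rate(t, T, M, beta_u=0.2, alpha_l=0.8):
    """Attenuation rate alpha(t) of Eq. (5): a cosine decay over the
    training progress, with the progress rate capped at beta_u and
    alpha bounded below by alpha_l."""
    beta = min(math.ceil(t / T) / math.ceil(M / T), beta_u)   # Eq. (3), capped
    return max(0.5 * (math.cos(beta * math.pi) + 1.0), alpha_l)

def dynamic_oversampling_size(t, T, M, S_j, N_j, beta_u=0.2, alpha_l=0.8):
    """Target number of samples for class j at epoch t, Eq. (4):
    interpolates between the static oversampling size S_j and the
    original class size N_j as alpha decays."""
    a = attenuation_rate(t, T, M, beta_u, alpha_l)
    return round(a * S_j + (1.0 - a) * N_j)
```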


3.2 Parallel-Residual Feature Fusion (PRFF)

In order to better extract features from multiple dimensions, we propose a parallel-residual feature fusion module (PRFF) based on residual connections and parallel convolutional layers, which together with the subsequent fully connected layers forms a complete neural network, as shown in Fig. 2.

Fig. 2. Classification model with PRFF. Abbreviations: CONV = convolution; FC = fully-connected layer.

PRFF contains two parallel 1D convolutional layers for extracting different types of features. Moreover, to prevent the feature extraction module from losing important information in the original data, the features generated by the parallel convolutional layers are fused with the original data features. That is, PRFF extracts multiple residual features and fuses them with the original data features. Although PRFF is simple to implement, it is efficient in the intrusion detection task, as demonstrated by our experiments.

As shown in Fig. 2, the proposed PRFF includes two branches of 1D convolutional layers with a kernel size of 1 and output lengths of 128 and 64, respectively. The output of PRFF is then successively passed through three fully connected layers with output lengths of 1024, 512 and n (the number of classes in the dataset), respectively. The activation function of the hidden layers is the rectified linear unit (ReLU), and the final output layer uses Softmax for multi-class classification. The feature extraction process using 1D convolutional operations can be represented as:

$$a_j^{l+1}(\tau) = \sigma\Bigl(\sum_{f=1}^{F^l} K_{jf}^l(\tau) \times a_f^l(\tau) + b_j^l\Bigr) \tag{6}$$

where the feature map j of layer l is represented by a_j^l(τ), the non-linear function is represented by σ, the number of feature maps of layer l is represented by F^l, the convolution kernel is represented by K_{jf}^l, and the bias vector is represented by b_j^l.

Finally, the trained classifier can be used as a detector for intrusion detection. The implementation procedure of the overall framework is shown in Table 1. Equation (7) uses T(n) to describe the computational complexity of our procedure. Note that the computational complexity of dynamic resampling is equal to O(1) and thus has no impact on the final complexity:

$$T(n) = O(f(n)) + O(c) = O(f(n)) \tag{7}$$

where T(n) represents the total computational complexity, O(f(n)) represents the computational complexity of the regular training process, and O(c) represents the computational complexity of dynamic resampling. Importantly, c is a constant that equals the number of training epochs.
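A minimal PyTorch sketch of the classification model with PRFF described above (Fig. 2) is given below. The stated output lengths of 128 and 64 are interpreted as output channels, concatenation is assumed as the fusion operation, and the input feature length is an assumption.

```python
import torch
import torch.nn as nn

class PRFFClassifier(nn.Module):
    """Sketch of the Fig. 2 model: two parallel 1-D convolutions
    (kernel size 1, 128 and 64 channels) whose outputs are fused with
    the original features, then three FC layers (1024, 512, n) with
    ReLU hidden activations and a Softmax output."""
    def __init__(self, in_len, n_classes):
        super().__init__()
        self.branch1 = nn.Conv1d(1, 128, kernel_size=1)
        self.branch2 = nn.Conv1d(1, 64, kernel_size=1)
        fused_dim = in_len * (128 + 64) + in_len      # conv features + raw features
        self.fc = nn.Sequential(
            nn.Linear(fused_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, n_classes),
        )

    def forward(self, x):                             # x: (batch, in_len)
        u = x.unsqueeze(1)                            # (batch, 1, in_len)
        f1 = torch.relu(self.branch1(u)).flatten(1)
        f2 = torch.relu(self.branch2(u)).flatten(1)
        fused = torch.cat([f1, f2, x], dim=1)         # residual fusion with the raw input
        return torch.softmax(self.fc(fused), dim=1)   # multi-class probabilities
```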


Table 1. Intrusion detection procedure of the proposed dynamic resampling method.

Require: training dataset, testing dataset, model, loss function, learning rate, the number of static resampling, training epochs, batch size
Ensure: the classification information of the testing dataset
1: Preprocess the dataset: complete missing values, label-encode discrete features, and reshape the input vector into a matrix.
2: repeat
3:   for the number of training epochs do
4:     Compute the attenuation rate α according to Eq. (5).
5:     Compute the target sampling number according to Eq. (4).
6:     Execute resampling on the dataset to obtain the resampled dataset.
7:     for the number of mini-batches do
8:       Input a mini-batch to the classifier network to get the prediction.
9:       Compute the loss.
10:      Back-propagate to optimize the classifier.
11:    end for
12:  end for
13: until the loss reaches convergence
14: Input the testing dataset to the trained classifier to get the predicted class.
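The per-epoch resampling of steps 4-6 can be sketched as follows, assuming plain random oversampling (ROS) as the underlying resampler. The attenuation rate α is the value given by Eq. (5), e.g. computed with the sketch shown after that equation.

```python
import random
from collections import defaultdict

def dynamic_ros_indices(labels, alpha):
    """Steps 5-6 of the procedure: random oversampling of each class to
    the dynamic target size of Eq. (4). `alpha` is the attenuation rate
    of Eq. (5); plain index-based ROS is assumed as the resampler."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    S = max(len(v) for v in by_class.values())   # static target: size of the largest class
    indices = []
    for idx in by_class.values():
        target = round(alpha * S + (1.0 - alpha) * len(idx))   # Eq. (4)
        extra = max(target - len(idx), 0)
        indices += idx + random.choices(idx, k=extra)
    random.shuffle(indices)
    return indices                               # indices of the resampled training set

# Example: labels = [0, 0, 0, 0, 0, 1, 1]; dynamic_ros_indices(labels, alpha=0.8)
# oversamples class 1 toward round(0.8 * 5 + 0.2 * 2) = 4 samples.
```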


4 Experiments

We describe the datasets in Sect. 4.1 and the implementation details of the experiments in Sect. 4.2. The results of the experiments are shown in Sect. 4.3.

4.1 Datasets

CICIDS2017 is released by the Canadian Institute for Cyber-Security at the University of New Brunswick [19]. CICIDS2017 contains benign traffic and the latest common attacks, similar to real-world data. Different attack types are distributed across multiple dataset files. Since CICIDS2017 does not provide predefined training and test sets, we adopt the data collection method used in [7]. The number of samples collected for each category of CICIDS2017 is reported in Table 2.

Table 2. Distribution of network flows in each attack category in CICIDS2017.

Class Category      Count    Percentage (%)
Benign              440031   44.1076
DoS Hulk            231073   23.1622
PortScan            158930   15.9308
DDoS                128027   12.8331
DoS GoldenEye       10293    1.0317
FTP-Patator         7938     0.7957
SSH-Patator         5897     0.5911
DoS slowloris       5796     0.5810
DoS Slowhttptest    5499     0.5512
Web Attack          2180     0.2185
Bot                 1966     0.1971
Total               997630   100

NSL-KDD is improved from the KDD-CUP 1999 dataset [22] by removing some issues and unnecessary features, making it more suitable for evaluating machine learning algorithms. As shown in Table 3, NSL-KDD includes normal flows and four types of attack flows: DoS attacks, probe attacks, R2L attacks, and U2R attacks. Recent studies consider it an effective benchmark dataset that helps researchers compare different network intrusion detection approaches.

4.2 Implementation Details

On CICIDS2017, we adopt the Adam algorithm with a learning rate of 2e-4 to optimize the parameters, with 200 training epochs and a batch size of 256.

Table 3. Distribution of network flows in each attack category in NSL-KDD.

Class Category   Train Count   Train Percentage (%)   Test Count   Test Percentage (%)
Normal           67343         53.4583                9711         43.0758
Dos              45927         36.4578                7458         33.0820
Probe            11656         9.2528                 2421         10.7390
R2L              995           0.7898                 2754         12.2161
U2R              52            0.0413                 200          0.8871
Total            125973        100                    22544        100

On NSL-KDD, we optimize the parameters using the Adam algorithm with a learning rate of 1e-2, 100 training epochs, and a batch size of 16. The cross entropy is used as the loss function. Our method is implemented in the PyTorch deep learning framework, and the experiments are conducted on a computer with an Intel i9-10900X CPU, 64 GB of RAM and an NVIDIA GeForce RTX 3090 GPU.

4.3 Experimental Results

We compare our approach with commonly used intrusion detection methods, which can be divided into two categories: traditional machine learning methods and deep learning methods. In our experiments, the traditional machine learning methods include KNN, Multinomial Naive Bayes (Multinomial NB), SVM and RF, and the deep learning methods include Deep Neural Network (DNN) and Deep Belief Networks (DBNs). Table 4 shows the experimental results.

As can be seen from Table 4, Multinomial NB achieves very low performance metrics on CICIDS2017, which is due to the strong correlation of sample attributes in the dataset. Compared to the existing methods, our method has the same or higher Accuracy and the highest F1-Score and Macro F1-Score. On CICIDS2017, our method exceeds DNN (the best among the compared methods) by 0.3% in F1-Score and 0.4% in Macro F1-Score. On NSL-KDD, our method exceeds SVM (the best among the compared methods) by 1.6% in Accuracy, 2.6% in F1-Score, and 4.1% in Macro F1-Score.

Ablation Experiments. We implement our dynamic resampling method on the existing common oversampling methods (ROS, SMOTE and ADASYN), and compare the static oversampling methods with the corresponding dynamic oversampling methods on NSL-KDD, as shown in Table 5. Obviously, our dynamic strategy has better performance than the static strategy. For fairness, all of these experiments are performed on the same model with the PRFF module.

To analyze the effectiveness of PRFF in our model, we compare the model with PRFF and the model without PRFF, both using the dynamic ROS method on NSL-KDD.
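For reference, the three reported metrics can be computed as sketched below. The use of scikit-learn and the weighted average for the plain F1-Score are assumptions; the paper does not state its metric implementation.

```python
from sklearn.metrics import accuracy_score, f1_score

def report_metrics(y_true, y_pred):
    """Accuracy, F1-Score and Macro F1-Score as reported in Tables 4-6.
    The weighted average for 'F1-Score' is an assumption."""
    return {
        "Accuracy": accuracy_score(y_true, y_pred),
        "F1-Score": f1_score(y_true, y_pred, average="weighted"),
        "Macro F1-Score": f1_score(y_true, y_pred, average="macro"),
    }
```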


Table 4. Performance comparison of different classification methods on CICIDS2017 and NSL-KDD. All compared methods employ static random oversampling, while our method uses dynamic random oversampling with PRFF as the feature extraction module. (%)

Datasets      Methods               Accuracy↑   F1-Score↑   Macro F1-Score↑
CICIDS2017    KNN [13]              99.8        98.2        98.4
              Multinomial NB [14]   72.5        52.5        45.9
              RF [4]                99.8        98.2        98.0
              SVM [16, 24]          94.6        95.8        81.7
              DNN [18]              99.8        98.5        98.4
              DBN [9]               99.8        98.5        98.4
              Ours                  99.8        98.8        98.8
NSL-KDD       KNN [13]              74.9        60.6        52.1
              Multinomial NB [14]   65.8        58.8        48.8
              RF [4]                73.8        58.4        48.1
              SVM [16, 24]          77.6        60.4        56.9
              DNN [18]              73.6        58.9        53.0
              DBN [9]               76.5        58.9        54.8
              Ours                  79.2        63.0        61.0

Table 5. Performance comparison between static resampling and dynamic resampling on NSL-KDD. Here, the dynamic resampling is set with β^u = 0.2 and α^l = 0.8. (%)

Methods          Strategy   Accuracy↑   F1-Score↑   Macro F1-Score↑
ROS [1, 5, 20]   Static     78.1        60.6        57.4
                 Dynamic    79.2        63.0        61.0
SMOTE [8, 25]    Static     77.8        61.9        57.7
                 Dynamic    80.2        66.2        63.2
ADASYN [17]      Static     77.3        61.3        57.2
                 Dynamic    81.0        64.8        62.3

As can be seen in Table 6, the performance of the model without PRFF is much lower than that of the model with PRFF. The model with PRFF obtains 2.6% higher Accuracy, 4.5% higher F1-Score and 8% higher Macro F1-Score than the model without PRFF.

Table 6. Performance comparison between using PRFF and not using PRFF. (%)

Methods                 Accuracy↑   F1-Score↑   Macro F1-Score↑
Ours (not using PRFF)   76.6        58.5        53.0
Ours (using PRFF)       79.2        63.0        61.0

5 Conclusion

To address the overfitting problem of traditional static resampling methods during training, we proposed a novel dynamic resampling method that modifies the distribution of the dataset by adjusting the attenuation rate of oversampling. The dynamic resampling method is easy to deploy on the existing resampling methods. In addition, to further improve classification performance, we proposed a PRFF module. Experiment results demonstrate that both the dynamic resampling and the PRFF module can improve classification performance. In the future, we will investigate the combination of dynamic oversampling and undersampling to better address the class imbalance problem.

References 1. Almuayqil, S.N., Humayun, M., Jhanjhi, N., Almufareh, M.F., Javed, D.: Framework for improved sentiment analysis via random minority oversampling for user tweet review classification. Electronics 11(19), 3058 (2022) 2. Andresini, G., Appice, A., Caforio, F.P., Malerba, D., Vessio, G.: Roulette: a neural attention multi-output model for explainable network intrusion detection. Expert Syst. Appl. 201, 117144 (2022) 3. Andresini, G., Appice, A., Malerba, D.: Nearest cluster-based intrusion detection through convolutional neural networks. Knowl.-Based Syst. 216, 106798 (2021) 4. Balyan, A.K., et al.: A hybrid intrusion detection model using ega-pso and improved random forest method. Sensors 22(16), 5986 (2022) 5. Buda, M., Maki, A., Mazurowski, M.A.: A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw. 106, 249–259 (2018) 6. Elsherif, A.: Automatic intrusion detection system using deep recurrent neural network paradigm. J. Inf. Secur. Cybercrimes Res. 1(1), 21–31 (2018) 7. Fernando, K.R.M., Tsokos, C.P.: Dynamically weighted balanced loss: class imbalanced learning and confidence calibration of deep neural networks. IEEE Trans. Neural Networks Learn. Syst. 33(7), 2940–2951 (2021) 8. Gonzalez-Cuautle, D., et al.: Synthetic minority oversampling technique for optimizing classification tasks in botnet and intrusion-detection-system datasets. Appl. Sci. 10(3), 794 (2020) 9. Huda, S., Miah, S., Yearwood, J., Alyahya, S., Al-Dossari, H., Doss, R.: A malicious threat detection model for cloud assisted internet of things (cot) based industrial control system (ics) networks using deep belief network. J. Parallel Distrib. Comput. 120, 23–31 (2018) 10. Islam, A., Belhaouari, S.B., Rehman, A.U., Bensmail, H.: Knnor: an oversampling technique for imbalanced datasets. Appl. Soft Comput. 115, 108288 (2022) 11. Khan, S.H., Hayat, M., Bennamoun, M., Sohel, F.A., Togneri, R.: Cost-sensitive learning of deep feature representations from imbalanced data. IEEE Trans. Neural Networks Learn. Syst. 29(8), 3573–3587 (2017)


12. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017) 13. Liu, G., Zhao, H., Fan, F., Liu, G., Xu, Q., Nazir, S.: An enhanced intrusion detection model based on improved knn in wsns. Sensors 22(4), 1407 (2022) 14. Panda, M., Abraham, A., Patra, M.R.: Discriminative multinomial naive bayes for network intrusion detection. In: 2010 Sixth International Conference on Information Assurance and Security, pp. 5–10. IEEE (2010) 15. Park, S., Hong, Y., Heo, B., Yun, S., Choi, J.Y.: The majority can help the minority: Contextrich minority oversampling for long-tailed classification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6887–6896 16. Ponmalar, A., Dhanakoti, V.: An intrusion detection approach using ensemble support vector machine based chaos game optimization algorithm in big data platform. Appl. Soft Comput. 116, 108295 (2022) 17. Rani, M., Kaur, G., et al.: Designing an efficient network-based intrusion detection system using an artificial bee colony and adasyn oversampling approach. In: Machine Learning for Edge Computing, pp. 169–186. CRC Press (2023) 18. Saba, T., Rehman, A., Sadad, T., Kolivand, H., Bahaj, S.A.: Anomaly-based intrusion detection system for iot networks through deep learning model. Comput. Electr. Eng. 99, 107810 (2022) 19. Sharafaldin, I., Lashkari, A.H., Ghorbani, A.A.: Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp 1, 108–116 (2018) 20. Sharma, S., Gosain, A., Jain, S.: A review of the oversampling techniques in class imbalance problem. In: International Conference on Innovative Computing and Communications: Proceedings of ICICC 2021, vol. 1, pp. 459–472. Springer (2022) 21. Tang, T.A., Mhamdi, L., McLernon, D., Zaidi, S.A.R., Ghogho, M.: Deep recurrent neural network for intrusion detection in SDN-based networks. In: 2018 4th IEEE Conference on Network Softwarization and Workshops (NetSoft), pp. 202–206. IEEE (2018) 22. Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A.A.: A detailed analysis of the KDD cup 99 data set. In: 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, pp. 1–6. IEEE (2009) 23. Thakkar, A., Lohiya, R.: A survey on intrusion detection system: feature selection, model, performance measures, application perspective, challenges, and future research directions. Artif. Intell. Rev. 55(1), 453–563 (2021). https://doi.org/10.1007/s10462-021-10037-9 24. Tian, Y., Mirzabagheri, M., Bamakan, S.M.H., Wang, H., Qu, Q.: Ramp loss one-class support vector machine; a robust and effective approach to anomaly detection problems. Neurocomputing 310, 223–235 (2018) 25. Zhang, Y., Liu, Q.: On IoT intrusion detection based on data augmentation for enhancing learning on unbalanced samples. Futur. Gener. Comput. Syst. 133, 213–227 (2022)

Information Security Protection for Online Education Based on Blockchain

Yanran Feng(B)

College of Information Engineering, Zhengzhou University of Finance and Economics, Zhengzhou, China
[email protected]

Abstract. With the development of information technology, the world has entered the information age. Network technology has gradually enriched people's learning methods while changing their lives. Online education has emerged, greatly promoting people's thirst for knowledge. However, at the same time, incidents such as user information security breaches and copyright infringement occur frequently, and effective protection for user information security is lacking. This article analyzes the information security issues faced by online education, reviews the application of blockchain technology in teaching at home and abroad, and explores the effectiveness and feasibility of blockchain in online education, in order to better protect the information security of online education users and fully leverage the positive role of computer networks in the development of education.

Keywords: Blockchain · Information security · Online education

1 Introduction

In today's world, the internet has become an important platform for information exchange and has opened up new media for people's thirst for knowledge. Various types of online education have emerged as the times require. However, with the development of society and the establishment of a market economy, the commercialization of personality elements and the diversification of interests have led to personal information gradually acquiring commercial value. The rapid popularization of information technology has provided convenient conditions for the collection and processing of personal information, greatly increasing the possibility of unreasonable use, collection, tampering, deletion, replication, and theft of the personal information of online education users. With the rise of the Internet, online education has sprouted and developed. Online education can break the limitations of time and space, allowing students to enjoy educational resources without leaving their homes, and it has played a great complementary role in traditional education. In 2022, the number of digital education users in China reached 314 million, a year-on-year increase of 5.36%. In order to further analyze and study the information security protection role of blockchain technology in online education, the second section of this article analyzes the current information security issues faced by online education, the third section explores the use of blockchain technology


in teaching both inside and outside China, the fourth section analyzes the feasibility of promoting the use of blockchain technology in online education, and the fifth section is the conclusion.

2 Current Situation of Online Education Information Security The information security of online education also belongs to network information security. Essentially, it refers to the security of information on the network, including the hardware, software, and data in the network system. The transmission, storage, processing, and use of network information must be in a secure and visible state. However, information systems are not absolutely secure, and after analysis, current online education mainly faces the following risks. 2.1 Software Inherent Risks The development and dissemination of computer network information have its unique characteristics, and the network characteristics of information dissemination determine that its dissemination scope is wider than traditional media dissemination, and its impact on people is more profound. However, it is precisely due to the wide scope of dissemination that regulatory personnel lack sufficient and effective monitoring of information dissemination, which leads to a lack of information security [1]. There are two forms of software vulnerabilities, one is vulnerabilities left by developers due to negligence or other technical reasons, and the existence of these vulnerabilities creates network security risks. Another way is for developers to set up some software backdoors for convenience. Although others may not know about these backdoors, once leaked, it will lead to the problem of online education user information leakage. 2.2 Public WiFi Usage Risks In our daily lives, for convenience, a large number of people use public WiFi networks for learning and work, and online education is also the case. However, this exposes servers to public networks. Once connected to public WiFi, criminals can relatively easily steal personal information of online education users for sale, or steal important data information stored on computers. This poses a certain copyright threat to online learning users’ personal learning materials, and even poses a great threat to their financial and personal safety by stealing contact books or chat records, and deceiving the stolen list information. 2.3 Risks of Virus Attack The application of network technology has driven the rise of a group of technical talents. As a leader in the field of computer technology applications, hackers can easily steal and intercept stored or disseminated data. And with the rapid development of the Internet, the attack technology of hackers is rapidly changing, and the risk of virus attacks on online education users will also increase. Hackers will spread the virus by attacking


system vulnerabilities and using attractive icons to lure users to click on virus software. Once attacked by hackers, the user’s computer system information will be destroyed, leading to system paralysis, which will have a significant negative impact on the learning of online learning users.

3 Overall Explanation of Blockchain 3.1 Basic Concepts The concept of blockchain first appeared in 2008, and the development of blockchain technology has gone through three stages: first, the blockchain 1.0 stage, which mainly focuses on virtual currency applications; The second is the blockchain 2.0 stage with smart contracts as its basic feature; The third is blockchain 3.0, the expansion and application stage of blockchain technology [2]. Blockchain technology is a professional information technology that exists in the form of shared data chains to store data information. The content structure includes three parts: transaction link, block division link, and chain structure. Among them, the nature of transactions refers to a single operational process, while blocks emphasize a regional structure. If the regional structure is subdivided internally, the block area includes the block head area and the block body area. In terms of specific functions, the block header is mainly responsible for ensuring the comprehensiveness and completeness of data information, while the block body mainly displays specific transaction information records and content. In a complete block, record information with different contents and focuses can be uniformly and centrally displayed [3]. In terms of form, the chain structure is more dynamic and displays the operational status and changes of the entire blockchain system. Different block structures also rely mainly on chain structures for connection. From the above analysis, it can be seen that the concept of blockchain is comprehensive in nature, formed by the fusion of different types of independent structures, and the roles played by different structural regions are also different. Only by relying on the collaborative play of the overall structural role can this form of information dissemination be effectively realized. 3.2 Basic Characteristics of Blockchain A. Decentralized features The decentralization effect is achieved from three aspects: information dissemination, recording, and storage. Firstly, in terms of transmission methods, the transmission structure of blockchain is a distributed structure, which means that further information recording is also implemented in a distributed form. Ultimately, this distributed feature will continue to be reflected in the storage process. In the specific process of exerting its function, the blockchain structure relies on the 2P network transmission mode and the independence of each link’s working state, which is the main basis for achieving the decentralized effect of this structure and system [4]. B. Openness The foundation of blockchain technology is open source. In addition to encrypting the private information of all parties involved in the transaction, blockchain data is open to


everyone, and anyone can query blockchain data and develop related applications through public interfaces. Therefore, the entire system information is highly transparent. With the development of blockchain technology, Ethereum has emerged after the Bitcoin system. The blockchain of Ethereum is more advanced than Bitcoin. Ethereum and Bitcoin are different from pre configured trading system operations, as Ethereum is an open source, programmable blockchain. Therefore, the Ethereum system can also be seen as the openness of blockchain, and it is an upgraded version of “openness”. C. No tampering The most easily understood feature of blockchain is its immutability. Non tampering is formed based on a unique ledger of ‘block + chain’: blocks containing transactions are continuously added to the end of the chain in chronological order. To modify the data in a block, it is necessary to regenerate all subsequent blocks. Once the information is verified and added to the blockchain, it will be permanently stored. As long as 51% of all data nodes cannot be controlled, network data cannot be arbitrarily manipulated and modified. Therefore, the data stability and reliability of the blockchain are extremely high. This makes blockchain itself relatively secure and avoids subjective and artificial data changes. D. Traceability Blockchain is a “blockchain data” structure, similar to a interlocking “iron chain”. The content of the next link contains the content of the previous link, and the information on the chain is linked in chronological order. This allows any piece of data on the blockchain to be traced back to its origin through the “blockchain data structure”, which is the “traceability” of blockchain. Based on the traceability of blockchain, information is recorded on the blockchain from the very beginning. Once a problem occurs, it can be traced back to see which link went wrong. E. Reliability The data on the blockchain stores multiple copies, and any node failure will not affect the reliability of the data. The consensus mechanism makes modifying a large number of blocks extremely expensive and almost impossible. Destroying data is not in the self-interest of important participants, and this practical design enhances the security and reliability of data on the blockchain. F. Anonymity Due to the fact that the exchange between nodes follows a fixed algorithm, data exchange does not require trust (the program rules in the blockchain will determine whether the activity is valid), so counterparties do not need to generate trust in themselves through public identity, which is very helpful for credit accumulation. Unless required by legal regulations, technically speaking, the identity information of each block node does not need to be disclosed or verified, and information transmission can be anonymous. In the education industry, the “anonymity” of blockchain can play a great role in protecting personal privacy.
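As a small illustration of the "block + chain" structure and the tamper-evidence described above, the following toy sketch chains records by hashing each block together with the hash of its predecessor; it is only a conceptual example, not a real blockchain implementation.

```python
import hashlib
import json
import time

def make_block(prev_hash, records):
    """Toy block: stores the predecessor's hash, so altering any earlier
    block changes every later hash and breaks the chain."""
    block = {"time": time.time(), "prev_hash": prev_hash, "records": records}
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

genesis = make_block("0" * 64, ["course enrolment record"])
nxt = make_block(genesis["hash"], ["learning progress record"])
# Tampering with genesis["records"] would invalidate nxt["prev_hash"].
```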


4 The Current Status of Blockchain Application in Online Education Due to its decentralized, traceable, tamper resistant, secure, and reliable features, blockchain can play a good role in online education, such as information security maintenance in learner identity authentication, learning process recording, and other aspects. The world’s first blockchain technology based Woolf University was launched by the University of Oxford in March 2018. The tamper resistant of blockchain prevents students from tampering with their academic records, and smart contracts automate students’ attendance, credits, and academic papers. Yonsei University in South Korea will collaborate with Puhang University of Technology to develop Engram’s knowledge sharing system and build a blockchain campus. In the Engram system, a customized digital currency called Neuron will be launched based on the blockchain system as a reward system, and a blockchain voting system will be developed to ensure that forged results do not occur [5]. The Nigerian government and IBM are attempting to collaborate to establish a blockchain based academic certificate network publishing and management platform, aiming to achieve transparent production, transmission, and verification of academic certificates. The Open University of the United Kingdom has developed a combination of “Micro credentials” - through the blockchain technology platform, the credits or achievements obtained from different educational institutions are combined to apply for recognition of the graduation/degree certificate of this composite pattern. The University of Melbourne is testing blockchain student file management, using a new digital system to view student files and utilizing the immutability of blockchain to provide real talent information for enterprises. At the same time, China is actively developing blockchain based education models. In 2016, the Ministry of Industry and Information Technology issued the “White Paper on the Development of Blockchain Technology and Applications in China”, which pointed out that “the transparency of blockchain systems and the immutability of data are fully applicable to student credit management, enrollment and employment, academia, qualification certification, industry university cooperation, and other aspects, and have important value for the healthy development of education and employment. In 2018, the Chinese Ministry of Education released the “Education Informatization 2.0 Action Plan”, which proposed to actively explore effective ways of recording, transferring, exchanging, and authenticating intelligent learning effects based on new technologies such as blockchain and big data, forming a ubiquitous and intelligent learning system, and promoting the deep integration of information technology and intelligent technology into the entire process of education and teaching [6]. In 2020, the Ministry of Education issued the “2020 Key Points for Education Informatization and Network Security Work”, which proposed to explore the application of blockchain technology in students’ online learning, teachers’ online teaching behavior recording and recognition, and establish a scalable and credible new model for online teaching evaluation [7]. In recent years, many universities in China have also established blockchain colleges. It is not difficult to see that blockchain based education models will become a new trend in future education. 
It is not difficult to see that the development of future education requires not only cutting-edge technologies such as AI and the metaverse, but also the integration of blockchain technology to ensure the stable development of future education. Therefore, the era of blockchain + education has arrived.

5 The Feasibility of Blockchain in Online Education The answer to educational resources is ‘what to learn’. Each learner has different foundations, interests, and learning methods, and the ‘what to learn’ they need should not be a unified ‘wholesale’ planned resource system. It is necessary to create customized and modular teaching and research resources [8]. The research on the integration of blockchain technology and online education mode aims to find methods for transformation and updating, and transform the online education mode by referencing the blockchain operation mode, ultimately achieving the effect of generating blockchain segmentation. When building blockchain based on online education mode, the entire process of online education can be regarded as an independent chain of blockchain, solving the problems of credit mutual recognition in online education mode [9]. In this mode, not only can the scope of students’ learning content and channels for obtaining academic qualifications be expanded, but also the barriers to learning opportunities between different age groups, occupational types, and basic education groups can be broken down. This is also a positive practice for building a learning society’s macro goal. The decentralized and co maintenance features of blockchain enable the customization and sharing of teaching and research resources. All resources, including course resources, case libraries, teaching and research achievements, are organized according to data block logic, and each teacher and student can participate in the updating of resources. The so-called “three people must have my teacher” allows each student to participate in the updating of resources. The saying goes, ‘In a group of three, there must be my teacher’, which allows students to participate in resource updates and generate learning resources themselves. Through collective wisdom, resources can be optimized to the greatest extent possible. At the same time, blockchain can fully ensure the trust of resources and intellectual property rights [8]. 5.1 Using Blockchain for User Information Protection There are three types of blockchain: public chain, private chain, and alliance chain, and alliance chain is commonly used in this education system. In the alliance chain, it is usually authorized by relevant alliance groups to jointly maintain the operation of the blockchain, which is equivalent to partial decentralization. It only takes effect after being confirmed by more than half of the groups on the chain. Blockchain is composed of multiple nodes, each of which backs up transaction information. The failure of one or even multiple nodes will not affect the entire system operation, and data security and reliability are guaranteed [10]. The data transmission on the blockchain relies on asymmetric encryption techniques in cryptography, and the blockchain system assigns users a pair of keys, including public and private keys. When users want to upload data, they need to go through the verification of other majority nodes. The request can only be successfully completed after the majority on the chain has passed consensus verification on the remaining nodes. The uploaded data resources will be uploaded to the blockchain


The uploaded data resources are written to the blockchain in the form of blocks. Once a block is recorded on the chain, it can only be queried and used; it can hardly be deleted or modified [11]. This effectively protects the identities of teachers and learners as well as their learning and teaching data. Users entering the system are authenticated, different objects are granted different rights, and their data are encrypted, which ensures the security of the block data and the privacy of users of the online education platform.

5.2 Using Blockchain for Teaching Resource Protection and Sharing

Copyright protection is highly valued in the field of education. The traceability of blockchain allows authors to check whether, and when, the resources they created have been misused, and helps them challenge such abuse. Blockchain technology adopts a peer-to-peer data storage structure, which effectively weakens the role of schools as the central nodes of traditional resource-sharing methods [12]. Mine Labs in New York, USA has developed Mediachain, a metadata protocol based on blockchain technology that can effectively protect resources [13]. On a blockchain online teaching platform, resource owners can upload resources and write copyright information into a block. These resources can be used by other users, and copyright information is automatically attached whenever a resource is used. At the same time, the traceability of the blockchain makes resource ownership traceable and allows users to supervise copyright information. Taking asymmetric encryption as an example, it reduces the chance that an attacker can unlock resources through a traditional brute-force attack. The blockchain can also record the access requests of every visitor, which greatly helps copyright owners protect their interests. Adding an exclusive copyright block to a resource increases the cost of registration and governance. With blockchain's asymmetric encryption and timestamping, the ownership and transaction history of a copyright are clearly traceable, so the copyright owner can confirm their rights or identify an infringing party immediately, providing evidence for the rights-protection stage. Because blockchain is irreversible and immutable, teaching resources can be both shared and protected, offering great convenience and security to users of online education.

5.3 Using Blockchain for Teaching Tracking

The decentralized nature of blockchain can be used on online teaching platforms to track and record the teaching and learning process. Learning-process data are mainly generated by the learning flow, including time and location information, learning resource information, the learning process itself, and learning outcomes, all of which are difficult to collect and use. Existing education data cannot be collected across classrooms and disciplines, and the data that are collected lack continuity and cannot be analyzed across disciplines. Only by standardizing the corresponding learning-process data can blockchain technology preserve the learning behavior of students at different times, in different places, and across disciplines in one database. On this basis, data analysis can be conducted, and because the retained data cannot be altered, the authenticity of the data and the credibility of the analysis results are assured. Decentralization is essentially achieved through smart contracts and distributed ledgers.


Distributed ledgers share, replicate, and synchronize the database across the nodes of the network. Once data are uploaded, the whole blockchain system jointly maintains their security, preventing tampering and theft. Teachers and students log into the blockchain teaching platform, and the smart contract records the teaching and learning process. During teaching, teachers can store data such as lectures, homework assignments, and online discussion activities on the blockchain. When students complete the tasks assigned by the teacher, they can obtain the corresponding grades through online exams. Throughout the entire process no manual intervention is needed, and the information is encrypted and cannot be tampered with. By using blockchain to store a distributed database, students' learning processes and teachers' teaching information can be integrated, and data mining can be applied for analysis and evaluation, making it possible to evaluate teaching modes and to continuously enrich online education. A minimal sketch of such tamper-evident record keeping is given below.
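
The following Python sketch illustrates, under stated assumptions, the tamper-evidence idea behind teaching tracking: learning-process records chained by hashes so that any later edit is detectable. The class, field names, and sample events are hypothetical and are not the implementation described in this paper.

import hashlib
import json
import time

def record_hash(record: dict) -> str:
    # Deterministic SHA-256 digest of a learning-process record.
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

class LearningLedger:
    # Append-only, hash-linked log of teaching/learning events.
    def __init__(self):
        self.blocks = []   # each block: index, prev_hash, timestamp, record, hash

    def append(self, record: dict) -> dict:
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        block = {"index": len(self.blocks), "prev_hash": prev_hash,
                 "timestamp": time.time(), "record": record}
        block["hash"] = record_hash(block)
        self.blocks.append(block)
        return block

    def verify(self) -> bool:
        # Any edit to an earlier record breaks the hash links that follow it.
        for i, blk in enumerate(self.blocks):
            body = {k: blk[k] for k in ("index", "prev_hash", "timestamp", "record")}
            if blk["hash"] != record_hash(body):
                return False
            if i > 0 and blk["prev_hash"] != self.blocks[i - 1]["hash"]:
                return False
        return True

ledger = LearningLedger()
ledger.append({"student": "s001", "event": "submitted homework 3"})
ledger.append({"student": "s001", "event": "passed online exam", "score": 92})
assert ledger.verify()

In a real deployment the hash links would be maintained by the blockchain nodes themselves rather than by a single process; the sketch only shows why retained data "cannot be corrected" once chained.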

6 Conclusion

With the rapid development of network technology, information security issues have gradually attracted public attention. Although network technology brings great convenience and positive impact to people's learning, it also poses an undeniable threat to the information security of online learners. The emergence of blockchain has played an important role in optimizing teaching business processes, reducing operational costs, improving collaborative efficiency, expanding the scope of learning, and safeguarding copyright and user security, providing strong support for building new teaching models and protecting privacy. In traditional education, school-centered educational units are fully responsible for learners' study and certification; whether in training programs or learning activities, learners are passive participants, which makes it difficult to ensure the quality of learning and the fairness of certification, and the rights of learners and teachers are also hard to guarantee. Blockchain is applied to online education partly because of its immutability, which makes the learning process transparent and recordable, ensures that data cannot be forged or tampered with, and improves the security of online education. On the other hand, compared with traditional education models, online education with blockchain can build a genuine and trustworthy archive of learning records, providing a more convenient, secure, and open learning environment for more users. In addition, thanks to blockchain's distributed ledgers, asymmetric encryption, and consensus mechanisms, applying blockchain technology to the education industry will give users nationwide and even worldwide access to knowledge, helping to create a more open, secure, and efficient teaching system and contributing to the vigorous development of the education industry.

References

1. Du, C.: Research on security prevention measures for computer network information technology. Inf. Secur. Technol. 2011(11)
2. Swan, M.: Blockchain: Blueprint for a New Economy, pp. 7–39. O'Reilly Media, CA (2015)


3. Xu, L., Liu, X.: Research on online education platforms based on blockchain – taking On Demand Education Market Company (ODEM) as an example. Publication Reference 09, 17–21 (2019)
4. Sun, H., Sheng, Y., Su, B.: Analysis and research on the current situation of blockchain + online education. J. Hunan Vocational and Technical College of Posts and Telecommunications 2019(02), 16–18
5. Wu, Y., Cheng, G., Chen, Y., Wang, X., Ma, X., et al.: Analysis of the current situation, hot spots, and development considerations of "blockchain + education" at home and abroad. J. Distance Educ. 35(3), 31–39 (2017)
6. Ministry of Education of the People's Republic of China: Notice of the Ministry of Education on issuing the Action Plan for Education Informatization 2.0 [EB/OL]. http://www.moe.gov.cn/srcsite/A16/s3342/201804/t20180425_334188.html
7. General Office of the Ministry of Education: Key points for education informatization and network security work in 2020 [EB/OL]. http://222.ict.edu.cn/news/jrgz/xxhdt/n20200305_66025.html
8. Jin, Y.: Demand analysis and technical framework for blockchain + education. China Electron. Educ. 09, 62–68 (2017)
9. Liu, T., Yang, J.: Exploration of higher education learning evaluation based on blockchain technology. Softw. Guide 05, 248–252 (2020)
10. Fu, W., Zhou, H.: The COVID-19 epidemic has brought challenges and coping strategies to China's online education. J. Hebei Normal Univ. (Education Science Edition) 22(2), 14–18 (2020)
11. Yuan, Y., Ni, X., Zeng, S., Wang, F.-Y.: Development status and prospect of blockchain consensus algorithms. J. Automation 44(11), 2011–2022 (2018)
12. Zhou, Y.: Research on the development and application of higher education based on blockchain technology. Chinese and Foreign Entrepreneurs 2, 129 (2018)
13. Li, X., Yang, X.: Applying blockchain technology to build a new ecology of open education resources. China Distance Educ. (Comprehensive Edition) 6, 58–67 (2018)

Blockchain-Based Access Control Mechanism for IoT Medical Data

Tianling Yang1, Shuanglong Huang1, Haiying Ma1(B), and Jiale Guo2

1 Information Science and Technology, Nantong University, Nantong, Jiangsu, China
[email protected]
2 Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore

Abstract. With the proliferation of medical Internet of Things (IoT) devices that collect sensitive healthcare data, access control is more crucial than ever for protecting medical IoT data from unauthorized use, and the security and privacy of medical records must be addressed rigorously. Security protocols and a distributed system architecture are needed to implement access control mechanisms that prevent privacy leakage and insecure data sharing, and blockchain is a promising candidate for meeting the security and privacy needs of IoT medical systems. This paper proposes a blockchain-based access control mechanism for IoT medical data. Medical IoT data are encrypted and stored off-chain in a distributed file storage system; data-access transactions record only the address of the data ciphertext and the encryption-key ciphertext in the blocks, which significantly improves the scalability of the blockchain and strengthens the privacy protection of medical data. The immutable nature of blockchain ensures the integrity and authenticity of the medical data. Our BAC-IOMT scheme embeds access policies into smart contracts to enforce the data access control mechanism. Finally, the scheme is proven secure, and its performance, data volume, and delay are analyzed in detail; the experimental results show that it is efficient and feasible.

Keywords: IoT · Access Control · Blockchain · Smart Contract · Medical Data

1 Introduction

With the rapid development of the Internet of Medical Things (IoMT), devices can provide real-time access to patient data and collect sensitive patient data through smart sensors and wearable devices. This allows doctors and patients to track a patient's condition anywhere in real time and enables medical monitoring and remote health tracking, while collaborative data sharing makes healthcare more predictable and personalized. IoMT data contain much sensitive information: massive amounts of data related to user privacy are recorded by all kinds of IoMT devices, so the data security risks become increasingly serious, and the privacy protection of medical data is receiving more and more public attention. Currently, IoMT devices can provide medical institutions with real-time patient medical data for processing and analysis.


Most IoMT systems rely on a third-party centralized server to store data. Such a centralized server is liable to be attacked, which creates a serious risk of privacy leakage: once the server is compromised, users' private data can be leaked or tampered with, posing a great threat to patients' privacy and even to their safety. Secondly, in a centralized environment the hospital retains the main management rights over patients' medical data, which can lead to the loss of valid evidence about medical accidents, so that medical disputes cannot be properly resolved. In the medical IoT, the complex network structure and the variety of data sources increase the difficulty of achieving data security and privacy protection, and platform differences between hospitals further harm the security of data access. Lacking scientific auditing, medical data are also hard to use as experimental data for medical statistics, which probably makes the results of disease surveys inaccurate. Yet the medical data collected by IoMT devices are very important for hospitals and medical research. For medical institutions, medical data sharing helps medical staff accurately understand a patient's condition and medical history, which greatly reduces diagnosis time, and large volumes of medical data provide the basis for medical research. For individuals, medical data sharing can help citizens obtain better medical services while reducing medical costs. Therefore, achieving a secure data access control mechanism while simultaneously protecting user privacy is a key issue.

Sharipudin et al. [1] proposed an IoMT healthcare monitoring system that monitors multiple patients' health parameters simultaneously and delivers the data to a patient monitoring system for analysis and diagnosis, improving the efficiency of monitoring each patient. However, the hospital controls access rights to the patient's medical data and may disclose the patient's privacy without the patient's permission. Fan et al. [2] proposed MedBlock, a blockchain-based information management system that improves the consensus protocol, allows effective EMR access and retrieval, and enhances the security and privacy of data in the system; however, that work does not adequately discuss concrete measures for protecting data security. Ma et al. [3] proposed a blockchain-based mechanism for fine-grained authorization in data crowdsourcing, using Ciphertext-Policy Attribute-Based Encryption (CP-ABE) to pre-process the complex encryption workload and generating attribute private keys for data requesters; however, that work does not explore how to reduce the cost of IoMT systems or ensure their sustainability and cost-effectiveness.

In this paper, we propose a blockchain-based access control mechanism for IoT medical data (BAC-IOMT). In BAC-IOMT, IoMT data are encrypted and stored off-chain in a distributed file storage system. Data-access transactions record only the address of the data ciphertext and the encryption-key ciphertext in the blocks, which significantly improves the scalability of the blockchain and strengthens the privacy protection of medical data. The immutable nature of blockchain ensures the integrity and authenticity of medical data.
Our scheme embeds the access policies into smart contracts to enforce the data access control mechanism, which realizes strict, patient-controlled access to medical data. The scheme achieves secure access control of medical data in the IoT, protects user privacy, and effectively withstands the malicious activities of internal users as well as external DDoS and Sybil attackers.


The experimental results show that it is efficient and feasible. The rest of this paper is arranged as follows: Section 2 introduces the related works. Section 3 describes the system model of the blockchain-based access control mechanism for IoT medical data. Then Sect. 4 proposes the concrete scheme. Section 5 analyzes the performance and proves its security. Finally, Sect. 6 summarizes our work and gives our future research directions.

2 Related Works

Jiang et al. [4] used a double-chain structure to process different types of medical data according to users' different needs for sharing health data and diagnostic data. In this scheme the double chain incurs high maintenance costs, and frequent cross-chain operations greatly increase the computing overhead and reduce the efficiency of medical services. Pathirana et al. [5] combined blockchain and the InterPlanetary File System (IPFS) on a mobile cloud platform to design a reliable access control mechanism with smart contracts, enabling EHR sharing between different patients and medical service providers. That scheme uses an asymmetric encryption algorithm to encrypt medical data; although this ensures the privacy and security of the data, it also incurs a large computational overhead. In the scheme of [6], each patient uses a proprietary private blockchain to record their own health data, and automated edge computing nodes use the health data to analyze the patient's sleep health status. However, a private chain is not suitable for medical data sharing scenarios, since maintaining a blockchain is costly for ordinary users, and relying on edge computing nodes to calculate and analyze health data makes it easy for malicious nodes to intercept data transmissions.

3 System Model

3.1 System Model

The system model of the blockchain-based access control mechanism for IoT medical data is shown in Fig. 1. There are five entities: patients, doctors, the hospital, the system manager, and the distributed file storage system.

Patients own their medical data. Each patient has a public/private key pair (PK_P, SK_P) and an address (Addr_P). Patients can use the smart contract to formulate access control mechanisms for their medical data. Their main functions include modifying their related information, accessing the doctor's diagnosis and their own medical data, and formulating access control policies for medical data. The proposed system ensures the security of users' medical data; the blockchain also provides a traceability mechanism, which effectively protects the rights and interests of patients and provides evidence for accountability when an accident occurs.

The main functions of doctors include analyzing IoT medical data, modifying their related information, and applying for medical data.


Each doctor has a public/private key pair (PK_D, SK_D) and an address (Addr_D). When applying for access to a patient's medical data, the doctor must submit a request and his/her authorization certificate to a smart contract and obtains either the data address or a failure message; the result can be recorded on the blockchain.

Fig. 1. System model

The main function of the hospital is to manage doctors and medical data. Each hospital has a public/private key pair (PK_H, SK_H) and an address Addr_H. When a user's medical data are delegated to the hospital for management, the hospital needs to ensure the privacy and security of those data; this also provides a data basis for the hospital's research on epidemics and ensures the accuracy of the research results.

The distributed file storage system is the entity that stores the encrypted IoT medical data. It returns the ciphertext storage address Addr_i, and users can download the ciphertext according to Addr_i.

The system manager (SM) is a role in the system rather than a central authority. The SM builds the blockchain system and the user interaction platform. The SM can be a company or a government department, and it only verifies the identity of users: before users join the system, they submit their identity information to the SM for verification, and the SM sends them a digital certificate once verification passes. Only authenticated users can join the system, which improves system security to a certain extent. In this scheme we assume that the SM is supervised by government agencies. The SM develops access control policies to protect users' identity information, which can prevent external and internal attacks by malicious users, and the SM is answerable to legal action for any data leakage. All users in the system must take measures to securely store their passwords and private keys.

The data flow between the entities is as follows. All users join the system after being verified by the SM. Doctors monitor patients remotely through IoT devices and obtain medical data. The system encrypts the data with a symmetric encryption algorithm, stores the ciphertext in the distributed file system, and receives the data storage address in return. The address and the encryption key are then encrypted with the patient's public key and signed.


The ciphertext and signature are sent to the network as a transaction. A miner adds the transaction to a block after verifying the signature, and all transactions are recorded on the blockchain. A hospital or doctor can apply to access the patient's medical data; if the applicant has permission, the patient encrypts the medical data address and encryption key with the applicant's public key and signs the result, and this ciphertext and signature are again sent to the network as a transaction. Hospitals or doctors can then use their private keys to decrypt the transaction on the blockchain, obtain the medical data address and encryption key, locate the medical data ciphertext in the distributed file storage system via the address, and decrypt it with the encryption key to obtain the patient's medical data.

This paper uses the blockchain and smart contracts as a decentralized data access control platform. Blockchain technology makes medical data access control fair and transparent: patients use smart contracts to formulate data access policies, and hospitals and doctors access authorized medical data through the smart contracts.

3.2 Smart Contract Access Control Module

The BAC-IOMT scheme uses smart contracts to implement access control of medical data in the IoT, to solve the problem of user privacy leakage, and to realize secure medical data access control. The module is composed of four smart contracts: the consensus contract (COC), the relationship contract (RC), the classification contract (CLC), and the permission contract (PC).

(1) COC: The consensus contract stores the addresses of the nodes that have consensus permission; it is used to package transactions and maintain the security of the blockchain. At the beginning the COC is empty, and a temporary administrator adds trusted initial nodes, which verify and package transactions. Once there are enough nodes in the system for the consensus algorithm, the initial nodes are removed.
(2) CLC: The classification contract stores the addresses and classifications of all nodes in the system. During user registration, the system adds the node address and classification information to the contract. The system confirms node registration through the CLC, which effectively prevents double registration and keeps the system secure.
(3) RC: The relationship contract stores information about interacting nodes, such as the address, ID, and status. Each user has a relationship contract; it is called at the time of medical treatment to establish contact with the doctor, and it is also called during the medical data access process.
(4) PC: The permission contract stores the permitted node's address and access role. The access roles are: 1. User – a user may read the patient's medical data. 2. Owner – usually the patient himself or herself; in special cases another user can be the owner, for example the guardian of a juvenile patient. 3. Manager – a hospital or medical institution; it obtains management permission after being authorized by the patient and can then manage the patient's medical data. Patients use the PC to control access to their medical data and protect their privacy, deciding whether an applicant has access permission according to the applicant's address (Addr) and application type (t) recorded in the PC. A simplified, off-chain model of this permission lookup is sketched below.
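
To make the role of the PC concrete, the following Python sketch models the permission lookup off-chain. It is illustrative only: the class, the assumed role ordering, and the addresses are stand-ins and not the paper's actual smart-contract code, which runs on Ethereum.

from enum import Enum

class Role(Enum):
    USER = 1      # may read the patient's medical data
    OWNER = 2     # the patient (or guardian)
    MANAGER = 3   # hospital / medical institution authorized by the patient

class PermissionContract:
    # Off-chain model of the PC: address -> granted role (assumed hierarchy for illustration).
    def __init__(self, owner_addr: str):
        self.permissions = {owner_addr: Role.OWNER}

    def grant(self, addr: str, role: Role) -> None:
        # Called by the patient to add or upgrade a permission.
        self.permissions[addr] = role

    def match(self, addr: str, requested: Role) -> bool:
        # Mirrors matchPC(Addr, t): does the address hold a sufficient role?
        granted = self.permissions.get(addr)
        return granted is not None and granted.value >= requested.value

pc = PermissionContract(owner_addr="0xPatient")
pc.grant("0xDoctorA", Role.USER)
assert pc.match("0xDoctorA", Role.USER)
assert not pc.match("0xHospitalB", Role.MANAGER)   # would trigger a grant request to the patient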


4 Our BAC-IOMT Scheme

The scheme includes four modules: the user management module, the diagnosis module, the data security access module, and the delegate management module. In the user management module, users are authenticated by the SM before entering the system, which improves the security of the system. In the diagnosis module, doctors monitor patients remotely through IoT devices, obtain medical data, analyze them, fill in the patients' medical records, and share the diagnosis results with the patients. In the data security access module, the scheme uses smart contracts to secure medical data access: only authorized users can access the medical data, and the transactions of the data access process are stored on the blockchain. The delegate management module allows patients to delegate their data to a hospital.

4.1 User Management Module

The user management module covers user authentication and registration. Before joining the system, every user sends an authentication application to the certificate authority and submits the corresponding identity documents and certificates. After the SM verifies the applicant's identity, it generates a digital certificate, and the key management center creates a key pair for the applicant; the keys and the digital certificate are sent to the applicant through a secure channel. After receiving the digital certificate and key pair, the user sends a registration application to the system and submits the digital certificate generated by the SM. When a node receives the application, it checks whether the applicant already exists in the classification contract CLC. If the applicant is a new user, the node checks the validity of the digital certificate and confirms the applicant's identity with the SM. Once the identity is verified, the system stores the address and identity in the CLC and adds the user to the system. It then automatically creates an empty relationship contract RC and permission contract PC for the user and sends the contract addresses to the user.

4.2 Diagnosis Module

In the diagnosis module, doctors monitor patients remotely through IoT devices and obtain medical data; they analyze these data, fill in the patient's medical record, and share the diagnosis results with the patient. The system records the doctor's information in the RC. When patient P wants to see a doctor, the patient first fills in the visiting demand (Info_p). On receiving the demand, the system classifies the patient according to the department (depart) and matches a suitable doctor D in that department; it also searches for the doctor in the CLC, and if the doctor exists, the patient's relationship contract RC stores the doctor's information. After the IoT devices collect the relevant patient data, doctor D analyzes them and fills in the diagnosis result (MInfo) to generate the medical data. The doctor encrypts the medical data with the symmetric key (KEY) and stores the ciphertext (ct) in the distributed file storage system, which returns the medical data address (Addr_i).


The doctor then encrypts the address (Addr_i) and the symmetric key (KEY) with the patient's public key (PK_P) to generate a ciphertext (CT) and uses his private key (SK_D) to sign the hash value of CT. The CT and signature (Sign_D) are added to a block, which then waits to be appended to the main chain. The symmetric key and the address of the medical data ciphertext are encrypted with an elliptic curve encryption algorithm (ECC); compared with RSA, ECC achieves equivalent or higher security with a shorter key. The system can find the transaction on the blockchain through the block ID (ID_B) and obtain the CT and Sign_D. The patient uses the doctor's public key (PK_D) to verify the signature; after verification passes, the patient uses the private key (SK_P) to decrypt CT and obtain the medical data address (Addr_i) and the symmetric key (KEY). KEY is then used to decrypt the ciphertext in the distributed file storage system, and the patient obtains his medical data.

4.3 Secure Access Control Module

The scheme uses smart contracts to realize access control of medical data in the IoT, which effectively protects user privacy and secures medical data access. The patient's PC records each user's address and access role. As shown in Algorithm 1, when applicant A requests access to the patient's medical data, the PC confirms the applicant's access permission. If the applicant has no permission, the system sends a request to the patient containing the applicant's address (Addr_A) and the request type (t); the patient can add the permission through the PC, and the applicant is notified after the PC is updated. The detailed procedure is shown in Algorithm 2. When a user A sends an access request, the patient uses A's public key (PK_A) to encrypt the medical data address (Addr_i) and the symmetric key (KEY) to form CT, and signs the hash value of CT with his private key (SK_P). The CT and Sign_P are stored in block ID_B, which then waits to be appended to the main chain. Once the block is on the main chain, the user finds the CT and signature (Sign_P) in the block, verifies Sign_P with the patient's public key (PK_P), and, after verification passes, decrypts CT with his private key (SK_A) to obtain the medical data address (Addr_i) and KEY; the user then decrypts the corresponding ciphertext in the distributed file storage system and obtains the patient's medical data. If a hospital's or doctor's role does not satisfy the access control policy of the data, it cannot obtain the data.

4.4 Delegate Module

In the delegate management module, the patient can delegate his medical data to a trusted medical institution. Patient P chooses a trusted hospital H and checks the hospital's permission in the PC through the hospital address (Addr_H). If the hospital does not have the management permission T, the system sends a request to the hospital, and the PC is updated after the hospital responds. The patient then encrypts the medical data address (Addr_i) and the symmetric key (KEY) with the hospital's public key (PK_H) to generate a ciphertext (CT) and signs the hash value of CT with his private key (SK_P).


The CT and signature (Sign_P) are stored in a block, and the block is added to the blockchain after the consensus protocol completes. This transaction serves as evidence that the patient has delegated his medical data to the hospital. A minimal cryptographic sketch of this encrypt-store-sign flow is given below.
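
The following Python sketch walks through the hybrid flow used by the diagnosis and delegate modules, under stated assumptions: the paper specifies 256-bit AES for the data and an ECC algorithm for wrapping (Addr_i, KEY), so the ECDH-plus-HKDF wrapping below is an ECIES-style stand-in rather than the paper's exact construction, and all names, sample data, and the storage address are hypothetical.

import os, hashlib
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Key pairs for doctor and patient (stand-ins for (PK_D, SK_D) and (PK_P, SK_P)).
sk_d = ec.generate_private_key(ec.SECP256R1())
sk_p = ec.generate_private_key(ec.SECP256R1())
pk_p = sk_p.public_key()

# 1. Encrypt the medical record with a fresh 256-bit AES key (the symmetric KEY).
key = AESGCM.generate_key(bit_length=256)
nonce = os.urandom(12)
medical_data = b'{"patient": "P01", "diagnosis": "..."}'
ct_data = AESGCM(key).encrypt(nonce, medical_data, None)
addr_i = hashlib.sha256(ct_data).hexdigest()   # stand-in for the off-chain storage address

# 2. Wrap (Addr_i, KEY) for the patient with an ECDH-derived key (ECIES-style assumption).
#    The ephemeral public key and nonce would travel with the transaction.
eph = ec.generate_private_key(ec.SECP256R1())
shared = eph.exchange(ec.ECDH(), pk_p)
wrap_key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None, info=b"wrap").derive(shared)
wrap_nonce = os.urandom(12)
CT = AESGCM(wrap_key).encrypt(wrap_nonce, addr_i.encode() + b"|" + key, None)

# 3. The doctor signs the hash of CT; the transaction stores CT and Sign_D on-chain.
sign_d = sk_d.sign(hashlib.sha256(CT).digest(), ec.ECDSA(hashes.SHA256()))

# 4. Patient side: verify the signature, then unwrap Addr_i and KEY and decrypt the record.
sk_d.public_key().verify(sign_d, hashlib.sha256(CT).digest(), ec.ECDSA(hashes.SHA256()))
unwrap_key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None, info=b"wrap").derive(
    sk_p.exchange(ec.ECDH(), eph.public_key()))
addr, _, recovered_key = AESGCM(unwrap_key).decrypt(wrap_nonce, CT, None).partition(b"|")
assert AESGCM(recovered_key).decrypt(nonce, ct_data, None) == medical_data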

Algorithm 1. Changing permissions
Input: applicant's address (Addr_A), application type (t), permission contract (PC)
Output: the applicant's permission is added to the PC
1.  function AddPermission(Addr_A, t, PC)
2.    i ← matchPC(Addr_A, t)
3.    if (i == false) then
4.      send adding request → patient
5.      if (positive) then
6.        Add(Addr_A, t) → PC        // add permission type and applicant address
7.        notify applicant and patient
8.      end if
9.    end if
10.   if (i == "lower") then
11.     send updating request → patient
12.     if (positive) then
13.       Add(Addr_A, t) → PC        // upgrade permission
14.       notify applicant and patient
15.     end if
16.   end if
17. end function

5 Performance Analysis and Security Proofs

The scheme is implemented on an Intel(R) Core(TM) i7-7700HQ machine with IntelliJ IDEA 2017.3.4 to analyze its performance efficiency, data volume, and delay. Based on the Ethereum development environment, we developed our own private blockchain and combined the private blockchain with the public blockchain. We use the InterPlanetary File System (IPFS) to store the encrypted medical data, so that the data are saved safely and permanently; IPFS generates a unique address for the stored data, through which users can retrieve the encrypted medical data.


Algorithm 2. Secure data access
Input: applicant's address (Addr_A), application type (t), applicant's key pair (PK_A, SK_A), patient's key pair (PK_P, SK_P), medical data address (Addr_i), symmetric key (KEY), block ID (ID_B), permission contract (PC)
Output: the PC returns whether the applicant has permission; the transaction is stored in the block; the applicant obtains Addr_i and KEY
1.  function DataAccess(Addr_A, t, Addr_i, KEY, PK_A, SK_P, PC)
2.    i ← matchPC(Addr_A, t)
3.    if (i == true) then
4.      CT ← enc(Addr_i || KEY, PK_A)
5.      Sign_P ← Sign(hash(CT), SK_P)
6.      block ← Add(CT, Sign_P)      // add CT and signature to the block
7.      return CT
8.    end if
9.  end function

5.1 Performance Analysis

For encryption, the system combines symmetric and asymmetric encryption to secure medical data storage and data access. We use 256-bit AES to encrypt the medical data and store the encrypted data in IPFS. For the asymmetric part we compared ECC with RSA: the signature generation and verification times of both are within an acceptable range, but with keys of the same length ECC provides a higher security level than RSA. Considering both security and time consumption, the system uses a 256-bit ECC algorithm, and all encryption and decryption operations are performed off-chain.

Table 1. Cost of the proposed smart contracts.

Smart Contract                  Transaction cost (gas)   Execution cost (gas)   Price
COC (Consensus Contract)        252127                   148623                 $1.04836
CLC (Classification Contract)   285216                   174020                 $1.20135
RC (Relationship Contract)      251644                   151396                 $1.05436
PC (Permission Contract)        263940                   162008                 $1.11429

In the process of data access in the IoT system, both parties to a transaction perform the 256-bit AES encryption and decryption of the medical data only once. The 256-bit ECC algorithm is likewise used once to encrypt and decrypt the transaction data and once to sign and verify the signature. A transaction stored in a block contains the public key, the digital signature, the transaction type, and the transaction data; its size is about 210 B to 239 B. A complete data access requires at least three transactions, totaling about 668 B.


We use Remix-IDE to test, debug, and deploy the smart contracts. We measured the cost of the smart contracts, with the results shown in Table 1. Transaction costs reflect the cost of sending the contract code to the blockchain; execution costs reflect the computational operations executed as a result of the transaction. Prices are calculated with a gas price of 12 gwei. The cost of each phase of the system is shown in Table 2. In the diagnosis phase, the doctor monitors patients remotely through IoT devices, obtains the medical data, fills in the medical record, and shares the diagnosis results with the patient; the execution cost is about 1263026 gas. In the data access phase, the CLC, PC, and COC are invoked to realize access control of medical data and ensure the security of data access; the execution cost is about 1285934 gas. In the delegate management phase, the patient delegates his medical data to a trusted medical institution; the execution cost is about 1688974 gas. In this system, consensus is run every minute: within that minute, miners add all verified transactions in the network to a block, and mining takes about another two minutes, so a new block is appended to the main chain roughly every three minutes.

Table 2. Cost of each phase of the system.

Phase                                 Total cost (gas)   Price
The process of diagnosis module       1263026            $3.30407
The process of data access            1285934            $3.364
The process of delegate management    1688974            $4.41836

We also measured the time consumption and write performance of the process in which blocks are added to the chain, assuming 100 transactions are generated every second. Table 3 shows the blockchain data volume for 1000 to 5000 pieces of medical data, and Fig. 2 shows the time needed to add 6000 to 60000 pieces of medical data to the chain. To improve system security this design adds some acceptable time consumption; experiments show that it has no impact on normal medical business processes. After testing, the system takes about 1500 ms to 3000 ms to process a request and respond. During medical data storage and access control the system must encrypt and decrypt the data; in addition, data are passed from the client to the server in JSON format, so the server must not only process the business request but also parse the JSON data, and blockchain nodes must exchange requests and responses over the network, whose speed affects processing. For query operations, however, no data need to be written or sent — the system only searches for and decrypts the ciphertext in the distributed file storage system — so the response is much faster.


Table 3. Medical data storage costs of the blockchain.

Pieces of data   Transaction size (B)   Blockchain data volume (KB)
1000             227618.675             223.14
2000             458875.35              448.97
3000             687294.025             672.04
4000             917573.96              896.92
5000             1129264.8              11040.70

5.2 Security Proofs

Owing to the careful design of the management system and the distributed file storage system, we assume that the IoT devices are reliable and do not generate incorrect patient data, and that doctors are reliable and do not leak patients' medical data. Under these assumptions we prove that our scheme satisfies the following security property.

Theorem 1. Assuming the consensus protocol is secure, this scheme realizes a secure access control mechanism for IoT medical data.

Proof. When an applicant applies for the patient's medical data, the patient uses the applicant's address (Addr_A) to look up the access permission in the PC. If the applicant has permission, the patient uses the applicant's public key (PK_A) to encrypt the medical data address and symmetric key into a ciphertext (CT) and signs the hash value of CT. The CT, Sign_P, and related data are added to a block; nodes on the blockchain validate the transactions in the block, and the block is added to the main chain through the consensus protocol. Once the block is on the main chain, the applicant retrieves the corresponding transaction, decrypts CT with his private key (SK_A), and decrypts the ciphertext in the distributed file storage system with the key (KEY). Because data encrypted under the public key (PK_A) can only be decrypted with the corresponding private key (SK_A), an unauthorized applicant cannot use a different private key (SK_A') to decrypt the CT in the block and therefore cannot access the private medical data.

Fig. 2. Time costs of medical data in blockchain.


Compared with the related works [4–6], BAC-IOMT has no frequent cross-chain operations and its calculation overhead is small. Access control of medical data is implemented with smart contracts, so patients can define access control policies over their own data, which protects their privacy. In addition, the scheme adds only small storage and computation overheads to ensure the security of the medical data.

6 Conclusions

The BAC-IOMT scheme uses symmetric encryption to encrypt the original medical data generated by IoT devices and stores the ciphertext off-chain in a distributed file storage system, while smart contracts implement the access control mechanism for the medical data. BAC-IOMT improves system efficiency and throughput, effectively guarantees patients' ownership of their medical data, and protects their privacy. However, the scheme only realizes one-to-one medical data access, and the medical data may still contain some private user information. In future work we will use attribute-based encryption to encrypt medical data, combined with smart contracts, to form richer access control policies; we will also focus on the privacy protection of medical data in the IoT and on improving the efficiency of the blockchain.

Acknowledgements. This work is supported by the Nantong Science and Technology Project (No. JC2021128, No. JC22022036).

References

1. Sharipudin, A., Ismail, W.: Internet of medical things (IoMT) for patient healthcare monitoring system. In: 2019 IEEE 14th Malaysia International Conference on Communication (MICC), Selangor, Malaysia, pp. 69–74 (2019). https://doi.org/10.1109/MICC48337.2019.9037498
2. Fan, K., Wang, S., Ren, Y., Li, H., Yang, Y.: MedBlock: efficient and secure medical data sharing via blockchain. J. Med. Syst. 42(8), 136 (2018)
3. Ma, H.Y., Huang, E.X., Lam, K.Y.: Blockchain-based mechanism for fine-grained authorization in data crowdsourcing. Futur. Gener. Comput. Syst. 106, 121–134 (2020)
4. Jiang, S., Cao, J., Wu, H., Yang, Y., Ma, M., He, J.: BlocHIE: a blockchain-based platform for healthcare information exchange. In: 2018 IEEE International Conference on Smart Computing (SMARTCOMP), pp. 49–56. IEEE (2018)
5. Nguyen, D.C., Pathirana, P.N., Ding, M., Seneviratne, A.: Blockchain for secure EHRs sharing of mobile cloud based e-Health systems. IEEE Access 7, 66792–66806 (2019)
6. Rachakonda, L., Bapatla, A.K., Mohanty, S.P., Kougianos, E.: SaYoPillow: blockchain-integrated privacy-assured IoMT framework for stress management considering sleeping habits. IEEE Trans. Consum. Electron. 67(1), 20–29 (2021). https://doi.org/10.1109/TCE.2020.3043683
7. Zheng, X., Mukkamala, R.R., Vatrapu, R., Ordieres-Mere, J.: Blockchain-based personal health data sharing system using cloud storage. In: 2018 IEEE 20th International Conference on e-Health Networking, Applications and Services (Healthcom), Ostrava, pp. 1–6 (2018)
8. Hossein, K.M., Esmaeili, M.E., Dargahi, T., Khonsari, A.: Blockchain-based privacy-preserving healthcare architecture. In: 2019 IEEE Canadian Conference of Electrical and Computer Engineering (CCECE), Edmonton, AB, Canada, pp. 1–4 (2019)

LNPB: A Light Node for Public Blockchain with Constant-Size Storage

Min Li(B)

School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, China
[email protected]

Abstract. In a blockchain system, a light node uses a simplified transaction verification method, but its storage overhead grows linearly with the size of the blockchain, which can quickly become prohibitive for mobile and low-end IoT devices. Most existing schemes achieve only limited storage compression, apply only to UTXO scenarios, or cannot verify transactions independently. This paper proposes LNPB, a light node for public blockchains with constant-size storage. LNPB fuses the blockchain verification and simple payment verification protocols into a more succinct and secure transaction verification protocol with constant-size storage, and it applies to both UTXO and non-UTXO scenarios. To this end, we re-design the block header to contain a constant-size summary and an RSA modulus, and we use an RSA cryptographic accumulator to compute this summary. The light node only stores the summary of the latest block when verifying a transaction. For proof generation by the full node, we employ a Merkle Mountain Range to store the intermediate results of the summary, which makes proof generation faster. We also evaluate LNPB by simulation and compare it with existing schemes; the results indicate that LNPB achieves the expected goals and saves storage and computation overheads.

Keywords: Blockchain · Light node · RSA encryption accumulator · SPV · MMR

1 Introduction

Blockchain [1–3] is a technology for building value exchange systems and is one of the current focal points of academia and industry. In a typical public blockchain system there are two types of nodes, referred to as full nodes (FNs) and light nodes (LNs). An FN stores the complete blockchain ledger, which enables it to verify transactions and update the blockchain state independently; for example, an FN can verify account balances and detect double payments [4]. An LN, on the other hand, only stores the part of the ledger related to the user, the wallet, and the routing communication needed to complete simple transaction verification. This kind of node is also referred to as a simple payment verification (SPV) node: SPV only verifies whether the blockchain has accepted a transaction and how many confirmations it has received [5], which is different from the transaction verification performed by an FN.


In the SPV protocol, an LN stores all block headers [6]. When a transaction is to be verified, the LN first sends a request to a neighboring FN to find the block in which the transaction is located; the LN then calculates a root from the Merkle hash tree (MHT) path sent by the FN and compares this MHT root with the block header stored locally to verify the transaction (a minimal sketch of this Merkle-path check is given at the end of this section). However, in the original SPV protocol the overhead of storing all block headers is still large for LNs, especially storage-limited ones (e.g. mobile and low-end IoT devices [7]), because the number of block headers grows linearly with the blockchain ledger. Many solutions [8–14] have therefore been proposed to reduce the storage overhead of LNs. In [8–11], the original blocks are merged or combined into new, much smaller blocks; unfortunately, since the number of blocks keeps increasing, the storage overhead continues to grow, and these schemes are only effective in the Unspent Transaction Output (UTXO) model. In [12, 13], transaction verification is performed by one or more FNs and the result is returned to the LN; the LN then has a lower storage overhead, but this weakens the decentralized nature of the blockchain because LNs cannot verify transactions independently. In [14], a transaction verification scheme called EPBC is presented for public ledgers. EPBC re-designs the block header to contain a new summary attribute (S) and utilizes an RSA cryptographic accumulator to realize constant-size storage for LNs; transaction verification can then be performed by LNs independently. However, the LN can only verify that a given block is at the i-th position of the blockchain, not that the transaction is indeed in that block; in other words, LNs might be deceived by malicious nodes during verification.

In this paper, we propose LNPB, an LN scheme for public blockchains that fuses the EPBC and SPV protocols into a more succinct and secure transaction verification protocol with constant-size storage. LNPB also re-designs the block header, which contains two new attributes: the block summary (S) and the RSA modulus (N). An RSA cryptographic accumulator is then used to calculate the constant-size S from the MHT root. When an LN verifies a transaction, it stores only the S of the latest block and verifies the proof from the FN; the LN does not store a complete blockchain ledger and relies on the FN to assist in verification. LNPB can verify whether a transaction is in a given block, which makes it more secure. Moreover, to make proof generation more efficient, we employ the Merkle Mountain Range (MMR) to store the intermediate results of the summary. We analyze the security of LNPB and evaluate its performance by simulation; the results show that LNPB saves storage and computation overheads.

The rest of this paper is organized as follows. In Sect. 2 we describe the design of LNPB. Section 3 describes the transaction verification protocol. In Sect. 4 we introduce a new proof generation method. Section 5 analyzes the security of LNPB. Experimental results are given in Sect. 6 to demonstrate the practicability of LNPB, and Sect. 7 discusses the related work. We conclude the paper in Sect. 8.
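
As a reference point for the SPV mechanism described above, here is a minimal Python sketch of Merkle-path verification. The helper names and the toy four-transaction block are illustrative assumptions, not code from the paper.

import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_merkle_path(tx_hash: bytes, path: list, root: bytes) -> bool:
    # path is a list of (sibling_hash, sibling_is_right) pairs, ordered leaf to root.
    node = tx_hash
    for sibling, sibling_is_right in path:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root

# Toy block with four transactions.
txs = [h(bytes([i])) for i in range(4)]
l01, l23 = h(txs[0] + txs[1]), h(txs[2] + txs[3])
root = h(l01 + l23)

# Proof for txs[2]: sibling txs[3] on the right, then l01 on the left.
path = [(txs[3], True), (l01, False)]
assert verify_merkle_path(txs[2], path, root)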


2 Design of LNPB

2.1 Framework

Figure 1 shows the framework of LNPB, which consists of LNs, FNs, and a public blockchain. LNs are devices with limited computation and storage resources, such as mobile devices and low-end IoT devices. FNs maintain the complete blockchain ledger. The block headers are re-designed to contain two new attributes: the block summary (S) and the RSA modulus (N). An LN runs an interactive transaction verification protocol with an FN to verify the validity of transactions: the LN first obtains the latest summary from the blockchain and then sends a request about a designated transaction to an FN; using its complete local ledger, the FN generates a proof for the transaction and returns it to the LN; finally, the LN verifies the proof to decide whether to accept the transaction.

Fig. 1. Framework of LNPB.

2.2 Design Goals

The goal of LNPB is to allow LNs to participate in a public blockchain, which includes:
(1) Constant-size storage: LNs do not have to store or download all block headers; instead, they only need to store constant-size information whose size is independent of the size of the blockchain.
(2) Transaction verification: LNs can verify whether a transaction has been accepted by the blockchain, and a malicious FN cannot generate a valid existence proof for a transaction that is not on the blockchain.

3 The Transaction Verification Protocol

In LNPB, the transaction verification protocol consists of four stages: parameter initialization, block summary generation, proof generation, and proof verification.

3.1 Parameter Initialization

Parameter initialization is executed only once, by a special FN (SFN) that generates the genesis block and the common parameters as follows. The SFN selects two large prime numbers p and q and calculates N = pq; N is embedded into each block header as a new attribute, and p and q are discarded. The SFN also picks a random value g ∈ Z_N^*, which is public to all nodes.


3.2 Block Summary Generation

This stage is performed by FNs to generate the summary S of each block. The S of the genesis block is calculated from its MHT root; the S of every other block is calculated from its MHT root and the S of the previous block, according to Eq. (1), where i is the index of a block, S_i is the summary of the i-th block, and blkr_i is the MHT root of the i-th block. If the blockchain contains n blocks, the summary of the latest block is S_n.

    S_i = g^{hash(blkr_i)} mod N,          i = 1
    S_i = S_{i-1}^{hash(blkr_i)} mod N,    i > 1        (1)

3.3 Proof Generation

In this stage, the proof of a specified transaction is generated by an FN as follows. First, an LN sends the hash of the specified transaction to an FN. The FN then locates the transaction in its local ledger. If the transaction is found in the blockchain, the FN calculates the proof of the transaction according to Eq. (2) and returns the MHT path of the transaction, the index of the block containing it, and the proof to the LN; otherwise, an empty message is returned.

    p_i = ( p_i^(1), p_i^(2) ),  where
    p_i^(1) = hash(blkr_i),
    p_i^(2) = g^{ ∏_{k=1, k≠i}^{n} hash(blkr_k) } mod N        (2)

The proof shows that the specified transaction has been accepted by the blockchain: p_i^(1) proves that the transaction is in a given block i, and p_i^(2) proves that block i is on the blockchain.

3.4 Proof Verification

Given the necessary information about a transaction — its MHT path, the index of the block containing it, and the proof sent by the FN — the LN can confirm whether the transaction has been accepted by the blockchain. The details are as follows. First, when the LN receives the proof, it retrieves the index of the last block (denoted by n) and its summary S_n from the blockchain. Second, the LN calculates blkr_i' from the MHT path. Third, the LN checks Eq. (3): if both sub-equations hold, the LN accepts the proof; otherwise it rejects the transaction directly. Finally, the LN computes n − i. Given a threshold T, if n − i ≥ T holds, the LN accepts the transaction, since it has been confirmed T times by the blockchain.


Otherwise, the LN waits for some time and then checks again whether n − i ≥ T holds; in this way, the rolling-back attack can be prevented.

    p_i^(1) =? hash(blkr_i')
    S_n =? ( p_i^(2) )^{ p_i^(1) } mod N        (3)

A minimal end-to-end sketch of Eqs. (1)–(3) follows.
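
The sketch below runs the summary chain, proof, and verification of Eqs. (1)–(3) with Python's built-in modular exponentiation. The tiny modulus, generator, and block roots are placeholders chosen only to keep the example self-contained and runnable; a real deployment would use a large RSA modulus whose factors are discarded.

import hashlib
from functools import reduce

def H(data: bytes) -> int:
    # hash of an MHT root, interpreted as an integer exponent
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

# Toy parameters (placeholders); in LNPB, N = p*q with p and q discarded, g is public.
N = 3233 * 4087
g = 65537

blkr = [b"mht-root-%d" % i for i in range(1, 9)]   # MHT roots of blocks 1..8

# Eq. (1): chained block summaries; an LN stores only the latest S_n.
s = g
summaries = []
for r in blkr:
    s = pow(s, H(r), N)
    summaries.append(s)
S_n = summaries[-1]

# Eq. (2): the FN builds the proof that the transaction's block i is on the chain.
i = 3                                              # 1-based index of the block holding tx
p1 = H(blkr[i - 1])
exp = reduce(lambda a, b: a * b,
             (H(r) for k, r in enumerate(blkr, 1) if k != i))
p2 = pow(g, exp, N)

# Eq. (3): the LN recomputes blkr_i' from the MHT path (reused directly here) and checks both parts.
assert p1 == H(blkr[i - 1])
assert S_n == pow(p2, p1, N)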

4 A New Proof Generation Method

In this section we introduce a new proof generation method that reduces the computation overhead of generating proofs. In our original proof generation, the overhead is related to the blockchain height and to the index of the block containing the transaction. For example, when the height of the blockchain is n and a transaction tx is in the i-th block, the FN must combine the MHT root hashes of blocks 1, 2, …, i − 1, i + 1, …, n to generate the proof: in total, n − 2 multiplications, n hashes, and 1 modular operation, i.e. a computational complexity of O(n). In the worst case, the FN has to traverse all blocks on the blockchain to calculate p_i^(2).

To make proof generation more efficient, EPBC has the FN maintain a fixed-height (32-layer) binary tree to store intermediate results. However, every time a new node is added to this binary tree, the root must be re-calculated, requiring 1 hash and 31 multiplications; when the number of nodes is small, a significant amount of space is wasted on empty nodes, and when the number of nodes exceeds 2^31, there are no nodes left and a new binary tree must be built, which requires a substantial amount of computation.

To tackle this challenge, LNPB introduces the Merkle Mountain Range (MMR) [15] to store intermediate results that can be used to generate the proof for a given transaction. Compared with a fixed-height binary tree, new nodes can be added to the MMR dynamically, and there may be multiple roots; when two roots have the same height, a new root is generated whose two children are those roots. In LNPB, each leaf node of the MMR stores the MHT root of the corresponding block, and each inner node stores the product of its two children. Figure 2 illustrates an MMR with 4 leaf nodes and its dynamic construction, where blkr_i is the MHT root of the i-th block and h_i is the hash of blkr_i. When blkr_1 is added, the root is h_1. When blkr_2 is added, h_1 and h_2 have the same height, so a new root h_12 with children h_1 and h_2 is generated, whose value is h_1 * h_2. When blkr_3 is added, there are two roots, h_12 and h_3. When blkr_4 is added, h_4 has the same height as h_3, so a new root h_34 is generated; h_12 and h_34 then also have the same height, so a new root h_1–4 with children h_12 and h_34 is generated. In this manner, only 31 multiplications are required when the 2^31-th block is added to the MMR.


Equation (4) is our new method to generate the proof. Here root_p is the child root under which h_i is located, r is the product of the h values to the right of h_i among all leaf nodes under root_p, root_k ranges over the child roots to the right of root_p, and there are m child roots in the MMR. For example, in the third part of Fig. 2, when the proof is generated for blkr_1, r = h_2, root_p = root_1 = h_1 * h_2, and ∏_{k=p+1}^{m} root_k = root_2 = h_3.

$$(S_{i-1})^{\,r\cdot\prod_{k=p+1}^{m} root_k} \bmod N \quad (4)$$

Fig. 2. MMR
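As a rough illustration of the dynamic construction shown in Fig. 2, the following Python sketch maintains the MMR as a list of (height, value) peaks and merges equal-height peaks by multiplying their values; the function name `mmr_add` and the peak representation are illustrative assumptions, not the paper's implementation.

```python
def mmr_add(peaks, h_leaf):
    """Append a leaf (the hash h_i of a block's MHT root) to the MMR,
    kept as a list of (height, value) peaks ordered left to right."""
    peaks.append((0, h_leaf))
    # Whenever the two right-most peaks have equal height, merge them:
    # the new inner node stores the product of its children (h12 = h1 * h2 in Fig. 2).
    while len(peaks) >= 2 and peaks[-1][0] == peaks[-2][0]:
        height, right = peaks.pop()
        _, left = peaks.pop()
        peaks.append((height + 1, left * right))
    return peaks

# Reproducing the four steps of Fig. 2 with toy stand-ins for h1..h4:
peaks = []
for h in [3, 5, 7, 11]:
    peaks = mmr_add(peaks, h)
# After h1, h2: one peak of height 1 with value h1*h2.
# After h3: two peaks (h1*h2 and h3); after h4: one peak h1*h2*h3*h4.
```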

5 Security Analysis

The correctness of our transaction verification protocol can be verified by verifying the correctness of Eq. (3); due to length limitations, we omit the details. Correctness means that a valid proof will be accepted. We now analyze why malicious FNs (MFNs) cannot generate valid proofs for transactions that are not on the blockchain, under the irreversibility of the hash function and the strong RSA assumption. Note that the optimized proof generation only makes proof generation more efficient and has no impact on the security of the protocol, so we only analyze the security of our original protocol.

Theorem 1. Given the initialization parameters N and g and the summary S_i of the i-th block, MFNs cannot generate valid proofs for transactions that are not on the blockchain.

Proof. The strong RSA assumption implies that even if an attacker knows C and N and can choose the exponent e freely, he/she cannot calculate M from C = M^e (mod N), where N is the product of two large prime numbers.

When a transaction tx is not on the blockchain and an MFN wants to generate a proof p', containing two values p_tx^(1) and p_tx^(2), to pass the verification, the MFN has two choices. (1) The MFN generates a valid p_tx^(1) to make S_n = y^{p_tx^(1)} mod N hold, wherein y = g^{∏_{k=1,k≠ε}^{n} hash(blkr_k)} mod N and ε is the block index specified by the MFN; the MFN wants to prove that tx is on the ε-th block. (2) The MFN generates a valid p_tx^(2) to make S_n = (p_tx^(2))^{y} mod N hold, wherein y is a hash chosen by the MFN randomly.


In case (1), according to Eq. (2), p_tx^(1) = hash(blkr_tx) must hold, wherein blkr_tx is the MHT root of the block containing the transaction tx. To make p_tx^(1) valid, blkr_tx must be the MHT root of one block on the blockchain. In our protocol, the MFN has to generate the MHT path of the transaction tx for the LN. Due to the irreversibility of the hash function, it is impossible to generate an MHT path that links the hash of transaction tx to an MHT root belonging to any block on the blockchain. In case (2), the MFN randomly generates an MHT path that contains the transaction tx, making the MHT root blkh_ε. In this situation, y = hash(blkh_ε) holds. Then, to make S_n = (p_tx^(2))^{y} mod N hold, the MFN needs to calculate p_tx^(2), which is impossible under the strong RSA assumption.

6 Simulation Experiment

In this section, we mainly evaluate and analyze the time cost of proof generation and proof verification.

6.1 Simulation Experiment Environment and Parameters

We implemented the algorithms in Python 3.7 on Ubuntu 20.04.1 Linux, with an Intel(R) Core(TM) CPU at 2.60 GHz and 2.00 GB of RAM. We used the hash function hash256 with a 256-bit N. Since an Ethereum block contains about 200 transactions on average, we set each block to contain 200 transactions. Correspondingly, the MHT path required by SPV and LNPB to verify a transaction contains approximately 7 or 8 transaction hashes.

6.2 Experimental Results and Analysis

Proof Generation. Figure 3 shows the average time of the three schemes to generate proofs for different numbers of blocks. As can be seen from the figure, the average proof generation time of all three schemes increases with the number of blocks, because the number of traversals and multiplications generally grows with the blockchain size. Our original scheme requires n - i - 1 multiplications with traversal complexity O(n), where n is the blockchain length and the transaction is on the i-th block. EPBC requires fewer than C - 2 multiplications with traversal complexity C, where C is the height of the fixed-height binary tree. Our optimized scheme requires fewer than log n - 1 multiplications with traversal complexity log n. Since the height of the fixed-height binary tree is greater than or equal to that of the MMR, C ≥ log n + 1. Specifically, our optimized scheme reduces the time overhead by 14.46% compared to EPBC and by 20.91% compared to our original scheme.

Figure 4 describes the time required by the three schemes to generate proofs for transactions in blocks at different locations in the blockchain, with the number of blocks fixed at 2^16. To do this, we divided the block positions into 10 intervals and recorded the time required at each endpoint to generate a proof. It can be seen from the figure that the proof generation time is related to the location of the block containing the transaction: the closer the transaction is to the end of the blockchain, the shorter the proof generation time, because fewer traversals and multiplications are needed.


For the optimized scheme, the figure indicates that the generation time is periodic, since the MMR structure maintained by the FN is symmetric when the number of blocks is 2^16. In both the first half and the second half of the blockchain, the closer to the end of each half, the fewer traversals and multiplications the proof generation requires, so the time overhead is smaller. In addition, the proof generation times of the three schemes are similar in the second half of the blockchain, as their numbers of traversals and multiplications are comparable.
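Purely to illustrate the multiplication counts quoted above, the following Python lines evaluate the three expressions (n - i - 1 for the original scheme, the C - 2 bound for EPBC, and the log n - 1 bound for the optimized scheme) for a few blockchain sizes; the helper names are illustrative only.

```python
import math

def mults_original(n, i):
    """Multiplications of the original scheme: n - i - 1 (Sect. 6.2)."""
    return n - i - 1

def mults_epbc_bound(C=32):
    """Upper bound for EPBC with a fixed-height binary tree of height C."""
    return C - 2

def mults_optimized_bound(n):
    """Upper bound for the MMR-based scheme: log2(n) - 1."""
    return math.log2(n) - 1

for n in (2**10, 2**16, 2**20):
    print(n, mults_original(n, 1), mults_epbc_bound(), round(mults_optimized_bound(n), 1))
```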

Fig. 3. Proof generation time.

Fig. 4. Block transaction proof generation time at different locations.

Proof Verification. Figure 5 illustrates the verification times of the three LNs for different numbers of blocks. Our experiment indicates that the verification times of the LNPB and EPBC LNs remain constant irrespective of the blockchain size.


Instead of traversing the block headers, they only need to perform 1 hash and 1 modular operation (LNPB additionally requires an MHT root calculation, with very little extra effort). As the blockchain size increases, the verification time of the LNPB and EPBC LNs remains significantly lower than that of the SPV LN. Additionally, the verification time of the LNPB LN is slightly lower than that of EPBC, as EPBC uses the block header (640 bits) in its verification process, while LNPB uses the MHT root (128 bits).

Fig. 5. Transaction proof verification time.

7 Related Work

Nakamoto [6] proposed the SPV LN, which stores all block headers rather than the whole blockchain; however, its storage overhead still increases with the blockchain size. Frey et al. [8] split the UTXO set into shards and stored them in a distributed hash table (DHT) that LNs can query whenever they need to verify a transaction; the LN only needs to download the last 6 blocks, the UTXO fragment hash list, and the block hash list of the blockchain. Nagayama et al. [9] proposed the Trail architecture, in which account balances are managed in the same way as UTXOs using transaction outputs (TXOs); it maintains a TXO tree to track whether a TXO is used or unused, and the LN only stores the latest TXO tree root and the MHT proofs related to itself. Palai et al. [10] proposed a recursive block summary method in which the LN stores only the final summary block. Nadiya et al. [11] built upon Palai's work and used the Deflate compression algorithm to compress the data in summary blocks, further reducing the storage space occupied by the blockchain. Unfortunately, since the number of blocks keeps increasing, the storage overhead in [8-11] continues to grow. Moreover, these schemes are only effective in the Unspent Transaction Output (UTXO) model.


Ying et al. [12] proposed a low-overhead payment verification method in which the LN hands over transaction verification to a group of FNs running a consensus algorithm and trusts their final result. Kim et al. [13] proposed storage compression consensus (SCC), where the LN with the least remaining storage capacity compresses the block, stores the compressed block, and deletes the previous blocks. The schemes in [12, 13] achieve constant-size storage overhead for LNs. However, these LNs cannot verify transactions independently, since they trust the FNs and do not perform verification computations themselves, which weakens both security and the decentralized nature of the blockchain. Lei et al. [14] proposed a public blockchain verification protocol (EPBC). It uses an RSA accumulator to compress the blockchain into a constant-size summary, so the LN only needs to store the latest block summary, which greatly reduces the storage burden of LNs and achieves constant-size storage. However, because the LN cannot verify whether a transaction exists on a particular block, MFNs can exploit this vulnerability by producing a valid proof for a transaction that is not on the blockchain, using an on-chain block to deceive the LN.

We compared LNPB with existing LN schemes. LNPB uses an RSA accumulator to compress the blockchain and achieve constant-size storage, which significantly reduces the storage overhead of LNs. In addition, the LN in LNPB performs the verification computation itself, avoiding dependence on the FN for the final verification result. Finally, LNPB is suitable for both UTXO and non-UTXO models, so its application scenarios are broader. In general, the scheme in this paper enhances the security of LNs and the decentralization of the blockchain while keeping the storage overhead of LNs constant (Table 1).

Table 1. LN comparison.

| Scheme | Public/private | Independent verification | Storing content | Fixed length | Applicable scenario |
|--------|----------------|--------------------------|-----------------|--------------|---------------------|
| LNPB   | Public  | Y | Latest block summary | Y | All |
| [6]    | Public  | N | All block headers | N | All |
| [8]    | Public  | Y | Last 6 blocks, UTXO shard hash list, block hash list | N | UTXO |
| [9]    | Public  | Y | Latest TXO tree root, proof related to itself | N | UTXO |
| [10]   | Public  | Y | All summary blocks | N | UTXO |
| [11]   | Public  | Y | Compressed summary | N | UTXO |
| [12]   | Public  | N | N/A | Y | All |
| [13]   | Private | N | Compressed blocks, latest blocks | Y | All |
| [14]   | Public  | Y | Latest block summary | Y | All |

8 Conclusion

To reduce the storage overhead of LNs and improve their security, we propose LNPB. LNPB is suitable for low-end devices in blockchain systems, such as mobile and lightweight IoT devices. LNPB re-designs the block header to include a block summary (S) and an RSA modulus (N), and uses an RSA accumulator to calculate this constant-size summary. When an LN verifies a transaction, it only stores the summary of the latest block and interacts with the FN. The proof from the FN contains the MHT path of the transaction, so the LN can verify whether the transaction is on a given block, which enhances the security of the LN. In addition, we propose a new proof generation method based on the MMR, which makes proof generation faster. We simulate and evaluate LNPB and compare it with existing schemes. The results show that LNPB achieves the expected goals and saves storage and computation overheads.

References 1. Alam, S., Shuaib, M., Khan, W., et al.: Blockchain-based Initiatives: Current state and challenges. Comput. Netw. 198, 108395–108414 (2021) 2. Huang, H., Lin, J., Zheng, B., et al.: When blockchain meets distributed file systems: an overview, challenges, and open Issues. IEEE Access 8, 50574–50586 (2020) 3. Kim, S., Kwon, Y., Cho, S.: A survey of scalability solutions on blockchain. In: 2018 International Conference on Information and Communication Technology Convergence (ICTC). Jeju, Korea (South), pp. 1204–1207 (2018) 4. Shalini, S., Santhi, H.: A survey on various attacks in bitcoin and cryptocurrency. In: 2019 International Conference on Communication and Signal Processing (ICCSP). Chennai, India, pp. 0220–0224 (2019) 5. Nijsse, J., Litchfield, A.: A taxonomy of blockchain consensus methods. Cryptography. 4, 1–32 (2020) 6. Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system. Social Science Electronic Publishing (2008)


7. Yang, W., Aghasian, E., Garg, S., et al.: A survey on blockchain-based internet service architecture: requirements, challenges, trends, and future. IEEE Access. 7, 75845–75872 (2019) 8. Frey, D., Makkes, X., Roman, P., et al.: Bringing secure Bitcoin transactions to your smartphone. In Proceedings of the 15th International Workshop on Adaptive and Reflective Middleware (ARM). vol. 10, pp. 1–6 (2016) 9. Nagayama, R., Banno, R., Shudo, K.: Trail: a blockchain architecture for LNs. In: 2020 IEEE Symposium on Computers and Communications (ISCC), Rennes, France, pp. 1–7 (2020) 10. Palai, A., Vora, M., Shah, A.: Empowering light nodes in blockchains with block summarization. In: 2018 9th IFIP International Conference on New Technologies, Mobility and Security (NTMS), Paris, France, pp. 1–5 (2018) 11. Nadiya, U., Mutijarsa, K., Rizqi, Y.: Block summarization and compression in Bitcoin blockchain. In: 2018 International Symposium on Electronics and Smart Devices (ISESD), Bandung, Indonesia, pp. 1–4 (2018) 12. Ying, Z., Zhengyuan, H., Linpeng, J., et al.: LOPE: a low-overhead payment verification method for blockchains. Chin. J. Electron. 30, 349–358 (2021) 13. Kim, T., Noh, J., Cho, S.: SCC: storage compression consensus for blockchain in Lightweight IoT Network. In: 2019 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, pp. 1–4 (2019) 14. Lei, X., Lin, C., Zhimin, G., et al.: EPBC: efficient public blockchain client for lightweight users. In: Proceedings of the 1st Workshop on Scalable and Resilient Infrastructures for Distributed Ledgers (SERIAL 2017), pp. 1–6 (2017) 15. Huan, C., Yijie, W.: MiniChain: a lightweight protocol to combat the UTXO growth in public blockchain. J. Parallel Distrib. Comput. 143, 67–76 (2020)

Collaborative Face Privacy Protection Method Based on Adversarial Examples in Social Networks Zhenxiong Pan1,2 , Junmei Sun1(B) , Xiumei Li1,2 , Xin Zhang1 , and Huang Bai1 1 School of Information Science and Technology, Hangzhou Normal University,

Hangzhou 311121, China [email protected] 2 Key Laboratory of Cryptography of Zhejiang Province, Hangzhou 311121, China

Abstract. The face image de-identification methods commonly used to protect personal privacy on social networks cannot assure the usability of the shared images. Privacy protection methods based on adversarial examples meet this requirement, but they still carry a potential risk of information leakage, and the compression that social networks apply to images weakens their privacy protection effect. In this paper, we propose a collaborative privacy protection method based on adversarial examples for photo sharing services on social networks, called CSP3 Adv (Collaborative Social Platform Privacy Protection Adversarial Example). We use a perturbation transfer module, which avoids the information leakage caused by accessing the original image, and a frequency restriction module, which preserves the privacy of users' face images after social network compression. Experimental results show that CSP3 Adv achieves better privacy protection for various face recognition models and commercial API interfaces on different datasets. Keywords: Adversarial examples · Face verification · Privacy protection · JPEG compression

1 Introduction

Facial images contain sensitive biometric information that is closely related to personal and property security. With the rapid development of social media and network platforms, the widespread use of facial recognition technology poses a threat to users' privacy and security. Unauthorized malicious third parties engage in illegal information processing by capturing facial images from social platforms, infringing on the privacy of people's biometric information. Therefore, identity information needs to be protected from excessive identification and utilization while ensuring the usability of image sharing.

In the context of facial recognition technology, privacy protection requires that personal biometric data be used only for specific purposes [1]. De-identification methods are used to prevent the unauthorized use of facial recognition models and facial attribute analysis.


Research on de-identification privacy protection of facial images can be divided into three categories. (1) Face de-identification is a visual privacy protection method that mainly uses visual processing such as blurring and pixelation to destroy the readability of facial images [2]. However, studies have shown [3] that even after visual processing, identity information can still be correctly recognized. (2) Facial attribute de-identification methods change the facial feature areas of the image and affect feature extraction, for example by replacing or distorting facial feature areas, thus achieving privacy protection. (3) De-identification based on adversarial examples adds small adversarial perturbations to construct adversarial examples [4]. However, the transmission and temporary storage of images raise data security issues, because these methods require direct access to the original image. Moreover, most social platforms compress images uploaded by users to save storage space, which ultimately decreases the privacy protection effect of adversarial examples.

Fig. 1. CSP3 Adv face privacy protection process

In order to avoid information leakage during the privacy protection process while ensuring the availability of images after social platform compression, our contributions are as follows.

• We propose a collaborative privacy-preserving method, called CSP3 Adv, for image-sharing services in social networks, as shown in Fig. 1, which further enhances the effectiveness of privacy protection between the client and the server.
• We limit the distribution of adversarial perturbations by applying Gaussian filtering to the privacy-preserving adversarial examples during the training phase, which enhances their resistance to compression.
• Experiments with multiple face recognition models and commercial API interfaces on multiple face datasets show that CSP3 Adv achieves strong privacy protection.


2 Face Privacy Protection Related Work

2.1 Face De-identification

Face de-identification methods interfere with the feature information related to identity representation by means of image processing such as mosaics, blurring, and pixelation. Although these methods are simple and easy to implement, they are no longer effective in providing privacy protection as face recognition models continue to evolve. Wilber et al. [5] showed that images processed by blurring or reducing image brightness are still correctly recognized by face detection models.

2.2 Face Attribute De-identification

GAN [6] has been widely used in a range of generative tasks because of its ability to generate realistic and natural images. In the field of face recognition, He et al. [7] combined U-Net [8] with GAN to propose the image privacy protection algorithm PriGAN, but as the perturbation increases, the image exhibits a checkerboard effect. Wu et al. [9] proposed the de-identification framework PP-GAN, which retains the pixel structure of images while incorporating new identity information into the feature space. However, these methods all change the facial visual features of the original individual and do not satisfy the usability requirements for image sharing.

2.3 Face Adversarial Attack

Shan et al. [10] proposed Fawkes, which protects face images from unauthorized processing by adding small perturbations. However, preset factors such as the age and skin color of the target affect the privacy protection effect to some extent. Moreover, Fawkes requires direct access to images when constructing adversarial examples, which still poses a risk of information leakage. Zhang et al. [11] proposed the Adversarial Privacy-preserving Filter (APF), which uses adversarial gradient transformation for end-cloud collaborative adversarial attacks to avoid this problem.

2.4 Face Anti-compression Adversarial Attack

JPEG compression is a compression method based on the human visual system that uses the discrete cosine transform to suppress high-frequency information such as pixel intensity and tone [12]. JPEG compression is also used as an effective defense against adversarial examples. For example, Das et al. [13] proposed the SHIELD adversarial defense method, achieving good defense results after showing that JPEG compression can eliminate certain adversarial perturbations [14]. Current anti-compression adversarial attack methods in the field of face recognition can be divided into two categories: (1) replacing the non-differentiable JPEG compression process with differentiable mathematical functions or training compressed approximate network models, for example Shin and Song [15], who replaced the JPEG quantization process with an approximate differentiable function but obtained poor transferability to unknown JPEG compression; and (2) constructing low-frequency adversarial examples to obtain better anti-compression performance.


Zhang et al. [16] proposed the low-frequency LFAP and low-to-mid-frequency LMFAP adversarial attacks by training a source model that extracts low-frequency features, which yields good anti-compression performance. However, this method requires many compressed images to supplement the training of the source model. The CSP3 Adv proposed in this paper does not require modification of or supplements to the training set.

3 Method

3.1 Definition

Given a face image x, a face recognition model f_θ outputs an l-dimensional feature vector f_θ(x). Given another enrolled face image x_e of the same user, if the feature distance d(f_θ(x), f_θ(x_e)) is less than a specific threshold th of the face recognition model f_θ, the two images are considered to belong to the same person; otherwise, they are considered to belong to different individuals.

3.2 Model

CSP3 Adv consists of three modules during training, as shown in Fig. 2:

Fig. 2. CSP3 Adv privacy preservation training framework

(1) Perturbation acquisition module: based on the original image x and the enrolled image x_e, the user side first constructs the gradient perturbation s_o through a gradient attack on the lightweight face recognition model f_θ. (2) Perturbation transfer module: the adversarial transfer network T learns the corresponding adversarial perturbation s from s_o with respect to the complex face recognition model f_θ' on the server side. (3) Frequency restriction module: the adversarial perturbation s is added to the original image x to obtain a privacy-preserving adversarial example x_p, which is then smoothed by Gaussian filtering to obtain G(x_p). Finally, the feature distance loss between G(x_p) and x_e is calculated on the model f_θ'.


By minimizing this loss function, the parameters of the perturbation transfer network T are optimized and updated. The inference phase does not require the frequency restriction module.

Perturbation Acquisition Module. We select MobileFaceNet [17] as the lightweight face recognition model f_θ deployed on the user side to extract features and generate the gradient perturbation s_o within the perturbation range. Here, we use DI2-FGSM [18] as the gradient attack method, as shown in Eq. (1):

$$s_o^{n+1} = \mathrm{clip}_{\varepsilon}\{s_o^{n} + \alpha \cdot \mathrm{sign}(\nabla_x L(x, x_e; \theta))\} \quad (1)$$

where θ represents the model parameters, α the attack step size, ε the upper limit of the perturbation in l∞ space, sign(·) the sign function, and L(·) the loss function measuring the image distance metric, as shown in Eq. (2):

$$L(x, x_e; \theta) = -d\big(f_\theta(x), f_\theta(x_e)\big) \quad (2)$$

where f_θ(·) represents the image feature vector extracted by the face recognition model f_θ, and d(·) represents the Euclidean distance function.

Perturbation Transfer Module. Based on the user side's gradient perturbation s_o, the proposed perturbation transfer network T learns the corresponding adversarial perturbation s, as shown in Eq. (3). This end-to-end learning avoids information leakage and improves both efficiency and the privacy protection effect. Specifically, we select ArcFace [19] as the complex face recognition model f_θ'.

$$s = T(s_o) \quad (3)$$

For the end-to-end perturbation transfer network T, we adopt a fully convolutional network with an encoding-decoding structure as the main architecture, as shown in Fig. 3.

Fig. 3. Network structure of the perturbation transfer network

The encoding-decoding structure deeply extracts the perturbation information of face images and supplements the coordinate feature information related to face representation through skip connections with coordinate attention (CA) [20], thus reducing redundant perturbations and improving both image quality and the attack effect.

Frequency Restriction Module. To improve the privacy protection effect of adversarial examples after they are compressed by social networks, we introduce a frequency restriction module, which restricts high-frequency perturbations through Gaussian filtering.


Due to the non-differentiability of the JPEG quantization process, JPEG compression cannot be used directly in gradient-based training to improve the anti-compression performance of adversarial examples. Although Gaussian filtering cannot reduce the storage space, as a differentiable linear low-pass filter it can filter out high-frequency details of images to some extent. The weights of the Gaussian filter follow a two-dimensional normal distribution, and the filter has different smoothing effects for different weights, pixels, and Gaussian kernel windows, as shown in Eq. (4):

$$w(x, y) = \frac{1}{2\pi\sigma^2} e^{-\frac{(x-n)^2 + (y-n)^2}{2\sigma^2}} \quad (4)$$

where σ is the standard deviation: the larger σ is, the more uniform the weight distribution and the smoother the result; the larger the Gaussian kernel window n is, the stronger the filtering effect and the more blurred the image. In this paper, by applying image compression and Gaussian filtering with different windows and standard deviations to the training dataset and minimizing the grayscale difference under the l1 norm, we maximize the grayscale similarity between Gaussian filtering and the target compression method and thus obtain the most suitable Gaussian kernel parameters. We use JPEG80 (PSNR = 40 dB) as the target compression method. Based on these experiments, we use a Gaussian kernel with a window size of 3 × 3 and a standard deviation of 3 to filter the adversarial examples generated by CSP3 Adv and improve their anti-compression performance. We apply Gaussian filtering to the adversarial example x_p with added adversarial perturbation s to obtain G(x_p), and iteratively train CSP3 Adv by minimizing the loss function shown in Eq. (5):

$$\min \left\| f_{\theta'}\big(G(x_p)\big) - f_{\theta'}(x_e) \right\|_2 \quad (5)$$

where f_θ'(·) represents the feature vector extracted from the image by the server-side face recognition model and ||·||_2 represents the Euclidean distance. By minimizing this objective, the perturbation is concentrated in the low-frequency area of the original image. In the test phase, the perturbation acquisition module and the perturbation transfer module of CSP3 Adv are deployed on the user side and the social platform server side, respectively. Users can achieve effective privacy protection by uploading CSP3 Adv privacy-preserving adversarial examples to social networks, even after JPEG compression by the social platform.
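For concreteness, the following Python sketch shows the two numerical pieces described above: one DI2-FGSM-style update of the user-side perturbation (Eq. (1)) and the normalized Gaussian kernel of Eq. (4) used by the frequency restriction module. It is a minimal sketch that assumes a gradient of the loss in Eq. (2) is already available; the function names are illustrative and not from the paper's code.

```python
import numpy as np

def gradient_step(s_o, grad_x, alpha=2.0, eps=8.0):
    """One iteration of Eq. (1): s_o <- clip_eps(s_o + alpha * sign(dL/dx)),
    keeping the perturbation inside the l-infinity ball of radius eps."""
    return np.clip(s_o + alpha * np.sign(grad_x), -eps, eps)

def gaussian_kernel(n=3, sigma=3.0):
    """2-D Gaussian weights of Eq. (4) on an n x n window, normalized so the
    filter preserves average intensity (normalization is our assumption)."""
    ax = np.arange(n) - (n - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    w = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
    return w / w.sum()

# Example: smooth a toy 8x8 "image" with the 3x3, sigma=3 kernel (zero padding).
img = np.random.rand(8, 8)
k = gaussian_kernel(3, 3.0)
pad = np.pad(img, 1)
smoothed = np.array([[np.sum(pad[i:i + 3, j:j + 3] * k)
                      for j in range(8)] for i in range(8)])
```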

4 Experiment

4.1 Experimental Setup

Datasets. During the training phase, we use MS-Celeb-1M [21] as the training set and the LFW (Labeled Faces in the Wild) dataset [22] as the test set. During the inference phase, the privacy protection of CSP3 Adv adversarial examples is verified on the LFW, CALFW, CPLFW, AgeDB-30 and CFP-FP datasets.


The datasets consist of image pairs of different individuals: one image of each subject is used as the original image to be protected, and the other image is used as the enrolled face image.

• MS-Celeb-1M: MS-Celeb-1M is a dataset of 10 million face images of 100K celebrities collected from the web for the MSR IRC competition. We randomly select 50,000 images of 1,000 celebrities for training.
• LFW: the LFW dataset contains 13,233 face images collected from the web, covering 5,749 identities, 1,680 of which have two or more images. In the standard LFW evaluation protocol, verification accuracies are reported on 6,000 face pairs. We select 6,000 images to form 3,000 image pairs as the test set.

Face Recognition Models. We choose MobileFaceNet as the lightweight face recognition model deployed on the user side and ArcFace as the complex face recognition model deployed on the server side. Five SOTA face recognition models and commercial API interfaces (FaceNet [23], VGG-Face [24], Insightfaceiresnet100, BaiduAPI, and Face++API) are used to verify the privacy protection effect of CSP3 Adv.

Evaluation Metrics. To measure the privacy protection effect and image usability of face images, we evaluate CSP3 Adv from two aspects: the attack effect and the image quality of the adversarial examples. To measure the effectiveness of privacy protection, we use the Attack Success Rate (ASR) as the evaluation index. Taking image pairs x and x_e as an example, ASR is calculated as shown in Eq. (6):

$$ASR = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\big(f(x, x_e) < \tau\big) \quad (6)$$

where N represents the number of test images, f represents the distance measurement function of the face recognition model, and τ represents the Euclidean distance discrimination threshold obtained by the face recognition model at a False Acceptance Rate (FAR) of 1% for face verification. To measure the usability of the privacy-preserving adversarial examples, we use SSIM and PSNR as image quality metrics. SSIM quantifies the structural similarity between two images and ranges over [0, 1]. PSNR measures the ratio between the maximum signal value and the background noise, in dB. Generally, SSIM > 0.8 and PSNR > 30 indicate that there is no significant visual difference, i.e., the images look real and natural.

Setting. The CSP3 Adv network uses the ADAM optimizer with a learning rate of 0.0001, and each mini-batch contains 100 face images. All experiments are conducted with TensorFlow r1.14.0 on an NVIDIA Corporation TU102 GPU. During the training phase, the embedding feature dimensions of MobileFaceNet and ArcFace are set to 192 and 512, respectively. The perturbation limit in CSP3 Adv is ε = 8, which is difficult for human observers to detect [20]. The step size is α = 2 and the number of iterations is T = 40. The Euclidean distance discrimination thresholds for the LFW, CALFW, CPLFW, AgeDB-30 and CFP-FP datasets are 1.12, 1.10, 1.13, 1.16 and 1.29, respectively.
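As a small illustration of the threshold-based verification of Sect. 3.1 and the ASR of Eq. (6), the following Python sketch computes Euclidean feature distances for a batch of image pairs and the fraction of pairs meeting the threshold condition as printed in Eq. (6); the embeddings and values below are placeholders, not outputs of the actual models.

```python
import numpy as np

def euclidean_distance(feat_a, feat_b):
    """Feature distance d(f(x), f(x_e)) used for face verification."""
    return np.linalg.norm(feat_a - feat_b, axis=-1)

def asr_eq6(feats_x, feats_xe, tau):
    """Fraction of pairs satisfying the condition written in Eq. (6);
    swap the comparison if success is instead defined as distance >= tau."""
    d = euclidean_distance(feats_x, feats_xe)
    return float(np.mean(d < tau))

# Placeholder 512-dimensional embeddings for 3,000 test pairs (LFW setting).
rng = np.random.default_rng(0)
feats_x = rng.normal(size=(3000, 512))
feats_xe = rng.normal(size=(3000, 512))
print(asr_eq6(feats_x, feats_xe, tau=1.12))
```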


4.2 Experimental Results and Analysis

Privacy Protection. To verify that the privacy-preserving adversarial examples of CSP3 Adv have good privacy protection effects on different datasets, we conduct experiments on the LFW, CALFW, CPLFW, AgeDB-30 and CFP-FP datasets using the white-box Arcface face recognition model, as shown in Fig. 4.

Fig. 4. The ASR (%) of the privacy-preserving adversarial examples generated by CSP3 Adv against the Arcface face recognition model for different datasets

We can learn from Fig. 4 that CSP3 Adv achieves an ASR of over 99% against Arcface on most datasets. It also achieves an attack success rate of over 95% on the CFP-FP dataset, whose training data acquisition conditions are poor, thus achieving cross-dataset privacy protection for Arcface. In addition, the privacy protection adversarial examples generated for each dataset all look real and natural, meeting the usability requirements of social platform images.

To verify the privacy protection effectiveness of CSP3 Adv on black-box models, we use multiple face recognition models and commercial API interfaces and select the Fawkes method as a comparison; the experimental results are shown in Table 1. Fawkes has three optional schemes, Low, Mid and High, corresponding to different degrees of perturbation. We can learn from Table 1 that Fawkes in High mode achieves more than 91% privacy protection on FaceNet, but its privacy protection effect on other black-box face recognition models is poor; its attack success rate on the Insightfaceiresnet100 model is below 53%, so the attack generalizes poorly. Under the premise of image usability (SSIM > 0.8, PSNR > 30), CSP3 Adv achieves better privacy protection on all the black-box face recognition models: the attack success rate is over 98% on FaceNet and BaiduAPI, over 80% on Insightfaceiresnet100 and Face++API, and over 60% on VGG-Face, which is lower because VGG-Face uses a deep backbone network different from the white-box model Arcface.

Anti-compression. To verify the privacy protection effect of CSP3 Adv after being compressed by JPEG on social networks, we use JPEG compression with different compression quality factors as a defense method to simulate an unknown compression process. The experimental results are shown in Table 2.


Table 1. Attack success rate (%) of privacy protection adversarial examples against various black-box face recognition models.

| Methods | Datasets | FaceNet | Insightfaceiresnet100 | VGG-Face | Baidu API | Face++API |
|---------|----------|---------|-----------------------|----------|-----------|-----------|
| Fawkes | LFW | 91.66 | 17.83 | 27.89 | 86.29 | 48.91 |
| | CALFW | 96.05 | 28.16 | 48.02 | 95.86 | 63.65 |
| | CPLFW | 96.09 | 48.40 | 39.26 | 95.05 | 70.55 |
| | Agedb-30 | 98.76 | 37.30 | 62.77 | 98.89 | 73.88 |
| | CFP-FP | 99.39 | 52.66 | 58.43 | 98.65 | 88.07 |
| CSP3 Adv | LFW | 100 | 80.66 | 61.37 | 98.23 | 87.09 |
| | CALFW | 99.96 | 83.43 | 76.82 | 99.59 | 93.56 |
| | CPLFW | 99.26 | 91.80 | 65.55 | 99.25 | 90.00 |
| | Agedb-30 | 99.93 | 89.80 | 86.93 | 99.93 | 97.41 |
| | CFP-FP | 100 | 98.76 | 73.80 | 99.59 | 96.25 |

Table 2. The ASR (%) of CSP3 Adv adversarial examples compressed with different JPEG compression quality factors against the Arcface face recognition model.

| Datasets | JPEG90 | JPEG80 | JPEG70 | JPEG60 | JPEG50 | JPEG40 | JPEG30 | JPEG20 |
|----------|--------|--------|--------|--------|--------|--------|--------|--------|
| LFW | 99.17 | 98.73 | 98.27 | 97.90 | 97.33 | 96.33 | 94.10 | 86.33 |
| CALFW | 99.17 | 98.77 | 98.53 | 98.23 | 97.87 | 97.40 | 95.70 | 91.67 |
| CPLFW | 98.87 | 98.83 | 98.33 | 98.20 | 97.93 | 97.27 | 96.00 | 93.08 |
| AgeDB-30 | 99.03 | 98.87 | 98.67 | 98.53 | 98.17 | 98.07 | 96.83 | 93.53 |
| CFP-FP | 91.80 | 91.20 | 90.49 | 89.56 | 87.37 | 85.09 | 79.66 | 92.23 |

According to the data in Table 2, as the JPEG compression quality factor decreases, the attack effect of the adversarial examples decreases. CSP3 Adv still maintains a good privacy protection effect even under JPEG20 compression. JPEG compression also reduces image quality. The privacy protection adversarial examples after JPEG compression with different quality factors, together with their image quality evaluation results, are shown in Fig. 5, where the two numbers represent the SSIM and PSNR results, respectively. As the compression quality factor decreases, CSP3 Adv privacy protection adversarial examples remain real and natural even under the commonly used JPEG80 and JPEG60 settings, satisfying the usability requirements of social networks.


Fig. 5. Privacy-preserving adversarial example image quality of CSP3 Adv under JPEG compression with different quality compression factors

5 Conclusion

In this paper, we propose CSP3 Adv, a privacy protection method for face verification based on adversarial examples, which can effectively protect the privacy and security of face images. CSP3 Adv avoids direct image acquisition by the server and information leakage through perturbation transfer coordinated between the user and the social network server. Meanwhile, we make the privacy-preserving images resistant to compression: the privacy-protected adversarial examples generated by CSP3 Adv still provide a good privacy protection effect after JPEG compression by social networks and satisfy the availability requirements of image sharing on social networks. Experimental data show that the privacy-protected adversarial examples generated by CSP3 Adv have good cross-model and cross-dataset privacy protection effects and can effectively transfer to various face recognition models as well as BaiduAPI and Face++API. In the future, we will conduct experimental comparisons with other anti-compression privacy preservation methods.


References 1. Othman, A., Ross, A.: Privacy of facial soft biometrics: suppressing gender but retaining identity. In: Agapito, L., Bronstein, M.M., Rother, C. (eds.) ECCV 2014. LNCS, vol. 8926, pp. 682–696. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16181-5_52 2. Hyvärinen, A., Oja, E.: Independent component analysis: algorithms and applications. Neural Netw. 13(4–5), 411–430 2000. https://doi.org/10.1016/s0893-6080(00)00026-5 3. Gross, R., Airoldi, E., Malin, B., Sweeney, L.: Integrating utility into face de-identification. In: Danezis, G., Martin, D. (eds.) PET 2005. LNCS, vol. 3856, pp. 227–242. Springer, Heidelberg (2006). https://doi.org/10.1007/11767831_15 4. Vakhshiteh, F., Nickabadi, A., Ramachandra, R.: Adversarial attacks against face recognition: a comprehensive study. IEEE Access 9, 92735–92756 (2021). https://doi.org/10.1109/access. 2021.3092646 5. Wilber M J., Shmatikov, V, Belongie, S.: Can we still avoid automatic face detection. In: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1–9. IEEE. New York (2016). https://doi.org/10.1109/wacv.2016.7477452 6. Goodfellow, I., Pouget-Abadie, J., Mirza, M.: Generative adversarial networks. Commun. ACM 63(11), 139–144 (2020) 7. He, Y., Zhang, C., Zhu, X.: Generative adversarial network-based image privacy protection algorithm. In: Tenth International Conference on Graphics and Image Processing (ICGIP 2018), pp. 635–645. SPIE, Chengdu (2019). https://doi.org/10.1117/12.2524274 8. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-31924574-4_28 9. Wu, Y., Yang, F., Xu, Y., Ling, H.: Privacy-protective-gan for privacy preserving face deidentification. J. Comput. Sci. Technol. 34(1), 47–60 (2019). https://doi.org/10.1007/s11390019-1898-8 10. Shan, S., Wenger, E., Zhang, J.: Fawkes: protecting privacy against unauthorized deep learning models. In: Proceedings of the 29th USENIX Security Symposium, p. 16. USENIX Association, Berkeley (2020) 11. Zhang, J., Sang, J., Zhao, X.: Adversarial privacy-preserving filter. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 1423–1431. Association for Computing Machinery, Seattle (2020). https://doi.org/10.1145/3394171.3413906 12. Prangnell, L.: Visible Light-Based Human Visual System Conceptual Model. arXiv preprint arXiv:1609.04830 (2016) 13. Das, N., Shanbhogue, M., Chen, S.T.: Shield: Fast, practical defense and vaccination for deep learning using jpeg compression. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 196–204. Association for Computing Machinery, New York (2018) 14. Das, N., Shanbhogue, M., Chen, S.T.: Keeping the bad guys out: Protecting and vaccinating deep learning with jpeg compression. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 196–204. Association for Computing Machinery, New York (2017). https://doi.org/10.1145/3219819.3219910 15. Shin, R., Song, D.: Jpeg-resistant adversarial images. In: NIPS 2017 Workshop on Machine Learning and Computer Security, pp. 1–8. Long Beach (2017) 16. Zhang, J., Yi, Q., Sang, J.: JPEG compression-resistant low-mid adversarial perturbation against unauthorized face recognition system. arXiv preprint arXiv:2206.09410 (2022) 17. 
Li, S., Zhang, H., Jia, G., Yang, J.: Finger vein recognition based on weighted graph structural feature encoding. In: Zhou, J., et al. (eds.) CCBR 2018. LNCS, vol. 10996, pp. 29–37. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-97909-0_4


18. Xie, C., Zhang, Z., Zhou, Y.: Improving transferability of adversarial examples with input diversity. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2730–2739. IEEE, Long Beach (2019). https://doi.org/10.1109/cvpr.2019. 00284 19. Deng, J., Guo, J., Xue, N.: Arcface: Additive angular margin loss for deep face recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4690–4699. IEEE, Long Beach (2019). https://doi.org/10.1109/cvpr.2019.00482 20. Hou, Q., Zhou, D., Feng, J.: Coordinate attention for efficient mobile network design. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13713– 13722. IEEE, Nashville (2021). https://doi.org/10.1109/cvpr46437.2021.01350 21. Guo, Y., Zhang, L., Hu, Y., He, X., Gao, J.: MS-Celeb-1M: a dataset and benchmark for large-scale face recognition. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 87–102. Springer, Cham (2016). https://doi.org/10.1007/978-3319-46487-9_6 22. Huang, G.B., Mattar, M., Berg, T.T.: Labeled faces in the wild: a database for studying face recognition in unconstrained environments. In: Workshop on faces in ‘Real-Life’ Images: detection, alignment, and recognition. Marseille (2008) 23. Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 815–823. IEEE, Boston (2015). https://doi.org/10.1109/cvpr.2015.7298682 24. Parkhi, O.M., Vedaldi, A., Zisserman, A.: Deep face recognition. In: Proceedings of the British Machine Vision Conference (BMVC), pp. 41.1–41.12. BMVA Press, Swansea (2015). https:// doi.org/10.5244/C.29.41

Duty-Based Workflow Dynamic Access Control Model Guohong Yi1 and Bingqian Wu2(B) 1 Computer Science and Engineering Department, Wuhan Institute of Technology,

Wuhan 430205, Hubei, China 2 Hubei Provincial Key Laboratory of Intelligent Robot, Wuhan Institute of Technology,

Wuhan 430205, Hubei, China [email protected]

Abstract. Access control plays an important role in the era of big data as a key technology for ensuring secure data sharing. Achieving access control in workflow systems places higher demands on the dynamism and granularity of the model. We propose a duty-based access control model that decomposes workflow tasks into duties for dynamic authorization and combines atomic operations for effective duty division. Duties are assigned to users to achieve the separation of users and permissions. The dynamic authorization mechanism and the duty separation constraint mechanism can better solve the information security and access control problems of flexible workflows in dynamic environments. Keywords: Access Control · Workflow · Dynamic Authorization · Role Based Access Control

1 Introduction

In today's world, where the Internet is widely used, sharing data securely is an important requirement, and access control is a key technology for ensuring secure data sharing [1]. Access control refers to the management and control of access to system resources and information, protecting them from unauthorized access, use, modification or destruction by means of authentication and authorization. Workflow [2] is the automation of all or part of a business process by passing execution documents, information, or tasks between different participants according to predefined rules. One of the most challenging issues in workflow management systems is access control, which not only controls user access to static resources but also restricts user access to dynamic tasks in the process. The traditional access control models are Discretionary Access Control (DAC) and Mandatory Access Control (MAC), which associate users directly with permissions and cannot flexibly adapt to the frequent changes of work assignees in workflow systems. The model most commonly used in practical applications is Role-Based Access Control (RBAC) [3], first proposed by Ferraiolo and Kuhn, from which the conceptual models RBAC0-RBAC3 were gradually derived; these support different permission complexities and add more detailed authorization principles.


However, RBAC models need to define roles in different systems and then manage roles, permissions, and users in a unified way, which increases the complexity of the system. Task-Based Access Control (TBAC) obtains the corresponding permission by activating a task, and the permission changes continuously according to the specific execution status of the task in the workflow; however, it requires a complex authorization structure and authorization mechanism and does not consider the inheritance relationship between roles.

To make the model design meet the specific needs of workflow, distinguish it from other systems, prevent security problems such as unauthorized access and user fraud, and improve the efficiency and flexibility of the permission model, a Duty-Based Access Control (DBAC) model is proposed, which combines RBAC and TBAC. It separates users and permissions through roles and duties and combines atomic operations to divide tasks into duties dynamically and effectively, in which roles are relatively fixed and static, while permissions are dynamically assigned to users through duties. A task is divided into one or more duties, with each duty corresponding to an atomic operation set. Users obtain permissions dynamically by reading the duty profile, and tasks are divided into duties by the process profile.

The rest of the paper is organized as follows. Section 2 summarizes related work and compares our approach with previous work. Section 3 analyzes the problems of RBAC and TBAC. Section 4 introduces the duty-based access control model proposed in this paper, together with the separation-of-duties and dynamic authorization mechanisms. The following section discusses the performance of the proposed approach and compares it with traditional models, and the final section summarizes the paper and looks ahead to future work.

2 Related Work

To improve the efficiency of access control and ensure the security of workflow systems, Jin et al. [5] proposed the first formal access control model, RABAC. They extend RBAC with user and object attributes and add a component called a permission filtering policy (PFP), which requires the filter function to be specified as a Boolean expression over user and object attributes. Their solution helps to solve the role explosion problem and facilitates user-role assignment. However, the method does not include environment attributes, is not suitable for systems with frequently changing attributes, and does not adapt to workflow environments. Li et al. [6] proposed a collaborative flexible workflow that introduces the concept of a team and combines the static authorization of roles with the dynamic control of tasks, realizing hierarchical authorization management of workflows. To ensure the correctness and performance of the access control strategy, Ma et al. [7] proposed an access control model based on risk-aware topics to address insufficient authorization in databases; it uses topics to represent the content relationship between users and data and grants users access rights according to their historical behavior and access requests, but it does not control permissions according to task execution status. Content-based access control [8] is a content-centric model; it uses internal relationships between different text files to grant users file-level access rights.


However, as a file-level access control model, it cannot divide the paragraphs of a file into "accessible" and "inaccessible" parts when granting user access. Rajpoot et al. [10] propose an attribute-enhanced role-based access control model that provides content-based authorization while keeping the approach role-oriented, retaining the advantages of RBAC by allowing an object's attributes to be used to specify permissions instead of only the object's identifier. Liu et al. [11] designed a workflow-based enterprise integrated automation system and its application model based on access control technology, reflecting the superiority of workflow-based enterprise integrated automation systems. Tarkhanov et al. [12] extend role-based security policies with a set of permissions, schemes, and algorithms that allow simultaneous execution of policies, solving the problem of object versioning and dependence on the state of linked objects.

By analyzing the relevant research on access control models and workflow permission control at home and abroad, it can be seen that existing approaches have not considered the impact of dynamic authorization of atomic operations in workflow access control and authorization management systems and cannot decouple the organizational model from the business model well; the related extension models are also imperfect. The DBAC model studied in this paper is designed precisely to solve these problems and can be applied well in workflow systems.

3 Problem Raising

Compared with traditional access control models that directly assign permissions to users, the RBAC model adds the role attribute: through the assignment of users to roles and roles to permissions, users and permissions are linked indirectly. After authentication, the corresponding user roles are activated and a session is opened that describes all the roles corresponding to the user and their permission constraints. However, the relationships between users, roles, and permissions in the RBAC model are statically specified, which cannot meet dynamically changing system requirements and does not support fine-grained authorization.

Assume a workflow with 3 tasks and 3 authorized operation permissions. Using the role-based access control model to authorize this workflow, as shown in Fig. 1, operations are combined into permission sets, which are then combined with task sets under priority constraints and arranged into role sets; in the end there may be 15 different roles. Following this reasoning, when the approval workflow has n tasks and m kinds of operation permissions, there are $C_n^1 P_1^1 + C_n^2 P_2^2 + \cdots + C_n^n P_m^m$ authorization situations. The more permissions or tasks there are, the more roles may be authorized. RBAC supports fine-grained authorization and access control only by adding a large number of roles, which clearly increases management cost and complexity, easily causes role explosion, and makes it difficult to meet the dynamic access control requirements of workflow scenarios.


Fig. 1. Schematic diagram of RBAC authorization

it requires a complex authorization structure and mechanism, it doesn’t consider the inheritance relationship between roles. In general, it is still in the stage of theoretical research and is not suitable for practical application.

4 The Proposed DBAC

4.1 Overview of the Proposed Model

The schematic diagram of the DBAC model is shown in Fig. 2. It combines the role-based and task-based access control models to separate users and permissions through roles and duties, and it combines atomic operations to divide tasks into duties dynamically and effectively, where roles are relatively fixed, static roles and permissions are dynamically assigned to users through duties. Tasks are decomposed into one or more duties, each duty corresponds to one atomic operation set, users dynamically obtain permissions by reading duty profiles, and tasks are decomposed into duties through process profiles. Both configuration files are in JSON format and record the mappings between users and duties and between tasks and duties as key-value pairs; a sketch of such profiles is given below.

When a business approval process starts, the process is first decomposed into specific tasks, which are then effectively and dynamically divided into duties according to atomic operations; the duties are assigned to roles only when a task reaches a certain execution state, rather than statically before the task is executed as in RBAC. This ensures that permissions are minimal, fine-grained collections of atomic operations. Authorization can also follow the task status dynamically, and permissions can be withdrawn after the task is completed, changing in real time with the actual execution of the workflow and avoiding the role explosion problem.


Fig. 2. Duty-based access control model.
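To illustrate the idea, the following Python sketch shows what such JSON-style duty and process profiles might look like and how a user's permissions could be resolved dynamically from them; the field names and the helper `permissions_for` are illustrative assumptions, not the paper's actual schema.

```python
import json

# Hypothetical process profile: each task is split into duties.
process_profile = json.loads("""
{
  "task_approve_expense": ["duty_check_receipt", "duty_sign_off"]
}
""")

# Hypothetical duty profile: each duty names its group, the task state in
# which it becomes active, and the minimal set of atomic operations on resources.
duty_profile = json.loads("""
{
  "duty_check_receipt": {"group": "finance", "state": "running",
                         "permissions": [["read", "receipt"]]},
  "duty_sign_off":      {"group": "finance", "state": "running",
                         "permissions": [["approve", "expense_form"]]}
}
""")

def permissions_for(task, task_state, profile=duty_profile, proc=process_profile):
    """Return the atomic (operation, resource) pairs granted for a task once it
    reaches the given state; the list is empty until the state matches."""
    perms = []
    for duty in proc.get(task, []):
        d = profile[duty]
        if d["state"] == task_state:
            perms.extend(tuple(p) for p in d["permissions"])
    return perms

print(permissions_for("task_approve_expense", "running"))
# Permissions are revoked simply by the task no longer matching the duty's state.
```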

an abstraction for performing tasks, and associates the organizational model with the process model; on the other hand, it is a collection of permissions, tasks and groups. Dynamic authorization is performed when the workflow task reaches a certain state, so that the user has the minimum authority for a specific task of an group when performing the task. This way of separating users and permissions through duties and roles enables permissions to be an atomic operation set, ensuring the principle of least privilege. The definition and description of the key elements and the relationship between elements in the model are as follows: (1) Session set: S = {s1 , s2, · · · , sn }, si (1 ≤ i ≤ n) represents a session. After the user is activated in the system, a session will be created. The session records the specific operations when the system is running, and the user is associated with the role through the session. si = (sidi , uidi , ridi , sti , eti , statui ), represent session id, user id, role id, session start time and end time and session status respectively. (2) Task status set: Ts = {ts1 , ts2 , · · · , tsn }, tsi (1 ≤ i ≤ n) is the state of a task. During the lifetime of a task, the task may be in different states: existing, running, waiting, completed, failed, etc. In this paper, only the waiting, running and completed states of a task are considered in the access control of workflow. A task can only correspond to one state at a certain moment, and different tasks can correspond to the same state. (3) Duty set: D = {d1 , d2 , · · · , dn }, di (1 ≤ i ≤ n) is a ternary set, di = (gm , (tm ,tsm ), pm ), gm ⊆ G is the set of groups associated with the duty di , tm ⊆ T is the set of tasks associated with the duty di , tsm ⊆ TS is the current status of the task, pm ⊆ P is the minimum set of permissions required to complete the duty di。 (4) Permision set: P = {p1 , p2, · · · , pn }, pi (1 ≤ i ≤ n) is the ability of the user to perform atomic operations on tasks, which consists of atomic operations and resources, pi = (op,obj). The set domain is the Cartesian product of the two, it can be expressed as P = (op × obj). Atomic operation set: Op = {op1 , op2, · · · , opn }, opi (1 ≤ i ≤ n) is a collection of atomic operations of users on resources at the time of duty authorization. In this


study, the atomic operations are initiating a process, reading, editing, adding, deleting, and approving operations on resources. Object resource set: Obj = {obj1, obj2, ..., objn}, where obji (1 ≤ i ≤ n) is an object resource to which the subject has access at the time of duty authorization.

(5) Group hierarchy (GH): GH ⊆ G × G is a partial order on the set of groups, noted as ≺; gi ≺ gj indicates that gi is the parent group of gj. Superiors and subordinates are generally in a one-to-many relationship.

(6) Task-duty assignment relationship (TD): TD ⊆ T × D indicates that task ti ∈ T is assigned to duty di ∈ D. This relationship is one-to-many: a task can be divided into multiple duties, but a duty corresponds to only one task, noted as ti → di.

(7) Duty-role assignment relationship (DR): DR ⊆ D × R indicates that duty di ∈ D is assigned to role ri ∈ R. This relationship is many-to-many, noted as di → ri.

4.3 Separation of Duties Verification Mechanism

The most basic requirement of access control in a workflow system is the separation of duties. The principle of separation of duties is a constraint that reduces the potential danger of fraud. It is mainly divided into two types [13]: static separation of duty (S-SoD) and dynamic separation of duty (D-SoD). S-SoD considers whether the delegating and delegated parties have conflicting duties; a user or a pair of conflicting users cannot be assigned conflicting roles. D-SoD means that a pair of mutually exclusive roles cannot be activated by the same user or a pair of conflicting users in a session at the same time. S-SoD is too strict in practical application scenarios because a user often assumes multiple roles at the same time; to satisfy S-SoD, it may be necessary to assign multiple employees to one workflow, which increases the cost of human resources. D-SoD assigns executors based on specific duty instances and session information, which better fits the access control requirements of approval workflows. In an approval workflow environment, permissions, roles, tasks, users, and duties may conflict. DBAC focuses on the conflict relationships between duties and defines the following two common conflict relationships according to the characteristics of the approval workflow:

Definition 1 Mutually Exclusive Duty (MED): If duty di and duty dj cannot be executed by the same user, the two duties are called mutually exclusive duties, denoted as di ⊗ dj.

Definition 2 Binding Duty (BD): If duty di and duty dj must be performed by the same user, the two duties are called binding duties, denoted as di ≡ dj.

A process instance executed in an approval workflow can be formalized as a duty instance set Id. Dynamic separation of duties relies on the separation of duty instances, for which the following four mapping functions are defined. Based on these mapping functions, two dynamic separation-of-duties authorization constraint rules are derived for the two types of duty conflicts. The duty instance mapping function maps a duty instance to the corresponding duty, noted as duty: Id → D. The duty execution role mapping function maps each duty instance to a role, noted as role: Id → R.


The duty executor mapping function maps each duty instance to an executor, noted as executor: Id → U. The user role mapping function maps roles to users, noted as user: R → U.

Rule 1: Mutually exclusive constraint rule

∀ idi, idj ∈ Id, rx, ry ∈ R, i ≠ j, x ≠ y, u ∈ U:
(di = duty(idi)) ∧ (dj = duty(idj)) ∧ (rx ∈ role(idi)) ∧ (ry ∈ role(idj)) ∧ (u ∈ user(rx)) ∧ (u ∈ user(ry)) ∧ (u = executor(idi)) ∧ (di ⊗ dj) ⇒ u ≠ executor(idj)

Rule 1 states that if duty di and duty dj are mutually exclusive duties, di is assigned to role rx and dj is assigned to role ry for execution, both roles rx and ry are assigned to user u, the duty instance idi of di and the duty instance idj of dj belong to the same duty instance set Id, and user u has already executed idi, then u cannot also be assigned to execute idj.

Rule 2: Binding constraint rule

∀ idi, idj ∈ Id, rx, ry ∈ R, i ≠ j, x ≠ y, u ∈ U:
(di = duty(idi)) ∧ (dj = duty(idj)) ∧ (rx ∈ role(idi)) ∧ (ry ∈ role(idj)) ∧ (u ∈ user(rx)) ∧ (u ∈ user(ry)) ∧ (u = executor(idi)) ∧ (di ≡ dj) ⇒ u = executor(idj)

Rule 2 states that if duty di and duty dj are binding duties, di is assigned to role rx and dj is assigned to role ry for execution, both roles rx and ry are assigned to user u, the duty instance idi of di and the duty instance idj of dj belong to the same duty instance set Id, and user u has already executed idi, then u should also be assigned to execute idj.

4.4 Dynamic Authorization

Secure dynamic authorization and support for least privilege are also important requirements for access control in workflow systems. The DBAC model regards the duty as the smallest unit of authorization and revocation, which corresponds to a set of atomic operations. In an actual workflow, however, the execution status of each task changes constantly during execution, and the duties assigned to users should change accordingly. Linking the granting and revocation of permissions to task execution states better supports the principle of least privilege [14]. Dynamic authorization for workflow tasks is defined as a mapping from duty sets to role sets, and duty sets come from the mapping of group sets, task sets, and permission sets to duties. The authorization mapping function can be defined as Authorization = (GD × ITS × PD) × DR, where ITS is the mapping relationship between tasks and task states,


which can be formally expressed as the following matrix:

        ⎡ tts11  tts12  tts13 ⎤
  ITS = ⎢ tts21  tts22  tts23 ⎥
        ⎢  ...    ...    ...  ⎥
        ⎣ ttsn1  ttsn2  ttsn3 ⎦ n×3                                    (1)

The number of matrix rows n is the number of tasks, and the number of matrix columns, 3, is the number of task states. In the matrix, ttsij = 1 means that task ti is currently in state tsj, and ttsij = 0 means that it is not. At any moment a task is in exactly one state, so each row vector of the ITS matrix has one and only one component equal to 1. Through the matrix mapping above, a matrix is finally obtained that represents the authority granted to the role.
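As an illustration of this state-gated authorization (not part of the paper's formal model; the task names, state labels and helper functions below are ours), a minimal Python sketch of the ITS matrix and the resulting grant and withdrawal of a duty's permissions could look as follows:

# Minimal sketch (our own, not the paper's implementation) of the ITS matrix.
# Task states follow the paper: waiting (ts1), running (ts2), completed (ts3).
STATES = ["waiting", "running", "completed"]

def its_matrix(current_states):
    """Build the n x 3 ITS matrix: row i holds a single 1 in the column of the
    state task t_i is currently in, and 0 elsewhere."""
    return [[1 if s == state else 0 for s in STATES] for state in current_states]

def permissions_for(duty, task_index, its):
    """Grant the duty's minimal permission set only while its task is in the
    state the duty was defined for; otherwise grant nothing."""
    col = STATES.index(duty["task_state"])
    return set(duty["permissions"]) if its[task_index][col] == 1 else set()

if __name__ == "__main__":
    # Payroll example: tasks t1..t4, before and after the team-leader task t2 finishes.
    before = its_matrix(["completed", "running", "waiting", "waiting"])
    after = its_matrix(["completed", "completed", "running", "waiting"])
    d1 = {"group": "g1", "task": "t2", "task_state": "running",
          "permissions": ["p2", "p3", "p4"]}
    print(permissions_for(d1, 1, before))  # {'p2', 'p3', 'p4'}: granted while t2 runs
    print(permissions_for(d1, 1, after))   # set(): withdrawn once t2 is completed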

Fig. 3. Delegation of duties diagram

Figure 3 shows a schematic diagram of duty authorization. With such an authorization mechanism, it is possible to calculate the permissions that a role has at a given moment to perform its duties, ensuring that the authorization given to the role is unique and dynamic. The authorization is the least privilege, which is a collection of atomic operations. Before a user requests authorization, the separation of duties is verified, and the system can only perform subsequent authorization operations if the constraints are met. The system authorizes the role by querying the generated profile with the set of duties and finally assigns the atomic operation corresponding to the current duty to the user.
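To make the profile-driven authorization step concrete, the following minimal Python sketch assumes a hypothetical JSON duty profile of the kind described in Sect. 4.1 and checks the mutually-exclusive-duty constraint (Definition 1 / Rule 1) before handing the duty's atomic-operation set to the user. All keys, duty identifiers and helper names are illustrative, not the authors' implementation:

import json

# Hypothetical duty profile in the key-value JSON form mentioned in Sect. 4.1;
# the keys and duty identifiers are illustrative.
DUTY_PROFILE = json.loads("""
{
  "d1": {"group": "g1", "task": "t2", "state": "running",
         "permissions": ["p2", "p3"], "roles": ["team_leader"],
         "mutually_exclusive": ["d2"]},
  "d2": {"group": "g1", "task": "t3", "state": "running",
         "permissions": ["p2", "p3"], "roles": ["worker"],
         "mutually_exclusive": ["d1"]}
}
""")

def authorize(duty_id, user, role, task_state, executed_by):
    """Return the atomic-operation set granted to user, after checking the task
    state, the role assignment and the mutual-exclusion constraint (Rule 1)."""
    duty = DUTY_PROFILE[duty_id]
    # Dynamic authorization: only when the task has reached the duty's state.
    if task_state != duty["state"] or role not in duty["roles"]:
        return set()
    # Rule 1 (MED): the same user must not execute two mutually exclusive duties.
    for other in duty["mutually_exclusive"]:
        if executed_by.get(other) == user:
            raise PermissionError(user + " already executed conflicting duty " + other)
    executed_by[duty_id] = user
    return set(duty["permissions"])

if __name__ == "__main__":
    history = {}
    print(authorize("d1", "alice", "team_leader", "running", history))  # {'p2', 'p3'}
    try:
        authorize("d2", "alice", "worker", "running", history)
    except PermissionError as err:
        print(err)  # alice already executed conflicting duty d1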


5 Performance Analysis and Comparison

5.1 Performance Analysis

The improved duty-based access control model builds on the existing RBAC and TBAC models. With the addition of duty sets and the related definitions and mechanisms, DBAC meets the access control needs of approval workflow systems better than the RBAC and TBAC models.

(1) The advantages of the traditional RBAC and TBAC models are retained. RBAC is role-centric, TBAC is task-centric, and the DBAC proposed in this study retains both role entities and task entities. Using duties as the intermediary that links them into a task-duty-role mapping makes authorization management straightforward and user management more fine-grained.

(2) Dynamic authorization is implemented. In the improved model, duties become the center that connects the entities: permissions are neither assigned directly to roles nor acquired only through the execution of tasks. Tasks are further decomposed into duties associated with atomic operations. Dynamic authorization occurs when the group's workflow task is executed, different states grant different permissions, and the permissions are retracted after the task status changes. This greatly improves the dynamic adaptability of the system.

(3) The problem of the least-privilege constraint being illusory is solved. In the DBAC model, the set of atomic operations is associated with permissions. A permission corresponds to only one atomic operation, so it also constitutes a minimum privilege set. Moreover, users hold no permissions when they are not performing specific tasks, and the permissions granted differ with the task status. Therefore, there are no excessive permissions, and the principle of least privilege is truly realized.

(4) The principle of separation of duties is guaranteed. Separation of duties is achieved by avoiding conflicts, which reduces the likelihood of fraud or fatal mistakes by preventing users from holding conflicting duties. This paper defines the duty constraints and verifies the separation of duties for the user-role-duty correspondence, reducing the burden on administrators who face a large number of users and information resources.

5.2 Comparison with Traditional Models

RBAC is one of the most commonly used models in practical application scenarios, and this section compares DBAC and RBAC through an example. Suppose a labor company g1 has a payroll approval workflow consisting of four tasks: company financial approval t1, team leader approval t2, worker approval t3, and project manager approval t4. The task states are the waiting state ts1, the running state ts2, and the completed state ts3. The operation permissions on the payroll are the permission to initiate the approval process p1, the approval permission p2, the read permission p3, and the edit permission p4. Only the company finance role has permission p1, all four approval roles need permissions p2 and p3, and the company finance and team leader roles have permission p4.


For example, when the team leader approval task reaches the running state, the dynamically delegated duty is d1 = (g1, (t2, ts2), {p2, p3, p4}). Under the same circumstances, for the team leader of another labor company g2, only the group element g1 needs to be replaced. When the task status changes to the completed state ts3, only ts2 needs to be replaced, and so on for the other entities. In other words, when DBAC is used for dynamic duty authorization, only the corresponding reusable entity set needs to be replaced when the task arrives; new dynamic duties are created while roles remain relatively fixed, which greatly reduces the number of static roles. With RBAC, team leaders in different organizations, with different task statuses and different permission sets, require different roles, so the number of roles grows geometrically. RBAC is also inadequate for separation of duties. For example, if the team leader duty is mutually exclusive (MED) with the worker duty, the permission set can simply be set to {p2, p3} when the task status is ts2, instead of creating an additional role for the team leader as in RBAC.
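The reuse argument of this comparison can be sketched in a few lines: a duty tuple d = (g, (t, ts), p) is rebuilt by substituting only the reusable entity that changed, so no additional static role is needed. The names below are those of the payroll example; the helper class itself is illustrative:

from typing import FrozenSet, NamedTuple

class Duty(NamedTuple):
    """Duty tuple d = (g, (t, ts), p) as defined in the formal DBAC model."""
    group: str
    task: str
    task_state: str
    permissions: FrozenSet[str]

# Team-leader approval duty of the payroll example.
d1 = Duty("g1", "t2", "ts2", frozenset({"p2", "p3", "p4"}))

# Another labor company: only the group element is replaced, no new static role.
d1_other_company = d1._replace(group="g2")

# Task completed: only the task state is replaced and the permissions are withdrawn.
d1_completed = d1._replace(task_state="ts3", permissions=frozenset())

print(d1)
print(d1_other_company)
print(d1_completed)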

6 Conclusion

In this paper, we propose a duty-based access control model to solve the dynamic permission control problem in the approval workflow scenario. We first analyze the access control problems in the approval workflow and the shortcomings of RBAC and TBAC. We then propose the design idea of effectively dividing tasks into duties combined with atomic operations. DBAC maps a combination of tasks, organizations, and permissions to duties, then duties to roles, and finally roles to users. Because a permission is a collection of atomic operations, it is maximally precise and fine-grained. Permissions are authorized dynamically according to the task status and withdrawn after the task is completed, without generating too many static roles. For different workflows, only the reusable entity set needs to be replaced to obtain results that change in real time.

References 1. Uddin, M., Islam, S., Al-Nemrat, A.: A dynamic access control model using authorising workflow and task-role-based access control. IEEE Access 7, 166676–166689 (2019) 2. Erickson, B.J., Langer, S.G., Blezek, D.J., et al.: DEWEY: the DICOM-enabled workflow engine system. J. Digit. Imaging 27(3), 309–313 (2014) 3. Ferraiolo, D., Kuhn, R.: Role-based access controls. In: Proceedings of the 15th National Computer Security Conference (NCSC), pp. 554–563 (1992) 4. Cai, F., Zhu, N., He, J., et al.: Survey of access control models and technologies for cloud computing. Clust. Comput. 22, 6111–6122 (2019) 5. Jin, X., Sandhu, R., Krishnan, R.: RABAC: role-centric attribute-based access control. In: Kotenko, I., Skormin, V. (eds.) MMM-ACNS 2012. LNCS, vol. 7531, pp. 84–96. Springer, Heidelberg (2012) 6. Jinyan, L., Zonghua, Y.: Collaboration-oriented flexible workflow access control mechanism. Comput. Integr. Manuf. Syst. 23(06), 1234–1242 (2017) 7. Ma, K., Yang, G.: RTBAC: a risk-aware topic-based access control model for text data with paragraph-level authorization. Secur. Commun. Netw. 2022, 3371688 (2022)


8. Zeng, W., Yang, Y., Luo, B.: Content-based access control: use data content to assist access control for large-scale content- centric databases. In: Proceedings of the IEEE International Conference on Big Data, pp. 701–710, Washington, DC, USA, October (2014) 9. You, M., Yin, J., Wang, H., Cao, J., Wang, K., Miao, Y., Bertino, E.: A knowledge graph empowered online learning framework for access control decision-making. World Wide Web 26(2), 827–848 (2022). https://doi.org/10.1007/s11280-022-01076-5 10. Rajpoot, Q.M., Jensen, C.D., Krishnan, R.: Attributes enhanced role-based access control model. In: 2015, Valencia, Spain, September 1–2, 2015, Proceedings 12. Springer International Publishing, pp. 3–17 (2015) 11. Liu, Y., Zhao, Y., Li, K., et al.: Access control based intelligent workshop integrated automation system based on workflow engine. Comput. Electr. Eng. 87, 106747 (2020) 12. Tarkhanov, I.: Extension of access control policy in secure role-based workflow model. In: 2016 IEEE 10th International Conference on Application of Information and Communication Technologies (AICT), pp. 1–4. IEEE (2016) 13. Deng, B., Ding, J., Mann Jiang, E., et al.: A fine-grained data permission control framework. Journal of Shanghai University of Technology: 1–9 [2023–03–18]. 14. Yang, F.J., Ding, T., Fu, M.J., et al.: Research on RBAC-based permission complexity and reliability control model. Comput. Appl. Softw. 39(01), 30–38+59 (2022)

A High-Performance Steganography Algorithm Based on HEVC Standard Si Liu, Yunxia Liu(B) , Guoning Lv, Cong Feng, and Hongguo Zhao College of Information Science and Technology, Zhengzhou Normal University, Zhengzhou, China [email protected]

Abstract. In today’s digital age, digital video has the characteristics of wide sources, large capacity and strong camouflage, making it an excellent carrier for steganography. At the same time, HEVC video coding standard has good application prospects. This paper presents a novel and high-performance steganography algorithm based on HEVC standard. The secret message is embedded into the multi-coefficients of the selected 4 × 4 luminance QDST blocks to avert the intra-frame distortion drift. And a novel embedding scheme is used to reduce the modification of these embedded blocks. The experimental results show that the proposed algorithm can effectively increase the visual quality and get good embedding capacity. Keywords: HEVC · Video steganography · QDST · Multi-coefficients · Intra-frame distortion drift

1 Introduction

Steganography [1] provides protection for the existence of secret information and can be seen as a type of secure communication and storage technology. The carriers of digital steganography include audio, image, video, text, network packets, and so on. Among them, video is notable for its large amount of data and diverse content: video calls, video conferences, live broadcasts and other forms have been widely adopted. As of June 2022, the number of online video (including short video) users in China reached 994 million, accounting for 94.6% of all netizens. This also makes research on video-based steganography algorithms popular and meaningful.

Nowadays, video services account for the largest proportion of internet traffic, and the vast majority of websites and applications have added video-related content sections to some extent. Due to the huge size of video files, high bandwidth consumption, and the low-latency requirements of some service types (such as live streaming interaction and video conferencing), video services have also brought enormous storage and bandwidth pressure to application developers and cloud service providers. Driven by increasing demand, the pace of developing new video coding technologies in the industry has significantly accelerated. Numerous new coding solutions


[2] have effectively reduced the pressure and costs on service providers, helping them provide higher-quality video services to more users. The HEVC (High Efficiency Video Coding) standard [3] was released in 2013 as the industry's next-generation solution to H.264. Compared with H.264, HEVC improves the compression ratio by 50%–100% at the same image quality and has good support for scenarios such as 4K, 8K, HDR, and high-frame-rate video. HEVC has received broad support from mainstream hardware and device manufacturers and has a wide software and hardware ecosystem. Therefore, research on steganography methods based on the HEVC standard has very good application prospects.

Many coding coefficients in the HEVC standard can be used as the carrier of video steganography, such as intra prediction modes, motion vectors, and DCT/DST coefficients [4–6]. For the intra-frame QDCT/QDST (Quantized DCT/DST) coefficients of the HEVC standard, several works have proposed corresponding video steganography algorithms [6–15]. [9] proposed a robust steganography method with improved visual quality for HEVC in 4 × 4 luminance QDST blocks: to improve robustness, the embedded data are first encoded with the BCH syndrome code (BCH code) technique, and to improve visual quality, three groups of prediction directions are provided to limit intra-frame distortion drift. [15] proposed an HEVC video steganography method based on intra-frame QDCT coefficients, in which the secret information is embedded into the coupling coefficients of selected 8 × 8 luminance QDCT blocks to avert intra-frame distortion drift, and the (7, 4) Hamming code is used to reduce the modification caused by embedding.

In this paper, two sets of multi-coefficients are used to avert intra-frame distortion drift in 4 × 4 luminance QDST blocks. To further improve the visual quality of the proposed algorithm, a novel embedding scheme is used to reduce the modification of these embedded blocks. Experimental results show that the proposed algorithm achieves both good visual quality and high embedding capacity. The rest of the paper is organized as follows. Section 2 describes the theoretical framework of the proposed algorithm. Section 3 describes the proposed algorithm. Experimental results are presented in Sect. 4 and conclusions in Sect. 5.

2 Theoretical Framework

2.1 Intra-frame Prediction

Intra-frame prediction in HEVC uses surrounding reconstructed pixels to predict all pixels in the current block (exploiting the spatial correlation of pixels), and the entire video frame is partitioned into multiple square transform blocks. The supported transform block sizes are 4 × 4, 8 × 8, 16 × 16, and 32 × 32. For the 4 × 4 transform block size, an integer transform derived from the DST (Discrete Sine Transform) is applied to the luma residual blocks of intra-frame prediction modes. The sixteen pixels of a 4 × 4 QDST block are predicted from the previously obtained boundary pixels of the adjacent blocks, using the prediction formula corresponding to the selected optimal prediction mode, as shown in Fig. 1. Each 4 × 4 block has 33 angular prediction modes (modes 2–34).


Fig. 1. Labeling of prediction samples

2.2 Intra-frame Distortion Drift

Distortion drift means that embedding in the current block causes distortion not only of the current block but also of its adjacent blocks. As illustrated in Fig. 2, assume the current prediction block is Bi,j; each sample of Bi,j is then the sum of the predicted value and the residual value. Since the predicted value is calculated from the samples shown in gray in Fig. 2, the embedding-induced errors in blocks Bi-1,j-1, Bi,j-1, Bi-1,j, and Bi-1,j+1 propagate to Bi,j through intra-frame prediction. For convenience, we give several definitions: the 4 × 4 block to the right of the current block is the right-block; the 4 × 4 block under the current block is the under-block; the 4 × 4 block to the left of the under-block is the under-left-block; the 4 × 4 block to the right of the under-block is the under-right-block; and the 4 × 4 block on top of the right-block is the top-right-block, as shown in Fig. 3. The embedding-induced errors of a 4 × 4 block transfer through its boundary pixels to these five adjacent blocks.

3 Description of Algorithm Process 3.1 Embedding Because the prediction block uses the boundary pixels of its adjacent blocks, if the embedding error just changed the other pixels of the current block instead of the boundary pixels used for intra-frame angular prediction reference, then the distortion drift can be avoided. Based on this idea, two conditions are proposed to prevent the distortion drift.


Fig. 2. The prediction block Bi,j and the adjacent encoded blocks

Fig. 3. Definition of adjacent blocks

Condition 1: Right-mode ∈ {2–25}, Under-right-mode ∈ {11–25}, Top-right-mode ∈ {2–9}. Condition 2: Under-left-mode ∈ {27–34}, Under-mode ∈ {11–34}. If the current block meets Condition 1, the pixel values of the last column must not be changed for the following intra-frame prediction. If the current block meets Condition


2, the pixel values of the last row must not be changed for the following intra-frame prediction. If the current block meets Conditions 1 and 2 at the same time, the block is not embedded. If neither Condition 1 nor Condition 2 is satisfied, the current block could in principle be embedded arbitrarily, because the induced errors would not transfer through the boundary pixels to the five adjacent blocks and no distortion drift would occur; this situation is not discussed in this paper, and such blocks are also not embedded.

Two sets of multi-coefficients are proposed to meet the above conditions during embedding. The multi-coefficients are defined as a three-coefficient combination (C1, C2, C3), where C1 is used for bit embedding and C2, C3 are used for distortion compensation. The specific definitions are as follows; the VS multi-coefficients apply to Condition 1 and the HS multi-coefficients apply to Condition 2:

VS (Vertical Set) = (Ci0 = 1, Ci2 = −1, Ci3 = 1) or (Ci0 = −1, Ci2 = 1, Ci3 = −1), i = 0, 1, 2, 3.
HS (Horizontal Set) = (C0j = 1, C2j = −1, C3j = 1) or (C0j = −1, C2j = 1, C3j = −1), j = 0, 1, 2, 3.

For example, add 1 to a00 in a 4 × 4 QDST block, then subtract 1 from a02 and add 1 to a03. This modification does not change the pixel values in the last column of the block, as shown in Fig. 4(a). Similarly, subtract 1 from a00, then add 1 to a20 and subtract 1 from a30. This modification does not change the pixel values of the last row of the block, as shown in Fig. 4(b).
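A minimal sketch of how the two conditions and the VS/HS compensation patterns could be applied in code is given below; the mode ranges are taken from Conditions 1 and 2, while the block contents and function names are ours for illustration:

def meets_condition1(right_mode, under_right_mode, top_right_mode):
    """Condition 1: the reconstructed pixels of the last column must stay unchanged."""
    return (right_mode in range(2, 26) and under_right_mode in range(11, 26)
            and top_right_mode in range(2, 10))

def meets_condition2(under_left_mode, under_mode):
    """Condition 2: the reconstructed pixels of the last row must stay unchanged."""
    return under_left_mode in range(27, 35) and under_mode in range(11, 35)

def apply_vs(block, i, sign):
    """VS: change (a_i0, a_i2, a_i3) by (+1, -1, +1) * sign, keeping the last pixel column."""
    block[i][0] += sign
    block[i][2] -= sign
    block[i][3] += sign

def apply_hs(block, j, sign):
    """HS: change (a_0j, a_2j, a_3j) by (+1, -1, +1) * sign, keeping the last pixel row."""
    block[0][j] += sign
    block[2][j] -= sign
    block[3][j] += sign

if __name__ == "__main__":
    qdst = [[5, 2, 3, 1], [0, 1, 0, 0], [2, 0, 1, 0], [1, 0, 0, 0]]
    apply_vs(qdst, i=0, sign=+1)   # the text's example: a00 + 1, a02 - 1, a03 + 1
    print(qdst[0])                 # [6, 2, 2, 2]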

Fig. 4. (a) Examples of VS (b) Examples of HS

Assume

    ⎛ a00  a01  a02  a03 ⎞
    ⎜ a10  a11  a12  a13 ⎟
    ⎜ a20  a21  a22  a23 ⎟
    ⎝ a30  a31  a32  a33 ⎠

is the 4 × 4 QDST coefficient matrix to be embedded.


If the QDST block meets Condition 1, the coefficients a00, a10, a20 and a30 are used to embed secret information. If the QDST block meets Condition 2, the coefficients a00, a01, a02 and a03 are used. Take the QDST coefficients a00, a01, a02 and a03 applicable to Condition 2 as an example; for Condition 1 the embedding process is similar. Let the 3-bit secret information to be embedded be m (m1, m2, m3), and compute

d = (a00 + 2 · a01 + 3 · a02 + 4 · a03) mod 9                              (1)

Table 1. Coefficient modification comparison table

d − m:    −7        −6        −5        −4        −3        −2        −1        0
Modify:   a01 − 1   a02 − 1   a03 − 1   a03 + 1   a02 + 1   a01 + 1   a00 + 1   /

d − m:    1         2         3         4         5         6         7         8
Modify:   a00 − 1   a01 − 1   a02 − 1   a03 − 1   a03 + 1   a02 + 1   a01 + 1   a00 + 1

As shown in Table 1, first calculate the difference between d and m (as a decimal value), and then modify the corresponding QDST coefficient. For example, if d − m equals −7, subtract 1 from a01, and the 3-bit secret information m is embedded. At the same time, to prevent intra-frame distortion drift, it is also necessary to add 1 to a21 and subtract 1 from a31 according to the HS rule, as shown in Fig. 5(a). Figure 5(b) is another, more specific example: assume m = 101 (binary) = 5, a00 = 5, a01 = 2, a02 = 3 and a03 = 1. Then d = 22 mod 9 = 4 and d − m = −1, so according to Table 1 we add 1 to a00, giving a00 = 6.
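The Condition-2 embedding step just described can be sketched as follows. The function computes d from Eq. (1), applies the single ±1 modification prescribed by Table 1, and adds the HS compensation to rows 2 and 3 of the same column; the function name and the example block are illustrative:

def embed_condition2(block, m):
    """Embed a 3-bit value m (0..7) into the first row of a 4 x 4 QDST block.

    Follows Eq. (1) and Table 1: at most one first-row coefficient changes by 1,
    and rows 2 and 3 of the same column are adjusted (HS) against distortion drift."""
    a = block[0]
    d = (a[0] + 2 * a[1] + 3 * a[2] + 4 * a[3]) % 9
    diff = d - m
    minus = {1: 0, 2: 1, 3: 2, 4: 3, -7: 1, -6: 2, -5: 3}         # a_0j -= 1
    plus = {-1: 0, -2: 1, -3: 2, -4: 3, 8: 0, 7: 1, 6: 2, 5: 3}   # a_0j += 1
    if diff in minus:
        j = minus[diff]
        block[0][j] -= 1
        block[2][j] += 1   # HS compensation
        block[3][j] -= 1
    elif diff in plus:
        j = plus[diff]
        block[0][j] += 1
        block[2][j] -= 1   # HS compensation
        block[3][j] += 1
    # diff == 0: nothing needs to be modified
    return block

if __name__ == "__main__":
    qdst = [[5, 2, 3, 1], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    embed_condition2(qdst, m=5)    # the Fig. 5(b) example: d = 4, d - m = -1
    print(qdst[0])                 # [6, 2, 3, 1], i.e. a00 increased by 1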

Fig. 5. (a) Example of embedding (b) A specific example


By using this novel embedding scheme, embedding 3 bits of secret information requires modifying at most 3 QDST coefficients, whereas the conventional line-by-line embedding method may require modifying up to 9 QDST coefficients for the same 3 bits. After the original video is entropy decoded, we obtain the intra-frame prediction modes and the QDST block coefficients. Using this embedding scheme, we embed the secret information via the multi-coefficients into the selected 4 × 4 luminance QDST blocks that meet the conditions. Finally, all QDST block coefficients are entropy encoded to obtain the carrier video.

3.2 Data Extraction

After entropy decoding of the HEVC carrier video, the extraction operation is performed on the coefficients of the first row or first column of the selected 4 × 4 luminance QDST blocks that meet the corresponding condition. The extraction is repeated until the number of extracted bits equals the number of secret information bits agreed in advance, thereby recovering the embedded secret information. Assume the 3-bit secret information to be extracted is m′, and take the received QDST coefficients a00, a01, a02 and a03 applicable to Condition 2 as an example; for Condition 1 the extraction process is similar. Then

m′ = (a00 + 2 · a01 + 3 · a02 + 4 · a03) mod 9                             (2)

If the data from the example in Fig. 5(b) is used for validation, we can calculate m′ = (6 + 4 + 9 + 4) mod 9 = 23 mod 9 = 5, which is the same as the embedded secret information m.
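The matching extraction step for Condition 2 is simply Eq. (2) recomputed on the received first-row coefficients; a minimal sketch (illustrative names) with a round trip on the Fig. 5(b) data:

def extract_condition2(block):
    """Recover the 3-bit value from the first row of a received 4 x 4 QDST block, Eq. (2)."""
    a = block[0]
    return (a[0] + 2 * a[1] + 3 * a[2] + 4 * a[3]) % 9

# Round trip with the embedded block of Fig. 5(b): a00..a03 = 6, 2, 3, 1.
received = [[6, 2, 3, 1], [0, 1, 0, 0], [-1, 0, 0, 0], [1, 0, 0, 0]]
print(extract_condition2(received))   # 5, the embedded value m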

4 Case Study

The proposed method has been implemented in the HEVC reference software HM16.0. We take “BlowingBubbles” (416 × 240), “Keiba” (416 × 240), “BQMall” (832 × 480) and “FourPeople” (1280 × 720) as test videos. The GOP size is set to 1 and the QP (Quantization Parameter) values are set to 16, 24 and 32. The method in [13] is used for performance comparison. As shown in Table 2, the PSNR (Peak Signal to Noise Ratio) of our method is very close to that of the method proposed in [13] for each video sequence, and both show good visual quality. Because the method in [13] uses matrix encoding to reduce the modification of the QDST embedding blocks, its degree of modification is similar to that of our method. In terms of embedding capacity, as shown in Table 3, the average per-frame embedding capacity of our method is higher than that of the method in [13]. This is because our method has higher embedding efficiency: it embeds an average of 3 bits per QDST block, while the method in [13] embeds an average of only 2 bits per QDST block. Figures 6 and 7 show video frame screenshots of the two steganography methods. As can be seen, there is no visible distortion and the visual quality is very high.

Table 2. PSNR (dB) of the embedded frame for each video sequence

Sequences        Method          QP = 16   QP = 24   QP = 32
BlowingBubbles   In this paper   47.54     42.21     36.48
                 In [13]         47.35     42.37     36.45
Keiba            In this paper   47.88     42.47     37.33
                 In [13]         48.02     42.36     37.01
BQMall           In this paper   47.42     42.63     37.12
                 In [13]         47.33     42.57     37.05
FourPeople       In this paper   47.55     42.38     37.13
                 In [13]         47.61     42.56     36.94

Table 3. Embedding capacity (bits) of the embedded frame for each video sequence

Sequences        Method          QP = 16   QP = 24   QP = 32
BlowingBubbles   In this paper   5406      5949      5562
                 In [13]         3604      3966      3708
Keiba            In this paper   5385      5661      5355
                 In [13]         3590      3774      3570
BQMall           In this paper   14799     16491     14961
                 In [13]         9866      10994     9974
FourPeople       In this paper   25935     25536     22563
                 In [13]         17290     17024     15042

Fig. 6. (a) Method in this paper (b) Method in [13]


Fig. 7. (a) Method in this paper (b) Method in [13]

5 Conclusion This paper proposed a novel and high-performance steganography algorithm based on HEVC standard. Two sets of multi-coefficients are used to avert the intra-frame distortion drift in 4 × 4 luminance QDST blocks. In order to further improve the visual quality of the proposed algorithm, a novel embedding scheme is used to reduce the modification of these embedded blocks. Experimental results demonstrate the feasibility and superiority of the proposed method. Acknowledgment. This paper is sponsored by the National Natural Science Foundation of China (NSFC, Grant 61572447), 2023 Henan Science & Technology Breakthrough Project (No. 232102210126). Henan Big Data Development Innovation Laboratory of Security and Privacy, Henan International Joint Laboratory of Blockchain and Audio/Video Security, Zhengzhou Key Laboratory of Blockchain and CyberSecurity.

References 1. Liu, Y., Liu, S., Wang, Y., et al.: Video steganography: a review. Neurocomputing 335, 238– 250 (2019) 2. Liu, Y., Liu, S., Wang, Y., et al.: Video coding and processing: a survey. Neurocomputing 408, 331–344 (2020) 3. Sze, V., Budagavi, M., Sullivan, G.J. (eds.): High efficiency video coding (HEVC). ICS, Springer, Cham (2014). https://doi.org/10.1007/978-3-319-06895-4 4. Jia, J.W.: An information hiding algorithm for HEVC based on angle differences of intra prediction mode. J. Softw. 10(2), 213–221 (2015) 5. Yang, J., Li, S.: An efficient message hiding method based on motion vector space encoding for H. Multimedia Tools Appl. 265, 1–23 (2017) 6. Chang, P.-C., Chung, K.-L., Chen, J.-J., Lin, C.-H.: A DCT/DST-based error propagation-free data hiding algorithm for HEVC intra-coded frames. J. Visual Commun. Image Represent. 25(2), 239–253 (2013) 7. Liu, S., Liu, Y., Lv, G., Feng, C., Zhao, H.: Hiding bitcoin transaction information based on HEVC. In: Qiu, M. (ed.) SmartBlock 2018. LNCS, vol. 11373, pp. 1–11. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-05764-0_1


8. Zhao, H., Pang, M., Liu, Y.: An efficient video steganography scheme for data protection in H.265/HEVC. In: Huang, D.-S., Jo, K.-H., Li, J., Gribova, V., Bevilacqua, V. (eds.) ICIC 2021. LNCS, vol. 12836, pp. 358–368. Springer, Cham (2021). https://doi.org/10.1007/9783-030-84522-3_29 9. Liu, Y., et al.: A robust and improved visual quality data hiding method for HEVC. IEEE Access 2018(6), 53984–53997 (2018) 10. Liu, S., Liu, Y., Feng, C., Zhao, H.: A reversible data hiding method based on HEVC without distortion drift. In: Huang, D.-S., Hussain, A., Han, K., Gromiha, M.M. (eds.) ICIC 2017. LNCS (LNAI), vol. 10363, pp. 613–624. Springer, Cham (2017). https://doi.org/10.1007/ 978-3-319-63315-2_53 11. Zhao, H., Liu, Y., Wang, Y., Wang, X., Li, J.: A blockchain-based data hiding method for data protection in digital video. In: Qiu, M. (ed.) SmartBlock 2018. LNCS, vol. 11373, pp. 99–110. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-05764-0_11 12. Liu, Y., Liu, S., Zhao, H., et al.: A new data hiding method for H. 265/HEVC video streams without intra-frame distortion drift. Multimedia Tools Appl. 78(6), 6459–6486 (2019) 13. Liu, S., Liu, Y., Feng, C., Zhao, H.: An efficient video steganography method based on HEVC. In: Huang, D.-S., Jo, K.-H., Li, J., Gribova, V., Bevilacqua, V. (eds.) ICIC 2021. LNCS, vol. 12836, pp. 327–336. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-84522-3_26 14. Zhao, H., Liu, Y., Wang, Y., et al.: A video steganography method based on trans-form block decision for H. 265/HEVC. IEEE Access 9, 55506–55521 (2021) 15. Liu, S., Liu, Y., Feng, C., Zhao, H.: A novel DCT-based video steganography algorithm for HEVC. In: Huang, D.-S., Jo, K.-H., Jing, J., Premaratne, P., Bevilacqua, V., Hussain, A. (eds.) Intelligent Computing Methodologies: 18th International Conference, ICIC 2022, Xi’an, China, August 7–11, 2022, Proceedings, Part III, pp. 604–614. Springer International Publishing, Cham (2022). https://doi.org/10.1007/978-3-031-13832-4_49

Using N-Dimensional Space Coding of Transform Coefficients for Video Steganography in H.265/HEVC Hongguo Zhao1 , Yunxia Liu1(B) , Yonghao Wang2 , Hui Liu3 , and Zhenghang Zhao4 1 College of Information Science and Technology, Zhengzhou Normal University, Zhengzhou,

China [email protected] 2 Computing and Digital Technology, Birmingham City University, Birmingham, UK 3 College of Marxism, Henan University of Traditional Chinese Medicine, Zhengzhou, China 4 School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou, China

Abstract. In H.265/HEVC, DST transformation is first utilized for the 4 × 4 residual coefficient block to deal with complex texture features region, which improves the compression efficiency compared with the earlier H.264/AVC, and also provides a potential embedding position for video steganography. This paper presents a novel video steganography method based on constructing an N-dimensional space coding of 4 × 4 DCT/DST residual coefficients. First, 4 × 4 DCT/DST residual coefficient blocks are selected for embedding according to embedding strength, random number and coefficient values of TB. Second, high frequency coefficients are normalized into an N-dimensional array and mapped to a point in the constructed N-dimensional space, and the mapping value of this point is calculated based on pre-defined mapping rules. Third, according to the value of the metadata (embedded data) and mapping value, traversing all neighbor points in the N-dimensional space, video steganography can be achieved by modifying N-dimensional array coefficients to neighbor point that satisfies whose mapping value is equal to the value of metadata. The proposed method can embed multiple metadata bits while only modify at most one DCT/DST coefficient, so the embedding error is limited. The experimental results have been proven the superiority of the proposed video steganography method. Keywords: Video steganography · space coding · Residual coefficients · 4× 4 DCT/DST blocks

1 Introduction

In recent years, online video applications such as video conferencing, short video sharing and live broadcasting have developed rapidly. However, along with these flourishing video applications, video security is becoming a non-negligible issue: illegal broadcasting, tampering, unauthorized clipping and the like can be seen in many parts of the network. Video steganography has become an important


research area for video security [1]. It mainly exploits the redundancy of human vision in digital signals to embed metadata, and can serve as an effective tool for video copyright protection, tracing illegal transmission, traceability, etc. Video steganography can be divided into raw-domain and compressed-domain methods according to the embedding position. Raw-domain video steganography mainly embeds metadata into pixel samples; however, since video is usually compressed before transmission over the network, the embedded metadata is vulnerable to compression and may not be retrievable. Compressed-domain video steganography has therefore become the mainstream research direction in recent years. According to the embedding position and the specific syntax elements, it can be divided into methods based on DCT/DST residual coefficients, prediction modes, motion vectors and entropy coding [1, 4]. Because DCT/DST coefficients occupy the majority of the bitstream [2], video steganography based on DCT/DST residual coefficients has less impact on the carrier video than methods based on syntax elements such as prediction modes, motion vectors or entropy coding. DCT/DST residual coefficients have therefore always been a hot research area in video steganography, in both academia and industry.

There are many video steganography methods based on DCT or DST residual coefficients for H.265/HEVC [7–13] and its predecessor H.264/AVC [3, 5, 6]. Ma et al. [3] proposed embedding metadata into DCT residual coefficients using coefficient pairs and neighboring intra-prediction modes, with the purpose of eliminating intra-frame distortion drift in H.264/AVC. Furthermore, Liu et al. [5, 6] proposed using BCH codes and secret sharing to increase the robustness of video steganography; they also embedded metadata into 4 × 4 DCT residual coefficients and used coefficient pairs and intra-frame prediction modes to eliminate intra-frame distortion drift. For H.265/HEVC, Swati et al. [7] proposed embedding metadata into DCT residual coefficients with the Least Significant Bit (LSB) approach, whose drawback is a lower visual quality of the carrier video compared to previous research. To improve visual quality, Liu et al. [9] used multiple coefficients and intra-prediction modes to embed metadata into DST coefficients, and Zhao et al. [12] used two-dimensional histogram shifting to embed metadata into QDST residual coefficients. Moreover, Yang et al. [8] proposed motion vector space encoding to embed metadata; however, since the motion vector is a key syntax element, modifying it has a significant impact on the PU's data. Considering that DCT/DST residual coefficients take a higher proportion of the bitstream than motion vectors, combining space encoding with DCT/DST residual coefficients is a promising direction for high embedding efficiency.

In this paper, we propose a novel N-dimensional space coding video steganography method based on 4 × 4 DCT/DST residual coefficients for H.265/HEVC. To limit the embedding error, multiple metadata bits can be embedded while at most one DCT/DST coefficient is modified, by constructing an N-dimensional coefficient space. To improve visual quality and transparency, only the high-frequency components of the 4 × 4 DCT/DST residual coefficients in I, B and P frames are used as embedding positions. Compared with existing related methods, the proposed method is superior in embedding capacity, visual quality, transparency and security.
The remainder of this paper is organized as follows: Sect. 2 presents the principles of the N-dimensional residual coefficient space and its coding. Section 3 presents the scheme


of the proposed video steganography method. In Sect. 4, the experimental results are presented and evaluated. Finally, the conclusion is shown in Sect. 5.

2 DCT/DST Residual Coefficients Space

2.1 Selection of 4 × 4 DCT/DST Residual Coefficient Blocks

To reduce the impact of embedding errors, 4 × 4 blocks with complex texture features are selected as potential embedding positions. Specifically, for intra-frame prediction, a 4 × 4 block must satisfy that the current PB (Prediction Block) size is 4 × 4 and the TB (Transform Block) size is also 4 × 4; the high-frequency DST residual coefficients are selected as the embedding position. For inter-frame prediction, 4 × 4 blocks must meet the same requirement as intra, and the high-frequency DCT residual coefficients are selected as the embedding position. Restricting embedding to 4 × 4 blocks keeps the impact of embedding errors as small as possible. In addition, we use the embedding strength ε and a random number generator G to further increase the randomness of the video steganography. The details are as follows:

(1) Embedding strength ε. The embedding strength ε determines whether a 4 × 4 TB is used as an embedding block. If ε = 0.6, the probability that a 4 × 4 TB is used as an embedding block is 60%. Note that the larger the embedding strength ε, the larger the embedding capacity, but also the larger the introduced embedding error. The embedding strength ε can be adjusted adaptively according to the volume of metadata to be embedded.

(2) Random number ρ. The random number ρ is generated by the random number generator G, and each 4 × 4 TB corresponds to one random number ρ. If ρ is less than ε, the current 4 × 4 TB is selected as an embedding block; otherwise it is skipped. To ensure that the embedded metadata can be extracted correctly, the seed of the generator G must be the same on the embedding and extraction sides.

(3) 4 × 4 TB constraint. A 4 × 4 TB whose residuals are all zero is not encoded into the bitstream by entropy coding. To avoid this condition, if the residual coefficients of the current 4 × 4 TB are all zero the block is skipped; otherwise it can be selected as an embedding block.

The selection of 4 × 4 TBs for embedding is shown in Fig. 1. The striped 4 × 4 TBs are the blocks selected for embedding according to the above principles. The shaded block in Fig. 1 is not selected because of its SIZE_2N × 2N PB splitting, regardless of its 4 × 4 TB structure. The purpose of this selection is to limit the range of influence of the introduced embedding errors. The selection of high-frequency residual coefficients is shown in Fig. 2. To simplify the 4 × 4 TB constraint, we require the direct coefficient (DC) Y0,0 ≠ 0, so that the constraint that not all residual coefficients are zero is always met, i.e., the syntax element CBF stays non-zero regardless of the modification of high-frequency coefficients due to embedding. The scanning order used to select coefficients for embedding is indicated by the directed arrow.
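A minimal sketch of these three selection checks (embedding strength, per-block random number, and the non-zero TB constraint) is shown below. The block representation and the use of Python's random module are our assumptions for illustration, not the x265 integration:

import random

def select_blocks(tb_blocks, epsilon=0.6, seed=1):
    """Pick 4 x 4 TBs for embedding: the per-block random number rho must be below
    epsilon, and all-zero residual blocks are skipped (they would not be entropy coded)."""
    gen = random.Random(seed)        # the same seed must be used on the extraction side
    selected = []
    for index, coeffs in enumerate(tb_blocks):
        rho = gen.random()           # one random number per 4 x 4 TB
        if rho < epsilon and any(c != 0 for c in coeffs):
            selected.append(index)
    return selected

if __name__ == "__main__":
    blocks = [
        [0] * 16,                    # all-zero residuals: always skipped
        [3, 0, -1, 2] + [0] * 12,
        [1, 1, 0, 0] + [0] * 12,
    ]
    print(select_blocks(blocks))     # indices of the TBs chosen for embedding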



Fig. 1. 4 × 4 Blocks selection for embedding


Fig. 2. High frequency residual coefficients selection

2.2 Construction and Coding of N-dimensional Residual Coefficients Space

The purpose of constructing the N-dimensional residual coefficient space coding is to build a mapping rule that can embed multiple bits of metadata while modifying at most one residual coefficient of the selected 4 × 4 TB. The dimension N is decided by the random number ρ of the selected 4 × 4 TB for security enhancement. After N is determined, N DCT/DST residual coefficients are selected according to the scanning order shown in Fig. 2 to construct the original N-dimensional DCT/DST residual coefficient array.


Considering computational overflow, the original N-dimensional DCT/DST residual coefficient array needs to be normalized. Assume the original N-dimensional DCT/DST residual coefficient array is Ccoeff = {c1, c2, c3, ..., cN} and the normalized array is Ttemp = {t1, t2, t3, ..., tN}; the normalization rule is

ti = ci mod (2N + 1),                               if ci ≥ 0
ti = (ci mod (2N + 1) + 2N + 1) mod (2N + 1),       if ci < 0              (1)

If the element ti of Ttemp is taken as the i-th dimension coordinate, then the elements of Ttemp define a point in the N-dimensional residual coefficient space, in which each dimension is bounded by the range of ti. Obviously, the range of each dimension of the space is (0, 2N). The mapping rule of the space is defined as follows:

(1) If the space is 1-dimensional, the mapping rule of the point x1 is defined as Eq. 2:

f(x1) = x1 mod (2N + 1)                                                    (2)

(2) If the space is 2-dimensional, the mapping rule of the point (x1, x2) is defined as Eq. 3:

f(x1, x2) = (x1 + 2x2) mod (2N + 1)                                        (3)

(3) Similarly, if the space is N-dimensional, the mapping rule of the point (x1, x2, x3, ..., xN), N ≥ 3, is defined as Eq. 4:

f(x1, x2, x3, ..., xN) = (x1 + 2x2 + 3x3 + ... + NxN) mod (2N + 1)         (4)
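Equations (1) and (4) translate directly into code. The following minimal sketch (function names are ours) normalizes a coefficient array and computes its mapping value; Python's modulo operator already returns a non-negative result, which covers both branches of Eq. (1):

def normalize(c_coeff):
    """Eq. (1): map each residual coefficient into 0..2N, with N = len(c_coeff)."""
    m = 2 * len(c_coeff) + 1
    return [c % m for c in c_coeff]   # Python's % is already non-negative, covering both cases

def mapping_value(point):
    """Eqs. (2)-(4): f(x1, ..., xN) = (x1 + 2*x2 + ... + N*xN) mod (2N + 1)."""
    return sum((i + 1) * x for i, x in enumerate(point)) % (2 * len(point) + 1)

if __name__ == "__main__":
    print(normalize([-3, 1, 2]))      # [4, 1, 2], the array used in the Sect. 3 example
    print(mapping_value([0, 1, 2]))   # 1, the striped point of Fig. 3
    print(mapping_value([3, 3, 1]))   # 5, the shaded point of Fig. 3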

According to the above mapping rules, the calculation of the mapping value of a point (x1, x2, x3, ..., xN), N ≥ 3, is called N-dimensional DCT/DST residual coefficient space coding. Figure 3 shows an example of the space coding process for N = 3. In this case, the coordinate range of each dimension and the mapping value of any point both lie in (0, 6). For example, for the point (0, 1, 2), the mapping value is f(0, 1, 2) = (0 + 2 × 1 + 3 × 2) mod (2 × 3 + 1) = 1, shown as the striped block in Fig. 3.

The N-dimensional residual coefficient space has an important property: for any point P(x1, x2, x3, ..., xN) in the space, there are 2N neighbor points around P, and the mapping values of these neighbor points together with that of P completely traverse the set {0, 1, 2, 3, ..., 2N}. The analysis is as follows. Let the mapping value of P be X, i.e., X = f(x1, x2, x3, ..., xN) = (x1 + 2x2 + 3x3 + ... + NxN) mod (2N + 1). Let the two neighbor points of P in the i-th dimension be P+ and P−; their mapping values are

P+: f(x1, x2, ..., xi + 1, ..., xN)
  = (x1 + 2x2 + ... + i(xi + 1) + ... + NxN) mod (2N + 1)
  = (x1 + 2x2 + 3x3 + ... + NxN + i) mod (2N + 1)
  = (X + i) mod (2N + 1)

P−: f(x1, x2, ..., xi − 1, ..., xN)
  = (x1 + 2x2 + ... + i(xi − 1) + ... + NxN) mod (2N + 1)
  = (x1 + 2x2 + 3x3 + ... + NxN − i) mod (2N + 1)
  = (X − i) mod (2N + 1)

Fig. 3. Example of 3-dimensional DCT/DST residual coefficients space coding

When the iterator i traverses from 1 to N, the number of neighbor points P+ and P− is 2N, and their mapping values constitute the array {(X + 1) mod (2N + 1), (X − 1) mod (2N + 1), (X + 2) mod (2N + 1), (X − 2) mod (2N + 1), ..., (X + N) mod (2N + 1), (X − N) mod (2N + 1)}, whose elements completely fill the range (0, 2N). For example, as shown shaded in Fig. 3, the mapping value of the point (3, 3, 1) is f(3, 3, 1) = 5; it has 6 neighbor points, whose mapping values compose the array {1, 0, 6, 3, 4, 2}. Moreover, for the boundary point (6, 1, 1), also marked shaded in Fig. 3, the mapping value is f(6, 1, 1) = 4.


Note that because the point (6, 1, 1) lies on the rightmost boundary of the x1 dimension, its right neighbor wraps around to the point (0, 1, 1) according to Eq. (1). Obviously, the mapping values of all its neighbor points still traverse the array {0, 1, 2, ..., 6}. The construction, mapping rules and property of the N-dimensional space provide the underlying support for the video steganography method proposed in this paper.

3 Embedding and Extraction

In the proposed method, there are several key components for embedding and extraction. The embedding position is the high-frequency residual coefficients of 4 × 4 TBs. For intra-prediction, DST residual coefficients are used to construct the N-dimensional space; for inter-prediction, DCT residual coefficients are used. The basic idea of using 4 × 4 TBs is to decrease the impact of the embedding error. The embedding strength ε and random number ρ further control the embedding capacity and the security of the video steganography. Constructing the relationship between the N-dimensional space and the metadata makes it possible to embed multiple metadata bits while modifying at most one residual coefficient, which improves the transparency of the video steganography. In the following, we use the relationship between the N-dimensional space and the metadata to elaborate the embedding and extraction processes of this method.

3.1 Embedding

The binary length of the metadata that can be embedded into the current 4 × 4 TB is decided by N, that is, l = ⌊log2(2N + 1)⌋, where l is the binary length of the metadata. Assume the metadata binary array is S = {s1, s2, ..., sl} and the decimal value of the metadata is Dsec; then the relationship between S and Dsec is Dsec = Σ_{i=1}^{l} si · 2^{i−1}. The relationship between the N-dimensional space and the metadata used in the embedding process is defined as follows:

(1) When the decimal value Dsec of the metadata is equal to the mapping value X of the point P in the N-dimensional space, that is, Dsec = X, the current 4 × 4 TB residual coefficients do not need any modification to embed the metadata.

(2) When they are not equal, we traverse all neighbor points of P until we find a point P′ whose mapping value X′ is equal to Dsec, that is, Dsec = X′. We then modify the original N-dimensional DCT/DST residual coefficient array Ccoeff = {c1, c2, c3, ..., cN} so that the point P converts to the point P′, and the embedding of the current 4 × 4 TB is done.

A typical example of the embedding process is shown in Fig. 4. The selected 4 × 4 TB used for embedding is marked in bold; its PB splitting pattern is SIZE_N × N and its DC coefficient is not zero. Assume the dimension N is 3, which indicates that the length of metadata that can be embedded into the current 4 × 4 TB is l = ⌊log2(2N + 1)⌋ = ⌊log2(2 × 3 + 1)⌋ = 2. If the binary array of the embedded metadata is "10", the decimal value of the metadata is Dsec = 2. Meanwhile, the original 3-dimensional DCT/DST residual coefficient array is Ccoeff = {−3, 1, 2} according to the above scanning order. Due


to the negative value −3, the array Ccoeff is normalized to the array Ttemp = {4, 1, 2} by Eq. (1). The mapping value X of the point P(4, 1, 2) is X = f(4, 1, 2) = 5 according to Eq. (4). Since the decimal value of the metadata Dsec is not equal to the mapping value X of point P, we traverse all neighbor points of P and find the neighbor point P′(4, 1, 1), whose mapping value X′ = f(4, 1, 1) = 2 = Dsec, which satisfies the relationship between the N-dimensional space and the metadata. Thus the normalized array becomes T′temp = {4, 1, 1} and the modified DCT/DST residual coefficient array is C′coeff = {−3, 1, 1}; we only need to change the original residual coefficient 2 to 1, and the metadata "10" is embedded. It can be seen that the proposed video steganography method achieves the goal of embedding multiple metadata bits while adding or subtracting 1 from at most one residual coefficient. Therefore, the embedding errors are limited.
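A minimal sketch of this embedding procedure, with illustrative function names: normalize the selected coefficients, compare the point's mapping value with the metadata value, and move to the one neighbor whose mapping value matches by changing a single original coefficient by ±1:

def mapping_value(point):
    """f(x1, ..., xN) = (x1 + 2*x2 + ... + N*xN) mod (2N + 1), Eq. (4)."""
    return sum((i + 1) * x for i, x in enumerate(point)) % (2 * len(point) + 1)

def embed(c_coeff, d_sec):
    """Embed the decimal metadata value d_sec (0..2N) by changing at most one coefficient by 1."""
    n = len(c_coeff)
    m = 2 * n + 1
    point = [c % m for c in c_coeff]             # normalization, Eq. (1)
    if mapping_value(point) == d_sec:
        return list(c_coeff)                     # case (1): nothing to change
    for i in range(n):                           # case (2): visit the 2N neighbor points
        for step in (+1, -1):
            neighbor = list(point)
            neighbor[i] = (neighbor[i] + step) % m   # wrap at the boundary of the space
            if mapping_value(neighbor) == d_sec:
                modified = list(c_coeff)
                modified[i] += step              # the same +/-1 on the original coefficient
                return modified
    raise AssertionError("unreachable: the neighbor values cover all of 0..2N")

print(embed([-3, 1, 2], d_sec=2))                # [-3, 1, 1], as in the worked example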

Fig. 4. Example of embedding process

3.2 Extraction

After entropy decoding, let the received N-dimensional residual coefficient array be C′coeff = {c′1, c′2, c′3, ..., c′N}; the normalized array T′temp = {t′1, t′2, t′3, ..., t′N} is obtained according to Eq. (1). The array T′temp can then be mapped to a point P′ whose coordinates are P′(t′1, t′2, t′3, ..., t′N). The relationship between the N-dimensional space and the metadata used in the extraction process is defined as follows: according to Eqs. (2)–(4), the mapping value X′ of the point P′ can be formulated as X′ = f(t′1, t′2, t′3, ..., t′N). Obviously, the value X′ is exactly D′sec, the decimal value of the metadata that has been embedded into the current 4 × 4 TB residual coefficients. Finally,


the extracted metadata is obtained through the inverse binarization of D′sec. Figure 5 provides a typical example of the extraction process. The selected 4 × 4 TB for extraction is marked in bold in Fig. 5; its PB splitting pattern is SIZE_N × N, its size is 4 × 4, and its DC coefficient is not zero. The dimension N is 3. After entropy decoding and scanning, the high-frequency residual coefficients of the current 4 × 4 TB are {−3, 1, 1, 0, 2, 5, 0, 0, 0, 0}. According to N, the N-dimensional residual coefficient array is C′coeff = {−3, 1, 1}, and the normalized array is T′temp = {4, 1, 1}. In the 3-dimensional residual coefficient space, the mapping value of the point P′(4, 1, 1) is X′ = f(4, 1, 1) = 2, that is, the decimal value of the extracted metadata is D′sec = 2. Finally, after inverse binarization, the extracted binary metadata is S′ = {1, 0}.
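The corresponding extraction is a single mapping-value computation followed by inverse binarization; a minimal sketch with illustrative names, reproducing the worked example:

def extract(c_coeff, n_bits):
    """Recover the embedded bits from the received N-dimensional coefficient array."""
    m = 2 * len(c_coeff) + 1
    point = [c % m for c in c_coeff]                           # Eq. (1)
    d_sec = sum((i + 1) * x for i, x in enumerate(point)) % m  # Eqs. (2)-(4)
    return format(d_sec, "0{}b".format(n_bits))                # inverse binarization

print(extract([-3, 1, 1], n_bits=2))   # '10', the metadata of the worked example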

Fig. 5. Example of extraction process

4 Experimental Evaluation

The proposed video steganography algorithm is implemented and evaluated with the reference software x265 (version 3.4) and HM16.0. Specifically, x265 is used as the embedding side and HM16.0 as the extraction side. The experimental parameters are set as follows: the frame rate is 30 frames/s, the basic quantization parameter is 32, and the test video sequences are encoded with a key-frame interval of 4. Sign Data Hiding (SDH) is turned off. In addition, the dimension N is set to 3, the embedding strength ε is set to 0.6, and the random seed is set to 1. The test video resolutions range from 416 × 240 to 1920 × 1080 (BasketballPass: 416 × 240, BQMall: 832 × 480, KristenAndSara: 1280 × 720 and ParkScene: 1920 × 1080).


The subjective visual quality of the proposed algorithm is depicted in Fig. 6, where the left frame is the original video sample, the middle frame is the sample compressed with the x265 encoder, and the right frame is the sample embedded with our method in the x265 encoder. The subjective visual comparison of BasketballPass is shown in Fig. 6(a); the average PSNR values of the compressed and embedded samples are 36.72 dB and 36.68 dB. The subjective visual comparison of BQMall is shown in Fig. 6(b); the average PSNR values of the compressed and embedded samples are 36.72 dB and 35.92 dB. It can be seen that the proposed algorithm achieves good visual quality on the carrier videos and is imperceptible. The R-D (Rate-Distortion) performance of BasketballPass is shown in Fig. 7. The solid line is the R-D performance of compression with x265 without embedded metadata, and the dotted line is the R-D performance of our proposed video steganography method. The average PSNR discrepancy and bit-rate increase are 0.63 dB and 2.02%. It can be seen that our proposed method performs well in terms of R-D performance.

Fig. 6. Subjective visual quality of the proposed method

Table 1 reports the embedding capacity and embedding time consumption of our proposed method, measured over 30 frames. The average embedding capacity is 669 bits/frame, and the average embedding speed is 4.13 frames/s. It can be seen that our proposed method also achieves good performance in terms of embedding capacity and time consumption.


Fig. 7. R-D Performance of BasketballPass

Table 1. Embedding capacity and time consumption of the proposed algorithm

Test sample       Embedding capacity (bits/frame)   Time consumption (fps)
BasketballPass    117                               10.34
BQMall            591                               3.67
KristenAndSara    467                               1.86
ParkScene         1504                              0.66

5 Conclusion

In this paper, an effective N-dimensional residual coefficients space coding video steganography method based on 4 × 4 DCT/DST TBs is proposed for H.265/HEVC video security protection. With the relationship and mapping value between the N-dimensional space and the metadata, the proposed method can embed multiple bits of metadata while modifying at most one residual coefficient. The experimental results also show that our proposed algorithm achieves good embedding performance in terms of visual quality, R-D performance, embedding capacity and time consumption.

Acknowledgment. This paper is sponsored by Henan International Joint Laboratory of Blockchain and Audio/Video Security, Henan Big Data Security and Privacy Development Innovation Laboratory, Zhengzhou Key Laboratory of Blockchain and CyberSecurity, and the 2023 Henan Science & Technology Breakthrough project, "Research on Covert Communication Based on H.265 Video Steganography" (No. 232102210126).


CSMPQ: Class Separability Based Mixed-Precision Quantization Mingkai Wang1(B) , Taisong Jin1 , Miaohui Zhang2 , and Zhengtao Yu3 1 Media Analytics and Computing Lab, Department of Computer Science and Technology,

School of Informatics, Xiamen University, Xiamen 361005, China [email protected] 2 Institute of Energy Research, Nanchang, China 3 Yunnan Key Laboratory of Artificial Intelligence, Kunming, China

Abstract. Network quantization has become increasingly popular due to its ability to reduce storage requirements and accelerate inference time. However, ultra-low-bit quantization is still challenging due to significant performance degradation. Mixed-precision quantization has been introduced as a solution to achieve speedup while maintaining accuracy as much as possible by quantizing different layers to different bit-widths. However, existing methods either focus on the sensitivity of different network layers, neglecting the intrinsic attributes of activations, or require a reinforcement learning or neural architecture search process to obtain the optimal bit-width configuration, which is time-consuming. To address these limitations, we propose a new mixed-precision quantization method based on the class separability of layer-wise feature maps. Specifically, we extend the widely-used term frequency-inverse document frequency (TF-IDF) to measure the class separability of layer-wise feature maps. We identify that the layers with lower class separability can be quantized to lower bits. Furthermore, we design a linear programming problem to derive the optimal bit configuration. Without any iterative process, our proposed method, CSMPQ, achieves better compression trade-offs than state-of-the-art quantization algorithms. Specifically, for Quantization-Aware Training, we achieve a Top-1 accuracy of 73.03% on ResNet-18 with only 63 GBOPs, and a Top-1 accuracy of 71.30% with 1.5 Mb on MobileNet-V2 for Post-Training Quantization.

Keywords: Mixed-Precision Quantization · Class Separability · TF-IDF

1 Introduction

Network quantization is a popular technique used to compress and accelerate neural networks. This technique involves mapping single-precision floating-point weights or activations of neural networks to lower bits. Traditional quantization approaches [1–4] typically use the same low bit-width for all network layers, which can result in significant accuracy degradation. To address this drawback, mixed-precision quantization has recently been proposed. Mixed-precision quantization offers a better trade-off between the compression ratio and accuracy of the neural network.

Several representative mixed-precision quantization methods have been proposed. HMQ [5], for example, makes the bit-width and threshold differentiable using the Gumbel-Softmax estimator. HAWQ [6] leverages the eigenvalues of the Hessian matrix of the weights for bit allocation. Other methods use Neural Architecture Search (NAS) [7–13] or Reinforcement Learning [14, 15] to search for the optimal bit-width.

Mixed-precision quantization has shown promising learning performance in real-world applications. However, it is important to note that it still requires a significant amount of data and computational resources. Specifically, the search space for mixed-precision quantization is exponential in the number of layers of the neural network. As a result, it can be intractable for existing neural networks to handle large-scale datasets. This limitation can ultimately hinder further performance enhancements of mixed-precision quantization methods.

Inspired by the recent advances in network quantization algorithms, we propose a novel Class Separability Based Mixed-Precision Quantization method, termed CSMPQ. CSMPQ introduces the TF-IDF metric from Natural Language Processing (NLP) to measure the class separability of layer-wise feature maps. Using this metric, we design a linear programming problem to derive the optimal bit-width configuration. The CSMPQ method allocates fewer bits to layers with lower class separability and more bits to layers with higher class separability. Without any iterative process, the proposed method can derive the optimal layer-wise bit-width in a few GPU seconds.

The main contributions of this work are three-fold: (1) We introduce the class separability of layer-wise feature maps to search for the optimal bit-width. To our knowledge, the proposed method is the first attempt to apply the class separability of the layers to network quantization. (2) We propose to leverage the TF-IDF metric that is widely used in NLP to measure the class separability of layer-wise feature maps. (3) Extensive experiments demonstrate that the proposed method provides state-of-the-art quantization performance and compression rate. The search process can be finished within 1 min on a single 1080Ti GPU.

2 Related Work

2.1 Network Quantization

The current practice of network quantization can be broadly categorized into two groups: Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ). QAT [1, 6] aims to mitigate the performance degradation caused by quantization through a retraining process. However, QAT is computationally expensive due to the fine-tuning process. PTQ [16, 17], on the other hand, directly quantizes the neural network models without fine-tuning. For a better compression trade-off, mixed-precision quantization assigns different bit-widths to the network layers across the model. Reinforcement learning and neural architecture search (NAS) [7, 8] are used to determine bit-widths. However, these methods often require significant computational resources. Recently, second-order gradient information via the Hessian matrix was used to determine bit-widths [6]. However, calculating the Hessian information of a neural network is still time-consuming.

2.2 TF-IDF

TF-IDF, short for Term Frequency-Inverse Document Frequency, is a well-known technique used in the fields of information retrieval and text mining. TF-IDF assesses the importance of a word to a document set or a document to a corpus. The importance of a word increases proportionally to the number of times it appears in the document, but decreases inversely with the frequency with which it appears in the corpus.

The TF-IDF for word $t_i$ in document $d_j$ from the document collection $D = \{d_1, \cdots, d_j\}$ is calculated as follows:

$$\mathrm{TF\text{-}IDF}_{i,j} = \mathrm{TF}_{i,j} \times \mathrm{IDF}_i \quad (1)$$

TF refers to the frequency with which a given word appears in the document. For a word $t_i$ in document $d_j$, its importance can be calculated as:

$$\mathrm{TF}_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}} \quad (2)$$

where $n_{i,j}$ is the number of occurrences of the word in document $d_j$, and the denominator is the sum of occurrences of all the words in document $d_j$.

IDF is a measure of the importance of a word. The IDF of a word can be obtained by dividing the total number of documents by the number of documents containing the word, and then taking the base-10 logarithm of the quotient, which is defined as:

$$\mathrm{IDF}_i = \lg \frac{|D|}{|\{j : t_i \in d_j\}|} \quad (3)$$

where $|D|$ is the total number of documents in the corpus, and $|\{j : t_i \in d_j\}|$ is the number of documents containing the term $t_i$. To avoid the denominator being zero, $1 + |\{j : t_i \in d_j\}|$ is used as the denominator.

A high word frequency within a document, together with a low document frequency of that word in the whole corpus, results in a higher TF-IDF score. Thus, TF-IDF tends to filter out the common words and keep the important words.
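For reference, the classical TF-IDF of Eqs. (1)–(3) can be computed in a few lines of Python; this is only an illustration of the standard metric, not the modified variant introduced in Sect. 3.

```python
# Minimal sketch of the standard TF-IDF computation in Eqs. (1)-(3).
import math
from collections import Counter

def tf_idf(documents):
    """documents: list of token lists; returns a list of {word: tf-idf} dicts."""
    num_docs = len(documents)
    # document frequency: number of documents containing each word
    df = Counter(word for doc in documents for word in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        total = sum(counts.values())
        doc_scores = {}
        for word, n in counts.items():
            tf = n / total                                  # Eq. (2)
            idf = math.log10(num_docs / (1 + df[word]))     # Eq. (3), smoothed denominator
            doc_scores[word] = tf * idf                     # Eq. (1)
        scores.append(doc_scores)
    return scores

# Example: the common word "the" gets a low score, rarer words score higher.
print(tf_idf([["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]))
```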

3 Methodology

3.1 Pre-processing

Given a pre-trained model with $L$ network layers, $n$ images are sampled to obtain the feature maps of each layer $\{X^1, X^2, \cdots, X^L\}$. Then, the feature map $X^l_j \in \mathbb{R}^{c_{out} \times h_{out} \times w_{out} \times 1}$ of the $j$-th class is fed into an average pooling layer to reduce the feature dimension:

$$A_j = \frac{\sum^{h_{out}} \sum^{w_{out}} X_j}{h_{out} \times w_{out}} \quad (4)$$

where $A_j \in \mathbb{R}^{c_{out}}$ is the feature of each output channel after dimension reduction. We compose the features across the different classes of a certain layer as $A = \{A_1, \cdots, A_j\} \in \mathbb{R}^{c_{out} \times j}$. After pre-processing, there are $c_{out}$ features in each layer for an image.
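A minimal PyTorch-style sketch of this pre-processing step is given below; the model, layer names and calibration loader are assumptions, and only the spatial averaging of Eq. (4) is illustrated.

```python
# Hedged sketch of the pre-processing in Eq. (4): spatially average each layer's
# feature map so that every image yields c_out scalar features per layer.
import torch
import torch.nn.functional as F

def collect_layer_features(model, layers, calibration_loader, device="cuda"):
    feats = {name: [] for name in layers}      # layer name -> list of (batch, c_out) tensors
    hooks = []

    def make_hook(name):
        def hook(_module, _inp, out):
            # Eq. (4): global average pooling over the spatial dimensions h_out x w_out
            pooled = F.adaptive_avg_pool2d(out, 1).flatten(1)   # shape (batch, c_out)
            feats[name].append(pooled.detach().cpu())
        return hook

    modules = dict(model.named_modules())
    for name in layers:
        hooks.append(modules[name].register_forward_hook(make_hook(name)))

    model.eval()
    with torch.no_grad():
        for images, _labels in calibration_loader:   # e.g. the 32 sampled images
            model(images.to(device))

    for h in hooks:
        h.remove()
    # concatenate into an (n_images, c_out) matrix per layer
    return {name: torch.cat(v, dim=0) for name, v in feats.items()}
```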

3.2 Transforming the Features into Words

The original TF-IDF metric used in NLP is specifically designed to measure the importance of words in a document. However, it is not suitable for directly measuring the class separability of different layers in a neural network. To address this limitation, it is crucial to identify which features should be transformed into words for the TF-IDF computation.

To derive discriminative features for measuring the class separability of network layers, it is important to select features with greater sensitivity and stronger representation capability. This can be achieved by identifying features whose values significantly deviate from the mean value of the feature set. Specifically, when a feature's value deviates from the mean by a certain threshold, the corresponding feature is chosen as a word for further TF-IDF computation:

$$N_i = \{s_j \in S : \mathrm{abs}(A_{i,j} - \bar{A}_j) \geq \mathrm{std}(A_j)\} \quad (5)$$

where $S$ is a set of the images. For the $j$-th image, $A_{i,j}$ is the $i$-th element of the feature map and $\bar{A}_j$ denotes the mean of the feature values. In this way, the suitable features are chosen as the words for computing the metric.

3.3 The TF-IDF for Network Quantization

After transforming the features into words, we formulate the TF of the $i$-th feature of the $j$-th image as:

$$\mathrm{TF}^{*}_{i,j} = \frac{A^l_{i,j} \times \mathrm{mask}(A^l_{i,j} \in N_i)}{\sum_{k=0}^{c_{out}} A^l_{k,j}} \quad (6)$$

where $A^l_{i,j}$ represents the $i$-th element of the feature map in the $l$-th layer of the $j$-th image. To enhance the discriminative power of features, we utilize a mask to preserve the appropriate features (i.e., words). In this way, the defined term frequency (TF) can reflect the importance of different features in the $l$-th feature map. Meanwhile, we define the inverse document frequency (IDF) of the $i$-th feature as:

$$\mathrm{IDF}^{*}_i = \log \frac{1 + |S|}{1 + |N_i|} \quad (7)$$

where $|S|$ denotes the number of images used for calibration, and $|N_i|$ represents the number of features that deviate from their mean value by the predefined threshold. By multiplying the improved TF with the IDF, the TF-IDF score of each feature in the $l$-th layer is obtained:

$$\mathrm{TF\text{-}IDF}^{*}_{i,j} = \mathrm{TF}^{*}_{i,j} \times \mathrm{IDF}^{*}_i \quad (8)$$

The TF-IDF score of a feature can be used to measure the importance of a network layer. Specifically, if a layer has more features with higher TF-IDF scores, the corresponding class separability is expected to be stronger. Therefore, the class separability of the $l$-th layer is defined as:

$$\alpha_l = \sum_{k=0}^{c_{out}} \frac{\mathrm{TF\text{-}IDF}^l_{k,j}}{|N_i| \times c_{out}}, \quad s_j \in S^{*} \quad (9)$$
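Putting Eqs. (5)–(9) together, the following numpy sketch computes a layer-wise class-separability score from the pooled features of one layer; the aggregation in Eq. (9) is interpreted here as an average over the selected feature/image pairs, which is one plausible reading rather than the authors' exact implementation.

```python
# Hedged numpy sketch of the class-separability score alpha_l for a single layer.
# `A` is the (n_images, c_out) matrix of pooled features from the pre-processing step.
import numpy as np

def layer_class_separability(A, eps=1e-8):
    n_images, c_out = A.shape
    mean_per_image = A.mean(axis=1, keepdims=True)         # \bar{A}_j
    std_per_image = A.std(axis=1, keepdims=True)            # std(A_j)
    # Eq. (5): a feature i is treated as a "word" for image j if it deviates
    # from the image mean by at least one standard deviation.
    mask = np.abs(A - mean_per_image) >= std_per_image       # shape (n_images, c_out)

    # Eq. (6): masked term frequency, normalised by the sum of features per image.
    tf = (A * mask) / (A.sum(axis=1, keepdims=True) + eps)

    # Eq. (7): inverse "document" frequency per feature channel.
    n_i = mask.sum(axis=0)                                    # |N_i| per feature
    idf = np.log((1 + n_images) / (1 + n_i))

    # Eq. (8): per-feature, per-image TF-IDF score.
    tfidf = tf * idf

    # Eq. (9): aggregate into a single separability score for the layer
    # (here: mean over the selected feature/image pairs).
    return tfidf.sum() / (mask.sum() + eps)

# toy usage with random features for 32 calibration images and 256 channels
alpha = layer_class_separability(np.random.rand(32, 256))
```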

3.4 Mixed-Precision Quantization

Given a pre-trained neural network $M$, we use $\theta_i$ to denote the importance of the $i$-th layer after obtaining the layer-wise class separability $\alpha_i$. A lower class separability value $\alpha_i$ corresponds to a lower importance $\theta_i$, and vice versa. However, since the class separability between layers can vary dramatically, we use the monotonically increasing function $e^x$ to control the relative importance:

$$\theta_i = e^{\beta \alpha_i} \quad (10)$$

where $\beta$ is a hyperparameter that controls the relative importance to balance the bit-widths between different layers. Using the layer-wise importance, we define a linear programming problem to maximize the global importance as follows:

Objective:
$$\max_{b} \sum_{i=1}^{L} b_i \times \theta_i \quad (11)$$

Constraint:
$$\sum_{i}^{L} M(b_i) \leq T \quad (12)$$

where $M(b_i)$ is the model size of the $i$-th layer when it is quantized to $b_i$ bits and $T$ represents the target model size. The objective function maximizes the sum of layer-wise importance weighted by the bit-widths, while the constraint ensures that the total model size is limited by $T$. Maximizing the objective function means assigning more bits to the layers with higher class separability, which implicitly maximizes the network's representation capability. We solve this linear programming problem using the SciPy library [18], which only requires a few seconds on a single CPU. Furthermore, the proposed method can serve as an auxiliary tool to search for the optimal bit-width, so it can be easily combined with other quantization methods (including QAT and PTQ). The proposed CSMPQ is summarized in Algorithm 1.
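A minimal SciPy sketch of this bit-allocation step is shown below; it relaxes the integer bit-widths to a continuous range and rounds the solution, which is one plausible reading of Eqs. (10)–(12) rather than the exact implementation. The per-layer parameter counts and the budget T are illustrative assumptions.

```python
# Hedged sketch of the bit allocation in Eqs. (10)-(12) using scipy.optimize.linprog.
# Layer sizes are modelled as params_i * b_i / 8 bytes, the budget T is in bytes.
import numpy as np
from scipy.optimize import linprog

def allocate_bits(alpha, params, T, beta=10.0, b_min=4, b_max=8):
    alpha = np.asarray(alpha, dtype=float)
    params = np.asarray(params, dtype=float)
    theta = np.exp(beta * alpha)                  # Eq. (10): layer importance

    # linprog minimises, so maximise sum(theta_i * b_i) by negating the objective.
    c = -theta
    # Constraint Eq. (12): sum_i params_i * b_i / 8 <= T  (model size budget).
    A_ub = (params / 8.0)[None, :]
    b_ub = np.array([T])
    bounds = [(b_min, b_max)] * len(alpha)

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return np.floor(res.x).astype(int)            # round down to stay within the budget

# toy usage: 4 layers with made-up separability scores and parameter counts
bits = allocate_bits(alpha=[0.1, 0.4, 0.2, 0.5],
                     params=[1e5, 5e5, 2e5, 1e5],
                     T=0.6e6)                     # roughly a 0.6 MB budget
```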

4 Experiments

In this section, we conduct experiments to evaluate the effectiveness of the proposed CSMPQ. Specifically, we compare the performance of CSMPQ against several widely-used Quantization-Aware Training (QAT) methods. Furthermore, we combine CSMPQ with a state-of-the-art Post-Training Quantization (PTQ) method, BRECQ [17], to enhance the accuracy at different levels of compression.

4.1 Implementation Details

The ImageNet dataset [19] has 1.28M training images and 50,000 validation images. We randomly sample 32 training images for all the models, following the same data pre-processing methods [20], to obtain the layer-wise feature maps. For the training strategy, we use a cosine learning-rate decay scheduler and the SGD optimizer with a momentum of 0.9 and a weight decay of 1e−4. The initial learning rate is set to 1e−4 for all the models. We fine-tune the compressed networks for 90 epochs. We fix the first and last layers at 8 bits following previous works. The search space of QAT is 4–8 bits and that of PTQ is 2–4 bits. We use 2 NVIDIA Tesla A100 GPUs for QAT and a single 1080Ti for PTQ. The implementation of QAT is based on HAWQ-V3 [21] and that of PTQ is based on BRECQ [17]. In our experiments, only the weight bit-widths are quantized with mixed precision, and the activation bit-widths are fixed. The whole search process only needs one forward pass, which costs about 30 s on a single 1080Ti GPU, while other methods [6, 17, 22] need hundreds of iterations.

4.2 Quantization-Aware Training

We first conduct quantization-aware training experiments on the ImageNet-1k dataset. We classify parameter quantization algorithms into two categories, unified quantization methods [23–25] and mixed-precision quantization methods [8, 15, 21]. We choose two ResNet models with different depths, namely ResNet-18 and ResNet-50, for the experiments. The results are shown in Table 1 and Table 2, which compare CSMPQ with other parameter quantization methods. It can be seen that CSMPQ obtains the Pareto optimum, i.e., a better compression-accuracy trade-off compared to the state-of-the-art QAT algorithms. Specifically, CSMPQ achieves 73.03% Top-1 accuracy on ResNet-18 with only 63G BOPs and 6.7 Mb when using 6-bit activations. As a comparison, HAWQ-V3 [21] achieves 70.22% Top-1 accuracy on ResNet-18 with 72G BOPs and 6.7 Mb. CSMPQ achieves 2.81% higher Top-1 accuracy while achieving a 13G BOPs reduction compared to HAWQ-V3. When the activation is set to 8-bit, the ResNet-18 model compressed by CSMPQ achieves 73.16% Top-1 accuracy with only 79G BOPs, which is even higher than the fp32 baseline, and is 1.60% higher than HAWQ-V3 with a smaller model size and fewer BOPs. Meanwhile, for ResNet-50, CSMPQ shows superior performance to HAWQ-V3 (76.65% Top-1 acc, 16.0 Mb, 163G BOPs vs 75.39% Top-1 acc, 18.7 Mb, 154G BOPs). The excellent performance validates the effectiveness of CSMPQ.
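For reference, the fine-tuning schedule described above corresponds roughly to the following PyTorch setup; the quantized model and data loader are assumptions, and the per-layer quantization wrappers themselves come from the HAWQ-V3 and BRECQ code bases mentioned in the text.

```python
# Hedged sketch of the QAT fine-tuning configuration described above
# (SGD, momentum 0.9, weight decay 1e-4, initial lr 1e-4, cosine decay, 90 epochs).
import torch

def finetune(quantized_model, train_loader, epochs=90, device="cuda"):
    optimizer = torch.optim.SGD(quantized_model.parameters(),
                                lr=1e-4, momentum=0.9, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    criterion = torch.nn.CrossEntropyLoss()

    quantized_model.train()
    for _ in range(epochs):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(quantized_model(images.to(device)), labels.to(device))
            loss.backward()
            optimizer.step()
        scheduler.step()
```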

Table 1. Quantization-aware training experiments on ImageNet with ResNet-18.

Method       | W bit | A bit | Model Size (Mb) | BOPs (G) | Top-1 Acc (%)
Baseline     | 32    | 32    | 44.6            | 1858     | 73.09
RVQuant [25] | 8     | 8     | 11.1            | 116      | 70.01
HAWQ-V3 [21] | 8     | 8     | 11.1            | 116      | 71.56
CSMPQ        | mixed | 8     | 6.7             | 84 (−32) | 73.16 (+1.60)
PACT [23]    | 5     | 5     | 7.2             | 74       | 69.80
LQ-Nets [24] | 4     | 32    | 5.8             | 225      | 70.00
HAWQ-V3 [21] | mixed | mixed | 6.7             | 72       | 70.22
CSMPQ        | mixed | 6     | 6.7             | 63 (−9)  | 73.03 (+2.81)

Table 2. Quantization-aware training experiments on ImageNet with ResNet-50.

Method           | W bit | A bit | Model Size (Mb) | BOPs (G) | Top-1 Acc (%)
Baseline         | 32    | 32    | 97.8            | 3951     | 77.72
PACT [23]        | 5     | 5     | 16.0            | 133      | 76.70
LQ-Nets [24]     | 4     | 32    | 13.1            | 486      | 76.40
RVQuant [25]     | 5     | 5     | 16.0            | 101      | 75.60
HAQ [15]         | mixed | 32    | 9.62            | 520      | 75.48
OneBitwidth [26] | mixed | 8     | 12.3            | 494      | 76.70
HAWQ-V3 [21]     | mixed | mixed | 18.7            | 154      | 75.39
CSMPQ            | mixed | 5     | 16.0            | 143 (−9) | 76.65 (+1.26)

4.3 Post-Training Quantization

For Post-Training Quantization, CSMPQ can serve as an auxiliary tool combined with other state-of-the-art PTQ methods to further improve the compression-accuracy trade-off, owing to its efficiency. We thus combine CSMPQ with BRECQ [17]. BRECQ proposes a block reconstruction strategy to reduce quantization errors by using Hessian information. However, BRECQ also leverages an evolutionary search strategy for the optimal bit configuration, which requires a large amount of computational resources. We thus replace the evolutionary search process with CSMPQ, as the bit search algorithm proposed by CSMPQ is orthogonal to the BRECQ fine-tuning process. The results are reported in Table 3 and Table 4. We can clearly observe that CSMPQ shows superior performance to other unified-precision and mixed-precision quantization methods under various model size constraints. In particular, for ResNet-18, CSMPQ outperforms the basic BRECQ by 0.88% Top-1 accuracy under a model size of 4.0 Mb with similar time and data costs.

Table 3. Post-training quantization experiments on ImageNet with ResNet-18.

Method             | W bit | A bit | Model Size (Mb) | Top-1 Acc (%) | Data      | Iterations
Baseline           | 32    | 32    | 44.6            | 71.08         | –         | –
FracBits-Pact [22] | mixed | mixed | 4.5             | 70.01         | 1.2M      | 120
CSMPQ              | mixed | 4     | 4.5             | 71.56         | 1024 + 32 | 0
CSMPQ              | mixed | 8     | 4.5             | 73.16         | 1024 + 32 | 0
ZeroQ              | 4     | 4     | 5.81            | 21.20         | –         | –
BRECQ [17]         | 4     | 4     | 5.81            | 69.32         | –         | –
PACT [23]          | 4     | 4     | 5.81            | 69.20         | –         | –
HAWQ-V3            | 4     | 4     | 5.81            | 68.45         | –         | –
FracBits-Pact [22] | mixed | mixed | 5.81            | 69.70         | 1.2M      | 120
CSMPQ              | mixed | 4     | 5.5             | 69.43         | 1024 + 32 | 100
BRECQ [17]         | mixed | 8     | 4.0             | 68.82         | 1024      | 100
CSMPQ              | mixed | 8     | 4.0             | 69.70         | 1024 + 32 | 100

Table 4. Post-training quantization experiments on ImageNet with MobileNet-V2.

Method        | W bit | A bit | Model Size (Mb) | Top-1 Acc (%) | Data      | Iterations
Baseline      | 32    | 32    | 13.4            | 72.49         | –         | –
BRECQ [17]    | mixed | 8     | 1.3             | 68.99         | 1024      | 100
CSMPQ         | mixed | 8     | 1.3             | 69.71         | 1024 + 32 | 100
FracBits [22] | mixed | mixed | 1.84            | 69.90         | 1.2M      | 120
BRECQ [17]    | mixed | 8     | 1.5             | 70.28         | 1024      | 100
CSMPQ         | mixed | 8     | 1.5             | 71.30         | 1024 + 32 | 100

Similarly, for MobileNet-V2, CSMPQ achieves a higher Top-1 accuracy (71.30% vs 70.28%) while keeping a lower model size (1.5 Mb vs 1.8 Mb). Additionally, extensive experiments show that CSMPQ can outperform other PTQ methods under different constraints.

4.4 Ablation Study

In this section, we investigate the impact of changes in the hyperparameter β on the performance of CSMPQ, as well as the influence of different bit-width allocation methods. The details are described as follows:

Influence of β. The hyperparameter β plays a crucial role in controlling the relative importance between layers in CSMPQ.

To investigate the impact of varying β on the performance of CSMPQ, we conducted experiments using the MobileNet-V2 model with a model size of 1.1 Mb and 8-bit activation values. It can be seen from Eq. (10) that as β increases, the range of the importance values θ also increases; conversely, when β approaches 0, the range of θ values is compressed, resulting in a significant reduction in the variance of the layer-wise importance factors.

Fig. 1. The relationship between β and accuracy on ImageNet with MobileNet-V2.

Figure 1 illustrates the relationship between the value of β and CSMPQ's performance. Our results show that when β is set too small, the importance gap between layers also becomes minimal, and CSMPQ cannot accurately distinguish which layers are more important. As a result, the bit-width allocation may not be fair, which can lead to reduced model accuracy. As β increases, the variance of the importance factors of different layers also increases, enabling CSMPQ to allocate higher bit-widths to more important layers. This results in an optimal bit-width configuration and improved model accuracy. However, we also observed that when β exceeds 10, further increasing its value causes the importance variance to become too large. Consequently, CSMPQ tends to choose an aggressive bit-width allocation, with the largest bit-width allocated to the most important layer, while the relatively unimportant layers receive the lowest bit-width. This can lead to a drop in model performance.

Influence of Bit-Width Allocation Method. We conduct an ablation study to validate the effectiveness of CSMPQ. θi represents the importance of the i-th layer, which can be calculated by Eq. (10), where β is a hyperparameter that controls the variance of the importance of different layers and αi represents the class separability of the i-th layer. As θi is positively correlated with αi, a layer with a larger class separability value αi will be assigned a higher bit-width, while a layer with a smaller value will be assigned a lower bit-width. For a random bit allocation configuration, we first set αi to a random number between 0 and 1, and then obtain the layer-wise importance by Eq. (10). In this way, the bit-width configuration is determined by modeling a linear programming problem. Once the bit-width configuration is determined, we perform quantization-aware training on the ResNet-50 model using 8-bit activations on the ImageNet-1k dataset.
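The three allocation schemes compared in Table 5 differ only in how the importance θi is derived from αi; the hedged sketch below shows that difference, with the resulting importance vector feeding the same linear program as in Sect. 3.4.

```python
# Hedged sketch of the three bit-allocation variants compared in Table 5.
# Only the mapping from class separability alpha_i to importance theta_i changes.
import numpy as np

def importance(alpha, beta=10.0, mode="csmpq"):
    alpha = np.asarray(alpha, dtype=float)
    if mode == "csmpq":         # Eq. (10): theta_i = exp(+beta * alpha_i)
        return np.exp(beta * alpha)
    if mode == "reverse":       # reverse select: theta_i = exp(-beta * alpha_i)
        return np.exp(-beta * alpha)
    if mode == "random":        # random select: alpha_i drawn uniformly in [0, 1]
        return np.exp(beta * np.random.rand(len(alpha)))
    raise ValueError(mode)
```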

Table 5. The relationship between different bit-width allocation methods and accuracy on ImageNet.

Method         | Model     | Model Size (Mb) | BOPs (G) | Top-1 Acc (%)
Random Select  | ResNet-50 | 16.0            | 143      | 75.03
Reverse Select | ResNet-50 | 16.0            | 143      | 74.78
CSMPQ          | ResNet-50 | 16.0            | 143      | 76.62

As shown in Table 5, when αi is set to a random number, the result of quantization-aware training with the obtained bit-width configuration is 1.59% lower than CSMPQ. To further investigate the effect of extreme cases, we modify Eq. (10) to $\theta_i = e^{-\beta \alpha_i}$. This modification means that the layers with larger class separability αi are assigned lower quantization precision, while layers with lower class separability are assigned higher quantization precision. As shown in Table 5, the Top-1 accuracy of "reverse select" is further reduced, being 1.84% lower than the basic CSMPQ. These experimental results demonstrate the effectiveness of the proposed CSMPQ.

5 Conclusion

In this paper, we propose a novel mixed-precision quantization method, named Class Separability Based Mixed-Precision Quantization (CSMPQ), which utilizes the class separability of layer-wise feature maps to allocate bit-widths effectively. By introducing the TF-IDF algorithm and proposing a scheme that transforms features into words, we have designed an improved TF-IDF algorithm that can efficiently measure the class separability of features. We have further developed a linear programming-based search strategy to find the optimal bit-width allocation under different compression constraints. The whole search process is computationally efficient, taking only a few seconds on a single 1080Ti GPU, and does not require any iterative process. Our experimental results have shown that CSMPQ outperforms the state-of-the-art quantization methods in terms of compression trade-offs, achieving 73.03% Top-1 accuracy on ResNet-18 with only 63 GBOPs for Quantization-Aware Training and 71.30% Top-1 accuracy on MobileNet-V2 for Post-Training Quantization. Our proposed method has demonstrated its effectiveness in improving the model's compression rate without sacrificing much accuracy. We plan to perform more theoretical analysis of class separability and extend it to more domains in the future.

Acknowledgements. This work was partially supported by National Key R&D Program of China (No. 2022ZD0118201), by National Natural Science Foundation of China (Grant Nos. U21B2027, 62072386, 61972186), and by Yunnan provincial major science and technology special plan projects (Nos. 202103AA080015, 202202AE090008-3, 202202AD080003).

References

1. Zhou, A., Yao, A., Guo, Y., Xu, L., Chen, Y.: Incremental network quantization: towards lossless CNNs with low-precision weights. arXiv preprint arXiv:1702.03044 (2017)
2. Esser, S.K., McKinstry, J.L., Bablani, D., Appuswamy, R., Modha, D.S.: Learned step size quantization. arXiv preprint arXiv:1902.08153 (2019)
3. Nagel, M., van Baalen, M., Blankevoort, T., Welling, M.: Data-free quantization through weight equalization and bias correction. In: ICCV (2019)
4. Lee, J., Kim, D., Ham, B.: Network quantization with element-wise gradient scaling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6448–6457 (2021)
5. Habi, H.V., Jennings, R.H., Netzer, A.: HMQ: hardware friendly mixed precision quantization block for CNNs. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12371, pp. 448–463. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58574-7_27
6. Dong, Z., Yao, Z., Gholami, A., Mahoney, M.W., Keutzer, K.: HAWQ: Hessian aware quantization of neural networks with mixed-precision. In: ICCV (2019)
7. Wu, B., Wang, Y., Zhang, P., Tian, Y., Vajda, P., Keutzer, K.: Mixed precision quantization of convnets via differentiable neural architecture search. arXiv preprint arXiv:1812.00090 (2018)
8. Yu, H., Han, Q., Li, J., Shi, J., Cheng, G., Fan, B.: Search what you want: barrier penalty NAS for mixed precision quantization. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12354, pp. 1–16. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58545-7_1
9. Zheng, X., et al.: DDPNAS: efficient neural architecture search via dynamic distribution pruning. Int. J. Comput. Vision 131(5), 1234–1249 (2023)
10. Zheng, X., et al.: Migo-NAS: towards fast and generalizable neural architecture search. IEEE Trans. Pattern Anal. Mach. Intell. 43(9), 2936–2952 (2021)
11. Zhang, S., et al.: You only compress once: towards effective and elastic BERT compression via exploit-explore stochastic nature gradient. arXiv preprint arXiv:2106.02435 (2021)
12. Zhang, S., Jia, F., Wang, C., Wu, Q.: Targeted hyperparameter optimization with lexicographic preferences over multiple objectives. In: The Eleventh International Conference on Learning Representations (2023)
13. Zheng, X., Ji, R., Tang, L., Zhang, B., Liu, J., Tian, Q.: Multinomial distribution learning for effective neural architecture search. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1304–1313 (2019)
14. Elthakeb, A.T., Pilligundla, P., Mireshghallah, F., Yazdanbakhsh, A., Esmaeilzadeh, H.: ReLeQ: a reinforcement learning approach for deep quantization of neural networks. arXiv preprint arXiv:1811.01704 (2018)
15. Wang, K., Liu, Z., Lin, Y., Lin, J., Han, S.: HAQ: hardware-aware automated quantization with mixed precision. In: CVPR (2019)
16. Cai, Y., Yao, Z., Dong, Z., Gholami, A., Mahoney, M.W., Keutzer, K.: ZeroQ: a novel zero shot quantization framework. In: CVPR (2020)
17. Li, Y., et al.: BRECQ: pushing the limit of post-training quantization by block reconstruction. arXiv preprint arXiv:2102.05426 (2021)
18. Virtanen, P., et al.: SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020)
19. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)

20. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
21. Yao, Z., et al.: HAWQ-V3: dyadic neural network quantization. In: ICML (2021)
22. Yang, L., Jin, Q.: FracBits: mixed precision quantization via fractional bit-widths. arXiv preprint arXiv:2007.02017 (2020)
23. Choi, J., Wang, Z., Venkataramani, S., Chuang, P.I.-J., Srinivasan, V., Gopalakrishnan, K.: PACT: parameterized clipping activation for quantized neural networks. arXiv preprint arXiv:1805.06085 (2018)
24. Zhang, D., Yang, J., Ye, D., Hua, G.: LQ-Nets: learned quantization for highly accurate and compact deep neural networks. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11212, pp. 373–390. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01237-3_23
25. Park, E., Yoo, S., Vajda, P.: Value-aware quantization for training and inference of neural networks. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11208, pp. 608–624. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01225-0_36
26. Chin, T.-W., Chuang, P.-J., Chandra, V., Marculescu, D.: One weight bitwidth to rule them all. In: Bartoli, A., Fusiello, A. (eds.) ECCV 2020. LNCS, vol. 12539, pp. 85–103. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-68238-5_7

Open-World Few-Shot Object Detection Wei Chen and Shengchuan Zhang(B) Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, Xiamen 361005, People’s Republic of China [email protected]

Abstract. General object detection has made significant progress under the close-set setting. However, such a detector can only support a fixed set of categories and fails to identify unknown objects in real-world scenarios. Therefore, class-agnostic object detection (CAOD) has recently attracted much attention, aiming to localize both known and unknown objects in the image. Since CAOD utilizes binary labels to train the detector, lacking multi-class classification information, and is also incapable of generalizing quickly to the unknown objects of interest, the scalability of this task is limited in downstream applications. In this paper, we propose a new task termed Open-World Few-Shot Object Detection (OFOD), extending class-agnostic object detection with few-shot learning ability. Compared with CAOD, OFOD can accurately detect unknown objects with only a few examples. Besides, we propose a new model termed OFDet, built upon a class-agnostic object detector under the two-stage fine-tuning paradigm. OFDet consists of three key components: a Class-Agnostic Localization Module (CALM) that generates class-agnostic proposals, a Base Classification Module (BCM) that classifies objects from base class features, and a Novel Detection Module (NDM) that learns to detect novel objects. OFDet detects the novel classes in NDM and localizes the potential unknown proposals in CALM. Furthermore, an Unknown Proposals Selection algorithm is proposed to select more accurate unknown objects. Extensive experiments are conducted on the PASCAL VOC and COCO datasets under multiple tasks: CAOD, few-shot object detection (FSOD) and OFOD. The results show that OFDet performs well on the traditional FSOD and CAOD settings as well as the proposed OFOD setting. Specifically for OFOD, OFDet achieves state-of-the-art results on the average recall of unknown classes (32.5%) and obtains a high average precision of novel classes (15.7%) under the 30-shot setting of COCO's unknown set 1.

Keywords: General Object Detection · Class-Agnostic Object Detection · Few-shot Learning · Unknown Proposal Selection

1 Introduction

Deep neural networks [1–5] have witnessed significant progress in object detection [6], which aims to localize and classify objects of interest in the image.

However, the success of most modern object detectors is built on a close-set setting where the categories in the test set entirely depend on the ones used in the training process. Therefore, in more realistic scenarios, these detectors are unable to recognize categories unseen in training. In contrast, humans can recognize unseen objects similar to previously seen classes in new environments regardless of their specific categories, which leads us to research class-agnostic object detection (CAOD) [7–9]. As a sub-problem of open-set learning [10], CAOD aims to localize all instances of objects in the image without learning to classify them. There are two scenarios for this task. One is as a significant pre-processing module in object detection, e.g., the Region Proposal Network [6]. The other is to localize unseen objects in practical applications such as autonomous driving and robot navigation. Recently, Kim et al. proposed OLN [9] to learn better objectness cues for object localization, which generalizes well to unknown classes compared to previous works.

Table 1. The effectiveness of the Base Classification Module on 10-shot split 1 of VOC.

Method        | bAP50 | nAP50
OFDet w/o BCM | 35.5  | 30.2
OFDet w/ BCM  | 76.7  | 59.2

However, the scalability of such networks is still limited. Suppose we can employ the strong object locator as a pre-processing module, fine-tuned with only a few annotations of the unknown objects of interest. In that case, the network will reduce the computational cost and be more flexible during the training process. To this end, we first attempt to combine CAOD with few-shot object detection (FSOD) [11–14], which has recently attracted much attention in data-scarce scenarios. A few-shot detector can quickly generalize to novel classes with a few annotated examples by learning from abundant base examples. Therefore, a naive idea is to train a class-agnostic detector and then directly extend it with an R-CNN module to classify known classes and detect unknown classes of interest (i.e., novel classes) with only a few annotations. We expect a competitive result compared to a model trained on abundant data. However, as shown in the first row of Table 1, the results on both base and novel classes are unsatisfying. The reason may be that the class-agnostic detector is supervised with binary labels, lacking multi-class classification information. Therefore, we propose a simple yet effective module named Base Classification Module (BCM) to maintain the classification information of the base classes. As shown in Fig. 1 and Table 1, BCM boosts the average precision (AP) of base and novel classes by approximately 30%. In addition, since we still need to recognize the potential unknown objects in the second stage for further design, existing problem settings no longer meet our needs. Here we propose a novel setting called Open-World Few-Shot Object Detection (OFOD), shown in Fig. 2. OFOD not only needs to localize unseen objects by fully leveraging the representation learned from limited base classes in the first stage, but also to detect novel categories with few examples in the next stage. We also propose a detector named OFDet for the OFOD task, which builds upon a class-agnostic object detector [9] under a two-stage fine-tuning paradigm [13].

Fig. 1. The visualization of Base Classification Module.

Fig. 2. An illustration of the proposed Open-world Few-shot Object Detection task.

OFDet consists of three novel modules: Class-Agnostic Localization Module (CALM), Base Classification Module (BCM) and Novel Detection Module (NDM). In the first stage, the class-agnostic proposals are utilized for training CALM, which consists of a localization RoI head (L-RoI head) followed by box regression and localization quality layers. The classification features of the base classes are extracted by BCM, composed of a classification RoI head (C-RoI head) and a box classifier. In the second stage, the class-agnostic proposals produced by CALM are pooled into a fixed-size feature map to perform box regression and novel classification in NDM. We test our method on three tasks across two datasets, PASCAL VOC [15] and MS COCO [16]. For the FSOD task, our approach outperforms the transfer-learning baseline under most settings. For the CAOD task, our network substantially improves the average recall of novel classes when a few novel annotations are added. For our proposed OFOD task, our network achieves state-of-the-art performance for the localization of unknown classes while keeping the capability of detecting known classes. Our main contributions are summarized as follows:
• We introduce a novel problem setting, Open-World Few-Shot Object Detection, which attempts to localize the unknown classes and detect the novel classes as in the FSOD task.
• We present a network named OFDet, based on class-agnostic object detection with a two-stage fine-tuning paradigm, to address the challenging OFOD task. In OFDet, we propose three novel modules and an unknown proposal selection algorithm for known and unknown classes.
• Extensive experiments and analyses illustrate that our approach performs well across three tasks, demonstrating the effectiveness of our approach.

2 Open-World Few-Shot Object Detection

2.1 Preliminary

Few-Shot Object Detection. We revisit few-shot object detection following previous works [11].

We first split the classes into base classes Cb with abundant instances and novel classes Cn with only K-shot instances for each category, forming two sub-datasets named Db and Dn, respectively. The goal of FSOD is to detect objects from the novel classes Cn with only a few annotations. As a pioneer of the transfer-learning paradigm in FSOD, TFA proposes a two-stage fine-tuning approach that adopts Faster R-CNN as the base detector. Despite its superior performance, its proposal generator usually overfits to the trained classes and cannot adapt to the real world with unknown proposals.

Class-Agnostic Object Detection aims to locate all instances in the image without classifying them. Unlike FSOD, CAOD only leverages a model trained on Db to localize objects from both Cb and Cn, eliminating the fine-tuning stage. As the first attempt to explore open-world proposals, OLN replaces the classifiers in both the RPN and RoI heads with localization quality estimators. Although OLN can achieve strong performance on novel classes, its scalability is still limited for lack of multi-label information, i.e., Cb. Therefore, we propose a novel problem setting of open-world few-shot object detection based on few-shot learning, described as follows.

Our Problem Setting. In this section, we propose a new task, open-world few-shot object detection (OFOD), based on the existing literature on class-agnostic object detection and few-shot object detection. For an object detection dataset D, we split it into three sub-datasets: a base set Db with abundant annotated instances from Cb, a novel set Dn with a few examples of Cn, and an unknown set Duk with a set of unknown classes that are not labeled. Note that the three sets of classes do not overlap, i.e., Cb ∩ Cn ∩ Cuk = ∅. Our goal is to learn a model to detect both Cb and Cn from the combination of Db and Dn, and to localize Cuk from Duk without extra training samples. Compared with CAOD, our task additionally needs to classify objects from Cb and make a good separation of Cn and Cuk by using a few annotations of Cn. Compared with FSOD, our task not only needs to detect objects from Cb and Cn, but also to localize objects from Cuk (without classifying them). A small sketch of such a class split is given below.
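The following sketch illustrates one way such a partition could be built; the category list, counts and seed are illustrative assumptions, not the exact splits used in the paper.

```python
# Hedged sketch of the OFOD class partition on COCO: 60 base classes,
# 15 novel classes and 5 unknown classes, with the three sets kept disjoint.
import random

def split_classes(all_classes, n_base=60, n_novel=15, n_unknown=5, seed=0):
    assert len(all_classes) >= n_base + n_novel + n_unknown
    rng = random.Random(seed)
    shuffled = list(all_classes)
    rng.shuffle(shuffled)
    c_base = set(shuffled[:n_base])
    c_novel = set(shuffled[n_base:n_base + n_novel])
    c_unknown = set(shuffled[n_base + n_novel:n_base + n_novel + n_unknown])
    # the three sets of classes must not overlap
    assert not (c_base & c_novel) and not (c_base & c_unknown) and not (c_novel & c_unknown)
    return c_base, c_novel, c_unknown
```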

3 OFDet

Fig. 3. The architecture of OFDet for open-world few-shot object detection.

As shown in Fig. 3, we propose a novel network called OFDet following the two-stage training approach, which consists of a backbone, a Region Proposal Network (RPN), a Class-Agnostic Localization Module (CALM), a Base Classification Module (BCM), and a Novel Detection Module (NDM), based on Faster R-CNN.

3.1 Stage I

During the base training stage, our architecture consists of the backbone, RPN, BCM and CALM. The box regressor in CALM and the box classifier in BCM are responsible for detecting base classes and localizing unknown objects. In this section, we introduce BCM and CALM, respectively.

Class-Agnostic Localization Module. The Class-Agnostic Localization Module (CALM) is a variant of the R-CNN detection head, including a localization RoI head (L-RoI head), a bounding box regressor and a localization quality estimator. Different from R-CNN, CALM does not distinguish the class label of a proposal, i.e., it performs class-agnostic proposal generation. CALM is used to compute the four-dimensional coordinates of the proposals and the corresponding localization quality. It plays an essential role in recognizing both known and unknown objects during the two stages.

Base Classification Module. As discussed before, though the class-agnostic proposal generator has better localization ability for all foreground objects, it still lacks supervision from base labels. To obtain discriminative information in the first stage, a naive solution is to add another classifier following the L-RoI head of CALM. However, there may be a conflict between the localization and classification tasks if both output layers follow the same RoI head, which could harm both the generalization ability for unknown classes and the classification ability for novel classes. Therefore, to avoid reducing the generalization ability to unknown objects, we add an extra Base Classification Module (BCM) in the first stage to decouple the localization and classification branches. BCM consists of the C-RoI head and the box classifier for base classes. We do not need to supervise the regression features since they are already obtained in CALM, so we can directly extract the classification feature for adapting to the base classes.

Open-World Base Training. In the first stage, the box regressor in CALM and the box classifier in BCM are responsible for detecting base classes and localizing unknown objects. Thus, the limitation of missing classification information in the original localization network can be compensated for. The total loss in the first stage can be described as:

(1)

where Lcls in BCM is a cross-entropy loss for the base classifier, Lreg and Lloc in RPN and CALM are both L1 losses for the class-agnostic box regressor, and λ1 = 8.

3.2 Stage II

In the second stage, the C-RoI head and its classifier layer in BCM are pre-trained for the Novel Detection Module to detect novel classes with a few examples and are then removed. In addition, an Unknown Proposal Selection is proposed for localizing the unknown objects during inference. We will introduce them next.

Novel Detection Module. As described in the OFOD task, the detector needs to further operate on the localized unknown objects of interest in more applications.

Therefore, we propose a Novel Detection Module (NDM) to detect novel objects among the unknowns, which have only a few instances per class. As shown in Fig. 3, NDM follows CALM and is only applied in the fine-tuning stage; it is composed of a D-RoI head, a box regressor and a classifier. We use CALM to generate a set of proposals and then feed these proposals to NDM for further multi-class classification and regression. Since we have abundant instances of base classes and a few examples of novel classes, it is easy to learn a good separation of different classes and avoid training from scratch, which is time-consuming and inefficient.

Unknown Proposal Selection. After we detect novel objects with NDM, we still need to localize the unknown objects which may be contained in the class-agnostic proposals produced by CALM. Therefore, we propose an Unknown Proposals Selection (UPS) algorithm, shown in Algorithm 1, that enables CALM to produce potential detections for unknown classes. Then we concatenate unknown proposals and known proposals with their categories to evaluate both the average precision for known classes and the class-agnostic average recall for unknown objects, respectively.

Few-Shot Fine-Tuning. As shown in Fig. 3, the weights of the D-RoI head are initialized by the weights of the C-RoI head, and the classification weights and regression weights of the base classes in NDM are initialized by the corresponding layers in BCM, respectively. The network obtains high objectness scores of proposals and their corresponding deltas from the output of CALM. Then we take the top-scoring k1 proposals and perform the RoI Align operation again to extract refined region features for better localization. For detecting novel classes, we feed these features into the D-RoI head and finally carry out classification and regression with two separate fully connected layers in NDM. The rest of the proposals, which do not contain base or novel classes, are utilized to localize the unknown objects with the aid of UPS. Following [13], we adopt a cosine similarity for the box classifier. The total loss in the second stage can be described as:

(2)

where Lcls and Lreg in NDM are a cross-entropy loss for box classification and a smoothed L1 loss for box regression, respectively.

3.3 Inference

During inference, the final objectness score s of a proposal in CALM is computed as the geometric mean of the RPN localization quality r and the CALM localization quality c, i.e., $s = \sqrt{r \cdot c}$. In the second stage, we take the top-scoring k2 proposals obtained by CALM, with an NMS threshold of θ3, and feed them into NDM for classification and regression, where k1 and k2 are both 5000 and θ3 is set to 0.9. For recognizing the unknown objects, θ1 = 0.4, θ2 = 1.0 and Kuk = 20 are set in UPS for recalling objects.
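A minimal sketch of this scoring and selection step is given below; the proposal tensors are assumptions, and the NMS call uses torchvision's standard operator rather than the detectron2 implementation used in the paper.

```python
# Hedged sketch of OFDet's inference-time proposal scoring: the objectness score
# is the geometric mean of the RPN and CALM localization qualities, and the
# top-scoring proposals are kept after class-agnostic NMS.
import torch
from torchvision.ops import nms

def select_proposals(boxes, r, c, k2=5000, nms_thresh=0.9):
    # boxes: (N, 4) proposal coordinates; r, c: (N,) localization qualities (assumed inputs)
    scores = torch.sqrt(r * c)                 # s = sqrt(r * c)
    keep = nms(boxes, scores, nms_thresh)      # NMS with threshold theta_3 = 0.9
    keep = keep[:k2]                           # keep the top-scoring k2 proposals
    return boxes[keep], scores[keep]
```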

4 Experiments

4.1 Experiments Setting

Existing Benchmarks. We evaluate our method for several tasks on the widely used detection benchmarks PASCAL VOC and MS-COCO. For FSOD, we follow previous works [12, 13] and use the same class and data splits for a fair comparison. For CAOD, we have two splits. The first split follows the setting of OLN [9], while the second split follows the setting of FSOD instead.

For open-world few-shot object detection, we divide COCO into 60 base classes, 15 novel classes, and 5 unknown classes with three random groups. The unknown categories of each set are identical to the ones in each novel set of VOC.

Evaluation. We introduce the evaluation setting for the three tasks, respectively. For CAOD experiments, following [9], class-agnostic COCO-style AR over N proposals on the novel classes is used. For FSOD experiments, we measure COCO-style mAP and AP50 for COCO and PASCAL VOC, respectively. For OFOD experiments, we measure COCO-style mAP for the known classes and unknown AR over N unknown proposals for the unknown classes, respectively.

Table 2. Experimental results of CAOD on COCO.

Method/Split        | AR10 Split 1 | AR10 Split 2 | AR50 Split 1 | AR50 Split 2 | AR100 Split 1 | AR100 Split 2
OLN [9]             | 15.9*        | 10.0*        | 25.5*        | 16.2*        | 29.6*         | 18.4*
OFDet-10shot (ours) | 18.2         | 17.6         | 27.9         | 25.1         | 31.7          | 27.7
OFDet-30shot (ours) | 21.5         | 21.8         | 32.2         | 30.4         | 37.0          | 33.3

Implementation Details. We implement our method based on detectron2. The SGD optimizer with momentum 0.9 and weight decay 1e−4 is utilized to optimize our network over 8 GPUs with 16 images per mini-batch (2 images per GPU). We use ResNet-101 pre-trained on ImageNet as the backbone for most experiments, except for COCO’s split 1 of class-agnostic object detection experiments, in which ResNet-50 is used to align with [6] for a fair comparison. The learning rate is set to be 0.02 during base training and 0.01 during fine-tuning.

4.2 Results on Class-Agnostic Object Detection

For a fair comparison, we reproduce the unknown-class results of OLN based on detectron2. We present our evaluation results on COCO for two different splits in Table 2. It can be seen that our method significantly improves the recall of novel classes compared to OLN. The AR10 of split 1 for our method reaches 18.2% and 21.5% under the 10-shot and 30-shot settings, boosting OLN by 2.3% and 5.6%. For split 2, our work also achieves 17.6% and 21.8% in the AR10 evaluation, gaining 7.6% and 11.8% AR compared to OLN.

4.3 Results on Few-Shot Object Detection

We provide the results of three splits for PASCAL VOC in Table 3. We would like to emphasize that recent methods are not shown in the table because we select TFA as the baseline and focus on the more challenging OFOD task. We make a detailed comparison with TFA to verify the effectiveness of our work.

Table 3. Experimental results of FSOD on PASCAL VOC.

Table 4. Experimental results of FSOD on COCO.

As shown in Table 3, our method is superior to the transfer-learning baseline TFA. Besides, our method can outperform meta-learning approaches under most settings. Table 4 presents the few-shot detection results on COCO. Our method achieves 9.6%, 12.7% and 17.5% under the 5-shot, 10-shot and 30-shot settings, gaining 2.2%, 2.7% and 3.8% over TFA for K = 5, 10, 30 shots per category. Meanwhile, our method can also outperform the meta-learning baseline under most settings, showing our method's strong robustness and generalization ability in a more realistic and complex scenario such as COCO.

4.4 Results on OFOD

We conduct OFOD experiments on COCO. We compare OFDet with several baselines, including FRCN-ALL, TFA and Meta R-CNN. As shown in Table 5, though these detectors can detect novel classes with a few samples, they fail to localize the unknown objects under the 10-shot or 30-shot settings. The reason may be that their classifiers cannot recognize the unknown objects, harming the unknown-recall evaluation. Finally, compared with the former results, our method achieves state-of-the-art performance on unknown AR with 30.2%, 30.1% and 24.4% under the 10-shot setting and 32.5%, 31.2% and 24.5% under the 30-shot setting, respectively, while keeping competitive performance in detecting novel classes.

Table 5. Experimental results of OFOD on COCO.

4.5 Ablation Study

Effectiveness of UPS. As a plug-and-play method, the proposed UPS can easily be utilized by two-stage object detectors to help existing proposal generators localize more unknown proposals. As shown in Table 5 and discussed in Sect. 4.4, we observe that using UPS achieves higher performance, with about a 100% relative improvement in Unk AR, demonstrating the effectiveness of our UPS.

Ablation of Weight Initialization in NDM. We carefully analyze how the weight initialization in NDM contributes to the performance on novel classes. All results are shown in Table 6. The first row shows that the base model without BCM only achieves 30.2% nAP. Next, we take four progressive steps to explore the fine-tuning strategy used in NDM. Finally, we integrate the above three pre-trained weights into the original OFDet, which brings a significant boost of 28.5%, proving the effectiveness of BCM and the fine-tuning strategy.

Visualization. Figure 4 shows our qualitative results on COCO. Our method can not only detect known objects, but also localize unknown objects without training samples.

Table 6. Ablation of weight initialization of NDM.

Method | D-RoI Head | Cls Layer | Reg Layer | nAP
OFDet  |            |           |           | 30.2
OFDet  | ✓          |           |           | 55.5
OFDet  | ✓          | ✓         |           | 57.6
OFDet  | ✓          |           | ✓         | 58.3
OFDet  | ✓          | ✓         | ✓         | 59.2

Fig. 4. Visualization results of our proposed method on 10-shot of COCO dataset.

5 Conclusion

In this work, we propose Open-World Few-Shot Object Detection, where our proposed OFDet extends a class-agnostic object detector based on the two-stage approach to perform detection of novel classes with a few available examples and localization of unknown classes with an Unknown Proposals Selection algorithm in more realistic scenarios. We hope our work will inspire the vision community to further explore this novel task for practical applications.

Acknowledgement. This work was supported by National Key R&D Program of China (No. 2022ZD0118202), the National Science Fund for Distinguished Young Scholars (No. 62025603), the National Natural Science Foundation of China (No. U21B2037, No. U22B2051, No. 62176222, No. 62176223, No. 62176226, No. 62072386, No. 62072387, No. 62072389, No. 62002305 and No. 62272401), and the Natural Science Foundation of Fujian Province of China (No. 2021J01002, No. 2022J06001).

References
1. Zheng, X., et al.: DDPNAS: efficient neural architecture search via dynamic distribution pruning. Int. J. Comput. Vision 131(5), 1234–1249 (2023)
2. Zheng, X., et al.: Migo-NAS: towards fast and generalizable neural architecture search. IEEE Trans. Pattern Anal. Mach. Intell. 43(9), 2936–2952 (2021)
3. Zhang, S., et al.: You only compress once: towards effective and elastic BERT compression via exploit-explore stochastic nature gradient. arXiv preprint arXiv:2106.02435 (2021)
4. Zhang, S., Jia, F., Wang, C., Wu, Q.: Targeted hyperparameter optimization with lexicographic preferences over multiple objectives. In: The Eleventh International Conference on Learning Representations (2023)


5. Zheng, X., Ji, R., Tang, L., Zhang, B., Liu, J., Tian, Q.: Multinomial distribution learning for effective neural architecture search. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1304–1313 (2019)
6. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, vol. 28 (2015)
7. Alexe, B., Deselaers, T., Ferrari, V.: Measuring the objectness of image windows. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2189–2202 (2012)
8. Uijlings, J.R., Van De Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. Int. J. Comput. Vision 104, 154–171 (2013)
9. Kim, D., Lin, T.Y., Angelova, A., Kweon, I.S., Kuo, W.: Learning open-world object proposals without learning to classify. IEEE Robot. Autom. Lett. 7(2), 5453–5460 (2022)
10. Bendale, A., Boult, T.: Towards open world recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1893–1902 (2015)
11. Chen, H., Wang, Y., Wang, G., Qiao, Y.: LSTD: a low-shot transfer detector for object detection. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018)
12. Kang, B., Liu, Z., Wang, X., Yu, F., Feng, J., Darrell, T.: Few-shot object detection via feature reweighting. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8420–8429 (2019)
13. Wang, X., Huang, T., Gonzalez, J., Darrell, T., Yu, F.: Frustratingly simple few-shot object detection. In: International Conference on Machine Learning, pp. 9919–9928. PMLR (2020)
14. Yan, X., Chen, Z., Xu, A., Wang, X., Liang, X., Lin, L.: Meta R-CNN: towards general solver for instance-level low-shot learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9577–9586 (2019)
15. Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes (VOC) challenge. Int. J. Comput. Vision 88(2), 303–338 (2010)
16. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48

Theoretical Computational Intelligence and Applications

RWA Optimization of CDC-ROADMs Based Network with Limited OSNR

Pengxuan Yuan1, Yongxuan Lai1, Liang Song1, and Feng He2(B)

1 Xiamen University, Xiamen 361005, Fujian Province, China
2 FiberHome Telecommunication Technologies Co., LTD, Wuhan 430205, Hubei Province, China
[email protected]

Abstract. The introduction of Colorless, Directionless, and Contentionless Reconfigurable Optical Add-Drop Multiplexers (CDC-ROADMs) complies with the idea of software-defined networking and is gradually becoming a major component in the construction of optical transmission networks (OTN). Current Routing and Wavelength Assignment (RWA) algorithms oriented to OTN already account for the flexibility of multiplexer deployment. However, mainstream RWA algorithms usually overlook the physical limitation imposed by the Optical Signal-to-Noise Ratio (OSNR). In this paper, we propose two integer linear programming (ILP) based algorithms that solve the OSNR-limited RWA problem on CDC-ROADMs networks. Our simulations demonstrate the feasibility and efficiency of the proposed algorithms.

Keywords: RWA · Optical Network · CDC-ROADM · OSNR

1 Introduction

The rapid growth of IP traffic and emerging high-rate applications, such as mobile applications, high-resolution streaming services, and cloud computing, has reached the limit of traditional electrical networks' capacity, which calls for a much more cost-efficient and scalable alternative [1]. To meet this requirement, the Optical Transmission Network (OTN) [2], with high channel capacity and extended optical reach enabling high-rate transmission over Wavelength Division Multiplexing (WDM) links, has progressively become the bedrock architecture of the current core and aggregation layers [3]. Although the main problem OTN faces is still the design of the Routing and Wavelength Assignment (RWA) algorithm [4], the introduction of Colorless, Directionless, and Contentionless Reconfigurable Optical Add-Drop Multiplexers (CDC-ROADMs) and the physical constraint deriving from the Optical Signal-to-Noise Ratio (OSNR) [5] of fiber links make the modeling of RWA in OTN much more complicated. Unlike Conventional Reconfigurable Optical Add-Drop Multiplexers (C-ROADMs) [6], which cannot supply sufficient flexibility to adjust the routing schema, CDC-ROADMs are much more compatible with the concept of the software-defined network [7] and show their superiority in adding, dropping, and expressing routing traffic through network nodes. Figure 1 shows the main structural difference between them.


Fig. 1. Difference between C-ROADMs and CDC-ROADMs

Figure 1 (a) is the physical model of C-ROADMs, which fixedly place optical channels by connecting specific ports and multiplexers, leaving wavelength allocation directional. When rerouting a wavelength, manual intervention is inevitable in the C-ROADMs scenario. In comparison, CDC-ROADMs technology, as shown in Fig. 1 (b), logically binds all ports and multiplexers, making it possible to assign any wavelength to any port at the add/drop site; only remote software control is needed to reroute the path or adjust the wavelength based on the network status [8]. Thus CDC-ROADMs have become an essential component of optical transmission systems deployed at core network nodes. In this work, all nodes are regarded as deployed with CDC-ROADMs and possess the characteristics mentioned above, which is also the foundation of the following logical modeling and formulation. Furthermore, the energy loss of optical signal transmission is another factor taken into consideration in this work. OSNR is a classic indicator to quantify the degree of optical noise interference on an optical signal within a valid bandwidth. Figure 2 shows the basic definition of OSNR, where the x-axis represents the wavelength of an optical signal and the y-axis represents the power of signal and noise. After long-distance transmission, the signal's intensity inevitably decreases, meaning it must be amplified by optical amplifiers (OA) so that receivers can detect and recover the signal properly [9]. However, OA decreases the OSNR value, which must stay above the OSNR limit on WDM networks to keep the bit error rate under control [10]. This is another issue addressed in this work. There has been a significant body of work in the area of optical network RWA. Most previous research concentrates on decreasing the complexity of computation and improving the quality of services [11–13] based on the ideal graph model, without considering the physical constraints and construction cost that we explore in this work. In this paper, we propose two ILP-based algorithms for different scale cases to solve the RWA problem on high routing flexibility networks while taking the OSNR limit into account. The remainder of the paper is organized as follows. In Sect. 2, we clarify the physical model and, based on it, provide two formulation schemas to tackle the small scale and large scale problems correspondingly. In Sect. 3, we describe our ILP algorithm and the heuristic strategy we adopt to reduce excessive variables. Our


performance results follow in Sect. 4, where we initially examine the feasibility and effectiveness of our method. Finally, we conclude the paper.

Fig. 2. The definition of OSNR

2 Problem Description

2.1 Physical Model

In the WDM network model of this paper, modulation devices are deployed only on the nodes of the graph, and every node can adjust its wavelength allocation strategy and routing direction according to the objective of RWA. This means a node can assign a new wavelength to a signal going through it, and the OSNR of the signal emitted by it is newly estimated due to the modulation and demodulation process. For simplification, the computation of OSNR is approximated by a function negatively correlated with the number of links within a path and the transmission length. As shown in Fig. 3, links between nodes with CDC-ROADMs devices consist of a set of wavelength channels where the traffic of different services is carried on spectrum slots accordingly. Note that a specific spectrum slot cannot be shared by two or more services at the same time. When an optical signal passes through nodes, its OSNR can be increased by demodulation and modulation at the cost of multiplexers [14], whose quantity is limited. At the same time, a signal modulated by a node can be assigned to its original wavelength or to a new available spare wavelength channel. Likewise, once a specific multiplexer is allocated to signal modulation of a certain service, it is not allowed to work for other services simultaneously. A complete routing path is called a lightpath in OTN, and it consists of a set of fiber links with different lengths. OSNR computation is simulated by designing an equation negatively correlated with the length of each link of a complete lightpath [15]. The OSNR of the signal carried on a lightpath is determined by the number of links and the length of


each link when the signal is not processed by any multiplexer on any intermediate node of the lightpath. It is not hard to see that long-distance transmission is not supported due to the OSNR limit and OSNR attenuation, which is also the main reason why multiplexers need to be deployed on nodes.
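The exact attenuation model is not specified above; as a rough illustration, the following sketch assumes a simple linear penalty per link and per kilometre (the launch value osnr0 and the coefficients alpha and beta are hypothetical, not values from the paper) and checks whether a candidate path still satisfies the OSNR limit.

# Hedged sketch: a stand-in OSNR estimate that is negatively correlated with
# the number of links and the total length of a (sub)path, as described above.
def approx_osnr(link_lengths, osnr0=40.0, alpha=1.5, beta=0.01):
    """Return an approximate OSNR (dB) for a path given its link lengths (km)."""
    num_links = len(link_lengths)
    total_length = sum(link_lengths)
    return osnr0 - alpha * num_links - beta * total_length

def satisfies_limit(link_lengths, osnr_limit):
    """True if the path can be traversed without intermediate regeneration."""
    return approx_osnr(link_lengths) >= osnr_limit

# Example: a 3-link path of 80 km spans against a limit of 20 dB.
print(satisfies_limit([80, 80, 80], osnr_limit=20))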

Fig. 3. Two possible signal transmission status

2.2 Graph Model

The OTN topology is represented by a connected graph G = (V, E). V denotes the set of nodes, which can be deployed with multiplexers to amplify signals and change the wavelength channels signals are carried on. E denotes the set of (point-to-point) single-fiber links with a certain number of spectrum slots. The graph topology is assumed to be bidirectional, which means that if a link E(i, j) from i to j is in E, so is the link E(j, i). The frequency spectrum, the same for every link in E, is represented by a set of frequency (spectrum) slots Λ = {λ1, λ2, . . . , λn}, where |Λ| is the number of slots in a link and is usually fixed. M denotes the hardware resources, M = {m1, m2, . . . , mn}, where |M| is the number of available multiplexers, also the same for all nodes of the graph. Each m stands for the cost of two occupied interfaces on the multiplexer for uploading and downloading the data. Each interface is not allowed to be shared by different services using the same wavelength, and the number of available interfaces is similarly fixed and limited. T represents the set of static traffic demands required to be routed on the network and allocated wavelengths. Each demand is described by a tuple y(s, d), in which s is the source node and d is the target node of the service, respectively. Since the OSNR of a whole lightpath is not a linear function of the number of its links, it is almost impossible to formulate the OSNR computation as a strict constraint or as a weighted part of the objective function within the framework of ILP.


To solve this problem, preprocessing the links of a lightpath becomes necessary. We provide two strategies that model the links and lightpaths at different granularities to meet the requirement that the OSNR of a signal stays above the OSNR limit.

2.3 Small Scale vs Large Scale

Since the search space of a small scale problem is limited, all potential paths from source node s to target node t of a traffic demand are calculated via depth-first search (DFS) on the graph. Subsequently, each complete potential path is cut into pieces at different granularities, which keeps the OSNR of the signal above the limit, to constitute a set P of subpaths from which the variables of the ILP procedure are sampled. xtpioω is the boolean variable that denotes the utilization of a subpath, with tp ∈ P, where P represents the set of subpaths generated for all traffic demands and tp means the subpath p is used by traffic demand t. Suffix i denotes the index of the interface this subpath utilizes at the start node, and suffix o denotes the index of the interface occupied at the end node. ω represents the wavelength slot used. For a highly connected large scale graph, computing all paths between a pair of nodes is already costly. Therefore, the k shortest path (KSP) algorithm is adopted to trade off the completeness of the search space against the efficiency of the ILP procedure. In the same way, paths are segmented to generate subpaths according to the OSNR limit. Since simply splitting whole paths at the scale of one link, two links, up to all links still produces too many variables for the subsequent computation, the search space needs to be compressed further. To minimize the utilization of hardware resources such as multiplexers, an intuitive idea is to reduce the number of nodes on which multiplexers are deployed by integrating as many links as possible into a subpath. The OSNR of the signal is negatively correlated with the number of links of a subpath and the length of each link, which yields an optimal substructure for a greedy algorithm that splits the complete path. The preprocessing works as follows: the furthest node from the start of the lightpath that still keeps the OSNR above the limit becomes the first position to cut, and this procedure is iterated on the rest of the lightpath until the remaining part already meets the OSNR constraint. xtp is the boolean variable that denotes the utilization of a subpath. Likewise, tp is a sample from the set P, which is generated from all potential k shortest paths of all traffic demands through the preprocessing procedure mentioned above. The allocation of wavelengths and multiplexers on nodes is performed on the results computed by the ILP procedure, which further reduces the size of the search space for the solver.
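A minimal sketch of this greedy splitting is given below; it reuses the hypothetical approx_osnr/satisfies_limit helpers introduced in Sect. 2.1 and assumes a path is given simply as a list of link lengths, so the names and data layout are illustrative rather than taken from the paper.

# Greedy splitting of a lightpath into OSNR-feasible subpaths: cut at the
# furthest node from the current start that still keeps the OSNR above the
# limit, then repeat on the remaining links (sketch under the assumptions above).
def split_path(link_lengths, osnr_limit):
    subpaths = []
    start = 0
    while start < len(link_lengths):
        end = start + 1  # a subpath contains at least one link
        # Extend the cut position while the extended piece stays feasible.
        while (end + 1 <= len(link_lengths)
               and satisfies_limit(link_lengths[start:end + 1], osnr_limit)):
            end += 1
        subpaths.append(link_lengths[start:end])
        start = end
    return subpaths

# Example: split a 5-link path so each piece stays above a 25 dB limit.
print(split_path([60, 70, 80, 50, 90], osnr_limit=25))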

3 Optimization Method

Under the common requirement that all traffic demands must be routed on the network with minimum utilization of resources such as multiplexers and wavelength slots, the problems formulated on different scale networks have to be treated differently. In this section, the detailed ILP designs are given.


3.1 ILP Formulation for Small Scale Network

The objective function consists of two parts with different weights, which can be adjusted depending on the purpose and focus of the algorithm:

\min\; A\Big(|T| - \sum_{t\in T} Y_t\Big) + B\Big(\sum_{\mathrm{all}} x_{tpio\omega} - \sum_{D_{tp}=D_t} x_{tpio\omega}\Big)    (1)

In the equation above, A is the weight of failed traffic demands, and B is the weight of used multiplexers. D_{tp} is the end node of a certain subpath and D_t is the target node of the traffic demand tagged t. Topology-related constraints are described as follows:

\sum_{S_{tp}=S_t,\,i,o,\omega} x_{tpio\omega} = Y_t, \quad \forall t \in T    (2)

\sum_{D_{tp}=D_t,\,i,o,\omega} x_{tpio\omega} = Y_t, \quad \forall t \in T    (3)

\sum_{D_{tp}=v,\,i,o,\omega} x_{tpio\omega} = \sum_{S_{tp}=v,\,i,o,\omega} x_{tpio\omega}, \quad \forall v \in V_{IO},\ t \in T    (4)

\sum_{tp\in P_{tv},\,i,o,\omega} x_{tpio\omega} \le 1, \quad \forall v \in V_t,\ t \in T    (5)

Equations (2) and (3) constrain every successful traffic demand to have a single start and a single end. S_{tp} is the start node of a certain subpath and S_t is the start node of the traffic demand tagged t. Equation (4) requires that, for all traffic demands and for any node (vertex) v belonging to the set V_{IO} of nodes deployed with a multiplexer, the input traffic is strictly equal to the output traffic. Equation (5) avoids loops on the topology, where P_{tv} denotes the set of all subpaths of a certain traffic demand t that start at node v and V_t denotes all nodes of the graph that are the start of some subpath of traffic demand t.

\sum_{tp\in P_e,\,i,o} x_{tpio\omega} \le 1, \quad \forall \omega \in \Lambda,\ e \in E_{TP}    (6)

\sum_{v\in S_{tp} \wedge v \notin S_t,\,i,o,\omega} x_{tpio\omega} \le M_v, \quad \forall v \in V_{IO}    (7)

\sum_{D_{tp}=v,\,i,\omega} x_{tpil\omega} + \sum_{S_{tp}=v,\,o,\omega} x_{tplo\omega} \le N_{vl}, \quad \forall l \in L_v,\ v \in V_{IO}    (8)

\sum_{D_{tp}=v,\,i,\omega} x_{tpil\omega} + \sum_{S_{tp}=v,\,o,\omega} x_{tplo\omega} \le 1, \quad \forall l \in L_v,\ v \in V_{IO},\ t \in T    (9)

\sum_{D_{tp}=v,\,i,t} x_{tpil\omega} + \sum_{S_{tp}=v,\,o,t} x_{tplo\omega} \le 1, \quad \forall l \in L_v,\ v \in V_{IO},\ \omega \in W_{vl}    (10)

x_{tpio\omega} \in \{0, 1\}    (11)

Equation (6) gives the RWA constraint: for a link that is part of a certain subpath of a traffic demand, only one traffic demand may go through it and occupy wavelength slot ω, where E_{TP} denotes the set of links used in subpaths of traffic demands and P_e denotes all subpaths that contain link e. Equation (7) requires the number of multiplexers deployed on node v to be less than M_v, where the condition v ∈ S_{tp} ∧ v ∉ S_t means node v is the start of a certain subpath but not the start of a whole lightpath. Equation (8) limits the number of interfaces of the multiplexers at a specific node, and Eqs. (9) and (10) state that each interface is simplex.


3.2 ILP Formulation for Large Scale Network

Likewise, the purpose of the RWA proposed in this paper is still to reduce the consumed resources while maximizing the number of successful traffic demands. The objective function is given as follows; the main difference from the small scale problem is the granularity of the variables:

\min\; A\Big(|T| - \sum_{t\in T} Y_t\Big) + B\Big(\sum_{\mathrm{all}} x_{tp} - \sum_{D_{tp}=D_t} x_{tp}\Big)    (12)

The topology-related, RWA-related, and multiplexer-related constraints remain the same, but without integrating the wavelength and the interface index on each side of a subpath into the variables:

\sum_{S_{tp}=S_t} x_{tp} = Y_t, \quad \forall t \in T    (13)

\sum_{D_{tp}=D_t} x_{tp} = Y_t, \quad \forall t \in T    (14)

\sum_{D_{tp}=v} x_{tp} = \sum_{S_{tp}=v} x_{tp}, \quad \forall v \in V_{IO},\ t \in T    (15)

\sum_{tp\in P_{tv}} x_{tp} \le 1, \quad \forall v \in V_t,\ t \in T    (16)

As mentioned in Sect. 2, we put the allocation of wavelengths and multiplexers after the ILP, so we only have to make sure that the number of slots occupied by the computed traffic demands is not larger than the actual number of wavelength slots in each link. In general, the capacity of all links of the same network is the same, so we set Λ as a constant, as shown in Eq. (17).

\sum_{tp\in P_e} x_{tp} \le \Lambda, \quad \forall e \in E_{TP}    (17)

\sum_{v\in S_{tp} \wedge v \notin S_t} x_{tp} \le M_v, \quad \forall v \in V_{IO}    (18)

x_{tp} \in \{0, 1\}    (19)
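The following sketch illustrates how a formulation of this shape could be assembled with an off-the-shelf ILP solver; it uses the open-source PuLP library and toy data structures (a dictionary of candidate subpaths per demand, each annotated with its links and whether it reaches the demand's target), so all names and values are assumptions rather than the paper's implementation, and only the start/end and capacity constraints are reproduced.

# Hedged sketch of the large-scale formulation (Eqs. 12-19, simplified).
import pulp

demands = ["t1", "t2"]
subpaths = {  # illustrative candidate subpaths per demand
    ("t1", 0): {"links": [("a", "b")],             "is_src": True,  "is_dst": False},
    ("t1", 1): {"links": [("b", "c")],             "is_src": False, "is_dst": True},
    ("t2", 0): {"links": [("a", "b"), ("b", "c")], "is_src": True,  "is_dst": True},
}
links = [("a", "b"), ("b", "c")]
CAPACITY, A, B = 10, 1000, 1  # slots per link and objective weights (assumed)

prob = pulp.LpProblem("rwa_large_scale", pulp.LpMinimize)
x = {k: pulp.LpVariable(f"x_{k[0]}_{k[1]}", cat=pulp.LpBinary) for k in subpaths}
Y = {t: pulp.LpVariable(f"Y_{t}", cat=pulp.LpBinary) for t in demands}

# Objective (12): penalize failed demands and intermediate (non-terminal) subpaths.
prob += A * (len(demands) - pulp.lpSum(Y.values())) + \
        B * pulp.lpSum(x[k] for k, sp in subpaths.items() if not sp["is_dst"])

for t in demands:
    # (13)/(14): a successful demand uses exactly one source and one target subpath.
    prob += pulp.lpSum(x[k] for k, sp in subpaths.items() if k[0] == t and sp["is_src"]) == Y[t]
    prob += pulp.lpSum(x[k] for k, sp in subpaths.items() if k[0] == t and sp["is_dst"]) == Y[t]

for e in links:
    # (17): link capacity in wavelength slots.
    prob += pulp.lpSum(x[k] for k, sp in subpaths.items() if e in sp["links"]) <= CAPACITY

prob.solve()
print({k: v.value() for k, v in x.items()}, {t: v.value() for t, v in Y.items()})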

The equations above restrict the number of critical resources, so a naïve idea is a brute-force assignment that places traffic demands onto the channels computed by the ILP in order of the demands' importance. This method, however, cannot ensure that all traffic demands are met. To obtain a final result, we therefore iterate the ILP and the assignment process on the failed traffic demands repeatedly, enlarging the search space by increasing the meta-parameter k of the KSP algorithm at the start of each round, until ideally all traffic demands are met. In the practical experiment environment, we verified that this method works appropriately and achieves good results.
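A compact sketch of this iteration is shown below; solve_ilp and assign_resources stand in for the ILP procedure and the brute-force assignment described above, and k_shortest_paths for the KSP candidate generation, so all three are assumed helpers rather than functions defined in the paper.

# Iterative scheme (sketch): enlarge the KSP candidate set until every demand
# is routed or the iteration budget is exhausted.
def iterative_rwa(graph, demands, max_rounds=10):
    routed, failed = {}, list(demands)
    k = 1
    for _ in range(max_rounds):
        if not failed:
            break
        candidates = {t: k_shortest_paths(graph, t, k) for t in failed}  # assumed helper
        selection = solve_ilp(graph, failed, candidates)                 # assumed helper
        newly_routed, failed = assign_resources(graph, selection)        # assumed helper
        routed.update(newly_routed)
        k += 1  # widen the search space for the remaining demands
    return routed, failed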


4 Experimental Results

In this section, we evaluate the proposed algorithm's performance on small scale and large scale networks through simulation experiments. Because previous mainstream RWA algorithms overlook the physical limitation of OSNR, our simulation experiments verify the effectiveness of the algorithm in terms of saving occupied hardware resources while respecting the OSNR limit, for large scale and small scale networks, respectively.

Fig. 4. Topologies of the small network and the large network

We randomly generated a small network topology with 7 nodes and 11 edges and a large network with 101 nodes and 178 edges; the topologies of both networks are illustrated in Fig. 4. For the small scale network, the hardware resources are set to a constant 50 channels and 50 multiplexer interfaces. The OSNR limits for transmission in the network are set to 20, 17, 16, and 15, corresponding to results from low to high BER requirements, respectively. For the large scale network, the simulation generates 389 sets of service requirements, and the rest of the parameters are kept the same. The objective function is set such that the weight representing the number of successful services is much larger than the weight controlling the number of multiplexers. The results in Table 1 indicate that our algorithm can satisfactorily solve the RWA problem on a CDC-ROADMs-based network at both small and large scale. Taking the OSNR limit into consideration, the number of successful traffic demands decreases, at a rate that depends on the topology of the target network, as the limit increases, which means long-distance routing is not supported. Furthermore, the number of occupied multiplexers decreases when the OSNR limit is at a low level, which means more potential routes are added to the search space, leaving more chances of reusing the nodes deployed with multiplexers.


Table 1. Results of Simulation.

Problem       OSNR limit / Total traffic demands   Successful traffic demands   Occupied multiplexers
Small Scale   20/40                                17                           7
              17/40                                40                           20
              16/40                                40                           20
              15/40                                40                           16
Large Scale   20/389                               277                          170
              17/389                               383                          125
              16/389                               389                          113
              15/389                               389                          102

5 Conclusion

This paper focuses on the routing and wavelength assignment problem in OSNR-constrained optical transport networks. The problem is described and defined in terms of physical and mathematical models, including the characteristics of OSNR estimation, the functions of multiplexers, the modeling of the logical topology, and the mathematical models targeted by RWA algorithms deployed on optical transport networks based on CDC-ROADMs devices. Based on the problem characteristics and the OSNR restriction, we propose two ILP-based modeling schemes to cope with large scale and small scale networks, respectively, and design preprocessing and post-processing algorithms to achieve decision variable generation and wavelength assignment. The effectiveness of the algorithms on small scale and large scale networks is verified on simulated networks, and satisfactory results are achieved.

References
1. Kozdrowski, S., Zotkiewicz, M., Sujecki, S.: Resource optimization in fully flexible optical node architectures. In: Proceedings of 20th International Conference on Transparent Optical Networks, pp. 1–4. IEEE (2018)
2. Heymans, A., Breadsell, J., Morrison, G.M., et al.: Ecological urban planning and design: a systematic literature review. Sustainability 11(13), 3723 (2019)
3. Mukherjee, B.: WDM optical communication networks: progress and challenges. IEEE J. Sel. Areas Commun. 18(10), 1810–1824 (2000)
4. Talebi, S., Alam, F., Katib, I., et al.: Spectrum management techniques for elastic optical networks: a survey. Opt. Switch. Netw. 13, 34–48 (2014)
5. Freude, W., Schmogrow, R., Nebendahl, B., et al.: Quality metrics for optical signals: eye diagram, Q-factor, OSNR, EVM and BER. In: Proceedings of 14th International Conference on Transparent Optical Networks, pp. 1–4 (2012)
6. Sequeira, D., Cancela, L., Rebola, J.: CDC ROADM design tradeoffs due to physical layer impairments in optical networks. Opt. Fiber Technol. 62, 102461 (2021)


7. Ji, P.N.: Software defined optical network. In: Proceedings of 11th International Conference on Optical Communications and Networks, pp. 1–4 (2012)
8. Sequeira, D.G., Cancela, L.G., Rebola, J.L.: Impact of physical layer impairments on multidegree CDC ROADM-based optical networks. In: Proceedings of International Conference on Optical Network Design and Modeling, pp. 94–99 (2018)
9. Sjodin, M., Johannisson, P., Karlsson, M., et al.: OSNR requirements for self-homodyne coherent systems. IEEE Photonics Technol. Lett. 22(2), 91–93 (2009)
10. Bergano, N.S., Kerfoot, F., Davidsion, C.: Margin measurements in optical amplifier system. IEEE Photonics Technol. Lett. 5(3), 304–306 (1993)
11. Mehta, D., O'Sullivan, B., Quesada, L., et al.: Designing resilient long-reach passive optical networks. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 25, pp. 1674–1680 (2011)
12. Yoo, M., Qiao, C., Dixit, S.: QoS performance in IP over WDM networks. IEEE J. Sel. Areas Commun. Spec. Issue Protocols Next Gener. Opt. Internet (2000)
13. La, R.J., Seo, E.: Expected routing overhead in mobile ad-hoc networks with flat geographic routing. IEEE Trans. Mob. Comput. (TMC) (2008)
14. Fukutoku, M.: Next generation ROADM technology and applications. In: Optical Fiber Communication Conference, p. M3A-4. Optica Publishing Group (2015)
15. Zhu, C., Tran, A.V., Chen, S., et al.: Statistical moments-based OSNR monitoring for coherent optical systems. Opt. Express 20(16), 17711–17721 (2012)

Resource Optimization for Link Failure Recovery of Software-Defined Optical Network

Yongxuan Lai1, Pengxuan Yuan1, Liang Song1(B), and Feng He2

1 Xiamen University, Xiamen 361005, Fujian Province, China
[email protected]
2 FiberHome Telecommunication Technologies Co., LTD, Wuhan 430205, Hubei Province, China

Abstract. With the deployment of Software-Defined Networking (SDN), traditional IP routers are being upgraded to SDN-enabled switches, enabling traffic reachability in case of a single link failure. However, the ideal signal transmission model overlooks the attenuation that occurs during transmission. From the perspective of Internet Service Providers (ISPs), once the robustness of the upper application layer meets the stability requirements of users, the cost of network construction, such as the use of multiplexers on routers, deserves more emphasis. Therefore, we propose an integer programming-based single-link failure recovery algorithm for Optical Signal-to-Noise Ratio (OSNR) limited optical networks, which rationally uses the resources of CDC-ROADM (Colorless Directionless Contentionless Reconfigurable Optical Add-Drop Multiplexer) devices. Simulation results show that our proposed scheme can achieve fast recovery and relatively high reachability from any single link failure at a low cost of multiplexer resources on OSNR-limited optical networks.

Keywords: Resource Optimization · SDN · Optical Network · Link Failure Recovery

1 Introduction

The optical transmission network is the backbone network that carries communication between large areas. As the network scale increases, users demand higher stability and availability of network applications. When designing and building the network, network operators need to ensure the robustness of network services [1]. Stable network services require two conditions: on the one hand, network traffic must not be paralyzed when a failure occurs, and service traffic must have an alternate path or be re-planned; on the other hand, the recovery of network services needs to be responsive [2], and the quality of service must remain stable. In practice, physical networks often face interference from many natural and human factors, such as earthquakes, construction, natural weathering, aging, etc. [3]. When physical problems cause network failures, in addition to quick line repair, network operators often need to achieve speedy failure recovery with the help of algorithms under SDN [4].


In existing network failure recovery research, there are two main failure recovery modes: the passive reconstruction mode and the active protection mode [5, 6]. In the passive reconstruction mode, when a fiber break occurs, the interrupted service traffic needs to be re-planned with new routing and channel allocation. In this mode, the network does not need to allocate or reserve hardware resources in advance; it recalculates whenever a problem occurs, which makes the failure recovery process time-expensive, requires better computing cores, and brings additional signaling overhead [7]. In the active protection mode, the service requirements have multiple alternative paths and channel allocation schemes [8]. At the same time, network resources are reserved for immediate switching of service traffic in case of network failure. To improve the robustness of the network while reducing the latency of failure recovery, proactive protection schemes are favored by more network operators and attract more academic research interest [9, 10]. However, for network service providers, the reserved network resources are a critical reason for the high network construction cost [11, 12]. Reducing the cost of laying hardware facilities and minimizing the number of nodes on which hardware devices are deployed are therefore essential indicators of algorithm efficiency and availability. To address this situation, this paper proposes an integer programming-based single-link failure recovery algorithm for OSNR-constrained optical networks, which innovatively considers the constraints on the OSNR of optical paths in the route and the rational use of the resources of CDC-ROADM devices [13] while planning alternative recovery paths. The algorithm employs a two-stage model. In the first stage, it integrates the routing situations of the complete network topology and of all the degraded networks in the single-link breakage states for overall routing planning. The second stage completes the calculation of wavelength assignment and interface assignment. In the simulation experiments, the effectiveness of our method is verified on simulated networks at both small and large scale.

2 Problem Description

2.1 Physical Model

In the WDM network model presented in this paper, modulation devices are deployed exclusively on graph nodes. Each node can adjust its wavelength allocation and routing direction based on the RWA objective. This means that a node can assign a new wavelength to a signal passing through it, and the OSNR of the emitted signal will be re-estimated due to the modulation and demodulation processes [14]. To simplify the computation of OSNR, it is approximated by a function negatively correlated with the number of links within a path and the transmission distance. Links between nodes with CDC-ROADM devices consist of a set of wavelength channels carrying different services on spectrum slots. It should be noted that a specific spectrum slot cannot be shared by two or more services simultaneously. As shown in Fig. 1, when an optical signal passes through nodes, the OSNR can be increased through demodulation and modulation, but this comes at the cost of multiplexers, which are available in limited quantity. Similarly, signals modulated by a node can be assigned to their original wavelength or to a new spare wavelength channel.


Once a specific multiplexer is allocated to signal modulation of a certain service, it cannot work for other services simultaneously.

Fig. 1. Two possible signal transmission status

In optical transmission networks, a complete routing path is referred to as a lightpath composed of a series of fiber links with varying lengths. To calculate the OSNR of a signal transmitted along a lightpath, we have designed an equation inversely related to the length of each link and the number of links within the lightpath. Specifically, the OSNR of the signal carried by a lightpath is determined by the number and length of the links in the path when any multiplexer does not process the signal on any intermediate node along the path. However, due to the limitations of OSNR attenuation, long-distance transmission is not supported, which necessitates the deployment of multiplexers on nodes. 2.2 Graph Model In the OTN topology given by G(V , E), nodes V are connected by fiber links E with a fixed number of spectrum slots. The graph is bidirectional, meaning if a link starts from node i to j, the reverse link is also included. Each link has a set of frequency slots denoted by  = λ1 , λ2 , . . . , λn , where || is the number of slots existing in a link and usually fixed. Nodes can be equipped with multiplexers to amplify and change wavelength channels. The set of available multiplexers is represented by M = m1 , m2 , . . . , mn with each multiplexer having a cost of two occupied interfaces for uploading and downloading data. |M | is the number of available multiplexers and is also the same fixed value for all nodes on the graph. Interfaces cannot be shared among different services using the same wavelength, and the number of available interfaces is limited. The set of static traffic demands required to be routed and allocated wavelength channels is denoted by T with each demand described by a tuple y(s, d ), where s is the source node and d is the target node.


Fig. 2. Single Link Failure Illustration

As shown in Fig. 2, R0 represents the network topology without a fiber breakage, and the number of relay hardware resources (CDC-ROADMs) deployed on it is noted as Rl0. Since any link of the OTN may suffer a fiber break, one derived subgraph R1, R2, . . . , Rn is generated for each link of the original graph, i.e., there are as many subgraphs as links. Given this situation, our design aims to achieve the following goals. First, it should calculate the spare routes for the affected services when a single link failure occurs. Second, considering the economic and labor costs of deploying multiplexers on nodes, hardware resources shall be reused as much as possible. Third, the calculated results must satisfy the basic RWA principles and the OSNR constraint discussed above.
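The enumeration of single-link failure scenarios can be sketched as follows; the snippet uses the open-source networkx library and a toy topology, so the graph data are illustrative only.

# Build one failure subgraph per link: R_e is the original topology with link e removed.
import networkx as nx

G = nx.Graph()  # toy topology (assumed)
G.add_edges_from([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])

failure_scenarios = {}
for e in list(G.edges()):
    R = G.copy()
    R.remove_edge(*e)
    failure_scenarios[e] = R  # R_e: topology under the single failure of link e

# A quick connectivity check per scenario as a proxy for which services
# would need a recovery route when link e breaks.
for e, R in failure_scenarios.items():
    print(e, nx.is_connected(R))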

3 Optimization Method

3.1 ILP Formulation of Centralized Optimization

To handle the computational cost of finding all paths between a pair of nodes in a large graph, the k shortest path algorithm is used to balance completeness and efficiency. To further reduce the search space, paths are segmented into subpaths. However, simply splitting the paths at a small scale produces too many variables for computation, so a preprocessing step is taken to compress the search space by integrating more links into a subpath, which reduces the number of nodes requiring multiplexers. Because the estimation of OSNR is simulated by an equation negatively related to the length of each link and the number of links within a lightpath, the optimal substructure is determined by a greedy algorithm. It cuts the complete lightpath at the furthest node from the start that can still maintain OSNR above the limit, which ensures every subpath meets the OSNR constraint. After computing the potential alternative subpaths for all services, the coding of decision variables is performed. The decision variable is designed as a Boolean variable of the form x_{tpe}, which represents that a service t occupies a subpath p on the topology with broken link e. Unlike the modeling approach based on links as decision variables, this section uses a two-stage RWA algorithm model, where the interface assignment on nodes and the wavelength assignment problem on the optical path are further computed based on the ILP results, which further reduces the search space of the ILP solver and makes the problem solvable.

\min\; A\sum_{e}\Big(|T^e| - \sum_{t,e} Y_{t,e}\Big) + B\sum_{node_n} \max_{e} \sum_{S_{tpe} \ne S_{te} \wedge S_{tpe}=node_n} x_{tpe}    (1)


The objective function has two components: the first half represents the service requests that fail in route planning, weighted by the coefficient A, and the second part is an estimate of the multiplexer resources deployed in the graph, weighted by the coefficient B. Depending on the focus of the algorithm, one can choose whether the number of successful services or the amount of deployed resources in the topological network takes priority. In Eq. 1, S_{tpe} represents the head node of the subpath associated with x_{tpe}, S_{te} represents the departure node of service t in the broken-fiber topology, and node_n ranges over all nodes of the network where hardware resources may be deployed. Considering that the same node is reused across the network topologies of different fiber-break states, for a given node n the number of multiplexers deployed on it is determined by the sum of the subpaths (optical paths) starting from it; over the different topologies, the maximum value is taken as the upper limit, which allows the node to provide sufficient carrying capacity for all network routings under any single-link fiber breakage, while the sum of the resources deployed on all nodes of the network is one of the overall optimization goals.

\sum_{S_{tpe}=S_{te}} x_{tpe} = Y_{t,e}, \quad \forall (t, e) \in T^e    (2)

\sum_{D_{tpe}=D_{te}} x_{tpe} = Y_{t,e}, \quad \forall (t, e) \in T^e    (3)

\sum_{D_{tpe}=v} x_{tpe} = \sum_{S_{tpe}=v} x_{tpe}, \quad \forall v \in V_{IO}^{e},\ (t, e) \in T^e    (4)

\sum_{tpe\in P_{tev}} x_{tpe} \le 1, \quad \forall v \in V_{te},\ (t, e) \in T^e    (5)

Equations 2 and 3 are uniqueness constraints on the success of a service: at most one unit of traffic can flow out from the starting node of the service, and at most one unit of traffic can flow in at the terminating node; if no traffic occurs, the route assignment of service Y_{t,e} is not successful. Equation 4 guarantees that the incoming traffic is strictly equal to the outgoing traffic at all nodes of the topological network that are neither the start nor the end of a service, and V_{IO} represents the set of nodes of this class. Since the subpaths take the nodes with potentially deployed resources as starting points, this set need not cover all nodes of the topological network, and the constraint guarantees that the optical paths of a complete route are connected. Equation 5 is an anti-loop constraint on the network topology, which restricts the generation of loops through a constraint instead of through the objective function, in order to avoid long-distance transmission and reduce unnecessary multiplexer deployment. P_{tev} denotes the set of all subpaths of service t starting at node v in the network topology with broken fiber link e, and V_{te} denotes the starting nodes of all potential subpaths in the graph; this constraint reduces the space the solver has to search for the optimal solution.

\sum_{tpe\in P_i} x_{tpe} \le \Lambda, \quad \forall (t, e) \in T^e,\ \forall i \in E    (6)

In the framework of the algorithm designed in this section, we place the allocation of wavelengths on the optical path and the allocation of multiplexer interface resources after the calculation of the ILP procedure. Therefore, we only need to satisfy the constraints on the number of link wavelength resources and the number of potentially deployed multiplexer resources. In the static RWA model, each physical link has the same carrying capacity of wavelength channels, so Λ is set as a constant, as shown in Eq. 6. In the encoding stage of the decision variable x_{tpe}, the nodes of the topological network that potentially need to be deployed with multiplexers can be identified by examining the subscript p. The location and number of these nodes are determined by the path-splitting strategy and the KSP algorithm of the decision variable preprocessing algorithm.

\sum_{v\in S_{tpe} \wedge v \notin S_{te}} x_{tpe} \le M_v, \quad \forall v \in V_{IO},\ e \in E    (7)

M_v \le |M|, \quad \forall v \in V_{IO}    (8)

\min\; A\sum_{e}\Big(|T^e| - \sum_{t,e} Y_{t,e}\Big) + B\sum_{node_v \in V_{IO}} M_v    (9)

x_{tpe} \in \{0, 1\}    (10)

As shown in Eq. 7, for each node an integer decision variable M_v is introduced to denote the upper limit of resources laid at that node. Each M_v in turn satisfies the resource upper limit |M| allowed by the network, where |M| is a constant; therefore a resource ceiling constraint is established for all nodes, as in Eq. 8. The form of the objective function changes due to the introduction of the decision variable M_v, see Eq. 9.

3.2 Distributed Optimization for Large Scale Network

For a large-scale network, the number of variables is likely far beyond the capability of mathematical programming solvers. We propose a heuristic scheme tackling this issue, which can be implemented in a distributed manner; the specific steps are as follows. For each single broken-fiber network, the routing distribution follows the above algorithm to calculate lightpaths and deployment nodes. The results are then merged, the hardware resource occupation of the reference network is updated, and the overflow services are iterated back into the planning model, avoiding the points and links whose resources have already been allocated in the reference network; this process is looped until all the service routes are allocated. Referring to Algorithm 1, the Optimization module is mainly responsible for iterating over the failed services until all services are successfully allocated, and the core parameter k of the KSP algorithm is continuously increased during the iterations in order to expand the solution space so that subsequent service allocation problems can find feasible solutions. The Calculator function is responsible for calculating the network routing results for all single-link broken-fiber states and adding the calculated results to the resultList of the current iteration.


Algorithm 1: Distributed Optimization Scheme
Require: Service, EdgeList, Graph
Ensure: Service Routing

function CALCULATOR(Service, EdgeList, Graph, K)
    i ← len(Service)
    while i > 0 do
        for edge in EdgeList do
            resultList ← cal(Service, edge, Graph, K)
        end for
        i ← i - 1
    end while
    return resultList
end function

function MERGER(resultList, Graph)
    i ← len(resultList)
    while i > 0 do
        for result in resultList do
            failedServices ← renew(Graph, result)
        end for
        i ← i - 1
    end while
    return failedServices
end function

function OPTIMIZATION(Service, EdgeList, Graph)
    K ← 1, i ← 1
    while i < IterationTimes && len(failedServices) > 0 do
        resultList ← CALCULATOR(Service, EdgeList, Graph, K)
        failedServices ← MERGER(resultList, Graph)
        K ← K + 1
    end while
end function

The Merge module has two functions: first, it collects all the results in resultList; second, it updates the resource occupation status of the reference network. The resource occupation is not simply summed up: in the renew function, the resource occupation of the nodes in the network is computed for the route allocation of each resultList result, and the maximum is taken. If this maximum exceeds the fixed resource limit of the network, some of the allocated services have to be withdrawn and dropped into the set of failed services failedServices to participate in the next iteration; before the next iteration, the hardware resource occupancy of the reference network is updated, and the links and nodes of the reference network Graph whose resources are exhausted are deleted. Due to the nature of the heuristic algorithm, it is not guaranteed that all services will succeed in this part of the routing design. Therefore, to prevent the iterative process from going on indefinitely, an upper limit IterationTimes is set on the number of iterations.


aspects: first, the need for further allocation of available channels and interfaces to the calculated decision variables of optical paths and multiplexer nodes; second, the ability to reallocate paths for service requests that fail to be allocated resources to. Referring to the heuristic strategy above, the post-processing algorithm proposed in this section uses a brute-force search approach for resource allocation, searching the available allocation space sequentially based on the importance of business requests or randomly. At the end of a round, the remaining business requests go to the next round of the ILP program iteration.

4 Performance Evaluation

4.1 Parameters Setting of Simulation

In this section, we evaluate the proposed algorithm's performance on small-scale and large-scale networks through simulation experiments. On the simulated data, topology graphs with a certain number of nodes and edges are generated based on the Erdős–Rényi [15] model. The estimation and limitation of OSNR degenerate to the case of limiting the cut to subpaths by hop count. In what follows, this parameter is noted as the cut granularity, which refers to the maximum link size that the length of the subpaths can reach in the cut of the complete path. In addition, for demand generation, the experiment first generates a sampling pool between all node pairs and simulates the actual service traffic requests by randomly and repeatedly sampling the service samples in the sampling pool.

Table 1. Simulation Parameters Setting

Index   V    E    Size   Multiplexer   Slots   Interface   K   OSNRG
1        5    8    15        10          10        30      5     2
2       10   30    45        30          30       100      5     2
3       15   50    80        40          40       120      5     2
4       20   65   120        50          50       150      5     2
5       25   80   160        70          70       210      5     2

4.2 Results Discussion

Table 1 lists the 5 groups of parameters used in the experiments. "V" represents the number of nodes of the topology graph, "E" the number of edges, "Size" the number of samples taken from the sampling pool, "Multiplexer" how many resources of this type can be used per node, and "Slots" the number of channels available on network links. "Interface" is the number of interfaces available on a multiplexer. A higher OSNRG value means longer transmission without signal enhancement/wavelength conversion, while a value of 1 indicates sub-paths composed of only one link, with the sub-path


set size related to the link size multiple. Each group of parameter settings was tested 10 times, with the coefficient of successful services fixed at 1000 and the coefficient of occupied multiplexer resources fixed at 1.

Fig. 3. Successfully Allocated Services Comparison

Fig. 4. Multiplexer Occupation Comparison

As shown in Fig. 3 and Fig. 4, with the increase of network scale, centralized and heuristic algorithms struggle to allocate services due to increased competition for service resources. Additionally, larger networks require more hops to reach target nodes and multiple optical paths to be stitched together, leading to increased hardware resource usage. In smaller networks with high connectivity, both algorithms perform similarly due to ample redundancy in network resources and variable space. However, in larger networks, the heuristic algorithm’s performance decreases due to higher resource coupling and worse success rates for service allocation, requiring expanded decision variables and more iterations. Although the centralized algorithm takes longer and uses more storage space due to different structural designs and solution spaces, it performs better in terms of service allocation success. As in Fig. 5, the experiments compare the performance of the centralized algorithm under different OSNR constraints. Due to the simplification of OSNR estimation, only the value of the OSNRG parameter is adjusted here to represent the granularity of the


Fig. 5. The Impact of OSNR on Planning Effectiveness

longest optical pathway supported by the decision variables. As can be seen from the results in the figure, when the granularity of the subpaths is just 1, the decision variable degenerates into the state of the link, and the demand for hardware resources in the network rises sharply due to channel conversion and signal enhancement. When the value of OSNRG is 2 and above, in a topology with high connectivity, it often takes less than two hops to reach the destination node, but still, some of the more distant service requests need to use the multiplexer resources. Meanwhile, it can be seen that as the OSNRG rises, the alternative set of solution space gradually increases, and the success rate of the service gradually rises to 1.

Fig. 6. The Impact of k value on Planning Effectiveness

As shown in Fig. 6, the experiments compare the performance of the centralized algorithm under different k values. The k value affects the size of the alternative set of decision variables; in practice, the parameters follow the second group of parameters in Table 1, changing only the k value. When the value of OSNRG is 2, the service requests do not need multiple connected optical paths to reach the target node, so the reduction in the number of occupied hardware resources is not significant. Moreover, when the value of k increases, the solution space with a small k value is a


subset of the solution space of the model with a large k value in terms of success rate, so the success rate of service planning is 1.

Fig. 7. The Impact of hardware resources on Planning Effectiveness

As shown in Fig. 7, the experiments compare the performance of the centralized algorithm under different amounts of hardware resources; the Multiplexer value affects the degrees of freedom of the optical path combination. In practice, the parameter settings are consistent with the second group of parameters in Table 1. To avoid the case where an OSNRG value of 2 lets services be connected directly and thus masks the resource constraint, the OSNRG value is fixed to 1 here, indicating that hardware resources must be occupied between links. It can be seen that the algorithm requires additional hardware resources in order to assign more service routes, and increasing the Multiplexer value significantly improves the success rate of service route assignment for a constant network size.

5 Conclusion

This paper focuses on the service recovery problem for single-link fiber breakage in OSNR-constrained optical transport networks. Two ILP-based two-stage algorithmic frameworks are proposed to cope with different computational requirements and topological networks: a centralized optimization algorithm and a distributed optimization algorithm. Both algorithms use ILP as pre-processing to plan the lightpaths and find an upper bound, and post-processing to assign the wavelength channels, interfaces and failed service requests. The effectiveness of the algorithms is verified on simulated networks.

References
1. Rak, J., Girao-Silva, R., Gomes, T., et al.: Disaster resilience of optical networks: state of the art, challenges, and opportunities. Opt. Switch. Netw. 42, 100619 (2021)
2. Kozdrowski, S., Zotkiewicz, M., Sujecki, S.: Resource optimization in fully flexible optical node architectures. In: Proceedings of 20th International Conference on Transparent Optical Networks, pp. 1–4 (2018)


3. Maslo, A., Hodzic, M., Skaljo, E., et al.: Aging and degradation of optical fiber parameters in a 16-year-long period of usage. Fiber Integr. Opt. 39(1), 39–52 (2020)
4. Amin, R., Reisslein, M., Shah, N.: Hybrid SDN networks: a survey of existing approaches. IEEE Commun. Surv. Tutor. 20(4), 3259–3306 (2018)
5. Ali, J., Lee, G.M., Roh, B.-H., et al.: Software-defined networking approaches for link failure recovery: a survey. Sustainability 12(10), 4255 (2020)
6. Qiu, K., Zhao, J., Wang, X., et al.: Efficient recovery path computation for fast reroute in large-scale software-defined networks. IEEE J. Sel. Areas Commun. 37(8), 1755–1768 (2019)
7. Petale, S., Thangaraj, J.: Link failure recovery mechanism in software defined networks. IEEE J. Sel. Areas Commun. 38(7), 1285–1292 (2020)
8. Mohan, P.M., Truong-Huu, T., Gurusamy, M.: TCAM-aware local rerouting for fast and efficient failure recovery in software defined networks. In: Proceedings of IEEE Global Communications Conference, pp. 1–6 (2015)
9. Chu, C.Y., Xi, K., et al.: Congestion-aware single link failure recovery in hybrid SDN networks. In: Proceedings of IEEE Conference on Computer Communications, pp. 1086–1094 (2015)
10. Li, Q., Liu, Y., Zhu, Z., et al.: Bond: flexible failure recovery in software defined networks. Comput. Netw. 149, 1–12 (2019)
11. Poularakis, K., Iosifidis, G., Smaragdakis, G., et al.: One step at a time: optimizing SDN upgrades in ISP networks. In: Proceedings of IEEE Conference on Computer Communications, pp. 1–9 (2017)
12. Ghannami, A., Shao, C.: Efficient fast recovery mechanism in software-defined networks: multipath routing approach. In: Proceedings of 11th International Conference for Internet Technology and Secured Transactions, pp. 432–435 (2016)
13. Sequeira, D., Cancela, L., Rebola, J.: CDC ROADM design tradeoffs due to physical layer impairments in optical networks. Opt. Fiber Technol. 62, 102461 (2021)
14. Simmons, J.M., Saleh, M., et al.: Wavelength-selective CDC ROADM designs using reduced-sized optical cross-connects. IEEE Photonics Technol. Lett. 27(20), 2174–2177 (2015)
15. Seshadhri, C., Kolda, T.G., Pinar, A., et al.: Community structure and scale-free collections of Erdős–Rényi graphs. Phys. Rev. E 85(5), 056109 (2012)

Multi-view Coherence for Outdoor Reflective Surfaces

Shuwen Niu, Jingjing Tang, Xingyao Lin, Haoyang Lv, Liang Song(B), and Zihao Jian

School of Informatics, Xiamen University, Xiamen 361005, China
[email protected]

Abstract. In this work, we address the deficiency that traditional multi-view matching algorithms cannot be applied to outdoor reflective surfaces. Since glass reflects different scenes under different viewpoints, which violates multi-view coherence, traditional 3D reconstruction algorithms produce wrong estimates of camera poses, which eventually causes the reconstruction of outdoor buildings with glass surfaces to fail. This paper proposes a pipeline that can improve the accuracy of reconstructing outdoor reflective surfaces. Firstly, we use a drone path planning algorithm so that the dataset captured by the drone has the maximum 3D reconstruction capability. Then, we propose a multi-view matching algorithm based on control points, which largely improves the accuracy and robustness of the current 3D reconstruction system. Finally, in order to restore the texture details of the 3D model, we replace the inaccurate texture mapping automatically generated by the modeling software with a new texture mapping through texture coordinate remapping. These three components together constitute a novel system that can reconstruct outdoor scenes with reflections in high quality. We apply our method to a variety of outdoor scenes with reflective surfaces, such as teaching buildings, libraries, and shopping malls, and find that our method produces excellent 3D reconstruction results.

Keywords: Multi-view Coherence · Control Points · Outdoor Reconstruction · Reflective Surfaces

1 Introduction

The 3D reconstruction algorithm based on multi-view images is a classical problem in computer vision; it can construct highly detailed 3D models from a series of images and is crucial for applications such as VR/AR and robotics. However, current 3D reconstruction methods still struggle with outdoor scene reconstruction, because most outdoor buildings have glass surfaces with strong reflections and transmissions, which make the computation ambiguous: the algorithm cannot know that this part does not belong to the model surface, thus leading to the failure of outdoor scene reconstruction with reflections. The traditional 3D reconstruction algorithm is based on the Multi-View Stereo (MVS) principle, and its performance mainly depends on the quality of the input images and of the camera parameters output during Structure-from-Motion (SfM). Since the texture information of a reflective surface is very complex and transforms irregularly


transformed with the viewpoint movement, it can lead the 3D reconstruction algorithm to estimate the camera parameters incorrectly in the SfM stage, which causes scene reconstruction to fail. Therefore, this paper proposes a novel 3D reconstruction pipeline that is suitable for outdoor scene reconstruction with reflections. Specifically, our main contributions are as follows.

• We implement an optimization method to plan the flight path of a programmable UAV, which ultimately captures datasets with excellent reconstruction capability.
• We introduce a multi-view matching algorithm based on control points, which automatically optimizes the parallel matching between images using a small number of manually labeled control points to guide the SfM stage to correctly estimate the camera parameters, and finally improves the accuracy of 3D reconstruction.
• We apply a texture remapping method to the generated model, which enriches the texture details of the model and improves its overall realism.

We have conducted experiments with the proposed reconstruction pipeline on various outdoor scenes such as teaching buildings, shopping malls and libraries, and the experimental results show that the pipeline can greatly improve the accuracy of 3D reconstruction and outperform current 3D reconstruction algorithms.

2 Pipeline Method

2.1 Drone Path Planning for Image Collection

With the development of UAV technology, drones are often used in 3D reconstruction tasks. Because UAVs can capture images of high spatial and temporal resolution and have unique high-altitude viewpoints, researchers often use them to photograph high-rise buildings for large-scale outdoor scene reconstruction. In fact, the flight of UAVs is physically constrained, and frequent changes of direction and turns can greatly increase power consumption; therefore, multi-objective optimization is suitable for our problem. Considering multiple objectives is a long-standing research topic from traditional optimization to machine learning [1, 2]. In our method, we use a small UAV to collect images of outdoor surfaces by planning the UAV path based on the RRT* [3] algorithm. The method optimizes both the reconstruction capability of the viewpoints and the path quality under the physical constraints, so as to obtain a dataset with excellent reconstruction capability. Since the application scenario of this paper targets outdoor building reconstruction, we can approximate the building as a rectangle; then we only need to sample the viewpoints of its five faces (excluding the bottom). As shown in Fig. 1, taking the center of the rectangle as the center of the sphere O (0,0,0) and half of the diagonal length of the rectangle as the radius, we get a sphere. We use the volume of the sphere as the region for collision detection. Since the Tello UAV used in this paper has only a single camera, we need to rotate the UAV to a certain angle to align the camera with the building for each viewpoint sample. To simplify the calculation, we set the default orientation of the UAV to be the positive half-axis of X. The vector of the line between the current position of the UAV


Fig. 1. We approximate the building as a rectangle, the center of the building is O, and the light pink area is the collision detection area.

and the center of the sphere is (x, y, z); the angle by which the UAV needs to be rotated is then

$$\theta = \arccos\!\left(\frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}\right) \qquad (1)$$
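As a concrete reading of Eq. (1), the minimal sketch below evaluates the rotation angle numerically. It assumes that a is the UAV's default heading along the positive X half-axis and that b is the vector from the UAV toward the sphere center, as described in the surrounding text; the helper name and the sample coordinates are illustrative only.

```python
import numpy as np

def yaw_rotation_angle(uav_pos, sphere_center=(0.0, 0.0, 0.0)):
    """Angle (degrees) between the UAV's default heading (+X) and the direction
    from the UAV's current position to the building's bounding-sphere center."""
    a = np.array([1.0, 0.0, 0.0])                                   # assumed default orientation
    b = np.asarray(sphere_center, float) - np.asarray(uav_pos, float)
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))     # clip guards round-off

print(yaw_rotation_angle(uav_pos=[10.0, 5.0, 3.0]))
```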

Algorithm 1 describes the drone path planning algorithm, where the function ChooseParent(Xnew , xnear , xnew ) finds a node xmin as the parent of xnew such that the path P satisfies the objective function maximization in Eq. (2).

$$E(\mathrm{Traj}) = E_g(\mathrm{Traj}) + \alpha_e E_e(\mathrm{Traj}) - \alpha_t E_t(\mathrm{Traj}) \qquad (2)$$


We use the same objective function as in [4], where E_g denotes the information captured over the entire trajectory and E_e denotes the information captured per unit length, both of which encourage finding the path with the best reconstruction capability per unit length. E_t denotes the sum of the turning angles, keeping the total rotation angle as small as possible and thus making the path smoother.

2.2 Multi-view Control Points Matching Algorithm

SfM is an important process in traditional 3D reconstruction algorithms: it reconstructs a sparse 3D scene from a series of images taken from different viewpoints while outputting the camera's intrinsic and extrinsic parameters as the basis for subsequent dense reconstruction. The current state-of-the-art SfM algorithm [5] can estimate the camera parameters of diffuse scenes well, but it often suffers from low accuracy when reconstructing outdoor scenes with reflections, because it uses SIFT features [6] to describe key points in the feature matching stage. SIFT features are disturbed by the complex reflective texture of the image and cannot correctly identify the corresponding feature points, which leads to feature matching failure and thus incorrect estimation of the camera parameters. We propose a multi-view control points matching algorithm (see Algorithm 2) to address this deficiency, optimize the SIFT feature matching results, and guide the SfM process to output accurate camera parameters, thus improving the accuracy of 3D reconstruction of outdoor scenes with reflections. The overall pipeline is illustrated in Fig. 2.

Control Points Coherence Search. Instead of directly using SIFT to search for control points, we add a preprocessing step that uses a template matching algorithm [8] to search for possible control points. The reasons are as follows.

a. The images captured by the UAV are relatively continuous. Therefore, it is efficient to move the template around within continuous images instead of computing corresponding points over the whole image.


b. Reflective features are more salient over an area than at a single pixel, so template matching is more effective because it takes the whole template patch into account.

Fig. 2. Illustration of the Multi-View Matching Algorithm Based on Control Points. The matching results in the middle figure are computed by the template matching and RANSAC algorithms in OpenCV [7], and the SIFT algorithm implemented by ourselves. The application to 3D Stereo in the right figure is processed in RealityCapture, which only uses the control points as input and will be analyzed in Sect. 3.

First, we select an image Ir as the reference image from the input images I = {Ii | i = 1...NI} and manually label a small number of points as the set of control points Fr = {pj | j = 1...NFr}. The role of control points is to avoid reflection interference and guide image alignment. Therefore, when users specify control points, they should avoid pixels in the reflection area and choose places where the image gradient changes significantly and the image features are more distinctive, such as the corners of window frames or of nameplates hanging on walls. In order to accurately reconstruct outdoor scenes with reflections, we should be able to uniquely identify the control points in all images that overlap with the reference image. Next, we obtain the set of candidate points Cij of the i-th image for the control point pj based on the template matching algorithm, so the set of all candidate points of the i-th image is Fi = {Cij | j = 1...NFr}. Through this step, we focus the possible matching feature points on a few candidate points. Specifically, we take the control point pj as the center and some neighborhood size as the radius to obtain the template image T, which is matched against the i-th image by the squared difference in Eq. (3). Finally, we sort the results and take a number of minimum values as the set of candidate points Cij.

$$R(x, y) = \sum_{x', y'} \left( T(x', y') - I_i(x + x', y + y') \right)^2 \qquad (3)$$
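Eq. (3) is the squared-difference criterion exposed by OpenCV's template matcher. The sketch below is a minimal illustration of the candidate search for one control point, assuming single-channel images; the neighborhood radius and the number k of retained candidates are illustrative choices, not values taken from the paper.

```python
import cv2
import numpy as np

def candidate_points(ref_img, target_img, control_pt, radius=20, k=5):
    """Crop a template around a control point in the reference image, slide it over
    the target image with squared-difference matching (Eq. 3), and keep the k
    positions with the lowest cost as candidate points."""
    x, y = control_pt
    template = ref_img[y - radius:y + radius, x - radius:x + radius]
    scores = cv2.matchTemplate(target_img, template, cv2.TM_SQDIFF)
    best = np.argsort(scores, axis=None)[:k]          # k smallest squared differences
    ys, xs = np.unravel_index(best, scores.shape)
    # Shift from template top-left corners to approximate patch centers.
    return [(int(cx) + radius, int(cy) + radius) for cx, cy in zip(xs, ys)]
```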

Then we find the candidate point in the candidate set Cij that is most similar to the feature of the control point pj. SIFT features remain invariant to rotation, scale, and luminance changes, so we use SIFT features to measure the similarity between the control point pj and the candidate point set Cij, so as to find the best candidate point pj′. It


is worth noting that the best candidate point may be empty, because the user-annotated control points may not be observed from some viewpoints. Finally, we obtain the image pairs C = {{Ia, Ib} | Ia, Ib ∈ I, a < b} with overlapping regions and their control point correspondences Mab ∈ Fa × Fb.

Optimized Matching Based on Control Points. Given a point x and its corresponding point x′, the epipolar geometric constraint [9] holds between them. In this paper, we use a UAV to collect images, so the camera intrinsic matrix K is known, i.e., the camera is calibrated. We can obtain the essential matrix E of the camera from no fewer than five matched pairs of points according to Eq. (4), and then recover the camera's pose parameters R, t by the conventional matrix decomposition method. Therefore, the accuracy of feature matching is crucial for the estimation of the camera parameters. In this paper, we optimize the SIFT feature matching based on the control point matching relationship obtained in the previous stage, and eliminate the influence of reflective texture on the feature matching stage to obtain the correct camera parameters.

$$x_i'^{\top} K^{-\top} E K^{-1} x_i = 0 \qquad (4)$$
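Under the calibrated-camera assumption, the constraint in Eq. (4) can be enforced in practice with OpenCV's essential-matrix estimator followed by pose decomposition. The sketch below is a hedged illustration rather than the authors' implementation: pts_a and pts_b stand for matched control points of one image pair, and the RANSAC threshold is an arbitrary illustrative value.

```python
import cv2
import numpy as np

def recover_pose_from_matches(pts_a, pts_b, K):
    """Estimate the essential matrix E with RANSAC from >= 5 matched pixel points
    of a calibrated camera, then decompose it into the relative pose (R, t)."""
    pts_a = np.asarray(pts_a, dtype=np.float64)   # Nx2 pixel coordinates in image a
    pts_b = np.asarray(pts_b, dtype=np.float64)   # Nx2 pixel coordinates in image b
    E, inliers = cv2.findEssentialMat(pts_a, pts_b, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts_a, pts_b, K, mask=inliers)
    return R, t, inliers
```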

In practice, we use the RANSAC algorithm [10] to optimize the feature matching of images. The algorithm flow is as follows.

a. We first randomly select more than eight control points [11] (or use the 5-point method [12]) in a certain image pair and solve for the camera matrix.
b. We use the conventional SIFT feature matching algorithm to obtain dense, noisy feature matches, and then count the number of point pairs among these matches that satisfy the epipolar geometry in Eq. (4).
c. After repeating (a)(b) for several iterations, we select the feature matches with the maximum number of point pairs satisfying the constraint as the refined matching results.
d. Finally, we filter out the incorrectly matched feature points and obtain the geometrically verified image pairs C̃ and their matching relationships M̃ab.

2.3 Application to Multi-view Stereo

Incremental Reconstruction. For a point in the 3D scene, its reprojection into the 2D images by the camera parameters should maintain multi-view coherence, so we use Bundle Adjustment [13] to optimize the reprojection error in Eq. (5). We use an incremental reconstruction process similar to [5] and perform a nonlinear joint optimization of the camera pose parameters [9] as well as the 3D point set [14].

$$\min \sum_i \lVert P_i X - x_i \rVert^2, \qquad P = K[R \,|\, t] \qquad (5)$$
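To make Eq. (5) concrete, the following minimal sketch evaluates the per-observation reprojection residual that bundle adjustment sums and minimizes over all points and cameras; the function name and the dense NumPy formulation are our illustrative choices, not the authors' solver.

```python
import numpy as np

def reprojection_error(K, R, t, X_world, x_obs):
    """Residual of one observation in Eq. (5): project the 3D point with P = K[R|t]
    and compare against the observed 2D keypoint."""
    P = K @ np.hstack([R, np.asarray(t, float).reshape(3, 1)])
    X_h = np.append(np.asarray(X_world, float), 1.0)   # homogeneous 3D point
    x_proj = P @ X_h
    x_proj = x_proj[:2] / x_proj[2]                     # perspective division
    return np.linalg.norm(x_proj - np.asarray(x_obs, float))
```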

This stage outputs camera poses P = {Pc ∈ SE(3) | c = 1...Np} for all aligned images and a set of 3D points X = {Xk ∈ R³ | k = 1...NX} constituting a sparse 3D scene.

Texture Remapping. Based on the optimization algorithm mentioned earlier, we obtain a more accurate 3D model; however, the texture mapping generated by the traditional 3D reconstruction algorithm is blurred in detail and not realistic enough to


show the details of the reflection texture. In order to restore the model details, we also need to re-texture the model. In computer graphics, a texture map is essentially an image whose coordinates are usually referred to as uv coordinates, with both axes taking values in the range [0, 1]. Since 3D models generally have an irregular geometric structure, it is difficult to describe the mapping between the model and uv coordinates mathematically, which makes it difficult to match a new texture map to the model's original texture coordinates; therefore, we need to establish the mapping between the 3D model and the new texture map by texture remapping. Since the surfaces of buildings with reflections are usually smooth, relatively regular in shape, and roughly perpendicular to the xoy plane in space, we temporarily ignore the effect of coordinates in the y-direction during texture remapping. First, we traverse the set of 3D points on the reflective surface of the model, calculate their 3D Euclidean distances to the origin, find the points nearest to and farthest from the origin, and set their texture coordinates to (0, 0, 0) and (1, 1, 1), respectively. Next, we change the texture coordinates of the remaining 3D points based on these two points, according to the relative scale of the model. Finally, we repeat the above operation for all building surfaces.
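A minimal numerical sketch of this remapping step is given below, under the simplifying assumption that a single relative scale (the distance to the origin, normalized between the nearest and farthest surface points) drives both uv axes; the function name and the two-component uv output are our illustrative choices, not the exact scheme used by the authors.

```python
import numpy as np

def remap_uv(points):
    """Normalize surface points between the nearest and farthest points from the
    origin and use that relative scale as new texture coordinates in [0, 1].
    `points` is an Nx3 array of 3D points on one building surface."""
    points = np.asarray(points, dtype=np.float64)
    dist = np.linalg.norm(points, axis=1)          # Euclidean distance to the origin
    near, far = dist.min(), dist.max()
    scale = np.zeros_like(dist) if far == near else (dist - near) / (far - near)
    # One relative scale drives both uv axes; the y-direction is ignored, as in the paper.
    return np.stack([scale, scale], axis=1)
```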

3 Experimental Results

In this section, we first show the experimental results of our proposed coherence search algorithm in Sect. 3.1. In order to show that our reconstruction pipeline for outdoor scenes with reflections is effective, we compare the results with the state-of-the-art 3D reconstruction system RealityCapture [15] in Sect. 3.2. In Sect. 3.3, we show the effectiveness of the texture remapping method. For simplicity of the experiments, we only use an iPhone to collect pictures of the outdoor surfaces.

3.1 Comparison of Control Points Matching

This experiment is conducted through a visual interface, where the user labels no fewer than five control points in the reference image, and the control point coherence search algorithm then finds the key points that match the control points in each image. The control point matching results for the two scenes are shown in Fig. 3, with the reference image on the left and the target image on the right in each pair.

3.2 Comparison of 3D Reconstruction

We import the results of control point matching and the pre-estimated camera matrix into RealityCapture for comparison experiments. The 3D reconstruction results of scene 1, scene 2, and scene 3 without the use of control points and pre-estimated camera matrices are shown in Fig. 4. We can observe that after the image alignment process, scene 1 is incorrectly divided into three components: component 0, component 1, and component 2. The point cloud result of component 0 is more complete, but the model shape is


very distorted, as shown in the red box marked in (a). Component 1 and component 2 are incomplete, where component 1 can be seen to represent the scene marked by the red circle in (b). Scene 2 has few features due to the scene itself (a white wall with reflections), resulting in the reconstructed model with the error marked by the red circle in (d). Scene 3 is also divided into two components. The point cloud in the area marked by the red circle in component 0 is sparse, and the windows have obvious holes. The model of component 2 is incomplete. As a conclusion, the reconstruction by RealityCapture for outdoor scenes with reflections has significant distortions. Figure 5 shows the reconstruction results after adding the control point matching and the pre-estimated camera matrix. We can see that the model is not incorrectly divided into several components, and many problems are significantly improved. The overall shape distortion of scene 1 is significantly improved, although there is a hole in the area shown in red. Scene 2’s hole is filled and the overall shape is better. The hole problem of scene 3 is also solved, and the overall result is improved to some extent. The experimental results show that the multi-view control points matching algorithm proposed in this paper is effective for the task of reconstructing outdoor scenes with reflections, and improves the effect of 3D models to a certain extent, mainly in terms of the increased number of point clouds, smoother planes and fewer voids, and less model distortion.

Fig. 3. Results of control point matching.

3.3 Comparison of Texture Remapping

We compare in Fig. 6 the original 3D model generated directly by RealityCapture with the one after texture remapping. We applied our texture remapping method


Fig. 4. The 3D reconstruction results of scene 1, scene 2, and scene 3 without the use of control points and pre-estimated camera matrices. (Color figure online)

on three scenes, and the results show that the model after texture remapping is more detailed and more realistic.

4 Limitations and Conclusion

In practice, our method would fail in some extreme cases, such as scenes with very complex reflection textures or forward-facing shots of fully transparent glass. Also, our control points matching algorithm suffers from computational inefficiency, so we plan to extend our work with GPU-accelerated algorithms. In summary, this paper proposes a novel 3D reconstruction pipeline for outdoor scenes with reflections consisting of three main tasks: a UAV path planning algorithm, a multi-view matching algorithm based on control points, and texture remapping of the built model. Our pipeline can handle various kinds of outdoor scenes with reflections while achieving better 3D reconstruction results than current state-of-the-art 3D reconstruction systems.


Fig. 5. The 3D reconstruction results of scene 1, scene 2, and scene 3 with the use of control points and pre-estimated camera matrices. (Color figure online)

Fig. 6. Comparison results of texture remapping.


References 1. Zheng, X., Ji, R., Chen, Y., et al.: Migo-nas: towards fast and generalizable neural architecture search. IEEE Trans. Pattern Anal. Mach. Intell. 43(9), 2936–2952 (2021) 2. Zhang, S., Jia, F., Wang, C., et al.: Targeted hyperparameter optimization with lexicographic preferences over multiple objectives. In: Proceedings of 18th International Conference on Learning Representations. Online (2023) 3. Karaman, S., Frazzoli, E.: Sampling-based algorithms for optimal motion planning. Int. J. Robot. Res. 30(7), 846–894 (2011) 4. Zhang, H., Yao, Y., Xie, K., et al.: Continuous aerial path planning for 3D urban scene reconstruction. ACM Trans. Graph. 40(6), 225 (2021) 5. Schonberger, J.L., Frahm, J.M.: Structure-from-motion revisited. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4104–4113 (2016) 6. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60, 91–110 (2004) 7. OpenCV. https://opencv.org 8. Yoo, J.C., Han, T.H.: Fast normalized cross-correlation. Circuits Syst. Sig. Process. 28, 819– 843 (2009) 9. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge (2003) 10. Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with. Commun. ACM 24, 381–395 (1981) 11. Hartley, R.I.: In defense of the eight-point algorithm. IEEE Trans. Pattern Anal. Mach. Intell. 19(6), 580–593 (1997) 12. Li, H., Hartley, R.: Five-point motion estimation made easy. In: Proceedings of 18th International Conference on Pattern Recognition, vol. 1, pp. 630–633 (2006) 13. Triggs, B., McLauchlan, P.F., Hartley, R.I., et al.: Bundle adjustment-a modern synthesis. In: Proceedings of Vision Algorithms: Theory and Practice: International Workshop on Vision Algorithms Corfu, Greece, 21–22 September 1999 14. Hartley, R.I., Sturm, P.: Triangulation. Comput. Vision Image Underst. 68(2), 146–157 (1997) 15. CapturingReality. https://www.capturingreality.com

MOVNG: Applied a Novel Sparse Fusion Representation into GTCN for Pan-Cancer Classification and Biomarker Identification

Xin Chen1, Yun Tie1(B), Fenghui Liu2, Dalong Zhang3, and Lin Qi1

1 School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, China

[email protected], {ieytie,ielqi}@zzu.edu.cn

2 Respiratory and Sleep Department, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
3 Department of Oncology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China

Abstract. Multi-omics data is used in oncology to improve cure rates, decrease mortality, and prevent disease progression. However, the heterogeneity, high dimensionality, and small sample sizes of this data pose challenges for modeling tumor-gene relationships. Current research focuses on inter-omics fusion while neglecting intra-omics fusion, which results in inadequate feature representation. Dimension reduction is also important to avoid overfitting due to the high dimensionality and limited samples in omics data fusion. Thus, we propose a novel sparse fusion representation method based on a VAE-NN network and apply it to GTCN to constitute a new model named MOVNG for pan-cancer classification and biomarker identification. The presented method implements inter- and intra-omics data fusion in a high-level feature space. At the same time, it can serve as a universal integration framework for heterogeneous data. Extensive experiments were conducted on our combined model, showing that the sparse fusion representation method has strong expressive ability and that the model achieves an average accuracy of 92.06 ± 1.50% on the 33-class tumor classification task. Finally, the model also identified some vital biomarkers based on the characteristics of pan-cancer.

Keywords: Multi-omics data · Pan-cancer classification · Biomarker identification · Graph tree convolution networks · Sparse fusion representation

1 Introduction

With the rapid development of biomolecular sequencing technology, various kinds of omics data have become easier to obtain than before [1]. Tumors form when genes divide abnormally, which is reflected in omics data where markers or gene sequences exceed normal levels. However, the relationship between single omics data and tumors is one-sided,

✩ The source code is available at https://github.com/cx-333/MOVNG.



partial, and incomplete [2]. Thus, multi-omics data fusion analysis can comprehensively map the relationship between tumors and biomolecular data, greatly improving diagnostic accuracy, cure rates, and prognosis prediction [3]. Unfortunately, the data to be analyzed commonly have high dimensionality (up to hundreds of thousands of features) and small sample sizes [4]. In machine and deep learning theory, when the number of parameters to be trained in a model greatly exceeds the number of samples, the model may over-fit [5]. To prevent this issue, a specific projection can be used to first map the high-dimensional features of the original data into a low-dimensional feature space before sending them to the model for training. Fakoor et al. [6] were the first to apply the principal component analysis (PCA) algorithm to map expression data to a low-dimensional space, which was then used to train a sparse autoencoder (SAE) model; experiments showed the effectiveness of the method. Moreover, in recent years, more sophisticated dimension reduction methods, such as kernel principal component analysis (kPCA) [7], locally linear embedding (LLE) [8], isometric mapping (ISOMAP) [9], locality preserving projections (LPP) [10] and the auto-encoder (AE) [11], were successively applied to the feature mapping of multi-omics data and achieved good results. However, the application of the above methods is mostly limited to single omics data. To analyze multi-omics data, it is crucial to consider both the high dimensionality of the features and the heterogeneity between different omics data [12]. Various methods have been proposed, such as using random forest (RF) [13] or multi-omics factor analysis (MOFA) [14] for integration and dimensionality reduction. The auto-encoder (AE) [15] framework, a hybrid feature selection algorithm [16], and the variational auto-encoder (VAE) [17, 18] have also been used to reduce feature dimension and improve tumor diagnosis and classification. Traditional algorithms [19–21] have also shown promising results. Several studies [22–25] have developed models that integrate multiple omics data to improve tumor classification, such as OmiVAE and convolutional neural networks (CNNs). However, the existing multi-omics fusion analysis methods suffer from insufficient fusion, and the performance of their corresponding models has room for further improvement.

To tackle the problems mentioned above, we propose a novel sparse fusion representation method for multi-omics data based on a variational auto-encoder and neural networks (VAE-NN), and combine it with graph tree convolution networks (GTCN) for pan-cancer classification and biomarker identification tasks. The main contributions of this paper are as follows:

1) A novel sparse fusion representation method is proposed by exploiting the intra-omics high-level features generated by the VAE and the inter-omics relationship matrix generated by the NN. The method can be generalized to any existing model similar to graph neural networks.
2) We develop a combined model named MOVNG that consists of the novel sparse fusion representation method and GTCN for pan-cancer classification and biomarker identification.
3) Extensive empirical experiments are conducted to show the superiority and robustness of the novel fusion representation method and the combined model. The desired performance gains are achieved.


2 Proposed Methodology

This section introduces the details of the novel sparse fusion representation method based on VAE-NN, the GTCN, and the combined model named MOVNG.

2.1 Combined Model Framework

Figure 1 shows the overall framework of the proposed MOVNG. The model training process is split into two stages. In the first stage, the multi-omics data is encoded to a suitable dimension and concatenated into new input data by the VAE, and a sparse matrix that captures the interaction information across the data is produced by the NN. In this stage, the VAE realizes intra-omics data fusion through its strong coding ability, while the NN completes inter-omics data fusion through its reasoning ability. In the second stage, the GTCN is trained to perform tumor classification using the new input data produced by stage one.

Fig. 1. The overall framework of MOVNG.

2.2 Sparse Fusion Representation Method

The sparse fusion representation method based on the VAE-NN network can not only integrate inter- and intra-omics data but also map the original low-level features into a high-level feature space so as to enhance the reliability of the signals. First of all, the VAE projects the input data to a high-level feature space thanks to its strong encoding ability and realizes intra-omics data fusion. Then, the NN mines the relationship between inter-omics data and the label by extracting the "native pattern" of the high-level features. Finally, the high-level features encoded by the VAE and the sparse matrix deduced by the NN form the novel sparse fusion representation. The Variational Auto-Encoder (VAE) [26] is a probabilistic inference model for incomplete data that learns the traits of the input data probabilistically and maps the original characteristics to a latent space. Therefore, the VAE has a strong ability to represent information, which allows high-dimensional features to be mapped into a low-dimensional space with as little distortion as possible.


Given multi-omics data sets $D = \{X_j^i\}_{i,j=1,1}^{N,Q}$, where N represents the number of samples and Q denotes the number of heterogeneous data types, $X_j \in \mathbb{R}^{P_j}$ refers to the j-th modality data and $P_j\ (j \in Q)$ stands for the dimension of its features. Suppose the probability distribution of $X_j$ is $p(X_j)$, where $X_j$ denotes the different input data, and $p(Z_j)$ represents the distribution of the latent variable $Z_j$ corresponding to $X_j$. Then, according to the Bayesian principle,

$$p(X_j) = \frac{p(X_j \mid Z_j)\, p(Z_j)}{p(Z_j \mid X_j)} = \frac{p(X_j, Z_j)}{p(Z_j \mid X_j)}.$$

On account of the intractability of the posterior distribution $p(Z_j \mid X_j)$, which denotes the probability that a sample generates a hidden variable, we use $q(Z_j)$ to approximate $p(Z_j \mid X_j)$, take the logarithm of the above equation, and compute its expectation:

$$\ln p(X_j) = \mathcal{L}(q_j) + \mathrm{KL}\big(q(Z_j)\,\|\,p(Z_j \mid X_j)\big) \qquad (1)$$

where $\mathcal{L}(q_j) = \int q(Z_j) \ln \frac{p(X_j, Z_j)}{q(Z_j)}\, dZ_j$ denotes the variational lower bound and $\mathrm{KL}\big(q(Z_j)\,\|\,p(Z_j \mid X_j)\big) = \int q(Z_j) \ln \frac{q(Z_j)}{p(Z_j \mid X_j)}\, dZ_j$ is the Kullback-Leibler (KL) divergence, which measures the discrepancy between the two distributions. The smaller the KL value, the smaller the difference between the objective functions. By minimizing $\mathrm{KL}\big(q(Z_j)\,\|\,p(Z_j \mid X_j)\big)$ we can obtain the approximate distribution $q(Z_j \mid X_j)$ of $p(Z_j \mid X_j)$. Finally, the loss function of the model is calculated as:

$$\mathrm{Loss}_{\mathrm{VAE}_j} = \operatorname*{arg\,min}_{q(Z_j \mid X_j)} \Big[ \mathrm{KL}\big(q(Z_j \mid X_j)\,\|\,p(Z_j \mid X_j)\big) + \mathcal{L}_{p(X_j)} \Big] \qquad (2)$$

where $\mathcal{L}_{p(X_j)} = \mathbb{E}_{q(Z_j)}\big[\ln \hat{p}(X_j) - \ln p(X_j)\big]$ denotes the reconstruction error, which is designed as a proper function such as the mean square error (MSE).

As the GTCN requires both input data and a corresponding adjacency matrix, the hidden-space data encoded by the VAE from the raw data cannot be sent to the GTCN directly, and the adjacency matrix corresponding to the hidden-space data needs to be calculated. The adjacency matrix reflects the interrelation between nodes and can fully capture the interaction information between data [27]. Here we use a neural network to learn this adjacency matrix. In this paper, each sample is regarded as a node, and the relationship between samples is determined by their labels. For example, given 5 samples with labels [0, 2, 3, 2, 0], the corresponding adjacency matrix is

$$\begin{bmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 \end{bmatrix}$$

It can be seen from this matrix that the first and the last sample are related, the second and the fourth sample are related, and the third sample is independent of the others.
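The label-driven adjacency construction in this example is straightforward to reproduce; the sketch below is a minimal illustration (the function name is ours) and prints exactly the matrix shown above.

```python
import numpy as np

def label_adjacency(labels):
    """Two samples are connected iff they share the same label; every sample
    is connected to itself, giving the matrix used to supervise the NN."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(int)

print(label_adjacency([0, 2, 3, 2, 0]))
# [[1 0 0 0 1]
#  [0 1 0 1 0]
#  [0 0 1 0 0]
#  [0 1 0 1 0]
#  [1 0 0 0 1]]
```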


Assume that the multi-omics data set $D = \{X_j^i, y_i\}_{i,j=1,1}^{N,Q}$ is output by the VAE network as latent-space vectors $Z \in \mathbb{R}^d$. A mapping function $f(\cdot) \rightarrow h$ then learns the relationship between the hidden-space vectors, the result is passed through the softmax function $h = \mathrm{softmax}(h)$, and finally an adjacency-matrix transformation function outputs the adjacency matrix of the hidden-space vectors. The loss function of the neural network adopts the cross-entropy function, namely:

$$L_{NN} = -\log \frac{\exp(h_y)}{\sum_{c=0}^{C-1} \exp(h_c)} \qquad (3)$$

where C is the number of classes and y is the label of the input sample.

2.3 Graph Tree Convolution Networks

The graph tree convolution network (GTCN) [28] performs message passing over the tree representation of a graph. Different from the message-passing schedule in traditional graph networks, the GTCN represents the graph with a tree structure that delivers information from the leaf nodes to the root node, and each node retains its initial information before receiving updates from neighboring nodes. The update of one node in GTCN is fulfilled by aggregating its original value and the renewed values of its neighboring nodes. Compared with the traditional GCN, GTCN can improve network performance by stacking layers without overfitting problems.

The graph topology $g(v, \varepsilon)$ consists of nodes $v$ and edges $\varepsilon$, which can be fully described by the adjacency matrix A and the degree matrix D. In addition, $|v| = N$ and $|\varepsilon| = M$ are the numbers of nodes and edges in the graph, respectively. $N_u$ denotes the set of direct (1-hop) neighbors of node u, and $X \in \mathbb{R}^{N \times D}$ represents the feature mapping over all nodes. $x_u \in \mathbb{R}^{1 \times D}$ is the D-dimensional feature vector of node u. $Y \in \mathbb{R}^{N \times C}$ is the class matrix of all nodes and $y_u \in \mathbb{R}^{1 \times C}$ represents the C-dimensional class vector of node u. Message passing in GTCN is calculated as:

$$h_u^{k} = \sum_{v \in N_u} \hat{A}_{uv}\, h_v^{k+1} + \hat{A}_{uu}\, Z_u \qquad (4)$$



Here, $\hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$ denotes the symmetric normalized adjacency matrix, where $\tilde{A} = A + I$ is the adjacency matrix with added self-loops, $I$ is the identity matrix, and $\tilde{D}$ is the corresponding degree matrix with $\tilde{D}_{uu} = \sum_v \tilde{A}_{uv}$; $Z_u = \mathrm{MLP}(x_u)$. With respect to the true labels $Y$ of the input data, the loss function is constructed as:

$$L_{GTCN} = -\log \frac{\exp(\hat{y}_{i, y_i})}{\sum_{c=0}^{C-1} \exp(\hat{y}_{i,c})} \qquad (5)$$

where $y_i$ denotes the true label of the $i$-th input sample and $\hat{y}_i$ is the $i$-th prediction vector obtained from the softmax output of the GTCN.
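As a concrete reading of Eq. (4), the dense-matrix sketch below propagates messages for a fixed number of levels of the tree representation. It is our illustrative NumPy rendering, not the authors' implementation, and it assumes A_hat already contains self-loops and symmetric normalization.

```python
import numpy as np

def gtcn_message_passing(A_hat, Z, num_levels):
    """h^{k}_u = sum_{v in N_u} A_hat[u,v] * h^{k+1}_v + A_hat[u,u] * Z_u
    A_hat: (N, N) normalized adjacency with self-loops.
    Z:     (N, F) node features produced by an MLP (Z_u = MLP(x_u))."""
    h = Z.copy()                                       # deepest level initialised with Z
    off_diag = A_hat - np.diag(np.diag(A_hat))         # neighbour part of A_hat
    self_loop = np.diag(A_hat)[:, None]                # A_hat[u,u] column vector
    for _ in range(num_levels):
        h = off_diag @ h + self_loop * Z               # every node keeps its own Z_u term
    return h
```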


3 Performance Evaluation

To clearly show the expressive ability of the sparse fusion representation method, we visualized the high-level features learned by the VAE together with those of similar methods and compared the classification performance of MOVNG with the latest multi-omics integration and classification models. We also demonstrated, through ablation studies, the effectiveness of the components of MOVNG that can be substituted by other methods. The network structure was implemented as shown in Fig. 1. The model optimized its parameters with the Adam [29] method, the learning rate was set to 1e−3, and the batch size was set to 4.

3.1 Dataset

In this paper, we obtained the original pan-cancer dataset, covering 33 tumor types with three omics data types (DNA methylation, RNA expression and protein expression), which can be downloaded from The Cancer Genome Atlas (TCGA, https://xenabrowser.net). Due to the presence of noise, biased signals, missing values, unmatched samples and unit values in the obtained datasets, it is essential to preprocess the input data before using it. For the DNA methylation data, we first aligned its probes with their counterparts in the human reference genome (hg38) and removed the probes that do not match hg38. For samples in which missing values (N/A) account for more than 10% of all probes, we adopted the SimFiller [30] method to fill the corresponding null values. Finally, the remaining NaN values in the samples were substituted by the mean value of the corresponding probes over all samples. For the RNA and protein expression data, the preliminary processing was the same as for the DNA methylation data, excluding alignment. Their values were then normalized to the interval [0, 1] by the max-min trick. To further validate the generalization ability of the model and its performance on multi-class and binary classification tasks, we also conduct experiments on other real-world multi-modal medical datasets. BRCA, for breast invasive carcinoma PAM50 subtype classification, contains 875 samples of 5 different classes and is a portion of the pan-cancer dataset from TCGA. ROSMAP, for Alzheimer's Disease diagnosis, includes 351 samples of 2 classes derived from the Religious Orders Study (ROS) and the Rush Memory and Aging Project (MAP).

3.2 Representation Ability

The heterogeneous data in our proposed VAE-NN-based sparse fusion representation consists of the high-level features and the fused sparse matrix. To demonstrate the effectiveness of our proposed method, we first compare the encoding abilities of the VAE, AE, PCA, LLE, LPP, and ISOMAP algorithms on the 33 types of tumor data, and present the results through the visualization shown in Fig. 2. Then we compare their performance on classification tasks when combined with the NN algorithm, using evaluation metrics such as ACC, Weighted F1 score and Recall [31]. From Fig. 4, it can be seen that, compared with the other dimension reduction algorithms, the VAE has better representation ability, resulting in better classification performance.


Fig. 2. 2-D embedding of the RPPA data learned by the proposed VAE and comparable methods. Samples of different tumor types are plotted in the colors shown in the legend2. The closer together samples of the same color are, the stronger the representation ability of the model. It can be seen that, compared with the other dimension reduction algorithms, the VAE has better representation ability.

Fig. 3. The comparison between the adjacency matrix learned by NN (b) and the true adjacency matrix (a) among the samples.

The encoded high-level features fused by the NN algorithm can effectively reflect their relationship with the corresponding categories. Figure 3 shows the comparison between the adjacency matrix learned by the NN and the true adjacency matrix among the samples, in which the red boxes mark sample categories that were not learned correctly. In practice, the true adjacency matrix cannot be obtained in advance.

2 The full names of the abbreviations in the figure can be found on the website (https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/tcga-study-abbreviations).


Fig. 4. The comparison of different dimensionality reduction algorithms combined with the NN algorithm and GTCN network in terms of their classification performance.

Fig. 5. The comparison of classification performance of different graph neural network models combining two fusion methods, respectively.

3.3 Ablation Experiment

In this paper, we propose the MOVNG model to achieve the tasks of pan-cancer classification and biomarker identification. The MOVNG model consists of a sparse fusion representation based on the VAE-NN algorithm and the GTCN network. To verify the classification and generalization ability of the joint model MOVNG, we replace the NN algorithm and the GTCN network with models that achieve the same function and conduct pan-cancer classification, specific tumor subtype classification, and disease diagnosis tests using these variants. Figure 5 shows the classification performance of the model when the SNF fusion algorithm replaces the NN algorithm. It can be seen that MOVNG has better classification


performance. SNF integrates data in a "flat" manner, which can lead to an imbalance in the input data fusion and thereby affect the model performance. Compared with ordinary graph neural network models, the GTCN network can improve model performance by increasing the number of network layers, whereas increasing the number of layers of ordinary network models can adversely affect their performance; therefore, its performance is higher than that of other models of the same kind. GTAN has characteristics similar to GTCN, but GTAN analyzes data through attention, so the performance of the GTCN and GTAN models is not significantly different.

Table 1. The comparison of pan-cancer classification performance between MOVNG and currently popular classification models.

| Method | Accuracy | Weighted F1 score | Weighted Recall |
|---|---|---|---|
| VAE + LR | 86.23 ± 1.63% | 85.98 ± 1.61% | 85.43 ± 1.45% |
| VAE + SVM | 87.54 ± 1.85% | 86.23 ± 1.59% | 88.76 ± 2.41% |
| VAE + MLP | 86.05 ± 1.34% | 88.26 ± 2.53% | 82.43 ± 2.55% |
| MOGONET [25] | 90.04 ± 2.50% | 89.25 ± 1.64% | 89.65 ± 2.23% |
| MOVNG | 92.06 ± 1.50% | 91.21 ± 1.40% | 89.55 ± 2.10% |

Table 2. The comparison of BRCA subtype classification performance between MOVNG and currently popular classification models.

| Method | Accuracy | Weighted F1 score | Weighted Recall |
|---|---|---|---|
| VAE + LR | 80.23 ± 1.63% | 78.98 ± 1.61% | 80.43 ± 1.45% |
| VAE + SVM | 79.54 ± 1.85% | 80.23 ± 1.59% | 81.76 ± 2.41% |
| VAE + MLP | 81.05 ± 1.34% | 80.26 ± 2.53% | 82.43 ± 2.55% |
| MOGONET [25] | 82.90 ± 1.80% | 82.50 ± 1.60% | 80.65 ± 1.23% |
| MOVNG | 83.06 ± 1.05% | 82.71 ± 1.40% | 82.25 ± 2.01% |

Table 1 shows the comparison results of the classification performance between MOVNG and currently popular models. Although MOGONET outperforms the proposed model in terms of the recall metric, its standard deviation is larger and it is not as stable as the model proposed in this paper. Table 2 and Table 3 demonstrate the generalization ability of MOVNG, showing that the proposed model exhibits superior subtype and binary classification performance when applied to specific disease datasets.

3.4 Biomarker Identification

According to the first-layer weights of the VAE trained on the pan-cancer dataset, we can obtain the rankings of the vital biomarkers identified by MOVNG. In addition, the number of


Table 3. The comparison of ROSMAP binary classification performance between MOVNG and currently popular classification models.

| Method | Accuracy | Weighted F1 score | Weighted Recall |
|---|---|---|---|
| VAE + LR | 79.23 ± 1.13% | 78.98 ± 1.41% | 80.43 ± 1.15% |
| VAE + SVM | 80.14 ± 1.05% | 81.23 ± 1.89% | 80.76 ± 1.51% |
| VAE + MLP | 80.05 ± 1.14% | 81.26 ± 2.03% | 80.43 ± 2.05% |
| MOGONET [25] | 81.50 ± 2.00% | 82.10 ± 1.24% | 82.65 ± 2.21% |
| MOVNG | 86.57 ± 1.10% | 86.22 ± 1.40% | 2.00% |

top influential genes can be selected on the basis of first-layer weights. In the paper, we set top-n equal to 10. The top-n genes discerned by MOVNG are shown in Table 4. Note that these vital genes target all types of tumor. For example, the abnormal value of ACVRL1 may give rise to foreboding of suffering from one of 33 tumors. Table 4. Important genes of multi-omics data identified by sorting the first-layer weights of trained VAE. Omics Data Type

| Omics Data Type | Identified Biomarkers |
|---|---|
| DNA Methylation | TMC4, HYAL2, TTC15, GPR37L1, OR1J4, ATP10B, TMEM207, CDH26, MT1DP, AGA |
| RNA Expression | NPNT, CDK18, APLN, SYTL1, ARRDC2, DSG1, UGT8, SOX10, PI3, SERPINB5 |
| Protein Expression | ACVRL1, RAD51, BRCA2, MSH6, CDK1, BID, ATM, SMAD4, IGFBP2, BCL2 |

4 Conclusion and Future Work

Accurate classification and biomarker identification of different kinds of tumors is of great significance for the rapid development of treatment plans and for guiding the relevant doctors. However, the extremely high feature dimensionality of multi-omics data makes the integrated analysis of various types of genes complicated; even seemingly simple tasks such as tumor classification are difficult to accomplish. This work proposes a novel sparse fusion representation method and combines it with GTCN into a model named MOVNG to fulfill pan-cancer classification and biomarker identification. Experimental results on a dataset of 33 tumor types with 3 omics data types demonstrate that the sparse fusion representation method in MOVNG achieves significantly better classification accuracy than other widely used peers.


In future work, first, we plan to design a better fusion method that can be learned by the network to substitute the NN algorithm, so as to provide higher representation ability. Second, we plan to design an improved VAE to optimize the high-level feature space of the multi-omics data. Finally, we will further investigate methods to achieve better classification performance.

References 1. Berger, B., Peng, J., Singh, M.: Computational solutions for omics data. Nature Reviews Genetics 2. Vasaikar, S.V., Peter, S., Jing, W., Bing, Z.: LinkedOmics: analyzing multi-omics data within and across 32 cancer types. Nucleic Acids Res. D1, D1 (2017) 3. Huang, S., Kumardeep, C., Garmire, L.X.: More is better: recent progress in multi-omics data integration methods. Front. Genet. 8, 84 (2017) 4. Huang, Z., Zhan, X., Xiang, S., Johnson, T.S., Huang, K.: Salmon: Survival analysis learning with multi-omics neural networks on breast cancer. Front. Genet. 10, 166 (2019) 5. Chkifa, A., Cohen, A., Schwab, C.: Breaking the curse of dimensionality in sparse polynomial approximation of parametric pdes. Journal de Math ematiques Pures et Appliqu´ees, 103(2), 400–428 (2015) 6. Gupta, P., Malhi, A.K.: Using deep learning to enhance head and neck cancer diagnosis and classification. In: 2018 IEEE International Conference on System, Computation, Automation and Networking (ic-scan) (2018) 7. Speicher, N.K., Pfeifer, N.: Towards multiple kernel principal component analysis for integrative analysis of tumor samples. J. Integr. Bioinform. 14(2), 20170019 (2017) 8. Xu, J., Mu, H., Yun, W., Huang, F.: Feature genes selection using supervised locally linear embedding and correlation coefficient for microarray classification. In: Computational and Mathematical Methods in Medicine, 2018, (2018–1–31), vol. 2018, pp. 1–11 (2018) 9. Wu, Y., Ji, R., Ge, M., Shi, S.: Classification of tumor gene expression data based on manifold learning and gaussian process. In: 2019 12th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) (2019) 10. Liu, H., Zhang, Z., Xin, Z., Yang, Y., Zhang, C.: Dimensionality reduction for identification of hepatic tumor samples based on terahertz time-domain spectroscopy. IEEE Trans. Terahertz Sci. Technol. PP(99), 1–7 (2018) 11. Zhong, Y., Jia, S., Hu, Y.: Denoising auto-encoder network combined classification module for brain tumors detection. In: 2022 3rd International Conference on Electronic Communication and Artificial Intelligence (IWECAI) 12. Picard, M., Scott-Boyer, M.P., Bodein, A., P´erin, O., Droit, A.: Integration strategies of multi-omics data for machine learning analysis. Comput. Struct. Biotechnol. J. 19, 3735–3746 (2021) 13. Acharjee, A., Kloosterman, B., Visser, R., Maliepaard, C.: Integration of multi-omics data for prediction of phenotypic traits using random forest. BMC Bioinformatics 17(5), 180 (2016) 14. Argelaguet, R., et al.: Multi-omics factor analysis—a framework for unsupervised integration of multi-omics data sets. Mol. Syst. Biol. 14(6), e8124 (2018) 15. Chaudhary, K., Poirion, O.B., Lu, L., Garmire, L.X.: Deep learning based multi-omics integration robustly predicts survival in liver cancer. Clin. Cancer Res. 24(6), 1248–1259 (2017) 16. Wiel, M., Lien, T.G., Verlaat, W., Wieringen, W., Wilting, S.M.: Better prediction by use of co-data: adaptive group-regularized ridge regression. Stat. Med. 35(3), 368–381 (2016)


17. Huda, S., Yearwood, J., Jelinek, H.F., Hassan, M.M., Fortino, G., Buckland, M.: A hybrid feature selection with ensemble classification for imbalanced healthcare data: a case study for brain tumor diagnosis. IEEE Access PP, 1 (2016) 18. Way, G.P., Greene, C.S.: Extracting a biologically relevant latent space from cancer transcriptomes with variational autoencoders, pp. 80–91 (2018) 19. Titus, A.J., Bobak, C.A., Christensen, B.C.: A new dimension of breast cancer epigenetics applications of variational autoencoders with dna methylation. In: International Conference on Bioinformatics Models (2018) 20. Wang, Z., Wang, Y.: Exploring dna methylation data of lung cancer samples with variational autoencoders. In: 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2019) 21. Li, Y., Kang, K., Krahn, J.M., Croutwater, N., Kevin, L.: A comprehensive genomic pancancer classification using the cancer genome atlas gene expression data. BMC Genomics (2017) 22. Zhang, X., Zhang, J., Sun, K., Yang, X., Dai, C., Guo, Y.: Integrated multi-omics analysis using variational autoencoders: Application to pan-cancer classification (2019) 23. Lyu, B., Haque, A.: Deep learning based tumor type classification using gene expression data (2018) 24. Rhee, S., Seo, S., Kim, S.: Hybrid approach of relation network and localized graph convolutional filtering for breast cancer subtype classification (2017) 25. Wang, T., et al.: Mogonet integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat. Commun. 12(1), 3445 (2021) 26. Mahmud, M.S., Huang, J.Z., Fu, X.: Variational autoencoder-based dimensionality reduction for high-dimensional small-sample data classification. In: International Journal of Computational Intelligence and Applications, no. 1, p. 2050002 (2020) 27. Shi, Q., et al.: Pattern fusion analysis by adaptive alignment of multiple heterogeneous omics data. Bioinformatics 33(17), 2706–2714 (2017) 28. Wu, N., Wang, C.: GTNet: A tree-based deep graph learning architecture (2022) 29. Kingma, D., Ba, J.: Adam: A method for stochastic optimization. Computer Science (2014) 30. Rehman, F.U., Abbas, M., Murtaza, S., Butt, W.H., Rehman, S.: Similarity-based missing values filling algorithm. In: IEEE Thirteenth International Conference on Digital Information Management (ICDIM) (2018) 31. Hossin, M., Sulaiman, M.N.: A review on evaluation metrics for data classification evaluations. Int. J. Data Mining Knowl. Manag. Process 5(2), 1 (2015)

A Blockchain-Based Network Alignment System for Power Equipment Data Inconsistency

Yuxiang Cai1,2, Xin Jiang2, Qifan Yang3, Wenhao Zhao4(B), and Chen Lin4

1 School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
2 State Grid Fujian Information and Telecommunication Company, Fuzhou 350013, China
3 State Grid Fujian Electric Power Big Data Center, Fuzhou 350013, China
4 School of Informatics, Xiamen University, Xiamen 361000, China

[email protected]

Abstract. This paper resolves the problem of data inconsistencies and redundancies in the current equipment management system of State Grid Corporation of China (SGCC). We propose a blockchain-based network alignment system (BNAS). Our system offers three key features: fine-grained access control, efficient equipment data synchronization, and retrieval of similar equipment data, improving data management and accuracy during data updates. Specifically, the proposed system utilizes blockchain to connect multiple equipment data sources and employs representation learning techniques to match similar items. An efficient compression algorithm that utilizes composite hashing is proposed to improve the synchronization speed and reduce the storage burden of blockchain by approximately 43%. Moreover, we introduce HAC (Hierarchy Access Control) Tree, a novel data structure that mimics SGCC’s affiliations to manage staff’s permission to update equipment data. The evaluation results demonstrate BNAS’s efficiency in dealing with power equipment data inconsistencies. Keywords: Blockchain · Access Control · Data Inconsistency · Network Alignment · Representative Learning

1 Introduction

As one of the largest power companies in the world, the State Grid Corporation of China (SGCC) constructs and operates China's power infrastructure. SGCC has deployed numerous hardware and software power equipment items to maintain the infrastructure. These equipment items are generally managed by multiple departments, each using an independent data source to store the equipment description, referred to as "equipment data" in this paper. It is important to develop efficient and reliable equipment data management. However, the problem of data inconsistency hinders equipment maintenance and statistical analysis for SGCC. The data inconsistency problem means that equipment data from different sources overlap and vary greatly, and that equipment items have different values in each data source. For the current power equipment management system, the main reasons for data inconsistency are as follows:


R1. Absence of data matching ability. When staff members add or modify equipment data, the current system cannot retrieve similar records from multiple independent data sources, leading to redundant data entries for the same equipment.

R2. Loose access control and permission checks. When staff members request to update equipment data, the current system does not strictly authenticate their permissions. Thus, everyone can change the data, leading to a higher probability of inaccuracies in equipment data entry.

R3. Lack of update record traceability. When staff members add or modify equipment data incorrectly, the current system cannot trace update records, making it difficult to identify the source of errors and hold the counterpart accountable.

As a decentralized and distributed system, blockchain [1] can store data immutably and traceably. It has been applied in various scenarios, such as Supply Chain [2], Copyright Protection [3, 4], and Internet of Things [5]. Current research has demonstrated that access control schemes based on blockchain technology are more secure than traditional schemes. Therefore, we propose BNAS, a blockchain-based network alignment system that addresses inconsistencies in power equipment data. However, there are several challenges that we must overcome in our work. Firstly, missing and erroneous fields are common in equipment data records, and traditional text-matching algorithms may not work on such data; thus, an effective matching scheme is required. Secondly, the synchronization process of transferring equipment data to the blockchain can be time-consuming due to the vast number of fields and the extensive content, which may cause a significant storage burden. Thirdly, the current system lacks strict access control and fine-grained permission authorization when employees update equipment data, leading to management difficulties. Our proposed system includes three components: access control, equipment data synchronization, and similar equipment data retrieval, addressing the above issues with the following contributions.

(1) The proposed system utilizes blockchain to enable secure data sharing across multiple data sources. Additionally, the system employs representation learning techniques to match similar equipment data records, even with missing or incorrect fields.
(2) This paper presents an equipment data compression algorithm that utilizes composite hashing. The proposed algorithm preserves the discriminative features of data records, reduces the volume of data to be synchronized, accelerates data synchronization to the blockchain, and improves the utilization rate of blockchain storage.
(3) The hierarchical access control tree (HAC Tree), a novel tree-like data structure, is proposed to limit the scope of permission for data updates and provide more fine-grained access control than the current system.

2 Related Work Network Alignment is a technique that aims to predict the strength of association between pairs of nodes in different networks. For instance, [6] aligns protein networks based on their neighborhood topology, while [7] relies on a low-rank pairwise algorithm for few-shot fine-grained image classification. [8] optimizes IsoRank by adopting spectral alignment clusters, and [9] utilizes graph eigenvalue decomposition to identify correspondences between two networks. Another type of network alignment method


involves building a graph network. For instance, [10] learns continuous feature representations for nodes. [11] designs multi-level graph convolutions on both local network structure and hypergraph structure. [12] presents cross-lingual KG alignment. Data Compression Algorithms can be categorized into two types. Lossless compression algorithms allow the restoration of original data after compression. RLE (RunLength Encoding) [13] counts frequently occurring symbols and compresses the original data into pairs of frequency and symbol. Huffman coding [14] assigns weights to characters based on their frequency and builds a binary tree from weights. Shannon Fano coding [15] creates binary trees from the top down rather than the bottom up, as in Huffman. Lossy algorithms cannot be fully restored to the original data, e.g., the SimHash algorithm [16]. Lossy compression techniques such as JPEG, JPG are suitable for media files. Access Control involves authenticating and ensuring the trustworthiness of users accessing data in a system. Blockchain inherently possesses decentralized features and maintains data immutability and traceability. Research on existing blockchain consensus algorithms, as outlined in [17], divides access control procedures into policy formulation, storage, and request processing. [18] presents a cloud-based data access control mechanism that eliminates the need for provider participation and leverages dynamic attribute-based encryption with a ciphertext-policy structure. A trustworthy scheme [19] provides attribute-based secure data sharing with integrity auditing.

3 System Design

3.1 Overall System Architecture


Fig. 1. Overall architecture of the proposed network alignment system


The overall architecture of the proposed BNAS system is illustrated in Fig. 1. BNAS interacts with multiple independent data sources as a standalone module. It consists of three components: access control, similar equipment data retrieval, and real-time data synchronization. The functions of these components are summarized as follows:

Access Control. The access control component is based on the HAC Tree. When a staff member requests to update equipment data in a specific data source, the system combines the staff ID with a fixed prefix as the key and then searches the HAC Tree for the corresponding authorized node. The key-value indexing scheme ensures efficient search operations. If a valid authorized node is found, the staff ID and the modified record are transmitted to the similar equipment data retrieval component.

Similar Equipment Data Retrieval. The similar equipment data retrieval component is responsible for finding similar equipment data records using a pre-trained offline model. This model is updated along with the equipment data. Firstly, the component searches the blockchain storage using the model to retrieve the top K data records similar to the new record. The staff member can then review his/her update request based on the similar records. If the staff member confirms the update operation, the staff ID and the new record are sent to the data synchronization component.

Equipment Data Synchronization. The task of the equipment data synchronization component is to integrate equipment data from independent sources and securely store it in the blockchain. In order to ensure data security and improve synchronization speed, this paper proposes an algorithm based on composite hashing to compress equipment data. The compressed data is then synchronized to the blockchain.

3.2 Access Control via HAC Tree

A fine-grained access control scheme is necessary to limit the scope of staff's permission to update equipment data. We consider three main operations involved in access control in the real world:

1. Access retrieval, which means checking whether someone has permission to update equipment data.
2. Direct access authorization, which means a department grants one of its members the update permission directly.
3. Delegate access authorization, which means a department authorizes its sub-department to act as a delegate, which can then grant update permission to its members.

In order to ease the above operations in our system, we designed a hierarchical tree-like data structure named the HAC Tree to manage all permission-related information for power equipment data. This data structure simplifies the authorization process to a set of operations on the HAC Tree, which will be elaborated on next.

HAC Tree. As shown in Figure 2, the HAC Tree consists of multiple HAC nodes and edges. A HAC node represents a staff member involved in the authorization, and a directed edge denotes the relationship between the two parties involved. An event of authorization corresponds to a directed edge from a parent node to a child node. The node issuing the access permission is the parent node, and the node receiving the permission is the child node.
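To make the three operations concrete, the following is a hypothetical in-memory sketch of a HAC Tree; the field names, the key-value layout, and the helper functions are illustrative placeholders, not the authors' on-chain schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

ID_PREFIX = "hac:"                      # fixed node-ID prefix (illustrative)

@dataclass
class HACNode:
    staff_id: str
    node_type: str                      # "genesis" | "delegate" | "authorized"
    parent: Optional[str] = None        # ID of the parent node
    children: List[str] = field(default_factory=list)
    remaining_auths: Optional[int] = None   # None = unlimited (genesis node)
    operation_counter: int = 0

store: Dict[str, HACNode] = {}          # stands in for the blockchain key-value storage

def node_id(staff_id: str) -> str:
    """A HAC node's key is the fixed prefix plus the staff ID."""
    return ID_PREFIX + staff_id

def authorize(parent_staff: str, child_staff: str, delegate: bool, quota: int = 0):
    """Direct authorization creates an authorized leaf; delegate authorization creates
    a delegate node with a limited quota not exceeding the parent's remaining quota."""
    parent = store[node_id(parent_staff)]
    if parent.node_type == "authorized":
        raise PermissionError("authorized (leaf) nodes cannot grant permissions")
    if parent.remaining_auths is not None:
        if parent.remaining_auths <= 0:
            raise PermissionError("no authorizations left")
        if delegate and quota > parent.remaining_auths:
            raise ValueError("delegated quota cannot exceed the parent's remaining quota")
        parent.remaining_auths -= 1
    child = HACNode(child_staff, "delegate" if delegate else "authorized",
                    parent=node_id(parent_staff),
                    remaining_auths=quota if delegate else 0)
    store[node_id(child_staff)] = child
    parent.children.append(node_id(child_staff))

def can_update(staff_id: str) -> bool:
    """Access retrieval: look the node up by its prefixed staff ID; only a valid
    authorized node grants the right to update equipment data."""
    node = store.get(node_id(staff_id))
    return node is not None and node.node_type == "authorized"

# Usage: bootstrap the genesis node, then delegate and authorize (hypothetical IDs).
store[node_id("sys-manager")] = HACNode("sys-manager", "genesis")
authorize("sys-manager", "dept-A", delegate=True, quota=5)
authorize("dept-A", "staff-42", delegate=False)
print(can_update("staff-42"))   # True
```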

Fig. 2. Hierarchy access control tree

Types of HAC Nodes: There are three types of HAC nodes: the genesis node, the delegate node, and the authorized node, representing three different roles participating in the access control process of equipment data.

• The genesis node is the unique root of the HAC Tree. It is bound to the BNAS system manager and is automatically created when the system is deployed. The genesis node's staff ID field is pre-written in the configuration file, just like the genesis block in the blockchain, so we named it the genesis node. The genesis node can perform two types of authorization without restrictions: granting update permission to a specific staff member (i.e., direct authorization) and delegating the granting authority to a sub-department (i.e., delegate authorization).
• A delegate node is created when a delegate authorization is issued. It should be noted that not only the genesis node but also existing delegate nodes can create new delegate nodes. The main difference between them is that the former can initiate authorizations without any restriction on the number of times. In contrast, the latter can only issue a limited number of authorizations, determined by its parent and not more than the number of available authorizations that its parent can issue.
• An authorized node is created when a direct authorization is issued, representing a leaf node in the HAC Tree. Although it cannot generate child nodes, the corresponding staff member can now update the equipment data.

Each type of HAC node typically consists of three kinds of data. The first is the information about its authorization capability, for example, the number of authorizations it can execute. The second is the information about the authorization details, such as the public key of the authorized staff member, the authorization time, and the current valid state. The third is the meta-data, such as the operation counter, the ID of its parent node, the current node, and its child nodes. As illustrated in Fig. 2, a HAC node's ID comprises two parts, a fixed ID prefix and the staff ID. Each HAC node is stored in the key-value storage of the blockchain with its ID as the key index.

Authorization Token. Authentication is required for each manipulation on the HAC Tree to prevent malicious attacks. Our system implements this authentication using an authorization token (AuthToken). The process of creating and verifying the authorization token is illustrated in Figure 3.
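A minimal sketch of how such a node could be represented follows. The field names simply mirror the three kinds of data described above (capability, authorization details, meta-data) and are assumptions for illustration.

```python
# Sketch of a HAC node as it could be stored in the blockchain K-V storage.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class HACNode:
    # Authorization capability
    node_type: str                  # "genesis", "delegate" or "authorized"
    remaining_auths: Optional[int]  # None for the genesis node (unlimited)
    # Authorization details
    staff_public_key: bytes
    authorized_at: float
    valid: bool = True
    # Meta-data
    operation_counter: int = 0
    parent_id: Optional[str] = None
    child_ids: List[str] = field(default_factory=list)

    def node_id(self, prefix: str, staff_id: str) -> str:
        """A node's key is the fixed prefix concatenated with the staff ID."""
        return prefix + staff_id

node = HACNode(node_type="authorized", remaining_auths=0,
               staff_public_key=b"...", authorized_at=0.0, parent_id="HAC_A")
print(node.node_id("HAC_", "A1"))   # "HAC_A1"
```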


Fig. 3. Basic creating and checking process of an authorization token

A public-private key pair must be prepared when a HAC node is created. The private key is secretly kept by the staff member associated with the HAC node, and the public key is stored in the node for future use. When a staff member wants to manipulate the associated HAC node, he/she needs to sign on the node ID and the operation counter with his/her private key and generate a signature (the authorization token). The token is then submitted to the system and verified using the staff member’s pre-stored public key. If the verification is successful, the staff member is allowed to continue his/her manipulation on the HAC node. Accordingly, the operation counter of the node is incremented, then the authorization token expires consequently. An Example of Authorization Process via HAC Tree. The HAC Tree mimics the hierarchical structure of SGCC, in which the genesis node is linked to the BNAS system manager. The system manager creates delegate nodes for second-tier departments, and similarly, second-tier department managers can create authorized nodes for their members and delegate nodes for third-tier departments. To elaborate on the functioning of the HAC Tree in the access control process, we present an example (see Fig. 4) that illustrates a series of authorization operations performed on the HAC Tree.
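The sign-and-verify flow of Fig. 3 can be sketched as follows. The choice of Ed25519 via the `cryptography` package and the "nodeID|counter" message layout are illustrative assumptions; the paper does not specify the signature scheme.

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

def make_auth_token(private_key: Ed25519PrivateKey, node_id: str, op_counter: int) -> bytes:
    """Sign the node ID together with the current operation counter."""
    return private_key.sign(f"{node_id}|{op_counter}".encode())

def verify_auth_token(public_key, token: bytes, node_id: str, op_counter: int) -> bool:
    """Check the token against the public key stored in the HAC node."""
    try:
        public_key.verify(token, f"{node_id}|{op_counter}".encode())
        return True
    except InvalidSignature:
        return False

# The operation counter is incremented after a successful manipulation,
# so a previously issued token no longer verifies.
sk = Ed25519PrivateKey.generate()
pk = sk.public_key()
token = make_auth_token(sk, "HAC_A1", 0)
print(verify_auth_token(pk, token, "HAC_A1", 0))  # True
print(verify_auth_token(pk, token, "HAC_A1", 1))  # False once the counter advanced
```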

Fig. 4. An example of authorization process

1) At stage T1, the BNAS system is deployed and the genesis node node_M bound to the system manager M is automatically created using M's public key PublicKey_M. M's private key PrivateKey_M is used for generating the authorization token AuthToken_M of


node_M. The operation counter of node_M is set to 0, and the number of authorizations the genesis node can issue is not limited.
2) At stage T2, M intends to delegate the authorization rights to departments A and B, with a cap of 10 authorizations per department. M first sends AuthToken_M securely to A, then A can manipulate node_M to create a new delegate node node_A with his/her public-private key pair and AuthToken_M. When this is done, the operation counter of node_M is incremented, and AuthToken_M expires. Therefore, M has to generate a new authorization token AuthToken*_M and transfer it to B, then B can follow the above steps and create a new delegate node node_B using AuthToken*_M.
3) At stage T3, department A wants to grant the update permission to its staff member A1. A first generates an authorization token AuthToken_A and sends it to A1, and A1 can use AuthToken_A and his/her key pair to generate an authorized node node_A1. When this is finished, the number of authorizations that department A can issue is decremented.
4) At stage T4, department B wants to grant its sub-department C the update permission and sets the number of authorizations that C can issue to 7. B generates an authorization token AuthToken_B, then C uses it to generate another delegate node node_C. After this, the number of authorizations B can issue decreases by 7.

So far, staff member A1 has been granted permission to update equipment data, and departments A, B, and C can continue to grant the update permission according to their remaining authorization times.

3.3 Data Compression Algorithm Based on Composite Hashing

Blockchain acts as a secure and shared platform for multiple independent data sources. Each new equipment data record is synchronized to the blockchain in real time for future use. However, due to the blockchain's distributed nature, data synchronization from multiple data sources to the blockchain may take considerable time, which is a significant bottleneck for our system. To overcome this limitation, we present a compression algorithm based on composite hashing to improve synchronization efficiency, which maps complex data to fixed-length bit sequences before the synchronization. The offline learning model for the similar equipment data retrieval component is trained on compressed data. Therefore, the compression algorithm should preserve the discriminative features of the equipment data fields to ensure the offline model's accuracy.

Fig. 5. The framework of data compression algorithm based on composite hashing


The framework of the proposed data compression algorithm is shown in Fig. 5. The SHA-256 algorithm is used to compress the fields that staff members will never update, while the SimHash algorithm is used for fields that will be frequently updated. This design choice is based on the following reasons:
1. The time complexity of the SHA-256 algorithm is lower than that of SimHash. Thus, for the fields that will never be updated, SHA-256 brings less time cost.
2. The SHA-256 algorithm is input-sensitive, while the SimHash algorithm is not. In other words, the SimHash algorithm will generate similar hash values for similar text, preserving the discriminability of fields.

The proposed compression algorithm groups fields into two categories: Group A, containing fields that the staff member will never update, and Group B, containing fields that will be updated. For Group A, the algorithm applies the SHA-256 algorithm to map field content to a 256-bit hash value, which is then truncated to 64 bits. This truncation reduces the amount of data to be synchronized. For Group B, the algorithm maps field content directly to a 64-bit hash value using the SimHash algorithm, which is used as the final compressed data.

3.4 Similar Equipment Data Retrieval Based on Representative Learning

Network alignment is a crucial approach to integrating data from diverse sources. One of the prevailing methods for aligning heterogeneous networks is using representation learning technology to extract representations and build matching strategies. Besides, representation learning still works even in the case of missing fields in data records. The proposed data compression algorithm preserves the differences of various fields in equipment data records, making it possible to train offline representation models from compressed equipment data. In the proposed BNAS system, new equipment data records are compressed first, and then the offline model takes the compressed data as input to generate the top K similar equipment data records. Since this paper does not focus extensively on offline model building, we omit the model details.
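The following Python sketch illustrates the grouping just described: SHA-256 truncated to 64 bits for never-updated fields (Group A) and a 64-bit SimHash for updatable fields (Group B). The whitespace tokenization and the MD5-based token hashing inside the SimHash are assumptions for illustration; the paper does not fix these details.

```python
import hashlib

def sha256_64(text: str) -> int:
    """Group A: full SHA-256, truncated to the first 64 bits."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def simhash_64(text: str) -> int:
    """Group B: 64-bit SimHash, so similar field values stay close in Hamming distance."""
    weights = [0] * 64
    for token in text.split():
        h = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")
        for bit in range(64):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(64) if weights[bit] > 0)

def compress_record(record: dict, group_a: set) -> dict:
    """Compress each field with SHA-256 (never-updated) or SimHash (updatable)."""
    return {k: (sha256_64(v) if k in group_a else simhash_64(v)) for k, v in record.items()}

# Example usage with hypothetical field names.
record = {"device_id": "SW-0931", "location": "substation 7 feeder 3"}
print(compress_record(record, group_a={"device_id"}))
```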

4 Evaluation

4.1 Performance Evaluation of Data Compression Algorithm

To evaluate the impact of the data compression algorithm on storage performance and synchronization efficiency, we performed experiments on a real dataset provided by SGCC. The dataset contains 5848 records, each with 216 attributes, involving various types of equipment such as database, middleware, network switch, virtual machine, and web server. The statistical information of this dataset is shown in Table 1. The evaluation metrics Rate_dec (the decrease rate of data size) and Rate_inc (the increase rate of storage utilization) are defined as Eqs. (1) and (2), where X_a denotes the data size after compression, and X_b denotes the data size before compression.

Rate_dec = (X_a / X_b) × 100%,   Rate_dec ∈ [0, 1]   (1)

Table 1. Dataset statistics

Statistics        Database   Middle-ware   Switch    VM        Server    Others    Total
Proportion (%)    7.06       10.09         13.47     42.82     22.06     4.50      100.00
Null rate (%)     59.85      59.76         33.60     46.75     48.09     36.61     84.23
Quantity          413        590           788       2504      1290      263       5848
Data size (KB)    413.00     223.40        765.65    965.54    1112.59   21.76     3501.94
Text length       78181      107104        394382    475470    569742    111787    1736666
Avg length        9.070      8.670         8.854     8.697     8.180     11.033    8.718
Max length        127        358           129       363       329       1040      1040

Rate_inc = ((X_b − X_a) / X_b) × 100%,   Rate_inc ∈ [0, 1]   (2)
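As a quick sanity check on the two metrics, the following sketch recomputes them for the 100-record row of Table 2; the input numbers are taken directly from that table.

```python
# Recompute Eq. (1) and Eq. (2) for the 100-record row of Table 2.
x_b, x_a = 38.290, 16.952            # data size before / after compression (KB)

rate_dec = x_a / x_b * 100           # Eq. (1)
rate_inc = (x_b - x_a) / x_b * 100   # Eq. (2)

print(f"Rate_dec = {rate_dec:.3f}%")  # ~44.27, matching the 'Reduction' column
print(f"Rate_inc = {rate_inc:.3f}%")  # ~55.73, matching the 'Optimization' column
```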

Table 2 illustrates the synchronization performance for record quantities ranging from 100 to 5000. The results indicate that the proposed compression algorithm can help to reduce the amount of data synchronized by approximately 43%. Additionally, the utilization of blockchain storage space increases by around 55%. That is to say, using composite hashing effectively reduces the amount of synchronized data and alleviates the storage burden on the blockchain.

Table 2. Comparison results of data size to be synchronized

Records   Data size before (KB)   Data size after (KB)   Reduction^a (%)   Optimization^b (%)
100       38.290                  16.952                 44.273            55.727
200       76.398                  33.472                 43.813            56.187
500       190.957                 83.992                 43.985            56.015
1000      388.826                 167.256                43.016            56.984
2000      1288.591                559.705                43.435            56.565
5000      2714.200                1226.681               45.195            54.805

^a Reduction is the decrease rate of data size.
^b Optimization is the increase rate of storage utilization.

We established a private blockchain network on Ethereum with PoA (Proof of Authority) [20] consensus algorithm to evaluate how the proposed compression algorithm affects synchronization efficiency. We first set up three nodes in the blockchain network. Then we conducted performance tests on the modes of full-sync (baseline method, synchronize the complete original data) and lite-sync (synchronize the compressed data)


and measured the synchronization time cost under different record quantities. As shown in Fig. 6(a), the synchronization time for both modes increases as the quantity of records rises. Besides, the lite-sync mode incurs a lower time cost with a lower growth rate, and is therefore more efficient than the full-sync mode.

(a) Under different record quantities

(b) Under different block heights

Fig. 6. Experiment results of synchronization time cost

Block height is an essential factor that affects blockchain efficiency. As the block height increases, nodes in the blockchain network need more time to achieve a consensus on new data. To evaluate the impact of block height, we conducted tests on the two synchronization modes with a record quantity of 2000. The running time of the system was measured as the block height changed. The results, illustrated in Fig. 6(b), indicate that the lite-sync mode requires less time to reach the same block height, which means the blockchain can achieve consensus more quickly on the compressed data. The number of nodes in the blockchain network will also affect the synchronization efficiency. To illustrate this point, we fixed the quantity of the records at 500 and recorded the time required to finish synchronization with different numbers of blockchain nodes. Figure 7 shows the experimental results, indicating that the blockchain requires more time to achieve consensus with more nodes participating. Compared to the full-sync mode, the lite-sync mode achieves consensus on new data more quickly, and the increase rate is more stable, which brings higher scalability for the system.


Fig. 7. Experiment results of synchronization time cost with different number of nodes

5 Conclusion

This paper presents BNAS, a blockchain-based network alignment system designed to address data inconsistency in power equipment data management. The proposed system offers three key features: access control, equipment data synchronization, and similar data retrieval. The novel data structure, the HAC Tree, provides fine-grained access control, and the compression algorithm based on composite hashing reduces the synchronization time cost and storage burden. The experimental results demonstrate the effectiveness and efficiency of the proposed system.

Acknowledgement. This work was supported in part by the State Grid Fujian Science and Technology Project ("Research on Key Technologies of Blockchain based Full Link Monitoring and Multi-modal Multi-source Data Fusion") under Grant 52130M22000A. Corresponding author: [email protected].

References 1. Nakamoto, S.: Bitcoin: A peer-to-peer electronic cash system. Cryptography Mailing list. https://metzdowd.com (2009) 2. Islam, M., Rehmani, M.H., Chen, J.: Transparency-privacy trade-off in blockchain-based supply chain in industrial internet of things. In: 2021 IEEE 23rd International Conference on High Performance Computing & Communications, Haikou, Hainan, China, 20–22 December 2021, pp. 1123–1130. IEEE (2021) 3. Xu, Z., Wei, L., Wu, J., Long, C.: A blockchain-based digital copyright protection system with security and efficiency. In: Xu, K., Zhu, J., Song, X., Lu, Z. (eds.) Blockchain Technology and Application. CBCC 2020. Communications in Computer and Information Science, vol. 1305. Springer, Singapore (2021). https://doi.org/10.1007/978-981-33-6478-3_3 4. Agyekum, K.-B., et al.: Digital media copyright and content protection using IPFS and blockchain. In: Zhao, Y., Barnes, N., Chen, B., Westermann, R., Kong, X., Lin, C. (eds.) Image and Graphics. Lecture Notes in Computer Science, vol. 11903, pp. 266–277. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-34113-8_23 5. Thakur, S., Breslin, J.G.: A model of decentralized social internet of things using blockchain offline channels. In: 2020 Second International Conference on Blockchain Computing and Applications (BCCA). pp. 115–121 (2020)


6. Singh, R., Xu, J., Berger, B.: Pairwise global alignment of protein interaction networks by matching neighborhood topology. In: Speed, T., Huang, H. (eds.) Research in Computational Molecular Biology, pp. 16–31. Springer, Berlin Heidelberg (2007). https://doi.org/10.1007/ 978-3-540-71681-5_2 7. Huang, H., Zhang, J., Zhang, J., Xu, J., Wu, Q.: Low-rank pairwise alignment bilinear network for few-shot fine-grained image classification. Trans. Multi. 23, 1666–1680 (2021) 8. Liao, C., Lu, K., Baym, M., Singh, R., Berger, B.: Isorankn: spectral methods for global alignment of multiple protein networks. Bioinform. 25(12), i253–i258 (2009) 9. Feizi, S., Quon, G.T., Mendoza, M.R., Medard, M., Kellis, M., Jadbabaie, A.: Spectral alignment of graphs. IEEE Trans. Netw. Sci. Eng. 7(3), 1182–1197 (2020) 10. Grover, A., Leskovec, J.: node2vec: Scalable feature learning for networks. In: Krishnapuram, B., Shah, M., Smola, A.J., Aggarwal, C.C., Shen, D., Rastogi, R. (eds.) Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016, pp. 855–864. ACM (2016) 11. Chen, H., Yin, H., Sun, X., Chen, T., Gabrys, B., Musial, K.: Multi-level graph convolutional networks for cross-platform anchor link prediction. In: Gupta, R., Liu, Y., Tang, J., Prakash, B.A. (eds.) KDD 2020: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, 23–27 August 2020, pp. 1503–1511. ACM (2020) 12. Wang, Z., Lv, Q., Lan, X., Zhang, Y.: Cross-lingual knowledge graph alignment via graph convolutional networks. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018. pp. 349–357. Association for Computational Linguistics (2018) 13. Fiergolla, S., Wolf, P.: Improving run length encoding by preprocessing. In: 2021 Data Compression Conference (DCC), pp. 341–341 (2021) 14. Huffman, D.A.: A method for the construction of minimum-redundancy codes. Proc. IRE 40(9), 1098–1101 (1952) 15. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423 (1948) 16. Manku, G.S., Jain, A., Sarma, A.D.: Detecting near-duplicates for web crawling. In: Williamson, C.L., Zurko, M.E., Patel-Schneider, P.F., Shenoy, P.J. (eds.) Proceedings of the 16th International Conference on World Wide Web, WWW 2007, Banff, Alberta, Canada, 8–12 May 2007. pp. 141–150. ACM (2007) 17. Liu, Y., Qiu, M., Liu, J., Liu, M.: Blockchain-based access control approaches. In: 2021 8th IEEE International Conference on Cyber Security and Cloud Computing (CSCloud)/2021 7th IEEE International Conference on Edge Computing and Scalable Cloud (EdgeCom), pp. 127–132 (2021) 18. Qin, X., Huang, Y., Yang, Z., Li, X.: A blockchain-based access control scheme with multiple attribute authorities for secure cloud data sharing. J. Syst. Archit. 112, 101854 (2021) 19. Ezhil Arasi, V., Indra Gandhi, K., Kulothungan, K.: Auditable attribute-based data access control using blockchain in cloud storage. J. Supercomput. 78(8), 1–27 (2022). https://doi. org/10.1007/s11227-021-04293-3 20. An, A.C., Diem, P.T.X., Lan, L.T.T., Toi, T.V., Binh, L.D.Q.: Building a product origins tracking system based on blockchain and poa consensus protocol. In: Le, L., Dang, T.K., Minh, Q.T., Toulouse, M., Draheim, D., Kung, J. (eds.) 
2019 International Conference on Advanced Computing and Applications, ACOMP 2019, Nha Trang, Vietnam, 27–29 November 2019. pp. 27–33. IEEE Computer Society (2019)

TrafficSCINet: An Adaptive Spatial-Temporal Graph Convolutional Network for Traffic Flow Forecasting Kai Gong1 , Shiyuan Han1(B) , Xiaohui Yang1 , Weiwei Yu2 , and Yuanlin Guan3 1 Shandong Provincial Key Laboratory of Network Based Intelligent Computing,

University of Jinan, Jinan 250022, China [email protected] 2 Shandong Big Data Center, Jinan, China [email protected] 3 Key Lab of Industrial Fluid Energy Conservation and Pollution Control, Ministry of Education, Qingdao University of Technology, Qingdao, China [email protected]

Abstract. Given the complex nonlinear temporal and spatial correlations in traffic flow data, an accurate and effective traffic flow forecasting model is indispensable for understanding traffic dynamics and predicting the future status of an evolving traffic system. In terms of spatial information extraction, existing approaches are mostly devoted to capturing spatial dependency on a predefined graph, which assumes that the relations between traffic nodes can be completely described by an invariant graph structure. However, a fixed graph does not reflect the real spatial dependency in traffic data. In this paper, a novel Adaptive Spatial-Temporal Graph Convolutional Network, named TrafficSCINet, is proposed for traffic flow forecasting. Our model consists of two components: 1) the AGCN module uses an adaptive adjacency matrix to dynamically learn the spatial dependencies between traffic nodes under different forecast horizons; 2) the SCINet module extracts potential temporal information from traffic flow data through its superb temporal modeling capabilities. Two convolution modules in the SCI-Block that have no effect on the results are removed to significantly improve the training speed of the model. Experimental results on four real-world traffic datasets demonstrate that TrafficSCINet consistently achieves state-of-the-art performance compared with other baselines. Keywords: Traffic Flow Forecasting · Spatial-Temporal Data · Adaptive Adjacency Matrix

1 Introduction

Traffic flow forecasting is one of the most important components of Intelligent Transportation Systems (ITS) [1]. Accurate real-time traffic forecasts provide data support for intelligent traffic light control, vehicle path planning and other core functions of ITS [2]. Traffic flow data have non-linear and complex patterns [3], and the spatial structure of traffic is non-Euclidean and directional [4]. The major challenge for traffic forecasting is how to capture the spatial-temporal correlation simultaneously and effectively. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D.-S. Huang et al. (Eds.): ICIC 2023, LNCS 14086, pp. 628–639, 2023. https://doi.org/10.1007/978-981-99-4755-3_54


Early ARIMA models [5, 6] relied on a large amount of historical data, could not handle the nonlinear dynamics of traffic flow, and were not accurate enough in predicting non-stationary traffic flow. Machine learning models such as KNN [7], SVM [8] and neural networks [9] are effective in short-term traffic flow forecasting, but they cannot meet the requirements of long-term forecasting tasks. Current research tends towards deep-learning-based approaches and focuses on designing complex neural network architectures to capture the spatial-temporal patterns shared by all transport series. SCINet [10] is a time series forecasting (TSF) model based on CNN. It does not use any spatial information, nor does it carry out artificial data enhancement on the seasonal periodicity of traffic data, yet its performance on the four public traffic datasets is superior to all current models based on Graph Neural Networks (GNN). Therefore, it is necessary to rethink the existing spatial information capturing methods used to model unstructured traffic series and their inter-dependencies.

A fixed graph structure, which may contain biases and is not adaptable to domains without appropriate knowledge, cannot reflect the dynamic complex patterns in traffic flow. In addition to connectivity and distance, many factors between traffic sensors are difficult to quantify, such as the location of the sensor, upstream-downstream relationships, density distribution, etc. Therefore, it is an effective choice for the network to adaptively learn the spatial dependence relationships between traffic nodes. Different from the random vector embedding method adopted by GraphWaveNet [11] and AGCRN [12], we directly use the symmetric adjacency matrix constructed from the distance between traffic nodes. Randomly initialized and comprehensive node embeddings may occasionally capture the spatial dependency without any prior knowledge, but they also bring redundant or even wrong information to the model, especially in a huge traffic network.

Instead of designing more complicated network architectures, we propose a concise yet effective framework, TrafficSCINet, by revising the basic building block of current methods to model temporal dynamics and spatial dependencies of traffic flow separately. By focusing on the known graph structure and continuously eliminating the effects of weak connections during the updating process, TrafficSCINet will not be distracted by irrelevant information and will learn the appropriate spatial dependencies according to different prediction window sizes. The main contributions of this work are as follows.

• An adaptive adjacency matrix construction method is proposed, which can dynamically adjust the correlation coefficients between traffic nodes. For each node, only a few but crucial nodes that are physically connected to it are considered according to different forecast horizons. In this way, the interference of irrelevant spatial-temporal information is reduced.
• A simple structure of the SCI-Block is designed by removing two unnecessary convolutional modules to accelerate the model training process. The combination of the adaptive GCN module and the SCINet module enables TrafficSCINet to model spatial-temporal data.
• We conducted numerous experiments on four real-world datasets, and the results show that our model consistently outperforms all baseline methods. The source codes are publicly available from https://github.com/gongkai97/TrafficSCINet.


2 Methodology

In this section, the mathematical definition of the traffic flow forecasting problem is first given. Then, the architecture of our framework is outlined, and the two modules, the adaptive Graph Convolution Layer and the Sample Convolution and Interaction Network, are described in detail in the subsequent subsections.

2.1 Problem Definition

As a spatial-temporal time series forecasting task, traffic flow data includes a traffic network and the historical flow of each node.
• Definition 1: The traffic network G is represented as G = (V, E, A), where V = {v_1, v_2, ..., v_N} is the set of nodes representing the sources of traffic series and E = {e_ij | 1 ≤ i, j ≤ N} is the set of edges. N denotes the number of nodes. A ∈ R^(N×N) is a spatial adjacency matrix representing the nodes' proximity or distance.
• Definition 2: We use x_t^i ∈ R^d to denote the value of all the features of node i at time t. The traffic flow sequence is denoted as X_t = (X_t^1, X_t^2, ..., X_t^N) ∈ R^(N×d), where d is the number of attribute features.
• The problem of multi-step traffic forecasting can be described as learning a function f to forecast the next τ steps based on the past T steps of the historical spatial-temporal network series, where θ denotes all the learnable parameters in the model:

{X_G^(t+1), X_G^(t+2), ..., X_G^(t+τ)} = f_θ(X_G^t, X_G^(t−1), ..., X_G^(t−T+1))   (1)
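To make the tensor shapes of this formulation concrete, the sketch below maps T past graph signals to τ future ones with a purely illustrative stand-in for f_θ; the dimensions are toy values (307 nodes matches PeMS04 in Table 1), not the paper's actual model.

```python
import torch

# Toy dimensions: N nodes, d features per node, T past steps, tau future steps.
N, d, T, tau = 307, 1, 12, 12

history = torch.randn(T, N, d)                      # [X_{t-T+1}, ..., X_t]
f_theta = torch.nn.Linear(T * N * d, tau * N * d)   # stand-in for f_theta, not the real model

forecast = f_theta(history.reshape(-1)).reshape(tau, N, d)
print(forecast.shape)                               # torch.Size([12, 307, 1]) = next tau graph signals
```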

2.2 Framework of TrafficSCINet

Fig. 1. (a) is the SCI-Block in which two convolution modules with little influence on prediction accuracy are removed. (b) is the overall architecture of SCINet. (c) is the overall architecture of TrafficSCINet, which has integrated the AGCN module, enabling it to model spatial-temporal data.


Our proposed network framework is shown in Fig. 1(c). It extracts the spatial-temporal features of traffic flow data separately. X_input contains a traffic network graph G, namely the adjacency matrix A and the historical traffic flow data X of all nodes in the graph. According to different prediction tasks, the model adaptively learns the dynamic spatial dependency graph under different forecast horizons τ. It is then added to the original time series through a residual connection to generate a new sequence with enhanced predictability. After the hierarchical encoder extracts the information at different time resolutions, the final prediction results are obtained through the decoder containing a fully connected layer. By removing the redundant convolutional modules in the SCI-Block, the training speed is significantly improved while the accuracy of the prediction results is unaffected.

2.3 Adaptive Graph Convolution Layer

Recent traffic forecasting studies generally use Graph Convolutional Networks (GCN) to capture spatial correlations between traffic sequences. [13] proposed a multi-layer GCN with a layer-wise propagation rule. It can be well-approximated by a 1st-order Chebyshev polynomial expansion and generalized to a high-dimensional GCN as:

H^(l+1) = σ( D̃^(−1/2) Ã D̃^(−1/2) H^(l) W^(l) )   (2)

Here, Ã = A + I is the normalized adjacency matrix of the undirected graph G with added self-connections, I is the identity matrix, and D̃ is the degree matrix. H^(l) ∈ R^(N×d) is the input matrix for each layer, with H^(0) = X. W is a layer-specific trainable weight matrix. σ(·) denotes an activation function. From a spatial-based perspective, it smooths a node's signal by aggregating and transforming its neighborhood information. Predefined graphs do not contain complete information about spatial dependencies and are not directly relevant to prediction tasks, which can lead to considerable bias. Graph WaveNet [11] randomly initializes two node embedding dictionaries with learnable parameters E_1, E_2 ∈ R^(N×c). The self-adaptive adjacency matrix is defined as follows:

Ã_adp = SoftMax(ReLU(E_1 E_2^T))   (3)

E_1 is the source node embedding and E_2 is the target node embedding. The spatial dependency weights between the source nodes and the target nodes can be derived by multiplying E_1 and E_2. However, we believe that random initialization cannot integrate the existing spatial information well, and may instead bring redundant information. It is also not necessary to focus on the impact of the current node on every node. Therefore, we propose to use the initialized adjacency matrix directly and update it iteratively. The improvement can be expressed as follows:

Ã_adp = ReLU(Ã)   (4)

The ReLU activation function is used to eliminate weak connections. The SoftMax function is removed to enhance the representation of spatial information. By representing Z ∈ R^(d×τ) as the output and combining Eqs. (2) and (4), we propose the following adaptive graph convolution layer:

Z = Σ_{k=0}^{K} Ã_adp^k X W_k   (5)
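A minimal PyTorch sketch of Eqs. (4)–(5) follows. The layer sizes, the random stand-in for the distance-based adjacency matrix, and the initialization scale are assumptions for illustration; they do not reproduce the paper's exact implementation.

```python
import torch
import torch.nn as nn

class AdaptiveGCN(nn.Module):
    """Sketch of Eqs. (4)-(5): A_adp = ReLU(A_tilde), Z = sum_{k=0..K} A_adp^k X W_k."""
    def __init__(self, in_dim: int, out_dim: int, adj: torch.Tensor, K: int = 2):
        super().__init__()
        self.K = K
        # A_tilde starts from the distance-based adjacency matrix and is updated during training.
        self.a_tilde = nn.Parameter(adj.clone())
        self.weights = nn.ParameterList(
            [nn.Parameter(torch.randn(in_dim, out_dim) * 0.01) for _ in range(K + 1)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (N, in_dim)
        a_adp = torch.relu(self.a_tilde)                   # Eq. (4): prune weak connections
        z = torch.zeros(x.shape[0], self.weights[0].shape[1])
        h = x
        for k in range(self.K + 1):                        # Eq. (5)
            z = z + h @ self.weights[k]
            h = a_adp @ h                                  # apply the next power of A_adp to X
        return z

# Example: a PeMS08-sized graph (170 nodes), 12 past steps in, 12 future steps out.
adj = torch.rand(170, 170)          # stand-in for the normalized distance-based adjacency
layer = AdaptiveGCN(12, 12, adj)
print(layer(torch.randn(170, 12)).shape)   # torch.Size([170, 12])
```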

It is noteworthy that our GCN layer falls into the spatial-based approaches. Although it is slightly different from traditional GCN methods such as Eq. (2), it can still be interpreted as aggregating the spatial neighborhood of each node.

2.4 Sample Convolution and Interaction Network

SCINet [10] is an encoder-decoder structure. In the encoder part, multiple SCI-Blocks are formed into a binary tree, and each SCI-Block contains multiple convolution modules and a set of interactive learning rules. The advantage is that each SCI-Block can obtain both a local and a global view of the entire time series. After entering the SCI-Block, the time series is divided into two sub-sequences. The temporal resolution of the time series becomes coarser, but it still retains most of the information of the original sequence. SCINet rearranges the extracted features into a new sequence and adds it to the original time series. Finally, a fully connected layer is used as the decoder for prediction. After the input feature F enters the SCI-Block, it is decomposed into two sub-sequences F_odd and F_even by separating the even and the odd elements. Different convolution kernels extract features from F_odd and F_even respectively, and an interactive-learning strategy is proposed to allow information interchange:

F′_odd = F_odd ± φ(F_even)   (6)

F′_even = F_even ± ψ(F_odd)   (7)

where φ and ψ represent two different 1D convolutional modules. Their structure is shown in Fig. 2. The preprocessed input data passes through a 1D convolution layer with a kernel size of k1, which maps the input channel C into h ∗ C. Next, a second 1D convolution layer with kernel size k2 restores the number of channels h ∗ C to the input channel C. The LeakyReLU activation function is used after the first convolutional layer because of its sparsity properties and a reduced likelihood of vanishing gradients. The Tanh activation function is applied after the second convolutional layer since it maps both positive and negative features into [−1, 1].
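The following PyTorch sketch shows one simplified SCI-Block built from the φ/ψ modules just described. The kernel sizes, the padding, and the choice of "+" for the "±" in Eqs. (6)–(7) are assumptions for illustration.

```python
import torch
import torch.nn as nn

def conv_module(c: int, h: float = 0.1, k1: int = 3, k2: int = 3) -> nn.Sequential:
    """Sketch of phi/psi: Conv1d mapping C -> h*C with LeakyReLU, then Conv1d back to C with Tanh."""
    hidden = max(1, int(h * c))
    return nn.Sequential(
        nn.Conv1d(c, hidden, kernel_size=k1, padding=k1 // 2),
        nn.LeakyReLU(0.01),
        nn.Conv1d(hidden, c, kernel_size=k2, padding=k2 // 2),
        nn.Tanh(),
    )

class SCIBlock(nn.Module):
    """Simplified SCI-Block: split into even/odd sub-sequences and exchange information (Eqs. 6-7)."""
    def __init__(self, channels: int):
        super().__init__()
        self.phi = conv_module(channels)
        self.psi = conv_module(channels)

    def forward(self, x: torch.Tensor):           # x: (batch, channels, time)
        x_even, x_odd = x[..., ::2], x[..., 1::2]
        f_odd = x_odd + self.phi(x_even)          # Eq. (6), "+" chosen as one instance of "+/-"
        f_even = x_even + self.psi(x_odd)         # Eq. (7)
        return f_even, f_odd

block = SCIBlock(channels=170)                    # e.g. one channel per PeMS08 node
even, odd = block(torch.randn(8, 170, 12))
print(even.shape, odd.shape)                      # (8, 170, 6) each
```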

3 Experiments

3.1 Datasets and Data Preprocessing

We verify the performance of TrafficSCINet on four public traffic network datasets: PeMS03, PeMS04, PeMS07 and PeMS08, released by [10]. These four datasets are constructed from four districts in California and aggregated into 5-min windows, which means there are 12 points per hour and 288 points per day. Table 1 shows the detailed information of each dataset. The spatial adjacency networks are constructed from the actual road networks based on distance. We use Z-score normalization to standardize the data inputs, i.e., x′ = (x − μ)/σ, where μ and σ denote the mean and standard deviation of the traffic data.
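A one-line sketch of the standard Z-score normalization used above (the toy array shape is illustrative):

```python
import numpy as np

def z_score(x: np.ndarray):
    """Standardize the inputs: x' = (x - mean) / std."""
    mu, sigma = x.mean(), x.std()
    return (x - mu) / sigma, mu, sigma

data = np.random.rand(17856, 170)        # PeMS08-sized toy array: timesteps x nodes
normed, mu, sigma = z_score(data)
print(round(normed.mean(), 6), round(normed.std(), 6))   # ~0.0 and ~1.0
```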


Fig. 2. The structure of φ and ψ.

Table 1. Dataset description and statistics.

Datasets      PeMS03     PeMS04     PeMS07     PeMS08
Nodes         358        307        883        170
Timesteps     26209      16992      28224      17856
Granularity   5 min      5 min      5 min      5 min
Start time    5/1/2012   7/1/2017   5/1/2017   3/1/2012

3.2 Baseline Methods

We compare TrafficSCINet with the following models:
• FC-LSTM [14], a recurrent neural network for time series prediction.
• DCRNN [4], which integrates graph convolution into an encoder-decoder gated recurrent unit to respectively encode spatial and temporal information.
• STGCN [15], which integrates graph convolution and gated temporal convolution through spatial-temporal convolutional blocks.
• GraphWaveNet [11], a framework that combines an adaptive adjacency matrix in graph convolution with 1D dilated convolution.
• ASTGCN(r) [16], which combines spatial and temporal attention mechanisms. Only the recent component of modeling periodicity is taken to keep the comparison fair.
• STSGCN [17], which utilizes a localized spatial-temporal subgraph module to model localized correlations independently.
• STFGNN [18], which captures both local and global complicated spatial-temporal dependencies.
• AGCRN [12], which can capture node-specific spatial and temporal correlations in time-series data automatically without a pre-defined graph.
• SCINet [10], which effectively models time series with complex temporal dynamics by a hierarchical downsample-convolve-interact TSF framework.

3.3 Experiment Settings

Table 2. The hyperparameters on the PeMS datasets.

Model configurations   Hyperparameter      PeMS03 / PeMS04 / PeMS07 / PeMS08
                       Horizon             12
                       Look-back window    12
                       Batch size          24
                       Learning rate       0.001
SCI-Block              h                   0.1
                       k                   3
SCINet                 Dropout             0.5
                       Level               2

We implemented the TrafficSCINet model based on the PyTorch framework. Experiments are conducted under an environment with an NVIDIA GeForce 2080Ti GPU 11 GB card (NVIDIA, Santa Clara, CA, USA). We split the data of PeMS03, PeMS04, PeMS07 and PeMS08 into training, validation and test sets with a ratio of 6:2:2 according to the chronological order. For a fair comparison with previous baselines, we use historical data from the past hour to predict data for the next hour. We train our model using the Adam optimizer with a learning rate of 0.001, and the number of training epochs is 80. The mean absolute error (MAE) between the estimator and the ground truth is used as the loss function. More details of the experimental parameters are presented in Table 2.

3.4 Experiment Results and Analysis

Table 3 shows a comparison between our model and nine other baselines. The best results are shown in bold, and IMP shows the improvement of TrafficSCINet over the best baseline. The 60-min forecast results show that our model consistently outperforms the other baselines on each dataset. FC-LSTM and SCINet only consider the temporal correlation and do not utilize the spatial information. DCRNN, STGCN, STSGCN and STFGNN can only capture shared patterns among all traffic series and still rely on the pre-defined spatial connection graph. GraphWaveNet, AGCRN and TrafficSCINet use an adaptive dependency matrix to dynamically extract spatial information.

Table 3. Performance comparison of different approaches on the PeMS datasets.

Datasets  Metrics   FC-LSTM  DCRNN  STGCN  GraphWaveNet  ASTGCN(r)  STSGCN  STFGNN  AGCRN  SCINet*  TrafficSCINet  IMP
PeMS03    MAE       21.33    18.18  17.49  19.85         17.69      17.48   16.77   15.98  15.56    14.96          4.01%
PeMS03    MAPE      21.33    18.91  17.15  19.31         19.40      16.78   16.30   15.23  14.85    14.19          4.65%
PeMS03    RMSE      35.11    30.31  30.12  32.94         29.66      29.21   28.34   28.25  24.96    24.25          2.92%
PeMS04    MAE       25.14    24.70  22.70  25.45         22.93      21.19   19.83   19.83  19.66    19.03          3.31%
PeMS04    MAPE      20.33    17.12  14.59  17.29         16.56      13.90   13.02   12.97  12.15    11.71          3.76%
PeMS04    RMSE      39.59    38.12  35.55  39.70         35.22      33.65   31.88   32.30  31.65    30.89          2.46%
PeMS07    MAE       29.98    28.30  25.38  26.85         28.05      24.26   22.07   22.37  21.62    21.15          2.22%
PeMS07    MAPE      15.33    11.66  11.08  12.12         13.92      10.21   9.21    9.12   9.02     8.70           5.75%
PeMS07    RMSE      42.84    38.58  38.78  42.78         42.57      39.03   35.80   36.55  34.38    33.88          1.48%
PeMS08    MAE       22.20    17.86  18.02  19.13         18.61      17.13   16.64   15.95  16.17    15.50          2.90%
PeMS08    MAPE      15.32    11.45  11.40  12.68         13.08      10.96   10.60   10.09  9.96     9.66           3.11%
PeMS08    RMSE      32.06    27.83  27.83  31.05         28.16      26.80   26.22   25.22  25.27    24.34          3.82%

* denotes re-training.


However, the relatively poor performance of GraphWaveNet indicates that more redundant information may be brought in by the two randomly initialized embedding vectors, which interferes with the learning ability of its network.

Table 4. The computation cost on the PeMS08 dataset.

Model           Training time (epoch)   MAE (one hour)
FC-LSTM         3.50 s                  22.20
SCINet          11.30 s                 16.17
TrafficSCINet   7.73 s                  15.50

Table 4 shows that our model not only achieves the best prediction accuracy, but also improves the training speed by about 46% compared with SCINet. The experiments in the table were all carried out under the same hardware environment and inference procedure.

Fig. 3. Adaptive adjacency matrix with different forecast horizon τ

Figure 3 displays the learned spatial dependence matrix under different prediction window sizes on the PeMS08 dataset. (a) is the original normalized adjacency


matrix. It is an undirected graph, which means that the weight from node A to node B is the same as the weight from node B to node A. In fact, traffic speed is affected by the speed of the upstream and downstream flow. Our model adaptively learns a directed graph. (b), (c) and (d) respectively illustrate the learned matrices generated by our model under different forecast horizons. When the forecast horizon is extremely short, such as 5 min (1 timestep), fewer neighbor nodes are considered by each node. As the forecast horizon reaches 60 min (12 timesteps), most known nodes are taken into account.

3.5 Ablation Experiments

Figure 4 illustrates the results of our ablation experiment to verify the effectiveness of the AGCN module; the experiment was carried out on the PeMS08 dataset. We plot the prediction results for each time step. The original SCINet serves as the baseline. TrafficSCINet w/o Adaptive Matrix means that a non-adaptive matrix is adopted for graph convolution according to Eq. (2). TrafficSCINet w/o ReLU means that the ReLU activation function in Eq. (4) is removed. TrafficSCINet with all components achieves the best results.

Fig. 4. Ablation study on the PeMS08 dataset.

4 Conclusion and Future Work

A novel spatial-temporal traffic flow forecasting model, TrafficSCINet, was proposed in this paper. By combining an adaptive graph convolution layer with the sample convolution and interaction network, our model is able to effectively capture the hidden spatial-temporal


features in traffic flow data at different time resolutions, and to adaptively learn the spatial dependence relationships under different prediction lengths. After detailed experiments and analysis, TrafficSCINet was shown to achieve the best performance on four real-world traffic datasets, and its training speed was significantly improved over SCINet. In future work, we will further explore the spatial-temporal dynamic correlations under prediction tasks of different time scales, and look for a more concise but effective spatial-temporal network model.

Acknowledgments. This research was funded by the Natural Science Foundation of Shandong Province for Key Project under Grant ZR2020KF006, the National Natural Science Foundation of China under Grant 62273164, the Development Program Project of Youth Innovation Team of Institutions of Higher Learning in Shandong Province, and the Project of Shandong Province Higher Educational Science and Technology Program under Grants J16LB06 and J17KA055.

References 1. Corey, S., Minh, D.: Streets: A novel camera network dataset for traffic flow. In: Conference and Workshop on Neural Information Processing Systems, pp. 10242–10253, NeurIPS, Vancouver (2019) 2. Evangelia, C., Christina, I., Christina, M., et al.: Factors affecting bus bunching at the stop level: a geographically weighted regression approach. Int. J. Transport. Sci. Technol. 9(3), 207–217 (2020) 3. Guo, S.N., Lin, Y.F., Feng, N., et al.: Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In: AAAI Conference on Artificial Intelligence, pp. 922– 929. AAAI, Hawaii (2019) 4. Li, Y.G., Yu, R., Shahabi, C., et al.: Diffusion convolutional recurrent neural network: Datadriven traffic forecasting. In: International Conference of Learning Representation, pp. 1–16. ICLR, Vancouver (2018) 5. Mohammed, S.A., Allen, R.C.: Analysis of freeway traffic time-series data by using BoxJenkins techniques. Transport. Res. Record J. Transport. Res. Board 773(722), 1–9 (1979) 6. Billy, M.W., Lester, A.H.: Modeling and forecasting vehicular traffic flow as a seasonal Arima process: theoretical basis and empirical results. J. Transp. Eng. 129(6), 664–672 (2003) 7. Van, L.J., Van, H.C.: Short-term traffic and travel time prediction models. Artif. Intell. Appl. Critical Transp. Issues 22(1), 22–41 (2012) 8. Jeong, Y.S., Byon, Y.J., Castro-Neto, M.M., et al.: Supervised weighting-online learning algorithm for short-term traffic flow prediction. IEEE Trans. Intell. Transp. Syst. 14(4), 1700– 1707 (2013) 9. Chan, K.Y., Dillon, T., Chang, E., et al.: Prediction of short-term traffic variables using intelligent swarm-based neural networks. IEEE Trans. Control Syst. Technol. 21(1), 263–274 (2013) 10. Liu, M.H., Zeng, A.L., Chen, M.X., et al.: SCINet: time series modeling and forecasting with sample convolution and interaction. In: Conference and Workshop on Neural Information Processing Systems. NeurIPS, New Orleans (2022) 11. Wu, Z., Pan, S., Long, G., et al.: Graph wavenet for deep spatial-temporal graph modeling. In: International Joint Conference on Artificial Intelligence. Morgan Kaufmann, Macao (2019) 12. Bai, L., Yao L.N., Li, C., et al.: Adaptive graph convolutional recurrent network for traffic forecasting. In: Conference and Workshop on Neural Information Processing Systems, pp. 17804–17815. NeurIPS, Online (2020)


13. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: International Conference on Learning Representations. ICLR. Toulon (2017) 14. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Conference and Workshop on Neural Information Processing Systems, pp. 3104–3112. NeurIPS, Montreal (2014) 15. Yu, B., Yin, H.T., Zhu, Z.X.: Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. In: International Joint Conferences on Artificial Intelligence. Morgan Kaufmann, Sweden (2017) 16. Guo, S.N., Lin, Y.F., Feng, N., et al.: Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In: AAAI Conference on Artificial Intelligence. AAAI, Hawaii (2019) 17. Song, C., Lin, Y.F., Guo, S.G., et al.: Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting. In: AAAI Conference on Artificial Intelligence. AAAI, New York (2020) 18. Li, M.Z., Zhu, Z.X.: Spatial-temporal fusion graph neural networks for traffic flow forecasting. In: AAAI Conference on Artificial Intelligence. AAAI, Beijing (2021)

CLSTGCN: Closed Loop Based Spatial-Temporal Convolution Networks for Traffic Flow Prediction Hao Li1 , Shiyuan Han1(B) , Jinghang Zhao1 , Yang Lian1 , Weiwei Yu2 , and Xixin Yang3 1 School of Information Science and Engineering, University of Jinan, Jinan, China

[email protected]

2 Shandong Big Data Center, Jinan, China

[email protected]

3 College of Computer Science and Technology, Qingdao University, Qingdao, China

[email protected]

Abstract. Traffic flow prediction plays a crucial role in assisting the operation of the road network and road planning. However, due to the dynamic correlations of road network nodes, the physical connectivity may not reflect the relationships of road nodes. In this paper, a closed loop based spatial-temporal graph convolution neural network (CLSTGCN) is proposed by constructing closed loops with spatial correlation information of road network nodes. The designed model consists of multiple spatial-temporal blocks, which combine the attention mechanism with closed loop correlation information to promote the aggregation in spatial dimensions. Meanwhile, in order to improve the accuracy of long-term prediction, the long-term road network trend is supplied to the model, which can capture the temporal features accurately. The experiments on two real-world datasets demonstrate that the proposed model outperforms the state-of-the-art baselines. Keywords: Graph neural network · Traffic flow prediction · Spatial-temporal correlation

1 Introduction

With the modernization of cities, increasing vehicle ownership aggravates the burden of road network operation and raises the demand for intelligent transportation systems (ITS). ITS is an important symbol of urban modernization. Traffic flow prediction is an important part of ITS, and is a useful tool to relieve urban congestion and improve urban traffic efficiency. The key elements of traffic flow include traffic volume, traffic speed and traffic density. Traffic flow prediction is to forecast future flow data based on historical observation data. According to the forecast length, traffic flow prediction can be divided into short-term forecasts of 5 to 30 min, medium-term forecasts of 30 to 60 min and long-term forecasts of more than 60 min. It should be pointed out that traffic flow data has both temporal correlations and spatial correlations. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D.-S. Huang et al. (Eds.): ICIC 2023, LNCS 14086, pp. 640–651, 2023. https://doi.org/10.1007/978-981-99-4755-3_55


Traffic flow data is a kind of time series data. In the early stage, researchers utilized traditional algorithms to mine the temporal dependencies of traffic flow data. In 1996, Voort et al. combined Kohonen mapping and ARIMA to predict short-term traffic flow [1]. In 2006, Sun et al. proposed a traffic flow prediction method based on a Bayesian approach [2]. Machine learning methods are widely used in traffic flow prediction, such as VAR [3], K-NN [4] and other methods. Meanwhile, with the development of intelligent computation theory, deep learning models have been applied in the field of traffic flow prediction [5]. In 2016, Rui et al. used LSTM and GRU to capture the temporal correlations of traffic flow data [6]. In 2017, Yu et al. applied LSTM to predict traffic flow and simulate random factors in reality [7]. In 2020, Cao et al. combined LSTM with CNN to construct a deep learning model, which extracted the periodic features of traffic flow data [8]. Compared with traditional algorithms, deep learning models improve the prediction accuracy. However, these methods cannot accurately predict long-term traffic flow on graph structures.

The graph structure is suitable for expressing the road network. Graph neural networks were developed to implement convolution operations on non-grid data. The main graph convolution methods are divided into two categories: spectral approaches and spatial approaches. Spectral approaches transform nodes from the spatial domain to the spectral domain by the Fourier transform. In 2013, Bruna et al. extended the traditional CNN from the Euclidean domain to a non-Euclidean domain, and accomplished convolution by spectral approaches for the first time [9]. In 2016, Defferrard et al. proposed ChebyNet [10], which used Chebyshev polynomial approximation to fit convolution kernels for spectral domain convolution. ChebyNet avoided the eigen-decomposition of the Laplacian matrix and reduced the number of parameters. Spectral approaches are a special form of spatial approaches. Spatial approaches aim to directly aggregate the neighborhood information of nodes in the spatial domain. In 2017, Kipf et al. proposed GCN [11], which avoided the need for multiple matrix multiplications; it was the first time that a spatial approach was employed to perform convolutions on non-Euclidean data. In 2015, Duvenaud et al. implemented convolution by taking a weighted sum of each node with its neighboring information [12].

In recent years, researchers have applied graph-neural-network-based models to capture spatial dependencies in traffic flow data while taking into account temporal correlations. In 2018, Yu et al. used Chebyshev-polynomial-based GNNs and CNNs to capture both spatial and temporal correlations [13]. In 2019, Guo et al. applied spatial-temporal attention mechanisms to model the correlations of spatial-temporal data [14]. In the same year, Bai et al. used cascaded graph neural recursive networks to capture spatial-temporal correlations and incorporated external meteorological factors and time elements [15]. In 2019, Zhu et al. proposed a seq2seq model with a multi-graph convolutional encoder-decoder structure that incorporates attention mechanisms, where multiple graphs reflect spatial relationships from different perspectives [16]. In 2020, Song et al. designed multiple prediction modules for different time periods in a prediction model, which captured the heterogeneity of local spatial-temporal graphs effectively [17].
However, the above methods are limited by the pre-defined static matrices, which cannot reflect the dynamic correlations of road network nodes, and cannot describe the spatial dependencies among road networks in essence.


In order to address the aforementioned problem, a deep learning model based on graph convolution neural networks, named Closed Loop based Spatial-Temporal Graph Convolution Networks (CLSTGCN), is designed. The main contributions are as follows:
(1) A method is proposed to improve the accuracy of long-term prediction on graph structures. In more detail, a precise long-term trend of the road network signal is predicted as a compensation, which is added into the convolution to predict the long-term data in temporal dimensions.
(2) The negative correlation coefficients among road network nodes are computed by PCCs. A set of nodes with strong negative correlations is regarded as a closed loop, which is designed to capture the correlation information.
(3) Extensive experiments are conducted on two real-world datasets, and the proposed model outperforms state-of-the-art baselines.

The remainder of the paper is organized as follows. In Sect. 2, the traffic flow prediction definitions and the problem are listed. In Sect. 3, the overall structure of the CLSTGCN is introduced. In Sect. 4, two datasets are utilized to conduct experiments for comparison with other models. Our work is concluded in Sect. 5.

2 Preliminary

In order to predict the traffic flow data, two definitions are described as follows.

Definition 1: A traffic network is defined as a directed graph G = {V, E, A}, where V is a set of nodes with |V| = N. Each node represents a traffic sensor. E is the edge set of the graph G. A is the adjacency matrix of the road network G. If v_i and v_j are connected, A_{i,j} equals 1; otherwise, A_{i,j} equals 0.

Definition 2: Traffic data is collected every five minutes by sensors in the road network. The graph signal X_t ∈ R^(N×C) represents the traffic condition at time t in graph G. X_{t,i} ∈ R^C represents the information of the i-th node in the graph.

Thus, the problem is described as follows.

Problem: Traffic flow data usually refers to traffic volume, vehicle speed, and density. In this problem, the traffic volume X is predicted as the only parameter, where X = {X_{t−T}, X_{t−T+1}, ..., X_{t−1}, X_t} ∈ R^(T×N×C), in which C equals 1. We aim to predict the graph signals for the next τ time steps:

X′^((t+1):(t+τ)) = F[X^((t−T):t)]   (1)

where X′ = X′^((t+1):(t+τ)) ∈ R^(τ×N×C), and F is a mapping function.


3 Methodology

In this section, the overall structure of the proposed CLSTGCN model is introduced and the details of each module are given.

3.1 Network Structure

CLSTGCN is shown in Fig. 1. The model is stacked with several ST blocks. Moreover, residual connections are established between the ST blocks.

Fig. 1. The structure of CLSTGCN

Consider a historical time series of the road network signal X = X^((t−T):t), where T is the series length. The output X′ = X′^((t+1):(t+τ)) is obtained by CLSTGCN. The CLSTGCN is stacked with several Spatial-Temporal (ST) blocks and a fully-connected layer. An ST block is composed of a spatial-temporal attention (STAtt) block and a spatial-temporal convolution (STConv) block. The STAtt block includes a temporal attention (TAtt) module and a spatial attention (SAtt) module. The STConv block includes a temporal convolution (TConv) module and a spatial convolution (SConv) module.

3.2 Spatial Correlation Information

For traffic flow prediction, the pre-defined static adjacency matrices cannot reflect node connectivity accurately, which results in a decline in prediction accuracy. To tackle this problem, Pearson correlation coefficients (PCCs) are employed to evaluate the degree of correlation among road network nodes. The node information is updated and aggregated from the nodes with high negative correlation coefficients. Without loss of generality, vehicles always flow from one node to other nodes. By exploring this relationship, a dynamic correlation matrix is constructed. The Pearson correlation coefficient is defined as:

P(v_i, v_j) = Σ_{k=1}^{n} (X_{i,k} − X̄_i)(X_{j,k} − X̄_j) / ( √(Σ_{k=1}^{n} (X_{i,k} − X̄_i)²) · √(Σ_{k=1}^{n} (X_{j,k} − X̄_j)²) )   (2)

A Pearson correlation coefficient matrix is defined as A_PCCs, in which A_PCCs[i, j] = P(v_i, v_j). If A_PCCs[i, j] is larger than 0, the element is set to 0. Non-zero values in A_PCCs are replaced by their absolute values. After that, the top K values in each row in descending


order are normalized, and the other values are set to 0. The trend correlation matching matrix A_TCMA ∈ R^(N×N) is defined as:

A_TCMA[i, j] = exp(A_PCCs[i, j]) / Σ_{j=1}^{K} exp(A_PCCs[i, j])   (3)
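A NumPy sketch of Eqs. (2)–(3) is given below: compute the Pearson correlations between node series, keep only the negative ones, take their absolute values, retain the top K entries per row and normalize them with a softmax. The input shape and the value of K are illustrative assumptions.

```python
import numpy as np

def trend_correlation_matrix(x: np.ndarray, k: int = 5) -> np.ndarray:
    """Sketch of Eqs. (2)-(3); x has shape (T, N): T observations for N nodes."""
    a_pccs = np.corrcoef(x.T)                           # (N, N) Pearson matrix, Eq. (2)
    a_pccs = np.where(a_pccs < 0, np.abs(a_pccs), 0.0)  # keep only negative correlations

    n = a_pccs.shape[0]
    a_tcma = np.zeros_like(a_pccs)
    for i in range(n):
        top = np.argsort(a_pccs[i])[-k:]                # K strongest entries in row i
        top = top[a_pccs[i, top] > 0]                   # skip zero entries
        if top.size:
            e = np.exp(a_pccs[i, top])
            a_tcma[i, top] = e / e.sum()                # softmax normalization, Eq. (3)
    return a_tcma

# Example with random data: 288 observations for 10 nodes.
A = trend_correlation_matrix(np.random.rand(288, 10), k=3)
print(A.shape, A.sum(axis=1))   # rows sum to ~1 wherever negative correlations exist
```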

3.3 Spatial-Temporal Attention

In this subsection, the proposed spatial-temporal attention block is described in detail. In spatial dimensions, the spatial attention module captures the correlations between nodes. In temporal dimensions, the temporal attention module captures the relationships between graph signals at different time slices, as shown in Fig. 2. The Softmax function is employed as the activation function.

Fig. 2. Detail of STAtt Block

Temporal Attention Module. A multi-head temporal attention module is proposed in this subsection, which captures the relationships between different time steps adaptively. To model the nonlinear correlations within the road network, taking X^(l) ∈ R^(N×C×T) as the input, the attention mechanism is calculated with queries (Q), keys (K) and values (V), which is described as:

X^(l) W_q^(h) = Q^(l),   X^(l) W_k^(h) = K^(l),   X^(l) W_V^(h) = V^(l)   (4)

Att(Q^(l), K^(l), V^(l)) = Softmax( Q^(l) (K^(l))^T / √d_h ) V^(l)   (5)

where Q(l) , K (l) , V (l) are projected H times with H different matrices, which exploit (l) effectively the dynamic temporal dependency within traffic data. XTA is the input of l th TA module.



X_{TA}^{(l)} = \mathrm{concat}\left(Att(Q^{(l,1)}, K^{(l,1)}, V^{(l,1)}), \ldots, Att(Q^{(l,H)}, K^{(l,H)}, V^{(l,H)})\right)    (6)

Spatial Attention Module. An improved multi-head self-attention is designed to capture spatial dependencies. In order to aggregate accurate information between nodes all over the network, closed loops are added into the attention mechanism:

P^{(h)} = \mathrm{Softmax}\left(\frac{(X_{TA}^{(l)} W_q^{(h)})(X_{TA}^{(l)} W_k^{(h)})^{T}}{\sqrt{d_h}} + W^{(h)} \odot A_{TCMA}\right)    (7)

P_{SA}^{(l)} = \mathrm{concat}\left(P^{(1)}, P^{(2)}, \ldots, P^{(H)}\right)    (8)

where W^{(h)} is a learnable parameter used to amend A_{TCMA}, adjusting the attention of each head.

3.4 Spatial-Temporal Convolution

In this subsection, a spatial-temporal convolution block is proposed, which is shown in Fig. 3. In the temporal dimension, 2D convolution extracts features from neighboring time slices. In the spatial dimension, graph convolutional neural networks are used to aggregate and transmit information in non-Euclidean space; thus, the spatial correlations are captured.

Fig. 3. STConv Block

Temporal Convolution Module. Temporal dependency is an important feature of time-series data. The most representative model for predicting time series is the Recurrent Neural Network (RNN); however, it is limited by vanishing and exploding gradients. In 1997, Hochreiter et al. proposed Long Short-Term Memory (LSTM) to solve this problem [18]. Here, LSTM is utilized to extract temporal features. Combining these temporal features with the long-term trend predicted by the LSTM raises the accuracy of long-term prediction in the temporal dimension. Compared with predicting the precise values of each node in the road network, predicting the overall trend of the road network is simpler; therefore, the trend is incorporated into the temporal convolution as a compensation. An LSTM cell is composed of an input gate, a forget gate and an output gate, as shown in Fig. 4. The information from the previous hidden state and the current input is concatenated and passed through a sigmoid function, whose output determines the probability of retention; higher values indicate a greater retention probability. The forget gate f_t determines which information is discarded or retained:

f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)    (9)


Fig. 4. Overview of LSTM

The hidden state information from the previous step and the current input are concatenated and passed through a sigmoid function, where the output value determines the probability of retention. The input gate i_t is then used to update the cell state:

i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)    (10)

C_t describes the current cell state, which is computed as

\tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C)    (11)

C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t    (12)

The output gate o_t is used to determine the value of the next hidden state, which contains information from the previous inputs. The previous hidden state and the current input are passed through a sigmoid function, the new cell state is passed through the tanh function, and their product gives the next hidden state, which is passed to the next step:

o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)    (13)

h_t = o_t \odot \tanh(C_t)    (14)

At the last time step t, the output y_t is obtained, which is given by:

y_t = \sigma(W h_t)    (15)
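In practice a library cell such as torch.nn.LSTMCell implements these gates; purely as an illustration of Eqs. (9)-(14), a single step can be sketched as follows, with all parameter names and sizes hypothetical.

```python
import torch

def lstm_step(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o):
    """One LSTM step following Eqs. (9)-(14); each W_* maps [h_{t-1}, x_t] to the hidden size."""
    z = torch.cat([h_prev, x_t], dim=-1)
    f_t = torch.sigmoid(z @ W_f + b_f)       # forget gate, Eq. (9)
    i_t = torch.sigmoid(z @ W_i + b_i)       # input gate, Eq. (10)
    c_tilde = torch.tanh(z @ W_c + b_c)      # candidate cell state, Eq. (11)
    c_t = f_t * c_prev + i_t * c_tilde       # cell state update, Eq. (12)
    o_t = torch.sigmoid(z @ W_o + b_o)       # output gate, Eq. (13)
    h_t = o_t * torch.tanh(c_t)              # hidden state, Eq. (14)
    return h_t, c_t

d_in, d_h = 4, 8
weights = [torch.randn(d_h + d_in, d_h) * 0.1 for _ in range(4)]
biases = [torch.zeros(d_h) for _ in range(4)]
h, c = torch.zeros(1, d_h), torch.zeros(1, d_h)
h, c = lstm_step(torch.randn(1, d_in), h, c,
                 weights[0], biases[0], weights[1], biases[1],
                 weights[2], biases[2], weights[3], biases[3])
```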

The fully connected layers map y_t to Y_L, which represents the trend. Combining the 2D convolution with the trend under a certain weight, Y_T^{(l)} is obtained by

Y_T^{(l)} = \mathrm{ReLU}(\Phi \ast X^{(l)} + W_L \odot Y_L)    (16)

where \Phi denotes the 2D convolution kernel.

Spatial Graph Convolution Module. Extracting complex spatial correlations is a key issue in traffic flow prediction. Convolutional neural networks only apply to Euclidean


spaces such as images, and cannot be applied to complex topological structures such as urban road networks. In recent years, graph convolutional neural networks have developed rapidly; they can process graph-structured data and capture spatial features between nodes by constructing filters and stacking multiple convolutional layers. The graph convolutional network module captures the spatial correlations of the urban road network effectively and improves the accuracy of traffic prediction. A spectral approach is used to implement this module: a signal is transformed into the spectral domain, convolved there, and transformed back to the node space.

The road network can be represented by a graph G. The Laplacian matrix is defined as L = D − A, characterizing the smoothness of signals on the graph. The normalized form of L is L = I_N − D^{-1/2} A D^{-1/2}, where the degree matrix D is a diagonal matrix with D_{ii} = \sum_{j} A_{ij} and I_N \in R^{N \times N} is the identity matrix. The Laplacian matrix is decomposed as L = U \Lambda U^{T}, where \Lambda = \mathrm{diag}([\lambda_0, \ldots, \lambda_{N-1}]) \in R^{N \times N} is a diagonal matrix whose entries are the eigenvalues of L, and the corresponding eigenvectors U = [u_1, \ldots, u_N] form a set of N bases for the vector space. According to the convolution theorem, the convolution of two signals can be viewed as the product of their Fourier transforms. We project the traffic flow signal x onto the N bases to get the spectral-domain representation \hat{x} = U^{T} x, and the convolution kernel y is projected into the spectral domain to get \hat{y}:

x \ast_G y = U(\hat{x} \odot \hat{y})    (17)

where \hat{y} = U^{T} y and \ast_G represents the graph convolution operation. This approach has two main drawbacks: (a) it relies on the eigenvalue decomposition of the Laplacian matrix, so the time complexity is O(n^3); (b) the time complexity of using the Laplacian matrix for the Fourier transform is O(n^2), which results in significant time overhead when the scale of the graph is large. To address these issues, Defferrard et al. proposed ChebyNet [10]. This module is based on ChebyNet to implement convolution on the graph structure, in which

g_{\beta}(\Lambda) = \sum_{k=0}^{K-1} \beta_k T_k(\Lambda)    (18)

g_{\theta} \ast_G x = \sum_{k=0}^{K-1} \beta_k T_k(\tilde{L}) x    (19)

The parameters βk are the coefficients of the k th term of the Chebyshev polynomials.







T_k(\tilde{L}) is the Chebyshev polynomial of order k with \tilde{L} as the independent variable, where \tilde{L} = \frac{2}{\lambda_{max}} L - I_N and \lambda_{max} is the maximum eigenvalue of L. T_0(\tilde{L}) = I_N, T_1(\tilde{L}) = \tilde{L}, and the recursive formula for the Chebyshev polynomials is T_k(\tilde{L}) = 2\tilde{L}\, T_{k-1}(\tilde{L}) - T_{k-2}(\tilde{L}). In order to promote the dynamic aggregation of road network node information, each term of the Chebyshev polynomial is element-wise multiplied with P_{SA}^{(l)}, which is defined in Sect. 3.3. Therefore, a new graph convolution is proposed as

g_{\theta} \ast_G x = \sum_{k=0}^{K-1} \beta_k \left(T_k(\tilde{L}) \odot P_{SA}^{(l)}\right) x    (20)


Compared with previous models, the proposed spatial graph convolution module not only aggregates information from neighboring nodes, but also combines information from distant nodes accurately. Thus, more relevant information is considered.
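A minimal NumPy sketch of the attention-modulated Chebyshev convolution of Eqs. (18)-(20) is given below, assuming a single feature channel; the scaled Laplacian, the recursion T_k = 2L̃T_{k-1} − T_{k-2} and the Hadamard product with P_SA follow the formulas above, while the function names, coefficients and random inputs are illustrative.

```python
import numpy as np

def scaled_laplacian(adj: np.ndarray) -> np.ndarray:
    """L~ = (2 / lambda_max) L - I_N, with L = I_N - D^{-1/2} A D^{-1/2}."""
    d = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    d_inv_sqrt[d > 0] = d[d > 0] ** -0.5
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    lam_max = max(np.linalg.eigvalsh(lap).max(), 1e-6)
    return 2.0 / lam_max * lap - np.eye(len(adj))

def cheb_conv_with_attention(x, adj, p_sa, betas):
    """Eq. (20): sum_k beta_k (T_k(L~) ⊙ P_SA) x for a single-channel signal x of shape (N,)."""
    L = scaled_laplacian(adj)
    T_prev, T_curr = np.eye(len(adj)), L                     # T_0 = I_N, T_1 = L~
    out = betas[0] * (T_prev * p_sa) @ x
    for k in range(1, len(betas)):
        out += betas[k] * (T_curr * p_sa) @ x
        T_prev, T_curr = T_curr, 2 * L @ T_curr - T_prev     # Chebyshev recursion
    return out

N = 6
adj = (np.random.rand(N, N) > 0.6).astype(float)
adj = np.maximum(adj, adj.T)
np.fill_diagonal(adj, 0)
y = cheb_conv_with_attention(np.random.rand(N), adj, np.random.rand(N, N), betas=[0.5, 0.3, 0.2])
```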

4 Experiment

In this section, the effectiveness of CLSTGCN is demonstrated by performing experiments on two real-world spatial-temporal traffic network datasets. All experiments are compiled and tested on a Linux cluster (CPU: Intel(R) Xeon(R) Gold 6226R @ 2.90 GHz, GPU: NVIDIA GeForce RTX 3090 Founders Edition). The mean square error (MSE) is used as the loss function. During the training phase, the learning rate is 0.0001 and the optimizer is Adam.

4.1 Dataset

In this paper, two datasets are employed: the PEMSD7 dataset collected by the performance measurement system of the California Transportation Bureau in 2018, and the data collected by the Hisense Intelligent Traffic Signal Control System of the JiNan urban transportation network (JiNan dataset). The sensors collect data every five minutes. The details of the datasets are shown in Table 1.

Table 1. Dataset description

Dataset   #Sensors   Time span
PeMSD4    883        5/1/2017–8/31/2017
JiNan     261        9/30/2020–10/10/2021

4.2 Model Evaluation

Traffic flow prediction is essentially a regression problem, in which three performance indicators are usually used to evaluate the effectiveness of the proposed model: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE) and Mean Absolute Percentage Error (MAPE).

RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}    (21)

MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|    (22)

MAPE = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|    (23)

where n denotes the sample size, and y_i and \hat{y}_i indicate the true value and the estimated value, respectively.
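For reference, Eqs. (21)-(23) can be computed with a few NumPy one-liners; this is a generic sketch, not the authors' evaluation code.

```python
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))                         # Eq. (21)

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))                                 # Eq. (22)

def mape(y_true, y_pred, eps=1e-8):
    return 100.0 * np.mean(np.abs((y_true - y_pred) / (y_true + eps)))      # Eq. (23)
```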


4.3 Experiment Result and Analysis

The proposed model is compared with six baselines: DCRNN [19], STGCN [15], ASTGCN [14], STGODE [20], STFGNN [21] and DSTAGNN [22]. We divided each dataset into training, test and validation sets at a ratio of 6:2:2 according to the time steps. The past 12 continuous time steps are employed to predict the future 12 continuous time steps. The time cost of the proposed model is lower than that of the baselines. The average prediction performance for the traffic flow in the next hour, obtained by the proposed models and the baselines on both datasets, is shown in Table 2. DCRNN combines diffusion convolution and an RNN to predict traffic flow; however, its ability for long-term prediction is limited by the RNN. STGCN and ASTGCN combine CNNs and GNNs to capture spatial-temporal features, but the connectivity of the nodes is described by a pre-defined static adjacency matrix, which cannot reflect the changes in the road network. STGODE, STFGNN and DSTAGNN have achieved good prediction results: STGODE and STFGNN use dynamic graph convolution or build sparse matrices to capture the spatial information of connected nodes, and DSTAGNN captures node relationships by computing node similarity. The performance of the proposed CLSTGCN on the PEMSD7 dataset is better than all the baselines; on the JiNan dataset, only the MAPE is slightly worse than that of Ours_CL. Overall, the proposed model performs well in the field of traffic flow prediction.

Table 2. Average performance comparison of different models on the PEMSD7 and JiNan datasets

Method            PEMSD7                      JiNan
                  MAE     MAPE    RMSE        MAE    MAPE    RMSE
DCRNN (2017)      25.30   11.73   36.52       4.38   11.99    8.21
STGCN (2018)      26.38   10.57   40.72       4.97   12.64    8.55
ASTGCN (2019)     25.22   11.07   39.24       5.19   12.35    8.44
STGODE (2021)     22.59   10.23   37.54       4.33   11.56    8.91
STFGNN (2021)     22.15   10.14   35.80       5.48   13.88   10.16
DSTAGNN (2022)    21.42    9.15   34.93       3.77   11.02    7.88
Ours_LT           21.02    9.02   35.12       3.74   11.69    8.02
Ours_CL           20.54    8.71   35.67       3.94   10.82    7.82
Ours              20.14    8.54   34.41       3.61   11.03    7.69

In order to estimate the influence of each module in the proposed model, Ours_LT represents the proposed model without the long-term trend and Ours_CL represents the model without the closed loop. The traffic flow in the next two hours is predicted on the PEMSD7 dataset, as shown in Fig. 5. Compared with the Ours_LT and Ours_CL models, the proposed CLSTGCN effectively improves the accuracy of traffic flow prediction. Meanwhile, the experiment demonstrates that the long-term trend and the closed loop play a role in long-term forecasting and in aggregating node information.


Fig. 5. The comparison curves among proposed CLSTGCN, Ours_LT and Ours_CL in PEMSD7

5 Conclusion

CLSTGCN, designed based on spatial-temporal correlation mining, has been proposed in this paper to improve the prediction accuracy of traffic flow. In the temporal dimension, the attention mechanism incorporates the temporal trend to improve the accuracy of long-term prediction. In the spatial dimension, the nodes with negative correlations form multiple closed loops, which contain intrinsic spatial dependence and promote the aggregation of node correlation information. Experiments on two real-world datasets demonstrated that the proposed model improves prediction accuracy significantly.

Acknowledgments. This research was funded by the Natural Science Foundation of Shandong Province for Key Project under Grant ZR2020KF006, the National Natural Science Foundation of China under Grant 62273164, the Development Program Project of Youth Innovation Team of Institutions of Higher Learning in Shandong Province, and the Project of Shandong Province Higher Educational Science and Technology Program under Grants J16LB06 and J17KA055.

References 1. Van Der Voort, M., Dougherty, M., Watson, S.: Combining Kohonen maps with ARIMA time series models to forecast traffic flow. Transport. Res. Part C Emerg. Technol. 4(5), 307–318 (1996) 2. Sun, S., Zhang, C., Yu, G.: A bayesian network approach to traffic flow forecasting. IEEE Trans. Intell. Transp. Syst. 7(1), 124–132 (2006) 3. Lu, Z., Zhou, C., Wu, J.: Integrating granger causality and vector auto-regression for traffic prediction of large-scale WLANs. KSII Trans. Internet Inf. Syst. (TIIS) 10(1), 136–151 (2016) 4. Davis, G.A., Nihan, N.L.: Nonparametric regression and short-term freeway traffic forecasting. J. Transp. Eng. 117(2), 178–188 (1991) 5. Zhang, L.D., Jia, L., Zhu, W.X.: Overview of traffic flow hybrid ANN forecasting algorithm study. In: 2010 International Conference on Computer Application and System Modeling, pp. 1–615. IEEE, Taiyuan (2010)


6. Rui, F., Zuo, Z., Li, L.: Using LSTM and GRU neural network methods for traffic flow prediction. In: 2016 31st Youth Academic Annual Conference of Chinese Association of Automation, pp. 324–328. IEEE, Wuhan (2016) 7. Yu, R., Li,Y., Shahabi, C., et al.: Deep learning: a generic approach for extreme condition traffic forecasting. In: Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 777–785. SIAM, Charleston (2017) 8. Cao, M., Li, V., Chan, V.: A CNN-LSTM model for traffic speed prediction. In: 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring), pp. 1–5. IEEE, Virtual Conference (2020) 9. Bruna, J., Zaremba, W., Szlam, A., et al.: Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203 (2013) 10. Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: Advances in neural information processing systems, pp. 3844–3852 , MIT Press, Barcelona (2016) 11. Kipf, T.N., Welling, M., Semi-supervised classification with graph convolutional networks. In: International Conference on Learning Representations. OpenReview.net, Toulon (2017) 12. Duvenaud, D., Maclaurin, D., Aguilera-Iparraguirre, J., et al.: Convolutional networks on graphs for learning molecular fingerprints. In: Advances in Neural Information Processing Systems 28, pp. 2224–2232. MIT Press, Montreal (2015) 13. Yu, B., Yin, H., Zhu, Z.: Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. In: International Joint Conference on Artificial Intelligence, pp. 3634–3640. Morgan Kaufmann, Stockholm (2018) 14. Guo, S., Lin, Y., Feng, N., et al.: Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In: National Conference on Artificial Intelligence Association for the Advancement of Artificial Intelligence, pp. 922–929. AAAI, Hawaii (2019) 15. Bai, L., Yao, L., Kanhere, S.S., et al.: Spatio-temporal graph convolutional and recurrent networks for citywide passenger demand prediction. In: the 28th ACM International Conference, pp. 2293–2296. ACM, Beijing (2019) 16. Zhu, H., Luo, Y., Liu, Q., et al.: Multistep flow prediction on car-sharing systems: a multigraph convolutional neural network with attention mechanism. Int. J. Softw. Eng. Knowl. Eng. 29(11n12), 1727–1740 (2019) 17. Song, C., Lin, Y., Guo, S., et al.: Spatial-temporal synchronous graph convolutional networks: a new framework for spatial-temporal network data forecasting. In: Association for the Advancement of Artificial Intelligence, pp. 914–921. AAAI, New York(2020) 18. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997) 19. Li, Y., Yu, R., Shahabi, C., et al.: Diffusion convolutional recurrent neural network: data-driven traffic forecasting. arXiv preprint arXiv:1707.01926 (2017) 20. Fang, Z., Long, Q., Song, G., et al.: Spatial-temporal graph ODE networks for traffic flow forecasting. In: the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 364–373. Singapore (2021) 21. Li, M., Zhu, Z.: Spatial-temporal fusion graph neural networks for traffic flow forecasting. In: the AAAI Conference on Artificial Intelligence, pp. 4198–4196. AAAI, Virtual (2021) 22. Lan, S., Ma, Y., Huang,W., et al.: DSTAGNN: Dynamic spatial-temporal aware graph neural network for traffic flow forecasting. In International Conference on Machine Learning, pp.11906–11917. Baltimore, PMLR (2022)

A Current Prediction Model Based on LSTM and Ensemble Learning for Remote Palpation Fuyang Wei, Jianhui Zhao(B) , and Zhiyong Yuan(B) School of Computer Science, Wuhan University, Wuhan, China {jianhuizhao,zhiyongyuan}@whu.edu.cn

Abstract. As an important technology of virtual reality, tactile reproduction enables users to touch and perceive virtual objects. The emergence of remote palpation technology based on tactile reproduction provides a new idea for disease diagnosis. Remote palpation technology collects tactile perception from the patient side, transmits it to the doctor side through the network, and uses various tactile devices to reproduce it. However, tactile reproduction has high accuracy and real-time requirements. Our group designed a handheld tactile perception and reproduction system for remote palpation. This system generates tactile force feedback by driving coil currents in an electromagnetic tactile device. Therefore, it is very important to establish a fast and accurate current prediction model to generate tactile feedback. We observe temporal relationships in tactile information, so we take advantage of the time series prediction model LSTM and the regression prediction model GRNN, and use the idea of ensemble learning to build a more powerful and accurate current prediction model. We conduct comprehensive experiments, and the experimental results show that our proposed method helps to improve the accuracy and speed of remote palpation. Keywords: Current Prediction Model · LSTM · Ensemble Learning · Remote Palpation

1 Introduction Human palpation is a method that directly utilizes tactile feedback for medical diagnosis. Doctors touch and press the skin, fascia, muscles and other parts of the patient's body, and judge their functional status and lesion degree according to the stiffness of the soft tissue [1]. With the advancement of virtual reality research and communication technology, remote palpation has become a new way for doctors to diagnose patients without contact. The wide application of this technology faces two problems: tactile perception and tactile reproduction. Tactile perception usually uses various sensors to capture information such as position, posture, and pressure when generating tactile information. Antonia et al. [2] fabricated a wearable fingertip tactile device to simulate the surface stiffness properties of remote objects. The device uses an inertial measurement unit to track finger movements. Filippeschi et al. [3] in Italy designed a handheld tactile interface for remote palpation,


allowing users to obtain mechanical tactile feedback corresponding to the contact site while remotely touching the patient's abdomen. The tactile data acquired at the patient end should be preserved so that similar devices can be used for tactile reproduction. Tactile reproduction enables humans to acquire information by generating kinesthetic feedback [4]. Existing kinesthetic feedback devices usually use high-precision mechanical structures with torque motors to generate force feedback. For example, the Geomagic Touch X [5] realizes the free movement of the operating handle in space through the series connection of four joints, and the first two active joints are controlled to output feedback force to the operating handle. This type of mechanical device has the characteristics of high position detection accuracy and a large output force range, but the mechanical linkage and inherent friction limit the flexibility and accuracy of haptic reproduction. More researchers have studied electromagnetic tactile devices. Weiss et al. [6] independently controlled the strength of each coil's magnetic field to superimpose the electromagnetic attraction of each coil to generate a preset force feedback. Zhang et al. [7] proved the linear relationship between the current of a single coil and the feedback force within a specific range when using an electromagnetic coil array for 3D surface simulation. Alaa et al. [8] combined the position feedback of the joystick with a tactile rendering algorithm to provide a controllable electromagnetic force to the permanent magnet. Electromagnetic tactile devices generate force feedback without friction, and have become one of the ways of tactile reproduction in remote palpation. Our group designed an electromagnetic tactile device for remote palpation scenarios. Compared with simply using a magnetic field to simulate tactile feedback, the data of remote palpation have continuous and periodic changes. For example, doctors usually press the skin at a fixed frequency to sense skin elasticity, and the touch position also changes continuously. Inspired by this, we simultaneously pay attention to the spatial position and pressure changes, capture the time series relationships therein, build a time series prediction model and a regression prediction model, and use the idea of ensemble learning for model fusion to obtain a more robust current prediction model. The main contributions of this paper are as follows: • We pay attention to the temporal relationship of the data and propose a current prediction model based on LSTM. • We use the idea of ensemble learning to integrate the time series prediction model and the regression prediction model to obtain a more ideal current prediction. • We conduct sufficient experiments, and the experimental results show that our method has better performance.

2 Related Work 2.1 Remote Palpation System Our group designed a handheld tactile perception and reproduction system for remote palpation to assist local doctors to perform palpation operations on remote patients. The whole system uses the tactile joystick as the carrier of tactile perception and feedback, and is divided into three modules: handheld tactile acquisition, remote tactile reproduction


and tactile information transmission. Among them, the hand-held palpation joystick is fixed with visual positioning markers, inertial sensors and pressure sensors, which are used to simultaneously collect the position, posture and pressure information of the joystick during the remote palpation process, and fuse them into tactile information for trajectory and force feedback. The tactile reproduction module at the local doctor end receives the collected tactile information, uses a high-frequency current to drive the coil array to generate a magnetic field, and generates a magnetic force on the permanent magnet at the tip of the joystick to simulate the force feedback collected at the remote patient end. The overall structure of the entire remote palpation system is shown in Fig. 1.

Fig. 1. The architecture of our designed remote palpation system. The three modules include tactile acquisition from remote patient end, tactile information transmission in network layer, and tactile reproduction in local doctor end.

We focus on how to utilize the transmitted tactile information to achieve stable force feedback generation in electromagnetic tactile devices. The electromagnetic tactile device consists of three iron core coils to form a coil array. The three coils are placed on the same plane in the form of an equilateral triangle, with a distance of 120 mm from each other and an inclination of 30° to the center of the three coils. According to the basic laws of electromagnetism, we can draw the following conclusions: • The current generation of each coil is independent of each other, and the energized coil can excite a magnetic field in space. • The magnetic fields of multiple coils are superimposed on each other in space, and the magnetic field at any position in space has a direct and clear strong coupling relationship with multiple coils. • The magnetic force on the permanent magnet is the sum of the Ampere force of the space magnetic field on all the current microelements on the permanent magnet. In remote palpation, there are high requirements for the calculation time and accuracy of tactile reproduction, which not only requires low network delay for information transmission, but also requires the establishment of a fast and accurate feedback force calculation model. According to Biot Savart’s law, when the position and posture of the energized coil is fixed and the energized current is constant, the magnetic induction


intensity generated at a certain point in space is proportional to the magnitude of the coil current. In addition, the magnetic induction intensity of the magnetic field generated by multiple energized coils at the same point conforms to the principle of vector superposition. Therefore, we can calculate the magnetic field produced by the three coils at any point in space.

B = \int_{L} \frac{\mu I \, dl \times e}{4 \pi R^{2}}    (1)

\begin{pmatrix} B_x \\ B_y \\ B_z \end{pmatrix} = \begin{pmatrix} A_{1x} & A_{2x} & A_{3x} \\ A_{1y} & A_{2y} & A_{3y} \\ A_{1z} & A_{2z} & A_{3z} \end{pmatrix} \begin{pmatrix} I_1 \\ I_2 \\ I_3 \end{pmatrix}    (2)

Among them, B represents the magnetic induction intensity, μ represents the vacuum permeability, I represents the magnitude of the coil current, dl represents an element of the current path, e represents the direction vector from the current element to the field point, R is the distance between the spatial position and the current element, and A_{ix} represents a constant parameter. After obtaining the spatial magnetic field strength, the current microelements of the permanent magnet can be integrated to calculate the magnetic force on the permanent magnet.

F = \iiint_{v} r \times (\nabla \times M) \times B \, dv    (3)

Among them, r represents the direction vector, ∇ represents the differential operator, and M represents the magnetization of the permanent magnet. Obviously, we can calculate the analytical formulas relating the force feedback and the coil currents by solving the equations, but it is difficult to ensure the accuracy of the solution due to uncertain physical parameters, and the calculation of the analytical solution also struggles to meet the real-time requirements. Therefore, a faster and more accurate feedback force calculation model is needed.

2.2 Recurrent Neural Network and Ensemble Learning

Recurrent Neural Network (RNN) is a neural network specially designed to process time series data [9]. It works on the principle of saving the output of a particular layer and feeding it back to the input in order to predict the output of the layer. The range of context information stored by standard recurrent neural network structures is limited, which limits the application of RNNs. The long short-term memory (LSTM) unit was proposed to solve the problem of gradient disappearance in the time dimension of standard recurrent neural networks [10]. The LSTM unit uses the input gate, output gate and forget gate to control the transmission of sequence information, so as to realize the preservation and transmission of a wide range of context information. Since we observe that the tactile data has a temporal relationship, we can use LSTM to model the tactile data. Ensemble learning is the combination of multiple weakly supervised models to obtain a better and more comprehensive strongly supervised model [11]. Different learners


have different preference models, but each is a weakly supervised model, and ensemble learning combines multiple weakly supervised models to obtain a good strongly supervised model. In this way, different learners correct each other's errors to achieve the final accuracy improvement. In the current prediction model that we apply to the remote palpation scenario, both the regression prediction model and the time series prediction model serve as weakly supervised models, and we combine them to obtain a prediction model with higher accuracy.

3 Method

We propose a novel current prediction model for more accurate generation of force feedback in remote palpation. This model combines LSTM and GRNN with ensemble learning. We uniformly define the input as s, including the spatial coordinates x, y, z of the joystick and the force feedback f. The output is the current, denoted by I. The overall structure of our current prediction model is shown in Fig. 2.

Fig. 2. The overall structure of our current prediction module. It includes LSTM as the time series prediction model and GRNN as the regression prediction model; these two models are fused with the idea of ensemble learning.

3.1 Temporal Relationships in Spatial Position and Force Feedback

In classic electromagnetic tactile devices, analytical calculation is usually used to solve for the magnitude of the force feedback, which is difficult to compute. However, thanks to the electromagnetic physics simulation software Maxwell, we can obtain


the data in the electromagnetic tactile device more quickly. The parameter settings of the simulation software Maxwell are shown in Table 1. We use the finite element method to calculate the magnetic force on the permanent magnet by adjusting the spatial position of the permanent magnet and the value of the excitation current in the coil array. We collected 1000 items of data; the current range of each item is from 0 to 10000 mA, and the spatial coordinates are limited to the spherical space with a radius of 500 mm around the center of the three-coil array. We found that the spatial position always changes continuously in the collected data, and it is impossible for the joystick to jump discretely from one position to another. Taking a typical pressing palpation operation as an example, the spatial position of the joystick usually descends and then ascends in the vertical direction. We compose a set of data with a time series transformation relationship into a tactile operation. Since the frequency of the tactile sensor is usually 20 Hz and a tactile operation lasts for 2 s, a tactile operation includes 40 pieces of data. We naturally think of using a time series prediction model to predict the current values in the next tactile operation.

Table 1. The parameter settings of the simulation software Maxwell.

Parameter                                                     Value
Radius of the permanent magnet (mm)                           5
Number of turns of copper wire                                1041
Height of the permanent magnet (mm)                           10
Radius of pure iron core (mm)                                 12.5
Inner diameter of copper conduit (mm)                         15
Height of pure iron core (mm)                                 50
Outer diameter of copper conduit (mm)                         37.5
Height of copper conduit (mm)                                 46
Diameter of copper (mm)                                       0.84
Conductivity of copper wire (10^7 S/m)                        5
Magnetizing current density of permanent magnet (10^6 A/m)    1.15
Magnetizing current density of pure iron core (10^4 A/m)      3.50

3.2 Current Prediction Model Based on LSTM and GRNN

As mentioned in Sect. 3.1, there is a temporal relationship between the spatial position and the force feedback across multiple tactile operations, so we can use the information from the previous tactile operation to predict the coil current values in the current tactile operation. Specifically, the data unit s_t = {x_t, y_t, z_t, f_t} is fed into the LSTM along with the cell state c_{t−1} and hidden state h_{t−1}. The gate control unit performs mixed operations on the input and produces an output after the forgetting stage and the memory stage. The direct output of the current state is mapped to the expected current value I_p through a linear layer. At the same time, the cell state c_t and hidden state h_t are generated and passed to the next time step, so that the prediction of the next state can use the previous time series information.


In each tactile operation, the input of the time series prediction model is a set of time-related 4-dimensional vectors, including the three-dimensional spatial coordinates and the force feedback, and the output is the predicted currents corresponding to the next tactile operation. The spatial coordinates x and y of the input should remain the same, while the vertical position z and force feedback f should first increase and then decrease as the time step increases. Therefore, within one tactile operation, we can use this LSTM-based model to predict the magnitude of the coil excitation current. Since the current is affected by the spatial position and force feedback, we also use the generalized regression neural network (GRNN [12]) as the basic regression model. The input of this network is the spatial position coordinates x, y, z of the joystick and the force feedback f on the joystick, which can be uniformly defined as s; the output is the excitation current I of the electromagnetic coils. We do not need to divide the data into multiple tactile operations according to the temporal relationship; instead, we use the powerful fitting ability of GRNN to model all the data, so as to realize the prediction of coil excitation current pervasively. The generalized regression neural network divides the network structure into four parts: an input layer, a pattern layer, a summation layer and an output layer. Compared with the basic BP neural network, the purpose of the added pattern layer is to calculate the Gaussian function value between the test sample and each training sample; the first node of the summation layer is the arithmetic sum of the outputs of the pattern layer, and the outputs of the other nodes of the summation layer are weighted sums of the outputs of the pattern layer. When we specify the spatial coordinates and force feedback of any point, we can predict the magnitude of the excitation current that needs to be provided to the coils.

3.3 Model Fusion with Ensemble Learning

The time series prediction model based on LSTM and the regression prediction model GRNN are good at local data prediction and global data prediction, respectively, and each achieves better performance in some data ranges. Compared with regression prediction, prediction using the time series relationship is more in line with the changing law of single-tactile-operation data, while direct regression prediction is universal and shows good performance on global data. We fuse these two models so that the fused model achieves accurate current prediction within one tactile operation and tolerance to abnormal data changes. We define I_LSTM as the current prediction of the LSTM model and I_GRNN as the current prediction of the GRNN model. Thus, the overall current prediction of the fused model is defined as:

I = \lambda_1 I_{LSTM} + \lambda_2 I_{GRNN}    (4)

Note that for an input of arbitrary spatial position and force feedback, we only fuse the prediction results of the time series prediction model and the regression prediction model at prediction time, so these two models are trained independently. We estimate the weights of the two models from their performance on the same test set during the experiments. A minimal sketch of this fusion is given below.
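The sketch below illustrates Eq. (4) under simplifying assumptions: the GRNN output is approximated by Gaussian-kernel (Nadaraya-Watson) weighting over the training samples, which is what the pattern and summation layers compute, while the LSTM prediction is taken as given. The bandwidth, fusion weights, data and all names are hypothetical.

```python
import numpy as np

def grnn_predict(s_query, S_train, I_train, sigma=0.5):
    """GRNN-style output: Gaussian-kernel weighted average of the training currents."""
    d2 = np.sum((S_train - s_query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return float(np.dot(w, I_train) / (np.sum(w) + 1e-12))

def fused_current(i_lstm, i_grnn, lam1=0.5, lam2=0.5):
    """Eq. (4): weighted combination of the two base predictors."""
    return lam1 * i_lstm + lam2 * i_grnn

# Toy usage: 412 training samples of (x, y, z, f) mapped to a coil current in mA.
S_train = np.random.rand(412, 4)
I_train = np.random.rand(412) * 10000.0
i_grnn = grnn_predict(np.random.rand(4), S_train, I_train, sigma=0.5)
i_hat = fused_current(i_lstm=5200.0, i_grnn=i_grnn, lam1=0.6, lam2=0.4)
```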


4 Experiments

4.1 Dataset and Training Process

We collect training and testing data from the simulation software Maxwell. Each piece of data includes the spatial position coordinates x, y, z, the force feedback f and the coil excitation current I. We have described the size and format of the training data in Sect. 3.1. In order to verify the performance of the proposed model, we additionally collected 40 pieces of data, which belong to the same tactile operation and have the same format as the training data. When training the LSTM-based time series prediction model, we divide the training data into multiple tactile operations. There is a temporal relationship between multiple tactile operations, so we use existing tactile data to predict the future expected current values. The data guarantee the consistency of the xOy-plane coordinates, and the vertical coordinate z and force feedback f follow related trends. In other words, we fully simulate the palpation operation in a real scene. We used a total of 25 tactile operations for training, covering multiple basic pressing tactile types. As for training the GRNN model, we build a multi-layer neural network. GRNN does not need to train network parameter weights like a classic neural network, nor does it need activation functions to process neurons; we only need to determine the training dataset and the hyperparameter of the Gaussian function. In order to balance the training time and calculation accuracy, we selected 412 pieces of data from the training set, and the hyperparameter was set to 0.5.

4.2 Evaluation Metrics and Compared Methods

Our proposed current prediction model essentially solves a regression task. We use RMSE (Root Mean Square Error), MAE (Mean Absolute Error) and R2 (R-Square) as evaluation metrics to assess the performance of the model on the regression task. For the RMSE and MAE metrics, a lower value means better model performance, while for the R2 metric, a higher value means better model performance. The formulas for these indicators are described below:

RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}    (5)

MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|    (6)

R^2 = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y}_i)^2}    (7)







where n is the number of test data, \hat{y}_i represents the predicted coil current value, y_i represents the ground truth of the coil current value, and \bar{y}_i represents the average of the real coil current values. We calculate these three metrics to demonstrate the effectiveness of our proposed method by comparing it with the basic BP neural network, the single GRNN model and the single LSTM-based model.


4.3 Experimental Result

We use the proposed current prediction model to conduct experiments on the test data, and the relationship between the predicted results and the real results is shown in Fig. 3. It can be seen that the current values predicted by our proposed model are basically consistent with the ground truth, and the data variation is consistent with the pressing type in the real tactile operation.

Fig. 3. Comparison of the predicted current value and the ground truth on the testing set.

Compared to the basic and single models, our proposed current prediction model shows better performance. The performance comparison of the models is shown in Table 2.

Table 2. The comparison of regression performance with different methods.

Method         RMSE     MAE     R2
BPNN           90.62    74.47   0.9917
GRNN           67.96    55.84   0.9953
LSTM           54.37    44.68   0.9970
GRNN + LSTM    45.31    37.23   0.9979

5 Conclusion

In this article, we proposed a new current prediction method for the electromagnetic tactile device used in remote palpation, which combines the time series prediction model LSTM and the regression prediction model GRNN with the idea of ensemble learning. We noticed the


temporal relationship in the data and defined the concept of a tactile operation. Our proposed method takes advantage of this temporal relationship and achieves better performance across multiple tactile operations. The proposed method helps the electromagnetic tactile device built by our group to generate more accurate force feedback, so as to meet the accuracy requirements of remote palpation. In future work, we will investigate more complex types of tactile manipulations and apply them in remote palpation.

Acknowledgements. This work was supported by the Natural Science Foundation of China under Grant No. 62073248.

References 1. Scimeca, L., Hughes, J., Maiolino, P., et al.: Action augmentation of tactile perception for soft-body palpation. Soft Rob. 9(2), 280–292 (2022) 2. Tzemanaki, A., Al, G.A., Melhuish, C., et al.: Design of a wearable fingertip haptic device for remote palpation: characterisation and interface with a virtual environment. Front. Robot. AI 5, 62 (2018) 3. Filippeschi, A., Villegas, J.M.J., Satler, M., et al.: A novel diagnostician haptic interface for telepalpation. In: 2018 27th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), pp. 328–335. IEEE (2018) 4. Culbertson, H., Schorr, S.B., Okamura, A.M.: Haptics: the present and future of artificial touch sensation. Annual Rev. Control Robot. Auton. Syst. 1, 385–409 (2018) 5. Jiang, Y., Yang, C., Wang, X., et al.: Kinematics modeling of geomagic touch x haptic device based on adaptive parameter identification. In: 2016 IEEE International Conference on Realtime Computing and Robotics (RCAR), pp. 295–300. IEEE (2016) 6. Weiss, M., Wacharamanotham, C., Voelker, S., et al.: FingerFlux: near-surface haptic feedback on tabletops. In: Proceedings of the 24th annual ACM symposium on User interface software and technology, pp. 615–620 (2011) 7. Zhang, Q., Dong, H., El Saddik, A.: Magnetic field control for haptic display: system design and simulation. IEEE Access 4, 299–311 (2016) 8. Adel, A., Abou Seif, M., Hölzl, G., et al.: Rendering 3D virtual objects in mid-air using controlled magnetic fields. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 349–356. IEEE (2017) 9. Yu, Y., Si, X., Hu, C., et al.: A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 31(7), 1235–1270 (2019) 10. Smagulova, K., James, A.P.: A survey on LSTM memristive neural network architectures and applications. Europ. Phys. J. Special Topics 228(10), 2313–2324 (2019) 11. Dong, X., Yu, Z., Cao, W., et al.: A survey on ensemble learning. Front. Comp. Sci. 14, 241–258 (2020) 12. Izonin, I., Kryvinska, N., Tkachenko, R., et al.: An extended-input GRNN and its application. Procedia Comput. Sci. 160, 578–583 (2019)

Multi-step Probabilistic Load Forecasting for University Buildings Based on DA-RNN-MDN Lei Xu1 , Liangliang Zhang1 , Runyuan Sun1 , Na Zhang1,2,3(B) , Peihua Liu1 , and Pengwei Guan4 1 Shandong Provincial Key Laboratory of Network Based Intelligent Computing,

University of Jinan, Jinan 250022, China [email protected] 2 Shandong Key Laboratory of Intelligent Buildings Technology, Jinan 250101, China 3 School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China 4 Shandong Qiuqi Analysis Instrument Co., Ltd., Jinan 250024, China

Abstract. Short-term load forecasting (STLF) of electricity consumption in university buildings is essential for the efficient management and operation of electricity systems. However, STLF for university buildings is challenging due to the difficulty in obtaining complete data of influencing factors and the weak cyclicality of electricity operation. This paper proposes a novel DA-RNN-MDN model to reach multi-step probabilistic load forecasting of university buildings. Specifically, we take advantage of dual-stage attention-based recurrent neural network (DA-RNN) in adaptively capturing the temporal dependencies of time series data to extract information from historical data. In addition, a hybrid structure of long short-term memory (LSTM) network and mixture density network (MDN) is designed to achieve multi-step probabilistic load forecasting, ensuring the time correlation between multiple results of load prediction. Experimental results indicate that for STLF in a university building, the probability distribution of future multi-step loads can be forecasted with good performance. Keywords: Probabilistic load forecasting · Multi-step prediction · University buildings · Dual-stage attention-based recurrent neural network · Mixture density network

L. Xu and L. Zhang—Contributed equally.

1 Introduction

STLF predicts the electrical demand for a future period, usually a day or a week. It plays a significant role in the overall power system. During the generation phase, STLF can help balance supply and demand, thereby preventing power shortages or excess and improving the system's reliability and safety. In the power consumption stage, STLF helps customers understand the upcoming changes in electricity consumption, allowing


them to plan their power usage more effectively, enhance power consumption efficiency, and reduce costs [1]. According to statistics, building operation accounts for over 30% of total electricity consumption in society [2]. Therefore, effective management of electricity consumption during building operation is important in reducing energy consumption and promoting energy saving [3]. STLF can effectively assist in achieving these goals. Specifically, for university buildings, STLF can help the university energy management department develop a reasonable electricity consumption plan, ensuring safe and reliable electricity operations. By analyzing the forecast results, STLF can help optimize equipment power consumption, reduce waste, and decrease electricity bills and carbon emissions, thus playing a critical role in university power system operations [4]. However, compared with large industrial, commercial or residential buildings, university buildings have significant fluctuations in electricity consumption due to various factors such as teaching schedules, student behavior, weather, holidays, and others [5]. These factors result in poor cyclical changes in electricity consumption. In addition, we usually cannot obtain comprehensive information that affects electricity consumption, which poses significant challenges for STLF technology. Most STLF research focuses on load value forecasting, but accuracy is reduced in universities due to incomplete information. Probabilistic load forecasting predicts probability distributions, which can provide more decision-making information. However, there are few studies on probabilistic load forecasting, and the existing forecasting models have weak abilities in extracting information from historical time-series data, resulting in poor prediction performance. In addition, unlike single-step load forecasting which provides forecasting information at a single point, multi-step prediction can provide continuous decision support, which can reflect load trends and has higher practical application value. Nevertheless, the mixed use of multi-step prediction and probabilistic load forecasting faces serious cumulative error problems. In this paper, to effectively address the issues mentioned above, we propose a novel multi-step probabilistic load forecasting (MSPLF) method based on DA-RNN-MDN, which allows us to forecast the probability distribution of future multi-step loads. Additionally, the performance is verified on an electricity load dataset from a university building that integrates both research and teaching. Experimental results demonstrate the good predictive capability and effectiveness of the proposed model. The main contributions are as follows: (1) A novel MSPLF is proposed for the STLF of university buildings, where multi-step and providing probabilistic distributions are the most significant features. (2) A DA-RNN-MDN model is proposed to extract essential information from historical data and to ensure the temporal correlation of predicted results for time series multistep probabilistic prediction problems. (3) A penalty-enhanced loss function based on negative log-likelihood (NLL) is adopted to improve the ability to handle high load cases. The subsequent sections of this paper are organized as follows: Sect. 2 provides a review of the pertinent literature. The research methodology is described in Sect. 3. Section 4 presents the experimental setup and results. Lastly, the paper makes a summary of the whole work in Sect. 5.


2 Related Work 2.1 Load Value Forecasting There are many approaches in the field of deep learning to solve the time series prediction problem. LSTM has been used for STLF in previous studies [6–8]. However, LSTM has three different gates, which leads to many parameters and makes it difficult to train. GRU has fewer parameters than LSTM, which makes it easier to converge. Therefore, GRU is widely used in STLF problems, such as predicting the load of a large area [9], predicting the short-term load of a community [10], etc. Many scholars use hybrid neural network models for load forecasting, such as combining the Convolutional Neural Network (CNN) and the LSTM [11], or the CNN and GRU model [12–14]. These models can capture features better and perform better in time series prediction problems. Seungmin Jung et al. [15] constructed an attention mechanism-based GRU model for multi-step load forecasting. Moreover, DA-RNN [16] was proposed to improve and realize multi-step load forecasting [17]. 2.2 Probabilistic Load Forecasting Probabilistic load forecasting is a technique that predicts the probability distribution of load, taking into account the uncertainty associated with incomplete information [18]. Various methods have been proposed by researchers to achieve this goal. For example, He et al. [19] combined CNN and Kernel Density Estimation (KDE) to predict short-term load probability, while Wang et al. [20] used quantile regression and kernel density estimation to output the probability distribution of load. Shepero and Yang et al. [21, 22] improved Gaussian Process to obtain probabilistic load forecasting results. Xu et al. [23] developed three probabilistic load forecasting models using Bayesian deep neural network combined with dropout technology. Liu et al. [24] built a hybrid model based on CNN and quantile regression, while literature [25] established a hybrid model combining Convolutional LSTM (ConvLSTM) and MDN [26] for short-term probability prediction of industrial loads. Additionally, literatures [27, 28] also developed probabilistic forecasting models for load forecasting. Challenge. In practical applications such as universities, incomplete information may limit the accuracy of load value forecasting, which in turn limits the useful information provided to decision-makers. Probabilistic load forecasting is a better choice, but the method of learning time variation patterns from historical data in load value forecasting research is worth learning. Additionally, implementing multi-step forecasting based on probabilistic load forecasting is more advantageous for decision-makers to observe load change trends and make decisions in advance. However, most existing probabilistic load forecasting studies only predict the load for a single time step. Further exploration is needed in related research. Effectively combining multi-step and probabilistic forecasting methods to achieve high accuracy remains a significant challenge.


3 Methodology

3.1 Technical Preliminaries

DA-RNN. Traditional nonlinear autoregressive with exogenous inputs (NARX) models used for time series forecasting face challenges in capturing long-term temporal correlations and in selecting the appropriate driving data for forecasting. The DA-RNN [17] model was introduced to overcome these limitations. It is an attention-based Encoder-Decoder model that uses LSTM in both the encoder and decoder. The encoder selects the relevant driving sequences using an input attention mechanism, and the decoder chooses the corresponding hidden layer states throughout the time steps using a temporal attention mechanism. These two attention mechanisms help the DA-RNN model to effectively extract useful information from the input features and capture the dependencies in time series data over an extended period, providing a better solution for the time series prediction problem.

MDN. MDN [27] is a neural network model that learns the parameters and corresponding weights of multiple normal distributions through a neural network. These are then weighted and combined to establish a continuous probability distribution model. Its primary objective is to introduce uncertainty by considering each output as a mixed distribution rather than a deterministic value or a simple Gaussian distribution. The mixed distribution can address the multi-value mapping problem that the Gaussian distribution cannot solve well. MDN is commonly utilized at the end of neural network models to provide more helpful information regarding the target variable to be predicted.

3.2 The Proposed Method

MSPLF for University Buildings. To address the issues of STLF in university buildings, an MSPLF method is proposed. MSPLF can not only provide the probability distribution of the load, but also achieve multi-step prediction. Figure 1 shows the flowchart of MSPLF for university buildings. Firstly, date-related data, weather-related data, and historical load data are considered as the original input data. After standardizing the original data and constructing the time series data, we generate the input X for DA-RNN-MDN. DA-RNN-MDN, with an Encoder-Decoder structure, is the time-series prediction model proposed in this paper, which is designed to achieve multi-step probabilistic forecasting. In this paper, DA-RNN-MDN is used to construct the MSPLF model for university buildings. Ultimately, through the output of DA-RNN-MDN, we can obtain the probability distribution of the load for the next 24 time steps. We can also obtain confidence intervals and point values of the future load by processing the probability distribution.


Fig. 1. The flow of MSPLF method for university buildings

Fig. 2. Structure of the DA-RNN-MDN model

to the target variable. Y-predict is the output of the model, which is the result of multi-step prediction and the prediction result of each step is a probability distribution. The model is based on an Encoder-Decoder architecture, where the entire DA-RNN model is used as the encoder to extract the most valuable information from large amounts of historical data and remove redundant information. The decoder is a hybrid structure of LSTM and MDN to implement multi-step probabilistic prediction and ensure temporal correlation between multiple predicted results, where the initial input is the output of the encoder and the previous load value. Additionally, the future characteristics of the first time step to be predicted are added to the input. These data are then input into the LSTM unit for learning, and output the results through MDN. The output of MDN consists of parameters for multiple general distributions and the weight for each distribution. In this study, we use Gaussian distribution as the general distribution. Therefore, the parameters output by MDN are the mean, standard deviation, and weight for each distribution, which are finally combined by weighted summation to form the mixture probability distribution


function P(x), which is formed by

P(x) = \sum_{k=1}^{K} \alpha_k \cdot \frac{1}{\sqrt{2\pi\sigma_k^{2}}} \cdot e^{\frac{-(x-\mu_k)^{2}}{2\sigma_k^{2}}}    (1)

where K is the number of base distributions, which is set by us and needs to be adjusted according to the prediction performance. α_k is the weight of the k-th Gaussian distribution, and μ_k and σ_k represent the mean and standard deviation of the k-th Gaussian distribution, respectively. α_k, μ_k and σ_k are all objectives to be learned by the model. The predicted value is obtained by sampling the predicted probability distribution multiple times and taking the mean value, and then the predicted value and the current hidden state of the LSTM are fed into the next step. This process is repeated at each time step until the end of the forecast horizon is reached. This approach guarantees the time correlation of future multi-step load forecasts and improves accuracy by using future-related information to decrease deviations in the predicted results and reduce the accumulation of errors.

Penalty-Enhanced Loss Function. For model training, we designed a penalty-enhanced loss function based on NLL for DA-RNN-MDN optimization. As the model is designed for multi-step prediction, in MSPLF we want to enhance its learning ability towards high values in the time series. Therefore, we adjust the loss according to each target value of the time series. Specifically, we introduce a weight for each time step, determined based on the average value of the time series. Suppose the length of the current time series is T and the value of the jth time step is x_j; then the weight w_j of this time step can be calculated as

w_j = \max\left\{1, \frac{x_j}{\frac{1}{T}\sum_{t=1}^{T} x_t}\right\}    (2)

Finally, the resulting loss function of our model is defined as

Loss = -\frac{1}{N}\frac{1}{T}\sum_{n=1}^{N}\sum_{t=1}^{T} w_{nt}\left(\log P_{nt}(x)\right)    (3)

This function is used to evaluate the goodness of fit between the model and the observed data. In this equation, N represents the total number of samples and T represents the number of time steps in one sample. w_{nt} represents the weight of the nth target variable at time step t, and P_{nt}(x) represents the Gaussian mixture distribution of the nth target variable at time step t. The objective is to maximize the log-likelihood, which is equivalent to minimizing the NLL. By minimizing the NLL, we can obtain the maximum likelihood estimates of the P_{nt}(x) parameters.
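A minimal PyTorch sketch of the MDN output of Eq. (1) and the penalty-enhanced NLL of Eqs. (2)-(3) is shown below; the layer sizes, batch shapes and names are illustrative, and the sketch is not the authors' implementation.

```python
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    """Maps a decoder hidden state to K mixture weights, means and standard deviations (Eq. 1)."""
    def __init__(self, hidden_size: int = 128, k: int = 4):
        super().__init__()
        self.out = nn.Linear(hidden_size, 3 * k)

    def forward(self, h):
        alpha, mu, sigma = self.out(h).chunk(3, dim=-1)
        return torch.softmax(alpha, dim=-1), mu, nn.functional.softplus(sigma) + 1e-6

def penalty_nll(alpha, mu, sigma, y, w):
    """Eqs. (2)-(3): weighted negative log-likelihood of y under the Gaussian mixture."""
    comp = torch.distributions.Normal(mu, sigma)
    log_prob = comp.log_prob(y.unsqueeze(-1))                                # (batch, T, K)
    log_mix = torch.logsumexp(torch.log(alpha + 1e-12) + log_prob, dim=-1)   # log P_nt(y)
    return -(w * log_mix).mean()

# Toy usage: 24 predicted steps per sample, 4 mixture components.
head = MDNHead(hidden_size=128, k=4)
alpha, mu, sigma = head(torch.randn(8, 24, 128))
y = torch.rand(8, 24)
w = torch.clamp(y / y.mean(dim=1, keepdim=True), min=1.0)                    # Eq. (2)
loss = penalty_nll(alpha, mu, sigma, y, w)
```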

4 Experiments

4.1 Dataset

In the experiment, a typical building in a university that combines research and teaching was selected as the research object. We employed date-related, weather-related, and historical load data as features X. These features have been demonstrated to be critical


for predicting future loads and are also easily obtainable influencing factor data [31]. The target variable that we aim to predict is future load, denoted by Y. A detailed description of the features X and target variable Y can be found in Table 1. The dataset comprises 17,496 samples and covers the period from November 2021 to December 2022, recorded on an hourly basis. In this study, we divided the dataset into two subsets, with 80% of the data allocated to the training dataset and the remaining 20% designated as the test dataset.

Table 1. Table of data descriptions

X
  Date-related data: Month (1–12), Day of the month (1–31), Day of the week (1–7), Hour of the day (0–23), Holiday information (0, 1)
  Weather-related data: Temperature (°C), Humidity (%), Wind speed (m/s), Rainfall (mm/m2), Air pressure (hPa), Average cloud cover (%)
  Historical load: Load at the same time last week (kWh), Historical 48 time steps of load data in hours (kWh)
Y
  Future load: Future 24 time steps of load data in hours (kWh)

4.2 Data Processing

During data preprocessing, one-hot encoding is used to process the discrete date-related data. To mitigate the high dimensionality of the encoded data, we employ an autoencoder, which effectively reduces the complexity of subsequent model training. Additionally, both weather and load data were standardized. Finally, the time series dataset was constructed. After preprocessing, a sample consists of features X and target Y. X includes three parts: weather and date features for multiple past steps (X-history), load values for multiple past steps (Y-history), and the dates associated with the loads to be predicted in the future (X-target). Y is the future multi-step load value to be predicted. During this process, we calculate the corresponding weights for the target load values using the weight calculation shown in Eq. (2).
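A minimal sketch of how such (X-history, Y-history, X-target, Y) samples could be assembled from aligned hourly arrays; the window lengths and array names are illustrative assumptions:

```python
import numpy as np

def build_samples(load, exog, future_dates, hist_steps=48, pred_steps=24):
    """
    load:         (L,) hourly load series
    exog:         (L, F) weather + encoded date features aligned with load
    future_dates: (L, D) encoded date features (known in advance)
    Returns a list of (X_history, Y_history, X_target, Y) tuples.
    """
    samples = []
    for i in range(hist_steps, len(load) - pred_steps):
        x_hist = exog[i - hist_steps:i]            # past weather/date features
        y_hist = load[i - hist_steps:i]            # past load values
        x_targ = future_dates[i:i + pred_steps]    # date features of steps to predict
        y = load[i:i + pred_steps]                 # future multi-step load target
        samples.append((x_hist, y_hist, x_targ, y))
    return samples
```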


4.3 Parameters Settings and Evaluation Metrics

The input variables for the model consisted of the past 2×24 h of load data and other relevant data, with the output being the predicted load for the following 24 h. Both the encoder and decoder employed LSTMs with 128 neurons in the hidden layer. The feature maps were activated using the hyperbolic tangent (Tanh) function, and the attention mechanism was implemented by softmax. For the MDN, the base distribution was Gaussian with 4 components, indicating that a mixture of four Gaussian distributions was utilized to fit the target load distribution. The training process utilized AdamW as the optimizer, a batch size of 512 for the training dataset, and 50 epochs.

To validate the performance of our model, we utilize Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) as metrics to evaluate the results of load value prediction. In addition, the Continuous Ranked Probability Score (CRPS) and the Loss were employed to evaluate the probabilistic prediction results. The definition of Loss is shown in Eq. (3), whereas the definitions of the other error measures are as follows:

RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \frac{1}{T}\sum_{t=1}^{T} \left(y_{it} - \hat{y}_{it}\right)^2},    (4)

MAE = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{T}\sum_{t=1}^{T} \left|y_{it} - \hat{y}_{it}\right|,    (5)

CRPS = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{T}\sum_{t=1}^{T} \int \left(CDF(\hat{y}_{it}) - CDF(y_{it})\right)^2 dy,    (6)

where N represents the number of samples and T represents the number of time steps in the time series data. The symbol y_it denotes the actual value of the i-th target variable at time step t, and ŷ_it denotes the corresponding predicted value. CDF(ŷ_it) and CDF(y_it) represent the cumulative distribution functions (CDF) of the predicted and actual values of the i-th target variable at time step t, respectively.

4.4 Results

Upon completion of model training, we verified the performance of the proposed method using the test dataset. Figure 3 shows the mixture probability density distributions predicted by MSPLF and the real values for four time points of a sample in the test dataset. Specifically, the predicted mixture probability density distribution is constructed by utilizing the four weights of the MDN output in MSPLF, as well as the means and standard deviations of the Gaussian distributions. As shown in Fig. 3, the real values are close to the peaks of the mixture probability density distributions, reflecting that the proposed MSPLF has high prediction accuracy. In addition, the probability density distribution gives a visual picture of the load distribution, which makes the results easier to interpret. In particular, Fig. 3(a) and (b) present two peaks in the mixture probability density distribution, which provides more decision-making information than a single load value. Overall, the proposed MSPLF presents good performance in the STLF of the university building.


Fig. 3. The mixture probability density distribution predicted by MSPLF and real values

Fig. 4. The predicted multi-step results by MSPLF. Load values and confidence intervals are illustrated

We can obtain the predicted value from the predicted distribution by sampling multiple times and taking the average. Additionally, we can determine the confidence interval from the distribution by specifying the confidence level. Figure 4 depicts the predicted results for the next 24 hourly steps of load values and the corresponding confidence intervals for four samples in the test dataset. The load values are sampled ten times, and the confidence level for the confidence intervals is 80%. Although the curves of predicted values do not completely coincide with the curves of real values, they exhibit similar trends. In fact, increasing the number of samples drawn when obtaining the predicted value can further improve the agreement between the two curves, because more samples bring the predicted value closer to the load corresponding to the peak of the mixture probability density distribution. Additionally, the region formed by the confidence intervals almost covers the curves of the real values, which also verifies the good performance of the proposed MSPLF.
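A minimal sketch of turning predicted mixture parameters into a point forecast and an 80% interval by sampling; the sample count used for the interval and the illustrative parameter values are assumptions:

```python
import numpy as np

def sample_mixture(alpha, mu, sigma, n_samples=10, rng=None):
    """Draw samples from a K-component Gaussian mixture (one forecast step)."""
    rng = rng or np.random.default_rng()
    ks = rng.choice(len(alpha), size=n_samples, p=alpha / alpha.sum())
    return rng.normal(mu[ks], sigma[ks])

# Illustrative MDN output for one time step (weights, means, std devs).
alpha = np.array([0.4, 0.3, 0.2, 0.1])
mu = np.array([95.0, 110.0, 120.0, 130.0])     # kWh
sigma = np.array([5.0, 6.0, 8.0, 10.0])

draws = sample_mixture(alpha, mu, sigma, n_samples=1000)
point_forecast = draws.mean()                   # point forecast = sample mean
lower, upper = np.percentile(draws, [10, 90])   # 80% confidence interval
```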


Table 2. Performance comparison of various models on the test dataset

Forecasting methods            | Predicted Value (RMSE) | Predicted Value (MAE) | CRPS   | Loss
DA-RNN-MDN                     | 0.2996                 | 0.1282                | 0.0913 | −0.8794
Without penalty-enhanced loss  | 0.3567                 | 0.1476                | 0.1052 | −0.7845
LSTM-MDN                       | 0.3765                 | 0.1674                | 0.1553 | −0.6016
GRU-MDN                        | 0.3572                 | 0.1541                | 0.1059 | −0.7355
CNN-LSTM-MDN                   | 0.3726                 | 0.1612                | 0.1312 | −0.6670

Table 2 compares the performance of various models on the test dataset. The DA-RNN-MDN with and without the penalty-enhanced loss function are compared. For a fair comparison, building upon the MSPLF method proposed in this study, we also conducted experiments combining CNN, LSTM, and GRU models with MDN. All of the above experiments were conducted on the same dataset, and all models were evaluated using the metrics described in the previous subsection. As shown in Table 2, the DA-RNN-MDN model is superior to the other models in all metrics. Even without the penalty-enhanced loss function, DA-RNN-MDN still outperforms the other models in all metrics, which verifies the effectiveness of DA-RNN-MDN; the further improvement obtained with the penalty-enhanced loss function validates the effectiveness of that loss function.

5 Conclusion

This paper proposes an MSPLF method to address the issues of STLF in university buildings. MSPLF can not only provide the probability distribution of the load but also achieve multi-step prediction. To achieve this goal, a DA-RNN-MDN is proposed, which effectively learns crucial information from a large volume of historical time series data and gradually predicts the future multi-step probability density distribution by combining future characteristics. In the case of load forecasting for university buildings, our model takes historical electricity consumption, weather-related data, date-related data, and future date-related data as input. To improve the prediction accuracy for high power loads, a penalty-enhanced loss function is introduced to improve the learning ability of the model. Experiments illustrate that MSPLF is able to give a multi-step probability density distribution that contains abundant information and is more conducive to actual decision-making than load value prediction alone. In addition, the effectiveness of DA-RNN-MDN and the penalty-enhanced loss function is verified by comparison with other models. Going forward, we aim to enhance the prediction accuracy for an increased number of forecasted time steps. Additionally, we intend to explore online learning techniques to ensure the model's performance when the load situation changes over time.


Acknowledgements. This work was supported by Shandong Key Laboratory of Intelligent Buildings Technology (Grant No. SDIBT2021004), Opening Fund of Shandong Provincial Key Laboratory of Network Based Intelligent Computing and Project of Talent Introduction and Multiplication of Jinan City.

References 1. Wang, H., Alattas, K.A., Mohammadzadeh, A., et al.: Comprehensive review of load forecasting with emphasis on intelligent computing approaches. Energy Rep. 8, 13189–13198 (2022) 2. Zhang, L., et al.: A review of machine learning in building load prediction. Appl. Energy 285, 116452 (2021) 3. Wang, J., Chen, X., Zhang, F., Chen, F., Xin, Y.: Building load forecasting using deep neural network with efficient feature fusion. J. Mod. Power Syst. Clean Energy 9(1), 160–169 (2021) 4. Azeem, A., Ismail, I., Jameel, S.M., Harindran, V.R.: Electrical load forecasting models for different generation modalities: a review. IEEE Access 9, 142239–142263 (2021) 5. Wei, Q., Li, Q., Yang, Y., Zhang, L., Xie, W.: A summary of the research on building load forecasting model of colleges and universities in north China based on energy consumption behaviour: a case in north China. Energy Rep. 8, 1446–1462 (2022) 6. Kwon, B.S., Park, R.J., Song, K.B.: Short-term load forecasting based on deep neural networks using LSTM layer. J. Electr. Eng. Technol. 15, 1501–1509 (2020) 7. Muzaffar, S., Afshari, A.: Short-term load forecasts using LSTM networks. Energy Procedia 158, 2922–2927 (2019) 8. Kong, W., Dong, Z.Y., Jia, Y., et al.: Short-term residential load forecasting based on LSTM recurrent neural network. IEEE Trans. Smart Grid 10(1), 841–851 (2017) 9. Xiuyun, G., Ying, W., Yang, G., et al.: Short-term load forecasting model of GRU network based on deep learning framework. In: 2018 2nd IEEE Conference on Energy Internet and Energy System Integration (EI2), pp. 1–4 (2018) 10. Zheng, J., Chen, X., Yu, K., et al.: Short-term power load forecasting of residential community based on GRU neural network. In: 2018 International Conference on Power System Technology (POWERCON), pp. 4862–4868 (2018) 11. Tian, C., Ma, J., Zhang, C., et al.: A deep neural network model for short-term load forecast based on long short-term memory network and convolutional neural network. Energies 11(12), 3493 (2018) 12. Sajjad, M., Khan, Z.A., Ullah, A., et al.: A novel CNN-GRU-based hybrid approach for short-term residential load forecasting. IEEE Access 8, 143759–143768 (2020) 13. Wu, L., Kong, C., Hao, X., et al.: A short-term load forecasting method based on GRU-CNN hybrid neural network model. Math. Probl. Eng. 2020, 1–10 (2020) 14. Shen, M., Xu, Q., Wang, K., Tu, M., Wu, B.: Short-term bus load forecasting method based on CNN-GRU neural network. In: Xue, Y., Zheng, Y., Rahman, S. (eds.) Proceedings of PURPLE MOUNTAIN FORUM 2019-International Forum on Smart Grid Protection and Control. LNEE, vol. 585, pp. 711–722. Springer, Singapore (2020). https://doi.org/10.1007/ 978-981-13-9783-7_58 15. Jung, S., Moon, J., Park, S., et al.: An attention-based multilayer GRU model for multistepahead short-term load forecasting. Sensors 21(5), 1639 (2021) 16. Qin, Y., Song, D., Chen, H., et al.: A dual-stage attention-based recurrent neural network for time series prediction. arXiv preprint arXiv:1704.02971 (2017)


17. Siridhipakul, C., Vateekul, P.: Multi-step power consumption forecasting in Thailand using dual-stage attentional LSTM. In: 2019 11th International Conference on Information Technology and Electrical Engineering (ICITEE), pp. 1–6. IEEE (2019) 18. Chen, B., Islam, M., Gao, J., et al.: Deconvolutional density network: modeling free-form conditional distributions. Proc. AAAI Conf. Artif. Intell. 36(6), 6183–6192 (2022) 19. He, H., Pan, J., Lu, N., et al.: Short-term load probabilistic forecasting based on quantile regression convolutional neural network and Epanechnikov kernel density estimation. Energy Rep. 6, 1550–1556 (2020) 20. Wang, S., Wang, S., Wang, D.: Combined probability density model for medium term load forecasting based on quantile regression and kernel density estimation. Energy Procedia 158, 6446–6451 (2019) 21. Shepero, M., Van Der Meer, D., Munkhammar, J., et al.: Residential probabilistic load forecasting: a method using Gaussian process designed for electric load data. Appl. Energy 218, 159–172 (2018) 22. Yang, Y., Li, S., Li, W., et al.: Power load probability density forecasting using Gaussian process quantile regression. Appl. Energy 213, 499–509 (2018) 23. Xu, L., Hu, M., Fan, C.: Probabilistic electrical load forecasting for buildings using Bayesian deep neural networks. J. Build. Eng. 46, 103853 (2022) 24. Liu, R., Chen, T., Sun, G., et al.: Short-term probabilistic building load forecasting based on feature integrated artificial intelligent approach. Electric Power Syst. Res. 206, 107802 (2022) 25. Wang, Y.Y., Wang, T.Y., Chen, X.Q., et al.: Short-term probability density function forecasting of industrial loads based on ConvLSTM-MDN. Front. Energy Res. 10, 405 (2022) 26. Bishop, C.M.: Mixture density networks (1994) 27. Álvarez, V., Mazuelas, S., Lozano, J.A.: Probabilistic load forecasting based on adaptive online learning. IEEE Trans. Power Syst. 36(4), 3668–3680 (2021) 28. Zhang, W., Quan, H., Gandhi, O., et al.: Improving probabilistic load forecasting using quantile regression NN with skip connections. IEEE Trans. Smart Grid 11(6), 5442–5450 (2020) 29. Khatoon, S., Singh, A.K.: Effects of various factors on electric load forecasting: an overview. In: 2014 6th IEEE Power India International Conference (PIICON), IEEE, pp.1–5 (2014)

A Quantum Simulation Method with Repeatable Steady-State Output Using Massive Inferior Solutions

Guosong Yang1,2,3, Peng Wang2,3(B), Gang Xin2, and Xinyu Yin2

1 Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu, China
2 School of Computer Science and Engineering, Southwest Minzu University, Chengdu, China
[email protected]
3 University of Chinese Academy of Sciences, Beijing, China

Abstract. In evolutionary computation, inferior solutions are often discarded to expedite convergence, potentially bypassing valuable optimization insights. To rectify this, we have developed a novel method, drawing inspiration from ground state evolution (GSE) and quantum annealing (QA). These processes retain numerous positions with non-minimum potential energy, enabling the construction of the lowest possible energy wave function. Our method’s theoretical foundation is meticulously explained through the quantum path integral. We have realized this theory via numerical simulation, utilizing population-based evolution driven by multi-scale Gaussian sampling with a decreasing scale, mimicking QA with multiscale diffusion Monte Carlo (DMC). A series of rigorous experiments highlight the unique attributes and effectiveness of this method. Importantly, our approach generates a vast array of inferior solutions consistently. Their distribution indicates regions of lower function values within the solution space, presenting a new perspective on the utilization of inferior solutions. The implications of this research promise enhancements in solving optimization problems, potentially improving efficiency in evolutionary computation and beyond. Keywords: Quantum Inspired Optimization · Inferior Solution · Diffusion Monte Carlo · Ground State Evolution · Quantum annealing

1 Introduction

Under limited computing resources, evolutionary algorithms output deeply optimized solutions of the objective function f(x) in the feasible solution space. These optimizations work without prior knowledge about f(x); only the function values of sampled points, namely their fitness, are essential. To achieve fast convergence, survival of the fittest is encouraged, leading to most inferior sampling points being discarded. Only a few individuals with the best fitness survive for further optimization, which may risk losing valuable information.

Inferior solutions are useful in optimization because 1) they contain historical information to guide further search; 2) they may indicate a potential optimal area nearby,


as their inferior fitness may be due to insufficient exploration of their neighborhood; 3) the denser distribution of non-optimal solutions may reflect the landscape of f(x), as optimization algorithms tend to explore areas with better fitness more; 4) sometimes, the desired output cannot be directly encoded into a vector x in the feasible solution space, as in some path-planning problems where a continuous path is needed instead of a few discrete solutions with competitive fitness. Thus, ignoring inferior solutions may have side effects beyond fast convergence.

The first two points regarding the utility of inferior solutions have been thoroughly investigated for optimization performance [1–4]. The last two, however, have garnered less attention [5], likely due to the specific requirement of the final point: output inferior solutions must be controllable and well-organized, a condition unmet by most existing optimization architectures.

The ground state evolution (GSE) of a bound particle could inspire such a method. Under a given potential field V(x), the particle converges to its ground state wave function φ0(x), assigning non-zero values to numerous non-minimum potential energy locations and thereby minimizing energy towards the lowest value E0 by favoring locations with lower potential energy. This process realizes energy minimization, as the particle's distribution probability is proportional to |φ0(x)|².

In a QA-like process, imagine a continuous GSE with a slowly decreasing quantum effect. This will progressively converge to the global minimum of V(x) through a series of distributions related to φ0(x), marking the landscape of V(x) using numerous non-minimum positions with non-zero wave function value. Consequently, by encoding V(x) with f(x) and solving for φ0(x) by simulating QA with quantum mechanics, one could approach the global minimum and gather optimization information using inferior solutions. Additionally, the system's quantum effect influences φ0(x), allowing for the desired steady state with an appropriate distribution density. QA optimization computers such as D-Wave [6] and the Coherent Ising Machine [7] have physically verified this concept. The concept has also been used to develop optimization algorithms, with Kadowaki and Finnila et al. proposing QA algorithms that use quantum tunneling to find the global minimum among local minima [8, 9]. However, these works primarily aimed for fast convergence and precision, neglecting the potential value of inferior solutions in exploring f(x). Further study is warranted.

The paper is structured as follows: Sect. 2 covers basic ideas and theory. Section 3 proposes the method using numerous inferior solutions with steady-state output. Section 4 conducts experiments showcasing the method's attributes, while Sect. 5 provides the major conclusions.


2 Theoretical Basis

2.1 GSE–A Quantum Energy Minimization Process

In the quantum realm, a particle's spatial distribution is dictated by the wave function ψ(x, t). Prior to observation, the probability of finding the particle at x is proportional to |ψ(x, t)|², where ψ(x, t) can be found by solving the Schrödinger equation (consider a particle constrained in a 1-D infinite potential V(x) for simplicity)

i\hbar \frac{\partial \psi(x,t)}{\partial t} = \left[-\frac{\hbar^2}{2m}\nabla^2 + V(x)\right]\psi(x,t)    (1)

where ℏ is the reduced Planck constant and m is the particle mass. In a bound state, the general solution of (1) is ψ(x, t) = \sum_{n=0}^{\infty} c_n \varphi_n(x) \exp(-iE_n t/\hbar), where φ_n(x) is the wave function of the n-th eigenstate and E_n is the corresponding eigen-energy, ordered by energy value such that E_0 < E_1 ≤ E_2 ≤ E_3 ≤ .... The system determines the ground state energy E_0. As E_0 is the lowest possible energy, the particle's evolution to φ_0(x) is an optimization problem that minimizes energy by shaping ψ(x, t) toward φ_0(x). Once E_0 is determined, to minimize energy the wave function tends to have larger values where V(x) has smaller potentials.

Fig. 1. Typical cases of φ0 (x) in double-well potential V (x) = 5(x2 − 9)2 /81 + 0.01x. Here φ0 (x) of different virtual particles with distinct mass is denoted by dotted lines. Length and mass are expressed in dimensionless units [10].

Figure 1 illustrates how φ0 (x) tends toward global maximum, local minimum, and local maximum near the respective global minimum, local maximum, and local minimum of V (x) as m = me /49 increases. With more considerable mass, resulting in smaller and weaker quantum effect, the maximum of φ0 (x) gradually converges toward the global minimum of V (x)(blue and yellow dotted lines). Assuming a continuous GSE with gradually decreasing quantum effect, we approach the global minimum of V (x) with adequate accuracy (black line). However, a poor initial distribution might result


in convergence to the local minimum (red line). For an unknown V(x), guaranteeing convergence toward the global minimum with a small, random initial distribution is not feasible. Therefore, simulating the QA process, a virtual multi-ground-state evolution (MGSE) with decreasing quantum effect should be employed. Initially, a large E_0 ensures global search and rough convergence toward the global minimum. In this stage, strong tunneling allows escape from local minima. Later, by reducing E_0, the system changes, evolving from φ_0(x) to a new ground state distribution for a more accurate approach. By continuing this GSE, we can estimate the global minimum with the desired precision while maintaining many non-minimum positions. This resembles a heuristic optimization algorithm designed for locating the global minimum. If we encode V(x) by f(x), we derive a unique Schrödinger equation

i\hbar \frac{\partial \psi(x,t)}{\partial t} = \left[-\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2} + f(x)\right]\psi(x,t)    (2)

Similarly, the global minimum of f(x) can be approached by simulating the QA process with MGSE. The optimization problem thus becomes a quantum problem, and the task left is solving φ_0(x) of a well-designed MGSE with decreasing quantum effect. While any method capable of obtaining φ_0(x) can be used, operating a real quantum system or solving for φ_0(x) mathematically is very challenging. Fortunately, this quantum simulation idea has been well studied, with various computational methods available [11]. We chose DMC to simulate MGSE due to its similarity with heuristic optimization, its simplicity, and its computational efficiency.

2.2 Solving a Single Ground State via DMC

Since only φ_0(x) is needed and (2) is a complex equation related to time and i, the Wick rotation [12], τ → it, is introduced to transform (2) into an imaginary-time Schrödinger equation [10, 13]

\hbar \frac{\partial \psi(x,\tau)}{\partial \tau} = \left[\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2} - f(x)\right]\psi(x,\tau)    (3)

This is similar to a diffusion-reaction equation [14], where f(x)/ℏ corresponds to the reaction term. Through the quantum path integral [10, 15], the evolution of (3) can be expressed as ψ(x, τ) = \int_{-\infty}^{+\infty} K(x, \tau|x_0, 0)\psi(x_0, 0)dx_0, where K(x, τ|x_0, 0) is the propagator in imaginary time. Dividing τ evenly into N pieces of width Δτ = τ/N and setting x_N = x,

K(x,\tau|x_0,0) = \lim_{N\to+\infty} \int_{-\infty}^{+\infty} dx_1 \cdots \int_{-\infty}^{+\infty} dx_{N-1} \left(\frac{m}{2\pi\hbar\Delta\tau}\right)^{\frac{N}{2}} \exp\left\{-\frac{\Delta\tau}{\hbar}\sum_{j=1}^{N}\left[\frac{m}{2}\left(\frac{x_j - x_{j-1}}{\Delta\tau}\right)^2 + f(x_j)\right]\right\}    (4)

and defining P(x_n, x_{n-1}) \equiv \sqrt{\frac{m}{2\pi\hbar\Delta\tau}} \exp\left[-\frac{m(x_n - x_{n-1})^2}{2\hbar\Delta\tau}\right] and W(x_n) \equiv \exp\left[-\frac{f(x_n)\Delta\tau}{\hbar}\right],

\psi(x,\tau) = \lim_{N\to+\infty} \int_{-\infty}^{+\infty} \prod_{n=1}^{N} P(x_n, x_{n-1}) \cdot W(x_n)\,\psi(x_0, 0) \prod_{j=0}^{N-1} dx_j    (5)


where W(x_n) is the weight function that shapes ψ(x, τ) as time evolves and is related to the potential energy f(x). P(x_n, x_{n-1}) is influenced by the kinetic energy, being random, non-negative, and normalized. It can be viewed as Gaussian sampling from x_{n-1} to x_n with standard deviation σ = \sqrt{\hbar\Delta\tau/m}, suggesting that GSE evolution may be simulated by weighted random walks of massive Gaussian sampling points in parallel (i.e., DMC). This makes a Markov process [16]

\rho(x, \tau + h) - \rho(x, \tau) = \alpha[\rho(x + a, \tau) - 2\rho(x, \tau) + \rho(x - a, \tau)] \approx a^2 \alpha \frac{\partial^2 \rho(x,\tau)}{\partial x^2}    (6)

whose left-hand side can be approximated by h\,\partial\rho/\partial\tau, thus

\frac{\partial \rho(x,\tau)}{\partial \tau} = D \frac{\partial^2 \rho(x,\tau)}{\partial x^2}    (7)

where D = a²α/h. Equation (7) is the pure diffusion equation, explaining how the probability distribution of massive independent walkers evolves. This evolution can be simulated by DMC with massive random walkers driven by Gaussian sampling x(τ+1) ∼ N(x(τ), σ²). To obtain the global minimum of a non-zero f(x) in (3), the potential energy (reaction term) must be considered. Since an f(x) with non-zero average value means the average value of W(x_n) is not 1, the walker population size would continuously decrease or increase. Thus, a positive trial potential energy E_T is introduced to control the population size with negative feedback [13]. A typical feedback control law is employed,

E_T(\tau + 1) = E_m(\tau) + k_p [1 - N(\tau)/N_0]    (8)

where E_m(τ) is the mean energy of all sampling points, k_p is a positive factor related to the system, N is the number of alive walkers, and N_0 is the desired population size. Thus, (3) turns into

\hbar \frac{\partial \psi(x,\tau)}{\partial \tau} = \left[\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2} + E_T - f(x)\right]\psi(x,\tau)    (9)

and the weight of sampling point i in DMC becomes

w_i = \exp\{[E_T(\tau) - f(x_i)]\Delta\tau/\hbar\}    (10)

where E_T is estimated by (8). Corresponding to (1), the general solution of (9) is

\psi(x,\tau) = \sum_{n=0}^{+\infty} c_n \varphi_n(x) \exp\{[E_T(\tau) - E_n]\tau/\hbar\}    (11)

Among all φn(x), if and only if ET = E0 can the wave function converge to the only stable state φ0 (x) in DMC [10, 13], namely, limτ →+∞ ψ(x, τ ) = c0 φ0 (x). Therefore, DMC can be initialized with N0 random walkers conducting Gaussian sampling with the given scale determined by the simulated system; this is called random


walking or diffusion. After each diffusion, each w_i is calculated by (10), with walkers having large weight reproducing new copies at their present position and those with smaller weight participating in further evolution. Walkers with very limited weight are deleted. This process tends to minimize the system energy. The trial energy is adjusted by the negative feedback in (8). As this process continues, it eventually converges to the only steady state, φ_0(x). Then E_T can be viewed as the ground state energy, with φ_0(x) approximated by the distribution of the surviving walkers [10, 13, 16].

2.3 Simulating QA with MGSE and Multi-scale DMC

MGSE with decreasing quantum effect simulates the QA process, with DMC solving the ground states φ_0(x) in MGSE using a decreasing Gaussian sampling scale. Initially, a large-scale global search simulates strong quantum tunneling, retaining inferior solutions due to a large E_m. This ensures ample global search and indicates the global landscape via φ_0(x) with a strong quantum effect. Gradually reducing σ enhances the search resolution, retaining only more competitive inferior solutions, thereby realizing further convergence and revealing smaller fitness landscapes. This process continues until φ_0(x) of the final scale is obtained, potentially achieving satisfying precision if the stop scale is small enough.

The decreasing scale aligns with the characteristics of Gaussian sampling. With maximum probability at the symmetric distribution center, Gaussian sampling tends to focus on the neighborhood of the present location as per the 3σ principle. By decreasing the sampling scale σ, the method promotes global exploration with a large scale, initially retaining many poorer solutions, and subsequently focuses on the potential global minimum with a small-scale search, balancing fast convergence and high precision. Thus, with an appropriate scale-decreasing law, this method converges toward the global minimum, employing massive inferior solutions at different optimization precisions.

3 Algorithm Proposal and Analysis

As per the above discussion, Algorithm 1 illustrates the quantum simulation optimization method. It consists of three core operations: diffusion, influenced by the kinetic energy and simulated by parallel Gaussian sampling; a birth-death process reshaping ψ(x, τ) according to the potential V(x); and scale decreasing. Prior to optimization, two parameters are required. The desired population size N_0 simulates the distribution of ψ(x, τ), necessitating a large N_0 for simulation accuracy and method stability. The initial sampling scale σ should be sufficiently large for a thorough global search.
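A minimal Python sketch of these three operations (parallel Gaussian diffusion, weight-based birth-death, and scale decreasing), assuming a simple feedback gain, a halving schedule, and a unit imaginary-time step that the text does not fix:

```python
import numpy as np

def multiscale_dmc(f, x0, n0=100, sigma=2.0, iters_per_scale=100, n_scales=10,
                   kp=1.0, dtau=1.0, rng=None):
    """Multi-scale DMC sketch: walkers diffuse, are reweighted by fitness,
    reproduce or die, and the sampling scale is halved after each stage."""
    rng = rng or np.random.default_rng()
    walkers = np.tile(np.asarray(x0, dtype=float), (n0, 1))
    et = np.mean([f(w) for w in walkers])              # initial trial energy
    for _ in range(n_scales):
        for _ in range(iters_per_scale):
            # Diffusion: parallel Gaussian sampling around each walker.
            walkers = walkers + rng.normal(0.0, sigma, walkers.shape)
            fitness = np.array([f(w) for w in walkers])
            # Birth-death: weight from Eq. (10), capped for numerical stability.
            weights = np.minimum(np.exp((et - fitness) * dtau), 3.0)
            copies = np.minimum((weights + rng.random(len(weights))).astype(int), 3)
            walkers = np.repeat(walkers, copies, axis=0)
            if len(walkers) == 0:                       # guard against extinction
                walkers = np.tile(np.asarray(x0, dtype=float), (n0, 1))
            # Trial-energy feedback, Eq. (8).
            et = np.mean([f(w) for w in walkers]) + kp * (1 - len(walkers) / n0)
        sigma *= 0.5                                    # scale decreasing
    return walkers, et

# Usage: steady-state walker distribution on a 2-D Rastrigin-like function.
rastrigin = lambda x: np.sum(x**2 - 10 * np.cos(np.pi * x) + 10)
walkers, et = multiscale_dmc(rastrigin, x0=[3.0, 3.0])
```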


As the scale decreases, diffusion is simulated through a continuous weighted random walk driven by parallel Gaussian samplings. The birth-death process determines a walker's fate based on its weight w_i. For numerical stability, w_i can be kept below 3 or 5 before reproduction [10]. After this process, the algorithm updates E_T and φ_0(x) based on the new walker distribution. Stability can be judged by the fluctuation of E_T and φ_0(x). Due to the complexity of estimating the distribution, one may simply use the fluctuation of E_T as a stop criterion, or terminate the optimization based on the computing budget. For higher accuracy, E_T and φ_0(x) can be determined from statistics over multiple iterations after stability is reached.

The stop criterion depends on the available computing resources, the required precision, and the algorithm's complexity and convergence. Complexity and convergence analysis is challenging due to the stochastic nature of the algorithm and the unknown form of f(x); performance analysis of QA algorithms remains an open problem [17]. If the algorithm stops after T scales of GSE and the i-th evolution ends after I_i iterations with an average population size of N_0, then for a D-dimensional f(x) the most frequent operation is one-dimensional Gaussian sampling, with a time complexity of O(DN_0) in each iteration. Hence, the algorithm complexity is linear in the problem dimension and the number of walkers, and the total execution time complexity is O(\sum_{i=1}^{T} I_i D N_0). As T increases, convergence slows, and a larger mean value of I_i increases the algorithm's complexity.

The settings for T and the mean value of I_i are tied to the energy gap between the ground state energy E_0 and the first excited state energy E_1. The quantum adiabatic theorem [18] states that a larger minimum energy gap during MGSE ensures a faster adiabatic process. Clearly,


this gap depends on the unknown form of f(x). Practically, similar to the annealing curve in QA algorithms, a faster decrease rate is preferred when the global minimum's neighborhood has the highest density in φ_0(x) or f(x) is unimodal, promoting swift convergence. In contrast, a slow decrease rate is beneficial for finding the global minimum in complex landscapes. As per (11), if E_0 is reasonably estimated, all excited states decay exponentially with time, with the first excited state decaying slowest, which impacts convergence and complexity. In practice, this predicament is not unique, as most heuristic optimization algorithms cannot guarantee global convergence or quantitatively predict their complexity.

4 Experiment and Analysis

4.1 Convergence Experiment

A 10-D Rastrigin function, f(x) = \sum_{i=1}^{10}(x_i^2 - 10\cos(\pi x_i) + 10) with x_i ∈ [−5, 5], is used for testing. The initial sampling scale, N_0, and maximum iterations were set to 2, 100, and 10000, respectively. The sampling scale was designed to halve every 100 iterations, and the initial location in each dimension was set to 3. Additionally, particle swarm optimization (PSO) was used for comparison. Its population, inertia weight, two acceleration coefficients, and maximum velocity were set to 30, 0.7, 2, and 2, respectively.

Fig. 2. Optimization curves of 10-D Rastrigin function using Algorithm 1 and PSO.

As depicted in Fig. 2, the global best (Gbest) and mean fitness of PSO drop irregularly towards the global minimum at f(0, ..., 0) = 0 within 1000 iterations. Our algorithm, however, exhibits staged drops in best fitness and E_T as the optimization proceeds. With each new scale, E_T decreases rapidly, reaching a new steady state within a few iterations. The algorithm remains stable at this state, highlighting the retention of inferior solutions with similar fitness. Notably, the best historical solution is not efficiently utilized, as evidenced by the occasional increases in the best fitness. With the scale decreasing, our algorithm showed rapid convergence and stability, yielding improved output accuracy. If a proper number of iterations were set for each scale, the convergence would be even more effective.


4.2 A Practical Case with Simple Function Landscape

Consider a city delivery multi-rotor vehicle's path-planning problem. For this vehicle, vertical climbs, which are time- and energy-intensive, should be minimized. Further, flights between buildings are forbidden for safety. Hence, the building heights in the delivery blocks can be defined as the cost function f(x1, x2) to optimize. The goal is a flight trajectory with the lowest height for a given start and end point. Traditional optimization algorithms struggle with this problem, but Algorithm 1 is suitable as it densifies the distribution of inferior solutions in lower-height areas. As illustrated in Fig. 3, the start point is (1,1), the end point is (7,7), and a path with the lowest height is needed. The initial sampling scale, population size, and maximum iterations were set to 6, 350, and 1000 respectively, with the sampling scale halving every 250 iterations. The surviving walkers from the last 15 iterations of each scale were recorded for a more accurate distribution.

(a) Top view of a block constituted by 6×6 buildings, with height in the corresponding position. (b) Steady-state distribution of inferior solutions when σ = 3.00. (c) Steady-state distribution of inferior solutions when σ = 0.75.

Fig. 3. Steady-state distributions of inferior solutions under different sampling scales, which indicate the path with the lowest height. An alive walker is denoted by a point; note that the distribution tends to be denser in areas with lower height.

Figure 3b and 3c illustrate the steady-state distributions, depicted by contour lines and the kernel density estimate of each dimension on the top/right edge. Buildings with lower height correspond to denser distributions of random walkers, aiding optimal path planning. As the sampling scale decreases, the distribution becomes denser. The tallest building on the trajectory can be identified by the sparsest distribution along the path, determining the minimum flight trajectory height. Combined with the 2D path shown in the figure, the candidate 3D trajectory can be determined.

4.3 A Practical Case with Rugged Function Landscape

For military helicopters, low-altitude flight is advisable for stealth, and it is crucial to avoid enemy anti-aircraft systems. Here, we consider a landscape given by fl(x1, x2) =


2[3\cos x_1^2 + x_2^2/2 + \cos(x_1/2) + \cos(x_2/2)], where −40 < x_1, x_2 < 40, with the cost of the i-th anti-aircraft system modeled as f_i(x_1, x_2) = 18/[0.2(x_1 - x_{i1})^2 + 0.2(x_2 - x_{i2})^2 + 1], where (x_{i1}, x_{i2}) is the location of the i-th anti-aircraft system. The total cost function is f(x_1, x_2) = f_l(x_1, x_2) + \sum_i f_i(x_1, x_2). With three systems at (−32, 0), (0, 18), and (18, −18), we aim to identify a low-risk area for our aircraft. Conventional optimization methods may not be suitable, but our proposed method can locate the desired region by adjusting the sampling scale.

(a) 3D landscape of fl(x1, x2). (b) Distribution of the steady-state inferior solutions when optimizing fl(x1, x2). (c) Distribution of the steady-state inferior solutions when optimizing f(x1, x2).

Fig. 4. Steady-state distributions of the inferior solutions in distinct optimizations. A point denotes an alive walker, and the distribution is denser in the area with lower risk.

In our optimization, we set initial sampling scale, population size, and maximum iterations to 16, 450, and 1000. The sampling scale halved every 200 iterations, with walkers from the last 80 iterations of each scale used to construct the steady-state distribution. Figure 4a shows the 3D surface of fl (x1 , x2 ), with Fig. 4b illustrating the steady-state distribution of inferior solutions. The distribution’s symmetry indicates the method’s robustness. However, the presence of three anti-aircraft systems dramatically changes the steady-state, as seen in Fig. 4c. Note that in the three areas near the anti-aircraft systems, no sampling points can be observed. In addition, the overall distribution is also changed by them. As revealed by the contour lines and kernel density estimation curve on the upper/right sides, these distributions provide situational awareness in this area. While the method leverages inferior solutions and shows basic optimization abilities, we observed its limitations as well, including lost historical solution information and instability due to DMC characteristics. To address this, importance sampling and advanced feedback could enhance algorithm stability.

5 Conclusion

Conventional evolutionary computation often discards inferior solutions for speedy convergence, overlooking potential optimization information. Inspired by ground state wave function resolution in quantum mechanics and the QA process, we introduce a novel method


that achieves a repeatable steady-state distribution of numerous inferior solutions. This method, driven by parallel, weighted, multi-scale Gaussian sampling, delivers a controllable distribution defined by the sampling scale. Experiments highlighted the method's unique features and basic optimization ability, offering a fresh way to leverage inferior solutions. This opens up a new perspective for studying the landscape of the objective function.

References 1. Jin, J., Wang, P.: Multiscale quantum harmonic oscillator algorithm with guiding information for single objective optimization. Swarm Evol. Comput. 65, 100916 (2021) 2. Wierstra, D., Schaul, T., Glasmachers, T., Sun, Y., Peters, J., Schmidhuber, J.: Natural evolution strategies. The J. Mach. Learn. Res. 15(1), 949–980 (2014) 3. Li, J.Z., Zheng, S.Q., Tan, Y.: The effect of information utilization: introducing a novel guiding spark in the fireworks algorithm. IEEE Trans. Evol. Comput. 21(1), 153–166 (2016) 4. Simoncini, D., Verel, S., Collard, P., Clergue, M.: Centric selection: a way to tune the exploration/exploitation trade-off. In: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, pp. 891–898 (2009) 5. Tanabe, R.: Towards exploratory landscape analysis for large-scale optimization: a dimensionality reduction framework. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 546–555 (2021) 6. Johnson, M.W., et al.: Quantum annealing with manufactured spins. Nature 473(7346), 194– 198 (2011) 7. McMahon, P.L., et al.: A fully programmable 100-spin coherent Ising machine with all-to-all connections. Science 354(6312), 614–617 (2016) 8. Kadowaki, T., Nishimori, H.: Quantum annealing in the transverse Ising model. Phys. Rev. E 58(5), 5355 (1998) 9. Finnila, A.B., Gomez, M.A., Sebenik, C., Stenson, C., Doll, J.D.: Quantum annealing: a new method for minimizing multidimensional functions. Chem. Phys. Lett. 219(5–6), 343–348 (1994) 10. Kosztin, I., Faber, B., Schulten, K.: Introduction to the diffusion Monte Carlo method. Am. J. Phys. 64(5), 633–644 (1996) 11. Blinder, S.M., House, J.E.: Mathematical physics in theoretical chemistry. Elsevier (2018) 12. Wick, G.C.: Properties of Bethe-Salpeter wave functions. Phys. Rev. 96(4), 1124 (1954) 13. Ceperley, D., Alder, B.: Quantum Monte Carlo. Science 231(4738), 555–560 (1986) 14. Anderson, J.B.: A random-walk simulation of the Schrödinger equation: H3 + . J. Chem. Phys. 63(4), 1499–1503 (1975) 15. Feynman, R.P., Hibbs, A.R., Styer, D.F.: Quantum mechanics and path integrals: Emended edition. Dover Publications (2005) 16. Thijssen, J.: Computational physics. Cambridge University Press (2007) 17. Edward, F., Jeffrey, G., Sam, G., Joshua, L., Andrew, L., Daniel, P.: A quantum adiabatic evolution algorithm applied to random instances of an NP-complete problem. Science 292(5516), 472–475 (2001) 18. Zhang, Y.Y, Fu, Z.H: Survey of adiabatic quantum optimization algorithms. Comput. Eng. Sci. 37(3), 429–433 (2015)

Metal Oxide Classification Based on SVM

Kai Xiao, Zhuo Wang(B), and Wenzheng Bao(B)

School of Information Engineering, Xuzhou University of Technology, Xuzhou 221018, China
[email protected], [email protected]

Abstract. Oxides are classified into various oxides according to their constituent elements, among which there are binary and quaternary oxides. Binary oxides are widely used in industry, and quaternary oxides have great potential in this aspect of electrode materials. This paper focuses on a support vector machine based classification method for oxides, redefining the method for describing metal oxides in order to facilitate computation, and using principal component analysis to reduce the dimensionality of the data, evaluating the classification results using a variety of metrics, and comparing the classification results of multiple kernel functions and the number of principal components to select the best algorithm. The results show that the support vector machine with the number of principal components of 4 and a Gaussian kernel function has better classification results and is important for the identification of oxides. Keywords: oxides · classification algorithms · machine learning · principal component analysis · support vector machines

1 Background

Metal oxides are binary compounds composed of the element oxygen and another metallic chemical element, such as Fe2O3, Na2O, etc. Oxides include basic oxides, acid oxides, peroxides, superoxides, and amphoteric oxides. Metal oxides are widely used in daily life. Lime is a common desiccant and can also be used for disinfection; iron oxide can be used as a red pigment; and some catalysts applied in industrial processes are also metal oxides. Metal oxides are an important class of catalysts and have been widely used in the field of catalysis. After nanosizing metal oxides, their catalytic performance is even better, and it is foreseeable that nano metal oxides will be an important direction for catalyst development. In the past, people mainly focused on the study of binary metal oxides, and less on the study of quaternary oxides. However, with the progress of materials science, the inherent properties of quaternary metal oxides have been gradually discovered and applied. Currently, the automotive industry is facing increasingly stringent requirements in terms of fuel economy and emission reduction. Therefore, much attention has been paid to rechargeable lithium-ion batteries used in electric and hybrid electric vehicles.



In lithium-ion batteries, the cathode material plays a crucial role. Currently, nickel-rich lithium transition metal oxides are considered the most promising candidates because of their ability to increase the specific capacity of lithium-ion batteries by increasing the nickel content. The wide confinement band, high dielectric constant, and active electron transport capability of quaternary metal oxides are considered the cornerstones for constructing outstanding optoelectronic and electrical devices. Moreover, modern advanced high-temperature material development tends to use multiple alloys to obtain comprehensive properties such as high-temperature strength, room-temperature toughness, and chemical stability to meet service requirements. Quaternary oxides also have excellent electrochemical activity and broad application prospects in the field of high-performance supercapacitor electrodes, where they can be used as electrode materials because of their excellent electrochemical properties. Supercapacitors are emerging because of their power density, fast charge/discharge characteristics, and excellent cycle stability. The high stability, low loss, and high dielectric constant of quaternary oxides make them widely used in modern communication. At the same time, quaternary oxides can be used as chemical catalysts to treat some pollutants in industrial production. In this article, we focus on the classification of binary and quaternary metal oxides. We select data such as the atomic number, electron number, and electronegativity, use the PCA algorithm to obtain the principal components of the data, and use those principal components to describe the data, thereby reducing the dimensionality of the extracted features. The data are analyzed by a support vector machine; the classification effects of different kernel functions and of different numbers of principal components are compared to achieve a more efficient classification of binary and quaternary oxides. We find that using 4 principal components together with the Gaussian kernel function in the support vector machine gives higher accuracy in the experiments. In distinguishing binary and quaternary metal oxides, this paper achieves accurate and convenient identification, which provides a reference for distinguishing oxides (Figs. 1, 2, 3).

Fig. 1. Experimental flow chart


2 Feature Input

2.1 Data Set

The experimental data were obtained from the Citrine Informatics database, and the data set consists of 271 binary metal oxides and 444 quaternary metal oxides, grouped by the number of constituent elements [1]. The metal oxides in the data consist of 3 alkali metal elements, 4 alkaline earth metal elements, 22 transition metal elements, and 5 other metal elements (Sn, Ge, Al, Ga, In), in addition to the O element [2]. The data are rich in types, so the results of the experiments are general in nature.

Fig. 2. Data set distribution

2.2 Characteristic Variable Input

The first step in building a machine learning model is to select features as inputs that accurately describe the metal oxides and that are physically significant and computationally simple [3]. The six features X_i (i = 1, 2, 3, 4, 5, and 6) defined in this paper uniquely describe each metal oxide. The first and second features are the total number of atoms and electrons per unit, respectively:

X_1 = N_t    (1)

X_2 = \sum_{i} E_i    (2)

where N_t is the total number of atoms per sample, and E_i is the number of electrons of each atom. The characteristics of the oxide must be related to the element oxygen, so the third and fourth features are defined as follows:

X_3 = \chi_o - \chi_n    (3)

X_4 = N(O)_t    (4)


where χ_o is the electronegativity of the O atom on the Pauling scale, χ_n is the electronegativity of the atom closest to the O atom, and N(O)_t is the total number of oxygen atoms per unit. The other characteristics are defined as follows:

X_5 = \frac{N(O)_t}{N_t}    (5)

X_6 = \frac{N(O)_t \times E_O}{E_t}    (6)

where E_O is the number of electrons in a single O atom.
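A minimal sketch of computing X1–X6 from a composition given as element counts; the small element table and the interpretation of "the atom closest to the O atom" (taken here as the non-oxygen element closest to oxygen in electronegativity) are illustrative assumptions:

```python
# Illustrative per-element data; the values are placeholders, not a vetted table.
ELEMENT_DATA = {
    "O":  {"electrons": 8,  "electronegativity": 3.44},
    "Fe": {"electrons": 26, "electronegativity": 1.83},
    "Na": {"electrons": 11, "electronegativity": 0.93},
}

def oxide_features(composition):
    """composition: dict element -> atom count per formula unit, e.g. {"Na": 2, "O": 1}."""
    n_t = sum(composition.values())                                                   # X1
    e_t = sum(ELEMENT_DATA[el]["electrons"] * n for el, n in composition.items())     # X2
    chi_o = ELEMENT_DATA["O"]["electronegativity"]
    # Assumption: chi_n is the electronegativity of the non-O element nearest to oxygen's.
    chi_n = min((ELEMENT_DATA[el]["electronegativity"] for el in composition if el != "O"),
                key=lambda chi: abs(chi - chi_o))
    n_o = composition.get("O", 0)                                                     # X4
    x5 = n_o / n_t                                                                    # X5
    x6 = n_o * ELEMENT_DATA["O"]["electrons"] / e_t                                   # X6
    return [n_t, e_t, chi_o - chi_n, n_o, x5, x6]

print(oxide_features({"Na": 2, "O": 1}))   # e.g. Na2O
```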

3 Modeling

3.1 Principal Component Analysis Theory

The technique of principal component analysis (PCA) is mainly used to reduce the dimensionality of data and convert multiple indicators into a few composite indicators. It has a wide range of applications in simplifying data sets, and it is a linear transformation that maps the original data into a new coordinate system. PCA reduces the dimensionality of a data set while keeping the features that contribute most to its variance, retaining the main aspects of the data; the ultimate goal is to describe the dataset with the main features in fewer dimensions. The main principle of PCA is to project the original data samples into a new space via a data transformation matrix. The new space coordinates are composed of the eigenvectors corresponding to the several largest eigenvalues of the covariance matrix between the different dimensions of the original data, while eigenvectors corresponding to smaller eigenvalues are removed as non-major components. This achieves the purpose of extracting the major features to represent the data and reducing its complexity.

3.2 Support Vector Machine Theory

Support vector machines [4–9] are a class of generalized linear classifiers that perform binary classification of data in a supervised learning manner, with the decision boundary given by the maximum-margin hyperplane solved for the learned samples. They are based on the principle of minimizing the empirical risk while maximizing the classification margin, which makes it possible to obtain good classification accuracy even when the number of samples is small. Support vector machines are well suited to small-scale, high-dimensional nonlinear data and have been used to solve numerous problems such as pattern recognition and regression prediction. In the classification problem, given the input data and the learning target, each sample of the input data contains multiple features and thus constitutes a feature space, while the learning target is a binary variable with negative and positive classes.


If there exists a hyperplane as a decision boundary in the feature space where the input data are located that separates the learning targets into positive and negative classes, and makes the point-to-plane distance of any sample greater than or equal to 1,

w^T X + b = 0    (7)

y_i (w^T X + b) \ge 1    (8)

the classification problem is said to be linearly separable, and the parameters w, b are the normal vector and intercept of the hyperplane, respectively. The decision boundary satisfying this condition actually constructs 2 parallel hyperplanes as interval boundaries to discriminate the class of each sample:

w^T X + b \ge +1 \Rightarrow y_i = +1    (9)

w^T X + b \le -1 \Rightarrow y_i = -1    (10)

All samples above the upper interval boundary belong to the positive class and those below the lower interval boundary belong to the negative class.

Fig. 3. Support vector machine algorithm

This is translated into an optimization problem based on the empirical risk minimization principle:

\min Q = \frac{1}{2}\|W\|^2 + C \sum_{i=1}^{n} (\beta_1 + \beta_2)    (11)

where Q is the optimization objective and W is the weight vector. Finally, the problem is transformed into its dual form via the Lagrangian function, where the kernel function can map the data to higher dimensions in order to obtain the optimal separating


hyperplane, and the regression function is obtained as

f(x) = \sum_{i,j=1}^{n} (\alpha_i + \alpha_i^*) K(x_i, x_j) + b, \quad \text{s.t.} \quad \sum_{i=1}^{n} (\alpha_i + \alpha_i^*) b, \quad 0 \le \alpha_i \le C, \quad 0 \le \alpha_i^* \le C    (12)

where α_i, α_i^* are the Lagrangian multipliers and K(x_i, x_j) is the kernel function, which includes linear, polynomial, and Gaussian kernel functions.

3.3 Modeling

In this experiment, we use a support vector machine classification model to analyze and process the data. Since the support vector machine is sensitive to parameters, we normalize the data. The experiments use the K-fold cross-validation method; for small-sample data, its test result is more reliable than dividing the original sample into a single training set and test set, so we take K = 10 (the data are divided into 10 parts, with one part taken out at a time as the test set and the remaining nine parts used as the training set) and use grid search to determine the best hyperparameters. Meanwhile, the experiments test the classification effect of different kernel functions and the effect of the number of principal components on classification performance.

3.4 Parameter Optimization

In experiments, classification algorithms may overfit the model to obtain high accuracy. Therefore, we repeat the procedure 10 times by randomly dividing the training dataset, thus generating 10 different feature input sets for the classifier. The SVM obtains the optimal parameters c and g based on the different feature sets; we then select c and g for developing the final classification model. This random cross-validation technique avoids overfitting. Finally, the average performance obtained from cross-validation is compared in order to select the best model for the experiment.

3.5 Model Evaluation Indicators

We build the confusion matrix based on the experimental results to judge the actual classification ability of the model. The classification results can be divided into four cases: true positive (TP), true negative (TN), false positive (FP), and false negative (FN) [10–14]. Four evaluation metrics were applied to assess model performance. The accuracy rate is the proportion of results that the model predicts correctly:

accuracy = \frac{TP + TN}{TP + TN + FP + FN}    (13)


The precision rate is the proportion of correct predictions among all samples predicted as positive:

precision = \frac{TP}{TP + FP}    (14)

The recall is the proportion of actual positive samples correctly predicted by the model:

recall = \frac{TP}{TP + FN}    (15)

Precision and recall can sometimes be in tension, so to consider them together the F1 value is also used as one of the evaluation indicators. The F1 value is the harmonic mean of precision and recall:

F1 = \frac{2}{\frac{1}{precision} + \frac{1}{recall}}    (16)
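A minimal scikit-learn sketch of the modeling and evaluation procedure described above (normalization, PCA, RBF-kernel SVM, grid search over c and g, 10-fold cross-validation); the parameter grids and placeholder data are assumptions:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_validate

# Placeholder data: 715 oxides (271 binary + 444 quaternary), 6 features, binary labels.
X = np.random.rand(715, 6)
y = np.random.randint(0, 2, 715)

pipe = Pipeline([
    ("scale", StandardScaler()),        # SVMs are sensitive to feature scales
    ("pca", PCA(n_components=4)),       # keep 4 principal components
    ("svm", SVC(kernel="rbf")),         # Gaussian (RBF) kernel
])

# Grid search over the penalty c and kernel width g with 10-fold cross-validation.
grid = GridSearchCV(pipe,
                    {"svm__C": [0.1, 1, 10, 100], "svm__gamma": [0.01, 0.1, 1]},
                    cv=10)
grid.fit(X, y)

scores = cross_validate(grid.best_estimator_, X, y, cv=10,
                        scoring=["accuracy", "precision", "recall", "f1"])
print(grid.best_params_,
      {k: v.mean() for k, v in scores.items() if k.startswith("test_")})
```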

4 Model Results and Analysis

4.1 Performance Evaluation of the Model

We derived the experimental results by performing 10-fold cross-validation for each model and taking the average value, and compared the performance of the models. According to the evaluation metrics, the support vector machine achieves high accuracy in metal oxide classification, as expected. Among the linear, polynomial, and Gaussian kernel functions, the Gaussian kernel function has the highest accuracy rate of 92%, and when the number of principal components is 4, the accuracy of the model can reach 95%. Overall, when the number of principal components n_components = 4 and the kernel function of the SVM is RBF, the model classifies best and achieves the stability and generalization ability needed for practical use.

4.2 Analysis of Experimental Results

As mentioned in the model-building section, we applied support vector machines to classify the oxides, used three different kernel functions to train the data, and finally compared the classification effects of the three kernel functions. To verify the effectiveness and superiority of this algorithm, the experiments first used the data without dimensionality reduction as the feature input; the results are shown in Table 1, and the confusion matrices of the models on the validation set are plotted in Fig. 4. The Gaussian kernel function has the best classification effect, with all indexes high, and performs better than the other two kernel functions, while the polynomial kernel function performs poorly. Therefore, among these three kernel functions, the Gaussian kernel function is chosen as the final model parameter in this paper.


Fig.4. Confusion matrix of three kernel functions

Table 1. Accuracy of the 3 kernel functions

Classification effect   Gaussian kernel function   Linear kernel function   Polynomial kernel function
Accuracy/%              91                         86                       44
Precision/%             91                         87                       69
Recall/%                92                         77                       58
F1                      0.91                       0.86                     0.41

In order to improve the recognition accuracy, the data are first reduced in dimensionality using PCA before being used as input to the support vector machine. The key step in dimensionality reduction is choosing the number of retained dimensions, i.e., the most suitable value of n_components. The experiments set n_components = 3, 4, 5, 6 and compare the classification performance of each setting. The results are shown in Table 2, and the corresponding confusion matrices in Fig. 5. The best classification is achieved when n_components = 4.

Fig. 5. Confusion matrix for different n_components


Table 2. Classification effects of different n_components

n_components   3      4      5      6
Accuracy/%     85     95     92     91
Precision/%    86     93     92     91
Recall/%       87     95     93     92
F1             0.85   0.94   0.92   0.91

The accuracy of the classification after dimensionality reduction reaches 95%, a 4% improvement compared with the SVM model without dimensionality reduction.
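For illustration, the comparison in Table 2 corresponds to a sweep over n_components such as the following hedged sketch, reusing the kind of pipeline shown in Sect. 3; X, y and the SVM parameters are placeholders from the earlier sketch, not the values used in the paper.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

for n in (3, 4, 5, 6):
    model = make_pipeline(StandardScaler(), PCA(n_components=n),
                          SVC(kernel="rbf", C=10, gamma=0.1))   # placeholder c and g
    scores = cross_val_score(model, X, y, cv=10)                # 10-fold CV accuracy
    print(f"n_components={n}: mean accuracy {scores.mean():.2f}")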

5 Conclusion

This work presents methods that can accurately classify binary and quaternary oxides. Previously, the feature inputs used to describe oxides were computationally complex and lacked an obvious physical meaning, so this work redefines six features to describe metal oxides. In this paper, we use principal component analysis and a support vector machine to process the data. To achieve good classification, the three kernel functions of the support vector machine and different numbers of principal components are compared, and several evaluation metrics are chosen to evaluate the performance of the model. Our proposed method performs well in the experiments, with good classification results. We will continue to explore further classification methods to improve performance.

In summary, this method has achieved stable and strong performance, but some improvements are still needed:

1. The oxide data presented in this paper cover only binary and quaternary oxides, so the applicability range of the classifier is small. More kinds of data can therefore be collected in future work to enhance the applicability of the classifier.
2. The classifier used in this paper is the support vector machine, while with the rapid development of computer technology more efficient classification methods, such as deep learning, have appeared and been applied in practice. In future work we will apply these newer classification algorithms to improve the classification ability of the model.

Acknowledgement. This work was supported by the National Natural Science Foundation of China (Grant No. 61902337), the Xuzhou Science and Technology Plan Project (KC21047), the Jiangsu Provincial Natural Science Foundation (No. SBK2019040953), the Natural Science Fund for Colleges and Universities in Jiangsu Province (No. 19KJB520016), the Young Talents of Science and Technology in Jiangsu, and ghfund202302026465.


References

1. Jorner, K., Tomberg, A., Bauer, C., Sköld, C., Norrby, P.-O.: Organic reactivity from mechanism to machine learning. Nat. Rev. Chem. 5(4), 240–255 (2021)
2. Lu, Q., Zhou, Y., Ouyang, Y., Guo, Q., Li, Wang, J.: Accelerated discovery of stable lead-free hybrid organic-inorganic perovskites via machine learning. Nat. Commun. 9(1), 3405 (2018)
3. Ghiringhelli, L.M., Vybiral, J., Levchenko, S.V., Draxl, C., Scheffler, M.: Big data of materials science: critical role of the descriptor. Phys. Rev. Lett. 114(10), 105503 (2015)
4. Cortes, C., Vapnik, V.N.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
5. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
6. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
7. Wang, Y.-C., Wang, Y., Yang, Z.-X., et al.: Support vector machine prediction of enzyme function with conjoint triad feature and hierarchical context. BMC Syst. Biol. 5(S1), S6 (2011)
8. Ho, T.K.: The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20(8), 832–844 (1998)
9. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
10. Wei, L., Xing, P., Zeng, J., Chen, J., Su, R., Guo, F.: Improved prediction of protein–protein interactions using novel negative samples, features, and an ensemble classifier. Artif. Intell. Med. 83, 67–74 (2017)
11. Wei, L., Xing, P., Su, R., Shi, G., Ma, Z.S., Zou, Q.: CPPred-RF: a sequence-based predictor for identifying cell-penetrating peptides and their uptake efficiency. J. Proteome Res. 16(5), 2044–2053 (2017)
12. Hu, Y., Zhao, T., Zhang, N., Zang, T., Zhang, J., Cheng, L.: Identifying diseases-related metabolites using random walk. BMC Bioinformat. 19(5), 37–46 (2018)
13. Zhang, M., et al.: MULTiPly: a novel multi-layer predictor for discovering general and specific types of promoters. Bioinformatics 35(17), 2957–2965 (2019)
14. Song, T., Rodríguez-Patón, A., Zheng, P., Zeng, X.: Spiking neural P systems with colored spikes. IEEE Trans. Cogn. Dev. Syst. 10(4), 1106–1115 (2017)

Food Image Classification Based on Residual Network Xueyan Yang, Jinping Sun(B) , Zhuo Wang(B) , and Wenzheng Bao School of Information Engineering, Xuzhou University of Technology, Xuzhou 221018, China [email protected], [email protected]

Abstract. As food culture and internet technology evolve, tracking the nutritional information of daily food intake becomes increasingly important for assessing dietary habits and health management status. However, effective food image classification is a prerequisite. Food images present a fine-grained image recognition problem characterized by large inter-class differences and small intra-class differences. The presence of mutual occlusion between foods and background noise challenges existing food image classification techniques in extracting robust visual features. In response to these challenges, this paper proposes a food image classification residual network incorporating pyramid segmentation attention and soft thresholding. Attention is applied across both spatial and channel dimensions to mitigate the impact of noisy data on classification results. In each improved residual block, pyramid segmentation attention (PSA) is embedded to replace the convolutional unit, extracting target features through the spatial-level visual attention vector and multi-scale response map. Concurrently, a soft thresholding subnetwork is embedded within the basic module of the network, employing channel attention to automatically learn the threshold for each sample, thereby suppressing redundant information in the image. Multiple experiments were conducted using the VireoFood-251 food dataset, with results indicating a classification accuracy of 87.03%. When compared to classical models ResNet34 and ResNet50, the accuracy increased by 5.71% and 3.02%, respectively, validating the feasibility of the proposed network framework. Keywords: Classification · Pyramid split attention · Soft thresholding · Residual block

1 Introduction

The food industry occupies a pivotal role in various aspects of life, and as a fundamental component of people's daily routines, monitoring food is vital to assessing health, nutritional structure, and dietary status. With the emergence of the Internet age, the food industry has thrived, and research associated with food spans multiple fields. Among these, food image recognition constitutes a foundational and core task, with extensive applications in food recommendations, unmanned restaurants, smart catering, and other domains.


In traditional food image recognition methods, S. Yang [1] proposed a statistics-based approach to manually compute food image features. However, this method lacked generalization, as effectively describing features for a vast array of new food images proved challenging. Mohandoss [2] employed the random forest method to extract local visual features from images, achieving basic food image classification. With the rapid advancement of artificial intelligence technology, deep learning methods have gained widespread use in the field of food image recognition [3–8], yielding impressive results. Building on convolutional neural networks, Kawano and Yanai [9] utilized transfer learning to employ pre-trained networks as feature extractors and fine-tuned the network with the target dataset for food image classification, accelerating model convergence and achieving robust recognition performance. Setyono [10] proposed DenseFood, a food image recognition model based on dense connection structures, which attained an 81.23% accuracy rate on the VireoFood-172 dataset, accounting for the significant similarity among Chinese food image categories. L. He [11] proposed an attention model for food image learning, utilizing attention mechanisms to generate image attention weights applied to food image feature maps to obtain critical information. This approach somewhat reduces the impact of noise between images on classification results, but there is still room for improvement in recognition performance. This paper introduces a food image classification model based on a correlation attention residual network. This model enables adaptive learning of feature channel weights and employs soft thresholding to denoise redundant information, connecting spatial and channel attention [12]. Comparative experiments are conducted using typical convolutional neural network models.

2 Experimental Data

2.1 Food Dataset

The experiments employ the VireoFood-251 food dataset, which includes 169,673 food images in 251 categories, covering eight major food groups: "vegetables," "soup," "soy products," "eggs," "meat," "seafood," "fish," and "staple foods." The category distribution in this food image dataset is relatively balanced, which helps to enhance the generalization ability of the end-to-end model on the test data.

2.2 Data Preprocessing

Before model training, the food image data were preprocessed to generate similar but different training samples. This was achieved by combining multiple augmentation methods [13, 14] to enhance the images used for training and to increase the differences between different food features in the images, as shown in Fig. 1. To avoid the impact of environmental factors such as random lighting, imaging angles, and backgrounds as much as possible, geometric transformations such as translation, transpose, mirror, rotation, and scaling were used to process the food images, correcting the systematic errors of the


image acquisition system and the random errors of the instrument position. Additionally, enhancement methods such as brightness enhancement, chromaticity enhancement, and sharpness enhancement were used to reduce the sensitivity of the model to image color. AugMix [15], which uses image fusion to augment data by taking a single image as the original sample, was also used, and the new samples obtained could maintain semantic information. Data augmentation eliminated irrelevant information in the image, restored useful real information, enhanced the detectability of relevant information, and maximized the use of data, thereby improving the reliability of feature extraction, image matching, and recognition in the model.

Fig. 1. The pre-processing steps applied to the food images (panels: Original, Random Crop, Vertical Flip, Rotation 45°, ColorJitter, AugMix)
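The augmentation pipeline described above could be assembled with torchvision as in the following sketch; the crop size and jitter strengths are illustrative assumptions rather than the paper's exact settings (AugMix requires torchvision 0.13 or newer).

import torchvision.transforms as T

train_transform = T.Compose([
    T.RandomResizedCrop(224),                      # random crop and rescale
    T.RandomHorizontalFlip(),                      # mirror
    T.RandomVerticalFlip(),                        # vertical flip
    T.RandomRotation(45),                          # rotation up to 45 degrees
    T.ColorJitter(brightness=0.4, contrast=0.4,
                  saturation=0.4, hue=0.1),        # brightness/chromaticity changes
    T.AugMix(),                                    # image-fusion style augmentation
    T.ToTensor(),
])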

3 Food Image Recognition Model

3.1 Pyramid Split Attention Unit

The pyramid split attention (PSA) unit [16] presents a multi-scale feature map extraction strategy based on the classic SENet [17] attention network. This approach enriches the feature space by capturing spatial information of varying scales with different sizes of convolutional kernels. It addresses the limitation of single spatial attention, which only considers local information and fails to establish long-range dependencies, thereby enhancing global feature extraction and fusion. The SPC module within the PSA unit initially divides the channel dimension of the input feature map into S parts. As food images exhibit a rich array of features, the subnetwork module preceding the residual block increases the number of feature channels to 512 to fully extract shallow features. In the experiment, S is set to 4 and 8 and alternately embedded in the improved residual block to obtain more diverse and detailed spatial-level multi-scale feature maps. The Concat operation is employed to combine multi-scale features with different receptive fields. To handle input tensors with varying convolution kernel sizes without increasing computational cost, group convolution is introduced and applied to the convolution kernel. Furthermore, a new criterion is designed to select the group size without increasing the number of parameters. The relationship between the multi-scale convolution kernel size and the group size can be expressed as:

G = 2^((k - 1)/2)   (1)


The function for generating the multi-scale feature maps is given by the following equation, where k_i is the size of the i-th convolutional kernel and G_i is the corresponding group size:

F_i = Conv(k_i × k_i, G_i)(X), i = 0, 1, 2, ..., S − 1   (2)

The i-th convolution kernel size in the group convolution is 2 × (i + 1) + 1. The entire multi-scale feature map is obtained by concatenation:

F = Cat([F_0, F_1, ..., F_{S−1}])   (3)

The importance of different channels in the feature map varies. The SEWeight module compresses the input feature map by global average pooling into a 1 × 1 × C vector of high-level features, which then passes through two linear layers to generate the desired channel weights W. In the experiments, the sigmoid function is used to map the weight vector into the range 0 to 1, and element-wise multiplication of the weight vector with the original feature map gives the final response map. As shown in Fig. 2, different attention weights are assigned to the output feature map channels, and the PSA unit fuses contextual information at different scales to generate better pixel-level attention.

Fig. 2. The structure of PSA unit
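A simplified sketch of the SPC split, group convolutions and SEWeight re-weighting is given below for S = 4; it only illustrates the idea behind Eqs. (1)-(3) and is not the authors' implementation.

import torch
import torch.nn as nn

class SEWeight(nn.Module):
    """Channel attention: global average pooling -> two 1x1 conv layers -> sigmoid weights."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.fc(self.pool(x))

class PSA(nn.Module):
    """Split channels into S groups, convolve each with a different kernel size,
    re-weight each branch with SEWeight, and softmax-normalise across branches."""
    def __init__(self, channels, S=4):
        super().__init__()
        self.S = S
        c = channels // S
        self.convs = nn.ModuleList()
        for i in range(S):
            k = 2 * (i + 1) + 1                 # kernel sizes 3, 5, 7, 9 when S = 4
            g = 2 ** ((k - 1) // 2)             # group size from Eq. (1)
            self.convs.append(nn.Conv2d(c, c, k, padding=k // 2, groups=g))
        self.se = nn.ModuleList([SEWeight(c) for _ in range(S)])
        self.softmax = nn.Softmax(dim=1)        # across the S branches

    def forward(self, x):
        parts = torch.chunk(x, self.S, dim=1)                                   # SPC: split channels
        feats = torch.stack([conv(p) for conv, p in zip(self.convs, parts)], dim=1)
        attn = torch.stack([se(f) for se, f in zip(self.se, feats.unbind(1))], dim=1)
        out = feats * self.softmax(attn)                                        # pixel-level attention
        return out.flatten(1, 2)                                               # concatenate branches back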

3.2 Soft Thresholding Subnetwork

When classifying food image samples, there is inevitably some noise in the samples that is unrelated to the current classification task, which can hurt classification performance; even within the same sample set, the amount of noise in each sample often differs. Soft thresholding is used to weaken the noise in food images while retaining the features relevant to the current task. Soft thresholding is a core step in many signal denoising algorithms: features whose absolute values are smaller than a certain threshold are set to zero, and features whose absolute values are greater than the threshold are shrunk towards zero. The formula for soft thresholding is:

y = x − τ, if x > τ;   y = 0, if −τ ≤ x ≤ τ;   y = x + τ, if x < −τ   (4)


The derivative of the soft thresholding output with respect to the input is:

∂y/∂x = 1, if x > τ;   ∂y/∂x = 0, if −τ ≤ x ≤ τ;   ∂y/∂x = 1, if x < −τ   (5)

Formulas 4 and 5 show that the derivative of soft thresholding is either 0 or 1, so soft thresholding can also reduce the risk of vanishing and exploding gradients in deep neural networks. It is important to set the threshold to a positive value no larger than the maximum absolute value of the input signal; otherwise the output would be zero and all information in the feature map would be lost. The soft thresholding sub-network, which essentially implements a channel attention mechanism, is embedded in the improved residual block. The absolute values of the food image feature map fed to the feedforward sub-network are computed, followed by global average pooling and global maximum pooling to obtain diverse deep semantic features. A small fully connected network processes part of the obtained features, with the original fully connected layer replaced by 1 × 1 convolutions to reduce the number of parameters. Each weight layer is followed by a batch normalization (BN) layer and a ReLU activation function, and the output is normalized to the range between 0 and 1. The obtained coefficients are multiplied with the other part of the output to obtain the per-channel thresholds of the feature map. As shown in Fig. 3, each sample has its own independent threshold according to its noise content.

Fig. 3. Soft thresholding subnetwork structure
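A minimal sketch of soft thresholding and of a per-sample, per-channel threshold gate in the spirit of the description above; the layer sizes and the use of the channel-wise maximum to bound the threshold are assumptions for illustration only.

import torch
import torch.nn as nn

def soft_threshold(x, tau):
    # y = sign(x) * max(|x| - tau, 0): shrink small-magnitude features to zero, Eq. (4)
    return torch.sign(x) * torch.relu(torch.abs(x) - tau)

class SoftThresholdGate(nn.Module):
    """Channel attention that learns one threshold per channel and per sample."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),   # 1x1 conv replaces the FC layer
            nn.BatchNorm2d(channels // reduction),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                    # scaling coefficient in (0, 1)
        )

    def forward(self, x):
        abs_x = torch.abs(x)
        avg = abs_x.mean(dim=(2, 3), keepdim=True)             # global average pooling of |x|
        coef = self.fc(avg)                                    # per-channel coefficient
        tau = coef * abs_x.amax(dim=(2, 3), keepdim=True)      # threshold bounded by max |x|
        return soft_threshold(x, tau)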

3.3 Residual Module

Compared to conventional convolutional neural networks, deep residual networks adopt skip connections across layers. Each residual block consists of a direct mapping part x_l and a residual part F(x_l, w_l), establishing an effective connection between the input and output features. This enables the neural network to maintain its feature representation ability while deepening, effectively addresses the vanishing and exploding gradient problems caused by the increasing number of layers, and keeps the number of layer parameters and the computational complexity unchanged.


Residual networks are composed of a series of residual blocks. A residual block can be represented as:

x_{l+1} = x_l + F(x_l, w_l)   (6)

In this study, we enhanced the residual block within the residual network and developed a food image recognition network utilizing correlation attention. The original single 3 × 3 convolution layer in the residual block was replaced with a pyramid split attention (PSA) unit. The PSA module incorporated 3 × 3 and 1 × 1 convolution layers in sequential order, and all weight layers were linked to batch normalization and activation layers. A max pooling layer was introduced following the 3 × 3 convolution layer to decrease the convolution layer's position sensitivity and perform spatial downsampling. Simultaneously, a sub-channel attention network was integrated into the cross-layer identity connection to adaptively learn the weights between input feature channels. This network computed the soft threshold for feature map thresholding and essentially combined the spatial and channel attention employed in the residual block. This approach emphasized feature fusion and extraction in the channel and spatial dimensions while effectively reducing redundant information noise for each distinct sample.
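Putting the two attention units together, an improved residual block along the lines described above might look like the following hedged sketch; PSA and SoftThresholdGate refer to the earlier sketches, and the exact ordering, strides and normalization placement are assumptions rather than the authors' configuration.

import torch.nn as nn
import torch.nn.functional as F

class ImprovedResidualBlock(nn.Module):
    """Residual branch: PSA -> 3x3 conv -> max pool -> 1x1 conv (with BN + ReLU);
    identity branch: soft-threshold gate, pooled to match the spatial size."""
    def __init__(self, channels):
        super().__init__()
        self.psa = PSA(channels)                       # multi-scale spatial attention
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn3 = nn.BatchNorm2d(channels)
        self.pool = nn.MaxPool2d(2)                    # spatial downsampling
        self.conv1 = nn.Conv2d(channels, channels, 1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.gate = SoftThresholdGate(channels)        # channel attention / denoising

    def forward(self, x):
        out = F.relu(self.bn3(self.conv3(self.psa(x))))
        out = self.pool(out)
        out = F.relu(self.bn1(self.conv1(out)))
        identity = self.gate(F.max_pool2d(x, 2))       # keep both branches at the same size
        return F.relu(out + identity)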

3.4 Optimization Algorithm

The momentum gradient descent algorithm is modified based on the traditional gradient descent algorithm to address the slow convergence problem. The following modifications are made:

v_i = γ v_{i−1} + η ∇L(θ)   (7)

θ_i = θ_{i−1} − v_i   (8)

The core of the momentum method is the exponential weighted moving average algorithm, which is a method of estimating a time series using historical values and current observations. Using the momentum method is equivalent to considering the previous velocity every time parameters are updated. The magnitude of movement of each parameter in each direction depends not only on the current gradient but also on whether the previous gradients in each direction were consistent. If the gradient is constantly updated along the horizontal direction, this will aggregate highly aligned gradients, thereby increasing the distance we cover at each step. If the gradient is constantly changing in the vertical direction, the aggregated gradient will decrease the step size due to the oscillation canceling each other out. In this way, a larger learning rate can be used to converge faster, and the magnitude of each update in the direction of a large gradient will decrease due to the momentum effect.
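A tiny numerical illustration of Eqs. (7)-(8): with consistent gradients the accumulated velocity grows, so the effective step increases. The values of γ and η below are illustrative only; in PyTorch, torch.optim.SGD with a nonzero momentum argument implements a closely related update.

def momentum_step(theta, v, grad, gamma=0.9, eta=0.1):
    v = gamma * v + eta * grad      # Eq. (7): exponentially weighted velocity
    theta = theta - v               # Eq. (8): move against the accumulated velocity
    return theta, v

theta, v = 1.0, 0.0
for grad in (0.5, 0.5, 0.5):        # aligned gradients -> steps of growing magnitude
    theta, v = momentum_step(theta, v, grad)
    print(theta, v)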

4 Experimental Result and Analysis

4.1 Model Accuracy

4.1.1 Experimental Results

In this food image classification task, 80% of the samples from each class were allocated to the train set, while the remaining 20% comprised the test set. Given that this experiment involves single-label classification, accuracy served as the evaluation metric. Figure 5 illustrates the train and test accuracies of the residual network with fusion attention applied to food image data. It reveals that a fine-grained image recognition accuracy of 87.03% can be achieved with approximately 65 iterations. As depicted in Fig. 6, the test data loss value decreases to 0.15, suggesting that the model exhibits a strong generalization ability.

Fig.5. The train accuracy (red) and test accuracy (blue) of the experimental network.

Fig.6. The test loss of the experimental network

For the weight parameters of the conv1 layer within the layer1 module of the experimental network, the lower coordinates represent the distribution range of weight values, the right side depicts the frequency of occurrence of specific weight values, and the central curve illustrates the cumulative results of each training iteration. As demonstrated in Fig. 7, through iterative training the model's tensor weights are consistently and accurately updated and tend to be distributed near the 0 region.
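Weight histograms of this kind can be logged, for example, with TensorBoard; the following is a hedged sketch in which model, train_one_epoch, num_epochs and the layer path are hypothetical placeholders, not the experimental setup of the paper.

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/food-classifier")            # hypothetical log directory
for step in range(num_epochs):                            # hypothetical epoch count
    train_one_epoch(model)                                # hypothetical training routine
    w = model.layer1[0].conv1.weight                      # conv1 weights of the layer1 module
    writer.add_histogram("layer1/conv1.weight", w, step)  # weight-value distribution per epoch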

4.1.2 Model Comparison Experiment

The proposed experimental method was compared with the mainstream computer vision image classification methods. As shown in Table 1, the proposed network model based on the attention mechanism outperformed other classification models on the VireoFood-251 food image dataset, further validating its robustness.


Fig. 7. Histogram of weight parameters in the conv1 layer of the layer1 module

Table 1. Experimental results of different models on VireoFood-251

Method       Top-1/%   Top-5/%
VGG16        74.21     82.50
Resnet34     81.32     87.83
Resnet50     84.01     90.15
GoogLeNet    86.95     93.90
Ours         87.03     94.81

4.2 Ablation Experiments

In the food image classification ablation experiment, four sub-branch network structures were mainly compared:

(1) a ResNet model without modifications;
(2) a network without the PSA unit replacing conventional convolution;
(3) a network without the embedded soft thresholding;
(4) a residual network with fused relation attention.

The experimental outcomes for various sub-branch networks on the VireoFood-251 dataset are presented in Table 2. The results indicate that, when compared to the network without the PSA unit replacing the conventional convolution and the network without the embedded soft thresholding, the network incorporating both attention modules demonstrates a classification accuracy improvement of 2.79% and 4.53% on the dataset, respectively.


Table 2. Ablation experiment results of our method on VireoFood-251 dataset.

Method                               Top-1/%
Original model                       81.32
Without PSA sub-network              84.24
Without soft-threshold sub-network   82.50
Ours                                 87.03

5 Conclusion

In this study, we introduce a residual network featuring a correlation attention mechanism designed to optimize training for food image classification, addressing the challenges of low recognition accuracy due to high noise content in the fine-grained features of food images. Our experimental results demonstrate that our proposed method achieves high recognition accuracy and robust generalization capabilities. By incorporating multi-scale spatial attention and channel attention through soft thresholding in the enhanced residual block, we expand the feature space across multiple scales and address the limitation of conventional attention mechanisms, which only consider local features and do not establish long-range dependencies. Additionally, we employ the momentum gradient descent optimization algorithm to minimize oscillation and expedite model convergence. Our experimental outcomes outperform those of well-established convolutional neural network models such as ResNet34, ResNet50, VGG16, and GoogLeNet. However, to be suitable for practical applications, further improvements in accuracy are necessary. Future research could explore the applicability of this network design to other foodrelated objects, including ingredients, food status, or dining scenes. Acknowledgement. This work was supported by the National Natural Science Foundation of China (Grant No. 61902337), Xuzhou Science and Technology Plan Project (KC21047), Jiangsu Provincial Natural Science Foundation (No. SBK2019040953), Natural Science Fund for Colleges and Universities in Jiangsu Province (No. 19KJB520016) and Young Talents of Science and Technology in Jiangsu and ghfund202302026465, Basic Science Major Foundation (Natural Science) of the Jiangsu Higher Education Institutions of China (Grant: 22KJA520012), the Xuzhou Science and Technology Plan Project (Grant:KC22305).

References

1. Yang, S., Chen, M., Pomerleau, D., Sukthankar, R.: Food recognition using statistics of pairwise local features. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2249–2256 (2010)
2. Mohandoss, D.P., Shi, Y., Suo, K.: Outlier prediction using random forest classifier. In: 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC), NV, USA, pp. 0027–0033 (2021)
3. Bucher, T., van der Horst, K., Siegrist, M.: The fake food buffet – a new method in nutrition behaviour research. Br. J. Nutr. 107(10), 1553–1560 (2012)
4. Mezgec, S., Eftimov, T., Bucher, T., Koroušić Seljak, B.: Mixed deep learning and natural language processing method for fake-food image recognition and standardization to help automated dietary assessment. Public Health Nutrition 22(7), 1193–1202 (2019)
5. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst., 1097–1105 (2012)
6. Kagaya, H., Aizawa, K.: Highly accurate food/non-food image classification based on a deep convolutional neural network. In: International Conference on Image Analysis and Processing, pp. 350–357 (2015)
7. Yanai, K., Kawano, Y.: Food image recognition using deep convolutional network with pre-training and fine-tuning. In: 2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pp. 1–6 (2015)
8. Ozsert Yigit, G., Özyildirim, B.M.: Comparison of convolutional neural network models for food image classification. J. Inf. Telecommun., 1–11 (2018)
9. Kawano, Y., Yanai, K.: Food image recognition with deep convolutional features. In: 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, pp. 589–593 (2014)
10. Setyono, N.F.P., Chahyati, D., Fanany, M.I.: Betawi traditional food image detection using ResNet and DenseNet. In: 2018 International Conference on Advanced Computer Science and Information Systems (ICACSIS), Yogyakarta, Indonesia, pp. 441–445 (2018)
11. He, L., Cai, Z., Ouyang, D., Bai, H.: Food recognition model based on deep learning and attention mechanism. In: 2022 8th International Conference on Big Data Computing and Communications (BigCom), Xiamen, China, pp. 331–341 (2022)
12. Xiang, X., Zhai, M., Zhang, R., Lv, N., El Saddik, A.: Optical flow estimation using spatial-channel combinational attention-based pyramid networks. In: 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, pp. 1272–1276 (2019)
13. AbuSalim, S., Zakaria, N., Mokhtar, N., Mostafa, S.A., Abdulkadir, S.J.: Data augmentation on intra-oral images using image manipulation techniques. In: 2022 International Conference on Digital Transformation and Intelligence (ICDI), Kuching, Sarawak, Malaysia, pp. 117–120 (2022)
14. Wen, Q., et al.: Time series data augmentation for deep learning: a survey, pp. 4653–4660 (2021)
15. Ishida, N., Nagatsu, Y., Hashimoto, H.: Unsupervised anomaly detection based on data augmentation and mixin. In: IECON 2020 The 46th Annual Conference of the IEEE Industrial Electronics Society, Singapore, pp. 529–533 (2020)
16. Zhang, H., Zu, K., Lu, J., et al.: EPSANet: an efficient pyramid squeeze attention block on convolutional neural network (2021)
17. Li, J., Cheng, N.: SEDCN: an improved deep & cross network recommendation algorithm based on SENET. In: 2022 IEEE/ACIS 22nd International Conference on Computer and Information Science (ICIS), Zhuhai, China, pp. 218–222 (2022)

BYOL Network Based Contrastive Clustering Xuehao Chen1 , Weidong Zhou2 , Jin Zhou1(B) , Yingxu Wang1 , Shiyuan Han1 , Tao Du1 , Cheng Yang1 , and Bowen Liu1 1 School of Information Science and Engineering, University of Jinan, Jinan 250022, China

[email protected] 2 School of Microelectronics, Shandong University, Jinan 250100, China

Abstract. This paper introduces a new clustering approach called BYOL network-based Contrastive Clustering (BCC). The methodology builds on the BYOL framework, which consists of two co-optimized networks: the online and target networks. The online network aims to predict the outputs of the target network while maintaining the similarity relationship between views. The target network is stop-gradient and is updated only by an exponential moving average (EMA) of the online network. Additionally, the study incorporates the concept of adversarial learning into the approach to further refine the cluster assignments. The effectiveness of BCC is demonstrated on several mainstream image datasets, achieving impressive results without the need for negative samples or a large batch size. This research showcases the feasibility of using the BYOL architecture for clustering and proposes a novel clustering method that eliminates the problems brought by negative samples and reduces the computational complexity. Keywords: Deep Clustering · Contrastive Learning · Unsupervised Learning · Adversarial Learning

1 Introduction

Clustering is a widely used technique in machine learning that aims to group data objects based on their similarity [1], and it plays a crucial role in various fields, including data mining [2], statistical analysis [3], and pattern recognition [4]. Over the years, many clustering methods have been developed to uncover the underlying structures and features of data. However, with the increasing amount of high-dimensional data in the big data era, traditional clustering methods face significant challenges due to their limited representability. To address this issue, researchers have explored dimensionality reduction and feature extraction techniques to map the raw data into a new feature space that is easier to classify. Despite their usefulness, traditional data transformation methods such as principal component analysis [5], kernel methods [6], and spectral methods [7] are computationally expensive and struggle to handle large-scale data. While some random projection and random feature methods can produce low-dimensional features and better approximations of user-specified kernels, these shallow models have limited feature representation capabilities.


In recent years, there has been extensive research on deep learning based on neural networks, which aims to discover effective data representations. Among these methods, deep clustering [8] has shown great promise and outstanding performance by utilizing deep neural networks in the field of unsupervised clustering. Deep clustering methods can be generally divided into two categories: generative models and discriminative models. Generative models establish a distribution or latent embedding of the data to learn representation features and perform clustering on the representation in an end-to-end manner. Popular generative clustering methods include autoencoder (AE) based clustering methods, variational autoencoder (VAE) based clustering methods, and generative adversarial network (GAN) based clustering methods. However, all generative clustering methods require computationally expensive high-detailed data generation, which may not be necessary for effective representation learning and clustering. Discriminative models differ from generative models by removing the computationally expensive data generation step while focus on obtaining a discriminative representation using contrastive learning to improve downstream clustering tasks. SimCLR [9] uses the representations views of samples to maximize the similarity between views from the same sample and minimize the similarity between views from different samples to optimize the representation. Based on this concept, some researchers suggest a twostage approach where representation learning, and clustering are separate. SCAN [10] is a two-step clustering method that first identifies the nearest neighbors of each image using feature similarity pre-trained by SimCLR. This process helps identify semantically meaningful information from images and integrates the nearest neighbors as prior knowledge into a learnable approach for optimizing the cluster network. Recent methods in deep learning focus on improving the discriminative power of neural networks through customized losses or pre-training stages. However, these methods lack guidance specifically geared towards clustering, leading to suboptimal clustering performance. To address this issue, newer discriminative models have been developed that incorporate clustering-oriented information into the contrastive learning structure. These models aim to learn a representation space in a SimCLR manner to maintain instance-level relationship. Additionally, the cluster-level relationship is optimized by maximizing the similarities between the same cluster and minimizing those that are not. By combining instance-level and cluster-level clustering, contrastive-based clustering methods can maintain inner and outer cluster relationships simultaneously, allowing for the discovery of cluster-oriented information. Contrastive Clustering (CC) [11] is a clustering method that is inspired by SimCLR, which separates the instance-level and cluster-level contrastive learning into two distinct subspaces. While discriminative models have demonstrated impressive clustering results, they often require a high number of negative pairs to capture uniform representations, which increases computational complexity and necessitates large batch sizes. Additionally, the use of negative pairs results in different instances from the same cluster being considered as negative pairs and being wrongly pushed away, leading to cluster collision. 
Recently, self-supervised representation learning methods that do not rely on negative pairs have been proposed and achieve comparable or even superior performance to state-of-the-art contrastive methods.


In order to enhance the coherence of positive pairs, BYOL [12] uses an augmented view of an image to train an online network to predict the target network representation of the same image under a different augmented view. However, due to the absence of negative pairs in contrastive learning-based clustering, these self-supervised representation learning methods may fail to capture uniform representations, which can lead to the collapse of clustering, wherein all data samples are assigned to a few clusters rather than the desired number. BYOL addresses this issue by creating two alternately optimized networks, in which the online network is updated by predicting the target network and the target network is updated as a moving average of the online network's weights. This paper presents a novel end-to-end approach called BYOL network-based end-to-end Contrastive Clustering (BCC) to address the limitations of existing methods. BCC is inspired by the idea of "cluster assignment as representation" and modifies the BYOL network by adding a Softmax layer to capture the cluster assignment of data. This approach is designed to learn the cluster assignment in an end-to-end manner. In addition, this paper integrates adversarial learning into the BYOL network to improve discrimination between different clusters and mitigate the problem of collapse. To reduce the high interactivity dependency between the online and target networks, a self-improvement loss is introduced to evaluate the similarity of the cluster assignments of all positive pairs within a mini-batch across the online network.

2 BYOL Network-Based End-to-End Contrastive Clustering

This section proposes a novel solution to the problem that self-supervised representation learning models such as BYOL, which lack negative pairs, struggle to learn uniform representations, resulting in the collapse of clustering. To overcome this issue, we introduce the BYOL network-based end-to-end Contrastive Clustering (BCC) model, which comprises two components: the contrastive network and the discriminative network. The contrastive network seeks the alignment of positive pairs to learn good cluster assignments, while the discriminative network imposes a one-hot-style prior distribution on the learned cluster assignments to perform the clustering task. Additionally, the BCC model integrates adversarial learning to enhance cluster discrimination and minimize the interactivity dependency between the online and target networks. A self-improvement loss is introduced to measure the similarity of the cluster assignments, and negative pairs are incorporated to achieve uniform representations. The efficacy of the proposed models is demonstrated through experiments on various image datasets with different levels of detail. The framework of the BCC model is illustrated in Fig. 1.

2.1 The Contrastive Network for Representation Learning

The representation learning process in BCC involves a contrastive network comprising two networks: the online network and the target network. The online network, defined by an encoder f_ξ with parameters ξ, is responsible for extracting the features of the data representation, which are then transformed into cluster assignments by the Softmax layer S_ξ. The target network has a similar structure to the online network but uses different parameters θ.


Fig. 1. The framework of the BCC model.

In a mini-batch of size N containing the data set X = {x_i | 1 ≤ i ≤ N} ∈ R^(N×D) with dimension D, two correlated views of the data, X^a = {x_i^a | 1 ≤ i ≤ N} and X^b = {x_i^b | 1 ≤ i ≤ N}, are obtained using data augmentations. The online network generates the cluster assignment Z^a = {z_i^a | 1 ≤ i ≤ N} ∈ R^(N×K) from the first augmented view X^a, while the target network generates the cluster assignment Ẑ^b = {ẑ_i^b | 1 ≤ i ≤ N} ∈ R^(N×K) from the second augmented view X^b, where the number of clusters is denoted as K. The aim of contrastive learning is to optimize the similarity between positive pairs and facilitate the mutual improvement of the online and target networks. In contrast to BYOL's use of the cosine distance metric, the Kullback-Leibler (KL) divergence is employed to calculate the similarity between the cluster assignments of positive pairs, as it is better suited to measuring the dissimilarity between two probability distributions. The mutual improvement of the contrastive network is achieved through the loss function (1):

L_mi = KL(Z^a, Ẑ^b)   (1)

We symmetrize the loss L_mi by separately feeding X^b into the online network and X^a into the target network to compute L̃_mi = KL(Z^b, Ẑ^a), and the final mutual-improvement loss of BCC is denoted as (2):

L_mi^BCC = L_mi + L̃_mi = KL(Z^a, Ẑ^b) + KL(Z^b, Ẑ^a)   (2)

The contrastive network consists of two interconnected networks, and if either one is not optimized well, the entire structure may suffer deterioration. Specifically, the clustering process that follows may ruin the feature space and compromise the preservation of local structure. To address this issue, we propose a new loss function called the self-improvement loss, which measures the similarity of cluster assignments across the online network itself and aims to reduce the high interdependence between the online and target networks. The self-improvement loss is defined in Eq. (3):

L_si^BCC = KL(Z^a, Z^b)   (3)
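A minimal sketch of the mutual- and self-improvement losses in Eqs. (1)-(3), treating each row of the cluster-assignment matrices as a probability distribution over the K clusters; averaging over the mini-batch is an assumption about the reduction used.

import torch

def kl(p, q, eps=1e-8):
    # KL(p || q) averaged over the mini-batch; p and q are (N, K) soft assignments
    return (p * (torch.log(p + eps) - torch.log(q + eps))).sum(dim=1).mean()

def bcc_losses(z_a, z_b, z_hat_a, z_hat_b):
    l_mi = kl(z_a, z_hat_b) + kl(z_b, z_hat_a)   # Eq. (2): symmetric mutual improvement
    l_si = kl(z_a, z_b)                          # Eq. (3): self-improvement of the online network
    return l_mi, l_si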


2.2 The Discriminative Network for Data Clustering

In BCC, the discriminative network D(·) with parameters η is constructed for data clustering. Given the data X in a mini-batch, we input its two augmented views X^a and X^b to the online network and output the corresponding cluster assignments Z^a and Z^b. Then a one-hot-style prior distribution P ~ Cat(K, p = 1/K) is imposed on the learned cluster assignments Z (standing for either Z^a or Z^b), and adversarial learning between Z and P is conducted to push Z closer to a one-hot form, so as to enhance the discrimination between clusters and alleviate the collapse problem. Referring to the WGAN-GP method, the adversarial losses of the discriminative network for the generator, L_Adv-G^BCC, and the discriminator, L_Adv-D^BCC, are defined as (4) and (5), respectively:

L_Adv-G^BCC = −E_{z~Z}[D(z)]   (4)

L_Adv-D^BCC = E_{z~Z}[D(z)] − E_{p~P}[D(p)] + δ E_{r~R}[(‖∇_r D(r)‖_2 − 1)^2]   (5)

where r = εp + (1 − ε)z with ε ~ U[0, 1] is a representation sampled uniformly along straight lines between the prior distribution P and the soft assignments Z, (‖∇_r D(r)‖_2 − 1)^2 is the one-centered gradient penalty that constrains the gradient of the discriminative network to be around 1, and δ is the gradient penalty coefficient.

2.3 Training of the BCC

Integrating the contrastive and the discriminative networks, the final loss function of BCC is defined as (6):

L^BCC = L_Adv-G^BCC + α_mi · L_mi^BCC + α_si · L_si^BCC   (6)

where α_mi and α_si are parameters that trade off the importance of the loss terms. Stochastic gradient descent (SGD) is utilized to optimize the parameters of the contrastive network and the discriminative network simultaneously. It is worth noting that the optimization of the contrastive network is performed to minimize L^BCC only with respect to the online network, not the target network, as depicted by the stop-gradient in Fig. 1. Therefore, the parameters of the online network ξ are updated by (7), and similarly the parameters of the discriminative network η are updated by (8):

ξ = ξ − α ∂L^BCC/∂ξ   (7)

η = η − α ∂L_Adv-D^BCC/∂η   (8)

where α is the learning rate. Referring to BYOL, the parameters of the target network θ are a weighted moving average of the online parameters ξ and can be updated by (9):

θ ← τθ + (1 − τ)ξ   (9)

where τ ∈ [0, 1] is a target decay rate that controls the moving rate. The algorithm of BCC is as follows.
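Since the pseudo-code listing itself is not reproduced here, the following is an illustrative Python sketch (not the authors' code) assembling Eqs. (4)-(9) into one training step; online_net, target_net, disc, the optimizers and sample_one_hot_prior are hypothetical placeholders, δ = 10 follows the hyper-parameter settings reported later, and bcc_losses refers to the sketch given after Eq. (3).

import torch

def generator_loss(disc, z):
    return -disc(z).mean()                                        # Eq. (4)

def discriminator_loss(disc, z, p, delta=10.0):
    eps = torch.rand(z.size(0), 1, device=z.device)
    r = (eps * p + (1 - eps) * z).detach().requires_grad_(True)   # interpolate prior P and assignments Z
    grad = torch.autograd.grad(disc(r).sum(), r, create_graph=True)[0]
    gp = ((grad.norm(2, dim=1) - 1) ** 2).mean()                  # one-centered gradient penalty
    return disc(z).mean() - disc(p).mean() + delta * gp           # Eq. (5)

def train_step(x_a, x_b, online_net, target_net, disc, opt_online, opt_disc,
               alpha_mi=2.0, alpha_si=1.0, tau=0.99):
    z_a, z_b = online_net(x_a), online_net(x_b)
    with torch.no_grad():                                         # stop-gradient target network
        zh_a, zh_b = target_net(x_a), target_net(x_b)
    l_mi, l_si = bcc_losses(z_a, z_b, zh_a, zh_b)                 # Eqs. (2)-(3)
    loss = generator_loss(disc, z_a) + alpha_mi * l_mi + alpha_si * l_si   # Eq. (6)
    opt_online.zero_grad(); loss.backward(); opt_online.step()    # Eq. (7): update xi

    p = sample_one_hot_prior(z_a.size(0), z_a.size(1))            # hypothetical helper drawing from P
    d_loss = discriminator_loss(disc, z_a.detach(), p)
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()      # Eq. (8): update eta

    for t_p, o_p in zip(target_net.parameters(), online_net.parameters()):
        t_p.data.mul_(tau).add_((1 - tau) * o_p.data)             # Eq. (9): EMA target update
    return loss.item(), d_loss.item()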



3 Experiments 3.1 Implementation Detail Image Augmentations. Our method examines three distinct types of data augmentation methods in our experiments. These include a horizontal flip to reverse the image along the horizontal axis, color distortion to modify aspects such as brightness, contrast, saturation, and hue of the image, and grayscale conversion to shift the image to a grayscale format. Compared Datasets. The proposed method was analyzed on seven image datasets that are classified into two types. The first type comprises low-detail grayscale images such as MNIST and Fashion-MNIST. The second type consists of high-detail color images that contain interfering objects in the background, such as CIFAR-10, ImageNet-10, and STL-10. Table 1 offers a short summary of these datasets. Table 1. A summary of datasets used for the experiments. Datasets

Samples size

Classes

Image size

MNIST

70,000

10

28 × 28 × 1

Fashion-MNIST

70,000

10

28 × 28 × 1

CIFAR-10

60,000

10

32 × 32 × 3

ImageNet-10

13,000

10

96 × 96 × 3

STL-10

13,000

10

96 × 96 × 3

Network Structure. In BCC, ResNet-18 is utilized to obtain the representation for the contrastive network. The representation is then converted into cluster assignments through a Softmax layer with a dimension corresponding to the number of clusters K. On the other hand, the discriminative network of BCC employs a three-layer fully connected network to allocate data samples to different clusters, with each layer having dimensions of K-1024–512-1. Hyperparameters. Hyperparameters are a crucial component of the proposed deep clustering models. In BCC, we utilize the Adam optimizer with a learning rate of 0.0003. Referring to BYOL, we set the moving average parameter in (1) to 0.99. We set the gradient penalty coefficient in (6) for the discriminative network to 10 and the default batch size to 64. For αmi , αsi in the loss function of BCC, the grid search technique is adopted to search the optimal value of these hyper-parameters, and the recommended values on different datasets are listed in Table 2. 3.2 Comparisons Methods In our experiments, we utilized several common clustering methods that fall into four categories: (1) Traditional clustering methods such as K-means, SC, AC, and NMF, which primarily cluster based on sample distance. (2) Deep generative clustering methods, including AE-based clustering, VAE-based clustering, and GAN-based clustering.

712

X. Chen et al. Table 2. Recommended values of the hyper-parameters on different datasets.

Method

Parameter

MNIST Fashion-MNIST

CIFAR-10 CIFAR-100 ImageNet-10 Tiny-ImageNet

STL-10

BCC

αmi

2

4

5

αsi

1

2

2

(3) Contrastive learning-based clustering, which comprises IIC [13], DCCM [14], DCCS [15], DHOG [16], and GATCluster [17]. It is worth to note that for SC, NMF, AE, GAN, and VAE, the clustering results are obtained through k-means on the features extracted from images. Table 3. The comparison results with other clustering methods in multiple datasets. Method

MNIST

Fashion-MNIST

CIFAR-10

STL-10

ImageNet-10

ACC

NMI

ARI

ACC

NMI

ARI

ACC

NMI

ARI

ACC

NMI

ARI

ACC

NMI

ARI

K-means

0.572

0.500

0.365

0.474

0.512

0.348

0.229

0.087

0.049

0.192

0.125

0.061

0.241

0.119

0.057

SC

0.696

0.663

0.521

0.508

0.575

0.382

0.247

0.103

0.085

0.159

0.098

0.048

0.274

0.151

0.076

AC

0.695

0.609

0.481

0.500

0.564

0.371

0.228

0.105

0.065

0.332

0.239

0.140

0.242

0.138

0.067

NMF

0.545

0.608

0.430

0.434

0.425

0.321

0.190

0.081

0.034

0.180

0.096

0.046

0.230

0.132

0.065

AE

0.812

0.725

0.613

0.563

0.561

0.379

0.314

0.239

0.169

0.303

0.250

0.161

0.317

0.210

0.152

DEC

0.843

0.772

0.741

0.590

0.601

0.446

0.301

0.257

0.161

0.359

0.276

0.186

0.381

0.282

0.203

VAE

0.945

0.876

0.884

0.578

0.630

0.542

0.291

0.245

0.167

0.282

0.200

0.146

0.381

0.282

0.203

DEPICT

0.965

0.917

0.094

0.392

0.392

0.357

0.279

0.237

0.171

0.312

0.229

0.166

0.363

0.242

0.197

GAN

0.736

0.763

0.827

0.558

0.584

0.631

0.315

0.265

0.176

0.298

0.210

0.139

0.346

0.225

0.157

DAC

0.978

0.935

0.949

0.615

0.632

0.502

0.522

0.396

0.306

0.470

0.366

0.257

0.527

0.394

0.302

DCCS

0.989

0.970

0.976

0.756

0.704

0.623

0.656

0.569

0.469

0.536

0.49

0.362

0.737

0.640

0.560

DCCM

0.982

0.951

0.954

0.753

0.684

0.602

0.623

0.496

0.408

0.482

0.376

0.262

0.71

0.608

0.555

DHOG

0.954

0.921

0.917

0.658

0.632

0.534

0.666

0.585

0.492

0.483

0.413

0.272

-

-

-

GATCluster

0.943

0.896

0.887

0.618

0.614

0.522

0.610

0.475

0.402

0.583

0.446

0.363

0.739

0.594

0.552

IIC

0.992

0.979

0.978

0.657

0.634

0.524

0.617

0.513

0.411

0.499

0.431

0.295

0.701

0.598

0.549

BCC (ours)

0.996

0.982

0.979

0.761

0.706

0.627

0.685

0.602

0.502

0.654

0.565

0.458

0.853

0.746

0.713

Table 3 shows that BCC outperformed other methods on MNIST and FashionMNIST datasets by 0.4% and 0.5%, respectively. It also achieved a significant improvement of 1.9%, 7.1%, and 11.4% on the CIFAR-10, STL-10, and ImageNet-10 datasets when compared to recent contrastive-based clustering methods. Our analysis of the results is as follows: 1. Compared with traditional clustering methods that mainly perform clustering based on the distance information between samples, deep clustering can capture the semantic information of samples via deep neural network.

BYOL Network Based Contrastive Clustering

713

2. Compared with AE-based, VAE-based, and GAN-based clustering, our contrastive network focuses on extracting the similarity and dissimilarity between different views of samples, which enables the network to capture more clustering-related information. 3. Compared to contrastive-based clustering, we have removed the negative samples to decrease the computational complexity of the loss function. Additionally, we have incorporated a discriminative network to sharpen the cluster assignment. 3.3 Analysis of Batch Size Contrastive-based clustering methods, particularly those that utilize negative samples drawn from the minibatch, suffer reduced accuracy as batch size is decreased. To investigate the effect of batch size on our methods, we trained them separately using batch sizes of 256, 128, and 64, and the results are presented in Fig. 2. Our methods are represented by solid lines.

Fig. 2. The relationship between batch size and accuracy on MNIST and ImageNet-10 datasets.

According to Fig. 2, our method is not significantly affected by batch size. It maintains a consistent level of accuracy across all datasets while using smaller batch sizes. Consequently, BCC not only eliminates the need for large GPU memory during training but also resolves the issue of an imbalance in the numbers of positive and negative pairs.

4 Conclusion This paper presents a clustering method that merges BYOL network and contrastive learning. Our approach showcases the possibility to conduct clustering without negative samples and large batch sizes. By integrating the BYOL structure and discriminator into the optimization process, our method, achieves state-of-the-art performance on multiple datasets, including MNIST, Fashion-MNIST, CIFAR-10, STL-10, and ImageNet-10. By removing negative samples to reduce computational complexity and incorporating a discriminative network to refine cluster assignments, we overcome limitations of contrastive-based clustering while maintaining high accuracy. Our results demonstrate the potential of our approach to advance the field of clustering and its applications, offering a fresh perspective on contrastive learning-based clustering.

714

X. Chen et al.

Acknowledgment. This work was supported in part by the National Natural Science Foundation of China under Grant No. 62273164 and the Joint Fund of Natural Science Foundation of Shandong Province under Grant No. ZR2020LZH009.

References 1. Yang, M.S., Wu, K.L.: A similarity-based robust clustering method. IEEE Trans. Pattern Anal. Mach. Intell. 26(4), 434–448 (2004) 2. Atilgan, C., Nasibov, E.N.: A space efficient minimum spanning tree approach to the fuzzy joint points clustering algorithm. IEEE Trans. Fuzzy Syst. 27(6), 1317–1322 (2018) 3. Dixon, W.J., Massey, Jr. F.J.: Introduction to statistical analysis. New York, NY, USA: McGraw-Hill, 344, (1951) 4. Horn, D., Gottlieb, A.: Algorithm for data clustering in pattern recognition problems based on quantum mechanics. Phys. Rev. Lett. 88(1), 018702 (2001) 5. Abdi, H., Williams, L.J.: Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2(4), 433–459 (2010) 6. Hofmann, T., Schölkopf, B., Smola, A.J.: Kernel methods in machine learning. Ann. Stat. 36(3), 1171–1220 (2008) 7. Bernardi, C., Maday, Y.: Spectral methods. Handbook of numerical analysis 5, 209–485 (1997) 8. Song, C., Liu, F., Huang, Y., Wang, L., Tan, T.: Auto-encoder based data clustering. In: Ruiz-Shulcloper, J., Sanniti di Baja, G. (eds.) CIARP 2013. LNCS, vol. 8258, pp. 117–124. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41822-8_15 9. Chen, T., Kornblith, S., Norouzi, M., et al.: A simple framework for contrastive learning of visual representations. Int. Conf. Mach. Learn., 1597–1607 (2020) 10. Van Gansbeke, W., Vandenhende, S., Georgoulis, S., Proesmans, M., Van Gool, L.: SCAN: learning to classify images without labels. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12355, pp. 268–285. Springer, Cham (2020). https://doi.org/ 10.1007/978-3-030-58607-2_16 11. Li, Y., Hu, P., Liu, Z., et al.: Contrastive clustering. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, issue 10, pp. 8547–8555 (2021) 12. Grill, J.B., Strub, F., Altché, F., et al.: Bootstrap your own latent-a new approach to selfsupervised learning. Adv. Neural. Inf. Process. Syst. 33, 21271–21284 (2020) 13. Ji, X., Henriques, J.F., Vedaldi, A.: Invariant information clustering for unsupervised image classification and segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9865–9874 (2019) 14. Wu, J., Long, K., Wang, F., et al.: Deep comprehensive correlation mining for image clustering. In: Proceedings of the IEEE/CVF international conference on computer vision, pp. 8150–8159 (2019) 15. Zhao, J., Lu, D., Ma, K., Zhang, Y., Zheng, Y.: Deep image clustering with category-style representation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12359, pp. 54–70. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58568-6_4 16. Darlow, L.N., Storkey, A.: Dhog: deep hierarchical object grouping. arXiv preprint arXiv: 2003.08821 (2020) 17. Niu, C., Zhang, J., Wang, G., Liang, J.: GATCluster: self-supervised gaussian-attention network for image clustering. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12370, pp. 735–751. Springer, Cham (2020). https://doi.org/10.1007/9783-030-58595-2_44

Deep Multi-view Clustering Based on Graph Embedding Chen Zhang1 , Weidong Zhou2 , Jin Zhou1(B) , Yingxu Wang1 , Shiyuan Han1 , Tao Du1 , Cheng Yang1 , and Bowen Liu1 1 School of Information Science and Engineering, University of Jinan, Jinan 250022, China

[email protected] 2 School of Microelectronics, Shandong University, Jinan 250100, China

Abstract. In recent decades, multi-view clustering has garnered more attention in the fields of machine learning and pattern recognition. Traditional multi-view clustering algorithms typically aim to identify a common latent space among the multi-view data and subsequently utilize k-means or spectral clustering techniques to derive clustering outcomes. These methods are time- and space-consuming and split feature extraction from clustering. To address these problems, deep multi-view clustering based on graph embedding is proposed in this paper. First, multiple autoencoders are used to mine complementary information from multiple views and simultaneously seek the common latent representations. In addition, both the validity of the nearest-neighbor correlation and the local geometric structure of multi-view data are taken into account, and a novel graph embedding scheme is proposed. Specifically, the affinity graph information of the original data is directly applied to the soft assignment of the data, which is consistent with the clustering loss, thus improving the performance of multi-view clustering. Numerous experiments conducted on various datasets exhibit the efficacy of our algorithm. Keywords: Graph Embedding · Multi-view Clustering · Deep Clustering

1 Introduction

Clustering is a crucial branch of machine learning that holds significance in various fields such as data mining and pattern recognition [1]. As a typical unsupervised method, clustering classifies data samples into different classes according to their degree of association. In recent decades, several classic clustering models have been developed, such as k-means [2] and spectral clustering [3]. These methods are feasible for single-view clustering, but cannot effectively handle data with multiple information sources. With the increasing ability of humans to collect data, single-view data with different features and different sources gradually evolve into multi-view data consisting of multiple feature sets. To address this challenge, researchers have developed a technique called multi-view clustering, which leverages the complementarity and correlation of multiple views to enhance clustering performance. Currently, most multi-view clustering methods aim to integrate different views into a common representation, which is crucial for the ensuing clustering task. But uniting multiple views to explore the common representation is


a long-standing challenge due to the complex correlation between different views. To cope with this challenge, subspace methods, kernel methods and graph learning methods have been used to learn common representations and have produced many excellent multi-view clustering methods [4]. The era of big data has brought about serious challenges for traditional clustering methods, particularly with respect to large-scale and high-dimensional data. As a result, how to deal with massive and high-dimensional data quickly and accurately has become the focus of research. Although some dimensionality reduction techniques such as principal component analysis [5] have been successfully used to convert data into low-rank feature spaces, the representation capability of such shallow models is limited. Recently, deep clustering methods have gained considerable attention thanks to the powerful feature extraction and nonlinear dimensionality reduction capabilities of neural networks. Among them, as one of the most classical neural network models, the autoencoder obtains the latent representation by minimizing the reconstruction error [6]. The combination of autoencoders and clustering has produced a series of excellent deep clustering algorithms. Earlier work on deep clustering typically performed feature extraction and clustering as two separate stages. While this two-stage approach yielded better features by optimizing the neural network, the clustering process was not directly optimized and thus had limited impact on clustering performance. Different from the above methods, DEC [7] and IDEC [8] perform feature extraction and clustering simultaneously. Recently, inspired by deep single-view clustering methods, numerous deep multi-view clustering algorithms have been proposed, such as deep multi-view collaborative clustering [9], contrastive learning based deep multi-view clustering [10], and deep multi-view subspace clustering [11]. While deep multi-view clustering has shown remarkable performance, it tends to prioritize global structure at the expense of local structure, potentially resulting in the misclustering of boundary data. To better mine the structural information of multi-view data, graph embedding methods have been widely studied. Graph embedding aims to represent the nodes of a graph in a low-dimensional vector space while preserving the topology and node information of the graph to facilitate subsequent tasks. Thus, graph embedding can be used as prior knowledge to mine the nearest-neighbor relationships between data and serve as a guide for clustering. Based on this approach, [12] utilizes graph affinity information to enhance clustering accuracy and preserve important local structural information. However, the graph affinity information in [12] is constructed from latent representations, which can lead to a lack of direct correlation between the original data and the membership. To solve the above problem, this paper develops a new deep multi-view clustering model based on graph embedding (G-DMC). We propose an autoencoder-based multi-view learning framework, which can integrate information from multiple views into a common latent representation. Furthermore, through the incorporation of both local and global structure learning, G-DMC can fully leverage all available information from the original multi-view data.
A comprehensive set of experiments is conducted on five datasets, and the results prove the efficacy of the proposed G-DMC algorithm when compared to several advanced multi-view clustering methods.


The rest of this paper is organized as follows. Section 2 details the algorithm principles, loss function, and training procedures of G-DMC. In Sect. 3, we show the results of our experiments, which prove the superiority of the G-DMC algorithm. The conclusions are given in Sect. 4.

2 The Proposed Approach A multi-view data representation contains rich information from multiple sources, and it is assumed that the multiple views originate from a common latent representation that can effectively describe the data and uncover the common structure among views. We learn the common latent representation and the complex relationships between the views with the help of autoencoders, and feed the latent representation into a clustering network to predict the multi-view clustering results. Our model aims to find a common latent representation H to discover common clustering patterns of the multi-view data for predicting multi-view clustering results. The model structure of the algorithm proposed in this paper is shown in Fig. 1.

Fig. 1. The framework of G-DMC. It consists of V encoders, V decoders, a clustering layer, and an affinity matrix used to guide the clustering.

The objective function of G-DMC is defined as (1).

$$L_{G\text{-}DMC} = L_{Re} + \alpha L_{KL} + \beta L_{Gr} = \sum_{v=1}^{V} \| X^{(v)} - \hat{X}^{(v)} \|_2^2 + \alpha \sum_{i=1}^{N} \sum_{k=1}^{K} p_{ik} \log \frac{p_{ik}}{q_{ik}} + \beta \sum_{i=1}^{N} \sum_{j \in NB_i} \sum_{k=1}^{K} s_{ij} (q_{ik} - q_{jk})^2 \quad (1)$$


The total loss function contains three components: the reconstruction loss $L_{Re}$, the clustering loss $L_{KL}$ and the graph embedding loss $L_{Gr}$. $\alpha$ and $\beta$ are parameters to balance the losses. 2.1 Reconstruction Loss Given $N$ data samples $X^{(v)} = [x_1^{(v)}, ..., x_N^{(v)}]$, $v \in \{1, 2, ..., V\}$ from $V$ different views, $H = \{h_1, h_2, ..., h_N\}$ is the common latent representation learned jointly by the $V$ encoders. In order to obtain the shared information of all views and maintain the local structure of each view, we construct $V$ parallel networks. Firstly, $V$ encoders are used to map the multi-view data to the common latent space, and then $V$ decoders are used to reconstruct the information of each view from the common latent representation. The reconstruction loss is usually measured by MSE, as (2).

$$L_{Re} = \sum_{v=1}^{V} \| X^{(v)} - \hat{X}^{(v)} \|_2^2 \quad (2)$$

where $\hat{X}^{(v)}$ denotes the reconstruction of the $v$-th view. 2.2 Clustering Loss The KL divergence is a classic loss function that measures the similarity between two probability distributions, and it finds wide applications in tasks such as clustering and classification. Similar to IDEC, the KL divergence between the membership distribution $Q$ and the target distribution $P$ is used to calculate the clustering loss in G-DMC. The specific formula is as (3).

$$L_{KL} = KL(P \| Q) = \sum_{i} \sum_{j} p_{ij} \log \frac{p_{ij}}{q_{ij}} \quad (3)$$

$$q_{ik} = \frac{(1 + \| h_i - c_k \|^2)^{-1}}{\sum_{j} (1 + \| h_i - c_j \|^2)^{-1}} \quad (4)$$

$$p_{ik} = \frac{q_{ik}^2 / \sum_{i} q_{ik}}{\sum_{k'} \left( q_{ik'}^2 / \sum_{i} q_{ik'} \right)} \quad (5)$$

where H = {h1 , h2 , ..., hN } is the latent representation learned by the autoencoders, which is used to calculate the membership distribution Q. The membership distribution is then compared to an auxiliary target distribution P, which is derived from Q, to calculate the clustering loss. This process encourages the learned latent representation to be discriminative for clustering, while also maintaining good reconstruction quality of the original data.
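To make the computation above concrete, the following is a minimal PyTorch-style sketch of the soft assignment, the auxiliary target distribution, and the KL clustering loss of Eqs. (3)–(5). The function names are illustrative and not part of the original method description.

```python
import torch

def soft_assignment(h, centers):
    # Eq. (4): q_ik = (1 + ||h_i - c_k||^2)^(-1), normalized over the clusters.
    dist_sq = torch.cdist(h, centers) ** 2          # (N, K) squared distances
    q = 1.0 / (1.0 + dist_sq)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    # Eq. (5): square q, normalize per cluster, then renormalize per sample.
    weight = q ** 2 / q.sum(dim=0, keepdim=True)
    return weight / weight.sum(dim=1, keepdim=True)

def kl_clustering_loss(q, p, eps=1e-10):
    # Eq. (3): KL(P || Q) = sum_i sum_k p_ik * log(p_ik / q_ik).
    return (p * torch.log((p + eps) / (q + eps))).sum()

# Usage: given latent codes H (N x d) and cluster centers C (K x d),
# q = soft_assignment(H, C); p = target_distribution(q).detach(); loss = kl_clustering_loss(q, p)
```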


2.3 Graph Embedding Graph embedding obtains the affinity graph of the original data by constructing an affinity matrix. Applying the affinity graph to the clustering layer can help us obtain a more accurate category division and directly promote clustering. We use the affinity matrix to constrain the membership vector. The loss function of the graph embedding can be defined as (6).

$$L_{Gr} = \sum_{i=1}^{N} \sum_{j \in NB_i} \sum_{k=1}^{K} s_{ij} \| q_{ik} - q_{jk} \|_2^2, \quad s.t. \;\; \sum_{i=1}^{N} s_{ij} = 1, \; s_{ij} \in [0, 1] \quad (6)$$

where $s_{ij}$ is the affinity of the $i$-th sample and the $j$-th sample, $q_{ik}$ denotes the degree to which the $i$-th sample belongs to the $k$-th class, and $NB_i$ is the neighbor set of the $i$-th sample. Since the graph embedding aims to ensure the correlation between the original data and the membership, the construction of the affinity matrix is particularly important. In order to ensure that the affinity matrix contains both the topological information of a single view and the related information of other views, LT-MSC [13] becomes a great choice for computing the initial affinity matrix. Based on the self-representative subspace clustering algorithm, LT-MSC introduces the tensor nuclear norm to enforce constraints on the coefficient matrix of each view, so as to establish the relationship and dig out the higher-order correlations between views. This can be specifically expressed as (7).

$$\min_{M^{(v)}, R^{(v)}} \| R \|_{2,1} + \lambda \| \mathcal{M} \|_* \quad s.t. \;\; X^{(v)} = X^{(v)} M^{(v)} + R^{(v)}, \; v = 1, 2, ..., V, \quad \mathcal{M} = (M^{(1)}, M^{(2)}, ..., M^{(V)}), \quad R = [R^{(1)}, R^{(2)}, ..., R^{(V)}] \quad (7)$$

where $M^{(v)}$ and $R^{(v)}$ are the subspace coefficient matrix and error matrix of the $v$-th view, respectively. $\mathcal{M}$ is created by merging the distinct representations $M^{(v)}$ into a 3-order tensor with dimensions of $N \times N \times V$. $\| \cdot \|_{2,1}$ is the $l_{2,1}$-norm, which encourages sparse rows and columns of the matrix $R$. Then, LT-MSC uses the AL-ADM method to solve (7). After obtaining the coefficient matrix for each view, they are combined using Eq. (8) to obtain the global affinity matrix $S$.

$$S = \frac{1}{V} \sum_{v=1}^{V} \left( |M^{(v)}| + |M^{(v)T}| \right) \quad (8)$$

Using the LT-MSC model, we obtain a global affinity matrix $S$ of size $N \times N$. However, when $N$ is large, the computational complexity of the graph embedding term $L_{Gr}$ is high, and the very low affinity values in the affinity matrix not only do not contribute to clustering but also cause redundancy of information. To address this issue, we keep only the largest affinities for each sample, which form its neighbor set $NB_i$, $i = 1, 2, ..., N$.
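A minimal sketch of this sparsification and of the graph embedding loss in Eq. (6) is given below, assuming the global affinity matrix S has already been computed (e.g., by LT-MSC). The function names and the neighborhood size m are illustrative and not taken from the paper.

```python
import torch

def sparsify_affinity(S, m):
    # Keep only the m largest affinities in each row of the (N x N) matrix S;
    # the retained columns play the role of the neighbor set NB_i.
    vals, idx = torch.topk(S, k=m, dim=1)
    S_sparse = torch.zeros_like(S)
    S_sparse.scatter_(1, idx, vals)
    return S_sparse

def graph_embedding_loss(q, S_sparse):
    # Eq. (6): sum_i sum_{j in NB_i} s_ij * sum_k (q_ik - q_jk)^2,
    # written with the squared Euclidean distance between soft-assignment rows.
    diff = torch.cdist(q, q) ** 2                   # (N, N) pairwise ||q_i - q_j||^2
    return (S_sparse * diff).sum()
```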


2.4 Optimization We employ mini-batch stochastic gradient descent and backpropagation to optimize the network weights and cluster centers. (i) Clustering centers C-update: Fixing the target distribution P and taking partial derivatives of $L_{KL}$ and $L_{Gr}$, the gradients with respect to the clustering centers can be calculated respectively as

$$\frac{\partial L_{KL}}{\partial c_k} = 2\alpha \sum_{i=1}^{N} \frac{(q_{ik} - p_{ik})(h_i - c_k)}{1 + \| h_i - c_k \|^2} \quad (9)$$

$$\frac{\partial L_{Gr}}{\partial c_k} = 4\beta \sum_{j \in NB_i} \sum_{k=1}^{K} \frac{s_{ij} (q_{ik} - q_{jk})(q_{ik} - q_{ik}^2)(h_i - c_k)}{1 + \| h_i - c_k \|^2} \quad (10)$$

Given the learning rate $\delta$ and batch size $M$, the clustering centers are updated as

$$c_k = c_k - \frac{\delta}{M} \left( \alpha \frac{\partial L_{KL}}{\partial c_k} + 2\beta \frac{\partial L_{Gr}}{\partial c_k} \right), \quad 1 \le k \le K \quad (11)$$

(ii) Network parameters update: Firstly, the network weights of the encoders are updated by the reconstruction loss $L_{Re}$, the clustering loss $L_{KL}$ and the graph embedding loss $L_{Gr}$. The update equation is as follows.

$$W = W - \frac{\delta}{M} \sum_{m=1}^{M} \left( \frac{\partial L_{Re}}{\partial W} + \alpha \frac{\partial L_{KL}}{\partial W} + \beta \frac{\partial L_{Gr}}{\partial W} \right) \quad (12)$$

Secondly, since the network weights of the decoders depend only on the reconstruction loss, the decoder weights $W'$ are updated as (13).

$$W' = W' - \frac{\delta}{M} \sum_{m=1}^{M} \frac{\partial L_{Re}}{\partial W'} \quad (13)$$

According to the above objective function and iteration formula, the algorithm flow of G-DMC is shown in Algorithm 1.
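Algorithm 1 is presented as a figure in the original paper and is not reproduced here. The sketch below only illustrates, under stated assumptions, how the update rules (11)–(13) can be realized with two optimizers so that the decoders receive gradients solely from the reconstruction loss; the module names (encoders, decoders, centers) are hypothetical.

```python
import torch

def build_optimizers(encoders, decoders, centers, lr=1e-3):
    # Encoders and cluster centers are driven by the full objective (Eqs. (11)-(12));
    # centers is assumed to be a leaf tensor with requires_grad=True.
    opt_enc = torch.optim.SGD(
        [p for e in encoders for p in e.parameters()] + [centers], lr=lr)
    # Decoders depend only on the reconstruction loss (Eq. (13)).
    opt_dec = torch.optim.SGD([p for d in decoders for p in d.parameters()], lr=lr)
    return opt_enc, opt_dec

def training_step(opt_enc, opt_dec, loss_rec, loss_kl, loss_gr, alpha, beta):
    total = loss_rec + alpha * loss_kl + beta * loss_gr
    opt_enc.zero_grad()
    opt_dec.zero_grad()
    total.backward()
    opt_enc.step()  # encoders and centers updated with all three loss terms
    opt_dec.step()  # decoder gradients come only from loss_rec, since the other
                    # terms do not involve the decoder parameters
    return total.item()
```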


3 Experiments 3.1 Dataset The experimental evaluation was conducted on a total of five multi-view datasets, and the basic information of each dataset is shown in Table 1. The composition and sources of each dataset are detailed in the following. 100leaves: The dataset contains 1600 samples of 100 plant species. The extracted sample features are described by three views: shape descriptors, fine-scale edges and texture histograms. IS: This is an image segmentation dataset which contains 2310 samples. These samples are taken from a database containing seven outdoor images. Features are then extracted from each image and these features can be classified into two views: the shape view of the image and the RGB information view of the image. BDGP: This is a two-view dataset which contains 2500 images of Drosophila embryos. Each image is represented by a 1750-D visual vector and a 79-D text feature vector. wine: This is a dataset of wines from the UCI database. The data contain 178 samples, each containing 13 characteristic components, and these are divided into 3 views. mfeat: The dataset is taken from the UCI database and comprises 2000 samples of handwritten digits from 0–9. The extracted features are described by six views.

Table 1. Basic information of the five experimental datasets.

Dataset     #Samples  #Views  #Classes
100leaves   1600      3       100
IS          2310      2       7
BDGP        2500      2       5
wine        178       3       3
mfeat       2000      6       10

3.2 Comparing Methods
• k-means [2]: As the most classical partition-based single-view clustering method, it computes Euclidean distances so as to divide the data set into clusters.
• IDEC [8]: A clustering loss based on the Student's t-distribution is defined and integrated with the reconstruction loss of the autoencoder to perform feature learning and clustering simultaneously with local structure preservation.
• LT-MSC [13]: A subspace approach is used for multi-view clustering, while the tensor nuclear norm is introduced to mine the higher-order correlations between the multiple views.


• LMSC [14]: The model clusters the data using latent representations, exploring potential complementary information between views from multiple perspectives. Unlike existing multi-view clustering methods, this method looks for a potential latent representation and performs data reconstruction on the learned latent representations.
• DEMVC [9]: Based on IDEC, this method proposes to take advantage of the complementary and common information among views by collaboratively training the feature representation and cluster assignment of all views.
• CE-MVFC [15]: This method employs a deep matrix decomposition technique to learn hierarchical information from multi-view data, and subsequently applies k-means clustering on the resulting shared representation.
3.3 Experiment Setup The proposed G-DMC algorithm chooses a setup similar to IDEC [8], using a three-layer fully connected neural network with a symmetric structure for the encoder and decoder; the AE network structure is uniformly set to D-500-500-1000-d-1000-500-500-D, where D is the dimensionality of the original data and d is the dimensionality of the hidden-layer features. All internal layers are activated using the nonlinear function ReLU. Additionally, to provide a better initialization of the model, to accelerate the convergence of the objective function, and to better compute the soft assignment of the data, the network weights are pre-trained using an AE model with the same network settings. After pre-training, the clustering centers are initialized using k-means. To test the performance of G-DMC, three clustering performance metrics were used: clustering accuracy (ACC), normalized mutual information (NMI), and adjusted Rand index (ARI). Higher scores on the three metrics indicate better clustering performance. 3.4 Results Table 2 reports the clustering results of the five public datasets on different algorithms, from which we obtain some interesting observations; * marks the optimal experimental result in each group. First, in general, the G-DMC algorithm achieves optimal values on ACC, NMI and ARI for five datasets of different sizes and types. Secondly, in comparison with the two single-view algorithms (k-means and IDEC), it can be found that the G-DMC algorithm shows a clear advantage. For example, for the ACC evaluation metric, G-DMC achieves 0.8475 on the 100leaves data set, which is 0.2544 and 0.2456 higher than k-means and IDEC respectively. This indicates that the multi-view clustering method outperforms the single-view clustering methods in handling multi-view data. Moreover, the accuracy ACC of G-DMC is 0.7017 for the IS dataset, which is significantly higher than that of LT-MSC (0.5987), LMSC (0.6152), DEMVC (0.4494) and CE-MVFC (0.5297), largely due to the effectiveness of the graph embedding. Furthermore, unlike DEMVC, which retains the latent representation and category information of each view, G-DMC chooses to obtain more consistent clustering information, which makes the G-DMC algorithm present better clustering results on


Table 2. Clustering results of different algorithms on five datasets.

Dataset    Metric  k-means  IDEC    LT-MSC  LMSC    DEMVC   CE-MVFC  G-DMC
100leaves  ACC     0.5931   0.6019  0.7113  0.7650  0.3731  0.7126   *0.8475
100leaves  NMI     0.7578   0.8109  0.8601  0.8830  0.6909  0.8876   *0.9226
100leaves  ARI     0.3893   0.4790  0.6249  0.6751  0.2317  0.6378   *0.7630
BDGP       ACC     0.5992   0.9088  0.9572  0.8124  0.8068  0.9280   *0.9608
BDGP       NMI     0.4864   0.7873  0.8694  0.5773  0.7078  0.8428   *0.9600
BDGP       ARI     0.2030   0.7858  0.9667  0.2551  0.6086  0.8364   *0.9054
IS         ACC     0.5697   0.5766  0.5987  0.6152  0.4494  0.5297   *0.7017
IS         NMI     0.6237   0.5553  0.5108  0.5496  0.4827  0.4752   *0.6584
IS         ARI     0.4571   0.4078  0.3894  0.4230  0.2762  0.3368   *0.5582
wine       ACC     0.8820   0.9101  0.8933  0.8989  0.8483  0.8652   *0.9719
wine       NMI     0.6781   0.6869  0.6643  0.6567  0.6108  0.5991   *0.8823
wine       ARI     0.6873   0.7517  0.8661  0.7452  0.5951  0.6347   *0.9134
mfeat      ACC     0.8110   0.8880  0.8950  0.7985  0.7550  0.7550   *0.9570
mfeat      NMI     0.7216   0.8138  0.8326  0.8227  0.6681  0.6681   *0.9160
mfeat      ARI     0.6507   0.7730  0.7945  0.7528  0.5452  0.7589   *0.9076

datasets with a high number of categories such as 100leaves and those with a high number of views such as mfeat. In conclusion, our proposed G-DMC algorithm has been shown to outperform several advanced multi-view clustering algorithms on a variety of datasets. By effectively leveraging the consistency and complementary information present in multi-view data, G-DMC is able to achieve superior clustering performance.
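For reference, the three metrics reported above can be computed as in the following minimal sketch; scikit-learn and SciPy provide NMI, ARI, and the Hungarian matching used by the standard clustering-accuracy definition, and the helper name is illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def clustering_accuracy(y_true, y_pred):
    # Standard clustering ACC: find the best one-to-one mapping between predicted
    # cluster indices and ground-truth labels, then count correctly mapped samples.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    d = int(max(y_pred.max(), y_true.max())) + 1
    count = np.zeros((d, d), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1
    row, col = linear_sum_assignment(count.max() - count)  # maximize matched counts
    return count[row, col].sum() / y_pred.size

# nmi = normalized_mutual_info_score(y_true, y_pred)
# ari = adjusted_rand_score(y_true, y_pred)
```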

Fig. 2. Results of ablation experiments on the 100leaves dataset and the IS dataset

To consider the contribution of the graph embedding module to our proposed model, we conducted ablation experiments on G-DMC using 100leaves and IS dataset as examples. β = 0 means the graph embedding module is removed, the rest of the modules


remain unchanged, and Ours is our G-DMC model. The experimental results are shown in Fig. 2. We can clearly find that the clustering accuracy on the 100leaves and IS datasets increases by 4.88% and 4.93%, respectively, after adding the graph embedding module. This effectively validates the feasibility and superiority of the G-DMC algorithm, and further demonstrates that graph embedding can effectively mine the latent local structural information of the data, which in turn facilitates clustering.

4 Conclusion In this work, we present deep multi-view clustering based on graph embedding (G-DMC) to address the problems of how to dig out the relevant and complementary information between multiple views and how to leverage the local geometric structure of the original data. G-DMC performs clustering while learning features and introduces a graph embedding strategy to maintain the affinity consistency between the original data and the membership. The experimental results show that, compared with other clustering methods, the G-DMC algorithm proposed in this paper achieves excellent performance on ACC, NMI and ARI, and presents a better clustering effect on various types of datasets. Acknowledgment. This work was supported in part by the National Natural Science Foundation of China under Grant No. 62273164 and the Joint Fund of Natural Science Foundation of Shandong Province under Grant No. ZR2020LZH009.

References 1. Ding, W., et al.: An unsupervised fuzzy clustering approach for early screening of COVID-19 from radiological images. IEEE Trans. Fuzzy Syst. 30, 2902–2914 (2022) 2. Macqueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297 (1967) 3. Donath, W.E., Hoffman, A.J.: Lower bounds for the partitioning of graphs. IBM J. Res. Dev. 17, 420–425 (1973) 4. Yang, Y., Wang, H.: Multi-view clustering: a survey. Big Data Min. Anal. 1, 83–107 (2018) 5. Li, J., Tao, D.: Simple exponential family PCA. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 453–460. JMLR Workshop and Conference Proceedings (2010) 6. Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.-A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th international conference on Machine learning, pp. 1096–1103. ACM Press (2008) 7. Xie, J., Girshick, R., Farhadi, A.: Unsupervised deep embedding for clustering analysis. In: Proceedings of the 33rd International Conference on Machine Learning, pp. 478–487 (2016) 8. Guo, X., Gao, L., Liu, X., Yin, J.: Improved deep embedded clustering with local structure preservation. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, pp. 1753–1759. International Joint Conferences on Artificial Intelligence Organization, Melbourne, Australia (2017) 9. Xu, J., Ren, Y.: Deep embedded multi-view clustering with collaborative training. Inf. Sci. 573, 279–290 (2021)


10. Xu, J., Tang, H., Ren, Y., Peng, L., Zhu, X., He, L.: Multi-level feature learning for contrastive multi-view clustering. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16030–16039 (2022) 11. Wang, Q., Cheng, J., Gao, Q., Zhao, G., Jiao, L.: Deep multi-view subspace clustering with unified and discriminative learning. IEEE Trans. Multimedia. 23, 3483–3493 (2021) 12. Yang, X., Deng, C., Dang, Z., Tao, D.: Deep multiview collaborative clustering. IEEE Trans. Neural Networks Learn. Syst. 34, 516–526 (2023) 13. Zhang, C., Fu, H., Liu, S., Liu, G., Cao, X.: Low-rank tensor constrained multiview subspace clustering. In: 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, pp. 1582–1590. IEEE (2015) 14. Zhang, C., Hu, Q., Fu, H., Zhu, P., Cao, X.: Latent multi-view subspace clustering. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, pp. 4333–4341. IEEE (2017) 15. Chang, S., Hu, J.: Multi-view clustering via deep concept factorization. Knowl.-Based Syst. 217, 106807 (2021)

Graph-Based Short Text Clustering via Contrastive Learning with Graph Embedding

Yujie Wei1, Weidong Zhou2, Jin Zhou1(B), Yingxu Wang1, Shiyuan Han1, Tao Du1, Cheng Yang1, and Bowen Liu1

1 School of Information Science and Engineering, University of Jinan, Jinan 250022, China

[email protected] 2 School of Microelectronics, Shandong University, Jinan 250100, China

Abstract. Clustering is an unsupervised learning technique that helps us quickly categorize short texts. It works by effectively capturing the semantic themes of texts and assigning similar texts to the same cluster. Due to the excellent ability of contrastive learning to learn representations, using contrastive learning to extract semantic features for clustering tasks has become a new trend for short text clustering. However, existing short text clustering methods pay more attention to global information, which leads to the misclassification of samples with ambiguous cluster membership. Therefore, we propose graph-based short text clustering via contrastive learning with graph embedding (GCCL) - a novel framework that leverages the affinity between samples and their neighbors to impose constraints on the low-dimensional representation space. To verify the effectiveness of our method, we evaluate GCCL on short text benchmark datasets. The experimental results show that GCCL outperforms the baseline methods in terms of accuracy (ACC) and normalized mutual information (NMI). In addition, our approach achieves impressive results in terms of convergence speed, demonstrating the guidance of graph embedding for short text clustering tasks. Keywords: Text Clustering · Contrastive Learning · Graph Embedding

1 Introduction Clustering, as an unsupervised learning method, groups similar data into the same cluster. Short text clustering groups short texts such as tweets, news or reviews into clusters based on the similarity of their content. It can be used for various applications, such as content recommendation, topic modeling or sentiment analysis. Traditional clustering methods for short texts extract discrete text vectors and use methods such as K-Means [4], BIRCH [14] or GMM [8] to obtain clustering assignments by measuring similarity through the distance or density of samples in the original space. Due to the high dimensionality, sparsity and noise of short text data, traditional clustering methods cannot represent text data adequately, resulting in inferior clustering results.


Deep representation learning can effectively extract textual information by training neural networks and obtain low-dimensional representations from high-dimensional original data. Recently, some studies have used neural networks to extract semantic information and word associations from short text to facilitate downstream clustering tasks. STCC [11] feeds word embeddings obtained via Word2Vec [5] into a convolutional neural network to obtain low-dimensional features for clustering tasks. Self-Train [2] obtains sentence vectors by SIF embeddings and uses an autoencoder to obtain low-dimensional vectors. Even though the low-dimensional vectors extracted by deep neural networks can represent the short text well, as shown in Fig. 1(a), the original text data still overlap in the representation space before clustering, which significantly affects the clustering effect. Contrastive learning has achieved significant success in the field of deep representation learning. For text representation, several studies in contrastive learning have proved its effectiveness. SimCSE [1] feeds the same text into the encoder twice to get embeddings as a positive pair and treats the others as negative pairs. PromptBert [3] uses prompt learning to compute sentence embeddings and employs an existing contrastive learning framework. As shown in Fig. 1(b), contrastive learning further alleviates the problem of overlapping data points. One recent work [13] jointly trains a contrastive head and a clustering head, which is visualized as shown in Fig. 1(c), where the data points within clusters are still scattered and samples at the cluster boundaries are easily misclassified. Previous short text clustering methods only consider the global structure of the data and do not exploit the local structure between samples and their neighbors. Graph embedding can be a good solution: it uses correlations between samples and nearest neighbors to compensate for the clustering-center bias caused by relying only on the global structure, achieving the goal of improving clustering accuracy. Several studies have demonstrated the effectiveness of graph embedding. Zhang et al. propose SENet [15], which improves the graph structure by exploiting information from neighbors and learns graph embeddings through spectral clustering. Xu et al. propose GEC-CSD [10] to integrate node representation learning and clustering into a unified framework, introducing a new deep graph attentional autoencoder. To solve the existing problems of short text clustering, we propose graph-based short text clustering via contrastive learning with graph embedding (GCCL), which optimizes the clustering results by jointly training the contrastive head and clustering head and adding graph constraints based on the connectivity of the representation space. The method exploits the advantages of representation learning and graph structure. As shown in Fig. 1(d), with GCCL the data points present a clear division in the representation space, ensuring that the boundary samples are clustered towards the cluster centers. In summary, our main contributions are as follows:
- We propose a novel short text clustering framework, which constrains the proximity in the representation space based on the similarity between samples and their nearest neighbors. In this way, the local structure of the samples guides the division of data points during the joint training of the contrastive head and clustering head, which avoids misclassification of boundary data points.
- We demonstrate the effectiveness of GCCL through comparative experiments. Our method shows better performance in accuracy


compared to the baselines. In addition, we evaluate the convergence speed of GCCL to verify the guidance of graph embedding for clustering.
- For text data, we use a new measure to construct the affinity matrix. We train an instance-level contrastive head to obtain text representations and extract reliable nearest-neighbor information as a prior, thus ensuring the reliability of the graph embedding.

Fig. 1. TSNE dimensionality reduction visualization of the embedded representations of the SearchSnippets dataset, where different colors represent different clusters, our method has a clearer division of the data in the representation space.

2 Model The whole framework is divided into two parts: 1) extracting the affinity matrix by contrastive learning; 2) training the contrastive head and clustering head jointly with graph embedding constraints. For extracting the affinity matrix, we train the instance-level contrastive head and construct the affinity matrix from the obtained representations. We analyze the local structural relationships to find the nearest neighbors of each sample. For joint training, we impose graph embedding constraints on the subspace representations through the nearest-neighbor information, while jointly training the semantic-level clustering head and the instance-level contrastive head. Eventually we achieve better clustering results and faster convergence. 2.1 Graph-Based Short Text Clustering via Contrastive Learning with Graph Embedding In Sect. 1, we discussed the shortcomings of existing short text clustering methods, so we adopt a better strategy through graph embedding. We randomly sample a mini-batch $B = \{x_i\}_{i=1}^{N}$. For each sample $x_i \in B$, we extract its top-M nearest neighbors through the affinity matrix obtained in Sect. 2.2. The sample $x_i$ is fed into the feature extractor $f(\cdot)$ together with its top-M nearest neighbors, at which point the size of the data fed to the feature extractor is $(M+1)N$. Our model has three parts, as shown in Fig. 2: the graph embedding imposes constraints on the subspace to introduce local structure, the contrastive head uses contrastive learning to obtain a better representation, and the clustering head groups samples of the same semantic cluster together by KL divergence. Graph Embedding. Specifically, we feed the sample $x_i$ and its neighbor $x_m$ into the feature extractor $f(\cdot)$ to obtain the representations $h_i = f(x_i)$, $h_m = f(x_m)$. We use


Fig. 2. The training framework of GCCL, where top-M nearest neighbors are calculated from the affinity matrix.

a two-layer nonlinear MLP g(·) to map representations to a 128-dimensional subspace, i.e., zi = g(hi ), zm = g(hm ). On the basis of global modelling, we supplement the local information on the degree of spatial similarity of the subspace representations with graph-constrained loss, i.e.,

$$L_{gra} = \sum_{i=1}^{N} \sum_{m=1}^{M} S_{im} \| z_i - z_m \|_2^2 \quad (1)$$

where $M$ denotes the number of nearest neighbors for each sample, $S$ is the affinity matrix calculated according to Sect. 2.2, and $S_{im}$ denotes the similarity between the $i$-th sample and its $m$-th neighbor. This term is used to constrain the proximity of the sample and its neighbors in the representation space, constructing a local graph relationship for the sample. That is, if neighbors are close to the original data, the hidden-layer features are required to be close to each other. Instance-Level Contrastive Head. For this part, we use the original texts $\{x_i\}_{i=1}^{N}$ and apply two data augmentations $T^a$, $T^b$ to them. For each sample $x_i$, we get its augmented samples as $x_i^a = T^a(x_i)$, $x_i^b = T^b(x_i)$, and finally $2N$ samples are obtained. For $x_i^a$, there are $2N-1$ sample pairs; we treat different augmentations of the same sample $\{x_i^a, x_i^b\}$ as a positive pair and the others as negative pairs. We feed the augmented samples $x_i^a$, $x_i^b$ into the feature extractor $f(\cdot)$ and obtain the corresponding representations $h_i^a$, $h_i^b$. We do not


directly apply the contrastive loss to $h_i^a$, $h_i^b$. In order to better learn the semantics of the samples, we use a contrastive head $g(\cdot)$, which maps the features onto the hypersphere, i.e., $z_i^a = g(h_i^a)$, $z_i^b = g(h_i^b)$. The goal of contrastive learning is to learn representations that map positive pairs close together and negative pairs far apart in the representation space. We use the cosine similarity to measure the degree of similarity and proximity between sample pairs, i.e.,

$$s(z_i^{k_1}, z_j^{k_2}) = \frac{(z_i^{k_1})(z_j^{k_2})^T}{\| z_i^{k_1} \| \| z_j^{k_2} \|} \quad (2)$$

where $k_1, k_2 \in \{a, b\}$ and $i, j \in [1, N]$. We use the contrastive loss to adjust the topological relationships between these points mapped onto the unit sphere, bringing the positive pairs closer together and pushing the negative pairs further apart. The loss function is shown as:

$$l_i^a = -\log \frac{\exp(s(z_i^a, z_i^b)/\tau)}{\sum_{j=1}^{N} \left[ \exp(s(z_i^a, z_j^a)/\tau) + \exp(s(z_i^a, z_j^b)/\tau) \right]} \quad (3)$$

where $\tau$ is the temperature parameter, which controls how strongly the model discriminates between negative pairs. The larger the temperature parameter, the more the contrastive loss treats all negative pairs equally. If the temperature parameter is set too small, the model focuses on particularly difficult negative pairs, making it hard for the model to converge or generalize. The loss of the contrastive head is calculated on every augmented sample and averaged over all contrastive losses, i.e.,

$$L_{ins} = \frac{1}{2N} \sum_{i=1}^{N} \left( l_i^a + l_i^b \right) \quad (4)$$
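The following is a minimal PyTorch sketch of this instance-level contrastive loss under the usual implementation convention (the anchor's similarity to itself is excluded from the denominator, which differs slightly from the literal sum in Eq. (3)); function and variable names are illustrative.

```python
import torch
import torch.nn.functional as F

def instance_contrastive_loss(z_a, z_b, tau=0.5):
    # z_a, z_b: (N, d) projections of the two augmentations of one mini-batch.
    # For anchor z_a[i] the positive is z_b[i]; the other 2N-2 projections are negatives.
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    z = torch.cat([z_a, z_b], dim=0)                       # (2N, d)
    sim = z @ z.t() / tau                                  # cosine similarities / temperature
    n = z_a.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))             # drop self-similarity terms
    pos = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, pos)                       # -log softmax at the positive, averaged
```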

Semantic-Level Clustering Head. The clustering head aims to divide the samples into K clusters. We use the original samples $x_i$ to obtain representations $h_i$ through the feature extractor $f(\cdot)$. Different from the contrastive head, the clustering head calculates the soft assignment through the Student's t-distribution, and we use an auxiliary target distribution to further optimize the clustering assignment through the KL divergence.

$$q_{ik} = \frac{(1 + \| h_i - \mu_k \|_2^2 / \alpha)^{-\frac{\alpha+1}{2}}}{\sum_{k'=1}^{K} (1 + \| h_i - \mu_{k'} \|_2^2 / \alpha)^{-\frac{\alpha+1}{2}}} \quad (5)$$

where μk is the center of K clusters randomly initialized by K-Means in the representation space, α represents the degrees of freedom of the Student’s t-distribution. qik is the probability that sample i belongs to cluster k, it is a soft assignment, but this assignment


is not sharp enough for different clusters. Therefore, a sharper cluster assignment is obtained by constructing an auxiliary distribution, namely,

$$p_{ik} = \frac{q_{ik}^2 / \sum_{i=1}^{N} q_{ik}}{\sum_{k'=1}^{K} \left( q_{ik'}^2 / \sum_{i=1}^{N} q_{ik'} \right)} \quad (6)$$

The auxiliary distribution first squares the soft assignment $q_{ik}$, thus sharpening the soft assignment probabilities, and then normalizes it. The auxiliary distribution thereby yields sharper clustering results. We encourage learning from clustering results with high confidence, and therefore sharpen the soft distribution by the auxiliary distribution. We optimize the KL divergence to push the clustering assignment towards the target distribution.

$$l_i^C = KL(p_i \| q_i) = \sum_{k=1}^{K} p_{ik} \log \frac{p_{ik}}{q_{ik}} \quad (7)$$

The optimization objective of the clustering head is

$$L_{cls} = \frac{1}{N} \sum_{i=1}^{N} l_i^C \quad (8)$$

In summary, our total loss function is:

$$L = \sigma L_{ins} + \eta L_{cls} + \beta L_{gra} \quad (9)$$

where $L_{ins}$ and $L_{gra}$ are defined in Eq. 4 and Eq. 1, and $\sigma$, $\eta$, $\beta$ are used to balance the weight of each loss in the objective function. Extraction of Affinity Matrix. Due to the superiority of contrastive learning in learning representations, we employ contrastive learning as a better strategy to obtain the affinity matrix $S$. As shown in Fig. 3, given a randomly sampled mini-batch $B = \{x_i\}_{i=1}^{N}$, we apply the same data augmentations as in Sect. 2.1 and get the augmented samples $x_i^a = T^a(x_i)$, $x_i^b = T^b(x_i)$. In order to obtain a low-dimensional feature representation, we feed the augmented sample pairs into an MLP, $z_i^a = g(h_i^a)$, $z_i^b = g(h_i^b)$, with the same network structure as in Sect. 2.1. The contrastive loss function $L_{ins}$ in Eq. 4 is used to maximize the similarity between positive pairs while minimizing the similarity between negative pairs. We calculate the affinity matrix from the output representations of the feature extractor by jointly training the feature extractor $f(\cdot)$ and the contrastive head $g(\cdot)$. After $L$ ($L \le 5$) epochs, the feature representations $\{h_i\}_{i=1}^{N}$ of the original texts obtained by $f(\cdot)$ are used to construct the affinity matrix using cosine similarity. Based on the affinity matrix, we rank the affinities of the samples and select the top-M nearest neighbors with the greatest similarity for text clustering.
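A minimal sketch of this step is given below: cosine-similarity affinities are computed from the encoder outputs and only the top-M most similar samples are kept per row. The value top_m = 10 mirrors the setting reported in Sect. 3.2, but the function name and its use here are illustrative.

```python
import torch
import torch.nn.functional as F

def top_m_neighbors(h, top_m=10):
    # h: (N, d) text representations from the contrastively tuned encoder.
    h = F.normalize(h, dim=1)
    sim = h @ h.t()                              # (N, N) cosine-similarity affinity matrix
    sim.fill_diagonal_(float('-inf'))            # a sample is not its own neighbor
    affinities, neighbor_ids = torch.topk(sim, k=top_m, dim=1)
    return affinities, neighbor_ids              # S_im values and the index sets NB_i
```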


Fig. 3. Framework for extracting the affinity matrix. We extract the affinity matrix from the output of BERT.

3 Experiments In this section we conduct experiments to verify the effectiveness of GCCL. We first introduce the datasets, implementations and evaluation metrics. 3.1 Datasets We validate the effectiveness of GCCL on five benchmark datasets, and Table 1 provides statistics on the dataset details, where |V| is the vocabulary size; N^D is the number of short text documents; Len is the average number of words in each document; N^C is the number of clusters; L/S is the ratio of the size of the largest cluster to that of the smallest cluster. To assess the effectiveness of our method, we do not perform any pre-processing on the datasets.

Table 1. Dataset statistics.

Datasets        |V|   N^D    Len  N^C  L/S
Biomedical      19K   20000  13   20   1
StackOverflow   15K   20000  8    20   1
SearchSnippets  31K   12340  18   8    7
AgNews          21K   8000   23   4    1
Tweet           5K    2472   8    89   249

Biomedical [11] uses the challenge data released on the official BioASQ website. Here is a random selection of 20,000 paper titles from 20 groups. StackOverflow [11] is a challenge dataset that is available on Kaggle.com. Here 20,000 question titles are randomly selected from 20 different categories. SearchSnippets [6] is extracted from web search snippets, which includes 12,340 snippets from 8 different domains. AgNews [7] is a subset of AG’s news article corpus, consisting of a combination of titles from the four largest categories of articles in the AG corpus.


Tweet [12] comes from a total of 109 queries used at the Text Retrieval Conference (TREC) in 2011 and 2012. After removing queries without highly relevant tweets, the dataset has 89 clusters and a total of 2472 tweets. 3.2 Implementations We use sentence-transformers, a natural language processing toolkit, to encode the textual input into 768-dimensional vectors. We choose the distilbert-base-nli-stsb-mean-tokens pre-trained model as the encoder for most datasets; since this pre-trained model is based on a general corpus and has limitations for biological and medical corpora, for Biomedical we use BioBERT-mnli-snli-scinli-scitail-mednli-stsb as the encoder. We set M = 10 and the batch size to 100, and train 100 epochs for all datasets. For the encoder, we set the learning rate to 1e−5; for the clustering head and contrastive head we set the learning rate to 1e−3. We use a two-layer MLP as the clustering head, including a 768-dimensional hidden layer that maps the feature vectors into a 128-dimensional subspace. In the case of the clustering head, we calculate the clustering results by computing the Student's t-distribution. For the graph embedding, we use the same setting as the contrastive head. The specific parameter settings are shown in Table 2.

Table 2. The hyper-parameters set in our method.

Datasets        α    σ        η    β               τ
Biomedical      1    [5, 10]  1    [0.01, 0.05]    0.5
StackOverflow   1    [5, 10]  1    [0.01, 0.05]    0.5
SearchSnippets  10   [5, 10]  10   [0.01, 0.05]    0.5
AgNews          1    [5, 10]  1    [0.05, 0.07]    0.5
Tweet           1    [1, 5]   1    [0.001, 0.005]  0.5
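As a minimal sketch of this setup (module names are illustrative, and the optimizer choice is an assumption since the paper does not state it), the projection head and the per-module learning rates described above can be configured as follows.

```python
import torch
import torch.nn as nn

class ProjectionHead(nn.Module):
    # Two-layer MLP with a 768-d hidden layer mapping 768-d sentence embeddings
    # into a 128-d subspace, following the dimensions reported in Sect. 3.2.
    def __init__(self, in_dim=768, hidden=768, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

    def forward(self, x):
        return self.net(x)

def build_optimizer(encoder, contrastive_head, cluster_centers):
    # Encoder is fine-tuned gently (1e-5); heads and cluster centers use 1e-3.
    return torch.optim.Adam([
        {'params': encoder.parameters(), 'lr': 1e-5},
        {'params': contrastive_head.parameters(), 'lr': 1e-3},
        {'params': [cluster_centers], 'lr': 1e-3},
    ])
```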

3.3 Compare with Baseline To prove the validity of our model, we compare against the following baseline methods. More details about the baselines are as follows: BOW first pre-processes the data by removing stop words, punctuation and other irrelevant words, then converts the pre-processed data into a 1500-dimensional count-vector representation and clusters the vectors with k-means. TFIDF likewise pre-processes the text by removing stop words, punctuation and other noisy words, then calculates the TFIDF matrix of the pre-processed text and obtains the results by applying clustering to the TFIDF matrix. K-Means uses BERT as the feature extractor and performs k-means clustering on the sentence embeddings from its last layer of output.
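For illustration, the lexical baselines can be reproduced in a few lines with scikit-learn; the 1500-feature cap follows the BOW setting above, and the function name is ours.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def tfidf_kmeans_baseline(texts, n_clusters):
    # Vectorize the (pre-processed) short texts with TF-IDF and cluster with k-means.
    X = TfidfVectorizer(stop_words='english', max_features=1500).fit_transform(texts)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
```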


Table 3. Experimental results; our results are averaged over five random runs.

Model       Biomedical    StackOverflow  SearchSnippets  AgNews        Tweet
            ACC    NMI    ACC    NMI     ACC    NMI      ACC    NMI    ACC    NMI
BOW         14.3   9.2    18.5   14.0    24.3   9.3      27.6   2.6    49.7   73.6
TFIDF       28.3   23.2   58.4   58.7    31.5   19.2     34.5   11.9   57.0   80.7
K-Means     39.8   32.7   60.8   52.3    59.0   36.4     83.9   59.2   51.7   79.0
DEC         41.6   37.7   74.7   75.3    76.9   64.9     -      -      -      -
STCC        43.6   38.1   51.1   49.0    77.0   63.2     -      -      -      -
Self-Train  54.8   47.1   59.8   54.8    77.1   56.7     -      -      -      -
HAC_SD      40.1   33.5   64.8   59.5    82.7   63.8     81.8   54.6   89.8   85.2
SCCL        46.2   41.5   75.5   74.5    85.2   71.1     88.2   68.2   78.2   89.2
GCCL        55.1   47.7   79.0   73.5    86.0   72.0     88.6   69.0   86.0   91.7

DEC [9] uses BERT as the feature extractor and applies deep embedded clustering to the sentence embeddings from its last layer of output. STCC [11] is a three-stage process. For each dataset, the method uses Word2Vec, a widely used algorithm for generating word embeddings. The word embeddings are fed into a convolutional neural network to learn deep feature representations. Finally, the clusters are obtained by applying k-means to the learned representations. Self-Train [2] uses SIF embeddings, which are superior to the word embedding method of STCC. During pre-training, a deep autoencoder is applied to encode and reconstruct the SIF embeddings of the short texts. In a self-training phase, a soft clustering assignment is used as an auxiliary target distribution, and the encoder weights and clustering assignments are jointly fine-tuned. HAC_SD [7] uses hierarchical agglomerative clustering and the SD sparsification method of K-NN. The method iteratively classifies the clusters obtained from hierarchical agglomerative clustering by applying the SD sparsification method to the iterative classification. SCCL [13] uses multiple data augmentations combined with contrastive learning to jointly train a contrastive head and a clustering head. For short text clustering we do not pre-process any of the five datasets, and we report the comparative results in Table 3, where our method outperforms the baselines on most datasets. For SCCL, which also uses a contrastive head and a clustering head, the visualization results are shown in Fig. 1(c). We compare it with the result after adding graph embedding constraints, as shown in Fig. 1(d). It is clear that our method uses the local information to constrain the subspace representations, making it easier to assign samples to the same cluster as their nearest neighbors. In terms of convergence speed, as shown in Fig. 4, the nearest-neighbor information can greatly accelerate convergence. This is due to the ability of the proximity information to guide the training


Fig. 4. Effect of graph embedding on convergence speed.

process in the right direction. The experiments demonstrate that our method significantly improves the clustering effect and convergence speed. For the Tweet dataset, which has fewer samples and more classes, contrastive learning falls short because it requires a large number of sample pairs and thus underperforms on few-shot datasets. In contrast, HAC_SD achieves better performance on the Tweet dataset by clustering carefully selected pairwise similarities of the pre-processed data. 3.4 Parameter Experiment In this section, we discuss the effect of the number of nearest neighbors on the clustering accuracy. To evaluate the impact of the nearest neighbors on the clustering results, we vary the number of nearest neighbors for each sample. For each dataset, we took 5, 10, and 20 nearest neighbors, and the results are shown in Fig. 5.

Fig. 5. Experimental results by using 5, 10, 20 nearest neighbors.

From the experimental results, it can be seen that both too much and too little nearest neighbor information can render the graph embedding useless. This is because too little nearest neighbor information cannot guide the training process; too much nearest


neighbor information, first, lacks accuracy and, second, leads to long training times and degrades the model performance.

4 Conclusion In this paper, we propose a short text contrastive clustering method based on graph embedding, which is used to solve the problem of inconsistent assignment of boundary samples. We obtain the affinity matrix through contrastive learning and constrain the local structure of the subspace representations. To learn text representations better, we combine contrastive learning with deep clustering. We evaluate the model on five datasets and achieve excellent results. In addition, our convergence speed is improved substantially. However, due to the design of the clustering head, if the data in each latent class are severely imbalanced, as in Tweet, the clustering effect cannot reach expectations. Additionally, the clustering effect of our method depends on the initialization of the clustering centers; how to design a new clustering head structure and combine it with contrastive learning will be our future work.

References 1. Gao, T., Yao, X., Chen, D.: Simcse: simple contrastive learning of sentence embeddings. arXiv preprint arXiv:2104.08821 (2021) 2. Hadifar, A., Sterckx, L., Demeester, T., Develder, C.: A self-training approach for short text clustering. In: Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019), pp. 194–199 (2019) 3. Jiang, T., et al.: Promptbert: Improving bert sentence embeddings with prompts. arXiv preprint arXiv:2201.04337 (2022) 4. MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Mathematical Statistics and Probability, p. 281 (1965) 5. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013) 6. Phan, X.H., Nguyen, L.M., Horiguchi, S.: Learning to classify short and sparse text & webwith hidden topics from large-scale data collections. In: Proceedings of the 17th International Conference on World Wide Web, pp. 91–100 (2008) 7. Rakib, M.R.H., Zeh, N., Jankowska, M., Milios, E.: Enhancement of short text clustering by iterative classification. In: Métais, E., Meziane, F., Horacek, H., Cimiano, P. (eds.) NLDB 2020. LNCS, vol. 12089, pp. 105–117. Springer, Cham (2020). https://doi.org/10.1007/9783-030-51310-8_10 8. Reynolds, D.A., et al.: Gaussian mixture models. Encyclopedia of biometrics 741(659–663) (2009) 9. Xie, J., Girshick, R., Farhadi, A.: Unsupervised deep embedding for clustering analysis. In: International Conference on Machine Learning, pp. 478–487. PMLR (2016) 10. Xu, H., Xia, W., Gao, Q., Han, J., Gao, X.: Graph embedding clustering: graph attention auto-encoder with cluster-specificity distribution. Neural Netw. 142, 221–230 (2021) 11. Xu, J., Xu, B., Wang, P., Zheng, S., Tian, G., Zhao, J.: Self-taught convolutional neural networks for short text clustering. Neural Netw. 88, 22–31 (2017) 12. Yin, J., Wang, J.: A model-based approach for text clustering with outlier detection. In: 2016 IEEE 32nd International Conference on Data Engineering (ICDE), pp. 625–636. IEEE (2016)


13. Zhang, D., et al.: Supporting clustering with contrastive learning. arXiv preprint arXiv:2103. 12953 (2021) 14. Zhang, T., Ramakrishnan, R., Livny, M.: Birch: an efficient data clustering method for very large databases. ACM SIGMOD Rec. 25(2), 103–114 (1996) 15. Zhang, X., Liu, H., Wu, X.M., Zhang, X., Liu, X.: Spectral embedding network for attributed graph clustering. Neural Netw. 142, 388–396 (2021)

An Improved UAV Detection Method Based on YOLOv5

Xinfeng Liu1, Mengya Chen1, Chenglong Li1(B), Jie Tian2, Hao Zhou3, and Inam Ullah1

1 School of Computer Science and Technology, Shandong Jianzhu University, Fengming Road,

Jinan 250101, Shandong, China [email protected] 2 School of Data Science and Computer Science, Shandong Women’s College, Daxue Road, Jinan 250300, Shandong, China 3 Jinan Hope Wish Optoelectronics Technology Co., Ltd., Gangxing Third Road, Jinan 250101, Shandong, China

Abstract. Because unmanned aerial vehicles (UAVs) have the characteristics of low flight trajectory, slow motion speed, and small volume, they are difficult to identify using current vision technologies. To meet the requirements for detection speed and accuracy in the field of UAV detection, we propose an improved You Only Look Once version 5 (YOLOv5) UAV detection method. It uses visible light and infrared imaging datasets for daytime detection and nighttime detection, respectively. Based on these two datasets, our model can improve detection accuracy in challenging environments such as dim light or nighttime. We choose a model pruning method based on the Batch Normalization (BN) layers to prune the YOLOv5 detection model and improve its detection speed. To solve the problem of misjudging birds as UAVs, EfficientNet is added to re-classify the detection results of YOLOv5. We constructed a dataset containing over 10000 visible light images and over 10000 infrared imaging images to evaluate the performance of the proposed algorithm. Qualitative and quantitative experimental results show that the proposed algorithm greatly improves the accuracy and speed of UAV detection. Keywords: UAV detection · Computer vision · YOLOv5 · Model pruning based on the BN layers · EfficientNet

1 Introduction With the continuous development and improvement of unmanned aerial vehicle (UAV) technology, UAVs with low cost, small size, simple operation, and strong flexibility [1] have been widely used in civil and military fields. However, due to their high concealment [2], low technical threshold, and low acquisition difficulty [3], they are easily used by criminals in improper places, posing a serious threat to national defense and social security [4]. The demand for UAV detection technology is growing [5].


Common UAV detection methods mainly include radar, radio frequency, audio, and visual detection [6]. However, due to their low flight path, slow motion speed, and small volume, UAVs are difficult to capture with traditional radar detection equipment [7]. The cost of radio frequency detection is high, and the UAV communication protocol limits the detection. Audio detection has a short range and is vulnerable to environmental noise [8]. With the gradual maturity of video detection technology, the use of visual technology to detect UAVs has attracted widespread attention because of its low cost, mature technology, and good flexibility. It can provide information such as the orientation, size and type of UAVs. Recently, there has been much research on the application of computer vision technology to UAV detection. Martínez et al. [9] proposed a method to detect the position of UAVs by reconstructing the three-dimensional trajectory based on an external camera system. However, the camera's observation points are still fixed, and the monitoring range of the UAV is limited. Rozantsev et al. [10] used CNN and boosted-tree methods to detect UAVs in complex outdoor environments, but the training of deep convolutional neural networks is prone to degradation. Zhai et al. [11] designed a new multi-strapdown residual network model by combining the two-layer residual learning module with the three-layer residual learning module. However, there are some problems, such as the inaccurate positioning of the UAV. To improve the detection accuracy of UAVs, Sun et al. [12] designed a deep Single Shot MultiBox Detector (SSD) network to detect UAVs, using large-scale input images and a deeper convolutional network to obtain more features. Ma et al. [13] optimized the original You Only Look Once (YOLO) [14] structure using a residual network and multi-scale fusion. However, the detection of UAVs at night or in dim light still needs improvement. Due to the small size of a UAV, visual detection suffers from a low object recognition rate and poor positioning. YOLOv5 sets up three detection heads with different scales, which can detect objects on multiple scales. It also incorporates various data preprocessing methods, which yield a good detection effect on small objects [15]. At the same time, its detection speed is fast, so it has certain advantages for UAV detection with high real-time requirements. On this basis, we propose an improved UAV detection method based on YOLOv5. We selected two datasets (visible light and infrared imaging) to ensure the daytime and nighttime reliability of our proposed method. To improve the model detection speed and realize real-time detection of UAVs, we apply a model pruning method based on the Batch Normalization (BN) layers to the trained YOLOv5 model. However, the model still has the problem of birds being mistakenly identified as UAVs. Therefore, we add EfficientNet to re-classify the detection results of YOLOv5 to distinguish birds from UAVs. The experimental results show that this method has superior detection performance and can achieve timely detection of UAV targets, avoiding various safety hazards brought by UAVs.

2 Dataset Preparation The existing mainstream UAV detection methods on the market can only perform low-altitude anti-UAV missions in a single or specific environment and have strong time and space limitations. Most of these methods use models trained on one type of dataset for


detection. The resulting model has high detection performance in the daytime but low detection performance at night and in dim light. However, criminals often choose to launch attacks at night. Therefore, improving the detection performance at night and in dim light is particularly important. It is difficult for ordinary cameras to take clear pictures at night or in dim light; in such cases, infrared cameras can be used as the shooting devices for UAV images. Infrared cameras use infrared imaging technology to generate infrared images based on the radiation energy of an object and display them as grayscale or pseudo-color images. In infrared images, even under poor lighting conditions, the UAV targets can be clearly distinguished. We use photoelectric turntable equipment to collect visible light and infrared imaging datasets, and generate two corresponding models to detect daytime and nighttime UAV targets, respectively. The visible light model is chosen to detect UAV targets when the light is good, and the infrared imaging model is chosen when the light is poor. This greatly reduces the time limitation of detection. Additionally, UAV images from various backgrounds were collected to reduce the spatial limitations of detection. The two datasets are shown in Fig. 1.

Fig. 1. The two datasets: (a) infrared imaging image; (b) visible light image.

However, the trained infrared imaging model has lower detection accuracy than the visible light model. To address this issue, we consider expanding the infrared imaging dataset to improve the detection accuracy of the model. We use an image processing method to convert the visible light images into grayscale images and add them to the infrared imaging dataset. After this addition, the number of images in the infrared imaging dataset is doubled, which also enriches the diversity of UAV samples in the infrared imaging dataset. Refer to Table 7 in Sect. 4.3 for the experimental verification.
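A minimal sketch of this conversion step is shown below using OpenCV; the directory names are illustrative, and copying the corresponding annotation files is omitted.

```python
import os
import cv2

def add_grayscale_copies(visible_dir, infrared_dir):
    # Convert every visible-light image to grayscale and add it to the
    # infrared imaging dataset, as described above.
    for name in os.listdir(visible_dir):
        img = cv2.imread(os.path.join(visible_dir, name))
        if img is None:
            continue  # skip files that are not readable images
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        cv2.imwrite(os.path.join(infrared_dir, 'gray_' + name), gray)
```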

3 Algorithm Improvement 3.1 A Model Pruning Method Based on the BN Layers With the development of neural networks, the depth and width of models keep expanding. A significant fraction of neuron activation values tends toward zero; these activations have minimal impact on the final detection results but consume an enormous amount of processing power

742

X. Liu et al.

and memory. At this time, the model pruning method can be used to remove these parameters, keep the expression ability of the model unchanged, and greatly reduce the model’s calculation. There have been many types of research on model pruning. Liu et al. [15] proposed a model pruning method based on the BN layers. It selected the scale factor of the BN layers as the measurement index of the pruning channel. During training, L1 regularization is performed on the scaling factor of the BN layers, and unimportant channels are identified by the scale factor tilted to 0 after regularization.The calculation formula is as follows: zin − μB z= σB2 +  

(1)



zout = γ z + β

(2)

In formula (1), zin is the output of the previous convolution layer, μB is the mean 2 is the variance, and the decimal  tending to 0 is added to avoid the denominator value, σB being 0. In formula (2), γ and β are learnable reconstruction parameters introduced by the BN layers, where γ is the scaling factor, and β is the translation factor. In addition, zout is the output value of the BN layers. zout of each channel is positively correlated with the coefficient γ. If γ is too small and close to 0, zout is also very small. It has little effect on the final detection result when removing those channels with γ -> 0. The loss function of a model pruning method based on the BN layers is:   L= l(f (x, W ), y) + λ g(γ ) (3) (x,y)

γ ∈

In formula (3), the former term is the normal training loss of a CNN, where (x, y) represents the train input and target, W represents the training weights. And the latter term is the sparsity-induced penalty on scaling factor, where λ is used to balance the two terms.The specific process of model pruning is shown in Fig. 2.
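A minimal PyTorch-style sketch of the sparsity term in formula (3), assuming a YOLOv5-like model whose normalization layers are nn.BatchNorm2d; the L1 penalty on the BN scale factors γ is simply added to the ordinary detection loss during sparse training:

```python
import torch
import torch.nn as nn

def bn_l1_penalty(model: nn.Module) -> torch.Tensor:
    """Sum of |gamma| over all BatchNorm2d layers (the second term of formula (3))."""
    terms = [m.weight.abs().sum() for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
    return torch.stack(terms).sum() if terms else torch.tensor(0.0)

# During sparse training (lam is the balance factor lambda of formula (3)):
# loss = detection_loss + lam * bn_l1_penalty(model)
```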

Fig. 2. Model pruning

The specific steps of model pruning are as follows.

Step 1: Model training based on YOLOv5. Based on the YOLOv5 algorithm, the two datasets were trained separately to obtain two detection models, namely a visible light model and an infrared imaging model. By adopting YOLOv5 models of different depths and continuously adjusting the input resolution, epoch, and other relevant parameters of the model, we find the optimal training parameters and obtain two optimal detection models. The specific tuning process is shown in Sect. 4.2.

Step 2: Sparse training of both models. Keep the relevant parameters, such as model depth, the same as those used by YOLOv5. After sparse training, the unimportant channels in the model are identified, preparing for the next step of model pruning.

Step 3: Model pruning. The experiment to find an appropriate pruning rate is shown in Table 1. We initially selected the YOLOv5s model as the training model and set the image input resolution to 640, the epoch to 300, and the batch size to 16. For model evaluation, we use the mAP value to reflect detection accuracy and the FLOPs value to reflect detection speed.

Table 1. Effect of different pruning rates on model performance

Dataset            Pruning Rate   Image Input Resolution /pixel   Epoch   Batch_size   Max mAP   Test mAP   FLOPs/G   Model Size/M
Visible light      50%            640                             300     16           0.931     0.926      3.78      8.6
Visible light      60%            640                             300     16           0.923     0.918      3.3       7.4
Visible light      70%            640                             300     16           0.92      0.915      2.93      6.5
Visible light      80%            640                             300     16           0.907     0.896      2.51      4.8
Infrared imaging   50%            640                             300     16           0.922     0.914      3.47      8.6
Infrared imaging   60%            640                             300     16           0.92      0.911      3.16      7.5
Infrared imaging   70%            640                             300     16           0.916     0.905      2.94      6.5
Infrared imaging   80%            640                             300     16           0.901     0.882      2.48      4.9

As the pruning rate increases, the mAP value of the model decreases to some extent, but the FLOPs and model size decrease significantly. A pruning rate of 70% is a turning point: beyond 70%, the mAP drops noticeably. To ensure no significant loss of detection accuracy compared with the unpruned model, the pruning rate was ultimately set to 70%. In this step, the unimportant channels identified through sparse training are removed, which greatly reduces the computational complexity and size of the model and thereby improves its detection speed.

Step 4: Model fine-tuning. After the model is pruned, its performance temporarily drops, but this can be compensated by fine-tuning. The fine-tuned model's accuracy recovers substantially, and the model size is further reduced. Through fine-tuning, the final model is obtained. The results of pruning the two models through the above steps are shown in Table 2. The pruned model retains both accuracy and speed while greatly reducing the model's size and memory footprint.
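A hedged sketch of how Step 3 can be realized: the chosen pruning rate is turned into a global threshold on the BN scale factors collected from the sparsely trained model, and channels whose γ falls below it are marked for removal. Function and variable names here are illustrative, not the authors' implementation:

```python
import torch
import torch.nn as nn

def channels_to_prune(model: nn.Module, prune_rate: float = 0.7):
    """Collect all BN scale factors, take the prune_rate quantile as a global
    threshold, and return a per-layer mask of channels to keep."""
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules() if isinstance(m, nn.BatchNorm2d)])
    threshold = torch.quantile(gammas, prune_rate)
    keep_masks = {name: (m.weight.detach().abs() > threshold)
                  for name, m in model.named_modules() if isinstance(m, nn.BatchNorm2d)}
    return threshold, keep_masks
```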

Table 2. The results of model pruning with the YOLOv5s model

Dataset            Step     Image Input Resolution /pixel   Epoch   Batch_size   Max mAP   Test mAP   FLOPs/G   Model Size/M
Visible light      Step 1   640                             300     16           0.956     0.95       4.89      14.8
Visible light      Step 2   640                             300     16           0.941     0.934      4.89      14.8
Visible light      Step 3   640                             300     16           0.92      0.915      2.93      6.5
Visible light      Step 4   640                             300     16           0.932     0.947      2.93      3.4
Infrared imaging   Step 1   640                             300     16           0.82      0.804      4.89      14.4
Infrared imaging   Step 2   640                             300     16           0.82      0.818      4.91      14.4
Infrared imaging   Step 3   640                             300     16           0.81      0.801      2.93      6.5
Infrared imaging   Step 4   640                             300     16           0.832     0.846      2.93      3.4

3.2 Secondary Classification with EfficientNet

Because both UAVs and birds are small and fly in the air, UAVs are easily confused with birds, causing misjudgments to a certain extent. To solve this problem, we add the EfficientNet network for secondary classification. EfficientNet is a network series published by Google in May 2019 [16]. It proposes a compound model scaling method that jointly considers the effects of network depth, network width, and image input resolution, which significantly improves the accuracy and speed of the network. To decide whether a target is a UAV or a bird, a bird dataset has to be added. We collected a bird dataset comparable in size to the UAV dataset and removed blurry images. After manually annotating the bird dataset, the images are cropped according to the annotation boxes to obtain a background-free bird dataset. The detection process is shown in Fig. 3 and includes four steps:

Step 1: Obtain the UAV bounding boxes. The pruned YOLOv5 detection model is used to detect UAVs in the input image and obtain the UAV bounding boxes. However, this detection easily classifies birds as UAVs.

Step 2: Find the positions corresponding to the bounding boxes in the original images. When the YOLOv5 algorithm performs object detection, the original image is enlarged or reduced to the configured input size, which changes the shape of the target. Therefore, we do not use the YOLOv5 detection results directly for secondary classification; instead, we map the obtained bounding boxes back to the original images to obtain the position coordinates of the UAVs on the original images.

Step 3: Crop the UAVs according to the position coordinates obtained in Step 2. Using image cropping, the UAVs are cropped from the original images according to these coordinates. In this way we obtain the input image set of the EfficientNet network.


Step 4: Secondary classification determines whether these objects are UAVs or birds. The trained secondary classification model is applied to the cropped images to determine the true category of each object.
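A hedged sketch of steps 2–4, assuming the bounding boxes have already been mapped back to original-image coordinates and that the secondary classifier accepts 64 × 64 inputs; the class-name order is a hypothetical convention, not taken from the paper:

```python
from typing import List, Tuple
import cv2
import torch

CLASS_NAMES = ["bird", "uav"]  # hypothetical label order of the secondary classifier

def reclassify_boxes(original_img, boxes: List[Tuple[int, int, int, int]],
                     classifier: torch.nn.Module, input_size: int = 64):
    """Crop each detected box from the original image and let the secondary
    classifier decide whether it contains a bird or a UAV (steps 2-4)."""
    results = []
    classifier.eval()
    for (x1, y1, x2, y2) in boxes:
        crop = original_img[y1:y2, x1:x2]                  # step 3: crop on the original image
        crop = cv2.resize(crop, (input_size, input_size))  # classifier input resolution
        x = torch.from_numpy(crop).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        with torch.no_grad():
            label = classifier(x).argmax(dim=1).item()     # step 4: secondary classification
        results.append(((x1, y1, x2, y2), CLASS_NAMES[label]))
    return results
```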

Fig. 3. Secondary classification of YOLOv5+EfficientNet

Since EfficientNet comprises eight networks from B0 to B7, we tested them separately, keeping the image input resolution at 64 pixels, the epoch at 300, and the batch size at 16. The test results are shown in Table 3.

Table 3. Comparison of EfficientNet B0–B7 network detection results

Dataset            Model   Image Input Resolution /pixel   Epoch   Batch_size   Max mAP   Test mAP   FLOPs/G
Visible light      B0      64                              300     16           0.861     0.85       1.89
Visible light      B1      64                              300     16           0.885     0.872      2.43
Visible light      B2      64                              300     16           0.918     0.903      2.78
Visible light      B3      64                              300     16           0.927     0.919      3.59
Visible light      B4      64                              300     16           0.926     0.911      5.47
Visible light      B5      64                              300     16           0.914     0.908      7.32
Visible light      B6      64                              300     16           0.905     0.894      8.64
Visible light      B7      64                              300     16           0.902     0.887      10.41
Infrared imaging   B0      64                              300     16           0.847     0.838      1.91
Infrared imaging   B1      64                              300     16           0.872     0.859      2.44
Infrared imaging   B2      64                              300     16           0.908     0.896      2.78
Infrared imaging   B3      64                              300     16           0.92      0.909      3.6
Infrared imaging   B4      64                              300     16           0.915     0.902      5.52
Infrared imaging   B5      64                              300     16           0.903     0.898      7.33
Infrared imaging   B6      64                              300     16           0.9       0.895      8.68
Infrared imaging   B7      64                              300     16           0.891     0.89       10.45


It can be seen that the EfficientNet-B3 network gives the highest test mAP for both the visible light model and the infrared imaging model. The best visible light secondary classification model reaches a mAP of 0.919, the best infrared secondary classification model reaches 0.909, and their FLOPs are 3.59 G and 3.6 G, respectively. Since the FLOPs differences are small, we ultimately chose EfficientNet-B3, the network with the highest mAP, as the final secondary classification network.
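A minimal sketch of building the chosen secondary classifier, assuming torchvision's EfficientNet implementation is used (the paper does not state the framework for this part); only the final classification head is swapped for the two-class bird-vs-UAV task:

```python
import torch.nn as nn
from torchvision import models

def build_secondary_classifier(num_classes: int = 2) -> nn.Module:
    """EfficientNet-B3 with its classification head replaced for bird-vs-UAV."""
    model = models.efficientnet_b3(weights=None)  # pre-trained weights could be loaded instead
    in_features = model.classifier[1].in_features
    model.classifier[1] = nn.Linear(in_features, num_classes)
    return model
```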

4 Experimental Analysis

4.1 Dataset and Experimental Platform

For the dataset, we collected more than 10,000 images ourselves and supplemented them with more than 10,000 images from the Det-Fly public dataset. The two datasets contain a total of over 23,000 images (including over 11,500 visible light images and over 11,500 infrared imaging images). Similarly, there are nearly 10,000 infrared imaging images and nearly 10,000 visible light images of birds. We filtered the dataset to remove poorly captured images and labeled the rest. The training set, validation set, and test set were split in a ratio of 6:2:2. The experiments were run on an Ubuntu 20.04 system with four GeForce GTX 1080 Ti GPUs, Python 3.8, and the PyTorch 1.8.1 deep learning framework.

4.2 YOLOv5 Optimal Parameter Adjustment

First, we tested the four network depths provided by the YOLOv5 algorithm (YOLOv5s, YOLOv5m, YOLOv5l, YOLOv5x) to find the most suitable depth for UAV detection. The test results are shown in Table 4.

Table 4. Network training results at different depths

Dataset            Pre-training Model   Image Input Resolution /pixel   Epoch   Batch_size   Max mAP   Test mAP   FLOPs/G
Visible light      YOLOv5s              640                             300     16           0.954     0.943      4.89
Visible light      YOLOv5m              640                             300     16           0.942     0.934      15.10
Visible light      YOLOv5l              640                             300     16           0.958     0.95       34.22
Visible light      YOLOv5x              640                             300     16           0.954     0.938      65.12
Infrared imaging   YOLOv5s              640                             300     16           0.82      0.821      4.89
Infrared imaging   YOLOv5m              640                             300     16           0.797     0.818      15.09
Infrared imaging   YOLOv5l              640                             300     16           0.856     0.854      34.22
Infrared imaging   YOLOv5x              640                             300     16           0.838     0.835      65.13

From Table 4, it can be seen that the visible light and infrared imaging models trained with YOLOv5l have the highest test mAP values, while YOLOv5s has the smallest FLOPs value and the fastest detection speed. Based on the principle of speed priority in actual detection, we chose the YOLOv5s model to improve the real-time performance of UAV detection. The image input resolution of the model is also an important factor in detection accuracy, so we tested different input resolutions of the YOLOv5 network while keeping the epoch and other related parameters unchanged. Since most images in the dataset are 1920 × 1080 pixels, we tested one third and two thirds of 1920 pixels, namely 640 and 1280 pixels, plus 1920 pixels itself. The test results are shown in Table 5.

Table 5. The effect of different image input resolutions on model detection performance

Dataset            Pre-training Model   Image Input Resolution /pixel   Epoch   Batch_size   Max mAP   Test mAP   FLOPs/G
Visible light      YOLOv5s              640                             300     16           0.954     0.943      4.89
Visible light      YOLOv5s              1280                            300     16           0.963     0.957      20.93
Visible light      YOLOv5s              1920                            300     16           0.97      0.965      41.64
Infrared imaging   YOLOv5s              640                             300     16           0.806     0.804      4.91
Infrared imaging   YOLOv5s              1280                            300     16           0.874     0.862      21.73
Infrared imaging   YOLOv5s              1920                            300     16           0.912     0.898      42.34

According to the experimental results in Table 5, the mAP values of both the visible light model and the infrared imaging model increase with the image input resolution. For the visible light model, the mAP values at 640 and 1920 pixels differ by only 0.022, while the FLOPs differ by 36.75 G. The mAP of the infrared imaging model increases more noticeably with resolution, but this accuracy can also be recovered through data augmentation. Therefore, to ensure the real-time performance of the model, we ultimately chose 640 pixels as the input resolution of the network.

4.3 Comprehensive Experimental Results

We use the visible light and infrared imaging datasets to build the daytime and nighttime detection models, respectively. Graying the visible light images is used to improve the detection accuracy of the infrared imaging model. YOLOv5 is chosen to implement UAV recognition, and the BN-based model pruning method is added to improve detection speed. Meanwhile, EfficientNet is added for secondary classification of the images to solve the problem of UAVs being misclassified as birds. This study evaluated the detection performance of the above models and added YOLOv3, YOLOv4, and Faster R-CNN as comparisons. The specific experimental results on the visible light dataset are shown in Table 6, where the first row shows the performance of the model trained on the visible light dataset with YOLOv5s, and the second row shows the performance after pruning that model with the BN-based pruning method.

Table 6. Training results of the visible light dataset

Method                     Pre-training Model   Image Input Resolution /pixel   Epoch   Batch_size   Max mAP   Test mAP   FLOPs/G
–                          YOLOv5s              640                             300     16           0.954     0.943      4.89
Model pruning              YOLOv5s              640                             300     16           0.932     0.947      2.93
Secondary classification   EfficientNet-B3      64                              300     16           0.927     0.919      2.42
–                          YOLOv3               640                             300     16           0.914     0.895      12.79
–                          YOLOv4               640                             300     16           0.925     0.917      7.03
–                          Faster R-CNN         640                             300     16           0.952     0.963      23.9

Table 7. Training results of the infrared imaging dataset

Method                                      Pre-training Model   Image Input Resolution /pixel   Epoch   Batch_size   Max mAP   Test mAP   FLOPs/G
–                                           YOLOv5s              640                             300     16           0.806     0.804      4.91
Grey visible-light images                   YOLOv5s              640                             300     16           0.926     0.898      4.91
Grey visible-light images + Model pruning   YOLOv5s              640                             300     16           0.935     0.927      2.97
Secondary classification                    EfficientNet-B3      64                              300     16           0.949     0.92       2.93
–                                           YOLOv3               640                             300     16           0.712     0.728      12.83
–                                           YOLOv4               640                             300     16           0.795     0.783      7.07
–                                           Faster R-CNN         640                             300     16           0.812     0.826      24.2


The specific experimental results on the infrared imaging dataset are shown in Table 7. The first row in Table 7 is the performance of the model trained on the infrared imaging dataset with YOLOv5s, the second row is the performance after augmenting the training data with grayed visible light images, and the third row is the performance after pruning the model from the second row with the BN-based pruning method. The comparison shows that, in terms of detection accuracy, the test mAP values of both YOLOv5 models are higher than those of the YOLOv3 and YOLOv4 algorithms and slightly lower than those of Faster R-CNN; however, the accuracy of Faster R-CNN comes at the expense of time. In terms of detection speed, the YOLOv5 algorithm has a clear advantage over the other algorithms, and after applying the BN-based model pruning method the detection speed improves significantly. The results also show that the combined FLOPs of the two-stage classification are still lower than those of the other networks except YOLOv4, and the difference from YOLOv4 is small, while the two-stage classification brings a clear advantage in detection accuracy and effectively solves the problem of UAVs being misclassified as birds. Overall, compared with the other algorithms, the method proposed in this paper has obvious advantages in both detection accuracy and detection speed.

5 Summary

We provide an improved UAV detection method based on YOLOv5 to improve the accuracy and speed of UAV detection. Two datasets, visible light and infrared imaging, are used to generate daytime and nighttime detection models, which solves the problem of low detection accuracy at night and in dim light. The YOLOv5 network, with its fast detection speed, is selected to implement UAV detection, and a BN-layer-based model pruning method is used to further improve the speed of the detection model and reduce the model size. The problem of UAVs being misclassified as birds is solved by adding EfficientNet for secondary classification of the input images. The experimental results show that the proposed algorithm has clear advantages in terms of accuracy and speed, and the method has practical application value.

Acknowledgement. This research was supported by the National Science Foundation of China under Grants No. 51975332, No. 62006143, and No. 62102235, and the Major Scientific & Technological Innovation Projects of Shandong Province (No. 2021CXGC011204).

Conflict of Interest. The authors declare that there are no conflicts of interest regarding the publication of this paper.

Data Availability. All data generated or analysed during this study are included in this published article.


References

1. Zhang, J., Zhang, K., Wang, J., Lv, M., Pei, W.: A survey on anti-UAV technology and its future trend. Adv. Aeronaut. Sci. Eng. 9(01), 1–8+34 (2018)
2. Yan, J., Xie, H., Zhuang, D.: Research on the threat of UAV cluster to air defense in key areas and countermeasures. Aerodynamic Missile J. 07, 56–61 (2021)
3. Ma, W., Chigan, X.: Research on development of anti UAV technology. Aero Weaponry 27(06), 19–24 (2020)
4. Zhu, M., Chen, X., Liu, X., Tan, C.Y., Wei, L.: Situation and key technology of tactical laser anti-UAV. Infrared Laser Eng. 50(07), 188–200 (2021)
5. Luo, B., Huang, Y.C., Zhou, H.: Development status and key technology analysis of tactical laser weapon anti UAV. Aerospace Technol. 09, 24–28 (2017)
6. Xue, M., Zhou, X.W., Kong, W.L.: Research status and key technology analysis of anti UAV system. Aerospace Technol. (05), 52–56+60 (2021)
7. Chen, X.L., Chen, W.S., Rao, Y.H., Huang, Y., Guan, J., Dong, Y.L.: Progress and prospects of radar target detection and recognition technology for flying birds and unmanned aerial vehicles. J. Radars 9(05), 803–827 (2020)
8. Xie, J.Y., et al.: Drone detection and tracking in dynamic pan-tilt-zoom cameras. CAAI Trans. Intell. Syst. 16(05), 858–869 (2021)
9. Martínez, C., Mondragón, I.F., Olivares-Méndez, M.A., et al.: On-board and ground visual pose estimation techniques for UAV control. J. Intell. Rob. Syst. 61(1), 301–320 (2011)
10. Rozantsev, A., Lepetit, V., Fua, P.: Detecting flying objects using a single moving camera. IEEE Trans. Pattern Anal. Mach. Intell. 39(5), 879–892 (2016)
11. Zhai, J.Y., Dai, J.Y., Wang, J.Q., Ying, J.: Multi-objective identification of UAV based on deep residual network. J. Graphics 40(01), 158–164 (2019)
12. Sun, H., Geng, W., Shen, J., et al.: Deeper SSD: simultaneous up-sampling and down-sampling for drone detection. KSII Trans. Internet Inf. Syst. 14(12), 4795–4815 (2020)
13. Ma, Q., Zhu, B., Zhang, H.W., Zhang, Y., Jiang, Y.C.: Low-altitude UAV detection and recognition method based on optimized YOLOv3. Laser Optoelectron. Progress 56(20), 279–286 (2019)
14. Redmon, J., Divvala, S., Girshick, R., et al.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 159–182 (2016)
15. Jiang, L., Cui, Y.R.: Small target detection based on YOLOv5. Comput. Knowl. Technol. 17(26), 131–133 (2021)
16. Tan, M., Le, Q.: EfficientNet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, PMLR, pp. 6105–6114 (2019)
17. Liu, Z., Li, J., Shen, Z., et al.: Learning efficient convolutional networks through network slimming. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2736–2744 (2017)

Large-Scale Traffic Signal Control Based on Integration of Adaptive Subgraph Reformulation and Multi-agent Deep Reinforcement Learning

Kai Gong2, Qiwei Sun2, Xiaofang Zhong1,2(B), and Yanhua Zhang3

1 School of Data and Computer Science, Shandong Women's University, Jinan 250300, China
[email protected]
2 Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan 250022, China
3 Information Network Management, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China

Abstract. In a multi-intersection environment, it is difficult to require all agents to collectively make a globally optimal decision, because the dimension of the action space increases exponentially with the number of agents. This paper builds on decentralized multi-agent deep reinforcement learning (decentralized MADRL) and overcomes the scalability issue by distributing global control to each local RL agent. In decentralized MADRL, the communication range between agents affects the performance and communication cost of the whole system. An adaptive subgraph reformulation algorithm is proposed to clarify the information acquisition range of the agents. Our MADRL algorithm addresses the trade-off in information interaction between agents in a large-scale urban traffic network. Each agent updates its decision based on local information and the regional information extracted by subgraph reformulation to realize collaborative control of traffic signals. Detailed simulation experiments show that our algorithm comprehensively outperforms other RL algorithms on a real-world dataset with a large number of intersections and complex road connections.

Keywords: Adaptive Traffic Signal Control · Multi-Agent Deep Reinforcement Learning · Subgraph Reformulation

1 Introduction

With the acceleration of urbanization and growing private car ownership, traffic congestion is becoming increasingly serious. Adaptive traffic signal control (ATSC) aims to balance the difference between demand and supply by reallocating transportation resources, which helps reduce traffic congestion and increase transportation efficiency [1]. In recent years, hardware equipment has kept developing and intelligent technology has kept progressing [2]. Applications of the new generation of information technology comprehensively adopted in smart cities make it easier to sense vehicles on the road and integrate urban resources. On this basis, a more realistic research topic can be pursued, namely the cooperative adaptive control of multi-intersection traffic signals.

At present, most traffic lights in cities are controlled by pre-defined fixed-time plans [3, 4], which determine the signal cycle time and the green ratio of each phase in advance according to the traffic flow and operating characteristics of the intersection. TSC methods based on traditional optimization algorithms have attracted wide attention in the past decades; researchers have applied various optimization algorithms, such as simulated annealing [5], genetic algorithms [6], and particle swarm optimization [7], to improve the efficiency and effectiveness of TSC. However, TSC is a complex decision-making problem, and traditional methods based on fixed rules and algorithms struggle to adapt to a changing environment. Deep reinforcement learning (DRL) uses deep neural networks (DNN) as function approximators. It can learn efficient decision-making strategies in complex environments, is suitable for high-dimensional inputs and continuous state spaces, and can overcome the limitations of traditional optimal decision-making methods in model-free dynamic programming problems [8–10]. Multi-agent reinforcement learning (MARL) can be divided into centralized and decentralized training frameworks, and several existing decentralized RL models [11–13, 18, 19] can tackle the city-level traffic signal control problem well.

This paper proposes an adaptive subgraph reformulation strategy and integrates it into the multi-agent algorithm, aiming to clarify the information acquisition range of each agent, reduce the computational complexity, and improve the optimization efficiency of signal control. MA2C_AS is evaluated in both a synthetic large traffic grid and a real-world large traffic network, with carefully designed traffic dynamics to ensure a certain difficulty level of the Markov Decision Processes (MDP). Numerical experiments confirm that MA2C_AS is more stable in the synthetic large traffic grid, although its average reward curve is slightly lower than that of the general MA2C algorithm. In the real-world large traffic network, MA2C_AS outperforms the other RL-based control algorithms in robustness and optimality.

The framework of this paper is as follows: Sect. 2 introduces the principles and motivation of the adaptive subgraph reformulation algorithm and the specific design of the multi-agent A2C. The simulation environment, datasets, experimental parameters, and analysis of the experimental results are elaborated in Sect. 3. The paper ends with the conclusion and future work in Sect. 4.

2 Methodology

In this section, the architecture of our framework is first outlined, and its two parts, the adaptive subgraph reformulation algorithm and the multi-agent advantage actor-critic, are described in detail in the subsequent subsections.

2.1 Framework

A collaborative adaptive control method for multi-intersection traffic signals based on subgraph reformulation is proposed in this paper.


Fig. 1. Multi-intersection ATSC framework based on subgraph reformulation.

As shown in Fig. 1, at any time t, the road network subgraph is constructed adaptively according to the traffic flow data, vehicle queue information, and other traffic state information collected by all agents in the target road network, together with the distance matrix of the intersections. The target agent node then receives the data of the other nodes in the subgraph and controls its traffic signal in combination with its local state. The construction of the road network subgraph aims to reduce the computational complexity of the agent node and remove the influence of irrelevant data on traffic signal control.

2.2 Adaptive Subgraph Reformulation Algorithm

First, the initial traffic network is used to construct the adjacency matrix of maximum travel times. Then, a shortest path algorithm is used to obtain a temporarily connected subgraph, and nodes are selected within the reserved interval according to accessibility analysis and similarity quantization. Finally, using the Laplacian matrix of the temporary subgraph, the target subgraph is obtained by removing nodes with low spatiotemporal similarity. The specific process of the adaptive construction of the road network subgraph is shown in Fig. 2.

Fig. 2. The process of the adaptive subgraph reformulation.

The minimum flow velocity between different nodes is calculated by assuming that the speed follows the normal distribution N(μ, σ²) [15]. Specifically, the average vehicle speed within the most recent time step Δt is substituted into the normal distribution as μ, 120% of the road's maximum speed limit is used as the 95% upper confidence bound (μ + 2σ), and the corresponding lower confidence bound (μ − 2σ) is taken as the lowest driving speed on the road. Therefore, the distance matrix D and the minimum travel speed matrix S can be obtained from the dataset:

D = [ 0 d12 … d1n ; d21 0 … d2n ; … ; dn1 dn2 … 0 ],   S = [ 0 s12 … s1n ; s21 0 … s2n ; … ; sn1 sn2 … 0 ],   (1)

where n is the number of nodes in the entire traffic network, dij in D represents the actual distance from node i to node j, and sij in S represents the minimum travel speed from node i to node j. The maximum travel time matrix T is obtained by dividing the elements of D by the corresponding elements of S:

T = [ 0 t12 … t1n ; t21 0 … t2n ; … ; tn1 tn2 … 0 ],   tij = dij / sij,   (2)

where tij represents the longest driving time from node i to node j. Using the Bellman–Ford shortest path algorithm, the shortest travel time between any two nodes can be obtained. Nodes whose travel time is shorter than the time interval set for subgraph construction are selected to form a temporarily connected subgraph GT. GT is used as the sample data to obtain the eigenvectors of its Laplacian matrix, and the similarity between the i-th node and the target forecasting node is quantized as Hsi. Filtering out the nodes whose similarity value Hsi is below a reasonable threshold H, the reformulated subgraph is obtained with the adjacency matrix

G = [ 0 g12 … g1m ; g21 0 … g2m ; … ; gm1 gm2 … 0 ],   (3)

where m is the number of nodes in the reformulated subgraph and gij represents the actual distance from node i to node j.
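A hedged NumPy/SciPy sketch of the first stage of the reformulation, under the assumption that dense distance and minimum-speed matrices are available: it builds T = D / S as in Eq. (2) and keeps the nodes whose Bellman–Ford shortest travel time to the target node lies within the construction interval. The function name, the 3-node example, and the 60 s interval are illustrative only:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def temporary_subgraph(D: np.ndarray, S: np.ndarray, target: int, t_max: float) -> np.ndarray:
    """Build the maximum-travel-time matrix T = D / S (Eq. (2)) and return the nodes
    whose shortest travel time to the target node is within the interval t_max."""
    T = np.zeros_like(D, dtype=float)
    mask = S > 0
    T[mask] = D[mask] / S[mask]                 # element-wise division; diagonal stays zero
    times = shortest_path(T, method="BF", indices=[target])[0]  # Bellman-Ford from the target
    return np.flatnonzero(times <= t_max)       # node indices of the temporary subgraph

# Hypothetical 3-node network (distances in metres, minimum speeds in m/s):
D = np.array([[0, 400, 800], [400, 0, 400], [800, 400, 0]], dtype=float)
S = np.array([[0, 10, 10], [10, 0, 10], [10, 10, 0]], dtype=float)
print(temporary_subgraph(D, S, target=0, t_max=60.0))  # -> [0 1]
```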


2.3 Multi-agent A2C for ATSC

The traffic signal control problem can be formulated as a Markov Decision Process (MDP). The states, actions, and rewards of the MDP are described first, and our multi-agent A2C design is then introduced in the following subsection.

MDP Settings

State Definition. At any time t, the local state of intersection i is defined as

o_{t,i} = { queue^l_{t,i}, wait^l_{t,i} },  l ∈ L_i,   (4)

where queue^l_{t,i} is the number of vehicles waiting in entering lane l of intersection i at time t, wait^l_{t,i} is the waiting time of the first vehicle in entering lane l of intersection i at time t, and L_i is the set of incoming lanes at intersection i. The joint state of intersection i is composed of the local state of the current agent and the states of the other agents in its subgraph:

s_{t,i} = < o_{t,i}, g_{i,j} · w_{i,j} · o_{t,j} >,  j ∈ N_i,   (5)

where N_i is the set of all nodes in the subgraph containing the target optimization node i, g_{i,j} is obtained in Eq. (3), and w_{i,j} is the number of intersections that traffic from intersection j has to pass through to reach the target intersection i. The product g_{i,j} · w_{i,j} can also be expressed as the discount factor γ_{i,j}.

Action Space. In a four-arm intersection, the signal phase collection PhaseSpace = {NSG, NSLG, EWG, EWLG} includes the north–south straight phase, the north–south left-turn phase, the east–west straight phase, and the east–west left-turn phase. Right-turning vehicles at the intersection are not restricted by the traffic lights. Details are shown in Fig. 3.
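A small sketch of how the local and joint states of Eqs. (4)–(5) can be assembled; the array layout is a working assumption (queues and waiting times concatenated per lane), not a detail taken from the paper:

```python
import numpy as np

def local_state(queue_per_lane, wait_first_vehicle):
    """Local observation o_{t,i} of Eq. (4): queue length and first-vehicle waiting
    time for every incoming lane of the intersection."""
    return np.concatenate([np.asarray(queue_per_lane, dtype=float),
                           np.asarray(wait_first_vehicle, dtype=float)])

def joint_state(o_i, neighbour_obs, g_row, w_row):
    """Joint state s_{t,i} of Eq. (5): the local observation plus every subgraph
    neighbour's observation scaled by gamma_{i,j} = g_{i,j} * w_{i,j}."""
    parts = [o_i]
    for o_j, g, w in zip(neighbour_obs, g_row, w_row):
        parts.append(g * w * o_j)
    return np.concatenate(parts)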

Fig. 3. The set of signal phases and the definition of action.

Reward Function. At any time t, the reward of the A2C agent at intersection i is defined as

r_{t,i} = r'_{t,i} − r'_{t−Δt,i},   (6)

where r'_{t,i} is defined as the sum of the rewards obtained from the local environment and from all the other agents in the subgraph:

r'_{t,i} = r_{t,i} + Σ_{j∈N_i} γ_{i,j} · r_{t,j} = Σ_{l∈L_i} (queue^l_{t,i} + α · wait^l_{t,i}) + Σ_{j∈N_i} γ_{i,j} · r_{t,j},   (7)

where r_{t,i} is the local reward obtained by agent i, γ_{i,j} · r_{t,j} is the reward obtained by agent j in the subgraph of agent i, and α is the discount factor.

Agent Design

Advantage Actor-Critic. A2C improves the policy gradient by introducing a value regressor V_w to estimate E[R^π_t | s_t = s]. The loss function for value updating is

L(w) = (1 / 2|B|) Σ_{t∈B} (R_t − V_w(s_t))²,   (8)

and the policy loss is

L(θ) = −(1 / |B|) Σ_{t∈B} log π_θ(a_t | s_t) A_t.   (9)

B = {(s_t, a_t, s'_t, r_t)} contains the experience trajectory. A is the advantage function, defined as the difference between the action value function Q and the state value function V, i.e., A^π(s, a) := Q^π(s, a) − V^π(s, a) and A_t := R_t − V_{w^−}(s_t), where the superscript "−" means the weights of the DNN are frozen.

Independent A2C. Each agent learns its own policy π_{θ_i} and the corresponding value function V_{w_i}. The value loss and policy loss are defined according to [19], assuming that communication is limited to each local region. The local policy and value regressors take s_{t,V_i} := {s_{t,j}}_{j∈V_i} instead of s_t as the input state. For agent i the value loss becomes

L(w_i) = (1 / 2|B|) Σ_{t∈B} (R_{t,i} − V_{w_i}(s_{t,V_i}))²,   (10)

and the policy loss becomes

L(θ_i) = −(1 / |B|) Σ_{t∈B} log π_{θ_i}(a_{t,i} | s_{t,V_i}) A_{t,i},   (11)

where A_{t,i} = R_{t,i} − V_{w_i^−}(s_{t,V_i}).
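A minimal PyTorch-style sketch of the per-agent losses of Eqs. (10)–(11), assuming the returns, value estimates, and log-probabilities for a minibatch B have already been gathered; this is illustrative code, not the authors' implementation:

```python
import torch

def a2c_losses(values: torch.Tensor, returns: torch.Tensor,
               log_probs: torch.Tensor):
    """Value loss (Eq. (10)) and policy loss (Eq. (11)) over a minibatch B.
    `values` are V_w(s_t), `returns` are R_t, `log_probs` are log pi(a_t | s_t)."""
    advantages = (returns - values).detach()          # A_t with frozen value weights
    value_loss = 0.5 * (returns - values).pow(2).mean()
    policy_loss = -(log_probs * advantages).mean()
    return value_loss, policy_loss
```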

Multi-agent A2C. To relax the global cooperation, we assume the global reward is decomposable as r_t = Σ_{i∈V} r_{t,i}. A spatial discount factor α is then introduced to adjust the global reward for agent i as

r̃_{t,i} = Σ_{d=0}^{D_i} ( Σ_{j∈V | d(i,j)=d} α^d · r_{t,j} ),   (12)

where D_i is the maximum distance from agent i. Similarly, we use α to discount the neighborhood states as

s̃_{t,V_i} = [s_{t,i}] ∪ α[s_{t,j}]_{j∈N_i},   (13)

where N_i is the set of other agents that agent i focuses on; it can be obtained by our subgraph reformulation algorithm. Given the discounted global reward, we have R_{t,i} = Σ_{τ=t}^{t_B−1} γ^{τ−t} r̃_{τ,i}, and the local return can be obtained as

R̃_{t,i} = R_{t,i} + γ^{t_B−t} V_{w_i^−}(s̃_{t_B,V_i}, π_{t_B−1,N_i} | π_{θ_{−i}^−}).   (14)

Combining Eq. (12), Eq. (13) and Eq. (14), the value loss Eq. (10) becomes

L(w_i) = (1 / 2|B|) Σ_{t∈B} (R̃_{t,i} − V_{w_i}(s̃_{t,V_i}, π_{t−1,N_i}))².   (15)

The policy loss Eq. (11) becomes Eq. (16): the same form with s̃_{t,V_i} and π_{t−1,N_i} as inputs, the advantage replaced by Ã_{t,i}, and an additional regularization term.   (16)

3 Numerical Experiments MA2C_AS is evaluated in two SUMO-simulated traffic environments: a 4 × 4 synthetic traffic grid and a real-world 37-intersections traffic network, under time-variant traffic flows. This section aims to design challenging and realistic traffic environments for fair comparisons across controllers. 3.1 Simulation Environment and Experimental Parameters Each lane in the 4 × 4 grid traffic network is 400 m long and the road speed limit is 20 m/s. Each time step will randomly generate traffic flow and drive into the road network. Random seeds will be automatically saved after each simulation, so as to facilitate the subsequent training of the benchmark method (Fig. 4). In the simulation experiment of the real urban traffic network in Jinan City, there are 37 intersections with a speed limit of 20 m/s. The vehicle data at each intersection is the data of the vehicles on the real road on September 1, 2021. The data is the traffic flow from October 1, 2020 to October 1, 2021 obtained by our research group from the

758

K. Gong et al.

Fig. 4. A traffic grid of 16 intersections, with an example intersection shown inside circle.

Fig. 5. Part of the traffic network in Lixia district, Jinan City, Shandong Province.

database of Jinan Traffic Management Bureau. The traffic flow is the real traffic flow collected by the geomagnetic sensor installed at the intersection road (Fig. 5). In the same simulation environment, we trained MA2C_AS, IA2C, MA2C and IGRL, four traffic signal control algorithms of deep reinforcement learning. In the training stage, the agent made a decision every 5 s. The training parameters of the algorithm are set as follows (Table 1): Table 1. Training parameter setting. Training parameters

Value

Experience Replay Buffer size

10000

Batch size

128

Minimum exploration rate

0.001

Discount factor γ

0.99

Learning rate α

10−4

Training episodes

1400

Episode horizon T

720

Large-Scale Traffic Signal Control Based on Integration

759

The parameters selected by the reward function of each algorithm are inconsistent. For example, IA2C only focuses on the local reward, while MA2C_AS, MA2C and IGRL all observe the environmental changes of other agents in the road network to obtain the reward. Therefore, in order to make the average reward comparable, the average cumulative reward selected in this paper is the average of the local reward of each agent. The number of training episodes increased to 1400, making the total training cycle greater than 1-million-time steps. 3.2 Baseline Methods In order to evaluate the control effect and generalization ability of the proposed MA2C_AS method, it was compared with four baseline algorithms in the synthetic environment and the real road network data set respectively. The four baseline algorithms are as follows: • MP (Max pressure) [16], which belongs to the traditional optimized traffic signal control method. • IA2C (Independent A2C) [17], which does not consider the information interaction between multiple intelligences. • MA2C (Multi-agent A2C) [18], where each agent gathers information about its neighbors. • IG-RL (Inductive Graph Reinforcement Learning) [19], which uses an inductive graph method based on GCN for subsequent multi-agent traffic signal control. 3.3 Synthetic Traffic Grid

Fig. 6. Average cumulative rewards in a 4 × 4 traffic grid.

As can be seen from Fig. 6, IA2C algorithm only focuses on the local state of the agent and does not converge in training, so the model cannot realize effective control of traffic signals on the road network. MA2C has the best effect, because the 4 × 4 grid traffic network in the simulation environment requires less information by observing

760

K. Gong et al.

other agents. The average reward function curve of MA2C algorithm is still unstable in the late training process. MA2C_AS and IG-RL observed the status of other agents in the road network, which can realize the traffic flow channeling according to the status of non-adjacent intersections, and effectively realize the collaborative control of multi-intersection traffic signals. Twenty traffic simulation experiments were carried out on the MA2C_AS algorithm and four baseline algorithms respectively, and the time length of each simulation experiment was 3600 s. The generation of vehicles conforms to the Weber distribution to simulate the change of traffic peak. After each simulation test, the random seeds are saved so that all algorithms can be compared and analyzed in the same traffic flow environment. Table 2. Experimental results of three key indexes in 4 × 4 grid traffic network. Baselines

Average queue length (vehicle)

Average travel time (s)

Average driving speed (m/s)

MP

2.17

103.94

6.62

IA2C

1.13

71.18

10.56

IG-RL

1.29

76.64

9.39

MA2C

0.52

38.67

14.30

MA2C_AS

0.63

44.80

11.96

Table 2 summarizes the average value of the key indicators compared by the five methods in the 4 × 4 grid traffic network. 3.4 Jinan Urban Transportation Network

Fig. 7. Average cumulative rewards in Jinan urban transportation network.

Large-Scale Traffic Signal Control Based on Integration

761

Figure 7 elaborates that the training effect of MA2C_AS in the real-world road network becomes the best. IA2C algorithm does not care about the environmental changes of other agents, but only pursues the control effect of local intersections, which leads to the deterioration of the global effect. MA2C_AS and IG-RL observe the status of other non-adjacent agents in the road network to effectively realize the collaborative control of multi-intersection traffic signals. MA2C_AS has a higher average reward value, which proves that the proposed adaptive construction based on accessibility network can effectively remove the influence of irrelevant nodes on the optimization of target agents and improve the control effect of agents. Table 3. Experimental results of four key indexes in Jinan urban transportation network. Baselines

Average queue length (vehicle)

Average travel time (s)

Average driving speed (m/s)

Average intersection delay(s/vehicle)

MP

1.87

532

5.62

179.35

IA2C

1.71

459

6.07

146.19

IG-RL

1.04

336

9.09

96.86

MA2C

1.24

385

7.75

120.82

MA2C_AS

0.67

278

9.81

74.23

Table 3 displays that the MA2C_AS algorithm achieves the best performance in four main indexes.

4 Conclusion and Future Work The performance of our proposed MA2C_AS algorithm in synthetic traffic grid environment was only slightly worse than that of MA2C algorithm, but it was more stable. In a more complex road network, MA2C_AS observed the status of other agents in the road network. By integrating the status information of adjacent and non-adjacent intersections, MA2C_AS effectively realized collaborative control of traffic signals at multiple intersections, and its performance was better than all comparison algorithms. In addition, the proposed algorithm had excellent migration and generalization ability in different traffic network scenarios. Acknowledgment. This research was funded by the Natural Science Foundation of Shandong Province for Key Project under Grant ZR2020KF006, the National Natural Science Foundation of China under Grant 62273164, the Development Program Project of Youth Innovation Team of Institutions of Higher Learning in Shandong Province.

762

K. Gong et al.

References 1. Nguyen, T.T., Nguyen, N.D., Nahavandi, S.: Deep reinforcement learning for multiagent systems: a review of challenges, solutions, and applications. IEEE Trans. Cybern. 99–113 (2020) 2. Wu, J.Q., Song, X.G.: Review on smart highways critical technology. J. Shandong Univ. (Eng. Sci.) 50(4), 52–69 (2020) 3. Alan, J.M.: Settings for fixed-cycle traffic signals. J. Oper. Res. Soc. 14(4), 373–386 (1963) 4. Webster, F.V.: Traffic signal settings. Road Research Technical Paper, 39 (1958) 5. Li, Z.L.: A differential game modeling approach to dynamic traffic assignment and traffic signal control. In: IEEE International Conference on Systems, Man, and Cybernetics. IEEE (2003) 6. Ceylan, H., Bell, M.G.: Traffic signal timing optimisation based on genetic algorithm approach, including drivers’ routing. Transport. Res. Part B-Methodol. 38(4), 1135–1145 (2004) 7. Wei, Y., Shao, Q., Hang, Y., et al.: Intersection signal control approach based on PSO and simulation. In: Proceedings of the 2008 Second International Conference on Genetic and Evolutionary Computing. IEEE, Jinzhou (2008) 8. Oroojlooy, A., Nazari, M., Hajinezhad, D., et al.: Attendlight: universal attention-based reinforcement learning model for trafficsignal control. In: Advances in Neural Information Processing Systems. NeurIPS, Online (2020) 9. Zheng, G., Xiong, Y., Zang, X., et al.: Learning phase competition for traffic signal control. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 1963–1972. CIKM, Beijing (2019) 10. Zang, X., Yao, H., Zheng, G., et al.: Metalight: value-based meta-reinforcement learning for traffic signal control. In: AAAI Conference on Artificial Intelligence, pp. 1153–1160. AAAI, New York (2020) 11. Chen, C., Wei, H., Xu, N., et al.: Toward a thousand lights: decentralized deep reinforcement learning for large-scale traffic signal control. In: AAAI Conference on Artificial Intelligence, pp. 3414–3421. AAAI, New York (2020) 12. Yu, Z., Liang, S., Wei, L., et al.: Macar: urban traffic light control via active multi-agent communication and action rectification. In: International Joint Conference on Artificial Intelligence, pp. 2491–2497. IJCAI, Yokahama (2020) 13. Wang, X., Ke, L., Qiao, Z., et al.: Large-scale traffic signal control using a novel multiagent reinforcement learning. IEEE Trans. Cybern. 51, 174–187 (2020) 14. Aslani, M., Mesgari, M.S., Wiering, M.: Adaptive traffic signal control with actor-critic methods in a real-world traffic network with different traffic disruption events. Transport. Res. Part C: Emerg. Technol. 85, 732–752 (2017) 15. Das, S., Maurya, A.K.: Time headway analysis for four-lane and two-lane roads. Transport. Dev. Econ. 3, 1–18 (2017) 16. Wang, X., Yin, Y., Feng, Y., et al.: Learning the max pressure control for urban traffic networks considering the phase switching loss. Transport. Res. Part C: Emerg. Technol. 140, 103670 (2022) 17. Chu, T., Wang, J., Codecà, L., et al.: Multi-agent deep reinforcement learning for large-scale traffic signal control. IEEE Trans. Intell. Transp. Syst. 21(3), 1086–1095 (2019) 18. Paczolay, G., Harmati, I.: A new advantage actor-critic algorithm for multi-agent environments. In: 23rd International Symposium on Measurement and Control in Robotics. ISMCR, Budapest (2020) 19. Devailly, F.X., Larocque, D., Charlin, L.: IG-RL: inductive graph reinforcement learning for massive-scale traffic signal control. IEEE Trans. Intell. Transp. Syst. 23(7), 7496–7507 (2021)

An Improved AprioriAll Algorithm Based on Tissue-Like P for Sequential Pattern Mining Xiaojun Ma1,2 and Xiyu Liu1,2(B) 1 Business School, Shandong Normal University, Jinan, China

[email protected] 2 Academy of Management Science, Shandong Normal University, Jinan, China

Abstract. Sequential pattern mining is an important field in data mining, which aims to find frequent sequences with sequential order in massive data and discover valuable and relevant information to solve problems such as recommendation tasks. So far there are many algorithms that use serial or parallel approaches to mine sequential patterns. In this paper, we propose a TP-AprioriAll algorithm that can handle ordered data sets to mine frequent sequences. By combining the evolutionary communication tissue-like P-system with promoters and suppressors with the AprioriAll algorithm, which has an overall structure similar to that of an organization cell, where the evolutionary rules of the algorithm are object rewriting rules, the communication rules between cells and the parallelism mechanism are used to achieve an improvement of the AprioriAll algorithm to quickly process the transformed sequential data and mine the frequent sequential patterns. In this paper, the feasibility of the algorithm is demonstrated by examples, and the computational process of the algorithm is described in detail. By comparing the time complexity with other baseline algorithms, it is proved that the algorithm proposed is better than several other algorithms. The results of this paper provide some suggestions for improving traditional algorithms using parallel mechanisms of membrane computing models. Keywords: Tissue-like p system · AprioriAll algorithm · Sequential pattern mining

1 Introduction AprioriAll algorithm [1] is a commonly used method for mining sequence patterns. It takes into account the orderly characteristics of sequence elements in generating candidate sequences and frequent sequences, and changes the processing of item sets to the processing of sequences. In order to improve the efficiency of the algorithm to solve a large number of data sets, people improve the algorithm in terms of parallelism. The computational model in the field of membrane computing is called membrane system or P system, which is a model for parallel computing. Most P systems have been proved to be equivalent to Turing machines, so they are complete and universal in computing and have been applied in many fields [2]. The application of P system is based © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 D.-S. Huang et al. (Eds.): ICIC 2023, LNCS 14086, pp. 763–774, 2023. https://doi.org/10.1007/978-981-99-4755-3_66

764

X. Ma and X. Liu

on two types of membrane algorithms, namely coupled membrane algorithm and direct membrane algorithm [3]. The direct membrane algorithm is to represent the data objects in the algorithm with the objects of the P system, and to represent the algorithms that operate the data objects with the rules of the P system. This paper proposes a sequential pattern mining algorithm TP-AprioriAll algorithm based on tissue-like P system. Due to the independence of organelles and cell membranes in living organisms, the tissue-like P system can operate in a maximal parallel mode. Therefore, the tissue-like P system is easier to realize information exchange [4–6]. In this paper we propose an improved AprioriAll algorithm based on a tissue-like P-system with promoters (TP-AprioriAll). The promoter is used as a system for digital generation devices in the tissue-like P system [7–10]. In the TP-AprioriAll system, it is assumed that there are m items. For such a database, a tissue structure with m + 1 cells are constructed. The first cell is used to input the data in the database into the system, and the m + 1 cell is called the output cell to store the results obtained by other cells. In the other m cells, by comparing whether the support count of the sequence is greater than the set support threshold to determine whether it is a frequent sequence, and then use the determined frequent h-sequence to generate frequent h + 1-sequence, and so on, and under the action of the promoter to achieve material transfer between cells. In this paper, the computational complexity of TP-AprioriAll system is compared with other classical algorithms. The experimental results show that the effectiveness of the proposed algorithm is better than other algorithms. The contribution of this paper is mainly reflected in two aspects. On the one hand, a TP-AprioriAll algorithm is proposed according to the great parallelism of P system and the algorithm of sequential pattern mining in transaction database. In this system, frequent sequences are screened according to the support threshold, and the selected frequent sequence are transmitted to the next cell under the action of the promoter to generate candidate sequences. This distributed parallel computing device greatly improves the time efficiency of computing. On the other hand, the application field of the new bionic device P system is extended to the mining of frequent sequential patterns. This study also provides a new application of P system in sequential pattern mining, and further expands the application field of direct membrane algorithm. The organizational structure of this paper is as follows: Sect. 2 presents some preliminary introductions about the Basic concepts and the tissue-like P-system. Section 3 develops the TP-AprioriAll algorithm that exploits the parallelism mechanism of the tissue-like P system. In Sect. 4, an example is used to illustrate the workflow of the proposed algorithm. The experimental results are summarized in Sect. 5. The conclusions are presented in Sect. 6.

2 Preliminaries 2.1 AprioriAll Algorithm Sequential pattern mining is to mine a set of transactions in chronological order to find frequent sequence with high frequency. The basic concepts of the AprioriAll algorithm are given below:

An Improved AprioriAll Algorithm Based on Tissue-Like P

765

Itemset: It is denoted by i =< i1 i2 . . . im >, where each ij represents an item, which is a non-empty itemset. Sequence: An ordered set of itemset, denoted by s =< s1 , s2 , . . . , sm >, where each si is an itemsets. Maximal Sequence: The largest frequent items are not included in other itemset. Support: Set a minimum support threshold. Includes: If there exists integers i1 < i2 < . . . < in , and a1  bi1 , . . . , an  bin , then the sequence < a1 a2 . . . an > is contained in another sequence < b1 b2 . . . bn >. The AprioriAll algorithm mainly includes the following steps: sort phase, Litemset phase, transformation phase, sequence phase and maximal phase. 2.2 Tissue-Like p System with Promoters and Inhibitor The formal definition of the tissue-like P system is as follows:  = (O, δ1 , δ2 , . . . , δm , syn, ρ, iout ) where O is an alphabet representing all objects in the environment and cells. δ1 , δ2 , . . . , δm is the cells that constitutes the tissue-like p system. syn = {1, 2, . . . , m} × {1, 2, . . . , m} is the relationship between cells. ρ is the order of the rules in the cell, and the higher the order, the higher the priority of the rules. iout Refers to the final output cell. The internal structure of each cell is:   δt = wt,0 , Rt , 1 ≤ t ≤ m where wt,0 represents the initial set of objects in the cell. When wt,0 = λ, the initial object set in the cell is an empty set, and there is no object. The objects in the cell are represented by characters. For example, θ represents an object in the cell, then θ k represents k copies of θ in the cell. Rt is the rule inside the cell δt , and the form of the rule is generally wv → xygo is a multiple set of objects that react according to the rules in the cell δt . v has two possibilities v = v and v = ¬v. The first v = v represents the promoters. The second v = ¬v is an inhibitor, which is a chemical substance used to hinder a reaction and has a negative impact on the chemical reaction. xy is an object multiset generated by the rule. The generated object multiset x remains in the cell, while the object multiset y is transmitted to other cells connected to it. If the rules in the cell have the same effect, they are connected by ∪. There is a certain order of rules in cells, only the highest order rules can be used in each step of transfer evolution.

3 TP-AprioriAll Algorithm 3.1 Algorithm and Rules Description After preprocessing the data, a sequence database containing D records and m fields and C sequences is obtained, which is converted into a form that the P system can process. If an item Ij in the sequence database is in the i-th record and lies on the l-th order of that record, denoted by Silj , the processing object of the membrane computation is a multiset, and the conversion of the database into a multiset facilitates the parallel operations that follow. This P-system consists of m + 2 cells, whose cell-to-cell connections are shown in Fig. 1:

766

X. Ma and X. Liu

Fig. 1. Cell structure for the TP-AprioriAll algorithm.

Where δ0 is an input cell, and δ1 , δ2 , . . . , δm are the cells which are used to implement the algorithm, besides this, the δm+1 is an output cell. At the beginning of the calculation, the object Silj is encoded in the sequence database, and the object θ k representing the support threshold k is also input into δ0 . With the parallel evolution mechanism of the organization P system, the object Silj and θ k are transmitted to δ1 , δ2 , . . . , δ m in parallel. In cell δ1 , β k which is generated by θ k is used to filter frequent item, frequent 1-sequences are determined by rules and passed to cell δ2 and cell δm+1 . Then, frequent 2-sequence and frequent 1-sequences are filtered in cell δ2 and passed to cell δ3 and cell δm+1 . By analogy, the generated frequent sequences are passed to cell δm+1 . The tissue-like P system for TP-AprioriAll is as follows: Apriori = (O, δ0 , δ1 , . . . , δm+1 , syn, ρ, iout ) O = {Silj , θ k , β k , Ij , γj , ϕj };

(1)

syn ={0, 1}, {0, 2}, . . . , {0, m}; {1, m + 1}, {2, m + 1}, . . . , {m, m + 1}; {1, 2}, {2, 3}, . . . , {m − 1, m};

(2)

ρ = {ri > rj |i < j};

(3)

      δ0 = w0,0 , R0 , δ1 = w1,0 , R1 , δ2 = w2,0 , R2 , . . . , δm = (wm,0 , Rm );

(4)

iout = m + 1.

(5)

The rules within each cell:

An Improved AprioriAll Algorithm Based on Tissue-Like P In R0:

0

=(

0,0 ,

0 ),

0,0

=

and

→ , }∪{ → } 01 = { for 1 ≤ ≤ ,1 ≤ ≤ , = 1,2, . . , − 1 , and 1 ≤ ≤ . In 1 = ( 1,0 , 1 ), 1,0 = and R1: → 11 = → 12 = − → } ∪ {{ → , } 13 = { ¬

for 1 ≤ ≤ ,1 ≤ ≤ , 1 ≤ ≤ ,and 1 ≤ ≤ In 2 = ( 2,0 , 2 ), 2,0 = and R2˖ → 1 2 ∪ 12 , 1 ≤ 1 , 2 ≤ . 21 = 1 → { 12 , 1 ≤ 1 = 2 ≤ } ∪ { 1 2 , 1 ≤ 22 = 1 1 2 2 23 = { 12 12 → 12 } ∪ { 1 2 1 2 → 1 2 } − → 12 , }} ∪ {{ 1 2 24 = {{ 12 12 → } ∪ { 12 ¬

{

1 2¬

In 3 = ( R3˖ 31 = { 1 32 = 1 1

33



1 2

3,0 ,

3 ),

2



2 2

}}

1 2,

3,0

=

1 2 3

12

≤ }

→ }∪

}∪{



12

12 3



123

}.

→ { 123 , 1 ≤ 1 = 2 = 3 ≤ } ∪ { 12 3 , 1 ≤ 1 = 2 < 3 ≤ } ∪ { 1 2 3 , 1 ≤ 1 < 2 < 3 ≤ }. = {{ 123 123 → 123 } ∪ { 12 3 12 3 → 12 3 }} ∪ { 1 2 3 1 2 3 → 1 2 3 } − − → 123 , }} ∪ {{ 12 3 12 3 → } 34 = {{ 123 123 → } ∪ { 123 ∪{

In ℎ = ( ℎ,0 , Rh: ℎ1 = {{ 12…ℎ −1

ℎ ),

1 1

2 2



12…(ℎ −1)ℎ

ℎ,0

→ →

ℎ3= {

− 1 2

2

and

¬

ℎ2=


Object βj indicates that item Ij is a candidate frequent 1-sequence, and object γj is used to indicate a frequent 1-sequence. The initial presence of k copies of βj in cell δ1 means that the j-th item must appear in at least k records for the sequence <(Ij)> to be a frequent 1-sequence. βj1j2^k and βj12^k play the same role in cell δ2 as βj^k does in cell δ1; their subscripts store information about the items and sequences, the former indicating that items Ij1 and Ij2 lie in different itemsets and form a candidate 2-sequence, the latter indicating that Ij1 and Ij2 lie in the same itemset and form a candidate 1-sequence. γj12 and γj1j2 are used to indicate frequent 1-sequences and frequent 2-sequences. In cell δ3, candidate 1-sequences, 2-sequences and 3-sequences are represented by βj123^k, βj12j3^k and βj1j2j3^k, while γj123, γj12j3 and γj1j2j3 represent frequent 1-sequences, 2-sequences and 3-sequences, respectively. The representation in cell δh is similar, with subscripts denoting candidate or frequent 1-sequences up to candidate or frequent h-sequences. The process of rule implementation is elaborated below: (1) Input cell δ0. The converted objects Silj are entered, and the object θ^k is used to activate the algorithm. Rule r01 is executed to copy all the objects in cell δ0 and pass them in parallel to all the cells connected to it, i.e., to cells δ1, . . . , δm. (2) When δ1 receives the objects passed from the input cell, it evolves in the order of the rules. In δ1, the support threshold of each item is obtained using the rule {θ^k → β1^k . . . βm^k}; then an object γj is generated by rule r12 for every object carrying Ij consumed together with an object βj, and it is stored inside the cell. In fact rule r12 is processed for multiple objects with different subscripts, and the form above represents only one of its subrules, where 1 ≤ j ≤ m. Rule r13 = {γj^p βj^(k−p) → λ} ∪ {γj^k |¬βj → φj,go} is then applied to

realize the screening of frequent sequences: if the number of generated objects γj is less than the support threshold k, the generated objects γj and the remaining βj are consumed; if no inhibitor βj is left inside the cell, the generated φj is passed to cells δ2 and δm+1. Cell δm+1 is used to store all the frequent sequences, and the frequent sequences generated later by each cell are also stored in δm+1. (3) When the objects φj are passed into cell δ2, rule r21 is executed to consume the φj received from δ1 and to generate the support counts for candidate 1-sequences and candidate 2-sequences. Rule r22 = {Sil1j1 Sil2j2 → Iij12, 1 ≤ l1 = l2 ≤ C} ∪ {Sil1j1 Sil2j2 → Iij1j2, 1 ≤ l1 < l2 ≤ C} compares the itemsets in which the items are located and maps the objects carrying the items to Iij12 and Iij1j2; that is, there are two cases after converting, either the two items lie in the same itemset or they lie in different itemsets, which yields the candidate 1-sequences and the candidate 2-sequences. The following rules r23 and r24 filter the candidate sequences obtained



by rule r22, and finally the generated frequent 1-sequences and frequent 2-sequences are input to cell δ3 and saved to the output cell δm+1. (4) The obtained frequent sequences are passed into δ3, and rule r31 is initiated to generate the support thresholds for all candidate 1-sequences, 2-sequences, and 3-sequences: βj1j2j3^k is used to filter frequent 3-sequences, βj12j3^k to filter frequent 2-sequences, and βj123^k to filter frequent 1-sequences. Following the same process as in cell δ2, candidate 1-sequences, 2-sequences, and 3-sequences are first obtained by rule r32 according to the itemsets in which the items are located, and rules r33 and r34 then decide whether each candidate is a frequent sequence. The generated candidates react with the corresponding β objects in rule r33, and if the counts of the obtained 1-sequences, 2-sequences, and 3-sequences reach the support threshold, they are passed to the connected cells δ4 and δm+1. (5) The reaction process in the other cells is similar to the intracellular processes shown above; the reactions are executed in parallel in a fixed order, yielding frequent sequences that are passed on to the neighboring cells. (6) There is no evolution rule in the last output cell δm+1, which stores all the computed frequent sequences internally.

3.2 Algorithm Flow

The TP-AprioriAll algorithm executes the intracellular rules in parallel according to the parallelism principle of the tissue-like P system; its overall flow is as follows:
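The original listing is given as a figure; as a stand-in, the following is a hedged, sequential Python sketch of the flow just described (assumptions: support is counted per record, and candidates grow only by appending a single-item element, a simplification of the full candidate generation; helper names are illustrative and the maximal parallelism of the P system is not simulated).

```python
def support(db, seq):
    """Count the records of db that contain `seq` (a tuple of frozenset itemsets) in order."""
    def contains(record):
        pos = 0
        for itemset in seq:
            while pos < len(record) and not itemset <= record[pos]:
                pos += 1
            if pos == len(record):
                return False
            pos += 1
        return True
    return sum(contains(r) for r in db)

def tp_apriori_flow(db, k):
    """Level-wise mining sketch: cell delta_h corresponds to candidates of length h."""
    items = {i for record in db for itemset in record for i in itemset}
    singles = [( frozenset({i}), ) for i in items]
    frequent1 = [seq for seq in singles if support(db, seq) >= k]   # role of delta_1
    result, level = list(frequent1), frequent1
    while level:                                                    # roles of delta_2 ... delta_m
        candidates = {a + b for a in level for b in frequent1}
        level = [c for c in candidates if support(db, c) >= k]
        result.extend(level)
    return result

# Toy usage on two short records with threshold k = 2.
db = [(frozenset({1, 2}), frozenset({3})), (frozenset({1}), frozenset({2, 3}))]
for seq in tp_apriori_flow(db, k=2):
    print([sorted(s) for s in seq])
```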



3.3 Complexity Analysis The time complexity of the TP-AprioriAll algorithm proposed in this paper will be analyzed in the worst case. As we mentioned before, in the initial case, there is 1 computational step in cell δ0 , i.e., copying all the objects in it and passing them in parallel to the other cells connected to it. In cell δ1 , we directly find the frequent 1-sequence for the original objects without the step of generating candidate sequences. The process



of generating frequent sequences in cells δ2, . . . , δm is carried out in four computational steps: generating the objects βj^k carrying the support threshold, generating candidate sequences, filtering out infrequent sequences, and passing the frequent sequences to the connected cells. Thanks to the parallelism of the tissue-like P system, the database does not have to be scanned repeatedly, which greatly improves the computational speed. In summary, the time complexity of the TP-AprioriAll algorithm is 1 + 3 + 4(m − 1) = 4m steps, which we write as O(m). Here O is the common notation for time complexity, and it differs from the alphabet O used in the definition of the TP-AprioriAll P system earlier in this paper. In Table 1, the proposed algorithm is compared with some baseline methods, where m is the number of frequent itemsets, L is the length of the sequences, and n is the size of the data; its time complexity is significantly better than those of the other algorithms.

Table 1. Time complexities of some baseline algorithms

Algorithm        Time complexity
AprioriAll [1]   O(m^2)
GSP [11]         O(m·L^2)
SPADE [12]       O(n·m·log(m))
TP-AprioriAll    O(m)

4 Case Analysis

In this section, an illustrative example is given to show how TP-AprioriAll works. Table 2 gives a sequence database containing seven records with a maximum sequence length of three and five items, that is, D = 7, C = 3, m = 5.

Table 2. Sequential transaction database for cases

SID   Sequence
T1    <(I1, I2), I3>
T2    <I1, (I2, I3), I4>
T3    <(I1, I2, I3, I4), I5>
T4    <I4, I5>
T5    <I2, (I3, I4), I5>
T6    <(I1, I3, I4)>
T7    <(I2, I3), (I4, I5)>
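Before stepping through the computation, the following sketch (illustrative; the helper name is not from the paper) shows how a record of Table 2 can be encoded into the S_ilj objects fed into cell δ0: item Ij in the i-th record and the l-th itemset of that record becomes the object S_ilj.

```python
def encode_record(i, record):
    """record: list of itemsets (lists of item indices j); returns S_{ilj} object keys."""
    return [(i, l, j) for l, itemset in enumerate(record, start=1) for j in itemset]

# T1 = <(I1, I2), I3> from Table 2 becomes S_11j1, S_11j2, S_12j3, as in step (1) below.
print(encode_record(1, [[1, 2], [3]]))   # [(1, 1, 1), (1, 1, 2), (1, 2, 3)]
```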



I1, I2, I3, I4, I5 are the 5 items contained in the database; (I1, I2) denotes items that occur in the same itemset, and in T1 the item I3 occurs after (I1, I2). Assuming a support threshold of 2, the calculation proceeds as follows: (1) Input. The database is transformed into the objects S11j1, S11j2, S12j3, S21j1, S22j2, S22j3, S23j4, S31j1, S31j2, S31j3, S31j4, S32j5, S41j4, S42j5, S51j2, S52j3, S52j4, S53j5, S61j1, S61j3, S61j4, S71j2, S71j3, S72j4, S72j5. The sequence data are input in this way so that they take a form the P system can recognize. In this example we set the support threshold to k = 2, and the transformed objects together with θ^2 are input and passed to the cells δ1, δ2, . . . , δm connected to δ0 to activate the computation. (2) Generate frequent 1-itemsets. In cell δ1, the auxiliary objects βj^2 (1 ≤ j ≤ 5) are created by rule r11 to indicate that each item j needs to appear in at least k = 2 transactions to be a frequent 1-itemset. Rule r12 is executed with maximum parallelism to traverse all candidate frequent 1-itemsets in the sequence database. Take the detection of the candidate frequent 1-itemset for I1 as an example: S11j1, S21j1, S31j1, S61j1 are contained in cell δ1, which means that the first, second, third and sixth transactions all contain I1, so the subrules S11j1 βj1 → γj1, S21j1 βj1 → γj1, S31j1 βj1 → γj1, S61j1 βj1 → γj1 all meet the conditions for execution. Since the minimum support threshold is 2, only two of these subrules can actually fire in parallel; by executing them, two βj1 are consumed and two γj1 are generated. The other items are processed in the same way, generating γj1^2, γj2^2, γj3^2, γj4^2, γj5^2. Rule r13 is then executed to process the result of rule r12. Taking j1 as an example again, rule r13 has already consumed both βj1, so the subrule {γj1^k |¬βj1 → φj1,go} passes the generated φj1 to the connected cells δ2 and δ6;

similarly, the corresponding subrules pass the frequent itemsets φj2, φj3, φj4, φj5 to cells δ2 and δ6. (3) In cell δ2, the frequent itemsets obtained in cell δ1 are combined with θ^2 to generate candidate sequences built only from frequent 1-itemsets, so non-frequent itemsets never enter the candidate sequences. Rule r21 is executed to obtain the set of sequences consisting of frequent 1-itemsets. In this example, the generated candidate 1-sequences are (I1, I2), (I1, I3), (I1, I4), (I1, I5), (I2, I3), (I2, I4), (I2, I5), (I3, I4), (I3, I5), (I4, I5), and the generated candidate 2-sequences are <I1, I2>, <I1, I3>, <I1, I4>, <I1, I5>, <I2, I3>, <I2, I4>, <I2, I5>, <I3, I4>, <I3, I5>, <I4, I5>. Rule r23 is then executed, in which the generated candidate sequences react with βj12 and βj1j2, and the generated γj12 and γj1j2 are filtered with the support threshold of 2. Only individual subrules are cited here: executing the subrule γj12^2 |¬βj12 → φj12,go generates frequent 1-sequences, and executing the subrule γj1j2^2 |¬βj1j2 → φj1j2,go generates frequent 2-sequences. Finally we obtain the frequent 1-sequences and frequent 2-sequences (I1, I2), (I1, I3), (I1, I4), (I2, I3), (I3, I4), <I1, I2>, <I1, I3>, <I2, I3>, <I2, I4>, <I2, I5>, <I3, I4>, <I3, I5>, <I4, I5>, which are input to cells δ3 and δ6. (4) In cell δ3, the frequent sequences generated by cell δ2 activate rule r31, which generates the support counts for the candidate sequences; this helps to reduce the generation of non-frequent sequences. In this example, we ultimately obtain the



frequent 1-sequence (I1, I3, I4), the frequent 2-sequences <(I2, I3), I4>, <(I2, I3), I5>, <(I3, I4), I5>, and no frequent 3-sequence is generated. The generated frequent sequences are again passed to δ4 and δ6. (5) In cell δ4, the process is similar to that described above; here no new frequent sequences are generated, so there are no objects that can be passed to the next cell, the reaction in the next cell cannot be activated, and the computation terminates. (6) All the frequent sequences obtained in the above cells are stored in cell δ6.
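As a cross-check of the case analysis, the following self-contained sketch (illustrative code, not part of the P system itself) counts the support of one of the reported frequent 2-sequences, <(I2, I3), I4>, directly in the Table 2 database.

```python
def contains(record, seq):
    """True if `seq` (a tuple of itemsets) occurs in `record` in order."""
    pos = 0
    for itemset in seq:
        while pos < len(record) and not itemset <= record[pos]:
            pos += 1
        if pos == len(record):
            return False
        pos += 1
    return True

# Table 2 database, items encoded by their indices 1..5.
db = [
    [{1, 2}, {3}],            # T1
    [{1}, {2, 3}, {4}],       # T2
    [{1, 2, 3, 4}, {5}],      # T3
    [{4}, {5}],               # T4
    [{2}, {3, 4}, {5}],       # T5
    [{1, 3, 4}],              # T6
    [{2, 3}, {4, 5}],         # T7
]
seq = ({2, 3}, {4})            # the candidate <(I2, I3), I4>
support = sum(contains(r, seq) for r in db)
print(support, support >= 2)   # supported by T2 and T7, so it is frequent for k = 2
```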

5 Experimental Results

In this paper, we use a UCI dataset [13] to verify the process and correctness of the algorithm. More than 1000 records are extracted from the dataset, of which only 3 attributes with identifiers are used to build the sequence database we need. The extracted sample contains a total of 17 items, and the support threshold is set to k = 15. The resulting numbers of frequent sequences are reported in Table 3; the largest frequent sequence is a 4-sequence, and there is only one of them. The experiments further demonstrate the feasibility of the algorithm.

Table 3. Results of frequent sequences mining based on TP-AprioriAll algorithm

Size of frequent sequence   Number
Seq-1                       18
Seq-2                       95
Seq-3                       42
Seq-4                       1

6 Conclusion

In this paper, we propose TP-AprioriAll, an improved mining algorithm for frequent sequential patterns that employs the parallel mechanism of a tissue-like P system. Compared with other parallel improvements of Apriori-like algorithms, the time complexity of TP-AprioriAll is significantly reduced. The results of this paper offer some suggestions for improving traditional algorithms using the parallel mechanisms of membrane computing models. In further research, it would be worthwhile to improve sequence pattern mining algorithms using the currently popular spiking neural P systems.



References

1. Agrawal, R., Srikant, R.: Mining sequential patterns. In: Proceedings of the Eleventh International Conference on Data Engineering (1995)
2. Zhang, G.X., et al.: Evolutionary membrane computing: a comprehensive survey and new results. Inf. Sci. 279, 528–551 (2014)
3. Liu, X., Zhao, Y., Sun, M.: An improved Apriori algorithm based on an evolution-communication tissue-like P system with promoters and inhibitors. Discret. Dyn. Nat. Soc. 2017, 1–11 (2017)
4. Song, B., et al.: Tissue P systems with states in cells. IEEE Trans. Comput. 1–12 (2023)
5. Luo, Y., Zhao, Y., Chen, C.: Homeostasis tissue-like P systems. IEEE Trans. Nanobiosci. 20(1), 126–136 (2021)
6. Zhang, G.: Membrane computing. Int. J. Parallel Emergent Distrib. Syst. 36, 1–2 (2019)
7. Song, B., Li, K., Zeng, X.: Monodirectional evolutional symport tissue P systems with promoters and cell division. IEEE Trans. Parallel Distrib. Syst. 33(2), 332–342 (2022)
8. Song, B., Pan, L.: The computational power of tissue-like P systems with promoters. Theoret. Comput. Sci. 641, 43–52 (2016)
9. Song, B., et al.: Monodirectional tissue P systems with promoters. IEEE Trans. Cybernet. 51(1), 438–450 (2021)
10. Wang, L., et al.: An extended tissue-like P system based on membrane systems and quantum-behaved particle swarm optimization for image segmentation. Processes 10(2), 32 (2022)
11. Apers, P., Bouzeghoub, M., Gardarin, G. (eds.): EDBT 1996. LNCS, vol. 1057. Springer, Heidelberg (1996). https://doi.org/10.1007/BFb0014139
12. Zaki, M.J.: SPADE: an efficient algorithm for mining frequent sequences. Mach. Learn. 42(1), 31–60 (2001)
13. Chen, D.: Online Retail II. UCI Machine Learning Repository (2019). https://doi.org/10.24432/C5CG6D

Joint Spatiotemporal Collaborative Relationship Network for Skeleton-Based Action Recognition

Hao Lu and Tingwei Wang(B)

Shandong Provincial Key Laboratory of Network Based Intelligent Computing, School of Information Science and Engineering, University of Jinan, Jinan 250022, China
[email protected]

Abstract. Skeleton-based human action recognition has attracted great interest in recent years, as skeleton data have been demonstrated to be robust to illumination changes, body scales, dynamic camera views, and complex backgrounds. Nevertheless, efficiently modeling the relationships between non-adjacent nodes in 3D skeleton space-time is still an open problem. In this work, we propose a Joint Spatiotemporal Collaborative Relationship (JSCR) network, which utilizes the query and key vectors of each node to construct the relationship feature matrix between nodes and realizes the capture and reorganization of node features in the space-time dimension. In our JSCR model, a Spatial Collaborative Relation Module (SCRM) is used to model the relationships between non-adjacent nodes within a frame, and a Temporal Collaborative Relation Module (TCRM) is used to understand the interactions between non-adjacent nodes across multiple frames. The two are combined in a two-stream network which outperforms state-of-the-art models using the same input data on both NTU-RGB + D and Kinetics-400.

Keywords: Deep Learning · Graph Convolutional Network · Skeleton Action Recognition · Attention Mechanism · Joint Spatiotemporal Collaborative Relationship

1 Introduction

Human action recognition has a wide range of application scenarios, such as human-computer interaction and video retrieval [1, 2]. In recent years, the recognition of actions based on the human skeleton has garnered increasing attention [3–5]. The skeleton, which consists of a well-organized set of data with each joint of the human body identified by a joint type, a frame index, and a 3D position, has several advantages for action recognition. First, the skeleton provides a high-level representation of the human body that abstracts human pose and motion. Biologically, humans are capable of recognizing action categories by observing only the motion of joints, even without appearance information [6]. Second, advances in cost-effective depth cameras [7] and pose estimation technology [8, 9] have made it easier to access the skeleton. Third, compared to RGB video, the skeleton representation is robust to variations in viewpoint and appearance. Fourth, the representation is computationally efficient due to its low-dimensional nature. Moreover, skeleton-based action recognition is complementary to RGB-based action recognition [10]. The present work focuses on skeleton-based action recognition.



Graph convolutional networks (GCNs) have gained widespread adoption in the field of skeleton-based action recognition [11–14], where they are used to model human skeleton sequences as spatiotemporal graphs. Among GCN-based approaches, ST-GCN [14] is a well-known baseline that combines spatial graph convolutions with interleaving temporal convolutions to enable effective spatiotemporal modeling. To further improve upon this baseline, researchers have explored techniques such as adjacency powering for multiscale modeling [15, 16] and self-attention mechanisms to enhance modeling capacity [17, 18]. However, despite the success of GCNs in skeleton-based action recognition, they have limitations in modeling relationships between non-adjacent nodes in space-time. To overcome these limitations, this paper proposes the Joint Spatiotemporal Collaborative Relationship network. The proposed JSCR leverages the query and key feature vectors of the self-attention mechanism to construct the relationships between non-adjacent nodes within a frame and between non-adjacent skeleton nodes in non-adjacent frames. Finally, the key features are weighted using the value feature vectors to construct a flexible, dynamic and learnable adjacency matrix. The main contributions of this paper can be summarized as follows:
• We propose a novel model based on the joint spatiotemporal collaborative relationship. JSCR uses the query and key vectors of each node to construct the relational feature matrix between nodes, and realizes the capture and reorganization of node features in the spatiotemporal dimension.
• In the spatial dimension, we design a spatial collaborative relation module to model the relationships between non-adjacent nodes within a frame and to better represent the relationships between the various parts of the human body. In the temporal dimension, we design a temporal collaborative relation module for understanding interactions between non-adjacent nodes across multiple frames.
• Experimental results on large datasets (NTU-RGB + D60, NTU-RGB + D120, and Kinetics-400) show that the accuracy of the proposed method substantially outperforms baselines as well as state-of-the-art methods using the same input data, proving the validity of the model.

2 Method

The overall pipeline of our algorithm is shown in Fig. 1, and its core is the joint spatiotemporal collaborative relationship block. The processing can be divided into a spatial processing flow and a temporal processing flow. The input features are fed into the two parts of JSCR, the spatial collaborative relationship module (SCRM) and the temporal collaborative relationship module (TCRM), which perform joint correlation operations using the Query (q) and Key (k) vectors. The SCRM is connected to a temporal convolution (TCN) layer, and the TCRM is connected to a spatial graph convolution (GCN) layer, where the convolution kernel size is 1 × 1. We then further apply feature weighting in the spatial and temporal dimensions using the Value (v) features. In this way, the spatiotemporal collaborative relations of the skeleton sequence can be modeled, and the model can generate spatiotemporal attention weighting, as shown in Fig. 2. Below, we detail our approach in the order of the spatial and temporal processing streams.

Fig. 1. The overall network structure.

Fig. 2. The two processing flows of JSCR.

2.1 Spatial Stream

The movement of the human body is produced by cooperative relationships between the skeleton nodes. For example, the movement of "running" is produced by a cooperative relationship between the hands and feet, as shown in Fig. 3. Therefore, we need to focus on modeling cooperative relationships between non-adjacent vertices. In the spatial processing stream, the collaborative relationship of the individual vertices within each frame is modeled, and a weighted adjacency matrix is obtained to weight each vertex. This enables spatial contextual co-relation modeling with attention on the key skeleton joints within each frame. First, each vertex feature in the part-level graph sequence X^P is multiplied with three different weight matrices to obtain the corresponding Query (q), Key (k) and Value (v) features. Taking the t-th frame as an example, the calculation process is as follows:

q_i^t = X_{i,t}^P · Wq (i = 1, 2, ..., Np),  (1)

k_i^t = X_{i,t}^P · Wk (i = 1, 2, ..., Np),  (2)

v_i^t = X_{i,t}^P · Wv (i = 1, 2, ..., Np),  (3)

where X_{i,t}^P ∈ R^d represents the feature vector of the i-th vertex at the t-th frame in the part-level graph, and Wq ∈ R^{d×dq}, Wk ∈ R^{d×dk} and Wv ∈ R^{d×dv} are the weight matrices. The general self-attention mechanism scores the q and k features of different embeddings by dot multiplication and thereby calculates the relation between feature embeddings. Each score is then divided by the square root of the dimension of the k feature, √dk, and encoded with Softmax to obtain the contribution of the features at each position to the current embedding. After that, the self-attention of each embedding is calculated by multiplying with the V feature matrix. Taking the i-th vertex as an example, the calculation process is given by Eq. (4):

h_i^t = [(q_i^t)^T k_1^t, (q_i^t)^T k_2^t, ..., (q_i^t)^T k_{Np}^t] · V,  (4)

where h_i^t represents the self-attention output of the i-th vertex and V is the matrix containing the v features of all vertices.

Fig. 3. The skeleton graph about “running”.
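As a reference point for the SCRM described next, the following NumPy sketch implements the standard projections and per-vertex self-attention of Eqs. (1)–(4) with illustrative shapes; the random weights stand in for the learned Wq, Wk, Wv, and the variable names are not from the paper.

```python
import numpy as np

np.random.seed(0)
Np, d, dk = 11, 16, 8          # vertices in the part-level graph, feature dims (illustrative)
X = np.random.randn(Np, d)     # X^P_t: features of all vertices at frame t
Wq, Wk, Wv = np.random.randn(3, d, dk)

Q, K, V = X @ Wq, X @ Wk, X @ Wv          # Eqs. (1)-(3): q_i, k_i, v_i for every vertex
scores = Q @ K.T / np.sqrt(dk)            # scaled (q_i^T k_j) for all vertex pairs
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)   # softmax over the key dimension
H = attn @ V                              # Eq. (4)-style self-attention output per vertex
print(H.shape)                            # (Np, dk)
```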

In this paper, we propose the SCRM to implement cross-vertex correlation computation between multiple vertices. The q features of the vertices are multiplied with each other to obtain the matrix Qr ∈ R^{Np×Np} about the Query. Similarly, the same operation is performed on the k features to obtain the matrix Kr ∈ R^{Np×Np} about the Key, which can be expressed as follows:

Qr = [q_1^t, q_2^t, ..., q_{Np}^t]^T · [q_1^t, q_2^t, ..., q_{Np}^t],  (5)

Kr = [k_1^t, k_2^t, ..., k_{Np}^t]^T · [k_1^t, k_2^t, ..., k_{Np}^t].  (6)

The dot multiplication can be viewed as computing the cosine similarity between two vectors in the feature space, as in (q_1^t)^T · q_2^t = ‖q_1^t‖ ‖q_2^t‖ cos θ, where cos θ is the cosine of the angle between the two vectors. The more similar q_1^t and q_2^t are, the smaller the angle between them and the larger cos θ, and vice versa. Thus each element of Qr and Kr encodes the relationship between a pair of vertices. The matrix R is then obtained via a dot multiplication between Qr and Kr, which can be expressed as Eq. (7):

R = [(q_i^t)^T q_j^t]_{Np×Np} · [(k_i^t)^T k_j^t]_{Np×Np}.  (7)

Equation (7) shows that the elements of R are obtained by multiplying the pairwise relationships of the q features and the k features; it can be considered an interactive fusion of the correlation matrix on the Query and the correlation matrix on the Key. If the Query matrix is directly dotted with the Key matrix, the elements of the resulting matrix represent the correlations between pairs of vertices (Fig. 4). Similar to the general self-attention mechanism, we divide the matrix R by √dk and normalize it with Softmax. The normalized matrix is then dotted with the feature matrix V. The overall process is shown in Fig. 4. Let Hs denote the self-attention output; the above process can be expressed via Eq. (8):

Hs = softmax(Qr Kr / √dk) V.  (8)

Fig. 4. Spatial self-attention calculation process.
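The following NumPy sketch follows Eqs. (5)–(8) with illustrative shapes (random features stand in for the projected q, k, v of one frame): the Query and Key correlation matrices Qr and Kr are built from pairwise dot products, fused into R, scaled, softmax-normalized, and applied to V.

```python
import numpy as np

np.random.seed(1)
Np, dk = 11, 8
Q = np.random.randn(Np, dk)               # q_i^t of all vertices in one frame
K = np.random.randn(Np, dk)
V = np.random.randn(Np, dk)

Qr = Q @ Q.T                              # Eq. (5): pairwise q-q correlations, Np x Np
Kr = K @ K.T                              # Eq. (6): pairwise k-k correlations
R = Qr @ Kr                               # Eq. (7): fusion of the two correlation matrices

w = R / np.sqrt(dk)                       # Eq. (8): scale, ...
w = np.exp(w - w.max(axis=1, keepdims=True))
w /= w.sum(axis=1, keepdims=True)         # ... row-wise softmax, ...
Hs = w @ V                                # ... and weight the value features
print(Hs.shape)                           # (Np, dk)
```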

In addition, since the v feature represents the main content of the embedding itself, it can be used to model the hidden connection relations between individual vertices. First, we multiply the v features of all vertices with each other to obtain the correlation matrix Vr = [v_1^t, v_2^t, ..., v_{Np}^t]^T · [v_1^t, v_2^t, ..., v_{Np}^t] ∈ R^{Np×Np}, and encode it with Softmax. The resulting matrix contains the correlation weights between the vertices. If we regard the elements of Vr as the weights of edges on the graph, a hidden connection is established between the vertices. The encoded matrix Vr is then mapped back to the joint-level feature space by graph unpooling and summed with the adjacency matrix A to obtain the weighted adjacency matrix, which is used to weight each vertex. The adjacency matrix represents the connection relationships between vertices, so after weighting there is a hidden edge connecting each pair of vertices, as shown in Fig. 5; for simplicity and clarity, only the connections between nodes are shown in the diagram. Through the hidden edges, a larger receptive field is provided to the GCN as a way to model spatial contextual connections across vertices. Moreover, the weights can be trained so that the model pays attention to the joint regions useful for action recognition, enabling the spatial attention of the model.



Fig. 5. Hidden connection relations between vertices.
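A short sketch of the hidden-connection weighting described above (illustrative; the stand-in adjacency matrix is random and the graph-unpooling step that maps part-level weights back to joint level is omitted): the value-feature correlation Vr is softmax-encoded and added to the adjacency matrix.

```python
import numpy as np

np.random.seed(2)
Np, dv = 11, 8
V = np.random.randn(Np, dv)                       # v features of the vertices
A = (np.random.rand(Np, Np) < 0.2).astype(float)  # stand-in skeleton adjacency matrix
A = np.maximum(A, A.T)                            # keep it symmetric

Vr = V @ V.T                                      # pairwise v-v correlations
Vr = np.exp(Vr - Vr.max(axis=1, keepdims=True))
Vr /= Vr.sum(axis=1, keepdims=True)               # softmax-encoded hidden edge weights

A_weighted = A + Vr                               # weighted adjacency with hidden edges
print(A_weighted.shape)
```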

2.2 Temporal Stream

In the temporal processing stream, we implement cross-vertex collaborative relationship modeling for multiple vertices. This requires collaborative relationships not only among the vertices within each frame, but also among non-adjacent vertices in non-adjacent frames. The reason is that, in an action video, the positions of the human joints in subsequent frames evolve from their positions in the previous frames, so the joints in different frames are related through the variation of the motion space, as shown in Fig. 6. Finally, a temporal attention score is calculated to weight each frame in the sequence, giving the model temporal attention (Fig. 7).

Fig. 6. The skeleton change of "running".

Fig. 7. Correlation projection vector calculation process.

To achieve the above goals, we propose the TCRM. Specifically, after passing through the GCN, the feature matrices of the part-level graph Q, K and V can be calculated for



each frame. Then, the relationship matrices about the Query and the Key are calculated for every frame according to Eqs. (5) and (6). The input skeleton sequence has a total of T frames; thus, the relationship matrices can be represented as QR = {Qr,1, Qr,2, . . . , Qr,T} and KR = {Kr,1, Kr,2, . . . , Kr,T}. Then, pairwise operations are performed on the correlation matrices in QR and KR to obtain the correlation projection vectors, as shown in Fig. 7. First, the correlation matrices Qr,i and Kr,j are given according to Eqs. (5) and (6), where i, j = 1, 2, ..., T. The two matrices are dot-multiplied to obtain the matrix Ri,j, and the elements of each row are accumulated along the Y-axis to obtain the vector Pi,j. Each element of Pi,j accumulates the correlation scores between the current vertex and the others, so it can be regarded as the correlation projection of the vertices. When all pairs of matrices in QR and KR have been subjected to this operation, a multi-frame cross-vertex correlation matrix is obtained, as shown in Eq. (9):

F(QR, KR) = [F(Qr,i, Kr,j)], i, j = 1, . . . , T, F(QR, KR) ∈ R^{T×T×N},  (9)

where T and N denote the number of frames and nodes, respectively. F(QR, KR) is a multi-channel matrix, in which the i-th channel represents the correlation matrix of the i-th vertex in the sequence. Then, similarly to the way the general self-attention mechanism obtains an attention output, the attention output in the temporal dimension is computed as shown in Fig. 8. The operation is implemented per channel: the matrix of each channel is taken out individually, divided by √dk and normalized with Softmax. The encoded matrix is then multiplied by a feature matrix containing the v features of the corresponding vertex in each frame of the sequence. In this way, feature reorganization is achieved for the v features of each vertex in the sequence. The above operation can be represented as Eq. (10):

H_{T,i} = softmax(F_i(QR, KR) / √dk) V_i, i = 1, 2, ..., Np,  (10)

where F_i(QR, KR)/√dk is the scaled correlation matrix of the i-th vertex according to Eq. (9), and V_i is the feature matrix containing the v features of vertex i in each frame.

Fig. 8. Temporal self-attention computation flow.
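A NumPy sketch of the multi-frame correlation tensor and the per-channel temporal attention of Eqs. (9) and (10) (illustrative shapes and variable names; the per-frame Qr,t and Kr,t are built as in Eqs. (5) and (6)).

```python
import numpy as np

np.random.seed(3)
T, Np, dk = 4, 11, 8                               # frames, vertices, feature dim (illustrative)
Q = np.random.randn(T, Np, dk)                     # q features per frame and vertex
K = np.random.randn(T, Np, dk)
V = np.random.randn(T, Np, dk)

QR = np.einsum('tid,tjd->tij', Q, Q)               # Qr,t for every frame, as in Eq. (5)
KR = np.einsum('tid,tjd->tij', K, K)               # Kr,t for every frame, as in Eq. (6)

R = np.einsum('iab,jbc->ijac', QR, KR)             # Qr,i dot Kr,j for every frame pair
F = R.sum(axis=-1)                                 # Eq. (9): correlation projections, (T, T, Np)

S = F / np.sqrt(dk)                                # Eq. (10), per vertex channel:
S = np.exp(S - S.max(axis=1, keepdims=True))
S = S / S.sum(axis=1, keepdims=True)               # softmax over the frame dimension
HT = np.einsum('stn,tnd->snd', S, V)               # reweighted v features of each vertex
print(HT.shape)                                    # (T, Np, dk)
```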



Fig. 9. After passing VR through the fully connected layer, an attention vector is obtained.

On the other hand, we accumulate the v features of all vertices in each frame to obtain the average value feature vector of the current frame. Then, the average feature vectors of all frames are multiplied in pairs to obtain the correlation matrix of the whole sequence with respect to the value, as shown in Eq. (11):

VR = [V̄^1, V̄^2, ..., V̄^T]^T · [V̄^1, V̄^2, ..., V̄^T] ∈ R^{T×T},  (11)

where V̄^t ∈ R^{dv} (t = 1, 2, ..., T) represents the average value feature vector of the skeleton graph at frame t, calculated as V̄^t = (1/N) Σ_{i=1}^{N} v_i^t, with v_i^t the v feature of the i-th vertex at frame t. Then, VR is fed into a fully connected layer to obtain an attention score for each frame. We use this score to weight the joint region features of each frame, so that the model achieves temporal attention over the human joint sequence, as shown in Fig. 9. In this way, it is possible to locate the key motion parts of key frames in the sequence and filter out the key motion information.
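Continuing the sketch, Eq. (11) averages the v features within each frame and correlates the frames; a fully connected layer (here a random matrix as a stand-in for the learned layer) turns VR into one attention score per frame, which then weights the frames of the sequence.

```python
import numpy as np

np.random.seed(4)
T, Np, dv = 4, 11, 8
V = np.random.randn(T, Np, dv)             # v features of every vertex in every frame
X = np.random.randn(T, Np, dv)             # frame-wise joint features to be weighted

Vbar = V.mean(axis=1)                      # average value vector per frame, (T, dv)
VR = Vbar @ Vbar.T                         # Eq. (11): frame-to-frame correlation, (T, T)

W_fc = np.random.randn(T, 1)               # stand-in for the fully connected layer
scores = VR @ W_fc                         # one raw score per frame
alpha = np.exp(scores - scores.max())
alpha = alpha / alpha.sum()                # softmax over frames -> temporal attention
X_weighted = X * alpha[:, :, None]         # weight the joint features of each frame
print(alpha.ravel(), X_weighted.shape)
```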

3 Experiments

In this section, we conduct a series of experiments to demonstrate the effectiveness of the proposed method, including ablation experiments on the proposed JSCR as well as comparative experiments with state-of-the-art methods. All experiments were conducted on the NTU-RGB + D60, NTU-RGB + D120 and Kinetics-400 datasets. The configuration and details of the experiments are described below.

3.1 Ablation Experiment

In this subsection, we conduct ablation experiments on the joint spatiotemporal collaborative relationship modeling. The following different implementations are carried



out here separately. The first is to replace the spatial graph convolution in ST-GCN with the SCRM for training; the second is to replace the temporal convolution with the TCRM; the last is to ensemble the results of the above two implementations. Table 1 shows the performance of the ablation experiments on NTU-RGB + D60 and 120. It can be seen from rows 1–3 that the accuracy on X-Sub and X-View of NTU-60 increases by 3.9%∼4.1% and 5.3%∼5.6% when the SCRM or the TCRM completely replaces the corresponding block in the backbone (ST-GCN). Combining the models with SCRM and TCRM further improves the accuracy by 1.2%∼1.3% and 0.5%∼0.7% on X-Sub and X-View of NTU-60, and by 1.2%∼1.6% and 0.4%∼0.7% on X-Sub and X-Set of NTU-120, respectively. This suggests that combining the two branches of JSCR better models the spatiotemporal collaborative relationships of the skeleton sequence. Rows 5–7 represent keeping the first three layers of the backbone block and replacing the last 6 layers with our method. When the basic layers of the backbone are retained, the accuracy improves by a further 1.9%∼2.3% and 1.2%∼1.9% on X-Sub and X-View (X-Set). The reason is that the backbone extracts high-level semantic joint features, which is more conducive to the subsequent feature relationship modeling.

Table 1. The results of the ablation experiment for JSCR on the NTU-RGB + D 60 and 120 datasets.

SCRM  TCRM  bone   NTU-60 X-Sub  NTU-60 X-View  NTU-120 X-Sub  NTU-120 X-Set
             √     81.5          88.3           -              -
√                  85.6          93.8           80.2           83.8
      √            85.4          93.6           79.8           83.5
√     √            86.8          94.3           81.4           84.2
√            √     88.3          95.3           82.6           84.9
      √      √     88.4          95.5           83.4           85.3
√     √      √     88.7          96.1           84.1           86.4

Similarly, we use the same implementations to conduct experiments on Kinetics-400, as shown in Table 2. It can be seen that our methods bring a significant improvement in Top-1 and Top-5 accuracy over the backbone, and when the two branches of JSCR (SCRM and TCRM) are combined with the backbone, Top-1 and Top-5 achieve the highest accuracy, i.e., 36.8% and 59.7%. The above experiments therefore show that the JSCR is effective: it makes the spatiotemporal skeleton features more robust by modeling the multi-vertex spatiotemporal collaborative relationships between joints.

3.2 Comparative Experiment

In this subsection, we compare the proposed method with state-of-the-art methods on the NTU-RGB + D60, NTU-RGB + D120 and Kinetics-400 datasets; the experimental results are shown in Table 3.


Table 2. The results of the ablation experiment for JSCR on the Kinetics-400 dataset.

SCRM  TCRM  bone   Top-1  Top-5
             √     30.7   52.8
√                  35.2   57.5
      √            35.0   57.2
√     √            36.1   58.6
√            √     35.9   58.2
      √      √     35.5   57.9
√     √      √     36.8   59.7

Table 3. Comparative test results on mainstream datasets.

NTU-RGB + D60                       NTU-RGB + D120                      Kinetics-400
Method            X-Sub  X-View    Method            X-Sub  X-Set     Method          Top-1  Top-5
VA-LSTM           79.2   87.7      PA-LSTM           25.5   26.3      PA-LSTM         16.4   35.3
TCN               74.3   83.1      ST-LSTM           55.7   57.9      TCN             20.3   40.0
ST-GCN            81.5   88.3      GCA-LSTM          58.3   59.2      ST-GCN          30.7   52.8
Motif + VTDB      84.2   90.2      Synthesized CNN   60.3   63.2      AS-GCN          34.8   56.5
2s AS-GCN         86.8   94.2      CNN + MTLN        62.2   61.8      2sAdaptGCN      36.1   58.7
2sAdaptGCN        88.5   95.1      ST-GCN            70.7   73.2      sLnL            36.6   59.1
1s Shift-GCN      87.8   95.1      AS-GCN            77.9   78.5      CA-GCN          34.1   56.6
Js SEFN(Att)      86.9   94.1      AMCGC-LSTM        79.7   80.0      SAN             35.1   55.7
Js Sym-GNN        87.1   93.8      STF-GCN           76.9   79.1      Sym-GNN         36.4   57.4
Js Graph2Net      88.1   95.2      LAGA-Net          81.0   82.2      Bs DualHead     35.7   58.7
LAGA-Net          87.1   93.2      RA-CGN            81.1   82.7      KA-AGTN         36.1   58.8
KA-AGTN           88.3   94.3      KA-AGTN           84.1   86.2      Pe-GCN          34.0   54.2
1s JSCR-S (Our)   88.3   95.3      1s JSCR-S (Our)   82.6   84.9      2s JSCR (Our)   36.8   59.7
1s JSCR-T (Our)   88.4   95.5      1s JSCR-T (Our)   83.4   85.3      -               -      -
2s JSCR (Our)     88.7   96.1      2s JSCR (Our)     84.1   86.8      -               -      -

The proposed method can be divided into two processing streams, and we compare the different processing streams with other methods. The 1s JSCR-S and 1s JSCR-T in the table represent the models with the spatial and the temporal processing stream, respectively, and 2s JSCR is the model that ensembles the two processing streams. From our results on NTU-60, it can be seen that early methods used LSTMs to model the skeleton sequence, or alternatively a TCN with a 1 × 1 convolution kernel for the temporal modeling of individual joints; however, such methods lack attention to the topology of the human skeleton



graph. ST-GCN enables the spatiotemporal modeling of skeleton sequences, and its accuracy is 2.3%∼7.2% and 0.6%∼5.2% higher on X-Sub and X-View. However, ST-GCN is unable to model relationships across non-adjacent joints, and the accuracy of our method is 7.2% and 7.8% higher than ST-GCN. Moreover, our method is also compared with other methods that model cross-joint relationships. Among them, AS-GCN proposed Actional Links and Structural Links to explore the dependencies between joints, and Shift-GCN models the relationships between joints by shifting the elements of different joint features towards each other. On X-Sub and X-View, the accuracy of JSCR is 1.1%∼2.0% and 0.9%∼1.9% higher. In addition, several other attention-based methods are taken as baselines, such as Adaptive GCN and LAGA-Net, and the accuracy of JSCR outperforms these methods by 0.1%∼1.7% and 1.0%∼2.5%. Furthermore, we compare our method with KA-AGTN, which also utilizes the transformer, and our method outperforms KA-AGTN by 0.4% and 1.8% in accuracy. On NTU-120, the accuracy of 2s JSCR is 13.8% and 13.6% higher than ST-GCN on X-Sub and X-Set, and compared with AMCGC-LSTM, which combines GCN and LSTM, the accuracy of 2s JSCR improves by 4.4% and 6.8%, respectively. Finally, we analyze the results of the comparative experiments on Kinetics-400. The accuracy of the JSCR ensemble of the two processing streams reaches 36.8% and 59.7% on Top-1 and Top-5, which outperforms the state-of-the-art methods; note that KA-AGTN and Pe-GCN are the newest of these methods. Based on the above experiments, the proposed method better models the skeleton sequence features, indicating that JSCR can focus on the collaborative contextual relationships between multiple joints and obtain robust feature representations. Therefore, our method is effective and superior.

4 Conclusion

In this paper, a joint spatiotemporal collaborative relationship network is proposed to perform the skeleton-based action recognition task. The network uses the query and key vectors of each node to construct the relationship feature matrix between nodes, realizing the capture and recombination of node features in the space-time dimension. In JSCR, we first propose spatial collaborative relationship modeling to enhance the correlation between non-adjacent skeleton nodes in space, and then propose temporal collaborative relationship modeling to enhance the correlation between non-adjacent skeleton nodes in non-adjacent frames. Finally, extensive experiments on three benchmark datasets, NTU-RGB + D60, NTU-RGB + D120, and Kinetics-400, validate the superior performance of the proposed JSCR and provide state-of-the-art results.

Acknowledgement. This work was supported by the Natural Science Foundation of Shandong Province (NSFSP) under grant ZR2020MF137.



References

1. Aggarwal, J.K.: Human activity analysis: a review. ACM Comput. Surv. 43(3), 1–43 (2011)
2. Poppe, R.: A survey on vision-based human action recognition. Image Vis. Comput. 28(6), 976–990 (2010)
3. Yong, D., Wei, W., Liang, W.: Hierarchical recurrent neural network for skeleton based action recognition. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1110–1118. CVPR, Boston (2015)
4. Shahroudy, A., Liu, J., Ng, T.T., et al.: NTU RGB+D: a large scale dataset for 3D human activity analysis. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1010–1019. CVPR, Las Vegas (2016)
5. Kiwon, Y., Jean, H., Debaleena, C., et al.: Two-person interaction detection using body-pose features and multiple instance learning. In: 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 28–35. CVPRW, Providence (2012)
6. Gunnar, J.: Visual perception of biological motion and a model for its analysis. Percept. Psychophys. 14, 201–211 (1973)
7. Zhengyou, Z.: Microsoft Kinect sensor and its effect. IEEE Multimedia 19(2), 4–10 (2012)
8. Zhe, C., Tomas, S., Shih-En, W., et al.: Realtime multi-person 2D pose estimation using part affinity fields. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1302–1310. CVPR, Honolulu (2017)
9. Shotton, J., Fitzgibbon, A., Cook, M., et al.: Real-time human pose recognition in parts from single depth images. In: CVPR 2011, pp. 1297–1304. IEEE, Colorado Springs (2011)
10. Sijie, S., Cuiling, L., Junliang, X., et al.: Skeleton-indexed deep multi-modal feature learning for high performance human action recognition. In: 2018 IEEE International Conference on Multimedia and Expo, pp. 1–6. ICME, San Diego (2018)
11. Jinmiao, C., Nianjuan, J., Xiaoguang, H., et al.: JOLO-GCN: mining joint-centered light-weight information for skeleton-based action recognition. In: 2021 IEEE Winter Conference on Applications of Computer Vision, pp. 2734–2743. WACV, Waikoloa (2021)
12. Yuxin, C., Ziqi, Z., Chunfeng, Y., et al.: Channel-wise topology refinement graph convolution for skeleton-based action recognition. In: 2021 IEEE/CVF International Conference on Computer Vision, pp. 13359–13368. ICCV, Montreal (2021)
13. Pranay, G., Anirudh, T., Aditya, A., et al.: Quo vadis, skeleton action recognition? Int. J. Comput. Vision 129(7), 2097–2112 (2021)
14. Sijie, Y., Yuanjun, X., Dahua, L.: Spatial temporal graph convolutional networks for skeleton-based action recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence. AAAI, Washington, DC (2018)
15. Maosen, L., Siheng, C., Xu, C., et al.: Actional-structural graph convolutional networks for skeleton-based action recognition. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3590–3598. CVPR, Long Beach (2019)
16. Ziyu, L., Hongwen, Z., Zhenghao, C., et al.: Disentangling and unifying graph convolutions for skeleton-based action recognition. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 140–149. CVPR, Seattle (2020)
17. Bin, L., Xi, L., Zhongfei, Z., et al.: Spatio-temporal graph routing for skeleton-based action recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 8561–8568. AAAI, Washington, DC (2019)
18. Lei, S., Yifan, Z., Jian, C., et al.: Two-stream adaptive graph convolutional networks for skeleton-based action recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12026–12035. CVPR, Long Beach (2019)

A Digital Human System with Realistic Facial Expressions for Friendly Human-Machine Interaction

Anthony Condegni, Weitian Wang, and Rui Li(B)

School of Computing, Montclair State University, Montclair, NJ 07043, USA
[email protected]

Abstract. Digital human technology can be applied to many areas, especially at the intersection of intelligent machines, digital 3D, and human-machine interaction. This paper develops a digital human system that enhances friendly interaction between humans and machines through synchronized and realistic facial expressions, based on tracking data from a human user. The work can be summarized in three parts: first, customize and build a 3D model for the digital human; second, create a parameterized 3D model for different facial animations; third, track the user to synchronize the facial expressions of the customized digital human. The experimental results and analysis in this paper demonstrate the effectiveness and advantages of the developed system, which can create realistic and natural synthesized facial expressions. The system can be further generalized to other intelligent computing areas, such as smart medical care, autonomous vehicles, and companion robots.

Keywords: Digital Human · Intelligent Computing · Human-Machine Interaction

1 Introduction

While it is true that digital human technology has been widely used in the entertainment industry, its potential applications extend far beyond games and social media. In fact, digital human technology can pave the way for new possibilities in a wide range of fields, especially at the intersection of intelligent machines, digital 3D and human-machine interaction. By incorporating digital human systems into intelligent machines, it is possible to create machines that interact with humans in a more natural and engaging way, which can be particularly valuable in areas like smart medical care, where intelligent machines can assist with patient care and rehabilitation. For digital human technology, facial animation synthesis is a key component: the ability to create accurate facial expressions and emotions is very important in developing a useful and friendly digital human. There are multiple techniques for simulating facial animation in digital humans, such as motion capture, keyframe animation, and procedural animation. Motion capture records a user's facial expression via computer vision or motion sensors. The captured



data is then mapped onto the face of the digital human for animation synthesis. Keyframe animation requires manually animating each facial parts, such as the eyes, mouth, and jaw for target expressions. Procedural animation is based on computer algorithms to generate facial expressions via tuning multiple parameters. Among these animation techniques, motion capture based facial animation has multiple advantages over the other two techniques. One of the main advantages of motion capture-based facial animation is that it allows for quick and lifelike animation synthesis due to the realistic facial data that are automatically captured from real humans. However, this is difficult to achieve in keyframe animation and procedural animation, which require a large amount of manual input and tuning. Until now, many works have leveraged the power of motion capture technique for facial animation synthesis [1–4]. For example, Richard et al. proposed a variational auto-encoder (VAE) based method for realistic facial expressions on a codec avatar [5]. Li et al. converted 2D captures into 3D animation based on the adaptive principal component analysis (PCA) [6]. As a linear model, the PCA method was widely used in facial tracking and animation synthesis due to its compact representation and computational efficiency. In these works, detailed and subtle facial expression synthesis has been achieved by capturing facial movement data from real humans. Leveraging the power of motion capture technique, this paper developed a digital human system that enhances friendly interaction between humans and machines through synchronized and realistic facial expressions based on tracking data from a human user. The following content of the paper is organized as follows: Sect. 2 provides a survey of state-of-the-art works closely related to this paper. Section 3 presents a detailed system overview of the proposed system. The theoretical knowledge associated with the proposed system is summarized in Sect. 4. To demonstrate the effectiveness and efficiency of the proposed system, Sect. 5 provides a detailed description of the experimental design, results, and analysis. Conclusion and future works are stated in Sect. 6.

2 Related Work The idea of a model being controlled by a real human has been researched in the past with varying outcomes. Past methods can be categorized into different categories based on their focus. Some approaches focus on facial reconstruction to create a realistic model [7–13], while others concentrate on 3D facial animation synthesis [14–19]. Another consideration is the simplicity and ease of implementation of the methods. For equipment, various past methods only require a mobile phone for facial tracking [2–4, 20], making them easy to use. As a result, some methods focus on a simple linear model, which is computationally efficient [21–26]. In recent years, the technology of facial synthesis has been successfully progressed. Liu et al. [27] and Richards et al. [5] both use a method of combining gaze and audio input with a VAE-based model by a Kinect sensor. Navarro et al. proposed a method of mapping facial synthesis onto a 3D cartoon character with no manual calibration [28]. Weise et al. focus on fast and robust mapping onto a character from video using temporal coherence [29]. Some works concentrate on creating accurate and realistic facial animation. Cao et al. synthesized facial animation using binocular video information for a regression model [14], which allows for animation on a photo-realistic avatar but



requires training for different lighting scenarios and environments. Weng et al. developed a facial animation system that can run on mobile devices using a regression method that remains accurate animation results when mapping onto the models. They also focus on training for different lighting conditions to improve robustness [20]. Li et al. use a calibration-free method for on-the-fly correctives when using facial synthesis on models. Their method transforms 2D real-time video into 3D animation using blendshape theory [6].

3 System Framework

Fig. 1. System framework. The proposed work consists of four main components: (1) personalized 3D head model reconstruction, (2) parametrization of the 3D model, (3) real-time facial tracking from a user, (4) 3D facial expression synthesis.

This paper presents a digital human system designed to enhance the human-computer interaction process by generating synchronized and realistic facial expressions based on real-time users’ facial data. As shown in Fig. 1, the system framework captures a user’s facial and head movements using a camera as the main input and displays the synchronized facial animation on a remote host computer as the output. The proposed system consists of four main parts: (1) personalized 3D static head modeling, (2) 3D facial parametrization, (3) user facial tracking, and (4) 3D facial expression synthesis. The system begins with 3D modeling of a user’s head. For this process, portrait photos of a user from five angles (front, back, left side, right side, and upward) are used as reference for 3D reconstruction of head model [30, 31]. After obtaining the 3D model of the digital human, we manually refine the textures of the model to ensure accurate and realistic results and fix any texture seams to avoid distortion. To prepare the digital human for generating specific facial movements, the 3D model is further parameterized for facial tracking and movement control. In addition, we use a mobile-based movement tracking method to capture multiple facial features from a human for animating subtle and realistic facial expressions. These captured data are then mapped onto parametric model of the digital human to drive the facial and head movements.

790

A. Condegni et al.

4 Proposed Method 4.1 3D Reconstruction of a User To create a customized 3D shape of the digital human, multiple 2D portrait photos of the user are used. These photos are captured using a mobile phone and taken in the same lighting and position conditions to ensure consistency. The 3D reconstruction process involves using portrait photos from five different angles (front, back, left side, right side, and upward) as shown in Fig. 2 (a) to (e). Figure 2 (a) to (e) show examples of these five photos that are used. From Fig. 2(a) to (e) are viewports from left, front, back, right side, and upward angles, respectively. The resulting head model reconstruction, shown in Fig. 2 (f) to (g), accurately reflects the user’s face and head shape. However, to achieve a realistic virtual representation of the user, the model is further fine-tuned by manual sculpting. This process ensures that the 3D mesh is both accurate and realistic, which is crucial for an immersive experience. The textures and texture seams of the model are also manually fixed to avoid distortion. The above preparation works result in a customized 3D static model that provides an accurate and realistic representation of the user.

Fig. 2. 3D Reconstruction of a user. (a) to (e) display the user’s portrait photos of the left side, front, back, right side, and upward views, respectively, which are used reconstruction of the 3D static head model. (f) and (g) show the created 3D static head model with and without meshes information, seperately.

4.2 Head Rotation Simulation Once the static head model is sculpted and created, it needs to be further parametrized for head movements. In 3D graphics and animation, the movements of the 3D model are generated by the movements of points on the model. The mathematical representation of these points can be expressed as a list of tuples, with each tuple representing the coordinates of a point on the model: H0 = (p0 , p1 , p2 , . . . pk . . . ., pN ),

(1)

A Digital Human System with Realistic Facial Expressions

791

where H0 is the head model mesh, pk = (xk , yk , zk , 1)T is the coordinate of the kth point for the mesh, k = 1, 2, 3, . . . N . N is the total number of points. To enable realistic head movements, the coordinates of the points need to be updated in response to the movements of the user. In the context of our research, we focus on head rotations as the primary movement to be captured and reflected in the 3D model. The head rotation can be calculated by Euler angles θ, ρ, γ for a combination of pitch, yaw, and roll rotation. Here, the right-handed coordinate system is used. Three distinct formulas are defined for these rotations respectively: ⎛ ⎞ 1 0 0 0 ⎜ 0 cos(θ ) − sin(θ ) 0 ⎟ ⎟ Rx (θ ) = ⎜ (2) ⎝ 0 sin(θ ) cos(θ ) 0 ⎠, 0 0 0 1 ⎛ ⎞ cos(ρ) 0 sin(ρ) 0 ⎜ 0 1 0 0⎟ ⎟ (3) Ry (ρ) = ⎜ ⎝ − sin(ρ) 0 cos(ρ) 0 ⎠, 0

0

0

1



⎞ cos(γ ) − sin(γ ) 0 0 ⎜ sin(γ ) cos(γ ) 0 0 ⎟ ⎟, Rz (γ ) = ⎜ ⎝ 0 0 1 0⎠ 0 0 01

(4)

where Rx (θ ), Ry (ρ), Rz (γ ) are three matrices used for calculating the rotation around the x, y, and z axes. The complex head rotation can be realized by matrix multiplication. And this process can be represented by the following equation:   = Rx (θ )Ry (ρ)Rz (γ ) , (5) pkα =  · Pk where  is the matrix multiplication of three rotation matrices. pkα is the updated kth point’s position after head rotation. Figure 3 shows examples of the created parametrical head model in Unreal Engine Metahuman library [32]. Figure 3 (a) shows the static model that is created and will be parametrized. Figure 3 (b) shows the parametrized head model which is ready for animating head rotation and facial animation. 4.3 Blendshape Theory In addition to head rotation, facial expressions are also considered in this work to improve human-computer interaction. Facial expression is complex, and its related points do not move as simply as those involved in head rotation. This is because facial skin undergoes deformations to express different emotions. Therefore, we have built a parametric facial model based on the Blendshape theory to realize these deformations. The model can be represented by: s = (pi , pi+1 , pi+2 , . . . .pM ),

(6)

792

A. Condegni et al.

Fig. 3. Examples of creating parametrical head model. (a) Example of the static model that is created and will be parametrized. (b) Example of the parametrical head model.

where {p_i, p_{i+1}, p_{i+2}, ..., p_M} ⊆ {p_0, p_1, p_2, ..., p_k, ..., p_N}, 0 ≤ M ≤ N, and (p_i, p_{i+1}, p_{i+2}, ..., p_M) denotes all points in the face region. Figure 4 takes a point on the upper lip as an example to show how changing a point's coordinates can deform the shape of the face. Figure 4(a) shows a point p_i's position on the upper lip for the neutral face. Figure 4(b) shows the point p_i moving from its original position p_i(0) to p_i(1) for the mouth-open shape. To synthesize specific facial expressions, this paper utilizes the Blendshape theory [22], a linear model for generating synthesized facial expressions. Compared to other parametric and physiological models, the Blendshape model has been widely applied in numerous applications, such as animation, films, and mobile apps, due to its computational efficiency. The mathematical representation of the Blendshape theory can be expressed by the following equation:

f = s_0 + \sum_{k=1}^{n} w_k (s_k - s_0),   (7)

where f is the current face that is being shown, s_0 is the neutral face with no facial expression, and s_k is the k-th target face, k = 1, 2, ..., n. s_k − s_0 is the difference between the k-th target face and the neutral face. w_k is the influence weight for the k-th target face, i.e., the intensity of the movement of the related points on the face model, and n is the total number of target faces. w_k ∈ [0, 1], where 0 means no influence and 1 means the maximum influence for a target facial expression.
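The following is a minimal sketch of Eq. (7) in NumPy. The neutral shape, the two target shapes, and the weight values are illustrative placeholders only, not the actual MetaHuman/ARKit blendshape targets used by the system:

```python
import numpy as np

def blend(neutral: np.ndarray, targets: list, weights: list) -> np.ndarray:
    """Eq. (7): f = s0 + sum_k w_k * (s_k - s0).

    neutral: (M, 3) array of face-region vertex positions (s0)
    targets: list of (M, 3) arrays, one per target expression (s_k)
    weights: per-target influence weights w_k in [0, 1]
    """
    f = neutral.astype(float).copy()
    for s_k, w_k in zip(targets, weights):
        f += w_k * (s_k - neutral)
    return f

# Illustrative data: four face-region vertices and two hypothetical targets.
s0 = np.zeros((4, 3))                           # neutral face s0
jaw_open = s0 + np.array([0.0, -0.5, 0.0])      # hypothetical "jaw open" target s1
smile = s0 + np.array([0.3, 0.1, 0.0])          # hypothetical "smile" target s2
current_face = blend(s0, [jaw_open, smile], [0.8, 0.2])
```

Driving the weights w_k per frame with the tracked blendshape values reproduces the user's expression on the digital human.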

Fig. 4. Example of how point movements are used to generate facial expressions (the original pictures [33] were modified here to better explain the theory in this paper). (a) A point's position for the neutral face. (b) The point's position change for an open mouth.


5 Experiments

5.1 Experimental Setup

The proposed system is implemented on a remote laptop and a mobile phone. The laptop, with Unreal Engine installed, is used to calculate the facial animation and display the synthesized graphics results. The mobile phone is used to track the user's facial information and head rotation in real time. The laptop and the mobile phone are connected wirelessly, and the "Live Link" [34] library installed on the mobile phone is used to track facial expressions and head pose. The tracked facial information and head rotation angles are then transferred to the laptop for animating the 3D digital human. Figure 5 shows an example of the experimental setup, in which the user uses the camera on the mobile phone to track facial and head information. This easy-to-implement setup allows for quick facial and head animation synthesis.
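The paper does not specify the Live Link wire format, so, solely to illustrate the phone-to-laptop data flow, the sketch below assumes a hypothetical JSON payload carrying per-frame blendshape weights and head Euler angles over UDP; the port number and field names are invented placeholders and do not describe the actual Live Link protocol:

```python
import json
import socket

PORT = 11111  # placeholder port; not the actual Live Link port

def tracking_frames():
    """Yield hypothetical per-frame tracking packets sent by the mobile phone."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", PORT))
    while True:
        data, _addr = sock.recvfrom(65535)
        frame = json.loads(data.decode("utf-8"))
        # frame["blendshapes"]: e.g. {"jawOpen": 0.7, "eyeBlinkLeft": 0.1, ...}, weights in [0, 1]
        # frame["head"]: {"yaw": ..., "pitch": ..., "roll": ...} in degrees
        yield frame["blendshapes"], frame["head"]
```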

Fig. 5. The experimental setup consists of a mobile phone for facial tracking and a laptop for animation synthesis, connected wirelessly.

5.2 Experimental Results and Analysis

The tracking results are displayed in the mobile application during live facial tracking. As discussed, a large number of points are marked on the face based on a neutral pose, and these points are altered according to the expressions portrayed. This is done using the Blendshape algorithm, gathering numerical data for each point of the face, called blendshape data. The different values indicate the weights of the specific blendshape expressions (from the ARKit library [34]). To better observe the facial tracking, Fig. 6 shows the tracking results of six different facial movements; specifically, movements of the eyes, mouth, cheek, and head pose are tested. Figure 6(a) shows the tracking of the neutral face for comparison with the other facial expressions. Figure 6(b)–(f) show the tracking results for smile, head tilt up, cheek puff, eye wink, and mouth open. In addition to the facial movements, the utilized tracking methods also track the user's eyeball movements and head poses, which are very important for realistic facial and head animation synthesis. In Fig. 6(a), all the blendshape data are close to 0, as no expression is made for the neutral face. When there is a new expression, Fig. 6(b) to Fig. 6(f), the corresponding blendshape data change. For instance,


blinking the left eye will change the values of the blendshapes for "left eye blink", "left eye squint", and "left eyebrow down" (Fig. 6(b)). When the mouth is open, the value of the blendshape for "jaw being opened" is greatly affected (Fig. 6(c)). Similarly, obvious blendshape values are altered for the expressions of smiling (Fig. 6(d)) and cheek puffing (Fig. 6(e)). In addition to the blendshape values for facial expressions, there are three values for recording the head pose: yaw, pitch, and roll. Figure 6(f) shows the tracking results for the head tilted upward. The tracked facial and head pose data are then mapped onto the synthesized human head to simulate facial expressions for the digital human. Figure 7 shows detailed synthesis results for head pose, eye movements, mouth shapes, and their combinations. In Fig. 7(a), different head movements (yaw, roll, pitch, and a combination) are synthesized. Figure 7(a-1) shows the synthesis result of yaw head movements, which frequently occur during human communication. Figure 7(a-2) shows the synthesis result of roll head movements. Figure 7(a-3) shows the synthesized pitch head movements. A synthesized combination of head movements is shown in Fig. 7(a-4). Figure 7(b) shows the synthesis results of eyeball movements and eyelid closing. Figure 7(b-1) shows the eye status of the neutral face for comparison. Figure 7(b-2) and (b-3) show the results of looking to the left and to the right, respectively. Figure 7(b-4) shows the synthesis of eyelid closing.

Fig. 6. Examples of facial tracking results for different facial expressions. From (a) to (f), the expressions are neutral, smile, head tilt up, cheek puff, left eye wink, and mouth open.

Figure 7(c) shows the different synthesis results of mouth movements. Figure 7(c-1) shows the synthesis result of the mouth open with teeth visible when the digital human is angry. Figure 7(c-2) shows the result of the left corner of the mouth opened wider with the lower teeth visible. Figure 7(c-3) shows the result of the jaw dropped down and the mouth opened wide; this movement commonly occurs in the expression of surprise. Figure 7(c-4) shows the lips pursed to the right side; this movement usually occurs along with other facial movements, but for better observation of the mouth movement, the other facial movements are eliminated in this synthesis result. Figure 7(d) shows combined synthesis results with head, face, and eye movements. Figure 7(d-1) shows the neutral face for reference. Figure 7(d-2) shows a smiling face with the eyes looking slightly downward, the lips pulled back, the teeth visible, and no head movement. Figure 7(d-3) shows a result with the eyes looking to the left side, a yaw head


Fig. 7. More detailed results for facial, head, and eye movements. (a) Head movement (yaw, roll, pitch, and combination) synthesis. (b) Synthesis of eyeball movements and eyelid closing. (c) Synthesis of mouth movements. (d) Combined synthesis of head, face, and eye movements.

movement to the right, the lips pulled back, and the teeth visible. Figure 7(d-4) shows a result with the eyes looking forward, a roll head movement to the left, the lips pulled back, and the teeth visible.

5.3 Comparison and Evaluation

To prove the effectiveness of the proposed system, a comparison experiment against the user's facial expressions is conducted. The results, shown in Fig. 8, present a detailed comparison between the six basic facial expressions (happy, sad, anger, fear, disgust, and surprise) and the user's facial expressions. In each frame, the user's picture is on the left and the synthesis result is on the right for comparison. Figure 8(a-1) to (a-5) show frame-sequence examples of happiness, which include the synthesis of eye, mouth, and head movements. The digital mimicry of the user's movements and expressions, such as the smile and the head tilt up, is evident in Fig. 8(a-1) to (a-3). From Fig. 8(a-3) to (a-5), the digital human generates the animation for the gradual disappearance of the smile, based on the user's facial expression. Figure 8(b-1) to (b-5) show the comparison frames for the sadness facial expression. Specifically, Fig. 8(b-1) to (b-3) show the synthesized sad facial expression with the eyes closed and the head tilted down, which also appear on the user's face. Following the user's facial expression changes, the disappearance of sadness, with the eyes gradually opening and the head returning to the neutral pose, is evident in Fig. 8(b-3) to (b-5). Figure 8(c-1) to (c-5) show the comparison frames for the anger facial expression. Similar to the user's facial expression changes, there are eyebrow, eye, mouth, chin, and head movements in the anger facial synthesis. Figure 8(d-1) to (d-5) show the comparison frames for the fear facial expression. Similar to the user's eyebrow and jaw movements, the raised eyebrows and dropped-open jaw are synthesized and presented in the result sequence. Figure 8(e-1) to (e-5) show frame-sequence examples of disgust. Lip compression effects are synthesized in Fig. 8(e-1) to (e-3), while the head-shaking result is shown in Fig. 8(e-2) to (e-3). The synthesis of nose twitches for disgust appears in Fig. 8(e-4). The


Fig. 8. Comparison results of facial expression synthesis for six basic emotions. From (a) to (f) are comparison examples of 3D synthesis results for happy, sad, anger, fear, disgust, and surprise.

expression recovers to neutral in Fig. 8(e-5). Comparing the digital human and the user in Fig. 8(e-3), the digital human is not able to mimic the user's slight head pitch movement; this can be improved in future work. Figure 8(f-1) to (f-7) show the process of surprise, starting from a neutral facial expression. Raised eyebrows, eyes opened wide, and a dropped-down jaw are synthesized in Fig. 8(f-2) to (f-4). The expression recovers to neutral in Fig. 8(f-5). The comparison results prove the effectiveness of the proposed system.

6 Conclusions and Future Work

In conclusion, this paper presents a digital human system that utilizes users' facial data in real time to generate synchronized and realistic facial expressions. The work in this paper accomplished three tasks: first, 3D reconstruction from multiple portrait photos of a user; second, creation of a parameterized 3D model for different facial animations; third, synthesis of 3D facial and head movements for the digital human system. The experimental results and analysis demonstrate the effectiveness and advantages of the developed system, which can create realistic and natural synthesized facial expressions. Moreover, the system can be further generalized to other intelligent computing areas, such as smart medical care, autonomous vehicles, and companion robots. Future work will focus on integrating the developed system with intelligent machines for better human-machine interaction.

Acknowledgements. Anthony Condegni gratefully acknowledges support from the MSU CSAM Faculty-Student Summer Program. This work was also supported in part by the National Science Foundation under Grants CNS-2104742 and CNS-2117308.


References
1. Ping, H.Y., Abdullah, L.N., Sulaiman, P.S., Halin, A.A.: Computer facial animation: a review. Int. J. Comput. Theory Eng. 5(4), 658–662 (2013)
2. Tresadern, P.A., Ionita, M.C., Cootes, T.F.: Real-time facial feature tracking on a mobile device. Int. J. Comput. Vision 96(3), 280–289 (2012)
3. Wettum, Y.C.: Facial landmark tracking on a mobile device. Bachelor's thesis, University of Twente (2017)
4. Liu, X., Wang, J., Zhang, W., Zheng, Q., Li, X.: EmotionTracker: a mobile real-time facial expression tracking system with the assistant of public AI-as-a-service. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 4530–4532 (2020)
5. Richard, A., Lea, C., Ma, S., Gall, J., De la Torre, F., Sheikh, Y.: Audio- and gaze-driven facial animation of codec avatars. In: Proceedings of IEEE Winter Conference on Applications of Computer Vision, pp. 41–50 (2021)
6. Li, H., Yu, J., Ye, Y., Bregler, C.: Realtime facial animation with on-the-fly correctives. ACM Trans. Graph. 32(4), 41–49 (2013)
7. Sela, M., Richardson, E., Kimmel, R.: Unrestricted facial geometry reconstruction using image-to-image translation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1585–1594 (2017)
8. Lattas, A., Moschoglou, S., Gecer, B., Ploumpis, S.: AvatarMe: realistically renderable 3D facial reconstruction “in-the-wild”. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 757–766 (2020)
9. Gafni, G., Thies, J., Zollhofer, M., Nießner, M.: Dynamic neural radiance fields for monocular 4D facial avatar reconstruction. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 8645–8654 (2021)
10. Jo, J., Choi, H., Kim, I.J., Kim, J.: Single-view-based 3D facial reconstruction method robust against pose variations. Pattern Recogn. 48(1), 73–85 (2015)
11. Richardson, E., Sela, M., Kimmel, R.: 3D face reconstruction by learning from synthetic data. In: Proceedings of the 4th International Conference on 3D Vision, pp. 460–467 (2016)
12. Beeler, T., Bickel, B., Beardsley, P., Sumner, B., Gross, M.: High-quality single-shot capture of facial geometry. In: Proceedings of ACM SIGGRAPH, pp. 1–9 (2021)
13. Schwartz, G., et al.: The eyes have it: an integrated eye and face model for photorealistic facial animation. ACM Trans. Graph. 39(4), 91:1–91:15 (2020)
14. Cao, C., et al.: Real-time 3D neural facial animation from binocular video. ACM Trans. Graph. 40(4), 1–17 (2021)
15. Pighin, F., Auslander, J., Lischinski, D., Salesin, D.H., Szeliski, R.: Realistic facial animation using image-based 3D morphing. Technical report UW-CSE-97-01-03, 1–26 (1997)
16. Lou, J., et al.: Realistic facial expression reconstruction for VR HMD users. IEEE Trans. Multimedia 22(3), 730–743 (2020)
17. Das, D., Biswas, S., Sinha, S., Bhowmick, B.: Speech-driven facial animation using cascaded GANs for learning of motion and texture. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12375, pp. 408–424. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58577-8_25
18. Jiang, D., Zhao, Y., Sahli, H., Zhang, Y.: Speech driven photo realistic facial animation based on an articulatory DBN model and AAM features. Multimedia Tools Appl. 73(1), 397–415 (2013)
19. Cao, C., Wu, H., Weng, Y., Shao, T., Zhou, K.: Real-time facial animation with image-based dynamic avatars. ACM Trans. Graph. 35(4), 1–13 (2016)
20. Weng, Y., Cao, C., Hou, Q., Zhou, K.: Real-time facial animation on mobile devices. Graph. Models 76(3), 172–179 (2014)


21. Chuang, E., Bregler, C.: Performance driven facial animation using blendshape interpolation. Comput. Sci. Tech. Rep. 2(2), 1–8 (2002)
22. Lewis, J.P., Anjyo, K., Rhee, T., Zhang, M., Pighin, F.H., Deng, Z.: Practice and theory of blendshape facial models. In: Eurographics 2014, vol. 1, pp. 1–23 (2014)
23. Bouaziz, S., Wang, Y., Pauly, M.: Online modeling for realtime facial animation. ACM Trans. Graph. 32(4), 1–10 (2013)
24. Cao, C., Weng, Y., Lin, S., Zhou, K.: 3D shape regression for real-time facial animation. ACM Trans. Graph. 32(4), 1–10 (2013)
25. Pham, H.X., Wang, Y., Pavlovic, V.: End-to-end learning for 3D facial animation from speech. In: Proceedings of the ACM International Conference on Multimodal Interaction, pp. 361–365 (2018)
26. Joshi, P., Tien, W.C., Desbrun, M., Pighin, F.: Learning controls for blend shape based realistic facial animation. In: Proceedings of the Eurographics/SIGGRAPH Symposium on Computer Animation, pp. 1–7 (2003)
27. Liu, Y., Xu, F., Chai, J., Tong, X., Wang, L., Huo, Q.: Video-audio driven real-time facial animation. ACM Trans. Graph. 34(6), 1–10 (2015)
28. Navarro, I., Kneubuehler, D., Verhulsdonck, T., Du Bois, E.D., Welch, W., Verma, V., Sachs, I., Bhat, K.: Fast facial animation from video. In: Proceedings of ACM SIGGRAPH, pp. 1–2 (2021)
29. Weise, T., Bouaziz, S., Li, H., Pauly, M.: Realtime performance-based facial animation. ACM Trans. Graph. 30(4), 1–10 (2011)
30. Blender Homepage. https://www.blender.org/. Accessed 18 May 2023
31. KeenTools FaceBuilder for Blender. KeenTools (2022)
32. MetaHumans Homepage. https://www.unrealengine.com/en-US/metahuman. Accessed 18 May 2023
33. ARKit Face Blendshapes (Perfect Sync). https://arkit-face-blendshapes.com/. Accessed 18 May 2023
34. ARKit Home Page. https://developer.apple.com/documentation/arkit. Accessed 18 May 2023

Author Index

A Albeedan, Meshal 132 Al-Jumeily OBE, Dhiya 132 Ansari, Sam 132 B Bai, Huang 499 Bai, Ling 206 Bao, Wenzheng 685, 695 Basseur, Matthieu 59 C Cai, Yuxiang 616 Chen, Debao 365 Chen, DeBao 376 Chen, Guangyu 47 Chen, Leyi 262 Chen, Mengya 739 Chen, Wei 556 Chen, Wenbo 85, 333 Chen, Xin 604 Chen, Xuehao 705 Chen, Ziyu 97, 109 Cheng, Di 181 Cheng, Li 47 Condegni, Anthony 787 Cui, Sheng 71 D Ding, Jia 71 Ding, Ziqi 156 Du, Tao 705, 715, 727 F Feng, Cong 522 Feng, Fei-Long 15 Feng, Xianlin 37 Feng, Yanran 466 Feng, Yongxin 71

G Gao, Yi-Jie 288 Gong, Jianqun 229 Gong, Kai 628, 751 Guan, Pengwei 662 Guan, Yuanlin 628 Guo, Jiale 475 Guo, Ning 168, 218 H Han, Shiyuan 628, 640, 705, 715, 727 He, Feng 571, 581 Hu, Rong 15, 37, 85, 122, 146, 156, 168, 194, 206, 229, 241, 251, 288, 299, 322, 333, 356, 386, 410 Hu, Wei 97, 109, 181 Hu, Zhuhua 442, 454 Huang, Shuanglong 475 Huang, Yu-Fang 410 Hussain, Abir 132 J Jian, Zihao 593 Jiang, Xin 616 Jin, Huai-Ping 206 Jin, Taisong 544 K Kaissar, Antanios 132 Ke, Peng 343 Khan, Wasiq 132 L Lai, Yongxuan 571, 581 Lan, Ke 122 Li, Chenglong 739 li, Cong Song 365 Li, Hao 640 Li, Haodi 97, 109 Li, Jun- qing 397



Li, Jun 262, 275 Li, Kun 322, 386 Li, Meng 71 Li, Min 487 Li, Rui 787 Li, Xiao 386 Li, Xiumei 499 Li, Ying 241 Li, Zuo-Cheng 194, 218 Li, Zuocheng 156, 206, 229, 299 Lian, Yang 640 Lin, Chen 616 Lin, Xingyao 593 Liu, Bowen 705, 715, 727 Liu, Fenghui 604 Liu, Hui 532 Liu, Peihua 662 Liu, Rui 356 Liu, Si 522 Liu, Xiaozhang 442 Liu, Xinfeng 739 Liu, Xiyu 763 Liu, Yunxia 522, 532 Lu, Hao 775 Lu, Jianhua 97, 181 Lv, Guoning 522 Lv, Haoyang 593 M Ma, Haiying 475 Ma, Xiaojun 763 Mahmoud, Soliman 132 Mao, Jian-Lin 168 N Niu, Ben 310 Niu, Shuwen 593 P Pan, Zhenxiong 499

Q Qi, Lin 604 Qian, Bi 122 Qian, Bin 15, 37, 85, 146, 156, 168, 194, 206, 218, 229, 241, 251, 288, 299, 322, 333, 356, 386, 410 Qiu, Haiyun 310 Qiu, Xing-Han 218

S Shang, Qing-Xia 15, 288 Shang, Wenyuan 343 Shen, Qingni 425 Shen, Qiu-Yi 168 Song, Liang 571, 581, 593 Sun, Jinping 695 Sun, Junmei 499 Sun, Qiwei 751 Sun, Runyuan 662 T Tang, Jingjing 593 Tian, Jie 739 Tie, Yun 604 Turky, Ayad 132 U Ullah, Inam 739

W Wang, Bin 229, 251 Wang, Chao 71 Wang, Juan 397 Wang, Mingkai 544 Wang, Peng 3, 674 Wang, Sanxing 71 Wang, Tingwei 775 Wang, Weitian 787 Wang, Yi-Jun 146 Wang, Yijun 85, 333 Wang, Yingxu 705, 715, 727 Wang, Yonghao 532 Wang, Zhuo 685, 695 Wei, Fuyang 652 Wei, Yujie 727 Wu, Bingqian 511 Wu, Fang-Chun 251 Wu, Jing 97, 109, 181 Wu, Xing 241, 410 Wu, Zhonghai 425 X Xia, Zhenyang 25 Xiao, Kai 685 Xiao, Qinge 310 Xin, Gang 674 Xing, Chen 25 Xu, Lei 662


Xue, Bowen 310 Xue, Li-Yuan 59 Y Yan, Anli 442 Yang, Biao 299 Yang, Cheng 705, 715, 727 Yang, Guosong 3, 674 Yang, Qifan 616 Yang, Tianling 475 Yang, Xiaohui 628 Yang, Xixin 640 Yang, Xueyan 695 Yang, Yuanyuan 333 Yang, Yuan-Yuan 146, 288, 410 Yi, Guohong 511 Yin, Xinyu 3, 674 Yu, Dongyang 454 Yu, Nai-Kang 15, 37, 356 Yu, Senwu 275 Yu, Weiwei 628, 640 Yu, Zhengtao 544 Yuan, Pengxuan 571, 581 Yuan, Zhiyong 652 Z Zeng, Rong-Qiang 59 Zhang, Chang Sheng 37 Zhang, Changsheng 156, 322 Zhang, Chen 715 Zhang, Da-Cheng 122


Zhang, Dalong 604 Zhang, Jian 146 Zhang, Liangliang 662 Zhang, Miaohui 544 Zhang, Na 662 Zhang, Ping 97, 109, 181 Zhang, Sen 85, 322, 356 Zhang, Shengchuan 556 Zhang, Shurui 299 Zhang, Weiwei 71 Zhang, Xiaolei 425 Zhang, Xin 499 Zhang, Yanhua 751 Zhang, Yanyun 47 Zhang, Yue 25 Zhang, Zi-Qi 122, 194, 218, 241, 251, 386 Zhao, Hongguo 522, 532 Zhao, Jianhui 652 Zhao, Jinghang 640 Zhao, Wenhao 616 Zhao, Yaochi 442, 454 Zhao, Zhenghang 532 Zheng, Huilin 425 Zheng, Shuo 376 Zhong, Xiaofang 751 Zhou, Hao 739 Zhou, Jin 705, 715, 727 Zhou, Tao 343 Zhou, Weidong 705, 715, 727 Zhu, Jin-Han 194 Zhu, Yanfei 442 Zou, Feng 365, 376