This book highlights recent research on intelligent systems and nature-inspired computing. It presents 130 selected papers from the 20th International Conference on Intelligent Systems Design and Applications (ISDA 2020), held online during December 12–15, 2020.
Language: English · Pages: 1441 [1440] · Year: 2021
Table of contents:
Preface
Organization
General Chairs
Program Chairs
Publication Chairs
Publicity Chairs
International Program Committee
Contents
Intelligent Automation Systems at the Core of Industry 4.0
1 Introduction
2 Motivation
3 Intelligent Automation – Necessity (or Objectives), Features and Possibilities
4 Future Possibilities with AI and IoT towards Intelligent Automation
4.1 Intelligent Automation/Robotics/Artificial Intelligence in Software Development
4.2 Intelligent Automation/Robotics/Artificial Intelligence (AI) in Cloud Computing
4.3 AI for Cyber Security: Intrusion Detection Automatically by Machine and Artificial Intelligence
4.4 AI for Industry 4.0
4.5 AI for Data Science
4.6 Artificial Intelligence – IoTs Integration for Other Sectors
5 Related Work
6 Conclusion
References
Hybrid Extreme Learning Machine and Backpropagation with Adaptive Activation Functions for Classification Problems
1 Introduction
2 Materials and Methods
2.1 Datasets
2.2 Extreme Learning Machine (ELM)
2.3 Adaptive Activation Functions
3 Computational Experiments and Discussion
4 Conclusion
References
Assessing an Organization Security Culture Based on ENISA Approach
1 Introduction
2 Literature Review
3 Security Culture Development
4 Methodology
4.1 Security Domains
5 Results
6 Conclusion
References
Evaluation of Information Security Policy for Small Company
1 Introduction
2 Literature Review
3 Information Security Culture Development
4 Methodology
5 Results
6 Conclusion
References
English-Hindi Cross Language Query Translation and Disambiguation Using Most Salient Seed Word
1 Introduction
2 Proposed Approach
3 Experimental Setup
3.1 Evaluation Method
3.2 Evaluation Result
4 Conclusion
References
Relay UAV-Based FSO Communications over Log-Normal Channels with Pointing Errors
1 Introduction
2 System and Channel Models
2.1 System Model
2.2 Channel Model with Pointing Errors
3 Performance Bounds
4 Numerical Results
5 Conclusions
References
A Novel Generalized Form of Cure Rate Model for an Infectious Disease with Co-infection
1 Introduction
2 Materials and Methods
2.1 Standard Cure Rate Model
2.2 Estimation of Cure Fraction Model (Non-mixture Model)
2.3 Cure Model Likelihood Estimation
2.4 Distributions of the Cure Model
3 Simulation Result and Discussion
3.1 Simulation Results
3.2 Discussion
3.3 Contribution
3.4 Managerial Implications
4 Conclusion
References
A2PF: An Automatic Protein Production Framework
1 Introduction
2 Related Work
2.1 X-Ray Crystallography Method
2.2 Computational Methods
3 A2P Framework
3.1 Step 1: The Determination of the Primary Structure
3.2 Step 2: The Prediction of the Secondary Structure
3.3 Step 3: The Prediction of the 3D Structure
3.4 Step 4: The Printing of the 3D Protein
4 Software Application and Experiment
4.1 The Protein Primary Structure Determination
4.2 The Protein 2D/3D Structure Determination
5 Discussion
6 Conclusion
References
Open Vocabulary Recognition of Offline Arabic Handwriting Text Based on Deep Learning
1 Introduction
2 Related Work
3 Architecture and Model Details
3.1 Data Pre-processing
3.2 Features Extraction
3.3 Recognition
4 Experiments
4.1 Database Overview
4.2 Experimental Results and Discussion
5 Conclusion
References
Formal Verification of Safety-Critical Systems: A Case-Study in Airbag System Design
1 Introduction
2 Related Work
3 Case Study: Airbag Systems
3.1 Airbag System Components
3.2 Failure of an Airbag System
3.3 Specifying Safety Requirements for Airbag Systems
3.4 Single-Microcontroller Airbag Design
3.5 Dual-Microcontroller Airbag Design
4 Proposed Design
4.1 Description of the System
4.2 Results
5 Conclusion
References
Classification of Musical Preference in Generation Z Through EEG Signal Processing and Machine Learning
1 Introduction
2 Literature Review
3 Music Preference Classification and Analysis
3.1 Study Design
3.2 Experiment Setup and Measurements
3.3 Signal Preprocessing and Feature Extraction
3.4 Musical Preference Classification
4 Results
4.1 MLP and SVM Classifiers
4.2 Spectral Analysis and ICA Components
5 Conclusions
References
A Locally Weighted Metric for Measuring the Perceptual Quality of 3D Objects
1 Introduction
2 Related Work
2.1 Subjective Versus Objective Metrics for 3D Models
2.2 Integration of the Human Visual Perception in 3D Metrics
3 The Proposed Metric
3.1 Mathematical Definition
3.2 Properties
4 Results and Discussion
4.1 Database and Experiments
4.2 Evaluation Criteria
4.3 Quantitative Results and Discussion
5 Conclusion and Prospects
References
Transfer Learning for Instance Segmentation of Waste Bottles Using Mask R-CNN Algorithm
1 Introduction
2 Related Work
3 Bottle Segmentation Using Mask R-CNN
4 Results and Discussion
5 Conclusions
References
ImageFuse: A Multi-view Image Featurization Framework for Visual Question Answering
1 Introduction
2 Related Works
3 Method
3.1 First Level Feature Extraction
3.2 Feature Fusion and Joint Embedding
3.3 Answer Prediction
4 Experiments
4.1 Dataset, Models and Evaluation Metrics
4.2 Results and Discussion
5 Conclusions
References
Cooperative Advanced Driver Assistance Systems: A Survey and Recent Trends
1 Introduction
2 Overview of ADAS Classification
2.1 First Factor: Ability to Take a Preventative Role in Mitigating Hazardous Situations
2.2 Second Factor: Impacts on Traffic Efficiency and Safety, High or Low
2.3 Third Factor: Nature of the Human Machine Interface (HMI)
2.4 Fourth Factor: Nature of Vehicle Kinematic Control
2.5 Fifth Factor: Accident Phases
2.6 Sixth Factor: Functions and Intentions of ADAS
2.7 Seventh Factor: Level of Cooperation as a Way to Classify the Functions of Cooperative ADAS
3 A New Classification and Related Works Dealing with C-ADAS
3.1 The Proposed C-ADAS Classification
3.2 The Proposed C-ADAS for the Basic Functionalities
3.3 The Proposed ADAS for Collision Prevention
3.4 The Proposed C-ADAS for Overtaking
3.5 The Proposed ADAS for Dangerous Zones and Objects Detection
4 Discussion and Recommendations
5 Conclusion
References
Amended Convolutional Neural Network with Global Average Pooling for Image Classification
1 Introduction
2 Methodology
2.1 CNN Mechanism
2.2 Proposed Architecture
3 Experimental Results
4 Conclusion
References
Employment of Pre-trained Deep Learning Models for Date Classification: A Comparative Study
1 Introduction
2 Methodology
2.1 Dataset
2.2 Pre-trained Deep Learning Models
3 Experimental Results
4 Conclusion
References
A Study on Evolutionary Algorithms to Reopen Organizations Safely During COVID-19
1 Introduction
2 Problem Description
3 Input Variables
4 Approach
4.1 Genetic Algorithm
4.2 Chromosome Structure
4.3 Population Initialization
4.4 Fitness Evaluation
4.5 Selection
4.6 Crossover
4.7 Mutation
4.8 Termination Condition
5 Results
6 Conclusion
7 Future Direction
References
Learning Spherical Word Vectors for Opinion Mining and Applying on Hotel Reviews
1 Introduction
2 Related Work
2.1 Hotel Reviews
2.2 Opinion Mining Using Word Vectors
3 Approach
3.1 Preparing Training Set
3.2 Learning Word Vectors
3.3 Computing Review's Opinion
4 Results and Discussion
5 Conclusion
References
Towards Developing a Hospital Cabin Management System Using Brain Computer Interaction
1 Introduction
2 Literature Review and Related Work
2.1 Scarcity of Healthcare Providers
2.2 BCI in Healthcare
2.3 BCI Applications in Other Contexts
2.4 Cabin Management Without BCI
3 Requirements Elicitation
3.1 Study Procedure
3.2 Study Outcomes
3.3 Functional and Non-functional Requirements
4 Conceptual Framework
4.1 The Directory Structure and the Graphical User Interface (GUI)
4.2 Integration of Artificial Neural Network
4.3 The Control Program for Task Execution
5 Testing the Proposed Conceptual Framework
5.1 Accuracy Testing
5.2 Device Dependency Testing
5.3 End-User Feedback
6 Conclusion
References
A Discriminant Function Controller for Elevator Groups Evolved by Genetic Algorithm
1 Introduction
2 Related Works
3 Algorithms
3.1 Simulation
3.2 Controller 1: Baseline
3.3 Controller 2: Distance-Based
3.4 Controller 3: Fitness-Based
4 Results
4.1 Poisson Traffic
4.2 Comparison with Other Works
5 Conclusion
References
Keyframe Extraction Using Sobel Fuzzified Weighted Approach
1 Introduction
2 Literature Review
3 Proposed Methodology
3.1 Feature Extraction and Representation
3.2 Keyframe Extraction
4 Results and Discussion
4.1 Performance Evaluation of Keyframe Extraction
5 Conclusion
References
Deep Learning with Real-Time Inference for Human Detection in Search and Rescue
1 Introduction
2 Background
3 Proposed Approach
3.1 Deep Learning Techniques
3.2 Image Properties Variation
4 Experimental Setup and Results
4.1 Dataset
4.2 Training
4.3 Results
5 Conclusions and Future Work
References
Sliding Mode Control of Magnetic Suspended Impeller for Artificial Heart Pump
1 Introduction
2 Modeling of the Suspension System of AHP
3 Development of SMC Algorithm for Magnetic Bearing of Impeller
4 Computer Simulation
5 Conclusion
References
Design and Construction of a Wireless Controlled Fire Extinguishing Robot
1 Introduction
2 Design and Construction
3 Measurement of Robot
4 Motor Selection Process
5 Motor Controlling Circuit
6 Condition of a Single Motor for Each Logic
6.1 Logic Combination 00
6.2 Logic Combination 01
6.3 Logic Combination 10
6.4 Logic Combination 11
7 Fire Extinguishing Unit
8 Remote Control Process (Electrical)
8.1 Control Process (Software)
9 Communication System
10 Software Implementation
11 Artificial Intelligence and Visual Control
12 Evaluation of Firebot
13 Conclusion
References
Investigating Data Distribution for Classification Using PSO with Adversarial Network
1 Introduction
2 Preliminaries
2.1 Neural Network
2.2 Floating Centroids Method
2.3 Particle Swarm Optimization
3 Methodology
3.1 Basic Method
3.2 Data Transformation with RNN
4 Experiments
4.1 Random Distributed Data Evolution
4.2 Dot-Matrix Distributed Data Evolution
5 Conclusions
References
Study of Various Web-based Applications Performance in Native IPv4 and IPv6 Environments
1 Introduction
2 IPv4/IPv6 Overview
2.1 IPv4 (Internet Protocol Version 4)
2.2 IPv6 (Internet Protocol Version 6)
3 Related Work
4 Simulation Environment
5 Results and Analysis
5.1 HTTP
5.2 FTP
5.3 Database
5.4 E-mail
5.5 VoIP
6 Conclusion
References
A Phase Memory Controller for Isolated Intersection Traffic Signals
1 Introduction
2 Related Works
3 Methods
3.1 Controller Description
3.2 Diploid Differential Evolution
4 Results
4.1 Controller Performance
4.2 Comparison with Other Works
5 Conclusion
References
Multiple Face Recognition Using Self-adaptive Differential Evolution and ORB
1 Introduction
2 Background
2.1 Feature Extraction Using ORB
2.2 Self-adaptive Differential Evolution Algorithm (SaDE)
3 Proposed Face Recognition Approach
4 Experiments and Results
5 Conclusions and Future Directions
References
Low-Cost, Ultrasound-Based Support System for the Visually Impaired
1 Introduction
1.1 Background
1.2 Related Work
2 Method
2.1 Architecture
2.2 Design
2.3 Operation
3 Results
4 Conclusions
References
SD-CCN Architecture to Improve QoE for Video Streaming Applications
1 Introduction
2 Related Work
3 RF-VS-SD-CCN Operating Principle
3.1 Metrics Collection
3.2 CCN Process
3.3 SDN Process
4 Performance Evaluation
4.1 Simulation Setup and Parameters
4.2 The Impact on Expected Bitrate
4.3 The Impact on Bitrate Oscillation
4.4 The Impact on Video Freezing
4.5 The Impact on Required Peak Bandwidth
5 Conclusion
References
Dijkstra and A* Algorithms for Global Trajectory Planning in the TurtleBot 3 Mobile Robot
1 Introduction
2 Theoretical Background
2.1 A* Algorithm
2.2 Dijkstra Algorithm
3 Experimental Setup
3.1 Robot Operating System (ROS)
3.2 TurtleBot 3 Burger
3.3 Test Environments
4 Methodology
5 Results
5.1 Dijkstra Algorithm
5.2 A* Algorithm
5.3 Dijkstra Versus A*
6 Conclusion
References
A Data Mining Framework for Response Modelling in Direct Marketing
1 Introduction
2 Data Description and Cleaning
3 Feature Selection
3.1 Simple Filters Based on Distribution Properties
3.2 Wrappers Filters
4 Model Development and Testing
4.1 Data Sampling
4.2 Class Balancing
4.3 Performance Measures
5 Experiments
5.1 Models Evaluation: ROC and AUC
6 Conclusions
References
WakeMeUp: Weight Dependent Intelligent and Robust Alarm System Using Load Cells
1 Introduction
2 Literature Review
3 Methodology
4 Implementation
4.1 Software Implementation
4.2 Hardware Implementation
5 Results and Discussions
5.1 Testing Scenarios
6 Conclusion
References
Visual Data Mining: A Comparative Analysis of Selected Datasets
1 Introduction
2 Data Preprocessing and Visualisation
2.1 Teaching Assistant Evaluation Dataset
2.2 Statlog (Australian Credit Approval) Dataset
2.3 Letter Recognition Dataset
2.4 Connectionist Bench (Sonar, Mines vs. Rocks) Dataset
2.5 Poker Hand Dataset
3 Test Criteria for Evaluation
4 Visualisation Technique Justification for the Datasets
5 Visualisation Tools Used
5.1 Xmdv Tool
5.2 Orange
6 Conclusion and Future Work
References
NoSQL Big Data Warehouse: Review and Comparison
1 Introduction
2 RDBMS vs NoSQL Databases
3 Data Warehouse Under NoSQL Databases
4 Big ETL Approaches
5 Discussion
6 Conclusion
References
ICN Based DOS Attack Mitigation in Vehicle Adhoc Networks
1 Introduction
2 Related Works
3 Detecting and Mitigating DOS Attack
4 Implementation
5 Performance Evaluation
6 Conclusion
References
From Machine Learning to Deep Learning for Detecting Abusive Messages in Arabic Social Media: Survey and Challenges
1 Introduction
2 Abusive Messages in Social Media
2.1 Definition
2.2 Types of Abusive Messages
3 Which Difficulties for Detecting Arabic Abusive Messages?
3.1 Vowelization
3.2 Agglutination
3.3 Grammatical Ambiguity
3.4 Semantic Ambiguity
3.5 Varieties of Arabic
4 Approaches for the Automatic Detection of Arabic Abusive Messages in Social Media
4.1 Machine Learning Approaches
4.2 Deep Learning Approaches
5 Discussion and Challenges for Future Research
6 Conclusion
References
A-Part: Top-Down Clustering Approach for Mesh of Clusters FPGA
1 Introduction
2 MoCs-Based FPGA Architecture
3 FPGA Configuration Flow
4 Adapted Multilevel Partitioning Approach
4.1 First Phase: Multilevel Partitioning
4.2 Second Phase: Legalization
5 Experimental Set-Up and Comparison Results
6 Conclusion
References
In-Car State Classification with RGB Images
1 Introduction
2 Related Works
3 Implementation
3.1 Classifiers
3.2 Dataset Creation
4 Experiments
4.1 Data Preparation
4.2 Classifier Configuration
5 Discussion
6 Conclusions and Future Work
References
Designing and Developing Graphical User Interface for the MultiChain Blockchain: Towards Incorporating HCI in Blockchain
1 Introduction
2 Related Works
3 Development of the System
3.1 Creating and Connecting to Chain
3.2 Creating Wallets
3.3 Granting and Revoking Permissions
3.4 Issuance and Transaction of Assets
3.5 Creating and Publishing to Streams
3.6 Retrieval of Information
4 Evaluation of the System
4.1 Participant's Profile
4.2 Study Procedure
4.3 Data Collection and Analysis
5 Discussion and Conclusion
References
Novel Martingale Approaches for Change Point Detection
1 Introduction
2 Related Work
3 Martingale Approach
3.1 Moving Median of a Martingale Sequence
3.2 Gaussian Moving Average of a Martingale Sequence
4 Experimental Results
5 Conclusion and Future Work
References
Development of a Reinforcement Learning System to Solve the Job Shop Problem
1 Introduction
2 Literature Review
2.1 Scheduling Problems
2.2 Job Shop Problem Definition
3 Methodology
4 Proposed Architecture
5 Preliminary Results
6 Conclusion
References
“One vs All” Classifier Analysis for Multi-label Movie Genre Classification Using Document Embedding
Abstract
1 Introduction
2 Related Works
3 One-vs-All Strategy for Multi-label Classification
4 Methodology
4.1 Pre-processing
5 Experimentation
5.1 Dataset
5.2 Results and Discussion
6 Conclusion and Future Work
References
Analysis of the Superpixel Slic Algorithm for Increasing Data for Disease Detection Using Deep Learning
1 Introduction
2 Summary of the Research Protocol
2.1 Image Base
2.2 Tools
3 Methodology
3.1 Step 1: Pre-processing
3.2 Step 2: Development of the CNN Model
3.3 Step 3: Collection of Metrics
4 Results
4.1 Validation
5 Discussions
6 Final Considerations
References
Evaluating Preprocessing Techniques in Identifying Fake News
1 Introduction
2 Related Work
3 Methodology
4 Results
5 Conclusions and Future Works
References
Covid-19 and the Pulmonary Immune System Fight: A Game Theory Model
1 Introduction
1.1 Coronavirus Categorization and Structure
1.2 Mathematical Model
2 Related Works
2.1 Pulmonary Immune System
2.2 Virus Activities
2.3 Presentation of Game Theory
3 Game Components
3.1 Players
3.2 Strategies
3.3 Payoffs
3.4 Rounds of Game
4 Modelization
4.1 Probabilities of Successful Strategies
4.2 Payoffs After Each Round
4.3 Problem Resolution
5 Applications
5.1 Values Presentation
5.2 Results Explanations
6 Conclusion
References
IAS: Intelligent Attendance System Based on Hybrid Criteria Matching
1 Introduction
2 Architecture
2.1 Activity Releasing Layer
2.2 Personnel Input Layer
2.3 Intelligent Matching Layer
2.4 Statistics Revealing Layer
3 Demonstration Scenarios
4 Conclusions
References
Energy-Based Comparison for Workflow Task Clustering Techniques
1 Introduction
2 Background and Related Work
2.1 Scientific Workflows
2.2 Clustering Techniques
2.3 Workflow Simulator
2.4 Energy Model
2.5 Workflow Scheduling Policies
2.6 Related Work
3 Methodology
3.1 Simulation Setup
3.2 Results and Discussion
4 Conclusion
References
Fuzzy System for Facial Emotion Recognition
1 Introduction
1.1 Overview of FIS
1.2 Fuzzy Inference Systems for Emotion Recognition
2 Proposed System
2.1 Preprocessing Module
2.2 Prediction Module
2.3 Performance Analysis
3 Results and Discussion
3.1 Evaluation of FIS for Individual Emotions
3.2 Accuracy of Ideal FIS, GAFIS and ANFIS
3.3 Evaluating GAFIS and ANFIS against Ideal FIS
4 Conclusion and Future Work
References
Towards an Approach of the Contagion Curve for COVID-19 in Mexico
1 Introduction
2 SARS and MERS History
2.1 SARS: Acute and Severe Respiratory Syndrome
2.2 Middle East Severe Respiratory Syndrome (MERS)
3 Logistic Function: Growth Curve for Mexico
3.1 Confidence Intervals
4 Prediction of the Spread of Coronavirus COVID-19 Through Multi-criteria Analysis in México
4.1 Model Development
5 Conclusions
References
A Review on Existing Methods and Classification Algorithms Used for Sex Determination of Silkworm in Sericulture
1 Introduction
2 Existing Methods
2.1 Physical Observation
2.2 Hyperspectral Imaging Technology
2.3 Near Infrared Spectroscopy
2.4 Fluorescence Characteristics
2.5 X-Ray
2.6 MRI
2.7 Optical Penetration
2.8 DNA
2.9 Computer Vision
3 Discussion
4 Conclusion
References
Machine Learning-Based Big Data Analytics Framework for Ebola Outbreak Surveillance
1 Introduction
2 Problem Statement
2.1 Big Data Analytics for Epidemiological Surveillance – Issues and Challenges
3 Review of Related Works
4 The Architecture for Machine Learning-Based Big Data Analytics for Predicting EVD Outbreak
5 Conclusion and Future Research Direction
References
Comparison of Different Processing Methods of Joint Coordinates Features for Gesture Recognition with a CNN in the MSRC-12 Database
1 Introduction
2 Related Work
3 Theoretical Background
3.1 Convolutional Neural Networks
3.2 FastDTW Algorithm
3.3 Joint Coordinates Feature
3.4 Microsoft Research Cambridge-12 (MSRC-12)
4 Methodology
4.1 Data Preprocessing
4.2 Convolutional Neural Network Model Proposed
4.3 Individual Training
4.4 Combined Training
4.5 Software Architecture
5 Experimental Results
5.1 Combined Training
5.2 Individual Training
6 Conclusion
References
Solving the Job Shop Scheduling Problem with Reinforcement Learning: A Statistical Analysis
1 Introduction
2 Background
2.1 Job Shop Scheduling Problem Definition
2.2 Reinforcement Learning
2.3 Proposed Architecture
3 Computational Study
3.1 Efficiency Analysis
3.2 Quality of Solutions Analysis
4 Result Discussion
5 Conclusion
References
Arabic Handwritten Recognition System Using Deep Convolutional Neural Networks
1 Introduction
2 Deep Convolutional Neural Networks
2.1 Convolutional Neural Network
2.2 Vgg16
2.3 Residual Network (ResNet)
3 Proposed Method
3.1 CNN Model
3.2 ResNet Model
3.3 VGG16 Model
4 Experiments and Results
4.1 Dataset
4.2 Data Augmentation
4.3 Experimental Settings
4.4 Results and Discussion
5 Conclusion
References
A Daily Production Planning Model Considering Flexibility of the Production Line Under Uncertainty: A Case Study
1 Introduction
2 Problem Description
2.1 Basic Assumptions
2.2 Model Formulation
2.3 Non-fuzzy Model
3 Numerical Experiment
3.1 Evaluation
4 Conclusions and Future Research
References
Academic Records: A Feasible Use Case for Blockchain?
1 Introduction
2 Distributed Ledger Technologies
2.1 Why the Need for Distributed Ledger Technologies?
2.2 Blockchain
2.3 Ethereum
2.4 Hyperledger
3 Related Work
3.1 Definition of the Research Questions and Criteria
3.2 Conduct the Search
3.3 Screening of the Results
3.4 Evaluation and Classification of the Studies
3.5 Systematic Mapping
4 Discussion of Results and Future Work
References
Optimized NASNet for Diagnosis of COVID-19 from Lung CT Images
1 Introduction
2 Background
3 Data Description and Methodology
3.1 Description of Data
3.2 Methodology
4 Result and Analysis
5 Conclusion
References
Closed-Loop Active Model Diagnosis Using Bhattacharyya Coefficient: Application to Automated Visual Inspection
1 Introduction
2 Methods
2.1 Proposed Approach
3 Simulation of Dynamical System
4 Application to Automated Visual Inspection
4.1 Formulation of CLAMD-problem
4.2 Results
5 Conclusions
References
Robust Review on Privacy and Security Issues in Present Day Attacks of Data Mining
1 Introduction
2 Hierarchy of Data Mining
3 Advantages and Disadvantages of Data Mining
3.1 Advantages
3.2 Disadvantages
4 Robust Review on Privacy and Security Issues in Data Mining
4.1 Attacks of Botnet
4.2 Attacks on DoS/DDoS
4.3 Attacks of Ransomware
4.4 Attacks of Spyware
4.5 Attacks of Malware
5 Conclusions
References
Blindophile: Mobile Assistive Gesture-Empowered Ubiquitous Input Device
1 Introduction
2 Design Concepts
2.1 Understanding the Design Framework
2.2 Design Limitations and Ramifications
3 System Description
3.1 Components of Blindophile
3.2 Gestures Supported by Blindophile
3.3 Requisites for Data Preprocessing
3.4 ML Workflow
3.5 Android Application
4 Evaluation
5 Discussions with Related Works
6 Conclusion
References
Analysis and Simulation for Mobile Ad Hoc Network Using QualNet Simulator
1 Introduction
2 Related Work
3 Simulation Procedure of QualNet
3.1 Experimental Setup
3.2 Performance Metrics
3.3 Communication Lab
4 Analyze the Required Performance Metrics
4.1 Packet Delivery Ratio
4.2 Average End2End Delay
5 Analysis and Discussion of Our Results
5.1 The Total Packets Sent
5.2 The Total Packets Were Received
5.3 In Packet Delivery Ratio
5.4 The Average E2E Delay
6 Conclusion
References
Social Media Data Integration: From Data Lake to NoSQL Data Warehouse
1 Introduction
2 Related Work: NoSQL Data Warehouse
3 Data Lake: A Big Data Source
4 Mapping into a NoSQL Data Warehouse: Column and Document Oriented
4.1 NoSQL Column-Oriented Model
4.2 NoSQL Document-Oriented Model
4.3 NoSQL Mapping Rules
5 Implementation and Evaluation
6 Conclusion
References
Role of Antenna in Flying Adhoc Networks Communication: Provocation and Open Issues
1 Introduction
2 FANET Antenna Structure
2.1 Problems Caused by Directional Antenna
3 Communication in FANET
4 FANET Design Challenges and Guidelines for Directional Antenna
4.1 Location Estimation
4.2 Antenna Type
4.3 Nodes Mobility
4.4 Adaptability
4.5 Network Latency
5 Conclusion and Open Issues
References
Transmission Over OFDM and SC-FDMA for LTE Systems
1 Introduction
1.1 OFDM
1.2 SC-FDMA
2 Literature Review
3 Methodology
4 Results and Discussion
5 Conclusion
References
A Decision Support System Based on Ontology Learning from PMI’ Project Risk Management
1 Introduction
2 Related Work
3 The Proposed Approach: Enriching PRM-Ontology Based on OL Process
4 The Ontology Enrichment: Results
5 Evaluation of OL Process
6 Conclusion
References
LWOntoRec: Light Weight Ontology Based Novel Diversified Tag Aware Song Recommendation System
1 Introduction
2 Related Work
3 Proposed System Architecture
3.1 Problem Definition
3.2 Proposed System Architecture
4 Implementation
4.1 Dataset Description
5 Results and Performance Evaluation
6 Conclusions
References
An Optimization Model for Scheduling of Households Load Profiles Incorporating Electric Vehicles Charging
1 Introduction
2 Load Profiles and Electricity Tariffs
3 Optimization Model
4 Case Studies and Computational Results
4.1 Case Study 1: Two-Person Household
4.2 Case Study 2: Four-Person Household
4.3 Results Analysis
5 Conclusions
References
An Adaptive MAC Protocol for Wireless Body Area Networks
1 Introduction
2 MAC Protocol Overview
2.1 802.15.4 MAC Protocol
2.2 802.15.6 Standard
3 Problem Statement
4 Related Works
5 Adaptive MAC Protocol for WBAN
5.1 Model Specifications
5.2 Simulation Results
6 Conclusion
References
Deep Learning with Moderate Architecture for Network Intrusion Detection System
1 Introduction
2 Literature Reviews
2.1 Anomaly Detection
2.2 Deep Learning
2.3 Convolutional Neural Network
3 Proposed Methods
3.1 NSL-KDD Dataset
3.2 Methodology
3.3 Evaluation Metrics
4 Evaluation and Results Analysis
5 Conclusion and Future Work
References
Graph Matching in Graph-Oriented Databases
1 Introduction
2 Related Works
3 Data Migration Approach
4 Graph Matching
4.1 Formal Representation of Graph Database
4.2 Levenshtein Edit Distance
4.3 Nodes Similarity
4.4 Relationship Similarity
4.5 Structure Graph Matching
5 Case Study: The LDBC-SNB Benchmark
6 Conclusion
References
A Static Analysis Approach to Detect Confidentiality Leakage of Database Query Languages
1 Introduction
2 Related Works
3 Motivating Example
4 Proposed Work
4.1 DOPDG Construction
4.2 Path Computation
4.3 Refinement
4.4 Security Analysis
4.5 Experimental Results
5 Conclusion
References
Design Thinking Based Ontology Development for Robo-advisors
1 Introduction
2 Robotic Process Automation and Robo Advisors
3 Methodology
4 Development Process
4.1 Creating Fintech Investment Ontology
4.2 Validation of the Fintech Investment Ontology
5 Conclusion
References
Proposal for the Development of a Myoelectrically Controlled Prosthetic Arm Integrated with a Web Interface Management System
1 Introduction
1.1 Problem Description
2 Prosthetic Arm Structure
2.1 Mechanical System
2.2 Electrical System
2.3 Management System
3 Definition of Prosthesis Movements
3.1 Basic Movements
3.2 Kinematic Modeling of the Prosthetic Arm
3.3 Path Planning Using 3rd Degree Splines
4 Results
4.1 Management System
4.2 Electrical and Mechanical Systems
5 Conclusions and Discussions
References
A School Bus Routing and Scheduling Problem with Time Windows and Possibility of Outsourcing with the Provided Service Quality
1 Introduction
2 Literature Review
3 Problem Description
4 Computational Experiments
5 Conclusions and Future Research
References
Cyclotomic Fast Fourier Transform with Reduced Additive Complexity
1 Introduction
2 Cyclotomic Fast Fourier Transform Algorithm
2.1 Galois Field
2.2 Basic Notions and Definitions
3 Common Sub Expression Elimination Algorithm
3.1 Methodology of Common Sub Expression Elimination Algorithm (CSE)
4 Results and Analysis
5 Conclusion
References
Inferring Contextual Data from Real-World Photography
1 Introduction
2 Related Work
3 Framework for Environmental Contextualization - eCAT
3.1 Purpose and Scope
3.2 eCAT Architecture
3.3 Frontend
3.4 Backend
4 Results and Discussion
5 Final Remarks
References
Post Covid-19 Attitude of Consumers Towards Processed Food – A Study Based on Natural Language Processing
1 Introduction
2 Related Work
3 Data Collection and Data Pre-processing
4 Results
References
A Production Scheduling Support Framework
1 Introduction
2 Literature Review
2.1 Production Scheduling
2.2 Meta-heuristics
3 Computational Implementation
3.1 Job-Shop Problem Implementation
3.2 Flow-Shop Problem Implementation
4 Conclusion
References
IK-prototypes: Incremental Mixed Attribute Learning Based on K-Prototypes Algorithm, a New Method
1 Introduction
2 K-Prototypes and Incremental Attribute Learning
2.1 K-Prototypes
2.2 Formulation
2.3 Algorithm
2.4 Incremental Attribute Learning Techniques
3 Proposed Approach
3.1 Incremental K-Prototypes Through Incremental Attribute Learning
3.2 Algorithm
3.3 Merge Process
4 Experimentation
4.1 Framework
4.2 Evaluation Criteria
4.3 Results and Discussion
5 Conclusion
References
Template Matching and Deep CNN-SVM for Online Characters Recognition
1 Introduction
2 Methodology
2.1 Features Extraction Using FARG Matching
2.2 Feature Extraction by Deep CNN-SVM Classifier
2.3 Arabic Character Recognition Using Classifiers Outputs
3 Experimental Results and Discussion
3.1 Data Set and Parameters Tuning
3.2 Results and Discussion
4 Conclusion
References
An Extreme Gradient Boosting Classifier for Predicting Chronic Kidney Disease Stages
1 Introduction
2 Materials and Methods
2.1 Dataset
2.2 Classification Algorithms
3 Computational Experiment
3.1 Classification Scenario
3.2 Experimental Setup
3.3 Results and Discussion
4 Conclusion
References
Land Cover Classification Using Deep Convolutional Neural Networks
1 Introduction
2 DCNN Architectures
2.1 AlexNet
2.2 Vgg-16
3 The Proposed Landsat Image Classification System
3.1 The Study Area
3.2 Data and Pre-processing
4 Data Augmentation
4.1 Methods for Augmentation
5 Experimental Results
6 Conclusions
References
QoE Aware Routing Protocol in Content-Centric Mobile Networks Based on SDN Architecture for Video Streaming Applications
1 Introduction
2 Related Work
2.1 Video Streaming in CCMN
2.2 SDN Support for CCN Forwarding
2.3 SDN Support for CCN Caching
3 QoE Aware Routing Protocol
3.1 System Architecture
3.2 Routing Protocol Principle
3.3 Exploration Phase
3.4 Exploitation Phase
4 Performance Evaluation
4.1 Simulation Setup and Parameters
4.2 The Impact on Expected Bitrate
4.3 The Impact on Bitrate Oscillation
4.4 The Impact on Video Freezing
4.5 The Impact on Required Peak Bandwidth
5 Conclusion
References
DC Link Voltage Control of Stand-Alone PV Tied with Battery Energy Storage System
1 Introduction
2 Proposed System
2.1 PV Array
2.2 Battery Energy Storage System
2.3 Boost Converter
2.4 PV Inverter
3 Control Scheme
3.1 Perturb and Observe MPPT Control
3.2 Battery Energy Storage System Control
3.3 Voltage Source Inverter Control
4 Results and Its Discussion
5 Discharge Characteristics
6 Conclusion
References
Identifying Achievable Goals for Adaptive Replanning Against Runtime Environment Change
1 Introduction
2 Background of Model Formalization
2.1 Labelled Transition Systems and Fluent Linear Temporal Logic
2.2 Two-Player Game and Winning Region
3 Partially Achievable Winning Region Analysis Algorithm
3.1 Definition of Reachability-Safety Game and Partially Achievable Winning Region
3.2 The Overview of Our Proposal
3.3 Partially Achievable Winning Region Analysis Algorithm
4 Evaluation
4.1 Experiment Setting
4.2 Evaluation of Effectiveness
5 Related Work
6 Conclusion and Future Work
References
Transfer Learning for Autonomous Vehicles Obstacle Avoidance with Virtual Simulation Platform
1 Introduction
2 Related Work
2.1 Autonomous Vehicles Simulators
2.2 Transfer Learning
3 Methodology
3.1 VSim-AV: Virtual Simulation Platform for Autonomous Vehicles
3.2 Deep Learning Techniques
4 Experiments, Results and Discussion
4.1 Model Evaluation
4.2 Discussion
5 Conclusion and Future Work
References
Go Green: A Web-Based Approach for Counting Trees from Google Earth Images
1 Introduction
2 Related Work
3 Dataset and Methodology
3.1 Dataset
3.2 Methodology
4 Results and Observations
5 Conclusion
References
Machine Learning Approaches for Human Activity Recognition Based on Multimodal Body Sensors
1 Introduction
2 Related Work
3 Dataset and Methodology
3.1 MHealth Dataset
3.2 Methods
4 Proposed Framework
5 Experiments Implementation
5.1 Data Preparation
5.2 Re-balance Dataset
5.3 Building Machine Learning Classifiers
6 Experiment Analysis and Results
7 Conclusion and Future Work
References
Thumbnail Personalization for Users Based on Genre Preferences
1 Introduction
2 Literature Survey
3 Model and Architecture
3.1 Extraction
3.2 Image Clustering
3.3 Dataset and Classification
4 Results
5 Conclusion
References
OPTrack: A Novel Online People Tracking System
1 Introduction
2 Proposed Method
2.1 Mono-Camera Tracking
2.2 Re-identification
3 Experiments
3.1 Datasets and Metrics Description
3.2 Performance Evaluation of MLSAR Descriptor for Re-identification Problem
3.3 Performance Evaluation of the Spatio-Temporal Features for Person Re-identification Problem
3.4 Online Person Tracking System OPTrack
4 Conclusion
References
A Hybrid Heuristic for a Three-Stage Assembly Flow Shop Scheduling Problem with Sequence Dependent Setup Time
1 Introduction
2 Characteristics of the Problem
3 A Hybrid IG-ILS Heuristic
4 Computational Experiments
4.1 Problems Instances
4.2 Calibration of IG-ILS
4.3 Numerical Results
5 Conclusions
References
Fuzzy-Probabilistic Approach for Dense Wireless Sensor Network
1 Introduction
2 Related Works
3 Methods
4 Results
5 Conclusion
References
Interval-Valued Feature Selection for Classification of Text Documents
1 Introduction
2 Literature Review
3 Proposed Feature Selection Model
3.1 Interval Valued Feature Matrix Construction
3.2 Feature Ranking Method
3.3 Feature Selection
4 Classification
5 Experiment Setup and Results
5.1 Data Sets
5.2 Results and Discussion
6 Conclusion
References
Evaluation of User's Emotional Experience Through Neurological and Physiological Measures in Playing Serious Games
1 Introduction
2 Related Works
3 Experimental Study
3.1 Preparing Experimental Apparatus
3.2 Conducting Experimental Study
3.3 Analyzing Experimental Data
4 Discussion and Conclusions
References
Development of Low Cost Intelligent Tracking System Realized on FPGA Technology for Solar Cell Energy Generation
1 Introduction
2 PV Cell Structural Model
2.1 Single Diode Model
2.2 Dual Diode Model
2.3 Dual Diode Model
3 Modeling and Design
4 Conclusions
References
An Inclusive Survey on Signature Recognition System
1 Introduction
2 General Architecture
3 Comprehensive Study
3.1 Pre-processing
3.2 Feature Extraction
3.3 Signature Recognition and Verification
4 Conclusion
References
A Comprehensive Analysis of Keystroke Recognition System
1 Introduction
2 General Architecture
3 Stages of Keystroke Recognition System
3.1 Data Acquisition
3.2 Feature Extraction and Quality Assessment
3.3 Classification and Matching
3.4 Decision
3.5 Retraining Module
4 Evaluation Metrics
4.1 Effectiveness
5 Comparative Study
6 Conclusion
References
Long Term Stock Market Prediction with Recurrent Neural Network Architectures
1 Introduction
2 Literature Review
3 Methodology
3.1 Recurrent Neural Network Architectures
3.2 ABCD Trading Strategy
3.3 Evaluation Metrics
4 Experiments
4.1 Data
4.2 Stock Price Predictions
4.3 Estimation of Trading Profit
4.4 Results
5 Conclusion
References
Obsolete Information Detection Using a Bayesian Networks Approach
1 Introduction
2 Formal Background
3 Related Works
4 Theoretical Results
4.1 (1-)-Contradiction
4.2 Restricting the (1-)-Contradictory Set
4.3 Decompose the (1-)-Contradictory Set
4.4 Obsolete Information Detection Algorithm
5 Example Case Study
6 Conclusion
References
HCRDL: A Hybridized Approach for Course Recommendation Using Deep Learning
1 Introduction
2 Related Work
3 Proposed System Architecture
4 Implementation
5 Results and Performance Evaluation
6 Conclusions
References
Toward an Intelligent System Architecture for Smart Agriculture: Application to Smart Beehives
1 Introduction
2 Related Work
2.1 Digital Agriculture
2.2 Beekeeping and IoT
3 Analysis with a Spatio-Temporal Matrix
3.1 3 × 3 Matrix
3.2 Use Case on Beehives
3.3 ApiSoft: Purpose-Built Application for Beekeepers
4 Conclusion
References
Ontology-Based Knowledge Description Model for Climate Change
1 Introduction
2 Related Work
3 Ontology Modeling and Knowledge Representation for Climate Change
4 Visualization
5 Ontology Evaluation
6 Conclusions
References
Tuning Hyperparameters on Unbalanced Medical Data Using Support Vector Machine and Online and Active SVM
1 The Support Vector Machine
2 The Online and Active SVM (LASVM)
3 Hyperparameters and Evaluation Metrics
3.1 Tuning Hyperparameters
3.2 Evaluation Metrics
4 Experimentation
5 Conclusion
References
Human Speaker Recognition Based Database Method
1 Introduction
2 Speech Signal and Linear Prediction
3 Linear Prediction
4 Results of Database System Design
5 Conclusions
References
A Survey on Versioning Approaches and Tools
1 Introduction
2 Software and XML Documents Versioning
2.1 Software Versioning
2.2 XML Documents Versioning
3 Database and Data Warehouse Versioning
3.1 Database Versioning
3.2 Data Warehouse Versioning
4 Ontology Versioning
4.1 Ontology Versions Pertinence
4.2 Ontology Versions Relationships
4.3 Ontology Versions Storage and Querying
5 Conclusion and Open Challenges
References
A Non-compensatory Multicriteria Model for Sorting the Influence of PBL Over Professional Skills
1 Introduction
2 Related Previous Works
3 Background on Multicriteria Sorting Methods
4 Modelling and Results
4.1 Setting the Model
4.2 Collecting the Data
4.3 Calculating the Membership Degree to a Category
4.4 Sorting the Skills into Class
4.5 Sorting the Overall Influence of PBL
5 Conclusion
References
Using NER + ML to Automatically Detect Fake News
1 Introduction
2 Related Work
3 Our Approach
4 Experiments and Results
4.1 Datasets
4.2 The Vector Model
4.3 Models
4.4 Gradient Boosting
4.5 Random Forest
4.6 SVM
4.7 ClusWiSARD
4.8 Experiments
4.9 Results
5 Conclusion
References
Diagnosing Coronavirus (COVID-19) Using Various Deep Learning Models: A Comparative Study
1 Introduction
2 Deep Learning Overview
3 Models of CNN
4 Results
4.1 Preprocessing for the Dataset
4.2 Procedures of Preprocessing
4.3 Categorization
4.4 Deep Learning Models
5 Conclusion
References
K-Means and Multicriteria Decision Aid Applied to Sustainability Evaluation
1 Introduction
2 Background
2.1 SSF Index
2.2 ELECTRE – III
2.3 K-Means
3 Methodology
4 Experiments
4.1 Results
5 Conclusion
Appendix
References
A Bottom-Up Approach for Feature Model Extraction from Business Process Models
1 Introduction
2 Related Works
3 Feature Model Identification
3.1 Commonalities and Variabilities Identification
3.2 Features Identification
3.3 Feature Model Construction and Constraints Identification
4 FMr-T: Feature Model Recovery from BPM Tool
5 Conclusion
References
Diabetes Self-management Mobile Apps Improvement Based on Users' Reviews Classification
1 Introduction
2 Background and Literature Review
2.1 Diabetes Mellitus
2.2 DSM Mobile Apps Features
2.3 Users' Opinions Analysis
3 Machine Learning for Users' Reviews Analysis of DSM Mobile Apps
3.1 Data Collection and Cleaning
3.2 Data Preprocessing and Feature Selection
3.3 Machine Learning Classifier
4 Experimental Results
5 Future Research Direction
6 Conclusion and Perspectives
References
A Statistical Based Modeling Approach for Deep Learning Based Speech Emotion Recognition
1 Introduction
2 Speech Emotion Recognition Using Deep Learning
3 CNN Model Implementation
4 Experiments
5 Conclusion
References
rAVA: A Robot for Virtual Support of Learning
1 Introduction
2 The rAVA Architecture
2.1 The rAVA's Knowledge Acquisition Strategy
3 Related Works
4 Experiments and Results
5 Conclusion
References
Finding Entities and Related Facts in Newspaper
1 Introduction
2 The Problem
3 Related Works
4 Proposed Work
4.1 Datasets and Preprocessing
4.2 Entities and Related Facts
4.3 Evaluation
5 Conclusion
References
Empirical Investigation of the Factors Influencing Researchers’ Adoption of Crowdsourcing and Machine Learning
1 Introduction and Problem Description
2 Theory Development, Research Model, and Hypotheses
3 Survey Design and Execution
4 Results and Discussion
4.1 Demographics and Experience
4.2 Technology Acceptance Among Crowdsourcing Researchers
5 Conclusion
References
A Multi-agent Driven Model for Distributed Collaborative Editing
1 Introduction
2 Illustrative Use Cases
2.1 Shared Calendar
2.2 E-Learning
3 Our Agents-Based Coordination Model
3.1 Group Manager Agent
3.2 Data Storage Agent
3.3 User Agent
4 Control Concurrency Behaviors
4.1 Mobile Collaboration
5 Conclusion
References
Loan Charge-Off Prediction Including Model Explanation for Supporting Business Decisions
1 Introduction
2 Research Review
3 Data Acquisition and Engineering
3.1 Feature Description
3.2 Feature Encoding and Transformation
4 Methodology
5 Analysis of the Experimental Results
6 Model Explanation and Decision Support
7 Conclusion
References
Intelligent Traffic Signal Control Based on Reinforcement Learning
1 Introduction
2 Related Work
3 The Proposed Learning Control System
3.1 State Space
3.2 Action Space
3.3 Reward Function
3.4 Reinforcement Q-Learning Approach
3.5 Lifelong Learning Algorithm Based on Q-Learning
4 Experiment Results and Evaluation
4.1 The Simulator and Simulation Environment
4.2 Results and Analysis
5 Conclusion
References
An Iterated Local Search for the Multi-objective Dial-a-Ride Problem
1 Introduction
2 Related Work
3 Problem Description
4 Solution Procedures
4.1 Initial Solution
4.2 Neighborhood Structures
4.3 Local Search
4.4 Perturbation
5 Computational Experiments
5.1 Analysis of Results
6 Conclusions
References
Automated Extreme Learning Machine to Forecast the Monthly Flows: A Case Study at Zambezi River
1 Introduction
2 Methodology
2.1 Streamflow Data and Study Area
2.2 Streamflow Prediction Model
2.3 Extreme Learning Machine (ELM)
2.4 Automated Parameter Tuning Guided by a Genetic Algorithm
3 Computational Experiments and Discussion
4 Conclusion
References
A Novel Fast Algorithm for Mining Compact High Utility Itemsets
1 Introduction
2 Related Works
3 Our Approach
3.1 Structure of the Proposed List
3.2 Our Algorithm for Mining Compact HUIs
3.3 Discussion
4 Experiments
4.1 Running Time
4.2 Memory Consumption
5 Conclusion
References
Near-Optimal Data Communication Between Unmanned Aerial and Ground Vehicles
1 Introduction
1.1 Motivation
1.2 Related Work
1.3 Our Main Contribution
1.4 Organization of This Paper
2 System Model
3 Efficiency of UROP
3.1 UROP
3.2 Efficiency Bounds of UROP
3.3 Efficiency of UROP Through Finite Time Horizons
4 Numerical Results
5 Conclusion
References
Isolated Kannada Character Recognition Using Transfer Learning
1 Introduction
2 Implementation Details
2.1 Effect on Recognition by Retraining a Last One Layer of Inception V3
2.2 Effect on Recognition by Fine-Tuning the Inception-V3 Model Using Intermediate Dataset
2.3 Effect on Recognition by Retraining the Last Three Layers of Inception V3
2.4 Effect on Prediction Using Inception-V3 Model Without ImageNet Dataset Weights
3 Discussion and Conclusions
References
Comparison of Machine Learning Algorithms and Ensemble Technique for Heart Disease Prediction
1 Introduction
2 Research Review
3 Methodology
4 Experimental Setup
5 Results and Discussion
6 Conclusion
References
Genetic Search Wrapper-Based Naïve Bayes Anomaly Detection Model for Fog Computing Environment
1 Introduction
2 Related Works
3 Proposed GSWNB Model
3.1 Dataset Description
3.2 Genetic Search Algorithm Implementation Phase
3.3 Classification Phase – Naïve Bayes Algorithm
4 Experimental Setup
5 Results and Discussion
5.1 Testing and Performance Evaluation
5.2 Overall Performance of the Proposed Model
5.3 Comparison of the Proposed Model With Other Algorithms
5.4 Proposed GSWNB Approach and Other Feature Selection Techniques
5.5 Comparison of Full Dataset Features with Selected Dataset Features
5.6 Comparison of the GSWNB Model with Results from Previous Research Papers
6 Conclusion and Future Work
References
Antlion Optimization-Based Feature Selection Scheme for Cloud Intrusion Detection Using Naïve Bayes Algorithm
1 Introduction
2 Related Work
3 Proposed CIDS Model
3.1 Naïve Bayes Classifier
3.2 Ant Lion Optimization (ALO) Algorithm
3.3 Dataset Description
4 Evaluation of Proposed CIDS
4.1 Experimental Setup
4.2 Result and Discussion
4.3 Accuracy
4.4 False Positive Rate (FPR)
4.5 Recall
4.6 Precision
4.7 Kappa Statistics
5 Conclusion
References
Real-Time Content-Based Cyber Threat Detection with Machine Learning
1 Introduction
2 Related Works
3 Phishing Website Content
3.1 Content Analysis and Features
3.2 Dataset
3.3 Machine Learning Algorithms
3.4 K-Fold Approach
4 Experimental Results
5 Conclusion and Future Works
References
Hybrid Approach to Define Semantic Relationships
1 Introduction
2 Semantic Relationships
2.1 Semantic Relationships by Linguistic Approaches
2.2 Semantic Relationships by Statistical Approaches
2.3 Semantic Relationships by Hybrid Approaches
3 Proposed Hybrid Approach for Automatic Definition of Semantics Relationships
3.1 Definition of Relationships by MeSH
3.2 Definition of Relationships by Word2Vec
3.3 Hybrid Definition of Relations: MeSH + Word2Vec
4 Evaluation
4.1 1000 Documents
4.2 Results
5 Summary and Future Works
References
Author Index
Advances in Intelligent Systems and Computing 1351
Ajith Abraham · Vincenzo Piuri · Niketa Gandhi · Patrick Siarry · Arturas Kaklauskas · Ana Madureira, Editors
Intelligent Systems Design and Applications 20th International Conference on Intelligent Systems Design and Applications (ISDA 2020) held December 12–15, 2020
Advances in Intelligent Systems and Computing Volume 1351
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing, Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University, Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro, Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications on theory, applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all disciplines such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the areas of modern intelligent systems and computing such as: computational intelligence, soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms, social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and society, cognitive science and systems, Perception and Vision, DNA and immune based systems, self-organizing and adaptive systems, e-Learning and teaching, human-centered and human-centric computing, recommender systems, intelligent control, robotics and mechatronics including human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent data analysis, knowledge management, intelligent agents, intelligent decision making and support, intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia.

The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, both of a foundational and applicable character. An important characteristic feature of the series is the short publication time and world-wide distribution. This permits a rapid and broad dissemination of research results.

Indexed by DBLP, EI Compendex, INSPEC, WTI Frankfurt eG, zbMATH, Japanese Science and Technology Agency (JST). All books published in the series are submitted for consideration in Web of Science.
More information about this series at http://www.springer.com/series/11156
Ajith Abraham · Vincenzo Piuri · Niketa Gandhi · Patrick Siarry · Arturas Kaklauskas · Ana Madureira
Editors
Intelligent Systems Design and Applications 20th International Conference on Intelligent Systems Design and Applications (ISDA 2020) held December 12–15, 2020
Editors
Ajith Abraham, Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Vincenzo Piuri, Department of Computer Science, Università degli Studi di Milano, Milan, Italy
Niketa Gandhi, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Patrick Siarry, Campus Centre de Créteil, Université Paris-Est Créteil, Créteil, France
Arturas Kaklauskas, Department of Construction Management and Real Estate, Vilnius Gediminas Technical University, Vilnius, Lithuania
Ana Madureira, School of Engineering, Instituto Superior de Engenharia do Porto, Porto, Portugal
ISSN 2194-5357    ISSN 2194-5365 (electronic)
Advances in Intelligent Systems and Computing
ISBN 978-3-030-71186-3    ISBN 978-3-030-71187-0 (eBook)
https://doi.org/10.1007/978-3-030-71187-0

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Welcome to the 20th International Conference on Intelligent Systems Design and Applications (ISDA’20), held on the World Wide Web. ISDA’20 is hosted and sponsored by the Machine Intelligence Research Labs (MIR Labs), USA. ISDA’20 brings together researchers, engineers, developers and practitioners from academia and industry working in all interdisciplinary areas of computational intelligence and system engineering to share their experience and to exchange and cross-fertilize their ideas. The aim of ISDA’20 is to serve as a forum for the dissemination of state-of-the-art research, development and implementations of intelligent systems, intelligent technologies and useful applications in these two fields.

ISDA’20 received submissions from 40 countries; each paper was reviewed by at least five reviewers, and based on the outcome of the review process, 130 papers were accepted for inclusion in the conference proceedings (38% acceptance rate). First, we would like to thank all the authors for submitting their papers to the conference and for their presentations and discussions during the conference. Our thanks go to the program committee members and reviewers, who carried out the most difficult work by carefully evaluating the submitted papers. Our special thanks to the following plenary speakers for their exciting plenary talks:
• Prof. Dr. Sushmita Mitra, Indian Statistical Institute, Kolkata, India
• Prof. Dr. Bassem Jarboui, Sfax University, Tunisia
• Prof. Dr. Tzung-Pei Hong, National University of Kaohsiung, Taiwan
• Prof. Dr. André Rossi, Université Paris-Dauphine, France
• Prof. Dr. Amine Nait-Ali, University Paris-Est Créteil, France
• Dr. Rudy A. Oude Vrielink, University of Twente & TwiQel, Netherlands
• Prof. Dr. Frederico Gadelha Guimaraes, Federal U. of Minas Gerais, Brazil
• Prof. Dr. Fakhri Karray, University of Waterloo, Canada
• Prof. Dr. Saman K. Halgamuge, University of Melbourne, Australia
• Prof. Dr. Salvador García, University of Granada, Granada, Spain
• Prof. Dr. Vincenzo Piuri, Università degli Studi di Milano, Italy
• Prof. Dr. Dries F. Benoit, Ghent University, Belgium
• Prof. Dr. Dijiang Huang, Arizona State University, USA
• Prof. Dr. Ketan Kotecha, Symbiosis Institute of Technology, India
• Prof. Dr. Sheela Ramanna, University of Winnipeg, Canada

We express our sincere thanks to the organizing committee chairs for helping us to formulate a rich technical program. Enjoy reading the articles!
Organization
General Chairs
Ajith Abraham, Machine Intelligence Research Labs, USA
Vincenzo Piuri, University of Milan, Italy
Patrick Siarry, Université Paris-Est Créteil, France
Program Chairs
Ana Madureira, ISEP, Porto, Portugal
Arturas Kaklauskas, Vilnius Gediminas Technical University, Lithuania
Publication Chairs
Niketa Gandhi, Machine Intelligence Research Labs, USA
Kun Ma, University of Jinan, China
Publicity Chairs
Jyotshna Dongardive, University of Mumbai, India
Pooja Manghirmalani Mishra, University of Mumbai, India
International Program Committee Ayush Goyal Azah Muda Bruno Cunha Bruno Silva Carlos Pereira
Texas A&M University, USA UTeM, Malaysia Instituto Superior de Engenharia do Porto, Portugal Federal University of Ceará, Brasil ISEC, Portugal
vii
viii
Catarina I. Reis Dalia Kriksciuniene David Herrera-Sánchez Efrén Mezura-Montes Elizabeth Goldbarg Francisco Chicano Gerard Deepak Giner Alor-Hernández Gustavo Adolfo Vargas Hakim Hudson Geovane de Medeiros Isabel S. Jesus Islame Felipe Da Costa Fernandes Issa Atoum Jolanta Mizera-Pietraszko José Everardo Bessa Maia José Raúl Romero José-Clemente Hernández-Hernández Juan-Antonio Rodríguez-de-la-Cruz Jyotshna Dongardive Kaushik Das Sharma Kun Ma Lavika Goel Lee Chang-Yong Mahendra Kanojia Mansi Sharma Mario Giovanni C.A. Cimino Matheus Menezes Mauricio Ayala-Rincón Niketa Gandhi Nuno Bettencourt Omar Roríguez López Ons Aouedi Oscar Castillo Patrick Siarry Paulo Asconavieta
Organization
Polytechnic of Leiria, Portugal Vilnius University, Lithuania University of Veracruz, Mexico University of Veracruz, Mexico Universidade Federal do Rio Grande do Norte, Brazil Universidad de Málaga, Spain National Institute of Technology, India Instituto Tecnológico de Orizaba, México University of Veracruz, Mexico Federal University of Rio Grande do Norte, Brazil Instituto Superior de Engenharia do Porto, Portugal Federal University of Rio Grande do Norte, Brazil University of Malaysia Sarawak, Malaysia Opole University, Poland State University of Ceará, Brazil University of Cordoba, Spain University of Veracruz, Mexico Universidad Veracruzana, Mexico University of Mumbai, India University of Calcutta, India University of Jinan, China Malaviya National Institute of Technology, India Kongju National University, South Korea Sheth L.U.J and Sir M.V. College, India Indian Institute of Technology Delhi, India University of Pisa, Italy Universidade Federal Rural do Semi-Árido, Brazil Universidade de Brasilia, Brazil Machine Intelligence Research Labs (MIR Labs), USA University of Porto, Portugal University of Veracruz, Mexico University of Nantes, France Tijuana Institute Technology, Mexico Université Paris-Est Créteil, France Instituto Federal Sul-rio-grandense - Pelotas, Brazil
Pooja Manghirmalani Mishra, University of Mumbai, India
Radu-Emil Precup, Politehnica University of Timisoara, Romania
Rafael Barbudo Lunar, University of Cordoba, Spain
Raghunandan Reddy Alugubelli, H. Lee Moffitt Cancer Center & Research Institute, USA
Romerito Andrade, Federal University of Rio Grande do Norte, Brazil
Sidemar Fideles Cezario, Federal University of Rio Grande do Norte, Brazil
Subodh Deolekar, REDx Innovation Lab WeSchool, India
Thiago Soares Marques, Federal University of Juiz de Fora, Brazil
Virgilijus Sakalauskas, Vilnius University, Lithuania
Contents
Intelligent Automation Systems at the Core of Industry 4.0 . . . . . . . . . . 1
Amit Kumar Tyagi, Terrance Frederick Fernandez, Shashvi Mishra, and Shabnam Kumari
Hybrid Extreme Learning Machine and Backpropagation with Adaptive Activation Functions for Classification Problems . . . . . . . . . . 19
T. L. Fonseca and L. Goliatt
Assessing an Organization Security Culture Based on ENISA Approach . . . . . . . . . . 30
Wasnaa Kadhim Jawad
Evaluation of Information Security Policy for Small Company . . . . . . . . . . 39
Wasnaa Kadhim Jawad
English-Hindi Cross Language Query Translation and Disambiguation Using Most Salient Seed Word . . . . . . . . . . 49
Pratibha Maurya
Relay UAV-Based FSO Communications over Log-Normal Channels with Pointing Errors . . . . . . . . . . 59
Ha Duyen Trung and Nguyen Huu Trung
A Novel Generalized Form of Cure Rate Model for an Infectious Disease with Co-infection . . . . . . . . . . 69
Oluwafemi Samson Balogun, Sunday Adewale Olaleye, Xiao-Zhi Gao, and Pekka Toivanen
A2PF: An Automatic Protein Production Framework . . . . . . . . . . 80
Mohamed Hachem Kermani and Zizette Boufaida
Open Vocabulary Recognition of Offline Arabic Handwriting Text Based on Deep Learning . . . . . . . . . . 92
Zouhaira Noubigh, Anis Mezghani, and Monji Kherallah
Formal Verification of Safety-Critical Systems: A Case-Study in Airbag System Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 Susmita Guha, Akash Nag, and Rahul Karmakar Classification of Musical Preference in Generation Z Through EEG Signal Processing and Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . 117 Billy Ward, Chandresh Pravin, Alec Chetcuti, Yoshikatsu Hayashi, and Varun Ojha A Locally Weighted Metric for Measuring the Perceptual Quality of 3D Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 Nessrine Elloumi, Johan Debayle, Habiba Loukil, and Med Salim Bouhlel Transfer Learning for Instance Segmentation of Waste Bottles Using Mask R-CNN Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140 Punitha Jaikumar, Remy Vandaele, and Varun Ojha ImageFuse: A Multi-view Image Featurization Framework for Visual Question Answering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 Sruthy Manmadhan and Binsu C. Kovoor Cooperative Advanced Driver Assistance Systems: A Survey and Recent Trends . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160 Islam Elleuch, Achraf Makni, and Rafik Bouaziz Amended Convolutional Neural Network with Global Average Pooling for Image Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171 Aiman Al-Sabaawi, Hassan M. Ibrahim, Zinah Mohsin Arkah, Muthana Al-Amidie, and Laith Alzubaidi Employment of Pre-trained Deep Learning Models for Date Classification: A Comparative Study . . . . . . . . . . . . . . . . . . . . . . . . . . . 181 Aiman Al-Sabaawi, Reem Ibrahim Hasan, Mohammed A. Fadhel, Omran Al-Shamma, and Laith Alzubaidi A Study on Evolutionary Algorithms to Reopen Organizations Safely During COVID-19 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190 Ashi Gala, Nilakshi Jain, Srikanth Kodeboyina, and Ramesh Menon Learning Spherical Word Vectors for Opinion Mining and Applying on Hotel Reviews . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200 Sandra Rizkallah, Amir F. Atiya, and Samir Shaheen Towards Developing a Hospital Cabin Management System Using Brain Computer Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212 Md Shadman Aadeeb, Md. Mahadi Hassan Munna, Md. Raqibur Rahman, and Muhammad Nazrul Islam
A Discriminant Function Controller for Elevator Groups Evolved by Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225 André Luis Ferreira Sá and José E. B. Maia Keyframe Extraction Using Sobel Fuzzified Weighted Approach . . . . . . 236 H. M. Nandini, H. K. Chethan, and B. S. Rashmi Deep Learning with Real-Time Inference for Human Detection in Search and Rescue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247 Raúl Llasag Rosero, Carlos Grilo, and Catarina Silva Sliding Mode Control of Magnetic Suspended Impeller for Artificial Heart Pump . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258 Saleem Khalefa Kadhim, Amjad J. Humaidi, and Ahmed Sharhan Gataa Design and Construction of a Wireless Controlled Fire Extinguishing Robot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273 Md. Milon Uddin, Imran Bin Jafar, Kanij Raihana, Tanwy Barua, and Mohammed Belal Hossain Bhuian Investigating Data Distribution for Classification Using PSO with Adversarial Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283 Xiaojing Zhang, Jiawei Fan, Lin Wang, and Bo Yang Study of Various Web-based Applications Performance in Natif IPv4 and IPv6 Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292 Khalid E. L. Khadiri, Ouidad Labouidya, Najib E. L. Kamoun, Rachid Hilal, Fatima Lakrami, and Chaimaa Belbergui A Phase Memory Controller for Isolated Intersection Traffic Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302 Nator J. C. da Costa and José Everardo Bessa Maia Multiple Face Recognition Using Self-adaptive Differential Evolution and ORB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312 Guilherme Costa, Rafael Stubs Parpinelli, and Chidambaram Chidambaram Low-Cost, Ultrasound-Based Support System for the Visually Impaired . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322 Manuel Ayala-Chauvin, Patricio Lara-Alvarez, Jorge Peralta, and Albert de la Fuente-Morato SD-CCN Architecture to Improve QoE for Video Streaming Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333 Amna Fekih, Sonia Gaied Fantar, and Habib Youssef
Dijkstra and A* Algorithms for Global Trajectory Planning in the TurtleBot 3 Mobile Robot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346 Pedro Medeiros de Assis Brasil, Fabio Ugalde Pereira, Marco Antonio de Souza Leite Cuadros, Anselmo Rafael Cukla, and Daniel Fernando Tello Gamarra A Data Mining Framework for Response Modelling in Direct Marketing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357 Fátima Rodrigues and Tiago Oliveira WakeMeUp: Weight Dependent Intelligent and Robust Alarm System Using Load Cells . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367 Omar Bin Samin, Sumaira Imtiaz, Maryam Omar, Noman Naseeb, and Samad Ali Shah Visual Data Mining: A Comparative Analysis of Selected Datasets . . . . 377 Ujunwa Mgboh, Blessing Ogbuokiri, George Obaido, and Kehinde Aruleba NoSQL Big Data Warehouse: Review and Comparison . . . . . . . . . . . . . 392 Senda Bouaziz, Ahlem Nabli, and Faiez Gargouri ICN Based DOS Attack Mitigation in Vehicle Adhoc Networks . . . . . . . 402 Mahesh R. Patil and Loganathan Agilandeeswari From Machine Learning to Deep Learning for Detecting Abusive Messages in Arabic Social Media: Survey and Challenges . . . . . . . . . . . 411 Salma Abid Azzi and Chiraz Ben Othmane Zribi A-Part: Top-Down Clustering Approach for Mesh of Clusters FPGA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425 Khouloud Bouaziz, Sonda Chtourou, Zied Marrakchi, Abdulfattah M. Obeid, and Mohamed Abid In-Car State Classification with RGB Images . . . . . . . . . . . . . . . . . . . . . 435 Pedro Faria, Sandra Dixe, João Leite, Sahar Azadi, José Mendes, Jaime C. Fonseca, and João Borges Designing and Developing Graphical User Interface for the MultiChain Blockchain: Towards Incorporating HCI in Blockchain . . . 446 Tani Hossain, Tasniah Mohiuddin, A. M. Shahed Hasan, Muhammad Nazrul Islam, and Syed Akhter Hossain Novel Martingale Approaches for Change Point Detection . . . . . . . . . . 457 Jonathan Etumusei, Jorge Martinez Carracedo, and Sally McClean Development of a Reinforcement Learning System to Solve the Job Shop Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468 Bruno Cunha, Ana Madureira, and Benjamim Fonseca
“One vs All” Classifier Analysis for Multi-label Movie Genre Classification Using Document Embedding . . . . . . . . . . . . . . . . . . . . . . . 478 Sonia Guehria, Habiba Belleili, Nabiha Azizi, and Samir Brahim Belhaouari Analysis of the Superpixel Slic Algorithm for Increasing Data for Disease Detection Using Deep Learning . . . . . . . . . . . . . . . . . . . . . . 488 Luiz Daniel Garay Trindade, Fábio Paulo Basso, Elder de Macedo Rodrigues, Maicon Bernardino, Daniel Welfer, and Daniel Müller Evaluating Preprocessing Techniques in Identifying Fake News . . . . . . 498 Matheus Marinho, Carmelo J. A. Bastos-Filho, and Anthony Lins Covid-19 and the Pulmonary Immune System Fight: A Game Theory Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 508 Said Lakhal and Zouhair Guennoun IAS: Intelligent Attendance System Based on Hybrid Criteria Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519 Fanglve Zhang, Jia Yu, and Kun Ma Energy-Based Comparison for Workflow Task Clustering Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526 Youssef Saadi, Abdelhalim Hnini, Soufiane Jounaidi, and Hicham Zougah Fuzzy System for Facial Emotion Recognition . . . . . . . . . . . . . . . . . . . . 536 Kanika Gupta, Megha Gupta, Jabez Christopher, and Vasan Arunachalam Towards an Approach of the Contagion Curve for COVID-19 in Mexico . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553 M. Beatriz Bernábe-Loranca, Rogelio González-Velázquez, Erika Granillo-Martínez, Jorge A. Ruiz-Vanoye, and Alberto Carrillo Canan A Review on Existing Methods and Classification Algorithms Used for Sex Determination of Silkworm in Sericulture . . . . . . . . . . . . . . . . . 567 Sania Thomas and Jyothi Thomas Machine Learning-Based Big Data Analytics Framework for Ebola Outbreak Surveillance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 580 Oluwafemi A. Sarumi Comparison of Different Processing Methods of Joint Coordinates Features for Gesture Recognition with a CNN in the MSRC-12 Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 590 Júlia Schubert Peixoto, Miguel Pfitscher, Marco Antonio de Souza Leite Cuadros, Daniel Welfer, and Daniel Fernando Tello Gamarra
Solving the Job Shop Scheduling Problem with Reinforcement Learning: A Statistical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 600 Bruno Cunha, Ana Madureira, and Benjamim Fonseca Arabic Handwritten Recognition System Using Deep Convolutional Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 610 Safa Jraba, Mohamed Elleuch, and Monji Kherallah A Daily Production Planning Model Considering Flexibility of the Production Line Under Uncertainty: A Case Study . . . . . . . . . . . 621 Mohammad Sanjari-Parizi, Ali Navaei, Ajith Abraham, and Seyed Ali Torabi Academic Records: A Feasible Use Case for Blockchain? . . . . . . . . . . . 635 Rita Oliveira, Catarina I. Reis, and Marisa Maximiano Optimized NASNet for Diagnosis of COVID-19 from Lung CT Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647 Subrato Bharati, Prajoy Podder, M. Rubaiyat Hossain Mondal, and Niketa Gandhi Closed-Loop Active Model Diagnosis Using Bhattacharyya Coefficient: Application to Automated Visual Inspection . . . . . . . . . . . . 657 Jacques Noom, Nguyen Hieu Thao, Oleg Soloviev, and Michel Verhaegen Robust Review on Privacy and Security Issues in Present Day Attacks of Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 668 P. S. S. Nitin and T. Sudha Blindophile: Mobile Assistive Gesture-Empowered Ubiquitous Input Device . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 677 Ishita Agarwal, Udit Kumar, Rachit Jain, Ruchika Chugh, and Ajith Abraham Analysis and Simulation for Mobile Ad Hoc Network Using QualNet Simulator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 689 Mohammed Salah Abood, Mustafa Maad Hamdi, Ahmed Shamil Mustafa, Yasir Abdalhamed Najem, Sami Abduljabbar Rashid, and Imad Jalal Saeed Social Media Data Integration: From Data Lake to NoSQL Data Warehouse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 701 Hichem Dabbèchi, Nahla Zaaboub Haddar, Haytham Elghazel, and Kais Haddar Role of Antenna in Flying Adhoc Networks Communication: Provocation and Open Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 711 Ashish Srivastava and Jay Prakash
Transmission Over OFDM and SC-FDMA for LTE Systems . . . . . . . . . 722 Mustafa Maad Hamdi and Mohammed Salah Abood A Decision Support System Based on Ontology Learning from PMI’ Project Risk Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 732 Wiem Zaouga and Latifa Ben Arfa Rabai LWOntoRec: Light Weight Ontology Based Novel Diversified Tag Aware Song Recommendation System . . . . . . . . . . . . . . . . . . . . . . . . . . 743 Saicharan Gadamshetti, Gerard Deepak, and A. Santhanavijayan An Optimization Model for Scheduling of Households Load Profiles Incorporating Electric Vehicles Charging . . . . . . . . . . . . . . . . . . . . . . . . 753 Pedro Barros, Adelaide Cerveira, and José Baptista An Adaptive MAC Protocol for Wireless Body Area Networks . . . . . . . 764 Douma Ferdawss and Rafik Braham Deep Learning with Moderate Architecture for Network Intrusion Detection System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 774 Safa Mohamed and Ridha Ejbali Graph Matching in Graph-Oriented Databases . . . . . . . . . . . . . . . . . . . 784 Soumaya Boukettaya, Ahlem Nabli, and Faiez Gargouri A Static Analysis Approach to Detect Confidentiality Leakage of Database Query Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 794 Angshuman Jana Design Thinking Based Ontology Development for Robo-advisors . . . . . 805 Ildikó Szabó, Gábor Neusch, and Réka Vas Proposal for the Development of a Myoelectrically Controlled Prosthetic Arm Integrated with a Web Interface Management System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 818 J. C. Barbon, J. Dellagostin, M. E. Ribeiro, L. Bortoncello, G. Vaccari, R. Sales, G. Salvador, A. F. Carneiro, A. R. Cukla, and B. Rossato A School Bus Routing and Scheduling Problem with Time Windows and Possibility of Outsourcing with the Provided Service Quality . . . . . 829 Mohammad Reza Sayyari, Reza Tavakkoli-Moghaddam, Ajith Abraham, and Nastaran Oladzad-Abbasabady Cyclotomic Fast Fourier Transform with Reduced Additive Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 840 Tejaswini Panse, Prashant Deshmukh, Monica Kalbande, and Yashika Gaidhani Inferring Contextual Data from Real-World Photography . . . . . . . . . . . 853 Tiago S. Costa, Maria Teresa Andrade, and Paula Viana
Post Covid-19 Attitude of Consumers Towards Processed Food – a Study Based on Natural Language Processing . . . . . . . . . . . . . . . . . . 863 S. V. Praveen and Rajesh Ittamalla A Production Scheduling Support Framework . . . . . . . . . . . . . . . . . . . . 869 Paula Reis, André S. Santos, João Bastos, Ana M. Madureira, and Leonilde R. Varela IK-prototypes: Incremental Mixed Attribute Learning Based on K-Prototypes Algorithm, a New Method . . . . . . . . . . . . . . . . . . . . . . 880 Siwar Gorrab and Fahmi Ben Rejab Template Matching and Deep CNN-SVM for Online Characters Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 891 Rabiaa Zitouni, Hala Bezine, and Najet Arous A Extreme Gradient Boosting Classifier for Predicting Chronic Kidney Disease Stages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 901 João P. Scoralick, Gabriele C. Iwashima, Fernando A. B. Colugnati, Leonardo Goliatt, and Priscila V. S. Z. Capriles Land Cover Classification Using Deep Convolutional Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 911 Reham Gharbia, Nour Eldeen M. Khalifa, and Aboul Ella Hassanien QoE Aware Routing Protocol in Content-Centric Mobile Networks Based on SDN Architecture for Video Streaming Applications . . . . . . . 921 Amna Fekih, Sonia Gaied Fantar, and Habib Youssef DC Link Voltage Control of Stand-Alone PV Tied with Battery Energy Storage System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 934 Pallavi Verma, Priya Mahajan, and Rachana Garg Identifying Achievable Goals for Adaptive Replanning Against Runtime Environment Change . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 945 Jialong Li, Kenji Tei, and Shinichi Honiden Transfer Learning for Autonomous Vehicles Obstacle Avoidance with Virtual Simulation Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 956 Leila Haj Meftah and Rafik Braham Go Green: A Web-Based Approach for Counting Trees from Google Earth Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 966 Wani Bhavesh Gajendra Machine Learning Approaches for Human Activity Recognition Based on Multimodal Body Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 977 Ghada Gamal, Yasser M. K. Omar, and Fahima A. Maghraby
Thumbnail Personalization for Users Based on Genre Preferences . . . . 988 Hrithik Jha, Abhay Patel, and Aswani Kumar Cherukuri OPTrack: A Novel Online People Tracking System . . . . . . . . . . . . . . . . 997 Mayssa Frikha, Emna Fendri, and Mohamed Hammami A Hybrid Heuristic for a Three-Stage Assembly Flow Shop Scheduling Problem with Sequence Dependent Setup Time . . . . . . . . . . 1007 Saulo C. Campos, José Elias Claudio Arroyo, and Matheus Freitas Fuzzy-Probabilistic Approach for Dense Wireless Sensor Network . . . . 1018 Flávio R. S. Nunes, Crislânio de S. Macêdo, Jéferson do N. Soares, Haniel G. Cavalcante, Marcelo Q. L. Brilhante, and José E. B. Maia Interval-Valued Feature Selection for Classification of Text Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1028 N. Vinay Kumar, K. Swarnalatha, D. S. Guru, and B. S. Anami Evaluation of User’s Emotional Experience Through Neurological and Physiological Measures in Playing Serious Games . . . . . . . . . . . . . . 1039 Tarannum Zaki, Nafiz Imtiaz Khan, and Muhammad Nazrul Islam Development of Low Cost Intelligent Tracking System Realized on FPGA Technology for Solar Cell Energy Generation . . . . . . . . . . . . 1051 Alaa Hamza Omran, Mohammed Salih Mahdi, Ahmed A. Hashim, and G. H. Abdul-Majeed An Inclusive Survey on Signature Recognition System . . . . . . . . . . . . . . 1065 L. Agilandeeswari, Yerru Nanda Krishna Arun, Chikkala Nikhil, Suri Koushmitha, and A. Chaithanya A Comprehensive Analysis of Keystroke Recognition System . . . . . . . . 1074 L. Agilandeeswari, V. Ragul, S. Syed Mohammed Nihal, and M. Rahaman Khan Long Term Stock Market Prediction with Recurrent Neural Network Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1084 Emil Kalbaliyev and Gabor Szegedi Obsolete Information Detection Using a Bayesian Networks Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1094 Salma Chaieb, Ali Ben Mrad, Brahim Hnich, and Véronique Delcroix HCRDL: A Hybridized Approach for Course Recommendation Using Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1105 N. Roopak, Gerard Deepak, and A. Santhanavijayan Toward an Intelligent System Architecture for Smart Agriculture: Application to Smart Beehives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1114 Jean-Charles Huet, Lamine Bougueroua, Yassine Kriouile, and Alain Moretto
Ontology-Based Knowledge Description Model for Climate Change . . . 1124 Deepak Surya, Gerard Deepak, and A. Santhanavijayan Tuning Hyperparameters on Unbalanced Medical Data Using Support Vector Machine and Online and Active SVM . . . . . . . . . . . . . . 1134 Walid Ksiaa, Fahmi Ben Rejab, and Kaouther Nouira Human Speaker Recognition Based Database Method . . . . . . . . . . . . . . 1145 Ahmed Samit Hatem, Muthanna J. Adulredhi, Ali M. Abdulrahman, and Mohammed A. Fadhel A Survey on Versioning Approaches and Tools . . . . . . . . . . . . . . . . . . . 1155 Leila Bayoudhi, Najla Sassi, and Wassim Jaziri A Non-compensatory Multicriteria Model for Sorting the Influence of PBL Over Professional Skills . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1165 Helder Gomes Costa and Anabela Carvalho Alves Using NER + ML to Automatically Detect Fake News . . . . . . . . . . . . . . 1176 Marcos A. Spalenza, Elias de Oliveira, Leopoldo Lusquino-Filho, Priscila M. V. Lima, and Felipe M. G. França Diagnosing Coronavirus (COVID-19) Using Various Deep Learning Models: A Comparative Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1188 Omran Al-Shamma, Mohammed A. Fadhel, Laith Alzubaidi, Laith Farhan, and Muthana Al-Amidie K-Means and Multicriteria Decision Aid Applied to Sustainability Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1198 Rafaela Lima Santos de Souza and Helder Gomes Costa A Bottom-Up Approach for Feature Model Extraction from Business Process Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1209 Jihen Maâzoun, Mariem Mefteh, Nadia Bouassida, and Mouna Belmabrouk Diabetes Self-management Mobile Apps Improvement Based on Users’ Reviews Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1219 Najwa Benalaya, Mariem Haoues, and Asma Sellami A Statistical Based Modeling Approach for Deep Learning Based Speech Emotion Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1230 Sara Sekkate, Mohammed Khalil, and Abdellah Adib rAVA: A Robot for Virtual Support of Learning . . . . . . . . . . . . . . . . . . 1238 Elias de Oliveira, Marcos Spalenza, and Juliana P. C. Pirovani Finding Entities and Related Facts in Newspaper . . . . . . . . . . . . . . . . . 1248 Jaimel de Oliveira Lima, Cristiano da Silveira Colombo, Flávio Izo, Elias Oliveira, and Claudine Badué
Empirical Investigation of the Factors Influencing Researchers’ Adoption of Crowdsourcing and Machine Learning . . . . . . . . . . . . . . . 1257 António Correia, Daniel Schneider, Shoaib Jameel, Hugo Paredes, and Benjamim Fonseca A Multi-agent Driven Model for Distributed Collaborative Editing . . . . 1271 Moulay Driss Mechaoui Loan Charge-Off Prediction Including Model Explanation for Supporting Business Decisions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1281 Abdullah Al Imran and Md Nur Amin Intelligent Traffic Signal Control Based on Reinforcement Learning . . . 1292 Natalija Dokic, Miroljub Tomic, Jasmina Stevic, Dunja Dokic, and Ning Xiong An Iterated Local Search for the Multi-objective Dial-a-Ride Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1302 Alba Assis Campos and André Renato Sales Amaral Automated Extreme Learning Machine to Forecast the Monthly Flows: A Case Study at Zambezi River . . . . . . . . . . . . . . . . . . . . . . . . . 1314 A. D. Martinho, T. L. Fonseca, and L. Goliatt A Novel Fast Algorithm for Mining Compact High Utility Itemsets . . . . 1325 Nong Thi Hoa and Nguyen Van Tao Near-Optimal Data Communication Between Unmanned Aerial and Ground Vehicles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1336 Omer Melih Gul Isolated Kannada Character Recognition Using Transfer Learning . . . . 1348 Pratiksha R. Reddy and H. R. Mamatha Comparison of Machine Learning Algorithms and Ensemble Technique for Heart Disease Prediction . . . . . . . . . . . . . . . . . . . . . . . . . 1360 Ritu Aggarwal and Saurabh Pal Genetic Search Wrapper-Based Naïve Bayes Anomaly Detection Model for Fog Computing Environment . . . . . . . . . . . . . . . . . . . . . . . . 1371 John Oche Onah, Shafi’i Muhammad Abdulhamid, Sanjay Misra, Mayank Mohan Sharma, Nadim Rana, and Jonathan Oluranti Antlion Optimization-Based Feature Selection Scheme for Cloud Intrusion Detection Using Naïve Bayes Algorithm . . . . . . . . . . . . . . . . . 1383 Haruna Atabo Christopher, Shafi’i Muhammad Abdulhamid, Sanjay Misra, Isaac Odun-Ayo, and Mayank Mohan Sharma Real-Time Content-Based Cyber Threat Detection with Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1394 Emre Kocyigit, Mehmet Korkmaz, Ozgur Koray Sahingoz, and Banu Diri
Hybrid Approach to Define Semantic Relationships . . . . . . . . . . . . . . . . 1404 Nesrine Ksentini, Siwar Zayani, Mohamed Tmar, and Faiez Gargouri Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1415
Intelligent Automation Systems at the Core of Industry 4.0
Amit Kumar Tyagi1,2, Terrance Frederick Fernandez3, Shashvi Mishra2, and Shabnam Kumari4
1 Centre for Advanced Data Science, Vellore Institute of Technology, Chennai 600127, Tamilnadu, India
[email protected]
2 School of Computer Science and Engineering, Vellore Institute of Technology, Chennai 600127, Tamilnadu, India
3 Rajiv Gandhi College of Engineering and Technology, Puducherry, India
4 SRMIST University, Chennai 603203, Tamilnadu, India
Abstract. In the 21st century, digital transformation is required everywhere to make human life easier and longer. Companies and industries cannot accomplish digital transformation without using Artificial Intelligence (AI, i.e., the analytics process) and the Internet of Things (IoT) together. AI and IoT are necessities of the next decade for many nations. At the same time, other technologies such as Blockchain and edge computing make the integration of these technologies simpler and faster. In the near future, digital transformation will require more than one technology, i.e., the integration of technologies will be the trend. The term 'Intelligent Automation' essentially denotes the automation of business processes (including general corporate-level processes using BPM and specific task-level processes using RPA) assisted by Artificial Intelligence's analytics and decisions. This work discusses Intelligent Automation, its internal structure, evolution and importance (with future work) in many useful applications for Industry 4.0. Finally, Intelligent Automation Systems are explained for e-healthcare applications, giving a perspective on how they can change the healthcare industry and save millions of lives.
Keywords: Intelligent automation · Industry 4.0 · Future with machine learning · Blockchain applications and internet of things
1 Introduction
In the past few months, the COVID-19 pandemic has significantly changed human lives. Countries have introduced several procedures, such as social distancing measures and lockdowns, to avoid the spread of infection. Industries and organisations are implementing new ways of continuing their business by facilitating Work from Home (WFH) and avoiding contact by using the Internet of Things (IoT), including the Internet of Everything, for remote monitoring [1]. Schools and other educational institutes are implementing e-learning. Today, most work depends on Internet Connected Devices (ICD) and their operations [2].
For example, hospitals and health organisations are using e-medicine and remote patient monitoring through the Internet of Medical Things (IoMT) [4]. Online shopping, contactless delivery and virtual social events are becoming popular, and authorities are using sensors for contact tracing, traffic control and safety during the COVID-19 pandemic. Cloud computing, IoT and IoE are therefore becoming ever more essential in the post-COVID-19 era. However, such applications and networks also raise several issues, including security, privacy, reliability, sustainability, performance, user experience, power requirements, management and compliance. This article helps researchers think beyond the current state of the technology and encourages the adoption of Intelligent Automation systems in Industry 4.0, which will save cost and time, create value, increase efficiency, improve the customer experience and open up inventive opportunities. These issues and challenges of the post-COVID-19 era can be addressed by increasing the use of Intelligent Automation in many daily-life applications. Intelligent Automation (IA) is an idea that may become the fundamental driving force behind digital transformation. Organisations are excited about its reported advantages and are investing heavily in it; many are spending billions of USD on optimal robotic process automation plans. Gartner characterises intelligent automation as an umbrella term for an assortment of methodologies, skills, devices and procedures that service providers use to remove the requirement for labour and increase the consistency and reliability of services while decreasing the cost of delivery. We are seeing the emergence of a new model of working, Intelligent Automation (IA), which drastically changes the way people and machines interact with one another under the rapid growth of data flows, the digitisation of life and practically unlimited computing power. Intelligent automation software has an "understanding" of business procedures and their variations and uses that knowledge when executing automated business process validation to check that the correct business result occurred. Industries today mainly focus on labour costs, productivity, compliance, accuracy and system efficiency; improving at least one of these parameters increases revenue. For example, a better customer experience increases demand for a product. Intelligent automation addresses a growing need that organisations have operationally overlooked over the years, and it also closes the workflow and systems-integration gaps left by software suppliers. Note that in the current post-COVID-19 scenario, robots (machines with intelligence) can reduce human contact and the transmission of COVID-19. For example, robots can serve as a doctor-patient interface in which they conduct diagnosis and treatment processes, minimising human interaction and the risk of infection transmission during the coronavirus pandemic.
Organisation of this Work: Section 2 discusses our motivation behind writing an article on this emerging area. Section 3 discusses the necessity (or objectives), features and possibilities of Intelligent Automation in the current and coming decade.
Further, Sect. 4 discusses various future possibilities with AI and IoT towards Intelligent Automation. Then, Sect. 5 discusses related work in detail. Section 6 concludes this work with several interesting remarks for future researchers and scientists.
2 Motivation
Industry 4.0 is a necessity for the future and for modern society, and intelligent automation systems will be in trend in all possible sectors in the near future. Millions of lives can be saved, and profits increased, if full automation is allowed in all sectors; on the other side, millions of people will get new opportunities to develop new systems, write new code or solve new problems as they arise. Alongside information technology, Machine Learning (ML) is the fastest-growing area of the current decade. Using ML in healthcare, for example in Health Informatics (HI), offers many application services and challenges, such as improved medical diagnoses, disease analyses and pharmaceutical development. These improvements raise the living standard of a patient to a new level and support recovery. In the near future, robots can serve as an intermediary between a doctor and a patient during COVID-19, where they conduct diagnosis and treatment processes, minimising human interaction and the risk of infection during the coronavirus pandemic. Hence, this aim is covered in this article. This work provides information about computer vision and automated analytics mechanisms, i.e., intelligent automation systems for healthcare, and discusses their role in a person's life. In the near future, we will see many things become automated and intelligent, such as driverless vehicles and drones in logistics.
3 Intelligent Automation – Necessity (or Objectives), Features and Possibilities
Smart automation's key goal is to maximise consumer and employee loyalty and improve productivity [4]. Moreover, it creates savings in time and expenses, dramatically decreasing human involvement in the process cycle and giving workers more time to concentrate on innovative changes, strategy, decision-making, etc. The reduction of errors in systems, as well as a reduction in the use of paper, is another goal, as this trend is geared entirely towards digital management. Figure 1 shows the progress towards Artificial Intelligence for Industry 4.0. Note that Industry 4.0 encourages full automation with intelligence in all sectors and benefits modern society with reliable services. Intelligent Automation is an effective and integrated application of four main technologies to provide tangible solutions to society and Industry 4.0 [5].
a) BPM: Business Process Management (BPM) is a process automation application that requires effective communication between humans, processes and data. BPM's goal is to ensure that the infrastructure for organisational and business processes is strong.
Fig. 1. Progress towards Artificial Intelligence for Industry 4.0
b) Robotic Process Automation (RPA): Robotic Process Automation is a technology aimed at reducing the level of human intervention in computer applications, particularly for repetitive tasks that differ very little in each iteration [6]. RPA operates mainly by communicating with "high-level" applications, i.e., the software layers at the graphical-interface level, as opposed to machine language or computer code. It is relatively quick to implement and can therefore bring immediate benefits to a business through time and cost savings, particularly when applied to the bottlenecks of certain processes.
c) Artificial Intelligence: Artificial Intelligence is the machine simulation of human intelligence. In other words, it is the discipline that attempts to build systems that can understand and reason like a human being. Other principles such as machine learning, deep learning, Natural Language Processing (NLP), image recognition, big data, etc. are included in Artificial Intelligence [7]. While it is a very broad term that encompasses several levels (from basic automations to advanced virtual assistants), the following virtues in the current business climate are worth highlighting:
• Deciphering trends from past experience.
• Wise decision-making.
• Prescriptive and predictive analytics.
• Enhanced user experience.
d) Integrations: One of the greatest headaches for an organisation is the relation and integration between systems, because each system or piece of software has its own particularities. Systems typically communicate through an Application Programming Interface (API), which is normally based on principles such as SOAP (applied in Web Services) or REST (based on the HTTP protocol); a minimal integration sketch is given below.
Possibilities for Intelligent Automation: Intelligent analytics (or advanced analytics) will be part of intelligent automation. It will analyse large amounts of data [10] without interruption, i.e., it will make computer systems more decision-supportive. In other words, computers will be able to provide intelligence in performing tasks and will do every task efficiently and correctly, without delay or error.
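To make the integration point in d) concrete, the following is a minimal, hedged sketch (not taken from the original work) of how an automation workflow might move records between two systems over REST APIs. The endpoint URLs, field names and token handling are hypothetical placeholders.

```python
# Minimal sketch of a REST-based integration step in an automation workflow.
# The endpoint URLs and field names below are hypothetical placeholders.
import requests

CRM_URL = "https://crm.example.com/api/v1/customers"         # hypothetical source system
BILLING_URL = "https://billing.example.com/api/v1/accounts"  # hypothetical target system

def sync_new_customers(api_token: str) -> int:
    """Copy newly created customer records from a CRM into a billing system."""
    headers = {"Authorization": f"Bearer {api_token}"}

    # 1. Read records from the source system (REST over HTTP, JSON payloads).
    response = requests.get(CRM_URL, params={"status": "new"}, headers=headers, timeout=10)
    response.raise_for_status()
    customers = response.json()

    # 2. Write each record to the target system.
    created = 0
    for customer in customers:
        payload = {"name": customer["name"], "email": customer["email"]}
        result = requests.post(BILLING_URL, json=payload, headers=headers, timeout=10)
        if result.status_code == 201:
            created += 1
    return created

if __name__ == "__main__":
    print(sync_new_customers(api_token="dummy-token"))
```

In practice, RPA suites wrap this kind of call behind visual workflow steps, but the underlying exchange remains an HTTP request and a JSON payload.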
4 Future Possibilities with AI and IoT towards Intelligent Automation
Intelligent Automation and Artificial Intelligence are two different but related terms. Note that AI and IoT will be used in the near future in various applications and areas such as cyber security (finding vulnerabilities and securing systems), software development (designing software or finding bugs quickly), cloud computing (storing data at the optimal location according to its importance), education (providing optimal solutions for reliable e-learning classes), agriculture (deciding which soil is good for which crop based on fertiliser and weather conditions, and using prevention mechanisms to avoid major crop losses from natural hazards), defence (reducing the human workforce at borders and providing security through sensing), aerospace (controlling rover speed and recovering from damage automatically), etc. Major requirements of the future are:
• A Contactless World
• Decentralized Economy
• Distributed Web
• Emotion of AI: emotions in AI need to emerge in the near future.
• Digital Twins: digital twins are virtual replicas of physical devices that data scientists and IT can use to run simulations before building and deploying the real devices; they mainly use IoT in their construction (a minimal sketch follows this list).
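The digital-twin idea above can be illustrated with a minimal, self-contained sketch. The MotorTwin class, its temperature threshold and the simulated telemetry are illustrative assumptions, not part of the original work.

```python
# Illustrative digital-twin sketch: a software replica of a motor that mirrors
# simulated sensor readings and flags when the physical device would need attention.
import random
from dataclasses import dataclass, field

@dataclass
class MotorTwin:
    device_id: str
    max_temperature: float = 80.0          # illustrative threshold (deg C)
    history: list = field(default_factory=list)

    def update(self, temperature: float, rpm: float) -> None:
        """Mirror one telemetry sample from the physical device."""
        self.history.append({"temperature": temperature, "rpm": rpm})

    def needs_maintenance(self) -> bool:
        """Simple rule evaluated on the twin instead of the real machine."""
        recent = self.history[-5:]
        return any(sample["temperature"] > self.max_temperature for sample in recent)

if __name__ == "__main__":
    twin = MotorTwin(device_id="motor-07")
    for _ in range(20):  # stand-in for an IoT telemetry stream
        twin.update(temperature=random.uniform(60, 90), rpm=random.uniform(1400, 1600))
    print(twin.device_id, "maintenance needed:", twin.needs_maintenance())
```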
Hence, the use of intelligent automation can be discussed in detail as follows.
4.1 Intelligent Automation/Robotics/Artificial Intelligence in Software Development
Technology is at the root of all the innovations we see in our lives today. Over the last few decades (1980–2020), software development technologies have seen a major transformation. For example, technology makes business simpler, easier and more profitable, and it reduces the load on many consumers. Technology today makes the world go round in almost all industries and is part of every nation's economy. Companies have embraced almost all new technologies in their software development, and Artificial Intelligence (AI) is no exception. The impact of artificial intelligence on software creation is shifting the way businesses work and making software smarter. Today, AI is used by almost all industries that work through software and hardware (AI is used in Cyber-Physical Systems). Traditional software development is not intended to embrace these changes, requiring a sequence of successive stages including requirements planning, software design, manual code writing and testing to determine that the end product meets specifications. By building flexible and productive workflows to boost efficiency and minimise time-to-market, AI is disrupting this process. While several software companies are still in the early stages of implementing AI, the use of the technology is rising steadily in the enterprise; revenues from the application of AI instruments worldwide are projected to hit $119B by 2025. Notice that AI is not a substitute for Human Intelligence (HI). While helping software development teams save time by reducing the human error associated with repetitive tasks, AI constantly learns from human interactions.
With this development, we are confident that AI can enhance software development, agile test automation and software automation testing, as well as the way RPA bots work with software support [6]. AI instruments seek to make the production of software more effective, quicker and easier. A few possible uses of AI in software development are:
• Requirement Gathering: Being a conceptual phase of the SDLC (Software Development Life Cycle), requirement gathering requires maximum human intervention. Artificial intelligence offers a broad range of techniques and tools, such as Google ML Kit and Infosys Nia, to automate certain processes and minimise human intervention to some extent. This phase places plenty of emphasis on detecting loopholes early, before moving to design. An AI technique called Natural Language Processing enables machines to understand the user's requirements expressed in natural language and automatically derive high-level software models. There are still issues with this approach, including difficulties in balancing the developed systems, and it remains one of today's hot research topics.
• Agile Project Management: The efficiency benefits gained by applying AI extend beyond routine task management. The application of AI in software development can help developers become more agile and lean in the way code is written, tested and delivered to production teams. AI algorithms can be used to improve project timelines, cost estimates and workloads, for example by enabling development teams to prioritise sections of code needing early completion and by becoming more precise in estimating potential failure rates. AI/ML algorithms proactively search large code databases looking for abnormalities and communicate possible preventive steps to developers [10]. Abnormalities may include missing code, bugs, or alternative product or service names under the same code. This is useful not only in recovery; the analysis of this information may also assist with the pre-diagnosis of errors.
• Automate Software Design: Designing software is an essential, complex and demanding stage of the development process, especially when teams are located in different geographical locations. Planning and designing a project requires developers, designers, R&D (Research and Development) and marketing teams to work collaboratively, transparently and with effective communication, which until now has been done manually and needs specialised learning and experience. In other words, settling on the correct design at each stage is an error-prone task for designers, and retracting and re-investigating the plan forces dynamic changes to the design until the client reaches the desired solution. In the near future, AI/ML algorithms can help streamline and automate the planning and designing process by gathering data such as the names of project stakeholders, location, customer needs, products and type of business to auto-create intuitive instructions on which design approach to take, without requiring manual intervention. Automating some complex procedures with AI tools enables more capable methods of designing projects. For example, using AIDA (Artificial Intelligence Design Assistant, a website-building platform), designers can understand the needs and desires of the client and use that knowledge to design the appropriate project. This can help automate the code design process, saving programmers time, effort and money.
• Automated Code Generation: Taking a business idea and writing the code for it is time-consuming and labour-intensive for a large project. To confront the time and money issues, experts have approached solutions that write code before production begins. The technique, however, does not cope well with uncertainty about what the target code is intended to do, as gathering this information takes a lot of time, much like writing code from scratch. An intelligent programming assistant powered by AI can decrease this load to some degree. Imagine that a project described in natural language could be understood by the framework and translated into executable code. It looks like science fiction today, but AI for software development might flip the script, and this will be feasible in the near future; natural language processing and AI software would allow for it. Coding a broad and complex project with multiple stakeholders is always labour-intensive and time-consuming, and AI coding assistants will dramatically reduce the development team's workload while increasing productivity. By concentrating on more innovative and strategic work, such as enhancing the user interface, developers can further improve productivity.
• Streamlining Software Testing: Testing is the central component of every software development cycle, and identifying and avoiding errors or bugs is a big problem for development teams. Fixing bugs and errors accounts for a large share of software development costs. Early identification of errors requires continuous monitoring during a project's production life cycle. Current software testing practices, however, are costly, unreliable and time-consuming, because in many instances errors are found in the code only after the product has been developed and delivered to end users. Trained AI and ML algorithms can ensure that testing is error-free and conducted in less time than manual testing, allowing code testers to concentrate on more critical tasks such as code maintenance. AI-enabled sample code testing allows development teams to conduct mass testing on thousands or millions of code samples. Development teams can tackle case-specific tests while AI-aided automation tools manage routine and time-consuming tests. This eventually reduces errors, because AI-assisted tests scope and correct errors with high accuracy, leading to an overall improvement in software quality. In the near future, AI can integrate with the cloud and perform testing automatically for software testers, fix specific bugs and deploy the product before the deadline; as a result, it saves time, money and energy and provides a high ROI (return on investment) for the organisation. Software testing is a critical stage in the production of software that ensures the product's quality. A few examples of AI and machine-learning-based testing platforms are Approach, Feature and Testim.io. (A minimal sketch of ML-based defect prediction follows this list.)
• Enhance Decision-Making: Software developers spend a lot of time and money on important decisions about which features to include in a product. By analysing the performance of current applications and prioritising products and features for future development, AI can speed up the decision-making process. Software businesses are able to make data-driven business decisions rapidly and at scale, optimising market influence and sales.
Using AI technologies such as advanced Machine Learning (ML), deep learning, natural language processing and business rules, software developers will be able to create better software more quickly. Machine learning solutions provide the ability to learn from previous software projects and to assess the success of current projects. AI in the development of software not only supports development but also contributes to better implementation, and it helps developers build better software through strategic decision-making.
• In Deployment Control: The deployment phase is the stage where developers often update programmes or software to newer versions. If developers fail to execute a process properly during an update, running the programme presents a high risk. During an upgrade, AI can protect developers from such vulnerabilities and reduce the risk of deployment failure. Machine learning algorithms can also be used to evaluate the deployment process itself.
• Enhanced Data Protection: A key property that should not be neglected during development is software protection. The system typically collects data from network sensors and applications mounted at the client end. AI helps examine this data with machine learning to differentiate anomalies from normal behaviour. Software development companies that implement AI can also avoid delayed warnings, false alarms and missed alerts in their development process.
• Improving the Accuracy of Estimates: AI provides a software estimation approach that reviews historical data from the company's earlier projects to identify similarities and statistics. It uses predictive analytics as well as business rules to deliver reliable cost, time and effort estimates.
• The Development Future: AI has tremendous potential to reshape software development in the future. The availability of AI-enabled applications enables tech firms to deliver customer-driven interactions by offering business-specific application strategies.
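As a hedged illustration of the testing and project-management points above, the sketch below trains a classifier on simple code metrics to rank modules by defect risk, so testing effort could be prioritised. The chosen features, the synthetic labels and the model are assumptions for demonstration only, not the method of this paper.

```python
# Illustrative sketch: training a classifier on simple code metrics to flag modules
# that are likely to contain defects, so testing effort can be prioritised.
# The features and the synthetic data are assumptions for demonstration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 500
# Features per module: lines of code, cyclomatic complexity, number of recent changes.
X = np.column_stack([
    rng.integers(20, 2000, n),
    rng.integers(1, 40, n),
    rng.integers(0, 30, n),
])
# Synthetic label: larger, more complex, frequently changed modules are defect-prone.
risk = 0.0005 * X[:, 0] + 0.02 * X[:, 1] + 0.03 * X[:, 2]
y = (risk + rng.normal(0, 0.2, n) > 1.35).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("hold-out accuracy:", round(model.score(X_test, y_test), 3))

# Rank unseen modules so the riskiest ones are tested first.
new_modules = np.array([[1500, 35, 12], [80, 3, 1]])
print("defect probability per module:", model.predict_proba(new_modules)[:, 1])
```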
Coding becomes stronger when integrated with AI tools, and error detection becomes easier. AI algorithms and advanced analytics help software development teams make immediate decisions using real-time data at scale. In contrast to machines that respond with rule-based logic or pre-determined responses, AI applications perform complex and intelligent functions related to human thought. By collecting and analysing data from a variety of sources, including microchips, sensors and remote inputs, AI algorithms can automate the coding process and use that data to help developers construct accurate code, resulting in more effective, agile and scalable workflows. AI may also provide highly personalised products or services for customers in software development. Artificial intelligence can have a huge effect on both software architecture and software development, and software development companies need to consider its effects and possible benefits, not just in terms of software design, but also in terms of the essence of the software itself. In the design, code generation and testing of applications, AI will play a significant role and, in the near future, will be a game-changer.
4.2 Intelligent Automation/Robotics/Artificial Intelligence (AI) in Cloud Computing
In this smart era, two technologies, artificial intelligence and cloud computing, are taking people's lives to a new level. In other words, Artificial Intelligence (AI) and Cloud Computing (together with Edge Computing) have emerged together to improve the lives of millions. Note that edge computing is a newer, advanced extension of cloud computing [8].
Many digital (smart) devices mix AI and cloud computing in our lives every day, such as Siri, Google Home and Amazon's Alexa. On a larger scale, AI capabilities are making businesses more effective, strategic and insight-driven in the business cloud computing world. In general, by hosting data and software in the cloud, cloud computing gives organisations greater versatility, agility and cost savings. A cloud in business (industry) can be public, private or hybrid, providing end users with services such as SaaS, PaaS, IaaS, etc. An integral cloud computing service is infrastructure, which includes the provision of computing and storage devices; cloud providers also provide data centre services covering the various databases available. This development chain is moving technology towards the combined growth of Artificial Intelligence and Cloud Computing. To produce better results from the large amounts of data generated by the Internet of Things (via device communication), the cloud is combined with AI. Artificial Intelligence technologies combined with cloud/edge computing help businesses manage their data, search for knowledge trends and insights, provide better customer service and optimise workflows [9]. Although AI has enormous business strengths, the need for technological resources and vast infrastructure has made it less feasible for many companies. If a company lacks top technical talent and access to huge data sets, AI in the cloud can still help with its massive computing power for refining data intelligently, and it offers cost-effectiveness, increased productivity, reliability and the availability of advanced infrastructure. For example, an AI-powered pricing module ensures that pricing for a business is always optimised; it is not just about making better use of data, it performs the analysis and then puts it into effect without the need for human involvement (a minimal sketch is given below). That is, the cloud is democratising access to AI by giving businesses the freedom to use it now. Some other useful benefits will be:
• Significant and promising AI and cloud computing applications: an AI-powered self-managing cloud is built into the IT infrastructure to help streamline workloads and automate repetitive tasks.
• Improving AI data management: artificial intelligence solutions at the cloud level are also improving data management.
• Getting things accomplished with AI-SaaS integration: artificial intelligence technologies are now being applied to provide more value as part of broader Software-as-a-Service (SaaS) systems.
• Using dynamic cloud services: AI as a service is also changing the way companies rely on tools.
Today, cloud and edge computing act as an engine to increase AI's reach and effect. At all levels, AI and cloud (and edge) computing are transforming business, with a particularly significant impact on large-scale business. A smooth AI workflow and cloud-based tools are making many services a reality [11]. For example, it is possible to use services such as text processing, voice, vision and language translation without creating a unique ML model, as these are available to end users in parallel. Previously, we needed to provide more meaningful data and a larger human workforce for analysing large amounts of data before prediction improved and accuracy increased.
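The "AI-powered pricing module" mentioned above can be sketched in a few lines: fit a demand curve from historical price/sales pairs and pick the revenue-maximising price. The synthetic data and the linear demand model are illustrative assumptions; a real module would read its history from cloud storage.

```python
# Illustrative sketch of an automated pricing step: learn a demand curve from
# historical (price, units sold) data, then choose the revenue-maximising price.
import numpy as np
from sklearn.linear_model import LinearRegression

# Historical observations: higher prices -> fewer units sold (with noise).
prices = np.linspace(5.0, 20.0, 60).reshape(-1, 1)
rng = np.random.default_rng(7)
units = 400 - 15 * prices.ravel() + rng.normal(0, 10, prices.shape[0])

demand_model = LinearRegression().fit(prices, units)

# Evaluate candidate prices and keep the one with the highest expected revenue.
candidates = np.linspace(5.0, 20.0, 151).reshape(-1, 1)
expected_units = np.clip(demand_model.predict(candidates), 0, None)
revenue = candidates.ravel() * expected_units
best = candidates.ravel()[revenue.argmax()]
print(f"recommended price: {best:.2f}, expected revenue: {revenue.max():.0f}")
```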
With these AI-based phases, a growing number of organisations that see improving results are keen to put money into cognitive technology capabilities. In conclusion, the use of AI in cloud computing is not a drastic move; in many ways it is an evolutionary one. We need AI and the cloud's 'test and learn' capabilities, and we are optimistic that the merger of cloud computing services and AI technology will bring substantial improvement to the technology sector. In the near future, AI will become a factor of production with large-scale storage capability. By introducing many examples and possible uses (in the next subsections) of AI for cloud and edge computing for future researchers and scientists, we argue that as cloud computing grows, the AI field needs to grow with it: investment in AI will advance the cloud sector and take it to a new level in terms of income and productivity. Therefore, in the next few years, we should expect the market to expand rapidly, with AI pushing cloud computing higher than ever, as the cloud market brings AI's benefits to the mainstream.
4.3 AI for Cyber Security: Intrusion Detection Automatically by Machine and Artificial Intelligence
In recent years, artificial intelligence approaches have evolved rapidly, and their implementations can be seen in practice in many fields, ranging from facial recognition to image processing. In the cyber security domain, AI-based techniques can provide improved cyber defence capabilities, but they can also help adversaries refine attack methods; malicious actors are already aware of the emerging prospects and will undoubtedly try to exploit them for sinister purposes. Cybersecurity includes designing defence strategies that prevent unauthorised access, alteration or destruction of computer infrastructure, networks, services and data. New cybersecurity threats are evolving and changing rapidly because of the drastic developments in information and communication technologies. AI is now being used in cybersecurity to advance defensive capabilities: it can process large quantities of data with reliability, precision and speed, based on its powerful automation and data analysis capabilities. To detect similar attacks in the future, even if their patterns shift, an AI system should take advantage of what it knows and recognise past threats. The exponential advancement of computer technology and the Internet is having a huge effect on people's everyday lives and jobs. Unfortunately, it has also triggered several new cybersecurity problems. First, the data explosion renders manual review impractical. Second, threats are rising at a high rate, meaning that new, short-lived and highly adaptive threats are becoming widespread. Third, current threats combine different dissemination, infection and evasion techniques, making them difficult to identify and predict. In addition, the cost of avoiding threats must also be weighed: generating and implementing an algorithm requires a lot of time, resources and effort. Recently, researchers have suggested numerous AI techniques to detect or categorise malware; detect network intrusions, phishing and spam attacks; counter Advanced Persistent Threats (APT); and recognise domains created by domain generation algorithms (DGAs).
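As a minimal illustration of the anomaly-based side of such intrusion detection, the sketch below fits an Isolation Forest on features of normal connections and flags deviating ones. The feature choice and the synthetic traffic are assumptions for demonstration only, not a production IDS.

```python
# Illustrative anomaly-detection sketch for network traffic: an Isolation Forest is
# fitted on "normal" connection features and then flags deviating connections.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
# Features per connection: duration (s), bytes sent, bytes received, packet count.
normal_traffic = np.column_stack([
    rng.normal(2.0, 0.5, 1000),
    rng.normal(4000, 800, 1000),
    rng.normal(6000, 1000, 1000),
    rng.normal(40, 8, 1000),
])

detector = IsolationForest(contamination=0.01, random_state=0).fit(normal_traffic)

# Score new connections; -1 marks a suspected intrusion, 1 marks normal behaviour.
new_connections = np.array([
    [2.1, 4100, 5900, 38],        # looks like ordinary traffic
    [95.0, 900000, 120, 5000],    # large upload with unusual duration
])
print(detector.predict(new_connections))
```

In a hybrid IDS, such an anomaly detector would run alongside signature-based rules that catch known threats with low false-alarm rates.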
In order to improve the efficacy of malware, AI technology can be further weaponized, making it more autonomous, more advanced, quicker, and more difficult to detect. The new generation of malware is smarter and able to function autonomously with the help of AI. Intelligent malicious programmes may spread themselves on the basis of a set of autonomous choices in a network or computer system, intelligently tailored to the host system parameters; autonomous malware capable of choosing lateral movement techniques raises the probability of completely compromising the targeted networks. In cryptography, the use of AI opens new frontiers for security investigations. Scientists see AI as an important response to the continuous growth in the number of cyber threats, the increase in their sophistication, and the need for a rapid and significantly automated response to security attacks. AI technology, on the other hand, also contributes to certain security concerns that need to be addressed. Currently, malware recognition and analysis, intrusion detection (focusing on network-based anomaly attacks), phishing and spam, and advanced persistent threat detection and characterization are the prime targets for AI applications. Usually, intrusion detection systems rely on hybridization strategies that combine signature-based methods, for rapid detection of known threats with low false alarm rates, with anomaly-based methods that flag deviations from normal behaviour. 4.4 AI for Industry 4.0 Industry 4.0 is the digital transformation and the value-development processes of manufacturing/production and related industries. Industry 4.0 is used interchangeably with the fourth industrial revolution and represents a new stage in the organisation and regulation of the industrial value chain. The advent of modern digital industrial technology, known as Industry 4.0, is a revolution that makes it possible to capture and analyse data through computers, allowing higher-quality products to be manufactured at reduced costs in quicker, more versatile and more effective processes [12]. AI techniques coupled with recent developments in the Internet of Things, Web of Things, and Semantic Web (jointly referred to as the Semantic Web of Things) promise to play an important role in Industry 4.0. The authors present, as part of this vision, a Semantic Web of Things for Industry 4.0 (SWeTI) platform. A few examples of AI's scope in Industry 4.0 are Machine Intelligence in Medical Imaging, Machine Learning and AI for Penetration Testing, and Machine Learning in Chemical Sciences. AI for Industry 4.0 comprises some of the following technologies:
• Advanced analysis techniques
• Predictive analysis
• Machine learning
• Image analysis
• Natural language processing
• Mood, behaviour and personality analysis
AI is one of the emerging technologies already being utilised by manufacturers to improve product quality and efficiency and to cut down on operating costs. We are beginning to see a working relationship between human beings and robots, an area that benefits from the use of AI in production facilities. The smart factory, built on hyper-connected production processes, consists of numerous machines that all interact with each other, relying on AI automation platforms to capture and analyse all kinds of data, including images, structured code text and categorised fixed field text. A recent IDC survey of global organisations currently using AI technologies found that only 25% have developed an enterprise-wide AI approach. To boost their efficiency, many organisations are applying AI. There are vast amounts of data, however, that have not even been digitised or structured in a way that allows them to be used by AI. For example: Machine Learning in Medical Imaging- Machine learning is a method that can be applied to medical images for the recognition of patterns. Although it is a powerful instrument that can assist with medical diagnosis, it can be misused. Usually, machine learning starts with the machine learning algorithm framework computing the image characteristics that are believed to be significant in the prediction or diagnosis of interest. There are numerous strategies, each with various strengths and disadvantages, that can be used. Most of these machine learning approaches have open-source versions that make them easy to try and apply to images [13]. There are several metrics for measuring an algorithm's performance; however, one must be conscious of the possible associated pitfalls that can lead to misleading metrics. Machine Learning and AI for Penetration Testing and Machine Learning in Chemical Sciences- In order to determine its security, penetration testing requires conducting a controlled attack on a computer system. Currently, it is one of the key methods employed by organisations to strengthen their cyber threat defences. Network penetration testing, however, requires a considerable amount of training and time to perform well, and there is a growing shortage of qualified cyber security professionals at present. Current approaches to automated penetration testing have focused on techniques that involve a model of exploit performance, but as new software and attack vectors are produced, the cyber security environment is evolving rapidly, making it a challenge to produce and sustain up-to-date models. Massive amounts of data on the one hand and cost pressure, disruptive innovations and more and more regulations on the other – modern manufacturing faces challenges that can only be met by the smart use of Artificial Intelligence (AI). AI offers information for production processes and applications with skills that can mimic human cognitive functions. Manufacturing components equipped with these capabilities can perceive their environment, mimic it, learn from data, make predictions and enhance their own programming independently. In manufacturing, Artificial Intelligence is essential to linking humans, computers, goods and data intelligently, as well as to the smart and quick use of IoT (Internet of Things), cloud solutions, machine learning and predictive maintenance. 4.5 AI for Data Science In today's time, artificial intelligence, also known as AI, and data science have become the two most significant technologies sought after.
Many times, people think of them as the same thing, but in fact they are not. For its processes, Artificial Intelligence
is used in the area of data science. After the explosion of massive data collected via various internet-connected devices such as laptops, smartphones, tablets, desktops, etc., there was substantial growth in the need for data processing in industry. Data Science and Artificial Intelligence are the terms most widely used interchangeably. Although some aspects of AI may contribute to Data Science, it does not reflect all of it. The most famous field in the world today is Data Science. Actual Artificial Intelligence, however, is far from reachable. While modern data science is considered by many to be Artificial Intelligence, it is simply not so. In almost all sectors, Data Science has thus brought about a major revolution. All modern societies are data-driven, which is why data science has become a central part of the modern world. Artificial Intelligence is a field where artificial activities are conducted using algorithms. Its models are based on humans' and animals' natural intelligence. Similar patterns from the past are remembered, and when the patterns are replicated, related operations are performed automatically. It utilises software engineering principles and computational algorithms to create solutions to a problem. A lot of Artificial Intelligence is yet to be realised, but Data Science, on the other hand, has already begun to make a significant difference in the market. Data Science transforms data so that it can be used for visualisation and analysis. New products, better than ever, are developed with the aid of Artificial Intelligence, which also brings control by automatically doing several things [14]. Data is evaluated to support careful business decisions with the aid of Data Science, which gives businesses many advantages. Data Science vs Artificial Intelligence:
1. Contemporary AI's limits. Data Science and Artificial Intelligence can be used interchangeably, but between the two sectors there are some variations. 'Artificial Narrow Intelligence' is the contemporary AI used in the world today. Under this type of intelligence, computer systems do not have full autonomy and consciousness like human beings. Instead, they are only able to do tasks for which they are qualified.
2. Data Science is a comprehensive technique. Data Science is the analysis and research of data. The duty of a Data Scientist is to make decisions that favour businesses. In addition, the role of data scientist varies with the industry. The key requirement in the regular roles and responsibilities of a data scientist is to pre-process data, that is, to perform data cleaning and transformation.
3. Artificial Intelligence is a platform for data scientists. Artificial Intelligence is a tool or a method for a data scientist. This approach sits at the top of the other methodologies used for data analysis. This is better analysed with the analogy of Maslow's Hierarchy of Needs, where each part of the pyramid reflects a data activity carried out by a data scientist.
4.6 Artificial Intelligence – IoTs Integration for Other Sectors In IoT applications and implementations, artificial intelligence plays a rising role. Over the past two years, both investments and acquisitions in start-ups that combine AI and
IoT have climbed. Major IoT platform software vendors now offer integrated AI capabilities such as analytics based on machine learning. In this context, AI's importance lies in its ability to quickly extract insights from data. Machine learning, an AI technology, offers the ability to automatically recognise patterns and identify anomalies in the data that smart sensors and devices produce, such as temperature, pressure, humidity, air quality, vibration, and sound. IoT AI applications allow businesses to prevent unplanned downtime, increase operational performance, generate new products and services, and improve risk management. 4.6.1 IoT in Higher Education Systems In the context of information and communication technology and the advancement of society, the Internet of Things (IoT) continues to affirm its significant role. With the help of IoT, by delivering richer learning environments and increased operational efficiency, and by gaining real-time, actionable insight into student success, institutions can improve learning outcomes. In many disciplines and at any stage, the IoT stands to radically change the way universities operate and boost student learning. For universities or any other educational institution, it has tremendous potential if leadership, staff, and students are well equipped to ensure widespread and effective implementation [15]. IoT requires growth in areas where universities can lead. Scholars, researchers, and students are in a unique position to direct the exploration and growth of IoT systems, computers, software, and services. For students to get all types of information, the Digital Campus System is an important platform. Emerging innovations are also impacting other aspects of campus management. Higher education institutions, in particular universities, are increasingly in demand to digitise their content and activities and adapt their methods so that academics and researchers can work effectively in a digital environment. A digital university must have the technology to teach and to facilitate teaching, education, joint study and empowerment. Universities will have to face all contemporary digital challenges if they are to compete, but few have the vision, versatility, channels, or sufficient leadership to put policies in place to ensure that they can innovate or respond to market conditions. 4.6.2 Impact of IoT in Education Now and in Future The capacity of technology to disrupt teaching, learning, and evaluation has long been realised by universities. In addition, disruption of technology is necessary if a modern university is to differentiate its student offer, thereby increasing admissions, enhancing retention, and achieving desired results. But it is difficult to train students to be confident in the world of work. In all learning environments, IoT will allow better operational effectiveness. By enhancing learning environments, enhancing learning tools, improving learning methods and strategies, increasing management effectiveness, and saving management costs, IoT will facilitate classroom instruction. The tools available for learning on computers, such as e-books, are becoming more engaging and interactive. According to the Citrix 2020 Technology Environment Study (2015), IoT technology will change the learning experience in numerous ways in the next five years. The experience of learning will continue to become more virtual, learners will consume
new forms of information and learning, and classrooms will be better prepared for learning. Learning will gradually become an unforgettable experience for teachers and learners, with accelerating awareness while taking in new ideas and solutions from around the world. Students are thus trained for the future of jobs and future career aspirations. 4.6.3 A Learning Management System Enhanced with Internet of Things With the use of Learning Management Systems (LMS) as a method for designing, distributing, monitoring, and handling different forms of educational and training content, a revolution in the advancement of online learning happened. Since the first LMS was released, significant technical advances have turned this tool into a powerful curriculum management application, offering rich-content courseware, assessment and evaluation, and dynamic collaboration. Learning Management Systems (LMS) are an emerging technology in today's society, offering online learning materials with course development, distribution, management, monitoring, reporting, and evaluation. An LMS is a centralised software application that integrates pedagogical features with the emerging technology of virtual learning environments. In this way, learners can access resources, upload assignments, take tests, and share data with peers and instructors using personal devices such as mobile phones and tablets, thus creating a dynamic learning environment. The learning process is automated by LMS software by registering users, monitoring classes, documenting learner data and handling reports. Note that to read more about this section, i.e., AI scope in the near future or the future with emerging technologies, readers are suggested to read our work [18–20]. In the near future, Intelligent Automation platforms will automate end users' typical activities and interface them in an end-to-end process, utilizing artificial intelligence. Automation together with artificial intelligence creates a new sort of workforce that drives digital transformation and widens business opportunities. Finally, readers are directed to [16, 17] for important information related to computer vision and artificial intelligence (machine learning and deep learning).
5 Related Work Note that Intelligent Automation (IA) systems can be deployed in almost every industrial sector. They process immense amounts of data as well as analyze information, spot irregularities, check for correctness, learn while working, adjust to changes, and take decisions. Despite the fact that the final confirmation still relies upon a human operator, a significant part of the work is done by the intelligent automation software, resulting in time savings. Advanced strategies and incredible computing capacity make possible a new generation of hardware and software robots that perform both cognitive and physical tasks. Numerous intelligent automation examples can be found in different fields. Figure 2 compares several learning techniques available in the past decade, in the present, and in the next decade. We can easily see that Machine Learning (ML) is a common learning technique in all eras.
Fig. 2. A comparison of learning techniques in past, present and future decades
Connected-RPA adoption should be driven not simply by the promise of greater cost savings and operational efficiencies but by other criteria as well. Key drivers include value creation, efficiency increases, improving the customer experience, producing inventive opportunities and getting more value from staff. Other remarkable results include achieving higher-quality operations, greater operational agility and increasingly significant information for customer insights. While connected RPA is benefiting organisations across all sectors, those that work in industries with severe administrative or compliance requirements are additionally using the innovation to improve risk reduction. Other sectors, where an organisation's core business activities have critical manual-driven procedures, are implementing it as well. Note that the intelligent automation race is well in progress in numerous organizations today. Intelligent Automation (IA) is still an infant in the realm of innovations; however, it learns and grows quickly, turning into a significant player in the market. Intelligent automation patterns are in the spotlight, catching the attention of CEOs, designers and experts around the globe.
6 Conclusion Intelligent Automation (IA) is a concept that describes a comprehensive digital transformation approach focused primarily on business process management (BPM) for users/industries/systems and on robots (RPA), depending on the business requirements at any time. Born as a concept linked to digital transformation, but with the benefit of being better described, intelligent automation proposes a real solution by integrating four technology branches: BPM, RPA, Artificial Intelligence and Integration. IA involves the use of analytics and AI (especially machine learning) to make automated and smart decisions, and case management to provide adequate flexibility for processes to succeed in end-to-end case management. Finally, it is worth noting that the convergence between the various systems used in the business is another main aspect of this trend. Integration would prevent data from being duplicated across applications, and users would only have to operate on one platform.
Scope of the Work This work is compiled from many research articles (published in the recent decade) and also includes the personal thoughts of the authors. In the near future, researchers can read this article and find problems of interest for their own work.
References 1. Lee, I., Lee, K.: The Internet of Things (IoT): Applications, investments, and challenges for enterprises. Bus. Horizons 58(4), 431–440 (2015) 2. Marzegalli, M., Lunati, M., Landolina, M., Perego, G.B., Ricci, R.P., Guenzati, G., Schirru, M., Belvito, C., Brambilla, R., Masella, C., Di Stasi, F.: Remote monitoring of CRT-ICD: the multicenter Italian CareLink evaluation—Ease of use, acceptance, and organizational implications. Pacing Clin. Electrophysiol. 31(10), 1259–1264 (2008) 3. Joyia, G.J., Liaqat, R.M., Farooq, A., Rehman, S.: Internet of Medical Things (IoMT): applications, benefits and future challenges in healthcare domain. J. Commun. 12(4), 240–247 (2017) 4. Gilabert, E., Arnaiz, A.: Intelligent automation systems for predictive maintenance: a case study. Robot. Comput.-Integrated Manuf. 22(5–6), 543–549 (2006) 5. Vaidya, S., Ambad, P., Bhosle, S.: Industry 4.0–a glimpse. Procedia Manuf. 20, 233–238 (2018) 6. Willcocks, L., Lacity, M., Craig, A.: Robotic process automation: strategic transformation lever for global business services. J. Inf. Technol. Teach. Cases 7(1), 17–28 (2017) 7. Russell, S., Norvig, P.: Artificial intelligence: a modern approach (2002) 8. Gill, S.S., Tuli, S., Xu, M., Singh, I., Singh, K.V., Lindsay, D., Tuli, S., Smirnova, D., Singh, M., Jain, U., Pervaiz, H.: Transformative effects of IoT, Blockchain and Artificial Intelligence on cloud computing: evolution, vision, trends and open challenges. Internet Things 8, 100118 (2019) 9. Dirican, C.: The impacts of robotics, artificial intelligence on business and economics. Procedia-Soc. Behav. Sci. 195, 564–573 (2015) 10. Tyagi, A.K.: February. Machine Learning with Big Data. In Machine Learning with Big Data (March 20, 2019). Proceedings of International Conference on Sustainable Computing in Science, Technology and Management (SUSCOM), Amity University Rajasthan, Jaipur-India (2019) 11. Rekha, G., Tyagi, A.K., Anuradha, N.: Integration of fog computing and internet of things: an useful overview. In: Proceedings of ICRIC 2019, pp. 91–102. Springer, Cham (2020) 12. Empowering Industry 4.0 with Artificial Intelligence. dqindia.com 13. Erickson, B.J., et. al.: Machine learning for medical imaging. RadioGraphics 37(2) 14. Data Science Vs Artificial Intelligence – Eliminate your Doubts, data-flair.training 15. Aldowah, H.: Internet of Things in Higher Education: A Study on Future Learning, Article in Journal of Physics Conference Series, November 2017 16. Tyagi, A.K., Chahal, P.: Artificial Intelligence and Machine Learning Algorithms, Book: Challenges and Applications for Implementing Machine Learning in Computer Vision. IGI Global (2020). https://doi.org/10.4018/978-1-7998-0182-5.ch008 17. Tyagi, A.K., Rekha, G.: Challenges of applying deep learning in real-world applications. Book: Challenges and Applications for Implementing Machine Learning in Computer Vision, IGI Global 2020, pp. 92–118. https://doi.org/10.4018/978-1-7998-0182-5.ch004 18. Tyagi, A.K., Nair, M.M.: Internet of Everything (IoE) and Internet of Things (IoTs): Threat Analyses, Possible Opportunities for Future, 15(4) (2020)
19. Tyagi, A.K., Nair, M.M., Niladhuri, S., Abraham, A.: Security, privacy research issues in various computing platforms: a survey and the road ahead. J. Inf. Assurance Secur. 15(1), 1–16. 16p. (2020) 20. Pramod, A., Naicker, H.S., Tyagi, A.K.: Machine learning and deep learning: open issues and future research directions for next ten years. In: Computational Analysis and Understanding of Deep Learning for Medical Care: Principles, Methods, and Applications, 2020, Wiley Scrivener (2020)
Hybrid Extreme Learning Machine and Backpropagation with Adaptive Activation Functions for Classification Problems
T. L. Fonseca and L. Goliatt
Computational Modeling Program, Federal University of Juiz de Fora, Juiz de Fora, Brazil
[email protected], [email protected]
Abstract. This paper proposes a hybrid approach of Extreme Learning Machine with Backpropagation with adaptive activation functions for classification problems. In general, machine learning research seeks to find algorithms that can learn specific parameters through data to create increasingly accurate generalist predictive models. In some scenarios, these models become very complex, requiring great computational power for both the training stage and the predictive stage. Adaptive activation functions emerged intending to increase models’ predictive capacity, thus generating better models without increasing their complexity. We evaluate the performance of the proposal in a benchmark of ten classification problems. The results obtained show that the hybrid approach with adaptive activation functions, in general, surpasses the standard functions with the same architecture. Keywords: Adaptive activation functions · Extreme learning machine · Backpropagation · Classification problems
1 Introduction
One of the fundamental components of an Artificial Neural Network (ANN) is the Activation Function (AF). The vast majority of early works used the following AFs to compose their architectures: Binary, Linear, Sigmoid and Hyperbolic Tangent [9]. These functions, used homogeneously or heterogeneously across layers, have been and still are capable of solving machine learning problems in different domains with satisfactory quality. Although most works dedicate efforts to improving an architecture by improving weights or increasing the number of neurons or layers, using an appropriate AF in creating and training an ANN brings significant advantages [1]. For this reason, some studies focus on defining which activation function or set of functions is best suited to the problem faced [16,24]. Due to their high complexity, some problems require ANN architectures with many layers and neurons to obtain an adequate approximate function. Although
they usually present useful performance metrics, these models tend to be computationally slow, both in their training and inference stages [3]. These large models also negatively impact the environment due to the high computational cost of training and inference [20,28,29]. Additionally, these models are unfeasible for direct use in edge computing [21]. In this way, with the opposite objective to increasing the size of ANN architectures, research has emerged around a neuron's learning power. In order to achieve greater predictive power in an ANN architecture, some works have focused efforts on AFs with the ability to adapt their shape during the training stage and to fix it during inference [19]. This work will refer to this type of function as the Adaptive Activation Function (AAF). One of the first works in the field carried out the adaptation of a polynomial function during the training stage, obtaining superior results to the traditional AF [23]. After that work, different studies related to AAFs have been developed for other AFs, such as the Hyperbolic Tangent [4], Splines [2,33], PReLU [10], and Soft Exponential [7]. Through different numerical experiments, in different problem domains, these works showed that the AAFs surpassed the functions that were not adapted and, besides, converged faster in comparison. The pioneering work on AAFs confirmed the advantages of using functions that adapt during training, opening space for studies applied to specific domains, such as classification of electrocardiographic arrhythmias [31], inference of gene expression [18], classification of images generated by computed tomography [35], and approximation of smooth and discontinuous functions, as well as solutions of linear and non-linear partial differential equations [15]. As with most machine learning algorithms, ANNs need to be optimized to learn all their adjustable parameters. The most common optimizer for ANNs, built on gradient descent, is Backpropagation [25]. Despite being one of the most used optimizers for ANNs, it has some disadvantages, such as its slow convergence. In some cases, this leads to the need to use other optimizers such as Equilibrium Propagation [27], Evolution Strategies [30], and L-BFGS [34]. With the same objective of finding good models with fast learning, networks based on ANNs, called Extreme Learning Machines (ELM), emerged [12]. With some fixed parameters and others adjustable through simple optimization, these networks can obtain impressive results in a small time interval with a low computational cost. In some cases, combining the best features of the optimizers, such as ELM and Backpropagation [36], can generate an optimization technique that performs better than each individually. In this work, we investigate AAFs on a set of 10 classification benchmark datasets in an ELM-based model, which trains the output layer using Backpropagation. The results indicate that the combination of the ELM network with AAFs produces better results than traditional AFs. The study's principal constructive contributions can be summarized in the following points: (1) to the best of the authors' knowledge, this is the first contribution on the implementation of ELM with AAFs in classification problems; (2) this paper highlights two ways of utilizing AAFs, Adaptive Layer and Adaptive Neuron functions.
2 Materials and Methods
2.1 Datasets
In this work, to carry out the numerical experiments, ten classification databases were selected through the PMLB [22], a broad benchmark suite for machine learning evaluation and comparison. Before conducting the training step, we randomly separated 20% of the database to perform the test step. Table 1 presents a summary of the classification databases used in numerical experiments, containing their name, quantity used in training and testing, number of features, and the number of classes.

Table 1. Summarization of the classification databases used in numerical experiments.

Dataset  Problem              Train Instances  Test Instances  Features  Classes
1        analcatdata lawsuit  211              53              4         2
2        australian           552              138             14        2
3        breast w             559              140             9         2
4        buggyCrx             552              138             15        2
5        dna                  2548             638             180       3
6        hypothyroid          2530             633             25        2
7        monk1                444              112             6         2
8        penguins             266              67              7         3
9        sonar                166              42              60        2
10       threeOf9             409              103             9         2
2.2 Extreme Learning Machine (ELM)
The machine learning technique called Extreme Learning Machine (ELM) [13] is a particular case of an artificial neural network, where its first layers have fixed weights, and only its last layer has adjustable weights. Most works using ELM use a version of only one hidden internal layer. ELM, compared to other learning techniques, has the advantage of being a fast learning model, adequate generalization capacity, and convenience in terms of modeling [8]. In ELM, all hidden parameters of the first layers are randomly assigned, including those present in the activation functions, and the rest of the last layer parameters are chosen using the generalized inverse of the output matrix of the last hidden layer [14]. The output feature of the ELM, with only one hidden layer used in this article, is defined by [26]:

$$\hat{y}(x) = \sum_{i=1}^{L} \beta_i \, G(w_i, b_i, x) \qquad (1)$$
where ŷ is the ELM prediction associated with the input vector x, w_i is the weight vector that multiplies the input vector x to generate the input value of the
ith-neuron in the hidden layer, b_i is the bias of the ith-neuron in the hidden layer, β_i are output weights that multiply the output of the ith-neuron in the hidden layer, G(·) is the nonlinear activation function, and L is the number of neurons in the hidden layer. The parameters (w, b) are generated randomly through a normal distribution, with zero mean and standard deviation equal to one, and the weights β_i of the output layer are determined analytically. In a traditional ELM, the output weight vector [β_1, ..., β_L] can be determined by minimizing [11]

$$\min_{\beta \in \mathbb{R}^{L}} \left( \lVert H\beta - y \rVert^{2} + C \lVert \beta \rVert^{2} \right) \qquad (2)$$

where y = [y_1, ..., y_N]^T is the output data vector, with N the number of data points, and H is the hidden layer output matrix

$$H = \begin{bmatrix} G_1(w_1, b_1, x_1) & \cdots & G_L(w_L, b_L, x_1) \\ \vdots & \ddots & \vdots \\ G_1(w_1, b_1, x_N) & \cdots & G_L(w_L, b_L, x_N) \end{bmatrix}.$$

The optimal solution is given by β = (H^T H)^{-1} H^T y = H^† y, where H^† is the pseudoinverse of H. In this work, also intending to adjust the parameters present in the AAF, we modified the optimization process to obtain the β parameters. Instead of calculating the pseudoinverse of H, we use the traditional Backpropagation. This modification generates a hybrid version between ELM and ANN, where all the hidden parameters of the first layers are still fixed and generated randomly through a normal distribution, and the final parameters, including those of the AAF, are obtained through Backpropagation.
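To make the classical ELM fit concrete, the following NumPy sketch draws the hidden parameters from a standard normal distribution and solves the regularized least-squares problem of Eq. (2) in closed form. It is a minimal illustration, not the authors' code: the hidden-layer width L, the regularization constant C, and the tanh activation are assumptions chosen for the example.

```python
import numpy as np

def elm_fit_predict(X_train, y_train, X_test, L=50, C=1e-2, seed=0):
    """Classical single-hidden-layer ELM: random (w, b), ridge solution for beta."""
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    W = rng.standard_normal((d, L))   # fixed input weights, drawn from N(0, 1)
    b = rng.standard_normal(L)        # fixed biases, drawn from N(0, 1)

    def hidden(X):
        return np.tanh(X @ W + b)     # hidden-layer outputs G(w, b, x)

    H = hidden(X_train)               # N x L hidden-layer output matrix
    # Regularized least squares: beta = (H^T H + C I)^-1 H^T y
    beta = np.linalg.solve(H.T @ H + C * np.eye(L), H.T @ y_train)
    return hidden(X_test) @ beta      # predictions for new inputs
```

In the hybrid version studied in this paper, the closed-form solve above is replaced by gradient-based training of the output layer, so that the adaptive activation parameters can be learned at the same time.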
2.3 Adaptive Activation Functions
When working with traditional ANNs, we use standard AFs that do not change their shape during the training phase. Despite this, we have the weights and biases that adapt during the training stage to optimize a specific performance metric. The optimization of these parameters allows the models, whether they are an ANN or not, to learn an approximate function of the training data and allows this same model to generalize to new data. AAFs emerged intending to increase an ANN's learning capacity, allowing some of its parameters to be learned during the training phase, just as is done with weights and biases. In this way, the AAFs are able, during the training stage, to adapt their shape in order to optimize the performance metric, something that is not possible with the standard AFs. Table 2 presents the AFs, and their adaptive forms, studied in this work. The parameters α, β, and γ have been set to the values shown in the table for the standard AFs. In the adaptive cases, the values shown in the table are their initial values, which may converge to other values during the training stage.
Table 2. Parameters and formulation of AFs.

Activation Function   α      β      γ       General Function Formula
Hyperbolic Tangent    −1.0   −2.0   –       α(1 − exp(−βx)) / (1 + exp(−βx))
Mish                  1.0    1.0    –       αx · tanh(log(1 + exp(βx)))
PELU                  1.0    1.0    –       (α/β)x if x ≥ 0; α(exp(x/β) − 1) otherwise
Quadratic             0.1    10^−6  10^−6   αx² + βx + γ
This work proposes two ways to build an adaptive architecture: (1) Adaptive Layer and (2) Adaptive Neuron. In the case of the Adaptive Layer, the neurons of the same architecture layer share the adaptive parameters of the AF. As for the Adaptive Neuron, each of the architecture's neurons has its own individual adaptive parameters, without sharing them. Using the Adaptive Layer strategy, the number of adapted parameters is less than or equal to that of the Adaptive Neuron strategy. There is a trade-off between the number of parameters that need to be adapted and the architecture's complexity, as illustrated in the sketch below.
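As an illustration of the two strategies, the sketch below implements an adaptive PELU in PyTorch with trainable α and β; a flag selects whether one parameter pair is shared by the whole layer (Adaptive Layer) or each neuron keeps its own pair (Adaptive Neuron). The class name, the clamping of β, and the initial values are our own illustrative choices, not details taken from the paper.

```python
import torch
import torch.nn as nn

class AdaptivePELU(nn.Module):
    """PELU with trainable alpha, beta: (alpha/beta)*x if x >= 0, else alpha*(exp(x/beta) - 1)."""

    def __init__(self, n_neurons, per_neuron=False, alpha0=1.0, beta0=1.0):
        super().__init__()
        # Adaptive Layer: one (alpha, beta) pair shared by the whole layer.
        # Adaptive Neuron: one (alpha, beta) pair per neuron.
        shape = (n_neurons,) if per_neuron else (1,)
        self.alpha = nn.Parameter(torch.full(shape, alpha0))
        self.beta = nn.Parameter(torch.full(shape, beta0))

    def forward(self, x):
        beta = self.beta.clamp(min=1e-3)                 # keep the scale positive
        pos = (self.alpha / beta) * x
        neg = self.alpha * (torch.exp(x / beta) - 1.0)
        return torch.where(x >= 0, pos, neg)
```

With per_neuron=False the layer adds only two extra trainable parameters, which matches the smaller parameter count of the Adaptive Layer strategy noted above.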
3 Computational Experiments and Discussion
The same architectural structure (number of neurons per layer) was used for each database to verify the AAFs' learning capacity compared to the standard AFs. This decision guarantees that, when comparing the strategies, the architecture used is the same in terms of the number of layers and neurons, varying only the existence of adaptive parameters within the AFs. Besides, we created an ELM model with only one hidden layer, containing a number of neurons equal to twice the number of features. For all experiments performed, varying the database, the AF and the adaptive type (Normal, Layer and Neuron), we trained the network using the Adam optimizer [17] for 500 epochs with a batch size of 32 and a learning rate of 10^−4 that was adjusted every ten epochs using a multiplicative factor of 0.99. We carried out 30 independent training and evaluation runs, calculating the Log Loss metric for each of them. Table 3 presents a summary of the results, showing the mean and standard deviation of the Log Loss metric over the independent executions. It is possible to observe that, in general, the adaptive functions have a lower mean than the standard function, indicating that they can better minimize the Log Loss. On the other hand, it is possible to observe that the Adaptive Layer strategy's results surpass, for the most part, those of the Adaptive Neuron strategy. To expand the comparative analysis of the results, we performed a one-way Analysis of Variance (ANOVA) [6] with a confidence interval of 95%. The null hypothesis is that the means of the adaptive types are equal, i.e., that there is no statistically significant difference between them. The results showed that, in some cases, the test rejected the null hypothesis.
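A compact PyTorch sketch of this training setup is shown below: the random hidden layer is frozen, only the output layer (and, when used, the adaptive activation parameters) is trained with Adam on the log loss, and the learning-rate schedule follows the text. The data handling, the plain tanh activation, and the function name are placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_hybrid_elm(X, y, n_classes, epochs=500, batch_size=32):
    """Hybrid ELM: frozen random hidden layer + trainable output layer.
    X: float tensor (N, n_features); y: long tensor of class indices (N,)."""
    n_features = X.shape[1]
    hidden_size = 2 * n_features                      # twice the number of features

    hidden = nn.Linear(n_features, hidden_size)
    for p in hidden.parameters():                     # ELM-style: fixed random N(0, 1) weights
        nn.init.normal_(p, mean=0.0, std=1.0)
        p.requires_grad_(False)

    activation = nn.Tanh()                            # or an adaptive activation module
    out = nn.Linear(hidden_size, n_classes)           # only this part is trained
    model = nn.Sequential(hidden, activation, out)

    criterion = nn.CrossEntropyLoss()                 # log loss for classification
    optimizer = torch.optim.Adam(
        [p for p in model.parameters() if p.requires_grad], lr=1e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.99)

    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            optimizer.step()
        scheduler.step()                              # lr multiplied by 0.99 every 10 epochs
    return model
```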
Table 3. Mean and standard deviation of the Log Loss metric for the different types of function and adaptive types. Values in bold represent the best result for the metric evaluated. Dataset Activation function 1
2
3
4
5
6
7
8
9
10
Normal
Layer
Neuron
Hyperbolic Tangent 0.595 ± 0.099
0.591 ± 0.104
0.527 ± 0.079
Mish
0.611 ± 0.098
0.608 ± 0.101
0.540 ± 0.076
PELU
0.594 ± 0.101
0.589 ± 0.107
0.526 ± 0.081
Quadratic
0.540 ± 0.074
0.643 ± 0.105
0.602 ± 0.095
Hyperbolic Tangent 0.532 ± 0.037
0.519 ± 0.044
0.525 ± 0.039
Mish
0.585 ± 0.036
0.579 ± 0.041
0.580 ± 0.037
PELU
0.523 ± 0.040
0.507 ± 0.049 0.516 ± 0.042
Quadratic
0.692 ± 0.011
0.686 ± 0.011
Hyperbolic Tangent 0.374 ± 0.058
0.329 ± 0.072
0.351 ± 0.060
Mish
0.461 ± 0.058
0.432 ± 0.073
0.440 ± 0.059
PELU
0.361 ± 0.059
0.302 ± 0.073 0.334 ± 0.059
Quadratic
0.670 ± 0.017
0.626 ± 0.047
Hyperbolic Tangent 0.602 ± 0.037
0.597 ± 0.042
0.598 ± 0.038
Mish
0.636 ± 0.032
0.633 ± 0.035
0.632 ± 0.032
PELU
0.596 ± 0.038
0.589 ± 0.045 0.591 ± 0.039
Quadratic
0.694 ± 0.011
0.689 ± 0.011
Hyperbolic Tangent 0.452 ± 0.014
0.260 ± 0.012
0.383 ± 0.015
Mish
0.553 ± 0.017
0.319 ± 0.018
0.491 ± 0.018
PELU
0.424 ± 0.015
0.200 ± 0.014 0.363 ± 0.015
Quadratic
0.683 ± 0.010
0.630 ± 0.020
0.688 ± 0.010
1.023 ± 0.001
0.913 ± 0.121
Hyperbolic Tangent 0.241 ± 0.008
0.200 ± 0.009
1.019 ± 0.001 0.202 ± 0.005
Mish
0.226 ± 0.007
0.192 ± 0.007
0.196 ± 0.005
PELU
0.225 ± 0.007
0.182 ± 0.010 0.194 ± 0.005
Quadratic
0.264 ± 0.005
0.189 ± 0.002
Hyperbolic Tangent 0.705 ± 0.017
0.704 ± 0.016
0.703 ± 0.016
Mish
0.702 ± 0.016
0.701 ± 0.016
0.700 ± 0.015
PELU
0.706 ± 0.019
0.705 ± 0.018
0.703 ± 0.018
Quadratic
0.697 ± 0.006
0.696 ± 0.005
0.695 ± 0.004
Hyperbolic Tangent 0.967 ± 0.073
0.961 ± 0.083
0.954 ± 0.076
Mish
1.012 ± 0.067
1.009 ± 0.072
1.001 ± 0.067
PELU
0.950 ± 0.076
0.943 ± 0.092
0.934 ± 0.080
Quadratic
1.090 ± 0.024
1.077 ± 0.026
1.069 ± 0.018
0.200 ± 0.002
Hyperbolic Tangent 0.675 ± 0.035 0.675 ± 0.036 0.675 ± 0.035 Mish
0.682 ± 0.027
PELU
0.675 ± 0.036 0.675 ± 0.037 0.675 ± 0.036
0.682 ± 0.027
0.681 ± 0.027
Quadratic
0.694 ± 0.005
0.693 ± 0.005
Hyperbolic Tangent 0.658 ± 0.032
0.657 ± 0.034
0.693 ± 0.005 0.656 ± 0.033
Mish
0.673 ± 0.025
0.672 ± 0.026
0.671 ± 0.025
PELU
0.656 ± 0.032
0.655 ± 0.034
0.653 ± 0.033
Quadratic
0.694 ± 0.005
0.692 ± 0.005
0.691 ± 0.003
With a 95% confidence interval, we used the Tukey test [32] to check where the difference between the means is significant. Figure 1 shows a count, with a maximum value equal to the number of databases, of which adaptive types of function showed a significant mean difference according to the Tukey test.
Fig. 1. Counting of adaptive function types that showed a significant mean difference according to the Tukey test.
The Adaptive Layer and the Adaptive Neuron showed a significant difference between them. Together with the results presented in Table 3, this shows that the Adaptive Layer obtains better results than the Adaptive Neuron. This may indicate that the Adaptive Neuron strategy suffers from overfitting due to the larger number of parameters to be adapted, favoring simpler architectures. It is important to note that both adaptive strategies show, for a considerable number of problems, a significant difference in results compared with the Normal type, reinforcing the gain from using AAFs. To finalize the analysis, we use Performance Profiles [5] to facilitate the visualization and understanding of the results obtained. Performance profiles are cumulative distributions of a performance metric that compare different strategies to determine which one presents the best performance on a set of specific problems. In general terms, the strategy that presents the largest area under the curve is considered the best among the analyzed strategies, considering the problems they were applied to solve. In this paper, we used Log Loss as the metric to evaluate the performance of the models. Figure 2 and Table 4 show the Performance Profiles' results for the different activation functions, the different adaptive types, and the ten classification databases studied. It is possible to observe that the AAF strategy performs better on these databases than the standard function. Besides, the Adaptive Layer approach can perform better than the Adaptive Neuron approach, despite presenting close results.
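For readers unfamiliar with performance profiles, the following NumPy sketch shows the usual construction for a "lower is better" metric such as Log Loss [5]: each strategy's score is divided by the best score on that problem, and the profile is the fraction of problems whose ratio stays below a threshold τ. The example matrix is hypothetical, not the paper's data.

```python
import numpy as np

def performance_profile(scores, taus):
    """scores[p, s]: metric of strategy s on problem p (lower is better).
    Returns rho[s, t]: fraction of problems where strategy s is within a
    factor tau of the best strategy for that problem."""
    scores = np.asarray(scores, dtype=float)
    best = scores.min(axis=1, keepdims=True)          # best value per problem
    ratios = scores / best                            # performance ratios r_{p,s}
    rho = np.array([(ratios <= t).mean(axis=0) for t in taus])
    return rho.T                                      # shape: (n_strategies, len(taus))

# Hypothetical example: 3 problems x 2 strategies
scores = [[0.60, 0.55],
          [0.70, 0.72],
          [0.50, 0.45]]
taus = np.linspace(1.0, 1.5, 6)
print(performance_profile(scores, taus))
```

The normalized area under each profile curve, as reported in Table 4, then summarizes each strategy with a single number.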
Fig. 2. Performance profile curves for Log Loss metric.

Table 4. Normalized area under performance profile curves for Log Loss metric. Values in bold represent the best result for the metric evaluated.

Activation function   Normal   Layer    Neuron
Hyperbolic tangent    87.76%   98.93%   96.76%
Mish                  88.50%   98.93%   97.10%
PELU                  88.33%   98.97%   96.84%
Quadratic             86.81%   96.79%   97.63%
Even though the preliminary results presented are promising, a study with a more extensive set of databases in different domains is necessary. In this work, the analyzes are limited to only ten classification databases. The adaptive activation function can behave differently in classification, regression, clustering, and others. These differences must be identified to optimize the generated architectures.
4 Conclusion
This paper investigates Adaptive Activation Functions in a hybrid version of Extreme Learning Machine combined with Backpropagation to solve a benchmark of ten classification problems. The study shows that Adaptive Layers and Adaptive Neurons perform better than standard functions without increasing the architecture size. This may indicate that smaller architectures with Adaptive Layers or Adaptive Neurons can achieve equal or better results than large architectures with standard functions, due to the improved predictive power of the architecture. Also, this reduces the need to search for large architectures. The PELU function was the one that obtained, in general, the best results for the analyzed databases and, besides, it presented the most significant performance gain from the parameter learning. Based on the analyses carried out in this work, this adaptive activation function may also yield better results in other problem sets. This study presents some of the limitations of using a standard activation function and two alternatives to increase its predictive power. To increasingly reduce the complexity of artificial neural networks and, at the same time, increase their predictive power, it is necessary to conduct research that
seeks to optimize the size of the architecture and the set of adaptive activation functions used. Acknowledgment. The authors acknowledge the financial support of CNPq (429639/2016-3), FAPEMIG (APQ-00334/18), and CAPES - Finance Code 001. The authors would like to thank Itaú Unibanco for hours released to its collaborator to develop this work.
References 1. Bishop, C.M., et al.: Neural Networks for Pattern Recognition. Oxford University Press (1995) 2. Campolucci, P., Capperelli, F., Guarnieri, S., Piazza, F., Uncini, A.: Neural networks with adaptive spline activation function. In: Proceedings of 8th Mediterranean Electrotechnical Conference on Industrial Applications in Power Systems, Computer Science and Telecommunications (MELECON 1996), vol. 3, pp. 1442– 1445. IEEE (1996) 3. Canziani, A., Paszke, A., Culurciello, E.: An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678 (2016) 4. Chen, C.T., Chang, W.D.: A feedforward neural network with function shape autotuning. Neural Netw. 9(4), 627–641 (1996) 5. Dolan, E.D., Mor´e, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002) 6. Fisher, R.A.: Xv.—the correlation between relatives on the supposition of mendelian inheritance. Earth Environ. Sci. Trans. Roy. Soc. Edinburgh 52(2), 399– 433 (1919) 7. Godfrey, L.B., Gashler, M.S.: A continuum among logarithmic, linear, and exponential functions, and its potential to improve generalization in neural networks. In: 2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K), vol. 1, pp. 481–486. IEEE (2015) 8. Guo, P., Cheng, W., Wang, Y.: Hybrid evolutionary algorithm with extreme machine learning fitness function evaluation for two-stage capacitated facility location problems. Expert Syst. Appl. 71, 57–68 (2017) 9. Haykin, S.: Neural Networks - A Comprehensive Foundation, 2nd edn. Prentice Hall (1998) 10. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing humanlevel performance on imagenet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034 (2015) 11. Huang, G., Huang, G.B., Song, S., You, K.: Trends in extreme learning machines: a review. Neural Networks 61(Supplement C), 32 – 48 (2015) 12. Huang, G.B., Wang, D.H., Lan, Y.: Extreme learning machines: a survey. Int. J. Mach. Learn. Cybern. 2(2), 107–122 (2011) 13. Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learning machine: a new learning scheme of feedforward neural networks. In: Proceedings of the 2004 IEEE International Joint Conference on Neural Networks, 2004, vol. 2, pp. 985–990. IEEE (2004) 14. Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learning machine: theory and applications. Neurocomputing 70(1–3), 489–501 (2006)
15. Jagtap, A.D., Kawaguchi, K., Karniadakis, G.E.: Adaptive activation functions accelerate convergence in deep and physics-informed neural networks. J. Comput. Phys. 404, 109136 (2020) 16. Karlik, B., Olgac, A.V.: Performance analysis of various activation functions in generalized MLP architectures of neural networks. Int. J. Artif. Intell. Expert Syst. 1(4), 111–122 (2011) 17. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) 18. Kunc, V., Kl´ema, J.: On transformative adaptive activation functions in neural networks for gene expression inference. bioRxiv, p. 587287 (2019) 19. Lau, M.M., Lim, K.H.: Review of adaptive activation function in deep neural network. In: 2018 IEEE-EMBS Conference on Biomedical Engineering and Sciences (IECBES), pp. 686–690. IEEE (2018) 20. Li, D., Chen, X., Becchi, M., Zong, Z.: Evaluating the energy efficiency of deep convolutional neural networks on CPUs and GPUs. In: 2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom)(BDCloud-SocialCom-SustainCom). pp. 477–484. IEEE (2016) 21. Merenda, M., Porcaro, C., Iero, D.: Edge machine learning for ai-enabled IoT devices: a review. Sensors 20(9), 2533 (2020) 22. Olson, R.S., La Cava, W., Orzechowski, P., Urbanowicz, R.J., Moore, J.H.: PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData Mining 10(1), 36 (2017) 23. Piazza, F., Uncini, A., Zenobi, M.: Artificial neural networks with adaptive polynomial activation function (1992) 24. Ramachandran, P., Zoph, B., Le, Q.V.: Searching for activation functions. arXiv preprint arXiv:1710.05941 (2017) 25. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by backpropagating errors. Nature 323(6088), 533–536 (1986) 26. Saporetti, C.M., Duarte, G.R., Fonseca, T.L., da Fonseca, L.G., Pereira, E.: Extreme learning machine combined with a differential evolution algorithm for lithology identification. RITA 25(4), 43–56 (2018) 27. Scellier, B., Bengio, Y.: Equilibrium propagation: Bridging the gap between energybased models and backpropagation. Front. Comput. Neurosci. 11, 24 (2017) 28. Schwartz, R., Dodge, J., Smith, N.A., Etzioni, O.: Green ai. arXiv preprint arXiv:1907.10597 (2019) 29. Strubell, E., Ganesh, A., McCallum, A.: Energy and policy considerations for deep learning in nlp. arXiv preprint arXiv:1906.02243 (2019) 30. Sulistiyo, M.D., Dayawati, R.N., et al.: Evolution strategies for weight optimization of artificial neural network in time series prediction. In: 2013 International Conference on Robotics, Biomimetics, Intelligent Computational Systems, pp. 143–147. IEEE (2013) ¨ 31. Tezel, G., Ozbay, Y.: A new neural network with adaptive activation function for classification of ecg arrhythmias. In: International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, pp. 1–8. Springer (2007) 32. Tukey, J.W.: Exploratory Data Analysis, vol. 2. Reading, MA (1977) 33. Vecci, L., Campolucci, P., Piazza, F., Uncini, A.: Approximation capabilities of adaptive spline neural networks. In: Proceedings of International Conference on Neural Networks (ICNN 1997), vol. 1, pp. 260–265. IEEE (1997)
34. Wu, R., Huang, H., Qian, X., Huang, T.: A L-BFGS based learning algorithm for complex-valued feedforward neural networks. Neural Process. Lett. 47(3), 1271– 1284 (2018) 35. ZahediNasab, R., Mohseni, H.: Neuroevolutionary based convolutional neural network with adaptive activation functions. Neurocomputing 381, 306–313 (2020) 36. Zou, W., Yao, F., Zhang, B., Guan, Z.: Back propagation convex extreme learning machine. In: Proceedings of ELM-2016, pp. 259–272. Springer (2018)
Assessing an Organization Security Culture Based on ENISA Approach
Wasnaa Kadhim Jawad
University of Information Technology and Communications, Baghdad, Iraq
[email protected]
Abstract. Recently, one of the most promising things for any successful organization has been establishing a security strategy. It is a distinguished document, which details a set of steps important for any organization for determining and handling dangers. Developing a security strategy is an exhaustive process that involves an initial assessment, planning, operation, and permanent observation. Assessing the readiness of institutions, in terms of the security culture, is emphasized to correspond with the efforts of governments, organizations, and the private sector towards business administration and the electronic transformation into a digital society and e-government. This paper proposes an "Application form" derived from the European Network and Information Security Agency (ENISA) safety criteria. The form involves twenty-five high-level security objectives, which are collected in seven domains, for reviewing and assessing the readiness of any organization and recognizing the deficit found in the information security requirements of the organization. The suggested "Application form" has been tested over various objectives; the acquired findings demonstrate that the suggested system has achieved great performance and significant competence in terms of determining the weaknesses of the network and the organizational structure of the institution. Keywords: Organization security policy · Security culture · ENISA criteria · Organization assessing
1 Introduction Establishing a security strategy is extremely significant for any successful organization. It is a remarkable document, which details a group of steps substantial for any organization to identify, handle, and manage risks. An influential security strategy is comprehensive and effective, with the ability to respond to any kind of security threat. Developing a security strategy is a detailed process, which includes primary assessment, planning, operation, and persistent monitoring [1]. However, there is a powerful rapport between security strategy and security culture. The security culture influences the security strategy and makes it highly effective in organizations [2]. Security culture is the thoughts, habits, and techniques that the organization uses to create an environment free from risk and threat. More specifically, there are three
important elements that together build and maintain the security culture: people, policies, and technology. Each of these elements affects the others directly, which means that changing one element will also change the others and, in turn, the whole security culture [3]. The initial target of a security culture is to boost change and improve security. However, security culture is firstly for humans, not for computers. People want a framework to realize what the right thing to do is for security, and they ask to be instructed. Awareness of the concepts of integrity, secrecy, and privacy is a significant component of any security strategy. In general, efforts must be made to guarantee that the workforce is sufficiently trained in these concepts and that people know their role and responsibility in the security lifecycle. It is significant to know that a security strategy is not a one-time procedure, and consequently assessments must happen quarterly to gauge the performance of the implemented initiatives [4]. In this paper, the ENISA method, which is widely applied inside Europe, is adopted. ENISA represents the European consultant body in cybersecurity, network security, and information security. It has clear and reliable standards for assessing the readiness of any organization in the field of cybersecurity, network security, and information security. The ENISA method supports Europe, the most developed region and the most vulnerable to information security and cybersecurity threats [5], through gathering information and expertise concerning the security culture and contributing to securing the information society inside Europe by raising awareness and by promoting and developing a culture of information security that has a significant effect on the European Union [6].
2 Literature Review Several researchers have investigated different methodologies and parameters for establishing, evaluating, and enhancing the security strategy and security culture. For instance, Okere et al. [7] confirmed that there is no specific tool or method for assessing the security culture in organizations. Besides, unawareness must be mitigated via increased training and culture [8]. Renaud and Goucher [9] affirmed that employees play a crucial role in maintaining the organization's security. Furthermore, focusing on achieving the information security culture requirements is extremely significant in reducing the security risks that may occur in the organization [10]. Conversely, Al-Hogail et al. [11] emphasized that one of the most important issues for security culture is the human factor, because the behavior of employees has a great influence on information security protection inside the organization. As well, to build a robust security culture, organizations require guidelines and an overall framework [12]. It should be noted that the security of the organization does not depend on one person, but on all the employees in the workplace [13]. Moreover, the organization's security will be subject to risk if the employees do not commit to the information security policy [14]. Besides, an important and influential factor of security culture inside the organization is the information security strategy [15]. Masrek [16] asserted that information security culture has requirements and characteristics that vary from one organization to another.
Similarly, there is a necessity to supply an overall view that determines the significant factors that form and affect the culture of information security [17]. In general, the authors emphasized that organizations have to combine human behavioral aspects with technical systems related to information security administration in order to obtain a strong security culture [18]. Finally, Masrek et al. [19] asserted that security culture does not depend only on the employees and their skills related to information security, but also on the procedures and techniques that are available to protect the information in the organization.
3 Security Culture Development In that respect, there are various methods to boost the security culture. First, concentrate on the idea that security belongs to every employee in the organization, not just the security department; it is important that everyone behaves like a security person. Second, concentrate on awareness. Security consciousness is the process of educating the whole team on the fundamental lessons about security. Third, establish a Secure Development Lifecycle (SDL). An SDL is the set of actions that the institution consents to accomplish for each software or system version. Fourth, reward those people interested in security. When people complete a security awareness program successfully, search for opportunities to reward them. Finally, establish a security community. It represents the backbone that gathers people in the organization against common trouble [5]. Information security culture depends on technology and on employees' behavioral viewpoints related to information security [20]. Furthermore, organizations confront numerous threats that affect the privacy and secrecy of their private and important data due to increased violations of information security in organizations [21]. The security culture represents a part of the security plan that guides employee conduct in the organization [22]. The development of a security culture inside organizations can decrease human factor threats, data security violations, and large financial losses [23].
4 Methodology
This paper presents ENISA as an assessment method for the security culture of an organization. ENISA is the European Union's center of expertise for network and information security [24]. The responsibility of ENISA is to support European institutions and Member States by following a coordinated approach to react to network and information security threats [25].
4.1 Security Domains
According to ENISA, there are twenty-five high-level security objectives, gathered into seven domains, as shown in Fig. 1. Each of these objectives has detailed security measures, which must be executed accurately and correctly to fulfill the security goal. There is also a list of detailed evidence, which indicates that all measures have been properly applied. For instance, the "Governance and risk management" domain in Table 1 is shown with its objectives, measures, and evidence [24]. Details of the other domains are available in ref. [24].
Fig. 1. ENISA domains and their objectives:
– Governance and risk management: Information security policy; Governance and risk administration; Security roles and responsibilities; Security of third party assets
– Human resources security: Background checks; Security knowledge and training; Personnel changes; Handling violations
– Security of systems and facilities: Physical and environmental security; Security of supplies; Access control to network and information systems; Integrity of network and information systems
– Operations management: Operational procedures; Change administration; Asset administration
– Incident management: Incident administration procedures; Incident detection capability; Incident reporting and communication
– Business continuity management: Service continuity strategy and emergency plans; Catastrophe recovery capability
– Monitoring, auditing, and testing: Monitoring and logging policies; Exercise emergency plans; Network and information systems testing; Security appraisals; Compliance monitoring
5 Results
To assess the security culture, the ENISA method has been applied to an IT organization. This organization consists of more than 400 employees and contains four specialized research centers:
– Networks
– Information Security
– Specialized software
– Smart applications for community service
There is also a Data Center that hosts web-based software, and an internal network that is protected and controlled through per-user passwords managed in the Active Directory, as well as passwords for each application hosted on the Data Center. Server software licenses and antivirus software have not been updated due to a lack of customizations, resulting in a weak network and a breakdown in the organizational structure of the institution.
Table 1. Governance and risk management domain [24]
# | Objective | Measures | Evidence
1 | Information security policy | Situate an advanced security policy responsible for managing and handling the security and steadiness of the communication networks and/or services supplied | Documented security policy, involving networks and services in scope, critical assets supporting them, and the security objectives
2 | Governance and risk management | Ensure residual risks are accepted by management | Management approval of residual risks
3 | Security roles and responsibilities | The structure of security roles and responsibilities is frequently evaluated and adjusted, based on changes and/or past incidents | Latest documentation of the structure of security role assignments and responsibilities
4 | Security of third party assets | Situate a security policy for contracts with third parties | Record of contracts with third parties
The application of the ENISA standards in this institution enables us to identify weaknesses in its network and organizational structure. It should be noted that the ideal score is the maximum score given when all measures are achieved for each objective in each domain. The value of the ideal score for each domain varies according to its importance in the organization: an important domain is assigned high ideal scores, while a less important domain is assigned lower ideal scores. For example, the total ideal score for the Monitoring, auditing, and testing domain is 15, whilst the total ideal score for the Business continuity management domain is 4, as shown in Table 2 and Table 3, respectively. Figure 2 shows a chart of the Monitoring, auditing, and testing domain.
Table 2. Monitoring, auditing, and testing domain
Objective | Ideal score | Audit score
Monitoring and logging policies | 3 | 2
Exercise emergency plans | 3 | 2
Network and information systems testing | 3 | 2
Security appraisals | 3 | 2
Compliance monitoring | 3 | 2
Fig. 2. Chart of the Monitoring, auditing, and testing domain (ideal score vs. audit score per objective)
The audit score is the value given when all measures are examined for each objective in each domain. For example, in the Monitoring, auditing, and testing domain, the ideal score for the Monitoring and logging policies objective is 3, but the audit score is 2, because the measure "policy for logging and monitoring of crucial systems" related to this objective was not achieved, as shown in Table 2. Likewise, in the Business continuity management domain, the ideal score for Disaster recovery capabilities is 2, but the audit score is 1, as shown in Table 3, due to the failure to establish appropriate procedures for restoring network services and communications when natural disasters occur. Figure 3 shows a chart of the Business continuity management domain.
Table 3. Business continuity management domain
Objective | Ideal score | Audit score
Service permanency strategy and emergency plans | 2 | 1
Catastrophe recovery abilities | 2 | 1
Fig. 3. Chart of the Business continuity management domain (ideal score vs. audit score per objective)
Furthermore, in the Human resources security domain, as shown in Table 4, the ideal score for Security knowledge and training is 5, but the audit score is 3, because personnel need up-to-date security knowledge and their security training programs are insufficient. Figure 4 shows a chart of the Human resources security domain.
Table 4. Human resources security domain
Objective | Ideal score | Audit score
Background checks | 5 | 2
Security knowledge and training | 5 | 3
Personnel changes | 5 | 4
Handling violations | 5 | 2
Fig. 4. Chart of the Human resources security domain (ideal score vs. audit score per objective)
In this paper, an "Application form" (Table 5) was proposed to calculate the scores for all the measures of every objective in each domain, in order to audit and assess the readiness of any organization and to specify the shortfalls in its information security requirements.
Table 5. Application form
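To illustrate how such an application form can be turned into domain-level results, the following Python sketch compares audit scores against ideal scores per objective and reports the shortfall for each domain. It is only a minimal illustration of the scoring idea described above; the data layout and function name are our own assumptions, and the sample values are taken from Tables 2–4 of this paper rather than from the ENISA guideline itself.

```python
# Minimal sketch of the ideal-score vs. audit-score comparison described above.
# The nested-dict layout and function name are illustrative assumptions;
# the sample values are copied from Tables 2-4 of this paper.

scores = {
    "Monitoring, auditing, and testing": {
        "Monitoring and logging policies": (3, 2),
        "Exercise emergency plans": (3, 2),
        "Network and information systems testing": (3, 2),
        "Security appraisals": (3, 2),
        "Compliance monitoring": (3, 2),
    },
    "Business continuity management": {
        "Service continuity strategy and emergency plans": (2, 1),
        "Disaster recovery capabilities": (2, 1),
    },
    "Human resources security": {
        "Background checks": (5, 2),
        "Security knowledge and training": (5, 3),
        "Personnel changes": (5, 4),
        "Handling violations": (5, 2),
    },
}

def domain_report(domain, objectives):
    """Print per-objective shortfalls and the overall readiness of one domain."""
    ideal_total = sum(ideal for ideal, _ in objectives.values())
    audit_total = sum(audit for _, audit in objectives.values())
    print(f"{domain}: {audit_total}/{ideal_total} "
          f"({100 * audit_total / ideal_total:.0f}% of the ideal score)")
    for name, (ideal, audit) in objectives.items():
        if audit < ideal:
            print(f"  shortfall in '{name}': {ideal - audit} point(s)")

for domain, objectives in scores.items():
    domain_report(domain, objectives)
```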
6 Conclusion
This paper indicated that there is a real need to assess the performance of organizations and their information security policies according to the activity of each organization. The persons responsible for managing information and asset security in any organization should also take into account international directives and criteria, especially those of the pioneering countries and organizations in this field, such as the ENISA agency discussed in this paper. The paper also indicated the need to establish a technical team specialized in information security and cybersecurity to carry out internal auditing and monitoring using international measures and criteria, in order to determine the situation of the institution and its readiness in terms of information security and cybersecurity. Finally, the principles of security culture and awareness are not restricted to the employees working in information security only, but extend to all the staff, so that security is integrated among all the employees in the organization.
References 1. Whitworth, M.: Six Steps to a Better Security Strategy. Technical Report (2016) 2. Swiety, M.: Security Culture and how it affects your organization: Getting in touch with your human side. Web Page (2017). https://www.luxoft.com/blog/mswiety/security-culture-andhow-it-affects-your-organization-getting-in-touch/ 3. Roer, K.: Build a Security Culture. IT Governance Publishing (2015) 4. Al Hogail, A.: Cultivating and assessing an organizational information security culture; an empirical study. Int. J. Secur. Appl. 9(7), 163–178 (2015) 5. Study on the Evaluation of the European Union Agency for Network and Information Security. Technical Report, RAMBOLL (2017). https://openarchive.cbs.dk/bitstream/handle/10398/ 9524/EvaluationofENISA-FinalReport.pdf?sequence=1 6. Enisa Regulation (EU) No 526/2013 OF the European Parliament and of the Council. Official Journal of the European Union (2013) 7. Okere, I., van Niekerk, J., Carroll, M.: Assessing information security culture: a critical analysis of current approaches. In: The Proceedings of IEEE Conference on Information Security for South Africa (ISSA), pp. 1–8 (2012) 8. Whitman, M.E., Mattord, H.J.: Principles of Information Security. Course Technology, Boston (2012) 9. Renaud, K., Goucher, W.: The curious incidence of security breaches by knowledgeable employees and the pivotal role a of security culture. In: Human Aspects of Information Security, Privacy, and Trust, pp. 361–372. Springer, Switzerland (2014) 10. Hafizah Hassan, N., Ismail, Z., Maarop, N.: Proceedings of the 5th International Conference on Computing and Informatics, 11–13 August 2015, Istanbul, Turkey (2015) 11. Alhogail, A., Mirza, A., Bakry, S.H.: A comprehensive human factor framework for information security in organizations. J. Theor. Appl. Inf. Technol. 78(2), 201–211 (2015)
12. AIHogail, A., Mirza, A.: Organizational information security culture assessment. In: International Conference on Security and Management SAM (2015) 13. Munteanu, A.-B., Fotache, D.: Enablers of information security culture. Procedia Econ. Fin. 20, 414–422 (2015) 14. Antoniou, G.S.: Designing an effective information security policy for exceptional situations in an organization: An experimental study. Doctoral dissertation. Nova Southeastern University. Retrieved from NSU Works, College of Engineering and Computing, no. 949 (2015). https://nsuworks.nova.edu/gscis_etd/949 15. Da Veiga, A.: The influence of information security policies on information security culture: illustrated through a case study. In: Proceedings of the Ninth International Symposium on Human Aspects of Information Security & Assurance (HAISA 2015) (2015) 16. Masrek, M.N.: Assessing information security culture: the case of Malaysia public organization. In: Proceeding of 2017 4th International Conference on Information Technology, Computer, and Electrical Engineering (ICITACEE), Semarang, Indonesia, 18–19 October 2017 (2017) 17. Tolah, A., Furnell, S.M., Papadaki, M.: A comprehensive framework for cultivating and assessing information security culture. In: Proceedings of the Eleventh International Symposium on Human Aspects of Information Security & Assurance (HAISA 2017) (2017) 18. Glaspie, H.W., Karwowski, W.: Human factors in information security culture: a literature review. In: International Conference on Applied Human Factors and Ergonomics (2018) 19. Masrek, M.N., Harun, Q.N., Sahid, N.Z.: Assessing the information security culture in a government context: the case of a developing country. Int. J. Civil Eng. Technol. (IJCIET) 9(8), 96–112 (2018) 20. Tang, M., Zhang, T.: The impacts of organizational culture on information security culture: a case study. Inf. Technol. Manag. 17, 1–8 (2016) 21. Connolly, L., Lang, M., Tygar, D.: Managing employee security behaviour in organisations: the role of cultural factors and individual values. In: Proceedings of 29th IFIP International Information Security Conference (SEC), Marrakech, Morocco, June 2014 (2014) 22. Martins, N., DaVeiga, A.: An information security culture model validated with structural equation modelling. In: Proceedings of the Ninth International Symposium on Human Aspects of Information Security & Assurance (HAISA 2015) (2015) 23. Cyber Security Culture in organizations. Technical Report, ENISA (2017) 24. Dekker, M., Karsberg, C.: Technical guideline on security measures technical guidance on the security measures in article 13a. Technical Report, ENISA, Version 2.0 (2014) 25. Skopik, F., Settanni, G., Fiedler, R.: A problem shared is a problem halved: a survey on the dimensions of collective cyber defense through security information sharing. Comput. Secur. 60, 154–176 (2016)
Evaluation of Information Security Policy for Small Company
Wasnaa Kadhim Jawad(B)
University of Information Technology and Communications, Baghdad, Iraq
[email protected]
Abstract. Recently, the use of information technology and communications has increased dramatically in various governmental and private institutions and companies. It has therefore become necessary to protect information from various threats and breaches and to establish a detailed and precise information security policy that everyone must follow. The target of this paper is to assess the information security policy of a specific firm and to find out the strengths and weaknesses of its security policy based on the ENISA criteria. ENISA is the European Network and Information Security Agency, whose security guideline defines seven domains; each domain contains particular objectives for boosting and evaluating the company's security policy and for identifying shortfalls in its requirements. The obtained findings show that using the ENISA security criteria achieves high performance and significant efficiency in evaluating the measures taken to implement a reliable and robust information security policy approved by the company. Keywords: Information technology and communications · Information security policy · ENISA criteria
1 Introduction
Information security culture comprises the thoughts, habits, and techniques that a company uses to protect information from damage in all its forms, whether caused by persons (such as hackers) or software (such as computer viruses), and whether intentional or accidental. It protects information from unauthorized access, theft, capture, alteration, redirection, or misuse, in order to preserve the enterprise's ability to continue functioning properly and to create an environment free from risk and threat. Information security culture has three major elements: people, technology, and policies. Each element affects the others immediately and significantly, which means a change in one element will change the other elements as well, and consequently the structure of the security culture changes completely [3]. Security culture is defined by humans, so a framework for a robust information security policy must be put in place so that people can realize what the right thing to do for security is and consider it a rule by which they abide [4].
Therefore, there is a strong and obvious relationship between the culture and the policy of information security: the security culture has a positive impact on the security policy and makes it significantly more effective in companies [2]. An information security policy is a set of rules and written steps issued by the senior management of the company that users must follow to ensure information security; on this basis, the security policy must be clear and accurate. Establishing a security policy is extremely significant for any successful company in order to determine, handle, and manage risks. The information security policy constitutes a main and unified reference when information security tasks conflict with each other or with other tasks [1]. Security policies also clarify the duty and responsibility of each person regarding information security, and identify and protect the company's core resources [2]. In the small company considered here, various sections include employees with different cultures and opinions towards security. As the number of employees increases and the structure of the company changes, it becomes more difficult to control them; an information security policy is therefore a powerful instrument for enabling the employees to unify their attitudes towards security [3]. The goal of this paper is to evaluate the information security policy applied by a small company based on the ENISA approach, where ENISA is the European Union Agency for Cybersecurity [5]. ENISA provides reliable criteria and measures that can be relied on for evaluating any company in the field of information security [6].
2 Literature Review
Many investigators have presented various standards and methodologies for establishing, evaluating, and strengthening the security culture and security policy. For example, Okere et al. [7] affirmed that there is no particular tool or approach for evaluating the security culture in companies. Safa and Ismail [8] asserted that, despite the development of information and communication technologies, information security remains a widespread problem among experts as well as users. Renaud and Goucher [9] asserted that employees play a considerable role in protecting the security of the company. Moreover, focusing on information security culture is very important in decreasing the security threats that may take place in the company [10]. Al-Hogail et al. [11] stressed that the conduct of employees plays a significant role in maintaining the security policy inside the company. Furthermore, companies need guidelines and a complete framework in order to build a strong security culture [12]. It must be noted that information security does not rely on one employee, but on all the personnel in the company [13]. The company's security will also be at risk if a person does not carry out the security policy [14]. Furthermore, the most effective and significant factor for establishing the information security culture inside any company is the information security policy [15]. Masrek [16] confirmed that information security culture has features and requirements that differ from one company to another. Likewise, [17] pointed out the importance of providing a complete view that identifies the major factors that shape and impact the culture of information security. Generally, to gain a powerful security culture, institutions must ensure a mix of human behavioral aspects in addition to technical systems related to information security management [18].
Lastly, Masrek et al. [19] affirmed that security culture does not depend only on the personnel and their skills related to information security, but also on the guidelines and techniques that are available to protect the information in the company.
3 Information Security Culture Development
There are several methods to uphold the information security culture. First, confirm the principle that information security protection is the responsibility of every employee in the company. Second, focus on educating employees and increasing their security awareness by providing them with basic information security lessons. Third, encourage employees interested in information security by rewarding them [5]. Information security culture depends on the technology and on personnel conduct related to information security [20]. Furthermore, organizations face serious threats due to the increasing violations of important, confidential, and personal data [21]. The information security culture forms a substantial part of the security plan that guides personnel conduct inside the company [22]. Consequently, developing the information security policy within companies can decrease the risk of information security violations and avoid great financial losses [23].
4 Methodology
As mentioned previously, this paper uses ENISA as an evaluation method for the information security policy of the company [24]. ENISA plays an important role by adhering to a coordinated approach to responding to threats to network and information security [25]. According to ENISA, there are twenty-five security objectives, collected into seven domains, as shown in Fig. 1. For each of these objectives there are explicit security measures, which must be performed precisely to fulfill the security objective. There is also a list of detailed evidence, which indicates that all security measures have been correctly applied. For instance, the "Governance and risk management" domain in Table 1 is shown with its "Information security policy" objective together with the measures and evidence connected to it [24]. Other information about the remaining domains is available in ref. [24].
5 Results
To evaluate the information security policy, the ENISA approach has been applied to a small company. This company consists of 100 personnel and includes three centers:
– Networks and Internet Services
– Software Solutions
– Personnel Management
Fig. 1. ENISA domains and their objectives:
– Governance and risk management: Information security policy; Governance and risk administration; Security roles and responsibilities; Security of third party assets
– Human resources security: Background checks; Security knowledge and training; Personnel changes; Handling violations
– Security of systems and facilities: Physical and environmental security; Security of supplies; Access control to network and information systems; Integrity of network and information systems
– Operations management: Operational procedures; Change administration; Asset administration
– Incident management: Incident administration procedures; Incident detection capability; Incident reporting and communication
– Business continuity management: Service continuity strategy and emergency plans; Catastrophe recovery capability
– Monitoring, auditing, and testing: Monitoring and logging policies; Exercise emergency plans; Network and information systems testing; Security assessments; Compliance monitoring
Slow Internet connections between departments, delays in the services provided, the lack of continuous updating of anti-virus programs, and employees not realizing the importance of adhering to the security policy inside the company have all led to a weak network and a collapse in the managerial and organizational structure of the company. Applying the ENISA criteria in this company makes it possible to determine the weaknesses in the network and in the overall organizational framework of the company. The optimal score is the maximum score assigned when all measures are carried out for every objective in every domain; see, for example, the Networks and Internet Services Center in Table 2. Figure 2 shows the chart of the Networks and Internet Services Center. The rating score is the value given when all measures are checked for every objective in every domain. For instance, for the Software Solutions Center shown in Table 3, the optimal score is 93 but the rating score is 61, because most employees are unaware of the importance of implementing an information security policy. Figure 3 shows a chart of the Software Solutions Center.
Table 1. Governance and risk management domain [24]
Objective | Measures | Evidence
Information security policy | Situate an advanced security policy responsible for managing and handling the security and steadiness of the communication networks and/or services provided | Notarized security policy, involving networks and services, significant assets supporting them, and also the security aims
 | All employees must realize the significance of the information security strategy | Employees recognize the importance of implementing an information security policy in their work
 | Appraisal of the information security policies regularly | Policies of information security are the latest and accepted by superior administration
Table 2. Networks and internet services center
Domain | Objective | Measures | Optimal score | Rating score
Governance and risk management | Information security policy | Situate an advanced security policy responsible for managing and handling the security and permanency of the communication networks and/or services supplied | 75 | 53
 | | All employees must realize the significance of the information security policy | 83 | 51
 | | Appraisal of the information security policies regularly | 91 | 57
Also, for the Personnel Management Center shown in Table 4, the optimal score is 73 but the rating score is 33, due to the failure to put in place an advanced security policy responsible for managing and handling the security and permanency of the communication networks and/or services provided. Figure 4 shows the chart of the Personnel Management Center.
Fig. 2. Chart of networks and internet services center (optimal score vs. rating score per measure)
Table 3. Software solutions center
Domain | Objective | Measures | Optimal score | Rating score
Governance and risk management | Information security policy | Situate an advanced security policy responsible for managing and handling the security and permanency of the communication networks and/or services supplied | 85 | 43
 | | All employees must realize the significance of the information security policy | 93 | 61
 | | Appraisal of the information security policies regularly | 77 | 53
Fig. 3. Chart of software solutions center (optimal score vs. rating score per measure)
Table 4. Personnel management center
Domain | Objective | Measures | Optimal score | Rating score
Governance and risk management | Information security policy | Situate an advanced security policy responsible for managing and handling the security and permanency of the communication networks and/or services supplied | 73 | 33
 | | All employees must realize the significance of the information security policy | 91 | 51
 | | Appraisal of the information security policies regularly | 85 | 53
Fig. 4. Chart of personnel management center (optimal score vs. rating score per measure)
6 Conclusion
To obtain a safe environment, the management of the company must first understand the laws and systems related to information security and then apply them from the highest administrative level in the company down to the lowest executive level. This includes knowing what to protect and who deals with it, to ensure that the company follows a general program that guarantees full compliance with information security laws. This paper emphasized that there is a real need to evaluate the information security policies of any company based on the ENISA security criteria in order to raise the standard of security in every section of the company. Information security policies play a significant part in consolidating information security in the company and are a key pillar for upholding the security of information and, consequently, achieving the goals of the company. The security policy must be accurate, comprehensive, and clear, because it forms the starting point and foundation stone of information security in any company. The full benefit of security policies cannot be achieved without an integrated awareness and training program that makes all employees of the company aware of these concepts and apply them.
References 1. Whitworth, M.: Six Steps to a Better Security Strategy. Technical Report (2016) 2. Swiety, M.: Security Culture and how it affects your organization: Getting in touch with your human side. Web Page (2017). https://www.luxoft.com/blog/mswiety/security-culture-andhow-it-affects-your-organization-getting-in-touch/
3. Roer, K.: Build a Security Culture. IT Governance Publishing (2015) 4. Al Hogail, A.: Cultivating and assessing an organizational information security culture; an empirical study. Int. J. Secur. Appl. 9(7), 163–178 (2015) 5. Study on the Evaluation of the European Union Agency for Network and Information Security. Technical Report, RAMBOLL (2017). https://openarchive.cbs.dk/bitstream/handle/10398/ 9524/EvaluationofENISA-FinalReport.pdf?sequence=1 6. Enisa Regulation (EU) No 526/2013 OF the European Parliament and of the Council. Official Journal of the European Union (2013) 7. Okere, I., van Niekerk, J., Carroll, M.: Assessing information security culture: a critical analysis of current approaches. In: The Proceedings of IEEE Conference on Information Security for South Africa (ISSA), pp. 1–8 (2012) 8. Sohrabi, S.N., Akmar, I.M.: A customer loyalty formation model in electronic commerce. Econ. Model. 35, 559–564 (2013) 9. Renaud, K., Goucher, W.: The curious incidence of security breaches by knowledgeable employees and the pivotal role a of security culture. In: Human Aspects of Information Security, Privacy, and Trust, pp. 361–372. Springer, Switzerland (2014) 10. Hafizah Hassan, N., Ismail, Z., Maarop, N.: Proceedings of the 5th International Conference on Computing and Informatics, 11–13 August 2015, Istanbul, Turkey (2015) 11. Alhogail, A., Mirza, A., Bakry, S.H.: A comprehensive human factor framework for information security in organizations. J. Theor. Appl. Inf. Technol. 78(2), 201–211 (2015) 12. AIHogail, A., Mirza, A.: Organizational information security culture assessment. In: International Conference on Security and Management SAM (2015) 13. Munteanu, Adrian-Bogdanel., Fotache, D.: Enablers of information security culture. Procedia Econ. Fin. 20, 414–422 (2015) 14. Antoniou, G.S.: Designing an effective information security policy for exceptional situations in an organization: An experimental study. Doctoral dissertation. Nova Southeastern University. Retrieved from NSU Works, College of Engineering and Computing, no. 949 (2015). https://nsuworks.nova.edu/gscis_etd/949 15. Da Veiga, A.: The influence of information security policies on information security culture: illustrated through a case study. In: Proceedings of the Ninth International Symposium on Human Aspects of Information Security & Assurance (HAISA 2015) (2015) 16. Masrek, M.N.: Assessing information security culture: the case of Malaysia public organization. In: Proceeding of 2017 4th International Conference on Information Technology, Computer, and Electrical Engineering (ICITACEE), Semarang, Indonesia, 18–19 October 2017 (2017) 17. Tolah, A., Furnell, S.M., Papadaki, M.: A comprehensive framework for cultivating and assessing information security culture. In: Proceedings of the Eleventh International Symposium on Human Aspects of Information Security & Assurance (HAISA 2017) (2017) 18. Glaspie, H.W., Karwowski, W.: Human factors in information security culture: a literature review. In: International Conference on Applied Human Factors and Ergonomics (2018) 19. Masrek, M.N., Harun, Q.N., Sahid, N.Z.: Assessing the information security culture in a government context: the case of a developing country. Int. J. Civil Eng. Technol. (IJCIET) 9(8), 96–112 (2018) 20. Tang, M., Zhang, T.: The impacts of organizational culture on information security culture: a case study. Inf. Technol. Manag. 17, 1–8 (2016) 21. 
Connolly, L., Lang, M., Tygar, D.: Managing employee security behaviour in organisations: the role of cultural factors and individual values. In: Proceedings of 29th IFIP International Information Security Conference (SEC), Marrakech, Morocco, June 2014 (2014) 22. Martins, N., DaVeiga, A.: An information security culture model validated with structural equation modelling. In: Proceedings of the Ninth International Symposium on Human Aspects of Information Security & Assurance (HAISA 2015) (2015)
23. Cyber Security Culture in organizations. Technical Report, ENISA (2017) 24. Dekker, M., Karsberg, C.: Technical guideline on security measures technical guidance on the security measures in article 13a. Technical Report, ENISA, Version 2.0 (2014) 25. Skopik, F., Settanni, G., Fiedler, R.: A problem shared is a problem halved: a survey on the dimensions of collective cyber defense through security information sharing. Comput. Secur. 60, 154–176 (2016)
English-Hindi Cross Language Query Translation and Disambiguation Using Most Salient Seed Word
Pratibha Maurya(B)
Amity Institute of Information Technology, Amity University, Lucknow Campus, Uttar Pradesh, India
[email protected]
Abstract. Natural languages suffer from two types of ambiguity, namely lexical ambiguity and syntactic ambiguity. This paper deals only with lexical ambiguity, the ambiguity that arises when a word has two or more possible meanings; the English language is no exception. To translate an English query into Hindi in Cross Language Information Retrieval, these ambiguous words need to be disambiguated properly for relevant Hindi language documents to be retrieved. This paper aims to find the most salient context word in the English query and use it as a single disambiguation feature, in contrast to using the entire query as the context for disambiguation. When the entire query is used as the context, all the terms are assumed to be equally important, but this is not always true. Not all the terms in the source query are as predictive of the word being translated as others, and thus treating all query terms as uniformly important may not always be a wise decision. This paper investigates this claim by proposing two methods which use either the statistical mean or a contribution ratio to find the best context seed word for disambiguating user query terms. The proposed methods are compared to a baseline method which uses the entire query as the disambiguation feature. The proposed methods achieve 85% of the baseline precision, which is quite good, and thus these methods can be used with high confidence for query translation and disambiguation instead of using the entire query context, as done by most researchers. Keywords: Cross Language Information Retrieval (CLIR) · Translation ambiguity · Translation disambiguation · Word co-occurrence feature · Salient seed word
1 Introduction
The enormous amount of information on the web and continuously improving information retrieval methods have enabled people to mine the huge amount of information available on the internet. The growth of multilingual content on the web, and of users who prefer non-English documents, has inspired researchers to retrieve information across language boundaries. A cross language information retrieval system retrieves documents whose language is different from the query language [1, 2].
Such systems prove useful for users who have limited knowledge of the document language and face difficulty in writing efficient queries. While English still dominates on the web, demand for retrieving Indian language documents has risen rapidly in recent years. Management consultancy KPMG India, in association with the search giant Google, conducted a study on 2,448 rural and 4,612 urban Indian citizens in 2017 and predicted that by the year 2021 there will be 536 million Indian language internet users out of a total of 735 million internet users in India, leaving only 199 million English internet users [3]. Among these Indian languages, Hindi is the most popularly searched language on the internet [3]. These Hindi language users at times find it hard to write effective queries in Hindi, perhaps due to limited vocabulary or to computers not supporting an Indian language keyboard. Such users would like to retrieve relevant Hindi documents on the web using English queries. CLIR systems allow them to express their information need in English while the system matches it appropriately with relevant documents in the Hindi language. However, CLIR between Western and Asian languages is not an easy task, as the two language families differ significantly in their structural and written forms. A source language word can have many senses, which translate to different words in the target language; this is known as translation ambiguity [4, 5]. To achieve an efficient CLIR system using dictionary-based query translation, the basic requirement is to solve the problem of ambiguity of the source query terms. The process of selecting the correct translation of query terms in the given context is termed Translation Disambiguation [6, 7]. The motivation behind this work is that nearly all researchers working in Indian language CLIR have used the entire context of the source query to deal with the problem of translation ambiguity. Cheung et al. claim that not all the words present in the query are equally important and that better performance can be achieved by disambiguating with respect to the most salient word in the query [8, 11, 12]. This salient word significantly impacts the disambiguation of the ambiguous target query word [8–15]. The major contribution of this research work is to propose new methods to find the most salient word in a given user query. The paper also shows that identifying the best salient seed word works well for translation and disambiguation in the Indian language perspective. The proposed methods are unsupervised and do not use any annotated training data. The paper is structured as follows: Sect. 2 covers the proposed approach by discussing the two methods to find the most salient seed word. Section 3 describes the data used for the experiments, the baseline, and the evaluation of the proposed methods, and ends with a discussion of the results obtained. Finally, Sect. 4 concludes the research work.
2 Proposed Approach
Many authors have proposed a range of options for choosing seed words to disambiguate query word senses [8–10, 15]. A seed word is the word in a query which can be used alone to disambiguate the target word efficiently. For example, consider the sentence "I must go to the bank and change some money." Though 'bank' can be the shore of a river or a financial institution, the term 'money' is a good indicator that bank here refers to
a financial institution. This suggests that, instead of using the entire query, we can also try to locate the most salient seed word in the query for translation candidate disambiguation. This assumption is in accordance with many researchers who also believe that if the most salient word is preferred for disambiguation, better performance can be expected [8, 14, 15]. This paper proposes to use (i) the statistical mean and (ii) the maximum contributing query term to find the salient seed word for disambiguation, and evaluates the methods on the English–Hindi language pair. Let the English query be represented as a set {(e_1, H_1), (e_2, H_2), …, (e_n, H_n)}, where e_i is an English query term and H_i = (h_i1, h_i2, …, h_ij) is the list of translation candidates of e_i obtained from a bilingual dictionary. Let the English term e_i be the target term for which the most salient query term must be selected. For each e_i, we calculate the Dice coefficient DC(h_ij, h_kl), where h_ij is a translation candidate of e_i (represented in columns) and h_kl is a translation candidate of e_k (represented in rows), with k ≠ i, as shown in Table 1.
Table 1. DC scores between the translation candidates of e_i and e_k (k ≠ i): the columns are the candidates h_i1, …, h_iq of e_i, the rows are the candidates h_11, h_12, …, h_k1, h_k2, …, h_n1, …, h_nr of all other query terms e_k (k ≠ i), and each cell in row h_kl and column h_ij holds DC(h_ij, h_kl).
After finding the required DC scores, we attempt to find the most contributing term in the given query for disambiguating a specific query term.
Method 1: "The statistical mean refers to the mean or average that is used to derive the central tendency of the data in question. It is determined by adding all the data points in a population and then dividing the total by the number of points. The resulting number is known as the mean or the average" [16]. The mean is a statistical property rather than a measure of the semantic goodness of the context terms. The intuition for choosing the context word with the highest mean as the disambiguating feature for the target query word is that the specificity of index terms can be interpreted statistically, as a function of their central tendency, rather than in terms of term meaning. For each row p in Table 1, the statistical mean is calculated as follows:

mean(p) = \frac{1}{q} \sum_{1 \le j \le q;\ i,k,l \in \mathbb{N}} DC(h_{ij}, h_{kl})    (1)

where q is the number of translation candidates of e_i and \mathbb{N} is the set of natural numbers. The Dice coefficient (DC) measures the associativity between two translation candidates. The DC value lies between 0 and 1 (with 1 being perfect co-occurrence), and has therefore been selected to measure association strength over other measures such as the log-likelihood ratio and mutual information, which have no upper bound [17]. Next, we find the translation candidate which has the maximum mean value:

\Psi = \arg\max_{1 \le p \le t} \, mean(p)    (2)

The translation candidate h_{kl} whose row has the maximum mean value \Psi is then selected; here t is the total number of Hindi translation candidates. Finally, the English source query term e_k (corresponding to h_{kl}) is chosen as the most discriminative term to disambiguate the source query term e_i.
Method 2: Calculate two contribution ratios \varphi_1 and \varphi_2 for each row p in Table 1 as:

\varphi_{1,e_i}(h_{kl}) = \frac{DC(h_{ij}, h_{kl})_{first}}{DC(h_{ij}, h_{kl})_{second}}    (3)

\varphi_{2,e_i}(h_{kl}) = \frac{DC(h_{ij}, h_{kl})_{second}}{DC(h_{ij}, h_{kl})_{third}}    (4)

where DC(h_{ij}, h_{kl})_{first}, DC(h_{ij}, h_{kl})_{second} and DC(h_{ij}, h_{kl})_{third} are the highest, second largest and third largest DC scores in row p.

\Omega_{1,e_i} = \arg\max \, \varphi_{1,e_i}(h_{kl})    (5)

\Omega_{2,e_i} = \arg\max \, \varphi_{2,e_i}(h_{kl})    (6)

The translation candidate h_{kl} whose row has the highest value for both \varphi_1 and \varphi_2 is then selected. Finally, the English query term e_k (corresponding to h_{kl}) with the maximum contribution is chosen as the most discriminative term for the disambiguation of e_i.
Proposed Algorithm: Most salient context seed word using statistical mean or contribution ratio
Input: English source query E = {e_1, e_2, …, e_n}
1. Retrieve the translation candidates TC_i = {h_ij | 1 ≤ i ≤ n and 1 ≤ j ≤ m} from the bilingual dictionary for every e_i.
2. For each e_i ∈ E do step 2.1
   2.1. Extract all translation candidates W = {h_kl | 1 ≤ k ≤ n and k ≠ i} of the context words e_k, k ≠ i.
3. For each h_kl ∈ W do step 3.1 or 3.2
   3.1. If method = 1, calculate the statistical mean (Eq. 1).
   3.2. Else, calculate the contribution ratios (Eqs. 3 and 4).
4. Select h_kl with the highest statistical mean \Psi (Eq. 2) or with the maximum contribution ratios \Omega_1 and \Omega_2 (Eqs. 5 and 6).
5. Mark e_k (corresponding to h_kl) as the salient context word.
Output: Select the translation candidate h_ij with the highest DC score with h_kl as the target translation for e_i.
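As a concrete illustration of the algorithm above, the Python sketch below selects the salient context word for one target term from a precomputed DC table and then picks the target translation. The dictionary-of-dictionaries layout of the DC scores, the function name, and the toy values are our own illustrative choices, not the paper's implementation; only the statistical-mean variant (Method 1) is shown, and the contribution-ratio variant would replace the row statistic accordingly.

```python
# Sketch of the salient-seed-word selection (Method 1, statistical mean).
# dc_rows[(e_k, h_kl)] maps each candidate h_ij of the target term e_i to
# DC(h_ij, h_kl); the layout is an illustrative assumption.

def select_salient_and_translate(target_candidates, dc_rows):
    """
    target_candidates: Hindi candidates h_ij of the target term e_i.
    dc_rows: dict mapping (context term e_k, candidate h_kl) -> {h_ij: DC score}.
    Returns (salient context term, chosen translation of e_i).
    """
    best_row, best_mean = None, -1.0
    for (context_term, h_kl), scores in dc_rows.items():
        row_mean = sum(scores.get(h, 0.0) for h in target_candidates) / len(target_candidates)
        if row_mean > best_mean:                      # Eqs. (1) and (2)
            best_row, best_mean = (context_term, h_kl), row_mean
    context_term, _h_kl = best_row
    # Target translation: the candidate of e_i with the highest DC to the chosen h_kl.
    translation = max(target_candidates, key=lambda h: dc_rows[best_row].get(h, 0.0))
    return context_term, translation

# Toy usage with made-up scores (not from the paper):
dc_rows = {
    ("money", "paisa"): {"kinara": 0.05, "bank_h": 0.62},
    ("go", "jaana"):    {"kinara": 0.10, "bank_h": 0.12},
}
print(select_salient_and_translate(["kinara", "bank_h"], dc_rows))
# -> ('money', 'bank_h')
```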
3 Experimental Setup
A corpus containing 5000 Hindi articles in UTF-8 encoding was developed. These articles have been extracted from the popular Hindi newspapers Amar Ujala, Web Dunia and Dainik Jagran, with each article having a size of 25 KB on average. This corpus is used to calculate the Dice coefficient (DC) scores between Hindi translation candidates obtained from Shabdanjali [18] and the English–Hindi mapping [19], the most popular translation resources for the English–Hindi language pair. The dictionary was developed at IIIT Hyderabad and contains 28 K Hindi words. A stop word list with 507 English words is utilized to eliminate stop words from the test queries. Inflected English query words are reduced to their base form using the Porter Stemmer [20]. A set of 50 queries from the FIRE (Forum for Information Retrieval Evaluation) 2008-12 dataset is taken as test queries. The proposed methods are evaluated using Google- and Bing-indexed web documents. The relevance judgments of the Hindi documents retrieved for the English test queries are established by three assessors with profound knowledge of both English and Hindi. The document relevance is stated on a four-value
scale, with scores ranging from 4 (the document exactly covers the information the user wants) to 1 (the document is considered irrelevant).
3.1 Evaluation Method
Two sets of experiments, i.e. (i) finding the most salient word using the statistical mean and (ii) finding the most salient word using the contribution ratio, are carried out to evaluate the performance of the proposed methods. The results are compared with those obtained from the 'Two Level Disambiguation' method, used here as the baseline. The Two Level Disambiguation method uses the entire context of the query to translate a particular source query word; this method was proposed in [20].
3.2 Evaluation Result
The protocol used for evaluation is inspired by TREC (Text REtrieval Conference). Precision values are computed after ten (Prec@10), twenty (Prec@20) and fifty (Prec@50) documents, respectively. This decision is motivated by the fact that most of the users' search result click activity (89.8%) happens on the first page of search results [21]. Another reason is that, at times, users are presented only with the top n documents retrieved instead of the complete list of results. Assessing a CLIR system using these evaluation measures is therefore very significant. Further evaluation measures, namely the Normalized Discounted Cumulated Gain (NDCG), calculated on the first ten documents retrieved and then on the entire ranking (NDCG@10 and NDCG, respectively), and the Mean Average Precision (MAP), are used to compare our proposed methods with the all-query-terms approach (baseline) [22]. The results are shown in Tables 2 and 3.
Table 2. CLIR experiment results with the FIRE 2008-12 query set on the Google search engine
Experimental run | P@10 | P@20 | P@50 | NDCG@10 | NDCG | MAP | Percentage of baseline
All query words approach (Baseline) | 0.483 | 0.420 | 0.309 | 0.405 | 0.434 | 0.518 | –
Single query word using statistical mean method | 0.344 | 0.251 | 0.218 | 0.393 | 0.430 | 0.441 | 85.13%
Single query word using contribution ratio method | 0.397 | 0.326 | 0.212 | 0.398 | 0.432 | 0.453 | 87.45%
The NDCG uses a graded relevance scale to measure the relevancy or usefulness of a document based on its position in the set of documents retrieved for the user query; the higher a document is ranked in the list, the more it contributes to the gain [23]. The evaluation has been done on Hindi documents retrieved using the Google and Bing search engines.
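For readers unfamiliar with the metric, the following Python sketch computes DCG and NDCG@k from the four-value graded relevance judgments used in this paper. The discounting formula DCG = Σ rel_i / log2(i + 1) follows the standard cumulated-gain definition of [23]; the variable names and the toy judgments below are illustrative and not taken from the paper's experiments.

```python
import math

# Standard DCG / NDCG over graded relevance scores (4 = exact match ... 1 = irrelevant).
# The discount log2(i + 1) follows the usual definition [23]; the values below are made up.

def dcg(relevances, k):
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances[:k], start=1))

def ndcg(relevances, k):
    ideal_dcg = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

retrieved = [4, 2, 3, 1, 1, 2, 1, 1, 1, 1]   # judgments of the top-10 retrieved documents
print(round(ndcg(retrieved, 10), 3))
```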
Table 3. CLIR experiment results with the FIRE 2008-12 query set on the Bing search engine
Experimental run | P@10 | P@20 | P@50 | NDCG@10 | NDCG | MAP | Percentage of baseline
All query words approach (Baseline) | 0.412 | 0.358 | 0.263 | 0.392 | 0.342 | 0.441 | –
Single query word using statistical mean method | 0.301 | 0.198 | 0.115 | 0.386 | 0.336 | 0.363 | 82.31%
Single query word using contribution ratio method | 0.322 | 0.200 | 0.187 | 0.389 | 0.339 | 0.379 | 85.94%
The proposed methods have also been evaluated on the Bing search engine, to check whether the proposed methods are favored by a particular search engine. Figures 1 and 2 plot the average precision reached by the individual queries of the test query set on the Google and Bing search engines for the three disambiguation methods.
Fig. 1. Graph representing different average precision values obtained on the Google search engine (baseline, statistical mean, and contribution ratio methods per query)
Tables 2 and 3 compare the results achieved by the two proposed disambiguation methods with those obtained by the baseline on the various metrics.
Fig. 2. Graph representing different average precision values obtained on the Bing search engine (baseline, statistical mean, and contribution ratio methods per query)
The performance of the proposed methods is 85.13% and 87.45%, respectively, of the baseline run on Google, and 82.31% and 85.94%, respectively, on the Bing search engine. The NDCG scores of the two proposed methods are 0.430 and 0.432 on Google and 0.336 and 0.339 on Bing, respectively. It can be clearly seen that the performance of the two proposed methods is comparable to the baseline for all the considered metrics. This quantifies the capability of the proposed approaches to produce an effective document ranking, which is also confirmed by the MAP values obtained. The values obtained for the NDCG and NDCG@10 metrics emphasize that the produced rankings are also effective from a quality point of view for the two methods of finding the most salient seed word for disambiguation. Researchers usually use all the terms present in the query for disambiguation, as this increases the context available for disambiguation. When the entire query is used as the context, all the terms are assumed to be equally important, but this is not always true. Not all the terms in the source query are as predictive of the word being translated as others, and thus treating all query terms as uniformly important may not always be a wise decision. The co-occurrence information calculated using the Dice coefficient captures the dependency between the query terms and the translation candidate: the relationship is much stronger between words that are highly dependent on each other than between those outside such dependency constructs. Due to syntactic ambiguity, however, these dependencies are hard to learn. Inspection of the experimental output of this research work shows that such dependencies can be identified using the statistical mean or the contribution ratio together with DC, achieving satisfactory precision.
4 Conclusion
The word co-occurrence feature has often been used for the translation disambiguation task. Usually, researchers have utilized this feature to disambiguate source terms considering
all the remaining terms of the query as the context for disambiguation. The work discussed in this paper differs from previous works by proposing new methods to find the most salient seed word for performing disambiguation and by verifying that this concept of a salient seed word also works well in the Indian language perspective. Both methods described in this paper for locating the seed word perform the task well. This is confirmed by the evaluation results, obtained by evaluating the two proposed approaches on a document collection assessed following TREC guidelines, which reach 85% of the baseline approach in terms of document retrieval performance. Thus, the most salient seed word can be used instead of the entire query context for disambiguation. Further research is needed to see whether the n best salient seed words can make the results even better. In future, the effect of using other measures of association strength between terms on finding the best salient word will also be explored.
References 1. Sever, Y., Ercan, G.: Evaluating cross-lingual textual similarity on dictionary alignment problem. Lang. Res. Eval. 1–20 (2020) 2. Bhattacharya, P., et al.: Using communities of words derived from multilingual word vectors for cross-language information retrieval in Indian languages. ACM Trans. Asian Low Res. Lang. Inf. Process. 18(1), 1–27 (2018) 3. https://assets.kpmg.com/content/dam/kpmg/in/pdf/2017/04/Indian-languages-Defining-Ind ias-Internet.pdf 4. Çakal, Ö.Ö., Mahdavi, M., Abedjan, Z.: CLRL: feature engineering for cross-language record linkage. In: EDBT, pp. 678–681 (2019) 5. Chandra, G., Dwivedi, S.K.: Query expansion for effective retrieval results of hindi–english cross-lingual IR. Appl. Artif. Intell. 33(7), 567–593 (2019) 6. Rekabsaz, N., et al.: Addressing Cross-Lingual Word Sense Disambiguation on Low-Density Languages: Application to Persian. Computation and Language, Cornell University Library, arXiv.org > cs > arXiv:1711.06196, (2017) 7. Mohamed, E., Elmougy, S., Aref, M.: Toward multi-lingual information retrieval system based on internet linguistic diversity measurement. Ain Shams Eng. J. 10(3), 489–497 (2019) 8. Cheung, P., Fung, P.: Translation disambiguation in mixed language queries. Mach. Transl. 18, 251–273 (2004) 9. Lacerra, C., Bevilacqua, M., Pasini, T., Navigli, R.: CSI: a coarse sense inventory for 85% word sense disambiguation. In: AAAI, pp. 8123–8130 (2020) 10. Wang, Y., Yin, F., Liu, J., Tosato, M.: Automatic construction of domain sentiment lexicon for semantic disambiguation. Multi. Tools Appl. 79, 22355–22373 (2020) 11. Rosenfeld, R.: A Corpus-Based Approach to Language Learning, Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA (1995) 12. Dagan, I., Alon, I.: Word sense disambiguation using a second language monolingual corpus. Comput. Linguist. 20, 564–596 (1994) 13. Zagibalov, Taras, Carroll, J.: Automatic seed word selection for unsupervised sentiment classification of Chinese text. In: COLING 2008: Proceedings of the 22nd International Conference on Computational Linguistics. ACL, pp. 1073–1080, Morristown, NJ, USA (2008) 14. Butnaru, A.M., Ionescu, R.T.: ShotgunWSD 2.0: Aan improved algorithm for global word sense disambiguation. IEEE Access, 7, 120961–120975 (2019)
15. Li, Z.H.I., Yang, F.A.N., Luo, Y.: Context embedding based on bi-LSTM in semi-supervised biomedical word sense disambiguation. IEEE Access 7, 72928–72935 (2019) 16. https://www.techopedia.com/definition/26136/statistical-mean 17. Christof, M. and Bonnie, J. D.: Iterative translation disambiguation for cross-language information retrieval. In: Proceedings of the 28th annual international ACM SIGIR Conference on Research and Development in Information Retrieval, pp 520–527 (2005) 18. http://ltrc.iiit.ac.in/onlineServices/Dictionaries/Dict_Frame.html 19. http://www.cfilt.iitb.ac.in/Downloads.html 20. Bajpai, P., Verma, P., Abbas, S.Q.: Two level disambiguation model for query translation. Int. J. Electr. Comput. Eng. (IJECE) 8(5) (2018) 21. Spink, A., Jansen, B., Blakely, C., Koshman, S.: A study of results overlap and uniqueness among major web search engines. Inf. Process. Manage. 42(5), 1379–1391 (2006) 22. Bajpai, P., Verma, P., Abbas, S.Q.: English- Hindi cross language information retrieval system: query perspective. J. Comput. Sci. 14(5), 705–713 (2018) 23. Jarvelin, K., Kekalainen, J.: Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. 20(4), 422–446 (2002)
Relay UAV-Based FSO Communications over Log-Normal Channels with Pointing Errors
Ha Duyen Trung(B) and Nguyen Huu Trung
Department of Aerospace Electronics, School of Electronics and Telecommunications, Hanoi University of Science and Technology, Room 410, C9 Building, No.1, Dai Co Viet Road, Hanoi 10000, Vietnam
[email protected]
Abstract. This paper presents study of two-hop free-space optical (FSO) communication systems using a hovering unmanned aerial vehicles (UAVs) relay over atmospheric turbulence-induced fading, pointing error due to the position and orientation deviations of hovering UAVs and link interruption due to the joint effects of angle-of-arrival (AoA) fluctuation and receiver field-of-view (FoV) limitation. UAV acts as an amplify-andforward (AF) relay-assisted FSO communications that employs singlecarrier quadrature amplitude modulation (QAM) considering the influence of the pointing error displacement standard deviation. An analytical expression is derived to evaluate the average symbol error rate (ASER) performance of the system model. Tight lower and upper approximations of the Gaussian Q-function are proposed for accurately approximating the ASER performance. Numerical results show the impact of pointing errors on the system performance over the log-normal channels. Specifically, the impact of the transmitter beam waist radius on the system’s performance is more significant in low regions of the pointing error displacement standard deviation than that of high regions. Keywords: Performance bounds · Free-space optics (FSO) · Amplify-and-forward · Log-normal channels · Pointing errors
1
Introduction
There has been recently a fast growing development of unmanned aerial vehicles (UAVs), also known as “drones”, for widely civilian applications, especially in industry, intelligent agriculture, forestry, live television [1]. Their coses are declining while providing more efficiency. In particular, UAVs may move to complex locations or remove areas, which can be dangerous to humans and hover stably over the desired area acting as relay node to cooperate communication links between the ground nodes and a central point (CP) [2]. In these circumstances, free-space optical (FSO) communications, a license-free, cost-effective, easy to deploy, high bandwidth and security access technique, has received considerable c The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. Abraham et al. (Eds.): ISDA 2020, AISC 1351, pp. 59–68, 2021. https://doi.org/10.1007/978-3-030-71187-0_6
60
H. D. Trung and N. H. Trung
candidate for fronthauling the multimedia data gathered by the flying UAVs to the CP [1,3]. However, one of major deteriorations to the performance of UAVbased FSO communications is the effect of turbulence conditions caused by the refractive index variations due to temperature heterogeneity, pressure fluctuations in the air propagation path of the laser beam [4,5]. Another important factor is UAV’s fluctuations results the pointing error displacement deviation [6]. These result in irradiance fluctuations in the received signal, severe directly degradation of the tracking system performance. In terrestrial FSO communication systems, the AF multi-hop relaying model has been proposed as a promising solution to extend the transmission length and mitigate the atmospheric turbulence induced fading due to fluctuations of signal intensity, known as scintillation [5]. This is because of the fact that the fading is reduced proportionally to the link length. In addition to the atmospheric turbulence, misalignment between transmitter and receiver due to building sway causes vibrations of the transmitted beam and also limits the performance of atmospheric FSO communication system [7–13]. Most recently, authors in [14– 20] has been studied on the various performance of multi-hop both amplifyand-forward and decode-and-forward terrestrial FSO communication systems with/without pointing errors over different kinds of turbulence channels. However, in these works, the exact error rate performance have not been clarify. In this paper, we study analyze the pointing error effects on performance of FSO systems employing single-carrier QAM signals and UAVs acting as AF relays over weak atmospheric turbulence channel modeled by the log-normal distribution. In the analysis, a expression for ASER will be derived taking into account the atmospheric loss. Influence of the number of relaying stations and the effects of pointing error displacement standard deviation on the system’s ASER performance are also quantitatively analyzed. Moreover, performance in recent studies has computed integrals directly involving the Gaussian Q-function, which are impossible to express in terms of elementary functions. As a result, approximating the Gaussian Q-function in closed-form expressions with high accuracy becomes a essential. Therefore, this paper is also to accurately approximate the ASER performance by employing tight approximations on upper bound Qub (·) and lower bound Qlb (·). The rest of this paper is organized as follows. In Sect. 2, we introduce system and channel models of multi-hop serial relay FSO transmission considering the pointing errors over log-normal atmospheric turbulence channel. In Sect. 3, bounds of performance on the error rate of the considered system are proposed. In Sect. 4, the numerical results for performance bounds are presented. The paper concludes with a summary given in Sect. 5.
2 2.1
System and Channel Models System Model
We consider an AF two-hop relay FSO transmission system as shown in Fig. 1, which operates over independent and not identically distributed fading chan-
Relay UAV-Based FSO Communications
61
Fig. 1. A typically serial relaying FSO communication system
nels. The source node S and destination node D can be connected using optical wireless links arranged in an end-to-end configuration such that the source node S communicates with the destination node D through c relaying nodes R1 , R2 , ..., Rc−1 , Rc . It is assumed that all relaying terminals concurrently receive and transmit in the same frequency band, and no latency is incurred in the whole chain of relay transmissions and there is no multihop diversity. In Fig. 2, at the transmitter of the first hop, QAM symbol is up-converted to an intermediate frequency fc to generate the electrical e(t). This electrical QAM signal is then used to modulate the intensity of a laser of the transmitter. Therefore, the transmitted optical intensity is s(t) = Ps {1+κ[sI (t) cos(2πfc t)−sQ (t) sin(2πfc t)]}, where Ps denotes the average optical power per symbol, κ is the modulation index, and fc is the intermediate frequency. Due to the effects of atmospheric loss, atmospheric turbulence and the pointing error, the received optical intensity signal at the first relaying terminal is s1 (t) = XPs {1+κ[sI (t) cos(2πfc t)−sQ (t) sin(2πfc t)]}. X presents the signal scintillation caused by atmospheric lost, atmospheric turbulence and pointing errors. At each relaying terminal, an AF module is used for signal amplification. The electrical signal output of the AF module at the first relaying node is therefore expressed as e1 (t) = XPs PAF κe(t) + υ1 (t), where and PAF are the responsivity of the PhotoDetector (PD) and the AF’s amplify power, respectively. The receiver noise υ1 (t) can be modeled as an additive white Gaussian noise (AWGN) process with power spectral density hops, the output electrical N0 . Repeating the above process over next c c signal c i + i=1 υi (t). of the PD at the D terminal is re (t) = Ps κe(t) i=1 2i+1 Xi+1 PAF Plus, the instantaneous signal-to-noise ratio (SNR), denoted as γ, at the input of the electrical demodulator of the optical receiver of the destination terminal, is defined as the ratio of the time-averaged AC photo-current power to
Fig. 2. A two-hop serial relaying FSO-UAV communication system
62
H. D. Trung and N. H. Trung
Fig. 3. The source node (transmitter), relaying node and destination node (receiver) of FSO systems using SC-QAM signals
i c )] [κPs ci=1 (2i+1 Xi+1 PAF 2 the total noise variance. γ = = γ¯ [ i=1 Xi+1 ] , where N0 c 2 i ) /N0 is defined as the average electrical SNR and γ¯ = κPs i=1 (2i+1 PAF N0 is the total noise variance.
2.2
2
Channel Model with Pointing Errors
As described above, X represents the optical intensity fluctuations resulting from atmospheric loss, Xl , atmospheric turbulence, Xa , and pointing error, Xp , which can be described as X = Xl Xa Xp . Atmospheric Loss. Atmospheric loss Xl is a deterministic component that exhibits no randomness in its behavior, thus acting as a fixed scaling factor over a long time period. It is modeled in [5] as Xl = e−σL , where σ denotes a wavelength and weather - dependent attenuation coefficient, and L is the link distance. Log-Normal Channel Model. The log-normal distribution is used to model the weak atmospheric turbulence channel. The probability density function (pdf) of the irradiance intensity, Xa , is given by [5] 2 ln(Xa ) + 0.5σI2 1 √ exp − fXa (Xa ) = . (1) 2σI2 Xa σI 2π
Relay UAV-Based FSO Communications
In Eq. (1), σI2 is the log intensity, it is defined by ⎛ ⎜ σI2 = exp ⎝
0.49σ12 12/5
1 + 0.18d2 +0.56σ2
7/6 +
63
⎞ 0.49σ12 12/5
1 + 0.18d2 +0.56σ2
⎟ 7/6 ⎠−1, (2)
where d = kD2 /4L, k = 2π/λ is the wave number, λ is the wavelength, D is the receiver diameter, and σ2 is the Rytov variance, which is defined in [1] as σ2 = 0.492Cn2 k 7/6 L11/6 . Cn2 is the altitude-dependent based refractive-index structure parameter. Through c relaying terminals, the pdf of Xa for (c + 1) AF relaying hops based FSO systems over log-normal atmospheric turbulence channel can be re-written as 2 ln(Xa )+0.5σI2 1 √ exp − fXa(Xa ) = . (3) 2σI2 (c+1)Xac+1 σI 2π Pointing Error Fading Model. A statistical pointing error model is developed in [10]. The model assumes a circular detection aperture and a Gaussian spatial intensity profile of the beam waist radius, ωz , on the receiver plane. Corre2 2 spondingly, the pdf of Xp is given as fXp(Xp ) = Aξξ2 Xpξ −1 , where 0 ≤ Xp ≤ A0 , 2
A0 = [erf(ν)] is the fraction of the collected power at the radial distance of √ x 2 0 [10]. The Gauss error function erf(·) is defined as erf(x) = 2/ π 0 e−t dt, √ √ the parameter ν = πr/ 2ωz with r and ωz denote the aperture radius and the beam waist at the distance z, respectively. The parameter ξ = ωzeq /2σs , is the ratio between the equivalent beam radius at the receiver and the pointing error displacement standard deviation, σs , at the receiver. The equivalent √ 2 πerf(ν) [10], where beam radius, ωzeq , can be calculated by ωzeq = ωz 2ν exp(−ν 2) 2 1/2 ωz = ω0 1+ λL/πω02 with ω0 is the transmitter beam waist radius at −3/5 z = 0, and = 1 + ω02 /ρ20 , ρ0 = 0.55Cn2 k 2 L is the coherence length. The Combined Channel Model. The unconditional pdf, fX (X), of the whole channel state, X, is obtained by calculating the combination of X and the distribution fXa(Xa ) fX (X) = fX|Xa (X|Xa ) fXa (Xa )dXa , where fX|Xa (X|Xa ) = 1 Xa Xl fXp
X Xa Xl
denotes the conditional probability given a turbulence chan-
nel state, Xa [10]. As a result, the unconditional pdf for log-normal atmospheric turbulence conditions through c relaying terminals can be expressed by
64
H. D. Trung and N. H. Trung 2 ξ2 fX (X) = X ξ −1 (c+1)(A0 Xl )ξ2 ∞ 1 × √ ξ 2 +c+1 σI 2π (X/Xl A0 ) Xa 2 ln(Xa ) + 0.5σI2 ×exp dXa . 2σI2
(4)
√ in the Letting t = {ln(Xa ) + a} 2σI , Eq. (4) can be obtained closedX 2 ln +a 2 X A √l 0 form expression as fX (X) = (c+1)(Aξ X )ξ2 X ξ −1 12 eb × erfc , where 2σ 0
I
l
a = 0.5σI2 + σI2 (ξ 2 + c) and b = σI2 (ξ 2 + c)(1 + (ξ 2 + c))/2. Again, letting n = 0.5σI2 +σI2 (ξ 2 +c)−ln(A0 Xl ), The pdf of X can be re-written as ln X +n ξ2 ξ 2−1 1 b √ e ×erfc X fX (X) = . (5) 2 (c+1)(A0 Xl )ξ2 2σI
3
Performance Bounds
The general ASER expression for evaluating the AF relaying FSO systems over ∞ the log-normal channel can be expressed by Pse = 0 Pe (γ)fγ (γ)dγ, where Pe (γ) denotes the conditional error probability (CEP) and fγ (γ) is the pdf of SNR, γ. When using general (MI × MQ ) - QAM constellations with two independent MI in-phase and MQ quadrature signal amplitudes, the CEP is √ √ √ √ Pe (γ) = 2q(MI )Q(AI γ)+2q(MQ )Q(AQ γ)−4q(MI )q(MQ )Q(AI γ)Q(AQ γ) [10]. Here, q(x) = 1 − x−1 is the Gaussian Q-function, which is defined as √ √ ∞ Q(x) 0.5erfc(x/ 2) = 1/ 2π exp(−t2 /2)dt. (6) x
which relates to the terms of the complementary error function erfc(·). AI and AQ can be calculated from MI , MQ and in-phase, quadrature distances [10]. The ASER is therefore calculated as ∞ ∞ √ √ Q(AI γ)f (γ)dγ +2q(MQ ) Q(AQ γ)f (γ)dγ Pse = 2q(MI ) 0 0 (7) ∞ √ √ −4q(MI )q(MQ ) Q(AI γ)Q(AQ γ)f (γ)dγ. 0
Computing this ASER requires to work with integrals involving the Gaussian Q-function, which cannot be expressed in closed-form in terms of elementary functions. As a result, approximating the Gaussian Q-function in closed-form expressions with high accuracy becomes a necessity. To accurately approximate the above ASER evaluation, we use tight approximations on upper bound Qub (x) and lower bound Qlb (x) given by eq. (13) in [21] and eq. (17) in [22]. 2 1 x 1 1 √ (8) exp − exp −x2 − √ exp −3x2 . Qub (x) = − √ 2 2πx 2 2πx 6 2πx
Relay UAV-Based FSO Communications
65
14 2 37 2 38 2 x 2x x exp − x2 . (9) Qlb (x) = √ exp − x + √ exp − x + 54 27 π 3 3 2π 3 2π
4
Numerical Results
Using the derived expressions, Eq. (7), fX (X), γ and the tight approximation bounds, we present ASER performance of the AF relaying FSO systems in the log-normal channels with effects of the pointing error displacement standard deviation. Relevant parameters and constants are provided in Table 1. Table 1. System parameters and constants Parameter
Symbol
Value
Laser wavelength
λ
1550 nm
Index of refraction structure
Cn2
10−15 m−2/3
Photodetector responsivity
1 A/W
Modulation index
κ
1
Attenuation coefficient
σ
3.436
Total noise variance
N
10−7 A/Hz
Link distance
L
1000 m
Amplify power
PAF
3.5 dB
In-phase × Quadrature signal amplitudes MI × MQ 8 × 4 The number of relaying stations
c
Fig. 4. ASER vs. σs in the low σs regions
0, 1
66
H. D. Trung and N. H. Trung
Fig. 5. ASER vs. σs in the high σs regions
Figures 3 and 4 illustrate the ASER performance against the pointing error displacement standard deviation σs in the low regions (0.05 ÷ 0.1) (Fig. 4) and in the high regions (0.1 ÷ 0.15) (Fig. 5), for various values of the transmitter beam waist radius ω0 = 0.016 m, and 0.022 m, with/without relaying terminals c = 0, 1. It can be seen that the ASER characteristics of exact, upper and lower performance bounds are similar. As it clearly depicted in these figures that the system’s ASER is significantly decreases when the pointing error displacement standard deviation, σs , decreases. Therefore, the system’s performance is greatly improved when σs decreases. It is also shown that under the log-normal atmospheric turbulence condition and the number of relay stations c = 0, the pointing error influences severely on the system’s performance, since higher values of ASER are gained. The impact of the transmitter beam waist radius ω0 on the system’s performance is more significant in low σs regions than in high σs regions. We therefore can more easily adjust the system’s performance when the pointing error displacement standard deviation is in the low regions by changing the values of the transmitter beam waist radius. In addition, these figures also show that the use of AF relaying results in the decrease in ASER, particularly in the low σs regions.
5
Conclusions
We have analyzed the pointing error displacement standard deviation on the performance of AF relaying FSO communication systems employing SC-QAM signals and Gaussian Q-function bounds over weak atmospheric turbulence channels. The theoretical expressions for calculating the system’s ASER have been
Relay UAV-Based FSO Communications
67
applied to compare the exact, lower and upper bounds of average symbol error probabilities. The numerical results show the gaps between these bounds in low and high σs values, therefore they could be useful in evaluating design of such FSO transmission links. Acknowledgment. This research is funded by the Hanoi University of Science and Technology (HUST) under project number T2020-SAHEP-015.
References 1. Alzenad, M., Shakir, M.Z., Yanikomeroglu, H., Alouini, M.-S.: FSO-based vertical backhaul/fronthaul framework for 5G+ wireless networks. IEEE Commun. Mag. 56(1), 218–224 (2018) 2. Tien, P.V., Nguyen, P.V., Trung, H.D.: Self-Navigating UAVs for supervising moving objects over large-scale wireless sensor networks. Int. J. Aerospace Eng. 2020, Article ID 2027340, 20 pages (2020) 3. Najafi, M., Ajam, H., Jamali, V., Diamantoulakis, P.D., Karagiannidis, G.K., Schober, R.: Statistical modeling of the FSO Fronthaul channel for UAV-based communications. IEEE Trans. Commun. 68(6), 3720–3736 (2020) 4. Trung, H.D., Tuan, D.T.: Performance of free-space optical communications using SC-QAM signals over strong atmospheric turbulence and pointing errors. In: Proceedings of IEEE Fifth International Conference on Commun. and Elect. (ICCE) pp. 42–47 (2014) 5. Uysal, M., Capsoni, C., Ghassemlooy, Z., Boucouvalas, A.: Optical Wireless Communications: An Emerging Technology. Springer, Cham (2016). https://doi.org/ 10.1007/978-3-319-30201-0 6. Dabiri, M.T., Sadough, S.M.S., Khalighi, M.A.: Channel Modeling and Parameter Optimization for Hovering UAV-Based Free-Space Optical Links. IEEE J. Sel. Areas Commun. 36(9), 2104–2113 (2018) 7. Lee, E., Ghassemlooy, Z., Ng, W.P., Uysal, M.: Performance analysis of free space optical links over turbulence and misalignment induced fading channels. In: Proceedings of 8th CSNDSP, pp. 1–6, July 2012 8. Ahmed, A., Hranilovic, S.: Outage capacity optimization for free-space optical links with pointing errors. J. Lightw. Tech. 25(7), 1702–1710 (2007) 9. Djordjevic, G.T., Petkovic, M.I.: Average BER performance of FSO SIM-QAM systems in the presence of atmospheric turbulence and pointing errors. J. Modern Optics 63(8), 1–9 (2016) 10. Trung, H.D., Tuan, D.T., Pham, A.T.: Pointing error effects on performance of free-space optical communication systems using SC-QAM signals over atmospheric turbulence channels. AEU Int. J. Elec. Commun. 68(9), 869–76 (2014) 11. Wang, P., Wang, R., Guo, L., Cao, T., Yang, Y.: On the performances of relayaided FSO system over M distribution with pointing errors in presence of various weather conditions. Optics Commun. 367, 59–67 (2016) 12. Prabua, K., Kumar, D.S.: Polarization shift keying based relay-assisted free space optical communication over strong turbulence with misalignment. Optics Laser Tech. 76, 58–63 (2016) 13. Bhatnagar, M.R., Ghassemlooy, Z.: Performance Analysis of Gamma Gamma Fading FSO MIMO Links With Pointing Errors. J. Lightw. Tech. 34(9), 2158–2169 (2016)
68
H. D. Trung and N. H. Trung
14. Ai, D.H., Trung, H.D., Tuan, D.T.: On the ASER performance of amplify-andforward relaying MIMO/FSO systems using SC-QAM signals over log normal and gamma-gamma atmospheric turbulence channels and pointing error impairments. J. Inf. Telecomm. 4, 1–15 (2020) 15. Trung, H.D.: Performance analysis of FSO DF relays with log-normal fading channel. J. Optical Commun. (2019) 16. Trung, H.D., Hoa, N.T., Trung, N.H., Ohtsuki, T.: A closed-form expression for performance optimization of subcarrier intensity QAM signals-based relay-added FSO systems with APD. Phys. Commun. J. 31, 203–211 (2018) 17. Trung, H.D.: Performance analysis of amplify-and-forward relaying FSO/SC-QAM systems over weak turbulence channels and pointing error impairments. J. Optical Commun. 39(1), 93–100 (2018) 18. Ai, D.H., Tuan, D.T., Trung, H.D.: Pointing error effects on performance of amplify-and-forward relaying MIMO/FSO systems using SC-QAM signals over log-normal atmospheric turbulence channels. In: Proceedings of the 8th Conference on Intelligent Information and Database Systems, DaNang city, Vietnam, pp. 607–619 (2016) 19. Ai, D.H., Tuan, D.T., Trung, H.D.: AF relay-assisted MIMO/FSO/QAM systems in gamma-gamma fading channels. In: Proceedings of NICS 2016, Danang City, pp. 147–152 (2016) 20. Trung, H.D., Tuan, D.T.: Performance of amplify-and-forward relaying MIMO freespace optical systems over weak atmospheric turbulence channels. In: Proceedings of the NICS 2015, HoChiMinh City, pp. 223–228 (2015) 21. Fu, H., Wu, M.-W., Kam, P.-Y.: Explicit, closed-form performance analysis in fading via new bound on Gaussian Q-function. In: Proceedings of the 4th IEEE ICC Wireless Communication Symposium, pp. 5819–5823 (2013) 22. Fu, H., Wu, M.-W., Kam, P.-Y.: Lower bound on averages of the product of L Gaussian Q-functions over nakagami-m fading. In: Proceeding of the 77th IEEE VTC-Spring, pp. 1–5 (2013)
A Novel Generalized Form of Cure Rate Model for an Infectious Disease with Co-infection Oluwafemi Samson Balogun1(B) , Sunday Adewale Olaleye2 and Pekka Toivanen1
, Xiao-Zhi Gao1 ,
1 School of Computing, University of Eastern, Kuopio, Finland
{samson.balogun,xiao-zhi.gao,pekka.toivanen}@uef.fi 2 Department of Marketing, Management and International Business, University of Oulu, Oulu, Finland [email protected]
Abstract. Recently, researchers allow the analysis of the survival function of disease by the examination of the cure fraction. The extant literature discovers that the cure rate model is used for infectious but curable diseases and not infectious diseases with co-infection. Hence, a survival model that incorporates the cure rate of the management of infectious diseases with co-infection was adopted. This investigation aims to extend and develop a generalized model using the Bounded Cumulative Hazard (BCH) model, a non-mixture model for any infectious disease with a co-infection to estimate the performance of the management. The objective is to derive the appropriate probability density functions for sole infectious and coinfection disease and estimate the cure rate parameter for the two situations (sole and co-infection) using simulation data. The exponential distribution is commonly used in literature, but two-parameter Weibull distribution, a unique form of Weibull distribution, was employed in this study. The result compared using exponential distribution to accomplish the set objectives using the R package as the estimation procedure. The study concluded that the modified model, a generalized form of the cure rate model can accommodate an infectious disease with co-infection and derives the appropriate probability density function. It is also possible to extend the generalized model to other forms of distributions. This study explains the limitation of the study, the contributions, managerial implications and suggest future work. Keywords: Cure rate model · Weibull · Loglikelihood · Infectious disease · Co-infection · BCH model
1 Introduction “Survival analysis applies in many study areas, including epidemiology, public health, medicine, and biology. The survival data is used to show time-to-event data modeling, which may be applicable to being alive or time until death as the case may be. Survival time or failure time defines as a time to event of interest” [1]. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. Abraham et al. (Eds.): ISDA 2020, AISC 1351, pp. 69–79, 2021. https://doi.org/10.1007/978-3-030-71187-0_7
70
O. S. Balogun et al.
Research and development in the medical science world is currently enhancing the life cycle of patients with a wide variety of diseases, including deadly diseases such as cancer, tuberculosis and HIV/AIDS. As a result, many patients with a particular form of cancer are permanently healed. In this sense, there is a heterogeneity in the number of patients in the population. That is, most of the people who responded favorably to drugs are said to be free of any signs, and infection symptoms are assumed to be healed and immune to the disease. The other remaining fraction of the populace on whom treatment has no significant impact in developing a relapse of the disease and considers it susceptible or un-cured. The progress of research in the area of cure motivates this research, which is paramount in any epidemiological research couple with less research that focuses on the area of an infectious disease with possible co-infections. It is pertinent that many patients with such diseases can get cure with active medical intervention, but due to some body’s chemistry, not all will benefit from such interventions. It is essential to differentiate between the possibility of cure and the predicted improvement in recovery time for uncured patients, even with the advancement in medical care. Such intervention is also possible in numerous places in statistical or epidemiological research. From time to time, the cure model is an underutilized statistical tool even though it develops in the statistical literature. However, there are perhaps plenty of publications in oncology that use cure models. However, when sought on prominent and quality databases, we could only find a significant number of research publications on cure models for an infectious disease with co-infection, which is the work by Balogun and Jolayemi [2]. The primary aim of this work is to extend and improve a broad-based model, a non-mixed model for infectious diseases, based on the BCH model to estimate the performance of the management. And the objective of this work is to derive the appropriate probability density functions for sole infectious and co-infection disease and estimate the cure rate parameter for the two situations (sole infectious and co-infected disease) using a simulated data to enhance the applicability of cure models and elucidate the same to the non-statistician like public health researcher, psychiatrist, educationist, criminologist, and reliability engineers. This research is restricted to using the Weibull distribution of two parameters which is a special type of Weibull as the baseline. Models of cure fractions focus primarily on the fraction of patients cured or live for a longer period after a disease. These models also concentrate on predicting the likelihood of survival of uncured patients at some stage [3]. Based on the background of the study, it is crucial to see some of the work done in the literature: Boag [4] was the first to use the cure model as a product of survival functions based on log-normal distribution and context distribution of the normal population in the survival function of uncured fractions. This model is used in many medical applications. For instance, Struthers and Farewell [5] used a cure rate model to examine the incubation time of AIDS for HIV-I positive patients and results is much better when compared to the standard survival models. 
In another example, Tournoud and Ecochard [6] employed this model for analyzing an HIVI mother-to-child transition (MTCT) and for nosocomial urinary tract infections dataset. The model was elaborately studied to determine the delay from contamination to the event of interest, considering successive and multiple exposures to the infectious agent.
A Novel Generalized Form of Cure Rate Model
71
The model was also used to ascertain which part of the infection is due to each exposure occasion. This cure rate model was also utilized by Balogun and Jolayemi [2] for the estimation of the cure rate by simulating for an infectious disease with co-infection data using the exponential distribution to estimate cure rate. The cure rate model is applicable and useful in different fields including economics, reliability, and health [2, 7]. In the past, according to Barriga et al. [8], Oliveira, and Louzada [9, 10] cure rate model was utilized to ascertain the loan performance and to determine loan recovery. To assess the percentage of convicts who will return to prison, Maller and Zhou used the cure rate model [11]. “A flexible model for survival data indicates a power series cure rate model. A survival data with the presence of cure fraction was proposed based generalized Bernoulli and Geometric Poisson distribution as a determinant of the best fit when using a Cutaneous Melanoma data” [12]. In the research carried out by Varshney et al., the cure rate model was used to determine the proportion long-term survivors of HIV/AIDS who are receiving antiretroviral (ART) [13]. Several authors such as Aljawadi et al. [14–16], Brown, and Ibrahim [17], Uddin et al. [18] and Chen et al. [19] have used BCH modeling incorporating the Expected Maximum (EM) algorithm, and Maximum Likelihood Estimation (MLE) for the cure rate model. Chukwu and Folorunsho [20] applied different parametric cure models to estimate the proportion of patients cured of gastric cancer and they also with the most flexible among the models used.
2 Materials and Methods The study discusses the appropriate methodology used in this section. A non-mixture model called BCH is used to come up with the generalized form of a cure-rate model for single infectious disease with possible co-infection. 2.1 Standard Cure Rate Model The initial model used by authors in the literature is the standard cure rate. The parameters associated with the model are described as follows: let q represent the proportion of patient cured and (1 − q) be the proportion of the uncured. Then, S(t)is the overall population survival function at any time t. The conventional standard cure rate is written as: S(t) = q + (1 − q)Su (t)
(1)
where, S(t) is the survival functions of the entire population while Su (t) is the survival function of the uncured) which may be assumed to follow some parametric distribution to estimate cure fraction q. The probability density function fu (t) of the overall population, written as: f (t) = (1 − q)fu (t) where fu (t) is the probability density function (pdf) of the uncured fraction.
(2)
72
O. S. Balogun et al.
2.2 Estimation of Cure Fraction Model (Non-mixture Model) The parameters of the model implemented in this work is estimated using the MLE method. The baseline distribution for the model can be parametric, nonparametric, and semi-parametric. Thus, the baseline distribution used in this work is the parametric model. Now, let (ki , bi , ti , gi ) be the observed data of size n, where ti is the survival time of the ith patient, ki represents censoring indicator variable which is defined as follows: ki is equal to zero for uncensored observation and one for censored observation, bi represents cure indicator variable where bi is equal to zero for cure patient and one for uncured patient while gi represent the disease indicator variable which can also be defined as gi is equal to zero for sole infectious disease and one for co-infected disease (i = 1, 2,.., n). This study modifies Eq. (1) to derive a model that can accommodate a disease with co-infection, the modified form of the model presented as the generalized of survival cure rate is hence written as: 1−gi g × q + (1 − q)Su (t) i (3) S(t) = q + (1 − q)Su (t) q = exp(−ϑ)
(4)
2.3 Cure Model Likelihood Estimation The cure model likelihood estimation using the BCH model is: ki 1−ki {S(ti )}1−bi lc = {f (ti )}bi
(5)
Putting (5) into (1) 1 − F(ti ) = q + (1 − q)(1 − Fu (ti ))
(6)
Differentiating (5) with respect to t, we have f (ti ) = (1 − q)(1 − Fu (t))
(7)
Put (7) into (2), it gives: ki 1−ki {q + (1 − q)Su (ti )}1−bi lc = {(1 − q)fu (ti )}bi
(8)
The generalized form for individual patient’s contribution in the likelihood is given by: ⎧
⎫ n ki 1−ki 1−gi ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ {fu (ti ) (1 − q)}bi {q}1−bi {(1 − q)(1 − Su (t))}bi log ×⎪ ⎪ ⎪ ⎨ ⎬ i=1 lc =
n ⎪ k 1−ki gi ⎪ ⎪ ⎪ ⎪ ⎪ bi i 1−bi ci ⎪ ⎪ {(1 − q)(1 − Su (t))} {fu (ti ) (1 − q)} {q} log ⎪ ⎪ ⎩ ⎭ i=1
(9) The generalized form for individual patient’s contribution in the likelihood given Eq. (9) will be used on the baseline distributions. In this study we will be using two parametric distributions: exponential and two-parameter Weibull distribution.
A Novel Generalized Form of Cure Rate Model
73
2.4 Distributions of the Cure Model Two parametric models were used for the generalized cure rate model which is based on the BCH model to determine the bias, variances and the fraction of the patients which is cured and uncured. (1) Exponential distribution (Exp.): A random variable t is considered an exponential if the pdf and survival function is defined by: f (t) = exp(−λt), t > 0 and S(t) = λexp(−λt) In most literature exponential as the baseline distribution for Su (t) and fu (t) is commonly used. Therefore, the probability density function and its survival function for the uncured faction is given below: fu (ti ) = e−λt Su (ti ) = λe−λt
(10)
The complete data likelihood becomes:
⎤ n b ki 1−b b 1−ki 1−gi i i i λe−λt 1 − e−ϑ e−ϑ ×⎥ 1 − e−ϑ 1 − e−λt ⎢ log ⎢ ⎥ i=1 ⎢ ⎥ lc = ⎢
g ⎥ ⎢ n b ki 1−b b 1−ki i ⎥ ⎣ ⎦ i i i λe−λt 1 − e−ϑ e−ϑ 1 − e−ϑ 1 − e−λt log ⎡
(11)
i=1
Simplifying Eq. (6), it becomes: ⎫ m 1 ⎪ [(1 − gi )bi ki log 1 − e−ϑ ] − ϑ [(1 − gi )(1 − bi )(1 − ki )] ⎪ ⎪ ⎪ ⎪ ⎪ i=1 i=1 i=1 i=1 ⎪ ⎬ m m m m 2 2 2 1 + gi bi ki log λ − gi bi ki λti + [gi bi log 1 − e−ϑ ]− (1 − gi )(1 − ki )bi log 1 − e−λti + ⎪ ⎪ i=1 i=1 i=1 i=1 ⎪ ⎪ m m ⎪ 2 2 ⎪ −λt ⎪ ⎭ ϑ [gi (1 − bi )(1 − ki )] + [gi (1 − ki )bi log 1 − e i ] lc =
m 1
[(1 − gi )bi ki log λ] −
i=1
m 1
[(1 − gi )bi ki λti ] +
m 1
i=1
(12) the solution of
∂lc ∂ϑ
and
∂lc ∂λ
∂lc = ⎣(1 − ki )(1 − bi ) ⎩ ∂ϑ
∂lc = ∂λ
= 0 which are the desired estimates of ϑ and λ where
⎧ m1 ⎨
⎡
(1 − gi ) +
i=1
m2 i=1
⎫⎤ ⎫⎤ ⎡ ⎧ m1 m2 ⎨ ⎬ ⎬ 1 ⎦ ⎣ bi + (gi ) (1 − gi ) + (gi ) ⎦ ⎩ ⎭ ⎭ 1 − e−ϑ i=1
! m1 m2 bi ki (1 − gi ) + (gi ) i=1
i=1
λ bi (1 − ki )
− bi ki ti "m 1 i=1
"m 1
⎫ ⎪ ⎪ # ⎪ m2 ⎪ ⎪ ⎪ (1 − gi ) + (gi ) +⎪ ⎬
i=1
(1 − gi ) +
(13)
i=1
m2 i=1
#$ (gi )
i=1
%⎪ ⎪ ⎪ ⎪ ti ⎪ ⎪ ⎪ −λt i e −1 ⎭
(14)
74
O. S. Balogun et al.
Expanding Eq. (14), it becomes: ⎡
m1
⎢ ϑ = log⎢ m1 ⎣
(1 − gi )bi +
i=1
m2
⎤ gi bi
i=1
[(1 − gi )(1 − ki )(1 − bi )] +
i=1
m2
[(gi )(1 − ki )(1 − bi )]
⎥ + 1⎥ ⎦
(15)
i=1
(2) Two parameter Weibull distribution (2P Wei.): two Weibull parameter distributions are said to follow a random variable t if the density function is met. The probability density function and the survival function of 2P Wei. is given as: & $ % ' & $ % ' t λ t λ λ λ−1 , t > 0, λ = 0, β = 0 and S(t) = exp − f (t) = t exp − β β β Using two parameter weibull as the baseline distribution for Su (t) and fu (t), the pdf and the survival function of the uncured fraction becomes: & $ % ' & $ % ' t λ λ λ−1 t λ exp − Su (ti ) = exp − (16) fu (ti ) = k t β β β So, complete data likelihood is given by:
lc =
⎧⎡ ⎤1−g ⎫ i⎪ ⎡ ⎪ & $ % ' # ⎤ki ⎡ " & & $ % ''#b ⎤1−ki ⎪ ⎪ n " ⎪ ⎪⎢ bi 1−b i λ λ ⎪ ⎪ ⎥ λ t t ⎪ ⎪ i −ϑ −ϑ −ϑ λ−1 ⎪ ⎪ ⎢ ⎥ ⎣ ⎦ ⎣ e ⎦ ⎪ ⎪ log 1−e 1−e t exp − 1 − exp − ⎪ ⎪ ⎣ ⎦ λ ⎪ ⎪ β β β ⎪ ⎪ ⎪ ⎪ i=1 ⎬ ⎨ ⎤g ⎪ ⎡ ⎪ ⎪ ⎡ ⎪ ⎪ & $ % ' # ⎤ki ⎡ " & & $ % ''#b ⎤1−ki i ⎪ ⎪ ⎪ n " ⎪ 1−b bi i λ λ ⎪ ⎪ ⎥ ⎪ ⎢ λ t t ⎪ i 1 − e−ϑ 1 − exp − ⎪ ⎪ ⎥ ⎪ ⎦ ⎣ e−ϑ ⎦ ⎣ 1 − e−ϑ ×⎢ t λ−1 exp − ⎪ ⎪ ⎪ ⎪ ⎦ ⎣log λ ⎪ ⎪ β β β ⎭ ⎩ i=1
(17) Simplifying Eq. (17), it gives: ⎫ $ %λ ⎪ t ⎪ −ϑ lc = [(1 − gi )bi ki ] log λ − [λ log β] + [(λ − 1) log t] − + log 1 − e +⎪ ⎪ ⎪ ⎪ β ⎪ ⎪ i=1 ⎪ ⎪
" ''# & & ⎪ % $ m1 ⎪ λ ⎪ ⎪ t ⎪ −ϑ ⎪ [(1 − gi )(1 − ki )] [(1 − bi )(−ϑ)] + bi [log 1 − e ] + log 1 − exp − ⎪ ⎪ β ⎬ i=1
$ % m 2 ⎪ ⎪ t λ ⎪ +[ + log 1 − e−ϑ +⎪ ⎪ (gi )bi ki ] log λ − [λ log β] + [(λ − 1) log t] − ⎪ ⎪ β ⎪ ⎪ i=1 ⎪ ⎪ & &
" ''# ⎪ $ % m2 ⎪ λ ⎪ ⎪ t ⎪ −ϑ ⎪ ] + log 1 − exp − [(gi )(1 − ki )] [(1 − bi )(−ϑ)] + bi [log 1 − e ⎪ ⎭ β
m1
(18)
i=1
The solution of
∂lc ∂ϑ
,
∂lc ∂λ
and
∂lc ∂β
= 0 are the desired estimates of ϑ, λ and β where
m 1
m 2
m gi bi (1 − gi )bi m2 1 ∂lc i=1 + i=1 − = − k − b + − k − b [(1 − g [(g )(1 )(1 )] )(1 )(1 )] i i i i i i ∂ϑ 1 − e−ϑ 1 − e−ϑ i=1
i=1
(19)
A Novel Generalized Form of Cure Rate Model
75
⎫ ⎪ ⎪ ⎪ %$ % $ ⎪ ⎪ t λ ∂lc t ⎪ i=1 [(1 − gi )bi ki log β] + [(1 − gi )bi ki log t] − = − +⎪ (1 − gi )bi ki log ⎪ ⎪ ⎪ ∂k λ β β ⎪ ⎪ i=1 i=1 i=1 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ m ⎡ ⎤ ⎬ 2 gi bi ki m m m λ⎥ 1 ⎢ 2 2 log e ⎪ i=1 t t ⎪ − [gi bi ki log β] + [gi bi ki log t]− ⎣(1 − gi )(1 − ki ) log β λ ⎦+ ⎪ β ⎪ k ⎪ t ⎪ i=1 i=1 i=1 ⎪ ⎪ ⎡e β −1 ⎤ ⎪ ⎪ ⎪ m ⎪ m ⎪ λ λ⎥ ⎪ 2 2 ⎢ log e ⎪ t t t t ⎪ g gi bi ki log β + − k log (1 ) ⎣ ⎦ ⎪ i i β β β ⎪ ⎭ t λ m 1
(1 − gi )bi ki
i=1
m1
i=1
m1
m1
e β
−1
(20)
⎫ ⎤⎪ ⎪ m (1 − gi )bi ki λ ⎪ m1
$ $ $ % % % 1 ⎪ log e ∂lc t λ λ t λ ⎥⎪ ⎪ ⎢ i=1 [(1 − gi )(1 − ki )λ]⎣ =− − + (1 − gi )bi ki ⎪ ⎦⎪ λ ⎪ ⎪ ∂β β β β β t ⎪ ⎪ i=1 i=1 ⎪ β ⎪ β − βe ⎪ ⎬ m ⎤ ⎡ 2
⎪ g1 bi ki λ m2 m2 $ $ $ % % % ⎪ λ λ ⎪ t λ t log e ⎥⎪ ⎢ i=1 ⎪ − gi bi ki [gi (1 − ki )λ]⎣ − + ⎦⎪ ⎪ λ ⎪ β β β β ⎪ t ⎪ i=1 i=1 ⎪ β ⎪ β − βe ⎪ ⎪ ⎪ ⎭ m 1
⎡
(21) Solving Eq. (19) we get Eq. (15)
3 Simulation Result and Discussion This section involves the simulation results and discussion of the result. 3.1 Simulation Results The survival times is generated using an inbuilt function in the R package for exponential and Weibull distribution model. The samples sizes are set to (500 and 1000) for the two distributions, and different values were used for the censoring rate. The parameters in the two model were estimated before maximizing the log-likelihood function for both models. The simulation was done using the R package, and the figures SPSS was used. 3.2 Discussion The objective is to derive the appropriate probability density functions for sole infectious and co-infection disease and estimate the cure rate parameter for the two situations (sole and co-infection) using simulated data. Two parametric models were used as the baseline distribution, namely: exponential and two-parameter Weibull distribution for the proposed model to achieve this goal. The simulation suggests that the proposed model performs well in terms of the bias and the cure rate. This study has shown the consistency of the model as the samples size tends to infinity, the variance tends to zero (i.e. decreases) and the bias is very close to zero. This indicates the model is non-biased. Using the sample sizes of (500 and 1000),
76
O. S. Balogun et al.
the estimates of the bias, variances and the fraction of the patients which is cured and uncured is given in Table 1 and 2 for sole infectious and co-infection patients using exponential distribution and two parameters Weibull distribution, respectively. Also, the study noticed from the simulation that the bias shows that the Exponential distribution was a function of the sample while Weibull (2P) was not. Table 1. Sole infectious disease patients simulation Distribution
Exp. 2P (Wei)
Sample size (n)
q(%)
1− q(%)
Bias
Variance
500
23
77
0.00057
0.3004
500
60
40
0.00057
0.1661
Exp.
1000
24
76
0.00056
0.3109
2P (Wei)
1000
58
42
0.00056
0.1607
Table 2. Co-infected disease patients simulation Distribution
Sample size (n)
q(%)
1− q(%)
Bias
Variance
Exp.
500
25
75
0.00059
0.2748
2P (Wei)
500
24
76
0.00059
0.3008
Exp
1000
25
75
0.00064
0.3024
2P (Wei)
1000
25
75
0.00064
0.2834
The histogram used as descriptive statistics for this work for the monitoring period for the sole and co-infected disease. Figure 1 and 2 shows the monitoring period for the sole infectious and co-infected disease using exponential distribution as the baseline distribution, and Fig. 3 and 4 also shows the monitoring period for the sole infectious and co-infected disease using 2-parameter Weibull as the baseline distribution. The histogram shows the shape of the distributions used in the simulations (see Fig. 1 and 4). 3.3 Contribution The study shows how simulated dataset can be used to determine if a model performs well or not. In the study, we were able to come up with a novel model which can efficiently evaluate and examine an infectious disease with possible co-infections.
A Novel Generalized Form of Cure Rate Model
77
Fig. 1. Showing the histogram for the time based on exponential distribution for sole and coinfected disease (simulation for n = 500)
Fig. 2. Showing the histogram for the time based on exponential distribution for sole and coinfected disease (simulation for n = 1000)
Fig. 3. Showing the histogram for the time based on two parameter Weibull distribution for sole and co-infected disease (simulation for n = 500)
78
O. S. Balogun et al.
Fig. 4. Showing the histogram for the time based on two parameter Weibull distribution for sole and co-infected disease (simulation for n = 1000)
3.4 Managerial Implications This study will allow health managers and researchers working on co-infection infectious diseases to build models using other parametric models as a basis for distribution. In terms of methodology, this analysis would also allow managers to know how to make the right choices when faced with a particular analytical approach.
4 Conclusion This research is restricted to the use of simulation for the dataset and the use of two parametric models: exponential and two-parameter Weibull model. The simulation suggests that the proposed model performs well in terms of the bias and the cure rate and this research show the appropriate probability density functions for sole infectious and co-infection disease and estimate the cure rate parameter for the two situations using simulation data. It also shows that the extended model, which is a generalized form of the cure rate, can accommodate an infectious disease with co-infection by deriving the appropriate probability density function and also compared its result with the result of the distribution used in the literature. This paper serves as a wake-up call to the Government and Health-related agencies on the existence of some infectious disease and its co-infections and the danger associated with late diagnosis. Future research will be to apply more parametric model on a real-life dataset.
References 1. Zhao, G.M.A.: Nonparametric and Parametric Survival analysis of censored data with possible violation of method assumptions. Master’s Thesis (Unpublished). Faculty of Graduate School, University of North Carolina (2008) 2. Balogun, O.S., Jolayemi, E.T.: Modeling of a cure rate model for TB with HIV co-infection. Pac. J. Sci. Technol. 18(1), 288–299 (2017) 3. Taweab, F., Ibrahim, N.A.: Cure rate models: a review of recent progress with a study of change-point cure models when cured is partially known. J. Appl. Sci. 14, 609–616 (2014) 4. Boag, J.W.: Maximum likelihood estimates of the proportion of patients cured by cancer therapy. J. R. Stat. Soc. Ser. (Methodol.), 11(1), 15–53 (1949)
A Novel Generalized Form of Cure Rate Model
79
5. Struthers, C.A., Farewell, V.T.: A mixture model for time to AIDS data with left truncation and an uncertain origin. Biometrika 76(4), 814–817 (1989) 6. Tournoud, M., Ecochard, R.: Application of the promotion time cure model with timechanging exposure to the study of HIV/AIDS and other infectious diseases. Stat. Med. 26(5), 1008–1021 (2007) 7. Nelson, W. B.: Applied Life Data Analysis. John Wiley & Sons, Hoboen (2005) 8. Barriga, G.D., Cancho, V.G., Louzada, F.: A non-default rate regression model for credit scoring. Appl. Stoch. Mod. Bus. Ind. 31(6), 846–861 (2015) 9. Oliveira, M.R., Louzada, F.: An evidence of link between default and loss of bank loans from the modeling of competing risks. Singaporean J. Bus. Econ. Manage. Stud. 3(1), 30–37 (2014) 10. Oliveira, M.R., Louzada, F.: Recovery risk: Application of the latent competing risk model to non-performing loans. Technologia de Credito 88, 43–53 (2014) 11. Maller, R., Zhou, X.: Survival Analysis with Long-Term Survivors. John Wiley and Sons Inc, New York (1996) 12. Cancho, V.G., Louzada, F., Ortega, E.M.: The power series cure rate model: an application to a Cutaneous Melanoma data. Commun. Stat. Simul. Comput. 42, 586–602 (2013) 13. Varshney, M.K., Grover, G., Ravi, V., Thakur, A.K.: Cure fraction model for the estimation of long-term survivors of HIV/AIDS patients under antiretroviral therapy. J. Commun. Dis. 50(3), 1–10 (2018) 14. Aljawadi, B.A., Bakar, M.R.A., Ibrahim, N.A.: Nonparametric estimation of cure fraction using right censored data. Am. J. Sci. Res. 14, 79–81 (2011) 15. Aljawadi, B.A., Bakar, M.R.A., Ibrahim, N.A.: Parametric cure rate estimation based on exponential distribution which incorporates covariates. J. Stat. Model. Anal. 2(10), 11–20 (2011) 16. Aljawadi, B.A., Bakar, M.R.A., Ibrahim, N.A., Midi, H.: Parametric estimation of the cure fraction based on BCH model using left censoring data with covariates. Mod. Appl. Sci. 5(3), 103–110 (2011) 17. Brown, E.R., Ibrahim, J.G.: Bayesian approaches to joint cure rate and longitudinal models with applications to cancer vaccine trials. Biometrics 59, 686–693 (2003) 18. Uddin, M.T., Islam, M.N., Ibrahim, Q.I.U.: An Analytical approach on cure rate estimation based on uncensored data. J. Appl. Sci. 6(3), 548–552 (2006) 19. Chen, M-H., Ibrahim J., Sinha D.: A new Bayesian model for survival data with a surviving fraction. J. Am. Stat. Assoc. 94, 909–918 (1999) 20. Chukwu, A.U., Folorunsho, S.A.: Determination of flexible parametric estimation of mixture cure faction model: an application of Gastric cancer data. West Afr. J. Ind. Acad. Res. 15(1), 139–156 (2015)
A2PF: An Automatic Protein Production Framework Mohamed Hachem Kermani1(B) and Zizette Boufaida2 1
2
LIRE Laboratory, National Polytechnic School - Malek Bennabi, Constantine, Algeria [email protected] LIRE Laboratory, University of Constantine 2 - Abdelhamid Mehri, Constantine, Algeria [email protected]
Abstract. Proteins are vital molecules that play many important roles in the human body; they contribute to tissue growth and maintenance, catalysis of organic reactions, communication between cells, tissues and organs and help improve immune health. Therefore one of the most important and frequently studied issues in biological and medical research is understanding the function of proteins. A thorough understanding of a protein’s function and activity requires determining its structures. In this paper, we propose an Automatic Protein Production Framework, which aims to completely determine the different structures in order to construct three-dimensional physical proteins and provide all information that will contribute to the study of the functions and activities of the proteins. The proposed framework is based on computational methods by combining three bioinformatics methods (i.e. comparative modeling, fold recognition, and ab initio prediction). We also present a software application that uses our framework and an experiment to illustrate our proposed Automatic Protein Production Framework, using the model application. Keywords: Immune health · Protein functions · Protein synthesis 3D Proteins · Computational methods · Framework · Software application.
1
·
Introduction
Four levels of protein structure are distinguished: primary, secondary, tertiary and quaternary. Amino acids are assembled through peptide bonds (i.e. an amino acid group of carboxylic acid with a neighboring amino acid group → C=O— NH) and thus form the primary structure [11]. Then comes secondary protein structure which is the three-dimensional form of local protein segments. Alpha helices and beta sheets are the two most common secondary structural elements, which form spontaneously as an intermediate before the protein folds into the tertiary three-dimensional structure where the α-helixes and β-pleated-sheets c The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. Abraham et al. (Eds.): ISDA 2020, AISC 1351, pp. 80–91, 2021. https://doi.org/10.1007/978-3-030-71187-0_8
A2PF: An Automatic Protein Production
81
are folded into a compact globular structure. Some proteins, known as oligurics (i.e. made up of several polypeptide chains, each chain has a primary, secondary and tertiary structure), such as hemoglobin, reach a quaternary structure by adopting a symmetrical structure [20]. A number of methods are currently being used to determine a protein’s structure including X-ray crystallography, NMR spectroscopy, and electron microscopy [29]. Each method has both advantages and drawbacks. The scientist uses many pieces of information in each of those methods to create the final atomic model. This information enables to construct a consistent model with both the experimental data and the molecule’s expected composition and geometry [10]. In our case, we propose a framework for building a consistent protein model, the framework aims to reproduce the same functioning of cells, step by step, from determining the primary structure to building the three-dimensional physical proteins. Our proposed framework is broken down into four steps: 1) DNA sequences are translated into amino acid sequences. 2) The secondary-structure prediction. 3) 3D structure prediction and 4) physical tri-dimensional protein printing using a 3D printer.
2
Related Work
X-ray crystallography, which is a time-consuming and relatively expensive method, has determined most of the protein structures available in the Protein Data Bank [10]. Hence computational methods have been developed to compute and predict protein structures based on their sequences of amino acids. 2.1
X-Ray Crystallography Method
X-ray crystallography is a technique for determining the structure of molecules in three dimensions, including complex biological macromolecules such as proteins and nucleic acids. It is a powerful method at atomic resolution in elucidating the three-dimensional structure of a molecule. The X-ray crystallography technique uses diffraction patterns that are generated by irradiating a crystalline sample of the molecule of interest with X-rays, making diffraction quality crystals mandatory for this process [3]. Although this method provides a powerful tool in elucidating the three-dimensional structure, the major drawback is time. Thousands of experiments on crystallization can be performed daily in a single laboratory, each experiment is observed over time, with the normal time span being weeks to months. It was for this reason that computational methods were developed to reduce time and costs. 2.2
Computational Methods
Proteins fold into one or more specific conformations to exercise their biological functions. The Determination of a protein’s structure can be achieved through computational techniques that automatically predict protein structures based on their amino acid sequences. The three common bioinformatics methods used to predict the protein structure are: comparative modeling, fold recognition and ab initio prediction.
82
M. H. Kermani and Z. Boufaida
Comparative Modeling. Also known as homology modeling, it is a technique which uses known information from one or more homologous partners to predict the structure of an unknown protein. Comparative modeling usually involves three steps: a) identifying template structures for modeling the query protein, b) aligning the template with the query sequence, and c) modeling the query structure [21]. This family of methods enables greater number of potential templates to be produced and better templates to be identified [15]. To predict the three-dimensional protein, both the template and the query can be submitted to a comparative modeling program once the better template has been identified. Fold Recognition. We model the proteins in fold recognition by threading which have the same fold as the proteins of known structures. Protein threading is used for protein that is not stored in the Protein Data Bank (PDB) with its homologous protein structures [16]. Many algorithms for determining the correct threading of a sequence into a structure have been proposed. They employ some form of dynamic programming. The problem of identifying the best alignment for the complete 3D threading is very difficult (it is an NP-hard issue for some threading models) [14]. Researchers have therefore proposed many methods of optimization, such as Conditional random fields, simulated annealing, branch and bound and linear programming, in order to achieve heuristic solutions. Ab-Initio Prediction. The ab-initio method is a technique that attempts to predict protein structures based solely on information about sequences and without using templates. Ab-initio modelling is often referred to as de-novo modeling [33]. The fundamental procedure followed by the protein structure prediction abinitio method begins with the primary amino acid sequence, which is searched for the various conformations which lead to the prediction of native folds. After recognition and prediction of the folds, the model assessment is carried out to verify the quality of the predicted structure. Numerous methods of determining protein structures, including X-ray crystallography, and computational methods are currently being used. Each method has advantages and disadvantages, but they all have the same goal of building a consistent 3D protein model that can be useful for a detailed understanding of protein and enzyme function.
3
A2P Framework
Proteins, consisting of long or short amino acid sequences, respectively called polypeptides and peptides, are assembled from amino acids based on the information contained in the genes. Protein synthesis is the process in which cells produce proteins by determining a protein’s various structures: primary, secondary, and tertiary. The proposed Automatic Protein Production Framework aims to reproduce the same functioning of cells, step by step, from determining
A2PF: An Automatic Protein Production
83
the primary structure to building the physical 3D proteins, allowing for all protein information at the three different structures. This information will be used to better understand the functions and activities of the proteins to develop effective mechanisms for disease prevention, personalized medicine and treatments, and other healthcare aspects. The A2P framework is broken down into four steps (Fig. 1).
Fig. 1. The A2P Framework
3.1
Step1: The Determination of the Primary Structure
This step aims to determine the primary structure of the protein by acquiring several sequences of DNA at a time and translating it into sequences of amino acids. This first step was the subject of our previous work [17,18] in which we proposed a multi-agent approach that translates DNA sequences into amino acid sequences. For more details on the proposed multi-agent approach for determining primary protein structure, please consult [18]. 3.2
Step2: The Prediction of the Secondary Structure
A prediction for proteins consists of assigning regions of the amino acid sequence to be probable alpha helices, beta strands. Different methods for predicting sec-
84
M. H. Kermani and Z. Boufaida
ondary structures have been developed. One of the first algorithms was the ChouFasman method [5,6], which relies mainly on the probability parameters determined from the relative frequencies of the appearance of each amino acid in each type of secondary structure [4]. In addition, overtime computational prediction methods were developed which are based on techniques of sequence alignment and methods of machine/deep learning. We propose a combined technique based on sequence alignment and fold recognition within our framework. The prediction of the secondary structure can be achieved through aligning the determined amino acid sequence with the amino acid sequences of already known protein secondary structure, which are stored in different existing protein sources. If the determined amino acid sequence matches perfectly an amino acid sequence of already known protein, the secondary structure will be imported from the existing protein sources. Contrariwise, if the determined amino acid sequence does not match any amino acid sequences of already known protein secondary structure, we will predict the 2D structure based on a fold recognition technique. To compare the amino acid sequences, we use methods of pairwise alignment. The alignment between the sequence determined, denoted below as X, and the sequence recuperated, denoted as Y, depends on the similarities and dissimilarities between the amino acids in each sequence position. A correspondence between the amino acids is counted as 1, C = 1, and a dissimilarity or a gap in the case of local alignment is counted as 0, D = 0, for example: X: Lys - Glu - Thr - Lys - Glu - Thr - Thr Y: 0
1
1
0
The similarity score for the two sequences is calculated as follows: C, D Ss(X, Y ) = N AA
(1)
Where C and D represent the similarities and dissimilarities between the amino acids, and NAA represents the number of amino acids constituting the sequence. 3.3
Step3: The Prediction of the 3D Structure
The inference of a protein’s three-dimensional structure from its amino acid sequence remains an extremely difficult and unsolved task; in this step, we are proposing a method based on sequence alignment, fold recognition and ab-initio to automatically predict the tertiary protein structure. First we align the determined amino acid sequence with the amino acid sequences of the already known 3D protein, and then based on the alignment results; we import or predict the structure of the 3D protein. The alignment results in one of three cases:
A2PF: An Automatic Protein Production
85
– Similarity Score = 1: The determined amino acid sequence matches perfectly with a sequence of known 3D protein. In this case, we import the 3D protein structure from the available protein sources. – 0 < Similarity Score < 1: The determined amino acid sequence partially matches a sequence of known 3D protein. In this case, we use the information about the partially similar protein to automatically predict the 3D structure based on an ab-initio prediction method. – Similarity Score = 0: The determined amino acid sequence does not match any sequence of known 3D protein. In this case, we use the technique of fold recognition to automatically predict the tridimensional structure of proteins. 3.4
3.4 Step4: The Printing of the 3D Protein
Determining physical three-dimensional protein structures can contribute to studying the activity, function, and relationships of proteins. The development of 3D printing technologies has allowed the building of physical biomolecular models [25]. 3D printing is the process whereby a physical 3D object is produced from a digital file [26]. The fourth step of our A2PF is to convert the 3D protein files into real 3D protein objects. This conversion requires a 3D printing process, which takes place in three sub-steps: Importing 3D Files. The 3D file is downloaded after locating the 3D protein data file in a protein source [30,34]. Preparing STL Files for Printing. The downloaded 3D files must be in STL format; otherwise, they are converted automatically. Slicing and Printing. A 3D printer connected to a computer is used; the printer must be prepared by loading the filament and ensuring that the bed is level. The printing then starts and proceeds automatically until we obtain the 3D protein model.
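A minimal sketch of the STL-conversion sub-step, assuming the 3D protein file has already been downloaded and using the third-party trimesh library (the file names are illustrative, and the actual framework may rely on a different conversion tool):

import trimesh

# Load the downloaded 3D protein model (e.g. an OBJ or PLY mesh) and export it
# to STL, the format expected by the slicing software.
mesh = trimesh.load("actin_model.obj")
mesh.export("actin_model.stl")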
4 Software Application and Experiment
In this section, we present a software application that uses our framework and an experiment using the modeled application. To illustrate our proposed A2PF, we developed the software application in the Java programming language.
4.1 The Protein Primary Structure Determination
The multi-agent system uses existing DNA sources, containing several genomes made up of thousands of genes, which in turn comprise thousands of nucleotides. The MAS can determine the primary protein structure (i.e. the amino acid sequence) by exploiting an existing DNA sequence in a very short time, ranging from 0.0005 min to 95.14 min (Fig. 2).
Fig. 2. The time to determine the primary structure of proteins
To compute this time, we used the 'OAR resource and task manager', which is a batch scheduler for HPC clusters and other computing infrastructure.
4.2 The Protein 2D/3D Structure Determination
The prediction of the secondary and tertiary structures of proteins depends on the results of sequence alignment, in accordance with our proposed framework. The software application, meanwhile, illustrates the case where the determined amino acid sequence perfectly matches an amino acid sequence of a known 2D or 3D protein. Figure 3 shows that the determined amino acid sequence is similar to the "Actin" amino acid sequence; in this case, the "Actin" 2D and 3D files are imported from the protein source and converted to STL format, a file format that stores information about 3D models by describing the surface geometry of three-dimensional objects. The STL file is then displayed to allow the 3D protein to be visualised (Fig. 4). The visualization of the 3D protein makes all information about the protein available before printing it. Once the 3D printer is connected and the filament is loaded, the printing process starts automatically.
Fig. 3. 2D/3D protein structure determination
Fig. 4. 3D Visualisation of the “Actin” structure
5 Discussion
Each method used to determine protein structures, including X-ray crystallography and computational methods, has advantages and disadvantages. The major drawback of X-ray crystallography is that the method only aims to determine the 3D structure. In addition, laboratory experimentation with this method is time-consuming and requires days or weeks before the dynamic behavior or the expected results can be observed. Instead, computational methods were developed with the aim of reducing this time and obtaining faster results. Yet bioinformatics methods also allow only the 2D or 3D protein structures to be obtained. Therefore, we propose an Automatic Protein Production Framework (A2PF), a computational tool that aims at simultaneously determining the different protein structures (i.e. primary, 2D, and 3D) in a faster way.
Table 1. Comparative study. Each of the references [22], [13], [31], [9], [28], [2], [23], [7], [32], [27], [19], [1], [12], [24], and [8], as well as A2PF, is marked under the approach it relies on: X-ray crystallography or one of the computational methods (comparative modeling, fold recognition, ab-initio prediction). A2PF is marked under the three computational methods.
The A2PF will enable information on protein structures to be obtained altogether, which will allow a better understanding of protein functions and activities. The proposed A2PF is based on computational methods by combining the three bioinformatics methods used to predict the protein structure (i.e. comparative modeling, fold recognition, and ab initio prediction). The A2PF will allow the identification of the three protein structures (primary, secondary and tertiary) in contrast to the existing approaches which only determine one of the protein structures.
6 Conclusion
Genetic and protein information plays a critical role in making life better understood to address challenges in the medical, pharmaceutical, and pathological fields. This is why, in recent years, research has focused on obtaining and understanding this information, which is contained in cells. Indeed, the ability to sequence the genetic code of various organisms, from simple bacteria and viruses to the human genetic code in particular, has made genetic information widely available. Even though this newly available genetic information has opened up new avenues for a better understanding of life, there are still some issues to address. One of these concerns the availability of information about the protein. We have therefore proposed an automatic protein production framework aimed at reproducing the same functioning of cells, step by step, from the primary structure determination to the physical 3D protein building. The proposed A2PF is based on computational methods, combining three bioinformatics methods to predict the structure of proteins. The A2PF enables the three protein structures (primary, secondary, and tertiary) to be fully identified, which allows all protein information to be available for the three different structures. This information will be used to better understand the functions and activities of proteins in order to develop effective mechanisms for disease prevention, personalized medicine and treatments, and other healthcare aspects. Moreover, even if the software application and experimentation with the modeled system demonstrate the feasibility of our approach, our proposal is not without limitations. We plan to address these in the future by proposing a fold recognition and ab-initio method for secondary and tertiary structure prediction, as our framework is currently based solely on a sequence alignment technique to automatically predict the 2D/3D protein structures.
References 1. Audet, M., Villers, K., Velasquez, J., Chu, M., Hanson, C., Stevens, R.C.: Smallscale approach for precrystallization screening in GPCR x-ray crystallography. Nat. Protoc. 15(1), 144–160 (2020) 2. Bertoni, M., Kiefer, F., Biasini, M., Bordoli, L., Schwede, T.: Modeling protein quaternary structure of homo-and hetero-oligomers beyond binary interactions by homology. Sci. Rep. 7(1), 1–15 (2017) 3. Brito, J.A., Archer, M.: X-ray crystallography. In: Practical Approaches to Biological Inorganic Chemistry, pp. 217–255. Elsevier (2013) 4. Chou, P.Y.: Prediction of the secondary structure of proteins from their amino acid sequence. Adv. Enzymol. Relat. Areas Mol. Biol. 47, 45–148 (1978) 5. Chou, P.Y., Fasman, G.D.: Prediction of protein conformation. Biochemistry 13(2), 222–245 (1974) 6. Chou, P.Y., Fasman, G.D.: Empirical predictions of protein conformation. Annu. Rev. Biochem. 47(1), 251–276 (1978) 7. Degtjarik, O., Demo, G., Wimmerova, M., Smatanova, I.K.: X-ray crystallography. In: Plant Structural Biology: Hormonal Regulations, pp. 203–221. Springer (2018) 8. Dehghani, T., Naghibzadeh, M., Eghdami, M.: BetaDL: a protein beta-sheet predictor utilizing a deep learning model and independent set solution. Comput. Biol. Med. 104, 241–249 (2019) 9. Ghouzam, Y., Postic, G., Guerin, P.E., De Brevern, A.G., Gelly, J.C.: Orion: a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles. Sci. Rep. 6(1), 1–10 (2016) 10. Goodsell, D.S., Dutta, S., Zardecki, C., Voigt, M., Berman, H.M., Burley, S.K.: The rcsb pdb “molecule of the month”: inspiring a molecular view of biology. PLoS Biol. 13(5), (2015) 11. Haynie, D.T., Xue, B.: Superdomains in the protein structure hierarchy: the case of PTP-C2. Protein Sci. 24(5), 874–882 (2015)
12. Ibrahim, W., Abadeh, M.S.: Protein fold recognition using deep kernelized extreme learning machine and linear discriminant analysis. Neural Comput. Appl. 31(8), 4201–4214 (2019) 13. Ilari, A., Savino, C.: Protein structure determination by x-ray crystallography. In: Bioinformatics, pp. 63–87. Springer (2008) 14. Jo, T., Hou, J., Eickholt, J., Cheng, J.: Improving protein fold recognition by deep learning networks. Sci. Rep. 5, 17573 (2015) 15. John, B., Sali, A.: Comparative protein structure modeling by iterative alignment, model building and model assessment. Nucleic Acids Res. 31(14), 3982–3992 (2003) 16. Kelley, L.A.: Fold recognition. In: From Protein Structure to Function with Bioinformatics, pp. 27–55. Springer (2009) 17. Kermani, M.H., Boufaida, Z.: A modeling of a multi-agent system for the protein synthesis. In: 2015 IEEE/ACS 12th International Conference of Computer Systems and Applications (AICCSA), pp. 1–7. IEEE (2015) 18. Kermani, M.H., Guessoum, Z., Boufaida, Z.: A two-step methodology for dynamic construction of a protein ontology. IAENG Int. J. Comput. Sci. 46(1), (2019) 19. Khalatbari, L., Kangavari, M.R., Hosseini, S., Yin, H., Cheung, N.M.: MCP: a multi-component learning machine to predict protein secondary structure. Comput. Biol. Med. 110, 144–155 (2019) 20. Kumari, I., Sandhu, P., Ahmed, M., Akhter, Y.: Molecular dynamics simulations, challenges and opportunities: a biologist’s prospective. Curr. Protein Pept. Sci. 18(11), 1163–1179 (2017) 21. Lam, S.D., Das, S., Sillitoe, I., Orengo, C.: An overview of comparative modelling and resources dedicated to large-scale modelling of genome sequences. Acta Crystallographica Section D: Struct. Biol. 73(8), 628–640 (2017) 22. Langer, G., Cohen, S.X., Lamzin, V.S., Perrakis, A.: Automated macromolecular model building for x-ray crystallography using arp/warp version 7. Nat. Protoc. 3(7), 1171 (2008) 23. Lee, J., Freddolino, P.L., Zhang, Y.: Ab initio protein structure prediction. In: From Protein Structure to Function with Bioinformatics, pp. 3–35. Springer (2017) 24. Liu, B., Li, C.C., Yan, K.: Deepsvm-fold: protein fold recognition by combining support vector machines and pairwise sequence similarity scores generated by deep learning networks. Briefings in bioinformatics (2019) 25. Ma, T., Kuang, P., Tian, W.: An improved recurrent neural networks for 3D object reconstruction. Appl. Intell. 50(3), 905–923 (2020) 26. Meyer, S.C.: 3d printing of protein models in an undergraduate laboratory: leucine zippers. J. Chem. Educ. 92(12), 2120–2125 (2015) ´ Kun, J., Moussong, E., ´ Lee, Y.H., Goto, 27. Micsonai, A., Wien, F., Buly´ aki, E., Y., R´efr´egiers, M., Kardos, J.: Bestsel: a web server for accurate protein secondary structure prediction and fold recognition from the circular dichroism spectra. Nucleic Acids Res. 46(W1), W315–W322 (2018) 28. Ovchinnikov, S., Park, H., Varghese, N., Huang, P.S., Pavlopoulos, G.A., Kim, D.E., Kamisetty, H., Kyrpides, N.C., Baker, D.: Protein structure determination using metagenome sequence data. Science 355(6322), 294–298 (2017) 29. Qiao, S., Yan, B., Li, J.: Ensemble learning for protein multiplex subcellular localization prediction based on weighted KNN with different features. Appl. Intell. 48(7), 1813–1824 (2018) 30. Rose, P.W., Prli´c, A., Altunkaya, A., Bi, C., Bradley, A.R., Christie, C.H., Costanzo, L.D., Duarte, J.M., Dutta, S., Feng, Z., et al.: The RCSB protein data bank: integrative view of protein, gene and 3d structural information. Nucleic acids research, p. gkw1000 (2016)
31. Spencer, M., Eickholt, J., Cheng, J.: A deep learning network approach to AB initio protein secondary structure prediction. IEEE/ACM Trans. Comput. Biol. Bioinf. 12(1), 103–112 (2014) 32. Studer, G., Tauriello, G., Bienert, S., Waterhouse, A.M., Bertoni, M., Bordoli, L., Schwede, T., Lepore, R.: Modeling of protein tertiary and quaternary structures based on evolutionary information. In: Computational Methods in Protein Evolution, pp. 301–316. Springer (2019) 33. Xu, D., Jaroszewski, L., Li, Z., Godzik, A.: Aida: AB initio domain assembly for automated multi-domain protein structure prediction and domain-domain interaction prediction. Bioinformatics 31(13), 2098–2105 (2015) 34. Yang, M., Derbyshire, M.K., Yamashita, R.A., Marchler-Bauer, A.: Ncbi’s conserved domain database and tools for protein domain analysis. Current Protocols Bioinform. 69(1), 874–882 (2020)
Open Vocabulary Recognition of Offline Arabic Handwriting Text Based on Deep Learning
Zouhaira Noubigh1(B), Anis Mezghani2, and Monji Kherallah3
1 ISITCom, University of Sousse, 4011 Sousse, Tunisia
2 Higher Institute of Industrial Management, University of Sfax, Sfax, Tunisia
3 Faculty of Sciences of Sfax, University of Sfax, Sfax, Tunisia
Abstract. Offline Arabic text recognition is a substantial problem with several important applications. It has attracted special emphasis and has become one of the challenging areas of research in the field of computer vision. Deep Neural Network (DNN) algorithms provide great performance improvements in sequence recognition problems such as speech and handwriting recognition. This paper focuses on recent Arabic handwriting text recognition research based on DNNs. Our contribution in this work is based on a CRNN model with a CTC beam search decoder, which is used for the first time for Arabic handwriting recognition. The proposed system is an open-vocabulary approach based on character-model recognition. Keywords: Deep learning · Handwriting Arabic text · Open vocabulary · CNN · BLSTM · CTC beam search
1 Introduction
Handwriting is a specific free expression of each person. It remains to this day an important means of communication. Therefore, text recognition systems are indispensable in everyday life, and their performance varies greatly with the clarity of the images provided [1, 2]. Text recognition is considered one of the most challenging tasks in artificial intelligence (AI) [3]. In fact, the mass of paper documents continues to grow, and an increasing number of industries and services need to digitize these documents. The developed handwriting recognition systems (HRS) are more and more efficient, but they do not yet outperform or equal human capacity in recognition rate. Those systems are nevertheless limited to very specific documents with a very limited lexicon [4–6]. There is no HRS able to process heterogeneous documents with high performance. Researchers have proposed error correction strategies, such as the use of lexicons [7] or language models [8], to improve results. Although several studies on offline Arabic handwriting recognition have been developed [9, 10], most systems are limited to isolated characters, digits, or words with a limited vocabulary [1, 12, 15]. Very little research has been presented on the recognition of unconstrained Arabic text with an open or large vocabulary. The recognition of Arabic text presents many challenges because of several characteristics of this script. Mainly,
the cursive nature of this script and the connectivity of its characters, in both handwritten and printed forms, make its recognition a very sophisticated process [11, 12]. In recent years, leveraging deep neural networks has allowed great progress on the text recognition problem. Deep Learning (DL) technologies have therefore proven themselves as the most important models for robust classification, recognition, and segmentation, due to the great performance improvement they have provided. Firstly, the Convolutional Neural Network (CNN) has provided an efficient solution for Arabic handwritten character and digit recognition and has been employed in many studies. Secondly, the Recurrent Neural Network (RNN), especially Long Short-Term Memory (LSTM), has been successfully used for sequence modeling tasks, especially text line recognition [13]. In this work, we first use Convolutional Neural Networks (CNN), combined with a sliding window method, to solve the problem of feature extraction for Arabic text lines. Secondly, a combination of bidirectional LSTM (BLSTM) and CTC beam search is proposed for sequence modeling. The rest of this paper is organized as follows: in Sect. 2, we present recent works that record good results for offline Arabic text recognition; Sect. 3 offers an overview of the proposed Arabic text recognition system; the experimental results are presented and explained in Sect. 4; and in Sect. 5 we conclude this paper.
2 Related Work
Combining deep learning technologies constitutes the key contribution of most recent research on handwriting recognition. Shi et al. [14] were the first to propose the Convolutional Recurrent Neural Network (CRNN) architecture, which consists in combining a deep CNN and a BLSTM with a CTC decoder. Afterwards, many approaches for handwritten text recognition were inspired by this deep architecture. Suryani et al. [15] combined a CNN-LSTM model with a hybrid HMM decoder instead of CTC. The proposed approach was tested on offline Chinese handwriting datasets. Rawls et al. [16] published a CNN-LSTM model where the CNN is used for feature extraction and bidirectional LSTMs for sequence modeling. In this work, the authors presented a comparison between the types of features provided. It was shown that the CNN model is better than both existing handcrafted features and a simpler neural model consisting entirely of fully connected layers. Results are presented on English and Arabic handwritten data, and on English machine-printed data. For Arabic handwriting recognition, AL-Saffar et al. proposed a review of deep learning algorithms for Arabic handwriting recognition [1]. The authors showed that the first successful DL-based systems proposed for Arabic characters were based on the Convolutional Neural Network (CNN) and Deep Belief Networks [19]. BenZeghiba [20] proposed a recognition system based on MDLSTM-RNN. He presents a comparative study based on four different optical modeling units for offline Arabic text recognition. Experiments were conducted using the Maurdor and KHATT databases. Ahmad et al. [21] proposed an Arabic character recognition system combining an MDLSTM with a CTC decoder. The KHATT dataset was used for the experiments. A recent handwriting recognition system was proposed by Jemni et al. [22]. In this work, the authors introduce three BLSTM-CTC combination architectures, namely low-level fusion, mid-level fusion, and high-level fusion.
The proposed contribution is based on a comparative study of those combination levels trained on different handcrafted features. The experiments were conducted on the KHATT database. In this work, n-gram LMs are estimated on the training corpus of the KHATT database using the discounting method. Another recognition system was proposed by Jemni et al. [23], based on MDLSTM and CNN. The main contribution of this work is a novel OOV (Out of Vocabulary) detection and recovery method that considerably improves the system performance. Three OOV detection methods are proposed, based on three different lexicon-driven recognition methods. Experiments were performed using the KHATT and AHTID/MW databases.
3 Architecture and Model Details
The key contribution of this paper is the use of the new CTC decoding approach, called CTC beam search, with a deep CRNN model. The proposed model is inspired by the original neural network architecture presented by Shi et al. [14] and adapted to Arabic writing. In the feature extraction step, a sliding window is used to scan the image from right to left, following the direction of Arabic writing. The dropout regularization technique is applied twice, after the CNN layers and in the BLSTM layer. In this section, we will detail each part of the proposed system; an overview of these steps is illustrated in Fig. 1.
Fig. 1. Overview of the proposed handwriting recognition system steps
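For illustration only, a minimal Keras-style sketch of a generic CRNN (convolutional layers feeding a BLSTM trained with CTC) is given below; the input shape, layer sizes, and character-set size are assumptions made for the example and do not reproduce the exact architecture described in this paper:

import tensorflow as tf
from tensorflow.keras import layers

num_chars = 120  # assumed size of the Arabic character set (CTC adds one blank label)
inputs = layers.Input(shape=(64, 512, 1))  # height x width x channels (assumed)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Permute((2, 1, 3))(x)            # make the width axis the time axis
x = layers.Reshape((128, 16 * 128))(x)      # one feature vector per horizontal position
x = layers.Bidirectional(layers.LSTM(256, return_sequences=True, dropout=0.2))(x)
outputs = layers.Dense(num_chars + 1, activation="softmax")(x)  # +1 for the CTC blank
model = tf.keras.Model(inputs, outputs)
# Training uses a CTC loss over these per-time-step distributions (tf.nn.ctc_loss),
# and decoding at test time can use tf.nn.ctc_beam_search_decoder.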
3.1 Data Pre-processing
The scanned image may need to be enhanced by preprocessing steps before being fed into the recognition system. Furthermore, unconstrained offline handwritten Arabic texts show high variation due to the multiple writers' styles and cultures. Thus, selecting the best image improvement method is a vital step of the recognition process, making the image more suitable for feature extraction. As presented in Fig. 2, the proposed preprocessing
techniques allow an important improvement in image quality. The preprocessing methods used are binarization using Otsu's algorithm [30], resizing, baseline detection, line skew correction, slant correction, and normalization of the character height [31].
Fig. 2. Original and preprocessed images examples
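A minimal sketch of the first preprocessing operations using OpenCV is shown below; only Otsu binarization and resizing are illustrated, the file name and target height are assumptions, and the baseline detection, skew/slant correction, and height normalization steps are not reproduced here:

import cv2

# Load a scanned text-line image in grayscale (file name is illustrative).
img = cv2.imread("line_image.png", cv2.IMREAD_GRAYSCALE)

# Otsu binarization: the threshold is chosen automatically from the histogram.
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Resize to a fixed height while keeping the aspect ratio (height is assumed).
target_h = 64
scale = target_h / binary.shape[0]
resized = cv2.resize(binary, (int(binary.shape[1] * scale), target_h))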
3.2 Features Extraction
The choice of the feature extraction method for a given dataset directly impacts the results of a recognition system. In this context, there are many methods dedicated to feature extraction, such as deep learning technologies. They provide a high capability to capture word and text features without the need for a segmentation task. The CNN improves the accuracies against other state-of-the-art models [17, 18]. The CNN, also called ConvNet, has proven its feature extraction ability and outperforms handcrafted features [27]. The main concept of CNN-based feature extraction is to obtain local features from the image in the lower layers and combine them into more complex features in the higher layers [25, 28]. In general, a convolutional neuron can be represented as in Fig. 3, and mathematically we have:
C_j = Σ_i X_i ∗ w_ij + b_j    (1)
Y_j = f(C_j)    (2)
where, in Eq. (1), C_j represents the output of the convolution operation, X_i denotes the input to the convolutional layer, w_ij is the convolution kernel (weights), and b_j is the additive bias. In Eq. (2), Y_j is the output feature map of the convolutional layer and f is an activation function. Activation functions are mathematical operations over the input, which introduce non-linearity into neural networks, since convolution is a linear operation.
Fig. 3. Convolutional neural network
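As a small numerical illustration of Eqs. (1) and (2), the following NumPy/SciPy sketch computes one output feature map of a convolutional neuron with a ReLU activation; the kernel values and input patch are made up for the example:

import numpy as np
from scipy.signal import correlate2d

x = np.random.rand(8, 8)       # input patch X_i (illustrative)
w = np.random.randn(3, 3)      # convolution kernel w_ij
b = 0.1                        # additive bias b_j

c = correlate2d(x, w, mode="valid") + b  # Eq. (1): C_j = sum_i X_i * w_ij + b_j
y = np.maximum(c, 0.0)                   # Eq. (2): Y_j = f(C_j) with f = ReLU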
Researchers have proposed several CNN architectures, including LeNet-5, AlexNet, ZFNet, VGGNet, GoogleNet, ResNet, DenseNet, and CapsNet, which are briefly discussed in [27]. The CNN architecture proposed in this work is inspired by the VGGNet architecture. It contains 10 convolution layers with a filter size of (3 × 3) and stride 1. The Rectified Linear Unit (ReLU) activation function is used. It is defined as in formulation (3):
f(x) = x, if x ≥ 0; f(x) = 0, if x < 0    (3)
[Table fragment: activation functions Sinc, Step, and ELU (f(x) = x for x > 0, α(e^x − 1) otherwise), with output ranges (−∞, +∞), [−1, 1], (≈ −0.217234, 1], {0, 1}, (−α, +∞).]
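A brief NumPy sketch of the ReLU of formulation (3), together with the ELU variant appearing in the table fragment (alpha is a positive hyperparameter), might look like this:

import numpy as np

def relu(x):
    # f(x) = x for x >= 0, 0 otherwise (formulation (3))
    return np.maximum(x, 0.0)

def elu(x, alpha=1.0):
    # f(x) = x for x > 0, alpha * (exp(x) - 1) otherwise
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))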
Fig. 3. Space distribution of the evolved data under random initialization and the classification performance of FCM and OPC.
4 Experiments
In this section, we perform two-part experiments to investigate the characteristics of FCM and analyze the results. Next, we will introduce the two parts of the experiments in turn.
4.1 Random Distributed Data Evolution
To intuitively understand the impact of data distribution, we randomly generated a two-dimensional binary classification dataset. Each class contains 50 samples. Empirical studies have proved that the classification ability of the algorithm can be improved by projecting low-dimensional data to high-dimensional manifolds. Therefore, to deeply analyze the difference between the fixed centroid method and the floating centroid method, the hidden and output layer neurons of
Fig. 4. Initialize the dot-matrix data and the global optimal data distribution space mapped by RNN.
the classifiers used in these two methods are both set to 2. For a fair comparison, we employ the stochastic gradient descent (SGD) method to train the neural networks for 50 epochs with a learning rate of 0.1. The number of centroids and the λ of FCM are set to 2 and 0.5, respectively. We run ten times to obtain the average accuracy in each generation of evolution, to reduce the impact of network parameter initialization on the accuracy. In addition, PSO uses these parameter settings: population size is 20, max generation is 200, c0 = 1, c1 = 1.8, c2 = 1.8, vmax = 0.2, α = 1, and β = 1. We present the results of 6 trial experiments, which are given in Fig. 3. These figures show several interesting findings. After adversarial training of the two classifiers, the data distributions have evolved into a variety of forms through PSO. For example, one cluster is surrounded by another cluster, two classes are interleaved with each other, and one class has multiple sub-clusters. From the box plot results of each trial, FCM demonstrates competitive performance on these datasets. Therefore, it can be seen that FCM may form a more flexible decision boundary based on the data distribution.
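For reference, a minimal sketch of the canonical PSO velocity/position update with the parameter values quoted above is given below; the fitness function and dimensionality are placeholders, c0 is treated here as an inertia weight, and the roles of α and β in this paper's formulation are not reproduced:

import numpy as np

def pso(fitness, dim, pop_size=20, max_gen=200, c0=1.0, c1=1.8, c2=1.8, vmax=0.2):
    # Canonical particle swarm optimization (minimization).
    pos = np.random.uniform(-1.0, 1.0, (pop_size, dim))
    vel = np.zeros((pop_size, dim))
    pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmin(pbest_fit)].copy()
    for _ in range(max_gen):
        r1, r2 = np.random.rand(pop_size, dim), np.random.rand(pop_size, dim)
        vel = c0 * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, -vmax, vmax)        # velocity limited by vmax
        pos = pos + vel
        fit = np.array([fitness(p) for p in pos])
        improved = fit < pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[np.argmin(pbest_fit)].copy()
    return gbest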
4.2 Dot-Matrix Distributed Data Evolution
To further study the influence of data distribution on algorithm characteristics, we initialize the data in the form of a dot-matrix, and then utilize RNNs to perform a nonlinear transformation on the data distribution. The hidden layer neuron node of RNNs is set to 10, and the time step is equal to 10. The weights
and biases of the RNNs are optimized by PSO for 100 generations, the two classifiers are run three times to obtain the average accuracy in each generation of data evaluation, and the other parameters of the experiment are the same as in the previous subsection. We conducted 5 trials for this experiment. In trial 1, the output neurons of the RNNs are constrained to [0,1] by max-min normalization, and the number of FCM centroids is set to 4. In the other trials, the number of centroids is updated to 6. Moreover, for the processing of the RNN output neurons, trial 2 uses max-min normalization, the Z-score is adopted in trial 3, trial 4 employs principal component analysis (PCA), and trial 5 only uses a Sigmoid activation. Figure 4 provides the initial distribution and the best evolved distributions. As can be seen from Fig. 4, the data spaces of trials 1, 2, and 3 show multi-cluster and circular distributions, which give FCM a significant advantage in classification performance due to its multiple-centroid characteristics. The data distributions of trials 3 and 4 are extremely stretched and compressed on the space scale, which implies that FCM is insensitive to the data scale range, especially in trial 4. In addition, trial 5 shows that although the two classes have excellent intra-class consistency, the inter-class separability is insufficient. Floating centroids may be feasible for this data distribution. Therefore, the floating centroids method could be able to adapt to more forms of classification problems than the fixed centroids method.
5 Conclusions
In this paper, we propose an adversarial network combined with the PSO evolution data distribution method to investigate the characteristics of neural network classifiers. Through the directional evolution of the data distribution by PSO, experimental results indicated the effectiveness of the proposed method. Moreover, we find that FCM has a better fitting capability for extremely nonlinear data than the fixed centroids method. In the future, applying this method to multiclass problems and high-dimensional problems could be an interesting further extension. Acknowledgements. This work was supported by National Natural Science Foundation of China under Grant No. 61872419, No. 61573166, No. 61572230, No. 61873324, No. 81671785, No. 61672262, No. 61903156. Shandong Provincial Natural Science Foundation No. ZR2019MF040, No. ZR2018LF005. Shandong Provincial Key R&D Program under Grant No. 2019GGX101041, No. 2018CXGC0706, No. 2017CXZC1206. Taishan Scholars Program of Shandong Province, China, under Grant No. tsqn201812077.
Study of Various Web-based Applications Performance in Natif IPv4 and IPv6 Environments
Khalid E. L. Khadiri(B), Ouidad Labouidya, Najib E. L. Kamoun, Rachid Hilal, Fatima Lakrami, and Chaimaa Belbergui
Chouaib Doukkali University, El Jadida, Morocco
{khalid.elkhadiri,labouidya.o,elkamoun.n,hilal.r,lakrami.f,belbergui.c}@ucd.ac.ma
Abstract. The world is directly moving from IPv4 to IPv6 due to some gaps in IPv4 such as address space exhaustion, security issues, mobility, and quality of service. These gaps gave birth to the development of a new version of the Internet Protocol named IPv6. This version of the protocol offers many improvements, including an increase in the address space from 2^32 to 2^128 and improvements in security, mobility, and quality of service. In this article, we study the performance of these protocols on different applications, namely HTTP, FTP, database, email, and VoIP, using OPNET Modeler as a simulation tool. For this, we designed a network that includes different components such as servers, routers, and adapters. The performance measurement criteria are (i) the response time, (ii) the received traffic, (iii) the delay, (iv) the jitter, and (v) the MOS score. The aim of this work is to discover which IP version is best for a particular application in relation to the kind of traffic (heavy or light). Keywords: IPv4 · IPv6 · HTTP · FTP · Database · Email · VoIP · OPNET
1 Introduction
The Internet Protocol (IP) is one of the most important Internet protocols [1]. This protocol identifies hosts and routes data between them via the Internet. The first generation of IP, which is still widely used, is IPv4, but it has limitations: a limited address space, given the huge size of the Internet and the large number of IPv4 users currently connected, as well as limited quality of service, mobility, and security [2]. The new version of the IP protocol, developed by the IETF (Internet Engineering Task Force), is IPv6. It has been developed to resolve the address exhaustion and most of the IPv4 limitations, adding many improvements, including auto-configuration, a huge address space of 128 bits instead of 32 bits in IPv4, and other improvements in security, mobility, and quality of service [3]. In this research paper, the two protocols IPv4 and IPv6 were examined and evaluated on a simulated network infrastructure under OPNET Modeler using different applications such as HTTP, FTP, database, email, and VoIP. The comparative analysis of the
simulation results concerns various parameters such as the response time, received traffic, delay, jitter, and MOS score. The rest of the document is organized as follows: Sect. 2 will present an overview of the two protocols IPv4 and IPv6. Section 3 will discuss a non-exhaustive state of the art of research work in this field of research. The simulation scenarios will be described in Sect. 4. The results of the simulation and the comparative analysis will be discussed in Sect. 5. The conclusions and perspectives will be presented in the final section of this paper.
2 IPv4/IPv6 Overview
2.1 IPv4 (Internet Protocol Version 4)
The number of Internet users worldwide is increasing rapidly in most countries. Currently, version 4 of the Internet Protocol (IP) is mainly used to communicate on the Internet. An IP address contains two different kinds of data: the host address and the network address. An IPv4 address is structured on a 32-bit basis and is normally expressed in dotted decimal notation, with four bytes separated by dots, for example, 192.168.140.30. IPv4 addresses have been divided into five different classes; however, classes A, B, and C are the ones generally used. Class A provides the largest number of IP addresses, class B provides fewer addresses than class A, and class C provides the fewest IP addresses.
2.2 IPv6 (Internet Protocol Version 6)
To face the limitations of IPv4 and the current growth of the Internet, Internet Protocol version 6 (IPv6) has been developed to replace IPv4. IPv6 is an evolutionary step from IPv4 with several improvements over its predecessor. In IPv6, the total address space is extended from 32 bits to 128 bits, providing 2^128 IP addresses (approximately 3.4 × 10^38) [9]. This expansion of the IP address space means that IP address assignment to devices on the Internet can now be done with a lot of flexibility, because there are enough IP addresses for all users around the world. It also eliminates the need for NAT (Network Address Translation) and VLSM (Variable Length Subnet Mask), which are currently widely adopted to save IPv4 from the shortage of address space [10].
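As a quick illustration of the two address formats using Python's standard ipaddress module (the example addresses are arbitrary):

import ipaddress

v4 = ipaddress.ip_address("192.168.140.30")            # 32-bit IPv4 address
v6 = ipaddress.ip_address("2001:db8::8a2e:370:7334")   # 128-bit IPv6 address

print(v4.version, v4.max_prefixlen)  # 4 32
print(v6.version, v6.max_prefixlen)  # 6 128
print(ipaddress.ip_network("2001:db8::/32").num_addresses)  # 2**96 addresses in one /32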
3 Related Work
Internet Protocol version 6 (IPv6) [27] is designed to respond to the limitations of IPv4, including the lack of IP addresses. Moreover, it has improved most network functions, including security, mobility, autoconfiguration, quality of service (QoS), and multicast [28]. It is proposed to provide the Internet with a greater address space and better performance. The performance of a network or protocol is one of the important factors that depend on the tool used and the choice of the application tested. Since the adoption of the new IPv6 protocol, different studies have been performed on IPv6 and its predecessor IPv4 in order to study and compare their performance. Here are some of them.
IPv4 limitations, improvements added in IPv6, and a comparison between IPv4 and IPv6 have been discussed by Chandra et al. in [29]. The authors also discussed the kinds of IPv6 addresses and some strategies for the transition from IPv4 to IPv6, such as dual stack, manual tunnels, ISATAP, Teredo, and the NAT-PT translator, while describing the operating principle of each strategy. In [19], the author Ali performed a study of the two protocols IPv4 and IPv6. First, he presented an overview of the IPv4 protocol and its limitations. Then, he described the new version of the Internet Protocol (IPv6) and its advantages. Finally, he concluded that an instantaneous transition from IPv4 to IPv6 is impossible due to the huge size of the Internet and the large number of current IPv4 users; therefore, a phase of coexistence of the two protocols is necessary. In [30], Narayan et al. studied the performance of the two versions of IP (v4 and v6) on two operating systems (Windows Vista and Linux Ubuntu). The study was performed on two kinds of traffic (UDP and TCP) using measurement parameters such as delay, jitter, and processor use. The obtained results showed that the performance of IPv4 and IPv6 depends on the kind of traffic and the choice of operating system. Another performance comparison of the two protocols IPv4 and IPv6 was done in [31] by Yasinovskyy et al. The authors studied VoIP over UDP as test traffic with measurement parameters such as delta, jitter, MOS score, throughput, and loss rate. They found that IPv4 gave better performance than IPv6 in terms of delta, jitter, MOS, and loss rate, while in terms of throughput IPv6 is more efficient than IPv4. The same comparison was made in [32] by Fawad, but this time on OPNET Modeler with two measurement criteria, namely delay and throughput. As findings, the delay in IPv6 is almost 49% lower than that of IPv4, and the IPv6 throughput is almost three times that of IPv4. A performance evaluation of video/voice throughput based on simulation and analysis methods in IPv4 and IPv6 networks was performed by Aziz et al. in [33]. The simulation was done with the OPNET Modeler tool. The results showed that IPv6 has more throughput than IPv4, due to the fact that IPv6 has a larger header size than IPv4. In [34], Al-Gadi et al. carried out a performance evaluation of the two protocols IPv4 and IPv6. For performance measurement, the OPNET Modeler simulation tool was used, based on four measurement parameters, namely response time, throughput, Ethernet delay, and jitter. No difference was observed in the jitter and response time values. IPv6 offers a higher throughput than IPv4, while in delay IPv4 is lower than IPv6, because IPv6 has a larger header field than IPv4. Most of this work has evaluated and compared the performance of the two protocols IPv4 and IPv6, but only for some applications and limited traffic. That does not give a clear idea of the most efficient protocol, nor of the application or kind of traffic (heavy or light) linked to it. Therefore, a study of the performance of the protocols IPv4 and IPv6 on several applications, taking into account the kind of traffic (heavy or light), is sufficient to judge the best protocol.
However, according to our research, the study of the performance of the two protocols on several applications in order to discover the best protocol on a particular application taking into account the kind of traffic (heavy or light) has not been discussed. This fact constituted our motivation to carry out this work on the OPNET Modeler tool using different applications such as
HTTP, FTP, database, email, and VoIP in terms of response time, received traffic, delay, jitter, and MOS score taking into account the kind of traffic (heavy or light).
4 Simulation Environment
The test network shown in Fig. 1 below was used to create the simulations. Two scenarios, namely IPv4 and IPv6, were realized with the OPNET Modeler tool. The latter is an effective way to provide a complete study for the analysis of network performance. It allows the performance of different file transfer, web browsing, and email applications to be monitored and captured in terms of response time and the size of the received traffic. For voice applications, OPNET is able to monitor the jitter, end-to-end delay, and MOS score values. The test network includes the following components:
• Six router nodes (ethernet4_slip8_gtwy node)
• Two switch nodes (ethernet16_switch node): these nodes represent switches that support up to sixteen Ethernet interfaces
• Five server nodes (ethernet_server node): to support four applications (voice, HTTP, FTP, and e-mail) connected to the other end of the network
• One application configuration node (Application Config node): this module defines the applications supported by the network (i.e., heavy HTTP Web browsing applications and heavy FTP applications). The specified application name is used when creating user profiles on the "Profile Config" object.
• One profile configuration node (Profile Config node): this module describes how users use the applications defined in the application configuration module. It is used to create user profiles. These user profiles can be specified on different network nodes to generate application layer traffic. The applications defined in "Application Config" are used by this object to configure the profiles.
Indeed, in our simulations, the web browsing and file transfer applications are configured in OPNET Modeler with heavy traffic, while the e-mail, database, and VoIP applications are configured with light traffic.
Fig. 1. Network testbed.
5 Results and Analysis
5.1 HTTP
Web browsing: the Hyper Text Transfer Protocol (HTTP) [35] is a protocol used to transfer web pages. HTTP supports communication between the Web server and the Web browser. HTTP is used to send requests from a Web client (a browser) to a Web server, returning the Web content (Web pages) from the server to the client. A Web browser is software used to locate, retrieve, and display content (Web pages) via the HTTP protocol; the time required to load each Web page is called the response time. Another important parameter that affects the performance of HTTP traffic is the amount of traffic received by the HTTP client. This is the average number of bytes per second transmitted to the HTTP application by the HTTP client transport layer. The larger the amount of traffic, the better the performance. The response time and received HTTP traffic were monitored for the network in the case of heavy web browsing in the two scenarios (IPv4 and IPv6). The results are described in Figs. 2 and 3 below. From a first reading, it is clear that IPv6 is more efficient than IPv4 in terms of these two criteria. This difference can be justified by the simplicity of the IPv6 header: it contains fewer fields (eight fields instead of thirteen in IPv4), which allows for faster data processing in the case of heavy browsing, which is our case. Consequently, it yields better performance.
Fig. 2. Page response time.
Fig. 3. Received HTTP traffic.
5.2 FTP
File transfer: the File Transfer Protocol (FTP) [36] is a standard network protocol used to transfer computer files from one host to another. FTP is built on a client-server architecture and uses separate control and data connections between the client and the server. The download response time is the time elapsed between the sending of a request and the receipt of the response packet for the FTP application. This time is measured from the moment a client application sends a request to the server until the moment it receives a response packet. Each response packet sent from a server to an FTP application is included in this statistic. This is an important factor in measuring the performance of FTP applications; a short download response time
represents good performance. Another important factor in measuring the performance of FTP traffic is the amount of traffic received by the FTP client. That is the average number of bytes per second transmitted to the FTP application by the FTP client transport layer. A large amount of traffic represents good performance. Figures 4 and 5 below illustrate the results, in average values, of the download response time and the received FTP traffic for the network in the case of heavy FTP traffic for the two scenarios (IPv4 and IPv6). As shown in these figures, IPv6 presents better performance than IPv4, since it gave a lower download response time than IPv4 and a higher average value of received traffic than IPv4.
Fig. 4. FTP download response time.
Fig. 5. Received FTP traffic.
5.3 Database
A database is a set of information organized in such a way that it can be accessed easily. A Web database is a database application designed to be managed and made available via the Internet. The response time of a database request is the time necessary to send a request and receive a response packet. It is measured from the moment the database request application sends a request to the server until it receives a response packet. Thus, each response packet sent from a server to a database request application is included in this statistic. The longer the response time, the longer the delay, which reduces the overall performance of the network. Figure 6 below presents the results of the response time to database requests for the two scenarios IPv4 and IPv6. These results indicate that IPv4 is more efficient than IPv6. The amount of received traffic is a good indicator of the quality of the database traffic. It illustrates the average bytes transmitted per second to the database request application by the database client transport layer. This value was monitored for the two scenarios IPv4 and IPv6, and the results are described in Fig. 7 below. As shown in this figure, IPv4 presents better performance than IPv6. These results, as well as those of the response time, can be justified by the impact of the IPv6 header length in the case of light traffic (our case, via the database application). IPv6 has a longer header than IPv4 (40 bytes instead of 20 bytes in IPv4).
Fig. 6. Query response time.
Fig. 7. Received query traffic.
5.4 E-mail
An email refers to the transmission of messages over communication networks. The response time for downloading an email is the time elapsed between the sending of an email request and the receipt of the email from the email server by the email client. This parameter plays an important role in the quality of an email application. The email response time value was monitored, and the results are summarized in Fig. 8. The comparison between the two protocols indicates that IPv4 is better than IPv6. Figure 9 shows that IPv4 is also more efficient than IPv6 regarding the amount of received traffic. The received traffic here is the average bytes transmitted per second to the email application by the email client transport layer.
Fig. 8. Email download response time.
Fig. 9. Received email traffic.
5.5 VoIP
Regarding voice applications, the VoIP protocol (Voice over Internet Protocol) [37] is a service that allows users to communicate with each other by voice. A very important indicator of the performance of any voice application traffic is jitter. Jitter is defined as the variation of the delay: packets arrive irregularly depending on the network traffic. It is therefore decisive in the case of VoIP. If the transmission delay varies during a voice conversation, the voice quality will be degraded. The best jitter value is the one closest to zero. In the test network of Fig. 1, the voice packets are transmitted from the "VoiceClient" node to the "Server_SIP" server node. The results of the jitter were
monitored on this node during the simulation. The results are shown in Fig. 10. The comparison between the two protocols indicates that IPv4 is better than IPv6. Another parameter that affects the performance of a voice application is the end-to-end delay. It is the time necessary to transmit a packet across the network, from source to destination. The average values of the VoIP end-to-end delay were monitored, and the results are presented in Fig. 11. As shown in this figure, the results indicate that IPv4 performs better than IPv6.
Fig. 10. VOIP jitter.
Fig. 11. Voice packet end-to-end delay.
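As a small illustration of the two metrics discussed above, the following sketch computes the average end-to-end delay and a simple jitter estimate (mean variation between consecutive packet delays) from a list of per-packet delays; the delay values are invented for the example, and OPNET's own jitter statistic may be computed differently:

delays_ms = [21.0, 23.5, 22.1, 25.0, 24.2]  # per-packet end-to-end delays (illustrative)

avg_delay = sum(delays_ms) / len(delays_ms)
jitter = sum(abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])) / (len(delays_ms) - 1)

print(f"average delay = {avg_delay:.2f} ms, jitter = {jitter:.2f} ms")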
Another indicator of the quality of a voice application is the MOS score (Mean Opinion Score). This is an important indicator for appreciating the quality of a voice application. MOS is a subjective measure. It is a scale from 0 to 5, where 5 indicates excellent quality and 1 indicates poor quality. OPNET has the ability to monitor the MOS value and calculate its average. In this experiment, the average MOS values were monitored, and the results are described in Fig. 12. A high MOS value indicates good performance. These results show that IPv4 has better voice quality than IPv6.
Fig. 12. VOIP MOS.
6 Conclusion
In this article, we studied and compared the performance of the two protocols IPv4 and IPv6. This study was carried out under OPNET Modeler with different applications,
namely HTTP, FTP, database, email, and VoIP, in terms of response time, received traffic, delay, jitter, and MOS score. The obtained results showed that the performance of the protocols depends on the kind of traffic used. Consequently, for heavy traffic (the case of the HTTP and FTP applications), IPv6 presents better performance than IPv4. The difference is justified by the simplicity of the IPv6 header, because it contains fewer fields (eight fields instead of thirteen in IPv4). That allows for faster data processing, which translates into better performance. On the other hand, for light traffic (the case of the database, email, and VoIP applications), IPv4 becomes more efficient than IPv6. That is justified by the impact of the length of the IPv6 header, which is greater than that of IPv4 (40 bytes compared to 20 bytes in IPv4). However, the IPv6 protocol has several advantages, such as an extended address space of 2^128, integrated security by default, autoconfiguration, and other mobility advantages.
References 1. Sekhar, K.C.: Contradiction Between IPv4 & IPv6. IJECCE 3(4), 796–799 (2012) 2. El Khadiri, K., Labouidya, O.: Etude comparative des mécanismes de transition de l’IPv4 à l’IPv6. Revue Méditerranéenne des Télécommunications 7(1) (2017) 3. Deering, S., Hinden, R.: Internet protocol, version 6 (IPv6) specification (2017) 4. Davies, J., Northrup, T.: Windows Server 2008 Networking and Network Access ProtectionNAP, USA , pp. 3–10. Microsoft Press, Redmond (2008) 5. Shapiro, J.R.: Networking windows server 2008, Windows Server 2008 Bible 1st ed. Canada , pp. 98–102. Wiley, Hoboken (2008) 6. IETF: Internet Protocol Darpa Internet Program Protocol Specification. IETF, RFC 791 (1981) 7. Andress, J.: IPv6: the next internet protocol. Login 30(2), 21–28 (2005) 8. Ladid, L.: IPv6-the next big bail-out: will IPv6 save the internet? In: Proceedings of the International Conference on Computer Systems and Technologies and Workshop for PhD Students in Computing, pp. 2.1–2.7., (2009) 9. El Khadiri, K., Labouidya, O., Elkamoun, N., Hilal, R.: Performance evaluation of IPv4/IPv6 transition mechanisms for real-time applications using OPNET modeler. Int. J. Adv. Comput. Sci. Appl. 9(4) (2018) 10. Ali, A.N.A.: Comparison study between IPV4 & IPV6. Int. J. Comput. Sci. Issues IJCSI 9(3), 314 (2012) 11. Deering, S., Hinden, R.: Internet protocol, version 6 (IPv6) specification” IETF, RFC 1883 and RFC 2460 (1998) 12. El Khadiri, K., Labouidya, O., Elkamoun, N., Hilal, R.: Performance analysis of video conferencing over various IPv4/IPv6 transition mechanisms. IJCSNS 18(7), 83–88 (2018) 13. Javvin, (ed.): Voice over IP and VoIP protocols Network Protocols Handbook, 2nd ed. CA, USA: Javvin Technologies, Inc, p. 67 (2005) 14. Wu, J., Wang, J.H., Yang, J.: CNGI-CERNET2: an IPv6 deployment in China. ACM SIGCOMM Comput. Commun. Rev. 41(2), 48–52 (2011) 15. Bouras, C., Gkamas, A., Primpas, D., Stamos, K.: Performance evaluation of the impact of QoS mechanisms in an IPv6 network for IPv6-capable real-time applications. J. Netw. Syst. Manage. 12(4), 463–483 (2004) 16. Cooper, M., Yen, D.C.: IPv6: business applications and implementation concerns. Comput. Stand. Interfaces 28(1), 27–41 (2005) 17. Caicedo, C.E., Joshi, J.B., Tuladhar, S.R.: IPv6 security challenges. Computer 42(2), 36–42 (2009)
18. Davies, J.: Understanding ipv6. Pearson Education (2012) 19. Durda˘gı, E., Buldu, A.: IPV4/IPV6 security and threat comparisons. Procedia-Soc. Behav. Sci. 2(2), 5285–5291 (2010) 20. Ferry, A.S., Tadaki, S.: The critical needed of IPv6 development in Indonesia In: Proceedings of the IECI Japan Workshop 2003 (2003) 21. Kumar, M.A., Karthikeyan, S.: Security model for TCP/IP protocol suite. J. Adv. Inf. Technol. 2(2), 87–91 (2011) 22. Atkinson, R., Kent, S.: Security architecture for the internet protocol (1998) 23. Blanchet, M.: Migrating to IPv6: a practical guide to implementing IPv6 in mobile and fixed networks. John Wiley and Sons, Hoboken (2009) 24. El Khadiri, K., El Kamoun, N., Labouidya, O., Hilal, R.: LISP: a novel solution for the transition from IPv4 to IPv6. IJCSNS 18(10), 130–139 (2018) 25. Limkar, S.V., Jha, R.K., Pimpalkar, S.: IPv6: issues and solution for next millennium of internet. In: Proceedings of the International Conference & Workshop on Emerging Trends in Technology, pp. 953–954 (2011) 26. Sailan, M.K., Hassan, R., Patel, A.: A comparative review of IPv4 and IPv6 for research test bed. In: International Conference on Electrical Engineering and Informatics, 2009. ICEEI 2009, vol. 2, pp. 427–43 (2009) 27. Deering, S., Hinden, R.: Internet protocol, version 6 (IPv6) specification, RFC 8200 (2017) 28. Dutta, A., et al.: IPv6 transition techniques for legacy application. In: Military Communications Conference, 2006. MILCOM 2006. IEEE, pp. 1–9 (2006) 29. Chandra, D.G., Kathing, M., Kumar, D.P.: A Comparative study on IPv4 and IPv6. In: 2013 International Conference on Communication Systems and Network Technologies (CSNT), pp. 286–289 (2013) 30. Narayan, S., Shang, P., Fan, N.: Performance evaluation of ipv4 and ipv6 on windows vista and linux ubuntu. In: International Conference on Networks Security, Wireless Communications and Trusted Computing, 2009. NSWCTC 2009, vol. 1, pp. 653–656 (2009) 31. Yasinovskyy, R., Wijesinha, A.L., Karne, R.K., Khaksari, G.: A comparison of VoIP performance on IPv6 and IPv4 networks. In: IEEE/ACS International Conference on Computer Systems and Applications, 2009. AICCSA 2009, pp. 603–609 (2009) 32. Fawad, M.: Comparison of VoIP on IPV4 only network and IPV6 only network 33. Aziz, M.T., Islam, M.S., Khan, M.N.I.: Throughput performance evaluation of video/voice traffic in ipv4/ipv6 networks. Int. J. Comput. Appl. 35(2), 5–12 (2011) 34. Al-Gadi, G., Babiker, A.A., Mustafa, N., Al-Gadi, A.: Comparison between IPv4 and IPv6 using OPNET simulator. IOSR J. Eng. IOSRJEN 4(08), 44–50 (2014) 35. Berners-Lee, T., Fielding, R., Frystyk, H.: Hypertext transfer protocol–HTTP/1.0 (1996) 36. Postel, J., Reynolds, J.: File transfer protocol (ftp). RFC959, October 1985 37. Goode, B.: Voice over internet protocol (VoIP). Proc. IEEE 90(9), 1495–1517 (2002)
A Phase Memory Controller for Isolated Intersection Traffic Signals
Nator J. C. da Costa(B) and José Everardo Bessa Maia
Universidade Estadual do Ceará – UECE, Ciência da Computação, 60714-903 Fortaleza, Ceará, Brazil
[email protected]
Abstract. This work introduces and evaluates a phase memory controller for isolated intersection traffic signals. At the end of each phase, the controller generates the next phase and its green time. Its goal is to simultaneously optimize the average and maximum queuing times. The distinguishing property of this controller is that it takes into account a memory of the last phases that took place to decide the next phase. The idea behind the phase memory is that decision making takes into account which phases have received green times in the recent past and that this information reinforces the optimization of the objective. Considering the green-time decision parameters independent of those of the phase decision when the queue lengths are given, the controller is optimized by a Diploid Differential Evolution algorithm. Although this controller has a smaller number of parameters to tune, its performance is comparable to that of other state-of-the-art controllers.
Keywords: Isolated intersection · Traffic controller · Differential evolution

1 Introduction
The quality of life in large metropolitan areas is strongly affected by the massive volume of circulating vehicles, which results in long traffic congestion with undesirable consequences for the economy and an impact on the natural environment via environmental pollution. The design of traffic controllers optimized to reduce congestion and its consequences is a hot research topic, with a lot of recent work being published [2,6,9]. Controlling traffic signals at an intersection is equivalent to defining the sequence and duration of the phases. Phases are each of the ways to allow vehicles to cross the intersection. At the end of each phase, the controller must decide the next phase and its duration. In general, traffic controllers are classified into three types based on the information and the decision mode used by the controller [17]. Fixed time controllers use a sequence and duration of preset phases, calculated as a function of traffic regularities. Actuated controllers regulate the sequence and duration of the phases based on detectors located on the network's traffic paths. Finally, adaptive traffic controllers include plan
selection or cooperation between local controllers to optimize the individual or collective performance of a group of intersection controllers. The optimization of the controller can be faced directly by the designer by trial and error or by applying techniques such as the design of experiments. This approach, while yielding good results, is time-consuming and costly, as the designer is subject to repeating this manual procedure each time a change in traffic dynamics is encountered. Therefore, techniques to automate the design of traffic controllers are valuable and the search for them is justified. Differential Evolution (DE) is an effective metaheuristic for optimization in continuous high-dimensional spaces, with applications in various domains: the design of neural network architectures [15], project scheduling [7], continuous controller optimization [12], image processing [11], and traffic controller optimization [4,5,16]. This work addresses the design of actuated traffic controllers, optimized by Differential Evolution (DE). DE is an alternative to gradient-following, search-based optimization and is suitable for non-convex optimization; it is known for its ability to avoid getting stuck in local minima. The superior performance of the controller is demonstrated when evaluated by simulation and compared with state-of-the-art techniques. In what follows, Sect. 2 reviews some references directly related to this work to contextualize the research. In Sect. 3 the controller design and the simulation method are detailed. Section 4 is about the experimental plan and results with performance comparison. Section 5 concludes by highlighting the main findings of the research.
2 Related Works
Research and publications on the design of optimized traffic signal controllers are extensive. To contextualize this paper, this section reviews some of these works. Traffic dynamics at an intersection are non-linear and stochastic, and although queue length is a discrete variable, traffic controllers based on queue length are often designed with a discrete or continuous input space. Model-based designs [1] or formulations of the design as a combinatorial optimization problem [7] are frequent. However, the focus here will be on some neural and fuzzy controllers [2,4] optimized by evolutionary algorithms. The paper [5] focuses on the automatic optimization of the passage of vehicles through intersections. The problem has been addressed by proposing three mechanisms to model any type of intersection, to calculate the roads with fewer points of conflict between their inputs and outputs, and to optimize the arrival rate of vehicles using a Genetic Algorithm to achieve the maximum performance of the intersection. The proposed systems achieve a throughput improvement between 9.21 and 36.98. The paper [9] is aimed at optimizing delay at isolated signalized intersections through the application of meta-heuristic search optimization methods. However, real-time traffic is usually heterogeneous, having non-linear, stochastic, and intricate characteristics. Thus, they proposed a couple of meta-heuristic-based methods,
including GA and DE (Differential Evolution), for efficient traffic control. Both GA and DE yielded rational signal timing plans. The results indicated that both methods reduced the average travel time delay, ranging from 15 to 35. The difference in the work [4] is that it proposes a controller based only on a presence sensor, such as a magnetic loop, located at a certain distance from the intersection, and not on queue length measurements, as is the case with the others. The authors show that this discrete sensing only slightly degrades performance while it has a strong impact on the controller's practical applicability. Araghi [2] develops the design of a traffic signal controller based on ANFIS (Adaptive-Network-Based Fuzzy Inference System). The controller takes the length of the queues at the end of each phase as input to decide both the next phase and its duration. The controller is optimized using a Genetic Algorithm. This paper is the main reference of this research and its results are taken as a benchmark for external comparison. Unlike the controller proposed here, in [2] the phase sequence is not fixed. In Sect. 4 more will be said about this work. The distinguishing property of the design proposed in this paper is the explicit consideration of a state memory of the past phases. Although neural networks, notably recurrent NNs, store an internal state, here a controller structure with phase memory and a linear decision function is imposed, reducing the degrees of freedom and speeding up training.
3 Methods

3.1 Controller Description
The structure of the proposed phase memory controller is shown in the block diagram of Fig. 1. The next phase is decided at the end of each phase in execution based on the lengths of the queues in all phases and the information stored in a memory of phases that stores the last Nf phases that happened.
Fig. 1. Block diagram of the controller with phase memory.
To decide the next phase, the controller calculates the Phase function given in Eq. 1 as a score for each phase. The controller chooses the lowest score phase as shown in Eq. 2:
phase\_function_j = \sum_{i=0}^{N_f - 1} \alpha_i \, q_i \; + \; \sum_{i=0}^{N_f - 1} \beta_i \, \mathbf{1}_i\big(sm(ph)\big), \qquad (1)

and

next\_phase = \arg\min_j \{\, phase\_function_j \,\}, \qquad (2)
where Nf is the number of phases of the intersection, αi and βi are tuning parameters of the controller, qi are the lengths of the queues in each phase, sm is a left-shift memory that stores the last Nf phases that occurred, 1i(sm(ph)) is an indicator function that indicates the occurrence of each of the phases in the phase memory, and j ∈ {0, 1, ..., Nf − 1} specifies a phase. In fact, Eq. 1 is understood as a discriminant function, so we call this a Discriminating Function Controller. It partitions the state space of the system into regions favorable to each of the next-phase decisions. On the other hand, the controller calculates the green time by Eq. 3 based on the lengths of the queues:

green\_time = \sum_{i=0}^{N_f - 1} \theta_i \, q_i, \qquad (3)
where θi are parameters of the controller and qi, as before, are the lengths of the queues. The reasoning behind this proposal is that, as the objective of the design is to minimize a combination of the average and maximum waiting times, a memory of the phases that happened recently allows the controller to obtain some estimate of the wait accumulated in the queues. A phase with a long queue may or may not consist of recent arrivals: if that phase was recently given passage, it is estimated that the wait accumulated in it is not large. Comparatively, the idea is that if there are two phases with long queues, the accumulated wait is estimated to be smaller in the phase that received the green most recently.
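To make the decision rules concrete, the following is a minimal Python sketch of Eqs. 1–3. It is not the authors' implementation: the class name, the fixed-size deque used for the left-shift memory sm, and in particular the re-indexing of the queues relative to the candidate phase j are assumptions made for illustration; the weight vectors alpha, beta and theta are the quantities tuned by the optimizer.

```python
from collections import deque

class PhaseMemoryController:
    """Sketch of the decision rules in Eqs. 1-3 (names and indexing are illustrative)."""

    def __init__(self, alpha, beta, theta):
        self.n_phases = len(alpha)                 # Nf
        self.alpha = alpha                         # queue-length weights (Eq. 1)
        self.beta = beta                           # phase-memory weights (Eq. 1)
        self.theta = theta                         # green-time weights (Eq. 3)
        # left-shift memory sm holding the last Nf phases that received green
        self.memory = deque([0] * self.n_phases, maxlen=self.n_phases)

    def phase_function(self, j, queues):
        """Score of candidate phase j (Eq. 1); queues re-indexed relative to j (assumption)."""
        score = sum(self.alpha[i] * queues[(j + i) % self.n_phases]
                    for i in range(self.n_phases))
        score += sum(self.beta[i] * (1.0 if self.memory[i] == j else 0.0)
                     for i in range(self.n_phases))
        return score

    def next_phase(self, queues):
        """Choose the lowest-scoring phase (Eq. 2) and push it into the memory."""
        chosen = min(range(self.n_phases),
                     key=lambda j: self.phase_function(j, queues))
        self.memory.append(chosen)
        return chosen

    def green_time(self, queues):
        """Green time as a weighted sum of the queue lengths (Eq. 3)."""
        return sum(t * q for t, q in zip(self.theta, queues))
```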
3.2 Diploid Differential Evolution
Diploid algorithms are a variant of evolutionary algorithms in which chromosomes are composed of two types of segments that are evolved independently, in the sense that evolutionary operators do not cross segments of different types. This partitions the search space and accelerates convergence [8]. Differential evolution (DE) was introduced by Storn and Price [14] for global optimization problems in a continuous search domain, and it is a powerful population-based trial-and-error method for tackling complex optimization problems. DE has three control search parameters: NP, the population size, is the number of population members; F, the mutation control parameter, is a real and constant factor that controls the amplification of the differential variation; and CR, the crossover control parameter, is a real and constant factor that
controls which parameters contribute to the trial vector in the crossover operation. In this work the trial vector generation strategy of the canonical DE algorithm DE/rand/1/bin was used, where rand indicates that the base vector of the mutation strategy is chosen randomly, 1 indicates the number of difference pairs used in the mutation equation, and bin indicates the binomial crossover. The DE procedure, shown in Algorithm 1 [3], is composed of four parts: initialization, mutation, crossover and selection. Countless published works suggest choosing the parameters of DE by these rules [13]: F ∈ [0.5, 1], CR ∈ [0.8, 1] and NP = 10 × D, where D is the dimensionality of the problem. In this implementation, F = 0.6, CR = 0.9 and NP = 8 × D = 96 were used. The fitness function used in line 14 of Algorithm 1, which is evaluated by simulation within the Differential Evolution loop, was:

fitness = 10λ × awt + (1 − λ) × mwt + aql  (4)
where aql stands for the average queue length, awt stands for average wait time, mwt stands for maximum wait time and λ is used to regulate the relative importance between awt and mwt in the optimization process.
Algorithm 1. Canonical Differential Evolution algorithm DE/rand/1/bin [3].
1: x_i ← U(x_min, x_max), i = 1, ..., M.                     (Initialization)
2: while termination criteria are not met do
3:   for i = 1 → M do
4:     Choose r1 ≠ r2 ≠ r3 ≠ i ∈ [1..M] u.a.r.
5:     v_i ← x_r1 + F(x_r2 − x_r3)                           (Mutation)
6:     Choose j_rand ∈ [1..n] u.a.r.
7:     for j = 1 → n do
8:       if U(0, 1) ≤ Cr or j = j_rand then                  (Crossover)
9:         x′_ij ← v_ij
10:      else
11:        x′_ij ← x_ij
12:      end if
13:    end for
14:    if f(x′_i) < f(x_i) then                              (Selection)
15:      x_i ← x′_i
16:    end if
17:  end for
18: end while
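A compact Python sketch of the DE/rand/1/bin loop of Algorithm 1 is shown below. It is a generic implementation under the stated parameter choices, not the authors' code; `objective` stands for the simulation-based fitness of Eq. 4, which in the paper is evaluated by running the traffic simulation for each candidate parameter vector.

```python
import numpy as np

def fitness_eq4(awt, mwt, aql, lam):
    # Eq. 4 as printed in the text
    return 10 * lam * awt + (1 - lam) * mwt + aql

def de_rand_1_bin(objective, bounds, np_size=96, F=0.6, CR=0.9, max_gen=200, seed=0):
    """Canonical DE/rand/1/bin (Storn & Price); a sketch, not the authors' code."""
    rng = np.random.default_rng(seed)
    low, high = np.asarray(bounds, dtype=float).T     # bounds: list of (min, max) per dimension
    dim = len(bounds)
    pop = rng.uniform(low, high, size=(np_size, dim))  # initialization
    fit = np.array([objective(x) for x in pop])
    for _ in range(max_gen):
        for i in range(np_size):
            # mutation: three distinct indices, all different from i
            r1, r2, r3 = rng.choice([k for k in range(np_size) if k != i], 3, replace=False)
            v = pop[r1] + F * (pop[r2] - pop[r3])
            # binomial crossover with a guaranteed coordinate j_rand
            jrand = rng.integers(dim)
            cross = rng.random(dim) <= CR
            cross[jrand] = True
            u = np.clip(np.where(cross, v, pop[i]), low, high)
            # greedy selection (minimization)
            fu = objective(u)
            if fu < fit[i]:
                pop[i], fit[i] = u, fu
    best = int(np.argmin(fit))
    return pop[best], fit[best]
```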
4 Results
In order to perform the tests, a simplified traffic simulator driven by discrete time with a time pulse equal to 0.1 s was programmed. The simulator abstracts some physical properties of the phenomenon, such as acceleration and braking, but it has been calibrated and validated, reproducing approximate results of the average queuing times of some published works. A vehicle start-up time of 2 s has been included in the model. The simulator works as a state machine with transitions every 0.1 s. The state vector of the intersection includes the length of the queues, the phase of the signal and its elapsed time, and the vehicle entity with its attributes. The state trajectory log file produced during the simulation is stored and then used at the end of the run to calculate the performance index statistics. Using the classification given in [18], this simulator is in the mesoscopic traffic simulator category. Two traffic patterns were simulated: balanced traffic, in which the total flow of vehicles is equally distributed between the phases, and unbalanced traffic, in which 70% of the total traffic is concentrated in one of the phases and 10% in each of the others. The simulated experimental setup, an intersection with four phases, is represented in Fig. 2. Figure 2a is a graphic sketch of the intersection and the phases are those represented in Fig. 2b. In any of the traffic lanes the vehicle can move forward or turn left or right. In all experiments, traffic is a Poisson arrival process.
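A skeletal Python sketch of this kind of discrete-time loop is given below, reusing the controller interface sketched in Sect. 3.1. The 0.1 s pulse, the 2 s start-up time and the Poisson arrivals follow the text; the saturation headway and all structural choices are simplifying assumptions, not the authors' simulator.

```python
import random

DT = 0.1            # simulation pulse (s)
STARTUP = 2.0       # vehicle start-up time (s)
HEADWAY = 2.0       # assumed saturation headway between departures (s)

def simulate(controller, arrival_rates, horizon_s=3600, n_phases=4):
    """Very simplified intersection loop: per-phase queues with Poisson arrivals."""
    queues = [[] for _ in range(n_phases)]     # arrival timestamps per phase
    waits, t, phase = [], 0.0, 0
    green_left = controller.green_time([0] * n_phases)
    depart_timer = STARTUP
    while t < horizon_s:
        # Poisson arrivals: probability of one arrival per phase in a 0.1 s step
        for p, rate in enumerate(arrival_rates):        # rate in veh/s
            if random.random() < rate * DT:
                queues[p].append(t)
        # discharge the green phase, one vehicle per headway after start-up
        depart_timer -= DT
        if queues[phase] and depart_timer <= 0:
            waits.append(t - queues[phase].pop(0))
            depart_timer = HEADWAY
        # phase-change decision at the end of the green time
        green_left -= DT
        if green_left <= 0:
            lengths = [len(q) for q in queues]
            phase = controller.next_phase(lengths)
            green_left = controller.green_time(lengths)
            depart_timer = STARTUP
        t += DT
    # return awt and mwt for the run
    return (sum(waits) / len(waits), max(waits)) if waits else (0.0, 0.0)
```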
Fig. 2. Experimental setup for performance evaluation.
4.1 Controller Performance
A common point in the works reviewed in Sect. 2 is that they all focus their performance evaluation on the average waiting time (awt). AWT alone may not be a good criterion if the variability is high. One of the objectives in this work is to include the maximum waiting time (mwt) in the optimization criterion. Figure 3 shows the performance results of the Phase Memory Controller when λ is varied in the fitness function.
Figure 3a is for AWT and Fig. 3b is for MWT. The data in Fig. 3 show two things: 1. The DE algorithm effectively finds a controller configuration for which awt is optimized without compromising mwt. 2. The effect of λ is relatively small, meaning that as long as awt and mwt are both in the fitness function (FF), the DE finds a good setting. Tables 2 and 3 show the complete data of the performed experiments, and the graphs in Fig. 5 show the typical convergence of the DE algorithm for two executions.
(a) awt versus λ
(b) mwt versus λ
Fig. 3. Performance graphics of the phase memory controller when lambda is varied.
4.2 Comparison with Other Works
The difficulty in comparing performance with other works lies in the heterogeneity of the experimental settings used in the different works. Therefore, depending on the setting, Fig. 4 presents the results for only some of the controllers. The numerical data for Fig. 4 are shown in Table 1. For comparison, variations are made in two dimensions of the tests: traffic intensity and balance. The performance measure is the average waiting time. The works used in the comparison are [2] and [4]. More about these works can be read in Sect. 2. Figure 4a compares the performance for balanced traffic. The columns for traffic of 1600 vehicles/h show that the proposed DE-pm controller surpasses the others: bd-ga is a controller based on a discrete sensor and ql-ga is a controller based on queue length but without phase memory [4]. ANFIS [2] is a Neuro-Fuzzy controller optimized by GA. The columns for 800 vehicles/h and 1200 vehicles/h are shown for completeness. Figure 4b is for unbalanced traffic. The DE-pm controller is also better than the others for 1600 vehicles/h. Note that there are not two separately trained controllers: a single controller is trained by DE with the FF of Eq. 4.
(a) Average queue time, balanced traffic.
(b) Average queue time, unbalanced traffic.
Fig. 4. Graphics for performance comparison between controllers.
Table 1. The average queue time data used for Fig. 4.

Traffic       bd-ga   ql-ga   pm-de   ANFIS [2]
bal-800       10.51   7.13    5.89    –
bal-1200      10.59   8.18    7.03    4.87
bal-1600      11.21   9.51    7.89    –
unbal-800     12.16   4.67    3.64    –
unbal-1200    13.70   5.96    4.63    –
unbal-1600    15.23   7.35    5.70    –
Table 2. Maximum waiting times separated by λ, pm-de.

Traffic       λ=0.0   0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9     1.0
bal-800       28.57   30.68   29.95   32.35   33.49   29.87   32.56   38.37   39.97   40.67   48.25
bal-1200      29.0    28.96   29.52   28.97   35.68   41.57   42.76   47.4    43.77   55.05   55.76
bal-1600      29.41   29.48   29.46   29.41   35.96   43.62   43.29   46.02   48.98   49.8    72.92
unbal-800     28.78   28.46   27.64   28.87   28.8    28.57   28.77   36.42   46.04   42.5    52.21
unbal-1200    28.84   28.99   28.98   29.3    32.32   32.87   39.33   38.33   41.22   44.38   61.25
unbal-1600    31.12   30.21   31.07   31.42   35.68   42.87   43.62   46.56   44.97   49.77   107.25
Table 3. Average waiting times separated by λ, pm-de.

Traffic       λ=0.0   0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9     1.0
bal-800       7.37    6.4     6.24    6.4     6.01    6.18    6.06    6.08    6.01    5.96    5.9
bal-1200      8.23    8.06    8.02    8.01    7.61    7.27    7.15    7.19    7.04    7.25    7.05
bal-1600      8.93    8.82    8.79    8.82    8.56    8.11    7.98    8.02    7.98    7.89    7.89
unbal-800     6.8     5.21    5.15    5.13    4.73    4.9     4.6     3.86    3.77    3.9     3.65
unbal-1200    7.92    7.59    7.65    7.51    6.17    5.69    5.17    5.21    5.25    4.63    4.65
unbal-1600    9.61    9.33    9.42    8.96    7.77    6.5     6.5     6.21    6.19    6.1     5.7

(a) λ = 0.0
(b) λ = 0.4
Fig. 5. Speed of convergence of the DE algorithm for two λ values, total traffic = 1600 vehicles/h.
5 Conclusion
A controller was designed to optimize a combination of the average and maximum queuing times. The distinguishing feature of this design is the imposition of a phase memory structure on the controller, reducing the number of free parameters. The idea behind the phase memory is that decision making takes into account which phases have received green times in the recent past and that this information reinforces the optimization of the objective. The design parameters were optimized by Differential Evolution. The comparative performance tests using simulation found that this design is competitive with the state of the art when performance is measured as quality of service (average and maximum waiting times in line), and that this is achieved with a smaller number of parameters to be optimized. The performance of the controller far exceeds that of pre-timed controllers. In continuation, this research will seek to apply this design principle to Area Traffic Control, in which multiple interdependent intersections must be coordinated. It is expected that the reduced number of degrees of freedom of this controller will be an advantage for optimization in this larger-scale application [10].
References
1. Ahmed, A., Naqvi, S.A.A., Watling, D., Ngoduy, D.: Real-time dynamic traffic control based on traffic-state estimation. Transp. Res. Record 2673(5), 584–595 (2019)
2. Araghi, S., Khosravi, A., Creighton, D.C.: ANFIS traffic signal controller for an isolated intersection. In: IJCCI (FCTA), pp. 175–180 (2014)
3. Boks, R., Wang, H., Bäck, T.: A modular hybridization of particle swarm optimization and differential evolution. In: Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion, pp. 1418–1425 (2020)
4. Costa, N.J.C., Maia, J.E.B.: An intersection traffic signal controller optimized by a genetic algorithm. Int. J. Comput. Appl. 176(40), 9–13 (2020)
5. Cruz-Piris, L., Lopez-Carmona, M.A., Marsa-Maestre, I.: Automated optimization of intersections using a genetic algorithm. IEEE Access 7, 15452–15468 (2019)
6. Genders, W., Razavi, S.: An open-source framework for adaptive traffic signal control. arXiv preprint arXiv:1909.00395 (2019)
7. George, T., Amudha, T.: Genetic algorithm based multi-objective optimization framework to solve traveling salesman problem. In: Advances in Computing and Intelligent Systems, pp. 141–151. Springer (2020). https://doi.org/10.1007/978-981-15-0222-4_12
8. Hu, F., Wu, F.: Diploid hybrid particle swarm optimization with differential evolution for open vehicle routing problem. In: 2010 8th World Congress on Intelligent Control and Automation, pp. 2692–2697. IEEE (2010)
9. Jamal, A., Rahman, M.T., Al-Ahmadi, H.M., Ullah, I.M., Zahid, M.: Intelligent intersection control for delay optimization: using meta-heuristic search algorithms. Sustainability 12(5), 1896 (2020)
10. Jovanović, A., Nikolić, M., Teodorović, D.: Area-wide urban traffic control: a bee colony optimization approach. Transp. Res. Part C: Emerg. Technol. 77, 329–350 (2017)
11. Mirjalili, S., Dong, J.S., Sadiq, A.S., Faris, H.: Genetic algorithm: theory, literature review, and application in image reconstruction. In: Nature-Inspired Optimizers, pp. 69–85. Springer (2020). https://doi.org/10.1007/978-3-030-12127-3_5
12. Ochoa, P., Castillo, O., Soria, J.: Optimization of fuzzy controller design using a differential evolution algorithm with dynamic parameter adaptation based on type-1 and interval type-2 fuzzy systems. Soft. Comput. 24(1), 193–214 (2020)
13. Pant, M., Zaheer, H., Garcia-Hernandez, L., Abraham, A., et al.: Differential evolution: a review of more than two decades of research. Eng. Appl. Artif. Intell. 90, 1–24 (2020). Article ID 103479
14. Storn, R., Price, K.: Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optim. 11(4), 341–359 (1997)
15. Sun, Y., Xue, B., Zhang, M., Yen, G.G., Lv, J.: Automatically designing CNN architectures using the genetic algorithm for image classification. IEEE Trans. Cybern. (2020)
16. Teodorovic, D., Lucic, P., Popovic, J., Kikuchi, S., Stanic, B.: Intelligent isolated intersection. In: 10th IEEE International Conference on Fuzzy Systems (Cat. No. 01CH37297), vol. 1, pp. 276–279. IEEE (2001)
17. Zhao, D., Dai, Y., Zhang, Z.: Computational intelligence in urban traffic signal control: a survey. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 42(4), 485–494 (2011)
18. Zhou, X., Taylor, J.: DTAlite: a queue-based mesoscopic traffic simulator for fast model evaluation and calibration. Cogent Eng. 1(1), 1–19 (2014). Article ID 961345
Multiple Face Recognition Using Self-adaptive Differential Evolution and ORB
Guilherme Costa, Rafael Stubs Parpinelli, and Chidambaram Chidambaram(B)
Universidade do Estado de Santa Catarina, Joinville, SC, Brazil
{rafael.parpinelli,chidambaram}@udesc.br
Abstract. Face recognition (FR) applications have been intensively studied in the area of Computer Vision, especially with the wide use of biometrics as a method of security access. Working on still images with multiple faces (SIMF) is still a challenge due to the diversity of features that may be present in the images and the different conditions under which the images are usually captured. In addition, FR algorithms may not perform well under uncontrolled conditions due to illumination variation and other related image conditions. Hence, robust approaches should be developed to perform well under complex conditions. Based on this context, in the present work we propose a new approach for recognizing faces in SIMF obtained under different conditions, using an optimization algorithm combined with a feature extraction method based on interest points (IP). Our algorithm is built on the Self-Adaptive Differential Evolution (SaDE) algorithm and ORB, a well-known keypoint detector and descriptor. A specific base of SIMF obtained under different illumination conditions, rotation, scale, and occlusion is used to validate the algorithm. The recognition process is conducted with the aid of the optimization algorithm, which searches for the best possible match of a template face image (TFI) in the SIMF. During the search process, numerous target cut images (TCI) from the SIMF are evaluated using a descriptor distance measure. According to the experimental results, it can be observed that the present approach is suitable for real-world FR applications with complex conditions.
Keywords: Differential evolution · Optimization algorithm · ORB · Multiple faces image recognition

1 Introduction
Face recognition (FR) has become one of the most important applications in many real-world cases such as police investigations, entrance to corporate buildings, airport security and terrorist identification. Among biometric recognition systems, FR is an important research problem that covers several fields of application and requires facial detection and robust recognition algorithms. (Supported by LABICOM – Laboratório de Pesquisa em Inteligência Computacional.) This topic is still
considered a research problem because it requires intensive computation and has to deal with complex conditions in different types of images. The rapid development of FR systems is due to the presence of new low-cost technologies, which include surveillance equipment and software tools. Based on this context, and aiming to contribute with different approaches, we attempted to use optimization algorithms with bio-inspired concepts. FR applications require the use of optimization algorithms because the image conditions involve different variables such as illumination variation, scale, occlusion, rotation and others. In order to construct robust algorithms, these variables should be optimized to find the ideal parameter values. Similar works can be found in the literature, for example, using PSO and DE [5,15]. Hence, in many works, face detection and recognition algorithms have been developed using either evolutionary or swarm intelligence approaches [2,4,9]. Mostly, in this kind of approach, the objective is to construct a well-defined robust algorithm using computational intelligence in order to reduce the search space and the number of computations [9,11]. In addition to this, in FR applications, relevant and robust features should be extracted, overcoming the issues caused by image conditions. Hence, to represent faces with reduced dimensions, the extraction of invariant features requires efficient methods. In recent decades, though several approaches have become available in the image processing literature, many research works were published using interest point (IP) detectors and descriptors in FR [4,7], image matching [10] and traffic sign recognition [6] applications. In [2], the DE algorithm was used to choose a set of image processing and feature extraction techniques by optimizing their parameter values. Likewise, the authors obtained promising results in the work [4] using SURF and the Improved ABC (Artificial Bee Colony) algorithm on still images with multiple faces (SIMF) obtained under different conditions. In that work, the threshold values and parameters of SURF were optimized. On the other hand, Gutoski [6] worked with traffic sign recognition in which the image features were extracted using SIFT and SURF. In addition to these works, another IP detector and descriptor, Oriented FAST and Rotated BRIEF (ORB), has been widely used in many recognition applications [7,10,13]. During the FR process, the extraction of invariant features and the detection of faces in images using optimal parameters become essential to construct robust algorithms. The descriptors obtained from the IP provide local features invariant to illumination, rotation, scale and other image variations [4]. As stated, the main objective of this work is to propose a robust novel approach using the Self-Adaptive Differential Evolution (SaDE) algorithm [16] to identify faces in SIMF, based on discriminative features generated by ORB. Although there are many ways to identify faces in still images, we believe that computational approaches based on the combination of evolutionary algorithms and IP detectors can certainly yield accurate and expected results through an iterative search and matching process. In our FR approach, several template face images (TFI) will be searched for a best match in the SIMF. During the search process, numerous target cut images (TCI) from the SIMF will be matched with the TFI to
identify the most similar match. Based on this, our FR approach can certainly be treated as an optimization problem, since the search space given the image dimensions involved is large enough that it cannot be solved through exhaustive search. In order to evaluate the proposed approach, we conduct several computational experiments with real SIMF obtained under different conditions such as illumination variation, rotation, scale, noise and blur. Mostly, FR approaches tackle single-face images using traditional databases. Here, we attempt to work with images with multiple faces in order to develop a novel approach using SaDE and ORB. No similar work was found in our literature review. The remainder of the paper is organized as follows: in Sect. 2, we discuss ORB and SaDE. In Sect. 3, the proposed FR approach is explained. The experiments and results are discussed in Sect. 4. Finally, conclusions and future directions are pointed out in the last section.
2 Background
In order to evaluate the FR power of SaDE and ORB, the present novel approach is developed using SIMF obtained under different conditions. This FR study is formulated as an optimization problem because the optimal values for many image parameters have to be determined [15]. The SaDE algorithm and ORB are explained in the following sections.

2.1 Feature Extraction Using ORB
In recent decades, in many computer vision applications, image features have been identified using IP that represent local image features. According to the characteristics of IP detectors, the features extracted from these regions are mostly invariant to image scaling, rotation and illumination [4]. In addition to the best-known detectors, SIFT and SURF, ORB has become an alternative technique to extract features in many FR applications [6,12]. Most IP detectors generate descriptor vectors that contain relevant information about the neighborhood of the IP. During the FR process, the invariant features extracted using ORB can be used to determine the similarity between the template image and a TCI (a possible solution). ORB is a very fast binary descriptor based on BRIEF, which is invariant to rotation and resistant to noise [12]. ORB is implemented with improved versions of the BRIEF (Binary Robust Independent Elementary Features) descriptor and the FAST (Features from Accelerated Segment Test) detector. These techniques are included because of their performance and low cost. Furthermore, both are invariant to illumination, blur and rotation [6]. In ORB, BRIEF is named rBRIEF (rotated BRIEF), which is transformed to become more invariant to rotation by means of learning steps [12]. The detection of IP in a SIMF is shown in Fig. 1, in which it can be observed that only non-uniform regions are marked.
Fig. 1. Detection of interest points with ORB in still image with multiple faces
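A short Python/OpenCV sketch of the kind of ORB keypoint detection illustrated in Fig. 1 is shown below; the file names and the number of features are placeholders, not values from the paper.

```python
import cv2

# load the still image with multiple faces (path is a placeholder)
simf = cv2.imread("simf.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)                 # ORB detector/descriptor
keypoints, descriptors = orb.detectAndCompute(simf, None)

# draw the detected interest points; only non-uniform regions are marked
vis = cv2.drawKeypoints(simf, keypoints, None, color=(0, 255, 0))
cv2.imwrite("simf_orb_keypoints.jpg", vis)
print(f"{len(keypoints)} interest points, descriptor shape: {descriptors.shape}")
```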
2.2 Self-adaptive Differential Evolution Algorithm (SaDE)
The differential evolution (DE) algorithm, proposed by Storn and Price [14], is a simple but powerful population-based stochastic search method for solving many optimization problems. Its effectiveness and efficiency have been successfully demonstrated in many applications [2,16]. The original DE has three user-defined parameters which are fixed during the optimization process: the mutation factor (F), the crossover control parameter (CR) and the population size (NP). Though DE is considered a simple yet powerful optimization algorithm, it requires a time-consuming process of manually tuning and testing the evolutionary parameters prior to the optimization. Hence the need for a self-adaptation strategy to adjust the two evolutionary parameters (F and CR) automatically and dynamically for any general class of problems. The efficiency and robustness of the DE algorithm depend on the setting of these two parameters, which are the most sensitive [1,16]. Based on this context, the SaDE algorithm uses a self-adapting mechanism on the control parameters F and CR. The third parameter, NP, remains user-specified. The learning strategy and the control parameter CR are self-adapted by using previous learning experience. Since CR plays a key role in the DE algorithm, a wrong choice may deteriorate the performance under any learning strategy. Hence, the SaDE algorithm adapts the CR value using a learning strategy that accumulates previous experience. Under these conditions, SaDE can demonstrate good performance on uni-modal and multi-modal problems [1]. In this work, SaDE is used to optimize the four-dimensional parameter vector involved in the search for similar faces in still images. The strong motivation for using this algorithm comes from previous successful works developed using DE-based algorithms [8,15] and the FR application for single faces developed by [3]. The behavior of SaDE is shown as pseudo-code in Fig. 2.
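As a rough illustration of the self-adaptation idea, the sketch below follows the CR adaptation scheme described in [1]: CR values are drawn from a normal distribution around a running mean, and the mean is periodically re-estimated from the CR values that produced successful trial vectors. The constants (σ = 0.1, refresh period) and all names are illustrative, not taken from this paper's implementation.

```python
import random

class CRAdapter:
    """Sketch of SaDE-style crossover-rate adaptation (after Qin & Suganthan [1])."""

    def __init__(self, cr_mean=0.5, sigma=0.1, refresh_every=25):
        self.cr_mean = cr_mean            # running mean CRm
        self.sigma = sigma                # spread of the sampling distribution
        self.refresh_every = refresh_every
        self.successful_crs = []
        self.generation = 0

    def sample(self):
        # draw a CR for one individual, clipped to [0, 1]
        return min(1.0, max(0.0, random.gauss(self.cr_mean, self.sigma)))

    def report(self, cr, improved):
        # remember CR values that generated better trial vectors
        if improved:
            self.successful_crs.append(cr)

    def end_generation(self):
        self.generation += 1
        if self.generation % self.refresh_every == 0 and self.successful_crs:
            self.cr_mean = sum(self.successful_crs) / len(self.successful_crs)
            self.successful_crs.clear()
```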
3 Proposed Face Recognition Approach
The general view of the proposed FR approach can be seen in Fig. 3. The TFI, as an input, is searched in the SIMF through an iterative optimization process
Fig. 2. Pseudo-code of SaDE algorithm
with aid of SaDE algorithm. During the search process, according to the size of the population, simultaneously at each iteration, a patch of TCI, using the parameters of each individual of the population, is cut from the still image and their features are extracted using ORB. Using the descriptors, the matching and recognition process is conducted by calculating the similarity between TFI and TCI. As in any optimization algorithm, the entire search process is repeated for a given number of iterations to find out the most similar face. As proposed, the entire FR process is developed based on the following two steps: (1) definition of an image region for feature extraction in the SIMF by the SaDE algorithm, and feature extraction that includes IP detection and descriptors extraction from the image patch using ORB; (2) identification of matched IP between a TFI and a TCI from the still image using descriptor distance measure. The target cut image is represented by 4-tuple transformation parameters: central coordinates (x and y), scale factor and rotation angle of an image. These four parameters should be optimized. The search space is limited by the minimum and maximum values for each dimension of the vector. Using a four-dimensional vector for each individual of the population generated by the SaDE algorithm, an image patch
Fig. 3. Algorithm overview of the proposed FR approach
is cut from the SIMF, from which the descriptor vectors are retrieved for matching. During the matching, the corresponding IP between images are identified using the Euclidean distance measure and a descriptor distance threshold (discussed in the next section), which determines whether a match of two IP can be considered valid. The maximum number of matched IP determines whether an individual of the population (TCI) is similar to the TFI among the many comparisons made during the search process. At the end of each iterative search process, the best match with maximum fitness (similarity) is considered the recognized face image.
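A hedged Python/OpenCV sketch of this fitness evaluation is given below: a candidate 4-tuple (x, y, scale, angle) cuts a patch (TCI) from the SIMF, ORB descriptors are extracted, and the number of descriptor pairs whose Euclidean distance stays below the threshold is the similarity to be maximized. The helper names, the base patch size and the way scale and rotation are applied are assumptions; only the distance measure and the threshold value of 29 come from the paper.

```python
import cv2
import numpy as np

DIST_THRESHOLD = 29                      # descriptor distance threshold found in Sect. 4
orb = cv2.ORB_create()

def cut_tci(simf, x, y, scale, angle, base_size=(200, 260)):
    """Cut a rotated/scaled patch centred at (x, y); the base patch size is an assumption."""
    w, h = int(base_size[0] * scale), int(base_size[1] * scale)
    M = cv2.getRotationMatrix2D((x, y), np.degrees(angle), 1.0)
    rotated = cv2.warpAffine(simf, M, (simf.shape[1], simf.shape[0]))
    x0, y0 = max(0, int(x - w // 2)), max(0, int(y - h // 2))
    return rotated[y0:y0 + h, x0:x0 + w]

def fitness(simf, tfi_descriptors, candidate):
    """Number of valid matches between the TFI and the candidate TCI (to maximize)."""
    tci = cut_tci(simf, *candidate)
    _, tci_descriptors = orb.detectAndCompute(tci, None)
    if tci_descriptors is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_L2)             # Euclidean distance, as in the paper
    matches = matcher.match(tfi_descriptors.astype(np.float32),
                            tci_descriptors.astype(np.float32))
    return sum(1 for m in matches if m.distance <= DIST_THRESHOLD)
```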
4 Experiments and Results
In the following sections, the information regarding the image database, the threshold analysis and the experiments with different categories of images is discussed. All information regarding the database with SIMF and TFI has been obtained from the previous work [4]. The resolution of the SIMF is 2592 × 1944 pixels. The template face images have varying resolutions, between 150 and 270 pixels in width and 190 and 330 pixels in height. The main objective of this approach is to conduct the experiments under conditions as similar as possible to real-world conditions. Hence, eleven different categories of SIMF with different conditions were used in the experiments. Still images were captured under three main illumination conditions: using a specific lighting system with two lights, denominated ILLUM I; using a specific lighting system with one light plus room lights (fluorescent lamps), denominated ILLUM II; and using room lights (fluorescent lamps) only, denominated ILLUM III. Using these three illumination conditions, other images with the head tilted (Rotation) and with face occlusion were acquired, as shown in Fig. 4a.
(a) Still images with multiple faces
(b) Template face images.
Fig. 4. Sample images of SIMF and TFI
Images with changes in scale and noise (blur and color noise) were generated by an image editor, in two levels for each category. For the scale category, the size of the images was reduced to 95% (Scale I) and enlarged to 105% (Scale II). Likewise, two different noise levels were applied to the images (Blur I, Blur II, Color Noise I, and Color Noise II). Figure 4b shows all the TFI used in this work. There are 20 individuals, who are present in different ways in the still images, in order to guarantee greater diversity. In addition, according to the work [4], the TFI were obtained separately under illumination condition I and, hence, all those images are different from the SIMF. The threshold analysis is an important step of this work. It regulates the behavior of the entire algorithm by choosing the optimal descriptor distance threshold that should be used to determine whether two IP can be considered a match. In other words, the matching of two IP is valid only when the distance obtained by comparing the two descriptors associated with the IP respects the descriptor distance threshold. In Fig. 5a, we present an example of matching points where the correspondence is established using the distance measure.
(a) Matching of IP using ORB in TCI (left) and TFI (right)
(b) Experimental analysis to determine the descriptor distance threshold.
Fig. 5. Demonstration of matching IP and threshold analysis
Based on this discussion, we have to decide through this analysis which threshold works best with the database so that the maximum FR rate can be obtained. The threshold is a value used to accept or discard a match between the descriptors of two IP based on the Euclidean distance. In this context, an experiment using randomly selected images from the base was conducted in order to identify the descriptor distance threshold. Figure 5b presents the recognition rates obtained by varying the value of the threshold. In this scenario, as shown in Fig. 5b, we can confirm that the distance threshold with the maximum recognition rate is 29. It is important to emphasize that this value is valid for the image base used in this work. The best value (29) will be used as the descriptor distance threshold in all experiments to determine whether two IP can be considered a valid match.
The experiments were run on a computer with a 2.2 GHz Quad-Core Intel Core i7 processor, 16 GB of RAM and Intel Iris Pro 1536 MB graphics, running macOS Catalina (64-bit). All image processing functions are implemented in the Python 3 programming language with libraries from OpenCV. The search process is programmed for 40 iterations and 10 runs for each single face present in a SIMF. The population size is set to 30 in the SaDE algorithm. The search space range is limited using the following parameter values for the four dimensions: the central coordinates of an image, x = [0..n] (columns) and y = [0..m] (rows), scale = [0.5, 1.50], and angle = [−π/3, +π/3] (rotation between −60° and +60°). In total, eleven different categories of experiments were done using the SaDE and ORB algorithms. In each category, one type of image condition is tested to check the real power of ORB in the FR process. All experiments and results are shown in Table 1. The same table also presents the recognition rates obtained using iABC and SURF from the previous work [4].

Table 1. Comparison of recognition rates between iABC with SURF and SaDE with ORB

Experiment type    iABC + SURF    SaDE + ORB
ILLUM I            81.17%         96.25%
ILLUM II           67.33%         75.00%
ILLUM III          31.23%         42.00%
Rotation           51.92%         90.00%
Occlusion          85.33%         90.00%
Scale I            85.67%         100.00%
Scale II           89.00%         100.00%
Blur I             68.67%         100.00%
Blur II            47.67%         100.00%
Color Noise I      85.33%         100.00%
Color Noise II     85.00%         100.00%
The validation of true positive images is done automatically during the experiment using the real coordinates of the face images in the SIMF. The coordinate values of all face images present in the SIMF are manually mapped to validate the recognition process, i.e., whether the TCI identified as the best match by the algorithm is the same as the TFI. In addition, the convergence and divergence graphs (shown in Fig. 6a and Fig. 6b) are presented, which denote the behavior of SaDE throughout its execution. It can be observed that both the convergence and divergence behavior during the FR process are normal, as in the literature. As shown in Table 1, among the three illumination conditions, ILLUM III corresponds to the worst performance, with an abrupt decrease in the recognition rate.
Fig. 6. Convergence and divergence behavior in a single test.
This points out the strong influence of illumination variation under uncontrolled conditions. This problem is not present in ILLUM I and II, since those images were obtained under controlled illumination. Even so, the results of the present work are higher than those of the previous work developed using iABC and SURF [4]. In addition, we can see that the combination of SaDE and ORB has a remarkable recognition rate for image variations such as rotation, scale, noise and blur, since ORB is considered a powerful and robust feature extractor compared to other algorithms [12]. Furthermore, the SaDE algorithm is a more optimized version of DE [2] and thus increases the speed of convergence towards the best result in combination with the decrease in the diversity of the population.
5 Conclusions and Future Directions
In this work we proposed a new approach using SaDE as the optimization algorithm and ORB as the feature extractor. According to the results obtained from the experiments, we can conclude that the present approach is suitable for FR applications with multiple-face images. The results obtained in this work are higher, in all categories, than those of the other related work [4]. The proposed approach has a considerably higher recognition rate for all types of images, mainly for images with variations such as rotation, occlusion, scale and image noise. Based on the results, the discriminative power of ORB compared to SURF can be observed. The results for the three illumination conditions indicate the need for specific illumination compensation techniques to improve the FR rates. As future work, the same approach can be extended to different categories of images, such as lateral images, different facial expressions and poor image conditions. Furthermore, to increase the low recognition rates obtained in some categories, new approaches can be developed by combining the method with a pool of well-known techniques. Illumination and noise compensation techniques can be considered as a complementary approach to construct a more robust algorithm for all conditions.
References
1. Qin, A.K., Suganthan, N.: Self-adaptive differential evolution algorithm for numerical optimization. In: IEEE Congress on Evolutionary Computation, vol. 2, pp. 1785–1791 (2005)
2. Plichoski, G.F., Chidambaram, C., Parpinelli, R.S.: An adjustable face recognition system for illumination compensation based on differential evolution. In: 2018 XLIV Latin American Computer Conference (CLEI), pp. 234–241 (2018)
3. Plichoski, G.F., Chidambaram, C., Parpinelli, R.S.: A face recognition framework using self-adaptive differential evolution. Learn. Nonlinear Models 17, 4–14 (2019)
4. Chidambaram, C., Neto, H.V., Dorini, L.E.B., Lopes, H.S.: Multiple face recognition using local features and swarm intelligence. IEICE Trans. Inf. Syst. 97(6), 1614–1623 (2014)
5. Sánchez, D., Melin, P., Castillo, O.: Comparison of particle swarm optimization variants with fuzzy dynamic parameter adaptation for modular granular neural networks for human recognition. J. Intell. Fuzzy Syst. 38, 3229–3252 (2020)
6. Matheus, G., Chidambaram, C.: Supervised traffic signs recognition in digital images using interest points. In: XI Workshop de Visão Computacional (WVC), pp. 158–163 (2015)
7. Guilherme, P., Guilherme, M., Chidambaram, C.A.: Supervised face recognition in still images using interest points. Comput. Beach (COB), 721–730 (2018)
8. Swagatam, D., Sankha, M., Suganthan, P.: Recent advances in differential evolution – an updated survey. Swarm Evol. Comput. 27, 1–30 (2016)
9. Salima, N., Abdallah, B.: Swarm intelligence inspired classifiers for facial recognition. Swarm Evol. Comput. 32, 150–166 (2017)
10. Ebrahim, K., Siva, P., Mohamed, S.: Image matching using SIFT, SURF, BRIEF and ORB: performance comparison for distorted images. In: Proceedings of Newfoundland Electrical and Computer Engineering Conference (2015)
11. Rafael, P., Heitor, L.: New inspirations in swarm intelligence: a survey. Int. J. Bio-Inspired Comput. 3(1), 1–16 (2011)
12. Ethan, R., Vincent, R., Kurt, K., Gary, B.: ORB: an efficient alternative to SIFT or SURF. In: 2011 International Conference on Computer Vision, pp. 2564–2571 (2011)
13. Leonardo, S., Chidambaram, C.: Traffic signs recognition approach using variable neighborhood search algorithm. In: XIV Workshop de Visão Computacional (WVC), pp. 184–188 (2018)
14. Rainer, S., Kenneth, P.: Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 11(4), 341–359 (1997)
15. Tom, S., Frank, L., Jakob, R., Dirk, W.: Parameter optimization of differential evolution and particle swarm optimization in the context of optimal power flow. In: 2020 IEEE PES Innovative Smart Grid Technologies Europe (ISGT-Europe) (2020)
16. Janez, B., Viljem, Z., Mirjam, M.: Self-adaptive differential evolution algorithm in constrained real-parameter optimization. In: Evolutionary Computation (CEC), pp. 215–222 (2006)
Low-Cost, Ultrasound-Based Support System for the Visually Impaired
Manuel Ayala-Chauvin1(B), Patricio Lara-Alvarez1, Jorge Peralta2, and Albert de la Fuente-Morato3
1 SISAu Research Group, Facultad de Ingeniería y Tecnologías de la Información y
Comunicación, Universidad Tecnológica Indoamérica, Ambato, Ecuador {mayala,patolara}@uti.edu.ec 2 Facultad de la Energía, las Industrias y los Recursos Naturales no Renovables, Universidad Nacional de Loja, Loja, Ecuador [email protected] 3 Centro de Diseño de Equipos Industriales, Universidad Politécnica de Cataluña, Barcelona, España [email protected]
Abstract. Surveys show that most blind people find the use of a cane to be indispensable, but complain it does not provide them with sufficient safety for their upper body. In this study, an ultrasound-based, portable support system was developed to cover such limitation. Consisting of a small device fastened to the forehead, it can detect the distance to and from obstacles and translate it into different warning patterns depending on the distance. The device’s warnings can be auditory or haptic, and can also be emitted by a cell phone linked wirelessly via an application. While similar systems exist, the present product offers a compromise solution between functionalities and price, being more suited for developing countries like Ecuador. The system consists of affordable, readily available components and a 3D-printed case, and runs on open-source software. Tests showed subjects were successful in several tasks where collisions with obstacles were to be avoided. Keywords: Visual impairment · Blindness · Ultrasound sensors · Bluetooth · Cell phone applications
1 Introduction

1.1 Background

In 2015, data on visual impairment showed that approximately 1300 million people worldwide suffered from some form of visual impairment, with 36 million being completely blind [1]. Flaxman et al. have determined that visual problems depend on age and geographical region [2]. The risk of suffering from any kind of visual condition increases with age, with studies showing that 80% of visually impaired people are aged over 50 [3]. Furthermore, the prevalence of such handicaps is four times higher in poor regions than in high-income zones [2].
Visual impairment results in mobility limitations, restricting tasks that require ambulatory movements or the use of peripheral vision, the latter being limited by the person's visual field. Some important daily challenges include the use of curbs, ramps and staircases [4, 5]. Furthermore, low visual acuity is linked to falls, injuries and psychological distress [4]. Nowadays, the navigation and orientation systems available for the visually impaired are varied, though canes and guide dogs remain the most common aids [6]. However, canes and guide dogs cannot provide blind people with enough information regarding objects in close proximity to their upper body. For example, canes only work satisfactorily for short distances, and their use is limited to objects from the waist down. Guide dogs, in the same way, cannot provide handicapped users with information about obstacles above the waist that the dogs themselves can overcome without difficulty [7]. Several systems of sensory augmentation have been developed worldwide, with all efforts being focused on offering non-conventional options for navigation systems that provide safety and autonomy. Their main aim is to enhance the quality of life of visually impaired people whose condition limits their mobility. Researchers in this field have addressed the problem in a multitude of ways, taking advantage of technological improvements to create a vast number of gadgets. Nevertheless, their generally prohibitive prices make them unaffordable to most people. An example of these is Substitute Eyes for Blind with Navigator Using Android [8] (1,890 USD), which provides information about close obstacles and – via an Android application – the location and position of the user in real time. It also features a navigation system that assists in the user's positioning [6]. Another example is Smart Cane (high price) [9], with an electromechanical vibrator, a buzzer, a water detector and ultrasonic sensors to detect close obstacles and offer instructions via voice messages [6]. Additional examples of high-cost devices include: UltraCane (807.35 USD), based on ultrasound technology that provides haptic feedback; Miniguide Mobility Aid (499.00 USD), based on ultrasonic echolocation that detects objects in front of the user; and BuzzClip Mobility Guide (249.00 USD), based on SONAR technology that provides haptic feedback [10].

1.2 Related Work

Visual support technologies can be divided into the following categories: vision enhancement, vision substitution, and vision replacement [11]. The first is self-explanatory, the second consists of non-visual displays based on sound or haptic vibration, while the third is based on direct connections between the sensors and the nervous system in order to reach the visual cortex. The device developed in this article falls in the second category. The most common support systems are sensor-based, acquiring information from the user's surroundings and translating it into audio or vibration signals [12]. The sensor technologies employed can vary, including, among others, ultrasound, infrared, and time-of-flight distance sensing [13]. Examples include Mohd et al., who designed a device employing a haptic vibration response and voice messages to communicate with the user [9]. Yi et al. designed an ultrasound-based device that detects objects in front of the user and employs a haptic response and voice messages to alert them [14].
Completely different approaches include that of Nguyen et al., consisting of an electro-tactile antenna (designed to transmit navigation information) placed inside the mouth, on the tongue [15]. Brihaut et al. utilised the fusion of artificial vision with GPS to estimate the user's real-time position [16]. Ramadhan proposes a portable device for the wrist that provides navigation assistance and obstacle detection via auditory and haptic signals. The same auditory signals can alert people nearby if the user stumbles. A GSM system sends messages about potential incidents to the cell phones of the user's caregivers, providing them with the user's exact location via a GPS module [17]. Lupu et al. present Sound of Vision, a device allowing people with visual impairments to perceive physical obstacles via audible and haptic signals. The final prototype integrates an object detection camera whose depth sensor allows users to move both indoors and outdoors. The device features an audio unit and a haptic belt that integrates 60 vibrating motors. An analysis of the user's physiological response through electroencephalography and behavioral signals shows the device can provide users with more security, satisfaction and comfort [18]. Kiuru et al. propose a device based on RADAR technology, which emits and receives radio waves to determine the distance and location of potential obstacles. This information is conveyed via haptic or auditory messages. The device is fastened to the upper abdomen and can detect objects at a distance of up to 3.5 m [19]. While a large number of support devices for visually impaired people have been developed in recent years, each one usually focuses on a single approach: either to improve user navigation, or to allow the user to detect potential obstacles. Very few projects do both. Bai et al. developed a portable gadget integrated into a pair of spectacles, consisting of an RGB-D recognition camera and an inertial measurement unit. Additionally, it includes a smartphone that provides the user's position in exterior environments and emits audio directions [20]. Due to its easy-to-use technology, Arduino provides multiple options for developing robust, low-cost devices. Thus, Niharika et al. designed an Arduino-based, low-cost, portable device containing an ultrasonic sensor, a water sensor, and a buzzer, with the novel capability of detecting water before coming into contact with it [21]. In the same vein, Petsiku et al. designed an open-source, low-cost device employing readily available electronic components and controlled by an Arduino Nano [10]. The rest of this contribution is organised as follows: Sect. 2 will present the methods used in this investigation, Sect. 3 will describe the results, and Sect. 4 will present the conclusions of the study.
2 Method

The system was planned with a typical stage-based method used in product design (see Fig. 1). In the first design phase, the specifications of the product were determined through interviews with students in a school for the blind in Ecuador. The requirements that were considered indispensable were: affordability, portability, aesthetic appeal, protection
Fig. 1. Flowchart of the design method.
above the waistline, and no need for an internet connection. Furthermore, the interviewees expressed the need for the warning system to be conceived not as a substitute, but as a complement to the cane, since the cane is irreplaceable in their mobility from the waist down, providing them with a feeling of safety. After noticing the economic concerns, a survey was conducted. From this, it was determined that potential Ecuadorian buyers were willing to pay up to USD 200 for a support device, which is significantly lower than the price of similar products sold in developed countries. In the second phase, a conceptual design was produced based on the previous specifications. For this, a bibliographical search was executed, reviewing other technologies developed to help visually impaired people. In the third phase, the selection of materials and components was carried out within the context of Ecuador, making the affordability of the system a vital requirement that was successfully fulfilled. In the final phase, the detailed design of the system was executed with CAD software, followed by the materialisation of a prototype with low-cost 3D-printed plastic parts assembled together with electronic components.

2.1 Architecture

The proposed system consists of three main parts: controller, sensors, and alarms. All of these are integrated as shown in Fig. 2. The proposed system detects and monitors objects at a short distance from the user's upper body. As the user approaches an obstacle, the device activates an electromechanical vibrator whose intensity increases as the user comes closer to the object. The device uses a high-capacity battery to ensure continued operation during long periods. The system features the following main characteristics: fastening to the forehead through an elastic strap, high portability, an alarm warning system, low energy consumption and an intuitive interface. While the final prototype uses an electromechanical vibrator as the warning system, it could easily be replaced by a speaker depending on the user's preferences, given the system's modularity. Some of the secondary aspects seen in other designs, like the water sensor or elaborate casings and displays, were deemed unessential and discarded in order to keep the final price low.
Fig. 2. Diagram of the proposed system.
Fig. 3. Cane technique with ultrasound-sensing
Additionally, an optional cell phone Android application connected via Bluetooth was developed, capable of generating sound and vibration in the phone besides the electromechanical vibrator of the forehead device (see Fig. 3).

2.2 Design

Figure 4 shows all the components of the forehead device. The CAD design of the case was executed with PTC CREO. Later, the 3D models of the rest of the components, available online, were assembled inside the case of the CAD model. Besides the fact that all the electronic components are easily available in the local market, the device presents a modular design which allows for easy maintenance and repair. Table 1 gives a summary of all elements employed and their cost. Figure 5 shows the electronic circuit. The electronic components would later be soldered together and assembled compactly inside the 3D printed plastic case of the final prototype.
Fig. 4. CAD design of the forehead device.
Table 1. Bill of materials for the forehead device.

Quantity  Component                      Cost (USD)
1         3D printed case                10.00
4         Screws                         0.10
1         Elastic band                   5.00
-         Cable and soldering material   0.15
1         Arduino Nano                   8.00
1         Ultrasonic sensor HC-SR04      3.00
1         Vibrator motor, 3 V            0.50
1         Lithium battery                10.00
1         Slide switch                   0.50
1         Resistor, 1 kΩ                 0.05
1         Bluetooth HC-06                8.00
          Total cost (USD)               45.30
The Arduino Nano controller was programmed through the open-source Arduino IDE to implement the object detection capabilities of the device. The code applies an exponential filter to the signal to ensure the correct operation of the electromechanical vibrator [10]. The Android cell phone application was connected wirelessly to the forehead device via the HC-06 Bluetooth module (see Fig. 6). The application interprets the distance via information gathered by the HC-SR04 sensor and generates the audio and vibration warning signals. A Samsung Galaxy Note 8 cell phone with Android V9 was used during the tests. The bill of materials, the CAD file of the case, and the electronic circuit diagram will be made public as open-source information.
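As an illustration of the exponential filtering mentioned above, the snippet below shows how such a filter and the distance-to-vibration mapping could be written for the Arduino Nano. It is only a sketch: the smoothing factor ALPHA, the pin assignment and the function name are assumptions for illustration, not the exact firmware of the prototype.

```cpp
const int   VIBRATOR_PIN = 9;      // PWM pin driving the 3 V vibrator motor (assumed wiring)
const float ALPHA        = 0.3f;   // smoothing factor, 0 < ALPHA <= 1 (assumed value)

float filteredCm = 140.0f;         // start at the far end of the 35-140 cm sensing range

// Called with each new raw HC-SR04 reading (in cm).
void updateVibrator(float rawDistanceCm) {
  // Exponential filter: estimate = ALPHA * measurement + (1 - ALPHA) * previous estimate
  filteredCm = ALPHA * rawDistanceCm + (1.0f - ALPHA) * filteredCm;

  // Closer obstacle -> stronger vibration: 35-140 cm mapped to duty cycle 255-0
  int duty = constrain(map((long)filteredCm, 35, 140, 255, 0), 0, 255);
  analogWrite(VIBRATOR_PIN, duty);
}
```

The filter smooths out the occasional spurious echo from the ultrasonic sensor, so the vibration intensity changes gradually instead of jumping with every outlier reading.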
Fig. 5. Electronic circuit.
Fig. 6. Cell phone application interface.
2.3 Operation

The HC-SR04 ultrasound sensor employed can detect objects in front of the user within a 35 cm to 140 cm range. Activated with a 10 µs pulse, it then sends eight 40 kHz ultrasonic signals and detects if they are reflected by the object. The distance to the object is determined by Eq. 1:

D = c·t/2    (1)
Where D is the distance between the sensor and the object, t is the time it takes the signal to travel from the sensor to the object and back to the sensor, and c is the speed of sound in air. The low speeds of a person’s movements make the consequences of the Doppler effect negligible.
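A minimal Arduino-style sketch of one measurement cycle, written directly from Eq. 1, is shown below. The pin numbers are assumptions; pulseIn() returns the round-trip echo time t in microseconds, and c is taken as approximately 0.0343 cm/µs at room temperature.

```cpp
const int TRIG_PIN = 7;   // assumed wiring
const int ECHO_PIN = 8;   // assumed wiring

float readDistanceCm() {
  // 10 microsecond trigger pulse starts the eight 40 kHz bursts
  digitalWrite(TRIG_PIN, LOW);
  delayMicroseconds(2);
  digitalWrite(TRIG_PIN, HIGH);
  delayMicroseconds(10);
  digitalWrite(TRIG_PIN, LOW);

  // Round-trip time of the echo in microseconds (0 if it times out)
  unsigned long t = pulseIn(ECHO_PIN, HIGH, 30000UL);

  // Eq. 1: D = c*t/2, with c ~ 0.0343 cm/us
  return 0.0343f * t / 2.0f;
}
```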
Fig. 7. Distance measured by the sensor vs. vibration intensity, real measure and modelled function.
Figure 7 shows the measurements of the ultrasound sensor. From these, a function to model its behaviour can be obtained (see Eq. 2):

I = −3.13·D + 338.52   if 0 < D ≤ 108.28
I = 0                  if D > 108.28        (2)

Where I is the vibration intensity in Hz, and D is the distance between the sensor and the object in cm. The energy consumption of the device is estimated at 30 mA assuming the electromechanical vibrator is working 50% of the time. Thus, a 400 mAh battery could provide an autonomy greater than 26 h.
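Evaluated in code, the fitted model of Eq. 2 above amounts to a single clamped linear function; the sketch below is only an illustration of how the model could be implemented on the device, not the prototype's actual firmware.

```cpp
// Eq. 2: vibration intensity I (Hz) as a function of the measured distance D (cm)
float vibrationIntensityHz(float distanceCm) {
  if (distanceCm > 0.0f && distanceCm <= 108.28f)
    return -3.13f * distanceCm + 338.52f;   // linear decrease towards the cutoff
  return 0.0f;                              // no vibration beyond 108.28 cm
}
```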
3 Results

Once the prototype was assembled (see Fig. 8), its functionality was tested with various subjects, obtaining satisfactory results.
Fig. 8. Ultrasound-sensing prototype.
The tests were carried out with 15 blind male and female subjects aged between 20 and 60. Three circuits were tested: a corridor, a staircase, and a sidewalk. In all cases, the circuit presented obstacles approximately at the level of the head or torso. Figure 9 shows some of the tests being carried out.
Fig. 9. Prototype testing procedures.
The device was calibrated and the different operation modes (with and without the cell phone application) were verified, confirming that the alarms were triggered promptly.
Fig. 10. Individual test scores.
Figure 10 shows the test results for all the subjects and test types. The columns represent the mean scores of each subject in all three tests. Each test was run five times, and the time of each attempt was subsequently averaged. The total average of all the subjects and all the tests was 6.22 min. The haptic warnings emitted by the cell phone when using the application were especially well received by the youngest users.
4 Conclusions

This document proposes the design of a portable device featuring an open-source design and low materialization costs, with the final price of 45.30 USD being much lower than the limit of 200 USD found in the surveys from Ecuador, and even lower than the prices of most commercial devices with a similar aim. The system, presenting an intuitive interface, generates warning signals in the form of sounds or haptic vibrations when detecting obstacles in close proximity, and the patterns of these signals change as the user comes closer to them. This system can aid visually impaired people of any age in their daily lives when used alongside a cane, significantly improving their mobility and sense of safety for their upper body. It can also be wirelessly linked to an optional cell phone application that can provide additional warning signals. The use of haptic vibration warnings means it can also be used by deaf-blind people. From the graph in Fig. 7 and the personal experience of using the device, it can be observed that the vibration frequency produced by the device decreases linearly with distance (Eq. 2), even though it is known that humans perceive increments in frequency and amplitude logarithmically [22]. Therefore, in a future model, a logarithmic relationship will be tested in order to determine if there is an increase in the device’s perceived accuracy. Finally, given the option of connecting the forehead device to a cell phone via Bluetooth, the possibility of adding an emergency button to warn relatives via the telephone’s SIM card has also been considered for future versions.
References 1. Bourne, R.R.A., Flaxman, S.R., Braithwaite, T., Cicinelli, M.V, Das, A., Jonas, J.B., Keeffe, J., Kempen, J.H., Leasher, J., Limburg, H., Naidoo, K., Pesudovs, K., Resnikoff, S., Silvester, A., Stevens, G.A., Tahhan, N., Wong, T.Y., Taylor, H.R.: Magnitude, temporal trends, and projections of the global prevalence of blindness and distance and near vision impairment: a systematic review and meta-analysis. Lancet. Glob. Heal. 5, e888–e897 (2017). https://doi. org/10.1016/S2214-109X(17)30293-0. 2. Flaxman, S.R., Bourne, R.R.A., Resnikoff, S., Ackland, P., Braithwaite, T., Cicinelli, M.V., Das, A., Jonas, J.B., Keeffe, J., Kempen, J.H., Leasher, J., Limburg, H., Naidoo, K., Pesudovs, K., Silvester, A., Stevens, G.A., Tahhan, N., Wong, T.Y., Taylor, H.R.: Global causes of blindness and distance vision impairment 1990–2020: a systematic review and meta-analysis. Lancet. Glob. Heal. 5, e1221–e1234 (2017). https://doi.org/10.1016/S2214-109X(17)303 93-5. 3. Ackland, P., Resnikoff, S., Bourne, R.: World blindness and visual impairment: despite many successes, the problem is growing. Community Eye Heal. 30, 71–73 (2017) 4. Welp, A., Woodbury, R.B., McCoy, M.A., Teutsch, S.: Making Eye Health a Population Health Imperative: Vision for Tomorrow. Presented at the (2016) 5. Pissaloux, E., Velazquez, R.: Mobility of Visually Impaired People: Fundamentals and ICT Assistive Technologies. Springer Publishing Company, Incorporated (2017) 6. Elmannai, W., Elleithy, K.: Sensor-based assistive devices for visually-impaired people: current status, challenges, and future directions. Sensors (Basel). 17 (2017). https://doi.org/10. 3390/s17030565.
7. Pereira, A., Nunes, N., Vieira, D., Costa, N., Fernandes, H., Barroso, J.: Blind guide: an ultrasound sensor-based body area network for guiding blind people. Procedia Comput. Sci. 67, 403–408 (2015). https://doi.org/10.1016/j.procs.2015.09.285. 8. Bharambe, S., Thakker, R., Patil, H., Bhurchandi, K.M.: Substitute eyes for blind with navigator using android. In: 2013 Texas Instruments India Educators’ Conference, pp. 38–43 (2013). https://doi.org/10.1109/TIIEC.2013.14. 9. Abd Wahab, M.H., Talib, A., Abdul Kadir, H., Johari, A., Ahmad, N., Sidek, R., Abdul Mutalib, A.: Smart Cane: Assistive Cane for Visually-impaired People. CoRR. abs/1110.5, (2011) 10. Petsiuk, A.L., Pearce, J.M.: Low-cost open source ultrasound-sensing based navigational support for the visually impaired. Sensors 19 (2019). https://doi.org/10.3390/s19173783. 11. Dakopoulos, D., Bourbakis, N.G.: Wearable obstacle avoidance electronic travel aids for blind: a survey. IEEE Trans. Syst. Man, Cybern. Part C (Appl. Rev. 40, 25–35 (2010). https:// doi.org/10.1109/TSMCC.2009.2021255. 12. Elmannai, W.M., Elleithy, K.M.: A Highly Accurate and reliable data fusion framework for guiding the visually impaired. IEEE Access 6, 33029–33054 (2018). https://doi.org/10.1109/ ACCESS.2018.2817164 13. Islam, M., Sadi, M., Zamli, K., Ahmed, M.M.: Developing walking assistants for visually impaired people: a review. IEEE Sens. J. 19, 2814–2828 (2019). https://doi.org/10.1109/ JSEN.2018.2890423 14. Yi, Y., Dong, L.: A design of blind-guide crutch based on multi-sensors. In: 2015 12th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), pp. 2288–2292 (2015). https://doi.org/10.1109/FSKD.2015.7382309. 15. Nguyen, T.H., Le, T.L., Tran, T.T.H., Vuillerme, N., Vuong, T.P.: Antenna design for tongue electrotactile assistive device for the blind and visually-impaired. In: 2013 7th European Conference on Antennas and Propagation (EuCAP), pp. 1183–1186 (2013) 16. Brilhault, A., Kammoun, S., Gutierrez, O., Truillet, P., Jouffrais, C.: Fusion of artificial vision and GPS to improve blind pedestrian positioning. In: 2011 4th IFIP International Conference on New Technologies, Mobility and Security, pp. 1–5 (2011). https://doi.org/10.1109/NTMS. 2011.5721061. 17. Ramadhan, A.: Wearable smart system for visually impaired people. Sensors 18, 843 (2018). https://doi.org/10.3390/s18030843 18. Lupu, R.-G., Mitrut, , O., Stan, A., Ungureanu, F., Kalimeri, K., Moldoveanu, A.: Cognitive and affective assessment of navigation and mobility tasks for the visually impaired via electroencephalography and behavioral signals. Sensors (Basel) 20 (2020). https://doi.org/10. 3390/s20205821. 19. Kiuru, T., Metso, M., Utriainen, M., Metsävainio, K., Jauhonen, H.-M., Rajala, R., Savenius, R., Ström, M., Jylhä, T.-N., Juntunen, R., Sylberg, J.: Assistive device for orientation and mobility of the visually impaired based on millimeter wave radar technology—Clinical investigation results. Cogent Eng. 5, 1450322 (2018). https://doi.org/10.1080/23311916. 2018.1450322 20. Bai, J., Liu, Z., Lin, Y., Li, Y., Lian, S., Liu, D.: Wearable travel aid for environment perception and navigation of visually impaired people. Electronics 8, 697 (2019). https://doi.org/10.3390/ electronics8060697 21. Niharika, N., Heena, Jaint, B.: An electronic aid for the mobility of visually impaired. In: 2015 Annual IEEE India Conference (INDICON), pp. 1–4 (2015). https://doi.org/10.1109/ INDICON.2015.7443836. 22. 
Jacoby, N., Undurraga, E.A., McPherson, M.J., Valdés, J., Ossandón, T., McDermott, J.H.: Universal and non-universal features of musical pitch perception revealed by singing. Curr. Biol. 29, 3229–3243.e12 (2019). https://doi.org/10.1016/j.cub.2019.08.020.
SD-CCN Architecture to Improve QoE for Video Streaming Applications

Amna Fekih1,3(B), Sonia Gaied Fantar2,3, and Habib Youssef1,3
1 Computer Science Department, Higher Institute of Computer Science and Communication
Techniques - University of Sousse, 4011 Hammam Sousse, Tunisia [email protected] 2 Computer Science Department, Higher School of Sciences and Technologies - University of Sousse, 4011 Hammam Sousse, Tunisia 3 PRINCE Research Laboratory, University of Sousse, 4011 Hammam Sousse, Tunisia
Abstract. Adaptive video streaming is the most widely adopted solution by service providers to maximize user QoE and optimize resource utilization. The current internet, based on host-to-host communications, remains inadequate for it. Content-Centric Networks (CCN), the most popular ICN architecture, is an innovative approach proposed to overcome the limits of legacy IP networks by adopting a content-centric approach. However, the existing CCN has important disadvantages, namely the very large size of the CCN forwarding table and the huge overhead cost of the forwarding process. For this reason, this architecture cannot be appropriately used in a large-scale network. To solve this problem, we propose a new replication framework for video streaming in Content-Centric Networks based on an SDN architecture (RF-VS-SD-CCN) in order to improve the efficiency of content delivery and the quality perceived by users (QoE), and to improve the use of network resources. The experiments, carried out with the ndnSIM simulator, clearly show the efficiency of our solution.
Keywords: SDN · CCN · Video streaming · In-network caching · QoS · QoE
1 Introduction

Over the past decades, video streaming has put a huge strain on the underlying distribution network. The network model is unchanged while the services that use the Internet have radically changed. These shortcomings have motivated researchers to propose an alternative architecture, Information-Centric Networks [1, 2], and especially the most popular ICN approach, CCN [2], based on content and characterized by in-network caching. The majority of media traffic comes from streaming services like Netflix and YouTube. HTTP Adaptive Streaming (HAS) forms the basis of these services. Basically, the video is temporally divided into parts (chunks), of the same size or not, and encoded at different quality rates. Therefore, in the event of rate fluctuations, the user migrates to a lower quality representation transparently based on buffer fill levels, device characteristics, and network statistics. HAS is based on Advanced Video
Coding (AVC). However, the latter creates a certain redundancy since each quality layer is independent of the others. To solve this problem and improve the quality, scalable video coding (SVC) [3] can be used, because it considers a basic quality layer to which enhancement layers are added. The limits of the Internet and the evolution of video streaming oppose each other and increase the complexity of managing this infrastructure. To obtain more open and programmable networks, several researchers [4–9] have proposed multiple solutions. Software-Defined Networking is the result of these works, offering a programmable network characterized by a complete separation between control and forwarding logic. The deployment of each of the technologies already mentioned can be explained by particular advantages, so their combination offers additional benefits. Combining HAS with SVC improves the cache hit ratio [11], minimizes the load on servers [10], decreases cache redundancy [12], and overcomes the cache size constraint in CCN [16]. A gradual upgrade with SVC leads to more efficient quality switching [14]. Specifically, in CCN a user can use one or more caches during playback. The CCN approach requires a clean slate of the current architecture; it therefore creates an incompatibility with traditional networks and makes deployment more difficult. SDN presents the ideal solution to implement and deploy this promising approach, as it offers a complete separation between the control and data planes and network programmability. In the present paper, we develop a strategy for video streaming applications based on the SD-CCN architecture proposed in previous work [24] to improve the efficiency of content delivery, the user-perceived quality (QoE) and the consumption of network resources. The main idea is a new replication framework for video streaming in Content-Centric Networks based on SDN Architecture (RF-VS-SD-CCN). The contributions of this work are twofold:

• We augment the CCN node with proactive features. The main improved functionalities of CCN nodes are traffic shaping, anticipation of interests, cooperative caching and content replication according to utility (delivery cost).
• We propose a new replication framework for video streaming in Content-Centric Networks based on SDN Architecture in order to improve the efficiency of content delivery, the quality perceived by users (QoE) and the consumption of network resources.

The rest of this article is structured as follows: related work is detailed in Sect. 2. The next section presents the RF-VS-SD-CCN operating principle. Performance evaluation is illustrated in Sect. 4. Finally, conclusions are drawn in Sect. 5.
2 Related Work Emerging streaming technologies such as Adobe’s HTTP Dynamic Streaming send the same video content with multiple resolutions to different consumers. Although users may experience frequent freezes if the bandwidth is not large enough to support video resolutions, SVC is characterized by automatically adapting to link fluctuations. In [11], the authors enumerated the advantages of SVC if it is applied to caching and uplink
bandwidth. In [15], the authors propose to group the HAS sessions in unicast and to share the same content in a single multicast session for a better use of resources in 3GPP networks. Andelin et al. [16] propose an SVC-specific quality selection heuristic to take into account more specific algorithmic decisions. Lederer et al. present an architecture for DASH on CCN [13]. The authors identified several challenges, primarily the transparent forwarding between multiple interfaces supported by name-based routing in ICN. In [14], the authors propose a DASH-aware scheduling algorithm for edge cache prefetching to improve the QoE of streaming video in ICN, using residual bandwidth to request video segments in advance. An analysis of the influence of CCN caching under different time scales is detailed in [17], where the authors reported that with caches of 10 GB and 1 TB, average cache hit rates of approximately 25% and 45%, respectively, are achieved. Li et al. [26] proposed a cache placement scheme to improve user QoE through cache partitioning. Measurements show that the proposed scheme can significantly reduce bitrate oscillation. However, these solutions do not take into account the constraints of cache size and scalability. Other proposals have considered SDN as an ideal architecture to improve rate adaptation and subsequently improve the performance of video streaming applications. In [18], the authors implemented the adaptation logic by associating an adaptation device with the SDN controller and imposing a calculated bitrate on the client. In [19], the selection of video quality is managed by a video control plane administered by an SDN controller. In [20], network congestion control is given to the SDN controllers in order to ensure reliable network service for real-time interactive applications. Nevertheless, these solutions do not consider in-network caches. Alternatively, OpenCache [21] is the first solution proposed in the literature to provide network caching through an optimized SDN architecture. [22] proposed a new hybrid architecture based on the combination of SDN and CCN to provide assisted DASH-aware networking. The first solution [21] corresponds to our proposal in terms of the control of CCN nodes, which is ensured by the SDN controllers, as well as the caching. Otherwise, instead of defining new nodes to support caching, our approach is built around CCN routers, which are themselves classified, natively characterized by network-distributed caching and managed by one or more SDN controllers. The second hybrid solution [22] has several points of intersection with our approach but clearly differs in terms of collecting statistics and classifying requests. Table 1 presents a summary of this related work, comparing the cited approaches and our proposal according to the characteristics considered.
3 RF-VS-SD-CCN Operating Principle

Our architecture is composed of a set of controllers deployed according to a topology defined by network operators for the dynamic management of CCN clusters. The controllers are responsible for communicating with the orchestrator for the purpose of exchanging synchronization information and building their own network view periodically. In our architecture, a controller can be active or passive. It is active if at least one CCN node is attached to it; if none is attached, it is passive. The latter remains listening, pending the allocation
of new CCN nodes. Our architecture is made up of these main actors: SDN orchestrator, SDN controller, CCN node, CCN server, and CCN client. Figure 1 describes our architecture, which allows active cooperation between the different elements of the network: CCN nodes, SDN controllers and the SDN orchestrator push the DASH client to make a good selection of segments in order to obtain the quality best suited to the network conditions as well as to the client's capacities. In [27–29], the authors showed that when users share similar video preferences, the probability of requesting the same video content is also higher. First, we introduce the new concept of “family”, defined by analyzing the user preference similarity (e.g. playback time, delay margin) and mobility similarity (e.g. movement direction and moving speed), which can help improve routing performance and the cache hit ratio in mobile networks. Subsequently, we propose a CCN process which consists of traffic shaping and interest anticipation, as well as an SDN controller process that enables path optimization, caching, and replication decisions. These features help maintain the stability of high-quality DASH traffic.

Table 1. Summary of related work: iDASH [11], DASH over CCN [13], congestion-aware edge caching [14], parallel adaptive HTTP [15], quality selection for DASH [16], CCN [17], delivering stable high-quality video [18], VCP [19], congestion control for interactive applications [20], RippleCache [26], OpenCache [21], SD-RF [23], SICS [24], LECC [25], SDN-CCN-DASH [22] and our proposal, compared in terms of SDN support, SDN model (C: centralized, D: distributed), CCN support, caching, replication and video streaming.
3.1 Metrics Collection

Metrics Collection on the Client Side
To build a user family, two key factors are taken into consideration: estimation of the user preference similarity pattern as well as estimation of the mobility similarity pattern of a mobile user. Due to limited space, the description of metrics collection is outside the scope of this article.

Metrics Collection on the Network Side
Our proposal (RF-VS-SD-CCMN) is based on the QoS parameter collection module, the management module (routing, caching and replication decisions), and the QoE measurement module. The primary function of the first module is to provide real-time link QoS information for the QoE measurement module. The QoE measurement module is responsible for controlling and calculating the QoE value of the transmission path in order to decide the final path to be selected. The management module uses the information collected and calculated by the other modules to make the final routing, caching, and replication decisions. The operating mechanism is illustrated in Fig. 1.
Fig. 1. RF-VS-SD-CCN Architecture
3.2 CCN Process

The main feature of CCN is in-network caching. However, this generates new problems such as flow oscillations [30]. To overcome the inefficiencies of caching, our proposal is based on supporting the mobile user to guide them towards the best choice through the different steps of the request and the transmission of the data packet. Therefore, the client rate adaptation choices are processed by the CCN nodes in order to accommodate the chunks already cached locally and their bitrates. For this, the CCN node checks whether the next video chunk is cached locally and with what representation. If not, it first checks the Family Storage Client Table (FSCT), which provides information about video chunks cached by clients of the same family. If
the requested chunk is still not found, the CCN node then checks the Neighbor Storage Node Table (NSNT), which provides information on video chunks cached by other CCN nodes (edge/internal nodes), requiring only one hop for content delivery. Otherwise, it signals the predicted demand for the next video chunk to the assigned SDN controller, which improves QoE. The latter, according to its complete vision of its coverage area, determines the best route based on the cooperative caching of its assigned CCN nodes. Failing that, it forwards the predicted demand for the next video segment to neighboring SDN controllers through its SDN orchestrator. This principle is described in Fig. 3. The main improved functionalities of CCN are traffic shaping, interest anticipation, cooperative caching and content replication according to utility (delivery cost).

Traffic Shaping
This step involves modifying the data rate provided, based on the mobile user's observations. The flow rate, noted Tres, must verify the following conditions:

Tres ≥ rreq   and   Bs ≤ Tres ≤ Bc    (1)
Tres must be greater than or equal to the rate of the requested chunk. It must also be between the bandwidth on the server side of the CCN node (Bs) and the available bandwidth on the client side of the CCN node (Bc).

Interests Anticipation
To support DASH over CCN, CCN nodes must request a future chunk with a specified rate to achieve better QoE. Two constraints are addressed: the occupation of the buffer, given by the cache capacity of the CCN node, and the available bandwidth (Bc/Bs). The anticipated rate is the available representation closest to the server-side bandwidth:

ranticip = arg min { |ri − Bs| : 1 ≤ i ≤ n }    (2)
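As a concrete illustration of Eq. (2), the helper below picks, among the n available representations r1..rn, the bitrate closest to the server-side bandwidth Bs. It is only a sketch; the function name and the use of plain doubles are assumptions, not part of the actual implementation.

```cpp
#include <cmath>
#include <vector>

// Eq. (2): r_anticip = argmin_i |r_i - Bs| over the available representations.
// Assumes 'representations' is non-empty (e.g. the 1, 2.5, 5 and 8 Mbps levels
// used later in the evaluation).
double anticipatedBitrate(const std::vector<double>& representations, double Bs) {
    double best = representations.front();
    for (double r : representations)
        if (std::fabs(r - Bs) < std::fabs(best - Bs))
            best = r;
    return best;
}
```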
Cooperative Caching
Due to limited space, our caching approach will only be briefly described in this article, focusing on the main tables used for caching management. The SDN and CCN processes present an evolution of our previous work [23–25] on the study of caching for CCN networks. The FSCT and NSNT tables, shown in Fig. 2, are managed by SDN controllers. After creating user families and as the network traffic information is collected, the SDN controller updates the FSCT, which records the information about the content cached in the personal storage of clients belonging to the same family. Based on the topology information collected, the SDN controller fills the NSNT with information on video content cached in neighbors requiring only one hop for content delivery.

Content Replication
Each SDN orchestrator manages a geographic area that consists of several neighboring SDN controllers and acts as a broker for them. In the event of multiple exchanges between SDN controllers to locate an Interest packet, and if a join request threshold is reached, the SDN orchestrator triggers a network information collection request without waiting for the periodic network information collections. Then, it analyzes the topology, the QoS collection, and the content distribution. Finally, it installs rules for replication points taking into account QoE measurements.
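The lookup cascade that a CCN node follows before escalating an Interest to its controller (summarized in Fig. 3) can be sketched as follows. The container types and function names are illustrative assumptions, not the ndnSIM API; only the order of the checks (local Content Store, then FSCT, then NSNT, then SDN controller) reflects the process described above.

```cpp
#include <map>
#include <optional>
#include <set>
#include <string>

struct CcnNode {
    std::set<std::string> contentStore;           // chunk names cached locally
    std::map<std::string, std::string> fsct;      // chunk -> client of the same family
    std::map<std::string, std::string> nsnt;      // chunk -> one-hop neighbor CCN node

    // Returns where the chunk can be served from, or nullopt if the Interest
    // has to be forwarded to the assigned SDN controller (and, through the
    // orchestrator, to neighboring controllers).
    std::optional<std::string> resolve(const std::string& chunk) const {
        if (contentStore.count(chunk))                        // 1. local cache
            return std::string("local content store");
        if (auto it = fsct.find(chunk); it != fsct.end())     // 2. same-family client
            return it->second;
        if (auto it = nsnt.find(chunk); it != nsnt.end())     // 3. one-hop neighbor
            return it->second;
        return std::nullopt;                                  // 4. escalate to controller
    }
};
```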
3.3 SDN Process

In the basic CCN approach, when a data packet is transmitted, the intermediate CCN nodes store a copy of the content. This principle generates content redundancy in the network, frequent updating of caches and poor use of resources. Our hybrid architecture provides cooperative caching by leveraging the cache capacity of CCN nodes and the SDN controller's network vision to update the cache cooperation tables between clients of the same family or neighboring nodes. Each SDN controller listens continuously to its associated CCN nodes by monitoring network information collected periodically. Our proposal is based on an OpenFlow extension. There are three types of messages exchanged between the controller and the CCN nodes. The first is symmetric messages. Once the secure channel is established, Hello (Request/Reply) messages are exchanged. Then, Echo (Request/Reply) messages are used during channel operation to ensure the connection is still alive and to measure current latency and throughput. The second type is asynchronous messages, which are sent by CCN nodes to the controller without the node having been requested by the controller. The Packet-In message is used to hand a packet over to the controller for processing (content name lookup/routing information). The Port-status message is used to communicate link changes (Fig. 2).
Fig. 2. CCN node tables
Fig. 3. Flowchart of CCN process
Finally, the CCN node uses the Error message to notify the controller of errors. The last type is controller-switch messages which is the most important category.
These messages may or may not require a response from the CCN node. Feature-caching_replication (Request/Reply) messages are used to enable caching, which is managed by the controller. With this new functionality added to the controller, data collection is mandatory to manage the caches scattered over the network. For this, Read-state_cache (Request/Reply) messages are used. According to the collected information, Modify-state (Routing/Caching) messages are used to apply changes. Regarding the cost that can be generated by OpenFlow messages [15], note that in our proposal there is an optimization of these messages: Packet-In and Modify-state messages are only executed after an unmatched incoming packet or its absence in the CCN caches. Afterward, the requested content is routed through CCN nodes, and specifically from the nearest CCN node storing the content. This principle is described in Fig. 4.
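For reference, the message types named above could be grouped in code along the following lines; the enumeration below simply mirrors the names used in the text, with an illustrative grouping, and is not part of the standard OpenFlow protocol.

```cpp
// Messages exchanged between the SDN controller and CCN nodes in the proposed
// OpenFlow extension (names taken from the text; grouping is illustrative).
enum class SdCcnMessage {
    HelloRequest, HelloReply,              // symmetric: secure-channel setup
    EchoRequest, EchoReply,                // symmetric: liveness, latency/throughput probe
    PacketIn,                              // asynchronous: content name lookup / routing info
    PortStatus,                            // asynchronous: link changes
    Error,                                 // asynchronous: node-side error reports
    FeatureCachingReplicationRequest,      // controller-switch: enable controller-managed caching
    FeatureCachingReplicationReply,
    ReadStateCacheRequest,                 // controller-switch: collect cache state
    ReadStateCacheReply,
    ModifyStateRouting,                    // controller-switch: apply routing changes
    ModifyStateCaching                     // controller-switch: apply caching changes
};
```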
4 Performance Evaluation

4.1 Simulation Setup and Parameters

We perform a set of extensive simulations in an extended Mininet emulation environment to emulate very large SDN networks (e.g. MaxiNet), a wide area network simulator supporting DASH to simulate the data transmission procedure in CCN (e.g. ndnSIM in NS-3), and a traffic generator to generate a traffic flow. We have developed our own implementation of FESTIVE [15] in order to simulate the adaptation behavior on the client side. Its principle is to capture the latest advances in bitrate adaptation based on the occupation of the buffer memory. The videos consist of 2 s segments. User interests are captured by a Zipf-type distribution (controlled via the asymmetry parameter α). We use YouTube recommended encoding rates.
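The Zipf-type popularity model can be reproduced with a few lines of code; the sketch below is only an illustration, not the simulator's actual implementation, and builds a discrete distribution in which content k is requested with probability proportional to 1/k^α.

```cpp
#include <cmath>
#include <random>
#include <vector>

// Zipf-like popularity over K contents: p(k) proportional to 1 / k^alpha, k = 1..K.
// Larger alpha concentrates requests on the most popular contents.
std::discrete_distribution<int> makeZipf(int K, double alpha) {
    std::vector<double> weights(K);
    for (int k = 1; k <= K; ++k)
        weights[k - 1] = 1.0 / std::pow(static_cast<double>(k), alpha);
    return std::discrete_distribution<int>(weights.begin(), weights.end());
}

// Usage: std::mt19937 rng(42); auto zipf = makeZipf(1000, 0.8); int content = zipf(rng);
```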
Fig. 4. Flowchart of SDN process
The CCN server encodes each video at 4 different bitrates (1, 2.5, 5 and 8 Mbps). The control of any segment of the video is entrusted to FESTIVE as soon as a request is triggered. Table 2 presents the simulation parameters.
Table 2. Simulation parameters
Fig. 5. Total amount for required bandwidth
Each CCN (EN/IN) node has a Content Store (CS). The size Ci is controlled by ω and is related to the total available system capacity. We define Ci, the capacity of each CCN node with a content store, as: (3)

The following parameters are used to evaluate the performance of RF-VS-SD-CCMN.

Expected Bitrate (Eb): an aggregation of successive measurements of the video quality that the user expects. It is presented in Fig. 6(a1–b1).

Bitrate Oscillation (Bo): in practice, it is represented by images indicating an improvement or a decrease in the quality of the video. In our experiments, we calculate the average number of increases and decreases in bitrate while viewing a video file. Figure 6(a2–b2) draws an average of the measurements of this parameter.

Video Freezing Duration (VF): the average playback time spent in a “frozen” state. It is illustrated in Fig. 6(a3–b3).

To evaluate our proposed work, we have chosen these measures in order to: i) control and evaluate, on the client side, the average video quality that can be expected by a mobile node (Eb); ii) control and monitor the bitrate oscillation on the client side in order to adapt to the client's requirements (Bo); iii) finally, calculate the duration of a video paused in the buffer memory in order to assess whether the choice of video bitrates by users is good (VF). We run a set of repetitive evaluations by modifying the values of ω and α to determine the influence of the total content store size and the popularity-related bias α.

4.2 The Impact on Expected Bitrate

The ratio of bitrates selected by adaptive rate control during the entire experiment can be summarized by a bitrate distribution. This distribution reflects an aggregate measure of the overall video quality that current network resources can support under a given caching replacement approach. We further use the weighted average to calculate the expected bitrate (Eb) based on the appearance of the bitrates in user requests, which maps the discrete rates to a continuous and comparable metric. A higher Eb indicates better cache efficiency by improving adaptive streaming quality.
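Computed from a trace of the bitrates requested during playback, the two client-side metrics above reduce to simple aggregations. The helpers below are only a sketch of such a computation; the function names and the use of a plain vector trace are assumptions.

```cpp
#include <cstddef>
#include <vector>

// Expected bitrate Eb: weighted average of the bitrates appearing in the user's
// requests (each request weighted by its relative frequency, which reduces to
// the plain mean of the requested-bitrate trace).
double expectedBitrate(const std::vector<double>& requested) {
    if (requested.empty()) return 0.0;
    double sum = 0.0;
    for (double r : requested) sum += r;
    return sum / requested.size();
}

// Bitrate oscillation Bo: number of quality switches (increases or decreases)
// observed while viewing one video file.
std::size_t bitrateOscillations(const std::vector<double>& requested) {
    std::size_t switches = 0;
    for (std::size_t i = 1; i < requested.size(); ++i)
        if (requested[i] != requested[i - 1]) ++switches;
    return switches;
}
```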
4.3 The Impact on Bitrate Oscillation

Figure 6(a2–b2) shows how the number of bitrate switches evolves with the cache capacity and the popularity distribution. Comparing our proposal with the other strategies, we notice that it achieves the lowest bitrate oscillations. This improves the cache hit rate and thus triggers bitrate adaptations that match the requested quality level. So, our caching system is based on an effective strategy that keeps serving user requests for high quality content for as long as possible.

4.4 The Impact on Video Freezing

The higher the access time for each video chunk, the higher the probability of freezing video playback. With a caching scheme that achieves a high success rate, the access time to the video chunk is reduced and, subsequently, the playback freezes are reduced. Figure 6(c1–c2) shows the efficiency of our proposal. It achieves the least video freezing. This can be explained by the high success rate, maintaining user demands for high quality content and responding to requests as soon as possible to relieve the traffic load.

4.5 The Impact on Required Peak Bandwidth

To ensure seamless streaming delivery, our proposal reduces underflow and buffering time. For this, as shown in Fig. 5, our proposal achieves the lowest peak bandwidth required.
Figure 6. a1) Expected bitrate a2) bitrate oscillation a3) video freezing [across cache capacity (ω)] b1) expected bitrate b2) bitrate oscillation b3) video freezing [across popularity skewness (α)]
5 Conclusion

The purpose of this article is to study the integration of SDN and CCN architectures for video streaming applications. Our proposal consists of creating families of users according to the similarity of their mobility as well as the similarity of their preferences. As users send requests for video content, two additional processes are executed: the CCN process and the SDN process. We have carried out numerous simulations to evaluate the performance of the proposed framework. The results show the effectiveness of the proposed framework, based on the cooperation between CCN nodes with supervision by the SDN controller, for the rapid routing of video chunks when the requested content is absent from the local cache.
References 1. Koponen, T., Chawla, M., Chun, B.-G., Ermolinskiy, A., Kim, K.H., Shenker, S., Stoica, I.: A data-oriented (and beyond) network architecture. SIGCOMM Comput. Commun. Rev. 37(4), 181–192 (2007) 2. Liu, Z., Dong, M., Gu, B.: Impact of item popularity and chunk popularity in CCN caching management. In: Network Operations and Management Symposium (APNOMS), 18th AsiaPacific. IEEE, pp. 1–6 (2016) 3. Heiko, S., Detlev, M., Thomas, W.: Overview of the scalable video coding extension of the H. 264/AVC standard. IEEE Trans. Circ. Syst. Video Technol. 17(9), 1103–1120 (2007) 4. Hu, F., Hao, Q., Bao, K.: A survey on software-defined network and openflow: from concept to implementation. IEEE Commun. Surveys Tutorials 16(4), 2181–2206 (2014) 5. Chanda, A., Westphal, C.: ContentFlow: adding content primitives to software defined networks. In: 2013 IEEE Global Communications Conference (GLOBECOM), pp. 2132–2138. IEEE, December 2013 6. Chang, D., Kwak, M., Choi, N., Kwon, T., Choi, Y.: C-flow: an efficient content delivery framework with OpenFlow. In: The International Conference on Information Networking 2014 (ICOIN2014), pp. 270–275. IEEE, 2014 February 7. Luo, H., Cui, J., Chen, Z., Jin, M., Zhang, H.: Efficient integration of software defined networking and information-centric networking with CoLoR. In: 2014 IEEE Global Communications Conference, pp. 1962–1967. IEEE, December 2014 8. Gao, S., Zeng, Y., Luo, H., Zhang, H.: Scalable area-based hierarchical control plane for software defined information centric networking. In: 2014 23rd International Conference on Computer Communication and Networks (ICCCN), pp. 1–7. IEEE, August 2014 9. Shailendra, S., Panigrahi, B., Rath, H.K., Simha, A.: A novel overlay architecture for information centric networking. In: 2015 Twenty First National Conference on Communications (NCC), pp. 1–6. IEEE, February 2015 10. Famaey, J., Latre, S., Bouten, N., Van de Meerssche, W., De Vleeschauwer, B., Van Leekwijck, W., De Turck, F.: On the merits of svc-based http adaptive streaming. In: 2013 IFIP/IEEE International Symposium on Integrated Network Management (IM 2013), pp. 419–426, May 2013 11. Sanchez, Y., Schierl, T., Hellge, C., Wiegand, T., Hong, D., De Vleeschauwer, D., Van Leekwijck, W., Le Louedec, Y.: iDASH: improved dynamic adaptive streaming over http using scalable video coding. In: Proceedings of the Second Annual ACM Conference on Multimedia Systems, New York, NY, USA, 2011, MMSys 2011, pp. 257–264, ACM (2011)
12. Sanchez, Y., Schierl, T., Hellge, C., Wiegand, T., Hong, D., De Vleeschauwer, D., Van Leekwijck, W., Le Louedec, Y.: Efficient http-based streaming using scalable video coding. Sig. Proces. Image Commun. 27(4), 329–342 (2012) 13. Lederer, S., Mueller, C., Timmerer, C., Hellwagner, H.: Adaptive multimedia streaming in information-centric networks. Network. IEEE 28(6), 91–96 (2014) 14. Yu, Y., Bronzino, F., Fan, R., Westphal, C., Gerla, M.: Congestion-aware edge caching for adaptive video streaming in information-centric networks. In: IEEE Consumer Communications & Networking Conference (CCNC), 2015. 2011 Proceedings of 20th International Conference on Computer Communications and Networks (ICCCN), pp. 1–6, July 2011 15. Jiang, J., Sekar, V., Zhang, H.: Improving fairness, efficiency, and stability in http-based adaptive video streaming with festive. In: Proceedings of the 8th International Conference on Emerging Networking Experiments and Technologies, pp. 97–108, December 2012 16. Andelin, T., Chetty, V., Harbaugh, D., Warnick, S., Zappala, D.: Quality selection for dynamic adaptive streaming over http with scalable video coding. In: Proceedings of the 3rd Multimedia Systems Conference, New York, NY, USA, 2012, MMSys 2012, pp. 149–154. ACM (2012) 17. Jacobson, V., Smetters, D.K., Hornton, J.D., Plass, M.F., Briggs, N.H., Braynard, R.L.: Networking named content. In: Proceedings of ACM CoNEXT (2009) 18. Kleinrouweler, J.W., Cabrero, S., Cesar, P.: Delivering stable high-quality video: an SDN architecture with DASH assisting network elements. In: MMSys 2016, Klagenfurt, Austria (2016) 19. Cofano, G., De Cicco, L., Zinner, T., Nguyen-Ngoc, A., Tran-Gia, P., Mascolo, S.: Design and experimental evaluation of network-assisted strategies for HTTP adaptive streaming. In: MMSys 2016, Klagenfurt, Austria (2016) 20. Naman, A.T., Wang, Y., Gharakheili, H.H., Sivaraman, V., Taubman, D.: Responsive high throughput congestion control for interactive applications over SDN-enabled networks. Comput. Netw. 134, 152–166 (2018) 21. Georgopoulos, P., Broadbent, M., Farshad, A., Plattner, B., Race, N.: Using software defined networking to enhance the delivery of video-on-demand. Comput. Commun. 69, 79–87 (2015) 22. Jmal, R., Fourati, L.C.: Assisted DASH-aware networking over SDN–CCN architecture. Photonic Netw. Commun. 38(1), 37–50 (2019) 23. Fekih, A., Fantar, S.G., Youssef, H.: SDN-based replication management framework for CCN networks. In: Workshops of the International Conference on Advanced Information Networking and Applications, pp. 83–99. Springer, Cham, April 2020. https://doi.org/10. 1007/978-3-030-44038-1_9 24. Fekih, A., Fantar, S.G., Youssef, H.: Secure SDN-based in-network caching scheme for CCN. In: 13th International Conference on Systems and Networks Communications (ICSNC) pp. 21–28, October 2018 25. Fekih, A., Gaied, S., Youssef, H.: Proactive content caching strategy with router reassignment in content centric networks based SDN. In: 2018 IEEE 11th Conference on Service-Oriented Computing and Applications (SOCA), pp. 81–87. IEEE, November 2018 26. Li, W., Sharief, O., Fayed, M., Hassanein, H.S.: Bitrate adaptation-aware cache partitioning for video streaming over information-centric networks. In: 2018 IEEE 43rd Conference on Local Computer Networks (LCN), pp. 401–408. IEEE, October 2018 27. Xu, C., Jia, S., Zhong, L., Muntean, G.M.: Socially aware mobile peer-to-peer communications for community multimedia streaming services. IEEE Commun. Mag. 53(10), 150–156 (2015) 28. 
Wang, M., Xu, C., Jia, S., Guan, J., Grieco, L.A.: Preference-aware fast interest forwarding for video streaming in information-centric VANETs. In: 2017 IEEE International Conference on Communications (ICC), pp. 1–7. IEEE, 2017 May
29. Wang, Y., Orapinpatipat, C., Gharakheili, H.H., Sivaraman, V.: TeleScope: flow-level video telemetry using SDN. In: 2016 Fifth European Workshop on Software-Defined Networks (EWSDN), Den Haag, Netherlands, pp. 31–36 (2016) 30. Alkhazaleh, M., Aljunid, S., Sabri, N.: A review of caching strategies and its categorizations in information centric network. J. Theor. Appl. Inf. Technol. 97, 19 (2019)
Dijkstra and A* Algorithms for Global Trajectory Planning in the TurtleBot 3 Mobile Robot Pedro Medeiros de Assis Brasil1(B) , Fabio Ugalde Pereira1 , Marco Antonio de Souza Leite Cuadros2 , Anselmo Rafael Cukla1 , and Daniel Fernando Tello Gamarra1 1 Universidade Federal de Santa Maria, Santa Maria, RS 97105-900, Brazil
{anselmo.cukla,daniel.gamarra}@ufsm.br
2 Instituto Federal do Espírito Santo, Serra, ES 29173-087, Brazil
[email protected]
Abstract. In this work, two global path planners will be evaluated, specifically the Dijkstra and A* algorithms. For this evaluation, a mobile robot that runs ROS (Robot Operating System) will be used. Tests were carried out in two different environments (symmetric and asymmetric) in order to evaluate the performance of both algorithms. The mobile robot used was the TurtleBot 3 Burger, which has open source software. The results showed a small advantage of the Dijkstra algorithm over the A* algorithm.
Keywords: ROS · Mobile robotics · Navigation · Dijkstra · A* · TurtleBot
1 Introduction

In recent years, the advancement of technology and the use of more powerful embedded systems have allowed the development of software for mobile robotics. This paves the way for the construction of new mobile robots with greater computational processing and navigation capabilities [1]. However, it is clear that it is still necessary to improve the methods for navigation in static and dynamic environments. In mobile robotics, in order to reach its objective, the robot needs to make correct decisions regarding obstacle avoidance and adjustments of kinematic parameters (speed, rotation, etc.), using only the available computational power [2]. This strategy is known as autonomous navigation, and it is the motivation behind this paper, since there are not many comparative studies regarding the performance of global planners for mobile robots. Studies that carry out a quantitative and qualitative analysis of global trajectory planning algorithms applied to a specific mobile robot, evaluating their performance in different environments, are even scarcer. Autonomous navigation studies techniques for the safe locomotion of mobile robots in different environments; trajectory generation or planning is one of its direct applications. Trajectory planning can be divided into two types: global trajectories
and local trajectories. As a crucial part of the autonomous navigation problem, the trajectory planning algorithm must always find the best path to reach the target [3]. Among global path planning algorithms, for this work, two algorithms were chosen to perform a comparative analysis, Dijkstra [4] and A* [5] algorithms. These two algorithms were chosen due to their high popularity among ROS users. Recently, a lot of research has been carried out on the study of path planning applied to mobile robots. An example is the work carried out by Fahleraz in [6], in which an analysis of three different path-finding algorithms (Breadth-First Search, Dijkstra’s, and A*) in the context of a grid-based path-finding was made. He also presents a performance comparison for each of those algorithms and concluded that A* algorithm is consistently the fastest by a significant margin. Dirik, Kocamaz and Castillo in [7] present a novel design for the kinematic control structure of the wheeled mobile robot (WMR) path planning and path-following. The proposed system is focused on the implementation of practical real-time model-free algorithms based on visual servoing. Ammar et al. in [8] propose two new time-linear relaxed versions of Dijkstra and A* algorithms to solve the global path planning problem in large grid environments. Avelar, Castillo and Soria in [9] made a fuzzy logic controller with the Python fuzzylab library which is based on the Octave Fuzzy Logic Toolkit, and with the Robot Operating System (ROS) for autonomous navigation of the TurtleBot3 robot on a simulated and a real environment using a LIDAR sensor. Thus, it is intended to conduct experiments with the TurtleBot 3 robot, where the performance of these global navigation algorithms will be evaluated. The performance of local trajectory planning algorithms will not be evaluated. However, for the experimental tests, the Dynamic Window Approach (DWA) local planner algorithm will be used. The evaluation will be carried out in two different environments. The paper is divided in 6 sections, the first one is a brief introduction to the problem, the second section explains the theoretical basis, the third section describes the experimental setup, the fourth section shows the adopted methodology, the fifth section presents the main results, finally, the last section summarizes the conclusions of the work.
2 Theoretical Background

2.1 A* Algorithm

A* (pronounced “A-star”) is a graph traversal and path search algorithm [5], often perceived as an extension of the Dijkstra algorithm [4]. A* is a search algorithm formulated on weighted graphs. This means that, starting from a specific node in a graph, the algorithm aims at finding a path towards the goal node with the smallest cost value possible (shortest travelled distance, shortest time, etc.). It is capable of doing this by maintaining a tree of paths originating from the starting node and extending them one node at a time until its completion criterion is met. This algorithm performs well using heuristics to guide its search; however, a prevailing practical disadvantage is its spatial complexity, due to the fact that it stores every
generated node in memory. The algorithm terminates when the route it chooses to follow lets the robot go from the starting node to the target node, or whenever there are no feasible paths available. The heuristic function is problem-specific.

2.2 Dijkstra Algorithm

One of the main differences between the Dijkstra and A* algorithms is the lack of a heuristic function that reduces the number of expanded nodes. Thus, in each iteration of the Dijkstra algorithm, the adjacent nodes are evaluated. This assessment does not depend on the order or prioritization of branches, unlike the A* algorithm, which uses a problem-specific heuristic function. Once a vertex (node) is chosen as the root of the search, this algorithm calculates the minimal cost to the other vertices of the graph. The term ‘graph’ refers to a pair.
(1)
In which N represents a set of vertices and E represents a set of pairs of vertices, called ‘edges’. At any given node of the graph, the algorithm looks for the connection with the smallest cost (i.e. the shortest path) between this node and every other node, analyzing the weight/cost of the edge. This mechanism can also be utilized to estimate the costs of paths from a starting node to the destination node, terminating the algorithm once the shortest path is found.
3 Experimental Setup 3.1 Robot Operating System (ROS) The Robot Operating System (abbreviated as ROS) is a set of tools, libraries and conventions for the development of robotics software [10]. Developed by Stanford University in the early 2000s, ROS is used worldwide in different applications. It also has an open-source ecosystem that facilitates robotics programming through the contribution of thousands of users [11, 12]. In this work, ROS was used to access sensor reading variables, implement the navigation algorithms of global planning and also, to obtain the parameters of the robot movement. The system works through ‘nodes’ and ‘topics’ (services) that can publish results or subscribe to entries on other nodes. Thus, through the creation of a service server (master node), all functionalities used in experiments were identical, allowing for data validation and comparison. 3.2 TurtleBot 3 Burger TurtleBot is a mobile robot whose hardware and software is open source, and is also a low-cost robot. It was developed with the objective of carrying out several activities, without needing to add external components. This robot is used for autonomous and
real-time navigation in different environments, running Simultaneous Localization and Mapping (SLAM) algorithms to map the environment [13]. To compare the performance of the trajectory planning algorithms, the mobile robot TurtleBot 3 Burger will be used, as it is a programmable robot that already has ROS installed. It has two DYNAMIXEL XL-430-W350-T servo motors, a Raspberry Pi, an OpenCR control board, a 360-degree laser distance sensor and a lithium polymer battery [14, 15]. Figure 1 shows this robot and its specifications.
Fig. 1. TurtleBot 3 Burger, taken from [16].
3.3 Test Environments

Robot navigation can be used in structured and unstructured environments; the algorithms employed in this work will be analyzed in structured environments [17]. Two square indoor environments were built, whose dimensions in the xy plane are (3, 3) meters. The environments have internal obstacles. The first environment (Fig. 2) has a symmetrical obstacle in the shape of the letter U. The second environment has an asymmetric arrangement of obstacles (Fig. 3). The yellow point in the figures represents the target and the green landmarks are used as a distance reference for the robot trajectory.
Fig. 2. First environment.
Fig. 3. Second environment.
4 Methodology For the robot to reach the target, ROS uses two displacement systems, or two types of planners, one for global trajectory planning and the other for local trajectory planning. With the local planner, TurtleBot can avoid obstacles that come up on the course and that are not mapped before. This does not happen during the tests, as there are no dynamic obstacles. The global planner estimates the shortest or fastest trajectory based only on information from the map. Since the objective of this work is to compare the efficiency of global planners, two different algorithms from this topic were used to generate the robot’s trajectory, maintaining the same method (Dynamic Window Approach) [18] for the local planner. The first algorithm is based on the Dijkstra algorithm and is the standard ROS method. The second algorithm was implemented externally using a plugin to run on ROS. This plugin is based on the A* (A-star) algorithm. Some parameters are used to improve calculations of the robot’s trajectories, and thus avoid getting stuck in tight paths. Table 1 below presents the chosen values for these parameters. For experiments with the algorithms, ten tests were proposed for each map, in order to find the shortest path. Five of these tests used Dijkstra algorithm and five used the A* algorithm. One of the maps has a U-shaped obstacle, presenting a symmetrical scenario where paths taken on both sides are similar and thus evaluating the decision capacity of the algorithms, and the other has randomly arranged obstacles, representing a real world scenario, as shown in Fig. 2 and Fig. 3. This approach provides significant data on the use of these algorithms on different situations, therefore it represents in a compelling manner the behavior of Dijkstra and A* applied to mobile robots. All experiments were performed using the same starting point, and the target was always the same end point. By doing so, it was possible to compare the precision rate of each algorithm, both for position and orientation. For each test of the TurtleBot 3 Burger, the odometer data, the target coordinate and the path taken by the robot were recorded.
Table 1. Parameters used in the experiments.

Parameter             Value   Description
controller_frequency  10      The frequency at which each controller will be triggered in Hz
cost_scaling_factor   35      Indicates the inflation decline curve of a more pronounced obstacle
inflation_radius      0.15    Indicates how far the zero cost point is from an obstruction
max_vel_theta         1.0     The maximum allowed rotational speed in radians per second
max_vel_x             0.18    The maximum permitted linear speed in meters per second
sim_time              1.0     The amount of future time that a path will be simulated in seconds
vx_samples            18      The number of travel speed samples that will be stored in the x direction
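Before turning to the results, the relationship between the two global planners of Sect. 2 can be made concrete with a short sketch: A* expands grid cells in order of g(n) + h(n), and setting the heuristic weight to zero makes the expansion order identical to Dijkstra's algorithm. The grid layout, unit edge costs and function names below are illustrative assumptions, not the actual planners shipped with the ROS navigation stack.

```cpp
#include <cstdlib>
#include <queue>
#include <vector>

struct Node { int idx; double f; };
struct Cmp  { bool operator()(const Node& a, const Node& b) const { return a.f > b.f; } };

// A* over a 4-connected occupancy grid (0 = free, 1 = obstacle).
// heuristicWeight = 1 -> A* with a Manhattan heuristic; 0 -> Dijkstra.
// Returns the optimal path cost from 'start' to 'goal', or -1 if unreachable
// (path reconstruction is omitted for brevity).
double planCost(const std::vector<int>& grid, int width, int height,
                int start, int goal, double heuristicWeight) {
    auto h = [&](int idx) {                                // Manhattan distance to goal
        int dx = std::abs(idx % width - goal % width);
        int dy = std::abs(idx / width - goal / width);
        return heuristicWeight * (dx + dy);
    };
    std::vector<double> g(grid.size(), 1e30);
    std::priority_queue<Node, std::vector<Node>, Cmp> open;
    g[start] = 0.0;
    open.push({start, h(start)});
    const int dx[4] = {1, -1, 0, 0}, dy[4] = {0, 0, 1, -1};
    while (!open.empty()) {
        Node cur = open.top(); open.pop();
        if (cur.idx == goal) return g[goal];               // optimal cost reached
        if (cur.f > g[cur.idx] + h(cur.idx)) continue;     // stale queue entry
        int x = cur.idx % width, y = cur.idx / width;
        for (int k = 0; k < 4; ++k) {                      // expand 4-connected neighbours
            int nx = x + dx[k], ny = y + dy[k];
            if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
            int nidx = ny * width + nx;
            if (grid[nidx] == 1) continue;                 // skip obstacles
            double tentative = g[cur.idx] + 1.0;           // unit edge cost
            if (tentative < g[nidx]) {
                g[nidx] = tentative;
                open.push({nidx, tentative + h(nidx)});
            }
        }
    }
    return -1.0;                                           // no feasible path
}
```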
5 Results

This section shows the results obtained with the Dijkstra and A* algorithms, gathered through the experiments using the TurtleBot 3 robot.

5.1 Dijkstra Algorithm

Five experiments were performed for each of the two environments shown in Fig. 2 and Fig. 3, using the same parameters shown in Table 1 for navigation. Figure 4 and Fig. 5 depict an image sequence that shows different points observed during the robot trajectory in one of the experiments. This trajectory was compared with the odometry provided by the TurtleBot 3 sensors, giving more reliability to the collected data.
Fig. 4. Image sequence using the Dijkstra algorithm in the U shaped environment.
Fig. 5. Image sequence using the Dijkstra algorithm in the environment with randomized obstacles.
Figure 6 and Fig. 7 show some trajectories for the Dijkstra algorithm for the first and second environment, respectively. The point (0, 0) of the figure was defined as the starting point of the robot and the red point symbolizes the position of its target.
Fig. 6. TurtleBot trajectories using the Dijkstra algorithm in the first environment in meters.
Fig. 7. TurtleBot trajectories using the Dijkstra algorithm in the second environments in meters.
5.2 A* Algorithm

In the experiments with the A* algorithm, the same methodology employed with the Dijkstra algorithm was applied. The navigation parameters were the same as in the previous experiments and are shown in Table 1. Five experiments were performed for the U-shaped environment, as well as five experiments for the environment with randomly distributed obstacles. Figure 8 and Fig. 9 show a sequence of images of the trajectories generated by the A* algorithm in the first and second environments. The yellow points in the images represent the targets, the green points are used to obtain a relation between the image pixels and the distance in meters, and the blue and red points help to find the robot position and store the trajectory followed by the robot.
Fig. 8. Image sequence using the A* algorithm in the U shaped environment.
Fig. 9. Image sequence using the A* algorithm with random obstacles.
The trajectory figures of some experiments with the A* algorithm for the first and second environments can be seen in Fig. 10 and Fig. 11, respectively. In all the figures the point (0, 0) is defined as the place where the robot starts its trajectory and the red point represents the target.
Fig. 10. TurtleBot 3 robot trajectories using the A* algorithm in the first environment in meters.
Fig. 11. TurtleBot 3 robot trajectories using the A* algorithm in the second environment in meters.
5.3 Dijkstra Versus A*

Analyzing the data from all the experiments, it is possible to observe a subtle advantage of the Dijkstra algorithm, which usually reaches the objective in less time and with higher precision. Table 2 shows the data acquired during the experiments.

Table 2. Comparison between the Dijkstra and A* algorithms.

Metric                                              | Dijkstra Map 1 | A* Map 1 | Dijkstra Map 2 | A* Map 2
Average time to arrive at the goal (s)              | 29             | 39       | 21             | 22
Average position error in relation to the goal (m)  | ±0.02          | ±0.03    | ±0.035         | ±0.035
Average orientation error (degrees)                 | ±2             | ±4       | ±3             | ±7
6 Conclusion

The tests with the TurtleBot 3 robot took place as planned and had a suitable outcome. Both the Dijkstra algorithm and the A* algorithm were able to reach the designated target in satisfactory times for the problem tested. The Dijkstra algorithm, implemented from the ROS libraries, yielded superior results to the A* algorithm regarding precision and the time spent to reach the target location. This is probably because the Dijkstra algorithm is broadly used in navigation systems that run ROS and, therefore, is more optimized and robust for the tested applications. The main advantage of the A* algorithm over Dijkstra is usually the processing time required to find the optimal path; however, because the maximum speed of the robot in these experiments was not high, the processing time was not a decisive factor in finding the best path in a minimum time. For future work, other path planning methods could also be added to the comparisons, such as Deep Deterministic Policy Gradient (DDPG), Deep-RL Soft Actor-Critic (SAC), Depth-First Search (DFS), Breadth-First Search (BFS), Bellman-Ford, Wavefront or the Rapidly-Exploring Random Trees (RRT) algorithms.
References

1. Perez, J.A., Deligianni, F., Ravi, D., Yang, G.: Artificial intelligence and robotics. UK-RAS White Paper Series on Robotics and Autonomous Systems (RAS) (2018) 2. Thrun, S., Burgard, W., Fox, D.: Probabilistic Robotics, chap. 7–9. MIT Press (2005) 3. Pittner, M., Hiller, M., Particke, F., Patino-Studencki, L., Thielecke, J.: Systematic analysis of global and local planners for optimal trajectory planning. In: ISR, Munich (2018) 4. Dijkstra, E.W.: A note on two problems in connexion with graphs. Numer. Math. 1, 269–271 (1959). https://doi.org/10.1007/BF01386390 5. Hart, P.E., Nilsson, N.J., Raphael, B.: A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Syst. Sci. Cybern. 2, 100–107 (1968). https://doi.org/10.1109/TSSC.1968.300136 6. Fahleraz, F.: A comparison of BFS, Dijkstra's and A* algorithm for grid-based path-finding in mobile robots. Sekolah Teknik Elektro dan Informatika, Indonesia (2017) 7. Dirik, M., Kocamaz, A.F., Castillo, O.: Global path planning and path-following for wheeled mobile robot using a novel control structure based on a vision sensor. Int. J. Fuzzy Syst. 22, 1880–1891 (2020) 8. Ammar, A., Bennaceur, H., Châari, I., Koubâa, A., Alajlan, M.: Relaxed Dijkstra and A* with linear complexity for robot path planning problems in large-scale grid environments. In: Soft Computing 2015. Springer, Heidelberg (2015) 9. Avelar, E., Castillo, O., Soria, J.: Fuzzy logic controller with fuzzylab python library and the robot operating system for autonomous robot navigation: a practical approach. In: Intuitionistic and Type-2 Fuzzy Logic Enhancements in Neural and Optimization Algorithms: Theory and Applications, vol. 862, pp. 355–369. Springer, Cham (2020) 10. About ROS. https://www.ros.org/about-ros/. Accessed 13 Sept 2019 11. Fairchild, C., Harman, T.L.: ROS Robotics by Example - Second Edition: Learning to Control Wheeled, Limbed, and Flying Robots Using ROS Kinetic Kame, 2 edn. Packt Publishing, Houston (2017)
12. Da Silva, E.S., Cuadros, M.A.S.L., Legg, A.P., Gamarra, D.F.T.: Sensory integration of a mobile robot using the embedded system Odroid-XU4 and ROS. In: Latin American Robotics Symposium (LARS) 2019, Rio Grande, Brazil (2019) 13. Turtlebot 3: Robotis. https://emanual.robotis.com/docs/en/platform/turtlebot3/overview/. Accessed 20 May 2019 14. Ackerman, E.: Robotis and OSRF announce TurtleBot 3: smaller, cheaper, and modular. IEEE Spectrum. https://spectrum.ieee.org/automaton/robotics/diy/robotis-and-osrf-announce-turtlebot-3-smaller-cheaper-and-modular. Accessed 21 May 2019 15. Jesus, J.C., Bottega, J.A., Cuadros, M.A.S.L., Gamarra, D.F.T.: Deep deterministic policy gradient for navigation of mobile robots in simulated environments. In: International Conference on Advanced Robotics (ICAR) 2019, Belo Horizonte, Brazil (2019) 16. ROS Components, TurtleBot 3 Burger. https://www.roscomponents.com/en/mobile-robots/214-turtlebot-3.html. Accessed 29 May 2020 17. Dos Reis, D.H., Welfer, D., Cuadros, M.A.S.L., Gamarra, D.F.T.: Mobile robot navigation using an object recognition software with RGBD images and the YOLO algorithm. Appl. Artif. Intell. 33(14), 1290–1305 (2019) 18. Fox, D., Burgard, W., Thrun, S.: The dynamic window approach to collision avoidance. IEEE Robot. Autom. Mag. 4, 23–33 (1997)
A Data Mining Framework for Response Modelling in Direct Marketing

Fátima Rodrigues1(B) and Tiago Oliveira2

1 Interdisciplinary Studies Research Center, Institute of Engineering - Polytechnic of Porto (ISEP/IPP), Porto, Portugal
[email protected]
2 Institute of Engineering - Polytechnic of Porto (ISEP/IPP), Porto, Portugal
[email protected]
Abstract. The main objective of response modelling is to identify the customers most likely to respond to a direct advertisement. This paper introduces a comprehensive methodology for response modelling that addresses two practical difficulties: large training data and class imbalance. First, several pre-processing techniques are applied in order to remove non-significant attributes. Next, two wrapper feature selection methods (Ensemble Random Forest and the Relief algorithm) are combined with two balancing methods (random under-sampling and the SMOTE method) and validated with several models. The experimental results demonstrate that random under-sampling outperforms SMOTE, and that the Relief algorithm gives better results than the Ensemble Random Forest when combined with all the classification algorithms considered.

Keywords: Response modelling · Feature selection · Data balancing · Classification

1 Introduction
Mass marketing reaches the masses simultaneously and in many ways, using mass media such as television, radio and newspapers, to reach as broad an audience as possible. Direct marketing, on the other hand, emphasizes focus on the customer through a multitude of channels, including mail, e-mail, phone, and in person. Direct marketing messages should be used carefully, because their misuse (messages of no interest to the customers or an excessive number of messages) can have the opposite of the desired effect: customer loss. One approach to improve direct message targeting is response modelling, i.e. a predictive modelling approach to identify customers who are most likely to respond to a campaign based on customers' demographic and behavioral data. A well-developed response model will target only customers with relatively high purchase likelihood. Therefore, it can not only increase the profit and lower the marketing costs, but also strengthen customer loyalty, and so improve return on investment, customer relationships and retention.
Given the interest in this domain, several approaches for response modelling have been reported over the last decades; they can be divided into statistical methods, artificial intelligence based methods and hybrid approaches [1]. Logistic regression has been widely employed as a base model because of its simplicity and availability. Besides logistic regression, models based on RFM attributes [2] are also largely used. Another group of quantitative models is based on machine learning and includes applications of ANNs, random forests [3], Bayesian networks and support vector machines for determining target customers [4]. Ensemble and hybrid techniques, which combine the results of several models based on certain principles with the purpose of eliminating the bias of single techniques, are also extensively used [5]. Nevertheless, response modelling consists of several tasks, such as data selection, data cleaning, feature selection, class balancing, classification, and evaluation. Various data mining techniques and algorithms have been applied to implement each step. According to our review of the literature, very few articles deal with all these steps; most of them focus only on two or three steps of modelling and the rest use the results of previous works. Here we shall address all of its phases. There are two challenges commonly faced when dealing with marketing data: imbalanced data, with the number of non-respondent users significantly larger than that of respondent users; and high dimensionality, due to the large variety of data that has been collected. Consequently, we have explored various techniques to resolve these issues through multiple sampling methodologies and modelling mechanisms to identify hidden behaviour patterns. So, we start from a real-world problem and data, extract features that work well with such data, apply techniques to address specific data issues, develop several models and evaluate them. In summary, we cover the whole end-to-end process. For the rest of the paper, we first describe the characteristics of the marketing data and the initial cleaning operations made on it. Then, in Sect. 3, filter selection methods are first employed in order to remove redundant features; next, two wrapper selection methods are presented, the Relief algorithm and the ensemble feature selection algorithm. In Sect. 4, to overcome the imbalanced class distribution, two balancing methods that use different sampling strategies are described and applied to the data. In the following section, the model building and testing process is described and the results are discussed. Finally, we conclude the paper and lay out future directions.
2 Data Description and Cleaning
The data that will be used to build the response models is from Springleaf, an American financial services company whose primary businesses are consumer lending, credit insurance, and other credit related products (https://www.springleaf.com). Springleaf published on Kaggle two real datasets (train and test) of almost 1 GB each and made them
publicly available for competition purposes (https://www.kaggle.com/c/springleaf-marketing-response). The main task released with the data is to predict which customers are potentially interested in a loan and are likely to respond to a direct mail offer. Both datasets contain 145231 clients and only the training set contains the target attribute. In this dataset 23.3% of the clients responded to the mail offer. There are a total of 1934 features that have been anonymized to protect privacy. The meaning of the features, their values and their types are provided "as-is". As mentioned earlier, we face two challenges with such a data set: 1) data imbalance, where the percentage of responders is only 23%; and 2) high dimensionality, due to the large variety of features and cases. With these data characteristics, most modelling techniques will have a hard time obtaining any meaningful result. In this context, we start by cleaning the data. For that, we search for irrelevant features with no useful information, such as features with no distinct values or, at the opposite extreme, key-like features with all their values distinct. Features with no information (all values null) and duplicated features were also eliminated. These operations led to the elimination of 65 features. Observations with a large number of unknown values are useless, and due to the large number of unknown values there are no reliable methods to fill these missing values with high quality. Therefore, we eliminate rows with more than 30% of columns with unknown values, which led to an elimination of 30421 rows. After these cleaning operations the dataset retains 114810 clients and a total of 1869 attributes: 16 character attributes, 1837 numeric attributes and 16 date attributes. The date attributes were codified to their corresponding numeric representation, because the common date format is often not supported by feature selection algorithms. In large datasets it is important to select the best subset of features for modelling, because irrelevant features may increase the running time and make models more complex. For this study, feature selection was performed in several steps using both filter and wrapper methods. The following section describes these steps, applied only to numerical features, since all of these methods only work with numerical attributes.
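The cleaning operations described above can be sketched with pandas as follows. The data frame and thresholds are placeholders (the Springleaf attributes are anonymized), but the operations mirror the ones applied: dropping constant, all-null, key-like and duplicated columns, and dropping rows with more than 30% missing values.

```python
import pandas as pd

def clean(df: pd.DataFrame, row_missing_threshold: float = 0.30) -> pd.DataFrame:
    """Remove uninformative columns and rows with too many missing values."""
    nunique = df.nunique(dropna=True)
    constant_cols = nunique[nunique <= 1].index          # no distinct values (or all null)
    key_like_cols = nunique[nunique == len(df)].index    # every value distinct (ID-like)
    df = df.drop(columns=constant_cols.union(key_like_cols))
    df = df.loc[:, ~df.T.duplicated()]                   # duplicated columns
    row_missing = df.isna().mean(axis=1)                 # fraction of missing values per row
    return df.loc[row_missing <= row_missing_threshold]

# Example with a tiny synthetic frame standing in for the anonymized Springleaf data
raw = pd.DataFrame({
    "id": [1, 2, 3, 4],               # key-like, dropped
    "const": [7, 7, 7, 7],            # constant, dropped
    "x1": [1.0, None, 3.0, 4.0],
    "x2": [1.0, None, 3.0, 4.0],      # duplicate of x1, dropped
    "x3": [None, None, 2.0, 5.0],
})
print(clean(raw))
```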
3 Feature Selection
Feature selection is a critical step in response modelling because of the high dimensionality of the related datasets. Feature selection involves searching the space of possible features (variables) to identify a subset that is more relevant for the analysis of the data. Generally, there are two types of approaches to feature selection: filters, applied as a preprocessing step independent of the induction algorithm; and wrappers, a process that explicitly makes use of the actual target learning algorithm to evaluate the usefulness of inputs [6]. Feature wrappers often achieve better results than filters because they are tuned to the specific interaction between an
induction algorithm and its training data, but have a clear overhead in terms of computational resources. When the number of features is very large, the best approach is to start with filter models, due to their computational efficiency, followed by wrapper methods. In this study, due to the large number of features, the wrapper methods alone could not be used. So, before using wrapper approaches, a subset of features is eliminated using filter methods.
3.1 Simple Filters Based on Distribution Properties
The first filter applied to the numerical data diagnoses features that have very few unique values occurring with very low frequencies. These "near-zero variance predictors" have a single value for the vast majority of the samples, with the other unique values occurring at severely disproportionate frequencies. Looking at the data we find 429 numerical features with near-zero variance. After this elimination we check the inter-quartile range (IQR) of the remaining features, except binary features, and observe that a large proportion of the features have IQRs near zero. Features with very low variability will not be useful in discriminating between responders and non-responders, so we safely remove 276 features with zero IQR. In general, highly correlated predictors measure the same underlying information and add more complexity to the model than the information they provide, so they must be avoided. In order to eliminate only highly correlated predictors, we calculate all pairwise correlations and drop those greater than 0.90, which removes 482 features from the dataset. We are left with 682 features (650 numerical, 16 categorical and 16 date attributes) from the initial 1934 features. This is a rather significant reduction. Nevertheless, we are still far from a dataset that is "manageable" by most classification models.
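A compact sketch of these three distribution-based filters is given below. The near-zero-variance rule is expressed with a common caret-style heuristic (frequency ratio and percentage of unique values); the exact thresholds are assumptions, since they are not stated here, while the 0.90 correlation cut-off follows the text.

```python
import numpy as np
import pandas as pd

def distribution_filters(num: pd.DataFrame, corr_threshold: float = 0.90) -> pd.DataFrame:
    """Drop near-zero-variance, zero-IQR and highly correlated numeric features."""
    keep = []
    for col in num.columns:
        counts = num[col].value_counts(dropna=True)
        freq_ratio = counts.iloc[0] / counts.iloc[1] if len(counts) > 1 else np.inf
        pct_unique = 100.0 * num[col].nunique() / len(num)
        near_zero_var = freq_ratio > 19 and pct_unique < 10   # caret-style heuristic (assumed thresholds)
        q1, q3 = num[col].quantile([0.25, 0.75])
        is_binary = num[col].nunique(dropna=True) <= 2        # the IQR rule skips binary features
        if not near_zero_var and (is_binary or (q3 - q1) > 0):
            keep.append(col)
    reduced = num[keep]
    corr = reduced.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))   # upper triangle only
    to_drop = [c for c in upper.columns if (upper[c] > corr_threshold).any()]
    return reduced.drop(columns=to_drop)

# Example: apply to a random numeric frame standing in for the Springleaf features
demo = pd.DataFrame(np.random.default_rng(0).normal(size=(200, 8)), columns=list("abcdefgh"))
demo["dup"] = demo["a"] * 1.0001          # nearly identical to "a", removed by the correlation rule
print(distribution_filters(demo).columns.tolist())
```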
3.2 Wrappers Filters
In addition to studying the effects of individual independent variables, it is also important to study their interactions. After construction of the individual independent variables, the next step is to select features (predictors) that capture direct mail user response in an effective manner. Two wrapper approaches will be considered: the Relief algorithm and an ensemble feature selection method. In order to get the most representative features, we first determine an adequate number (N) of features for the data set. For that, the Random Forest algorithm is applied to all the data. This algorithm measures the importance of the features by randomly permuting a feature in the out-of-bag samples and calculating the percentage increase in misclassification rate compared to the out-of-bag rate with all variables intact. After obtaining the importance score of the features, the most important ones are chosen. This can be accomplished using a backward elimination method: we calculate the cross-validated prediction error of models with a sequentially reduced number of predictors (ranked by variable importance) via a nested cross-validation procedure. As Fig. 1 shows, 286 features give the minimum cross validation error (0.205).
However, the variation of cross validation error between 286 and 120 features (0.206) is not significant and, as we are interested in a manageable dataset, in the next two feature selection approaches, the first 120 predictors with the highest importance will be used to build the models.
Fig. 1. Classification error rate using different numbers of features.
Relief Algorithm. The Relief algorithm [7] estimates the relevance of features according to how well their values distinguish between instances of the same and of the other class that are near each other. Given a randomly selected instance, the algorithm searches for the k nearest neighbours from the same class and the k nearest neighbours from each of the other possible classes. Based on which class the neighbours belong to, the algorithm updates the feature quality information, increasing its value if the feature separates instances of different classes well and decreasing its value in the opposite scenario. The process of random instance selection is repeated several times, where the number of iterations is pre-chosen by the user. The Relief algorithm was chosen because it scales well to large data sets compared to other methods. So, we run the Relief algorithm with 5 neighbours and select the top 120 features.

Ensemble Feature Selection. Similarly to ensemble learning, where multiple classifiers are combined to yield a better classifier, a more effective feature selection technique can be achieved by combining various feature selectors. Here an ensemble feature selection method based on the combination of several Random Forest feature selection models (Ens-RF) is built. Ensembles are learning methods that build a set of predictive models and then classify new observations using some form of combination of the predictions of these models. They are known for often outperforming the individual models that form the ensemble, because ensembles are based on some form of diversity
among the individual models. There are many forms of creating this diversity: through different model parameter settings, by considering different predictors for each model in the ensemble, or by using different samples of observations to obtain each model. The ensembles developed here follow this last strategy. This approach works better if the data from which we obtain the different models is highly redundant. We assume that the necessary degree of redundancy in our data set is achieved with the k-fold method. We split the data into 10 folds that are used to build 10 different feature selectors. The 120 most important features of each fold are selected, and for each feature we count the number of times it has been selected across all the feature selectors. The 120 features with the greatest counts are chosen.
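The ensemble selector can be sketched with scikit-learn as follows: the data is split into 10 folds, a Random Forest importance ranking is computed for each fold, the top features per fold are recorded, and the features selected most often are kept. Fitting each selector on one fold (rather than on its complement) and the synthetic demo data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from collections import Counter
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

def ensemble_rf_selection(X: pd.DataFrame, y, n_folds=10, top_k=120, random_state=0):
    """Count how often each feature ranks in the per-fold top-k RF importances."""
    votes = Counter()
    for _, idx in KFold(n_splits=n_folds, shuffle=True, random_state=random_state).split(X):
        rf = RandomForestClassifier(n_estimators=200, random_state=random_state, n_jobs=-1)
        rf.fit(X.iloc[idx], y[idx])                          # one selector per fold
        ranked = np.argsort(rf.feature_importances_)[::-1][:top_k]
        votes.update(X.columns[ranked])
    return [name for name, _ in votes.most_common(top_k)]    # features with the greatest counts

# Illustrative call on random data standing in for the numeric Springleaf features
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(500, 200)),
                      columns=[f"VAR_{i}" for i in range(200)])
y_demo = rng.integers(0, 2, size=500)
print(ensemble_rf_selection(X_demo, y_demo, top_k=20)[:5])
```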
4 Model Development and Testing

This section discusses various aspects of building and testing a model using a machine learning approach.

4.1 Data Sampling
The dataset we are using has a very reasonable size, which ensures that the values we obtain are statistically reliable. In this context, it makes sense to select the stratified hold-out method. This method consists of randomly splitting the available dataset into two disjoint partitions (typically in 70%/30% proportions), maintaining in both partitions the initial distribution of the goal attribute. One of the partitions is used for obtaining the models, while the other is used for testing them. In order to make a more careful comparison between the different classifiers, we repeat this procedure ten times, each time selecting a different training set, and compute the average performance of the metrics used. Although the test set maintains the initial distribution of responders and non-responders, the same is not kept for the training set, since most learning algorithms do not behave well with imbalanced data. In such cases, standard classifiers tend to omit the smaller class because it is not supported statistically. This is particularly problematic in situations where this minority class is exactly the most relevant class, as is our case. Therefore, before applying the learning algorithms to the data, the next section presents two strategies for overcoming class imbalance.
4.2 Class Balancing
A number of methods to overcome class imbalance have been proposed, which can be grouped into two categories, algorithm modification and data balancing. Methods based on algorithm modification bias the learning process by using specific evaluation metrics that are more sensitive to the minority class. Data balancing methods build a new training data set in which all classes are well balanced, by under-sampling the majority class, and/or over-sampling the minority
class. Several modifications of these two general sampling approaches exist. A successful example is the SMOTE method [8]. This method artificially generates new examples of the minority class using the nearest neighbours of existing minority examples. Moreover, the majority class examples are also under-sampled, leading to a more balanced dataset. Besides this method, and due to the large size of the dataset, we also employed a random under-sampling (RUS) method. Under-sampling is effective in reducing training time, but it often distorts the class distribution because a large number of majority class instances are removed. Random sampling is the simplest way to implement under-sampling and is one of the effective sampling methods to deal with the class imbalance problem. In random under-sampling, a set of majority class instances (non-responders) is selected at random and combined with all the minority class instances (responders).
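Both strategies are available, for example, in the imbalanced-learn package; the snippet below shows how a training partition could be rebalanced with either method. The data and the 1:1 target ratio are illustrative rather than the exact settings used here.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 10))                  # placeholder features
y_train = (rng.random(1000) < 0.23).astype(int)        # roughly 23% responders, as in the dataset

# Random under-sampling: keep all responders, randomly drop non-responders to a 1:1 ratio
X_rus, y_rus = RandomUnderSampler(random_state=0).fit_resample(X_train, y_train)

# Plain SMOTE only over-samples the minority class from nearest neighbours;
# the original SMOTE paper additionally under-samples the majority class.
X_sm, y_sm = SMOTE(random_state=0).fit_resample(X_train, y_train)

print(np.bincount(y_train), np.bincount(y_rus), np.bincount(y_sm))
```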
4.3 Performance Measures
The models will be evaluated on the test data, which keeps the initial imbalanced distribution of responders and non-responders. Typically, the performance of a classification model is evaluated by predictive accuracy, which reflects the agreement between the observed and predicted classes. However, this measure is not appropriate when the data is imbalanced and/or the costs of errors are very different. For a two-class prediction model, when one class is interpreted as the event of interest, the statistics sensitivity and specificity are more relevant. The sensitivity, also called the true positive rate (TPR), is the rate at which the event of interest is predicted correctly over all samples having the event. Conversely, the specificity is defined as the rate at which non-event samples are predicted as non-events. In our case, where the event of interest is to predict the responders (the minority class), it is desirable to have a model with high sensitivity, since our primary goal is to identify the responders, not the non-responders. However, the costs associated with non-responders incorrectly predicted as responders (FP), which involve the cost of mailing the campaign offer, are much lower than those associated with responders incorrectly predicted as non-responders (FN), which involve losing the business opportunity for which the campaign was conducted. So, the expected profit for the marketing application is driven by the FN costs. Assuming a fixed level of accuracy for the model, there is typically a trade-off to be made between sensitivity and specificity: intuitively, increasing the sensitivity of a model is likely to incur a loss of specificity, since more samples are being predicted as responders. Another technique for evaluating this trade-off, which is also adequate for imbalanced data, is the Receiver Operating Characteristic (ROC) curve. A ROC curve is a standard technique for summarizing classifier performance over a range of trade-offs between the true positive rate (sensitivity) and the false positive error rate (1-specificity) [9]. The Area Under the Curve (AUC) is an accepted performance metric for ROC curves and can also be used as a quantitative assessment of models.
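Given predicted responder probabilities on the held-out test partition, these measures can be computed as in the sketch below (scikit-learn; the 0.5 decision threshold and the toy labels are illustrative).

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve

y_true = np.array([1, 0, 0, 1, 0, 1, 0, 0, 1, 0])      # 1 = responder (event of interest)
y_prob = np.array([0.8, 0.2, 0.4, 0.7, 0.1, 0.3, 0.6, 0.2, 0.9, 0.5])

y_pred = (y_prob >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)            # true positive rate
specificity = tn / (tn + fp)            # true negative rate
auc = roc_auc_score(y_true, y_prob)     # area under the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_prob)  # points of the ROC curve

print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} AUC={auc:.2f}")
```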
5 Experiments
We have conducted various experiments to validate the proposed feature selection and data balancing approaches, and other aspects of the modelling process. In this phase we use four classification algorithms to develop individual response models, each of which was built based on one of the aforementioned data balancing methods combined with one of the two feature selection algorithms previously presented. We consider four classification methods during the learning step, chosen to represent a wide range of approaches: two machine learning algorithms, Random Forest (RF) and neural network (NN); a probabilistic model, Naive Bayes (NB); and a model that belongs to the class of generalized linear models, multinomial logistic regression (ML). To tune the models, 10-fold cross-validation was used to provide reasonable estimates of uncertainty. The RF model used 2000 trees in the forest. The parameter that needs tuning is mtry, the number of variables randomly sampled as candidates at each split, which was tuned over the values 1 to 6. The NN was tuned over the number of units in the hidden layer (ranging from 1 to 15), as well as the amount of weight decay (λ = 0, 0.1, 1, 2). Note that, for this last algorithm, the data was centered and scaled prior to fitting, so that attributes whose values are large in magnitude do not dominate the calculations. For the NB classifier we use the kernel density estimation method and the Laplace correction as tuning parameters. In all, sixteen models were built, with respect to two different feature sets and two different data sampling approaches. We then run each model on the independent test data set (i.e., 30%) and measure the overall predictive performance of each model. This procedure was repeated ten times, and the average performance of the metrics was computed. Together, the results of these analyses are essentially descriptive indicators of the relative predictive power of each model.
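A rough scikit-learn analogue of this experimental loop is sketched below; it is not the original setup (the original tunes caret-style parameters such as mtry and weight decay), and the grids and synthetic data are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
y = (rng.random(2000) < 0.23).astype(int)               # imbalanced placeholder target

models = {
    "RF": (RandomForestClassifier(n_estimators=500), {"max_features": [2, 4, 6]}),
    "ML": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "NB": (GaussianNB(), {"var_smoothing": [1e-9, 1e-8]}),
    "NN": (MLPClassifier(max_iter=500), {"hidden_layer_sizes": [(5,), (10,), (15,)]}),
}

# Stratified 70/30 hold-out; the paper also centers and scales inputs for the NN (omitted here)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
for name, (model, grid) in models.items():
    search = GridSearchCV(model, grid, cv=10, scoring="roc_auc")   # 10-fold CV for tuning
    search.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, search.predict_proba(X_te)[:, 1])
    print(f"{name}: test AUC = {auc:.3f}")
```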
5.1 Models Evaluation: ROC and AUC
Here we present the prediction results of all classification models. As already explained, each model attempts to correctly identify existing responders in the test dataset of customer profiles. The overall predictive performance of each model is summarized by its ROC curve and the corresponding AUC. Table 1 shows the average performance of all the models in terms of the area under the ROC curve (AUC). From Table 1 we can observe that the best performance for all the classifiers is obtained using the Relief algorithm, independently of the balancing method used, which confirms the success of this algorithm in practical applications. Concerning the balancing method, all classifiers have better results with the RUS balancing method. The worse performance of SMOTE may be due to the fact that this dataset has a lot of missing values; consequently, it is difficult to find truly nearest neighbours for a given minority sample.
Table 1. Model predictions in terms of the AUC measure.

Balancing method | Classifier | Ens-RF | Relief
RUS              | RF         | 0.748  | 0.742
RUS              | ML         | 0.693  | 0.705
RUS              | NB         | 0.688  | 0.684
RUS              | NN         | 0.505  | 0.583
SMOTE            | RF         | 0.719  | 0.738
SMOTE            | ML         | 0.670  | 0.682
SMOTE            | NB         | 0.643  | 0.684
SMOTE            | NN         | 0.486  | 0.576
Fig. 2. ROC curves of classification models combined with RUS and Relief.
The combined ROC analysis and plots for the four models achieved with each classification algorithm, combined with the RUS balancing method and Relief feature selection algorithm are plotted in Fig. 2. As can be seen from Fig. 2 the worst algorithm to model this data is NN. Based on the ROC curves it is reasonable to conclude that concerning this dataset the combination of Relief feature selection with RUS balance method and prediction using RF gives the best performance. Nevertheless, this is what we observed from the experiments and we believe that the performance of sampling approaches is largely data dependent.
6 Conclusions
In this paper, we describe the complete data mining framework necessary to develop a response model for direct marketing. The process was tested using a large dataset. The aim of this paper was twofold: First, document a structured collection of data mining techniques necessary to handle large datasets. Second, test several sampling and feature selection algorithms in a large dataset combined with different classification algorithms. Despite the techniques’ performance being highly dependent on the type of dataset and its characteristics, for this one in particular the best results were attained by combining the Relief algorithm for feature selection and using random undersampling as the balancing method with the random forest classifier. For the framework to be more generalizable, datasets with different characteristics should be incorporated in future studies. The conclusions also highlight the potential for further in-depth research to find new ways for combining base classifiers in order to maximize the ensemble prediction ability.
References 1. Bayoude, K., Ouassit, Y., Ardchir, S., Azouazi, M.: How machine learning potentials are transforming the practice of digital marketing: state of the art. Period. Eng. Nat. Sci. 6(2), 373–379 (2018) 2. Sharstniou, U.L., Vardomatskaja, E.U.: RFM-analysis as a marketing policy planning tool. In: Education and Science in the 21st Century, pp. 128–131 (2018) 3. Gupta, A., Gupta, G.: Comparative study of random forest and neural network for prediction in direct marketing. In: Applications of Artificial Intelligence Techniques in Engineering, pp. 401–410. Springer, Singapore (2019) 4. Rogic, S., Kascelan, L.: Customer value prediction in direct marketing using hybrid support vector machine rule extraction method. In: European Conference on Advances in Databases and Information Systems, pp. 283–294. Springer, Cham (2019) 5. Govindarajan, M.: Ensemble strategies for improving response model in direct marketing. Int. J. Comput. Sci. Inf. Secur. 14(9), 108 (2016) 6. Li, Y., Li, T., Liu, H.: Recent advances in feature selection and its applications. Knowl. Inf. Syst. 53(3), 551–577 (2017) 7. Kira, K., Rendell, L.A.: The feature selection problem: traditional methods and a new algorithm. AAAI 2, 129–134 (1992) 8. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority oversampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) 9. Bradley, A.P.: The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn. 30(6), 1145–1159 (1997)
WakeMeUp: Weight Dependent Intelligent and Robust Alarm System Using Load Cells

Omar Bin Samin(B), Sumaira Imtiaz, Maryam Omar, Noman Naseeb, and Samad Ali Shah

Center for Excellence in IT, Institute of Management Sciences Peshawar, Peshawar, Pakistan
{omar.samin,maryam.omar}@imsciences.edu.pk
Abstract. The race for a better life and higher living standards is a major cause of sleep deprivation, depression and irritable behaviour. Indulging in such competition yields an imbalance between rest and study/office hours. Sleeping late due to a tough routine usually results in waking up late the following morning, which also affects working schedules as well as personal and professional commitments. Once the required sleep time is determined, which varies from person to person (and across age groups), an alarm system can be synced with the daily schedule. People tend to use a variety of alarm systems available in the market, but these are not smart enough to achieve their prime objective, i.e. waking up the concerned individual. This paper proposes a smart solution that accomplishes this objective by: (1) monitoring the weight of the individual on the bed, (2) alerting at the pre-set alarm time and (3) powering ON vibrating motors and a DC light. If the sensed weight exceeds the specified threshold at alarm time, the alarm, vibrating motors and DC light are activated. To switch the alarm OFF, the sensed weight must fall below the threshold, i.e. the individual needs to be out of bed. The proposed system has been validated on a number of actual test cases and achieved up to 95% accuracy and efficiency. Keywords: Alarm system · Arduino · DC light · Eccentric rotating mass vibration motor · Instrumentation amplifier INA 125 · Load cells
1 Introduction
Oversleeping is a sleep disorder that can be connected to a person's mental health issues, like depression, or medical issues, like diabetes and heart disease [5]. According to the National Sleep Foundation (NSF), it is really important to have a sufficient amount of sleep in order to reduce the above mentioned disorders. Table 1 shows the recommended sleep durations for different age groups. A good night's sleep is vital and essential for health. Sleep and an individual's ability to function during the course of daily routine activities are related to
Table 1. NSF recommendations for sleep durations [7]

Age            | Recommended sleep (Hours) | Not recommended sleep (Hours)
0 to 3 months  | 14 to 17                  | Less than 11, more than 19
5 to 11 months | 12 to 15                  | Less than 10, more than 18
1 to 2 Years   | 11 to 14                  | Less than 9, more than 16
3 to 5 Years   | 10 to 13                  | Less than 8, more than 14
6 to 13 Years  | 9 to 11                   | Less than 7, more than 12
14 to 17 Years | 8 to 10                   | Less than 7, more than 11
18 to 25 Years | 7 to 9                    | Less than 6, more than 11
26 to 64 Years | 7 to 9                    | Less than 6, more than 10
65+ Years      | 7 to 8                    | Less than 5, more than 9
each other. Research has indicated that people who cannot get proper sleep are at greater risk of chronic diseases like high blood pressure, obesity, heart disease and diabetes. The amount of sleep recommended by the NSF (shown in Table 1) varies significantly over the course of a lifetime, depending on age, general health and habits as well as daily activity level. Conventional alarm clocks are not efficient enough at waking up individuals, thus producing performance delays and health issues. These delays/issues have a negative impact on an individual's professional growth as well as social and personal life. To overcome the oversleeping issue and the worry of waking up on time, a smart alarm clock is required. Usage of the proposed "WakeMeUp: Weight Dependent Intelligent and Robust Alarm System using Load Cells" can help in reducing undesirable delays by forcing the user to leave the bed, helping them meet their deadlines. The other objective of the proposed solution is to wake a person up in a smooth and gentle manner, allowing the body's functions and chemistry to prepare for being awake. The proposed system includes: (1) vibrating motors to vibrate the bed, as a vibrating machine transmits energy into the body and forces the muscles to contract, (2) a jarring alarm tune to break the silence and (3) a DC light that powers ON dim and gradually gets brighter up to a defined limit, because light plays a significant role in synchronizing the 24-h hormonal rhythm of the human body [10]. Figure 3 shows the model of "WakeMeUp: Weight Dependent Intelligent and Robust Alarm System using Load Cells", its components and the communication between hardware and software using Bluetooth. The paper is organised as follows: Sect. 2 includes the literature review, Sect. 3 defines the methodology, implementation is explained in Sect. 4, Sect. 5 includes results and Sect. 6 is the conclusion along with future work.
2 Literature Review
Different approaches in technology make electronic devices more affordable and user friendly, and the technology is increasingly moving towards consumer products commonly found in homes, like alarm clocks. According to different studies involving students and professionals, alarm clocks should have a high volume or some new features to wake their users up on time; the professionals said they want alarm clocks to be smarter [17]. With the passage of time, alarm clocks have experienced numerous changes. Previously, alarm clocks only had buttons for setting the alarm and speakers for the alarm tone. Today, they have screens and advanced features like voice recognition, through which one can use their voice to set an alarm. Almost every age group and gender wants the same things from an alarm clock: that it is easy to set and effective enough to make them leave the bed in the required time. Following are details of previously designed alarm clocks/systems. The "Shape up Alarm Clock" [8] is a dumbbell-shaped digital clock. The prime objective of this dumbbell-shaped alarm clock is to buzz at the desired time and respond to exercise status. The Shape up Alarm Clock is only designed for healthy young individuals, as it is not feasible for kids, elderly people or physically challenged people to sway the weight. The "Gun Alarm Clock" [1,2] is another uniquely designed digital clock. This alarm system is equipped with snooze as well as switch-off functionality. When the alarm starts buzzing, a target pops up on the clock, and in order to snooze the alarm the user needs to hit the target with a laser gun. The Gun Alarm Clock is also limited to users of a specific age group; it cannot be used by aged and physically challenged people. The "Clocky" [12] is a movement based digital alarm clock with wheels, which allow it to move. This alarm system is equipped with a single-use snooze option. It can also jump from up to 3 ft height (i.e. from a bedside table). BEDRUNNER: An Intelligent Running Alarm Clock [6], presented by Lee Kien Ee et al., is a moving alarm clock (robot). It comprises hardware and software and implements an application of Artificial Intelligence. The alarm clock has two modes of working: (1) hiding mode (the default mode) and (2) running mode. The "Flying Alarm Clock" [6] is another unique motion based digital alarm clock, consisting of an LCD display and a detachable propeller. The only way to shut the siren off is by placing the propeller back into the clock. "A Light Apparatus" [16] is an intelligent alarm clock that slowly increases light, like sunshine, with nature sounds. There are different settings of light that not only function in the morning but can also be used as a night light. The features of a standard alarm clock, such as an FM radio, a large LED screen and sounds, are also combined. Unfortunately, the loudness of the nature sounds cannot be changed. For light sleepers this could be annoying, as the sound alarm has only one volume level; some state that this is too noisy and jarring for their mornings. Deep sleepers can, however, enjoy the extra loudness.
The "Smart Reminder Clock (SRC)" [9] is an Android based clock application which forces its user to walk in order to switch the alarm off. SRC utilizes the pedometer of the smartphone to count the steps taken by the user in order to switch off the alarm. The weak points of SRC are its snoozing functionality and its dependency on other individuals. The alarm clocks/systems discussed above are particularly focused on certain groups of people and do not give a unified solution for all age groups. Also, user control of the alarm and a snooze option undermine the whole idea of an alarm system. Moreover, in almost all of these systems the user has the authority to either snooze or switch off the alarm, which sometimes has a negative effect on the user's punctuality. A unique, unified and independent system is required that is feasible for all age groups to operate and does not have any health and safety issues. The proposed system, "WakeMeUp: Weight Dependent Intelligent and Robust Alarm System using Load Cells", resolves all the issues and shortcomings of the above stated alarm devices and does not provide the user with snooze or alarm switch-off functionality.

Table 2. Comparison among alarm clocks/systems

Alarm systems        | Snoozing prevention | Alarm switch off prevention | Independent of other individuals | Suitable for physically challenged
Shape up alarm clock | ✗                   | ✗                           | ✓                                | ✗
Gun alarm clock      | ✗                   | ✗                           | ✓                                | ✗
Clocky               | ✗                   | ✗                           | ✓                                | ✗
Flying alarm clock   | ✗                   | ✓                           | ✓                                | ✗
A light apparatus    | ✗                   | ✓                           | ✗                                | ✓
SRC                  | ✗                   | ✗                           | ✗                                | ✗
WakeMeUp             | ✓                   | ✓                           | ✓                                | ✓
WakeMeUp is a state-of-the-art alarm system that resolves the shortcomings of the existing systems (see Table 2). The novelty is in the architectural design of the system, which uses a load cell, vibration motors and a DC light in such a manner that it is suitable for every age group and even for physically challenged people. The whole idea of this product revolves around its key purpose, i.e. to efficiently wake the individual at the desired time, which can only be achieved if the user does not have an "Alarm Snooze" or "Alarm Switch OFF" option. These controls are removed in this system, forcing its user to leave the bed on time, as the only way of switching the alarm off is to leave the bed and stay out for 2 min.
3 Methodology
The WakeMeUp alarm clock uses load cells to sense weight, with a threshold-based approach to detect a human body on the bed. The components used to develop the proposed system include an Arduino UNO [4,11,15], a Bluetooth HC-05 module [3,13,14], a load cell, an instrumentation amplifier INA 125, an eccentric rotating mass vibration motor, and a DC light bulb. Figure 1 explains the activation process of the buzzer, vibration motor and DC light. First the system is initialized and the alarm is set using the Android application. If the sensed weight is found to be less than the threshold (i.e. 15 Kg), the system takes no action; otherwise the micro-controller activates the vibration motor, powers on the DC light and also sends an alarm activation signal to the mobile application. After receiving this command, the alarm on the mobile phone starts buzzing. Thus, the buzzer, light intensity and vibration motor force the user to wake up and leave the bed, as this alarm cannot be snoozed or deactivated from the mobile phone.
Fig. 1. Buzzer, DC light and vibrator motor activation flow diagram
Figure 2 explains the deactivation process of the buzzer, DC light and vibration motor. After the activation of the buzzer, DC light and vibration motors, the micro-controller continuously checks the weight on the bed using the load cell. If the sensed weight is found to be less than the threshold (i.e. 15 Kg), the micro-controller waits and continuously checks the weight for about 2 min to confirm that the subject is wide awake and has left the bed. After confirmation, it sends the mobile phone a buzzer deactivation command, powers off the DC light and also deactivates the vibration motors. The threshold is set to 15 Kg because the infant and mattress together weigh slightly less than 15 Kg, which avoids disturbing an infant's sleep.
Fig. 2. Buzzer, DC light and vibrator motor deactivation flow diagram
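The two flow diagrams can be summarized in a small control loop. The sketch below is written in Python purely for illustration (the actual firmware runs on an Arduino UNO); the 15 Kg threshold and the 2 min hold-off follow the description above, while the helper functions are assumed placeholders.

```python
import time

WEIGHT_THRESHOLD_KG = 15.0      # infant plus mattress together weigh slightly less than this
HOLD_OFF_SECONDS = 120          # the user must stay out of bed for 2 minutes

def set_outputs(buzzer, vibration, light):
    """Placeholder for driving the phone buzzer, vibration motors and DC light."""
    print(f"buzzer={buzzer} vibration={vibration} light={light}")

def run_alarm_cycle(read_weight_kg, alarm_due=True):
    """read_weight_kg: callable returning the current load-cell reading in kg."""
    if not (alarm_due and read_weight_kg() > WEIGHT_THRESHOLD_KG):
        return                                       # bed empty (or no alarm due): do nothing
    set_outputs(buzzer=True, vibration=True, light=True)
    empty_since = None
    while True:                                      # deactivate only after a full hold-off window
        if read_weight_kg() > WEIGHT_THRESHOLD_KG:
            empty_since = None                       # user got back into bed, keep alarming
        elif empty_since is None:
            empty_since = time.monotonic()
        elif time.monotonic() - empty_since >= HOLD_OFF_SECONDS:
            set_outputs(buzzer=False, vibration=False, light=False)
            return
        time.sleep(1)
```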
4 Implementation

The implementation phase is divided into two sub-phases: (1) Hardware Implementation and (2) Software Implementation.

4.1 Software Implementation
"WakeMeUp" is an embedded system application based on Android. The application is used to set an alarm for a work/event reminder. The working flow of the application starts with a splash screen; it checks whether the system Bluetooth is on or off and asks the user to select a particular Bluetooth device from the list of all paired Bluetooth devices. The application contains a list of previously configured alarms, where the user can set/unset previous alarms or add a new alarm. On the add-alarm screen there is a 24-hour format time picker; the user can select the hour and minute, enter a title for the alarm (for example "10:30", "Morning") and click the SET button. When the alarm time is reached, the application sends an "Alarm Time" notification to the connected Bluetooth device. Based on the result (weight) from the connected device, the application either starts buzzing with vibrations and sets a 2 min timer for the alarm, or does nothing. The 2 min timer confirms that the subject is wide awake and has left the bed.
4.2 Hardware Implementation
Figure 3 shows the connections among the Arduino UNO, load cell, instrumentation amplifier (INA 125), eccentric rotating mass vibration motor (ERM), DC light, Bluetooth module (HC-05) and battery (12 V). This hardware is used to sense the weight on the bed and notify the mobile application when required. The weight is sensed by the load cell; the sensed values (which are very small) are amplified using the instrumentation amplifier (INA 125) and fed to the micro-controller, where the decision is made in accordance with the defined rules.
Fig. 3. WakeMeUp: weight dependent intelligent and robust alarm system using load cells. (a) Proposed system design; (b) circuit diagram.
5 Results and Discussions
First, the load cell was calibrated with three different weights: 0 kg, 5 kg and 10 kg. The measured outputs were found to be equivalent to the values of the calibration weights, verifying the reliability and consistency of the readings. After calibration, the system was installed in the bed shown in Fig. 3 and tested for the required scenarios. The system's minimum weight threshold is set to 20 Kg, since an infant weighs approximately 6 Kg and the mattress, pillows and blanket about 8 Kg, so that the alarm (buzzer + vibrations) activates for adults only and does not interrupt an infant's sleep.
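The calibration step can be expressed as a simple linear fit from raw amplified load-cell readings to kilograms; the raw counts below are made-up values used only to illustrate the mapping, and the sketch is in Python purely for illustration.

```python
import numpy as np

# Raw readings taken with known calibration weights on the bed (illustrative counts)
known_weights_kg = np.array([0.0, 5.0, 10.0])
raw_counts = np.array([512.0, 693.0, 874.0])          # amplified load-cell output via the ADC

# Least-squares straight line: weight = gain * counts + offset
gain, offset = np.polyfit(raw_counts, known_weights_kg, deg=1)

def counts_to_kg(counts):
    return gain * counts + offset

print(counts_to_kg(693.0))    # recovers roughly 5.0 kg
print(counts_to_kg(1200.0))   # an adult-plus-mattress reading maps well above the threshold
```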
5.1 Testing Scenarios
In the first scenario, the system is tested with no weight (human entity). Figure 4a shows that the system sensed 8 Kg weight on bed (i.e. weight of mattress, pillows and blanket). Micro-controller will not send any notification to Android application as sensed weight is less than the set threshold. In the second scenario, the system is tested with an infant lying on the bed. Figure 4b shows that the system sensed approximate weight of 14 Kg on bed (i.e. weight of mattress, pillows and blanket + infant). Micro-controller will not send any notification to Android application as sensed weight is less than the set threshold. In the third scenario, the system is tested with adult lying on the bed. Figure 4c shows that the system sensed approximate weight of 40 Kg on bed (i.e. weight of mattress, pillows and blanket + adult). Micro-controller will send weight detected notification over Bluetooth to Android application as sensed weight is greater than threshold.
In the fourth scenario (Fig. 5a), at the alarm time and after weight detection (weight > threshold), the system activates the alarm. Alarm activation refers to the alarm notification, the initiation of the vibration motors and the escalation of the light intensity to wake the sleeping subject. In the fifth scenario (Fig. 5b), when the alarm is activated the system checks the weight value; if the weight < threshold, the system starts a 2 min timer. If weight < threshold even after this time span, the alarm is deactivated. Alarm deactivation refers to switching off the alarm notification, vibration motors and light.
Fig. 4. (a) Load cell readings for mattress only, (b) Load cell readings for mattress + infant and (c) Load cell readings for mattress + adult.
In order to check the synchronization and to validate the efficiency and reliability of the WakeMeUp alarm system, the proposed system was placed in the bed (Fig. 3 shows the placement of the load cell and vibration motors) and tested with various weights. Table 3 shows the results (i.e. 'Pass' or 'Fail') for the tested weights. A test is marked as passed when the load cell sensed the weight and the notification to activate the phone buzzer, vibration motors and DC light was sent at the pre-set alarm time. The results correspond to a system efficiency of 95%.
Fig. 5. (a) Alarm activation and (b) Alarm deactivation.
Table 3. Test results

S. no | Weight (Kg) | Test 1 | Test 2 | Test 3 | Test 4
01    | 26          | Pass   | Fail   | Pass   | Pass
02    | 30          | Pass   | Pass   | Pass   | Pass
03    | 35          | Pass   | Pass   | Pass   | Pass
04    | 40          | Pass   | Pass   | Pass   | Pass
05    | 45          | Pass   | Pass   | Pass   | Pass
6 Conclusion
Alarm clocks play a vital role in helping people achieve their desired tasks on time by waking them at a specified time, but conventional and traditional alarm clocks are not capable enough to achieve this goal efficiently. In this research, a simple, efficient, cost effective, state-of-the-art and smart alarm system is developed to help people cope with unwanted oversleeping issues. The WakeMeUp alarm clock includes alarm snooze and switch-OFF prevention features, along with a gradual increase in light bulb intensity, to ensure the achievement of its prime objective, i.e. waking up the subject on time without compromising their health and safety. In future work, the alarm system developed in this research for a single bed can be scaled up to a double bed, such that the alarm system can facilitate multiple users by dividing the double bed into segments, with each segment operating independently. Several product effectiveness tests were performed and the experimental results clearly demonstrate the effectiveness of the proposed alarm system; it achieves an average accuracy and effectiveness of 95%.
References 1. Gun alarm clock (2018). https://www.yellowoctopus.com.au/products/gun-alarmclock. Accessed 5 Mar 2018 2. Gun target alarm clock makes you shoot target to turn off alarm (2018). https:// odditymall.com/laser-gun-target-alarm-clock. Accessed 5 Mar 2018 3. Alam, Z., Samin, H., Samin, O.B.: Healthband for dementia patients: fall and scream detector and caretaker helper. JPhCS 976(1), 012015 (2018) 4. Bukhari, J., Rehman, M., Malik, S.I., Kamboh, A.M., Salman, A.: American sign language translation through sensory glove; signspeak. Int. J. u e-Serv. Sci. Tech. 8(1), 131–142 (2015) 5. Leger, D., Beck, F., Sauvet, F., Faraut, B.: The risks of sleeping “too much”. survey of a national representative sample of 24671 adults. PLoS One 9(9), e106950 (2014) 6. Ee, L.K., Zamin, N., Aziz, I.A., Haron, N.S., Mehat, M., Ismail, N.N.: Bedrunn3r: An intelligent running alarm clock (2006) 7. Hirshkowitz, M., Whiton, K., Albert, S.M., Alessi, C., Bruni, O., DonCarlos, L., Hazen, N., Herman, J., Katz, E.S., Kheirandish-Gozal, L., et al.: National sleep foundation’s sleep time duration recommendations: methodology and results summary. Sleep Health 1(1), 40–43 (2015)
8. Huang, H.T.: Dumbbell that can respond to exercise status and play music, US Patent App. 11/111,815, 26 Oct 2006 9. Kasim, S., Hafit, H., Leong, T.H., Hashim, R., Ruslai, H., Jahidin, K., Arshad, M.S.: SRC: smart reminder clock. In: IOP Conference Series: Materials Science and Engineering, vol. 160, p. 012101. IOP Publishing (2016) 10. Leproult, R., Colecchia, E.F., L’Hermite-Bale´eriaux, M., Van Cauter, E.: Transition from dim to bright light in the morning induces an immediate elevation of cortisol levels. J. Clin. Endocrinol. Metab. 86(1), 151–157 (2001) 11. Naseeb, N., Alam, M., Samin, O.B., Omar, M., Khushbakht, S.S., Shah, S.A.: RGB based EEG controlled virtual keyboard for physically challenged people. In: 2020 3rd International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), pp. 1–5. IEEE (2020) 12. Ofek, E., Avery, J.: Nanda home: Preparing for life after clocky. Harvard Business School Marketing Unit Case (511-134) (2011) 13. Saeed, M.Z., Ahmed, R.R., Samin, O.B., Ali, N.: IoT based smart security system using PIR and microwave sensors. In: 2019 13th International Conference on Mathematics, Actuarial Science, Computer Science and Statistics (MACS), pp. 1– 5. IEEE (2019) 14. Samin, O.B., Sohail, H., Omar, M., Hummam, H.: Accelerometer and magnetometer enabled entity following automated suitcase. In: 2020 International Conference on Emerging Trends in Smart Technologies (ICETST), pp. 1–5. IEEE (2020) 15. Sohail, H., Ullah, S., Khan, A., Samin, O.B., Omar, M.: Intelligent trash bin (ITB) with trash collection efficiency optimization using IoT sensing. In: 2019 8th International Conference on Information and Communication Technologies (ICICT), pp. 48–53. IEEE (2019) 16. Taekema, H.J., Zeinstra, M.H., Beutick, J., Heersema, M.: A light apparatus, US Patent App. 16/632,563, 21 May 2020 17. Yue, Y., Zhong, J., Wang, L.: Smart sleep mattress based on internet of things technology. In: International Conference on Machine Learning and Big Data Analytics for IoT Security and Privacy, pp. 773–776. Springer, Cham (2020)
Visual Data Mining: A Comparative Analysis of Selected Datasets

Ujunwa Mgboh1, Blessing Ogbuokiri2(B), George Obaido3, and Kehinde Aruleba4

1 Department of Computer Science, Eastern Michigan University (EMU), Ypsilanti, MI, USA
[email protected]
2 Department of Computer Science, University of Nigeria, Nsukka, Nigeria
[email protected]
3 School of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, South Africa
[email protected]
4 Department of Information Technology, Walter Sisulu University, Johannesburg, South Africa
[email protected]
Abstract. This paper presents data preprocessing operations and visualisation techniques carried out on the following datasets: the Teaching Assistant Evaluation dataset, the Statlog (Australian Credit Approval) dataset, the Letter Recognition dataset, the Connectionist Bench (Sonar, Mines vs. Rocks) dataset, and the Poker Hand dataset. These datasets are from the University of California Irvine (UCI) Machine Learning Repository. Appropriate visualisation techniques are applied to the five selected datasets depending on the properties supported by each technique. In the end, this paper offers a template for researchers, data scientists, and other data users for selecting the right preprocessing operations and appropriate visualisation techniques when using these datasets.

Keywords: Preprocessing · Visualisation · Data mining · Visual data mining · Datasets

1 Introduction
Visualisation of large datasets has been applied as a supporting technique to data mining [13,15]. Over the years, there has been a dire need for visualisation techniques for large datasets that can support data mining tasks [8]. Since the late 1990s, visual data mining has become prominent in providing interactive functionality and guidelines for reusable and valuable data [12,16,18]. The interaction and analytical reasoning between one or more visual representations of abstract data can be generally called Visual Data Mining (VDM)
[2,14]. VDM leads to visual understanding of large datasets, which provides guidance for the application of other data mining and analytic techniques [5,10]. VDM also helps researchers gain a more in-depth knowledge of the underlying structures in a large dataset [9]. According to Ankerst [2], VDM depends on the close interconnectedness of tasks, the selection of visual representations, a corresponding set of interactive manipulations, and the respective analytical techniques. In this process, patterns are discovered which form the information, and possibly knowledge, base for policy or decision making. Despite the notable influence and usefulness of visual data mining in data analysis, the selection of the appropriate preprocessing operations and visualisation techniques often proves to be a difficult part of the task, especially for young researchers [7,11]. Therefore, there is a need for a template for the preprocessing operations and visualisation techniques to be applied to a given dataset. This work examines five datasets from the UCI Machine Learning Repository in order to suggest the appropriate preprocessing operations and visualisation techniques needed [4,17]. We chose the UCI Machine Learning Repository because it maintains its datasets as a service to the machine learning community. The selection of five datasets from the UCI Machine Learning Repository in this work is strictly by choice, considering the time frame. In the end, this paper offers a template for researchers, data scientists, and others to select the ideal preprocessing operations and appropriate visualisation techniques when using these datasets. The remainder of the paper is organised as follows. Section 2 presents data preprocessing and visualisation, followed by test criteria for evaluation in Sect. 3. In Sect. 4, we show the visualisation technique justification for the datasets used, followed by the visualisation tools used in Sect. 5. Section 6 presents the conclusion and future work.
2 Data Preprocessing and Visualisation
In this part, we present the datasets with their different dimensions. We discuss the appropriate preprocessing and visualisation techniques needed for each.

2.1 Teaching Assistant Evaluation Dataset
The Teaching Assistant Evaluation Dataset has a dimension of 151 × 5. This dataset was aggregated at the Statistics Department of the University of Wisconsin-Madison over the course of three regular semesters and two summer semesters, and covers 151 teaching assistant (TA) assignments that consist of evaluations of teaching performance. The scores were divided into 3 roughly equal-sized categories (“low”, “medium”, and “high”) to form the class variable.
Preprocessing. In terms of preprocessing, there are no missing values in this dataset, so missing value estimation is not needed. However, global or local normalisation can be recommended in order to get all the values to fall within the same range; a short sketch of both options is given after the list below. The features this dataset can illustrate are:
1. Outlier detection
2. Cluster detection
3. Class cluster detection
4. Important feature detection (which will help detect the evaluation class of a teaching assistant)
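As an illustration of the two normalisation options, the following minimal sketch assumes the dataset has been downloaded from the UCI repository as a plain text file; the file name and column names are placeholders and the class column is excluded from scaling.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Placeholder file and column names; the UCI file itself carries no header row.
cols = ["native_speaker", "instructor", "course", "semester", "class_size", "class"]
tae = pd.read_csv("tae.data", header=None, names=cols)
X = tae.drop(columns="class").astype(float)

# Global normalisation: one min/max computed over the whole table
global_norm = (X - X.values.min()) / (X.values.max() - X.values.min())

# Local normalisation: one min/max per attribute (column)
local_norm = pd.DataFrame(MinMaxScaler().fit_transform(X), columns=X.columns)
```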
Visualisation. The suggested visualisation technique for this dataset is the Chernoff-faces. This is because the dataset has 5 dimensions with 151 data objects. Chernoff-faces is an icon-based visualisation technique that can be used to find interesting trends and structure in a dataset [1,3]. The Chernoff-faces handle each variable differently because the features of the faces vary in perceived importance. Figure 1 shows that all 151 data objects of the Teaching Assistant dataset are represented with a face.
Fig. 1. Chernoff-face plot of teaching assistant evaluation dataset
The teaching assistant evaluation dataset may also be visualised with a scatter plot, although the scatter plot shows some overlap between the various classes in the dataset. The 2D scatter plot can provide a powerful visual image of the relationship between variables. It is the most effective visualisation technique used to get a glance
of patterns in any dataset. The plot in Fig. 2 represents a scatter plot of the 151 teaching assistant evaluation records with 3 classes. The three colour codes seen in the plot represent data objects belonging to the various classes.
Fig. 2. Scatter plot of teaching assistant evaluation dataset
2.2 Statlog (Australian Credit Approval) Dataset
The Statlog dataset is often used for credit card applications [17]. It has a dimension of 690 × 14. Here, all attribute names and values have been changed to meaningless symbols to protect the confidentiality of the data. The dataset contains a mix of attributes, such as continuous, nominal with small numbers of values, and nominal with larger numbers of values. Also, there are a few missing values in this Statlog dataset.
Preprocessing. Since 37 cases (5%) have one or more missing values in the dataset, missing value estimation is deemed necessary to get the data ready for use. To do this, a learning algorithm such as the Bayesian Principal Component Analysis (BPCA) missing value estimator or collateral missing value estimation could be used. In addition, normalisation is needed here because some of the values are way out of range. Therefore, global or local normalisation will be required to accomplish this; a short sketch is given after the list below. The features this dataset can illustrate are:
1. Outlier detection
2. Cluster detection
3. Class cluster detection
4. Important feature detection (which will help detect if a credit will be approved or not)
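A minimal preprocessing sketch follows. BPCA and collateral missing value estimation are not available in scikit-learn, so the sketch substitutes the library's IterativeImputer as a stand-in model-based estimator; the file name, separator, and missing-value marker are assumptions about how the raw data was exported.

```python
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.preprocessing import StandardScaler

# Assumed export: whitespace-separated values, "?" marking missing entries,
# last column holding the class label.
df = pd.read_csv("australian.dat", sep=r"\s+", header=None, na_values="?")
X, y = df.iloc[:, :-1], df.iloc[:, -1]

# Model-based missing value estimation (stand-in for BPCA / collateral estimation)
X_filled = IterativeImputer(random_state=0).fit_transform(X)

# Normalisation so that attributes with very different ranges become comparable
X_scaled = StandardScaler().fit_transform(X_filled)
```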
Visualisation. The Australian credit approval dataset with 14 dimensions and 690 data objects can be visualised with the pixel-oriented technique. The basic idea of this technique is to represent each attribute value as a single coloured pixel, map the range of possible attribute values to a fixed colour map, and display different attributes in different sub-windows. The pixel-oriented visualisation technique maximises the amount of information represented at one time without any overlap. Figure 3 represents the sub-windows of the 14 attributes in the Statlog dataset.
Fig. 3. Pixel-orientation display of Australian credit card dataset with Xmdv
Also, a scatter plot matrix can be used to visualise this dataset. The scatter plot matrix is like a table that shows a plot of each attribute against others, see Fig. 4.
Fig. 4. Scatter plot matrix of Australian credit card dataset with Xmdv
2.3 Letter Recognition Dataset
The letter recognition dataset has a dimension of 20000 × 16. The dataset identifies a large number of black-and-white rectangular pixel displays as one of the 26 capital letters in the English alphabet. The character images were based on 20 different fonts, and each letter within these 20 fonts was randomly distorted to produce a file of 20,000 unique stimuli. Each stimulus was converted into 16 primitive numerical attributes (statistical moments and edge counts) which were then scaled to fit into a range of integer values from 0 through 15.
Preprocessing. There is no missing value estimation needed; the data is already ready for use. The features this dataset can illustrate are:
1. Outlier detection
2. Cluster detection
3. Class cluster detection
4. Important feature detection (which will help detect a large number of black-and-white rectangular pixel displays as one of the 26 capital letters in the English alphabet).
Visualisation. The letter recognition dataset can be visualised using a scatter plot matrix. A scatter plot matrix is a table of scatter plots. Each plot is small so that many plots can fit on one page. Following the example in Fig. 5, notice how quickly one can scan the plots for highly correlated variables of the letter recognition dataset as well as outliers.
Fig. 5. Scatter plot matrix of letter recognition dataset with Xmdv
Also, this dataset can be visualised with the heat plot (tree view), as depicted in Fig. 6.
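The following minimal sketch reproduces both views on a sample of the data; the file name, illustrative feature names, sampled subset and colour map are assumptions, and seaborn's pairplot and clustermap stand in for the Xmdv and MATLAB plots shown in the figures.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

cols = ["letter"] + [f"f{i}" for i in range(1, 17)]          # 16 numeric attributes
letters = pd.read_csv("letter-recognition.data", header=None, names=cols)
sample = letters.sample(2000, random_state=0)                # keep the plots readable

# Scatter plot matrix of a few attributes, coloured by class
sns.pairplot(sample, vars=["f1", "f2", "f3", "f4"], hue="letter", plot_kws={"s": 5})

# Heat plot (clustered heat map) of a small block of rows
sns.clustermap(sample.iloc[:200, 1:], cmap="RdYlGn", standard_scale=1)
plt.show()
```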
Fig. 6. Heat plot of letter recognition dataset with Matlab
2.4 Connectionist Bench (Sonar, Mines vs. Rocks) Dataset
This dataset was first used by Gorman and Sejnowski in their study of the classification of sonar signals using a neural network [6]. It has a dimension of 208 × 60. The task is to train a network to discriminate between sonar signals bounced off a metal cylinder and those that bounced off a roughly cylindrical rock.
Preprocessing. There are no missing values, so basically no missing data estimation is needed. However, the dimension of this dataset is large and might require dimension reduction using Principal Component Analysis (PCA); a short sketch is given after the list below. The dataset values already lie within the same range, hence normalisation is not necessary. The features this dataset can illustrate are:
1. Outlier detection
2. Cluster detection
3. Class cluster detection
4. Important feature detection (which will help detect the metal surface the signal bounces on)
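A minimal PCA sketch follows; the file name matches the usual UCI distribution of this dataset but should be treated as an assumption, and the two-component projection is only one possible choice.

```python
import pandas as pd
from sklearn.decomposition import PCA

sonar = pd.read_csv("sonar.all-data", header=None)   # 208 rows: 60 energy bands + label
X, y = sonar.iloc[:, :60], sonar.iloc[:, 60]

# The 60 attributes already lie in [0, 1], so PCA is applied directly
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print("explained variance ratio:", pca.explained_variance_ratio_)
```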
Visualisation. Using the heat map as a technique to understand the dataset, the connectionist bench dataset produces a vague visualisation. In Fig. 7, the red colour represents similar data objects, while the green colour represents dissimilar data objects. The black colour represents the data objects that lie in between the similar and dissimilar data objects.
Fig. 7. Heatmap for the connectionist bench dataset
However, the hierarchical pixel orientation display can also be used for dataset visualisation. Hierarchical pixel orientation display is a visualisation technique which uses hierarchical partitioning into subspaces. This is shown in Fig. 8 where all the dimensions of the connectionist bench dataset are represented as a sub-window.
Fig. 8. Hierarchical pixel orientation display of connectionist bench dataset with xmdv
2.5 Poker Hand Dataset
In this dataset, each record is an example of a hand consisting of five playing cards drawn from a standard deck of 52. Each card is described using two attributes (suit and rank), for a total of 10 predictive attributes. The Poker Hand dataset has a dimension of 1025010 × 11.
Preprocessing. With 1025010 data objects and 11 dimensions, preprocessing becomes relevant because it is difficult to load and visualise this dataset. Therefore, PCA is suggested to be used on the dataset; a short loading and reduction sketch is given after the list below. The features this dataset can illustrate are:
1. Outlier detection
2. Cluster detection
3. Class cluster detection
4. Important feature detection
5. Difficulty of visual data mining
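A minimal sketch of loading and reducing the dataset follows; the file name, chunk size, and derived column names are assumptions, and chunked reading is only one way to cope with the size of the file.

```python
import pandas as pd
from sklearn.decomposition import PCA

cols = [f"{a}{i}" for i in range(1, 6) for a in ("suit", "rank")] + ["hand"]

# Read the large file in chunks so it can be handled on modest hardware
chunks = pd.read_csv("poker-hand.data", header=None, names=cols, chunksize=200_000)
poker = pd.concat(chunks)

# Reduce the 10 predictive attributes before glyph / heat-plot visualisation
X_2d = PCA(n_components=2).fit_transform(poker.drop(columns="hand"))
```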
In order to visualise the dataset, Glyphs or heat plot (tree view) is suggested to be used to detect more patterns from this dataset, see Figs. 9 and 10.
Fig. 9. Glyphs visualisation of Poker hand dataset with Xmdv after dimension reduction
3 Test Criteria for Evaluation
It is pertinent to consider the amount of data a particular visualisation technique can handle, as well as the quantity of human interaction it can support. The following criteria, as summarised in Tables 1, 2, 3, 4, 5, 6, 7, 8 and 9, are the benchmark for testing visualisation techniques in the data mining process for the selected datasets.
Fig. 10. Heat plot visualisation of Poker Hand with MATLAB after dimension reduction Table 1. Scatter plot using PCA Dataset
Outlier detection
Cluster detection
Class cluster detection
Important feature detection
Teaching assitance evaluation
Yes
Yes
Yes
Yes
Statlog (Australian credit approval)
Yes
Yes
Yes
Yes
Letter recognition
Yes
Yes
Yes
Yes
Connectionist bench (sonar, mines vs. rocks)
Yes
Yes
Yes
Yes
Poker hand
Yes
Table 2. Scatter plot by Sammon’s mapping Dataset
Outlier detection
Cluster detection
Teaching assitance evaluation
Yes
Yes
Statlog (Australian credit approval)
Yes
Class cluster detection
Important feature detection
Yes
Yes
Yes
Letter recognition
Yes
Yes
Connectionist bench (sonar, mines vs. rocks)
Yes
Yes
Poker hand
Yes
Yes
4 Visualisation Technique Justification for the Datasets
We summarise the justification of the various visualisation techniques used in this work in Table 10.
Table 3. Scatter plot by RadViz Dataset
Outlier detection
Cluster detection
Class cluster detection
Teaching assitance evaluation
Yes
Yes
Yes
Statlog (Australian credit approval)
Yes
Yes
Yes
Letter recognition
Yes
Yes
Yes
Connectionist bench Yes (sonar, mines vs. rocks)
Yes
Yes
Poker hand
Yes
Important feature detection
Yes
Yes
Yes
Table 4. Parallel coordinate Dataset
Outlier detection
Cluster detection
Teaching assitance evaluation
Yes
Yes
Statlog (Australian credit approval)
Yes
Yes
Letter recognition
Yes
Yes
Connectionist bench Yes (sonar, mines vs. rocks)
Yes
Poker hand
Yes
Class cluster detection
Important feature detection Yes
Table 5. Heat plot (tree view) Dataset
Outlier detection
Cluster detection
Teaching assitance evaluation
Yes
Yes
Yes
Statlog (Australian credit approval)
Yes
Yes
Yes
Letter recognition
Yes
Yes
Yes
Connectionist bench Yes (sonar, mines vs. rocks)
Yes
Yes
Poker hand
Yes
Yes
Yes
Class cluster detection
Important feature detection
Table 6. Circular segment
Dataset
Outlier detection
Cluster detection
Class cluster detection
Important feature detection
Teaching assitance evaluation
Yes
Yes
Statlog (Australian credit approval)
Yes
Yes
Letter recognition
Yes
Yes
Connectionist bench Yes (sonar, mines vs. rocks)
Yes
Poker hand
Yes
Yes
Table 7. Chernoff-face Dataset
Outlier detection
Cluster detection
Class cluster detection
Important feature detection
Teaching assitance evaluation
Yes
Yes
Statlog (Australian credit approval)
Yes
Yes
Letter recognition
Yes
Yes
Connectionist bench (sonar, mines vs. rocks) Poker hand
Table 8. Pixel–oriented display Dataset
Outlier detection
Cluster detection
Class cluster detection
Important feature detection
Teaching assitance evaluation Statlog (Australian credit approval) Letter recognition Connectionist bench Yes (sonar, mines vs. rocks)
Yes
Poker hand
Yes
Yes
Table 9. Hierachical pixel oriented Dataset
Outlier detection
Cluster detection
Class cluster detection
Important feature detection
Teaching assitance evaluation Statlog (Australian credit approval) Letter recognition
Yes
Yes
Connectionist bench Yes (sonar, mines vs. rocks)
Yes
Poker hand
Yes
Yes
5 Visualisation Tools Used
Apart from using MATLAB scripts for some visualisation techniques, such as the circle segment and the scatter plot with Sammon's mapping, some visualisation tools (Xmdv tool and Orange) were used in this work.

5.1 Xmdv Tool
In order to use the Xmdv tool, the dataset has to be converted to the “.OKC” format, which is the only format accepted by the Xmdv tool. To achieve this, the following steps are taken (a short sketch of this layout follows the list):
1. The first line of the “.OKC” file is the number of attributes (excluding the labels) and the total number of objects (rows),
2. This is followed by a name for each attribute (column) of the file,
3. Next come the min, max and mean values of each individual attribute (column),
4. Finally, the dataset itself, excluding the labels.
After the above steps, the file can be uploaded and the intended visualisation technique(s) can be run.
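A minimal sketch of a converter that follows the four steps above; the exact separators and per-attribute statistics expected by Xmdv should be checked against the tool's documentation, and the input file used in the last line is only an example.

```python
import pandas as pd

def to_okc(df: pd.DataFrame, path: str) -> None:
    """Write a numeric DataFrame (labels already removed) using the layout described above."""
    with open(path, "w") as f:
        f.write(f"{df.shape[1]} {df.shape[0]}\n")            # 1) attribute count and row count
        for name in df.columns:                              # 2) one attribute name per line
            f.write(f"{name}\n")
        for name in df.columns:                              # 3) min, max and mean per attribute
            col = df[name]
            f.write(f"{col.min()} {col.max()} {col.mean()}\n")
        df.to_csv(f, sep=" ", header=False, index=False)     # 4) the data itself

to_okc(pd.read_csv("tae.data", header=None).iloc[:, :-1], "tae.okc")
```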
5.2 Orange
The Orange visualisation tool allows the user to upload the data, drag the File and Data Table widgets, connect them, and run the necessary visualisation technique.
Table 10. Visualisation technique justification
Group 1: Chernoff-face (used for datasets that have very few features, with limited dimensions); Scatter plot by PCA (this is the most effective visualisation technique).
Group 2: Pixel-oriented display (maximises the amount of information represented at one time without any overlap); Scatter plot matrix (with this one can quickly scan the plots for highly correlated variables, as well as outliers).
Group 3: Scatter plot matrix (one can quickly scan the plots for highly correlated variables as well as outliers); Heat plot (tree view) (most effective visualisation technique).
Group 4: Heat plot (tree view) (most effective visualisation technique); Hierarchical pixel oriented (used for large datasets with high dimensions).
Group 5: Heat plot (tree view) (most effective visualisation technique); Glyph (good for the reduced dataset after PCA).
Group 1 – Teaching Assistant Evaluation dataset, Group 2 – Statlog (Australian Credit Approval) dataset, Group 3 – Letter recognition, Group 4 – Connectionist Bench (Sonar, Mines vs. Rocks) dataset, Group 5 – Poker Hand dataset
6 Conclusion and Future Work
In this paper, the required data preprocessing operations and appropriate visualisation techniques were identified for the following datasets: Teaching Assistant Evaluation dataset, Statlog (Australian Credit Approval) dataset, Letter recognition, Connectionist Bench (Sonar, Mines vs. Rocks) dataset, and Poker Hand dataset from the UCI Machine Learning Repository. This work offers a template for researchers, data scientists, and other data users in selecting the right preprocessing operations and appropriate visualisation techniques when using a selected dataset. In the future, we intend to accommodate more datasets for a more robust comparison.
References 1. Abdul Moiz, S.: Class level code smells: chernoff face visualization. CSI J. Comput. 3(2), 36–41 (2020). http://www.csi-india.org/downloads/pdf/4/csi 2. Ankerst, M.: Visual Data Mining. Ph.D. thesis, Faculty of Mathematics and Computer Science, University of Munich, Munich (2000) 3. Bruckner, L.A.: On chernoff-faces. In: Graphical Representation of Multivariate Data, pp. 93–121 (1978). https://www.sciencedirect.com/science/article/pii/ B9780127347509500095
4. Ceneda, D., Gschwandtner, T., Miksch, S.: A review of guidance approaches in visual data analysis: a multifocal perspective. Comput. Graph. Forum 38(3), 861– 879 (2019). https://doi.org/10.1111/cgf.13730 5. Cristobal, R., Sebastian, V.: Educational data mining and learning analytics: an updated survey. In: WIREs Data Mining Knowledge Discovery, pp. 1–22 (2020) 6. Gorman, R.P., Sejnowski, T.J.: Learned classification of sonar targets using a massively parallel network. IEEE Trans. Acoust. Speech Signal Process. 36(7), 1135– 1140 (1988) 7. Keim, D., Mansmann, F., Schneidewind, J., Ziegler, H.: Challenges in visual data analysis. In: In Proceeding of International Conference on Information Visualization, pp. 26–36. ACM (2006) 8. Keim, D., North, S.: Visual data mining in large geospatial point sets. IEEE Comput. Graph. 24(5), 36–44 (2004) 9. Li, G.: Research on data analysis and mining technology based on computer visualization. In: CIPAE 2020: Proceedings of the 2020 International Conference on Computers, Information Processing and Advanced EducationOctober 2020, pp. 194–200. ACM (2020) 10. Mehta, A.Y., Cummings, R.D.: GLAD: glycan array dashboard, a visual analytics tool for glycan microarrays. Bioinformatics 35(18), 3536–3537 (2019). https://doi.org/10.1093/bioinformatics/btz075 11. Nayem, R.: A Taxonomy of Data Mining Problems. IGI Global Publishers (2020) 12. Rubio, E., Castillo, O., Valdez, F., Melin, P., Gonzalez, C.I., Martinez, G.: An extension of the fuzzy possibilistic clustering algorithm using type-2 fuzzy logic techniques. Adv. Fuzzy Syst. 2017, 23 (2017) 13. Simoff, S.J.: Visual Data Mining, pp. 3365–3370. Springer, Boston (2009). https:// doi.org/10.1007/978-0-387-39940-9 1121 14. Simoff, S.J., B¨ ohlen, M.H., Mazeika, A.: Visual data mining. In: LNCS 4404, pp. 1–12. Springer-Verlag, Berlin (2008) 15. Solmaz, M., Lane, A., Gonen, B., Akmamedova, O., Gunes, M.H., Komurov, K.: Graphical data mining of cancer mechanisms with SEMA. Bioinformatics 35(21), 4413–4418 (2019). https://doi.org/10.1093/bioinformatics/btz303 16. Supriyati, Abdillah, S.R.: Data mining in sales data grouping. IOP Conf. Ser. Mater. Sci. Eng. 879, 012116 (2020). https://doi.org/10.1088 17. UCI: Machine learning repository (2020). https://archive.ics.uci.edu/ml/index.php 18. Ying, Y., Yue, S.: Application of data mining combined visualization technology in visual communication. In: 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), pp. 874–879 (2020)
NoSQL Big Data Warehouse: Review and Comparison
Senda Bouaziz1,2(B), Ahlem Nabli1,3, and Faiez Gargouri1,4
1 MIRACL Laboratory, Sfax, Tunisia
2 Faculty of Economics and Management of Sfax, Sfax University, Sfax, Tunisia
3 Faculty of Computer Sciences and Information Technologies, Al-Baha University, Al Baha, Kingdom of Saudi Arabia
[email protected]
4 Institute of Computer Science and Multimedia of Sfax, 3021 Sfax, Tunisia
[email protected]
Abstract. Typically, to implement a data warehouse, we have to extract the data from relational databases, XML files, etc., which are widely used by companies. Since today’s data are generated from social media, GPS data, sensor data, surveillance data, etc., and are maintained in NoSQL databases, we are talking about big data warehouses (BDW). Hence, there is a need to study the influence of this new paradigm on the creation of the data warehouse (DW) and the ETL process (Big ETL). This paper presents an overview of the work dealing with proposals in this context.
Keywords: Big Data warehouse · NoSQL databases · Big ETL process

1 Introduction
The increase of data volume over the last decade is attributed to various data sources, such as social media, GPS data, sensor data, surveillance data, e-mail, etc. This movement has given birth to the Big Data phenomenon, which is characterized by the 7V: Volume, Velocity, Variety, Veracity, Validity, Value and Variability [17]. The three most used Vs are the Volume, which implies the amount of data that goes beyond the usual units, the Velocity, which implies the speed with which these data are measured, generated and must be treated, and finally the Variety, which implies the diversity of formats and structures. Although relational databases have been the perfect data storage for many decades, they are no longer suited to the big data phenomenon. As a result, many NoSQL (Not Only SQL) databases have emerged for the storage and processing of large volumes of structured, semi-structured, and unstructured data, using column, key-value, document, or graph data structures. Therefore, Big Data increases the possibilities for analysis in all sectors and opens up new horizons for decision-making [21].
Indeed, the conventional data warehousing technology is typically applied to structured and semi-structured data and cannot really handle unstructured data, i.e. big data. Traditional DW cannot meet the growing needs of the modern enterprise to integrate and analyze a wide variety of data generated by social, mobile and sensor sources. The best practice of Business Intelligence is to organize the data in relation to the intended use. However, the data evolve very quickly, and as a result, each upgrade of the data warehouse becomes a challenge for the information organization strategy. This paper studies the influence of big data on the DW, specifically the use of NoSQL databases in the context of data warehousing. The paper is organized as follows. Section 2 presents an overview of relational databases (RDBMS) vs NoSQL databases. Section 3 details the data warehouse under NoSQL databases. Section 4 addresses the Big ETL approaches. Section 5 presents a summary and discusses the described approaches. Section 6 concludes the paper.
2 RDBMS vs NoSQL Databases
Usually the implemented data model determines the logical organization of the stored data in the database. Thus, NoSQL databases are often divided according to the implemented data model. Currently the most common classification of these databases is a four-segment classification in which the following bases can be distinguished: key-value, column-oriented, document-oriented and graph [14].
– Key-value databases. This type of database is the simplest with respect to data availability because it is composed of hash tables with only two columns: key and value. The most common representatives of this group are database environments such as: Riak, Redis, BerkeleyDB, LevelDB.
– Column-oriented databases. This type of database is the one that comes closest to the tables in a relational database. They are much more scalable and flexible since we can have different columns for each line. The most well-known representatives of this group are: Cassandra, HBase.
– Document-oriented databases. Document-oriented databases store and return XML, BSON, JSON documents which are hierarchically distributed in server memory creating tree structures; these in turn can be composed of collections, maps and scalar values [14]. The most common representatives of this group are: MongoDB, CouchDB.
– Graph databases. This type of database is constructed on the basis of graphs containing nodes and edges. The most common representatives of this group are: Neo4j, HypergraphDB, FlockDB.
A new technology usually enters the market and gains widespread adoption because it offers a better, more effective and less expensive solution. By offering a more flexible, scalable, and less expensive alternative to RDBMS, NoSQL databases have affected the database market. They have also been designed to better handle the requirements of Big Data applications. Table 1 describes the difference between RDBMS and NoSQL databases.
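To make the four data models concrete, the following minimal sketch lays out the same customer and order information under each model, using plain Python literals rather than the API of any particular DBMS; the field names are illustrative.

```python
# Key-value: an opaque value addressed only by its key
kv_store = {"customer:42": '{"name": "Alice", "city": "Sfax"}'}

# Column-oriented: one wide row per key, columns grouped into families
column_row = {"row_key": "42",
              "info":   {"name": "Alice", "city": "Sfax"},
              "orders": {"2020-11-03": "order#981"}}

# Document-oriented: a self-describing, nested JSON/BSON-like document
document = {"_id": 42, "name": "Alice",
            "orders": [{"id": 981, "total": 120.5, "items": ["keyboard", "mouse"]}]}

# Graph: nodes and edges, both carrying properties
nodes = [{"id": 42, "label": "Customer", "name": "Alice"},
         {"id": 981, "label": "Order", "total": 120.5}]
edges = [{"from": 42, "to": 981, "type": "PLACED"}]
```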
Table 1. The main differences between RDBMS and NoSQL databases.
– RDBMS: Data stored in the relational model, with rows and columns. NoSQL: Data stored in a host of different databases, each with different data and storage models.
– RDBMS: Follows a fixed schema. NoSQL: Follows dynamic schemas.
– RDBMS: Supports vertical scaling. NoSQL: Follows horizontal scaling.
– RDBMS: Atomicity, Consistency, Isolation, and Durability (ACID) compliant. NoSQL: Is not ACID compliant, except the graph databases.
3 Data Warehouse Under NoSQL Databases
Several researchers have suggested approaches in the literature for migrating from relational databases to NoSQL ones. The authors in [13] developed a new benchmark for the columnar NoSQL DW. This research is considered the first work that proposes to implement a star DW directly from a dimensional model under a column-oriented NoSQL DBMS. These authors developed the DW based on column-oriented NoSQL without giving a formalization of the modeling process. This work is extended in [12] by proposing three approaches which allow BDW to be implemented under the column-oriented NoSQL model. Furthermore, the work of [10] concerns the implementation of a multidimensional DW in NoSQL databases. The star schema is transformed into a table in a column-oriented model or a document-oriented model, where all attributes of the dimension and fact tables are stored and grouped according to a specific rule. Similarly, the authors in [9] used three different logical models to instantiate the DW. They sought to load data, converting from one model to another. The links between facts and dimensions have been converted using nesting, although the transformation process proposed by the authors starts from a conceptual level (multidimensional model). In the absence of a clear approach which enables the implementation of data warehouses under NoSQL models, the authors in [22] proposed a set of transformation rules that ensure the successful translation from a conceptual DW schema to two logical NoSQL models (column-oriented and document-oriented). These authors proposed two possible transformations, namely: simple and hierarchical transformations. Experiments were carried out using the benchmark TPC-DS. Graph databases are a special type of NoSQL database that has proven effective in storing and querying highly interconnected data, and has become a promising solution for multiple applications. The authors in [11] presented a framework called UMLtoGraphDB to create an MDA-based approach to implement conceptual schemas (UML) in graph-oriented databases, including the generation of the code required to verify the OCL constraints defined in the schema. This approach is specified as a chain of model transformations that uses a new intermediate GraphDB meta-model. This model can also be considered as a kind of
UML profile for graph-oriented databases. Also, the authors in [20] proposed two types of transformation rules ensuring a successful passage from a conceptual DW schema to NoSQL graph-oriented models. Both of these transformations are based on a set of rules presented with examples. Although the adoption of non-relational data sources for the implementation of a data warehouse is still quite slow, the authors in [7] discussed the possibilities of using non-relational databases as a source for the data warehouse (DW) and highlighted the potential benefits of using non-relational (NoSQL) databases compared to the main features of the Relational Database Management System (RDBMS). The authors in [8] described the process of creating and producing the DW using a NoSQL data mart. These authors only presented solutions for the creation of a NoSQL DW via NoSQL sources and did not deal with the modeling task that describes the different steps of creating a NoSQL DW. Recently, the authors in [1] proposed three approaches for implementing BDW in a column-oriented NoSQL DBMS. Each approach differs in terms of structure and the type of attributes used when mapping the conceptual model into a logical model. In columnar logical models, they used these approaches to instantiate the conceptual model (star schema) and showed the variations between them when decisional queries are made.
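To illustrate the kind of mapping discussed in this section, the following minimal sketch turns one fact row of a star schema into a document-oriented record in two styles reminiscent of the simple and hierarchical transformations mentioned above; the sales example and field names are invented for illustration and do not reproduce the rule sets of the cited works.

```python
# One fact row and its dimension rows from a star schema (illustrative sales example)
fact     = {"sale_id": 1, "amount": 250.0, "customer_id": 7, "product_id": 3}
customer = {"customer_id": 7, "name": "Alice", "city": "Sfax"}
product  = {"product_id": 3, "label": "laptop", "category": "IT"}

# Simple (flat) style: fact and dimension attributes merged into a single document
flat_doc = {**fact,
            **{f"customer_{k}": v for k, v in customer.items() if k != "customer_id"},
            **{f"product_{k}": v for k, v in product.items() if k != "product_id"}}

# Hierarchical style: each dimension nested as a sub-document of the fact
nested_doc = {"sale_id": 1, "amount": 250.0, "customer": customer, "product": product}
```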
4 Big ETL Approaches
The ETL process is the data processing system of the data warehouse because it provides data from the integration of heterogeneous and distributed data sources. Before processing the data, one must start by extracting them from multiple sources and then load them into the analytical systems. When data sources are large, fast, and unstructured, ETL processes become too complex to develop, too expensive to run, and take too much time to execute [17]. In order to improve the performance of the ETL process when dealing with very large data, the well-known solution is its parallelization/distribution on a cluster of computers. In [2], the authors proposed a fine-grained parallel/distributed approach for the ETL process, the functionalities of which run in parallel using the MapReduce paradigm. The authors proposed a Big-ETL process, which is a feature-based approach that exploits the MapReduce paradigm. For each feature, the authors applied the same principle at the process level in the distributed ETL process approach. This Big-ETL provides multiple distribution models (topologies): VDF (Vertical Distribution of Functionalities), VDFP (Vertical Distribution of Functionalities and Process), HDF (Horizontal Distribution of Functionalities), PPD (Pipeline Processing Distribution) [3]. The impact of massive data in a decision-making environment falls more particularly on the data integration phase. The authors in [4] developed a platform called P-ETL (Parallel-ETL) to store massive data according to the MapReduce paradigm. P-ETL consists in setting ETL processes (workflow) and an advanced configuration related to the parallel and distributed environment. These authors presented the fundamental principles of P-ETL in exposing
the supported partitioning techniques, the adaptation of the Map and Reduce phases to the specificities of ETL, and they described the global architecture of P-ETL. The P-ETL platform is organized into five modules: Extracting, Partitioning, Transforming, Reducing and Loading. After the extraction phase, the source data is loaded into the Hadoop Distributed File System (HDFS); then, a logical partitioning of the data is done according to the choice of the end user (Simple, Round Robin (RR), Round Robin per Block (RRB)). After that, the generated data partitions are submitted to the MapReduce process. Each mapper is in charge of transforming the data of its partition (cleaning, filtering, converting, ...). Data merge and aggregation functions are deferred for execution in the Reduce phase. At this level, the data becomes relevant and can then be loaded into the DW. Since the typical RDBMS is inefficient at handling big data, it is essential to extract, transform, and load unstructured data from NoSQL databases to the DW, leading to strategic decisions. Similar research has been conducted to extract data from NoSQL databases. In [16], the authors described an architectural overview of data extraction from NoSQL databases to a platform called Spotfire. Basically, the research focused on loading the Spotfire data source. This work bears at least a minor relationship to building a NoSQL-based ETL, since it covered extracting data from NoSQL databases, but the authors did not deal with the transformation and loading steps of the ETL process. Moreover, in [18] the authors dealt with a methodology that treats the extraction of data from NoSQL databases. In their methodology, the authors realized that it would take much more time and effort to create an application for the NoSQL ETL framework. Data were extracted by the ETL framework using the Application Programming Interfaces (APIs) provided by the NoSQL databases. To enhance the performance, parallel extraction jobs were executed. The ETL framework is able to schedule the jobs at a convenient time, so the relevant jobs were executed at the predefined time. Apart from that, ad hoc execution of ETL jobs is also possible, which allows loading the data in non-peak hours. Furthermore, the authors in [23] proposed an ETL-based platform to transform a multidimensional conceptual model into a document-oriented one. These authors proposed a two-level approach for DW building under document-oriented systems. The first level concerns the specification of transformation rules for the definition of the logical NoSQL schema. At this level, the authors save the traceability of the transformation rules in a table called the Correspondence Table (CT). The data are extracted from data sources and loaded into the staging area. The second level consists in using the resulting CT for the modeling and implementation of the proposed rules. Besides, these authors modeled the transformation rules using the Business Process Modeling Notation (BPMN). The resulting warehouse was evaluated using the TPC-DS benchmark. In [15], the authors proposed an approach called BigDimETL (BigDimensionalETL) that deals with the ETL process taking into account the multidimensional structure of the DW. This approach is based on new technologies
that have emerged with the big data domain. To speed up data processing, the authors used the MapReduce paradigm together with HBase as a distributed storage mechanism providing data warehousing capabilities. The goal of this approach is to adapt ETL processes to Big Data technologies to support decision making. This BigDimETL approach leads to integrating Big Data while preserving the multidimensional structure of the DW based on vertical partitioning. BigDimETL is based on the execution of several queries to process unstructured data in the NoSQL database using the MapReduce paradigm. These authors proposed a join algorithm adapted to ETL processes in the BigDimETL solution. In [5] and [6], the authors dealt with the Variety problem, namely: (a) the variety of the enormous amount of data sources (for example, traditional, semantic and graph databases) and (b) the variety of storage platforms during the construction of the DW, in which a data integration system can have several stores, each hosting a particular type. These authors proposed a methodology to examine this variety problem. First, through model-driven engineering, they produced different types of generic data sources. This genericity allows ETL operators to be overloaded. To show the interest of this genericity, three examples of instantiation are described covering relational, semantic and graph databases. Second, they gave a web service-based approach to orchestrate ETL flows. Third, they introduced a merge procedure that merges all heterogeneous instances and deploys them on their preferred stores.
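To make the partition/map/reduce style of ETL described in this section concrete, the following is a minimal single-process sketch of an extract, round-robin partition, transform, reduce and load flow; it is not the code of Big-ETL, P-ETL or BigDimETL, and the semicolon-separated source layout, field positions and aggregation are assumptions.

```python
from functools import reduce

def extract(path):
    """Read the source file and yield raw records."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n").split(";")

def partition(rows, n):
    """Round-robin logical partitioning of the extracted rows."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def transform(part):
    """Map side: clean, filter and convert the rows of one partition."""
    return [(row[0], float(row[2])) for row in part if row[2] not in ("", "NULL")]

def merge(acc, pairs):
    """Reduce side: aggregate the transformed pairs per key."""
    for key, value in pairs:
        acc[key] = acc.get(key, 0.0) + value
    return acc

parts = partition(extract("source.csv"), n=4)
mapped = [transform(p) for p in parts]        # would run in parallel on a cluster
aggregated = reduce(merge, mapped, {})        # result is what gets loaded into the DW
```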
5 Discussion
It should be noted that the research mentioned in this paper has studied the BDW from many points of view. Some research is interested in creating a DW under NoSQL databases. A few works are interested in the modeling phases of the DW. Other works dealt with the big ETL process. Table 2 summarizes the explored work against six criteria that are described as follows:
– C1-Type of Data Source Used. This criterion is related to the type of data sources used when building a BDW. The data sources can be divided into three categories:
• SD: Structured Data. Data that have a pre-defined schema, like RDBMS, XML, relational models, UML class diagrams, traditional DW, etc.
• S-SD: Semi-Structured Data. Data that may have a pre-defined schema, which is often ignored, such as XML, JSON, OCL constraints, ontologies, etc.
• US-D: Un-Structured Data. Data that have no predefined format, like Unstructured Repetitive Data (US-RD: they do not have a predefined structure but are recurrent in time and they are generally massive) and Unstructured Unrepetitive Data (US-URD: they do not have a single structure) [19].
– C2-The 3V. This criterion describes the 3V of big data, referring to Volume, Velocity and Variety.
• Volume. Involves the masses of data that exceed the usual units.
• Velocity. Implies the processing of data in real time.
• Variety. Involves data that are very varied and not always structured.
– C3-Level of Modeling. This criterion concerns the modeling levels explained in the paper (C-Conceptual Level/L-Logical Level).
– C4-Transformation Rules. This criterion describes the proposition of rules used to transform a model from one level to another (for example, from the conceptual level towards the logical level).
– C5-ETL Processes. This criterion shows the use of the ETL process according to two types: the use of the MapReduce paradigm for ETL or the use of the classical ETL.
• MR-ETL: MapReduce-ETL. This criterion concerns the approaches based on the MapReduce (MR) paradigm.
• C-ETL: Classical ETL. This criterion concerns approaches based on classical ETL with its main functionalities.
– C6-Data Storage. This criterion describes the NoSQL storage system used (HBase, MongoDB, ...) when creating a BDW.
It should be noted that most studies have proposed the construction of BDW under NoSQL databases. The common objective of these studies is the comparison between the different storage structures of NoSQL databases. Indeed, some of them mentioned the feasibility of warehouses on NoSQL databases while others presented performance studies between different types of NoSQL databases. In addition, in [8] the authors only proposed solutions to build a massive DW from unstructured sources (NoSQL databases). The work of [15] did not deal with the use of a NoSQL database for the creation of a BDW in a direct way; these authors used the method of converting a JSON file to a column-oriented NoSQL structure. Most of the presented works concern mainly structured data (SD) and semi-structured data (S-SD). What is remarkable is that there is no research that has used unstructured data (US-D) as a data source, except the work of [19], which presented these three types of sources only in its proposed architecture, but without any detail on the modeling task and the ETL process. It is also noticed that there is no work that studies the 3V at the same time. Among the works that dealt only with the Volume of data, we find [16,18]. In addition, for the Variety of data, we can distinguish works that have processed variety at the data source level, such as [19], works that have dealt with variety at the data storage level, namely [10] and [9], and also works that have dealt with the variety of input sources and of the storage of data output from the DW, such as [5,11,15] and [6]. For Velocity, no work has treated this V. However, the studies in [16] and [18] dealt with only the extraction phase of the ETL process, but they did not deal with the transformation and loading phases to carry out the construction of the Big ETL.
Data Model
UML Class
Chevalier et al. (2015b)
Abdelhedi et al. (2017)
Sellami et al. (2019)
L
C4
C5
MR-ETL
C-ETL
Semantic
MongoDB/
HBase
Hadoop
MongoDB
HBase
Hibernate
Drill
Apache
MongoDB
Cassandra
MongoDB
HBase
HBase
C6
Model
Graph DB
Oracle
Graph DB
RDBMS/
C3 C
Neo4j
Neo4j
MySQL
Berkani et Bellatreche (2017)
Berkani et al. (2018)
Variety
Multidimensional
DataBases
document
Mallek et al. (2018)
Velocity
Oracle SDB
Key/Value
DataBases
NoSQL
DataBases
NoSQL
US-URD
US-RD
DataBases
NoSQL
Json
Yangui et al. (2017)
Sahiet et Asanka (2015)
Petter (2012)
Bala et al. (2017)
Traditional DW
Bala et al. (2015)
Santos et al. (2017)
Constraints
Diagram
OCL
UML Class
Salinas et Lemus (2017)
Bicevska et al. (2017)
Daniel et al. (2016)
Diagram
Multidimensional
Yangui et al. (2016)
Chevalier et al. (2015a)
C2 US-D
Volume
S-SD
C1
SD
Dehdouh et al. (2015)
Paper
Table 2. Summary of the literature review.
What should be noted is that companies are migrating relational databases to NoSQL databases. Given the important role played by NoSQL databases, it is necessary to study these new databases as a source of data for the modeling and implementation of the DW. NoSQL sources have no schema, which represents a challenge for the traditional approach of ETL processes, where the source schema is always available. In addition, the extraction of data from NoSQL sources is a task not yet studied enough. Therefore, there is a need to extract the data from NoSQL databases and load it into NoSQL databases. In addition, there is no work that has dealt with the definition of transformation operations applied to NoSQL databases to perform the ETL process. With the increase of Big Data, we can conclude that the DW and Big Data are complementary and could be integrated to share not only the storage of the data but also the use of data as a source.
6 Conclusion
In this article, we presented a study about BDW, which focuses, first, on the difference between relational databases and NoSQL databases. Then we presented recent studies that aimed at creating a DW under NoSQL databases, whose main objective is the comparative study between the different NoSQL alternatives. After that, we detailed some works that studied Big ETL and the NoSQL-based ETL process. Finally, a discussion was conducted on all the research studied according to six criteria. To conclude, it must be said that works treating NoSQL databases as data sources for building a DW have not yet been fully explored, and the ETL process has not been well studied. So this part needs more effort and exploration.
References 1. Abdelhedi, F., Ait Brahim, A., Atigui, F., Zurfluh, G.: Big data and knowledge management: how to implement conceptual models in NoSQL systems? In: Knowledge Management and Information Sharing (KMIS), pp. 235–240 (Jan 2017) 2. Bala, M., Boussaid, O., Alimazighi, Z.: Big-ETL: extracting-transforming-loading approach for big data. In: Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), p. 462 (Jan 2015) 3. Bala, M., Boussaid, O., Alimazighi, Z.: A fine-grained distribution approach for ETL processes in big data environments. Data Knowl. Eng. 111, 114–136 (2017) 4. Bala, M., Mokeddem, O., Boussaid, O., Alimazighi, Z.: Une plateforme ETL parall`ele et distribu´ee pour l’int´egration de donn´ees massives. In: 15`emes Journ´ees Francophones Extraction et Gestion des Connaissances, EGC 2015, 27-30 Janvier 2015, Luxembourg, pp. 455–460 (2015) 5. Berkani, N., Bellatreche, L.: A variety-sensitive ETL processes. In: International Conference on Database and Expert Systems Applications (DEXA), pp. 201–216 (2017)
6. Berkani, N., Bellatreche, L., Guittet, L.: ETL processes in the era of variety. In: Transactions on Large-Scale Data and Knowledge-Centered Systems, pp. 98–129 (2018) 7. Bicevska, Z., Neimanis, A., Oditis, I.: NoSQL-based data warehouse solutions: sense, benefits and prerequisites. Baltic J. Mod. Comput. Riga. 4, 597 (2016) 8. Bicevska, Z., Oditis, I.: Towards NoSQL-based data warehouse solutions. Procedia Comput. Sci. 104(C), 104–111 (2017) 9. Chevalier, M., Malki, M.E., Kopliku, A., Teste, O., Tournier, R.: Implementation of multidimensional databases with document-oriented NoSQL. In: Big Data Analytics and Knowledge Discovery - 17th International Conference, DaWaK 2015, Valencia, Spain, 1–4 Sept 2015, Proceedings, pp. 379–390 (2015) 10. Chevalier, M., Malki, M.E., Kopliku, A., Teste, O., Tournier, R.: Implementing multidimensional data warehouses into NoSQL. In: ICEIS 2015 - Proceedings of the 17th International Conference on Enterprise Information Systems, vol. 1, pp. 172–183, 27–30 Apr 2015. Barcelona, Spain (2015) 11. Daniel, G., Suny´e, G., Cabot, J.: Umltographdb: mapping conceptual schemas to graph databases. In: The 35th International Conference on Conceptual Modeling (ER2016). Gifu, Japan (Nov 2016) 12. Dehdouh, K., Bentayeb, F., Boussaid, O., Kabachi, N.: Using the column oriented NoSQL model for implementing big data warehouses. In: The 21st International Conference on Parallel and Distributed Processing Techniques and Applications, pp. 469–475 (2015) 13. Dehdouh, K., Boussaid, O., Bentayeb, F.: Columnar NoSQL star schema benchmark. In: Model and Data Engineering - 4th International Conference, MEDI 2014, Larnaca, Cyprus, 24–26 Sept 2014, Proceedings, pp. 281–288 (2014) 14. Kurpanik, J.: NoSQL databases as a data warehouse for decision support systems. J. Sci. Mil. Acad. Land Forces 49(3), 185 (2017) 15. Mallek, H., Ghozzi, F., Teste, O., Gargouri, F.: BigDimETL with NoSQL database. Procedia Comput. Sci. 126, 798–807 (2018) 16. Petter, N.: Extracting Data from NoSQL Databases - A Step towards Interactive Visual Analysis of NoSQL Data. Master’s thesis, 74 (2012) 17. Ramu, Y., Hota, C.P.P.K., Rao, D.B.V.S.: A relative study on traditional ETL and ETL with apache Hadoop. Int. J. Adv. Res. Comput. Sci. Softw. Eng. (IJARCSSE) 6, 74–78 (2016) 18. Sahiet, D., Asanka, P.D.: ETL framework design for NoSQL databases in data warehousing. Int. J. Res. Comput. Appl. Rob. 3, 67–75 (2015) 19. Salinas, S.O., Lemus, A.C.N.: Data warehouse and big data integration. Int. J. Compu. Sci. Inf. Tech. (IJCSIT) 9, 1–17 (2017) 20. Sellami, A., Nabli, A., Gargouri, F.: Transformation of data warehouse schema to NoSQL graph data base. In: Intelligent Systems Design and Application, pp. 410–420 (Jan 2019) 21. Tian, Y.: Accelerating data preparation for Big Data analytics. Ph.D. thesis, Thesis (2017). http://www.eurecom.fr/publication/5173 22. Yangui, R., Nabli, A., Gargouri, F.: Automatic transformation of data warehouse schema to nosql data base: Comparative study. In: Knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 20th International Conference KES-2016, York, UK, 5-7 Sept 2016, pp. 255–264 (2016) 23. Yangui, R., Nabli, A., Gargouri, F.: ETL based framework for NoSQL warehousing. In: Information Systems - 14th European, Mediterranean, and Middle Eastern Conference, EMCIS 2017, Coimbra, Portugal, 7–8 Sept 2017, Proceedings, pp. 40– 53 (2017)
ICN Based DOS Attack Mitigation in Vehicle Adhoc Networks
Mahesh R. Patil and Loganathan Agilandeeswari(B)
Vellore Institute of Technology, Vellore, India
[email protected]
Abstract. Information centric networking (ICN) is the new paradigm and the backbone of the future internet architecture. According to the latest survey [1], networking research is moving towards ICN to satisfy future demands on the internet, where legacy networks will fail to deliver the best results. With the introduction of new technologies like ICN, it has become important to make these networks secure for data transmission. There are several types of attacks which can occur in ICN, and one of them is the Denial of Service (DOS) attack. In this article, an ICN based DOS attack mitigation method applicable in vehicle adhoc networks is introduced. With the use of a statistic called the interest fulfillment rate on the incoming interface of the nodes, malicious name prefixes can be detected, and interests coming from these malicious nodes can be dropped or slowed down using an interest shaper to accommodate the legitimate traffic. This approach is implemented and simulated in the NS3 based NDN simulator.

Keywords: ICN · NDN · DOS · Interest shaping
1 Introduction

Vehicle Adhoc networks are created for the specific purpose of vehicle-to-vehicle communication to achieve certain features and services. Various services can be implemented in this vehicular Adhoc network. This network consists of a fixed infrastructure network and an Adhoc network. In a VANET, vehicles can communicate with each other as well as with base stations called road side units, as shown in Fig. 1 below. Due to the Adhoc network architecture, it is very crucial to make this network totally secure, because various message and service exchanges take place between the nodes. A VANET helps in informing vehicles travelling on the same path about any accidents or other mishaps, and it can make these vehicles divert to another route due to congestion on the road. Vehicles can also pass various travel-related information to other vehicles. This network can prevent future vehicle accidents and make travel easier and better. There is a possibility of any one of the nodes becoming malicious and acting differently. Such a node intercepts the communication with the motive of disrupting the network for its own advantage or for fun, which is called a Denial of Service (DOS) attack. DOS attacks are usually initiated by one or more than one source; when more than one source is involved, it is called a Distributed Denial of Service (DDOS) attack [14].
In the vehicle Adhoc network, a node (vehicle) that becomes malicious can try to flood the network with multiple interests carrying different name prefixes. Such kind of attack must be detected and mitigated to allow for the normal operation of the legitimate traffic. Otherwise, the attacker will flood the network with interest packets, disrupting or jamming the network and not allowing normal function.
Fig. 1. Vehicle Adhoc network infrastructure
2 Related Works

There are lots of security-related works done in the vehicle Adhoc network with the use of legacy networks like TCP/IP, but ICN is a recent network architecture and the security challenges in ICN are different from legacy networks, so there is a need to work with ICN for future support. Here we will discuss works done in legacy networks and later works done in ICN [10–12, 15]. In [3], the author proposed an algorithm to detect malicious packets and drop all the incoming packets through that interface. Here malicious packets are detected with the help of RSUs: if any node sends harmful messages, the RSU will keep track of such packets and the vehicle can be tracked based on location. This comes under the packet detection method, where frequency, velocity and number of nodes are the parameters used to detect the malicious nodes and packets. They implemented and measured the performance in terms of packet loss, network throughput, packet delivery ratio and number of alive nodes, simulated in NS-2, where both malicious packets and legitimate packets are detected. In [4], the authors proposed malicious packet detection by applying a verification check using an advanced packet detection formula. Here, when a vehicle sends a request to a road side unit, the RSU will capture information about the vehicle. Later, when the malicious node starts
sending malicious packets, the RSU will detect this abnormal behavior, stop accepting packets from that node, and later remove the records of such a node. This algorithm is implemented in NS-2. In [5], the malicious nodes are detected in two classified ways: one is node centric and the other is data centric. In the node centric method, nodes are monitored continuously for malicious behaviors; in the data centric approach, sent and received data are compared for the detection of malicious behaviors. This approach is simulated in MATLAB. In the article [6], the authors proposed a dynamic defense strategy to confuse a DOS malicious attacker; then, to reduce the losses caused by the malicious attacker, some security services are added to a port which is valueless to the malicious node. This method is inspired by port hopping. The article [7] is based on congestion-aware DOS attack detection in NDN: here, when multiple interests come in bursts, the router will check the satisfaction ratio, and based on this ratio the procedure will detect suspicious behavior and start to investigate these incoming interests. If such behavior is found, the router will drop those packets and invoke a congestion control procedure. This proposed model is implemented in ndnSIM, and the attacks are mitigated based on the network congestion. In [8], the authors discuss Named Data Networking security and mention 3 different kinds of interest flooding mitigation methods: token bucket with per-interface fairness, satisfaction-based interest acceptance, and satisfaction-based pushback. They evaluated and compared them, and according to their results the most effective approach is satisfaction-based pushback. In the article [9], the authors discussed different caching policies and their importance in ICN. These caching policies help ICN routers to deliver data fast and efficiently; the authors compared those policies in different scenarios and suggested which caching policy works best in which scenario. With the knowledge of the outcome of this article, we will be able to decide which caching policy must be used in the vehicular Adhoc network for efficient data delivery, and it should work well while mitigating an interest flooding attack. To overcome these kinds of DOS attacks, this article proposes a method to detect and mitigate the attack; in turn, it also helps in congestion control when the attacking node's behavior is doubtful and not known, by sending these doubtful interests to the interest shaping queue to keep the flow balanced and avoid service denial. In this article we mainly concentrate on detection methods for the DOS attack and ways to mitigate it, which is in Sect. 3; Sect. 4 describes our approach, implementation parameters and configuration; Sect. 5 shows the performance evaluation of this approach; and finally we conclude in Sect. 6.
3 Detecting and Mitigating DOS Attack

In the network, at a particular node, the Interest Fulfillment Ratio (IFR) is calculated whenever there is an incoming interest with a different name prefix. This statistic is computed periodically every 2 s, as this is the interest packet default expiration time in ndnSIM [2].
When a node finds that the IFR of a name prefix incoming from an interface is less than Imin and the total volume of interest packets with that particular prefix is greater than the value Nmax, this prefix can be called a malicious one. Here Nmax allows for legitimate requests that unfortunately go unfulfilled. If this is the case, that node will drop all the incoming interests with that particular prefix. If the IFR for a name prefix coming in at an interface is not less than Imin but less than another threshold value Imedium, this prefix will be recognized as a likely malicious one. If this is the case, that node will insert all those interests in the interest shaper queue to reduce the bandwidth consumption and accommodate legitimate traffic. If the IFR is greater than Imedium, the interest packets are legitimate packets and are transferred as usual. To avoid congestion as well as to reduce the interest sending rate for doubtful traffic, we use the interest shaper queue.

Interest Shaping. The interest shaper queue is present at each and every node in the network. When a likely malicious attack is detected with a name prefix in it, the interests which are associated with that attack are placed in the interest shaper queue. Later the node will forward these interests from the queue, which will ultimately slow down the interest sending rate, and the node will have full control over the interest forwarding speed. The interest shaper queue will also help in reducing congestion on that path and accommodates legitimate traffic.
Fig. 2. Interest shaping mechanism
In Fig. 2 above, R1 and R2 are the two in-network routers communicating between producer and consumer, and Q1 is the interest shaping queue placed at R1. When likely malicious interests are received at a particular node, these interests are placed in the interest shaping queue Q1. This method also helps in congestion control by allowing legitimate interests to be sent directly over the network without queuing them in the interest shaper queue.
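The decision logic described above can be summarised in the following minimal sketch; it is a stand-alone Python illustration, not the ndnSIM implementation, and the per-face statistics at the end are invented example values (the thresholds are those later fixed in Sect. 4).

```python
IMIN, IMEDIUM, NMAX = 0.0, 0.25, 50   # Imin, Imedium, Nmax from Sect. 4

def classify_prefix(sent, fulfilled):
    """Decide, per name prefix and incoming face, how to treat new interests."""
    ifr = fulfilled / sent if sent else 1.0
    if ifr <= IMIN and sent > NMAX:
        return "drop"       # malicious: drop all interests with this prefix
    if ifr < IMEDIUM:
        return "shape"      # likely malicious: push into the interest shaper queue
    return "forward"        # legitimate: forward as usual

# Example per-face statistics, refreshed every 2 s (the interest lifetime)
stats = {("face1", "/road/alerts"): (520, 0),    # (interests sent, interests fulfilled)
         ("face2", "/maps/tiles"):  (90, 80)}
actions = {key: classify_prefix(*counts) for key, counts in stats.items()}
```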
4 Implementation

To mimic the actual scenario of the vehicle Adhoc network (see Fig. 3), we have considered a 7-node 2-bottleneck topology, where C1 to C5 are consumers and P1 and P2 are the producers. This topology is totally Adhoc and the links are wireless links connecting the nodes to each other. Node C2 is the attacker node in this implementation, where it tries to jam the network by sending malicious interests. C1 and C3 are the primary nodes to be attacked. This scenario is implemented in ndnSIM [2] with the above topology (Table 1).
Fig. 3. 7-node 2-bottleneck topology

Table 1. Link configuration

X  | Y  | Capacity | Delay | Max packets
C1 | P1 | 10 Mbps  | 1 ms  | 100
C1 | P2 | 10 Mbps  | 1 ms  | 100
C1 | C2 | 1 Mbps   | 1 ms  | 20
C1 | C3 | 1 Mbps   | 20 ms | 20
C5 | C3 | 10 Mbps  | 50 ms | 100
C4 | C3 | 10 Mbps  | 10 ms | 100
C2 | C3 | 1 Mbps   | 1 ms  | 20
5 Performance Evaluation

The interest fulfillment rate and the Pending Interest Table (PIT) consumption rate are the parameters used to measure the effectiveness of our proposed approach. The PIT consumption rate is calculated as in (1):

PIT Consumption Rate = (Total no. of PIT entries in real time / PIT total size) × 100%   (1)

The interest fulfillment rate (IFR) is calculated according to (2):

IFR = (Fulfilled interest packets / Total no. of interest packets sent out) × 100%   (2)
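The two metrics translate directly into code. The following minimal helpers, with assumed function and argument names, simply restate Eqs. (1) and (2):

```python
def pit_consumption_rate(current_pit_entries, pit_total_size):
    """Eq. (1): share of the Pending Interest Table in use, as a percentage."""
    return current_pit_entries / pit_total_size * 100.0


def interest_fulfillment_rate(fulfilled_interests, interests_sent):
    """Eq. (2): fraction of sent interests that were satisfied, as a percentage."""
    if interests_sent == 0:
        return 0.0
    return fulfilled_interests / interests_sent * 100.0


# Example with the PIT size of 500 used in the evaluation below:
print(pit_consumption_rate(450, 500))        # -> 90.0
print(interest_fulfillment_rate(80, 100))    # -> 80.0
```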
Here we set the PIT size to 500 and the total simulation time to 100 s. In the simulation, a normal user sends interests at a rate of 100 interests per second; once the attacker starts the attack, it floods interests at a rate of 500 interests per second. The threshold values discussed in Sect. 3 are Imin = 0, Imedium = 25% and Nmax = 50. Figures 4 and 7 show the status of the pending interest tables of consumers C1 and C3 during the simulation.
In this simulation, the attacker starts attacking at 20 s and the attack continues until 60 s. Looking closely at these figures, the two consumers are attached to in-network bottleneck links, so most of the traffic passes through these nodes. At the beginning of the simulation, the consumers start requesting data by sending interest packets towards the producers. When the attack begins at the 20th second, these nodes are flooded with unwanted interests, PIT entries build up and the PIT becomes full. Since these two nodes are attacked and there is no room for legitimate traffic, the network gets jammed at the start of the attack. At the 60th second, our approach detects the malicious behaviour and mitigates the attack. When there is no malicious attack, most interest requests are fulfilled. As shown in the results, PIT consumption is not high for the nodes during the first 20 s, but when the attack starts, the PITs of C1 and C3 fill up due to interest flooding. Our approach then manages and mitigates the attack, and the PIT returns to its previous state. When there is no threat from the attacker, nodes show a high IFR (interest fulfillment rate), as shown in the results below, but when the attack starts at the 20th second, in-network nodes are flooded with malicious unwanted interests, resulting in the lowest IFR. Our proposed approach starts to work when the attacker floods interests at C1 and C3: these two nodes successfully detect the attack and start dropping packets received from the attacker C2. In this way the interest flooding is mitigated (Figs. 5 and 6).
Fig. 4. PIT consumption rate and interest fulfillment rate at C1
Fig. 5. PIT consumption rate and interest fulfillment rate at C4
Fig. 6. PIT consumption rate and interest fulfillment rate at C5
6 Conclusion

In this article we have proposed a DOS attack mitigation scheme for ICN-based VANET. Our approach calculates the interest fulfillment rate to detect malicious traffic and differentiate it from legitimate traffic. When interests are doubtful, that is, when it is difficult to decide whether they are legitimate or malicious, the interest shaper puts such interests in the shaper queue for better flow control and to avoid congestion and network jamming. Our approach is specifically designed for ICN-based vehicular ad hoc networks and is simulated in ndnSIM; the results show that it is capable of mitigating the interest flooding (DOS) attack efficiently without hurting legitimate traffic.

Fig. 7. PIT consumption rate and interest fulfillment rate at C3
References

1. Patil, M.R., Agilandeeswari, L.: A role of routing, transport and security mechanisms in information centric network. Int. J. Recent Technol. Eng. (IJRTE) 8(2S4), 196–203 (2019)
2. NS-3 based Named Data Networking (NDN) simulator home page. https://ndnsim.net/current. Accessed 15 Oct 2020
3. Kumar, S., Mann, K.S.: Prevention of DoS attacks by detection of multiple malicious nodes in VANETs. In: 2019 International Conference on Automation, Computational and Technology Management (ICACTM), London, United Kingdom, pp. 89–94 (2019). https://doi.org/10.1109/ICACTM.2019.8776846
4. Singh, A., Sharma, P.: A novel mechanism for detecting DOS attack in VANET using enhanced attacked packet detection algorithm (EAPDA). In: 2015 2nd International Conference on Recent Advances in Engineering & Computational Sciences (RAECS), Chandigarh, pp. 1–5 (2015). https://doi.org/10.1109/RAECS.2015.7453358
5. Kumar, S., Mann, K.S.: Detection of multiple malicious nodes using entropy for mitigating the effect of denial of service attack in VANETs. In: 2018 4th International Conference on Computing Sciences (ICCS), Jalandhar, pp. 72–79 (2018). https://doi.org/10.1109/ICCS.2018.00018
6. Jie, Y., Li, M., Guo, C., Chen, L.: Dynamic defense strategy against DoS attacks over vehicular ad hoc networks based on port hopping. IEEE Access 6, 51374–51383 (2018). https://doi.org/10.1109/ACCESS.2018.2869399
7. Benmoussa, A., et al.: A novel congestion-aware interest flooding attacks detection mechanism in named data networking. In: 2019 28th International Conference on Computer Communication and Networks (ICCCN), Valencia, Spain, pp. 1–6 (2019). https://doi.org/10.1109/ICCCN.2019.8847146
8. Afanasyev, A., Mahadevan, P., Moiseenko, I., Uzun, E., Zhang, L.: Interest flooding attack and countermeasures in Named Data Networking. In: 2013 IFIP Networking Conference, pp. 1–9. IEEE, May 2013
9. Hegadi, R., Kammar, A., Budihal, S.: Performance evaluation of in-network caching: a core functionality of information centric networking. In: 2019 International Conference on Data Science and Communication (IconDSC), pp. 1–8. IEEE, March 2019
10. Ahlgren, B., Dannewitz, C., Imbrenda, C., Kutscher, D., Ohlman, B.: A survey of information-centric networking. IEEE Commun. Mag. 50(7), 26–36 (2012)
11. Jacobson, V., et al.: Networking named content. In: Proceedings of CoNEXT (2009)
12. Zhang, L., Afanasyev, A., Burke, J., Jacobson, V., Claffy, K., Crowley, P., Papadopoulos, C., Wang, L., Zhang, B.: Named data networking. SIGCOMM Comput. Commun. Rev. 44(3), 66–73 (2014)
13. Compagno, A., Conti, M., Gasti, P., Tsudik, G.: NDN interest flooding attacks and countermeasures. In: Annual Computer Security Applications Conference (2012)
14. Tang, J., Zhang, Z., Liu, Y., Zhang, H.: Identifying interest flooding in named data networking. In: Green Computing and Communications (GreenCom), 2013 IEEE and Internet of Things (iThings/CPSCom), IEEE International Conference on and IEEE Cyber, Physical and Social Computing, pp. 306–310. IEEE (2013)
15. Named Data Networking home page. https://named-data.net/. Accessed 15 Oct 2020
From Machine Learning to Deep Learning for Detecting Abusive Messages in Arabic Social Media: Survey and Challenges

Salma Abid Azzi(B) and Chiraz Ben Othmane Zribi

National School of Computer Science, Manouba University, Manouba, Tunisia
{Salma.Abid,Chiraz.zribi}@ensi-uma.tn
Abstract. The pervasiveness of social networks in recent years has revolutionized the way we communicate. Every person can now freely and anonymously share his or her thoughts, opinions and ideas in real time. However, social media platforms are not always a safe environment, due to the increasing propagation of abusive messages that severely impact the community as a whole. The rapid detection of abusive messages remains a challenge for social platforms, not only because of the harm such messages may cause to users but also because of their impact on the quality of service the platforms provide. Furthermore, the detection task becomes more difficult when content is generated in a language known for its complexity, richness and specificities, like the Arabic language. The aim of this paper is to provide a comprehensive review of the existing approaches for detecting abusive messages from social media in the Arabic language. These approaches extend from the use of traditional machine learning to the incorporation of the latest deep learning architectures. Additionally, a background on abusive messages and Arabic language specificities is presented. Finally, challenges are described for better analysis and identification of future directions.

Keywords: Machine learning · Deep learning · Natural language processing · Arabic language · Abusive messages · Social media
1 Introduction
Given the wide spread of abusive messages in social media, many researchers are fighting this scourge by exploring various research areas using Natural Language Processing techniques. Contributions in different languages vary in their research terms: many researchers refer to frequently used expressions covering the abusive message phenomenon in general, such as "hate speech", while others employ a narrower subtype such as the spread of "violent content". Over the last decades, there has been significant interest in automating the detection of abusive content, not only because its effects are increasingly dangerous for individuals and their social relationships, but also
because existing solutions have not achieved considerable success in detecting and preventing harmful and abusive messages. For all companies hosting user-generated content, and more particularly for social platform owners, the need for effective automated techniques is continuously growing, along with a major moderation issue: an abusive message might not simply depend on keywords but much more on other aspects such as users' cultures, world knowledge and the circumstances at the time of posting; this is why freedom of expression must be respected first of all. Recently, the rate at which Arabic content is generated has been growing, along with the varying usage trends of influential social platforms across Arab countries. In the context of the political and social transformations the Arab world has witnessed, activists have heavily used social platforms for disseminating views antagonistic to several Arab governments [1] or for planning insurgent attacks and terrorist incidents. The need to restrict and regulate social media usage is, therefore, beyond doubt. As a particular research area, the automatic identification of abuse in Arabic social media has attracted a limited number of researchers compared, for instance, with the contributions on the English language. This brings us, on the one hand, to the characteristics of the Arabic language and, on the other hand, to the particular norms and cultures of Arab regions, both of which have a direct impact on the investigation and detection process. The rest of the paper is organized as follows: Sect. 2 defines abusive messages in social media and their different types, while Sect. 3 goes through the specificities and difficulties of the Arabic language. Then, approaches for the automatic detection of Arabic abusive messages in social media are presented in Sect. 4. Finally, we analyze the discussed papers and highlight challenges for future research in Sect. 5.
2 Abusive Messages in Social Media

2.1 Definition
Social media abuse is the usage of online platforms to spread any content that could intentionally injure or embarrass another person. It is the general concept that covers all hurtful language [5]. However, defining abuse is still a matter of controversy and a debatable issue, as it depends on people's social and political backgrounds: some may perceive an utterance as abusive while others do not. This interpretation is closely linked to cultural norms and to the topics commonly debated in the same geographic regions. For instance, a tweet meaning "Where is Moeidh's bamboo stick? #Naked_youth_in_Dammam_Corniche" does not make any sense to non-Saudi people, or rather to those who do not know the violent video of Mr. Moeidh hitting young boys with a bamboo stick; yet those who watched the video beforehand would interpret the content as hurtful and conveying a clear incitement to violence against youth in Dammam.
While abusing normally implies the use of harmful terms (Table 1(c)), we undoubtedly need to exclude the case where such explicit terms are used to speak about a related topic without an undercurrent of offense (Table 1(d)). On the other hand, abuse can also be committed without being expressed directly, as in the sentence in Table 1(e), where the user expresses happiness about a terrorist act and wishes death to more people.

Table 1. Explicit and implicit abusive messages
Here, we differentiate the implicit and the explicit forms. According to [2], explicit abusive language is that which is unambiguous in its potential to be abusive. Whatever form it may take, it is the desire to control, manipulate and harm someone else using social media.

2.2 Types of Abusive Messages
Many criteria could be used to distinguish the different types of abusive messages. In the following, we choose to differentiate the possible types based on the content of the messages and their effect on the abused person.

Discriminative Content. It includes any sort of prejudice against a person showing different physical characteristics, belongings or preferences. In the following, and for each case, we give real illustrative examples from social media. Targets may not only be individuals but also a whole community sharing common characteristics, which are mainly, but not exclusively:

• Physical and apparent characteristics: posts that disparage people on the basis of visible specificities like gender, skin color and disability. This is one of the most common and dangerous forms of discriminative abuse. Example:
You, black dog, you gave us peace with your death.
• Belongings: This category comprises all kinds of hateful expressions against anyone because of a group to which they belong. We can cite, as examples, xenophobia and regionalism.
Among us is a Chinese woman, we will all be infected. She has to return to her country.

• Preferences: committed when inciting hate against or diminishing a minority because of their personal choices and preferences, for instance abusing an individual because of his favorite music band or his political preferences.
The political islamist would sell his mother if he has an interest with his father's wife.

Violent Content. In the wake of the increasing number of extremist groups and criminals, social media users have become more exposed to aggression. The rise in such activities followed the Arab Spring: the countries that lived through series of anti-government protests, uprisings and armed rebellions in the early 2010s were the most affected by this emergence. The violent content spread by extremists usually contains terms of physical attack such as death, murder and burn. Correspondingly, we can define online violence as the use of any term threatening or promoting intentional violence. However, violence is not always clearly expressed and is not exclusively committed by terrorists and criminals: simple and short messages from ordinary users may also legitimate and reinforce violence.
The previous sentence refers to a tweet posted after a terrorist attack saying "Finally, a terrorist blew himself up among people who deserved it, not in a mosque": it does not only share joy and happiness after an anti-humanitarian act but also justifies it.

Adult Content. Vulnerable categories of people like children and youth are dangerously exposed to the psychological threat of adult-oriented content in social media. It mainly includes pornography, texts illustrating sexual behaviors and, more importantly, child sexual abuse. Due to religious and cultural sensitivities, the depiction of pornography is totally forbidden and inappropriate, yet the number of consumers is strikingly high [3]. That is why, in this category, the main focus needs to be on those who share and spread the content, along with the creators themselves.
3 Which Difficulties for Detecting Arabic Abusive Messages?

3.1 Vowelization
Vowelization refers to the fact that the Arabic language contains, besides consonants, only 3 long vowels (alif, waaw, yaa) and 3 short vowels that belong to a larger list of diacritics and take the form of accent symbols (Fatha, Kasra, Damma). In modern written Arabic texts, no vowel signs are given. Words can therefore be read differently and have several meanings, so the reader has to deduce the diacritics from prior knowledge and rely on context to compensate for the lack. A typical example illustrating the impact of neglecting vowels on the meaning of a word is a pair of words that are apparently the same, differing only in the un-vowelized first letter: the first word is a call for recovery while the second is a call for loss and death.
Agglutination
The Arabic language is highly agglutinative: Words can be formed from a base, to which we can add affixes (prefixes and/or suffixes) and clits (enclitic and/or is a one Arabic word including Enclitic, proclitic). For example, Suffix, Radical, Prefix and Proclitic. If we try to translate it to the English language, we will get a complete sentence: “You (Feminine) will kill them (masculine Plural)" (Table 2).
Table 2. Agglutination in the Arabic word:
Consequently, Arabic words may have several possible segmentations, which enlarges the sources of morphological complexity. For instance, the interpretation of a word can depend on whether its first letter is seen as a conjunction (proclitic) or as a simple letter of the word:

• Noun (corruption)
• Proclitic (so) + verb (prevail)

While corruption is literally the abuse of entrusted power for private gain, the second form is completely different and has no abusive intention.
3.3 Grammatical Ambiguity
Misinterpretation occurs frequently in the Arabic language: one word, sometimes even vowelized, can have several grammatical interpretations in different contexts. For instance, we can find a word with two possible meanings:

• "A snake" (noun)
• "Became evil" (verb)
Both refer to the same form and have the same vowelization. However, as snakes are poisonous and malicious, some would use the word with an abusive intention, while the second form is a rarely used and less abusive informative verb.

3.4 Semantic Ambiguity
A given word in the Arabic language can be semantically interpreted in several ways, each having the same vowelization and the same grammatical class. For example, one word is a noun, yet whether it means retardation or absence depends on the context.
As we can deduce, absence is not an abusive word, but accusing anyone of backwardness is an offensive attitude.

3.5 Varieties of Arabic
When it comes to communicating, social media users tend to express themselves in the most comfortable way, as there are no language rules to be respected. In particular, people in Arab countries who received an education in at least one foreign language often combine it with dialectal Arabic and Modern Standard Arabic (MSA), a combination that can be called "the social media language" or "chat language". Consequently, the frequent use of this "language" has affected the script itself to such a degree that we can find, in one sentence, the Latin script, the Arabic script and even numbers that represent Arabic letters. As highlighted in [?], the textual content of social networks is characterized by an intense orthographic heterogeneity, which makes its processing a serious challenge for NLP tools. [?] also gave an example illustrating the translation of one English word into Arabic with various spelling variants. Table 3 shows an example of a sentence in the chat language, a combination of the French language, Modern Standard Arabic and the Tunisian dialect. The sentence is also written in different scripts, with the digits '3' and '5' standing for Arabic letters. The sentence says: I swear! Women are very stupid.
Table 3. An abusive message written in different languages and scripts
4 Approaches for the Automatic Detection of Arabic Abusive Messages in Social Media
Research on detecting abusive language in social media and its closely related concepts has been deeply investigated for the English language. Other common languages like German, Italian or Indian languages have also seen a high growth rate of studies in the same area. By contrast, similar research for the Arabic language is still scarce, since detecting abusive unstructured texts from social media in such a specific language is a highly challenging issue. Researchers have traditionally taken advantage of NLP computational tasks to interpret context-dependent natural language, such as abusive language, and make it understandable by the machine. One of the approaches is lexicon-based, which relies on building (domain-specific) lexicons by extracting useful features such as lexical terms. However, the tendency towards machine learning based approaches is highly remarkable. The rationale behind this is to overcome the limitations of lexicon-based approaches, which can fail to detect foul language by missing variations and invented or misspelled words [4]. Recently, a clear transition from classical machine learning to deep learning approaches has been noticed for the detection of abusive language in social media; the emergence of word embeddings like word2vec as semantic features was among the reasons for this trend. These features provide a vector representing the meaning of a word or phrase. Researchers are turning to deep learning for complex tasks like the detection of abusive messages, but it is not yet obvious that it exceeds traditional machine learning approaches in terms of efficiency. A promising future is expected in this area, as deep learning models rely entirely on artificial neural networks but with extra depth. It is important to mention that deep learning models need intensive training and require datasets of sufficient size. In what follows, we focus on contributions for detecting abusive messages in the Arabic language: both classical machine learning approaches and deep learning approaches are described. After that, a discussion and analysis are conducted.
4.1 Machine Learning Approaches
Most contributions deal with the detection of abusive messages for the Arabic language as a supervised machine learning task. [17] aimed to detect abusive accounts that use obscenity, profanity, slang and swearing words in Arabic text format.
They applied an SVM classifier and achieved a predictive accuracy of 96%. The authors in [14] used a publicly available dataset created in [15] from offensive YouTube comments, taking into account the differences between Arab dialects and the diversity of perception of offensive language throughout the Arab world. In [14] it was used in a supervised machine learning scenario where an SVM classifier was again trained, combined with n-gram, word-level features and several preprocessing techniques; an accuracy of 90.05% was achieved. In addition to the previous behaviors, rogue and spam accounts were the issue addressed by [12] in their supervised approach. They claim that spammers distribute adult content through Arabic tweets, which Arab norms and cultures prohibit. Their experiment showed that the random forest classification algorithm with 16 features performs best, with accuracy rates greater than 90%. [13] also constructed a large-scale Arabic Twitter dataset of adult content, together with three lexica involving hashtags, unigrams and bigrams. The analysis of the data showed the geographical distribution of adult content targets. As in other languages, some contributions like [11] have addressed cyberbullying in Arabic social media, arguing that its threat to Arab children is rising with the abundance of electronic devices. For the training procedure, they used two classifiers, Naive Bayes and SVM, chosen based on an analysis of previous work in the machine learning field. The precision achieved with the two classifiers was 90.1% and 93.4% respectively. Violence and extremism have also been investigated for the Arabic language. [10] claimed that most ISIS members are, after all, Arabs and that classifying Arabic Twitter users into pro-ISIS and anti-ISIS would help to look more at the "soil on which violent behavior can flourish". To accomplish that task, they trained a classifier that predicts future support of or opposition to ISIS with 87% accuracy. A machine learning scenario was also presented in [8] with three different classification algorithms: SVM, Naive Bayes and AdaBoost. The aim was to automatically classify a tweet as containing material supporting jihadist groups or not. The semi-supervised approach is less popular for the Arabic language. Research using it usually aims, firstly, to avoid the difficulties of supervised approaches when dealing with complex, noisy and unstructured data and, secondly, to attenuate the impact of insufficient manual labeling. It was used in [9] to detect tweeps who are media supporters of jihadist groups disseminating propaganda content online, using a combination of data-dependent and data-independent features. Results showed that the model performed well for classifying English tweeps and tweets but performed significantly worse on Arabic data. Furthermore, an unsupervised framework was presented in [7] for detecting violence on Arabic Twitter using a probabilistic nonlinear dimensionality reduction (DR) technique and a clustering algorithm. They showed that violent and non-violent Arabic tweets are not separable using k-means in the original high-dimensional space and that better results are achieved by clustering in the low-dimensional latent space of SGPLVM.
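As an illustration of the supervised pipelines surveyed above (SVM over n-gram features, as in [14] and [17]), the following is a minimal scikit-learn sketch; the toy texts, labels and hyper-parameters are placeholders and do not reproduce the features or preprocessing of those papers:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy labelled comments; the real datasets contain thousands of examples.
texts = ["تعليق عادي", "تعليق مسيء جدا"]   # placeholder Arabic strings
labels = ["clean", "offensive"]

# Word-level unigrams and bigrams weighted by TF-IDF; character n-grams are
# another common choice for noisy, dialectal Arabic text.
model = make_pipeline(
    TfidfVectorizer(analyzer="word", ngram_range=(1, 2)),
    LinearSVC(),
)
model.fit(texts, labels)
print(model.predict(["تعليق عادي"]))
```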
4.2 Deep Learning Approaches
Deep learning is a set of machine learning methods based on neural networks with multiple hidden layers. Trying to learn at a high level of abstraction, a deep learning model is based on layers of nonlinear processing units, each extracting characteristics from the previous layer and transmitting them to the next. Compared to traditional machine learning algorithms, deep learning can absorb a massive amount of unstructured data. Recently, [21] claimed that little research has been done using deep learning on the Arabic language. With the aim of filling this gap, they used the dataset proposed by [15] and evaluated the performance of four different neural network architectures: Convolutional Neural Network (CNN), Bidirectional Long Short-Term Memory (Bi-LSTM), Bi-LSTM with attention mechanism, and a combined CNN-LSTM architecture. The best recall of 83.46% was achieved by the combined CNN-LSTM, while the highest accuracy, precision and F1-score of 87.84%, 86.10% and 84.05% respectively were obtained with the CNN. Moreover, [20] built a dataset of religious hate speech manually annotated as hate or not hate and evaluated it using different classifiers; the best results were achieved using deep learning, with an F1-score of 79%. Later, the same dataset was used by [26] to propose the ARHNet (Arabic Religious Hate Speech Net) model, which incorporates both Arabic word embeddings and social network graphs. They developed multiple deep learning models using GRU, LSTM, Bi-LSTM and Bi-GRU. The ARHNet model (LSTM + CNN + NODE2VEC) outperforms the others in terms of recall (89%), F1-score (78%) and AUROC (86%). A more recent study [27] explored multiple deep learning models like RNN, LSTM and GRU using TF-IDF features and AraVec word embeddings. The best precision (87%), macro-F1 (83%) and weighted-F1 (84%) were achieved by the system they named SalamNet, which consists of a Bi-GRU architecture with TF-IDF features. Finally, [28] claimed that their "quick" approach took 3 days to implement yet achieved reasonable performance. They first investigated the performance of 15 different classical and neural learning models like SVM, Random Forest, XGBoost, Extra Trees, Logistic Regression, RNN and CNN; they then showed that a joint CNN and LSTM neural learning architecture outperforms the classical ones. With the aim of comparing the performance of machine learning and deep learning algorithms for detecting Arabic hate speech, [25] built a dataset from various social media platforms. Twelve machine learning algorithms (MultinomialNB, ComplementNB, BernoulliNB, SVC, NuSVC, LinearSVC, LogisticRegression, Decision Tree, SGD, Ridge, Perceptron and Nearest Centroid) and two deep learning architectures (CNN and RNN) were applied to evaluate performance on the dataset. Among the machine learning algorithms, ComplementNB yielded the best performance with an accuracy score of 97.59%, while among the deep learning architectures, RNN gave the highest performance with an accuracy score of 98.70%.
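To make the combined CNN-LSTM idea explored in [21] and [28] more concrete, the following Keras sketch stacks a convolutional layer over an embedding layer and feeds the pooled sequence into an LSTM. The vocabulary size, sequence length, layer widths and the randomly initialized embedding are all assumptions; in the surveyed work, the embedding weights would typically come from pre-trained AraVec vectors:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE, EMBED_DIM, MAX_LEN = 20000, 300, 100   # placeholder sizes

model = models.Sequential([
    # Pre-trained AraVec word2vec vectors would normally initialize this layer.
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    layers.Conv1D(128, 5, activation="relu"),   # local n-gram-like patterns
    layers.MaxPooling1D(2),
    layers.LSTM(64),                            # longer-range dependencies
    layers.Dense(1, activation="sigmoid"),      # binary: abusive vs. clean
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Dummy batch of two token-id sequences, just to exercise the forward pass.
dummy = np.random.randint(0, VOCAB_SIZE, size=(2, MAX_LEN))
print(model.predict(dummy).shape)   # -> (2, 1)
```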
5 Discussion and Challenges for Future Research
Throughout this survey paper, we have provided an overview of the efforts to detect abusive messages in Arabic social media. In Table 4, we summarize all the discussed papers for this issue and organize the current state of the field by topic. The classes, the Arabic variety, the approach, the social platform and the results of each paper are also listed.

Table 4. A summary of the discussed papers for detecting abusive messages in Arabic social media
Paper | Topic | Classes | Variety | Approach | Platform | Dataset | Results
[16] | Abusive Messages | Obscene / Offensive / Clean | Standard Arabic and Dialects | Lexicon-based; features: unigram and bigram, Log Odds Ratio (LOR), seed word lists | Twitter | Created for public use | F1-score: 0.6
[17] | Abusive Accounts | Abusive / Non-Abusive | Standard Arabic | Supervised; features: PageRank, semantic orientation, statistical measures | Twitter | A publicly available dataset is used | F1-score: 0.96
[18] | Abusive Arabic Accounts | Spammers / Non-Spammers | Standard Arabic | Lexicon-based; features: bag of words (BOW), n-gram | Twitter | A publicly available dataset is used | F1-score: 0.96
[21] | Offensive Language | Offensive / Inoffensive comments | Informal Arabic | Deep learning; features: AraVec word embedding | YouTube | A publicly available dataset is used | F1-score: 0.84
[14] | Offensive Language | Offensive / Inoffensive comments | Informal Arabic | Supervised; features: n-gram | YouTube | A publicly available dataset is used | F1-score: 0.82
[27] | Offensive Language | Offensive / Not offensive | Standard Arabic | Deep learning; features: AraVec word embedding and TF-IDF | Twitter | A publicly available dataset is used | Weighted-F1: 0.82
[6] | Cyberbullying | Bullying / Non-Bullying | Informal Arabic | Lexicon-based | Twitter, YouTube | Created for the experiment | F-score: 0.81
[11] | Cyberbullying | Yes / No | Arabic and English | Supervised; features: tweet-to-SentiStrength feature vector | Twitter | Created for the experiment | F1-score: 0.92
[7] | Violent Content | Violence / Non-Violence | Informal Arabic | Unsupervised; features: sparse Gaussian process latent variable model, morphological features, vector space model | Twitter | A publicly available dataset is used | F1-score: 0.58
[8] | Jihadist Messages | Jihadist / Non-Jihadist | Standard Arabic | Supervised; features: sentiment, temporal and stylometric features | Twitter | A publicly available dataset is used | Accuracy: 0.95
[9] | Multipliers of jihadism | Tweep involved in mujahidden / Tweep not involved | Standard Arabic | Semi-supervised; features: data-dependent and data-independent features | Twitter | Created for the experiment | F1-score: 0.86
[10] | Potential ISIS Supporters | Pro-ISIS / Anti-ISIS | Standard Arabic | Supervised; features: temporal patterns, hashtags | Twitter | Created for the experiment | F1-score: 0.87
[12] | Rogue Content | Spam Account / Non-Spam Account | Standard Arabic | Supervised; features: user-based and content-based features | Twitter | Created for the experiment | Accuracy: 0.93
[13] | Adult Content | Adult Content / Regular Content | Standard Arabic | Supervised; features: lexicon, n-grams, bag-of-means (BOM) | Twitter | Created for public use | F1-score: 0.78
[20] | Religious Hate Speech | Hate / Not Hate | Informal Arabic | Deep learning; features: word embeddings (AraVec) | Twitter | Created for public use | F1-score: 0.77
[26] | Religious Hate Speech | Hate / Not Hate | Informal Arabic | Deep learning; features: word embeddings and social network graphs | Twitter | A publicly available dataset is used | F1-score: 0.78
[28] | Hate Speech | HS / NOT_HS | Standard Arabic | Deep learning; features: TF-IDF and word embeddings | Twitter | Created for the experiment | Macro-F1 score: 0.73
Based on some observations of this table, challenges can be pointed out regarding the approaches, the social platforms and the datasets used.

• Approaches: Detecting abusive language in social networks is a task usually framed as a supervised learning problem
combining numerous features and using classifiers that vary from one contribution to another, since there is no single one that generally achieves the best classification performance. Still, it can be inferred that promising results were found and proved successful in the past ([11,14,17]), despite the challenges that remain across all these systems. Annotating data, for example, is a time- and effort-consuming process in supervised machine learning approaches, not only because of the lack of a unique definition for the concept of abusive messages but also because annotation reliability is not always guaranteed and has a direct impact on the detection results. Another issue facing the scientific community is class imbalance: the clean text is much larger than the abusive text, and removing or processing it requires more effort. All this and more has led to the use of techniques that are not supervised, including semi-supervised and unsupervised learning techniques. While keyword- and lexicon-based approaches for abusive language detection are faster and straightforward to understand, they have several drawbacks; in particular, a system that relies entirely on abusive keywords will fail to detect implicit abuse. It is also notable that the task of abusive message detection is predominantly coarse-grained: existing work typically focuses on binary classification or on differentiating among a set of three categories, like the work of [16]. More fine-grained multiclass classification is a challenging issue, as it requires a finer differentiation between classes. Another challenge is that abusive messages have, in addition to a fine-grained class, targets and actors that also need to be detected to provide additional information about the abuse. However, based on our literature study, there has been no research on abusive messages with fine-grained multiclass classification including the detection of actor, category and target. A rising number of approaches are using deep learning architectures progressively and extensively ([27],[?],[28]), on the basis of a strong belief that they have a high potential to further contribute to the issue [29]. It is worth mentioning that deep learning has shown a strong learning capacity despite the inexplicability of its behaviour and the difficulty of giving simple and clear reasons for a particular decision.

• Social platforms: As far as social platforms are concerned, Twitter is the most accessed to collect data, even though some contributions have chosen other data sources like YouTube or Facebook. [19] claimed that since the Arab Spring, the number of Twitter users in Arab nations has been escalating. YouTube has also attracted contributors, on the assumption that haters tend to post comments without risking unpleasant criticism themselves [15]. Finally, according to [11], "Collecting Facebook data was harder due to the restrictions imposed by Facebook security and privacy measures. In Facebook the data source had to be specified beforehand". This may explain the number of studies on this platform. Generally, social platforms have strict data usage policies, which makes data access a difficult task for researchers. Even though Twitter resources are valuable, some specificities like the maximum length of a tweet can reduce the information about the general topic and indirectly affect the meaning of the data.
• Datasets: The major difference that can be distinguished between contributions is the dataset used for the experiments. Some studies are carried out on datasets privately collected by the researchers and oriented towards a particular experiment, while others rely on publicly available, recognized and meticulously labeled datasets, which are remarkably rare. Among the works we collected, some presented datasets that are now available for future use (e.g. [13,15,16,20] and [24]). [23] attributed the scarcity of research on detecting abusive language in Arabic dialects to the lack of the needed publicly available resources. That is why they introduced the first publicly available Tunisian Hate Speech and Abusive (T-HSAB) dataset, with the objective of becoming a benchmark for the automatic detection of online Tunisian toxic content. [23] affirmed that Tunisian people entered a new era of freedom of expression with the "Jasmine Revolution", which led to an unrestricted spread of toxic content mainly related to politics, social causes and religion. Similar work was done before by [22], introducing the first publicly available Levantine Hate Speech and Abusive (L-HSAB) Twitter dataset. When it comes to the Arabic language, the detection of abusive messages from social media is much more complicated because defining abuse is closely related to the geographical location and cultural background of the users, to such a degree that one Arabic word can be abusive in a specific dialect and clean in another. This may explain the different choices between papers: some opted for a solution on Modern Standard Arabic while others worked on informal Arabic or a particular dialect.
6 Conclusion
The threat of abusive messages in social media is increasing exponentially, and countries all over the world are paying particular attention to fighting this problem. In particular, Arab countries are increasingly committed to developing automatic techniques for detecting such content in a rapid and consistent way. In this paper, we defined abusive messages in social media and enumerated some of their types and characteristics so that they can be reliably recognized. After that, we recalled specificities of the Arabic language to describe the difficulties that must be taken into account in the detection process. Moreover, we introduced a detailed review of the existing approaches for the detection of Arabic abusive messages in social media. Finally, we pointed out some research challenges. Our ongoing work will include, firstly, the creation of a multi-class labeled dataset with the objective of becoming a benchmark for the detection of Arabic abusive messages in social media; for this purpose, we aim to build a semi-automatic tool to create the dataset using active learning. Secondly, we intend to propose an automatic solution using deep learning techniques to perform multiclass classification.
References

1. Alhelbawy, A., Massimo, P., Kruschwitz, U.: Towards a corpus of violence acts in Arabic social media. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pp. 1627–1631. European Language Resources Association (ELRA), Paris (2016)
2. Waseem, Z., Davidson, T.: Understanding abuse: a typology of abusive language detection subtasks. In: Proceedings of the First Workshop on Abusive Language Online, pp. 78–84. Association for Computational Linguistics (2017)
3. Coletto, M., Aiello, L.M., Lucchese, C., Silvestri, F.: Pornography consumption in social media. CoRR abs/1612.08157 (2016)
4. Bosque, L.D., Garza, S.: Aggressive text detection for cyberbullying. In: Lecture Notes in Computer Science, Human-Inspired Computing and its Applications, pp. 221–232 (2014)
5. Al-Hassan, A., Al-Dossari, H.: Detection of hate speech in social networks: a survey on multilingual corpus. In: 6th International Conference on Computer Science and Information Technology, Dubai, UAE (2019)
6. Alharbi, B.Y., Alharbi, M.S., Alzahrani, N.J., Alsheail, M.M., Alshobaili, J.F., Ibrahim, D.M.: Automatic cyber bullying detection in Arabic social media. Int. J. Eng. Res. Technol. 12(12), 2330–2335 (2019)
7. Abdelfatah, K.E., Terejanu, G., Alhelbawy, A.A.: Unsupervised detection of violent content in Arabic social media. In: Computer Science & Information Technology (CS & IT), pp. 1–7 (2017)
8. Ashcroft, M., Fisher, A., Kaati, L., Omer, E.: Detecting jihadist messages on twitter. In: Proceedings of the 2015 European Intelligence and Security Informatics Conference, Manchester, UK, pp. 161–164 (2015)
9. Kaati, L., Omer, E., Prucha, N., Shrestha, A.: Detecting multipliers of jihadism on twitter. In: IEEE International Conference on Data Mining Workshop (ICDMW), Atlantic City, pp. 954–960 (2016)
10. Magdy, W., Darwish, K., Weber, I.: #FailedRevolutions: using twitter to study the antecedents of ISIS support. First Monday 21(2) (2016)
11. Haidar, B., Chamoun, M., Serhrouchni, A.: A multilingual system for cyberbullying detection: Arabic content detection using machine learning. Adv. Sci. Technol. Eng. Syst. J. 2(6), 275–284 (2017)
12. Alharbi, A., Aljaedi, A.: Predicting rogue content and Arabic spammers on twitter. Future Internet 11(11), 229 (2019)
13. Alshehri, A., Nagoudi, E., Alhuzali, H., Abdul-Mageed, M.: Think before you click: data and models for adult content in Arabic twitter. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) (2018)
14. Alakrot, A., Murray, L., Nikolov, N.S.: Towards accurate detection of offensive language in online communication in Arabic. Procedia Comput. Sci. 142, 315–320 (2018)
15. Alakrot, A., Murray, L., Nikolov, N.S.: Dataset construction for the detection of anti-social behaviour in online communication in Arabic. Procedia Comput. Sci. 142, 174–181 (2018)
16. Mubarak, H., Darwish, K., Magdy, W.: Abusive language detection on Arabic social media. In: Proceedings of the First Workshop on Abusive Language Online, pp. 52–56 (2017)
17. Abozinadah, E.A., Jones, J.H.: A statistical learning approach to detect abusive twitter accounts. In: Proceedings of the International Conference on Compute and Data Analysis, pp. 6–13 (2017)
18. Abozinadah, E.A., Jones, J.H.: Improved micro-blog classification for detecting abusive Arabic twitter accounts. Int. J. Data Mining Knowl. Manag. Process 6(6), 17–28 (2016)
19. Abozinadah, E.A., Mbaziira, A.V., Jones, J.H.: Detection of abusive accounts with Arabic tweets. Int. J. Knowl. Eng. 1(2), 113–119 (2015)
20. Albadi, N., Kurdi, M., Mishra, S.: Are they our brothers? Analysis and detection of religious hate speech in the Arabic twittersphere. In: 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 69–76. IEEE (2018)
21. Mohaouchane, H., Mourhir, A., Nikolov, N.: Detecting offensive language on Arabic social media using deep learning. In: 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS), pp. 466–471 (2019)
22. Mulki, H., Haddad, H., Bechikh Ali, C., Alshabani, H.: L-HSAB: a Levantine twitter corpus for hate speech and abusive language. In: Proceedings of the Third Workshop on Abusive Language Online, pp. 111–118. Association for Computational Linguistics, Florence (2019)
23. Haddad, H., Mulki, H., Oueslati, A.: T-HSAB: a Tunisian hate speech and abusive dataset. In: Arabic Language Processing: From Theory to Practice. Springer International Publishing (2019)
24. Alhelbawy, A., Massimo, P., Kruschwitz, U.: Towards a corpus of violence acts in Arabic social media. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pp. 23–28. European Language Resources Association (ELRA), Paris (2016)
25. Omar, A., Mahmoud, T., Abd El-Hafeez, T.: Comparative performance of machine learning and deep learning algorithms for Arabic hate speech detection in OSNs. In: Proceedings of the International Conference on Artificial Intelligence and Computer Vision (AICV2020), pp. 247–257. Springer, Cham (2020)
26. Chowdhury, A.G., Didolkar, A., Sawhney, R., Shah, R.: ARHNet - leveraging community interaction for detection of religious hate speech in Arabic. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pp. 273–280. Association for Computational Linguistics, Florence (2019)
27. Husain, F., Lee, J., Henry, S., Uzuner, O.: SalamNET at SemEval-2020 Task 12: deep learning approach for Arabic offensive language detection. In: Proceedings of the International Workshop on Semantic Evaluation (SemEval) (2020)
28. Abuzayed, A.: Quick and simple approach for detecting hate speech in Arabic tweets. In: Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, pp. 109–114. European Language Resource Association, Marseille (2020)
29. Pitsilis, G.K., Ramampiaro, H., Langseth, H.: Detecting offensive language in tweets using deep learning. CoRR, abs/1801.04433 (2018)
A-Part: Top-Down Clustering Approach for Mesh of Clusters FPGA

Khouloud Bouaziz1,2(B), Sonda Chtourou1, Zied Marrakchi3, Abdulfattah M. Obeid4, and Mohamed Abid1,2

1 CES Laboratory, National Engineering School of Sfax, University of Sfax, Sfax, Tunisia
{khouloud.bouaziz,sonda.chtourou,mohamed.abid}@enis.tn
2 Digital Research Center of Sfax (CRNS), Sfax, Tunisia
3 Mentor Graphics, Tunis, Tunisia
zied [email protected]
4 National Center for Electronics and Photonics Technology, KACST, Riyadh, Saudi Arabia
[email protected]
Abstract. Clustering is the process of effectively distributing synthesized and packed circuits into Field Programmable Gate Array (FPGA) clusters. Clustering is a crucial phase of the FPGA Computer-Aided Design (CAD) flow, since it largely determines the quality and performance of the placed and routed circuit in terms of power, area and delay. It is mainly controlled by the hardware constraints of specific target architectures. In this paper, we propose the Adapted Multilevel Partitioner (A-Part) as a clustering algorithm targeting Mesh of Clusters (MoCs) FPGA. We explore the impact of A-Part on MoCs FPGA performance compared to T-VPack. This paper shows through experimentation that A-Part improves power consumption, area and critical path delay by an average of 7%, 5% and 11% respectively compared to T-VPack for MoCs FPGA.

Keywords: FPGA · CAD tools · Mesh of Clusters · Clustering algorithms · T-VPack · Adapted Multilevel Partitioner
1 Introduction
Since the mid-eighties, the Field-Programmable Gate Array (FPGA) has known significant advancement and has become an attractive reprogrammable platform. Unlike Application-Specific Integrated Circuits (ASICs), FPGAs allow implementing arbitrary logic functions after manufacturing, thanks to the flexible logic and interconnect resources they provide. They enable fast emulation of design alternatives and hence a faster design cycle. Furthermore, they offer low Non-Recurring Engineering costs and speed up time-to-market. Nevertheless, FPGAs still face serious challenges in terms of power consumption, logic density and performance. According to [13], FPGAs incur considerable logic density, power and delay overhead compared to ASICs.
This overhead is caused for the most part by the flexible yet massive routing interconnect. The quality of an FPGA and its efficiency are determined by the FPGA architecture, the FPGA electrical design and the Computer-Aided Design (CAD) tool suite that maps circuits onto the FPGA. CAD algorithms perform all the steps required to convert a synthesizable RTL description into a configuration bitstream to be loaded into the FPGA. In this paper, we focus on exploring CAD algorithm aspects to enhance FPGA performance. The CAD flow consists of five distinct phases: synthesis, technology mapping, clustering, placement and routing. Among the main concerns of an FPGA-based implementation is the quality of its associated CAD tools. In fact, the benefits of a rich and well-designed FPGA architecture might be undermined unless CAD tools can take advantage of the architectural advancement provided by the FPGA. Efficient CAD algorithms are essential to optimize the use of FPGA resources and therefore to bridge the performance gap between FPGAs and customized computational devices like ASICs. Recent works have explored possible enhancements of the CAD algorithms used in the different stages of logic circuit implementation into FPGAs. In this work, we are interested in investigating clustering algorithms to optimize the performance of Mesh of Clusters based FPGAs (MoCs-based FPGAs). In fact, clustering quality has a notable effect on FPGA performance. For instance, in [20], the authors studied and improved clustering and placement algorithms based on Rent's rule; they claim that 90% of the gain introduced by their method originates from the clustering phase while only 10% is obtained through placement. In this context, various packing and clustering algorithms have been investigated to improve performance [3,5,12,16,19–21]. Clustering algorithms can be classified into bottom-up, top-down, or hybrid algorithms that combine both. Generally, bottom-up approaches [3,5,16,19,20] are used for mesh-based FPGA architectures since they enable easier control of clustering results based on FPGA architecture resources, namely cluster resources. Meanwhile, top-down approaches [12] are not commonly used for this type of FPGA topology: even though these methods can offer better solutions than bottom-up methods, they have difficulty in incorporating clustering constraints or clustering metrics due to the use of graphs. The target architecture in this work is a mesh-based FPGA. This architecture has been investigated in different aspects like cluster size, LUT size, Rent parameter ... [6]. In most works, the clustering method used was the bottom-up approach T-VPack [16]. This algorithm is well known and often used in academic research [4]. In this paper, we explore the possibility of tuning a top-down approach to meet the clustering constraints of the target MoCs FPGA. We propose the "Adapted Multilevel Partitioner" (A-Part) as a clustering approach and then explore and investigate its effects on the performance of the MoCs-based FPGA [6] in terms of power consumption, area and delay. The remainder of this paper is organized as follows: Sect. 2 briefly describes the target MoCs-based FPGA architecture. Section 3 describes the implemented configuration flow for the MoCs-based FPGA and details the metric models used to estimate power consumption, area and delay. Section 4 exhibits and details the proposed A-Part
approach that we adopt as the clustering method. Finally, Sect. 5 presents the experimental setup and the results of A-Part's effect on MoCs performance compared to T-VPack.
2 MoCs-Based FPGA Architecture
In this section, we present a brief description of the MoCs FPGA architecture under investigation; more details can be found in [6]. The MoCs FPGA is a matrix of clusters placed on a regular 2D grid connected by a unidirectional routing network and Switch Boxes (SBs), as illustrated in Fig. 1. Each cluster contains local Logic Blocks (LBs) connected by a depopulated intra-cluster interconnect inspired from [17] to decrease power and area overhead without penalizing flexibility. Each LB includes a k-input Look Up Table (LUT) accompanied by a Flip-Flop (FF). Figure 1 illustrates an example of a cluster with 8 4-input LBs. The depopulated intra-cluster interconnect is composed of Mini Switch Blocks (MSBs); an MSB is a full crossbar. The cluster interconnect includes an upward network and a downward network. The downward network connects cluster inputs to LB inputs using Downward MSBs (DMSBs); this network structure is based on the Butterfly Fat Tree (BFT) topology [17]. The upward network connects LB outputs to cluster outputs using Upward MSBs (UMSBs); it ensures that all LB outputs can reach all DMSBs and cluster outputs. Thus, LBs inside the same cluster are equivalent and their ordering has no impact on routing quality. Conventional mesh-based FPGA architectures use Switch Boxes (SBs) to connect horizontal and vertical adjacent wire segments and use Connection Boxes (CBs) to connect routing channels to cluster inputs and outputs.
Fig. 1. Structure of MoCs-based FPGA architecture and its main components
In the MoCs-based FPGA, SBs with a multi-level interconnect structure are used to connect horizontal and vertical adjacent routing channels and also to connect cluster inputs/outputs to adjacent tracks. As illustrated in Fig. 1, each cluster is connected to 4 adjacent SBs. Each SB is connected to the 4 neighboring SBs (SB Top, SB Bottom, SB Right, SB Left) and the 4 neighboring clusters (cluster 1, cluster 2, cluster 3, cluster 4) highlighted in Fig. 1. The SB structure is based on a multilevel topology and comprises 3 main Boxes. Box 1 is composed of MSBs that connect the inputs coming from the adjacent SBs to the outputs going to the adjacent SBs. Besides, each SB is connected to the 4 neighboring clusters through both the first and third Boxes, forming 2 interconnect levels: outputs of MSBs in Box 1 drive the DMSB in Box 3, whose outputs drive one input of each of the 4 neighboring clusters. Adjacent clusters are connected through Box 2 and Box 3, which represent 2 interconnect levels: cluster outputs drive the MSB located in Box 2, whose outputs drive the DMSB located in Box 3. Each cluster communicates with the four adjacent SBs; the connection from cluster to SB is realized through cluster outputs that drive Box 2. More details about this architecture can be found in [6]. The clock network is modeled as an H-Tree distribution network, similar to the topology used in the Xilinx Virtex II Pro [15]. The clock network contains buffers separated by a distance equal to the size of a tile (cluster and SB) in the FPGA. The investigation of MoCs-based FPGA quality and the evaluation of its performance require a CAD tool suite.
3 FPGA Configuration Flow
In this section, we present the MoCs FPGA exploration platform. This platform must involve a set of CAD tools to implement benchmark circuits automatically in the FPGA; in addition, metric models are required to evaluate FPGA performance. CAD tools are used to convert the high-level circuit description into a configuration bitstream to be loaded into the target FPGA, as shown in Fig. 2. The first stage of the CAD flow is synthesis, which converts a circuit written in a Hardware Description Language (HDL) (VHDL or Verilog) into a gate-based representation, i.e. a network composed of Boolean logic gates and FFs. In this work, ODIN II [11], an open-source tool, is used for synthesis. The second stage is technology mapping using ABC [1], where the output of synthesis is transformed into k-bounded cells; according to the target FPGA technology, these cells are implemented as k-input LUTs. Then, the packing stage forms LBs by grouping k-input LUTs and FFs. Subsequently, clustering groups sets of n LBs into clusters so that they can easily be mapped onto the FPGA afterwards. Clustering algorithms are generally classified into bottom-up [3,5,16,19,20], top-down [12] and hybrid or other approaches [21]. In our methodology, we propose and tune A-Part as a modified top-down approach; more details of this algorithm are presented in Sect. 4. The following step is the placement of cluster and I/O instances on the MoCs-based FPGA architecture using the simulated annealing algorithm [4]. Subsequently, the Pathfinder algorithm [18] is used to route the circuit nets. The Pathfinder algorithm is based on an iterative, negotiation-based approach for congested routing resources; this routing approach also uses the Dijkstra algorithm [7] to route each net along the shortest path with no congestion.
resources. This routing approach uses the Dijkstra algorithm [7] to route each net along the shortest path with no congestion. The architecture description file is provided to the CAD tools. This file includes the necessary information about the MoCs-based FPGA architecture such as its size (nx × ny), LUT size, cluster size and cluster inputs number. Each tool in the CAD flow requires some of the architecture file parameters. The clustering phase needs the MoCs-based FPGA cluster size and inputs number as parameters. The placement phase requires the LUT size, cluster size and design cluster inputs (based on the design Rent's parameter), which are extracted from the FPGA description file. Routing involves more parameters, such as the architecture cluster inputs (according to the architecture Rent's parameter) and the channel width. In this work, we determine the minimal channel width (Wmin) that enables the successful routing of the circuit: we decrease the channel width and re-route the design netlist until routing is no longer feasible. When Wmin is found, we estimate the resulting power consumption, area and delay through metric models.

Activity estimation and power modules are developed and integrated into the proposed MoCs-based FPGA CAD flow to estimate power consumption. The activity estimation tool ACE 2.0 [14] combines existing probabilistic and simulation techniques. It determines the switching activity and the static probability of the input signals. The obtained values then serve to compute the static and dynamic power of the different components. The power module incorporates an architecture generator and low-level power estimation. The architecture generator uses the routing resource graph to decompose the entire MoCs-based FPGA circuit into low-level components, which are inverters, multiplexers and wires [6]. After the decomposition of the FPGA circuit, the VersaPower model [9] equations are used to compute the dynamic and static power of each component. Static power originates from sub-threshold and gate leakages [22]. Total power dissipation is equal to the sum of static power and dynamic power.

The area estimation model is based on a transistor-counting algorithm which is accurate and consistent with the methodology used in [9] to deduce component areas. The routing resource graph is used to parse all FPGA components (MSBs, LBs and buffers) and accurately compute the total number of transistors of all entities within the FPGA using the low-level decomposition assumption. The area is expressed as a function of λ, where λ = d/2 and d represents the minimal distance between the transistor's source and drain.

Timing analysis allows evaluating the performance of the final placed and routed circuit implemented on the FPGA. Wire and switch delays have a considerable effect on critical path delay values [6]. Local wire lengths are computed after approximating the size of entities using transistor-counting functions. The timing model also considers the effect of the resulting LUT delay as a function of the LUT size [2]. Local and global wire, LUT and switch (multiplexer and buffer) delays are extracted from the SPICE circuit simulator using STMicro's 130 nm technology node.
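The Wmin search itself is simple enough to be summarised in a few lines. The sketch below is only an illustration of the loop described above, assuming a hypothetical route(netlist, width) wrapper around the Pathfinder-based router that reports whether the whole netlist can be routed without congestion; it is not part of the actual tool suite.

```python
def find_min_channel_width(netlist, route, start_width=100):
    """Return the smallest channel width for which routing still succeeds.

    `route(netlist, width)` is a hypothetical helper wrapping the
    Pathfinder-based router; `start_width` is an arbitrary feasible start.
    """
    if not route(netlist, start_width):
        raise RuntimeError("starting channel width is already infeasible")
    width = start_width
    # Decrease the channel width and re-route until routing fails.
    while width > 1 and route(netlist, width - 1):
        width -= 1
    return width  # Wmin
```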
Fig. 2. FPGA configuration flow
4 Adapted Multilevel Partitioning Approach
The proposed adapted multilevel partitioning (A-Part) approach comprises two main phases which will be detailed in the next subsections: the multilevel partitioning phase along with the legalization phase.

4.1 First Phase: Multilevel Partitioning
Multilevel partitioning is inspired by the hMETIS tool. It comprises a bottom-up coarsening phase using the First Choice coarsener [12], an initial partitioning phase and then a top-down refinement phase, as shown in Fig. 3. The top-down multilevel refinement used is adapted from the Fiduccia-Mattheyses (FM) heuristic [8]. FM begins with a random solution and then performs a sequence of moves to optimize the partitioning solution by minimizing the number of cuts. The sequences of moves are organized in passes. As a pass begins, all vertices are free to move and each viable move is labeled with the corresponding cost based on the immediate change it induces (the gain of the move). The move with the highest gain is chosen and executed. Consequently, the moved vertex enters a locked state and is not allowed to move again during that pass. After the execution of a move, all gains must be updated. This process is repeated iteratively until all vertices are locked. Subsequently, the best overall solution found during the pass is chosen as the initial solution of the next pass. The algorithm stops whenever a pass can no longer improve the solution's quality. The main modification that we made compared to the original FM approach is the control of the partitions' size instead of the number of partitions. This adjustment is crucial to satisfy the MoCs cluster size constraint and thus ensures that each partition can fit in a MoCs cluster during placement. The output of this algorithm is a 2-level netlist of clusters; in our case it is a hypergraph called ClusteringHierarchy, based on a hierarchical data structure that is required for the legalization phase.
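To make the pass structure concrete, a minimal sketch of one FM-style pass with the partition-size constraint is given below; the data structures and the gain() helper are hypothetical simplifications, not the actual A-Part code.

```python
def fm_pass(blocks, partition_of, size_of, max_size, gain):
    """One FM-style pass: apply the best feasible move, lock the moved block,
    and keep the best cumulative solution seen during the pass.

    `partition_of` maps block -> partition id, `size_of` maps partition id ->
    current size, and `gain(block, dest)` returns the cut improvement of a move.
    """
    locked = set()
    best_gain, best_solution, cumulative_gain = 0, dict(partition_of), 0
    while len(locked) < len(blocks):
        # Feasible moves of unlocked blocks; gains are recomputed naively here,
        # whereas real FM keeps incremental gain buckets.
        candidates = [
            (gain(b, p), b, p)
            for b in blocks if b not in locked
            for p in size_of if p != partition_of[b] and size_of[p] < max_size
        ]
        if not candidates:
            break
        g, block, dest = max(candidates, key=lambda c: c[0])
        size_of[partition_of[block]] -= 1
        size_of[dest] += 1
        partition_of[block] = dest
        locked.add(block)          # the block cannot move again in this pass
        cumulative_gain += g
        if cumulative_gain > best_gain:
            best_gain, best_solution = cumulative_gain, dict(partition_of)
    return best_solution           # initial solution for the next pass
```

The size test replaces the fixed-partition-count control of the original FM heuristic, which is the key adjustment needed to respect the MoCs cluster capacity.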
Fig. 3. Multilevel k-way partitioning
4.2 Second Phase: Legalization
During this phase, each cluster in the design netlist is checked to confirm whether the MoCs cluster inputs number constraint is satisfied. If this constraint is not respected, the LB sharing the fewest common input pins with its peers inside the same cluster is extracted. Afterwards, we scan the netlist's remaining clusters to find a target cluster to which the extracted LB can migrate. The chosen solution must respect the MoCs cluster inputs number constraint and induce the least expensive cost, i.e. have high connectivity between the LB to move and the target cluster.

Algorithm 1. Pseudo-code of the Legalization phase

  clustering_level = FM_solution;
  for cluster ∈ clustering_level do
    if inputs_number(cluster) > inputs_number(architecture_cluster) then
      LB_to_move = choose_LB_to_move();
      extract_LB_to_move();
      target_cluster = NULL;
      move_gain = 0;
      for cluster' ∈ clustering_level do
        if feasible_move then
          gain = compute_move_cost(LB_to_move, cluster');
          if gain > move_gain then
            move_gain = gain;
            target_cluster = cluster';
          end if
        end if
      end for
      migrate(LB_to_move, target_cluster);
    end if
  end for
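For readers who prefer runnable code, a Python rendering of Algorithm 1 could look as follows; the cluster objects and helper functions are hypothetical placeholders for the actual A-Part routines.

```python
def legalize(clusters, max_inputs, inputs_number, choose_lb_to_move,
             feasible_move, compute_move_gain):
    """Move one LB out of every over-constrained cluster towards the feasible
    cluster with the highest connectivity gain (mirrors Algorithm 1)."""
    for cluster in clusters:
        if inputs_number(cluster) <= max_inputs:
            continue
        lb = choose_lb_to_move(cluster)   # LB sharing the fewest inputs with its peers
        cluster.remove(lb)
        target, best_gain = None, 0
        for other in clusters:
            if other is cluster or not feasible_move(lb, other):
                continue
            g = compute_move_gain(lb, other)
            if g > best_gain:
                best_gain, target = g, other
        if target is not None:
            target.add(lb)                # migrate the extracted LB
```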
5 Experimental Set-Up and Comparison Results
This section aims to experimentally compare the T-VPack and A-Part algorithms on the target MoCs-based FPGA architecture. In both cases, we use a cluster size of 8 and a LUT size of 4. The design Rent's parameter is fixed to 0.67 (design cluster inputs = 16) and the architecture Rent's parameter is fixed to 0.89 (architecture cluster inputs = 24). Our comparison metrics are power consumption, area and
delay using the smallest FPGA array and Wmin implementing MCNC [23] and IWLS [10] benchmark circuits with distinct sizes and features. In terms of Wmin for T-VPack and A-Part, results in Table 1 show that A-Part offers average savings of 8% compared to T-VPack. Thus, A-Part alleviates congestion on inter-cluster interconnect resources compared to T-VPack by increasing the use of intra-cluster interconnect resources.

Table 1. Wmin, Power, Area and Delay results for MoCs with SWs: T-VPack vs. A-Part

Circuit      | T-VPack: Wmin, Power (mW), Global buffers, Area (10^6 λ²), 2-to-1 mux, Delay (ns) | A-Part: Wmin, Power (mW), Global buffers, Area (10^6 λ²), 2-to-1 mux, Delay (ns)
tseng.net    | 22, 26, 11968, 249, 156086, 122        | 18, 21, 9792, 201, 156086, 65
diffeq.net   | 24, 25, 13056, 257, 173772, 95         | 24, 25, 13056, 257, 173772, 85
misex.net    | 30, 29, 16320, 282, 217759, 108        | 30, 29, 16320, 282, 217759, 107
alu4.net     | 32, 29, 17408, 291, 197469, 140        | 32, 29, 17408, 291, 197469, 115
elliptic.net | 36, 71, 43200, 688, 456812, 229        | 28, 62, 33600, 603, 446836, 204
frisc.net    | 36, 69, 43200, 688, 456812, 204        | 34, 64, 36000, 650, 452000, 227
dsip.net     | 14, 83, 26040, 751, 497308, 145        | 8, 67, 16120, 688, 485841, 117
des.net      | 20, 107, 42240, 952, 717785, 213       | 18, 98, 38016, 900, 695716, 192
pdc.net      | 48, 115, 77952, 1089, 883344, 301      | 44, 110, 71456, 1037, 750497, 268
clma.net     | 46, 182, 122544, 1748, 1130880, 345    | 44, 178, 117216, 1700, 1066835, 319
usb_funct    | 50, 138, 81200, 1241, 676450, 210      | 46, 128, 72268, 1179, 642628, 187
b22_c        | 44, 360, 240000, 3490, 1066835, 490    | 40, 335, 213600, 3200, 1013493, 436
ava2         | 50, 204, 119000, 1820, 993960, 420     | 46, 190, 105910, 1729, 944262, 374
aes_core     | 44, 236, 130416, 2103, 1161895, 510    | 40, 219, 94260, 1998, 1103800, 454
wb_conmax    | 68, 643, 403920, 5532, 2958750, 390    | 62, 600, 359489, 5255, 2810812, 347
vga_lcd      | 80, 1510, 985920, 12669, 6797650, 680  | 74, 1404, 877470, 12035, 6457768, 605
Average gain of A-Part compared to T-VPack: Wmin 8%, Power 7%, Global buffers 12%, Area 5%, 2-to-1 mux 5%, Delay 11%
In terms of total power consumption and the total number of routing buffers for T-VPack and A-Part, results in Table 1 show that the use of A-Part decreases power consumption compared to T-VPack. The power gain reaches 7% on average for A-Part. Power consumption depends mainly on the number of buffers [9]. The Wmin reduction obtained thanks to A-Part implies that the number of buffers has decreased, since the routing buffer count depends linearly on Wmin, as shown in Eq. 1:

Nb_buf(Routing) = Nb_SB × Nb_out(Box1) = Nb_SB × 2 × Wmin    (1)
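As a quick sanity check on the tseng.net row of Table 1, and assuming the number of SBs of the mapped array is the same for both flows, Eq. 1 gives Nb_SB = 11968/(2 × 22) = 272 for T-VPack; the same 272 SBs with the A-Part channel width yield 272 × 2 × 18 = 9792 buffers, which matches the reported value, so the buffer saving tracks the Wmin saving.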
Results in Table 1 show that A-Part reduces the total number of buffers used for the global routing network by 12% overall. In terms of area and the total number of 2-to-1 multiplexers for T-VPack and A-Part, we note that A-Part enables an overall area reduction of 5% compared to T-VPack, as presented in Table 1. The area metric is mainly affected by the variation of the number of multiplexers. In fact, the number of 2-to-1 multiplexers in Box 1 of the SBs (see Fig. 1) is given by:

Nb_DMSBs_Multiplexers(Box1) = Nb_DMSB_outputs(Box1) × Nb_DMSB(Box1) = 5 × Wmin/2    (2)
Equation 2 shows that the number of 2-to-1 multiplexers in Box 1 depends linearly on Wmin. Hence, a Wmin decrease induces a reduction of the number of 2-to-1 multiplexers in the SBs and therefore a decrease in the total number of 2-to-1 multiplexers. Table 1 results show that A-Part reduces the total number of 2-to-1 multiplexers by an average of 5% compared to T-VPack. Results in Table 1 also show that A-Part provides better speed than T-VPack for most benchmarks. This is achieved thanks to the decrease of the critical path delay by an average of 11% for A-Part. The main cause of the delay reduction is the decrease in the number of switches on the critical path. These results show that A-Part offers better speed than T-VPack, which is a timing-driven algorithm, by relieving the congestion on global routing interconnect resources and boosting the use of local interconnect structures, which are faster and cheaper.
6 Conclusion
In this paper, we investigated CAD algorithm aspects to optimize MoCs-based FPGA performance. Our exploration targeted the clustering stage by comparing the A-Part and T-VPack clustering approaches. We defined metric models to enable power consumption, area and critical-path delay estimation. Experimental results show that A-Part offers better results than the T-VPack packing and clustering tool. A-Part enables power, area and delay gains for the MoCs-based FPGA of 7%, 5% and 11%, respectively, in comparison with T-VPack.
References

1. ABC: a system for sequential synthesis and verification. http://www.eecs.berkeley.edu/~alanmi/abc/. Accessed 07 Feb 2019
2. Ahmed, E., Rose, J.: The effect of LUT and cluster size on deep-submicron FPGA performance and density. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 12(3), 288–298 (2004)
3. Betz, V., Rose, J.: Cluster-based logic blocks for FPGAs: area-efficiency vs. input sharing and size. In: Proceedings of the IEEE 1997 Custom Integrated Circuits Conference, pp. 551–554. IEEE (1997)
4. Betz, V., Rose, J., Marquardt, A.: Architecture and CAD for Deep-Submicron FPGAs, vol. 497. Springer Science & Business Media, Berlin (2012)
5. Bouaziz, K., Chtourou, S., Marrakchi, Z., Abid, M., Obeid, A.: Exploration of clustering algorithms effects on mesh of clusters based FPGA architecture performance. In: 2019 International Conference on High Performance Computing & Simulation (HPCS), pp. 658–665. IEEE (2019)
6. Chtourou, S., Marrakchi, Z., Pangracious, V., Amouri, E., Mehrez, H., Abid, M.: Mesh of clusters FPGA architectures: exploration methodology and interconnect optimization. In: International Symposium on Applied Reconfigurable Computing, pp. 411–418. Springer, Cham (2015)
7. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. MIT Press, Cambridge (2009)
8. Fiduccia, C.M., Mattheyses, R.M.: A linear-time heuristic for improving network partitions. In: 19th Design Automation Conference, pp. 175–181. IEEE (1982)
9. Goeders, J.B., Wilton, S.J.: VersaPower: power estimation for diverse FPGA architectures. In: 2012 International Conference on Field-Programmable Technology (FPT), pp. 229–234. IEEE (2012)
10. IWLS: IWLS benchmarks. http://iwls.org/iwls2005/benchmarks.html
11. Jamieson, P., Kent, K.B., Gharibian, F., Shannon, L.: Odin II - an open-source Verilog HDL synthesis tool for CAD research. In: 2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), pp. 149–156. IEEE (2010)
12. Karypis, G., Kumar, V.: Multilevel k-way hypergraph partitioning. VLSI Design 11(3), 285–300 (2000)
13. Kuon, I., Rose, J.: Measuring the gap between FPGAs and ASICs. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 26(2), 203–215 (2007)
14. Lamoureux, J., Wilton, S.J.: Activity estimation for field-programmable gate arrays. In: 2006 International Conference on Field Programmable Logic and Applications (FPL), pp. 1–8. IEEE (2006)
15. Lamoureux, J., Wilton, S.J.: FPGA clock network architecture: flexibility vs. area and power. In: Proceedings of the 2006 ACM/SIGDA 14th International Symposium on Field Programmable Gate Arrays, pp. 101–108. ACM (2006)
16. Marquardt, A.S., Betz, V., Rose, J.: Using cluster-based logic blocks and timing-driven packing to improve FPGA speed and density. In: Proceedings of the 1999 ACM/SIGDA Seventh International Symposium on Field Programmable Gate Arrays, pp. 37–46. ACM (1999)
17. Marrakchi, Z., Mrabet, H., Farooq, U., Mehrez, H.: FPGA interconnect topologies exploration. Int. J. Reconfigurable Comput. 2009, 6 (2009)
18. McMurchie, L., Ebeling, C.: Pathfinder: a negotiation-based performance-driven router for FPGAs. In: Reconfigurable Computing, pp. 365–381. Elsevier (2008)
19. Rajavel, S.T., Akoglu, A.: MO-Pack: many-objective clustering for FPGA CAD. In: 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 818–823. IEEE (2011)
20. Singh, A., Parthasarathy, G., Marek-Sadowska, M.: Efficient circuit clustering for area and power reduction in FPGAs. ACM Trans. Des. Autom. Electron. Syst. (TODAES) 7(4), 643–663 (2002)
21. Wang, Y., Trefzer, M.A., Bale, S.J., Walker, J.A., Tyrrell, A.M.: A novel multi-objective optimisation algorithm for routability and timing driven circuit clustering on FPGAs. IET Computers and Digital Techniques (2018)
22. Weste, N.H., Harris, D.: CMOS VLSI Design: A Circuits and Systems Perspective. Pearson Education India, New Delhi (2015)
23. Yang, S.: Logic synthesis and optimization benchmarks version 3.0. Technical Report, Microelectronics Centre of North Carolina (1991)
In-Car State Classification with RGB Images

Pedro Faria(B), Sandra Dixe, João Leite, Sahar Azadi, José Mendes, Jaime C. Fonseca, and João Borges

Algoritmi Center, University of Minho, Guimarães, Portugal
[email protected]
Abstract. In the next years, shared autonomous vehicles are going to be a new reality. The absence of the human driver is going to create a new paradigm for in-car safety. This paper addresses this problem by presenting a monitoring system capable of classifying the state of the vehicle interior, i.e. good or bad condition. We propose the use of classifiers, with RGB images, to infer the in-car cleanliness state. Moreover, 18 state-of-the-art classifiers were trained and evaluated, starting from pre-trained models. To be able to train and evaluate these approaches, an in-car dataset was created with 3488 samples from 135 cars, and then split into 2439 train, 351 validation and 689 test RGB images. From all those evaluated, ResNet-18 showed the best results, achieving an average accuracy of 91.24% at 123 Hz. Keywords: Classification · Deep transfer learning · Shared autonomous vehicles · Deep learning · Supervised learning
1 Introduction
The next paradigm shift in the field of transportation is going to be led by fully autonomous vehicles. Although their implementation is still being evaluated and discussed, major technology companies and car manufacturers are already in the race to build the first operational vehicle. The soaring recent interest and investments in autonomous vehicles imply that such systems will soon be a reality [1]. With autonomous technology, people's mobility choices will change from manual personal ownership to shared autonomy. Shared autonomous vehicles (SAVs) will be a more effective, ecologically friendly and money-saving type of transport while providing a secure and comfortable service. This will lead to a less congested and more efficient world with a mixture of car-sharing and autonomous vehicles, taking us closer to the means of transport of the future [2]. To guarantee the safety of passengers and the monitoring of the interior of an SAV, several works have been developed. Torres et al. [3] propose a system for monitoring passengers, using a deep learning strategy to accurately detect the human pose in images captured inside a car. Deep learning strategies require a considerable amount of data, thus Borges et al. propose tools for automated
generation of synthetic [4] and real [5] in-car datasets for human body pose detection. The synthetic dataset approach provides a personalized in-car environment, which simulates humans, sensors and car models. Moreover, the real dataset approach combines optical and inertial based systems to achieve in-car motion capture. It is expected that, with the presence of passengers, the state inside these vehicles will suffer changes such as material wear, damage, stains or dirt, directly or indirectly jeopardising the quality of the service provided. Therefore, there is a need to develop advanced systems capable of monitoring the interior of the car, ensuring this way the condition of the vehicle and the safety of the passengers. Existing literature aims to classify damage in many sectors [6], through different types of approaches [7,8]; however, the case of in-car inspection has not yet been studied. In this article, a network-based deep transfer learning approach was used. Moreover, 18 state-of-the-art pre-trained classifiers were fine-tuned, for the selected use-case, to estimate the cleanliness state of the car interior (i.e. good and bad condition) using RGB images. Transfer learning [9–11] is an important tool in machine learning, where an attempt is made to transfer knowledge from the source domain to the target domain by relaxing the assumption that the training data and the test data must be distributed independently and identically [12]. It is based on the assumption that the neural network resembles the mechanics of the human brain, that is, an iterative and continuous process of abstraction. The first layers of the network can be treated as a feature extractor, and the extracted features are versatile [12]. This process is usually much faster and easier than training a network from scratch, reducing the training time and compute resources since it does not need to be trained as much as a new model would require. This paper is organised as follows. In Sect. 2, the state-of-the-art of different approaches applied to damage detection and localisation on in-car materials is presented. In Sect. 3, the classifiers' customisation is presented as well as the dataset creation. The results are shown in Sect. 4 and discussed in Sect. 5. In Sect. 6, the article is concluded.
2 Related Works
In the literature, several studies focus on the detection of damages and defects in textile fabrics. Recent techniques tend to rely on machine learning [13] and deep learning [14] techniques. Liu et al. [13] introduced a new method for classifying defective fabric images with unsupervised segmentation using an Extreme Learning Machine. The model was evaluated using the TILDA dataset and some real fabric samples. The classification accuracy of the presented method was 91.8%, surpassing state-of-the-art models.
Jeyaraj and Samuel Nadar [14] proposed a model that allows accurate detection of the defective region using a Convolutional Neural Network (CNN), where the algorithm classifies defects through unsupervised learning. In the test phase, the algorithm was evaluated using the standard TILDA dataset and fabric samples acquired in real time. The classification accuracy of the presented method was 96.55%. Abdulkadir Seker et al. [15] present a solution to the problem of detecting fabric defects through the use of transfer learning. In the textile industry, the sale of defective fabrics hurts both producers and customers, and accurate and rapid detection of fabric defects is a crucial need. As each fabric has unique texture characteristics, this task can be difficult. They used transfer learning methods to classify the existence of defects in fabric images. In this sense, the AlexNet model trained with millions of images was used, and the success rate of the training increased from 75% to 98%. With a focus on the automotive industry, Patil et al. [16] present the use of CNNs to classify the types of damage in cars. In the auto insurance industry, a lot of money is lost due to leakage, which is the difference between the insurance payment and the appropriate payment that should be made. For the value to be fair, it is necessary to carry out a visual inspection process; however, this process requires human intervention, which results in long delays. Therefore, the need to create an automated computer vision system for inspection appeared. The authors consider common damages on the exterior of cars. As there was no dataset with this type of damage, they created their own dataset by collecting images from the web and making annotations manually. They tried different approaches to the classification task and concluded that the use of transfer learning combined with ensemble learning worked better. They also developed a method for locating a specific type of damage. An accuracy of 89.5% was obtained with a transfer and ensemble learning combination. In this paper, multiple existing CNN models for classification were taken into consideration: AlexNet, proposed by Krizhevsky et al. [17] in 2012, a pioneer among deep architectures with a top-5 test accuracy of 84.6% on ImageNet data; SqueezeNet, proposed by Iandola et al. [18] in 2017, a small CNN architecture that achieves AlexNet-level accuracy on ImageNet with 50x fewer parameters, being 510 times smaller; GoogLeNet, proposed by Szegedy et al. [19] in 2014, one of the first CNN architectures to deviate from the common approaches, which gives state-of-the-art performance on the ImageNet ILSVRC14 detection and classification challenge with a top-5 test accuracy of 93.3%, the main point of this architecture being to improve the utilization of computing resources inside the network through the use of a building block known as the "inception module", which takes into account the enlarged depth and width of the model; VGGNet, proposed by Simonyan and Zisserman [20] in 2014, which expands the depth of the neural network and not only accomplishes state-of-the-art precision on the ILSVRC datasets but is also applicable to other image recognition databases; ResNet, proposed by He et al. [21] in 2015, which presented a deep residual learning technique for image recognition, evaluated on the ImageNet database; Inception, proposed
by Szegedy et al. [22] in 2015, an important milestone in the development of CNN classifiers, achieving a 3.08% top-5 error on the test set of the ImageNet classification challenge; DarkNet, proposed by Redmon and Farhadi [23] in 2016, a new classification model intended as the base of the object detection system YOLOv2, which only requires 5.58 billion operations to process an image, achieving 72.9% top-1 accuracy and 91.2% top-5 accuracy on ImageNet; MobileNet, presented by Howard et al. [24] in 2017, which showed strong performance compared to other popular models on ImageNet classification; Xception, proposed by Chollet [25] in 2017, an extended version of the Inception model that is completely based on depthwise separable convolutions followed by a pointwise convolution, an architecture that gives better results on ImageNet than Inception V3, ResNet-50, ResNet-101, ResNet-152 and VGGNet; ShuffleNet, proposed by Zhang et al. [26] in 2017, a very computation-efficient CNN architecture built for mobile devices with very reduced computing power, for which tests on ImageNet classification and MS COCO object detection show higher performance than other structures; DenseNet, proposed in 2018 by Huang et al. [27], which achieved state-of-the-art performance on several databases, including CIFAR-100, ImageNet, and SVHN, with the only drawback that a lot of additional memory is required due to the concatenation of the tensors; and NasNet, proposed by Adam and Lorraine [28] in 2019, which obtains significant results on semantic segmentation and image classification, making use of the reinforcement learning search method.
3 Implementation
This work aimed to classify the cleanliness of car interiors (i.e. good or bad condition), using RGB images. Moreover, several state-of-the-art pre-trained classifiers were fine-tuned on a generated in-car dataset.

3.1 Classifiers
A set of pre-trained classifiers was used in this study, namely AlexNet [17], SqueezeNet [18], GoogLeNet [19], VGG-16 [20], VGG-19 [20], ResNet-18 [29], ResNet-50 [29], ResNet-101 [29], Inception-V3 [22], Inception-ResNet-V2 [30], DarkNet-19 [23], DarkNet-53 [23], MobileNet-V2 [26], Xception [25], ShuffleNet [31], DenseNet-201 [27], NasNet-Large [32] and NasNet-Mobile [32]. The input resolution was preserved for each of the original pre-trained classifiers:
– AlexNet [17] and SqueezeNet [18] classifiers share the original 227×227×3 input resolution.
– GoogLeNet [19], VGG-16 [20], VGG-19 [20], ResNet-18 [29], ResNet-50 [29], ResNet-101 [29], MobileNet-V2 [26], ShuffleNet [31], DenseNet-201 [27] and NasNet-Mobile [32] classifiers share the original 224×224×3 input resolution.
– Inception-V3 [22], Inception-ResNet-V2 [30] and Xception [25] classifiers share the original 229×229×3 input resolution.
– DarkNet-19 [23] and DarkNet-53 [23] classifiers share the original 256×256×3 input resolution.
– The NasNet-Large [32] classifier uses the original 331×331×3 input resolution.
As shown in Fig. 1, to re-target each model, the original 1000-class softmax, SL, and classification, CL, layers were removed. Moreover, 3 new layers were added (i.e. for the good and bad classes): (1) a fully connected layer, FC_n; (2) a softmax layer, SL_n; (3) and a weighted classification layer, WCL, which makes use of class weights, CW_c, to compensate for the lack of dataset class representation (Eq. 1), where c represents the class index and Nclass_c represents the number of samples of each class in the dataset:

CF_c = 1 / Nclass_c,    CW_c = CF_c / ((1/2) × Σ_{c=1}^{2} CF_c)    (1)
This process was applied to all the classifiers used, which were then trained, validated and tested.
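As an illustration only (the paper's implementation is in MATLAB, and the class counts are those reported in Table 1 below), the same re-targeting idea can be sketched in PyTorch:

```python
import torch
import torch.nn as nn
from torchvision import models

n_bad, n_good = 1169, 1270                      # training samples per class
cf = torch.tensor([1.0 / n_bad, 1.0 / n_good])  # CF_c = 1 / Nclass_c
cw = cf / (cf.sum() / 2)                        # CW_c from Eq. 1, mean weight = 1

model = models.resnet18(pretrained=True)        # pre-trained backbone
model.fc = nn.Linear(model.fc.in_features, 2)   # new fully connected layer FC_n

# The softmax layer SL_n plus the weighted classification layer WCL correspond
# to a class-weighted cross-entropy loss in PyTorch.
criterion = nn.CrossEntropyLoss(weight=cw)
```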
Fig. 1. Scheme of the customization of the classifier by removing the last two output layers SL and CL from the original classifier and adding three new layers FC_n, SL_n and WCL.
3.2 Dataset Creation
The dataset used, MoLa-VI, was created by the authors; it is comprised of images of the interior of cars from scrap yards and dealerships. Images were captured with two RGB sensors, with 3264×2448 and 1920×1080 resolutions and an ultra-wide field of view (greater than 110°). Each sensor was placed in one of the 9 positions (i.e. P1 to P9) represented in Fig. 2, which also illustrates the perspective of each one. In positions P1 and P5, images are captured in three different vertical orientations. In total, each car provides an average of 13 images for each sensor. With a total of 135 cars, the dataset is comprised of 3488 images. Pixel-wise labelling was performed for each image, thus adding the spatial information of 4 relevant classes (i.e. good, damage, stain and dirt), as shown in Fig. 2. To provide the dataset with the required classification information, an automated labelling process was created: a script verifies the presence of bad classes (i.e. pixel-wise labels) in each image to obtain a new set of image-level labels (i.e. good and bad condition), as sketched below.
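A minimal sketch of that labelling script is shown below; the numeric class encoding is an assumption for illustration, not the dataset's actual encoding.

```python
import numpy as np

BAD_CLASSES = {1, 2, 3}  # assumed ids for damage, stain and dirt (0 = good)

def image_level_label(mask: np.ndarray) -> str:
    """Mark an image as "bad" if its pixel-wise annotation contains any bad class."""
    return "bad" if set(np.unique(mask)) & BAD_CLASSES else "good"
```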
Fig. 2. MoLa-VI dataset. Example of perspectives captured by RGB sensors at positions P1 to P9 inside each car. The left side shows the RGB sensor positions; the middle shows the perspective of each RGB sensor; the right side presents a pixel-wise segmentation where each colour is associated with a certain class (i.e. blue as damage, orange as stain and yellow as dirt).
4 Experiments

4.1 Data Preparation
All classifiers were trained and tested on MoLa-VI, which was split into train, validation and test sets with random samples of the entire dataset. The division consisted of 70%, 10% and 20%, respectively, as shown in Table 1.

Table 1. Percentages of each label in each set and the respective number of samples in each set.

Set        | Share of dataset   | Bad condition      | Good condition
Train      | 70% (2439 samples) | 52% (1169 samples) | 48% (1270 samples)
Validation | 10% (351 samples)  | 50% (175 samples)  | 50% (176 samples)
Test       | 20% (689 samples)  | 50% (350 samples)  | 50% (348 samples)
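A possible way to obtain such a split is sketched below; the random seed and the exact splitting procedure used by the authors are not specified, so this is only illustrative.

```python
import random

def split_dataset(samples, seed=0):
    """Random 70/10/20 train/validation/test split."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train, n_val = int(0.7 * len(shuffled)), int(0.1 * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```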
4.2 Classifier Configuration
All evaluations were performed using similar hyper-parameters: 100 epochs, ADAM optimizer, 0.0001 learning rate, learning rate drop factor of 30%, learning rate drop period of 10 epochs, L2 regularization of 0.0001, validation frequency of 1 epoch, validation loss patience of 10 epochs and a batch size of 64.
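For illustration, a rough PyTorch equivalent of these hyper-parameters could look as follows (the paper used MATLAB; the 30% drop factor is read here as multiplying the learning rate by 0.3, and weight decay stands in for L2 regularization):

```python
import torch

def make_training_setup(model):
    # ADAM optimizer with learning rate 1e-4 and L2 regularization 1e-4.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
    # Drop the learning rate every 10 epochs by a factor of 0.3.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.3)
    return optimizer, scheduler

MAX_EPOCHS, BATCH_SIZE, VAL_PATIENCE = 100, 64, 10  # stop after 10 stagnant validation epochs
```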
All evaluations were performed using MATLAB R2020a source code and run on an Intel(R) Xeon(R) Gold 6140 CPU at 2.30 GHz, with 128 GB RAM and an NVIDIA Tesla V100 PCIE 16 GB GPU. Results related to train and validation are shown in Figs. 3 and 4 for accuracy and loss, respectively. Test results are shown in Table 2, for mean accuracy, mAC (%), and performance, Hz. Qualitative results for bad and good predictions are shown in Fig. 5.

Table 2. Accuracy obtained from the test executed for each classifier.

Classifier          | mAC    | Hz
AlexNet             | 83.19% | 111
SqueezeNet          | 86.06% | 119
GoogLeNet           | 61.78% | 68
VGG-16              | 87.79% | 103
VGG-19              | 87.79% | 91
ResNet-18           | 91.52% | 123
ResNet-50           | 90.66% | 71
ResNet-101          | 91.24% | 42
Inception-V3        | 88.79% | 45
Inception-ResNet-V2 | 90.80% | 33
DarkNet-19          | 88.65% | 8
DarkNet-53          | 86.93% | 83
MobileNet-V2        | 91.52% | 53
Xception            | 89.22% | 48
ShuffleNet          | 87.93% | 48
DenseNet-201        | 51.15% | 13
NasNet-Large        | 86.35% | 43
NasNet-Mobile       | 91.24% | 43
Fig. 3. Graphical representation of the training executed for each classifier in terms of training and validation accuracy per epoch.
Fig. 4. Graphical representation of the training executed for each classifier in terms of training and validation loss per epoch.
Fig. 5. Qualitative test results. The two left images show correct predictions, where the second image presents small stains. The two right images show wrong predictions, with a highly damaged steering wheel.
5 Discussion
This paper proposed the use of pre-trained classifiers to estimate the cleanliness state of car interiors, using RGB images. An in-house dataset, with images of the interior of 135 cars, was used and expanded with classification labels (i.e. good and bad classes). 18 state-of-the-art classification models were fine-tuned for the targeted dataset. Using the original pre-trained models, the last two layers were replaced by three layers focused on two classes. Moreover, the cross-entropy loss used class weights, thus improving robustness to the dataset's class imbalance.
Train results (Figs. 3 and 4) showed that all models stopped training under 20 epochs, due to validation loss stabilization or exploding gradients (i.e. loss or accuracy spikes), thus avoiding overfitting. Generally, all models showed increasing train and validation accuracy, stabilizing between 60% and 100%, something that can be related to the dataset's reduced size. Some models were more prone to the effect of exploding gradients, such as DarkNet-19, which had a sudden drop in accuracy and increase in loss at the 7th epoch. DenseNet-201, Inception-ResNet-V2, Inception-V3 and VGG-16 showed the same effect after the 10th epoch; however, in their case it triggered the validation patience threshold, thus stopping the training. Test results (Table 2) show that several models are able to achieve a mean accuracy higher than 90% (i.e. ResNet-18, ResNet-50, ResNet-101, Inception-ResNet-V2, MobileNet-V2 and NasNet-Mobile). Moreover, ResNet-18 achieved 123 Hz and 91.52% accuracy, presenting the opportunity of applying such classification in high-performance or low-profile computing scenarios. Although classification is good, there is still some aberrant classification (Fig. 5), where a highly damaged steering wheel is wrongly classified, something that can be related to the reduced dataset size, which allows for material texture misunderstandings.
6 Conclusions and Future Work
In this paper, we have shown how to implement state-of-the-art classifiers, through transfer learning, to classify in-car cleanliness. The objective of such a study is to estimate the integrity of the car interior. Therefore, there is a need to understand when we are in the presence of an unsuitable car interior, since cars are susceptible to degradation due to the use of the interior space by passengers. The creation of an in-car dataset, MoLa-VI, with images of the interior of cars, is also presented. For this purpose, a set of classifiers was trained. The ResNet-18 classifier showed the best results, with a 91.24% mean accuracy at 123 Hz, and is presented as a good solution with high performance and low-profile computing requirements for future implementations of in-vehicle cleanliness classification. As future work, we intend to expand the in-car dataset, adding more samples and more diversity at the level of cars and classes found in this context. Moreover, extra fine-tuning techniques can be added, as well as other methods, to evaluate this issue on our in-car dataset. Acknowledgements. This work is supported by: European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project nº 039334; Funding Reference: POCI-01-0247-FEDER-039334].
References

1. Narayanan, S., Chaniotakis, E., Antoniou, C.: Shared autonomous vehicle services: a comprehensive review. Transp. Res. Part C Emerg. Technol. 111, 255–293 (2020)
2. Hao, M., Yamamoto, T.: Shared autonomous vehicles: a review considering car sharing and autonomous vehicles. Asian Transp. Stud. 5(1), 47–63 (2018)
3. Torres, H.R., et al.: Real-time human body pose estimation for in-car depth images. In: IFIP Advances in Information and Communication Technology, vol. 553, pp. 169–182. Springer, New York (2019)
4. Borges, J., et al.: Automated generation of synthetic in-car dataset for human body pose detection. In: VISIGRAPP 2020 - Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, vol. 5, pp. 550–557 (2020)
5. Borges, J., et al.: A system for the generation of in-car human body pose datasets. Mach. Vis. Appl. 32(1), 1–15 (2021)
6. Liu, J., Yang, W., Dai, J.: Research on thermal wave processing of lock-in thermography based on analyzing image sequences for NDT. Infrared Phys. Technol. 53(5), 348–357 (2010)
7. Jing, J., Zhang, H., Wang, J., Li, P., Jia, J.: Fabric defect detection using Gabor filters and defect classification based on LBP and Tamura method. J. Text. Inst. 104(1), 18–27 (2013)
8. Hu, G.-H.: Optimal ring Gabor filter design for texture defect detection using a simulated annealing algorithm. In: Proceedings of the 2014 International Conference of Information Science, Electronic and Electrical Engineering, ISEEE 2014, vol. 2, pp. 860–864 (2014)
9. Jaiswal, A., Gianchandani, N., Singh, D., Kumar, V., Kaur, M.: Classification of the COVID-19 infected patients using DenseNet201 based deep transfer learning. J. Biomol. Struct. Dyn. 38, 1–8 (2020)
10. Sukegawa, S., et al.: Deep neural networks for dental implant system classification. Biomolecules 10(7), 1–13 (2020)
11. Wu, Y., Qin, X., Pan, Y., Yuan, C.: Convolution neural network based transfer learning for classification of flowers. In: 2018 IEEE 3rd International Conference on Signal and Image Processing, ICSIP 2018, pp. 562–566 (2019)
12. Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., Liu, C.: A survey on deep transfer learning. In: Lecture Notes in Computer Science, vol. 11141, LNCS, pp. 270–279 (2018)
13. Liu, L., Zhang, J., Fu, X., Liu, L., Huang, Q.: Unsupervised segmentation and ELM for fabric defect image classification. Multimed. Tools Appl. 78(9), 12421–12449 (2019)
14. Jeyaraj, P.R., Samuel Nadar, E.R.: Computer vision for automatic detection and classification of fabric defect employing deep learning algorithm. Int. J. Cloth. Sci. Technol. 31(4), 510–521 (2019)
15. Seker, A.: Evaluation of fabric defect detection based on transfer learning with pre-trained AlexNet (2018)
16. Patil, K., Kulkarni, M., Sriraman, A., Karande, S.: Deep learning based car damage classification. In: Proceedings - 16th IEEE International Conference on Machine Learning and Applications, ICMLA 2017, pp. 50–54 (2017). https://doi.org/10.1109/ICMLA.2017.0-179
17. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012)
18. Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., Keutzer, K.: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size, pp. 1–13 (2016)
19. Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
20. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, pp. 1–14 (2015)
21. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
22. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
23. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, pp. 6517–6525 (2017)
24. Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications (2017)
25. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, pp. 1800–1807 (2017)
26. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: inverted residuals and linear bottlenecks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4510–4520. IEEE (2018)
27. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, pp. 2261–2269 (2017)
28. Adam, G., Lorraine, J.: Understanding neural architecture search techniques (2019)
29. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
30. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: AAAI, vol. 4, p. 12 (2017)
31. Zhang, X., Zhou, X., Lin, M., Sun, J.: ShuffleNet: an extremely efficient convolutional neural network for mobile devices. arXiv preprint arXiv:1707.01083 (2017)
32. Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V.: Learning transferable architectures for scalable image recognition. arXiv preprint arXiv:1707.07012 (2017)
Designing and Developing Graphical User Interface for the MultiChain Blockchain: Towards Incorporating HCI in Blockchain

Tani Hossain1(B), Tasniah Mohiuddin1, A. M. Shahed Hasan1, Muhammad Nazrul Islam1, and Syed Akhter Hossain2

1 Department of Computer Science and Engineering, Military Institute of Science and Technology, Mirpur Cantonment, Dhaka 1216, Bangladesh
2 Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh
Abstract. Blockchain is a revolutionary technology that is gradually changing the transaction structure, database systems, and even communication systems. Among many, MultiChain is one of the most prominent platforms for deploying private blockchains. As the MultiChain platform is a script-based tool, it has a very challenging user experience. From a research point of view, we have conducted experiments to find a solution that makes the MultiChain platform usable for people with non-technical backgrounds, and came up with a graphical user interface (GUI) that increases the usability of MultiChain. A within-subject evaluation study showed that the developed GUI system significantly increased the usability of MultiChain. With this new interface, MultiChain can be used in financial, educational, medical, and various other sectors with great ease.

Keywords: Blockchain · MultiChain · GUI · User interface · Human-Computer Interaction (HCI)

1 Introduction
Currently, blockchain is a new and promoted trend, and continues its progress in the regulation of human and business activities [2,22,23]. According to Islam et al. [10], "Blockchain consists of blocks containing messages, proof of work and reference of the previous block and stored in a shared database, which can perform transactions over P2P network maintaining irreversible historical records and transparency". There are mainly three types of blockchain: public, private, and hybrid. These blockchains are implemented on various platforms such as Bitcoin, Ethereum, MultiChain, etc. [7,9,21,24]. A private blockchain could belong to a person or a company where the administrator(s) looks after essential things such as access and read/write permissions [14]. One organization or individual keeps the centralized write permissions and the admin
gives mining rights to the desired participants [8]. In a private blockchain, only people with mining permission can participate in validating transactions. An example of a private blockchain platform is MultiChain [13]. MultiChain is an off-the-shelf private blockchain which ensures confidentiality and control [8]. It has three core aims [19]: ensuring the visibility of the blockchain's activity to chosen participants only, introducing control over the permissions of transactions, and enabling mining to be performed securely without proof of work. Despite being a useful blockchain platform, especially for private networks, people can be discouraged from using MultiChain because it is only available through a command-line interface (CLI), which raises many issues. Firstly, it is impossible for people without a thorough knowledge of the CLI to use this platform. Secondly, because of the CLI's complex functions with multiple parameters, it is very difficult to use the interface without making errors. Thirdly, MultiChain uses wallet addresses to distinguish users, which are very complicated. Moreover, MultiChain shows information in a primitive format, making it very difficult to understand and extract the necessary information. Additionally, finding specific information is nearly impossible because of the way in which information is shown. Again, if MultiChain is used through the CLI, users will require training and will take considerably more time to complete their tasks. Thus a user-friendly, usable, and interactive user interface (UI) is required for interacting with the MultiChain blockchain to improve its usability. Since usability is considered a vital quality of any computer application [11,12], enhancing the usability of systems is treated as the most crucial goal of HCI [3,17]. Therefore, the objective of this research is to design and develop a Graphical User Interface (GUI) based software system to eradicate the complexity and improve the usability of the MultiChain platform. The GUI system was also evaluated through a user study to assess its usability in terms of effectiveness, efficiency, and satisfaction. The remaining sections of this paper are organized as follows. Section 2 covers the works related to MultiChain. The overall development and implementation of the system are discussed in Sect. 3. In Sect. 4, the evaluation of the system is presented, and Sect. 5 concludes the paper.
2 Related Works
Blockchain is a revolutionary technology and so a significant amount of research has been conducted on it. Since MultiChain is a relatively new technology, very few works have been conducted on MultiChain. In this section, the most relevant studies are discussed. In [19], the authors configured a decentralized information sharing architecture using MultiChain blockchain technology in the travel sector, while a use case on food tracking and product management was implemented using the MultiChain platform in [13]. Similarly, a MultiChain-based traceability system was proposed in [4] for assuring secure and effective tracking at various stages of wine production. The authors of [1] proposed a MultiChain platform for real estate to ensure transparency, immutability, and trust among the users. In [18], a distributed supply chain management system was designed using a MultiChain architecture.
The authors of [15] found that the usability of a product can affect the overall user experience of performing the desired tasks. Another study [16] focused on usability issues in performing the key tasks of a blockchain system and suggested improving some areas for a better user experience. In another work [5], the authors presented a typology of seven classes of blockchain applications and discussed various roles that HCI can play in blockchain technology, such as inspiring user experience in blockchain applications. Similarly, in [6], the use of HCI in blockchain research and development is highlighted. As the introduction of blockchain requires many new tools, devices, apps, etc., HCI can ensure a smooth and productive experience with them. As seen in [1,4,13,18], various studies proposed use cases or systems based on the MultiChain platform. Although research explicitly based on usability or incorporating HCI in MultiChain was not found, multiple works have been done on usability and the role of HCI in blockchain in general. While some studies discussed how usability can affect users in performing tasks and suggested improvements on usability, other works focused on the HCI component in blockchain applications and provided HCI-related guidelines for blockchain technology to ensure smooth and productive blockchain applications with a greater user experience. In light of these works, we have come to the conclusion that a rich, smooth, user-friendly interface, instead of the current shell-based interface, will greatly assist in performing MultiChain functionalities more effectively and efficiently.
3 Development of the System
The GUI for MultiChain intends to be interactive, user-friendly, intuitive, and convenient. For designing the GUI, Qt Designer 5 is used, which is widely used for software design. The backend functionality of communicating between the shell and the user is implemented with Python3. A welcome screen is shown when a user opens the GUI for the very first time. The user can either create a new chain or connect to an existing chain to proceed further. After the user is connected to a specific chain, he/she will be led to the dashboard. The dashboard has a left sidebar for ease of navigation. With this, the user can go to different functionalities. He/she can perform various tasks such as granting/revoking permissions if the user is the admin of the chain, asset issuance and transaction, and creating or publishing to streams. The user can also observe various information such as the list of permissions, transactions, or stream items. How to perform all these tasks efficiently using the system is described in detail in the following subsections.
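A minimal sketch of how such a Python3 backend can bridge the GUI and the MultiChain shell is shown below: it shells out to multichain-cli and parses the JSON reply. The chain name and the error handling are illustrative; command names follow the standard MultiChain CLI.

```python
import json
import subprocess

def multichain(chain: str, command: str, *args: str):
    """Run a multichain-cli command and return its parsed output."""
    result = subprocess.run(
        ["multichain-cli", chain, command, *args],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        # Surface the CLI error so the GUI can show a message box instead of raw shell text.
        raise RuntimeError(result.stderr.strip() or result.stdout.strip())
    out = result.stdout.strip()
    try:
        return json.loads(out)   # most commands reply with JSON
    except json.JSONDecodeError:
        return out               # e.g. a bare transaction id

# Example: info = multichain("mychain", "getinfo")
```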
3.1 Creating and Connecting to Chain
Before using any functionality, the user must be connected to a chain. The first page gives the option either to connect to an existing chain or to create a new chain. With the first option, if the user has previously connected to the chain, he/she will be connected without a problem just by entering the name of the chain. On the other hand, if it is a new user, the wallet address is shown in a message, which
is to be copied and given to the admin node for granting connect permission. Creating a new chain is fairly straightforward. The user will give the name of the chain and a name for his/her wallet. The chain will be created and he/she will be the admin of the chain.
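A sketch of what chain creation and node start-up can look like from such a backend is given below; multichain-util and multichaind are the standard MultiChain binaries, and "mychain" is a placeholder name.

```python
import subprocess

subprocess.run(["multichain-util", "create", "mychain"], check=True)   # create the chain
subprocess.run(["multichaind", "mychain", "-daemon"], check=True)      # start the admin node
# A second machine would instead connect with: multichaind mychain@<ip>:<port>
```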
3.2 Creating Wallets
Long and complicated wallet addresses are inconvenient under realistic conditions. That is why, in the GUI system, every wallet is assigned a unique wallet name so that anyone can easily distinguish between the nodes without having to remember or identify the wallet addresses. A node can have multiple wallets. This is useful because the node can use different wallets for different purposes such as general use, emergencies, work, etc. In the left sidebar, under the Wallet section, a new wallet can be created. A new wallet address will be created automatically. The user has to provide a new unique name corresponding to the address to separate it from other wallets.
3.3 Granting and Revoking Permissions
The admin node can grant or revoke one or multiple permissions of the chain for any other wallet. Without getting permission from the admin node, no one can perform any task in the chain. To grant or revoke permissions from the GUI system, the admin needs to go to the permissions section from the sidebar. Permission can be granted either to an existing user or to a new wallet address. The option of a wallet address is needed because granting connect permission to the wallet address is the only way a new user can be connected to the chain. A name must also be provided with a wallet address to identify the wallet later on. One or multiple permissions can be selected from the list of permissions. In the case of granting permissions to an existing user, the admin just needs to choose among the wallet names shown instead of the wallet address field.
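With the multichain() helper sketched earlier, granting or revoking permissions reduces to single calls; the wallet address below is a placeholder.

```python
# Grant connect/send/receive to a new participant (admin node only).
multichain("mychain", "grant", "1ExampleWalletAddress", "connect,send,receive")
# Revoking works the same way.
multichain("mychain", "revoke", "1ExampleWalletAddress", "send")
```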
3.4 Issuance and Transaction of Assets
In the Asset section, a user can perform various tasks related to assets. The first of them is issuing a new asset or issuing more quantity of an existing asset. Figure 1(a) shows the page for issuing a new asset. Here, a specific wallet is used for issuing the asset. It also needs to be mentioned which wallet will receive the issued asset. The wallets with receive permission automatically appear to choose from, as the receiver of the asset must possess receive permission. The name, quantity, and smallest divisible amount of the asset are also needed. When issuing more of an existing asset, open assets will be shown and the specific asset has to be chosen; just the quantity needs to be mentioned in that case. As mentioned before, the asset transaction is a very important part of MultiChain. A user with a balance of an asset above its smallest divisible amount can send that asset to another user or transfer it between his/her own wallets.
Fig. 1. MultiChain GUI
In the "Send" portion of the GUI, the assets in the possession of the user will be shown with the quantity of each asset. The user has to choose an asset, input the quantity of the asset he/she wants to send, and then send it to any other wallet in the chain with receive permission. A user can also see the balance of the assets in his/her possession. This can be shown in two ways: the overall balance combining all the user's wallets, or the balance of a specific wallet. The user can also see the list of transactions, which shows the assets he/she issued, sent, and received, with quantity and sending and receiving wallet.
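The corresponding CLI calls behind issuance and transfer, again via the hypothetical multichain() helper, could look like this; the addresses and asset name are placeholders.

```python
# Issue 1000 units of a new asset (smallest unit 0.1) to a wallet with receive permission.
multichain("mychain", "issue", "1ReceiverAddress", "asset1", "1000", "0.1")
# Send 500 units of that asset to another wallet.
multichain("mychain", "sendasset", "1OtherAddress", "asset1", "500")
```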
3.5 Creating and Publishing to Streams
When a new chain is created, a default root stream is created along with it. In the GUI system, this root stream is used for storing the names of the wallet addresses. This way, the wallet names can be shown on every node without having to maintain a database server. A user can create a new stream, publish to an existing stream, and see the items of a specific stream. For creating a stream, the name of the stream is needed, and the user also needs to choose a wallet with which the stream will be created. To publish to a stream, a user has to choose a wallet, give a key for the item, and provide the item itself, which is basically any information he/she wants to store. The key is needed for retrieving the item later and for separating the various data stored in the stream. For this purpose, the key has to be unique within the stream.
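A sketch of the underlying stream calls, using the same hypothetical helper, is shown below; the stream name, key and payload are placeholders, and the item data is hex-encoded as the CLI expects.

```python
multichain("mychain", "create", "stream", "stream1", "true")        # open stream
multichain("mychain", "subscribe", "stream1")                       # index its items locally
multichain("mychain", "publish", "stream1", "key1", "48656c6c6f")   # "Hello" in hex
items = multichain("mychain", "liststreamitems", "stream1")
```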
3.6 Retrieval of Information
Information such as the list of permissions granted to the wallets, the list of transactions of a specific asset or overall, the list of items in a stream, etc., is processed and shown in tabular form with wallet names so that anyone can easily understand the
details of the information. There is no separate list segment in the system; users can find various lists of information in each section. Figure 1(b) shows the list of transactions in a tabular format that is filterable with respect to wallets. The published items can be seen in a list: if a stream is chosen, the items of that stream will be shown. The items can be filtered by wallet. A user can also search through the list with the key to see a particular item. So, the list of stream items is filterable with respect to streams and wallets, and searchable with respect to keys. This interactive list makes the data more useful. With the shell-based platform, even simple tasks become complex, with various issues such as identifying different wallet addresses and remembering specific commands and their parameters in order to successfully complete a task. If a command is given to perform a task, a transaction ID is returned when the task is successfully completed. With the GUI, these complexities are eradicated by the user-friendly interactive approach. A successful task is represented by showing a success message in the GUI; otherwise, an error message with the cause of the error is shown. The user can easily understand and perform the tasks and get the required information.
4 Evaluation of the System
To observe the effectiveness of the GUI, a within-subject experiment was carried out. This section will discuss the participants’ profiles, the study procedure, and the evaluation outcome.
4.1 Participant’s Profile
In order to conduct the evaluation, a total of 12 people, consisting of 8 males and 4 females, were selected according to the following conditions. All of the participants were comfortable with using computers. Six of the participants were familiar with the command-line interface. None of them were familiar with MultiChain-cli. The average age was 24 ± 2.6 (mean ± std. deviation).
4.2 Study Procedure
The study was executed in the following manner. 1. At the beginning of the evaluation process, participants were briefed about the purpose of the study and their roles in it. Participants were clearly assured that it was the system that was being assessed and not their performance, so that they could give their opinions without hesitation. 2. The functionality of blockchain and MultiChain and the working process of MultiChain, such as creating a chain, granting permissions, issuing and sending assets, publishing to streams, etc., were explained in detail to the participants.
3. To extricate participants from the tedious process of setting up the environment, two separate chains were created for each participant by the authors, who acted as moderators in the experiment: one to use with the CLI and another to use with the GUI. For each of the chains, one participant had administrative privileges and the other 11 participants were given connect permission. Therefore, each participant had his/her own wallet address (in the case of the CLI) or wallet name (in the case of the GUI) for each chain. 4. Once the chains were set up by the moderators, participants were requested to perform the following tasks in the chains where they had administrative privileges, once in the CLI and again in the GUI. A certain time was assigned for each task. The assignment to CLI and GUI was given in a random order to avoid the problem of learning fatigue, as the experiment was designed as within-subject. Each participant was also provided with a wallet address (in the case of the CLI) and a wallet name (in the case of the GUI) to be used to perform the tasks. Task 1: Each participant gives issue, send, receive, create and publish permissions to the wallet that was provided. 30 s was assigned to complete the task. Task 2: Participants issue an asset with a name provided by the moderators, with an amount of 1000 units and a smallest unit of 0.1, in their own chains, and then send 500 units of the asset to the provided wallet (for test purposes). The time given for this task was 40 s. Task 3: Every participant creates a stream in the chain where he/she has administrative privileges and publishes an item in the stream (for test purposes). The name of the stream and the item to publish are provided by the moderators. The time for this task was also 40 s. 5. A set of questions was presented to participants after the study about the ease of using and learning their respective system (CLI or GUI), the feasibility of future usage, the likelihood of recommending it to others, and overall satisfaction. Each answer was taken from a range of 1 to 5, where 1 is strongly negative and 5 is strongly positive.
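For context, the CLI counterparts of Tasks 1–3 correspond roughly to the following node API calls, shown here through the same JSON-RPC helper sketched earlier. This is a hedged illustration: the wallet address, asset and stream names are placeholders, the permission string is simplified (the stream-level publish/write permission is omitted), and the exact parameter lists should be checked against the MultiChain documentation.

# Task 1: grant permissions to the provided wallet address (placeholder address).
rpc("grant", "1TESTwalletAddressXXXXXXXXXXXXXXXX", "issue,send,receive,create")

# Task 2: issue 1000 units of an asset (smallest unit 0.1) and send 500 units of it.
rpc("issue", "1TESTwalletAddressXXXXXXXXXXXXXXXX", "demo-asset", 1000, 0.1)
rpc("sendasset", "1TESTwalletAddressXXXXXXXXXXXXXXXX", "demo-asset", 500)

# Task 3: create a stream and publish one test item to it.
rpc("create", "stream", "task3-stream", False)
rpc("publish", "task3-stream", "test-key", "74657374")  # hex encoding of "test"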
4.3 Data Collection and Analysis
The results of the evaluation were analyzed from a usability perspective in order to incorporate HCI issues in blockchain use. In this work, the usability metrics of effectiveness, efficiency, and satisfaction are used to assess usability, as suggested by ISO [20]. The study results and the participants’ feedback from the questions are represented in Table 1. The descriptive statistical values (mean and standard deviation) and the inferential statistical values (t-value and p-value) for the CLI and the GUI over the 12 participants were calculated for each evaluation criterion, such as number of attempts, task completion time, etc. The “*” next to a p-value indicates a significant difference at the 0.05 level of significance. As seen from the results, there are three aspects of the evaluation: effectiveness, efficiency, and satisfaction.
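As an illustration of the inferential analysis, the paired (within-subject) comparison behind the t- and p-values in Table 1 can be reproduced along the following lines; the numbers below are placeholders, not the study data.

import numpy as np
from scipy import stats

# Placeholder per-participant completion times (seconds) for one task; not the study data.
cli_times = np.array([30, 25, 28, 22, 31, 24, 26, 20, 27, 23, 29, 21])
gui_times = np.array([24, 22, 25, 18, 26, 20, 21, 17, 23, 19, 24, 18])

# Within-subject design: each participant did the task in both conditions,
# so a paired t-test is the appropriate comparison.
t_value, p_value = stats.ttest_rel(cli_times, gui_times)
print(f"t = {t_value:.3f}, p = {p_value:.3f}")

# Descriptive statistics as reported in the table (mean ± standard deviation).
print(f"CLI: {cli_times.mean():.2f} ± {cli_times.std(ddof=1):.2f}")
print(f"GUI: {gui_times.mean():.2f} ± {gui_times.std(ddof=1):.2f}")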
Table 1. Results of evaluation study

Evaluation metric | Data type                        | Task   | CLI (Mean ± SD) | GUI (Mean ± SD) | t-value | p-value
Effectiveness     | Number of attempts               | Task 1 | 2.00 ± 0.95     | 1.75 ± 0.97     | 0.638   | 0.265
                  |                                  | Task 2 | 2.17 ± 0.94     | 1.83 ± 0.83     | 0.920   | 0.184
                  |                                  | Task 3 | 1.92 ± 0.90     | 1.33 ± 0.65     | 1.818   | 0.041*
Efficiency        | Task completion time             | Task 1 | 25.50 ± 5.09    | 21.42 ± 6.30    | 1.476   | 0.047*
                  |                                  | Task 2 | 34.08 ± 6.65    | 28.92 ± 5.98    | 2.001   | 0.029*
                  |                                  | Task 3 | 32.67 ± 6.64    | 25.92 ± 7.32    | 2.367   | 0.014*
                  | Number of times asking for help  | Task 1 | 2.33 ± 0.89     | 1.42 ± 1.08     | 2.267   | 0.017*
                  |                                  | Task 2 | 2.58 ± 1.16     | 1.67 ± 1.15     | 1.936   | 0.033*
                  |                                  | Task 3 | 1.83 ± 1.03     | 1.17 ± 1.03     | 1.586   | 0.064
Satisfaction      | Ease of use                      | −      | 2.50 ± 1.24     | 4.08 ± 0.67     | −3.886  | 0.001*
                  | Ease of learning                 | −      | 2.67 ± 1.30     | 3.75 ± 0.87     | −2.399  | 0.013*
                  | Future use                       | −      | 2.25 ± 1.06     | 3.83 ± 0.72     | −4.298  | 0.000*
                  | Recommendability                 | −      | 2.17 ± 1.11     | 3.58 ± 0.67     | −3.776  | 0.001*
                  | Overall satisfaction             | −      | 2.58 ± 1.38     | 3.83 ± 0.72     | −2.785  | 0.005*
Effectiveness. Effectiveness is determined by the success rate and the number of attempts it takes for participants to complete the tasks. It can be seen from Fig. 2 that for Task 1, the success rate in the CLI is 7 out of 12 while it is 9 in the GUI. In Tasks 2 and 3, success rates are 7 and 8 respectively in the CLI, while it is 11 in the GUI for both cases. For each task, the success rate is noticeably higher in the GUI than in the CLI. In the case of the number of attempts, comparing the CLI and the GUI, the GUI comes ahead with a lower number of attempts on average for all tasks. It is notable that the number of attempts is the highest for Task 2 on both platforms, as it is comparatively more complicated than Task 1, with participants having to perform two functionalities. Although Task 3 is also complicated, with two functionalities, the number of attempts reduces as participants get more used to the platforms.
Fig. 2. Success rate of the tasks in CLI & GUI
Efficiency. Efficiency is determined by the completion time and the frequency of assistance needed for task completion; the lower these numbers, the higher the efficiency. The difference in the efficiency of the two tools can be clearly observed in the task completion time. The difference in completion time between the CLI and the GUI increases significantly from Task 1 to Task 3. A similar pattern is observed for the number of times participants asked for help. Again, for both platforms, Task 2 takes the most time and has the highest number of participants asking for help. It can also be noticed that, despite Task 3 being as challenging as Task 2, the average number of attempts for completing Task 3 is lower than for Task 2. This is because the participants became gradually familiar with the respective systems after completing Tasks 1 and 2. Satisfaction. From the questions that were asked to the participants, the satisfaction results are calculated. The participants found the GUI more satisfying than the CLI. The descriptive statistics (mean and SD values) and the inferential statistics (t-value and p-value) analysis showed improvement in all concerns related to effectiveness, efficiency, and satisfaction, and most of the cases showed a significant improvement, except for Tasks 1 and 2 in the number of attempts and Task 3 in the number of times asking for help. Again, it can also be seen that the time to complete the task and the number of times asking for help decrease from Task 2 to Task 3 due to familiarity with the respective systems. Analysis Based on Participant Experience. Of the 12 participants selected for the evaluation study, 6 were familiar with the CLI and the rest had no prior experience with it. In the case of the CLI-based system, a detailed comparison of task performance was conducted between the participants who were familiar with the CLI and the participants who were not. The experiment results showed that, when performing the tasks in the CLI, participants having knowledge of the command-line interface performed better than those without such knowledge. Again, no significant differences were observed in any of the concerns related to effectiveness, efficiency, and satisfaction while performing the tasks using the GUI between the same groups of participants (i.e., those who were familiar with using the CLI and those who were not). So, prior knowledge of the CLI has no noticeable effect on the performance of the participants with the GUI.
5 Discussion and Conclusion
In this research, basic functionalities of MultiChain were implemented using a GUI so that the tasks of MultiChain can be completed without interacting with the command-line interface. It is seen from the evaluation that using the GUI instead of the CLI makes it possible to perform the functionalities of MultiChain with higher accuracy and in a much shorter time, increasing effectiveness and efficiency. This makes the tool satisfactory and recommendable according to the participants of the evaluation. However, there are functionalities of MultiChain that are not included in the GUI. In our future work, more of these functionalities
will be added gradually. For example, the exchange transaction will be added to the GUI. MultiChain also offers multiple-administrator functionality, in which case majority rule applies for granting and revoking permissions; this will be added to the GUI in the future as well. Finally, tweaking the configuration of the chain will also be incorporated. Although there are many functionalities yet to be incorporated in the GUI, the GUI can already be used to perform all the basic functionalities, which is still very useful. The GUI system removes the complexity issues and makes MultiChain usable in financial, educational, medical, or any other professional sectors where a private blockchain is needed.
References
1. Avantaggiato, M., Gallo, P.: Challenges and opportunities using multichain for real estate. In: 2019 IEEE International Black Sea Conference on Communications and Networking (BlackSeaCom), pp. 1–5. IEEE (2019)
2. Azim, A., Islam, M.N., Spranger, P.E.: Blockchain and novel coronavirus: towards preventing COVID-19 and future pandemics. Iberoam. J. Med. 2(3), 215–218 (2020)
3. Benyon, D.: Adaptive systems: a solution to usability problems. User Model. User-Adap. Inter. 3(1), 65–87 (1993)
4. Biswas, K., Muthukkumarasamy, V., Tan, W.L.: Blockchain based wine supply chain traceability system. In: Future Technologies Conference (2017)
5. Elsden, C., Manohar, A., Briggs, J., Harding, M., Speed, C., Vines, J.: Making sense of blockchain applications: a typology for HCI. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, pp. 1–14 (2018)
6. Foth, M.: The promise of blockchain technology for interaction design. In: Proceedings of the 29th Australian Conference on Computer-Human Interaction (OZCHI '17), pp. 513–517. Association for Computing Machinery (2017)
7. Gerard, D.: Attack of the 50 Foot Blockchain: Bitcoin, Blockchain, Ethereum & Smart Contracts (2017)
8. Greenspan, D.G.: MultiChain private blockchain—white paper (2015). https://www.multichain.com/download/MultiChain-White-Paper.pdf. Accessed 16 June 2020
9. Guegan, D.: Public blockchain versus private blockhain (2017)
10. Islam, I., Munim, K.M., Oishwee, S.J., Islam, A.K.M.N., Islam, M.N.: A critical review of concepts, benefits, and pitfalls of blockchain technology using concept map. IEEE Access 8, 68333–68341 (2020)
11. Islam, M.N., Rahman, S.A., Islam, M.S.: Assessing the usability of e-government websites of Bangladesh. In: 2017 International Conference on Electrical, Computer and Communication Engineering (ECCE), pp. 875–880. IEEE (2017)
12. Islam, M.N., Tétard, F.: Exploring the impact of interface signs' interpretation accuracy, design, and evaluation on web usability. J. Syst. Inf. Tech. 16(4), 250–276 (2014)
13. Ismailisufi, A., Popović, T., Gligorić, N., Radonjić, S., Šandi, S.: A private blockchain implementation using multichain open source platform. In: 2020 24th International Conference on Information Technology (IT), pp. 1–4 (2020)
14. Kaltyshev, M.: Proof of university certificate using blockchain technology (2018)
15. Kazerani, A., Rosati, D., Lesser, B.: Determining the usability of bitcoin for beginners using change tip and coinbase. In: Proceedings of the 35th ACM International Conference on the Design of Communication, pp. 1–5 (2017)
16. Moniruzzaman, M., Chowdhury, F., Ferdous, M.S.: Examining usability issues in blockchain-based cryptocurrency wallets. In: International Conference on Cyber Security and Computer Science, pp. 631–643. Springer, Cham (2020)
17. Razzak, M.A., Islam, M.N.: Exploring and evaluating the usability factors for military application: a road map for HCI in military applications. Hum. Factors Mech. Eng. Def. Saf. 4(1), 4 (2020)
18. Schulz, K.F., Freund, D.: A multichain architecture for distributed supply chain design in industry 4.0. In: International Conference on Business Information Systems, pp. 277–288. Springer, Cham (2018)
19. Shrestha, A.K., Deters, R., Vassileva, J.: User-controlled privacy-preserving user profile data sharing based on blockchain. arXiv preprint arXiv:1909.05028 (2019)
20. Standard, I.: Ergonomic requirements for office work with visual display terminals (VDTs) – Part 11: Guidance on usability. ISO standard 9241-11:1998. International Organization for Standardization (1998)
21. Stephen, R., Alex, A.: A review on blockchain security. In: IOP Conference Series: Materials Science and Engineering, vol. 396, p. 012030. IOP Publishing (2018)
22. Sultana, M., Hossain, A., Laila, F., Taher, K.A., Islam, M.N.: Towards developing a secure medical image sharing system based on zero trust principles and blockchain technology. BMC Med. Inf. Decis. Mak. 20(1), 1–10 (2020)
23. Swan, M.: Blockchain: Blueprint for a New Economy, 1st edn. O'Reilly Media Inc., Newton (2015)
24. Zheng, Z., Xie, S., Dai, H., Chen, X., Wang, H.: An overview of blockchain technology: architecture, consensus, and future trends. In: 2017 IEEE International Congress on Big Data (BigData Congress), pp. 557–564. IEEE (2017)
Novel Martingale Approaches for Change Point Detection

Jonathan Etumusei, Jorge Martinez Carracedo, and Sally McClean

Ulster University, Jordanstown, UK
[email protected]
https://www.ulster.ac.uk
Abstract. Existing algorithms are able to find changes in data streams but they often struggle to distinguish between a real change and noise. This fact limits the effectiveness of current algorithms. In this paper, we propose two methods using the Martingale framework that are able to detect changes and minimise the noise effect in a labelled electromagnetic data set. Results show that the proposed methods make some improvements over the previous approaches within the Martingale framework.

Keywords: Anomaly detection · Electromagnetic data · Martingales

1 Introduction
Machine learning is a key component of artificial intelligence, being essential in the development of different algorithms. Activity recognition, medical diagnosis, image recognition, speech recognition, social media services or virtual personal assistants are only a few examples of the many applications of this field. It is a big challenge for the research community to track, analyse and detect anomalies in time series, and it is therefore critical to develop more effective data analytic techniques to detect those anomalies. One important application of anomaly detection (AD) is to discover abnormalities in geological or terrestrial time series for risk assessment. The Earth acts as a gigantic magnet and therefore creates a magnetic field that can interact with electrically charged objects [18]. An electromagnetic (EM) field can be influenced by electromagnetic interference (EMI). EMI can come from different sources, which can be either natural or human-made. The existence of an EM field makes possible the existence of noise [7]. Noise refers to the interference of the communication process between the satellite and the Earth's surface; however, it can also be produced by human or equipment errors. Being able to isolate real changes in an EM dataset is key in the development of new algorithms. The paper structure is as follows. In Sect. 2, we review the latest work done on identifying changes or anomalies in EM data. In Sect. 3, we introduce our new approaches. In Sect. 4, we show our experimental results and compare them with the existing Martingale algorithm. We finish the paper in Sect. 5 discussing the results that we obtained and the next steps that we will take in the research.
2 Related Work
In the last decades, new AD techniques have been developed to detect earthquakes from EM data generated by satellites and geological and geographical sources. Xiong et al. [14] developed a technique to compare two earthquakes using a wavelet-based data mining method. The work aimed to use a wavelet transformation to detect seismic changes in EM data. The distinguishing feature of this method is that it analyses and detects seismic precursors using the wavelet transformation. The limitation of the algorithm is that it cannot provide a measurement of the degree of change in the EM data set [9]. Christodoulou et al. [1] proposed a fuzzy-inspired method, known as the fuzzy inference system, to detect seismic anomalies in EM data. The goal of this algorithm was to detect changes in an electromagnetic time series obtained from satellite and geological sources. The results obtained showed that this method can detect changes in a terrestrial data set. This method is able to show the boundaries between regular data and an anomalous sequence. However, it does not produce a measurement of the degree of change in the time series. This method uses different components such as a smoothing filter to reduce noise, a correlation technique to reduce the signal dimension and a fast Fourier transform method for peak finding. All these techniques make the computational complexity of the algorithm high. Ho and Wechsler [5] proposed the method known as the Randomised Power Martingale (RPM), which is able to detect changes in data streams. The method was developed to adapt the Martingale framework (which was initially a betting strategy) to anomaly detection. This approach is able to detect changes in a data stream by checking every arriving time point through hypothesis testing. Experimentation showed the effectiveness of the method in detecting abnormalities in time series. This method can be used to analyse EM data streams for change detection. There are still issues that diminish the effectiveness of the algorithm [9], such as the detection of a high number of false positives. Kong et al. [9] developed an algorithm that makes use of a geometric moving average of Martingales to detect earthquake precursors in EM data. This was developed to address some of the difficulties that the use of sliding windows presents. This approach has been applied to analyse earthquake anomalies in areas such as Wenchuan, Puer, Beijing and other north-eastern areas in China. The method makes use of a geometric moving average to enhance the previously discussed RPM. This approach also uses a forgetting factor that determines the weight of the Martingale points: as the forgetting factor increases, less weight is given to previous Martingale points. However, this approach still has some limitations. For instance, when the threshold is high, some correct abnormal points are ignored, increasing the false negatives, and when the chosen threshold is small, a large amount of regular data points are identified as anomalies, increasing the false positives. As we have seen above, current techniques for anomaly detection in EM datasets still face some challenges. Among these challenges we can find a high
rate of false positive detections or the inability to estimate the time interval in which a change is taking place in the data set. In this paper, we present two new methods based on the original Martingale method proposed by Ho and Wechsler [5]: the moving median of a Martingale sequence (MMMS) and the Gaussian moving average of a Martingale sequence (GMAS). We test these two methods on an EM dataset and compare the results of the proposed algorithms with the previous randomised power Martingale (RPM) approach.
3 Martingale Approach
A Martingale can be informally defined as a sequence of random variables such that, at a given time, the conditional expectation of the next value given all prior values is equal to the present value.

Definition 1: [5] A sequence of random variables {M_i : 0 ≤ i < ∞} is a Martingale with respect to the sequence of random variables {X_i : 0 ≤ i < ∞} if, for all i ≥ 0, the following conditions hold:
– M_i is a measurable function of X_0, X_1, ..., X_i,
– E(|M_i|) < ∞, and
– E(M_{n+1} | X_0, ..., X_n) = M_n.

The Martingale theory was generalised by Joseph Leo Doob [2], who was motivated by the possibility of a successful gaming strategy. The building blocks for the use of the Martingale framework in anomaly detection were introduced by Ho and Wechsler [5], who use a new metric called strangeness. Intuitively, it is a measure of how much a new data point differs from the previous ones. Let us consider a time series Z = {z_1, ..., z_{i−1}}. The arriving point will be represented as z_i. Let us suppose that the data has been clustered into k disjoint sets Y_1, ..., Y_k (k < i), and that a strangeness value s_j, measuring how much each point differs from the rest (e.g. its distance to the nearest cluster centre), has been computed for every point z_j. Given a random number θ_i drawn uniformly from [0, 1], the p-value of z_i is defined as

p_i(Z ∪ {z_i}, θ_i) = (#{j : s_j > s_i} + θ_i · #{j : s_j = s_i}) / i.   (2)

Intuitively, p_i measures the probability of being stranger than z_i.
It should be noted that p_i is a special case of the statistical notion of the p-value [5]. The set of p_i values is not uniformly distributed on [0, 1]; this is a result of the newly observed points, which are likely to have higher strangeness values compared to the previously observed points, so that p_i becomes smaller [5]. It is possible to use the set of p_i values to compute a new random variable that is called the randomised power Martingale.

Definition 3: [5] The randomised power Martingale (RPM), indexed by ε ∈ [0, 1], is defined at each time point as

M_n^(ε) = ∏_{i=1}^{n} ( ε · p_i^(ε−1) ).   (3)
To recap, we are considering a time series Z = {z_1, ..., z_{i−1}} where the arriving point is z_i. The data is clustered into k disjoint sets Y_1, ..., Y_k (k < i), the strangeness and p-value of the arriving point are computed, and the Martingale value is updated. A change is reported when

M_n^(ε) > t,   (4)
where the threshold t is chosen in a probabilistic way based on Doob's inequality [5]. In the following section we introduce two new methods with which we aim to improve the accuracy, recall and F1 of the previously described Martingale approach.
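A minimal sketch of the RPM update described above is given below. It assumes the strangeness of a point is its distance to the nearest cluster centre obtained from k-means (one common choice in the Martingale literature); all parameter values are illustrative, not the ones used in this paper.

import numpy as np
from sklearn.cluster import KMeans

def rpm_values(z, eps=0.92, k=3, threshold=10.0, seed=0):
    """Randomised power Martingale over a 1-D stream z (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    martingale, values = 1.0, []
    seen, strangeness = [], []
    for x in z:
        seen.append([x])
        if len(seen) <= k:
            s = 0.0                                   # too few points to cluster yet
        else:
            centres = KMeans(n_clusters=k, n_init=10).fit(np.array(seen)).cluster_centers_
            s = float(np.min(np.abs(centres.ravel() - x)))   # distance to nearest centre
        strangeness.append(s)
        s_arr = np.asarray(strangeness)
        theta = rng.uniform()
        p = (np.sum(s_arr > s) + theta * np.sum(s_arr == s)) / len(s_arr)   # Eq. (2)
        martingale *= eps * p ** (eps - 1.0)                                # Eq. (3)
        values.append(martingale)
        if martingale > threshold:                                          # Eq. (4)
            martingale = 1.0                        # re-initialise after a detection
    return np.array(values)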
3.1 Moving Median of a Martingale Sequence
A moving median approach is a very robust and effective technique to detect anomalies in a data stream [8]. The moving median method finds the median of a data stream using a sliding window. Once the Martingale sequence for our time series has been computed, we can apply moving medians to the sequence of Martingale points. The main reason for using a median rather than a mean is that the median is more robust, in the sense that it is not influenced by the individual values, only by their order. This characteristic suggests that the median tends to smooth the data, reducing the effect of noise. Working with moving medians to detect terrestrial anomalies such as earthquakes is interesting because these changes are not short-term or momentary abnormalities; terrestrial anomalies show changes that can be prolonged in time [12]. Let us consider a Martingale sequence M = {M_i : 0 ≤ i < ∞} and let us fix a window length l > 0. We define D_k as the moving median of the k-th window of the Martingale sequence M. Given a fixed threshold t > 0, we can set the condition
for change detection. While Ho and Wechsler [5] proposed a probabilistic way of finding a threshold, in this paper we suggest a threshold based on the mean absolute deviation (MAD) of the Martingale sequence. The MAD is the average distance between each data point and the mean, and it shows how spread out the data is. Leys et al. [10] proposed, based on outlier detection, a threshold for change detection of ME ± 3 × MAD, where ME is the mean (or median) of the data points. We used this approach to compute the threshold t for our methods. Therefore, this model will detect a change when

D_k ≥ t.   (5)
If D_k exceeds the given threshold t, a change has been detected. When a change is detected, the computation of D_k is finalised and the algorithm is restarted.
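A compact sketch of the MMMS step, applied to an already computed Martingale sequence, might look as follows; the window length is illustrative, and the MAD-based threshold follows the ME + 3 × MAD rule described above.

import numpy as np

def mmms_detect(martingale_seq, window=20):
    """Moving median of a Martingale sequence with a MAD-based threshold (sketch)."""
    m = np.asarray(martingale_seq, dtype=float)
    # Mean absolute deviation: average distance of each point from the mean.
    mad = np.mean(np.abs(m - m.mean()))
    threshold = m.mean() + 3.0 * mad            # ME + 3 * MAD rule
    change_points = []
    medians = np.empty(len(m))
    for k in range(len(m)):
        start = max(0, k - window + 1)
        medians[k] = np.median(m[start:k + 1])  # moving median D_k over the current window
        if medians[k] >= threshold:             # Eq. (5)
            change_points.append(k)
    return medians, threshold, change_points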
3.2 Gaussian Moving Average of a Martingale Sequence
The Gaussian function [11] can be used as a smoothing operator to compute a weighted average of the Martingale points. The points that are closer to the mean will get a bigger weight while the ones further away will get a smaller weight. This weighting process can be used to minimise the noise in the dataset, making the time series pattern more visible. Let us consider a Martingale sequence M = {M_k : k = 1, ..., s − 1}. For every Martingale point M_i we can compute its Gaussian function as

e_i = exp( −(M_i − μ)² / (2σ²) ),   (6)

where μ and σ are the mean and standard deviation of M [11]. At this point, we can compute a new value H_i using the following equation:

H_i = e_i / Σ_{i=0}^{k} e_i.   (7)

Finally, we define a final sequence of values as in Eq. (8):

G_k = Σ_{i=0}^{k} H_i M_i,   (8)

for k = 1, ..., n. If the computed G_k value exceeds some threshold t, then it is possible to assume that a change or abnormal condition has happened. As before, we will be using a threshold of ME ± 3 × MAD. All the methods that we are proposing in this paper follow a similar procedure, outlined below:
Data: an EM univariate data set
Result: MMMS/GMAS points
Initialise: M(0) = 1; i = 1; F = ∅; set the number of clusters k, the ε value and the window size.
while a new example z_i arrives do
    if F = ∅ then
        set the strangeness of z_i to 0
    else
        compute the strangeness of z_i and of the data points in F;
        compute p_i for z_i;
        compute M_i using Eq. (3);
        compute the MMMS/GMAS point;
        compute the threshold t;
    end
    if D_k (resp. G_k) ≥ t then
        CHANGE DETECTED; re-initialise M_i = 1
    else
        add z_i to F;
    end
    i = i + 1;
end
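Following the listing above, the GMAS smoothing of an already computed Martingale sequence can be sketched as below; it implements Eqs. (6)–(8) directly and uses the same ME + 3 × MAD threshold, with all values illustrative rather than taken from this paper.

import numpy as np

def gmas_detect(martingale_seq):
    """Gaussian moving average of a Martingale sequence (sketch of Eqs. (6)-(8))."""
    m = np.asarray(martingale_seq, dtype=float)
    mu, sigma = m.mean(), m.std()
    mad = np.mean(np.abs(m - mu))
    threshold = mu + 3.0 * mad                               # ME + 3 * MAD rule
    weights = np.exp(-(m - mu) ** 2 / (2.0 * sigma ** 2))    # Eq. (6)
    g, change_points = np.empty(len(m)), []
    for k in range(len(m)):
        h = weights[: k + 1] / weights[: k + 1].sum()        # Eq. (7), normalised up to k
        g[k] = np.sum(h * m[: k + 1])                        # Eq. (8)
        if g[k] >= threshold:
            change_points.append(k)
    return g, threshold, change_points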
4 Experimental Results
All the proposed algorithms were applied to a labelled EM data set to test their effectiveness. The dataset that we are using was generated by the ESA Swarm satellite [1]; it is a univariate dataset containing 3751 data points. The data set contains anomalies that have been labelled by seismic experts. In Fig. 1, it is possible to see a plot of the data set and its abnormal fluctuation. To evaluate the algorithm performance we use the standard evaluation metrics: accuracy, precision, recall and F1. Accuracy is defined as the ratio between data points correctly detected and the total number of observations [4]. Accuracy can be computed using the following formula:

Accuracy = (TP + TN) / (TP + FP + FN + TN),

where TP stands for true positives, TN for true negatives, FP for false positives and FN for false negatives. Precision is defined as the probability that a detected change is correct. The precision is computed as follows:

Precision = TP / (TP + FP)
Fig. 1. EM data set (intensity in mV plotted against data point index; the labelled change region spans data points 2288–2478)
Recall is defined as the number of correctly detected time points over the number of real changes [4] and we can compute it as:

Recall = TP / (TP + FN)
The F1 score, the harmonic mean of precision and recall, combines both measures. The formula to compute F1 is given below:

F1 = (2 × Recall × Precision) / (Recall + Precision)
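For completeness, the four metrics can be computed directly from the confusion counts, as in the short sketch below (the counts are placeholders, not the values reported in this paper).

def evaluation_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * recall * precision / (recall + precision)
    return accuracy, precision, recall, f1

# Example with placeholder counts.
print(evaluation_metrics(tp=50, fp=70, fn=30, tn=3600))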
As the initial family of Martingales is indexed by ε ∈ [0, 1], it was necessary to understand the behaviour of each algorithm as epsilon varies. Therefore, for each algorithm (RPM, MMMS and GMAS) we computed F1 for every epsilon value in the set {0, 0.01, 0.02, ..., 1}. A graphical representation of these results can be seen in Fig. 2. In each case we selected the epsilon which optimised F1 and used this value in the subsequent algorithm evaluation, which uses a different data set.
Fig. 2. F1(ε) values of MMMS, GMAS and RPM as a function of the epsilon value
In order to better visualize how all the methods work, we plotted the output of the proposed algorithms, using the trained epsilon values, on a new EM data set of 3750 data points, together with their representation in the data set. Figures 3, 5 and 7 show the algorithm outputs while Figs. 4, 6 and 8 show the test data with the accompanying change points. The new EM data set is labelled with change points from data point 2821 to data point 2901. To evaluate the performance of the proposed algorithms, we will use the standard confusion metrics: TP, TN, FP and FN. These metrics provide an idea of how good the performance of a classifier is [13]. The confusion metrics for the proposed algorithms and the original RPM are summarized in Table 1.
Fig. 3. RPM output
Fig. 4. Change points detected by RPM
Fig. 5. MMMS output
Fig. 6. Change points detected by MMMS
Fig. 7. GMAS output
Fig. 8. Change points detected by GMAS
Table 1. Confusion metrics

Approach | TP         | FP         | FN         | TN
MMMS     | 68 (84.0%) | 65 (1.8%)  | 13 (16.0%) | 3064 (98.2%)
GMAS     | 55 (67.9%) | 68 (1.9%)  | 26 (32.1%) | 3601 (98.1%)
RPM      | 48 (59.3%) | 118 (3.2%) | 33 (40.7%) | 3551 (96.8%)
We can see that our proposed new algorithms present a higher TP detection ratio and a lower FN detection rate when compared to the previous RPM method. We can therefore affirm that, for the data analysed, the proposed methods outperform the traditional RPM. We can also use the information presented in Table 1 to compute the accuracy, recall and F1 of all the methods on the analysed dataset. These metrics are presented in Table 2. In this way we can easily present a comparison of the algorithms' performance. As can be seen from Table 2, the two newly proposed algorithms present a better performance than the original RPM for every analysed metric. It is important to note that, while accuracy is only slightly better for the proposed methods, the improvement in recall and F1 is significantly high.
Table 2. Evaluation metrics

Method | ε    | Accuracy | Recall  | F1
RPM    | 0.66 | 0.9611   | 0.5625  | 0.3814
MMMS   | 0.32 | 0.9883   | 0.8395  | 0.6355
GMAS   | 0.89 | 0.9776   | 0.67901 | 0.5392
5 Conclusion and Future Work
In this paper, we discussed one of the first algorithms using a Martingale framework for change detection, the randomised power Martingale. This method is efficient at detecting changes in some specific data streams. We proposed two methods that improve the performance of the original algorithm. Taking into account that F1 is more than 50% higher for some of our methods than for the original one, the improvement is considerable. The results indicate that our proposed methods are more effective when compared to the previous RPM method. We can conclude that the proposed algorithms can accurately detect changes in our terrestrial data set. However, more testing will be needed to verify this hypothesis. Future work will include implementing the proposed algorithms on data sets coming from other sources and developing a multivariate version of the methods.
References
1. Christodoulou, V., Bi, Y., Zhao, G.: A fuzzy inspired approach to seismic anomaly detection. In: International Conference on Knowledge Science, Engineering and Management, pp. 575–587. Springer, Chongqing (2015)
2. Doob, J.L.: Probability and statistics. Trans. Am. Math. Soc. 36(4), 759–775 (1934)
3. Fedorova, V., Gammerman, A., Nouretdinov, I., Vovk, V.: Plug-in Martingales for testing exchangeability on-line. arXiv preprint arXiv:1204.3251 (2012)
4. Goutte, C., Gaussier, E.: A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In: European Conference on Information Retrieval, pp. 345–359. Springer, Meylan (2005)
5. Ho, S.S., Wechsler, H.: A Martingale framework for detecting changes in data streams by testing exchangeability. IEEE Trans. Pattern Anal. Mach. Intell. 32(12), 2113–2127 (2010)
6. Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans. Pattern Anal. Mach. Intell. 24(7), 881–892 (2002)
7. Kaur, M., Kakar, S., Mandal, D.: Electromagnetic interference. In: 2011 3rd International Conference on Electronics Computer Technology, vol. 4, pp. 1–5. IEEE (2011)
8. Kawala-Sterniuk, A., Podpora, M., Pelc, M., Blaszczyszyn, M., Gorzelanczyk, E.J., Martinek, R., Ozana, S.: Comparison of smoothing filters in analysis of EEG data for the medical diagnostics purposes. Sensors 20(3), 807 (2020)
9. Kong, X., Bi, Y., Glass, D.H.: Detecting seismic anomalies in outgoing long-wave radiation data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 8(2), 649–660 (2014)
10. Leys, C., Ley, C., Klein, O., Bernard, P., Licata, L.: Detecting outliers: do not use standard deviation around the mean, use absolute deviation around the median. J. Exp. Soc. Psychol. 49(4), 764–766 (2013)
11. Lin, H.C., Wang, L.L., Yang, S.N.: Automatic determination of the spread parameter in Gaussian smoothing. Pattern Recogn. Lett. 17(12), 1247–1252 (1996)
12. Phillips, R.J., Lambeck, K.: Gravity fields of the terrestrial planets: long-wavelength anomalies and tectonics. Rev. Geophys. 18(1), 27–76 (1980)
13. Visa, S., Ramsay, B., Ralescu, A.L., Van Der Knaap, E.: Confusion matrix-based feature selection. MAICS 710, 120–127 (2011)
14. Xiong, P., Bi, Y., Shen, X.: Study of outgoing longwave radiation anomalies associated with two earthquakes in China using wavelet maxima. In: International Conference on Hybrid Artificial Intelligence Systems, pp. 77–87. Springer, Salamanca (2009)
Development of a Reinforcement Learning System to Solve the Job Shop Problem

Bruno Cunha (1), Ana Madureira (1), and Benjamim Fonseca (2)

1 Interdisciplinary Studies Research Center, Institute of Engineering - Polytechnic of Porto (ISEP/IPP), Porto, Portugal
[email protected]
2 INESC TEC, University of Trás-os-Montes and Alto Douro (UTAD), Vila Real, Portugal
Abstract. Traditionally, solving scheduling problems requires the intervention of expert users. This is a difficult task in a domain that presents significant challenges, considering that modern manufacturing industries are exceptionally complex, distributed, interconnected, and exposed to unexpected changes due to external factors. As such, it becomes clear the need for modern scheduling systems that can provide optimized solutions automatically with reduced user intervention. This paper presents an original system architecture that takes advantage of the latest machine learning developments to create an innovative approach that is very quick to propose a solution to a job shop scheduling problem while preserving an acceptable level of solution quality. The proposed architecture is detailed, including the five components: the job shop problem instance, the problem decoder, the job shop environment and the reinforcement learning algorithm - which, combined, form the intelligent agent - and, finally, the optimized solution to the problem. This paper also includes a description of the adopted hybrid software development methodology, that combines the best practices from cascade and evolutionary methodologies. This paper concludes by providing preliminary results that corroborate the high potential of the system, which is notably more efficient than existing offers while maintaining a good performance on the quality of the solutions.

Keywords: Software development · Job shop scheduling · Simulation · Machine learning · Reinforcement learning
1 Introduction

Manufacturing industries have a clear necessity to plan in advance, so that they can maximize their resources and increase their profits. The sequencing of operations in job shop scheduling (JSS) environments consists in deciding the order in which each job will be processed on each machine, so that those resources are better exploited. This sequence of operations is, in practice, a scheduling plan that contains the solution to the JSS problem (JSSP). Even today the creation of scheduling plans is often done in an informal way, using empirical knowledge, and accomplished with insufficient care. However, it is essential to have a proper system in place and a careful approach to deal with more complex
situations. To achieve the goal of maximizing the company profits, decision makers must have access to decision support systems that allow them to get optimized plans. Considering the need to aid the specialists who have to make these sequencing decisions every day, the need becomes clear for modern scheduling systems that put the best approaches into practice, so that they are capable of providing optimized solutions. The goal of this paper is to present a novel architecture that takes advantage of the latest machine learning, deep learning and reinforcement learning developments to create an innovative and efficient approach to compute a solution to a JSSP while maintaining an acceptable level of solution quality. To support this work, this paper also presents the methodology applied during the development of the system, from the initial study and design of a scheduling system to the complete details of its development. The remaining sections are organized as follows: Sect. 2 starts by presenting a literature review on the relevant concepts of scheduling problems, fundamental to the proposal that this paper puts forward; Sect. 3 presents the methodology that was used to develop the proposed architecture; Sect. 4 contains the details of the developed architecture; Sect. 5 presents some preliminary results; and, at last, Sect. 6 contains the final conclusions and puts forward the planned future work.
2 Literature Review

2.1 Scheduling Problems

Scheduling is a decision-making process that is regularly used not only in various service and manufacturing industries, but also in the day-to-day lives of ordinary people. A schedule - or plan - is what defines when a certain activity should occur and what resources it will use; e.g. the schedule of a bus defines the time of departure, arrival and where to go; a school schedule defines the classes of each curricular unit, the teacher who will teach and which room to use; in a production company, the production plan contains accurate information about the orders to be processed [1]. The scheduling process consists in generating a detailed plan of the jobs to be performed. Each job is composed of one or more operations, which may have associated restrictions (e.g. an operation can only start after another has ended). The execution of an operation requires a certain resource that typically has a limited capacity (e.g. a machine in a factory environment). The detailed plan consists in defining the allocation of the operations to the available resources, including the definition of the start time of each operation, so that all the restrictions are satisfied. The objective of the scheduling process is to find the optimal plan; i.e. a plan that maximizes a given objective [2]. This goal is usually associated with the elapsed time. The most common goal is the minimization of the time required for all jobs to finish, also known as makespan, but there are many others, such as minimizing delays or reducing machine downtime. In any modern organization job scheduling is a crucial process. It is the last step of planning before the execution of a certain activity. Therefore, it is the element that makes the connection between the planning and execution phases, bringing the companies' plans into practice. Although in the present-day reality most of the plans are made at short notice, in many industries the plans need to be generated over the long term, so that the organizations can
prepare themselves; e.g. in the pharmaceutical industry it is necessary to guarantee the raw material a long time in advance. However, unforeseen events can happen. Thus, these environments tend to use a more dynamic approach, where orders arise from one moment to another, machines may have failures, priorities of operations are changed or even jobs are canceled [3]. To avoid this situation, companies choose to create long-term plans that are put into operation in two distinct phases [1]: an initial part of the plan is frozen, being carried out regardless of the unforeseen events that arise; the second part of the plan is more volatile, and it is possible to re-plan all the jobs included therein. The creation of these plans is often done in an informal way, using empirical knowledge, and concluded with insufficient care. However, it is crucial that there is a proper system and a careful approach to deal with more complex situations. Decision makers must have access to decision support systems that allow them to obtain optimized plans. These systems can be as simple as spreadsheets with more basic functionalities (e.g. Microsoft Excel) or more complex, allowing more experienced users to make use of advanced graphical representations, visualizing performance metrics and having access to effective optimization methods (e.g. ADSyS [4]). The level of support that the user wants should always be adjustable, meeting the best practices of human-machine interfaces [4].

2.2 Job Shop Problem Definition

Multi-operation JSSPs consist of processing n manufacturing orders (known as jobs), J1, J2, J3, ..., Jn, on m machines, M1, M2, M3, ..., Mm. These machines are positioned in an industrial environment such as a factory or workshop (“shop”). Each job is characterized by a certain number of operations. Each operation is represented by oij, where i represents the number of the job that the operation belongs to and j its processing order within the job (e.g. o23 represents the third operation of the second job). A JSSP P is thus defined by the set of machines, jobs and operations, establishing P = (M, J, O) (illustrated in Fig. 1). Each operation also has an associated processing time, pij, that is known, representing the number of time units required to fully execute the operation oij. In a job shop environment, there are some basic restrictions that need to be respected:

• Operations of a job can only be executed when the previous operation has been completed, except in the case of the first one.
• The operations of a given job cannot take precedence over operations of another job, i.e. there is only precedence between operations of the same job.
• An operation that is being executed cannot be interrupted.
• A machine can only perform one job in each time unit.
• A job can only be performed by one machine at a time, i.e. machine changes can only be made between operations.
Fig. 1. Representation of a JSSP (adapted from [5]).
In addition to these restrictions, some assumptions are made that are assumed to be true at any time during the execution of the plan. These are:

• The machines available in the problem can be used at any time; there are no stops or mandatory dead times.
• There are no repeated machines, i.e. the JSSP assumes that all machines are different.
• For each operation existing in the plan there is one and only one machine where its execution is possible.
• A machine can only process one operation per time unit.

Although these constraints make the job shop a simplification of the problem faced by several industries in the real world, this approach is still very useful, since it provides valuable information, such as the best order of execution of existing jobs [2].
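To make the notation above concrete, a JSSP instance P = (M, J, O) can be represented with a few small data structures, as in the sketch below. The tiny two-job example is made up for illustration; it is not one of the benchmark instances used later in the paper.

from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Operation:
    job: int        # index of the job J_i the operation belongs to
    index: int      # position j of the operation inside its job (o_ij)
    machine: int    # machine M_k required to process the operation
    duration: int   # processing time p_ij in time units

@dataclass
class JobShopInstance:
    num_jobs: int
    num_machines: int
    operations: List[Operation]

    def job_operations(self, job: int) -> List[Operation]:
        """Operations of one job, in the precedence order imposed by the job."""
        return sorted((o for o in self.operations if o.job == job), key=lambda o: o.index)

# A tiny made-up instance: 2 jobs on 2 machines.
instance = JobShopInstance(
    num_jobs=2,
    num_machines=2,
    operations=[
        Operation(job=0, index=0, machine=0, duration=3),
        Operation(job=0, index=1, machine=1, duration=2),
        Operation(job=1, index=0, machine=1, duration=4),
        Operation(job=1, index=1, machine=0, duration=1),
    ],
)
print(instance.job_operations(1))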
3 Methodology

The development process of this work was divided into four main phases: design and planning of the work to be developed; the development of the system itself; thorough system validation; and collection of results. These phases are based on the hybrid cascade model proposed by Sommerville [6]; with it, it is possible to take advantage of the structure offered by the cascade models in conjunction with the rapid prototyping of evolutionary models [6]. A traditional cascade approach begins by defining the system requirements that shall serve as the basis for software design. In the implementation phase, each previously designed component is converted into a unit that can be tested later. In evolutionary development, an implementation of the system is made right from the start. This is then evaluated and iteratively the system is refined to a satisfactory level. This requires the specification of requirements, development, and validation to occur simultaneously. The methodology employed in this paper follows a hybrid approach as it combines the two discussed models: cascade and evolutionary. It follows, in part, the cascade model because it began with the specification of requirements that serve as the basis for the implementation of the system, then moved on to the development phase of the
system. Once this phase was over, the overall system was tested and, finally, the relevant statistical results and values were collected. However, at a lower granularity level an evolutionary model [6] was applied. The initial version of each component was tested as it was developed (as if it were a prototype). Then, each version of the components was worked on in an iterative manner, enhancing the previous versions. At last, the final version was built and combined in the overall system. The development process of this work, from the initial research to the achievement and analysis of the results, is split into four phases: design, development, validation and results. These are detailed in the following paragraphs. The design phase is the initial phase of the project. Thus, it is not surprising that it includes activities such as the revision of the state of the art of the scientific areas involved, namely the reinforcement learning algorithms and the training environments of agents that use these algorithms. In the machine learning module, the design phase consisted in the revision of the state of the art mentioned, which was previously published [1]. This served as the basis for specifying the training environment of the agent to be developed (based on the Gym standard [7]) and for the decision on the learning algorithm to use - Proximal Policy Optimization (PPO) [8]. For the scheduling module, this phase served to concretely define each of the components of the problems: jobs, the operations that compose each job, the machines where the operations are performed and the decoder of academic instances of these problems. The design phase serves as the basis for the software development phase. This consists, mainly, in the implementation of the components previously decided and planned. In the scheduling module, the sub-modules corresponding to each component were developed and then integrated into a fully functional scheduling system. Each component went through iterative tests until it reached the final version, and only later, already in the validation phase, was the complete dispatching system tested as a whole (and marked as complete). In the machine learning module, the development phase was separated into two main components: the intelligent agent training environment and the algorithm to be used by the agent to be developed. The training environment was created according to the chosen configuration, which uses the standards defined by Gym [7] as a reference. Furthermore, it could only be developed after each component of the scheduling system was completed, as the training environment depends on these to simulate the scheduling problem. The intelligent agent was built with the implementation of the PPO algorithm [8]. This development was done in parallel to the scheduling system, since the learning algorithm is independent of the environment where the agent will act (one of the advantages of using Gym). After the completion of the development of the agent and the environment where it would act, the training phase of the agent was initiated. In this phase, the agent works iteratively, solving several millions of JSSPs and using each one of these experiences to improve its performance (and, effectively, doing true learning). During this process the validation phase of the machine learning module is initiated, and the results of the agent are analyzed to understand whether any adjustment is necessary (e.g. 
fine-tuning the reward structure, changes to the parameters) or if the agent is, in fact, learning and presenting better results over time.
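As a rough illustration of what a Gym-compliant job shop training environment looks like, the skeleton below follows the classic gym.Env interface mentioned above. The observation and action spaces, the reward and the episode logic are placeholders for illustration only, not the authors' actual design.

import gym
import numpy as np
from gym import spaces

class JobShopEnv(gym.Env):
    """Skeleton of a Gym-style job shop environment (illustrative only)."""

    def __init__(self, instance):
        super().__init__()
        self.instance = instance
        # Placeholder spaces: one discrete action per job (choose which job's next
        # operation to schedule) and a flat vector describing job/machine status.
        self.action_space = spaces.Discrete(instance.num_jobs)
        self.observation_space = spaces.Box(low=0.0, high=1.0,
                                            shape=(instance.num_jobs * 2,), dtype=np.float32)

    def reset(self):
        self._scheduled = []                     # operations allocated so far
        return self._observe()

    def step(self, action):
        # Placeholder transition: record the choice; a real environment would allocate
        # the chosen job's next operation on its machine and compute a makespan-based reward.
        self._scheduled.append(action)
        reward = 0.0
        done = len(self._scheduled) >= len(self.instance.operations)
        return self._observe(), reward, done, {}

    def _observe(self):
        return np.zeros(self.observation_space.shape, dtype=np.float32)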
These decisions made by the agent during the learning process are made according to the values defined in the parameterization of the reinforcement learning algorithm employed. Since the algorithm chosen for this work was the PPO algorithm, the most relevant parameters used are presented in Table 1.

Table 1. Parameters chosen for the reinforcement learning algorithm.

Parameter            | Value
γ (Gamma)            | 0.99
Iteration number     | 50000
Entropy coefficient  | 0.01
Learning rate        | 0.00025
Bias                 | 0.95
Training minibatches | 4
Epochs               | 4
Seed                 | Random
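The table values map naturally onto the hyperparameters of common PPO implementations. The sketch below shows one possible wiring using the stable-baselines3 library; this is an assumption made purely for illustration (the paper does not state which implementation was used), and the reading of "Bias" as the GAE lambda and "Iteration number" as the number of training timesteps is also an assumption.

# Illustrative mapping of Table 1 onto a PPO implementation (stable-baselines3 assumed).
from stable_baselines3 import PPO

env = JobShopEnv(instance)   # the Gym-style environment sketched earlier

model = PPO(
    "MlpPolicy",
    env,
    gamma=0.99,            # γ (Gamma)
    learning_rate=0.00025,
    ent_coef=0.01,         # entropy coefficient
    gae_lambda=0.95,       # interpreted here as the "Bias" parameter (assumption)
    n_epochs=4,            # epochs per update
    batch_size=64,         # placeholder; "Training minibatches = 4" is a count, not a size
    seed=None,             # random seed
)
model.learn(total_timesteps=50_000)   # interpreted as the "Iteration number" (assumption)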
4 Proposed Architecture

The overview of the developed system architecture is presented in Fig. 2. It illustrates the division of the developed system into five different components: the JSSP instance, the problem decoder, the job shop environment and the learning algorithm - which, combined, constitute the intelligent agent - and, finally, the optimized solution to the problem. The purpose of the system is to offer an optimized solution for a given JSSP. Thus, the system is activated by giving it an instance of a JSSP to be solved. The format of the files that are used is based on the Taillard proposal [9], widely adopted by the scientific community. These files are, in practice, text files in which it is known beforehand that certain rows and columns contain the specific information that determines the problem. An illustrative example is shown in Fig. 3. This file is then used by the next component in the system: the decoder. It is responsible for converting the information contained in the file into workable objects in the system; that is, creating the machine, job and operation objects needed to represent all the information. For example, considering the file shown in Fig. 3, the decoder would read from line 2 that there are 15 jobs and 15 machines and, therefore, would create the respective objects in memory. From line 3 onwards the execution times of each job are defined: line 4 defines that the first operation of the first job will take 96 units of time and that the last operation of that job will take 63 time units. The last section of the file defines the order in which each job must pass through the machines: the last line establishes that job 15 starts being processed on machine 15 and should end on machine 14. The decoder uses all this information to create the objects, which will be supplied to the intelligent agent so that it can solve the problem. A
Fig. 2. Diagram of the developed system (adapted from [2]).
Fig. 3. Extract from a text file describing a JSSP instance
validation mechanism for the information loaded into the system has also been developed, to ensure that the information in the file is complete. After all the problem information is made available to the system by the decoder, it is ready to be processed by the intelligent agent. The agent consists of two components, the training environment and the learning technique, which cooperate to obtain the solution to the problem. The job shop environment is where all the idiosyncrasies of this type of scheduling problem are established and verified. In practice, this is the execution and implementation of a previously proposed architecture [1]. It is in this environment that all objects processed by the decoder live. Thus, it is here that the operation objects are allocated to the machine objects according to the respective processing orders, always respecting the times defined in the instance to be solved. The approval of the allocations is made by a validation entity that has access to all the system information, designed according to software design patterns that are quite common in modern systems: observer, mediator, chain of responsibility and command [10].
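A minimal decoder for a file laid out as described above might look like the sketch below, reusing the data structures sketched in Sect. 2.2. The exact header lines of the Taillard format vary between distributions, so the line offsets here are assumptions based on the description of Fig. 3 rather than a definitive parser.

def decode_taillard(path):
    """Parse a Taillard-style JSSP file into jobs, machines and operations (sketch)."""
    with open(path) as f:
        lines = [line.split() for line in f if line.strip()]
    # Assumption: line 1 is a textual header, line 2 holds the job and machine counts.
    num_jobs, num_machines = int(lines[1][0]), int(lines[1][1])
    # Assumption: a "Times" header precedes one row of processing times per job,
    # and a "Machines" header precedes one row of (1-indexed) machine orders per job.
    times_rows = lines[3:3 + num_jobs]
    machines_rows = lines[4 + num_jobs:4 + 2 * num_jobs]
    operations = []
    for job, (times, machines) in enumerate(zip(times_rows, machines_rows)):
        for idx, (dur, mach) in enumerate(zip(times, machines)):
            operations.append(Operation(job=job, index=idx,
                                        machine=int(mach) - 1, duration=int(dur)))
    return JobShopInstance(num_jobs, num_machines, operations)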
If the job shop environment is the place where the intelligent agent's decisions are executed, it is the learning algorithm that enables the agent to know the best decision to make at any given moment. Consider the moment immediately after the job shop environment is configured with the instance provided by the decoder: all operations are available and not yet allocated to a machine (i.e. they do not have a start time set). The agent learning component is then responsible for deciding which operations to allocate and when. After all jobs are completely assigned and all times are defined, the agent decides whether it is better to repeat the entire assignment process based on the new knowledge acquired or, alternatively, whether the current knowledge is sufficient to calculate the definitive optimal plan. Finally, the optimized solution is the compilation of the information relevant to the scheduling plan, including the final execution times. In practice, the optimized solution defines the start and end times for each operation of each job. Thus, it is possible to know where and when any job will be executed and how long it will take. Statistics relevant to plan quality analysis, such as the total makespan, are also available, so that it is easy for the end user (such as a planner or an automated factory environment) to validate its effectiveness.
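Once every operation has a start time, the makespan reported with the optimized solution can be derived trivially, as in this sketch; the schedule dictionary is a placeholder format, not the system's actual output structure.

def makespan(schedule):
    """Makespan of a schedule given as {operation: (start_time, duration)} (sketch)."""
    return max(start + duration for start, duration in schedule.values())

# Placeholder schedule: operation identifiers mapped to (start, duration) pairs.
schedule = {("J1", 1): (0, 96), ("J1", 2): (96, 40), ("J2", 1): (10, 63)}
print(makespan(schedule))   # 136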
5 Preliminary Results

In order to evaluate the contribution of the proposed system architecture a proper computational study has to be conducted. The full, complete results are meant to be shared with the scientific community in the future, when those experiments are concluded; nevertheless, this section contains a brief presentation of selected preliminary results on the experiment that evaluates the time required by the system to calculate a solution. The JSSP solving experiments aimed at minimizing the completion time (makespan), not only because this is the most common goal dealt with but also because it was the one chosen in the selected academic instances – in this case, Taillard [9] instances (commonly identified as TAI) were chosen. Regarding the required time to obtain a solution, Figs. 4, 5 and 6 display boxplots of the total time required to solve TAI instances 1 up to 50 by the system discussed here and by the approaches presented by Peng et al. [11] and C. Y. Zhang et al. [12], respectively.
Fig. 4. Time required, in seconds, by the system presented in this paper.
Fig. 5. Time required, in seconds, by the method proposed in Peng et al. [11].
Fig. 6. Time required, in seconds, by the solutions in C. Y. Zhang et al. [12].
Just by analyzing the range of the axes it is very clear that the proposed architecture is capable of producing solutions very efficiently – for example, this system takes around 5.5 s to solve instance TAI50, while the two other approaches require more than 1700 and 900 s, respectively. To be clear, the solutions proposed by those approaches are expected to have a better makespan. However, real industries cannot afford to wait such a long time for a solution to an unforeseen event, such as a machine that breaks down or an order that is cancelled. Even if the final solutions are not as good as those, as long as they are acceptable and good enough to be released into production, these preliminary results on the speed of the system demonstrate that it has immense potential to be implemented in real industrial production environments.
6 Conclusion
An innovative and original scheduling system that takes advantage of the latest developments in machine learning (and, specifically, the combination of deep learning and reinforcement learning methods) is presented in this paper. The scheduling problem has always been a complex field studied by many scientific areas, and any help that decision makers can have will lead them to create better scheduling plans, increasing the efficiency of their companies and, overall, maximizing their profits. As such, the proposed system emerges from this necessity of developing a solution capable of offering optimized solutions for scheduling problems in a timely manner. Thus, the work presented here is more than an intelligent agent that calculates JSSP solutions: it is a complete scheduling system, composed of the artificial intelligence module, of which the intelligent agent and the training environment used during learning are part, and the scheduling module, which establishes all the rules of the problem, supports the decoding of academic instances and ensures the correct definition of all its components. Not only is the system architecture explained in detail, but this paper also discusses the software development methodology that was applied. Considering that this work belongs in the field of optimization, it is almost mandatory to have a clear and concise plan that optimizes the system development process. This work presents a hybrid approach that combines the best of the cascade and evolutionary methodologies. A brief discussion of some preliminary results is also included. These suggest that the proposed system is, in fact, very efficient at calculating an optimized solution when compared with the best approaches found in the relevant literature. Future work includes the execution of a complete computational study, which shall demonstrate the benefits not only in terms of speed but also in the quality of the plans and, ultimately, the good trade-off between efficiency and quality of the proposed solutions. The implementation of a proper graphical interface is also planned, so that any type of user can utilize the system. At last, after a proper interface is implemented,
the operationalization of a usability test with potential users is planned, to gauge whether the system is in accordance with their needs and expectations.
References
1. Cunha, B., Madureira, A., Fonseca, B., Coelho, D.: Deep reinforcement learning as a job shop scheduling solver: a literature review. In: Madureira, A.M., Abraham, A., Gandhi, N., Varela, M.L. (eds.) Hybrid Intelligent Systems. Advances in Intelligent Systems and Computing, vol. 923, pp. 350–359. Springer, Cham (2020)
2. Pinedo, M.L.: Scheduling: Theory, Algorithms, and Systems, 5th edn. Springer, Cham (2016)
3. Cunha, B., Madureira, A., Fonseca, B.: Reinforcement learning environment for job shop scheduling problems. Int. J. Comput. Inf. Syst. Ind. Manag. Appl. 12, 231–238 (2020)
4. Cunha, B., Madureira, A., Pereira, J.P.: User modelling in scheduling system with artificial neural networks. In: 10th Iberian Conference on Information Systems and Technologies, CISTI 2015, pp. 1–6 (2015)
5. Beirão, N.: Sistema de apoio à decisão para sequenciamento de operações em ambientes Job Shop. Faculdade de Engenharia da Universidade do Porto (1997)
6. Sommerville, I.: Software Engineering, 9th edn. (2011). ISBN-10: 0137035152
7. Brockman, G., et al.: OpenAI Gym (2016). [Online]. Available: arxiv.org/abs/1606.01540
8. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms (2017). [Online]. Available: arxiv.org/abs/1707.06347
9. Taillard, E.: Benchmarks for basic scheduling problems. Eur. J. Oper. Res. 64(2), 278–285 (1993)
10. Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Professional Computing Series, Reading (1996)
11. Peng, B., Lü, Z., Cheng, T.C.E.: A tabu search/path relinking algorithm to solve the job shop scheduling problem. Comput. Oper. Res. 53, 154–164 (2015)
12. Zhang, C.Y., Li, P., Rao, Y., Guan, Z.: A very fast TS/SA algorithm for the job shop scheduling problem. Comput. Oper. Res. 35(1), 282–294 (2008)
“One vs All” Classifier Analysis for Multi-label Movie Genre Classification Using Document Embedding
Sonia Guehria1, Habiba Belleili2, Nabiha Azizi2, and Samir Brahim Belhaouari3
1 LRI Laboratory, Badji Mokhtar University, 12-23000 Annaba, BP, Algeria [email protected]
2 LabGED, Badji Mokhtar Annaba University, 12-23000 Annaba, BP, Algeria [email protected]
3 College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
Abstract. The classification of movie genres from their synopses has attracted the attention of many researchers. Indeed, synopses are a source of relevant information that contributes to determining a movie's genre. The automation of this classification process is very useful in several applications, such as recommendation systems. Moreover, movies can belong simultaneously to several genres (drama, action, comedy, horror), which reflects a typical problem of multi-label classification (MLC). In this article, we use a powerful representation of movie synopses via a document embedding technique, Doc2vec, in the multi-label classification context. The technique used in our experiments is One vs All, which is a transformation approach; it creates a model for each label through a base classifier. We have chosen to use three different classifiers: logistic regression, SVM and ANN. The results of our experimental study show that the best accuracies are obtained using the ANN model.
Keywords: Artificial neural network · Movie genre classification · Multi-label classification · Doc2vec · Logistic regression · SVM · One vs All
1 Introduction
The exponential growth of heterogeneous digital data in various sectors, as well as the constant enrichment of the Internet with new content (chats, social media, survey responses, e-learning, e-health), has placed considerable emphasis on the need for effective solutions for storing, searching and classifying data to extract relevant information in a reasonable time. This has brought specialists in the field to focus on text classification, in order to help users find what they need and to facilitate their work. With the recent breakthroughs in natural language processing (NLP) and text mining, many researchers are now interested in the development of applications exploiting text classification methods. This type of classification is one of the most fundamental problems of machine learning. It consists of automatically analysing the
text, then assigning a set of predefined tags or categories according to its content using supervised learning algorithms. For years, the automatic classification of media products such as movies has attracted a lot of attention from researchers in the field due to recent technological advances [LEE 02, RAS 02]. These have made it possible to share large volumes of film data online. The task of classifying movie genres is a typical problem in multilabel classification, as a movie can belong to several genres simultaneously. As in the case of the famous movie “Titanic”, that belongs to several genres simultaneously (Catastrophe, Drama, Romantic). However, early studies of movie genre classification cited in the literature considered only one genre per movie, based only on audiovisual characteristics despite the availability of other sources of information such as images, film posters, subtitles, trailers, frame of video clips and synopses [PQR 18]. Multi-label classification was initially introduced for applications in text categorisation [SCH 00], and then spread to other types of issues, such as: bioinformatics [CLA 01], image labelling [BOU 04], classification of video [SNO 06] and audio [Lo 11], the recommendation of tags [KAT 08], web and rule searching [OZO 09]. According to the literature, several approaches with different strategies have been developed to solve the problems of multi-label learning. MLC approaches have been organised into three main families [MAD 12] which are: Transformation approaches, Adaptive approaches and Ensemble approaches. The goal of any classification task is to have a robust system. For this purpose, a stage of generating relevant descriptors representing the texts is crucial. A bad vector of primitives degrades significantly the robustness of the categorisation system. Recently, with the deep learning, new techniques of automatic feature extraction are proposed under the name of vector representation. They allow each text to be represented by a set of values considering the syntactic and semantic relationships between its components according to the level of granularity between letters; words and paragraphs of the corpus. It represents a mapping of words in numeric vector spaces, allowing words with similar meaning to be understood by the machine learning algorithms [AUB 20]. Several deep neural networks architectures have been proposed in the literature for the construction of Word Embedding such as Word2vec [Mik 13] and Doc2vec [LE 14]. The aim of this paper is to analyse the impact of using Doc2Vec technique to extract features from synopsis corpus to obtain a robust MLC system able to predict movie genres from their synopsis. Our experimental study is based on using the One Vs All strategy which is a transformation approach. It allows creating a model for each label using a kernel classifier. In our case three predictors were analysed which are logistic regression (LR), SVM and ANN. The rest of this article is organised as follows. In Sect. 2, we present the related works in the field of MLC of movie genres. Section 3 presents OVA strategy in MLC context. In Sect. 4, we describe the methodology including the architecture of our system with the chosen algorithms and techniques. Section 5, presents our experimentation with the results obtained. We end this study with a conclusion and perspectives.
2 Related Works
A great deal of research has led to the development of new approaches for movie genre classification. For this purpose, researchers have opted for classical approaches to feature extraction, such as the TF-IDF method [LEE 02]. In other related works [BRE 06, HON 15] the authors used bag-of-words (BOW) features from closed captions and bag-of-visual-features (BOVF) from images of video clips. These traditional approaches often suffer from the high dimensionality of the resulting feature vectors, which is proportional to the size of the dictionary. Moreover, because they discard word order, a large part of the semantic structure of the corpus is also destroyed. Recently, several models, such as Neural Net Language Models (NNLM) [SHI 17], Global Vectors for word representation (GloVe) [MAK 16], deep representations of words in context (ELMo) [LIU 20], Word2vec [MIK 13] and Doc2vec [AUB 20, LE 14], have been designed to learn continuous vector representations of words, which are vectors of real-valued features for each word. For example, training a Word2vec model using a large corpus, such as the English Wikipedia corpus, produces continuous vector representations that capture significant distance and direction relationships between semantically related words.
Table 1. Summary of the related works.
Author/Year | Dataset nature | Content descriptor | Observation
[BAL 16] | Free texts in English and French of Wikipedia | Doc2vec | Bilingual integration classification is very attractive when some labeled training data is available for learning
[MAK 16] | Language models | GloVe, Word2vec | The error is smaller on the task of next word prediction
[LEN 17]* | Longest texts in a document | Word2vec | Learning of Word2vec is crucial to obtain good results (Precision = 89%)
[POR 18]* | Movie synopses | TF-IDF/Word2vec, Doc2vec/GloVe | Doc2Vec is efficient when trained on a relatively small amount of data, in contrast to the TF-IDF approach
[YAG 18]* | Fine-grained typing of entity names | Word2vec, Cwin/SSkip | ML data with fine-grained classes allows efficient evaluation of embeddings without the need for context
[AUB 20] | Text document | Rule-based approach + Doc2vec | The D2vecRule approach had a good classification process (Precision = 89%)
[LIU 20]* | Sentiment task | ELMo | In the multi-label context ELMo can achieve the greatest precision with less training time
* Multi-label classification
Table 1 shows a summary of the main works based on intelligent techniques for the extraction of textual features.
3 One-vs-All Strategy for Multi-label Classification
One-vs-All is a heuristic method for multi-class problems. It involves splitting the multi-class dataset into multiple binary classification sub-problems. Each binary classifier is then attributed to a class, and it predicts whether an instance belongs to the target class or not. This classic multi-class classification scheme is recognised as a robust method [RIF 04]. For our study, the OVA strategy is adopted for movie genre categorisation in a multi-label context. As in the multi-class problem, a model is constructed with a base classifier for each label. In the prediction phase, the example is submitted to each label model, which returns a probability. Hence, a threshold for accepting a label must be fixed. Labels whose probability is greater than or equal to the threshold are retained as the predicted labels for the submitted example.
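To make the strategy concrete, the sketch below trains one binary model per label and applies a probability threshold at prediction time. It is an illustrative outline using scikit-learn: the choice of logistic regression as the base classifier, the toy data and the 0.7 threshold are assumptions for the example, not the exact setup of this study.

```python
# One-vs-All multi-label classification: one binary model per label, thresholded probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_one_vs_all(X, Y, make_clf=lambda: LogisticRegression(max_iter=1000)):
    """X: (n_samples, n_features); Y: (n_samples, n_labels) binary indicator matrix."""
    models = []
    for label in range(Y.shape[1]):
        clf = make_clf()
        clf.fit(X, Y[:, label])          # binary sub-problem: this label vs. the rest
        models.append(clf)
    return models

def predict_one_vs_all(models, X, threshold=0.7):
    probs = np.column_stack([m.predict_proba(X)[:, 1] for m in models])
    return (probs >= threshold).astype(int), probs

# Tiny synthetic example: 6 documents, 4 features, 3 genres.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
Y = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 1]])
models = train_one_vs_all(X, Y)
pred, probs = predict_one_vs_all(models, X)
print(pred)
```

In the experiments reported later, the document vectors produced by Doc2vec play the role of X, and the threshold is varied among 50%, 60% and 70%.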
4 Methodology
In this section we describe the architecture of the proposed system. It contains three stages. A preliminary stage named “Features Engineering Using Doc2vec” generates a Doc2vec model to create the feature vectors from the textual corpus. The second stage, the “Training Stage”, constructs a model for each label. As said before, we use different base classifiers in order to identify the most suitable one for our problem. Finally, the test stage evaluates the performance of the classification model. Figure 1 illustrates the architecture of our system with the different stages.
Fig. 1. Proposed system architecture
4.1 Pre-processing
Pre-processing is a crucial step for text classification applications; it is a set of procedures to prepare and clean the dataset before it is exploited by the classifier. These steps are summarised in Fig. 2.
Fig. 2. Pre-processing stages illustration. The figure traces an example sentence through the pipeline:
Load dataset: “At the age of 40, she was finally able to pass her PhD competition, which was the greatest dream of her life. Faith, Courage, Hope and Success...”
Remove numbers: “At the age of, she was finally able to pass her PhD competition, which was the greatest dream of her life. Faith, Courage, Hope and Success...”
Remove punctuation: “At the age of she was finally able to pass her PhD competition which was the greatest dream of her life Faith Courage Hope and Success”
Tokenization: ‘At’ ‘the’ ‘age’ ‘of’ ‘she’ ‘was’ ‘finally’ ‘able’ ‘to’ ‘pass’ ‘her’ ‘PhD’ ‘competition’ ‘which’ ‘was’ ‘the’ ‘greatest’ ‘dream’ ‘of’ ‘her’ ‘life’ ‘Faith’ ‘Courage’ ‘Hope’ ‘and’ ‘Success’
Stop-word removal: ‘age’ ‘she’ ‘was’ ‘finally’ ‘able’ ‘pass’ ‘PhD’ ‘competition’ ‘was’ ‘greatest’ ‘dream’ ‘life’ ‘Faith’ ‘Courage’ ‘Hope’ ‘Success’
Capitalization (lowercasing): ‘age’ ‘was’ ‘able’ ‘pass’ ‘phd’ ‘competition’ ‘greatest’ ‘dream’ ‘life’ ‘faith’ ‘courage’ ‘hope’ ‘success’
Stemming and lemmatization: ‘age’ ‘be’ ‘able’ ‘pass’ ‘phd’ ‘competition’ ‘great’ ‘dream’ ‘life’ ‘faith’ ‘courage’ ‘hope’ ‘success’
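A minimal sketch of this pipeline is shown below, assuming the NLTK library; the exact tokenizer, stop-word list and lemmatizer used by the authors are not specified, so these choices are illustrative.

```python
# Illustrative text pre-processing pipeline following the stages of Fig. 2 (assumed tooling).
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

def preprocess(text):
    text = re.sub(r"\d+", "", text)            # remove numbers
    text = re.sub(r"[^\w\s]", "", text)        # remove punctuation
    tokens = text.split()                      # tokenization
    stops = set(stopwords.words("english"))
    tokens = [t for t in tokens if t.lower() not in stops]       # stop-word removal
    tokens = [t.lower() for t in tokens]       # lowercasing ("capitalization" step)
    lemmatizer = WordNetLemmatizer()
    tokens = [lemmatizer.lemmatize(t, pos="v") for t in tokens]   # lemmatization
    return tokens

print(preprocess("At the age of 40, she was finally able to pass her PhD competition."))
```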
5 Experimentation
In this section we discuss the experimental results obtained using the different classifiers with a document embedding method. For this purpose, we used vectors of different sizes (50, 100 and 200) and two different window sizes (5 and 8 words), applied according to the two architectures, PV-DM and PV-DBOW, of Doc2vec. Our choice was based on the values maximizing the evaluation metrics used.
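The Doc2vec configurations explored here can be reproduced with the gensim library (4.x) as sketched below; the toy corpus, the number of training epochs and the loop structure are assumptions for illustration only.

```python
# Sketch: training Doc2vec (PV-DM and PV-DBOW) with the vector and window sizes explored here.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# `synopses` is assumed to be a list of pre-processed token lists, one per movie.
synopses = [["young", "wizard", "fights", "dark", "lord"],
            ["detective", "solves", "murder", "case"]]
corpus = [TaggedDocument(words=doc, tags=[i]) for i, doc in enumerate(synopses)]

for dm in (1, 0):                        # 1 = PV-DM, 0 = PV-DBOW
    for vector_size in (50, 100, 200):
        for window in (5, 8):
            model = Doc2Vec(corpus, dm=dm, vector_size=vector_size,
                            window=window, min_count=1, epochs=40, workers=2)
            # document vectors used as features for the One-vs-All classifiers
            features = [model.dv[i] for i in range(len(corpus))]
```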
5.1 Dataset
For this study, a large dataset of movie synopses labelled with different genres, extracted from a single open-source collection, the CMU Movie Summary Corpus1, is used. This collection of movie synopses and plot metadata contains several pieces of information about movies (box office receipts, genre, …), as well as information about the characters (gender and estimated age). For our study, we were only interested in the movie synopses and the genres (Adventure, Costume/Adventure, Comedy, Action/Adventure, Action, …) to which each movie belongs. In fact, two files from this database were used: plot_summaries.txt and movie.metadata.tsv. As the dataset is unbalanced, only the most relevant labels are taken into account.
5.2 Results and Discussion
The aim of our study is to find the optimal Doc2vec parameter settings that maximise the classification results. These parameters were varied according to the dimension of the context window as well as the size of the vector. To evaluate the performance of multi-label classifiers, several metrics have been proposed in the literature, different from those used in mono-label classification. These measures can be grouped into observation-based and label-based metrics. The observation-based metrics are calculated on the differences between the actual label sets and the predicted label sets in the test set [AUB 20]. In our study we used two such metrics, Accuracy and F1-score:
Accuracy = (TP + TN) / (TP + FP + FN + TN)    (1)
F1 = 2 × (precision × recall) / (precision + recall)    (2)
In contrast, label-based metrics express averages calculated separately for each label, as in the case of binary classification. These averages are of two types: the macro-average measure provides an equal weight for each label, regardless of its frequency, while the micro-average measure gives an equal weight to each observation; it is therefore an average over all pairs of observations and labels. For our study, we used Precision and Recall as label-based metrics:
Precision = TP / (TP + FP)    (3)
Recall = TP / (TP + FN)    (4)
TP (True Positive), FP (False Positive), TN (True Negative), FN (False Negative) are the quantities from which the different measurements are calculated, indicating the degree of reliability of a classifier. For example, TP counts the instances belonging to class A that the classifier has also assigned to class A [LAH 16].
1 https://www.cs.cmu.edu/~ark/personas/
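As an illustration of how the observation-based and label-based measures above can be computed, the snippet below uses scikit-learn on toy arrays (not data from this study); note that scikit-learn's accuracy_score on multi-label data returns the exact-match ratio, one common observation-based choice.

```python
# Example: observation-based and label-based metrics for a multi-label prediction.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 1], [0, 0, 1]])

print("subset accuracy:", accuracy_score(y_true, y_pred))          # observation-based
print("F1 (micro):", f1_score(y_true, y_pred, average="micro"))    # label-based, micro-average
print("precision (macro):", precision_score(y_true, y_pred, average="macro"))
print("recall (macro):", recall_score(y_true, y_pred, average="macro"))
```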
Table 2. Parameter settings and metrics of the LR algorithm. Doc2vec parameters: window size ∈ {5, 8}, vector size ∈ {50, 100, 200}; decision threshold (seuil) ∈ {50%, 60%, 70%}.
Test | Threshold | Window | Vector size | Precision | Recall | F1-score | Accuracy
1 | 50% | 5 | 50 | 0.72 | 0.61 | 0.66 | 0.72
2 | 50% | 5 | 100 | 0.62 | 0.62 | 0.62 | 0.63
3 | 50% | 5 | 200 | 0.73 | 0.62 | 0.67 | 0.75
4 | 50% | 8 | 50 | 0.73 | 0.61 | 0.66 | 0.73
5 | 50% | 8 | 100 | 0.73 | 0.62 | 0.67 | 0.74
6 | 50% | 8 | 200 | 0.73 | 0.62 | 0.67 | 0.75
7 | 60% | 5 | 50 | 0.77 | 0.50 | 0.61 | 0.77
8 | 60% | 5 | 100 | 0.78 | 0.52 | 0.62 | 0.80
9 | 60% | 5 | 200 | 0.78 | 0.52 | 0.62 | 0.78
10 | 60% | 8 | 50 | 0.77 | 0.49 | 0.60 | 0.79
11 | 60% | 8 | 100 | 0.78 | 0.51 | 0.62 | 0.80
12 | 60% | 8 | 200 | 0.78 | 0.52 | 0.62 | 0.78
13 | 70% | 5 | 50 | 0.82 | 0.61 | 0.70 | 0.82
14 | 70% | 5 | 100 | 0.83 | 0.61 | 0.70 | 0.84
15 | 70% | 5 | 200 | 0.84 | 0.64 | 0.73 | 0.84
16 | 70% | 8 | 50 | 0.82 | 0.61 | 0.70 | 0.85
17 | 70% | 8 | 100 | 0.83 | 0.63 | 0.72 | 0.85
18 | 70% | 8 | 200 | 0.82 | 0.61 | 0.70 | 0.82
Table 3. Parameter settings and metrics of the SVM algorithm (same test/parameter layout as Table 2).
Test | Threshold | Window | Vector size | Precision | Recall | F1-score | Accuracy
1 | 50% | 5 | 50 | 0.70 | 0.60 | 0.65 | 0.70
2 | 50% | 5 | 100 | 0.72 | 0.62 | 0.67 | 0.72
3 | 50% | 5 | 200 | 0.71 | 0.62 | 0.66 | 0.71
4 | 50% | 8 | 50 | 0.71 | 0.60 | 0.65 | 0.73
5 | 50% | 8 | 100 | 0.73 | 0.61 | 0.66 | 0.73
6 | 50% | 8 | 200 | 0.70 | 0.61 | 0.65 | 0.72
7 | 60% | 5 | 50 | 0.71 | 0.54 | 0.61 | 0.72
8 | 60% | 5 | 100 | 0.72 | 0.54 | 0.62 | 0.72
9 | 60% | 5 | 200 | 0.71 | 0.53 | 0.61 | 0.73
10 | 60% | 8 | 50 | 0.70 | 0.52 | 0.60 | 0.72
11 | 60% | 8 | 100 | 0.70 | 0.52 | 0.60 | 0.71
12 | 60% | 8 | 200 | 0.70 | 0.51 | 0.59 | 0.71
13 | 70% | 5 | 50 | 0.74 | 0.60 | 0.66 | 0.76
14 | 70% | 5 | 100 | 0.75 | 0.62 | 0.68 | 0.75
15 | 70% | 5 | 200 | 0.76 | 0.63 | 0.69 | 0.78
16 | 70% | 8 | 50 | 0.74 | 0.60 | 0.66 | 0.76
17 | 70% | 8 | 100 | 0.75 | 0.61 | 0.67 | 0.77
18 | 70% | 8 | 200 | 0.75 | 0.61 | 0.67 | 0.76
Table 4. Parameter settings and metrics of the ANN algorithm (same test/parameter layout as Table 2).
Test | Threshold | Window | Vector size | Precision | Recall | F1-score | Accuracy
1 | 50% | 5 | 50 | 0.83 | 0.76 | 0.79 | 0.84
2 | 50% | 5 | 100 | 0.82 | 0.74 | 0.78 | 0.84
3 | 50% | 5 | 200 | 0.84 | 0.73 | 0.78 | 0.83
4 | 50% | 8 | 50 | 0.84 | 0.75 | 0.79 | 0.83
5 | 50% | 8 | 100 | 0.84 | 0.73 | 0.78 | 0.84
6 | 50% | 8 | 200 | 0.84 | 0.76 | 0.80 | 0.87
7 | 60% | 5 | 50 | 0.85 | 0.77 | 0.81 | 0.82
8 | 60% | 5 | 100 | 0.86 | 0.77 | 0.81 | 0.87
9 | 60% | 5 | 200 | 0.86 | 0.76 | 0.81 | 0.87
10 | 60% | 8 | 50 | 0.86 | 0.75 | 0.80 | 0.87
11 | 60% | 8 | 100 | 0.87 | 0.74 | 0.80 | 0.87
12 | 60% | 8 | 200 | 0.87 | 0.76 | 0.81 | 0.85
13 | 70% | 5 | 50 | 0.90 | 0.84 | 0.87 | 0.88
14 | 70% | 5 | 100 | 0.89 | 0.83 | 0.86 | 0.89
15 | 70% | 5 | 200 | 0.91 | 0.85 | 0.88 | 0.90
16 | 70% | 8 | 50 | 0.89 | 0.84 | 0.86 | 0.86
17 | 70% | 8 | 100 | 0.89 | 0.82 | 0.85 | 0.87
18 | 70% | 8 | 200 | 0.89 | 0.82 | 0.85 | 0.87
Tables 2, 3 and 4 illustrate the results obtained according to the chosen classification model. According to the experiments carried out with the different methods, we note that the ANN method gave better results than the LR and SVM classifiers: the Precision and F1-score reached 0.91 and 0.88 with a threshold of 70%, a window of five words and a vector of size 200 (Table 4). Using the same parameters with the other methods, the LR method scored Precision = 0.84 and F1-score = 0.73 (Table 2), while the SVM base classifier gave Precision = 0.76 and F1-score = 0.69 (Table 3). Therefore, these two measures tend to offer very high success rates compared to the others. Regarding the evaluation of the classification score of each label, the analysis of the results shows that the numerical feature vectors obtained using the Doc2vec technique keep the deep meaning of the document, since each vector is generated taking into account the existing relationships between words. To evaluate the prediction score of our classifiers on the Doc2vec-based numerical vectors, we varied the window and vector sizes in order to find the values that best optimise the evaluation measures. To this end, we have kept the most promising scores, which nevertheless still leave room for improvement: Precision = 91%, Recall = 85%, Accuracy = 90%. This means that the performance of these methods will always depend on the distribution of labels and their unique combinations occurring in the data set.
6 Conclusion and Future Work Through our study, we presented basic results for MLC of movie genres based on their synopses. In a first step, we used a powerful Doc2vec document embedding representation for feature extraction in the form of numeric vectors representative of the corpus. Then, the OVA approach is analyzed by the use of three classifiers: Logistic Regression (LR), SVM and ANN employed separately. Experiments with the three methods have shown that the best results have been obtained with the ANN classification method. This study presents a promising perspective on the problem of MLC of movie genres, as the performance obtained by Precision and F1-score could be further improved in future works by using more specific primitives and classifiers. In the future, we propose to enrich our system with a module based on metaheuristics, allowing on the one hand to select the most relevant features in an automatic way; and on the other hand, to be able to select the best primitives during the generation of a vector of a large size by Doc2vec. Moreover, the chosen dataset must be balanced with adapted methods to take into account all the labels.
References [AUB 20] Aubaid, A., Mishra, A.: A rule-based approach to embedding techniques for text document classification. Appl. Sci. 10(11), 4009 (2020). https://doi.org/10.3390/ app10114009 [BAL 16] Balikas, G., Amini, M.R.: Multi-label, multi-class classification using polylingual embeddings. In: 38th European Conference on Information Retrieval ECIR, Archives-Ouvertes (HAL), Italy (2016) [BRE 06] Brezeale, D., Cook, D.J.: Using closed captions and visual features to classify movies by genre. In: Poster session of the 7th International Workshop on Multimedia Data Mining (MDM/KDD 2006), USA (2006) [BOU 04] Boutell, M.R., Luo, J., Shen, X., Brown, C.M.: Learning multi-label scene classification. Pattern Recogn. 37(9), 1757–1771 (2004) [CLA 01] Clare, a., King, R.D.: Knowledge discovery in multi-label phenotype data. In: European Conference on Principles of Data Mining and Knowledge Discovery, pp. 42–53. Springer, Heidelberg (2001) [HON 15] Hong, H.-Z., Hwang, J.G.: Multimodal PLSA for movie genre classification. In: International Workshop on Multiple Classifier Systems, pp. 159–167. Springer, Heidelberg (2015) [KAT 08] Katakis, I., Tsoumakas, G., Vlahavas, I.: Multilabel text classification for automated tag suggestion. In: Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2008, vol. 18, pp. 75–83. Google Scholar (2008) [LAH 16] Ouchiha. L.: Classification supervisée de documents étude comparative. Maitrise en sciences et technologies de l’information, Université dé Québec, Outaouais (2016) [LEN 17] Lenc, L., Kral, P.: Word embeddings for multi-label document classification. In: Proceedings of Recent Advances in Natural Language Processing, 4–6 September, Varna, Bulgaria, pp. 431–437 (2017)
[LEE 02] Lee, Y.-B., Myaeng, S.-H.: Text genre classification with genre-revealing and subject-revealing features. In: Proceedings of the 25th Annual International SIGIR Conference on Research and Development in Information Retrieval, pp. 145–150. ACM, Finland (2002) [LE 14] Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In: Proceedings of the 31st International Conference on Proceding Machine Learning research (PMLR), China, vol. 32, no. 2, pp. 1188–1196 (2014) [LIU 20] Liu, W., Wen, B., Gao, S., Zheng, J., Zheng, Y.: A multi-label text classification model based on ELMo and attention. MATEC Web Conf. 309, 03015 (2020) [LO 11] Lo, H.-Y., Wang, J.-C., Wang, H.-M., Lin, S.-D.: Cost-sensitive multi-label learning for audio tag annotation and retrieval. IEEE Trans. Multimedia 13(3), 518– 529 (2011) [MAD 12] Madjarov, G., Kocev, D., Gjorgjevikj, D., Dzeroski, S.: An extensive experimental comparison of methods for multi-label learning. Pattern Recogn. 45(9), 3084–3104 (2012) [MAK 16] Makarenkov, V., Rokach, L., Shapira, B.: Language Models with GloVe Word Embeddings. Researchgate (2016). https://www.researchgate.net/publication/ 309037295, Accessed 29 Sept 2020 [MIK 13] Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: Proceedings of the International Conference on Learning Representations (ICLR), Scottsdale (2013) [OZO 09] Ozonat, K., Young, D.: Towards a universal marketplace over the web: Statistical multi-label classification of service provider forms with simulated annealing. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1295–1304. ACM, France (2009) [POR 18] Portolese, G., Feltrim, V.D.: On the use of synopsis-based features for film genre classification. In: Conference XV Encontro Nacional de Inteligência Artificial e Computacional (ENIAC), pp. 892–902. SBC, Brazil (2018) [RAS 02] Rasheed, Z., Shah, M.: Movie genre classification by exploiting audio-visual features of previews. In: 16th International Conference on Pattern Recognition, pp. 11–15. IEEE Computer Society Press, Canada (2002) [RIF 04] Rifkin, R., Klautau, A.: In defense of one-vs-all classification. J. Mach. Learn. Res. 5, 101–141 (2004) [SCH 00] Robert, E., Singer, S.Y.: BoosTexter: a boosting-based system for text categorization. In: Machine Learning, vol 39, pp. 135–168. Springer, Yang (2000) [SHI 17] Shi, D.: A Study on Neural Network Language Modeling. Researchgate (2017). https://www.researchgate.net/publication/319271962, Accessed 25 Sept 2020 [SNO 06] Snoek, C.G., Worring, M., Van Gemert, J.C., Geusebroek, J.-M., Smeulders, A.W.: The challenge problem for automated detection of 101 semantic concepts in multimedia. In: Proceedings of the 14th Annual ACM International Conference on Multimedia, pp. 421–430. ACM, USA (2006) [YAG 18] Yaghoobzadeh, Y., Kann, K., Schutze, H.: Evaluating word embeddings in multilabel classification using fine-grained name typing. In: Proceedings of the 3rd Workshop on Representation Learning for NLP, pp. 101–106. Association for Computational Linguistics, Australia (2018)
Analysis of the Superpixel Slic Algorithm for Increasing Data for Disease Detection Using Deep Learning
Luiz Daniel Garay Trindade1, Fábio Paulo Basso1, Elder de Macedo Rodrigues1, Maicon Bernardino1, Daniel Welfer2, and Daniel Müller2
1 Laboratory of Empirical Studies in Software Engineering, Unipampa Alegrete, Av. Tiaraju, 810, Alegrete, Brazil [email protected]
2 Federal University of Santa Maria, Cidade Universitária Bairro Camobi, Av. Roraima, 1000, Santa Maria, Brazil
Abstract. With the increase in the world population, it is necessary to increase agricultural production. Technology in the field aims to assist producers, enabling agriculture with greater productivity without forgetting to care for the environment. One of the problems encountered by farmers is plant diseases, which can cause great damage to their crops. Thus, the use of automatic disease detection techniques by means of a computational method can be an alternative to solve this problem. However, a problem in using automatic techniques is the lack of data, and the use of methods to augment existing databases is a challenge. The objective of this work is to verify the efficiency of the SLIC algorithm together with CNNs, using SLIC as a preprocessing technique and the CNNs as a classification method. Finally, the results obtained with SLIC are not encouraging when compared with the use of the original images.
Keywords: Deep learning · Convolutional neural networks · Disease detection · Precision agriculture
1 Introduction
The world population has been increasing steadily, and this leads us to the fact that we must obtain an increase in food production [6]. Smart agriculture aims to solve the problems found in agricultural production, focusing on some points such as production, environmental impact, food security, and sustainability, aiming at greater productivity and safety [8]. Plant health is extremely important for farmers, and it is directly linked to food security. Pests and diseases can lead to the loss of up to 40% of world food production, thus posing a threat to food security [3]. One of the ways to solve the problem of pests and diseases in crops is the use of pesticides; their use has increased agricultural production since the 1950s,
helping the needs of the growing world population [5]. However, its use is not harmless to the environment, and it can negatively affect biodiversity, including insects, fish, birds, as well as the quality of soil, water, air, and, consequently, humans [3]. Obtaining data and information on field conditions can limit the use of pesticides. Allowing the farmer to find the right place and time for application. However, analyzing the condition of plants in a crop is not a simple task if done manually, consuming a lot of work and time. The early identification of plant diseases using automatic detection techniques can help and solve the problems encountered by farmers [3]. Deep learning techniques have significantly improved image processing, especially convolutional neural networks (CNN). According to [3], since 2016, several applications for the identification of diseases in cultures have been developed. The work of [9] uses CNNs as a technique to detect 17 diseases of different cultures, such as wheat, barley, corn, rice, and rapeseed. The work included a set of data reaching more than a thousand images captured by cell phones in real conditions in the field. In the work of [7], CNNs were used to identify diseases in 13 different leaves of plants. The data set had 54,305 images, divided into 38 classes, each of which was defined as healthy or infected and with a labeled disease. In the work of [12], a comparison was made between three models of CNNs, AlexNet, GoogLeNet, and ResNet. The three models were used to detect diseases on the soybean leaf, with a dataset with 8470 images, divided into 6 types of soybean diseases and apart with images of the healthy leaf. One of the challenges encountered in using automatic techniques is the lack of data that can be solved using methods to increase the existing data set. However, these techniques can be challenging. The main objective of this study is to verify the efficiency of the Superpixel algorithm as a pre-processing technique together with the CNNs, as a classification method. The experiment conducted in this study follows the protocol defined by [11]. The rest of the work is structured as follows: in Sect. 2 the protocol used for the development of the experiment is presented; the methodology of the present study is presented in Sect. 3, including preprocessing in Subsect. 3.1, the development of the CNN model in Subsect. 3.2 and the collection of the metrics specified in Subsect. 3.3; and finally, in Sect. 6 the final considerations are presented.
2 Summary of the Research Protocol
The protocol was defined and conducted according to the book by Wohlin et al. [11], in order to follow a standard and achieve a better development of the protocol. This section summarizes the main features of our research protocol, while the complete protocol is available in the experimental repository.
2.1 Image Base
The image base used has a total of 65 images of the soybean leaf infected with the symptoms of rust disease, and 57 images of the healthy soybean leaf. The image base used in the experiment is public and was taken from the repository of the Brazilian Agricultural Research Corporation (Embrapa) called Digipathos [2]1. An example of the images that make up the database used in the experiment can be viewed in Fig. 1.
2.2 Tools
For the development of this work, a computer with the Ubuntu 18.04 operating system was used. It has hardware with 6 GB of RAM, a disk with 1T capacity, an Intel Core i5-3337U CPU, with 4 threads and 2 cores and the TensorFlow and Keras libraries, both in the Python programming language. The programming language and libraries used were selected due to their development capacity and because they are written in open code. The version of the libraries and programming language are described below: tensorflow version: 1.5.0; keras version: 2.2.5; python version: 2.7.15
3 Methodology
In this section, the methodology used for the development of the experiment is presented. The study was divided into three stages: pre-processing and organization of the images, where the Slic superpixel algorithm was used to increase the number of images in the database, and where the images were divided into 70% for training, 20% for testing and 10% for validation (these division rules apply to both the sick and the healthy soy leaf images); CNN model development, where the models were developed using the Keras API and the TensorFlow library, with Python as the programming language; and, finally, collection of metrics, a stage in which the confusion matrix and the formulas of accuracy, error rate, sensitivity, and specificity were used to represent the results obtained in the development stage.
3.1 Step 1: Pre-processing
In the pre-processing of the images, we use a segmentation algorithm called Simple Linear Iterative Clustering (SLIC). According to [1], the Slic algorithm is an adaptation of K-means that generates regions of similar pixels, called superpixels. The K parameter of the Slic algorithm controls the number, and hence the size, of the superpixels relative to the image; in our work we defined a superpixel size that was suitable for our study.
1 https://www.digipathos-rep.cnptia.embrapa.br
Fig. 1. Example of an image of a sick soy leaf with Slic applied.
Table 1. Image base before and after the data increase process using Slic.
Grouping | Infected leaves (before) | Sound leaves (before) | Infected leaves (after) | Sound leaves (after)
Training | 45 | 6 | 1,213 | 165
Test | 13 | 2 | 347 | 47
Validation | 7 | 1 | 173 | 24
Total | 65 | 9 | 1,733 | 236
Taking into account the number of images in the database, we use this technique to increase the number of images in our database. Generally, the greater the number of images, the more beneficial it becomes for training the CNN, as this can facilitate learning. A practical example of the Slic superpixel technique used in the experiment can be seen in Fig. 1. After the application of the Superpixel segmentation technique (Slic) to the 65 infected images, a total of 1,540 images of soybean leaves with rust symptoms were obtained, an increase that can be significant for the training of the CNN. The images of the healthy soy leaf, after segmentation, yielded a total of 1,207 images. After the SLIC application, the image base was divided using cross-validation, in which 70% of the images of the sick and healthy leaves were separated for the training of the CNN, 20% were destined for the test, and the remaining 10% were separated for validation. The enlargement of the image base using SLIC and the division of images using cross-validation can be seen in Table 1. An important option that the Slic algorithm provides is the adjustment of the size of the superpixels. In this work, the size of 60 superpixels was used as a standard, resulting in between 43 and 58 sub-images from each image. A larger number could be chosen, such as 100, 200 or 500; however, this would result in low-quality images, which could be bad for the training of the CNN. Therefore, the size of 60 superpixels was adopted as the standard, which gives us a significant increase in data with good-quality images. The superpixels that contained only images without relevant data (dark background), as shown in Fig. 1, were discarded. The decrease in the number of
images can help in reducing the use of computational power, focusing only on the images that have relevant leaf content, with or without rust symptoms.
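A sketch of this augmentation step with scikit-image is given below; the file name, the brightness threshold and the rule for discarding background-only superpixels are illustrative assumptions consistent with the description above, not the authors' exact implementation.

```python
# Sketch: split a leaf photo into ~60 SLIC superpixels and keep only informative crops.
import numpy as np
from skimage import io
from skimage.segmentation import slic

def augment_with_slic(image_path, n_segments=60, min_foreground=0.2):
    image = io.imread(image_path)
    segments = slic(image, n_segments=n_segments, start_label=1)
    crops = []
    for label in np.unique(segments):
        mask = segments == label
        # fraction of non-dark pixels inside this superpixel
        foreground = (image[mask].mean(axis=-1) > 10).mean()
        if foreground < min_foreground:
            continue                                # discard background-only superpixels
        crop = np.zeros_like(image)
        crop[mask] = image[mask]                    # keep the superpixel, dark elsewhere
        crops.append(crop)
    return crops

sub_images = augment_with_slic("soybean_leaf_rust_01.jpg")   # hypothetical file name
print(len(sub_images), "sub-images generated from one photo")
```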
3.2 Step 2: Development of the CNN Model
A CNN model is composed of layers that have parameters to be tuned, thus changing the final result of tests and validations. The layers of a CNN are the input layer, convolutional layers, pooling layers, dense layers, and the output layer.
Model Development. In the development of this experiment, several models were tested for training, tests, and validations. In the end, we selected the 3 models that presented the best results. All models were trained and tested using 70% of the total set of images for training and 20% of the total set of images for testing. The remaining 10% were used in the validation phase, which will be detailed in Subsect. 4.1. The two architectures used in the 5 models are presented in Fig. 2. As the figure shows, models 1, 2, and 3 used the same architecture, but what differentiates one model from another is the number of epochs used. In model 01, 60 epochs were used; in model 02, 100 epochs were used; and, finally, in model 03, 150 epochs were used. Model 4 has a different architecture, in which 100 epochs were used for its training. Finally, model 5 was also trained and tested using 100 epochs. As shown in Fig. 2, models 1, 2, and 3 have two convolutional layers, two pooling layers and two dense (fully connected) layers, one of which is the output layer. Model 4, on the other hand, has three convolutional layers, three pooling layers, and two dense layers. Model 5 also has two convolutional layers, two pooling layers and two dense layers, one of which is the output layer. The other models, which were not selected, had several variations of these architectures, such as the number of layers, the kernel size, and different activation functions, among other important differences, but obtained inferior results.
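The first architecture (two convolutional layers, two pooling layers, and two dense layers) can be sketched in Keras as below; the filter counts, kernel sizes and input resolution are assumptions, since the paper does not list them, and the fit call is shown only as a usage hint.

```python
# Sketch of the 2-conv / 2-pool / 2-dense architecture used by models 1, 2 and 3 (assumed hyperparameters).
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

def build_model(input_shape=(128, 128, 3)):
    model = Sequential([
        Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation="relu"),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(64, activation="relu"),
        Dense(1, activation="sigmoid"),      # output layer: healthy vs. diseased
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_model()
# model.fit(train_images, train_labels, epochs=100,             # model 2 used 100 epochs
#           validation_data=(test_images, test_labels))
```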
3.3 Step 3: Collection of Metrics
The final step is to apply the metrics specified in the complete research protocol. The collection of metrics resulted in an analysis of the SLIC technique applied to the context of disease detection in soybean leaves, as follows.
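The measures reported in the results section can be derived from the confusion matrix as sketched below; the efficiency measure is assumed here to be the mean of sensitivity and specificity, which is consistent with the values reported in the tables of Sect. 4.

```python
# Metrics computed from the confusion matrix of the binary leaf classifier.
def confusion_metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    error_rate = 1 - accuracy
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    efficiency = (sensitivity + specificity) / 2     # assumed definition (see note above)
    return dict(accuracy=accuracy, error_rate=error_rate,
                sensitivity=sensitivity, specificity=specificity, efficiency=efficiency)

# Example: model 02 on the augmented validation set (values from Table 2).
print(confusion_metrics(tp=83, fp=0, tn=121, fn=89))
```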
4 Results
After pre-processing the set of images, developing the models, we carry out the training, tests, and validations of our models using our image bank. After training, testing, and validation, we compute the data and perform an analysis of it, using the evaluation metrics, to identify the positive and negative points of the models.
Fig. 2. Models used in the experiment.
Fig. 3. Accuracy result of the models using the data set augmented by the SLIC algorithm
The first stage of this evaluation contains the results after training and testing the models, in which we analyze the accuracy of the models and their error rate. In Fig. 3 the results obtained by models 2, 4, and 5 are presented, using the data set augmented by the SLIC algorithm during the training and test phases. The second set of results was obtained using the same models, with the difference being the set of images: the original images were used, without data augmentation and with the initial number of images. Thus, in Fig. 4 the results from this second experiment are presented. Models 1 and 3 have different numbers of epochs (60 and 150 epochs, respectively), so they were not put on a graph for comparison. We will present the validation results of these models in the following subsection, where the reason for presenting them will also be justified.
Fig. 4. Result of accuracy of the models using the data set without the increase of data
4.1 Validation
After the training and testing steps, the proposed models were validated. Validation, unlike training and testing, uses no labels on the images. In training and testing, we label the images, indicating which images are of the healthy soy leaf and which are of the infected soy leaf. In validation, to test the algorithm, the images have no labels, so the algorithm must decide whether each image belongs to the healthy plant class or to the sick plant class. In validation, we use 10% of the total set of images. Table 2 presents the results of the models using the set of images that went through the data augmentation in the validation step. Using the formulas and the confusion matrix presented in Subsect. 3.3, we obtain the results shown in Table 2.
Table 2. Model results using the image set augmented with the Superpixel Slic technique.
Metrics | Model 02 | Model 04 | Model 05
TP | 83 | 39 | 26
FP | 0 | 0 | 0
TN | 121 | 121 | 121
FN | 89 | 133 | 146
Accuracy | 0.6962 | 0.5460 | 0.5017
Error rate | 0.3038 | 0.4540 | 0.4983
Sensitivity | 0.4825 | 0.2267 | 0.1511
Specificity | 1 | 1 | 1
Efficiency | 0.7412 | 0.6133 | 0.5755
Table 3 presents the results obtained by the models using the set of original images in the validation step.
Table 3. Model results using the original set of images without using the data augmentation technique.
Metrics | Model 02 | Model 04 | Model 05
TP | 7 | 7 | 7
FP | 2 | 2 | 4
TN | 5 | 5 | 3
FN | 0 | 0 | 0
Accuracy | 0.8571 | 0.8571 | 0.7142
Error rate | 0.1429 | 0.1429 | 0.2850
Sensitivity | 1 | 1 | 1
Specificity | 0.7142 | 0.7142 | 0.4285
Efficiency | 0.8571 | 0.8571 | 0.7142
Finally, Table 4 presents the results obtained by models 1 and 3, together with model 2, in the validation stage. These results were obtained using the images resulting from the data augmentation performed by the SLIC algorithm in training, testing, and validation.
Table 4. Results of models 1, 2, and 3 using the image set augmented with the Superpixel Slic technique.
Metrics | Model 01 | Model 02 | Model 03
TP | 64 | 83 | 61
FP | 0 | 0 | 0
TN | 121 | 121 | 121
FN | 108 | 89 | 111
Accuracy | 0.6313 | 0.6962 | 0.6211
Error rate | 0.3687 | 0.3038 | 0.3789
Sensitivity | 0.3720 | 0.4825 | 0.3546
Specificity | 1 | 1 | 1
Efficiency | 0.6860 | 0.7412 | 0.6773
5 Discussions
The presentation of the results of models 1 and 3 together with model 2 in Subsect. 4.1 was carried out to show a limit on the number of epochs used. Model 1 has 60 epochs, model 2 has 100, and model 3 has 150. All of these models have the same CNN architecture, differing only in the number of epochs. When we increased the number of epochs from 60 to 100, we achieved an
improvement in accuracy and also in efficiency. When we increased the number of epochs to 150, the accuracy results were again lower than those of the model with 100 epochs, as was the efficiency. These results show that using more than 100 epochs would not obtain better results, nor would using fewer. The comparative results between the models using the different image sets (original images and images enlarged by SLIC) are also presented in Subsect. 4.1. These results show that the models using the original images obtained good results when compared to the models that used the pre-processed images, both in accuracy and in efficiency. The model that obtained the best result with the pre-processed images was model 2, and the worst results were obtained with model 5. Using the original images, models 2 and 4 obtained the same results, which were the most favourable. Model 5 also obtained the worst results using the original images. Another detail is that model 5, which was the worst with the original images, obtained a better accuracy than model 2 using the pre-processed images, but a lower efficiency. With these results, one hypothesis is that some images resulting from the data augmentation were incorrectly labeled as diseased. That is because an entirely sick leaf can have healthy parts as well as sick parts. Thus, some superpixels generated from the sick images, even though they show healthy tissue, could have been added to the diseased class, thereby confusing the CNN model. For this problem not to have occurred, it would be necessary for a technician in the area to evaluate and label each image resulting from the pre-processing, adding it to the correct class. If a part of an image (superpixel) resulting from the pre-processing were confirmed as healthy by the technician, it would be labeled as healthy and placed in the healthy class, not in the diseased class. The results obtained in this experiment show us the importance of evaluating all the hypotheses and results, and that data augmentation does not always help in the training of the models, as errors may occur during its execution, thus impairing the training. Even with the smaller number of images, the training with the original images yielded better results.
6 Final Considerations
The evolution and improvement of data augmentation techniques and of automatic disease classification may increasingly help farmers. Studies in these areas are of paramount importance and can increasingly help researchers and technicians avoid the problems they may encounter. In this study, we carried out an experiment with the objective of testing the data augmentation process in the pre-processing step using the SLIC algorithm, and the classification of soybean rust disease using CNN models created with the TensorFlow and Keras libraries. The number of enlarged images produced in the pre-processing was considerable; however, the results of the models that used the enlarged images
were not motivating. The results with greater efficiency came from the models trained with the original images, without augmentation of the image set. The reason why the values were not satisfactory may derive from the fact that parts of a soybean leaf can be in different conditions, which leads to the conclusion that an evaluation by a technician is needed before using such images in automatic detection methods. The best result obtained using the images that went through the pre-processing was an accuracy of 0.6962 and an efficiency of 0.7412. The results obtained using the original images were an accuracy of 0.8771 and an efficiency of 0.8571. As future work, the objective is to repeat this experiment with another data augmentation technique, one that does not partition the leaf image into several parts, in order to achieve better accuracy and efficiency compared to the results obtained with the original images.
References 1. Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., S¨ usstrunk, S.: SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 34(11), 2274–2282 (2012) 2. Barbedo, J.G.A., Koenigkan, L.V., Halfeld-Vieira, B.A., Costa, R.V., Nechet, K.L., Godoy, C.V., Junior, M.L., Patricio, F.R.A., Talamini, V., Chitarra, L.G., et al.: Annotated plant pathology databases for image-based detection and recognition of diseases. IEEE Lat. Am. Trans. 16(6), 1749–1757 (2018) 3. Boulent, J., Foucher, S., Th´eau, J., St-Charles, P.L.: Convolutional neural networks for the automatic identification of plant diseases. Front. Plant Sci. 10, 941 (2019) 4. Cecotti, H., Rivera, A., Farhadloo, M., Villarreal, M.P.: Grape detection with convolutional neural networks. Exp. Syst. Appl. 159, 113588 (2020) 5. Cooper, J., Dobson, H.: The benefits of pesticides to mankind and the environment. Crop Prot. 26(9), 1337–1348 (2007) 6. FAO: How to feed the world 2050. the special challenge for Sub-Saharan Africa. In: High Level Expert Forum (2009) 7. Geetharamani, G., Pandian, A.: Identification of plant leaf diseases using a ninelayer deep convolutional neural network. Comput. Electr. Eng. 76, 323–338 (2019) 8. Kamilaris, A., Prenafeta-Bold´ u, F.X.: Deep learning in agriculture: a survey. Comput. Electron. Agric. 147, 70–90 (2018) 9. Picon, A., Seitz, M., Alvarez-Gila, A., Mohnke, P., Ortiz-Barredo, A., Echazarra, J.: Crop conditional convolutional neural networks for massive multi-crop plant disease classification over cell phone acquired images taken on real field conditions. Comput. Electron. Agric. 167, 105093 (2019) 10. U˘ guz, S., Uysal, N.: Classification of olive leaf diseases using deep convolutional neural networks. Neural Comput. Appl. 33, 4133–4149 (2020) 11. Wohlin, C., Runeson, P., H¨ ost, M., Ohlsson, M.C., Regnell, B., Wessl´en, A.: Experimentation in software engineering. Springer, Heidelberg (2012) 12. Wu, Q., Zhang, K., Meng, J.: Identification of soybean leaf diseases via deep learning. J. Inst. Eng. (India) Ser. A 100(4), 659–666 (2019)
Evaluating Preprocessing Techniques in Identifying Fake News
Matheus Marinho1, Carmelo J. A. Bastos-Filho1, and Anthony Lins2
1 Universidade de Pernambuco, Recife, PE, Brazil {mblm,carmelofilho}@ecomp.poli.br
2 Universidade Catolica de Pernambuco, Recife, PE, Brazil
Abstract. Combating the circulation of disinformation has become increasingly important, since this type of news can influence people's behavior and opinion. One way to deal with this problem is to develop automated fact-checking systems with machine learning techniques. An essential step in the generation of classification models is data preprocessing. This work presents a comparative analysis using different preprocessing methodologies for normalization, transformation, and feature selection on a Portuguese-language corpus that contains fake and legitimate news. We obtained better results compared to the current approaches presented in the literature.
Keywords: Preprocessing · Supervised learning · Disinformation
1 Introduction
Disinformation has influenced people's behavior and opinion in recent years, reaching issues in different knowledge areas. The fight against disinformation has been strengthened, especially after the case related to Cambridge Analytica. There were cases involving the political sphere in the 2016 US elections, and the economic sphere was affected with Brexit in 2018 [2]. Fact-checking agencies worldwide and big companies raise funds to finance projects to reduce the spread of fake content. However, the enormous volume of disinformation, far beyond what professionals can check, is still one of the main challenges in this fight [5]. The term fake news has been used to label information adulterated in its context [1]. Alternatively, it has been used to name content produced to manipulate readers [18]. According to Tandoc Jr., Lim, and Ling [20], the concept has been used to designate different types of information: satire, parody, fabrication, manipulation, publicity, and advertising. The term still has an intersection with the concepts of misinformation, which is false information whose alteration was not intentional – it may have been a mistake or forgetfulness – and disinformation, which is information whose content has been altered on purpose [9]. Fact-checking is one of the alternatives to combat disinformation, aiming to assess the integrity of information [16]. The fact verification process requires research and identification of evidence, understanding of the information context,
and reasoning about what can be inferred from this evidence [21]. It can depend on the complexity of what is being evaluated and can take up to days to conclude [6]. Due to the large volume of content to be verified and the propagation speed in digital media, this process has become a challenge [5], which demands automated fact verification systems (AFC, Automated Fact-Checking) [7,15]. The use of Computational Intelligence techniques has been the focus of tools to seek automation in detecting disinformation [3,17,22]. It occurs because intelligent technologies allow identifying patterns and generating computational models capable of separating legitimate from disinformation news. An essential step in the construction of supervised classifiers is the preprocessing of the data. The representation and quality with which these data are presented to the model contribute positively to a good generalization and, consequently, obtain good results in the classification process [8]. This paper proposes an analysis of the application of different preprocessing techniques on Fake.br Corpus [11], a dataset related to disinformation that contains 7200 news divided equally between legitimate and fake news and presents 21 features related to linguistics and Part of Speech tags. For the evaluation, 11 dataset configurations were generated from different preprocessing techniques: transformation, normalization, and; resource selection. We implemented three different supervised classifiers and calculated the metrics related to spam detection in each proposed dataset. Then, we apply statistical tests to check if there is a significant difference between the results obtained, and the best results obtained are compared with the state of the heart related to Fake.br Corpus. We organize the remainder of the paper as follows. Section 2 reviews the related work. Section 3 describes the methodology. Section 4 shows the results. Finally, Sect. 5 presents conclusions and future work.
2 Related Work
In [11], the authors presented the first Corpus in Portuguese composed of true and false news, containing 7200 news. Data were collected between 2016 and 2018, grouped into six categories and divided into 3600 true and 3600 false. The category with the highest volume of data is politics, containing 4180 news items. This data set was named FAKE.BR CORPUS. The authors also proposed using machine learning using Support Vector Machine and evaluated some combinations of characteristics, such as, for example, Part of Speech (POS) tags, Bag of Words, and linguistic features. The best result reached an accuracy of 89%, using all the proposed features. Silva et al. [19] sought to answer some of the open-ended questions related to the automatic classification process of fake news, using FAKE.BR CORPUS as the object of study. The researchers evaluated different proposals for supervised classifiers and concluded that none of the techniques outperformed the others in all experiments. However, the Support Vector Machine, Random Forest, and Logistic Regression algorithms achieved better results in most scenarios. They also analyzed the best characteristics used in the classification process.
They evaluated the linguistic characteristics and the characteristics generated by text-representation techniques and concluded that the textual characteristics from Bag of Words techniques performed better than the others. According to the systematic literature review proposed in [18], the features analyzed for the classification of information are: (i) the sentiment contained in the texts; (ii) syntactic analysis (scarcely used); (iii) semantic analysis; (iv) Word Embeddings, a form of language modeling in which words with similar meanings are grouped close together; (v) target news features, such as social media attributes like likes and shares; and (vi) textual features, such as the proportion of punctuation (period, comma, exclamation, etc.) relative to the text size and the polarity of the sentiment of the words. The authors also highlight two open challenges in the area: a broader analysis taking into account the texts, videos, and photos present in the news, and the disambiguation of the news, since the linguistic resources include metaphors and sarcasm.
3 Methodology
This section presents the dataset used in this work and the features analyzed for each news item. We explain the proposed preprocessing steps used to produce the final datasets. We also describe the metrics of the deployed evaluation process and the machine learning techniques used for classification. We present the experimental setup and the comparison process using statistical tests. Finally, we compare the best results obtained with other researchers’ results using the Fake.br Corpus. We found a wide variety of datasets in the English language about disinformation. However, it is not easy to obtain this type of information in the Portuguese language. The Fake.br Corpus [11] is the first attempt with this purpose and helps us to understand the construction of disinformation in Brazil. The Fake.br Corpus contains 7200 news items collected from websites. The authors also made available a set of 25 features for each news item. In this dataset, the feature types are distributed as follows: 21 features are numeric, with linguistic information and PoS tags, and four are categorical, related to the date of publication, author, news link, and category. In this work, we only use the numeric features. Categorical features were removed because they do not carry information relevant to identifying the type of content. For example, using the author could hinder the process, as the classification would not take into account the written content but who wrote it; the date of publication could have the same effect if, by chance, on a given day the dataset had more fake than legitimate news. Data preprocessing is an essential step in machine learning processes, and its absence can lead to less accurate results, especially for noisy or missing data [8]. The activities in data preprocessing include [4]: (i) data preparation, consisting of cleaning, standardization, and data transformation; and (ii) data reduction, such as Feature Selection and Instance Selection.
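As an illustration of the initial filtering of categorical features, the sketch below loads the feature table and keeps only the numeric columns. The file name and column names are assumptions for illustration only; the exact headers of the released Fake.br feature file are not reproduced here.

```python
import pandas as pd

# Hypothetical file and column names; adjust to the actual Fake.br Corpus feature file.
df = pd.read_csv("fakebr_features.csv")

categorical = ["date", "author", "link", "category"]  # assumed names of the 4 categorical features
X = df.drop(columns=[c for c in categorical if c in df.columns])

# The remaining 21 columns are the numeric linguistic / PoS-tag features used in this work.
y = df["label"] if "label" in df.columns else None  # assumed name of the class column
print(X.shape)
```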
We propose a different preprocessing stage compared to previous approaches, aiming to facilitate the classification process. We modified the normalization, transformation, and feature selection steps. As a result, we obtain a dataset with only 11 features generated from the Fake.br Corpus. The normalization process [13] reduces the magnitude of the values of each feature of the dataset. It is relevant in many datasets, since there can be a big difference between each feature’s maximum and minimum values. For the normalization step, we deployed the Min-Max technique, in which the data of each proposed feature underwent the transformation shown in Eq. (1):
$$X' = \frac{X - \min(X)}{\max(X) - \min(X)} \times (Max - Min) + Min \qquad (1)$$
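A minimal sketch of this normalization step, assuming scikit-learn’s MinMaxScaler as the implementation (the paper only states that scikit-learn [14] is used for the classifiers, so reusing it here is an assumption on our part):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[2.0, 300.0], [5.0, 150.0], [9.0, 900.0]])  # toy feature matrix

# Rescale every feature to the [0, 1] range, as in Eq. (1) with Min = 0 and Max = 1.
scaler = MinMaxScaler(feature_range=(0, 1))
X_minmax = scaler.fit_transform(X)
print(X_minmax)
```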
The transformation [12] of data aims to improve the normality of a distribution and to equalize the variance. For this, we used two transformation techniques: (i) the Box-Cox transformation [12]:

$$X' = \begin{cases} \dfrac{X^{\lambda}-1}{\lambda}, & \text{if } \lambda \neq 0\\[4pt] \log X, & \text{if } \lambda = 0 \end{cases} \qquad (2)$$

where lambda is estimated using maximum likelihood methods; and (ii) the logarithmic transformation using the natural logarithm, also known as the Neperian logarithm. After applying the transformations, the transformed datasets were submitted to Min-Max normalization. Then, we have four different datasets for the classification task (Table 1).

Table 1. Proposed datasets
Primary (Raw): without normalization and transformation
Min-Max: with normalization and without transformation
Box-Cox + Min-Max: with normalization and Box-Cox transformation
Log + Min-Max: with normalization and Log transformation
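A minimal sketch of how the four configurations in Table 1 could be produced, assuming SciPy’s boxcox for the Box-Cox transformation (the paper does not name the implementation, so this library choice is an assumption):

```python
import numpy as np
from scipy.stats import boxcox
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.lognormal(mean=1.0, sigma=0.5, size=(100, 3))  # toy positive-valued features

def minmax(a):
    return MinMaxScaler().fit_transform(a)

raw = X                                              # Primary (Raw)
min_max = minmax(X)                                  # Min-Max
log_mm = minmax(np.log(X))                           # Log + Min-Max (natural logarithm)
# Box-Cox is applied column by column; lambda is estimated by maximum likelihood.
box = np.column_stack([boxcox(X[:, j])[0] for j in range(X.shape[1])])
box_mm = minmax(box)                                 # Box-Cox + Min-Max
```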
We perform the Feature Selection [10] activity to remove redundant or irrelevant characteristics from the classification process. For this, we calculate the entropy of each feature present in the dataset. Features with a value lower than the average are removed from the dataset. After this step, 11 features remain, which are: number of nouns; number of words without punctuation; number of tokens; number of characters; number of types; diversity; number of verbs; average sentence length; pausality; emotiveness; and average word length. On the selected features, we apply the Tukey test to verify whether there is a statistical difference between them, and the results showed that the number of characters could represent
the other four features: the number of verbs, nouns, tokens, and words without punctuation. After that, we removed these four features, leaving only seven features to be evaluated. We also deployed another feature selection procedure. In the second approach, we use the Random Forest technique to obtain each feature’s weight in the classification process. For this, each proposed dataset was submitted to 30 runs of the algorithm. In the end, we calculate the average weight of each feature and rank the features, considering the most critical features to be the ones that obtained the most significant weights. Finally, the seven most relevant features were selected to be compared with the first approach’s seven features. Table 2 shows the final features selected in each of the approaches. We highlight that approach 2 (Random Forest) found the same seven features in all the proposed datasets. A minimal sketch of both selection procedures is given after the table.

Table 2. Features selected in both processes.
Random forest                        | Entropy + Tukey
Number of nouns                      | Average sentence length
Number of characters                 | Number of characters
Number of words without punctuation  | Average word length
Number of tokens                     | Emotiveness
Number of types                      | Number of types
Diversity                            | Diversity
Number of verbs                      | Pausality
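The sketch below illustrates the two selection procedures under stated assumptions: a discretized Shannon entropy is used for the first filter (the paper does not specify how the entropy of continuous features is computed, so the binning is an assumption), and scikit-learn’s feature importances averaged over 30 runs for the second.

```python
import numpy as np
from scipy.stats import entropy
from sklearn.ensemble import RandomForestClassifier

def entropy_filter(X, n_bins=10):
    """Keep features whose (discretized) entropy is at least the average entropy."""
    ents = []
    for j in range(X.shape[1]):
        counts, _ = np.histogram(X[:, j], bins=n_bins)
        ents.append(entropy(counts + 1e-12))  # small constant avoids log(0)
    ents = np.array(ents)
    return np.where(ents >= ents.mean())[0]

def rf_importance_ranking(X, y, n_runs=30, top_k=7):
    """Average Random Forest feature importances over several runs and keep the top-k features."""
    importances = np.zeros(X.shape[1])
    for seed in range(n_runs):
        rf = RandomForestClassifier(max_depth=100, random_state=seed).fit(X, y)
        importances += rf.feature_importances_
    importances /= n_runs
    return np.argsort(importances)[::-1][:top_k]
```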
For each configuration contained in Table 1, we generated a new dataset using each of the approaches presented in the Feature Selection process, thus totaling another eight new datasets. This work therefore evaluated 12 configurations: 11 proposed by the preprocessing techniques and 1 being the original data from the Fake.br Corpus. After completing the data preprocessing activities, the expected result is a final dataset with a better representation and quality of the original data, contributing positively to the generalization performance of the machine learning models. In this research, we propose the use of three different supervised classifiers: Random Forest (RF), Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP). We selected these three techniques because they have different rationales and can lead to different results. We implement all the approaches using the scikit-learn toolkit [14], with the following specifications: (i) Random Forest with maximum depth equal to 100; (ii) linear SVM trained with stochastic gradient descent (SGD), with an l2 penalty and a maximum number of iterations equal to 100; and (iii) MLP using the Adam optimizer, with a maximum number of iterations equal to 1000 and two hidden layers, containing 50 neurons in the first and 25 in the second. We observed that the RF approach achieved the best results.
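A minimal scikit-learn sketch of the three classifiers with the stated settings; any parameter not mentioned in the text is left at its library default, which is an assumption:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.neural_network import MLPClassifier

classifiers = {
    # Random Forest with maximum depth 100
    "RF": RandomForestClassifier(max_depth=100),
    # Linear SVM trained with SGD, l2 penalty, at most 100 iterations
    "SVM": SGDClassifier(loss="hinge", penalty="l2", max_iter=100),
    # MLP with Adam, two hidden layers (50 and 25 neurons), at most 1000 iterations
    "MLP": MLPClassifier(solver="adam", hidden_layer_sizes=(50, 25), max_iter=1000),
}
```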
Misleading-content metrics were used in the work of Silva et al. [19], and they are used here for comparisons with the results obtained in this paper. The metrics are: (i) Fake news caught rate (FCR): the proportion of fake news correctly identified; (ii) Fake news precision rate (FPR): the proportion of news classified as fake that really belongs to the fake class; and (iii) F-Measure: the harmonic mean of FCR and FPR. For each proposed dataset configuration, we ran 30 trials, each using k-fold cross-validation with k = 10. The three configurations resulting from normalization and transformation and the original dataset were evaluated using the three proposed classifiers. Two points were then observed: (i) which classifier achieved the best performance and (ii) which data configuration performed best. The other eight datasets were then evaluated with the classifier that obtained the better performance, observing the misleading-content metrics. We apply ANOVA and Student’s t-test to validate whether the generated datasets have a statistical difference, considering a 95% confidence level and observing the F-measure, which represents the harmonic mean between FCR and FPR. Finally, we compare the best results obtained in this paper with the results obtained by other researchers using the Fake.br Corpus [11,19].
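A minimal sketch of the evaluation protocol, under the assumption that FCR and FPR correspond to recall and precision on the fake class (label 1 is assumed to mark fake news):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

def evaluate(X, y, n_trials=30):
    """30 trials of 10-fold cross-validation; FCR = recall and FPR = precision on the fake class."""
    fcr, fpr, f1 = [], [], []
    for seed in range(n_trials):
        clf = RandomForestClassifier(max_depth=100, random_state=seed)
        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
        scores = cross_validate(clf, X, y, cv=cv,
                                scoring={"fcr": "recall", "fpr": "precision", "f": "f1"})
        fcr.append(scores["test_fcr"].mean())
        fpr.append(scores["test_fpr"].mean())
        f1.append(scores["test_f"].mean())
    return np.mean(fcr), np.mean(fpr), np.mean(f1)
```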
4 Results
First, we analyze the four dataset configurations proposed in this work in Table 1. We obtain 12 results considering the four datasets and the three classifiers. The best results of each configuration are depicted in Table 3.

Table 3. Best results from the classifier analysis
Dataset  | FPR   | FCR   | F-Measure | Classifier
Raw      | 95.77 | 97.41 | 96.54     | RF
Min-Max  | 95.78 | 97.42 | 96.55     | RF
Log      | 95.77 | 97.43 | 96.55     | RF
Box      | 95.78 | 97.42 | 96.55     | RF
The results showed that the RF technique obtained the best results in all configurations, with an F-measure of 96.55%, an FCR of 97.43%, and an FPR of 95.78%. It is essential to highlight that, concerning the configuration of the dataset, whether the data are normalized, transformed, or raw, there was no statistical difference between them, so the application of these preprocessing steps did not change the results.
For the second evaluation, we applied the RF technique to the configurations resulting from the application of Feature Selection (Table 2). The results in Table 4 refer to the features selected through the approach proposed in this article (entropy and application of the Tukey test), and the results in Table 5 are those related to the features obtained through the RF feature-importance technique.

Table 4. Results from feature selection by Entropy + Tukey
Dataset  | FPR   | FCR   | F-Measure
Raw      | 95.02 | 95.58 | 95.26
Min-Max  | 94.99 | 95.54 | 95.22
Log      | 95.00 | 95.53 | 95.23
Box      | 94.98 | 95.54 | 95.22
Table 5. Results from feature selection by Random Forest
Dataset  | FPR   | FCR   | F-Measure
Raw      | 94.87 | 95.05 | 94.90
Min-Max  | 94.87 | 95.04 | 94.90
Log      | 94.83 | 95.08 | 94.90
Box      | 94.87 | 95.06 | 94.91
The features from the first approach achieved better performance than those from the second approach (RF) in all configurations and datasets. One of the reasons is that the features selected by the first approach brought a greater variety of information: it eliminated four features that could be represented by only one and brought in four other features with new information. However, this reduction in features caused a 1.29% loss in the F-measure compared to the best result obtained when analyzing all the features. The configuration that obtained the best result was the data without preprocessing in the first approach; for the second approach, the statistical tests showed no significant difference between the configurations. Finally, Table 6 presents the best results obtained in this article and the best results from the other papers [11,19] evaluating different features on the Fake.br Corpus. These other results were obtained by analyzing the following features: (i) Bag of Words (BoW), both on the complete texts and on the truncated texts, since the authors observed that the true texts are larger than the false ones, which could skew the result; (ii) linguistic-based features: in [11], only features related to pausality, emotiveness, uncertainty, and non-immediacy were observed, while in [19] the authors added diversity, the average size of the
Table 6. Comparison between the best results
Dataset                                                | FCR   | FPR   | F-Measure
Silva et al. (2020) Ling. features                     | 0.941 | 0.940 | 0.941
Silva et al. (2020) BoW (trunc. text)                  | 0.932 | 0.943 | 0.937
Silva et al. (2020) BoW (trunc. text) + Ling. features | 0.954 | 0.976 | 0.965
Monteiro et al. (2018) Ling. features                  | 0.530 | 0.570 | 0.550
Monteiro et al. (2018) PoS Tag + Semantic + BoW        | 0.890 | 0.880 | 0.890
Entropy + Tukey - Raw dataset                          | 0.955 | 0.950 | 0.952
Log dataset                                            | 0.974 | 0.957 | 0.965
sentences, the average size of the words, and the number of spelling errors; and (iii) features related to Part-of-Speech (PoS) tags. The results showed that, using the logarithmic transformation (or any of the configurations presented in Table 1), it was possible to obtain the same F-measure as the best result found in the other works. We had a better result in FCR, which is equivalent to the recall metric for the fake news class; this means that our model can recover and identify more news items belonging to that class. However, our performance in the FPR metric, i.e., the precision when we classify news as fake, was 0.19% lower. In our understanding, the two metrics have the same importance, because in an automatic fact-verification system it is essential both to identify all the news that is disinformation and to have this classification be correct. The result achieved by this research took into account only 21 features, while the best result proposed in [19] was obtained with the use of linguistic features and BoW, resulting in a larger number of characteristics and, consequently, an increase in the complexity of the problem resolution. If the results are compared using only the linguistic features evaluated in [19], we had an increase of 2.45% in the F-measure, and compared to [11] an increase of 41.55%. However, those researchers did not apply a grid search to find the best parameters for the proposed SVM. Our results using only the seven features also obtained a better performance when compared to all the results obtained in [11], and were also better than those of [19] when analyzing only the linguistic features.
5 Conclusions and Future Works
This work aimed to evaluate the application of preprocessing approaches in different phases of the preprocessing activity: normalization, transformation, and Feature Selection. For this purpose, 11 datasets were generated from the data contained in the Fake.br Corpus, which contains true and fake news, from which 21 numerical features related to linguistics and PoS tags were selected. These datasets were submitted to different supervised machine learning techniques, and their results were evaluated using metrics from spam detection, with the application of statistical tests
to validate whether there is a difference between the results obtained. Finally, we compared the best results with others found in the literature using the same dataset. In the first round of analyses, we observed that the best results in all normalized, transformed, or original sets came from applying the Random Forest technique. We also found that there was no statistical difference between the results obtained by the generated sets and the results of the original set, thus concluding that the application of normalization or transformation techniques, in these cases, made no difference. The second round included the evaluation of the datasets with the application of two Feature Selection approaches. The results showed a loss of information and, consequently, the metrics had a lower performance compared to the results of using all the features. However, the approach proposed in this research, which combined entropy and the application of the Tukey test between the features, obtained better performance than the approach based on the best features of the RF. The comparison with other works involving the Fake.br Corpus showed that the best result of this work was equal to the best result of the other works when the F-measure was analyzed, better when observing the FCR metric, and worse for the FPR metric. However, the number of features analyzed in this work is much smaller than the number proposed in other works, allowing the same result to be obtained with less complexity and less information. Finally, we conclude that the preprocessing activities related to normalization and transformation, applied to the Fake.br Corpus features evaluated in this work, had no influence, either negative or positive, on the results obtained. The application of Feature Selection techniques reduced the values obtained; however, it still proved to be more effective than other works that analyzed only this type of feature, excluding the analyses with text-representation techniques. In future work, we will carry out a more detailed analysis of this dataset by applying unsupervised techniques to understand whether there are subgroups within the legitimate and false news and what their characteristics are. We will also propose a new set of data to be added to the Fake.br Corpus, to bring more updated news, as this corpus covers 2016 to 2018, and to increase the diversity of existing news. Finally, we will seek to propose an automated system that helps people identify the likelihood of news being disinformation quickly and easily.
References 1. Allcott, H., Gentzkow, M.: Social media and fake news in the 2016 election. J. Econ. Perspect. 31(2), 211–36 (2017) 2. Bennett, W.L., Livingston, S.: The disinformation order: disruptive communication and the decline of democratic institutions. Eur. J. Commun. 33(2), 122–139 (2018) 3. Conroy, N.J., Rubin, V.L., Chen, Y.: Automatic deception detection: methods for finding fake news. Proc. Assoc. Inf. Sci. Technol. 52(1), 1–4 (2015)
4. Garc´ıa, S., Luengo, J., Herrera, F.: Data Preprocessing in Data Mining. Springer, Heidelberg (2015) 5. Graves, D.: Understanding the promise and limits of automated fact-checking (2018) 6. Hassan, N., Adair, B., Hamilton, J.T., Li, C., Tremayne, M., Yang, J., Yu, C.: The quest to automate fact-checking. In: Proceedings of the 2015 Computation + Journalism Symposium (2015) 7. Hassan, N., Arslan, F., Li, C., Tremayne, M.: Toward automated fact-checking: detecting check-worthy factual claims by claimbuster. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1803–1812 (2017) 8. Kotsiantis, S., Kanellopoulos, D., Pintelas, P.: Data preprocessing for supervised leaning. Int. J. Comput. Sci. 1(2), 111–117 (2006) 9. Lazer, D.M., Baum, M.A., Benkler, Y., Berinsky, A.J., Greenhill, K.M., Menczer, F., Metzger, M.J., Nyhan, B., Pennycook, G., Rothschild, D., et al.: The science of fake news. Science 359(6380), 1094–1096 (2018) 10. Liu, H., Motoda, H.: Computational Methods of Feature Selection. CRC Press, Boca Raton (2007) 11. Monteiro, R.A., Santos, R.L., Pardo, T.A., de Almeida, T.A., Ruiz, E.E., Vale, O.A.: Contributions to the study of fake news in Portuguese: new corpus and automatic detection results. In: International Conference on Computational Processing of the Portuguese Language, pp. 324–334. Springer, Heidelberg (2018) 12. Osborne, J.: Improving your data transformations: applying the box-cox transformation. Pract. Assess. Res. Eval. 15(1), 12 (2010) 13. Patro, S., Sahu, K.K.: Normalization: A preprocessing stage. arXiv preprint arXiv:1503.06462 (2015) 14. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011) 15. Rocha Jr, D.B., Lins, A.J.D.C.C., de Souza, A.C.F., de Oliveira Lib´ orio, L.F., de Brito Leit˜ ao, A.H., Santos, F.H.S.: Verific. ai application: automated factchecking in Brazilian 2018 general elections. Brazil. J. Res. 15(3), 514 (2019) 16. Ruiz, M.J.U., Verd´ u, F.J.M.: El fact checking: en busca de un nuevo modelo de negocio sostenible para el periodismo. estudio de caso de miniver. Miguel Hern´ andez Commun. J. (9), 511–534 (2018) 17. Shu, K., Sliva, A., Wang, S., Tang, J., Liu, H.: Fake news detection on social media: a data mining perspective. ACM SIGKDD Explor. Newsl. 19(1), 22–36 (2017) 18. Cardoso Durier da Silva, F., Vieira, R., Garcia, A.C.: Can machines learn to detect fake news? a survey focused on social media. In: Proceedings of the 52nd Hawaii International Conference on System Sciences (2019) 19. Silva, R.M., Santos, R.L., Almeida, T.A., Pardo, T.A.: Towards automatically filtering fake news in Portuguese. Expert Syst. Appl., 113199 (2020) 20. Tandoc Jr., E.C., Lim, Z.W., Ling, R.: Defining “fake news” a typology of scholarly definitions. Dig. J. 6(2), 137–153 (2018) 21. Thorne, J., Vlachos, A.: Automated fact checking: Task formulations, methods and future directions. arXiv preprint arXiv:1806.07687 (2018) 22. Zhou, X., Zafarani, R., Shu, K., Liu, H.: Fake news: fundamental theories, detection strategies and challenges. In: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 836–837 (2019)
Covid-19 and the Pulmonary Immune System Fight: A Game Theory Model
Said Lakhal and Zouhair Guennoun
Research Team in Smart Communications-ERSC, E3S Research Center, EMI, Mohammed V University, Rabat, Morocco
[email protected]
Abstract. Against pathogen threats, the pulmonary immune system (PIS) acts systematically by executing a series of strategies. First, it detects and blocks the intruder; then, it activates and recruits the immune cells; finally, it kills and evacuates the damage. On its side, the virus diversifies its techniques to evade the PIS, enter the cells, and replicate. This relationship puts the two in a game situation that takes place over several rounds. The goal of the present work is to design a game theory model by establishing the strategies of each player, the payoffs at the end of each round, and the winner after a high number of rounds. We prove that our model relates six parameters: four probabilities of success of the strategies, the replication rate, and the winning player. After that, we show that we can estimate the value of one parameter if the other five parameters are known.
Keywords: Covid-19 · Pulmonary immune system · Game theory

1 Introduction
The Novel Coronavirus-Infected Pneumonia (NCIP) [1], or Covid-19 [2], started on December 29th, 2019 in Wuhan, Hubei province, Republic of China [3]. The incubation period takes between 1 and 4 days [1,4]. Patients present fever, fatigue, dry cough, dyspnea, and sore throat [1]. A case study confirmed these symptoms for a worker at the seafood market, admitted to the Central Hospital of Wuhan on December 26th, 2019 [5].

1.1 Coronavirus Categorization and Structure
Relative to the severity of the disease, the coronaviruses can be classified into six types: four cause mild respiratory symptoms, and the two others have caused epidemics with high mortality rates [1], namely the Middle East Respiratory Syndrome (MERS) and the Severe Acute Respiratory Syndrome (SARS) coronaviruses. In relation to the infected host, these coronaviruses are categorized into four families: α-Cov, β-Cov, γ-Cov and δ-Cov [6].
Covid-19 encodes more than a dozen proteins for viral entry and replication [2]. The most well-studied are the papain-like protease (PLpro), the 3C-like protease (3CLpro) and the spike protein (S protein) [2]. The latter is localized in the membrane envelope, with two subunits: S1 and S2. The first recognizes and binds the host cell receptor via a unique angiotensin-converting enzyme 2 (ACE2) [4], while the second mediates membrane fusion [7]. The data analysis of 425 confirmed cases allowed the epidemiologic characteristics of Covid-19 to be determined [8].

1.2 Mathematical Model
Mathematical and statistical methods have helped reveal many secrets about the disease [3] by answering the following questions: how transmissible the disease is, when infectiousness is highest during the course of infection, how severe the infection is, and how effective interventions have been and ought to be. The Program for Monitoring Emerging Diseases (ProMED) can be used as a source of epidemiological data on coronaviruses [6]. Our objectives in this work are: the establishment of the pulmonary immune system mechanisms, the enumeration of the virus actions, and finally the design of a model based on game theory to formulate the fight between the two. For this, we describe the operation of the pulmonary immune system, the activities of the virus, and the principles of game theory in Sect. 2. The game components are given in Sect. 3 and the modelization in Sect. 4. We propose some applications of the proposed model in Sect. 5, and finally, we conclude in Sect. 6.
2 Related Works
The lung is an intermediary between the environment and the blood, which enables the exchange of oxygen and carbon dioxide. Thus, this environment is constantly exposed to exterior particles, allergens, and pathogens, which makes the presence of a pulmonary immune system necessary. This system sits in a fine balance between quiescence and inflammation, with one of the most intricate vasculature networks in the body. The capillary density minimizes the spatial distance between blood and air, enabling maximal efficiency of the oxygen gradient through which hemoglobin-rich red blood cells pass as they transit the lung [9].

2.1 Pulmonary Immune System
The pulmonary immune system can be divided into two parts: innate and adaptive [10]. The first is non-programmed and represents the first line of defense [11]. By contrast, the second performs self and non-self antigen recognition and is able to develop immune memory [10]. Each of them contains cellular and humoral components [10]. Their interconnection is ensured via the extracellular matrix (ECM), after the expression of cytokines/chemokines and leukocyte recruitment arising from damage-associated molecular patterns (DAMPs), in the cases of cell
death, stress and/or injury [12]. The innate immunity includes several barriers, including mucus, cilia, the mucociliary escalator, and proteins [12]. Epithelial cells are the primary detectors of pathogens: they secrete RANTES and IP-10 (CXCL10) [10] in order to sensitize the innate immunity to recruit immune cells, through neutrophils, phagocytes and various cytokines, as well as to activate the adaptive immunity [10,11] through the T and B cells. At the end of the fight between the virus and the immune system, eosinophils facilitate the elimination of viruses and reduce infectivity [11].

2.2 Virus Activities
The virus binds to the cell through its receptors [11]. In our case, Covid-19 attaches to the ACE2 receptor, present in the respiratory epithelium and alveoli of the lungs. Once the attachment is established, the replication phase begins, multiplying the number of viruses [13]. At the same time, the virus develops many techniques to block the IFN production [11].

2.3 Presentation of Game Theory
Game theory is a branch of applied mathematics studying situations of competition and cooperation between several involved parties by using mathematical methods [14]. It was developed as a theory of human strategic behavior based on an idealized picture of rational decision making [15]. It provides tools to analyze strategic interactions, which may then be applied to any game-like situation. In the literature, several types of games are cited, including but not limited to: 1) simultaneous or strategic-form games, which capture a scenario in which there is strategic interdependence among a set of players who take their actions simultaneously or without observing the actions chosen by others; 2) sequential or extensive-form games, which, as opposed to the strategic form, provide a more appropriate framework to analyze certain interesting questions that arise in strategic interactions involving sequential moves; 3) repeated games, which capture the idea that a player has to take into account the impact of his or her current action on the future actions of other players; this impact is sometimes called his or her reputation [16]. In each game, three important components must be specified: the players, the strategies adopted by each one, and the obtained payoffs. Based on the payoff values, the winning player is determined; but sometimes the game stabilizes at an equilibrium point, where each player reaches his maximum payoff without there being a winning player.
3 Game Components

3.1 Players
During a fight between the virus and the pulmonary immune system (PIS), there are two players: the virus and the PIS, noted P1 and P2, respectively. Each one performs many actions to achieve its goal. In game theory language, these actions are called strategies and the goals are payoffs.
3.2 Strategies
Inside the human body, the virus performs numerous actions, including attachment to the receptors, entry into the cells, evasion of the PIS, and replication. Among all these strategies, we focus only on two main ones: entry and replication, noted s11 = E and s12 = R, respectively. On the other hand, the PIS executes a series of actions against each external threat: starting with the detection and blockage of pathogens, going through the activation and recruitment of immune cells, and arriving at the killing and evacuation of pathogens. In the present study, we treat two principal strategies: blockage and killing of the virus, indicated by s21 = B and s22 = K, respectively. Once a player has executed a strategy, all players receive their payoffs as outcomes.
3.3 Payoffs
The payoff designates the reward obtained after a particular player performs a strategy. This value can be positive or negative in relation to the interests of a given player. Generally, during a game, each player competes with the others to win a maximum payoff; therefore, when someone receives a positive payoff, another obtains a negative payoff. In our case, the payoff of the virus is the number of viruses inside the human body, and that of the PIS is the remaining number of immune cells after each round. Let O1 and O2 be the payoffs of P1 and P2, respectively. Figure 1 illustrates the conflict between the PIS and the virus.
Fig. 1. Fight between PIS and virus
Before the result is established, the game follows several rounds.
3.4 Rounds of Game
The virus tries to enter (s11) the cells and then starts the replication strategy (s12). Against these attempts, the PIS prevents the virus from reaching its goals, sometimes by the blockage strategy (s21) and sometimes by the kill strategy (s22). The success of a strategy executed by P1 or P2 is generally not guaranteed: it succeeds with some probability and fails with the complementary probability.
4 Modelization

4.1 Probabilities of Successful Strategies

Execution of $s_{11}$:
$$\begin{cases} \text{enter the cell with probability } p_{11},\\ \text{killed by the PIS with probability } p_{12},\\ \text{blocked with probability } 1-(p_{11}+p_{12}). \end{cases}$$

The viruses that succeed in entering the cells start the replication phase $s_{12}$.

Execution of $s_{12}$:
$$\begin{cases} \text{replicate with probability } p_{21},\\ \text{killed by the PIS with probability } p_{22},\\ \text{neither killed nor replicated with probability } 1-(p_{21}+p_{22}). \end{cases}$$

4.2 Payoffs After Each Round
• 1st round: at the beginning, the total numbers of viruses and immune cells are $n_v$ and $n_c$, respectively. Once the virus has performed strategy $s_{11}$, these values are modified. As a result, we obtain a first number of viruses inside the cells, a second number killed by the PIS, and a third number outside the cells, distributed as follows:

After $s_{11}$:
$$\begin{cases} \text{1st number: } p_{11}\, n_v,\\ \text{2nd number: } p_{12}\, n_v,\\ \text{3rd number: } (1-(p_{11}+p_{12}))\, n_v. \end{cases}$$

The viruses inside the cells are then replicated by performing strategy $s_{12}$.

After $s_{12}$:
$$\begin{cases} \text{Number of replicated viruses: } r_p\, p_{21}\, p_{11}\, n_v, \text{ where } r_p \text{ is the replication rate}\\ \quad \text{(each replicated virus moves outside the cells)},\\ \text{Number of killed viruses: } p_{22}\, p_{11}\, n_v,\\ \text{Number of intact viruses inside the cells: } (1-(p_{21}+p_{22}))\, p_{11}\, n_v. \end{cases}$$
After the 1st round, the numbers of viruses inside the cells, outside the cells, and in total are $u_1$, $v_1$ and $O_1^1$, respectively. Thus:
$$u_1 = (1-(p_{21}+p_{22}))\, p_{11}\, n_v$$
$$v_1 = r_p\, p_{21}\, p_{11}\, n_v + (1-(p_{11}+p_{12}))\, n_v = (r_p\, p_{21}\, p_{11} + 1-(p_{11}+p_{12}))\, n_v$$
$$O_1^1 = u_1 + v_1$$
Now, we calculate the remaining number of cells. For this, we subtract the number of destroyed cells, $p_{11} n_v$, from the initial number of cells $n_c$. We highlight that the PIS is able to repair a certain number of destroyed cells with a
rate $r_a$, and we assume that this operation is limited to the cells where the viruses have been replicated. So, after the 1st round, the remaining number of immune cells is:
$$O_1^2 = n_c + (r_a\, p_{21}\, p_{11} - p_{11})\, n_v = n_c + (r_a\, p_{21} - 1)\, p_{11}\, n_v.$$
• 2nd round: the outputs of the 1st round represent the inputs of the 2nd round. Consequently:

After $s_{11}$:
$$\begin{cases} \text{1st number: } u_1 + p_{11} v_1,\\ \text{2nd number: } p_{12}\, v_1,\\ \text{3rd number: } (1-(p_{11}+p_{12}))\, v_1. \end{cases}$$

After $s_{12}$:
$$\begin{cases} \text{Number of replicated viruses: } r_p\, p_{21}\,(u_1 + p_{11} v_1),\\ \text{Number of killed viruses: } p_{22}\,(u_1 + p_{11} v_1),\\ \text{Number of intact viruses inside the cells: } (1-(p_{21}+p_{22}))\,(u_1 + p_{11} v_1). \end{cases}$$
As a result, we obtain:
$$u_2 = (1-(p_{21}+p_{22}))(u_1 + p_{11} v_1) = (1-(p_{21}+p_{22}))\,u_1 + (1-(p_{21}+p_{22}))\,p_{11}\, v_1$$
$$v_2 = r_p\, p_{21}\,(u_1 + p_{11} v_1) + (1-(p_{11}+p_{12}))\, v_1 = r_p\, p_{21}\, u_1 + (1-(p_{11}+p_{12}) + r_p\, p_{21}\, p_{11})\, v_1$$
$$O_2^2 = O_1^2 + (r_a\, p_{21}-1)(u_1 + p_{11} v_1)$$
We prove by contradiction that $1-(p_{21}+p_{22}) \neq 0$: if $1-(p_{21}+p_{22}) = 0$, then $u_n = 0$ for all $n \geq 1$, so the number of viruses inside the cells is zero all the time and the virus is inactive. Thus:
$$O_2^2 = O_1^2 + \frac{r_a\, p_{21}-1}{1-(p_{21}+p_{22})}\, u_2$$
• $n$th round: by following the same reasoning, we establish:
$$u_n = \underbrace{(1-(p_{21}+p_{22}))}_{\alpha}\, u_{n-1} + \underbrace{(1-(p_{21}+p_{22}))\,p_{11}}_{\beta}\, v_{n-1},$$
$$v_n = \underbrace{r_p\, p_{21}}_{\sigma}\, u_{n-1} + \underbrace{(1-(p_{11}+p_{12}) + r_p\, p_{21}\, p_{11})}_{\lambda}\, v_{n-1},$$
$$O_n^2 = O_{n-1}^2 + \frac{r_a\, p_{21}-1}{1-(p_{21}+p_{22})}\, u_n. \qquad (1)$$
In summary, we have the next formulation: ⎧ un = αun−1 + βvn−1 , α = (1 − (p21 + p22 )), β = αp11 , ⎪ ⎪ ⎪ ⎨v = σu n n−1 + λvn−1 , σ = rp p21 , λ = 1 − (p11 + p12 ) + σp11 , S1 : 1 ⎪On = un + vn , ⎪ ⎪ ⎩ 2 ra p21 −1 2 + 1−(p un . On = On−1 21 +p22 ) 4.3
Problem Resolution
It is easy to verify that the four parameters: α, β, σ and λ are all nonzero. Because, if we assume that one of them is zero, the virus will be inactive, therefore, it will not be able to replicate and/or bind to the cells.
514
S. Lakhal and Z. Guennoun
The goal is to calculate the values of un and vn , as function of n and others initial inputs of the problem: p11 , p12 , p21 , p22 , rp and ra . and Wn2 , for n ≥ 0: We define two sequences: Wn1
⎧ ⎨W 1 = un + k1 vn , k1 = β + ( α−λ )2 − α−λ , n σ 2σ 2σ
S2 : ⎩W 2 = u + k v , k = −( β + ( α−λ )2 + α−λ ). n 2 n 2 n σ 2σ 2σ While σ = 0 and β > 0, then: k1 and k2 are well defined, so it will be the same for the two sequences: Wn1 and Wn2 , for (n ∈ N). In addition, each one is a geometric sequence, with common ratio, r1 and r2 , respectively. Such as: r1 = α + k1 σ, S3 : r2 = α + k2 σ. As aresult, Wn1 and Wn2 will be given by: Wn1 = W01 r1n = k1 r1n nv , S4 : Wn2 = W02 r2n = k2 r2n nv , The subtraction between the two parts of system S2 gives:
Wn1 − Wn2 = (k1 − k2 )vn . While: β = 0, then k1 − k2 = 2 W 1 −W 2
k r n −k r n
β σ
2 + ( α−λ 2σ ) = 0,
consequently: vn = kn1 −k2n = 1 k11 −k22 2 nv . By using the 1st equation of system S2 , we can write: k r n −k r n 2 nv un = Wn1 − k1 vn = (r1n − 1 k11 −k22 2 )k1 nv = (r2n − r1n ) kk11k−k 2 Based ⎧ on these notations, we obtain : 1 n n ⎪ ⎨vn = k1 −k2 (k1 r1 − k2 r2 )nv , k1 k2 nv n n S5 : un = k1 −k2 (r2 − r1 ), ⎪ ⎩ 1 −1 −1 On = un + vn = kk11−k k2 nv r2n − kk12−k k1 nv r1n . 2 2 For calculating the number of remaining cells, going back to Eq. (1): ra p21 −1 2 Oi2 − Oi−1 = 1−(p ui 21 +p22 ) Summing the first part in one side and the the second part in the other side, we obtain: i=n ra p21 − 1 On2 − O02 = ui 1 − (p21 + p22 ) i=0 with O02 = nc and ζ =
ra p21 −1 1−(p21 +p22 )
< 0, we obtain:
On2 = nc + ζ
i=n
ui
i=0
Let’s replace ui by its value given in the 2nd equation of system S5 . On2 = nc + ζ
i=n k1 k2 nv i (r − r1i ) k1 − k2 i=0 2
(2)
Knowing that r2 < 1, it remains to discuss in relation to the value of r1 , to simplify this expression.
Covid-19 and the Pulmonary Immune System: A Game Theory Model
515
As a result, we obtain: n+1 −1 r n+1 −1 2 nv r2 nc + ζ kk11k−k ( r2 −1 − 1r1 −1 ), if r1 = 1, 2 2 On = n+1 −1 2 nv r2 ( r2 −1 − (n + 1)), if r1 = 1, nc + ζ kk11k−k 2 To specify the winner after n rounds, we calculate the margin between On2 and On1 . Let Dn be such value, then: D = On2 − On1 = n n+1 −1 r n+1 −1 −1 −1 2 nv r2 nc + ζ kk11k−k ( r2 −1 − 1r1 −1 ) − ( kk11−k k2 nv r2n − kk12−k k1 nv r1n ), if r1 = 1, 2 2 2 r n+1 −1
−1 −1 2 nv ( 2r2 −1 − (n + 1)) − ( kk11−k k2 nv r2n − kk12−k k1 nv ), if r1 = 1, nc + ζ kk11k−k 2 2 2 The result of the game is obtained after a high number of rounds. Mathematically speaking, we calculate the limit of Dn , when n tends to +∞ . While 0 < r2 < 1, then: lim r2n = 0 n−→+∞
For r1 , we have ⎧ three cases: ⎪ if 0 < r1 < 1, ⎨0 n lim r1 = +∞ if r1 > 1, n−→+∞ ⎪ ⎩ 1 if r1 = 1. Consequently,⎧ we obtain the limite of Dn , for different cases: k1 k2 nv −1 1 ⎪ ⎨nc + ζ k1 −k2 ( r2 −1 + r1 −1 ), if r1 < 1, lim Dn = −∞, if r1 = 1, n−→+∞ ⎪ ⎩ −∞, if r1 > 1, Interpretation : ⎧ st ⎪ 1 case: the PIS has lost a small number of cells, and so it is the winner, ⎨ 2nd and 3rd case: the PIS is dominated by the virus because, ⎪ ⎩ it has lost a large number of cells, so the winner is the virus. Relative to the values of r1 , the winner is given as: PIS, if r1 < 1, Winner: virus, if r1 ≥ 1. Based on that, we deduce that r1 is a determinant factor, to specify the winning player. Therefore, this factor deserves a detailed study, to know its variation. For this, going back to the 1st equation of system S3 , which gives the expression of r1 as:
α+φ+h φ+h−α 2 r1 = + αh + ( ) , with φ = 1−(p11 +p12 ), h = σp11 ) (3) 2 2 Let r1 be a function defined as follows: r1 : R+ −→ R+
h −→ α+φ+h + αh + ( φ+h−α )2 2 2 r1 is differentiable on R+ , with: 1 α+φ+h 1 r1 (h) = 2 1 + ( 2 )(αh + φ+h−α )− 2 ≥ 0, ∀h ∈ R+ , then r1 is increasing. 2
1 r1 (0) = α+φ 2 + 2 | φ − α |= max(φ, α). Knowing that: 0 ≤ φ ≤ 1 and 0 ≤ α ≤ 1. As a result, we have the three following results:
1) r1 (0) ≤ 1, 2)
lim
h−→+∞
r1 (h) = +∞ and 3) r1
is an increasing and continuous on R+ .
Such conditions are sufficient to deduce that it exists a unique h0 ∈ R+ that verifies: r1 (h0 ) = 1. The solution of this equation gives: h0 = (1 − φ)(1 − α) Knowing that: h = σp11 = rp p21 p11 , then: 0
0
(p11 +p12 )(p21 +p22 ) . p21 p11 0 ∀rp ≥ rp , we have: r1 (rp )
h0 = rp p21 p11 = (p11 + p12 )(p21 + p22 ) =⇒ rp = 0
Therefore: ∀rp < rp , we have: r1 (rp ) < 1, and
≥ 1.
As result, if we set the values: p11 , p12 , p21 and p22 ; and want to have 21 +p22 ) ; otherwise, we choose: r1 (rp ) < 1, we have to choose: rp < (p11 +pp1221)(p p11 rp ≥
(p11 +p12 )(p21 +p22 ) . p21 p11
Figure 2 summarizes the different configurations of the six parameters and the player ranges
Fig. 2. Ranges of players
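A small sketch of the winner rule summarized in Fig. 2, computing $r_1$ from Eq. (3) and the threshold $r_p^0 = \frac{(p_{11}+p_{12})(p_{21}+p_{22})}{p_{21}\, p_{11}}$; the numbers in the example call are made-up, not taken from the paper:

```python
import math

def winner(p11, p12, p21, p22, rp):
    """Return 'PIS' if r1 < 1 and 'virus' otherwise, following Eq. (3) and Fig. 2."""
    alpha = 1.0 - (p21 + p22)
    phi = 1.0 - (p11 + p12)
    h = rp * p21 * p11
    r1 = (alpha + phi + h) / 2.0 + math.sqrt(alpha * h + ((phi + h - alpha) / 2.0) ** 2)
    rp0 = (p11 + p12) * (p21 + p22) / (p21 * p11)  # threshold on the replication rate
    return ("PIS" if r1 < 1.0 else "virus"), r1, rp0

print(winner(p11=0.3, p12=0.35, p21=0.7, p22=0.2, rp=6))
```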
5 Applications

5.1 Values Presentation
The principle is to fix the values of five parameters and to find the sixth by following a circular permutation, as shown in Table 1.
5.2 Results Explanations
• In the 1st column of Table 1, the unknown value is p11 , rp > rp and the virus is the winner. Then, based on Fig. 2.(a), we obtain p11 ≥ 0.09. 0
• In the 2nd column of Table 1, the unknown value is p12 , rp < rp and the PIS is the winner. Then, based on Fig. 2.(c), we obtain 0.25 < p12 < 0.6.
Covid-19 and the Pulmonary Immune System: A Game Theory Model
517
Table 1. Set values of five parameters and after search the value of the 6th parameter p11
p12
p21
p22
rp
W
?
0.35 0.7
0.2
6
Virus (p11 , rp ) = (0.09, 1.28) p11 ≥ 0.09
0.4
?
0.7
0.23 0.47 5
0.25 ?
PIS
Thresholds 0
0
0
0
(p12 , rp ) = (0.25, 7.6) 0
Unknown value 0.25 < p12 < 0.6
0
0.55 7.4 Virus (p21 , rp ) = (0.12, 1.35) p21 ≥ 0.12 0
0
6
PIS
(p22 , rp ) = (0.54, 6.21) 0.54 < p22 < 0.58
0.35 0.25 0.42 0.18 ?
PIS
rp = 2.44
0.23 0.37 0.42 ? 0.25 0.6
0.29 0.41 14 ?
0 0
rp = 8.2
rp < 2.44 Virus
0
• In the 3rd column of Table 1, the unknown value is p21 , rp > rp and the virus is the winner. Then, based on Fig. 2.(a), we obtain p21 ≥ 0.12. 0
• In the 4th column of Table 1, the unknown value is p22 , rp < rp and the PIS is the winner. Then, based on Fig. 2.(c), we obtain 0.54 < p22 < 0.58. • In the 5th column of Table 1, the unknown value is rp , and the winner is thePIS. Then, based on Fig. 2.(b), we obtain rp < 2.44. 0
• In the 6th column of Table 1, the unknown value is the winner, rp > rp . Then, based on Fig. 2.(b), the winner is the virus.
6 Conclusion
In this work, we have developed a game theory model to simulate the fight between the pulmonary immune system and the virus. Therefore, we have established the strategies of each player, the payoffs at the end of each round and the winning player after a high number of rounds. The results of our model are summarized by a relation linking six parameters: the probabilities of the success of the strategies, the replication rate and the winning player. In the application section, we estimated the unknown parameter based on knowledge of five other parameters.
References 1. Soheil, K., Melina, H., Lee, M., Ali, G.: Coronavirus outbreak: what the department of radiology should know. J. Am. Coll. Radiol. 17(4), 447–451 (2020) 2. Zhang, D.-H., Wu, K.-L., Zhang, X., Deng, S.-Q., Peng, B.: In silico screening of Chinese herbal medicines with the potential to directly inhibit 2019 novel coronavirus. J. Integr. Med. 18(2), 152–158 (2020) 3. Biao, T., Nicola, L.-B., Qian, L., Sanyi, T., Yanni, X., Jianhong, W.: An updated estimation of the risk of transmission of the novel coronavirus (2019-nCoV). Infect. Dis. Model. J. 5, 248–255 (2020) 4. Xian, P., Xin, X., Yuqing, L., Lei, C., Xuedong, Z., Biao, R.: Transmission routes of 2019-nCoV and controls in dental practice. Int. J. Oral Sci. 12(1), 1–6 (2020)
5. Wu, F., et al.: A new coronavirus associated with human respiratory disease in China. Nat. 579, 265–269 (2020) 6. Katterine, B.-A., Yeimer, H.-R., Isabella, C.-B., Mara, C.-C.-T., Alejandra, G.B., Hugo, A-B-As., Ali, A.-R., Ranjit, S., Alfonso, J.R.-M.: Coronavirus infections reported by ProMED. Travel Med. Infect. Dis. J. 35, 1–5 (2020) 7. Yudong, Y., Richard, G.-W.: MERS, SARS and other coronaviruses as causes of pneumonia. Respirology 23(2), 130–137 (2017) 8. Qun, L., et al.: Early transmission dynamics in Wuhan, China, of novel coronavirus infected pneumonia. N. England J. Med. 382(13), 1–9 (2020) 9. Mark, R.-L., Mark, B.-H.: Live imaging of the pulmonary immune environment. Cell. Immunol. 350, 103862 (2018) 10. Kbra, B., Thomas, B.: Th17 cells and the IL-23/IL-17 axis in the pathogenesis of periodontitis and immune-mediated inflammatory diseases. Int. J. Mol. Sci. 20(14), 3394 (2019) 11. Santtu, H., et al.: Infant immune response to respiratory viral infections. Immunol. Allergy Clin. N. Am. 39(3), 361–376 (2019) 12. David, N., et al.: Pulmonary immunity and extracellular matrix interactions. Matrix Biol. J. 73, 1–32 (2018) 13. Balachandar, V., et al.: COVID-19: A promising cure for the global panic. Sci. Total Environ. 725, 138277 (2020) 14. Hans, P.: Game Theory. A Multi-leveled Approach, Second Edition, pp. 19–40. Springer, Heidelberg (2015) 15. Peter, H: Game Theory and Evolutionary Biology. Max-Planck-Institut fr verhaltensphisiologie, University of Bonn, Elsevier Science, pp. 931-950 (1994) 16. Levent, K.-K., Efe, A.: An Introduction to Game Theory (2007). http://home.ku. edu.tr/lkockesen/teaching/econ333/lectnotes/uggame.pdf
IAS: Intelligent Attendance System Based on Hybrid Criteria Matching
Fanglve Zhang, Jia Yu, and Kun Ma
School of Information Science and Engineering, University of Jinan, Jinan 250022, China
Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan 250022, China
ise [email protected]
Abstract. The traditional fingerprint-based attendance system and the use of the geographic location of the mobile phone are not always efficient. First of all, they may leak personal information and privacy. Besides, it is easy to commit fraud when users alter the GPS information of their mobile phones. What is more, the mode of the traditional attendance system is not flexible enough. In our demonstration, we construct an intelligent attendance system (abbreviated as IAS) based on hybrid criteria matching. The front-end of IAS is a WeChat applet. In IAS, we take the hybrid criteria matching model as the core. The model organically combines multiple matching criteria in order to ensure the strength of attendance management. The innovations of the system are hybrid criteria matching and multiple operation modes. The hybrid criteria matching model associates a variety of matching criteria, such as image matching, Wi-Fi matching and geographic location matching, which can increase the supervision intensity of attendance while ensuring the information security of users. Meanwhile, the system also provides managers with multiple operation modes, such as a periodic attendance mode and a personnel batch management mode, to meet the needs of multiple attendance activities; this evidently reduces the burden on managers and leads to higher efficiency.

Keywords: Attendance system · Hybrid criteria matching · Multiple operation modes · WeChat applet · Information statistics

1 Introduction
The existing attendance systems on the mobile phone usually use fingerprints and GPS (Global Positioning System) [1,9]. The following enumerates their shortcomings: – Although fingerprint information can be used as effective identification information, it is not safe to store it in the database. Once the information is leaked, it will cause great loss to users.
– The system is vulnerable to fraud because users can modify the GPS information of their mobile phones. Consequently, systems using GPS as their only way to detect geographic location information suffer from the problem that names of people who are actually absent will probably appear in the attendance list.
– The attendance mode is inflexible. The personnel list can only be entered in advance, and it is hard to deal with situations in which the actual participant list is unknown.
In our previous work, we proposed an automatic scheduling algorithm with a hash-based priority selection strategy [5,10]. However, this method lacks authentication of the signers. In order to overcome the shortcomings of the existing attendance systems, this paper proposes an Intelligent Attendance System (abbreviated as IAS) based on Hybrid Criteria Matching. The innovations of IAS are to use a multiple-criteria matching model for attendance control, to increase the supervision of attendance, and to ensure the effective execution of attendance activities. The following describes the innovations of this system in detail:
– Image matching and Wi-Fi matching. We propose a matching method to determine whether a user is present or absent. We use the MD5 algorithm to hash the image and the Wi-Fi information sent by the client to ensure the security of user information (a minimal sketch is given after this list).
– Hybrid criteria matching model using three methods. We use Wi-Fi matching [3,7] and geographic location matching [2] to filter out requests with obvious differences, and then filter out erroneous requests more precisely through image matching to ensure that users are indeed in attendance.
– Multiple attendance modes. We combine the methods of periodic attendance, invitation attendance, and batch personnel management to solve problems that users often encounter, such as the repeated creation of attendance activities or uncertainty about the attendance participants. It meets a variety of attendance needs through various participant management methods.
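A minimal sketch of the hashing step referenced above, assuming the server stores only MD5 digests of the Wi-Fi identifiers rather than the raw values (the function and field names are illustrative, not taken from the system’s code):

```python
import hashlib

def digest(value: str) -> str:
    """Return the MD5 digest of a value so that raw identifiers are never stored."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()

def wifi_matches(request_ssid: str, request_bssid: str,
                 stored_ssid_md5: str, stored_bssid_md5: str) -> bool:
    """Compare the hashed Wi-Fi information in the request against the stored digests."""
    return digest(request_ssid) == stored_ssid_md5 and digest(request_bssid) == stored_bssid_md5
```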
2 Architecture
The architecture of IAS is shown in Fig. 1. It consists of four layers: the activity releasing layer, the personnel input layer, the intelligent matching layer, and the statistics revealing layer. First, in the activity releasing layer, the manager can use the WeChat applet [4] client to set the specific information, including the attendance operation mode, time limit, and criteria of attendance, and then publish them. The attendance activity information will be stored in the database asynchronously. Then, the manager confirms the participants of the newly released attendance activity and passes the information to the personnel input layer. After that, participants whose names have already been entered into the attendance
Fig. 1. Architecture of IAS
list will have access to the newly released activity information on the homepage of the WeChat applet client. After the attendees sign in, the attendance request issued by the system is pushed to the intelligent matching layer. The intelligent matching layer matches the request step by step according to the criteria set for the attendance activity, and then feeds the attendance results back to the participants. The attendance manager can also view the current attendance status in the form of a chart simultaneously. The structure of the system has the following advantages. Multiple criteria restrictions greatly reduce cheating on attendance and thus strengthen the supervision of attendance activities. At the same time, the automatic arrangement and visual information feedback of the system save the attendance manager’s time for statistics: the manager only has to view the automatically computed attendance results, which reduces the manager’s burden. The entire process does not require manual intervention. Active attendance supervision is transformed into passive result reception, which improves efficiency and reduces the work pressure of managers.

2.1 Activity Releasing Layer
In the activity releasing layer, the WeChat applet is used as the client, which reduces the redundancy of software installation in the case of low usage. In this layer, the system provides managers with two attendance activity operation modes and three attendance matching criteria as options. The two operation modes are the single attendance mode and the periodic attendance
mode. Under the single attendance mode, the manager can limit attendees’ specific attendance time periods. Under the periodic attendance mode, the manager first sets the specific span of a cycle and then specifies the attendance time period of attendees in each cycle. During the corresponding time period of each cycle, the system will automatically remind the attendees to send a request. The principle of the periodic attendance mode is that, if it is enabled, the manager enters the specified attendance period within a cycle. For example, if a week is set as a cycle, the entered information can be from 8:00 to 12:00 on Tuesday. The system sets the timestamp in the database, pushes the attendance information to the user within the specified time by setting a server timing task, and updates the next attendance time, so as to meet the requirement of periodic attendance. The attendance matching criteria include Wi-Fi matching, geographic location matching, and image matching, which make up the hybrid criteria model. Once the manager decides to use one of the criteria, he has to provide the detailed information about it. We use the on-duty attendance organized by the student organization of the University of Jinan to test IAS and compare its performance with the previous attendance system; the functional comparison results in the following layers are all based on this and will not be repeated. Compared with the previous attendance system, the manager can choose more suitable attendance methods and attendance modes for different attendance activities. There is no need to recreate activities or perform additional manual operations as before, which greatly reduces the workload of the manager.

2.2 Personnel Input Layer
The personnel input layer provides the managers with two management modes for attendance participants: the specified-participant mode and the invitation mode. Under the specified-participant mode, the manager can enter attendance participants by single entry or by batch import of an Excel list. The personnel input layer will match the entered participants’ information with the users’ information in the database. The matched users can view the newly released attendance activity information; if there is no match, the system feeds the error information back to the manager. Under the invitation mode, the personnel input layer generates a unique invitation code for the attendance activity, and the manager can invite other users to join the activity by sharing the invitation code. After the manager enters the participants, the participants’ attendance activity list is automatically updated. We set different colors for the different states of user attendance, which makes it easy for users to directly understand the state of attendance activities. The attendance system previously used by the University of Jinan’s student organization consumes a long time when entering personnel and requires a lot of
tedious operations, which makes it suitable only for the attendance management of a small number of personnel. When using IAS, the manager can sort out the list of personnel to be entered according to the Excel sheet provided by IAS, and then upload the Excel sheet to the system management terminal; IAS automatically imports the personnel into the system. During large-scale attendance activities, almost half an hour of input time is saved for the manager. The advantage of IAS for the managers’ personnel management work is obvious. At the same time, compared with the previous attendance system, attendance participants can understand the status of each attendance activity more clearly, so no accidental missed check-ins have occurred.

2.3 Intelligent Matching Layer
In order to ensure the management of attendance requests, we propose a multiple intelligent matching model, which is based on image matching, Wi-Fi matching and geographic location matching. The model judges the information in the attendance request in light of the matching criteria selected by the manager in the activity releasing layer. The model first uses Wi-Fi matching and geographic location matching as a filter layer to filter out attendance requests with obvious errors, so as to speed up the matching and preserve the users’ experience. The steps of Wi-Fi matching are described as follows. After Wi-Fi matching is turned on, participants must connect to the Wi-Fi designated by the attendance activity when submitting the attendance request, otherwise the request will be rejected. We use the Wi-Fi detection interface (wx.getConnectedWifi) provided by the WeChat applet to ensure its accuracy. When the manager creates an attendance activity, the system detects the SSID and BSSID of the Wi-Fi connected to the manager’s mobile phone and stores their MD5 digests in the database. When a participant sends the attendance request, the system detects the Wi-Fi information the participant is connected to and matches it to determine whether the request is valid. The steps of geographic location matching are described as follows. After geographic location matching is enabled, participants upload the longitude and latitude of their location when submitting the attendance request. If the location is not within the specified range, the attendance request is rejected. We use the location interface provided by the WeChat applet to ensure the accuracy of geographic location matching. The steps of image matching are described as follows. If image matching is enabled, participants must capture and upload the image specified in the attendance activity when submitting the attendance request; otherwise the request is rejected. The manager can ask the participants to take photos of iconic items at the attendance site to ensure the effectiveness of the attendance. The system uses the file upload interface provided by WeChat (wx.uploadFile) to upload the photo from the client to the server. After receiving the file, the system hashes the image information with the MD5 algorithm to ensure the
confidentiality of user information. The next step is to match the images. First, we use the SIFT algorithm [8] in OpenCV to detect corner points and extract the key points of each image along with their descriptors. Second, we use the FLANN algorithm with a kd-tree [6] spatial index to search for feature points and achieve feature matching between the two images. We analyzed both the effectiveness of attendance and personnel motivation. For effectiveness, our requirement is that only attendance inside the office counts as valid. The previously used attendance system could not accurately limit the scope, and personnel often checked in successfully without entering the office, which led to a high rate of attendance fraud in previous attendance activities. With IAS, personnel have to enter the office to check in, because they are required to connect to the specified Wi-Fi and to take correct photos of the landmarks. After one month of use, the rate of attendance fraud dropped significantly compared to the previous month. For personnel motivation, with the previous attendance system many personnel chose to check in by phone and not come on duty, due to the lack of strict restrictions on attendance. With IAS, personnel have to be on duty because of the strict attendance restrictions; over time this forms a habit, which greatly improves personnel motivation.
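The demo paper does not include the matching code or thresholds, so the following Python sketch only illustrates the SIFT plus FLANN (kd-tree) pipeline described above using OpenCV; the file names, the Lowe ratio and the acceptance threshold are illustrative assumptions rather than the values used by IAS.

```python
import cv2

def match_images(reference_path, submitted_path, ratio=0.75, min_good_matches=30):
    """Return True if the submitted photo appears to match the reference landmark image.

    Pipeline: SIFT keypoint detection and descriptor extraction, FLANN (kd-tree)
    nearest-neighbour matching, then Lowe's ratio test. The ratio and the
    acceptance threshold are illustrative, not the system's actual settings.
    """
    ref = cv2.imread(reference_path, cv2.IMREAD_GRAYSCALE)
    sub = cv2.imread(submitted_path, cv2.IMREAD_GRAYSCALE)
    if ref is None or sub is None:
        return False

    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(ref, None)
    kp2, des2 = sift.detectAndCompute(sub, None)
    if des1 is None or des2 is None:
        return False

    # FLANN with a kd-tree index (algorithm id 1 selects the kd-tree index).
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
    matches = flann.knnMatch(des1, des2, k=2)

    # Keep only distinctive correspondences (ratio test).
    good = []
    for pair in matches:
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    return len(good) >= min_good_matches

# Example: accept the attendance request only if the uploaded photo matches.
# print(match_images("landmark.jpg", "upload.jpg"))
```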
2.4 Statistics Revealing Layer
The system feeds the matching results of the intelligent matching layer back to the attendance participants. Meanwhile, the attendance results of all attendees are collated, and the data are fed back to the manager as charts in real time, making it easy for the manager to manage and count the results. This layer displays data through two types of client. One is the attendance activity management page in the WeChat applet, where the manager can directly view the representative data and attendance details. In addition, we use the Vue.js framework to build a web-based background management system; by binding a mobile phone number to the WeChat user information, the manager can log in to view all the data and attendance details. With the previous attendance system, the manager could only see the name list of the attendance personnel and had no visual view of the attendance status; compiling the attendance rate required manual calculation, which cost extra and unnecessary effort. With IAS, the manager can directly view the attendance rate, pie charts and other visual attendance statistics in the attendance activity details interface, which intuitively displays the specific situation of the attendance; the information can also be saved as a file on the mobile device so that the manager can organize the data or use it in meetings.
3 Demonstration Scenarios
We will first use slides to introduce the development motivation of IAS and how it operates. After that, we will show it to the audience in the WeChat applet.
4 Conclusions
In our demo, we showcase the intelligent attendance system (IAS). An intelligent attendance method based on image, Wi-Fi and geographic location information is introduced. This method improves on the traditional attendance management platform. The innovation highlights two points: first, a flexible attendance management method based on multiple attendance modes and multiple participant management modes; second, a multi-criteria matching method based on image, Wi-Fi and geographic location information. Acknowledgment. This work was supported by the National Natural Science Foundation of China (61772231), the Industry-Academy Cooperative Education Project of Ministry of Education (201801002030 & 201702185051), the Shandong Provincial Natural Science Foundation (ZR2017MF025), the Project of Shandong Provincial Social Science Program (18CHLJ39), the Science and Technology Program of University of Jinan (XKY1734 & XKY1828), and the Project of Independent Cultivated Innovation Team of Jinan City (2018GXRC002).
References
1. Chew, C.B., Mahinderjit-Singh, M., Chiang, K., Tan, W., Malim, A.H.: Sensors-enabled smart attendance systems using NFC and RFID technologies. Soc. Digit. Inf. Wirel. Commun. 5(1), 19–28 (2015)
2. Chunlin, M.: Electronic attendance checking method and system based on mobile-phone positions. https://lens.org/114-772-127-012-986
3. Guibin, L.: Cell phone attendance system based on WiFi signals. https://lens.org/041-507-692-612-577
4. Hao, L., Wan, F., Ma, N., Wang, Y.: Analysis of the development of WeChat mini program. J. Phys. Conf. Ser. 1087, 062040 (2018). IOP Publishing
5. Ji, X., Ma, K.: Toward automatic scheduling algorithm with hash-based priority selection strategy. In: Soft Computing for Problem Solving, pp. 35–42. Springer, Singapore (2020)
6. Li, M., Wang, L., Hao, Y.: Image matching based on SIFT features and kd-tree. In: 2010 2nd International Conference on Computer Engineering and Technology, vol. 4, pp. V4-218. IEEE (2010)
7. Nanqing, Z.: Attendance system based on WiFi (wireless fidelity) signals of cell phone. https://lens.org/094-205-693-952-708
8. Panchal, P., Panchal, S., Shah, S.: A comparison of SIFT and SURF. Int. J. Innov. Res. Comput. Commun. Eng. 1(2), 323–327 (2013)
9. Soewito, B., Gaol, F.L., Simanjuntak, E., Gunawan, F.E.: Attendance system on Android smartphone. In: 2015 International Conference on Control, Electronics, Renewable Energy and Communications (ICCEREC), pp. 208–211 (2015)
10. Yao, Y., Zheng, X., Ma, K.: ILFS: intelligent lost and found system using multidimensional matching model. In: 2019 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), pp. 1205–1208. IEEE (2019)
Energy-Based Comparison for Workflow Task Clustering Techniques
Youssef Saadi1(B), Abdelhalim Hnini2, Soufiane Jounaidi3, and Hicham Zougah1
1 Information Processing and Decision Support Laboratory, Faculty of Sciences and Technologies, Sultan Moulay Slimane University, Beni Mellal, Morocco
{y.saadi,h.zougagh}@usms.ma
2 LAVETE Laboratory, National School of Applied Sciences, Hassan First University of Settat, 26100 Berrechid, Morocco
3 IRMM Laboratory, Higher Institute of Engineering and Business (ISGA Group), Hassan First University of Settat, Casablanca, Morocco
Abstract. Cloud computing consumes a huge amount of electrical power, which penalizes both cloud service providers and environmental standards. The energy-performance tradeoff is still an open issue in cloud computing: it targets minimizing the energy consumed by the datacenters' infrastructure while meeting the user QoS as defined by the Service Level Agreement. Scientific distributed applications generate complex workflows that may contain thousands of tasks deployed over cloud datacenters for execution. These applications are highly demanding in terms of computation capabilities and may lead to host overloading, which can increase energy consumption. In this study, we analyze the contribution of task clustering techniques to the energy consumed by the cloud's infrastructure. The most relevant scheduling policies are set up with horizontal and vertical task clustering schemes on the Montage scientific workflow to demonstrate the impact on energy saving. For this purpose, we used WorkflowSim, an open-source cloud simulator providing workflow-level support, task scheduling and clustering techniques. The simulation results highlight the impact of the horizontal scheme in increasing energy consumption. Keywords: Cloud computing · Scheduling policies · Task clustering · Energy consumption · Scientific workflow
1 Introduction
Cloud computing represents a pay-per-use model which offers on-demand resource use. A cloud environment can be seen as a collection of datacenters that may be collocated or distributed geographically. Each datacenter contains a set of physical servers that host a set of virtual machines responding to customers' requests [1]. Nowadays, the cloud computing environment consumes a great deal of electrical energy due to the huge amount of computation and data processing it supports. Firstly, this consumption leads to high carbon dioxide emissions and consequently breaches environmental standards. Approximately 3% of the world's total electricity is consumed
by datacenters, producing 200 million metric tons of carbon dioxide [2]. In addition, the increase in energy consumption is often related to system instability caused by over-provisioned and overloaded resources that aim to maintain continuous availability and reliability of services to end users [3, 4]. Secondly, the energy cost will double every five years for each typical datacenter [5] and, according to [6–8], the cost of power consumption may exceed the cost of the datacenter's hardware and become much more expensive for cloud service providers, which may penalize their Return On Investment (ROI). Cloud computing processes a variety of applications with different requirements. Among these applications we find scientific workflows, which are known for their complex structures and their need for high computation capability. Scientific workflows may contain thousands of tasks to be executed by cloud datacenters. Task execution time is the most relevant measure in determining the energy consumed by a workflow execution [9]. Task clustering aims to reduce scheduling overheads by aggregating small tasks into atomic units called jobs. However, this operation results in a significant amount of data processing and computation time [10, 11]. Thus, the present study focuses on analyzing the behavior of scheduling techniques with respect to the energy consumed by cloud datacenters when task clustering is applied. We aim to interpret the impact of each clustering technique, combined with each load balancing or scheduling policy, on energy saving. The experiments are carried out on the WorkflowSim simulator [12] to establish several scenarios of workflow execution in cloud datacenters under different scheduling and clustering techniques. The remainder of the paper is organized as follows: Sect. 2 presents a brief overview of scientific workflows and task clustering techniques, and discusses some related works. In Sect. 3, we describe our experimental setup and interpret the obtained results. Section 4 concludes this paper.
2 Background and Related Work
In this section, we give a brief summary of the basic elements that we will use for simulating and evaluating our proposition. We start by describing the design of scientific workflows; secondly, we present the main task clustering categories; thirdly, we introduce the workflow simulator; and finally we describe the energy model adopted in this work.
2.1 Scientific Workflows
Scientific workflows are often modeled using Directed Acyclic Graphs (DAGs), which describe a hierarchical structure connecting tasks according to a parent-child relation. A child task cannot start executing until its parent task has completed its execution and made the data available for transfer to the concerned child tasks. Many types of scientific workflows have been defined; the most popular ones are Montage [13], CyberShake [14], SIPHT [15] and LIGO Inspiral Analysis [16]. Scientific workflows generate complex workloads and are highly computation-demanding.
2.2 Clustering Techniques
Task clustering aims to merge small tasks into larger ones in order to reduce the queue wait time and thus minimize the makespan of the workflow [17, 18]. Several task clustering categories have been proposed [17, 19]: level-based (horizontal), vertical, blocked and balanced clustering. In horizontal clustering, independent tasks at the same level of a workflow are combined to form a cluster. In vertical clustering, the tasks of the same pipeline are merged into a cluster. Blocked clustering combines the two categories described above. Because of the load imbalance problem, balanced clustering was introduced; it uses metrics to solve the runtime and dependency imbalance problems.
2.3 Workflow Simulator
WorkflowSim [12] is a popular simulator that processes scientific workflows based on the DAG structure. It implements both the task clustering techniques cited above and the basic scheduling policies for planning the execution of tasks on appropriate resources. WorkflowSim defines several entities that manage workflow processing:
1. The planner: it imports workflows as DAX files and creates a task list to assign to an execution site.
2. The clustering engine: it receives the workflow tasks and groups them into several jobs based on the chosen clustering technique.
3. The workflow engine: after the clustering operation, this engine manages the received jobs based on their dependencies to ensure that parent jobs are released before their child jobs.
4. The workflow scheduler: it selects the Virtual Machine (VM) of the cloud datacenter that is most suitable for executing each job, based on the configured scheduling policy or user-defined criteria.
5. The workflow datacenter: it keeps track of job execution.
2.4 Energy Model
2.4.1 Makespan: the completion time of the last task, corresponding to the time spent by the system to complete all of its jobs (the overall time to release all tasks).
2.4.2 Power Model: in this work, we consider that the power consumption of hosts is a linear function of CPU utilization [9, 20, 21]. An idle server consumes about 70% of the power of a fully utilized server. Thus, the power consumption P(u) is defined as:

P(u) = 0.7 × Pmax + 0.3 × Pmax × u    (1)

where Pmax is the maximum power of a host running at 100% CPU utilization and u is the current CPU utilization. CPU utilization changes over time, therefore we define
the power consumption as a function u(t) of time. Equation (2) determines the total energy consumption of a physical machine:

E = ∫ P(u(t)) dt    (2)

To compute the energy consumption due to a specific scheduling algorithm, we then use Eq. (3):

E = (0.7 × Pmax + 0.3 × Pmax × u) × makespan    (3)
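A minimal sketch of this energy model in Python is given below. It directly implements Eqs. (1) and (3), and approximates the integral of Eq. (2) by a discrete sum; the Pmax value, utilization and makespan in the example are illustrative assumptions, not values from the experiments.

```python
def power(u, p_max):
    """Power draw (Watts) at CPU utilization u in [0, 1], Eq. (1):
    an idle host consumes 70% of its peak power."""
    return 0.7 * p_max + 0.3 * p_max * u

def energy_from_makespan(u, p_max, makespan_s):
    """Energy (Joules) for a host held at utilization u over the whole
    makespan, Eq. (3): E = P(u) * makespan."""
    return power(u, p_max) * makespan_s

def energy_from_trace(utilization_trace, p_max, dt_s):
    """Energy (Joules) for a time-varying utilization u(t), Eq. (2),
    approximated as a discrete sum over samples taken every dt_s seconds."""
    return sum(power(u, p_max) * dt_s for u in utilization_trace)

# Illustrative values only: a 250 W host, 80% busy, 600 s makespan.
print(energy_from_makespan(0.8, 250.0, 600.0) / 3.6e6, "kWh")
```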
2.5 Workflow Scheduling Policies
Workflow scheduling aims, in principle, to allocate a resource to each task in order to minimize the overall workflow completion time. An effective scheduling policy is a load-balancing scheme that avoids over- and under-utilization of hosts; its main goal is to spread the load among hosts according to specific criteria. In this work, we discuss the main local scheduling policies supported in WorkflowSim:
Data-Aware Scheduling (DAS): a task is scheduled on the virtual machine (VM) for which the data transfer time of the (task, VM) pair is minimum.
Round Robin (RR): time is divided into slices and each task is assigned to a resource in its time slice. After the first time slice expires, resources are assigned to the next task, and so on.
First Come First Serve (FCFS): tasks are scheduled according to their arrival time. The first arrived task is scheduled on the first available VM.
Min-Min: the scheme starts by sorting tasks in increasing order of execution time, then schedules the task with the minimum overall completion time on the appropriate resource. This task is removed from the unmapped task list and the operation is repeated until all unmapped tasks are scheduled (see the code sketch below).
Max-Min: the scheme operates like Min-Min, but instead of first selecting the task with the minimum completion time, it selects the task with the overall maximum execution time.
Minimum Completion Time (MCT): it selects the task that can be completed in the minimum possible time; this scheme does not sort the unmapped task list.
2.6 Related Work
Choudhary et al. proposed in [22] a power-aware MAX-MIN VM placement algorithm for reducing the power consumption of cloud datacenters. They proposed a DVFS-based power model with two governors and observed that the On-Demand governor offers approximately the same energy saving as the Conservative governor. The authors conclude that task clustering contributes to reducing energy consumption.
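The experiments rely on WorkflowSim's built-in policies rather than custom code; purely as an illustration of the Min-Min heuristic described in Sect. 2.5, the following Python sketch uses hypothetical task lengths and VM speeds. Swapping the task-selection criterion from the smallest to the largest completion time would give Max-Min.

```python
def min_min(task_lengths_mi, vm_mips):
    """Min-Min sketch: repeatedly pick the unmapped task whose minimum
    completion time over all VMs is smallest, and assign it to that VM.
    Task lengths are in million instructions, VM speeds in MIPS (assumed units)."""
    ready_time = [0.0] * len(vm_mips)          # when each VM becomes free
    unmapped = dict(enumerate(task_lengths_mi))
    schedule = []                              # (task_id, vm_id, finish_time)

    while unmapped:
        best = None                            # (completion_time, task_id, vm_id)
        for tid, length in unmapped.items():
            # completion time of this task on its best VM
            ct, vm = min(
                (ready_time[v] + length / mips, v) for v, mips in enumerate(vm_mips)
            )
            if best is None or ct < best[0]:
                best = (ct, tid, vm)
        ct, tid, vm = best
        ready_time[vm] = ct
        del unmapped[tid]
        schedule.append((tid, vm, ct))
    return schedule, max(ready_time)           # schedule and makespan

# Illustrative task lengths and two identical VMs.
print(min_min([4000, 12000, 2000, 9000, 6000], vm_mips=[1000, 1000]))
```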
The authors in [23] present an energy-efficient workflow scheduling algorithm inspired by hybrid chemical reaction optimization. The simulation results showed that the proposed scheme minimized both the energy consumption and the workflow makespan. The authors in [24] provide an energy-efficient task-scheduling algorithm based on the best-worst method and the Technique for Order Preference by Similarity to Ideal Solution. The main idea is to determine the best scheduling policy to select in response to user requests. The experimental results showed that their approach can effectively reduce the makespan and energy consumption, and it also improves VM utilization. Safari and Khorsand in [25] proposed a new technique for minimizing the energy consumption of DAG-structured applications on heterogeneous cloud systems. They make use of a power-aware list-based scheduling algorithm combined with the DVFS technique under a deadline constraint to maintain the quality of service. Alahmadi et al. exploited DVFS and VM reuse techniques to investigate energy-efficient task scheduling in a cloud environment [26]; their EATS-FFD algorithm achieved better energy efficiency without compromising the QoS of the cloud system. Fernández-Cerero et al. developed a set of energy-aware strategies for resource allocation and task scheduling in clouds [27]. The main idea is to hibernate idle-state VMs by considering a combination of energy- and performance-aware scheduling policies; the proposed solution achieved about a 45% reduction in the energy consumed by the cloud system. Garg and Goraya introduced an energy-efficient scheduling model under a deadline constraint for the cloud environment [28]. This heuristic technique contains two instances that respectively maximize workload execution in the operational state of the host and maximize energy saving in its idle state. Kaur and Chana in [29] carried out an exhaustive analysis of energy efficiency techniques in cloud computing; the main objective was to analyze software optimization for efficient energy management. Lis et al. [30] conducted a study that aims to map the structure of research on energy efficiency techniques in cloud computing environments. They tried to give insights about hot fields in this research area and also listed the tools and methodologies deployed accordingly; the study was performed on bibliometric data extracted from the Scopus database. The surveyed scheduling algorithms do not consider the impact of clustering techniques on energy consumption. In this simulation study we analyze the behavior of existing scheduling policies in terms of energy consumption according to the selected task clustering technique, and we try to answer the question: which combination of workflow scheduling and clustering schemes is most suitable for saving energy?
3 Methodology
To carry out the simulation, in this first work we used the Montage scientific workflow. We varied the number of its tasks from 50 to 1000 in order to represent light, middle and heavy loads. We applied the following scheduling policies for each workflow
size: DAS, Round Robin, FCFS, Max-Min, Min-Min and MCT. For each scheduling policy, we used two configurations, based on either level (horizontal) or vertical task clustering. The simulations aim to compare these combined configurations in terms of energy consumption.
3.1 Simulation Setup
In this simulation study, we use the Montage workflow and a common configuration for the datacenter hosts (see Table 1).
Table 1. Host configuration
RAM (MB): 2048
Storage (MB): 100000
BW (Mb/s): 10000
CPU (MIPS): 2000
In addition, one common VM configuration instance is defined, which is described in Table 2.
Table 2. VM configuration
Storage: 10000 MB
RAM: 512 MB
CPU: 1000 MIPS
BW: 1000 Mb/s
PEs: 1
Architecture: 'Xen'
In this work, three simulation scenarios are defined:
• In the first use case, we simulated a light load using Montage with 50 tasks; 5 hosts and 5 VMs are set up in the datacenter.
• In the second case, we used Montage with 100 tasks; 10 VMs and 10 hosts are set up.
• In the third scenario, a heavy load is considered using Montage with 1000 tasks; in this configuration, 10 VMs and 10 hosts are also used.
• In all simulations, we observe the contribution of the clustering technique to energy consumption. In this first work we consider only vertical and level-based (horizontal) task clustering.
• For each simulation scenario, the simulation is repeated 10 times and average values are taken as the result.
• In horizontal clustering, we set the cluster size to two, which means each job contains two tasks.
• For both vertical and horizontal clustering, the clustering delay specified for each level is 1.0 s.
3.2 Results and Discussion
Figures 1, 2 and 3 show that horizontal clustering increases the energy consumed by the datacenter significantly compared to the vertical task clustering scheme. When the load is light, MAXMIN and DAS using vertical clustering can save an important amount of energy. Likewise, when the workflow size increases, MAXMIN combined with vertical task clustering reduces energy consumption significantly compared to the other scheduling policies. In fact, vertical task clustering optimizes resource utilization compared to the horizontal scheme, which involves more VMs in executing tasks and therefore consumes more energy. The MAXMIN policy improves the makespan by executing large tasks first, and its combination with vertical clustering shows the lowest energy consumption. MINMIN combined with horizontal clustering depicts the worst case in energy consumption, especially when the load becomes heavy, because of maximized resource usage and increased makespan compared with the other schemes. Round Robin using vertical task clustering maximizes power consumption due to an increased makespan. Thus, vertical task clustering is the most suitable clustering technique when targeting energy efficiency, and combining vertical clustering with MAXMIN can improve energy saving.
[Bar chart: Energy Consumption (kWatts) for Montage-50, comparing horizontal and vertical clustering under MINMIN, MAXMIN, MCT, RR, DAS and FCFS]
Fig. 1. Energy Consumption in case of light load: Montage with 50 tasks.
[Bar chart: Energy Consumption (kWatts) for Montage-100, comparing horizontal and vertical clustering under MINMIN, MAXMIN, MCT, RR, DAS and FCFS]
Fig. 2. Energy consumption in the case of middle load: Montage with 100 tasks
[Bar chart: Energy Consumption (kWatts) for Montage-1000, comparing horizontal and vertical clustering under MINMIN, MAXMIN, MCT, RR, DAS and FCFS]
Fig. 3. Energy consumption in the case of heavy load: Montage with 1000 tasks
4 Conclusion
In cloud computing, reducing energy consumption is a serious requirement. One way to achieve this is to spread the load among the nodes of cloud datacenters in such a way as to avoid host over- or under-utilization. Task scheduling that minimizes resource usage remains an important means of achieving power-aware load balancing
in a cloud environment. In this study, we demonstrated by simulation that task clustering contributes to the energy consumption of the datacenter. The combination of vertical task clustering with the MAXMIN scheduling policy helps reduce energy consumption compared to the other scheduling solutions using the horizontal technique. In future work, we will extend our simulation study to the other task clustering schemes and to different types of scientific workflows.
References
1. Saadi, Y., El Kafhali, S.: Energy-efficient strategy for virtual machine consolidation in cloud environment. Soft Comput. 24, 14845–14859 (2020). https://doi.org/10.1007/s00500-020-04839-2
2. Chaudhry, A., M. A. R.: A two-way street: green big data processing for a greener smart grid. IEEE Syst. J. 11(2), 784–795 (2017)
3. Rincón, D., Agustí-Torra, A., Botero, J.F., Raspall, F., Remondo, D., Hesselbach, X., Beck, M.T., de Meer, H., Niedermeier, F., Giuliani, G.: A novel collaboration paradigm for reducing energy consumption and carbon dioxide emissions in data centres. Comput. J. 56(12), 1518–1536 (2013)
4. Ma, Y., Ma, G., Zhang, S., Zhou, F.: Cooling performance of a pump-driven two phase cooling system for free cooling in data centers. Appl. Therm. Eng. 95, 143–149 (2016)
5. Buyya, R., Vecchiola, C., Selvi, S.T.: Mastering Cloud Computing: Foundations and Applications Programming. Morgan Kaufmann, Burlington (2013)
6. Rivoire, S., Shah, M.A., Ranganathan, P., Kozyrakis, C., Meza, J.: Models and metrics to enable energy-efficiency optimizations. Computer 40(12), 39–48 (2007)
7. Gao, Y., Guan, H., Qi, Z., Wang, B., Liu, L.: Quality of service aware power management for virtualized data centers. J. Syst. Archit. 59(4), 245–259 (2013)
8. Poess, M., Nambiar, R.O.: Energy cost, the key challenge of today's data centers. Proc. VLDB Endow. 1(2), 1229–1240 (2008)
9. Beloglazov, A., Abawajy, J., Buyya, R.: Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing. Future Gener. Comput. Syst. 28(5), 755–768 (2012)
10. Juve, G., Chervenak, A., Deelman, E., Bharathi, S., Mehta, G., Vahi, K.: Characterizing and profiling scientific workflows. Future Gener. Comput. Syst. 29(3), 682–692 (2013)
11. da Silva, R., Juve, G., Deelman, E.: Toward fine-grained online task characteristics estimation in scientific workflows. In: Proceedings of the 8th Workshop on Workflows in Support of Large-Scale Science, pp. 58–67 (2013)
12. Chen, W., Deelman, E.: WorkflowSim: a toolkit for simulating scientific workflows in distributed environments. In: 2012 IEEE 8th International Conference on E-Science, Chicago, IL, pp. 1–8 (2012). https://doi.org/10.1109/eScience.2012.6404430
13. Berriman, G.B., Deelman, E., Good, J.C., Jacob, J.C., Katz, D.S., Kesselman, C., Laity, A.C., Prince, T.A., Singh, G., Su, M.-H.: Montage: a grid-enabled engine for delivering custom science-grade mosaics on demand, p. 221 (2004)
14. Graves, R., Jordan, T.H., Callaghan, S., Deelman, E., Field, E., Juve, G., Kesselman, C., Maechling, P., Mehta, G., Milner, K., Okaya, D., Small, P., Vahi, K.: CyberShake: a physics-based seismic hazard model for southern California. Pure Appl. Geophys. 168(3–4), 367–381 (2011)
15. SIPHT. [Online]. https://pegasus.isi.edu/applications/sipht
16. Brown, D.A., Brady, P.R., Dietz, A., Cao, J., Johnson, B., McNabb, J.: A case study on the use of workflow technologies for scientific analysis: gravitational wave data analysis. In: Taylor, I.J., Deelman, E., Gannon, D.B., Shields, M. (eds.) Workflows for e-Science, pp. 39–59. Springer, London (2007)
17. Chen, W., Silva, R.F.D., Deelman, E., Sakellariou, R.: Balanced task clustering in scientific workflows. In: 2013 IEEE 9th International Conference on e-Science, Beijing, pp. 188–195 (2013). https://doi.org/10.1109/eScience.2013.40
18. Chavan, D.V., et al.: Comparative performance analysis of task clustering methods in cloud computing. In: National Conference on Recent Trends in Computer Science and Information Technology (NCRTCSIT-2016), pp. 50–52 (2016). e-ISSN: 2278-0661, p-ISSN: 2278-8727
19. Singh, G., Su, M., Vahi, K., Deelman, E., Berriman, B., Good, J., Katz, D.S., Mehta, G.: Workflow task clustering for best effort systems with Pegasus. In: Proceedings of the 15th ACM Mardi Gras Conference: From Lightweight Mash-ups to Lambda Grids (MG 2008), Article 9, pp. 1–8. Association for Computing Machinery, New York, NY, USA (2008)
20. Fan, X., Weber, W.-D., Barroso, L.A.: Power provisioning for a warehouse-sized computer. In: Proceedings of the 34th Annual International Symposium on Computer Architecture, pp. 13–23. ACM (2007)
21. Kusic, D., Kephart, J.O., Hanson, J.E., Kandasamy, N., Jiang, G.: Power and performance management of virtualized computing environments via lookahead control. Cluster Comput. 12(1), 1–15 (2009)
22. Choudhary, A., Govil, M.C., Singh, G., Awasthi, L.K., Pilli, E.S.: Task clustering-based energy-aware workflow scheduling in cloud environment. In: 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), Exeter, United Kingdom, pp. 968–973 (2018). https://doi.org/10.1109/HPCC/SmartCity/DSS.2018.00160
23. Singh, V., Gupta, I., Jana, P.K.: An energy efficient algorithm for workflow scheduling in IaaS cloud. J. Grid Comput. 18, 357–376 (2020). https://doi.org/10.1007/s10723-019-09490-2
24. Khorsand, R., Ramezanpour, M.: An energy-efficient task-scheduling algorithm based on a multi-criteria decision-making method in cloud computing. Int. J. Commun. Syst. 33, e4379 (2020)
25. Safari, M., Khorsand, R.: PL-DVFS: combining power-aware list-based scheduling algorithm with DVFS technique for real-time tasks in cloud computing. J. Supercomput. 74(10), 5578–5600 (2018)
26. Alahmadi, A., Che, D., Khaleel, M., Zhu, M.M., Ghodous, P.: An innovative energy-aware cloud task scheduling framework. In: 2015 IEEE 8th International Conference on Cloud Computing, pp. 493–500. IEEE (2015)
27. Fernández-Cerero, D., Jakóbik, A., Grzonka, D., Kołodziej, J., Fernández-Montes, A.: Security supportive energy-aware scheduling and energy policies for cloud environments. J. Parallel Distrib. Comput. 119, 191–202 (2018)
28. Garg, N., Goraya, M.S.: Task deadline-aware energy-efficient scheduling model for a virtualized cloud. Arabian J. Sci. Eng. 43(2), 829–841 (2018)
29. Kaur, T., Chana, I.: Energy efficiency techniques in cloud computing. ACM Comput. Surv. 48(2), 1–46 (2015). https://doi.org/10.1145/2742488
30. Lis, A., Sudolska, A., Pietryka, I., Kozakiewicz, A.: Cloud computing and energy efficiency: mapping the thematic structure of research. Energies 13(16), 4117 (2020). https://doi.org/10.3390/en13164117
Fuzzy System for Facial Emotion Recognition
Kanika Gupta1, Megha Gupta1, Jabez Christopher1(B), and Vasan Arunachalam2
1 Department of Computer Science and Information Systems, BITS Pilani, Hyderabad Campus, Hyderabad, Telangana, India
[email protected]
2 Department of Civil Engineering, BITS Pilani, Hyderabad Campus, Hyderabad, Telangana, India
Abstract. Fuzzy logic-based systems can be used for representing and handling the vagueness and uncertainty involved in predicting human emotions. The performance of a fuzzy inference system depends on the design of the system. The fuzzification step involves the design of the membership functions that characterize the fuzzy set of each linguistic variable. In this work, triangular, trapezoidal, Gaussian and bell membership functions are used for designing fuzzy systems for facial expression-based emotion recognition. Each input variable is validated and tested with three, four, five and seven fuzzy sets for all four membership functions. The ideally partitioned range of the input variable is used as a baseline for comparison, and the parameters of the membership functions are tuned using Genetic Algorithms and Adaptive Neural Networks. Features are extracted using the OpenFace toolkit and a subset of features relevant to each emotion is chosen. The inference system is validated and tested using the benchmark CK+ dataset. The performance of the fuzzy inference system is analyzed individually for the seven basic emotions on the basis of membership function design, and also as a whole with respect to emotion recognition. The fuzzy systems with 4 and 5 fuzzy sets and Gaussian and bell membership functions achieve RMSE values of 0.162, 0.053 and 0.123 for the emotions contempt, happy and sad respectively. The average testing accuracy for all emotions is 88%; the emotions happy, anger and disgust are predicted with 99%, 96% and 95% accuracy respectively. The observations and conclusions of this study highlight the tacit knowledge gained in the design and analysis of different configurations of fuzzy inference systems, which can be used by knowledge engineers to design better systems for facial emotion recognition and allied applications. Keywords: Fuzzy inference system · Membership function · Genetic algorithms · Adaptive neural network · Facial emotion recognition
1 Introduction
Emotion recognition is the procedure of inferring human emotion from facial expressions, speech, or non-verbal signals such as gestures, postures and body movements. Human interaction is greatly enriched by comprehending and realizing how to react to
the behaviors expressed by individuals. Emotion recognition systems find applications mainly in the fields of robotics, telecommunications, gaming, psychology, behavioral study, human resource management, education, etc. Recent advancement in these domains has become a major motivation for the development of robust systems for recognizing emotions. The main focus of this work is recognizing emotions using facial expressions. Emotions can be categorized into seven fundamental emotions, viz. happy, sad, surprise, fear, anger, disgust and contempt, on which research in the field of emotion recognition is premised [1]. Emotions result in various facial muscle movements that humans exhibit inadvertently. Based on the facial muscles, different facial movements can be parameterized and further used to represent different facial expressions. In the construction of these parameter sets, two major and promising attempts have been made: the Facial Action Coding System (FACS) and the Facial Animation Parameters (FAPs). FACS is a system to codify the facial muscle movements that form a facial expression. Emotional FACS [1] maps various combinations of AUs and their intensities to emotions. FACS intensities are annotated by adding letters A-E (for minimum to maximum intensity) to the number of the action unit, where A means only a trace of the emotion is detected and E means the emotion is detected with maximum intensity. Using FACS, programmers can manually code almost any anatomically possible facial expression, breaking it into the basic action units (AUs) that compose the expression. The proposed work leverages the fact that AUs are independent of interpretation and can be used for higher-order decision-making processes, including the identification of emotions or programming commands for an intelligent environment. Existing strategies for the classification of emotions use various image processing and machine learning algorithms. Machine learning algorithms are employed to develop models capable of learning the relationship between AUs, facial muscle movements and emotions. Many researchers have strived to solve the major challenges in developing FER systems using extant machine learning techniques such as SVM, Random Forest and the Weber Local Binary Image Cosine Transform (WLBI-CT) [2, 3]. Modeling emotions and mapping them to intensity values requires mathematical functions and modeling approaches. Classical logic perceives all forms of knowledge and conclusions as crisp values: zero or one, true or false. However, problems arise when a solution possesses variable answers and there are no crisp distinctions between them. In such situations, the conclusions are the result of reasoning based on vague, incomplete knowledge in which the sampled results are mapped to a spectrum. In order to handle these uncertainties and vagueness, fuzzy logic and soft computing approaches are more appropriate than classical Boolean logic and mathematical approaches. Fuzzy logic is founded on the theory that humans arrive at conclusions based on indefinite and non-arithmetic information which is subjective in many cases. Fuzzy logic is a mathematical representation of vague and imprecise information, capable of recognizing, representing, manipulating, interpreting and utilizing data that lacks certainty.
1.1 Overview of FIS
Fuzzy arithmetic and fuzzy logic are the foundational concepts of a Fuzzy Inference System (FIS). A FIS consists of four modules, namely a Fuzzification module, a Knowledge
Base, a Fuzzy Inference Engine and a Defuzzification module. The process of translating the 'crisp' inputs into fuzzy values is called fuzzification. Each crisp input belongs to a linguistic variable that corresponds to the domain of application. Linguistic variables in a fuzzy system use non-numeric values to support the interpretation of rules and of information that cannot be expressed by the numeric values of mathematical variables. Each linguistic variable is represented by more than one fuzzy set. A membership function (MF) characterizes the behavior of a fuzzy set; it maps each point in the input space to a membership value (or membership degree) between 0 and 1. Mathematically, in a universe of discourse U, the membership function µF of a fuzzy set F on U is defined as µF: U → [0, 1]. It quantifies the degree of membership of an element of U (the crisp value) in the fuzzy set F (the fuzzified value). Fuzzy sets and the membership functions associated with them are commonly defined with respect to the linguistic variables and the domain requirements. The knowledge base is a collection of rules, developed by conversion of the 'crisp' inputs to fuzzy values; it constitutes facts in terms of linguistic variables, resulting in approximate reasoning. The mapping of output fuzzy sets into a crisp number is known as defuzzification. There are several methods to perform defuzzification, such as center of gravity, center of sums, center of the largest area, first of the maxima, middle of the maxima, maximum criterion and height defuzzification. Fuzzy inference systems can be applied to develop classification models for tasks which involve uncertainties and also require human reasoning; emotion recognition is one such typical task. The next section presents an overview of some recent facial emotion recognition (FER) systems employing fuzzy logic.
1.2 Fuzzy Inference Systems for Emotion Recognition
Proposed FER systems can be distinguished by whether they analyze facial expressions with respect to an image of the neutral expression, consider the "deformation" of a single facial expression, or take into account the differences in facial expression over a sequence of images. Some of the early systems rely on the FACS developed by Ekman and Friesen, which allows quantifying facial expression in terms of 44 action units (AUs). Cowie et al. proposed a rule-based fuzzy system for emotion recognition that characterizes human emotion into six different universal expressions (joy, surprise, anger, disgust, sadness) [4]. The facial feature points, namely the left-, right-, top- and bottom-most coordinates of the eye and mouth masks, the left and top coordinates of the eyebrow masks, as well as the nose coordinates, are used as input to the fuzzy system. Esau et al. proposed a real-time facial expression recognition model using fuzzy logic on the basis of distortions rather than AUs [5]. Several researchers have contributed to the field of emotion recognition based on facial expressions using fuzzy inference systems (FIS), machine learning approaches, optimization approaches and many other computing methods [6, 7]. These works share a common objective, namely accuracy and efficiency, while considering different methods and parameters. Gheorghe et al. proposed a method that employs fuzzified measurements of the face (eyebrow, eyelid and mouth) from the detected human faces, which are transferred to the decisional fuzzy system [6]. Our work on the other
hand builds seven different FIS, one for each emotion, and combines their outputs to obtain the detected emotion. The major objective of our work is to determine the effects of the parameters used for building the fuzzy systems: the number of fuzzy sets and the type of membership function. Knowledge engineers and developers interact with domain experts to acquire the different parameters of the FIS. However, lack of expertise and pitfalls in the knowledge acquisition process may penalize the efficiency of the FIS. The performance of a FIS primarily depends on the factors that influence the fuzzification module, some of which are as follows: the number of fuzzy sets for each linguistic variable; the choice of membership function that characterizes the fuzzy sets; and finally, fitting appropriate values to the parameters of the membership functions. This work is based on all three aforementioned factors, and a substantial amount of experimental analysis is conducted to achieve the main objectives, which are:
1. Design fuzzy inference systems using Genetic Algorithms and Adaptive Neural Networks.
2. Identify suitable parameters, namely the membership function type and the number of fuzzy sets, to develop a fuzzy inference system for emotion recognition.
3. Provide guidelines on the choice and tuning of membership functions for designing optimal recognition systems for each individual emotion.
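As background for the membership-function choices compared in this work, the following sketch shows how the four MF families described in Sect. 1.1 could be implemented and used to fuzzify a crisp AU intensity. All parameter values and the set names are illustrative assumptions, not the tuned values of the proposed system.

```python
import numpy as np

def trimf(x, a, b, c):
    """Triangular MF with feet a, c and peak b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def trapmf(x, a, b, c, d):
    """Trapezoidal MF with feet a, d and shoulders b, c."""
    rising, falling = (x - a) / (b - a), (d - x) / (d - c)
    return np.maximum(np.minimum(np.minimum(rising, falling), 1.0), 0.0)

def gaussmf(x, mean, sigma):
    """Gaussian MF."""
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2)

def gbellmf(x, a, b, c):
    """Generalized bell MF with width a, slope b and centre c."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

# Fuzzify a crisp AU intensity (OpenFace AU_r values lie roughly in [0, 5]).
# Three Gaussian sets ("low", "medium", "high") with assumed parameters.
x = np.array(2.7)
sets = {"low": (0.0, 0.9), "medium": (2.5, 0.9), "high": (5.0, 0.9)}
print({name: float(gaussmf(x, m, s)) for name, (m, s) in sets.items()})
```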
2 Proposed System
The architecture comprises two modules, namely the preprocessing module and the prediction module, as shown in Fig. 1. The functions, processes and intermediate outputs of these modules are discussed in the following subsections.
2.1 Preprocessing Module
Preprocessing generally refers to the data cleaning operations and data transformations applied to the raw data in order to enhance its quality and thereby also catalyze the performance of the subsequent machine learning and knowledge-mining phases. The Extended Cohn-Kanade (CK+) dataset is the input to the preprocessing module. Compared to the Cohn-Kanade (CK) dataset, the extended version (CK+) adds 107 sequences and another 26 subjects to the 486 sequences across 97 subjects of the original CK distribution. CK+ provides 593 labelled (FACS-coded) sequences of subjects' impressions of one of the seven basic emotion categories: anger, happy, sad, surprise, disgust, fear and contempt. A subset of these action units is encoded to depict the existence of a facial expression. More details on the CK+ dataset can be found in [8].
Feature Extraction. The proposed system uses an independent rule-based fuzzy system for each category of emotion. Each fuzzy system contributes to determining the degree to which the facial expression belongs to the corresponding emotion. The process of feature extraction utilizes OpenFace [9] to extract features from a sequence of images. OpenFace is a real-time performance toolkit for facial behavior
Fig. 1. Proposed Emotion Recognition System Architecture
analysis. It can detect and track multiple faces by processing videos. OpenFace is used to determine the intensities of the AUs in a given input image, using a HOG-SVM algorithm for face detection and the Constrained Local Neural Fields (CLNF) algorithm for landmark detection. To extract action units from images, the CK+ dataset is loaded into OpenFace. The results for a sequence of images contain the image id and, for each AU, its regression value (AU_r) depicting the intensity, its classification value (AU_c, 0 if not present and 1 if present) and a confidence score, along with additional information that is not of interest in this study. On the whole, there are 711 features, which include 68 facial landmarks, 17 facial action units and miscellaneous attributes that are less relevant for our purpose.
Feature Subset Selection. The process of identifying and removing irrelevant features from a set of extracted features, so that learning algorithms can work effectively, is called feature subset selection; it is also known as attribute selection or dimensionality reduction. In this work, the feature selection phase involves two subtasks: obtaining derived features from the entire set of features, and selecting an optimal subset. The former subtask applies a mathematical transformation to the primitive features to obtain derived features. The latter selects an optimal subset of primitive features based on the obtained derived features. The AU_r (intensity) and AU_c (presence) values are used to derive the new features. Derived features are a disjunctive normal form (sum of products) of AU_c and AU_r. This ensures, for instance, that if the value of AU_c (presence)
is 0, the corresponding AU_r (intensity) will not contribute to determining the emotion. These derived features are generated for all seven emotion categories. Feature subset selection is based on the derived features calculated in the previous stage. For each emotion, the action units (AUs) are ranked based on the values of the derived feature, where the higher the value, the smaller the rank. The next step is to investigate the differences between the derived-feature values of adjacent AUs (in increasing order of rank), looking for a significant jump in value; the AUs are clustered into two groups bounded by this interval. Based on these observations, established facts (stated by EFACS) and the set of higher-ranked AUs, a subset is selected for each emotion. The selected subset of AUs contributes most to the impression of the emotion, as depicted by the value of the derived feature. This can be validated against studies that describe emotions in terms of action units [1]. Thus, the selected feature subsets are sufficient and optimal to represent the respective emotions. The final subset for each emotion is listed in Table 1.
Table 1. Subset of features selected for each emotion
Anger: AU4, AU7, AU17, AU20
Contempt: AU12, AU14, AU20
Disgust: AU4, AU6, AU7, AU9, AU20
Fear: AU1, AU5, AU7, AU20
Happy: AU6, AU7, AU12, AU20, AU25
Sad: AU1, AU4, AU15, AU20
Surprise: AU1, AU2, AU5, AU20, AU26
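The derived-feature computation and the gap-based cut described above could be sketched as follows. The per-image dictionaries of (AU_c, AU_r) pairs, the largest-gap cut rule and the sample values are illustrative assumptions about one plausible reading of the procedure, not the authors' code.

```python
from collections import defaultdict

def derived_feature_scores(samples):
    """Accumulate, per AU, the sum over images of AU_c * AU_r, so that the
    intensity only counts when the AU is detected as present.
    Each sample is a dict: {"AU04": (au_c, au_r), ...}."""
    scores = defaultdict(float)
    for sample in samples:
        for au, (present, intensity) in sample.items():
            scores[au] += present * intensity
    return dict(scores)

def select_subset(scores):
    """Rank AUs by decreasing score and cut at the largest gap between
    adjacent ranked values (the 'significant jump' mentioned above)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 3:
        return [au for au, _ in ranked]
    gaps = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return [au for au, _ in ranked[:cut]]

# Illustrative happy-class samples (values are made up).
happy = [
    {"AU06": (1, 3.2), "AU12": (1, 4.1), "AU04": (0, 1.0), "AU25": (1, 2.0)},
    {"AU06": (1, 2.9), "AU12": (1, 3.8), "AU04": (1, 0.4), "AU25": (1, 1.7)},
]
print(select_subset(derived_feature_scores(happy)))
```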
2.2 Prediction Module
The outcome of the preprocessing module, the preprocessed data with 705 records and the optimal subsets of action units for each emotion, is the input to the prediction module. It is split into training and testing sets using a randomized stratified technique with an 80:20 ratio. This module consists of the Genetic-algorithm-optimized Fuzzy Inference System (GAFIS) and the Adaptive Neuro-Fuzzy Inference System (ANFIS). For each of the seven emotions, a separate FIS is developed. These fuzzy systems are combined to generate an emotion recognition system capable of categorizing impressions into one of the seven emotions; the combined FIS employs a winner-takes-all approach for inference. In this work, multiple GAFIS and ANFIS with varying combinations of MF type (trimf, trapmf, gaussmf, gbellmf) and number of MFs (3, 4, 5, 7) are developed. Each system is built with a homogeneous membership function type. These systems are compared to identify the optimal parameters for developing an emotion recognition system using fuzzy logic. A baseline (ideal) FIS is developed where the membership functions are ideally
distributed and only the knowledge base is learned by using GA. This baseline model is used as a benchmark against which GAFIS with tuned MFs and ANFIS are evaluated.
Genetic Algorithm Optimized Fuzzy Inference Systems. Genetic algorithms are a type of stochastic algorithm developed to imitate the process of natural selection. GA is a randomized search algorithm that operates on string-like structures (chromosomes) which evolve with time. After every generation, a new chromosome is created from parts of the fittest members. Genetic algorithms work on encoded parameters, not the actual values. Learning rules using GA results in the rule base (knowledge base) of the FIS; it does not require any predefined (user-defined) rules. For learning the rules, each chromosome is initialized randomly and encodes the whole rule base of the FIS. Formally:
N = number of membership functions for a linguistic variable;
S = number of rules;
L = number of linguistic variables;
M_i^j = j-th fuzzy set of the i-th linguistic variable;
P_i^j = parameter set of the membership function of M_i^j.
A rule R can be represented by

R = M_1^{a_1} || M_2^{a_2} || M_3^{a_3} || ... || M_L^{a_L}, where each a_i is a random integer in [1, N]    (1)

A chromosome C in the learning phase can be denoted by

C = R_1 || R_2 || R_3 || ... || R_S, where each R_i is selected randomly    (2)

For tuning GAFIS, the parameters of the membership functions are adjusted using the rules learned in the learning process. The rule base is not modified; only the fuzzy sets are fine-tuned. Let Q_i represent the parameter set of the i-th linguistic variable, given by

Q_i = [P_i^1 || P_i^2 || ... || P_i^N]    (3)

A chromosome C in the tuning phase can be denoted by

C = [Q_1 || Q_2 || ... || Q_N]    (4)
Fuzzy System for Facial Emotion Recognition
543
4. Variation: Variation of individuals is achieved by applying the mutation operator. This work uses the gaussian mutation operator which is controlled by the scale parameter and shrink parameter. The former controls the standard deviation of the mutation, whereas the latter controls the mutation decrease rate. The fittest chromosome from the last population forms the rule base for our fuzzy inference system when learning rules using GA. On the other hand, the fittest chromosome from the last population defines the fuzzy sets for each linguistic variable, for each input when tuning FIS using GA. The parameters of GAFIS are listed in Table 2. Table 2. The parameters used for tuning GAFIS Parameter
Value/Type
Max iterations
25
Max generation
25
Population size
50 if number of variables < = 5, else 200
Variation/Crossover rate
0.80
Recombination/Mutation values
Scale = 1, Shrink = 1
Number of membership functions
3, 4, 5, 7
Type of membership functions
Triangular, Trapezoidal, Gaussian, Bell
Adaptive Neuro-Fuzzy Inference System. ANFIS is a Sugeno-type fuzzy inference system-based feed-forward network that provides adaptability. In ANFIS, linguistic variables are provided as inputs which are mapped to the corresponding membership functions in the next (first hidden) layer. This layer is completely connected to the next hidden layer, in which each perceptron represents a rule for the FIS. Each of these rules correspond to a single resultant MF in the next layer. At the output layer, output from each rule is defuzzied to obtain a single crisp value for predicting emotion. In this work, ANFIS is generated using a grid-partitioning method and employs a combination of backpropagation gradient descent and least squares method to train models on input data. The parameters of ANFIS are listed in Table 3.
2.3 Performance Analysis Performance analysis is a technique involving systematic observations and inference to enhance performance of the system and improve the decision-making process. In this work, four different metrics are used to evaluate the prediction results. Root mean squared error (RMSE) is assessed for verification of the type of rough fuzzy sets for each emotion. Metrics like accuracy using confusion matrix, KL divergence and Intersection (Jaccard) similarity are evaluated to analyze the performance of GAFIS and ANFIS based facial emotion recognition systems against a baseline GAFIS with ideal membership functions.
544
K. Gupta et al. Table 3. The parameters used for tuning ANFIS Parameter
Value/Type
Input layer neurons
25
Output layer neurons 25 Learning algorithm
Hybrid
Error tolerance
0.1
Generation
Grid partitioning
Training epochs
100
It can be inferred that the happy class achieves highest accuracy on the CK + dataset, which means that happy expression is more distinguishable than the other ones. Detailed experimental results are tabulated in Sect. 3.
3 Results and Discussion Experiment is performed with a two-fold evaluation focus: 1) Evaluating each emotion to determine the best representation in terms of MFs type and MFs number and; 2) Evaluating GAFIS and ANFIS against ideal FIS to the optimal parameters for FER systems. In the first scenario, to measure the performance of linguistic condition for emotion recognition using facial components, RMSE values for each variation of an emotion is compared, while in the second scenario comparison is carried out to measure the performance of fuzzy emotion recognition as the final result. 3.1 Evaluation of FIS for Individual Emotions For emotion Anger (refer Table 4.), RMSE values show less deviation in ideal and GAFIS. In ANFIS systems, for triangular and gaussian type membership function (MF), there is a significant decrease in RMSE value as the number of MFs increases. This can be the result of overfitting the model. FIS systems with 3, 4, and 5 fuzzy sets show comparable prediction results with trapezoidal and gaussian bell MFs, but the performance decreases when 7 fuzzy sets are used. The best RMSE value achieved for emotion anger is 0.2969 (trapezoidal, 5). The RMSE values for ideal FIS and GAFIS show a similar trend for emotion contempt, as seen in the case of anger, that is, the values differ marginally. In the case of ANFIS systems, the errors increase with the increase in the number of fuzzy sets. The best performance is achieved from tuned system with 4 fuzzy sets of gaussian type resulting in an RMSE value of 0.1773. For disgust emotion, a peak in the RMSE value is observed with 4 and 5 fuzzy sets of the gaussian bell and trapezoidal membership functions (MF) respectively. ANFIS system with trapezoidal membership functions (3 fuzzy sets) and an RMSE value of 0.1224 perform best overall. ANFIS with trapezoidal MF show comparable performance to GAFIS with trapezoidal MF, with least RMSE value of 0.159 for emotion fear. For happy emotion, the RMSE values (least, 0.053) are significantly less than other emotions which can be attributed to a greater number
0.36
0.34
0.36
0.36
0.34
0.35
0.37
0.36
0.34
0.34
0.34
0.47
1.36
2.31
0.33
Ideal, trapmf, 4
Ideal, trapmf, 5
Ideal, trapmf, 7
Ideal, gaussmf, 3
Ideal, gaussmf, 4
Ideal, gaussmf, 5
Ideal, gaussmf, 7
Ideal, gbellmf, 3
Ideal, gbellmf, 4
Ideal, gbellmf, 5
Ideal, gbellmf, 7
ANFIS, trimf, 3
ANFIS, trimf, 4
ANFIS, trimf, 5
ANFIS, trimf, 7
ANFIS, trapmf, 3
0.32
0.36
Ideal, trapmf, 3
ANFIS, trapmf, 7
0.33
Ideal, trimf, 7
0.30
0.33
Ideal, trimf, 5
0.30
0.35
Ideal, trimf, 4
ANFIS, trapmf, 5
0.36
Ideal, trimf, 3
ANFIS, trapmf, 4
Anger
FIS
0.56
0.21
0.18
0.19
1.96
0.66
0.25
0.17
0.23
0.22
0.19
0.21
0.23
0.21
0.20
0.21
0.24
0.23
0.20
0.21
0.24
0.22
0.19
0.18
Contempt
0.23
0.67
0.15
0.12
0.16
0.22
0.40
0.81
0.25
0.18
0.20
0.23
0.26
0.17
0.21
0.22
0.27
0.19
0.18
0.22
0.26
0.18
0.40
0.44
Disgust
0.17
0.17
0.18
0.18
1.01
1.22
2.72
0.27
0.21
0.19
0.21
0.22
0.21
0.18
0.22
0.21
0.22
0.18
0.20
0.22
0.21
0.19
0.20
0.18
Fear
0.17
0.10
0.15
0.30
0.10
0.07
0.09
0.12
0.17
0.14
0.15
0.22
0.18
0.12
0.16
0.21
0.18
0.15
0.13
0.22
0.18
0.14
0.14
0.09
Happy
0.31
0.25
0.20
0.24
0.62
2.09
0.24
0.22
0.22
0.21
0.22
0.23
0.22
0.20
0.23
0.22
0.23
0.22
0.23
0.22
0.23
0.20
0.22
0.21
Sad
0.83
3.69
0.86
1.56
5.17
1.75
0.28
0.96
0.29
0.29
0.29
0.28
0.31
0.29
0.28
0.87
0.30
0.30
0.29
0.87
0.29
0.29
0.28
0.87
Surprise
GAFIS, gbellmf, 7
GAFIS, gbellmf, 5
GAFIS, gbellmf, 4
GAFIS, gbellmf, 3
GAFIS, gaussmf, 7
GAFIS, gaussmf, 5
GAFIS, gaussmf, 4
GAFIS, gaussmf, 3
GAFIS, trapmf, 7
GAFIS, trapmf, 5
GAFIS, trapmf, 4
GAFIS, trapmf, 3
GAFIS, trimf, 7
GAFIS, trimf, 5
GAFIS, trimf, 4
GAFIS, trimf, 3
ANFIS, gbellmf, 7
ANFIS, gbellmf, 5
ANFIS, gbellmf, 4
ANFIS, gbellmf, 3
ANFIS, gaussmf, 7
ANFIS, gaussmf, 5
ANFIS, gaussmf, 4
ANFIS, gaussmf, 3
FIS
0.34
0.31
0.34
0.36
0.34
0.30
0.35
0.36
0.33
0.34
0.34
0.36
0.33
0.32
0.33
0.35
1.26
0.49
0.34
0.41
1.05
0.80
0.39
0.33
Anger
0.22
0.21
0.17
0.21
0.20
0.20
0.16
0.19
0.21
0.22
0.17
0.19
0.22
0.21
0.18
0.18
0.93
0.40
0.23
0.18
2.21
0.44
0.18
0.18
Contempt
0.19
0.16
0.50
0.21
0.17
0.17
0.16
0.19
0.21
0.16
0.17
0.16
0.21
0.15
0.18
0.57
0.19
0.71
0.36
0.26
0.60
0.44
Disgust
Table 4. Performance Evaluation (RMSE based) of fuzzy systems for each Emotion
0.20
0.18
0.18
0.18
0.19
0.18
0.18
0.19
0.22
0.18
0.16
0.20
0.18
0.19
0.16
0.18
1.57
0.70
2.66
0.18
1.86
1.56
1.99
0.18
Fear
0.13
0.10
0.12
0.20
0.12
0.11
0.16
0.17
0.14
0.12
0.16
0.14
0.14
0.11
0.12
0.28
0.09
0.05
0.07
0.08
0.05
0.09
Happy
0.19
0.21
0.22
0.22
0.20
0.20
0.22
0.22
0.20
0.22
0.22
0.21
0.21
0.20
0.22
0.21
0.66
0.60
0.25
0.18
0.72
0.43
0.27
0.21
Sad
0.28
0.28
0.29
0.27
0.29
0.27
0.28
0.26
0.31
0.30
0.28
0.27
0.29
0.28
0.28
0.87
0.65
1.14
0.71
0.76
1.53
0.87
Surprise
The emotion sad is most distinguishable when represented in ANFIS using Gaussian bell MFs with 3 fuzzy sets, resulting in the lowest RMSE value of 0.184. For the emotion surprise, the ideal FIS and GAFIS perform significantly better than the ANFIS based FER systems; among the ANFIS FER systems, the best performance is achieved by triangular MFs with 4 fuzzy sets, with an RMSE of 0.2037, the lowest for this emotion.
3.2 Accuracy of Ideal FIS, GAFIS and ANFIS
The accuracy of a fuzzy based FER system depends on the prediction accuracies of the underlying FIS for each emotion: the parameters that predict an emotion more accurately contribute more to the overall accuracy of the FIS. The ideal FIS serves as the baseline system against which the GAFIS with membership tuning and the ANFIS based FER systems are compared in this work. The average accuracy rate for ideal FIS based FER systems is 76%. The highest average accuracy, as shown in Table 5, results from the ideal FIS using trapezoidal MFs with 4 fuzzy sets for each linguistic variable (0.82), and the lowest from the ideal FIS using trapezoidal MFs with 7 fuzzy sets (0.68). The class happy achieves the highest average accuracy (0.96), whereas the class fear achieves the lowest (0.43). The drop in accuracy for the class fear can be attributed to class imbalance in the dataset, that is, a smaller number of positive samples for the fear class.
For this experiment, the GAFIS systems are trained to recognize an emotion from one of the seven fundamental categories and are validated using k-fold cross validation. Table 6 presents the accuracies for all variations of the GAFIS systems. The average accuracy rate for GAFIS based FER systems is 82%. The highest average accuracy is achieved by GAFIS with 4 triangular fuzzy sets for each linguistic variable (0.85) and the lowest with 4 fuzzy sets of Gaussian bell type (0.63). GAFIS performs better than the ideal FIS, and its average prediction accuracy for every emotion exceeds 50%.
To evaluate the ANFIS based FER systems, ANFIS models with 3, 4, and 5 membership functions of each type are trained. Training ANFIS with 7 membership functions for any emotion was computationally demanding given the available resources; for this reason, only the triangular and trapezoidal types with 7 fuzzy sets are tabulated, and the behaviour of the other variants (gaussmf, gbellmf) with 7 fuzzy sets can be extrapolated from these results. The average accuracy rate for ANFIS based FER systems is 88%. The highest average accuracy, as shown in Table 7, results from ANFIS using triangular, Gaussian, or Gaussian bell MFs with 5 fuzzy sets for each linguistic variable (0.91), and the lowest from trapezoidal MFs with 3 fuzzy sets (0.86). The class happy achieves the highest average accuracy (0.99), whereas the classes disgust and anger achieve an average accuracy of 0.96. The ANFIS based FER systems outperform all variations of the GAFIS based systems.
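A minimal sketch of the k-fold validation described above is given below, using scikit-learn's StratifiedKFold and assuming a caller-supplied train_fis callable (a hypothetical name) that fits a fuzzy FER model and exposes a predict method returning one of the seven emotion labels.

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score

def cross_validate_fer(X, y, train_fis, n_splits=5, seed=0):
    """k-fold cross validation of a fuzzy FER system.

    train_fis(X_train, y_train) is assumed to return a fitted model whose
    predict(X) method outputs one of the seven emotion labels.
    """
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    fold_acc = []
    for train_idx, test_idx in skf.split(X, y):
        model = train_fis(X[train_idx], y[train_idx])
        fold_acc.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))
    return float(np.mean(fold_acc)), fold_acc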
Table 5. Class-wise and average accuracy of the ideal FIS based FER systems. Columns group the membership function type (tri, trap, gaus, bell) by number of fuzzy sets (MF: 3, MF: 4, ...); rows list the emotions Anger, Contempt, Disgust, Fear, Happy, Sad, Surprise and the average accuracy (Avg Acc).