Proceedings of International Conference on Data Science and Applications: ICDSA 2022, Volume 2 9811966338, 9789811966330

This book gathers outstanding papers presented at the International Conference on Data Science and Applications (ICDSA 2022).


English · 907 pages · 2023


Table of contents :
Preface
Contents
Editors and Contributors
Improving River Streamflow Forecasting Utilizing Multilayer Perceptron-Based Butterfly Optimization Algorithm
1 Introduction
2 Study Area
3 Methodology
3.1 MLP
3.2 BOA
4 Results and Discussions
5 Conclusion
References
COVID-19 Contact Tracing Using Low Calibrated Transmission Power from BLE—Approach and Algorithm Experimentation
1 Introduction
2 Overcoming RSSI Shortcomings
2.1 Internal Factors
2.2 External Factors
3 Solution Approach—Experimentation and Results
3.1 Experimentation, Hardware and Software Set-Up
3.2 Experimentation Part One—Proof of Concept
3.3 Experimentation Part Two—Algorithm Test
3.4 Discussion
4 Conclusion
References
Monitoring Loud Commercials in Television Broadcast
1 Introduction
2 Materials and Methods
3 Results and Discussion
4 Conclusion
References
Potential Customers Prediction in Bank Telemarketing
1 Introduction
2 Dataset and Preprocessing
2.1 Data Description
2.2 Data Correlation
2.3 Category Data Encoding
3 Experiments and Results
3.1 Data Mining Models
3.2 Results
4 Conclusion
References
Analysis and Implementation of Normalisation Techniques on KDD’99 Data Set for IDS and IPS
1 Introduction
2 Intrusion Detection System
2.1 Taxonomy of IDS
2.2 Intrusion Detection Methodologies
2.3 IDS and Their Functions
2.4 Data Sets for IDS
3 Approaches of Machine Learning
3.1 Decision Tree
3.2 Naive Bayes
3.3 K-nearest Neighbour
3.4 Artificial Neural Network
3.5 Support Vector Machines
3.6 Fuzzy Logic
4 Proposed Research Work
5 Normalisation
5.1 Quick Overview of Normalisation Techniques
6 Literature Review
7 Proposed Work
8 Research Methodology
8.1 Evaluation Metrics
9 Result and Discussion
9.1 Comparison of Normalisation Techniques
10 Z-Score Implementation on KDD CUP ’99
11 Conclusion
References
Deep Neural Networks Predicting Student Performance
1 Introduction
2 Methodology
2.1 Dataset and Data Processing
2.2 Deep Neural Network
3 Results and Discussions
4 Conclusion
References
An Efficient Group Signature Scheme Based on ECDLP
1 Introduction
2 Preliminaries
2.1 Background of Elliptic Curve Group
2.2 ECDLP Assumption
3 Proposed GS Scheme
3.1 KGC
3.2 Extract
3.3 GroupSign
3.4 GroupVerif
4 Security Analysis
5 Application
6 Conclusion
References
Sentiment Analysis of COVID-19 Tweets Using TextBlob and Machine Learning Classifiers
1 Introduction
2 Related Works
3 Design and Analysis
3.1 Data Extraction
3.2 Data Preprocessing
3.3 Visualization
3.4 Model Training
3.5 Support Vector Machine (SVM)
3.6 Classifier and Performance Measure
4 Findings and Discussion
5 Conclusion and Future Work
References
Reflection of Star Ratings on Online Customer Reviews; Its Influence on Consumer Decision-Making
1 Introduction to Star Ratings and Online Customer Reviews
2 Literature Review
3 Methodology
4 Discussion
4.1 Sentiment Analysis
4.2 Star Rating and Sentiment Analysis
4.3 Emotion Analysis
4.4 Discrepancy of Star Rating with Sentiments and Emotions of the Customer Reviews
5 Major Findings and Recommendations
5.1 Recommendations
5.2 Scope for Further Study
5.3 Optimization Process to Assess Star Rating and Reviews
6 Conclusion
Appendix
References
An Integrated Machine Learning Approach Predicting Stock Values Using Order Book Details
1 Introduction
2 Review of Literature
3 Proposed Methodology
3.1 Multiple Linear Regression (MLR)
3.2 Long Short-Term Memory (LSTM)
3.3 K-means Clustering
3.4 Bayesian Correlation
4 Sample Data Analysis
5 Conclusion
References
Hybrid Genetic-Bees Algorithm in Multi-layer Perceptron Optimization
1 Introduction
2 Methodology
2.1 Overview of Bees Algorithm (BA)
2.2 Improving Global Search Phase of BA by Employing GA
2.3 Our Proposed Method: HGBA for Training MLP
3 Experiments Settings
4 Results and Analysis
4.1 Mean Squared Error
4.2 Shrink Factor (sf)
4.3 The Number of Scout Bees
4.4 MSE and Accuracy
5 Conclusion
References
Intellectual Identification Method of the Egg Development State Based on Deep Neural Nets
1 Introduction
2 The Ovoscoping Model on the Basis of the Convolution Neural Net of 2D Generalized LeNet
3 The Ovoscoping Model on the Basis of the Visual Transformer of One-Block ViT
4 Choice of Quality Criterion of the Method for Identification of the State of Egg Development
5 Determination of the Identification Method Structure for the State of the Egg
6 Numerical Research
7 Conclusions
References
Predicting Order Processing Times in E-Pharmacy Supply Chains During COVID Pandemic Using Machine Learning—A Real-World Study
1 Introduction
2 Literature Review
3 Problem Statement
4 Materials and Methods
5 Feature Engineering
6 Model Construction and Training
7 ML Regressors for Order Processing Time Prediction Problem
8 ML Classifiers for Shipment Time Prediction Problem
9 Results
10 Discussion
11 Conclusion and Future Work
11.1 Disclosure Statement
11.2 Funding
References
Cognitive Science: An Insightful Approach
1 Introduction
2 Background and Related Work
2.1 Artificial Intelligence
2.2 Neuroscience
2.3 Artificial Neural Network
2.4 Robotics
3 Information Processing Approach
3.1 Recent Advances
3.2 Implementation Approach
4 Conclusion
References
Predicting the Dynamic Viscosity of Biodiesels at 313 K Using Empirical Models
1 Introduction
2 Material and Methods
2.1 Data
2.2 Machine Learning Models (MLMs)
2.3 QM and MLR
3 Results and Discussion
3.1 MLMs
3.2 QM and MLR
3.3 Performance Evaluation of Machine Learning Models, MLR and QM
4 Conclusions
References
Artificial Neural Networks, Quadratic Regression, and Multiple Linear Regression in Modeling Cetane Number of Biodiesels
1 Introduction
2 Material and Methods
2.1 Data
2.2 Machine Learning Models
2.3 QM and MLR Models
3 Results and Discussion
3.1 Machine Learning Models
3.2 Mathematical Models
3.3 Performance Evaluation of Machine Learning Models, MLR, and QM
4 Conclusions
References
AI-Based Automated Approach for Trend Data Generation and Competitor Benchmark to Enhance Voice AI Services
1 Introduction
2 Related Work
3 Proposed Framework
4 Trending Data Generation
4.1 Generation of Unstructured Utterances
4.2 Generation of Structured Utterances
5 AI-Based Voice Solutions Benchmarking
5.1 Model Evaluation Concepts on BERT
5.2 ASR End-to-End Evaluation Method
5.3 Model Based E2E Evaluation of Trend Data
5.4 Model-Based NLU and E2E Evaluation for Native Apps
6 Results and Impact
6.1 Assessment of Effectiveness Result
7 Conclusion and Future Scope
References
Identification of ADHD Disorder in Children Using EEG Based on Visual Attention Task by Ensemble Deep Learning
1 Introduction
2 Methods
2.1 Subjects
2.2 Preprocessing
2.3 Classification Models
2.4 Experimentation Setup
3 Results
3.1 Results of Independent Architecture
3.2 Result of Ensemble Framework
3.3 Comparison of Classification Results
4 Conclusion
References
Machine Learning as a Service (MLaaS)—An Enterprise Perspective
1 Introduction
2 Machine Learning Applications
2.1 Health care
2.2 Education
2.3 Economy and Finance
2.4 Social Networks
2.5 Complementary Applications
3 Companies that Develop Machine Learning Techniques
4 Data Protection Privacy in Machine Learning
5 Trends in Machine Learning Jobs
6 GPUs Evolution
7 Conclusions
References
Very Low Illumination Image Enhancement via Lightness Mapping
1 Introduction
2 The Proposed Method
3 Image Quality Metrics
4 Experimental Results
5 Conclusion
6 Future Work
References
Clustering High Dimensional Transcriptomic Data with Spectral Clustering for Patient Subtyping
1 Introduction
2 Proposed Methodology
2.1 t-distributed Stochastic Neighbor Embedding (t-SNE)
2.2 Spectral Clustering
3 Results and Discussion
4 Conclusion
References
3D CNN-Based Classification of Severity in COVID-19 Using CT Images
1 Introduction
2 Materials and Methods
2.1 Introduction to 3D CNN
2.2 Dataset Description
2.3 Data Pre-processing
2.4 Model Design
2.5 Results and Discussion
3 Conclusions and Future Scope
References
A Hybrid Architecture for Action Recognition in Videos Using Deep Learning
1 Introduction
2 Literature Survey
2.1 Data Set
3 Proposed Architecture for Activity Identification
3.1 Architecture Diagram
3.2 Implementation
4 Results and Discussion
5 Conclusion
References
Data Envelopment Analysis: A Tool for Performance Evaluation of Undergraduate Engineering Programs
1 Introduction
1.1 Data Envelopment Analysis
1.2 DEA Versus Conventional Efficiency Approaches
1.3 Review of Literature
2 Problem Definition and Requirement Analysis
3 Data Collection and Validation
4 Solution Design
4.1 Data Normalizing
4.2 Importing Data into R and Installing Packages
4.3 Designing the DEA Model on R
4.4 Verification on Banxia’s Frontier Analyst
5 Analysis of Results
5.1 Results Obtained in R
5.2 Results Obtained on Banxia’s Frontier Analyst
5.3 Summary of Results
6 Conclusion and Future Enhancements
Appendix 1: Program Code for DEA of Departments Using R
References
The Role of Big Data in Color Trend Forecasting: Scope and Challenges-A Systematic Literature Review
1 Introduction
2 Methodology
3 Applications of Big Data in Color Forecasting
3.1 Big Data and Artificial Intelligence
3.2 Curating Datasets
3.3 Color Forecasting Process
4 Benefits and Challenges of Integrating Big Data in Color Forecasting
5 Discussions
6 Conclusions
References
Forensic Facial Recognition: Review and Challenges
1 Introduction
2 Facial Recognition Systems
3 Challenges of Forensic Facial Recognition
3.1 Facial Aging
3.2 Sketch Recognition
3.3 Facial Artifacts
3.4 Degraded/Uncontrolled Conditions
3.5 Pose Variations
3.6 3D Reconstructions
3.7 Access Control
3.8 Low Quality Images
3.9 Imperfect Facial Data
3.10 Privacy Preservations
3.11 Plastic Surgery
4 Proposed System
4.1 Input Image
4.2 Pre-processing
4.3 Face Detection
4.4 Feature Extraction
4.5 Classification
5 Implementation
5.1 Parameter Selection
5.2 Experimental Setup
6 Result Analysis
7 Conclusion
References
Spatio-Temporal Analysis of Urbanization by Using Supervised Image Classification with Correlation of Land Surface Temperature and Topography
1 Introduction
2 Study Area
3 Materials and Methods
3.1 Data Collection and Workflow
3.2 Land Use and Land Cover Classification
3.3 Calculation of NDVI
3.4 Calculation of NDWI
3.5 Land Surface Temperature (LST)
3.6 Urban Heat Island Retrieval
3.7 Correlation Analysis
4 Results and Discussions
4.1 Land Use and Land Cover Analysis
4.2 Normalized Difference Vegetation Index Distribution Analysis (NDVI)
4.3 Normalized Difference Water Index Distribution Analysis
4.4 Land Surface Temperature Distribution (LST)
4.5 Correlation Analysis Between LST and the Factors Influencing LST
5 Conclusions and Suggestions
References
COVID-DenseNet: A Deep Learning Architecture to Detect COVID-19 from Chest Radiology Images
1 Introduction
2 Related Works
2.1 Deep Learning in Computer Vision
2.2 Detection of COVID-19
3 Methodology
3.1 Data Generation
3.2 Preprocessing
3.3 Model Architecture
3.4 Model Implementation
3.5 Prediction and Heatmap Generation
4 Results and Discussion
4.1 Main Results
4.2 Detailed Results
4.3 Patient-Wise Cross-Validation
4.4 Different Initial Weights
4.5 Comparison with Standard Computer Vision Models
4.6 Qualitative Analysis
5 Limitations and Future Works
6 Conclusion
References
Pneumonia Chest X-ray Classification Using Support Vector Machine
1 Introduction
2 Methodology
2.1 Dataset
2.2 Image Segmentation
2.3 Feature Extraction
2.4 Classification
3 Results
4 Conclusions
References
Linking Social Media Data and Clinical Methods to Detect Depression Using Artificial Intelligence: A Review
1 Introduction
2 Modules
2.1 Identification
2.2 Screening
2.3 Eligibility
2.4 Included
3 Analysis of Papers on Social Media Data
3.1 Limitations Found in the Use of Social Media Data
4 Analysis of Papers Using Medical Data
4.1 Unimodal
4.2 Multimodal
5 Reviewed Databases for Medical Data
6 Discussions Based on the Above-Mentioned Databases
7 Preprocessing
8 Feature Extraction
9 Result
9.1 Textual Data
9.2 Acoustic Data
9.3 Limitations
10 Conclusion
References
Triplet Multi-task Learning Strategy for Person Re-identification Using Deep Learning
1 Introduction
2 Related Work
3 Triplet Multi-task Learning Strategy
3.1 Region Aligned Pooling
3.2 Semantic Segmentation
3.3 Triplet Prediction (Triplet Loss)
3.4 Triplet Multi-loss Training
4 Experiment
4.1 Dataset
4.2 Implementation Details
4.3 Results and Discussion
4.4 Ablation Study
5 Conclusion
References
K-Means Algorithm to Form Dynamic Cluster Formation to Counter the Static Property of K-Means
1 Introduction
2 Literature Survey
3 K-Means Explored
3.1 Data Redundancy
3.2 Load Balancing
3.3 High Availability
3.4 Monitoring and Automation
4 Research Methodology
4.1 Pseudo Code for the Algorithm
4.2 Distance Formula
4.3 Proposed Methodology for Cluster Count
4.4 Cluster Formation
4.5 Cluster Count Iteration for 300 Dataset Point
5 Result and Analysis
5.1 Cluster Count Iteration for 500 Dataset Point
5.2 Cluster Count Iteration for 2000 Dataset Point
5.3 Cluster Count Iteration for 3000 Dataset Point
5.4 Time Comparison for Formed Clusters
6 Conclusion
References
Analysis of an Efficient Elite Group-Based Routing Protocol for Wireless Sensor Networks
1 Introduction
2 Related Work
3 Proposed Work
3.1 Proposed Network Model
3.2 Cluster Head Selection Scheme
3.3 Association of Nodes
3.4 Selection of Elite Group
4 Result and Analysis
5 Conclusion
References
A Systematic Study of Fake News Detection Systems Using Machine Learning Algorithms
1 Introduction
2 Literature Review
3 A Framework of Fake News Detection (FND) System
4 Various Fake News Classification Methods
4.1 Logistic Regression (LR) Classifier
4.2 KNN Classifier
4.3 Decision Tree (DT) Classifier
4.4 Random Forest (RF) Classifier
4.5 Naïve Bayes (NB) Classifier
4.6 Support Vector Machine (SVM) Classifier
4.7 Long Short-Term Memory (LSTM) Classifier
5 Existing Result Analysis
6 Conclusion and Further Work
References
Training Logistic Regression Model by Hybridized Multi-verse Optimizer for Spam Email Classification
1 Introduction
2 Preliminaries and Related Works
3 Proposed Method
3.1 Basic MVO Algorithm
3.2 MVO Hybridized with ABC Metaheuristics
4 Experimental Setup, Findings and Comparative Analysis
5 Conclusion
References
Optimization of Spatial Pyramid Pooling Module Placement for Micro-expression Recognition
1 Introduction
2 Recent Works
2.1 Traditional Feature-Based Approach
2.2 Convolutional Neural Network Feature-Based Approach
3 Methodology
3.1 Dataset
3.2 Convolutional Neural Network Model
3.3 Spatial Pyramid Pooling
4 Result and Discussion
5 Conclusion
References
Image Colorization: A Convolutional Network Approach
1 Introduction
2 State of Art
3 Methodology
3.1 Hyperparameter
3.2 Data Pre-processing
3.3 Network Architecture
4 Experimental Results and Discussion
4.1 About Dataset
4.2 System Configuration
4.3 Result and Discussion
5 Conclusion and Future Work
References
Prediction of Particulate Matter (PM2.5) Across India Using Machine Learning Methods
1 Introduction
2 Methodology
2.1 Data Collection
2.2 Preparation of Data Set
2.3 Machine Learning Algorithms Used for Building Models
2.4 Procedure of Designing Prediction Models
3 Results and Discussion
4 Conclusion and Future Work
References
Convolutional Neural Network for COVID-19 Detection
1 Introduction
2 Literature Overview
2.1 Existing Work
2.2 Drawback of Existing Work
2.3 Our Contribution
3 Proposed Work
4 Results and Discussion
4.1 Supported by X-ray Model
4.2 Supported by CT Scan Model
5 Conclusion
References
Posit Extended RISC-V Processor and Its Enhancement Using Data Type Casting
1 Introduction
2 Background
2.1 Posit Format
2.2 RISC-V ISA
3 Approaches to Enhance RISC-V ISA with Posit Arithmetic
3.1 Posit Integration as a Tightly Coupled Unit by Replacing the F-Extension
3.2 Posit Integration as an Accelerator by Utilizing the Custom Opcode Space of the RISC-V ISA
3.3 Posit Integration as a Tightly Coupled Unit Using Custom Opcode Space
4 Implementation of Data Type Casting in RV32IMF_XPosit
4.1 MOT: Mixed Operand Type Block
4.2 DTC: Data Type Converter Block
5 Implementation Results
6 Conclusion
References
Securing Microservice-Driven Applications Based on API Access Graphs Using Supervised Machine Learning Techniques
1 Introduction
2 Related Work
3 Proposed System
4 Experiment
4.1 Dataset
4.2 The Graph
4.3 node2vec
4.4 Classification Algorithms
5 Results and Discussion
6 Future Research Directions
7 Conclusion
References
Scaling and Cutout Data Augmentation for Cardiac Segmentation
1 Introduction
2 Related Work
3 Methods
3.1 Overview
3.2 CNN Network Structure
3.3 Data Augmentation
4 Experiment Result and Discussion
4.1 Dataset
4.2 Experimental Setup
5 Conclusion
References
An Improved Method to Recognize Bengali Handwritten Characters Using CNN
Abstract
1 Introduction
2 Proposed Methodology
2.1 Image Dataset Collection and Preprocessing
3 Experimental Analysis and Discussion
3.1 Training and Validation Accuracy and Loss
3.2 Performance Accuracy
3.3 Testing Result
4 Conclusion and Future Work
References
Dynamic Pricing for Electric Vehicle Charging at a Commercial Charging Station in Presence of Uncertainty: A Multi-armed Bandit Reinforcement Learning Approach
1 Introduction
2 Model and Problem Formulation
3 Solution Techniques
4 Simulations and Results
5 Conclusion
References
Photo Restoration: A Sequential Pipeline Approach Involving Denoising and Deblurring
1 Introduction
2 Related Works
3 Methodology
3.1 Pipeline Overview
3.2 Dataset Consolidation
3.3 Metrics Used
4 Implementation Details
4.1 Spatial Filters
4.2 Deblurring
4.3 Denoising
5 Experiments and Results
5.1 Noise Generation
5.2 Pipeline Formation
5.3 Experimental Analysis
5.4 Performance Evaluation
6 Conclusion and Future Work
References
Development of a Linear-Scaling Consensus Mechanism of the Distributed Data Ledger Technology
1 Introduction
2 Analysis of the Problem of Scalability and Security of Distributed Ledger Systems
3 Statement of the Main Research Material
4 Basic Probabilistic Models for Exploring the Blockchain Scalability Problem
5 Conclusion
References
A Comparative Study on Distracted Driver Detection Using CNN and ML Algorithms
1 Introduction
2 Related Work
3 Proposed Approach
3.1 Proposed CNN Model
3.2 ResNet50 Model
3.3 VGG16 Model
3.4 Logistic Regression
4 Experimental Design
4.1 Dataset Description
4.2 Preprocessing the Dataset
4.3 Result and Discussion
5 Conclusion
References
Exploring Word2vec Embedding for Sentiment Analysis of Bangla Raw and Romanized Text
1 Introduction
2 Related Works
3 Proposed Methodology
3.1 Pre-processing Data
3.2 Word Embedding
3.3 Training Deep Recurrent Neural Network Model for Sentiment Analysis
3.4 Classification
3.5 Post-processing
4 Experimental Evaluation
4.1 Experimental Setup
4.2 Dataset
4.3 Experimental Results
5 Conclusion and Future Work
References
Anomaly Based Network Intrusion Detection System for IoT
1 Introduction
2 Related Theory
2.1 Network Security
2.2 Machine Learning
2.3 Deep Learning
3 Related Work
4 Experimental Design
4.1 Dataset Description
4.2 Preprocessing the Dataset
4.3 Model Architecture
5 Results and Discussion
6 Conclusion
References
CoviIS: A Real-Time Covid Help Information System Using Digital Media
1 Introduction
2 Related Work
2.1 Covid Informatics Applications
2.2 Covid Advisory Bots
2.3 Covid Sentiment Analysis
3 Design Methodology
3.1 Statistical Analysis
3.2 Geographical Analysis
3.3 News Analysis
3.4 Social Media Analysis
3.5 Sentiment Analysis
3.6 Health Safety
4 Development
5 Results and Discussion
6 Conclusion and Future Work
References
Distributed Denial of Service Attack Detection Using Optimized Hybrid Neuro-Fuzzy Classifiers
1 Introduction
2 Related Work
3 Proposed System
3.1 Pre-processing
3.2 Extracting Features
3.3 Optimized Hybrid Neuro-Fuzzy Classifier
3.4 Grasshopper Optimization Algorithm (GOA)
4 Results and Discussion
5 Conclusion
References
An Efficient Framework for Forecasting of Crime Trend Using Machine Learning Technique
1 Introduction
1.1 Author's Contributions
1.2 Organization
2 Related Works
3 Experimental Methodologies
3.1 Naive Method
3.2 Simple Average Method
3.3 Simple Moving Average Method
3.4 Simple Exponential Smoothing
3.5 Holt's Method with Trend
3.6 Holt-Winters' Additive Method with Trend and Seasonality
3.7 Holt-Winters' Multiplicative Method with Trend and Seasonality
4 Proposed Model
5 Experimental Outcomes and Discussion
6 Conclusion and Future Work
References
Performance Evaluation of a Novel Thermogram Dataset for Diabetic Foot Complications
1 Introduction
2 Methodology
2.1 Data Acquisition and Preprocessing
2.2 Feature Extraction
2.3 Feature Selection and Classification
3 Result and Discussion
4 Conclusion
References
Improving Indoor Well-Being Through IoT: A Methodology for User Safety in Confined Spaces
1 Introduction
2 Updates from Recent Literature
3 Methodology
3.1 Methodological Path
4 Expected Impact
5 Conclusions
References
General Natural Language Processing Translation Strategy and Simulation Modelling Application Example
1 Introduction
2 Epistemic Knowledge Orgiton Model
3 Natural Language Processing Translation Strategy or Algorithm
4 Application Example for the Simulation of Movement as a Translation from Natural Language into Computer Processed Language
4.1 Minimalistic Example Set
4.2 Translation into a Computational Representation
4.3 Search- and Find Process or Search-Find-o
5 Results and Recommendation
6 Conclusion and Outlook
References
Artificial Intelligence in Disaster Management: A Survey
1 Introduction
2 Literature Survey
2.1 Disaster Management
2.2 Recent Development in NIA and Their Applications in Disaster Management
3 Conclusion
References
A Survey on Plant Leaf Disease Detection Using Image Processing
1 Introduction
2 Literature Review
3 Methodology
3.1 Image Preprocessing
3.2 Detection of Model
3.3 Feature Extraction and Training
3.4 Classification of the Object
3.5 Identification
4 Conclusion
References
Feature Importance in Explainable AI for Expounding Black Box Models
1 Introduction
2 Literature Survey
2.1 Surrogate Explainability
2.2 Local Perturbation Based Explainability
2.3 Propagation-Based Explainability
2.4 Metadata-Based Explainability
3 Implementation Details
3.1 ELI5
4 Challenges
5 Conclusion
References
Sign Language Recognition System Using Customized Convolution Neural Network
1 Introduction
2 Literature Survey
3 System Architecture
3.1 OpenCV
3.2 Convolutional Neural Network
4 Methodology
4.1 Set Histogram
4.2 Creating Dataset
4.3 Image Processing
4.4 Model Creation
4.5 Displaying Gestures
4.6 Real-Time Classification
5 Results and Discussions
5.1 Dataset
5.2 Software Used
5.3 Specifications
5.4 Confusion Matrix
6 Conclusion
References
Space Fractionalized Lattice Boltzmann Model-Based Image Denoising
1 Introduction
2 Statement of Problem
3 Formulation of Problem
4 Numerical Computations
5 Results and Discussion
6 Conclusion and Future Direction
References
A Review About Analysis and Design Methodology of Two-Stage Operational Transconductance Amplifier (OTA)
1 Introduction
2 Literature Survey
3 Design Methodology
4 Mathematical Modeling
4.1 Comparision Table
5 Conclusion
References
Design and Optimization of Wideband RF Energy Harvesting Antenna for Low-Power Wireless Sensor Applications
1 Introduction
2 Methodology
3 Antenna Design
4 Results and Discussions
5 Conclusion
References
Forecasting of Novel Corona Cases in India Using LSTM-Based Recurrent Neural Networks
1 Introduction
2 Methodology
2.1 Data Collection
2.2 Data Cleaning
2.3 Recurrent Neural Network Model
3 Performance Evaluation
4 Software Information
5 Results and Discussion
6 Conclusion
References
Algorithms for Syllogistic Using RMMR
1 Background and Historical Preliminaries
2 Retooled Method of Minimal Representation
2.1 Preface of RMMR
2.2 Propositions in RMMR
2.3 Examining Syllogisms in RMMR
3 Functioning of the RMMR
4 Algorithms for RMMR
5 Summary and Conclusion
References
Machine Learning-Based Approach for Airfare Forecasting
1 Introduction
2 Literature Survey
3 Proposed System
3.1 Data Preparation
3.2 Feature Selection
3.3 Data Analysis
4 Results and Discussion
4.1 Experimental Results
4.2 Comparative Analysis
5 Conclusion
References
Author Index


Lecture Notes in Networks and Systems 552

Mukesh Saraswat · Chandreyee Chowdhury · Chintan Kumar Mandal · Amir H. Gandomi, Editors

Proceedings of International Conference on Data Science and Applications ICDSA 2022, Volume 2

Lecture Notes in Networks and Systems Volume 552

Series Editor: Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors:
Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Turkey
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong

The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems, and others. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them.

Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.

For proposals from Asia please contact Aninda Bose ([email protected]).

Mukesh Saraswat · Chandreyee Chowdhury · Chintan Kumar Mandal · Amir H. Gandomi, Editors

Proceedings of International Conference on Data Science and Applications ICDSA 2022, Volume 2

Editors

Mukesh Saraswat, Department of Computer Science and Engineering and Information Technology, Jaypee Institute of Information Technology, Noida, India

Chandreyee Chowdhury, Department of Computer Science and Engineering, Jadavpur University, Kolkata, India

Chintan Kumar Mandal, Department of Computer Science and Engineering, Jadavpur University, Kolkata, India

Amir H. Gandomi, Data Science Institute, University of Technology Sydney, Sydney, NSW, Australia

ISSN 2367-3370
ISSN 2367-3389 (electronic)
Lecture Notes in Networks and Systems
ISBN 978-981-19-6633-0
ISBN 978-981-19-6634-7 (eBook)
https://doi.org/10.1007/978-981-19-6634-7

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

This book contains outstanding research papers presented at the 3rd International Conference on Data Science and Applications (ICDSA 2022). ICDSA 2022 was organized by the School of Mobile Computing and Communication, Jadavpur University, Kolkata, India, and technically sponsored by the Soft Computing Research Society, India. The conference was conceived as a platform for disseminating and exchanging ideas, concepts, and results of researchers from academia and industry, with the aim of developing a comprehensive understanding of the challenges of advancing intelligence from a computational viewpoint. This book will help in strengthening congenial networking between academia and industry. We have tried our best to enrich the quality of ICDSA 2022 through a stringent and careful peer-review process.

ICDSA 2022 received many contributed technical articles from distinguished participants at home and abroad: 482 research submissions from 34 different countries, viz. Afghanistan, Albania, Australia, Austria, Bangladesh, Bulgaria, Cyprus, Ecuador, Ethiopia, Germany, Ghana, Greece, India, Indonesia, Iran, Iraq, Italy, Japan, Malaysia, Morocco, Nepal, Nigeria, Saudi Arabia, Serbia, South Africa, South Korea, Spain, Sri Lanka, Taiwan, Thailand, Ukraine, USA, Vietnam, and Yemen. After a very stringent peer-review process, only 130 high-quality papers were accepted for presentation and inclusion in the final proceedings. This second volume presents 65 of those research papers on Data Science and its applications and serves as reference material for advanced research.

Mukesh Saraswat (Noida, India)
Chandreyee Chowdhury (Kolkata, India)
Chintan Kumar Mandal (Kolkata, India)
Amir H. Gandomi (Sydney, Australia)


Contents

Improving River Streamflow Forecasting Utilizing Multilayer Perceptron-Based Butterfly Optimization Algorithm . . . 1
Abinash Sahoo, Ippili Saikrishnamacharyulu, Shaswati S. Mishra, Sandeep Samantaray, and Deba Prakash Satapathy

COVID-19 Contact Tracing Using Low Calibrated Transmission Power from BLE—Approach and Algorithm Experimentation . . . 13
Thein Oak Kyaw Zaw, Saravanan Muthaiyah, Malik Manivanan Sehgar, and Ganes Raj Muthu Arumugam

Monitoring Loud Commercials in Television Broadcast . . . 33
Silvana Sukaj and Rosaria Parente

Potential Customers Prediction in Bank Telemarketing . . . 43
Le Dinh Huynh, Phung Thai Duong, Khuat Duy Bach, and Phan Duy Hung

Analysis and Implementation of Normalisation Techniques on KDD’99 Data Set for IDS and IPS . . . 51
V. Priyalakshmi and R. Devi

Deep Neural Networks Predicting Student Performance . . . 71
Kandula Neha, Ram Kumar, S. Jahangeer Sidiq, and Majid Zaman

An Efficient Group Signature Scheme Based on ECDLP . . . 81
Namita Tiwari, Amit Virmani, and Ashutosh Tripathi

Sentiment Analysis of COVID-19 Tweets Using TextBlob and Machine Learning Classifiers . . . 89
P. Kathiravan, R. Saranya, and Sridurga Sekar

Reflection of Star Ratings on Online Customer Reviews; Its Influence on Consumer Decision-Making . . . 107
C. Selvaraj and Anitha Nallasivam


An Integrated Machine Learning Approach Predicting Stock Values Using Order Book Details . . . 129
Hemantkumar Wani and S. H. Sujithkumar

Hybrid Genetic-Bees Algorithm in Multi-layer Perceptron Optimization . . . 145
Truong Tran Mai Anh and Tran Duc Vi

Intellectual Identification Method of the Egg Development State Based on Deep Neural Nets . . . 159
Eugene Fedorov, Tetyana Utkina, and Tetiana Neskorodieva

Predicting Order Processing Times in E-Pharmacy Supply Chains During COVID Pandemic Using Machine Learning—A Real-World Study . . . 175
Mahesh Babu Mariappan, Kanniga Devi, and Yegnanarayanan Venkataraman

Cognitive Science: An Insightful Approach . . . 199
Manjushree D. Laddha, Harsha R. Gaikwad, Harishchandra Akarte, and Sanil Gandhi

Predicting the Dynamic Viscosity of Biodiesels at 313 K Using Empirical Models . . . 209
Youssef Kassem, Hüseyin Çamur, Tuğberk Özdemir, and Bawa Bamaiyi

Artificial Neural Networks, Quadratic Regression, and Multiple Linear Regression in Modeling Cetane Number of Biodiesels . . . 217
Youssef Kassem, Hüseyin Çamur, George Edem Duke, and Abdalla Hamada Abdelnaby

AI-Based Automated Approach for Trend Data Generation and Competitor Benchmark to Enhance Voice AI Services . . . 225
Jayavel Kanniappan, Jithin Gangadharan, Rajesh Kumar Jayavel, and Aravind Nadanasabapathy

Identification of ADHD Disorder in Children Using EEG Based on Visual Attention Task by Ensemble Deep Learning . . . 243
Swati Aggarwal, Nupur Chugh, and Arnav Balyan

Machine Learning as a Service (MLaaS)—An Enterprise Perspective . . . 261
Ioannis Grigoriadis, Eleni Vrochidou, Iliana Tsiatsiou, and George A. Papakostas

Very Low Illumination Image Enhancement via Lightness Mapping . . . 275
Ahmed Rafid Hashim, Hana H. Kareem, and Hazim G. Daway


Clustering High Dimensional Transcriptomic Data with Spectral Clustering for Patient Subtyping . . . 291
Arif Ahmad Rather and Manzoor Ahmad Chachoo

3D CNN-Based Classification of Severity in COVID-19 Using CT Images . . . 301
R. Leena Sri, Divya Vetriveeran, and Rakoth Kandan Sambandam

A Hybrid Architecture for Action Recognition in Videos Using Deep Learning . . . 313
Kakarla Ajay Kumar Reddy, Ch. Vijayendra Sai, Sundeep V. V. S. Akella, and Priyanka Kumar

Data Envelopment Analysis: A Tool for Performance Evaluation of Undergraduate Engineering Programs . . . 323
Vaidehi Bhaskara, K. T. Ramesh, and Sayan Chakraborty

The Role of Big Data in Color Trend Forecasting: Scope and Challenges-A Systematic Literature Review . . . 337
Siddhali Doshi

Forensic Facial Recognition: Review and Challenges . . . 351
Ipsita Pattnaik, Amita Dev, and A. K. Mohapatra

Spatio-Temporal Analysis of Urbanization by Using Supervised Image Classification with Correlation of Land Surface Temperature and Topography . . . 369
R. Marianne Rhea and S. Thangaperumal

COVID-DenseNet: A Deep Learning Architecture to Detect COVID-19 from Chest Radiology Images . . . 397
Md. Mohaiminul Islam, Tanveer Hannan, Laboni Sarker, and Zakaria Ahmed

Pneumonia Chest X-ray Classification Using Support Vector Machine . . . 417
S. Nagashree and B. S. Mahanand

Linking Social Media Data and Clinical Methods to Detect Depression Using Artificial Intelligence: A Review . . . 427
Anushka Choudhury, Muskan Didwania, P. C. Karthik, and Saad Yunus Sait

Triplet Multi-task Learning Strategy for Person Re-identification Using Deep Learning . . . 447
Shavantrevva Bilakeri and A. K. Karunakar

K-Means Algorithm to Form Dynamic Cluster Formation to Counter the Static Property of K-Means . . . 463
Narender Kumar, Vinesh Kumar Jain, and Jyoti Gajrani


Analysis of an Efficient Elite Group-Based Routing Protocol for Wireless Sensor Networks . . . 479
Rupal Shukla and Ashwini Kumar

A Systematic Study of Fake News Detection Systems Using Machine Learning Algorithms . . . 495
Ravish and Rahul Katarya

Training Logistic Regression Model by Hybridized Multi-verse Optimizer for Spam Email Classification . . . 507
Miodrag Zivkovic, Aleksandar Petrovic, Nebojsa Bacanin, Marko Djuric, Ana Vesic, Ivana Strumberger, and Marina Marjanovic

Optimization of Spatial Pyramid Pooling Module Placement for Micro-expression Recognition . . . 521
Marzuraikah Mohd Stofa, Mohd Asyraf Zulkifley, Muhammad Ammirrul Atiqi Mohd Zainuri, and Mohd Hairi Mohd Zaman

Image Colorization: A Convolutional Network Approach . . . 533
Nitesh Pradhan, Saransh Gupta, and Gaurav Srivastava

Prediction of Particulate Matter (PM2.5) Across India Using Machine Learning Methods . . . 545
Rikta Sen, Ashis Kumar Mandal, Saptarsi Goswami, and Basabi Chakraborty

Convolutional Neural Network for COVID-19 Detection . . . 557
Pulkit Agarwal, Neeraj Yadav, Rishav Kumar, and Rahul Thakur

Posit Extended RISC-V Processor and Its Enhancement Using Data Type Casting . . . 571
Ashley Kurian and M. Ramesh Kini

Securing Microservice-Driven Applications Based on API Access Graphs Using Supervised Machine Learning Techniques . . . 587
B. Aditya Pai, Anirudh P. Hebbar, and Manoj M. V. Kumar

Scaling and Cutout Data Augmentation for Cardiac Segmentation . . . 599
Elizar Elizar, Mohd Asyraf Zulkifley, and Rusdha Muharar

An Improved Method to Recognize Bengali Handwritten Characters Using CNN . . . 611
Monishanker Halder, Sudipta Kundu, and Md. Ferdows Hasan

Dynamic Pricing for Electric Vehicle Charging at a Commercial Charging Station in Presence of Uncertainty: A Multi-armed Bandit Reinforcement Learning Approach . . . 625
Ubaid Qureshi, Mehreen Mushtaq, Juveeryah Qureshi, Mir Aiman, Mansha Ali, and Shahnawaz Ali


Photo Restoration: A Sequential Pipeline Approach Involving Denoising and Deblurring . . . 637
Abhijnya Bhat, Sejal Priya, Abhijnan Bajpai, and S. Natarajan

Development of a Linear-Scaling Consensus Mechanism of the Distributed Data Ledger Technology . . . 647
Gennady Shvachych, Boris Moroz, Andrii Matviichuk, Hanna Sashchuk, Oleksandr Dzhus, and Volodymyr Busygin

A Comparative Study on Distracted Driver Detection Using CNN and ML Algorithms . . . 663
Annu Dhiman, Anukrity Varshney, Faeza Hasani, and Bindu Verma

Exploring Word2vec Embedding for Sentiment Analysis of Bangla Raw and Romanized Text . . . 677
Sumaiya Kashmin Zim, Fardeen Ashraf, Tasnia Iqbal, Md. Adnanul Islam, Isbat Khan Polok, Lopa Ahmed, Md. Mahbubur Rahman, and Md. Saddam Hossain Mukta

Anomaly Based Network Intrusion Detection System for IoT . . . 693
Gitesh Prajapati, Pooja Singh, and Rahul

CoviIS: A Real-Time Covid Help Information System Using Digital Media . . . 707
Niharika Ganji, Arnab Sinhamahapatra, Shubhi Bansal, and Nagendra Kumar

Distributed Denial of Service Attack Detection Using Optimized Hybrid Neuro-Fuzzy Classifiers . . . 725
Pallavi H. Chitte and Sangita S. Chaudhari

An Efficient Framework for Forecasting of Crime Trend Using Machine Learning Technique . . . 741
Bam Bahadur Sinha and Tarun Biswas

Performance Evaluation of a Novel Thermogram Dataset for Diabetic Foot Complications . . . 757
Naveen Sharma, Sarfaraj Mirza, Ashu Rastogi, Prasant K. Mahapatra, and Satbir Singh

Improving Indoor Well-Being Through IoT: A Methodology for User Safety in Confined Spaces . . . 767
Mariangela De Vita, Eleonora Laurini, Marianna Rotilio, Vincenzo Stornelli, and Pierluigi De Berardinis

General Natural Language Processing Translation Strategy and Simulation Modelling Application Example . . . 781
Bernhard Heiden and Bianca Tonino-Heiden


Artificial Intelligence in Disaster Management: A Survey . . . 793
Suchita Arora, Sunil Kumar, and Sandeep Kumar

A Survey on Plant Leaf Disease Detection Using Image Processing . . . 807
U. Lathamaheswari and J. Jebathangam

Feature Importance in Explainable AI for Expounding Black Box Models . . . 815
Bikram Pratim Bhuyan and Sudhanshu Srivastava

Sign Language Recognition System Using Customized Convolution Neural Network . . . 825
Dipmala Salunke, Ram Joshi, Nihar Ranjan, Pallavi Tekade, and Gaurav Panchal

Space Fractionalized Lattice Boltzmann Model-Based Image Denoising . . . 839
P. Upadhyay

A Review About Analysis and Design Methodology of Two-Stage Operational Transconductance Amplifier (OTA) . . . 849
Usha Kumari and Rekha Yadav

Design and Optimization of Wideband RF Energy Harvesting Antenna for Low-Power Wireless Sensor Applications . . . 861
Geetanjali, Poonam Jindal, Nitin Saluja, Neeru Kashyap, and Nitika Dhingra

Forecasting of Novel Corona Cases in India Using LSTM-Based Recurrent Neural Networks . . . 873
Sawan Kumar Tripathi, Sanjeev Mishra, and S. D. Purohit

Algorithms for Syllogistic Using RMMR . . . 885
Sumanta Sarathi Sharma and Varun Kumar Paliwal

Machine Learning-Based Approach for Airfare Forecasting . . . 901
L. Sherly Puspha Annabel, G. Ramanan, R. Prakash, and S. Sreenidhi

Author Index . . . 913

Editors and Contributors

About the Editors

Mukesh Saraswat is an Associate Professor at Jaypee Institute of Information Technology, Noida, India. Dr. Saraswat obtained his Ph.D. in Computer Science and Engineering from ABV-IIITM Gwalior, India. He has more than 19 years of teaching and research experience. He has guided three Ph.D. students and is presently guiding four more. He has published more than 70 journal and conference papers in the areas of image processing, pattern recognition, data mining, and soft computing. He was part of a successfully completed project on image analysis funded by SERB, New Delhi, and is currently running a project funded by CRS, RTU, Kota. He has been an active member of many organizing committees for various conferences and workshops. He is also a guest editor of Array, the Journal of Swarm Intelligence, and the Journal of Intelligent Engineering Informatics, one of the General Chairs of the International Conference on Data Science and Applications, an Editorial Board Member of the journal MethodsX, and a series editor of the SCRS Book Series on Computing and Intelligent Systems (CIS). He is an active member of the IEEE, ACM, CSI, and SCRS professional bodies. His research areas include image processing, pattern recognition, data mining, and soft computing.

Chandreyee Chowdhury is an Associate Professor in the Department of Computer Science and Engineering at Jadavpur University, India. She received her M.E. in Computer Science and Engineering in 2005 and her Ph.D. in 2013, both from Jadavpur University. Her research interests include IoT in healthcare, indoor localization, and human activity recognition. She was awarded a post-doctoral fellowship by Erasmus Mundus in 2014 to carry out research work at Northumbria University, UK. She has served on the technical program committees of many international conferences. She has published more than 100 papers in reputed journals, book chapters, and international peer-reviewed conferences. She is a member of the IEEE and the IEEE Computer Society.


Chintan Kumar Mandal is presently working in the Department of Computer Science and Engineering at Jadavpur University, Kolkata, India. He completed his graduation and post-graduation in Computer Science and Engineering at Calcutta University, followed by his Ph.D. from Motilal Nehru National Institute of Technology, Allahabad. Prior to these, he completed his degree in Physics with Honours at Calcutta University. He has published papers in various journals and conferences. His areas of interest are Computational Geometry, Computer Graphics, and Robotics.

Amir H. Gandomi is a Professor of Data Science and an ARC DECRA Fellow at the Faculty of Engineering and Information Technology, University of Technology Sydney. Prior to joining UTS, Prof. Gandomi was an Assistant Professor at Stevens Institute of Technology, USA, and a distinguished research fellow at the BEACON Center, Michigan State University, USA. Professor Gandomi has published over 300 journal papers and 12 books which collectively have been cited more than 31,000 times (h-index = 81). He has been named one of the most influential scientific minds and received Highly Cited Researcher awards (top 1% of publications and 0.1% of researchers) for five consecutive years, 2017 to 2021. He also ranked 17th in the GP bibliography among more than 15,000 researchers. He has received multiple prestigious awards for his research excellence and impact, such as the 2022 Walter L. Huber Prize, known as the highest-level mid-career research award in all areas of civil engineering. He has served as an associate editor, editor, and guest editor for several prestigious journals, including as an associate editor of IEEE TBD and IEEE IoTJ. Professor Gandomi is active in delivering keynotes and invited talks. His research interests are (big) data analytics and global optimisation.

Contributors

Abdalla Hamada Abdelnaby Faculty of Engineering, Mechanical Engineering Department, Near East University, Nicosia, North Cyprus B. Aditya Pai Department of Information Science and Engineering, Nitte Meenakshi Institute of Technology, Bangalore, Karnataka, India Pulkit Agarwal Department of Electronics and Communication Engineering, Delhi Technological University, New Delhi, Delhi, India Swati Aggarwal Netaji Subhas Institute of Technology, New Delhi, India Lopa Ahmed Military Institute of Science and Technology, Dhaka, Bangladesh Zakaria Ahmed Enosis Solutions, Dhaka, Bangladesh Mir Aiman Department of Electrical Engineering, Institute of Technology, University of Kashmir, Srinagar, India


Harishchandra Akarte Department of Computer Engineering, Dr. Babasaheb Ambedkar Technological University, Lonere, Maharashtra, India Sundeep V. V. S. Akella Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India Mansha Ali Department of Electrical Engineering, Institute of Technology, University of Kashmir, Srinagar, India Shahnawaz Ali Department of Electrical Engineering, Institute of Technology, University of Kashmir, Srinagar, India Truong Tran Mai Anh School of Industrial Engineering and Management, International University, Vietnam National University, Ho Chi Minh City, Vietnam Suchita Arora Department of Computer Science and Engineering, Amity School of Engineering and Technology, Amity University Rajasthan, Jaipur, Rajasthan, India Ganes Raj Muthu Arumugam Multimedia University, Cyberjaya, Selangor, Malaysia Fardeen Ashraf Military Institute of Science and Technology, Dhaka, Bangladesh Nebojsa Bacanin Singidunum University, Belgrade, Serbia Khuat Duy Bach Computer Science Department, FPT University, Hanoi, Vietnam Abhijnan Bajpai Department of Computer Science and Engineering, PES University, Bangalore, Karnataka, India Arnav Balyan Netaji Subhas Institute of Technology, New Delhi, India Bawa Bamaiyi Faculty of Engineering, Mechanical Engineering Department, Near East University, Nicosia, North Cyprus Shubhi Bansal Indian Institute of Technology Indore, Indore, India Vaidehi Bhaskara Department of Industrial Engineering and Management, B.M.S College of Engineering, Bengaluru, India Abhijnya Bhat Department of Computer Science and Engineering, PES University, Bangalore, Karnataka, India Bikram Pratim Bhuyan Department of Informatics, School of Computer Science, University of Petroleum and Energy Studies, Dehradun, India Shavantrevva Bilakeri Department of Data Science and Computer Applications, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India Tarun Biswas Indian Institute of Information Technology Ranchi, Ranchi, India Volodymyr Busygin VUZF University (Higher School of Insurance and Finance), Sofia, Bulgaria


Hüseyin Çamur Faculty of Engineering, Mechanical Engineering Department, Near East University, Nicosia, North Cyprus Manzoor Ahmad Chachoo Department of Computer Sciences, University of Kashmir, Srinagar, JK, India Basabi Chakraborty Dean and Professor, School of Computing Science, Madanapalle Institute of Technology and Science, Madanapalle, AP, India Sayan Chakraborty Department of Operations and IT, ICFAI Business School, Hyderabad, India Sangita S. Chaudhari Ramrao Adik Institute of Technology, Dr. D. Y. Patil Deemed to be University, Navi Mumbai, India Pallavi H. Chitte Ramrao Adik Institute of Technology, Dr. D. Y. Patil Deemed to be University, Navi Mumbai, India Anushka Choudhury SRM Institute of Science and Technology, Kattankulathur, Chennai, India Nupur Chugh Netaji Subhas Institute of Technology, New Delhi, India Hazim G. Daway Department of Physics, College of Science, Mustansiriyah University, Baghdad, Iraq Pierluigi De Berardinis Department of Civil, Building and Environmental Engineering, University of L’Aquila, L’Aquila, Italy Mariangela De Vita Department of Civil, Building and Environmental Engineering, University of L’Aquila, L’Aquila, Italy Amita Dev Indira Gandhi Delhi Technical University for Women (IGDTUW), New Delhi, India Kanniga Devi Department of Computer Science, Kalasalingam Academy of Research and Education, Krishnankoil, India R. Devi Department of Computer Science, Chennai, VISTAS, India Annu Dhiman Department of Information Technology, Delhi Technological University, New Delhi, India Nitika Dhingra Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India Muskan Didwania SRM Institute of Science and Technology, Kattankulathur, Chennai, India Marko Djuric Singidunum University, Belgrade, Serbia Siddhali Doshi Department of Fashion Communication, Symbiosis Institute of Design, Pune, India


George Edem Duke Faculty of Engineering, Mechanical Engineering Department, Near East University, Nicosia, North Cyprus Phung Thai Duong Computer Science Department, FPT University, Hanoi, Vietnam Oleksandr Dzhus Taras Shevchenko National University of Kyiv, Kyiv, Ukraine Elizar Elizar University Kebangsaan Malaysia, Bangi, Selangor, Malaysia; Universitas Syiah Kuala, Banda Aceh, Aceh, Indonesia Eugene Fedorov Cherkasy State Technological University, Cherkasy, Ukraine Harsha R. Gaikwad Department of Computer Engineering, Dr. Babasaheb Ambedkar Technological University, Lonere, Maharashtra, India Jyoti Gajrani Department of CSE, Engineering College, Ajmer, Rajasthan, India Sanil Gandhi Department of Computer Engineering, Dr. Babasaheb Ambedkar Technological University, Lonere, Maharashtra, India Jithin Gangadharan Intelligence & IoT, Samsung R&D Institute Bangalore, Bangalore, India Niharika Ganji Indian Institute of Technology Indore, Indore, India Geetanjali Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India Saptarsi Goswami Bangabasi Morning College, University of Calcutta, Kolkata, West Bengal, India Ioannis Grigoriadis MLV Research Group, Department of Computer Science, International Hellenic University, Kavala, Greece Saransh Gupta Manipal University Jaipur, Jaipur, India Monishanker Halder Department of Computer Science and Engineering, Jashore University of Science and Technology, Jashore, Bangladesh Tanveer Hannan Ludwig Maximilian University of Munich, Munich, Germany Md. Ferdows Hasan Department of Computer Science and Engineering, Jashore University of Science and Technology, Jashore, Bangladesh Faeza Hasani Department of Information Technology, Delhi Technological University, New Delhi, India Ahmed Rafid Hashim Department of Computer Science, College of Education for Pure Sciences-Ibn Al-Haitham, University of Baghdad, Baghdad, Iraq Anirudh P. Hebbar Department of Information Science and Engineering, Nitte Meenakshi Institute of Technology, Bangalore, Karnataka, India


Bernhard Heiden Carinthia University of Applied Sciences, Villach, Austria; University of Graz, Graz, Austria Phan Duy Hung Computer Science Department, FPT University, Hanoi, Vietnam Le Dinh Huynh Computer Science Department, FPT University, Hanoi, Vietnam Tasnia Iqbal Military Institute of Science and Technology, Dhaka, Bangladesh Md. Adnanul Islam Military Institute of Science and Technology, Dhaka, Bangladesh Md. Mohaiminul Islam University of North Carolina at Chapel Hill, Chapel Hill, USA S. Jahangeer Sidiq Lovely Professional University, Phagwara, Punjab, India Vinesh Kumar Jain Department of CSE, Engineering College, Ajmer, Rajasthan, India Rajesh Kumar Jayavel Intelligence & IoT, Samsung R&D Institute Bangalore, Bangalore, India J. Jebathangam Department of Computer Science, VISTAS, Chennai, India Poonam Jindal Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India Ram Joshi Department of Information Technology, JSPM’s Rajarshi Shahu College of Engineering Tathawade, Pune, India Jayavel Kanniappan Intelligence & IoT, Samsung R&D Institute Bangalore, Bangalore, India Hana H. Kareem Department of Physics, College of Education, Mustansiriyah University, Baghdad, Iraq P. C. Karthik SRM Institute of Science and Technology, Kattankulathur, Chennai, India A. K. Karunakar Department of Data Science and Computer Applications, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India Neeru Kashyap Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India Youssef Kassem Faculty of Engineering, Mechanical Engineering Department, Near East University, Nicosia, North Cyprus; Faculty of Civil and Environmental Engineering, Near East University, Nicosia, North Cyprus Rahul Katarya Big Data Analytics and Web Intelligence Laboratory, Department of Computer Science, Delhi Technological University, New Delhi, India


P. Kathiravan Department of Computer Science, Central University of Tamil Nadu, Thiruvarur, India Ashwini Kumar Department of ECE, IGDTUW, Delhi, India Manoj M. V. Kumar Department of Information Science and Engineering, Nitte Meenakshi Institute of Technology, Bangalore, Karnataka, India Nagendra Kumar Indian Institute of Technology Indore, Indore, India Narender Kumar Department of CSE, Engineering College, Ajmer, Rajasthan, India Priyanka Kumar Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India Ram Kumar Lovely Professional University, Phagwara, Punjab, India Rishav Kumar Department of Electronics and Communication Engineering, Delhi Technological University, New Delhi, Delhi, India Sandeep Kumar Department of Computer Science and Engineering, CHRIST (Deemed to be University), Bangalore, Karnataka, India Sunil Kumar Department of Computer Science and Engineering, Amity School of Engineering and Technology, Amity University Rajasthan, Jaipur, Rajasthan, India Usha Kumari Department of Electronics and Communication, DCRUST Murthal, Murthal, India Sudipta Kundu Department of Computer Science and Engineering, Jashore University of Science and Technology, Jashore, Bangladesh Ashley Kurian National Institute of Technology, Surathkal, Karnataka, India Manjushree D. Laddha Department of Computer Engineering, Dr. Babasaheb Ambedkar Technological University, Lonere, Maharashtra, India U. Lathamaheswari Department of Computer Science, VISTAS, Chennai, India Eleonora Laurini Department of Civil, Building and Environmental Engineering, University of L’Aquila, L’Aquila, Italy R. Leena Sri Department of CSE, Thiagarajar College of Engineering, Madurai, India B. S. Mahanand Department of Information Science and Engineering, JSS Science and Technology University, Mysore, India Prasant K. Mahapatra CSIR-Central Scientific Instruments Organisation, Chandigarh, India

Ashis Kumar Mandal School of Software and Information Science, Iwate Prefectural University, Iwate, Japan


R. Marianne Rhea Department of Civil Engineering, St. Joseph’s College of Engineering, Chennai, India Mahesh Babu Mariappan Department of Computer Science, Kalasalingam Academy of Research and Education, Krishnankoil, India Marina Marjanovic Singidunum University, Belgrade, Serbia Andrii Matviichuk Vernadsky National Library of Ukraine, Kyiv, Ukraine Sarfaraj Mirza CSIR-Central Scientific Instruments Organisation, Chandigarh, India Sanjeev Mishra Department of Mechanical Engineering, University Department, Rajasthan Technical University, Kota, India Shaswati S. Mishra Department of Philosophy, Utkal University, Bhubaneswar, Odisha, India A. K. Mohapatra Indira Gandhi Delhi Technical University for Women (IGDTUW), New Delhi, India Boris Moroz Dnipro University of Technology, Dnipro, Ukraine Rusdha Muharar Universitas Syiah Kuala, Banda Aceh, Aceh, Indonesia Md. Saddam Hossain Mukta United International University, Dhaka, Bangladesh Mehreen Mushtaq Department of Electrical Engineering, Institute of Technology, University of Kashmir, Srinagar, India Saravanan Muthaiyah Multimedia University, Cyberjaya, Selangor, Malaysia Aravind Nadanasabapathy Intelligence & IoT, Samsung R&D Institute Bangalore, Bangalore, India S. Nagashree Department of Information Science and Engineering, JSS Academy of Technical Education, Bangalore, India Anitha Nallasivam Jain CMS Business School, Jain University, Bengaluru, India S. Natarajan Department of Computer Science and Engineering, PES University, Bangalore, Karnataka, India Kandula Neha Lovely Professional University, Phagwara, Punjab, India Tetiana Neskorodieva Vasyl’ Stus Donetsk National University, Vinnytsia, Ukraine Tuğberk Özdemir Faculty of Engineering, Mechanical Engineering Department, Near East University, Nicosia, North Cyprus Varun Kumar Paliwal Birla Institute of Technology and Science Pilani, Pilani, Rajasthan, India


Gaurav Panchal Department of Information Technology, JSPM’s Rajarshi Shahu College of Engineering Tathawade, Pune, India George A. Papakostas MLV Research Group, Department of Computer Science, International Hellenic University, Kavala, Greece Rosaria Parente Benecon University Consortium, Naples, Italy Ipsita Pattnaik Indira Gandhi Delhi Technical University for Women (IGDTUW), New Delhi, India Aleksandar Petrovic Singidunum University, Belgrade, Serbia Isbat Khan Polok Military Institute of Science and Technology, Dhaka, Bangladesh Nitesh Pradhan Manipal University Jaipur, Jaipur, India Gitesh Prajapati Department of Software Engineering, Delhi Technological University, New Delhi, India R. Prakash Department of Information Technology, St. Joseph’s College of Engineering, Chennai, India Sejal Priya Department of Computer Science and Engineering, PES University, Bangalore, Karnataka, India V. Priyalakshmi Department of Computer Science, Chennai, VISTAS, India S. D. Purohit Department of Mechanical Engineering, University Department, Rajasthan Technical University, Kota, India; Department of HEAS (Mathematics), University Department, Rajasthan Technical University, Kota, India Juveeryah Qureshi Department of Electrical Engineering, Institute of Technology, University of Kashmir, Srinagar, India Ubaid Qureshi Department of Electrical Engineering, Indian Institute of Technology, Delhi, India; Department of Electrical Engineering, University of Kashmir, Srinagar, India Md. Mahbubur Rahman Military Institute of Science and Technology, Dhaka, Bangladesh Rahul Department of Software Engineering, Delhi Technological University, New Delhi, India G. Ramanan Department of Information Technology, St. Joseph’s College of Engineering, Chennai, India K. T. Ramesh Department of Industrial Engineering and Management, B.M.S College of Engineering, Bengaluru, India M. Ramesh Kini National Institute of Technology, Surathkal, Karnataka, India


Nihar Ranjan Department of Information Technology, JSPM’s Rajarshi Shahu College of Engineering Tathawade, Pune, India Ashu Rastogi Postgraduate Institute of Medical Education and Research, Chandigarh, India Arif Ahmad Rather Department of Computer Sciences, University of Kashmir, Srinagar, JK, India Ravish Big Data Analytics and Web Intelligence Laboratory, Department of Computer Science, Delhi Technological University, New Delhi, India Kakarla Ajay Kumar Reddy Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India Marianna Rotilio Department of Civil, Building and Environmental Engineering, University of L’Aquila, L’Aquila, Italy Abinash Sahoo National Institute of Technology Silchar, Silchar, Assam, India Ch. Vijayendra Sai Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India Ippili Saikrishnamacharyulu Department of Civil Engineering, GIET University, Gunupur, India Saad Yunus Sait SRM Institute of Science and Technology, Kattankulathur, Chennai, India Nitin Saluja Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India Dipmala Salunke Department of Information Technology, JSPM’s Rajarshi Shahu College of Engineering Tathawade, Pune, India Sandeep Samantaray Department of Civil Engineering, OUTR Bhubaneswar, Bhubaneswar, Odisha, India Rakoth Kandan Sambandam Department of CSE, CHRIST (Deemed to be University), Bangalore, India R. Saranya Department of Computer Science, Central University of Tamil Nadu, Thiruvarur, India Laboni Sarker University of California, Santa Barbara, USA Hanna Sashchuk Taras Shevchenko National University of Kyiv, Kyiv, Ukraine Deba Prakash Satapathy Department of Civil Engineering, OUTR Bhubaneswar, Bhubaneswar, Odisha, India Malik Manivanan Sehgar Multimedia University, Bukit Beruang, Melaka, Malaysia


Sridurga Sekar STRAIVE, Tamil Nadu, India C. Selvaraj Department of Business Administration, Saveetha College of Liberal Arts and Sciences (SIMATS, Deemed University), Chennai, India Rikta Sen School of Software and Information Science, Iwate Prefectural University, Iwate, Japan Naveen Sharma Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India; CSIR-Central Scientific Instruments Organisation, Chandigarh, India Sumanta Sarathi Sharma Shri Mata Vaishno Devi University, Katra, Jammu and Kashmir, India L. Sherly Puspha Annabel Department of Information Technology, St. Joseph’s College of Engineering, Chennai, India Rupal Shukla Department of ECE, IGDTUW, Delhi, India Gennady Shvachych Ukrainian State University of Science and Technology, Dnipro, Ukraine Pooja Singh Department of Software Engineering, Delhi Technological University, New Delhi, India Satbir Singh National Institute of Technology, Jalandhar, India Bam Bahadur Sinha Indian Institute of Information Technology Ranchi, Ranchi, India Arnab Sinhamahapatra National Institute of Technology Durgapur, Durgapur, India S. Sreenidhi Department of Information Technology, St. Joseph’s College of Engineering, Chennai, India Gaurav Srivastava Manipal University Jaipur, Jaipur, India Sudhanshu Srivastava Department of Informatics, School of Computer Science, University of Petroleum and Energy Studies, Dehradun, India Marzuraikah Mohd Stofa Universiti Kebangsaan Malaysia (UKM), Bangi, Selangor, Malaysia Vincenzo Stornelli Department of Industrial and Information Engineering and Economics, University of L’Aquila, L’Aquila, Italy Ivana Strumberger Singidunum University, Belgrade, Serbia S. H. Sujithkumar BIET MBA Department, Davangere, Karnataka, India Silvana Sukaj Department of Engineering and Architecture, European University of Tirana (UET), Tirana, Albania


Pallavi Tekade Department of Information Technology, JSPM’s Rajarshi Shahu College of Engineering Tathawade, Pune, India Rahul Thakur Department of Electronics and Communication Engineering, Delhi Technological University, New Delhi, Delhi, India S. Thangaperumal Department of Civil Engineering, St. Joseph’s College of Engineering, Chennai, India Namita Tiwari School of Sciences, CSJM University Kanpur, Kanpur, India Bianca Tonino-Heiden University of Graz, Graz, Austria Ashutosh Tripathi T Systems, Pune, India Sawan Kumar Tripathi Department of Mechanical Engineering, University Department, Rajasthan Technical University, Kota, India Iliana Tsiatsiou MLV Research Group, Department of Computer Science, International Hellenic University, Kavala, Greece P. Upadhyay DST-CIMS, Banaras Hindu University, Varanasi, Uttar Pradesh, India Tetyana Utkina Cherkasy State Technological University, Cherkasy, Ukraine Anukrity Varshney Department of Information Technology, Delhi Technological University, New Delhi, India Yegnanarayanan Venkataraman Department of Mathematics, School of Applied Sciences, Kalasalingam Academy of Research and Education, Krishnankoil, India Bindu Verma Department of Information Technology, Delhi Technological University, New Delhi, India Ana Vesic Singidunum University, Belgrade, Serbia Divya Vetriveeran Department of CSE, CHRIST (Deemed to be University), Bangalore, India Tran Duc Vi School of Industrial Engineering and Management, International University, Vietnam National University, Ho Chi Minh City, Vietnam Amit Virmani Computer Application, UIET Kanpur, Kanpur, India Eleni Vrochidou MLV Research Group, Department of Computer Science, International Hellenic University, Kavala, Greece Hemantkumar Wani BIET Davangere, Davangere, Karnataka, India Neeraj Yadav Department of Electronics and Communication Engineering, Delhi Technological University, New Delhi, Delhi, India Rekha Yadav Department of Electronics and Communication, DCRUST Murthal, Murthal, India


Muhammad Ammirrul Atiqi Mohd Zainuri Universiti Kebangsaan Malaysia (UKM), Bangi, Selangor, Malaysia Majid Zaman University of Kashmir, Srinagar, India Mohd Hairi Mohd Zaman Universiti Kebangsaan Malaysia (UKM), Bangi, Selangor, Malaysia Thein Oak Kyaw Zaw Multimedia University, Cyberjaya, Selangor, Malaysia Sumaiya Kashmin Zim Military Institute of Science and Technology, Dhaka, Bangladesh Miodrag Zivkovic Singidunum University, Belgrade, Serbia Mohd Asyraf Zulkifley Universiti Kebangsaan Malaysia (UKM), Bangi, Selangor, Malaysia

Improving River Streamflow Forecasting Utilizing Multilayer Perceptron-Based Butterfly Optimization Algorithm Abinash Sahoo, Ippili Saikrishnamacharyulu, Shaswati S. Mishra, Sandeep Samantaray, and Deba Prakash Satapathy

Abstract Accurate prediction and quantification of streamflow are vital to cope with climate change for proper planning and management of basins. The potential of forecasting streamflow is essential since it can assist in mitigating flood risk. Historical long-term streamflow data are necessary for watershed management, hydropower plant construction, long-term water supply use, and flood prediction. Artificial intelligence (AI) models have been applied effectively for predicting/forecasting specified variables in many engineering applications, such as river streamflow, mainly where the variable is highly nonlinear in nature and complex to identify using conventional mathematical models. In the present study, an optimization algorithm, i.e., the butterfly optimization algorithm (BOA), is proposed for enhancing the search procedure for the global optimum. The model developed is the MLP-BOA (multilayer perceptron) for forecasting streamflow using historical streamflow data collected from the Rajghat station of the Subarnarekha river basin. The classical MLP model is considered a benchmark for examining the performance of the proposed hybrid model based on two statistical measures, namely root mean squared error (RMSE) and Willmott index (WI). It was found that the MLP-BOA-4 model generates the best results, with WI of 0.9953 in training and 0.98235 in testing. Keywords Stream flow · MLP-BOA · Willmott index · Subarnarekha river A. Sahoo National Institute of Technology Silchar, Silchar, Assam, India I. Saikrishnamacharyulu Department of Civil Engineering, GIET University, Gunupur, India e-mail: [email protected] S. S. Mishra Department of Philosophy, Utkal University, Bhubaneswar, Odisha, India S. Samantaray (B) · D. P. Satapathy Department of Civil Engineering, OUTR Bhubaneswar, Bhubaneswar, Odisha, India e-mail: [email protected] D. P. Satapathy e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_1


1 Introduction
Because of population growth and the correspondingly increasing demand for energy and water, the management of water resources needs serious attention. Forecasting streamflow is a significant component of water resource systems and has always been challenging for managers and engineers of water resources [1–3]. Several activities related to the planning and operation of different components of a water resources system need future streamflow forecasts. Among the different machine learning techniques, the artificial neural network (ANN) is a promising tool on the basis of its potential to model nonlinear processes [4]. Machine learning models have been implemented successfully in predicting/forecasting various hydrological parameters: flood [5–8], sediment [9–11], runoff [12–14], and groundwater level [15, 16]. They comprise flexible mathematical structures that can map complex nonlinear relationships between input and output datasets and capture common trends without the need to describe the physical connection. Anctil and Rat [17] implemented MLP to forecast 1-day-ahead streamflow for 47 watersheds spread across the Central United States and France; results showed that good streamflow forecasting was obtained from a simple MLP. Yonaba et al. [18] used MLP for forecasting multistep-ahead streamflow in five diverse catchments with 1–5 day lead times. Uysal et al. [19] used MLP and radial basis function (RBF) models for streamflow forecasting using satellite data for a snow-dominated area of Turkey; the obtained results indicated that both models gave similar performance. Boucher et al. [26] proposed MLP and extreme learning machine (ELM) models to assimilate state variables in conceptual hydrological models for streamflow forecasting. They found that MLP ensembles provided more reliable estimations than ELM ensembles. Yet, prevailing AI models, like MLP, support vector machine (SVM), etc., are mostly incomprehensible and face problems when used for time series prediction/forecasting; consequently, the overall performance of the MLP model can yield imprecise predictions for the anticipated output. Talaee [20] employed MLP optimized with resilient back-propagation (RP), variable learning rate (GDX), and Levenberg–Marquardt (LM) for streamflow forecasting in the Aspas Watershed, situated in the Fars province of Iran. In general, the MLP-LM model yielded the best results compared to the other applied models. Kisi et al. [21] investigated and compared the accuracy of ANFIS-PSO (particle swarm optimization), ANFIS-GA (genetic algorithm), ANFIS-ACO (ant colony optimization), ANFIS-BOA, and classical ANFIS methods for forecasting drought indexes at various time scales. Fadaee et al. [22] investigated the ability of the genetic algorithm (GA) and BOA, combined with data-driven models [ANFIS, multiple linear regression (MLR), and ANN], to predict the suspended sediment load of Eagle Creek Watershed, Indiana. Results indicated that BOA outperformed GA in enhancing the performance of the proposed data-driven models. Mohamadi et al. [23] used RBFN, MLP, SVM, and ANFIS models in combination with the nomadic people algorithm (NPA) for forecasting meteorological droughts in Iran. Findings revealed that ANFIS-NPA generated superior forecasts compared to the other hybrid and standalone models. Li et al. [24] applied BOA to improve the accuracy


of network intrusion detection systems in medical IoT systems. Sammen et al. [25] combined MLP with the sunflower optimization algorithm (MLP-SFA) for streamflow prediction at the Muda Di Jeniang and Jam Seyed Omar stations of Malaysia. A comparison of the performance of MLP-SFA with MLP-GA, MLP-PSO, and classical MLP revealed that integrating optimization algorithms improved the performance of MLP, with SFA being the most effective. The major objective of this study is to assess the performance of the conventional MLP integrated with BOA for improving the effectiveness of the streamflow forecasting model. Several quantitative indices have been computed for all the explored scenarios to examine the prediction performance of the proposed models.

2 Study Area
The Subarnarekha is a rain-fed river originating from Nagri village near Ranchi in the Jharkhand state of India (23.4° N, 85.4° E). It is one of the major rivers of the southern Chotanagpur plateau, situated at 756 m above MSL. Its drainage area covers the states of Jharkhand, West Bengal, and Odisha, and the river finally drains into the Bay of Bengal. Summer, winter, and monsoon are the major seasons of the basin. Summer is the warmest season, with an average temperature of about 40 °C, whereas in winter the mean temperature is around 7 °C. The southwest monsoon determines rainfall in the basin, contributing 90% of the rainfall. Average yearly precipitation is around 1250 mm over the basin, with minimum and maximum rainfall varying between 1100 and 1400 mm (Fig. 1).

3 Methodology
3.1 MLP
Among the many different neural network types, one of the most popularly applied in hydrology (and in general) is the MLP [28]. It comprises three or more layers. In this study, the first layer consists of the hydrological inputs. The second layer, also known as the hidden layer, includes a specific number of neurons; every input vector is linked to each neuron of the second layer. Those neurons apply nonlinear or linear functions, called activation functions (AF), to the summed and weighted input vectors. Equation (1) gives the weighted sum in the hidden layer:

ζ_{j,t} = a_j + Σ_{i=1}^{n} (W_{j,i} · X_{i,t})    (1)


Fig. 1 Proposed study area

where X_{i,t} is the i-th input at time t, W_{j,i} is the weight associated with the i-th input of the j-th neuron, n is the number of inputs, and a_j is the bias of the j-th hidden neuron added to the weighted sum. To generate the hidden layer output, the weighted sum of inputs is passed to the AF of neuron j. A sigmoid tangent function gives the output C(ζ_{j,t}) at time t, as expressed in Eq. (2) (Fig. 2):

C(ζ_{j,t}) = 2 / (1 + e^{−2ζ_{j,t}}) − 1    (2)
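To make Eqs. (1)–(2) concrete, the following is a minimal sketch (not the authors' code) of the hidden-layer computation, assuming NumPy; the array shapes are illustrative assumptions.

import numpy as np

def mlp_hidden_layer(X_t, W, a):
    # X_t: inputs at time t, shape (n,); W: weights, shape (hidden, n); a: biases, shape (hidden,)
    zeta = a + W @ X_t                                 # Eq. (1): weighted sum per hidden neuron
    return 2.0 / (1.0 + np.exp(-2.0 * zeta)) - 1.0     # Eq. (2): sigmoid tangent activation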

3.2 BOA
Arora and Singh [27] suggested BOA as a metaheuristic optimization algorithm inspired by the food-finding behavior of butterflies. Butterflies tend to hover toward regions with more pheromone, and the more pheromone a butterfly


Fig. 2 Network structure of MLP

discharges, the more likely it is to entice other butterflies. BOA is a swarm-based algorithm modeled on the pheromone-reproduction behavior of butterflies and their attraction toward pheromone. A function is used to formulate the fragrance from the physical intensity of the stimulus, expressed as:

f = cI^a    (3)

where f is the perceived magnitude of the fragrance, I is the intensity of the stimulus, c is the sensory modality, and a is the power exponent dependent on the modality, accounting for various absorption levels. BOA has two essential stages: a global search stage and a local search stage. In the global search stage, a butterfly moves toward the best solution/butterfly as given below:

x_i^{t+1} = x_i^t + (r^2 × g* − x_i^t) × f_i    (4)


where x_i^t is the solution vector of the i-th butterfly in iteration t, f_i is the fragrance of the i-th butterfly, g* is the best solution obtained among all solutions in the present phase, and r is a random number in [0, 1]. The local search stage is expressed as:

x_i^{t+1} = x_i^t + (r^2 × x_j^t − x_k^t) × f_i    (5)

where x_j^t and x_k^t are the j-th and k-th butterflies chosen randomly from the solution space of the existing population (Fig. 3).
Even though various statistical measures exist to evaluate the proposed models, the following widely used criteria are considered, since they examine the models from varied angles:

RMSE = √( (1/n) Σ_{i=1}^{n} (t_i − y_i)^2 )    (6)

WI = 1 − Σ_{i=1}^{n} (t_i − y_i)^2 / Σ_{i=1}^{n} ( |y_i − t̄| + |t_i − t̄| )^2    (7)

Fig. 3 Flowchart of MLP-BOA model


where y_i is the predicted value, t_i is the observed value, ȳ is the mean predicted value, and t̄ is the mean observed value.
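The following is a minimal sketch (not the authors' implementation) of the BOA update rules (Eqs. 3–5) and the two evaluation metrics (Eqs. 6–7), assuming NumPy. The switch probability p_switch and the sensory parameters c and a are assumptions taken from common BOA defaults [27], and the fitness values are used directly as stimulus intensities for simplicity.

import numpy as np

def fragrance(intensity, c=0.01, a=0.1):
    # Eq. (3): f = c * I**a (intensity assumed non-negative)
    return c * intensity ** a

def boa_step(pop, fitness, g_best, p_switch=0.8, rng=None):
    # One BOA iteration (Eqs. 4-5). pop: (N, dim) positions, fitness: (N,)
    # stimulus intensities, g_best: (dim,) best solution found so far.
    rng = rng or np.random.default_rng()
    f = fragrance(fitness)
    new_pop = pop.copy()
    for i in range(len(pop)):
        r = rng.random()
        if rng.random() < p_switch:                          # global search, Eq. (4)
            new_pop[i] = pop[i] + (r ** 2 * g_best - pop[i]) * f[i]
        else:                                                # local search, Eq. (5)
            j, k = rng.choice(len(pop), size=2, replace=False)
            new_pop[i] = pop[i] + (r ** 2 * pop[j] - pop[k]) * f[i]
    return new_pop

def rmse(t, y):
    # Eq. (6): root mean squared error between observed t and predicted y
    t, y = np.asarray(t, float), np.asarray(y, float)
    return float(np.sqrt(np.mean((t - y) ** 2)))

def willmott_index(t, y):
    # Eq. (7): Willmott index, with t_bar the mean observed value
    t, y = np.asarray(t, float), np.asarray(y, float)
    t_bar = t.mean()
    return float(1.0 - np.sum((t - y) ** 2) / np.sum((np.abs(y - t_bar) + np.abs(t - t_bar)) ** 2))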

4 Results and Discussions
The performance assessment of the conventional and hybrid MLP models for streamflow prediction at various input combinations is presented in Table 1. It is observed that the WI and R2 values of all models during the training and testing stages were greater than 0.90, and that the performance of the hybrid MLP model is very accurate and reliable. The performance of MLP improved substantially after implementation of the BOA, and the accuracy of MLP-BOA-4 was the highest; it was recognized as the most appropriate model for streamflow forecasting. The hybrid MLP-BOA-4 model improved the testing WI and RMSE of MLP-4 from 0.94568 and 11.23 to 0.98235 and 4.005, respectively. From Table 1, it is clear that the MLP-BOA models give superior forecasting results for input scenario (4) compared to input scenarios (1), (2), and (3), whereas scenario (1) gives the worst result among all the input scenarios.

Table 1 Performance evaluation results of proposed models

Station name | Model name | Training RMSE | Training WI | Testing RMSE | Testing WI
Rajghat | MLP-1 | 10.9726 | 0.95534 | 14.6325 | 0.9358
Rajghat | MLP-2 | 9.3354 | 0.95763 | 13.5647 | 0.93705
Rajghat | MLP-3 | 8.632 | 0.9586 | 11.9851 | 0.93994
Rajghat | MLP-4 | 8.1389 | 0.96015 | 11.23 | 0.94568
Rajghat | MLP-BOA-1 | 2.2674 | 0.98864 | 6.034 | 0.97534
Rajghat | MLP-BOA-2 | 1.98 | 0.99001 | 5.247 | 0.97726
Rajghat | MLP-BOA-3 | 1.257 | 0.99295 | 4.658 | 0.9785
Rajghat | MLP-BOA-4 | 0.9654 | 0.9953 | 4.005 | 0.98235

Scatterplots of observed and predicted monthly streamflow by the MLP-BOA and MLP models for input scenario (4) are presented in Fig. 4. It is clear from the scatterplots that the values predicted by the MLP-BOA model are closer to the 1:1 line, which indicates a higher R2 value than for the MLP model. Hydrographs of observed and predicted streamflow by the MLP-BOA and MLP models are illustrated in Fig. 5, from which it is apparent that the streamflow values predicted by MLP-BOA are closer to the observed streamflows than those of the MLP model.

Fig. 4 Scatter plot showing actual and predicted streamflow

Fig. 5 Time series plot of streamflow in training and testing phase

Figure 6 uses violin plots to present the variability of streamflow across the proposed methodologies. The plots demonstrate that changes in the magnitude of streamflow do not vary much across timescales based on the interquartile range (the thicker dark solid line inside each violin). The main difference is in the median values of streamflow. Clearly, the mean and median streamflow values show variability as we move toward a longer timescale.


Fig. 6 Violin plot showing variability in observed and predicted streamflow values

5 Conclusion
The current study used an MLP network optimized with the butterfly optimization algorithm to forecast streamflow. The models were applied to monthly streamflow at the Rajghat station of the Subarnarekha river basin. The applied model was then compared with the conventional MLP model for performance assessment. Outcomes revealed that the developed MLP-BOA model performed better than the conventional MLP and considerably improved all recommended values of the statistical indexes. Utilizing the MLP-BOA model, the maximum WI increased from 0.9358 to 0.9953, and the minimum RMSE decreased from 14.6325 to 0.9654. A significant enhancement in river streamflow prediction results is observed when the hybrid model is used. We can conclude that the improved MLP-BOA has the potential to improve the forecasting accuracy of river streamflow in comparison with the conventional MLP model. The developed models could be generalized and employed for different rivers worldwide.

References 1. Moradkhani H, Hsu KL, Gupta HV, Sorooshian S (2004) Improved streamflow forecasting using self-organizing radial basis function artificial neural networks. J Hydrol 295(1–4):246– 262 2. Samantaray S, Sahoo A (2021) Prediction of suspended sediment concentration using hybrid SVM-WOA approaches. Geocarto Int 1–27 3. Sahoo A, Samantaray S, Ghose DK (2019) Stream flow forecasting in Mahanadi River Basin using artificial neural networks. Procedia Comput Sci 157:168–174 4. Kisi O, Sanikhani H (2015) Prediction of long-term monthly precipitation using several soft computing methods without climatic data. Int J Climatol 35(14):4139–4150 5. Samantaray S, Sahoo A, Agnihotri A (2021) Assessment of flood frequency using statistical and hybrid neural network method: Mahanadi River Basin, India. J Geol Soc India 97(8):867–880


6. Sahoo A, Samantaray S, Ghose DK (2021) Prediction of flood in Barak River using hybrid machine learning approaches: a case study. J Geol Soc India 97(2):186–198 7. Sahoo A, Samantaray S, Paul S (2021) Efficacy of ANFIS-GOA technique in flood prediction: a case study of Mahanadi river basin in India. H2Open J 4(1):137–156 8. Sahoo A, Samantaray S, Bankuru S, Ghose DK (2020) Prediction of flood using adaptive neuro-fuzzy inference systems: a case study. In: Smart intelligent computing and applications. Springer, Singapore, pp 733–739 9. Sahoo A, Barik A, Samantaray S, Ghose DK (2021) Prediction of sedimentation in a watershed using RNN and SVM. In: Communication software and networks. Springer, Singapore, pp 701–708 10. Mohanta NR, Biswal P, Kumari SS, Samantaray S, Sahoo A (2021) Estimation of sediment load using adaptive neuro-fuzzy inference system at Indus River Basin, India. In: Intelligent data engineering and analytics. Springer, Singapore, pp 427–434 11. Samantaray S, Sahoo A (2021) Modelling response of infiltration loss toward water table depth using RBFN, RNN, ANFIS techniques. Int J Knowl Based Intell Eng Syst 25(2):227–234 12. Samantaray S, Sahoo A (2020) Prediction of runoff using BPNN, FFBPNN, CFBPNN algorithm in arid watershed: a case study. Int J Knowl Based Intell Eng Syst 24(3):243–251 13. Samantaray S, Sahoo A, Ghose DK (2019) Assessment of runoff via precipitation using neural networks: watershed modelling for developing environment in arid region. Pertanika J Sci Technol 27(4):2245–2263 14. Jimmy SR, Sahoo A, Samantaray S, Ghose DK (2021) Prophecy of runoff in a River Basin using various neural networks. In: Communication software and networks. Springer, Singapore, pp 709–718 15. Samantaray S, Sahoo A, Ghose DK (2020) Infiltration loss affects toward groundwater fluctuation through CANFIS in Arid Watershed: a case study. In: Smart intelligent computing and applications. Springer, Singapore, pp 781–789 16. Samanataray S, Sahoo A (2021) A Comparative study on prediction of monthly streamflow using hybrid ANFIS-PSO approaches. KSCE J Civ Eng 25(10):4032–4043 17. Anctil F, Rat A (2005) Evaluation of neural network streamflow forecasting on 47 watersheds. J Hydrol Eng 10(1):85–88 18. Yonaba H, Anctil F, Fortin V (2010) Comparing sigmoid transfer functions for neural network multistep ahead streamflow forecasting. J Hydrol Eng 15(4):275–283 19. Uysal G (2016) Streamflow forecasting using different neural network models with satellite data for a snow dominated region in Turkey. Procedia Eng 154:1185–1192 20. Talaee PH (2014) Multilayer perceptron with different training algorithms for streamflow forecasting. Neural Comput Appl 24(3):695–703 21. Kisi O, Gorgij AD, Zounemat-Kermani M, Mahdavi-Meymand A, Kim S (2019) Drought forecasting using novel heuristic methods in a semi-arid environment. J Hydrol 578:124053 22. Fadaee M, Mahdavi-Meymand A, Zounemat-Kermani M (2020) Suspended sediment prediction using integrative soft computing models: on the analogy between the butterfly optimization and genetic algorithms. Geocarto Int 1–17 23. Mohamadi S, Sammen SS, Panahi F, Ehteram M, Kisi O, Mosavi A, Ahmed AN, El-Shafie A, Al-Ansari N (2020) Zoning map for drought prediction using integrated machine learning models with a nomadic people optimization algorithm. Nat Hazards 104(1):537–579 24. Li Y, Ghoreishi SM, Issakhov A (2021) Improving the accuracy of network intrusion detection system in medical IoT systems through butterfly optimization algorithm. Wireless Pers Commun 1–19 25. 
Sammen SS, Ehteram M, Abba SI, Abdulkadir RA, Ahmed AN, El-Shafie A (2021) A new soft computing model for daily streamflow forecasting. Stochast Environ Res Risk Assess 1–13 26. Boucher MA, Quilty J, Adamowski J (2020) Data assimilation for streamflow forecasting using extreme learning machines and multilayer perceptrons. Water Resour Res 56(6):e2019WR026226 27. Arora S, Singh S (2015) Butterfly algorithm with levy flights for global optimization. In: 2015 international conference on signal processing, computing and control (ISPCC). IEEE, pp 220–224


28. Rosenblatt F (1958) The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev 65(6):386

COVID-19 Contact Tracing Using Low Calibrated Transmission Power from BLE—Approach and Algorithm Experimentation Thein Oak Kyaw Zaw, Saravanan Muthaiyah, Malik Manivanan Sehgar, and Ganes Raj Muthu Arumugam

Abstract Within a short period of time, the highly infectious COVID-19 virus has progressed into a pandemic, which has forced countries to develop contact tracing solutions for closer monitoring of its further spread in society. Bluetooth low energy (BLE) has been extensively adopted to implement contact tracing, focusing mainly on utilizing the received signal strength indicator (RSSI) for distance estimation toward close contact identification (CCI). Nevertheless, when observed closely, many of these solutions were not able to carry out contact tracing as accurately as required by the Centers for Disease Control and Prevention (CDC). The provisions set were a distance of within 6 ft (~2 m) and a period of no less than 15 min for close contact identification. These were missed mainly because RSSI is highly unstable and volatile. In closing the gap, we propose a novel approach that utilizes low calibrated transmission power (Tx), employing the nRF52832 BLE chipset as a wearable, in which no close contact is detected at distances greater than 2 m, yielding high CCI accuracy and low distance estimation error under ideal conditions. An algorithm for establishing close contacts is also demonstrated with complete experimentation. Results show that our proposed solution has a maximum error of 0.3209 m in distance estimation within 2 m and 71.43% accuracy in CCI with four devices, considering a distance of 2 ± 0.3 m. Keywords Bluetooth low energy · Contact tracing · COVID-19 · Transmission power

T. O. K. Zaw (B) · S. Muthaiyah · G. R. M. Arumugam Multimedia University, 63100 Cyberjaya, Selangor, Malaysia e-mail: [email protected] M. M. Sehgar Multimedia University, 75450 Bukit Beruang, Melaka, Malaysia © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_2


1 Introduction
COVID-19 is a highly transmittable virus that took the world by surprise with its emergence at the end of 2019. Even after a person is infected by the virus, the patient may not show any symptoms for a period of time, making it highly unpredictable and formidable [1]. As the infection keeps spreading, contact tracing becomes an important task in order to manage and control the pandemic while the number of cases rises all over the world. From time series data visualization carried out using data from the World Health Organization, we identified the top 10 countries affected by the pandemic, as shown in Fig. 1. Figure 1 shows that the trend for these countries is similar, with the number of cases spiking from day to day. Figure 2 highlights the confirmed, death, and recovered cases from January 2021 until January 2022; it shows that no recoveries have been reported since August 2021, which is why containment of the spread is a serious priority. In identifying close contacts, the CDC has laid down ground rules to be followed: a distance of less than 6 ft (approximately 2 m) and a minimum duration of 15 min with one another, in order to be accepted as a close contact [2]. Nevertheless, these rules may vary from time to time depending on the type of variants that are at large in a particular place. From the CDC's requirements, two vital pieces of information are required to conduct contact tracing: (1) the distance and (2) the duration of close contact. However, this is only practicable with the usage of smart contact tracing solutions utilizing smartphones and electronic wearables. With that in mind, many digital contact tracing solutions were developed using various

Fig. 1 Top 10 countries affected by COVID-19 pandemic from year 2020 until 2021


Fig. 2 Confirmed, deaths and recovered cases from January 2021 until January 2022

localization technologies, such as cellular networks, Wi-Fi, GPS, as well as Bluetooth low energy (BLE), for the COVID-19 pandemic [3]. Of all the technologies being adopted, BLE has been acknowledged and proven as a promising solution [4]. The reason is that it is ubiquitous, as the majority of people possess a smartphone with BLE in it, and privacy is preserved, as no location data is required. The theory is straightforward: when two or more users are in proximity to one another, the application automatically exchanges information between them. In BLE-based solutions, measurement of RSSI is often used to obtain distance estimation [3]. As such, the accuracy or effectiveness of distance measurement for these solutions explicitly depends on RSSI. However, in real-world applications, there are many aspects that can affect the RSSI and make it unreliable, such as the angle of arrival (AOA) and surrounding obstacles. RSSI can also yield false positives when a solution is unable to differentiate the scenario of two persons separated by a wall. Software, hardware, and external factors affect RSSI as well. As such, it is impractical and unreasonable to rely on RSSI alone in determining close contacts, as the outcome will not be accurate and will contain many false positives and negatives. To overcome the scenarios mentioned, we propose a new approach of low calibrated transmission power (Tx) using wearables, as shown in Fig. 3. Usage of low Tx causes the RSSI magnitude on the receiver side to be large (i.e., a weak received signal). At the same time, every wearable or chipset has its own maximum RSSI sensitivity, which differs from one device to another. Once the maximum RSSI sensitivity is reached, a chipset theoretically cannot detect the incoming signals anymore. Thus, in return, the number of signals captured by the receiver is reduced compared with what would otherwise have been obtained in the stipulated time. Exploiting this insight, Tx is adjusted so that the maximum RSSI sensitivity is reached at the wanted distance of 6 ft for the COVID-19 pandemic. This will ensure that


Fig. 3 a Close contact detected; b close contact not detected

Fig. 4 nRF52832 chipset

signals from BLE can only be detected (in adequate amounts to be considered a close contact) when users are within range under ideal conditions, making the accuracy of CCI high. In doing so, this approach phases out the setbacks of solely using RSSI as the method of distance measurement for CCI. It also eliminates the mainstream role of RSSI by taking over CCI, leaving distance estimation to RSSI alone as an added feature. In this study, the nRF52832, a versatile Bluetooth 5.2 system on a chip (SoC), was used for the experimentation; it has a maximum RSSI sensitivity of −96 dBm [5]. Figure 4 shows the chipset used. It should be noted that this approach can be replicated using other BLE modules as well.
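As a minimal illustration of this idea (our sketch, not the authors' firmware), a receiver with a −96 dBm sensitivity floor simply never captures packets whose received strength falls below it, so calibrating Tx to hit that floor at ~2 m turns the sensitivity limit into a range gate:

RX_SENSITIVITY_DBM = -96.0   # nRF52832 maximum RSSI sensitivity [5]

def packet_captured(rssi_dbm):
    # Packets arriving below the receiver's sensitivity floor are never seen
    # by the scanner, so beyond the calibrated ~2 m range no detections accumulate
    return rssi_dbm >= RX_SENSITIVITY_DBM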


2 Overcoming RSSI Shortcomings
Practically, there are many factors that can affect the RSSI, making it unreliable for distance estimation. This is the main reason why RSSI must not be the crucial component used for CCI. This section discusses the two main groups of factors that affect RSSI and how the approach overcomes some of these limitations.

2.1 Internal Factors
Looking at the internal factors, it is mainly the hardware and the software that can have a significant influence on the RSSI. On the hardware side, items such as the antenna layout and the chipset play a vital role. The Bluetooth chipset affects the transmission power Tx, which directly influences the RSSI. Moreover, manufacturers produce components with different materials and settings, making technical characteristics such as transmission power differ. The antenna layout, orientation, and data transmission capability can also greatly affect the strength of the transmitted signal [3]. As for the software, it is mainly the operating system that adapts various settings, indirectly affecting battery consumption and thereby influencing the RSSI as well. All of these internal setbacks are hardware limitations that cannot be changed, as they are fixed. Nevertheless, these internal factors will not affect the accuracy of contact tracing using the proposed approach, as all of them remain constant except for Tx.

2.2 External Factors
Scrutinizing the external factors, they are divided into two areas: physical obstruction and radio waves. Effects from radio waves include Wi-Fi, which can be configured to operate on the 2.4 GHz band used by Bluetooth, in which case both signals can influence one another [6]. In a crowded place with many Wi-Fi signals, this can have a substantial effect, even though each network technology has its own methods to reduce signal interference. Using the calibrated low Tx approach makes the solution less likely to be affected by other radio waves, as the traveled distance is not wide. The angle of arrival (AOA) can also have a significant effect on the RSSI, especially when the distance is large, creating fluctuations. A big angle from the line of sight (horizontal) means the RSSI magnitude will be higher; thus, when the angle is big, there will be a reduction in the number of detections. Nevertheless, AOA does not have any significant effect in this approach, eliminating that setback. As for physical obstruction, any object in the path of the signal can affect the RSSI. Anything from walls, humans, and glass to surface materials


can lead to signal interference such as diffraction and absorption. These physical obstructions can diminish the signal and make it fluctuate, making the calculated distance greater than the actual distance between two persons [7]. However, with this new approach, the RSSI magnitude is already large, approaching the chipset's maximum sensitivity at the desired distance of 2 m. Thus, physical obstructions such as people passing by or walls in the surroundings will only push the RSSI past the point where the chipset can detect it, producing less error. The small distance projection likewise makes physical obstructions much less of an issue. Even where there is an error, it will be limited and smaller than in many other BLE-based solutions.

3 Solution Approach—Experimentation and Results
3.1 Experimentation, Hardware and Software Set-Up
In this study, readings from five nRF52832 SoCs powered by 5 V lithium-ion batteries were used. The experimentation was separated into two parts: (1) a Proof of Concept to verify whether the approach works as intended under ideal conditions, while also testing distance estimation as an added value, and (2) an algorithm test to assess CCI accuracy using our constructed algorithm under ideal conditions. The Proof of Concept itself was separated into two parts, in order to determine the RSSI readings at various distances and to prove the concept. An iPhone XS with the iOS 15.1 operating system was utilized as the receiver in the first part, using a mobile application to capture data packets from two devices; the usage of two BLE modules ensures that variations can be observed if there are any. The chipsets send out advertising packets periodically every 50 ms (changeable according to the situation) to the smartphone, which scans every 5 s. This information was then recorded in an Excel sheet. The captured data has four main fields: (1) the medium access control (MAC) address, (2) the RSSI value, (3) the date, and (4) the time. The nRF52832 has its own central processing unit (CPU), a 64 MHz Arm Cortex-M4, with 256 KB of flash memory and 128 KB of random access memory (RAM). Armed with a maximum Tx of 4 dBm, it is more than adequate as the transmitter for testing the low calibrated Tx approach for close contact tracing. Tx in this study is set at −8 dBm, which is the best-suited value, as the RSSI approaches −96 dBm (the maximum sensitivity) at the distance of 2 m. An observation size of 100 readings was used in the Proof of Concept so that the data is adequate and reliable. It should be noted that the experiment was conducted in a controlled environment where no other network connectivity (the mobile phone was kept in airplane mode) nor obstacles were present. At the same time, both the sender and receiver were elevated to 0.5 m, to imitate actual wearing of wearables on the wrist, and given a direct angle. Figure 5 illustrates the experiment set-up for the Proof of Concept and Fig. 6 shows a sample of the captured data from the Excel sheet.


Fig. 5 Experiment part one set-up for proof of concept

Fig. 6 Sample of captured data

For the algorithm test, five nRF52832 devices were used, with one acting as the receiver and the other four as transmitters. In general, the algorithm is meant to fine-tune and cater for the various real-life scenarios in existence. Similar to the Proof of Concept, the captured data was formatted in Excel sheets containing the same information, and the devices were elevated to a height of 0.5 m to imitate real wearing of the wearables. Four devices are used to examine the approach's ability to handle multiple users and also to test the number of successful scans at different AOAs toward CCI, especially when the AOA is within 45° of the receiver. This is because a study conducted by [8] stated that the airflow angle of spread for speech is 42.9°, and the COVID-19 virus spreads mainly through tiny water droplets from one person to another. Thus, it is crucial for a solution to be able to detect close contact successfully within this angle from the receiver. Nevertheless, for ease of experimentation and less room for error, an angle of 45° was chosen; for other angles, less strict provisions were set. The use of four transmitters also serves to determine the minimum number of data captures required for CCI in the four-user case. Figure 7 shows the experiment set-up for algorithm testing, and Fig. 8 illustrates the concept of speech airflow for virus spread. Figure 7 shows the multiple positions in which the nRF52832 was placed to test CCI at various distances within 2 m, for a duration of 15 min per test. Figure 8 illustrates that 42.9° is the angle at which the virus is most likely to spread due to the angle of speech airflow, which makes it an important angle requiring high CCI accuracy.


Fig. 7 Experiment set-up for algorithm test

Fig. 8 Concept of speech air flow by [8]

3.2 Experimentation Part One—Proof of Concept
RSSI Values for Distance Estimation
The initial part of the experiment is to ascertain the average reference RSSI value, RSSI_r, at a reference point, which is usually set at 1 m for Bluetooth applications [9]. Apart from that,


Table 1 Results for RSSI measurement for three different distances using two devices under line-of-sight condition

Device number, n | Condition | Distance, m (meters) | Observations | Max value, RSSImax (dBm) | Min value, RSSImin (dBm) | Average RSSI, RSSIavg (dBm)
1 | No obstacle—line-of-sight | 0 | 100 | −39 | −33 | −35
2 | No obstacle—line-of-sight | 0 | 100 | −37 | −32 | −35
1 | No obstacle—line-of-sight | 1 | 100 | −78 | −66 | −71
2 | No obstacle—line-of-sight | 1 | 100 | −75 | −70 | −72
1 | No obstacle—line-of-sight | 2 | 100 | −90 | −77 | −83
2 | No obstacle—line-of-sight | 2 | 100 | −86 | −78 | −82

it is also to observe the maximum RSSI value at the distance of 2 m for the selected transmission power. As there is no information on a factory-calibrated RSSI reference point for the hardware used, the average of the readings taken at a distance of 1 m is chosen as RSSI_r. Table 1 gives the results of this initial part of the experiment. Table 1 highlights that at a distance of 0 m, the average RSSI reading is −35 dBm for both devices. A similar pattern was also observed at the other two distances, although there is a very minor difference in value at the distance of 1 m. However, with a difference of only −1 dBm, it is not significant enough to indicate that the data obtained is unreliable. Average maximum RSSI values of −83 and −82 dBm were obtained from the two devices at the distance of 2 m, of which the former will be used; a value of higher magnitude than that indicates that a user is at a distance of more than 2 m. In determining the RSSI reference point, RSSI_r, the average of the average readings obtained at a distance of 1 m is computed using Formula (1):

for m = 1: RSSI_r = (RSSI_avg1 + RSSI_avg2) / 2    (1)

Using Eq. (1), the calculation is shown below:

RSSI_r = (−71 + (−72)) / 2 = −143 / 2 = −71.5 dBm ≈ −72 dBm

Thus, in estimating the distance between two devices, −72 dBm will be used as the constant RSSI_r. By comparison, a Bluetooth beacon by Kontak.io has an RSSI_r value of −77 dBm at the distance of 1 m [10], which does not differ much from the value obtained in this study. What is important is that the observed values should approach the maximum RSSI value of −96 dBm for the nRF52832 chipset at the distance of 2 m. In this study, Tx was set to −8 dBm, as the RSSI was observed to approach its maximum sensitivity at 2 m even though it had not yet fully reached it. The observed RSSI at 2 m, which is RSSI_avg,max, is −83 dBm, and the margin to the maximum sensitivity is there to offset the instability associated with RSSI, especially from physical obstructions. Table 2 gives a summary of the RSSI values from the analysis conducted.

Table 2 Summary of RSSI values obtained for distance estimation

Item | Value obtained (dBm) | Condition | Distance (m) | Explanation
RSSIr | −72 | No obstacle–LOS | 1 | RSSI reference value for distance estimation
RSSIavg,min | −35 | No obstacle–LOS | 0 | Minimum RSSI value for distance at 0 m
RSSIavg,max | −83 | No obstacle–LOS | 2 | Maximum RSSI value for distance at 2 m

Determining the Value of n for Distance Estimation
In this study, distance estimation was computed using Eq. (2), following the study conducted by [11]; RSSI_r is the reference RSSI at 1 m and RSSI_i is the RSSI value at distance i. Even though a few other equations could be considered, Eq. (2) suits the approach used in this study best: because Tx is set low and calibrated, setbacks such as signal propagation loss or path loss models are not as relevant. The environmental factor constant, n, has a range of 2–4. It is also known as the attenuation factor, with a value of 2 being the lowest in strength and 4 the highest. If a solution is meant to be used outdoors, a value close to 2 can be utilized, while for indoor-type environments, values from 3 to 4 are commonly used.

Distance = 10^((RSSI_r − RSSI_i) / (10n))    (2)
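As a quick check of Eq. (2) (a sketch under the values obtained above, not the authors' code), the estimate with RSSI_r = −72 dBm and n = 3.5, the value chosen below, reproduces the calculated distances reported in Table 3:

def estimate_distance(rssi_i, rssi_r=-72.0, n=3.5):
    # Eq. (2): distance = 10 ** ((RSSI_r - RSSI_i) / (10 * n))
    return 10 ** ((rssi_r - rssi_i) / (10.0 * n))

# estimate_distance(-35) -> ~0.087 m and estimate_distance(-83) -> ~2.062 m,
# matching the calculated-distance column of Table 3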

For this study, n = 3.5 was chosen for several reasons. First, the study is set in an indoor-type environment where there are walls in the surroundings. Second, using low Tx creates strong signal attenuation, which pushes the value high, so there is strong attenuation associated with it. Apart from that, the nRF52832 only has a maximum sensitivity of −96 dBm, which it reaches when users go a little further than 2 m; a strong n-value is thus required to compensate for this in the distance calculation. Considering all these factors, an n-value of 3.5 was chosen for this study, as it fulfills the requirements and matches RSSI_avg,max being set at 2 m. Table 3 gives the results from using Eq. (2) at the measured distances.

Table 3 Results of calculated distance using distance estimation formula

Device number, n | Condition | Actual distance, d_a (m) | Observations | Average RSSI, RSSIavg (dBm) | Calculated distance, d_c (m) | Difference, D = d_c − d_a (m)
1, 2 | No obstacle—line-of-sight | 0 | 100 | −35 | 0.087 | 0.087
1, 2 | No obstacle—line-of-sight | 0.3 | 100 | −49 | 0.220 | 0.02
1 | No obstacle—line-of-sight | 0.5 | 100 | −68 | 0.7686 | 0.2686
2 | No obstacle—line-of-sight | 0.5 | 100 | −69 | 0.8209 | 0.3209
1, 2 | No obstacle—line-of-sight | 1 | 100 | −72 | 1 | 0
1 | No obstacle—line-of-sight | 1.5 | 100 | −77 | 1.3895 | −0.1105
2 | No obstacle—line-of-sight | 1.5 | 100 | −80 | 1.6927 | 0.1927
1 | No obstacle—line-of-sight | 1.8 | 100 | −79 | 1.5848 | −0.2152
2 | No obstacle—line-of-sight | 1.8 | 100 | −80 | 1.6927 | −0.1073
1, 2 | No obstacle—line-of-sight | 2 | 100 | −83 | 2.062 | 0.062
Average | | | | | | 0.111

In general, n = 3.5 has been able to calculate distance quite effectively, with the solution's biggest potential distance estimation error under optimum conditions being 0.3209 m and the average error 0.111 m. The results also show that the obtained value can be less than the actual distance; however, the difference is small enough that it will not have much of an effect. Therefore, as a conclusion, the value n = 3.5 is proven to be a good value for estimating distance for this solution under ideal conditions as an added value. Moreover, the low calibrated Tx approach gives estimation on par with other BLE-based contact tracing solutions. For example, [11], using multiple Bluetooth beacons, obtained a proximity error of 0.27 m at a distance of 3 m, while Faragher and Harle [12] used 19 beacons in a 600 m² environment and achieved an error of 2.6 m.

Proof of Concept—Experimentation Results
For the Proof of Concept, experimentation was conducted using two and four devices, n_d, at multiple distances within a 15 min period. The scanning interval, t_s, was set at 5 s per scan. With that, the number of scans needed for a 15 min duration, n_sn, is given by Formula (3), while Table 4 gives the experiments conducted and their results.

n_sn = (60 s × 15 min) / (t_s × n_d) = 900 / (t_s × n_d)    (3)
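As a small illustration (our sketch; the function name is ours), Formula (3) gives the per-transmitter scan budget used in Table 4:

def scans_needed(t_s, n_d):
    # Eq. (3): n_sn = 900 / (t_s * n_d) scans within the 15-min window
    return 900 / (t_s * n_d)

# scans_needed(5, 1) -> 180.0 and scans_needed(3, 4) -> 75.0, the two
# denominators used for the signal-detection percentages in Table 4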

Table 4 Results for proof of concept
[Only the structure of this multi-page table could be recovered from the source. Its columns are: number of devices (n_d), condition, position, actual distance (d_a, m), scan interval (t_s), number of scans needed (n_sn = 900/(t_s n_d)), number of detections (n_dtc), signal detection ((n_dtc/n_sn) × 100), false detection, and number of minimum scans needed (n_min). The runs covered LOS positions A, A1, A2, B, C, and D (plus a 2-inch wooden door, zero-angle condition) at distances from 1 to 3 m. With t_s = 5 s, the runs used n_sn = 180 and n_min = 150 (considering a maximum distance of 2 m); with four devices and t_s = 3 s, the runs used n_sn = 75 per device and n_min = 286 (considering a distance of 2 ± 0.3 m).]

From Table 4, it can be seen that the concept of low Tx is proven, as n_dtc gets smaller when the distance gets bigger, using both one and four devices. Nevertheless, it is important to select n_min effectively so that the accuracy of CCI is high. The consideration taken is that n_dtc must be the smallest (even though this may not apply in all cases) for distances under 2 m under LOS. This will ensure that all instances within the wanted distance are considered close contacts with very small error. Nevertheless, false positives do exist, as shown by the darkest gray boxes in Table 4. What this indicates is that, as n_d gets bigger, it is not really reliable to rely solely on n_dtc for CCI; it is only reliable when n_d is small, especially within ten users. Thus, a new way is needed to reduce the false positives, as shown in the next section.

3.3 Experimentation Part Two—Algorithm Test

Algorithm Considerations and Approach. Several factors can make the number of scans lower (or higher): (1) scan interval, ts; (2) AOA from other users; (3) physical obstruction; (4) the passer-by scenario; and (5) multiple devices. Even if the number of successful scans is higher than theoretically expected, the result still cannot be taken as 100% accurate. Thus, in constructing an algorithm for CCI, these factors must be taken into consideration for real-life application. For AOA and physical obstruction, the experimentation results in Table 4 show that they do not have much of an effect with the method used. Thus, the only factors left are the number of devices, nd, and the passer-by scenario, since ts is a variable that can be changed. To overcome the multiple-device scenario, nmin is needed for the specific value of nd in a venue. For the passer-by scenario, a function or method is required to determine whether a person is a close contact or not. In this case, it is done by dividing the 15 min duration into three time segments, as shown in Fig. 9. Thus, ndtc needs to appear at least once in each of the time segments, while fulfilling the nmin requirement, to be accepted as a close contact.

Algorithm Construction. The algorithm is constructed based on the scenarios stated above and on the results from the experimentation. In constructing the algorithm, we set the data limit to one day for ease of analysis; this value can be changed according to situational requirements. For data capture, the MAC address was utilized for user identification, together with the number of successful scans. Figure 10 shows the proposed algorithm. The algorithm segments the data by MAC address, because this is more efficient than segmenting the data into 15 min groups per MAC address, which would produce unnecessary data flow and require more computational power. Figure 10 also shows that the main method for CCI is the use of the number of successful signal scans, ndtc.

Fig. 9 Separation of the 15 min duration into three time segments


Fig. 10 Proposed algorithm for close contact identification

The required number of scans may vary depending on the number of users, nd. Nevertheless, the algorithm is a general one, to which features and functionalities can be added to improve it. That said, applying the algorithm to the results of the four-device experimentation, while allowing an additional ± 0.3 m of distance to compensate for internal and external factors, the obtained accuracy is 71.43%.
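The following is a minimal Python sketch of the segment-based check described above, under the reading that a device must both meet nmin and appear in each of the three time segments of the 15 min window. The function names and the (MAC address, timestamp) record format are illustrative assumptions, not the exact implementation of Fig. 10.

```python
from collections import defaultdict

WINDOW = 15 * 60           # 15 min close-contact window, in seconds
SEGMENT = WINDOW // 3      # three equal 5 min time segments

def is_close_contact(scan_times, window_start, n_min):
    """Check one device's successful scans within a 15 min window.

    scan_times   -- timestamps (s) of successful scans for one MAC address
    window_start -- start of the 15 min window under examination (s)
    n_min        -- minimum number of successful scans required for this nd
    """
    in_window = [t for t in scan_times
                 if window_start <= t < window_start + WINDOW]
    if len(in_window) < n_min:        # nmin requirement not fulfilled
        return False
    # the device must also be seen in all three time segments, which
    # filters out passers-by who appear only briefly within the window
    segments = {int((t - window_start) // SEGMENT) for t in in_window}
    return segments == {0, 1, 2}

def close_contacts(scans, window_start, n_min):
    """Group one day of (MAC, timestamp) scans and flag close contacts."""
    by_mac = defaultdict(list)
    for mac, t in scans:
        by_mac[mac].append(t)
    return [mac for mac, times in by_mac.items()
            if is_close_contact(times, window_start, n_min)]
```

Grouping by MAC address first, as the text describes, means the 15 min windows are only examined per device rather than materialized for every device in every window.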

3.4 Discussion

From the study conducted, low calibrated Tx for COVID-19 contact tracing has proven quite effective, with an accuracy of 71.43% while following the CDC's requirements. Nevertheless, limitations remain and further improvements are possible. Mass testing has yet to be conducted, which can be a future direction for researchers before a decision is made to implement the approach in the real world. Other limitations are the small number of devices and the limited number of experiment runs; the results would be more solid if more tests with more devices could be carried out. Thus, it is important that further testing be conducted in order to observe the consistency of the accuracy, and the accuracy for


different numbers of devices. Even so, this approach can be applied to similar future pandemics, because variables such as the scan time, the required duration and Tx can be modified according to what a situation requires, making the approach very versatile. Looking at the experimentation, it is clearly shown that as the number of devices increases, the number of successful scans decreases. However, this setback is overcome using signal occurrence in the three time segments within the 15 min duration.

4 Conclusion

In our proposed solution, we utilized low calibrated Tx to conduct effective contact tracing for the COVID-19 pandemic. In this approach, RSSI is only used as an added feature for distance estimation, while CCI relies mainly on the number of successful signal scans and on signal appearance in the three time segments. The results from the experimentation are positive and accurate. Thus, as a final conclusion, it can be said that the low calibrated Tx approach for COVID-19 contact tracing is a good tool or method for countries and implementers to better contain the spread.


Monitoring Loud Commercials in Television Broadcast

Silvana Sukaj and Rosaria Parente

Abstract Television programming is a source of entertainment widely used by people around the world. To cover production costs, commercial television networks broadcast commercials. To highlight the content of the spots, some networks broadcast them at higher volumes than ordinary programming. This commercial strategy annoys viewers, who perceive the commercials as noisy. Complaints from viewers have attracted the attention of legislators, who have regulated the broadcasting of commercials. Despite this, some television networks still do not respect these rules. In this study, the volumes of various advertising spots broadcast on regional and national television channels were recorded and analyzed. The recordings were made in a room of a home to reproduce the typical conditions in which viewers find themselves. The volumes of the commercials were compared with the volumes of the television programs within which they were broadcast. The measurements show that advertisements are broadcast at higher volumes than the television programming in which they are placed.

Keywords Commercials · Noise emission · Television broadcast · Noise annoyance

1 Introduction

Companies use advertising to spread their image and establish themselves among consumers. Advertising takes place mainly through the mass media: press, television, radio, cinema and the web are the vehicles for the dissemination of the advertising message, obtaining in turn an advantage in terms of financing [1].

S. Sukaj, Department of Engineering and Architecture, European University of Tirana (UET), 1000 Tirana, Albania
R. Parente (B), Benecon University Consortium, 80138 Naples, Italy. e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552. https://doi.org/10.1007/978-981-19-6634-7_3


Marketing communication is a way for the company to interact with existing consumers and make its image known to new potential customers. In addition to the final goal, which is achieved through the sale of the advertised product, other intermediate and instrumental objectives, no less important for companies, must be considered [2]. One goal is, for example, to increase the number of individuals exposed to advertising, who could positively process the message, consolidate their idea of the brand image and finally buy the product, adopting the brand [3]. Companies use advertising to make their product known and established among consumers: the goal is to create a successful advertising strategy, able to reach and communicate with the entire target audience. Therefore, an ad hoc message is created for the target of the product or service.

The success of an advertising project depends on numerous factors, which come from both inside and outside the market in which the company operates. Trends outside the market are difficult to predict and estimate. A completely uncontrollable factor is the continuous development of technology, especially at a time when digital media is one of the most widely used means to spread an advertising message. On the other hand, the internal dynamics of the market are much more predictable, since they present similar trends at regular intervals [4]. The success of an advertising campaign depends on the content of the messages offered to consumers but mainly on the number of contacts it manages to reach. Since television remains the main medium through which it is possible to reach the maximum number of contacts, at least for specific population targets, most advertising companies orient their products toward this communication channel [5].

There are no rules that establish what the canonical structure of an advertisement should be. The essential thing, however, is that it manages to capture the consumer's attention, so that the intended message reaches the recipient. Often, however, the consumer watches a television program with deep engagement, and the arrival of an advertisement is unwelcome, causing disturbance. To highlight the message contained in the commercial, techniques are often used to amplify its delivery. One of these is to increase the volume of the commercial to capture the attention of the viewer. However, this variation in volume is perceived by the viewer as a nuisance and generates a discontent that often culminates in protests [6]. The problem has been so strongly felt by the population that some television set manufacturers (e.g., Sony) have equipped their devices with automatic volume controllers. Audience complaints highlighted the problem at a political level and prompted legislators in various countries to issue specific regulations aimed at limiting the sound power of jingles and advertising messages broadcast on television. Regardless of the rules and recommendations defined by the regulatory bodies, there is a whole series of protocols and quality controls that companies producing and distributing audio-video content adopt to make the quality of television broadcasts homogeneous and consistent. The pandemic period that swept the entire globe forced millions of people to stay indoors for a long time, with a significant increase in the use of television programs.
This increase in the consumption of television content has once again highlighted the problem, demonstrated by a significant increase in reports of noisy advertising [7]. In the USA, a specific law has been enacted (Commercial Advertisement Loudness


Mitigation Act), which requires that the audio of TV commercials not be broadcast at a higher volume than the accompanying TV program material [8]. The law entered into force in 2012 and requires the Federal Communications Commission to monitor and collect reports from viewers. In Europe, the European Broadcasting Union (EBU) [9] has addressed the problem and issued a recommendation for the normalization of the volume and maximum level of audio signals. These recommendations were first issued in 2010 and subsequently updated in 2020. The EBU recommends normalizing the audio to a target level of −23 LUFS. LUFS is a loudness unit on an absolute, K-weighted scale relative to digital full scale; it measures the integrated loudness calculated over the entire duration of the program and its contents. Although the problem has been addressed, the excessive volume of TV commercials remains a current problem. In this study, the volumes of various advertising spots broadcast on regional and national television channels were recorded and analyzed. The recordings were made in a room of a home to reproduce the typical conditions in which television viewers find themselves. The volumes of the commercials were compared with the volumes of the television programs within which these commercials were broadcast.
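As an aside, integrated loudness in LUFS can be estimated programmatically. The following minimal sketch uses the third-party pyloudnorm package, an implementation of the ITU-R BS.1770 measurement underlying the EBU recommendation, to check a recorded clip against the −23 LUFS target; the file name is a placeholder, and this is not the measurement procedure used in the paper.

```python
import soundfile as sf
import pyloudnorm as pyln

data, rate = sf.read("commercial_clip.wav")   # placeholder file name

meter = pyln.Meter(rate)                      # BS.1770 K-weighted meter
loudness = meter.integrated_loudness(data)    # integrated loudness in LUFS

print(f"integrated loudness: {loudness:.1f} LUFS")
if loudness > -23.0:
    print("louder than the EBU target of -23 LUFS")
```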

2 Materials and Methods

Protection from noise in living spaces is an aspect that has long been neglected, even though it is of primary importance for our quality of life [10]. The disturbance caused by noise affects the quality of life and housing conditions and is a source of health problems. Excessive noise levels compromise quality of life because they cause physical and mental discomfort. Exposure of the auditory organ to loud noises leads to a temporary lowering of hearing capacity, which persists longer the longer the exposure to the noise. Noise, in fact, is an important stress factor that causes a series of bodily reactions, such as changes in heart and respiratory rate and sleep disturbance. It also interferes with attention and learning [11].

Noise pollution is a decisive parameter for evaluating an environment: a home, an office or a place of study appears to have superior quality if placed in a non-noisy environmental context, which also affects the commercial value of the property itself. Furthermore, the perception of noise essentially depends on the quality of the environment in which one lives [12]: in a closed environment, the sound emitted by a source reaches the receiver through a direct component and through the countless reflections of the sound waves off the walls of the room. These reflections reach the receiver with a certain delay, which depends on the properties of the environment and determines its acoustic characteristics [13, 14].

To check the audio volume levels of commercials with respect to television programs, various audio recordings were made [15–17]. Audio clips of various television programs broadcast by national and regional broadcasters were recorded.


Table 1 Technical specification of the Zoom H1n/IFS device

Specification         Value
Recording formats     44.1 kHz/16-bit, 48 kHz/16-bit, 48 kHz/24-bit, 96 kHz/24-bit
Built-in stereo mic   Unidirectional condenser, 90° XY stereo format
Mic gain              −∞ dB to +39 dB
Maximum SPL           120 dB SPL
Dimensions            1.9 in (W) × 5.4 in (D) × 1.2 in (H)

To make the comparison as complete as possible, different types of television programs were analyzed: movies, sports programs, music videos and shows. For each program, the audio of the program and of the TV commercial inserted into it was recorded. The audio clips were recorded for a sufficiently long period to give a good representation of the overall volume [18–20]. The recordings were made with a digital stereo handheld recorder (Zoom H1n/IFS), with technical specifications given in Table 1. The recorder was placed on a tripod at a height of about 0.80 m to simulate the height of the ears of a person sitting on a sofa, about two meters from the TV. During playback of the broadcasts, the volume of the TV was adjusted to 20% of the total available volume. Four types of television programs were analyzed: movies, sports programs, music videos and entertainment shows. For each type of program, the television commercial inserted in the programming was recorded for a duration of about 30 s; in the case of commercials of longer duration, multiple recordings were made. Each audio clip was subsequently processed to extract appropriate sound descriptors to compare the volumes of the broadcasts with those of the advertising spots. The equivalent sound level, defined through Eq. (1), was adopted as the acoustic descriptor:

$$L_{eq} = 10 \log \left[ \frac{1}{T} \int_0^T \frac{p^2}{p_{ref}^2} \, \mathrm{d}t \right] \quad (1)$$

In Eq. (1), the terms are defined as follows:
• L_eq is the equivalent sound level
• T is the registration period
• p is the recorded signal
• p_ref is a reference signal.

The equivalent sound level represents an average of the sound level emitted by an intermittent source over the period T considered [21, 22]. The importance of this level is to allow us to quantify the sound level emitted by a source through a single number [23–26].
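To make the descriptor concrete, here is a minimal Python sketch of Eq. (1) for uniformly sampled data, under the assumption of a calibrated pressure signal in pascals; for the A-weighted LeqA used in this study, an A-weighting filter would be applied to the samples first.

```python
import numpy as np

P_REF = 20e-6  # reference sound pressure in air: 20 µPa

def equivalent_level(p):
    """Equivalent sound level L_eq in dB, per Eq. (1).

    p -- calibrated sound-pressure samples in pascals; for uniform
         sampling, (1/T) * integral of p^2 dt reduces to the mean of p^2.
    """
    return 10.0 * np.log10(np.mean(p ** 2) / P_REF ** 2)

# example: a 1 kHz tone with an RMS pressure of 1 Pa is about 94 dB
t = np.linspace(0.0, 1.0, 48000, endpoint=False)
tone = np.sqrt(2.0) * np.sin(2.0 * np.pi * 1000.0 * t)
print(round(equivalent_level(tone), 1))  # ~94.0
```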


3 Results and Discussion

As anticipated, four types of television programs were analyzed: movies, entertainment shows, sports programs and music videos. For each type of television program, several recordings were made to evaluate the average volume of the transmission. The acoustic descriptors of these audio clips were extracted, and an average was taken of the values obtained [27, 28]. The TV commercial included in the programming was recorded for each type of program.

Figure 1 shows a comparison between the A-weighted equivalent sound level of a movie clip and that of a commercial included in the programming of the television network. The figure reports the equivalent sound level, which represents an average of the sound level emitted by an intermittent source over the considered time T. Figure 1a shows the equivalent sound level of a movie clip, with the horizontal line indicating the LeqA value, while Fig. 1b shows the equivalent sound level of a commercial inserted in the same programming of the television network. The recordings were made keeping the volume of the television set at 20% of its available range [29–32]. From the comparison between the two figures, the LeqA of the TV commercial is higher than that of the movie.

Figure 2 shows a comparison between the A-weighted equivalent sound levels of an evening entertainment program clip (quiz game) and that of a commercial included in the programming of the television network. Figure 2a shows the equivalent sound level of an entertainment program clip, with the horizontal line indicating the LeqA value, while Fig. 2b shows the equivalent sound level of a commercial inserted in the same programming of the television network. The recordings were made keeping the volume of the television set at 20% of its available range. From the comparison between the two figures, the LeqA of the television commercial is higher than that of the entertainment program.

Fig. 1 Comparison between equivalent sound levels of a movie clip (a) and the TV commercial included in the television programming (b)


Fig. 2 Comparison between equivalent sound levels of an entertainment program (a) and the TV commercial included in the television programming (b)

Figure 3 shows a comparison between the A-weighted equivalent sound level of a fragment of a football match and that of a commercial included in the programming of the television network. Figure 3a shows the equivalent sound level of a fragment of a football match, with the horizontal line indicating the LeqA value, while Fig. 3b shows the equivalent sound level of a commercial inserted in the same programming of the television network. From the comparison between the two figures, it is evident that the LeqA of the television spot is higher than that of the football match; note the horizontal line, which appears higher in Fig. 3b. Finally, Fig. 4 shows a comparison between the A-weighted equivalent sound level of a fragment of a music video and that of a commercial included in the programming of the television network. Figure 4a shows the equivalent sound level of a fragment of a music video, with the horizontal line indicating the LeqA value, while Fig. 4b shows the equivalent sound level of a commercial inserted in the same programming of the television network [33–35]. From the comparison between the two figures, it is evident that the LeqA of the television commercial is higher than that of the music video, as indicated by the horizontal line, which appears higher in Fig. 4b.

Fig. 3 Comparison between equivalent sound levels of a football match (a) and of the TV commercial included in the television programming (b)


Fig. 4 Comparison between equivalent sound levels of a music video (a) and the TV commercial included in the television programming (b)

Table 2 Comparison between the LeqA values of the recorded audio fragments

Typology   LeqA (dBA)   Typology   LeqA (dBA)   Comparison (%)
Movie      51.9         Spot       58.3         12
Variety    52.1         Spot       59.2         14
Match      55.0         Spot       59.6         8
Music      59.2         Spot       61.6         4

In the case of Fig. 4, however, the difference in the average values of the transmitted acoustic energy is less marked [36–38]. To make the comparison easier, the values of the acoustic descriptor LeqA have been reorganized in Table 2. Table 2 shows that in all the cases analyzed, the commercial is broadcast at an average volume higher than that of the television programming in which it is inserted. The volume of the spots is higher by a percentage ranging from 4 to 14% (for example, for the movie: (58.3 − 51.9)/51.9 × 100 ≈ 12%). This indicates that the degree of the volume difference depends on the television broadcast being watched and on the type of advertising message conveyed [39–42]. In the case of the entertainment show, the volume of the commercial is 14% higher, while in the case of the music video, the difference is reduced to 4%. This depends on the content, which in the case of the music video is itself transmitted at a higher volume. In all the cases analyzed, the contribution of the environment in which the television set is placed is decisive, as it determines the reverberation of the sound [43–45].

4 Conclusion

Television programs are followed by a large percentage of the population all over the world, who can relax by following the type of programming they prefer.


To cover production costs, commercial television networks include commercials in their programming. To highlight the content of the spots, some television networks transmit them at higher volumes than ordinary programming. This commercial strategy causes annoyance in viewers, who consider the commercials noisy, and it generates numerous complaints. In this study, the volumes of various advertising spots broadcast on regional and national television channels were recorded and analyzed. The recordings were made in a room of a home to reproduce the typical conditions in which viewers find themselves. The volumes of the commercials were compared with the volumes of the television programs within which these commercials were broadcast. The measurements show that advertisements are broadcast at higher volumes than the television programming in which they are placed. The volume of the commercials is higher than that of the television programming by a percentage ranging from 4 to 14%. This indicates that the level of the volume difference depends on the television broadcast being watched. In the case of the entertainment show, the volume of the commercial is 14% higher, while in the case of the music video, it is reduced to 4%. However, this depends on the content, which in the case of the music video is transmitted at a higher volume. Similar measurements should be made for all television channels and, based on the results obtained, producers should be obliged to reduce broadcasting levels to bring them within the limits imposed by the legislation. This monitoring should be carried out by an independent body, able to express an opinion above any specific interest. Furthermore, it would be advisable for the levels of the commercials to be suitably modulated according to the program in which they are inserted: loud spots should be aimed at louder programs, such as music programs; conversely, quieter spots should be targeted at programs that require concentration, such as a movie.

References

1. Lienhart R, Kuhmunch C, Effelsberg W (1997) On the detection and recognition of television commercials. In: Proceedings of IEEE international conference on multimedia computing and systems. IEEE, pp 509–516
2. Laskey HA, Day E, Crask MR (1989) Typology of main message strategies for television commercials. J Advert 18(1):36–41
3. Singh SN, Rothschild ML (1983) Recognition as a measure of learning from television commercials. J Mark Res 20(3):235–248
4. Raedts M, Roozen I, De Weerdt E (2019) The effectiveness of subtitles in cross-cultural television commercials. World Englishes 38(3):387–403
5. Fung P, Ho AG (2019) Study on how television commercials affect consumer reactions with visual strategies. In: International conference on applied human factors and ergonomics. Springer, Cham, pp 162–173
6. Moore BC, Glasberg BR, Stone MA (2003) Why are commercials so loud? Perception and modeling of the loudness of amplitude-compressed speech. J Audio Eng Soc 51(12):1123–1132
7. Staff AES (2006) Loudness trumps everything. J Audio Eng Soc 54(5):421–423


8. Loud commercials. https://www.fcc.gov/media/policy/loud-commercials. Last accessed 18 Jan 2022
9. European Broadcasting Union (EBU). https://www.ebu.ch/home. Last accessed 18 Jan 2022
10. Shepherd D, Welch D, Dirks KN, McBride D (2013) Do quiet areas afford greater health-related quality of life than noisy areas? Int J Environ Res Public Health 10(4):1284–1303
11. Iannace G, Berardi U, De Rossi F, Mazza S, Trematerra A, Ciaburro G (2019) Acoustic enhancement of a modern church. Buildings 9(4):83
12. Puyana Romero V, Maffei L, Brambilla G, Ciaburro G (2016) Acoustic, visual and spatial indicators for the description of the soundscape of waterfront areas with and without road traffic flow. Int J Environ Res Public Health 13(9):934
13. Ciaburro G, Iannace G (2021) Acoustic characterization of rooms using reverberation time estimation based on supervised learning algorithm. Appl Sci 11(4):1661
14. Jang HS, Jeon JY (2016) Acoustic characterization of on-stage performers in performing spaces. Appl Acoust 114:159–170
15. Kyon DH, Kim MS, Bae MJ (2013) Current status and problems in mastering of sound volume in TV news and commercials. Int J Multimedia Ubiquitous Eng 8(3):399–406
16. Iannace G, Ciaburro G, Trematerra A (2020) The acoustics of the holy family church in Salerno. Can Acoust 48(1)
17. Moore BC, Glasberg BR, Stone MA (2003) Why are commercials so loud? Perception and modeling of the loudness of amplitude-compressed speech. J Audio Eng Soc 51(12):1123–1132
18. Malecki P, Wiciak J (2010) Sound pressure level analysis of commercials and regular programs. Acta Phys Pol A 118(1):118–122
19. Florentine M (2011) Loudness. In: Loudness. Springer, New York, pp 1–15
20. Campbell W, Paterson J, Toulson R (2010) The effect of dynamic range compression on the psychoacoustic quality and loudness of commercial music. pp 2580–2588
21. Iannace G, Ciaburro G (2021) Modelling sound absorption properties for recycled polyethylene terephthalate-based material using Gaussian regression. Building Acoustics 28(2):185–196
22. Puccinelli NM, Wilcox K, Grewal D (2015) Consumers' response to commercials: when the energy level in the commercial conflicts with the media context. J Mark 79(2):1–18
23. Ciaburro G, Iannace G, Lombardi I, Trematerra A (2020) Acoustic design of ancient buildings: the odea of Pompeii and Posillipo. Buildings 10(12):224
24. Ciaburro G (2021) Security systems for smart cities based on acoustic sensors and machine learning applications. In: Machine intelligence and data analytics for sustainable future smart cities. Springer, Cham, pp 369–393
25. Renz T, Leistner P, Liebl A (2019) Use of energy-equivalent sound pressure levels and percentile level differences to assess the impact of speech on cognitive performance and annoyance perception. Appl Acoust 153:71–77
26. Sukaj S, Ciaburro G, Iannace G, Lombardi I, Trematerra A (2021) The acoustics of the Benevento Roman theatre. Buildings 11(5):212
27. Puyana-Romero V, Cueto JL, Gey R (2020) A 3D GIS tool for the detection of noise hot-spots from major roads. Transp Res Part D Transp Environ 84:102376
28. Ciaburro G (2020) Sound event detection in underground parking garage using convolutional neural network. Big Data Cogn Comput 4(3):20
29. Emmett J (2003) Audio levels in the new world of digital systems. EBU technical review
30. Núñez-Solano D, Puyana-Romero V, Ordóñez-Andrade C, Bravo-Moncayo L, Garzón-Pico C (2019) Impulse response simulation of a small room and in situ measurements validation. In: Audio Engineering Society Convention 147. Audio Engineering Society
31. Ciaburro G, Iannace G, Puyana-Romero V, Trematerra A (2020) A comparison between numerical simulation models for the prediction of acoustic behavior of giant reeds shredded. Appl Sci 10(19):6881
32. Yoon SG (1993) The role of music in television commercials: the effects of familiarity with and feelings toward background music on attention, attitude, and evaluation of the brand. Doctoral dissertation, University of Georgia


33. Ciaburro G, Iannace G (2022) Membrane-type acoustic metamaterial using cork sheets and attached masses based on reused materials. Appl Acoust 189:108605
34. Grimm E, Skovenborg E, Spikofski G (2010) Determining an optimal gated loudness measurement for TV sound normalization. In: Audio Engineering Society Convention 128. Audio Engineering Society
35. Yeung AWK (2021) Brain responses to watching food commercials compared with nonfood commercials: a meta-analysis on neuroimaging studies. Public Health Nutr 24(8):2153–2160
36. Ciaburro G, Iannace G (2021) Modeling acoustic metamaterials based on reused buttons using data fitting with neural network. J Acoust Soc Am 150(1):51–63
37. Spaleniak P, Kostek B (2012) Automatic analysis system of TV commercial emission level. In: 2012 Joint conference new trends in audio & video and signal processing: algorithms, architectures, arrangements and applications (NTAV/SPA). IEEE, pp 65–69
38. Iannace G, Bravo-Moncayo L, Ciaburro G, Puyana-Romero V, Trematerra A (2019) The use of green materials for the acoustic correction of rooms. In: INTER-NOISE and NOISE-CON Congress and Conference Proceedings, vol 259, no 7. Institute of Noise Control Engineering, pp 2589–2597
39. Nomura T, Mitsukura Y (2015) EEG-based detection of TV commercials effects. Procedia Comput Sci 60:131–140
40. Iannace G, Berardi U, Ciaburro G, Trematerra A (2020) Egg cartons used as sound absorbing systems. In: INTER-NOISE and NOISE-CON Congress and Conference Proceedings, vol 261, no 6. Institute of Noise Control Engineering, pp 405–412
41. Angell R, Gorton M, Sauer J, Bottomley P, White J (2016) Don't distract me when I'm media multitasking: toward a theory for raising advertising recall and recognition. J Advert 45(2):198–210
42. Iannace G, Ali M, Berardi U, Ciaburro G, Alabdulkarem A, Nuhait A, Al-Salem K (2020) Development and characterization of sound-absorbing materials produced from agricultural wastes in Saudi Arabia. In: INTER-NOISE and NOISE-CON Congress and Conference Proceedings, vol 261, no 5. Institute of Noise Control Engineering, pp 1806–1812
43. Malik H, Farid H (2010) Audio forensics from acoustic reverberation. In: 2010 IEEE international conference on acoustics, speech and signal processing. IEEE, pp 1710–1713
44. Khanum N, Shareef A, Khanam F (2015) The effects of animation in TV commercials on information recall. Acad Res Int 6(3):349–358
45. Van der Goot MJ, van Reijmersdal EA, Kleemans M (2015) Age differences in recall and liking of arousing television commercials. Communications 40(3):295–317

Potential Customers Prediction in Bank Telemarketing

Le Dinh Huynh, Phung Thai Duong, Khuat Duy Bach, and Phan Duy Hung

Abstract Data mining plays a vital role in the success of direct marketing campaigns by predicting which leads will subscribe to a term deposit. This study applies practical mining methods to data from a Portuguese banking institution's direct marketing campaigns (phone calls). The following algorithms are used to classify potential customers for long-term deposit finance products: K-nearest neighbor, logistic regression, linear support vector machines, and extreme gradient boosting. Response coding is used to vectorize categorical data while solving the machine learning classification problem. Accuracy and AUC scores are the key metrics used to evaluate performance. We inherited the selection of important features from previous research. This paper employs an improved method that combines response coding techniques with practical algorithms on an unbalanced dataset. The best prediction model achieved 91.07% accuracy and a 0.9324 AUC score, significantly higher than the prior results of 79% and 0.8, respectively.

Keywords Bank telemarketing · Data mining · Response coding · K-nearest neighbor classifier

1 Introduction

In the finance sector, marketing is a tool that helps commercial banks effectively distribute and use the money of individual and corporate customers.

L. D. Huynh · P. T. Duong · K. D. Bach · P. D. Hung (B), Computer Science Department, FPT University, Hanoi, Vietnam. e-mail: [email protected]
L. D. Huynh e-mail: [email protected]
P. T. Duong e-mail: [email protected]
K. D. Bach e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552. https://doi.org/10.1007/978-981-19-6634-7_4


The goal can be achieved through both indirect marketing (mass marketing) and direct marketing (one-to-one contact) [1]. Direct marketing has been shown to yield better results than mass marketing, since it provides organizations with better interaction with both current and prospective customers [2]. Telemarketing is a form of direct marketing, and in the modern economy it still plays a significant role. With the development of technology, telemarketing can be conducted face-to-face or through formal calls at a low fee, while at the same time saving resources (time, cost) by not targeting people who are not potential customers. Our study applies machine learning techniques to classify potential customers through their personal information and preferences. To make it clear, we treat the responses of customers as a binary classification problem: the class "yes" denotes customers who have an interest in a term deposit, and "no" represents customers who do not.

According to research by Ghoddusi et al. [3], from 2005 to 2018, more than 130 studies were published that applied machine learning to finance. This study aims to apply machine learning techniques to predict potential customers for selling banking finance products through telemarketing. The data in this study consist of the 20 most relevant features extracted from the 150 in the original data from a Portuguese retail bank, collected between 2008 and 2013 [4].

In most customer datasets collected by banks, the customers who agree to use credit packages usually account for a very small fraction. As a result, these datasets are imbalanced. This issue is important in model evaluation, as shown in the study by Miguéis et al. [5] and mentioned in a research paper by Zhang et al. [6]. Thus, one measure of effectiveness is the AUC, which is independent of class frequency and of specific false-positive/negative rates, as noted by Martens et al. [7].

A common approach is to use data mining. Moro et al. [4] used a dataset of 52,944 customer calls with 150 attributes obtained from a Portuguese retail bank between 2008 and 2013. Applying semi-automatic feature selection to reduce the data to 22 features and comparing the results of four data mining models (logistic regression, decision trees, neural network, and support vector machine), the best result obtained was AUC = 0.8 with the neural network model. A similar study combining data mining and a decision tree model by Amponsah and Pabbi [8] gave very good results, with a ROC value of 0.925. Also addressing the bank customer classification problem, Kozak and Juszczuk [9] presented a new model based on the ant colony decision forest algorithm, described in detail in the study of Boryczka and Kozak [10], obtaining a result of 0.6436.

In direct marketing datasets, a very common characteristic that greatly affects the results of learning models is imbalanced data. Ghatasheh et al. [11] presented an approach to minimize the impact of imbalance using the meta-cost multilayer perceptron method and the cost-sensitive multilayer perceptron method, achieving results of 78.93% and 73.17%, respectively. Another approach is to solve small problems within a large problem. Research by Moro et al. [12], on 1915 inbound contacts from the total dataset of 52,944 contacts, provides a divide-and-conquer procedure utilizing both data-based sensitivity analysis for extracting feature relevance and expert assessment


for splitting the problem of classifying telemarketing contacts to offer bank deposit products, obtaining AUC = 0.9247. For bank telemarketing datasets, highly accurate studies often deal with only a small part of the problem, for example, predicting only inbound contacts [12] or using a small part of the data with few features [8]. The remaining studies using large datasets (over 40,000 records) have not achieved really good results. In this paper, we focus on:

• Mining in unbalanced data.
• Encoding categorical features.
• Calibrating models to acquire the best performance.

The structure of the rest of this paper is as follows: Sect. 2 provides an overview of the dataset, followed by the data preprocessing steps. The machine learning models and their performance are explained in Sect. 3. The conclusion and perspectives are presented in Sect. 4.

2 Dataset and Preprocessing

2.1 Data Description

In our research, the dataset is very close to the dataset in Moro et al. [4]. This dataset was provided by a Portuguese retail bank and published on the University of California Irvine (UCI) Web site for research purposes. The dataset was collected between 2008 and 2013, a period that includes the negative effect of the global financial crisis. It contains 41,188 phone contacts with the 20 most important features selected from the original data provided by the Portuguese retail bank. Although the selected features have different levels of relative importance, they are a must-have to answer all the business questions that a successful telemarketing result demands. The marketing campaigns were conducted through phone calls, in which customers were asked whether or not they were interested in a bank financial product (a bank term deposit).

A difference of our dataset compared to the study of Moro et al. [4] is that we use the feature "duration." This feature has proven to be an important factor influencing the prediction results [13]. This also makes a lot of sense, given that customers will be more inclined to prolong the call if they are interested in making a bank deposit.

In real life, collected data can be messy and not always ideal for training purposes. Before conducting the machine learning process, it is necessary to analyze the data distribution. As Fig. 1 shows, the number of rejected calls is about eight times the number of successful calls (88.7% and 11.3% for "no" and "yes" records, respectively). Following the imbalance of the given data, we decided to use the area under the receiver operating characteristic curve (AUC) score. If accuracy were used as the metric, machine learning models could produce very high accuracy during training and testing. With real-life data, however, the


Fig. 1 Target prediction distribution

deployed model can perform very poorly, as it tends to predict the majority label "0" correctly ("naive behavior") far more often than the label "1." The false-positive rate (FPR) and true-positive rate (TPR) are therefore used: only when the ROC curve, built from the TPR against the FPR, lies above the random diagonal line can we judge whether our work is efficient.
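As a minimal illustration of this judgement, the snippet below computes the ROC points and the AUC with scikit-learn; y_true and y_score are small placeholder arrays standing in for the test labels and the predicted probabilities of the models described in Sect. 3.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# placeholder arrays standing in for real model outputs
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])   # imbalanced labels
y_score = np.array([0.1, 0.2, 0.1, 0.3, 0.2, 0.4, 0.1, 0.3, 0.8, 0.6])

fpr, tpr, _ = roc_curve(y_true, y_score)
print("AUC:", auc(fpr, tpr))   # 0.5 corresponds to the random diagonal line
```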

2.2 Data Correlation

The correlation in the dataset also plays an important role in this research, in case some features are not related. A positive correlation shows that variables increase or decrease in parallel; in contrast, a negative correlation shows that one variable increases while the other decreases. Figure 2 indicates that the employment variation rate (emp.var.rate), consumer price index (cons.price.idx), Euribor 3 month rate (euribor3m), and number of employees (nr.employed) are the most correlated. This is evidence that the dataset not only meets all the major business demands but also meets the requirements of a good data source.
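For reference, a correlation matrix like the one in Fig. 2 can be obtained in a few lines with pandas; the column names below follow the public UCI bank-marketing file, and the local file path is an assumption.

```python
import pandas as pd

# the public UCI bank-marketing file; the local path is an assumption
df = pd.read_csv("bank-additional-full.csv", sep=";")

economic = ["emp.var.rate", "cons.price.idx", "euribor3m", "nr.employed"]
print(df[economic].corr().round(2))   # pairwise Pearson correlations
```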

2.3 Category Data Encoding

This research applies models that can only deal with numeric data types. All duplicated records were removed from the data, reducing the records from 41,188 to 41,176, and there are no missing values, so the dataset size is still very suitable for the training stage.

Response coding is a technique for representing categorical data. The original idea of the technique is to represent the probability that a data point belongs to a class, given the value of the categorical feature. For a K-class classification problem, K new features are created, holding the probability of each class for the data point's categorical value. Laplace smoothing has been shown to perform well in text categorization by Zhou et al. [14], and in a study by He and Ding [15] it improves accuracy


Fig. 2 Data correlation

when applied to text classifiers such as Naive Bayes. Laplace smoothing is included to avoid zero probability.
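The following is a minimal sketch of response coding with Laplace smoothing for the binary case used here; the function name, the alpha parameter and the toy column values are illustrative assumptions rather than the authors' exact implementation.

```python
import pandas as pd

def response_coding(train, feature, target, alpha=1.0):
    """Map each category value to smoothed class probabilities.

    For a binary target in {0, 1}, each value of `feature` yields two
    new features, P(y=0 | value) and P(y=1 | value), with Laplace
    smoothing `alpha` so that no combination ever gets probability 0.
    """
    classes = sorted(train[target].unique())
    table = {}
    for value, group in train.groupby(feature):
        counts = group[target].value_counts()
        total = len(group) + alpha * len(classes)
        table[value] = [(counts.get(c, 0) + alpha) / total for c in classes]
    return table

# toy usage with an illustrative 'job' column
toy = pd.DataFrame({"job": ["admin", "admin", "blue-collar", "admin"],
                    "y":   [1, 0, 0, 0]})
print(response_coding(toy, "job", "y"))
# {'admin': [0.6, 0.4], 'blue-collar': [0.666..., 0.333...]}
```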

3 Experimental and Result

3.1 Data Mining Models

After the data preprocessing steps, the bank telemarketing dataset was cleared of duplicate records and obtained a final size of 41,176. With the categorical data encoding, the number of features increased to 30. We investigated and tested


methods from supervised and unsupervised learning algorithms for the labeled classification problem. The research team prefers an advanced algorithm that parses the information, learns from it, and uses those learnings to find significant patterns of interest. Algorithms can make accurate decisions on their own, but other decisions may require human participation, such as with neural networks, which are not employed at this time. Four typical and effective classifiers are used: K-nearest neighbors (KNN), logistic regression (LR), linear support vector machines (Linear SVM), and XGBoost (XGB). The performance results are shown in Sect. 3.2.

In this study, a noteworthy problem is data imbalance, with 88.7% negative and 11.3% positive records. Predictive models can be overconfident when predicting probabilities, and with this imbalanced dataset the predictors are likely to favor the majority class. Because of this, besides using the ROC curve and AUC score to evaluate model performance, it is necessary to calibrate the probability predictions. To avoid observable data bias, we divided the data into train, cross-validation, and test sets with proportions of [0.45, 0.22, 0.33], in that order. The train set is learned and then calibrated with CalibratedClassifierCV, using the cross-validation set and the "sigmoid" method, which is similar to Platt's method.

Tuning hyperparameters for each chosen algorithm is very important; it helped us understand the data better and explain it to the bank business team if needed. For example, for the KNN algorithm, we found that setting the parameter k (the number of neighbors) greater than 5 gives a better result than 3, in line with Abdelmoula's credit scoring research [16].
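Below is a minimal scikit-learn sketch of this split-train-calibrate pipeline. The synthetic data, the value n_neighbors=7 and the random seeds are illustrative assumptions; cv="prefit" calibrates an already-fitted model on the held-out cross-validation set (newer scikit-learn releases replace this option with a frozen-estimator wrapper).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import roc_auc_score

# stand-in data with the paper's shape and class balance (88.7% / 11.3%);
# in practice X, y would be the response-coded features and labels
X, y = make_classification(n_samples=41176, n_features=30,
                           weights=[0.887], random_state=0)

# 0.45 / 0.22 / 0.33 split into train, cross-validation and test sets
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.45, stratify=y, random_state=0)
X_cv, X_test, y_cv, y_test = train_test_split(
    X_rest, y_rest, test_size=0.6, stratify=y_rest, random_state=0)

knn = KNeighborsClassifier(n_neighbors=7)   # an illustrative k > 5
knn.fit(X_train, y_train)

# sigmoid (Platt-style) calibration of the prefit model on the CV set
calibrated = CalibratedClassifierCV(knn, method="sigmoid", cv="prefit")
calibrated.fit(X_cv, y_cv)

probs = calibrated.predict_proba(X_test)[:, 1]
print("test AUC:", roc_auc_score(y_test, probs))
```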

3.2 Result Table 1 shows the experimental results compared with the study by Moro et al. [4, 12]—who introduced the first bank telemarketing dataset with four machine learning models: logistic regression (LR), decision tree (DT), support vector machine (SVM), and neural network (NN). Table 1 AUC score comparison between authors and Moro et al.

Method

Train AUC Test AUC Cross-validation AUC

LR

0.9229

0.9233

0.9237

Linear SVM 0.8853

0.8878

0.8945

KNN

0.9405

0.9295

0.9324

XGBoost

0.9283

0.925

0.9247

LR [4]

0.715

DTs [4]

0.757

SVM [4]

0.767

NN [4]

0.794


Fig. 3 Best experimental result: a AUC score and b confusion matrix

The previous study produced a best result of AUC = 0.794 (NN). In [4, 12], the authors did not use the "duration" feature, even though they noted that "duration" might have a positive effect on the result. In this research, we extend the idea by using the "duration" feature to improve the results, and we obtained markedly better results than previous research. According to the results in Table 1 and Fig. 3, the AUC score of the KNN model is the best, with train AUC = 0.9405, test AUC = 0.9295 and cross-validation AUC = 0.9324. The correct prediction rate of bank telemarketing failure, TNR = 0.9688 (Fig. 3b), is very high. Meanwhile, the correct prediction rate of success, TPR, is quite low, only 0.4372. This happens because of the imbalance in the dataset, with the number of failure records being very large (88.7%). However, the overall prediction accuracy is still quite high, at 91.07%.

4 Conclusion

This research demonstrates how machine learning techniques can make a considerable impact on the result of a telemarketing campaign. There are two major steps: data preprocessing and model evaluation. The first step involves cleaning the data by removing duplicate records, checking for missing values to remove, visualizing the data to check the imbalance of the dataset, and applying the response coding technique to encode categorical features with the help of Laplace smoothing. Moreover, adding the "duration" feature also greatly affects the final result. In the second step, typical efficient algorithms (KNN, LR, linear SVM, and XGBoost) were chosen to determine the best classifier model. Given the bias of the dataset, the area under the receiver operating characteristic curve is observed to judge the success of the research. KNN is the best method, with a 93% AUC and 91.07% accuracy. The experimental results show that the best KNN, with k greater than five, can be useful for interpreting the business point of view. The paper can be a good reference for many machine learning problems [17, 18].


References

1. Ling CX, Li C (1998) Data mining for direct marketing: problems and solutions. In: Proceedings of the international conference on knowledge discovery from data (KDD 98). New York City, pp 73–79
2. Elsalamony HA, Elsayad AM (2013) Bank direct marketing based on neural network. Int J Eng Adv Technol (IJEAT) 2(6):2249–8958
3. Ghoddusi H, Creamer GG, Rafizadeh N (2019) Machine learning in energy economics and finance: a review. Energy Econ 81:709–727. https://doi.org/10.1016/j.eneco.2019.05.006
4. Moro S, Cortez P, Rita P (2014) A data-driven approach to predict the success of bank telemarketing. Decis Support Syst 62:22–31. https://doi.org/10.1016/j.dss.2014.03.001
5. Miguéis VL, Camanho AS, Borges J (2017) Predicting direct marketing response in banking: comparison of class imbalance methods. Serv Bus 11(4):831–849. https://doi.org/10.1007/s11628-016-0332-3
6. Zhang X, Li X, Feng Y, Liu Z (2015) The use of ROC and AUC in the validation of objective image fusion evaluation metrics. Sign Process 115:38–48. https://doi.org/10.1016/j.sigpro.2015.03.007
7. Martens D, Vanthienen J, Verbeke W, Baesens B (2011) Performance of classification models from a user perspective. Decis Support Syst 51(4):782–793. https://doi.org/10.1016/j.dss.2011.01.013
8. Amponsah AA, Pabbi KA (2016) Enhancing direct marketing using data mining: a case of Yaa Asantewaa Rural Bank Ltd. in Ghana. Int J Comput Appl 153(7):6–12. https://doi.org/10.5120/ijca2016912092
9. Kozak J, Juszczuk P (2018) The ACDF algorithm in the stream data analysis for the bank telemarketing campaign. In: 2018 5th International conference on soft computing and machine intelligence (ISCMI), pp 49–53. https://doi.org/10.1109/iscmi.2018.8703246
10. Boryczka U, Kozak J (2012) Ant colony decision forest meta-ensemble. In: Computational collective intelligence. Technologies and applications, pp 473–482. https://doi.org/10.1007/978-3-642-34707-8_48
11. Ghatasheh N, Faris H, AlTaharwa I, Harb Y, Harb A (2020) Business analytics in telemarketing: cost-sensitive analysis of bank campaigns using artificial neural networks. Appl Sci 10(7):2581. https://doi.org/10.3390/app10072581
12. Moro S, Cortez P, Rita P (2017) A divide-and-conquer strategy using feature relevance and expert knowledge for enhancing a data mining approach to bank telemarketing. Expert Syst 35(3). https://doi.org/10.1111/exsy.12253
13. Moro S, Laureano R, Cortez P (2012) Enhancing bank direct marketing through data mining. In: Proceedings of the forty-first international conference of the European marketing academy. European Marketing Academy, pp 1–8
14. Zhou S, Li K, Liu Y (2009) Text categorization based on topic model. Int J Comput Intell Syst 2(4):398–409. https://doi.org/10.1080/18756891.2009.9727671
15. He F, Ding X (2007) Improving naive bayes text classifier using smoothing methods. In: Lecture notes in computer science, vol 4425, pp 703–707. https://doi.org/10.1007/978-3-540-71496-5_73
16. Abdelmoula AK (2015) Bank credit risk analysis with k-nearest-neighbor classifier: case of Tunisian banks. Account Manage Inform Syst 14(1):79–106
17. Luu NT, Hung PD (2021) Loan default prediction using artificial intelligence for the borrow–lend collaboration. In: Luo Y (ed) Cooperative design, visualization, and engineering, CDVE 2021. Lecture notes in computer science, vol 12983. Springer, Cham. https://doi.org/10.1007/978-3-030-88207-5_26
18. Hung PD, Thinh TQ (2019) Cryptocurrencies price index prediction using neural networks on bittrex exchange. In: Dang T, Küng J, Takizawa M, Bui S (eds) Future data and security engineering, FDSE 2019. Lecture notes in computer science, vol 11814. Springer, Cham. https://doi.org/10.1007/978-3-030-35653-8_43

Analysis and Implementation of Normalisation Techniques on KDD’99 Data Set for IDS and IPS

V. Priyalakshmi and R. Devi

Abstract With the rapid expansion of the Internet, numerous types of network attacks have emerged, making the ability to identify aberrant behaviour and accurately recognise attack categories an essential research topic in the field of network security, addressed with the help of Intrusion Detection Systems and Intrusion Prevention Systems. Many popular machine learning-based approaches have recently been used to build data-driven models for the intrusion detection system (IDS). These methods can help save time and money by reducing the amount of manual detection required. However, real-time network data contain a plethora of duplicated records and noise, and some present intrusion detection methods have low accuracy and weak feature extraction capabilities. To solve the above problems, this research work proposes a new machine learning algorithm to prevent intrusion. This paper presents the pre-processing work for the final ML method, which includes data cleaning and normalisation. To perform normalisation, this paper compares alternative normalisation algorithms and implements the chosen normalisation approach on the KDD’99 data set.

Keywords IDS · ML · Normalisation · KDD’99

1 Introduction

Computer network security [1] refers to the steps taken by companies and organisations to monitor and prevent illegal access by outsiders. The methods used for network security depend upon the scale of the computer network. For example, home computer use needs only basic network security, while large organisations need to keep their networks safe from attackers. Cyber-attacks are becoming more sophisticated, posing greater challenges in correctly identifying breaches. The guarantees of security providers, such as the availability, integrity and secrecy of data, may be jeopardised in the absence of intrusion prevention.

V. Priyalakshmi (B) · R. Devi, Department of Computer Science, VISTAS, Chennai, India. e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552. https://doi.org/10.1007/978-981-19-6634-7_5


The emergence of malicious applications (malware) poses a significant threat to IDS design. Malicious attacks have become more complex, and the biggest risk is identifying unknown and obfuscated software, since malware writers employ a variety of evasion techniques to disguise details and escape detection by intrusion detection systems (IDS) [2]. The network administrator is in charge of the network's data and software; a user ID and password are assigned to each authorised person by the network administrator. Considering all of the conceivable sorts of attacks, constructing a dependable network is a difficult undertaking.

Computer networks and their services are widely used in industry, business and virtually everywhere. Intruder [3] attacks are a big worry for security personnel and anyone who is responsible for protecting networks and their users. An IDS can detect types of malicious network traffic and computer usage that a traditional firewall cannot. This covers network assaults on weak services, data file attacks, and host-based attacks such as privilege escalation, unlawful logins, access to sensitive data, and malware. For users, protection from harmful external influences is essential for their systems. The firewall mechanism is one of the most often utilised network security strategies, and it is employed, together with IDS, in domains such as medical applications, credit cards, insurance agencies and communication [4].

Machine learning is an artificial intelligence discipline in which machines can learn how to solve a problem on their own by being given access to the relevant data. ML enables machines to independently perform computational activities that humans have historically performed using complex statistical and mathematical methodologies. This idea of automating complicated procedures has sparked a surge of interest in networking, with the hope that many of the operations involved in the design and running of communication networks will be delegated to machines. Implementations of machine learning in many networking areas have already met these standards in fields such as cognitive radio, traffic classification and intrusion detection.

Machine learning is a technique for training algorithms on data, and deep learning is a type of ML inspired by the structure of the human brain [5]. Deep learning algorithms have made significant progress in a variety of applications, resulting in high-performing security solutions. Deep learning is thought to be the best option for identifying complex structure in highly dimensional data by employing backpropagation methods. It is a subset of machine learning that deals with AI, the discipline that allows machines to replicate human behaviour. The underlying network construction is called an artificial neural network [6].

2 Intrusion Detection System An intrusion detection system is a network security management system: a system that analyses data at numerous locations in a computer or network to find suspected security breaches, such as intrusion or misuse (attacks from inside the organisation).


Fig. 1 IDS

Intrusion detection employs vulnerability assessment (scanning), a technology developed to test the security of a network or computer system, as shown in Fig. 1 [6].

2.1 Taxonomy of IDS There are two main types of intrusion detection systems (IDSs): HIDS and NIDS [7]. In a host-based IDS, agents are attached to designated devices; the agents track and log data in order to create files and trigger alarms. A network-based IDS typically comprises a network program (or sensor) that monitors the interface separately in promiscuous mode; the IDS is applied to a network segment or boundary. In order to construct more effective hybrid systems, intrusion detection now often combines host and network information. Host-Based Intrusion Detection System (HIDS) An intrusion detection system for a single host device is referred to as a HIDS. Data are collected from a single host. System integrity, program behaviour, file alterations and traffic on the host network/security logs are all tracked by the HIDS agent. To report on the current condition of the local host, the agent uses typical sources such as file timestamps, system logs, monitored system calls and the local network interface. If unauthorised changes or activity are found, the user is alerted through a pop-up, the central management server is notified, or both, and the offending actions are blocked; the outcome is determined by the policy of the local system. Network-Based IDS (NIDS) A network-based intrusion detection system analyses and monitors network traffic in order to protect the network against network-borne data threats. A NIDS aids in the detection of unusual activities such as DoS attacks, port scans and traffic-monitoring attacks. A NIDS consists of one or more packet-inspecting sensors, one or more management servers and one or more human-interface management consoles. The NIDS analyses packet traffic in real time to discover patterns of intrusion. The analysis of intrusion detection patterns is carried out


on the sensors, on the servers or on a combination of these two devices; the active components of these networked procedures are taken into account. Hybrid IDS The current trend is to combine HIDS and NIDS in various forms to create hybrid systems. The adaptability of a hybrid IDS adds a further layer of security. It incorporates IDS sensor placements as well as data from individual segments or the network as a whole.

2.2 Intrusion Detection Methodologies Several approaches are currently employed to realise the desired features of an IDS. There are two popular approaches for detecting intrusions: Anomaly Detection (AD) An anomaly-based IDS tries to find deviations from normal system behaviour. Anomalies are detected against a baseline of audit data collected during normal operations. Anomaly detection is critical for spotting fraud, network interference and other troubling events that are important but difficult to detect, and it is crucial because of the role anomalous data play in operational data across a wide range of applications. Because it models variations in user behaviour, AD is also known as behaviour-based detection [7]. Misuse Detection (MD) A misuse-based IDS examines specific traffic and compares activity against rule-based systems to detect attacks, for example by matching signature patterns. Because alerts are based on specific signatures of attacks, MD is also known as signature-based detection. These signatures describe traffic or behaviour that is known to be intrusive [8, 9].

2.3 IDS and Their Functions An IDS has four crucial functions: data collection, feature selection, analysis and action.


2.4 Data Sets for IDS DARPA This data set was created for system security research. The tasks included in DARPA are file transfer and receipt via FTP, email transmission and receipt via SMTP and POP3, web surfing, signing into remote computers via Telnet and doing work there, sending and receiving IRC messages, and remotely monitoring the router via SNMP. KDD'99 The KDD Cup 1999 data set was constructed by analysing a section of the TCPDUMP 1998 DARPA data collection. Neptune (DoS), Pod (DoS), Rootkit, Satan and TearDrop are among the attacks represented in KDD'99. Kyoto Honeypots were used to construct this data set, and there is no human labelling or anonymisation; therefore, only honeypot-directed attacks can be observed. It offers 10 additional features, such as IDS detection, malware detection and Ashula detection, that are useful in the study and analysis of NIDS. Twente Network packet trace files and wireless traces are included in this data set, which was developed from a single TCP attack-request scenario. NSL-KDD This is the benchmark data set for intrusion detection. Based on the DARPA data set [10, 11], NSL-KDD has 43 attributes (including the class label) and 147,907 samples.

3 Approaches of Machine Learning Machine learning is a branch of computer science and mathematics that studies the theory, outcomes and properties of learning systems and algorithms. It draws on optimisation theory, artificial intelligence, psychology, knowledge processing, optimal control, cognitive science and other areas of research, science and mathematics; it is a highly interdisciplinary subject. Machine learning is now used in a wide range of applications and has infiltrated practically every sector of research, with a profound impact on technology and society. Recommendation engines, recognition procedures, IT and data mining, and autonomous testing systems are among the challenges that have been addressed. The field of machine learning is commonly divided into three sub-domains: supervised, unsupervised and reinforcement learning, as shown in Fig. 2.


Fig. 2 Machine learning approach

3.1 Decision Tree The DT approach is used to build a classifier that can predict the target class value of an unseen test case. There are numerous ways to classify an unseen instance with a DT. The decision tree is a popular single classifier because of its simplicity and easy implementation. A DT comes in two variants: a classification tree for a set of symbolic class labels, and a regression tree for a set of numerically valued labels.

3.2 Naive Bayes Naive Bayes (NB) treats the attributes as conditionally independent given the class label and estimates the class probabilities on that basis. Naive Bayes also obtains good results in the classification of simple relations. It requires only one scan of the training data, making classification much easier. Figure 3 shows the block diagram of the Naive Bayes classifier.


Fig. 3 Naive Bayes classifier

The classifier selects the category cat with the highest posterior probability for a document doc, computed via Bayes' rule: P(cat | doc) = P(doc | cat) P(cat) / P(doc).

3.3 K-nearest Neighbour KNN employs a variety of distance measures. Given a test sample, KNN finds the k closest samples among the training data and assigns the test sample the class label that dominates among them. KNN is the simplest, nonparametric solution for classifying samples; it is a lazy, case-based learner rather than an inductive learner.

3.4 Artificial Neural Network An ANN is a pattern recognition system inspired by the functioning of the human brain. NNs are usually organised into layers, each made up of a number of interconnected nodes with an activation mechanism. Patterns are presented to the network at the input layer, and the real processing is performed through communication with one or more hidden layers.

3.5 Support Vector Machines The SVM was created in the mid-1990s. In the IDS setting, the objective of the SVM is essentially to learn from the training data a common class of normal objects, distinguishing non-attack traffic, while the remaining data are treated as anomalies. The classifier generated by the SVM technique separates the input space into a region where regular behaviour lies and its complement. Through the SVM technique, as shown in Fig. 4, IDS feature classification is effective [12].

3.6 Fuzzy Logic In two-valued logic, truth values can only be false (0) or true (1) for reasoning purposes; in fuzzy logic, this restriction is removed. In FL, this means that the truth


Fig. 4 Support vector machines

Fig. 5 Fuzzy logic inference system

degree of a statement may lie anywhere between 0 and 1, not only at "0" or "1". Figure 5 shows an architectural diagram of fuzzy logic.

4 Proposed Research Work Users' accounts, network resources, passwords and personal information are all safeguarded in some way by network administrators and security officers. Network attacks may be carried out in two ways: (i) preventing people from accessing a network service and (ii) infringing on the privacy of others. DoS attacks are among the most popular forms of attack; they target network resources and make network services unavailable to customers. To achieve the aim of making the network unavailable to its users, an intruder can choose among a number of different DoS attacks,


each with a unique pattern of behaviour for consuming network resources. In a remote-to-local (R2L) attack, an attacker sends a swarm of data packets to a target computer or server without having authorisation to join as a local user. Another kind of attack is known as "User to Root" (U2R), in which an attacker starts out with the access rights of an ordinary user and, after many attempts, succeeds in gaining root access to network resources [10]. To gain future unrestricted access to personal information, an attacker may employ probing, a technique that involves checking network equipment for flaws in the topology design or for open ports. Nmap, portsweep and ipsweep are examples of network probing tools. With an IDS we can detect any kind of attack at an early stage; it is therefore an important component of a computer network that must be protected against all types of intruder attacks. An IDS uses classification algorithms to determine whether a packet passing across the network is normal or an attack. Online repository data sets such as KDD contain all sorts of intruder attacks, including DOS, R2L, U2R and PROBE. In this research, the KDD data set is assessed with a number of classifiers. The approach used in this study is to first perform a pre-processing step on the KDD data set, then apply the prepared data set in a fair, resource-controlled environment, and finally determine which classifier is more accurate than the others in identifying all of the attacks investigated. This research work detects and prevents intrusions using machine learning algorithms, as shown in Fig. 6. This concept underlies the implementation of ML algorithms [11, 13, 14] to detect and prevent intrusions. Figure 6 shows the workflow of the anomaly network-based intrusion detection system using an improved glowworm swarm optimisation algorithm with a transductive support vector machine. This paper proposes pre-processing of the KDD'99 data set, analyses different types of data normalisation techniques and implements the best normalisation technique on the KDD'99 data set.

Fig. 6 Research process flow


5 Normalisation Normalisation is the process of transforming features onto a similar, comparable scale. This enhances the training stability and the performance of the model.

5.1 Quick Overview of Normalisation Techniques There are five common normalising procedures that may be useful:
• Scaling to a range
• Clipping
• Log scaling
• Min–max
• Z-score.

Scaling to a Range Scaling is a technique used to convert floating point feature values from their natural range to a standard range, as described in MLCC. Typically, the scaling range is 0 to 1, although it may also be −1 to 1. To scale to a range, use the following basic formula:

x' = (x − x_min)/(x_max − x_min)  (1)

Scaling to a range is a viable option when both of the following conditions are met: few or no outliers are present, and the data are roughly evenly distributed. Age is an example: the majority of people's ages fall between 0 and 90, and each part of the spectrum has a sizeable population. On the other hand, scaling would not be used for income, because only a few people have a really high income; the linear scale would be stretched towards the higher end of the income range, with most individuals crammed into a tiny piece of it. Feature Clipping Feature clipping is used when the data set contains extreme outliers: it caps all feature values above (or below) a specified threshold at a fixed value. For example, all temperature readings above 40 may be clipped to exactly 40. Feature clipping can be used before or after other normalisation techniques. Clipping by the z-score method to ±N (for example, limiting to ±3) is another basic clipping approach.
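To make Eq. (1) and the clipping rule concrete, here is a minimal NumPy sketch; the temperature values and the threshold of 40 are hypothetical, not taken from the paper.

```python
import numpy as np

temps = np.array([12.0, 25.0, 31.0, 38.0, 95.0])  # hypothetical readings; 95 is an outlier

# Feature clipping: cap every value above the threshold at the threshold itself
clipped = np.clip(temps, None, 40.0)

# Scaling to a range: map the clipped values linearly onto [0, 1] using Eq. (1)
scaled = (clipped - clipped.min()) / (clipped.max() - clipped.min())

print(clipped)  # [12. 25. 31. 38. 40.]
print(scaled)   # [0.    0.464 0.678 0.928 1.   ] (rounded)
```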


Log Scaling To compress a wide range into a narrower one, log scaling computes the log of the data:

x' = log(x)  (2)

Log scaling comes in handy when a few values have many data points while the rest of the values have very few. This type of data distribution is known as a power-law distribution. Movie reviews are a good example: the majority of movies have just a few ratings (the data in the tail), whereas a few have a great many (the data in the head). The distribution can be altered by log scaling, which aids the performance of linear models. Min–Max Min–max normalisation is a method that performs linear adjustments on the original data to generate values that remain comparable before and after the operation [15, 16]. The following formula is used in this technique [17]:

X_new = (X − min(X))/(max(X) − min(X))  (3)

where X_new = the new value derived from the normalised data, X = old value, max(X) = maximum value in the data set and min(X) = minimum value in the data set. Z-Score The z-score method is a form of scaling that expresses a value as the number of standard deviations it lies from the mean. Z-score normalisation transforms the feature distribution so that its mean and standard deviation become zero and one, respectively. It is a useful tool when there are only a few outliers, not enough to warrant clipping, and it comes in handy when the actual minimum and maximum values of the data are unknown. Z-score normalisation is based on the data's mean and standard deviation [18, 19]. The formula for determining the z-score of a point x is:

x' = (x − μ)/σ  (4)


6 Literature Review Liu and Zhang [19] Data pre-processing, feature self-learning and classification are the three functional modules that make up their network intrusion detection model based on a CNN. Because it employs pre-processed original sample data sets, this model, trained with convolutional neural networks, converges substantially faster. Noel et al. [20] These findings offer a systematic comparative analysis that highlights the differences between, and simulates the effects of, eight quantitative data standardisation methods, assisting researchers in selecting appropriate data transformation procedures for their studies. Lippmann et al. [21] This research explained the attack strategies, examined the tasks of verifying each warning, recognising the attack activities and reacting to an attack, all of which are useful in lowering susceptibility to similar attacks, and produced long-term records of attack-related events. Lippmann et al. conducted a thorough analysis of 58 different types of attacks against DARPA 1999. Kumar et al. [22] This study examines the distribution and arrangement of the KDD Cup'99 data set. The data set is described in detail in this statistical analysis; the characteristics of network traffic data and their relevance to behaviour and intrusion detection system performance are examined. Dharamvir and Arul Kumar [23] This study provides a thorough examination of the various data normalisation strategies that may be applied to the KDD CUP'99 data set, whose attributes span a wide range of values. When non-uniform data are fed to pattern recognition algorithms, the result is biased output that relies on only a subset of the features. Min–max and z-score approaches were demonstrated on the data set, and the outcomes were within the expected range.

7 Proposed Work The goal of this work is to apply a normalisation strategy to the KDD Cup'99 data set. According to the results of the aforementioned investigation, z-score produces better accuracy than other standardisation procedures. The normalisation process is depicted in Fig. 7.


Fig. 7 Normalisation process flow: Data Gathering → Data Cleaning → Data Analysis → Data Normalisation (encode_numeric_zscore / encode_text_dummy) → Normalised KDD Cup '99

8 Research Methodology This study relied on the NSL-KDD Cup'99 data set. There are 494,021 rows and 42 features in the KDD'99 10% data set. The 'outcome' feature carries the attack-type information. This work detects intrusions belonging to the attacks listed in Table 1, which shows the different types of attacks.


Table 1 Different types of attacks

Attack category           Attack type
DoS (Denial-of-service)   Neptune, land, pod, smurf, teardrop, back, worm, udpstorm, processtable and apache2
Probe                     Ipsweep, satan, nmap, portsweep, mscan, saint
R2L                       Imap, multihop, phf, spy, warezclient, warezmaster, snmpguess, named, xlock, xsnoop, snmpgetattack, httptunnel, sendmail
U2R                       Buffer overflow, loadmodule, perl, rootkit, ps, xterm, sqlattack
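As a hedged sketch of this preparation step (the local file name is an assumption; the raw 10% KDD'99 file ships without a header row), the loading and cleaning described in this paper might look as follows in pandas:

```python
import pandas as pd

# Hypothetical file name for the 10% KDD'99 subset (no header row in the raw file)
df = pd.read_csv("kddcup.data_10_percent", header=None)
print(df.shape)                             # (494021, 42): 41 features + attack label

df = df.rename(columns={41: "outcome"})     # the last column carries the attack type
df = df.dropna().drop_duplicates()          # cleaning steps described later in the paper

print(df["outcome"].value_counts().head())  # distribution of attack labels
```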

8.1 Evaluation Metrics
Detection rate: the proportion of true positives/true negatives relative to the total number of incidents.
Sensitivity: the percentage of actual positives that are correctly classified.
Specificity: the percentage of actual negatives that are correctly classified.
Error: the lowest error achieved during the training iterations.
Epochs: the total number of iterations completed.
Time: the time taken to finish the training process, in minutes.

9 Result and Discussion The accuracy of a classification technique is determined by evaluating the data mining classification algorithm. The classification algorithm uses two types of data: training data and testing data. Training data are used to learn a pattern and develop a classification model; testing data are then used to probe the model. The efficiency and effectiveness of the classification algorithm are assessed with a confusion matrix, in which the decisions made during training and testing are presented and the results are classified as true or false items. With the min–max normalisation technique, as shown in Fig. 8, the greatest accuracy is 97% at k = 5 and k = 21, while the lowest accuracy is 95% at k = 1, k = 7, k = 9 and k = 27. The data set is then transformed again with a new normalising approach, z-score normalisation, whose formula is given in Eq. (4): to compute the z-score, the mean and standard deviation of the attribute data are processed. The z-score normalisation approach has its maximum accuracy of 98% at k = 5 and k = 15, and its lowest accuracy of 96% at k = 1, k = 13, k = 21, k = 23, k = 25 and k = 27, as shown in Fig. 9. According to the tests in this study, the z-score normalisation approach has a consistent accuracy of 96–98%.
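A sketch of how such a k-sweep could be reproduced with scikit-learn; `X` and `y` denote the prepared KDD'99 numeric features and labels and are assumed from the earlier steps:

```python
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.metrics import accuracy_score

# X, y: numeric feature matrix and attack labels prepared from the KDD'99 frame
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

for name, scaler in [("min-max", MinMaxScaler()), ("z-score", StandardScaler())]:
    Xtr = scaler.fit_transform(X_train)   # fit the scaler on training data only
    Xte = scaler.transform(X_test)        # reuse the same parameters on the test data
    for k in [1, 5, 15, 21, 27]:          # a few of the k values from Figs. 8 and 9
        knn = KNeighborsClassifier(n_neighbors=k).fit(Xtr, y_train)
        print(name, k, accuracy_score(y_test, knn.predict(Xte)))
```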


Fig. 8 Min–max normalisation method’s accuracy

Fig. 9 Accuracy in the z-score normalisation method

9.1 Comparison of Normalisation Techniques According to the results shown in Figs. 10, 11, 12, 13, 14 and 15 and the discussion above, z-score normalisation is more accurate than min–max normalisation. The comparison of the normalisation techniques is summarised in Table 2. As a result, z-score normalisation is used for standardising KDD Cup '99.

10 Z-Score Implementation on KDD CUP '99 The standardised outputs of the proposed approach are shown in Table 3. Python is used for pre-processing and data cleansing. This study proposes ML-based intrusion detection and prevention; standardisation must be completed before the ML algorithm is applied. According to the aforementioned analysis, the z-score is more accurate than other normalisations.

Fig. 10 Detection rate comparison: min–max 79.84%, z-score 84.66%
Fig. 11 Sensitivity comparison: min–max 79.68%, z-score 84.405%
Fig. 12 Specificity comparison: min–max 79.92%, z-score 84.745%
Fig. 13 Error comparison: min–max 0.000136%, z-score 0.000112%

11 Conclusion The data set has been pre-processed by removing "na" and duplicate values. Z-score normalisation is used to normalise the KDD'99 10% data set. Compared to other normalisation algorithms, this normalisation process provides greater


Fig. 14 Epoch comparison: min–max 144.8, z-score 118
Fig. 15 Time comparison: min–max 20 min, z-score 18 min

Table 2 Comparison of normalisation techniques

Normalisation method   Detection rate (%)   Sensitivity (%)   Specificity (%)   Error (%)   Epochs   Time (min)
Min–max                79.84                79.68             79.92             0.000136    144.8    20
Z-score                84.66                84.405            84.745            0.000112    118      18

accuracy, as demonstrated in the study. According to the analysis, it has an accuracy of 96–98%. The next step is to extract features in order to implement the ML algorithm.

Table 3 Standardised output (first five rows of the z-score-normalised KDD'99 data, covering the columns duration, src_bytes, dst_bytes, wrong_fragment, urgent and hot together with the one-hot encoded land-0/1, logged_in-0/1 and is_guest_login-0/1 indicator columns)


References 1. Dretheyk-Ossowicka A (2020) A survey of neural networks used for intrusion detection systems. J Ambient Intell Humaniz Comput 2. Sultana N, Chilamkurti N, Peng W, Alhadad R (2019) Survey on SDN based network intrusion detection system using machine learning approaches. Peer-to-Peer Netw Appl 12(2):493–501 3. Khraisat A (2019) Survey of intrusion detection systems: techniques, datasets, and challenges 4. Musumeci F et al (2019) An overview on application of machine learning techniques in optical networks. IEEE Commun Surv 21(2):1383–1408. https://doi.org/10.1109/COMST.2018.2880039 5. Pouyanfar S et al (2019) A survey on deep learning. ACM Comput Surv 51(5):1–36. https://doi.org/10.1145/3234150 6. Huang GB, Wang DH, Lan Y (2011) Extreme learning machines: a survey. Springer 7. Byun H, Lee S-W (2002) Applications of support vector machines for pattern recognition: a survey. Springer-Verlag, Berlin Heidelberg 8. Manasa PRMV, Patil SB (2012) A survey on intrusion detection system, 1(4):928–932 9. Huy N, Deokjai C (2008) Application of data mining to network intrusion detection: classifier selection model. In: Challenges for next generation network operations and service management, pp 399–408 10. Hachimi M, Kaddoum G, Gagnon G, Illy P (2020) Multistage jamming attacks detection using deep learning combined with kernelized support vector machine in 5G cloud radio access networks. In: Proceedings of the 2020 international symposium on networks, computers and communications (ISNCC). IEEE, Montreal, pp 1–5 11. Abhale AB, Manivannan S (2020) Supervised machine learning classification algorithmic approach for finding anomaly type of intrusion detection in wireless sensor network. Opt Mem Neural Netw 29(3):244–256 12. Xie W, She Y, Guo Q (2021) Research on multiple classification based on improved SVM algorithm for balanced binary decision tree, 2021:11. Article ID 5560465 13. Mahbooba B, Timilsina M, Sahal R, Serrano M (2021) Explainable artificial intelligence (XAI) to enhance trust management in intrusion detection systems using decision tree model. Complexity 2021:11. Article ID 6634811 14. Al-rawahnaa ASM, Yahya A, Al B (2020) Data mining for education sector, a proposed concept. J Appl Data Sci 1(1):1–10 15. Saranya C, Manikandan G (2013) A study on normalization techniques for privacy preserving data mining. Int J Eng Technol 5(3):2701–2704 16. Ribaric S, Fratric I (2006) Experimental evaluation of matching-score normalization techniques on different multimodal biometric systems. Proc Mediterr Electrotech Conf MELECON 2006:498–501. https://doi.org/10.1109/melcon.2006.1653147 17. Patro SGK, Sahu KK (2015) Normalization: a preprocessing stage. IARJSET 20–22. https://doi.org/10.17148/iarjset.2015.2305 18. Jain A, Nandakumar K, Ross A (2005) Score normalization in multimodal biometric systems. Pattern Recognit 38(12):2270–2285. https://doi.org/10.1016/j.patcog.2005.01.012 19. Liu G, Zhang J (2020) CNID: research of network intrusion detection based on convolutional neural network. Discrete Dyn Nat Soc 2020:11. Article ID 4705982. https://doi.org/10.1155/2020/4705982 20. Noel DD, Justin KGA, Alphonse AK, Désiré LH, Dramane D, Nafan D, Malerba G (2021) Normality assessment of several quantitative data transformation procedures. Biostatistics Biometrics Open Access J 10(3). https://doi.org/10.19080/BBOAJ.2021.10.555786


21. Lippmann R, Haines JW, Fried DJ, Korba J, Das K (2000) The 1999 DARPA off-line intrusion detection evaluation. Comput Netw 34(4):579–595 22. Kumar S, Sunanda, Arora S (2020) A statistical analysis on KDD Cup'99 dataset for the network intrusion detection system. In: Applied soft computing and communication networks, lecture notes in networks and systems 125. https://doi.org/10.1007/978-981-15-3852-0_9 23. Dharamvir, Arul Kumar V (2020) Data normalization techniques on intrusion detection for dataset applications. Int J Adv Sci Technol 29(7):5083–5093

Deep Neural Networks Predicting Student Performance Kandula Neha, Ram Kumar, S. Jahangeer Sidiq, and Majid Zaman

Abstract In recent years, deep learning applied to educational data has received considerable attention. In this paper, a neural network (NN) model predicts which category a student belongs to. This provides the institution with knowledge so that potentially failing students can be helped in time. The model is compared with an existing expert system algorithm using the same dataset. The proposed model achieves greater precision and exceeds the accuracy of other machine learning algorithms. Keywords Deep neural network (DNN) · Deep learning · Artificial neural network (ANN) · Education data mining

1 Introduction The academic performance of students has always been an important factor in determining a student's career and prestige. Education data mining (EDM) is a discipline that draws significant knowledge from educational data. EDM applications such as model development can help predict students' academic performance. Thus, researchers are led to look at new data mining methods in order to improve the present practice. Machine learning techniques that predict student performance based on student history and term examination results have proven helpful in predicting various K. Neha (B) · R. Kumar · S. Jahangeer Sidiq Lovely Professional University, Phagwara, Punjab, India e-mail: [email protected] R. Kumar e-mail: [email protected] S. Jahangeer Sidiq e-mail: [email protected] M. Zaman University of Kashmir, Srinagar, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_6


degrees of performance. Such machine learning techniques can be used to directly predict students who are highly likely to fail, so that an instructor can provide the student with a remedy. They can even help identify high-quality students and support awarding them a bursary. Algorithms such as decision tree [1] and Naive Bayes [2] are widely used in education data mining. Such algorithms are restricted: as Havan Agrawal [3] states, when the inputs to the Bayesian classifier come from a continuous range, the accuracy of the models is reduced; this kind of sorting works better with discrete data. A neural network, in contrast, also works when continuous data are provided. Deep learning is seen as a state-of-the-art [4–7] research tool used in various applications of artificial intelligence research. Deep learning may be classified into deep neural networks (DNN), recurrent neural networks (RNN), convolutional neural networks (CNN) and Q-learning. Deep learning has recently been utilised for voice/sound recognition [8], natural language processing [9] and computer vision [10]. In this paper, we propose a deep neural network (DNN) classification model to predict the performance of students. The proposed NN model is designed to predict whether students fall into a failure category or pass, through a logistic classification analysis [11, 12]. The proposed model predicts failure of students with an estimated 85% accuracy.

2 Methodology 2.1 Dataset and Data Processing This section explains the steps taken to extract the predictors and clarifies the training procedure and the parameters for the classifiers. Dataset link: https://www.kaggle.com/aljarah/xAPI-Edu-Data. Name: Academic Dataset of Students. Data from the above link are used to construct the proposed deep neural network for performance prediction. This is an educational dataset from a learning management system known as Kalboard 360. There are 500 student records in the dataset, with 16 different characteristics (see Table 1). The dataset is divided into 3 classes based on intervals of numeric values (see Table 2, Classes—intervals and labels). Data preprocessing following data collection is needed to improve dataset quality. Selection of data attributes, cleaning of data, transformation and reduction of data are all part of data preprocessing. As part of the knowledge discovery process, some values were missing from the 500 records in different features; all records with missing values are removed, leaving 480 records after data cleaning. Data transformation is then applied to the dataset. The Gender, Relation, Semester, Parent Answering Survey, Parent School Satisfaction and Student Absence Days attributes of nominal data type are translated into '0' and '1' binary data.


Table 1 Student dataset

Name                                                                     Data type   Distinct values
Gender, parent responsibility, answering and satisfaction,              Nominal     2
student absent day, semester
Nationality and place of birth                                           Nominal     14
Stages and section ID                                                    Nominal     3
Grades and topics                                                        Nominal     12
Raised hand, visited resource, viewing announcements and group           Numeric     0–100
discussions

Table 2 Classes—intervals and labels

Interval values   Class label
0–69              Low
70–89             Middle
90–100            High

The remaining attributes (Nationality, Birth Place, Stage ID, Grade, Section and Topic) are transformed into numerical values [13].
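A minimal pandas sketch of this cleaning and encoding step (the column spellings follow the Kaggle copy of the dataset and may differ slightly):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("xAPI-Edu-Data.csv")  # the Kalboard 360 dataset from Kaggle
df = df.dropna()                        # data cleaning: 500 -> 480 records

binary_cols = ["gender", "Relation", "Semester", "ParentAnsweringSurvey",
               "ParentschoolSatisfaction", "StudentAbsenceDays"]
for col in binary_cols:                 # two-valued nominal columns -> 0/1
    df[col] = LabelEncoder().fit_transform(df[col])

for col in ["NationalITy", "PlaceofBirth", "StageID", "GradeID",
            "SectionID", "Topic"]:      # multi-valued nominal -> numeric codes
    df[col] = LabelEncoder().fit_transform(df[col])
```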

2.2 Deep Neural Network Deep learning methods aim at learning attribute hierarchies, with higher-level attributes formed by the combination of lower-level features, and include multiple methods for building higher and deeper architectures. A DNN is a class of NN model that contains an input layer, an arbitrary number of hidden layers and an output layer [14, 15]. The layers consist of neurons that are analogous to the neurons in the human brain. A neuron is a nonlinear function f that maps the input vector {I_1, …, I_n} with weight vector {w_1, …, w_n} to the output Y; such a network is also called feed-forward:

Y = f(Σ_i w_i I_i) = f(wᵀI)  (1)

The model's objective is to optimise the weights w to reduce the squared loss error. This can be done by stochastic gradient descent (SGD). SGD iteratively updates the weight vector, with the aim of driving the loss function towards its minimum along the gradient. For a sigmoid output Y with target t and learning rate η, the SGD update equation is:

w_new = w_old − η · (Y − t) · Y(1 − Y) · I  (2)
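A toy NumPy rendering of the update in Eq. (2) for a single sigmoid neuron; the input vector, target and learning rate are made-up values for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
I = rng.random(4)        # one input vector with 4 features (illustrative)
t = 1.0                  # target output
w = rng.random(4)        # randomly initialised weights
eta = 0.5                # learning rate

for epoch in range(50):  # each epoch: forward pass plus gradient update, Eq. (2)
    Y = sigmoid(w @ I)
    w = w - eta * (Y - t) * Y * (1.0 - Y) * I
```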


An epoch is one pass of forward and backward propagation; each epoch helps to reduce the cost. In a deep neural network this is iterated n times while the gradients are updated and optimised. A deep neural network (DNN) is a deep learning architecture that allows working with models consisting of several hidden processing layers to learn different representations of multi-level abstract data. Deep learning has a notable ability to auto-learn and adapt, and to correctly address complicated real-world problems. In this paper, we present a linear classification model of the DNN to predict students' performance. This method is applied once the dataset has been preprocessed by data cleansing and data transformation. Python 3 and TensorFlow 1.3.0 are used to construct the DNN model. Python is a full-featured, general-purpose programming language. It is a mature and rapidly growing platform for scientific and numerical research. Python is home to several open-source libraries and almost all standard machine learning libraries, which can be used for deep learning models. This ecosystem has produced the two biggest libraries for numerical analysis used for deep learning, TensorFlow and Theano. TensorFlow is an open-source library that uses data flow graphs for numerical computation. The data flow graph is also referred to as a static computation graph. A developer first needs to define the input layer and then connect the input layer through the hidden layers to the output layer (see Fig. 1). The graphs are composed of tensors and ops that define the entire neural network and all the mathematics; a session then allows the graph to run. TensorFlow comes with a Graphical Processing Unit package that makes every matrix calculation more efficient and faster. Once the records have been preprocessed, the data are divided into training and testing parts with a ratio of 3:1 (train/test). The features and classes in the training dataset are separated and stored in TensorFlow placeholders. The category data of both data sets are one-hot encoded (see Table 3), a method in which classification variables are converted to a numerical structure that the deep neural network model can use to predict successfully [16]. Two hidden layers with 300 neurons each and an epoch count of 50 are defined. We then construct the dataflow graph, with random weight initialisation w and bias b for every layer (input, hidden and output). The matrix multiplication of the first hidden Fig. 1 Structure of neural network in predicting student performance

Table 3 Encoding format

Classes   One-hot encoding format
Low       [1, 0, 0]
Middle    [0, 1, 0]
High      [0, 0, 1]

layer is passed to a rectified linear unit, known as the ReLU activation, with input x, to which all the neurons of the first hidden layer are connected:

f(x) = max(x, 0)  (3)

All neurons in the next hidden layer, activated by the ReLU activation function, are again multiplied with the next layer's weight matrix. In the second hidden layer, the activation function called Softmax is applied to the resulting matrix. The Softmax function squashes the output into a categorical probability distribution that indicates the class probability of the output, σ(z)_j = e^{z_j} / Σ_k e^{z_k}, where z is the input vector to the output layer and j is the output unit index. The output is then passed to the cost function, in which it is compared with the actual output. The cost function returns the error, and this error is passed on to an optimisation function, the TensorFlow optimizer. The optimisation function updates the layer weights to reduce the error value of the cost function. The static computation graph is activated once the data flow graph has been created: a TensorFlow session is instantiated and the data inputs are passed into its run function. This model defines an epoch count of 50, so the computation graph is iterated 50 times to give greater accuracy [1].
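For illustration, a minimal sketch of the described architecture in the modern Keras API rather than the TensorFlow 1.3 graph-and-session style used in the paper; `X_train`, `y_train`, `X_test` and `y_test` are assumed to be the one-hot-encoded 3:1 split from above:

```python
import tensorflow as tf

# Two hidden layers of 300 ReLU neurons, softmax over the 3 classes of Table 3
model = tf.keras.Sequential([
    tf.keras.layers.Dense(300, activation="relu"),   # first hidden layer, Eq. (3)
    tf.keras.layers.Dense(300, activation="relu"),   # second hidden layer
    tf.keras.layers.Dense(3, activation="softmax"),  # Low / Middle / High
])
model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=50, validation_data=(X_test, y_test))
```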

3 Results and Discussions The DNN was run with tools such as Python 3.7 and TensorFlow. To visualise the inner workings of the model, TensorBoard and the matplotlib library were used. The model took 5 min to run the program (see Fig. 2). In our experiments, we used two measures, the cost function and accuracy, to evaluate the quality of the classifier [9]: the objective for accuracy is a high value, while the cost function is to be minimised. We also ran various algorithms such as decision tree, Naive Bayes and an artificial neural network on the same dataset; these models were implemented in the Weka tool with tenfold cross-validation. The final classification accuracy is recorded and compared with the proposed model. Here we upload the EDU dataset and load the student data. The dataset consists of 480 records, split into 384 training records and 96


Fig. 2 Consignment model for prediction

test records. All 7 algorithms are then run to calculate their prediction accuracy and generate a trained model (see Fig. 3). In the result above, we run each algorithm and obtain the accuracy of all 7. The neural network has the highest accuracy of 89.06% (see Table 4). In this application, random test data are used, so the accuracy can vary with each run. Next, check the 'accuracy graph' (see Fig. 4). In this graph, the x-axis shows the algorithm names and the y-axis the accuracy of the algorithms. We can conclude from the graph that the NN algorithm has the greatest accuracy. Now click the 'Upload & Predict New Student Performance' button and upload the test dataset; the trained neural network model then predicts student performance. The performance value for each test dataset record is predicted as HIGH, LOW or MEDIUM (see Fig. 5).

4 Conclusion This paper proposes a deep neural network model to predict student performance. This is the first time a DNN is used for this kind of data mining and student performance prediction. Through the experiment, we found that a DNN can perform better even with less data by developing a profound understanding of the data set and


Fig. 3 Accuracy generated by the algorithms

Table 4 Accuracy of different models

KNN                   61.45
Naïve Bayes           67.70
Decision tree         70.83
Random forest         75.0
Logistic regression   73.95
SVM                   55.20
Neural network        89.06

by fine-tuning the model. The proposed model achieved 89.06% accuracy. A DNN can achieve greater accuracy with larger datasets and more features, and it outperforms the other machine learning algorithms. This paper assesses the performance of different machine learning algorithms and neural networks in forecasting student performance. Compared to other machine learning algorithms, such as KNN, Naïve Bayes, SVM, random forest and decision tree, the neural network has better predictive accuracy. This model is reliable and can help predict the performance of a student and identify students who are more likely to fail, early enough to find remedies.


Fig. 4 Accuracy graph of algorithms

Fig. 5 Performance prediction of test dataset


References 1. Wang W, Miao C, Yu H (2017) Deep model for dropout prediction in MOOCs. In: The proceedings of the 2nd international crowd science and engineering conference (ICCSE 2017). Beijing, pp 26–32 2. Yadav SK, Bharadwaj B, Pal S (2012) Applications for mining data: comparative studies for predicting student performance. arXiv preprint arXiv:1202.4815 3. Ng AY (2004) Feature selection, L1 vs. L2 regularization, and rotational invariance. In: The 21st international conference on machine learning (ICML 2004). Banff, pp 78–85 4. Ashraf M, Zaman M, Ahmed M, Sidiq SJ (2017) Knowledge discovery in academia: a survey on related literature. Int J Adv Res Comput Sci 8(1) 5. Rudin C, Letham B, Kogan E, Madigan D, Salleb-Aouissi A (2011) Prediction of sequential events by association rules. In: The 24th annual learning theory conference (COLT 2011), pp 615–634 6. Seaton D, Whitehill J, Mohan K, Rosen Y, Tingley D (2017) Delving deeper into MOOC student dropout prediction. arXiv preprint arXiv:1702.06404 7. Mi F, Yeung DY (2015) Temporal models for predicting dropout in massive open online courses. In: The international data mining workshop (ICDMW 2015) IEEE 15th proceedings. Atlantic City, pp 256–263 8. Ducharme R, Bengio Y, Vincent P (2001) A neural probabilistic language model. In: Advances in neural information processing systems 13 (NIPS 2000), pp 932–938 9. Sebastian S, Puthiyidam JJ (2015) Evaluating the performance of artificial neural networks using WEKA. Int Comput Appl J 119(23):36–39 10. Keeler JD, Leow WK, Rumelhart DE (1991) Integrated segmentation and recognition of hand-printed numerals. In: Advances in neural information processing systems 3 (NIPS 1990), pp 557–563 11. Graves A, Schmidhuber J (2005) Phoneme classification with bidirectional LSTM networks. In: International joint conference on neural networks (IJCNN’05) 2005, pp 23–43 12. Vihavainen A, Luukkainen M, Kurhila J (2013) Using student programming behaviour to predict success in an introductory course on mathematics. Educational data mining 2013 13. Sidiq SJ, Zaman M, Butt M (2018) An empirical comparison of classifiers for multi-class imbalance learning. Int J Data Min Emerg Technol 8(1):115–122 14. Pedro MO, Baker R, Bowers A, Heffernan N (2013) Predicting college enrolment from student interaction with an intelligent middle school teaching system. Educational data mining 15. Neha K, Sidiq SJ (2020) Analysis of student academic performance through expert systems. Int Res J Adv Sci Hub 2:48–54 16. Neha K (2021) A study on prediction of student academic performance based on expert systems. Turk J Comput Math Educ (TURCOMAT) 12(7):1483–1488

An Efficient Group Signature Scheme Based on ECDLP Namita Tiwari, Amit Virmani, and Ashutosh Tripathi

Abstract In today's digital world, group signatures play an important role in various security settings in institutions, government organisations, etc. A group signature is applicable when a digital document must be signed anonymously by an authorised group member on behalf of the whole group. We propose a group signature scheme in the identity-based setting. Its security relies on the elliptic curve discrete logarithm problem (ECDLP). Our proposal is the first such scheme in the ECDLP setting and is much more efficient in terms of computational complexity. It is applicable in all environments where low bandwidth is required, such as blockchain architectures and blockchain-based mobile-edge computing (BMEC). Keywords Group signature · Elliptic curves · Public key cryptography

1 Introduction 'Source authentication' can be achieved by digital signatures in cryptography, and researchers have proposed many digital signature schemes for this purpose. A method allowing any member of a group to sign on behalf of the group anonymously is called a group signature scheme (GSS). This concept was first introduced by Chaum [1] in 1991. A well-defined group signature must have the following main properties:

N. Tiwari (B) School of Sciences, CSJM University Kanpur, Kanpur, India e-mail: [email protected] A. Virmani Computer Application, UIET Kanpur, Kanpur, India e-mail: [email protected] A. Tripathi T Systems, Pune, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_7


(a) If a signature is requested on a message m, any one group member can sign m. (b) The owner of the message m can verify that the signature on m was generated by the group, but does not learn the actual signer. (c) If a dispute arises later, only the group manager (GM) can reveal the identity of the signer. Group signatures are applicable in e-voting, e-auctions, blockchain architectures, blockchain-based mobile-edge computing (BMEC) [2], etc. Motivated by these applications, we describe a further application, document verification by an admission committee, in the application section. Electronic cash systems use GSS to hide the identities of cash-issuing banks [3], as does identity escrow [4]. There are several group signature schemes [5–9], etc., in the literature; these works are benchmarks for the theory and applications of group signatures. Along the way, many improvements and new group signature schemes [10–24] have also been proposed. Recently, many interesting GSS [25, 26] in different security environments have appeared, but there is always scope for new ideas and improvements. There are still many limitations, and improvements such as smaller key sizes can make existing schemes much more efficient. Thus motivated, we propose an efficient GSS in the elliptic curve setting. The proposed scheme satisfies all the security requirements, namely correctness, unforgeability, anonymity, unlinkability, exculpability and traceability. To the best of our knowledge, this is the first GSS whose security rests on the ECDLP [27, 28]. Roadmap: Sect. 2 covers the mathematical foundation. The GSS is proposed in Sect. 3. Its security properties are demonstrated in Sect. 4. Applications are discussed in Sect. 5. Section 6 concludes with some future directions.

2 Preliminaries 2.1 Background of Elliptic Curve Group Tiwari and Padhye [29]: consider a field Fq (q prime); 'EC/Fq' denotes an elliptic curve EC over Fq given by z² = y³ + ay + b, with a, b ∈ Fq and discriminant Δ = 4a³ + 27b² ≠ 0. The points of EC/Fq together with the point ∞ (the point at infinity) form a group G' = {(y, z) : y, z ∈ Fq, EC(y, z) = 0} ∪ {∞}. Consider a cyclic subgroup G of G' of order n with generator P under the addition operation "+" [29].


2.2 ECDLP Assumption Given P and Q = xP, where x is chosen at random from Zn*, it is assumed to be computationally intractable to find x in polynomial time when only P and Q are known.

3 Proposed GS Scheme Let U = {U0, U1, U2, …, Un} be a group of signers who can create signatures on behalf of the group U, where U0 is the group manager (GM) and C acts as a trusted KGC.

3.1 KGC Taking k as the security parameter, the KGC produces the system parameters {Fq, EC/Fq, G, P, H, H1}, where H : {0, 1}* → Zn* and H1 : {0, 1}* × G × G → Zn* are hash functions. This algorithm is performed by the GM U0 with the help of the KGC. (a) chooses x at random from Zn* (b) computes Ppub = xP (c) picks IDi (e.g., email ids or other ids related to group members) and computes Qi = H(IDi), 0 ≤ i ≤ n. Qi is the public key of a group member, so that the real identity cannot be revealed by verifiers. U0 publishes the group public key as pk = {P, Ppub, Qi}, i = 0, 1, …, n. Here, (x, Ppub) is the master secret and public key pair of the group manager U0.

3.2 Extract If any user Ui wishes to become a member of the signer group, Ui communicates with the GM U0 over a secure channel.

(a) Ui sends IDi to U0. (b) U0 computes ski = xQi mod n and sends it to Ui (secure channel). (c) Ui computes bi = ski H1(ski, Qi, Ppub) mod n and Bi = bi P. (d) Ui sends Bi to the group manager U0.


(e) U0 holds (Qi, ski) for every authorised group member Ui. Note that ski is held only by the group manager U0 and the user Ui. (f) Ui becomes an authorised member of the group. Ui's secret key is (ski, bi), and the public key is (Qi, Bi). All group members store this information in a smart card held privately.

3.3 GroupSign This is the group signature generation algorithm. Given a message m, Ui (a) chooses r at random from Zn* and computes R = rP H1(ski, Qi, P) + Qi P, (b) computes s = ski + H1(m, R, Bi) bi mod n. The group signature on m is {R, m, s, Bi, Ppub}, which Ui sends to the verifier.

3.4 GroupVerif The verifier or message receiver can verify the group signature by checking the following equation. The verifier takes s, Ppub, Qi, R, Bi, P and checks whether sP = Ppub Qi + H1(m, R, Bi) Bi holds. If yes, the signature is valid; otherwise it is rejected.
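To make the sign/verify algebra concrete, here is a toy end-to-end run of the scheme in pure Python over a tiny textbook curve (y² = x³ + 2x + 2 over F_17, generator of prime order 19), which is far too small for real security; the curve, the identity string, the message and the use of SHA-256 as a stand-in for both H and H1 are illustrative assumptions only, not part of the paper.

```python
import hashlib
import secrets

p, a = 17, 2          # curve y^2 = x^3 + 2x + 2 over F_17
P, n = (5, 1), 19     # generator P of prime order n
INF = None            # point at infinity

def add(A, B):        # elliptic curve point addition
    if A is INF: return B
    if B is INF: return A
    (x1, y1), (x2, y2) = A, B
    if x1 == x2 and (y1 + y2) % p == 0:
        return INF
    if A == B:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def mul(k, Q):        # double-and-add scalar multiplication
    R = INF
    k %= n
    while k:
        if k & 1:
            R = add(R, Q)
        Q = add(Q, Q)
        k >>= 1
    return R

def H(*parts):        # SHA-256 hash into Z_n*, standing in for H and H1
    d = hashlib.sha256("|".join(map(str, parts)).encode()).digest()
    return int.from_bytes(d, "big") % (n - 1) + 1

# KGC / group manager set-up
x = 7                        # master secret of U0 (toy value)
Ppub = mul(x, P)
Qi = H("alice@example")      # Q_i = H(ID_i)

# Extract: member key material
ski = x * Qi % n
bi = ski * H(ski, Qi, Ppub) % n
Bi = mul(bi, P)

# GroupSign on message m
m = "sample document"
r = secrets.randbelow(n - 1) + 1
R = add(mul(r * H(ski, Qi, P), P), mul(Qi, P))
s = (ski + H(m, R, Bi) * bi) % n

# GroupVerif: sP =? Qi*Ppub + H1(m, R, Bi)*Bi
assert mul(s, P) == add(mul(Qi, Ppub), mul(H(m, R, Bi), Bi))
print("signature verified")
```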

4 Security Analysis (1) Correctness of GSS: s = (ski + H1(m, R, Bi) bi) mod n, so sP = ski P + H1(m, R, Bi) bi P = xQi P + H1(m, R, Bi) bi P = Qi Ppub + H1(m, R, Bi) Bi. (2) Unforgeability: Only an authorised member can sign a message m on behalf of the group. Our proposed scheme satisfies this property: during verification, the verifier uses R and Bi, where R = rP H1(ski, Qi, P) + Qi P and Bi = bi P. Clearly, R


is linked with ski and Bi is linked with bi, so only an authorised group member is able to sign on behalf of the group in our scheme. (3) Anonymity: Only the GM can identify the actual signer of a given message m. Logic: since sP = Ppub Qi + H1(m, R, Bi) Bi, verification requires Ppub, Qi, H1(m, R, Bi) and Bi. Only the group manager knows Qi = H(IDi) for IDi, and no other person can determine IDi from a given H(IDi) because the hash function is not invertible. (4) Unlinkability: Clearly, Qi disappears for any two different message–signature pairs, so it is computationally hard to decide whether two different valid message–signature pairs were created by the same group signer. (5) Exculpability: During signature generation, s = ski + H1(m, R, Bi) bi mod n, every group member has its own secret bi, so no Ui can sign on behalf of Uj (i ≠ j). (6) Traceability: Bi is used in the verification equation sP = Ppub Qi + H1(m, R, Bi) Bi.

The group manager U0 can check whether Bi = Ppub Qi H1(ski, Qi, Ppub). Only U0 knows ski, and so traceability holds.

5 Application Suppose an admission committee wants to verify the documents of admission-seeking candidates digitally by signing them. For this purpose, the convenor of the committee assigns a particular member to the verification process. But sometimes the member refuses to sign documents at verification time, not wanting to be held responsible or blamed by a candidate in future for any kind of mistake in the document verification process. The problem is serious; what is the solution? The group signature is the best solution. The message owner can verify that the signature was generated by the signer group, namely the admission committee, but cannot determine the actual signer, and so cannot complain against an individual signer. If a dispute arises, it is still possible to determine who the actual signer is. No candidate seeking admission can determine which committee member verified his/her documents; the candidate knows only that one of the committee members did so. Under unavoidable circumstances, the group manager (the convenor of the committee) can identify the actual signer. Our proposed GSS can also be used in electronic voting mechanisms, electronic sales and bidding systems, corporate organisations, blockchains, etc.

6 Conclusion This paper presents the first ECDLP-based GSS. Elliptic curve cryptography makes it more efficient and secure. It can be used in practice whenever the identity of signers in the group needs to be hidden, and it is also applicable in blockchains. Our future plan is


to prove security against adaptive chosen-message attacks. We will also describe how the scheme can be used in blockchains. Further, we will add the notion of a fail-stop signature to our group signature scheme to make it more secure.

References 1. Chaum D (1991) Group signatures. In: Advances in cryptology—EUROCRYPT’91. LNCS, vol 547. Springer, Berlin, pp 257–265 2. Zhang S, Lee JH (2020) A group signature and authentication scheme for blockchain-based mobile-edge computing. IEEE Int Things J 7(5):4557–4565. https://doi.org/10.1109/JIOT.2019.2960027 3. Lysyanskaya A, Ramzan Z (1998) Group blind digital signatures: a scalable solution to electronic cash. In: Financial cryptography (FC’98). Lecture notes in computer science, vol 1465. Springer, Berlin, pp 184–197 4. Kilian J, Petrank E (1998) Identity escrow. In: Advances in cryptology—CRYPTO’98. Lecture notes in computer science, vol 1642. Springer, Berlin, pp 169–185 5. Camenisch J, Stadler M (1997) Efficient group signature schemes for large groups. In: Kaliski B (ed) Advances in cryptology—CRYPTO’97. Lecture notes in computer science, vol 1296. Springer, Berlin, pp 410–424 6. Camenisch J, Michels M (1998) A group signature with improved efficiency. In: Advances in cryptology—ASIACRYPT’98. Lecture notes in computer science, vol 1514. Springer, Berlin, pp 160–174 7. Camenisch J, Michels M (1999) Separability and efficiency for generic group signature schemes. In: Wiener M (ed) Advances in cryptology—CRYPTO’99. Lecture notes in computer science, vol 1666. Springer, Berlin, pp 413–430 8. Chen L, Pedersen TP (1995) New group signature schemes. In: De Santis A (ed) Advances in cryptology—EuroCrypt’94. Lecture notes in computer science, vol 950. Springer, Berlin, pp 171–181 9. Ateniese G, Camenisch J, Joye M, Tsudik G (2000) A practical and provably secure coalition-resistant group signature scheme. In: Bellare M (ed) Advances in cryptology—CRYPTO 2000. Lecture notes in computer science, vol 1880. Springer, Berlin, pp 255–270 10. Alamelou Q, Blazy O, Cauchie S, Gaborit P (2016) A practical group signature scheme based on rank metric. In: Arithmetic of finite fields. Springer, Berlin, pp 258–275 11. Alamelou Q, Blazy O, Cauchie S, Gaborit P (2017) A code-based group signature scheme. Des Codes Crypt 82(1–2):469–493 12. Cao Y (2019) Decentralized group signature scheme based on blockchain. In: Proceedings of the 2019 international conference on communications, information system and computer engineering (CISCE). Haikou, China 13. Jonathan JRC, Chiang YY, Hsu WH, Lin WY (2021) Fail-stop group signature scheme. In: Security and communication networks. https://doi.org/10.1155/2021/6693726 14. Devidas S, Subba Rao YV, Rukma Rekha N (2021) A decentralized group signature scheme for privacy protection in a blockchain. Int J Appl Math Comput Sci 31(2). https://doi.org/10.34768/amcs-2021-0024 15. Ezerman MF, Lee HT, Ling S, Nguyen K, Wang H (2015) A provably secure group signature scheme from code-based assumptions. In: Advances in cryptology. Springer, Berlin, pp 260–285 16. Gordon SD, Katz J, Vaikuntanathan V (2010) A group signature scheme from lattice assumptions. In: Advances in cryptology. Springer, Berlin, pp 395–412 17. Langlois A, Ling S, Nguyen K, Wang H (2014) Lattice-based group signature scheme with verifier-local revocation. In: Public-key cryptography—PKC 2014. Springer, Berlin, pp 345–361

An Efficient Group Signature Scheme Based on ECDLP

87

18. Ling S, Nguyen K, Wang H (2015) Group signatures from lattices: simpler, tighter, shorter, ring-based. In: Public-key cryptography-PKC 2015. Springer, Berlin, pp 427–449 19. Nakanishi T, Fujii H, Hira Y, Funabiki N (2009) Revocable group signature schemes with constant costs for signing and verifying. In: Public key cryptography-PKC 2009. Springer, Berlin, pp 463–480 20. Nguyen PQ, Zhang J, Zhang Z (2015) Simpler efficient group signatures from lattices. In: Public-key cryptography-PKC 2015. Springer, Berlin, pp 401–426 21. Perera MNS, Koshiba T (2018) Fully dynamic group signature scheme with member registration and verifier-local revocation. In: Mathematics and computing. Springer, Berlin 22. Wei L, Liu J (2010) Shorter verifier-local revocation group signature with backward unlinkability. In: Pairing-based cryptography. Springer, Berlin, pp 136–146 23. Wang L, Zhang K, Qian H, Chen J (2021) Group signature with verifier-local revocation based on coding theory, security and communication networks. https://doi.org/10.1155/2021/ 3259767 24. Zhou S, Lin D (2006) Shorter verifier-local revocation group signatures from bilinear maps. In: Cryptology and network security. Springer, Berlin, pp 126–143 25. Ling S, Nguyen K, Wang H, Xu Y (2017) Lattice-based group signatures: achieving full dynamicity with ease. In: Gollmann D, Miyaji A, Kikuchi H (eds) Proceedings of conference. ACNS, Kanazawa, 10–12 July 2017. Springer, Berlin, pp 293–312 26. Sun Y, Liu Y, Wu B (2019) An efficient full dynamic group signature scheme over ring. Cybersecur 2:21. https://doi.org/10.1186/s42400-019-0037-8 27. The Certicom Corporation, SEC 2: recommended elliptic curve domain parameters www.secg.org/collateral/sec2_final.pdf 28. Shamus Software Ltd., Miracl library. http://www.shamus.ie/index.php?page=home 29. Tiwari N, Padhye S (2011) Provable secure proxy signature scheme without bilinear pairings. Int J Commun Syst. https://doi.org/10.1002/dac.1367

Sentiment Analysis of COVID-19 Tweets Using TextBlob and Machine Learning Classifiers

An Evaluation to Show How COVID-19 Opinions Are Influencing the Psychological Reactions and Behavior of People on Social Media

P. Kathiravan, R. Saranya, and Sridurga Sekar

Abstract Social media are influential Internet communities where netizens disseminate information, views, and opinions extensively. Research on sentiment analysis has recently intensified owing to the vast amount of data obtained from numerous social networking platforms. During the COVID-19 pandemic, Twitter, one of the major social networking sites, has experienced a significant increase in online posts and reviews concerning COVID-19-related topics. Through Twitter, the spread of news concerning coronavirus variants has affected many aspects of public life by exposing its users to messages around different issues and opinions. In this study, we used Twitter as a data source, searching the index keywords and hashtag versions of the terms "Pandemic", "Coronavirus", "COVID-19", "SARS-CoV-2", and "Omicron", and we examined the psychological and emotional impact on the public. This paper provides analytical information about people's perceptions of coronavirus (tweets were collected from March 2020 to December 2021). Sentiment analysis was performed on these tweets using the TextBlob text-processing Python module to acquire the polarity of people's subjective data (opinions and feelings) concerning the effects of coronavirus. Furthermore, for effective text classification, we applied classification methodologies such as Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Logistic Regression (LR), and Decision Tree (DT). SVM with feature extraction produced results with 99.49% accuracy. This study assists government organizations in attaining COVID-19 insights by inferring public mental-health sentiments from social networks.

Keywords Sentiment analysis · Twitter · Coronavirus · TextBlob · COVID-19 · Omicron · Wordcloud · Pandemic

P. Kathiravan · R. Saranya (B) Department of Computer Science, Central University of Tamil Nadu, Thiruvarur, India e-mail: [email protected] S. Sekar STRAIVE, Tamil Nadu, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_8




1 Introduction

Today, social networking sites play a crucial role and have become an integral part of everyday life. Popular platforms such as Twitter, Facebook, LinkedIn, and Instagram enable the exchange of information and ideas and communication with people across the world, which has influenced human psychological traits and, in turn, people's behavior and actions to a great extent [1, 2]. People freely post their thoughts on these platforms, and these posts reflect their emotional responses and publicize their lives on social media. In view of the aggressive growth of textual data generated on the Internet, sentiment analysis has progressively become a key research area that enables businesses to take decisive actions by detecting positive or negative sentiments about their products or services [3].

In this time of the pervasive COVID-19 pandemic, which has disrupted our lives in countless ways, people have substantially used Twitter to express their opinions (sentiments) and emotions while striving to cope with toil and struggle. This Twitter-based communication medium has been flooded with a pool of COVID-related sentiments [4]. Notwithstanding the unstructured masses of text generated, the scalable nature of text analytics has made it worthy of attention in opinion mining. In this study, we analyze the following Research Questions (RQ):

RQ 1: What are the psychological reactions and discernments associated with the spread of information about COVID-19 variants (SARS-CoV-2 and Omicron) on Twitter?

RQ 2: How do COVID-19 opinions and feature-based discussions on Twitter influence people's behavior concerning vaccine sentiments, anxiety, misinformation, and panic-related tweets?

Our contribution in this study lies in perceiving people's mindsets to help government organizations and health sectors take appropriate measures and necessary arrangements in view of COVID-19 variants. We collected tweets with coronavirus-related keywords (#Coronavirus, #COVID-19, #Omicron, #Pandemic) from the Twitter application through API keys [4, 5]. We applied text preprocessing techniques such as stop word removal and stemming to the coronavirus tweets, followed by the creation and analysis of a word cloud. We then computed the polarity score of each tweet to analyze and classify the sentiments, followed by experimental evaluations using classification algorithms.

The rest of this paper is organized as follows. Section 2 discusses related works on sentiment analysis using tweets. Section 3 presents the design and analysis. Section 4 reports the findings, and Sect. 5 concludes the work.



2 Related Works

Various studies have been leveraged to identify the sentiment of public discourse, particularly on social media. Table 1 summarizes the related works.

Chiarello et al. [3] proposed a new social-media-based paradigm for assessing how new products are perceived from a technical perspective. They used a technique based on a lexicon created in a related study [6] to analyze patents and identify benefits and limitations associated with specific technologies such as the Xbox One X and the New Nintendo 3DS XL, a few of the most powerful gaming consoles on the market today. Kariya and Khodke [7] compared various sentiment analysis tools and techniques; they obtained 99.64% accuracy with the KNN classifier, significantly better than Naïve Bayes and SVM. During the epidemic, Kaur et al. [4] created a model to investigate the dynamics and flow of behavioral changes among Twitter users using natural language processing tools (TextBlob), and visualized the data to spot changing patterns.

Mandloi and Patel [8] experimented with machine learning algorithms to analyze sentiment on Twitter data. They first gathered the tweets using the Twitter API, then applied preprocessing techniques for tokenization, stop word removal, and non-English word removal. After preprocessing, feature selection was performed to make the classification efficient, followed by model building. They compared three machine learning algorithms, Naïve Bayes, Support Vector Machine (SVM), and the Maximum Entropy Method, classifying each tweet as positive, negative, or neutral. Naïve Bayes achieved 86% accuracy, SVM 74.6%, and the Maximum Entropy Method 82.6%; from this comparative analysis they concluded that Naïve Bayes gives higher accuracy than SVM and the Maximum Entropy Method.

Pokharel [5] analyzed the sentiment of tweets posted from Nepal to examine the cognition of Nepalese people. The tweets were extracted via the Twitter API based on the geolocation of Nepal over a particular period, May 21 to May 31, with Python used throughout. The tweets were preprocessed, and the TextBlob library was used to find the polarity and subjectivity of each tweet. A model was then built using the Naïve Bayes algorithm to classify the tweets. Finally, a word cloud, one of the data visualization techniques, was created, and the data were visualized for subjectivity, polarity, and positive, negative, and neutral tweets on a per-date basis.



Table 1 Related works on sentiment analysis using tweets

Author | Topic | Dataset | Observation | Size | Method
[3] | Xbox One X, New Nintendo 3DS XL | Twitter | 11 June 2017–31 July 2017 | 66,796 tweets | Novel lexicon with LSTM and SVM
[4] | COVID-19 | Twitter | February 05–11, May 21–27, and June 15–21, 2020 | 16,138 tweets | TextBlob
[8] | N/A | Twitter | N/A | 1000 tweets | Naïve Bayes classifier
[5] | COVID-19 | Twitter | 21 May 2020–31 May 2020 | 615 tweets | TextBlob
[9] | Few recent events and issues | Twitter | 25 February 2018–08 March 2018 | 7246 tweets | Naïve Bayes classifier
[10] | US presidential elections 2016 | Twitter | N/A | 150,000 tweets | TextBlob
[11] | Different topics | Twitter | N/A | Various sizes | Voting ensemble classifier
[12] | Movie rating | Large Movie Review Dataset (LMRD) | N/A | 50,000 reviews | SentiWordNet with Naïve Bayes
[13] | Election, 2 general topics, movie review | Twitter | N/A | 7086; 1,578,627; 5513; 25,000 tweets | Ensemble classifier
[14] | Airline | Twitter | N/A | N/A | BoW with SVM

This approach has some limitations: it applies only to the English language, and because specifying a location is optional when posting on Twitter, tweets from users who did not mention a location could not be fetched.

Sailunaz and Alhajj [9] recommended machine learning techniques for emotion and sentiment analysis on Twitter text data and developed general and personalized recommendation systems for social network communities. They compared three machine learning algorithms: Naïve Bayes, Support Vector Machine (SVM), and Random Forest. Naïve Bayes gave 66.68% accuracy, SVM 23.32%, and Random Forest 55.23%, leading them to conclude that Naïve Bayes gave the highest accuracy among the tested algorithms. Their sentiment analysis is the first step toward building a recommendation system for social networks. The limitations of this work are that the system is static, with a minimal range of data within a fixed period, and that the framework handles only simple, properly spelled words; real tweets contain many short forms such as abbreviations and emoticons, which the proposed system cannot handle.

Nausheen and Begum [10] extracted 150,000 tweets regarding Donald Trump, Hillary Clinton, and Bernie Sanders, the top election candidates, and tried to find people's opinions about them. Although many ML algorithms are available, they used the Python library TextBlob, which does not require labeled data and is easy to implement. TextBlob was used to find the polarity score of each tweet and to calculate the average proportions of positive, negative, and neutral emotions in the collected tweets. They concluded that Hillary Clinton was the most popular candidate with the highest polarity score, and they used a word cloud to visualize their results.

Saleena and Ankit [11] experimented with a novel Twitter sentiment analysis using ensemble techniques. They used Bag of Words (BoW) for text vectorization, a technique that transforms text data into numerical, machine-readable data by counting the most frequent words; its limitation is high sparsity, which increases the dimensionality of the vector [15, 16]. They then built an ensemble classifier combining Naïve Bayes, Random Forest, Support Vector Machine (SVM), and Logistic Regression, whose accuracy is significantly higher than that of the individual machine learning algorithms.

Goel et al. [12] combined SentiWordNet with Naïve Bayes to develop a new sentiment analysis system for movie reviews, using the Large Movie Review Dataset (LMRD) with 50,000 reviews. SentiWordNet is a lexicon database of 16 million English words used to find a word's positivity, negativity, and objectivity scores. The Naïve Bayes classifier is a fast and simple ML algorithm, and together with SentiWordNet it gave 58.40% accuracy in classifying the movie reviews.

Kanakaraj and Guddeti [13] focused on comparing various boosting and bagging ensemble methods to design a novel system for Twitter sentiment analysis. They worked with different Twitter datasets: (a) an election dataset with 7086 classified tweets, (b) two general Twitter datasets with 1,578,627 and 5513 classified tweets, and (c) a movie review dataset of 25,000 tweets. They used WordNet synsets to capture the semantic similarity of each tweet. Finally, they compared the ensemble methods against various machine learning algorithms such as SVM, Baseline, MaxEnt, and Naïve Bayes, concluding that ensemble methods with Extremely Randomized Trees classification perform better than the other methods at classifying data into positive, negative, and neutral tweets. The model of Abdelrahman and Saad [14], tested on a dataset containing tweets about six different airlines in the United States, achieved 83.31% accuracy with BoW and SVM.



Fig. 1 Working model of SA

3 Design and Analysis

This study aims to understand people's perception of COVID-19, how it changes over time, and how it impacts society. The workflow of the working model is shown in Fig. 1.

3.1 Data Extraction

We aim to analyze the movement of Omicron and COVID-19 discussion on social media and classify posts into emotion categories: positive, negative, and neutral. The data were extracted from Twitter owing to its wide popularity as a top social media platform: many people use Twitter to share their opinions and experiences on public issues, it allows up to 280 characters in a single tweet, and it generates 550 million tweets worldwide every day [12, 13, 17]. To extract the tweets, we used Tweepy (a Python package) to build a dataset of 7,000 English tweets containing '#COVID-19', '#Pandemic', '#SARS-CoV-2', and '#OMICRON' as key terms from the 15 most active pages during March 2020–December 2021. Example tweets are shown in Fig. 2.
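The paper does not include the collection script itself; the following is a minimal sketch of this step, assuming Tweepy v4 and the standard search endpoint. The credentials and query options shown are placeholders, not the authors' actual code.

# Minimal sketch of the tweet-collection step, assuming Tweepy v4;
# CONSUMER_KEY etc. are placeholder credentials.
import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

query = "#COVID-19 OR #Pandemic OR #SARS-CoV-2 OR #OMICRON -filter:retweets"
tweets = [
    status.full_text  # keep only the tweet text for later analysis
    for status in tweepy.Cursor(api.search_tweets, q=query, lang="en",
                                tweet_mode="extended").items(7000)
]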

Fig. 2 Sample tweet

3.2 Data Preprocessing

Social media posts are unstructured and ambiguous in nature. They carry information related to users' emotions and contain ambiguities such as spelling mistakes, acronyms, colloquial language, and emojis [2]. Activities such as tokenization, stop word removal, and stemming are used to eliminate noisy and undesired words from documents to enhance the performance of sentiment analysis [18]. Table 2 illustrates the preprocessing tasks implemented in this work.

Removal of duplicate data: This step removes redundant data.

Filtering: This step removes URLs and special characters (e.g., @, #, !).

Stop word removal is a commonly used preprocessing activity in various text mining problems. Its purpose is to extract the content words from the given data, and it controls the growth of document dimensionality [15, 19]. The most common words in textual documents are articles, prepositions, and pronouns, which alone convey no meaning, so they are treated as stop words or weak words; examples are the, in, a, an, with, by, they, and is [20]. The Twitter dataset initially had 134,010 word counts, reduced to 92,518 after removing the stop words, as shown in Table 2; Algorithm 1 illustrates stop word removal for the given Twitter dataset.

Stemming: Stemming and lemmatization algorithms [21] reduce the dimensionality of the features [22]. For example, the words trouble, troubling, and troubled can all be stemmed to "troubl" by removing suffixes, but this is not an exact root word; stemming often leads to incorrect meanings and spellings. Lemmatization algorithms, on the other hand, replace words with proper root words (lemmas) by considering the context or meaning of the word, which is computationally more expensive than stemming. In practice, lemmatization is mostly used in applications such as chatbots, machine translation, and recommender systems, whereas stemming is used in document classification, information extraction, and similar tasks [15, 20]. The advantages and limitations of various stemming algorithms are discussed in Table 3; among them, the Porter stemmer produced the best output with the lowest error rate.
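A small illustration of the stemming versus lemmatization behaviour described above, assuming NLTK (the paper does not name the library used for this step):

# Compare Porter stemming and WordNet lemmatization on the paper's example words.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
for word in ["trouble", "troubling", "troubled"]:
    # Porter stemming strips suffixes and may yield a non-word ("troubl"),
    # whereas lemmatization returns a valid dictionary form ("trouble").
    print(word, "->", stemmer.stem(word), "/", lemmatizer.lemmatize(word, pos="v"))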



Table 2 Data description with preprocessing steps

Step | Word count | Example (tweet)
No pre-processing | 134,010 | Two new #Omicron cases reported in #Delhi, case tally rises to 24, says Delhi Health Department. Out of these 24 patients, 12 have been discharged and 12 are under treatment, adds the Health department
Stemming | 134,010 | Two new #Omicron case report in #Delhi, case tally rise to 24, say Delhi Health Department. Out of these 24 patients, 12 have been discharg and 12 are und treatment, add the Health department
Stop words removal | 92,518 | Two new #Omicron cases reported #Delhi, case tally rises 24, says Delhi Health Department. Out 24 patients, 12 discharged 12 treatment, adds Health department. (removed: in, to, of, these, have, been, and, are, under, the)
Special characters removal | — | Two new Omicron cases reported in Delhi case tally rises to 24 says Delhi Health Department out of these 24 patients 12 have been discharged and 12 are under treatment adds the Health department

Algorithm 1 Remove the stop words
Require: tweets
Ensure: dataset without stop words
  Dataset ← TwitterData
  stop_words ← set(stopwords.words('english'))
  word_tokens ← word_tokenize(Dataset)
  filtered_sentence ← [w for w in word_tokens if w.lower() not in stop_words]
  tweets ← filtered_sentence
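A runnable counterpart of Algorithm 1, assuming NLTK with its stopwords and punkt resources; the sample tweet is the one from Table 2:

# Remove English stop words from a tweet using NLTK.
import nltk
nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

stop_words = set(stopwords.words("english"))
tweet = "Out of these 24 patients, 12 have been discharged and 12 are under treatment"
filtered = [w for w in word_tokenize(tweet) if w.lower() not in stop_words]
print(filtered)  # stop words such as 'of', 'these', 'have' are removed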

3.3 Visualization

A word cloud is a graphical representation model of the words used in a particular text dataset, in which the size of each word denotes its frequency or importance [23]. Figure 3 depicts the word cloud of the sample dataset, where terms such as Omicron, covid variant, and new Omicron occur most frequently.
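The paper does not state which tool generated Fig. 3; a sketch with the third-party wordcloud package would be:

# Build a word cloud from the preprocessed tweets (`tweets` is assumed to be
# the list of cleaned tweet strings from the preceding steps).
from wordcloud import WordCloud
import matplotlib.pyplot as plt

text = " ".join(tweets)
wc = WordCloud(width=800, height=400, background_color="white").generate(text)
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()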

Table 3 Comparison of various stemming algorithms

Stemming algorithm | Advantages | Limitations
Lovins stemmer | Fast, single-pass algorithm; effectively handles double letters in words; handles irregular plurals; dependent on the technical terms used by the program | High time consumption; not all suffixes available; unreliable and frequently fails to form words from the stems
Porter stemmer | Produces the best output and a lower error rate than other stemmers; much lighter than Lovins in terms of performance; used to design language-independent approaches, e.g., the Snowball stemmer | The stems do not always produce actual words; it has more steps and rules to process, consuming more time
N-gram stemmer | Based on the concept of n-grams and string comparisons; language independent | Time consumption is high; requires sample space for creating and indexing the n-grams; not a very practical method
HMM stemmer | Based on the Hidden Markov model; unsupervised method | Difficult to implement; over-stemming is one kind of error in this method
YASS stemmer | Language-independent like the N-gram stemmer; based on a hierarchical clustering approach and distance measures; a corpus-based method | Complex to decide a threshold for creating clusters; requires high computing power

Sentiment Analysis The sentiment of a tweet can be understood using the TextBlob library [5, 7, 17], which gives the polarity of each tweet. Algorithm 2 and the flow chart in Fig. 4 illustrate the polarity estimation process; the polarity lies between −1 and +1, where −1 indicates that the tweet is negative, +1 indicates that it is positive, and 0 indicates that it is neutral. After computing the polarity, the positive, negative, and neutral tweets are extracted separately for '#COVID-19', '#Pandemic', '#SARS-CoV-2', and '#OMICRON', and the results are given in the form of pie charts in Fig. 5a–d, respectively.



Algorithm 2 Find the polarity score using TextBlob
Require: tweets
Ensure: polarity
  X ← tweets
  Y ← textblob.TextBlob(X)
  SA ← Y.sentiment.polarity
  if SA > 0 then
    return 1
  else if SA == 0 then
    return 0
  else
    return −1
  end if
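A runnable counterpart of Algorithm 2, assuming the TextBlob package; the example sentences are illustrative:

# Map a tweet to 1 (positive), 0 (neutral), or -1 (negative) via TextBlob polarity.
from textblob import TextBlob

def polarity_label(tweet: str) -> int:
    score = TextBlob(tweet).sentiment.polarity  # value in [-1.0, +1.0]
    if score > 0:
        return 1
    if score == 0:
        return 0
    return -1

print(polarity_label("great news"))           # likely 1
print(polarity_label("horrible situation"))   # likely -1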

3.4 Model Training

We formulated sentiment analysis of COVID-19 trends on Twitter as a multiclass classification problem. In particular, we used TextBlob to find the polarity score for analyzing the emotions conveyed in tweets; it replaces manual annotation and provides the input labels for the Machine Learning (ML) classifiers.

Word Embedding Word embedding techniques such as Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF), as well as pre-trained word embeddings such as Word2Vec and Global Vectors (GloVe), have already shown good results in various class-prediction problems [2, 24]. They are used to select valuable features from social media posts for better document representation. Word2Vec has been trained on 100 billion words from Google News and covers 300-dimensional vectors for a vocabulary of 3 million words and phrases.

Fig. 3 Wordcloud for sample tweets




Fig. 4 Flow chart for finding the polarity score

Global Vectors (GloVe) is another popular pre-trained embedding like Word2Vec. It has been trained on nearly 840 billion tokens of Common Crawl web text, covering 300-dimensional vectors for a vocabulary set of 2.2 million words and phrases [24].
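The embedding step itself is not shown in the paper; one common realization, sketched here under the assumption of gensim and a locally downloaded Google News Word2Vec file (the filename is a placeholder), averages word vectors into a tweet vector:

# Turn a tokenized tweet into a single 300-dimensional feature vector.
import numpy as np
from gensim.models import KeyedVectors

w2v = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin",
                                        binary=True)

def tweet_vector(tokens):
    vecs = [w2v[t] for t in tokens if t in w2v]
    # fall back to a zero vector when no token is in the vocabulary
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

print(tweet_vector(["omicron", "cases", "reported"]).shape)  # (300,)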



Fig. 5 Visualization of positive, negative, and neutral (SA) tweets for different keywords related to COVID-19: a SA of COVID-19, b SA of Pandemic, c SA of SARS-CoV-2, d SA of OMICRON

Automatic classification of social media posts offers a solution for NGOs and public health sectors deciding how to assimilate the data. It is a hot research topic that aims to identify key data elements from social media posts using natural language processing techniques [3, 11, 25, 26]. Machine learning is broadly categorized into supervised and unsupervised learning. Automatic classification falls under supervised learning, where labeled (manually annotated) data are split into training and testing sets: the labeled training data are used to build the model, and the held-out test data are used to evaluate the model's performance in terms of accuracy. For efficiency and accuracy, various classifiers were evaluated; following the related works, we selected the algorithms below to automatically classify COVID-19-related online posts.

K-Nearest Neighbors (KNN) KNN is a non-parametric, lazy learning algorithm and one of the most popular classification algorithms in ML. It does not learn an explicit model; it simply memorizes the data and classifies new cases based on a similarity measure, i.e., a distance function. Two common methods to calculate the distance between data elements are the Euclidean and Manhattan distances. The Euclidean distance is calculated using the following equation:

D(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \cdots + (p_n - q_n)^2}    (1)

In Eq. (1), p and q are the social media posts to be compared over n characteristics, and the parameter k denotes the number of neighbors chosen for the KNN algorithm [7, 27, 28].

Logistic Regression (LR) In general, logistic regression is used with dichotomous dependent variables; the technique extends to polytomous or multinomial dependent variables. It is a predictive analysis used to describe the relationship between a nominal dependent variable and one or more continuous-level independent variables [11, 15, 29].

Decision Tree (DT) DT is a divide-and-conquer approach to solving classification and regression problems. It is a hierarchical model composed of decision rules applied recursively to split the source set into several subsets until the target value is reached [30]. It also helps calculate self-similarity to represent the features [1, 31]. It can generate understandable rules and perform classification quickly with little computation. Impurity is calculated using a measure called entropy (Eq. 2), and information gain (Eq. 3) is used to decrease the entropy:

\text{Entropy}(S) = \sum_{i=1}^{c} -p_i \log_2 p_i    (2)

\text{Information Gain}(S, F_i) = \text{Previous Entropy}(S) - \text{Current Entropy}(S)    (3)

Information gain is the difference between the previous entropy and the current entropy, i.e., the reduction in entropy achieved by a split. To avoid DT overgrowth, cross-validation is used to determine the hyperparameter of the DT, i.e., its height. A large tree is difficult to comprehend, and training such a model is computationally expensive.
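As a quick numeric illustration of Eqs. (2) and (3), the following sketch computes the entropy of a toy label set and the information gain of a hypothetical split; the numbers are illustrative, not from the paper's data:

# Entropy of a label set and information gain of a binary split.
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

parent = ["pos"] * 6 + ["neg"] * 4                      # Entropy(S) = 0.971 bits
left, right = ["pos"] * 5 + ["neg"], ["pos"] + ["neg"] * 3
# Information gain = parent entropy minus support-weighted child entropy
gain = (entropy(parent)
        - (len(left) / 10) * entropy(left)
        - (len(right) / 10) * entropy(right))
print(round(entropy(parent), 3), round(gain, 3))        # 0.971 0.256 (approx.)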

3.5 Support Vector Machine (SVM)

Both linear and nonlinear classification, as well as regression, are well handled by SVM [32]. The SVM classifier's main idea is that it first non-linearly maps the initial training data into a suitably higher dimension, say n, so that the data in the higher dimension can be separated by (n − 1)-dimensional decision surfaces known as hyperplanes. Among all hyperplanes, the SVM classifier finds the simplest one with the largest margin to the support vectors. Thanks to the non-linear mapping, SVM classifiers work quickly on large datasets and have been used successfully to classify text [18].

Limitation The above machine learning algorithms work on labeled data, which requires manual annotation under the supervision of a domain expert. Because the data are unstructured, feature engineering is a tedious task for social media data.



Table 4 SWOT analysis of sentiment analysis using Twitter

Strengths: Faster and better data identification; improves decision making; time saving
Weaknesses: Lack of reliable data; manual annotation; not suitable for massive datasets; dataset availability
Opportunities: Hot spot identification; helps to understand public opinion; research and academic opportunities
Threats: Lack of experts; false messages spread easily over the internet

Fig. 6 Various performance measures

Storing and processing are also challenging tasks because of the real-time streaming of social media posts. Table 4 depicts the strengths, weaknesses, opportunities, and threats of automatic document classification.

3.6 Classifier and Performance Measure

Word2vec is used as the word embedding technique, converting the text data into numeric vectors. The dataset was then split into training and testing parts at 80% and 20%, respectively: 80% of the data is used for training, and the rest for testing. We applied the algorithms discussed above to build the classifiers, and model performance was evaluated on the test data using the performance measures shown in Figs. 6 and 7.



Fig. 7 Visual representation

These measures are defined in terms of the true positive (tp_i), true negative (tn_i), false positive (fp_i), and false negative (fn_i) counts of class C_i [18, 33]. If m is the total number of classes in the dataset, then i ranges from 1 to m. In this experiment, accuracy, precision, recall, and F1-score are used to evaluate the classifiers' performance; with |y_i| denoting the number of test samples of class C_i, they are defined as follows:

\text{Accuracy} = \frac{1}{m} \sum_{i=1}^{m} \frac{tp_i + tn_i}{tp_i + fp_i + fn_i + tn_i}    (4)

\text{Precision} = \frac{\sum_{i=1}^{m} \frac{tp_i}{tp_i + fp_i}\, |y_i|}{\sum_{i=1}^{m} |y_i|}    (5)

\text{Recall} = \frac{\sum_{i=1}^{m} \frac{tp_i}{tp_i + fn_i}\, |y_i|}{\sum_{i=1}^{m} |y_i|}    (6)

\text{F1-Score} = \frac{\sum_{i=1}^{m} \frac{2\, tp_i}{2\, tp_i + fp_i + fn_i}\, |y_i|}{\sum_{i=1}^{m} |y_i|}    (7)
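The classification-and-evaluation pipeline of this section can be sketched as follows, assuming scikit-learn; X stands for the word2vec tweet vectors and y for the TextBlob-derived labels, both placeholders rather than the authors' actual variables:

# Train the four classifiers on an 80/20 split and report weighted metrics,
# where 'weighted' averaging matches the support-weighted Eqs. (5)-(7).
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
models = {
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(),
}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    p, r, f1, _ = precision_recall_fscore_support(y_test, y_pred,
                                                  average="weighted")
    print(f"{name}: acc={accuracy_score(y_test, y_pred):.4f} "
          f"P={p:.4f} R={r:.4f} F1={f1:.4f}")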

4 Finding and Discussion

Finding for RQ 1: We observed that people generally believe others' personal opinions rather than scientific proof. When COVID-19 vaccination started, people were afraid to get vaccinated at an early stage. The use of social media such as Twitter to spread vaccination awareness is aided by tweets from people who have received vaccinations, which influences public behavior related to vaccines and helps reduce panic about COVID issues.



Finding for RQ 2: People are readily swayed by social media posts; they believe and share them without questioning the reliability of the information they receive. We also found that social media posts carry a high risk of disseminating false information about sensitive issues, which impacts people's mental health. Recent papers employ state-of-the-art deep learning techniques, which require massive datasets and high computational resources to implement; this is an inevitable avenue for further research.

5 Conclusion and Future Work

Sentiment analysis is a crucial task of natural language processing and a subtask of text processing. This paper summarizes the state-of-the-art automatic extraction of public emotions from Twitter posts concerning the coronavirus pandemic. The tweets were extracted using the Twitter API, and the Python module TextBlob was used to find the polarity of each tweet, the quantifier of sentiment (positive, negative, or neutral), to understand public perception of the current pandemic situation. Furthermore, for effective text classification, we applied classification methodologies such as Support Vector Machine (SVM), K-Nearest Neighbour (KNN), Logistic Regression (LR), and Decision Tree (DT), and compared the results using performance metrics such as accuracy, precision, recall, and F1-score. The experimental results showed that the SVM algorithm, with 99.49% accuracy, performs significantly better than KNN, LR, and DT.

Due to time constraints and computational demands, many components have been left for future research. It could be exciting to consider the following research aspects: (a) using deep learning techniques to work efficiently on unstructured data, replacing the need for manual feature selection and making it easier to deal with large amounts of data; and (b) detecting fake news before it becomes viral, a crucial task in social media post analysis, to which we plan to extend our work in the future.

References

1. Chu Z, Widjaja I, Wang H (2012) Detecting social spam campaigns on twitter. In: International conference on applied cryptography and network security, pp 455–472
2. Subramani S, Michalska S, Wang H, Du J, Zhang Y, Shakeel H (2019) Deep learning for multi-class identification from domestic violence online posts. IEEE Access 7:46210–46224
3. Chiarello F, Bonaccorsi A, Fantoni G (2020) Technical sentiment analysis. Measuring advantages and drawbacks of new products using social media. Comput Ind 123:103299
4. Kaur S, Kaul P, Zadeh P (2020) Monitoring the dynamics of emotions during COVID-19 using Twitter data. Proc Comput Sci 177:423–430
5. Pokharel B (2020) Twitter sentiment analysis during covid-19 outbreak in Nepal. Available at SSRN 3624719
6. Chiarello F, Fantoni G, Bonaccorsi A (2017) Product description in terms of advantages and drawbacks: exploiting patent information in novel ways. In: DS 87-6 proceedings of the 21st international conference on engineering design (ICED 17), vol 6. Design information and knowledge. Vancouver, Canada, 21–25 Aug 2017, pp 101–110
7. Kariya C, Khodke P (2020) Twitter sentiment analysis. In: 2020 international conference for emerging technology (INCET), pp 1–3
8. Mandloi L, Patel R (2020) Twitter sentiments analysis using machine learning methods. In: 2020 international conference for emerging technology (INCET), pp 1–5
9. Sailunaz K, Alhajj R (2019) Emotion and sentiment analysis from Twitter text. J Comput Sci 36:101003
10. Nausheen F, Begum S (2018) Sentiment analysis to predict election results using Python. In: 2018 2nd international conference on inventive systems and control (ICISC), pp 1259–1262
11. Ankit, Saleena N (2018) An ensemble classification system for twitter sentiment analysis. Proc Comput Sci 132:937–946
12. Goel A, Gautam J, Kumar S (2016) Real time sentiment analysis of tweets using Naive Bayes. In: 2016 2nd international conference on next generation computing technologies (NGCT), pp 257–261
13. Kanakaraj M, Guddeti R (2015) Performance analysis of ensemble methods on Twitter sentiment analysis using NLP techniques. In: Proceedings of the 2015 IEEE 9th international conference on semantic computing (IEEE ICSC 2015), pp 169–170
14. Saad A (2020) Opinion mining on US airline Twitter data using machine learning techniques. In: 2020 16th international computer engineering conference (ICENCO), pp 59–63
15. Indra S, Wikarsa L, Turang R (2016) Using logistic regression method to classify tweets into the selected topics. In: 2016 international conference on advanced computer science and information systems (ICACSIS), pp 385–390
16. Englmeier K (2020) Named entities and their role in creating context information. Proc Comput Sci 176:2069–2076
17. Rakshitha K, Ramalingam H, Pavithra M, Advi H, Hegde M (2021) Sentimental analysis of Indian regional languages on social media. Glob Trans Proc
18. Behera B, Kumaravelan G, Kumar P (2019) Performance evaluation of deep learning algorithms in biomedical document classification. In: 2019 11th international conference on advanced computing (ICoAC), pp 220–224
19. Li L (2020) Event-related collections understanding and services. Virginia Tech
20. Kaul S (2015) Agenda detector: labeling tweets with political policy agenda. Iowa State University
21. Suman C, Reddy S, Saha S, Bhattacharyya P (2021) Why pay more? A simple and efficient named entity recognition system for tweets. Exp Syst Appl 167:114101
22. Grimmer J, Stewart B (2013) Text as data: the promise and pitfalls of automatic content analysis methods for political texts. Polit Anal 21:267–297
23. Heimerl F, Lohmann S, Lange S, Ertl T (2014) Word cloud explorer: text analytics based on word clouds. In: 2014 47th Hawaii international conference on system sciences, pp 1833–1842
24. Subramani S, Wang H, Vu H, Li G (2018) Domestic violence crisis identification from facebook posts based on deep learning. IEEE Access 6:54075–54085
25. Blanco A, Casillas A, Pérez A, Ilarraza A (2019) Multi-label clinical document classification: impact of label-density. Exp Syst Appl 138:112835
26. Larson K (2012) Tying social media to organizational decision-making. UGA
27. Zhang S (2021) Challenges in KNN classification. IEEE Trans Knowl Data Eng
28. Zhang Z (2016) Introduction to machine learning: k-nearest neighbors. Ann Transl Med 4
29. Jeerasuwannakul B, Sawunyavisuth B, Khamsai S, Sawanyawisuth K (2021) Prevalence and risk factors of proteinuria in patients with type 2 diabetes mellitus. Asia-Pac J Sci Technol 26
30. Myles A, Feudale R, Liu Y, Woody N, Brown S (2004) An introduction to decision tree modeling. J Chemom 18:275–285
31. Safavian RS, Landgrebe D (1991) A survey of decision tree classifier methodology. IEEE Trans Syst Man Cybern 21(3):660–674
32. Hota S, Pathak S (2018) KNN classifier based approach for multi-class sentiment analysis of twitter data. Int J Eng Technol 7:1372–1375
33. Sokolova M, Lapalme G (2009) A systematic analysis of performance measures for classification tasks. Inf Process Manag 45:427–437

Reflection of Star Ratings on Online Customer Reviews; Its Influence on Consumer Decision-Making

C. Selvaraj and Anitha Nallasivam

Abstract As a result of the widespread proliferation of the internet, global sales of products have risen dramatically, owing to the advent of e-commerce platforms and mobile phones. The customer must comprehend a product well before purchasing it, because a consumer who is not clear on the benefits of the product will not have enough information to decide whether to purchase it. Many consumers these days rely on star ratings and reviews to make buying choices, and customer behaviour is strongly influenced by how satisfied customers are with their purchases and the number of product reviews they leave. The authors of this article analysed 110 customer reviews for 22 products; the customer reviews and ratings are examined for the intensity of customer sentiment and emotion.

Keywords Star rating · Customer feedback · Sentiments and emotions

C. Selvaraj (B) Department of Business Administration, Saveetha College of Liberal Arts and Sciences (SIMATS, Deemed University), Chennai, India e-mail: [email protected]

A. Nallasivam Jain CMS Business School, Jain University, Bengaluru, India

1 Introduction to Star Ratings and Online Customer Reviews

The internet shopping experience is seldom as simple as it seems. Buying anything is always a risk, whether or not it is a product the customer has heard of. Characteristics such as the feel of a material, or whether a cream may cause a problem, are difficult to evaluate on a screen. Buyers often look at previous customers' experiences before purchasing, and these reviews typically influence their purchasing decisions.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_9




Like many e-commerce sites, the Amazon India website lets buyers provide star ratings ranging from one to five stars. Emotion words like "like" and "love" have established themselves as succinct shorthand for conveying how people feel about something.

Online reviews undeniably have a revolutionary influence on how users choose services and make purchases while surfing the internet. They are increasingly commonplace for all goods, with positive assessments linked to desirable sales for the company. An online review platform offers feedback on the quality of a product or service over the internet, and many websites deal with consumer reviews and their development and promotion [1]. In his article, Aral [2] said that society has become dependent on aggregated digital viewpoints, online ratings being a good example; random assignment of participants is required to know the full impact of social influence on collective judgement, and prior good reviews have an irrational impact on an item's appeal, while quality leads to greatness. In today's dynamic and ever-changing e-commerce sector, reliable online consumer evaluations and reviews are essential. They help consumers make faster and more informed decisions that match their needs and preferences, and they assist new entrants to the market by offering a feedback loop that continually improves their products and manages their reputation. Customers have always trusted their peers' online ratings and reviews, but data show that their reliability changes based on the situation [3].

The prevalence of online reviews, the impersonal nature of the e-commerce medium, and the importance channel users place on them have made the regulation of online reviews a topic of critical importance for every organisation, regardless of size or industry. The number of user-generated customer reviews is rapidly growing, and they may prove to be a helpful heuristic for consumers when making purchase choices. In today's age of information overload, the value of user-generated consumer reviews is increasingly evident: they reduce the cognitive load customers face during the decision-making process. The importance of customer evaluations and star ratings for businesses with an online presence, particularly small businesses, cannot be overstated. Customer reviews and star ratings, which can create or destroy a company's online image, make it easier to manage a company's online reputation. In recent years, customer evaluations have had a greater impact on purchase choices than even recommendations from the customer's friends and family members. Thus, customer feedback and star ratings may have a considerable influence on a company's bottom-line profit margin, and at no other time in history has it been easier to react to them than now.

In some instances, the customer reviews do not match the star rating: a review may express negative sentiment or emotion while the star rating is high. This article tries to identify the reflection of star ratings in the



customer reviews of selected products on the Amazon India website to understand this complication.

2 Literature Review

According to recent research results, the number of "angry" and "love" responses is connected to a post's mood. Furthermore, if a post covers a matter considered significant by constituents, it has a positive influence on the number of "angry" reactions; the researchers found that "angry" reactions are at their highest when issue awareness is low [4]. Consumer-generated online product evaluations are critical for buyer selections as well as companies' sales and profits. The findings show that many factors have a large impact on sales, such as the type of product and how popular it is. Customer evaluations are a practical approach to developing a deeper connection with consumers and increasing the uniqueness of the store's content; to increase the exposure of product evaluations, star ratings should be visible in several locations on the storefront [5].

Customers prefer shortcuts when planning trips, and online reviews may help shoppers decide; customer satisfaction influences rating quality. Study findings show that customers tend toward extremely high or low ratings, resulting in a U-shaped distribution. Critical assessments are considered more valuable than positive ones, yet pleasant assessments are associated with higher enjoyment. Several positive factors, such as authenticity, expertise, and awards, influence perceived utility and enjoyment [6]. Depending on satisfaction, customers rate the product on a scale of 1–5, where one is the lowest and five is the highest. Analysis of emotions using machine learning requires a collection of sentiment terms, and understanding sarcasm is needed to assess the emotional tone of a remark correctly; what individuals say is analysed and compared against the database [7].

For customers, there is an inherent dilemma: online product reviews are valuable sources of product information, yet even with contemporary technology it is prohibitively expensive to find all the relevant information, so consumers must decide which data subset to utilise. Star ratings are helpful because they indicate the tone of a review quickly [8]. Customers' purchases may be influenced by online reviews; consumers place a higher value on product-related information than on review-related information, especially in terms of content and reviewers, and individual page elements get varying amounts of attention across different search and experience offerings [9].

The length of a review reveals significant disparities between regular and organic commodities: regular product reviews have far fewer words than organic product reviews, and the mean emotion score of regular goods is considerably lower than that of organic goods. Ghee and honey have different average emotion ratings, with organic honey and regular ghee receiving higher sentiment ratings. Sentiment drops for product assessments that run longer than usual, although for more extended



evaluations, regular and organic honey score better on emotions [10]. The emotion of focal reviews is positively associated with star ratings: the number of stars awarded to a review increases in direct proportion to how favourable the review is. When assessing an initial review, other customers' opinions are considered, which indicates a social effect among consumers; content-rich reviews may result in lower star ratings when they are written on a smartphone [11]. Star ratings are important because they provide a valuable and convenient source of information, particularly in industries such as foodservice and hospitality, where consumers increasingly turn to review and rating sites to find restaurants [12].

Online consumer rating information on the average assessment includes the number of individuals who rated an item, and hence the rating's credibility. Consumer decisions based on online ratings show inter-individual variation: one out of every four people uses Bayesian updating on a regular basis, and people act less Bayesian when the better-rated alternative has a larger sample size [13]. Feelings expressed in text reviews influence the rating; the polarity and length of the review were found to be highly related to the classification outcome. Furthermore, ratings of one and five stars are easier to predict, owing to their strong sentiments, than ratings of three stars [14].

When it comes to people, 8 out of every 94 believe that bad online reviews have convinced them not to buy from a firm. While a few negative reviews can help businesses by providing a sense of authenticity, having too many negative reviews may harm the organisation; because of this, a corporation can suffer tens of thousands of dollars in lost revenue and other costs [15]. In customer service research, mining guest evaluations is a new trend; the importance of a guest's experience is enormous for management, and guest input can help improve a hotel's service quality. One study proposes a method for determining tourist satisfaction with Vietnamese hotels [16].

Research on cue usage shows that reviewers place a premium on signals that are readily visible and assessable; compared with the other indicators studied, these signals have the biggest effect on the buying process. Amazon uses star ratings to differentiate itself from its competitors and utilises a variety of techniques to cement its position as the industry leader in online consumer reviews. Customers have created social networks and are cognizant of others' perspectives while expressing their own, as seen in their use of popular vocabulary and jargon, indicating that they are well educated. Marketers must have a defined communication strategy for the materials they deliver to clients and must decide whether to emphasise positive or negative information [17].

If a significant number of favourable internet reviews are published by verified purchasers with five-star ratings over a lengthy period, a boost in sales for a firm should be anticipated. According to the study, to maximise the effect of online reviews on sales, companies should spend less time focussing on the length of reviews and the personal information of reviewers and more time collecting online reviews with high star ratings from verified consumers.

D'Arbelles et al. [18] found that several other factors, such as how many reviews there were and whether the reviewer was a real customer, had a bigger impact on sales than the star rating did. Positive user views and intentions to spread the word are more likely to be prompted by evaluations with high valence, a high star rating, and usefulness. The involvement with product categories and the interpersonal influence of reviewers are also explored: the valence of review material has a greater influence on evaluative responses in highly engaged and interpersonally influenced individuals, and review usefulness has a stronger effect on review perception for those who are more susceptible to interpersonal influence, while participation and susceptibility do not affect star ratings [19].

Without recommendations from family or friends, purchasers will depend on reviews from strangers and anonymous sources, giving little heed to the source of the review. That is, shoppers believe that internet reviews are just as valuable as the opinions of trusted sources, which is consistent with previous research results. Online retailers may be able to increase conversion rates on their product pages by optimising them accordingly: rather than having one or two full reviews on each page, it may be more beneficial to use the same amount of space to show six star-based ratings and a portion of each review with a link to the full review. Buyers would then get more readily digestible information more rapidly, allowing them to express their favourable reactions to a product, and consumers interested in learning more about the product could follow the link to the full review.

The existing literature does not give a clear picture of star ratings and the corresponding customer reviews; the genuine sentiment is not always reflected in the star rating. A product may have a four-star rating while the reviews are negative, or one-star ratings while the reviews are positive. To figure out how product star ratings and customer reviews are linked, the authors examined the Amazon India website.

3 Methodology

To comprehend the link between star ratings and customer reviews, 22 products were randomly selected from the Amazon website (India). For each product, the average star rating, the total number of ratings, and the first five customers' star ratings and reviews were taken from the Amazon India website for analysis. The collected customer reviews were analysed through monkeylearn.com to determine the customers' sentiments, and through the komprehend.io website to identify the customers' emotions. Both the sentiments and the emotions are summarised against the star ratings assigned by the customers.



Statistical Tools Used

The correlation technique is used to identify the correlation between the star rating and the sentiment of the reviewer. The chi-square test is applied to identify a significant relationship between the star rating and the customer reviews. A discrepancy test is used to identify the discrepancy between the star rating and the emotions of the customer reviews.

Programme Code

The customer reviews collected from Amazon are analysed through monkeylearn.com. The following is a sample response returned by monkeylearn.com for one customer review, giving the customer's sentiment:

HTTP/1.1 200 success
content-length: 211
content-type: application/json

[
  {
    "text": "Very nice phone. Eye protection is also there. awesome looking. Delivered very quickly\n",
    "external_id": null,
    "error": false,
    "classifications": [
      {
        "tag_name": "Positive",
        "tag_id": 122921383,
        "confidence": 1
      }
    ]
  }
]
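A response like the one above can be obtained programmatically; the following is a sketch assuming MonkeyLearn's official Python client, with a placeholder API key and classifier model ID (the paper shows only the raw HTTP response):

# Classify a review's sentiment through the MonkeyLearn API.
from monkeylearn import MonkeyLearn

ml = MonkeyLearn("YOUR_API_KEY")  # placeholder credential
data = ["Very nice phone. Eye protection is also there. awesome looking. "
        "Delivered very quickly"]
response = ml.classifiers.classify("YOUR_CLASSIFIER_MODEL_ID", data)
for result in response.body:
    for c in result["classifications"]:
        print(c["tag_name"], c["confidence"])  # e.g. Positive 1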



Table 1 Sentiment analysis of customer reviews with their confidence level

Confidence level | Positive | Negative | Positive (in %) | Negative (in %)
95% and above | 46 | 15 | 41.82 | 13.64
90–95 | 12 | 6 | 10.91 | 5.45
80–90 | 6 | 3 | 5.45 | 2.73
60–80 | 12 | 6 | 10.91 | 5.45
40–60 | 4 | 0 | 3.64 | 0.00
Total | 80 | 30 | 72.73 | 27.27

4 Discussion

4.1 Sentiment Analysis

Table 1 shows the positive and negative sentiments of the customer reviews analysed through monkeylearn.com. There are eighty customer reviews with positive sentiments and thirty with negative sentiments: around 73% of customers gave positive reviews for their purchases on the Amazon India website, and 27% of the customer reviews carried negative sentiments.

4.2 Star Rating and Sentiment Analysis

Table 2 summarises the customer reviews analysed through monkeylearn.com against the corresponding star ratings given by the customers on the website. Of the 80 positive customer reviews, 73 fall into the five-star and four-star categories, while seven are in the three-star and one-star categories. Of the thirty negative reviews, 25 are in the one-star and two-star categories, four are in the three-star category, and one is in the four-star category. These mismatches indicate somewhat contradictory customer reviews. Table 3 shows that there is a strong correlation between the star rating and the review sentiment of the customers.

Table 2 Star rating and sentiment of customers

Sentiment | 5 | 4 | 3 | 2 | 1 | Total | In %
Positive | 58 | 15 | 3 | 0 | 4 | 80 | 72.73
Negative | 0 | 1 | 4 | 3 | 22 | 30 | 27.27
Total | 58 | 16 | 7 | 3 | 26 | 110 | 100



Table 3 Correlation between star rating and sentiment analysis

Correlations | | Sentiment | Rating
Sentiment | Pearson correlation | 1 | 0.827**
 | Sig. (2-tailed) | | 0.000
 | N | 110 | 110
Rating | Pearson correlation | 0.827** | 1
 | Sig. (2-tailed) | 0.000 |
 | N | 110 | 110

** Correlation is significant at the 0.01 level (2-tailed)

Table 4 Chi-square test for significant association between star rating and sentiments of customer reviews

| Results       | 5-star rating     | 4-star rating     | 3-star rating   | 2-star rating   | 1-star rating     | Row totals |
|---------------|-------------------|-------------------|-----------------|-----------------|-------------------|------------|
| Positive      | 58 (42.91) [5.31] | 15 (11.64) [0.97] | 3 (4.36) [0.43] | 0               | 4 (18.18) [12.68] | 80         |
| Negative      | 0                 | 1 (4.36) [2.59]   | 4 (1.64) [1.14] | 3 (1.09) [3.34] | 22 (6.82) [33.80] | 30         |
| Column totals | 58                | 16                | 7               | 3               | 26                | 110        |

(Expected counts in parentheses; cell chi-square contributions in brackets.)

The chi-square statistic shown in Table 4 is 75.6635. The p-value is < 0.00001. The result is significant at p < 0.05. There is a significant relationship between the star rating and the customer reviews given by the customers on the website.
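This test can be reproduced directly from the contingency table above; the following is a minimal sketch using SciPy. Because the expected counts reported in Table 4 appear rounded, the statistic computed here may differ slightly from the reported 75.6635.

```python
# Chi-square test of independence on the Table 4 sentiment vs. star-rating counts.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([
    [58, 15, 3, 0, 4],    # positive reviews per 5..1-star rating
    [0,  1,  4, 3, 22],   # negative reviews per 5..1-star rating
])

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.4f}, p = {p_value:.6f}, dof = {dof}")
```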

4.3 Emotion Analysis

The customer reviews collected from the Amazon India website were analysed again through komprehend.io to identify the customers' emotions. The results display six different emotions: happy, angry, excited, sad, frightened and bored, and the emotion with the highest percentage value was taken for the analysis. All 110 customer reviews collected for the 22 products on the Amazon India website were emotionally analysed in this way, and the analysed data is presented against the star rating given by each customer in Fig. 1.


Fig. 1 Star rating and emotion analysis of customer reviews (grouped bar chart of the counts of happy, angry, excited, sad, fear and bored reviews within each star-rating category)

Around 58 customers gave a five-star rating to their purchased products. Of these 58 five-star ratings, 40 fall in the emotional categories of "happy" and "excited"; the remaining 18 fall in the angry, sad, fear and bored categories, showing that the five-star rating is not reflected in the customer reviews. Twenty-six respondents gave one-star ratings, and 25 of their reviews fall in the angry, sad and fear categories. Three customer reviews with a three-star rating are in the happy and excited emotional categories, and four are in the angry, sad, fear and bored categories. The chi-square statistic, as shown in Table 5, is 38.459 and the p-value is 0.001301; the result is significant at p < 0.05. There is a significant association between star ratings and the emotional expression of customer reviews.

Table 5 Chi-square test for association between star rating and emotions of customer reviews

| Emotions      | 1-star rating    | 2-star rating   | 3-star rating   | 4-star rating   | 5-star rating     | Row totals |
|---------------|------------------|-----------------|-----------------|-----------------|-------------------|------------|
| Happy         | 1 (6.20) [4.36]  | 0               | 2 (1.74) [0.31] | 1 (3.47) [1.76] | 28 (19.10) [4.15] | 32         |
| Angry         | 11 (6.78) [1.53] | 0               | 0               | 1 (3.80) [2.06] | 2 (1.74) [0.31]   | 14         |
| Excited       | 0                | 0               | 1 (1.14) [0.02] | 8 (2.28) [6.07] | 12 (12.53) [0.02] | 21         |
| Sad           | 8 (3.49) [5.84]  | 2 (0.84) [1.61] | 1 (0.98) [0.00] | 2 (1.95) [0.00] | 6 (10.74) [3.07]  | 19         |
| Fear          | 6 (4.46) [0.07]  | 1 (1.07) [0.00] | 3 (1.25) [2.46] | 4 (2.50) [0.91] | 10 (13.73) [1.01] | 24         |
| Column totals | 26               | 3               | 7               | 16              | 58                | 110        |

(Expected counts in parentheses; cell chi-square contributions in brackets. No review fell in the bored category.)


4.4 Discrepancy of Star Rating with Sentiments and Emotions of the Customer Reviews

Table 6 shows the discrepancy between the customers' star ratings and the emotions of the reviews they gave on the website. Reviewer 3 gave the product a five-star rating, but the review was dominated by the emotion of fear; Reviewer 5 also gave five stars, but the review likewise expressed fear. Reviewer 21 gave three stars even though the review expressed the happy emotion, and Reviewer 28 gave a four-star rating although the dominant emotion is angry. A mismatch thus prevails between the star ratings and the corresponding emotions of the reviews given by the customers on the website.

5 Major Findings and Recommendations

Fifty-eight of the positive customer reviews have a confidence level of 90% or above, while 22 have a confidence level below 90%; for the latter, the confidence level does not support their positive sentiment. Twenty-one of the negative customer reviews have a confidence level of 90% or higher, while nine are below 90%; for these nine, the confidence level does not support their negative sentiment. For most reviews, the star rating given by the customer is reflected in the review, but around seven positive customer reviews fall in the three-star and one-star categories: even though the star rating is low, the review expresses positive sentiment. Twenty-five negative customer reviews have two- and one-star ratings, while five negative reviews have three- and four-star ratings; despite the high star rating, these reviews are negative in nature. The emotion analysis shows that the star ratings given by the customers are often not reflected in the reviews they wrote on the website. Many five-star ratings accompany the negative emotions of anger, sadness, fear and boredom; for example, ten of the five-star reviews are in the fear category and six are in the sad category. Around seven of the four-star reviews are in the angry, sad, fear and bored categories. This may be due to the customer's choice of words, the customer's choice of star rating or the machine learning tool's reading of the words. A discrepancy therefore exists between customer star ratings and reviews: customers mark certain products with a high star rating, but the corresponding reviews carry negative emotions. A customer may first assign the stars and then write the review, and the discrepancy may arise from the choice of words, the choice of star rating or a misreading of the review by the algorithm.

Table 6 Discrepancy between star rating with emotions (spanning three pages in the original). The table lists, for each of the 28 reviewers, the product, the sentiment label, the confidence level (%), the star rating and the percentage scores for the six emotions (happy, angry, excited, sad, fear, bored). Products covered include the Inkast Denim Co. men's slim fit casual shirt, Solimo almonds (500 g), the Amazon Pay eGift card, the ANNI DESIGNER women's art silk saree, the Biotique bio-wild grass soothing after shave gel, Bourge men's vega-z1 running shoes, the FLIPZON multipurpose 6-shelf baby wardrobe, the LG V40 ThinQ, Lizol disinfectant surface and floor cleaner liquid, Park Avenue premium men's soap, Pears moisturising bathing bar soap, the Redmi Note 9 Pro, the SAFARI 15 L sea blue backpack, the SleepX dual comfort mattress, Tata Tea Gold (1 kg), the Vim dishwash anti-smell bar and Wolpin wall stickers. All the listed reviews carry positive sentiment and three- to five-star ratings, yet their dominant emotion scores frequently fall in the negative categories.


5.1 Recommendations

The website designer may attach a predetermined set of review phrases to each star rating; after giving the star rating, it becomes easier for customers to express their sentiments and emotions when such phrases appear. Based on the keywords of existing customer reviews, the website may prompt customers to write their own reviews, which will help other purchasers decide whether or not to choose the product. Star ratings may be interconnected with different keywords. Instead of a five-point star scale, the website designer may offer a 7- or 9-point scale to get a more exact picture of the customer's mind-set.

5.2 Scope for Further Study

Converting text into data is a difficult and subjective process. Even with many machine learning tools, correctly predicting the emotions and sentiments of customer reviews remains hard, and the language and culture of reviewers vary from country to country, so identifying the true emotions behind customer reviews may require more sophisticated tools. This article is based on 22 types of products and 110 customer reviews collected from the Amazon India website; future research may include more types of products and customer reviews from different websites to obtain a fuller picture of star ratings and the corresponding emotions and sentiments.

5.3 Optimization Process to Assess Star Rating and Reviews

Users are more likely to notice rating and review prompts, and to offer feedback, if the indicators are visible above the fold. Show that submitted reviews are processed promptly. Do not ignore negative feedback; use it to engage with dissatisfied customers and show them that their concerns matter. Allow the reviews themselves to be rated so that visitors can easily identify the most useful ones, and encourage users to upload images, which makes customers trust the items more. Display average ratings only once there are enough reviews, since an early average based on a few ratings may leave a consumer dissatisfied with the merchandise. Letting users filter reviews by the kind of customer they wish to see makes the reviews more relevant and allows buyers to find the reviews that best fit their needs. Remember that unreviewed items should not be left to act as negative social evidence: if incentives are offered, people are more likely to post evaluations, and participants in a review programme can gain points, discounts or freebies, which shows them the value of involvement.


6 Conclusion

Assigning a star rating may be a somewhat arbitrary decision made by the customer without any words in mind, whereas in writing a review the experience with the product comes into the picture: the customer expresses feelings and emotions through words. A star rating is the selection of a single star value, but review writing is a combination of words expressing the experience with the product, and sometimes the star rating is not reflected in the customer review. Nevertheless, the average star rating that appears on the product's front page strongly influences customers at first sight. Customers should therefore read both the customer reviews and the star ratings before purchasing a product.

Appendix: Sentiment Analysis

(The appendix table, spanning six pages in the original, lists for each of the 22 products the average star rating, the total number of ratings and, for each of the product's first five reviews, the sentiment label, the confidence level (%), the star rating and the dominant emotion percentage among happy, angry, excited, sad, fear and bored. The products range from smartphones (Oppo A31, Samsung Galaxy M21, Redmi Note 9 Pro, LG V40 ThinQ) to groceries, apparel and household items such as Tata Tea Gold, the Vim dishwash bar, Lizol cleaner, an Allen Solly men's polo, a Luzuliyo reusable face mask and a PlantexPitru towel rack.)


References

1. Kelly J (2018) Five stars: the importance of online reviews to your sales. New Initiatives Marketing. https://newinitiativesmarketing.com/five-stars-positive-online-reviews-mean-more-sales/
2. Aral S (2013) The problem with online ratings. MIT Sloan Management Review. https://sloanreview.mit.edu/article/the-problem-with-online-ratings-2/
3. Engler TH, Winter P, Schulz M (2015) Understanding online product ratings: a customer satisfaction model. J Retail Consum Serv 27:113–120. https://doi.org/10.1016/j.jretconser.2015.07.010
4. Eberl JM, Tolochko P, Jost P, Heidenreich T, Boomgaarden HG (2020) What's in a post? How sentiment and issue salience affect users' emotional reactions on Facebook. J Inform Tech Polit 17(1):48–65. https://doi.org/10.1080/19331681.2019.1710318
5. How to display customer review star ratings (2021) Volusion V1 help center. https://helpcenter.volusion.com/en/articles/424394-how-to-display-customer-review-star-ratings
6. Park S, Nicolau JL (2015) Asymmetric effects of online consumer reviews. Ann Tour Res 50:67–83. https://doi.org/10.1016/j.annals.2014.10.007
7. Ghelber A (2021) A quick guide on sentiment analysis using product review data (2020). Revuze. https://www.revuze.it/blog/sentiment-analysis-using-product-review-data/
8. Lak P, Turetken O (2014) Star ratings versus sentiment analysis—a comparison of explicit and implicit measures of opinions. In: 2014 47th Hawaii international conference on system sciences
9. Maslowska E, Segijn CM, Vakeel KA, Viswanathan V (2019) How consumers attend to online reviews: an eye-tracking and network analysis approach. Int J Advert 39(2):282–306. https://doi.org/10.1080/02650487.2019.1617651
10. Rajeswari B, Madhavan S, Venkatesakumar R, Riasudeen S (2020) Sentiment analysis of consumer reviews—a comparison of organic and regular food products usage. Rajagiri Manage J 14(2):155–167. https://doi.org/10.1108/ramj-05-2020-0022
11. Yoon Y, Kim AJ, Kim J, Choi J (2019) The effects of eWOM characteristics on consumer ratings: evidence from TripAdvisor.com. Int J Advert 38(5):684–703. https://doi.org/10.1080/02650487.2018.1541391
12. Campbell C (2015) Star ratings matter just as much as (if not more than) online reviews. Entrepreneur. https://www.entrepreneur.com/article/250838
13. Hoffart JC, Olschewski S, Rieskamp J (2019) Reaching for the star ratings: a Bayesian-inspired account of how people use consumer ratings. J Econ Psychol 72:99–116. https://doi.org/10.1016/j.joep.2019.02.008
14. Taparia A, Bagla T (2020) Sentiment analysis: predicting product reviews' ratings using online customer reviews. SSRN Electron J. https://doi.org/10.2139/ssrn.3655308
15. Georgiev D (2021) Online reviews statistics & facts in 2020 [Infographic]. Review42. https://review42.com/resources/online-reviews-statistics/
16. Thu HNT (2020) Measuring guest satisfaction from online reviews: evidence in Vietnam. Cogent Soc Sci 6(1):1801117. https://doi.org/10.1080/23311886.2020.1801117
17. Venkatesakumar R, Vijayakumar S, Riasudeen S, Madhavan S, Rajeswari B (2020) Distribution characteristics of star ratings in online consumer reviews. Vilakshan XIMB J Manage 18(2):156–170. https://doi.org/10.1108/xjm-10-2020-0171
18. D'Arbelles K, Berry P, Theyyil A (2020) Electronic word-of-mouth marketing on Amazon: exploring how and to what extent Amazon reviews affect sales. McMaster J Commun 12(1):50–79. https://doi.org/10.15173/mjc.v12i1.2384
19. De Pelsmacker P, Dens N, Kolomiiets A (2018) The impact of text valence, star rating and rated usefulness in online reviews. Int J Advert 37(3):340–359. https://doi.org/10.1080/02650487.2018.1424792

An Integrated Machine Learning Approach Predicting Stock Values Using Order Book Details Hemantkumar Wani and S. H. Sujithkumar

Abstract Machine learning, artificial intelligence and deep learning can be applied in numerous fields such as medical diagnosis, computer networks, travel and tourism, banking and the stock market. The proposed approach addresses the stock market, forecasting stock values, average prices, turnover, etc. This integrated model is built on recent methods such as linear regression, multiple linear regression, classification, clustering, long short term memory (LSTM) and convolutional neural network (CNN). The model processes sample data downloaded from the National Stock Exchange (NSE), which consists of the previous stock value, open price, high price, low price, average price, turnover, total traded quantity, etc. It applies linear regression, multiple linear regression, K-means clustering, Bayesian correlation and LSTM to forecast the desired values, such as future turnover and the probable total traded quantity when the average price and previous closing price increase. Multiple linear regression shows that as the previous closing price of the stock and the daily average price increase, the total traded quantity is directly affected. K-means clustering and Bayesian correlation help in estimating stock values, turnover and total traded quantity; Bayesian correlation correlates the variables, grades the evidence as "very strong", "strong", "moderate", etc., and generates the corresponding charts/plots. The advantages of such a model are less human intervention, minimal processing time and accurate results. The approach combines practical and theoretical aspects to explore the stock market with different features. Keywords Machine learning · Deep learning · Classification · Clustering · Stock values · Price of the stock · Linear regression · Multiple linear regression

H. Wani (B) BIET Davangere, Davangere, Karnataka, India e-mail: [email protected] S. H. Sujithkumar BIET MBA Department, Davangere, Karnataka, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_10


1 Introduction

Massive numbers of trades are executed on both Indian exchanges, i.e., the Bombay Stock Exchange (BSE) and the National Stock Exchange (NSE). All clients (traders and investors) have demat accounts with a registered authorized depository. For these 'n' trades, all transactions, i.e., selling and buying, need to be recorded, and the exchanges are responsible for doing so. Daily price movement plays a key role in these transactions, and according to these daily prices clients place their orders. The proposed approach uses recent technological trends such as machine learning, deep learning, recurrent neural networks (RNN), convolutional neural networks (CNN) and long short term memory (LSTM) to forecast stock values, turnover, etc., integrating these recent methods to forecast the stock value on a daily basis. There are, however, many challenges: the relative strength index (RSI) changes rapidly, and sentiment plays a very important role in such systems. This integrated approach predicts stock values using linear regression, multiple linear regression and other machine learning techniques. Linear regression helps predict a desired value; e.g., one can use the 'previous close price' of the stock and the 'total traded quantity', or the 'day's average price' and the 'total traded quantity', to forecast future sales following the day's average prices. Similarly, one can apply multiple linear regression using multiple attributes (values) to forecast the future value of the stock. There are 'market orders' and 'limit orders'. A limit order is an order to buy or sell a stock at a specific price or better: a buy limit order can only be executed at the limit price or lower, and a sell limit order can only be executed at the limit price or higher. A limit order is not guaranteed to execute; it can only be filled if the stock's market price reaches the limit price. While limit orders do not guarantee execution, they help ensure that an investor does not pay more than a predetermined price for a stock.

2 Review of Literature

Trading rules are required to extract maximum profit from the stock market. However, developing such rules is not a simple task, as they depend entirely on real-time bids and offers. Genetic algorithms (GA), ANNs and machine learning models can help in predicting the approximate value of a stock. Genetic algorithms are suitable for handling 'n' parameters and delivering a workable solution; a GA requires a threshold or fitness function for training the candidates in the population. Trading rules are the key players in a buying and selling strategy. The authors used a dataset of India Cements from 2011 to 2012, devised 'n' trading rules and compared them to check which is the most profitable, executing a simulation with these trading rules to evaluate the results [1]. Limit order book data and a related strategy are discussed for predicting price jumps in a stock. In the limit order book there are "bidders" and "sellers", and these values change continuously in a real-time manner. From these limit order book values, one can observe the volume associated with the stock, price gaps and other relevant information. The authors proposed a model based on logistic regression, named LASSO, to predict price jumps in stocks. As a trader, a client always wishes to sell stock at a good price, while others wish to buy at a low price. These limit orders are highly dynamic: they can be changed, cancelled, etc. The best bid and best offer are the key players in this strategy. The idea is to calculate an ask-side inter-trade price jump indicator and a bid-side inter-trade price jump predictor, and then apply the LASSO (logistic regression) model for better results [2–4]. A convolutional neural network (CNN) can be used in stock value prediction and analysis [5–7]. Time series data is processed and analysed to check fluctuations in stock prices. A few parameters are important in time series analysis, such as the close of the stock, the volume traded, the day's high, the day's low and the day's open. "Timestamps" can play a major role in stock value prediction, as different timestamps 't' can have different prices 'p'. One can calculate the return between timestamps 't1' and 't2' as follows:

r1 = (P2 − P1) / P1    (1)

Unsupervised learning and deep learning techniques are used to predict stock values by processing time series data. Time series data is continuous, has different timestamps (date and time) and may have different values recorded; the task is to find the differences between these timestamps. One drawback of time series data is its uncertainty: predicting the exact value of a stock is not possible using time series analysis alone, but an attempt should be made to find the closest, approximate values. Hidden Markov models (HMM), recurrent neural networks (RNN), auto-encoding, etc., can be used to process time series data [8–10]. Authors have analysed NASDAQ's large dataset with feature analysis to predict stock prices, conducting an in-depth analysis of the stocks with a limit order book. The key concept in that study is to predict the mid-price of the stock and determine whether the price will increase, decrease or remain the same. The model should consider the best ask price and the best bid price. The authors suggest calculating the possible state of the order book: the timestamp 't' is very important, and for different timestamps the state 's' can be calculated as (no. of shares at the best bid price)/100 and, similarly, (no. of shares at the best ask price)/100. The price of a stock changes rapidly and remains stationary only for certain periods; the idea is to calculate the mid-price change between different timestamps [3, 11]. A model is also proposed to estimate volatility from limit order book data: 'n' observations, i.e., ask and bid prices of the stock, can help in predicting the volatility and the approximate future value of the stock on an intra-day basis [3, 12]. Authors have studied two different stock markets, the National Stock Exchange (NSE) of India and the New York Stock Exchange (NYSE): daily reports were collected, and the proposed ARIMA model helps in comparing the performance of these two exchanges. Many models have been developed using linear and nonlinear techniques, deep learning techniques, recurrent neural networks (RNN), convolutional neural networks (CNN), machine learning techniques, etc.; the authors also suggest using long short term memory (LSTM). One-step-ahead prediction requires not only the latest data but also previous data. Thanks to the self-feedback mechanism of its hidden layer, the RNN model has an advantage in dealing with long-term dependence problems, but there are difficulties in practical application. An LSTM unit consists of a memory cell storing information that is updated by three special gates: the input gate, the forget gate and the output gate. The output of the previous step is given as input to the next step, so it is necessary to remember all the steps; as the name "recurrent" suggests, the regular neural network logic is repeated, meaning the past is remembered and the output/decision of the next phase is influenced by past learning [6, 7, 9, 13–15]. Authors have used nonparametric evaluation methods to predict the price and volatility of NASDAQ order book data, with the objective of assessing price jumps in stock values using discontinuous leverage effects; they evaluated six years of NASDAQ data from 320 organizations. Spot volatility estimation is the prime objective of that model, where a timestamp 't' and values 'V1', 'V2', …, 'Vn' are recorded; smoothing the values over a certain time period gives the probable volatility estimate [16]. A generative adversarial network (GAN) with a multi-layer perceptron (MLP) is proposed to predict stock values; the authors also suggest that with LSTM one can forecast the probable closing price of a stock. Data is collected from the S&P 500, and the proposed GAN model with MLP and LSTM helps in forecasting the closing prices of the stocks [17].

3 Proposed Methodology

Table 1 describes the selling and buying order details. The current market price (CMP) is the key value, and based on this CMP orders are placed. As shown in Table 1, 'n' orders are placed at the desired price. Order 'B1' is to buy shares @ Rs. 80, and the quantity is 5 × 50 = 250; however, unless there are sellers offering the shares @ Rs. 80, order 'B1' cannot be executed.

Table 1 Limit orders

| Bid | Price | Orders | Qty | Offer | Price | Orders | Qty |
|-----|-------|--------|-----|-------|-------|--------|-----|
| B1  | 80    | 5      | 50  | S1    | 81    | 3      | 50  |
| B2  | 79.5  | 8      | 25  | S2    | 81.5  | 10     | 40  |
| B3  | 78    | 10     | 40  | S3    | 82    | 15     | 100 |


As soon as a seller such as 'S1' agrees to sell @ Rs. 80, order 'B1' is matched, and orders on both sides get executed. The values in Table 1 change dynamically in accordance with the CMP. To predict the stock value, the proposed integrated approach applies linear regression, multiple linear regression, long short term memory (LSTM) and convolutional neural network (CNN). The proposed machine learning approach helps forecast the stock value by analyzing the previous close, open price, high and low prices, average price, turnover, etc.; with these 'n' parameters the model tries to forecast the stock value, turnover, etc. Sample data (e.g., for TATA MOTORS) is downloaded from the National Stock Exchange (NSE) [18]; it consists of the parameters given in Table 4. Multiple linear regression involves 'n' parameters, such as 'average price' and 'previous close', to predict the 'turnover'. The model also evaluates the sample data using LSTM and CNN. The advantage of the integrated approach is that, with the help of the various methods, one can analyze and accurately predict the stock values, turnover, etc. The sample data collected from the NSE consists of the following attributes: symbol, series, date, previous close, open price, high price, low price, last price, average price, total traded quantity, number of trades, deliverable quantity, turnover, etc.
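As a rough illustration of this preprocessing step, the sketch below loads such an export with pandas. The file name and column labels here are assumptions based on the attribute list above, since actual NSE exports may label columns differently; later sketches in this section reuse this df frame.

```python
# Loading the downloaded NSE sample (hypothetical file name and column labels).
import pandas as pd

df = pd.read_csv("TATAMOTORS.csv")  # assumed export from nseindia.com

# Independent and dependent variables used in the regressions below.
features = df[["Average Price", "Prev Close"]]
target = df["Total Traded Quantity"]

print(df[["Date", "Open Price", "High Price", "Low Price",
          "Average Price", "Turnover"]].head())
```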

3.1 Multiple Linear Regression (MLR)

In MLR, the model processes multiple variables to predict one dependent variable, here the total traded quantity. The matrix plot for prices and traded quantity is shown in Fig. 1. In the following example (Tables 2 and 3), 'average price' and 'previous close' serve as independent variables and 'total traded quantity' as the dependent variable: the independent variables are processed to forecast the values of the dependent variable. Linear regression is expressed as follows:

y = b0 + b1·x1    (2)

Multiple linear regression is expressed as follows:

y = b0 + b1·x1 + b2·x2 + ··· + bn·xn    (3)

The test split (X_test) is shown in Table 3 and the training split (X_train) in Table 4.
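A minimal sketch of fitting Eq. (3) on these two predictors with scikit-learn follows; it assumes the df frame from the earlier loading sketch, and the 70/30 split is an illustrative choice rather than the authors' exact split.

```python
# Multiple linear regression (Eq. 3) on average price and previous close.
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X = df[["Average Price", "Prev Close"]]
y = df["Total Traded Quantity"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

mlr = LinearRegression().fit(X_train, y_train)
print("b0 =", mlr.intercept_, "  b1, b2 =", mlr.coef_)
print("Predicted traded quantities:", mlr.predict(X_test)[:3])
```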


Fig. 1 Matrix plot for prices and traded quantity

Table 2 Output of multiple linear regression

| Average price | Previous close | Total traded quantity |
|---------------|----------------|-----------------------|
| 475           | 475            | 19,237,384            |
| 500           | 475            | 14,664,928            |
| 525           | 500            | 11,300,255            |
| 550           | 525            | 7,935,582             |
| 575           | 550            | 4,570,908             |
| 600           | 575            | 1,206,235             |

Table 3 X_test data

| Index | Average price | Total traded quantity |
|-------|---------------|-----------------------|
| 6     | 493.32        | 16,951,593            |
| 19    | 478.62        | 22,860,916            |
| 11    | 491.60        | 13,581,352            |
| 20    | 477.93        | 10,263,884            |
| 17    | 465.86        | 15,487,368            |
| 12    | 473.46        | 27,341,178            |
| 9     | 487.95        | 17,722,547            |

Table 4 X_train data

| Index | Average price | Total traded quantity |
|-------|---------------|-----------------------|
| 21    | 471.40        | 11,923,961            |
| 4     | 477.76        | 21,280,797            |
| 22    | 479.23        | 15,541,446            |
| 0     | 469.83        | 28,256,188            |
| 1     | 476.44        | 21,012,008            |
| 13    | 449.56        | 33,007,516            |
| 14    | 455.58        | 18,838,054            |
| 18    | 467.81        | 12,557,565            |
| 5     | 489.94        | 20,581,817            |
| 2     | 481.07        | 20,948,900            |
| 10    | 491.70        | 16,263,723            |
| 16    | 473.59        | 12,116,814            |
| 15    | 467.28        | 22,730,750            |
| 7     | 493.50        | 14,567,815            |
| 3     | 473.56        | 17,473,075            |
| 8     | 500.82        | 21,921,272            |

3.2 Long Short Term Memory (LSTM)

The proposed approach uses long short term memory (LSTM) to predict the stock value by processing attributes such as 'previous close', 'average price' and 'total traded quantity'. Training samples are given as input to the LSTM: the train_x data goes into the neural network and train_y holds the future values to be predicted. The seq_size (look-back) parameter determines how many previous steps the LSTM, with its hidden dense layers, considers when predicting. Training and testing data are processed to predict the future values.
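The sketch below illustrates this look-back scheme with Keras; the window size, layer width and epoch count are illustrative assumptions, not the authors' settings, and it reuses the df frame from the earlier loading sketch.

```python
# Look-back window construction and a small LSTM for next-step prediction.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def make_windows(series, seq_size=3):
    """Turn a 1-D series into (samples, seq_size, 1) inputs and next-step targets."""
    X, y = [], []
    for i in range(len(series) - seq_size):
        X.append(series[i:i + seq_size])
        y.append(series[i + seq_size])
    return np.array(X)[..., np.newaxis], np.array(y)

prices = df["Average Price"].to_numpy(dtype="float32")
train_x, train_y = make_windows(prices, seq_size=3)

model = Sequential([LSTM(32, input_shape=(3, 1)), Dense(1)])
model.compile(optimizer="adam", loss="mse")
model.fit(train_x, train_y, epochs=50, verbose=0)
```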

3.3 K-means Clustering

The K-means clustering algorithm is used to train the sample data downloaded from the NSE, as illustrated in Table 5. Training the data essentially means calculating the distances between the samples 'S1', 'S2', …, 'Sn'.

Table 5 K-means clustering results

| Clusters | N  | R²   | AIC     | BIC     | Silhouette |
|----------|----|------|---------|---------|------------|
| 3        | 23 | 0.68 | 118.220 | 148.880 | 0.420      |


Fig. 2 Elbow method plot

Figure 2 illustrates the "elbow method", which generates a plot with the total within-cluster sum of squares on the y-axis and the number of clusters on the x-axis and is used to determine the optimal number of clusters; the plot shows the AIC, BIC and elbow-method curves. Distances are calculated, and a sample farther from the existing centres is more likely to be selected as a new centre point. To identify the structure of the data, one can apply clustering techniques: sub-groups can be identified such that objects within a sub-group are quite similar while remaining distinguishable from other clusters. Centroid calculation, assigning objects, computing the distances and forming the clusters are the initial steps of the K-means clustering algorithm.
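A minimal elbow-method sketch with scikit-learn follows; the choice of the two features is an assumption for illustration.

```python
# Elbow method: within-cluster sum of squares (inertia) for k = 1..8.
from sklearn.cluster import KMeans

features = df[["Average Price", "Total Traded Quantity"]].to_numpy()

for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    print(k, km.inertia_)  # plot k vs. inertia and look for the "elbow"
```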

3.3.1 Cluster Density Plots

A cluster density plot is a plot of a predictor variable used in the model; it helps detect the areas of the plot on which to concentrate and identifies empty as well as sparse areas. The following figures illustrate the cluster density plots of the "previous price of the stock", "open price", "high price", "low price" and "turnover" features. The cluster density plot helps identify unique clusters or groups, as shown in Figs. 3, 4, 5, 6 and 7: a region of high point density is separated from a region of low point density. On the x-axis the model plots the "high price" and "low price", as shown in Figs. 5 and 6, where a few "peaks" (highly dense regions) and "sparse" regions can be observed. Figure 8 illustrates the visualization of the clustered points via t-SNE, which calculates probability scores based on similarity; t-SNE is a very successful approach for visualizing large numbers of data points.


Fig. 3 Cluster density plot (I)

Fig. 4 Cluster density plot (II)

Fig. 5 Cluster density plot (III)

3.4 Bayesian Correlation

The model processes random variables and estimates the association between them. Using this approach, the probability of the estimate for the desired variable can be calculated; e.g., the model processes 'turnover', 'previous close' and 'average price'. The correlational results are shown below.
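The tables and figures in this section resemble output from a statistics package. As a simple stand-in, the sketch below computes the classical Pearson correlation with SciPy; the Bayesian evidence grades shown in Figs. 10–15 would come from a Bayesian tool, which is not reproduced here.

```python
# Pearson correlation between average price and turnover (classical estimate
# only; the "very strong"/"strong" evidence labels come from the authors'
# Bayesian analysis, not from this snippet).
from scipy.stats import pearsonr

r, p = pearsonr(df["Average Price"], df["Turnover"])
print(f"r = {r:.3f}, p = {p:.4f}")  # a negative r matches the decline in Fig. 9
```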


Fig. 6 Cluster density plot (IV)

Fig. 7 Cluster density plot (V)

Fig. 8 T-SNE cluster plot



Figure 9 illustrates the scatterplot for the two variables "average price" and "turnover" and helps identify the correlation between them. As observed in Fig. 9, as the average price of the stock increases, there is a decline in the turnover of the company. Similarly, evidence grades such as "very strong" and "strong" can be generated, as shown in Figs. 10, 11, 12, 13, 14 and 15.

Fig. 9 Scatterplot for average price–turnover

Fig. 10 Sequential analysis for average price—turnover


Fig. 11 Scatterplot for turnover—prev close

Fig. 12 Sequential analysis for turnover—prev close

4 Sample Data Analysis

Descriptive statistics are given in Table 6, and distribution plots are illustrated in Fig. 13. Scatter plots for prev close–open price and prev close–high price are illustrated in Fig. 14, and Fig. 15 illustrates the scatter plots for prev close–low price, prev close–average price and prev close–total traded quantity.


Fig. 13 Distribution plots

5 Conclusion

This approach explores stock values and forecasts them along with turnover, the probable total traded quantity, etc. With little human intervention, an automated system was developed to process the data and give accurate (probable) results in less time. Unlike traditional models, the proposed integrated approach carefully processes the features: multiple linear regression helps in predicting the dependent variable, Bayesian correlation estimates the relationship between variables, and LSTM, by recognizing historical values, helps in predicting possible future values.


Fig. 14 Scatter plots (I)

Fig. 15 Scatter plots (II)

Table 6 Descriptive statistics

|                          | Prev close | Open price | High price | Low price  | Average price | Total traded quantity |
|--------------------------|------------|------------|------------|------------|---------------|-----------------------|
| Valid                    | 23         | 23         | 23         | 23         | 23            | 23                    |
| Missing                  | 0          | 0          | 0          | 0          | 0             | 0                     |
| Mean                     | 477.033    | 478.374    | 483.687    | 471.996    | 477.731       | 1.884e+7              |
| Median                   | 476.000    | 478.750    | 482.800    | 471.700    | 477.760       | 1.772e+7              |
| Mode                     | 447.050    | 452.100    | 460.000    | 440.000    | 449.560       | 1.026e+7              |
| Std. deviation           | 13.506     | 12.454     | 11.812     | 13.320     | 12.559        | 5.673e+6              |
| Skewness                 | −0.394     | −0.247     | −0.148     | −0.439     | −0.242        | 0.724                 |
| Std. error of skewness   | 0.481      | 0.481      | 0.481      | 0.481      | 0.481         | 0.481                 |
| Kurtosis                 | −0.339     | −0.272     | −0.185     | 0.150      | −0.018        | 0.395                 |
| Std. error of kurtosis   | 0.935      | 0.935      | 0.935      | 0.935      | 0.935         | 0.935                 |
| Shapiro–Wilk             | 0.940      | 0.977      | 0.969      | 0.968      | 0.969         | 0.955                 |
| P-value of Shapiro–Wilk  | 0.182      | 0.845      | 0.665      | 0.632      | 0.670         | 0.363                 |
| Minimum                  | 447.050    | 452.100    | 460.000    | 440.000    | 449.560       | 1.026e+7              |
| Maximum                  | 495.350    | 499.450    | 506.400    | 494.200    | 500.820       | 3.301e+7              |
| Sum                      | 10,971.750 | 11,002.600 | 11,124.800 | 10,855.900 | 10,987.810    | 4.332e+8              |
| 25th percentile          | 470.300    | 472.625    | 477.200    | 464.775    | 470.615       | 1.503e+7              |
| 50th percentile          | 476.000    | 478.750    | 482.800    | 471.700    | 477.760       | 1.772e+7              |

Acknowledgements This work was supervised by Prof. (Dr.) Sujithkumar S. H., BIET MBA Department, Davangere. The author thanks him for his continuous help and guidance in completing this research work. Hemantkumar Wani also thanks the Director, the Head of Institution and the technical staff of the university for their continuous guidance, help and for providing the necessary infrastructure.

References

1. Naik L et al (2012) Prediction of stock market index using genetic algorithm. Comput Eng Intell Syst 3(7):162–171
2. Zheng B, Eric M, Frédéric A (2012) Price jump prediction in limit order book. arXiv preprint arXiv:1204.1381
3. Zhang Z, Stefan Z, Stephen R (2019) DeepLOB: deep convolutional neural networks for limit order books. IEEE Trans Signal Process 67(11):3001–3012
4. Mäkinen Y et al (2019) Forecasting jump arrivals in stock prices: new attention-based network architecture using limit order book data. Quant Financ 19(12):2033–2050
5. Siripurapu A (2014) Convolutional networks for stock trading. Stanford Univ Dep Comput Sci 1(2):1–6
6. Chen S, Hongxiang H (2018) Stock prediction using convolutional neural network. IOP Conf Ser Mater Sci Eng 435(1)
7. Jiang W (2021) Applications of deep learning in stock market prediction: recent progress. Expert Syst Appl 184:115537
8. Längkvist M, Lars K, Amy L (2014) A review of unsupervised feature learning and deep learning for time-series modeling. Pattern Recogn Lett 42:11–24
9. Ntakaris A et al (2018) Benchmark dataset for mid-price forecasting of limit order book data with machine learning methods. J Forecast 37(8):852–866
10. Sirignano JA (2019) Deep learning for limit order books. Quant Financ 19(4):549–570
11. Palguna D, Ilya P (2016) Mid-price prediction in a limit order book. IEEE J Sel Top Signal Process 10(6):1083–1092
12. Bibinger M, Moritz J, Markus R (2016) Volatility estimation under one-sided errors with applications to limit order books. Ann Appl Probab 26(5):2754–2790
13. Hiransha M et al (2018) NSE stock market prediction using deep-learning models. Procedia Comput Sci 1351–1362
14. Velay M, Fabrice D (2018) Stock chart pattern recognition with deep learning. arXiv preprint arXiv:1808.00418
15. Tsantekidis A et al (2020) Using deep learning for price prediction by exploiting stationary limit order book features. Appl Soft Comput 93:106401
16. Bibinger M, Christopher N, Lars W (2019) Estimation of the discontinuous leverage effect: evidence from the NASDAQ order book. J Econometrics 209(2):158–184
17. Zhang K et al (2019) Stock market prediction based on generative adversarial network. Procedia Comput Sci 147:400–406
18. NSE India homepage. https://www.nseindia.com/

Hybrid Genetic-Bees Algorithm in Multi-layer Perceptron Optimization Truong Tran Mai Anh and Tran Duc Vi

Abstract The multi-layer perceptron (MLP) is extensively used in solving real-world problems. Backpropagation (BP) is often employed to train MLPs; however, this method quickly faces premature convergence problems and local traps. In this study, the Hybrid Genetic-Bees Algorithm (HGBA) for training MLPs is reported for the first time. The genetic algorithm (GA) is used to improve the global search phase of the Bees Algorithm (BA), which then searches for better solutions. The proposed HGBA is tested on four standard UCI classification datasets with different levels of difficulty. Experimental results show that HGBA provides significantly better performance than Particle Swarm Optimization (PSO) in training MLPs, with higher accuracy. Keywords Metaheuristic · Multi-layer Perceptron (MLP) · HGBA · Neural network · Classification problem

1 Introduction

Artificial neural networks (ANN) are widely known for their efficiency in solving classification problems, because an ANN can extract meaningful and accurate information from complex datasets [1, 2]. Based on the characteristics of mathematical neurobiology, data with similarities can be clustered into labels and mined for useful information. In recent years, ANNs have sparked great academic interest and have progressed rapidly in different fields, including pattern recognition [3, 4], maximum power point tracking [5–7] and image processing [8]. Feed-forward neural networks (FNNs) are among the most widely used artificial neural network structures, with only layers of neurons and forward connections allowed [9]; this method can be applied to classification tasks on nonlinear and discrete patterns [10].

T. T. M. Anh · T. D. Vi (B) School of Industrial Engineering and Management, International University, Vietnam National University, Block 6, Linh Trung Ward, Thu Duc District, Ho Chi Minh City, Vietnam e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_11


There are two main directions for training FNNs: unsupervised and supervised learning. The former determines the number of constituent groups that characterize the data [11]; its objective is to separate subsets of the data, based on their characteristics, such that members of one group differ from those of the others. Unsupervised learning is mostly used with unlabeled data, with the main goal of exploratory analysis. In this work, we focus on the supervised learning methodology. For training MLPs in supervised learning, there are again two main directions: gradient-descent methods and heuristic search methods [12]. Among gradient-descent methods, backpropagation, which determines appropriate weights in a neural network, is the most discussed [13]: arbitrary weight values are initialized to define the network, and learning repeatedly adjusts the weights over an iterative process. There are numerous variations of backpropagation. Rumelhart et al. [14] first studied backpropagation in its simplest form, constructing a basic gradient-descent method for determining weights that minimize the system's mean squared error (MSE). Le Cun [13] then improved the method by attaching a local criterion to each unit in order to minimize locally. However, the number of iterations needed to converge in several applications is very large, and several researchers modified backpropagation by updating the learning rate after each iteration [14]. Vogl et al. [15] focused on speeding up the convergence of backpropagation by changing the number of iterations and the learning rate. However, the literature states that with this method the solution may easily fall into a local trap, so convergence to the global optimum is not always guaranteed [12]. The other direction, heuristic search, has attracted many researchers in recent years. Nature-inspired metaheuristic algorithms, such as the Bees Algorithm (BA) [16], the Artificial Bee Colony algorithm [17–19], genetic algorithms (GA) [20], Grey Wolf Optimization (GWO) [12] and Particle Swarm Optimization (PSO) [9], have proved their power in handling the optimization task of training MLPs. Pham et al. [16] employed the Bees Algorithm as a "wrapper" feature selector for training MLPs, allowing the system to choose a set of features that minimizes classification errors. Ding et al. [20] combined a genetic algorithm with BP to learn the connection weights, speeding up the learning phase. PSO was used as a stochastic algorithm that does not require gradient information from the system. Meng et al. [12] developed a new mechanism for GWO, including elastic, circling and attacking techniques, to overcome the difficulties of the system. Although these algorithms show good performance on testing datasets, the No Free Lunch theorem [21] indicates that no heuristic can handle all optimization tasks in all scenarios. Hence, a new or hybrid algorithm should be proposed to solve a specific optimization task by utilizing the strengths of two methods. BA is a population-based search algorithm that mimics the natural foraging behavior of honey bees and has been applied to different optimization problems. It is popular for its ability to avoid local minimum traps, because it has two main phases: global search and local search.
Its applications range from scheduling problems [22] and manufacturing cell formation [23] to supply chain problems [24]. However, BA is not very good in the global search phase: random search is the main component of this phase, which creates high variance. Therefore, other techniques should be combined with it to overcome these difficulties. Hence, the motivations of this work are as follows:

• Reduce the drawbacks of training MLPs by using metaheuristics to increase classification accuracy.
• Combine BA with another algorithm to enhance the performance of the global search.
• Improve the performance of BA so that it can effectively find the global solution without easily falling into a local optimum.

In this paper, we introduce a novel hybrid algorithm, the Hybrid Genetic-Bees Algorithm (HGBA), which uses GA to improve the global search phase of BA. The proposed method contributes to minimizing the mean squared error (MSE) and hence improving the accuracy of the MLP. The rest of this paper is arranged as follows: Sect. 2 reviews BA with its pros and cons and presents our proposed method for training MLPs; Sect. 3 describes the experimental set-up; Sect. 4 discusses the results; and Sect. 5 summarizes the project and future developments.

2 Methodology 2.1 Overview of Bees Algorithm (BA) The Bees Algorithm (BA) is a population-based search algorithm that mimics the natural foraging behavior of honey bees and has been applied to different optimization problems [25]. The main process of BA includes the following steps: • To exploit colonies of scout bees for foraging. • To assess the good paths and do the waggle dance to recruit forager bees for further search. • To do the shrinking area for local search or abandon exploited areas. There are two types of bees in BA: scout bees and forager bees. Scout bees will do the global search for elite sites, and forager bees, after being recruited, will do the local search in these sites. Forager bees also can become scout bees if the searched sites become elite sites. The optimization search process can be separated into three simultaneous processes: (1) Exploiting scout bees randomly for global search, (2) Assessing the paths, and (3) Doing the local search by forager bees. The overall flowchart is depicted in Fig. 1. Process 1: Exploiting Scout Bees Randomly for Global Search The algorithm begins by setting up several parameters, including the number of scout bees (ns), the number of elite bees (ne), the number of selected regions out of n points (nb), the number of recruited bees around elite regions (nse), the number of recruited bees

148

T. T. M. Anh and T. D. Vi

Fig. 1 Flowchart of Bees algorithm proposed by Pham et al. [25]

around other selected (m − e) regions (nr b), and the stopping criteria. Then, similar to scout bees, N bees are deployed randomly on the search space. Process 2: Assessing Paths Each scout bee on the population space assesses the field’s fitness. Scout bees use a similar natural seeking (or scouting) method. After that, we select and save top bees with superior fitness for the next population. Only one representative bee with the best fitness will be chosen for each site. There are no such limitations in nature. Bees are more adaptable, but in terms of efficiency and optimization, it is sufficient to choose only one representative from each location. The next step will send the remaining bees around the search area randomly. The algorithm, like honey bees, maintains its random seeking aspect by dispatching scout bees to seek new potential answers. This process will be repeated until a certain condition has been satisfied. At the end of each cycle, the colony will arrange itself into three parts: elite bees, representatives from each swarm, and the colony’s new population—elite bees, representatives from each selected site, and remaining bees assigned to random search. Process 3: Doing the Local Search by Forager Bees After selecting the top bees, the algorithm then assigns elite bees locations to search directly around them in neighborhood search, and other positions are picked, either the best or using a roulette wheel proportionate to fitness. After this step, the bees seek inside the limits for these spots, assessing their individual fitness. Around elite spots, more bees will be recruited, whereas around the remaining chosen points, fewer bees will be recruited. Recruitment, along with the scout bee idea, is one of the key assumptions of the bee algorithm, both of which are employed in nature. In comparison with other algorithms, the Bees Algorithm has the following advantages: it can do local and


global searches, it can be applied to a variety of optimization problems, it is simple to use, and it can be hybridized with other algorithms. However, the global search phase deserves attention. As it is merely a random search, its variance is large. Additionally, we cannot guarantee that the algorithm will find the global solution when the problem becomes larger. Therefore, the global search phase must be improved.

2.2 Improving the Global Search Phase of BA by Employing GA

In this work, we focus on improving the global search phase of BA. The local search phase provides a temporary set of (nre × ne + (nb − ne) × nrb) scout bees. These temporary sites are then fed into the global search phase. In this phase, a threshold decides whether each "bee" performs a random search or a genetic algorithm step to diversify the solution. The random search procedure is exactly the same as in the traditional Bees Algorithm: a completely new "bee" is generated to explore other regions of the search space. For the genetic algorithm step, roulette wheel selection (fitness-proportionate selection) is used to choose a "bee" based on its fitness value: the lower the fitness value (error), the higher the probability of the bee being chosen to continue the process. After that, mutation and crossover operators are applied to find a promising solution. A minimal sketch of this hybrid global search step is given below.
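The following is a minimal, illustrative Python sketch of the hybrid global search step described above. It is not the authors' implementation: the function names, the use of `ga_fraction` as the threshold, and the choice of one-point crossover and Gaussian mutation are our assumptions for illustration (lower fitness = better, as in MSE minimization).

```python
import numpy as np

rng = np.random.default_rng(0)

def roulette_select(pop, fitness):
    # Fitness-proportionate selection for a minimization problem:
    # lower fitness (MSE) -> higher selection probability.
    inv = 1.0 / (np.asarray(fitness) + 1e-12)
    probs = inv / inv.sum()
    return pop[rng.choice(len(pop), p=probs)]

def crossover(a, b):
    # One-point crossover over the flat weight vector (assumed operator).
    point = rng.integers(1, len(a))
    return np.concatenate([a[:point], b[point:]])

def mutate(bee, rate=0.1, scale=0.1):
    # Gaussian mutation applied gene-wise with probability `rate`.
    mask = rng.random(len(bee)) < rate
    bee = bee.copy()
    bee[mask] += rng.normal(0.0, scale, mask.sum())
    return bee

def hybrid_global_search(pop, fitness, bounds, ga_fraction=0.5):
    """Regenerate the non-selected bees either as pure random scouts
    (traditional BA) or as GA offspring (the hybrid step)."""
    low, high = bounds
    new_bees = []
    for _ in range(len(pop)):
        if rng.random() < ga_fraction:          # GA branch
            p1 = roulette_select(pop, fitness)
            p2 = roulette_select(pop, fitness)
            child = mutate(crossover(p1, p2))
            new_bees.append(np.clip(child, low, high))
        else:                                   # random scout branch
            new_bees.append(rng.uniform(low, high, size=pop[0].shape))
    return new_bees
```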

2.3 Our Proposed Method: HGBA for Training MLP

2.3.1 Training Method

The training method in this work finds the set of weights and biases that minimizes the loss function; specifically, this work aims to minimize the Mean Squared Error on the training dataset. Figure 2 illustrates how the algorithm works. The overall process, a skeleton of which is sketched below, is summarized as follows:

1. Initialization: The set of weights and biases is initialized by an encoding strategy. This encoded solution, called a "scout bee," is fed into the population.
2. Mapping and evaluation: Each "scout bee" is decoded into weights and biases to evaluate the loss value. The lower the loss, the better the performance of the MLP.
3. Optimization: After evaluating the loss values, the population is evolved by the proposed HGBA to optimize the loss function.
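Below is a minimal skeleton of this training loop, assuming a hypothetical `mse_loss` callable (which decodes a flat vector into {iw, hw, hb, ob}, cf. Sect. 2.3.2, and returns the training MSE) and a `local_search` helper for the neighborhood phase; it illustrates the flow of Fig. 2 rather than the authors' code.

```python
import numpy as np

def train_mlp_hgba(mse_loss, dim, bounds, ns=100, max_iter=100):
    """Hypothetical HGBA-MLP training skeleton.

    mse_loss: callable mapping a flat weight/bias vector to MSE.
    dim: length of the encoded {iw, hw, hb, ob} vector.
    """
    rng = np.random.default_rng(0)
    low, high = bounds
    # 1. Initialization: each flat vector is one encoded "scout bee".
    population = [rng.uniform(low, high, dim) for _ in range(ns)]
    best, best_fit = None, np.inf
    for _ in range(max_iter):
        # 2. Mapping and evaluation: decoding happens inside mse_loss.
        fitness = [mse_loss(bee) for bee in population]
        i = int(np.argmin(fitness))
        if fitness[i] < best_fit:
            best, best_fit = population[i].copy(), fitness[i]
        # 3. Optimization: local search around selected sites, then the
        #    hybrid GA/random global search of Sect. 2.2.
        population = local_search(population, fitness)         # assumed helper
        population = hybrid_global_search(population, fitness,
                                          (low, high))         # from the sketch above
    return best, best_fit
```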


Fig. 2 Training MLP network by hybrid GBA

2.3.2 Encoding Strategy

The encoding strategy is important for organizing the solution structure so that the performance of the metaheuristic can be improved. In this work, the weights and biases are encoded as {iw, hw, hb, ob}, where iw, hw, hb, ob are the input weights, hidden weights, hidden biases, and output biases, respectively. Figure 3 illustrates how we encode our solution.

MLP Output Calculation
After obtaining the input, weights, and biases, we follow this process to get the output of the MLP:

$$s_j = \sum_{i=1}^{n} (W_{ij} \times X_i) - \theta_j \qquad (1)$$

The output of the hidden node is computed as follows:

$$S_j = \mathrm{sigmoid}(s_j) = \frac{1}{1 + \exp(-s_j)} \qquad (2)$$

Fig. 3 Weights and biases mapping for MLP

The final outputs are then computed from the hidden nodes' outputs:

$$o_k = \sum_{j=1}^{h} (w_{jk} \times S_j) - \theta_k \qquad (3)$$

$$O_k = \mathrm{sigmoid}(o_k) = \frac{1}{1 + \exp(-o_k)} \qquad (4)$$

Criteria for Evaluating Performance
After being trained, the ANN models are evaluated using a statistical metric: the Mean Squared Error (MSE). MSE represents the difference between the actual and predicted data; the smaller the MSE, the closer the fit is to the dataset. (RMSE, the square root of MSE, expresses the same error on the original scale.)

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (5)$$

where $y_i$ is the desired output and $\hat{y}_i$ is the neural network output.

The Selection of the Activation Function
The choice of activation function in designing the ANN defines how well the model can learn and the predictions it can make. An activation function determines how the weighted sum of the inputs is turned into an output value. In this model, we use the Sigmoid function as the activation function:

$$\sigma(x) = \frac{1}{1 + e^{-x}} \qquad (6)$$

where $x$ is the input. A sketch of this forward pass and fitness evaluation follows.
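The following numpy sketch puts Eqs. (1)–(6) together: decoding a flat {iw, hw, hb, ob} vector into an MLP with one output node and evaluating its MSE fitness. The helper name `decode_and_evaluate` and the exact array shapes are our assumptions for illustration; this is the kind of `mse_loss` callable used in the training skeleton of Sect. 2.3.1.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))          # Eqs. (2), (4), (6)

def decode_and_evaluate(bee, X, y, n_in, n_hidden):
    """Decode a flat vector as {iw, hw, hb, ob} and return the MSE (Eq. 5).

    X: (samples, n_in) inputs; y: (samples,) desired outputs.
    """
    # Split the flat "scout bee" vector into the four parameter blocks.
    p = 0
    iw = bee[p:p + n_in * n_hidden].reshape(n_in, n_hidden); p += n_in * n_hidden
    hw = bee[p:p + n_hidden].reshape(n_hidden, 1);           p += n_hidden
    hb = bee[p:p + n_hidden];                                p += n_hidden
    ob = bee[p:p + 1]

    S = sigmoid(X @ iw - hb)                  # Eqs. (1)-(2), theta_j = hb
    O = sigmoid(S @ hw - ob).ravel()          # Eqs. (3)-(4), theta_k = ob
    return np.mean((y - O) ** 2)              # Eq. (5)
```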


Table 1 Parameter setting for HGBA

Parameter (HGBA)                           Value
Number of scout bees (ns)                  100
Number of best sites (nb)                  5
Number of elite sites (ne)                 3
Number of foragers in elite sites (nre)    20
Shrink factor (sf)                         0.1
Mutation rate                              0.1
Crossover rate                             0.8
Max iterations                             100
Number of iterations unchanged             20

Table 2 Classification datasets

Dataset          Attribute number    Training samples    Test samples    Classes number
XOR              3                   8                   8               2
Breast cancer    9                   599                 100             2
Heart            22                  80                  80              2
Tic Tac Toe      9                   637                 300             2

3 Experiment Settings

In this work, we benchmark the performance of HGBA on 4 classification datasets. The whole training process for each test case is repeated 30 times to generalize the results. The parameters of the HGBA algorithm are given in Table 1, and the classification datasets are summarized in Table 2. We also consider Particle Swarm Optimization for training MLP, following the work of Kennedy and Eberhart [26]. Additionally, we perform sensitivity analyses on the shrink factor, the number of scout bees, and the number of iterations to find the optimal parameter settings for the algorithm. The results are presented in the next section. The MLP structure for each dataset is summarized in Table 3; we adopt the common suggestion for classification problems of 2 × na + 1 hidden nodes, where na is the number of attributes.

Table 3 MLP structure for classification datasets

Dataset          Number of attributes    MLP structure
XOR              3                       3-7-1
Breast cancer    9                       9-19-1
Heart            22                      22-45-1
Tic Tac Toe      9                       9-19-1

Fig. 4 Convergence curve for “Breast cancer” dataset

4 Results and Analysis

4.1 Mean Squared Error

The following figures illustrate the convergence curves for the four datasets, produced by the two metaheuristic algorithms HGBA-MLP and PSO-MLP. The datasets used are the "Breast cancer," "Heart," "Tic Tac Toe Endgame," and "XOR" datasets, which have already been described in Sect. 3. Overall, on all four datasets, HGBA-MLP consistently outperforms PSO-MLP in minimizing the objective value (Fig. 4).


Table 4 Sensitivity analysis on shrink factor

Shrink factor    Breast cancer    Heart          Tic Tac Toe Endgame    XOR
0.001            0.039887534      0.248317181    0.255780597            0.023857062
0.002            0.042906719      0.245772466    0.258271766            0.030685722
0.005            0.042080124      0.245691644    0.258035425            0.037550701
0.01             0.047253134      0.24216748     0.255754658            0.03707171
0.05             0.046908873      0.244228577    0.245897409            0.036551992
0.1              0.042367223      0.252891685    0.252114593            0.025976337
0.2              0.038874115      0.247698323    0.253178375            0.033243973
0.5              0.047651649      0.250066689    0.250787427            0.03265171

Table 5 Sensitivity analysis on the number of scout bees (ns)

ns     Breast cancer    Heart          Tic Tac Toe Endgame    XOR
10     0.046708849      0.252640043    0.258003435            0.038086929
20     0.043891996      0.250344622    0.257088053            0.033867636
50     0.044581365      0.24997474     0.253710963            0.034522478
100    0.042171396      0.242662925    0.25023097             0.032089518
200    0.040102251      0.239898949    0.249604235            0.022426695

4.2 Shrink Factor (sf)

The neighborhood size of the patches is shrunk using the following equation:

$$ngh_i = ngh_{i-1} \times sf \qquad (7)$$

The shrink factor is assumed to lie between 0 and 1. If sf = 0, the neighborhood size immediately collapses to 0, eliminating its effect; if sf = 1, the size never changes. A value sf > 1 would make the neighborhood grow, which contradicts our purpose. Table 4 shows how the objective value changes when different values of the shrink factor are applied (the smallest value in each column is the best for that dataset). It can be seen that parameter tuning is relatively important, as the best shrink factor differs per dataset: a shrink factor of 0.2 gives the best value for the "Breast cancer" dataset, whereas for XOR the shrink factor should be set at 0.001. A short sketch of this shrinking schedule follows.
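A tiny illustration of Eq. (7); the initial neighborhood size, the shrink factor value, and the 4-dimensional toy site are our assumed values, since the paper does not state them here:

```python
import numpy as np

rng = np.random.default_rng(0)
centre = np.zeros(4)           # site being exploited (toy 4-dimensional example)
ngh, sf = 0.1, 0.5             # initial neighborhood size and shrink factor (assumed)
for step in range(5):
    # Recruited foragers sample uniformly inside the current neighborhood.
    forager = centre + rng.uniform(-ngh, ngh, size=centre.shape)
    ngh *= sf                  # Eq. (7): ngh_i = ngh_{i-1} * sf
    print(f"step {step + 1}: ngh = {ngh:.5f}")
```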


Table 6 Comparisons between HGBA-MLP and PSO-MLP

                       HGBA-MLP                                   PSO-MLP
Dataset                Mean         Std          Accuracy (%)     Mean         Std        Accuracy (%)
Breast cancer          0.043491171  0.040366654  92.3             0.044877396  0.03001    92.1
Heart                  0.247104256  0.049575268  66.7             0.246917228  0.03046    58.1
Tic Tac Toe Endgame    0.253727531  0.057619606  66.9             0.259985173  0.03321    60.0
XOR                    0.032198651  0.0589139    93.5             0.053474034  0.061656   32.0

4.3 The Number of Scout Bees

The value ns is applied at the initialization phase of the Genetic Bees Algorithm: ns scout bees are uniformly distributed to explore the search space. Since a good initialization guides the bees toward the global optimum, tuning this parameter is important. Table 5 shows that all datasets obtain their best MSE with 200 initial scout bees. However, there is a trade-off: the higher the number of scout bees, the longer the algorithm takes to execute.

4.4 MSE and Accuracy

Table 6 compares the performance of the two algorithms on the four classification problems. It can be seen from the table that the higher accuracy rates belong to our proposed algorithm. Moreover, the mean MSE of HGBA-MLP is smaller than that of PSO-MLP in all cases except the "Heart" dataset. HGBA-MLP shows a higher standard deviation than PSO-MLP, but the difference is not large enough to be a concern.

5 Conclusion

In this work, we considered another way of applying a hybrid metaheuristic algorithm to neural networks to solve the optimization problem. The goal of optimization algorithms is to find the optimal solution to a problem while meeting one or more objective functions, subject to some sets of constraints [27]. In this study, the hybrid method of GA and BA is used to obtain the optimal weights and biases. These values are then input into the feedforward neural network when training the MLP. The proposed HGBA for training MLP was tested on 4 standard classification datasets,


including Breast cancer, Heart, Tic Tac Toe Endgame, and XOR, with the Sigmoid activation function. The results are very promising for the HGBA method compared with the popular PSO algorithm used for benchmarking: HGBA converges faster and reaches a lower MSE after the set number of iterations when training MLPs. Based on the training results and analysis, the findings from the experiments are as follows:

1. The application of GA in BA can counteract the high variance of the random global search in BA and improve solution exploration.
2. On the four test datasets, HGBA has proven to be a successful strategy for training the biases and weights of an MLP.
3. The robust exploitation and exploration ability of our proposed method contributes to a lower error rate.

As shown in the previous sections, HGBA is a good optimization method for neural networks. This paper's MLP optimization is confined to weights and biases; MLP topology optimization is not addressed. Hence, future research should take topology into consideration for this optimization problem. Moreover, we focused only on classification problems; more tests on various problem types should be conducted to prove the effectiveness of this hybrid algorithm.

References

1. de Jesus Rubio J (2009) SOFMLS: online self-organizing fuzzy modified least-squares network. IEEE Trans Fuzzy Syst 17(6):1296–1309. https://ieeexplore.ieee.org/document/5196829/
2. Nguyen G, Dlugolinsky S, Bobák M, Tran V, López García Á, Heredia I, Malík P, Hluchý L (2019) Machine learning and deep learning frameworks and libraries for large-scale data mining: a survey. Artif Intell Rev 52(1):77–124. https://doi.org/10.1007/s10462-018-09679-z
3. Chinas P, Lopez I, Vazquez JA, Osorio R, Lefranc G (2015) SVM and ANN application to multivariate pattern recognition using scatter data. IEEE Latin Am Trans 13(5):1633–1639
4. Majidi M, Fadali MS, Etezadi-Amoli M, Oskuoee M (2015) Partial discharge pattern recognition via sparse representation and ANN. IEEE Trans Dielectr Electr Insul 22(2):1061–1070
5. Kermadi M, Berkouk EM (2017) Artificial intelligence-based maximum power point tracking controllers for photovoltaic systems: comparative study. Renew Sustain Energy Rev 69:369–386
6. Lee HH, Phuong LM, Dzung PQ, Dan Vu NT, Khoa LD (2010) The new maximum power point tracking algorithm using ANN-based solar PV systems. In: TENCON 2010—2010 IEEE region 10 conference, pp 2179–2184. ISSN: 2159-3450
7. Ramaprabha R, Gothandaraman V, Kanimozhi K, Divya R, Mathur BL (2011) Maximum power point tracking using GA-optimized artificial neural network for solar PV system. In: 2011 1st international conference on electrical energy systems, pp 264–268
8. Egmont-Petersen M, de Ridder D, Handels H (2002) Image processing with neural networks—a review. Pattern Recogn 35(10):2279–2301. https://linkinghub.elsevier.com/retrieve/pii/S0031320301001789
9. Mendes R, Cortez P, Rocha M, Neves J (2002) Particle swarms for feedforward neural network training. In: Proceedings of the 2002 international joint conference on neural networks, IJCNN'02 (Cat. No. 02CH37290). IEEE, Honolulu, HI, USA, pp 1895–1899. http://ieeexplore.ieee.org/document/1007808/
10. Zhao ZQ, Zheng P, Xu ST, Wu X (2019) Object detection with deep learning: a review. IEEE Trans Neural Netw Learn Syst 30(11):3212–3232
11. Oliver J, Baxter R, Wallace C (1996) Unsupervised learning using MML. In: Machine learning: proceedings of the thirteenth international conference (ICML 96). Morgan Kaufmann Publishers, pp 364–372
12. Meng X, Jiang J, Wang H (2021) AGWO: advanced GWO in multi-layer perception optimization. Exp Syst Appl 173:114676. https://linkinghub.elsevier.com/retrieve/pii/S0957417421001172
13. Lecun Y (2001) A theoretical framework for back-propagation
14. Rumelhart DE, Hinton GE, Williams RJ (1985) Learning internal representations by error propagation. Technical report. Institute for Cognitive Science, California University, San Diego, La Jolla. https://apps.dtic.mil/sti/citations/ADA164453
15. Vogl TP, Mangis JK, Rigler AK, Zink WT, Alkon DL (1988) Accelerating the convergence of the back-propagation method. Biol Cybernet 59(4):257–263. https://doi.org/10.1007/BF00332914
16. Pham D, Mahmuddin M, Otri S, Al-Jabbouli H (2007) Application of the bees algorithm to the selection of features for manufacturing data
17. Sharma K, Gupta P, Sharma H (2016) Fully informed artificial bee colony algorithm. J Exp Theoret Artif Intell 28(1–2):403–416
18. Sharma H, Bansal JC, Arya K, Yang XS (2016) Lévy flight artificial bee colony algorithm. Int J Syst Sci 47(11):2652–2670
19. Jadon SS, Bansal JC, Tiwari R, Sharma H (2015) Accelerating artificial bee colony algorithm with adaptive local search. Memetic Comput 7(3):215–230
20. Ding S, Su C, Yu J (2011) An optimizing BP neural network algorithm based on genetic algorithm. Artif Intell Rev 36(2):153–162. https://doi.org/10.1007/s10462-011-9208-z
21. Wolpert D, Macready W (1997) No free lunch theorems for optimization. IEEE Trans Evol Comput 1(1):67–82
22. Packianather MS, Yuce B, Mastrocinque E, Fruggiero F, Pham DT, Lambiase A (2014) Novel genetic bees algorithm applied to single machine scheduling problem. In: 2014 world automation congress (WAC), pp 906–911. ISSN: 2154-4824
23. Pham DT, Afify A, Koc E (2007) Manufacturing cell formation using the bees algorithm. In: Innovative production machines and systems virtual conference. Cardiff, UK
24. Lambiase A, Iannone R, Miranda S, Lambiase A, Pham D (2016) Bees algorithm for effective supply chains configuration. Int J Eng Bus Manag 8:1847979016675301
25. Pham D, Ghanbarzadeh A, Koc E, Otri S, Rahim S, Zaidi M (2005) The bees algorithm—technical report. Manufacturing Engineering Centre, Cardiff University, Cardiff
26. Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN'95—international conference on neural networks, vol 4. IEEE, pp 1942–1948
27. Yuce B, Packianather MS, Mastrocinque E, Pham DT, Lambiase A (2013) Honey bees inspired optimization method: the bees algorithm. Insects 4(4):646–662

Intellectual Identification Method of the Egg Development State Based on Deep Neural Nets Eugene Fedorov , Tetyana Utkina , and Tetiana Neskorodieva

Abstract To automate the classification and interpretation of the visualization results of egg ovoscoping during incubation, this article proposes an intelligent identification method for the development state of chicken embryos based on deep neural networks. The created generalized LeNet 2D model has the following advantages: the input image need not be square, which expands the scope of application; the input image is compressed beforehand, with the new size determined empirically from the initial image size, which increases the training speed and the identification accuracy of the model; the number of "convolution layer–downsampling layer" pairs is determined empirically from the image size, which increases the identification accuracy of the model; fully connected hidden layers are absent, which increases the training speed of the model; and the quantity of layer planes is determined automatically, which accelerates determination of the model structure. The created one-block ViT model has the following advantages: the input image need not be square, which expands the scope of application; the image is compressed beforehand, with the new size determined empirically from the initial image size, which increases the training speed and the identification accuracy of the model; the patch size is determined empirically from the image size, which increases the identification accuracy of the model; and there is only one block, which increases the training speed and accelerates determination of the model structure. The proposed method for intelligent identification of the development state of chicken embryos based on deep neural networks can be used in various intelligent systems for identifying the visualization results of egg ovoscoping during incubation in industrial poultry production.

Keywords Ovoscoping · Intellectual identification of eggs development state · Deep neural networks · Convolution neural network · Transformer

E. Fedorov (B) · T. Utkina Cherkasy State Technological University, Shevchenko blvd, Cherkasy 460, 18006, Ukraine e-mail: [email protected] T. Neskorodieva Vasyl’ Stus Donetsk National University, 600-richcha str., 21, Vinnytsia 21021, Ukraine © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_12


1 Introduction

When breeding poultry, both on poultry farms and in households, ovoscoping of the eggs is carried out during incubation and before it, to ensure a higher percentage of quality, healthy offspring and to allow early rejection of eggs with pathologies or other developmental abnormalities. Ovoscoping is the process of determining whether an egg is viable by transilluminating its contents with a light beam [1, 2]. Observing the growth and development of an embryo makes it possible to determine the fertility of eggs, the thickness and quality of the shell, and the development state of the embryo according to the date of the procedure, etc. Transilluminating the eggs under study does not damage them, but the procedure must be performed quickly to avoid overcooling the eggs and, as a result, possibly weakening the embryos or even killing them. The optimum duration of ovoscoping is no more than 5 min. At the same time, it is important to remove non-viable eggs (infertile or dead) from the incubator in due time because of the risk of infecting other eggs with harmful microorganisms; this also avoids excessive evaporation of water and eliminates a source of contamination [2].

Monitoring of the egg development state during incubation is carried out several times: primary ovoscoping is performed before placement in the incubator, and secondary ovoscoping during the incubation term [3]. When carrying out ovoscoping at an early stage, before placement in the incubator, the following are checked: the existence of a germ; a yolk with clear contours located in the center; an air chamber placed at the top; and the absence of cracks, roughness, ledges, pores, darkened spots on the shell, foreign objects (grains of sand, down feathers), clots, and any other formations. An egg is considered suitable for incubation only if none of the listed defects is present. Otherwise, eggs are rejected, and other inspected eggs can be placed in their stead, cutting the resources spent on incubating non-viable specimens. If a large number of such eggs is revealed, the microclimate parameters in the incubator must be reconsidered: temperature, humidity, composition of the ventilated air, etc.

Ovoscoping in households and hen houses is carried out, as a rule, with a hammer-type ovoscope. On poultry farms, the procedure is carried out in a warm room on a mirage (candling) table by an operator. However, during long-term identification this procedure is tiresome: the operator's eyes suffer, fatigue accumulates, and attention decreases. The operator may miss weak embryos or fail to identify viable ones, which leads to excess expenditure of time and energy in poultry production.

At early ovoscoping stages, unfertilized eggs are eliminated; they are characterized by a light tone, a yolk presented as a continuous dark stain, and the absence of blood vessel threads. A dead embryo is large, has a pronounced oval form, and has an uneven edge; the identification of a blood ring means that the embryo has died. At later stages, monitoring the egg development state allows one to estimate the course of the incubation process and to detect eggs with a dead fetus, as a rule a consequence of overcooling, overheating, sticking to the membrane, etc. Under normal


development, the embryo occupies practically all the space, its outline is visible under transillumination, and it may even be possible to observe some movement [1–3].

The timing of ovoscoping for monitoring egg development depends on the type of poultry: chicken—days 7, 11, 19; turkey or duck—days 8, 13, 24–25; goose—days 9.5, 14–15, 28; etc.

Recently, various methods of visualizing the incubation process have been applied to increase the efficiency of monitoring the egg development state, including digital signal processing, ultrasound, thermal differencing, microscopy, tomography, magnetic resonance, and infrared visualization [3, 4]. Methods of intelligent identification are applied for further interpretation of the results, which provides high classification accuracy, improves quality control of the incubation process, and limits the costs of industrial poultry production.

Today, among methods of intelligent image identification, classifying deep neural networks, which have surpassed pseudo-two-dimensional hidden Markov models in popularity, have become widespread. The first class of such networks is two-dimensional convolution networks, such as the following.

The LeNet-5 neural net [5] has the simplest architecture and uses two pairs of convolution and downsampling layers as well as two fully connected layers. The convolution layer reduces sensitivity to shifts of image elements. The downsampling layer reduces the dimension of the image.

The AlexNet neural net [6] and the neural nets of the VGG (Visual Geometry Group) family [7] are modifications of LeNet. In these neural nets, several convolution layers may follow one another in a row.

The neural nets of the ResNet family [7] use the residual block, which contains two consecutive convolution layers. The output signals of the planes of the layer preceding this block are added to the output signals of the planes of the second convolution layer of the block.

The dense convolutional network (DenseNet) neural net [8] uses the fully connected (dense) block, which contains a set of residual blocks. The output signals of the planes of the second convolution layer of the current residual block are concatenated with the output signals of the planes of the second convolution layers of all preceding residual blocks of this dense block and with the output signals of the planes of the layer preceding the dense block. In addition, a reduction of the planes of the convolution layers located between dense blocks (usually by half) is used.

The GoogLeNet neural net (Inception V1) [9] uses the inception block, which contains parallel convolution layers with connection areas of different sizes and one downsampling layer. The output signals of the planes of these parallel layers are concatenated. To reduce the number of operations, convolution layers with a unit connection area are connected in series with these parallel layers (in the case of convolution layers, such a layer is placed before them; in the case of a downsampling layer, such a layer is placed after it).

The Inception V3 neural net [10] is a modification of GoogLeNet, and its inception and reduction blocks are modifications of the inception block of the GoogLeNet neural net.


The Inception-ResNet-v2 neural net [11] is a modification of GoogLeNet and ResNet; its inception block is a modification of the residual and inception blocks, and its reduction block is a modification of the inception block.

The Xception neural net [12] uses the depthwise separable convolution block, which first performs pointwise convolution (each plane of the current layer is connected to all planes of the previous layer, and the connection area is a single unit) and then depthwise convolution (each plane of the current layer is connected only with the corresponding plane of the subsequent layer). For both convolutions, the ReLU activation function is usually used.

The MobileNet neural net [13] uses the depthwise separable convolution block, which first performs depthwise convolution (each plane of the previous layer is connected only with the corresponding plane of the current layer) and then pointwise convolution (each plane of the current layer is connected to all planes of the subsequent layer, and the connection area is a single unit). For both convolutions, the linear activation function is usually used.

The MobileNet2 neural net [14] uses the inverted residual block, which first performs pointwise convolution (each plane of the current layer is connected to all planes of the subsequent layer, and the connection area is a single unit), then depthwise convolution (each plane of the previous layer is connected only with the corresponding plane of the current layer), and then pointwise convolution again. For both convolutions, the SiLU activation function is usually used.

The SR-CNN neural net [15] uses the squeeze-and-excitation residual block, which is a combination of the squeeze-and-excitation block (convolution, global downsampling, two fully connected layers) and the residual block (two consecutive convolution layers).

The second class of such networks is transformers, such as the following.

The visual transformer (ViT) [16] contains, as its main component, an encoder consisting of a sequence of blocks. Each block contains a first normalization layer, multi-head attention (which weighs image patches), a second normalization layer, and a two-layer perceptron.

The data-efficient image transformer (DeiT) [17] contains, as its main component, an encoder consisting of a sequence of blocks. As with ViT, each block contains a first normalization layer, multi-head attention, a second normalization layer, and a two-layer perceptron. Unlike ViT, it additionally uses a distillation token besides the patches.

The deep visual transformer (DeepViT) [18] contains, as its main component, an encoder consisting of a sequence of blocks. Each block contains a first normalization layer, re-attention (a modification of multi-head attention), a second normalization layer, and a two-layer perceptron.

The class-attention in image transformers (CaiT) [19] contains, as its main component, an encoder consisting of a sequence of blocks. Each block contains a first normalization layer, multi-head attention or class-attention (a modification of multi-head attention considering not only a patch but also a class), a second normalization layer, and a two-layer perceptron.


The cross-attention multiscale vision transformer (CrossViT) [20] contains, as its main component, a multiscale encoder consisting of a sequence of blocks. Each block contains two encoders (each similar to the ViT encoder) for patches of large and small size, and cross-attention (a modification of multi-head attention), which allows patches of different sizes to be combined.

Deep neural networks possess one or more of the following disadvantages [21]:

• insufficiently high training speed;
• complexity of determining the parameters of the neural network architecture (the quantity and size of layers, the size of patches, the number of transformer blocks, etc.);
• insufficiently high recognition accuracy.

In this regard, the problem of creating an effective deep neural network is a current one. The purpose of this work is to automate the classification and interpretation of the visualization results of egg ovoscoping during incubation by applying deep neural networks, thereby increasing the efficiency of identifying the development state of poultry embryos. To achieve this goal, it is necessary to solve the following problems:

1. To create a model of egg development state identification based on a convolution neural net.
2. To develop a model of egg development state identification based on a transformer.
3. To choose quality criteria for the egg development state identification method.
4. To define the structure of the egg development state identification method.
5. To conduct a numerical study of the proposed method.

2 The Ovoscoping Model Based on the 2D Generalized LeNet Convolution Neural Net

The two-dimensional generalized convolution neural net (2D Generalized LeNet) for classification is a non-recurrent dynamic ANN with a hierarchical structure. Unlike traditional 2D LeNet, the input image is not square. Unlike traditional 2D LeNet, the quantity of "convolution layer—downsampling layer" pairs is defined empirically and depends on the image size. Unlike traditional LeNet, fully connected hidden layers are absent. Unlike traditional LeNet, the quantity of planes is defined automatically—as the quotient of the number of cells of the input layer divided by two to a power (the power equals twice the number of "convolution layer—downsampling layer" pairs)—which preserves the total number of cells in a layer after downsampling halves the plane size in height and width.

Let $\nu = (\nu_x, \nu_y)$ be a position in the connection area, $K_I$ the quantity of cell planes in the input layer $I$ (3 for RGB images), $K_{s_l}$ the quantity of cell planes in the downsampling layer $S_l$, $K_{c_l}$ the quantity of cell planes in the convolution layer $C_l$, $A_I$ the connection area of a plane of layer $I$, $A_l$ the connection area of a plane of layer $S_l$, and $L$ the quantity of convolution (or downsampling) layers.

1. Compress the image based on bilinear interpolation.
2. Set $l = 1$.
3. Calculate the output signal of a convolution cell:

$$u_{c_l}(m, i) = \mathrm{ReLU}(h_{c_l}(m, i)), \quad m \in \{1, \ldots, N1_{c_l}\} \times \{1, \ldots, N2_{c_l}\}, \quad i \in \overline{1, K_{c_l}}, \qquad (1)$$

$$K_{c_l} = 2^{2l}, \quad N1_{c_l} = \begin{cases} N1_I, & l = 1 \\ N1_{s_{l-1}}, & l > 1 \end{cases}, \quad N2_{c_l} = \begin{cases} N2_I, & l = 1 \\ N2_{s_{l-1}}, & l > 1 \end{cases}, \qquad (2)$$

$$h_{c_l}(m, i) = \begin{cases} b_{c_1}(i) + \sum\limits_{k=1}^{K_I} \sum\limits_{\nu \in A_I} w_{c_1}(\nu, k, i)\, x(m + \nu, k), & l = 1 \\[2ex] b_{c_l}(i) + \sum\limits_{k=1}^{K_{s_{l-1}}} \sum\limits_{\nu \in A_{l-1}} w_{c_l}(\nu, k, i)\, u_{s_{l-1}}(m + \nu, k), & l > 1 \end{cases} \qquad (3)$$

where $w_{c_1}(\nu, k, i)$ is the connection weight from the $\nu$-th position in the connection area of the $k$-th cell plane of the input layer $I$ to the $i$-th cell plane of the convolution layer $C_1$; $w_{c_l}(\nu, k, i)$ is the connection weight from the $\nu$-th position in the connection area of the $k$-th cell plane of the downsampling layer $S_{l-1}$ to the $i$-th cell plane of the convolution layer $C_l$; and $u_{c_l}(m, i)$ is the cell output at the $m$-th position in the $i$-th cell plane of the convolution layer $C_l$.

4. Calculate the output signal of a downsampling cell (scale reduction by a factor of two based on max pooling):

$$u_{s_l}(m, k) = \max_{\upsilon \in \{0,1\}^2} u_{c_l}(2m + \upsilon, k), \quad m \in \{1, \ldots, N1_{s_l}\} \times \{1, \ldots, N2_{s_l}\}, \quad k \in \overline{1, K_{s_l}}, \qquad (4)$$

$$K_{s_l} = 2^{2l}, \qquad (5)$$

where $w_{s_l}(k, k)$ is the connection weight from the $k$-th cell plane of the convolution layer $C_l$ to the $k$-th cell plane of the downsampling layer $S_l$, and $u_{s_l}(m, k)$ is the cell output at the $m$-th position in the $k$-th cell plane of the downsampling layer $S_l$.

5. If $l \le L$, then set $l = l + 1$ and go to step 3.
6. Flattening:

$$\mathbf{u}_s = (u_{s_L}((1, 1), 1), \ldots, u_{s_L}((N1_{s_L}, N2_{s_L}), K_{s_L})). \qquad (6)$$

7. Calculate the output signal of the output layer:

$$y(j) = \mathrm{softmax}(h_o(j)), \quad j \in \overline{1, N_o}, \qquad (7)$$

$$h_o(j) = b_o(j) + \sum_{z=1}^{N1_{s_L} N2_{s_L} K_{s_L}} w_o(z, j)\, u_{s_L}(z), \qquad (8)$$

where $w_o(z, j)$ is the connection weight from the $z$-th neuron of the flattened layer $F$ to the $j$-th neuron of the output layer $O$, and $y(j)$ is the output of the $j$-th neuron of the output layer $O$.

3 The Ovoscoping Model Based on the One-Block ViT Visual Transformer

The one-block visual transformer (ViT) for classification is a non-recurrent network. The transformer includes only an encoder, which uses the mechanism of multi-head attention. Unlike traditional ViT, there is only one block. The model of the one-block visual transformer is presented in the following form.

1. Compress the image based on bilinear interpolation.
2. Extract square patches from the image and transform them to a flat form:

$$\tilde{\mathbf{y}}^{(0)} = \mathrm{extract}(\mathbf{x}), \qquad (9)$$

where extract(·) is the function extracting square patches from the image;

$$\mathbf{y}_n^{(0)} = \mathrm{reshape}(\tilde{\mathbf{y}}_n^{(0)}), \quad n \in \overline{1, P}, \qquad (10)$$

where reshape(·) is the function transforming the square $N^{(0)} \times N^{(0)}$ matrix of the $n$-th patch into a vector with the same quantity of elements.

3. Transform the patch vectors in a fully connected layer:

$$y_{nj}^{(1)} = \mathrm{ReLU}\Bigg(b_{nj}^{(1)} + \sum_{i=1}^{N^{(0)} N^{(0)}} w_{ni,j}^{(1)} y_{ni}^{(0)}\Bigg), \quad j \in \overline{1, N^{(1)}}, \quad n \in \overline{1, P}, \qquad (11)$$

where $w_{ni,j}^{(1)}$ is the connection weight from the $i$-th image patch element to the $j$-th neuron of the fully connected layer, and $y_{nj}^{(1)}$ is the output of the $j$-th neuron of the fully connected layer for the $n$-th patch.

4. Encode the position and add it to the transformed patch vector:

$$\mathbf{y}_n^{(2)} = \mathbf{y}_n^{(1)} + \mathrm{embedding}((0, \ldots, N^{(1)} - 1)), \quad n \in \overline{1, P}, \qquad (12)$$

where embedding(·) is the function encoding the position of a patch.

5. The one-block encoder carries out the following:

5.1. Normalization (over a layer):

$$y_{ni}^{(3)} = \frac{g}{\sigma}\left(y_{ni}^{(2)} - \mu\right), \quad i \in \overline{1, N^{(1)}}, \quad n \in \overline{1, P}, \qquad (13)$$

$$\mu = \frac{1}{N^{(1)}} \sum_{i=1}^{N^{(1)}} y_{ni}^{(2)}, \quad \sigma = \sqrt{\frac{1}{N^{(1)}} \sum_{i=1}^{N^{(1)}} \left(y_{ni}^{(2)} - \mu\right)^2}, \qquad (14)$$

where $g$ is a gain parameter and $y_{ni}^{(3)}$ is the output of the $i$-th normalization layer neuron for the $n$-th patch.

5.2. Multi-head attention. $N^{(H)}$ attention heads are used.

5.2.1. Calculate the queries for each attention head:

$$q_{lnj} = \sum_{i=1}^{N^{(1)}} w_{li,j}^{(Q)} y_{ni}^{(3)}, \quad l \in \overline{1, N^{(H)}}, \quad j \in \overline{1, N^{(K)}}, \quad n \in \overline{1, P}, \qquad (15)$$

where $w_{li,j}^{(Q)}$ is the connection weight from the $i$-th image patch element to the $j$-th query for the $l$-th attention head, and $q_{lnj}$ is the $j$-th query for the $l$-th attention head for the $n$-th patch.

5.2.2. Calculate the keys for each attention head:

$$k_{lnj} = \sum_{i=1}^{N^{(1)}} w_{li,j}^{(K)} y_{ni}^{(3)}, \quad l \in \overline{1, N^{(H)}}, \quad j \in \overline{1, N^{(K)}}, \quad n \in \overline{1, P}, \qquad (16)$$

where $w_{li,j}^{(K)}$ is the connection weight from the $i$-th image patch element to the $j$-th key for the $l$-th attention head, and $k_{lnj}$ is the $j$-th key for the $l$-th attention head for the $n$-th patch.

5.2.3. Calculate the key values for each attention head:

$$v_{lnj} = \sum_{i=1}^{N^{(1)}} w_{li,j}^{(V)} y_{ni}^{(3)}, \quad l \in \overline{1, N^{(H)}}, \quad j \in \overline{1, N^{(V)}}, \quad n \in \overline{1, P}, \qquad (17)$$

where $w_{li,j}^{(V)}$ is the connection weight from the $i$-th image patch element to the $j$-th key value for the $l$-th attention head, and $v_{lnj}$ is the $j$-th key value for the $l$-th attention head for the $n$-th patch. Usually, $N^{(V)} = N^{(K)}$.

5.2.4. Calculate the attention weights (scores) for each attention head. Scaled multiplicative ("dot-product") attention is used, comparing query and key:

$$e_{lmn} = \frac{1}{\sqrt{N^{(K)}}} \sum_{i=1}^{N^{(K)}} q_{lni}\, k_{lmi}, \quad l \in \overline{1, N^{(H)}}, \quad m \in \overline{1, P}, \quad n \in \overline{1, P}, \qquad (18)$$

$$a_{lmn} = \mathrm{softmax}(e_{lmn}) = \frac{\exp(e_{lmn})}{\sum_{z=1}^{P} \exp(e_{lmz})}, \quad n \in \overline{1, P}, \qquad (19)$$

where $a_{lmn}$ is the connection weight between the $m$-th and $n$-th patches for the $l$-th attention head.

5.2.5. Calculate the attention heads (multiplication of the attention weight by the key value):

$$h_{lnj} = \sum_{m=1}^{P} a_{lmn} v_{lmj}, \quad l \in \overline{1, N^{(H)}}, \quad j \in \overline{1, N^{(V)}}, \quad n \in \overline{1, P}, \qquad (20)$$

where $h_{lnj}$ is the weighted $j$-th key value for the $l$-th attention head for the $n$-th patch.

5.2.6. Form the matrix of weighted key values over attention heads and patches by means of concatenation:

$$H = \begin{bmatrix} h_{111} & \ldots & h_{11N^{(V)}} & \ldots & h_{N^{(H)}11} & \ldots & h_{N^{(H)}1N^{(V)}} \\ \ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\ h_{1P1} & \ldots & h_{1PN^{(V)}} & \ldots & h_{N^{(H)}P1} & \ldots & h_{N^{(H)}PN^{(V)}} \end{bmatrix}. \qquad (21)$$

5.2.7. Calculate the multi-head attention:

$$y_{nj}^{(4)} = \sum_{i=1}^{N^{(H)} N^{(V)}} w_{ij}^{(4)} h_{ni}, \quad j \in \overline{1, N^{(1)}}, \quad n \in \overline{1, P}, \qquad (22)$$

where $w_{ij}^{(4)}$ is the connection weight from the $i$-th key value to the $j$-th neuron of the multi-head attention layer, and $y_{nj}^{(4)}$ is the output of the $j$-th neuron of the multi-head attention layer for the $n$-th patch.

5.3. Addition and normalization (over a layer):

$$y_{ni}^{(5)} = \frac{g}{\sigma}\left(y_{ni}^{(2)} + y_{ni}^{(4)} - \mu\right), \quad i \in \overline{1, N^{(1)}}, \quad n \in \overline{1, P}, \qquad (23)$$

$$\mu = \frac{1}{N^{(1)}} \sum_{i=1}^{N^{(1)}} \left(y_{ni}^{(2)} + y_{ni}^{(4)}\right), \quad \sigma = \sqrt{\frac{1}{N^{(1)}} \sum_{i=1}^{N^{(1)}} \left(y_{ni}^{(2)} + y_{ni}^{(4)} - \mu\right)^2}, \qquad (24)$$

where $g$ is a gain parameter and $y_{ni}^{(5)}$ is the output of the $i$-th normalization layer neuron for the $n$-th patch.

5.4. Calculate the output signal of the hidden and output layers of a two-layer perceptron, using $P$ neural nets for the $P$ patches:

$$y_{nj}^{(6)} = \mathrm{ReLU}\Bigg(b_{nj}^{(6)} + \sum_{i=1}^{N^{(1)}} w_{ni,j}^{(6)} y_{ni}^{(5)}\Bigg), \quad j \in \overline{1, N^{(6)}}, \quad n \in \overline{1, P}, \qquad (25)$$

$$y_{nj}^{(7)} = b_{nj}^{(7)} + \sum_{i=1}^{N^{(6)}} w_{ni,j}^{(7)} y_{ni}^{(6)}, \quad j \in \overline{1, N^{(1)}}, \quad n \in \overline{1, P}, \qquad (26)$$

where $w_{nz,j}^{(k)}$ is the connection weight from the $z$-th neuron of the $(k-1)$-th layer to the $j$-th neuron of the $k$-th layer, and $y_{nj}^{(k)}$ is the output of the $j$-th neuron of the $k$-th fully connected layer for the $n$-th patch.

6. Addition and normalization (over a layer):

$$y_{ni}^{(8)} = \mathrm{Norm}\left(y_{ni}^{(4)} + y_{ni}^{(7)}\right) = \frac{g}{\sigma}\left(y_{ni}^{(4)} + y_{ni}^{(7)} - \mu\right), \quad i \in \overline{1, N^{(1)}}, \quad n \in \overline{1, P}, \qquad (27)$$

$$\mu = \frac{1}{N^{(1)}} \sum_{i=1}^{N^{(1)}} \left(y_{ni}^{(4)} + y_{ni}^{(7)}\right), \quad \sigma = \sqrt{\frac{1}{N^{(1)}} \sum_{i=1}^{N^{(1)}} \left(y_{ni}^{(4)} + y_{ni}^{(7)} - \mu\right)^2}, \qquad (28)$$

where $g$ is a gain parameter and $y_{ni}^{(8)}$ is the output of the $i$-th normalization layer neuron for the $n$-th patch.

7. Flattening:

$$\mathbf{y}^{(9)} = \left(y_{11}^{(8)}, \ldots, y_{N^{(1)}P}^{(8)}\right). \qquad (29)$$

8. Calculate the output signal of the hidden and output layers of a two-layer perceptron:

$$y_j^{(10)} = \mathrm{ReLU}\Bigg(b_j^{(10)} + \sum_{z=1}^{N^{(1)} P} w_{z,j}^{(10)} y_z^{(9)}\Bigg), \quad j \in \overline{1, N^{(10)}}, \qquad (30)$$

$$y_j = \mathrm{softmax}\Bigg(b_j^{(11)} + \sum_{z=1}^{N^{(10)}} w_{z,j}^{(11)} y_z^{(10)}\Bigg), \quad j \in \overline{1, N^{(11)}}, \qquad (31)$$

where $w_{z,j}^{(k)}$ is the connection weight from the $z$-th neuron of the $(k-1)$-th layer to the $j$-th neuron of the $k$-th layer, and $y_j^{(k)}$ is the output of the $j$-th neuron of the $k$-th fully connected layer. A compact numpy sketch of the scaled dot-product attention of steps 5.2.1–5.2.7 is given below.
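Below is a compact numpy sketch of the multi-head attention of Eqs. (15)–(22). The batch dimension is omitted, and the randomly shaped weight matrices are purely illustrative; shapes follow the notation above ($P$ patches of dimension $N^{(1)}$, key/value dimension $N^{(K)}$, and $N^{(H)}$ heads).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(Y3, Wq, Wk, Wv, Wo):
    """Eqs. (15)-(22). Y3: (P, N1) normalized patch vectors.

    Wq, Wk, Wv: (H, N1, NK) per-head projection weights;
    Wo: (H * NK, N1) output projection of Eq. (22).
    """
    heads = []
    for l in range(Wq.shape[0]):
        Q = Y3 @ Wq[l]                        # queries, Eq. (15): (P, NK)
        K = Y3 @ Wk[l]                        # keys,    Eq. (16): (P, NK)
        V = Y3 @ Wv[l]                        # values,  Eq. (17): (P, NK)
        E = (Q @ K.T) / np.sqrt(K.shape[1])   # scores,  Eq. (18): (P, P)
        A = softmax(E, axis=-1)               # weights, Eq. (19)
        heads.append(A @ V)                   # head,    Eq. (20): (P, NK)
    Hmat = np.concatenate(heads, axis=1)      # concatenation, Eq. (21): (P, H*NK)
    return Hmat @ Wo                          # multi-head output, Eq. (22): (P, N1)
```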

4 Choice of Quality Criteria for the Egg Development State Identification Method

In this work, the following criteria are chosen for assessing the training of the proposed deep neural network models:

• the accuracy criterion

$$\mathrm{Accuracy} = \frac{1}{I} \sum_{i=1}^{I} \left[ \mathbf{d}_i = \mathbf{y}_i \right] \to \max_W, \quad y_{ij} = \begin{cases} 1, & j = \arg\max\limits_{z} y_{iz} \\ 0, & j \ne \arg\max\limits_{z} y_{iz} \end{cases}; \qquad (32)$$

• the categorical cross-entropy criterion

$$\mathrm{CCE} = -\frac{1}{I} \sum_{i=1}^{I} \sum_{j=1}^{K} d_{ij} \ln y_{ij} \to \min_W, \qquad (33)$$

where $\mathbf{y}_i$ is the $i$-th model output vector, $y_{ij} \in [0, 1]$; $\mathbf{d}_i$ is the $i$-th target vector, $d_{ij} \in \{0, 1\}$; $I$ is the power (cardinality) of the training set; $K$ is the quantity of classes (neurons in the output layer); and $W$ is the vector of weights. A short numpy sketch of both criteria follows.
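A minimal numpy sketch of Eqs. (32) and (33), assuming `y_pred` holds softmax outputs and `d` one-hot targets (hypothetical variable names):

```python
import numpy as np

def accuracy(d, y_pred):
    # Eq. (32): fraction of samples whose arg-max class matches the target.
    return np.mean(np.argmax(y_pred, axis=1) == np.argmax(d, axis=1))

def categorical_cross_entropy(d, y_pred, eps=1e-12):
    # Eq. (33): mean over the training set of -sum_j d_ij * ln(y_ij).
    return -np.mean(np.sum(d * np.log(y_pred + eps), axis=1))
```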


Fig. 1 Block diagram of identification method of egg development state

5 Determination of the Structure of the Egg Development State Identification Method

Figure 1 presents the block diagram of the egg development state identification method for the proposed deep neural networks, 2D generalized LeNet and one-block ViT.

6 Numerical Research

The numerical research was conducted on the chicken benchmark [22], containing RGB images of size 1080 × 800. From the 380 images, 80% were randomly selected for the training sample and 20% for the test sample. Because the proposed deep neural networks contain no recurrent connections, training was performed on a GPU. The tensorflow package was used to implement the proposed deep neural networks, with Google Colaboratory chosen as the programming environment.

The 2D generalized LeNet model structure is presented in Table 1 (an illustrative sketch of it follows the table). The one-block ViT model structure is presented in Table 2.

Figure 2a presents the dependence of the loss (based on categorical cross-entropy) on the image size for the one-block ViT model; Fig. 2b the dependence of the loss on the patch size; Fig. 2c the dependence of the loss on the number of iterations; and Fig. 2d the dependence of the accuracy on the number of iterations.

The following recommendations can be made based on the results of the numerical research: the best image size after compression for one-block ViT in terms of loss (based on categorical cross-entropy) is 112 × 80 (according to Fig. 2a); the best patch size for one-block ViT in terms of loss is 8 × 8 (according to Fig. 2b); and the minimum sufficient quantity of iterations for one-block ViT in terms of loss (according to Fig. 2c) and accuracy (according to Fig. 2d) is 11.

Table 1 2D generalized LeNet model structure

Layer type                        Input size
Input                             1080 × 800 × 3
Resizing                          112 × 80 × 3
Conv2D                            112 × 80 × 4
MaxPooling2D                      56 × 40 × 4
Conv2D                            56 × 40 × 16
MaxPooling2D                      28 × 20 × 16
Conv2D                            28 × 20 × 64
MaxPooling2D                      14 × 10 × 64
Flatten                           8960
Output (fully connected/dense)    19
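The following is a minimal tf.keras sketch reproducing the layer structure of Table 1. The paper states that tensorflow was used, but this exact code is our illustrative reconstruction; in particular, the kernel sizes are not given in the paper, so the 3 × 3 kernels below are an assumption.

```python
import tensorflow as tf
from tensorflow.keras import layers

# 2D generalized LeNet per Table 1: plane counts 4, 16, 64 follow K_cl = 2^(2l).
inputs = tf.keras.Input(shape=(1080, 800, 3))
x = layers.Resizing(112, 80)(inputs)                 # empirical pre-compression
x = layers.Conv2D(4, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(16, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D(2)(x)
x = layers.Flatten()(x)                              # 14 * 10 * 64 = 8960
outputs = layers.Dense(19, activation="softmax")(x)  # one neuron per class

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```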

Table 2 One-block ViT model structure

Layer type                            Input size
Input                                 1080 × 800 × 3
Resizing                              112 × 80 × 3
Extract patches                       14 × 10 × 192
Reshape                               140 × 192
Fully connected (dense)               140 × 8
Add                                   140 × 8
Normalization                         140 × 8
Multi-head attention                  140 × 8
Add                                   140 × 8
Normalization                         140 × 8
MLP hidden (fully connected/dense)    140 × 8
MLP output (fully connected/dense)    140 × 8
Add                                   140 × 8
Normalization                         140 × 8
Flatten                               1120
MLP hidden (fully connected/dense)    1120
MLP output (fully connected/dense)    19

Thus, in terms of the minimum quantity of iterations, the one-block ViT model trains faster than the 2D generalized LeNet model. On the other hand, the one-block ViT model requires a larger number of settings to configure (the number of neurons in its layers) than the 2D generalized LeNet model.


Fig. 2 Dependencies for the one-block ViT model: a loss (based on categorical cross-entropy) versus image size; b loss versus patch size; c loss versus number of iterations; d accuracy versus number of iterations

7 Conclusions

To solve the problem of automating the classification and interpretation of the visualization results of egg ovoscoping during incubation, the corresponding image recognition methods were investigated. This research showed that the most effective approach today is the use of deep neural networks.

The created 2D generalized LeNet model, unlike traditional 2D LeNet, has the following advantages: the input image need not be square, which expands the scope of application; the input image is compressed beforehand, with the new size determined empirically from the initial image size, which increases the training speed and identification accuracy of the model; the number of "convolution layer—downsampling layer" pairs is determined empirically from the image size, which increases the identification accuracy of the model; fully connected hidden layers are absent, which increases the training speed of the model; and the quantity of planes is defined as the quotient of the number of cells of the input layer divided by two to a power (the power equals twice the number of "convolution layer—downsampling layer" pairs), preserving the total number of cells in a layer after downsampling halves the plane size in height and width, which automates determination of the model's layer structure.

The created one-block ViT model, unlike traditional ViT, has the following advantages: the input image need not be square, which expands the scope of application; the image is compressed beforehand, with the new size determined empirically from the initial image size, which increases the training speed and identification accuracy of the model; the patch size is determined empirically from the image size, which increases the identification accuracy of the model; and there is only one block, which increases the training speed of the model.

The proposed method of intelligent identification of the egg development state based on deep neural networks provides high classification accuracy, improves quality control of the incubation process, and limits the costs of industrial poultry production. It can be used in various intelligent systems for identifying the visualization results of egg ovoscoping during incubation in industrial poultry production.

References

1. Yeo C, Park H, Lee K, Song C (2016) Avian embryo monitoring during incubation using multi-channel diffuse speckle contrast analysis. Biomed Opt Express 7(1):93–98
2. Hashemzadeh M, Farajzadeh N (2016) A machine vision system for detecting fertile eggs in the incubation industry. Intl J Comput Intell Syst 9(5):850–862. https://doi.org/10.1080/18756891.2016.1237185
3. Tsai S-Y, Li C-H, Jeng C-C, Cheng C-W (2020) Quality assessment during incubation using image processing. Sensors 20:5951. https://doi.org/10.3390/s20205951
4. Yu H, Wang G, Zhao Z, Wang H, Wang Z (2019) Chicken embryo fertility detection based on PPG and convolutional neural network. Infrared Phys Technol 103:103075. https://doi.org/10.1016/j.infrared.2019.103075
5. Wan L, Chen Y, Li H, Li C (2020) Rolling-element bearing fault diagnosis using improved LeNet-5 network. Sensors (Basel, Switzerland) 20(6):1693. https://doi.org/10.3390/s20061693
6. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25:1097–1105
7. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 770–778. https://doi.org/10.1109/CVPR.2016.90
8. Huang G, Liu Z, van der Maaten L, Weinberger KQ (2018) Densely connected convolutional networks, pp 1–9. arXiv preprint arXiv:1608.06993
9. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2014) Going deeper with convolutions, pp 1–12. arXiv preprint arXiv:1409.4842
10. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2015) Rethinking the inception architecture for computer vision, pp 1–10. arXiv preprint arXiv:1512.00567
11. Szegedy C, Ioffe S, Vanhoucke V, Alemi A (2016) Inception-v4, Inception-ResNet and the impact of residual connections on learning, pp 1–12. arXiv preprint arXiv:1602.07261
12. Chollet F (2017) Xception: deep learning with depthwise separable convolutions, pp 1–8. arXiv preprint arXiv:1610.02357
13. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) MobileNets: efficient convolutional neural networks for mobile vision applications, pp 1–9. arXiv preprint arXiv:1704.04861
14. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C (2018) MobileNetV2: inverted residuals and linear bottlenecks. In: 2018 IEEE conference on computer vision and pattern recognition (CVPR), pp 4510–4520. https://doi.org/10.1109/CVPR.2018.00474
15. Geng L, Hu Y, Xiao Z, Xi J (2019) Fertility detection of hatching eggs based on a convolutional neural network. Appl Sci 9(7):1408. https://doi.org/10.3390/app9071408
16. Dosovitskiy A, Beyer L, Kolesnikov A et al (2021) An image is worth 16 × 16 words: transformers for image recognition at scale. In: 9th International conference on learning representations, pp 1–22
17. Touvron H, Cord M, Douze M, Massa F, Sablayrolles A, Jégou H (2020) Training data-efficient image transformers and distillation through attention, pp 1–22. arXiv preprint arXiv:2012.12877v2
18. Zhou D, Kang B, Jin X, Yang L, Lian X, Jiang Z, Hou Q, Feng J (2021) DeepViT: towards deeper vision transformer, pp 1–12. arXiv preprint arXiv:2103.11886
19. Touvron H, Cord M, Sablayrolles A, Synnaeve G, Jégou H (2021) Going deeper with image transformers, pp 1–30. arXiv preprint arXiv:2103.17239
20. Chen C-F, Fan Q, Panda R (2021) CrossViT: cross-attention multi-scale vision transformer for image classification, pp 1–12. arXiv preprint arXiv:2103.14899
21. Shekhawat SS, Shringi S, Sharma H (2021) Twitter sentiment analysis using hybrid spider monkey optimization method. Evol Intel 3:1–10
22. Fedorov E Chicken eggs image models. https://github.com/fedorovee75/ArticleChicken/raw/main/chicken.zip

Predicting Order Processing Times in E-Pharmacy Supply Chains During COVID Pandemic Using Machine Learning—A Real-World Study Mahesh Babu Mariappan, Kanniga Devi, and Yegnanarayanan Venkataraman Abstract The purpose of this paper is to solve the problem of processing time prediction for orders for medical supplies placed through a large real-world e-Pharmacy—in a post-COVID-lockdown world—using artificial intelligence (AI) and machine learning (ML) techniques. We use an ensemble of ML regressors to predict the processing times of orders for medical supplies and an ensemble of ML classifiers to predict the shipment times of deliverables. We use intelligent model stacking methods to obtain performance improvements for our models. On an exact-match performance measurement scheme, our solution produces a 548.49% improvement, and on a 3-day-range performance measurement scheme, a 25% improvement, over the existing statistical solution implemented at the said e-Pharmacy. This is an important problem because when an e-Pharmacy can predict in advance the time elapsed between medical order placement and the time the order gets shipped out, the said e-Pharmacy can implement measures and controls to optimize the speed of fulfillment. We are among the first to study a real-world e-Pharmacy supply chain from the perspective of order processing time prediction under post-COVID-19-lockdown conditions and to come up with a novel ML ensemble stacking approach to make the predictions. The value this work provides is that we have shown that the adoption of AI and ML techniques in e-Pharmacy supply chains results in infusing certainty into the supply of therapeutics in these uncertain COVID lockdown times. Keywords e-Pharmacy supply chain · Order processing time prediction · Post-COVID-lockdown · Analytics · Artificial intelligence · Machine learning

M. B. Mariappan (B) · K. Devi Department of Computer Science, Kalasalingam Academy of Research and Education, Krishnankoil 626126, India e-mail: [email protected] Y. Venkataraman Department of Mathematics, School of Applied Sciences, Kalasalingam Academy of Research and Education, Krishnankoil 626126, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_13


1 Introduction

With the third wave of COVID-19 hitting the world, governments are scrambling to implement measures to control the pandemic. Medical supplies are clearly exhausted, and out-of-stock rates of important COVID-19 medicines such as Remdesivir have become a huge problem. In these chaotic times, it is important to bring order to the prevailing situation by applying AI and ML to problems pertaining to the demand fulfillment of therapeutic supplies. We came up with and implemented a novel approach to predict in advance the processing times of orders for medical supplies, namely therapeutics, diagnostics, and vaccines. We further used the results obtained downstream to predict the shipment times of medical supplies.

According to [1], due to the COVID-19 lockdown, people prefer to buy medicines online. This is where the importance of e-Pharmacies comes to the fore. e-Pharmacies are playing a crucial role in delivering therapeutics, diagnostics, and vaccines to populations efficiently. Countries around the globe have witnessed an increase in online spending during the COVID-19 lockdown. The growth rate of Internet or online sales far outweighs the corresponding growth rate of offline retail sales, and Internet-based fulfillment is a crucial growth enabler for e-commerce organizations [2, 3]. Growing at a CAGR of around 14.26%, the global e-Pharmacy market, which was approximately USD 42.32 billion in 2019, is expected to generate around USD 107.53 billion by 2025. Meanwhile, growing at an exponential CAGR of 63%, the e-Pharmacy market in India is estimated to reach $3.6 billion by 2022 from a $512 million market in 2018. This is a big industry, and therefore, many opportunities exist for researchers to solve problems faced by e-Pharmacies [4]. Research showed that performance expectancy, effort expectancy, social influence, and hedonic motivation have a positive correlation with the adoption of e-Pharmacies, with the COVID-19 pandemic further encouraging this adoption [5, 6]. Therefore, research on e-Pharmacies in these COVID times is important and warranted.

There have been studies on pharmaceutical supply chains such as the following. Ângelo et al. [7] studied the supply chains of retail pharmacies and proposed integrated digital services such as pharmacy supply management, product traceability, quality management, order management, a digital assistant, and product experience to empower users of pharmaceutical compounds. Hines et al. [8] studied pharmacy information systems and concluded that pharmacists had limited awareness regarding their systems' many decision support functionalities. Lapão et al.'s [9] study on the implementation of online services for pharmacies found that pharmacists spend only 50% of their time engaging with patients offline, uncovering an opportunity for online interactions through e-Pharmacies.

The following sections present the literature review, the problem statement, our solution methodology (including the data preparation and automation methods used), the feature engineering performed on the data, the mathematical background of the ML models used in our solution, model construction and training, the evaluation metrics, the results, a discussion, and finally the conclusion.


2 Literature Review

This section presents a literature review concerning the application of AI and ML techniques to lead-time prediction problems. Polato et al. [10], in their paper, used a combination of the current state—in terms of the data collected so far—and the remaining time estimation given by a regression model built upon the data to perform remaining time prediction of business processes. Singh and Soni [11] presented various ML algorithms to predict order lead time for just-in-time production systems. Alenezi et al. [12] used support vector regression for real-time prediction of order flow times to assign due dates to new orders dynamically in a make-to-order production system. They showed that their approach is better than classic time series models such as exponential smoothing, moving average, and feedforward artificial neural networks. Lingitz et al. [13] studied the use of both linear models (linear regression, lasso regression, ridge regression) and nonlinear models (support vector machines, k-nearest neighbors, neural networks, and tree-based models) and concluded that a random forest regressor works best for their problem, which was lead-time prediction in semiconductor manufacturing. Öztürk et al. [14] used regression trees for manufacturing lead-time estimation. Meidan et al. [15] presented a data-driven ML approach to lead-time prediction in the semiconductor industry using conditional mutual information maximization and applying a selective Naive Bayesian classifier. They showed that the results they obtained were comparable to using decision trees, neural networks, and multinomial logistic regression. Raaymakers et al. [16] compared regression models and neural network models for make-span estimation in batch process industries and concluded that the neural networks performed better. Wang and Jiang [17] used deep belief networks for order completion time prediction using real-time job shop RFID data. Dosdoğru et al. [18] proposed an AI-based hybrid model for lead-time prediction in supply chains. Mohamed [19] studied the effect of lead time on supply chain resilience performance during times of disruption. The authors studied the lead-time effect across different stratifications in a supply chain comprising a factory, a distributor, and a retailer. The authors found that significant disruption impacts—such as those caused by the COVID-19 pandemic—deteriorate alongside lead time. This is the reason why it is important to pay attention to the COVID-19 pandemic when predicting lead times. The authors of [20] collected over 3 million shipments during the COVID-19 pandemic from a pharmaceutical supply chain. Various ML multiclass classification models, namely CatBoost (CB), XGBoost (XGB), extra trees (XRT), decision tree (DT), random forest (RF), multilayer perceptron (MLP), linear stochastic gradient descent (SGD), and linear Naïve Bayes (NB), were built, and striped datasets of (source, destination, shipper) triplets were fed into the models. Stacked meta-models were built on top of the base models to solve the problem. Tenfold cross-validation (CV) was used for performance evaluation. Furthermore, it is an established fact that order lead-time prediction is an important problem for e-commerce companies, and e-Pharmacies are a subset of e-commerce companies. Improved customer satisfaction is a beneficial side effect of accurately


Not predicting lead times correctly can lead to delivery dates not being met, which often results in a loss of confidence on the customer's part and is detrimental to an e-commerce organization. Organizations miss due dates when there are disruptions; lead times get extended by supply chain disruptions [23]. In addition, firms earn higher profits when they can offer shorter order lead times to customers [24]. It has been shown that firms that offer time-based flexibility to certain customers, without charging a premium for that flexibility, have the most satisfied customers; furthermore, firms experience improved profitability from shorter order lead times even when they need to pay a premium price to sub-contractors. Some authors have argued that today's e-commerce organizations provide customers with aggressive, guaranteed delivery dates. Improved operating efficiency in e-commerce order handling is key to aligning logistics service providers with online retailers [25]; the authors of [25] highlight problems such as the lack of efficient mechanisms for order preprocessing and frequent order arrivals in an unplanned, ad hoc manner, causing delays in the delivery of orders to end customers. Certain industries face long lead times [26–28], which can reduce a company's revenues and market share, inflate costs, send the company over budget, and threaten production and distribution [29]. This situation worsens during pandemics such as COVID-19.

Our literature review uncovered studies applying AI and ML techniques to industrial lead-time prediction problems. However, to the best of our knowledge, we are the first to apply AI and ML techniques to organic real-world COVID-19 data for customer order lead-time prediction of medical supplies, such as therapeutics, diagnostics, and vaccines, in e-Pharmacy supply chains. This is the research gap we address in this paper.

3 Problem Statement

The problem we solve in this paper is to predict in advance the processing times of orders for medical supplies placed through e-Pharmacies in a post-COVID-lockdown world. This is an important problem because when an e-Pharmacy can predict in advance the time elapsed between order placement and the time the order gets shipped out, it can implement measures and controls to optimize the speed of fulfillment. As an extension, we also solve the problem of predicting the shipment times of therapeutics, diagnostics, and vaccines.

4 Materials and Methods

We applied AI and built ML models to make advance predictions of the processing times and shipment times of therapeutics, diagnostics, and vaccines. We experimented with real-world post-COVID-19-lockdown data obtained from one of the largest e-Pharmacies in India [30], catering to patients distributed across different locations/pincodes/zip codes all over the country.


Patients place their orders for therapeutics, diagnostics, and vaccines with the e-Pharmacy. The e-Pharmacy then allocates the orders to the appropriate fulfillment centers (FCs). Finally, the FCs deliver the therapeutic supply shipments to the patients through designated logistics service providers. We would like to reiterate that our solution approach is generic and that we use an Indian e-Pharmacy only as a case study for validation. The solution methodology, consisting of ensembles of regressors to predict the order processing times and ensembles of classifiers to predict the shipment times of medical supplies, is shown in Fig. 1.

Fig. 1 Solution methodology


Our methodology is novel from an ML perspective because we are among the first to use a combination of ensembles of regressors, ensembles of classifiers, a stacked model zoo, and intelligent data striping mechanisms to solve this problem during COVID. Processing time prediction involves training ensembles of regressors on four different datasets, namely the Mixed Pharma Existing Patients (VME), Mixed Pharma New Patients (VMN), Over-the-Counter Existing Patients (VOE), and Over-the-Counter New Patients (VON) datasets. Shipment time prediction involves training ensembles of classifiers on ten different datasets, from minimum 100 triplets to minimum 1000 triplets. We create base models, derive predictions from them, and use these outputs to train a second layer of stacked meta-models. We then assemble all these ML models into a model zoo and deploy them as part of our solution. We conducted our experiments and built our solution using the latest versions of Python, pandas, NumPy, scikit-learn, and MySQL, on cloud instances with 2-core/4-thread Xeon processors at 2.3 GHz with a 46 MB cache, ~25.3 GB of available RAM, and ~155 GB of available disk space.

Order Processing Flow

Patients log into the e-Pharmacy system on the Web or through the mobile app and select the desired product category: medicine, wellness, or consultation. For a medicine, the patient searches the e-Pharmacy system directly and selects the desired product. The system automatically checks whether a prescription is required; if so, the patient is asked to upload one. The system then checks the validity of the uploaded prescription. If the prescription is valid, the e-Pharmacy sends a confirmation SMS/email to the patient; if it is invalid, the order is canceled. The patient also has the option to consult an external doctor to obtain a prescription, or to upload a prescription directly, after which the same validity check applies. Patients who add medicines to the cart directly are redirected to the login page, and the process flows as before.

When a patient uploads a prescription, the e-Pharmacy first digitizes it. The prescription then goes through verification against the digitized data and validation against the medicine order placed by the patient. If the prescription is deemed junk, the order is canceled by the e-Pharmacy with an "Rx Invalid" note; if it is deemed a duplicate of a previously uploaded prescription, the order is canceled with an "Rx Duplicate" note. In such cases, a consultation is scheduled with an external doctor, who prescribes the patient appropriate medicines after diagnosis. If the prescription is valid, the e-Pharmacy proceeds to dispatch the medicines to the patient; otherwise, it schedules a free online doctor consultation, where a doctor consults with the patient and provides a valid prescription.


When the e-Pharmacy receives an order, if there is a pharma product in the order, the order goes through the pharma check process and then hits the control center; otherwise, it hits the control center directly. Depending on the availability of the medicines and medical supplies at different locations/fulfillment centers, the control center decides whether to automatically split the order across multiple fulfillment centers or route it through a single fulfillment center.

Data

We extracted our real-world data from the live data warehouse of one of the largest e-Pharmacies in India [30]. For post-COVID-lockdown data, we collected records for the period March 24, 2020 to December 10, 2020; we chose this period because the Indian Government announced a nationwide COVID-19 lockdown on March 24, 2020, during the first wave of COVID. For the order processing time prediction problem, we divided the data into four primary datasets, namely the Mixed Pharma Existing Patients (VME), Mixed Pharma New Patients (VMN), Over-the-Counter Existing Patients (VOE), and Over-the-Counter New Patients (VON) datasets. For the shipment time prediction portion of the problem, we divided the shipment data into ten categories based on triplets of {pickup location, drop location, shipper}, from minimum 100 to minimum 1000 triplets. We define a triplet as the combination of Shipment Pickup Pincode, Shipment Drop Pincode, and cp_id, where cp_id is the therapeutic supply logistics service provider ID.

Various processes make up the pharma fulfillment procedure: order placement by the patient, prescription upload, digitization of the prescription, verification of the prescription, validation of the prescription, the push process, item picking, order packing, tracking number generation (RT), internal shipping, handover to the external shipment partner, and finally shipment to the patient. Each of these processes takes execution time, and the processing time is defined as the total time elapsed between the order time and the time the medical supply package gets picked up by the shipment partner. Shipment time is defined as the time taken by the shipment partner to deliver the shipment to the patient.

Data Preparation and Automation

Overall, we extracted 3,033,539 real-world records for medical supply orders placed by patients through the e-Pharmacy. We performed data cleansing as follows. We implemented outlier removal conditions: we removed records with total processing time less than 0 (cases where the process timestamps were incorrectly captured in the database) and records with total processing time greater than 3 times the average processing time. We also removed negative time durations mistakenly captured in the databases, records with NULL timestamps, and records with NULL distances. For data extraction, we applied the following conditions. For pharma orders (Looped), we applied the condition Rx_order = Y, Looped = Y, and Order_type = Regular.


For Over-the-Counter (OTC) orders, we applied the condition Rx_order = N and Order_type = Regular. We created a local SQL Server database to access the data required for processing time prediction (Pharma) using Python extract-transform-load (ETL) scripts. We integrated the following data sources into a single source: the SQL Server data source, the Boomerang Shipments data source, and the business intelligence data source.
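A minimal sketch of the outlier-removal conditions described above, assuming a pandas DataFrame; the column names processing_time_hours and distance_km are hypothetical placeholders for the real schema:

```python
import pandas as pd

def cleanse_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the outlier-removal conditions described in the text.

    Column names are assumed, not the e-Pharmacy's actual schema.
    """
    # Drop records where timestamps or distances are missing (NULL values)
    df = df.dropna(subset=["processing_time_hours", "distance_km"])
    # Remove negative durations caused by incorrectly captured timestamps
    df = df[df["processing_time_hours"] >= 0]
    # Remove records exceeding 3x the average processing time
    cutoff = 3 * df["processing_time_hours"].mean()
    df = df[df["processing_time_hours"] <= cutoff]
    return df
```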

5 Feature Engineering

Table 1 lists the eleven process timestamps (T1–T11) important in this study: Order_Time, Digitization_Start_Time, Digitization_End_Time, Verification_End_Time, Validation_End_Time, Push_Time, Pick_Time, Pack_Time, RT_Time, Ship_out_Time, and First_Scan_Time. Table 2 gives the time elapsed between each step of the process as durations D1 through D10. As part of feature engineering, we extracted: D1 through D10 for the last 20 picked-up orders processed by the FC; D1 through D10 for the last 20 picked-up orders processed from the drop_pincode; D1 through D10 for the last 20 picked-up orders processed through shipment partner cp_id; D1 through D10 for the last 5 picked-up orders of the current patient; historic averages of D1 through D10 for existing patients; the historic average processing_time_in_hours_by_patients; Is order_date_plus_1_holiday; historic_avg_processing_time_in_hours_by_fc; historic_avg_processing_time_in_hours_by_cp_id; and historic_avg_processing_time_in_hours_by_drop_pincode for the last 20 picked-up orders.

Table 1 Timestamps (T1–T11)

Timestamp   Timestamp name
T1          Order_Time
T2          Digitization_Start_Time
T3          Digitization_End_Time
T4          Verification_End_Time
T5          Validation_End_Time
T6          Push_Time
T7          Pick_Time
T8          Pack_Time
T9          RT_Time
T10         Ship_out_Time
T11         First_Scan_Time

Table 2 Time elapsed durations (D1–D10)

Duration          Definition
D1                Order_time to digitization start time
D2                Digitization start time to digitization end time
D3                Digitization end time to verification end time
D4                Verification end time to validation end time
D5                Validation end time to push time
D6                Push time to pick time
D7                Pick time to pack time
D8                Pack time to RT time
D9                RT time to ship date
D10               Ship date to pickedup time
Processing time   Order time to pickedup time

For existing patients, the raw values of T1…T11 for the patient's last picked-up order were used. For the dynamic features relating to the fulfillment center, we extracted the last 20 orders by fulfillment center code (Fc_code) and suborder ID, ordered by picked-up time ascending; for fallback cases, we took the data from the last 20 picked-up orders of the overall e-Pharmacy operation. For the dynamic features relating to the drop_pincode, we extracted the last 20 orders by drop_pincode and suborder ID, with the same ordering and the same fallback. For the dynamic features relating to the shipment partner (cp_id), we extracted the past 20 orders by cp_id and suborder ID, with the same ordering and the same fallback. For the dynamic features relating to the patient_id, we extracted the last 5 orders by patient_id and suborder ID, ordered by picked-up time ascending; for fallback cases, we took the data from the last 5 picked-up orders of the overall e-Pharmacy operation.

For the shipment time prediction problem, a variety of features were used: date-time and cyclic features, numerical features, categorical features, and several derived features. We performed cyclic encoding on the date-time and cyclic features, which include pickedup_week_day, pickedup_day, pickedup_time, order_week_day, order_day, and order_time. We included various numerical features such as drop_lat, drop_lon, distance, SLA, pickup_pincode, pickup_lat, pickup_lon, sundays_in_between, holidays_in_between (between the picked-up time and the picked-up time + SLA), and drop_pincode, among others. drop_pincode is the zip code associated with the patient address.


pickup_pincode is the zip code associated with the fulfillment center address. pickup_lat and pickup_lon denote the latitude and longitude of the fulfillment center from which the therapeutic shipment is picked up. SLA is the number of service-level agreement days the shipment partner has to deliver the therapeutic shipment; sundays_in_between is the number of Sundays that fall between the time the shipment gets picked up and the picked-up time plus SLA. Distance denotes the physical Euclidean distance between the pickup location (fulfillment center) and the drop_pincode. drop_lat and drop_lon denote the latitude and longitude of the patient address to which the therapeutic shipment needs to be delivered.

We included various categorical features such as is_pickup_metro, is_drop_metro, fc_code, fc_city, cp_id, and Payment_type, among others. is_pickup_metro and is_drop_metro are flags indicating whether the shipment pickup and drop locations, respectively, are metropolitan locations; fc_code is the fulfillment center code; fc_city is the city in which the fulfillment center is located; cp_id is the therapeutic supply shipment partner identification number; and Payment_type is the mode of payment used by the patient. We then performed one-hot encoding of these categorical features.

We included various derived features. We calculated the distances from the patient drop location to each of the metropolitan cities, to feed our system a proxy for the well-connectedness of the patient's location, and likewise the distances from the fulfillment center location to each of the metropolitan cities, as a proxy for the well-connectedness of the fulfillment centers. We also included derived features such as the shipment partner performance scores based on their performance over the last 7, 30, 90, and 365 days. A shipment partner's performance scores are derived from factors such as the number of shipments delayed, delivered on time, and delivered in advance by the shipper over the duration covered by the score. We defined these performance scores as the ratio of the difference between SLA and TAT to the SLA, where TAT is the actual turn-around time of the shipments delivered by the therapeutic supply logistics providers. This metric is also known as the advanced delivery percentage for a shipment partner. Holiday information was also included from the state-wise holiday lists of the therapeutic supply logistics service providers. We normalized the data using a standard scaler, and class weighting was used to handle class imbalance.
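A sketch of the cyclic encoding and the shipment partner performance score described above; the function names and the assumption that the score is computed per shipment and then averaged are ours:

```python
import numpy as np
import pandas as pd

def add_cyclic_features(df: pd.DataFrame, col: str, period: int) -> pd.DataFrame:
    """Encode a cyclic feature (e.g., order_week_day with period 7) as a
    sin/cos pair, so adjacent values such as Saturday and Sunday end up
    close together in feature space."""
    df[f"{col}_sin"] = np.sin(2 * np.pi * df[col] / period)
    df[f"{col}_cos"] = np.cos(2 * np.pi * df[col] / period)
    return df

def performance_score(sla_days: float, tat_days: float) -> float:
    """Advanced delivery percentage: (SLA - TAT) / SLA.
    Positive when the shipper beats the SLA, negative when it misses it."""
    return (sla_days - tat_days) / sla_days

# Example: a shipper with a 4-day SLA delivering in 3 days scores
# (4 - 3) / 4 = 0.25, i.e., 25% ahead of the agreed time.
```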

6 Model Construction and Training

We modeled the order processing time prediction problem as a regression problem, predicting continuous values of processing times in hours. We built various regression models, namely the random forest regressor [31], XGBoost regressor [32], linear regressor [32], and CatBoost regressor [33].


Furthermore, we built several meta-models that combine the outputs of these base regressors in an ML-driven way using model stacking. We trained these models on the datasets we prepared: the Mixed Pharma Existing Patients (VME), Mixed Pharma New Patients (VMN), Over-the-Counter Existing Patients (VOE), and Over-the-Counter New Patients (VON) datasets. VME contains orders placed by existing patients who buy prescription medicines; VMN, orders placed by new patients who buy prescription medicines; VOE, orders placed by existing patients who buy over-the-counter products; and VON, orders placed by new patients who buy over-the-counter products.

We modeled the shipment time prediction problem as a classification problem, predicting discrete values of shipment times in days. We conducted a series of experiments and found that modeling shipment time prediction as a classification problem gave better results than modeling it as a regression problem. We built various classifier models, namely the random forest classifier [34], extra trees classifier (XRT) [35], decision tree classifier [36], multilayer perceptron classifier [37], XGBoost classifier [38], CatBoost classifier [39], linear stochastic gradient descent classifier [40], and the linear Naïve Bayes classifier [41], and used intelligent model stacking to obtain robust results.
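A minimal sketch of the two-layer stacking structure described above, using scikit-learn's StackingRegressor; the hyperparameters and the linear meta-learner are illustrative assumptions, not the paper's published configuration:

```python
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from xgboost import XGBRegressor
from catboost import CatBoostRegressor

# Layer 1: heterogeneous base regressors
base_models = [
    ("linear", LinearRegression()),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=42)),
    ("xgb", XGBRegressor(n_estimators=100, random_state=42)),
    ("cb", CatBoostRegressor(iterations=100, verbose=0, random_state=42)),
]

# Layer 2: a meta-model trained on the base models' out-of-fold predictions
stacked = StackingRegressor(
    estimators=base_models,
    final_estimator=LinearRegression(),
    cv=10,  # tenfold CV, matching the evaluation protocol in the paper
)
# Usage: stacked.fit(X_train, y_train); y_pred = stacked.predict(X_test)
```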

7 ML Regressors for the Order Processing Time Prediction Problem

The ML regressors we used to predict the order processing times are elaborated below.

Linear Regressor (LR)

LR estimates an unknown response variable w as a weighted sum of the known input variables u_i plus a bias term a: w = a + Σ_{i=1}^{m} u_i v_i, where the v_i are the learned weights and m is the number of input variables. LR minimizes the sum of squared errors between the actual and predicted values. Gradient descent is an iterative process for minimizing the error of the LR model on the training data: the coefficients are initially given random values and then iteratively optimized.

Random Forest Regressor (RFR)

Each time a split in a tree is considered, a random subset of n predictors is picked as split candidates from the whole set of q predictors; n ≈ √q is a typical choice. One builds trees T_1, …, T_A, and the RFR predictor is rf_A(u) = (1/A) Σ_{a=1}^{A} T_a(u); that is, the final output is the mean of the outputs of the individual trees.


Since the trees are all combined in parallel, the overall variance of random forest regression is low. RFR is a bagging technique, unlike XGBR and CBR, which are boosting techniques.

XGBoost Regressor (XGBR)

XGB uses f to denote the first derivative of the loss, and the Hessian appears as the second derivative, denoted g. Ignoring the terms that contribute no gain, the remaining objective is optimized by taking the derivative with respect to the output value and setting it to zero, which corresponds to the bottom-most point of a parabola. This manipulation yields a closed-form formula for the optimal output value in XGBR: with f(i) denoting the negative residuals and g(i) the number of residuals, O_u = (sum of residuals) / (number of residuals + λ), where λ is the regularization parameter (instantiated as 3 in this description), gives the output value and the x-coordinate of the bottom-most point of the parabola.

CatBoost Regressor (CBR)

CBR introduces improvements to gradient boosting and deals with categorical variables of high cardinality. For categorical variables with low cardinality, it employs one-hot encoding. CBR also encodes categorical variable values through an indicator function, which plays a central role in transforming a categorical feature's value into a numerical value. The learning speed of CBR is faster than that of XGBR when benchmarked on the Epsilon dataset of 400,000 samples and 2,000 features. The CBR implementation library ships with well-tuned default hyperparameters.

Evaluation Metrics for ML Regressors

The following evaluation metrics were computed to evaluate the performance of our stacked regressors for order processing time prediction: mean absolute error (MAE), root mean squared error (RMSE), R-squared (R2), and mean absolute percentage error (MAPE).
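The four regression metrics can be computed with scikit-learn as follows; y_true and y_pred are placeholders for the actual and predicted processing times:

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    mean_absolute_percentage_error,  # available in scikit-learn >= 0.24
    r2_score,
)

def regression_report(y_true, y_pred) -> dict:
    """MAE, RMSE, R2, and MAPE for processing-time predictions (hours)."""
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "RMSE": np.sqrt(mean_squared_error(y_true, y_pred)),
        "R2": r2_score(y_true, y_pred),
        # sklearn returns MAPE as a fraction; scale to percent
        "MAPE": 100 * mean_absolute_percentage_error(y_true, y_pred),
    }
```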

8 ML Classifiers for the Shipment Time Prediction Problem

The ML classifiers we used in this study are elaborated below: the random forest classifier [34], extra trees classifier (XRT) [35], decision tree classifier [36], multilayer perceptron classifier [37], XGBoost classifier [38], CatBoost classifier [39], linear stochastic gradient descent classifier [40], and the linear Naïve Bayes classifier [41].

Decision Tree Classifier (DT)

DT is a flexible and interpretable step-by-step method for supervised learning, applied to classification as well as regression tasks. It traverses a dataset and delineates a path, shaped like a tree, to a known outcome. It can be viewed as a sequence of decisions based on control statements, with data points falling on either side of each split vertex depending on a particular


feature value. Information gain gives us the split attribute, measured as the amount of information needed to explain the tree; it optimizes the information required to classify the resulting splits and reflects the randomness in the splits. Let X be a dataset. Then Info(X) = −Σ_{i=1}^{m} r_i log2(r_i), and Info_Y(X) = Σ_{j=1}^{v} (|X_j|/|X|) Info(X_j). Here, r_i is the probability that a tuple of X belongs to class e_i, and Info(X) is the mean information required to determine the class of a tuple of X. Since a base-2 logarithm is involved, the information is encoded in bits. Info(X) is referred to as the entropy of X.

Random Forest Classifier (RF)

Presume a training set E = {(y_1, w_1), …, (y_n, w_n)} drawn at random from a previously unknown probability distribution, (y_i, w_i) ~ (Y, W). We aim to build a classifier that predicts w from y based on the selected set of data examples E. Let K = {k_1(y), …, k_h(y)} be an ensemble of classifiers that may individually be weak. If every k_h(y) is a decision tree, we call the ensemble a random forest. The parameters of k_h(y) can be written η_h = (η_h1, …, η_hq); they encompass the layout of the tree, which variables are split at which vertices, and so on. One can also write k_h(y) = k(y|η_h), so that for each h the decision tree gives rise to a classifier k_h(y) = k(y|η_h).

Extra Trees Classifier (XRT)

XRT is a kind of ensemble learning method that aggregates the results of many de-correlated decision trees collected in a forest to produce its classification. It is akin to RF and diverges from it only in how the decision trees are built. Every decision tree in XRT is built from the original training sample. At every test vertex, every tree is randomly allotted a sample of r features from the feature set, from which it picks the most appropriate feature to partition the data according to a mathematical criterion, typically the Gini index. This random sampling of features gives rise to many de-correlated decision trees. For feature selection, the features are ranked in descending order of their Gini importance, and the user picks the top k features. The split criterion (an entropy) is computed as −Σ_{i=1}^{d} r_i log2(r_i), with d the number of unique class labels and r_i the fraction of rows having i as the output label.

Multilayer Perceptron Classifier (MLP)

A linear threshold unit takes an input w with m values and produces a single-valued output x. Set s = q^T · w = Σ q_i w_i, where "·" stands for the dot product and q^T is the transpose of q. Using the Heaviside step function as the activation function, g(s) = 0 if s < 0 and 1 if s ≥ 0, the output x is binary: x = g(s) = g(q^T · w). A perceptron consists of one layer of linear threshold units, with every unit linked to all inputs of w plus a bias vector f; that is, x = g(Q · w + f), where Q = (q_1^T, …, q_m^T). A multilayer perceptron comprises an input layer, a hidden layer of linear threshold units, and an output layer of linear threshold units. The computations are exactly as for a perceptron, with hidden output u_1 = g(s_1) = g(Q_1 · w + f_1) and final output x = g(s_2) = g(Q_2 · u_1 + f_2).
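For readability, the entropy and information-gain expressions used by the tree-based classifiers above can be restated compactly in standard notation (notation follows the text):

```latex
% Entropy of a dataset X with class probabilities r_i over m classes:
\[
\mathrm{Info}(X) = -\sum_{i=1}^{m} r_i \log_2 r_i
\]
% Expected information after splitting X on attribute Y into partitions X_1, ..., X_v:
\[
\mathrm{Info}_Y(X) = \sum_{j=1}^{v} \frac{|X_j|}{|X|}\,\mathrm{Info}(X_j)
\]
% The attribute maximizing the gain Info(X) - Info_Y(X) is chosen for the split.
```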


CatBoost Classifier (CB)

Yandex introduced an open-source library called CatBoost (categorical boosting). Besides classification and regression, it can be employed for ranking and recommendation systems, forecasting, and more. Gradient boosting takes an additive form that sequentially constructs a collection of approximations A_r in a greedy way with respect to a known loss function M(x, A_r), which has two inputs: the jth expected output value x_j and the rth function A_r that approximates x_j. Given A_{r−1}, one can improve the approximation of x_j by adding one more function: A_r = A_{r−1} + β m_s, where β is the step size and m_s is a base learner picked from a family of functions G to minimize the expected loss. CB introduces refinements to this gradient boosting procedure.

XGBoost Classifier (XGB)

XGB is an additive ensemble model built from various base learners, where one selects a function that minimizes the loss. It makes use of a Taylor series to estimate the value of the loss function for a base learner k_r(w_i): k(b + h) = k(b) + k′(b)h + (k″(b)/2)h^2 + … + (k^(n)(b)/n!)h^n. Putting b = z^(t−1), h = k_r(w_i), and k(b) = m(z_i, z_i^(t−1)), the objective becomes M = Σ_i (D + f_i k_r(w_i) + h_i k_r(w_i)^2) + η(k_r), where only terms up to second order are kept. Here, D is a constant not depending on k_r(w_i), and f_i is the first-order derivative evaluated at the previous step, so f_i and h_i can be computed before beginning the search over the various base learners. In this way, XGBoost simplifies otherwise cumbersome mathematical analysis.

Naïve Bayes Classifier (NB)

Given a feature vector W = (w_1, …, w_m) and class variable D_r, Bayes' theorem states: P(D_r|W) = P(W|D_r)P(D_r)/P(W), for r = 1, …, R. We refer to P(D_r|W) as the posterior probability, P(W|D_r) as the likelihood, P(D_r) as the prior probability of a class, and P(W) as the prior probability of a predictor. NB assumes the features are conditionally independent given the class; any dependence between features is not considered by NB. NB combines this assumption with a decision rule: it adopts the maximum a posteriori decision rule and assigns the class label z = argmax_z P(z) Π_{j=1}^{m} P(w_j|z).

Stochastic Gradient Descent Classifier (SGD)

The linear regression scenario can be used to describe gradient descent, a step-by-step way to lower the sum of squared errors. A function attains its lowest value where its slope reaches zero; exploiting this property, we can compute the weight vector. Gradient descent is a sequential procedure that begins from an arbitrary point on a function and moves down the slope until it reaches its minimum. It has some pitfalls: gradient descent is not fast on massive data. By introducing randomness, SGD overcomes this hurdle; it arbitrarily selects a data point from the full set at each iteration, reducing the computation to a large extent.


Evaluation Metrics for ML Classifiers

The confusion matrix was first computed from the extracted results, and from it we derived the remaining metrics: accuracy, precision, recall, F1-score, and Cohen's kappa.
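A sketch of how these classification metrics can be derived with scikit-learn; weighted averaging is one plausible choice for this multiclass setting, since the paper does not state which averaging was used:

```python
from sklearn.metrics import (
    accuracy_score,
    cohen_kappa_score,
    confusion_matrix,
    precision_recall_fscore_support,
)

def classification_report_dict(y_true, y_pred) -> dict:
    """Accuracy, precision, recall, F1, and Cohen's kappa, starting
    from the confusion matrix as described in the text."""
    cm = confusion_matrix(y_true, y_pred)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0
    )
    return {
        "confusion_matrix": cm,
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "cohen_kappa": cohen_kappa_score(y_true, y_pred),
    }
```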

9 Results

Order Processing Time Prediction Using ML Regressors

Processing time prediction involves training ensembles of regressors on the four datasets: VME, VMN, VOE, and VON. Figures 2, 3, 4, and 5 show the R-squared plots and the actuals versus predicted plots for the VME, VMN, VOE, and VON datasets, respectively, for the linear, XGB, CB, and RF regressors. Tables 3, 4, 5, and 6 provide the processing time prediction results of the stacked meta-regressor models for the VME, VMN, VOE, and VON datasets, respectively.

Fig. 2 R-squared plots (left) and actuals versus predicted plots (right) for VME dataset for linear (top left), XGB (top right), CB (bottom left), and RF (bottom right) regressors


Fig. 3 R-squared plots (left) and actuals versus predicted plots (right) for VMN dataset for linear (top left), XGB (top right), CB (bottom left), and RF (bottom right) regressors

Fig. 4 R-squared plots (left) and actuals versus predicted plots (right) for VOE dataset for linear (top left), XGB (top right), CB (bottom left), and RF (bottom right) regressors

A plot of a sample of actuals versus predicted values for the VOE dataset, for order processing time prediction with the stacked CatBoost regressors, is shown in Fig. 6. While the order processing times range from 2.411 to 56.016 h in this sample, the overall MAE is 3.068 h.

Shipment Time Prediction Using ML Classifiers

We ran our AI and ML models and collected results on the post-COVID-lockdown data for the ten triplet datasets, from minimum 100 triplets through minimum 1000 triplets in steps of 100.
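To make the data striping concrete, a minimal pandas sketch of carving out a "minimum N triplets" dataset; the column names are assumptions standing in for the real schema:

```python
import pandas as pd

def min_triplet_dataset(df: pd.DataFrame, min_count: int) -> pd.DataFrame:
    """Keep only shipments whose (pickup_pincode, drop_pincode, cp_id)
    triplet occurs at least `min_count` times, e.g. min_count=100 for
    the 'minimum 100 triplets' dataset."""
    triplet = ["pickup_pincode", "drop_pincode", "cp_id"]
    # Per-row count of how often the row's triplet appears in the data
    counts = df.groupby(triplet)[triplet[0]].transform("size")
    return df[counts >= min_count]
```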


Fig. 5 R-squared plots (left) and actuals versus predicted plots (right) for VON dataset for linear (top left), XGB (top right), CB (bottom left), and RF (bottom right) regressors

Table 3 Results for Mixed Pharma Existing Patients (VME) dataset

Metric   VME_Linear   VME_XGB    VME_CB     VME_RF
MAE      5.623006     7.117749   9.401894   5.134374
MAPE     32.69601     45.11713   59.87726   27.99295
RMSE     8.778585     11.3406    13.53513   10.06349
R2       0.672596     0.453605   0.221678   0.569739

Table 4 Results for Mixed Pharma New Patients (VMN) dataset

Metric   VMN_Linear   VMN_XGB    VMN_CB     VMN_RF
MAE      3.949874     3.483167   3.297488   3.263948693
MAPE     22.95833     18.46618   18.04883   17.50804313
RMSE     8.691686     8.660717   8.147848   8.216083618
R2       0.6647682    0.667153   0.705407   0.700451713

Table 5 Results for Over-the-Counter Existing Patients (VOE) dataset

Metric   VOE_Linear   VOE_XGB    VOE_CB     VOE_RF
MAE      3.417415     3.153171   3.068263   3.093008
MAPE     30.15034     26.55003   26.20526   25.96692
RMSE     5.795312     5.861542   5.70641    5.750502
R2       0.744016     0.738132   0.75181    0.74796


Table 6 Results for Over-the-Counter New Patients (VON) dataset

Metric   VON_Linear   VON_XGB    VON_CB     VON_RF
MAE      6.085341     6.171263   5.947309   6.124511
MAPE     55.45376     55.07197   53.39164   56.19272
RMSE     9.625422     9.951514   9.66219    9.797373
R2       0.404996     0.363998   0.400441   0.383547

Fig. 6 Actuals versus predicted plot for VOE dataset for a sample of hundred records (x-axis: orders 1–100; y-axis: processing time in hours; series: order actuals and VOE_CB predictions)

As shown in Table 7, our shipment time prediction solution attained a predictive performance of 93.5%. In the exact match scheme, the promised delivery date must exactly match the actual delivery date (e.g., the patient is shown a delivery date of Jan 15, 2021, and the delivery happens exactly on Jan 15, 2021). In the three-day range window scheme, by contrast, performance is calculated based on the actual delivery date falling within a three-day range delivery estimate provided by the system to the patient (e.g., the patient is shown a delivery estimate of "between Jan 15, 2021, and Jan 17, 2021," and the actual delivery happens on any one of those three days).

Table 7 Tenfold CV results for level-1 stacked meta-models on post-COVID-lockdown real-world data

Classifier   Accuracy   F1 score   Precision   Recall     Cohen kappa
XRT          0.935995   0.935101   0.934544    0.935995   0.8795
RF           0.932500   0.931142   0.930476    0.932500   0.8726
DT           0.910830   0.911371   0.912124    0.910830   0.8343
XGB          0.905150   0.901404   0.902339    0.905150   0.8202
MLP          0.897592   0.897848   0.898438    0.897592   0.8095
CB           0.866660   0.856652   0.866074    0.866660   0.7437
Linear SGD   0.773122   0.783732   0.801617    0.773122   0.5865
Linear NB    0.526366   0.578744   0.743269    0.526366   0.2596

(All values are tenfold cross-validation scores.)


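To make the two measurement schemes concrete, the sketch below computes exact-match and window hit rates from promised and actual delivery dates; the column names, and the assumption that the three-day window starts on the first promised date, are ours:

```python
import pandas as pd

def delivery_performance(df: pd.DataFrame) -> dict:
    """Exact-match and 3-day-window hit rates for promised vs. actual
    delivery dates. Column names are assumed placeholders."""
    promised = df["promised_date"].dt.normalize()
    actual = df["actual_date"].dt.normalize()
    exact = (actual == promised).mean()
    # Window scheme: actual delivery falls on the promised date or within
    # the following two days (i.e., a three-day range estimate).
    in_window = (actual >= promised) & (actual <= promised + pd.Timedelta(days=2))
    return {"exact_match": exact, "three_day_window": in_window.mean()}
```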

10 Discussion

Theoretical Contributions and Implications

The originality of our contribution lies in the fact that we are the first to study a real-world e-Pharmacy supply chain under post-COVID-19-lockdown conditions and to develop novel ML ensemble stacking techniques for predicting the order processing times and shipment times of therapeutics, diagnostics, and vaccines. Through this work, we hope to encourage greater adoption of AI and ML techniques for order lead-time prediction in the e-Pharmacy industry in these COVID times.

We see that the stacked ensemble of linear regressors works best for the VME dataset, obtaining an R2 score of 0.67, whereas the stacked ensemble of CatBoost regressors works best for the VMN dataset, obtaining an R2 score of 0.71. For the VOE dataset, all the deployed methods (stacked ensembles of linear, XGBoost, CatBoost, and random forest regressors) work equally well, obtaining R2 scores of about 0.75. For the VON dataset, the stacked ensembles of linear regressors and of CatBoost regressors obtained R2 scores of 0.40. The implication is that order processing time prediction for the VON dataset (new patients who order non-prescription products) is the hardest of the problems studied.

Practical Contributions and Implications

Our solutions can be readily adapted and utilized by e-Pharmacy companies to enhance their SCM capabilities in these pandemic times. We have demonstrated the applicability and effectiveness of ensembles of ML regressors for predicting order processing times and ensembles of ML classifiers for predicting shipment times. On the exact match performance measurement scheme, our solution produces a 548.49% improvement, and on the 3-day range performance measurement scheme, a 25% improvement over the existing solution implemented at the said e-Pharmacy. The practical implication is that when an e-Pharmacy can predict order lead times in advance, it can implement measures and controls to optimize the speed of fulfillment.

Limitations

To establish the generalization capability of our proposed AI- and ML-based approach to order lead-time prediction, a broader study encompassing medical order


data collected from multiple countries and multiple companies is required. While we studied over three million (3 M) medical orders as part of this study, our study is not without limitations: the real-world dataset we collected and used came from a single large e-Pharmacy supply chain in India.

11 Conclusion and Future Work

To the best of our knowledge, we are the first to perform a large-scale real-world study on order lead-time prediction using real-world post-COVID-lockdown data from an e-Pharmacy supply chain. We used an ensemble of ML regressors to predict the processing times and an ensemble of ML classifiers to predict the shipment times, and we used intelligent model stacking methods to obtain performance improvements for our models. Results from the empirical study show that our AI and ML solution achieves R-squared scores of up to 0.75 on the order processing time prediction problem and an F1-score of 93.5% on the shipment time prediction problem. As shown in Table 8, on the exact match performance measurement scheme, our solution produces a 548.49% improvement over the existing solution implemented at the e-Pharmacy; on the 3-day range performance measurement scheme, it produces a 25% improvement over the existing statistical solution. We conclude that our AI and ML solution makes good predictions even during these post-COVID times and hence would be very useful in a post-COVID world for planning and intervention methodologies pertaining to COVID supply distribution. In the future, the authors plan to study the COVID-19 pandemic's disruptive effects on the supply chain, distribution, and fulfillment operations of e-Pharmacies using pre-COVID-lockdown and post-COVID-lockdown data with AI and ML.

11.1 Disclosure Statement

There is no potential conflict of interest.

Table 8 AI ML performance table

Use case             e-Pharmacy current performance (%)   AI ML performance (%)   Improvement (%)
Exact match          9.75                                 63.21                   548.49
3-day range window   76                                   95                      25


11.2 Funding

This research received no external funding.

References

1. Singh H, Majumdar A, Malviya N (2020) E-pharmacy impacts on society and pharma sector in economical pandemic situation: a review. J Drug Deliv Ther 10(3-s):335–340
2. Agatz NA, Fleischmann M, Van Nunen JA (2008) E-fulfillment and multi-channel distribution – a review. Eur J Oper Res 187(2):339–356
3. Maltz AB, Rabinovich E, Sinha V (2004) Logistics: the key to e-retail success. Supply Chain Manage Rev 8(3):48–54
4. Srivastava M, Raina M (2020) Consumers' usage and adoption of e-pharmacy in India. Int J Pharm Healthcare Mark
5. Miller R, Wafula F, Onoka CA, Saligram P, Musiega A, Ogira D, Okpani I, Ejughemre U, Murthy S, Garimella S, Sanderson M (2021) When technology precedes regulation: the challenges and opportunities of e-pharmacy in low-income and middle-income countries. BMJ Glob Health 6(5):e005405
6. Sonawane MSR, Mahajan VC (2020) Has the outbreak of the Coronavirus pandemic impacted the online pharmacy in serving the nation or capitalization of business opportunities in India? BVIMSR's J Manage Res 12(2):94–101
7. Ângelo A, Barata J, da Cunha PR, Almeida V (2017) Digital transformation in the pharmaceutical compounds supply chain: design of a service ecosystem with e-labeling. In: European, Mediterranean, and Middle Eastern conference on information systems. Springer, Cham, pp 307–323
8. Hines LE, Saverno KR, Warholak TL, Taylor A, Grizzle AJ, Murphy JE, Malone DC (2011) Pharmacists' awareness of clinical decision support in pharmacy information systems: an exploratory evaluation. Res Social Adm Pharm 7(4):359–368
9. Lapão LV, Da Silva MM, Gregório J (2017) Implementing an online pharmaceutical service using design science research. BMC Med Inform Decis Mak 17(1):1–14
10. Polato M, Sperduti A, Burattin A, de Leoni M (2014) Data-aware remaining time prediction of business process instances. In: 2014 International Joint Conference on Neural Networks (IJCNN). IEEE, pp 816–823
11. Singh S, Soni U (2019) Predicting order lead time for just in time production system using various machine learning algorithms: a case study. In: 2019 9th International conference on cloud computing, data science and engineering (Confluence). IEEE, pp 422–425
12. Alenezi A, Moses SA, Trafalis TB (2008) Real-time prediction of order flowtimes using support vector regression. Comput Oper Res 35(11):3489–3503
13. Lingitz L, Gallina V, Ansari F, Gyulai D, Pfeiffer A, Sihn W, Monostori L (2018) Lead time prediction using machine learning algorithms: a case study by a semiconductor manufacturer. Proc CIRP 72:1051–1056
14. Öztürk A, Kayalıgil S, Özdemirel NE (2006) Manufacturing lead time estimation using data mining. Eur J Oper Res 173(2):683–700
15. Meidan Y, Lerner B, Rabinowitz G, Hassoun M (2011) Cycle-time key factor identification and prediction in semiconductor manufacturing using machine learning and data mining. IEEE Trans Semicond Manuf 24(2):237–248
16. Raaymakers WH, Weijters AJMM (2003) Makespan estimation in batch process industries: a comparison between regression analysis and neural networks. Eur J Oper Res 145(1):14–30
17. Wang C, Jiang P (2019) Deep neural networks based order completion time prediction by using real-time job shop RFID data. J Intell Manuf 30(3):1303–1318


18. Dosdoğru AT, Boru İpek A, Göçken M (2021) A novel hybrid artificial intelligence-based decision support framework to predict lead time. Int J Log Res Appl 24(3):261–279
19. Chang WS, Lin YT (2019) The effect of lead-time on supply chain resilience performance. Asia Pac Manag Rev 24(4):298–309
20. Mariappan MB, Devi K, Venkataraman Y, Lim MK, Theivendren P (2022) Using AI and ML to predict shipment times of therapeutics, diagnostics and vaccines in e-pharmacy supply chains during COVID-19 pandemic. Int J Logistics Manage
21. Mohamed AAM (2015) Lead-time estimation approach using the process capability index. Int J Supply Chain Manage 4(3):7–14
22. Al Fikri MA (2015) The influence of lead time and service quality toward customers satisfaction [a case study of shipping company JNE in Cikarang]. Dissertation, President University
23. Schuh G, Potente T, Jasinski T (2013) Decentralized, market-driven coordination mechanism based on the monetary value of in time deliveries. In: Proceedings of global business research conference, pp 7–8
24. Kärki P (2012) The impact of customer order lead time-based decisions on the firm's ability to make money: case study: build to order manufacturing of electrical equipment and appliances
25. Leung KH, Choy KL, Siu PK, Ho GT, Lam HY, Lee CK (2018) A B2C e-commerce intelligent system for re-engineering the e-order fulfilment process. Expert Syst Appl 91:386–401
26. De Treville S, Shapiro RD, Hameri AP (2004) From supply chain to demand chain: the role of lead time reduction in improving demand chain performance. J Oper Manag 21(6):613–627
27. Heikkilä J (2002) From supply to demand chain management: efficiency and customer satisfaction. J Oper Manag 20(6):747–767
28. Mason-Jones R, Towill DR (1999) Total cycle time compression and the agile supply chain. Int J Prod Econ 62(1–2):61–73
29. Karimi CG (2018) Effect of lead time on procurement management in the motor industry in Kenya. J Int Bus Innov Strateg Manage 1(1):22–42
30. Netmeds (2021) Netmeds.com: Indian Online Pharmacy [Online]. Available at: https://www.netmeds.com/. Accessed 01 June 2021
31. Zhan C, Zheng Y, Zhang H, Wen Q (2021) Random-forest-bagging broad learning system with applications for COVID-19 pandemic. IEEE Internet Things J
32. Khakharia A, Shah V, Jain S, Shah J, Tiwari A, Daphal P, Warang M, Mehendale N (2021) Outbreak prediction of COVID-19 for dense and populated countries using machine learning. Annals Data Sci 8(1):1–19
33. Fedorov N, Petrichenko Y (2020) Gradient boosting-based machine learning methods in real estate market forecasting. In: 8th scientific conference on information technologies for intelligent decision making support (ITIDS 2020). Atlantis Press, pp 203–208
34. Iwendi C, Bashir AK, Peshkar A, Sujatha R, Chatterjee JM, Pasupuleti S, Mishra R, Pillai S, Jo O (2020) COVID-19 patient health prediction using boosted random forest algorithm. Front Public Health 8:357
35. AlJame M, Ahmad I, Imtiaz A, Mohammed A (2020) Ensemble learning model for diagnosing COVID-19 from routine blood tests. Inf Med Unlocked 21:100449
36. Vinod DN, Prabaharan SRS (2020) Data science and the role of Artificial Intelligence in achieving the fast diagnosis of Covid-19. Chaos Solitons Fractals 140:110182
37. Car Z, Baressi Šegota S, Anđelić N, Lorencin I, Mrzljak V (2020) Modeling the spread of COVID-19 infection using a multilayer perceptron. Comput Math Methods Med 2020
38. Carvalho ED, Carvalho ED, de Carvalho Filho AO, de Araújo FHD, Rabêlo RDAL (2020) Diagnosis of COVID-19 in CT image using CNN and XGBoost. In: 2020 IEEE symposium on computers and communications (ISCC). IEEE, pp 1–6
39. Shahriar SA, Kayes I, Hasan K, Hasan M, Islam R, Awang NR, Hamzah Z, Rak AE, Salam MA (2021) Potential of ARIMA-ANN, ARIMA-SVM, DT and CatBoost for atmospheric PM2.5 forecasting in Bangladesh. Atmosphere 12(1):100


40. Deepa N, Prabadevi B, Maddikunta PK, Gadekallu TR, Baker T, Khan MA, Tariq U (2020) An AI-based intelligent system for healthcare analysis using ridge-adaline stochastic gradient descent classifier. J Supercomput 1–20
41. Samuel J, Ali GG, Rahman M, Esawi E, Samuel Y (2020) Covid-19 public sentiment insights and machine learning for tweets classification. Information 11(6):314

Cognitive Science: An Insightful Approach Manjushree D. Laddha, Harsha R. Gaikwad, Harishchandra Akarte, and Sanil Gandhi

Abstract Cognitive science, the development of a psychological science in which scientific methods are used to create, test, and refine hypotheses, is a distinct field of study that strongly affects several disciplines, including data science, artificial intelligence, neuroscience, and artificial neural networks. One of the main purposes of this paper is to observe that no single view is sufficient to study the complex human mind; interconnection and interaction between these disciplines are needed to give more insight. Research questions concerning the study of reasoning, intuition, and knowledge were considered unsuitable for scholarly inquiry before the beginning of cognitive science; since the emergence of the cognitive paradigm, however, far greater attention has been given to these questions. Cognitive science brought a different perspective to the study of a user's mental models, the context surrounding a user, and knowledge representation, creation, and transfer. This perspective then took root in the discipline of information science and thereby influenced its research. This paper is a summary of the various contributions of cognitive science to artificial intelligence, neuroscience, artificial neural networks, and data science, with particular emphasis on information retrieval research. Through the review of these contributions, some new research avenues are discussed, and the relevance of the cognitive paradigm to these avenues is presented. It would be valuable for data scientists to consider applying the cognitive approach to these research areas so that the theoretical and practical scope of the various disciplines of cognitive science can be extended.

M. D. Laddha · H. R. Gaikwad (B) · H. Akarte · S. Gandhi
Department of Computer Engineering, Dr. Babasaheb Ambedkar Technological University, Lonere, Maharashtra 402 103, India
e-mail: [email protected]
M. D. Laddha e-mail: [email protected]
H. Akarte e-mail: [email protected]
S. Gandhi e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_14


Keywords Cognitive science · Psychology · Neuroscience · Artificial intelligence · Deep learning · Computer vision · Linguistics

1 Introduction

Cognition is characterized as 'coming to know.' It incorporates the inner mental processes of learning, perception, comprehension, thought, memory, and attention, and is grounded in a transformative shift in the psychology of learning from a behavioral to a cognitive focus. Cognitive science encourages the development of a constructive socio-perspective on education, since it was discovered that students actively combine social interaction and individual practice in the process of turning information into knowledge. Accordingly, an assessment of the available cognitive science literature highlighted various learning-theory-based ideas that appear to relate directly or indirectly to experiential learning.

Cognitive science can improve a person's life through its relation to consumer choice. Individuals rarely reflect on what has happened in their past. Marketers and policy makers want to help consumers make better decisions. Cognitive parameters such as memory, reasoning, attention, and planning can be used to design decision situations that support sound decision making. Cognitive science gives more sophisticated ideas about the consumer's psychological abilities and limitations, which are incorporated into the design of decision environments, where financial aspects are also taken into consideration [1].

In the cognitive apprenticeship model, four fundamental dimensions shape learning: content taught with embedded cognitive strategies, experiential methods, sequencing from the simple to the varied and complex, and the sociology of a community of practice. The focus is on individual and group problem solving situated in settings that reproduce real-world settings, using the instructional ideas of modeling, coaching, and scaffolding (or fading) while providing progressively more complex opportunities for learning.

2 Background and Related Work

The word 'cognition' derives from the Latin word 'cognoscere', which means "to know" or "to come to know". Cognitive science encompasses experimental psychology (which deals with identifying the processes underlying behavior and cognition), artificial intelligence, neuroscience, linguistics, and anthropology [2]. Figure 1 depicts the relationship between cognitive science and computer science. Cognitive science is not merely the aggregate of all these areas; rather, these perspectives collectively solve a given specific problem. It is not a single field of research but the collective effort of scientists working in the different fields mentioned above [3].


Fig. 1 Relationship between cognitive science and computer science. Source https://nyuccl.org/cogsci.html

Consequently, cognition encompasses the activities and processes concerned with the acquisition, storage, retrieval, and processing of knowledge. In other words, it includes the processes that help us perceive, attend, remember, think, categorize, reason, decide, and so on. One of the goals of artificial intelligence was to create artificial systems that surpass, or at least challenge, human capacities. A large portion of the early research focused on numerical models of intelligence, neglecting the human viewpoint. Cognitive science, on the other hand, prompted the development of frameworks for modeling human cognition and thinking, though not necessarily intelligence. This subfield, called cognitive modeling, builds artificial systems that can simulate mental capacities and processes, grounded in experimental data.

2.1 Artificial Intelligence

The Collins Dictionary of Artificial Intelligence defines cognitive science as the field that studies the mechanics of human intelligence. Cognitive science involves the investigation of the processes involved in producing intelligence in a given situation. Cognitive science and artificial intelligence are analogous to each other; advancement in one area affects the framework of research in the other. Depending on our understanding of how the human mind works, we try to create an artificial human mind. Likewise, advances and failures in artificial intelligence stimulate further research in cognitive science [4].


Due to advances and successful experimental research in artificial intelligence, thinkers have shifted from orthodox cognitive science to embodied-embedded cognitive science. AI research is also responsible for the establishment of enactivism as a mainstream position in cognitive science [5].

Intelligence is generally considered an extraordinarily human capacity, yet various biological organisms display minor forms or grades of intelligence. With the advent of computing science, people began wondering whether machines can think as well. This is the question considered by Turing in his influential article "Computing Machinery and Intelligence". With programmable machines, algorithms were designed that could perform some of the mental tasks requiring higher mental capacities such as reasoning and learning. The study and design of such algorithms and techniques form the basis of artificial intelligence (AI), a field aiming to produce intelligent machines. Research in artificial intelligence incorporates problem solving, knowledge and reasoning, learning, planning, communication (including natural language processing), perception, and other qualities of intelligence, applied even in education, for example to predict the employability of students [6].

A large portion of the approaches to artificial intelligence are narrow and are only interested in solving a specific problem or a set of similar problems. For instance, a computer that can beat title holders in Chess cannot rival beginners in Checkers. The ultimate objective of artificial intelligence is to produce general intelligence in machines. This type of artificial intelligence was previously known as strong artificial intelligence, but it has recently resurfaced under the name artificial general intelligence (AGI). AGI aims to build computer programs that can simulate general intelligence and are able to address unanticipated problems and perform at or beyond human level [7].

There are three major streams of computer science: artificial intelligence, cognitive science, and system applications and solutions. Because of the connections and implications between these components, they form solutions known as cognitive machines [8].

2.2 Neuroscience

Cognitive neuroscience is essentially concerned with the connection, or correlation, between brain and mind. Developmental cognitive neuroscience is an intrinsically multidisciplinary effort, in which theoretical findings from multiple levels of description are integrated into a holistic account of the origins of behavior [9]. Cognitive neuroscience encompasses the analysis of all mental functions that are associated with neural processes spanning broad areas, from how humans think to how computational experiments take place in the laboratory [10].


The neuroscience methodology has gained popularity over the past decade or so. It employs brain imaging devices to examine cognitive functions and can help locate where such processes occur in the brain. In other words, this methodology uses brain imaging and brain anatomy to study "live" cognitive functioning in healthy people. As the technology improves, these studies are becoming more influential and potentially useful. Some of the techniques used in the cognitive neuroscientific approach include Single Unit Recording, Event-Related Potentials (ERPs), Positron Emission Tomography (PET), (Functional) Magnetic Resonance Imaging (fMRI, MRI), Magnetoencephalography (MEG), and Transcranial Magnetic Stimulation (TMS) [11]. However, these techniques may be of questionable use for higher-order functioning, which may not be concisely localized. Likewise, if data from several people are averaged, the interpretations become correspondingly blunt. Sometimes, when using these techniques, there is a tendency for research to be conducted simply for research's sake; papers can often lack any theoretical basis and result in ad hoc theorizing. Besides, threshold levels must be set to reject noise, and these levels are a contentious issue.

2.3 Artificial Neural Network

Neural network models can now recognize information in images, interpret text, and translate between languages. These systems are highly simplified but inspired by the human brain, using only biologically plausible computations. In the coming years, neural networks will become less dependent on large-scale labeled data for learning and will give more robust performance. Based on the achievements and failures of artificial neural networks on computational tasks, we can understand where the mind should excel. Deep learning also provides an engine for testing cognitive theories, and there are many levels at which cognitive neuroscience can use it, from motivating theories to building complete computational models. Growing advances in deep learning take us nearer to recognizing how cognition and perception can be carried out in the brain, which is the grand challenge of cognitive science [12]. Neural networks of the PDP style, which anticipated the outcome of deep learning, have been the most radical form of experiment in artificial intelligence. Their achievements derive from the creative invention of backpropagation. PDP-style neural networks have played a fundamental role in cognitive science through their ideas about computation and representation [13].


2.4 Robotics

There is a difference between a conventional robot and a cognitive humanoid robot. Creating a humanoid robot is a major challenge, as it must think, act according to the external environment, and learn from past experiences. As this is an emerging and fast-growing field of research, robotic scientists face many challenges, from creating a model capable of self-learning to endowing it with conscious intellectual capability. Many researchers have proposed architectures for cognitive humanoid robots [14]. Robots show goal-oriented behavior in changing environmental conditions and give insights into which kinds of minds produce goal-oriented behavior. Recent advances in robotics are also contributing to research in cognitive science and mind studies [3]. Table 1 gives a literature review of the proposed cognitive architectures for robots. New cognitive developmental robotics conventions are used for designing humanoid robot architectures.

3 Information Processing Approach

Innovation in cognitive science has allowed new views of human behavior to be interpreted, and the concept of computation helps us understand assumptions about the mind; computation provides new tools to analyze these assumptions. A large amount of behavioral data is generated by humans interacting with each other and with computers. Such data favor a data-science approach that tries to forecast people's behavior from their past data. New cognitive innovation is needed to introduce mental constructs as intervening variables in these interpretations, which will help to evaluate models of human cognition [15]. The predominant methodology in the study of the mind is the computational theory of mind. The information processing view aims to understand the mind in terms of processes that operate on representations. The essential assumption is that any mental process can be thought of as a computable function. The information processing approach contends that cognition is understood in terms of discrete mental representations (symbols), and that cognitive processes are transformations of such representations or symbols, described in terms of rules or algorithms. Using information processing, cognitive psychology has advanced well over recent years, resulting in detailed knowledge of processes including perception, attention, memory, language, and decision-making. Conventional cognitive psychology uses behavioral experiments to understand mental and emotional processes. For instance, starting with psychophysics in the nineteenth century, the study of visual perception has explained many of the mechanisms involved in different aspects of vision such as color, shape, and motion, which has also prompted the development of detailed computational models and connections to the neural systems underlying vision [21].


Table 1 Literature survey of cognitive architectures for robots

1. Mushtaq et al. [14]: Capable of cognitive development through social engagement and self-exploration
2. Tanguy [16]: A robot coach capable of demonstrating rehabilitation activities to patients, monitoring a patient's performance, and providing comments in order to improve and motivate them
3. Arsénio et al. [17]: Using learning aids, cognitive prototypes, and educational activities to educate a humanoid robot's perceptual system in the manner of infant development, so that the robot learns about the environment according to a child's developmental phases
4. Chella et al. [18]: A robot's emotional response when interacting with humans is simulated; the robot's architecture is built on the autonomous induction of a probabilistic emotional conceptual space from data
5. Sandra et al. [19]: Adding skills into a cognitive architecture for collaborative problem solving and task completion
6. Kinouchi et al. [20]: The proposed system adapts to its surroundings spontaneously and implements a clearly defined "consciousness" function, which enables both habitual and goal-directed behavior patterns

3.1 Recent Advances

One of the advancing terms in the area of cognitive science is cognitive computing, which is defined with respect to its functionality. Cognitive computing is a computational environment comprising a high-performance computing infrastructure powered by special processors, for example multi-core CPUs, GPUs, TPUs, and neuromorphic chips; a software development environment with native support for parallel and distributed computing, powered by the underlying computing infrastructure; software libraries and AI algorithms for extracting information and knowledge from unstructured data sources; a data analytics environment whose processes and algorithms imitate human cognitive processes; and query languages and APIs for accessing the services of the cognitive computing environment [22]. The solution formed by the interaction and communication between three important components, artificial intelligence, cognitive science, and system applications and solutions, is known as a cognitive computer. The architecture of a cognitive computer is created by the cognitive solutions of artificial intelligence. The base of the developed applications is semantic cognitive solutions that are executed by the cognitive computer. Cognitive analytics draws upon the cognitive computing environment to create actionable insights by analyzing heterogeneous data sources using the cognitive models that the human mind employs [23].


Cognitive analytics is pursued from two complementary perspectives. The first is driven by computer science researchers in both industry and academia. Advances in big data, cloud computing, natural language understanding, and AI are enabling the extraction of knowledge from immense repositories of unstructured data such as natural language text, images, video, and audio. From this group's perspective, the knowledge extracted from unstructured data, combined with statistical inference and reasoning, distinguishes cognitive analytics from business analytics. The second perspective is advanced by cognitive and neuroscience researchers, who use theories of mind, functional areas of the brain, and cognitive models and measures. Cognitive informatics (CI) is a multidisciplinary study of computer science, information science, cognitive science, and intelligence science carrying out research into the brain's internal information processing mechanisms and processes, as well as their engineering applications in cognitive computing [24]. Recent advances in cognitive informatics and engineering applications have resulted in the emergence of cognitive computing and the development of cognitive computers capable of perception, reasoning, and learning. Cognitive computing is a new paradigm of intelligent computing methodologies and systems based on cognitive informatics that implements computational intelligence through autonomous inferences and perceptions that mimic the mechanisms of the brain [24]. The nature of information processing in the brain, such as information acquisition, representation, memory, retrieval, creation, and communication, is the focus of CI. Based on CI theories and cognitive models, mechanisms of the brain and mind can be systematically explored using an interdisciplinary approach and modern information and neuroscience technologies. Cognitive computing is expected to enable the following innovations, among others: (1) a reasoning machine for complex and long series of inferences, problem solving, and decision-making beyond traditional logic and if-then-rule based technologies; (2) an autonomous learning system for cognitive knowledge acquisition and processing; and (3) a novel search engine.

3.2 Implementation Approach

Artificial intelligence, neuroscience, artificial neural network, and data science related programs can easily be implemented in Python using machine learning and deep learning algorithms. Depending upon the type of application, a proper algorithm is selected. If machine learning algorithms are used, then a model parameter is a variable that is specific to the model and whose value can be estimated from the given dataset.


These model parameters are the part of the model that is key to the machine learning algorithm; they are learned from historical training data. If deep learning algorithms are used, then the model parameters are the weights and biases, which are learned from the training data during the learning process. Together with suitable evaluation metrics, these learned parameters determine how well a model performs in terms of accuracy.
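As a minimal illustration of this distinction (a sketch by the editor, not code from the paper), the following Python snippet fits a linear model with scikit-learn and inspects the parameters learned from the training data:

```python
# Illustrative sketch (not from the paper): learned model parameters in Python.
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy historical training data: y = 2*x0 + 3*x1 + noise
rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = 2 * X[:, 0] + 3 * X[:, 1] + 0.01 * rng.standard_normal(100)

model = LinearRegression().fit(X, y)

# The learned parameters (coefficients and intercept) are estimated from the
# dataset; in a deep network the analogues are the weights and biases.
print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
```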

4 Conclusion

The development of the cognitive science concept has benefited various fields. Distinct conceptual advances within cognitive science have affected the nature of research in different fields, from artificial intelligence to robotics. Recent advances in cognitive informatics and cognitive computing contribute to creating intelligent machines. The user-centered approach emphasizes studying the user's mental state. The socio-cognitive viewpoint is more helpful, but the user-centered methodology should not be abandoned; rather, it should be used together with the socio-cognitive approach to understand the cognitive states of a user within a broader theoretical framework. There is also a need to investigate concepts like information asymmetry, information packaging, and detail complexity using the cognitive approach. Cognitive science may be used for the investigation of the different processes that come into play when information scatters, diffuses, and moves. This position would increase the degree of multidisciplinary research and could expand the scope of subsequent research studies.

References 1. Bartels DM, Johnson EJ (2015) Connecting cognition and consumer choice. Cognition 135:47– 51 2. Ogiela L, Ogiela MR (2012) Advances in cognitive information systems. In: Cognitive systems, monographs 3. Silverman GW, Friedenberg JD (2015) Cognitive science: an introduction to the study of mind. SAGE Publications, USA 4. Westberg M, Amber Z, Amro N (2019) A historical perspective on cognitive science and its influence on XAI research. In: International workshop on explainable, transparent autonomous agents and multi-agent systems. Springer, Cham 5. Tom F (2007) On the role of AI in the ongoing paradigm shift within the cognitive sciences. In: 50 years of artificial intelligence. Springer, Berlin, pp 63–75 6. Laddha MD (2022) To predict employability of student by using artificial neural network. In: ICDSMLA. Springer. Singapore, pp 675–682 7. Dupoux E (2018) Cognitive science in the era of artificial intelligence: a roadmap for reverseengineering the infant language-learner. Cognition 173:43–59 8. Wang Y, Zhang D, Kinsner W (eds) (2010) Advances in cognitive informatics and cognitive computing. In: Studies in computational intelligence


9. Sun R (2008) Introduction to computational cognitive modeling. Cambridge handbook of computational psychology, pp 3–19 10. Thomas MSC, Forrester NA, Angelica R (2016) Multiscale modeling of gene-behavior associations in an artificial neural network model of cognitive development. Cogn Sci 40(1):51–99 11. Nadel L, Piattelli-Palmarini M (2003) What is cognitive science. In: Encyclopedia of cognitive science. Macmillan, London 12. Banich MT, Compton RJ (2018) Cognitive neuroscience. Cambridge University Press 13. Storrs KR, Kriegeskorte N (2019) Deep learning for cognitive neuro-science. arXiv preprint, arXiv:1903.01458 14. Mushtaq MF, Akram U, Tariq A, Khan I, Zulqarnain M, Iqbal U (2017) An innovative cognitive architecture for humanoid robot. Int J Adv Comput Sci Appl (IJACSA) 8(8). https://doi.org/ 10.14569/IJACSA.2017.080808 15. Griffiths TL (2015) Manifesto for a new (computational) cognitive revolution. Cognition 135:21–23. ISSN 0010-0277. https://doi.org/10.1016/j.cognition.2014.11.026 16. Tanguy P (2016) Cognitive architecture of a humanoid robot for coaching physical exercises in kinaesthetic rehabilitation 17. Arsénio AM, Teaching a robotic child—machine learning strategies for a humanoid robot from social interactions. Int J Adv Robot Syst. https://doi.org/10.5772/46213 18. Chella A, Sorbello R, Pilato G, Vassallo G, Balistreri G, Giardina M (2011) An architecture with a mobile phone interface for the interaction of a human with a humanoid robot expressing emotions and personality, vol 6934, pp 117–126. 10.1007/978-3-642-23954-0_13 19. Sandra D, Milliez G, Fiore M, Clodic A, Alami R (2016) Some essential skills and their combination in an architecture for a cognitive and interactive robot 20. Kinouchi Y, Mackin KJ (2018) A basic architecture of an autonomous adaptive system with conscious-like function for a humanoid robot. Front Robot AI 5. https://doi.org/10.3389/frobt. 2018.00030 21. Afzal W, Thompson KM (2012) Contributions of cognitive science to information science: an analytical synopsis 22. Wang Y, Zhang D, Kinsner W (eds) (2010) Advances in cognitive informatics and cognitive computing. vol 323. Springer 23. Gudivada VN, Irfan MT, Fathi E, Rao DL (2016) Cognitive analytics: going beyond big data analytics and machine learning. In: Handbook of statistics, vol 35. Elsevier, pp 169–205. ISSN 0169-7161. https://doi.org/10.1016/bs.host.2016.07.010 24. Bogdanova M (2017) Cognitive science: from multidisciplinarity to interdisciplinarity. Int J Cogn Res Sci Eng Educ 5(2):145

Predicting the Dynamic Viscosity of Biodiesels at 313 K Using Empirical Models

Youssef Kassem, Hüseyin Çamur, Tuğberk Özdemir, and Bawa Bamaiyi

Abstract Biodiesel is an alternative energy source produced from renewable materials. In this study, five machine learning models were developed to predict the dynamic viscosity (DV) of biodiesel at 313 K, and their performance was compared with a quadratic model and multiple linear regression. For this purpose, biodiesel data in terms of kinematic viscosity and density at 313 K, at various blending ratios with petro-diesel fuel, were collected from previous scientific studies, and the dynamic viscosity of biodiesel was then calculated. The developed models were evaluated by estimating the R-squared value, root mean squared error, and mean absolute error. The results showed that the developed models have high efficacy in predicting the DV at 313 K and that the QM model has the best prediction performance among them.

Keywords Biodiesel · Dynamic viscosity · Kinematic viscosity · Blending ratio · Density · Machine learning · MLR · QM

1 Introduction

Biodiesel is domestically obtained from refined and waste vegetable oils, animal fats, or other lipids [1]. It has many advantages, such as being a renewable fuel, and favorable viscosity, density, cold flow properties, and emissions reduction [2]. Because of these advantages, it has gained considerable attention from the scientific community over

Y. Kassem (B) · H. Çamur · T. Özdemir · B. Bamaiyi
Faculty of Engineering, Mechanical Engineering Department, Near East University, Nicosia 99138, North Cyprus
e-mail: [email protected]
H. Çamur e-mail: [email protected]
Y. Kassem, Faculty of Civil and Environmental Engineering, Near East University, Nicosia 99138, North Cyprus
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_15


the past decade. The characteristics of biodiesel depend on the chemical and physical properties of the feedstock [3, 4]; in particular, biodiesel's physicochemical properties depend on the fatty acid profile and its characteristics [3, 4]. Consequently, the combustion and emission characteristics depend on the type of biodiesel used. Viscosity and density are considered crucial properties that influence the characteristics of the engine [5–7]. They depend on the fatty acid profile of the source utilized for biodiesel production [6, 7]. These properties of biodiesel must meet the fuel specifications (ASTM D6751 and EN 14214 standards) [8]. Biodiesel can be used directly in a diesel engine or as a blend component in petro-diesel. Therefore, it is necessary to provide a predictive model that can determine the variation of biodiesel viscosity with various volume fractions and temperatures [8]. The current study aims to estimate the dynamic viscosity (DV) of various types of biodiesel at 313 K using machine learning models, a quadratic model (QM), and multiple linear regression (MLR). It should be noted that DV can be calculated from the kinematic viscosity (KV) and density (D) using Eq. (1) [9]. The analysis procedure for the current study is illustrated in Fig. 1.

DV = KV × D (1)
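As a quick numerical illustration of Eq. (1) (a sketch by the editor; the unit conventions are inferred from Table 1, where KV appears in mm²/s, D in kg/m³, and DV in Pa·s):

```python
# Sketch of Eq. (1): dynamic viscosity from kinematic viscosity and density.
# Units inferred from Table 1: KV in mm^2/s, D in kg/m^3, DV in Pa·s.
def dynamic_viscosity(kv_mm2_s: float, density_kg_m3: float) -> float:
    kv_m2_s = kv_mm2_s * 1e-6          # convert mm^2/s to m^2/s
    return kv_m2_s * density_kg_m3     # Pa·s

# Mean values from Table 1 give roughly the tabulated mean DV
print(dynamic_viscosity(7.321, 792.1))  # ≈ 0.0058 Pa·s
```

The result is close to the tabulated mean DV of 0.005662 Pa·s; it does not match exactly because the mean of a product need not equal the product of means.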

2 Material and Methods

2.1 Data

In this work, biodiesel data in terms of kinematic viscosity and density at 313 K, at various blending ratios with petro-diesel fuel, were collected from previous scientific studies [10–21]. Table 1 lists the statistical parameters of the used variables.

2.2 Machine Learning Models (MLMs)

MLMs are utilized as a tool to describe complex systems [22], and a wide range of ML models is used to solve complex problems in a variety of fields. In this study, a Feed-Forward Neural Network (FFNN), a Cascade Feed-forward Neural Network (CFNN), an Elman Neural Network (ENN), a Generalized Regression Neural Network (GRNN), and a Layer Recurrent Neural Network (LRNN) are developed to predict the dynamic viscosity at 313 K. Descriptions of the developed models are given in Refs. [23–25]. In this work, the blending ratio (BR), kinematic viscosity (KV), and density (D) at 313 K are used as explanatory input variables. Training (75%) and testing (25%) data were used to develop and validate the models, respectively.
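A minimal sketch of this setup follows. The paper's networks are MATLAB-style (TRAINLM); scikit-learn's MLPRegressor is used here only as a stand-in, and the CSV file name is a hypothetical placeholder:

```python
# Sketch (assumption: scikit-learn as a stand-in for the paper's MATLAB networks).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# X holds [BR, KV, D] rows collected from the literature; y holds DV.
# "biodiesel_313K.csv" is a hypothetical file name for the collected dataset.
X = np.loadtxt("biodiesel_313K.csv", delimiter=",", usecols=(0, 1, 2))
y = np.loadtxt("biodiesel_313K.csv", delimiter=",", usecols=3)

# 75% training / 25% testing, as stated in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.75, random_state=42)

# A small feed-forward network; the hidden-layer size is tuned by trial and error.
ffnn = MLPRegressor(hidden_layer_sizes=(5,), max_iter=5000, random_state=42)
ffnn.fit(X_tr, y_tr)
print("test R^2:", ffnn.score(X_te, y_te))
```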


Fig. 1 Analysis procedure for the current study

Table 1 Statistical parameters of used variables

Variable   Mean       Standard deviation   Minimum    Maximum
BR         61.8       34.39                0          100
KV         7.321      9.661                1.701      45.34
D          792.1      135.9                254.7      932.5
DV         0.005662   0.007483             0.000768   0.040788

2.3 QM and MLR

The QM is a second-order polynomial mathematical and statistical model, expressed as

$Y = \beta_0 + \sum_{i=1}^{n} \beta_i x_i + \sum_{i=1}^{n} \beta_{ii} x_i^2 + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \beta_{ij} x_i x_j$ (2)


where $Y$ is the predicted response, $\beta_0$ and $\beta_i$ relate to the main effects, $\beta_{ii}$ and $\beta_{ij}$ to the quadratic and interaction effects, and $x_i$ and $x_j$ are the independent variables. Moreover, MLR is used to predict the DV of biodiesel. MLR is expressed as

$Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_i x_i, \quad i = 1, 2, \ldots, n$ (3)

where $Y$, $\beta_0$, $\beta_i$, and $x_i$ are defined above. In general, these models aim to capture the relationship between the variables, as shown in Eq. (4):

DV = f(BR, KV, D) (4)
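For readers who want to reproduce this kind of fit, a minimal sketch using scikit-learn (an assumed tooling choice; the paper does not name its software) is shown below, reusing the train/test split from the earlier sketch:

```python
# Sketch: MLR (Eq. 3) and QM (Eq. 2) fits with scikit-learn (assumed tooling).
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# X_tr/X_te columns: [BR, KV, D]; y_tr/y_te: DV (as in Eq. 4).
mlr = LinearRegression()                 # Y = b0 + b1*x1 + ...        (Eq. 3)
qm = make_pipeline(                      # adds x_i^2 and x_i*x_j terms (Eq. 2)
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
mlr.fit(X_tr, y_tr)
qm.fit(X_tr, y_tr)
print("MLR R^2:", mlr.score(X_te, y_te), "QM R^2:", qm.score(X_te, y_te))
```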

3 Results and Discussion

3.1 MLMs

As mentioned above, five MLMs were utilized to estimate the DV of biodiesel at 313 K, with the blending ratio, kinematic viscosity, and density used as input parameters. The best network configuration was found by trial and error and selected based on the lowest value of the mean squared error (MSE). In the present work, the training function was TRAINLM, and the back-propagation algorithm was used to reduce the MSE between the measured and predicted values. The optimal configurations found for the MLMs are shown in Table 2. Figure 2 compares the predicted data with the actual data of DV using the five machine learning approaches; the predicted data are generally close to the observed data.

Table 2 Optimal configuration for MLMs

MLM    Configuration   Spread constant   Number of neurons   MSE
FFNN   3:1:1           –                 5                   1.66 × 10^−7
CFNN   3:1:1           –                 8                   4.32 × 10^−8
GRNN   –               0.002             –                   –
ENN    3:1:1           –                 7                   1.27 × 10^−7
LRNN   3:1:1           –                 7                   9.28 × 10^−8
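The trial-and-error search for the neuron count can be automated; a minimal sketch (continuing the earlier scikit-learn stand-in and its data split) is:

```python
# Sketch: pick the neuron count with the lowest validation MSE, as in Table 2.
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

best = None
for n_neurons in range(1, 11):
    net = MLPRegressor(hidden_layer_sizes=(n_neurons,),
                       max_iter=5000, random_state=42)
    net.fit(X_tr, y_tr)
    mse = mean_squared_error(y_te, net.predict(X_te))
    if best is None or mse < best[1]:
        best = (n_neurons, mse)
print("best neuron count:", best[0], "MSE:", best[1])
```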


Fig. 2 Comparison of the estimated data with the observed data for DV [Pa·s] using the different ANN models (Actual, FFNN, CFNN, LRNN, ENN, GRNN)

3.2 QM and MLR

The mathematical equations developed using MLR (Eq. (5)) and QM (Eq. (6)) for predicting the DV are:

DV = −0.00253 − 0.0006692 · BR + 0.03623 · KV + 0.00526 · D (5)

DV = −0.000433 + 0.001112 · KV + 0.001153 · D + 0.02958 · KV · D (6)
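A direct transcription of Eqs. (5) and (6) as Python functions is given below (a sketch by the editor; the coefficients presumably apply to inputs preprocessed as in the paper's fitting procedure, which is not fully specified):

```python
# Direct transcription of the fitted models; any input scaling follows the
# paper's (unspecified) preprocessing, so treat these as illustrative only.
def dv_mlr(br: float, kv: float, d: float) -> float:
    """MLR model, Eq. (5)."""
    return -0.00253 - 0.0006692 * br + 0.03623 * kv + 0.00526 * d

def dv_qm(kv: float, d: float) -> float:
    """QM model, Eq. (6); BR does not appear in the final quadratic fit."""
    return -0.000433 + 0.001112 * kv + 0.001153 * d + 0.02958 * kv * d
```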

Figure 3 compares the predicted values obtained by the QM and MLR models with the measured values; the estimated values are close to the actual data.

Fig. 3 Comparison of the estimated data with the observed data for DV [Pa·s] using MLR and QM


Table 3 Value of R-squared and RMSE for the developed models

Statistical indicator   FFNN       CFNN       ENN        LRNN       GRNN       MLR        QM
R-squared               9.88E−01   9.99E−01   9.99E−01   9.99E−01   9.61E−01   9.81E−01   1.00E+00
RMSE                    9.85E−04   2.71E−04   4.27E−04   3.21E−04   1.66E−03   1.07E−03   9.87E−19

3.3 Performance Evaluation of Machine Learning Models, MLR, and QM

The MLM performance is compared with that of the QM and MLR to evaluate the proposed models. The values of R-squared and root mean squared error (RMSE) are listed in Table 3. All proposed models achieved good accuracy in estimating the DV of biodiesel. Additionally, the maximum R-squared value and minimum RMSE were obtained with the QM model.
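A minimal sketch for computing the two reported indicators (assumed scikit-learn tooling, continuing the variables from the earlier sketches):

```python
# Sketch: R-squared and RMSE, the indicators reported in Table 3.
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def evaluate(y_true, y_pred):
    r2 = r2_score(y_true, y_pred)
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    return r2, rmse

r2, rmse = evaluate(y_te, qm.predict(X_te))
print(f"QM: R^2 = {r2:.3f}, RMSE = {rmse:.2e}")
```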

4 Conclusions

In this study, seven models, namely FFNN, CFNN, ENN, GRNN, LRNN, QM, and MLR, were developed to predict the DV of biodiesel. The results showed that all proposed models were suitable for estimating the dynamic viscosity of biodiesel at 313 K. Among the developed models, the QM presented significantly better prediction performance based on the values of R² and RMSE. In future work, models with various combinations of parameters, including fatty acid composition, storage temperature, oxidation stability, flash point, and cetane number, should be proposed to identify the most influential input parameters for predicting the DV of biodiesel at 313 K.

References 1. Moser BR (2012) Preparation of fatty acid methyl esters from hazelnut, high-oleic peanut, and walnut oils and evaluation as biodiesel. Fuel 92(1):231–238 2. Hosseini SM, Pierantozzi M, Moghadasi J (2019) Viscosities of some fatty acid esters and biodiesel fuels from a rough hard-sphere-chain model and artificial neural network. Fuel 235:1083–1091 3. Çamur H, Alassi E (2021) Physicochemical properties enhancement of biodiesel synthesis from various feedstocks of waste/residential vegetable oils and palm oil. Energies 14(16):4928 4. Ya¸sar F (2020) Comparision of fuel properties of biodiesel fuels produced from different oils to determine the most suitable feedstock type. Fuel 264:116817


5. Nagappan B, Devarajan Y, Kariappan E, Philip SB, Gautam S (2021) Influence of antioxidant additives on performance and emission characteristics of beef tallow biodiesel-fuelled CI engine. Environ Sci Pollut Res 28(10):12041–12055 6. Kassem Y, Çamur H, Esenel E (2017) Adaptive neuro-fuzzy inference system (ANFIS) and response surface methodology (RSM) prediction of biodiesel dynamic viscosity at 313 K. Proc Comput Sci 120:521–528 7. Kassem Y, Çamur H (2017) Prediction of biodiesel density for extended ranges of temperature and pressure using adaptive neuro-fuzzy inference system (ANFIS) and radial basis function (RBF). Proc Comput Sci 120:311–316 8. Hajinezhad A, Mohammad Hosseini HA (2021) Regional standardisation of bio-diesel fuel based on indigenous sources (Norouzak fuel). Int J Ambient Energy 42(8):895–899 9. Kassem Y, Çamur H (2017) A laboratory study of the effects of wide range temperature on the properties of biodiesel produced from various waste vegetable oils. Waste Biomass Valorization 8(6):1995–2007 10. Amin A, Gadallah A, El Morsi AK, El-Ibiari NN, El-Diwani GI (2016) Experimental and empirical study of diesel and castor biodiesel blending effect, on kinematic viscosity, density and calorific value. Egypt J Pet 25(4):509–514 11. Ivaniš GR, Radovi´c IR, Veljkovi´c VB, Kijevˇcanin ML (2016) Thermodynamic properties of biodiesel and petro-diesel blends at high pressures and temperatures. Experimental and modeling. Fuel 184: 277–288 12. Esteban B, Riba JR, Baquero G, Rius A, Puig R (2012) Temperature dependence of density and viscosity of vegetable oils. Biomass Bioenerg 42:164–171 13. Ramírez-Verduzco LF, García-Flores BE, Rodríguez-Rodríguez JE, del Rayo Jaramillo-Jacob A (2011) Prediction of the density and viscosity in biodiesel blends at various temperatures. Fuel 90(5):1751–1761 14. Moradi G, Mohadesi M, Karami B, Moradi R (2015) Using artificial neural network for estimation of density and viscosities of biodiesel–diesel blends 15. Tate RE, Watts KC, Allen CAW, Wilkie KI (2006) The viscosities of three biodiesel fuels at temperatures up to 300 C. Fuel 85(7–8):1010–1015 16. Machado M, Zuvanov V, Rojas E, Zuniga A, Costa B (2012) Thermophysical properties of biodiesel obtained from vegetable oils: corn, soy, canola and sunflower. Enciclopédia Biosfera 8(14) 17. Davies RM (2016) Effect of the temperature on dynamic viscosity, density and flow rate of some vegetable oils. J Sci Res Eng Technol 1(1):14–24 18. Ateeq E (2015) Biodiesel viscosity and flash point determination (Doctoral dissertation) 19. Gokdogan O, Eryilmaz T, Kadir Yesilyurt M (2015) Thermophysical properties of castor oil (Ricinus communis L.) biodiesel and its blends. CT F-Cienc Tecnología y Futuro 6(1):95–128 20. Moradi GR, Karami B, Mohadesi M (2013) Densities and kinematic viscosities in biodiesel– diesel blends at various temperatures. J Chem Eng Data 58(1):99–105 21. Kumbár V, Skˇrivánek A (2015) Temperature dependence viscosity and density of different biodiesel blends. Acta Univ Agric et Silviculturae Mendelianae Brunensis 63(4):1147–1151 22. Kassem Y, Gökçeku¸s H, Alassi E (2022) Identifying most influencing input parameters for predicting Cereal production using an artificial neural network model. Model Earth Syst Environ 8(1):1157–1170 23. Kassem Y, Gokcekus H (2021) Do quadratic and poisson regression models help to predict monthly rainfall? Desalin Water Treat 215:288–318 24. 
Modaresi F, Araghinejad S, Ebrahimi K (2018) A comparative assessment of artificial neural network, generalized regression neural network, least-square support vector regression, and K-nearest neighbor regression for monthly streamflow forecasting in linear and nonlinear conditions. Water Resour Manage 32(1):243–258 25. Chu Y, Fei J, Hou S (2019) Adaptive global sliding-mode control for dynamic systems using double hidden layer recurrent neural network structure. IEEE Trans Neural Networks Learn Syst 31(4):1297–1309

Artificial Neural Networks, Quadratic Regression, and Multiple Linear Regression in Modeling Cetane Number of Biodiesels

Youssef Kassem, Hüseyin Çamur, George Edem Duke, and Abdalla Hamada Abdelnaby

Abstract The cetane number (CN) is an essential property of biodiesel and a key indicator of fuel quality. In this study, three machine learning models were developed to predict the CN value, and their performance was compared with a quadratic model and multiple linear regression. For this purpose, the sums of the saturated (∑SFAMs), monounsaturated (∑MUFAMs), and polyunsaturated (∑PUFAMs) fatty acid methyl esters, the degree of unsaturation (DU), and the long-chain saturated factor (LCSF) were determined from the fatty acid compositions of the biodiesels. The developed models were evaluated by estimating the R-squared value, root mean squared error, and mean absolute error. The results showed that the proposed models have good efficacy in predicting the CN value and that the best model for predicting the CN value was the QM, based on the RMSE value.

Keywords Biodiesel · CN · Fatty acid composition · Machine learning · MLR · QM

1 Introduction

Biodiesel is domestically obtained from refined and waste vegetable oils, animal fats, or other lipids [1]. This type of fuel does not usually contain petroleum, but it is possible to create a blend of biodiesel by mixing it with petroleum diesel [2]. To obtain accurate estimates of the combustion process of alternative fuels, it is

Y. Kassem (B) · H. Çamur · G. E. Duke · A. H. Abdelnaby
Faculty of Engineering, Mechanical Engineering Department, Near East University, Nicosia 99138, North Cyprus
e-mail: [email protected]
H. Çamur e-mail: [email protected]
Y. Kassem, Faculty of Civil and Environmental Engineering, Near East University, Nicosia 99138, North Cyprus
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_16


necessary first to accurately estimate the physical properties, mainly in the cases of atomization, spraying, and combustion [3]. In general, viscosity and density are considered crucial properties that influence the engine's performance and emission characteristics [4, 5]. Moreover, the cetane number (CN) is one of the main indicators that characterize biodiesel fuel [6]: it indicates the ignitibility, or flammability, of diesel fuels [6]. The higher this indicator, the less time passes from the injection of fuel into the working cylinder to the beginning of its combustion and, accordingly, the shorter the engine warm-up time [7]. Several scientific studies have analyzed the effect of the cetane number on the performance of diesel engines [8–10]. For instance, İçıngür and Altiparmak [8] assessed the impact of the CN value on engine performance and found that engine performance increased when the cetane number was above 54.5. Ahmed and Chaichan [9] studied the effect of adding 2-ethylhexyl nitrate on the cetane number of diesel fuel. Moreover, several researchers have estimated the CN of various types of biodiesel using various machine learning approaches [11–15]. There is therefore a need to develop a model for calculating the CN value based on the fatty acid composition. In this regard, the performance of the Generalized Regression Neural Network and a quadratic model (QM) is analyzed and compared with multiple linear regression (MLR). In this study, the estimated monounsaturated (∑MUFAMs), polyunsaturated (∑PUFAMs), and saturated (∑SFAMs) contents, the degree of unsaturation (DU), and the long-chain saturated factor (LCSF) are used as input variables for predicting the CN value. Figure 1 illustrates the analysis procedure of the current study.

2 Material and Methods

2.1 Data

In this work, the CN values and fatty acid compositions of various types of biodiesel were collected from previous studies [16–19]. The monounsaturated (∑MUFAMs), polyunsaturated (∑PUFAMs), and saturated (∑SFAMs) contents, the degree of unsaturation (DU), and the long-chain saturated factor (LCSF) are determined from the fatty acid compositions using Eqs. (1)–(5). Table 1 lists the statistical parameters of the used variables.

$\sum \mathrm{MUFAMs} = \sum \mathrm{wt\%}\, C_{xx:1}$ (1)

$\sum \mathrm{PUFAMs} = \sum \mathrm{wt\%}\, C_{xx:2} + \sum \mathrm{wt\%}\, C_{xx:3}$ (2)


Fig. 1 Analysis procedure for the current study

Table 1 Statistical parameters of used variables

Variable   Mean     Standard deviation   Minimum   Maximum
∑MUFAMs    38.4     23.92                0         100
∑PUFAMs    27.01    25.49                0         100
∑SFAMs     34.19    28.74                0         118.99
DU         92.42    47.77                0         200
LCSF       9.69     26.09                0         200
CN         54.257   10.526               22.7      100



$\sum \mathrm{SFAMs} = \sum \mathrm{wt\%}\, C_{xx:0}$ (3)

$\mathrm{DU} = [\text{monounsaturated}\ C_{n:1}] + 2\,[\text{polyunsaturated}\ C_{n:2,3}]$ (4)


$\mathrm{LCSF} = 0.1 \times [C_{16:0}] + 0.5 \times [C_{18:0}] + [C_{20:0}] + 1.5 \times [C_{22:0}] + 2 \times [C_{24:0}]$ (5)
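As a sketch of Eqs. (1)–(5) in code (the fatty acid profile dictionary and its keys, such as "C18:1", are illustrative assumptions):

```python
# Sketch of Eqs. (1)-(5): CN descriptors from a fatty acid profile in wt%.
# The profile dict and its keys (e.g. "C18:1") are illustrative assumptions.
def cn_descriptors(profile: dict[str, float]) -> dict[str, float]:
    def dbonds(name: str) -> int:
        return int(name.split(":")[1])  # double bonds, e.g. "C18:2" -> 2

    sfams = sum(w for k, w in profile.items() if dbonds(k) == 0)        # Eq. (3)
    mufams = sum(w for k, w in profile.items() if dbonds(k) == 1)       # Eq. (1)
    pufams = sum(w for k, w in profile.items() if dbonds(k) in (2, 3))  # Eq. (2)
    du = mufams + 2 * pufams                                            # Eq. (4)
    lcsf = (0.1 * profile.get("C16:0", 0) + 0.5 * profile.get("C18:0", 0)
            + profile.get("C20:0", 0) + 1.5 * profile.get("C22:0", 0)
            + 2 * profile.get("C24:0", 0))                              # Eq. (5)
    return {"SFAMs": sfams, "MUFAMs": mufams, "PUFAMs": pufams,
            "DU": du, "LCSF": lcsf}

# Example with a soybean-like profile (illustrative numbers only)
print(cn_descriptors({"C16:0": 11, "C18:0": 4, "C18:1": 23,
                      "C18:2": 54, "C18:3": 8}))
```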

2.2 Machine Learning Models

MLMs are utilized as a tool to describe complex systems [4, 5], and a wide range of ML models is used to solve complex problems in a variety of fields [4, 5]. In this study, a Generalized Regression Neural Network (GRNN) is developed to predict the CN value of biodiesel. The GRNN does not require an iterative training procedure as in the back-propagation method; a description of the GRNN model is given in Ref. [20]. In this work, the estimated monounsaturated (∑MUFAMs), polyunsaturated (∑PUFAMs), and saturated (∑SFAMs) contents, the degree of unsaturation (DU), and the long-chain saturated factor (LCSF) are used as input variables. Training (75%) and testing (25%) data were used to develop and validate the models, respectively. The results of the GRNN model are compared with the actual data and with traditional machine learning models (a Feed-Forward Neural Network (FFNN) and a Cascade Feed-forward Neural Network (CFNN)).

2.3 QM and MLR Models

The QM is a second-order polynomial mathematical and statistical model, expressed as

$Y = \beta_0 + \sum_{i=1}^{n} \beta_i x_i + \sum_{i=1}^{n} \beta_{ii} x_i^2 + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \beta_{ij} x_i x_j$ (6)

where $Y$ is the predicted response, $\beta_0$ and $\beta_i$ relate to the main effects, $\beta_{ii}$ and $\beta_{ij}$ to the quadratic and interaction effects, and $x_i$ and $x_j$ are the independent variables. Moreover, MLR is used to predict the CN of biodiesel. MLR is expressed as

$Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_i x_i, \quad i = 1, 2, \ldots, n$ (7)

where $Y$, $\beta_0$, $\beta_i$, and $x_i$ are defined above. In general, these models aim to capture the relationship between the variables, as shown in Eq. (8):

CN = f(∑MUFAMs, ∑PUFAMs, ∑SFAMs, DU, LCSF) (8)

Artificial Neural Networks, Quadratic Regression …

221

3 Results and Discussion

3.1 Machine Learning Models

As mentioned above, three MLMs were utilized to estimate the CN of biodiesel, with ∑MUFAMs, ∑PUFAMs, ∑SFAMs, DU, and LCSF used as input parameters. The best network configuration was found by trial and error and selected based on the lowest value of the mean squared error (MSE). In the present work, the training function was TRAINLM, and the back-propagation algorithm was used to reduce the MSE between the measured and predicted values. The optimal configurations found for the MLMs are shown in Table 2. Figure 2 compares the predicted data with the actual data of CN using the proposed machine learning approaches.

Table 2 Optimal configuration for MLMs

MLM    Configuration   Spread constant   Number of neurons   MSE
FFNN   5:1:1           –                 8                   0.0027387
CFNN   5:1:1           –                 5                   0.00092378
GRNN   –               0.002             –                   –

Fig. 2 Comparison of the estimated data with the observed data for CN using the different ANN models (Actual, GRNN, FFNN, CFNN)


3.2 Mathematical Models

The mathematical equations developed using MLR (Eq. (9)) and QM (Eq. (10)) for predicting the CN are:

$CN = -0.12 + 0.6 \sum \mathrm{MUFAMs} + 0.25 \sum \mathrm{PUFAMs} + 0.69 \sum \mathrm{SFAMs} + 0.33\,\mathrm{LCSF}$ (9)

$CN = 164 + 301 \sum \mathrm{MUFAMs} + 113 \sum \mathrm{PUFAMs} + 454 \sum \mathrm{SFAMs} + 136\,\mathrm{LCSF} - 80 \left(\sum \mathrm{MUFAMs}\right)^2 + 83 \left(\sum \mathrm{PUFAMs}\right)^2 + 186 \left(\sum \mathrm{SFAMs}\right)^2 + 235\,\mathrm{DU}^2 - 0.047\,\mathrm{LCSF}^2 + 364 \sum \mathrm{MUFAMs} \sum \mathrm{SFAMs} + 203 \sum \mathrm{MUFAMs}\,\mathrm{LCSF} + 262 \sum \mathrm{PUFAMs} \sum \mathrm{SFAMs} + 135 \sum \mathrm{PUFAMs}\,\mathrm{LCSF} + 203 \sum \mathrm{SFAMs}\,\mathrm{LCSF}$ (10)

Figure 3 compares the predicted values obtained by the QM and MLR models with the measured values.

Fig. 3 Comparison of the estimated data with the observed data for CN using MLR and QM


Table 3 Performance evaluation of the models

Statistical indicator   GRNN     FFNN     CFNN     MLR      QM
R-squared               0.6191   0.1861   0.0791   0.6879   0.7446
RMSE                    0.1128   0.1223   0.2630   0.0722   0.0653

3.3 Performance Evaluation of Machine Learning Models, MLR, and QM

The MLM performance is compared with that of the QM and MLR to evaluate the developed models. The values of R-squared and root mean squared error (RMSE) are listed in Table 3. The maximum R-squared value and minimum RMSE were obtained with the QM model.

4 Conclusions

The cetane number (CN) is an essential property of biodiesel and a key indicator of fuel quality. Therefore, in this study, five models, namely FFNN, CFNN, GRNN, QM, and MLR, were developed to estimate the CN value. The results showed that all proposed models were suitable for determining the CN value of biodiesel. Among the developed models, the QM presented significantly better prediction performance based on the values of R² and RMSE. In future work, models with various combinations of parameters, including storage temperature, oxidation stability, flash point, density, and viscosity, should be proposed to identify the most influential input parameters for predicting the CN value of biodiesel.

References 1. Patil AK (2015) Experimental investigations of the performance analysis of CI engine fuelled with sesame oil biodiesel as alternative fuel. Indian J Sci Res 350–355 2. Keera ST, El Sabagh SM, Taman AR (2018) Castor oil biodiesel production and optimization. Egypt J Pet 27(4):979–984 3. Esclapez L, Ma PC, Mayhew E, Xu R, Stouffer S, Lee T, Ihme M (2017) Fuel effects on lean blow-out in a realistic gas turbine combustor. Combust Flame 181:82–99 4. Kassem Y, Çamur H, Esenel E (2017) Adaptive neuro-fuzzy inference system (ANFIS) and response surface methodology (RSM) prediction of biodiesel dynamic viscosity at 313 K. Proc Comput Sci 120:521–528 5. Kassem Y, Çamur H (2017) Prediction of biodiesel density for extended ranges of temperature and pressure using adaptive neuro-fuzzy inference system (ANFIS) and radial basis function (RBF). Proc Comput Sci 120:311–316


6. Kaisan MU, Anafi FO, Nuszkowski J, Kulla DM, Umaru S (2017) Calorific value, flash point and cetane number of biodiesel from cotton, jatropha and neem binary and multi-blends with diesel. Biofuels 7. Du J, Sun W, Guo L, Xiao S, Tan M, Li G, Fan L (2015) Experimental study on fuel economies and emissions of direct-injection premixed combustion engine fueled with gasoline/diesel blends. Energy Convers Manage 100:300–309 8. ˙Içıngür Y, Altiparmak D (2003) Effect of fuel cetane number and injection pressure on a DI diesel engine performance and emissions. Energy Convers Manage 44(3):389–397 9. Ahmed ST, Chaichan MT (2012) Effect of fuel cetane number on multi-cylinders direct injection diesel engine performance and exhaust emissions. Al-Khwarizmi Eng J 8(1):65–75 10. Atmanli A (2016) Effects of a cetane improver on fuel properties and engine characteristics of a diesel engine fueled with the blends of diesel, hazelnut oil and higher carbon alcohol. Fuel 172:209–217 11. Ramadhas AS, Jayaraj S, Muraleedharan C, Padmakumari K (2006) Artificial neural networks used for the prediction of the cetane number of biodiesel. Renewable Energy 31(15):2524–2533 12. Piloto-Rodríguez R, Sánchez-Borroto Y, Lapuerta M, Goyos-Pérez L, Verhelst S (2013) Prediction of the cetane number of biodiesel using artificial neural networks and multiple linear regression. Energy Convers Manage 65:255–261 13. Rocabruno-Valdés CI, Ramírez-Verduzco LF, Hernández JA (2015) Artificial neural network models to predict density, dynamic viscosity, and cetane number of biodiesel. Fuel 147:9–17 14. Miraboutalebi SM, Kazemi P, Bahrami P (2016) Fatty acid methyl ester (FAME) composition used for estimation of biodiesel cetane number employing random forest and artificial neural networks: a new approach. Fuel 166:143–151 15. Hosseinpour S, Aghbashlo M, Tabatabaei M, Khalife E (2016) Exact estimation of biodiesel cetane number (CN) from its fatty acid methyl esters (FAMEs) profile using partial least square (PLS) adapted by artificial neural network (ANN). Energy Convers Manage 124:389–398 16. Gopinath A, Puhan S, Nagarajan G (2009) Relating the cetane number of biodiesel fuels to their fatty acid composition: a critical study. Proc Inst Mech Eng Part D J Automobile Eng 223(4):565–583 17. Tong D, Hu C, Jiang K, Li Y (2011) Cetane number prediction of biodiesel from the composition of the fatty acid methyl esters. J Am Oil Chem Soc 88(3):415–423 18. Azam MM, Waris A, Nahar NM (2005) Prospects and potential of fatty acid methyl esters of some non-traditional seed oils for use as biodiesel in India. Biomass Bioenerg 29(4):293–302 19. Winayanuwattikun P, Kaewpiboon C, Piriyakananon K, Tantong S, Thakernkarnkit W, Chulalaksananukul W, Yongvanich T (2008) Potential plant oil feedstock for lipase-catalyzed biodiesel production in Thailand. Biomass Bioenerg 32(12):1279–1286 20. Liang Y, Niu D, Hong WC (2019) Short term load forecasting based on feature extraction and improved general regression neural network model. Energy 166:653–663

AI-Based Automated Approach for Trend Data Generation and Competitor Benchmark to Enhance Voice AI Services Jayavel Kanniappan, Jithin Gangadharan, Rajesh Kumar Jayavel, and Aravind Nadanasabapathy

Abstract The worldwide growth of voice-based NLP systems is immense, particularly on the smartphone platform, which interacts with other connected devices. The growth of smartphones and NLP raises user expectations toward solutions with a high-quality end-to-end experience, adaptation to live trending data, and a better experience than competitor devices, in order to lead the market. This paper proposes an AI-based method to generate market-trending structured utterances and a method to validate the NLP system (ASR, NLG, E2E) across competitor devices. The proposed framework has four novel aspects: (1) generation of market data with structured text utterances; (2) auto script generation for native domains with key E2E validation parameters; (3) identification of ASR issues, segregation of the specific words involved, and generation of audio input to the system to define language-model or accent-model improvement areas; and (4) a semantic approach to validate the end-to-end system using dialog responses for all voice-enabled devices. The proposed framework has been used in the Samsung voice platform project named Bixby, where it helped reduce a great deal of manual effort, keep the NLP system current with market trending data in real time, and enhance the user experience to lead the market.

Keywords Artificial intelligence · Voice assistants · Competitor benchmarking · ASR · NLU · NLG · Text-to-speech · Trending data · BERT · Slot · Word error rate · Test automation · Semantic · AI/ML

J. Kanniappan (B) · J. Gangadharan · R. K. Jayavel · A. Nadanasabapathy Intelligence & IoT, Samsung R&D Institute Bangalore, Bangalore, India e-mail: [email protected] J. Gangadharan e-mail: [email protected] R. K. Jayavel e-mail: [email protected] A. Nadanasabapathy e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_17


1 Introduction

Over the past few years, the growth of NLP systems across fields has been very impressive and has turned into huge global demand with a high impact on consumer experiences. The market value of NLP, approximately USD 10.72 billion in 2020, is expected to reach USD 48.26 billion by 2026. Natural Language Processing plays a vital role in connecting humans and technology. NLP services are well recognized and accepted across different industry verticals like voice platforms, health care, banking, AR/VR, automotive, multimodal systems, and robots, enabling real-time decisions at a high accuracy level. Although NLP is used across industries, the heart of any such system is the datasets, such as text, images, video, and audio, used for model training, testing, and the evolution of market insights. Even though an AI system may look very sophisticated, the prospects of AI-based solutions depend not only on stable production data output but also on the quality of the voice solution in terms of market trending data, user navigation experience, seamless content delivery, and leading over competitor solutions. The persistent challenges for any AI-based solution are adopting new data, the frequency of data, human effort, and the quality assurance needed to ensure stability. The proposed research focuses on building an AI-based evaluation framework for smartphone voice-enabled platforms (Bixby, Google Assistant, and Alexa) with salient features such as generating market trending utterances from leading content providers, automatic text classification for native domains, automatic script generation with slot parameters, quality assessment of ASR with the respective word issues used to train the LM/AM models, and quality assessment of the NLP [1] end-to-end system on dialog responses using a semantic BERT model. The proposed framework enables users to benchmark competitor solutions with engineering KPIs like latency, memory usage, CPU, thermal behavior, and stability.

2 Related Work

Presently in the market, there are different ways in which voice assistants are benchmarked against competitors, such as manual assessment by voice and web-based evaluation tools that test skills and user experience. One research work, A [2], discusses offline utterance generation with a paraphrasing technique, producing utterances that contain different sequences of words but have the same meaning, using contextual embedding techniques. The generated utterances are static in nature and do not cover market trending topics or their variations. The proposed features are compared against the related work in Table 1. A team of researchers from the computer science department of a company [3] studied the usability, user experience, and usefulness of the Google Home smart speaker; the findings showed that Google Home is usable and user-friendly, but other voice assistants were not part of their analysis. Competitor tool B has a


Table 1 Tool feature comparison

Feature                         Proposed framework   Tool 'A'   Tool 'B'   Tool 'C'
Trending utterance generation   Yes                  No         No         No
Dialog-based E2E verification   Yes                  No         No         No
Auto script creation            Yes                  No         No         No
ASR word error rate             Yes                  No         No         No
Phonetic ASR verification       Yes                  No         No         No
Competitor benchmark test       Yes                  No         No         No
Voice mode test                 Yes                  Yes        No         Yes
Real user simulation            Yes                  No         No         No
Engineering KPIs                Yes                  Yes        Yes        Yes
Multiple language support       Yes                  Yes        Yes        Yes

mechanism to pull raw data, but not in a trending, structured, and meaningful form that can be fed to voice assistants. Other market tools, such as C, mainly focus on the verification of individual NLP components, and there is no common E2E automation framework to generate market trending utterances, assure the quality of AI-based voice-enabled devices, and benchmark against competitor solutions to understand market gaps and enhance AI-based systems. The proposed framework couples these capabilities into one solution to enhance voice-enabled AI services. The growth of AI-based voice-enabled devices is significant, and end users expect these devices to carry real-time trending data reflecting user interest in popular areas. It is therefore essential to enhance the production skills and training data of the NLP system in real time to stay connected with users and lead the market.

3 Proposed Framework

The proposed framework is designed with a layered architecture and a desktop-plus-client-device approach for market trending data evaluation with competitor benchmarking. In general, product development and quality teams face a huge challenge in identifying and assuring the quality of market trending data in real time or on a daily basis, as the effort can be very high due to the frequent exploration and collection of trending topics for a specified region and date. Hence, the proposed framework describes in detail the methods and research work for automating trend data collection and the quality evaluation of voice-based solutions. The research also adopts AI methods for evaluating market trending data in terms of the dialog responses from the voice solutions when assessing end-to-end quality. The proposed architecture incorporates three major components: an application layer, a framework layer, and a model/data layer, as shown in Fig. 1.


Fig. 1 Layered architecture

The application layer provides the user interface for device configuration, data sourcing from the web, user-defined natural utterance inputs, and the communication channel to the framework and model/data layers. The framework layer couples multiple functionalities such as structured data generation, auto script creation, device execution, validation parameters, and a log parser. The model and data layer, the heart of this framework, mainly deals with model-based evaluation of the NLP E2E system, native application classification, and auto script creation with the final UI parameters. The framework can be configured in terms of region, date, and voice assistants to fetch trending data and evaluate quality. The system takes these user inputs and crawls the web for the current trending topics across multiple sources, gathering market trending topics and information through web requests and interfaces as a standard approach, similar to the manual effort. After retrieving the


Fig. 2 User flow representation

trending topics, the related queries are parsed for the most relevant content to gather the unstructured utterances in a precise manner, as represented in Fig. 2. The unstructured data are then formatted and converted into structured utterances by applying a prefix and postfix based on the category classification of the topics, so that structured utterances are constructed in a more meaningful way, with a data accuracy of 95% using the improved prefix and postfix datasets. With the utterances for execution generated in real time through this automated approach, the process proceeds to execute those utterances on the voice solutions in parallel, evaluating the quality of the various components, such as automatic speech recognition (ASR), natural language processing, and generation of the dialog response, against one or more voice-based assistants. The execution is performed on connected mobile devices in parallel, with the different voice solutions for benchmarking configured by the user before the start of the whole process. Based on the device and target solution, the proposed framework initializes the device handler and the methods for assessing the preconditions of execution, and sets up the environment for execution readiness. The device handlers of the different voice solutions are invoked to perform the scenarios on the respective devices in an automated manner, similar to manual operation. The framework monitors in real time for the scenario completion


of the voice interactive flow and then parses the actual data from the voice solutions under test to evaluate their quality against the trending data. This process is automated using AI models to determine the end-to-end quality through BERT semantic similarity matching of the response text, performing the match contextually against the competitor devices for the dialog response evaluation with an accuracy of over 92% on the improved natural language datasets. The BERT AI model is deployed on a centralized server so that model evaluation can be obtained quickly through lightweight REST API services. The framework also gathers all the words wrongly recognized by the ASR component, with analytical data across the different market trending utterances. The overall process is carried out seamlessly in a simple and sophisticated manner, with an easy-to-use interface exposing all execution controls, and the accuracy and speed of the execution are balanced to achieve high benchmarking efficiency. The framework also generates detailed reports for the connected devices and configured voice-based solutions, with evaluated results for all the voice interactive components. The proposed framework is an efficient method for market trending data evaluation in real time without any effort in data collection; it thus avoids a huge manual effort in performing the execution, gives evaluation results in a consistent manner, and hence reduces the manual errors of executing thousands of utterances across voice-based solutions and competitor devices.
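For the ASR part, the wrongly recognized words can be quantified with the standard word error rate; a minimal sketch using the third-party jiwer library (an assumed tooling choice, since the paper does not name its implementation) is:

```python
# Sketch: word error rate between the spoken (reference) utterance and the
# ASR transcription; jiwer is an assumed third-party library choice.
import jiwer

reference = "play the latest squid game trailer"
hypothesis = "play the latest squid gain trailer"   # ASR output under test

wer = jiwer.wer(reference, hypothesis)
print(f"WER: {wer:.2f}")  # 1 substitution over 6 words ≈ 0.17

# Misrecognized words can be collected for LM/AM improvement analysis.
wrong = [(r, h) for r, h in zip(reference.split(), hypothesis.split()) if r != h]
print("misrecognized:", wrong)
```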

4 Trending Data Generation

In the modern era of technological growth, we observe surges in the popularity of specific topics among people for a limited duration of time, subject to how long the popularity lasts with a larger audience. We usually call these trending topics or market trends, as given in Table 2. Market trends [4] are the topics searched most by people over the Internet, as they are highly trending in the market, and they can be fetched from a leading source like Google Trends. In the present digital world, the mobile user experience is being greatly improved, and there are many more ways to enhance the voice capabilities of personal assistants for market trending topics and related areas in terms of speech recognition, domain classification, and providing a proper and meaningful dialog response. Speech recognition is the key area: the user's speech must be recognized properly so that the system can process it and proceed to give the desired response and actions. Hence, it is very challenging to construct utterances that simulate the trending topics and queries related to trending areas. In this research, we explain the continuous process of obtaining market trending data and related queries automatically, with a popularity ratio and category to determine the strength of the trend and corresponding weights, so as to give a constructive set of top utterances from the market trending areas for a specific duration of time and region.

Table 2 Trending topics and popularity

Trend topic           Popularity hit   Category
Omicron               10M+             Virus
Elon Musk             5M+              Personality
Liverpool             50K+             Sports/football
Squid game            20K+             Series/web
WHO                   20K+             Organization/health
Zepto                 20K+             Virus
Harry Potter return   10K+             Series/TV

4.1 Generation of Unstructured Utterances

It is essential to obtain utterance variations for the trending topics that cover all the interesting and popular sub-topics of each trending area in a particular region, as shown in Fig. 3, which gives insights into people's interests. Usually, many trending topics emerge on any specific day. Knowing the popularity of each topic is important in order to prioritize the benchmarking analysis; hence, the popularity hit count helps to identify the usage statistics from the market data. It gives a broader perspective of the global or regional market trends and expectations, and it is used to evaluate how well the voice solutions handle those market trends. In general, the top trending topics with very high popularity counts in the specified region are fetched for the evaluation, derived from the leading source, Google Trends, using web services; a minimal sketch of this fetching step is given below.
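The snippet below sketches the fetching step using pytrends, an unofficial Python client for Google Trends. The paper only states that Google Trends is queried through web services, so the library choice, region, keyword and timeframe here are assumptions; note also that the "top" related-queries table can be None for low-volume topics.

```python
# A minimal sketch of fetching trending topics and their related queries with pytrends.
from pytrends.request import TrendReq

pytrends = TrendReq(hl="en-US", tz=330)

# Daily trending searches for a region (returns a pandas DataFrame of topic strings).
trending = pytrends.trending_searches(pn="india")
print(trending.head())

# Related queries for one trending topic: the raw material for unstructured utterances.
pytrends.build_payload(kw_list=["Omicron"], timeframe="now 7-d", geo="IN")
related = pytrends.related_queries()["Omicron"]["top"]
if related is not None:
    print(related.head())
```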

Fig. 3 System flow of fetching unstructured utterances

232

J. Kanniappan et al.

4.2 Generation of Structured Utterances

The queries obtained from the trending topics are usually in an unstructured format and must be converted into meaningful, structured utterances for the evaluation of voice-based personal assistants. The conversion applies predefined prefix and postfix templates selected according to the context and the category of the trending topic and its related queries. The most efficient way to generate the utterance variations is to construct structured utterances [5] through context and topic category classification, as shown in Fig. 4, combining each topic with the most relevant prefixes and postfixes; all possible query forms are defined in the rule-based template used for the construction of the structured utterances. A minimal sketch of this template expansion is given below.
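The sketch below illustrates the rule-based prefix/postfix conversion; the category names and template strings are illustrative assumptions, not the framework's actual rule set.

```python
# A minimal sketch of category-driven prefix/postfix templates for structuring utterances.
TEMPLATES = {
    "virus": ["what is the latest news on {}", "how many cases of {} are there today"],
    "personality": ["who is {}", "tell me about {}"],
    "sports/football": ["what is the score of the {} match", "when does {} play next"],
}

def structure_utterances(topic: str, category: str) -> list:
    """Expand one raw trending topic into structured utterances."""
    templates = TEMPLATES.get(category.lower(), ["tell me about {}"])
    return [t.format(topic) for t in templates]

print(structure_utterances("Omicron", "Virus"))
# ['what is the latest news on Omicron', 'how many cases of Omicron are there today']
```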

5 AI-Based Voice Solutions Benchmarking

It is essential to evaluate any AI voice-based personal assistant against its competitors in order to determine its strengths and identify opportunities for quality improvement. The quality of a voice assistant on open trending data cannot be concluded from a small sample; the evaluation must therefore be carried out with a very large set of real-time trending data over a long period, so that the behavior of the solutions can be assessed alongside the competitor benchmarking. The key components of voice intelligence solutions are Automatic Speech Recognition (ASR), Natural Language Processing (NLP) and Natural Language Generation (NLG), as shown in Fig. 5; all are equally important in determining the End-to-End quality of the complete utterance flow and in providing a high-quality user experience to customers. Hence, to determine the quality over a large dataset, it is essential to automate the competitor benchmarking of trending data in real time, on a continuous basis and over a long period.

Fig. 4 System flow of structured utterances conversion


Fig. 5 Components of voice-enabled solutions

5.1 Model Evaluation Concepts on BERT

This section describes the model-based evaluation of the dialog response texts using the BERT model. The similarity matching is based on context rather than character- or word-length matching, since the latter deviates under slight changes in the texts; hence, it is essential to evaluate the dialog responses with a BERT model [6]. Various methods and techniques of using BERT and transformer layers are applied for application and feature classification [7], as well as named entity recognition to fetch the slots for the final End-to-End verification.

5.1.1 Market Trending Data Evaluation Methods

The market trending data evaluation is performed by fetching the market trending topics and constructing meaningful utterances from their related queries. These trending utterances cannot be constructed by human effort, since manual construction is neither real time nor error free. In the proposed method, we focus on collecting the top trends in a specific region, either for a certain duration or on a daily basis, for the evaluation. The actual execution output is taken from the different competitors to evaluate the dialog responses for the trending data, which is the most appropriate basis for validating the End-to-End accuracy of the voice interaction flow. The evaluation is based on the context similarity of the dialog response texts: the BERT model scores each pair of responses as entailment, neutral or contradiction [6], and the accuracy is determined from these scores. A minimal sketch of such pairwise scoring is given below.
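The snippet below sketches the pairwise entailment/neutral/contradiction scoring using a public natural language inference (NLI) cross-encoder from Hugging Face; the paper does not name its exact BERT checkpoint, so the model choice and example responses here are assumptions.

```python
# A minimal sketch of scoring one dialog-response pair as entailment/neutral/contradiction.
from transformers import pipeline

nli = pipeline("text-classification", model="cross-encoder/nli-distilroberta-base")

reference = "It is 24 degrees and sunny in London today."   # competitor response (expected)
candidate = "Right now London is sunny and 24 degrees."     # response under test

# the cross-encoder classifies the pair contextually rather than by string length
result = nli({"text": reference, "text_pair": candidate})
print(result)  # e.g. {'label': 'entailment', 'score': 0.9...}
```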

5.1.2 Native Application Data Evaluation Methods

The evaluation of native applications uses BERT classification to identify the right set of dialog responses and their variations. A transformer layer is used to identify the slot parameters, so that the tasks performed by the voice assistant can be verified as successful by checking the slots either in the screen layout of the UI or in the log strings. These AI-based techniques [7] assist throughout the process in accessing the right dialog responses of the defined native applications and in the End-to-End slot-level validations of the final executed layout view.

5.2 ASR End-to-End Evaluation Method

In voice-based intelligence, Automatic Speech Recognition (ASR) is one of the three major components alongside NLP and NLG, and the ASR validation flow is shown in Fig. 6. If the ASR fails to detect the right words from the user's voice input, it can mislead the NLU prediction and degrade the overall end-user experience [8]. The following test approaches check the quality of the ASR engine [9]; the TTS playback is carried out in these modes:

(1) TTS configuration according to age (teen, adult and elder): verifies that the ASR can recognize words uttered by different age groups.
(2) TTS configuration according to place (country/region): verifies that the ASR can recognize words uttered in different accents and languages.
(3) TTS configuration according to gender (male and female): verifies that the ASR can recognize words uttered by male and female speakers.
(4) TTS configuration according to modulation and speed: verifies that the ASR can recognize words uttered at different speeds (fast and slow).
(5) TTS configuration with different volume levels (low, mid and high): verifies that the ASR can recognize words uttered at different volume levels.

Fig. 6 System flow of ASR word validation process

The TTS playback is additionally varied over distance, either near or far from the microphone, and combined with simulated noise conditions to determine the quality of speech recognition [9] in different home and user environments. ASR evaluation is always a challenging part of a voice-based intelligence solution. ASR can be evaluated in multiple ways; we follow the approach below. Expected ASR (E-ASR) is the input data used to test the voice-based intelligence (either a real user voice or automated TTS), and Actual ASR (A-ASR) is the text produced by the ASR engine, collected from logs, an API call or the GUI. The automated ASR verifications fall into the categories below, with the resulting ASR evaluation accuracy shown in Fig. 7.

Fig. 7 Evaluation results of ASR with trending utterances

Entailment cases: the E-ASR and A-ASR strings are verified with a string matching method that extracts the missing or extra recognized words. If the E-ASR and A-ASR strings match, the test case is considered PASS. This is the most reliable and cost-effective approach, with minimal manual effort involved; it automatically evaluates about 85% of the ASR test cases in real-time competitor execution and evaluation of voice solutions.

Representation difference cases: some parts of the E-ASR are numbers while the A-ASR represents them in text format (this includes numbers, dates, mathematical symbols, logic operator symbols, etc.).

Additional character difference cases: the A-ASR includes additional characters or symbols (such as 's , . ! ? ~ ") or vice versa.

Failure cases: actual failures are identified using a string match algorithm, which returns the E-ASR words missing from the A-ASR. Such a wrongly detected word is a candidate for training or improving the ASR engine. Based on the tool configuration, the failed words alone are re-tested with the different test modes listed above (age, volume, speed and gender) to improve the ASR acoustic model (AM), while the language model (LM) is improved with the new trending keywords fetched from the market trending data. A minimal sketch of the word-level comparison is given below.
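The sketch below implements the word-level E-ASR vs. A-ASR comparison with the Python standard library; the sample transcripts are illustrative.

```python
# A minimal sketch of extracting missing/extra words between expected and actual ASR text.
import difflib

def compare_asr(expected: str, actual: str):
    """Return the (missing, extra) words between the expected and actual transcripts."""
    exp, act = expected.lower().split(), actual.lower().split()
    missing, extra = [], []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(None, exp, act).get_opcodes():
        if op in ("delete", "replace"):
            missing += exp[i1:i2]   # expected words the ASR failed to recognize
        if op in ("insert", "replace"):
            extra += act[j1:j2]     # words the ASR added or substituted
    return missing, extra

missing, extra = compare_asr("play the latest squid game trailer",
                             "play the latest squid gain trailer")
print(missing, extra)  # ['game'] ['gain'] -> 'game' becomes an ASR retraining candidate
```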

5.3 Model-Based E2E Evaluation of Trend Data

Natural Language Generation (NLG) plays a very important role in user interaction. When the user provides input to the voice-based intelligence, the input is processed by the NLU, which identifies the action to be performed on the user's behalf; the role of the NLG is to communicate that action back to the user as voice playback. Evaluating NLG on market trending data is quite challenging [10] because of its dynamic nature: when an automated test is run with a trend dataset as input, there is no expected NLG output for the test cases, and only the actual NLG data can be collected from logs, the GUI or an API call. To the best of our knowledge, no framework is available to date for evaluating the NLG of trend input data. Therefore, we researched this area to evaluate NLG with AI using context extraction [11] from the response texts, applying the approach to automated trend data input and thereby reducing the manual effort to a minimum. In our research, the actual NLG data of a leading competitor is used as the expected data for NLG verification. The method applies BERT context-based similarity analysis with word tokenization [12] on the two NLG responses (the voice intelligence response and the competitor voice intelligence response), as shown in Fig. 8. The final evaluation result is measured by the entailment and contradiction outputs of the BERT semantic model, and the accuracy is determined from the True Positive and True Negative results for the overall evaluation, as shown in Fig. 9.

$E_a = \frac{TP}{TP + FP}$  (1)

where $E_a$ is the accuracy of entailment, TP is True Positive and FP is False Positive; it measures the overall accuracy of the positive outcome, which is the most relevant.

$C_a = \frac{TN}{TN + FN}$  (2)

where $C_a$ is the accuracy of contradiction, TN is True Negative and FN is False Negative; it measures the overall accuracy of the negative outcome. A small worked example of both metrics follows.
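As a worked example of Eqs. (1) and (2), with illustrative counts:

```python
# A minimal sketch of Eqs. (1) and (2); the counts below are illustrative.
def entailment_accuracy(tp: int, fp: int) -> float:
    return tp / (tp + fp)             # Ea = TP / (TP + FP)

def contradiction_accuracy(tn: int, fn: int) -> float:
    return tn / (tn + fn)             # Ca = TN / (TN + FN)

print(entailment_accuracy(920, 80))    # 0.92
print(contradiction_accuracy(90, 10))  # 0.90
```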


Fig. 8 Approach to BERT semantic similarity of texts

Fig. 9 Evaluation results of NLG with trending utterances


5.4 Model-Based NLU and E2E Evaluation for Native Apps

Most native test cases are expected to trigger some action on the user's side, so it is important to verify that the action performed is correct for the input given by the user. A usual automation approach is to give input to the voice intelligence using TTS and to verify the expected elements in the screen launched (or redirected to) by the voice intelligence app, checking the state of those elements with a GUI-based automation library or device framework utilities. The major drawback is that the manual creation of such scripts takes huge effort. When testing AI-based solutions, testing cannot be limited to a few scenarios or scripts; different types of testing, such as trend data, natural variations, functionality, sanity, market data and competitor inputs, must be performed to assure product quality before it reaches the user. In such situations it is very tedious to create automation scripts, stabilize them and then test product quality with the manually created scripts; this is a global challenge in verifying AI-based solutions [13] through automation, especially voice-based intelligent solutions. In the proposed framework, we provide a solution that verifies the End-to-End voice interactive scenarios dynamically by evaluating the extracted actual content using AI-assisted approaches, as shown in Fig. 10. The sample data given in Table 3 denotes the slot extraction [14] used to construct the exact precondition to be set for scenario execution and the verification criteria applied after the voice assistant executes, in an automated way [15]. Therefore, we researched how to create automation scripts for the test scenarios automatically with the help of AI algorithms, using a BERT text classification model, a BERT entity detection algorithm and transformers [16]. A minimal sketch of the token-level tagging behind Table 3 is given below.

Fig. 10 Systematic approach to auto scenario validation


Table 3 Sample slot tagging with POS fields for E2E verification

     Sentence #    Word       POS   Tag
0    Sentence: 1   Show       VBP   O
1    Sentence: 1   Pictures   NNS   O
2    Sentence: 1   Taken      VBN   O
3    Sentence: 1   At         IN    O
4    Sentence: 1   London     NNP   O

6 Results and Impact

The proposed framework was deployed for the Samsung voice-enabled device benchmark comparison of market trending datasets across competitors using the approaches described above; the effectiveness of the proposed framework against the manual approach is given in Table 4. The overall evaluation of the voice-based solutions is assessed with the proposed approach, and the quality verdict is derived in terms of user experience, the actions performed and the correctness of the information across the voice assistant competitor benchmark evaluation. The proposed method is very efficient: it not only assesses the quality of the voice solutions using AI approaches, as given in Table 5, but also saves huge human effort, as given in Table 6, thereby avoiding the errors that occur during the manual process and providing consistent evaluation without any human bias. The result analysis reports the accuracy for every trending topic, which changes dynamically every day, in a fully automated approach; thus no manual effort is needed to gather the new trending data and prepare scripts each day when market trends change.

Table 4 Manual versus proposed method effectiveness comparison

Criteria             Manual approach       Proposed framework
Execution/day        ~150 TCs/engineer     ~4320 TCs
Parallel execution   No                    Yes
Voice accents        Manual                100% automated
Test reporting       Manual benchmark      Automated benchmark
Test configuration   Manual setup          Auto setup
Man days savings     No                    96.52%

Table 5 Result summary of voice assistants

Voice assistants   Quality         Correctness
Competitor 'A'     Excellent       Excellent
Competitor 'B'     Above average   Above average
Competitor 'C'     Average         Average


Table 6 Results and impact analysis

Voice-enabled devices   Total open datasets   Man days saved
Competitor 'A'          31,500                150
Competitor 'B'          31,500                150
Competitor 'C'          31,500                150

Table 7 Results of trending category across assistants

Trend dataset category   Total data   App 'A'   App 'B'   App 'C'
COVID                    3350         0.85      0.80      0.77
Cryptocurrency           3100         0.90      0.88      0.81
Movies/songs             5620         0.87      0.79      0.75
Stock market             1650         0.92      0.87      0.81
World day                1800         0.95      0.90      0.84
Personalities            3550         0.88      0.86      0.78
Web series               1800         0.92      0.88      0.75
Festival names           2600         0.89      0.84      0.77
Sports                   3200         0.93      0.90      0.82

The size of the utterance set generated automatically is dynamic, depending on the duration and the region. The accuracy on all trending topics, assessed against each of the voice-based solutions, is given in Table 7.

6.1 Assessment of Effectiveness Result

The proposed framework played a crucial role in assuring the quality, correctness and stability of voice-enabled products on market trending datasets, while saving huge manual effort across multiple execution cycles. Let the number of test cases used for testing be n and the number of test cases in one test cycle be T, so that n/T is the number of test cycles. In manual testing, one tester can execute at most 150 test cases in one day, so T/150 is the number of testers needed per test cycle. The total manual effort is therefore

$\left(\frac{T}{150}\right)\left(\frac{n}{T}\right) = \frac{n}{150}$  (3)  (i)

With automation, 4320 test cases can be executed in one day for competitors 'A', 'B' and 'C', with one tester performing failure triaging, so T/4320 is the per-cycle effort and the total number of testers required is

$\left(\frac{T}{4320}\right)\left(\frac{n}{T}\right) = \frac{n}{4320}$  (4)  (ii)

Hence, using (i) and (ii), the percentage decrease in manpower is calculated as

$\left(\frac{n}{150} - \frac{n}{4320}\right) \Big/ \left(\frac{n}{150}\right) \times 100 = 96.52\%$  (5)  (iii)

Effectiveness from the proposed automation framework (a quick numeric check of (5) follows):
• 450 man days saved.
• 15,151 defects captured.
• 20 top market trending categories covered.
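Note that the saving in Eq. (5) is independent of n, as the following quick check shows:

```python
# A quick check of Eq. (5): the n terms cancel, leaving 1 - 150/4320.
saving = (1 - 150 / 4320) * 100
print(round(saving, 2))  # 96.53, reported as 96.52% in the text
```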

7 Conclusion and Future Scope

As market demand surges for AI-enabled solutions across the industry and across devices such as smartphones, TVs, refrigerators, speakers, watches, cameras and robots, it is crucial to define efficient mechanisms that meet customer quality expectations with real-time data, remain human centric and lead the market. The proposed framework was applied to the voice-enabled solutions of Samsung Bixby and its competitors on the smartphone platform to improve dynamic data for a production NLP system. The framework can be expanded to other AI systems and to different types of datasets such as image, audio and video. As future work, it can be extended to the open-source global community for text-input AI solutions on the smartphone platform.

References

1. Khurana D, Koli A, Khatter K, Singh S (2017) Natural language processing: state of the art, current trends and challenges
2. Abujabal A, Roy RS, Yahya M, Weikum G (2018) ComQA: a community-sourced dataset for complex factoid question answering with paraphrase clusters. CoRR abs/1809.09528
3. Barzilay R, McKeown KR (2001) Extracting paraphrases from a parallel corpus. In: Proceedings of the 39th annual meeting on Association for Computational Linguistics (ACL), pp 50–57
4. Google Market Trend (2021). https://trends.google.com/trends/
5. Niu X, Kelly D (2014) The use of query suggestions during information search. IP&M 50(1):218–234
6. Mohamad M (2020) Semantic similarity with BERT. https://keras.io/examples/nlp/semantic_similarity_with_bert/. 15 Aug 2020
7. Sterbak T (2018) Named entity recognition with BERT. https://www.depends-on-the-definition.com/named-entity-recognition-with-bert/. 10 Dec 2018
8. A study on automatic speech recognition systems. In: International symposium on digital forensics and security (ISDFS) (2020)
9. Paul R, Beniwal R, Kumar R, Saini R (2018) A review on speech recognition methods. Int J Future Revolution Comput Sci Commun Eng 4(2):292–298
10. Smith RW, Hipp DR (1995) Spoken natural language dialog systems: a practical approach. Oxford University Press
11. Tur G, De Mori R (2011) Spoken language understanding: systems for extracting semantic information from speech. John Wiley and Sons
12. Wang B, Jay Kuo C-C. Sentence embedding method by dissecting BERT-based word models
13. Androutsopoulos I, Paliouras G, Karkaletsis V, Sakkis G, Spyropoulos CD, Stamatopoulos P (2000) Learning to filter spam e-mail: a comparison of a naive Bayesian and a memory-based approach. arXiv preprint cs/0009009
14. Feldman S (1999) NLP meets the Jabberwocky: natural language processing in information retrieval. ONLINE-WESTON THEN WILTON 23:62–73
15. Ritter A, Clark S, Etzioni O (2011) Named entity recognition in tweets: an experimental study. In: Proceedings of the conference on empirical methods in natural language processing, Association for Computational Linguistics, pp 1524–1534
16. Singh V (2021) Named entity recognition using transformers. https://keras.io/examples/nlp/ner_transformers/. 23 Jun 2021

Identification of ADHD Disorder in Children Using EEG Based on Visual Attention Task by Ensemble Deep Learning Swati Aggarwal, Nupur Chugh, and Arnav Balyan

Abstract Attention deficit hyperactivity disorder (ADHD) is a neurodevelopmental illness that is complicated, universal, and heterogeneous. Children with ADHD have a hard time staying focused and managing their activities. For children with ADHD, which is today a major issue, early screening and diagnosis are essential. The purpose of this research is to create a technology that will help doctors distinguish between ADHD and healthy youngsters using electroencephalography (EEG) based on the visual attention task. Because of the use of stand-alone classifiers, a few studies that employed EEG data to diagnose ADHD were unable to improve accuracy. As a result, the authors created a novel analysis framework in this research by performing an average ensemble on long short-term memory (LSTM) and gated recurrent unit (GRU), both of which have a lot of potential for diagnosing the problem. The suggested model obtained an accuracy of 97.9% and 95.33% for training and testing data, respectively. The accuracy of the model validates the detection efficiency over the performance of stand-alone deep learning models. As a result, this new ensemble model is thought to be a good option for detecting and diagnosing children with an attention disorder. Keywords Attention deficit hyperactivity disorder (ADHD) · Deep learning · Ensemble · EEG

S. Aggarwal · N. Chugh (B) · A. Balyan
Netaji Subhas Institute of Technology, New Delhi, India
e-mail: [email protected]

1 Introduction

Attention-deficit/hyperactivity disorder (ADHD) is a heterogeneous neurodevelopmental disorder that affects a large number of people. Medical symptom-based diagnostic criteria for ADHD depend largely on subjective information obtained from different sources, which may lead to biases and discrepancies in diagnoses [1]. Some of the features of this behavioral condition are inattention, impulsivity, and hyperactivity [2]. It often affects children, but symptoms may also persist into and affect adulthood. The Centers for Disease Control and Prevention in Atlanta, GA, conducted a study in 2011 and found that 11 percent of children (4–17 years old) in the United States were diagnosed with ADHD [3]. In children with ADHD, academic performance and social interactions are hampered. Of 400 primary care physicians polled, 44% said the ADHD diagnostic criteria were unclear, 72% said it was simpler to detect ADHD in children than in adults, and 75% rated the accuracy of the diagnostic criteria as "poor" or "moderate" [4]. Because of the difficulty and inaccuracy of ADHD diagnosis, research has been done on developing assistive tools to shorten the time and improve the accuracy of the diagnosis. In recent years, electroencephalography (EEG) signals have been successfully applied to ADHD diagnosis due to their accessibility, informativeness, and low cost [5–7]. Researchers have analyzed different measures for diagnosing ADHD using EEG signals, such as event-related potential (ERP) markers [8, 9], power spectral density (PSD) [10], current source density (CSD) [11], and univariate and multivariate EEG measurements [12]. The most commonly documented effect in the literature is a large difference in the frequency content of the EEG signal between ADHD and healthy controls, appearing as distinct electrical distributions on the scalp obtained from the EEG's relative band power [13–15]. Based on elaborately hand-crafted ADHD features, machine learning algorithms have been utilized to build classification models as complementary diagnostic tools, such as logistic regression (LR) [16, 17], linear discriminant analysis (LDA) [18], and the support vector machine (SVM) [19–21]. Among deep learning methods, convolutional neural networks (CNN) [22–24] and the long short-term memory (LSTM) architecture of recurrent neural networks [25, 26] have been used for ADHD diagnosis, with CNN being the most common. Although many researchers have explored machine learning and deep learning methods, several studies [27, 28] still suffered from high error rates. Because the relevant information in the frontal and parietal lobes is dispersed, it is challenging to create machine learning algorithms that can distinguish ADHD reliably using EEG. As a result, recent studies [13, 22, 29] have used a variety of techniques to improve the separability between ADHD and healthy controls using attributes derived from multichannel EEG signals. However, these studies used stand-alone machine learning algorithms as classifiers, which are incapable of making collective decisions. Recently, researchers have explored ensemble learning to deal with complex variations in features and biases, improving prediction performance by combining the results of several simple classifiers using voting [30], bagging [31], stacking [32], or boosting [33, 34] techniques. Ensemble learning techniques (ELT) were used to differentiate ADHD subjects from healthy controls in three recent studies [13, 35, 36]. A voting-based ELT was applied to EEG recordings to differentiate ADHD adults from healthy controls, but the reported discrimination accuracy was low (0.65).


To address these challenges, the authors combine deep learning and ensemble learning to perform ADHD classification in this paper. The contributions are as follows:

1. An average ensemble framework comprising two deep learning architectures, long short-term memory (LSTM) and gated recurrent unit (GRU), is introduced to distinguish ADHD controls from healthy controls.
2. The average ensemble strategy is applied to mitigate the bias-variance problem.
3. Fivefold cross-validation is used to tune the hyperparameters and reduce over-fitting.
4. The average ensemble framework combining LSTM and GRU is evaluated on an ADHD dataset; the findings show that the proposed approach has greater potential than state-of-the-art methods for detecting attention-deficit/hyperactivity disorder.

2 Methods

Preprocessing is used to remove noise and other artifacts beforehand. The data is then divided into five groups using the fivefold cross-validation technique. In the first stage of classification, two deep learning architectures, LSTM and GRU, are trained on the training data and applied to the testing data to predict each sample's class. Finally, the authors aggregate the first-stage predictions using an average ensemble (the second-stage ensemble model) in order to reduce the generalization error and provide a better result.

2.1 Subjects

The authors used the dataset "EEG data for ADHD/control children," which is publicly available on IEEE DataPort [37]. The study included 61 children with ADHD and 60 healthy controls (boys and girls) between the ages of 7 and 12. The ADHD children were diagnosed by an experienced psychiatrist based on DSM-IV guidelines [38] and had been given Ritalin for up to 6 months. The subjects in the control group had no psychiatric illnesses and no reports of high-risk activities.

2.2 Preprocessing

EEG signals are frequently contaminated by power-line sources and by artifacts such as cardiac signals (ECG), movement artifacts induced by muscle contraction (EMG), and ocular signals induced by eyeball movement (EOG). To reduce noise and artifacts, the data is preprocessed using automatic tools or by skilled visual inspection. In this analysis, the authors utilized the EEGLAB toolbox [39] for preprocessing. The EEG signals were filtered using a finite impulse response (FIR) bandpass filter with a frequency range of 2.0–50 Hz, and the data was re-referenced to an average reference. Independent component analysis (ICA, runica algorithm) was used to reduce ocular artifacts: the multichannel signals were decomposed into ICA components, and the components meeting a 90% threshold were selected and eliminated. The data was then segmented into 4-s windows with 75% overlap using a Hanning window; these settings produced the best results for the suggested approach. The training and testing data have the format (number of epochs × sampling rate × channels). There were 11,555 epochs in the training set and 4952 epochs in the testing set. An analogous pipeline is sketched below.
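The authors performed these steps in the MATLAB-based EEGLAB toolbox; purely as an illustration, an analogous pipeline can be sketched in MNE-Python. The file name is hypothetical, and the ICA settings are assumptions (in particular, n_components=0.90 keeps the components explaining 90% of the variance, which is only one reading of the paper's 90% threshold).

```python
# A minimal sketch of an EEGLAB-like preprocessing pipeline in MNE-Python.
import mne

raw = mne.io.read_raw_edf("adhd_subject01.edf", preload=True)  # hypothetical file
raw.filter(2.0, 50.0, fir_design="firwin")     # FIR band-pass, 2-50 Hz
raw.set_eeg_reference("average")               # re-reference to the average reference

ica = mne.preprocessing.ICA(n_components=0.90, random_state=42)
ica.fit(raw)
# ... inspect and mark ocular components here, then remove them ...
ica.apply(raw)

# 4-s windows with 75% overlap (i.e., 3 s of overlap between consecutive windows)
epochs = mne.make_fixed_length_epochs(raw, duration=4.0, overlap=3.0, preload=True)
print(epochs.get_data().shape)  # (n_epochs, n_channels, n_times)
```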

2.3 Classification Models

We evaluate the prediction performance of two common RNN architectures for discriminating between normal and ADHD controls after preprocessing the dataset. In the first classification step, LSTM and GRU are used. Both architectures have been applied successfully to sequential signal classification in applications such as speech recognition [40, 41] and natural language understanding [42, 43], and are reviewed as follows.

Long Short-Term Memory Networks (LSTMs): Long short-term memory networks (LSTMs) are a class of recurrent neural network that uses weight sharing across time steps to interpret sequential data. Hochreiter and Schmidhuber [44] proposed them, and they have since been refined in several domains. The information flow in an LSTM is regulated via three gates: the forget gate, the input gate, and the output gate. LSTMs are specially built to capture long-term dependencies and perform exceptionally well in a variety of applications. Table 1 lists the parameters of the layers used; dropout layers are incorporated to prevent over-fitting.

Gated Recurrent Units (GRU): Gated recurrent units (GRU) are a type of recurrent neural network with a gated mechanism, first proposed by Cho et al. [45]. The GRU is a newer recurrent architecture that closely resembles the LSTM, but it does away with the cell state and instead uses the hidden state to convey information, and it has only two gates, a reset gate and an update gate. Table 2 lists the layers used.

Average Ensemble: In practice, there are various classification systems for ADHD, but none of them is completely correct, and individual classifiers may make errors in distinct regions of the input space. Assembling numerous classifiers may therefore yield better performance than stand-alone models. The second-stage model optimally combines the forecasts of the first-stage models into the final predictions.

Table 1 Summary of LSTM layers

Layer used   Type of layer   Shape       Parameters
lstm_3       LSTM            (512, 64)   21,504
dense_7      Dense           (512, 32)   2080
lstm_4       LSTM            (512, 32)   8320
dropout_4    Dropout         (512, 32)   0
dense_8      Dense           (512, 16)   528
dropout_5    Dropout         (512, 16)   0
lstm_5       LSTM            (16)        2112
dense_9      Dense           (16)        272
dense_10     Dense           (2)         34

Table 2 Summary of GRU layers

Layer used   Type of layer   Shape       Parameters
gru_3        GRU             (512, 64)   16,320
dense_11     Dense           (512, 32)   2080
gru_4        GRU             (512, 32)   6336
dropout_6    Dropout         (512, 32)   0
dense_12     Dense           (512, 16)   528
dropout_7    Dropout         (512, 16)   0
gru_5        GRU             (16)        1632
dense_13     Dense           (2)         34

To stack the classifiers in this paper, the authors use an average ensemble model: in the second step, the two independently trained deep learning architectures (LSTM and GRU) are combined, and their outputs are averaged to provide the final prediction. The data flow graph is shown in Fig. 1. The authors employed fivefold cross-validation to determine the accuracy of the proposed approach at this stage. A minimal sketch of the models and the second-stage averaging is given below.
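As an illustration only, the sketch below mirrors the layer summaries in Tables 1 and 2 and shows the second-stage averaging. The 19-channel, 512-sample input shape is inferred from the parameter counts in the tables and is an assumption, as are the dropout rates.

```python
# A minimal sketch of the two first-stage RNNs and the second-stage average ensemble.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def make_rnn(cell):
    """Build one first-stage RNN; cell is layers.LSTM or layers.GRU."""
    return keras.Sequential([
        layers.Input(shape=(512, 19)),          # (time samples, EEG channels), assumed
        cell(64, return_sequences=True),
        layers.Dense(32),
        cell(32, return_sequences=True),
        layers.Dropout(0.5),                     # dropout rate not stated in the paper
        layers.Dense(16),
        layers.Dropout(0.5),
        cell(16),
        layers.Dense(2, activation="softmax"),   # normal vs. ADHD
    ])

lstm_model = make_rnn(layers.LSTM)
gru_model = make_rnn(layers.GRU)
# ... both models are trained independently on the same training split ...

def ensemble_predict(x: np.ndarray) -> np.ndarray:
    """Second-stage model: average the two softmax outputs."""
    return (lstm_model.predict(x) + gru_model.predict(x)) / 2.0
```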

2.4 Experimentation Setup

The experiments were conducted in the Python programming language with the following experimental parameters. For data splitting (hold-out), 70% and 30% of the data were used for training and testing, respectively. The batch size was set to 32, and the number of epochs to 80. The rmsprop optimizer was employed with a categorical cross-entropy loss function and a learning rate of 25%. Weight decay and L2-regularizers were also used to reduce over-fitting in the various models.


Fig. 1 Flow graph of classification method used

We should point out that these models are all trained independently, and the outputs of the ensemble model are combined using the average/fusion technique. Furthermore, in both the single and ensemble learning models, a final dense layer was adapted to produce the two classes representing normal and ADHD controls. An NVIDIA GTX 1050 Ti was used for training and testing. The training configuration is sketched below.
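A minimal sketch of the stated training configuration follows; the data arrays are random placeholders shaped like the real epochs (4 s at the dataset's 128 Hz sampling rate gives 512 samples of 19 channels, an inference from Tables 1 and 2), and the L2 coefficient is an assumption, as the paper does not give one.

```python
# A minimal sketch of the Sect. 2.4 training configuration with placeholder data.
import numpy as np
from tensorflow import keras

x_train = np.random.randn(11555, 512, 19).astype("float32")        # placeholder epochs
y_train = keras.utils.to_categorical(np.random.randint(0, 2, 11555), 2)

model = keras.Sequential([
    keras.layers.Input(shape=(512, 19)),
    keras.layers.LSTM(64),
    keras.layers.Dense(2, activation="softmax",
                       kernel_regularizer=keras.regularizers.l2(1e-4)),  # assumed coefficient
])
model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=32, epochs=80, validation_split=0.3)
```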

3 Results

This section presents and examines the single and ensemble model findings obtained using the experimental setup described in the preceding section.

3.1 Results of Independent Architecture

The authors recorded loss and accuracy values for each fold during the training epochs, as well as for model validation. Figures 2 and 3 show the progress of the performance measures for training and validation of the LSTM and GRU. The models converged and reached their optimum accuracy within at most 70 epochs. For training and evaluation, the mean loss and mean accuracy were determined for each fold; the mean fivefold progress plot for training and validation is shown in Fig. 4.

Fig. 2 Progress plots for accuracy and loss values of fivefold cross-validation for training and validation of LSTM


Fig. 3 Progress plots for accuracy and loss on training and validating GRU


Fig. 4 Progress plots for mean accuracy and mean loss on training and testing LSTM and GRU



It can be seen in Figs. 2 and 3 that the training and testing accuracies increase from epoch 0 to 50; after this epoch, the accuracy curves for both training and testing data become steady.

3.2 Result of Ensemble Framework

The proposed framework was evaluated using a fivefold cross-validation approach. In each fold, 70% of the samples were used for training and 30% for testing. Table 3 presents the accuracies, together with the precision, recall, and F1-score for each fold (a computation sketch follows Table 3). The evolution of the assessment metrics over the epochs indicates how quickly the model approaches its best accuracy; the mean accuracy and mean loss recorded during training and validation are plotted in Fig. 5. Figures 6 and 7 show the confusion matrices based on the True Positive (TP), False Positive (FP), False Negative (FN), and True Negative (TN) terms, described below:

TP: the number of positive-class samples identified correctly by the proposed approach.
TN: the number of negative-class samples recognized correctly by the proposed approach.
FP: the number of negative-class samples the model predicted wrongly as positive.
FN: the number of positive-class samples the model predicted wrongly as negative.

Table 3 Evaluation metric result of ensemble framework

Fold   Sample   Precision (%)   Recall (%)   F1-score (%)   Accuracy (%)
1      Train    98.0833         98.1593      98.1212        97.9011
       Test     96.0421         95.5218      95.7813        95.2958
2      Train    98.6659         97.4424      98.0503        97.8362
       Test     96.1496         94.6912      95.4148        94.9122
3      Train    98.1038         99.2251      98.6613        98.4962
       Test     95.2331         97.3998      96.3042        95.8207
4      Train    98.4014         98.9733      98.6865        98.5286
       Test     96.2963         97.6526      96.9697        96.5879
5      Train    98.84           95.7381      97.2643        96.9926
       Test     96.9996         95.7381      94.5576        94.0642
Mean   Train                                                97.95
       Test                                                 95.33
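Purely as an illustration, the per-fold metrics of Table 3 can be computed as follows; the label arrays are hypothetical placeholders (1 = ADHD, 0 = control).

```python
# A minimal sketch of computing the per-fold metrics reported in Table 3.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # hypothetical fold labels
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]   # hypothetical ensemble predictions

precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
print(precision, recall, f1, accuracy_score(y_true, y_pred))
```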

Fig. 5 Progress plots for mean accuracy and mean loss on training and testing ensemble framework


Fig. 6 Confusion matrices for training ensemble framework for fivefold CV


Fig. 7 Confusion matrices for testing ensemble framework for fivefold CV



Fig. 8 Comparison of accuracy of single independent architectures (LSTM, GRU) and ensemble framework for train and test data

3.3 Comparison of Classification Results

The single independent deep learning architectures and the ensemble framework were tested on the same dataset to corroborate the efficacy of the proposed average ensemble framework. The column chart of the accuracies obtained by the classifiers on all test sets is shown in Fig. 8; it shows that the ensemble framework produces better results than the single independent architectures.

4 Conclusion

In this study, the authors proposed an average ensemble framework for classifying children with ADHD using an EEG-based visual attention task. The suggested ensemble framework performed admirably, with 97.9% training accuracy, outperforming the single neural network architectures, LSTM and GRU. Using ensemble techniques to tackle ADHD identification is therefore advantageous, and the suggested framework can be broadened to other medical applications and brain signal recordings. The authors demonstrated that the proposed framework has remarkable learning power for detecting the primary aberration in ADHD children, which offers hope for developing a highly reliable auxiliary diagnostic system. Furthermore, the performance might be enhanced by using more datasets, and different fusion methodologies could be explored.


References

1. Luo Y, Alvarez TL, Halperin JM, Li X (2020) Multimodal neuroimaging-based prediction of adult outcomes in childhood-onset ADHD using ensemble learning techniques. NeuroImage Clin 26:102238
2. Arns M, Heinrich H, Strehl U (2014) Evaluation of neurofeedback in ADHD: the long and winding road. Biol Psychol 95:108–115
3. Visser SN, Danielson ML, Bitsko RH, Holbrook JR, Kogan MD, Ghandour RM, Perou R, Blumberg SJ (2014) Trends in the parent-report of health care provider-diagnosed and medicated attention-deficit/hyperactivity disorder: United States, 2003–2011. J Am Acad Child Adolesc Psychiatry 53(1):34–46
4. Marshall P, Hoelzle J, Nikolas M (2021) Diagnosing attention-deficit/hyperactivity disorder (ADHD) in young adults: a qualitative review of the utility of assessment measures and recommendations for improving the diagnostic process. Clin Neuropsychol 35(1):165–198
5. Snyder SM, Rugino TA, Hornig M, Stein MA (2015) Integration of an EEG biomarker with a clinician's ADHD evaluation. Brain Behav 5(4):e00330
6. Janssen TWP, Bink M, Geladé K, van Mourik R, Maras A, Oosterlaan J (2016) A randomized controlled trial into the effects of neurofeedback, methylphenidate, and physical activity on EEG power spectra in children with ADHD. J Child Psychol Psychiatry 57(5):633–644
7. Rodríguez-Martínez EI, Angulo-Ruiz BY, Arjona-Valladares A, Rufo M, Gómez-González J, Gómez CM (2020) Frequency coupling of low and high frequencies in the EEG of ADHD children and adolescents in closed and open eyes conditions. Res Dev Disabil 96:103520
8. Kaur S, Singh S, Arun P, Kaur D, Bajaj M (2019) Event-related potential analysis of ADHD and control adults during a sustained attention task. Clin EEG Neurosci 50(6):389–403
9. Lau-Zhu A, Tye C, Rijsdijk F, McLoughlin G (2019) No evidence of associations between ADHD and event-related brain potentials from a continuous performance task in a population-based sample of adolescent twins. PLoS ONE 14(10):e0223460
10. Lenartowicz A, Truong H, Salgari GC, Bilder RM, McGough J, McCracken JT, Loo SK (2019) Alpha modulation during working memory encoding predicts neurocognitive impairment in ADHD. J Child Psychol Psychiatry 60(8):917–926
11. Ponomarev VA, Mueller A, Candrian G, Grin-Yatsenko VA, Kropotov JD (2014) Group independent component analysis (gICA) and current source density (CSD) in the study of EEG in ADHD adults. Clin Neurophys 125(1):83–97
12. González JJ, Méndez LD, Mañas S, Duque MR, Ernesto P, De Vera L (2013) Performance analysis of univariate and multivariate EEG measurements in the diagnosis of ADHD. Clin Neurophys 124(6):1139–1150
13. Tenev A, Markovska-Simoska S, Kocarev L, Pop-Jordanov J, Müller A, Candrian G (2014) Machine learning approach for classification of ADHD adults. Int J Psychophys 93(1):162–166
14. Arns M, Gordon E (2014) Quantitative EEG (QEEG) in psychiatry: diagnostic or prognostic use? Clin Neurophys J Int Fed Clin Neurophys 125(8):1504–1506
15. Lenartowicz A, Loo SK (2014) Use of EEG to diagnose ADHD. Curr Psychiatry Rep 16(11):498
16. Fernández A, Quintero J, Hornero R, Zuluaga P, Navas M, Gómez C, Escudero J, García-Campos N, Biederman J, Ortiz T (2009) Complexity analysis of spontaneous brain activity in attention-deficit/hyperactivity disorder: diagnostic implications. Biol Psychiatry 65(7):571–577
17. Buyck I, Wiersema JR (2014) Resting electroencephalogram in attention deficit hyperactivity disorder: developmental course and diagnostic value. Psychiatry Res 216(3):391–397
18. Duda M, Ma R, Haber N, Wall DP (2016) Use of machine learning for behavioral distinction of autism and ADHD. Transl Psychiatry 6(2):e732–e732
19. Mueller A, Candrian G, Kropotov JD, Ponomarev VA, Baschera G-M (2010) Classification of ADHD patients on the basis of independent ERP components using a machine learning system. Nonlinear Biomed Phys 4(1):S1. https://doi.org/10.1186/1753-4631-4-S1-S1
20. Anuradha J, Tisha R, Ramachandran V, Arulalan KV, Tripathy BK (2010) Diagnosis of ADHD using SVM algorithm. In: Proceedings of the 3rd annual ACM Bangalore conference–COMPUTE, vol 10, pp 1–4. https://doi.org/10.1145/1754288.1754317
21. Tenev A, Markovska-Simoska S, Kocarev L, Pop-Jordanov J, Müller A, Candrian G (2014) Machine learning approach for classification of ADHD adults. Int J Psychophys 93:162–166. https://doi.org/10.1016/j.ijpsycho.2013.01.008
22. Mohammadi MR, Khaleghi A, Nasrabadi AM, Rafieivand S, Begol M, Zarafshan H (2016) EEG classification of ADHD and normal children using non-linear features and neural network. Biomed Eng Lett 6(2):66–73
23. Dubreuil-Vall L, Ruffini G, Camprodon JA (2020) Deep learning convolutional neural networks discriminate adult ADHD from healthy individuals on the basis of event-related spectral EEG. Front Neurosci 14:251
24. Chen H, Song Y, Li X (2019) A deep learning framework for identifying children with ADHD using an EEG-based brain network. Neurocomputing 356:83–96
25. Chen H, Song Y, Li X (2019) Use of deep learning to detect personalized spatial-frequency abnormalities in EEGs of children with ADHD. J Neural Eng 16(6):066046
26. Liu R, Huang ZA, Jiang M, Tan KC (2020) Multi-LSTM networks for accurate classification of attention deficit hyperactivity disorder from resting-state fMRI data. In: 2nd International conference on industrial artificial intelligence (IAI), IEEE, pp 1–6
27. Markovska-Simoska S, Pop-Jordanova N (2017) Quantitative EEG in children and adults with attention deficit hyperactivity disorder: comparison of absolute and relative power spectra and theta/beta ratio. Clin EEG Neurosci 48(1):20–32
28. Loo SK, Cho A, Hale TS, McGough J, McCracken J, Smalley SL (2013) Characterization of the theta to beta ratio in ADHD: identifying potential sources of heterogeneity. J Attention Disord 17(5):384–392
29. Helgadóttir H, Gudmundsson ÓÓ, Baldursson G, Magnússon P, Blin N, Brynjólfsdóttir B, Emilsdóttir Á, Gudmundsdóttir GB, Lorange M, Newman PK, Jóhannesson GH (2015) Electroencephalography as a clinical tool for diagnosing and monitoring attention deficit hyperactivity disorder: a cross-sectional study. BMJ Open 5(1):e005500
30. Ruta D, Gabrys B (2005) Classifier selection for majority voting. Inf Fusion 6(1):63–81
31. Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
32. Wolpert DH (1992) Stacked generalization. Neural Netw 5(2):241–259
33. Johnston BA, Mwangi B, Matthews K, Coghill D, Konrad K, Steele JD (2014) Brainstem abnormalities in attention deficit hyperactivity disorder support high accuracy individual diagnostic classification. Hum Brain Mapp 35(10):5179–5189
34. Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55:119–139
35. Eloyan A, Muschelli J, Nebel MB, Liu H, Han F, Zhao T, Barber AD, Joel S, Pekar JJ, Mostofsky SH, Caffo B (2012) Automated diagnoses of attention deficit hyperactive disorder using magnetic resonance imaging. Front Syst Neurosci 6:61
36. Zhang-James Y, Helminen EC, Liu J, Franke B, Hoogman M, Faraone SV (2019) ENIGMA-ADHD working group, machine learning classification of attention-deficit/hyperactivity disorder using structural MRI data. bioRxiv 546671
37. https://ieee-dataport.org/open-access/eeg-data-adhd-control-children#files
38. Nelson-Gray RO (1991) DSM-IV: empirical guidelines from psychometrics. J Abnorm Psychol 100(3):308
39. Delorme A, Makeig S (2004) EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J Neurosci Methods 134(1):9–21
40. Kim J, El-Khamy M, Lee J (2017) Residual LSTM: design of a deep recurrent architecture for distant speech recognition. arXiv preprint arXiv:1701.03360
41. Shewalkar A (2019) Performance evaluation of deep neural networks applied to speech recognition: RNN LSTM and GRU. J Artif Intell Soft Comput Res 9(4):235–245
42. Huang Z, Xu P, Liang D, Mishra A, Xiang B (2020) TRANS-BLSTM: transformer with bidirectional LSTM for language understanding. arXiv preprint arXiv:2003.07000
43. Zheng Y, Liu Y, Hansen JH (2017) Navigation-orientated natural spoken language understanding for intelligent vehicle dialogue. In: 2017 IEEE intelligent vehicles symposium (IV), IEEE, pp 559–564
44. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
45. Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555

Machine Learning as a Service (MLaaS)—An Enterprise Perspective Ioannis Grigoriadis, Eleni Vrochidou , Iliana Tsiatsiou, and George A. Papakostas

Abstract Due to their outstanding performance, machine learning (ML) algorithms are being extensively used in applications covering several different domains. Recently, the rapid growth of cloud services has provided training infrastructures for complex ML models able to deal with big data, resulting in the rise of ML as a Service (MLaaS). As a consequence, ML applications have been deployed in systems, production models, and businesses. ML algorithms involve accessing data that is often privacy sensitive, which may result in security and privacy risks. In this context, this work examines MLaaS and its incorporation into businesses, covering a wide range of different sectors. Companies that develop ML applications are reviewed, and trends in ML-related jobs are reported. Moreover, data protection privacy is discussed, and the evolution of graphics processing units (GPUs), a necessary supporting technology for ML applications, is also considered.

Keywords MLaaS · Machine learning as a service · Artificial intelligence · MLOps · Enterprise · GPUs

I. Grigoriadis · E. Vrochidou · I. Tsiatsiou · G. A. Papakostas (B)
MLV Research Group, Department of Computer Science, International Hellenic University, 65404 Kavala, Greece
e-mail: [email protected]

1 Introduction

Machine learning is a growing branch of artificial intelligence (AI) that focuses on emulating human intelligence by learning from the surroundings and gradually improving in accuracy [1]. Statistical methods are employed to train machine learning algorithms to perform classification, regression, clustering, and rule extraction by uncovering key insights in data mining projects [2]. These insights subsequently drive decision-making within applications and businesses, ideally impacting key growth metrics. As big data continues to expand, the market demand for data scientists increases exponentially, as they are expected to find in the data the answers to relevant business questions. By learning a pattern from sample data inputs, machine learning algorithms can predict and perform tasks based solely on the learned pattern and not on predefined program instructions. Machine learning is, therefore, a life saver in several cases where applying strict algorithms is not possible: the algorithm learns the new process from previous patterns and executes the extracted knowledge [3, 4]. Over the last decades, machine learning has become a vital method of problem solving, not only in research but also in everyday human routine, as a crucial element of many applications: social media features, product recommendation systems, image recognition, sentiment analysis, automated employee access control, healthcare efficiency and medical services (including the prediction of potential heart failure), language translation, the banking domain, etc. Toward this end, a set of "Software-as-a-Service" (SaaS) technologies has evolved and is offered on demand [5], including the development of distributed machine learning libraries on cloud services. The incorporation of machine learning libraries into production models, introducing Machine Learning as a Service (MLaaS), is constantly growing. This work presents an overview of the state-of-the-art MLaaS technologies distributed and applied by enterprises. Its main contribution is to highlight the degree of integration of MLaaS technologies in enterprises. ML algorithms are examined at a system level, in terms of their deployment in a system, without evaluating their theoretical contribution. Well-known enterprises that use ML applications are identified, covering a wide range of different fields. Data protection privacy in ML applications and job opportunities related to ML application development are also discussed. Finally, an overview of graphics processing units (GPUs), which can process multiple data simultaneously, is provided, as a necessary supporting technology for ML applications. The remainder of the paper is structured as follows: ML applications deployed in companies of various fields are presented in Sect. 2; Sect. 3 reviews companies that develop ML applications; data privacy issues are discussed in Sect. 4; trends in ML jobs are presented in Sect. 5; Sect. 6 summarizes details regarding the evolution of GPUs; and finally, Sect. 7 concludes the paper.

2 Machine Learning Applications

ML applications distributed worldwide cover a wide range of fields: healthcare, education, economy and finance, social networks, etc. In what follows, an overview of well-known companies that employ ML applications, covering all the aforementioned fields, is provided.


Fig. 1 Logos of a IBM Watson [9] and b Quizlet [10]

2.1 Health Care

In health care, IBM, with its "Better Healthcare" application, managed to transition from older business models to newer revenue streams by using its AI called Watson (Fig. 1a) [6]. In recent years, Watson has been deployed in several hospitals and medical centers for its aptitude for making highly accurate recommendations in the treatment of certain types of cancer. Watson has also shown significant potential in the retail sector, where it could be used as an assistant to shoppers, as well as in the hospitality industry. Other notable healthcare companies that use ML applications are included in Table 1.

2.2 Education

ML applications are widely used in education to optimize and personalize the learning experience for students and, additionally, to help teachers grade more quickly and accurately. As technology evolves and computers become more powerful, algorithm training is optimized, much like humans as they develop. Accordingly, technology companies employ ML algorithms to formulate innovative and more intuitive methods of teaching [7]. One of the most popular educational software applications is Quizlet, an online studying tool with access to a variety of big data that employs statistics and machine learning methods to make studying more efficient for students (Fig. 1b) [8]. Table 2 includes well-known education companies that use ML applications.

2.3 Economy and Finance

ML applications have also been adopted in the field of economy and finance, since ML technology can adapt to different situations and continue learning. The finance industry is taking advantage of ML by implementing it in all facets of finance, from offering alternative credit reporting methods to speeding up underwriting.


Table 1 Healthcare companies that use ML applications

Company                 Location                    Use of ML application
Quotient                Denver, Colorado            Reduce the cost of supporting EMR
KENSCI                  Seattle, Washington         Predict illness and treatment
CIOX health             Alpharetta, Georgia         Improve the accuracy and flow of health information
PATHAI                  Cambridge, Massachusetts    Help pathologists make quicker and more accurate diagnoses
Quantitative Insights   Chicago, Illinois           Improve the speed and accuracy of breast cancer diagnosis
Microsoft               Redmond, Washington         In Project InnerEye, to differentiate between tumors and healthy anatomy using 3D radiological images
PFIZER                  New York, New York          In research on how the body's immune system can fight cancer
INSITRO                 San Francisco, California   Development of drugs for quickly curing patients at a lower cost
BIOSYMETRICS            Boston, Massachusetts       Improve accuracy and eliminate tasks done by humans in different sectors of the healthcare realm
CONCERTO Health AI      New York, New York          Analyze oncology data
ORDERLY health          Denver, Colorado            Help employers and insurers save time and money on health care
MD INSIDER              Santa Monica, California    Better match patients with doctors
BETA BIONICS            Boston, Massachusetts       Manage blood sugar levels around the clock in those with Type 1 diabetes
PROGNOS                 New York, New York          Early disease detection
BERG                    Framingham, Massachusetts   Disease mapping and treatments in oncology

Table 2 Education companies that use ML applications

Company     Location                   Use of ML application
SCHOOLING   Austin, Texas              Helps students prepare for courses, college, and careers by assisting their criteria
COLLEGEAI   Boston, Massachusetts      Help prospective college students choose the best schools by providing information
DUOLINGO    Pittsburgh, Pennsylvania   Provide a statistical model of how long users can remember new words
COGNII      Boston, Massachusetts      Provide virtual tutoring and quick grading of open-ended responses
TESTIVE     Boston, Massachusetts      Help students improve standardized test scores through personalized teaching


Table 3 Finance companies that use ML applications

Company | Location | Use of ML application
AFFIRM | San Francisco, California | Payment service enabling consumers to simply finance items and pay them over time
AGENTRISK | Los Angeles, California | Help generate returns in idle assets while protecting portfolios from market volatility
DATAVISOR | Mountain View, California | Catalyze fraud detection
DESERVE | Menlo Park, California | Help adults build their credit history
ENOVA | Chicago, Illinois | Provide personalized risk and credit analysis
FEEDZAI | San Mateo, California | Provide solutions for managing risks online and in person
FINTECH STUDIOS | New York, New York | Provide search for financial professionals across millions of resources
KABBAGE | Atlanta, Georgia | Determine whether an applicant is approved, reducing the possibility for human error
PENDO SYSTEMS | Montclair, New Jersey | Extract data for loan originating, tax reporting, residential mortgages, and trade financing
RISKIFIED | New York, New York | Ensure fewer misidentifications of fraudulent activity and update on new methods of fraud

ML is being rapidly deployed across the industry to automate painstaking processes, opening up better opportunities throughout the economy. Some companies that use ML applications in the field of finance are listed in Table 3.

2.4 Social Networks

ML applications have also been integrated into our social lives, making social media capable of anticipating, or even shaping, our needs and providing a basis for personalized decision-making. A practical ML application can be found, among others, in the social networking site "Twitter." Twitter redesigned its timelines by using ML to prioritize the tweets that are most relevant to each user, ranking them with a relevance score and placing them at the top of the user's feed so that they are more likely to be seen [11]. A social networking platform that also uses ML, through an army of chatbots, is the well-known "Facebook." Developers can create and submit a chatbot for inclusion in the Messenger application of Facebook. Chatbots provide information and links as


a response to keywords or selections from multiple-choice menus [12]. Additionally, there are ML applications that provide the ability to filter spam and poor-quality content.

2.5 Complementary Applications

ML applications have been deployed in a wide range of additional services such as voice commands, self-driving cars, movie platforms, and computer vision tasks. Pindrop's Deep Voice biometric engine is a deep neural network speaker recognition system that runs in the background of every call and can analyze multiple callers, thus authenticating legitimate callers and identifying fraudsters [13]. Baidu's Deep Voice 2 text-to-speech engine can imitate hundreds of human accents. The application is based on a deep neural network that can generate entirely synthetic human voices that are very difficult to distinguish from genuine human speech. The network can learn the unique subtleties in voice, cadence, accent, pronunciation, and pitch to create accurate recreations of speaker voices based on voice pattern recognition [14]. ML algorithms are also employed in self-driving cars. The car collects data from its sensors, interprets it, and decides its subsequent action, resulting in driving skills similar to, or even better than, those of a human driver [15]. A commercial example is Google's self-driving car project, Waymo [16]. Waymo uses ML to understand its surroundings and to anticipate situations so as to prevent accidents, providing a car that can drive autonomously and safely without the need for human supervision. Netflix is a paid streaming service offering a variety of movies, TV shows, and other content. In 2000, Netflix first introduced personalized movie recommendations, and in 2006 it launched the Netflix Prize, an ML and data mining competition. Netflix uses ML to shape its catalog of movies and TV shows by learning the personalized characteristics of each user so as to propose targeted content [17, 18]. One of the most important applications of ML services is in computer vision, which is used in many fields, especially agriculture. Blue River Technology [19] is an example of a company that extensively uses ML and computer vision to revolutionize agriculture by proposing intelligent machinery, implementing sustainable solutions regarding the usage of resources, and improving farming yields. Recently, Blue River has developed algorithms to identify plants in open fields and distinguish unwanted plants (weeds) from crops [20].

3 Companies that Develop Machine Learning Techniques

In the last decade, numerous companies have used ML applications both to expand their capabilities and as the main object of their activity. Machine learning operations (MLOps) refer to the collection of techniques and tools for the deployment


of ML models in production. MLOps practices are implemented by certain companies so as to continuously deliver new versions of ML software [21]. Table 4 includes information on companies whose main interest is the development of ML algorithms.

4 Data Protection Privacy in Machine Learning

ML can internalize concepts found in data to form predictions for new situations. To achieve reliable levels of accuracy, ML models need large datasets for training. Regarding big data, different anonymization techniques have been suggested to guarantee individual privacy. Many privacy challenges cited for big data are also relevant to AI: the ability to re-identify personal information from large datasets even when only small amounts of consumer data are available, and the lack of transparency regarding how personal information is used. AI involves enormous amounts of data; therefore, models need to handle data of high dimensionality, whereas conventional statistical techniques consider only a limited number of selected variables. Due to both novel data regularization techniques and a decrease in computational cost, the feasible feature space has drastically expanded, so that ML models can consider thousands of variables when making a single prediction. With algorithms that can make inferences from such large and complex datasets, new conceptual issues arise. In fact, with the vast amount of data collected daily, it is common for data owners not to be aware of how the data collected from them is being used, or even of exactly what type of data is being collected [22]. Regarding privacy leaks related to the data sharing process, multiple levels of threats exist: non-clear private data, reconstruction attacks, model inversion attacks, membership inference attacks, and de-anonymization [22]. Therefore, many approaches that solve or restrict the problem of data security in ML usage have been developed, such as cryptographic approaches, garbled circuits, homomorphic encryption, secret sharing, perturbation, secure processors, dimensionality reduction, differential privacy, and local differential privacy [23].
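As a concrete illustration of one of the approaches above, the sketch below shows the classic Laplace mechanism behind differential privacy. It is a generic textbook example rather than code from any of the surveyed systems, and the query value, sensitivity, and privacy budget are illustrative assumptions.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release a numeric query result with epsilon-differential privacy:
    Laplace noise with scale = sensitivity / epsilon masks any single
    individual's contribution to the query."""
    rng = rng or np.random.default_rng()
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: a count query has sensitivity 1 (one person changes the count
# by at most 1); epsilon = 0.5 is a moderately strict privacy budget.
noisy_count = laplace_mechanism(true_value=42, sensitivity=1.0, epsilon=0.5)
```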

5 Trends in Machine Learning Jobs

In recent years, the focus on digitization has been as intense as ever. ML and AI are helping to drive information technology (IT) companies and global enterprises out of the global pandemic with minimal losses. Specialists in the fields of data science and ML are therefore in high demand. Some of the main ML-related jobs currently in the spotlight are the following:


Table 4 Companies that use ML as their main subject of activity

Company | Location | Subject
DATAROBOT | Boston, Massachusetts | Platform to enable data scientists to construct and apply ML models
HYPERSCIENCE | New York, New York | Turns human-readable content into machine-readable data
NEWKNOWLEDGE | Austin, Texas | Detects and provides resources to combat social media attacks
STRONG ANALYTICS | Chicago, Illinois | Uses ML to enhance the effectiveness of email marketing campaigns by automated optimization
AURORA FLIGHT SCIENCES | Cambridge, Massachusetts, and Manassas, Virginia | Uses ML to develop and manufacture unmanned flight systems and aerospace vehicles
MIGHTY AI | Seattle, Washington | Provides training data to companies that build computer vision models for autonomous vehicles
CLOUDZERO | Boston, Massachusetts | Helps clients keep track of everything that is happening on their clouds
KEMVI | San Francisco, California | ML engine blasts through massive amounts of data
ZYLOTECH | Cambridge, Massachusetts | Pinpoint trends and patterns in customer data so clients can predict behaviors
ATTIVIO | Newton, Massachusetts | Analyzes user behaviors and builds relevancy models that learn and improve as content
GAMALON | Cambridge, Massachusetts | Deploys ML to read, interpret, and summarize text of customer messages
FITTEDCLOUD | Acton, Massachusetts | Uses ML to predict cloud usage
COLLABIP | Austin, Texas | Uses ML to transcribe and analyze phone calls
CADGE RESEARCH LABS | Chicago, Illinois | Employs ML for search and optimization
SOUNDHOUND | Santa Clara, California | Uses ML in the development of its Speech-to-Meaning technology
QUOTIENT HEALTH | Denver, Colorado | Developed software to lower the expense of maintaining electronic medical records (EMR) systems by optimizing their design
PARTIAC | Los Angeles, California | Uses safe tracking technology to connect people and places in the nightlife sector
OCTI | Los Angeles, California | Locates people in-camera and employs that knowledge to apply different effects
FAMA TECHNOLOGIES | Playa Vista, California | Help companies weed out job applicants by scouring their online presence, including social media, for information that indicates risk
UNITY TECHNOLOGIES | San Francisco, California | Uses ML to help developers and researchers train agents in realistic, complex scenarios
DIGITAL REASONING | Franklin, Tennessee | Evaluates human behaviors in communications data
LUMINOSO | Cambridge, Massachusetts | Deploys ML in sifting through massive amounts of text data from call centers to improve client interactions
QBURST | Chantilly, Virginia | Makes quick data-driven decisions
HIREIQ | Atlanta, Georgia | Recruitment platform that allows companies to virtualize interviews
STABILITAS | Seattle, Washington | Optimizes crisis communication

• Machine Learning Software Engineer. ML software engineers are programmers working in the field of AI. Their task is to create algorithms that enable machines to analyze input information and understand causal relationships between events. ML engineers also work on improving and updating already deployed algorithms.
• Data Scientist. Data scientists apply ML algorithms and data analytics to work with big data. One of their main tasks is to discover patterns in data that can be used for predictive business intelligence.
• Artificial Intelligence for IT Operations (AIOps) Engineer. AIOps engineers help to develop and deploy ML algorithms that analyze IT data and boost the efficiency of IT operations.


• Cyber Security Analyst. Cyber security analysts identify information security threats and risks of data leakage, implement measures to protect companies against information loss, and ensure the safety and confidentiality of big data. Protecting data from malicious use is important, since AI systems are now ubiquitous.
• Cloud Architect for Machine Learning. Cloud architects are responsible for managing the cloud architecture in an organization. This profession is becoming more and more relevant, especially as cloud technologies grow more complex. Cloud computing architecture encompasses everything related to it, including ML software platforms, servers, storage, and networks.
• Computational Linguist. Computational linguists take part in the creation of ML algorithms and programs used for developing online dictionaries, translation systems, virtual assistants, and robots. Computational linguists have a lot in common with ML engineers but differ in that they combine deep knowledge of linguistics with an understanding of how computer systems approach natural language processing.
• Human-Centered AI System Designer/Researcher. Human-centered AI system designers make sure that intelligent software is created with the end user in mind. Human-centered AI must learn to collaborate with humans and continuously improve based on deep learning algorithms. A human-centered AI designer must possess not only technical knowledge but also an understanding of cognitive science, computer science, the psychology of communications, and user experience/user interface (UX/UI) design.
• Robotics Engineer. A robotics engineer designs and builds robots and complex robotic systems. Robotics engineers must conceptualize the mechanics of future robotic assistants, envision how to assemble them in terms of electronics and hardware, and accompany them with the appropriate software.
• Data Lawyer. Data lawyers are specialists who guarantee security and compliance with General Data Protection Regulation (GDPR) requirements so as to avoid high-cost fines. They know how to protect data, how to buy and sell data in a way that avoids legal complications, and how to manage the risks arising from processing and storing data.
• AI Ethicist. AI ethicists conduct ethical audits of companies' AI systems and propose comprehensive strategies for improving their non-technical aspects. Their goal is to eliminate the reputational, financial, and legal risks that AI adoption might pose to the organization, and to keep companies from bearing responsibility for the failures of their intelligent software.
• Conversation Designer. Conversation designers design the user experience of a virtual assistant. They are efficient UX/UI copywriters and specialists in communication, able to translate a brand's business requirements into a dialog.

Table 5 indicatively lists trends in ML jobs in terms of the share of postings requesting the skill, projected job growth over 10 years, and the associated educational level [24].


Table 5 Trends in ML job landscape [24]

ML-related job | Requested skills (%) | Projected growth (%) | Education level
Software developer/engineer | 4 | 31 | Bachelor's degree
Data scientist | 70 | 19 | Bachelor's degree
Network engineer/architect | 6 | 7 | Bachelor's degree
Data engineer | 22 | 12 | Bachelor's degree
Data/data mining analyst | 8 | 9 | Bachelor's degree
Computer system engineer/architect | 4 | 9 | Bachelor's degree
Researcher/research associate | 6 | 28 | Bachelor's degree
Product manager | 4 | 10 | Bachelor's degree
Database architect | 10 | 9 | Bachelor's degree
Engineering manager | 4 | 6 | Bachelor's degree

6 GPUs Evolution

The most important supporting technology for ML computational systems is the GPU. GPUs can perform multiple computations simultaneously, supporting ML training processes and thus significantly accelerating ML operations. GPUs therefore need to be fast and accurate, able to process multiple data streams simultaneously and to solve increasingly complex problems. GPUs have evolved substantially over the last five years, driving the market to a critical point for mass production and making GPUs powerful processors for general-purpose computation. The need for powerful data processing is currently greater than ever in response to the COVID-19 pandemic, as scientists develop sophisticated scientific methods to model and attack the coronavirus; these techniques have become increasingly computationally intensive, involving massive data that require high-performance computing. Figure 2 includes some indicative GPUs available on the market during the last five years, along with their scores and dates of release. From Fig. 2, it can be deduced that GPU scores have continuously improved over the years, while their prices have decreased.

7 Conclusions

ML is expected to be a critical technology in the future for interpreting data, decision-making, and improving business, providing tools to resolve problems in a wide range of applications, e.g., health care, education, finance, and more.


Fig. 2 GPUs evolution

This work aims to evaluate the current state of MLaaS, examine the deployment of ML in business sectors, and identify related research fields. The contributions of this work are as follows: (1) an examination of MLaaS in relation to its incorporation into several business sectors, (2) a survey of companies that develop ML applications, (3) a presentation of trends in ML-related jobs, (4) insights into data protection and privacy, and (5) a review of the evolution of GPUs supporting ML applications. This work highlights the contribution of MLaaS to the success of many well-known companies and the opportunities resulting from ML both for users, regarding the gain of business intelligence, and for developers, regarding vocational prospects and long-term development.

References
1. El Naqa I, Murphy MJ (2015) What is machine learning? In: Machine learning in radiation oncology. Springer International Publishing, Cham, pp 3–11. https://doi.org/10.1007/978-3-319-18305-3_1
2. Matsangidou M, Liampas A, Pittara M, Pattichi CS, Zis P (2021) Machine learning in pain medicine: an up-to-date systematic review. Pain Ther 10:1067–1084. https://doi.org/10.1007/s40122-021-00324-2
3. Firebanks-Quevedo D, Planas J, Buckingham K, Taylor C, Silva D, Naydenova G, Zamora-Cristales R (2022) Using machine learning to identify incentives in forestry policy: towards a new paradigm in policy analysis. For Policy Econ 134:102624. https://doi.org/10.1016/j.forpol.2021.102624
4. Chauhan P, Sharma N, Sharma H (2018) Feature selection techniques in machine learning: a survey. In: International conference on recent trends in computational engineering and technology (ICTRCET-18). IEEE, Karnataka, India
5. Raghavan RS, Jayasimha KR, Nargundkar RV (2020) Impact of software as a service (SaaS) on software acquisition process. J Bus Ind Mark 35:757–770. https://doi.org/10.1108/JBIM-12-2018-0382
6. Strickland E (2019) IBM Watson, heal thyself: how IBM overpromised and underdelivered on AI health care. IEEE Spectr 56:24–31. https://doi.org/10.1109/MSPEC.2019.8678513


7. Villegas-Ch W, Román-Cañizares M, Palacios-Pacheco X (2020) Improvement of an online education model with the integration of machine learning and data analysis in an LMS. Appl Sci 10:5371. https://doi.org/10.3390/app10155371
8. Carman M (2020) Using Quizlet to enhance learner agency and self-efficacy in EFL. JALT Postconf Publ 2019:516. https://doi.org/10.37546/JALTPCP2019-59
9. IBM: IBM Watson is AI for smarter business. https://www.ibm.com/watson
10. Quizlet Inc.: Quizlet. https://quizlet.com/en-gb
11. Shekhawat SS, Shringi S, Sharma H (2021) Twitter sentiment analysis using hybrid Spider Monkey optimization method. Evol Intell 14:1307–1316. https://doi.org/10.1007/s12065-019-00334-2
12. Balasudarsun NL, Sathish M, Gowtham K (2018) Optimal ways for companies to use Facebook Messenger Chatbot as a marketing communication channel. Asian J Bus Res 8:1–17. https://doi.org/10.14707/ajbr.180046
13. Pindrop: Deep Voice biometric engine. https://www.pindrop.com/technologies/deepvoice/
14. Zhang R, Chen W, Xu M, Yang Y (2019) Analysis and design of voice assisted learning system based on Baidu AI. In: 2019 IEEE international conference on computer science and educational informatization (CSEI). IEEE, pp 334–336. https://doi.org/10.1109/CSEI47661.2019.8938894
15. Rao Q, Frtunikj J (2018) Deep learning for self-driving cars: chances and challenges. In: Proceedings of the 1st international workshop on software engineering for AI in autonomous systems (SEFAIS 18), pp 35–38
16. Waymo LLC by Google: Waymo Driver. https://waymo.com/
17. Gomez-Uribe CA, Hunt N (2016) The Netflix recommender system. ACM Trans Manag Inf Syst 6:1–19. https://doi.org/10.1145/2843948
18. Aggarwal K, Mijwil MM, Alomari S, Gök M, Alaabdin AMZ, Abdulrhman SH (2022) Has the future started? The current growth of artificial intelligence, machine learning, and deep learning. Iraqi J Comput Sci Math 3:115–123. https://doi.org/10.52866/ijcsm.2022.01.01.013
19. Blue River Technology. https://bluerivertechnology.com/
20. Panpatte S, Ganeshkumar C (2021) Artificial intelligence in agriculture sector: case study of Blue River Technology. In: Lecture notes in networks and systems, pp 147–153. https://doi.org/10.1007/978-981-15-9689-6_17
21. Symeonidis G, Nerantzis E, Kazakis A, Papakostas GA (2022) MLOps - definitions, tools and challenges. In: IEEE 12th annual computing and communication workshop and conference (CCWC), pp 0453–0460. https://doi.org/10.1109/CCWC54503.2022.9720902
22. Al-Rubaie M, Chang JM (2019) Privacy-preserving machine learning: threats and solutions. IEEE Secur Priv 17:49–58. https://doi.org/10.1109/MSEC.2018.2888775
23. Ali HY, El-Medany W (2019) IoT security: a review of cybersecurity architecture and layers. In: 2nd smart cities symposium (SCS 2019), Institution of Engineering and Technology, pp 18 (7 pp). https://doi.org/10.1049/cp.2019.0191
24. Burning Glass Technologies (2019) Mapping the genome of jobs: the Burning Glass skills taxonomy

Very Low Illumination Image Enhancement via Lightness Mapping Ahmed Rafid Hashim, Hana H. Kareem, and Hazim G. Daway

Abstract Images taken in low light suffer from poor quality, low contrast, and color distortion; images with very low light are especially difficult to improve because of the lack of color information and lightness. Enhancing such images is therefore an important topic, as it is involved in many important applications such as surveillance, tracking, and medical imaging. In this paper, we introduce a new method for enhancing very low illumination images. The method consists of two steps. The first step performs color restoration by adding a median filter and a min filter; the color restoration is applied to each color component R, G, B, and the image is then converted to the HSV color model to extract the lightness component (V). The second step enhances the illumination of the component (V) using a sigmoid function and Contrast Limited Adaptive Histogram Equalization (CLAHE). The result is then recombined with the chromatic components (H, S), and the HSV model is converted back to the RGB model to obtain the final enhanced image. The proposed method is applied to eight images taken from the DICM database and compared with other methods, namely Parallel Nonlinear Adaptive Enhancement (PNAE), New Nonlinear Adaptive Enhancement (NNAE), naturalness-preserved enhancement (NAT), fusion-based enhancement (FU), new contrast enhancement (CN), and image enhancement by fuzzy logic (EF), using entropy and the Natural Image Quality Evaluator (NIQE) as image quality metrics. Experimental results show that the proposed method outperforms the other methods, achieving average values of 6.65 for entropy and 3.80 for NIQE over the eight images.

Keywords Low-light image · Image enhancement · Sigmoid function · Color restoration · CLAHE · Lightness component

A. R. Hashim (B) Department of Computer Science, College of Education for Pure Sciences-Ibn Al-Haitham, University of Baghdad, Baghdad, Iraq e-mail: [email protected]
H. H. Kareem Department of Physics, College of Education, Mustansiriyah University, Baghdad, Iraq
H. G. Daway Department of Physics, College of Science, Mustansiriyah University, Baghdad, Iraq

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_20


1 Introduction

Image enhancement is one of the important topics in image processing because it is involved in several areas such as medical imaging, object detection [1, 2], surveillance, tracking, and underwater imaging [3, 4]. Images taken in low illumination have poor lighting, low contrast, and noise [5–9], and this affects their quality, especially for images taken outdoors, which suffer from low quality due to weather factors such as haze, fog, and dust that degrade the visual effect [10]. Therefore, improving the lighting of an image is very important for enhancing contrast and improving the visual effect for both human viewing and computer processing. Several methods have been proposed to enhance images, as follows. Histogram equalization is the most popular method used in image enhancement; it equalizes the intensity levels in the image so that the output image has a uniform distribution of intensity [11], which enhances lightness and contrast. Histogram equalization has disadvantages, however, including generating noise and causing a loss of color fidelity [5]. Therefore, a method called Adaptive Histogram Equalization (AHE) [12] was proposed, but AHE has its own problems, such as producing noise in images and slow speed [13]. A newer method, Contrast Limited Adaptive Histogram Equalization (CLAHE), was then developed to address the disadvantages of AHE; however, CLAHE also has a disadvantage in that it does not use the full dynamic range of the histogram [14]. The gamma transform [5], also called the power transform, is a nonlinear method used for image enhancement by adjusting the illumination. Land and McCann [15] introduced the retinex theory to enhance images; this theory assumes that an image can be represented as the product of reflectance and illumination, and it works by separating the reflectance component from the lightness so that the effect of illumination on the image can be reduced, which leads to enhancement. Many methods have developed the retinex theory further, such as single-scale retinex (SSR) [16], multi-scale retinex (MSR) [17], and multi-scale retinex with color restoration (MSRCR) [18]. Guo et al. [19] introduced a method called LIME to enhance images taken in low-light conditions. This method estimates an illumination map by finding the maximum value among R, G, B at each pixel and then uses the illumination map to enhance the image; it was compared with other methods using objective quality metrics such as the Lightness Order Error (LOE). Zhou et al. [20] introduced a method, called Parallel Nonlinear Adaptive Enhancement (PNAE), to enhance images with low or high intensity and poor contrast. The PNAE method consists of three steps: the first converts the input color image to an intensity image, the second applies adaptive intensity adjustment and contrast enhancement to the intensity image, and the final step performs color restoration. Zhou et al. [21] introduced a New Nonlinear Adaptive Enhancement (NNAE) to develop the PNAE algorithm [20]; the NNAE method can preserve edge information in addition to enhancing the image.


Lisani [22] introduced an enhancement method that uses a logarithmic mapping applied to each pixel depending on the luminance characteristics of the pixel's neighborhood. This method succeeds in enhancing visibility in dark and bright regions, but it may produce halo artifacts. Daway et al. [23] introduced a method to enhance color images based on a fuzzy (EF) technique that uses a sigmoid function; it consists of three steps: first converting the image to the YIQ color model, second applying the proposed enhancement to the lightness component (Y), and finally returning the enhanced image to the RGB color model. The algorithm was compared with other methods using objective quality assessments such as LOE, entropy, and the Natural Image Quality Evaluator (NIQE). Wang et al. [24] introduced a method that maintains naturalness (NAT) while enhancing the details of non-uniform illumination images. It consists of three steps: first using a bright-pass filter to decompose the image into reflectance and illumination, second using a bi-log transformation to map the illumination, and finally synthesizing the reflectance and illumination to obtain the final enhanced image. This method was compared with other methods using the LOE metric as an objective quality assessment; it may produce some minor faults in video applications when the scenes vary. Gupta et al. [13] introduced a method to enhance image contrast (CN) based on the illumination component; it works in the YCbCr color model, extracting the illumination component (Y) and applying a new modified sigmoid function. The method enhances contrast in dark regions under non-uniform lightness without affecting the details of bright regions, and it was compared with other methods using objective quality metrics such as entropy, root mean square contrast (RMSC), and the Perceptual Quality Metric (PQM). Lin and Shi [25] introduced an algorithm to enhance nighttime images by modifying MSR to use the sigmoid function instead of the logarithm function; this modification allows the method to enhance nighttime images without losing data. Fu et al. [26] introduced a method to enhance low illumination images using a fusion technique (FU). It applies to a single image and does not need several images of the same scene to implement the fusion: a simple illumination estimate representing naturalness and illumination is first proposed, and a fusion-based algorithm is then applied to adjust the estimated illumination and enhance the image. This method was compared with other methods using objective quality metrics such as the gradient magnitude similarity deviation (GMSD) and NIQE. Fu et al. [27] introduced an image enhancement method that uses a probabilistic approach and works in the linear domain instead of the logarithmic domain to better estimate the reflectance and illumination via Maximum A Posteriori (MAP) estimation; the algorithm was compared with other methods using subjective and objective quality metrics. Jawad et al. [28] introduced a method for low-lightness image enhancement based on a fuzzy logic power membership function (FLPMF) algorithm, which was compared with other methods using image quality metrics such as entropy and mean squared error (MSE).
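To ground two of the classical techniques surveyed above, the following minimal sketch implements global histogram equalization and the gamma (power-law) transform for an 8-bit grayscale image; the gamma value of 0.5 is an illustrative choice (values below 1 brighten dark regions).

```python
import numpy as np

def histogram_equalization(gray_u8):
    """Map intensities through the normalized cumulative histogram so the
    output levels are approximately uniformly distributed."""
    hist = np.bincount(gray_u8.ravel(), minlength=256)
    cdf = hist.cumsum() / hist.sum()
    return (255 * cdf[gray_u8]).astype(np.uint8)

def gamma_transform(gray_u8, gamma=0.5):
    """Power-law (gamma) transform: out = 255 * (in/255) ** gamma."""
    x = gray_u8.astype(np.float64) / 255.0
    return (255 * np.power(x, gamma)).astype(np.uint8)
```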

2 The Proposed Method

The proposed method consists of two steps, color restoration and enhancement of the illumination component using a sigmoid function and CLAHE in the HSV color model, where the chromatic components H and S are derived from the restored colors; this enhances color without producing false colors, as explained in the following steps:

1. The first step includes color restoration. Low lighting causes several degradations in captured images, including color distortion. Color restoration can be done by adding the outputs of two filters, a median filter and a min filter, for each color component (R, G, B), as follows (Fig. 1):

$R' = \text{min filter}(R) + \text{median filter}(R)$  (1)
$G' = \text{min filter}(G) + \text{median filter}(G)$  (2)
$B' = \text{min filter}(B) + \text{median filter}(B)$  (3)

Fig. 1 Histograms for the original image and the image after color restoration

2. The second step includes lightness mapping. The lightness component is improved by a mapping that relies on the sigmoid function and CLAHE. Initially, the image is converted from the RGB color model to the HSV color model as in [29]. After extracting the lightness component (V), a sigmoid-function transform (the lightness mapping) is first used to enhance the contrast of the lightness component according to this equation:

$v_t = \dfrac{1}{1 + \sqrt{\dfrac{1 - v}{v}}}$  (4)

Then, Contrast Limited Adaptive Histogram Equalization (CLAHE) is applied to increase the lightness. 3. The third step recombines the lightness component (V) with the chrominance components (H, S) to rebuild the HSV model; HSV is then converted to RGB to obtain the enhanced image. Fig. 2 shows the block diagram of the suggested method, and Fig. 3 shows the steps of the proposed method for a low-lightness image.
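The full two-step pipeline can be sketched as follows in Python. This is a minimal illustration of the method described above, not the authors' Matlab code; the 3 × 3 filter windows and the CLAHE clip limit are assumptions chosen for demonstration.

```python
import numpy as np
from scipy.ndimage import median_filter, minimum_filter
from skimage import color, exposure

def enhance_very_low_light(rgb_u8):
    """Color restoration (Eqs. 1-3) followed by the lightness mapping
    (Eq. 4) and CLAHE on the V channel of HSV."""
    img = rgb_u8.astype(np.float64) / 255.0
    # Step 1: per-channel color restoration, R' = min(R) + median(R), etc.
    restored = np.stack(
        [minimum_filter(img[..., c], size=3) + median_filter(img[..., c], size=3)
         for c in range(3)], axis=-1)
    restored = np.clip(restored, 0.0, 1.0)
    # Step 2: sigmoid lightness mapping on V, then CLAHE.
    hsv = color.rgb2hsv(restored)
    v = np.clip(hsv[..., 2], 1e-6, 1.0)           # guard against v = 0
    vs = 1.0 / (1.0 + np.sqrt((1.0 - v) / v))     # Eq. (4)
    vsc = exposure.equalize_adapthist(vs, clip_limit=0.01)  # CLAHE
    # Step 3: recombine with H, S and convert back to RGB.
    hsv[..., 2] = vsc
    return (color.hsv2rgb(hsv) * 255).astype(np.uint8)
```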

3 Image Quality Metrics

This paper uses two image quality metrics to estimate the quality of the images, as follows:

1. Information Entropy (IE). Entropy is one of the important metrics for measuring image quality; it measures the amount of information in the image and can be expressed by the following equation [30]:

$\mathrm{EN} = -\sum_{a=0}^{255} p(a) \log p(a)$  (5)

where p(a) represents the probability of gray level (a) in the image.

2. Natural Image Quality Evaluator (NIQE). This metric estimates image quality in the spatial domain based on a natural scene statistics model [31].
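A short sketch of the entropy metric of Eq. (5) is given below; the base-2 logarithm is an assumption (the paper does not state the base), and the image is assumed to be 8-bit grayscale.

```python
import numpy as np

def information_entropy(gray_u8):
    """Information entropy (Eq. 5) over the 256 gray levels; empty bins
    contribute zero by the convention 0 * log(0) = 0."""
    hist = np.bincount(gray_u8.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())
```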

Fig. 2 Diagram of the proposed method (input very low illumination RGB image → color restoration → conversion from RGB to HSV → extraction of the lightness component V and chromatic components H and S → sigmoid function on V (VS) → CLAHE on VS to get VSC → recombination of components → conversion from HSV to RGB → enhanced output)

4 Experimental Results

The proposed method is applied to eight very low lightness images taken from the DICM image database [32], as shown in Fig. 4. The image sizes are 480 × 640 and 640 × 480 pixels, and the file format is JPG. The proposed method was implemented in Matlab (R2020a) on a computer platform with 16 GB of RAM and a 2.7 GHz Core i7 processor. The results of the present method are compared with other methods, namely FU [26], PNAE [20], NNAE [21], EF [23], NAT [24], and CN [13], as shown in Figs. 5, 6, 7, and 8, using entropy and NIQE as image quality metrics; Table 1 reports the average image quality for the enhanced images. Table 1 shows the no-reference metrics for all enhancement algorithms; we can note the superiority of the proposed method over the other methods. It has the highest value of the entropy metric, which means the enhanced images carry the most lightness information, and it also achieves the best NIQE value.


Fig. 3 a Original image, b image after color restoration, c lightness component (v), d sigmoid function for lightness component, e CLAHE for image, and f image after enhancement


Note that a smaller value of the NIQE metric indicates higher quality after color restoration.

5 Conclusion

In this paper, we present a new method for enhancing very low-light images. The method consists of two steps: the first is color restoration using the addition of two filters, a median filter and a min filter, and the second enhances the lightness component using the sigmoid function and CLAHE. The proposed method is compared with FU, PNAE, NNAE, EF, NAT, and CN using entropy and NIQE as image quality metrics. It is clear from Table 1 that the present method performs best, obtaining the highest entropy and the best NIQE value.

6 Future Work

As further work, we would like to develop this work in the following directions:
1. Using deep learning methods to improve low-light images.
2. Applying the proposed algorithm in another color space, such as the YIQ color model.
3. Adopting the Dark Channel Prior (DCP) to improve low-light images.
4. Developing MSRCR models to improve images in night vision.

Fig. 4 Images (a) to (h) used in the proposed method

Fig. 5 Comparison of the proposed method with CN [13], EF [23], FU [26], NAT [24], NNAE [21], and PNAE [20] for image (a)

Fig. 6 Comparison of the proposed method with CN [13], EF [23], FU [26], NAT [24], NNAE [21], and PNAE [20] for image (b)

Fig. 7 Histogram distributions for the proposed method, CN [13], EF [23], FU [26], NAT [24], NNAE [21], and PNAE [20] for image (a)

Fig. 8 Histogram distributions for the proposed method, CN [13], EF [23], FU [26], NAT [24], NNAE [21], and PNAE [20] for image (b)

Table 1 Average image quality for all enhancement algorithms

Enhancement methods | Entropy | NIQE
Proposed method | 6.651404 | 3.807112
FU [26] | 6.486206 | 4.246705
PNAE [20] | 6.208851 | 5.864456
NNAE [21] | 6.186234 | 5.842524
EF [23] | 5.728506 | 4.201386
NAT [24] | 6.272624 | 4.124319
CN [13] | 6.168414 | 3.896173

The values in bold represent the superiority of the proposed method over the other methods: it has the highest value of the entropy metric, which means the enhanced images carry the most lightness information, and it also has the best NIQE value, since a smaller NIQE value indicates higher quality after color restoration.

References
1. Daway HG, Kareem HH, Hashim AR (2018) Pupil detection based on color difference and circular Hough transform. Int J Electr Comput Eng 8(5):3278–3284. https://doi.org/10.11591/ijece.v8i5.pp.3278-3284
2. Mutar A et al (2018) Smoke detection based on image processing by using grey and transparency features. J Theor Appl Inf Technol 96(21):6995–7005
3. Mirza NM, Al-Zuky AAD, Dway HG (2014) Enhancement of the underwater images using modified retinex algorithm. Al-Mustansiriyah J Sci 24(5):511–518
4. Abd-Al Ameer ZS, Daway HG, Kareem HH (2019) Enhancement underwater image using histogram equalization based on. J Eng Appl Sci 14(2):641–647
5. Gonzalez RC, Woods RE (2007) Digital image processing, 3rd edn. Prentice-Hall, Upper Saddle River, NJ, USA. Available: http://dl.acm.org/citation.cfm?id=1076432
6. Wang W, Wu X, Yuan X, Gao Z (2020) An experiment-based review of low-light image enhancement methods. IEEE Access 8:87884–87917
7. Park S, Kim K, Yu S, Paik J (2018) Contrast enhancement for low-light image enhancement: a survey. IEIE Trans Smart Process Comput 7(1):36–48
8. Resham NH, Abbas HK, Mohamad HJ, Al-Saleh AH (2021) Noise reduction, enhancement and classification for sonar images. Iraqi J Sci, 4439–4452
9. Hassan SF, Daway HG, Al-Alaway IT (2018) Improving an illumination system in the microscopic imaging of nuclear tracks using light emitting diode. Indian J Public Heal Res Dev 9(12):1282–1287
10. Hashim AR, Daway HG, Kareem HH (2020) No reference image quality measure for hazy images. Int J Intell Eng Syst 13(6):460–471. https://doi.org/10.22266/ijies2020.1231.41
11. Razzak AA, Hashem AR (2015) Facial expression recognition using hybrid transform. Int J Comput Appl 119(15)
12. Hummel R (1977) Image enhancement by histogram transformation. Comput Graph Image Process
13. Gupta B, Agarwal TK (2018) New contrast enhancement approach for dark images with non-uniform illumination. Comput Electr Eng 70:616–630
14. Zuiderveld K (1994) Contrast limited adaptive histogram equalization. Graph Gems, pp 474–485
15. Land EH, McCann JJ (1971) Lightness and retinex theory. JOSA 61(1):1–11
16. Jobson DJ, Rahman Z, Woodell GA (1997) Properties and performance of a center/surround retinex. IEEE Trans Image Process 6(3):451–462


17. Jobson DJ, Rahman Z, Woodell GA (1997) A multiscale retinex for bridging the gap between color images and the human observation of scenes. IEEE Trans Image Process 6(7):965–976
18. Rahman Z, Jobson DJ, Woodell GA (2004) Retinex processing for automatic image enhancement. J Electron Imaging 13(1):100–110
19. Guo X, Li Y, Ling H (2016) LIME: low-light image enhancement via illumination map estimation. IEEE Trans Image Process 26(2):982–993
20. Zhou Z, Sang N, Hu X (2014) A parallel nonlinear adaptive enhancement algorithm for low- or high-intensity color images. EURASIP J Adv Signal Process 2014(1):1–14
21. Zhou Z, Chen L, Hu X (2015) Color images enhancement for edge information protection based on second order Taylor series expansion approximation. Optik (Stuttg) 126(3):368–372
22. Lisani J-L (2020) Local contrast enhancement based on adaptive logarithmic mappings. Image Process Line 10:43–61
23. Daway HG, Daway EG, Kareem HH (2020) Colour image enhancement by fuzzy logic based on sigmoid membership function. Int J Intell Eng Syst 13(5):238–246
24. Wang S, Zheng J, Hu H-M, Li B (2013) Naturalness preserved enhancement algorithm for non-uniform illumination images. IEEE Trans Image Process 22(9):3538–3548
25. Lin H, Shi Z (2014) Multi-scale retinex improvement for nighttime image enhancement. Optik (Stuttg) 125(24):7143–7148
26. Fu X, Zeng D, Huang Y, Liao Y, Ding X, Paisley J (2016) A fusion-based enhancing method for weakly illuminated images. Signal Process 129:82–96
27. Fu X, Liao Y, Zeng D, Huang Y, Zhang X-P, Ding X (2015) A probabilistic method for image enhancement with simultaneous illumination and reflectance estimation. IEEE Trans Image Process 24(12):4965–4977
28. Jawad MK, Daway HG, Mohamad HJ, Daway EG (2021) Lightness enhancement by fuzzy logic depending on power membership function. J Phys Conf Ser 1999:012129. https://doi.org/10.1088/1742-6596/1999/1/012129
29. Sangwine SJ, Horne REN (2012) The colour image processing handbook. Springer Science & Business Media
30. Shen W, Hao S, Qian J, Li L (2017) Blind quality assessment of dehazed images by analyzing information, contrast, and luminance. J Netw Intell 2(1):139–146
31. Mittal A, Soundararajan R, Bovik AC (2012) Making a "completely blind" image quality analyzer. IEEE Signal Process Lett 20(3):209–212
32. Lee C, Lee C, Kim C-S (2012) Contrast enhancement based on layered difference representation. In: 2012 19th IEEE international conference on image processing, pp 965–968

Clustering High Dimensional Transcriptomic Data with Spectral Clustering for Patient Subtyping Arif Ahmad Rather and Manzoor Ahmad Chachoo

Abstract Discovering statistically significant subtypes of cancer forms the backbone of precision medicine. In this regard, patients are stratified with respect to their gene expression data, and the statistical relevance of the subtypes is then determined. However, the 'curse of dimensionality' of such data acts as a bottleneck in developing computational tools that leverage expression profiles for disease subtyping. In this study, we propose a methodology to subtype patients based on their gene expression profiles. Our methodology is able to discover statistically significant subtypes with regard to the log-rank p value, and the compactness of the clusters is measured with the silhouette score. The results of the proposed methodology are compared with two different paradigms of clustering, namely hard clustering and hierarchical clustering; for hard clustering we choose the kmeans++ method, and for hierarchical clustering BIRCH is chosen. The methodology is tested on gene expression profiles of the LUNG and GBM datasets obtained from The Cancer Genome Atlas (TCGA). The results suggest that the presented methodology is able to discover subtypes that are statistically significant in terms of p value, with a better silhouette score than the other methodologies.

Keywords Gene expression · Survival analysis · t-SNE · Clustering

A. A. Rather (B) · M. A. Chachoo Department of Computer Sciences, University of Kashmir, Srinagar, JK, India e-mail: [email protected]; [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_21

1 Introduction

Clustering is usually the first unsupervised, exploratory choice for uncovering patterns in high dimensional gene expression data. Cluster analysis is a challenging task due to the non-availability of ground truth to guide the learning process; depending upon the notion of similarity used, the clusters discovered by different algorithms can differ significantly [1]. Despite the myriad challenges posed by unsupervised cluster analysis, clustering still forms an important bioinformatics tool. Cluster analysis has served as an important tool to identify sub-populations of patients


with similar molecular profiles, enabling customized medical treatment personalized for each subgroup [2]. Gene expression data, however, is high dimensional with a very low sample size. This leads to the problem of the 'curse of dimensionality' (CoD) and thereby complicates judicious analysis of gene expression data [3]. Among other effects, the most important problem caused by the CoD is the increased sparsity of the high dimensional space, which results in the loss of meaningful neighborhoods. Consequently, clustering algorithms that depend on customary distance metrics like the Manhattan and Euclidean distances fail to perform efficiently in such scenarios [4]. Several methods have been proposed to deal with the CoD phenomenon. Broadly, two different paradigms have been used for effective feature dimension reduction in microarray data: one is manifold learning, and the other is linear projection. For instance, Nilsson et al. [5] applied Isometric Mapping (ISOMAP) to estimate geodesic distances in microarray data to reveal biologically relevant patterns in lymphoma and lung cancer patients; the authors also highlighted the importance of using nonlinear techniques for effective feature dimension reduction in high dimensional omics data. Li et al. [6] used Locally Linear Embedding (LLE) to first find an equivalent low dimensional representation of microarray data and thereafter perform downstream analysis to classify the data into different categories. Singular Value Decomposition (SVD), a linear projection-based dimensionality reduction, has successfully been used in the literature for unsupervised analysis of gene expression data [7]. Recently, Coretto et al. [8] used Principal Component Analysis (PCA) to subtype cancer patients into homogeneous groups by leveraging their gene expression profiles. However, PCA often fails to find an effective new feature space when the data is distributed nonlinearly in the original space. Moreover, the basic assumption of PCA is orthogonality, i.e., that principal components are perpendicular to each other, but in practice there can be basis vectors that are not orthogonal yet summarize the distribution of the data better. Motivated by the above problems, we propose a methodology to cluster patient samples into different subgroups using their mRNA expression data. The clustering is designed such that the results are statistically significant with regard to the log-rank p value. We use the manifold learning technique t-SNE [9] to first reduce the dimensions of the omics data and afterward apply spectral clustering to the data embedded in the lower dimensional space. In fact, different embeddings are sought for different values of the perplexity parameter of t-SNE, and the clustering results that are optimal with respect to p values are retained. The rest of the article is organized as follows: first, we present the proposed methodology with a brief introduction to the algorithms on which it is based, followed by the results and the conclusion.

2 Proposed Methodology

The proposed methodology consists of the following sequence of algorithms. First, the imputation of missing values in the gene expression matrix is carried out; for


Fig. 1 Graphic layout of the proposed methodology

this, we use k nearest neighbor-based imputation according to [10]. Thereafter, the Pearson correlation matrix is computed to find the linear correlations between the sets of genes. Other than this, no preprocessing is done. The sequence of steps that follows is given below and represented graphically in Fig. 1. The aim is to discover subtypes that have better prognostic separation with regard to survival, measured in terms of the log-rank p value; the compactness of the clusters is quantified via silhouette scores.
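As an illustration of this preprocessing, the snippet below performs kNN imputation followed by a Pearson correlation matrix over genes. scikit-learn's KNNImputer is used as a stand-in for the kNN imputation of [10], and the choice of 5 neighbors and the toy data are assumptions for demonstration.

```python
import numpy as np
from sklearn.impute import KNNImputer

# X: (samples x genes) expression matrix with NaN marking missing values.
X = np.random.default_rng(0).normal(size=(100, 500))
X[X > 2.5] = np.nan                                  # toy missingness

X_imputed = KNNImputer(n_neighbors=5).fit_transform(X)
# Pearson correlations between genes: rowvar=False treats columns
# (genes) as the variables being correlated.
gene_corr = np.corrcoef(X_imputed, rowvar=False)
```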

2.1 t-distributed Stochastic Neighborhood Embedding (t-SNE)

t-SNE [9] is a manifold learning dimensionality reduction and visualization technique that extends stochastic neighborhood embedding (SNE). t-SNE offers numerous advantages over other dimensionality reduction methods. For example, unlike PCA, t-SNE preserves the local structure in the data, which is desirable in


biomedical datasets [11]. This is achieved through the perplexity parameter of t-SNE. While preserving the local structure in the data, t-SNE also overcomes the crowding problem that may occur when preserving the proximities of the data in the lower dimensional space [9]. Moreover, PCA's inherent assumption of the data being linearly separable is often violated in practice, and most real-world data are high dimensional and distributed around a nonlinear high dimensional manifold [12]. In t-SNE [9], the Euclidean distances in the original space are transformed into conditional probabilities that delineate the proximities of points. t-SNE defines two probability distributions P and Q in the original and new spaces, respectively, with $p_{ij}$ and $q_{ij}$ as the probabilities of data points i and j being neighbors; points that are close have a large value of $p_{ij}$. t-SNE minimizes the difference between the two probability distributions P and Q, using the nonparametric Kullback–Leibler divergence as the loss function. Consider the data matrix $M_{n \times p}$, with n being the number of observations and p the number of features, where each observation $x_i$ is a p-dimensional vector. Mathematically, the two probability distributions are defined as:

$p_{ij} = \dfrac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}$  (1)

where $\sigma_i$ in (1) is governed by the perplexity parameter and is the standard deviation of a Gaussian centered at $x_i$. Intuitively, the perplexity can be thought of as the number of nearest neighbors each data point has. The probabilities in the new space are given as:

$q_{ij} = \dfrac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq i} \left(1 + \lVert y_i - y_k \rVert^2\right)^{-1}}$  (2)

Note that the probabilities $p_{ij}$ are estimated in the original space and $q_{ij}$ in the lower dimensional space, with $p_{ii} = q_{ii} = 0$, and the two probabilities are symmetric, i.e., $p_{ij} = p_{ji}$ and $q_{ij} = q_{ji}$. Once the probabilities are estimated in the two spaces, the next task is to ensure the two probability distributions are as similar to each other as possible. For this purpose, the Kullback–Leibler divergence is used as the cost function:

$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_x P(x) \log \dfrac{P(x)}{Q(x)}$  (3)

The minimization of (3) is carried out by the gradient descent method. Although many implementations of t-SNE are available, we preferred to use the Barnes–Hut t-SNE implementation in R, due to its reduced time complexity of $O(n \log n)$ instead of $O(n^2)$. Among all the parameters of t-SNE, perplexity is the most important: it controls the neighborhood of the data points.
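The high dimensional affinities of Eq. (1) can be computed directly, as in the short sketch below; for simplicity a single global bandwidth sigma is used, whereas in practice each sigma_i is found by binary search so that the row's perplexity matches the user's setting.

```python
import numpy as np

def tsne_affinities(X, sigma=1.0):
    """Row-normalized Gaussian affinities p_ij of Eq. (1) with a fixed
    bandwidth sigma (per-point sigma_i omitted for brevity)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    P = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)               # p_ii = 0
    P /= P.sum(axis=1, keepdims=True)      # normalize over k != i
    return P
```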


2.2 Spectral Clustering

Spectral methods for clustering have found great importance in various fields like computer vision, bioinformatics, and sociology. For instance, spectral clustering has successfully been used for image segmentation and for discovering subtypes of cancer in multiomics studies [13–15]. It depends upon the similarity graph G(V, E), with vertices V representing the data points and edges E representing similarities: an edge is drawn from vertex $v_i$ to vertex $v_j$ if the degree of similarity exceeds the chosen threshold. The graph G is finally represented as a square matrix $S_{n \times n}$, where n is the number of observations and each element $s_{ij}$ represents the degree of similarity between data points i and j; note that the graph G is undirected and $s_{ij} = s_{ji}$. The degree matrix D is constructed from the similarity matrix, with degrees defined as:

$d_i = \sum_{j=1}^{n} s_{ij}$  (4)

From (4), the degree corresponding to each data point is computed, and a diagonal degree matrix D is defined such that the diagonal entries $(d_1, d_2, \ldots, d_n)$ of D are the degrees of the data points. Now, the graph Laplacian L is computed to obtain a near-optimal cut. There are many methods of computing the graph Laplacian, and each method gives a different interpretation of the clustering results; we use a basic normalized form of the Laplacian of S:

$L(S) = D^{-1/2} (D - S) D^{-1/2}$  (5)

The Laplacian matrix obtained from (5) is decomposed through eigenvalue decomposition. After decomposition, the eigenvectors are stacked in a matrix $E_{n \times k}$, and the actual clustering is done on the matrix E of k eigenvectors corresponding to the n observations; only the eigenvectors corresponding to the smallest positive (>0) eigenvalues are chosen. The resultant eigenvector matrix is clustered with k-means to obtain the cluster membership of each observation. The different embeddings are obtained through t-SNE [9] by varying the perplexity parameter; each lower dimensional embedding is given as input to spectral clustering, and the clustering solution that minimizes the p value together with maximizing the silhouette score is chosen.
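A compact sketch of this spectral clustering step is shown below. It follows Eqs. (4)-(5) and, for simplicity, takes the k eigenvectors with the smallest eigenvalues (the paper keeps only the smallest strictly positive ones); the similarity matrix S is assumed to be symmetric with positive degrees.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_cluster(S, k):
    """Normalized-Laplacian spectral clustering: degrees (Eq. 4),
    Laplacian (Eq. 5), k smallest eigenpairs, then k-means on the rows
    of the eigenvector matrix E (n x k)."""
    d = S.sum(axis=1)                                  # Eq. (4)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = D_inv_sqrt @ (np.diag(d) - S) @ D_inv_sqrt     # Eq. (5)
    eigvals, eigvecs = np.linalg.eigh(L)               # ascending order
    E = eigvecs[:, :k]                                 # n x k spectral embedding
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(E)
```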

3 Results and Discussion

The proposed methodology is tested on the mRNA expression profiles of the LUNG and GBM datasets. The LUNG dataset consists of 106 samples, and GBM contains 100 samples, each with 12,042 genes. The number of clusters is chosen to be 4 for LUNG and 3 for GBM; this choice of k for such data is well supported in the literature [8, 10].


The results are compared with both hard clustering and hierarchical clustering. For hard clustering, we choose kmeans++, and for hierarchical clustering, Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) [16] is chosen. Note that kmeans++ solves the initialization-sensitivity problem of Lloyd's version of kmeans to some extent by incorporating an element of probability every time a centroid is selected. BIRCH [16] uses an agglomerative hierarchical approach to clustering: it works by first constructing the clustering feature (CF) tree and then searching for optimal clusters from the vertices of the CF tree with the help of an arbitrary clustering algorithm. BIRCH is suitable when dealing with outliers in the data, while kmeans++ finds it hard to deal with outliers [1]. After obtaining the final clustering results, survival analysis is performed by estimating the Kaplan–Meier survival plots corresponding to each subtype [17]. This is done for every clustering result, and the subtypes that are statistically significant with regard to the log-rank p value (p ≤ 0.05) are chosen. The clustering is performed across all values of the perplexity parameter of t-SNE [9] from 1 to (n − 1)/3 in steps of 1, where n is the number of patient samples; the clustering results are found to be robust to changes in all parameters other than perplexity. For the LUNG dataset, perplexity therefore ranges from 1 to 35 and, for GBM, from 1 to 33. Figures 3 and 4 present the survival analysis of the proposed methodology and the other methodologies; the proposed methodology is able to discover subtypes that are statistically relevant with regard to p value. To further quantify the clustering results, silhouette scores are used to measure the consistency of each sample within its cluster. The value of the silhouette score ranges from −1 to +1, where −1 means perfectly imperfect clustering and +1 means perfect clustering. As shown in Fig. 2, the average silhouette score of the proposed methodology is 0.42 for the LUNG dataset and 0.41 for GBM, which indicates that the clustering results obtained by the proposed methodology are better than those of the other methodologies; in Fig. 2, the silhouette scores are arranged in decreasing order of magnitude. Note that in Table 1, BIRCH [16] performed worst on the GBM dataset both in terms of p value and silhouette score. The results of the comparison are shown in Table 1. Note that in Fig. 4c, the survival curves corresponding to hierarchical clustering contain two subtypes that are almost superimposed, with a p value of 0.76, which is statistically insignificant, while the subtypes discovered by the proposed methodology (Figs. 3a and 4a) have high statistical significance with less overlap (observed visually). Both kmeans++ and BIRCH performed worst in the high dimensional space, due to the manifestation of the curse of dimensionality; for an effective comparison, we considered clustering results only in the lower dimensional space for both kmeans++ and BIRCH. Out of all the methodologies, the proposed methodology on average achieves good results, while hierarchical clustering with BIRCH did not perform well and kmeans++ performed relatively well.
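The selection loop described above can be sketched as follows. This is a Python approximation of the authors' R workflow: scikit-learn's TSNE and SpectralClustering stand in for their implementations, and lifelines' multivariate log-rank test stands in for their survival analysis; durations, events, and the affinity settings are illustrative assumptions.

```python
from lifelines.statistics import multivariate_logrank_test
from sklearn.cluster import SpectralClustering
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

def sweep_perplexity(X, durations, events, k):
    """Embed with t-SNE for each perplexity in 1..(n-1)//3, cluster the
    embedding spectrally, and keep the solution with the smallest
    log-rank p value (silhouette reported alongside)."""
    n = X.shape[0]
    best = None
    for perp in range(1, (n - 1) // 3 + 1):
        emb = TSNE(n_components=2, perplexity=perp,
                   random_state=0).fit_transform(X)
        labels = SpectralClustering(n_clusters=k, affinity="nearest_neighbors",
                                    random_state=0).fit_predict(emb)
        p = multivariate_logrank_test(durations, labels, events).p_value
        sil = silhouette_score(emb, labels)
        if best is None or p < best[0]:
            best = (p, sil, perp, labels)
    return best  # (p value, silhouette, perplexity, cluster labels)
```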


Fig. 2 Silhouette scores of the proposed methodology (arranged in non-decreasing order). a LUNG cancer, b GBM cancer

4 Conclusion

In this study, we propose a methodology to cluster gene expression profiles of LUNG and GBM cancer patients into different subtypes. To handle the curse of dimensionality, we used t-SNE [9], a manifold learning-based dimensionality reduction method, and thereafter spectral clustering [13] is performed to obtain subtypes. The aim of the study is to discover subtypes that are statistically significant (p < 0.05). To validate the clustering results, survival analysis is performed for each subtype using the Kaplan–Meier estimator [17] for every clustering result, and only those results are retained that are statistically significant with regard to p value. The methodology is compared with kmeans++ and BIRCH [16]. The results show that our methodology is able to discover subtypes that have relatively better life expectancy patterns compared to kmeans++ and BIRCH.

Fig. 3 Result of survival analysis for LUNG dataset for a proposed, b kmeans++, and c BIRCH clustering

Fig. 4 Result of survival analysis for GBM dataset for a proposed, b kmeans++, and c BIRCH clustering. The proposed methodology obtains subtypes with higher statistical significance with regard to p value

Table 1 Comparison of results of proposed and other methodologies with regard to silhouette score and p value. Note that the perplexity parameter is defined only for the proposed methodology

Dataset | Algo     | Perplexity | p value | Silhouette score | k
LUNG    | Proposed | 23         | 0.0051  | 0.42             | 4
LUNG    | Kmeans++ | –          | 0.06    | 0.21             | 4
LUNG    | BIRCH    | –          | 0.08    | 0.2              | 4
GBM     | Proposed | 12         | 0.0063  | 0.41             | 3
GBM     | Kmeans++ | –          | 0.029   | 0.32             | 3
GBM     | BIRCH    | –          | 0.76    | −0.4             | 3

The bold values represent the best performing method in each dataset


References
1. Nwadiugwu MC (2020) Gene-based clustering algorithms: comparison between denclue, fuzzy-C, and BIRCH. Bioinform Biol Insights 14:1–6. https://doi.org/10.1177/1177932220909851
2. Saria S, Goldenberg A (2015) Subtyping: what it is and its role in precision medicine. IEEE Intell Syst 30:70–75. https://doi.org/10.1109/MIS.2015.60
3. Altman N, Krzywinski M (2018) The curse(s) of dimensionality. Nat Methods 15:399–400. https://doi.org/10.1038/s41592-018-0019-x
4. Beyer K, Goldstein J, Ramakrishnan R, Shaft U (1998) When is "nearest neighbor" meaningful? Lect Notes Comput Sci 1540:217–235. https://doi.org/10.1007/3-540-49257-7_15
5. Nilsson J, Fioretos T, Höglund M, Fontes M (2004) Approximate geodesic distances reveal biologically relevant structures in microarray data. Bioinformatics 20:874–880. https://doi.org/10.1093/bioinformatics/btg496
6. Li B, Zheng CH, Huang DS et al (2010) Gene expression data classification using locally linear discriminant embedding. Comput Biol Med 40:802–810. https://doi.org/10.1016/j.compbiomed.2010.08.003
7. Liang F (2007) Use of SVD-based probit transformation in clustering gene expression profiles. Comput Stat Data Anal 51:6355–6366. https://doi.org/10.1016/j.csda.2007.01.022
8. Coretto P, Serra A, Tagliaferri R (2018) Robust clustering of noisy high-dimensional gene expression data for patients subtyping. Bioinformatics 34:4064–4072. https://doi.org/10.1093/bioinformatics/bty502
9. Van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605
10. Wang B, Mezlini AM, Demir F et al (2014) Similarity network fusion for aggregating data types on a genomic scale. Nat Methods 11:333–337. https://doi.org/10.1038/nmeth.2810
11. Nicolau M, Levine AJ, Carlsson G (2011) Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival. Proc Natl Acad Sci U S A 108:7265–7270. https://doi.org/10.1073/pnas.1102826108
12. Vasighizaker A, Danda S, Rueda L (2022) Discovering cell types using manifold learning and enhanced visualization of single-cell RNA-Seq data. Sci Rep 12:1–16. https://doi.org/10.1038/s41598-021-03613-0
13. Von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17:395–416. https://doi.org/10.1007/s11222-007-9033-z
14. Ng AY (2017) On spectral clustering: analysis and an algorithm. Encycl Mach Learn Data Min 1167–1167. https://doi.org/10.1007/978-1-4899-7687-1_100437
15. John CR, Watson D, Barnes MR et al (2020) Spectrum: fast density-aware spectral clustering for single and multi-omic data. Bioinformatics 36:1159–1166. https://doi.org/10.1093/bioinformatics/btz704
16. Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. ACM SIGMOD Rec 25:103–114. https://doi.org/10.1145/235968.233324
17. Kaplan EL, Meier P (1958) Nonparametric estimation from incomplete observations. J Am Stat Assoc 53:457–481. https://doi.org/10.2307/2281868

3D CNN-Based Classification of Severity in COVID-19 Using CT Images

R. Leena Sri, Divya Vetriveeran, and Rakoth Kandan Sambandam

Abstract With the pandemic worldwide due to COVID-19, several detection and diagnostic methods have been in place. One of the standard modes of detection is computed tomography imaging. With the availability of computing resources and powerful GPUs, the analysis of extensive image data has become possible. Our proposed work initially deals with the classification of CT images as normal and infected images; later, from the infected data, the images are classified based on their severity. The proposed work uses a 3D convolutional neural network model to extract all the relevant features from the CT scan images. The results are also compared with the existing state-of-the-art algorithms. The proposed work is evaluated in terms of accuracy, precision, recall, kappa value, and Intersection over Union. The model achieved an overall accuracy of 94.234% and a kappa value of 0.894.

Keywords COVID-19 · Lung infection · 3D CNN · Severity classification · CT images

1 Introduction

A pandemic hit the world due to the spread of coronavirus in the year 2019 (COVID-19). Since then, there have been several mutations of the virus, but the primary diagnosis of the disease was severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1]. One laboratory evaluation or test for COVID is the real-time reverse-transcriptase polymerase chain reaction (RT-PCR) assay [2]. However, when large samples were being collected and analyzed, several cases were reported as false negative that eventually turned out positive, or vice versa [3].

R. Leena Sri Department of CSE, Thiagarajar College of Engineering, Madurai, India D. Vetriveeran (B) · R. K. Sambandam Department of CSE, CHRIST (Deemed to be University), Bangalore, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_22


As the number of cases increased, laboratory facilities were limited, and the testing error had to be kept in check. Moreover, the time taken to receive the report ranges from 3 to 5 days, leading to an increase in cases if the patient is not properly quarantined until the test reports are received [4]. The other diagnostic method in place during the pandemic was diagnostic imaging. The most used imaging service during the pandemic is CT imaging, which helped effectively diagnose the disease [5]. CT findings of pneumonia, such as those reported in cases from the cruise ship "Diamond Princess," can be obtained in an hour or two, making CT a practical mode of diagnosis [6]. Computer imaging applications extend to almost all areas, such as brain MRI, dental imaging, and cancer detection. Neural networks have been in place for various image analyses and disease diagnoses. The most commonly used algorithm is the 2D convolutional neural network (2D CNN), which helps in COVID-19 detection using X-Ray images, skin lesion classification, eye disease detection, and several other medical applications [7]. When the images are two-dimensional, such as X-Rays, the 2D CNN works fine with great accuracy, helping assist medical practitioners. As technology progresses, imaging has moved to 3D slices from CT or MRI technologies. Traditional 2D CNNs can lead to information loss and give misclassifications that are not acceptable in medical diagnostics. Thus, 3D CNNs can help in diagnosis based on spatiotemporal modeling, so the need for image manipulation and the chances of information loss are minimal. A brief review of the literature where 3D CNNs have been used in imaging applications is given in Table 1. With the advent of the pandemic, lung imaging has increased, and the data is being collected as X-Ray images or CT image slices. There have been several works on classifying lung images and disease classification. Some of the prominent literature in the field is listed in Table 2; it includes the use of 2D and 3D CNNs for image classification. Several datasets used in the study of COVID are classified using AI models. Some of the significant datasets are given in Table 3.

Table 1 Review on 3D CNN in image classification

S. No. | Literature | Model used | Application
1 | [8] | DeCoVNet | Volume modeling
2 | [9] | 3D ResNet-18, ResNet-50, and ResNet-101 | Video datasets
3 | [10] | 3D-SqueezeNet, 3D-ShuffleNet, 3D-MobileNet-V1, and 3D-MobileNet-V2 | Complexity analysis on 3D CNN models
4 | [11] | 3D CNN | COVID-19 screening
5 | [12] | 3D CNN | Chest CT lesion localization


Table 2 Literature on image classification

S. No. | Literature | Work done | Improvements needed
1 | [13] | Transfer learning-based 2D CNN for COVID detection | Works on a small dataset, and the model was built using ImageNet data, which needed massive fine-tuning
2 | [12] | 2D binary classification on COVID acquired pneumonia based on softmax activation | Analysis is based on a small dataset. It can be extended to a multiclass classification
3 | [14] | ResNet-18 and SVM-based COVID classification. The data includes both images and non-image data | A small dataset is challenging for AI performance assessment. The trade-off for misclassification has to be handled
4 | [15] | DenseNet-121-based pneumonia classification | The accuracy of the model achieved was 80% and can be improved
5 | [16] | AlexNet model-based lung abnormalities detection based on X-Ray images | Data other than X-Ray can be worked upon with better accuracy

Table 3 Review on datasets available for COVID-19 study

S. No. | Literature | Type of dataset | Size of dataset (No. of images)
1 | [17] | COVIDx test | 13,800
2 | [18] | Proprietary data | 4356
3 | [19] | GitHub, Kaggle | 158
4 | [20] | Public dataset from Italy | 5840
5 | [21] | COVID-19 CT | 1600

Our contributions to this work are as follows:

1. A study on the datasets available for COVID-19 lung infection detection and classification
2. Building a 3D CNN model for detection of infected CT images
3. Multiclass classification of the images based on the severity of infection.

Our paper is structured as follows. Section 1 presented the introduction to the field and a review of the literature. Section 2 introduces 3D CNN, followed by the methods and materials used in our work. Section 3 gives the results and discussion of the work done. Finally, the paper is concluded with future research directions in Sect. 4.


2 Materials and Methods

2.1 Introduction to 3D CNN

In contrast to the existing 2D CNN models, in a 3D CNN the kernel moves in a three-dimensional space. The input shape of the images is (height, width, depth), to which a fourth dimension, the color channel, is added. 3D CNNs are not just for 3D inputs like CT volumes and videos; they may also be used with 2D inputs like photographs. The layers in a 3D CNN are similar to those of a 2D CNN: convolution layers, pooling, padding, dropout, flatten, and finally a dense layer with a softmax activation for output. The main difference is the working of the convolution, where the filters move through three dimensions and perform element-wise multiplication to give a feature map. A sample convolution process is given in Fig. 1. Our overall work is depicted in Fig. 2.

Fig. 1 3D CNN convolution process
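To make the added dimension concrete, the following Keras snippet (ours, not the authors' code) pushes a batch of volumetric inputs through one 3D convolution and pooling step:

```python
import tensorflow as tf

# A batch of 4 volumes: (samples, height, width, depth, channels)
volumes = tf.random.normal((4, 128, 128, 64, 1))

x = tf.keras.layers.Conv3D(64, kernel_size=3, activation="relu",
                           padding="same")(volumes)
x = tf.keras.layers.MaxPool3D(pool_size=2)(x)
print(x.shape)  # (4, 64, 64, 32, 64): spatial dims halved, 64 feature maps
```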


Fig. 2 Proposed workflow

2.2 Dataset Description

The dataset used in our work is MosMedData [22], which contains human lung computed tomography (CT) scans of anonymous patients with and without COVID symptoms. The data was collected from patients in Russia from March 2020 to April 2020. The dataset has 1110 image slices in 5 different classes. Class 0 comprises images with no signs of pneumonia; class 1, images with less than 25% infection; class 2, 25–50% infection; class 3, 50–75% infection; and finally class 4, more than 75% infection. A sample image from the dataset is shown in Fig. 3.

Fig. 3 a Image from dataset and b mask image for analysis


2.3 Data Pre-processing

The data is stored with the extension .nii, which is the NIfTI format. In the dataset, Hounsfield units (HU) are used to store raw voxel intensity from the CT images. To give the images a proper orientation and help classification, the volumes were first rotated by 90 degrees. The CT scans were then normalized using a threshold between −1000 and 400 HU as per the dataset. The images were scaled to be between 0 and 1 and resized to equal width, height, and depth (128 × 128 × 64). After pre-processing, the data was split in the ratio 70–30 for training and validation. The dataset is unbalanced; that is, there are few samples for the classes that represent high infection. Thus, to get the most out of the data, augmentation was performed to build strong training data: during training, each CT scan was rotated at random angles. Each volume is a rank-3 tensor of shape (height, width, depth); a dimension of size 1 is added at axis 4 for the 3D convolutions, so the data can be represented as (samples, height, width, depth, 1).
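A minimal sketch of this pre-processing pipeline, assuming the NIfTI volumes are read with nibabel; the helper name and the interpolation order are ours:

```python
import nibabel as nib
import numpy as np
from scipy import ndimage

def preprocess_scan(path, min_hu=-1000, max_hu=400, target=(128, 128, 64)):
    """Load a .nii volume, clip/scale HU intensities, and resize."""
    volume = nib.load(path).get_fdata()
    volume = np.clip(volume, min_hu, max_hu)            # stated HU window
    volume = (volume - min_hu) / (max_hu - min_hu)      # scale to [0, 1]
    volume = ndimage.rotate(volume, 90, reshape=False)  # fix orientation
    factors = [t / s for t, s in zip(target, volume.shape)]
    volume = ndimage.zoom(volume, factors, order=1)     # resize to target
    return volume[..., np.newaxis].astype("float32")    # channel axis at 4
```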

2.4 Model Design

The proposed model was designed with four convolution layers with filter counts of 64, 128, 256, and 256, respectively. The kernel size was chosen to be 3 × 3 × 3. For the pooling layers, max-pooling with stride 2 was used. These layers perform feature extraction; the resulting features are passed to a flatten layer and finally to a fully connected layer. To keep the network size optimal, a dropout of 30% was implemented, and the results were gathered using a softmax activation function. The output layer is designed with 2 neurons since a binary classifier was implemented. The designed 3D CNN is given in Fig. 4.

Fig. 4 Proposed 3D CNN model
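A Keras sketch of the network described in Sect. 2.4, using the stated hyper-parameters (filter counts 64/128/256/256, 3 × 3 × 3 kernels, max-pooling with stride 2, 30% dropout, softmax over 2 output neurons); the width of the dense layer is our assumption, as the paper does not specify it:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_3dcnn(input_shape=(128, 128, 64, 1), n_classes=2):
    """Four Conv3D blocks for feature extraction, then a dense classifier."""
    inputs = tf.keras.Input(shape=input_shape)
    x = inputs
    for filters in (64, 128, 256, 256):          # stated filter counts
        x = layers.Conv3D(filters, kernel_size=3, activation="relu")(x)
        x = layers.MaxPool3D(pool_size=2, strides=2)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(512, activation="relu")(x)  # dense width assumed
    x = layers.Dropout(0.3)(x)                   # stated 30% dropout
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```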


2.5 Results and Discussion

The data, after pre-processing and in proper dimensions, was visualized. A sample is given in Fig. 5. The scan images have several slices, and these slices together form the input for visualizing the entire lung scan. The work visualizes a montage of the slices, as given in Fig. 6. The performance of the model in terms of accuracy is represented as follows. An early-stopping strategy was implemented in order to avoid model overfitting; thus, as shown in Fig. 7, the model achieved validation accuracy close to the training accuracy.
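The early-stopping strategy mentioned above can be expressed with a standard Keras callback; the patience value here is an assumption, as the paper does not state it:

```python
import tensorflow as tf

# Stop when validation loss stalls, and roll back to the best weights seen
stopper = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5,        # patience value assumed
    restore_best_weights=True)
# model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[stopper])
```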

Fig. 5 Pre-processed image for classification

Fig. 6 Slices of CT image


Fig. 7 Model accuracy

The proposed model was compared with the existing prebuilt models, and the comparisons are shown in Table 4. The same work was extended to a multiclass classification where the images classified as infected are given to the classifier to determine the severity of infection. The input scans are classified into 5 classes ranging from 0 to more than 75% infection. The confusion matrix of the multiclass classification is given in Table 5. The following are the evaluation parameters used to validate the proposed model:

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)

Precision = TP / (TP + FP)    (2)

Table 4 Comparison with existing state-of-the-art algorithms

S. No. | Model | Accuracy (%)
1 | AlexNet | 89.4
2 | VGG-16 | 92.6
3 | VGG-19 | 93.2
4 | SqueezeNet | 90.5
5 | GoogleNet | 92.7
6 | Proposed CNN model | 93.5

The data given in bold represents the performance of the proposed model. The other data in the table are the performance of the existing models


Table 5 Confusion matrix of the classifier

       | Class1 | Class2 | Class3 | Class4 | Class5
Class1 | 240    | 11     | 2      | 1      | 0
Class2 | 0      | 676    | 6      | 2      | 0
Class3 | 0      | 18     | 89     | 10     | 0
Class4 | 0      | 0      | 6      | 39     | 0
Class5 | 0      | 0      | 0      | 0      | 2

Recall = TP / (TP + FN)    (3)

Kappa Accuracy = (TA − RA) / (1 − RA)    (4)

where TP, TN, FP, and FN represent the True Positive, True Negative, False Positive, and False Negative values from the model, TA represents the Total Accuracy, and RA represents the Random Accuracy. Based on the above formulae, the precision and recall for each of the classes are given in Table 6. The overall accuracy of the work is 94.234%, and the kappa value is 0.894. In addition to these parameters, the evaluation parameter Intersection over Union (IoU) was considered. This is computed using Eq. (5):

IoU = Overlap Area / Union Area    (5)
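As a worked illustration of Eqs. (1)–(4), the overall and per-class metrics can be derived directly from a confusion matrix such as Table 5. The numpy sketch below is ours; small differences from the reported figures may stem from rounding or from how the authors aggregated the classes:

```python
import numpy as np

cm = np.array([[240, 11, 2, 1, 0],      # confusion matrix from Table 5
               [0, 676, 6, 2, 0],       # rows: actual, columns: predicted
               [0, 18, 89, 10, 0],
               [0, 0, 6, 39, 0],
               [0, 0, 0, 0, 2]])

total = cm.sum()
accuracy = np.trace(cm) / total                  # Eq. (1), multiclass form
precision = np.diag(cm) / cm.sum(axis=0)         # Eq. (2), per class
recall = np.diag(cm) / cm.sum(axis=1)            # Eq. (3), per class
# Eq. (4): random accuracy RA estimated from the row/column marginals
random_acc = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2
kappa = (accuracy - random_acc) / (1 - random_acc)
```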

The ground truth image was a part of the dataset, and the proposed detection model was able to detect most of the area correctly. The sample output of the same is shown in Fig. 8. The IoU obtained in our work was 90.4. As shown in the figure, the red area shows the infections that are a part of the ground truth and the detection from the proposed model. It can be seen that the proposed model was able to detect the infected area with high accuracy.

Table 6 Evaluation parameters of the model

Class  | Precision (%) | Recall (%)
Class1 | 94.48 | 100.00
Class2 | 98.83 | 95.88
Class3 | 71.20 | 86.41
Class4 | 86.67 | 65.00
Class5 | 100.00 | 100.00


Fig. 8 Classified image and ground truth

3 Conclusions and Future Scope

Our proposed work gives an organized approach to detecting lung infections from CT images and predicting the severity of infection from the scan data. The proposed work uses the mask values from the dataset to predict the infected lung segments. The proposed 3D CNN gave an accuracy of 93.5%, which was better than the state-of-the-art algorithms for image classification. The model was also validated on other parameters, wherein the constructed model gave an average precision of 90.23% and an average recall of 89.45%. The parameter chosen to evaluate the infected-area classification of the lung was IoU, which was 90.4. The proposed work can still be extended to various other models and fine-tuned to improve accuracy. In addition, the work can be extended to more extensive data, classifying the images using an ensemble method.

Acknowledgements The authors gratefully acknowledge the authorities of CHRIST (Deemed to be University) and Thiagarajar College of Engineering, Madurai, for the facilities offered to carry out this work.


References
1. Ai T, Yang Z, Hou H, Zhan C, Chen C, Lv W, Xia L (2020) Correlation of chest CT and RT-PCR testing for coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases. Radiology 296(2):E32–E40
2. Li Y, Yao L, Li J, Chen L, Song Y, Cai Z, Yang C (2020) Stability issues of RT-PCR testing of SARS-CoV-2 for hospitalized patients clinically diagnosed with COVID-19. J Med Virol 92(7):903–908
3. Wang W, Xu Y, Gao R, Lu R, Han K, Wu G, Tan W (2020) Detection of SARS-CoV-2 in different types of clinical specimens. JAMA 323(18):1843–1844
4. Zou L, Ruan F, Huang M, Liang L, Huang H, Hong Z, Yu J, Kang M, Song Y, Xia J, Guo Q, Wu J (2020) SARS-CoV-2 viral load in upper respiratory specimens of infected patients. New England J Med 382(12):1177–1179
5. Fang Y, Zhang H, Xie J, Lin M, Ying L, Pang P, Ji W (2020) Sensitivity of chest CT for COVID-19: comparison to RT-PCR. Radiology 296(2):E115–E117
6. Inui S, Fujikawa A, Jitsu M, Kunishima N, Watanabe S, Suzuki Y, Umeda S, Uwabe Y (2020) Chest CT findings in cases from the cruise ship Diamond Princess with coronavirus disease (COVID-19). Radiol: Cardiothorac Imaging 2(2):e200110
7. Serte S, Serener A, Al-Turjman F (2020) Deep learning in medical imaging: a brief review. Trans Emerg Telecommun Technol e4080
8. Zheng C, Deng X, Fu Q, Zhou Q, Feng J, Ma H, Liu W, Wang X (2020) Deep learning-based detection for COVID-19 from chest CT using weak label. MedRxiv
9. Hara K, Kataoka H, Satoh Y (2018) Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet? In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6546–6555
10. Kopuklu O, Kose N, Gunduz A, Rigoll G (2019) Resource-efficient 3D convolutional neural networks. In: Proceedings of the IEEE/CVF international conference on computer vision workshops
11. Han Z, Wei B, Hong Y, Li T, Cong J, Zhu X, Wei H, Zhang W (2020) Accurate screening of COVID-19 using attention-based deep 3D multiple instance learning. IEEE Trans Med Imaging 39(8):2584–2594
12. Wang X, Deng X, Fu Q, Zhou Q, Feng J, Ma H, Liu W, Zheng C (2020) A weakly-supervised framework for COVID-19 classification and lesion localization from chest CT. IEEE Trans Med Imaging 39(8):2615–2625
13. He X, Yang X, Zhang S, Zhao J, Zhang Y, Xing E, Xie P (2020) Sample-efficient deep learning for COVID-19 diagnosis based on CT scans. MedRxiv
14. Mei X, Lee HC, Diao KY, Huang M, Lin B, Liu C, Xie Z, Ma Y, Robson PM, Chung M, Bernheim A, Yang Y (2020) Artificial intelligence-enabled rapid diagnosis of patients with COVID-19. Nature Med 26(8):1224–1228
15. Harmon SA, Sanford TH, Xu S, Turkbey EB, Roth H, Xu Z, Turkbey B (2020) Artificial intelligence for the detection of COVID-19 pneumonia on chest CT using multinational datasets. Nat Commun 11(1):1–7
16. Bhandary A, Prabhu GA, Rajinikanth V, Thanaraj KP, Satapathy SC, Robbins DE, Shasky C, Zhang YD, Tavares JM, Raja NSM (2020) Deep-learning framework to detect lung abnormality – a study with chest X-Ray and lung CT scan images. Pattern Recognit Lett 129:271–278
17. Wang L, Lin ZQ, Wong A (2020) COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images. Sci Rep 10(1):1–12
18. Li L, Qin L, Xu Z, Yin Y, Wang X, Kong B, Bai J, Lu Y, Fang Z, Song Q, Cao K, Xia J (2020) Artificial intelligence distinguishes COVID-19 from community acquired pneumonia on chest CT. Radiology
19. Sethy PK, Behera SK (2020) Detection of coronavirus disease (covid-19) based on deep features
20. Kumar R, Arora R, Bansal V, Sahayasheela VJ, Buckchash H, Imran J, Raman B (2020) Accurate prediction of COVID-19 using chest X-Ray images through deep feature learning model with SMOTE and machine learning classifiers. MedRxiv
21. Fan DP, Zhou T, Ji GP, Zhou Y, Chen G, Fu H, Shen J, Shao L (2020) Inf-Net: automatic COVID-19 lung infection segmentation from CT images. IEEE Trans Med Imaging 39(8):2626–2637
22. Morozov SP, Andreychenko AE, Pavlov NA, Vladzymyrskyy AV, Ledikhova NV, Gombolevskiy VA, Blokhin IA, Gelezhe PB, Gonchar AV, Chernina VY (2020) MosMedData: chest CT scans with COVID-19 related findings dataset. arXiv preprint arXiv:2005.06465

A Hybrid Architecture for Action Recognition in Videos Using Deep Learning

Kakarla Ajay Kumar Reddy, Ch. Vijayendra Sai, Sundeep V. V. S. Akella, and Priyanka Kumar

Abstract Recognition of human actions in videos is a difficult task that has received a great deal of attention within the research community. The actions of the people in a video sequence are symbols that describe the physical aspects and the displacement of people and objects in it. Human activity recognition is used in many applications including surveillance, anti-terrorism, anti-crime protection, and health care and assistance. The challenge is to find relevant details in the visual appearance of videos and the motion between frames. This article presents an integrated learning model that learns to differentiate human actions in videos. The process begins by examining the effect of various existing neural network structures used to understand static frames in action perception; the best and most appropriate model for action recognition is then identified from various neural network architectures such as VGG-16, based on a variety of metrics. Finally, the selected architecture is trained using the UCF-101 dataset, and the outputs obtained are tested for further improvement.

Keywords VGG-16 · Neural network · Deep learning

K. A. K. Reddy (B) · Ch. V. Sai · S. V. V. S. Akella · P. Kumar Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore 500085, India e-mail: [email protected] Ch. V. Sai e-mail: [email protected] S. V. V. S. Akella e-mail: [email protected] P. Kumar e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_23


1 Introduction

With a huge amount of video data available in many domains, recognition of human actions in videos is important and also challenging. Human action recognition is employed in several applications like surveillance and anti-crime security, and for assistance in daily life and work. The challenging aspect is to capture the relevant appearance information from individual frames of videos and the displacement between frames. Analyzing a person's activity is by no means just a matter of identifying patterns of movement among various body parts; it is an important part of analyzing and understanding human behavior, which matters in many fields such as police investigation, robotics, health care, video surveillance, and human-computer interaction. Video data also brings additional complications, e.g., frame-by-frame jitter. In this article, a novel hybrid architecture for activity recognition from videos using a bidirectional LSTM and a VGG-16 network has been proposed. The proposed model can learn long sequences and can process prolonged videos by understanding features in a selected interval. Recognition accuracy shows the efficacy of the proposed algorithm.

2 Literature Survey

Though many researchers have developed and presented algorithms for action recognition, a few significant works are discussed here. In [1] the authors suggested a useful way to extract powerful features, which essentially determine the quality of action perception. After their RGB depth stationary image classification, the authors expanded convolutional neural networks (CNNs) to consider the processing of motion information for action detection. The performance of the VGG-16 model on images and the new architecture proposed by the authors of the paper on the UCF-101 database provided 91% accuracy. A simulation method for novel actions based on a 2-D wavelet and watermark embedding is proposed in [2]. The authors used a Deep Belief Network (DBN) and the Discrete Cosine Transform for data understanding and feature extraction. Their method with watermark embedding achieved a test set accuracy of 94.3% on the KTH dataset. A two-stream ConvNet architecture that separates the action recognition task into two networks, spatial and temporal, is explained in [3]. The authors showed that a ConvNet trained on dense optical flow over multiple frames is able to achieve excellent performance even with limited training data, aiming to prove that their model performs far better than traditional models with hand-crafted features. The architecture is trained and its performance measured on two video action datasets: first on UCF-101 and then on HMDB51. The accuracy obtained for the individual spatial network was around 72.75% and for the temporal network around 81.2%.


In [4, 5] the researchers reviewed various techniques for three types of datasets: multiple viewpoints, single viewpoint, and videos with RGB-depth. The authors also outlined future directions of work in the above-named techniques. In this work, the authors proposed a fully automatic deep learning model that is able to classify actions performed by humans without utilizing any prior knowledge. The primary process in their scheme is based on the extension of convolutional neural networks to three-dimensional space, which can learn spatial and temporal features on its own. The researchers then trained a recurrent neural network to classify every sequence by taking into account the evolution of the learned temporal features at every time step. In the methodology discussed in [6], the authors implemented an algorithm in which a model trained for detecting objects searches for the existence of objects already reported. After this, control is transferred to a model that can detect events, and it awaits the occurrence of an event that may be object placement or object removal. Only if some event is detected does the object detector become operational. A basic training technique in HSV color space using a single color camera makes it a consumer application. The experimental findings show the viability of the approach proposed in [6]. The authors present a novel strategy in [7] for recognizing human activities from a progression of video frames. They use three kinds of measures, Region, Speed, and Direction (RSD), which are capable of recognizing the majority of common activities despite the spatiotemporal variability between subjects. The direction-based approach gives less precise outcomes because of the variability of activity patterns between subjects. To recognize an activity using the RSD code, the authors gave importance to the Region, Speed, and Direction factors; combined, these factors provide an improved outcome for recognizing activities. This technique is free of occlusions, positional errors, and missing data. The outcomes of the method are comparable to the results of existing human activity recognition algorithms [7]. In [8] the researchers designed a feature extraction model over the temporal and spatial dimensions using 3D convolutions, consequently capturing the motion information encoded in multiple adjacent frames. From the input frames the model produces multiple channels of information, and the final feature representation combines information from all channels. In [9], the authors propose factorized spatio-temporal convolutional networks (FstCN) that factorize the original 3D convolution kernel learning as a sequential process of learning 2D spatial kernels in the lower layers, followed by learning 1D temporal kernels in the upper layers (called temporal convolutional layers). They also introduce a novel scheme of transformation and permutation operators to make the FstCN factorization possible. In addition, to address the issue of sequence alignment, they propose an effective training and inference strategy based on sampling multiple video clips from a given action video sequence [9, 10].

After a study of the research articles published by various authors on activity recognition, considering different architectures like CNN-LSTM, 3D CNN, and two-stream CNN [11–14], a new architecture has been proposed here by extending VGG-16 with a bidirectional LSTM.


2.1 Data Set

The design is implemented using the UCF-101 dataset for the presented activity identification algorithm. The dataset consists of a total of 101 action classes and 13,320 clips. It comprises videos from the web recorded in uncontrolled conditions, which prominently include camera motion, differing lighting conditions, partial occlusion, low-quality frames, etc. The videos in this dataset are broadly classified into five distinct categories:

1. Human-Object Interaction
2. Body-Motion Only
3. Human-Human Interaction
4. Playing Musical Instruments
5. Sports

The reasons behind choosing this dataset are as follows:

1. It includes a large number of classes, which makes it comparatively larger than other datasets.
2. The dataset comprises uncontrolled videos downloaded from YouTube, with challenges like poor lighting, cluttered backgrounds, etc.

3 Proposed Architecture for Activity Identification

In this novel work, a new and advanced design has been introduced for activity identification from videos using a bidirectional LSTM and VGG-16 network.

3.1 Architecture Diagram

The proposed approach uses VGG-16 to extract features from the individual frames of the video; the sequence of frame features is then passed into a bidirectional LSTM network for classification. Figure 1 shows the proposed architecture. The algorithm encodes each timestep of the video as a collection of feature maps by passing it through a VGG-16 network, and then passes the output feature maps to a bidirectional LSTM to encode them along the temporal dimension of the video, making one pass forward in time and another backwards. Finally, this representation is passed on to a classifier to identify the human action. At each step, access to both future and past inputs from the current state allows the bidirectional LSTM to recognize the context of the current input, granting higher accuracy on non-homogeneous and sophisticated datasets.

Frame Extraction from Videos. Initially, frames are extracted from the input video to pass them to the VGG-16 model.


Fig. 1 Detailed understanding of the architectural diagram

The challenge in this module is computing the number of keyframes for a particular video sequence. The steps involved are shown below:

1. Read the video file.
2. Extract frames for every one second (for example, a one-minute video provides sixty frames or images).
3. Save each extracted image as a single frame.

Feature Extraction Using VGG-16. The existing VGG-16 model is imported from TensorFlow Keras. The image module is imported to process the image object, and the preprocess_input module is imported to scale pixel values as expected by the VGG-16 model. A NumPy module is imported for array handling; the VGG-16 model is then loaded with weights pre-trained on the ImageNet dataset. The VGG-16 design is a series of five convolutional blocks followed by fully connected (dense) layers. Firstly, the additional non-linear activation functions that the VGG-16 model has compared to other models give the network the ability to converge faster. Secondly, the 3 × 3 convolutions that are consistently present across the network make the network very simple and easy to work with. Since we need a model with good performance to extract features from the frames of the video, we have implemented the widely used VGG-16 model. The include_top argument lets us choose whether we need the final dense layers or not; False indicates that the final dense layers are not included when loading the model. When shaping the model, the input image is resized to the expected size of the model, in this case 224 × 224. The max-pooling layer output is flattened into the feature vector transmitted to the bidirectional LSTM network.
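A sketch of the frame-extraction and feature pipeline in Python with OpenCV and Keras; this illustrates the steps described above rather than reproducing the authors' exact code (function and variable names are ours):

```python
import cv2
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

# Convolutional base only: include_top=False drops the final dense layers;
# pooling="max" returns the globally max-pooled 512-dimensional vector
vgg = VGG16(weights="imagenet", include_top=False,
            input_shape=(224, 224, 3), pooling="max")

def video_features(path):
    """Extract roughly one frame per second and encode each with VGG-16."""
    cap = cv2.VideoCapture(path)
    fps = int(cap.get(cv2.CAP_PROP_FPS)) or 1
    feats, i = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % fps == 0:                              # one frame per second
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            frame = cv2.resize(frame, (224, 224))
            batch = preprocess_input(frame[np.newaxis].astype("float32"))
            feats.append(vgg.predict(batch, verbose=0)[0])
        i += 1
    cap.release()
    return np.stack(feats)                            # (timesteps, 512)
```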


Bidirectional LSTM-Based Action Recognition. In a bidirectional LSTM, the output at each time step depends not solely on previous frames but additionally on future frames. Bidirectional RNNs are straightforward: they are structured as two RNNs, one stacked atop the other, where one RNN processes the sequence in the forward direction and the other in reverse. The combined result is calculated based on the hidden state of each RNN. In this proposed work, a pair of LSTM layers for the forward and backward passes has been adopted.

3.2 Implementation

The following steps describe the structure of this proposed model, which has been implemented using Keras.

1. Split the video into frames.
2. Use the VGG-16 model to extract useful features from the individual frames of the video.
3. Pass these features to the pre-trained bidirectional LSTM model to obtain a prediction.
4. Show the predicted class and the actual class that the video belongs to.

The VGG-16 model, which has proven to be an excellent model for image classification, coupled with the bidirectional LSTM layers, which allow the design to have both backward and forward information about the frames of the video at every time step, proved to give better results in contrast to using normal LSTM layers.
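A Keras sketch of the classifier described by these steps, assuming 512-dimensional VGG-16 feature sequences from the previous stage; the LSTM width is our assumption, while the Adam optimizer and the L2 constant of 0.001 follow the best configuration reported in Sect. 4:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_classifier(timesteps, feat_dim=512, n_classes=15):
    """Bidirectional LSTM over per-frame VGG-16 feature vectors."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(timesteps, feat_dim)),
        layers.Bidirectional(                       # forward + backward LSTM
            layers.LSTM(128, kernel_regularizer=regularizers.l2(0.001))),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

The Bidirectional wrapper runs one LSTM forward and one backward over the frame sequence and concatenates their final states, which provides the two-directional context described above.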

4 Results and Discussion

Testing has been performed on the video samples by varying the number of test cases from the dataset. All the results in this work have been obtained by running the model on 15 different classes. Table 1 displays the accuracy of class prediction obtained by varying the testing sample size. As observed from Table 1, the best train-test split was found to be the 80:20 split, as the 70:30 and 60:40 splits did not give better accuracy. Thus 80:20 has been decided as the final split mode for this proposed work. This proposed model has been tested using various optimizers, as shown in Table 2.

Table 1 Percentage of testing data and accuracy obtained

% of testing data | Accuracy (%)
20 | 86
30 | 78
40 | 77

Table 2 Performance of optimizers

Optimizer | % of testing data | Accuracy (%)
RMSProp | 20 | 81
RMSProp | 30 | 79
Adam | 20 | 84
Adam | 30 | 82
Adam | 40 | 78

Table 3 Accuracy of various regularization techniques and constants

Regularization technique | Accuracy (%)
L1 regularization (constant value = 0.0001) | 84
L2 regularization (constant value = 0.0001) | 82
L1 regularization (constant value = 0.001) | 75
L2 regularization (constant value = 0.001) | 90

From Table 2, it may be inferred that the Adam optimizer with an 80:20 train-test split gave the best accuracy.

Regularization: Regularization is a method that makes minor adjustments to the learning process so that the model generalizes better. This in effect also improves the performance of the model on unseen data. It helps prevent over-fitting by adding a weight term to the loss calculation, ensuring weights do not grow beyond a value determined by the specified lambda value. In this proposed work, L1 and L2 regularization techniques have been used. L1 regularization, or Lasso regression, helps in selecting important features, which improves feature selection, and L2 regularization, or Ridge regression, helps in avoiding over-fitting. The results of Table 3 were obtained by running the model on the best split obtained previously, 80:20, with the Adam optimizer. From the table, it may be inferred that the L2 regularizer with a constant value equal to 0.001 gives the best accuracy.

From the accuracy plot shown in Fig. 2, it can be observed that the training accuracy and validation accuracy keep increasing as the epochs increase. By this, we can say that the model has not yet over-learned the training dataset, showing comparable accuracy on both datasets. From the loss plot shown in Fig. 3, we conclude that the training loss keeps decreasing as the number of epochs increases. Another observation is that when the number of epochs gets closer to 20, the validation loss remains almost constant, suggesting that 20 is the optimum number of epochs, beyond which we would overfit the model. Since we are using categorical cross-entropy as the loss function for our model, we can observe that the loss value is greater than 1 initially. Additionally, Table 4 shows the Precision and Recall values of our model for five action categories.
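Returning to the regularization comparison in Table 3: in Keras, the two penalties map onto standard regularizer objects attached to a layer's kernel. A minimal sketch (ours, not the authors' code), using the tested lambda values:

```python
from tensorflow.keras import layers, regularizers

# L1 (Lasso) and L2 (Ridge) weight penalties with the constants from Table 3
dense_l1 = layers.Dense(128, kernel_regularizer=regularizers.l1(0.0001))
dense_l2 = layers.Dense(128, kernel_regularizer=regularizers.l2(0.001))  # best
```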


Fig. 2 Accuracy plot of the model after regularization

Fig. 3 Loss plot of the model after regularization

Table 4 Precision and recall values for five action categories

Category | Precision | Recall
Archery | 0.832 | 0.924
Basketball | 0.920 | 0.948
BrushingTeeth | 0.659 | 0.824
Biking | 0.969 | 0.948
CuttingInKitchen | 0.862 | 0.964


We can observe from the table that the Precision and Recall values are lower for the BrushingTeeth category than for outdoor activities like Biking and Basketball. This is due to the poor lighting and lower quality of the videos present in the dataset for the BrushingTeeth category compared to Biking and Basketball, where the videos are taken outdoors in good lighting with better video cameras. In the future, some preprocessing could be done to artificially improve the lighting and quality of videos belonging to categories like BrushingTeeth and ShavingBeard, which are typically recorded in poor lighting conditions.

5 Conclusion

The proposed concept is a deep, intelligent, self-learning model that can be used for the identification and categorization of individual actions by extracting frame-level features with VGG-16 and processing them through bidirectional LSTM layers. This helps to identify the complex sequential patterns hidden in the frames, which is useful for reading complex long sequences in videos. The proposed method extracts features from the entire video frame. This model can be extended to analyze the key independent regions for action perception.

References
1. Ma S, Bargal SA, Zhang J, Sigal L, Sclaroff S (2017) Do less and achieve more: training CNNs for action recognition utilizing action images from the web. Pattern Recogn 68:334–345
2. Ali KH, Wang T (2014) Learning features for action recognition and identity with deep belief networks. In: 2014 international conference on audio, language and image processing. IEEE, pp 129–132
3. Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. arXiv preprint arXiv:1406.2199
4. Baccouche M, Mamalet F, Wolf C, Garcia C, Baskurt A (2011) Sequential deep learning for human action recognition. HBU
5. Wu D, Sharma N, Blumenstein M (2017) Recent advances in video-based human action recognition using deep learning: a review. In: 2017 international joint conference on neural networks (IJCNN), pp 2865–2872
6. KrishnaKumar P, Parameswaran L (2013) A hybrid method for object identification and event detection in video. In: 2013 fourth national conference on computer vision, pattern recognition, image processing and graphics (NCVPRIPG), pp 1–4
7. Geetha M, Anandsankar B, Nair LS, Amrutha T, Rajeev A (2014) An improved human action recognition system using RSD code generation. In: Proceedings of the 2014 international conference on interdisciplinary advances in applied computing, pp 1–9
8. Ji S, Xu W, Yang M, Yu K (2012) 3D convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35(1):221–231
9. Sun L, Jia K, Yeung DY, Shi BE (2015) Human action recognition using factorized spatio-temporal convolutional networks. In: Proceedings of the IEEE international conference on computer vision, pp 4597–4605
10. Veeriah V, Zhuang N, Qi GJ (2015) Differential recurrent neural networks for action recognition. In: Proceedings of the IEEE international conference on computer vision, pp 4041–4049
11. Chéron G, Laptev I, Schmid C (2015) P-CNN: pose-based CNN features for action recognition. In: Proceedings of the IEEE international conference on computer vision, pp 3218–3226
12. Vinayakumar R, Soman KP, Poornachandran P (2017) Applying deep learning approaches for network traffic prediction. In: 2017 international conference on advances in computing, communications and informatics (ICACCI). IEEE, pp 2353–2358
13. Varma VSKP, Adarsh S, Ramachandran KI, Nair BB (2018) Real time detection of speed hump/bump and distance estimation with deep learning using GPU and ZED stereo camera. Proc Comput Sci 143:988–997
14. Kanishka ND, Bagavathi SP (2015) Learning of generic vision features using deep CNN. In: 2015 fifth international conference on advances in computing and communications (ICACC). IEEE, pp 54–57

Data Envelopment Analysis: A Tool for Performance Evaluation of Undergraduate Engineering Programs

Vaidehi Bhaskara, K. T. Ramesh, and Sayan Chakraborty

Abstract Organizations strive to achieve optimum performance levels through higher efficiency of operations. Data envelopment analysis (DEA) is a technique employed to measure and evaluate efficiency, specifically in not-for-profit organizations. DEA is a data-oriented, nonparametric, linear programming methodology in operations research that is used to measure the relative efficiency of organizational units, called decision-making units (DMUs), within an organization. In this research, DEA was applied to various undergraduate departments in a private engineering college in Bengaluru, India, to measure their relative efficiencies over a two-year period (2018–2020). Twelve departments (DMUs) and four variables were selected to carry out this analysis. The DEA model was designed using R programming, and the final results were compared with Banxia's Frontier Analyst software. Results indicated that four of the twelve DMUs showed maximum efficiency.

Keywords Data envelopment analysis · Relative efficiency · Engineering colleges · Decision-making units · Performance measurement

1 Introduction

1.1 Data Envelopment Analysis

Data envelopment analysis is a non-parametric, linear programming, and data-oriented methodology in operations research that is used to measure the relative efficiencies of organizational units called decision-making units (DMUs). A DMU

323

324

V. Bhaskara et al.

is simply a group of individuals involved in the decision-making activities of an organization. The definition of a DMU is generic and flexible. Charnes et al. [1] proposed the concept of data envelopment analysis and explained that the relative efficiency of a DMU is the ratio of the maximum weighted output to the weighted input. The weights given for every single input and output are assigned by solving the linear programming problem. A common set of weights is not used for the inputs and outputs. Instead, DEA recognizes each DMU and values the inputs and outputs differently. The final optimal solution gives the relative efficiency for each DMU, and an efficiency score of one indicates that the particular DMU is relatively efficient. Any score less than one represents inefficiency. The formulation proposed by Charnes et al. [1] to measure the efficiency of each DMU is given by:

Max h_0 = (Σ_{r=1}^{s} u_r y_{r0}) / (Σ_{i=1}^{m} v_i x_{i0})    (1)

Subject to:

(Σ_{r=1}^{s} u_r y_{rj}) / (Σ_{i=1}^{m} v_i x_{ij}) ≤ 1, where j = 1, ..., n    (2)

u_r, v_i ≥ 0, where r = 1, ..., s and i = 1, ..., m    (3)

Equation (1) provides the objective function for efficiency, subject to the two constraints represented in Eqs. (2) and (3). y_{rj} and x_{ij} are the outputs and inputs, respectively, of the jth DMU, and u_r, v_i are the variable weights.
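For readers working outside R, the CCR model of Eqs. (1)–(3) can be sketched with scipy's linear-programming solver via the standard Charnes-Cooper transformation, which fixes the weighted input of the evaluated unit at 1 and turns the ratio objective into a linear one. This Python sketch is ours, not the authors' R implementation; the two example rows are illustrative department values taken from the input data presented later in Table 2:

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, j0):
    """CCR (multiplier-form) efficiency of DMU j0.
    X: (n, m) input matrix, Y: (n, s) output matrix."""
    n, m = X.shape
    s = Y.shape[1]
    # Decision variables z = [u_1..u_s, v_1..v_m]; maximize u . y_j0
    c = np.concatenate([-Y[j0], np.zeros(m)])
    # Linearized ratio constraints (2): u . y_j - v . x_j <= 0 for all j
    A_ub = np.hstack([Y, -X])
    b_ub = np.zeros(n)
    # Charnes-Cooper normalization: v . x_j0 = 1
    A_eq = np.concatenate([np.zeros(s), X[j0]])[np.newaxis]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=(0, None), method="highs")
    return -res.fun                      # efficiency score in (0, 1]

# Two illustrative DMUs (inputs: Ph.D. professors, faculty-student ratio;
# outputs: selected publications, MoUs signed)
X = np.array([[5.0, 6.0], [15.0, 5.63]])
Y = np.array([[11.0, 2.0], [90.0, 6.0]])
print([round(ccr_efficiency(X, Y, j), 3) for j in range(2)])
```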

1.2 DEA Versus Conventional Efficiency Approaches

The main drawback of conventional efficiency calculation methods is that they do not provide an overall efficiency score for units that have multiple inputs and outputs within a homogeneous set. Assigning an equal set of weights to all the inputs and outputs to determine relative effects was observed to be difficult (each variable should instead adopt its own set of weights). Thus, Charnes et al. [1] described a few characteristics of DEA which differentiate it from the conventional efficiency approach, as follows:

• DEA considers multiple inputs and multiple outputs to calculate a single relative efficiency score.
• A causal relationship has to exist between the inputs and outputs being measured.
• The source and level of inefficiency for each input and output identified for each DMU is generated.
• Each unit of every input and output variable is identical across all DMUs.


• Relative efficiency is measured by comparing it with the best units, even if such units may not be absolutely efficient.
• DEA is feasible even when there exists more than one factor contributing to the organization's success.

1.3 Review of Literature

A systematic literature review of various publications on the applications of DEA revealed the implementation of DEA as an effective performance measurement tool in a wide range of not-for-profit organizations in fields like education, transportation, hospitality, agriculture, and finance. Bhagavath [2] applied DEA to evaluate the efficiency of 44 state road transport undertakings in India, and results indicated that those transport undertakings that operated as individual entities performed better than the rest. An output-oriented DEA model was designed by Neves and Lourenco [3] to analyze performance improvement strategies for a set of hotel companies; those companies that concentrated on their core business and productivity rather than diversification or scale performed better. Additionally, Paco and Perez [4] applied DEA to increase the competitive environment among hotel services. With respect to agriculture, Sendhil et al. [5] discussed topics like components of efficiency, types of DEA frontiers, the cost efficiency method, and steps in DEA analysis. Bakshi and Sinha [6] explored the application of DEA to the financial sector, where the efficiency of public sector banks in India was evaluated; technical efficiency and DEA efficiency were measured and compared using different scales. Specifically addressing the area of education, Taylor and Harris [7] examined seven DEA models for universities in South Africa by varying input variables and maintaining constant output variables. Similarly, Kiong et al. [8] tested several secondary schools in Malaysia for efficiency analysis by considering both endogenous and exogenous variables based on overall student performance. A more recent application of DEA in the education stream in the context of India was carried out by Pai et al. [9], where DEA was applied at a premier business school in India. Here, two input and two output variables were considered, and this study highlighted that reducing the number of inputs for underperforming DMUs improves performance. With an understanding of the current literature, this research proposes the utilization of DEA for the undergraduate engineering programs in one of the top engineering academic institutions in Bengaluru, India. This study considered 12 undergraduate engineering departments for performance analysis, where each department individually served as a DMU. The objective was to identify departments with the highest relative efficiencies and benchmark them for the lower-performing departments for relative improvement.


2 Problem Definition and Requirement Analysis

Measurement of relative efficiencies of decision-making units (DMUs) within an organization requires a systematic approach that factors in the multiple inter-related input and output parameters, in order to analyze and improve the overall performance of the organization. This study designed a framework to measure the relative efficiencies of various departments (DMUs) using a multiple-input, multiple-output model. A digital application using RStudio was created to carry out a data envelopment analysis to identify DMUs with the highest relative efficiencies and suggest corresponding improvements to the underperforming departments in order to meet the relative benchmark. This study's requirement analysis mainly comprised the appropriate selection of inputs and outputs to obtain an effective DEA model. The ranking parameters used by India's National Institutional Ranking Framework (NIRF) [10] for engineering colleges were closely studied to identify various inputs and outputs that could be used to measure the relative performances of the departments in this study. A causal relationship was also mapped for each of the sets of inputs and outputs to select the set with the highest correlation among variables. Using the Delphi technique [11], and based on the data available from the secondary data sources, the two most effective input and output variables were chosen. With this two-input, two-output selection, the relevant data was collected and analyzed.

3 Data Collection and Validation

Data collection for an efficient DEA model includes selection of inputs, outputs, and DMUs that meet the following conditions:

• The inputs and outputs should have a causal relationship.
• For an efficient model, the number of DMUs should be three times the number of inputs and outputs.

Various studies point out the basis on which the number of units may be selected. Bowlin [12] mentions the need to have three times as many DMUs as there are input and output variables. Extending this to our research, we have considered this recommendation. Given that we have two inputs and two outputs, Bowlin's rule fits well with our study, as the college in our focus has 12 departments. On analyzing the parameters used by NIRF to rank universities in India, the Delphi technique was employed to finalize the inputs and outputs, as represented in Table 1. As established before, the underlying principle of DEA is linear programming. This study's DEA was carried out using RStudio, and the obtained results were compared with Banxia's Frontier Analyst software. While weights may be used for a DEA, for the sake of simplicity and to ensure that biasing does not influence the veracity of the model, this study assumed equal weights.


Table 1 Basis of compiling data for DEA

Variable | Description | Basis of calculation
Input (X1) | No. of Ph.D. professors | Total number of professors with a Ph.D. degree in each department
Input (X2) | Faculty-student ratio | Ratio of annual student intake in each department to the total number of professors in the department
Output (Y1) | No. of selected publications | Total number of conference/journal papers published by the faculty in collaboration with other faculty, experts, and students
Output (Y2) | No. of MoUs signed | Total number of MoUs signed by each department with other companies and/or universities

4 Solution Design

4.1 Data Normalizing

In order to create the model on R, a compiled dataset was created in MS Excel. Data was collected from the college website over a three-year period (2018–2020), and included names of individual DMUs (in this case, each of the 12 undergraduate engineering departments being analyzed), the two input variables, (i) Number of Ph.D. Professors in the department and (ii) Faculty-Student Ratio, and the two output variables, (i) Number of Selected Publications and (ii) Number of MoUs Signed.

4.2 Importing Data into R and Installing Packages

The data from MS Excel was imported to RStudio. In order to perform data envelopment analysis in R, various in-built packages may be used. The R packages used in this study were 'Benchmarking' and 'Psych'. 'Benchmarking' was used to carry out the DEA by identifying and displaying the relative efficiencies of each DMU, generating frontier plots to represent the convexity, and benchmarking departments with respect to each other. 'Psych', on the other hand, was used to describe statistical parameters like mean, variance, range, etc., that aided in understanding the data and making suitable inferences.

4.3 Designing the DEA Model in R

The DEA program in R was created using functions from the 'Benchmarking' package such as 'eff' (to compute the relative efficiencies), 'ccr' (used for the constant-returns-to-scale mode of this DEA), 'dea.plot' (to plot the efficiency frontiers), and


'data.frame' (to display results in tabular form) [13]. The main function pertinent to the non-parametric feature of DEA was 'shapiro.test' [14], which calculated the p-value used to test normality. Functions from the 'psych' [15] package like 'summary' (to display statistical results like mean, median, and quartiles) and 'describe' (to show variable-specific descriptive statistics like mean, range, skew, and kurtosis) were used to bring a statistical view of the data being analyzed. The code written for this DEA study is presented in Appendix 1.
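Since the appendix listing is partially garbled in the source, a hedged sketch of the analysis steps described above is given here; the column names follow Table 1, and Benchmarking's dea() with RTS = "crs" is assumed to stand in for the 'ccr' call mentioned in the text.

    # Assumes the data frame 'DEA' from Sect. 4.2, first column = DMU names
    X <- as.matrix(DEA[, c("No. of Ph.D. Professors", "Faculty Student Ratio")])
    Y <- as.matrix(DEA[, c("Selected Publications", "MoUs Signed")])

    model  <- dea(X, Y, RTS = "crs", ORIENTATION = "in")  # input-minimizing CCR
    scores <- eff(model)                                  # relative efficiency per DMU
    data.frame(DMU = DEA[[1]], Efficiency = round(scores, 3))

    dea.plot(X[, 1], Y[, 1], RTS = "crs")  # frontier: Ph.D. professors vs publications
    shapiro.test(unlist(DEA[, -1]))        # p < 0.05 suggests non-normal data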

4.4 Verification on Banxia's Frontier Analyst

In order to verify the accuracy of the code in R, this DEA was also executed on Banxia's Frontier Analyst. The MS Excel dataset was uploaded to the Frontier Analyst software. Input and output variables were manually categorized, after which the analysis was carried out. The input data, as shown in Table 2, was used and displayed on the 'Data Viewer' tab of the software.

5 Analysis of Results

The results of this DEA study, carried out for the 12 engineering departments, showed that there were four departments with 100% relative efficiency. The results obtained in R and Frontier Analyst matched exactly, thereby validating the model designed in R.

Table 2 Input data for Banxia's Frontier Analyst

Unit name (DMU) | No. of Ph.D. professors | Faculty-student ratio | Selected publications | MoUs signed
Biotech | 5.00 | 6.00 | 11.00 | 2.00
Chem | 7.00 | 4.62 | 22.00 | 3.00
Civil | 23.00 | 4.39 | 69.00 | 4.00
CSE | 15.00 | 5.63 | 90.00 | 6.00
EE | 12.00 | 2.86 | 47.00 | 11.00
ECE | 20.00 | 4.00 | 29.00 | 11.00
EI | 4.00 | 5.00 | 15.00 | 5.00
TCE | 7.00 | 4.29 | 33.00 | 1.00
IEM | 8.00 | 4.62 | 34.00 | 4.00
ISE | 10.00 | 6.21 | 68.00 | 4.00
Mech | 32.00 | 3.33 | 30.00 | 6.00
ML | 9.00 | 5.00 | 27.00 | 14.00


5.1 Results Obtained in R

The packages utilized in R displayed the relevant statistical details and DEA insights in the results section. Table 3 summarizes the results obtained from the first part of the analysis, representing all relevant descriptive statistical information such as overall and variable-specific mean, quartiles, median, variance, range, kurtosis, etc. A Shapiro–Wilk normality test was conducted. Since the p-value obtained was 0.02849 (less than 0.05), it can be concluded that the dataset does not follow a normal distribution, thus emphasizing the non-parametric aspect of DEA [14]. Pertaining to the DEA itself, the following results were obtained in R. DMUs with an efficiency score of 1 are the benchmark departments; as shown in Table 4, four departments (EEE, CSE, ISE, and ML) are the benchmark DMUs. R also displays the range of efficiencies in intervals for a step-by-step analysis of a particular class of DMUs. The three least efficient departments were found to be Biotech with an efficiency score of 0.433, followed by Mech with an efficiency score of 0.547 and Chem with an efficiency score of 0.561. In addition to the efficiency scores, a frontier plot was also generated, as shown in Fig. 1. This frontier plot is output oriented, with DMU 1 (Civil), DMU 2 (Mech), and DMU 6 (CSE) forming the efficient frontiers of the plot. The highest correlation was found between the number of Ph.D. professors (X1, represented on the X-axis) and the number of selected publications (Y1, represented on the Y-axis): Mech has the highest number of Ph.D. professors, while CSE has the highest number of selected publications. DMU 1 (Civil) has also formed the frontier because of its highest value of (X1 + Y1)/(X2 + Y2). Since the method employed was to minimize inputs, this plot shows the performance of the DMUs with respect to fixed outputs, thus making it possible to evaluate how the inputs could be rectified (preferably minimized) in order to meet the benchmarks.

Table 3 Statistical results pertaining to this DEA study obtained in R

Parameter | No. of Ph.D. professors | Faculty-student ratio | Selected publications | MoUs signed
Min | 4.00 | 2.857 | 11.00 | 1.000
1st quartile | 7.00 | 4.215 | 25.75 | 3.750
Median | 9.50 | 4.615 | 31.50 | 4.500
Mean | 12.67 | 4.661 | 39.58 | 5.917
3rd quartile | 16.25 | 5.156 | 52.25 | 7.250
Max | 32.00 | 6.207 | 90.00 | 14.000
Skewness | 0.99 | −0.12 | 0.76 | 0.73
Kurtosis | −0.26 | −1.03 | −0.8 | −0.89
Std. error | 2.43 | 0.29 | 6.98 | 1.16
Shapiro–Wilk test | p-value = 0.02849, W = 0.84102

Table 4 Relative efficiency scores obtained in R

DMU No. | DMU | Efficiency score
1 | Civil | 0.955
2 | Mech | 0.547
3 | EEE | 1.000
4 | ECE | 0.714
5 | IEM | 0.735
6 | CSE | 1.000
7 | TCE | 0.696
8 | ISE | 1.000
9 | EI | 0.969
10 | ML | 1.000
11 | Chem | 0.561
12 | Biotech | 0.433

Fig. 1 Output oriented frontier plot obtained on R

5.2 Results Obtained on Banxia's Frontier Analyst

Banxia's Frontier Analyst carries out the DEA in a user-friendly manner. The 'Analysis' tab of this software offers options for the various DEA functionalities. For this study's analysis, the objective was to minimize the inputs with constant returns, and this setting was selected in the 'Optimization mode' under the 'DEA options' tab of the software. By selecting the 'Analyze now' option, the analysis was carried out. Results (summarized in Table 5) were displayed with traffic-light indicators: green for the most efficient DMUs (score = 100%), yellow for mediocre units (70% ≤ score < 100%), and red for the underperforming units (score < 70%). Check marks were also displayed adjacent to DMUs with 100% relative efficiency scores.

Table 5 Results obtained on Frontier Analyst

DMU | Score (%) | Efficient | Condition
Biotech | 43.3 | | Red
Chem | 56.1 | | Red
Civil | 95.5 | | Yellow
CSE | 100.0 | ✓ | Green
EE | 100.0 | ✓ | Green
ECE | 71.4 | | Yellow
EI | 96.9 | | Yellow
TCE | 69.6 | | Red
IEM | 73.5 | | Yellow
ISE | 100.0 | ✓ | Green
Mech | 54.7 | | Red
ML | 100.0 | ✓ | Green

Additionally, similar to the frontier plot shown in Fig. 1, Frontier Analyst generated an X–Y plot, shown in Fig. 2, with the input No. of Ph.D. Professors (X1) on the X-axis and the output No. of Selected Publications (Y1) on the Y-axis. Results similar to those in Sect. 5.1 can be inferred from this plot. In addition, Frontier Analyst also provided a correlation value of 0.32, indicating the relative closeness of association between X1 and Y1 when compared with the other combinations of inputs and outputs considered in this study.

Fig. 2 X–Y plot on Frontier Analyst


5.3 Summary of Results

The results obtained from both applications matched, thus validating the R model designed in this study. The key results indicated that four departments, EEE, CSE, ISE, and ML, secured the highest relative efficiency score of 1, and three departments, CSE, Civil, and Mech, constituted the efficient frontiers. From the analysis conducted, various improvement strategies can be inferred to help improve the efficiency of the under-performing DMUs. The mode of optimization for this DEA was input minimization, i.e., the analysis was carried out with the assumption that the target outputs could be met with the least ratio of inputs. Based on the efficiency scores obtained for each department, and on analysis of their corresponding inputs and outputs, it may be seen that the improvement suggestions are department-specific. Every department must carefully calculate the effect of increasing each input and/or output on its overall efficiency score. Thus, the following suggestions may be considered for the non-benchmarked departments in this study:

• Encouraging students and faculty to publish more papers in journals and conferences.
• Providing resources to reduce the faculty-student ratio by hiring more professors.
• Motivating the existing faculty to pursue a Ph.D., or hiring faculty with Ph.D. degrees.
• Reaching out to external bodies, like other institutes and companies, and developing the department's external network to increase the number of MoUs.

The above strategies may be used if found feasible for implementation by the department and the college. The improvement methodologies need not be restricted to the aforementioned strategies; other methodologies may also be followed, as long as there exists a causal relationship between the current selection of input and output variables as used in this study.

6 Conclusion and Future Enhancements

As seen in Table 6, the results of this study obtained from both applications matched, thus validating the model generated in R. DEA is a benchmarking method in which relative efficiency scores provide information about an organization's capability to improve its inputs and outputs. Since it is very difficult to conduct a general study on such organizations, a systematic and quantitative perspective is very much in demand. In this study, the relative efficiencies of 12 undergraduate engineering departments were measured. The dataset included two inputs and two outputs, and the most effective DEA model was designed and analyzed in RStudio, with verification of the obtained results on Banxia's Frontier Analyst software. At the end of the analysis, improvements were suggested for the departments with relatively lower efficiencies.


Table 6 Cross-validation of results obtained in R and Frontier Analyst

Parameter measured for cross-validation | R | Are exact results obtained on Frontier Analyst?
Departments with 100% relative efficiency score (benchmarked depts.) | EEE, CSE, ISE, ML | YES
Efficiency plots | Frontier plot = X–Y plot | YES
Values of relative efficiency scores for all departments (non-benchmarked depts.) | EI = 0.969, Civil = 0.955, IEM = 0.735, ECE = 0.714, TCE = 0.696, Chem = 0.561, Mech = 0.547, Biotech = 0.433 | YES

The following aspects may be considered as future enhancements of this study:

• The model in this study was output oriented; however, input-oriented models may also be created.
• Different DEA models (using other input and output parameters) may be created for the same set of DMUs used in this study to check for correlations between other input and output parameters.
• For the sake of simplicity, constant weightages were considered in this study; however, other models with varying weightages may also be generated to test and compare the effect of biasing on the results thus obtained.
• The consistency of this model may be tested by observing how the model behaves with time-dependent data. This analysis may be standardized by testing the consistency of the correlation coefficients obtained over various time periods.

Data envelopment analysis is a fast-growing technique with applications in many research areas, as described by Cooper et al. [16]. This DEA can be extended to analyzing the efficiencies of other colleges, institutes, and organizations. The number and type of input and output parameters may be suitably changed depending on the data available and the type of organization being analyzed, as long as there exists a causal relationship between the selected inputs and outputs. Additionally, as seen in the recent research proposed by Loganathan and Hillemane [17], DEA may be further enhanced to incorporate more variables through approaches like hybrid multi-criteria decision-making (MCDM) models and slack-based models (SBM).


Appendix 1: Program Code for DEA of Departments Using R

    library(Benchmarking)
    library(psych)
    summary(DEA)
    describe(DEA$`No. of Ph.D. Professors`)
    describe(DEA$`Faculty Student Ratio`)
    describe(DEA$`Selected Publications`)
    describe(DEA$`MoUs Signed`)
    class(DEA)
    str(DEA)
    x

E SNci Then
  Current round terminates and next round starts
Else
  Current round continues and SNcj will be out of the race of the selection phase
  SNci will be designated as the cluster head (CH) node and will send messages to all the deployed sensor nodes inside the group
End loop
Else
  SNc will stay idle until the cluster head selection phase is completed
End Round

3.3 Association of Nodes

This step defines the association of the cluster head (CH) node with the other deployed sensor nodes. In this step, every deployed sensor node sends an echo signal EC_o. The cluster head node entertains this echo signal, and as a result all the deployed sensor nodes in the nearest neighborhood of the cluster head respond to this echo request. The quickest sensor-node response to the corresponding echo signal is evaluated and, in conclusion, the node is associated with the cluster head for the purpose of the node association phase.

Algorithm
Round:
For each individual deployed sensor node SN:
  Send echo message (EC_o) and start the round
  SN ← cluster head responds (r_cei)
  Keep track of each response time
  If T_rcei < T_rcej Then
    SN_CHi is designated as one of the cluster head joinees
  End if
End Round


3.4 Selection of Elite Group

This is the group whose nodes are the only ones that will take part in future cluster head selection, without depleting the energy of the remaining nodes. The deployed sensor nodes generally communicate with the cluster head node by means of flare messages in order to report the outstanding power inside the corresponding node within the given specified radius. Every other cluster node then transmits its left-over power as a message to the cluster head (CH) so that the average residual energy can be computed, as given in Eq. (3):

E_avg = (E_e1 + E_e2 + … + E_en) / n    (3)

Hence, if the outstanding power of any individual deployed sensor node i is larger than the average left-over network energy E_avg, node i becomes a member of the corresponding elite group; otherwise, it is not regarded as a member of the elite group. In addition, if the number of alive nodes is greater than a certain threshold (Y), the elite group gets erased and no longer exists in upcoming rounds.

Algorithm
Round:
For each cluster head node:
  All deployed sensor nodes within the cluster transmit their left-over energy as a message to the corresponding cluster head node
  E_avg = (E_e1 + … + E_en) / n
  If E_ei > E_avg Then
    N_i is a member of the corresponding elite group
  Else
    N_i is not a member of the corresponding elite group
  End if
  If N_alive > Y (threshold) Then
    The corresponding elite group gets expunged
    The selection process restarts from the first phase
  Else
    Cluster head selection carries on inside the corresponding elite group
  End if
End Round

In order to preserve the energy of the deployed sensor nodes for future participation in cluster head selection, the cluster head node is automatically elected from the available elite group of sensor nodes on the basis of an energy evaluation among only those nodes, rather than among all of the sensor nodes deployed in the network area.


In this routing scheme, each deployed sensor node aggregates its data, and every sensor node transmits all of its data or information to the particular cluster head node. These cluster head nodes then forward this set of data or information to the corresponding base station.
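Purely as an illustration (the paper's implementation is in NS2 and is not shown here), the elite-group membership rule of this section can be expressed over a vector of residual node energies; the values below are hypothetical.

    residual <- c(0.42, 0.35, 0.50, 0.28, 0.47, 0.31)  # residual energies (J), made up
    e_avg <- mean(residual)            # Eq. (3): average residual energy
    elite <- which(residual > e_avg)   # nodes eligible for future CH selection
    elite                              # here: nodes 1, 3 and 5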

4 Result and Analysis

The implementation and simulation of the work has been carried out using the NS2 network simulator. The distributed coordination functionality has been enabled using CSMA-CA. The other network-related parameters used for simulating the designed wireless network are presented in Table 2. The wireless sensor nodes have been deployed randomly in a 100 × 100 m² area, and the packet size for data communication is 500 KB per message. To evaluate the performance of the designed wireless sensor network, various network performance parameters such as the number of active alive nodes, remaining network energy, throughput, and network lifetime have been evaluated. The number of active alive nodes is simply the count of active nodes among the deployed sensor nodes. The remaining network energy is the energy remaining after each cycle, i.e., the sum of the energies of all active alive nodes; hence, the remaining network energy corresponds to the network lifetime too. Throughput is calculated as the ratio of the number of data packets received by the corresponding base station to the number of data packets transmitted by the deployed sensor nodes. Network lifetime is regarded as the time interval from the start of the simulation of the designed wireless network until the last active or alive node dies.

Table 2 Imperative network parameters for designed wireless network

Network parameter | Parameter value
Field area for designed wireless network | 100 m × 100 m
Overall number of sensor nodes deployed | 10–100
MAC type | IEEE 802.11
Preliminary energy of sensor node deployed (E_o) | 500 mJ
Queue length | 500
Transmitter power | 360 mW
Idle state power | 335 mW
Data broadcast model | Two-ray ground model
Network simulation time | 600 s
Packet size | 500 KB

Furthermore, the network simulation results, namely the number of dead nodes, number of residual alive nodes, remaining network energy, dissipated network energy, arrival of the first dead node and the last alive node, transmission delay, and throughput at certain stages for 20 deployed nodes, have been captured and analyzed for the designed wireless sensor network along with a few existing schemes, namely NCBR [21], DL-LEACH [22], and EECRP [23]; they are shown in Figs. 1, 2, 3, 4, 5, 6 and 7 and tabulated in Tables 3, 4, 5, 6, 7, 8 and 9.

Fig. 1 Comparison of number of dead nodes

Fig. 2 Comparison of number of active alive nodes


Fig. 3 Comparison of arrival of first dead node and last alive node

Fig. 4 Comparison of residual energy

From the above simulation and analysis results, it is fairly obvious that the proposed routing scheme enhances the network lifetime and throughput, reduces the transmission delay, and is energy efficient as well. The elite group-based efficient selection of cluster heads is one of the major contributors here. Furthermore, probability- and residual-energy-based cluster head selection with elite group communication provides better communication from the nodes to the cluster heads and hence to the base stations. In addition, communication through elite group selection along with optimal cluster head selection guarantees minimal energy wastage and dissipation for the designed wireless network throughout the entire simulation during each cycle. From the comparative simulation results, it is quite evident that the elite group-based routing scheme outperforms the existing schemes in terms of all the network parameters.

Fig. 5 Comparison of dissipated energy

Fig. 6 Comparison of throughput


Fig. 7 Comparison of transmission delay

Table 3 Number of dead nodes

Rounds completed | NCBR [21] | DL-LEACH [22] | EECRP [23] | Proposed work
1 | 0 | 0 | 0 | 0
2000 | 1 | 2 | 0 | 0
4000 | 5 | 7 | 4 | 2
6000 | 7 | 10 | 7 | 5
8000 | 10 | 13 | 9 | 8
10,000 | 14 | 16 | 12 | 10

Table 4 Number of alive nodes

Rounds completed | NCBR [21] | DL-LEACH [22] | EECRP [23] | Proposed work
1 | 20 | 20 | 20 | 20
2000 | 19 | 18 | 20 | 20
4000 | 15 | 13 | 16 | 18
6000 | 13 | 10 | 13 | 15
8000 | 10 | 7 | 11 | 12
10,000 | 6 | 4 | 8 | 10

5 Conclusion

In this paper, the authors have carried out the design and analysis of the elite group-based routing scheme discussed in [12], which results in reduced power consumption along with the distinctive


Table 5 Residual energy (J)

Rounds completed | NCBR [21] | DL-LEACH [22] | EECRP [23] | Proposed work
1 | 10 | 10 | 10 | 10
2000 | 9.5 | 9 | 10 | 10
4000 | 7.5 | 6.5 | 8 | 9
6000 | 6.5 | 5 | 6.5 | 7.5
8000 | 5 | 3.5 | 5.5 | 6
10,000 | 3 | 2 | 4 | 5

Table 6 Dissipated energy (J)

Rounds completed | NCBR [21] | DL-LEACH [22] | EECRP [23] | Proposed work
1 | 0 | 0 | 0 | 0
2000 | 0.5 | 1 | 0 | 0
4000 | 2.5 | 3.5 | 2 | 1
6000 | 3.5 | 5 | 3.5 | 2.5
8000 | 5 | 6.5 | 4.5 | 4
10,000 | 7 | 8 | 6 | 5

Table 7 Rounds completed till first dead node and last alive node

Type of nodes | NCBR [21] | DL-LEACH [22] | EECRP [23] | Proposed work
First dead node | 1941 | 1232 | 2504 | 3157
Last alive node | 13,904 | 12,128 | 16,422 | 18,513

Table 8 Throughput

Number of nodes | NCBR [21] | DL-LEACH [22] | EECRP [23] | Proposed work
20 | 167 | 176 | 202 | 210
40 | 182 | 185 | 213 | 222
60 | 194 | 198 | 223 | 232
80 | 211 | 214 | 231 | 245
100 | 223 | 227 | 238 | 260

Table 9 Transmission delay (µs)

Number of nodes | NCBR [21] | DL-LEACH [22] | EECRP [23] | Proposed work
20 | 118.69 | 110.96 | 104.53 | 95.28
40 | 168.37 | 161.83 | 145.86 | 128.13
60 | 215.91 | 213.66 | 201.13 | 175.22
80 | 270.59 | 257.42 | 235.18 | 213.01
100 | 315.85 | 294.71 | 272.92 | 245.12


feature of an improved network lifetime for the designed wireless sensor networks. In this work, elite groups are formed in which the selection of the cluster head is limited to only the few nodes having the higher residual energy, instead of all the nodes in the network area. This bifurcation reduces the complexity of cluster head selection and hence enhances the network lifetime, with optimized and minimal energy consumption during each round. The discussed simulation results demonstrate the dominance of the proposed routing technique with respect to various network performance parameters, namely active alive nodes and dead nodes during each cycle, residual energy, network lifetime, data throughput, transmission delay, etc. The results were also compared with a few existing schemes, and the proposed work was found to outperform them. In the future, the proposed routing technique can be exercised on heterogeneous networks and its performance can be evaluated against ongoing research in the relevant domain.

References

1. Pino-Povedano S, Arroyo-Valles R, Cid-Sueiro J (2014) Selective forwarding for energy-efficient target tracking in sensor networks. Signal Process 94:557–569
2. Javaid N, Rahim A, Nazir U, Bibi A, Khan ZA, Aslam MS (2012) Survey of extended LEACH-based clustering routing protocols for wireless sensor networks. In: Proceedings of the IEEE 14th international conference on high performance computing and communication and IEEE 9th international conference on embedded software and systems, pp 1232–1238
3. Paradis L, Han Q (2007) A survey of fault management in wireless sensor networks. J Netw Syst Manage 15(2):170–190
4. Abed A, Alkhatib A, Baicher GS (2012) Wireless sensor network architecture. In: International conference on computer networks and communication systems (CNCS 2012), vol 35. IACSIT Press, Singapore, pp 11–15
5. Milenković A, Otto C, Jovanov E (2006) Wireless sensor networks for personal health monitoring: issues and an implementation. Comput Commun 29(13–14):2521–2533
6. Akkaya K, Younis M (2005) A survey on routing protocols for wireless sensor network. Ad Hoc Netw J 3(3):325–349
7. Al-Karaki JN, Kamal AE (2004) Routing techniques in wireless sensor networks: a survey. IEEE Wirel Commun 11(6):6–28
8. Yu M, Mokhar H, Merabti M (2007) A survey on fault management in wireless sensor networks. IEEE Wirel Commun 14(6):13–19
9. Yin MH, Win Z (2014) Fault management using cluster-based protocol in wireless sensor networks. Int J Future Comput Commun 3(1):36–39
10. Challal Y, Ouadjaout A, Lasla N, Bagaa M, Hadjidj A (2011) Secure and efficient disjoint multipath construction for fault tolerant routing in wireless sensor networks. J Netw Comput Appl 34(4):1380–1397
11. Chessa S, Santi P (2002) Crash fault identification in wireless sensor networks. Comput Commun 25(14):1273–1282
12. Shukla R, Kumar A, Niranjan V (2019) An efficient elite group-based routing protocol for wireless sensor network. Int J Electron. https://doi.org/10.1080/00207217.2019.1692372
13. Heinzelman W, Chandrakasan A, Balakrishnan H (2000) Energy-efficient communication protocol for wireless micro sensor networks. In: Proceedings of the 33rd annual Hawaii international conference on system sciences, vol 2, p 10


14. Arumugam GS, Ponnuchamy T (2015) EE-LEACH: development of energy-efficient LEACH protocol for data gathering in WSN. EURASIP J Wirel Commun Netw 1:1–9
15. Tang C, Tan Q, Han Y, An W, Li H, Tang H (2016) An energy harvesting aware routing algorithm for hierarchical clustering wireless sensor networks. KSII Trans Internet Inf Syst (TIIS) 2(2)
16. Mahmood D, Javaid N, Mahmood S, Qureshi S, Memon AM, Zaman T (2013) MODLEACH: a variant of LEACH for WSNs. In: Eighth international conference on broadband and wireless computing, communication and applications (BWCCA), pp 158–163
17. Qing L, Zhu Q, Wang M (2006) Design of a distributed energy-efficient clustering algorithm for heterogeneous wireless sensor networks. Comput Commun 29(12):2230–2237
18. Lindsey S, Raghavendra C (2002) PEGASIS: power-efficient gathering in sensor information systems. In: Aerospace conference proceedings, vol 3. IEEE, pp 3-1125–3-1130
19. Jin Y, Wang L, Kim Y, Yang X (2008) EEMC: an energy-efficient multi-level clustering algorithm for large-scale wireless sensor networks. Comput Netw 52(3):542–562
20. Manjeshwar A, Agrawal DP (2001) TEEN: a routing protocol for enhanced efficiency in wireless sensor networks. In: Proceedings 15th international parallel and distributed processing symposium, pp 2009–2015
21. Rajeh TM, Saleh AI, Labib LM (2018) A new cooperative balancing routing (CBR) protocol to enhance the lifetime of wireless sensor networks. Wirel Pers Commun 98(3):2623–2656
22. Lee JY, Jung KD, Moon SJ, Jeong H (2016) Improvement on LEACH protocol of a wide-area wireless sensor network. Multimedia Tools Appl 1–18
23. Shen J, Wang A, Wang C, Hung PC, Lai CF (2017) An efficient centroid-based routing protocol for energy management in WSN-assisted IoT. IEEE Access 5:18469–18479
24. Sherazi HHR, Grieco LA, Boggia G (2018) A comprehensive review on energy harvesting MAC protocols in WSNs: challenges and tradeoffs. Ad Hoc Netw 71:117–134
25. Abidi W (2017) Fuzzy cluster head election algorithm based on LEACH protocol for wireless sensor networks, pp 993–997
26. Gupta S, Marriwala N (2017) Improved distance energy-based LEACH protocol for cluster head election in wireless sensor networks. In: 4th international conference on signal processing, computing and control (ISPCC). IEEE, Solan, India, pp 91–96

A Systematic Study of Fake News Detection Systems Using Machine Learning Algorithms

Ravish and Rahul Katarya

Abstract The Internet has become an essential requirement of daily human life, with a considerable number of people as its consumers. Generally, these people use the Internet for different purposes. Any consumer may create a post or broadcast news through online platforms like Facebook, Instagram, Twitter, LinkedIn, etc. These platforms do not verify the consumers or their posts, and therefore a few consumers try to broadcast fake news (FN) through social media platforms. Such FN may be propaganda against individuals, organizations, or society. Generally, a human being is not able to spot all of these FN items. Machine learning (ML) has played an essential role in classifying and detecting such data, albeit with some disadvantages. There is a requirement for ML-based classifiers that may identify FN automatically. This review article describes the framework and the ML-based classifiers used for detecting FN, together with the existing results.

Keywords Fake news detection (FND) system · Social media (SM) · Machine learning (ML) · Facebook · Twitter database

1 Introduction

The current generation invests most of its time communicating on SM, as the worldwide adoption of smartphones makes access possible almost anywhere and anytime. These platforms simplify communication with family members, relatives, friends, and strangers through reviews, comments, discussions, and quick like/dislike buttons [1]. According to researchers, in 2020, fifty-three percent of United States adults said they obtained news from SM "sometimes" or "often," with fifty-nine percent of Twitter consumers and fifty-four percent of Facebook (FB) consumers getting news on the site frequently. Strikingly, fifty-nine percent of those who obtained


news on SM said they expected that news to be highly inaccurate [2]. SM allows minimum cost, easy access, and faster data broadcasting, and the majority of people these days explore news on SM rather than through traditional organizations. On the one side, SM has become a great source of data and has brought people together; on the other side, it is harming society. FN detection on SM faces various challenges. First, it is not easy to collect FN information, nor to label FN manually. FN items are hard to detect, even in closed messaging applications, because they are purposely written to deceive readers. Distortion spread by trusted news sources or by relatives and friends is not easily considered fake. It is also difficult to judge the reliability of newly developing news with limited history, as such items are hard to use for training a detection database [3]. Essential methods to identify credible consumers, extract reliable or valuable news features, and advance a security-based data system are research areas that require future work. Various methods exist to manage the issue of misinformation on SM: statistical methods verify the correlation among several data features, while pattern analysis examines the dissemination. Usually, ML-based methods classify unreliable information and study the accounts that share such content. Several methods are concerned with implementing security or authentication mechanisms and specific case analyses [4].

Several categories of FN, as summarized by the authors of article [5], are listed below:

• Consumer-based [6]: fabricated news made by duplicate accounts and directed to a particular audience representing a certain age class, sex, political affiliation, or culture.
• Visual-based: FN posts that utilize graphical content, which may include doctored videos, images, or hybrids.
• Knowledge-based: posts that give seemingly reasonable explanations for unresolved problems and make consumers believe they are secure.

The research gaps are as follows: (i) the current methods support supervised learning (SL); (ii) semi-structured and structured information are acceptable; (iii) the accuracy of the classification methods is minimal, and the mean square error rate (MSER) is maximal.

Advantages: FN detection helps manage the broadcast of FN over social sites and media. It can help consumers make more informed decisions and keeps them from being manipulated by what others want them to think. An FN detection system also reduces the effort of manually verifying the authenticity of news and saves time [7].

Disadvantages: The detection accuracy of an FN system will not be 100%, and some articles may be classified incorrectly [7].

The review paper is organized as follows: Sect. 2 discusses the FND system issues and details the literature on classifying and detecting FN using deep learning and machine learning-based classification methods. Section 3 gives a detailed description of the fake news detection system. Section 4 presents FN detection using ML-based KNN, SVM, NB, DT, RF, etc. Section 5 presents the existing result


analysis, and Sect. 6 gives the conclusion and further improvements of the FN detection system.

2 Literature Review

Most people now integrate social networks (SNs) and the Internet into their daily lives [8]. Internet use is growing worldwide, given the convenience of distributing, assessing, and sending news via SNs and the Internet. This means it is easy to distribute data without limitation by posting it on these SN platforms. However, the circulated data might contain both original and duplicate or fake news (FN). Some attackers take advantage of SN platforms by creating FN and scattering it across the SNs and the Internet to destroy the reputation of politicians, businesses, etc. FN is a big issue in every country; it should be identified and its sharing stopped before it causes further damage to society. Observing FN is challenging because it is not static.

Meesad [9] proposed a novel architecture for Thai FN classification, comprising three different stages: (i) information retrieval (IR), (ii) natural language processing (NLP), and (iii) machine learning (ML). The proposed architecture has two steps: (i) data collection and (ii) an ML-based design phase. In the first step, they obtained information from Thai online news websites using web-crawler IR. Using NLP methods, they processed the information to fetch reliable feature sets from the web data. On a comparison basis, they chose well-defined classifiers based on ML models such as NB, LR, KNN, MLP, SVM, DT, RF, and LSTM. The evaluation on the test data showed that the LSTM methods were highly successful, and they developed an automatic online FN detection web app. Mahabub [10] discussed an ensemble voting classifier (EVC)-based intelligent detection system (IDS) implemented to distinguish original and fake news. Eleven well-known ML-based methods, such as NB, KNN, SVM, RF, artificial neural network (ANN), LR, gradient boosting (GB), and AdaBoost, were used for the detection of fake news. After cross-validation (CV), they used the best three ML-based classification methods in the EVC. The simulation results affirmed that the research model could accomplish an accuracy rate of 94.5%; other metrics, namely precision (P), recall (R), ROC score, and F1-score, were also outstanding. The research models may efficiently search for the most significant highlights of the news, and may be developed into other detection methods to identify fake profiles (FPs) and messages. de Souza et al. [11] implemented a network-based method built on positive and unlabeled learning with label propagation, a one-class, transductive semi-supervised learning approach (PU-LP). This method executes classification by first detecting potential interest and non-interest documents among the unlabeled information and then propagating labels to categorize the remaining unlabeled documents. A label propagation (LP) method was employed to detect the remaining unlabeled documents. They evaluated the proposed model's performance, measuring only documents with terms. The comparative analysis measured four one-class learning (OCL) methods employed in one-class text classification, such


as KNN, K-means clustering, a density-based method, one-class SVM, and a dense autoencoder, as well as another PUL method. The methods were evaluated on three news collections, considering both extremely unbalanced and balanced scenarios. They used doc2vec and bag-of-words methods to convert the news into structured-format data. The outcomes showed that the PU-LP method was more reliable and attained better results than the other methods. Kaliyar et al. [12] discussed several approaches for FN detection that employed sequential neural networks (NNs) to encode news content and community context-level data, in which the text order is studied in a unidirectional way. A bidirectional training method is therefore important for modeling the reliable data of FN, i.e., it is proficient at enhancing detection performance by considering long-distance dependencies and semantics in sentences. They implemented a bidirectional encoder representations from transformers-based DL method (FakeBERT) by merging distinct parallel blocks of a single convolutional NN layer, with different filters and kernel sizes (KSs), with the BERT model. This combination was valuable for managing ambiguity, which is a challenge in NLP understanding. The classification showed that the research approach outperforms the existing methods with an accuracy of roughly 98.90–99%. Nasir et al. [13] implemented a new hybrid DL method that merges RNNs and CNNs for FN detection. The proposed method proved reliable on two FN databases, FA-KES and ISO, attaining classification outcomes significantly better than other non-hybrid baselines. Further experiments on the generalization of the research framework across various databases yielded promising outcomes. Table 1 summarizes the proposed methods, datasets, performance metrics, simulation tools, and improvements surveyed in the various articles; the improvement column states the problem each proposed model resolved.

3 A Framework of Fake News Detection (FND) System

FN is abundant and not static, making it hard to construct an ML-based model for an FND system; nevertheless, it is possible to construct a robust novel method. News creators publish and distribute online news, which spreads quickly via the Internet. The World Wide Web and SM keep duplicate and original news on cloud servers. A person posts review conversations on the news website and shares them on SM. The framework of the FND system depends on three different phases: (i) information retrieval (IR), (ii) natural language processing (NLP), and (iii) an ML-based model. The first phase retrieves data from the Internet according to the news inquiry fed by the consumer; the results of the search are reliable news information from various online news sources. NLP, the second phase, processes the saved records by performing cleaning, feature extraction (FE), and segmentation.


Table 1 Analysis of the literature review

Author name/year | Proposed methods | Datasets | Metrics | Tool used | Improvement
Meesad et al. [9] | Crawler-based Thai fake news IR; NLP; ML-based methods: LR, KNN, NB, MLP, RF, DT, LSTM | – | Acc, pre, rec, F-measure | Python with Django, Apache2 | LSTM was the most excellent model, attaining 100% on the test set as evaluated by the parameters
Mahabub et al. [10] | Ensemble voting classifier: SVM, NB, EGB, RF, LR, KNN | Retweets social network using Instagram images | Best score: acc, pre, rec, ROC score | t-SNE tool | EV classifier achieved improved sufficiency scores
De Souza et al. [11] | PUL and OCL methods | Portuguese datasets, policy datasets | Average, standard | – | PUL and OCL methods were evaluated on balanced and unbalanced databases
Kaliyar et al. [12] | BERT | Fake news dataset | Accuracy rate, confusion matrix, FPR, and FNR | FakeBERT, LSTM | The proposed model improved the accuracy rate compared with the existing methods
Nasir et al. [13] | CNN; hybrid CNN-RNN algorithm | ISO-KES dataset | ROC, false positive rate (FPR), true positive rate (TPR) | – | It improves the detection performance in the form of accuracy rate

Abbreviations Fake news (FN), social networks (SNs), machine learning (ML), information retrieval (IR), web crawler (WC), natural language processing (NLP), Naïve Bayes (NB), logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), decision tree (DT), long short-term memory (LSTM), ensemble voting classifier (EVC), intelligent detection system (IDS), artificial neural network (ANN), gradient boosting (GB), accuracy (ACC), precision (P), recall (R), label propagation (LP), one-class learning (OCL), neural network (NN), bidirectional encoder representations from transformers (BERT), deep learning (DL), convolutional neural network (CNN), recurrent neural network (RNN), false positive rate (FPR), false negative rate (FNR).

Lastly, the machine learning phase classifies the news into three categories: original, suspicious, and duplicate (fake). Figure 1 shows the process of the FND system framework. The FND system contains two steps: (i) collection of news and training, and (ii) classification or prediction with the ML-based model.


Fig. 1 The framework of fake news detection system [9]

The system collects data from the web (WWW) and social sites. After that, it applies preprocessing steps such as cleaning, splitting, and processing the news data in order to train the ML-based model as an FN detection system. In the data collection and training process, the information retrieval step crawls the web to store news information from news websites. The system dispatches web crawlers (WCs) to retrieve and build a list of similar news items for each news query. Reliable news records provide the featured information for training the ML-based model. Each news query behaves as a user query (UQ); this means that the WC will retrieve from the web a reliable news record corresponding to the news query. Natural language processing then processes the saved news records and returns featured information: the NLP phase takes the news data and performs text segmentation (TS), FE, and cleansing. Lastly, the resulting feature information flows in to construct the ML model, and the featured information is used to label the news as natural/suspicious or fake/duplicate.
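As a hedged illustration of the NLP phase just described (cleaning, stop-word removal, stemming), the following sketch uses the 'tm' package on a made-up two-item corpus; none of this code is from the surveyed papers.

    library(tm)   # stemDocument also requires the SnowballC package
    news <- c("Breaking: celebrity SHOCKED by miracle cure!!!",
              "Government publishes official budget figures for 2021.")
    corpus <- VCorpus(VectorSource(news))
    corpus <- tm_map(corpus, content_transformer(tolower))
    corpus <- tm_map(corpus, removePunctuation)
    corpus <- tm_map(corpus, removeNumbers)
    corpus <- tm_map(corpus, removeWords, stopwords("english"))
    corpus <- tm_map(corpus, stemDocument)
    dtm <- DocumentTermMatrix(corpus)   # feature matrix for the ML phase
    inspect(dtm)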

4 Various Fake News Classification Methods

This section focuses on the machine learning (ML) methods used to detect fake news. ML is a research area in computer science (CS) that involves generating adaptive programs which learn from training information. There are various categories of ML, such as supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. Usually, constructing an ML-based model involves preparing two sets of information: (i) train data and (ii) test data. The ML-based model learns from the train data, and the consumers evaluate the trained model using the test data. This review mainly focuses on supervised learning, which includes KNN, logistic regression (LR), SVM, decision tree (DT), long short-term memory (LSTM) classifiers, etc.

Fig. 2 Representation of the dataset: PieChart [15]

4.1 Logistic Regression (LR) Classifier

This classifier estimates the parameters of a logistic model. It is a statistical modeling-based ML method used to explain the association of several independent variables with a dependent binary (0/1) variable. The method is applied to the training data as a regression analysis (RA) suited to a dependent binary variable. Usually, it explains the connection between dependent and independent variables, which may be defined at the nominal, ordinal, interval, or ratio level [14]. Figure 2 represents the pie chart diagram of the dataset. Examples of binary outcomes are Fail/Pass, Lose/Win, and Dead/Alive, where the two values are marked "1" or "0".
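A minimal sketch of LR training in R follows; the data frame 'news' with a binary 'fake' column and numeric features is a hypothetical stand-in for the features produced by the NLP phase.

    fit  <- glm(fake ~ ., data = news, family = binomial)   # logistic regression
    prob <- predict(fit, newdata = news, type = "response") # P(fake = 1 | features)
    pred <- ifelse(prob > 0.5, 1, 0)
    table(Predicted = pred, Actual = news$fake)              # confusion matrix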

4.2 KNN Classifier

This algorithm is a simple non-parametric classifier. It is a case-based learning approach that preserves the entire training data set for the detection task. To utilize the k-nearest neighbor classifier, a reliable k-value must be selected, and the classifier outcome depends on this value. Generally, there are several ways to choose a k-value; the


simplest path is to execute the method numerous times with distinct k-values and select the most efficient one. To predict information using the k-nearest neighbor [10] model, three things are required: (i) the stored train data, (ii) a k-value, and (iii) a similarity metric (SM) or distance. The KNN classifier performs as follows (sketched in code after this list):

• Read the data and the data record to classify.
• Evaluate the similarity-metric distance between the record to predict and all stored train information.
• Choose the k minimum distances.
• Categorize the information based on the majority vote of the k-nearest information labels.
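The steps above map directly onto class::knn; 'train_x'/'test_x' are assumed numeric feature matrices and 'train_y'/'test_y' the corresponding labels (all hypothetical names).

    library(class)
    pred <- knn(train = train_x, test = test_x, cl = train_y, k = 5)
    mean(pred == test_y)   # accuracy for this k; repeat with other k-values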

4.3 Decision Tree (DT) Classifier

It is a tree-type model defined as a recursive partition of the information space. The DT method splits on the most discriminant attribute at each node, forming a rooted tree (RT). The whole tree starts with a root node that has no incoming link/edge; every other node has exactly one incoming link/edge. Nodes with outgoing links are known as test or intermediate nodes, while nodes at the lowest level are known as leaves, which are decision nodes (DNs). Various DT induction approaches include C4.5, C5, and ID3 [16].
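A brief sketch of DT induction in R; rpart implements a CART-style tree and is used purely for illustration (the text names ID3/C4.5/C5), with the same hypothetical 'news' data frame as above.

    library(rpart)
    tree <- rpart(fake ~ ., data = news, method = "class")  # recursive partitioning
    predict(tree, news, type = "class")                     # predicted class per record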

4.4 Random Forest (RF) Classifier

This method is used for classification tasks. The general idea behind random forest is that a group of weak learners may form a strong learner. It can predict on massive databases with high precision and correctness. It performs as a classifier in which each tree depends on a random vector (RV) value, and it creates numerous DTs at training time. The individual trees are generated using bootstrap [12] samples of the train data and random selection (RS) of attributes during tree induction (TI). The detection is formed by merging all DTs through a majority vote (MV) or their average.
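A matching sketch with the randomForest package; bootstrap sampling and random attribute selection happen inside the call ('news' remains the hypothetical data frame).

    library(randomForest)
    rf <- randomForest(as.factor(fake) ~ ., data = news, ntree = 500)
    predict(rf, news)   # majority vote over the 500 trees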

4.5 Naïve Bayes (NB) Classifier

It is an easy-to-learn probabilistic approach, which uses Bayes' rule in combination with the explicit assumption that attributes are conditionally independent of each other. Training a naïve Bayes model means estimating, from the training data, each class's


posterior probability P(y′ | x′) of a class y′ for a given object x′. This calculation can be used in classification applications because of its evaluation efficiency and various other desirable features; it appears to be a suitable solution in many practical designs [17].
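Sketch with e1071::naiveBayes; type = "raw" returns the posteriors P(y | x) described above ('news' as assumed earlier).

    library(e1071)
    nb <- naiveBayes(as.factor(fake) ~ ., data = news)
    predict(nb, news)                 # most probable class per record
    predict(nb, news, type = "raw")   # posterior probabilities P(y | x)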

4.6 Support Vector Machine (SVM) Classifier

This classifier is based on statistical learning theory. Various investigators developed support vector machines [7] for several applications such as classifying data records and pattern recognition (PR). The model rests on the following theoretical ideas (a code sketch follows the list):

• Structural risk minimization (SRM) is the concept that bounds the risk, i.e., the likelihood of learning errors. The learning procedure controls the decision-making method so as to minimize the mean square error (MSE). The kernel function (KF) plays a vital role in the support vector machine method: it maps information from the input space to a feature space in order to generate non-linear decision boundaries for the data in the learning space [18].
• The central concept is the maximum-margin hyperplane: the SVM method is a learning procedure that searches for the hyperplane with the maximum margin. It may divide the information into two classes and helps resolve the problem of over-fitting.
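Sketch of a kernelized maximum-margin classifier with e1071::svm; the RBF kernel and cost value are illustrative choices, not taken from the surveyed papers.

    library(e1071)
    sv <- svm(as.factor(fake) ~ ., data = news, kernel = "radial", cost = 1)
    predict(sv, news)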

4.7 Long Short-Term Memory (LSTM) Classifier

It is a DL approach in the recurrent neural network (RNN) [19] class. An RNN feeds its hidden state (HS) back as input, which makes it possible to capture the dependencies of sequential records such as natural language (NL) and time series (TS); it can process not only an individual data point (DP) but also sequential data. The long short-term memory [18] classifier, implemented to resolve the exploding and vanishing gradient problem faced by a traditional recurrent neural network, is well suited for classification, detection, and processing based on TS information, as there can be indefinite lags between events in a TS. LSTM is mainly used for tasks like sentiment analysis (SA) of documents, handwriting recognition, and speech recognition.
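A hedged sketch of a small LSTM text classifier with the R interface to Keras; the vocabulary size, embedding width, and unit counts are illustrative assumptions.

    library(keras)
    model <- keras_model_sequential() %>%
      layer_embedding(input_dim = 10000, output_dim = 64) %>%  # token embeddings
      layer_lstm(units = 32) %>%                               # sequence encoder
      layer_dense(units = 1, activation = "sigmoid")           # fake vs. original
    model %>% compile(optimizer = "adam",
                      loss = "binary_crossentropy",
                      metrics = "accuracy")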

5 Existing Result Analysis

This section discusses the existing performance metrics of various machine learning-based methods. The machine learning-based models were explained in detail in Sect. 4: ML-based methods such as naïve Bayes, logistic regression, SVM, KNN, LSTM, DT, and RF. These ML-based methods are compared using different


performance parameters such as accuracy, recall, precision, and F-measure. The SVM and RF classifiers achieved an accuracy of 96%, higher than the other conventional ML-based methods. Table 2 and Fig. 3 present the detailed summary outcomes of all the ML-based models. The LSTM model achieved the maximum accuracy, precision, recall, and F-measure, with all performance metrics at a perfect 1.00, while the naïve Bayes classifier had the lowest performance of the compared ML-based models. Table 3 reports the cross-validation scores of the different ML-based classification techniques, and Fig. 4 represents these cross-validation scores, from which the best machine learning-based classification methods were selected. The KNN method optimized the cross-validation error rate compared to the other ML-based methods [10, 21].

Table 2 Comparisons with ML models [9, 20]

ML-based model | Accuracy | Precision | Recall | F-measure
Naïve Bayes | 78 | 85 | 78 | 79
Logistic regression | 92 | 92 | 92 | 92
Support vector machine | 96 | 96 | 96 | 96
K-nearest neighbor | 93 | 93 | 93 | 93
Long short-term memory | 1.00 | 1.00 | 1.00 | 1.00
Decision tree | 92 | 92 | 92 | 92
Random forest | 96 | 97 | 96 | 96

Fig. 3 Comparison analysis with proposed and existing ML-based methods: accuracy

Table 3 Comparison of cross-validation scores of various ML-based classification methods [10]

ML-based model | Cross-validation error rate
KNN | 79.6
DT | 82.6
RF | 83.59
LR | 91.97
NB | 89.17

Fig. 4 Comparison with several ML-based methods: cross-validation error rate [10]

6 Conclusion and Further Work

To conclude this review article, ML-based classification methods are used for FN detection systems. Typically, these classifiers are first trained with a database known as the training database; afterward, the ML-based classifiers may automatically classify or detect FN. This literature review described the framework of FN detection and the ML-based classifiers, which need labeled information for training. A labeled database that may be utilized for training classifiers to identify FN is not easily available. This article presented the ML-based methods with their different performance metrics: the ML-based methods achieved high accuracy, recall, and precision rates, which were compared alongside the other metrics. Another comparative analysis considered the cross-validation error rate, which the KNN method reduced compared with the existing methods. Future research may utilize DL-based classifiers such as RNN, VGGNet, Xception, and GoogLeNet to detect FN.


References

1. Shearer E, Mitchell A (2021) News use across social media platforms in 2020
2. Hernon P (1995) Disinformation and misinformation through the Internet: findings of an exploratory study. Gov Inf Q 12(2):133–139
3. Jain A, Shakya A, Khatter H, Gupta AK (2019) A smart system for fake news detection using machine learning. In: International conference on issues and challenges in intelligent computing techniques (ICICT), vol 1. IEEE, pp 1–4
4. Bahad P, Saxena P, Kamal R (2019) Fake news detection using bi-directional LSTM-recurrent neural network. Procedia Comput Sci 165:74–82
5. Conroy NK, Rubin VL, Chen Y (2015) Automatic deception detection: methods for finding fake news. Proc Assoc Inf Sci Technol 52(1):1–4
6. Stahl K (2018) Fake news detection in social media. Calif State Univ Stanislaus 6:4–15
7. Bhogade M, Deore B, Sharma A, Sonawane O, Singh M (2021) A review paper on fake news detection. Int J 6(5):1–6
8. Shu K, Wang S, Lee D, Liu H (2020) Mining disinformation and fake news: concepts, methods, and recent advancements. In: Disinformation, misinformation, and fake news in social media. Springer, Cham, pp 1–19
9. Meesad P (2021) Thai fake news detection based on information retrieval, natural language processing and machine learning. SN Comput Sci 2(6):1–17
10. Mahabub A (2020) A robust technique of fake news detection using ensemble voting classifier and comparison with other classifiers. SN Appl Sci 2(4):1–9
11. de Souza MC, Nogueira BM, Rossi RG (2021) A network-based positive and unlabeled learning approach for fake news detection. Mach Learn 1–44
12. Kaliyar RK, Goswami A, Narang P (2021) FakeBERT: fake news detection in social media with a BERT-based deep learning approach. Multimedia Tools Appl 80(8):11765–11788
13. Nasir JA, Khan OS, Varlamis I (2021) Fake news detection: a hybrid CNN-RNN based deep learning approach. Int J Inf Manage Data Insights 1(1):100007
14. Kleinbaum DG, Dietz K, Gail M, Klein M, Klein M (2002) Logistic regression. Springer-Verlag, New York, p 536
15. Harrell FE (2015) Ordinal logistic regression. In: Regression modeling strategies. Springer, Cham, pp 311–325
16. Guo G, Wang H, Bell D, Bi Y, Greer K (2003) KNN model-based approach in classification. In: OTM confederated international conferences on the move to meaningful internet systems. Springer, Berlin, Heidelberg, pp 986–996
17. Hamsa H, Indiradevi S, Kizhakkethottam JJ (2016) Student academic performance prediction model using decision tree and fuzzy genetic algorithm. Procedia Technol 25:326–332
18. Ahmad I, Yousaf M, Yousaf S, Ahmad MO (2020) Fake news detection using machine learning ensemble methods. Complexity
19. Yager RR (2006) An extension of the naive Bayesian classifier. Inf Sci 176(5):577–588
20. Dietrich R, Opper M, Sompolinsky H (2019) Statistical mechanics of support vector networks. Phys Rev Lett 82(14):2975
21. Choudhury N, Faisal F, Khushi M (2020) Mining temporal evolution of knowledge graphs and genealogical features for literature-based discovery prediction. J Informetrics 14(3):101057

Training Logistic Regression Model by Hybridized Multi-verse Optimizer for Spam Email Classification

Miodrag Zivkovic, Aleksandar Petrovic, Nebojsa Bacanin, Marko Djuric, Ana Vesic, Ivana Strumberger, and Marina Marjanovic

Abstract Spam emails pose a significant threat to end users, annoying them and wasting their time. To counter this problem, numerous spam detection systems have been proposed recently, most of which are grounded in machine learning algorithms due to their efficiency in classification tasks. Unfortunately, existing spam detection solutions typically suffer from low detection rates and generally have trouble dealing with high-dimensional data. To address this problem, this paper suggests a hybrid spam detection approach combining the logistic regression classification model with the hybridized multi-verse optimizer swarm intelligence metaheuristics. The proposed approach was validated on a public benchmark dataset (CSDMC2010) and compared to other cutting-edge techniques. The obtained results indicate that the suggested hybrid approach outperforms the other spam detection solutions included in the comparative analysis, achieving the highest classification accuracy.

M. Zivkovic (B) · A. Petrovic · N. Bacanin · M. Djuric · A. Vesic · I. Strumberger · M. Marjanovic Singidunum University, Danijelova 32, 11000 Belgrade, Serbia e-mail: [email protected] A. Petrovic e-mail: [email protected] N. Bacanin e-mail: [email protected] M. Djuric e-mail: [email protected] A. Vesic e-mail: [email protected] I. Strumberger e-mail: [email protected] M. Marjanovic e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_35


Keywords Multi-verse optimizer · Hybridization · Optimization · Spam email · Swarm intelligence · Feature selection · Classification · Logistic regression

1 Introduction

Even though messaging and chat applications have grown in popularity, e-mail remains a vital part of daily Internet communication. According to Statista [1], the global number of e-mail users in 2020 was around 4 billion, and that number is estimated to grow to 4.6 billion users by the end of 2025. In 2020, an average of 306 billion e-mails were sent and received worldwide on a daily basis, and this number is estimated to rise to 376 billion daily e-mails by the end of 2025. Even though the share of spam among e-mails has been slowly declining over the years, in 2020 it still stood at 47.3%. While most spam e-mails are nothing more than a nuisance, a small percentage is malicious. According to the Verizon Data Breach Incident Report (DBIR) [2], 94% of malware found its way into execution via e-mail. It is also worth noting that the enormous volume of spam e-mail consumes network resources such as network bandwidth, file storage space, and computational resources.

A variety of different methods have been utilized throughout the years to combat spam. The majority of them can be classified into several groups: heuristic filters, blacklisting, whitelisting, challenge-response systems, collaborative spam filtering, honey pots, signature schemes, and machine learning-based methods [3]. The simplest classification puts these methods into two large groups: static and dynamic. The static group includes methods that filter incoming mail based on whitelists and blacklists [4]. Given that spam senders tend to utilize various e-mail addresses, and that even victims of malware can send spam without their knowledge, these methods are considered ineffective. The dynamic group, on the other hand, utilizes statistical techniques and machine learning methods in combination with text classification to check whether an e-mail is spam.

Data cleaning is the first step in classifying whether an e-mail is spam, since some words have no effect on the classification and removing them decreases the probability of substandard results. Words can also be normalized with the goal of grouping same-meaning words and reducing redundancy. By improving the quality of the data, the accuracy can be improved; eliminating stop words and applying stemming and lemmatization can enhance the results of machine learning algorithms.

This study proposes a logistic regression (LR) approach that is trained by the multi-verse optimizer algorithm [5]. Its goal is to utilize all of the benefits of logistic regression, such as fast classification, efficiency, and simplicity, while avoiding convergence to poor local minima through multi-verse optimizer (MVO) training. The MVO is a nature-inspired stochastic population-based algorithm that was compared to four popular algorithms (gray wolf optimizer, particle swarm optimization, genetic algorithm, gravitational search algorithm) and has shown excellent results.


The authors believe this to be the first application of LR trained by the MVO algorithm in spam detection, as the literature search provided no results. For the purpose of this research, the basic implementation of the MVO has been hybridized with the artificial bee colony algorithm (ABC) in order to address the known deficiencies of the basic MVO. The contributions presented in this research paper can be considered threefold:

• The basic MVO implementation has been hybridized with the ABC algorithm in order to devise a novel metaheuristic that utilizes the advantages of both basic algorithms.
• A novel way of improving a known statistical regression method is demonstrated by utilizing the proposed algorithm.
• The proposed method is applied to a classification problem addressing spam detection and categorization.

The remainder of this work is structured as follows: Sect. 2 gives a short overview of works covering similar approaches and methods, as well as a brief summary of this paper's inspiration. Section 3 covers the approaches used in this research as well as the inspiration for the novel algorithm. Section 4 explains the experimental setup and research findings, and presents a comparative analysis with other state-of-the-art techniques for spam email classification. Section 5 gives a conclusion based on the attained results and covers potential future work on similar subjects.

2 Preliminaries and Related Works

Due to its popularity with technology giants like Google, Facebook, Amazon, and Microsoft, machine learning has risen to become one of the most important topics in computational science over the last decade, solving critical issues such as COVID-19 case prediction in Zivkovic et al. [6] as well as less critical problems such as text document clustering in Bezdan et al. [7]. Generally speaking, machine learning (ML) denotes a system's capability to collect and combine knowledge through large-scale examination, developing and broadening itself by acquiring new knowledge rather than being programmed with it. Machine learning techniques allow computers to learn from data inputs and then use statistical analysis to produce highly precise outputs. There are numerous types of models produced by machine learning, the most notable among them being neural networks, support vector machines, random forests, self-organizing maps, and Bayesian networks. When training a machine learning model, the goal is to divide a dataset into training, testing, and validation parts while performing cross-validation.


Logistic regression is an effective method for analyzing classification problems. The objective is to resolve whether a new sample matches a certain category [8, 9]. Logistic regression tends to be an uncomplicated and highly effective method for classifying binary and linear problems. Its benefits are that it is fairly simple to implement and produces excellent results with linearly separable classes, which is why it is broadly used in classification. The most important objective is to determine the weights that minimize a cost function while keeping the model's complexity low enough to avoid overfitting.

Gradient descent is a very popular optimization approach that is widely used in machine learning. Its strength lies in its simplicity and ease of use, as described by Goel et al. [10] and Lei et al. [11]. The most notable gradient descent variants are batch gradient descent, stochastic gradient descent, and mini-batch gradient descent. Most of the time, gradient descent methods are used to train machine learning models, but they run the risk of falling into a local optimum, which is why swarm intelligence is being used instead.

Swarm intelligence is a term that represents a collection of techniques that excel in search and optimization. Inspiration for these techniques was found in nature, and they work by harnessing the collective intelligence of groups of low-complexity entities. Swarm intelligence algorithms are influenced by the cooperation of individuals that form one larger or multiple smaller groups that both cooperate and compete with each other. These algorithms utilize groups of entities to explore the search space of the problem, where every entity represents a possible solution. During the exploration process, swarm intelligence algorithms retain and develop a selection of solutions until certain conditions are fulfilled, usually that a predefined number of iterations has been reached. Nature-inspired stochastic metaheuristic swarm intelligence algorithms, which are considered a subset of artificial intelligence (AI), mimic the actions of animal groups that form a collective intelligence by acting independently while exchanging information with others; the size of such a population is discussed by Piotrowski et al. [12]. This allows the algorithms to focus their effort on the most promising regions. The goal is for individual entities, each encoding a possible solution, to exchange information among each other and to statistically improve over iterations until a good enough result is produced. Some of the more notable swarm intelligence algorithms are particle swarm optimization (PSO) [13], artificial bee colony (ABC) [14], the firefly algorithm (FA) [15], gray wolf optimizer (GWO) [16], and the monarch butterfly algorithm (MBA) [17]. Swarm intelligence (SI) has found much use in the design of machine learning algorithms [18, 19]. It has shown good results in many different fields such as intrusion detection [20], solving prediction problems [21, 22], and optimization of machine learning [23, 24].
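For concreteness, the following is a minimal NumPy sketch of the gradient-descent-trained logistic regression discussed earlier in this section — the conventional baseline that the swarm-trained variant replaces. All names and hyperparameter values here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_lr_gradient_descent(X, y, lr=0.1, epochs=1000, l2=0.01):
    """Minimize the L2-regularized cross-entropy cost of logistic regression.

    X: (n_samples, n_features) feature matrix; y: labels in {0, 1}.
    The regularization term keeps the weights small, limiting model
    complexity and overfitting, as discussed above."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)               # predicted spam probabilities
        grad_w = X.T @ (p - y) / n + l2 * w  # cost gradient w.r.t. weights
        grad_b = float(np.mean(p - y))       # cost gradient w.r.t. bias
        w -= lr * grad_w                     # step against the gradient
        b -= lr * grad_b
    return w, b
```

A metaheuristic-trained LR keeps the same model, p = sigmoid(Xw + b), but searches for (w, b) with a population-based optimizer instead of following the gradient, which is what makes it less prone to poor local minima.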


SI alongside machine learning has also shown good results in spam e-mail detection. Works in that field include spam filtering using LR trained by an advanced bee colony algorithm [4], spam detection using the fireworks algorithm, a combined negative selection algorithm and particle swarm optimization for spam detection [25], and a hybrid approach combining particle swarm optimization and random forest for spam detection. A significant number of complex problems can be addressed by swarm intelligence algorithms, including cloud computing [26–28], wireless sensor network optimization [29–33], neural network optimization [7, 34–43], machine learning and COVID-19 case forecasting [6, 44], and finally complex problems in the field of medicine [45–47].

3 Proposed Method

In this section, a brief description of the basic MVO algorithm by Mirjalili [5] is given, along with its features and workings. Next, motivated by the known and observed deficiencies of the basic MVO implementation, a novel metaheuristic is devised by hybridizing the original MVO with the ABC algorithm.

3.1 Basic MVO Algorithm

The multi-verse optimizer is a stochastic population-based algorithm. It drew its inspiration from the multi-verse theory in physics, which argues that there was more than one big bang, each causing the birth of a universe. In turn, these universes can interact with each other and might even collide. The MVO algorithm is built around three main concepts: wormholes, black holes, and white holes. According to physicists, white holes are believed to be the main components in the birth of a universe, and the big bang may be considered a white hole. Black holes stand completely opposite to white holes, attracting everything with their strong gravity. Wormholes act as tunnels through time and space where an object can travel instantly between different parts of a universe or even between different universes altogether. Like all population-based algorithms, MVO divides the search process into two phases: exploration and exploitation. The white and black hole concepts are translated into formulas for exploration, while wormholes are used for exploitation. As shown in the original study, the process of optimization follows these rules:

• If the inflation rate of a universe is high, the chances of it having a white hole are higher.
• If the inflation rate is low, the chances that a black hole will appear are higher.
• Universes with higher inflation rates tend to send objects through white holes, while universes with lower inflation rates tend to receive objects through black holes.
• The objects in all universes can spontaneously move toward the best universe via wormholes regardless of the inflation rate.


According to the original study, objects can travel from one universe to another through white/black hole tunnels. When such a tunnel is formed between universes, it is assumed that the universe with higher inflation has a white hole, while the universe with lower inflation on the opposite side has a black hole. The mechanism that transfers objects from white holes to destination black holes allows the universes to quickly exchange objects. There is a high chance for objects to be moved from a high-inflation universe to a low-inflation one, which in turn guarantees the improvement of the average inflation rate of all universes over iterations. The roulette wheel mechanism is used to mathematically model the tunnels and exchange the objects: in every iteration, the universes are sorted based on their inflation rates, and one is chosen by roulette wheel selection to contain a white hole.
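As a hedged illustration of this selection step, the roulette wheel can be sketched in a few lines; the normalization scheme is an assumption, since MVO variants differ in how inflation rates (fitness values) are normalized before selection.

```python
import numpy as np

def roulette_wheel_select(inflation_rates, rng=np.random.default_rng()):
    """Return the index of a universe chosen with probability proportional
    to its normalized inflation rate; that universe is assumed to contain
    a white hole and will send objects to less-inflated universes."""
    rates = np.asarray(inflation_rates, dtype=float)
    probs = rates / rates.sum()  # assumes rates are non-negative
    return int(rng.choice(len(rates), p=probs))
```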

3.2 MVO Hybridized with ABC Metaheuristics

Regardless of the excellent performance exhibited by the original implementation of the MVO algorithm, extensive simulations with benchmark functions have exposed some drawbacks. As is the case with numerous other swarm intelligence algorithms, MVO can get stuck and linger in suboptimal areas of the search domain and completely miss the area where the optimum resides. Additionally, it was observed that in some cases the basic MVO suffers from slow convergence speed. Both issues are rooted in the exploration–exploitation balance: in some runs of the algorithm, the MVO's exploration is not powerful enough to locate the most promising domains of the search space, leading to convergence toward solutions that are not optimal and finally resulting in worse mean values.

To overcome the limited exploration of the basic MVO, the algorithm has been hybridized with the ABC metaheuristic, which is very good at exploration. As described in the original paper that proposed the ABC algorithm [14], if an employed bee does not improve its food source, the food source is abandoned, and that particular bee becomes a scout bee. The abandoned food source is removed, and a new, random food source (solution) is generated. This behavior is implemented with the control parameter limit, which is utilized to establish which solution should be abandoned. Inspired by this mechanism that ABC uses for exploration, the exploration power of the MVO has been enhanced with the same simple approach of replacing depleted solutions: all solutions that have not improved after a certain number of iterations are replaced by pseudo-random solutions. Therefore, this hybrid algorithm, named MVO-ABC, uses the same limit parameter as described in the basic ABC algorithm [14]. To implement this approach, each solution in the hybrid algorithm has been extended with one attribute, trial. When a solution does not improve in a given iteration, its trial value is incremented, and when it reaches the limit value, the solution is removed from the population and replaced by a new, random one.


The proposed MVO-ABC algorithm does not significantly increase the complexity of the basic MVO implementation. The most costly operation in any swarm intelligence algorithm is the fitness function evaluation (FFE), and since the proposed hybrid algorithm does not introduce additional FFEs, its complexity is similar to that of the original MVO. The pseudocode for the proposed MVO-ABC, where the termination condition is set as the maximum number of FFEs (MaxFFEs), may be seen in Algorithm 1.

Algorithm 1 Pseudocode for the proposed MVO-ABC algorithm

while MaxFFEs is not reached:
    for each universe (solution) i:
        for each object (solution's component) j:
            r2 = random value between 0 and 1
            if r2 < wormhole_existence_probability:
                r3 = random value between 0 and 1
                r4 = random value between 0 and 1
                if r3 < 0.5:
                    U(i, j) = Optimal_universe(j) + Travelling_distance_rate * ((ub(j) - lb(j)) * r4 + lb(j))
                else:
                    U(i, j) = Optimal_universe(j) - Travelling_distance_rate * ((ub(j) - lb(j)) * r4 + lb(j))
                end if
            end if
        end for
    end for
    Replace each solution whose trial = limit with a pseudo-random individual from the search space
end while
Show the best solution, post-process the results, and visualize
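For illustration only, the following Python sketch combines the wormhole update from Algorithm 1 with the ABC-style trial/limit replacement. It omits the white/black-hole exchange phase for brevity, fixes the wormhole existence probability and travelling distance rate (which adapt over iterations in the original MVO), and uses a placeholder sphere function as the fitness; in this paper's setting the fitness would wrap the LR classifier's training error.

```python
import numpy as np

rng = np.random.default_rng(42)

def sphere(x):
    """Placeholder fitness to be minimized; stands in for the LR training error."""
    return float(np.sum(x ** 2))

def mvo_abc(fitness, dim, n_universes=40, lb=-1.0, ub=1.0,
            max_ffes=10_000, limit=200, wep=0.6, tdr=0.6):
    U = rng.uniform(lb, ub, (n_universes, dim))      # population of universes
    fit = np.array([fitness(u) for u in U])
    trial = np.zeros(n_universes, dtype=int)         # per-solution stagnation counters
    ffes = n_universes
    while ffes < max_ffes:
        best = U[np.argmin(fit)].copy()
        for i in range(n_universes):
            cand = U[i].copy()
            for j in range(dim):
                if rng.random() < wep:               # wormhole toward the best universe
                    step = tdr * ((ub - lb) * rng.random() + lb)
                    cand[j] = best[j] + step if rng.random() < 0.5 else best[j] - step
            cand = np.clip(cand, lb, ub)
            f = fitness(cand); ffes += 1
            if f < fit[i]:                           # keep improvements, reset the counter
                U[i], fit[i], trial[i] = cand, f, 0
            else:
                trial[i] += 1
            if trial[i] >= limit:                    # ABC-style scout replacement
                U[i] = rng.uniform(lb, ub, dim)
                fit[i] = fitness(U[i]); ffes += 1
                trial[i] = 0
    return U[np.argmin(fit)], float(fit.min())

best_solution, best_fitness = mvo_abc(sphere, dim=10)
```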

4 Experimental Setup, Findings and Comparative Analysis

This work employs a spam email detection dataset called the CSDMC2010 SPAM corpus, which was used in a data mining competition, and the performance of the algorithms was measured on it. The dataset consists of 4327 emails in total, 68.15% valid and 31.85% spam: 2949 valid emails and 1378 spam emails. The number of distinct terms contained in CSDMC2010 is 82,148. In the referenced work of Dedeturk and Akay [4], the dataset was evaluated as imbalanced with a factor of 2.14 (2949/1378 ≈ 2.14) and sparse at 90.48% for a feature vector size of 1000.

The authors experimented with the MVO algorithm for the purpose of training the logistic regression model. The original MVO algorithm and the proposed MVO-ABC were compared to the ABC algorithm implemented on the same dataset in the work of [4]. The authors also implemented the ABC algorithm independently and repeated the experiments as described in [4]. Results similar to those of [4] were obtained, and the algorithm was tested with the same parameter variations as in the referenced work, including limit parameter values of 100 and 200, as reported in Table 1. SN represents the number of solutions in a population.
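As a sketch of how feature vectors of size 500 or 1000 might be built from the corpus, the snippet below uses a bag-of-words pipeline with a capped vocabulary; the directory layout and the TF-IDF weighting are assumptions, as this section does not specify the exact term-weighting scheme.

```python
from pathlib import Path
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical layout: one raw e-mail per file under ham/ and spam/ folders.
emails, labels = [], []
for label, folder in enumerate(["ham", "spam"]):
    for path in Path("CSDMC2010", folder).glob("*"):
        emails.append(path.read_text(errors="ignore"))
        labels.append(label)

# Cap the vocabulary at the feature vector sizes studied in the paper (500 or 1000).
vectorizer = TfidfVectorizer(stop_words="english", max_features=1000)
X = vectorizer.fit_transform(emails)  # sparse (n_emails, 1000) matrix fed to the classifier
```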


Table 1 ABC algorithm trained LR classification statistics on the CSDMC2010 dataset (all values in %)

                        Feature vector size = 500              Feature vector size = 1000
SN  MR    limit  Best    Worst   Median  Mean    Std     Best    Worst   Median  Mean    Std
40  0.05  100    98.18   97.81   98.03   98.04   0.11    98.57   98.18   98.39   98.38   0.09
40  0.05  200    98.32   97.78   98.03   98.01   0.12    98.66   98.22   98.45   98.44   0.10
40  0.08  100    98.16   97.86   97.99   98.01   0.07    98.64   98.15   98.30   98.33   0.11
40  0.08  200    98.18   97.81   98.03   98.03   0.09    98.48   98.11   98.34   98.34   0.11
40  0.1   100    98.18   97.74   97.93   97.93   0.11    98.51   98.04   98.27   98.25   0.13
40  0.1   200    98.16   97.77   97.94   97.96   0.10    98.55   98.02   98.24   98.24   0.12
40  0.2   100    98.06   97.46   97.75   97.75   0.14    98.36   97.76   98.04   98.03   0.14
40  0.2   200    98.01   97.63   97.75   97.77   0.10    98.32   97.72   98.06   98.03   0.16
60  0.05  100    98.34   97.95   98.09   98.11   0.10    98.63   98.25   98.45   98.41   0.10
60  0.05  200    98.20   97.93   98.10   98.07   0.07    98.71   98.25   98.45   98.42   0.11
60  0.08  100    98.22   97.83   98.07   98.05   0.10    98.55   98.13   98.41   98.39   0.11
60  0.08  200    98.25   97.90   98.06   98.08   0.08    98.60   98.18   98.37   98.36   0.09
60  0.1   100    98.18   97.81   97.99   98.02   0.10    98.54   98.15   98.31   98.32   0.11
60  0.1   200    98.25   97.86   98.05   98.05   0.11    98.56   98.11   98.32   98.35   0.12
60  0.2   100    98.16   97.57   97.85   97.83   0.14    98.36   97.81   98.11   98.12   0.15
60  0.2   200    98.10   97.46   97.82   97.81   0.15    98.36   97.65   98.12   98.11   0.15
80  0.05  100    98.18   97.76   98.03   98.06   0.09    98.57   98.20   98.33   98.36   0.09
80  0.05  200    98.18   97.88   98.02   98.02   0.08    98.48   98.11   98.35   98.34   0.09
80  0.08  100    98.23   97.81   98.03   98.01   0.11    98.54   98.15   98.38   98.35   0.11
80  0.08  200    98.23   97.71   98.04   98.02   0.12    98.54   98.11   98.39   98.33   0.11
80  0.1   100    98.29   97.83   98.04   98.03   0.13    98.53   98.18   98.34   98.34   0.09
80  0.1   200    98.22   97.86   97.99   97.97   0.10    98.53   98.13   98.31   98.33   0.11
80  0.2   100    98.06   97.54   97.83   97.84   0.15    98.34   97.86   98.10   98.08   0.11
80  0.2   200    98.06   97.66   97.88   97.89   0.11    98.34   97.74   98.07   98.05   0.14

Table 2 MVO algorithm trained LR classification statistics on the CSDMC2010 dataset (all values in %)

     Feature vector size = 500              Feature vector size = 1000
SN   Best    Worst   Median  Mean    Std    Best    Worst   Median  Mean    Std
40   98.09   97.66   97.85   97.86   0.12   98.43   98.03   98.29   98.31   0.10
60   98.16   97.78   97.98   97.93   0.10   98.54   98.19   98.38   98.36   0.11
80   98.09   97.68   97.94   97.89   0.10   98.51   98.16   98.29   98.25   0.09

In the experiments and thorough testing conducted in this research, it was established that a limit parameter value of 200 yields better results. Consequently, the results reported in Tables 2 and 3 were obtained with limit = 200. The experiments were executed with forty, sixty, and eighty solutions in the population for each algorithm observed in the comparative analysis.


Table 3 MVO-ABC algorithm trained LR classification statistics on the CSDMC2010 dataset (all values in %)

     Feature vector size = 500              Feature vector size = 1000
SN   Best    Worst   Median  Mean    Std    Best    Worst   Median  Mean    Std
40   98.35   97.85   98.07   98.05   0.11   98.72   98.29   98.49   98.47   0.09
60   98.38   97.98   98.15   98.15   0.10   98.78   98.32   98.56   98.53   0.10
80   98.35   97.88   98.06   98.08   0.09   98.71   98.42   98.52   98.50   0.09

The research in [4] presented results for limit parameter values of 100 and 200. As stated earlier, the authors implemented and executed the experiments with ABC independently and obtained similar results, as presented in Table 1. After extensive simulations, it was observed that all included algorithms achieve better results with limit = 200, and this value was used for the simulations reported in Tables 2 and 3. From the findings presented in the tables, it can be concluded that the performance of the LR classifier with the basic MVO algorithm is slightly below that of the LR classifier with the ABC algorithm in all observed cases (population sizes of 40, 60, and 80 individuals, and feature vector sizes of 500 and 1000). At the same time, the LR classifier backed by the proposed MVO-ABC hybrid algorithm outperforms the ABC model in all simulations where limit = 200 was observed. The standard deviations of ABC and MVO-ABC are similar in all simulations, so it can be deduced that both algorithms have similar stability. Additionally, the proposed MVO-ABC approach generally obtained better best and worst values as well.

For better visualization of the obtained results, the mean accuracy statistics for all three algorithms are presented in Figs. 1, 2 and 3. Figure 1 depicts the mean accuracy of the LR classifiers with ABC, MVO, and MVO-ABC, respectively, for limit = 200, SN = 40, and feature vector sizes of 500 and 1000. Figure 2 displays the same statistics for limit = 200, SN = 60, and Fig. 3 portrays them for limit = 200, SN = 80. It can be seen that the proposed MVO-ABC approach obtains approximately 0.05% better accuracy than ABC for a feature vector size of 500. The difference is slightly more obvious for a feature vector size of 1000, where the proposed MVO-ABC outperforms the ABC method by 0.10–0.15% on average. The experimental findings suggest that the proposed MVO-ABC method is suitable for this particular task, keeping in mind that it outperformed the LR classifier with ABC proposed in [4]. It is worth noting that it was shown in [4] that the ABC model used as the reference in this research clearly outperformed traditional classifiers, including Gaussian NB, linear SVM, SVM with RBF kernel, and LR trained by gradient descent.


Fig. 1 Comparative analysis of classification accuracy of the LR classifier over the CSDMC2010 dataset (SN = 40, limit = 200)

Fig. 2 Comparative analysis of classification accuracy of the LR classifier over the CSDMC2010 dataset (SN = 60, limit = 200)

5 Conclusion

Efforts to prevent spam are still required, since perpetrators always find new ways to evade existing systems, and those systems are not 100% reliable. This paper's contribution to improving the existing solutions is a logistic regression classifier backed by the hybridized MVO-ABC metaheuristic. The MVO-ABC further enhances the performance of LR by helping the method stay out of poor local minima.


Fig. 3 Comparative analysis of classification accuracy of the LR classifier over the CSDMC2010 dataset (SN = 80, limit = 200)

The authors did not find any similar implementations of this algorithm in the literature; hence, the results were validated against similar leading solutions that apply other algorithms to the same problem. The dataset used for training is CSDMC2010, with 82,148 distinct terms across 2949 valid emails and 1378 spam emails. The comparative analysis shows that the proposed solution improves performance on this task and that the implementation of the MVO-ABC algorithm yielded improved results. The authors intend to explore other swarm-based solutions, as well as other machine learning techniques.

Acknowledgements The paper is supported by the Ministry of Education, Science and Technological Development of Republic of Serbia, Grant No. III-44006.

References 1. Johnson J (2021) Number of sent and received e-mails per day worldwide from 2017 to 2025 2. Verizon: 2019 data breach investigations report (2019) 3. Bhowmick A, Hazarika SM (2018) E-mail spam filtering: a review of techniques and trends. In: Kalam A, Das S, Sharma K (eds) Advances in electronics, communication and computing. Springer, Singapore, pp 583–590 4. Dedeturk BK, Akay B (2020) Spam filtering using a logistic regression model trained by an artificial bee colony algorithm. Appl Soft Comput 91:106229 5. Mirjalili S, Mirjalili SM, Hatamlou A (2016) Multi-verse optimizer: a nature-inspired algorithm for global optimization. Neural Comput Appl 27(2):495–513 6. Zivkovic M, Bacanin N, Venkatachalam K, Nayyar A, Djordjevic A, Strumberger I, Al-Turjman F (2021) Covid-19 cases prediction by using hybrid machine learning and beetle antennae search approach. Sustain Cities Soc 66:102669


7. Bezdan T, Stoean C, Naamany AA, Bacanin N, Rashid TA, Zivkovic M, Venkatachalam K (2021) Hybrid fruit-fly optimization algorithm with k-means for text document clustering. Mathematics 9(16):1929 8. Ahmed A, Jalal A, Kim K (2020) A novel statistical method for scene classification based on multi-object categorization and logistic regression. Sensors 20(14):3871 9. Shah K, Patel H, Sanghvi D, Shah M (2020) A comparative analysis of logistic regression, random forest and KNN models for the text classification. Augment Hum Res 5(1):1–16 10. Goel S, Gollakota A, Jin Z, Karmalkar S, Klivans A (2020) Superpolynomial lower bounds for learning one-layer neural networks using gradient descent. In: International conference on machine learning, PMLR, pp 3587–3596 11. Lei Y, Ying Y (2020) Fine-grained analysis of stability and generalization for stochastic gradient descent. In: International conference on machine learning, PMLR, pp 5809–5819 12. Piotrowski AP, Napiorkowski JJ, Piotrowska AE (2020) Population size in particle swarm optimization. Swarm Evol Comput 58:100718 13. Zhang X, Liu H, Tu L (2020) A modified particle swarm optimization for multimodal multiobjective optimization. Eng Appl Artif Intell 95:103905 14. Karaboga D, Basturk B (2007) A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm. J Glob Optim 39:459–471 15. Yang XS (2009) Firefly algorithms for multimodal optimization. In: Watanabe O, Zeugmann T (eds) Stochastic algorithms: foundations and applications. Springer, Berlin, Heidelberg, pp 169–178 16. Mirjalili S, Mirjalili SM, Lewis A (2014) Grey wolf optimizer. Adv Eng Softw 69:46–61 17. Alweshah M, Al Khalaileh S, Gupta BB, Almomani A, Hammouri AI, Al-Betar MA (2020) The monarch butterfly optimization algorithm for solving feature selection problems. Neural Comput Appl 1–15 18. Xue B, Zhang M, Browne WN, Yao X (2016) A survey on evolutionary computation approaches to feature selection. IEEE Trans Evol Comput 20(4):606–626 19. Sedki A, Ouazar D, El Mazoudi E (2009) Evolving neural network using real coded genetic algorithm for daily rainfall-runoff forecasting. Expert Syst Appl 36(3, Part 1):4523–4527 20. Arul A, Subburathinam K, Sivakumari S (2015) A hybrid swarm intelligence algorithm for intrusion detection using significant features. Sci World J 2015:574589 21. Qiao W, Huang K, Azimi M, Han S (2019) A novel hybrid prediction model for hourly gas consumption in supply side based on improved whale optimization algorithm and relevance vector machine. IEEE Access 7:88218–88230 22. Qiao W, Yang Z, Kang Z, Pan Z (2020) Short-term natural gas consumption prediction based on Volterra adaptive filter and improved whale optimization algorithm. Eng Appl Artif Intell 87:103323 23. Sun Y, Xue B, Zhang M, Yen G (2018) An experimental study on hyper-parameter optimization for stacked auto-encoders, pp 1–8 24. Itano F, de Abreu de Sousa MA, Del-Moral-Hernandez E (2018) Extending MLP ANN hyperparameters optimization by using genetic algorithm. In: 2018 international joint conference on neural networks (IJCNN), pp 1–8 25. Idris I, Selamat A, Thanh Nguyen N, Omatu S, Krejcar O, Kuca K, Penhaker M (2015) A combined negative selection algorithm-particle swarm optimization for an email spam detection system. Eng Appl Artif Intell 39:33–44 26. Bezdan T, Zivkovic M, Tuba E, Strumberger I, Bacanin N, Tuba M (2020) Multi-objective task scheduling in cloud computing environment by hybridized bat algorithm. 
In: International conference on intelligent and fuzzy systems. Springer, pp 718–725 27. Bacanin N, Bezdan T, Tuba E, Strumberger I, Tuba M, Zivkovic M (2019) Task scheduling in cloud computing environment by grey wolf optimizer. In: 2019 27th telecommunications forum (TELFOR). IEEE, pp 1–4 28. Zivkovic M, Bezdan T, Strumberger I, Bacanin N, Venkatachalam K (2021) Improved Harris hawks optimization algorithm for workflow scheduling challenge in cloud–edge environment. In: Computer networks, big data and IoT. Springer, pp 87–102


29. Zivkovic M, Bacanin N, Tuba E, Strumberger I, Bezdan T, Tuba M (2020) Wireless sensor networks life time optimization based on the improved firefly algorithm. In: 2020 international wireless communications and mobile computing (IWCMC). IEEE, pp 1176–1181 30. Bacanin N, Tuba E, Zivkovic M, Strumberger I, Tuba M (2019) Whale optimization algorithm with exploratory move for wireless sensor networks localization. In: International conference on hybrid intelligent systems. Springer, pp 328–338 31. Zivkovic M, Bacanin N, Zivkovic T, Strumberger I, Tuba E, Tuba M (2020) Enhanced grey wolf algorithm for energy efficient wireless sensor networks. In: 2020 zooming innovation in consumer technologies conference (ZINC). IEEE, pp 87–92 32. Bacanin N, Arnaut U, Zivkovic M, Bezdan T, Rashid TA (2022) Energy efficient clustering in wireless sensor networks by opposition-based initialization bat algorithm. In: Computer networks and inventive communication technologies. Springer, pp 1–16 33. Zivkovic M, Zivkovic T, Venkatachalam K, Bacanin N (2021) Enhanced dragonfly algorithm adapted for wireless sensor network lifetime optimization. In: Data intelligence and cognitive informatics. Springer, pp 803–817 34. Strumberger I, Tuba E, Bacanin N, Zivkovic M, Beko M, Tuba M (2019) Designing convolutional neural network architecture by the firefly algorithm. In: 2019 international young engineers forum (YEF-ECE). IEEE, pp 59–65 35. Milosevic S, Bezdan T, Zivkovic M, Bacanin N, Strumberger I, Tuba M (2021) Feed-forward neural network training by hybrid bat algorithm. In: Modelling and development of intelligent systems: 7th international conference, MDIS 2020, Sibiu, Romania, 22–24 Oct 2020. Revised selected papers, vol 7. Springer, pp 52–66 36. Cuk A, Bezdan T, Bacanin N, Zivkovic M, Venkatachalam K, Rashid TA, Devi VK (2021) Feedforward multi-layer perceptron training by hybridized method between genetic algorithm and artificial bee colony. Opportunities and challenges, data science and data analytics, p 279 37. Bacanin N, Bezdan T, Zivkovic M, Chhabra A (2022) Weight optimization in artificial neural network training by improved monarch butterfly algorithm. In: Mobile computing and sustainable informatics. Springer, pp 397–409 38. Gajic L, Cvetnic D, Zivkovic M, Bezdan T, Bacanin N, Milosevic S (2021) Multi-layer perceptron training using hybridized bat algorithm. In: Computational vision and bio-inspired computing. Springer, pp 689–705 39. Bacanin N, Alhazmi K, Zivkovic M, Venkatachalam K, Bezdan T, Nebhen J (2022) Training multi-layer perceptron with enhanced brain storm optimization metaheuristics. Comput Mater Contin 70(2):4199–4215 40. Bacanin N, Bezdan T, Venkatachalam K, Zivkovic M, Strumberger I, Abouhawwash M, Ahmed A (2021) Artificial neural networks hidden unit and weight connection optimization by quasirefection-based learning artificial bee colony algorithm. IEEE Access 41. Bacanin N, Zivkovic M, Bezdan T, Cvetnic D, Gajic L (2022) Dimensionality reduction using hybrid brainstorm optimization algorithm. In: Proceedings of international conference on data science and applications. Springer, pp 679–692 42. Bacanin N, Petrovic A, Zivkovic M, Bezdan T, Antonijevic M (2021) Feature selection in machine learning by hybrid sine cosine metaheuristics. In: International conference on advances in computing and data sciences. Springer, pp 604–616 43. 
Bacanin N, Stoean R, Zivkovic M, Petrovic A, Rashid TA, Bezdan T (2021) Performance of a novel chaotic firefly algorithm with enhanced exploration for tackling global optimization problems: application for dropout regularization. Mathematics 9(21) 44. Zivkovic M, Venkatachalam K, Bacanin N, Djordjevic A, Antonijevic M, Strumberger I, Rashid TA (2021) Hybrid genetic algorithm and machine learning method for covid-19 cases prediction. In: Proceedings of international conference on sustainable expert systems: ICSES 2020, vol 176. Springer, p 169 45. Bezdan T, Zivkovic M, Tuba E, Strumberger I, Bacanin N, Tuba M (2020) Glioma brain tumor grade classification from MRI using convolutional neural networks designed by modified FA. In: International conference on intelligent and fuzzy systems. Springer, pp 955–963


46. Bezdan T, Milosevic S, Venkatachalam K, Zivkovic M, Bacanin N, Strumberger I (2021) Optimizing convolutional neural network by hybridized elephant herding optimization algorithm for magnetic resonance image classification of glioma brain tumor grade. In: 2021 zooming innovation in consumer technologies conference (ZINC). IEEE, pp 171–176 47. Basha J, Bacanin N, Vukobrat N, Zivkovic M, Venkatachalam K, Hubálovsk`y S, Trojovsk`y P (2021) Chaotic Harris hawks optimization with quasi-reflection-based learning: an application to enhance CNN design. Sensors 21(19):6654

Optimization of Spatial Pyramid Pooling Module Placement for Micro-expression Recognition Marzuraikah Mohd Stofa, Mohd Asyraf Zulkifley, Muhammad Ammirrul Atiqi Mohd Zainuri, and Mohd Hairi Mohd Zaman

Abstract Facial expressions are essential to human communication and can be divided into micro- and macro-expressions. Of the two, facial micro-expressions portray the more authentic emotion, even though the reactions are minute and subtle. Hence, micro-expressions are hard to detect and recognize, since the reactions are short-lived, lasting between 0.04 and 0.2 s, with exceedingly modest amplitudes. The deep learning approach has recently seen significant adoption in the domain of micro-expression recognition. The most popular technique is based on VGG-M, which has been tested on multiple datasets, including SMIC, CASME II, and SAMM. In this work, a variety of spatial pyramid pooling layer configurations were tested to discover the best network design for micro-expression recognition. The findings show that the improved VGG-M model with spatial pyramid pooling modules achieved higher accuracy than the original VGG-M model. The optimal network configuration determined by the experiments consists of three parallel branches placed after the second layer of the improved VGG-M model.

Keywords Spatial pyramid pooling · Micro-expression recognition · Deep learning

1 Introduction

Human faces can produce millions of micro-expressions every second, yet only a tiny fraction of those is visible to the naked eye. In most cases, these micro-expressions manifest real repressed or unspoken feelings. Macro-expressions and micro-expressions are the two types of facial expressions, differing mainly in reaction duration and intensity. Macro-expressions are observable all over the face and typically last between 0.5 and 4 s. Therefore, macro-expressions are easier to distinguish from the noise compared to

M. M. Stofa · M. A. Zulkifley (B) · M. A. A. M. Zainuri · M. H. M. Zaman Universiti Kebangsaan Malaysia (UKM), 43600 Bangi, Selangor, Malaysia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_36


micro-expressions. In general, micro-expressions are spontaneous, fast, and localized expressions that typically last between 0.065 and 0.5 s. Micro-expression recognition is a very difficult task because of the shortness and subtlety of the facial reactions. As a result, it is difficult for humans to determine the type of emotion portrayed by a micro-expression using bare eyes alone; even a person with average training in micro-expression recognition produces only marginally better detection than an untrained individual. Many works have been proposed throughout the years that use cutting-edge technology such as machine learning and computer vision techniques to detect micro-expressions accurately. Since 2012, the machine learning field can be divided into two categories: deep learning and conventional machine learning techniques. Advances in automated systems have spurred research into the study of micro-expressions. Facial expression recognition [1], image recognition [2], recycling systems [3], and human activity recognition [4] are all examples of tasks where both techniques have been used. Note that deep learning technology is employed in this paper to produce a high-accuracy system for micro-expression recognition.

This study proposes an improved VGG-M model with embedded Spatial Pyramid Pooling (SPP) modules. According to Chen et al. [5] and Liu et al. [6], SPP is a layer capable of obtaining information at various scales from the input data. It has also been successfully implemented in oil palm plantation segmentation [7], paddy field segmentation [8], and automated pterygium segmentation [9].

This paper is structured as follows. Section 2 presents recent works on micro-expression recognition using both conventional and deep learning-based approaches. In Sect. 3, the details of the improved VGG-M model and the SPP module are explained, whereby the experimental findings are described in Sect. 4. Finally, the last section concludes with a summary of the simulation findings.

2 Recent Works

2.1 Traditional Feature-Based Approach

According to Shreve et al. [10], the spatio-temporal strains on the face induced by non-rigid motions can be employed as a collective way to recognize macro- and micro-expressions in a video sequence. This approach effectively discriminates between micro- and macro-expressions by estimating the strain magnitude at various face regions such as the cheeks, lips, chin, and forehead. The works in [11] and [12] then introduced the histogram of oriented gradients for micro-expression recognition. In this case, true positives are defined as sequences that have been observed for less than 100 frames, whereas a false positive is a sequence of motions that has been identified but not yet encoded. This approach produced a recall rate of 0.84, an accuracy of 0.70, and an F1-measure of 0.76 when tested on the SAMM database.


Other than that, Huang et al. [13] developed micro-expression recognition using Spatiotemporal Completed Local Quantization Patterns (STCLQP). STCLQP begins by extracting three major components of information: orientation, magnitude, and sign. Secondly, the authors implemented dense and characteristic codebooks for each component in the temporal and appearance domains. Finally, they retrieved and fused the spatio-temporal characteristics from the generated codebooks. Recently, a new binary pattern variant named the Spatio-Temporal Local Radon Binary Pattern (STRBP) was introduced by Huang and Zhao [14], which leverages the Radon transform to produce a set of resilient shape characteristics. Apart from that, the Hot Wheel Pattern (HWP) was introduced by Ben et al. [15] to represent the discriminative characteristics of the subject's macro- and micro-expressions. The similarities between macro- and micro-expression information are represented using a pair of metric learning methods.

2.2 Convolutional Neural Network Feature-Based Approach

Over the past few decades, many researchers have focused on the deep learning approach despite the acceptable performance of conventional micro-expression recognition methods based on handcrafted features of interest. In this context, several studies in micro-expression recognition are presented that demonstrate how deep learning methods obtain better recognition performance. With the rapid advancement in GPU processing, complex deep learning models can be trained on massive volumes of data at a relatively fast rate. In [16], Yu and Zhang developed a nine-layer convolutional network for facial expression recognition. They used the softmax activation function to divide the expressions into seven classes. Their network obtained a recognition rate of 61.29% when trained on the Static Facial Expressions in the Wild (SFEW 2.0) dataset. Lopes et al. [17] recommended preprocessing the data before employing the convolutional operations. A recognition rate of 97.81% was achieved using the CK+ dataset, which verified the impact of preprocessing on facial expression recognition accuracy. Wang and Yuan [18] further improved the recognition rate by 2% by utilizing a softmax training network and a data enhancement approach with the implementation of a triplet loss function. After that, Takalkar and Xu [19] established a framework for creating synthetic images using data augmentation techniques and passing them into a Deep Convolutional Neural Network (DCNN) to train and identify facial emotions. A novel CNN architecture with sparse batch normalization was developed by Cai et al. [20] to overcome the vanishing gradient problem. This network employs two convolutional layers sequentially, followed by max-pooling and sparse batch normalization (SBN). Dropout is then utilized in the middle of three fully connected layers to reduce overfitting.


Other than that, Li et al. [21] provided a novel CNN method for solving the facial occlusion problem. As a first step, the authors prepared the data and fed it into a VGGNet network embedded with an attention mechanism. Peng et al. [22] constructed a Dual Temporal Scale Convolutional Neural Network (DTSCNN) to categorize facial micro-expressions. This model's success rate was over 10% higher than that of the prior approaches that had produced the best categorization results.

3 Methodology

3.1 Dataset

Several freely accessible databases for spontaneous micro-expression simulation can be obtained online, such as the Spontaneous Micro-Expression Corpus (SMIC) [23], the Chinese Academy of Sciences Micro-expressions (CASME II) [24], and Spontaneous Actions and Micro-Movement (SAMM) [25]. These datasets are extensively utilized in many recent works as they provide reliable emotion simulations based on micro-expressions.

Li et al. [23] developed the SMIC dataset, which consists of 164 micro-expression clips collected from 16 subjects. The SMIC dataset consists of three subsets that differ by the type of recording camera: SMIC-HS, SMIC-VIS, and SMIC-NIR. A high-speed (HS) camera at 100 fps was used to record SMIC (HS), a standard visual (VIS) camera at 25 fps was used to record SMIC (VIS), and an infrared (NIR) camera was used to record SMIC (NIR). SMIC datasets categorize micro-expressions as positive, negative, or surprise.

Yan et al. [24] developed the CASME II dataset, which is a continuation of the original CASME dataset. The CASME II videos were recorded at a high frame rate of 200 fps with a resolution of 640 × 480 pixels, and a cropped region of interest of 280 × 340 pixels is also provided. It contains 255 micro-expression videos from 35 subjects, labeled into five categories: happiness, disgust, repression, surprise, and others.

Davison et al. [25] developed the SAMM dataset. This dataset has higher variability than the CASME II dataset, especially in terms of subject age and race. Participants' ages ranged from 19 to 57 years, and they came from various nationalities, including India, the United Kingdom, Spain, and others. In contrast to the previous datasets, the videos used in the SAMM dataset were tailored to each participant so that the intended emotion could be elicited accurately. It contains 159 spontaneous micro-expression samples collected from 32 subjects with a demographically diverse mix of genders. SAMM dataset subjects were induced using seven fundamental emotions, recorded at a frame rate of 200 fps.

In this paper, three types of emotion are considered: positive, negative, and surprise.


Fig. 1 An example of SMIC dataset images with different emotions

Generally good human emotions are categorized as positive, generally bad human emotions as negative, while surprise is recognized when a human detects a discrepancy between expectation and reality. Figures 1, 2, and 3 illustrate the three types of emotion from the three different datasets.

Fig. 2 An example of CASME II dataset images with different emotions

Fig. 3 An example of SAMM dataset images with different emotions


Fig. 4 The proposed VGG-M architecture

3.2 Convolutional Neural Network Model

This study uses VGG-M, first introduced by Chatfield et al. [26], as the base model. VGG-M was selected based on its compact layers, which suit our small training database of only 441 micro-expression videos. This model also shows excellent performance in terms of computation speed and recognition accuracy. In this study, the original VGG-M architecture has been modified. The original network consists of nine main layers: five convolution layers, one flatten layer, and three fully connected (FC) layers. A total of 96 kernels of size 7 × 7 are applied in the first convolution layer, 256 kernels of size 5 × 5 in the second convolution layer, and 512 kernels of size 3 × 3 in each of the third, fourth, and fifth convolution layers. Every convolutional and FC layer uses the ReLU activation function, except the last FC layer, which uses the softmax activation function. A batch normalization layer is applied after the first and second layers. A maximum pooling layer is employed right after the ReLU activation in the third, fourth, and fifth layers to make the model more accurate with good robustness. Robustness in an automated system allows the model to perform well across a wider operating range [27]. Figure 4 shows the proposed VGG-M architecture.
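A minimal Keras sketch of the modified VGG-M described above is given below; the input resolution, FC layer widths, and pooling sizes are assumptions, since they are not stated here, while the kernel counts and sizes follow the text.

```python
from tensorflow.keras import layers, models

def build_vgg_m(input_shape=(224, 224, 3), n_classes=3):
    """Five conv layers (96x7x7, 256x5x5, three 512x3x3), batch norm after the
    first two, max pooling after the last three, then flatten and three FC
    layers ending in softmax over the three emotion classes."""
    return models.Sequential([
        layers.Conv2D(96, 7, activation="relu", input_shape=input_shape),
        layers.BatchNormalization(),
        layers.Conv2D(256, 5, activation="relu"),
        layers.BatchNormalization(),
        layers.Conv2D(512, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(512, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(512, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(4096, activation="relu"),  # FC widths are assumptions
        layers.Dense(1024, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),  # positive / negative / surprise
    ])
```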

3.3 Spatial Pyramid Pooling

Generally, the SPP module contains three parallel layers that perform down-sampling operations separately with different pooling kernel sizes. Each branch consists of convolution, batch normalization, and ReLU activation layers, and the convolution kernel size can vary between the parallel layers. Finally, the branch outputs are resized to match the original input feature maps and combined using a concatenation operator. Figure 5 illustrates the architecture of the SPP module with three parallel branches.


Fig. 5 An architecture of SPP module with three parallel branches
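One common realization of such a module, sketched below with the Keras functional API, down-samples each parallel branch at a different scale, applies convolution, batch normalization, and ReLU, resizes each branch back to the input resolution, and concatenates everything with the input feature maps. The pooling scales here are assumptions; the 3 × 3 convolution kernels follow the configurations listed next.

```python
from tensorflow.keras import layers

def spp_module(x, n_kernels=32, pool_sizes=(2, 4, 8)):
    """Pyramid pooling block: pass two or three pooling scales to obtain the
    two- or three-branch variants tested in this study."""
    h, w = x.shape[1], x.shape[2]     # assumes a fixed (non-None) spatial size
    branches = [x]
    for p in pool_sizes:
        b = layers.AveragePooling2D(pool_size=p)(x)  # down-sample at scale p
        b = layers.Conv2D(n_kernels, 3, padding="same")(b)
        b = layers.BatchNormalization()(b)
        b = layers.Activation("relu")(b)
        b = layers.Resizing(h, w)(b)  # restore the original feature-map size
        branches.append(b)
    return layers.Concatenate()(branches)
```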

In this study, four variations of the SPP architecture are tested; their details are as follows:

• First: two parallel branches, kernel size 3 × 3, 48 kernels, placed after the first layer
• Second: three parallel branches, kernel size 3 × 3, 48 kernels, placed after the first layer
• Third: two parallel branches, kernel size 3 × 3, 32 kernels, placed after the second layer
• Fourth: three parallel branches, kernel size 3 × 3, 32 kernels, placed after the second layer.

There are two schemes for placing the SPP module: either right after the first layer or right after the second layer, as shown in Fig. 6. A sketch of how these configurations could be wired is given after this list.
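The placement itself can be sketched by switching to the functional API and inserting the module between blocks; this reuses spp_module from the sketch above and the layer settings of build_vgg_m, with the kernel counts (48 after the first layer, 32 after the second) taken from the configurations just listed. Everything else remains an assumption.

```python
from tensorflow.keras import Input, layers, models

def build_improved_vgg_m(position=2, branches=3, input_shape=(224, 224, 3)):
    """position in {1, 2} and branches in {2, 3} reproduce the four tested schemes."""
    n_kernels = 48 if position == 1 else 32
    scales = (2, 4, 8)[:branches]
    inp = Input(shape=input_shape)
    x = layers.Conv2D(96, 7, activation="relu")(inp)
    x = layers.BatchNormalization()(x)
    if position == 1:
        x = spp_module(x, n_kernels, scales)   # scheme: SPP after the first layer
    x = layers.Conv2D(256, 5, activation="relu")(x)
    x = layers.BatchNormalization()(x)
    if position == 2:
        x = spp_module(x, n_kernels, scales)   # scheme: SPP after the second layer
    for _ in range(3):                          # remaining 512-kernel conv blocks
        x = layers.Conv2D(512, 3, activation="relu")(x)
        x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(4096, activation="relu")(x)
    x = layers.Dense(1024, activation="relu")(x)
    out = layers.Dense(3, activation="softmax")(x)
    return models.Model(inp, out)
```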

4 Result and Discussion

The VGG-M model for micro-expression recognition is trained on an Nvidia Titan V for 500 epochs with 32 samples per iteration. A learning rate of 0.0001 is used with the Adam optimizer [28] to update the parameters. The proposed method employs the mean accuracy metric to measure how reliably human emotions are recognized, where TP denotes true positives, TN denotes true negatives, and N_Total is the total number of videos:

    Accuracy = (TP + TN) / N_Total    (1)
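Eq. (1) amounts to a one-liner; the counts below are hypothetical, purely to show the arithmetic.

```python
def accuracy(tp, tn, n_total):
    """Mean recognition accuracy from Eq. (1)."""
    return (tp + tn) / n_total

# Hypothetical counts: 143 correct positives and 98 correct negatives
# out of 255 videos give (143 + 98) / 255 = 0.945.
print(accuracy(143, 98, 255))
```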


Fig. 6 Placement of the SPP module in the CNN model

Table 1 demonstrates the performance of the improved VGG-M with the four SPP placement schemes. The highest result was obtained on the SMIC dataset, with an accuracy of 89.43%, when VGG-M is embedded with three parallel branches of SPP implanted right after the second layer. For the CASME II dataset, the original VGG-M without the SPP module produced better performance, with an accuracy of 78.79%. VGG-M with three parallel branches of SPP applied after the second layer is the best setup for the SAMM dataset, with an accuracy of 74.8%. Finally, for the combined dataset, VGG-M with three parallel branches of SPP placed after the second layer also produced the best performance, with an accuracy of 77.48%. Figures 7 and 8 illustrate the training graphs for the original VGG-M and the improved VGG-M with three SPP branches placed after the second layer.

Table 1 Performance results of the improved VGG-M with four types of SPP placement schemes (accuracy, %)

            Original   After first layer     After second layer
Datasets               2 SPP     3 SPP       2 SPP     3 SPP
Combined    72.34      76.11     76.57       76.72     77.48
SMIC        76.09      86.67     87.59       89.25     89.43
CASME II    78.79      68.69     68.69       67.17     67.68
SAMM        63.82      72.76     73.17       73.17     74.80


Fig. 7 Training graph for the original VGG-M

Fig. 8 Training graph for the improved VGG-M with the three-branch SPP module placed after the second layer

5 Conclusion

In conclusion, this study has successfully increased micro-expression recognition performance by embedding the SPP module into the VGG-M model. In order to identify the best possible setup, different configurations have been explored and validated. The combined dataset obtained the best performance when three parallel branches of the SPP module were placed after the second layer. This configuration


produced an improvement in micro-expression recognition accuracy of 5.14% over the base model. In general, the VGG-M model with an SPP module of three parallel branches applied after the second layer produced good accuracy improvements for the combined dataset, SMIC, and SAMM, but not for CASME II. For future work, similar experiments will be conducted on more diverse datasets to further validate the efficiency of the SPP module, so that the optimal SPP configuration for the CNN model can be determined more reliably.

Acknowledgements The authors would like to acknowledge funding from Universiti Kebangsaan Malaysia (Geran Universiti Penyelidikan: GUP-2019-008) and Ministry of Higher Education Malaysia (Fundamental Research Grant Scheme: FRGS/1/2019/ICT02/UKM/02/1).

References 1. Wang Y, Li Y, Song Y, Rong X (2019) Facial expression recognition based on random forest and convolutional neural network. Information 10(12):375 2. Lai Y (2019) A comparison of traditional machine learning and deep learning in image recognition. J Phys: Conf Ser 1314(1):012148 3. Zulkifley MA, Mustafa MM, Hussain A, Mustapha A, Ramli S (2014) Robust identification of polyethylene terephthalate (PET) plastics through Bayesian decision. PLoS ONE 9(12):e114518 4. Slim SO, Atia A, Elfattah MMA, Mostafa MSM (2019) Survey on human activity recognition based on acceleration data. Int J Adv Comput Sci Appl 10(3) 5. Chen J, Wang C, Tong Y (2019) AtICNet: semantic segmentation with atrous spatial pyramid pooling in image cascade network. EURASIP J Wireless Commun Networking 2019(1):146 6. Liu Z, Zhang Y, Chen Y, Fan X, Dong C (2020) Detection of algorithmically generated domain names using the recurrent convolutional neural network with spatial pyramid pooling. Entropy 22(9):1058 7. Abdani SR, Zulkifley MA, Mamat M (2020) U-Net with spatial pyramid pooling module for segmenting oil palm plantations. In: 2020 IEEE 2nd international conference on artificial intelligence in engineering and technology (IICAIET), pp 1–5 8. Abdani SR, Zulkifley MA, Siham MN, Abiddin NZ, Aziz NAA (2020) Paddy fields segmentation using fully convolutional network with pyramid pooling module. In: 2020 IEEE 5th international symposium on telecommunication technologies (ISTT), pp 30–34 9. Abdani SR, Zulkifley MA, Zulkifley NH (2021) Group and shuffle convolutional neural networks with pyramid pooling module for automated pterygium segmentation. Diagnostics 11(6):1104 10. Shreve M, Godavarthy S, Goldgof D, Sarkar S (2011) Macro- and micro-expression spotting in long videos using spatio-temporal strain. In: Face and gesture 2011, pp 51–56 11. Davison AK, Yap MH, Lansley C (2015) Micro-facial movement detection using individualised baselines and histogram-based descriptors. In: 2015 IEEE international conference on systems, man, and cybernetics, pp 1864–1869 12. Davison AK, Lansley C, Ng CC, Tan K, Yap MH (2016) Objective micro-facial movement detection using FACS-based regions and baseline evaluation 13. Huang X, Zhao G, Hong X, Zheng W, Pietikäinen M (2016) Spontaneous facial microexpression analysis using spatiotemporal completed local quantized patterns. Neurocomputing 175 14. Huang X, Zhao G (2017) Spontaneous facial micro-expression analysis using spatiotemporal local radon-based binary pattern. In: 2017 International conference on the frontiers and advances in data science (FADS), pp 159–164


15. Ben X, Jia X, Yan R, Zhang X, Meng W (2018) Learning effective binary descriptors for microexpression recognition transferred by macro-information. Pattern Recogn Lett 107:50–58 16. Yu Z, Zhang C (2015) Image based static facial expression recognition with multiple deep network learning. In: Proceedings of the 2015 ACM on international conference on multimodal interaction, pp 435–442 17. Lopes AT, de Aguiar E, Oliveira-Santos T (2015) A facial expression recognition system using convolutional networks. In: 2015 28th SIBGRAPI conference on graphics, patterns and images, pp 273–280 18. Wang J, Yuan C (2016) Facial expression recognition with multi-scale convolution neural network. 376–385 19. Takalkar MA, Xu M (2017) Image based facial micro-expression recognition using deep learning on small datasets. In: 2017 International conference on digital image computing: techniques and applications (DICTA), pp 1–7 20. Cai J, Chang O, Tang X-L, Xue C, Wei C (2018) Facial expression recognition method based on sparse batch normalization CNN. In: 2018 37th Chinese control conference (CCC), pp 9608–9613 21. Li Y, Zeng J, Shan S, Chen X (2019) Occlusion aware facial expression recognition using CNN with attention mechanism. IEEE Trans Image Process 28(5):2439–2450 22. Peng M, Wang C, Chen T, Liu G, Fu X (2017) Dual temporal scale convolutional neural network for micro-expression recognition. Front Psychol 8 23. Li X, Pfister T, Huang X, Zhao G, Pietikainen M (2013) A spontaneous micro-expression database: inducement, collection and baseline 24. Yan WJ, Li X, Wang SJ, Zhao G, Liu WJ, Chen YH, Fu X (2014) CASME II: an improved spontaneous micro-expression database and the baseline evaluation. PLoS ONE 9(1) 25. Davison AK, Lansley C, Costen N, Tan K, Yap MH (2018) SAMM: a spontaneous micro-facial movement dataset. IEEE Trans Affect Comput 9(1) 26. Chatfield K, Simonyan K, Vedaldi A, Zisserman A (2014) Return of the devil in the details: delving deep into convolutional nets 27. Zulkifley MA, Rawlinson D, Moran B (2012) Robust observation detection for single object tracking: deterministic and probabilistic patch-based approaches. Sensors 12(11):15638–15670 28. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization

Image Colorization: A Convolutional Network Approach
Nitesh Pradhan, Saransh Gupta, and Gaurav Srivastava

Abstract Owing to applications such as the automatic coloring of old pictures and the identification of suspects from sketch images, image colorization has recently become one of the most active areas of research. Machine learning plays an essential role in this task. This paper illustrates a deep convolutional neural network approach with a VGG16 pre-trained classifier that takes a grayscale image as input and gives its equivalent colored image as output. In this research, the deep convolutional neural network is divided into encoder, fusion, and decoder parts, and the VGG16 pre-trained model is used for extracting high-level features from an image. The proposed network is trained for 2000 epochs, and the final colored image is compared with the ground truth image.

Keywords Image colorization · LAB color space · Convolutional neural network · Deep learning · Classifier

1 Introduction

Computer vision and deep learning form a powerful domain with many applications, such as image generation from text descriptions, image-to-image translation, face aging, face generation, and image colorization from grayscale or black-and-white images. Normally, the image colorization problem is defined in terms of the International Commission on Illumination (CIE) LAB color space. Like the red, green, blue (RGB) color space, the CIE LAB color space is a 3-channel (L, A, and B) space, where A and B encode the color information in the green-red and blue-yellow components, respectively, and the L channel encodes the intensity information of an image. In contrast to the cyan, magenta, yellow, key (CMYK) and RGB color models, the LAB color model aims to approximate human vision and mainly focuses on perceptual uniformity. The L component is designed to match the human perception of lightness, although it does not take the Helmholtz–Kohlrausch effect into account. Hence, by altering the A and B channels, an optimal and precise color balance can be obtained. Generally, people use Photoshop for image colorization, which is a long and tedious task because a face alone needs up to 20 layers of blue, green, and pink shades to get it just right. Image colorization has many practical applications, such as colorizing old movies or photographs, color recovery, artist assistance, and color image encoding.

N. Pradhan (B) · S. Gupta · G. Srivastava
Manipal University Jaipur, Jaipur, India
e-mail: [email protected]

2 State of the Art

In 2016, Zhang et al. [1] designed a fully automatic algorithm that produced exceptionally realistic results. At training time, they used classification and class-rebalancing to incorporate a greater diversity of colors in the result. When the model is given the lightness channel L of an image as input, it calculates the corresponding A* and B* color channels. At test time, the entire network is a feed-forward pass in a CNN. The model was trained on more than a million color images. The results were evaluated using a "colorization Turing test," where people were asked to select between a generated and a ground truth color image; the method successfully managed to fool people on 32% of the trials.

The algorithm designed by Iizuka et al. [2] had four main components: a low-level features network, a mid-level features network, a global features network, and a colorization network. A set of low-level features extracted from the image is used to compute the sets of global image features and mid-level image features. A "fusion layer" is used to combine or fuse the mid-level and global features. These fused features are used as input to the colorization network, which finally outputs the colored image. The entire network learns in an end-to-end fashion. The CIE L*A*B* color space is used for the training images, and no pre-processing or post-processing is required. The main features of the proposed architecture are:
• It jointly learns global and local features for an image using an end-to-end approach.
• Classification labels are exploited to increase the proposed network's performance.
• A style transfer technique is used.
The model was trained on the Places scene dataset, which consists of 2,448,872 training images and 20,500 validation images distributed among 205 scene categories, such as volcano and conference center. The model was evaluated through a user study, which found that its output is considered "natural" 92.6% of the time.

In 2017, Zhang et al. [3] presented a deep learning algorithm for user-guided image colorization. Given the grayscale version and user inputs, the designed network predicted the color of an image. The proposed model uses two networks: first, the Local Hints Network, which uses sparse user points, and second, the Global Hints Network, which uses global statistics. The model has been trained on nearly a million


images, with simulated user inputs. A single feed-forward pass is used to perform the colorization, making it considerably fast and enabling real-time use.

Later in the same year, Baldassarre et al. [4] used the CIE L*A*B* color space for the images, where L, the luminance, contains all the main features of the image, and A*B*, called the chromaticity, contains all the color information. They used a pre-trained Inception-ResNet-v2 for high-level feature extraction, to obtain an intuition about the contents of the image, and an encoder–decoder convolutional network for coloring the image. When the network is provided with the luminance component of an image as input, the model outputs the A*B* components, which are combined with the input to obtain the final colored image. The size of the training dataset was kept small, which restricts the model to a small variety of images. Training results were presented by assessing the public acceptance of the images generated by the network through a user study.

In the last few years, convolutional neural networks have shown tremendous advancement in object detection and classification. Experiments and research show that CNNs have almost halved the error rate for object detection (Krizhevsky et al. [5]). The ImageNet dataset is a large collection of images designed for research in visual object recognition. It consists of more than 14 million hand-annotated images, which indicate what objects are present in the picture, and in at least one million of the images, bounding boxes are also provided. The ImageNet dataset is distributed among more than 20,000 classes. Since 2010, ImageNet has organized an annual contest called the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), in which algorithms for large-scale object detection and image classification are evaluated (Russakovsky et al. [6]). A specially "trimmed" list of one thousand non-overlapping classes is used for the challenge. The ILSVRC follows in the footsteps of the PASCAL VOC challenge (Everingham et al. [7]), established in 2005, which had a smaller dataset of about 20,000 images spread across twenty object classes. VGGNet (Simonyan et al. [8]) was developed by the Visual Geometry Group (VGG) at the University of Oxford. VGG16 emerged as the runner-up of ILSVRC 2014 in the classification task, obtaining an 8.8% error rate on the ImageNet dataset. VGG16 has been used for malicious software classification based on deep neural network bottleneck features (Rezende et al. [9]). Apart from VGG16, ResNet50 (He et al. [10]) and Inception-ResNet (Szegedy et al. [11]) have been used with transfer learning for malicious software classification (Rezende et al. [12]). A VGG16-based fully convolutional structure has been used to classify weld defect images (Liu et al. [13]), achieving high accuracy with a relatively small dataset: on a testing dataset of 3000 images, it achieved a test accuracy of 97.6% and a train accuracy of 100% on two main defects.

A common observation after reviewing all these models is that the models which take "global features" into consideration outperform the ones that do not. This is because global features provide information about what is present in the image and help in mapping the detected objects to their corresponding probabilities. This further helps the colorization network learn what kinds of colors are possible for which specific kind of object/image. Most of the reviewed papers have used a


classifier network to extract global features and a separate colorization network to color the image on the basis of these high-level (global) features (Iizuka et al. [2]; Baldassarre et al. [4]). Highly inspired by such architectures, the authors decided to design a model which uses a classification network to determine the object type and category under consideration and a colorization network which estimates the output colors on the basis of high-level and mid-level features. However, instead of training a model from scratch to extract global features, the authors decided to use a pre-trained network, which reduces the training time.

3 Methodology

To perform image colorization on grayscale images, the hyperparameters, data pre-processing, and network architecture are explained in the following sub-sections.

3.1 Hyperparameters

The rectified linear unit (ReLU) has been used as the activation function for all layers of the encoder as well as the decoder network, except the last layer. tanh has been used in the last layer to map the predicted values into the same interval as the real values, for easy comparison between predicted and real values. Since the real values (values for the A and B channels) lie in the range [−1, 1], tanh is used because it outputs values in the same range for any input. The optimal model parameters are identified by minimizing an objective function over the estimated and target outputs. To quantify the loss of the model, the mean squared error between the estimated pixel values and the real values is used. The mean squared error for an input image X is defined by Eq. (1):

$$C(X, \theta) = \frac{1}{2HW} \sum_{k \in \{a, b\}} \sum_{i=1}^{H} \sum_{j=1}^{W} \left( X_{kij} - \tilde{X}_{kij} \right)^2 \qquad (1)$$

Here, $\theta$ represents the model parameters, and $X_{kij}$ and $\tilde{X}_{kij}$ denote the $ij$-th pixel value of the $k$-th channel of the target and reconstructed image, respectively. During training, the Adam optimizer is used to back-propagate the loss and update the model parameters, with an initial learning rate $\eta = 0.001$.
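As a sanity check, the following is a direct NumPy transcription of Eq. (1); the random arrays are stand-ins for real target and predicted chrominance maps of shape H × W × 2.

```python
# Minimal NumPy transcription of the loss in Eq. (1): the squared differences
# over both chrominance channels and all pixels, averaged over 2*H*W terms.
import numpy as np

def colorization_loss(target_ab, predicted_ab):
    H, W = target_ab.shape[:2]
    # Sum of squared differences over both channels (k in {a, b}) and all pixels
    return np.sum((target_ab - predicted_ab) ** 2) / (2 * H * W)

target = np.random.uniform(-1, 1, (256, 256, 2))     # ground-truth a/b in [-1, 1]
predicted = np.random.uniform(-1, 1, (256, 256, 2))  # tanh outputs in [-1, 1]
print(colorization_loss(target, predicted))
```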


3.2 Data Pre-processing

All the training images have been pre-processed before feeding them to the convolutional network. Pre-processing is required at two ends, namely the colorization end and the classification end.

• Colorization End
All the images present in the training dataset are in RGB format, with pixel values in the range [0, 255]. As the first step of pre-processing, the pixel values are brought into the range [0, 1] by dividing them by 255, which helps reduce the convergence time of the network loss. Secondly, the images are converted into the CIELAB (also known as CIE L*A*B*, or sometimes abbreviated simply as "LAB") color space, for the following reasons: L stands for the lightness or luminance channel, which is the grayscale version of the image, while A and B represent the chrominance, with A for red-green and B for blue-yellow. Converting to LAB reduces the problem to the determination of only two extra color channels, i.e., A and B, whereas in RGB it would have been three channels: red, green, and blue. The L layer of the image, which is our input (as it is nothing but the grayscale values), can also be reused in the final step of merging the L, A, and B layers. In the LAB color space format, an image of size H × W is passed as input. To generate the complete color image $\tilde{X} \in \mathbb{R}^{H \times W \times 3}$, the luminance and the A and B components are required. Thus, the authors pass the luminance component to the model, from which the model predicts the A and B components. The relationship between the luminance and the A and B components is defined in Eq. (2). Figure 1 shows the L*A*B color space with respect to the RGB image.

$$F: X_L \to \left( \tilde{X}_a, \tilde{X}_b \right) \qquad (2)$$
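A minimal sketch of this pre-processing, assuming scikit-image for the color conversion, is shown below. Dividing the a/b channels by 128 to reach roughly [−1, 1] (the tanh output range) is an assumption, since the paper does not state the exact scaling constant.

```python
# Colorization-end pre-processing: scale RGB pixels to [0, 1], convert to
# CIELAB, and split into the network input (L) and targets (a, b).
import numpy as np
from skimage.color import rgb2lab

def preprocess_rgb(rgb_uint8):
    rgb = rgb_uint8.astype(np.float64) / 255.0  # step 1: pixel values into [0, 1]
    lab = rgb2lab(rgb)                          # step 2: RGB -> CIELAB
    L = lab[..., :1] / 100.0                    # luminance, scaled to [0, 1]
    ab = lab[..., 1:] / 128.0                   # a/b channels, scaled to ~[-1, 1] (assumed constant)
    return L, ab

# Example on a random 256 x 256 RGB image
L, ab = preprocess_rgb(np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8))
print(L.shape, ab.shape)  # (256, 256, 1) (256, 256, 2)
```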

• Classification End
Since VGG16 takes 224 × 224 images with 3 channels as input, all the images are resized to 224 × 224 and converted to grayscale (after conversion, the grayscale image is stacked back to RGB to obtain three identical channels) before passing them to the classification end, as shown in Fig. 2.

• Image Augmentation


Fig. 1 RGB to LAB format

Fig. 2 RGB to grayscale conversion

Augmentation is required to avoid overfitting. Since our training dataset is small, with only 10,000 images, the model can learn the details and noise in the training dataset to a large extent and fail to generalize to the test dataset. Overfitting adversely affects the performance of the network on the test set. Image augmentation techniques artificially increase the variation of images in the dataset by using horizontal/vertical flips, rotations, variations in image brightness, horizontal/vertical shifts, etc. Augmentation creates random batches of training data.
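A brief sketch of such a pipeline using Keras' ImageDataGenerator is shown below; the specific augmentation ranges are illustrative assumptions rather than the authors' exact settings.

```python
# Random on-the-fly augmentation of training batches with Keras.
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    horizontal_flip=True,          # random horizontal flips
    vertical_flip=True,            # random vertical flips
    rotation_range=20,             # random rotations (degrees), assumed range
    brightness_range=(0.8, 1.2),   # random brightness variation, assumed range
    width_shift_range=0.1,         # random horizontal shifts, assumed range
    height_shift_range=0.1,        # random vertical shifts, assumed range
)

X_train = np.random.rand(16, 256, 256, 3)             # stand-in for training images
train_batches = datagen.flow(X_train, batch_size=8)   # yields random augmented batches
augmented = next(train_batches)
print(augmented.shape)  # (8, 256, 256, 3)
```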

3.3 Network Architecture

A convolutional neural network (ConvNet/CNN) (O'Shea et al. [14]) is a type of deep learning algorithm which takes an image as input, extracts features from it, assigns weights and biases to various aspects/objects in the image, and is able to differentiate one from the other. ConvNets have the ability to automatically extract features/characteristics from images, whereas classical classification algorithms require manually designed filters for the same purpose. Hence, a convolutional neural network manages everything in an end-to-end fashion. Figure 3 shows the network architecture used for image colorization. It comprises two different networks: the classification network and the colorization network.


Fig. 3 Network architecture

• Classification Network
From the various pre-trained classification networks, such as Xception (Chollet et al. [15]), VGG16, VGG19, ResNet50, InceptionV3, InceptionResNetV2, and MobileNet (Howard et al. [16]), the authors chose VGG16, the runner-up of the ILSVRC 2014 classification task. The model achieves 92.7% top-5 test accuracy on ImageNet, a dataset of over 14 million images belonging to 1000 classes. VGG16 improves over AlexNet (Krizhevsky et al. [5]) by replacing the large 11 × 11 and 5 × 5 filters in the first and second layers, respectively, with multiple smaller 3 × 3 filters stacked one after the other. VGG16 takes 224 × 224 × 3 images as input, so all images are resized to 224 × 224 × 3, as already discussed in the pre-processing section. The output (embedding) of the last classification layer, just before the softmax function, is extracted from VGG16. It is a 1000 × 1 dimensional column vector representing a probabilistic measure of which objects/aspects are present in the image (a minimal sketch of this extraction step is shown after this sub-section). This high-level information helps predict more realistic colors for the objects in the images. For example, if the network knows a region is sky, it will eventually color it blue rather than an arbitrary color. The output from the classification network is merged with the output from the encoder part of the colorization network, as discussed below. Since VGG16 is pre-trained on the 1000-class ImageNet dataset, transfer learning can be used to fine-tune it on smaller datasets with fewer classes. Generally, this is helpful when working with a specific type of dataset, such as cats and dogs or natural scenes.

• Colorization Network
The colorization network is logically divided into three main components: the encoder, which extracts the mid-level features; the fusion layer, which merges the mid-level features with the high-level features obtained from the classification network; and the decoder, which uses these features to predict the values of the color channels.
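The sketch below illustrates the classification-branch extraction step. Note that in stock Keras the final 1000-way VGG16 layer already applies softmax, so this extracts the post-softmax class-probability vector, a close stand-in for the "just before softmax" embedding described above.

```python
# Stack the grayscale image to 3 channels, resize to 224 x 224, and extract
# the 1000-dimensional VGG16 output vector.
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

vgg16 = VGG16(weights="imagenet")  # includes the final 1000-unit layer

def vgg16_embedding(gray_2d):
    rgb = np.stack([gray_2d] * 3, axis=-1)                      # 1 channel -> 3 channels
    rgb = tf.image.resize(rgb[np.newaxis], (224, 224)).numpy()  # resize to VGG16 input
    return vgg16.predict(preprocess_input(rgb))                 # shape (1, 1000)

embedding = vgg16_embedding(np.random.rand(256, 256) * 255.0)
print(embedding.shape)
```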


Table 1 Layer structure for encoder

Layer        Kernel size      Stride size   Output image size
Conv2D       64 × (3 × 3)     2 × 2         H/2 × W/2
Conv2D       128 × (3 × 3)    2 × 2         H/4 × W/4
Conv2D       128 × (3 × 3)    2 × 2         H/8 × W/8
Batch Norm                                  H/8 × W/8
Conv2D       256 × (3 × 3)    2 × 2         H/16 × W/16
Conv2D       512 × (3 × 3)    2 × 2         H/32 × W/32
Conv2D       512 × (3 × 3)    2 × 2         H/64 × W/64

Encoder: The encoder takes an H × W (256 × 256 tensor with 1 channel) grayscale image as input. The grayscale image is, in fact, the luminance component (L*) in the LAB color space. It uses six convolutional layers with 3 × 3 kernels each. Every convolution layer uses a stride of 2, consequently halving the dimensions of its output and hence reducing the number of computations required. Padding is used to ensure that the images are exactly halved. A batch normalization layer is placed between the convolution layers, right after the third convolution layer, as shown in Table 1. The first three convolution layers, having a smaller number of filters, are used to detect low-level features. The next three convolution layers, having a higher number of filters, are used to detect mid-level and high-level features from the low-level features obtained in the previous three layers. The encoder outputs H/64 × W/64 × 512 feature maps representing a combination of low-level and mid-level features.

Fusion Layer: The fusion layer takes the feature vector from the classification network, replicates it H/64 × W/64 times, and concatenates it with the feature maps obtained as output from the encoder part of the colorization network. The resulting feature map, concatenated along the depth axis, has dimensions H/64 × W/64 × 1512. This produces a single feature volume containing both the mid-level features from the encoder and the high-level features from the classification network, and ensures an even distribution of this information throughout the image. After this, a convolution layer with 512 kernels of size 1 × 1 and stride 1 is applied to the fused feature volume, producing a feature volume of dimensions H/64 × W/64 × 512. Table 2 shows the layer structure of the fusion layer.

Table 2 Layer structure of fusion layer

Layer    Kernel size     Stride size   Output image size
Conv2D   512 × (1 × 1)   1 × 1         H/64 × W/64

Decoder: The decoder takes the H/64 × W/64 × 512 feature volume as input and applies deconvolution using transpose convolution layers. It uses eight transpose convolution layers with a 3 × 3 kernel each (except the last, which uses 1 × 1) and a stride of 2 for all layers except the first and the last, as shown in Table 3. For every layer with a stride of 2, the size of the feature volume is doubled to match the original image dimensions. The number of kernels is gradually decreased from 512 to 2 because, finally, our target is to predict two channels, a* and b*.

Table 3 Layer structure of decoder

Layer         Kernel size     Stride size   Output image size
Conv2DTrans   512 × (3 × 3)   1 × 1         H/64 × W/64
Conv2DTrans   256 × (3 × 3)   2 × 2         H/32 × W/32
Conv2DTrans   256 × (3 × 3)   2 × 2         H/16 × W/16
Conv2DTrans   128 × (3 × 3)   2 × 2         H/8 × W/8
Batch Norm                                  H/8 × W/8
Conv2DTrans   128 × (3 × 3)   2 × 2         H/4 × W/4
Conv2DTrans   64 × (3 × 3)    2 × 2         H/2 × W/2
Conv2DTrans   32 × (3 × 3)    2 × 2         H × W
Conv2DTrans   2 × (1 × 1)     1 × 1         H × W
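To make the three components concrete, the following is a minimal Keras sketch of the full network following Tables 1–3. The layer counts, kernel sizes, and strides come from the tables; the use of "same" padding, ReLU activations on hidden layers, and the tiling of the 1000-dimensional embedding via RepeatVector are assumptions filled in where the text does not pin down the details.

```python
# Encoder-fusion-decoder colorization network (a sketch, not the authors' exact code).
import tensorflow as tf
from tensorflow.keras import layers, Model

H = W = 256  # input size used by the colorization network

def build_colorization_network():
    # Encoder: grayscale L channel in, H/64 x W/64 x 512 feature volume out (Table 1)
    l_input = layers.Input(shape=(H, W, 1), name="luminance")
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(l_input)
    x = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Conv2D(256, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2D(512, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2D(512, 3, strides=2, padding="same", activation="relu")(x)

    # Fusion: tile the 1000-d VGG16 embedding over the spatial grid, concatenate,
    # and mix with a 1 x 1 convolution (Table 2)
    emb_input = layers.Input(shape=(1000,), name="vgg16_embedding")
    emb = layers.RepeatVector((H // 64) * (W // 64))(emb_input)
    emb = layers.Reshape((H // 64, W // 64, 1000))(emb)
    fused = layers.Concatenate(axis=-1)([x, emb])       # H/64 x W/64 x 1512
    fused = layers.Conv2D(512, 1, strides=1, activation="relu")(fused)

    # Decoder: transpose convolutions upsample back to H x W x 2 (Table 3)
    y = layers.Conv2DTranspose(512, 3, strides=1, padding="same", activation="relu")(fused)
    y = layers.Conv2DTranspose(256, 3, strides=2, padding="same", activation="relu")(y)
    y = layers.Conv2DTranspose(256, 3, strides=2, padding="same", activation="relu")(y)
    y = layers.Conv2DTranspose(128, 3, strides=2, padding="same", activation="relu")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Conv2DTranspose(128, 3, strides=2, padding="same", activation="relu")(y)
    y = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(y)
    y = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(y)
    ab_output = layers.Conv2DTranspose(2, 1, strides=1, activation="tanh")(y)  # a*, b*

    return Model(inputs=[l_input, emb_input], outputs=ab_output)

model = build_colorization_network()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss="mse")
```

The compile call mirrors the hyperparameter section: MSE loss with the Adam optimizer at an initial learning rate of 0.001, and tanh on the final layer to keep the predicted a*/b* values in [−1, 1].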

4 Experimental Results and Discussion

4.1 About Dataset

The results obtained from deep learning networks depend almost equally on the network architecture and on the dataset. The choice of dataset has a significant role in determining parameters like training accuracy, training loss, validation accuracy, and validation loss. Moreover, the size of the dataset can also give a hint about whether a model is overfitted or underfitted. Most of the previous automatic colorization models (Larsson et al. [17]) have used the easily and extensively available ImageNet dataset, whose huge size and free availability make it a good choice. The authors used the publicly available dataset on FloydHub at www.floydhub.com/emilwallner/datasets/colornet. The images in the dataset are collected from Unsplash, a platform which provides free high-quality images taken by professional photographers. The dataset has a great diversity of images, ranging from natural scenes to human faces and from animals to gadgets. It contains approximately 9500 training images and 500 validation images, all having a uniform size of 256 × 256.


4.2 System Configuration

The model is trained on Google Colab, a free online training platform. Google Colab uses a Tesla K80 GPU and provides 12 GB of RAM and 12 h of continuous use. Due to the RAM limitations, the batch size is kept small.

4.3 Result and Discussion

Colorization of grayscale images is considered a problem with no single "correct" answer. For example, a red chair is indistinguishable from a blue chair when photographed in black-and-white. Here, the authors present a method using deep learning techniques to colorize a grayscale image and produce a colored image with plausible colors. If the output is able to fool human eyes, it is considered a valid output. The results were exceptionally good on a certain set of images, where the network generated almost the ground truth. The estimated color improved gradually as the number of epochs was increased, up to a limit of 2000 epochs. However, due to the small size of the training dataset, the network was limited to coloring only the main objects that were easily detected by the classifier network. Hence, not all the objects in the image were colored. Our model is trained for 2000 epochs with a batch size of 32, as shown in Fig. 4.

Fig. 4 Obtained result compared with ground truth image


5 Conclusion and Future Work

Deep learning plays an important role in the image colorization task. In this paper, a novel approach is introduced for image colorization which first classifies the image and then performs colorization on it. The proposed method was applied to several images of different categories. It is observed that the introduced network performs better if the image belongs to the natural scenes category, such as sky and sea. The reason is that most images in our dataset are from the natural scenes category; for unseen images, the network depends highly on the dataset. To overcome this issue, the proposed network should be trained with a large dataset that contains images of all categories. This work can be further extended to be applied to videos, such as old black-and-white movies and feeds obtained from CCTV cameras. To get an idea of how the images are perceived by the observer, or how compelling the colors look to a human observer, a public survey can be conducted asking people to label the images colored by our network as fake or real. This can help in calculating the accuracy of the model, measured by how many times the colored images are able to fool the human eye.

References
1. Zhang R, Isola P, Efros AA (2016) Colorful image colorization. In: European conference on computer vision. Springer, Cham, pp 649–666
2. Iizuka S, Simo-Serra E, Ishikawa H (2016) Let there be color! Joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification. ACM Trans Graph (ToG) 35(4):1–11
3. Zhang R, Zhu JY, Isola P, Geng X, Lin AS, Yu T, Efros AA (2017) Real-time user-guided image colorization with learned deep priors. arXiv preprint arXiv:1705.02999
4. Baldassarre F, Morín DG, Rodés-Guirao L (2017) Deep koalarization: image colorization using CNNs and Inception-ResNet-v2. arXiv preprint arXiv:1712.03400
5. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, 25
6. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet large scale visual recognition challenge. Int J Comput Vision 115(3):211–252
7. Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The PASCAL visual object classes (VOC) challenge. Int J Comput Vision 88(2):303–338
8. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
9. Rezende E, Ruppert G, Carvalho T, Theophilo A, Ramos F, Geus PD (2018) Malicious software classification using VGG16 deep neural network's bottleneck features. In: Information technology–new generations. Springer, Cham, pp 51–59
10. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
11. Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Thirty-first AAAI conference on artificial intelligence
12. Rezende E, Ruppert G, Carvalho T, Ramos F, De Geus P (2017) Malicious software classification using transfer learning of ResNet-50 deep neural network. In: 2017 16th IEEE international conference on machine learning and applications (ICMLA). IEEE, pp 1011–1014
13. Liu B, Zhang X, Gao Z, Chen L (2017) Weld defect images classification with VGG16-based neural network. In: International forum on digital TV and wireless multimedia communications. Springer, Singapore, pp 215–223
14. O'Shea K, Nash R (2015) An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458
15. Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1251–1258
16. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861
17. Larsson G, Maire M, Shakhnarovich G (2016) Learning representations for automatic colorization. In: European conference on computer vision. Springer, Cham, pp 577–593

Prediction of Particulate Matter (PM2.5) Across India Using Machine Learning Methods
Rikta Sen, Ashis Kumar Mandal, Saptarsi Goswami, and Basabi Chakraborty

Abstract Air pollution is a global issue that has become a major concern in many countries due to its dangerous effects on human health as well as the environment. Air pollution in India is also a national threat, because many big cities already have poor air quality, severely affecting human health and nature. Among the pollutants, particulate matter 2.5 (PM2.5) is the dominant factor causing cancers and other diseases in humans. Hence, predicting this type of air pollutant is vital for effective pollution control measures. In this paper, prediction models have been developed to forecast PM2.5 concentrations from air pollutant time series data taken at different locations in India using four machine learning algorithms: Support Vector Regression, Decision Tree Regression, K-Nearest Neighbors Regression, and Bayesian Ridge Regression. The data used in this study consist of hourly PM2.5 readings from 7 areas in India for the year 2016. We have analyzed the univariate time series data by mapping them into a supervised problem. Experimental results show that the proposed models effectively predict PM2.5 concentrations, with the Bayesian Ridge Regression-based model showing better performance than the other models.

Keywords Air pollution · PM2.5 · Time series forecasting · Machine learning

R. Sen · A. K. Mandal
School of Software and Information Science, Iwate Prefectural University, 152-52 Sugo, Takizawa, Iwate, Japan
e-mail: [email protected]
A. K. Mandal
e-mail: [email protected]
S. Goswami
Bangabasi Morning College, University of Calcutta, Kolkata, West Bengal, India
B. Chakraborty (B)
Dean and Professor, School of Computing Science, Madanapalle Institute of Technology and Science, Madanapalle, AP, India
e-mail: [email protected]


1 Introduction

Air pollution is a severe environmental issue receiving increasing attention across the world. Certain air pollutants such as PM2.5 can travel through the nasal passages and into the throat and lungs during inhalation, because PM2.5 is particulate matter with an aerodynamic diameter less than or equal to 2.5 μm that includes combustion particles, organic compounds, metals, etc. [1]. Long-term exposure to this ambient fine particulate matter can adversely affect human health and cause respiratory symptoms such as irregular heart rate, coughing, airway inflammation, and abnormal lung function [2, 3]. Meanwhile, the air quality of India is deteriorating day by day due to rapid urbanization and industrialization. It is a matter of hope that, despite increasing air pollutant concentrations in India, the Indian government and research institutions are trying to address the air pollution problem in several ways, one of which is forecasting air pollutants at several places in India. This type of research is essential because air pollutant concentration forecasting effectively protects public health by providing an early warning against harmful air pollutants.

To date, several methods have been proposed for air pollution prediction. These include statistical time series methods such as the Auto Regressive Moving Average (ARMA) [4], Auto Regressive Integrated Moving Average (ARIMA) [5], and Hidden Markov Model (HMM) [6], and machine learning models such as Support Vector Regression (SVR) [7], Artificial Neural Networks (ANN) [8], and Decision Tree Regression (DTR) [9]. Abhilash et al. suggested a statistical model such as ARIMA to develop an air quality prediction model [10]. Liu et al. [11] used machine learning approaches such as SVR and random forest regression (RFR) to predict the air quality index in Beijing; their experimental results showed that SVR performs better for air quality index prediction. An artificial neural network (ANN) was employed for predicting hourly air pollutant concentrations near an arterial road in Guangzhou, China [12]; experimental results show that the ANN performs better than multiple linear regression models. ANN models were also used on Hong Kong air monitoring data, where they outperformed traditional linear regression algorithms. Azid et al. [13] used a combination of principal component analysis (PCA) and ANN to predict the air pollutant index, showing that PCA-ANN can better predict the Air Pollution Index (API) with fewer parameters. In [14], Xu and Ren proposed a supplementary leaky integrator echo state network (SLI-ESN) with the minimum redundancy maximum relevance (mRMR) feature selection method to predict PM2.5 concentration; the mRMR algorithm selects the appropriate input variables or influencing factors responsible for air pollution. Zhou et al. predicted Beijing's seasonal PM2.5 using four deep learning models across the four seasons, spring, summer, autumn, and winter [15]. Leong et al. used the support vector machine (SVM) with a radial basis kernel function (RBF) to overcome the difficulty of complex air pollution index modeling [16], showing that their proposed model can efficiently and precisely predict the API on data from the Perak and Penang states of Malaysia.


Although machine learning algorithms have been successfully employed for forecasting air pollutants, more study is required to find appropriate machine learning algorithms that can effectively predict air pollutant concentrations at different locations. The main objective of this paper is to conduct a comparative analysis of different machine learning methods for the prediction of air pollutants with the help of data from different locations in India. In this paper, we have investigated four machine learning-based forecasting techniques, Support Vector Regression (SVR), Decision Tree Regression (DTR), K-Nearest Neighbors Regression (kNR), and Bayesian Ridge Regression (BRR), for forecasting air pollutant concentrations. In this prediction task, univariate time series data are transformed into supervised learning data sets for use with the four machine learning-based prediction models. The predictive performance of the different models is analyzed and compared. It has been observed that all the models are quite effective in the air pollutant concentration prediction task, but the BRR-based model performs better than the other approaches.

The rest of the paper is organized as follows: Sect. 2 presents the methodology of the study, in which data collection, data set preparation, data preprocessing, the machine learning models used for prediction analysis, and the experimental setup are discussed. Section 3 describes the comparison of the results of the different approaches, followed by a discussion of the results. Finally, the conclusion and future directions are presented in Sect. 4.

2 Methodology

In this section, the methodology adopted for predicting particulate matter is described. The first step is to collect the data from which the appropriate data sets are prepared. The data sets are then partitioned into training and test sets, and normalization is performed. After that, the time series data are converted into a form suitable for supervised learning. In the next step, prediction models are developed using the machine learning methods, and finally, the results are analyzed. Figure 1 shows the overall block diagram of the methodology.

2.1 Data Collection

The original database contains hourly pollutant observations at different locations in India from 2000 to 2017. The database is in .nc format, containing pollutant observations (PM2.5 levels) for places that are identified by latitude and longitude values. We first selected seven locations in India based on latitude and longitude: Kolkata, Delhi, Assam, Bengaluru, Chennai, Ajmer, and Pondicherry. The data is available on Kaggle [17].
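As an illustration of this lookup, a hypothetical xarray sketch is shown below. The file name, the variable name "PM25", and the coordinate names "lat"/"lon"/"time" are assumptions about the file's schema, not the actual database layout.

```python
# Pull one location's hourly PM2.5 series out of a NetCDF (.nc) database by
# nearest latitude/longitude, then slice out the year 2016.
import xarray as xr

ds = xr.open_dataset("pm25_india.nc")  # assumed filename
kolkata = ds["PM25"].sel(lat=22.57, lon=88.36, method="nearest")  # nearest grid cell
series_2016 = kolkata.sel(time=slice("2016-01-01", "2016-12-31")).values
print(series_2016.shape)  # expect roughly 8783 hourly samples for 2016
```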

Fig. 1 Block diagram of methodology for prediction of particulate matter

Fig. 2 Hour wise pollution in 2016 at different areas

Figure 2 shows the hour-wise average PM2.5 concentration at the seven locations in India in 2016. It is observed that, compared to the other locations, Kolkata recorded

a very high level of PM2.5, with values between 80 and 85. Chennai came next, with PM2.5 levels between 50 and 65 and values rising from the afternoon onwards. Ajmer and Bengaluru have almost equal PM2.5 levels, at around 55. As in Bengaluru, the PM2.5 concentration also fluctuated in Assam and Pondicherry, with the lowest values in the morning and the highest values at night.

Figure 3 shows the PM2.5 levels detected on the first five consecutive days in Kolkata in 2016. It is clearly observed that no two days share the same pattern of change in the PM2.5 level. On Day 1, the PM2.5 level decreased gradually until 10 am, followed by a leveling-off. On Day 2, however, the PM2.5 concentration dropped slightly until 4 am and then increased gradually. A higher PM2.5 level was detected on Day 3 than on Day 2, with both days showing the same changing pattern. Interestingly, Day 5 exhibits a sudden rise in PM2.5 values between 5 am and around 9 am.


Fig. 3 First 5 day wise PM2.5 at Kolkata in the year 2016

Table 1 Description of the data

Area          Year   Duration           No. of samples
Kolkata       2016   January–December   8783
Delhi         2016   January–December   8783
Assam         2016   January–December   8783
Bengaluru     2016   January–December   8783
Chennai       2016   January–December   8783
Ajmer         2016   January–December   8783
Pondicherry   2016   January–December   8783

2.2 Preparation of Data Set

Seven different locations were chosen such that some have high PM2.5 concentrations and others have low PM2.5 concentrations. We have taken all observations at these locations in 2016, and therefore, for each area, we have 12 months of data from January to December. Table 1 gives a description of the data used in our study.


2.3 Machine Learning Algorithms Used for Building Models

2.3.1 K-Nearest Neighbors Regression (kNR)

kNR is a non-parametric regression model based on the K-Nearest Neighbor algorithm [18]. The K-Nearest Neighbor approach predicts the value of any new data point based on how closely this point resembles the points in the training set. In K-Nearest Neighbors Regression (kNR), the new point is estimated by taking the average value of its k nearest neighbors [19]. The Euclidean distance is usually used to find the k nearest points to the query point. If $Y_i$ is the outcome of the i-th nearest neighbor and a total of k nearest neighbors are considered, then the predicted value Y of the query point can be written as:

$$Y = \frac{1}{k} \sum_{i=1}^{k} Y_i \qquad (1)$$

2.3.2 Support Vector Regression (SVR)

Support Vector Regression (SVR) [20] is based on the elements of the Support Vector Machine (SVM), where the support vectors are the points closest to the generated hyperplane in an m-dimensional feature space that distinctly segregates the data points about the hyperplane [21]. The generalized equation of the hyperplane can be represented as $Y = wX + b$, where w is the weight vector and b is the intercept at X = 0. The margin of tolerance is represented by epsilon, $\epsilon$ [22]. The equations of the decision boundaries then become $wX + b = +\epsilon$ and $wX + b = -\epsilon$. Thus, any hyperplane that satisfies SVR should satisfy Eq. (2). Finally, only the points within the decision boundary that have the least error rate, i.e., those within the margin of tolerance, are taken:

$$-\epsilon < Y - (wX + b) < +\epsilon \qquad (2)$$

2.3.3 Decision Tree Regression (DTR)

A Decision Tree [23] is a supervised learning algorithm that is used in both classification and regression problems. When the target variable is numeric or continuous, decision tree regression is used. In this approach, a decision tree is built incrementally, in which the mean squared error (MSE) is typically used to split a node into sub-nodes. The value of a terminal node is the mean of the observations falling in that region. Therefore, if an unseen data point falls in that region, the prediction is the mean value of the data points at that leaf node.

2.3.4 Bayesian Ridge Regression (BRR)

Bayesian Ridge Regression (BRR) is a probabilistic model with an extra L2 regularization parameter for the coefficients. Mathematically, the model is as follows [24]:

$$y \sim \mathcal{N}(\mu, \alpha) \qquad (3)$$

where $\mu = \beta X = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$, $\beta \sim \mathcal{N}(0, \lambda^{-1} I_p)$, $\alpha \sim \mathrm{Gamma}(\alpha_1, \alpha_2)$, and $\lambda \sim \mathrm{Gamma}(\lambda_1, \lambda_2)$.

The output y is assumed to follow a normal distribution characterized by variance $\alpha$ and mean $\mu = \beta X$. The regression parameter $\beta$ has independent Gaussian priors with mean zero and variance $\lambda^{-1} I_p$ [25]. Besides, $\lambda$ and $\alpha$ are regularizing parameters, each following a gamma distribution, that are estimated jointly during the fit of the model [26]. The distributions have hyper-parameters $\alpha_1$, $\alpha_2$, $\lambda_1$, $\lambda_2$.

2.4 Procedure of Designing Prediction Models

We split the time series data into train and test sets and use min-max normalization [27] to transform the data into the range [0, 1]. Training and test sets are constructed in three different ways. In the first case, the data of the first 11 months are used for training and the last month for the test set. In the second case, the data of the first ten months are used for training and the last two months for the test set. In the third case, the data of the first nine months are used for training and the last three months for the test set. In order to build the prediction models using the machine learning approach, the time series data have been converted into a supervised learning problem. For each train and test set, a sliding window is used to prepare n lagged variables $(t - n), (t - n + 1), \ldots, t$, each of which is an input time step (feature), along with the future time step $(t + 1)$, which is the prediction target. Next, four machine learning methods, Support Vector Regression (SVR), Decision Tree Regression (DTR), K-Nearest Neighbors Regression (kNR), and Bayesian Ridge Regression (BRR), are used to design the prediction models that estimate the PM2.5 level. For performance evaluation, the root mean square error (RMSE) is used. The implementation was written in Python 3.4, and the simulation was executed in the Kaggle Notebooks computational environment with a 2.6 GHz CPU and 8 GB of RAM. The machine learning models were implemented using Scikit-learn with the default settings. The parameters set in the machine learning algorithms are shown in Table 2.
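The following is a compact sketch of this pipeline under the 11/1-month split. The window length n = 3 is an illustrative choice; the inputs here are the n values preceding the predicted time step, an equivalent shift of the (t − n), …, t, (t + 1) notation above.

```python
# Min-max normalization, chronological train/test split, and sliding-window
# conversion of the univariate PM2.5 series into supervised (X, y) pairs.
import numpy as np

def to_supervised(series, n_lags=3):
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])   # lagged input values
        y.append(series[t])              # next-step prediction target
    return np.array(X), np.array(y)

series = np.random.rand(8783)  # stand-in for one year of hourly PM2.5 readings
series = (series - series.min()) / (series.max() - series.min())  # min-max to [0, 1]

split = int(len(series) * 11 / 12)  # first 11 months train, last month test
X_train, y_train = to_supervised(series[:split])
X_test, y_test = to_supervised(series[split:])
print(X_train.shape, X_test.shape)
```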


Table 2 Parameter settings of the learning algorithms

Machine learning algorithm              Parameters
Support Vector Regression (SVR)         kernel='rbf', degree=3, gamma='scale', coef0=0.0, tol=0.001, C=1.0, epsilon=0.1
Decision Tree Regression (DTR)          criterion='mse', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0
K-Nearest Neighbors Regression (kNR)    n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30
Bayesian Ridge Regression (BRR)         alpha_1=1e-06, alpha_2=1e-06, lambda_1=1e-06, lambda_2=1e-06
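As a sketch, the four regressors can be instantiated with the Table 2 settings and scored by RMSE as follows. The synthetic arrays stand in for the sliding-window data prepared above; note that recent scikit-learn versions renamed the DTR criterion 'mse' to 'squared_error'.

```python
# Fit the four Table 2 models and report test RMSE for each.
import numpy as np
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import BayesianRidge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X_train, y_train = rng.random((500, 3)), rng.random(500)  # stand-in lagged data
X_test, y_test = rng.random((100, 3)), rng.random(100)

models = {
    "SVR": SVR(kernel="rbf", degree=3, gamma="scale", coef0=0.0,
               tol=0.001, C=1.0, epsilon=0.1),
    "DTR": DecisionTreeRegressor(criterion="squared_error", splitter="best",
                                 max_depth=None, min_samples_split=2,
                                 min_samples_leaf=1, min_weight_fraction_leaf=0.0),
    "kNR": KNeighborsRegressor(n_neighbors=5, weights="uniform",
                               algorithm="auto", leaf_size=30),
    "BRR": BayesianRidge(alpha_1=1e-06, alpha_2=1e-06,
                         lambda_1=1e-06, lambda_2=1e-06),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    print(f"{name}: RMSE = {rmse:.3f}")
```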

3 Results and Discussion

Figure 4a–d shows the comparison between the predicted and actual air pollution values in Kolkata in 2016 using the SVR, DTR, kNR, and BRR models, respectively. The testing data is the last month of the year (December), whereas the first 11 months are the training data. It is clear from the graphs that the methods are effective for the prediction of PM2.5, as only a small difference is found between the predicted and actual values. It is also observed that, among the models, BRR performs the best, with a minimum RMSE value of 1.931.

Table 3 shows the comparison of the machine learning models on all test data sets when the test set is the data of the last month of the year and the training set is the first 11 months. The observed RMSE values indicate that BRR is the best model, having the lowest RMSE values for 6 out of 7 data sets. In only one case, Assam, does kNR perform the best. Besides, the SVR model does not show good performance, having higher RMSE values than the other methods in all cases.

We extended our experiment to all data sets with different training and test set sizes. The experimental results are shown in Fig. 5a–g. Figure 5a shows that when the test set is changed from 1 month to 2 or 3 months, RMSE increases significantly; among the models, Bayesian Ridge performs better than the others. Figure 5b shows that, with the increase of test data size, RMSE increases for the SVR-based model, whereas it decreases for the other three models. Also, irrespective of test data size, the BRR-based model performs the best, followed by the kNR- and DTR-based models, and the SVR model performs the worst. In Fig. 5c, BRR shows better performance than the other three models; changing the test data size from 1 to 2 months increases the RMSE values for all models, while changing it from 2 to 3 months increases the RMSE only for the DTR-based model. For the next data set, in Fig. 5d, the change of test data size results in an increase of RMSE for all models except Bayesian Ridge. In Fig. 5e, the performance of kNR, BRR, and DTR is approximately the same, but SVR does not perform well compared to the other three models; here, performance does not vary significantly with the change in test data size.

Fig. 4 Predicted air pollution at Kolkata in the year 2016 using 4 different models: (a) SVR (Test RMSE: 3.260), (b) DTR (Test RMSE: 2.156), (c) kNR (Test RMSE: 2.002), (d) BRR (Test RMSE: 1.931)

Table 3 Prediction results with different learning methods when testing time is 1 month

Area          Year   RMSE of different models
                     SVR      DTR     kNR     BRR
Kolkata       2016   3.260    2.156   2.002   1.931
Delhi         2016   3.606    2.675   2.134   2.055
Assam         2016   4.4275   2.232   1.574   1.706
Bengaluru     2016   5.295    4.324   3.861   3.338
Chennai       2016   7.160    4.329   3.930   3.530
Ajmer         2016   3.260    2.125   2.002   1.931
Pondicherry   2016   4.321    3.804   3.042   2.863

The bold face represents the best performing model according to the RMSE value

In this data set, the BRR-based model

slightly outperforms the other two models. Figure 5f shows that an increase of test data size decreases the RMSE value for all models except the SVR-based model; in addition, SVR performs worse than the other three models. In Fig. 5g, it is also observed that Bayesian Ridge outperforms the other models. Considering all the cases, we can conclude that all the models perform best with a one-month test data size.

Fig. 5 RMSE value for different training and testing size in different areas in year 2016: (a) Kolkata, (b) Delhi, (c) Chennai, (d) Bengaluru, (e) Assam, (f) Ajmer, (g) Pondicherry


4 Conclusion and Future Work

Like many other countries, India faces numerous challenges in controlling air pollution. Pollutants like PM2.5, which consist of liquid and solid particle compounds, are dangerous to human health and living organisms. Hence, proper prediction of PM2.5 can effectively address many air pollution-related illnesses. In this paper, we have taken PM2.5 data from 7 different places in India between January 1, 2016, and December 31, 2016, and developed prediction models with four machine learning algorithms: SVR, DTR, kNR, and BRR. Before building the models, we analyzed the data, created seven datasets, and mapped each univariate time series into a supervised problem data set using the sliding window method. Our data analysis shows that the air is most polluted after the evening (6 pm to 12 am) and relatively clean during the morning (5–10 am). Among the areas, Kolkata experiences the highest level of air pollution, while Assam has the lowest. Experimental results show that all models can forecast the PM2.5 concentration well, with the BRR-based model outperforming the others in most cases. We also varied the test data size from 1 month to 3 months, and the results demonstrate that an increase in test data size generally increases the RMSE of the models. Moreover, BRR usually performs the best, whereas SVR performs the worst among all the models. In the future, we will consider PM2.5 data of several years from different geographical areas of India and employ deep learning models to forecast PM2.5 concentration.

References
1. Gupta T, Jaiprakash, Dubey S (2011) Field performance evaluation of a newly developed PM2.5 sampler at IIT Kanpur. Sci Total Environ 409(18):3500–3507
2. Khafaie MA, Yajnik CS, Salvi SS, Ojha A (2016) Critical review of air pollution health effects with special concern on respiratory health. J Air Pollut Health 1(2):123–136
3. Ling SH, van Eeden SF (2009) Particulate matter air pollution exposure: role in the development and exacerbation of chronic obstructive pulmonary disease. Int J Chronic Obstr Pulm Dis 4:233
4. Mirsanjari MM, Zarandian A, Mohammadyari F, Visockiene JS (2020) Investigation of the impacts of urban vegetation loss on the ecosystem service of air pollution mitigation in Karaj metropolis, Iran. Environ Monit Assess 192(8):1–23
5. Kulkarni GE, Muley AA, Deshmukh NK, Bhalchandra PU (2018) Auto-regressive integrated moving average time series model for forecasting air pollution in Nanded city, Maharashtra, India. Model Earth Syst Environ 4(4):1435–1444
6. Gómez-Losada A, Pires JCM, Pino-Mejías R (2016) Characterization of background air pollution exposure in urban environments using a metric based on hidden Markov models. Atmos Environ 127:255–261
7. Murillo-Escobar J, Sepulveda-Suescun J, Correa M, Orrego-Metaute D (2019) Forecasting concentrations of air pollutants using support vector regression improved with particle swarm optimization: case study in Aburra valley, Colombia. Urban Clim 29:100473
8. Cabaneros SM, Calautit JK, Hughes BR (2019) A review of artificial neural network models for ambient air pollution prediction. Environ Modell Softw 119:285–304
9. Althuwaynee OF, Balogun AL, Al Madhoun W (2020) Air pollution hazard assessment using decision tree algorithms and bivariate probability cluster polar function: evaluating intercorrelation clusters of PM10 and other air pollutants. GIScience Remote Sens 57(2):207–226
10. Abhilash M, Thakur A, Gupta D, Sreevidya B (2018) Time series analysis of air pollution in Bengaluru using ARIMA model. Ambient Commun Comput Syst 413–426
11. Liu H, Li Q, Yu D, Gu Y (2019) Air quality index and air pollutant concentration prediction based on machine learning algorithms. Appl Sci 9(19):4069
12. Cai M, Yin Y, Xie M (2009) Prediction of hourly air pollutant concentrations near urban arterials using artificial neural network approach. Transp Res Part D: Transp Environ 14(1):32–41
13. Azid A, Juahir H, Toriman ME, Kamarudin MKA, Saudi ASM, Hasnam CNC, Aziz NAA, Azaman F, Latif MT, Zainuddin SFM (2014) Prediction of the level of air pollution using principal component analysis and artificial neural network techniques: a case study in Malaysia. Water Air Soil Pollut 225(8):1–14
14. Xu X, Ren W (2019) Prediction of air pollution concentration based on mRMR and echo state network. Appl Sci 9(9):1811
15. Zhou X, Xu J, Zeng P, Meng X (2019) Air pollutant concentration prediction based on GRU method. J Phys: Conf Ser 1168:032058
16. Leong W, Kelani R, Ahmad Z (2020) Prediction of air pollution index (API) using support vector machine (SVM). J Environ Chem Eng 8(3):103208
17. Goswami S (2019) PM energy. https://www.kaggle.com/saptarsi/pmenergy/code. Accessed 05 Dec 2019
18. Bishop CM (2006) Pattern recognition and machine learning. Springer
19. Tanuwijaya J, Hansun S (2019) LQ45 stock index prediction using k-nearest neighbors regression. Int J Recent Technol Eng 8(3):2388–2391
20. Awad M, Khanna R (2015) Support vector regression. In: Efficient learning machines, pp 67–80
21. Parbat D, Chakraborty M (2020) A Python based support vector regression model for prediction of COVID-19 cases in India. Chaos Solitons Fractals 138:109942
22. Sethi A (2020) Support vector regression tutorial for machine learning. https://www.analyticsvidhya.com/blog/2020/03/support-vector-regression-tutorial-for-machine-learning/. Accessed 05 Dec 2021
23. Myles AJ, Feudale RN, Liu Y, Woody NA, Brown SD (2004) An introduction to decision tree modeling. J Chemom 18(6):275–285
24. Varoquaux G, Buitinck L, Louppe G, Grisel O, Pedregosa F, Mueller A (2015) Scikit-learn: machine learning without learning the machinery. GetMobile: Mob Comput Commun 19(1):29–33
25. Mostafa SM, Eladimy AS, Hamad S, Amano H (2020) CBRL and CBRC: novel algorithms for improving missing value imputation accuracy based on Bayesian ridge regression. Symmetry 12(10):1594
26. Mostafa SM, Eladimy AS, Hamad S, Amano H (2020) CBRG: a novel algorithm for handling missing data using Bayesian ridge regression and feature selection based on gain ratio. IEEE Access 8:216969
27. García S, Luengo J, Herrera F (2015) Data preprocessing in data mining, p 72

Convolutional Neural Network for COVID-19 Detection
Pulkit Agarwal, Neeraj Yadav, Rishav Kumar, and Rahul Thakur

Abstract The unprecedented COVID-19 pandemic, growing from patient zero to millions of cases within a few months, has spread across countries and is approaching nearly 280 million cases worldwide. Billions of test kits have been supplied to hospitals because of the exploding number of COVID-19 cases. Hence, it becomes necessary to implement an automatic detection system to restrain the spread of COVID-19. Here, we demonstrate a convolutional neural network (CNN) model which is trained on chest X-ray and CT scan images collected from different sources to distinguish COVID-19 patients from healthy persons. The model is connected to a Web application, which is made using Hyper Text Markup Language (HTML), Cascading Style Sheets (CSS), JavaScript (JS), jQuery, ECMAScript (ES6), and Flask (Python). The model provides a probability, so that huge numbers of tests can be done in a few clicks using available medical scans.

Keywords COVID-19 · Web application · HTML · CSS · JS · jQuery · ES6 · Flask · Python · CNN · Image processing · X-ray · CT scan · Deep learning

1 Introduction

A large body of existing research relates to the use of machine learning to improve the efficiency and precision of lung cancer diagnosis, with a major focus on CT scan-based lung cancer screening programs in many parts of the world [1]. The sharp spike in patients has put an unprecedented load on healthcare and social infrastructure systems [2] across the world. Test kits for the COVID-19 examination of such patients are limited, as is personal protective equipment (PPE).

P. Agarwal (B) · N. Yadav · R. Kumar · R. Thakur
Department of Electronics and Communication Engineering, Delhi Technological University, New Delhi, Delhi, India
e-mail: [email protected]
R. Thakur
e-mail: [email protected]


We therefore propose the employment of chest CT scans and X-rays to detect COVID-19 infection in patients exhibiting symptoms. A major contribution of this work lies in proposing a convolutional neural network-based model for the detection of COVID-19 infection; for this, we have taken datasets of chest X-ray and CT scan pictures of patients. However, it is a screening tool intended to guide those at the forefront of this research. When given as input, 2D and 3D images can provide crucial information. Lung infection is the major COVID-19 symptom that can be detected in a CT scan, and an X-ray of the lungs may likewise contribute to a successful COVID-19 diagnosis [3]. The application considers the depth and findings of radiology images. In this paper, we considered the benefits of deep learning layers for extracting features from COVID-19 CT scan and X-ray radiology imaging [4]. Thus, our algorithm can be used both independently and alongside a radiologist. Through these experiments, we attempt to enable the detection of COVID-19 within a few clicks.

2 Literature Overview

Computer-assisted detection of lung pathologies from X-ray images is a field of study that began in the 1960s and has progressed steadily since, with decades of papers documenting increasingly precise results for a variety of conditions including osteoporosis, breast cancer, and heart disease. CT scans also use X-rays as a source of radiation; however, they offer a much higher resolution than standard X-ray images due to the more focused X-ray beams [4] used to produce the patient images. CT is generally considered the best imaging modality for the lung parenchyma. Several studies have yielded remarkably accurate results using CNNs with transfer learning to detect lung nodules. Recently, a deep learning system developed by Google achieved state-of-the-art performance using a patient's current and prior CT volumes to predict the risk of lung cancer.

2.1 Existing Work 1. Diagnosis of chest X-rays: Many in-depth study methods are designed for a variety of lung diseases, including pneumonia [5]. The model is trained to distinguish categories of X-ray images, as well as respiratory illness. 2. Detection of COVID-19 by chest X-ray images: Since the recent worldwide outbreak of COVID-19 infection, a number of diagnostic techniques have been adopted for identifying COVID-19 patients [6]. Such a model acts on chest X-ray images and produces predictions among three classes: no illness, pneumonia, and COVID-19.


2.2 Drawback of Existing Work COVID-19 affects several parts of the lungs that cannot be pictured with X-rays alone. Thus, chest CT images offer our deep learning (CNN) model additional features to train on and use for prediction. The existing work also uses pre-trained models, which require immense computational power. The model trained in this project is built from scratch and can run on medium computational power as well.

2.3 Our Contribution Chest CT scans can also be used to identify COVID-19, and we use them alongside X-rays to give more accurate results. Our convolutional neural network model is trained on CT scans and X-rays to predict COVID-19, providing more accurate results than the prevailing work. Further, a Web application is built around the deep learning models to provide a better interface. The Web application is built in Flask, HTML, and CSS as follows:
1. We developed a convolutional neural network architecture from scratch and tested it on two different radiology datasets to quickly detect COVID-19.
2. Results are obtained in a few clicks, much faster than a medical test, even for large groups of people.
3. We have also analyzed false positive cases.
4. The proposed set of rules is more precise on X-ray images than on CT scan images; the final output is based on the average.
5. We have also used dropout layers in our proposed set of rules to overcome the overfitting issue and do better in real time.
6. We then created a Web page, which provides the option of uploading an X-ray image and a CT scan image. Uploaded images are then passed to the trained models. The combined model then makes predictions that are displayed on the Web page.
Starting with the machine learning part, we first trained our models on CT scan and X-ray images. Figure 1 shows the model diagram of the project, which was trained on X-ray and CT scan images collected from different sources. The final outcome is based on the mean of the predictions made by the individual models. AI, ML, data science, and data mining are actively used to find an effective solution to the COVID-19 epidemic. For methods that rely on datasets for effective results and the use of in-depth learning in radiology, few community datasets are available. In this work, we collected the datasets from GitHub and Kaggle. For X-ray, the dataset was collected from a GitHub repository for COVID-19 patients and from Kaggle for normal patients. For CT scans, images were collected from the COVID-CT Master dataset for normal and COVID-19 patients.
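To make the data preparation concrete, below is a minimal sketch of assembling per-class image paths and splitting them into the training, validation, and testing subsets described in Step 1 of Sect. 3; the directory names here are illustrative assumptions, not our exact repository layout.

```python
import os
import random

def split_dataset(image_dir, train=0.70, val=0.15, seed=42):
    """Split image file paths into train/validation/test subsets.
    The remaining fraction (1 - train - val) is used for testing."""
    paths = [os.path.join(image_dir, f) for f in sorted(os.listdir(image_dir))]
    random.Random(seed).shuffle(paths)            # reproducible shuffle
    n_train = int(train * len(paths))
    n_val = int(val * len(paths))
    return (paths[:n_train],                      # training set
            paths[n_train:n_train + n_val],       # validation set
            paths[n_train + n_val:])              # test set

# Hypothetical folder layout: one directory per class.
train_covid, val_covid, test_covid = split_dataset("xray/covid")
train_norm, val_norm, test_norm = split_dataset("xray/normal")
```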


Fig. 1 Working model diagram

3 Proposed Work The proposed CNN model performs binary classification of pictures, classifying X-ray and CT scan images to detect true COVID-19 [7] patients. The proposed set of rules consists of two different convolutional neural networks, trained on X-ray and CT scan images respectively, to extract simple features and then subsequently learn the pattern of COVID-19 symptoms found in patients' CT scan and X-ray images. A key feature of the proposed strategy is that it is trained from the beginning; no pre-trained models are used. Here, X-ray and CT scan images of normal and COVID-19 cases have been considered. The steps taken are
Step 1: Generating the training, validation, and testing datasets—In this step, we break down the input dataset into the required sub-datasets, i.e., train, validation, and test sets. The training dataset is used to make the model learn. The second sub-dataset is the testing dataset, which consists of the test and validation sets. A fixed set of samples is used for tuning the various hyperparameters, providing a neutral assessment of the classifier while choosing the number of hidden units inside the neural network. Ultimately, the held-out test sub-dataset is a set of samples used for testing the overall performance of a particular model [1]. The division ratio of the training, validation, and testing sub-datasets is taken as 70%, 15%, and 15% for the X-ray model in Fig. 2a and 70%, 20%, and 10% for the CT scan model in Fig. 2b, respectively.
Step 2: Preparing the CNN architecture—In this step, we made two different model architectures for classification, trained on X-rays as shown in Fig. 3 and on CT scans as shown in Fig. 4 separately. The proposed set of rules has been trained to learn and adapt fundamental computer vision features like edges and boundaries, which guarantees that the models do not need to continuously learn from scratch even after undergoing the CT scan


Fig. 2 a ML model diagram (X-ray), b (CT scan)

and X-ray datasets. The final CNN architecture of the proposed model achieves the best accuracy.
• CNN: Computers scan pictures as pixels, expressed as a matrix of (N × N × 3). The dimensions are length, width, and depth. Images use 3 channels (RGB), so we have a depth of three. The convolutional layer makes use of kernel filters. A filter is used to look for the presence of specific patterns within the actual (input) image. It is a matrix of (M × M × 3), small in size but equal in depth to the input data [8]. The filter is then convolved across the width and height of the input data to produce a map called an activation map. Different filters that detect different features are


Fig. 3 CNN architecture for X-rays

Fig. 4 CNN architecture for CT scans

convolved on the input data, and a group of activation maps is output and transferred to the subsequent layer within the CNN.
Step 3: Creation of the Web app—After creating both models, we proceed with the Web application. Our application front end is developed with HTML, CSS, JS, jQuery, and ES6. It contains two windows to upload CT scan and X-ray images, and the result is shown below them. For the backend of our application, we have used the Flask Web framework, which is based on Python. The backend connects the uploaded images with the previously trained CNN models. The Web application provides an interface between the user and the trained models. Separate windows for uploading the CT scan and X-ray are provided. After clicking the


Fig. 5 Web page

predict button, the final result along with the probability is shown. The created Web page is shown in Fig. 5.
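A minimal sketch of how such a Flask backend might connect the two uploaded scans to the saved models and average their probabilities is shown below; the route name, form fields, model paths, template name, and output layer shape are illustrative assumptions rather than the exact code of our application.

```python
from flask import Flask, request, render_template
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image
import numpy as np

app = Flask(__name__)
# Hypothetical paths to the two saved models described in this paper.
xray_model = load_model("models/xray_cnn.h5")
ct_model = load_model("models/ct_cnn.h5")

def preprocess(file, size=(224, 224)):
    """Load an uploaded scan and scale pixel values to [0, 1]."""
    img = image.load_img(file, target_size=size)
    arr = image.img_to_array(img) / 255.0
    return np.expand_dims(arr, axis=0)          # add batch dimension

@app.route("/predict", methods=["POST"])
def predict():
    # Each model outputs a single sigmoid probability of COVID-19.
    p_xray = float(xray_model.predict(preprocess(request.files["xray"]))[0][0])
    p_ct = float(ct_model.predict(preprocess(request.files["ct"]))[0][0])
    prob = (p_xray + p_ct) / 2.0                # final output: mean of both models
    verdict = "COVID positive" if prob >= 0.5 else "Normal"
    return render_template("result.html", verdict=verdict, probability=prob)

if __name__ == "__main__":
    app.run()
```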

4 Results and Discussion
• Web application: front end built with HTML, CSS, JS, and jQuery; back end built with Flask on Python.
We have two different models trained on two different datasets of radiology imaging, so we combine the results of both individual models. The two models are 1. the X-ray model and 2. the CT scan model.

4.1 Supported by X-ray Model We used the proposed deep learning algorithm (DLA) model to distinguish COVID-19 cases from normal cases [12]. We used 200 CXR images of general patients and 196 X-ray images of true COVID-19 patients from the chest X-ray image dataset [2]. The counts are highlighted in Table 1. The dataset is split


into three sub-datasets, i.e., testing, training, and validation sets. For training, we used 70% of the dataset and the remainder for the testing set, which is again divided equally for validation and test purposes. The distribution of the true COVID-19 patients' dataset is 136, 30, and 30 images for learning, validation, and testing, respectively, while for normal patients the distribution is 140 images for learning, 30 for validation, and 30 for testing. The model is trained on the training dataset. Accuracy for training and validation is shown in Fig. 7, and graphs for training loss and validation loss are shown in Fig. 6. A merit of the proposed set of rules is that it automatically counteracts overfitting through the dropout method [4]. We have calculated the confusion matrix for the performance of the proposed model as shown in Fig. 8. The sensitivity is near 97% and the specificity is near 91% (exact values are given in Table 3) for the proposed set of rules. From these outcomes, we can conclude that a patient who visits the hospital and is COVID-19 true negative is examined as a normal patient with much higher precision when using chest X-ray images.

Table 1 Data sources used

Collection | Number of images | Characteristics
COVID-19 image data collection [9] | 196: COVID-19 (PA) | Variable size, brightness, quality, and contrast
NIH chest X-ray [10] | 1500: pneumonia; no finding | All images are 1024 × 1024 in dimension
COVID-CT MASTER [11] | 349: COVID-19; 397: normal | Variable size, contrast, and brightness
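For reference, the sensitivity, specificity, and accuracy quoted in this section follow directly from the confusion matrix cells; the sketch below shows the standard arithmetic with illustrative counts, not the exact cells of Fig. 8.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Standard binary classification measures from confusion matrix cells."""
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy

# Illustrative numbers for a 60-image test set (30 COVID-19, 30 normal).
sens, spec, acc = confusion_metrics(tp=29, fn=1, tn=27, fp=3)
print(f"sensitivity={sens:.2%}, specificity={spec:.2%}, accuracy={acc:.2%}")
```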

Fig. 6 Lowest loss obtained on training data is 0.17 and 0.15 on validation data. The most effective model is saved


Fig. 7 94% was the maximum accuracy obtained on training data and 96% on validation data

Fig. 8 Confusion matrix (X-ray)

4.2 Supported by CT Scan Model At first, we trained the model on CT scans and evaluated the performance measures; the results were then compared for COVID-19 and normal cases. For the CT scan model, we used the CT Master dataset for both positive and normal patient images, with a nearly equal number of images, i.e., 349 and 350, as shown in Table 2. Accuracy for training and validation is shown in Fig. 10, and graphs for training loss and validation loss are shown in Fig. 9. Through the confusion matrix shown in Fig. 11, the sensitivity of the proposed model is around 87%, and the specificity is around 98%. From these results, we can conclude

Table 2 Sampled dataset for experiments

Image mode | Condition | Source images | Curated images
X-ray | COVID-19 | 930 | 196
X-ray | Normal | 1500 | 200
CT | COVID-19 | 349 | 349
CT | Normal | 397 | 350

Fig. 9 Minimum loss obtained is 0.21 on training data and 0.41 on validation data. The most effective model was saved

that a positive patient is tested accurately, with 92.85% accuracy as shown in Table 3. We also found that CT scans are very reliable for training our model because CT images provide the most detailed view to the radiologist. Once the proposed model has learned on CT scan images, it can also detect COVID-19 patients efficiently in randomized CXR images [13]. The CT Master dataset is one of the largest datasets, with the highest number of COVID-19 and normal radiology images.

5 Conclusion In this work, we conducted experimental binary image classification to identify patients with COVID-19 and normal patients. Furthermore, in this examination, it was found that patients without COVID-19 may still develop pneumonia or other lung diseases. For the various tests to detect COVID-19 cases, we took into consideration CT scan and X-ray images of the chest. The proposed set of rules detects COVID-19 patients with 93% accuracy and is much faster than the conventional


Fig. 10 92% was the maximum accuracy obtained on training data and 89% on validation data

Fig. 11 Confusion matrix (CT scan)

Table 3 Performance measures calculated

Performance measure | X-ray model (%) | CT scan model (%)
Accuracy | 93.33 | 92.85
Precision (sensitivity) | 96.67 | 87.14
Recall (specificity) | 90.62 | 98.38

RT-PCR test. Weights gained from training the proposed set of rules on CT scan imaging also transfer well to X-ray images. The final proposed model is predicated on the two saved models trained on X-ray and CT scans. So, a person who has symptoms of coronavirus can provide his or her X-ray and CT scans to our model, which makes a prediction on each individual image


Fig. 12 Web page showing results after uploading X-ray and CT scans. Predicted result: COVID positive. Probability: 0.97

by the respective individual model. The ultimate result shows whether or not the person is COVID positive, with a probability calculated as the average of the individual model prediction probabilities. The results are displayed through the Web page as shown in Fig. 12. The results also show that a patient diagnosed with pneumonia has a greater risk of being examined as a false positive by the proposed algorithm. Acknowledgements This project was supported by the Department of Electronics and Communication Engineering, Delhi Technological University, New Delhi, India.

References
1. Horry et al (2020) Proposed COVID-19 detection through transfer learning using multi modal imaging data. IEEE Access 8:149808–149824
2. Bennett et al. Smart CT scan based covid19 virus detector. https://github.com/JordanMicahBennett/SMART-CT-SCAN_BASED-COVID19_VIRUS_DETECTOR
3. Polsinelli et al (2020) A light CNN for detecting COVID-19 from CT scans of the chest. arXiv:2004.12837. Available: https://arxiv.org/abs/2004.12837
4. Huang et al (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition
5. Reshi et al (2021) An efficient CNN model for COVID-19 disease detection based on X-ray image classification. Complexity


6. Szegedy et al (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, USA, Jun 2016, pp 2818–2826. https://doi.org/10.1109/CVPR.2016.308
7. Yamashita et al (2018) Convolutional neural networks: an overview and application in radiology. Insights Imaging 9(4):611–629. https://doi.org/10.1007/s13244-018-0639-9
8. Wu et al (2016) Quantitative analysis of ultrasound images for computer aided diagnosis. J Med Imaging 3(1):014501. https://doi.org/10.1117/1.jmi.3.1.0145
9. Ravishankar et al (2016) Understanding the mechanisms of deep transfer learning for medical images. In: Carneiro G (ed) Deep learning and data labeling for medical applications (Lecture notes in computer science), vol 10008. Springer, Cham, Switzerland, pp 188–196
10. Cohen et al (2020) COVID-19 image data collection: prospective predictions are the future. arXiv:2006.11988 [Online]. Available: http://arxiv.org/abs/2006.11988
11. Wang et al (2017) ChestX-ray8: hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), Honolulu, HI, USA, Jul 2017, pp 3462–3471. https://doi.org/10.1109/CVPR.2017.369
12. Narin et al (2021) Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks. Pattern Anal Appl 24:1207–1220. https://doi.org/10.1007/s10044-021-00984-y
13. Born et al (2020) POCOVID-Net: automatic detection of COVID-19 from a new lung ultrasound imaging dataset (POCUS). arXiv:2004.12084 [Online]. Available: http://arxiv.org/abs/2004.12084

Posit Extended RISC-V Processor and Its Enhancement Using Data Type Casting Ashley Kurian and M. Ramesh Kini

Abstract The Posit extended RISC-V processor has gained attention as an alternative to its floating point counterpart. However, the Posit compliant RISC-V processor needs further enhancements to be accepted as a standard. In this paper, the shortcomings of existing Posit integration approaches are discussed, and a novel approach is put forth wherein Posit and floating point arithmetic are incorporated within the core. We also present a comparative study of various Posit integration approaches in terms of resource utilization and timing requirements. Furthermore, to enhance the RISC-V processor to support the concurrent usage of integer, floating point, and Posit arithmetic, a data type casting unit is incorporated. Two data type casting approaches are suggested and compared in terms of speed and the area occupied. Based on the implementation results, inferences are derived on the apt choice of data type casting approach to be undertaken. Keywords IEEE-754 · Posit · RISC-V · Arithmetic · Data type casting

1 Introduction Posit arithmetic is considered an alternative to the widely used IEEE-754 floating point standard owing to its variable dynamic range and precision. In addition, Posit eliminates not-a-numbers (NaNs) and mathematically incorrect multiple representations of zero, and avoids underflow and overflow through rounding. On the other hand, RISC-V, an open source Instruction Set Architecture (ISA), has been designed to support extensive customization. The Posit extended RISC-V core has received significant interest, and various approaches for Posit incorporation have been put forth. Jaiswal and So [1] implemented algorithmic flows and open source parameterised Verilog HDL for Posit arithmetic architectures pertaining to basic arithmetic operations. Tiwari et al. [2] provided insights on two separate approaches for Posit integration with a RISC-V core and inferred that co-existence of the floating


point and Posit arithmetic can be facilitated by leveraging the custom opcode space of the ISA. The existing approaches include integrating Posit arithmetic either as an execution unit or as an accelerator. However, demerits such as the inability to co-exist with the IEEE-754 standard in the execution unit approach and the increased cycles per instruction in the accelerator approach necessitate a novel method. Our proposed approach overcomes both shortcomings by incorporating the Posit unit within the pipeline along with the Floating Point Unit (FPU). Several literature works have proposed various techniques and methodologies to enhance the Posit extended RISC-V processor. Sarkar et al. [3] focused on performance analysis using a reconfigurable hardware accelerator of the Posit format for signal processing algorithms. Arunkumar et al. [4] implemented and integrated a Posit Processing Unit (PPU) into the rocket chip SoC generator. Hou et al. [5] researched Posit and floating point single precision IP cores and concluded that Posit exhibits better superiority in representation and dynamic range than the IEEE-754 standard. Cococcioni et al. [6] discussed the integration of a PPU as an alternative to the FPU for deep neural networks in automotive applications. Carmichael et al. [7] developed a multiply and accumulate algorithm, namely Deep Positron, for accelerating ultra-low precision (…)

P2 –> F Type; I5 –> F Type; F6 = (P2 –> F) + (I5 –> F)

Here, after converting the source operands to the float data type, the converted operands are routed to the FPU for addition. The result of the floating point addition is then written back to the destination register in the float register bank.

4.1.1 MOT Instruction Format

The MOT block handles only R-type instructions having two source and a destination register as operands. The MOT instruction format is presented in Fig. 11. The block gets activated using a custom opcode. The data type of the operand is identified by two bits, specifically ‘01’ for integer, ‘10’ for float and ‘11’ for Posit. From the instruction, the destination type (xd) and the two source types (xs1 and xs2) are {IR[31], IR[14]}, {IR[30], IR[13]} and {IR[29], IR[12]} respectively. Similarly, two source and the destination registers are denoted by IR[24:20], IR[19:15] and IR[11:7], respectively. The IR[28:25] field (funct4) specifies the type of the operation to be executed.
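As a software reference for the field layout just described, a sketch of decoding the MOT fields from a 32-bit instruction word is given below; it mirrors the bit positions above and is only an illustrative model, not the Verilog used in the core.

```python
def decode_mot(ir: int) -> dict:
    """Software reference decode of the R-type MOT instruction fields.
    Bit positions follow the format described above (Fig. 11)."""
    bit = lambda i: (ir >> i) & 1
    xd = (bit(31) << 1) | bit(14)     # destination type: 01 int, 10 float, 11 posit
    xs1 = (bit(30) << 1) | bit(13)    # source-1 type
    xs2 = (bit(29) << 1) | bit(12)    # source-2 type
    funct4 = (ir >> 25) & 0xF         # IR[28:25]: operation selector
    rs2 = (ir >> 20) & 0x1F           # IR[24:20]: source register 2
    rs1 = (ir >> 15) & 0x1F           # IR[19:15]: source register 1
    rd = (ir >> 7) & 0x1F             # IR[11:7]: destination register
    return dict(xd=xd, xs1=xs1, xs2=xs2, funct4=funct4, rd=rd, rs1=rs1, rs2=rs2)
```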

4.1.2 MOT Block Architecture

The architecture of the MOT block is depicted in Fig. 12. The MOT block receives the source operands from all three register banks, namely, Integer, Float, and Posit. The multiplexer at the initial stage selects the source operands depending on the values of xs1 and xs2. The second stage multiplexer checks whether the source and the destination data types are different and decides whether a conversion is required. The operands are then routed to the appropriate conversion block if needed. The converted operands of the destination type are then routed to the appropriate functional block (Integer ALU, Posit, or Float block) depending on the xd value.


Fig. 12 MOT block architecture

Fig. 13 MOT block integrated RISC-V core

4.1.3 Integration of MOT Block to the RISC-V Core

The MOT block integrated RV32IMF_XPosit is illustrated in Fig. 13. The separation of the register banks from their respective functional units is the major change in the execution stage of the MOT integrated core. The inputs to the Float, Posit, and Integer arithmetic blocks are routed from the MOT block in the case of a mixed operand type instruction; otherwise, they are directly fed from the respective register banks. If the MOT block is activated (Start_MOT is set), the ALU, Posit, and IEEE-754 blocks wait until Done_MOT is set. The execution stage of the RISC-V processor along with the MOT block is depicted in Fig. 14.


Fig. 14 Execution stage of MOT block integrated RISC-V core

Fig. 15 DTC instruction format

4.2 DTC: Data Type Converter Block Unlike the MOT block, the DTC block does not handle instructions involving arithmetic computation. It is solely designed for data type conversion from one type to the other. The DTC block converts the source operand to the destination type and writes the converted value back into the destination register bank.

4.2.1 DTC Instruction Format

The DTC block handles instructions with a single source and a destination as operands. The DTC block is also uniquely identified using a custom opcode. The DTC instruction format is presented in Fig. 15. From the instruction, the destination type (xd) and the source type (xs) can be obtained from IR[23:22] and IR[21:20], respectively.
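Analogously to the MOT decode sketch above, a software reference for the DTC fields might look as follows; the source and destination register field positions are assumptions based on the usual RISC-V layout, since only the xd and xs positions are specified above.

```python
def decode_dtc(ir: int) -> dict:
    """Software reference decode of the DTC instruction fields (Fig. 15).
    Type encoding assumed as before: 01 int, 10 float, 11 posit."""
    xd = (ir >> 22) & 0x3     # IR[23:22]: destination type
    xs = (ir >> 20) & 0x3     # IR[21:20]: source type
    rs1 = (ir >> 15) & 0x1F   # assumed source register field (IR[19:15])
    rd = (ir >> 7) & 0x1F     # assumed destination register field (IR[11:7])
    return dict(xd=xd, xs=xs, rs1=rs1, rd=rd)
```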


Fig. 16 DTC block architecture

Fig. 17 DTC block integrated RISC-V core

4.2.2 DTC Block Architecture

The architecture of the DTC block is depicted in Fig. 16. The initial multiplexer selects the source operand depending on the xs value, which is then routed to the appropriate conversion block if needed. The result obtained is written back to the destination register bank decided by xd.

4.2.3 Integration of DTC Block to the RISC-V Core

The DTC block integrated RV32IMF_XPosit is presented in Fig. 17. Similar to the MOT block, the DTC also demands the separation of the Posit and float register banks from their respective functional units. It is to be noted that the float and Posit converted result is written to their corresponding register banks at the execution stage itself, while the integer converted result is passed to the later stages and finally written back to the integer register bank during the write-back stage only. The execution stage of the DTC block integrated RISC-V core is as presented in Fig. 18.


Fig. 18 Execution stage of DTC block integrated RISC-V core

5 Implementation Results Verilog HDL at the RTL level was used to implement the proposed hardware. The code was implemented on a Xilinx ARTIX-7 FPGA (xc7a200tfbg676-1) using the Xilinx Vivado tool. The results include a comparison of the resource utilization and timing requirements of various Posit integration and data type casting approaches. Table 2 illustrates the clock cycles per instruction for various Posit extended RISC-V cores differing in the Posit integration method and co-existence with the IEEE-754 standard. It is found that Posit as an accelerator consumes two additional clock cycles compared to the proposed approach, wherein Posit arithmetic is integrated as a tightly coupled unit along with its floating point counterpart. Therefore, the novel approach overcomes both drawbacks of the existing approaches and can be considered the most effective method of Posit integration into the processor. Table 3 illustrates the resource utilization for the three approaches (3.1, 3.2, and 3.3) of Posit integration discussed in Sect. 3. It can be noted that approaches 3.2 and 3.3 of Posit integration utilize the same resources, which is slightly higher than approach 3.1, in which the F-extension is totally replaced by the Posit extension. With regard to the data type casting approaches discussed in Sect. 4, a performance analysis of the two methods is undertaken. The comparison of the MOT and the DTC block integrated Posit enhanced RISC-V cores in terms of area is presented in Table 4. It is found that both implementations occupy nearly the same area, with the MOT block occupying slightly less.


Table 2 Timing requirement analysis of various Posit integration approaches

Unit under consideration | Clock cycles/instruction
RV32I | 3
Floating point unit | 2
Floating point unit in RV32IMF_XPosit | 8
Posit unit in RV32IMF_XPosit | 10
Posit unit as an accelerator | 12

The last two entries in this table are in bold as the focus of the study is to compare the clock cycle requirements of the two designs we developed: Posit as an accelerator and Posit within the pipeline. The other entries are already existing designs.

Table 3 Resource utilization analysis of various Posit integration approaches

Posit integration | Co-existence with floating point arithmetic | LUT count | FF count
Tightly coupled | No | 3 | 5
Accelerator | Yes | 6 | 7
Tightly coupled | Yes | 6 | 7

Table 4 Resource utilization analysis of the type casting approaches

Unit under consideration | LUT | FF | DSP | BUFG
MOT | 4 | 7 | 4 | 1
DTC | 6 | 7 | 4 | 1

Table 5 Timing requirement analysis of the type casting approaches (in µs)

Operation | Destination data type | MOT | DTC
Addition | Float | 1.805 | 1.715
Addition | Posit | 2.805 | 1.814
Addition | Integer | 2 | 3.109
Multiplication | Float | 2.784 | 2.615
Multiplication | Posit | 2 | 1.815
Multiplication | Integer | 1.005 | 3.814
Division | Float | 2.8 | 2.11
Division | Posit | 2.4 | 2.215
Division | Integer | 2.8 | 3.8146


The timing requirements (in µs) of the data type casting approaches for basic arithmetic operations involving mixed operand types are as presented in Table 5. The table is a subset of all the possible arithmetic operations involving various data types and each instruction handles all the three data types. For instance, the first row of the table indicates that the MOT block can perform an arithmetic operation such as F6 = P2 + I5 in 1.805 µs. However, the DTC block takes only 1.715 µs for the same instruction. The time taken by the DTC block is the sum of the time taken for the conversion of all the source data types to that of the destination type and finally to carry out the arithmetic operation. It is inferred that the DTC block approach of data type casting is always preferred over the MOT method except when the destination data type is that of an integer.

6 Conclusion The proposed method of Posit integration into the RISC-V core is better in terms of speed compared to the existing accelerator approach. It also allows co-existence of floating point and Posit arithmetic, unlike when Posit is implemented as an execution unit within the core by replacing the FPU. Furthermore, the addition of features that complement compatibility is vital for the widespread acceptability of the Posit extended RISC-V core in various applications. To that end, a hardware unit to implement data type casting is incorporated into the processor. This complements the concurrent use of integer, floating point, and Posit arithmetic within the processor. The data type casting block allows the conversion of floating point operands into Posit, which in turn allows easy porting to the new arithmetic. Moreover, this feature eliminates the need for rewriting existing code that uses floating point arithmetic. In addition, this hardware innovation supports instructions for carrying out arithmetic operations involving mixed operand types. Different approaches for data type conversion are put forth and compared based on resource utilization and the timing requirements for a few mixed operand instructions. It is inferred that the selection of the data type casting approach depends on the destination data type of the operand. Among the two data type casting units proposed, the DTC block is preferred to the MOT block except for arithmetic instructions involving an integer destination operand, owing to the increased time the DTC takes for the integer data type. However, in either approach, the data type casting enhanced RISC-V processor occupies nearly the same area.

References
1. Jaiswal MK, So HK-H (2019) PACoGen: a hardware posit arithmetic core generator. IEEE Access 7:74586–74601


2. Tiwari S, Gala N, Rebeiro C, Kamakoti V (2019) PERI: a Posit enabled RISC-V core. arXiv:1908.01466v1 [cs.AR]
3. Sarkar S, Velayuthan PM, Gomony MD (2019) A reconfigurable architecture for posit arithmetic. In: 22nd Euromicro conference on digital system design (DSD), Kallithea, Greece, pp 82–87. https://doi.org/10.1109/DSD.2019.00022
4. Arunkumar MV, Bhairathi SG, Hayatnagarkar HG (2020) PERC: Posit enhanced rocket chip. CARRV, Valencia, Spain
5. Hou J, Zhu Y, Du S, Song S (2019) Enhancing accuracy and dynamic range of scientific data analytics by implementing Posit arithmetic on FPGA. J Sign Process Syst 91:1137–1148. https://doi.org/10.1007/s11265-018-1420-5
6. Cococcioni M, Ruffaldi E, Saponara S (2018) Exploiting posit arithmetic for deep neural networks in autonomous driving applications. In: International conference of electrical and electronic technologies for automotive, pp 1–6. https://doi.org/10.23919/EETA.2018.8493233
7. Carmichael Z, Langroudi HF, Khazanov C, Lillie J, Gustafson JL, Kudithipudi D (2019) Deep positron: a deep neural network using the posit number system. In: Design, automation & test in Europe conference & exhibition (DATE), pp 1421–1426. https://doi.org/10.23919/DATE.2019.8715262
8. Cococcioni M, Rossi F, Ruffaldi E, Saponara S (2021) Vectorizing Posit operations on RISC-V for faster deep neural networks: experiments and comparison with ARM SVE. Neural Comput Appl 33:10575–10585. https://doi.org/10.1007/s00521-021-05814-0
9. Klöwer M, Düben P, Palmer T (2019) Posit numbers as an alternative to floating point numbers for weather and climate models. In: CoNGA'19: proceedings of the conference for next generation arithmetic, Article No. 2, pp 1–8

Securing Microservice-Driven Applications Based on API Access Graphs Using Supervised Machine Learning Techniques B. Aditya Pai, Anirudh P. Hebbar, and Manoj M. V. Kumar

Abstract Distributed microservice-driven applications are composed of application programming interfaces (APIs), which are accessed by other microservices, apps, or even developers through API endpoints. Attackers often abuse API access to exploit these endpoints' business logic and computing resources. Current intrusion detection systems for large applications deploy entry-level classification to find malicious user requests. However, there is a possibility of attackers showing intrusive behavior after moving past the entry-level checks. In microservice-driven architectures, the API access patterns of normal users are observed to differ from those of attackers. These access patterns consist of APIs called in specific orders, reflecting an access graph that depicts user behavior in long-running sessions. This paper presents a mechanism where microservice-based applications are monitored holistically to extract these API call patterns and create a user behavior graph. These graphs are processed as embeddings through representational learning and analyzed to classify normal and malicious behavior using machine learning algorithms. This methodology has shown significant results in its efficiency and ability to generate insights in a real-time environment by subsuming the information used by entry-level intrusion detection systems together with user behavior patterns. Keywords Microservice · Access graphs · Node2Vec · Classification · User behavior

1 Introduction With the increasing complexity of business problems, software systems have evolved to accommodate scalability, continuous delivery, and domain-driven design. Developers extensively use the microservice architecture to create larger, more complex


Fig. 1 Microservice architecture

applications that are better developed and managed as a combination of smaller services that work in cohesion for more extensive, application-wide functionality. Figure 1 shows a generalized microservice-driven architecture. Moreover, technologies like the Internet of things (IoT) [1], edge computing [2], and blockchain [3] require service-oriented architectures to connect functionality on distributed computing systems. As a result, APIs are deployed to expose business logic and computing power through software systems that have numerous microservices, tantamount to many API calls. This creates an opportunity for attackers and malicious programs to misuse the access that the API endpoints provide to microservices and computing power [4]. While there exist methods to detect intrusion at the application entry-level [5], it is also important to monitor the access patterns in large distributed microservice-based applications for anomalous behavior. Methods like graph matching and search algorithms can be applied to compare the API access graphs to pre-defined normal and anomalous user behavior but do not scale well for large distributed applications with non-deterministic outcomes. We investigate the possibilities of detecting malicious behavior of users from API access graphs in large-scale microservice-based applications with low latency through supervised machine learning techniques. In this paper, we discuss a methodology for securing microservice architectures using supervised machine learning algorithms based on the access patterns of microservice APIs. Our work emphasizes using the graph data structure and representational learning to model the data, which creates an avenue to retrieve more information during runtime while accounting for fast inference times. This creates a new perspective to approach security in distributed environments. The upcoming sections of the paper are organized as follows. Section 2 discusses related work. Section 3 proposes the architecture which can be used to overcome the vulnerabilities. In the experiment section, we discuss individual components involved in the architecture and highlight their importance and inner workings. We


then showcase the results of the classification algorithms used for the problem statement. Finally, we provide the future direction with which this paper can be taken forward in other domains using more complex systems and conclude our proposal by summarizing the paper.

2 Related Work In the paper titled "API Security in Large Enterprises: Leveraging Machine Learning for Anomaly Detection," the authors discuss a machine learning-based approach to analyze API traffic by training models on features like bandwidth and the number of requests per token. A support vector machine (SVM) was used to achieve this. Their proposed method achieved an F1-Score of 0.964 with a 7.3% false-positive rate [6]. The research paper titled "Malware detection via API calls, topic models, and machine learning" proposed a model that uses the concepts of text mining and topic modeling to detect anomalies based on the types of API call sequences. The authors inferred that the decision tree is the go-to algorithm for this problem statement, as it can serve as an early warning system for experts thanks to its "if-else" rule structure [7]. The approach proposed in the paper titled "Analyzing API Sequences for Malware Monitoring Using Machine Learning" makes use of natural language processing (NLP) methods such as word embedding and latent Dirichlet allocation (LDA) to analyze the sequences of malicious API calls, thus helping to identify the important features of malicious programs [8]. The paper titled "Malware detection based on mining API calls" presented an approach to collect API call sets via portable executable files, which generates a set of features. These features are then used to train a classification model to detect unseen malware. The authors achieved an accuracy of 98.3% and a detection rate of 99.7%. Their approach improved on the state-of-the-art implementation by increasing the accuracy by 5.24% and the detection rate by 2.51%, while the false alarm rate was decreased from 19.86 to 1.51% [9]. According to a research paper titled "An Anomaly Detection Algorithm for Microservice Architecture Based on Robust Principal Component Analysis", the algorithm for mining causality and detecting the root cause has two parts: invocation chain anomaly analysis based on robust principal component analysis (RPCA) and a single indicator anomaly detection algorithm. The single indicator anomaly detection algorithm consists of the isolation forest (IF) algorithm, the one-class support vector machine (SVM) algorithm, the local outlier factor (LOF) algorithm, and the 3σ principle. Over the four batches of the data, their proposal had a score of 0.8304 out of 1 [10].


In the paper titled "A cyber risk-based moving target defense mechanism for microservice architectures", the authors aim to tackle the problem of shared vulnerabilities in a microservice architecture. Based on the concept of Moving Target Defenses (MTD), their mechanism runs a risk analysis of a microservice architecture to identify and rank the vulnerabilities. A security risk-oriented software diversification is employed, led by a defined diversification index. The microservices' attack surfaces are altered at runtime through this diversification, which leverages both model- and template-based automatic code generation techniques, thus altering the programming languages and images in the microservices. This introduces uncertainty for attackers, which reduces the vulnerability of the microservices. With this, they achieved an average attack surface randomization success rate of over 70% [11]. Our literature survey has covered a variety of existing security mechanisms involving rule-based security systems, policy compliance frameworks, and machine learning. This, coupled with the study of microservice architectures, has shown scope for leveraging API access patterns to track user behavior. It consequently opens avenues to explore the possibility of using this information to detect access patterns that attempt malicious usage of the functionality and computing resources these applications provide. Moreover, representing this information as a graph maintains the relationships and chronology in access patterns, motivating our approach to this as a supervised machine learning problem on graph representations.

3 Proposed System Distributed software systems are composed of numerous loosely coupled microservices, each with specific functionality and business logic. We propose an additional module to the microservice-based architecture to retrieve and model user behavior, as shown in Fig. 2. When the user sends a request through an API, the access graph of microservices involved in serving the request is created for each user session. In addition to the business logic, each microservice executes a small module to append its identifier to a JSON file for each user session. At the end of a user session, this JSON file consists of an ordered list of microservice identifiers: the user session's API access pattern. Over time, the JSON file for multiple user sessions consists of user session ids and the access graph, where the "from id" and "to id" represent the source and destination nodes, respectively, as shown in the figure. We propose a system that interprets these access graphs as multidimensional vectors, thereby approaching the detection of logic attacks from malicious access graphs as a downstream supervised machine learning task. The outcome of this access graph classification as usual or malicious is actionable insight that can be used to avoid contingencies by ending the user session before the client request is served.
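A minimal sketch of the per-microservice logging module described above is given below; the service identifier and log location are illustrative assumptions, not our exact implementation.

```python
import json
import os

SERVICE_ID = "orders-service"    # hypothetical identifier of this microservice

def log_access(session_id: str, log_dir: str = "access_logs") -> None:
    """Append this microservice's identifier to the session's JSON access log.
    Consecutive entries form the edges of the session's API access graph."""
    os.makedirs(log_dir, exist_ok=True)
    path = os.path.join(log_dir, f"{session_id}.json")
    record = {"session_id": session_id, "calls": []}
    if os.path.exists(path):
        with open(path) as f:
            record = json.load(f)
    record["calls"].append(SERVICE_ID)   # ordered list of visited microservices
    with open(path, "w") as f:
        json.dump(record, f)
```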


Fig. 2 Architecture for the proposed system

Fig. 3 Flowchart for the experiment

4 Experiment This section discusses the setup used for testing the proposed system. We emphasize the API access pattern classification using supervised machine learning in a runtime scenario for a real-world application with a microservice architecture, as shown in Fig. 3.


Fig. 4 JSON depicting microservice access order through API calls

4.1 Dataset The open-source dataset used in this experiment consists of around 1700 user sessions of API access data as JSON objects. It is a relational dataset that is manually labeled based on observed behavior metrics such as session duration, number of unique API calls made, inter-microservice API call duration, and the number of users accessing the application with a different session, all captured from the production environments of two diverse distributed microservice-driven applications. Figure 4 shows sample JSON data for a single record.

4.2 The Graph A graph G(V, E) is a data structure represented by vertices (or nodes) V and edges E. In our interpretation, each microservice represents a node, and the API calls between microservices are the edges between nodes, as shown in Fig. 5. This representation also accommodates behavioral metrics, such as the number of unique API calls (the number of edges in the graph) and the number of microservices involved (the number of nodes in the graph).
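A short sketch of turning one session's edge list into such a graph, using the networkx library and the "from id"/"to id" fields of Fig. 4, is shown below.

```python
import networkx as nx

def session_to_graph(edges):
    """Build a directed access graph from a session's edge list, where each
    edge is a dict with 'from id' and 'to id' keys as in the session JSON."""
    g = nx.DiGraph()
    for e in edges:
        g.add_edge(e["from id"], e["to id"])
    return g

g = session_to_graph([{"from id": 1, "to id": 2}, {"from id": 2, "to id": 5}])
print(g.number_of_nodes(), g.number_of_edges())   # behavioral metrics: 3 2
```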

4.3 node2vec The node2vec algorithmic framework is used to represent the nodes in a graph as embeddings or n-dimensional vectors [12] (refer Eq. 1):

$$\text{node2vec}(G(V, E, W)) \longrightarrow \mathbb{R}^n \quad (1)$$


Fig. 5 API access graphs from 3 user sessions

We combine the individual access graphs of all 1700 user sessions to form a master graph for the sampling strategy used by the node2vec algorithm to create a "corpus." This sampling strategy simulates biased random walks in the neighborhood of each node in this master graph, while balancing the exploration-exploitation trade-off, to create a set of directed acyclic graphs. It requires four arguments:
• Number of walks: number of random walks to be generated from each node in the graph
• Walk length: number of nodes in each random walk
• P: return hyperparameter; controls the probability of returning to the source node
• Q: in/out hyperparameter; controls the probability of exploring undiscovered parts of the graph.
These sequences are input for a skip-gram negative sampling model that generates input sequence and context node pairs over a given context window, which are fed into a shallow two-layer neural network. The weights of the hidden layer in this trained neural network are the node embeddings, where the number of neurons is the size of the embedding. In this experiment, we train the model to generate node embeddings of size 128 through 100 random walks per node, where the length of each walk is 100 nodes. The node embeddings created for a given graph in a user session are summed up to form the embedding for the graph.
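The pipeline above can be reproduced with the open-source node2vec package roughly as follows; the window size, p/q values, and toy session graphs are illustrative assumptions, since the exact values are not reported here.

```python
import numpy as np
import networkx as nx
from node2vec import Node2Vec  # pip install node2vec

# Per-session access graphs built as in the previous sketch; two toy sessions here.
session_graphs = [nx.DiGraph([(1, 2), (2, 5)]), nx.DiGraph([(1, 3), (3, 5)])]
master = nx.compose_all(session_graphs)      # union of all per-session graphs

n2v = Node2Vec(master, dimensions=128, walk_length=100, num_walks=100,
               p=1, q=1, workers=4)          # biased random walk sampling
model = n2v.fit(window=10)                   # skip-gram with negative sampling

def graph_embedding(g):
    """Sum the node embeddings of a session graph to form its graph vector."""
    return np.sum([model.wv[str(n)] for n in g.nodes], axis=0)

X = np.stack([graph_embedding(g) for g in session_graphs])
```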

4.4 Classification Algorithms This section discusses the various classification algorithms tested on the embedded graph vectors, ranging from logistic regression, the baseline model for our problem statement, to more powerful algorithms like XGBoost. We then compare the performance of these algorithms based on various classification metrics such as accuracy, recall, F1-Score, and precision, and other attributes such as speed of execution. Logistic Regression: It is the simplest classification algorithm and can be used to solve both binary and multi-class classification problems. We utilized this algorithm


as our benchmark model, thus allowing us to implement more complex algorithms and test their performance on the dataset [13]. Mathematically speaking, logistic regression is given by (refer Eq. 2)

$$y_i = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} \quad (2)$$

where $y_i$ is the predicted value, $\beta_0$ is the intercept term, $\beta_1$ is the coefficient of the input feature, and $X$ is the input feature.
Support Vector Classifier: It is another classification algorithm whose working principle is to find an optimal hyperplane that maximizes the margin between the two classes. The hyperplane is a decision boundary in an n-dimensional space. Support vectors are the data points that are closest to this hyperplane. The margin can be defined as the gap between the two lines through the closest data points of the two classes. The support vector classifier generates hyperplanes iteratively and finds the best hyperplane to separate the two classes accurately [14]. Mathematically, the support vector classifier can be described by (refer Eq. 3)

$$w^T x + b = 0 \quad (3)$$

where $w$ is the weight vector, $x$ is the input variable, and $b$ is the bias. The margin is calculated by (refer Eq. 4)

$$\text{margin} = (x_+ - x_-) \cdot \frac{w}{\|w\|} \quad (4)$$

where $x_+$ and $x_-$ represent the positive and negative classes, respectively, and $w$ represents the weight vector. The objective that maximizes the margin is given by (refer Eq. 5)

$$L(w) = \sum_i \max\left(0,\; 1 - y_i \left[w^T x_i + b\right]\right) + \lambda \|w\|^2 \quad (5)$$

The first term is known as the "hinge loss," which penalizes the misclassifications made by the model. The second term is the "regularization" term, a technique to avoid overfitting. Random Forest: It is a robust ensemble algorithm used as both a classifier and a regressor. At its core, the base algorithm is the decision tree, with many trees trained simultaneously. Random forest uses the bagging technique, where row and column sampling is performed on the dataset and fed to each decision tree. "Aggregation" is then performed, where the majority vote of the predicted classes from these trees is taken as the final prediction for classification [15]. XGBoost: XGBoost, at its core, is an ensemble learning algorithm that combines the results of many models (decision trees), called the base learners, to make a prediction.


Weights are assigned to the input features, which are then fed to the decision tree models. The weights of the variables that are mispredicted are increased, and they are then fed to the next decision tree, thus learning from the previous model's mistakes. Combining these results provides one precise and robust model [16]. Mathematically speaking, XGBoost is given by Eq. 6

$$\hat{Y}_i = \sum_{k=0}^{K-1} f_k(x_i), \quad f_k \in F \quad (6)$$

where $K$ is the number of trees, $f_k$ is a function in the functional space $F$, and $F$ is the set of all possible decision trees. The objective function for the above model is given by (refer Eq. 7)

$$\mathrm{Obj} = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=0}^{K-1} \Omega(f_k) \quad (7)$$

The first term is the loss function used to calculate the actual and predicted value difference. The second parameter is the regularization parameter which acts as a penalizing function to prevent overfitting.
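Assuming the 1700 graph embeddings and their labels are collected in arrays X and y (the synthetic placeholders below stand in for them), the comparison carried out in this section could be sketched with scikit-learn and xgboost as follows.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from xgboost import XGBClassifier

X = np.random.rand(1700, 128)           # placeholder: graph embeddings (Sect. 4.3)
y = np.random.randint(0, 2, 1700)       # placeholder: normal (0) / malicious (1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "support vector classifier": SVC(),
    "random forest classifier": RandomForestClassifier(n_estimators=100),
    "xgboost": XGBClassifier(),
}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(name)
    print(classification_report(y_test, clf.predict(X_test)))
```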

5 Results and Discussion The results obtained were very encouraging; the machine learning models performed well in both speed and accuracy for the given data. While the baseline model, i.e., logistic regression, performed well in speed, its accuracy was only around 80%. The random forest model outperformed all the other tested models in terms of accuracy, whereas the XGBoost model excelled in speed of execution. A confusion matrix was plotted to evaluate the model performance (Fig. 6). Using suitable formulas, we can derive various classification metrics such as accuracy, precision, recall, and F1-Score. The results obtained from the confusion matrix are displayed in the table below (Table 1). It was observed that ensemble machine learning algorithms performed well for this application. The graph-based approach proposed by our work provides a new direction in securing microservice architectures. Compared to existing state-of-the-art methods that require explicit metadata, our approach consistently performs above the benchmark while accommodating implicit data in the graph data structure. Moreover, the proposed system also creates a pathway to model the user's behavioral attributes, an advantage over the existing methods.


Fig. 6 Confusion matrix obtained from the classification models used on the dataset

Table 1 Results for trained machine learning models

Algorithm | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | Execution speed (in seconds)
Logistic regression | 79 | 78.5 | 75.5 | 77 | 0.00797
Support vector classifier | 93 | 94 | 91.5 | 92.5 | 0.03039
Random forest classifier | 96 | 96.5 | 96 | 96 | 0.02998
XGBoost | 94 | 94 | 93 | 93.5 | 0.00760


6 Future Research Directions This section discusses the implications of our access graph-based pattern recognition approach and recommends directions for future work. From a methodological standpoint, in situations where unlabeled graph data is retrieved, access graph clusters can be formed through unsupervised learning methods and active learning. We see scope for node classification and edge prediction methods in finding impending attacks and bottlenecks in the distributed environments, which form the immediate future direction for our work. This methodology is transferable to other use cases involving security, such as analysis of user behavior patterns for problems such as fraudulent transaction detection in finance applications [17] and finding bots on social media (or Internet). Domains beyond security also have an immense scope, for example, analysis of user behavior for testing UI/UX of applications [18]. Moreover, this approach can be extended to analyze the communication patterns in other systems that leverage distributed environments, such as decentralized applications (DApps) [19], edge computing, and Internet of things (IoT).

7 Conclusion This work explores a graph-based supervised machine learning approach to detect intrusive behavior in distributed microservice-driven applications. This research evaluated multiple supervised machine learning algorithms by representing the microservice access patterns as a graph. As seen in the results, the ability to consider the access patterns of the user in addition to the usual features from the user request metadata, all at fast inference times, places this approach at an exciting position with respect to existing methods.

References
1. Lu D, Huang D, Walenstein A, Medhi D (2017) A secure microservice framework for IoT. In: 2017 IEEE symposium on service-oriented system engineering (SOSE), pp 9–18
2. Qu Q, Xu R, Nikouei SY, Chen Y (2020) An experimental study on microservices based edge computing platforms. In: IEEE INFOCOM 2020—IEEE conference on computer communications workshops (INFOCOM WKSHPS), pp 836–841
3. Driss M, Hasan D, Boulila W, Ahmad J (2021) Microservices in IoT security: current solutions, research challenges, and future directions
4. Abdelhakim H, Salima Y (2021) Securing microservices and microservice architectures: a systematic mapping study. Comput Sci Rev 41:100415
5. Pereira-Vale A, Fernandez Eduardo B, Monge R, Astudillo H, Márquez G (2021) Security in microservice-based systems: a multivocal literature review. Comput Secur 103:102200
6. Baye G, Hussain F, Oracevic A, Hussain R, Kazmi SA (2021) API security in large enterprises: leveraging machine learning for anomaly detection. In: 2021 international symposium on networks, computers and communications (ISNCC), pp 1–6


7. Sami A, Yadegari B, Rahimi H, Peiravian N, Hashemi S, Hamze A (2010) Malware detection based on mining API calls. In: Proceedings of the 2010 ACM symposium on applied computing, SAC '10, New York, NY, USA, 2010. Association for Computing Machinery, pp 1020–1025
8. Voronin V, Morozov A (2021) Analyzing API sequences for malware monitoring using machine learning. In: 2021 3rd international conference on control systems, mathematical modeling, automation and energy efficiency (SUMMA), pp 519–522
9. Shi Y, Sagduyu YE, Davaslioglu K, Li JH (2018) Active deep learning attacks under strict rate limitations for online API calls. In: 2018 IEEE international symposium on technologies for homeland security (HST), pp 1–6
10. Jin M, Lv A, Zhu Y, Wen Z, Zhong Y, Zhao Z, Wu J, Li H, He H, Chen F (2020) An anomaly detection algorithm for microservice architecture based on robust principal component analysis. IEEE Access 8:226397–226408
11. Torkura KA, Sukmana MI, Kayem AV, Cheng F, Meinel C (2018) A cyber risk based moving target defense mechanism for microservice architectures. In: 2018 IEEE international conference on parallel distributed processing with applications, ubiquitous computing communications, big data cloud computing, social computing networking, sustainable computing communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom), pp 932–939
12. Grover A, Leskovec J (2016) node2vec: scalable feature learning for networks. CoRR, abs/1607.00653
13. Mao Q, Wang L, Goodison S, Sun Y (2015) Dimensionality reduction via graph structure learning. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, KDD '15, New York, NY, USA, 2015. Association for Computing Machinery, pp 765–774
14. Cramer JS (2002) The origins of logistic regression. Econom eJournal
15. Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the fifth annual workshop on computational learning theory, COLT '92, New York, NY, USA, 1992. Association for Computing Machinery, pp 144–152
16. Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, KDD '16, New York, NY, USA, 2016. Association for Computing Machinery, pp 785–794
17. Bird J (2017) DevOps for finance. O'Reilly Media
18. Steinegger RH, Giessler P, Hippchen B, Abeck S (2017) Overview of a domain-driven design approach to build microservice-based applications. In: The third international conference on advances and trends in software engineering
19. Nikouei SY, Xu R, Chen Y, Aved A, Blasch E (2019) Decentralized smart surveillance through microservices platform. In: Sensors and systems for space applications XII, vol 11017. International Society for Optics and Photonics, p 110170K

Scaling and Cutout Data Augmentation for Cardiac Segmentation

Elizar Elizar, Mohd Asyraf Zulkifley, and Rusdha Muharar

Abstract Convolutional neural network (CNN) has a compelling learning capability, especially for spatial data representation, which is crucial in dealing with complex learning tasks. However, it requires extensive training data to fit the model optimally, making it susceptible to overfitting when data is scarce and thus limiting its generalization ability. Therefore, it is essential to collect enough data or supplement the dataset with artificial data to improve the performance of the CNN model. In this paper, simple data augmentation through the geometric transformations of scaling and cutout is explored to augment the training dataset for cardiac segmentation. The generated images and labels are combined with the original dataset to double the training data size. Three state-of-the-art semantic segmentation models, U-Net, TernausNet, and DabNet, were used to validate the performance improvement of the proposed data augmentation method. The best performance improvement is returned by DabNet, with increments of 0.24% and 5.14% for mean accuracy and mean intersection over union, respectively. Hence, better segmentation performance will enable medical practitioners to localize the organs effectively and efficiently.

Keywords Semantic segmentation · Data augmentation · Cardiac analysis · Magnetic resonance imaging

E. Elizar · M. A. Zulkifley (B)
Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
e-mail: [email protected]
E. Elizar · R. Muharar
Universitas Syiah Kuala, Kopelma Darussalam, 23111 Banda Aceh, Aceh, Indonesia
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_42

1 Introduction

For medical imaging diagnostic applications, a reliable image segmentation algorithm is a crucial module in helping medical practitioners. An automated semantic segmentation algorithm can reduce doctors' workload and provide them with better imaging diagnostic abilities. The significant performance potential of the convolutional neural network (CNN) has made it the most utilized deep learning tool for various automated medical analysis tasks, including disease classification, organ detection, and lesion segmentation. In general, a CNN utilizes a set of parameterized, sparsely connected kernels to preserve the spatial characteristics of images. When restricted to a subset of an image, the kernel processing of local descriptors in a CNN layer typically involves statistics of image gradients. The network structure can then be fine-tuned to produce feature maps at various scales and locations, pooled repeatedly, normalized at various layers, and quantized at the output through the activation function. This flow is repeated hierarchically several times, with the weights updated to fit the targeted task. In other words, a CNN learns a hierarchical feature representation of the visual data; hence, it is imperative to supply the model with large-scale training data [1].

Most computer vision applications have implemented deep learning techniques for various tasks in preference to conventional machine learning. The advancement of deep learning is also supported by the introduction of parallel computation, which provides high flexibility for intensive calculations and inspires many researchers to fit the millions of parameters of deep learning models effectively. It is important to stress that collecting a larger-scale dataset for visual training may greatly benefit any medical vision task. In some circumstances, however, the dataset is limited due to clinical restrictions, even though computational power increases every year (newer, better, and faster graphics processing units). This dataset limitation wastes the potential of deep learning to improve system performance. Furthermore, large-scale data can overpower the effect of noise in the label space [2]. This development is also supported by the availability of many public medical datasets for the same problem.

Furthermore, to overcome the overfitting issue, it is essential to improve the generalization ability of deep learning models, which is one of the most challenging tasks. Good generalization ability can be summarized as the ability of the model to perform well even when evaluated on previously unseen data; models with poor generalization tend to produce good results on training data but not on testing data. Data augmentation is one of the promising methods for enlarging the training data so that the overfitting problem can be minimized. Under the assumption that the original dataset contains more information that can be extracted through the augmentation process, the artificially created data can reduce the likelihood of overfitting by supplementing the training dataset. The augmented data creates a dataset with a more comprehensive set of possible data points, thus bridging the distance gap between the training and testing datasets [3].

2 Related Work

Shorten et al. [3] surveyed data augmentation techniques commonly used in deep learning applications, such as geometric transformations, random erasing, and generative adversarial networks. These techniques aim to increase the variation of the existing training data without additional data collection or manual annotation of new data. They showed that data augmentation can improve the performance of a deep learning model by expanding the dataset in cases where data is limited, in line with the advantages of big data. Synthetically generated data is used in [4] to increase the amount of data following the emergence of new diseases: the medical community takes a while to collect data since a disease is relatively new, and new variants are continuously observed, making data collection challenging. Data augmentation can also be implemented by increasing the information variation in the images, manipulating them in several ways such as flipping, resizing, and random cropping [2, 5, 6]. LeCun et al. [7] applied geometric transformations, including horizontal and vertical translations, scaling, squeezing, and horizontal shearing, to improve the performance of their proposed model, LeNet5. The same intention was implemented by Krizhevsky et al. [8] to improve AlexNet by applying image mirroring, cropping, and random adjustments of color and brightness values determined from a component analysis of the original dataset. The data augmentation model in [9] utilizes a combination of photometric and geometric image transformations to simulate realistic, complex imaging variations for magnetic resonance imaging (MRI) processing. All the previously mentioned works have shown promising improvements in model performance through either simple or complex data augmentation techniques.

3 Methods

3.1 Overview

Like most CNN-based segmentation networks, the base models used in our experiments follow encoder–decoder pair structures, in both symmetrical and asymmetrical architectures. The CNN encoder's function is to detect higher-level feature representations of the objects in the dataset, which in our case are the left ventricular cavity (LV), the left ventricular myocardium (MYO), and the right ventricular cavity (RV) in the input image. Subsequently, the CNN decoder takes this information and enriches it with information from the lower layers, passed directly and indirectly from the encoder, to produce label predictions for each pixel of the original input image.

3.2 CNN Network Structure

The comparison method is independent of the network structure. We adopted three commonly used CNN network structures as follows:

1. U-Net [10]: a fully convolutional network with 18 convolutional layers. As shown in Fig. 1, the U-Net architecture consists of two parts: the first part, the encoder side, compresses the input into latent variables, while the second part, the decoder side, reconstructs the latent variables into pixel-level labels.

2. TernausNet [11]: an improvement of the original U-Net architecture with a pre-trained encoder module. As shown in Fig. 2, this architecture uses fine-tuned weights to initialize the encoder side, which can be considered transfer learning. A pre-trained network or transferred weights can reduce the training time if the starting point happens to be close to the new problem and may also prevent overfitting.

3. Depth-wise Asymmetric Bottleneck (DABNet) [12]: this network structure was inspired by the bottleneck design in ResNet and the factorized convolution in ERFNet, with both designs integrated into one model called the depth-wise asymmetric bottleneck (DAB) module. As shown in Fig. 3, each bottleneck path reduces the number of channels by half, which is then restored to the original number of channels by a point-wise convolution, using an n × 1 depth-wise convolution followed by a 1 × n depth-wise convolution in both branches of the DAB module. The network adopts dilated convolution only on its second branch, creating a large receptive field that extracts broader context information without decreasing the resolution of the feature maps, thereby reducing the computational cost.

Fig. 1 U-Net architecture consists of an encoder module that captures semantic information and a decoder module that enables precise localization information (modified from [10])


Fig. 2 TernausNet architecture, a modification from the U-Net structure with the use of a pre-trained encoder module during the initialization phase (adapted from [11])

Fig. 3 DabNet architecture with the use of several depth-wise asymmetric bottleneck (DAB) modules (adapted from [12])
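As a concrete illustration of the DAB idea, a minimal Keras-style sketch of such a block is given below. The channel reduction, the dilated second branch, and the residual connection follow the description above, but the exact arrangement, dilation rate, and channel counts are assumptions rather than the published implementation.

```python
from tensorflow.keras import layers

def dab_module(x, dilation=2, k=3):
    """Sketch of a depth-wise asymmetric bottleneck (DAB) style block."""
    n_ch = x.shape[-1]
    # Reduce the number of channels to half inside the bottleneck
    y = layers.Conv2D(n_ch // 2, 3, padding="same", activation="relu")(x)
    # Branch 1: factorized k x 1 followed by 1 x k depth-wise convolutions
    b1 = layers.DepthwiseConv2D((k, 1), padding="same", activation="relu")(y)
    b1 = layers.DepthwiseConv2D((1, k), padding="same", activation="relu")(b1)
    # Branch 2: the same factorization, but dilated for a larger receptive field
    b2 = layers.DepthwiseConv2D((k, 1), padding="same",
                                dilation_rate=(dilation, 1), activation="relu")(y)
    b2 = layers.DepthwiseConv2D((1, k), padding="same",
                                dilation_rate=(1, dilation), activation="relu")(b2)
    # Point-wise convolution restores the original channel count
    y = layers.Conv2D(n_ch, 1, padding="same")(layers.Add()([b1, b2]))
    return layers.Add()([x, y])  # residual connection, as in ResNet bottlenecks

# Example usage: inp = layers.Input((224, 224, 64)); out = dab_module(inp)
```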

3.3 Data Augmentation

In this work, we investigate the effect of a simple data augmentation technique on the performance of cardiac segmentation in magnetic resonance imaging (MRI). Data augmentation can be defined as training the model with slightly modified data that is added to the whole training dataset as different examples [13]. The additional data can be taken from the neighborhood distribution of the training data to extend the support of the training-data distribution. All augmented data is added using an oversampling strategy rather than an undersampling strategy, which has been proven to work better [14]. The proposed method uses two simple steps to create augmented data by providing a variety of image contrasts to the training images, thereby sharpening the network's focus on the geometric features of the segmentation target. Algorithm 1 details the steps of the proposed simple data augmentation method. The augmentation process uses geometric image transformation functions built to reflect common data variations in magnetic resonance images. The transformation model $T_{geo}$ simulates the spatial variance arising from the dataset acquisition process (e.g., adjustments of image resolution and field-of-view). The image transformation function can be modeled as

$$x_{geo} = T_{geo}(x; a) = T_{geo}(x; s_x, s_y) \quad (1)$$

where $a$ contains two affine parameters $[s_x, s_y]$ that characterize the scaling operation $S$ performed during image transformation (see Fig. 4).

Fig. 4 Result visualization of the data augmentation functions by scaling and cutout from the original image

Algorithm 1. Simple Scaling and Cutout Data Augmentation
Input: a batch of images x from the labeled training set D_L ∪ D_U, a segmentation network f_θ
for i = 0, …, k−1 do
  Equalize the image size according to U-Net processing standards
end for
for i = 0, …, k−1 do
  Increase the image size with a ratio of 1:1.125
  Cut out the image to fit the U-Net processing standard size
  Apply the augmentation rule to each respective training image
  Combine the augmented images with the original training images from steps 2–4
end for
Compute the performance of the image segmentation using Eq. 2(b)

For a 2D image $x$ with affine parameter $a$, a pixel $x(u, v)$ at position $(u, v)$ in the original image is transformed to a new position $(u', v')$ via the following matrix multiplication:

$$\begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix} = S \cdot \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \quad (2)$$

$$S = \begin{bmatrix} 1 + s_x & 0 & 0 \\ 0 & 1 + s_y & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad (3)$$

We use a normalized 2D Cartesian coordinate system centered at $(x, y) = (0, 0)$ to specify each pixel's location $(u, v)$. Each transformation parameter is restricted to a user-defined range to control the extent of the spatial transformations:

$$-\epsilon_{a_i} \le a_i \le \epsilon_{a_i}, \quad \forall a_i \in \{s_x, s_y\} \quad (4)$$

We also assume that each pixel location is normalized by the input spatial dimensions, so its value lies in $[-1, 1]$: $-1 \le u \le 1, \; -1 \le v \le 1$. The geometric transformation $T_{geo}$ can transform the perturbed prediction image back to the coordinates of the original image before computing the loss function. The regularization loss is defined as

$$R_D(x; f_\theta, T) = D\big(f_\theta(x), \, T_t^{-1}(f_\theta(T(x; t)))\big) \quad (5)$$

where $T_t^{-1}$ denotes the inverse transformation of $T(\cdot; t)$, which can be defined as

$$T_{geo}^{-1} = S^{-1} \quad (6)$$
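To illustrate Algorithm 1, a minimal Python/OpenCV sketch of the scaling-and-cutout step follows. The 1:1.125 upscaling ratio comes from Algorithm 1; the random crop position and the 224-pixel target size are our reading of the "U-Net processing standard" and may differ from the authors' implementation.

```python
import numpy as np
import cv2

def scale_and_cutout(image, label, scale=1.125, out_size=224, seed=None):
    """Upscale an MRI slice and its label by the same factor, then cut out a crop."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    new_w, new_h = int(w * scale), int(h * scale)
    img_big = cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
    # Nearest-neighbour keeps the label map valid (no interpolated class values)
    lbl_big = cv2.resize(label, (new_w, new_h), interpolation=cv2.INTER_NEAREST)
    # Cut the pair back to the standard network input size
    y0 = rng.integers(0, new_h - out_size + 1)
    x0 = rng.integers(0, new_w - out_size + 1)
    return (img_big[y0:y0 + out_size, x0:x0 + out_size],
            lbl_big[y0:y0 + out_size, x0:x0 + out_size])
```

Applying the same geometric transformation to both the image and its ground-truth mask keeps the pixel-wise labels aligned, which is what allows the augmented pairs to be appended directly to the training set.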

4 Experiment Result and Discussion

4.1 Dataset

The cardiac MRI dataset is provided by the Automated Cardiac Diagnosis Challenge (ACDC) [14], an open public dataset for cardiac MRI segmentation. The ground-truth images were provided by medical experts, who manually labeled each frame to locate the left ventricular cavity (LV), the left ventricular myocardium (MYO), and the right ventricular cavity (RV), for both end-diastolic and end-systolic frames. Each image has an original in-plane pixel spacing ranging from 1.37 × 1.37 to 1.68 × 1.68 mm². All images were centrally sized to 224 × 224 pixels to conform to the U-Net standard image size requirement. We organized the data into a train-test division by splitting the original images (100 subjects) into two subsets: 75 subjects for the training set and 25 subjects for the testing set. The weight initialization was set to random, and the mean performance on the test image dataset is used to report the model performance.

4.2 Experimental Setup

Training Configuration: All experiments were computed on an Nvidia® GeForce® Titan RTX using the TensorFlow–Keras library. The network was trained with a batch size of 16 images per iteration for at least 100 epochs, with the weights randomly initialized without any pre-trained parameters. The chosen learning rate is 0.01, and the chosen loss function is categorical cross-entropy. All tested CNN models were first trained on the original image dataset before the augmented-data test was performed. Then, the original training dataset was combined with the augmented data as in Algorithm 1 to train the same set of CNN models with the same configuration as in the first experiments.

Performance Metrics: We use two standard segmentation metrics to evaluate the augmentation performance across all three CNN network structures:

• Pixel-wise mean accuracy (Acc), which takes the total of true positive and true negative pixels over the total number of tested pixels $N_{Total}$ [15]:

$$\text{Accuracy} = \frac{Tr_{+ve} + Tr_{-ve}}{N_{Total}} \quad (7)$$

where $I_i$ represents a pixel at location $i$, $Tr_{+ve}$ (true positive) counts pixels of class $i$ correctly predicted as class $i$, and $Tr_{-ve}$ (true negative) counts pixels that are not of class $i$ and are labeled as classes other than $i$.

• Class-based mean intersection over union (IoU), which averages, over each class, the overlap between prediction and ground truth divided by the union of the predicted outputs and the ground-truth labels [16]:

$$\text{IoU} = \frac{\sum_{\forall i}(GT_i == \alpha \,\cap\, PL_i == \alpha)}{\sum_{\forall i}(GT_i == \alpha \,\cup\, PL_i == \alpha)} \quad (8)$$

where $GT_i$ is the ground-truth class of pixel $i$, $PL_i$ is the predicted label, and $\alpha$ is the class label.
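Both metrics can be computed directly from the label maps. The small NumPy sketch below follows Eqs. (7) and (8), treating Eq. (7) as the overall fraction of correctly classified pixels.

```python
import numpy as np

def pixel_accuracy(gt, pred):
    # Eq. (7): correctly classified pixels over the total number of pixels
    return float(np.mean(gt == pred))

def mean_iou(gt, pred, n_classes):
    # Eq. (8): per-class intersection over union, averaged over the classes
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(gt == c, pred == c).sum()
        union = np.logical_or(gt == c, pred == c).sum()
        if union > 0:              # ignore classes absent from both masks
            ious.append(inter / union)
    return float(np.mean(ious))
```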

Table 1 shows the performance of the three CNN networks without and with data augmentation. U-Net has the best mean accuracy in both tests, with or without data augmentation. If the performance improvement rate is considered, however, DabNet adapts best to data augmentation, with the highest mean-accuracy increment of 0.24%. Comparing mean IoU performance across the three networks, U-Net again produced the best results both without and with augmented data, whereas DabNet improves the most, with a performance increase of 5.14%, indicating a structural advantage in adapting to augmented data. In line with the results in Table 1, we examine visual examples of the segmentation results in the two figures below. As shown in Fig. 5, especially in the upper row of images, U-Net and TernausNet provide excellent segmentation results; the performance differences between classes are minor, and all classes are well represented. In contrast, the DabNet result is poor, with one of the class labels not detected correctly. In the lower row of Fig. 5, the images show better contrast, resulting in better segmentation performance except for DabNet, where some class labels look overfitted. Figure 6 shows additional segmentation outputs using data augmentation. Upon closer inspection of the two input images, DabNet's segmentation results improve significantly, with each class label better represented than in the results without data augmentation shown previously.

Table 1 Performance results of cardiac segmentation without and with data augmentation

| Method | Acc (original) | Acc (data augment) | Increase (%) | IoU (original) | IoU (data augment) | Increase (%) | Parameters |
|---|---|---|---|---|---|---|---|
| U-Net [10] | 0.9852 | 0.9859 | 0.07 | 0.7159 | 0.7257 | 1.37 | 31,033,988 |
| TernausNet [11] | 0.9847 | 0.9853 | 0.06 | 0.7147 | 0.7150 | 0.04 | 22,927,292 |
| DabNet [12] | 0.9738 | 0.9761 | 0.24 | 0.5504 | 0.5787 | 5.14 | 11,950,404 |

Fig. 5 Output sample of segmentation without data augmentation

Fig. 6 Output sample of segmentation with data augmentation

5 Conclusion

The proposed scaling and cutout data augmentation strategy has increased the semantic segmentation performance for cardiac MRI. The improvement rate is visibly higher for the DabNet architecture compared with U-Net and TernausNet; in fact, some of the previous total misses can be segmented relatively well with the help of augmented data. Note that both the original MRI and the ground-truth labels underwent geometric transformations with the same upscaling factor and cutout size. For future work, a more complex data augmentation method will be explored, including synthetically generated data through an adversarial network. Furthermore, the addition of an attention mechanism might also help locate the organs more accurately.

Acknowledgements The researchers acknowledge research funds from Universiti Kebangsaan Malaysia through Geran Universiti Penyelidikan (GUP-2019–008) and Ministry of Higher Education Malaysia through Fundamental Research Grant Scheme (FRGS/1/2019/ICT02/UKM/02/1).

References
1. Soatto S, Chiuso A (2016) Modeling visual representations: defining properties and deep approximations
2. Sun C, Shrivastava A, Singh S, Gupta A (2017) Revisiting unreasonable effectiveness of data in deep learning era. In: 2017 IEEE international conference on computer vision (ICCV), pp 843–852. https://doi.org/10.1109/ICCV.2017.97
3. Shorten C, Khoshgoftaar TM (2019) A survey on image data augmentation for deep learning. J Big Data 6(1):1–48. https://doi.org/10.1186/S40537-019-0197-0/FIGURES/33
4. Zulkifley MA, Abdani SR, Zulkifley NH (2020) COVID-19 screening using a lightweight convolutional neural network with generative adversarial network data augmentation. Symmetry 12(9):1530. https://doi.org/10.3390/SYM12091530
5. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. Accessed 31 Jan 2022. [Online]. Available: http://www.robots.ox.ac.uk/
6. Szegedy C et al (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, vol 07–12 June, pp 1–9. https://doi.org/10.1109/CVPR.2015.7298594
7. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2323. https://doi.org/10.1109/5.726791
8. Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60(6):84–90. https://doi.org/10.1145/3065386
9. Chen C et al (2021) Enhancing MR image segmentation with realistic adversarial data augmentation
10. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: Lecture notes in computer science (including subseries Lecture notes in artificial intelligence and lecture notes in bioinformatics), vol 9351, pp 234–241. https://doi.org/10.1007/978-3-319-24574-4_28
11. Iglovikov VI, Shvets AA (2021) TernausNet. In: Computer-aided analysis of gastrointestinal videos, pp 127–132. https://doi.org/10.1007/978-3-030-64340-9_15
12. Li G, Yun I, Kim J, Kim J (2019) DABNet: depth-wise asymmetric bottleneck for real-time semantic segmentation. Accessed 31 Jan 2022. [Online]. Available: https://github.com/Reagan1311/DABNet
13. Simard PY, Lecun YA, Denker JS, Victorri B (2012) Transformation invariance in pattern recognition—tangent distance and tangent propagation. In: Neural networks: tricks of the trade. Lecture notes in computer science, vol 7700, pp 235–269. https://doi.org/10.1007/978-3-642-35289-8_17
14. Mohamed NA, Zulkifley MA, Ibrahim AA, Aouache M (2021) Optimal training configurations of a CNN-LSTM-based tracker for a fall frame detection system. Sensors 21(19):6485. https://doi.org/10.3390/S21196485
15. Bernard O et al (2018) Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: is the problem solved? IEEE Trans Med Imaging 37(11):2514–2525. https://doi.org/10.1109/TMI.2018.2837502
16. Abdani SR, Zulkifley MA, Siham MN, Abiddin NZ, Aziz NAA (2020) Paddy fields segmentation using fully convolutional network with pyramid pooling module. In: 2020 IEEE 5th international symposium on telecommunication technologies (ISTT)—proceedings, pp 30–34. https://doi.org/10.1109/ISTT50966.2020.9279341

An Improved Method to Recognize Bengali Handwritten Characters Using CNN

Monishanker Halder, Sudipta Kundu, and Md. Ferdows Hasan

Abstract The process of attaching a symbolic identification to a character's visual form is known as character recognition. There are numerous handwritten characters of various languages, and Bengali is one of them. Detecting handwritten documents is exceedingly challenging in today's digital age due to the vast range of uses, unique shapes, and morphologically complicated formation. Besides, people's handwriting is unique to them and varies in size, shape, and style. Bangla has many similar-looking characters that can be confusing to anyone who is not familiar with the language; in many circumstances, a single dot or mark distinguishes one character from another. Our paper describes an approach to recognizing Bangla handwritten characters based on a convolutional neural network. We develop a new convolutional neural network model and test it on the CMATERdb 3.1.2 dataset. Our model obtains an average training accuracy of 98.78%, an average validation accuracy of 98.33%, and an average test accuracy of 98.21% on the alphabets of the dataset, which outperforms the existing methods in the literature. This study paves the way for further advancements in the field of Bangla handwriting recognition by offering a framework for individual character recognition.

Keywords Bengali handwritten character recognition · Convolutional neural network (CNN) · CMATERdb · Image processing

M. Halder (B) · S. Kundu · Md. F. Hasan
Department of Computer Science and Engineering, Jashore University of Science and Technology, Jashore 7408, Bangladesh
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_43

1 Introduction

The process of extracting or identifying a character that lies in an image is known as character recognition. This type of recognition can be divided into two groups [1]: characters from printed literature are one, while characters written by humans are the other. Handwritten character identification has become one of the most popular research topics for researchers around the world in the field of artificial intelligence. Optical character recognition (OCR) is a mechanical or electronic translation of an image of handwritten or printed text into machine-encoded text from a scanned document or an image of the document [2]. Despite some studies on English and other commonly spoken languages, the progress of handwriting recognition for Bengali remains low [3]. This is because each language has its own properties, such as marks, genres, and character shapes. The recognition of handwritten characters is more difficult than that of printed characters. Furthermore, individual people's handwriting is distinct and varies in size, shape, and style. Bangla HCR has ten times the complexity of certain other languages, owing to the morphologically complex structure of Bangla letters. Bangla has many similar-looking characters that can be confusing to anyone not familiar with the language; in many circumstances, a single dot or mark distinguishes one character from another. Although the shapes of several sets of characters can be quite similar [4], there are major differences in the writing styles of Bangla language users. This situation has a critical effect on the development of automated systems that attempt to recognize handwritten characters.

Hasan et al. [1] developed a method in 2021 that recognizes handwritten Bengali letters using a deep CNN; they achieved 98.03% accuracy using the CMATERdb 3.1.2 dataset for classification. In 2013, Banerjee and Chaudhuri [5] proposed a scheme for Bengali and Devanagari text identification: an approach for extracting and identifying Bengali and Devanagari text from video clips with complicated backgrounds, obtaining an accuracy of 78.92%. Purkaystha et al. [6] developed an image-based recognition system for handwritten Bengali characters in 2017 using a deep convolutional neural network; using 50 alphabet classes, the classification accuracy was 91.23%. Chowdhury et al. [7] presented some major factors as well as processing techniques used to enhance the performance of character recognition and help develop OCR-based systems with better accuracy. Hasan et al. [8] reported accuracies of 98.13% and 81.83% for the identification of isolated Bengali characters and handwritten compound characters, respectively. In 2018, Chowdhury et al. [9] proposed a method that converts images of Bengali handwritten text into editable text, achieving 88–95% accuracy for individual characters. Rabby et al. [10] also contributed to this field, proposing a method with 97.73% accuracy on the Ekush dataset and 95.01% cross-validation accuracy on the CMATERdb dataset. Rahman et al. [11] in 2015 developed a CNN-based recognition system for handwritten Bengali characters; for 50 classes, the accuracy of the CNN approach was 85.96%. Custom datasets were created with a total of 20,000 samples and 400 photos per class, with an image resolution of 28 × 28.


In this paper, we propose an approach and construct a new CNN model for recognizing isolated Bengali handwritten characters. Our proposed CNN model recognizes 50 basic isolated Bangla handwritten characters with high accuracy; among the 50 characters, there are 39 consonants and 11 vowels. The CMATERdb 3.1.2 dataset is used for classification, training, validation, and testing in our work. The model employs an adaptive learning method called stochastic gradient descent (SGD). From the experimental results, we found that a training accuracy of 98.78%, a validation accuracy of 98.33%, and a testing accuracy of 98.21% are achieved by the model, which indicates better performance than existing results in the literature. The rest of our paper is organized as follows: Sect. 2 covers the proposed methodology, Sect. 3 presents the experimental results, and Sect. 4 concludes.

2 Proposed Methodology The following Fig. 1 demonstrates the basic phases in our suggested approach for recognizing isolated Bangla handwritten characters.

Fig. 1 Proposed system architecture

2.1 Image Dataset Collection and Preprocessing

Collection of Sample Image. The dataset CMATERdb 3.1.2 is used for our work; it was constructed under a microprocessor application research effort (CMATER) for training, education, and research purposes. The dataset comprises 50 different handwritten Bengali letters, with 39 consonants and 11 vowels among the 50 basic characters. The training and test datasets are placed separately within the dataset folder, and each of these directories has 50 subfolders corresponding to the 50 characters. This balance eliminates the problem of class imbalance during training. The dataset contains handwritten characters from people of different ages and genders, which allows each character to take different forms. Figure 2 below shows some sample photos of the handwritten character "অ".

Preprocessing. Some preprocessing techniques are applied to the images before feeding them into the model to make it easier to extract characteristics from them. The pixel value of a grayscale image is a single integer in the range 0–255 that reflects the brightness of the pixel: a value of 0 is black, and a value of 255 is white. All input images were resized to 32 × 32 and scaled to the same range [0, 1]. Shear and rotation augmentation were applied to the images using the shear tool and an image rotation method. Since the collected character images initially differ in size, they are all resized to 32 × 32 so that the input shape is consistent.

Fig. 2 Raw image of "অ" character

Fig. 3 32 × 32 resized images of character "অ"


Fig. 4 Examples of vowel pictures from the dataset

Figures 3, 4, and 5 show 32 × 32 resized images of a sample character, examples of vowel characters, and examples of consonant characters from the dataset used as input to the model.

Dataset Formation. We constructed the picture matrix after converting the images to gray scale. Each pixel value was divided by the maximum of 255 to convert the data to values between 0 and 1, which reduced the processing cost. To make several one-dimensional arrays of photos, we divided the rows and columns, and each character was identified by a classifier number found in each row of photos. Arrays were constructed from the dataset first: the training set was created to train the machine, the test set was created for testing, and the training set was further divided into two categories, validation and training. A quarter of the photos from the original training dataset was chosen to produce the validation dataset, which was used to validate the model during training, in addition to the test dataset. In total, four arrays were created from the dataset: x_train and y_train to train the model, and x_test and y_test to test it. We then split the training data into training and validation sets. Ten percent (10%) of the data from the main dataset was separated to construct the test dataset (x_test and y_test). Figure 6 summarizes the training, validation, and testing datasets, and Fig. 7 shows the number of images for each character in the dataset. A minimal sketch of these preprocessing and splitting steps is given below.
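For concreteness, here is a small Python sketch of the preprocessing and splitting pipeline described above, assuming OpenCV for reading and scikit-learn for splitting; the function and variable names and the folder-reading details are illustrative only, and the split ratios follow the 10% test figure stated above.

```python
import numpy as np
import cv2
from sklearn.model_selection import train_test_split

def load_and_preprocess(image_paths):
    """Read character images, convert to gray scale, resize, and scale to [0, 1]."""
    images = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        img = cv2.resize(img, (32, 32))               # consistent 32 x 32 input shape
        images.append(img.astype("float32") / 255.0)  # divide by the maximum of 255
    return np.expand_dims(np.array(images), axis=-1)  # add a channel dimension

# x, y: preprocessed images and their class labels (0-49); names are illustrative
# x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.10, stratify=y)
# x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.10)
```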

Fig. 5 Examples of the consonants from the dataset

Fig. 6 Summary of the dataset used

Classify Using CNN. As already stated, this work employs a CNN-based model to perform the classification task. CNN performs well on classification tasks in the field of computer vision, which is one of the main reasons for its widespread use. CNN is a well-known deep learning model that can learn features from input pictures without the need for extensive preprocessing. A CNN design typically involves a number of non-linear transformations at various stages, followed by a classifier, which is responsible for the classification. The sample photo is used as input to a CNN model trained by learning properties of the image, and the final output layer compares the image to the provided label. CNN's success is due to the principle of back-propagation: the gradient produced by the mismatch error in predicting the image label is fed back from the output label along the forward path, and the parameters of the various layers are adjusted to reduce the resulting errors. This process is repeated until the model is near saturation, at which point the final accuracy of the model can be calculated. This is how the CNN model was created. The model consists of eight convolutional layers with a max pooling layer applied after each pair of convolution layers. A fully connected stage (flatten and dense layers) is placed before the last layer, which generates the result. The input of a convolutional layer is "l" feature maps, and it produces "k" feature maps as output; with a filter size of "n × m", the total number of parameters is "(n × m × l + 1) × k". The layers of our CNN model are concisely described below.

Convolutional Layer. This layer receives input in the form of training images of size (256 × 32 × 3). It convolves the image with a total of 32 filters of size (3 × 3) to extract features from the submitted photos. This layer uses the ReLU activation function to introduce a non-linear aspect to the network and a stride of size (1, 1).


Fig. 7 Number of images in each character

The activation function is responsible for converting the input signal of each layer into the output signal of the next layer. ReLU is suitable for this model because it is faster and more effective in comparable settings than other similar activation functions. ReLU stands for rectified linear unit and represents the function below:

$$\text{ReLU}(x) = \max(0, x) \quad (1)$$

The shape of the first convolution layer's output is 32 × 32 × 32 (here, the final 32 indicates that the images are convolved with 32 filters). The "same padding" strategy is used by this layer to define the output's height and width; the number of parameters is 7168.

Max Pooling Layer. This (2 × 2) max pooling layer receives the output produced by the previous layer's ReLU. It is primarily used to help the model avoid overfitting during training; such a pooling layer typically follows two convolution layers. This layer shrinks the shape to exactly half its original size (16 × 16 × 256) and has 0 parameters, which helps to combat overfitting of the model.

Convolutional Layer 1. This convolution layer uses 128 filters (3 × 3), like the other 64-filter layers. It changes the current dimension to 16 × 16 × 128 and passes 295,040 parameters to the following convolution layers. This layer uses a stride of size (1, 1).

Convolutional Layer 2. This layer has the same setup as the one before it, except that the parameter number is 147,584.

Max Pooling Layer 1. Just like the previous max pooling layer, this one uses the same setup and dropout component. It reduces the shape to exactly half of its original size (8 × 8 × 128).

Convolutional Layer 3. This layer follows the same structure as the other convolution layers in the model but changes the number of filters to 64 (3 × 3 size). The current shape is then converted to 8 × 8 × 64, with 73,792 parameters. This layer uses a stride of size (1, 1).

Convolutional Layer 4. This layer is a duplicate of the preceding convolution layer, with 36,928 parameters.

Max Pooling Layer 2. This layer, like the previous two max pooling layers, halves the current shape (4 × 4 × 64).

Flatten Layer. Flattening transforms the data into a one-dimensional array for use in the next layer: the output of the convolution layers is flattened into a single long feature vector, which connects to the final classification stage (the fully connected layers). The output shape is 1024.

Dense Layer. The dense layer is a deeply connected neural network layer: all neurons in the dense layer receive input from all neurons in the previous layer. The output shape is 1024, with a dropout of 0.2; ReLU activation is employed, giving 1,049,600 parameters.

Dense Layer 1. In a dense layer, all of the preceding layer's outputs are fed to all of the neurons, with each neuron providing one output to the next layer; this is the most basic layer in neural networks. The output shape is 1024, with a dropout of 0.2. ReLU activation is again used, with 1,049,600 parameters.

Dense Layer 2. The output shape is 50. Softmax activation is used here, with 51,250 parameters. Each of the fifty output nodes gives a probability value used to select the picture label, and the node with the highest probability score is taken as the image's label. The softmax activation function is

$$\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{N} e^{x_j}} \quad (2)$$
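As a quick numeric check of Eq. (2), the following snippet evaluates the softmax of a small vector; the max-subtraction is a standard numerical-stability detail not mentioned in the text.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # -> approx. [0.659, 0.242, 0.099]
```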

We used a total of 60 filters, first of size (5, 5) and then (3, 3); the pool size is (2, 2), and the number of nodes used is 500. Figure 8 below illustrates the CNN architecture of our model.

Optimization. The model employs an adaptive learning method, the Adam optimizer. Instead of a fixed learning rate for all parameters, the optimizer is used during training; this strategy ensures that the most effective learning rate is employed throughout the learning process by allowing the learning rate to fluctuate continuously while training the model. When weights receive large gradients

Fig. 8 Diagram of our CNN architecture

while being updated, the Adam optimizer seeks to reduce the current learning rate; on the other hand, if the weights are only occasionally updated during learning, the effective learning rate is increased. The Adam optimizer's default values were used to train the proposed CNN model (learning rate = 0.001). We also used the categorical cross-entropy loss function when compiling the model. The time-consuming task of manually tweaking the learning rate became unnecessary with the Adam optimizer.

Training the Model. The contribution of this classification model was evaluated using cross-entropy loss (between label and prediction). This method guaranteed that the layer's neurons did not saturate immediately after initialization but remained within tolerable limits. Batches of size 64 were used to train the model. We also included a mechanism that can help reduce the learning rate when the validation loss is not improving. We trained the model for a total of 20 epochs.
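Since the layer-by-layer description above contains a few mutually inconsistent dimensions, the following Keras sketch should be read as an approximation of the stated design rather than the authors' exact network: paired 3 × 3 convolutions with pooling, two 1024-unit dense layers with 0.2 dropout, a 50-way softmax output, Adam at a learning rate of 0.001, and categorical cross-entropy. The filter counts are our assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(n_classes=50):
    model = models.Sequential([
        layers.Input(shape=(32, 32, 1)),
        # Paired 3 x 3 convolutions, each pair followed by 2 x 2 max pooling
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(1024, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(1024, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# model = build_model()
# model.fit(x_train, y_train, batch_size=64, epochs=20, validation_data=(x_val, y_val))
```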

3 Experimental Analysis and Discussion

The experiment is performed on a system with an Intel(R) Core(TM) i5-7200U CPU @ 2.50–2.71 GHz, 8 GB RAM, and Windows 10 Education edition 64-bit operating system. Using this system, we run the model in Google Colaboratory, a cloud-based, GPU-enabled platform provided by Google for research in machine learning, data analysis, education, etc. During our experiments, Google Colab had the following package versions: Python 3.6.9, TensorFlow 2.7.0, Keras 2.7.0, Matplotlib 3.2.2, Pandas 1.1.5, NumPy 1.19.5, and OpenCV (cv2) 4.1.2.

3.1 Training and Validation Accuracy and Loss

Figure 9 shows the training and validation accuracy against the number of epochs, and Fig. 10 shows the training and validation loss against the number of epochs.

3.2 Performance Accuracy

After completing the training, the model's final training and validation accuracies are 98.78% and 98.33%, respectively. The model is then tested on the test dataset, yielding a test accuracy of 98.21%. The model's performance is summarized in Table 1. From the confusion matrix depicted in Fig. 11, we evaluated the macro-average and weighted-average values of precision, recall, and F1 score, finding values of 0.98 for precision, 0.98 for recall, and 0.98 for F1 score for both the macro and weighted averages.
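These macro- and weighted-average figures can be reproduced from the raw predictions with scikit-learn, as in the short sketch below; y_true and y_pred here are placeholder arrays standing in for the real test labels and the model's argmax predictions (e.g., model.predict(x_test).argmax(axis=1)).

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Placeholder stand-ins for the real test labels and predictions (50 classes)
y_true = np.random.randint(0, 50, 2402)
y_pred = y_true.copy()

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, digits=2))  # macro and weighted precision/recall/F1
```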

Fig. 9 Training and validation accuracy


Fig. 10 Training and validation loss

Table 1 Model performance

| Dataset | Total number of images | Accuracy (%) |
|---|---|---|
| Training dataset | 19,452 | 98.78 |
| Validation dataset | 2162 | 98.33 |
| Test dataset | 2402 | 98.21 |

3.3 Testing Result

We also tested our system for individual character recognition and found very good results, shown in Table 2. During the test, only the character "jha" was recognized incorrectly, once among 10 test cases; all other characters were recognized correctly 10 times out of 10. A comparison by accuracy is shown in Table 3, from which we can conclude that our method yields better performance than the others.

622

M. Halder et al.

Fig. 11 Confusion matrix

Table 2 Testing results of different characters (the handwritten sample images in the first column are figures in the original and are not reproducible here)

| Sample image | Number of tests | Positive | Negative | Average accuracy (%) |
|---|---|---|---|---|
| (image) | 10 | 10 | 0 | 99.995 |
| (image) | 10 | 10 | 0 | 99.975 |
| (image) | 10 | 10 | 0 | 100 |
| (image) | 10 | 10 | 0 | 100 |
| (image) | 10 | 9 | 1 | 90.55 |
| (image) | 10 | 10 | 0 | 96.255 |
| (image) | 10 | 10 | 0 | 100 |
| (image) | 10 | 10 | 0 | 99.975 |
| (image) | 10 | 10 | 0 | 100 |
| (image) | 10 | 10 | 0 | 100 |
| (image) | 10 | 10 | 0 | 98.88 |
| (image) | 10 | 10 | 0 | 99.91 |
| (image) | 10 | 10 | 0 | 99.99 |
| (image) | 10 | 10 | 0 | 100 |
| (image) | 10 | 10 | 0 | 96.515 |

Table 3 Comparing our model performance with others

| Works | Accuracy (%) |
|---|---|
| Hasan et al. [1] | 98.03 |
| Purkaystha et al. [6] | 91.23 |
| Chowdhury et al. [7] | 95.25 |
| Rahman et al. [11] | 85.96 |
| Abir et al. [12] | 91.1 |
| Proposed model | 98.21 |

4 Conclusion and Future Work

From the data in the experimental analysis and discussion section, we can conclude that our proposed model generates better results than the other methods described in the literature. Considering the overall performance, the system will help researchers and developers build application software that depends on Bangla handwritten characters. Although the proposed system does an excellent job of recognizing individual Bangla alphabet letters, it has some limitations: it cannot detect a string of characters such as a word, and full-text recognition is not yet supported. In future work, we will try to develop a system that can recognize compound Bengali handwritten characters or full texts. We also plan to test our model's performance on different datasets, and different neural network models may be integrated to develop a powerful and robust system.


References
1. Hasan MN, Sultan RI, Kasedullah M (2021) An automated system for recognizing isolated handwritten Bangla characters using deep convolutional neural network. In: 2021 IEEE 11th IEEE symposium on computer applications & industrial electronics (ISCAIE). IEEE, pp 13–18
2. Ahmed MS, Gonçalves T, Sarwar H (2016) Improving Bangla OCR output through correction algorithms. In: 2016 10th international conference on software, knowledge, information management & applications (SKIMA). IEEE, pp 338–343
3. Alom MZ, Sidike P, Taha TM, Asari VK (2017) Handwritten Bangla digit recognition using deep learning. arXiv preprint arXiv:1705.02680
4. Bhattacharya U, Shridhar M, Parui SK, Sen PK, Chaudhuri BB (2012) Offline recognition of handwritten Bangla characters: an efficient two-stage approach. Pattern Anal Appl 15(4):445–458
5. Banerjee P, Chaudhuri BB (2013) An approach for Bangla and Devanagari video text recognition. In: Proceedings of the 4th international workshop on multilingual OCR, pp 1–5
6. Purkaystha B, Datta T, Islam MS (2017) Bengali handwritten character recognition using deep convolutional neural network. In: 2017 20th international conference of computer and information technology (ICCIT). IEEE, pp 1–5
7. Chowdhury AA, Ahmed E, Ahmed S, Hossain S, Rahman CM (2002) Optical character recognition of Bangla characters using neural network: a better approach. In: 2nd ICEE
8. Hasan MM, Abir MM, Ibrahim M, Sayem M, Abdullah S (2019) AIBangla: a benchmark dataset for isolated Bangla handwritten basic and compound character recognition. In: 2019 international conference on Bangla speech and language processing (ICBSLP). IEEE, pp 1–6
9. Chowdhury S, Wasee FR, Islam MS, Zaman HU (2018) Bengali handwriting recognition and conversion to editable text. In: 2018 second international conference on advances in electronics, computers and communications (ICAECC). IEEE, pp 1–6
10. Rabby ASA, Haque S, Abujar S, Hossain SA (2018) Ekushnet: using convolutional neural network for Bangla handwritten recognition. Procedia Comput Sci 143:603–610
11. Rahman MM, Akhand MAH, Islam S, Shill PC, Rahman MH (2015) Bangla handwritten character recognition using convolutional neural network. Int J Image Graph Signal Process 7(8):42
12. Abir BM, Mahal SN, Islam MS, Chakrabarty A (2019) Bangla handwritten character recognition with multilayer convolutional neural network. In: Advances in data and information sciences. Springer, Singapore, pp 155–165

Dynamic Pricing for Electric Vehicle Charging at a Commercial Charging Station in Presence of Uncertainty: A Multi-armed Bandit Reinforcement Learning Approach

Ubaid Qureshi, Mehreen Mushtaq, Juveeryah Qureshi, Mir Aiman, Mansha Ali, and Shahnawaz Ali

Abstract We consider the price design for a non-stationary demand problem faced by a commercial charging station seeking to maximize its profit over the long run. As electric vehicles arrive randomly at a charging facility with uncertain demand and completion deadlines, the charging station operator continuously faces the task of adjusting the price of charging to maximize profit. If prices are too high, demand will be low, and hence the average profit will be marginal; in contrast, if prices are low, the charging station loses the additional profit it would have earned had prices been higher. Thus, for each demand, there is a corresponding price that maximizes the profit. To determine the profit-maximizing price, we run price experiments in which the agent learns about the demand and the profit-maximizing price by interacting with the environment and receiving rewards. We transform the problem into a non-stationary multi-armed bandit reinforcement learning problem that can deal with demand uncertainty and cost, with the objective of maximizing the expected average profit. Numerical simulations validate the performance of our pricing scheme.

Keywords Electric vehicle · Charging station · Multi-armed bandit · Reinforcement learning

U. Qureshi (B) Department of Electrical Engineering, Indian Institute of Technology, Delhi, India e-mail: [email protected] Department of Electrical Engineering, University of Kashmir, Srinagar, India M. Mushtaq · J. Qureshi · M. Aiman · M. Ali · S. Ali Department of Electrical Engineering, Institute of Technology, University of Kashmir, Srinagar, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_44


1 Introduction

The transport sector is a major polluter globally and was responsible for about 72% of carbon dioxide emissions in the year 2020, according to a report by the International Energy Agency (IEA). Transitioning to zero-emission transport is thus a crucial step for the future and for addressing such environmental and energy challenges in the transport sector. Electric vehicles are a potential alternative, offering reductions in transport emissions as well as petroleum consumption [1]. The operation and management of an electric vehicle charging station resemble those of a commercial gas or petroleum station in terms of buying resources from wholesale markets and reselling them to customers. Several countries have extensively adopted electric vehicles in their transportation systems: for example, New York City replaced one third of its taxis with electric vehicles in 2020, and Shenzhen, the fourth largest city in China, operates a fleet of 22,000 electric taxis serving people's daily transportation demands [2]. However, the electric vehicle industry still faces many management-related shortcomings, leading to new optimization challenges that aim to develop efficient models for managing electric vehicles while considering all the constraints of the industry. One of these constraints is the random charging demand of electric vehicles at charging stations: as electric vehicles arrive randomly at a charging facility with random charging demand and random completion deadlines, meeting customer demand becomes difficult for the charging station owner. Another major constraint is the price design of charging units for the EV charging station in the presence of uncertain demand. Two traits of price design need to be exploited: sellers can vary prices nearly continuously, often randomizing those changes for learning purposes, and a seller can use an automated pricing policy to set real-time retail prices with incomplete demand information. Lastly, the profit of the charging station owner is also random and needs to be optimized. These three constraints collectively create an interesting problem for the charging station owner, since the profit depends on the pricing policy, and the best pricing policy is determined by the demand.

There is a considerable literature addressing the pricing of electric vehicle charging stations. With the exponential growth of the electric vehicle industry, certain prerequisites must be in place for the smooth running of an electric vehicle charging station [3], one of which is pricing. References [4] and [5] used a non-linear solution to estimate the prices an EV must pay and put effort into designing the demand response to enhance the efficiency of charging stations. In [6], the authors used a pricing strategy at a charging station aimed at reducing the cost of the supplied power together with the delay experienced by electric vehicle users. In [7] and [8], a new direction was explored using a battery storage system that deals with hour-wise prices, based on which power is delivered to an electric vehicle; these works also evaluate demand. Taking note of all these approaches, one thing to keep in mind is the lack of balance when the load is charged over time while


the demand is uncertain relative to the load. Many strategies have been proposed to solve this problem, but difficulties remain in charging these EVs when loads are coarse, which leads to considerable shuffling of loads [9]. A solution is given in [10], which proposed coordinated dynamic pricing (CDP); this CDP method produced outstanding results in comparison with other methods. For a charging station, however, the main aim should always be profit maximization, which none of these methods provide. Another point to keep in mind is that the pricing techniques in all the mentioned methods are offline, with demand treated as constant. Therefore, to achieve results that are more authentic and pragmatic, a dynamic approach must be used that is rigorous and adaptable to various future scenarios [11]. Various reinforcement learning approaches have been used for profit optimization in [12] and [13]; in these approaches, the demand, which directly determines profit, is known beforehand (forecast), so in effect the demand is certain. The researchers of [14] and [15] used a Markov decision process with a Markovian distribution relating future events to changing prices; such approaches do not lead to profit maximization, which should be the main aim of a reinforcement-learning-based algorithm. Thus, profit maximization should result from a learning-based reinforcement learning algorithm, subject to the condition that the algorithm has no advance knowledge of the arrival of electric vehicles while still pursuing profit maximization, as stated in [16] and [17]. The main approach used by those authors works by facilitating profit, but a proper reward-based pricing system is not scrutinized as required when a shift in load is perceived. The answer lies in using a model-free agent reinforcement learning approach [18]. In [19], an approach addresses the charging/discharging of an EV by considering continuous state/action spaces that constitute the charge/discharge process of an electric vehicle. In [20], an algorithm was proposed that uses an online approach to pricing, largely solving the load-shifting problem. It is therefore clear from the above survey that all these approaches assume non-varying demand, which is practically non-existent. Thus, in this paper, we resort to the widely known framework of the multi-armed bandit (MAB) reinforcement learning approach to address the basic problem of an electric vehicle charging station owner who wants to assign a price to the charging units without knowing the market demand. With the help of the MAB algorithm, we propose a single price and observe the outcome of this choice. The MAB algorithm selects arms (prices) sequentially in real time with the goal of balancing currently earned profits against learning about the demand for future profits. The goal is to design a pricing policy that minimizes the regret, i.e., the loss the seller incurs by choosing a suboptimal arm, while converging to the optimal price choice and satisfying the aforementioned constraints of random demand, price design, and charging station profit. The main contributions of this paper are summarized as follows:


(1) We formulate the basic framework of the dynamic pricing problem for the uncertain demand of an electric vehicle charging station, from the perspective of the charging station operator.
(2) We transform the problem into a multi-armed bandit reinforcement learning problem to design a pricing scheme that controls the charging demand and increases the total profit using the reward-based model of reinforcement learning.
(3) We validate the proposed pricing algorithm with simulation results.

The rest of the paper is organized as follows: Sect. 2 presents the model and formulation of the complete price design problem, Sect. 3 presents the solution techniques used for our model, Sect. 4 discusses the simulation results and shows that our algorithm performs optimally, and finally, Sect. 5 presents the conclusions.

2 Model and Problem Formulation

A. System Modeling

We consider the operation of a monopolistic electric vehicle charging station over a time horizon divided into different time slots. Two factors need to be considered in the proposed model: supply and demand. The supply is assumed to be infinite, and the demand is always uncertain. The goal is to learn about the demand and design a pricing policy that maximizes the profit of the charging station owner. Figure 1 shows the schematic diagram of the proposed framework, in which EV owners arrive at the charging station randomly with random energy demands, and the charging station must decide the price that maximizes the profit.

We consider a set of electric vehicles arriving at a charging station with random demand d at random time intervals. We must assign a price p_t at each time interval to every unit of electricity needed to charge each vehicle so as to maximize the average profit of the establishment. As the demand varies with time, the prices need to be adjusted accordingly, which makes charging electric vehicles a difficult problem: the uncertainty in demand will lead to losses if the prices are not optimized correctly. To settle this problem, we use a reinforcement learning framework in which the time horizon is divided into three time slots: morning, evening, and night. In each interval, the proposed model learns about the market demand and posts a price for the units of electricity used to charge the electric vehicles. The proposed model therefore earns while it learns and thus achieves profit maximization.

B. Problem Formulation

Let P(t) be the price of a unit charge of electricity at the charging station and Vi be the preference of consumer i; then the user's utility can be written as


Fig. 1 Model of electric vehicle charging station for uncertain demand

Ui = Vi − P

(1)

Also, the EV owner will charge the vehicle only if: Ui > 0, i.e., Vi > P

(2)

We assume there exists a finite set of prices: P = {p1, p2, . . . , pn}

(3)

that the charging station owner can choose from. For each price p in P, there exists an unknown true demand D(p), and thus the true profit is given by π(p) = pD(p). Since the demands at different prices are unknown, the true profits π(p) are also unknown. The charging station operator can observe the true profits only by selecting one price at a time. The problem is therefore to determine the price that maximizes the average profit of the charging station at different intervals of time.

We formulate the price design problem so as to learn the uncertain demand of the electric vehicles arriving at the charging station: the demand must be learned before a price for the charging units is posted. Thus, we run price experiments using the MAB algorithm to maximize the profit of the station. Algorithm 1 summarizes the proposed dynamic pricing model based on the non-stationary multi-armed bandit approach.


Algorithm 1 Proposed dynamic pricing model

Data: Number of EVs at the charging station; price per charging unit; step size parameter α = 1/k ∈ (0, 1]
Ensure: Charging demand at different time intervals
Result:
  Initialize Q(s, a) for each interval
  for each step of the interval do
    Choose an action a using the non-stationary algorithm
    Take action a and observe reward R
    Q(s, a) ← Q(s, a) + (1/k)[R(s, a) − Q(s, a)]
    if R is the maximum profit then
      update Q
    else
      return
    end if
  end for
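A minimal runnable sketch of this loop in Python is given below; the stochastic demand model and the candidate price grid are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
prices = np.array([2.0, 2.5, 3.0, 3.5, 4.0])     # candidate prices p1..pn (assumed values)

def observed_profit(p):
    # Hypothetical stochastic demand: higher price -> lower expected demand.
    demand = max(rng.normal(10.0 - 2.0 * p, 1.0), 0.0)
    return p * demand                             # reward R = p * D(p)

Q = np.zeros(len(prices))                         # value estimate per price (arm)
counts = np.zeros(len(prices), dtype=int)
eps = 0.1                                         # exploration probability

for t in range(5000):
    # epsilon-greedy selection over the price arms
    a = rng.integers(len(prices)) if rng.random() < eps else int(np.argmax(Q))
    R = observed_profit(prices[a])
    counts[a] += 1
    alpha = 1.0 / counts[a]                       # step size alpha = 1/k as in Algorithm 1
    Q[a] += alpha * (R - Q[a])                    # Q(s,a) <- Q(s,a) + (1/k)[R - Q(s,a)]

print("estimated best price:", prices[int(np.argmax(Q))])
```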

Considering the above formulation, we describe in Sect. 3 the solution technique used to learn the uncertain demand and the optimal profit-maximizing price.

3 Solution Techniques

We propose a multi-armed bandit reinforcement learning approach at a commercial charging station which strategically deals with price design for the uncertain demand at different time intervals. The two approaches used to understand the price optimization problem are discussed below.

A. Stationary Multi-armed Bandit Algorithm

In the stationary MAB problem, the bandit and the reward distribution at each time step remain the same: each choice of arm yields a reward from a distribution that does not change over time. The goal is to maximize the total reward over the long run. The stationary algorithm pursues this goal using strategies that prove very useful for reward maximization:

1. Greedy algorithm: this algorithm gives a locally optimal solution and aims at maximizing immediate rewards. By selecting greedy actions, it exploits the current knowledge.
2. Epsilon-greedy algorithm: this algorithm balances the trade-off between exploration and exploitation. The action with the highest estimated reward is selected most of the time, and with a small probability ε we choose to explore, i.e., not to exploit what has been learned so far.


3. Epsilon-greedy with upper confidence bound algorithm: this algorithm uses the uncertainty in the action-value estimates to balance exploration and exploitation. Non-greedy actions that have the potential of being optimal are preferred.

The relative performance of these strategies over 1000 steps for a 10-armed bandit can be seen in Figs. 2 and 3. In all of these strategies, the action-value estimates are formed using the sample-average technique, with rewards drawn from a Gaussian distribution of mean 0 and standard deviation 1. The figures show the expected reward increasing over time. In the beginning, the greedy method improves faster than the other methods, but it then gets stuck at a lower level; it performs significantly worse in the long run because it converges to a sub-optimal policy, finding the optimal action in only about one third of the tasks. The ε-greedy method performs better due to its continuous exploration, and the upper confidence bound algorithm performs best, since it directs exploration toward actions whose reward estimates are most uncertain.

Fig. 2 Average performance of stationary bandits over 1000 steps

Fig. 3 Optimal performance of stationary bandits over 1000 steps

The multi-armed bandit algorithms discussed above use averaging methods, which are suitable only for a stationary environment, not for one where the bandit changes over time, i.e., a non-stationary environment. These approaches converge to the true value over time, but for a non-stationary problem, where the true value keeps changing, an approach that converges to a single value will not work. For such an environment, the non-stationary MAB algorithm is used.

B. Non-stationary Multi-armed Bandit Algorithm

The non-stationary algorithm is the class of multi-armed bandit algorithms in reinforcement learning where the bandit changes over time, i.e., where the environment and its related information are non-stationary. Here the true value of a function keeps changing, so an approach that converges to a single value will not work. As the true value changes with time, the most useful data comes from the most recent information, and older information becomes less useful. In non-stationary problems it therefore makes sense to weight recent rewards more heavily than long-past ones. The most common way to do this is to fix the step size parameter α, which ranges over (0, 1]. For each selected action a, the step size parameter must satisfy the following conditions to assure convergence:

Σ_{k=1}^{∞} α_k(a) = ∞    (4)

Σ_{k=1}^{∞} [α_k(a)]² < ∞    (5)
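To illustrate why a constant step size suits the non-stationary case, the short sketch below (all numbers assumed) tracks a drifting true value with both the sample-average rule α_k = 1/k and a fixed α = 0.1; the fixed step size keeps weighting recent rewards and follows the drift, while the sample average settles on a single stale value:

```python
import numpy as np

rng = np.random.default_rng(1)
true_value = 1.0
q_avg, q_const, k = 0.0, 0.0, 0

for t in range(2000):
    true_value += rng.normal(0, 0.05)        # non-stationary: the true value drifts
    reward = true_value + rng.normal(0, 1.0)
    k += 1
    q_avg += (1.0 / k) * (reward - q_avg)    # sample average, alpha_k = 1/k
    q_const += 0.1 * (reward - q_const)      # constant step size, alpha = 0.1

print(f"true={true_value:.2f}  sample-average={q_avg:.2f}  constant-alpha={q_const:.2f}")
```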

f(x) = x for x ≥ 0    (4)

f(x) = 0 for x < 0    (5)

The pooling layer has a size of 4 × 4, and the dropout rate of the dropout layer is 0.5. The second hidden layer consists of one more convolutional layer, but with 64 different self-learning filters of size 3 × 3 each; it also contains a max pooling layer and one dropout layer with a 0.5 dropping rate and ReLU activation function. Then there is an output layer, which consists of a flatten layer and multiple dense and dropout layers. The first dropout layer has a dropping rate of 0.5, after which there is a dense layer with a ReLU activation function. Then there is one more dropout layer with a dropping rate of 0.5, followed by a dense layer that uses softmax as its activation function. We have used softmax as the output activation function because it supports multi-class classification. The softmax function can be mathematically formulated as

σ(z_i) = e^{z_i} / Σ_{j=1}^{k} e^{z_j}    (6)
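As a quick numerical check, a direct NumPy translation of Eq. (6) (the logits are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))      # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0])))   # [0.731, 0.269]: the probabilities sum to 1
```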

A brief description of the layers used in the CNN, with their output shapes and parameter counts, is given in Table 1, and the CNN model explained here is summarised in Table 2.

Parameters and Hyperparameters. Parameters such as weights and biases were used in our model for training. The hyperparameters used in our model are the coefficients of the dropout rate and of the learning rate, along with the kernel size, batch size, activation function, number of hidden and dense layers in the architecture, and the number of epochs.

Table 1 Model architecture

Layer                            Output shape     Parameters
conv1d (Conv. layer)             (None, 83, 32)   128
max_pooling1d (Max pooling)      (None, 20, 32)   0
dropout (Dropout layer)          (None, 20, 32)   0
conv1d_1 (Conv. layer)           (None, 20, 64)   6208
max_pooling1d_1 (Max pooling)    (None, 10, 64)   0
dropout_1 (Dropout layer)        (None, 10, 64)   0
flatten (Flatten layer)          (None, 640)      0
dropout_2 (Dropout layer)        (None, 640)      0
dense (Dense layer)              (None, 256)      164,096
dropout_3 (Dropout layer)        (None, 256)      0
dense_1 (Dense layer)            (None, 2)        514


Table 2 Summary of the convolutional neural network

Layer                      Conv-1   Pooling-1   Conv-2   Pooling-2   Flatten   Dense-1   Dense-2
Number of kernels          32       –           64       –           –         –         –
Size of kernel             3 × 3    –           3 × 3    –           –         –         –
Size of pooling            –        4 × 4       –        2 × 2       –         –         –
Dropout rate               –        0.5         –        0.5         0.5       0.5       –
Activation function used   ReLU     –           ReLU     –           –         ReLU      Softmax

The values of the different hyperparameters used in our model are: 11 layers, batch sizes of 32 and 256, 100 epochs, and ReLU as the activation function.
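The architecture described above can be sketched in Keras as follows. The sketch reproduces the output shapes and parameter counts of Table 1 when the input is a vector of 85 features with a single channel; the input shape is inferred from the parameter counts rather than stated in the paper, and the second pooling size of 2 follows from the Table 1 shapes (the prose above says 4 × 4):

```python
from tensorflow.keras import layers, models

# Minimal sketch of the CNN described above; input shape (85, 1) is an inference.
model = models.Sequential([
    layers.Input(shape=(85, 1)),
    layers.Conv1D(32, 3, activation="relu"),                  # -> (83, 32), 128 params
    layers.MaxPooling1D(4),                                   # -> (20, 32)
    layers.Dropout(0.5),
    layers.Conv1D(64, 3, padding="same", activation="relu"),  # -> (20, 64), 6208 params
    layers.MaxPooling1D(2),                                   # -> (10, 64)
    layers.Dropout(0.5),
    layers.Flatten(),                                         # -> (640,)
    layers.Dropout(0.5),
    layers.Dense(256, activation="relu"),                     # 164,096 params
    layers.Dropout(0.5),
    layers.Dense(2, activation="softmax"),                    # 514 params
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```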

5 Result and Discussions

For the comparative study of the network intrusion detection system, we used five techniques: convolutional neural network (CNN), decision tree, logistic regression, Naïve Bayes, and K-nearest neighbours. These models were trained and tested on Kaggle and implemented in the Python programming language. After implementing the network intrusion detection system with all five techniques, we conclude that our convolutional neural network is the best classifier for network intrusion detection, as it outperformed all the other methods on the testing dataset.

The proposed model uses ReLU as an activation function because it introduces non-linearity and overcomes the vanishing-gradient problem. We use softmax as the output activation function because, unlike activation functions such as sigmoid that are mainly used for binary classification, it supports multi-class classification. The model was trained for 100 epochs with a batch size of 32. The accuracy reached a maximum of 95% on training and 91% on testing. We also used other performance measures, namely recall, precision, and F1-score, to validate our results. For comparison, the model was also trained with a batch size of 256; the accuracy in this case was slightly lower than with a batch size of 32. This can be seen clearly in Fig. 6, where the model accuracy for the two batch sizes is plotted against the epochs.

Fig. 6 Accuracy for batch size: 256 and 32, respectively, for 100 epochs

After comparing the deep learning CNN model with the two batch sizes, we compared it with the four other models. To analyse these four models, we used ROC curves to track the true positive and false positive rates, as shown in Fig. 7. From these graphs we analysed the area under the ROC curve for the different algorithms and found the ROC curve to be a good validator of the accuracy achieved by the different models: both the ROC and the accuracy of the decision tree were the lowest among all, while logistic regression had the greatest accuracy as well as the largest ROC area among the four ML classifiers.

The results obtained by these supervised learning models are shown in the tables below. The best accuracy on the testing dataset, 91.23%, was achieved by our proposed deep learning model (see Table 3). The performance measures of the Naïve Bayes classifier are given in Table 4.
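The reported measures and ROC quantities can be reproduced with scikit-learn along the following lines; the synthetic dataset stands in for the intrusion data, whose loading is not shown here:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_curve, auc)

# Synthetic stand-in for the binary intrusion dataset used above.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)
y_score = clf.predict_proba(X_te)[:, 1]

print("accuracy :", accuracy_score(y_te, y_pred))
print("recall   :", recall_score(y_te, y_pred))
print("precision:", precision_score(y_te, y_pred))
print("F1-score :", f1_score(y_te, y_pred))

fpr, tpr, _ = roc_curve(y_te, y_score)   # true/false positive rates for the ROC curve
print("AUC      :", auc(fpr, tpr))
```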

Fig. 7 ROC analysis curves for decision tree, KNN, logistic regression and Naïve Bayes, respectively, from top to bottom (left to right)


Table 3 Performance measure of deep learning

       Accuracy (%)   Recall (%)   Precision (%)   F1-score (%)
Train  95.82          94.60        96.26           94.34
Test   91.23          89.53        91.71           89.90

Table 4 Performance measure of Naïve Bayes

       Accuracy (%)   Recall (%)   Precision (%)   F1-score (%)
Train  97.21          97.34        98.11           97.01
Test   83.43          74.78        86.32           79.50

The Naïve Bayes classifier achieves high accuracy on the training dataset; however, its accuracy on the testing dataset drops by more than 13%, making it weaker than our higher-accuracy deep learning model. The decision tree proved to be the weakest model, with the lowest testing accuracy of 81.31%; even its recall and F1-score were below 80% (see Table 5). K-nearest neighbour was a simple model with the highest training accuracy, as shown in Table 6; however, its accuracy dropped by more than 15% on the testing dataset, which means the KNN model suffered from severe overfitting. Logistic regression, for binary classification, had a higher testing accuracy than the other three ML classifiers, and its ROC curve covered a greater area (see Table 7); its testing accuracy was nevertheless still lower than that of the deep learning model proposed in this work.

Table 5 Performance measure of decision tree

       Accuracy (%)   Recall (%)   Precision (%)   F1-score (%)
Train  99.28          99.01        99.13           99.18
Test   81.31          75.61        81.22           78.26

Table 6 Performance measure of KNN

       Accuracy (%)   Recall (%)   Precision (%)   F1-score (%)
Train  99.23          100.00       99.35           99.51
Test   83.63          78.88        86.11           83.20

Table 7 Performance measure of logistic regression

       Accuracy (%)   Recall (%)   Precision (%)   F1-score (%)
Train  98.32          97.12        98.34           97.35
Test   84.46          80.54        83.63           81.03


6 Conclusion

In this paper, we have highlighted the importance of network intrusion detection systems and how the growth of IoT calls for more effort in this technology. Malicious activity can be prevented and controlled with an efficient intrusion detection system. To develop such a system, we tested several algorithms: a CNN and the supervised machine learning algorithms KNN, Naïve Bayes, logistic regression, and decision tree. The deep learning model we built gave the best results of all, with an accuracy of 91%. We also found that pre-processing increased the accuracy of the model, data pre-processing being a vital step that affects the exactness of the classifier, as shown by the result analyses presented in tabular form in the previous section.

As future research, we will pay attention to increasing the accuracy of the model and the detection rate of U2R and R2L attacks. We aim to use a combination of CNN and LSTM as the learning model to improve the accuracy and precision rate, and to use a different dataset such as "CSE-CIC-IDS2018", which contains about 16,000,000 samples, 17% of which are anomalous [21]. Our next focus is on various intensive learning models that can be used for multivariate network intrusion detection in IoT; examples of such networks are "recursive neural networks" and "intensive trust networks" [22, 23].

References

1. Mahmoud R, Yousuf T, Aloul F, Zualkernan I (2015) Internet of things (IoT) security: current status, challenges and prospective measures. In: 2015 10th international conference for internet technology and secured transactions (ICITST), pp 336–341. https://doi.org/10.1109/ICITST.2015.7412116
2. Dean A, Agyeman MO (2018) A study of the advances in IoT security. In: Proceedings of the 2nd international symposium on computer science and intelligent control. https://doi.org/10.1145/3284557.3284560
3. Lazarevic A, Kumar V, Srivastava J (2005) Intrusion detection: a survey. In: Kumar V, Srivastava J, Lazarevic A (eds) Managing cyber threats. Massive computing, vol 5. Springer, Boston, MA. https://doi.org/10.1007/0-387-24230-9_2
4. Meng W, Li W, Kwok L-F (2015) Design of intelligent knn-based alarm filter using knowledge-based alert verification in intrusion detection. Wiley Online Library. https://onlinelibrary.wiley.com/doi/abs/10.1002/sec.1307
5. Sahu SK, Mehtre BM (2015) Network intrusion detection system using J48 decision tree. In: 2015 international conference on advances in computing, communications and informatics (ICACCI), pp 2023–2026
6. Bhosale KS, Nenova M, Iliev G (2018) Modified Naive Bayes intrusion detection system (MNBIDS). In: 2018 international conference on computational techniques, electronics and mechanical systems (CTEMS), pp 291–296
7. Edgar TW, Manz DO (2017) Logistic regression. Logistic regression—an overview | ScienceDirect topics. https://www.sciencedirect.com/topics/computer-science/logistic-regression
8. Smeda K (2019) Understand the architecture of CNN. Medium. https://towardsdatascience.com/understand-the-architecture-of-cnn-90a25e244c7
9. Rahul, Kedia P, Sarangi S, Monika (2020) Analysis of machine learning models for malware detection. J Discrete Math Sci Crypt 23(2):395–407. https://doi.org/10.1080/09720529.2020.1721870
10. Gupta RA, Raj A, Arora M (2021) IP traffic classification of 4G network using machine learning techniques. In: 2021 5th international conference on computing methodologies and communication (ICCMC), pp 127–132. https://doi.org/10.1109/ICCMC51019.2021.9418397
11. Jin D, Lu Y, Qin J, Cheng Z, Mao Z (2020) SwiftIDS: real-time intrusion detection system based on LightGBM and parallel intrusion detection mechanism. Comput Secur 97:101984. https://doi.org/10.1016/j.cose.2020.101984
12. Aljamal I, Tekeoglu A, Bekiroglu K, Sengupta S (2019) Hybrid intrusion detection system using machine learning techniques in cloud computing environments, pp 84–89. https://doi.org/10.1109/SERA.2019.8886794
13. Mendonça R, Teodoro A, Rosa R, Saadi M, Carrillo D, Nardelli P, Rodriguez DZ (2021) Intrusion detection system based on fast hierarchical deep convolutional neural network. IEEE Access 9:61024–61034. https://doi.org/10.1109/ACCESS.2021.3074664
14. Manimurugan S, Almutairi S, Aborokbah M, Chilamkurti N, Ganesan S, Patan R (2020) Effective attack detection in Internet of medical things smart environment using a deep belief neural network. IEEE Access, pp 1–1. https://doi.org/10.1109/ACCESS.2020.2986013
15. Belavagi MC, Muniyal B (2016) Performance evaluation of supervised machine learning algorithms for intrusion detection. Procedia Comput Sci. https://www.sciencedirect.com/science/article/pii/S187705091631081X
16. Vinayakumar R, Alazab M, Soman KP, Al-Nemrat A, Venkatraman S (2019) Deep learning approach for intelligent intrusion detection system. IEEE Access, pp 1–1. https://doi.org/10.1109/ACCESS.2019.2895334
17. Farahnakian F, Heikkonen J (2018) A deep auto-encoder based approach for intrusion detection system, pp 178–183. https://doi.org/10.23919/ICACT.2018.8323688
18. Agarwal M, Pasumarthi D, Biswas S, Nandi S (2014) Machine learning approach for detection of flooding DoS attacks in 802.11 networks and attacker localization. Int J Mach Learn Cybern. SpringerLink. https://link.springer.com/article/10.1007/s13042-014-0309-2
19. Tavallaee M, Bagheri E, Lu W, Ghorbani A (2009) A detailed analysis of the KDD CUP 99 data set. In: Second IEEE symposium on computational intelligence for security and defense applications (CISDA)
20. Sun P, Liu P, Li Q, Liu C, Lu X, Hao R, Chen J (2020) DL-IDS: extracting features using CNN-LSTM hybrid network for intrusion detection system. Secur Commun Netw. https://www.hindawi.com/journals/SCN/2020/8890306/
21. Solarmainframe (2020) IDS 2018 intrusion CSVs (CSE-CIC-IDS2018). Kaggle. Retrieved 30 Jan 2022, from https://www.kaggle.com/solarmainframe/ids-intrusion-csv
22. Mohammadi S, Desai V, Karimipour H (2018) Multivariate mutual information-based feature selection for cyber intrusion detection, pp 1–6. https://doi.org/10.1109/EPEC.2018.8598326
23. Almiani M, AbuGhazleh A, Al-Rahayfeh A, Atiewi S, Razaque A (2019) Deep recurrent neural network for IoT intrusion detection system. Simul Model Pract Theory. https://www.sciencedirect.com/science/article/abs/pii/S1569190X19301625

CoviIS: A Real-Time Covid Help Information System Using Digital Media Niharika Ganji, Arnab Sinhamahapatra, Shubhi Bansal, and Nagendra Kumar

Abstract In this paper, we present CoviIS, an emergency Covid Information System that utilizes digital media to provide helpful information in the uncertain times of the Covid pandemic. Since people require different types of information during a crisis, this work integrates the various pieces of information into a coherent whole, thereby aiding people during an emergency and reducing further damage. The study brings together real-time Covid informatics through multiple methods, such as general search, social media search, and geographical analysis. To assist people in an emergency, we also conduct a comprehensive analysis of news articles and social media activities to provide an economically feasible solution. CoviIS helps locate the nearest hospitals and Covid isolation centers for seeking medical attention during an emergency, and also provides emergency information through news articles and social media posts, thereby serving as an important Covid emergency tool.

Keywords Covid assistance · News search · Geographical analysis · Information system

N. Ganji · S. Bansal · N. Kumar (B) Indian Institute of Technology Indore, Indore 453552, India e-mail: [email protected] N. Ganji e-mail: [email protected] S. Bansal e-mail: [email protected] A. Sinhamahapatra National Institute of Technology Durgapur, Durgapur 713209, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_50


1 Introduction

The coronavirus disease (Covid) has been a significant cause of death for lakhs of people worldwide and has scarred several others with severe health issues. Many countries have been severely affected. The Covid pandemic has led to a drastic loss of human life worldwide and disrupted people's day-to-day activities. The social and economic disruption it has caused is devastating, leading to severe problems such as loss of livelihood, market downfall, rising poverty, and loss of education. The situation is further worsened by the continuous development of coronavirus variants. The newly detected variant, Omicron, exacerbates the present-day disruption: it is seen as highly transmissible, almost six times as much as the Delta variant, which had already rattled several lives during the second wave. It is therefore the need of the hour to be prepared, both mentally and physically, for the upcoming battle against Omicron.

It is a well-known fact that information saves lives. Several deaths during the Covid pandemic were due to a lack of information regarding the location of local hospitals and the availability of oxygen and vacant beds in them. Many people lost their lives during the second wave because vacant beds and oxygen facilities were unavailable at the nearest hospitals, and unawareness of the location of the nearest corona isolation centers, hunger relief centers, and night shelter camps significantly affected individuals' lives. Hence, we endeavored to bridge the information gap during the Covid predicament. We aim at designing and developing a web application that provides the data required to cope with the challenging times of a medical emergency.

Most of the Covid assistance bots developed so far are confined either to presenting Covid statistical data or to identifying the disease from symptoms and advising on immediate action. COVIBOT [1] identifies Covid disease based on symptoms provided by the user and advises on precautionary measures. Biswas et al. [2] developed 'GOCOVID', which presents real-time Covid statistics in India. Desai et al. [3] developed 'covi-help', built upon NLP, which presents information regarding Covid and co-vaccine. However, none of the bots developed so far acts as a one-stop solution during a Covid emergency; other problems prevalent during Covid, such as locating nearby sample collection centers or hunger relief centers, are not addressed. Considering this as the use case, we aim at creating an information system that serves as a coherent and comprehensive solution to all Covid problems.

Our information system is built on the following five-fold analysis: (a) statistical analysis, (b) geographical analysis, (c) news analysis, (d) social media analysis, and (e) sentiment analysis. Our system provides the latest Covid statistics, such as the total number of cases, number of new cases, and total number of deaths worldwide, showcased thematically over the world map. Further, our work acquaints individuals with nearby local hospitals, corona isolation centers, testing labs, and sample collection centers. Information regarding district containment zones and hunger and night shelter camps can also be obtained quickly and accurately. Because awareness of the current Covid situation is essential, our system is equipped with search functionality that yields the latest news articles from across the world. As social media is viewed as one of the key contributors in disseminating information to a wide array of people [4], we have connected our system with Twitter: a search functionality provides the latest tweets filtered on a given keyword. Knowledge of the sentiment of a piece of information plays a crucial role in mentally preparing individuals for emergency news; therefore, our system shows the sentiment of the news articles obtained from any keyword search. Additionally, our system links to the official websites of essential health organizations, so people can quickly get information regarding the preventive measures to be adopted during Covid times.

Our key contributions are as follows:
• We design an interactive information system for Covid assistance.
• Our system provides real-time Covid information analytics across the world.
• Our system is integrated with two different search functionalities, i.e., news search and social media search.
• Our system is integrated with sentiment analysis to show the polarity of given news information.
• The system is also amended with helpful real-time information from different health organizations.

The rest of the paper is organized as follows. In Sect. 2, we discuss the related work. Section 3 presents our design methodology, Sect. 4 the development of our system, and Sect. 5 its results. Finally, we conclude our work in Sect. 6.

2 Related Work

Covid assistance systems have become an important research topic in recent times. People across diverse fields are joining hands to provide relevant information, ranging from identifying early-stage symptoms to supporting those diagnosed with Covid. Covid assistance applications are developed to display real-time statistics, suggest measures to follow on the occurrence of Covid symptoms, and analyze people's sentiment during the pandemic. We classify Covid assistance systems into three categories: (a) Covid informatics applications, (b) Covid advisory bots, and (c) Covid sentiment analysis.


2.1 Covid Informatics Applications

Everyone battling the Covid pandemic has many questions to be answered. Biswas et al. [2] deployed 'GOCOVID' in India to address this at a large scale. It has real-time features on its dashboard that help society stay aware of the Covid statistics; however, the information displayed is confined to India and does not contain any vaccination data. Dong et al. [5] developed a user-friendly tool to track and visualize coronavirus disease data. Their dashboard illustrates real-time Covid infected cases, deaths, and recovered cases across countries; nevertheless, it displays statistics for India only at the country level. Our system provides statistical information regarding the total number of cases, new cases, and deaths worldwide. Information regarding state-wide and district-wide Covid cases in India is presented thematically in our application, which is also updated with the latest vaccination informatics displayed thematically over the world map.

2.2 Covid Advisory Bots

Covid advisory bots are applications that provide information such as the location of the nearest hospital or isolation center when Covid symptoms are observed, or any other helpful information during a medical emergency. They also assist in finding the nearest vaccination center and in maintaining proper mental health. Khan and Albatein [1] developed COVIBOT, using the Whatsapp API, which provides information about the immediate treatment to follow when Covid symptoms are observed. The Covisstance chatbot [6] was motivated by the need to locate available beds and ventilators in the hospital nearest to the user's location. Van Baal et al. designed the Cory COVID-Bot [7] to advise difficult-to-reach populations about health guidelines during the challenging times of Covid; the chatbot is powered by artificial intelligence (AI) and supported by behavioral-change capability that makes it reach people more promptly. Desai et al. [3] built a chatbot named 'covi-help' with the help of NLP and deep learning that gives information about co-vaccine. Symptoma [8] is an AI-based Covid detecting tool that detects whether a person is infected based on symptoms given as text input; it has been trained on over 20,000 diseases and is available in 36 languages. Wang et al. [9] introduced a query bot to ease the analysis of a patient's clinical history before diagnosis; it uses a pre-trained contextual language model together with a k-means model, and user feedback tunes its performance via the Rocchio algorithm. Medbot [10], a multilingual chatbot based on natural language processing (NLP), acts as a personal virtual doctor and provides free primary healthcare advice to chronic patients. Islam et al. [11] designed a mobile application named 'Muktomon' for mental health care during Covid times; it includes an AI-based chatbot for virtual mental health assistance, virtual therapy videos, support for communicating with doctors, and a list of authentic news sources. Covi-Help1 was brought out collaboratively to provide information regarding medicine and the availability of beds, oxygen, and plasma. CovidAsha2 connects individuals willing to help with oxygen and medicine suppliers, and allows users to input their location to find the nearest vaccination centers.

However, the applications discussed above are not diverse enough in the information they supply. Our system provides a wide range of information, such as the locations of local hospitals, corona isolation centers, sample collection centers, and hunger relief centers, presented as corona-layers functionality over the map of India.

2.3 Covid Sentiment Analysis

Kaur et al. [12] propose a framework that analyzes and visualizes trends in human behavior during the Covid pandemic based on Twitter data. Tweets are extracted at regular intervals, processed, and cleaned for sentiment analysis, and the resulting dashboard presents the data distribution classified into various human emotions across different months. Barkur et al. [13] study the sentiments of Indians toward the nationwide lockdown using Twitter data; the sentiments based on Twitter posts are analyzed and presented graphically. Venigalla et al. [14] developed Mood of India during Covid, which analyzes real-time tweets across India during Covid times to determine the mood of the nation, classifying tweets into seven categories: six basic emotions and a neutral category. Mathur et al. [15] use worldwide Twitter data during the Covid emergency to dissect the feelings of individuals, using the NRC Word-Emotion Association Lexicon, a list of English words with real-valued intensity scores for eight primary emotions, to classify the dataset into anger, anticipation, disgust, fear, joy, sadness, surprise, and trust.

Covid assistance systems developed so far have focused on only one or two of the above use cases. In contrast, our work is complementary to the existing works: we provide one remedy for all Covid problems. Unlike existing works, our information system integrates multiple solutions into a single platform and provides real-time information about the current Covid situation, implemented through statistical analysis, geographical analysis, news analysis, social media analysis, and sentiment analysis.

1 https://www.covihelp.info/
2 https://www.covidasha.org/


Fig. 1 Flowchart of methodology

3 Design Methodology

CoviIS is an emergency information system that acquaints individuals with necessary information during an emergency, arming them for the battle against the Covid pandemic.3 By utilizing the information presented by our system, people can seek immediate medical assistance and save their lives. Further, our system supplies the latest information via news articles and social media posts, alerting people to the pandemic situation.

Our system provides real-time Covid information through functionalities organized across five analysis techniques. Figure 1 shows the flowchart of the proposed methodology, divided across (a) statistical analysis, (b) geographical analysis, (c) news analysis, (d) social media analysis, and (e) sentiment analysis. We present the real-time Covid data across six main features of our system: (a) Home, (b) GlobalData, (c) IndiaCovidDetails, (d) General News Search, (e) Covid News Search, and (f) Social Media Search. Data from different sources is brought together in the different features of our information system, as shown in Fig. 1. The general news search and Covid news search features are further equipped to provide the sentiment of the news articles.

3 https://basicdashflask.herokuapp.com/


Fig. 2 Total number of Covid cases worldwide

3.1 Statistical Analysis

Our information system displays real-time Covid statistics, presented thematically in the GlobalData feature of our application. This includes information such as the total number of cases and the number of new cases encountered worldwide, as well as the total and new death counts across the world. One can hover over any country on the world map to see its current Covid statistics; the total number of Covid cases viewed thematically across the world can be seen in Fig. 2.

Vaccination plays a crucial role in the fight against Covid: the more people vaccinated, the less severe the pandemic. Our system therefore represents the vaccinated population of each country thematically, as illustrated in Fig. 3. The intensity of the color in Fig. 3 corresponds to the number of people vaccinated per 100K population: a higher intensity of green represents a larger vaccinated population in that country, and vice versa. One can also hover over any country on the map to see the number of people vaccinated per 100K population. Additionally, our system displays the list of countries with the highest number of people vaccinated per 100K population: a real-time bar chart of these countries and their vaccination counts is shown in Fig. 4, and the chart changes dynamically with the number of people vaccinated.


Fig. 3 Vaccination informatics

Fig. 4 Countries with highest people vaccinated per 100K population

3.2 Geographical Analysis

Our system presents information regarding Covid cases analyzed geographically. Our application's IndiaCovidDetails feature provides emergency information through the corona-layers functionality. One can hover over a location to get the total number of Covid cases in that state and district. Figure 5 shows the total number of district-wide cases across Telangana; the intensity of the color signifies the number of cases in the district, with a higher intensity of red denoting districts with more Covid cases, and vice versa.

Fig. 5 District-wide Covid cases in Telangana

Information regarding the location of the nearest hospital and corona isolation center is of utmost importance during a medical emergency. Therefore, our information system is equipped with the functionality of locating the treatment centers near a user's location, as shown in Fig. 6. One can hover over any location in India to find the nearest hospital and seek immediate medical assistance; people requiring immediate medical attention can easily search for hospitals nearby.

Awareness of the nearest sample collection center or testing lab can help a person with suspected Covid get tested quickly and stay isolated from friends and family in a corona isolation center. Hence, our system supplies emergency information on testing labs, sample collection centers, and corona isolation centers near a user's location; one can input a location to find the nearest sample collection center or testing lab. This functionality helps affected individuals detect the disease at an early stage and contain its further spread. Our application also depicts India's district containment zones: red, yellow, and green. Red indicates regions with a high level of Covid cases under strict Government restrictions, yellow signifies areas with a moderate number of Covid cases whose residents are urged to stay alert, and green indicates regions with fewer positive cases that can be considered largely safe.

People affected by the corona disease have suffered not only severe health problems but also great financial loss; many have lost their livelihoods and now suffer from hunger and starvation. The hunger and night shelter corona layer can help users locate the nearest hunger relief camp by hovering over their location, as shown in Fig. 7.


Fig. 6 Treatment centers in user’s locality

Fig. 7 Hunger relief centers in user’s locality

3.3 News Analysis

Knowledge of emergency news is very informative during the Covid pandemic. Therefore, our information system is integrated with two news search functionalities, general news search and Covid news search, which yield news articles based on a given keyword. The Covid news search displays the latest Covid news articles obtained from NewsAPI, filtered on the keyword given. Users can search for news articles with their city or state as the keyword and become aware of emergency situations in their localities. This feature helps users easily find emergency news related to their areas, such as the imposition of a lockdown, a night curfew, or any other social event. Figure 8 displays the latest Covid-related news articles for the keyword 'Delhi.'

Fig. 8 Covid news search results for the keyword—'Delhi'
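A minimal sketch of such a keyword search against NewsAPI's everything endpoint is shown below; the API key is a placeholder, and the Covid keyword filter mirrors the one described in Sect. 4:

```python
import requests

API_KEY = "YOUR_NEWSAPI_KEY"   # placeholder credential

def covid_news_search(keyword, page_size=10):
    resp = requests.get(
        "https://newsapi.org/v2/everything",
        params={"q": keyword, "pageSize": page_size, "apiKey": API_KEY},
        timeout=10,
    )
    articles = resp.json().get("articles", [])
    covid_words = ("Covid", "covid", "corona", "Corona")
    # Keep only articles whose title or description mentions Covid.
    return [a for a in articles
            if any(w in (a.get("title") or "") + (a.get("description") or "")
                   for w in covid_words)]

for a in covid_news_search("Delhi"):
    print(a["title"])
```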

3.4 Social Media Analysis

Social media is considered a key contributor in disseminating information to many people in just a few seconds; the most popular social media applications include Whatsapp, Facebook, and Twitter [16, 17]. To bring together Covid-related information on social media, we have connected our information system with Twitter. Our information system allows searching real-time Twitter posts by keyword: the search yields the latest ten tweets filtered on the given keyword. Since people post quick updates over Twitter, one can easily search for information using simple keywords such as 'blood' and 'food' regarding blood and food requirements. Figure 9 displays the tweets obtained for the keyword search 'Omicron.'
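A sketch of this tweet search with the tweepy module is given below; the credentials are placeholders and the method names follow tweepy v4, so details may differ for other versions:

```python
import tweepy

# Placeholder credentials for Twitter's standard search API.
auth = tweepy.OAuth1UserHandler("API_KEY", "API_SECRET",
                                "ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth)

def tweet_search(keyword, count=10):
    # Latest English tweets matching the keyword, as in the system's tweet search.
    return api.search_tweets(q=keyword, lang="en", count=count,
                             result_type="recent")

for tweet in tweet_search("Omicron"):
    print(tweet.user.screen_name, ":", tweet.text)
```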


Fig. 9 Tweet search results for the keyword—‘Omicron’

3.5 Sentiment Analysis

Knowledge of the polarity of news information helps individuals stay alert and mentally prepared during Covid times. Our information system provides the sentiment of news articles, obtained by averaging the polarity scores of the text. The polarity score is defined by the semantic orientation and the intensity of each word in the text [18]. News articles with a polarity score greater than zero are considered positive, those with a score less than zero negative, and those with a score equal to zero neutral. We have used TextBlob4 to derive the sentiment of the news articles. Figure 10 shows the sentiment of news articles obtained for the keyword search 'Telangana': blue indicates articles with a positive sentiment, red those with a negative sentiment, and green those with a neutral sentiment.
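A sketch of this scoring rule with TextBlob follows; the example headlines are invented:

```python
from textblob import TextBlob

def article_sentiment(text):
    # Polarity in [-1, 1]; its sign decides the label, as described above.
    polarity = TextBlob(text).sentiment.polarity
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"

print(article_sentiment("Recovery rate improves as new cases decline"))
print(article_sentiment("Hospitals overwhelmed amid severe oxygen shortage"))
```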

Fig. 10 Sentiment analysis results

3.6 Health Safety

Our information system also supplies individuals with information regarding the latest Omicron variant, referenced from the World Health Organization (WHO).5 Helpful information such as the transmissibility and severity of the Omicron variant is shared, and the effectiveness of prior vaccines and prior SARS-CoV-2 infection is discussed alongside suggested precautionary measures. Our portal also contains links that redirect to the websites of important health organizations such as WHO and Co-WIN, further leveraging individuals with more information. A small segment of the emergency information deployed on our system is shown in Fig. 11.

4 https://textblob.readthedocs.io/en/dev/
5 https://www.who.int/news/item/28-11-2021-update-on-omicron

4 Development

Figure 12 shows the different help modules of our web-based information system. We explain the working of our system as follows.

1. Home: the home page displays the real-time vaccinated population across the world thematically, integrated from the website DOMO6, along with other helpful information regarding the Omicron variant referenced from WHO.
2. Co-WIN: a link to the Co-WIN website is integrated, providing more information to people.


Fig. 11 Small segment of information about Omicron on our system

3. WHO: Link to the WHO website is integrated, further providing additional information to people. 4. IndiaCovidDetails: The corona layers functionality is integrated using MapmyIndia API7 to display the helpful information. 5. GlobalData: AnyChart8 is used to display the world choropleth map. The real-time data that appears on hovering is obtained from COVID19API9 and consolidated with the AnyChart map that is being displayed. 6. General News Search: NewsAPI10 is used to fetch the latest ten news articles related to the keyword. 7. Covid News Search: The latest ten news articles based on the given query are fetched from NewsAPI. Further, the articles containing the words Covid, covid, corona, Corona are displayed as the search results. 8. Tweet Search: Real-time English tweets are fetched based upon the keyword given from Twitter’s Search API,11 which is integrated using the tweepy module.12 7

https://apis.mapmyindia.com/advancedmaps/v1/map_js_key/map_load?v=1. 5plugins=coronaLayers. 8 https://www.anychart.com/. 9 https://api.covid19api.com/. 10 https://newsapi.org/. 11 https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/api-reference/get-searchtweets. 12 https://docs.tweepy.org/en/stable/api.html.


Fig. 12 Depicting different tabs on the web application

The web technologies—HTML, CSS, and Bootstrap have been used for the frontend development.
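As a rough sketch of how such a feature might be wired into the Flask backend (the route name and template are assumptions, and covid_news_search is the helper sketched in Sect. 3.3):

```python
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route("/covid-news")
def covid_news():
    # Read the keyword from the search bar and render the filtered articles.
    keyword = request.args.get("q", "India")
    articles = covid_news_search(keyword)      # helper from the earlier sketch
    return render_template("news.html", keyword=keyword, articles=articles)

if __name__ == "__main__":
    app.run(debug=True)
```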

5 Results and Discussion

CoviIS has accomplished its task of connecting individuals with emergency information and providing assistance during times of predicament.

1. Our system presents the Covid data statistically in the GlobalData tab. Information regarding the total number of cases, new cases, and total deaths can be obtained by hovering over the world map, as shown in Fig. 2.
2. Our system presents the Covid data geographically. The IndiaCovidDetails tab displays state-wide and district-wide cases in India, viewed thematically over the map, as shown in Fig. 5.
3. Our system supplies emergency information such as the locations of the nearest hospitals and testing labs through its IndiaCovidDetails feature. The nearest hospitals can be located easily through the Treatment Centers corona layer, as shown in Fig. 6; the nearest corona isolation centers, sample collection centers, and hunger relief centers can be searched similarly, as shown in Fig. 7.
4. Our system presents the latest news articles through its general news search and Covid news search features, which yield the latest news articles for a given keyword, as shown in Fig. 8.
5. Our system is connected with Twitter through its social media search bar. The latest tweets can be retrieved easily by keyword search, as shown in Fig. 9; tweets with emergency health information from Government organizations and important officials can be found quickly and easily.


6 Conclusion and Future Work

CoviIS is built upon the drive to acquaint individuals with information during an emergency. People can use our system to find the nearest hospital or corona isolation center, helping them during a medical crisis, and to search for Covid-related information quickly and precisely. Our system also highlights the importance of being aware of Covid-related news near one's location. Government organizations can use our system to track major issues in different cities: the news search bar and tweet search bar can help them find the latest information regarding emergencies and address the related issues quickly and accurately.

In future work, we would like to utilize other publicly available resources, such as a combination of different news websites and a consolidation of various social media platforms like Facebook and Twitter, to obtain authentic information quickly during an emergency. Another interesting direction is analyzing social media data with AI to report fake news about Covid, since fake news can mislead people and worsen the pandemic, triggering more fear and loss.

References

1. Khan NA, Albatein J (2021) Covibot—an intelligent Whatsapp based advising bot for covid-19. In: 2021 international conference on computational intelligence and knowledge economy (ICCIKE). IEEE, pp 418–422
2. Biswas S, Sharma LK, Ranjan R, Banerjee JS (2020) Go-covid: an interactive cross-platform based dashboard for real-time tracking of covid-19 using data analytics. J Mech Contin Math Sci 15:1–15
3. Desai V, Naik S, Hirurkar R, Solanki P, Desai S (2021) Chatbot for covid vaccine using deep learning. Available at SSRN 3874429
4. Kumar N, Chandarana Y, Anand K, Singh M (2017) Using social media for word-of-mouth marketing. In: International conference on big data analytics and knowledge discovery. Springer, pp 391–406
5. Dong E, Du H, Gardner L (2020) An interactive web-based dashboard to track covid-19 in real time. Lancet Infect Dis 20(5):533–534
6. Shaik F, Khalid Bashir A, Mahmoud Ismail H (2021) Covisstance chatbot. In: The 7th annual international conference on Arab women in computing in conjunction with the 2nd forum of women in research, pp 1–4
7. Van Baal ST, Le S, Fatehi F, Hohwy J, Verdejo-Garcia A (2022) Cory covid-bot: an evidence-based behavior change chatbot for covid-19. Stud Health Technol Inform 289:422–425
8. Martin A, Nateqi J, Gruarin S, Munsch N, Abdarahmane I, Zobel M, Knapp B (2020) An artificial intelligence-based first-line defence against covid-19: digitally screening citizens for risks via a chatbot. Sci Rep 10(1):1–7
9. Wang Y, Tariq A, Khan F, Gichoya JW, Trivedi H, Banerjee I (2021) Query bot for retrieving patients' clinical history: a covid-19 use-case. J Biomed Inform 123:103918
10. Bharti U, Bajaj D, Batra H, Lalit S, Lalit S, Gangwani A (2020) Medbot: conversational artificial intelligence powered chatbot for delivering tele-health after covid-19. In: 2020 5th international conference on communication and electronics systems (ICCES). IEEE, pp 870–875
11. Islam MN, Khan SR, Islam NN, Rezwan-A-Rownok M, Zaman SR, Zaman SR (2021) A mobile application for mental health care during covid-19 pandemic: development and usability evaluation with system usability scale. In: International conference on computational intelligence in information system. Springer, pp 33–42
12. Kaur S, Kaul P, Zadeh PM (2020) Monitoring the dynamics of emotions during covid-19 using twitter data. Procedia Comput Sci 177:423–430
13. Barkur G, Vibha GBK (2020) Sentiment analysis of nationwide lockdown due to covid 19 outbreak: evidence from India. Asian J Psychiatry 51:102089
14. Venigalla ASM, Chimalakonda S, Vagavolu D (2020) Mood of India during covid-19—an interactive web portal based on emotion analysis of twitter data. In: Conference companion publication of the 2020 on computer supported cooperative work and social computing, pp 65–68
15. Mathur A, Kubde P, Vaidya S (2020) Emotional analysis using twitter data during pandemic situation: covid-19. In: 2020 5th international conference on communication and electronics systems (ICCES). IEEE, pp 845–848
16. Kumar N, Baskaran E, Konjengbam A, Singh M (2021) Hashtag recommendation for short social media texts using word-embeddings and external knowledge. Knowl Inf Syst 63(1):175–198
17. Kumar N, Ande G, Kumar JS, Singh M (2018) Toward maximizing the visibility of content in social media brand pages: a temporal analysis. Soc Netw Anal Min 8(1):1–14
18. Kumar N, Nagalla R, Marwah T, Singh M (2018) Sentiment dynamics in social media news channels. Online Soc Netw Media 8:42–54

Distributed Denial of Service Attack Detection Using Optimized Hybrid Neuro-Fuzzy Classifiers Pallavi H. Chitte and Sangita S. Chaudhari

Abstract Owing to their wide deployment, wireless communication networks currently suffer from various attacks, including sinkhole attacks, tampering attacks, Denial of Service (DoS) attacks, and so on. Therefore, an effective system is needed for recognizing attacks in wireless communication networks. This paper analyzes different optimized intrusion detection system (IDS) models used for detecting the presence of intrusion in a network. Feature extraction is carried out after pre-processing with data normalization, and the performance of various population-based algorithms for detecting intrusions in the network is analyzed. Statistical features, higher-order statistical features, improved holoentropy features, and exponential moving average (EMA) features are considered in this analysis for effective extraction of relevant features. In the detection process, a hybrid neuro-fuzzy classifier is deployed on the extracted features, where the membership function of the fuzzy classifier is optimally tuned through the grasshopper optimization technique. The detection system detects the presence of Distributed Denial of Service (DDoS) attacks in the network. The performance of conventional classifiers together with various optimization techniques is then analyzed with respect to the extracted features.

Keywords WSN · DDoS attack · Intrusion detection · EMA features · Optimized neuro-fuzzy

P. H. Chitte (B) · S. S. Chaudhari Ramrao Adik Institute of Technology, Dr. D. Y. Patil Deemed to be University, Nerul, Navi Mumbai, India e-mail: [email protected] S. S. Chaudhari e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_51


1 Introduction Owing to the increased rate of cybercrime in the advanced technology epoch, information security has become a higher concern in recent days. Information security assures the accessibility of information throughout its transmission from transmitter to receiver in a confident manner against illegal users without varying and deteriorating the data [1]. The most important reasons that inspire intrusion activities to intimidate data systems are insisted for status, economic benefits and fame. A network intrusion detection system (NIDS) [2] can be any software application designed to provide network security. It monitors and analyzes the data collected from different resources. This is made by means of numerous measures [3] which assist the recognition of such happenings. An occurrence does not consent that there is a malevolent activity; actually, several indentations may be caused through certain imprecise activities carried out by certified users of the network. Therefore, more consideration is necessary for resolving the NIDS problems [4]. For offering better information security, a wide range of software and hardware devices could be exploited [5]. For the institutional or personal requirement, an information technology administrator has to introduce an appropriate model and offer the required security solution and sustain its reliability. In addition, guaranteeing the effective and active functioning of information systems is made through offering and sustaining the efficacy of such security metrics [6, 7]. Analogous to the technical improvements, diverse attacks against the information systems were also growing. The recognized attack kinds that were recorded are saved in certain databases. Intrusion detection systems (IDSs) [8, 9] maintain these advanced databases and offer institutional and personal computer systems to scrutinize feasible attacks constantly. IDSs are merely a monitoring and analysis system, and they do not include any alternatives for preventing the intrusions. Intrusion prevention system’s (IPS) are the hardware and software tools, which were introduced to identify and avoid malevolent attacks while it takes place [10, 11]. IPSs, thus, are located on network linked segments, in which they are positioned to avoid malevolent traffic [12]. On the other hand, security problems have accompanied the extensive usage of wireless sensor networks. Wireless sensor network (WSN) proved promising for real-time event detection. Sophisticated motivated cybercriminals are increasing day by day. They are capable to hide their identities, communication, from illegal benefits and use non compromising infrastructure. Various attacking techniques like evasion still undergoes requirement of investigating further to build robust IDS. With different sophisticated techniques and social engineering strategies in mindset, Internet services are threatened by cybercriminals. Therefore, advanced intrusion detection systems are important to protect from modern malware. Also owing to the sincerity of the exploited surroundings and the transmission medium, the network systems endure from different attacks, together with tampering attacks, hijack attacks, forwarding attacks and Distributed denial of Service (DDoS) attacks [13, 14]. The simplest DDoS attack tends to exhaust the resources available to the


The simplest DDoS attack tends to exhaust the resources available to the victim node by sending additional unnecessary packets, thereby preventing valid network users from accessing the resources or services to which they are entitled. DDoS attacks cover not only adversaries' attempts to destroy, degrade or damage a network, but also any occurrence that decreases a network's ability to offer a service. In order to build robust IDS systems, it is important to have an overall awareness of the strengths and limitations of contemporary IDS research. Many existing solutions provide a high detection rate or a low false alarm rate in detecting DDoS attacks, but very few are able to prove their performance in real time. This paper introduces a novel IDS for DDoS attacks in which statistical, higher order statistical, EMA and improved holoentropy features are derived, and an optimized hybrid neuro-fuzzy classifier is then deployed for precise attack detection. Here, we try to improve the performance of the neuro-fuzzy classifier (NFC) by adapting the whale optimization algorithm (WOA) [15], moth flame optimization algorithm (MFO) [16] and grasshopper optimization algorithm (GOA) [17] for tuning the optimal membership functions of the hybrid NFC.

The paper is structured as follows. Section 2 gives an overview of related work. Section 3 portrays the proposed DDoS attack detection in WSN. Section 4 depicts the results and discussions. Conclusions are drawn in Sect. 5.

2 Related Work

Numerous methods have been developed for intrusion detection over the years. Many machine learning algorithms have been used in existing research, demonstrating their effectiveness through experiments on publicly available datasets. Fuzzy techniques have proved helpful in handling the problem of uncertainty, and optimization techniques have played a vital role in obtaining globally optimal solutions. In this section, the available literature on existing intrusion detection research is reviewed.

Baykara et al. suggested a scheme that minimizes the cost of maintenance, management and configuration, after analyzing the utilization of honeypots on corporate networks. The introduced scheme was a honeypot-based intrusion detection/prevention system (IDPS), and it was capable of demonstrating the network congestion on servers in real-time simulation. It can identify zero-day attacks owing to its modeling of intrusion detection (ID), which offers better performance compared with other IDSs. This scheme also assists in minimizing the false positive rate (FPR) in IDSs [18]. Zhang et al. suggested a NIDS based on hierarchical trust and dynamic state context in WSNs that was reliable and appropriate for steadily varying WSNs characterized by variations in the perceptual surroundings. The trust of cluster heads (CHs) was assessed by the neighboring BS and CHs, and thus the difficulty of the evaluation was reduced by the


other CHs in the network. In the meantime, a NIDS approach based on a trust threshold was portrayed that enhances applicability and reliability and is appropriate for cluster-based WSNs. The experiments and performance indicate that the adopted method functions better than the conventional system in malicious-node recognition and overhead [19]. Aldwairi et al. deployed a restricted Boltzmann machine (RBM) scheme for evaluating normal and abnormal NetFlow traffic. The developed method was evaluated using the Information Security Centre of Excellence (ISCX) dataset. The outcomes indicate that RBMs can be trained effectively to categorize anomalous and normal traffic flows. In contrast to earlier investigations, TPR and FPR metrics were used together with accuracy to analyze the efficiency of the RBM as a classifier [20]. Qu et al. developed a knowledge-based intrusion detection strategy (KBIDS). Here, a mean shift clustering algorithm (MSCA) was deployed for differentiating the abnormal patterns that reflected the atypical behavior of a WSN. Subsequently, an SVM scheme was deployed to increase the margin between abnormal and normal characteristics, so that the error could be reduced, which efficiently improves detection accuracy. Finally, a feature updating policy was introduced so that the scheme can adapt to changes in the network [21]. Hamed et al. deployed a NIDS that depends on two schemes, the bigram technique and the recursive feature addition (RFA) method; the scheme was designed, executed and analyzed accordingly. The bigram scheme assisted in encoding payload string characteristics for feature selection. Moreover, a novel estimation measure was employed that evaluated the varied systems and chose the optimal one. The modeled system revealed a perceptible enhancement in performance across diverse measures [22]. Selvakumar et al. illustrated an adaptive IDS depending on fuzzy rough sets for Allen's interval algebra and attribute selection. It was deployed on a network trace dataset for selecting an enormous amount of attack data for effective forecasting of attacks in wireless sensor networks (WSNs). Moreover, a fuzzy rough set-based nearest neighborhood classification (FRNN) approach was adopted for efficient categorization of network trace datasets [23]. Zha et al. introduced a complex matching accelerator (CMA) facilitated by a nonvolatile memory for speeding up intrusion detection systems with improved energy efficiency. Accordingly, the CMA could be modeled to offer a wide-ranging set of numerical matching operations, resulting in enhanced utilization and superior energy efficiency. On average, the CMA attains better area minimization, reduced energy utilization and enhanced exploration compared with conventional NIDS, and it also offers improvements in terms of energy, delay and area [24]. Narendrasinh et al. proposed the fuzzy lion Bayes system (FLBS) for intrusion detection by grouping the dataset into several clusters via a fuzzy clustering algorithm. The system derived optimal probability measures with the lion optimization algorithm. The model is then applied to each data group to generate the aggregated data, and abnormal nodes are found based on the posterior probability function.
The proposed system gave better performance than existing methods [25]. Premkumar et al. proposed a new lightweight denial of service (DoS)


detection scheme called the deep learning-based defense mechanism (DLDM) to detect and isolate attacks in the data forwarding phase (DFP). The proposed mechanism is applicable to networks whose nodes have low or no mobility [26]. Considering network complexity in terms of the large traffic volumes involved in network intrusion, Manimurugan et al. proposed the adaptive neuro-fuzzy inference system (ANFIS), an efficient hybrid technique for classification. To optimize ANFIS, the crow search optimization (CSO) algorithm is used; the parameters are optimized based on CSO, which exploits the crow's ability to memorize warning signals in danger. The performance of the proposed intrusion detection system was validated using a benchmark dataset, and the model gives improved accuracy and decreased FAR [27]. Borkar et al. suggested an ACSO model for overcoming the restrictions of efficient clustering models. In addition, a two-phase classifier scheme by means of SVM was implemented. Here, malicious nodes were reported by deploying an acknowledgment-oriented method that aided in detecting diverse attacks. Following the detection of these attacks, a high-level security model was deployed over the other SNs, resulting in safer transmission of packets among diverse SNs. The examination outcomes revealed finer results over the evaluated models [28]. Sedjelmaci et al. introduced a trade-off between the intrusion detection rate and overhead. Their framework does not eject a node immediately when it shows a weak indication of malicious actions, as this indication could be temporary or might merely be due to unreliable or noisy communication. Therefore, a trade-off between detection and FPR was considered, and these two security concerns were addressed by a Bayesian approach so as to precisely identify attacks with reduced overhead. Simulation outcomes illustrated that the adopted security model attains consistent recognition [3].

From the literature, it has been observed that the honeypot algorithm implemented in [18] recognizes zero-day attacks and offers a reduced FPR; however, it needs more information regarding the attacker. The FRNN approach introduced in [23] offers reduced FAR and higher accuracy, but genetic-oriented feature enhancement is not incorporated. The RBM algorithm deployed in [20] minimizes complexity and offers high accuracy, but it has to focus more on real-time data. In addition, the KBIDS approach exploited in [21] provides a reduced FPR and an increased detection rate; nevertheless, it needs consideration of battery depletion. Likewise, the RFA scheme exploited in [22] offers accurate detection and enhanced performance, but involves more computational complexity. The model employed in [19] offers better efficiency and reduced storage; however, there are possibilities of congestion due to increased data packet length. The CMA model exploited in [24] provides enhanced utilization and minimized area, but it has to consider further the numerical mapping of NIDS rules. Finally, the Bayesian approach suggested in [3] provides increased accuracy and high payoffs; however, it has to focus on embedding homogeneous UAV networks.


One of the most hazardous forms of malicious traffic on the Internet is the volumetric DDoS attack, which accounts for above 65% of all attacks. In a DDoS attack, numerous attackers coordinate the transmission of a high rate of useless data to overload the nearby network links or the victim's resources. Thus, recognizing DDoS attacks with a sophisticated detection system is significant. These disadvantages must be addressed to proficiently improve the performance of IDS models in the present work.

3 Proposed System

The proposed attack detection model comprises three vital phases: pre-processing, feature extraction and intrusion detection. At first, statistical features, higher order statistical features, technical indicator (EMA) and improved holoentropy features are extracted. Then, the features are classified via a neuro-fuzzy classifier whose membership function is optimally tuned, to detect the existence of a DDoS attack in the network. Finally, the conventional optimization techniques are analyzed with diverse feature combinations. The considered network comprises a number of nodes represented by N = {n_1, n_2, n_3, ..., n_{nm}}, wherein nm denotes the total node count. The destination and source nodes are selected randomly, the nodes are assigned as DDoS attack nodes or normal nodes in an arbitrary manner, and the distances among the nodes are specified initially. Figure 1 shows the diagrammatic illustration of the proposed model; the mechanism of the proposed research work is described below.

Fig. 1 Overall steps of developed attack detection model


The objective function Obj of the developed work is given in Eq. (1), wherein Acc signifies the detection accuracy obtained by training the system in an optimal manner, i.e., by tuning the membership function with the proposed algorithm:

Obj = \max(Acc)   (1)

3.1 Pre-processing

Initially, pre-processing is performed, during which data normalization is carried out. Data normalization organizes the data so that it appears consistent across all records and fields; it increases the cohesion of entry types, leading to cleaner, higher quality data. Here, all data are normalized to the range between 0 and 1.
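The paper states only that every value is scaled into [0, 1]; a minimal Python sketch of this step, assuming the common per-attribute min-max formula (the `records` array is hypothetical sample data), could look as follows.

```python
import numpy as np

def min_max_normalize(data):
    """Scale each column of a 2-D array into the range [0, 1]."""
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # avoid divide-by-zero
    return (data - col_min) / span

# Example: three traffic records with two raw attributes
records = [[120.0, 3.0], [480.0, 9.0], [300.0, 6.0]]
print(min_max_normalize(records))  # every value now lies in [0, 1]
```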

3.2 Extracting Features

From the pre-processed data, the statistical features (mean, median and standard deviation), higher order statistical features (entropy, skewness, kurtosis and percentiles), EMA and improved holoentropy features are extracted. The derived statistical features are indicated by FeST. The derived higher order statistical features are indicated by FeHO = SF1 + SF2 + SF3 + SF4, where SF1 is skewness, SF2 is kurtosis, SF3 is the entropy feature and SF4 is the percentile feature.

Skewness: It is a measure of symmetry or, more exactly, the lack of symmetry. A dataset or distribution is symmetric only if it looks the same to the left and right of the center point [29]. The mathematical expression of skewness SF1 is given in Eq. (2):

SF_1 = \frac{\sum_{i=1}^{k} (Y_i - \mu)^3 / k}{L^3}   (2)

Here, Y_1, Y_2, ..., Y_k are the data points, μ indicates the mean value, L denotes the standard deviation, and k refers to the number of data points. Moreover, L is calculated with k in the denominator rather than k − 1 while computing the skewness. The skewness value is near zero for any symmetric data and exactly zero for the normal distribution.

Kurtosis: It is a measure that identifies whether the data are light-tailed or heavy-tailed relative to the normal distribution. Datasets with low kurtosis tend to produce fewer


outliers [29], whereas datasets with larger kurtosis tend to produce more outliers. The mathematical formula of kurtosis SF2 for univariate data Y_1, Y_2, ..., Y_k is expressed in Eq. (3):

SF_2 = \frac{\sum_{i=1}^{k} (Y_i - \bar{Y})^4 / k}{L^4}   (3)

The standard deviation is calculated with k in the denominator rather than k − 1 while computing the kurtosis.

Entropy: Entropy is computed as in Eq. (4), where u and v are the coordinates of the co-occurrence matrix [30]:

SF_3 = Ent = -\sum_{u}\sum_{v} g(u, v) \log(g(u, v))   (4)

Percentile: It provides an idea of how the data values are spread over the interval from the smallest value to the largest value. About q percent of the data values fall below the qth percentile, and around (100 − q) percent of the data values exceed it. The percentile features are denoted by SF4 [31].

EMA: Assume g_i is a specific characteristic of a procedure, namely the event intensity, which is formulated as per Eq. (5), in which 0 < λ ≤ 1 indicates the weighting or smoothing constant, and z_i represents the exponentially weighted moving average statistic for the ith event intensity observation g_i:

z_i = λ g_i + (1 − λ) z_{i−1}   (5)

It is assumed that the procedure observation g_i is portrayed as in Eq. (6), wherein ε_i refers to a sequence of identically and independently distributed random variables with constant variance and zero mean:

g_i = g_{i−1} + ε_i − θ ε_{i−1}   (6)

The derived EMA features are indicated by FeEMA.

Improved Holoentropy Features: The holoentropy HL_ψ(υ) is the sum of the entropy and the total correlation of the random vector υ, and can be expressed as the sum of the entropies over all attributes, as modeled in Eq. (7). The weighted holoentropy We_ψ(υ) is the sum of the weighted entropy on each attribute of the random vector, as modeled in Eq. (8).

HL_ψ(υ) = H_ψ(υ) + C_ψ(υ) = \sum_{i=1}^{m} H_ψ(y_i)   (7)


We_ψ(υ) = \sum_{i=1}^{m} w_ψ(y_i) H_ψ(y_i)   (8)

Conventionally, the weight factor w_ψ(y_i) is computed based on a sigmoid function. In the improved model, the weight w_ψ(y_i) is computed based on a softmax-style formulation, as shown in Eq. (9):

w_ψ(y_i) = 2 \left( 1 − \frac{1}{1 + \exp(−H_ψ(y_i))} \right)   (9)

The above derived improved holoentropy features are indicated by FeIMH. The extracted statistical, higher order statistical, EMA and improved holoentropy features are then combined as Fe = FeST + FeHO + FeEMA + FeIMH and given as input to the hybrid neuro-fuzzy classifier for detecting intrusions.
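As an illustration of how such a combined feature vector might be assembled, the hedged Python sketch below computes one candidate version of FeST, FeHO, FeEMA and the improved holoentropy weighting of Eq. (9). The window of values, the smoothing constant λ, the histogram binning and the percentile choices are assumptions for illustration, not values specified in the paper.

```python
import numpy as np
from scipy.stats import skew, kurtosis, entropy

def extract_features(window, ema_lambda=0.3):
    """Assemble Fe = FeST + FeHO + FeEMA + FeIMH for one window of values."""
    w = np.asarray(window, dtype=float)

    # FeST: basic statistical features
    fe_st = [w.mean(), np.median(w), w.std()]

    # FeHO: higher order statistics (skewness, kurtosis, entropy, percentiles)
    hist, _ = np.histogram(w, bins=10, density=True)
    fe_ho = [skew(w), kurtosis(w), entropy(hist + 1e-12),
             np.percentile(w, 25), np.percentile(w, 75)]

    # FeEMA: exponentially weighted moving average, Eq. (5)
    z = w[0]
    for g in w[1:]:
        z = ema_lambda * g + (1 - ema_lambda) * z
    fe_ema = [z]

    # FeIMH: holoentropy weighted by the softmax-style factor of Eq. (9)
    h = entropy(hist + 1e-12)
    weight = 2.0 * (1.0 - 1.0 / (1.0 + np.exp(-h)))
    fe_imh = [weight * h]

    return np.array(fe_st + fe_ho + fe_ema + fe_imh)

print(extract_features([0.1, 0.4, 0.35, 0.8, 0.6, 0.2]))
```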

3.3 Optimized Hybrid Neuro-Fuzzy Classifier

The structure includes six layers: input, fuzzy membership, fuzzification, defuzzification, normalization and output. The classifier has multiple inputs and multiple outputs. The initial layer consists of nodes for the input features (X_1 to X_n), and every input is represented with three linguistic parameters [32]. The membership layer is the second layer, which maps the input values to membership functions. Here, a Gaussian membership function ρ_ij(X_sj) is deployed, as it involves fewer constraints and has a smooth derivative. Accordingly, ρ_ij(X_sj) of the jth feature and ith rule is specified by Eq. (10), wherein X_sj indicates the jth feature of the sth sample, and σ_ij and C_sj indicate the width and center of the Gaussian function, respectively:

ρ_ij(X_sj) = \exp\left( −(X_sj − C_sj)^2 / 2σ_ij^2 \right)   (10)

The firing strength ζ_is of the ith fuzzy rule is characterized as in Eq. (11), in which N denotes the feature count:

ζ_is = \prod_{j=1}^{N} ρ_ij(X_sj)   (11)

The weight between a class and an output rule should be larger than the weight between the same output rule and the other classes. The output weight β_ks of the sth sample for the kth class is approximated as in Eq. (12):

β_ks = \sum_{i=1}^{M} ζ_is ω_ik   (12)

In Eq. (12), ω_ik signifies the degree to which the ith rule belongs to the kth class, whereas M corresponds to the rule count. In a few cases, the sum of the weights might exceed 1; the network output is therefore normalized, and this normalization takes place at the fifth layer. The normalized value O_ks of the sth sample for the kth class is specified by Eq. (13):

O_ks = β_ks / \sum_{l=1}^{K} β_ls   (13)

Subsequently, the class label C_s for the sth sample is attained by finding the highest O_ks value; these classes comprise the output layer:

C_s = \max_{k=1,2,...,K} \{ O_ks \}   (14)

Thus, the output attained from the optimized neuro-fuzzy classifier provides the final classification. Here, the features are classified via the hybrid neuro-fuzzy classifier; in particular, the Gaussian membership functions (ρ) of the NFC are optimally tuned via the GOA, with the membership parameters of the NFC given as input to the optimization.
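A minimal sketch of one forward pass through such a classifier, following Eqs. (10)-(14), is given below. The randomly drawn centers, widths and rule-class weights merely stand in for the parameters that the GOA would actually tune.

```python
import numpy as np

def nfc_predict(X_s, centers, widths, omega):
    """One forward pass of the neuro-fuzzy classifier for a single sample.
    centers, widths: (M rules, N features); omega: (M rules, K classes)."""
    # Layer 2: Gaussian membership of each feature under each rule, Eq. (10)
    rho = np.exp(-((X_s - centers) ** 2) / (2.0 * widths ** 2))
    # Layer 3: rule firing strength as the product over features, Eq. (11)
    zeta = rho.prod(axis=1)          # shape (M,)
    # Layer 4: class weights, Eq. (12)
    beta = zeta @ omega              # shape (K,)
    # Layer 5: normalization, Eq. (13)
    O = beta / beta.sum()
    # Output layer: winning class, Eq. (14)
    return int(np.argmax(O)), O

rng = np.random.default_rng(0)
M, N, K = 4, 3, 2                    # rules, features, classes
label, scores = nfc_predict(rng.random(N),
                            rng.random((M, N)), 0.5 + rng.random((M, N)),
                            rng.random((M, K)))
print(label, scores)
```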

3.4 Grasshopper Optimization Algorithm (GOA)

GOA is inspired by the swarming behavior of grasshoppers. There are two important features of swarms: food source search and movement. Generally, the search procedure is separated into two phases, exploration and exploitation. During exploration the search agents move abruptly, while during exploitation they move locally. The numerical depiction of the swarming behavior is given in Eq. (15), wherein O_i points out the ith grasshopper position, I_i the social interaction, F_i the gravity force on the ith grasshopper and W_i the wind advection:

O_i = I_i + F_i + W_i   (15)

For producing random behavior, Eq. (15) is modified as revealed in Eq. (16), wherein n_1, n_2, n_3 indicate random numbers in [0, 1]:

O_i = n_1 I_i + n_2 F_i + n_3 W_i   (16)


Equation (17) portrays the social interactions, wherein a_ij indicates the distance between the ith and jth grasshoppers, assessed as a_ij = |R_j − R_i|, and â_ij = (R_j − R_i)/a_ij denotes a unit vector from the ith to the jth grasshopper:

I_i = \sum_{j=1, j \neq i}^{M} r(a_ij) â_ij   (17)

Equation (18) represents the social force strength r, wherein g specifies the attraction intensity and l specifies the attractive length scale:

r(q) = g e^{−q/l} − e^{−q}   (18)

The distance between two grasshoppers is mapped into the range [1, 4]. The F factor in Eq. (15) is computed as shown in Eq. (19), wherein g signifies the gravitational constant and ê_g denotes a unit vector toward the earth's center:

F_i = −g ê_g   (19)

In Eq. (15), W is computed as in Eq. (20), wherein v indicates a constant drift and ê_w denotes a unit vector in the direction of the wind:

W_i = v ê_w   (20)

On substituting the values of I, F and W into Eq. (15), it is remodeled as in Eq. (21), wherein M points out the grasshopper count:

O_i = \sum_{j=1, j \neq i}^{M} r(|R_j − R_i|) \frac{R_j − R_i}{a_ij} − g ê_g + v ê_w   (21)

Conventionally, the decreasing coefficient c is computed from its maximum and minimum values cmx and cmn as in Eq. (22), where t is the current iteration and t_max is the maximum iteration count:

c = cmx − t \frac{cmx − cmn}{t_max}   (22)

Here, the solution with the best objective value is taken as the target in GOA; the most accurate target found during the search is returned as the global optimum of the search space.


The grasshopper optimization algorithm is illustrated in Algorithm 1.

Algorithm 1: GOA
  Initialize the population
  Estimate the fitness as in Eq. (1)
  Consider T as the best search agent
  While (t < t_max)
    Update c based on Eq. (22)
    For (each search agent)
      Normalize the distances among grasshoppers
      Update the position based on Eq. (21)
      Bring the current search agent back if it goes outside the boundaries
    End for
    Update the value of T
    t = t + 1
  End while
  Return T
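A compact Python sketch of this loop is shown below. The social-force constants (g = 0.5, l = 1.5), the remapping of inter-agent distances into [2, 4) and the boundary handling are common implementation choices rather than details specified in the paper.

```python
import numpy as np

def goa(fitness, dim, n_agents=20, t_max=50, lb=0.0, ub=1.0,
        c_max=1.0, c_min=1e-4, seed=0):
    """Compact GOA loop following Algorithm 1 (maximizes `fitness`)."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lb, ub, size=(n_agents, dim))
    target = pos[np.argmax([fitness(p) for p in pos])].copy()

    def s(q):  # social force, Eq. (18), with g = 0.5 and l = 1.5
        return 0.5 * np.exp(-q / 1.5) - np.exp(-q)

    for t in range(t_max):
        c = c_max - t * (c_max - c_min) / t_max      # Eq. (22)
        new_pos = np.empty_like(pos)
        for i in range(n_agents):
            social = np.zeros(dim)
            for j in range(n_agents):
                if j == i:
                    continue
                d = np.linalg.norm(pos[j] - pos[i])
                unit = (pos[j] - pos[i]) / (d + 1e-12)
                # remap distances so the force stays in its
                # attraction/repulsion region (implementation choice)
                social += c * (ub - lb) / 2.0 * s(2.0 + d % 2.0) * unit
            new_pos[i] = np.clip(c * social + target, lb, ub)  # Eq. (21)-style update
        pos = new_pos
        scores = [fitness(p) for p in pos]
        if max(scores) > fitness(target):            # keep the best agent as T
            target = pos[int(np.argmax(scores))].copy()
    return target

# Toy check: the maximum of -sum((p - 0.5)^2) is at p = 0.5 in every dimension
print(goa(lambda p: -np.sum((p - 0.5) ** 2), dim=3))
```

In the IDS setting, `fitness` would wrap the objective of Eq. (1), i.e., the detection accuracy obtained with a candidate set of membership parameters.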

The implemented IDS approach encompasses the following steps:
• At first, the data normalization process is deployed during pre-processing; then the features FeST, FeHO, FeEMA and FeIMH are extracted.
• In the detection stage, the hybrid neuro-fuzzy classifier (NFC) provides the detected output by evaluating the trained information. In this phase, the system's membership parameters are optimally tuned via the grasshopper optimization algorithm.

4 Results and Discussion

This work evaluates the performance of the NFC together with optimization-based IDS detection models. The performance of the NFC and the optimization-based NFCs is evaluated using the UNSW-NB15 and DDoS [33, 34] datasets, represented as dataset 1 and dataset 2, respectively. The NFC is combined with standard metaheuristic optimization techniques used to find the best global solution, namely WOA, MFO and GOA. The performance of NFC, NFC with WOA, NFC with MFO and NFC with GOA is evaluated using accuracy, specificity, false negative rate (FNR), false positive rate (FPR) and negative predictive value (NPV), as presented in Table 1. It is observed that for dataset 1, the NFC with GOA model exposes more effective outputs than the compared schemes in terms of accuracy. The results in Fig. 2 show that the NFC with GOA approach provides better specificity, NPV and FPR values than those attained by NFC, NFC with WOA and NFC with MFO, respectively; the analysis thus establishes the enhanced efficiency of the NFC with GOA method. For dataset 2, as shown in Fig. 3, the integrated optimization models show the efficiency of NFC with GOA, NFC with WOA and NFC with MFO over the plain NFC model.


Table 1 Performance analysis of adopted and existing models for datasets 1 and 2

              Dataset 1                                    Dataset 2
              NFC       NFC+WOA   NFC+MFO   NFC+GOA        NFC       NFC+WOA   NFC+MFO   NFC+GOA
Specificity   0.8       0.801778  0.833333  0.838556       0.67125   0.673611  0.6775    0.673611
Accuracy      0.665281  0.668701  0.713436  0.729912       0.559336  0.56325   0.568421  0.56325
NPV           0.798005  0.800444  0.832178  0.83716        0.667081  0.670816  0.674689  0.670816
FPR           0.2       0.198222  0.166667  0.161444       0.32875   0.326389  0.3225    0.326389
FNR           0.4581    0.991185  0.529421  0.808815       0.661728  0.655647  0.647934  0.655647

The fine-tuned system ensures accurate detection of the presence of an attacker.

Fig. 2 Performance of proposed approach over compared approaches for dataset 1 (UNSW-NB15 dataset)

Fig. 3 Performance of proposed approach over compared approaches for dataset 2 (DDoS dataset)


5 Conclusion

This work presented, discussed and analyzed the performance of various population-based metaheuristic algorithms for detecting the presence of intrusions in the network. The improved holoentropy, EMA, statistical and higher order statistical features used in this analysis play an effective role. These features were classified via the hybrid neuro-fuzzy classifier, whose membership functions were optimally tuned via the GOA. The classifiers NFC, NFC with WOA, NFC with MFO and NFC with GOA were then deployed and analyzed with respect to the extracted features and with varied feature combinations, and the existence of DDoS attacks in the network was thereby detected. Finally, the superiority of the offered NFC with GOA scheme over the conventional schemes was established with respect to diverse measures.

References

1. Han ML, Kwak BI, Kim HK (2021) Event-triggered interval-based anomaly detection and attack identification methods for an in-vehicle network. IEEE Trans Inf Forensics Secur 16:2941–2956
2. Carrasco RSM, Sicilia MA (2018) Unsupervised intrusion detection through skip-gram models of network behavior. Comput Secur 78:187–197
3. Sedjelmaci H, Senouci SM, Ansari N (2016) Intrusion detection and ejection framework against lethal attacks in UAV-aided networks: a Bayesian game-theoretic methodology. IEEE Trans Intell Transp Syst 18(5):1143–1153
4. Baddar SAH, Merlo A, Migliardi M, Palmieri F (2018) Saving energy in aggressive intrusion detection through dynamic latency sensitivity recognition. Comput Secur 76:311–326
5. Song G, Khan F, Yang M (2018) Security assessment of process facilities–intrusion modeling. Process Saf Environ Prot 117:639–650
6. Bouhaddi M, Radjef MS, Adi K (2018) An efficient intrusion detection in resource-constrained mobile ad-hoc networks. Comput Secur 76:156–177
7. Colom JF, Gil D, Mora H, Volckaert B, Jimeno AM (2018) Scheduling framework for distributed intrusion detection systems over heterogeneous network architectures. J Netw Comput Appl 108:76–86
8. Gupta GP, Kulariya M (2016) A framework for fast and efficient cyber security network intrusion detection using apache spark. Procedia Comput Sci 93:824–831
9. Condomines JP, Zhang R, Larrieu N (2018) Network intrusion detection system for UAV ad-hoc communication
10. Shams EA, Rizaner A, Ulusoy AH (2018) Trust aware support vector machine intrusion detection and prevention system in vehicular ad hoc networks. Comput Secur 78:245–254
11. Viegas E, Santin A, Bessani A, Neves N (2019) BigFlow: real-time and reliable anomaly-based intrusion detection for high-speed networks. Futur Gener Comput Syst 93:473–485
12. Rajakumar BR, George A (2012) A new adaptive mutation technique for genetic algorithm. In: 2012 IEEE international conference on computational intelligence and computing research. IEEE, pp 1–7. https://doi.org/10.1109/ICCIC.2012.6510293
13. Zhang H, Wang Y, Chen H, Zhao Y, Zhang J (2017) Exploring machine-learning-based control plane intrusion detection techniques in software defined optical networks. Opt Fiber Technol 39:37–42


14. Swamy SM, Rajakumar BR, Valarmathi IR (2013) Design of hybrid wind and photovoltaic power system using opposition-based genetic algorithm with Cauchy mutation. https://doi.org/10.1049/ic.2013.0361
15. Mirjalili S, Lewis A (2016) The whale optimization algorithm. Adv Eng Softw 95:51–67
16. Mirjalili S (2015) Moth-flame optimization algorithm: a novel nature-inspired heuristic paradigm. Knowl-Based Syst 89:228–249
17. Saremi S, Mirjalili S, Lewis A (2017) Grasshopper optimisation algorithm: theory and application. Adv Eng Softw 105:30–47
18. Baykara M, Das R (2018) A novel honeypot based security approach for real-time intrusion detection and prevention systems. J Inf Secur Appl 41:103–116
19. Zhang Z, Zhu H, Luo S, Xin Y, Liu X (2017) Intrusion detection based on state context and hierarchical trust in wireless sensor networks. IEEE Access 5:12088–12102
20. Aldwairi T, Perera D, Novotny MA (2018) An evaluation of the performance of restricted Boltzmann machines as a model for anomaly network intrusion detection. Comput Netw 144:111–119
21. Qu H, Qiu Z, Tang X, Xiang M, Wang P (2018) Incorporating unsupervised learning into intrusion detection for wireless sensor networks with structural co-evolvability. Appl Soft Comput 71:939–951
22. Hamed T, Dara R, Kremer SC (2018) Network intrusion detection system based on recursive feature addition and bigram technique. Comput Secur 73:137–155
23. Selvakumar K, Karuppiah M, Sai Ramesh L, Islam SH, Hassan MM, Fortino G, Choo KKR (2019) Intelligent temporal classification and fuzzy rough set-based feature selection algorithm for intrusion detection system in WSNs. Inf Sci 497:77–90
24. Zha Y, Li J (2017) CMA: a reconfigurable complex matching accelerator for wire-speed network intrusion detection. IEEE Comput Archit Lett 17(1):33–36
25. Narendrasinh BG, Vdevyas D (2019) FLBS: fuzzy lion Bayes system for intrusion detection in wireless communication network. J Central South Univ 26(11):3017–3033
26. Premkumar M, Sundararajan TVP (2020) DLDM: deep learning-based defense mechanism for denial of service attacks in wireless sensor networks. Microprocess Microsyst 79:103278
27. Manimurugan S, Majdi AQ, Mohmmed M, Narmatha C, Varatharajan R (2020) Intrusion detection in networks using crow search optimization algorithm with adaptive neuro-fuzzy inference system. Microprocess Microsyst 79:103261
28. Borkar GM, Patil LH, Dalgade D, Hutke A (2019) A novel clustering approach and adaptive SVM classifier for intrusion detection in WSN: a data mining concept. Sustain Comput Inf Syst 23:120–135
29. https://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm
30. http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0001-37652013000301063
31. https://en.wikipedia.org/wiki/Percentile
32. Nkamgang OT, Tchiotsop D, Tchinda BS, Fotsin HB (2018) A neuro-fuzzy system for automated detection and classification of human intestinal parasites. Inf Med Unlocked 13:81–91
33. https://research.unsw.edu.au/projects/unsw-nb15-data-set
34. https://cloudstor.aarnet.edu.au/plus/index.php/s/2DhnLGDdEECo4ys#, https://cse-cic-ids2018.s3.ca-central-1.amazonaws.com/Processed%20Traffic%20Data%20for%20ML%20Algorithms/Wednesday-21-02-2018_TrafficForML_CICFlowMeter.csv

An Efficient Framework for Forecasting of Crime Trend Using Machine Learning Technique

Bam Bahadur Sinha and Tarun Biswas

Abstract Crime data analysis, response, prediction and recommendation have become a pressing concern in the modern era. The crime data analysis problem is challenging due to its diverse characteristics and dynamics, such as crime types, rate of crime occurrence and crime patterns. In this paper, a crime analysis, response and prevention approach is designed to identify crime-prone areas and uncover hotspots with a high likelihood of crime occurrence. The proposed crime analysis technique is simulated on the Chicago data set for crime hotspot identification and for mapping crime incidents to crime sites. The proposed work further analyses crime trends over a period of time to deduce the safest hours using graph objects. Moreover, crime occurrence trends are forecasted using simple time series methods and exponential smoothing techniques. The efficiency of the model is tested using Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE); low RMSE and MAPE values indicate the effectiveness of the proposed model.

Keywords Criminology · Data analysis · Machine learning · Crime trend forecasting

1 Introduction

With the proliferation of technological advancement, ensuring all-round public safety has become critical and one of the major concerns for criminologists as well as the criminal justice industry [1, 2]. Criminology threats continuously affect social, economic and personal well-being worldwide. Therefore, researchers continue contributing toward the design of effective strategies for accurate analysis of geo-referenced as well as non-geo-referenced crime data for accurate response and forecasting. After analyzing the crime data over a region based on the crime type, crime rate, crime pattern, hotspots, timing of crime, etc., crime prevention and reduction strategies are being proposed.

B. B. Sinha · T. Biswas (B)
Indian Institute of Information Technology Ranchi, Ranchi, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_52

Fig. 1 Fundamental steps in crime investigation (categories of crime; who committed the crime; what happened and at what place; duration and time; desired actions)

Fig. 2 Generic overview of criminology aspects (geo-referenced methods: hotspot-centered and spatio-temporal crime patterns; non-geo-referenced methods: conventional cluster algorithms, mathematical, data mining, machine learning, statistics and self-organising map approaches)

Generally, any crime investigation procedure goes through the steps shown in Fig. 1, and a generic overview of the illicit aspects of crime study, analysis, prediction, response and recommendation is highlighted in Fig. 2. The diverse characteristics and dynamics of crime data, such as crime types, crime location, crime pattern and rate of crime occurrence, together with the need for prompt crime response and prevention, make crime data analysis challenging. Motivated by these challenges, we have designed an efficient crime data analysis for hotspot identification and for forecasting crime occurrence using simple time series methods and exponential smoothing techniques. The overall contribution of the work is as follows.

1.1 Author's Contributions

• An efficient exploratory data analysis is performed over the Chicago data set by considering different factors such as crime types, crime rate and crime pattern.
• Crime hotspot detection/crime site analysis is done to visualize geo-spatial data by mapping crime incidents to crime sites.


• Analyzing the crime trend over a period of time for deducing the safest hours using graph objects.
• Finally, forecasting crime occurrence using simple time series methods and exponential smoothing techniques.

1.2 Organization

The rest of the paper is organized as follows. Related machine learning algorithm-based crime analysis is discussed in Sect. 2. In Sect. 3, the prerequisite experimental methodologies are presented. The proposed machine learning-enabled crime data analysis and crime occurrence forecasting approach is discussed in Sect. 4. Section 5 describes the overall experimental results and discussion. Finally, the work is concluded with future possibilities in Sect. 6.

2 Related Works

Several related works are available on crime prediction using machine learning techniques. A few recent related studies are discussed below.

A crime prevention and crime risk-aware route recommendation approach has been developed at road level, leveraging heterogeneous open data [2]. Accurate prediction and recommendation of low-risk routes are targeted by fusing large-scale urban sensing data. The proposed road-level two-phase framework first extracts important features from fine-grained, large-scale urban crime risk data and in turn recommends crime risk-aware routes. The feature extraction process starts with relevant external data sets and historical crime records and derives three categories of features: temporal features (e.g., date, time and meteorological data), spatial features (e.g., POIs, taxi mobility, demographic data and police station distribution) and recurrence crime features, which capture the unique near-repeat phenomenon of crime under data sparsity. Natural language processing-based crime data analysis techniques for the criminal justice industry are addressed in [3]. The authors used a level-wise, top-down hierarchical graph-based crime prediction using unsupervised learning. The level-wise graph is generated using a cosine similarity matrix for finding the relationship between entity pairs such as person to person (PER-PER), person to location (PER-LOC) and organization to person (ORG-PER). Simulation is conducted over Indian crime-related data sets consisting of 22 states and 5 union territories, ranging from 2004 to 2016.


Table 1 Summary of related crime analysis work

Algorithms | Objectives | Methodology | Improvements
Crime analysis, 2020 [2] | Crime prevention and reduced crime risk-aware route recommendations | A co-training and Bayes (CoBayes)-based method | Mobility features and crime risk tolerance
NLP, 2019 [3] | Crime patterns analysis | Word2Vec approach and cosine similarity matrix | Paraphrase basis extraction
CrimAnalyzer, 2019 [4] | Identification of hotspots and crime pattern | Non-negative matrix factorization | More entity relationship feature extraction can be deployed
Crime prediction and forecasting, 2021 [5] | Improve crime prediction and forecasting accuracy | Machine learning/deep learning approaches | Satellite imagery data sets can enhance the performance
Women safety, 2021 [6] | Inverse distance weighted for crime prevention and emergency response | Mobile application-based wearable device using GIS | Data sets can be scaled up geographically for enhancing performance measures

The authors of [4] proposed a visual crime analysis tool named CrimAnalyzer for the city of Sao Paulo. The crime analysis is done based on the behaviour of crimes in a locality; the crime characteristics are studied by exploring crime patterns in a region, identifying spatial crime hotspots and understanding crime dynamics over a period of time. They used a non-negative matrix factorization technique for hotspot identification. The data sets used for simulation cover several illicit activities, such as robbery, burglary and larceny, based on criminal activities in Sao Paulo, Brazil, during 2000–2006. An empirical study of crime analysis on the Chicago and Los Angeles crime data sets deploys several machine learning algorithms [5, 7]; the objective is to enhance the predictive accuracy for crimes and respond accordingly, and the deployed machine learning-based approaches are further enhanced with deep learning algorithms such as LSTM and ARIMA. A holistic framework for crime analysis, prevention and response for women's safety incorporates a technology-enabled wearable device [6]; the proposed work goes through four fundamental phases: designing a mobile application, prototyping a wearable device, GIS-enabled hotspot identification and, finally, crime monitoring, response and analysis through a website. Table 1 highlights the key works performed in the domain of crime analysis.


3 Experimental Methodologies

3.1 Naive Method

In the case of naive predictions, we effectively set all forecasts to the value of the most recent observation [8]. The naive method takes into account what occurred in the preceding period and forecasts that the same value will occur again in the future. Equation 1 gives the mathematical representation of the naive forecast:

\hat{X}_{T+h|T} = X_T   (1)

where \hat{X}_{T+h|T} denotes the estimate of X_{T+h} based on the previous data X_1, X_2, ..., X_T.

3.2 Simple Average Method

In this technique, all subsequent values are forecasted to be equal to the mean (or "average") of the past data [9], which is calculated via Eq. 2:

\hat{X}_{T+h|T} = \frac{X_1 + X_2 + \cdots + X_T}{T}   (2)

3.3 Simple Moving Average Method

In order to calculate a simple moving average at any specific instant in time, one just has to take the mean of a specified number of past time periods [10]. A ten-day simple moving average, for example, forecasts the stock price as the mean of the preceding ten days' prices.
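For concreteness, the three simple baselines of Sects. 3.1-3.3 can be expressed in a few lines of pandas; the monthly counts below are hypothetical, and the three-month window is an illustrative choice.

```python
import pandas as pd

# Hypothetical monthly crime counts (the paper samples at month-end frequency)
y = pd.Series([210, 198, 240, 225, 260, 251, 280],
              index=pd.date_range("2016-01-31", periods=7, freq="M"))
train, test = y.iloc[:-2], y.iloc[-2:]

naive_forecast = train.iloc[-1]                           # Eq. (1): repeat last value
simple_average = train.mean()                             # Eq. (2): mean of history
moving_average = train.rolling(window=3).mean().iloc[-1]  # mean of last 3 months

print(naive_forecast, simple_average, round(moving_average, 2))
```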

3.4 Simple Exponential Smoothing

While exponential smoothing is often employed for short-term projections, long-term forecasts made using this method may be misleading. Exponential smoothing approaches increase the weights of more recent data, and the weights drop exponentially as the distance between observations increases. These strategies are most successful when the parameters of the time series change slowly over time. This approach to predicting the time series is mostly employed when the data do not have any seasonal pattern [11]. A weighted form of moving averages with decreasing weights is used in this technique. The mathematical formula is given in Eq. 3:


\hat{X}_T = α X_T + (1 − α) \hat{X}_{T−1}   (3)

where α represents the smoothing weight.

3.5 Holt's Method with Trend

Forecasting trends and seasonality may be accomplished using the Holt-Winters time series forecasting technique. Three different smoothing procedures, namely simple, Holt's and Winters', combine to form the Holt-Winters method [12], so exponential smoothing is applied three times in this technique. Forecasting time series with a linear trajectory and seasonal variation may be done with the help of this approach.

3.6 Holt-Winters' Additive Method with Trend and Seasonality

This method is an extension of Holt's exponential smoothing that adds seasonality. The level of the forecast, its trend and its seasonal change are all exponentially smoothed using this approach. Adding the seasonality to the trended forecast produces the Holt-Winters' seasonal additive estimate [13]. This approach works well with data whose trend and seasonality do not grow over time. To demonstrate the seasonal fluctuations in the data, it generates a curved forecast.

3.7 Holt-Winters' Multiplicative Method with Trend and Seasonality

This method is comparable to the Holt-Winters' additive approach. The multiplicative technique likewise produces exponentially smoothed estimates for the level, trend and seasonal adjustment based on Holt-Winters' method, but the Holt-Winters' multiplicative forecast is created by multiplying the trended forecast by the seasonality [13]. This approach is best suited for data with a rising trend and rising seasonality; by accounting for the seasonality of the data, it generates an accurate prediction.
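All four exponential smoothing variants of Sects. 3.4-3.7 are available in statsmodels; a hedged sketch on synthetic monthly counts (the trend, seasonality and smoothing level are illustrative assumptions, not the paper's fitted values) is shown below.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import SimpleExpSmoothing, ExponentialSmoothing

# Hypothetical monthly crime counts with an upward trend and yearly seasonality
t = np.arange(36)
y = pd.Series(300 + 2.0 * t + 25 * np.sin(2 * np.pi * t / 12),
              index=pd.date_range("2014-01-31", periods=36, freq="M"))

ses = SimpleExpSmoothing(y).fit(smoothing_level=0.3)            # Sect. 3.4
holt = ExponentialSmoothing(y, trend="add").fit()               # Sect. 3.5
hw_add = ExponentialSmoothing(y, trend="add", seasonal="add",
                              seasonal_periods=12).fit()        # Sect. 3.6
hw_mul = ExponentialSmoothing(y, trend="add", seasonal="mul",
                              seasonal_periods=12).fit()        # Sect. 3.7

print(hw_add.forecast(3))  # three months ahead with the additive method
```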

4 Proposed Model

In this section, the flow of the work done is discussed. Figure 3 illustrates the proposed framework for analysing and forecasting crimes in Chicago. The complete framework comprises five stages. The first stage of the proposed model deals with loading the data set, pre-processing the data set and performing exploratory data analysis to extract meaningful information from the data set.

An Efficient Framework for Forecasting of Crime Trend Using . . .

Fig. 3 Proposed framework


The second stage analyses the crime sites and detects the crime hotspots; here the main focus is on mapping the crime hotspots with respect to district, distant locations and type of crime. Once the crime hotspots are detected, the crime trend is analysed for different crime types over the years 2012–2017 in the third stage of the proposed framework. In this stage, the impact of different crimes during day and night is also analysed to deduce the safest hours as per the crime rates. The fourth stage deals with making future predictions of the crime rate using different time series analysis techniques, namely simple time series methods (naive method, simple average method, simple moving average method) and exponential smoothing techniques (simple exponential smoothing, Holt's method with trend, Holt-Winters' additive method with trend and seasonality, and Holt-Winters' multiplicative method with trend and seasonality) [14]. The performance of forecasts made by the different time series techniques is tested in the last stage of the proposed framework by mapping the actual and forecasted trends. Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE) values are used to measure the error; lower RMSE and MAPE values indicate a better performing model.
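The two error measures used in the last stage follow their standard definitions; a small sketch of how they might be computed is given below, with hypothetical actual and forecasted monthly counts.

```python
import numpy as np

def rmse(actual, forecast):
    """Root Mean Square Error over paired observations."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.sqrt(np.mean((actual - forecast) ** 2))

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

actual = [2500, 2650, 2400]    # hypothetical observed monthly counts
forecast = [2450, 2700, 2350]  # hypothetical forecasted counts
print(round(rmse(actual, forecast), 2), round(mape(actual, forecast), 2))
```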

5 Experimental Outcomes and Discussion

This section of the paper discusses the output obtained at each stage of the proposed model and highlights its key findings. The proposed framework uses the Chicago data set (https://www.kaggle.com/currie32/crimes-in-chicago) for experimentation and crime analysis; the data set is publicly available on the Kaggle website. Table 2 gives the description of the different attributes present in the data set. After loading the data set, it is checked for missing values, and all missing/NULL values are dropped. Once the pre-processed data set is ready, exploratory data analysis (EDA) is performed. The EDA provides meaningful insights related to the data set attributes, the crime rate (with respect to hour, day, month and year) and the crime type in any particular premise. Figure 4 illustrates the correlation among the different attributes present in the data set. After analysing the correlation values, it can be observed that certain pairs of attributes are highly correlated with each other, for example X-Coordinate and Longitude: 1.0, Y-Coordinate and Latitude: 1.0, District and Ward: 0.69, and so on. The rate of crime occurrence over a specific period of time (hour, day, month, year) is illustrated in Fig. 5. During the exploratory phase, the types of crime and their subtypes are visualized, and it is observed that most crime cases belong to the 'Theft' category, so preventive measures can be taken by the police department to handle such incidents. Figure 6 illustrates the frequency of primary crimes and their subtypes. It can also be concluded from the subtype plot that most of the 'Theft' cases are related to the '$500 and under' category.



Table 2 Chicago data set description

Attribute | Description
ID | Unique identifier for recognizing the record
Arrest | Boolean value indicating whether any arrest was made
Beat | Indicates the beat description when the crime occurred
Block | Block address representing a partial address
Case number | RD number assigned by the Chicago Police
Community area | Which of the 77 communities the crime occurred in
Date | Day of incident occurrence
Description | Description (secondary) of the corresponding IUCR code
District | District where the crime occurred
Domestic | Indicates whether the crime is related to domestic issues
FBI code | Classification of the crime as per the FBI
IUCR | Four-digit code for classifying the criminal incident
Latitude | Location latitude
Longitude | Location longitude
Location | Details of location in geographic and map format
Location description | Details of the location where the crime occurred
Primary type | Description (primary) of the corresponding IUCR code
Updated on | Time and date on which the record was updated
Ward | Ward where the crime occurred
X coordinate | Location where the crime occurred (X-axis)
Y coordinate | Location where the crime occurred (Y-axis)
Year | Year in which the crime occurred

After analysing the data set, the next step is to detect the crime hotspots based on the overall crime rate or based on the occurrence of a certain crime in a specific location. For detecting the hotspots and indicating them on the Chicago map, a small subset of the crime data set comprising 2500 instances has been considered. A specific location is considered a crime hotspot only if the crime rate is greater than 500 in that area; the threshold of 500 is taken just to give a glimpse of how hotspots can be generated on the map. The hotspots can be further narrowed down based on the type of crime. Figure 7 illustrates the crime hotspots based on total crime rates and based on the crime type 'Public Peace Violation', respectively; these hotspots are generated if the number of cases related to 'Public Peace Violation' is greater than 5. The value based on which crime hotspots are generated can be adjusted by the police department as per their objective. After detecting the crime hotspots in specific locations of Chicago, the crime trend is analysed and forecasted using the different time series approaches discussed in Sect. 3. The crime trend is analysed over the years 2012 to 2017; the trend analysis of each crime type helps in deducing the safest hours using graph objects. Figure 8 illustrates the crime type frequency during the day and night time. The heatmap clearly indicates that the crime rate during the night is higher than during the day.
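A minimal sketch of this thresholding step is given below. The column names and the scaled-down thresholds are assumptions chosen to fit a toy frame; the paper's actual cut-offs of 500 (overall) and 5 ('Public Peace Violation') are noted in the comments.

```python
import pandas as pd

# Toy stand-in for the pre-processed Chicago records (column names assumed)
df = pd.DataFrame({
    "Community Area": [25, 25, 8, 8, 8, 43],
    "Primary Type": ["THEFT", "PUBLIC PEACE VIOLATION", "THEFT",
                     "THEFT", "PUBLIC PEACE VIOLATION", "THEFT"],
})

# Hotspots by total crime count per area (paper uses > 500 on the full subset)
crime_counts = df.groupby("Community Area").size()
hotspots = crime_counts[crime_counts > 2]

# Hotspots for one crime type (paper uses > 5 for 'Public Peace Violation')
ppv = df[df["Primary Type"] == "PUBLIC PEACE VIOLATION"]
ppv_counts = ppv.groupby("Community Area").size()
ppv_hotspots = ppv_counts[ppv_counts > 1]

print(hotspots, ppv_hotspots, sep="\n")
```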


Fig. 4 Correlation among attributes of Chicago data set

Fig. 5 Crime rate w.r.t hour, day, month and year (2012–2017)


Fig. 6 Frequency of primary crimes and its subtype

Fig. 7 Crime site analysis/crime hotspots: a based on total crimes in Chicago; b based on 'Public Peace Violation' crime

Fig. 8 Crime trend during day and night time


Fig. 9 Crime trend analysis

Figure 9 illustrates the trends of 18 crime types, namely Criminal Damage, Criminal Trespass, Deceptive Practice, Gambling, Homicide, Human Trafficking, Interference with Public Officer, Intimidation, Kidnapping, Liquor Law Violation, Motor Vehicle Theft, Narcotics, Non-Criminal, Non - Criminal, Non-Criminal (Subject Specified), Obscenity, Offense Involving Children and Other Narcotic Violation (the three 'Non-Criminal' variants appear as distinct category labels in the data set). The trend analysis of crime gives a clear picture of the safest hours. The police department can not only analyse the crime trend based on past crime records but can also forecast the crime trend that may occur in the near future.


Fig. 10 Mapping of actual and forecasted trend


Table 3 Performance testing of forecasted trend

Technique | RMSE | MAPE
Naive method | 3672.13 | 16.07
Simple average method | 4717.98 | 19.73
Simple moving average forecast | 3573.08 | 14.89
Simple exponential smoothing forecast | 3651.22 | 15.94
Holt's exponential smoothing method | 3837.12 | 17.14
Holt-Winters' additive method | 2368.40 | 8.32
Holt-Winters' multiplicative method | 2304.46 | 8.64

In our proposed framework, different time series methods have been utilized to predict future crime incidents. The efficiency of each prediction model is tested using RMSE and MAPE. Figure 10 demonstrates the mapping of the actual and predicted crime trends over time. The time series data set has been divided into an 80/20 train/test split, with the series sampled at month-end frequency. Table 3 reports the error values obtained by the different simple time series methods and exponential smoothing methods. The Holt-Winters' additive method yields the minimum MAPE of 8.32, while the Holt-Winters' multiplicative method yields the minimum RMSE of 2304.46.
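A sketch of the month-end aggregation and chronological 80/20 split described above might look as follows; the 'Date' column name is an assumption based on Table 2.

```python
import pandas as pd

# Toy frame with one row per incident and a parsed 'Date' column (assumed name)
df = pd.DataFrame({"Date": pd.date_range("2012-01-01", "2016-12-31", freq="D")})
monthly = df.set_index("Date").resample("M").size()   # month-end crime counts

split = int(len(monthly) * 0.8)                       # 80/20 chronological split
train, test = monthly.iloc[:split], monthly.iloc[split:]
print(len(train), len(test))                          # 48 and 12 months
```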

6 Conclusion and Future Work

Accurate crime data analysis is challenging due to heterogeneous characteristics and dynamic aspects such as crime types, crime location, crime pattern and rate of crime occurrence, together with the need for prompt crime response and prevention schemes. In this work, a machine learning-based crime analysis, response and prevention approach is designed. The proposed crime analysis technique is simulated on the Chicago data set, considering different factors such as crime types, crime rate and crime pattern. Crime hotspot identification is then conducted to visualize geo-spatial data by mapping crime incidents to crime sites. We further analyse the crime trend over a period of time for deducing the safest hours using graph objects, and we forecast crime occurrence using simple time series methods and exponential smoothing techniques.

References

1. Xia Z, Stewart K, Fan J (2021) Incorporating space and time into random forest models for analyzing geospatial patterns of drug-related crime incidents in a major US metropolitan area. Comput Environ Urban Syst 87:101599
2. Zhou B, Chen L, Zhou F, Li S, Zhao S, Das SK, Pan G (2020) Escort: fine-grained urban crime risk inference leveraging heterogeneous open data. IEEE Syst J 15(3):4656–4667


3. Das P, Das AK, Nayak J, Pelusi D, Ding W (2019) A graph based clustering approach for relation extraction from crime data. IEEE Access 7:101269–101282
4. Zanabria GG, Silveira JA, Poco J, Paiva A, Nery MB, Silva CT, Nonato LG (2019) CrimAnalyzer: understanding crime patterns in Sao Paulo. IEEE Trans Vis Comput Graph 01:1
5. Kshatri SS, Singh D, Narain B, Bhatia S, Quasim MT, Sinha GR (2021) An empirical analysis of machine learning algorithms for crime prediction using stacked generalization: an ensemble approach. IEEE Access 9:67488–67500
6. Shenoy MV, Sridhar S, Salaka G, Gupta A, Gupta R (2021) A holistic framework for crime prevention, response, and analysis with emphasis on women safety using technology and societal participation. IEEE Access 9:66188–66207
7. Sinha BB, Dhanalakshmi R (2019) Evolution of recommender system over the time. Soft Comput 23(23):12169–12188
8. Papacharalampous G, Tyralis H, Koutsoyiannis D (2018) Predictability of monthly temperature and precipitation using automatic time series forecasting methods. Acta Geophys 66(4):807–831
9. Karasu S, Altan A, Bekiros S, Ahmad W (2020) A new forecasting model with wrapper-based feature selection approach using multi-objective optimization technique for chaotic crude oil time series. Energy 212:118750
10. Singh SN, Mohapatra A (2019) Repeated wavelet transform based ARIMA model for very short-term wind speed forecasting. Renew Energy 136:758–768
11. Ostertagová E, Ostertag O (2011) The simple exponential smoothing model. In: The 4th international conference on modelling of mechanical and mechatronic systems, Technical University of Košice, Slovak Republic, pp 380–384
12. Sakizadeh M, Mohamed MM, Klammler H (2019) Trend analysis and spatial prediction of groundwater levels using time series forecasting and a novel spatio-temporal method. Water Resour Manag 33(4):1425–1437
13. Lima S, Gonçalves AM, Costa M (2019) Time series forecasting using Holt-Winters exponential smoothing: an application to economic data. In: AIP conference proceedings, vol 2186, no 1. AIP Publishing LLC, p 090003
14. Anderson TW (2011) The statistical analysis of time series, vol 19. Wiley, Hoboken

Performance Evaluation of a Novel Thermogram Dataset for Diabetic Foot Complications

Naveen Sharma, Sarfaraj Mirza, Ashu Rastogi, Prasant K. Mahapatra, and Satbir Singh

Abstract Diabetes, a metabolic disease, has been designated as a "public health priority" in the majority of countries due to its increasing global prevalence. Diabetic foot ulcer is one of the most serious complications of the disease, and if not treated promptly, it can lead to lower extremity amputation. Advancement in thermography imaging technology plays a pivotal role in the early diagnosis of foot ulceration. This article presents a new plantar foot dataset developed under Indian healthcare conditions. The dataset includes 208 individual foot thermograms collected from 71 diabetic patients with some diabetic abnormalities in the plantar foot and 33 non-diabetic (controlled) subjects. This dataset was compared to the Hernandez-Contreras dataset. The findings were analyzed to determine the data's suitability and future potential in promoting research into the non-invasive diagnosis of diabetic foot ulcers.

Keywords Thermogram dataset · Diabetic foot complications · Infrared imaging · Image segmentation · Image classification

N. Sharma (B), Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India; e-mail: [email protected]
N. Sharma · S. Mirza · P. K. Mahapatra, CSIR-Central Scientific Instruments Organisation, Chandigarh 160030, India
A. Rastogi, Postgraduate Institute of Medical Education and Research, Chandigarh 160012, India
S. Singh, National Institute of Technology, Jalandhar 144011, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_53

1 Introduction

Diabetes and its associated diseases are now highly prevalent, particularly in India. According to the International Diabetes Federation, India has the second-highest number of diabetics (72 million) [1]. The figure is expected to rise to 120 million


over the next two decades. Diabetes is the root cause of many diseases, including chronic kidney disease (nephropathy), neurological disease (neuropathy), retinal disease (retinopathy), and cardiovascular problems [2]. Diabetic patients were found to develop peripheral artery disease (PAD) and its associated problems with the lower limbs, often known as diabetic foot ulcer (DFU) [3]. It is known that approximately 15–25% of diabetic patients will develop a foot ulcer at some point in time [4]. A cumulative effect of undiagnosed continuous trauma at pressure points on the foot sole is one of the causes of foot ulcers. Chronic illnesses and obesity associated with an aging population in India, such as circulatory insufficiency, diabetic neuropathy, or a combination of these pathologies, mirror the current rise in cases of foot ulcers [5]. Diabetic foot ulcers can be avoided if symptoms of diabetic foot complications are detected early and treated in a timely manner. Early detection requires regular examination, which might be limited for many reasons [6]. Traditional non-invasive skin assessment procedures, such as visual examination and palpation, can be useful diagnostic tools; however, they rarely reveal the complete abnormalities in skin patches until skin breakdown has occurred [7]. An abnormal rise in foot temperature is considered a sign of abnormality [8]. A recent research study found that patients with a diabetic abnormality in the plantar foot have a raised foot temperature (32–35 °C) compared with that of normal subjects (27–30 °C) [2]. To measure foot inflammation, clinicians use a manual thermometer; however, it only shows a mean temperature value over a wide area of the foot and fails to monitor the entire temperature distribution over the plantar foot [9]. This shortcoming is resolved by using a thermal imaging-based technique. This technique is non-invasive and non-radiative; hence, it does not cause any harm to patients even when used for regular examinations. A rising plantar foot temperature is a significant indication of underlying inflammation; hence, thermal imaging is a potential tool for achieving this goal [10]. Temperature measurements can provide quantifiable data that can be used to predict ulceration more effectively. A significant and sufficient dataset is necessary to construct such a prediction model for the early detection of DFU. To the best of our knowledge, only one standard open-source thermal dataset is available, namely Hernandez-Contreras' "The plantar thermogram database" [11]. The database contains 334 unique foot thermograms, comprising 122 diabetic and 45 non-diabetic patients. Only pre-ulceration images are included in this dataset. However, there is no specific database available that has been developed under Indian healthcare conditions and includes both ulcerated and pre-ulcerated subjects. For the last one and a half years, we have been developing a thermal dataset for diabetic foot ulcer patients. So far, we have developed a dataset of 208 individual foot thermograms collected from 71 diabetic patients with some abnormalities in the plantar foot and 33 non-diabetic (controlled) subjects. The enrolled subjects include pre-ulcer, ulcerated, and control groups. In this paper, we compare our novel dataset to the benchmark dataset developed by Hernandez-Contreras (H-C) [11] using traditional thermal and texture-based


features. Section 2 describes the methodology for feature extraction and classification. Finally, for both databases, we applied various machine learning models and evaluated their classification accuracy.

2 Methodology

The proposed methodology for analysing plantar foot thermograms and predicting abnormalities is shown in Fig. 1.

2.1 Data Acquisition and Preprocessing

The proposed thermal image dataset of diabetic foot patients was collected after ethical approval (Ref. No-2020/000170) was obtained from the Research Ethics Committee of the PGIMER, Chandigarh. We used a FLIR E60 thermal camera to take

Fig. 1 Flowchart of the proposed methodology: data acquisition → data preprocessing → patches formation → feature extraction (mean temperature (MT), standard deviation (SD), entropy, contrast, dissimilarity, homogeneity, energy and correlation) → classifiers (logistic regression, random forest, decision tree, gradient boosting, Naive Bayes, K-NN, SVM, ANN) → testing and evaluation (normal vs. diabetic abnormality present)


Fig. 2 Data acquisition setup

the dataset images. The camera has a 320 × 240 pixel resolution and a field of view of 25° × 19°. The operating wavelength range in the infrared spectrum is from 7.5 to 13 μm, and the thermal sensitivity is less than 0.07 °C. The data acquisition setup is shown in Fig. 2. All other protocols were followed as per the approved ethical clearance. Plantar thermogram images of the controlled group (Fig. 3a) and the diabetic patient (Fig. 3c) were captured. To analyze the thermal data, it becomes necessary to remove the image's unwanted noise [12]. The region of interest (ROI), i.e., the left and right plantar feet, was segmented using the Otsu thresholding method [13]. Figure 3b, d depict the corresponding segmented ROIs for the controlled and diabetic groups. Hernandez-Contreras's database provides preprocessed segmented plantar thermograms for diabetic patients and normal subjects, as shown in Fig. 3e, f.
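A minimal sketch of this segmentation step, assuming the thermogram has been exported as an 8-bit grayscale image (the file name is hypothetical); OpenCV's Otsu thresholding is used here for illustration, not as the paper's exact implementation:

```python
# Sketch of Otsu-based ROI segmentation of a plantar thermogram.
import cv2

# "thermogram.png" is a hypothetical 8-bit grayscale export of the thermal image.
img = cv2.imread("thermogram.png", cv2.IMREAD_GRAYSCALE)

# Otsu's method picks the threshold that minimises intra-class variance,
# separating the warm plantar foot from the cooler background.
_, mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Keep only the foot pixels; everything else is zeroed out.
roi = cv2.bitwise_and(img, img, mask=mask)
cv2.imwrite("segmented_roi.png", roi)
```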

Fig. 3 Preprocessing of a normal subject (a, segmented ROI in b) and a diabetic subject (c, segmented ROI in d) for the proposed dataset; e controlled subject and f diabetic patient, segmented thermograms from the H-C dataset


2.2 Feature Extraction

Fig. 4 Patches and data labeling: a our database, b Hernandez-Contreras's database

The ROI was further subdivided into the right and left feet. To localize the foot abnormality present in the ROI, each foot was divided into three equal patches, i.e., forefoot, midfoot, and heel (Fig. 4). Thermal and texture (GLCM) features were calculated from each patch and used as inputs to our training model [14, 15]. The mean temperature (MT) and standard deviation (SD) of each patch were calculated, and second-order texture features were extracted from the segmented ROIs using the gray-level co-occurrence matrix (GLCM). Six GLCM features were extracted from the ROI, namely entropy, contrast, dissimilarity, homogeneity, energy, and correlation [16]. Finally, classifiers were trained using these combined features.
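A minimal sketch of the per-patch feature computation described above, using scikit-image's GLCM utilities; the helper name `extract_features`, the synthetic patch, and the single-distance/single-angle GLCM configuration are our assumptions, not settings specified in the paper:

```python
# Sketch: MT, SD and six GLCM texture features for one foot patch.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def extract_features(patch, patch_u8):
    # patch: 2-D array of temperature values; patch_u8: its 8-bit quantisation.
    glcm = graycomatrix(patch_u8, distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)
    p = glcm[:, :, 0, 0]
    # GLCM entropy is not provided by graycoprops, so compute it directly.
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    feats = {"mean_temperature": patch.mean(),  # MT
             "std_deviation": patch.std(),      # SD
             "entropy": entropy}
    for prop in ("contrast", "dissimilarity", "homogeneity",
                 "energy", "correlation"):
        feats[prop] = graycoprops(glcm, prop)[0, 0]
    return feats

# Synthetic stand-in patch (temperatures in the 27-35 °C range).
rng = np.random.default_rng(0)
patch = 27 + 8 * rng.random((40, 40))
patch_u8 = np.uint8(255 * (patch - patch.min()) / np.ptp(patch))
print(extract_features(patch, patch_u8))
```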


2.3 Feature Selection and Classification

Each patch's computed features are labeled as normal or diabetic (Fig. 4). Our proposed dataset was labeled by medical experts, whereas Hernandez-Contreras' dataset is pre-labeled. To test accuracy and repeatability, the labeled dataset was classified using well-known classifiers such as logistic regression, random forest, decision tree, gradient boosting, Naive Bayes, K-nearest neighbours, support vector classifier, and a neural network-based MLP classifier.
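As an illustration of this classification step, the sketch below trains the same eight scikit-learn classifier families on synthetic stand-in data (the real per-patch features and labels are not reproduced here); all hyperparameters are library defaults, not the paper's settings:

```python
# Sketch: comparing the eight classifier families on stand-in data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# Eight features per patch, mimicking MT, SD and the six GLCM features.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)

classifiers = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Random forest": RandomForestClassifier(),
    "Decision tree": DecisionTreeClassifier(),
    "Gradient boosting": GradientBoostingClassifier(),
    "Naive Bayes": GaussianNB(),
    "K-NN": KNeighborsClassifier(),
    "SVM": SVC(),
    "ANN (MLP)": MLPClassifier(max_iter=1000),
}
for name, clf in classifiers.items():
    score = clf.fit(X_train, y_train).score(X_test, y_test)
    print(f"{name}: accuracy = {100 * score:.2f}%")
```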

3 Result and Discussion

The primary goal of this study is to create a thermal dataset for diabetic foot patients that is appropriate for Indian healthcare conditions. The dataset was compared to an existing dataset to test its accuracy and repeatability. Plantar thermograms from the H-C and proposed datasets were divided into patches, and thermal and GLCM features were calculated for each patch. Different classifiers were used to categorize these features. A quantitative analysis was carried out to evaluate the performance of the various classifiers. The parameters used for the quantitative analysis are as follows:

$$\mathrm{Accuracy} = \frac{T_p + T_n}{T_p + T_n + F_p + F_n} \times 100 \tag{1}$$

$$\mathrm{Precision} = \frac{T_p}{T_p + F_p} \tag{2}$$

$$\mathrm{Recall} = \frac{T_p}{T_p + F_n} \tag{3}$$

$$F_1\ \mathrm{score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4}$$
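The four measures above map directly onto standard scikit-learn metric functions; a minimal sketch (ours, not the paper's code) with illustrative label vectors:

```python
# Sketch: computing Eqs. (1)-(4) with scikit-learn.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # illustrative ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # illustrative classifier output

print("accuracy  :", 100 * accuracy_score(y_true, y_pred))  # Eq. (1)
print("precision :", precision_score(y_true, y_pred))       # Eq. (2)
print("recall    :", recall_score(y_true, y_pred))           # Eq. (3)
print("f1 score  :", f1_score(y_true, y_pred))               # Eq. (4)
```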

Each classifier model was run three times to test its repeatability and suitability. The outcomes are shown in Tables 1 and 2; the accuracy matrices for our database are shown in Table 1. Analyzing the performance of all classifiers revealed that the random forest and gradient boosting classifiers dominated all other classifiers for both datasets; however, the accuracy on our dataset (Table 1) is slightly higher than on the H-C dataset (Table 2). The primary difference between the proposed dataset and the H-C dataset is that we included both pre-ulcerate and ulcerate patients, whereas the H-C dataset only included pre-ulcerate patients for diagnosing the diabetic abnormality in the plantar foot. A comparative analysis of both datasets is shown in Fig. 5.

Table 1 Quantitative analysis of different classifiers for the proposed database (each classifier run three times; values given as run 1 / run 2 / run 3)

S. No.  Classifier                     Accuracy (%)            Average precision    Average recall       Average F1-measure
1       Decision tree classifier       87.03 / 84.93 / 87.86   0.84 / 0.77 / 0.82   0.81 / 0.78 / 0.82   0.82 / 0.77 / 0.82
2       Gaussian                       76.56 / 74.78 / 74.47   0.79 / 0.76 / 0.69   0.74 / 0.77 / 0.77   0.76 / 0.76 / 0.70
3       Logistic regression            87.87 / 87.45 / 86.61   0.84 / 0.65 / 0.81   0.78 / 0.70 / 0.78   0.80 / 0.66 / 0.79
4       KNN                            89.95 / 56.61 / 87.87   0.80 / 0.81 / 0.83   0.77 / 0.76 / 0.80   0.79 / 0.78 / 0.81
5       SVC                            88.70 / 87.67 / 88.70   0.77 / 0.80 / 0.85   0.78 / 0.73 / 0.80   0.77 / 0.76 / 0.82
6       NN (MLP)                       87.03 / 84.94 / 87.87   0.83 / 0.82 / 0.82   0.82 / 0.76 / 0.82   0.83 / 0.78 / 0.82
7       Random forest classifier       90.37 / 88.70 / 91.21   0.85 / 0.76 / 0.88   0.86 / 0.77 / 0.84   0.85 / 0.76 / 0.86
8       Gradient boosting classifier   91.63 / 89.54 / 89.54   0.65 / 0.83 / 0.86   0.72 / 0.79 / 0.82   0.67 / 0.81 / 0.84

Table 2 Quantitative analysis of different classifiers for the H-C's database (each classifier run three times; values given as run 1 / run 2 / run 3)

S. No.  Classifier                     Accuracy (%)            Average precision    Average recall       Average F1-measure
1       Decision tree classifier       87.37 / 83.72 / 80.06   0.84 / 0.81 / 0.76   0.79 / 0.84 / 0.76   0.84 / 0.80 / 0.76
2       Gaussian                       83.05 / 83.05 / 83.38   0.78 / 0.80 / 0.80   0.77 / 0.81 / 0.81   0.79 / 0.80 / 0.80
3       Logistic regression            84.71 / 81.39 / 83.05   0.81 / 0.79 / 0.81   0.82 / 0.81 / 0.75   0.79 / 0.76 / 0.77
4       KNN                            86.37 / 82.72 / 84.71   0.82 / 0.80 / 0.83   0.77 / 0.75 / 0.78   0.82 / 0.78 / 0.80
5       SVC                            84.38 / 84.71 / 84.39   0.81 / 0.86 / 0.86   0.84 / 0.77 / 0.75   0.78 / 0.79 / 0.78
6       NN (MLP)                       87.37 / 83.72 / 80.06   0.84 / 0.81 / 0.76   0.84 / 0.77 / 0.76   0.84 / 0.80 / 0.76
7       Random forest classifier       88.37 / 85.71 / 87.04   0.85 / 0.84 / 0.86   0.83 / 0.80 / 0.81   0.85 / 0.82 / 0.83
8       Gradient boosting classifier   87.37 / 84.71 / 86.71   0.84 / 0.83 / 0.85   0.81 / 0.81 / 0.81   0.83 / 0.81 / 0.83


Fig. 5 Comparative analysis of the accuracy matrices for both datasets

4 Conclusion

This manuscript presents a novel dataset generated under Indian healthcare conditions. The dataset contains 208 individual foot thermograms collected from 71 diabetic patients with diabetic abnormalities in the plantar foot and 33 non-diabetic (controlled) subjects. To assess the dataset's suitability, it was compared to a well-known dataset [11], and it was found that, for the same type of measurement parameters, the accuracy on the proposed dataset was slightly higher than that on the existing dataset. The primary goal of creating this dataset is to encourage researchers to evaluate the potential role of thermal imaging in the early detection of diabetes-related foot complications. To the best of our knowledge, there are very few publicly available thermal datasets for DFU patients. We are still working with our medical partners to enroll more patients and make this dataset available to researchers.

References
1. International Diabetes Federation (2017) [Online]. Available: https://www.idf.org/our-network/regions-members/south-eastasia/members/94-india.html
2. Bagavathiappan S, Philip J, Jayakumar T, Raj B, Rao PNS, Varalakshmi M, Mohan V (2010) Correlation between plantar foot temperature and diabetic neuropathy: a case study by using an infrared thermal imaging technique. J Diabetes Sci Technol 4:1386–1392
3. Gatt A, Formosa C, Cassar K et al (2015) Thermographic patterns of the upper and lower limbs: baseline data. Int J Vascular Med 2015: Article ID 831369
4. Boulton AJ et al (2005) The global burden of diabetic foot disease. Lancet 366:12–18
5. Pecoraro RE, Ahroni JH, Boyko EJ, Stensel VL (1991) Chronology and determinants of tissue repair in diabetic lower extremity ulcers. Diabetes 40:1305–1313
6. Bakker K, Apelqvist J, Schaper NC (2012) International working group on diabetic foot editorial board. Practical guidelines on the management and prevention of the diabetic foot 2011. Diabetes Metab Res Rev 28(Suppl 1):225–231
7. Schubert V, Fagrell B (1989) Local skin pressure and its effect on skin microcirculation as evaluated by laser Doppler fluxmetry. Clin Physiol 9:535–545
8. Armstrong DG, Lavery LA, Liswood PJ, Todd WF, Tredwell J (1997) Infrared dermal thermometry of the high-risk diabetic foot. Phys Ther 77:169–177


9. Selle J, Prakash KVMV, Sai GA, Vinod B, Chellappan K, Kebangsaan UKM, Classification of foot thermograms using texture features and support vector machine
10. van Netten JJ, van Baal JG, Liu C, van der Heijden F, Bus SA (2013) Infrared thermal imaging for automated detection of diabetic foot complications. 1122–1129
11. Hernandez-Contreras DA, Peregrina-Barreto H, de Jesus Rangel-Magdaleno J, Renero-Carrillo FJ (2019) Plantar thermogram database for the study of diabetic foot complications. IEEE Access 7:161296–161307
12. Chan AW, MacFarlane IA, Bowsher DR (1991) Contact thermography of painful diabetic neuropathic foot. Diabetes Care 14(10):918–922
13. Fraiwan L, AlKhodari M, Ninan J, Mustafa B, Saleh A, Ghazal M (2017) Diabetic foot ulcer mobile detection system using smart phone thermal camera: a feasibility study. BioMed Eng OnLine 16(11), Art. no. 117
14. Haralick RM, Shanmugam K, Dinstein IH (1973) Textural features for image classification. IEEE Trans Syst Man Cybernet 3(6):610–621
15. Soh LK, Tsatsoulis C (1999) Texture analysis of SAR sea ice imagery using gray level co-occurrence matrices. IEEE Trans Geosci Remote Sens 37(2):780–795
16. Saminathan J, Sasikala M, Narayanamurthy VB, Rajesh K, Arvind R (2020) Computer aided detection of diabetic foot ulcer using asymmetry analysis of texture and temperature features. Infrared Phys Technol 105:103219

Improving Indoor Well-Being Through IoT: A Methodology for User Safety in Confined Spaces Mariangela De Vita, Eleonora Laurini, Marianna Rotilio, Vincenzo Stornelli, and Pierluigi De Berardinis

Abstract The main objective of the research is to develop an integrated methodology for monitoring the management of worker/user safety in confined spaces, such as those found on a selection of construction sites. The methodology involves the use of IoT to detect environmental and personal psychophysical parameters, collecting and connecting them in real time so as to distance users from health risks. Furthermore, the intelligence of the monitoring sensors enables them to command, when necessary, the activation of architectural or technological devices in order to compensate for any environmental or health dangers. In fact, the integration of IoT into such construction devices is beneficial in confined spaces for improving the indoor air and mitigating the impact on people of pollutants, noise and toxins of various kinds. Safety devices—both those to be worn by the user and those to be installed in the environment—have to be equipped with IoT sensors, communicating with each other during the operations onsite thanks to a specific network. The methodology described in this work is fundamental for protecting health under risky environmental conditions and can be extended to any closed space (the home, office or shops, for example) with the aim of optimising internal comfort and thus improving everyday well-being.

Keywords Internet of Things · Smart devices · Monitoring system · Health · Confined space

M. De Vita (B) · E. Laurini · M. Rotilio · P. De Berardinis Department of Civil, Building and Environmental Engineering, University of L’Aquila, L’Aquila, Italy e-mail: [email protected] V. Stornelli Department of Industrial and Information Engineering and Economics, University of L’Aquila, L’Aquila, Italy © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_54


Fig. 1 User safety results from the integration of the main elements of the research

1 Introduction

In recent years, systems for monitoring through sensors have become an indispensable tool for political, urban, social and economic surveys and have come to characterise scientific research in specific sectors and fields of application: medicine, construction and electronics [1]. With the results obtained to date, an attempt is being made to tackle the problem of climate change and the unsustainability of technological progress in relation to the consumption of non-renewable resources and energy. Indeed, the growing concern regarding irreversible environmental damage has brought the issue of having to monitor the well-being of both the environment and people to international government attention [2]. The theme of everyday well-being finds specific application in indoor comfort and, even more specifically, in the safety and health of workers in buildings and on construction sites. This paper presents a methodology for integrating environmental monitoring—a common tool for protecting and controlling structures from a construction and architectonic point of view—with the monitoring of specific psychophysical parameters of users so as to evaluate their overall conditions of health in confined spaces. The methodology uses the aid of building devices (architectural, such as ventilation chimneys, or technological, such as heat recovery ventilation) to improve the internal environmental quality (IEQ) in conditions of specific risk. The innovation of this research consists in the fact that the activation of such building devices takes place thanks to the Internet of Things (IoT) which, applied both on the building envelope and to users (through wearable sensors), is able to identify states of alarm during operations and carry out compensatory measures (cf. Fig. 1).

2 Updates from Recent Literature

The construction sector is one of the main sources of energy consumption, mainly due to the use of heating, ventilation and air conditioning (HVAC) systems to achieve


indoor thermal comfort for occupants. To reach the optimal internal temperature, a lot of energy is generally consumed. From an environmental sustainability perspective, the search for optimisation—without sacrificing comfort—in building usage is therefore fundamental. In order to obtain such optimisation, in recent years, the application of IoT for environmental monitoring of buildings has been a topic widely addressed in the literature of the sector. An initial study of interest [3] describes an optimisation approach termed "Dynamic PMV", which combines building information modelling (BIM) and Internet of Things (IoT) sensors. This integration makes it possible to exploit the geometric and parametric richness of BIM models and the real-time streaming of environmental data (humidity, temperature, etc.) collected by IoT sensors to optimise internal thermal comfort. First, the IoT measurements are interpolated according to a regular three-dimensional grid, considering the heat exchanges between the areas using the parametric information of the BIM model, followed by the real-time 3D visualisation of thermal comfort using the predicted mean vote (PMV) index. Another study [4] shows a method for defining an IoT framework which monitors workers in a production plant from a social point of view. This assessment allows us to understand the critical issues that should be improved through organisational decisions or the better management of processes, benefitting both the production system and the health of operators by preventing occupational diseases and avoiding absenteeism, turnover and a reduction in performance. It also improves the fulfilment of jobs, as a win–win situation for both operators and the company. For the purposes of our study, this work has shown how the well-being of the worker has brought benefits in terms of performance simply by making the worker aware of their own health conditions in the workplace thanks to IoT and the correlation of a small amount of data. At the same time, the Internet of Things (IoT) in healthcare applications and health monitoring is an emerging field of research. The health and well-being of people depend on the objective environmental conditions along with the way in which this environment is experienced and managed. Smart IoT sensors for healthcare applications allow accurate measurement, monitoring and analysis of a variety of vital health indicators, such as heart rate, ECG, blood pressure, blood glucose or oxygen saturation and so on. These parameters are then collected and transferred to various IP end devices. Not only is it cost effective but this approach allows real-time access to patient health data. What's more, several studies presented IoT remote health monitoring systems that provide patient conditions via a web browser [5]. Here, sensors connected within a wireless network allow the spatial–temporal sampling of physical, physiological, psychological, cognitive and behavioural processes at scales ranging from the individual person to buildings and even cities. Such dense sampling across spaces of different scales results in healthcare applications based on sensory information that fuse and aggregate information gathered from multiple distributed sensors. This would render it possible to easily relate data collected at the same time but at completely different scales. Furthermore, the development of sophisticated machine learning algorithms means being able to deduce complex conditions such as stress and depression from sensory information. It is


expected that healthcare devices and applications will eventually process vital private information such as personal healthcare data [6]. Such smart devices can even be connected to global information networks for access anytime, anywhere. Regarding the evaluation of complex psychophysical parameters, such as when operating under conditions of stress for example, one study performed a real-time analysis of physiological and physical stress detection for athletes using the proposed MOR-DCNN [7]. Such detection may also be useful for identifying all the essential parameters to be monitored and correlated in order to diagnose stress even under non-sports conditions, for example at home. This has been shown by numerous studies looking into the health and well-being of individuals in the workplace, at home or whilst exercising. Another study [8] uses IoT technology to evaluate the conditions for the well-being of students, perfecting the network and the cloud. Yet again, with the application of IoT, other researchers have evaluated the health conditions during smart working [9]. A comprehensive study by Lawal and Rafsanjani shows the trends in IoT applications for buildings, comparing results between residential and commercial [6]. In particular, they identify specific applications and categories such as home or office healthcare facilities, home automation and smart energy management systems. Some applications are even found in countering the SARS-CoV-2 pandemic, mostly oriented towards the timely identification of specific symptoms [10–17]. One particular chapter in a book [18] describes the design and implementation of a system for IoT-enabled home-based care and living assistance, conceptualising the multidimensionality of smart home automation. This is a great contribution and innovation, which unites the nursing care and living assistance paradigm as a whole under the blanket term "home-based care". The main objectives of such studies are to improve the quality of life and prevent patients and caregivers suffering from social isolation, stress, depression and burnout [18] whilst facilitating geriatric rehabilitation [19]. A study currently underway has developed an innovative multifunctional panel produced with waste and used for checking safety in factories and for the monitoring of environmental conditions therein, along with the energy improvement of the building envelope. In detail, sensors were used to monitor temperature, moisture and localisation by means of an RFID tag [20, 21]. In this literature update, mention must be made of the studies that have linked the incessant increase in air and noise pollution to the deterioration of individuals' health. One study provides a system to gauge and monitor the environmental parameters and raise an alarm when the air quality and noise level breach safe levels, thanks to sensors able to detect the gases present in the atmosphere as well as the noise levels in a particular area, transmitting the details to the NodeMCU microcontroller. This microcontroller is connected to a cloud platform in order to acquire and process the data by comparing it with the values perceived as safe. This cloud-based monitoring app even provides an alert system when either of the air quality or noise pollution variables breaches the permissible level. Users are notified by an email or message sent to an Android device, or even through a buzzer that can be activated as an


alarm [22–25]. Another study permits the monitoring—and thus, the mitigation—of risks deriving from dust produced by demolition sites [26]. Finally, if it is true that science fiction often anticipates actual science, we cannot overlook the latest film by Adam McKay, Don't Look Up. In this film, an innovative product named BASH LiiF is configured as a veritable external "senses tracker", being a mobile phone able to detect—before the user—their personal emotional state and thus their well-being [27]. Reviewing the recent literature allows us to understand what incredible and rapid steps IoT technology is taking towards the complete engineering of our lives. Results are achieved in the diagnosis of pathologies at the time of their onset, preventing serious complications. This is true for the human factor as well as for the work environment and for our homes. Systematising the results obtained by IoT in these disciplines has allowed us to build an efficient methodology of environmental monitoring in confined spaces for the protection of workers and, not least, of the environment.

3 Methodology

A confined space is an enclosed area, not conceived as a building permanently utilised by people but which can occasionally be occupied by workers both during the construction phase and for maintenance, repair, inspection and cleaning operations. Normally, these closed places are characterised by unfavourable natural ventilation and limited access points that involve a high risk of death or injury due to polluting or dangerous substances. The possible risks in such environments can be caused by the presence of pollutants in the air, a lack of oxygen, the risk of falling from a height, the risk of fire/explosion, adverse microclimatic conditions (heat, humidity, etc.), along with exposure to dangerous noise levels. The following methodology clarifies the relationships existing between buildings, the environmental conditions in confined places and user safety. The methodology foresees both the realisation of devices able to regulate the internal environmental conditions of confined spaces (e.g. elements such as a kind of chimney for natural ventilation of the room, possibly combined with the use of VMC machines) and the monitoring of the environment and users through sensor systems integrated in personal protective equipment (PPE). The sensors serve to detect both the indoor conditions (air quality, fire risk, dangerous noise and acoustic emission levels and so on) and user reactions (psychophysical parameters) (cf. Fig. 2). In this scenario, IoT applications that require mobility and global coverage will be fundamentally based on cellular technologies, such as those defined by the LTE-M and NB-IoT standards for today's 4G networks and the 5G networks of the future. Others will be based on low-power wide-area networks (LPWANs) operating within unlicensed bands, such as Sigfox or LoRaWAN. Most of these will also take advantage of short-range or medium-range wireless communication technologies, such as Bluetooth®, WLAN/Wi-Fi, Zigbee, Thread and others. What's more, a sensor network

will be able to communicate with the microclimatic regulation devices (those for building safety), providing for their operation based on the data acquired thanks to an IoT application (levels of air quality, thermo-hygrometric parameters and the subjective parameters of the user). The system can be applied to any type of confined space, whilst the sensor network applied to the PPE can be modified and implemented based on the risk to which the user is subjected and on the basis of the data to be monitored.

Fig. 2 Approach idea

3.1 Methodological Path

The definition of confined space in the national legislation in force (pursuant to Article 66 of Legislative Decree no. 81/2008) equates this environment to "cesspools, sewers, chimneys, pits, tunnels and generally in environments and containers, pipes, boilers and the like, where the release of deleterious gases is possible" (cf. Fig. 3). Other international regulations, such as those of the USA, give three different definitions of confined spaces, namely: a space that one can enter completely with the body to perform an operation, a space not designed to be permanently occupied, or one which has limited access and exit routes (cf. Fig. 4). A further classification also provides for the subdivision into three classes based on the level of danger (Class A = immediately dangerous to life; Class B = dangerous to life but not immediately; Class C = potentially dangerous to life). In turn, each class is then associated with a concentration of oxygen and of flammable gases. This legislation also entails a series of prescriptions based on the class to which the space belongs. The requirements cover the activation of ventilation, preliminary monitoring and continuous monitoring. Hence, we intend to carry out this research


Fig. 3 Example of confined space: the Smart Tunnel in L’Aquila, Italy [28]

with a procedure that includes not only a geometric classification of the space but also the actual level of instantaneous and continuous risk. To obtain such data, it is first of all necessary to monitor the environment and the personnel involved through IoT sensors, given that each operation and each space is unique, pertaining only to that specific site. In this sense, the first phase of the research involves the application of devices for the control of:

• Physical parameters of the operator—GPS tracking, glare levels, noise levels, accidental falls, body temperature and heartbeat;
• Parameters relating to the operations to be carried out—use of PPE, falling objects and interference between operators, equipment and vehicles;
• Parameters relating to the internal environment—internal and external temperature, relative humidity, noise, oxygen level and polluting or flammable gases.

Monitoring of the operator's physical parameters, their relative health conditions and interference with equipment and/or operators on site will be carried out through the application of RFID tags in smart PPE for continuous monitoring (cf. Fig. 5). These measured parameters will subsequently be collated in a cloud over a defined reference period and then verified against established regulatory limits for worker safety. If, following this verification, the parameters are not suitable, an alert will be sent to the platform, which will involve prompt intervention in the event of an accident, the activation of the ventilation systems (adaptive ventilation chimneys) previously set up in each confined space to facilitate air exchange and increase the indoor air quality level, along with the possible application of further


Fig. 4 Example of confined space in the broadest definition: demolition work within closed places

and specific PPE suitable for reducing the risks of that specific process or, finally, the eventual evacuation of operators from the environment. A methodological path, including all the issues mentioned above, from the monitoring steps to problem solving (application steps), has been defined as shown in Fig. 6.


Fig. 5 Smart PPE for a responsive safety measure in a health-risk environment [29–34]

Fig. 6 Methodology for smart confined spaces towards IoT application
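A minimal sketch of the verification step in the methodological path above, assuming hypothetical MQTT topic names and illustrative threshold values (none of which come from the paper): out-of-range sensor readings trigger a ventilation command and an alert to the platform.

```python
# Sketch: cloud-side check of confined-space sensor readings against limits.
import json
import paho.mqtt.client as mqtt

LIMITS = {"o2_percent": (19.5, 23.5),   # illustrative safe ranges, not the
          "noise_db": (0, 85),          # regulatory limits cited in the paper
          "temperature_c": (5, 35)}

def on_message(client, userdata, msg):
    reading = json.loads(msg.payload)
    for key, (low, high) in LIMITS.items():
        value = reading.get(key)
        if value is not None and not (low <= value <= high):
            # Out-of-range value: trigger the ventilation chimney / VMC
            # and notify the platform for prompt intervention.
            client.publish("site/ventilation/cmd", "ON")
            client.publish("site/alerts",
                           json.dumps({"param": key, "value": value}))

client = mqtt.Client()  # paho-mqtt 1.x style; v2 also takes a CallbackAPIVersion
client.on_message = on_message
client.connect("broker.example.org")               # hypothetical broker
client.subscribe("site/confined-space/+/sensors")  # hypothetical topic tree
client.loop_forever()
```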



4 Expected Impact

For the purposes of safety through prevention and protection in confined places, it is necessary to identify all risks, duly planning all activities as well as employing qualified and trained personnel to work in these types of environments at risk. This system is intended to provide support, in the operational phases, for all the preventive assessments that are carried out in the initial analysis of the workplace. The possibility of conducting continuous monitoring means having instant feedback in the event of an established risk, along with more effective and immediate prevention of the possibility of injury or accident. The sensors could also permanently remain in places requiring maintenance and subsequent processing, creating a city-wide monitoring system that can also be used as a guide for governance through monitoring the conditions of citizens in confined urban environments. The development of an integrated "citizens environment" sensor network would render it possible to make each space in the city safe and comfortable for citizens, inserting the necessary precautions and services where deemed necessary thanks to the signals emitted by the sensors. IoT applied to architectural comfort apparatuses (smart ventilation ducts, smart light pipes, etc.) would also permit passive adaptive systems that are able to automatically modify the conditions for indoor comfort, through the regulation of internal temperature and relative humidity parameters, as well as controlling air quality levels, which are directly proportional to the number of air changes required, along with the relative energy savings. The purpose of using innovative and technological systems mostly concerns reliability—not with a focus on the type of approach but looking at the result in terms of user protection within a given context.

5 Conclusions

Energy savings and environmental sustainability in the construction sector are limited by their relationship with user comfort. The design of tools capable of simultaneously managing the safety of cities and citizens through IoT applications represents a turning point in environmental sustainability for the built environment. This undertaking finds specific application on the construction site, where the reasons of environmental comfort give way to the more urgent issue of the safety of workers/users in confined spaces presenting a high risk to health. In such spaces, well-being is not a luxury but a necessity. For this reason, this paper has taken, as a case study, the issue of worker safety on the worksite with a view to applying an integrated methodology. This methodology involves the use of IoT to communicate data from the monitoring of both environmental conditions and user health.


The processing of data in real time would make it possible to promptly activate safety devices both on the environment (towards ecological devices) and on the individual (towards smart PPE). The monitoring data is connected to specific safety devices that are activated if the user's health conditions are in danger. A future development of the research could be to apply this IoT methodology to any built environment, with the aim of everyday well-being for users. Furthermore, in a subsequent step, it would be interesting to study how governments can effectively acquire and process monitoring data on human health in urban contexts in such a way as to satisfy needs arising during the governance process, from the most physical (management of buildings, cities and infrastructures) to those of an economic, social, cultural and psychological nature. Clearly, the ethical and privacy aspects regarding the management and accumulation of vast amounts of users' personal information cannot be ignored. Indeed, this pervasive and invasive hyper-connection would risk not only making us part of the network even in our most intimate and psychological contexts but also a disconnect from the self-awareness of one's own state of health, which would instead be delegated to "sense organs" external to the body that would use intelligence to tell us how we are.

Author Contributions Pierluigi De Berardinis is the scientific manager of the project; Mariangela De Vita and Eleonora Laurini designed the research and the methodology; Mariangela De Vita designed the paper and wrote Sects. 1, 2, 3 and 5; Eleonora Laurini wrote Sects. 3.1 and 4; Marianna Rotilio reviewed the paper and verified the applicability of the methodology to the construction site design; and Vincenzo Stornelli reviewed the paper and verified the compatibility of the presented methodology with IoT technology.

References 1. Barile G, Leoni A, Pantoli L, Stornelli V (2018) Real-Time autonomous system for structural and environmental monitoring of dynamic events. Electronics 7:420. https://doi.org/10.3390/ electronics7120420 2. LNCS Homepage, https://ukcop26.org/it/iniziale/. Last accessed 30 Jan 2022 3. Zahid H, Elmansoury O, Yaagoubi R (2021) Dynamic predicted mean vote: an IoT-BIM integrated approach for indoor thermal comfort optimization. Autom Constr 129:103805 4. Gregori F, Papetti A, Pandolfi M, Peruzzini M, Germani M (2018) Improving a production site from a social point of view: an IoT infrastructure to monitor workers condition. Procedia CIRP 72:886–891 5. Mohammad GB, Shitharth S (2021) Wireless sensor network and IoT based systems for healthcare application. Mater Today Proc. https://doi.org/10.1016/j.matpr.2020.11.801 6. Lawal K, Rafsanjani HN (2022) Trends, benefits, risks, and challenges of IoT implementation in residential and commercial buildings. Energy Built Environ 3(3):251–266 7. Jin N, Zhang X, Hou Z, Sanz-Prieto I, Mohammed BS (2021) IoT based psychological and physical stress evaluation in sportsmen using heart rate variability. Aggression Violent Behav.https:// doi.org/10.1016/j.avb.2021.101587 8. Hong-tan LI, Cui-hua KONG, Muthu B, Sivaparthipan CB (2021) Big data and ambient intelligence in IoT-based wireless student health monitoring system. Aggression Violent Behav.https://doi.org/10.1016/j.avb.2021.101601


9. Zhang X, Zheng P, Peng T, He Q, Lee CKM, Tang R (2022) Promoting employee health in smart office: a survey. Adv Eng Inf 51. https://doi.org/10.1016/j.aei.2021.101518
10. Gulseren D, Lyubykh Z, Turner N (2021) Reimagining work safety behaviors in the light of COVID-19. Ind Organ Psychol 14(1–2):214–216. https://doi.org/10.1017/iop.2021.45
11. Laurini E, Rotilio M, De Berardinis P, Vittorini P, Cucchiella F, Di Stefano G, Ferri G, Stornelli V, Tobia L (2021) Coflex: flexible bracelet anti covid-19 to protect construction workers. Int Arch Photogramm Remote Sens Spatial Inf Sci XLVI-4/W1-2021. https://doi.org/10.5194/isprs-archives-XLVI-4-W1-2021-63-2021
12. Makram BH, Ali F, Hala N, Hadi S (2021) Analysis of COVID-19 concerns raised by the construction workforce and development of mitigation practices. Front Built Environ 7:66. https://doi.org/10.3389/fbuil.2021.688495
13. Alsharef A, Banerjee S, Uddin SMJ, Albert A, Jaselskis E (2021) Early impacts of the COVID-19 pandemic on the United States construction industry. Int J Environ Res Public Health 18:1559. https://doi.org/10.3390/ijerph18041559
14. Jahmunah V, Sudarshan VK, Oh SL, Gururajan R, Gururajan R, Zhou X, Tao X, Faust O, Ciaccio EJ, Hoong KN, Acharya UR (2021) Future IoT tools for COVID-19 contact tracing and prediction: a review of the state-of-the-science. Int J Imaging Syst Technol 31:455–471
15. Getz WM, Salter R, Luisa Vissat LL, Horvitz N (2021) A versatile web app for identifying the drivers of COVID-19 epidemics. J Transl Med 19:109
16. Ebekozien A, Aigbavboa C (2021) COVID-19 recovery for the Nigerian construction sites: the role of the fourth industrial revolution technologies. Sustain Cities Soc 69:102803. https://doi.org/10.1016/j.scs.2021.102803
17. Sharma NK, Gautam DK, Sahu LK, Khan MR (2021) First wave of covid-19 in India using IoT for identification of virus. Mater Today Proc (Jun 7). https://doi.org/10.1016/j.matpr.2021.05.492
18. Oscar T (2022) Long-term nursing care at home: challenges and technology-driven solution approaches: the case of German healthcare system. In: Smart home technologies and services for geriatric rehabilitation. Academic Press, pp 79–106
19. Willetts M, Atkins AS, Stanier C (2022) Big data, big data analytics application to smart home technologies and services for geriatric rehabilitation. In: Smart home technologies and services for geriatric rehabilitation. Academic Press, pp 205–230
20. Pantoli L, Gabriele T, Donati FF, Mastrodicasa L, Berardinis PD, Rotilio M, Cucchiella F, Leoni A, Stornelli V (2021) Sensorial multifunctional panels for smart factory applications. Electronics 10:1495. https://doi.org/10.3390/electronics10121495
21. Rotilio M (2021) Product innovation between circular economy and Industry 4.0. TECHNE J Technol Archit Environ 22:192–200. https://doi.org/10.36253/techne-10598
22. Ramdevi M, Gujjula R, Ranjith M, Sneha S (2021) IoT evaluating indoor environmental quality check of air and noise. Mater Today Proc. https://doi.org/10.1016/j.matpr.2021.03.137
23. Alowaidi M, Aljaafrah M, El Saddik A (2018) Empirical study of noise and air quality correlation based on IoT sensory platform approach. In: 2018 IEEE international instrumentation and measurement technology conference, pp 1–6. https://doi.org/10.1109/I2MTC.2018.8409629
24. Piyush P (2017) Smart IoT based system for vehicle noise and pollution monitoring. In: International conference on trends in electronics and informatics (ICEI), pp 322–326. https://doi.org/10.1109/ICOEI.2017.8300941
25. Saha AK, Sircar S, Chatterjee P, Dutta S, Mitra A, Chatterjee A, Chattopadhyay SP, Saha HN (2018) A Raspberry Pi controlled cloud based air and sound pollution monitoring system with temperature and humidity sensing. In: 2018 IEEE 8th annual computing and communication workshop and conference, pp 607–611. https://doi.org/10.1109/CCWC.2018.8301660
26. Paolucci R, Rotilio M, De Berardinis P, Ferri G, Cucchiella F, Stornelli V (2021) Electronic system for monitoring of dust on construction sites for the health of workers. In: 15th international conference on advanced technologies, systems and services in telecommunications, pp 329–332. https://doi.org/10.1109/TELSIKS52058.2021.9606281
27. LNCS Homepage, https://www.dontlookup-movie.com/. Last accessed 30 Jan 2022


28. Rotilio M (2020) Technology and resilience in the reconstruction process. A case study. Int Arch Photogramm Remote Sens Spatial Inf Sci XLIV-3/W1-2020:117–123. https://doi.org/10.5194/isprs-archives-XLIV-3-W1-2020-117-2020
29. Nwaogu JM, Chan APC (2021) Work-related stress, psychophysiological strain and recovery among on-site construction personnel. Autom Constr 125. https://doi.org/10.1016/j.autcon.2021.103629
30. Kim JH, Jo BW, Jo JH, Lee YS, Kim DK (2021) Autonomous detection system for non-hard-hat use at construction sites using sensor technology. Sustainability 13(3):1102. https://doi.org/10.3390/su13031102
31. Sakhakarmi S, Park J (2020) Wearable tactile system for improved hazard perception in construction sites. In: Construction research congress 2020, American Society of Civil Engineers, Mar 8–10, 2020, Tempe, Arizona. https://doi.org/10.1061/9780784482872.014
32. Antwi-Afari MF, Li H, Seo J, Anwer S, Yevu SK, Wu Z (2020) Validity and reliability of a wearable insole pressure system for measuring gait parameters to identify safety hazards in construction. Eng Constr Archit Manag 28(6):1761–1779. https://doi.org/10.1108/ECAM-05-2020-0330
33. Kim TB, Ho CB (2021) Validating the moderating role of age in multi-perspective acceptance model of wearable healthcare technology. Telematics Inform 61. https://doi.org/10.1016/j.tele.2021.101603
34. Laurini E, Rotilio M, Lucarelli M, https://www.ingegneri.cc/smart-safety-belt-la-cinturasmart.html/. Last accessed 30 Jan 2022

General Natural Language Processing Translation Strategy and Simulation Modelling Application Example Bernhard Heiden and Bianca Tonino-Heiden

Abstract In this paper, we describe the fundamental problem of how to translate natural language into computationally guided processes and suggest a straightforward strategy for implementing natural language processing by relating the abstract word topology dimension to a numerical, orgitonal and restricted or compressed dimension of an abstract meta-mesh description and hence cybernetic modelling space. By this, natural language is a guide towards a scientific meshed topology of events or room-time configurations in a virtual bifurcating meaning room of natural language. We first sketch the general strategy and then apply the notion of the term 'motion' to explore meshed unities of this abstraction dimension space, thereby compressing meaning and hence information, in our abstract description state space of the orgitonal-osmotic based paradigm denoted by directions, flows and distance-sets, as well as Markovian transition calculus indicating the future use of matrix algebra in a demonstrative descriptive example.

Keywords Orgiton theory · Natural language processing · Translation strategy · Movement · Cybernetic epistemology

B. Heiden (B), Carinthia University of Applied Sciences, 9524 Villach, Austria; e-mail: [email protected]; URL: http://www.cuas.at
B. Heiden · B. Tonino-Heiden, University of Graz, 8010 Graz, Austria
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_55

1 Introduction

Scientific applications refer to observations and measurements, which is fundamental for finally forming universal laws that allow for predictions (see, e.g. [1], p. 3). There is an exhaustive endeavour to gain knowledge, and for this, we can formulate with Emma Ruttkamp a knowledge representation consisting of three elements: (i) an intended system, (ii) a conceptual model and (iii) an axiomatic method [2]. Especially, the


axiomatic method dates back to Euclid's 'elements' (in German 'Elemente') [3] and is now widely used in scientific theories, although often not recognised as such. Newtonian mechanics, e.g., consists of three axioms, according to [4, 5] (p. 18) and Newton's original publication in 'Philosophiae naturalis principia mathematica' in 1687. The rework of the Newtonian basic work, in conjunction with Hilbert's rework of Euclidean geometry [6] and here of the distance axiom, which states that the shortest distance between two points is a straight line, has led to Einstein's theory of relativity, which replaces this axiom by the new non-Euclidean one, that the shortest distance between two points is a straight line in space-time. With regard to natural language processing, Wittgenstein was one of the first to state that natural language is sufficient to explain theories and that it is, in principle, not necessary to construct a separate language. However, we need a language to explicitly formulate a theory, which is also synonymous with Wittgenstein's private-language argument [7] (also discussed in a seminar talk with Peter Payer on the 21st of January 2022). There is a wide development path in the history of science with regard to processes. Mainly at the beginning of the last century, Whitehead introduced the new process philosophy paradigm [8], which was then followed by the cybernetics movement with the invention of this term [9], now common sense and a deep part of ongoing informational progress. Selforganisational theory [10] bridges scientific disciplines that are far apart, like hard and soft sciences. This can also be linked to chaos theory [11] and complexity theory [12], using increasingly mathematical-statistical and other computational techniques to model it. Finally, general system theory [13], together with selforganisational theory and others, has given rise to orgiton theory, which makes a cybernetical assumption, bridging different knowledge areas more deeply using units of cybernetic networks of information, energy and material, or orgitons [14–16]. With relation to natural language programming, there is rapid development. The research topic presented here is mainly based on the idea of how to correlate movement and language. This means that many now separate disciplines are connected. So we briefly dive into the research areas touched upon. The area of neuroscience methods may be represented by modelling neuro-states of humans with functional activity, even in the resting state [17]. This implies the use of positive definite correlation matrices that calculate the transition of total human brain states. At the very least, differing states between males and females could be identified. Anyhow, our paper addresses the relation of movement and observation, so this might then, e.g., be determined by a method derived from it, helping to identify the pattern correlating movement and observation. Arbib researched this field quite early [18] and is also related to quite essential discoveries in it, e.g. the mirror neuron, which is very well explained in [19], thereby closing the link between cybernetics, neuroscience and linguistics. An excellent overview of the whole field of how the brain works can be found in Grossberg's book [20] and also [21], which is not only interesting as it overcomes the problem of deep learning algorithms, which are quite famous but also erroneous


because of the 'catastrophic' forgetting, which is not the case with his Adaptive Resonance Theory (ART) algorithm, which comes closer to the working functionality of the brain and hence to an adequate modelling. It might well be that our simplified model would also profit from being embedded in this resonant brain network modelling as another promising approach to 'explain' the brain. Another approach to extracting data from real-life systems to support our model could be the 'data analytics' approach as described, e.g., in [22]. From all sorts of data, all sorts of knowledge can be generated. For example, using Cyber-Physical Systems (CPS), the data that are needed can be created by adequate sensors in all processes. Hence, this is also applicable to observation, at least under the surface of the proposed observation-movement method's overall process. In the model by Gridach [23], logic and unifying deep networks are combined. This and similar approaches can be used for reconstructing logic, formulating fuzzy logic or classifying relevant parts between objects. Graph Neural Networks (GNNs), based on deep learning, on the other hand, are also very popular for the classification and ordering of systems according to [24]. Open and future research topics according to [24], and also partly indicated above, are robustness, interpretability, graph pre-training and complex graph structures.

Content of the Work: First, we give the research questions and the goal of this work in the next paragraph. Then, in Sect. 2, we introduce a basic epistemic orgiton model in two versions, on which we will build later in our argumentation. In Sect. 3, we give an idea for an algorithm of the natural language translation procedure. In Sect. 4, we apply the algorithm to the application of movement and show how natural language translates into a mathematical formalisation that can be used for informational or computer modelling using this translation strategy algorithm. We then also discuss the problems and limitations of this approach. In Sect. 5, we recapitulate the results and give some recommendations for researchers aiming to work with this approach, applying it in an inter- and transdisciplinary fashion. Finally, in Sect. 6, we summarise and conclude the implications of this work and give an outlook on future applications and research in this knowledge field.

Research Questions and Goal of this Work: The research question of this work is how we can describe a generally applicable method to translate the natural language of every language in the world into computational processing, utilising mechanistic abstractions and leading to a computational model that functions as a computational and informational model of the directed meaning dimension of the underlying language expressions.

Method: The research method abstracts from mathematical methods, directing towards disciplines or application fields where suitable methods can later be applied. Hence, the primary method of this article is to explain, in plain natural language, the progression of thoughts and knowledge, guided by axioms, allowing for open follow-up research in different disciplines.


2 Epistemic Knowledge Orgiton Model

When looking at Fig. 1, we see the notions depicted in the introduction: (1) the intended system, which we focus on as the 'observed system', (2) the conceptual model and (3) the algorithmic representation of a language. In this first orgitonal model of type (I), the knowledge-orgiton or knowledge-o, we see two sets: that of the picture or the language (3), which depicts onto the premises the focus or goal (1), and the principal skeleton of how to be able to observe (2), or the minimal structural condition for gaining knowledge, as a basic epistemology with close relation to the axiomatic method, or, as we could also say in the computational context, the basic epistemological knowledge-gaining algorithm. In a refined epistemic orgiton model, we can link all these sets to one overall system, which we call the knowledge-o of (second) type II in Fig. 2.

Fig. 1 Knowledge-o (I): text/picture and language (3), focus (1) and process (2)

Fig. 2 Knowledge-o (II): nested process-o0 (1), process-o1 (2) and process-o2 (3), forming the knowledge-o/process-o


3 Natural Language Processing Translation Strategy or Algorithm

We formulate an axiom system out of a 'minimalistic' natural language description of the underlying process that we want to computationalise or cyberneticise, which means that we produce a computational simulation of the process that we are focussing on. The problem focus (1) is the border of the system; hence, the sentences, as a mirror of the problem, describe it as natural language axioms, and the toolkit to describe (1) is (2), the natural language system.

Axiom 1 A minimalistic set of the process description is the natural language axiomatisation.

Further on, we have to describe the process in a computationalised way, which means we have to narrow down the language to a computational one, mainly dependent on the available technology, which is neglected here in order to stay in the general systemic realm.

Axiom 2 The sentences translate to a process performed by the symbolic language computational representational system.

Each sentence of language L1 according to Axiom 1 translates according to Axiom 2 into L2. We will not specify this further, but a translation process has to be carried out, which can be regarded as an optimisation process of finding concordance between L1 and L2, or a search and find process, a search-find-o according to Fig. 4. In fact, this process can be regarded as finding a higher order orgiton in the knowledge-o, or a higher order trajectory according to the emergence sentence [25].
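A toy sketch of Axioms 1 and 2, with the search-find concordance process reduced to a simple dictionary lookup; the sentences and operations are illustrative only and not taken from the paper:

```python
# Sketch: mapping natural-language axiom sentences (L1) onto executable
# operations of a symbolic computational language (L2).
def follow_path(points):
    # "Follow a path": visit the connected points in space, in order.
    for point in points:
        print("moving to", point)

# The "translation" of L1 into L2, here a plain lookup table.
L1_to_L2 = {"Follow a path.": follow_path}

path = [(0, 0), (1, 0), (1, 1)]  # a path as a connection of points in space
L1_to_L2["Follow a path."](path)
```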

4 Application Example for the Simulation of Movement as a Translation From Natural Language into Computer-Processed Language In this section, we apply the algorithm given in the previous section. We first formulate a minimalistic set and a more general context of the problem. Then we show how the translation can be done straightforwardly.

4.1 Minimalistic Example Set We can formulate the movement problem as follows:

Axiom 3 Follow a path.

Axiom 4 A path is the connection of points in space.


Axiom 5 Follow means, in terms of the track: copy the track and do it yourself.

(1) So when I execute a trajectory, I first have to think up the trajectory or look it up from somewhere. I know (as if I had knowledge) how the movement goes. (2) Subsequently, I only have to execute it myself. If we start from the mirror neurons, they draw the image of the path from the observation. This means that the human process of observing, e.g. already with the eyes, is already this following process. Following the movement means looking at an object x: if x moves from point B to point B′, then my eyes y move from A to A′ in seeing the movement of x. The A to A′ movement is performed with the eyes y. The A to A′ movement of y is, however, a projective image of the movement of x, i.e. of the path B to B′, from the perspective of observation. In relation to movement, ‘I am observing’ means that I am moving, in a perspective distorted according to my observation standpoint. Now someone might object that a simulation is not a scientific method. We can answer that when we look at the movement of a body, its kinematics or dynamics, the observation is itself a biological simulation. Building a machine that copies us is then a simulation of the simulation, i.e. a continuous feedback process with reality, by which we reconstruct the world and redesign or build it in the model as a continuation. From existing order blocks, we build up more complex ones in the sense of an ongoing open information process, understood as a cognitive process in the sense of Fig. 2, i.e. the process of (3): sentence formation is continuously generated when we consider the language system. A computer simulation is then a continuous recreation process of code, in the form of virtual movement data or, as an application, real movement data. In humans, the observation itself results in movement, and the movement itself is a copy. Therefore, the basic process is a reflection of the observed in the observation, and thus the process is internalised in the human being. For simplification, we introduce an observation axiom for points, which can be regarded as a finer-grained sub-specification of the formulated Axioms 3, 4 and 5, leading to a smoother natural language-computer translation later on.

Axiom 6 We observe points in space.

4.2 Translation into a Computational Representation In the above example, we use the representation system of matrix algebra, a common feature of modern computer systems, e.g. when using computer algebra systems like MATLAB, Mathcad, Mathematica, Maple, GeoGebra and many others. In fact, these programmes that allow for symbolic transformation can also be regarded as artificial intelligence tools according to [26], as they enhance human intelligence by means of applied knowledge usually only available to Ph.D.s in mathematics.

Fig. 3 Mirror-o: a person's eyes y trace a movement A to A′ while the object x moves from B to B′; the room containing B, multiplied by a transformation matrix, yields the room containing B′, so the observed movement is a double mirror, i.e. the mirror of a mirror (mirror-o)

The movement itself can now be seen as a continuous matrix multiplication (see also Fig. 3) of moving movement systems. Axiom 6 then translates into reading in matrices of coordinates. These can then be used in Axiom 4 by connecting points, or matrices, via the transformation matrix according to Fig. 3. Now that we have observed the path of the object, we have to duplicate the path by copying the respective matrices and follow the observed path with our own motion control system, such that the process is in concordance with the observed system; this is, in fact, an optimisation process against reality according to Axioms 3 and 5. In detail the process is more complex, as the fundamental search process is not implemented yet. In the following, we give an idea of how this can be done in principle. The axiomatisation and translation as demonstrated with Axioms 3-6 then has to be applied to the pictorial process according to Fig. 4, which means that a meta-algorithm has to be implemented, representing the intelligence of the observer with regard to the observed in general.
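To make ‘reading in matrices of coordinates’ concrete, here is a hedged NumPy sketch (our own minimal illustration with made-up coordinates, not the authors' implementation): the observed path is a matrix of homogeneous points, and one observation/copy step is a single transformation-matrix multiplication, as in Fig. 3.

```python
import numpy as np

# Observed path: three points as homogeneous column vectors (x, y, 1).
# Coordinates are invented for illustration (cf. Axiom 6: read in matrices).
path = np.array([[0.0, 1.0, 2.0],    # x-coordinates of the observed points
                 [0.0, 0.5, 1.5],    # y-coordinates
                 [1.0, 1.0, 1.0]])   # homogeneous coordinate

# One observation/copy step: a rigid transformation (rotation + translation).
t = np.deg2rad(30)
T = np.array([[np.cos(t), -np.sin(t), 0.2],
              [np.sin(t),  np.cos(t), 0.1],
              [0.0,        0.0,       1.0]])

# 'Movement as matrix multiplication': the projected/copied path (B -> B').
copied_path = T @ path
print(copied_path.round(3))
```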

4.3 Search-and-Find Process or Search-Find-o The follow-up problem of observation is that of searching and finding, the search-find-orgiton or search-find-o (see also Fig. 4). If I have a guess in which direction I should search, I find faster, because I search according to a prediction or, following Carnap, according to a ‘law’. This is the property of intelligence: the successful finding of an object match, i.e. a search-find-o of a higher order. In the context of the human, we speak of understanding.

Fig. 4 Search-find-o: a searching process coupled with a finding process

If I have forgotten everything (cf. the above-mentioned saying of Grossberg that deep learning has the property of catastrophic forgetting, while ART remembers by means of a kind of resonance phenomenon, as we have noted elsewhere with regard to brain functionality), I have to search in all directions. The finding is, after all, the recognition of a ‘known’ object, i.e. a pattern that I assume to be known, a form that I recognise. But the form is a feature from which I infer the whole. For example, this can be several features like red, round or mirroring for a ball. So a ball game playfully trains this predictive capacity. That is why it serves as a substitute for high-performance, goal-directed activity in sport, just like javelin or discus throwing at the Olympic Games in Greece. Therefore, a game is always a potential preparation for extreme situations like survival training, which, in our cultivated society, we call in modest form work or purposeful action. The challenge of man, culture, society and humanity is to shape action peacefully, that is, in such a way that it is inclusive and not exclusive. Cooperation is, therefore, the premise for peaceful action. But to survive, exclusion is always necessary. In general, however, it is sufficient to exclude only that Kantian freedom of action of others which impedes the individual's own, and, according to von Foerster's basic ethical axiom [27], to act in such a way that the space of possibilities is increased.

5 Results and Recommendation Results First, we have formulated a general epistemic set of two axioms analytically describing how nearly any real-world problem, especially a real-world process, or more specifically a sequence of events of physical objects, can be formulated with a compact or minimal language description (Axiom 1). The second axiom (Axiom 2) then creates the order by some form of translation into another language. This pair can be regarded as the generic epistemic knowledge-creating set. In the application example, we have given a simple descriptive example of how a problem is formulated and how it is solved. This can be regarded as, in a simplified


formulation, the problem of the movement of objects and the related issue of how organisms can learn to move on their own, first as a result of following an object and later by following an inner, copied object, i.e. an internal knowledge representation, as a higher-order orgiton.

Recommendation The recommendation for future research and for colleagues, not only in this field, when using different fields and disciplines to solve any real-world intelligence problem by means of language and computation, is to follow four steps: (1) First formulate the problem in the fully aware context of life, thereby compressing information by human intelligence; this is the basic modelling part. (2) Divide the overall (problem) process chain into separate disciplines or languages. (3) Translate the problem to a neighbouring problem discipline by translating it from one discipline to another, using disciplinary and transdisciplinary tools. (4) Formulate a closed loop by transforming all translations into one overall process that refers back to the first one, thereby yielding an orgiton of higher order. When we start with (1), different tools from the natural language programming community can be used. An overall cycle of a machine translation could be, according to [28], the process chain of (2) embedding and encoding, (3) decoding and classifying, which then links to the overall process of human-to-human translation (4). It is important to see that any process can be translated into another one. We have shown this, e.g. in our paper [29], together with how processes potentially increase order but also change their meaning. What comprises a material system on the outer side can be an information system on the inner side. This model can be applied to very simple but also to very complex models. The starting point is always language itself and the logic connected with it. If logic itself is applied, as in first-order logic, or in hybrid form with deep learning as in [23], then the mathematical or algorithmic part can be applied more or less directly. Otherwise, the translation to other mathematical forms has to be done according to the discipline, such as Newtonian mechanics, graph theory, deep learning and many others.

6 Conclusion and Outlook Conclusions We have introduced in this work a near-general method for transforming a natural language system into computational systems. The method is straightforward and allows for an algorithmic and hence automated translation process of natural language descriptions into computational problems. The problem of commanding machines is thereby reduced, via the formulation of a set of axioms, to computing the sentences, with a human assistant or cooperative device as an application of direct use. It may hence be used for real-time control and, as hitherto, for the off-line teach-in of robots, e.g. for their movement control. We have argued that the movement problem, although it is quite elementary, is of deeper interest, as all sorts of processes can be built upon it, and even the cybernetic


basis process which we have introduced here as a general algorithm can be regarded as a movement process, in this case of a higher-order orgiton in relation to the wisdom-of-knowledge process, if we regard the epistemic process of how to generate knowledge in general, demonstrated here by the first-introduced orgiton models for knowledge, knowledge-o I and II. In this model, the computation according to the algorithm in Sect. 3 leads to a potentially higher-ordered orgiton, a process-o3 in our notion. The translation of natural language into a computational representation according to Axiom 2 and the system formulation process according to Axiom 1 are closely related, as we have seen in the example. The practical problem is that every sentence has to be translated into more generalised processes, which means, in the end, that the process may have to be divided further into smaller elements, which then have sufficiently general properties to be described with formalised and symbolised narrow languages, e.g. the matrix calculus in the indicated example. The advantage of this process is twofold. First, once we have explored a language territory, e.g. that of movement, the process is straightforwardly usable for similar problems. Second, by narrowing the problem down into smaller and thus more similar elements, the problem reduces to a statistical one, whose simplification lies in the applicability of high degrees of automatisation of calculation processes in computational machines. These processes are better mind-controllable, explainable or understandable on the one side and, due to possibly high model complexity, highly adaptive on a larger scale on the other side, which also corresponds to the properties of the solidification-fluidisation theorem [25], which relates better large-system control to smaller basic units because of the increase in feedback density probability. Concerning our research, we think of further developing the simulation process with regard to how it can be merged with artificial intelligence tools, and how manufacturing systems (see e.g. [30]) can be further integrated with formalised natural language systems, up to the fields of management, economic, organisational and other systemic applications.

Outlook As an outlook, researchers in the broad field of, but not restricted to, natural language programming and modelling and simulation are invited to try out the algorithmic ideas given in this paper and to implement them in the wide area of applications. The matrix calculus indicated here is only one option. One can think of many other methods: all sorts of mathematics and geometry, like differential calculus, set theory or logic systems; programming languages like C++, Fortran or Python; or sophisticated process simulation software like, e.g. Witness or Anylogic in production, logistics and general systems, to mention only a few of the formalised symbolisation or other systems available for disciplinary and transdisciplinary calculation and computation.

References 1. Carnap R (1974) An introduction to the philosophy of science. Basic Books, New York, 300 p 2. Ruttkamp E (2006) Philosophy of science: interfaces between logic and knowledge representation. S Afr J Philos 25(4):275–289. https://doi.org/10.1080/02580136.2006.12063057


3. Euklid (2015) Die Elemente - Bücher I-XIII. In: Thaer C (ed) Ostwalds Klassiker der exakten Wissenschaften, Band 235. Europa-Lehrmittel Verlag, Haan-Gruiten, 499 p 4. Pestel E (1969) Statics. McGraw-Hill, New York, 464 p 5. Pestel E (1969) Technische Mechanik - Statik Einführung, Band 1. BI-Wissenschaftsverlag, Mannheim, 284 p 6. Colerus E (1944) Von Pythagoras bis Hilbert - Die Epochen der Mathematik und ihre Baumeister - Geschichte der Mathematik für Jedermann. Karl H. Bischoff Verlag, Berlin-Wien-Leipzig, 362 p 7. Wittgenstein L (2019) Philosophische Untersuchungen, 9th edn. Suhrkamp Verlag AG, 300 p. ISBN: 3518223720 8. Whitehead AN, Holl HG (1987) Prozeß und Realität. Suhrkamp Verlag, 665 p. ISBN: 3518282905 9. Wiener N (1963) Kybernetik: Regelung und Nachrichtenübertragung im Lebewesen und in der Maschine. Cybernetics or control and communication in the animal and the machine (Deutscher Originaltext). Econ Verlag, 287 p 10. Götschl J (2006) Self-organization: new foundations for a more uniform understanding of reality (Original in German: ‘Selbstorganisation: Neue Grundlagen zu einem einheitlicheren Realitätsverständnis’). In: Vec M, Hütt MT, Freund A (eds) Self-organization—a system of thought for nature and society (Original in German: ‘Selbstorganisation - Ein Denksystem für Natur und Gesellschaft’). Böhlau Verlag, Köln, pp 35–65 11. Hilborn RC (1994) Chaos and nonlinear dynamics—an introduction for scientists and engineers. Oxford University Press, New York, 671 p 12. Ladyman J, Lambert J, Wiesner K (2012) What is a complex system? Eur J Philos Sci 3(1):33–67. https://doi.org/10.1007/s13194-012-0056-8 13. von Bertalanffy L (2009) General system theory, revised edn. George Braziller, New York, 295 p 14. Heiden B, Tonino-Heiden B, Wissounig W, Nicolay P, Roth M, Walder S, Mingxing X, Maat W (2019) Orgiton theory (unpublished) 15. Heiden B, Tonino-Heiden B (2022) Philosophical studies—special orgiton theory/Philosophische Untersuchungen - Spezielle Orgitontheorie (English and German edition) (unpublished) 16. Heiden B, Tonino-Heiden B (2022) Diamonds of the orgiton theory. In: 2022 11th international conference on industrial technology and management (ICITM), Oxford, Great Britain (unpublished) 17. Huang S-G, Samdin SB, Ting C-M, Ombao H, Chung MK (2020) Statistical model for dynamically-changing correlation matrices with application to brain connectivity. J Neurosci Methods 331:108480. https://doi.org/10.1016/j.jneumeth.2019.108480 18. Arbib MA, Caplan D (1979) Neurolinguistics must be computational. Behav Brain Sci 2:449–483 19. Arbib MA (2018) From cybernetics to brain theory, and more: a memoir. Cogn Syst Res 50:83–145. ISSN: 13890417. https://doi.org/10.1016/j.cogsys.2018.04.001 20. Grossberg S (2021) Conscious mind, resonant brain: how each brain makes a mind. Oxford University Press, 768 p. ISBN: 0190070552 21. Grossberg S (2020) A path toward explainable AI and autonomous adaptive intelligence: deep learning, adaptive resonance, and models of perception, emotion, and action. Front Neurorobotics 14. https://doi.org/10.3389/fnbot.2020.00036 22. Bang SH, Ak R, Narayanan A, Lee YT, Cho H (2019) A survey on knowledge transfer for manufacturing data analytics. Comput Ind 104:116–130. https://doi.org/10.1016/j.compind.2018.07.001 23. Gridach M (2020) A framework based on (probabilistic) soft logic and neural network for NLP. Appl Soft Comput 93:106232. https://doi.org/10.1016/j.asoc.2020.106232 24. Zhou J, Cui G, Hu S, Zhang Z, Yang C, Liu Z, Wang L, Li C, Sun M (2020) Graph neural networks: a review of methods and applications. AI Open 1:57–81. https://doi.org/10.1016/j.aiopen.2021.01.001


25. Heiden B, Tonino-Heiden B (2022) Emergence and solidification-fluidisation. In: Arai K (ed) LNNS 296. Intelligent systems conference (IntelliSys) 2021, Amsterdam, The Netherlands, fully virtual conference, 2–3 September 2021. Lecture notes in networks and systems. Springer Nature, Switzerland AG, pp 845–855. https://doi.org/10.1007/978-3-030-82199-9_57 26. Soldov A, Ochkov V (2005) Differential models. Springer, Berlin, 221 p 27. von Foerster H (1993) KybernEthik. Merve Verlag GmbH, 175 p. ISBN: 3883961116 28. Tan Z, Wang S, Yang Z, Chen G, Huang X, Sun M, Liu Y (2020) Neural machine translation: a review of methods, resources, and tools. AI Open 1:5–21. https://doi.org/10.1016/j.aiopen.2020.11.001 29. Heiden B, Tonino-Heiden B (2020) Key to artificial intelligence (AI). Intelligent systems and applications, IntelliSys 2020. In: Arai K, Kapoor S, Bhatia R (eds) Advances in intelligent systems and computing, vol 1252. Springer, Cham, pp 647–656. https://doi.org/10.1007/978-3-030-55190-2_49 30. Heiden B, Tonino-Heiden B, Alieksieiev V, Hartlieb E, Foro-Szasz D (2021) Lambda Computatrix (LC)—towards a computational enhanced understanding of production and management. In: Yang X-S, Sherratt S, Dey N, Joshi A (eds) Proceedings of sixth international congress on information and communication technology: ICICT 2021, London, United Kingdom, Online, February 25–26, 2021, vol 236. Lecture notes in networks and systems. Springer Nature Singapore Pte Ltd., pp 37–46. https://doi.org/10.1007/978-981-16-2380-6_4

Artificial Intelligence in Disaster Management: A Survey Suchita Arora, Sunil Kumar, and Sandeep Kumar

Abstract This paper provides a literature review of cutting-edge artificial intelligence-based methods for disaster management. Most governments are worried about disasters, which, in general, are largely unpredictable events. Researchers have deployed numerous artificial intelligence (AI)-based approaches to support disaster management at its different stages. Machine learning (ML) and deep learning (DL) algorithms can manage the large and complex datasets that arise intrinsically in disaster management circumstances and are incredibly well suited for crucial tasks such as identifying essential features and classification. The study of existing literature in this paper relates to disaster management and further collects recent developments in nature-inspired algorithms (NIA) and their applications in disaster management. Keywords Disaster management · Disaster mitigation · Disaster preparedness · Disaster response · Disaster recovery · Machine learning

1 Introduction ML is a subfield of AI, and deep learning is a subfield of ML; sometimes, they are used interchangeably. These techniques are used in prediction and forecasting for various fortunate or unfortunate events. As of January 2022, there are more than 4.95 billion active Internet users and approximately 4.62 billion active social media users depositing massive data on social media platforms such as Facebook, S. Arora · S. Kumar Department of Computer Science and Engineering, Amity School of Engineering and Technology, Amity University Rajasthan, Jaipur, Rajasthan 303002, India e-mail: [email protected] S. Kumar e-mail: [email protected] S. Kumar (B) Department of Computer Science and Engineering, CHRIST (Deemed to be University), Bangalore, Karnataka 560074, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_56



Fig. 1 Role of AI for natural disasters

Twitter, Instagram, WhatsApp, and others. During the disaster response period, many users post information on Twitter, such as disaster damage reports and disaster readiness conditions, making Twitter a vital social medium for updating and retrieving data. Globally, governments spend vast amounts on providing aid packages to their citizens. The primary objective of government palliatives, relief, and aid packages is to cushion the economic and psychological effects on the citizens affected by the disaster [1]. Disruptive technologies are one of the best choices for the timely prediction and analysis of disasters and are also helpful in their management. These technologies have improved infrastructure and have the potential to enhance disaster management. A disruptive technology is an innovation that significantly alters how buyers, industries, or enterprises work; it displaces the systems or practices it replaces because it has superior attributes. Recent disruptive technology examples include social media, online news sites, e-commerce, the Internet of things, advanced robotics, automation of knowledge work, and autonomous vehicles. Television, electricity service, and automobiles were the disruptive technologies of their times. Disaster management is categorized into relief, preparedness, reaction, and healing. The role of AI in disaster management is illustrated in Fig. 1.

2 Literature Survey The study of existing literature in this paper is divided into two parts. First, it covers research related to disaster management; further, it collects recent developments in NIA and their applications in disaster management. The importance of this topic is

Fig. 2 Year-wise publication on the topic of natural disaster (number of documents per year). Source Scopus database

Fig. 3 Subject-wise publication on the topic of natural disaster: Medicine 15%, Social Sciences 14%, Engineering 13%, Others 13%, Environmental Science 12%, Earth and Planetary Sciences 11%, Computer Science, Business/Management and Accounting, Decision Sciences 3%, Nursing 3%, Mathematics 2%. Source Scopus database

explained with the help of publications on disaster management in various journals indexed in the Scopus database, as depicted in Figs. 2, 3, 4 and 5. Figures 2, 3 and 4 illustrate the importance of the topic selected in this research: they show that researchers and scientists around the globe are working on disaster management due to the increasing frequency of natural and man-made disasters. Figure 2 shows the year-wise distribution of publications on disaster management, and Fig. 3 demonstrates the subject-wise distribution, while Fig. 4 presents an analysis based on the type of document published and Fig. 5 shows country-wise publication details. All these data were fetched from the Scopus database.

2.1 Disaster Management Munawar et al. [2] explored possible ways to establish smart cities in order to minimize, with the help of disruptive technologies, the potential loss that may be caused by disasters.


Fig. 4 Type of documents published on the topic of natural disasters. Source Scopus database

Fig. 5 Country-wise publication on the topic of natural disasters (number of publications for the United States, United Kingdom, India, Germany, Canada, Indonesia, South Korea and Taiwan). Source Scopus database

Munawar et al. [2] considered the use of advanced connections and communications, the use of smartphones, and the Internet of things. The authors also examined the role of AI, big data, and advanced robotics. Munawar et al. [2] concluded that AI is the most crucial technology of recent times, with immense potential to entirely change how disasters are handled. Additionally, big data and the IoT are other emerging technologies that help effectively manage relief work post-disaster. Paul and Bhaumik [3] explored the possibility of using IoT devices with AI for early prediction, recovery, and response in disaster management. IoT devices can collect real-time data for different events, which can be helpful for the prediction and monitoring of disastrous occurrences using AI approaches. IoT devices are also beneficial for keeping track of equipment deployed in disaster management.


Yu et al. [4] performed a detailed review of the literature relating to big data, analyzed its role in managing natural disasters, and highlighted the current state of technology in providing practical and applicable solutions for managing a natural disaster. Yu et al. [4] illustrated the conclusions of several researchers on diverse scientific and technical approaches that affect the effectiveness of big data in reducing losses caused by natural disasters and in managing relief work effectively. Karimiziarani et al. [5] highlighted the role of social media during natural calamities and after their occurrence and presented an exhaustive analysis of a Twitter dataset for a specific time and geographic location. Khatoon et al. [6] addressed the availability of automated, computer-based tools for extracting and analyzing social media information. People post their opinions on social networking sites, including Twitter, Instagram, and Facebook. With the increasing demand to stay connected and relevant, we can see rapid growth in various social media sites where people form and express their opinions on day-to-day issues. Sentiment analysis, also known as opinion mining, determines the mood behind an opinion. Shekhawat et al. [7] introduced a hybrid variant of SMO with k-means clustering (SMOK). SMOK is deployed for deciding the optimal cluster head in the dataset. Shekhawat et al. [7] initialized the SMO solutions with k-means and iteratively evaluated the solutions using fitness. This approach eliminated the random initialization of solutions and replaced it with k-means, leading to an enhanced convergence rate. Pandey et al. [8] hybridized cuckoo search with k-means for analyzing tweets. Both [7] and [8] used existing datasets for their studies. Chamola et al. [9] conducted an in-depth survey on disaster and pandemic management and the role of machine learning in this field. Significant challenges identified by Chamola et al. [9] are the requirement for specific data with proper feature extraction and selection, the requirement for a large amount of data, and inaccurate data. As per the survey carried out by Chamola et al. [9], feature selection can improve the performance of existing models and has, to date, been little exploited for disaster management. Nature-inspired algorithms are best suited for the selection of optimal feature sets. The existing image systems lack adequate accuracy and must be enhanced for proper utilization. There are very few methods available for selecting the prominent features from images to detect the loss caused by a disaster, and only rare works report disaster prediction and loss estimation using social media data. Fan et al. [10] proposed a vision of a digital twin for disaster cities, with multi-data sensing for data collection, integration and analytics, multi-actor game-theoretic decision-making, and dynamic network analysis. Fan et al. [10] found that social media helps quickly identify critical events, such as power cuts, the need for food, and the need for shelter. Social media data are useful for identifying public opinion about post-disaster relief work. This type of analytics enhances the decision-making process and provides early feedback on government policies. Finding out public and consumer opinion has long been a big business for marketing and public relations. An essential feature of social media is that it allows anyone, from anywhere, to participate.
One can freely express thoughts and ideas without revealing one's identity and without fear of unwanted consequences. Therefore,


these views are precious. However, this anonymity also comes at a price: people with hidden or malicious intentions can easily game the system by posing as independent members of the public and publishing false opinions for advertising or promotion, or to defame target products, services, organizations, or individuals, without revealing their true intentions or the person or organization secretly behind them. Therefore, it is necessary to analyze social media posts to understand people's opinions. Nowadays, sentiment analysis on various topics like products, services, movies, and daily social issues has become very important for companies, as it helps them understand their users. Twitter is a micro-blogging site that permits receiving and sending small posts; these posts are tweets. Tweets can include text, photos, videos, and relevant resources associated with a particular topic. Sentiment analysis is the process of finding the opinions or emotions directed at a specific topic, i.e. of determining whether a post is neutral, positive, or negative. This process is helpful for data analysis and for understanding the writer's experience. It is a significant area of data mining that identifies and transforms sentiment available on social media. Therefore, sentiment analysis of Twitter data is an area that has generated much interest over the past decade. It requires breaking down the "tweet" to determine user sentiment. Tan et al. [11] also analyzed the role of AI in disaster management: artificial neural network (ANN)-based research accounted for more than 20% of the work on disaster management, followed by support vector machines (SVM) and fuzzy logic (FL) with 12% and 8%, respectively. Other popular techniques are regression algorithms (RA), genetic algorithms (GA), particle swarm optimization (PSO), and other classifiers. Tan et al. [11] raised the issue of the severity of the process at each stage of disaster management. Ivić [12] focused on analyzing geospatial data for disasters, mainly targeting machine learning-based approaches. Abid et al. [13] discussed how AI could help accelerate the disaster management process; this paper advocates that AI models help strengthen disaster management. Currently, research focuses on responding to and mitigating a disaster. Geospatial technology continues to grow and delivers timely solutions; thus, it is compulsory to conceive of and consider the geographical significance. Sun et al. [14] provided a summary of contemporary applications of AI in managing disasters. Disaster management is categorized into four different classes: relief, preparedness, reaction, and healing. Sun et al. [14] found that most AI applications concentrate on the disaster reaction step, while early prediction is a critical task. Saleem and Mehrotra [15] discussed some recent research on disaster management using information collected from social media. In the old days, newspapers, radio, and television were the only media for getting news and information on recent happenings, but nowadays we are equipped with smartphones and have several social media platforms to stay up to date. A short description of disaster management activities using AI is illustrated in Table 1.
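As a minimal, hedged illustration of the tweet-level sentiment labelling described here (not a method taken from any of the surveyed papers), the TextBlob library's polarity score can be thresholded into the three classes; the example posts are invented:

```python
from textblob import TextBlob  # pip install textblob

def label(tweet: str, eps: float = 0.05) -> str:
    """Map TextBlob polarity in [-1, 1] to a coarse sentiment class."""
    p = TextBlob(tweet).sentiment.polarity
    if p > eps:
        return "positive"
    if p < -eps:
        return "negative"
    return "neutral"

# Hypothetical disaster-related posts, for illustration only.
for t in ("Rescue teams are doing an amazing job after the flood.",
          "No power, no food, the relief work here is terrible.",
          "Road to the shelter reopened at 9 am."):
    print(label(t), "-", t)
```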


Table 1 Disaster management with technology

Author [References] | Year | Target natural disaster | Technology used | Area of research | Remark
Tang et al. [16] | 2005 | Tropical cyclone intensity | Apriori-based | Data mining | Case study of the Atlantic Basin
Anzengruber et al. [17] | 2013 | Prevent crowd disasters | SVM, linear regression (LR), and Gaussian model | ML | GPS sensors deployed; case study of San Francisco
Buranasing and Prayote [18] | 2014 | Storm intensity | Symbolic aggregate approximation (SAX) and ANN | ML | Satellite image data used to predict the intensity of typhoons and tropical cyclones
Wolshon [19] | 2015 | Activating contraflows | Decision trees | ML | Weka tool used for hurricane prediction
Hassija et al. [20] | 2016 | Forecast flood risk | ANN | IoT and ML | Zigbee and WSN techniques used
Gupta et al. [21] | 2016 | Classification of crowd situation | Deep CNN and random forest | ML | UMN, UCSD, and Pets2009 datasets
Mori et al. [22] | 2016 | Disaster recognition | BB-SVM | ML | Enhanced existing ERESS model; case study of Kansai University
Dey et al. [23] | 2017 | Assessing areas | Ad hoc network | UAV (aerial mesh network) | Raspberry Pi with NOIR Pi camera employed for a case study of the UEM Campus
Amit and Aoki [24] | 2017 | Flood and landslide detection | Convolutional neural network | ML | Case study of Japan and Thailand
Jiang et al. [25] | 2017 | Determine an evacuation route | LSTM model | ML | Spatio-temporal features used; case study of the Kumamoto earthquakes
Tian and Jiang [26] | 2018 | Planning evacuation route | Reinforcement learning | ML | Case study of a Hong Kong fire outbreak
Shaiba et al. [27] | 2018 | Sandstorm detection | CART decision tree, Naïve Bayes, and LR | ML | Case study of Riyadh, Dammam, and Jeddah
Sadhukhan et al. [28] | 2018 | Extracting useful information | LR, SVM, and voting classifier | Social media | Twitter data for Chennai (rainfall)
Xu et al. [29] | 2018 | Crowd evacuation | CLOTHO (APA and APA-RF) | IoT | Case study of a chemical plant in Nanjing, China
Zhang et al. [30] | 2019 | Disaster assessment | Star algorithm, Tabu search, and gradient descent | UAV | Multi-UAVs used in the case of the Jiuzhai valley earthquakes
Terzi et al. [31] | 2019 | Disaster assessment | SWIFTERS | UAV | DJI UAV, ROS library, and MapServer
Kumar et al. [32] | 2019 | Prediction | Geofencing | UAV | UAV base station and flight controller; Surat, India
Balamurugan and Manojkumar [33] | 2019 | Rainfall prediction | ANN and LR | IoT and ML | LoRaWAN used; case study of the UEM Campus
Wang et al. [34] | 2019 | Determine an evacuation route | K-medoids and reinforcement learning | ML | Office scenario considered
Wei and Sheng [35] | 2019 | Classification of crowd situation | CNN classifier and K-means | ML | Video data considered
Shibata and Yamamoto [36] | 2019 | Determine an evacuation route | Deep neural networks | ML | Raspberry Pi 3 and spectrum analyzer deployed for the case of Ritsumeikan University
Yabe and Ukkusuri [37] | 2019 | Foresee returning patterns of evacuees | Gradient boosting | Social media | Twitter data for New Jersey (Hurricane Sandy)
Assery et al. [38] | 2019 | Identifying useful tweets for helpful information | LR and Naive Bayes | Social media | Twitter data for Hurricane Florence and Hurricane Michael
Akshya and Priyadarsini [39] | 2019 | Detecting flood areas | PCA, K-means clustering, and SVM | ML and UAV | Drones used for aerial images
Jamali et al. [40] | 2019 | Linkage of tweets with disaster | Dirichlet regression and dynamic query expansion (DQE) | Social media | Twitter data for New York City (hurricanes)
Anbarasan et al. [41] | 2020 | Flood prediction | CNN | IoT and ML | Hadoop MapReduce used; case study of Surat, India
Ginantra et al. [42] | 2020 | Detecting ISPA | SVM, KNN, neural networks, and Naïve Bayes | ML | Case study of Indonesia


2.2 Recent Development in NIA and Their Applications in Disaster Management Heuristic and metaheuristic optimization techniques were developed from the late 1970s onwards to overcome the drawbacks of traditional optimization techniques. These newer methods cope with nonlinear, discontinuous, discrete, multi-objective, uncertain, and dynamic problems while being robust and globally performing. Intelligent natural phenomena have led to the development of NIAs. Swarm and evolutionary algorithms (refer to Fig. 6) are the two basic categories of NIAs. Evolutionary algorithms are based on the natural evolution of living beings and selection by nature, while swarm intelligence-based algorithms are based on the collective intelligent behavior of simple individual agents. Individual agents are straightforward and have little or no memory. These agents communicate with each other and share information among themselves. In most species, work is divided among the agents, fulfilling the concept of division of labor. Swarming behavior has two necessary and sufficient conditions: self-organization and division of labor. The swarm follows the rule of self-organization, and specialized individuals perform specific tasks without any central control. Some popular algorithms inspired by natural phenomena are discussed in the subsequent sections. The spider monkey optimization (SMO) algorithm imitates the social behavior of spider monkeys. The SMO algorithm was developed by Bansal et al. [43] in 2013 and is one of the recent population-based approaches in swarm intelligence. The spider monkey belongs to the family Atelidae and is found in the tropical forests of Central and South America. This species of monkey has long prehensile tails and long limbs and uses its tail as a fifth limb when foraging in the high canopy. Generally, these

Fig. 6 Classification of nature-inspired algorithms


monkeys live in groups and split into small groups to fulfill their food requirements. Bansal et al. [43] studied the food foraging behavior of spider monkeys and developed the SMO algorithm. This class of monkeys follows a unique social system while foraging, namely the fission–fusion social structure (FFSS). In this society, animals split (fission) into small groups for foraging and merge (fusion) into a large group at night for security. This society has three significant elements: composition, subgroup size, and dispersion. The group consists of 40–50 members and is headed by the most senior female member. This oldest female is designated the global leader and is responsible for splitting the group into small groups (also called subgroups) to meet food requirements. Subgroups are in turn headed by a senior female member, termed the local leader of that group. In spider monkeys, the size of parent groups and subgroups is dynamic [43].
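For flavour, here is a simplified Python rendition of one SMO step, the local leader phase. The update rule follows our reading of Bansal et al. [43]; the fitness function, parameter values, and perturbation-rate logic are placeholder assumptions, not a faithful reimplementation.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def local_leader_phase(swarm, local_leader, pr=0.7):
    """Simplified SMO local-leader update (our reading of [43]):
    SM_new = SM + U(0,1)*(LL - SM) + U(-1,1)*(SM_r - SM),
    applied per dimension with perturbation rate pr."""
    n, d = swarm.shape
    new = swarm.copy()
    for i in range(n):
        r = int(rng.integers(n))           # random group member SM_r
        for j in range(d):
            if rng.random() < pr:          # perturb this dimension
                new[i, j] = (swarm[i, j]
                             + rng.random() * (local_leader[j] - swarm[i, j])
                             + rng.uniform(-1, 1) * (swarm[r, j] - swarm[i, j]))
    return new

# Tiny demo on the sphere function as a placeholder fitness.
swarm = rng.uniform(-5, 5, size=(8, 2))
fitness = (swarm ** 2).sum(axis=1)
local_leader = swarm[fitness.argmin()]     # best member acts as local leader
print(local_leader_phase(swarm, local_leader).round(3))
```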

3 Conclusion This paper performed a detailed survey of the role of AI in disaster management. The last few decades have witnessed many deadly natural disasters and a steep hike in their frequency. This surge in the number of disasters has increased the concern of governments and forced them to equip their emergency services with state-of-the-art technologies, so that optimal use of available resources becomes possible and loss can be minimized. This paper highlighted recent AI developments for disaster management. The paper first presented a detailed discussion of disaster management; then, we discussed various NIAs that can deal with large datasets. Eventually, we outlined various challenges, open issues, and future research directions.

References 1. GFDRR (2018) Machine learning for disaster risk management. Washington, DC: Global Facility for Disaster Reduction and Recovery (GFDRR) 2. Munawar HS, Mojtahedi M, Hammad AW, Kouzani A, Mahmud MP (2022) Disruptive technologies as a solution for disaster risk management: a review. Sci Total Environ 806:151351 3. Kamal Paul S, Bhaumik P (2022) Disaster management through integrative AI. In: 23rd International conference on distributed computing and networking, pp 290–293 4. Yu M, Yang C, Li Y (2018) Big data in natural disaster management: a review. Geosciences 8(5):165 5. Karimiziarani M, Jafarzadegan K, Abbaszadeh P, Shao W, Moradkhani H (2022) Hazard risk awareness and disaster management: extracting the information content of twitter data. Sustain Cities Soc 77:103577


6. Khatoon S, Asif A, Hasan MM, Alshamari M (2022) Social media-based intelligence for disaster response and management in smart cities. In: Artificial intelligence, machine learning, and optimization tools for smart cities. Springer, Cham, pp 211–235 7. Shekhawat SS, Shringi S, Sharma H (2021) Twitter sentiment analysis using hybrid spider monkey optimization method. Evol Intel 14(3):1307–1316 8. Pandey AC, Rajpoot DS, Saraswat M (2017) Twitter sentiment analysis using hybrid cuckoo search method. Inf Process Manage 53(4):764–779 9. Chamola V, Hassija V, Gupta S, Goyal A, Guizani M, Sikdar B (2021) Disaster and pandemic management using machine learning: a survey. IEEE Internet of Things J 8(21):16047–16071. https://doi.org/10.1109/JIOT.2020.3044966 10. Fan C, Zhang C, Yahja A, Mostafavi A (2021) Disaster City Digital Twin: a vision for integrating artificial and human intelligence for disaster management. Int J Inf Manage 56:102049 11. Tan L, Guo J, Mohanarajah S, Zhou K (2021) Can we detect trends in natural disaster management with artificial intelligence? A review of modeling practices. Nat Hazards 107(3):2389–2417 12. Ivić M (2019) Artificial intelligence and geospatial analysis in disaster management. Int Arch Photogrammetry Remote Sens Spatial Inf Sci 13. Abid SK, Sulaiman N, Chan SW, Nazir U, Abid M, Han H, Ariza-Montes A, Vega-Muñoz A (2021) Toward an integrated disaster management approach: how artificial intelligence can boost disaster management. Sustainability 13(22):12560 14. Sun W, Bocchini P, Davison BD (2020) Applications of artificial intelligence for disaster management. Nat Hazards 103(3):2631–2689 15. Saleem S, Mehrotra M (2022) Emergent use of artificial intelligence and social media for disaster management. In: Proceedings of international conference on data science and applications. Springer, Singapore, pp 195–210 16. Tang J, Yang R, Kafatos M (2005) Data mining for tropical cyclone intensity prediction. In: Sixth conference on coastal atmospheric and oceanic prediction and processes 17. Anzengruber B, Pianini D, Nieminen J, Ferscha A (2013) Predicting social density in mass events to prevent crowd disasters. 8238(11):206–215 18. Buranasing A, Prayote A (2014) Storm intensity estimation using symbolic aggregate approximation and artificial neural network. Int Comput Sci Eng Conf (ICSEC) 2014:234–237 19. Wolshon B (2008) Contraflow for evacuation traffic management. Springer US, Boston, MA, pp 165–170. https://doi.org/10.1007/978-0-387-35973-1_210 20. Hassija V, Chamola V, Saxena V, Jain D, Goyal P, Sikdar B (2019) A survey on IoT security: application areas, security threats, and solution architectures. IEEE Access 7:82721–82743 21. Gupta T, Nunavath V, Roy S (2019) Crowdvas-net: a deep-CNN based framework to detect abnormal crowd-motion behavior in videos for predicting crowd disaster. In: 2019 IEEE international conference on systems, man and cybernetics (SMC), pp 2877–2882 22. Mori K, Wada T, Ohtsuki K (2016) A new disaster recognition algorithm based on SVM for ERESS: buffering and bagging-SVM. In: 2016 45th International conference on parallel processing workshops (ICPPW), pp 22–30 23. Dey S, Kamal MN, Dutta S, Tiwari A, Ray S, Moatasimbillah MJ, Saha N, Adhikary N, Mukherjee D, Nayak S, Dey R, Saha S (2017) Ad-hoc networked UAVs as aerial mesh network for disaster management application and remote sensing: an approach. In: 2017 8th IEEE annual information technology, electronics and mobile communication conference (IEMCON), pp 301–304 24. Amit SNKB, Aoki Y (2017) Disaster detection from aerial imagery with convolutional neural network. In: 2017 International electronics symposium on knowledge creation and intelligent computing (IES-KCIC), pp 239–245 25. Jiang F, Zhong L, Thilakarathna K, Seneviratne A, Takano K, Yamada S, Ji Y (2017) Supercharging crowd dynamics estimation in disasters via spatio-temporal deep neural network. In: 2017 IEEE international conference on data science and advanced analytics (DSAA), pp 184–192 26. Tian K, Jiang S (2018) Reinforcement learning for safe evacuation time of fire in Hong Kong-Zhuhai-Macau immersed tube tunnel. Syst Sci Control Eng 6:45–56


27. Shaiba HA, Alaashoub NS, Alzahrani AA (2018) Applying machine learning methods for predicting sand storms. In: 2018 1st international conference on computer applications information security (ICCAIS), pp 1–5 28. Sadhukhan S, Banerjee S, Das P, Sangaiah AK (2018) Producing better disaster management plan in post-disaster situation using social media mining. In: Computational intelligence for multimedia big data on the cloud with engineering applications. Elsevier, pp 171–183 29. Xu X, Zhang L, Sotiriadis S, Asimakopoulou E, Li M, Bessis N (2018) Clotho: a large-scale internet of things-based crowd evacuation planning system for disaster management. IEEE Internet Things J 5(5):3559–3568 30. Zhang Z, Wu J, He C (2019) Search method of disaster inspection coordinated by multi-UAV. In: 2019 Chinese control conference (CCC), pp 2144–2148 31. Terzi M, Anastasiou A, Kolios P, Panayiotou C, Theocharides T (2019) Swifters: a multi-UAV platform for disaster management. In: 2019 International conference on information and communication technologies for disaster management (ICT-DM), pp 1–7 32. Kumar JS, Pandey SK, Zaveri MA, Choksi M (2019) Geo-fencing technique in unmanned aerial vehicles for post-disaster management in the internet of things. In: 2019 Second international conference on advanced computational and communication paradigms (ICACCP), pp 1–6 33. Balamurugan MS, Manojkumar R (2019) Design of disaster management based on artificial neural network and logistic regression. Association for Computing Machinery, New York, NY, USA 34. Wang Q, Liu H, Gao K, Zhang L (2019) Improved multi-agent reinforcement learning for path planning-based crowd simulation. IEEE Access 7:73841–73855 35. Wei G, Sheng Z (2019) Image quality assessment for intelligent emergency application based on deep neural network. J Vis Commun Image Representation 63:102581. http://www.sciencedirect.com/science/article/pii/S1047320319301968 36. Shibata K, Yamamoto H (2019) People crowd density estimation system using deep learning for radio wave sensing of cellular communication. In: 2019 International conference on artificial intelligence in information and communication (ICAIIC), pp 143–148 37. Yabe T, Ukkusuri S (2019) Integrating information from heterogeneous networks on social media to predict post-disaster returning behavior. J Comput Sci 32:02 38. Assery N, Xiaohong Y, Almalki S, Kaushik R, Xiuli Q (2019) Comparing learning-based methods for identifying disaster-related tweets. In: 2019 18th IEEE international conference on machine learning and applications (ICMLA), pp 1829–1836 39. Akshya J, Priyadarsini PLK (2019) A hybrid machine learning approach for classifying aerial images of flood-hit areas. In: 2019 International conference on computational intelligence in data science (ICCIDS), pp 1–5 40. Jamali M, Nejat A, Ghosh S, Cao G (2018) Social media data and post-disaster recovery. Int J Inf Manage 44:25–37 41. Anbarasan M, Muthu B, Sivaparthipan C, Sundarasekar R, Kadry S, Krishnamoorthy S, Dasel AA (2020) Detection of flood disaster system based on IoT, big data and convolutional deep neural network. Comput Commun 150:150–157. http://www.sciencedirect.com/science/article/pii/S0140366419310357 42. Ginantra NLWSR, Indradewi IGAD, Hartono E (2020) Machine learning approach for acute respiratory infections (ISPA) prediction: case study Indonesia. J Phys Conf Ser 1469:012044 43. Bansal JC, Sharma H, Jadon SS, Clerc M (2014) Spider monkey optimization algorithm for numerical optimization. Memetic Comput 6(1):31–47

A Survey on Plant Leaf Disease Detection Using Image Processing U. Lathamaheswari and J. Jebathangam

Abstract Agriculture plays a major role in the growth of a country. India ranks second in farm output worldwide, and the development of agriculture has the potential to boost the Indian economy. But agriculture faces many issues in developing nations, one of the most important of which is the variety of plant diseases, as they affect not only the growth of plants but also the production of crops. Identification of a disease with the naked eye can be expensive and needs an expert's advice. In this paper, we discuss various research works and techniques that have developed frameworks to find defects in plant growth and can easily identify leaf diseases in plants. The paper provides a better understanding of leaf disease detection using image processing. If this technology is developed further, with fast and reliable solutions that automatically detect diseases based on color, shape and texture, it would be a boon to farmers and to the growth of agriculture. Keywords Segmentation · Feature extraction · Classification · Support vector machine · Random forest · Convolutional neural network · K-nearest neighbor · Naïve Bayes algorithms

1 Introduction Agriculture supports the growth of India. India is a country of diversity, where 70% of the people rely on agriculture for their income and daily needs. As stated earlier, India is an agricultural country whose people depend highly on agricultural production. The farm produce majorly affects the nation's economy, which highly depends on the quality and quantity of crop production. Besides natural calamities U. Lathamaheswari (B) · J. Jebathangam Department of Computer Science, VISTAS, Chennai, India e-mail: [email protected] J. Jebathangam e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_57


like drought, earthquakes and inadequate rainfall, various types of diseases are also a factor in major farm yield losses. The quality and quantity of the farm yield are majorly affected by these unknown diseases. Detecting and identifying a disease with the naked eye is a tedious and time-consuming job; such a task also has less accuracy and more limitations. There is also the Kisan Call Center (KCC) to help farmers verify their doubts about the kind of disease, but it is not always available at the required time [1]. So, to overcome these problems and to provide solutions, we can develop systems using image-processing techniques to identify diseases at an early stage [2–5]. Researchers have concentrated more on the quality of the crop, as it provides more profit with less cost and time, along with increasing crop production. Researchers have used a huge variety of crops in their work to detect the types of diseases and classify them accordingly, and they have also used machine learning techniques along with AI. To get images of diseased leaves, we use an image acquisition process; to identify the disease, we need to follow various tools and techniques that can be applied to the image data; and for classification, researchers have used various classifiers that can give fast, reliable, robust and accurate solutions.

2 Literature Review Many researchers have contributed to the development of plant leaf disease detection systems based on various image processing tools and techniques. The goal of this section is to describe some of those techniques which are relevant to detecting leaf disease using image processing. Das et al. [6] use various feature extraction techniques to improve the accuracy of the classification system. Support vector machine (SVM), random forest and logistic regression are the different classification tools used to identify and classify the different types of leaf diseases, among which the traditional SVM gives the most accurate result. The disadvantage is that a large image dataset was used for training and testing, which can reduce the speed of the system. Vaishnnave et al. [7] present a methodology for groundnut leaf disease detection; to improve on the existing algorithm, the authors used KNN classification instead of the former SVM algorithm. The advantage is that the number of images used for classification is reduced, making the model fast and accurate; the disadvantage is that only four diseases are identified, which can lead to false classifications. Singh and Misra [8] provide an image classification and segmentation methodology using a genetic algorithm. It covers five classes of leaf diseases and can be used for leaf disease detection in any plant. The main advantage of this method is that it uses many sample leaf images from a variety of plants and identifies the disease at an initial stage. The disadvantage is that it may take more time to perform the task, as it classifies various plants using a single algorithm.


Dhaygude and Khumbhar [9] acquire the image in RGB format and then transform it into hue for better perception; the green pixels are then masked, as they mostly represent the healthy areas of the leaf and are not needed for disease identification. From this process, the infected region is extracted easily and segmented into an equal number of patches. The extracted segments are used for texture analysis by means of a color co-occurrence matrix. At last, the texture parameters obtained from the diseased part are compared with the texture parameters of a normal leaf, and the results are derived. The disadvantage is that a result based only on the color of the segmented image may not be accurate for some leaves. Kumar et al. [10] present a system that uses a multiclass SVM classifier to find and classify diseased leaf images. In the initial stage, the acquired RGB image is converted into HSI, and the image is then segmented using the k-means clustering technique. Disease identification is done by feature extraction; here the authors extract nine types of features, including texture, color and shape, from the infected leaf image, and classification is then done using the multiclass SVM classifier. The advantage is that this method is efficient for segmentation; the disadvantage is that too many features are included, which may lead to misclassification of images. Sardogan et al. [11] clarify that they preferred deep learning with a CNN architecture for the classification of plant leaf disease, as it can easily classify and identify objects without any preprocessing. The authors used a dataset of 500 images, divided into 400 images for training and the remaining 100 images for testing. In this methodology, the three matrices for the R, G, B channels were used as input to the CNN model, and the output was fed into a neural network called learning vector quantization (LVQ), proposed by Kohonen [12], which connects competitive learning with supervised learning. They achieved an average accuracy rate of 88%. The only disadvantage is that their proposed model can be used only for tomato plant-related diseases. Revathi and Hemalatha [13] propose a method of edge detection-based image segmentation using a homogeneous pixel counting technique for cotton disease detection. In this method, the green region of the acquired RGB leaf image is masked and removed, based on edge detection using two algorithms, namely Sobel and Canny, with homogeneous operating techniques. The RGB features are calculated by calling a ranging function for each and every pixel. The texture parameters are then compared for the healthy and infected leaf. The main advantages of this model are that its accuracy is higher than the existing system and that the system is developed with outputs in three languages, English, Tamil and Hindi, for farmers to easily understand the process. Jebathangam and Purushothaman [14] proposed a method for segmenting images using k-means. The required features are extracted using wavelets, and an ANN algorithm is used for classification into benign and malignant cases.


3 Methodology

Automated detection of leaf disease is very important nowadays, as it is essential to identify crop loss at an early stage and is very useful for monitoring large crop fields. Therefore, a method that is fast, accurate and automatic can provide realistic, significant benefits to agriculture. Machine learning methods have provided researchers with numerous tools and techniques to identify and classify these diseases. Some examples follow:

3.1 Image Preprocessing

This is the first step in image processing. The aim of this step is to improve the quality of the image acquired with a digital camera: the image is resized and enhanced to remove unwanted distortions, so that the models can work on clean data. Steps involved in image preprocessing include:

• Read image
• Resize image
• Denoise image with Gaussian blurring
• Histogram equalization
• Correct rotation and transformation.

As in paper [8], image softening using a smoothening filter and contrast enhancement are also performed.
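To make these steps concrete, a minimal OpenCV sketch of such a preprocessing pipeline is given below; the file name, target size and rotation angle are illustrative assumptions, not taken from any surveyed paper.

```python
# Minimal preprocessing sketch: read, resize, denoise, equalize, rotate.
import cv2

img = cv2.imread("leaf.jpg")                 # read image (assumed file name)
img = cv2.resize(img, (256, 256))            # resize to a fixed working size
img = cv2.GaussianBlur(img, (5, 5), 0)       # denoise with Gaussian blurring
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.equalizeHist(gray)                # histogram equalization
# correct rotation: rotate about the image center by a chosen angle (0 here)
h, w = gray.shape
M = cv2.getRotationMatrix2D((w / 2, h / 2), 0, 1.0)
gray = cv2.warpAffine(gray, M, (w, h))
```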

3.2 Detection of Model

The model works on the diseased part to identify and classify it, so we need to detect that part using image segmentation. Segmentation is the process of dividing the image into multiple segments or pixel groups to make it easier and more efficient to work on. Various tools are available for the segmentation process. In paper [8], the author used a genetic algorithm to segment the image. In [1, 2], the authors used masking or thresholding to separate features in the image based on a specific pixel value; this value is known as the threshold value, and it determines the part to be masked. Using this, the green-colored pixels are mostly masked out, as they are not needed for further processing, and the diseased part is highlighted for further analysis.


K-means clustering is also a separating or partitioning method, used in paper [10] to divide the points in the feature space and assign each to its nearest cluster center; the Euclidean distance is used to calculate similarity to the cluster centroids. The Otsu segmentation method [15] is a global adaptive binary threshold segmentation algorithm that selects the threshold maximizing the inter-class variance between the background and the target.
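As an illustration of these two segmentation styles, the following hedged sketch applies Otsu thresholding and K-means color clustering with OpenCV; the input file and the choice of three clusters are assumptions.

```python
# Illustrative sketch (not from any surveyed paper): Otsu thresholding and
# K-means color segmentation of a leaf image with OpenCV.
import cv2
import numpy as np

img = cv2.imread("leaf.jpg")                      # assumed input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Otsu's method: picks the threshold that maximizes inter-class variance
_, otsu_mask = cv2.threshold(gray, 0, 255,
                             cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# K-means: partition pixels into k color clusters (e.g. diseased vs. healthy)
pixels = img.reshape(-1, 3).astype(np.float32)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, 3, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)
segmented = centers[labels.flatten()].astype(np.uint8).reshape(img.shape)
```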

3.3 Feature Extraction and Training

Extracting features from the segments is a crucial stage in image processing using deep learning techniques; some features contain unique representations of leaf patterns that help differentiate the classes in the subsequent classification. The process by which the model learns these features from the dataset is called model training. Statistical texture-based features can be extracted from the image using the gray level co-occurrence matrix (GLCM); the graycomatrix function in MATLAB can be used to compute the GLCM. In [8], the authors used the color co-occurrence methodology instead of the traditional grayscale approach; in the color co-occurrence methodology, both the texture and the color of an image are considered when extracting its unique features.
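A small sketch of GLCM-based texture features is shown below; it uses scikit-image's graycomatrix/graycoprops, the Python counterpart of the MATLAB function mentioned above, and the stand-in patch, distances and angles are illustrative assumptions.

```python
# Hedged sketch: GLCM texture features with scikit-image.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

gray = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # stand-in patch
glcm = graycomatrix(gray, distances=[1], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)
# common statistical texture descriptors averaged over distances/angles
features = {prop: graycoprops(glcm, prop).mean()
            for prop in ("contrast", "homogeneity", "energy", "correlation")}
print(features)
```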

3.4 Classification of the Object

Classification is the process by which the image is classified and grouped into a predefined class based on the specific features extracted in the previous step. Classification in image processing is divided into two methods: (a) supervised classification and (b) unsupervised classification.

(a) Supervised classification is the process in which the user selects pixels from the image, where each pixel represents a specific class, and lets the software use this reference training set to classify the remaining pixels in the image. Every step occurs under the guidance of the user.

(b) Unsupervised classification is the process in which the software analyzes the image without the user directing it to specific pixels; the software determines which pixels are related and should be placed in which classes. Some of the most common algorithms used are cluster analysis, neural networks, K-nearest neighbor and anomaly detection.


3.5 Identification

3.5.1 Shape- and Texture-Based Identification

Das et al. [6] use several shape- and texture-based classification tools, such as the SVM classifier, Random Forest and the Logistic Regression model. Among these, SVM is a traditional machine learning algorithm that can successfully classify plant images into several classes, but it falls short in some respects, and several features must be extracted to reach the required accuracy. K-means clustering, used in paper [10], partitions the points in the feature space around their nearest cluster centers. In [8], the color co-occurrence methodology is used instead of the traditional grayscale approach to extract the texture features from the image.

3.5.2 Deep Learning-Based Identification

Neural networks have the distinct ability to derive meaning from complex or imperfect data that cannot easily be recognized by humans or other computing techniques. Training a deep convolutional neural network for plant leaf classification is described in paper [11], where the authors note that the deep convolutional neural network is a newly evolving technique that not only classifies the leaf disease but also makes the process easier by requiring no preprocessing. A CNN consists of several hidden layers, and each layer performs a specific task.

4 Conclusion

The techniques discussed above are drawn from the reviewed papers of previous authors, whose various tools are used in the classification and identification of plant diseases. Recognizing and distinguishing an infected leaf from a healthy leaf is the crucial part of this research in image processing. If a proposed model is accepted globally, it could be a boon to the agricultural field. As this is a vast research area, it deserves more attention. Using the data provided by the various authors, future directions in image processing can be outlined.


References

1. Singh V, Sharma N, Singh S (2020) A review of imaging techniques for plant disease detection. Artif Intell Agric
2. Mohindru P, Kaur G, Pooja (2019) Simulative investigation of plant diseases using KNN algorithm. Int J Innovative Res Electr Electron Instrum Control Eng 7(8)
3. Bhimte NR, Thool VR (2018) Diseases detection of cotton leaf spot using image processing and SVM classifier. In: 2018 Second international conference on intelligent computing and control systems (ICICCS)
4. Devaraj A, Rathan K, Jaahnavi S, Indira K (2019) Identification of plant disease using image processing technique. In: International conference on communication and signal processing (ICCSP)
5. Islam M, Dinh A, Wahid K, Bhowmik P (2017) Detection of potato diseases using image segmentation and multiclass support vector machine. In: Canadian conference on electrical and computer engineering (CCECE)
6. Das D, Singh M, Mohanty SS, Chakravarty S (2020) Leaf disease detection using support vector machine. In: 2020 International conference on communication and signal processing (ICCSP). IEEE
7. Vaishnnave MP, Suganya Devi K, Srinivasan P, Arut Perum Jothi G (2019) Detection and classification of groundnut leaf diseases using KNN classifier. In: IEEE international conference on system, computation, automation and networking (ICSCAN)
8. Singh V, Misra AK (2017) Detection of plant leaf diseases using image segmentation and soft computing techniques. Inf Process Agric 4:41–49
9. Dhaygude SB, Kumbhar NP (2013) Agricultural plant leaf disease detection using image processing. Int J Adv Res Electr Electron Instrum Eng 2(1):599–602
10. Kumar DA, Chakravarthi PS, Babu KS (2020) Multiclass support vector machine based plant leaf diseases identification from color, texture and shape features. In: 2020 Third international conference on smart systems and inventive technology (ICSSIT)
11. Sardogan M, Tuncer A, Ozen Y (2018) Plant leaf disease detection and classification based on CNN with LVQ algorithm. In: 2018 3rd international conference on computer science and engineering (UBMK)
12. Kohonen T (1995) Improving human decision making through case based decision aiding. AI Mag 12(2):52–68
13. Revathi P, Hemalatha M (2012) Classification of cotton leaf spot diseases using image processing edge detection techniques. In: 2012 International conference on emerging trends in science, engineering and technology (INCOSET)
14. Jebathangam J, Purushothaman S (2018) Implementation of k-means for segmentation of mammogram image to identify micro calcification. J Adv Res Dyn Control Syst 10(2):314–317
15. Sabrol H, Satish K (2016) Tomato plant disease classification in digital images using classification tree. In: International conference on communication and signal processing (ICCSP)

Feature Importance in Explainable AI for Expounding Black Box Models Bikram Pratim Bhuyan and Sudhanshu Srivastava

Abstract In recent years, research on Explainable Artificial Intelligence (XAI) has risen in response to the demand for more openness and confidence in artificial intelligence (AI). This is particularly essential since AI is utilized in sensitive fields with social, ethical and security consequences. XAI work focuses largely on explaining the categorization, decision or action of machine learning (ML) systems, with thorough systematic evaluations already available. In this paper, we present a comprehensive study of the different types of XAI. We evaluate feature importance by analyzing three popular algorithms on a medical dataset. Finally, the future challenges concerning XAI are discussed. Keywords Explainable AI · Deep learning · Black box models · Artificial intelligence · Machine learning

1 Introduction

Machine learning (ML) [1] has been a major factor in research and industry in recent years. Today's ML algorithms are able to attain outstanding performance (sometimes even above the human level) on a growing number of complicated tasks, especially through advances in methodology, the availability of large databases and improved computing power. The development of deep learning models is a leading factor. However, these powerful models are often seen as "black boxes" because of their layered nonlinear structure; they do not reveal exactly what led them to their predictions. Since such a lack of transparency is unacceptable in many applications, more attention has lately been devoted to developing techniques for visualizing, explaining and interpreting deep learning models.


Explainable Artificial Intelligence (XAI) [2] is a developing study area that is rapidly becoming one of the most relevant topics in artificial intelligence (AI). AI systems with potentially broad social, ethical and safety consequences are being used in increasingly sensitive areas, including autonomous driving, meteorological simulation, clinical diagnosis, behavior modeling, embedded devices, computer vision and industrial efficiency and safety, to name a few [3]. This increasing sensitivity and accessibility leads to obvious concerns of trust, bias, accountability and process, i.e., how has the machine reached its conclusion? These issues stem from the fact that, in general, a machine learning (ML) system is basically a black box: data are fed into a trained neural network, which then produces a classification, decision or action. The inner workings of these algorithms are a mystery to the lay person (usually the person interacting with the AI); even data scientists can hardly grasp or interpret them. While the architecture and mathematics are clearly described, the inner state of the neural network (let alone an explanation of it) is relatively poorly understood. This black box situation makes it challenging for end users to trust the system they interact with. When an AI system delivers unintended outcomes, this lack of confidence typically results in the end user's mistrust and sometimes even rejection. It is not clear whether the result is "correct" or due to some flaw or bias introduced when the AI system was created, for instance because the model was trained on data that does not represent a wide variety of real-world examples or fails to capture the complexity of the target environment. The resulting misclassification, unjust treatment of members of society, illegal acts or financial implications for firms using AI solutions might have significant side effects, such as hazardous factory behavior. XAI research in machine learning and deep learning seeks to extract, from this black box, information or explanations of why the algorithm came to the conclusion or action that it took. Besides offering trust and accountability mechanisms, XAI supports debugging and the detection of misrepresentation in machine learning. Ultimately, human input (human-in-the-loop) remains in, and affects, the inputs, outputs and network architecture of machine learning algorithms, which as such are often vulnerable to human errors or biases. XAI-enabled explanations of algorithms might reveal certain faults or problems of the design (e.g., characteristics that are entirely irrelevant becoming too strong a factor in the output for a given input image). XAI aims to address these issues, giving the end user greater trust and confidence in the machine.

2 Literature Survey

This section provides an introduction to several approaches to XAI, starting with model-agnostic methods that rely on a simple surrogate function to explain the predictions. Then we explore techniques that explain by evaluating the reaction of the model to local perturbations (e.g., by utilizing gradient information or by optimization).


We next present highly efficient propagation-based explanation approaches that exploit the inner structure of the model. Lastly, we look at approaches that go beyond individual explanations toward a meta-explanation of model behavior.

2.1 Surrogate Explainability

Basic classifiers such as linear models or shallow decision trees are inherently interpretable, so it is easy to explain their predictions. Complex classifiers like deep neural networks or recurrent models, on the other hand, have several levels of nonlinear transformations, which makes it substantially harder to understand their predictions. One way to explain complicated model predictions is to approximate them locally using a simple, interpretable surrogate function. Local Interpretable Model-agnostic Explanations (LIME) [4] is a prominent method in this area. This technique samples in the vicinity of the input, evaluates the neural network at these points and attempts to fit the surrogate function so that it approximates the function of interest. If the surrogate function's input domain is human-interpretable, LIME may even explain decisions of a model with non-interpretable features. As LIME is model-agnostic, it may be used with any classifier, even without knowledge of its internals, for example the architecture or weights of a neural network. One of LIME's main drawbacks is that it needs many minutes to explain a single prediction for state-of-the-art models like GoogleNet. Like LIME, the SmoothGrad approach [5] samples the input neighborhood, here to approximate the gradient, which yields a model that locally approximates the function of interest. SmoothGrad likewise does not use the inside of the model, but it requires access to the gradients; it may thus also be seen as a gradient-based explanation approach.
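The core idea — sampling around an instance and fitting a proximity-weighted interpretable model — can be sketched as follows. This is a simplified illustration of the surrogate principle, not the actual LIME implementation; the kernel width, sample count and Gaussian sampling are assumptions.

```python
# Hedged sketch of the local-surrogate idea behind LIME: sample around an
# input, weight samples by proximity, and fit an interpretable linear model.
import numpy as np
from sklearn.linear_model import Ridge

def local_surrogate(f, x, n_samples=500, sigma=0.5, rng=None):
    """f: black-box scoring function; x: 1-D instance to explain."""
    rng = rng or np.random.default_rng(0)
    Z = x + rng.normal(0, sigma, size=(n_samples, x.size))   # local samples
    y = np.array([f(z) for z in Z])                          # query the model
    w = np.exp(-np.sum((Z - x) ** 2, axis=1) / (2 * sigma ** 2))  # proximity
    surrogate = Ridge(alpha=1.0).fit(Z, y, sample_weight=w)
    return surrogate.coef_                       # local feature importances
```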

2.2 Local Perturbation Based Explainability

Another group of techniques develops explanations by studying the reaction of the model to local changes. This covers methods that use gradient information as well as perturbation and optimization methodologies. Explanation techniques based on the gradient of the function of interest have a long history in machine learning; so-called sensitivity analysis (SA) [6] is one example. Although frequently employed as an explanation approach, SA technically explains the change in the prediction rather than the prediction itself. In addition, SA has proven to suffer from basic difficulties, such as shattered gradients and discontinuities in explanations, and is thus seen as inappropriate for explaining current AI models. Other sensitivity-based analyses solve some of these difficulties by locally averaging or integrating the gradients [5].


Perturbation-based explanation methods [7] explicitly examine the reaction of the model to more general local perturbations. While the occlusion technique [8] assesses the significance of input dimensions by masking parts of the input, the PDA methodology employs conditional sampling from a pixel neighborhood of an analyzed feature to effectively remove the relevant information. Both techniques are model-agnostic, which means that they may be applied to any classifier, but they are not particularly efficient computationally because of the requirement to evaluate the function of interest (for example, the neural network) under all perturbations. Another model-agnostic, local-perturbation strategy is the meaningful perturbation method [7]: it treats explanation as a meta-prediction problem and uses optimization to synthesize the most informative explanations. Other techniques also cast explanation as an optimization problem; within the activation maximization framework, prototypes are computed by searching for a pattern that elicits the most desirable model response.
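A hedged, model-agnostic sketch of the occlusion idea is given below; the patch size, stride and fill value are assumptions, and `predict` stands for any classifier returning class probabilities.

```python
# Hedged, model-agnostic occlusion sketch: slide a gray patch over the input
# and record how much the class score drops (cf. Zeiler and Fergus [8]).
import numpy as np

def occlusion_map(predict, image, target, patch=8, stride=8, fill=0.5):
    """predict: function mapping a batch of images to class probabilities."""
    h, w = image.shape[:2]
    base = predict(image[None])[0, target]           # unperturbed score
    heat = np.zeros(((h - patch) // stride + 1, (w - patch) // stride + 1))
    for i, y in enumerate(range(0, h - patch + 1, stride)):
        for j, x in enumerate(range(0, w - patch + 1, stride)):
            masked = image.copy()
            masked[y:y + patch, x:x + patch] = fill  # erase this region
            heat[i, j] = base - predict(masked[None])[0, target]
    return heat                                      # large drop = important
```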

2.3 Propagation-Based Explainability

Propagation-based methods to explanation do not treat the model they explain as opaque but integrate the model's underlying structure into the explanation process. Layer-wise Relevance Propagation (LRP) [9] is such an approach, applicable to generic neural network architectures including deep neural networks, LSTMs and Fisher vector classifiers [10]. Using local redistribution rules, LRP explains individual model decisions by propagating the prediction from the output back to the input. The propagation process can be theoretically embedded in the deep Taylor decomposition framework [11]. Recently, LRP has been extended to a wider range of machine learning models, e.g., for cluster assignments, by first converting the model into a neural network ("neuralization") [12]. By using the model structure together with suitable (theoretically justified) propagation rules, LRP can provide sound explanations at relatively little computational cost (one forward and one backward pass). Deconvolution and guided backpropagation [8] are other prominent explanation techniques that use the underlying structure of the model. Unlike LRP, these techniques do not explain the prediction in the sense of "how much did this input feature contribute to the prediction" but rather discover patterns in input space that relate to the analyzed network output. The literature offers several further explanation techniques that fit under the category "leverage structure"; some employ heuristics to direct the redistribution process, while others incorporate an optimization step [13]. For several of these propagation-based explanation techniques, the iNNvestigate toolkit [14] provides efficient implementations.


2.4 Metadata-Based Explainability

Furthermore, broad patterns of classifier behavior may be pooled and analyzed. Spectral relevance analysis (SpRAy) [15], a recently proposed approach, computes such meta-explanations by clustering individual heat-maps. This technique allows the classifier's prediction strategies over the whole dataset to be investigated (semi-)automatically, so that weak points in models or training datasets can be identified systematically. Another form of meta-explanation tries to better comprehend the learnt representations and to interpret them in a human-friendly way. For example, the network dissection technique assesses the semantics of hidden units, i.e., the concepts these neurons encode. Another recent study [16] tests and explains, in terms of user-defined concepts, the extent to which these concepts are essential for prediction. Ontological perspectives on XAI can also be found in works on formal concept analysis [17, 18].

3 Implementation Details

We have handpicked the diabetes dataset [19] for experimentation and analysis. The variable 'Outcome' is used as the classification target class. The random forest classifier and the multi-layered perceptron are two of the most popular black box models; Table 1 gives their results. For the multi-layer perceptron model with 12 hidden layers, the Adam solver is utilized to achieve the outcome.

Table 1 Classifier performance

Model                       Accuracy (%)
Random forest classifier    75
Multi-layered perceptron    71

We now analyze feature importance with the LIME, SHAP and ELI5 models [20]. Figure 1 illustrates the feature importance of the random forest classification model, which is utilized again when analyzing data with LIME to understand the role of the features. The influence on binary diabetes detection, indicated by '0' and '1', can be analyzed in Fig. 2; the plot shows the features that differentiate a certain class from the rest. In other words, the binary classification summary shows how the machine has learnt from its features. The SHAP values were drawn as a summary plot to better comprehend the dimensions, as seen in Fig. 3. Each point represents the Shapley value of one feature dimension for one data instance; the position on the x-axis is determined by the Shapley value. Analyzing the picture for the blood pressure feature, low feature values (blue) correspond to low Shapley values, and as the color changes from blue to red the Shapley value increases. Along the y-axis, coinciding points are jittered so that the distribution of Shapley values per feature is visible. The features are ordered according to their importance.

Fig. 1 Feature importance from LIME

Fig. 2 Feature importance from SHAP
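A minimal sketch of this experimental setup is given below; it is not the authors' exact code, and the CSV file name and the train/test split are assumptions.

```python
# Hedged sketch: random forest and MLP classifiers on the Pima Indians
# diabetes dataset, with a LIME explanation of one prediction.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from lime.lime_tabular import LimeTabularExplainer

df = pd.read_csv("diabetes.csv")                 # assumed file name
X, y = df.drop(columns="Outcome"), df["Outcome"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
mlp = MLPClassifier(solver="adam", max_iter=1000, random_state=0).fit(X_tr, y_tr)
print("RF accuracy:", rf.score(X_te, y_te))
print("MLP accuracy:", mlp.score(X_te, y_te))

# Explain a single prediction of the random forest with LIME
explainer = LimeTabularExplainer(X_tr.values, feature_names=list(X.columns),
                                 class_names=["0", "1"], mode="classification")
exp = explainer.explain_instance(X_te.values[0], rf.predict_proba)
print(exp.as_list())                             # per-feature contributions
```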

3.1 ELI5

ELI5, or "Explain Like I'm 5", is a Python package that allows you to debug and inspect several machine learning models through a uniform API. It has built-in support for many machine learning frameworks and delivers a uniform approach to explaining black box models. It gains insight from a tree-based model by permuting (shuffling) the values of each feature one at a time and observing how this affects the model's performance; the permutation importance technique is named after this procedure. We tested ELI5 on our model, which gave the permutation importance shown in Fig. 4. The goal behind permutation importance is to see how the score (accuracy, precision, recall and so on) changes depending on whether a feature is present or not. Glucose, with a score of 0.2589, gets the highest score in this result: it means that if we permute the glucose feature, the model's accuracy changes by as much as 0.2589. The value after the plus-minus sign is the uncertainty; we have an uncertainty value since the permutation importance technique is fundamentally a random process. The more important a feature is to the score, the higher its position. Some of the features in the bottom rows may have a negative value, which is intriguing, since it suggests that the score increases as we permute the feature; this occurs when the feature permutation boosts the score by chance. Figure 5 shows the feature importance using ELI5: it lists the feature names alongside their weights. Bias means an intercept, where the line intercepts the y-axis (in linear regression); in machine learning, we call intercepts biases.

Fig. 3 SHAP feature values

Fig. 4 Permutation importance
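Continuing the hedged sketch above (`rf`, `X`, `X_te` and `y_te` as defined there), the permutation importance could be computed with ELI5 along these lines:

```python
# Hedged sketch, continuing the previous block: permutation importance with
# ELI5 on the fitted random forest. Weight = mean accuracy drop when the
# feature is shuffled; the value after +/- is the spread across repetitions.
import eli5
from eli5.sklearn import PermutationImportance

perm = PermutationImportance(rf, random_state=0).fit(X_te, y_te)
print(eli5.format_as_text(
    eli5.explain_weights(perm, feature_names=list(X.columns))))
```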


Fig. 5 Feature importance using ELI5

Figure 5 shows that the bias offsets all the predictions that we make, but we are more concerned with the features here, of which 'DiabetesPedigreeFunction' is the most important. What is important to note in the table is that each feature contributed to the prediction result, with the contribution of each feature determining the weight outcome. After all, weight refers to the percentage of each characteristic that contributed to the final prediction across all trees (if we sum the weights, the total is close to 1). With this package, we can assess the importance of a feature not just by its performance score, but also by how each feature contributes to the decision-making process.

4 Challenges

While important advances have been achieved in the previous several years with respect to explainable AI, obstacles remain both on the methodological and theoretical side and in how explanations are used in practice. Some of these issues, for instance objective assessments of explanation quality and the usage of explanations beyond visualization, have already been worked on by researchers. Other unresolved issues, particularly those relating to theory, are more fundamental and will take longer to address satisfactorily. Explanation approaches help us learn how an AI model works; however, these approaches remain restricted in several respects. First, heat-maps generated with today's explanation techniques visualize "first-order" information, i.e., they show which input features are relevant for the prediction. However, it remains unclear whether the interaction between such features, alone or jointly, is relevant. The low abstraction level of explanations is another drawback: heat-maps indicate that certain pixels are significant without relating these relevance values to more abstract concepts such as the objects or the scene shown in the image. Humans must interpret the explanations to make sense of them and comprehend the model's behavior, which is a difficult step prone to misinterpretation. An ideal way to describe the model's behavior on a more abstract, humanly comprehensible level would aggregate the information from these low-level heat-maps.


There is as yet no formally and generally acknowledged theory of explainable AI describing what an explanation is. Some studies have made a first step toward this objective by establishing mathematically well-founded explanation techniques; for example, the authors of the deep Taylor decomposition tackle the explanation problem by embedding it within the theoretical framework of Taylor decomposition. Axiomatic methods are another possible route toward a comprehensive theory of explainable AI. Finally, employing explanations beyond visualization is still a largely open task. Future work will illustrate how explanations may be included in a broader optimization process, for example, to enhance the performance of the model or decrease its complexity.

5 Conclusion

This review has covered Explainable Artificial Intelligence (XAI), which has recently been highlighted as an essential requirement for deploying ML techniques in real-life applications. Our work first developed this issue by defining several notions behind model explanation and demonstrating the objectives that motivate the quest for more interpretable ML approaches. We have implemented and compared SHAP, LIME and ELI5 on the basis of feature importance on a diabetes database. Future challenges were discussed, which could lead to further advances in this domain.

References

1. Mohri M, Rostamizadeh A, Talwalkar A (2018) Foundations of machine learning. MIT Press, Cambridge
2. Amina A, Mohammed B (2018) Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6:52138–52160
3. Arrieta AB, Díaz-Rodríguez N, Del Ser J, Bennetot A, Tabik S, Barbado A, García S, Gil-López S, Molina D, Benjamins R et al (2020) Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf Fusion 58:82–115
4. Ribeiro MT, Singh S, Guestrin C (2016) Why should I trust you? Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 1135–1144
5. Smilkov D, Thorat N, Kim B, Viégas F, Wattenberg M (2017) SmoothGrad: removing noise by adding noise. arXiv:1706.03825
6. Morch NJS, Kjems U, Hansen LK, Svarer C, Law I, Lautrup B, Strother S, Rehm K (1995) Visualization of neural networks using saliency maps. In: Proceedings of ICNN'95—international conference on neural networks, vol 4. IEEE, pp 2085–2090
7. Fong RC, Vedaldi A (2017) Interpretable explanations of black boxes by meaningful perturbation. In: Proceedings of the IEEE international conference on computer vision, pp 3429–3437
8. Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: European conference on computer vision. Springer, Berlin, pp 818–833
9. Bach S, Binder A, Montavon G, Klauschen F, Müller KR, Samek W (2015) On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS One 10(7):e0130140
10. Binder A, Bach S, Montavon G, Müller K-R, Samek W (2016) Layer-wise relevance propagation for deep neural network architectures. In: Information science and applications (ICISA) 2016. Springer, Berlin, pp 913–922
11. Montavon G, Lapuschkin S, Binder A, Samek W, Müller KR (2017) Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recogn 65:211–222
12. Jacob K, Klaus-Robert M, Grégoire M (2020) Towards explaining anomalies: a deep Taylor decomposition of one-class models. Pattern Recogn 101:107198
13. Kindermans P-J, Schütt KT, Alber M, Müller K-R, Erhan D, Kim B, Dähne S (2017) Learning how to explain neural networks: PatternNet and PatternAttribution. arXiv:1705.05598
14. Alber M, Lapuschkin S, Seegerer P, Hägele M, Schütt K, Montavon G, Samek W, Müller K-R, Dähne S, Kindermans P-J (2019) iNNvestigate neural networks! J Mach Learn Res 20(93):1–8
15. Lapuschkin S, Wäldchen S, Binder A, Montavon G, Samek W, Müller KR (2019) Unmasking Clever Hans predictors and assessing what machines really learn. Nat Commun 10(1):1–8
16. Bau D, Zhou B, Khosla A, Oliva A, Torralba A (2017) Network dissection: quantifying interpretability of deep visual representations. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6541–6549
17. Bhuyan BP, Karmakar A, Hazarika SM (2018) Bounding stability in formal concept analysis. In: Advanced computational and communication paradigms. Springer, Berlin, pp 545–552
18. Bhuyan BP (2017) Relative similarity and stability in FCA pattern structures using game theory. In: 2017 2nd international conference on communication systems, computing and IT applications (CSCITA). IEEE, pp 207–212
19. UCI Machine Learning (2016) Pima Indians diabetes database
20. Fan A, Jernite Y, Perez E, Grangier D, Weston J, Auli M (2019) ELI5: long form question answering. arXiv:1907.09190

Sign Language Recognition System Using Customized Convolution Neural Network Dipmala Salunke, Ram Joshi, Nihar Ranjan, Pallavi Tekade, and Gaurav Panchal

Abstract With the rapid development of computer vision and the expansion of its applications, the demand for interaction between human and machine is becoming more and more extensive. Sign language recognition, which is a part of language technology, involves evaluating human hand signs and gestures through machine learning algorithms. Vision-oriented hardware such as a camera is used to convert these sign languages into a meaningful form like text or sound. We have implemented a system that captures the hand sign or gesture of the user and predicts its equivalent meaning. A lot of work has been done in the field of sign language recognition; however, less work has been done on letting kids learn and practice sign language, as well as basic mathematical operations, using sign language. This paper mainly focuses on making a system that can be used by children with hearing or vocal disabilities to learn basic alphabets, numbers and some words in sign language. A deep learning-based customized convolution neural network is trained on a dataset of size 2400 × 44, with seven convolution layers defined for sign language prediction, reaching an accuracy of 99.92%. Keywords Convolution neural network · Sign language recognition · Gesture recognition · Machine learning

1 Introduction

According to data from leading health organizations like the WHO, about 43 crore (430 million) people across all countries have problems related to hearing loss, which is over 5% of the world's population, of whom 3.4 crore (34 million) are children. Studies expect that the number will rise to 900 million by 2050 [1]. People can exchange feelings and ideas and interact only through voice or actions.


People communicate with each other using sign language, which is one form of non-verbal communication. Sign language helps people express themselves and understand each other using gestures or signs; to communicate with deaf persons, we can use hand, facial and body movements. We are building a sign language recognition technology to bridge the gap between hearing people and deaf people. It will be an excellent tool for communicating with people who have hearing impairments, and it also provides a good interpretation for non-sign-language users to understand the communication. Gesture recognition is a type of perceptual computing with a user interface that allows computers to capture the user's signs and interpret human gestures into their equivalent meaning. Simple gestures can be used to interface with devices. There are a variety of methods for recognizing gestures, including cameras, computer vision algorithms and so on; with these, we can readily understand human body language. According to a review of the literature, several experiments on sign language identification have been conducted, although with small datasets, and some of the articles fail to recognize all alphabets. For American Sign Language, we customized a convolution neural network that recognizes 26 alphabets, the digits 0–9 and 8 words. The authors built a dataset of 2400 gestures for each sign, totaling 105,600 gestures. The goal of this system is to obtain high recognition rates. The system is developed using the Python OpenCV library and a convolutional neural network with hidden layer nodes and the rectified linear unit activation function.

2 Literature Survey

Narayana et al. [2] implemented a system that improved the rate of hand recognition from 67.71 to 82.07% on the IsoGD dataset. They used the IsoGD and NVIDIA datasets as benchmarks and reached an accuracy of 91.28%, an increase of 8.9% over the previous best result not using the IR data. Hossen et al. [3] worked on a regional language and developed a system for Bengali sign language; with the help of a CNN, they implemented a system that could identify 37 static signs out of the 51 letters in Bengali sign language. They used 1147 images and obtained an accuracy of 96.33%, using layers such as the convolutional layer, the max pooling layer and the rectified linear unit activation function for better accuracy. Dieleman et al. [4] proposed a system using two convolutional neural networks for extracting hand features and one for extracting upper-body features; they mainly focused on the implementation of 3D convolutions, and although a 2D convolution has better validation accuracy than a 3D convolution, they achieved an accuracy of 95.68% with a false positive rate of 4.13%. Cheng et al. [5] proposed a system in which two networks, a CNN and an RBM network, are combined to give better accuracy for gesture recognition; they used a Kinect sensor to obtain color and depth samples and implemented a system with a better recognition rate, with an error of just 3.9%. By incorporating dropout layers and optimizers, Narang et al. [6] were able to predict sign languages of 26 characters with an accuracy of 97.3%, eliminating background noise and preventing overfitting; two stochastic gradient algorithms, root mean square propagation and the adaptive gradient algorithm, were merged. Rajendran et al. [7] proposed a system in which a three-layered deep convolutional neural network was trained to recognize sign languages with 99.96% accuracy on the training data. Pradeep Kumar et al. [8] implemented a system using an independent Bayesian classification combination, with recognition rates of 96.05 and 94.27% for single- and double-hand gestures; they collected a dataset of 51 dynamic word sign gestures. Beena et al. [9] proposed a system trained using 1000 images of each numeral sign; the algorithm extracts features from the block-processed images, trains an artificial neural network (ANN) and obtains an accuracy of 99.46% on the depth images. A CNN model was also used to recognize a set of 50 different signs in Flemish sign language, with an error of 2.5%, using the Microsoft Kinect.

3 System Architecture

The flow diagram of the sign language recognition system is shown in Fig. 1. A web camera is used to capture the gestures for the 26 alphabets, the digits 0–9 and 8 words: best of luck, like, remember, you, I/me, love, luck and I love you. The customized convolution neural network classifies the image, the corresponding class is recognized, and the text is displayed on the screen and then converted into voice. The system consists of two modules:

Fig. 1 Sign recognition system flow diagram


3.1 OpenCV

OpenCV is a library of programming functions aimed at real-time computer vision.

(a) Capturing frames and displaying them: we capture frames from the camera one by one and display them. The steps are:

• Importing libraries
• Creating the camera object
• Reading the frames
• Displaying the frames

(b) Extracting region of interest: the image is converted to grayscale (Fig. 2) and thresholded to binary format, thus reducing the object or action to a single format.

(c) Finding contours: contours are curves of points with no gaps in the curve. Contours are extremely useful for shape approximation and analysis (Fig. 3).

(d) Finding convexity defects: a convexity defect is a cavity in an object segmented out of an image, an unwanted part inside the main boundary that needs to be handled (Fig. 4). A minimal OpenCV sketch of these steps is given after Fig. 4.

Fig. 2 Gray scaling


Fig. 3 Region of interest

Fig. 4 Convexity defects
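As announced above, here is a minimal OpenCV sketch of steps (a)–(d); the region-of-interest box is an assumption, and the code is illustrative rather than the authors' own.

```python
# Hedged sketch of steps (a)-(d): capture a frame, extract a grayscale/binary
# region of interest, find contours and compute convexity defects.
import cv2

cam = cv2.VideoCapture(0)                     # (a) create camera object
ret, frame = cam.read()                       #     read one frame
if ret:
    roi = frame[100:400, 100:400]             # (b) region of interest (assumed)
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    _, thresh = cv2.threshold(blur, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # (c) contours: closed curves around the segmented hand
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        hand = max(contours, key=cv2.contourArea)
        hull = cv2.convexHull(hand, returnPoints=False)
        defects = cv2.convexityDefects(hand, hull)  # (d) convexity defects
cam.release()
```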

3.2 Convolutional Neural Network

The convolutional neural network has been found to be the most effective technology for classifying images. In the proposed system, we train the model with the created dataset and then feed the user's gestures from the camera to the CNN input layer, as shown in Fig. 5. A CNN can have many layers, but for this model we implement a customized CNN with one input layer, seven hidden layers and one output layer. Matrix operations and load balancing in the nodes of the hidden layers lead to the prediction of the gesture shown by the user on the screen.


Fig. 5 Convolutional neural network

4 Methodology

The first step of the proposed system is to create a dataset. There are various ways to collect or create gestures, but for the proposed system we use a webcam. The data undergoes various processing steps in which the background is detected and eliminated using the Hue Saturation Value (HSV) representation, and segmentation is done to separate the hand from the background. The image is then resized to 50 × 50 pixels. Our dataset contains 2400 images per gesture, and the training and testing data are split in an 80:20 ratio. Binary pixels are extracted from the image, which is then rotated: since the images generated by the webcam are mirrored, in this step we flip the image right side up. The dataset is trained using the customized convolution neural network; the model is then evaluated, and the system is able to predict the gestures.

4.1 Set Histogram

We first create histogram data in order to compute the values that distinguish the hand from the background. This process creates a histogram object and saves it to the directory using the Pickle library, which converts a Python object into a byte stream; this object is used later for subsequent operations.
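A hedged sketch of this step is shown below; building the histogram from a saved sample image and the particular patch coordinates are assumptions (the actual system samples the hand from the live camera feed).

```python
# Hedged sketch of the histogram step: build a hand-color histogram from an
# HSV sample patch and persist it with pickle for later back-projection.
import cv2
import pickle

frame = cv2.imread("hand_sample.jpg")            # assumed sample image
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
patch = hsv[150:250, 150:250]                    # assumed hand region
hist = cv2.calcHist([patch], [0, 1], None, [180, 256], [0, 180, 0, 256])
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
with open("hist.pkl", "wb") as f:                # byte stream via pickle
    pickle.dump(hist, f)
```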

4.2 Creating Dataset

Data collection is an important step before gesture creation. We have created our own dataset containing 105,600 images of 44 gestures using a webcam, with 44 classes, one per gesture. In order to achieve higher consistency, the dataset was created against the same background each time the webcam was invoked. The captured images are in RGB format and are converted into HSV format, and other operations are performed before saving them. We have used SQLite to label the images, so they can later be referred to by the 'id' assigned to them; this provides a proper key–value pair for each image.

Fig. 6 Image processing

4.3 Image Processing

In this step, we convert the image from RGB to HSV format, as shown in Fig. 6. We then calculate the back projection from the histogram of the background and the image to segment the hand gesture out of it, and threshold the image to get proper contours. The convexity defects are removed, and Gaussian and median blurring is applied to remove any unnecessary noise created during image capture. The image is then converted to binary, containing only two colors, black and white, and saved into a folder named 'gestures', which contains all the images named with their 'ids'.
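The following sketch illustrates this processing chain with OpenCV back-projection; the file names and kernel sizes are assumptions.

```python
# Hedged sketch of the image-processing step: back-project the stored hand
# histogram to segment the hand, then smooth, blur and binarize the mask.
import cv2
import pickle

with open("hist.pkl", "rb") as f:
    hist = pickle.load(f)

frame = cv2.imread("gesture.jpg")                 # assumed captured frame
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
mask = cv2.calcBackProject([hsv], [0, 1], hist, [0, 180, 0, 256], 1)
disc = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (10, 10))
mask = cv2.filter2D(mask, -1, disc)               # smooth the back-projection
mask = cv2.GaussianBlur(mask, (11, 11), 0)        # remove fine noise
mask = cv2.medianBlur(mask, 15)                   # remove speckle noise
_, binary = cv2.threshold(mask, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite("gesture_binary.jpg", binary)         # black-and-white gesture
```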

4.4 Model Creation

A customized convolution neural network model is used to extract features from each frame to predict hand gestures. It is a multilayered neural network used for image recognition. The architecture contains seven convolution layers, each with a pooling layer, an activation function, flattening and batch normalization. The first layer contains 16 filters with a kernel size of 2 × 2, uses the ReLU activation function and takes input of shape 50 × 50; a pooling layer with 'same' padding and a pool size of 2 × 2 follows. The subsequent layers use 32 and 64 filters, each with a pooling layer of the same specification. A flatten layer then converts the image into a single array, and a dense layer of 128 units with ReLU activation is added to the model. Since the data may cause overfitting, a dropout of 0.2 is applied to the output of this layer. Another dense layer with as many units as there are gestures, with softmax as its activation function, completes the network. The model is compiled with categorical cross-entropy as the loss function and Adam as the optimizer.

Fig. 7 Display output window
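A minimal Keras sketch of a network matching this description is shown below; the exact arrangement of the authors' seven convolution layers is not fully specified in the text, so the architecture here is an assumption built from the stated hyperparameters.

```python
# Hedged sketch of the described CNN (not the authors' exact architecture).
from tensorflow.keras import layers, models

NUM_GESTURES = 44  # 26 alphabets + 10 digits + 8 words

model = models.Sequential([
    layers.Conv2D(16, (2, 2), activation='relu', input_shape=(50, 50, 1)),
    layers.MaxPooling2D((2, 2), padding='same'),
    layers.Conv2D(32, (2, 2), activation='relu'),
    layers.MaxPooling2D((2, 2), padding='same'),
    layers.Conv2D(64, (2, 2), activation='relu'),
    layers.MaxPooling2D((2, 2), padding='same'),
    layers.Flatten(),                         # image -> single feature array
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),                      # guards against overfitting
    layers.Dense(NUM_GESTURES, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```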

4.5 Displaying Gestures

All the gestures are arranged into a single gallery-style picture and saved to the directory in '.jpeg' format; the file is later used during classification. Figure 7 shows the output window that displays the text predicted by the model.

4.6 Real-Time Classification

The model is now used to predict gestures in real time. Real-time classification uses the same image segmentation methods: the image is sent to the model for prediction, and the model outputs the 'id' assigned to the gesture, which in turn triggers a SQL query in the background to fetch the label associated with the gesture. The predicted text is displayed to the user in the predicted-text block. If the prediction persists for more than 2 seconds and the predicted gesture has a confidence of more than 70%, the label associated with the gesture is appended to a string, which helps build a meaningful sentence out of many gestures. In this step we also include 'pyttsx3', which converts text into speech and makes the text audible to the user; the audio is played only after the hand is removed from the assigned threshold box. The steps for real-time sign language recognition are shown in Fig. 8.
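The text-to-speech step can be sketched as follows; the spoken string is a placeholder for the accumulated predictions.

```python
# Hedged sketch of the pyttsx3 step: speak the accumulated sentence once the
# hand leaves the threshold box.
import pyttsx3

engine = pyttsx3.init()
sentence = "I LOVE YOU"        # assumed accumulated predictions
engine.say(sentence)
engine.runAndWait()            # blocks until speech finishes
```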


Fig. 8 Algorithm

5 Results and Discussions

5.1 Dataset

The database used for this system was manually created using the webcam and contains 2400 images per gesture. Currently, the system can detect all the alphabets as well as numbers, and the database also contains a few words for practice purposes.

5.2 Software Used

To implement this deep learning model, we have used various Python libraries, mainly Keras and TensorFlow. The language used is Python 3.6. The libraries used are:

• h5py = 1.5
• NumPy
• scikit-learn
• keras = 2.5.0
• OpenCV-Python
• pyttsx3
• TensorFlow = 1.5.0


Fig. 9 Accuracy

5.3 Specifications

The system runs on a machine with 8 GB RAM. The operating system used is Linux, and the processor is an Intel i5 with integrated Intel graphics.

5.4 Confusion Matrix

A confusion matrix is used to describe the performance of a classification model on a set of test data for which the true values are known. Figure 9 shows the accuracy of the model, Fig. 10 the training versus validation loss and Fig. 11 the training versus validation accuracy; the confusion matrix for this 44-class system, with an accuracy of 99.92% and a misclassification rate of 0.0008, is depicted in Fig. 12.

6 Conclusion

For those with speech and hearing disabilities, this paper proposes a customized convolution neural network model for American Sign Language recognition. Image binarization, image optimization and image segmentation algorithms are explored.


Fig. 10 Training versus validation loss

Fig. 11 Training versus validation accuracy

The dataset contains 44 signs with 2400 gestures for each of the 26 alphabets, the numerals 0–9 and 8 words. The model was trained on 80% of the data and tested on 20% of it. The research outlines our method for translating sign language into equivalent text, which reaches 99.92% accuracy after 15 epochs. Several factors contribute to the excellent accuracy, beginning with pre-processing the dataset images and effectively applying augmentation to them. By expanding the dataset, the system can predict a greater number of gesture signs.


Fig. 12 Confusion matrix

References

1. Tao Y, Huo S, Zhou W (2020) Research on communication APP for deaf and mute people based on face emotion recognition technology. In: 2020 IEEE 2nd international conference on civil aviation safety and information technology (ICCASIT), pp 547–552. https://doi.org/10.1109/ICCASIT50869.2020.9368771
2. Narayana P et al (2018) Gesture recognition: focus on the hands. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 5235–5244
3. Hossen MA et al (2018) Bengali sign language recognition using deep convolutional neural network. In: International conference on informatics, electronics & vision. International conference on imaging, pp 369–373
4. Dieleman S et al (2014) Sign language recognition using convolutional neural networks. In: ECCV 2014 workshops. ECCV 2014. Lecture notes in computer science, vol 8925. Springer, Cham. https://doi.org/10.1007/978-3-319-16178-5_40
5. Cheng W et al (2014) Jointly network: a network based on CNN and RBM for gesture recognition. Eur Conf Comput Vis ECCV 572–578
6. Narang Y et al (2021) Indian sign language to text conversion in real-time using machine learning. Int J Eng Appl Sci Technol (IJEAST) 2455–2143
7. Rajendran R et al (2021) Finger spelled signs in sign language recognition using deep convolutional neural network. Int J Res Eng Sci Manage 4(6):249–253
8. https://Www.Sciencedirect.Com/Science/Article/Abs/Pii/S0020025516307897
9. Beena MV, Agnisarman Namboodiri MN et al (2017) ASL numerals recognition from depth maps using artificial neural networks. Middle-East J Sci Res 25(7):1407–1413. ISSN 1990-9233

Space Fractionalized Lattice Boltzmann Model-Based Image Denoising P. Upadhyay

Abstract In this paper, the recently proposed lattice Boltzmann (LB) model for the time sub-diffusion equation in the Caputo sense has been space fractionalized, also in the Caputo sense. The proposed model has been applied to denoise images corrupted by Gaussian noise and salt and pepper noise, respectively. The comparison of the proposed method with other state-of-the-art methods is done in terms of PSNR and SSIM values. The obtained results are considerably improved compared to some recently reported state-of-the-art methods. Keywords Denoising · Diffusion equation · Space fractionalization · Gaussian noise · Salt and pepper noise

1 Introduction

Due to the imperfections of image capturing devices, noise is introduced in images from the image acquisition phase [1]; noise also enters in the signal amplification and transmission phases [2]. The types of noise that affect images are mainly Gaussian noise, impulse noise, salt and pepper noise, etc. Researchers have proposed many efficient image denoising algorithms in recent years. The most commonly used image denoising methods include total variation denoising [3, 4] and the various frequency domain-based approaches [5, 6]. Total variation-based denoising algorithms take into account the fidelity of the input image and do not depend strongly on the type of noise; the details of the denoised images are therefore smoothed, which means that this method incurs a large loss of information. The frequency domain-based approaches take the details of the images into account but suffer from pseudo-Gibbs phenomena, which introduce artifacts into the denoised images [6, 7]. To deal with the problem of image denoising while minimizing the loss of detail, a space fractionalization, in the Caputo sense, of the lattice Boltzmann model recently reported by Du et al. [8] is proposed.

Tian et al. [9] proposed an attention-guided denoising convolutional neural network (ADNET) for image denoising. Wang et al. introduced an image denoising method based on noise detection through the fractional differential gradient and denoising through fractional integration [10]. Pang et al. [11] proposed an adaptive weighted TV_p regularization-based denoising model. Zhang introduced a separation aggregation network for image denoising; this method decomposes the noisy image into multiple bands in which a pattern can be found, and a deep mapping function is then learned, which finally helps in obtaining the clean image [12]. Thanh et al. [13] applied an adaptive TV L1 model to denoise images corrupted by salt and pepper noise. Li et al. [14] proposed an RFID multi-label dynamic localization system to perform denoising of images. Gai and Bao [15] proposed a method to denoise images based on an improved deep convolutional neural network with a joint loss function. Zhao et al. [2] proposed a method based on a reaction-diffusion equation to denoise images. Galiano and Velasco [16] generalized the cross-diffusion problem from the nonlinear complex variable diffusion model and applied this generalized model to denoise images.

2 Statement of Problem

One of the most significant challenges in the area of image and signal denoising is to find a method that can

1. denoise signals with the least smoothing of details, and
2. preserve edges in the denoising process.

Motivated by this, the space fractionalized version of the lattice Boltzmann model for the time sub-diffusion equation introduced by Du et al. [8] is proposed and applied to images. It was found that the proposed space fractionalized model gives encouraging results on the above challenges.

3 Formulation of Problem

The lattice Boltzmann model for the time sub-diffusion equation introduced by Du et al. [8] is given by

\sigma \frac{\partial \phi(X, t_n)}{\partial t} = D \nabla^2 \phi + F(X, t_n) \quad (1)

where \phi represents the concentration, D the diffusion coefficient, \sigma = \frac{\Delta t^{1-\alpha}}{\Gamma(2-\alpha)}, and F(X, t_n) is given in Eq. (2):

F(X, t_n) = \frac{\sigma}{\Delta t} \sum_{m=1}^{n-1} \left[ (n-m+1)^{1-\alpha} - (n-m)^{1-\alpha} \right] \left[ \phi(X, t_m) - \phi(X, t_{m-1}) \right] + g(X, t_n) \quad (2)

Assuming \phi(X, t_n) and g(X, t_n) \in C^n[a, b], the space fractionalized model of (2) in the Caputo sense can be written as

\frac{\partial^{\beta_1}}{\partial x^{\beta_1}} \left[ \sigma \frac{\partial \phi(X, t_n)}{\partial t} \right] =
\begin{cases}
\dfrac{1}{\Gamma(n-\beta_1)} \displaystyle\int_a^x \dfrac{1}{(x-s)^{\beta_1+1-n}} \, \dfrac{\partial^n}{\partial x^n} \left[ D \nabla^2 \phi + F(X, t_n) \right] \mathrm{d}s, & (n-1) < \beta_1 < n \\
\dfrac{\partial^n}{\partial x^n} \left[ \sigma \dfrac{\partial \phi}{\partial t}(X, t_n) \right], & \beta_1 = n
\end{cases} \quad (3)

\frac{\partial^{\beta_1+1}}{\partial x^{\beta_1} \, \partial t} \left[ \sigma \frac{\partial \phi(X, t_n)}{\partial t} \right] =
\begin{cases}
K \dfrac{\partial^n}{\partial x^n}(D \nabla^2 \phi) \left( \phi(X, t_n) - \phi(X, t_{n-1}) \right) \\
\quad + K (D \nabla^2 \phi) \left( \dfrac{\partial^n \phi(X, t_n)}{\partial x^n} - \dfrac{\partial^n \phi(X, t_{n-1})}{\partial x^n} \right) \\
\quad + K \dfrac{\sigma}{\Delta t} \displaystyle\sum_{m=1}^{n-1} \left[ (n-m+1)^{1-\alpha} - (n-m)^{1-\alpha} \right] \left( \dfrac{\partial^n \phi(X, t_m)}{\partial x^n} - \dfrac{\partial^n \phi(X, t_{m-1})}{\partial x^n} \right) \\
\quad + \dfrac{\partial^{\beta_1} g(x, t_n)}{\partial x^{\beta_1}}, & n-1 < \beta_1 < n \\
\sigma \dfrac{\partial^{n+1} \phi(X, t_n)}{\partial x^n \, \partial t}, & \beta_1 = n
\end{cases} \quad (4)

where

K = \frac{1}{\Gamma(n-\beta_1)} \int_a^x \frac{1}{(x-s)^{\beta_1+1-n}} \, \mathrm{d}s \quad (5)

0 \le \alpha, \beta_1 \le 1 \quad (6)

In the proposed work, \phi(X, t) is the noisy image and g(X, t) = \log(\nabla \phi(X, t)), where \nabla \phi is the gradient of the noisy image, and we have

\frac{\partial^{\beta_1} g(X, t_n)}{\partial x^{\beta_1}} = K \frac{\partial^n g(X, t_n)}{\partial x^n} \quad (7)

In the proposed work φ(X, t) is the noisy image and g(X, t) = log(∇φ(X, t)) where ∇φ is the gradient of the noisy image. and we have ∂ n g(X, tn ) ∂ β1 g(X, tn ) =K (7) β ∂x 1 ∂xn

842

P. Upadhyay

4 Numerical Computations The qualitative performance of the different methods and the proposed method is demonstrated in Figs. 1 and 2. Figure 1 shows the denoising performance comparison for salt and pepper noise with 20% noise density, and Fig. 2 illustrates the comparison for Gaussian noise with noise variance 0.5. Figure 3a–d shows the denoising performance of the proposed method for salt and pepper noise with 15% noise density and for Gaussian noise with noise variance 0.3. The salt and pepper noise corrupted image of Fig. 1a with 20% noise density and the Gaussian noise corrupted image of Fig. 2a with noise variance 0.5 are shown in Fig. 3e, f, respectively. For quantitative evaluation of performance, the PSNR (peak signal-to-noise ratio) given in (8) and the SSIM (structural similarity index) given in (9) [17] are used:

$$\mathrm{PSNR}(X_1, X_2) = 10 \log_{10}\!\left(\frac{(2^8 - 1)^2}{\mathrm{MSE}(X_1, X_2)}\right) \qquad (8)$$

$$\mathrm{SSIM}(X_1, X_2) = \frac{(2\mu_{X_1}\mu_{X_2} + D_1)(2\sigma_{X_1 X_2} + D_2)}{(\mu_{X_1}^2 + \mu_{X_2}^2 + D_1)(\sigma_{X_1}^2 + \sigma_{X_2}^2 + D_2)} \qquad (9)$$

where

$$\mathrm{MSE}(X_1, X_2) = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[X_1(i,j) - X_2(i,j)\right]^2 \qquad (10)$$

$X_1$ and $X_2$ are the original and noisy image matrices, respectively, and the size of the image is $m \times n$. $\mu_{X_1}$ and $\mu_{X_2}$ are the mean values of $X_1$ and $X_2$, $D_1$ and $D_2$ are constants, $\sigma_{X_1}^2$ and $\sigma_{X_2}^2$ are the variances of the original and noisy images, and $\sigma_{X_1 X_2}$ is their covariance.
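As a hedged illustration of this evaluation protocol (not the author's released code), the snippet below corrupts a test image with 20% salt-and-pepper noise or Gaussian noise of variance 0.5 and computes MSE and PSNR directly from Eqs. (10) and (8); SSIM is taken from scikit-image, whose structural_similarity implements the form of Eq. (9). The image file name is a placeholder.

```python
import numpy as np
from skimage import img_as_float, io
from skimage.metrics import structural_similarity as ssim
from skimage.util import random_noise

def mse(x1, x2):
    # Eq. (10): mean squared difference over all m x n pixels
    return np.mean((x1.astype(float) - x2.astype(float)) ** 2)

def psnr(x1, x2):
    # Eq. (8) for 8-bit images: peak value (2^8 - 1) = 255
    return 10 * np.log10(255.0 ** 2 / mse(x1, x2))

img = img_as_float(io.imread("test_image.png", as_gray=True))  # placeholder file
sp = random_noise(img, mode="s&p", amount=0.20)    # 20% salt-and-pepper density
ga = random_noise(img, mode="gaussian", var=0.5)   # Gaussian noise, variance 0.5

for noisy in (sp, ga):
    print(psnr(255 * img, 255 * noisy), ssim(img, noisy, data_range=1.0))
```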

5 Results and Discussion The value of D in (3) is taken of the order of 10−6 in all the experiments and is calculated by the method indicated in [8]. The values of α and β1 are taken as 0.2 and 0.6, respectively, for both Figs. 1 and 2. Tables 1 and 2 compare the PSNR and SSIM values of the proposed method with other recently reported state-of-the-art methods [9, 10, 13, 14] for salt and pepper noise with different noise densities and for Gaussian noise with different noise variances, respectively. The experiments were performed with α = 0.2, β1 = 0.6; α = 0.3, β1 = 0.7; and α = 0.4, β1 = 0.8 (left to right in the proposed-method columns of Tables 1 and 2).


Fig. 1 a Original image, b adaptive TV L1-based denoised image, c attention guided-based denoised image, d FDnCNN-based denoised image, e fractional calculus-based denoised image, f proposed method-based denoised image


Fig. 2 a Original image, b adaptive TV L1-based denoised image, c attention guided-based denoised image, d FDnCNN-based denoised image, e fractional calculus-based denoised image, f proposed method-based denoised image


Fig. 3 a Noisy image corrupted by 15% noise density salt and pepper noise, b proposed method-based denoised image, c noisy image corrupted by Gaussian noise with noise variance 0.3, d proposed method-based denoised image, e noisy image corrupted by 20% noise density salt and pepper noise, f noisy image corrupted by Gaussian noise with noise variance 0.5


Table 1 PSNR/SSIM values with different noise densities for salt and pepper noise

| Noise density (%) | Adaptive TV L1 | Attention guided | FDnCNN | Fractional calculus | Proposed (α = 0.2, β1 = 0.6) | Proposed (α = 0.3, β1 = 0.7) | Proposed (α = 0.4, β1 = 0.8) |
|---|---|---|---|---|---|---|---|
| 10 | 34.01/0.907 | 45.23/0.984 | 35.79/0.911 | 36.88/0.914 | 56.54/0.996 | 56.28/0.994 | 56.23/0.998 |
| 15 | 32.86/0.887 | 43.86/0.979 | 34.76/0.898 | 35.21/0.906 | 52.78/0.995 | 52.75/0.993 | 52.75/0.991 |
| 20 | 31.29/0.868 | 42.11/0.961 | 33.09/0.897 | 34.51/0.892 | 52.24/0.989 | 52.27/0.988 | 52.23/0.986 |
| 25 | 30.24/0.844 | 40.28/0.958 | 32.36/0.869 | 32.81/0.886 | 52.16/0.987 | 52.12/0.985 | 52.19/0.983 |
| 30 | 29.11/0.811 | 40.16/0.957 | 29.59/0.857 | 31.09/0.852 | 50.24/0.980 | 50.28/0.981 | 50.56/0.985 |
| 35 | 28.18/0.786 | 39.78/0.948 | 28.92/0.822 | 30.77/0.814 | 48.88/0.978 | 49.01/0.979 | 49.12/0.979 |
| 40 | 27.99/0.758 | 39.21/0.936 | 28.67/0.801 | 30.02/0.811 | 48.21/0.974 | 48.35/0.976 | 48.45/0.978 |
| 45 | 26.91/0.712 | 38.81/0.927 | 27.89/0.764 | 29.87/0.796 | 47.28/0.973 | 47.46/0.977 | 47.53/0.979 |
| 50 | 26.88/0.689 | 37.46/– | 25.55/0.756 | 29.65/0.783 | 47.05/0.971 | 47.08/0.973 | 47.11/0.975 |

Table 2 PSNR/SSIM values with different noise variances for Gaussian noise

| Noise variance | Adaptive TV L1 | Attention guided | FDnCNN | Fractional calculus | Proposed (α = 0.2, β1 = 0.6) | Proposed (α = 0.3, β1 = 0.7) | Proposed (α = 0.4, β1 = 0.8) |
|---|---|---|---|---|---|---|---|
| 1.0 | 36.80/0.917 | 45.56/0.981 | 37.98/0.925 | 37.81/0.914 | 58.88/0.992 | 58.26/0.994 | 58.71/0.997 |
| 1.5 | 34.65/0.898 | 45.75/0.972 | 35.16/0.887 | 36.47/0.906 | 57.18/0.994 | 57.44/0.996 | 57.55/0.992 |
| 2.0 | 32.08/0.854 | 43.29/0.968 | 34.11/0.8776 | 34.93/0.899 | 55.48/0.987 | 55.27/0.984 | 55.68/0.988 |
| 2.5 | 31.99/0.841 | 42.18/0.955 | 34.07/0.863 | 33.19/0.884 | 54.28/0.983 | 54.25/0.986 | 54.62/0.989 |
| 3.0 | 29.15/0.817 | 41.63/0.943 | 32.87/0.844 | 32.49/0.858 | 52.14/0.981 | 52.45/0.984 | 52.61/0.987 |
| 3.5 | 27.96/0.775 | 38.81/0.930 | 27.18/0.765 | 31.08/0.837 | 51.83/0.980 | 51.74/0.978 | 53.68/0.975 |
| 4.0 | 25.87/0.741 | 37.15/0.928 | 25.34/0.759 | 29.87/0.824 | 49.11/0.975 | 49.57/0.971 | 40.29/0.976 |
| 4.5 | 24.17/0.705 | 36.88/0.923 | 24.78/0.752 | 27.33/0.792 | 48.88/0.974 | 48.85/0.978 | 48.83/0.971 |
| 5.0 | 23.66/0.674 | 34.62/– | 23.09/0.741 | 25.58/0.781 | 46.17/0.965 | 46.59/0.968 | 46.86/0.964 |

From Figs. 1 and 2, it is evident that the proposed method outperforms the recently reported results of [9, 10, 13, 14] in terms of both PSNR and SSIM values, for both salt and pepper noise and Gaussian noise, over the given ranges of noise density and noise variance. Experimentally, it was observed that for any fixed value of α, the denoising performance of the proposed method improves with increasing β1, for both salt and pepper noise with different noise densities and Gaussian noise with different noise variances.

6 Conclusion and Future Direction It is evident that the proposed space fractionalized, time-fractional sub-diffusion lattice Boltzmann model gives superior results in comparison with the recently reported state-of-the-art methods [9, 10, 13, 14] for both Gaussian and salt and pepper noise, qualitatively as well as in terms of both PSNR and SSIM values. In the future, it is planned to study the denoising performance of


other types of diffusion equation models with other types of noise that affect images.

References
1. Li Y, Liu F, Turner IW, Li T (2018) Time-fractional diffusion equation for signal smoothing. Appl Math Comput 326
2. Zhao X, Huang K, Wang X, Shi M, Zhu X, Gao Q, Yu Z (2018) Reaction-diffusion equation based image restoration. Appl Math Comput 338
3. Niu S, Zhang S, Huang J, Bian Z, Chen W, Yu G, Liang Z, Ma J (2016) Low-dose cerebral perfusion computed tomography image restoration via low-rank and total variation regularizations. Neurocomputing 197(C):143–160
4. Osher S, Burger M, Goldfarb D, Xu J, Yin W (2013) An iterative regularization method for total variation-based image restoration. J Multiscale Model Simul 4(2):460–489
5. Zhang L, Dong W, Zhang D, Shi G (2010) Two-stage image denoising by principal component analysis with local pixel grouping. Pattern Recogn 43(4):1531–1549
6. Yasar H, Ceylan M, Ozturk AE (2013) Comparison of real and complex valued versions of wavelet transform, curvelet transform and ridgelet transform for medical image denoising. Int J Electron Mech Mechatron Eng 3(1)
7. Zeng T, Li X, Ng M (2010) Alternating minimization method for total variation based wavelet shrinkage model. Commun Comput Phys 8(5):976–994
8. Du R, Sun D, Shi B, Chai Z (2019) Lattice Boltzmann model for time sub-diffusion equation in Caputo sense. Appl Math Comput 358
9. Tian C, Xu Y, Li Z, Zuo W, Fei L, Liu H (2020) Attention guided CNN for image denoising. Neural Netw 124
10. Wang Q, Ma J, Yu S, Tan L (2020) Noise detection and image denoising based on fractional calculus. Chaos Solitons Fractals 131
11. Pang ZF, Li ZH, Luo S, Zeng T (2020) Image denoising based on the adaptive weighted TVp regularization. Sig Process 167
12. Zhang L, Li Y, Wang P, Wei W, Xu S, Zhang Y (2019) A separation aggregation network for image denoising. Appl Soft Comput 81
13. Thanh DNH, Thanh LT, Hien NN, Prasath S (2019) Adaptive total variation L1 regularization for salt and pepper image denoising. Optik 24
14. Li L, Yu X, Jin Z, Zhao Z, Zhuang X, Liu Z (2020) FDnCNN based image denoising for multi-label localization measurement. Measurement 152
15. Gai S, Bao Z (2019) New image denoising algorithm via improved deep convolutional neural network with perceptive loss. Expert Syst Appl 138
16. Galiano G, Velasco J (2018) On a cross-diffusion system arising in image denoising. Comput Math Appl 76
17. Guo X, Liu F, Yao J, Chen Y, Tian X (2020) Multi-weighted nuclear norm minimization for real world image denoising. Optik 206

A Review About Analysis and Design Methodology of Two-Stage Operational Transconductance Amplifier (OTA)

Usha Kumari and Rekha Yadav

Abstract This article analyzes the properties and behavior of the operational transconductance amplifier (OTA), an essential analog building block implemented with op-amp circuitry. The two-stage OTA design methodology is based on a sub-optimal method because it distinguishes its inter-related parameters, such as settling time, performance parameters, and noise. The OTA circuit uses a methodology based on automatic selection and sizing of analog circuits; it is divided into two parts, and both the sizing process and the final sizing use a generic topology method. The circuit topology distribution is based on a reuse library classified in a topology tree. The OTA is widely used by researchers because it provides highly linear properties, needs a low operating supply voltage, and has low power dissipation. Earlier OTAs worked only up to about 200 MHz, but recent designs work in both RF and microwave applications. The topologies reviewed here provide a balance between performance and complexity. For the comparison of OTA technologies, parameters such as the effect of technology scaling on frequency range, power supply, transconductance, DC gain, and power consumption should be kept in mind.

Keywords Operational transconductance amplifier · CMOS · MOSFET · Power dissipation · CMRR

U. Kumari (B) · R. Yadav
Department of Electronics and Communication, DCRUST Murthal, Murthal, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_61

1 Introduction In recent times, many VLSI design techniques have been developed, and both the power supply and CMOS circuit size keep decreasing. In analog VLSI design, however, mixed-signal chip design is much more time-consuming than designing the digital part, even though the digital circuit contains more components. These constraints call for a better technique for the design of analog circuits.


Fig. 1 Model of ideal OTA [10]

The synthesis has two steps: topology creation and subsequent sizing. For analog circuit synthesis, the methods are (1) topology synthesis/selection and (2) automatic sizing. In the past, a pool-selection method was used for designing analog circuits, but it is rarely used by industry now. The OTA has become a basic building block for most analog circuits with linear input-output characteristics. The OTA behaves as a two-quadrant multiplier, so it is popular for the implementation of voltage-controlled filters (VCF), voltage-controlled oscillators (VCO), frequency synthesizers, and instrumentation amplifiers, as well as for biological applications. It is a voltage-controlled current source, and the word "operational" stands for the measurement of the difference between two input voltages. The block diagram representation of the OTA is shown in Fig. 1. It is an integral part of switched-capacitor circuits because of its high gain with fast response time. The OTA's primary application is to drive a low-impedance sink, e.g., a coaxial cable, with low distortion and large bandwidth. Traditionally, it is implemented by cascading two stages for high gain. For the supply voltage, CMOS scaling is used: the main feature of the CMOS process is that it reduces the supply voltage, which decreases the power dissipation per cell. On the other hand, a reduced supply voltage degrades overall circuit performance in terms of voltage swing and bandwidth; MOSFET scaling decreases performance factors such as bandwidth and voltage swing and increases static power dissipation. Scaling improves digital circuit performance, while analog cells gain only modestly, since noise and offset requirements prevent the use of small transistors. Cascading gives a better phase margin and faster settling time. For OTA design, the best-suited technology is CMOS, which reduces the power supply and provides high analog-digital on-chip integration. The OTA is similar to a typical operational amplifier in that it includes a high-impedance differential input stage and may be used with negative feedback. The principal difference is that the output of the ideal operational amplifier is a voltage, whereas in the OTA it is a current. Another difference is that the OTA uses an open-loop configuration without any feedback, and the output-stage device is always kept in the saturation region [1–3].


2 Literature Survey Singh [4], in "Power Efficient Biquadratic Filter Designing Using OTA," simulated an OTA with a biquadratic filter. Digital circuits suffer from limitations such as supply voltage variations and environmental effects that degrade circuit performance; to overcome them, analog circuits with low Gm, low power dissipation, and adequate gain are used. The proposed circuit optimizes gain margin, gain-bandwidth product, power, area, slew rate, voltage offset, and phase compared with previously proposed filter circuits such as OTA-C and OTA-GmC filters designed with the OTA. Device parameters such as DC gain, frequency, power dissipation, bandwidth, and center frequency are all electronically tunable. The software implementation is done with the Cadence Virtuoso tool. The filter works at 425 kHz with zero pass-band gain; the power dissipation of the OTA is 4.3 mW at a 1.8 V DC supply, with biasing voltage and current of 2.5 V and 50 µA and a load capacitance of 10 pF [4]. Rodrigues and Sushma [5] (2019), in "Design of Low Transconductance OTA and its Application in Active Filter Design," presented a low-power, low-Gm amplifier; the modern world requires portable and hand-held devices for medical applications, and the OTA meets these specifications. The proposed OTA is based on a serial-parallel gate-driven topology and was simulated in 180 nm Cadence Virtuoso CMOS technology together with an active bi-quad Gm-C filter. The circuit consumes 194.33 nW with Gm = 24.63 nS at a 1.6 V supply [5]. Rani et al. [6], in "Design and Analysis of CMOS Low Power OTA for Biomedical Applications," proposed a cardiac OTA for medical use in cardiac implantable medical devices. Today's environment demands devices with low noise and long lifetime. The device processes heart pulses with low noise, high gain, and low power dissipation; the low-power OTA converts the input voltage into an output current with low noise and power and high gain. The simulation was done in CMOS 45 nm technology, and the results show 11.9 µW power consumption at a frequency range of 20 kHz, with a gain at that frequency of 51 dB. The input supply voltage is 1 V, and the input-referred noise is 1.22 µVrms. This circuit provides suitable, sufficient, and satisfactory output [6, 7]. Rambabu et al. [8], in "A 90 nm Gain Enhanced Modified Cascode OTA Structure with Positive Feedback," presented an OTA with positive feedback having high gain and low latency in terms of speed, targeting wide-bandwidth applications and low-power current-mode analog circuits. The proposed OTA was simulated in 90 nm CMOS technology with a 1 V supply and a 0.5 fF load capacitance at the output. The circuit consumes 11.39 µW over the frequency range 8.33–11.12 GHz, with a gain of 24–63.3 dB [8].


Hari Krishnan and Hemavathi [18], in "Analysis and Design of Operational Transconductance Amplifier (OTA) using Cadence Tool," presented the design and simulation of a two-stage OTA with a switched-capacitor circuit as load. It provides an optimal solution because its inter-related metrics, e.g., settling performance and noise, are decoupled. The circuit was simulated with the Cadence tool to observe the settling time, together with transient simulation and AC analysis. Simulated in 90 nm CMOS with a 1.1 V power supply, it achieves a settling time of 1.5 ns at a power dissipation of 2.3 mW with a 174° phase plot [9, 10]. Patel et al. [19], in "Design of Balanced Operational Transconductance Amplifier (OTA)," give a general description of the OTA and introduce a topology based on a balanced OTA, a circuit element used in analog circuits requiring high stability. The circuit was implemented in a 180 nm Cadence Spectre environment; it consumes 41.96 µW at a 1.8 V supply with a 10 µA biasing current. Simulation shows a gain of approximately 44.18 dB and a UGB of 55.11 MHz; the gain margin and phase margin are 25.52 dB and 63.04°, with a slew rate of 30 V/µs. The balanced OTA is based on a current mirror with a self-biased circuit as load; it has a large slew rate, gain bandwidth, and transconductance, and the phase margin shown in simulation defines the stability of the circuit [11]. Singh et al. [7], in "Design of Low Voltage OTA for Bio-medical Application," proposed the classical folded-cascode OTA generally used in many biomedical applications, such as ECG, EEG, pulse-oximetry, PCG, temperature sensing, blood pressure, and neural recording, for sensing in small portable wireless devices, because medical equipment operates at a very low voltage. Such signals have weak amplitude and low frequency, so detecting them is a challenge for biomedical electronic equipment; meeting this challenge requires high CMRR and accurate, high gain, as well as amplification of common-mode noise signals [7]. Sheikh et al. [20], in "Comparative Analysis of CMOS OTA," discuss the basic OTA building block with various topologies and its properties, such as linearity, speed, low operating voltage, and low power consumption. The authors compare traditional and conventional OTAs in terms of transconductance, DC gain, supply voltage, frequency range, and power consumption. The paper covers mainly three topologies: (1) the simple transconductance amplifier, (2) the cascode transconductance amplifier, and (3) the regulated cascode positive feedback amplifier. The positive feedback cascode transconductance amplifier is the superior topology; it provides better stability as well as linearity and output impedance [7]. Palodiya et al. [12], in "Design of Small-Gm Operational Transconductance Amplifier in 0.18 µm Technology," proposed a small-gm OTA; in modern VLSI, both the transistor size and the operating voltage should be kept low. The authors use a current-division technique with a small capacitance for high-pass filters in IC fabrications used in medical applications. The proposed work is implemented in 0.18 µm CMOS technology, with biasing and supply voltages both equal to 1.8 V. Simulation shows an open-loop voltage gain of 76 dB, a UGB of 90.25 MHz, a PSRR of 80 dB, a CMRR of 91 dB, a power consumption of 0.74 mW, and a slew rate of 2.344 V/µs [12]. Karnik et al. [13], in "Design of Operational Transconductance Amplifier in 0.18 µm Technology," designed a simple OTA in 0.18 µm technology and showed its use in analog circuits such as filter circuits, neural networks, ADCs, and instrumentation amplifiers. The OTA is similar to the OPAMP, the only difference being that the OTA's output stage delivers a current. The paper discusses parameters such as CMRR, power dissipation, PSNR, and the AC and transient responses of the OTA. Simulation shows a power consumption of 10 mW at a 37 kHz gain-bandwidth frequency, a slew rate of 2.344 V/µs, CMRR = 90 dB, PSNR = 85 dB, and an open-loop gain of 71 dB [13].

3 Design Methodology In the OTA, the term transconductance denotes the process by which an input voltage is converted into an output current; the OTA output current is mainly proportional to the difference between the input voltages. The OTA can also be configured to amplify or to integrate either current or voltage (Fig. 2). The input and output characteristics of an ideal OTA are shown in Fig. 3. The width of the linear region is inversely proportional to the transconductance magnitude, and in this region the output current is maximum; simply put, for a greater linear region, gm should be smaller. In an OTA, the input and output resistances should always be large, and for maximum voltage transfer from input to output, the input impedance should be infinite. The function of the OTA rests on three basic topologies, shown in Fig. 4. Figure 4a represents the simple transconductance amplifier: the MOSFET M1 operates in the saturation region and converts the input voltage into an output current. This method has limitations such as poor linearity and very low output impedance. The second topology, Fig. 4b, is known as the cascode transconductance amplifier; in this topology, the MOSFET M1 always operates in the ohmic region, and M2 is used for

Fig. 2 General block diagram of OTA [12]


Fig. 3 I/O characteristics of ideal OTA [14]

Fig. 4 OTA topologies [9]

isolation between the input and output terminals, which enhances both the linearity and the output impedance. The third topology, Fig. 4c, is known as the regulated cascode feedback transconductance amplifier: negative feedback is provided through an amplifier of gain A, which directly amplifies the input voltage changes, cancels the effect of those voltage changes at the output, and enhances the linearity. It provides better stability than the other two topologies [13, 15]. The OTA is symmetric to the operational amplifier but with a large output impedance. It implements many kinds of analog circuits such as mixers, four-quadrant multipliers, data converters, continuous-time filters, modulators, and high-speed A/D converter


Fig. 5 Conventional OTA schematic [9]

circuits. The conventional OTA architecture is shown in Fig. 5. The voltage gain $A_v$ may be represented as

$$A_v = \frac{g_m}{g_{out}} \qquad (1)$$

where $g_m$ is the transconductance and $g_{out}$ the output transconductance of the amplifier:

$$g_m = g_{m1} \qquad (2)$$

$$g_{out} = \frac{g_{ds1}\, g_{ds3}\, g_{ds5}\, g_{ds7}}{g_{ds3}\, g_{ds5}} \qquad (3)$$

A modification of the conventional OTA using a telescopic CMOS topology is presented in Fig. 6, which achieves large gain, higher swing, low power dissipation, and the highest speed. In this structure, MOSFETs M7 and M8 are shorted with MOSFETs M1 and M2, because M7 and M8 are operated by the differential input of the circuit; this enhances the UGF and the output swing. The overall transconductance of the circuit is

$$g_m \approx g_{m1} + g_{m7} \qquad (4)$$

It is a folded-cascode, two-stage telescopic topology with high gain, high speed, and high swing at low power dissipation, and it provides a higher transconductance gain than the conventional circuit. Figure 7 shows a further topology, the current mirror (or symmetric) OTA, whose construction uses basic elements: a differential pair is used at the input stage, and M2, M4 and M1, M3 are the self-biased inverters composing the subcircuits.

It is a folded cascaded 2 stage, telescopic topology having high gain high speed with high swing with low power dissipation. It contains higher transconductance gain compared to the conventical circuit. In Fig. 7, the new topology that is current mirror OTA or symmetric OTA. In this circuit construction, the basic elements are to be used. The differential pair is used at input stage. M2, M4 and M1, M3 are the self-bios inverter for composition of


Fig. 6 Modified OTA using CMOS topology [9]

Fig. 7 Current mirror OTA [11]

In an ideal OTA, gm is a function of the input as well as of temperature. An ideal OTA also has infinite input impedance, meaning no current flows at the input, and the CMRR should likewise be kept infinite, while the differential input controls an ideal current source; here, the output current is not proportional to the input current. The OTA also exhibits some nonlinear characteristics [10, 16]:
• higher input differential voltages lead to nonlinearity due to the input-stage transistor characteristics;
• Gm is sensitive to temperature, which leads to nonlinear behavior;
• input and output circuit impedance variations cause nonlinearity.


4 Mathematical Modeling In general, the output current of an ideal OTA is a linear function of the differential input and is calculated as

$$I_{out} = (V_{in+} - V_{in-})\, g_m \qquad (5)$$

Here, $V_{in+}$ is the input voltage at the non-inverting terminal, $V_{in-}$ the input voltage at the inverting terminal, and $g_m$ the transconductance of the amplifier. The output voltage is $V_{out} = I_{out} \cdot R_{load}$, the product of the output current and the load resistance. The voltage gain of an OTA is defined as the ratio of the output voltage to the differential input voltage:

$$G_{voltage} = \frac{V_{out}}{V_{in+} - V_{in-}} = R_{load}\, g_m \qquad (6)$$

$I_{abc}$ is the amplifier bias current, which usually controls the transconductance of the amplifier; $g_m$ is directly related to this current, a feature used for electronic control of the amplifier. The OTA symbol and its CMOS implementation are represented in Figs. 8 and 9. The OTA characteristic equation is defined as

$$I_0 = G_m V_{in} \qquad (7)$$

Fig. 8 Symbol of OTA [5]

Fig. 9 Equivalent CMOS representation of OTA [5]


Here, $V_{in}$ is the difference between the input voltages at the inverting and non-inverting terminals, and $G_m$ is the transconductance, defined mathematically as

$$G_m = \frac{K}{\sqrt{2}}\,(V_b - V_{ss} - 2V_{th}) \qquad (8)$$

Here, $K = \mu C_{ox} W/L$, where $C_{ox}$ stands for the oxide capacitance, $\mu$ for the mobility, $W$ and $L$ for the width and length of the channel, and $V_{th}$ for the threshold voltage of the MOSFET.
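To make Eqs. (1) and (8) concrete, the sketch below plugs illustrative device values into the transconductance and gain expressions; all numerical values are assumptions for demonstration, not measurements from any cited design.

```python
from math import sqrt, log10

# Assumed illustrative device values; not taken from any cited design
mu_Cox = 200e-6          # process transconductance parameter uCox (A/V^2)
W, L = 10e-6, 0.18e-6    # channel width and length (m)
Vb, Vss, Vth = 0.9, -0.9, 0.45   # bias, negative supply, threshold (V)

K = mu_Cox * W / L                           # K = uCox * W / L
G_m = (K / sqrt(2)) * (Vb - Vss - 2 * Vth)   # Eq. (8)

g_out = 5e-6                                 # assumed output conductance (S)
A_v = G_m / g_out                            # Eq. (1): voltage gain
print(f"Gm = {G_m * 1e3:.2f} mA/V, Av = {20 * log10(A_v):.1f} dB")
```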

4.1 Comparison Table Table 1 shows the comparison of the different parameters across various CMOS technologies. It can be seen that as the technology node shrinks, the power dissipation decreases; the power dissipation is also directly proportional to the supply voltage [6, 12, 14, 17].

5 Conclusion It is concluded that the OTA replaces the OPAMP at higher frequencies, because the OPAMP is not suitable for high-frequency applications. An OTA designed in CMOS technology is the best-suited circuit for high-frequency analog circuits requiring low power dissipation, and it also provides higher efficiency for analog-digital on-chip integration. A regulated cascode topology with negative feedback via an amplifier provides superior results in terms of linearity against voltage fluctuations and better stability in terms of gain. Table 1 shows that as the typical lengths of CMOS devices are diminished, both their capacitive parasitics and channel delay are reduced, increasing the transistors' cut-off frequencies. OTA power consumption and bandwidth are inversely related: decreasing the power consumption while the bandwidth increases enhances circuit performance and gives better results. High linearity, high frequency, and low power consumption are the three major considerations in OTA design; however, trade-offs must be made between these qualities in order to build effective OTA circuits.

Table 1 Comparison table at various CMOS technologies

| S. No. | Ref. No. | Technology used | Power dissipation | Supply voltage (V) | UGB | CMRR/PSRR (dB) | Open-loop gain (dB) | Slew rate | Load capacitance | GM/PM |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | [1] | 180 nm | 41.96 µW | 1.8 | 55.11 MHz | – | 44.18 | 30 V/µs | – | 25.52 dB/63.04° |
| 2 | [2] | 0.18 µm | 0.74 mW | 1.8 | 90.25 MHz | 91/80 | 76 | 2.344 V/µs | 0.1 pF | – |
| 3 | [3] | 180 nm | 194.336 nW | 1.6 | 250.3 kHz | – | 21.15 | – | – | – |
| 4 | [6] | 45 nm | 11.9 µW | 1 | 20 kHz | – | 51 | 1.22 µVrms (input-referred noise) | – | – |
| 5 | [7] | 0.18 µm | 4.3 mW | 1.8 | – | 62/96 | 70.37 | 46.97 V/µs | 10 pF | – |
| 6 | [8] | 90 nm | 11.39 µW | 1 | 11.12 GHz | – | 63.3 | – | 0.5 fF | – |
| 7 | [9] | 90 nm | – | 1.1 | 10 MHz | – | – | 539.6 V/µs | 330 fF | 38.1/62.6° |
| 8 | [13] | 0.18 µm | 10 mW | 1.8 | 37 kHz | 90/85 | 71 | 2.344 V/µs | – | – |


References
1. Bin Wan W, Ruslan S, Ahmad N, Jubadi WM, Sanudin R (2019) Design of very low-voltages and high-performance CMOS gate-driven operational amplifier. Indonesian J Electr Eng Comput Sci (IJEECS) 14(1):230
2. Saravanakumar O, Kaleeswari N, Rajendran K (2014) Design and analysis of two-stage operational transconductance amplifier (OTA) using cadence tool. Int J Emerg Technol Adv Eng 4(4)
3. Liu Q, Nwankpa CO (2005) In: 2005 IEEE international symposium on circuits and systems. IEEE, pp 5302–5305
4. Singh G (2020) A novel simple 4-D hyperchaotic system with a saddle-point index-2 equilibrium point and metastability: design and FPGA-based applications. Circuits Syst 11(4):39
5. Rodrigues SN, Sushma P (2019) In: 2019 3rd international conference on trends in electronics and informatics (ICOEI). IEEE, pp 921–925
6. Rani DGN, Gifta G, Meenakshi M, Gomathy C, Gowsalaya T (2019) In: 2019 4th international conference on recent trends on electronics, information, communication and technology (RTEICT). IEEE, pp 871–876
7. Raghav HS, Singh B, Maheshwari S (2013) In: 2013 annual international conference on emerging research areas and 2013 international conference on microelectronics, communications and renewable energy. IEEE, pp 1–5
8. Rambabu S, Majumder A, Mondai AJ (2017) In: 2017 2nd international conference on communication and electronics systems (ICCES). IEEE, pp 203–207
9. Gerlach A, Scheible J, Rosahl T, Eitrich FT (2017) In: Design, automation and test in Europe conference and exhibition (DATE). IEEE, pp 898–901
10. Grise W (2004) Department of IET, Morehead State University, Morehead, KY, 40351
11. Yazdani-Nejad A, Pishgar SH (2016) Design and analysis of DTMOS based low voltage OTA and its filter application. Int J Comput Appl 139(7)
12. Palodiya V, Karnik S, Shrivastava M, Dodiya J (2012) Analysis and design of two-stage operational transconductance amplifier. Int J Eng Res Technol (IJERT) 1(5)
13. Karnik S, Kushwaha AK, Jain PK, Ajnar D (2016) Design of operational transconductance amplifier in 0.18 µm technology. Int J Mod Eng Res (IJMER) 2(1)
14. Loan SA, Nizamuddin M, Alamoud AR, Abbasi SA (2015) Design and comparative analysis of high-performance carbon nanotube-based operational transconductance amplifiers. Nano 10(03):1550039
15. Majumdar D (2004) In: 17th international conference on VLSI design. Proceedings. IEEE, pp 47–51
16. Cabrera-Bernal E, Pennisi S, Grasso AD, Torralba A, Carvajal RG (2016) 0.7 V three-stage class-AB CMOS operational transconductance amplifier. IEEE Trans Circuits Syst I Regul Pap 63(11):1807
17. Riad J, Estrada-López JJ, Sánchez-Sinencio E (2019) Classification and design space exploration of low-power three-stage operational transconductance amplifier architectures for wide load ranges. Electronics 8(11):1268
18. Hari Krishnan S, Hemavathi H (2017) Analysis and design of operational transconductance amplifier (OTA) using cadence tool. Int J Engg Res Rev 5(4)
19. Patel T, Raikar K, Hiremath S (2015) Design of balanced operational transconductance amplifier (OTA). Int J Emerg Technol Comput Sci Electron (IJETCSE), ISSN, pp 976–1353
20. Sheikh ST, Dahigoankar DJ, Lohana H (2012) Comparative analysis of CMOS OTA. IOSR J VLSI Signal Process (IOSR-JVSP), ISSN 2319-4200

Design and Optimization of Wideband RF Energy Harvesting Antenna for Low-Power Wireless Sensor Applications

Geetanjali, Poonam Jindal, Nitin Saluja, Neeru Kashyap, and Nitika Dhingra

Abstract A wideband RF energy harvesting antenna is proposed in this paper for low-power wireless sensor applications. It consists of a multi-element circular patch antenna that can be interfaced to a rectifier circuit to drive low-power devices. The proposed antenna acts as a receptor of RF waves from the ISM WLAN band and the C band, creating a wide impedance bandwidth of 3.1 GHz ranging from 4.4 to 7.5 GHz. The antenna has a directivity of 5.1 dBi, and its radiation pattern is directive to enable effective energy harvesting from the targeted RF sources. The antenna parameters are optimized for maximum power transfer to the rectifier circuit. The designed antenna is highly efficient, with a maximum radiation efficiency of 90%. It is capable of driving low-power sensors and enables the effective use of wireless sensor networks for various IoT applications.

Keywords Wideband antenna · RF energy harvesting · Low-power sensors · IoT applications

Geetanjali · P. Jindal (B) · N. Saluja · N. Kashyap · N. Dhingra
Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India
e-mail: [email protected]
Geetanjali
e-mail: [email protected]
N. Saluja
e-mail: [email protected]
N. Kashyap
e-mail: [email protected]
N. Dhingra
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_62


1 Introduction For the large number of low-power sensors deployed in different fields, it is not feasible to change the batteries of all these devices frequently or to pull cables everywhere to connect them to the grid. One possible solution for driving these devices wirelessly, without batteries, is a rectenna based on radio frequency (RF) energy harvesting for wireless power [1]. It consists of a receiving antenna and a rectifier circuit and can also be used to power radio frequency identification device (RFID) tags and embedded electronic devices. The RF energy is harvested from multiple sources, such as long-term evolution (LTE), Global System for Mobile communication (GSM), wireless local area network (WLAN), and satellite network towers, using a multiband/wideband antenna [2]. The most preferred receptor of RF energy is the microstrip patch antenna due to its compactness and ease of fabrication, placement, and integration with different devices [3]. It is matched to a rectifier that converts the electromagnetic signal to a DC voltage. In the rectifier circuit, Schottky diodes are preferred because of their low threshold value, low power dissipation, and fast switching [4]. The output of the rectifier is amplified using a voltage multiplier, and the harvested energy is stored in a capacitor which can power up the low-power sensors. This system design eliminates the need for batteries and can be placed anywhere, operating for an indefinite time. These sensors are used to control operations in smart buildings, smart agriculture, smart retailing, RFID tagging, home automation, etc. A number of techniques have been presented in the literature to form an efficient antenna for effective energy harvesting from ambient RF sources and to obtain a good conversion efficiency using the rectifier circuit [5]. A summary comparing the parameters of different antenna designs is presented in Table 1. Gain and bandwidth are the important parameters for effective operation of the antenna in collecting energy from the surrounding RF sources. In [6], the authors proposed a compact (55 × 55 × 1.0 mm³) wideband rectangular slot antenna with dual polarization for harvesting RF energy from surrounding towers. It generates a wide bandwidth of 1 GHz, ranging between 1.7 and 2.7 GHz, but the structure is complex as it utilizes two different feedlines (a microstrip feedline and a CPW feedline). A gap-coupled, highly sensitive rectenna at 60 GHz is presented in [7], in which a low-loss waveguide structure connects the antenna to the rectifier through microstrip transitions to form a composite rectenna system suitable for low-power millimeter-wave point-to-point communication. A circularly polarized dipole antenna design for RF energy harvesting is given in [8], achieving a bandwidth of 2.75 GHz, but there is no mention of the gain parameter in the paper; in this design, a coaxial probe feeds the input power into the antenna, which reduces the bandwidth and is comparatively difficult to model. In [9], the authors investigated a high-gain wideband rectenna with a conversion efficiency around 50%; the design places a reflector below the antenna to increase the gain, although this can affect the bandwidth and input impedance of the wideband antenna.


Table 1 Summary of comparison of parameters of different designs of antenna

| Ref. No. | Frequency range/bandwidth | Polarization | Gain (dBi) | Input power (dBm) | Output power (dBm) | Conversion efficiency |
|---|---|---|---|---|---|---|
| [4] | 0.866, 0.915, 2.45, 6-tone | – | 1.8, 1.9, 3.85 | −5 | −6.31 | 65, 62, 60, 78 |
| [6] | 1.7–2.7 GHz | Dual polarization | 2.1–3.0 | −3 | −1 | < 40% |
| [7] | 60–62.8 GHz | – | 13.3 | – | – | 49.3% |
| [8] | 3.3–6.05 GHz | Circular | – | 15 | 11.89 | 48.9% |
| [9] | 1.9–3.2 GHz | – | – | – | – | 50% |
| [10] | 1.8 GHz, 2.1 GHz and Wi-Fi bands | Dual polarization | 11 | −6.99 | −10.97 | 40 |
| [11] | 1.8–2.5 GHz | Dual polarization | – | −10 | −12.6 | 55% |
| [12] | 4.9, 7.35 and 9.8 GHz | – | – | 7.9 | 6.22 | 67.5% |
| [13] | 1–3 GHz | – | 5.9 | – | – | – |

A 16-port antenna design using multiple stubs for impedance matching and rectification is presented in [10]; the multiport antennas, however, have low mutual coupling. In [11], a dual-polarized cross-dipole antenna with a harmonic-rejection property is proposed with a bandwidth of 0.7 GHz, which is not truly broadband, and information on the gain of the antenna is again lacking. In [12], the author designed a multifrequency antenna structure resonating at 4.9, 7.35, and 9.8 GHz for rectenna applications with an efficiency of 67.5%. In [13], an efficiency as high as 78% is achieved, but circular polarization for the designed antenna is not obtained and the design complexity is increased. The goal of this paper is to harvest energy from as many RF sources as possible by designing a simple wideband multi-element patch antenna on a single plane using a CPW feed. It can be used in combination with a rectifier circuit to supply power to wireless sensors for IoT or 5G applications.
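Several of the Table 1 entries can be cross-checked with a short dBm-to-milliwatt conversion; the sketch below (illustrative only) recovers the quoted efficiencies of refs [10]–[12] from their input and output powers, using the same ratio that Eq. (2) in Sect. 3 formalizes.

```python
def dbm_to_mw(p_dbm):
    """Standard conversion: P(mW) = 10 ** (P(dBm) / 10)."""
    return 10 ** (p_dbm / 10)

def conversion_efficiency(pin_dbm, pout_dbm):
    # eta(%) = Pout / Pin * 100 (see Eq. (2) in Sect. 3)
    return 100 * dbm_to_mw(pout_dbm) / dbm_to_mw(pin_dbm)

print(conversion_efficiency(-6.99, -10.97))  # ref [10]: ~40%
print(conversion_efficiency(-10.0, -12.6))   # ref [11]: ~55%
print(conversion_efficiency(7.9, 6.22))      # ref [12]: ~68% (quoted 67.5%)
```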

2 Methodology This paper presents a multi-element circular parasitic patch antenna formed on a single plane using a conventional coplanar waveguide (CPW) feed. It effectively collects RF energy from RF sources and can be used in a rectenna system by connecting it to an optimum rectifier circuit to achieve a high AC-to-DC conversion efficiency. The components of the rectenna system are an RF source, a receiving patch antenna, an impedance matching circuit, a rectifier, a voltage amplifier, and a supercapacitor. The block diagram is shown in Fig. 1.


Fig. 1 Block diagram of the complete RF energy harvesting system

For the design of the receiving antenna structure in CST Microwave studio, the process flow is shown in Fig. 2 in which many parameters are considered at each step and need to be optimized for obtaining the desired results.

3 Antenna Design The receiving antenna consists of two parasitic circular patches coupled to a central driven patch (with radius 'R') located on an FR4 substrate of size Ls × Ws × h mm³. The permittivity of the substrate is 4.4, and a sheet thickness (h) of 1.6 mm is used to design the antenna structure. The two parasitic patches are of the same size, with radius 'r', and are symmetrically located around the driven patch; the separation of each patch from the driven patch is 'd'. A circular microstrip patch antenna is chosen as it is comparatively easy to design due to its dependence on a single parameter, the patch radius [3, 14]. The main radiating patch is driven by a CPW feedline of length 'Lf' and width 'Wf'; the width of this feedline determines the impedance matching between the antenna structure and the cable used for transmission. The dimensions of the CPW


Fig. 2 Flowchart representing the parametric optimization


Fig. 3 A circular parasitic patch antenna array

Table 2 Optimized dimensions of the proposed antenna

| Parameter | Ls | Ws | Lg | Wg | Lf | Wf | s | R | r | d | x | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dimension (mm) | 48 | 42 | 22 | 8 | 11 | 2 | 1 | 9 | 3.5 | 14.5 | 0.5 | 7 |

ground plane are Lg × Wg on either side of the feedline. The whole arrangement of the patches and the ground is built on the same side of the substrate, making the design quite simple to fabricate as well. The geometry of the proposed structure is illustrated in Fig. 3, and the optimized dimensions of the antenna are presented in Table 2.

Parametric Studies Parametric variations are carried out to see the effect of the various input parameters on the performance of the antenna and to generate the desired results in the required bands. For the initial design of the circular patch antenna, based on the cavity-model formulation for the dominant TM110 mode, the radius a of the main radiating patch is determined using the equation

$$a = \frac{F}{\left\{1 + \dfrac{2h}{\pi \varepsilon_r F}\left[\ln\left(\dfrac{\pi F}{2h}\right) + 1.7726\right]\right\}^{1/2}} \qquad (1)$$

where

$$F = \frac{8.791 \times 10^9}{f_r \sqrt{\varepsilon_r}}$$

$\varepsilon_r$ is the relative permittivity of the substrate, and $f_r$ is the resonant frequency for the dominant TM110 mode (a short numerical sketch of this sizing equation is given after the parametric studies below). The antenna design is then modified by inserting the parasitic patches around the driven patch; these become electromagnetically coupled to the main patch and create closely spaced resonant bands that form a complete wideband operation. Further, cross-slits are added to the driven patch to cause a redistribution of the current in the closely spaced resonances and enhance the antenna performance. A parametric sweep is applied to various parameters of the receiving antenna design to check where the antenna response is best in terms of the desired output parameters. The results are obtained for the different geometric parameters which affect the antenna performance:

1. Effect of change in radius of the main radiating patch: It is evident from Fig. 4 that a maximum bandwidth of 4.55 GHz (ranging from 4.35 to 8.90 GHz) is achieved by keeping the radius of the patch at 10 mm, with a considerable return-loss value (−60 dB). If the radius is decreased, both the bandwidth and the return-loss value reduce.

2. Effect of change in width of the slits in the central driven patch: As shown in Fig. 5, the width of the slits in the patch does not have a significant effect on the bandwidth of the antenna.

Fig. 4 Effect of change in radius of the main radiating patch

Fig. 5 Effect of change in width of the slits in the central driven patch


Fig. 6 Effect of change in radius of the parasitic patches

3. Effect of change in radius of the parasitic patches: The optimum value of the radius of the parasitic patches is 3.5 mm, which achieves the maximum bandwidth of 3.1 GHz. If the radius is increased beyond 3.5 mm, the bandwidth decreases (Fig. 6).

The proposed design can be interfaced to a matched rectifier circuit which converts the power received by the antenna into a DC voltage. The efficiency of the rectifier circuit is governed by Eq. (2) [15–17]:

$$\eta(\%) = \frac{P_{out}}{P_{in}} \times 100 \qquad (2)$$
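As promised above, the sizing equation (1) transcribes directly into code. The sketch below is an illustrative transcription, not the authors' design tool; it uses the paper's substrate values (εr = 4.4, h = 1.6 mm), while the 5 GHz example frequency is only an indicative in-band value.

```python
import math

def circular_patch_radius(fr_hz, eps_r, h_cm):
    """Cavity-model radius 'a' (in cm) for the dominant TM110 mode, Eq. (1).

    fr_hz : resonant frequency in Hz
    eps_r : relative permittivity of the substrate
    h_cm  : substrate thickness in cm
    """
    F = 8.791e9 / (fr_hz * math.sqrt(eps_r))  # F comes out in cm for fr in Hz
    term = (2 * h_cm) / (math.pi * eps_r * F) * (
        math.log(math.pi * F / (2 * h_cm)) + 1.7726)
    return F / math.sqrt(1 + term)

# FR4 substrate of the proposed design: eps_r = 4.4, h = 1.6 mm = 0.16 cm;
# 5 GHz is an assumed in-band frequency, not a value quoted in the paper
print(circular_patch_radius(5.0e9, 4.4, 0.16))  # ~0.8 cm, close to R = 9 mm
```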

4 Results and Discussions The receiving antenna structure is designed and simulated in CST Microwave Studio. The designed multi-element circular patch antenna is simulated to observe its performance in terms of the S11 parameter, bandwidth, surface current, gain, and radiation parameters. The S11 result shown in Fig. 7 represents the capability of the antenna to receive radiation or energy from free space efficiently. Figure 7a, b show the geometry and the S11 result of the circular patch antenna without the parasitic patches; it delivers a bandwidth of 1.95 GHz, ranging from 4.4 to 6.35 GHz, whereas the addition of parasitic patches (Fig. 7c) causes the closely spaced resonant bands to combine and generate a wider bandwidth. Figure 7d shows a wideband response with frequencies ranging from 4.44 to 7.46 GHz and low reflection losses (−24 dB); an impedance bandwidth of 3.02 GHz is achieved. The plot of surface current density in Fig. 8 shows that the antenna is capable of radiating in a wideband: the lower periphery of the driven patch and the parasitic patches are responsible for radiation at the resonance frequency of


Fig. 7 a Circular patch antenna without the parasitic patches. b S11 parameter for the circular patch antenna without the parasitic patches

5.03 GHz, whereas at 6.33 GHz the surface current is distributed over the entire circumference of the parasitic elements as well as the driven patch, causing the resonance at 6.33 GHz. These closely spaced resonances create the wideband operation. The antenna radiation efficiency, as seen from Fig. 9, lies between 86 and 92% over the desired range of frequencies, which is effective for receiving the harvested energy from the wireless base stations. The radiation pattern plots justify the harvesting mechanism of the proposed antenna structure: Fig. 10a, b show the azimuth pattern for the resonant frequencies 5.03 and 6.33 GHz, respectively, and demonstrate the directional behavior of the proposed structure in receiving energy from particular bands or base stations for wireless communications. The VSWR plot shown in Fig. 11 confirms that the antenna is capable of receiving energy without any significant reflections, as the VSWR lies below 2 over the desired bandwidth.

Fig. 8 a Surface current density of circular parasitic patch antenna array at 5.03 GHz. b Surface current density of circular parasitic patch antenna array at 6.33 GHz


Fig. 9 Radiation efficiency of the proposed antenna with respect to frequency

Fig. 10 a Azimuthal pattern of the radiation from the proposed design at 5.03 GHz. b Azimuthal pattern of the radiation from the proposed design at 6.33 GHz

Fig. 11 VSWR of the proposed design


Due to the multi-element circular patch antenna, the design is capable of capturing signals over a wider frequency range, targeting a number of wireless sources at one time; the overall received power is thus increased along with the bandwidth. The output of the antenna can be fed to a voltage multiplier and rectifier to obtain sufficient DC voltage to operate low-power actuators or sensors used in wireless communication.

5 Conclusion A compact wideband circular patch antenna with cross-slots is proposed for intercepting energy from RF sources for energy harvesting applications in low-power devices. A wide operational bandwidth of 3.02 GHz (4.44–7.46 GHz) is achieved using the CPW feeding technique. Acceptable return-loss values of −22.5 dB and −23 dB are obtained at the two bands, as shown by the S11 parameter. The simulated results confirm a high-gain antenna with the directive properties presented in the radiation pattern plots, and the proposed structure shows a maximum radiation efficiency of 90%. Hence, we obtain a broadband antenna structure for harvesting RF energy with high gain, a wide frequency response, and wide space coverage. It can be used in a rectenna to provide power to small sensors deployed in different fields, solving the issue of repeatedly charging or replacing the batteries of such devices.

References
1. Kim BS et al (2014) Wireless energy harvesting. Proc IEEE 102(11):1649–1666
2. Ghiglino CM (2010) Ultra-wideband rectenna design for electromagnetic energy harvesting. October (October)
3. Pandey S, Markam K (2016) Design and analysis of circular shape microstrip patch antenna for c-band applications. Int J Adv Res Comput Sci Technol (IJARCST) 4(2):169–171. [Online]. Available: www.ijarcst.com
4. Quddious A, Zahid S, Tahir FA, Antoniades MA, Vryonides P, Nikolaou S (2021) Dual-band compact rectenna for UHF and ISM wireless power transfer systems. IEEE Trans Antennas Propag 69(4):2392–2397. https://doi.org/10.1109/TAP.2020.3025299
5. Tran LG, Cha HK, Park WT (2017) RF power harvesting: a review on designing methodologies and applications. Micro Nano Syst Lett 5(1). https://doi.org/10.1186/s40486-017-0051-0
6. Alja'afreh SS, Song C, Huang Y, Xing L, Xu Q (2020) A dual-port, dual-polarized and wideband slot rectenna for ambient RF energy harvesting. In: 14th European conference on antennas and propagation, EuCAP 2020, pp 3–7. https://doi.org/10.23919/EuCAP48036.2020.9135441
7. Hannachi C, Boumaiza S, Tatu SO (2019) A highly sensitive broadband rectenna for low power millimeter-wave energy harvesting applications. In: 2018 IEEE wireless power transfer conference WPTC 2018, pp 1–4. https://doi.org/10.1109/WPT.2018.8639130
8. Guo C, Zhang W (2019) A wideband CP dipole rectenna for RF energy harvesting. In: 2019 Cross strait quad-regional radio science and wireless technology conference CSQRWC 2019, pp 1–3. https://doi.org/10.1109/CSQRWC.2019.8799279
9. Zhang J, Wu ZP, Liu CG, Zhang BH, Zhang B (2015) A double-sided rectenna design for RF energy harvesting. In: 2015 IEEE international wireless symposium (IWS 2015), no 430070, pp 6–9. https://doi.org/10.1109/IEEE-IWS.2015.7164617
10. Shen S, Zhang Y, Chiu CY, Murch R (2020) A triple-band high-gain multibeam ambient RF energy harvesting system utilizing hybrid combining. IEEE Trans Ind Electron 67(11):9215–9226. https://doi.org/10.1109/TIE.2019.2952819
11. Song C, Huang Y, Zhou J, Zhang J, Yuan S, Carter P (2015) A high-efficiency broadband rectenna for ambient wireless energy harvesting. IEEE Trans Antennas Propag 63(8):3486–3495. https://doi.org/10.1109/TAP.2015.2431719
12. Ren R, Huang J, Sun H (2020) Investigation of rectenna's bandwidth for RF energy harvesting. In: 2020 IEEE MTT-S international microwave workshop series on advanced materials and processes for RF and THz applications (IMWS-AMP). IEEE, pp 1–2. https://doi.org/10.1109/IMWS-AMP49156.2020.9199653
13. Reyna A, Panduro MA, Balderas LI (2018) A wideband rectenna array for RF energy harvesting applications. IET Conf Publ 2018(CP741):4–6. https://doi.org/10.1049/cp.2018.0694
14. Singla G, Khanna R (2017) A modified compact ultra wideband antenna with band rejection for WLAN applications. Wirel Pers Commun 97(1):683–693. https://doi.org/10.1007/s11277-017-4531-6
15. He Z, Liu C (2020) A compact high-efficiency broadband rectifier with a wide dynamic range of input power for energy harvesting. IEEE Microw Wirel Compon Lett 30(4):433–436. https://doi.org/10.1109/LMWC.2020.2979711
16. Fakharian MM (2020) A wideband rectenna using high gain fractal planar monopole antenna array for RF energy scavenging. Int J Antennas Propag. https://doi.org/10.1155/2020/3489323
17. Song C et al (2016) A novel six-band dual CP rectenna using improved impedance matching technique for ambient RF energy harvesting. IEEE Trans Antennas Propag 64(7):3160–3171. https://doi.org/10.1109/TAP.2016.2565697

Forecasting of Novel Corona Cases in India Using LSTM-Based Recurrent Neural Networks

Sawan Kumar Tripathi, Sanjeev Mishra, and S. D. Purohit

Abstract Novel corona disease is spreading all over the world, and the cases of the corona virus are increasing drastically day by day. Therefore, it is necessary to predict the cases in advance to handle the situation. Recently, machine learning has come into the picture for researchers solving problems in engineering. The present study focuses on the application of an LSTM recurrent neural network to predict the novel corona cases on a daily basis in India. Various RNN models are used in this study, and the performance of each model is evaluated using different statistical parameters for regression problems, such as mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), and the coefficient of determination (r² score). From the study, it is concluded that the LSTM RNN model can be utilized for the prediction of novel corona cases.

Keywords LSTM · Recurrent neural network · Coefficient of determination · Mean absolute error

S. K. Tripathi · S. Mishra · S. D. Purohit
Department of Mechanical Engineering, University Department, Rajasthan Technical University, Kota, India
e-mail: [email protected]
S. Mishra
e-mail: [email protected]
S. D. Purohit (B)
Department of HEAS (Mathematics), University Department, Rajasthan Technical University, Kota, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_63

1 Introduction Novel corona disease is spreading all over the world. It is a common disease in animals and birds, causing respiratory tract infections that range from minor to severe. The cases of the corona virus are increasing drastically day by day, and the spreading rate is increasing in an unusual manner. This increasing rate of corona causes many


system failures, such as a shortage of corona testing kits and of available hospital beds. Therefore, it is necessary to forecast the corona cases. Very limited studies are available in the literature on the prediction of novel corona cases. Wang et al. [1] utilized a patient information-based system (PIBS) to study the trend of real-time corona data in China; they concluded that the death rate is affected by different geographical conditions. Shinde et al. [2] carried out an exhaustive review of various forecasting models for corona. Kirbas et al. [3] performed a comparative study and forecasted corona cases in European countries using ARIMA, NARNN, and LSTM approaches; they found the long short-term memory recurrent neural network to be the most accurate model. Saba and Elsheikh [4] applied ARIMA and NARANN approaches to model and predict the corona outbreak in Egypt. Tomar and Gupta [5] used a data-driven LSTM model for corona and also studied the effect of home isolation and lock-down on the spreading rate. Liu et al. [6] used SEIRD, LSTM, and GWR models to forecast and examine the corona epidemic in China. Zandavi et al. [7] proposed a novel hybrid model combining an LSTM artificial recurrent neural network with dynamic behavioral models to estimate the spread of corona. Berman [8] conducted time-series analysis and prediction of corona cases using LSTM and ARIMA models and established two novel k-period metrics to study the performance of the models. Zhao et al. [9] projected the corona pandemic using LSTM, RNN, slim LSTM, and curve-fitting methods. Villegas et al. [10] applied a recurrent neural network approach to predict the evolution of corona mortality risk in Spain. Chandra et al. [11] predicted corona infection in India for the first phase using deep learning via different LSTM models, namely LSTM networks, bidirectional LSTM, and encoder-decoder LSTM. Chimmula and Zhang [12] developed a time-series forecast of corona transmission in Canada using machine learning and LSTM networks. Elsheikh et al. [13] proposed an LSTM network to forecast the corona outbreak in Saudi Arabia, measuring seven statistical criteria to check the reliability of their model. Pathan et al. [14] explored the mutation rate of the gene sequence of corona patients from different countries using a recurrent neural network-based LSTM model. Shahid et al. [15] proposed different prediction models for the corona epidemic and concluded that a Bi-LSTM model can be used for pandemic prediction for better planning and management. Shastri et al. [16] carried out a comparative study of corona cases in India and the USA using deep learning models and concluded that convolutional LSTM is more accurate and less error-prone than the other models. Wang et al. [17] established a prediction model for corona using a rolling update mechanism embedded with LSTM for Russia, Peru, and Iran; this model can produce long-term projections as well. Zerousal et al. [18] presented a comparative study of different deep learning methods for forecasting corona data from six countries, namely Italy, Spain, France, China, the USA, and Australia, and concluded that the variational auto-encoder (VAE) algorithm has superior performance compared with the other algorithms. Kumar et al. [19] projected a customized RNN model for 60-day forecasting of corona cases for the top-10 countries and identified countries where corona cases reach a plateau. Dairi et al. [20] investigated the performance of machine learning methods for corona transmission forecasting. Kumar et al. [21] used an LSTM-based model and validated

Forecasting of Novel Corona Cases in India Using LSTM-Based …

875

the results with New Zealand’s data. They also forecasted the dates when other countries would be able to contain the spread of corona. Rauf et al. [22] presented the time-series forecasting of corona transmission in Asia pacific countries using deep neural networks. In this study, LSTM-based recurrent neural network (RNN) is used to predict the novel corona cases in the INDIA for first and second phases of the corona. To get the optimal architecture, various LSTM-based RNN models are trained, and based on the statistical parameters, the optimal architecture of the model is defined, which is further used for the prediction.

2 Methodology

Artificial intelligence (AI) plays a significant role in every field of life in today's world. Machine learning (ML), a subset of AI, is being used in engineering to solve complex problems from foundational to advanced levels. ML is broadly classified into three categories: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning is a part of ML in which the machine learns from an input–output dataset. Based on this dataset, the machine tries to discover the hidden mathematical relationship between the input and output variables. Unsupervised learning is a type of ML that deals with unlabeled data. It is generally used for clustering. It explores the data and can draw inferences from unlabeled data to describe its hidden structure. Reinforcement learning is a type of learning in which the machine learns from mistakes; this process is known as reward-based learning. Deep learning (DL) is another sub-field of AI that simulates the working of the human brain to process data and obtain the mathematical relationship between the input vector and the output vector. Deep learning generally deals with big data. Figure 1 shows the relation among AI, ML, and DL.

Fig. 1 Artificial intelligence


Fig. 2 Data collection method

The task of machine learning is data-driven, so any machine learning algorithm’s accuracy significantly depends on the data.

2.1 Data Collection

Data collection is the process of gathering information, usually in tabular form, from either primary or secondary sources. Figure 2 shows the different methods of data collection. In this research work, data is collected from Kaggle [23], which is a secondary source of data collection.

2.2 Data Cleaning

Data cleaning is the process of detecting and correcting inaccurate data. It also includes detecting incomplete, inaccurate, and incorrect data and replacing, removing, or updating it. Figure 3 shows the complete data cleaning process. The process starts with importing data from different sources and merging it. The next step is the rebuilding of data, in which missing entries are filled in using statistics (mean, median, and mode). The data is then standardized (μ = 0 and σ = 1), and normalization scales the data down to the range between 0 and 1. After normalization, duplicate entries in the data are removed, a step known as de-duplication. In the next step, the data is verified using outlier detection, and finally, the data is exported to a new file. Data cleaning is done using Pandas and the sk-learn library.
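As an illustration, the cleaning cycle described above can be sketched in a few lines of Pandas and sk-learn. This is a minimal sketch, not the authors' actual script; the file and column names are hypothetical placeholders.

```python
# Minimal sketch of the data cleaning cycle (hypothetical file and column names).
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Import and merge data from different sources
df = pd.concat([pd.read_csv("cases_part1.csv"), pd.read_csv("cases_part2.csv")],
               ignore_index=True)

# Rebuild: fill missing entries with a statistic (here, the median)
df["daily_cases"] = df["daily_cases"].fillna(df["daily_cases"].median())

# Standardize (mu = 0, sigma = 1), then normalize to the range [0, 1]
df["cases_std"] = StandardScaler().fit_transform(df[["daily_cases"]]).ravel()
df["cases_norm"] = MinMaxScaler().fit_transform(df[["daily_cases"]]).ravel()

# De-duplication: remove duplicate entries
df = df.drop_duplicates()

# Verify: drop outliers lying more than 3 standard deviations from the mean
mean, std = df["daily_cases"].mean(), df["daily_cases"].std()
df = df[(df["daily_cases"] - mean).abs() <= 3 * std]

# Export the cleaned data to a new file
df.to_csv("cases_clean.csv", index=False)
```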


Fig. 3 Data cleaning cycle

2.3 Recurrent Neural Network Model

The artificial neural network was introduced to simulate the human brain for deep learning tasks by treating the computational units of a training model in a way similar to the human nervous system. An RNN is a neural network with internal memory. As the name suggests, it is recurrent in nature, as it performs the same function for each input, and the output of the current state depends on the previous state. The recurrent neural network model comprises an input layer, n hidden layers, and an output layer. Each neuron is connected through a synaptic weight, the strength of which changes with external inputs. The architecture of an RNN is shown in Fig. 4. The RNN computes the input function by passing the calculated value from the input neuron to the output neuron(s) through the hidden-layer neurons, using the synaptic weights as intermediate parameters. The weighted sum is calculated at each node and then passed through a function known as the activation function. Learning occurs by changing the synaptic weights connecting the neurons using the backpropagation algorithm. The backpropagation algorithm is completed in two steps: a forward phase over the variables (synaptic weights) and a backward phase of dynamic-programming-like recurrence, in which the derivative of the cost/loss function is obtained and the change in weights at each node is calculated. Following Charu [24], we describe the forward and backward phases as follows. In the forward phase, a random input is used to obtain the output values of every hidden-layer neuron based on the present values of the weights and the nature of the activation function. It is named the forward phase because all computations inherently occur in the forward direction across the layers. This phase aims to obtain the outputs of all hidden-layer neurons and the output neuron for a given input. Finally, when the calculation is completed, the error at each output node is calculated, and the cost function is computed:

$$\text{Error}(e) = y_{\text{actual}} - y_{\text{predicted}} \tag{1}$$

$$\text{Cost function}(L) = \frac{1}{N}\sum(\text{Error})^{2} \tag{2}$$

Fig. 4 Recurrent neural network

In the backward phase, we calculate the derivative of the cost function with respect to the various synaptic weights. First, we compute the derivative of the cost function with respect to the output, which initializes the gradient computation. Subsequently, the derivative is fed in the backward direction using the multivariable chain rule of partial differentiation. The weights are adjusted to minimize the cost function, and for this, the stochastic gradient descent technique is used. The LSTM RNN is covered further in [25].
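To make the preceding description concrete, the sketch below assembles a small LSTM-based RNN in Keras, the library named in Sect. 4. It is a minimal sketch only: the 7-day window, layer sizes, and training settings are illustrative assumptions, not the authors' reported architecture.

```python
# Minimal sketch of an LSTM-based RNN for one-step-ahead case forecasting.
# The 7-day input window and layer sizes are illustrative assumptions.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def make_windows(series, window=7):
    """Turn a 1-D series into (samples, window, 1) inputs and next-day targets."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X[..., np.newaxis], y

series = np.random.rand(300)       # stand-in for the normalized daily-case series
X, y = make_windows(series)

model = Sequential([
    LSTM(32, input_shape=(7, 1)),  # recurrent layer with internal memory
    Dense(1),                      # one output neuron: the next day's cases
])
# Gradient-descent-style minimization of the squared-error cost of Eq. (2)
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, batch_size=16, verbose=0)
```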

3 Performance Evaluation

Different RNN models are used in this research work. The results of these models are evaluated in two phases, named the training phase and the validation phase. Though the coefficient of determination (r²-score) is a valid statistic for assessing a model's prediction accuracy, a high r²-score does not necessarily lead to accurate predictions. This is because the model could systematically and significantly over- and/or under-predict the data at different points along the regression line. An overfitted model could also lead to poor predictions. Therefore, it is essential to evaluate the models with other performance statistics, preferably based on an independent set of observations, to provide additional information on the models' prediction accuracy. Various parameters, including the coefficient of determination (r²-score), mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE), are used for performance evaluation and are calculated as follows:

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_{\text{actual}} - y_{\text{predicted}}\right| \tag{3}$$

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{\text{actual}} - y_{\text{predicted}}\right)^{2}} \tag{4}$$

$$r^{2}\text{-score} = 1 - \frac{\sum_{i=1}^{n}\left(y_{\text{actual}} - y_{\text{predicted}}\right)^{2}}{\sum_{i=1}^{n}\left(y_{\text{actual}} - y_{\text{mean}}\right)^{2}} \tag{5}$$

Here, y_predicted is the predicted output, y_actual is the actual output, y_mean is the mean of the actual output, and n is the total number of data points. Equations 3–5 are used by [26] for evaluating the quality of a model. In the ideal case, the predictions are almost equal to the actual outputs, MAE and RMSE are zero, and the r²-score is exactly 1.
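For reference, these statistics can be computed with sk-learn and NumPy as sketched below; the arrays shown are placeholders for the actual and predicted case counts.

```python
# Sketch: computing MAE, RMSE, MAPE, and the r2-score (Eqs. 3-5).
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_actual = np.array([3100.0, 4200.0, 5000.0, 6100.0])      # placeholder values
y_predicted = np.array([2900.0, 4400.0, 4800.0, 6400.0])

mae = mean_absolute_error(y_actual, y_predicted)
rmse = np.sqrt(mean_squared_error(y_actual, y_predicted))
mape = np.mean(np.abs((y_actual - y_predicted) / y_actual)) * 100
r2 = r2_score(y_actual, y_predicted)

print(f"MAE={mae:.0f}  RMSE={rmse:.0f}  MAPE={mape:.2f}%  r2={r2:.3f}")
```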

4 Software Information

In this research work, Spyder (a Python scripting IDE) [27] is used. Data handling is done using Pandas and the NumPy library. Different machine learning models are generated using the sk-learn library. The Matplotlib library is used for visualization of the data and results. Keras is used for creating the recurrent neural network (RNN) model.

5 Results and Discussion

In this research, the recurrent neural network technique was used for forecasting the novel corona cases in India. The dataset used for training and forecasting was taken from the Kaggle data library [23], which provides real-time data of corona cases all over the world. The dataset obtained is divided into two phases, i.e., the first phase (April 1, 2020 to December 31, 2020) and the second phase (January 1, 2021 to June 12, 2021). Each phase is further divided into a training phase and a forecasting phase. Figure 5 shows the regression line for the training phase of the first phase. The MAE of the training phase is 3135, and the r²-score is 0.98. Figure 6 shows the daywise actual cases and the training results of the first phase. The results show that the RNN model is sufficiently trained. Figure 7 shows the regression line for the forecasting phase of the first phase. The MAE of the forecasting phase is 5833, and the r²-score is 0.89.


Fig. 5 RNN model first phase training results (predicted vs. actual cases; regression line y = 1.0817x − 460.63, R² = 0.9828)

Fig. 6 RNN model first phase actual versus training results (daywise actual and predicted cases, April–September 2020)

Fig. 7 RNN model first phase forecasting results (predicted vs. actual cases; regression line y = 1.0415x + 2950.4, R² = 0.8907)

Figure 8 shows the daywise actual cases and the forecasting results of the first phase. The results show that the RNN model is satisfactory for forecasting the novel corona cases. Table 1 presents the different statistical parameters evaluated from the results of the first phase.


Fig. 8 RNN model first phase actual versus forecasting results (daywise actual and predicted cases)

Table 1 Statistical parameters for the first phase

S. No.  Statistical parameter  Training phase  Forecasting phase
1       MAE                    3135            5833
2       RMSE                   5811            7742
3       MAPE                   9.21%           13.28%
4       r²-score               0.98            0.89

Figure 9 shows the regression line for the training phase of the second phase. The results show that the RNN model is sufficiently trained. Figure 10 shows the daywise actual cases and the training results of the second phase. Figure 11 shows the regression line for the forecasting phase of the second phase. The results show that the RNN model is satisfactory for forecasting the novel corona cases.

Fig. 9 RNN model second phase training results (predicted vs. actual cases; regression line y = 1.021x − 34.397, R² = 0.994)

Fig. 10 RNN model second phase actual versus training results (daywise actual and predicted cases, January–April 2021)

Fig. 11 RNN model second phase forecasting results (predicted vs. actual cases; regression line y = 0.9207x + 47975, R² = 0.9476)

Fig. 12 RNN model second phase actual versus forecasting results (daywise actual and predicted cases, May 1 to June 12, 2021)

Figure 12 shows the daywise actual cases and the forecasting results of the second phase. The results show that the RNN model is satisfactory for forecasting the novel corona cases in the second phase. Table 2 presents the different statistical parameters evaluated from the results of the second phase.


Table 2 Statistical parameters for the second phase

S. No.  Statistical parameter  Training phase  Forecasting phase
1       MAE                    4684            32,797
2       RMSE                   8464            38,158
3       MAPE                   6.61%           13.30%
4       r²-score               0.99            0.94

6 Conclusion

In this research work, the recurrent neural network technique of artificial intelligence is used for forecasting the novel corona cases in India. Based on the study, the following conclusions are drawn:

1. The RNN was used to predict the daily corona cases in India, which can be further utilized by the government for estimating the lockdown period.
2. The predictive model is efficient and can be utilized for planning vaccination throughout the country.
3. Based on the results from the predictive model, the supply chain management of the vaccine can be operated efficiently, as the demand is already known to the pharmaceutical companies.
4. The government can use the data from the RNN model as a basis for estimating the funds to be released during the pandemic period.

References

1. Wang L, Li J, Guo S, Xie N, Yao L, Cao Y, et al (2020) Real-time estimation and prediction of mortality caused by COVID-19 with patient information based algorithm. Sci Total Environ 138394
2. Shinde GR, Kalamkar AB, Mahalle PN, Dey N, Chaki J, Hassanien AE (2020) Forecasting models for coronavirus disease (COVID-19): a survey of the state-of-the-art. SN Comput Sci 1(4):1–5
3. Kırbaş İ, Sözen A, Tuncer AD, Kazancıoğlu FŞ (2020) Comparative analysis and forecasting of COVID-19 cases in various European countries with ARIMA, NARNN and LSTM approaches. Chaos Solitons Fractals 1(138):110015
4. Saba AI, Elsheikh AH (2020) Forecasting the prevalence of COVID-19 outbreak in Egypt using nonlinear autoregressive artificial neural networks. Process Saf Environ Prot 1(141):1–8
5. Tomar A, Gupta N (2020) Prediction for the spread of COVID-19 in India and effectiveness of preventive measures. Sci Total Environ 1(728):138762
6. Liu J, Zhou J, Yao J, Zhang X, Li L, Xu X, He X, Wang B, Fu S, Niu T, Yan J (2020) Impact of meteorological factors on the COVID-19 transmission: a multi-city study in China. Sci Total Environ 15(726):138513
7. Zandavi SM, Rashidi TH, Vafaee F (2020) Forecasting the spread of Covid-19 under control scenarios using LSTM and dynamic behavioral models. arXiv preprint arXiv:2005.12270
8. Barman A (2020) Time series analysis and forecasting of covid-19 cases using LSTM and ARIMA models. arXiv preprint arXiv:2006.13852
9. Yan B, Tang X, Liu B, Wang J, Zhou Y, Zheng G, Zou Q, Lu Y, Tu W (2020) An improved method for the fitting and prediction of the number of covid-19 confirmed cases based on lstm. arXiv preprint arXiv:2005.03446
10. Villegas M, Gonzalez-Agirre A, Gutiérrez-Fandiño A, Armengol-Estapé J, Carrino CP, Fernández DP, Soares F, Serrano P, Pedrera M, García N, Valencia A (2021) Predicting the evolution of COVID-19 mortality risk: a recurrent neural network approach. medRxiv 2020–12
11. Chandra R, Jain A, Chauhan DS (2021) Deep learning via LSTM models for COVID-19 infection forecasting in India. arXiv preprint arXiv:2101.11881
12. Chimmula VK, Zhang L (2020) Time series forecasting of COVID-19 transmission in Canada using LSTM networks. Chaos Solitons Fractals 1(135):109864
13. Elsheikh AH, Saba AI, Abd Elaziz M, Lu S, Shanmugan S, Muthuramalingam T, Kumar R, Mosleh AO, Essa FA, Shehabeldeen TA (2021) Deep learning-based forecasting model for COVID-19 outbreak in Saudi Arabia. Process Saf Environ Prot 1(149):223–233
14. Pathan RK, Biswas M, Khandaker MU (2020) Time series prediction of COVID-19 by mutation rate analysis using recurrent neural network-based LSTM model. Chaos Solitons Fractals 1(138):110018
15. Shahid F, Zameer A, Muneeb M (2020) Predictions for COVID-19 with deep learning models of LSTM, GRU and Bi-LSTM. Chaos Solitons Fractals 1(140):110212
16. Shastri S, Singh K, Kumar S, Kour P, Mansotra V (2020) Time series forecasting of Covid-19 using deep learning models: India-USA comparative case study. Chaos Solitons Fractals 1(140):110227
17. Wang P, Zheng X, Ai G, Liu D, Zhu B (2020) Time series prediction for the epidemic trends of COVID-19 using the improved LSTM deep learning method: case studies in Russia, Peru and Iran. Chaos Solitons Fractals 1(140):110214
18. Zeroual A, Harrou F, Dairi A, Sun Y (2020) Deep learning methods for forecasting COVID-19 time-series data: a comparative study. Chaos Solitons Fractals 1(140):110121
19. Arun Kumar KE, Kalaga DV, Kumar CM, Kawaji M, Brenza TM (2021) Forecasting of COVID-19 using deep layer recurrent neural networks (RNNs) with gated recurrent units (GRUs) and long short-term memory (LSTM) cells. Chaos Solitons Fractals 1(146):110861
20. Dairi A, Harrou F, Zeroual A, Hittawe MM, Sun Y (2021) Comparative study of machine learning methods for COVID-19 transmission forecasting. J Biomed Inform 1(118):103791
21. Kumar S, Sharma R, Tsunoda T, Kumarevel T, Sharma A (2021) Forecasting the spread of COVID-19 using LSTM network. BMC Bioinform 22(6):1–9
22. Rauf HT, Lali MI, Khan MA, Kadry S, Alolaiyan H, Razaq A, Irfan R (2021) Time series forecasting of COVID-19 transmission in Asia Pacific countries using deep neural networks. Pers Ubiquit Comput 10:1–8
23. https://www.kaggle.com/sudalairajkumar/covid19-in-india
24. Charu CA (2018) Neural networks and deep learning: a textbook. Springer
25. Zaremba W, Sutskever I, Vinyals O (2014) Recurrent neural network regularization. arXiv preprint arXiv:1409.2329
26. Tso GK, Yau KK (2007) Predicting electricity energy consumption: a comparison of regression analysis, decision tree and neural networks. Energy 32(9):1761–1768
27. Raybaut P (2009) Spyder—documentation. Available online at: pythonhosted.org

Algorithms for Syllogistic Using RMMR

Sumanta Sarathi Sharma and Varun Kumar Paliwal

Abstract Syllogistic reasoning uses diagrams for decision-making, among other methods, as they are effective in visual cognition. Similarly, decision-making and problem-solving use algorithms extensively, as they depict the rules to be followed and show us the line of reasoning. This paper develops algorithms to define a moving environment, which helps us efficaciously understand the state of affairs and the line of reasoning in syllogistic, using the retooled method of minimal representation (RMMR), an alternate diagrammatic technique that tests the validity of syllogisms using diagrams and algorithms.

Keywords Algorithms · RMMR · Syllogistic reasoning · Visual cognition

S. S. Sharma (B)
Shri Mata Vaishno Devi University, Katra, Jammu and Kashmir, India
e-mail: [email protected]
V. K. Paliwal
Birla Institute of Technology and Science Pilani, Pilani, Rajasthan, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_64

1 Background and Historical Preliminaries

Diagrammatic methods and related tools (such as Euler circles [9] and Venn-Peirce [19, 28] diagrams) are well-established in syllogistic. There has been a flood of works about logic diagrams and diagrams for syllogistic in the last decade [3, 5, 13–15, 18, 21–23]. We know that 'diagrams' have played a key role in assisting our pursuits of reasoning. They not only helped us 'to understand', but also helped us to explain 'what we understand' to others. This fact is evident right from the times of Aristotle [10, 16]. A study found that certain diagrams can help individuals reason more rapidly and more accurately [1]. Also, diagrams automatically support a large number of perceptual inferences, which are extremely easy for humans [12]. Furthermore, suitably formulated diagrams provide effective and concrete ways of solving syllogistic tasks [20]. Consequently, diagrams became an integral part of the


human reasoning process and are now inseparable from syllogistic reasoning. In a similar vein, algorithms and their usage have a long history in decision-making and thought processes [6]. Some studies also suggest using algorithms for syllogistic [7, 11, 17]. This article presents the retooled method of minimal representation (RMMR), which uses a still and a moving environment for syllogistic problem-solving, and proposes algorithms for further clarification. These revisions and improvements aim to show the line of reasoning and visually cognize all imaginable situations between the classes depicted by the subject and predicate terms of categorical propositions. Finally, while dealing with a syllogism, we use algorithms apart from the still and moving interfaces. The paper has three sections. In the first section, we explain the fundamentals of RMMR, including singly and multiply represented diagrams for propositions, and discuss its still and moving interfaces. Then, we illustrate the functioning of RMMR by taking a few cases in the second section. In the final section, we develop algorithms for RMMR.

2 Retooled Method of Minimal Representation

The method of minimal representation (MMR) (earlier also named the 'method of least representation') [25] is a diagrammatic method to examine the validity or invalidity of syllogisms. MMR tests the validity of syllogisms in both Aristotelian and modern logic formats, which is its distinctive feature [24]. Moreover, MMR plays a vital role in differentiating perfect from imperfect syllogisms [26]. In what follows, we explain MMR's reformulation using still and moving interfaces, along with algorithms at the end.

2.1 Preface of RMMR

In RMMR, a 'rectangle' with 'U' engraved inside represents the diagram site. We know that a syllogism has three terms, namely S (minor), P (major), and M (middle). For these, we use three colors, i.e., red, green, and blue for S, P, and M, respectively. In addition, a 'four arrowed pointer' (FAP), which shows the movement of a set denoted by S, P, or M, is also used, as shown in Fig. 1. The FAP is inspired by the 'Game of Logic', which has pointers that move as per prescribed rules in the biliteral and triliteral diagrams to validate or invalidate the claimed conclusion [4]. In the same vein, the significance of the FAP in RMMR is to show and restrict the movement of the set denoted by the corresponding color in a designated area. The arrows of the pointer indicate that the FAP can move in any direction. In RMMR, we show a set (say S, P, or M) using a square. This set can be drawn anywhere inside the diagram site U, as shown in Fig. 2.


Fig. 1 Diagram site and FAP

Fig. 2 Singly represented RMMR for any set

Fig. 3 Multiply represented RMMR for any set

The above diagram is a singly represented static diagram for the various kinds of movement associated with the set S. A multiply represented static diagram of set S is depicted in Fig. 3. Both of the above figures depict the possibilities of set S in the diagram site U. The former (Fig. 2) is a singly represented static diagram, and thus S is shown at one place. The latter (Fig. 3) is a multiply represented static diagram and thus contains multiple representations of set S in the diagram site. Next, we draw propositions in RMMR.


2.2 Propositions in RMMR

Aristotle's assertoric propositions were later rechristened (and came to be known) as categorical propositions by Boethius [2]. There are four categorical propositions, namely universal affirmative (represented by A), universal negative (represented by E), particular affirmative (represented by I), and particular negative (represented by O), whose depictions are shown next.

Universal Affirmative Proposition (A) 'All S is P' is represented as follows: A smaller red square S is drawn inside a larger green square P. Moreover, a red FAP S is placed inside the green square of P to show that the red square S can be present or move inside the green square only; it also tells us that the red square S cannot be present outside the green square P. A green FAP P is placed outside the green square to show its movement and presence anywhere in the diagram site U, as in Fig. 4. This diagram can have multiple iterations, as S and P can be shown with different sizes and can also acquire several positions in U. However, all the iterations must follow the rule prescribed by the FAP. It may be noted that we place the FAP inside or outside a corresponding set rather than on the set itself. For example, a red FAP is placed corresponding to a green square and a green FAP corresponding to a red square. Since there are two sets in a proposition, one FAP each for red and green suffices. However, when we have to unify two propositions in a single diagram, we may require more than one FAP for a square (or a right-angled triangle).

Universal Negative Proposition (E) 'No S is P' is represented as follows: Two non-overlapping squares, i.e., red S and green P, are drawn as shown in Fig. 5. The green and red FAPs are also placed outside each other and inside U to show their respective positions. Thus, the squares S and P can move away from each other but can neither intersect nor contain each other. This diagram too can have multiple iterations, as S and P can be shown with different sizes and can acquire several positions in U as per the rules of the FAP. In other words, squares S and P can never overlap or intersect with each other.

Particular Affirmative Proposition (I) To depict 'some S is P', we draw a red right-angled triangle S inside the green square P, as shown in Fig. 6. The rationale for choosing a right-angled triangle is to depict it as a potential square. The red and green FAPs show that the red triangle S can only be inside P, whereas P can be anywhere in the diagram site U.

Particular Negative Proposition (O) To diagram 'some S is not P', we draw a red right-angled triangle outside the green square P. The red and green FAPs are placed outside the green square and the red right-angled triangle, respectively, to show that they cannot overlap in U, as shown in Fig. 7.

All propositions in RMMR have multiple representations. This is akin to the concept of 'counterpart equivalence' [8]: when two diagrams (D and D′) represent


Fig. 4 RMMR for universal affirmative proposition

Fig. 5 RMMR for universal negative proposition

Fig. 6 RMMR for particular affirmative proposition

Fig. 7 RMMR for particular negative proposition


Fig. 8 Multiply represented RMMR for universal negative proposition

the same set of information and state of affairs, except that they may have different sizes or orientations in the diagram site, they are called counterpart equivalent. All multiply represented static diagrams (MRSD) of a proposition depict counterpart equivalences of their singly represented static diagram (SRSD). In other words, an MRSD combines several diagrams in one to express the equivalent state of affairs of its corresponding SRSD. The above expression demands some clarification for an inquisitive reader with respect to singly represented static diagrams, multiply represented static diagrams, and counterpart equivalences. Let us understand it with the help of an example. Consider a universal negative proposition E as shown in Fig. 8. The iterations shown there are some of the depictions of a universal negative proposition in RMMR. Each of these iterations is a counterpart equivalent of the standard depiction of a universal negative proposition. The iterations also show that the squares can be horizontally or vertically placed, have different alignments or sizes, and need not be strictly square. These iterations have to ensure that S and P are non-overlapping, non-intersecting, closed four-sided figures. On the other hand, a multiply represented static diagram of a universal negative proposition is a combined diagram of these possibilities. In a way, it is like several screenshots (Fig. 9) of a moving environment plotted in the diagram site U. A multiply represented static diagram sketches some of the possibilities of finding a set in the diagram site. Roughly, it traces the movement of a set inside the diagram site. In the above case, sets S and P are shown to be found anywhere in U. However, they cannot intersect or overlap, as depicted by the FAPs. A multiply represented static diagram provides essential information to a ratiocinator while validating a syllogism. Let us see the working of RMMR in testing syllogisms in the next section.


Fig. 9 Screenshot of RMMR for universal negative proposition

2.3 Examining Syllogisms in RMMR

In this section, we will test the validity of syllogisms using RMMR. Before testing syllogisms, we explicate some general rules to draw the premises and read off the conclusion in RMMR.

Rules to draw premises in RMMR—A standard set of rules, patterned on [27], to draw premises and validate a syllogism is as under:
1. Draw a single diagram to represent the facts that the two premises of a syllogism convey. (Let us call this diagram DP.)
2. Draw a diagram to represent the fact that the conclusion of the syllogism conveys. (Let us call this diagram DC.)
3. Check if we can read off diagram DC from diagram DP.
4. If we can read off diagram DC from diagram DP, then the given syllogism is valid; otherwise, it is invalid.

It is plain from the above that reading off DC from DP holds the key in any diagrammatic technique for argumentation. DP always contains more information than DC, since DP combines two premises, whereas DC depicts a single proposition. To read off DC from DP, we discount the surplus information of DP and concentrate on the delineation of DC. Let us state how to read off DC from DP.

Rules to read off the conclusion in RMMR—In categorical syllogisms, we will have two propositions to be drawn on the diagram site as premises. It means that a combined diagram of two propositions has to be drawn. Let this process of combining the two diagrams be guided by the rules of unification. After successful unification, the next step would be to read off DC from DP. This will require us to subtract or delete excess information from DP. Let us call these the rules of omission.


Fig. 10 AA in first figure

Rule of Unification: Let D1 and D2 represent the major premise and minor premise, respectively, of a syllogism. Then DP is called the unification of D1 and D2, provided that DP is the counterpart equivalence of D1 and D2 taken together. In other words, DP is the sum total of all the information present in D1 and D2.

Rule of Omission: The diagram site contains S (red), P (green), and M (blue) parts. Erase the M (blue) part, either a rectangle or a right-angled triangle, from the diagram site U.

We explain the functioning of RMMR with some examples in the next section.

3 Functioning of the RMMR

In this section, we understand the examination criteria of RMMR with static and dynamic diagrams. We will also see how dynamic diagrams play a crucial role in testing syllogisms, which are otherwise hard to read off from a static diagram. Let us begin with a simple example. Consider AA in the first figure, 'all M is P and all S is M'; their corresponding D1 and D2 are shown in Fig. 10. D1 and D2 can be unified by placing the red square S inside the blue square M of D1. The DP so formed will have the FAPs in their respective positions, i.e., red inside blue, blue inside green, and the green FAP inside U. The unified diagram is shown in Fig. 11. Next, we apply the rule of omission to the above diagram by erasing the blue part, representing M. The diagram so formed is a counterpart equivalent of DC, where we find a red square inside a green square (for 'all S is P'), as in Fig. 12.


Fig. 11 Singly represented DP of AA in first figure

Fig. 12 Singly represented DC of AAA in first figure

Fig. 13 AI in second figure

From the above diagram, we easily conclude 'all S is P' from 'all M is P' and 'all S is M'. Now, let us consider a case where it is difficult to read off the conclusion after drawing the premises. For instance, consider AI in the second figure, where the premises are 'all P is M and some S is M'. Their corresponding D1 and D2 are shown in Fig. 13.


Fig. 14 Singly represented DP of AI in second figure

D1 and D2 can be combined as in Fig. 14. We apply the rule of omission and erase the blue rectangle. The resulting diagram is shown in Fig. 15. An interpreter of the above diagram is tempted to conclude 'some S is not P'. However, the red and green FAPs show that the green rectangle P and the red right-angled triangle S can intersect and contain each other. Some possible states of affairs of their intersection and containment are given in Fig. 16. From the above diagram, it is clear that we cannot conclude that 'some S is not P'. We may reiterate that FAPs play an essential role in exhibiting the intersection or containment of the classes represented by the S, P, and M terms. As a matter of fact, we cannot conclude anything whatsoever from the above diagram in terms of categorical propositions, and thus the premise pair of AI in the second figure does not yield any valid conclusion. The authors admit that the above explanation and the expressed ideas (say, singly represented and multiply represented diagrams) may seem sketchy and cumbersome (at least at the first go), as it is not clear to the ratiocinator how to interpret or read off the conclusion from DP or DC. This is an unavoidable situation initially, and to address it, we have devised a dynamic diagrammatic interpretation of the static diagrams. Therefore, let us reassert the static diagrams (both singly and multiply represented) and introduce the dynamic diagrams to an interpreter.

Static Diagrams—There are two types of static diagrams, viz. singly and multiply represented. Singly represented static diagrams (SRSD) are those diagrams which represent a single (possible) state of affairs of a categorical proposition. Multiply represented static diagrams (MRSD) represent more than one (possible) state of affairs of a categorical proposition. However, it may be noted that no MRSD can represent all possible states of affairs, as these are infinite due to the size, shape, and orientation of the rectangles, right-angled triangles, and the diagram site.


Fig. 15 Singly represented DC of AI in second figure

Fig. 16 Multiply represented DC of AI in second figure

Dynamic Diagrams—Let the diagram site U be a moving environment. The squares and right-angled triangles can move inside the diagram site U as per the rules prescribed by the FAPs. This generates dynamic diagrams of categorical propositions. In other words, a dynamic diagram is the sum total of all possible relations expressed by the subject and predicate classes of categorical propositions. Another way of comprehending it is to consider an MRSD as a finite number of screenshots (or, say, graphical plottings) of the corresponding dynamic diagram. It is impossible to plot a dynamic diagram on a sheet of paper. However, an MRSD can always help us visualize the states of affairs between the subject and predicate classes. For example, an MRSD of EE in the fourth figure, i.e., 'no P is M' and 'no M is S', is represented as shown in Fig. 17. It can be seen from this set of diagrams that S and P can intersect, contain each other, as well as stay disjoint. Therefore, no conclusion follows from EE as premises in the fourth figure. The dynamic diagram of EE as premises is found here: RMMR for EE.


Fig. 17 Multiply represented DP of EE in fourth figure

It is now clear (after clicking on the link) that there are an infinite number of ways in which an MRSD can be depicted, and that a dynamic diagram cannot be represented on a sheet of paper. SRSD, MRSD, and dynamic diagrams may be intricate to draw for beginner- and intermediate-level reasoners. Thus, we introduce algorithms to help the user reason well.

4 Algorithms for RMMR

To help users draw the dynamic diagrams with ease, we state the following rules for RMMR in the form of algorithms:

Þ1 represents proposition 1 (Major Premise)
Þ2 represents proposition 2 (Minor Premise)
□ represents a square
◺ represents a right-angled triangle
∈ represents 'always resides inside'
∉ represents 'always resides outside'
⊃ represents 'always contains'

The first step is to create the diagram site U. In the second step, we obtain the premises. After this, we check which of these premises is a universal proposition. If the first premise is universal, we draw it in the diagram site U; otherwise, we draw the second premise (if it is a universal proposition). If both are universal, we draw either of them. However, if both are particular propositions, we can again draw either proposition first. The idea is to give preference to the universal proposition. The penultimate step concerns the interpretation of the premises, and finally, a conclusion is reached.


Algorithm 1: General RMMR
Data: Get proposition 1 (Major Premise), proposition 2 (Minor Premise), and proposition 3 (Conclusion).
Result: Decide whether the syllogism is "Valid" or "Invalid".
RMMR()
    Create a diagram site U
    Obtain premise 1 and premise 2
    if (Þ2 == "A" || Þ2 == "E") && (Þ1 == "I" || Þ1 == "O") then
        Diagram_Interpretation(Þ2)
        Another_Diagram_Interpretation(Þ1)
    else
        Diagram_Interpretation(Þ1)
        Another_Diagram_Interpretation(Þ2)
    Validity_of_Conclusion()

Algorithm 2: Diagram_Interpretation
Data: Get the proposition which is universal affirmative or negative; else take proposition 1.
Result: Dynamic diagram for the proposition.
Diagram_Interpretation(premise_type)
    Create □ ∈ U
    if premise_type == "A" then
        Create another □ ∈ initial □
    else if premise_type == "E" then
        Create another □ ∉ initial □
    else if premise_type == "I" then
        Create ◺ ∈ initial □
    else if premise_type == "O" then
        Create ◺ ∉ initial □ and ∈ U

In what follows, we first write the algorithm for drawing the second premise after the first premise. Some cases may require creating a square from a right-angled triangle in case of particular propositions. Next, we write the algorithm for validating the conclusion from the premises.


Algorithm 3: Another_Diagram_Interpretation
Data: Get the other proposition.
Result: Dynamic diagram for the other proposition.
Another_Diagram_Interpretation(premise_type)
    if premise_type == "A" then
        if subject_term == "M" then
            Create another □ ⊃ blue □
        else
            Create another □ ∈ blue □
    else if premise_type == "E" then
        Create another □ ∉ blue □ and ∈ U
    else if premise_type == "I" then
        if subject_term == "M" then
            Create □ ⊃ ◺ portion of blue □
        else
            if blue □ does not exist then
                Create blue □ using blue ◺ portion
            Create ◺ ∈ blue □
    else if premise_type == "O" then
        if subject_term == "M" then
            Create □ and ◺ portion of blue □ ∉ □
        else
            if blue □ does not exist then
                Create blue □ using blue ◺ portion
            Create ◺ ∉ blue □ and ∈ U

Algorithm 4: Validity_of_Conclusion
Data: Get the diagrams of propositions 1 and 2.
Result: Decide the validity of the conclusion.
Validity_of_Conclusion()
    if "S" ∈ "P" then
        Universal Affirmative Proposition
    else if "S" ∉ "P" then
        Universal Negative Proposition
    else if ◺ portion of "S" ∈ "P" then
        Particular Affirmative Proposition
    else if ◺ portion of "S" ∉ "P" then
        Particular Negative Proposition
    else
        Invalid Syllogism
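For readers who prefer runnable code, the control flow of Algorithm 1 can be expressed as a short Python skeleton. The diagram-drawing routines are stubbed out, since the geometric moving environment itself is not reproduced here; the function and variable names are ours, not part of RMMR.

```python
# Python skeleton of Algorithm 1 (General RMMR). Only the premise-ordering
# logic (draw a universal premise first) is shown; drawing steps are stubs.

def diagram_interpretation(premise):
    """Stub for Algorithm 2: draw the universal (or first) premise in U."""
    print(f"draw first: {premise}")

def another_diagram_interpretation(premise):
    """Stub for Algorithm 3: draw the other premise relative to the blue M figure."""
    print(f"draw second: {premise}")

def validity_of_conclusion():
    """Stub for Algorithm 4: read off the conclusion and return a verdict."""
    return "Valid or Invalid"

def rmmr(p1, p2):
    """p1, p2 are the mood letters (A, E, I, O) of the major and minor premises."""
    # Prefer drawing a universal premise (A or E) before a particular one (I or O)
    if p2 in ("A", "E") and p1 in ("I", "O"):
        diagram_interpretation(p2)
        another_diagram_interpretation(p1)
    else:
        diagram_interpretation(p1)
        another_diagram_interpretation(p2)
    return validity_of_conclusion()

print(rmmr("I", "A"))  # minor premise is universal, so it is drawn first
```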


5 Summary and Conclusion

The usage of diagrams for syllogistic is age-old. Lately, there has been a renewed interest among scholars in using diagrammatic representation in logic. This paper reintroduces the MMR with static and dynamic diagrams. This retooling is called the RMMR, which uses static and dynamic interfaces for syllogistic thinking. It enables a user to perceive the line of reasoning and the states of affairs. Moreover, we also develop algorithms for a more user-friendly experience. Using diagrammatic techniques in syllogistic logic helps us reason well, since our faculty of cognitive reasoning is visually inclined. Euler circles, the Venn-Peirce framework, and Carroll diagrams in syllogistic reasoning have gained currency in the last century. It is also true that rules of formation and transformation guide diagrammatic reasoning, but it is sometimes difficult to comprehend these rules. An expert user or logician has ways to tackle the nitty-gritty of formations and transformations, but things seem moderately challenging for users of intermediate or beginner level. At this point, we need dynamic interfaces and algorithms to help us through. In this paper, we address such and related problems for syllogistic reasoning with the help of algorithms. We also plan to extend RMMR to non-syllogistic reasoning (especially with three to four premises) as an avenue for future work.

References

1. Bauer MI, Johnson-Laird P (1993) How diagrams can improve reasoning. Psychol Sci 4(6):372–378
2. Bobzien S (2020) Ancient logic. In: Zalta EN (ed) Stanford encyclopedia of philosophy. https://plato.stanford.edu/archives/sum2020/entries/logic-ancient/
3. Bosley R (2013) The geometry of diagrams and the logic of syllogisms. In: Moktefi A, Shin S-J (eds) Visual reasoning with diagrams. Springer, Basel, pp 19–31
4. Carroll L (1886) The game of logic, 1st edn. Macmillan and Co., London
5. Castro-Manzano JM (2018) Syllogistic with jigsaw puzzle diagrams. In: Chapman P, Stapleton G, Moktefi A, Perez-Kriz S, Bellucci F (eds) Diagrammatic representation and inference. Diagrams 2018. Lecture notes in computer science, vol 10871. Springer, Cham, pp 657–671. https://doi.org/10.1007/978-3-319-91376-6_58
6. Chabert J-L, Weeks C, Barbin E, Borowczyk J, Guillemot M, Michel-Pajus A, Djebbar A, Martzloff J-C (1999) A history of algorithms: from the pebble to the microchip, 1st edn. Springer, Berlin
7. Gursoy KN, Senturk I, Oner T, Gursoy A (2020) A new algorithmic decision for categorical syllogisms via Carroll's diagrams. Soft Comput 2(4):11337–11346
8. Hammer E, Shin S-J (2007) Euler's visual logic. Hist Philos Logic 19(1):1–29
9. Hunter H (1802) Letters of Euler on different subjects in physics and philosophy addressed to a German princess, 2nd edn. Murray and Highley, London
10. Kneale W, Kneale M (1962) The development of logic, 1st edn. Clarendon Press, Oxford
11. Kumova Bİ, Çakır H (2010) Algorithmic decision of syllogisms. In: García-Pedrajas N, Herrera F, Fyfe C, Benítez JM, Ali M (eds) Trends in applied intelligent systems. IEA/AIE 2010. Lecture notes in computer science, vol 6097. Springer, Berlin, pp 28–38. https://doi.org/10.1007/978-3-642-13025-0_4
12. Larkin JH, Simon HA (1987) Why a diagram is (sometimes) worth ten thousand words. Cogn Sci 11(1):65–100
13. Lemanski J (2017) Logic diagrams in the Weigel and Weise circles. Hist Philos Logic 39(1):1–26
14. Lemanski J (2017) Oppositional geometry in the diagrammatic calculus CL. S Am J Logic 3(2):517–531
15. Lemanski J (2020) Euler-type diagrams and the quantification. J Philos Logic 49:401–416
16. Lemon O, De Rijke M, Shimojima A (1999) Editorial: efficacy of diagrammatic reasoning: visual logic, language, and information. J Logic Lang Inform 8(3):265–271
17. Morita K, Noritaka N, Emura E (1998) An efficient reasoning method for syllogism and its application to knowledge processing. Syst Comput Jpn 19(3):20–31
18. Pagnan R (2013) A diagrammatic calculus of syllogisms. In: Moktefi A, Shin S-J (eds) Visual reasoning with diagrams. Springer, Basel, pp 33–53
19. Peirce CS (1933) Collected papers, 1st edn. Harvard University Press, Cambridge, MA
20. Sato Y, Mineshima K (2015) How diagrams can support syllogistic reasoning: an experimental study. J Logic Lang Inform 24(4):409–455
21. Sautter FT, Secco GD (2018) A simple decision method for syllogistic. In: Chapman P, Stapleton G, Moktefi A, Perez-Kriz S, Bellucci F (eds) Diagrammatic representation and inference. Diagrams 2018. Lecture notes in computer science, vol 10871. Springer, Cham, pp 708–711. https://doi.org/10.1007/978-3-319-91376-6_64
22. Sautter FT (2017) Diagramas para Juízos Infinitos. Rev Port Filosofia 73(3–4):1115–1136
23. Sautter FT (2019) A bunch of diagrammatic methods for syllogistic. Log Univers 13:21–36
24. Sharma SS (2008) Method of minimal representation: an alternative diagrammatic technique to test the validity of syllogisms. In: Stapleton G, Howse J, Lee J (eds) 5th international conference, Diagrams 2008, LNAI, vol 5223. Springer, Heidelberg, pp 412–414. https://doi.org/10.1007/978-3-540-87730-1
25. Sharma SS (2012) Interpreting squares of oppositions with the help of diagrams. In: The square of opposition: a general framework for cognition. Peter Lang, Bern, pp 175–194
26. Sharma SS (2014) Perfect syllogisms and the method of minimal representation. In: CEUR workshop proceedings, vol 1132, pp 17–22
27. Shin S-J (1994) The logical status of diagrams, 1st edn. Cambridge University Press, New York
28. Venn J (1880) On the diagrammatic and mechanical representation of propositions and reasonings. Philos Mag 9(59):1–18

Machine Learning-Based Approach for Airfare Forecasting

L. Sherly Puspha Annabel, G. Ramanan, R. Prakash, and S. Sreenidhi

Abstract Online purchases have become the major method of purchasing airline tickets due to the rapid expansion and widespread application of web technologies. Customers find it difficult to purchase tickets at a reduced cost due to the lack of information regarding pricing issues. Existing prediction methods that rely on tomorrow's price projection may be inaccurate, resulting in missed opportunities to purchase tickets in the future. Hence, our proposed system deals with the problem of calculating flight prices that change dynamically due to many factors such as time, season, duration, special events, and climate change. To reduce middle-agent cost and difficulties on the customer's part, an easy-to-use prediction calculator that provides flight search and booking facilities in just one click is developed. The data analysis part of the proposed system is based on datasets collected from open-source platforms, and values are predicted using Random Forest Regression as the model, with a prediction accuracy of 79% for test data. The prediction model undergoes hyperparameter tuning and adaptive boosting to increase efficiency, gaining an increase to 81% with the former and 85% with the latter.

Keywords Airfare report · Random forest · Hyperparameter tuning · Adaptive boosting

L. Sherly Puspha Annabel (B) · G. Ramanan · R. Prakash · S. Sreenidhi
Department of Information Technology, St. Joseph's College of Engineering, Chennai 600119, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7_65

1 Introduction

Nowadays, airline corporations employ a variety of tactics and methods to dynamically assign airfare pricing. Various financial, commercial, and marketing factors are taken into consideration when designing these approaches. The cheapest accessible ticket fluctuates over time, and the price of a ticket can be high or low. Therefore, it is essential to help customers predict the right time to purchase airline tickets beforehand. Hence, various techniques are introduced that take different


factors such as duration, climate change, time, special events, and season into consideration for prediction, which helps the customer purchase air tickets in advance. Most of these approaches rely on advanced prediction models developed in the field of computational intelligence referred to as machine learning (ML). According to previous research, India's aeronautics industry is booming. In 2020, India was the world's third-largest avionics market, and by 2030, it will be the largest. The number of low-cost flight tickets is increasing all the time. The proposed system assists travelers in selecting and purchasing tickets as well as forecasting future air ticket prices. The attributes utilized for flight price prediction include duration, arrival time, price, source, and destination, along with many others. The proposed system makes the following contributions: it proposes an approach for automatically extracting the best feature set from data, which employs a hierarchy of accessible features to optimally compute a feature set, and it investigates the factors that affect ticket pricing. The main vulnerability of other machine learning algorithms (overfitting) is addressed in our system by using the random forest algorithm. Also, an ideal ticket fare system is created to advise customers on whether they should buy their tickets or wait. The results of our approach have shown that it can assist customers in purchasing tickets at the best possible price. The remaining part of the paper is laid out as follows: the literature review and proposed system are explained in Sects. 2 and 3, and the experimental results and conclusion are explained in Sects. 4 and 5.

2 Literature Survey

The importance of machine learning rests in the modeling tools it offers, which may be trained with a set of data describing a specific problem and then respond consistently to similar unknown data. The retail industry's expansion has been aided by the increasing availability of customer demand data, new technologies that assist in determining prices more efficiently by investigating consumer habits, and decision support aids resulting from new emerging technology. Gini and Groves [1] used partial least squares regression (PLSR) to construct a model for forecasting the ideal time to buy airline tickets. Data was collected from major travel booking websites from February 22 to June 23, 2011. Additional data was collected and utilized to compare the performance of the final model. Since 2017, more effective machine learning models have been considered for improving airfare price prediction [2]. Tziridis et al. examined the performance of eight machine learning models, including artificial neural networks, random forest, support vector machine, and logistic regression, in predicting ticket pricing. The most accurate regression model has an 88% accuracy rate. The best model in their comparison was found to be the bagging regression tree, which is robust and unaffected by varied input feature sets. Deep regressor stacking [3] has been presented as a method of increasing forecast accuracy. The proposed method, which employs random forest


(RF) and support vector machine (SVM) as regressors, is a novel multi-target strategy that may be easily adapted to other problem domains with similar problems. In Ramamurthy's research, data from the Official Airline Guide (OAG) and the DB1B database are used to estimate airfare prices [4]. The author also makes use of Sabre AirPrice data, which is provided by SABRE, but exclusively from their online users. The conclusions produced from this online user data may be distorted because it does not represent the full consumer market. According to Dominguez-Menchero's research [5], the purchase time depends entirely on a nonparametric isotonic regression process over the route, carriers, and time frame. The model suggests the most suitable dates before purchasing a plane ticket. The model considers two different types of variables: entry and acquisition dates. The authors obtained a good result as they used a training dataset for ticket price prediction. The logistic regression model has a 70–75% accuracy rate [6]. The model's conclusion is that the majority of plane ticket prices fluctuate from day to day. In the model, the ticket price stays high for a while and then progressively declines to a particular level. The ticket price starts to rise again when the flight is only 2–3 days away. The databases contain accurate statistics on a ticket's profit based on timing, departure, and arrival. Price forecasting is also possible on a quarterly basis. Air ticket transaction information, which offers more precise information about airlines, such as time, departure, and arrival, can be added to the framework, enabling the construction of more accurate airfare price prediction models [7]. Yield management or revenue management [8] is the study of individual applications of dynamic pricing in the aviation industry. It comprises dividing passengers/travelers into three categories: business travelers, casual travelers, and hybrid travelers. The car industry [9], which combines production schedules and inventory decisions for improved profit margins and supply chain management, is another industry that uses dynamic pricing. Dynamic pricing also has the added benefit of displaying consumer demand easily and quickly, as well as enhancing the equipment maker's status. This concept of dynamic pricing, which combines web integration and automation, benefits sellers in a variety of ways. It eliminates the vendor's physical presence [10], cuts input costs, merges customer data into a single database, and lowers the cost of publishing fresh catalogues [11]. It also serves as an explicit venue for consumers and sellers to talk and exchange reviews for better services, rather than a one-way street. There are a variety of methods for studying the techniques for determining dynamic pricing. The most common and straightforward technique is to conduct a survey or observe [12]. The two approaches are implemented by providing a price recommendation tool that thoroughly examines the price-to-sales ratio and consumer willingness to pay. Experimental auctions, for example, can be used to determine the best price techniques. Furthermore, the strategies are determined by the market type, whether mass or niche. Skimming and penetration pricing strategies are required for each market [13]. Customization [14], bundling, and versioning are all basic actions that can be performed to provide dynamic pricing of information items. The customization process finds the best pricing for a product and then customizes it. The authors


have presented Learn++, an incremental learning approach that fits the streaming setting and handles concept drift in the sequence. This line of work is still in its early stages. A price estimated for an individual flight is more useful than a predetermined price for all travel places, as passengers may perceive bias while purchasing tickets [15]. To solve the plane pricing issue, ML ideas and techniques are employed to forecast air prices with respect to the timeframe. Using a combination of clustering, modeling, and methodology, an adaptive context-aware ensemble regression (ACER) model for flight price prediction is developed. ACER finds context- and data-appropriate properties, allowing models to be trained. ACER performs better on all routes, according to the results [16]. During detailed monitoring, the passenger receives an approximation of the plane price with its date, allowing them to choose the best date and price combination [17]. Because the price difference between the weekend and other days of the week is the most random and requires more factors and a nonlinear model for successful forecasting, the weekend price for Sunday cannot be calculated in this model; this will be the next area of research for the presented technique. The author employed artificial neural networks and genetic algorithms to anticipate air ticket sales revenue for the travel industry [18]. Inputs included international oil prices, Taiwan's stock market weighted index, Taiwan's monthly unemployment rate, and other variables. This work will aid in the formulation of superior customer tactics, which will help the company flourish. The research topic of this work is built on this premise, and it proposes and examines a new form of framework for retail businesses to follow. Dynamic pricing is proposed in this paper using a combination of mining, statistical, and machine learning techniques.

3 Proposed System

Figure 1 shows the framework of the proposed system, named random forest-based airfare prediction (RFBP), which is used to predict the price of flights to and from popular metro cities in India. In our proposed work, the user enters travel details such as the source, the preferred destination, the time of travel, the journey date, stoppage information, and the flight of their choice. The data is then analyzed, hidden trends are discovered, and various prediction and classification models are applied to the training set.

3.1 Data Preparation

The system takes these factors into account and calculates the best pricing. Historical data is required to solve any prediction/classification problem; for this project, past flight fares for each route, collected on a daily basis, are needed.


Fig. 1 Framework of the proposed system

Manually gathering data on a daily basis is inefficient, so a Kaggle dataset was imported; it was built by a Python script that scraped data from official websites on a regular basis. The data then needs to be processed according to the model’s requirements; Table 1 illustrates the attributes of the dataset. During the first phase, unnecessary data is removed and the dataset is made clean and purely numerical for the ML models to act upon: null values, non-numerical values, and NaN values are eliminated. Secondly, fields that can be derived from other fields are taken out. For example, since the arrival and departure dates are known, the Date_of_Journey and Duration columns are not needed, as both can be derived from the former columns. Similarly, all other redundant columns are removed and new columns containing only numerical data are created (a preprocessing sketch follows Table 1).

Table 1 Data attributes with sample values

S. No. | Attributes      | Sample values
1      | Date_of_Journey | 24/03/2019
2      | Airline         | IndiGo
3      | Source          | Banglore
4      | Destination     | New Delhi
5      | Route           | BLR → DEL
6      | Dep_Time        | 22:20
7      | Arrival_Time    | 01:10 22 Mar
8      | Duration        | 2 h 50 m
9      | Total_Stops     | non-stop
10     | Additional_Info | No info
11     | Price           | 3897
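The preprocessing described above might look roughly like the following pandas sketch. The file name Data_Train.xlsx, the exact column transformations, and the datetime format are assumptions for illustration, not the authors’ actual script.

import pandas as pd

# Hypothetical file name; the chapter only states that a Kaggle dataset was imported.
df = pd.read_excel("Data_Train.xlsx")

# First phase: eliminate null/NaN values so the models see clean data.
df.dropna(inplace=True)

# Second phase: replace derivable text fields with numerical columns.
dep = pd.to_datetime(df["Date_of_Journey"] + " " + df["Dep_Time"], format="%d/%m/%Y %H:%M")
df["Journey_day"] = dep.dt.day
df["Journey_month"] = dep.dt.month
df["Dep_hour"] = dep.dt.hour
df = df.drop(columns=["Date_of_Journey", "Dep_Time", "Duration"])

# Encode the remaining categorical attributes as numbers.
df = pd.get_dummies(df, columns=["Airline", "Source", "Destination"], drop_first=True)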


3.2 Feature Selection

A feature selection strategy is used to explore the degree of impact of each feature on the prediction result in order to improve the model’s performance. Feature selection reduces the number of input variables, which lowers the computational cost of modelling and, in some situations, improves the model’s performance. Using RF, we created an automated feature selection module. RF is a tree-based ensemble learning technique that creates multiple decision tree classifiers during the training phase and delivers predictions based on either the majority vote (classification) or the average of all decision tree predictions (regression). To determine the relevance of a feature, the mean decrease in impurity is employed: the total reduction in impurity produced by the corresponding feature, weighted by the probability of the feature’s nodes being included in a decision path. In general, a feature becomes more important when it has a larger impact on lowering prediction error.
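As a rough illustration of this module, the following sketch fits a RandomForestRegressor and reads its impurity-based importances. The variable names X (prepared feature matrix) and y (Price target) are assumptions.

from sklearn.ensemble import RandomForestRegressor

# X: prepared feature DataFrame, y: Price column (assumed to exist).
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X, y)

# feature_importances_ holds the mean decrease in impurity per feature,
# i.e., the impurity reduction weighted by how often the feature's nodes
# appear on decision paths.
ranking = sorted(zip(X.columns, rf.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")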

3.3 Data Analysis

After creating multiple modules, the next step is to test them on the testing set and determine the most appropriate metric for measuring accuracy. Furthermore, integrating models and forecasting a cumulative target variable is frequently found to be more accurate. The dataset is then fed into machine learning models for training and prediction. The R2 score, also known as the coefficient of determination, is used to evaluate the performance of a regression model: it measures the proportion of variance in the dependent output attribute that can be predicted from the independent input variables, and thus indicates how effectively the model reproduces observed outcomes in terms of the fraction of total variation explained by the model. R2 denotes the percentage of data points that fall along the regression line, so a greater R2 value indicates better results.

R2 = 1 − SSres/SStot (1)

where SSres is the residual sum of squares and SStot is the total sum of squares.

Algorithm for Determining the R2 Coefficient
Step 1: The dataset is split into train and test sets.
Step 2: Scaling is performed if needed.
Step 3: The necessary models are imported from ML libraries such as scikit-learn.
Step 4: The data is fit to ML models such as random forest.
Step 5: Root mean square error, mean absolute error, and mean squared error scores are obtained.
Step 6: The coefficient of determination R2 is calculated using the built-in metrics function, as sketched below.

The predicted model has an R2 value of 0.79, indicating that it can explain 79% of the variability of the dependent output attribute while leaving 21% of the variability unaccounted for. For visual interpretation, graphs are plotted to provide better understandability. A displot, which is used for a univariate set of observations and plotted as a histogram, shows the distribution of the features, i.e., where most of the data lie. Scatter plots show how much one variable affects another by plotting data points on horizontal and vertical axes; each row in the data table is a marker whose position is determined by the values in the columns on the X and Y axes. A third variable can be assigned to the colour or size of the markers, giving the plot yet another dimension.
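The following is a minimal sketch of Steps 1–6 in scikit-learn, assuming X and y are the prepared features and the Price target; Step 2 (scaling) is omitted because tree ensembles do not require it.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Step 1: split the dataset into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Steps 3-4: import and fit the model.
model = RandomForestRegressor(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Step 5: error metrics.
mse = mean_squared_error(y_test, y_pred)
print("MAE :", mean_absolute_error(y_test, y_pred))
print("MSE :", mse)
print("RMSE:", np.sqrt(mse))

# Step 6: coefficient of determination, Eq. (1).
print("R2  :", r2_score(y_test, y_pred))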

3.3.1 Random Forest

The random forest is a supervised learning method that has the benefit of being usable for both classification and regression problems, which are common in today’s machine learning frameworks. RF produces a large number of decision trees, which are then combined to provide a more exact and stable prediction. The parameters of an RF model are almost identical to those of a decision tree or a bagging classifier model, and determining the significance of each feature for the prediction is fairly simple compared with other methods. The suggested system starts with a random forest regression tree, which has a training set accuracy of 95% and a test set accuracy of 79%.

3.3.2 Hyperparameter Tuning

In machine learning, hyperparameter optimization or tuning is the process of identifying a set of ideal hyperparameters for a learning algorithm. A hyperparameter is a value assigned to a parameter that controls the learning process. To improve efficiency, hyperparameter tuning is performed; since a brute-force search over parameter combinations grows exponentially with each added parameter, a randomized search is used in practice (see Sect. 4). Tuning improves accuracy by 2%, leading to an overall accuracy of 81%.
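A hedged sketch of this step with scikit-learn’s RandomizedSearchCV (the utility adopted in Sect. 4) is given below; the search space shown is illustrative, not the authors’ actual grid.

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Illustrative search space; the chapter does not list the actual one.
param_dist = {
    "n_estimators": [100, 300, 500, 700],
    "max_depth": [5, 10, 15, 20, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 5],
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions=param_dist,
    n_iter=10,      # random sampling keeps the cost bounded,
    cv=5,           # unlike an exhaustive grid search
    scoring="r2",
    random_state=42,
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)

Randomized sampling fits n_iter × cv models regardless of the size of the grid, which is what makes it cheaper than evaluating every grid point.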

3.3.3 AdaBoosting

Adaptive boosting is the process of re-assigning weights to each instance, with greater weights allocated to erroneously classified instances. Boosting is a technique used in supervised learning to reduce bias and variance. An AdaBoost regressor is a meta-estimator that first fits a regressor on the original dataset and then fits additional copies of the regressor on the same dataset with the instance weights adjusted according to the error of the current predictions, so that subsequent regressors concentrate on the most difficult cases. The AdaBoost algorithm improves the accuracy by a further 4%, resulting in a final accuracy of 85%.
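A minimal sketch of this boosting stage follows; scikit-learn’s default base estimator (a shallow decision tree) is used here because the chapter does not state which regressor was boosted.

from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import r2_score

# Each round re-weights instances so later regressors focus on the
# examples with the largest current prediction error.
ada = AdaBoostRegressor(n_estimators=50, learning_rate=1.0, random_state=42)
ada.fit(X_train, y_train)
print("test R2:", r2_score(y_test, ada.predict(X_test)))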

4 Results and Discussion

4.1 Experimental Results

The flight data to and from the metro cities of India for the year 2019 is collected and loaded, and the following processing is performed for both train data and test data. The most essential features of a flight that impact the prices of plane tickets are decided during this phase; the goal is to identify the features that contribute most to the target variable and have a good relationship with it. For feature importance selection, the ExtraTreesRegressor module is employed. This class implements a meta-estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to increase predictive accuracy and control over-fitting. Its feature_importances_ property provides a numerical array with one value per column; plotting this selection as a graph yields Fig. 2 (a code sketch of this step follows the figure caption). From Fig. 3a, it is understood that the bulk of the price values lie below 10,000 rupees. The scatterplot for the given dataset, shown in Fig. 3b, presents the distribution of the data based on the price field in both test and prediction data; ticket prices lie primarily in the range of 0–20,000 rupees.

Fig. 2 Feature selection
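As referenced above, the feature-importance computation behind Fig. 2 could look roughly as follows; this sketch assumes X and y hold the prepared train-data features and the Price target, and the plotting choices are illustrative.

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor

# Fit randomized extra-trees and read the per-column importance array.
etr = ExtraTreesRegressor(random_state=42)
etr.fit(X, y)

importances = pd.Series(etr.feature_importances_, index=X.columns)
importances.nlargest(10).plot(kind="barh")   # top features, cf. Fig. 2
plt.xlabel("Feature importance")
plt.tight_layout()
plt.show()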


Fig. 3 a Random forest displot. b Random forest scatterplot

The accuracy of the random forest model in predicting the right output is 79%, as concluded above. To further improve the accuracy, we applied hyperparameter tuning and adaptive boosting. Cross-validation is often used to estimate generalization performance, and the most common scikit-learn search utilities built on it are RandomizedSearchCV and GridSearchCV. In RandomizedSearchCV, the search space is defined as a bounded domain of hyperparameter values within which points are chosen randomly, whereas GridSearchCV defines the search space as a grid of hyperparameter values and evaluates every point in the grid. As a result, RandomizedSearchCV is used for parameter optimization. Figure 4a shows the density of the features after performing RandomizedSearchCV; it can be noted that the maximum density has been reduced by 12.5% from its previous value. Figure 4b depicts the scatter plot of the test data features after subjecting them to cross-validation; the distribution has its core concentrated in the range 5000–15,000. Figure 5a is the displot plotted after adaptive boosting has been performed on the dataset.

Fig. 4 a Hyperparameter tuning displot. b Hyperparameter tuning scatterplot


Fig. 5 a Displot after adaptive boosting. b Scatterplot after adaptive boosting

The range has now been brought closer, with the samples having prices in the range from 0 to 5000 showing a decreasing trend on the basis of density. Figure 5b shows the distribution of samples after boosting has been performed; the graph is concentrated in the range from 0 to 15,000.
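Plots of the kind shown in Figs. 3–5 can be produced with seaborn along the following lines; here y_test and y_pred are assumed to hold the actual and predicted prices of the current model.

import matplotlib.pyplot as plt
import seaborn as sns

# Displot of the predicted prices (cf. Figs. 3a, 4a and 5a).
sns.displot(x=y_pred, kde=True)
plt.show()

# Scatterplot of predicted against actual prices (cf. Figs. 3b, 4b and 5b).
sns.scatterplot(x=y_test, y=y_pred)
plt.xlabel("Actual price")
plt.ylabel("Predicted price")
plt.show()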

4.2 Comparative Analysis

By treating the problem as a classification task, Papadakis [19] predicted whether the ticket price would fall in the future. Logistic regression, a ripple-down rule learner, and a linear support vector machine were the ML models utilized, with accuracies of 74.5%, 69.9%, and 69.4%, respectively. Tziridis et al. [2] compared machine learning techniques such as the multilayer perceptron and regression trees to predict airfare prices using eight features. The techniques they used and their respective accuracies are compared with those of Papadakis. The proposed system RFBP yields an accuracy of 85%, and Fig. 6 shows a graph representing this comparative analysis.

5 Conclusion

An optimal purchase-decision support service for aeroplane ticket prices has thus been prepared. The method helps users identify the most reasonable price in a given period by integrating the benefits of random forest regression prediction, hyperparameter tuning, and adaptive boosting. The accuracy of the model is at its practical maximum, given that factors such as global fuel price increases, additional layers of taxes, import prices, and economic downturns in OPEC countries contribute to the rise and decline of fuel prices, and hence of airfare prices, in addition to the political and social issues that affect price increases.


Fig. 6 Comparison between ML models

If these factors were included, the model would become susceptible to large variance and a decrease in bias, leading to less accurate predictions. By reducing intermediate hassles on the user side through the provision of data, reducing redundancy, providing resilience through state management, and maintaining integrity and consistency, the system offers all the facilities the user needs while solving the problem of price prediction with high accuracy and user satisfaction. In future versions of the system, the same concept can be used to predict the prices of other similar products such as cars and hotels, and it can be aggregated into a one-for-all system that provides travel, pantry, and recreational services. Further, with more insightful data, the scope of prediction can be extended to suggest the type of airline and the time period in which to purchase flight tickets, covering more metro cities and airports. The proposed system RFBP yields an accuracy of 85%, which outperforms all the existing systems’ performance. The system could also be automated to require less manual observation, for example by introducing chatbots and intelligent customer management systems. Additional tests on bigger datasets of airfare data are required, but this preliminary work demonstrates the potential of machine learning models to assist consumers in making cost-effective flight purchases in the market.

References

1. Groves W, Gini M (2013) An agent for optimizing airline ticket purchasing. In: 12th international conference on autonomous agents and multi-agent systems (AAMAS 2013), St. Paul, MN, 06–10 May 2013, pp 1341–1342
2. Tziridis K, Kalampokas T, Papakostas GA, Diamantaras KI (2017) Airfare prices prediction using machine learning techniques. In: 25th IEEE European signal processing conference, pp 1036–1039
3. Santana EJ, Mastelini SM, Barbon Jr S (2017) Deep regressor stacking for air ticket prices prediction. In: XIII Brazilian symposium on information systems: information systems for participatory digital governance. Brazilian Computer Society (SBC), pp 25–31
4. Rama-Murthy K (2006) Modeling of United States airline fares using the Official Airline Guide (OAG) and Airline Origin and Destination Survey (DB1B). Ph.D. dissertation, Virginia Tech
5. Domínguez-Menchero JS, Rivera J, Torres-Manzanera E (2014) Optimal purchase timing in the airline market. J Air Transport Manage 40:137–143
6. Papakostas GA, Diamantaras KI, Papadimitriou T (2017) Parallel pattern classification utilizing GPU-based kernelized slackmin algorithm. J Parallel Distrib Comput 99:90–99
7. Wang T, Pouyanfar S, Tian H, Tao Y, Alonso M, Luis S, Chen SC (2019) A framework for airfare price prediction: a machine learning approach. In: 2019 IEEE 20th international conference on information reuse and integration for data science (IRI). IEEE, pp 200–207
8. Narahari Y, Raju CVL, Ravikumar K, Shah S (2005) Dynamic pricing models for electronic business. Sadhana Acad Proc Eng Sci Indian Acad Sci 30(2–3):231–256
9. Biller S, Chan LMA, Simchi-Levi D, Swann J (2005) Dynamic pricing and the direct-to-customer model in the automotive industry. Electron Commer Res 5(2):309–334
10. Narahari Y, Dayama P (2005) Combinatorial auctions for electronic business. Sadhana Acad Proc Eng Sci Indian Acad Sci 30(2–3):179–211
11. Rutherford D (1995) Routledge dictionary of economics. Taylor & Francis
12. Lippert J (2011) Dynamic price optimization for the retail market. Prudysys AG
13. Bergemann D, Välimäki J (2006) Dynamic pricing of new experience goods. J Polit Econ 114(4):713–743
14. Viswanathan S, Anandalingam G (2005) Pricing strategies for information goods. Sadhana 30(2–3):257–274
15. Liu T, Cao J, Tan Y, Xiao Q (2017) ACER: an adaptive context-aware ensemble regression model for airfare price prediction. In: 2017 international conference on progress in informatics and computing (PIC). IEEE, pp 312–317
16. Chen Y, Cao J, Feng S, Tan Y (2015) An ensemble learning based approach for building airfare forecast service. In: 2015 IEEE international conference on big data (Big Data). IEEE, pp 964–969
17. Boruah A, Baruah K, Das B, Das MJ, Gohain NB (2019) A Bayesian approach for flight fare prediction based on Kalman filter. In: Progress in advanced computing and intelligent engineering. Springer, Singapore, pp 191–203
18. Huang H-C (2013) A hybrid neural network prediction model of air ticket sales. Telkomnika Indonesian J Electr Eng 11(11):6413–6419
19. Papadakis M (2014) Predicting airfare prices

Author Index

A Abdelnaby, Abdalla Hamada, 217 Abhijnan Bajpai, 637 Abhijnya Bhat, 637 Abinash Sahoo, 1 Aditya Pai, B., 587 Ahmed, Lopa, 677 Ahmed, Zakaria, 397 Amita Dev, 351 Amit Virmani, 81 Anh, Truong Tran Mai, 145 Anirudh P. Hebbar, 587 Anitha Nallasivam, 107 Annu Dhiman, 663 Anukrity Varshney, 663 Anushka Choudhury, 427 Aravind Nadanasabapathy, 225 Arif Ahmad Rather, 291 Arnab Sinhamahapatra, 707 Arnav Balyan, 243 Ashis Kumar Mandal, 545 Ashley Kurian, 571 Ashraf, Fardeen, 677 Ashu Rastogi, 757 Ashutosh Tripathi, 81 Ashwini Kumar, 479

B Bacanin, Nebojsa, 507 Bach, Khuat Duy, 43 Bamaiyi, Bawa, 209 Bam Bahadur Sinha, 741 Basabi Chakraborty, 545 Berardinis De, Pierluigi, 767

Bikram Pratim Bhuyan, 815 Bindu Verma, 663 Busygin, Volodymyr, 647

C Çamur, Hüseyin, 209, 217 Ch. Vijayendra Sai, 313

D Daway, Hazim G., 275 Deba Prakash Satapathy, 1 Devi, R., 51 Dipmala Salunke, 825 Divya Vetriveeran, 301 Djuric, Marko, 507 Duke, George Edem, 217 Duong, Phung Thai, 43 Dzhus, Oleksandr, 647

E Elizar, Elizar, 599

F Faeza Hasani, 663 Fedorov, Eugene, 159

G Ganes Raj Muthu Arumugam, 13 Gaurav Panchal, 825 Gaurav Srivastava, 533

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Saraswat et al. (eds.), Proceedings of International Conference on Data Science and Applications, Lecture Notes in Networks and Systems 552, https://doi.org/10.1007/978-981-19-6634-7


Geetanjali, 861 Gitesh Prajapati, 693 Grigoriadis, Ioannis, 261

H Hannan, Tanveer, 397 Harishchandra Akarte, 199 Harsha R. Gaikwad, 199 Hashim, Ahmed Rafid, 275 Heiden, Bernhard, 781 Hemantkumar Wani, 129 Hung, Phan Duy, 43 Huynh, Le Dinh, 43

I Ippili Saikrishnamacharyulu, 1 Ipsita Pattnaik, 351 Iqbal, Tasnia, 677 Islam, Md. Adnanul, 677 Islam, Md. Mohaiminul, 397

J Jahangeer Sidiq, S., 71 Jayavel Kanniappan, 225 Jebathangam, J., 807 Jithin Gangadharan, 225 Juveeryah Qureshi, 625 Jyoti Gajrani, 463

K Kakarla Ajay Kumar Reddy, 313 Kandula Neha, 71 Kanniga Devi, 175 kareem, Hana H., 275 Karthik, P. C., 427 Karunakar, A. K., 447 Kassem, Youssef, 209, 217 Kathiravan, P., 89

L Laboni Sarker, 397 Lathamaheswari, U., 807 Laurini, Eleonora, 767 Leena Sri, R., 301

M Mahanand, B. S., 417 Mahesh Babu Mariappan, 175

Majid Zaman, 71 Malik Manivanan Sehgar, 13 Manjushree D. Laddha, 199 Manoj M. V. Kumar, 587 Mansha Ali, 625 Manzoor Ahmad Chachoo, 291 Marianne Rhea, R., 369 Marjanovic, Marina, 507 Matviichuk, Andrii, 647 Mehreen Mushtaq, 625 Mir Aiman, 625 Mohapatra, A. K., 351 Moroz, Boris, 647 Muharar, Rusdha, 599 Mukta, Md. Saddam Hossain, 677 Muskan Didwania, 427

N Nagashree, S., 417 Nagendra Kumar, 707 Namita Tiwari, 81 Narender Kumar, 463 Natarajan, S., 637 Naveen Sharma, 757 Neeraj Yadav, 557 Neeru Kashyap, 861 Neskorodieva, Tetiana, 159 Niharika Ganji, 707 Nihar Ranjan, 825 Nitika Dhingra, 861 Nitesh Pradhan, 533 Nitin Saluja, 861 Nupur Chugh, 243

O Özdemir, Tu˘gberk, 209

P Pallavi H. Chitte, 725 Pallavi Tekade, 825 Papakostas, George A., 261 Parente, Rosaria, 33 Petrovic, Aleksandar, 507 Polok, Isbat Khan, 677 Pooja Singh, 693 Poonam Jindal, 861 Prakash, R., 901 Prasant K. Mahapatra, 757 Priyalakshmi, V., 51 Priyanka Kumar, 313 Pulkit Agarwal, 557

Purohit, S. D., 873

R Rahman, Md. Mahbubur, 677 Rahul Katarya, 495 Rahul Thakur, 557 Rahul, 693 Rajesh Kumar Jayavel, 225 Rakoth Kandan Sambandam, 301 Ram Joshi, 825 Ram Kumar, 71 Ramanan, G., 901 Ramesh Kini, M., 571 Ramesh, K. T., 323 Ravish, 495 Rekha Yadav, 849 Rikta Sen, 545 Rishav Kumar, 557 Rotilio, Marianna, 767 Rupal Shukla, 479

S Saad Yunus Sait, 427 Sandeep Kumar, 793 Sandeep Samantaray, 1 Sangita S. Chaudhari, 725 Sanil Gandhi, 199 Sanjeev Mishra, 873 Saptarsi Goswami, 545 Saransh Gupta, 533 Saranya, R., 89 Saravanan Muthaiyah, 13 Sarfaraj Mirza, 757 Sashchuk, Hanna, 647 Satbir Singh, 757 Sawan Kumar Tripathi, 873 Sayan Chakraborty, 323 Sejal Priya, 637 Selvaraj, C., 107 Shahnawaz Ali, 625 Shaswati S. Mishra, 1 Shavantrevva Bilakeri, 447 Sherly Puspha Annabel, L., 901 Shubhi Bansal, 707 Shvachych, Gennady, 647 Siddhali Doshi, 337 Sreenidhi, S., 901 Sridurga Sekar, 89

Stofa, Marzuraikah Mohd, 521 Stornelli, Vincenzo, 767 Strumberger, Ivana, 507 Suchita Arora, 793 Sudhanshu Srivastava, 815 Sujithkumar, S. H., 129 Sukaj, Silvana, 33 Sumanta Sarathi Sharma, 885 Sundeep V. V. S. Akella, 313 Sunil Kumar, 793 Swati Aggarwal, 243

T Tarun Biswas, 741 Thangaperumal, S., 369 Tonino-Heiden, Bianca, 781 Tsiatsiou, Iliana, 261

U Ubaid Qureshi, 625 Upadhyay, P., 839 Usha Kumari, 849 Utkina, Tetyana, 159

V Vaidehi Bhaskara, 323 Varun Kumar Paliwal, 885 Vesic, Ana, 507 Vinesh Kumar Jain, 463 Vita De, Mariangela, 767 Vrochidou, Eleni, 261 Vi, Tran Duc, 145

Y Yegnanarayanan Venkataraman, 175

Z Zainuri, Muhammad Ammirrul Atiqi Mohd, 521 Zaman, Mohd Hairi Mohd, 521 Zaw, Thein Oak Kyaw, 13 Zim, Sumaiya Kashmin, 677 Zivkovic, Miodrag, 507 Zulkifley, Mohd Asyraf, 521, 599